Have sed echo string after match

I need to have the string directly following /pbs.twimg.com/profile_images/ echoed out. This is for a search tool I have created with a bit of help. Everyone recommends sed, but I rarely use sed, so I don't understand it well. Here is the script:



# Read the account name, fetch the profile page, and print the numeric user ID.
read -r Username
wget -q -O - "https://twitter.com/$Username" |
sed -n '/data-screen-name=.'"$Username"'".*data-user-id=/I{
s/^.*data-screen-name=.'"$Username"'".*data-user-id="\([0-9]*\)".*$/\1/Ip
q
}'


This script works perfectly fine for most accounts, but it fails if the account is private. If it searches for /pbs.twimg.com/profile_images/ instead, only the ID comes up, without the other variables that the original pattern brings along (that is why the original selects the row with data-screen-name=). I can't use the Twitter API because I'm giving this to people who wouldn't know how to go about obtaining API access, so I'm trying to make it as simple as possible for the user.



I have looked through several SO/SE posts and I don't believe this is a duplicate, so sorry if it is.



EDIT----
This doesn't work on private profiles because the line the script matches is not present in the HTML. After looking further, I noticed that /pbs.twimg.com/profile_images/ is followed by the ID of the user and appears for both private and open accounts.



Sample data:
Right now, if I were to run this on my account it would work and you would see my ID:



Username="thematrix1o1"
717835108540030976


But if I run it on a private account I will get no ID (because the line isn't there)



Username="touchmytweets"
.


(there is no dot in the report, it's just blank)



Here is an image showing what I need it to find: http://imgur.com/Yp8Okx7
As you can see, her ID is: 726618076633030656
Small sample of the report:
ile_background_color":"C0DEED","profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","profile_background_tile":false,"profile_image_url":"http://pbs.twimg.com/profile_images/726618076633030656/wwYbLwbs_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/726618076633030656/wwYbLwbs_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/418265825/1463628965","profile_link_color":"0084B4","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"has_extended_profile":false,"default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":n










  • I have no idea what that is supposed to mean. Please provide an example for the data you receive, and the output you expect.
    – Michael Vehrs
    May 20 '16 at 12:11

  • @MichaelVehrs I have updated the question now. sorry about that.
    – Matt
    May 20 '16 at 14:31

  • So, basically, what you are looking for is just a sed expression that will return the number following "profile_images"?
    – Michael Vehrs
    May 21 '16 at 7:26

  • @MichaelVehrs yes.
    – Matt
    May 21 '16 at 19:12

bash sed wget

edited May 20 '16 at 14:31
asked May 20 '16 at 2:44
Matt

2 Answers
Your output contains any amount of random cruft, and one line you are interested in. Select that line, discard anything but the ID and print the result:



 sed -n '/profile_images/s/.*profile_images\/\([0-9][0-9]*\).*/\1/p'


This could be made slightly more efficient by quitting immediately after that line has been processed.



In fact, that is pretty much exactly what the original code does. The only thing that has changed is the regular expression used.
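
As a minimal sketch of the "quit immediately" variant mentioned above, the following wraps the same idea in a shell function. The function name extract_profile_id is hypothetical, and the sketch assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed both do). The address only selects lines where profile_images/ is directly followed by a digit, so the substitution always succeeds before sed quits.

```shell
# Hypothetical helper: print the first run of digits that directly follows
# "profile_images/" on standard input, then stop reading.
extract_profile_id() {
  sed -nE '/profile_images\/[0-9]/{
s#.*profile_images/([0-9]+).*#\1#p
q
}'
}

# Usage with a line shaped like the sample in the question:
sample='"profile_image_url":"http://pbs.twimg.com/profile_images/726618076633030656/wwYbLwbs_normal.jpg"'
printf '%s\n' "$sample" | extract_profile_id
```

In the original script, the function would take the place of the sed call, reading from the wget pipe. Because the leading .* is greedy, a line with several profile_images URLs yields the ID from the last one; in the sample from the question both URLs carry the same ID.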






  • This ends up printing the entire set of HTML page into the terminal
    – Matt
    May 22 '16 at 15:14

  • Not for me. Are you sure the sample you posted is accurate?
    – Michael Vehrs
    May 22 '16 at 16:29

  • the sample i posted is a direct copy from the wget.
    – Matt
    May 22 '16 at 19:51

  • Well, your sample is pretty much useless.
    – Michael Vehrs
    May 24 '16 at 5:14

  • The problem is that the sample is not representative. The other lines are completely different. You should have included part of the surrounding HTML to give a better impression of the problem space.
    – Michael Vehrs
    May 25 '16 at 4:51

When I'm confronted with needle-in-haystack work like this, I like to turn it into a line-oriented problem if I can. You might be able to do that with something like this:



$ sed -E 's:[0-9]+:\n&\n:g' filename \
| grep -F -A1 '/pbs.twimg.com/profile_images' | tail -1


That puts every digit string on a line by itself, greps for your string, and prints the line after it.



It's quite the hack; the right way to parse HTML is with an HTML parser. But it might get the job done for controlled inputs where you just need one string.



A slightly neater approach would use awk, definitely worth learning if you deal much with such things.
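
One way the awk approach might look is sketched below. The function name extract_profile_id_awk is hypothetical, and the sketch assumes the ID is the run of digits immediately after the marker string. It uses only POSIX awk functions (index, substr, match).

```shell
# Hypothetical helper: find the marker as a fixed string, then print the
# digits that immediately follow it and stop at the first hit.
extract_profile_id_awk() {
  awk '
    {
      marker = "/pbs.twimg.com/profile_images/"
      pos = index($0, marker)            # 0 when the marker is absent
      if (pos == 0) next
      rest = substr($0, pos + length(marker))
      if (match(rest, /^[0-9]+/)) {      # digits right after the marker
        print substr(rest, RSTART, RLENGTH)
        exit
      }
    }'
}

# Usage with a line shaped like the sample in the question:
sample='"profile_image_url":"http://pbs.twimg.com/profile_images/726618076633030656/wwYbLwbs_normal.jpg"'
printf '%s\n' "$sample" | extract_profile_id_awk
```

Because index() does a fixed-string search, the dots and slashes in the marker need no escaping. Only the first occurrence of the marker on each line is inspected.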






First answer (sed):
edited May 24 '16 at 5:19
answered May 22 '16 at 7:06
Michael Vehrs

Second answer (sed, grep and tail):
answered May 25 '16 at 2:35
James K. Lowden
