Have sed echo string after match
I need the string that directly follows /pbs.twimg.com/profile_images/ to be echoed out. This is for a search tool I have created with a bit of help. Everyone recommends sed, but I rarely use sed, so I don't understand it well. Here is the script:
read -r Username
wget -q -O - "https://twitter.com/$Username" |
sed -n '/data-screen-name=.'"$Username"'".*data-user-id=/I{
s/^.*data-screen-name=.'"$Username"'".*data-user-id="\([0-9]*\)".*$/\1/Ip;q}'
This script works perfectly fine for most accounts, but it fails if the account is private. If it searched for /pbs.twimg.com/profile_images/ instead, only the ID would come up, without the other variables that appear on the line the original script matches (that is why it selects the row with data-screen-name=).
I can't use the Twitter API because I'm distributing this to people who wouldn't know how to go about obtaining API access, so I'm trying to make it as simple as possible for the user.
I have looked through several SO/SE posts and I don't believe this is a duplicate, so sorry if it is.
EDIT----
This doesn't work on private profiles because the line the script looks for is not in the HTML. After looking further, I noticed that /pbs.twimg.com/profile_images/ is followed by the ID of the user and appears for both private and open accounts.
Sample data:
Right now, if I were to run this on my account it would work and you would see my ID:
Username="thematrix1o1"
717835108540030976
But if I run it on a private account I get no ID (because the line isn't there):
Username="touchmytweets"
(the output is just blank)
Here is an image showing what I need it to find: http://imgur.com/Yp8Okx7
As you can see, her ID is: 726618076633030656
Small sample report:
ile_background_color":"C0DEED","profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","profile_background_tile":false,"profile_image_url":"http://pbs.twimg.com/profile_images/726618076633030656/wwYbLwbs_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/726618076633030656/wwYbLwbs_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/418265825/1463628965","profile_link_color":"0084B4","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"has_extended_profile":false,"default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":n
bash sed wget
edited May 20 '16 at 14:31
asked May 20 '16 at 2:44
Matt
I have no idea what that is supposed to mean. Please provide an example for the data you receive, and the output you expect. – Michael Vehrs, May 20 '16 at 12:11
@MichaelVehrs I have updated the question now. Sorry about that. – Matt, May 20 '16 at 14:31
So, basically, what you are looking for is just a sed expression that will return the number following "profile_images"? – Michael Vehrs, May 21 '16 at 7:26
@MichaelVehrs yes. – Matt, May 21 '16 at 19:12
2 Answers
Your output contains any amount of random cruft, and one line you are interested in. Select that line, discard anything but the ID and print the result:
sed -nE '/profile_images/s/.*profile_images\/([0-9]+).*/\1/p'
This could be made slightly more efficient by quitting immediately after that line has been processed.
In fact, that is pretty much exactly what the original code does. The only thing that has changed is the regular expression used.
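As a rough sketch of the "quit immediately" idea, the substitution and a `q` command can be grouped in braces so that sed stops reading after the first matching line. The function name `extract_profile_id` and the sample string are made up for illustration, and `-E` assumes a sed with extended regular expressions (GNU and BSD sed both have it):

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): print the number that
# follows "profile_images/" on the first matching input line, then quit.
extract_profile_id() {
    sed -nE '/profile_images\/[0-9]/{
        s/.*profile_images\/([0-9]+).*/\1/p
        q
    }'
}

# Sample line shaped like the fragment in the question.
sample='"profile_image_url":"http://pbs.twimg.com/profile_images/726618076633030656/wwYbLwbs_normal.jpg"'
printf '%s\n' "$sample" | extract_profile_id
```

In the question's script, this would presumably take the place of the sed command, e.g. `wget -q -O - "https://twitter.com/$Username" | extract_profile_id`. Whether the live page still contains such a line is an assumption that has to be checked against real output.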
edited May 24 '16 at 5:19
answered May 22 '16 at 7:06
Michael Vehrs
This ends up printing the entire HTML page to the terminal. – Matt, May 22 '16 at 15:14
Not for me. Are you sure the sample you posted is accurate? – Michael Vehrs, May 22 '16 at 16:29
The sample I posted is a direct copy from the wget output. – Matt, May 22 '16 at 19:51
Well, your sample is pretty much useless. – Michael Vehrs, May 24 '16 at 5:14
The problem is that the sample is not representative. The other lines are completely different. You should have included part of the surrounding HTML to give a better impression of the problem space. – Michael Vehrs, May 25 '16 at 4:51
When I'm confronted with needle-in-haystack work like this, I like to turn it into a line-oriented problem if I can. You might be able to do that with something like this:
$ sed -E 's:[0-9]+:\n&\n:g' filename \
  | grep -F -A1 '/pbs.twimg.com/profile_images' | tail -1
That puts every digit string on a line by itself, greps for your string, and prints the line after it.
It's quite the hack; the right way to parse HTML is with an HTML parser. But it might get the job done for controlled inputs where you just need one string.
A slightly neater approach would use awk, definitely worth learning if you deal much with such things.
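For what the awk route might look like, here is a minimal sketch using the standard `match()` function with `RSTART` and `RLENGTH`. The helper name `extract_id_awk` and the `prefix` variable are invented for this example, and it assumes the ID is the run of digits immediately after `profile_images/`:

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): print the digits that
# follow "profile_images/" on the first line where they occur, then stop.
extract_id_awk() {
    awk -v prefix='profile_images/' '
        match($0, prefix "[0-9]+") {
            # RSTART and RLENGTH describe the whole match; skip the prefix.
            print substr($0, RSTART + length(prefix), RLENGTH - length(prefix))
            exit
        }
    '
}

# Sample line shaped like the fragment in the question.
sample='"profile_image_url":"http://pbs.twimg.com/profile_images/726618076633030656/wwYbLwbs_normal.jpg"'
printf '%s\n' "$sample" | extract_id_awk
```

Unlike the sed/grep/tail pipeline, this does not have to split the input into extra lines first; it is still pattern matching on raw text rather than real HTML parsing.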
answered May 25 '16 at 2:35
James K. Lowden