Extract unique string from each line containing string
Here is an example block of text from a file:
Now is the time for all blah:1; to come to the aid
Now is the time for all blah:1; to come to the aid
Now is the time for all blah:1; to come to the aid
Now is the time for all blah:10; to come to the aid
Go to your happy place blah:100; to come to the aid
Go to your happy place blah:4321; to come to the aid
Go to your happy place blah:4321; to come to the aid
Now is the time for all blah:4321; to come to the aid
Now is the time for all blah:9876; to come to the aid
Now is the time for all blah:108636; to come to the aid
Now is the time for all blah:1194996; to come to the aid
Question:
How would I extract all unique numbers from lines that have "is the" in them?
I've tried using grep -o -P -u '(?<=blah:).*(?=;)' but it doesn't like the semicolon.
text-processing awk sed grep
asked Sep 26 '17 at 17:41 by blake, edited Sep 26 '17 at 18:44 by Jeff Schaller
Ok, a little clarification: the 'print $2' solution won't work, as the text preceding "blah:" varies up to approx. 30 characters and includes many special characters that get interpreted as separate fields. – blake, Sep 26 '17 at 18:16
I need to pull out only the numbers immediately following the "blah:" string, because each line has multiple numbers followed by a colon. – blake, Sep 26 '17 at 18:29
You should put all your requirements in the question (you can edit your question). – glenn jackman, Sep 26 '17 at 19:47
If you're happy with one or several of the answers, upvote them. If one is solving your issue, accepting it would be the best way of saying "Thank You!" :-) – Kusalananda, Sep 26 '17 at 19:51
4 Answers
You're looking for the \K directive to forget about the stuff you just matched.
    grep -oP 'is the.*?blah:\K\d+'
Then pipe the result through sort -u.
answered Sep 26 '17 at 19:12 by glenn jackman, edited Sep 26 '17 at 19:47
grumble grumble, hidden requirements, mutter, mutter – glenn jackman, Sep 26 '17 at 19:47
The story of every software developer... – Kusalananda, Sep 26 '17 at 19:48
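A minimal sketch of the \K approach from the answer above, assuming GNU grep built with PCRE support (the -P option). The function names and the sample lines are made up for illustration; the id:7: and ref:42: fragments mimic the "multiple numbers followed by a colon" case mentioned in the comments.

```shell
# Hypothetical sample input (not the asker's real file).
sample_lines() {
  cat <<'EOF'
Now is the time id:7: for all blah:1; to come to the aid
Now is the time for all blah:1; to come to the aid
Go to your happy place blah:100; to come to the aid
Now is the time ref:42: for all blah:4321; to come to the aid
Now is the time for all blah:10; to come to the aid
EOF
}

# \K drops everything matched so far ("is the ... blah:") from the reported
# match, so -o prints only the digits matched by \d+.
# sort -u then removes duplicate numbers (lexical order).
extract_with_grep() {
  grep -oP 'is the.*?blah:\K\d+' | sort -u
}

sample_lines | extract_with_grep
```

With this input the pipeline should print 1, 10 and 4321, one per line; the "happy place" line is skipped because it does not contain "is the".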
Using sed:
    $ sed -n '/is the/s/^.*blah:\([0-9]*\);.*$/\1/p' file | sort -u
    1
    10
    108636
    1194996
    4321
    9876
The substitution replaces the contents of all lines containing the string "is the" with the number between "blah:" and ";". Lines not containing the string are ignored.
answered Sep 26 '17 at 17:55 by Kusalananda, edited Sep 26 '17 at 18:34
This works great; however, I ran into a problem where some lines have multiple numbers followed by a colon, so I need to pull out ONLY the numbers immediately following the "blah:" string. – blake, Sep 26 '17 at 18:27
@blake The sed solution would be easiest to modify: sed -n '/is the/s/^.*blah:\([0-9]*\);.*$/\1/p' file | sort -u – Kusalananda, Sep 26 '17 at 18:33
Perfect! Thank you Kusalananda, that worked perfectly. – blake, Sep 26 '17 at 18:43
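A small sketch of why the sed substitution above copes with other "number:" fragments on the same line: the greedy ^.*blah: consumes everything up to the last "blah:", and only the captured digits are written back. The sample lines and function names are hypothetical, and the second variant assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed do).

```shell
# Hypothetical sample input with extra "number:" fragments before blah:.
sample_lines() {
  cat <<'EOF'
Now is the time id:7: for all blah:1; to come to the aid
Now is the time for all blah:1; to come to the aid
Go to your happy place blah:100; to come to the aid
Now is the time ref:42: for all blah:4321; to come to the aid
Now is the time for all blah:10; to come to the aid
EOF
}

# Basic regular expressions: \( \) capture the digits, \1 writes them back.
# -n plus the p flag prints only lines where the substitution happened.
extract_with_sed() {
  sed -n '/is the/s/^.*blah:\([0-9]*\);.*$/\1/p' | sort -u
}

# Same idea with extended regular expressions (assumes sed supports -E).
extract_with_sed_ere() {
  sed -nE '/is the/s/^.*blah:([0-9]+);.*$/\1/p' | sort -u
}

sample_lines | extract_with_sed
sample_lines | extract_with_sed_ere
```

Both calls should print the same three numbers (1, 10, 4321); the values 7 and 42 never appear because they are not directly after "blah:".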
    cat file | grep "is the" | awk -F':' '{print $2}' | awk -F';' '{print $1}' | sort -u
answered Sep 26 '17 at 18:01 by Emilio Galarraga
You can combine cat|grep|awk|awk with: gawk -F '[:;]' '/is the/ {print $2}' file – glenn jackman, Sep 26 '17 at 19:14
And then you can get the unique ids by using an associative array: gawk -F '[:;]' '/is the/ {id[$2]} END {for (i in id) print i}' file – glenn jackman, Sep 26 '17 at 19:14
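The associative-array idea from the comments can be combined with a match on "blah:" itself, so that field splitting on ":" and ";" is not needed when lines contain other colons. This is only a sketch under that assumption: it uses the standard awk match() function with RSTART and RLENGTH, and the function name, array name and sample lines are hypothetical.

```shell
# Hypothetical sample input with extra "number:" fragments before blah:.
sample_lines() {
  cat <<'EOF'
Now is the time id:7: for all blah:1; to come to the aid
Now is the time for all blah:1; to come to the aid
Go to your happy place blah:100; to come to the aid
Now is the time ref:42: for all blah:4321; to come to the aid
Now is the time for all blah:10; to come to the aid
EOF
}

extract_with_awk() {
  awk '
    /is the/ && match($0, /blah:[0-9]+;/) {
      # RSTART is the position of the "b" in "blah:". Skip those 5
      # characters and drop the trailing ";" to keep only the digits.
      num = substr($0, RSTART + 5, RLENGTH - 6)
      # seen[] is the associative array: print a number the first time only.
      if (!seen[num]++) print num
    }
  '
}

sample_lines | extract_with_awk
```

Unlike the sort -u pipelines, this prints each number once in the order it first appears (here 1, 4321, 10) and needs no separate sort step.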
Try this:
    grep "is the" file | sed 's/.*blah://;s/;.*//' | sort -u
Explanation:
grep gets all lines with "is the" (in any part of the line)
sed removes everything up to and including "blah:" and everything from ";" onward (you could use sed -e 's/.*blah://' -e 's/;.*//' instead, which may be easier to read)
sort -u sorts the lines and removes duplicates
answered Sep 26 '17 at 17:56 by Egor Vasilyev, edited Sep 26 '17 at 18:48
Thanks Egor, I should have been more clear. In some of the lines there are multiple numbers followed by a colon. I need to pull out ONLY the numbers following the "blah:" string. – blake, Sep 26 '17 at 18:32
@blake I corrected the answer according to your comments. – Egor Vasilyev, Sep 26 '17 at 18:50
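One detail of the sort step in these pipelines: sort -u compares lines as text, so 108636 sorts before 9876. If numeric order is wanted, sort -n -u is one option. The sketch below reuses the grep and sed pipeline from the answer above; the sample lines and the function name extract_raw are made up for illustration.

```shell
# Hypothetical sample input.
sample_lines() {
  cat <<'EOF'
Now is the time for all blah:9876; to come to the aid
Now is the time for all blah:10; to come to the aid
Now is the time for all blah:108636; to come to the aid
Now is the time for all blah:10; to come to the aid
Go to your happy place blah:100; to come to the aid
EOF
}

# Keep lines containing "is the", then strip everything up to and including
# "blah:" and everything from the first remaining ";" onward.
extract_raw() {
  grep "is the" | sed 's/.*blah://;s/;.*//'
}

echo "lexical order:"
sample_lines | extract_raw | sort -u      # text comparison

echo "numeric order:"
sample_lines | extract_raw | sort -n -u   # numeric comparison
```

The first listing should come out as 10, 108636, 9876 and the second as 10, 9876, 108636.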