Extract unique string from each line containing string

Here is an example block of text from a file:




Now is the time for all blah:1; to come to the aid
Now is the time for all blah:1; to come to the aid
Now is the time for all blah:1; to come to the aid
Now is the time for all blah:10; to come to the aid
Go to your happy place blah:100; to come to the aid
Go to your happy place blah:4321; to come to the aid
Go to your happy place blah:4321; to come to the aid
Now is the time for all blah:4321; to come to the aid
Now is the time for all blah:9876; to come to the aid
Now is the time for all blah:108636; to come to the aid
Now is the time for all blah:1194996; to come to the aid



Question:
How would I extract all unique numbers from lines that have "is the" in them?



I've tried using grep -o -P -u '(?<=blah:).*(?=;)', but it doesn't like the semicolon.
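The lookbehind idea itself can work, with two adjustments. The .* is greedy, so on a line with more than one ";" it stretches to the last one; matching only digits with [0-9]+ avoids that. Also, -u is not a "unique" option in GNU grep, so duplicates still have to be removed with sort -u. A minimal sketch, assuming GNU grep built with PCRE support (-P); the helper name blah_ids and the temporary sample file are hypothetical, and one sample line is invented to show an extra number:colon pair and a second ";":

```shell
#!/bin/sh
# Sketch only: assumes GNU grep built with PCRE support (-P).
# blah_ids is a hypothetical helper name, not part of any tool.
blah_ids() {
    # 1. keep only the lines that contain "is the"
    # 2. print just the digits between "blah:" and ";"; the lookbehind and
    #    lookahead are zero-width, and [0-9]+ cannot run past a ";"
    # 3. remove duplicates (LC_ALL=C gives a predictable byte-wise order)
    grep 'is the' "$1" | grep -oP '(?<=blah:)[0-9]+(?=;)' | LC_ALL=C sort -u
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
# Lines from the question, plus one invented line (the "12:34" one) with an
# extra number:colon pair and a second ";".
cat > "$sample" <<'EOF'
Now is the time for all blah:1; to come to the aid
Now is the time for all blah:1; to come to the aid
Now is the time for all blah:10; to come to the aid
Go to your happy place blah:100; to come to the aid
Now is the time for all blah:4321; to come to the aid
Now is the time for all blah:9876; to come to the aid
Now is the time 12:34 for all blah:55; to come; to the aid
EOF

blah_ids "$sample"
```

The "Go to your happy place" line is dropped by the first grep, and 12:34 is ignored because it is not preceded by "blah:".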










text-processing awk sed grep






edited Sep 26 '17 at 18:44 by Jeff Schaller










asked Sep 26 '17 at 17:41 by blake







  • OK, a little clarification: the 'print $2' solution won't work, as the text preceding "blah:" varies by up to approx. 30 characters and includes many special characters that get interpreted as separate fields
    – blake
    Sep 26 '17 at 18:16










  • I need to pull out only the numbers immediately following the "blah:" string because each line has multiple numbers followed by a colon
    – blake
    Sep 26 '17 at 18:29






  • You should put all your requirements in the question (you can edit your question)
    – glenn jackman
    Sep 26 '17 at 19:47










  • If you're happy with one or several of the answers, upvote them. If one is solving your issue, accepting it would be the best way of saying "Thank You!" :-)
    – Kusalananda
    Sep 26 '17 at 19:51













4 Answers
You're looking for the \K directive (Perl-compatible regex, hence -P) to forget about the stuff you just matched.



grep -oP 'is the.*?blah:\K\d+' file


Then pipe the output through sort -u.
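A quick way to see \K at work, assuming GNU grep built with PCRE support (-P); the temporary file holds a few lines from the question's sample and is only for illustration:

```shell
#!/bin/sh
# Sketch only: assumes GNU grep built with PCRE support (-P).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
Now is the time for all blah:1; to come to the aid
Now is the time for all blah:10; to come to the aid
Go to your happy place blah:4321; to come to the aid
Now is the time for all blah:108636; to come to the aid
EOF

# "is the.*?blah:" has to match, but \K drops it from the reported match,
# so -o prints only the digits (\d+) that follow "blah:".
ids=$(grep -oP 'is the.*?blah:\K\d+' "$sample" | LC_ALL=C sort -u)
printf '%s\n' "$ids"
```

The non-greedy .*? stops at the first "blah:" after "is the", and \d+ only consumes digits, so other number:colon pairs on the line are not picked up.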







edited Sep 26 '17 at 19:47
answered Sep 26 '17 at 19:12 by glenn jackman











  • grumble grumble, hidden requirements, mutter, mutter
    – glenn jackman
    Sep 26 '17 at 19:47






  • The story of every software developer...
    – Kusalananda
    Sep 26 '17 at 19:48
















Using sed:



$ sed -n '/is the/s/^.*blah:\([0-9]*\);.*$/\1/p' file | sort -u
1
10
108636
1194996
4321
9876


The substitution replaces the contents of every line containing the string "is the" with the number between "blah:" and ";". Because of -n and the p flag, only lines where the substitution succeeds are printed; all other lines are ignored.
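Note that sort -u compares lines as text, which is why 4321 and 9876 are listed after 1194996 above. If numeric order is wanted, sort -nu gives it. The sketch below makes that change and also uses sed -E, whose extended regular expressions let the group be written as ([0-9]+) without backslashes; GNU and BSD sed accept -E, but treat support in other seds as an assumption. The temporary sample file is only for illustration:

```shell
#!/bin/sh
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
Now is the time for all blah:1; to come to the aid
Now is the time for all blah:10; to come to the aid
Go to your happy place blah:100; to come to the aid
Now is the time for all blah:4321; to come to the aid
Now is the time for all blah:9876; to come to the aid
Now is the time for all blah:108636; to come to the aid
Now is the time for all blah:1194996; to come to the aid
EOF

# -n suppresses default output; the p flag prints a line only when the
# substitution succeeded. With -E the group needs no backslashes.
# sort -nu orders numerically and drops duplicates.
ids=$(sed -nE '/is the/s/^.*blah:([0-9]+);.*$/\1/p' "$sample" | sort -nu)
printf '%s\n' "$ids"
```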







edited Sep 26 '17 at 18:34
answered Sep 26 '17 at 17:55 by Kusalananda











  • This works great; however, I ran into a problem where some lines have multiple numbers followed by a colon, so I need to pull out ONLY the numbers immediately following the "blah:" string
    – blake
    Sep 26 '17 at 18:27










  • @blake The sed solution would be easiest to modify: sed -n '/is the/s/^.*blah:\([0-9]*\);.*$/\1/p' file | sort -u
    – Kusalananda
    Sep 26 '17 at 18:33










  • Perfect! Thank you Kusalananda - that worked perfectly
    – blake
    Sep 26 '17 at 18:43
















cat file | grep "is the" | awk -F':' '{print $2}' | awk -F';' '{print $1}' | sort -u






answered Sep 26 '17 at 18:01 by Emilio Galarraga











  • You can combine cat|grep|awk|awk into one command: gawk -F '[:;]' '/is the/ {print $2}' file
    – glenn jackman
    Sep 26 '17 at 19:14











  • And then you can get the unique ids by using an associative array: gawk -F '[:;]' '/is the/ {id[$2]} END {for (i in id) print i}' file
    – glenn jackman
    Sep 26 '17 at 19:14
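Splitting on [:;] picks the wrong field when the text before "blah:" contains its own colons, which is the situation described in the comments on the question. A sketch that avoids field splitting altogether, using match() and substr() from POSIX awk (so gawk should not be required) together with the same associative-array idea to print each id once. The sample file is illustrative, and its third line is invented to include an extra number:colon pair:

```shell
#!/bin/sh
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
# The third line is invented: it has an extra number:colon pair.
cat > "$sample" <<'EOF'
Now is the time for all blah:1; to come to the aid
Now is the time for all blah:1; to come to the aid
Now is the time 12:34 for all blah:55; to come to the aid
Go to your happy place blah:100; to come to the aid
Now is the time for all blah:4321; to come to the aid
EOF

# match() sets RSTART and RLENGTH for the "blah:<digits>;" span, so other
# colons and semicolons on the line do not matter. seen[] prints each id
# once, in the order first seen, so no sort is needed.
ids=$(awk '/is the/ && match($0, /blah:[0-9]+;/) {
    id = substr($0, RSTART + 5, RLENGTH - 6)   # skip "blah:" (5 chars) and the ";"
    if (!seen[id]++) print id
}' "$sample")
printf '%s\n' "$ids"
```

Unlike for (i in id), whose iteration order awk leaves unspecified, the !seen[id]++ test keeps the ids in the order they first appear.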

















Try this:



grep "is the" file | sed 's/.*blah://;s/;.*//' | sort -u


Explanation:




  1. grep selects all lines containing "is the" (anywhere in the line)


  2. sed removes everything up to and including "blah:" and everything from ";" onward (for readability, you could write it as sed -e 's/.*blah://' -e 's/;.*//' instead)


  3. sort -u sorts the lines and drops duplicates
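The same pipeline with the -e form from step 2, run on a few lines taken from the question's sample (the temporary file is only for the demonstration, and LC_ALL=C just pins the sort order):

```shell
#!/bin/sh
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
Now is the time for all blah:1; to come to the aid
Now is the time for all blah:10; to come to the aid
Go to your happy place blah:100; to come to the aid
Now is the time for all blah:9876; to come to the aid
Now is the time for all blah:10; to come to the aid
EOF

# s/.*blah://  deletes everything up to and including the last "blah:"
# s/;.*//      deletes everything from the first remaining ";" onward
ids=$(grep "is the" "$sample" | sed -e 's/.*blah://' -e 's/;.*//' | LC_ALL=C sort -u)
printf '%s\n' "$ids"
```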






edited Sep 26 '17 at 18:48
answered Sep 26 '17 at 17:56 by Egor Vasilyev











  • Thanks Egor - I should have been clearer. In some of the lines there are multiple numbers followed by a colon. I need to pull out ONLY the numbers following the "blah:" string
    – blake
    Sep 26 '17 at 18:32










  • @blake I corrected the answer according to your comments
    – Egor Vasilyev
    Sep 26 '17 at 18:50















