Find only GUIDs in file - Bash

I have a file that might contain GUIDs (in their canonical textual representation).

I want to perform an action for each GUID in the file, which may contain any number of them.

I already have the file ready for reading. How do I pick out the GUIDs?

I know I need to use while read FILENAME.

An example of my file:

GUIDs
--------------------------------------
cf6e328c-c918-4d2f-80d3-71ecaf09bf7b
91d523b0-4926-456e-a9d2-ade713f5b07f
(2 rows)
// THERE IS AN EMPTY LINE HERE AFTER NUMBER OF ROWS

edited Jan 15 at 8:04

Stéphane Chazelas

303k57570926

asked Jan 15 at 7:41

MathEnthusiast

233

Post your sample file.

– Tuyen Pham
Jan 15 at 7:44

You're looking for any digit(s) from 0 to 10k, in any format? Or what exactly?

– Xen2050
Jan 15 at 7:46

I added a file as an example.

– MathEnthusiast
Jan 15 at 7:47

What's the action you want to perform? It alters the possible solution

– roaima
Jan 15 at 7:49

I need to run a command and then wait 5 seconds

– MathEnthusiast
Jan 15 at 7:50

bash shell-script scripting wildcards

2 Answers
With the GNU implementation of grep (or compatible):

<your-file grep -Ewo '[[:xdigit:]]{8}(-[[:xdigit:]]{4}){3}-[[:xdigit:]]{12}' |
  while IFS= read -r guid; do
    your-action "$guid"
    sleep 5
  done

That finds the GUIDs wherever they appear in the input, provided they are neither preceded nor followed by word characters.

-o is a GNU grep extension that prints only the non-empty matches of the regular expression, one per line.

-w is another non-standard extension, coming I believe from SysV, that matches whole words only. A match counts only if it starts at a transition from a non-word character to a word character and ends at a transition from a word character to a non-word character (word characters being alphanumerics and underscore). That guards against matching inside things like:


aaaaaaaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaaaaaaaaaaa
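As a rough illustration of what -w changes, the sketch below runs the same pattern with and without it on the over-long string above, and on the two-GUIDs-joined-by-a-hyphen input raised in the comments. It assumes GNU grep (or another grep that supports -o and -w); the variable names are made up for the example.

```sh
#!/bin/sh
# Assumes a grep that supports -o and -w (GNU grep or compatible).
re='[[:xdigit:]]{8}(-[[:xdigit:]]{4}){3}-[[:xdigit:]]{12}'
too_long='aaaaaaaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaaaaaaaaaaa'
joined='cf6e328c-c918-4d2f-80d3-71ecaf09bf7b-91d523b0-4926-456e-a9d2-ade713f5b07f'

# Without -w, grep -o extracts a GUID-shaped substring from the middle.
without_w=$(printf '%s\n' "$too_long" | grep -Eo "$re")

# With -w, the candidate is surrounded by word characters, so nothing
# is printed (grep exits non-zero, hence the "|| true").
with_w=$(printf '%s\n' "$too_long" | grep -Ewo "$re" || true)

# A hyphen is not a word character, so -w alone still accepts both
# halves of two GUIDs joined by a hyphen.
from_joined=$(printf '%s\n' "$joined" | grep -Ewo "$re")

printf 'without -w: %s\n' "$without_w"
printf 'with -w:    %s\n' "${with_w:-(no match)}"
printf 'joined input yields:\n%s\n' "$from_joined"
```

So -w protects against extra hex digits on either side, but not against a longer hyphen-separated string; rejecting that would need an additional check on the surrounding characters.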

The rest is standard POSIX syntax. Note that [[:xdigit:]] matches on ABCDEF as well. You can replace it with [0123456789abcdef] if you want to match only lower case GUIDs.
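For anyone who wants to try the whole pipeline end to end, here is a minimal sketch against a copy of the sample file from the question. your_action and the DELAY variable are hypothetical stand-ins (the question asks for a real command followed by a 5 second wait), and GNU grep or a compatible implementation is assumed.

```sh
#!/bin/sh
# Sketch only: your_action is a hypothetical stand-in for the real command.
your_action() {
    printf 'processing %s\n' "$1"
}

# Recreate the sample file from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
GUIDs
--------------------------------------
cf6e328c-c918-4d2f-80d3-71ecaf09bf7b
91d523b0-4926-456e-a9d2-ade713f5b07f
(2 rows)

EOF

re='[[:xdigit:]]{8}(-[[:xdigit:]]{4}){3}-[[:xdigit:]]{12}'

# DELAY is a hypothetical knob: the question wants 5 seconds between
# actions; it defaults to 0 here so the demonstration finishes at once.
handled=$(
    grep -Ewo "$re" "$sample" |
        while IFS= read -r guid; do
            your_action "$guid"
            sleep "${DELAY:-0}"
        done
)

rm -f "$sample"
printf '%s\n' "$handled"
```

Running it as `DELAY=5 sh script.sh` would restore the pause from the question. The header, the rule of hyphens, the row count and the empty line never reach the loop, because grep -o only emits the matched GUIDs.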

edited Jan 15 at 10:45

answered Jan 15 at 7:49

Stéphane Chazelas

303k57570926

Can you please explain? What is that "<" at the beginning? Also, what are GNU tools? Can we assume my file name is GUIDS.TXT?

– MathEnthusiast
Jan 15 at 7:51

Also, what are GNU tools?

– MathEnthusiast
Jan 15 at 7:53

@MathEnthusiast, see edit. The GNU project is an effort by the Free Software Foundation to provide a FLOSS reimplementation of Unix. Some people confuse it with Linux, as GNU systems generally use Linux as their kernel. They have written extended versions of the Unix utilities (like grep here) which support extensions like that -o and \< (\< was in SysV grep before GNU's). GNU utilities are now more common than the original versions, and many other non-GNU implementations have copied some of the GNU extensions. In particular, -o is found in many other implementations.

– Stéphane Chazelas
Jan 15 at 8:01

@StéphaneChazelas, how do you guard against matching cf6e328c-c918-4d2f-80d3-71ecaf09bf7b-91d523b0-4926-456e-a9d2-ade713f5b07f? (i.e. some non-guid thing that looks like two guids joined by a hyphen)

– Noach
Jan 15 at 9:58

@StéphaneChazelas: What edge-case are you guarding for with the IFS= read -r vs. a simple read?

– Noach
Jan 15 at 10:01


While I love Regular Expressions, I prefer to avoid over-specifying.
For this particular data set (known format, one GUID per line, plus a header and footer), I'd just strip out the header and footer lines:

cat guids.txt | egrep -v 'GUIDs|--|rows|^$' |
  while read guid; do
    some_command "$guid"
    sleep 5
  done

Alternatively, I'd grep out the lines I want, but also keep the regexp as simple as possible for the current data set:

egrep '^[0-9a-f-]{36}$'
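To make the trade-off of this simpler pattern concrete, the sketch below applies it to input modelled on the question and then to a line of exactly 36 hyphens. It uses grep -E, which is equivalent to egrep; the rule under the header is generated as 38 hyphens, which is an assumption about its width in the real file.

```sh
#!/bin/sh
# Sketch: what the loose line filter keeps, and what else it would accept.
loose='^[0-9a-f-]{36}$'

# Sample input modelled on the question; the rule under the header is
# generated as 38 hyphens, which is an assumption about its width.
input=$(
    printf 'GUIDs\n'
    printf '%38s\n' '' | tr ' ' '-'
    printf '%s\n' 'cf6e328c-c918-4d2f-80d3-71ecaf09bf7b' \
                  '91d523b0-4926-456e-a9d2-ade713f5b07f' \
                  '(2 rows)' ''
)

# Lines kept by the loose filter: only the two GUID lines.
kept=$(printf '%s\n' "$input" | grep -E "$loose")

# A rule of exactly 36 hyphens would also pass, since the pattern only
# checks length and character set, not the 8-4-4-4-12 layout.
rule36=$(printf '%36s\n' '' | tr ' ' '-')
accepted=$(printf '%s\n' "$rule36" | grep -Ec "$loose")

printf 'kept:\n%s\n' "$kept"
printf 'lines of 36 hyphens accepted: %s\n' "$accepted"
```

For a file whose format is known and stable that looseness is arguably acceptable; if the separator line could ever be 36 characters wide, the stricter 8-4-4-4-12 pattern is the safer choice.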

edited Jan 23 at 8:45

answered Jan 15 at 9:56

Noach

1904
