Command to find and combine files matching a complex name pattern

My Linux directory contains a dump of files that look like this:

EDW_Infile_ABC_Daily_Activity_20190204.csv
EDW_Infile_ABC_Daily_Activity.zip
EDW_Infile_PQRInc_Daily_Activity_20190204.csv
EDW_Infile_PQRInc_Daily_Activity_zip
EDW_Infile_ABC_Daily_Payment_20190204.csv
EDW_Infile_PQRInc_Daily_Payment_20190204.csv
EDW_Infile_ABC_Daily_Status_20190204.csv
EDW_Infile_PQRInc_Daily_Status_20190204.csv

These files follow a few common name patterns, such as:

EDW_Infile_<3 to 8 bytes company name>_Daily_Activity_<YYYYMMDD>.csv
EDW_Infile_<3 to 8 bytes company name>_Daily_Payment_<YYYYMMDD>.csv
EDW_Infile_<3 to 8 bytes company name>_Daily_Status_<YYYYMMDD>.csv

How can I:

1) Find all files, for all customers and all dates, that follow the pattern EDW_Infile_<3 to 8 bytes, any name>_Daily_Activity_<any date>.csv?

2) Combine all of them into one file that has only one header? Each file contains its own header line.

asked Feb 11 at 19:07

Nik

Tags: linux shell-script shell find

2 Answers

I pushed my zsh knowledge a bit to give a more specific answer, in case you don't control the filenames and have files named like EDW_Infile_some uninteresting stuff here_Daily_Activity_junk here.csv, which a plain * wildcard would also match.

To gather the list of filenames ...

which follow the pattern EDW_Infile_<3 to 8 bytes, any name>_Daily_Activity_<any date>.csv

I would set up this extended_glob pattern in zsh (don't type the $ -- that's the shell prompt):

$ set -o extended_glob
$ files=(EDW_Infile_?(#c3,8)_Daily_Activity_[[:digit:]](#c8).csv)

The pattern, apart from the plain text, is:

? -- any (single) character

(#c3,8) -- require between three and eight characters, inclusive

[[:digit:]] -- require a digit

(#c8) -- require eight of them

See the list with:

$ print -l $files
EDW_Infile_ABC_Daily_Activity_20190204.csv
EDW_Infile_PQRInc_Daily_Activity_20190204.csv
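If zsh isn't available, a roughly equivalent filter can be sketched in bash (3.2 or later, for `[[ =~ ]]`): glob loosely, then keep only the names that match an extended regular expression with the same 3-to-8 and 8-digit constraints. The function name `list_activity_files` is hypothetical, not from the answer. Note that in a multibyte locale `.` typically matches characters rather than bytes, just as zsh's `?` does.

```shell
# list_activity_files (hypothetical helper): print the names in the current
# directory that match EDW_Infile_<3-8 chars>_Daily_Activity_<8 digits>.csv
list_activity_files() {
  # ERE equivalent of the zsh pattern ?(#c3,8) ... [[:digit:]](#c8)
  local re='^EDW_Infile_.{3,8}_Daily_Activity_[0-9]{8}\.csv$'
  local f
  # the loose glob narrows the candidates; the regex enforces the lengths
  for f in EDW_Infile_*_Daily_Activity_*.csv; do
    # -f also skips the literal pattern left behind when nothing matches
    if [[ -f $f && $f =~ $re ]]; then
      printf '%s\n' "$f"
    fi
  done
}
```

Run `list_activity_files` in the directory; the regex is kept in a variable and left unquoted on the right of `=~` so bash treats it as a regex rather than a literal string.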

To then ...

combine all of them into one file and have only one header

 { head -n 1 "$files[1]"; for f in $files; do sed 1d "$f"; done; } > output.csv

This groups two commands and redirects their output to output.csv. The first command, head, takes the first line from the first file in the array; the second command then loops through all of the files and deletes the first line (default-printing the remainder to stdout).
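The same grouping idea carries over to bash. Below it is wrapped in a hypothetical function `combine_with_one_header`, and `tail -n +2` (print from line 2 onward) stands in for `sed 1d`; both drop only the first line. Like the original, it assumes every file ends with a newline, otherwise the last line of one file runs into the next file's data.

```shell
# combine_with_one_header (hypothetical helper): print the header of the
# first file, then every file's contents without its first line
combine_with_one_header() {
  (( $# > 0 )) || return 1   # no files given, nothing to combine
  head -n 1 "$1"             # the single header line
  local f
  for f in "$@"; do
    tail -n +2 "$f"          # data lines only (same as sed 1d)
  done
}
```

With a bash array of names, usage would be `combine_with_one_header "${files[@]}" > output.csv`.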

edited Feb 12 at 14:17

answered Feb 11 at 20:57

Jeff Schaller

You might want something like this:

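# collect all the "EDW_Infile_ABC_Daily_Activity"-style prefixes
declare -A prefix
for f in EDW_Infile_*_Daily_Activity_*.csv; do
 p=${f%_*.csv}    # strip the trailing "_YYYYMMDD.csv"
 prefix[$p]=1
done

for p in "${!prefix[@]}"; do
 # header line from the first file only, then the data lines of every file
 awk 'NR==1 {print} FNR==1 {next} {print}' "$p"_*.csv > "$p"_all.csv
 zip "$p".zip "$p"_all.csv
 rm "$p"_all.csv
done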

In bash, this requires version 4 or later for associative arrays. Otherwise, you can work with positional parameters instead.
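One way the positional-parameter fallback could look, as a sketch for bash versions without associative arrays: the hypothetical function `combine_per_prefix` keeps the unique prefixes in `"$@"` and does a linear membership check instead of an associative-array lookup. Using a function means `set --` only touches the function's own parameters, not the script's arguments. It stops after writing each `"$p"_all.csv`; the answer's `zip` and `rm` steps could be appended to the second loop unchanged.

```shell
# combine_per_prefix (hypothetical helper): bash-3-compatible sketch that
# uses "$@" as the list of unique prefixes instead of an associative array
combine_per_prefix() {
  set --                          # start with an empty prefix list
  local f p q seen
  for f in EDW_Infile_*_Daily_Activity_*.csv; do
    [ -f "$f" ] || continue       # the glob matched nothing
    p=${f%_*.csv}                 # drop the trailing "_YYYYMMDD.csv"
    seen=0
    for q in "$@"; do             # linear search replaces the array lookup
      if [ "$q" = "$p" ]; then seen=1; break; fi
    done
    if [ "$seen" -eq 0 ]; then set -- "$@" "$p"; fi
  done
  for p in "$@"; do
    # header line from the first file only, then data lines from every file
    awk 'NR==1 {print} FNR==1 {next} {print}' "$p"_*.csv > "$p"_all.csv
  done
}
```

Because the first glob also matches a leftover `*_all.csv`, run this in a directory without earlier combined files (or keep the `rm` step), otherwise the combined data would be fed back in.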

answered Feb 11 at 20:22

glenn jackman
