Command to find and combine files matching a complex name pattern
My Linux directory contains a dump of files and they look like:
EDW_Infile_ABC_Daily_Activity_20190204.csv
EDW_Infile_ABC_Daily_Activity.zip
EDW_Infile_PQRInc_Daily_Activity_20190204.csv
EDW_Infile_PQRInc_Daily_Activity_zip
EDW_Infile_ABC_Daily_Payment_20190204.csv
EDW_Infile_PQRInc_Daily_Payment_20190204.csv
EDW_Infile_ABC_Daily_Status_20190204.csv
EDW_Infile_PQRInc_Daily_Status_20190204.csv
These files follow a few common name patterns, such as:
EDW_Infile_<3 to 8 bytes company name>_Daily_Activity_YYYYMMDD.csv
EDW_Infile_<3 to 8 bytes company name>_Daily_Payment_YYYYMMDD.csv
EDW_Infile_<3 to 8 bytes company name>_Daily_Status_YYYYMMDD.csv
How can I:
1) Find all files, for all customers and all dates, that follow the pattern EDW_Infile_<3 to 8 bytes, any name>_Daily_Activity_<any date>.csv?
2) Each file contains a header. How can I combine all of them into one file with only one header?
linux shell-script shell find
asked Feb 11 at 19:07
Nik
2 Answers
I pushed my zsh knowledge a bit in order to answer more specifically, in case you aren't in control of the filenames, have files named like EDW_Infile_some uninteresting stuff here_Daily_Activity_junk here.csv, and so don't want to use a * wildcard.
To gather the list of filenames ...
which follow the pattern EDW_Infile_3 to 8 bytes any name_Daily_Activity_Any Date.csv
I would set up this extended_glob pattern in zsh (don't type the $; that's the shell prompt):
$ set -o extended_glob
$ files=(EDW_Infile_?(#c3,8)_Daily_Activity_[[:digit:]](#c8).csv)
The pattern, apart from the plain text, is:
? -- any (single) character
(#c3,8) -- require between three and eight characters, inclusive
[[:digit:]] -- require a digit
(#c8) -- require eight of them
See the list with:
$ print -l $files
EDW_Infile_ABC_Daily_Activity_20190204.csv
EDW_Infile_PQRInc_Daily_Activity_20190204.csv
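For shells without zsh's extended globs, the same filter can be approximated with a broad glob followed by a regular-expression check. The following is a minimal POSIX sh sketch and not part of the answer above; the function name list_activity_files is hypothetical, and it assumes the filenames contain no newlines.

```shell
# list_activity_files: print the names in the current directory that match
# EDW_Infile_<3 to 8 chars>_Daily_Activity_<8 digits>.csv, one per line.
list_activity_files() {
    for f in EDW_Infile_*_Daily_Activity_*.csv; do
        # If the glob matched nothing, "$f" is the literal pattern: skip it.
        [ -e "$f" ] || continue
        # The broad glob over-matches; the regex enforces the lengths.
        if printf '%s\n' "$f" |
            grep -Eq '^EDW_Infile_.{3,8}_Daily_Activity_[0-9]{8}\.csv$'; then
            printf '%s\n' "$f"
        fi
    done
}
```

Calling list_activity_files in the directory that holds the dump prints the matching names, which can then be passed on to whatever does the combining.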
To then ...
combine all of them into one file and have only one header
{ head -n 1 "$files[1]"; for f in $files; do sed 1d "$f"; done; } > output.csv
This groups two commands and redirects their output to output.csv. The first command, head, takes the first line from the first file in the array; the second command then loops through all of the files and deletes the first line from each (printing the remainder to stdout by default).
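The same header-once idea can be wrapped in a small reusable function. This is a sketch in POSIX sh rather than zsh; the name combine_csv and its argument order (output file first, then inputs) are assumptions made for illustration. It also assumes that every input shares the same header and ends with a newline, and that the output file is not itself one of the inputs.

```shell
# combine_csv OUT FILE...: write the header of the first FILE once,
# followed by every FILE with its own first line removed.
combine_csv() {
    out=$1
    shift
    # Nothing to combine: return an error without creating OUT.
    [ "$#" -gt 0 ] || return 1
    {
        head -n 1 "$1"
        for f in "$@"; do
            sed 1d "$f"
        done
    } > "$out"
}
```

It might be used as combine_csv output.csv followed by the list of matching files, with the output name chosen so that it does not match the input pattern.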
edited Feb 12 at 14:17
answered Feb 11 at 20:57
Jeff Schaller
You might want something like this:
# collect the per-company prefixes, such as "EDW_Infile_ABC_Daily_Activity"
declare -A prefixes
for f in EDW_Infile_*_Daily_Activity_*.csv; do
    p=${f%_*.csv}
    prefixes[$p]=1
done
# for each prefix, merge its files under a single header, then zip the result
for p in "${!prefixes[@]}"; do
    awk 'NR==1 {print} FNR==1 {next} {print}' "$p"_*.csv > "$p"_all.csv
    zip "$p".zip "$p"_all.csv
    rm "$p"_all.csv
done
This requires bash version 4 or later for associative arrays. Otherwise, we can work with positional parameters.
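One way the positional-parameter variant might look is sketched below in POSIX sh. This is an illustration, not the answerer's code: the function name unique_prefixes is hypothetical, and the list "$@" stands in for the associative array, with a linear scan to skip prefixes already seen.

```shell
# unique_prefixes: print each distinct prefix once, where a file is named
# <prefix>_<date>.csv, using "$@" as the list instead of an array.
unique_prefixes() {
    set --                      # start with an empty positional list
    for f in EDW_Infile_*_Daily_Activity_*.csv; do
        [ -e "$f" ] || continue # the glob matched nothing
        p=${f%_*.csv}           # strip the trailing _<date>.csv
        seen=0
        for q in "$@"; do
            [ "$q" = "$p" ] && seen=1
        done
        # Append the prefix only if it is not already in the list.
        [ "$seen" -eq 1 ] || set -- "$@" "$p"
    done
    [ "$#" -eq 0 ] || printf '%s\n' "$@"
}
```

Because set -- inside a function only changes that function's own parameters, the caller's arguments are left alone. The printed prefixes can then drive the same awk, zip, and rm steps.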
answered Feb 11 at 20:22
glenn jackman