Delete .pdf files only if .xlsx files in directory have same filename?

I have folders with hundreds of PDF and XLS(X) files that were mass exported from legal e-discovery systems. The filenames in these exports correspond to Bates numbers, such as ABCD_00000001.pdf, ABCD_00000002.pdf, ..., ABCD_00002000.pdf. These mass exports include a blank PDF file for every single XLS(X) file, with both having the exact same filename apart from the extension. E.g., ABCD_00000005.xlsx is the xlsx file that was produced in the e-discovery system, and ABCD_00000005.pdf is an extraneous blank PDF file that was created in the mass export.
These extraneous .pdf files probably result from user error on the part of the people running these mass exports, but I don't usually have control over that side of the process. So I wanted to know if there is any relatively straightforward way to delete these extraneous .pdf files without forcing someone to go through them manually.
bash shell files directory rm
edited Aug 10 at 16:28 by Jesse_b
asked Aug 10 at 16:26 by ck_chicago
Are they .xls or .xlsx? Or could they be either? Why can't you just delete all pdf files in that directory? Are there some pdf files you would like to save?
– Jesse_b
Aug 10 at 16:28
2 Answers
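Loop over the pdf files and use parameter expansion to strip the extension:

    #!/bin/bash
    for pdf in *.pdf ; do
        # Strip the trailing .pdf to get the name shared with the spreadsheet
        basename=${pdf%.pdf}
        # Delete the pdf only when a same-named .xls or .xlsx file exists
        if [[ -f $basename.xls || -f $basename.xlsx ]] ; then
            rm "$pdf"
        fi
    done

Update: I got the logic backwards, should be fixed now. Sorry.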
edited Aug 10 at 21:28
answered Aug 10 at 16:31 by choroba
Added slightly more efficient alternative. If this folder is extensive, it could make a difference in processing time.
– Xalorous
Aug 10 at 17:00
@Xalorous: But your "alternative" removed the files my solution tried to keep.
– choroba
Aug 10 at 17:01
It's possible that I pasted the wrong one. I'll post a separate answer with the more efficient version.
– Xalorous
Aug 10 at 17:04
I agree with your decision to reject the suggested edit; new answers should be posted as new answers, not edits. But I believe that Xalorous got it right and you (choroba) got it backwards. If ABCD_0000005.xlsx and ABCD_0000005.pdf both exist, your code leaves ABCD_0000005.pdf alone. But if important.pdf exists, and there's no corresponding spreadsheet, your code deletes important.pdf.
– G-Man
Aug 10 at 17:43
Actually, I missed that his logic is inverted. Taking out the ! would fix it though. The real question is what if there's a pdf without matching xls(x). What does OP want to happen then? If they want them gone, then they need to just rm -rf *.pdf. If not, then my answer. Reading the question strictly, we're only to delete pdf files with matching xls(x) files.
– Xalorous
Aug 10 at 17:51
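The distinction raised in the comments above (only PDFs with a matching spreadsheet should go, unmatched PDFs stay) is easy to get backwards, so it may be worth previewing the result before deleting anything. The following is a minimal sketch and not part of either answer: the function name `delete_paired_pdfs` and its `--dry-run` flag are made up for illustration, and it sticks to constructs that work in bash.

```shell
#!/bin/bash
# Hypothetical helper; the name and the --dry-run flag are illustrative only.
# Removes every NAME.pdf in a directory that has a sibling NAME.xls or NAME.xlsx.
# PDFs without a matching spreadsheet are left untouched.
delete_paired_pdfs() {
    local dry_run=0
    if [ "${1:-}" = "--dry-run" ]; then
        dry_run=1
        shift
    fi
    local dir=${1:-.}
    local pdf base
    for pdf in "$dir"/*.pdf; do
        # When nothing matches, the pattern stays literal; skip it.
        [ -f "$pdf" ] || continue
        base=${pdf%.pdf}
        if [ -f "$base.xls" ] || [ -f "$base.xlsx" ]; then
            if [ "$dry_run" -eq 1 ]; then
                printf 'would remove %s\n' "$pdf"
            else
                rm -- "$pdf"
            fi
        fi
    done
}
```

Running `delete_paired_pdfs --dry-run /path/to/export` first and reading the list, then repeating the call without the flag, keeps a file such as important.pdf out of harm's way if it has no spreadsheet twin.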
Loop over the .xls(x) files and remove the matching pdf files.

    for xls in *.xls* ; do
        # Strip everything from .xls onward, then append .pdf
        /bin/rm -f "${xls%.xls*}"".pdf"
    done

If there's no matching pdf, it won't hurt anything.
answered Aug 10 at 17:04 by Xalorous
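One thing to be aware of: the pattern `*.xls*` also matches names such as report.xlsm or report.xls.bak, so their same-named PDFs would be removed too. If that is broader than intended, a variant limited to the two extensions named in the question might look like the sketch below. The function name `remove_pdfs_for_spreadsheets` is hypothetical, and the directory argument is an addition for illustration.

```shell
#!/bin/bash
# Hypothetical helper; the name and the directory argument are illustrative only.
# For each .xls or .xlsx file in the directory, remove the same-named .pdf.
remove_pdfs_for_spreadsheets() {
    local dir=${1:-.}
    local sheet
    for sheet in "$dir"/*.xls "$dir"/*.xlsx; do
        # An unmatched pattern stays literal; skip it.
        [ -f "$sheet" ] || continue
        # ${sheet%.*} drops the final extension; -f keeps rm quiet
        # when there is no matching pdf.
        rm -f -- "${sheet%.*}.pdf"
    done
}
```

Called as `remove_pdfs_for_spreadsheets /path/to/export`, this leaves the PDF next to a .xlsm file alone, which the broader glob would not.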
You don't really need the ""; i.e., you could do /bin/rm -f "${xls%.xls*}.pdf". But this looks like it should work.
– G-Man
Aug 10 at 17:42
I tried it both ways, the match failed without the "" in the middle. Or maybe I made a different mistake at the same time. I'm new to bash.
– Xalorous
Aug 10 at 17:46
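On the quoting question in the comments above: the shell joins adjacent quoted strings into a single word, so the two spellings should produce the same filename as long as the braces around the expansion are in place. A quick check, with variable names chosen only for illustration:

```shell
#!/bin/bash
xls='ABCD_00000005.xlsx'

# Two adjacent quoted strings, as in the answer.
two_strings="${xls%.xls*}"".pdf"
# One quoted string, as suggested in the comment.
one_string="${xls%.xls*}.pdf"
printf '%s\n%s\n' "$two_strings" "$one_string"

# Without the braces, only $xls is expanded and the rest is literal text.
no_braces="$xls%.xls*.pdf"
printf '%s\n' "$no_braces"
```

The first two lines printed are both ABCD_00000005.pdf; the third is ABCD_00000005.xlsx%.xls*.pdf, which names no real file. A missing or misplaced brace is one possible explanation for a match that appeared to fail, though the comment does not say what was actually typed.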