Delete .pdf files only if .xlsx files in directory have same filename?

I have folders with hundreds of pdf and xls(x) files that were mass exported from legal e-discovery systems. The filenames in these exports correspond to Bates numbers, such as ABCD_00000001.pdf, ABCD_00000002.pdf, ..., ABCD_00002000.pdf. These mass exports include a blank pdf file for every single xls(x) file, with both sharing the same filename apart from the extension. E.g., ABCD_00000005.xlsx is the xlsx file that was produced in the e-discovery system, and ABCD_00000005.pdf is an extraneous blank pdf file that was created in the mass export.



These extraneous .pdf files probably result from a user error on the part of the people running these mass exports, but I don't usually have control over that side of the process. So I wanted to know if there is any relatively straightforward way to delete these extraneous .pdf files without forcing someone to go through them manually.










  • Are they .xls or .xlsx? Or could they be either? Why can't you just delete all pdf files in that directory? Are there some pdf files you would like to save?
    – Jesse_b
    Aug 10 at 16:28














bash shell files directory rm






asked Aug 10 at 16:26 by ck_chicago; edited Aug 10 at 16:28 by Jesse_b







2 Answers
Loop over the pdf files, use parameter expansion to strip the .pdf extension, and remove a pdf only if a spreadsheet with the same basename exists:



#!/bin/bash
# Remove each pdf that has a same-named .xls or .xlsx file next to it.
for pdf in *.pdf ; do
    basename=${pdf%.pdf}    # strip the trailing .pdf
    if [[ -f $basename.xls || -f $basename.xlsx ]] ; then
        rm "$pdf"
    fi
done


Update: I got the logic backwards, should be fixed now. Sorry.
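Since rm is not reversible, it may be worth previewing which files a loop like the one above would delete before running it. The following is a minimal sketch, not part of the original answer: the function name list_paired_pdfs and its optional directory argument are hypothetical, and it is written with plain [ ] tests so it should run under bash or any POSIX sh.

```bash
#!/bin/bash
# Hypothetical helper: print the pdf files in directory $1 (default: .)
# that have a same-named .xls or .xlsx file. Nothing is deleted.
list_paired_pdfs() {
    dir=${1:-.}
    for pdf in "$dir"/*.pdf ; do
        [ -e "$pdf" ] || continue      # the glob matched nothing
        base=${pdf%.pdf}               # strip the trailing .pdf
        if [ -f "$base.xls" ] || [ -f "$base.xlsx" ] ; then
            printf '%s\n' "$pdf"
        fi
    done
}
```

Calling list_paired_pdfs /path/to/export (the path is a placeholder) prints one candidate per line; once the list looks right, the printf line can be swapped for rm "$pdf".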






  • Added slightly more efficient alternative. If this folder is extensive, it could make a difference in processing time.
    – Xalorous
    Aug 10 at 17:00










  • @Xalorous: But your "alternative" removed the files my solution tried to keep.
    – choroba
    Aug 10 at 17:01











  • It's possible that I pasted the wrong one. I'll post a separate answer with the more efficient version.
    – Xalorous
    Aug 10 at 17:04










  • I agree with your decision to reject the suggested edit — new answers should be posted as new answers, not edits.  But I believe that Xalorous got it right and you (choroba) got it backwards.  If ABCD_0000005.xlsx and ABCD_0000005.pdf both exist, your code leaves ABCD_0000005.pdf alone.  But if important.pdf exists, and there’s no corresponding spreadsheet, your code deletes important.pdf.
    – G-Man
    Aug 10 at 17:43











  • Actually, I missed that his logic is inverted. Taking out the ! would fix it though. The real question is what if there's a pdf without matching xls(x). What does OP want to happen then? If they want them gone, then they need to just rm -rf *.pdf. If not, then my answer. Reading the question strictly, we're only to delete pdf files with matching xls(x) files.
    – Xalorous
    Aug 10 at 17:51


















Loop over the .xls(x) files and remove matching pdf files.



for xls in *.xls* ; do
    # strip the .xls/.xlsx suffix, then append .pdf
    /bin/rm -f "${xls%.xls*}"".pdf"
done


If there's no matching pdf, it won't hurt anything: rm -f silently ignores files that do not exist.
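The question mentions folders in the plural, so the same idea may need to run across subdirectories. The sketch below is one way to do that with find. It is an illustration rather than part of the original answer: the function name remove_paired_pdfs and its optional directory argument are hypothetical. Unlike the *.xls* glob, it only considers files ending in exactly .xls or .xlsx.

```bash
#!/bin/bash
# Hypothetical helper: for every .xls/.xlsx file under directory $1
# (default: .), delete the same-named pdf next to it, if there is one.
remove_paired_pdfs() {
    find "${1:-.}" -type f \( -name '*.xls' -o -name '*.xlsx' \) -exec sh -c '
        for xls do
            rm -f -- "${xls%.*}.pdf"    # swap the last extension for .pdf
        done' sh {} +
}
```

Running remove_paired_pdfs /path/to/export (again a placeholder path) leaves any pdf without a matching spreadsheet untouched, because only names derived from existing spreadsheets are ever passed to rm.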






  • You don’t really need the ""; i.e., you could do /bin/rm -f "${xls%.xls*}.pdf".  But this looks like it should work.
    – G-Man
    Aug 10 at 17:42










  • I tried it both ways, the match failed without the "" in the middle. Or maybe I made a different mistake at the same time. I'm new to bash.
    – Xalorous
    Aug 10 at 17:46










First answer (loop over the pdf files): answered Aug 10 at 16:31 by choroba; edited Aug 10 at 21:28











Second answer (loop over the .xls(x) files): answered Aug 10 at 17:04 by Xalorous










