How to tell fdupes which files to keep?













I had problems with my backup, and now many folders exist several times on my hard disk. I have one main 'folder tree' that I want to keep as is. How can I prevent fdupes from deleting files in the directories I want to keep?



Is there maybe another duplicate finding utility?



























  • fdupes does a really bad job: I watched it delete files that definitely cannot have the same content. I used fdupes -rdI /home/user/path/to/dir/*
    – Nepumuk
    Nov 29 '17 at 15:55











  • Do you mean that there are directories in your tree that you just want fdupes to ignore?
    – bu5hman
    Nov 29 '17 at 17:06










  • When fdupes has to delete one of two files, it keeps the oldest one. [source: unix.stackexchange.com/a/146200/154403]
    – Nepumuk
    Nov 30 '17 at 11:05











  • But it will keep the oldest of two files that have the same MD5 sum and are bitwise identical. We are not talking about 'same name, different date, maybe some changes'; we are talking about an 'exact copy of the earlier file'.
    – bu5hman
    Nov 30 '17 at 12:50










  • I know, I should have been clearer. Actually, when I run it now, fdupes seems to find files with at most one duplicate each. It now compares the MD5 sums of each pair of files and deletes them separately. Earlier versions did not behave this way.
    – Nepumuk
    Dec 2 '17 at 9:18














asked Nov 29 '17 at 15:46









Nepumuk












3 Answers






























Why not slow the process down and take some care?



Get the list of duplicates from fdupes and write it to a file.



fdupes -r /path/to/start > filesToDelete
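Before editing the list, it may be worth confirming that the files in each set really are identical, since that was doubted in the comments. The following is only a sketch: it assumes the saved list has the usual fdupes layout (one path per line, sets separated by blank lines), and verify_sets is a hypothetical helper name, not part of fdupes.

```shell
# verify_sets: read an fdupes-style list on stdin and compare every file of a
# set with the first file of that set, byte by byte.
# Prints a MISMATCH line for each file that differs from (or cannot be
# compared with) the reference; prints nothing if all sets are identical.
verify_sets() {
    first=""
    while IFS= read -r line || [ -n "$line" ]; do
        if [ -z "$line" ]; then
            first=""                      # blank line: the next path starts a new set
        elif [ -z "$first" ]; then
            first=$line                   # first path of a set is the reference
        elif ! cmp -s -- "$first" "$line"; then
            printf 'MISMATCH: %s != %s\n' "$first" "$line"
        fi
    done
}

# Usage (filesToDelete is the list saved above):
#   verify_sets < filesToDelete
```

If this prints nothing, every listed copy matched the first file of its set at the time of the check.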


Then, at your own speed, you can remove any directories or files you want to keep from filesToDelete.



To remove entire directories from the list:



sed -i '\|/directoryToKeep/|d' filesToDelete


Work through filesToDelete in your preferred text editor and put a marker (maybe an x?) at the beginning of the line for each copy you want to keep, so you can keep track of what you have done. Then, when you are sure of the changes:



cat filesToDelete | xargs -d '\n' rm


rm will report an error for each line marked with an x (as well as for blank lines and any other non-filename lines in the fdupes output) but will do nothing to those files. It will, however, delete every unmarked valid filename entry in filesToDelete.
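The marking convention can also be handled by a small filter with a dry-run mode, instead of relying on rm errors. This is a sketch under assumptions: kept lines are taken to start with the letter x followed by a space, and delete_unmarked and its --really switch are hypothetical names chosen for illustration.

```shell
# delete_unmarked LIST [--really]
# Reads LIST (one path per line). Skips blank lines and lines starting with
# the assumed marker "x ". Without --really it only prints what it would
# remove; with --really it removes the remaining paths.
delete_unmarked() {
    list=$1
    mode=${2:-}
    while IFS= read -r path || [ -n "$path" ]; do
        case $path in
            ''|'x '*) continue ;;         # blank separator or marked "keep" line
        esac
        if [ "$mode" = "--really" ]; then
            rm -- "$path"
        else
            printf 'would remove: %s\n' "$path"
        fi
    done < "$list"
}

# Usage:
#   delete_unmarked filesToDelete            # dry run, prints only
#   delete_unmarked filesToDelete --really   # actually deletes
```

Note that every path left unmarked is removed, so at least one copy per set has to be marked or deleted from the list beforehand.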


























  • That's the manual way, which is reasonable for a few duplicates. I have some thousands of files... Can I just change the permissions of the folder I want to keep (e.g. chown -R root /home/user/path/to/keep && chmod -R -037 /home/user/path/to/keep)?
    – Nepumuk
    Nov 30 '17 at 10:52










  • Could I just tell fdupes to move the files (with their folder hierarchy) to a specified folder? I could then delete the files manually, but all at once.
    – Nepumuk
    Nov 30 '17 at 11:01










  • You can do whatever you like. At the end of the day someone (that's you) has to describe, in terms the system understands, exactly what criteria to use in deciding which files to delete. If you cannot formulate these in your own mind then there is a danger of mistaken deletion. I only really made the post because of
    – bu5hman
    Nov 30 '17 at 11:14










  • You can do whatever you like, but at the end of the day someone (that's you) has to describe in terms the system understands exactly what criteria to use in deciding which files to delete. If you cannot formulate these in your own mind then there is a danger of mistaken deletion. I can't see a better way of finding duplicates than comparing MD5 and then bitwise comparison. In your shoes I would get the output to file, do a random check of files that fdupes calls 'duplicate' and if I was happy then just let it do its job ... after moving any 'critical' files out of the search path
    – bu5hman
    Nov 30 '17 at 11:23










  • This is really just a matter of using the tools you have at hand, like sort and a text editor. I was in a similar situation; that's what brought me here =). Following bu5hman's suggestion, run 'sort filesToDelete > SortedFiles2Delete'. That gives you a list of the duplicates ordered by directory hierarchy! It should then be easy to delete the lines for the files you want to keep from SortedFiles2Delete, and run while read DELETEFILE ; do mv "$DELETEFILE" trashcan/. ; done < SortedFiles2Delete ... you get the idea. The trashcan might end up containing duplicates that were not in the part of the hierarchy you want to preserve at all.
    – Wild Penguin
    Feb 5 at 21:39
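The idea raised in the comments above, moving the duplicates into a holding folder while keeping their folder hierarchy, could look roughly like this. It is a sketch, not an fdupes feature: move_to_trash is a hypothetical helper, and it assumes the list contains one path per line and no relative paths with ".." components.

```shell
# move_to_trash TRASH < LIST
# Moves every non-blank path read from stdin below TRASH, recreating the
# original directory structure there so that equally named files do not
# collide and everything can be moved back later.
move_to_trash() {
    trash=$1
    while IFS= read -r path || [ -n "$path" ]; do
        [ -n "$path" ] || continue        # skip the blank lines between sets
        [ -e "$path" ] || continue        # skip entries that no longer exist
        dest=$trash/${path#/}             # strip one leading slash from absolute paths
        mkdir -p -- "$(dirname -- "$dest")"
        mv -- "$path" "$dest"
    done
}

# Usage:
#   mkdir -p trashcan
#   move_to_trash trashcan < SortedFiles2Delete
```

Unlike a flat mv into trashcan/, two duplicates with the same file name from different directories do not overwrite each other here.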































Is there maybe another duplicate finding utility?




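Use rmlint; its --keep-all-tagged option does exactly what you want. Paths listed after the // separator are tagged as originals, and duplicates inside them are never selected for removal: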



$ rmlint --types=duplicates --keep-all-tagged /path/to/dupes /other/path/to/dupes // main/folder/tree













































    Here's a short script to prioritize the first directory given. It doesn't delete files directly; it only prints a list for you to delete. It follows these rules:



    1. Don't print any files from dir1.


    2. If a set of duplicates contains no file from dir1, don't print any file of that set.



      #!/usr/bin/env bash
      # priority_dup.sh dir1 dir2 [dir3 ...]

      set1=() # List collects all matching files for a set.
      IFS=$'\n' # Makes "${set1[*]}" join the entries with newlines.
      fdupes -r "$@" | while read -r i; do
        if [[ "$i" == "" ]]; then
          # Create new set, minus all files from dir1.
          set2="$(echo "${set1[*]}" | grep -v "^$1/")"
          # If the sets are different, then we can print files for deletion.
          if [[ "${set1[*]}" != "$set2" && "$set2" != "" ]]; then
            echo "$set2"
            echo ""
          fi
          # Start a new set, whether or not anything was printed.
          set1=()
        else
          set1+=("$i")
        fi
      done


      Save it to a file, make the file executable, and try it.
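The same two rules can also be applied to an fdupes list that was saved to a file earlier, which makes them easy to try without rescanning the disk. The following is a sketch in plain sh rather than the author's script: list_removable is a hypothetical name, and it assumes one path per line with blank lines between sets.

```shell
# list_removable KEEP_DIR < LIST
# For each blank-line-separated set on stdin: if at least one path lies under
# KEEP_DIR, print the paths that do not (the removable copies), followed by a
# blank line. Sets without any copy under KEEP_DIR print nothing.
list_removable() {
    keep=${1%/}                           # tolerate a trailing slash on KEEP_DIR
    nl='
'
    # Two extra newlines guarantee that the last set is closed by a blank line.
    { cat; printf '\n\n'; } | {
        others="" has_keep=0
        while IFS= read -r path; do
            if [ -z "$path" ]; then
                # End of a set: print the outsiders only if a kept copy exists.
                if [ "$has_keep" -eq 1 ] && [ -n "$others" ]; then
                    printf '%s\n\n' "$others"
                fi
                others="" has_keep=0
                continue
            fi
            case $path in
                "$keep"/*) has_keep=1 ;;
                *) others="${others:+$others$nl}$path" ;;
            esac
        done
    }
}

# Usage:
#   fdupes -r dir1 dir2 > list
#   list_removable dir1 < list
```

Because KEEP_DIR is matched as a literal prefix rather than as a regular expression, characters such as dots in the directory name need no escaping.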



























      answered Nov 29 '17 at 18:14









      bu5hman

          answered Jan 23 at 20:46









          thomas_d_j


































              Here's a short script to prioritize the first directory given. It doesn't delete anything itself; it only prints a list of files for you to delete. It follows these rules:



              1. Don't print any files from dir1.


              2. If a set of duplicates has no file in dir1, don't print any file from that set.



                #!/usr/bin/env bash
                # priority_dup.sh dir1 dir2 [dir3 ...]

                set1=() # List collects all matching files for a set.
                IFS=$'\n' # Split and join on newlines only.
                fdupes -r "$@" | while read -r i; do
                    if [[ "$i" == "" ]]; then
                        # A blank line ends the set. Create new set, minus all files from dir1.
                        set2="$(echo "${set1[*]}" | grep -v "^$1/")"
                        # If the sets are different, then we can print files for deletion.
                        if [[ "${set1[*]}" != "$set2" && "$set2" != "" ]]; then
                            echo "$set2"
                            echo ""
                        fi
                        set1=() # Always start the next set empty, printed or not.
                    else
                        set1+=("$i")
                    fi
                done


                Save it to a file, make the file executable, and try it.
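To try the two rules without touching any real files, the same filtering can be written as a function that reads fdupes-style output on standard input. This is a sketch, not the script above: the name print_removable is made up, it matches the directory as a plain path prefix rather than a regular expression, and it assumes every group, including the last one, ends with a blank line.

```shell
#!/usr/bin/env bash
# print_removable KEEPDIR   (hypothetical helper; reads groups on stdin)
# For each duplicate group that has at least one file under KEEPDIR,
# prints the group's other files followed by a blank line.
# Groups with no file under KEEPDIR are left alone and print nothing.
print_removable() {
    local keep="${1%/}" line kept=0
    local -a others=()
    while IFS= read -r line; do
        if [[ -n "$line" ]]; then
            if [[ "$line" == "$keep"/* ]]; then
                kept=1                  # this group has a copy worth keeping
            else
                others+=("$line")       # candidate for deletion
            fi
            continue
        fi
        # Blank line: the group is complete.
        if (( kept && ${#others[@]} > 0 )); then
            printf '%s\n' "${others[@]}"
            echo
        fi
        others=()
        kept=0
    done
}
```

A real run might look like fdupes -r main/folder/tree /path/to/dupes | print_removable main/folder/tree. The directory has to be spelled the same way in both places, since the match is done on the paths exactly as they appear in the list.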

















                  answered Aug 27 at 15:41









                  Rucent88




























                       
