Checking identical files in Linux and deleting according to location


I use fdupes to find and delete identical files.

But I want to be able to say something like this ...

find all the files that are duplicate in directory A or its subdirectories

if there's a duplicated file in subdirs B and C then always delete the file in B

In other words, keep every file in C, and keep only those files in B that are not already in C. Note that the directory structures are not the same, so rsync isn't useful here.
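The keep/delete rule above can be written as a small pure function. This is a minimal sketch. It assumes the sets of identical files have already been found (by fdupes or a script), and that paths are relative to A, so files under B start with B/ and files under C start with C/. The names files_to_delete and _under are hypothetical.

```python
from pathlib import PurePosixPath


def _under(path: str, top: str) -> bool:
    """True if `path` lies inside directory `top` (compared by path components)."""
    return PurePosixPath(top) in PurePosixPath(path).parents


def files_to_delete(groups, drop_dir="B", keep_dir="C"):
    """Return the copies under `drop_dir` that have an identical copy under `keep_dir`.

    `groups` is an iterable of lists of paths. Each list holds files with
    identical contents. Groups with no copy under `keep_dir` are skipped,
    so a file that exists only in B is never selected.
    """
    doomed = []
    for group in groups:
        if any(_under(p, keep_dir) for p in group):
            doomed.extend(p for p in group if _under(p, drop_dir))
    return doomed
```

For example, with the groups [["B/x/a.txt", "C/y/a.txt"], ["B/only1.txt", "B/only2.txt"]] the function returns ["B/x/a.txt"]. The second set has no copy in C, so both B files survive. Comparing path components instead of raw string prefixes keeps a sibling directory such as Bx/ from being treated as part of B.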

I don't think fdupes offers this functionality. I have to manually choose which to delete / keep for each pair.

So I was thinking of writing a quick Python script to do this instead. But is there a quick system command I can call from Python that gives me some kind of unique ID for each file, reliable enough to tell whether two files are identical? I'm thinking of something that doesn't involve loading the files into Python and hashing their contents myself.

asked Nov 29 at 12:16

interstar


1 Answer

Accepted answer:

No. Hashing the contents is the only fast way to know whether multiple files match, but you can speed it up by only hashing files that have the same size. Also pick a fast hash such as MD5 if no one is deliberately trying to create collisions. Tools such as git and ZFS already do this kind of content hashing for you.
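The size-first, hash-second approach described above can be sketched with the Python standard library. This is a minimal illustration, not fdupes' own code. The names find_duplicates and md5_of are hypothetical, and MD5 is used because the answer suggests it. Each candidate file is read in 1 MiB chunks rather than loaded whole. The contents still have to be read once, which is unavoidable for any content-based comparison.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(*roots):
    """Return sorted lists of paths whose contents match (by size, then MD5)."""
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                # Skip symlinks and anything that is not a regular file.
                if os.path.isfile(path) and not os.path.islink(path):
                    by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[md5_of(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

If hash collisions are a concern, each group can be confirmed with filecmp.cmp(a, b, shallow=False), which compares the bytes directly. Pass non-overlapping roots (for example A/B and A/C, not A and A/B), or the same file will be listed twice.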

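Or just, from inside A:

fdupes -r B C | grep '^B/' | xargs -r -d '\n' rm --

Anchoring the grep pattern to ^B/ avoids matching paths that merely contain a B somewhere. Using xargs -d '\n' passes each line as one filename, even when it contains spaces or quotes. Be aware that this removes every B copy in a duplicate set, including sets whose copies are all inside B, so check the output without the final xargs stage before deleting anything.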
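If you would rather let fdupes do the matching and do only the selection in Python, its default output (one path per line, sets separated by blank lines) is easy to parse. This is a sketch under the assumption that fdupes is run from inside A as fdupes -r B C, so every path starts with B/ or C/. The helper names are hypothetical, and filenames containing newlines would break this line-based parsing. Unlike the one-liner, it only picks a B copy when the same set also contains a file under C.

```python
import subprocess


def run_fdupes(*dirs):
    """Run `fdupes -r` on the given directories and return its stdout."""
    result = subprocess.run(["fdupes", "-r", *dirs],
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_fdupes(output):
    """Split fdupes' default output into lists of identical files."""
    groups, current = [], []
    for line in output.splitlines():
        if line.strip():
            current.append(line)
        elif current:  # a blank line ends the current set
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def b_copies_with_c_twin(groups):
    """Paths under B/ from sets that also contain at least one path under C/."""
    return [p for g in groups if any(q.startswith("C/") for q in g)
            for p in g if p.startswith("B/")]
```

From inside A, b_copies_with_c_twin(parse_fdupes(run_fdupes("B", "C"))) returns the list to review. Only after checking it would you delete the files, for example with os.remove.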

edited Nov 29 at 14:45

answered Nov 29 at 12:35

user1133275
