rsync and NTFS external drive

I recently noticed that the backups I keep on two different NTFS-formatted external drives, which were supposed to be identical, actually are not. Neither of them is identical to the data on the Linux workstation (ext4-formatted) either.
I can see this simply by counting the lines of the same file in the three copies I have:



$ wc -l /data/my_file 
1288057 /data/my_file

$ wc -l /backup-1/my_file
1287905 /backup-1/my_file

$ wc -l /backup-2/my_file
1288253 /backup-2/my_file


Luckily, the program I use to work on these files cannot use any copy but the valid one, so I have an easy way to tell which copy is correct. The downside is that my current processing directory is about 2 TB and takes several hours to copy to or from the workstation, so simply restoring backups and testing whether I can use them is not a convenient option (I already spent some time doing that this weekend). I cannot leave a copy of the data on the workstation forever, because it is shared between several users and we simply don't have enough storage to keep everything (each user backs up, frees storage for the next one, and restores their latest backup when they resume working). That is why I absolutely need a reliable backup solution.



The files otherwise have the same size (596 MB), so I am thinking that maybe the data is not altered, but newline characters were introduced somehow (explaining the different output from wc -l and why the analysis program is unable to use them). But this is difficult to verify, given the size of these files. A quick look at their head and tail gave no indication of such wrong line breaks.
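One way to check this without eyeballing the files might be a byte-level comparison with cmp. The sketch below is only an illustration: the two helper names are made up for this example, and the paths in the usage note are the ones from the wc output above. Since the sizes are equal, any difference it reports would be a changed byte rather than an inserted one.

```shell
# first_difference FILE_A FILE_B
# Prints "identical" when the two files match byte for byte. Otherwise prints
# the 1-based offset of the first differing byte, as reported by cmp.
# (Helper name is hypothetical; cmp itself is a standard tool.)
first_difference() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        # cmp prints "A B differ: byte N, line M" (some versions say "char");
        # keep only N. If one file is a prefix of the other, cmp reports EOF
        # on stderr and nothing is printed here.
        cmp "$1" "$2" | sed -n 's/.* differ: [a-z]* \([0-9]*\),.*/\1/p'
    fi
}

# count_differing_bytes FILE_A FILE_B
# cmp -l lists every differing byte, one per line; count those lines.
count_differing_bytes() {
    cmp -l "$1" "$2" | wc -l
}
```

For example, `first_difference /data/my_file /backup-1/my_file` would show where the two copies start to diverge, and `count_differing_bytes` with the same arguments would give a rough idea of how widespread the damage is.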



These backups were made using the following rsync command:



rsync --recursive \
      --links \
      --perms \
      --executability \
      --acls \
      --xattrs \
      --owner \
      --group \
      --devices \
      --specials \
      --times \
      --partial \
      --delete \
      --update \
      --one-file-system \
      --human-readable \
      --progress \
      --stats \
      SOURCE DESTINATION
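To check a whole backup tree against the source without restoring it, a checksum-based dry run might be an option. This is a minimal sketch, not the command used for the backups: the function name is hypothetical, and it assumes the same SOURCE and DESTINATION as above. `--checksum`, `--dry-run` and `--itemize-changes` are standard rsync options.

```shell
# list_content_mismatches SOURCE_DIR DEST_DIR
# Dry run: nothing is copied or deleted. --checksum makes rsync compare file
# contents instead of trusting size and modification time, and
# --itemize-changes prints one line per item rsync would update.
# (Function name is hypothetical.)
list_content_mismatches() {
    rsync --recursive --checksum --dry-run --itemize-changes "$1" "$2"
}
```

Called as `list_content_mismatches SOURCE DESTINATION`, it should list the files whose content differs between the two trees. Note that `--checksum` has to read every file on both sides, so on a 2 TB directory this would still take a long time.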


I have only read files from the backups (never tried to write), from a Mac that can only read NTFS but not write it anyway (I don't really trust things like FUSE for macOS for data from several weeks of computing-intensive work), so I am pretty sure I didn't corrupt the backup just by accessing it.



Is the --partial option causing these differences? (possibly by appending content to an already existing file, instead of re-transferring all of it). I read the rsync manual carefully, but I am not sure I understand exactly what --partial does.



Does this problem have anything to do with the NTFS filesystem of the external drives? If so, will using an ext4 backup drive solve this issue?







  • I'd say you're definitely taking risks by backing up an ext4 partition to NTFS. Why did you decide to do that in the first place? (So it can be read by a Mac?)
    – Pierre-Alain TORET
    Mar 6 at 15:17










  • Because I know very little about filesystems, and did not even think about asking someone else. Being able to read the backup on a Mac was a convenient side effect, but not really a goal. It was only after I ran into this problem and read a little about NTFS-3G and its imperfections that I started thinking more about this question.
    – Guillaume
    Mar 6 at 17:55










  • Advice given in 2 comments because of the comment length limit. 3 things that I would advise: (1) I noticed that if source/dest was ext4/ntfs and I omitted the --times option, then modification times were not preserved, so the next rsync would re-send the same file from dest (weird). That's not relevant for you, but it suggests that dest=ntfs is not a coincidence. (2) I think it is well worth your while to create a series of small experiments and actually deduce EXACTLY WHAT IS GOING WRONG. TO BE CONT'D.
    – user2661923
    Mar 17 at 9:04










  • This means setting up a dest test dir, copying some files over, and narrowing down which included or omitted rsync option is preventing the dest from being exactly the same as the source, file by file. (3) I ALWAYS use the -n (DRY RUN) option just before executing rsync "for real", either examining the output in the bash terminal or redirecting the output to temp01.txt. I ALWAYS look for anomalies in the output before executing it for real (e.g. if you mis-order your source/dest and use the --delete option, you can WIPE OUT ALL RECENT SOURCE FILES).
    – user2661923
    Mar 17 at 9:04










  • I ended up formatting the external drive to ext4, but thank you for your advice. :-) I am also very careful with the --delete option.
    – Guillaume
    Mar 19 at 18:19














asked Mar 5 at 19:46









Guillaume


























