rsync and NTFS external drive
I recently noticed that the backups on my two NTFS-formatted external drives, which were supposed to be identical, are not. Neither of them is identical to the data on the Linux workstation (ext4-formatted) either.
I can see this simply by counting the lines of the same file in each of the three copies:
$ wc -l /data/my_file
1288057 /data/my_file
$ wc -l /backup-1/my_file
1287905 /backup-1/my_file
$ wc -l /backup-2/my_file
1288253 /backup-2/my_file
Luckily, the program I use to process these files refuses to work with anything but the valid copy, so I have an easy way to identify it. The downside is that my current processing directory is about 2 TB and takes several hours to copy to and from the workstation, so simply restoring the backups and testing whether I can use them is not a convenient option (I already spent some time doing that this weekend). I cannot leave a copy of the data on the workstation indefinitely, because it is shared among several users and we simply don't have enough storage to keep everything (each user backs up, frees storage for the next one, and restores their latest backup when they resume working). That is why I absolutely need a reliable backup solution.
Otherwise the files have the same size (596 MB), so I suspect that the data itself is not altered, but that newline characters were somehow introduced (which would explain both the different output from wc -l and why the analysis program cannot use them). This is difficult to verify given the size of these files, however, and a quick look at their head and tail gave no indication of such spurious line breaks.
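Since the sizes match but the line counts differ, the copies cannot differ only by extra newlines: some bytes must have been changed. A minimal sketch for testing the newline hypothesis without opening the files, assuming GNU cmp (from diffutils) and awk are available; the function name compare_copies is hypothetical. cmp -l lists every differing byte as an offset followed by the two byte values in octal, and a newline is octal 12.

```shell
# Hypothetical helper: summarise how a backup copy differs from a reference file.
# Usage: compare_copies REFERENCE COPY
compare_copies() {
  ref=$1
  copy=$2

  # Compare sizes first (the question reports 596 MB for all three copies).
  echo "size: $(wc -c < "$ref") vs $(wc -c < "$copy") bytes"

  # cmp -s prints nothing and exits 0 when the files are identical.
  if cmp -s "$ref" "$copy"; then
    echo "identical"
    return 0
  fi

  # Plain cmp reports the first differing byte and its line number.
  cmp "$ref" "$copy"

  # cmp -l prints one line per differing byte: offset, ref byte, copy byte,
  # with byte values in octal. Count them, and count those involving a newline.
  cmp -l "$ref" "$copy" 2>/dev/null | awk '
    { total++ }
    $2 == 12 || $3 == 12 { nl++ }
    END { printf "differing bytes: %d, involving a newline: %d\n", total, nl }'
}
```

For example, compare_copies /data/my_file /backup-1/my_file. If nearly every differing byte involves a newline, the hypothesis holds; if the differences involve arbitrary byte values, something else is going on.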
These backups were made with the following rsync command:
rsync --recursive \
--links \
--perms \
--executability \
--acls \
--xattrs \
--owner \
--group \
--devices \
--specials \
--times \
--partial \
--delete \
--update \
--one-file-system \
--human-readable \
--progress \
--stats \
SOURCE DESTINATION
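Restoring 2 TB just to test a backup is expensive, so one alternative is to record a checksum manifest of the source tree at backup time and check the backup against it later, on any machine that can read the drive. A minimal sketch, assuming GNU coreutils (sha256sum), plus find, sort and xargs with null-separator support; make_manifest and check_manifest are hypothetical helper names. Keep the manifest file outside the tree being hashed.

```shell
# Hypothetical helper: write a SHA-256 manifest of every regular file under a tree.
# Usage: make_manifest TREE MANIFEST
make_manifest() {
  # Paths are recorded relative to TREE, in a stable (sorted) order.
  # xargs -r skips running sha256sum when the tree has no files.
  (cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum) > "$2"
}

# Hypothetical helper: verify a copy of the tree against a manifest.
# Usage: check_manifest COPY MANIFEST
# With --quiet, sha256sum prints only failing files; it exits non-zero if any fail.
check_manifest() {
  manifest=$2
  # Resolve a relative manifest path before changing directory.
  case $manifest in
    /*) ;;
    *) manifest=$PWD/$manifest ;;
  esac
  (cd "$1" && sha256sum --check --quiet "$manifest")
}
```

For example, run make_manifest /data /tmp/data.sha256 on the workstation, then check_manifest /backup-1 /tmp/data.sha256 once the backup has finished; any file reported as FAILED differs in content. On macOS, shasum -a 256 -c is expected to read the same manifest format, but that is an assumption worth testing on a small tree first.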
I have only ever read files from the backups, never written to them, and only from a Mac, which can read NTFS but not write to it anyway (I don't really trust tools like FUSE for macOS with data from several weeks of computing-intensive work). So I am fairly sure I didn't corrupt the backups just by accessing them.
Is the --partial option causing these differences, perhaps by appending content to an already existing file instead of re-transferring all of it? I read the rsync manual carefully, but I am not sure I understand exactly what --partial does.
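For reference, the rsync manual says that by default rsync deletes a partially transferred file when a transfer is interrupted, and that --partial keeps it so a subsequent run can transfer the rest of the file faster; appending to an existing file is what the separate --append option does. So --partial should only come into play after an interrupted transfer. Two other points from the manual matter here: the default quick check skips any file whose size and modification time already match on the destination, and --update also skips files that are newer on the receiver. A destination copy with the right size and time but the wrong content is therefore never re-sent. Adding --checksum makes rsync compare file contents instead, and with --dry-run it only lists mismatches. A minimal sketch; list_content_differences is a hypothetical name, and --checksum reads every file on both sides, which will be slow on 2 TB.

```shell
# Hypothetical helper: list files whose content differs between two trees,
# without modifying anything (--dry-run). --checksum replaces the default
# size + mtime quick check with a full-content comparison.
# Usage: list_content_differences SOURCE/ DESTINATION/
list_content_differences() {
  # Itemized lines starting with ">f" are regular files rsync would send:
  # either missing on the destination or different in content.
  # grep exits 1 (and prints nothing) when every file already matches.
  rsync --recursive --checksum --dry-run --itemize-changes "$1" "$2" |
    grep '^>f'
}
```

For example, list_content_differences SOURCE/ /backup-1/ (the trailing slashes follow rsync's usual directory-contents convention).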
Does this problem have anything to do with the NTFS filesystem of the external drives? If so, will using an ext4 backup drive solve it?
rsync backup restore
asked Mar 5 at 19:46
Guillaume
I'd say you're definitely taking risks by backing up an ext4 partition to NTFS. Why did you decide to do that in the first place? (So it can be read by a Mac?)
- Pierre-Alain TORET
Mar 6 at 15:17
Because I know very little about filesystems, and did not even think of asking someone else. Being able to read the backup on a Mac was a convenient side effect, not really a goal. It was only after I ran into this problem and read a little about NTFS-3G and its imperfections that I started thinking more about this question.
- Guillaume
Mar 6 at 17:55
Advice given in two comments because of the comment length limit. Three things I would advise: (1) I noticed that when the source/destination was ext4/NTFS and I omitted the --times option, modification times were not preserved, so the next rsync run would re-send the same files (weird). That's not relevant for you, but it suggests that the NTFS destination is not a coincidence. (2) I think it is well worth your while to run a series of small experiments and work out EXACTLY WHAT IS GOING WRONG. TO BE CONTINUED.
- user2661923
Mar 17 at 9:04
This means setting up a test destination directory, copying some files over, and narrowing down, file by file, which included or omitted rsync option prevents the destination from being exactly the same as the source. (3) I ALWAYS use the -n (dry run) option just before running rsync "for real", either examining the output in the bash terminal or redirecting it to temp01.txt. I ALWAYS look for anomalies in the output before running it for real (e.g. if you get source and destination in the wrong order and use the --delete option, you can WIPE OUT ALL RECENT SOURCE FILES).
- user2661923
Mar 17 at 9:04
I ended up reformatting the external drive as ext4, but thank you for your advice. :-) I am also very careful with the --delete option.
- Guillaume
Mar 19 at 18:19
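The dry-run advice in point (3) can be wrapped so that the real run only happens after the preview has been saved and checked for deletions. A minimal sketch, assuming bash or a POSIX sh; the function name preview_then_sync, the FORCE_DELETE variable and the reduced option list in RSYNC_OPTS are all hypothetical. --itemize-changes is added so that planned deletions show up as lines starting with *deleting.

```shell
# Hypothetical, reduced option set; substitute the full list from the question.
RSYNC_OPTS="--recursive --links --perms --times --delete --itemize-changes"

# Hypothetical wrapper: dry run first, save the plan, refuse to delete
# anything unless FORCE_DELETE=1, then run the same command for real.
# Usage: preview_then_sync SOURCE/ DESTINATION/ [LOGFILE]
preview_then_sync() {
  src=$1
  dst=$2
  log=${3:-temp01.txt}

  # RSYNC_OPTS is deliberately unquoted so it splits into separate options.
  rsync $RSYNC_OPTS --dry-run "$src" "$dst" > "$log" || return

  # grep -c prints 0 (and exits 1) when nothing would be deleted.
  deletions=$(grep -c '^\*deleting' "$log")
  echo "dry run: $(wc -l < "$log") itemized lines, $deletions deletions (see $log)"

  if [ "$deletions" -gt 0 ] && [ "${FORCE_DELETE:-0}" != 1 ]; then
    echo "refusing to delete files without FORCE_DELETE=1" >&2
    return 1
  fi

  rsync $RSYNC_OPTS "$src" "$dst"
}
```

For example, preview_then_sync SOURCE/ DESTINATION/ writes the plan to temp01.txt; if it reports deletions, inspect the log and, if they are expected, rerun as FORCE_DELETE=1 preview_then_sync SOURCE/ DESTINATION/.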