Rsync for migrating a very large NFS share

I wanted to get input on how to break an rsync job into multiple parts so that building the change list goes faster. We are migrating from one storage platform to another, and we have one large, flat directory containing about 2.5 million files. Building the rsync change list currently takes days. I would like to split the listing into text files of maybe 100k file names each and then run several rsync tasks against those lists, possibly from different servers.



My shell scripting is pretty weak. Does anyone know how to list the first 100k files into a text file, then pick up with the next 100k, and so on, until every file in this directory is represented in one of the roughly 25 text files?



Or, if someone has a better idea than rsync, I would love to hear it.







asked Mar 15 at 1:01 by 200mg











  • What command-line options are you using with rsync? Some can greatly impact performance. For example, some options require rsync to know the full file list before it transfers anything (see man rsync, search for --recursive). Use --delete-during (the default with --delete in recent versions of rsync) rather than --delete-before or --delete-after. Similarly, don't use --delay-updates or --prune-empty-dirs. You may also want to use -W (aka --whole-file) to turn off the delta-transfer algorithm and copy changed files in full, and maybe skip -z compression. (See the example sketch after these comments.)
    – cas
    Mar 15 at 4:49






  • rsync from local to local (and yes, an NFS mount counts as local) will not use the delta-transfer algorithm to speed up the copy. If you can access the filesystem of the NFS server directly rather than through the NFS mount, you may find things run much faster. If that is a possibility, provide what details you can.
    – roaima
    Mar 16 at 17:28
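
To illustrate the options mentioned in these comments, here is a minimal sketch of an invocation along those lines. The source and destination paths are placeholders, not anything given in the question:

$ rsync -a -W --delete-during /mnt/old-storage/bigdir/ /mnt/new-storage/bigdir/
# -a               archive mode: recurse and preserve permissions, ownership and timestamps
# -W               copy whole files instead of computing deltas (sensible for local/NFS copies)
# --delete-during  delete extraneous destination files during the transfer,
#                  rather than in a separate pass before or after it
# -z is deliberately left out: compression rarely helps on a local or LAN copy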

















1 Answer






To generate the 25 files you're looking for...



$ find /lots/of/files | split -d -l 100000


This will generate files of 100,000 lines each. There is quite a bit more you can do with split, so check its man page. With -d the pieces are numbered rather than lettered, so they will be named x00, x01, ... x24.



From here you can loop through the files and run rsync.



for file in x*
do
    # run the rsync command here, using "$file" as the list of files to transfer
done
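
As a sketch of one way to fill in that loop (an assumption, not the answer's prescribed command): rsync's --files-from option reads a list of paths to transfer, interpreted relative to the source directory given on the command line, so it helps to generate the lists as relative names by running find from inside the directory. The /tmp paths and the destination newhost:/new/volume/files/ below are hypothetical placeholders:

# build one big list of relative paths, then split it into 100k-line pieces
cd /lots/of/files
find . -type f > /tmp/all-files.txt

cd /tmp
split -d -l 100000 all-files.txt        # produces x00 .. x24

for file in x*
do
    # copy only the paths listed in "$file"; -a preserves permissions and timestamps
    # newhost:/new/volume/files/ is a placeholder for the new storage target
    rsync -a --files-from="$file" /lots/of/files/ newhost:/new/volume/files/
done

To run the pieces in parallel, the x* lists can be divided among several machines, or several of these rsync processes can be started in the background on one host, keeping in mind that too many parallel readers can swamp the NFS server.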


HTH







answered Mar 16 at 16:45 by Max Friederichs











  • This seems like an elegant solution, but what if we add one layer of complexity? About 25 more files are written to this location every day, accessed by a web app, and new files are the most likely to be accessed. How would you structure this to keep the web app's failed file lookups to a minimum? The rsync as you described would be a great start, but getting the list of new files would be quite slow with that many files, no? There could be hours, maybe days, when files written just prior to the switch-over to the new storage are not accessible while the delta file list is being built.
    – Art Hill
    Mar 21 at 16:11






  • Ironically, I also do not have enough reputation to comment on your answer... Keeping in mind this was not the OP's question: one idea would be to sort the files by date, so that the most recent files are rsync'd to the remote host first. (A sketch of that follows below.)
    – Max Friederichs
    Mar 21 at 16:56
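
As a rough sketch of that newest-first idea, assuming GNU find and reusing the split approach from the answer (the /tmp file names and paths are placeholders):

cd /lots/of/files
# print "<mtime> <path>" for every file, sort newest first, then drop the timestamp column
find . -type f -printf '%T@ %p\n' | sort -rn | cut -d' ' -f2- > /tmp/files-by-date.txt

cd /tmp
split -d -l 100000 files-by-date.txt    # x00 holds the newest 100,000 files, x01 the next, ...

Transferring x00 first would make the most recently written files available on the new storage as early as possible.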















