Rsync for migrating a very large NFS share

I wanted to get input on how to break an rsync job into multiple parts so that building the change list goes faster. We are migrating from one storage platform to another, and we have one large, flat directory containing about 2.5 million files. Building the rsync change list currently takes days. I would like to split the listing into text files of maybe 100k file names each and then run several rsync tasks against those lists, possibly from different servers.



My shell scripting is pretty weak. Does anyone know how to list the first 100k files into a text file, then pick up with the next 100k, and so on, until every file in this directory is represented in one of the roughly 25 text files?



Or, if someone has a better idea than rsync, I would love to hear it.







asked Mar 15 at 1:01 by 200mg











  • What command-line options are you using with rsync? Some can greatly impact performance. For example, some options require rsync to know the full file list before it transfers anything (see man rsync, search for --recursive). Use --delete-during (the default with --delete in recent versions of rsync) rather than --delete-before or --delete-after. Similarly, don't use --delay-updates or --prune-empty-dirs. You may also want to use -W (aka --whole-file) to turn off the delta-transfer algorithm and copy changed files in full, and maybe skip -z compression. (See the example sketch after these comments.)
    – cas
    Mar 15 at 4:49






  • rsync from local to local (and yes, an NFS mount counts as local) will not use the delta-transfer algorithm to speed up the copy. If you can access the filesystem of the NFS server directly rather than through the NFS mount, you may find things run much faster. If that is a possibility, provide what details you can.
    – roaima
    Mar 16 at 17:28
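
To illustrate the options mentioned in these comments, here is a minimal sketch of an invocation along those lines. The source and destination paths are placeholders, not anything given in the question:

$ rsync -a -W --delete-during /mnt/old-storage/bigdir/ /mnt/new-storage/bigdir/
# -a               archive mode: recurse and preserve permissions, ownership and timestamps
# -W               copy whole files instead of computing deltas (sensible for local/NFS copies)
# --delete-during  delete extraneous destination files during the transfer,
#                  rather than in a separate pass before or after it
# -z is deliberately left out: compression rarely helps on a local or LAN copy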

















1 Answer






To generate the 25 files you're looking for...



$ find /lots/of/files | split -d -l 100000


This will generate files of 100,000 lines each. There is quite a bit more you can do with split, so check its man page. With -d the pieces are numbered rather than lettered, so they will be named x00, x01, ... x24.



From here you can loop through the files and run rsync.



for file in x*
do
    # run the rsync command here, using "$file" as the list of files to transfer
done
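
As a sketch of one way to fill in that loop (an assumption, not the answer's prescribed command): rsync's --files-from option reads a list of paths to transfer, interpreted relative to the source directory given on the command line, so it helps to generate the lists as relative names by running find from inside the directory. The /tmp paths and the destination newhost:/new/volume/files/ below are hypothetical placeholders:

# build one big list of relative paths, then split it into 100k-line pieces
cd /lots/of/files
find . -type f > /tmp/all-files.txt

cd /tmp
split -d -l 100000 all-files.txt        # produces x00 .. x24

for file in x*
do
    # copy only the paths listed in "$file"; -a preserves permissions and timestamps
    # newhost:/new/volume/files/ is a placeholder for the new storage target
    rsync -a --files-from="$file" /lots/of/files/ newhost:/new/volume/files/
done

To run the pieces in parallel, the x* lists can be divided among several machines, or several of these rsync processes can be started in the background on one host, keeping in mind that too many parallel readers can swamp the NFS server.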


HTH







answered Mar 16 at 16:45 by Max Friederichs











  • This seems like an elegant solution, but what if we add one layer of complexity? About 25 more files are written to this location every day, accessed by a web app, and new files are the most likely to be accessed. How would you structure this to keep the web app's failed file lookups to a minimum? The rsync as you described would be a great start, but getting the list of new files would be quite slow with that many files, no? There could be hours, maybe days, when files written just prior to the switch-over to the new storage are not accessible while the delta file list is being built.
    – Art Hill
    Mar 21 at 16:11






  • Ironically, I also do not have enough reputation to comment on your answer... Keeping in mind this was not the OP's question: one idea would be to sort the files by date, so that the most recent files are rsync'd to the remote host first. (A sketch of that follows below.)
    – Max Friederichs
    Mar 21 at 16:56
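
As a rough sketch of that newest-first idea, assuming GNU find and reusing the split approach from the answer (the /tmp file names and paths are placeholders):

cd /lots/of/files
# print "<mtime> <path>" for every file, sort newest first, then drop the timestamp column
find . -type f -printf '%T@ %p\n' | sort -rn | cut -d' ' -f2- > /tmp/files-by-date.txt

cd /tmp
split -d -l 100000 files-by-date.txt    # x00 holds the newest 100,000 files, x01 the next, ...

Transferring x00 first would make the most recently written files available on the new storage as early as possible.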















