Rsync for migrating a very large NFS share
I wanted to get input on how to break up an rsync task into multiple parts so that building the change list goes faster. The situation: we are migrating from one storage platform to a new one, and we have one large flat directory with 2.5 million files in it. The rsync change list currently takes days to build. I would like to split the listing into several text files of maybe 100k files each and then run several rsync tasks against those lists, possibly from different servers.
My shell scripting game is pretty weak. Does anyone know how to do an 'ls' for the first 100k files and pipe that to a text file, then pick back up with the next 100k files, and so on until every file in this directory is represented in one of the 25 text files?
Or, if someone has a better idea than rsync, I would love to hear it.
linux rsync nfs
asked Mar 15 at 1:01 by 200mg (575)
What command-line options are you using with rsync? Some can greatly impact performance, e.g. some options require rsync to know the full file list before transferring anything (see man rsync, search for --recursive). Use --delete-during (should be the default with --delete for recent versions of rsync) rather than --delete-before or --delete-after. Similarly, don't use --delay-updates or --prune-empty-dirs. You may also want to use -W aka --whole-file to turn off the file-diff algorithm (i.e. use timestamps only), and maybe don't use -z for compression.
– cas, Mar 15 at 4:49
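For reference, a minimal sketch of an invocation following that advice (the mount points below are placeholders, not taken from the question):

# whole-file transfers, deletions done during the run, no -z since both ends
# are local mounts; source and destination paths are illustrative only
rsync -a -W --delete-during /mnt/old-share/ /mnt/new-share/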
rsync local to local (and yes, an NFS mount is local) will not use the delta algorithm to speed up the transfer. If you can access the filesystem of the NFS server directly rather than through the NFS mount, you may find things run much faster. If this is a possibility, provide what details you can.
– roaima, Mar 16 at 17:28
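If that kind of direct access exists, one hedged sketch is to run rsync over SSH between the storage hosts themselves rather than copying through the NFS client (the hostname and paths below are hypothetical):

# push straight from the old storage server to the new one, bypassing the
# NFS mount on the client entirely
rsync -a /export/share/ newstorage:/export/share/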
1 Answer
To generate the 25 files you're looking for:
$ find /lots/of/files | split -d -l 100000
This generates files with 100,000 lines each. There are quite a few more things you can do with split, so check out the man page. With -d the output files are named numerically instead of alphabetically, as in x00, x01, ... x24.
From here you can loop through the files and run rsync:
for file in x*
do
    # run the rsync command here, using "$file" as the change list
done
HTH
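The rsync command inside the loop is left open above; one possible way to fill it in is rsync's --files-from option. Note that --files-from treats the listed names as relative to the source directory, so it is easiest to generate the lists from inside the share. Everything below, including the paths, is an illustrative sketch rather than the answerer's exact setup:

# build the lists with paths relative to the share root
cd /mnt/old-share && find . | split -d -l 100000 - /tmp/lists/filelist.

# one rsync per list; a few can be started in parallel with '&', or different
# lists can be handed to different servers
for file in /tmp/lists/filelist.*
do
    rsync -a --files-from="$file" /mnt/old-share/ /mnt/new-share/ &
done
wait    # block until all background rsync jobs have finished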
answered Mar 16 at 16:45 by Max Friederichs (1462) – accepted
This seems like an elegant solution, but what if we add one layer of complexity? About 25 more files are written to this location every day and accessed by a web app, and new files are more likely to be accessed. To keep the web app's failed file lookups to a minimum, how would you structure this? The rsync as you described would be a great start, but getting the list of new files would be quite slow with that many files, no? There would be hours, maybe days, when the files written just prior to the switch-over to the new storage are not accessible while the delta file list is created.
– Art Hill, Mar 21 at 16:11
Ironically, I also do not have enough reputation to comment on your answer... Keeping in mind this was not the OP's question, one idea would be to sort the files by date, such that the most recent files are rsync'd to the remote host first.
– Max Friederichs, Mar 21 at 16:56
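A hedged sketch of that idea, assuming GNU find and coreutils and reusing the same placeholder paths as above: list the files newest-first before splitting, so the earliest batches carry the most recently written files.

# print "mtime path" for every file, sort newest first, strip the timestamp,
# then split into 100k-line lists as in the answer (assumes no newlines in names)
find /mnt/old-share -type f -printf '%T@ %p\n' |
    sort -rn |
    cut -d' ' -f2- |
    split -d -l 100000 - /tmp/lists/filelist.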