Backing up to a local SSD taking a lot longer than expected

On a fresh Ubuntu 18.04 system I decided to give Deja-Dup a go. It's just a GUI front-end for Duplicity (which uses the rsync delta algorithm via librsync). About 850GB of data needed to be backed up. The source SSD was NVMe and the destination SSD was SATA. The initial (full) backup took about 7 hours.
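
(For reference, an equivalent direct Duplicity invocation for a local, unencrypted backup would look roughly like the line below; the paths are placeholders and exact options can vary between Duplicity versions.)

$ duplicity --no-encryption /home/tim file:///media/tim/BackupDrive/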



The next day I did nothing more than check mail and install an application — which added ~230MB — before running Deja-Dup again.



Deja-Dup ran for ~39 minutes loading a single core at 35–70% for the entire duration.



Duplicity was invoked three times:



  • The Scanning phase pegged a core at 100% for 18 minutes.

  • The Backing Up phase pegged a core at 100% for 16 minutes.

  • The Verifying phase pegged a core at 100% for 5 minutes.

This is on a new computer with plenty of RAM. The backup was not encrypted.



Now, I expect initial (full) backups to take a while, and that's fine. What I don't expect is for ~230MB of new data to take ~39 minutes to be backed up (and consume over a collective core-hour of CPU time).



Is something wrong or broken? Should incremental backups of a couple of hundred megabytes take that much time to perform? I was expecting something under 5 minutes, not ~39. Why is it taking so long?



(If I had a spare 1TB SATA SSD I'd hook it up and just rsync the data straight across to see if that is noticeably faster — but unfortunately I do not.)



Update 1: I ran a manual backup after negligible changes (a few KB of new mail) and the time taken was the same (~39 minutes). Thus the time taken seems to have little to do with the amount of new data that needs to be backed up.



Update 2: Monitoring with iotop revealed that the Scanning phase reads 7.36GB from the drive. That's obviously not the whole 850GB, but it's not far from the number you get by multiplying the number of files on the source drive (1,174,000) by the block/cluster size (4096 bytes), i.e. 4.81GB. I'm not sure how to account for the remaining 2.5GB, though, if that were the case.
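
(For anyone wanting to reproduce the measurement: accumulated per-process I/O totals can be watched while the backup runs with something like the command below; flag support may vary with the iotop version.)

$ sudo iotop -aoP   # -a accumulate totals, -o only show processes doing I/O, -P per-process view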







  • Perhaps it made a 2nd full backup? I've never used the GUI. Try looking in the backup directory to see what new files it created. I think you should see `duplicity-inc.*` files for the incremental backup.
    – meuh
    Apr 29 at 8:33










  • Yep, I've got the duplicity-inc.* files for today and they seem to be about the right size (~15% compression). The rest is dated yesterday. The destination SSD is only 1TB, so it couldn't fit two full backups without overwriting (parts of) the first anyway. So it looks like the programs did the job correctly. I'm just concerned about how long it took them to do it.
    – Tim
    Apr 29 at 8:49







  • I'm not sure how duplicity works when doing a comparison. Whereas rsync is typically configured to just compare timestamps and sizes, I think duplicity recomputes a "signature" for each file, which involves reading the whole file and comparing it with the saved signature in the backup. If so, the scan/backup is having to read all 850GB of your disk to collect the signatures.
    – meuh
    Apr 29 at 9:14










  • If that's the case, then won't the atime on every file be modified and, because they're on SSD, won't that then trigger the rewriting of all 850GB in the process? (Or at least the first inode of each file, of which there are ~1.1 million?)
    – Tim
    Apr 29 at 20:02











  • Linux has optimised atime, see man 8 mount for relatime etc. Also, some backup programs will explicitly call utimes() to restore the atime after reading the file. I don't know what duplicity does.
    – meuh
    Apr 29 at 20:38














asked Apr 29 at 8:10, last edited Apr 30 at 3:14 – Tim











1 Answer
I can confirm that the Deja-Dup/Duplicity combination simply makes the backup process atrociously slow: roughly 156× slower in my case (~39 minutes versus ~15 seconds for a comparable incremental run).



I ended up wiping the Deja-Dup/Duplicity backup and just went with pure rsync. Backups are now taking as little as 15 seconds.



$ rsync -a --delete /home/tim /media/tim/BackupDrive/


Unless you really need the simplicity that Deja-Dup provides or the extra features that Duplicity provides, don't bother: you're just wasting CPU cycles, time, and electricity, causing unnecessary wear on your drive, and ending up with a more fragile backup. Deja-Dup/Duplicity is a horribly inefficient way to simply back up files from one local drive to another.



For simple, automated, local backups, all you need is an rsync entry in your crontab.
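
A minimal sketch of such an entry (added via crontab -e; the 03:00 schedule and log path below are placeholders, and the source/destination paths mirror the rsync command above):

# m h dom mon dow  command
0 3 * * * rsync -a --delete /home/tim /media/tim/BackupDrive/ >> $HOME/backup-rsync.log 2>&1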






answered May 27 at 15:36, last edited May 27 at 15:42 – Tim (accepted)






















             
