How can I determine if running tar will cause disk to fill up

If I run tar -cvf on a directory of size 937MB to create an easily downloadable copy of a deeply nested folder structure, do I risk filling the disk, given the following df -h output:



Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      7.9G  3.6G  4.3G  46% /
tmpfs           298M     0  298M   0% /dev/shm


Related questions:



  • If the disk might fill up, why? That is, what will Linux (Amazon AMI) and/or tar be
    doing under the hood?

  • How can I accurately determine this information myself without
    asking again?









tar disk-usage

asked Apr 10 '14 at 7:53 by codecowboy, edited Apr 10 '14 at 21:55 by Gilles

  • I'm not sure if it's possible without processing the archive, but you can play around with the --totals option. Either way, if you fill the disk up you can simply delete the archive, imho. To check all available options you could go through tar --help.

    – UVV
    Apr 10 '14 at 8:02

  • Tangentially: don't create the tar file as root. A certain percentage of space on the disk is set aside for root exclusively, exactly for the kind of "I've filled the disk and now I can't log in because that would write .bash_history or whatever" situation.

    – Ulrich Schwarz
    Apr 10 '14 at 9:01















6 Answers














tar -c data_dir | wc -c
without compression



or



tar -cz data_dir | wc -c
with gzip compression



or



tar -cj data_dir | wc -c
with bzip2 compression



will print the size of the archive that would be created in bytes, without writing to disk. You can then compare that to the amount of free space on your target device.
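
To make the comparison concrete, here is a minimal sketch (assuming GNU coreutils df, whose --output flag selects columns; data_dir and the target mount point / are placeholders):

est=$(tar -cz data_dir | wc -c)                 # estimated archive size in bytes
avail=$(df --output=avail -B1 / | tail -n 1)    # free bytes on the target filesystem
[ "$est" -lt "$avail" ] && echo "fits ($est of $avail bytes)" || echo "would fill the disk"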



You can check the size of the data directory itself, in case an incorrect assumption was made about its size, with the following command:



du -h --max-depth=1 data_dir



As already answered, tar adds a header to each record in the archive and also rounds up the size of each record to a multiple of 512 bytes (by default). The end of an archive is marked by at least two consecutive zero-filled records. So an uncompressed tar file is always larger than the files themselves; the number of files and how they align to 512-byte boundaries determines the extra space used.
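
To see the per-file overhead for yourself, a quick experiment (a sketch assuming GNU tar and GNU stat; the 10240-byte result reflects GNU tar's default blocking factor of twenty 512-byte records, to which the archive is zero-padded):

printf x > one_byte_file            # a file holding a single byte
tar -cf single.tar one_byte_file
stat -c %s single.tar               # 10240: one 512-byte header, 1 data byte padded
                                    # to 512, then the archive padded to 20 records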



Of course, filesystems themselves use block sizes that may be bigger than an individual file's contents, so be careful where you untar it: the filesystem may not be able to hold lots of small files even though it has free space greater than the tar size!



https://en.wikipedia.org/wiki/Tar_(computing)#Format_details






– answered Apr 10 '14 at 8:27 by FantasticJamieBurns, edited Apr 10 '14 at 20:10 by Andrew Medico

  • Thanks Jamie! What is '- mysql' doing here? Is that your filename?

    – codecowboy
    Apr 10 '14 at 8:50











  • Just changed that... it is the path to your data directory.

    – FantasticJamieBurns
    Apr 10 '14 at 8:50






  • Not that it really matters, but using the argument combination -f - to tar is redundant, since you can simply leave out the -f argument altogether to write the result to stdout (i.e. tar -c data_dir).

    – user8909
    Apr 10 '14 at 14:10

































The size of your tar file will be 937MB plus the size of the metadata needed for each file or directory (512 bytes per object), and padding added to align files to a 512-byte boundary.



A very rough calculation tells us that another copy of your data will leave you with 3.4GB free. In 3.4GB we have room for about 7 million metadata records, assuming no padding, or fewer if you assume an average of 256 bytes' padding per file. So if you have millions of files and directories to tar, you might run into problems.
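
A back-of-the-envelope check of those figures with shell arithmetic (the 3.4GB and the padding estimate come from the paragraph above):

avail=$(( 3400 * 1024 * 1024 ))     # roughly 3.4GB left after the copy
echo $(( avail / 512 ))             # 6963200: about 7 million records with no padding
echo $(( avail / (512 + 256) ))     # 4642133: fewer with ~256 bytes of padding each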



You could mitigate the problem by



  • compressing on the fly by using the z or j options to tar

  • doing the tar as a normal user so that the reserved space on the / partition won't be touched if you run out of space.





– answered Apr 10 '14 at 8:16 by Flup














    tar itself can report on the size of its archives with the --totals option:



    tar -cf - ./* | tar --totals -tvf -


    The above command writes nothing to disk and has the added benefit of listing the individual sizes of the files contained in the tarball. Adding the various z/j/xz options to either side of the pipe will handle compression as you will.



    OUTPUT:



    ...
    -rwxr-xr-x mikeserv/mikeserv 8 2014-03-13 20:58 ./somefile.sh
    -rwxr-xr-x mikeserv/mikeserv 62 2014-03-13 20:53 ./somefile.txt
    -rw-r--r-- mikeserv/mikeserv 574 2014-02-19 16:57 ./squash.sh
    -rwxr-xr-x mikeserv/mikeserv 35 2014-01-28 17:25 ./ssh.shortcut
    -rw-r--r-- mikeserv/mikeserv 51 2014-01-04 08:43 ./tab1.link
    -rw-r--r-- mikeserv/mikeserv 0 2014-03-16 05:40 ./tee
    -rw-r--r-- mikeserv/mikeserv 0 2014-04-08 10:00 ./typescript
    -rw-r--r-- mikeserv/mikeserv 159 2014-02-26 18:32 ./vlc_out.sh
    Total bytes read: 4300943360 (4.1GiB, 475MiB/s)
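
    As an aside (an addition of mine, not part of the original answer): if your tar is GNU tar, it special-cases an archive written to /dev/null and skips reading the file contents, so the following should report the would-be size almost instantly:

    tar --totals -cf /dev/null ./*
    # prints "Total bytes written: ..." without reading the file data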


    Not entirely sure of your purpose, but if it is to download the tarball, this might be more to the point:



    ssh you@host 'tar -cf - ./* | cat' | cat >./path/to/saved/local/tarball.tar


    Or to simply copy with tar:



    ssh you@host 'tar -cf - ./* | cat' | tar -C/path/to/download/tree/destination -vxf -
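
    As the comment exchange below suggests, you would probably add -z on both ends to compress the stream and save bandwidth mid-transfer, for example:

    ssh you@host 'tar -czf - ./* | cat' | tar -C/path/to/download/tree/destination -vxzf -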





    – answered Apr 10 '14 at 8:17 by mikeserv, edited Apr 10 '14 at 9:59

    • The reason I am doing this is that I believe the directory in question has caused the output of df -i to reach 99%. I want to keep a copy of the directory for further analysis but want to clear the space

      – codecowboy
      Apr 10 '14 at 8:19











    • @codecowboy In that case, you should definitely do something like the above first. It will tar then copy the tree to your local disk in a stream without saving anything to the remote disk at all, after which you can delete it from the remote host and restore it later. You should probably add -z for compression as goldilocks points out, to save on bandwidth mid-transfer.

      – mikeserv
      Apr 10 '14 at 8:24












    • @TAFKA'goldilocks' No, because it's 99% of inodes, not 99% of space.

      – Gilles
      Apr 10 '14 at 21:56











    • -i right, sorry!

      – goldilocks
      Apr 11 '14 at 12:29











    • @mikeserv your opening line mentions the --test option but you then don't seem to use it in your command which immediately follows (it uses --totals)

      – codecowboy
      Apr 30 '14 at 10:46
































    I have done a lot of research on this. You can do a test on the file with a word count, but it will not give you the same number as du -sb adir.



    tar -tvOf afile.tar | wc -c


    du counts every directory as 4096 bytes, and tar counts directories as 0 bytes. You have to add 4096 to each directory:



    $(( $(tar -tvOf afile.tar 2>&1 | grep '^d' | wc -l) * 4096 ))


    then you have to add all of the characters, for something that looks like this:



    $(( $(tar -tvOf afile.tar 2>&1 | grep '^d' | wc -l) * 4096 + $(tar -xOf afile.tar | wc -c) ))


    I am not sure if this is perfect since I didn't try files that have been touched (files of 0 bytes) or files that have 1 character. This should get you closer.
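
    Putting the pieces together as one small script (a sketch; afile.tar is the archive under test, and 4096 matches the per-directory size du reports on typical ext4 filesystems, as assumed above):

    #!/bin/sh
    # approximate du -sb for the tree stored inside a tar archive
    dirs=$(tar -tvf afile.tar | grep -c '^d')   # directories: 4096 bytes each in du
    bytes=$(tar -xOf afile.tar | wc -c)         # total file-content bytes
    echo $(( dirs * 4096 + bytes ))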






    – answered Apr 7 '16 at 21:28 by tass6773, edited Apr 7 '16 at 21:34 by slm














      -cvf does not include any compression, so doing that on a ~1 GB folder will result in a ~1 GB tar file (Flup's answer has more details about the additional size in the tar file, but note even if there are 10,000 files this is only 5 MB). Since you have 4+ GB free, no, you will not fill the partition.




      an easily downloadable copy




      Most people would consider "easier" synonymous with "smaller" in terms of downloading, so you should use some compression here. bzip2 should nowadays be available on any system with tar, I think, so including j in your switches is probably the best choice. z (gzip) is perhaps even more common, and there are other (less ubiquitous) possibilities with more squash.
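
      If you want to see what each compressor buys you before committing, a quick sketch (streams each variant into wc and writes nothing to disk; data_dir is a placeholder):

      for flag in '' z j; do
          printf 'tar -c%s:\t' "$flag"
          tar -c$flag data_dir | wc -c
      done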



      If you mean, does tar use additional disk space temporarily while performing the task: I am pretty sure it does not, for a few reasons. One, it dates back to a time when tape drives were a form of primary storage, and two, it has had decades to evolve (and I am certain it is not necessary to use temporary intermediate space, even if compression is involved).






      – answered Apr 10 '14 at 8:17 by goldilocks, edited Apr 10 '14 at 8:26














        If speed is important and compression is not needed, you can hook the syscall wrappers used by tar using LD_PRELOAD and change tar to calculate the size for us. By reimplementing a few of these functions to suit our needs (calculating the size of the potential output tar data), we are able to eliminate a lot of the reads and writes performed in normal operation of tar. This makes tar much faster, as it doesn't need to context-switch back and forth into the kernel anywhere near as much, and only the stat of the requested input files/folders needs to be read from disk instead of the actual file data.



        The code below includes implementations of the close, read, and write POSIX functions. The macro OUT_FD controls which file descriptor we expect tar to use as the output file. Currently it is set to stdout.



        read was changed to just return the success value of count bytes instead of filling buf with the data. Since the actual data is never read, buf would not contain valid data to pass on to compression, so if compression were used we would calculate an incorrect size.



        write was changed to sum the input count bytes into the global variable total and return the success value of count bytes when the file descriptor matches OUT_FD; otherwise it calls the original wrapper, acquired via dlsym, to perform the syscall of the same name.



        close still performs all of its original functionality, but if the file descriptor matches OUT_FD, it knows that tar is done writing the tar file, so the total is final and it prints it to stdout.



        #define _GNU_SOURCE
        #include <unistd.h>
        #include <stdio.h>
        #include <stdint.h>
        #include <inttypes.h>
        #include <stdlib.h>
        #include <errno.h>
        #include <dlfcn.h>
        #include <string.h>

        #define OUT_FD 1

        uint64_t total = 0;
        ssize_t (*original_write)(int, const void *, size_t) = NULL;
        int (*original_close)(int) = NULL;

        /* print the accumulated byte count */
        void print_total(void)
        {
            printf("%" PRIu64 "\n", total);
        }

        int close(int fd)
        {
            if (!original_close)
                original_close = dlsym(RTLD_NEXT, "close");
            if (fd == OUT_FD)
                print_total();   /* tar is done writing: the total is final */
            return original_close(fd);
        }

        /* pretend the read succeeded without filling buf:
           the file data itself is never needed */
        ssize_t read(int fd, void *buf, size_t count)
        {
            return count;
        }

        ssize_t write(int fd, const void *buf, size_t count)
        {
            if (!original_write)
                original_write = dlsym(RTLD_NEXT, "write");
            if (fd == OUT_FD) {
                total += count;  /* count the bytes tar would have written */
                return count;
            }
            return original_write(fd, buf, count);
        }
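
        For reference, a hypothetical build line (the repo linked below ships its own build script, so treat this as a sketch; tarsize.c is an assumed file name):

        gcc -shared -fPIC -o tarsize.so tarsize.c -ldl
        # the repo's tarsize.sh then runs tar with LD_PRELOAD pointing at this library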



        Benchmark comparing normal tar operation (with its disk reads and the full set of syscalls) against the LD_PRELOAD solution:



        $ time tar -c '/media/storage/music/Macintosh Plus- Floral Shoppe (2011) [Flac]/' | wc -c
        332308480
        real 0m0.457s
        user 0m0.064s
        sys 0m0.772s


        tarsize$ time ./tarsize.sh -c '/media/storage/music/Macintosh Plus- Floral Shoppe (2011) [Flac]/'
        332308480
        real 0m0.016s
        user 0m0.004s
        sys 0m0.008s


        The code above, a basic build script that builds it as a shared library, and a script using the "LD_PRELOAD technique" are provided in the repo:
        https://github.com/G4Vi/tarsize



        Some info on using LD_PRELOAD: https://rafalcieslak.wordpress.com/2013/04/02/dynamic-linker-tricks-using-ld_preload-to-cheat-inject-features-and-investigate-programs/































        • Code is good, if it works, but can you describe what it does?  Please do not respond in comments; edit your answer to make it clearer and more complete.

          – G-Man
          Jan 30 at 0:00










        Your Answer








        StackExchange.ready(function()
        var channelOptions =
        tags: "".split(" "),
        id: "106"
        ;
        initTagRenderer("".split(" "), "".split(" "), channelOptions);

        StackExchange.using("externalEditor", function()
        // Have to fire editor after snippets, if snippets enabled
        if (StackExchange.settings.snippets.snippetsEnabled)
        StackExchange.using("snippets", function()
        createEditor();
        );

        else
        createEditor();

        );

        function createEditor()
        StackExchange.prepareEditor(
        heartbeatType: 'answer',
        autoActivateHeartbeat: false,
        convertImagesToLinks: false,
        noModals: true,
        showLowRepImageUploadWarning: true,
        reputationToPostImages: null,
        bindNavPrevention: true,
        postfix: "",
        imageUploader:
        brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
        contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
        allowUrls: true
        ,
        onDemand: true,
        discardSelector: ".discard-answer"
        ,immediatelyShowMarkdownHelp:true
        );



        );













        draft saved

        draft discarded


















        StackExchange.ready(
        function ()
        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f124052%2fhow-can-i-determine-if-running-tar-will-cause-disk-to-fill-up%23new-answer', 'question_page');

        );

        Post as a guest















        Required, but never shown

























        6 Answers
        6






        active

        oldest

        votes








        6 Answers
        6






        active

        oldest

        votes









        active

        oldest

        votes






        active

        oldest

        votes









        23














        tar -c data_dir | wc -c
        without compression



        or



        tar -cz data_dir | wc -c
        with gzip compression



        or



        tar -cj data_dir | wc -c
        with bzip2 compression



        will print the size of the archive that would be created in bytes, without writing to disk. You can then compare that to the amount of free space on your target device.



        You can check the size of the data directory itself, in case an incorrect assumption was made about its size, with the following command:



        du -h --max-depth=1 data_dir



        As already answered, tar adds a header to each record in the archive and also rounds up the size of each record to a multiple of 512 bytes (by default). The end of an archive is marked by at least two consecutive zero-filled records. So it is always the case that you will have an uncompressed tar file larger than the files themselves, the number of files and how they align to 512 byte boundaries determines the extra space used.



        Of course, filesystems themselves use block sizes that maybe bigger than an individual file's contents so be careful where you untar it, the filesystem may not be able to hold lots of small files even though it has free space greater than the tar size!



        https://en.wikipedia.org/wiki/Tar_(computing)#Format_details






        share|improve this answer

























        • Thanks Jamie! What is '- mysql' doing here? Is that your filename?

          – codecowboy
          Apr 10 '14 at 8:50











        • Just changed that... it is the path to your data directory.

          – FantasticJamieBurns
          Apr 10 '14 at 8:50






        • 1





          Not that it really matters, but using the argument combination -f - to tar is redundant, since you can simply leave out the -f argument altogether to write the result to stdout (i.e. tar -c data_dir).

          – user8909
          Apr 10 '14 at 14:10
















        23














        tar -c data_dir | wc -c
        without compression



        or



        tar -cz data_dir | wc -c
        with gzip compression



        or



        tar -cj data_dir | wc -c
        with bzip2 compression



        will print the size of the archive that would be created in bytes, without writing to disk. You can then compare that to the amount of free space on your target device.



        You can check the size of the data directory itself, in case an incorrect assumption was made about its size, with the following command:



        du -h --max-depth=1 data_dir



        As already answered, tar adds a header to each record in the archive and also rounds up the size of each record to a multiple of 512 bytes (by default). The end of an archive is marked by at least two consecutive zero-filled records. So it is always the case that you will have an uncompressed tar file larger than the files themselves, the number of files and how they align to 512 byte boundaries determines the extra space used.



        Of course, filesystems themselves use block sizes that maybe bigger than an individual file's contents so be careful where you untar it, the filesystem may not be able to hold lots of small files even though it has free space greater than the tar size!



        https://en.wikipedia.org/wiki/Tar_(computing)#Format_details






        share|improve this answer

























        • Thanks Jamie! What is '- mysql' doing here? Is that your filename?

          – codecowboy
          Apr 10 '14 at 8:50











        • Just changed that... it is the path to your data directory.

          – FantasticJamieBurns
          Apr 10 '14 at 8:50






        • 1





          Not that it really matters, but using the argument combination -f - to tar is redundant, since you can simply leave out the -f argument altogether to write the result to stdout (i.e. tar -c data_dir).

          – user8909
          Apr 10 '14 at 14:10














        23












        23








        23







        tar -c data_dir | wc -c
        without compression



        or



        tar -cz data_dir | wc -c
        with gzip compression



        or



        tar -cj data_dir | wc -c
        with bzip2 compression



        will print the size of the archive that would be created in bytes, without writing to disk. You can then compare that to the amount of free space on your target device.



        You can check the size of the data directory itself, in case an incorrect assumption was made about its size, with the following command:



        du -h --max-depth=1 data_dir



        As already answered, tar adds a header to each record in the archive and also rounds up the size of each record to a multiple of 512 bytes (by default). The end of an archive is marked by at least two consecutive zero-filled records. So it is always the case that you will have an uncompressed tar file larger than the files themselves, the number of files and how they align to 512 byte boundaries determines the extra space used.



        Of course, filesystems themselves use block sizes that maybe bigger than an individual file's contents so be careful where you untar it, the filesystem may not be able to hold lots of small files even though it has free space greater than the tar size!



        https://en.wikipedia.org/wiki/Tar_(computing)#Format_details






        share|improve this answer















        tar -c data_dir | wc -c
        without compression



        or



        tar -cz data_dir | wc -c
        with gzip compression



        or



        tar -cj data_dir | wc -c
        with bzip2 compression



        will print the size of the archive that would be created in bytes, without writing to disk. You can then compare that to the amount of free space on your target device.



        You can check the size of the data directory itself, in case an incorrect assumption was made about its size, with the following command:



        du -h --max-depth=1 data_dir



        As already answered, tar adds a header to each record in the archive and also rounds up the size of each record to a multiple of 512 bytes (by default). The end of an archive is marked by at least two consecutive zero-filled records. So it is always the case that you will have an uncompressed tar file larger than the files themselves, the number of files and how they align to 512 byte boundaries determines the extra space used.



        Of course, filesystems themselves use block sizes that maybe bigger than an individual file's contents so be careful where you untar it, the filesystem may not be able to hold lots of small files even though it has free space greater than the tar size!



        https://en.wikipedia.org/wiki/Tar_(computing)#Format_details







        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Apr 10 '14 at 20:10









        Andrew Medico

        1486




        1486










        answered Apr 10 '14 at 8:27









        FantasticJamieBurnsFantasticJamieBurns

        34614




        34614












        • Thanks Jamie! What is '- mysql' doing here? Is that your filename?

          – codecowboy
          Apr 10 '14 at 8:50











        • Just changed that... it is the path to your data directory.

          – FantasticJamieBurns
          Apr 10 '14 at 8:50






        • 1





          Not that it really matters, but using the argument combination -f - to tar is redundant, since you can simply leave out the -f argument altogether to write the result to stdout (i.e. tar -c data_dir).

          – user8909
          Apr 10 '14 at 14:10


















        • Thanks Jamie! What is '- mysql' doing here? Is that your filename?

          – codecowboy
          Apr 10 '14 at 8:50











        • Just changed that... it is the path to your data directory.

          – FantasticJamieBurns
          Apr 10 '14 at 8:50






        • 1





          Not that it really matters, but using the argument combination -f - to tar is redundant, since you can simply leave out the -f argument altogether to write the result to stdout (i.e. tar -c data_dir).

          – user8909
          Apr 10 '14 at 14:10

















        Thanks Jamie! What is '- mysql' doing here? Is that your filename?

        – codecowboy
        Apr 10 '14 at 8:50





        Thanks Jamie! What is '- mysql' doing here? Is that your filename?

        – codecowboy
        Apr 10 '14 at 8:50













        Just changed that... it is the path to your data directory.

        – FantasticJamieBurns
        Apr 10 '14 at 8:50





        Just changed that... it is the path to your data directory.

        – FantasticJamieBurns
        Apr 10 '14 at 8:50




        1




        1





        Not that it really matters, but using the argument combination -f - to tar is redundant, since you can simply leave out the -f argument altogether to write the result to stdout (i.e. tar -c data_dir).

        – user8909
        Apr 10 '14 at 14:10






        Not that it really matters, but using the argument combination -f - to tar is redundant, since you can simply leave out the -f argument altogether to write the result to stdout (i.e. tar -c data_dir).

        – user8909
        Apr 10 '14 at 14:10














        6














        The size of your tar file will be 937MB plus the size of the metadata needed for each file or directory (512 bytes per object), and padding added to align files to a 512-byte boundary.



        A very rough calculation tells us that another copy of your data will leave you with 3.4GB free. In 3.4GB we have room for about 7 million metadata records, assuming no padding, or fewer if you assume an average of 256 bytes' padding per file. So if you have millions of files and directories to tar, you might run into problems.



        You could mitigate the problem by



        • compressing on the fly by using the z or j options to tar

        • doing the tar as a normal user so that the reserved space on the / partition won't be touched if you run out of space.





        share|improve this answer



























          6














          The size of your tar file will be 937MB plus the size of the metadata needed for each file or directory (512 bytes per object), and padding added to align files to a 512-byte boundary.



          A very rough calculation tells us that another copy of your data will leave you with 3.4GB free. In 3.4GB we have room for about 7 million metadata records, assuming no padding, or fewer if you assume an average of 256 bytes' padding per file. So if you have millions of files and directories to tar, you might run into problems.



          You could mitigate the problem by



          • compressing on the fly by using the z or j options to tar

          • doing the tar as a normal user so that the reserved space on the / partition won't be touched if you run out of space.





          share|improve this answer

























            6












            6








            6







            The size of your tar file will be 937MB plus the size of the metadata needed for each file or directory (512 bytes per object), and padding added to align files to a 512-byte boundary.



            A very rough calculation tells us that another copy of your data will leave you with 3.4GB free. In 3.4GB we have room for about 7 million metadata records, assuming no padding, or fewer if you assume an average of 256 bytes' padding per file. So if you have millions of files and directories to tar, you might run into problems.



            You could mitigate the problem by



            • compressing on the fly by using the z or j options to tar

            • doing the tar as a normal user so that the reserved space on the / partition won't be touched if you run out of space.





            share|improve this answer













            The size of your tar file will be 937MB plus the size of the metadata needed for each file or directory (512 bytes per object), and padding added to align files to a 512-byte boundary.



            A very rough calculation tells us that another copy of your data will leave you with 3.4GB free. In 3.4GB we have room for about 7 million metadata records, assuming no padding, or fewer if you assume an average of 256 bytes' padding per file. So if you have millions of files and directories to tar, you might run into problems.



            You could mitigate the problem by



            • compressing on the fly by using the z or j options to tar

            • doing the tar as a normal user so that the reserved space on the / partition won't be touched if you run out of space.






            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Apr 10 '14 at 8:16









            FlupFlup

            6,05912044




            6,05912044





















                2














                tar itself can report on the size of its archives with the --test option:



                tar -cf - ./* | tar --totals -tvf -


                The above command writes nothing to disk and has the added benefit of listing the individual filesizes of each file contained in the tarball. Adding the various z/j/xz operands to either side of the |pipe will handle compression as you will.



                OUTPUT:



                ...
                -rwxr-xr-x mikeserv/mikeserv 8 2014-03-13 20:58 ./somefile.sh
                -rwxr-xr-x mikeserv/mikeserv 62 2014-03-13 20:53 ./somefile.txt
                -rw-r--r-- mikeserv/mikeserv 574 2014-02-19 16:57 ./squash.sh
                -rwxr-xr-x mikeserv/mikeserv 35 2014-01-28 17:25 ./ssh.shortcut
                -rw-r--r-- mikeserv/mikeserv 51 2014-01-04 08:43 ./tab1.link
                -rw-r--r-- mikeserv/mikeserv 0 2014-03-16 05:40 ./tee
                -rw-r--r-- mikeserv/mikeserv 0 2014-04-08 10:00 ./typescript
                -rw-r--r-- mikeserv/mikeserv 159 2014-02-26 18:32 ./vlc_out.sh
                Total bytes read: 4300943360 (4.1GiB, 475MiB/s)


                Not entirely sure of your purpose, but if it is to download the tarball, this might be more to the point:



                ssh you@host 'tar -cf - ./* | cat' | cat >./path/to/saved/local/tarball.tar


                Or to simply copy with tar:



                ssh you@host 'tar -cf - ./* | cat' | tar -C/path/to/download/tree/destination -vxf -





                share|improve this answer

























                • The reason I am doing this is that I believe the directory in question has caused the output of df -i to reach 99%. I want to keep a copy of the directory for further analysis but want to clear the space

                  – codecowboy
                  Apr 10 '14 at 8:19











                • @codecowboy In that case, you should definitely do something like the above first. It will tar then copy the tree to your local disk in a stream without saving anything to the remote disk at all, after which you can delete it from the remote host and restore it later. You should probably add -z for compression as goldilocks points out, to save on bandwidth mid-transfer.

                  – mikeserv
                  Apr 10 '14 at 8:24












                • @TAFKA'goldilocks' No, because it's 99% of inodes, not 99% of space.

                  – Gilles
                  Apr 10 '14 at 21:56











                • -i right, sorry!

                  – goldilocks
                  Apr 11 '14 at 12:29











                • @mikeserv your opening line mentions the --test option but you then don't seem to use it in your command which immediately follows (it uses --totals)

                  – codecowboy
                  Apr 30 '14 at 10:46















                2














                tar itself can report on the size of its archives with the --test option:



                tar -cf - ./* | tar --totals -tvf -


                The above command writes nothing to disk and has the added benefit of listing the individual filesizes of each file contained in the tarball. Adding the various z/j/xz operands to either side of the |pipe will handle compression as you will.



                OUTPUT:



                ...
                -rwxr-xr-x mikeserv/mikeserv 8 2014-03-13 20:58 ./somefile.sh
                -rwxr-xr-x mikeserv/mikeserv 62 2014-03-13 20:53 ./somefile.txt
                -rw-r--r-- mikeserv/mikeserv 574 2014-02-19 16:57 ./squash.sh
                -rwxr-xr-x mikeserv/mikeserv 35 2014-01-28 17:25 ./ssh.shortcut
                -rw-r--r-- mikeserv/mikeserv 51 2014-01-04 08:43 ./tab1.link
                -rw-r--r-- mikeserv/mikeserv 0 2014-03-16 05:40 ./tee
                -rw-r--r-- mikeserv/mikeserv 0 2014-04-08 10:00 ./typescript
                -rw-r--r-- mikeserv/mikeserv 159 2014-02-26 18:32 ./vlc_out.sh
                Total bytes read: 4300943360 (4.1GiB, 475MiB/s)


                Not entirely sure of your purpose, but if it is to download the tarball, this might be more to the point:



                ssh you@host 'tar -cf - ./* | cat' | cat >./path/to/saved/local/tarball.tar


                Or to simply copy with tar:



                ssh you@host 'tar -cf - ./* | cat' | tar -C/path/to/download/tree/destination -vxf -





                share|improve this answer

























                • The reason I am doing this is that I believe the directory in question has caused the output of df -i to reach 99%. I want to keep a copy of the directory for further analysis but want to clear the space

                  – codecowboy
                  Apr 10 '14 at 8:19











                • @codecowboy In that case, you should definitely do something like the above first. It will tar then copy the tree to your local disk in a stream without saving anything to the remote disk at all, after which you can delete it from the remote host and restore it later. You should probably add -z for compression as goldilocks points out, to save on bandwidth mid-transfer.

                  – mikeserv
                  Apr 10 '14 at 8:24












                • @TAFKA'goldilocks' No, because it's 99% of inodes, not 99% of space.

                  – Gilles
                  Apr 10 '14 at 21:56











                • -i right, sorry!

                  – goldilocks
                  Apr 11 '14 at 12:29











                • @mikeserv your opening line mentions the --test option but you then don't seem to use it in your command which immediately follows (it uses --totals)

                  – codecowboy
                  Apr 30 '14 at 10:46













                2












                2








                2







                tar itself can report on the size of its archives with the --test option:



                tar -cf - ./* | tar --totals -tvf -


                The above command writes nothing to disk and has the added benefit of listing the individual filesizes of each file contained in the tarball. Adding the various z/j/xz operands to either side of the |pipe will handle compression as you will.



                OUTPUT:



                ...
                -rwxr-xr-x mikeserv/mikeserv 8 2014-03-13 20:58 ./somefile.sh
                -rwxr-xr-x mikeserv/mikeserv 62 2014-03-13 20:53 ./somefile.txt
                -rw-r--r-- mikeserv/mikeserv 574 2014-02-19 16:57 ./squash.sh
                -rwxr-xr-x mikeserv/mikeserv 35 2014-01-28 17:25 ./ssh.shortcut
                -rw-r--r-- mikeserv/mikeserv 51 2014-01-04 08:43 ./tab1.link
                -rw-r--r-- mikeserv/mikeserv 0 2014-03-16 05:40 ./tee
                -rw-r--r-- mikeserv/mikeserv 0 2014-04-08 10:00 ./typescript
                -rw-r--r-- mikeserv/mikeserv 159 2014-02-26 18:32 ./vlc_out.sh
                Total bytes read: 4300943360 (4.1GiB, 475MiB/s)


                Not entirely sure of your purpose, but if it is to download the tarball, this might be more to the point:



                ssh you@host 'tar -cf - ./* | cat' | cat >./path/to/saved/local/tarball.tar


                Or to simply copy with tar:



                ssh you@host 'tar -cf - ./* | cat' | tar -C/path/to/download/tree/destination -vxf -





                share|improve this answer















                tar itself can report on the size of its archives with the --test option:



                tar -cf - ./* | tar --totals -tvf -


                The above command writes nothing to disk and has the added benefit of listing the individual filesizes of each file contained in the tarball. Adding the various z/j/xz operands to either side of the |pipe will handle compression as you will.



                OUTPUT:



                ...
                -rwxr-xr-x mikeserv/mikeserv 8 2014-03-13 20:58 ./somefile.sh
                -rwxr-xr-x mikeserv/mikeserv 62 2014-03-13 20:53 ./somefile.txt
                -rw-r--r-- mikeserv/mikeserv 574 2014-02-19 16:57 ./squash.sh
                -rwxr-xr-x mikeserv/mikeserv 35 2014-01-28 17:25 ./ssh.shortcut
                -rw-r--r-- mikeserv/mikeserv 51 2014-01-04 08:43 ./tab1.link
                -rw-r--r-- mikeserv/mikeserv 0 2014-03-16 05:40 ./tee
                -rw-r--r-- mikeserv/mikeserv 0 2014-04-08 10:00 ./typescript
                -rw-r--r-- mikeserv/mikeserv 159 2014-02-26 18:32 ./vlc_out.sh
                Total bytes read: 4300943360 (4.1GiB, 475MiB/s)


                Not entirely sure of your purpose, but if it is to download the tarball, this might be more to the point:



                ssh you@host 'tar -cf - ./* | cat' | cat >./path/to/saved/local/tarball.tar


                Or to simply copy with tar:



                ssh you@host 'tar -cf - ./* | cat' | tar -C/path/to/download/tree/destination -vxf -






                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Apr 10 '14 at 9:59

























                answered Apr 10 '14 at 8:17









                mikeservmikeserv

                45.7k668159




                45.7k668159












                • The reason I am doing this is that I believe the directory in question has caused the output of df -i to reach 99%. I want to keep a copy of the directory for further analysis but want to clear the space

                  – codecowboy
                  Apr 10 '14 at 8:19











                • @codecowboy In that case, you should definitely do something like the above first. It will tar then copy the tree to your local disk in a stream without saving anything to the remote disk at all, after which you can delete it from the remote host and restore it later. You should probably add -z for compression as goldilocks points out, to save on bandwidth mid-transfer.

                  – mikeserv
                  Apr 10 '14 at 8:24












                • @TAFKA'goldilocks' No, because it's 99% of inodes, not 99% of space.

                  – Gilles
                  Apr 10 '14 at 21:56











                • -i right, sorry!

                  – goldilocks
                  Apr 11 '14 at 12:29











                • @mikeserv your opening line mentions the --test option but you then don't seem to use it in your command which immediately follows (it uses --totals)

                  – codecowboy
                  Apr 30 '14 at 10:46

















                • The reason I am doing this is that I believe the directory in question has caused the output of df -i to reach 99%. I want to keep a copy of the directory for further analysis but want to clear the space

                  – codecowboy
                  Apr 10 '14 at 8:19











                • @codecowboy In that case, you should definitely do something like the above first. It will tar then copy the tree to your local disk in a stream without saving anything to the remote disk at all, after which you can delete it from the remote host and restore it later. You should probably add -z for compression as goldilocks points out, to save on bandwidth mid-transfer.

                  – mikeserv
                  Apr 10 '14 at 8:24












                • @TAFKA'goldilocks' No, because it's 99% of inodes, not 99% of space.

                  – Gilles
                  Apr 10 '14 at 21:56











                • -i right, sorry!

                  – goldilocks
                  Apr 11 '14 at 12:29











                • @mikeserv your opening line mentions the --test option but you then don't seem to use it in your command which immediately follows (it uses --totals)

                  – codecowboy
                  Apr 30 '14 at 10:46
















                The reason I am doing this is that I believe the directory in question has caused the output of df -i to reach 99%. I want to keep a copy of the directory for further analysis but want to clear the space

                – codecowboy
                Apr 10 '14 at 8:19





                The reason I am doing this is that I believe the directory in question has caused the output of df -i to reach 99%. I want to keep a copy of the directory for further analysis but want to clear the space

                – codecowboy
                Apr 10 '14 at 8:19













                @codecowboy In that case, you should definitely do something like the above first. It will tar then copy the tree to your local disk in a stream without saving anything to the remote disk at all, after which you can delete it from the remote host and restore it later. You should probably add -z for compression as goldilocks points out, to save on bandwidth mid-transfer.

                – mikeserv
                Apr 10 '14 at 8:24






                @codecowboy In that case, you should definitely do something like the above first. It will tar then copy the tree to your local disk in a stream without saving anything to the remote disk at all, after which you can delete it from the remote host and restore it later. You should probably add -z for compression as goldilocks points out, to save on bandwidth mid-transfer.

                – mikeserv
                Apr 10 '14 at 8:24














                @TAFKA'goldilocks' No, because it's 99% of inodes, not 99% of space.

                – Gilles
                Apr 10 '14 at 21:56





                @TAFKA'goldilocks' No, because it's 99% of inodes, not 99% of space.

                – Gilles
                Apr 10 '14 at 21:56













                -i right, sorry!

                – goldilocks
                Apr 11 '14 at 12:29





                -i right, sorry!

                – goldilocks
                Apr 11 '14 at 12:29













                @mikeserv your opening line mentions the --test option but you then don't seem to use it in your command which immediately follows (it uses --totals)

                – codecowboy
                Apr 30 '14 at 10:46





                @mikeserv your opening line mentions the --test option but you then don't seem to use it in your command which immediately follows (it uses --totals)

                – codecowboy
                Apr 30 '14 at 10:46











                2














                I have done a lot of research on this. You can do a test on the file with a word count but it will not give you the same number number as a du -sb adir.



                tar -tvOf afile.tar | wc -c


                du counts every directory as 4096 bytes, and tar counts directories as 0 bytes. You have to add 4096 to each directory:



                $(( $(tar -tvOf afile.tar 2>&1 | grep '^d' | wc -l) * 4096)))


                then you have have to add all of the characters. For something that looks like this:



                $(( $(tar -tvOf afile.tar 2>&1 | grep '^d' | wc -l) * 4096 + $(tar -xOf afile.tar | wc -c) ))


                I am not sure if this is perfect since I didn't try files that have been touched (files of 0 bytes) or files that have 1 character. This should get you closer.






                share|improve this answer





























                  2














                  I have done a lot of research on this. You can do a test on the file with a word count but it will not give you the same number number as a du -sb adir.



                  tar -tvOf afile.tar | wc -c


                  du counts every directory as 4096 bytes, and tar counts directories as 0 bytes. You have to add 4096 to each directory:



                  $(( $(tar -tvOf afile.tar 2>&1 | grep '^d' | wc -l) * 4096)))


                  then you have have to add all of the characters. For something that looks like this:



                  $(( $(tar -tvOf afile.tar 2>&1 | grep '^d' | wc -l) * 4096 + $(tar -xOf afile.tar | wc -c) ))


                  I am not sure if this is perfect since I didn't try files that have been touched (files of 0 bytes) or files that have 1 character. This should get you closer.






                  share|improve this answer



























                    2












                    2








                    2







                    I have done a lot of research on this. You can do a test on the file with a word count but it will not give you the same number number as a du -sb adir.



                    tar -tvOf afile.tar | wc -c


                    du counts every directory as 4096 bytes, and tar counts directories as 0 bytes. You have to add 4096 to each directory:



                    $(( $(tar -tvOf afile.tar 2>&1 | grep '^d' | wc -l) * 4096)))


                    then you have have to add all of the characters. For something that looks like this:



                    $(( $(tar -tvOf afile.tar 2>&1 | grep '^d' | wc -l) * 4096 + $(tar -xOf afile.tar | wc -c) ))


                    I am not sure if this is perfect since I didn't try files that have been touched (files of 0 bytes) or files that have 1 character. This should get you closer.






                    share|improve this answer















                    I have done a lot of research on this. You can do a test on the file with a word count but it will not give you the same number number as a du -sb adir.



                    tar -tvOf afile.tar | wc -c


                    du counts every directory as 4096 bytes, and tar counts directories as 0 bytes. You have to add 4096 to each directory:



                    $(( $(tar -tvOf afile.tar 2>&1 | grep '^d' | wc -l) * 4096)))


                    then you have have to add all of the characters. For something that looks like this:



                    $(( $(tar -tvOf afile.tar 2>&1 | grep '^d' | wc -l) * 4096 + $(tar -xOf afile.tar | wc -c) ))


                    I am not sure if this is perfect since I didn't try files that have been touched (files of 0 bytes) or files that have 1 character. This should get you closer.







                    share|improve this answer














                    share|improve this answer



                    share|improve this answer








                    edited Apr 7 '16 at 21:34









                    slm

                    251k69529685




                    251k69529685










                    answered Apr 7 '16 at 21:28









                    tass6773tass6773

                    211




                    211





















                        1














                        -cvf does not include any compression, so doing that on a ~1 GB folder will result in a ~1 GB tar file (Flub's answer has more details about the additional size in the tar file, but note even if there are 10,000 files this is only 5 MB). Since you have 4+ GB free, no you will not fill the partition.




                        an easily downloadable copy




                        Most people would consider "easier" synonymous with "smaller" in terms of downloading, so you should use some compression here. bzip2 should now-a-days be available on any system w/ tar, I think, so including j in your switches is probably the best choice. z (gzip) is perhaps even more common, and there are other (less ubiquitous) possibilities with more squash.



                        If you mean, does tar use additional disk space temporarily in performing the task, I am pretty sure it does not for a few reasons, one being it dates back to a time when tape drives were a form of primary storage, and two being it has had decades to evolve (and I am certain it is not necessary to use temporary intermediate space, even if compression is involved).






                        share|improve this answer





























                          1














                          -cvf does not include any compression, so doing that on a ~1 GB folder will result in a ~1 GB tar file (Flub's answer has more details about the additional size in the tar file, but note even if there are 10,000 files this is only 5 MB). Since you have 4+ GB free, no you will not fill the partition.




                          an easily downloadable copy




                          Most people would consider "easier" synonymous with "smaller" in terms of downloading, so you should use some compression here. bzip2 should now-a-days be available on any system w/ tar, I think, so including j in your switches is probably the best choice. z (gzip) is perhaps even more common, and there are other (less ubiquitous) possibilities with more squash.



                          If you mean, does tar use additional disk space temporarily in performing the task, I am pretty sure it does not for a few reasons, one being it dates back to a time when tape drives were a form of primary storage, and two being it has had decades to evolve (and I am certain it is not necessary to use temporary intermediate space, even if compression is involved).






                          share|improve this answer



























                            1












                            1








                            1







                            -cvf does not include any compression, so doing that on a ~1 GB folder will result in a ~1 GB tar file (Flub's answer has more details about the additional size in the tar file, but note even if there are 10,000 files this is only 5 MB). Since you have 4+ GB free, no you will not fill the partition.




                            an easily downloadable copy




                            Most people would consider "easier" synonymous with "smaller" in terms of downloading, so you should use some compression here. bzip2 should now-a-days be available on any system w/ tar, I think, so including j in your switches is probably the best choice. z (gzip) is perhaps even more common, and there are other (less ubiquitous) possibilities with more squash.



                            If you mean, does tar use additional disk space temporarily in performing the task, I am pretty sure it does not for a few reasons, one being it dates back to a time when tape drives were a form of primary storage, and two being it has had decades to evolve (and I am certain it is not necessary to use temporary intermediate space, even if compression is involved).














edited Apr 10 '14 at 8:26 · answered Apr 10 '14 at 8:17 – goldilocks



































If speed is important and compression is not needed, you can hook the syscall wrappers that tar uses via LD_PRELOAD, turning tar itself into a size calculator. By reimplementing a few of these functions to calculate the size of the would-be output instead of producing it, we eliminate most of the reading and writing that tar performs in normal operation. This makes tar much faster: it no longer context-switches into the kernel nearly as often, and only the metadata (stat) of the requested input files/folders is read from disk instead of the actual file data.



The code below includes implementations of the close, read, and write POSIX functions. The macro OUT_FD controls which file descriptor we expect tar to use for the output file; currently it is set to 1 (stdout).



read was changed to simply return count (the success value) instead of filling buf with data. Since no actual data is read, buf would not contain valid input for a compressor, so if compression were used we would calculate an incorrect size.



write was changed to add count to the global variable total and return count (the success value) when the file descriptor matches OUT_FD; otherwise it calls the original wrapper, acquired via dlsym, to perform the real syscall.



close still performs all of its original functionality, but when the file descriptor matches OUT_FD it knows that tar has finished writing the tar file, so the total is final and it prints it to stdout.



#define _GNU_SOURCE
#include <unistd.h>
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <stdlib.h>
#include <errno.h>
#include <dlfcn.h>
#include <string.h>

/* File descriptor tar is expected to write the archive to (stdout). */
#define OUT_FD 1

uint64_t total = 0;
ssize_t (*original_write)(int, const void *, size_t) = NULL;
int (*original_close)(int) = NULL;

void print_total(void)
{
    printf("%" PRIu64 "\n", total);
}

int close(int fd)
{
    if (!original_close) {
        original_close = dlsym(RTLD_NEXT, "close");
    }
    if (fd == OUT_FD) {
        /* tar has finished writing the archive; the count is final. */
        print_total();
    }
    return original_close(fd);
}

ssize_t read(int fd, void *buf, size_t count)
{
    /* Pretend the read succeeded without touching the file data. */
    return count;
}

ssize_t write(int fd, const void *buf, size_t count)
{
    if (!original_write) {
        original_write = dlsym(RTLD_NEXT, "write");
    }
    if (fd == OUT_FD) {
        /* Count the bytes tar would have written, but discard them. */
        total += count;
        return count;
    }
    return original_write(fd, buf, count);
}


Benchmark comparing normal tar operation, with its real disk reads and full set of syscalls, against the LD_PRELOAD solution:



$ time tar -c "/media/storage/music/Macintosh Plus- Floral Shoppe (2011) [Flac]/" | wc -c
332308480
real 0m0.457s
user 0m0.064s
sys 0m0.772s

tarsize$ time ./tarsize.sh -c "/media/storage/music/Macintosh Plus- Floral Shoppe (2011) [Flac]/"
332308480
real 0m0.016s
user 0m0.004s
sys 0m0.008s


The code above, a basic build script that compiles it as a shared library, and a wrapper script applying the LD_PRELOAD technique are provided in the repo:
https://github.com/G4Vi/tarsize



                                Some info on using LD_PRELOAD: https://rafalcieslak.wordpress.com/2013/04/02/dynamic-linker-tricks-using-ld_preload-to-cheat-inject-features-and-investigate-programs/
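For orientation, here is a rough sketch of what building and invoking such a preload library generally looks like. The file names tarsize.c and tarsize.so are assumptions, and the repo's actual scripts may differ (for instance in how they arrange file descriptors):

# Compile the hooks as a position-independent shared object linked against libdl
# (file names are hypothetical, not necessarily what the repo uses).
gcc -shared -fPIC -o tarsize.so tarsize.c -ldl
# Preload it so tar's read/write/close calls go through the hooks; only the
# computed byte count should ultimately be reported.
LD_PRELOAD=./tarsize.so tar -c /path/to/folder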































edited Jan 30 at 1:59 · answered Jan 29 at 23:00 – G4Vi

Code is good, if it works, but can you describe what it does? Please do not respond in comments; edit your answer to make it clearer and more complete.
– G-Man
Jan 30 at 0:00

























