How can I determine if running tar will cause disk to fill up
If I run tar -cvf on a directory of size 937 MB to create an easily downloadable copy of a deeply nested folder structure, do I risk filling the disk, given the following df -h output?

/dev/xvda1 7.9G 3.6G 4.3G 46% /
tmpfs 298M 0 298M 0% /dev/shm

Related questions:
- If the disk might fill up, why? That is, what will Linux (Amazon AMI) and/or tar be doing under the hood?
- How can I accurately determine this information myself without asking again?

Tags: tar, disk-usage
asked Apr 10 '14 at 7:53 by codecowboy; edited Apr 10 '14 at 21:55 by Gilles
I'm not sure if it's possible without processing the archive, but you can play around with the --totals option. Either way, if you fill the disk up you can simply delete the archive, imho. To check all options available you could go through tar --help. – UVV, Apr 10 '14 at 8:02
Tangentially: don't create the tarfile as root; a certain percentage of space on the disk is set aside for root exclusively, exactly for the kind of "I've filled the disk and now I can't log in because that would write .bash_history or whatever" situation. – Ulrich Schwarz, Apr 10 '14 at 9:01
6 Answers
tar -c data_dir | wc -c
without compression
or
tar -cz data_dir | wc -c
with gzip compression
or
tar -cj data_dir | wc -c
with bzip2 compression
will print the size, in bytes, of the archive that would be created, without writing anything to disk. You can then compare that to the amount of free space on your target device.
You can check the size of the data directory itself, in case an incorrect assumption was made about its size, with the following command:
du -h --max-depth=1 data_dir
As already answered, tar adds a header to each record in the archive and also rounds the size of each record up to a multiple of 512 bytes (by default). The end of an archive is marked by at least two consecutive zero-filled records. So an uncompressed tar file is always larger than the files themselves; the number of files, and how they align to 512-byte boundaries, determines the extra space used.
Of course, filesystems themselves use block sizes that may be bigger than an individual file's contents, so be careful where you untar it: the filesystem may not be able to hold lots of small files even though it has free space greater than the tar size!
https://en.wikipedia.org/wiki/Tar_(computing)#Format_details
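To get a concrete feel for the 512-byte rounding described above, here is a minimal sketch you can run in a scratch directory (the demo directory and file names are hypothetical, and the output assumes GNU tar's defaults):

mkdir demo
printf 'hello' > demo/tiny   # 5 bytes of file data
tar -c demo | wc -c          # prints 10240 with GNU tar's defaults

The 5-byte file costs one 512-byte header plus one 512-byte data record, and the directory costs another header; the rest is the two zero-filled end-of-archive records plus padding up to GNU tar's default 10 KiB record size.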
answered Apr 10 '14 at 8:27 by FantasticJamieBurns; edited Apr 10 '14 at 20:10 by Andrew Medico
Thanks Jamie! What is '- mysql' doing here? Is that your filename? – codecowboy, Apr 10 '14 at 8:50
Just changed that... it is the path to your data directory. – FantasticJamieBurns, Apr 10 '14 at 8:50
Not that it really matters, but using the argument combination -f - to tar is redundant, since you can simply leave out the -f argument altogether to write the result to stdout (i.e. tar -c data_dir). – user8909, Apr 10 '14 at 14:10
The size of your tar file will be 937MB plus the size of the metadata needed for each file or directory (512 bytes per object), and padding added to align files to a 512-byte boundary.
A very rough calculation tells us that another copy of your data will leave you with 3.4GB free. In 3.4GB we have room for about 7 million metadata records, assuming no padding, or fewer if you assume an average of 256 bytes' padding per file. So if you have millions of files and directories to tar, you might run into problems.
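To reproduce that rough estimate with shell arithmetic (a sketch; it takes the 937 MB directory size and the 4.3G free column from the df -h output above, treating both as binary units):

echo $(( (4300 - 937) * 1024 * 1024 / 512 ))   # ~6.9 million 512-byte metadata records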
You could mitigate the problem by:
- compressing on the fly by using the z or j options to tar
- doing the tar as a normal user so that the reserved space on the / partition won't be touched if you run out of space.

answered Apr 10 '14 at 8:16 by Flup
tar itself can report on the size of its archives with the --test option:
tar -cf - ./* | tar --totals -tvf -
The above command writes nothing to disk and has the added benefit of listing the individual file sizes of each file contained in the tarball. Adding the various z/j/xz options to either side of the pipe will handle compression as you will.
OUTPUT:
...
-rwxr-xr-x mikeserv/mikeserv 8 2014-03-13 20:58 ./somefile.sh
-rwxr-xr-x mikeserv/mikeserv 62 2014-03-13 20:53 ./somefile.txt
-rw-r--r-- mikeserv/mikeserv 574 2014-02-19 16:57 ./squash.sh
-rwxr-xr-x mikeserv/mikeserv 35 2014-01-28 17:25 ./ssh.shortcut
-rw-r--r-- mikeserv/mikeserv 51 2014-01-04 08:43 ./tab1.link
-rw-r--r-- mikeserv/mikeserv 0 2014-03-16 05:40 ./tee
-rw-r--r-- mikeserv/mikeserv 0 2014-04-08 10:00 ./typescript
-rw-r--r-- mikeserv/mikeserv 159 2014-02-26 18:32 ./vlc_out.sh
Total bytes read: 4300943360 (4.1GiB, 475MiB/s)
Not entirely sure of your purpose, but if it is to download the tarball, this might be more to the point:
ssh you@host 'tar -cf - ./* | cat' | cat >./path/to/saved/local/tarball.tar
Or to simply copy with tar:
ssh you@host 'tar -cf - ./* | cat' | tar -C/path/to/download/tree/destination -vxf -
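If bandwidth matters more than CPU, the same stream copy works with compression on the wire (a sketch; you@host and the paths are the placeholders used above):

ssh you@host 'tar -czf - ./* | cat' | tar -C/path/to/download/tree/destination -vzxf -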
answered Apr 10 '14 at 8:17 by mikeserv; edited Apr 10 '14 at 9:59
The reason I am doing this is that I believe the directory in question has caused the output of df -i to reach 99%. I want to keep a copy of the directory for further analysis but want to clear the space. – codecowboy, Apr 10 '14 at 8:19
@codecowboy In that case, you should definitely do something like the above first. It will tar then copy the tree to your local disk in a stream without saving anything to the remote disk at all, after which you can delete it from the remote host and restore it later. You should probably add -z for compression as goldilocks points out, to save on bandwidth mid-transfer. – mikeserv, Apr 10 '14 at 8:24
@TAFKA'goldilocks' No, because it's 99% of inodes, not 99% of space. – Gilles, Apr 10 '14 at 21:56
-i right, sorry! – goldilocks, Apr 11 '14 at 12:29
@mikeserv your opening line mentions the --test option but you then don't seem to use it in your command which immediately follows (it uses --totals). – codecowboy, Apr 30 '14 at 10:46
I have done a lot of research on this. You can do a test on the file with a word count, but it will not give you the same number as du -sb adir.
tar -tvOf afile.tar | wc -c
du counts every directory as 4096 bytes, and tar counts directories as 0 bytes. You have to add 4096 to each directory:
$(( $(tar -tvOf afile.tar 2>&1 | grep '^d' | wc -l) * 4096 ))
Then you have to add all of the characters, ending up with something that looks like this:
$(( $(tar -tvOf afile.tar 2>&1 | grep '^d' | wc -l) * 4096 + $(tar -xOf afile.tar | wc -c) ))
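Wrapped in echo so it can be run directly (a sketch; grep -c is just shorthand for grep piped to wc -l, and afile.tar is the archive under test):

echo $(( $(tar -tvf afile.tar | grep -c '^d') * 4096 + $(tar -xOf afile.tar | wc -c) ))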
I am not sure if this is perfect, since I didn't try files that have been touched (files of 0 bytes) or files that have 1 character. This should get you closer.
answered Apr 7 '16 at 21:28 by tass6773; edited Apr 7 '16 at 21:34 by slm
-cvf does not include any compression, so doing that on a ~1 GB folder will result in a ~1 GB tar file (Flup's answer has more details about the additional size in the tar file, but note that even if there are 10,000 files this is only 5 MB). Since you have 4+ GB free, no, you will not fill the partition.
an easily downloadable copy
Most people would consider "easier" synonymous with "smaller" in terms of downloading, so you should use some compression here. bzip2 should nowadays be available on any system with tar, I think, so including j in your switches is probably the best choice. z (gzip) is perhaps even more common, and there are other (less ubiquitous) possibilities with more squash.
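For example (a sketch; the archive names are placeholders):

tar -cjf copy.tar.bz2 data_dir   # bzip2: usually smaller
tar -czf copy.tar.gz data_dir    # gzip: faster and even more widespread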
If you mean: does tar use additional disk space temporarily while performing the task, I am pretty sure it does not, for a few reasons, one being that it dates back to a time when tape drives were a form of primary storage, and two being that it has had decades to evolve (and I am certain it is not necessary to use temporary intermediate space, even if compression is involved).
answered Apr 10 '14 at 8:17 by goldilocks; edited Apr 10 '14 at 8:26
If speed is important and compression is not needed, you can hook the syscall wrappers used by tar via LD_PRELOAD, to make tar calculate the size for us. By reimplementing a few of these functions to suit our needs (calculating the size of the potential output tar data), we are able to eliminate a lot of the read and write work performed in the normal operation of tar. This makes tar much faster, as it doesn't need to context-switch back and forth into the kernel anywhere near as much, and only the stat of the requested input file/folder(s) needs to be read from disk instead of the actual file data.
The code below includes implementations of the close, read, and write POSIX functions. The macro OUT_FD controls which file descriptor we expect tar to use as the output file; currently it is set to stdout.

read was changed to just return the success value of count bytes instead of filling buf with the data. Given that the actual data isn't read, buf would not contain valid data to pass on to compression, so if compression were used we would calculate an incorrect size.

write was changed to sum the input count bytes into the global variable total and return the success value of count bytes only if the file descriptor matches OUT_FD; otherwise it calls the original wrapper, acquired via dlsym, to perform the syscall of the same name.

close still performs all of its original functionality, but if the file descriptor matches OUT_FD, it knows that tar is done attempting to write a tar file, so the total number is final and it prints it to stdout.
#define _GNU_SOURCE
#include <unistd.h>
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <stdlib.h>
#include <errno.h>
#include <dlfcn.h>
#include <string.h>

#define OUT_FD 1 /* stdout */

uint64_t total = 0;
ssize_t (*original_write)(int, const void *, size_t) = NULL;
int (*original_close)(int) = NULL;

void print_total(void)
{
    printf("%" PRIu64 "\n", total);
}

int close(int fd)
{
    if (!original_close)
        original_close = dlsym(RTLD_NEXT, "close");
    if (fd == OUT_FD)
        print_total();          /* the archive fd is closing: the tally is final */
    return original_close(fd);
}

/* Pretend every read succeeded, without touching the disk. */
ssize_t read(int fd, void *buf, size_t count)
{
    return count;
}

ssize_t write(int fd, const void *buf, size_t count)
{
    if (!original_write)
        original_write = dlsym(RTLD_NEXT, "write");
    if (fd == OUT_FD) {
        total += count;         /* tally the bytes tar would have written */
        return count;           /* ...but don't actually write them */
    }
    return original_write(fd, buf, count);
}
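A minimal compile line for the shim, following the usual LD_PRELOAD conventions (the file names here are assumptions; the repo linked below ships its own build and wrapper scripts):

gcc -shared -fPIC -o tarsize.so tarsize.c -ldl

The repo's tarsize.sh wrapper then runs tar with LD_PRELOAD pointing at the resulting shared object.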
Benchmark comparing a solution where the disk reads and all the syscalls of normal tar operation are performed against the LD_PRELOAD solution:
$ time tar -c '/media/storage/music/Macintosh Plus- Floral Shoppe (2011) [Flac]/' | wc -c
332308480
real 0m0.457s
user 0m0.064s
sys 0m0.772s
tarsize$ time ./tarsize.sh -c '/media/storage/music/Macintosh Plus- Floral Shoppe (2011) [Flac]/'
332308480
real 0m0.016s
user 0m0.004s
sys 0m0.008s
The code above, a basic build script to build it as a shared library, and a script using the LD_PRELOAD technique are provided in the repo: https://github.com/G4Vi/tarsize
Some info on using LD_PRELOAD: https://rafalcieslak.wordpress.com/2013/04/02/dynamic-linker-tricks-using-ld_preload-to-cheat-inject-features-and-investigate-programs/
answered Jan 29 at 23:00 by G4Vi; edited Jan 30 at 1:59
Code is good, if it works, but can you describe what it does? Please do not respond in comments; edit your answer to make it clearer and more complete. – G-Man, Jan 30 at 0:00