How can I determine if running tar will cause disk to fill up
If I run tar -cvf on a directory of size 937 MB to create an easily downloadable copy of a deeply nested folder structure, do I risk filling the disk, given the following df -h output?

/dev/xvda1 7.9G 3.6G 4.3G 46% /
tmpfs 298M 0 298M 0% /dev/shm

Related questions:
- If the disk might fill up, why? That is, what will Linux (Amazon AMI) and/or tar be doing under the hood?
- How can I accurately determine this information myself without asking again?

Tags: tar, disk-usage
asked Apr 10 '14 at 7:53 by codecowboy; edited Apr 10 '14 at 21:55 by Gilles
I'm not sure if it's possible without processing the archive, but you can play around with the --totals option. Either way, if you fill the disk up you can simply delete the archive, imho. To check all options available you could go through tar --help. – UVV, Apr 10 '14 at 8:02
Tangentially: don't create the tarfile as root; a certain percentage of space on the disk is set aside for root exclusively, exactly for the kind of "I've filled the disk and now I can't log in because that would write .bash_history or whatever" situation. – Ulrich Schwarz, Apr 10 '14 at 9:01
6 Answers
tar -c data_dir | wc -c
without compression
or
tar -cz data_dir | wc -c
with gzip compression
or
tar -cj data_dir | wc -c
with bzip2 compression
will print the size, in bytes, of the archive that would be created, without writing anything to disk. You can then compare that to the amount of free space on your target device.
You can check the size of the data directory itself, in case an incorrect assumption was made about its size, with the following command:
du -h --max-depth=1 data_dir
As already answered, tar adds a header to each record in the archive and also rounds the size of each record up to a multiple of 512 bytes (by default). The end of an archive is marked by at least two consecutive zero-filled records. So an uncompressed tar file is always larger than the files themselves; the number of files, and how they align to 512-byte boundaries, determines the extra space used.
Of course, filesystems themselves use block sizes that may be bigger than an individual file's contents, so be careful where you untar it: the filesystem may not be able to hold lots of small files even though it has free space greater than the tar size!
https://en.wikipedia.org/wiki/Tar_(computing)#Format_details
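To get a concrete feel for the 512-byte rounding described above, here is a minimal sketch you can run in a scratch directory (the demo directory and file names are hypothetical, and the output assumes GNU tar's defaults):

mkdir demo
printf 'hello' > demo/tiny   # 5 bytes of file data
tar -c demo | wc -c          # prints 10240 with GNU tar's defaults

The 5-byte file costs one 512-byte header plus one 512-byte data record, and the directory costs another header; the rest is the two zero-filled end-of-archive records plus padding up to GNU tar's default 10 KiB record size.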
answered Apr 10 '14 at 8:27 by FantasticJamieBurns; edited Apr 10 '14 at 20:10 by Andrew Medico
Thanks Jamie! What is '- mysql' doing here? Is that your filename? – codecowboy, Apr 10 '14 at 8:50
Just changed that... it is the path to your data directory. – FantasticJamieBurns, Apr 10 '14 at 8:50
Not that it really matters, but using the argument combination -f - to tar is redundant, since you can simply leave out the -f argument altogether to write the result to stdout (i.e. tar -c data_dir). – user8909, Apr 10 '14 at 14:10
The size of your tar file will be 937MB plus the size of the metadata needed for each file or directory (512 bytes per object), and padding added to align files to a 512-byte boundary.
A very rough calculation tells us that another copy of your data will leave you with 3.4GB free. In 3.4GB we have room for about 7 million metadata records, assuming no padding, or fewer if you assume an average of 256 bytes' padding per file. So if you have millions of files and directories to tar, you might run into problems.
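To reproduce that rough estimate with shell arithmetic (a sketch; it takes the 937 MB directory size and the 4.3G free column from the df -h output above, treating both as binary units):

echo $(( (4300 - 937) * 1024 * 1024 / 512 ))   # ~6.9 million 512-byte metadata records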
You could mitigate the problem by:
- compressing on the fly by using the z or j options to tar
- doing the tar as a normal user so that the reserved space on the / partition won't be touched if you run out of space.

answered Apr 10 '14 at 8:16 by Flup
tar itself can report on the size of its archives with the --test option:
tar -cf - ./* | tar --totals -tvf -
The above command writes nothing to disk and has the added benefit of listing the individual file sizes of each file contained in the tarball. Adding the various z/j/xz options to either side of the pipe will handle compression as you will.
OUTPUT:
...
-rwxr-xr-x mikeserv/mikeserv 8 2014-03-13 20:58 ./somefile.sh
-rwxr-xr-x mikeserv/mikeserv 62 2014-03-13 20:53 ./somefile.txt
-rw-r--r-- mikeserv/mikeserv 574 2014-02-19 16:57 ./squash.sh
-rwxr-xr-x mikeserv/mikeserv 35 2014-01-28 17:25 ./ssh.shortcut
-rw-r--r-- mikeserv/mikeserv 51 2014-01-04 08:43 ./tab1.link
-rw-r--r-- mikeserv/mikeserv 0 2014-03-16 05:40 ./tee
-rw-r--r-- mikeserv/mikeserv 0 2014-04-08 10:00 ./typescript
-rw-r--r-- mikeserv/mikeserv 159 2014-02-26 18:32 ./vlc_out.sh
Total bytes read: 4300943360 (4.1GiB, 475MiB/s)
Not entirely sure of your purpose, but if it is to download the tarball, this might be more to the point:
ssh you@host 'tar -cf - ./* | cat' | cat >./path/to/saved/local/tarball.tar
Or to simply copy with tar:
ssh you@host 'tar -cf - ./* | cat' | tar -C/path/to/download/tree/destination -vxf -
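If bandwidth matters more than CPU, the same stream copy works with compression on the wire (a sketch; you@host and the paths are the placeholders used above):

ssh you@host 'tar -czf - ./* | cat' | tar -C/path/to/download/tree/destination -vzxf -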
answered Apr 10 '14 at 8:17 by mikeserv; edited Apr 10 '14 at 9:59
The reason I am doing this is that I believe the directory in question has caused the output of df -i to reach 99%. I want to keep a copy of the directory for further analysis but want to clear the space. – codecowboy, Apr 10 '14 at 8:19
@codecowboy In that case, you should definitely do something like the above first. It will tar then copy the tree to your local disk in a stream without saving anything to the remote disk at all, after which you can delete it from the remote host and restore it later. You should probably add -z for compression as goldilocks points out, to save on bandwidth mid-transfer. – mikeserv, Apr 10 '14 at 8:24
@TAFKA'goldilocks' No, because it's 99% of inodes, not 99% of space. – Gilles, Apr 10 '14 at 21:56
-i right, sorry! – goldilocks, Apr 11 '14 at 12:29
@mikeserv your opening line mentions the --test option but you then don't seem to use it in your command which immediately follows (it uses --totals). – codecowboy, Apr 30 '14 at 10:46
I have done a lot of research on this. You can do a test on the file with a word count, but it will not give you the same number as du -sb adir.
tar -tvOf afile.tar | wc -c
du counts every directory as 4096 bytes, and tar counts directories as 0 bytes. You have to add 4096 to each directory:
$(( $(tar -tvOf afile.tar 2>&1 | grep '^d' | wc -l) * 4096 ))
Then you have to add all of the characters, ending up with something that looks like this:
$(( $(tar -tvOf afile.tar 2>&1 | grep '^d' | wc -l) * 4096 + $(tar -xOf afile.tar | wc -c) ))
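Wrapped in echo so it can be run directly (a sketch; grep -c is just shorthand for grep piped to wc -l, and afile.tar is the archive under test):

echo $(( $(tar -tvf afile.tar | grep -c '^d') * 4096 + $(tar -xOf afile.tar | wc -c) ))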
I am not sure if this is perfect, since I didn't try files that have been touched (files of 0 bytes) or files that have 1 character. This should get you closer.
answered Apr 7 '16 at 21:28 by tass6773; edited Apr 7 '16 at 21:34 by slm
-cvf does not include any compression, so doing that on a ~1 GB folder will result in a ~1 GB tar file (Flup's answer has more details about the additional size in the tar file, but note that even if there are 10,000 files this is only 5 MB). Since you have 4+ GB free, no, you will not fill the partition.
an easily downloadable copy
Most people would consider "easier" synonymous with "smaller" in terms of downloading, so you should use some compression here. bzip2 should nowadays be available on any system with tar, I think, so including j in your switches is probably the best choice. z (gzip) is perhaps even more common, and there are other (less ubiquitous) possibilities with more squash.
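For example (a sketch; the archive names are placeholders):

tar -cjf copy.tar.bz2 data_dir   # bzip2: usually smaller
tar -czf copy.tar.gz data_dir    # gzip: faster and even more widespread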
If you mean: does tar use additional disk space temporarily while performing the task, I am pretty sure it does not, for a few reasons, one being that it dates back to a time when tape drives were a form of primary storage, and two being that it has had decades to evolve (and I am certain it is not necessary to use temporary intermediate space, even if compression is involved).
answered Apr 10 '14 at 8:17 by goldilocks; edited Apr 10 '14 at 8:26
If speed is important and compression is not needed, you can hook the syscall wrappers used by tar via LD_PRELOAD, to make tar calculate the size for us. By reimplementing a few of these functions to suit our needs (calculating the size of the potential output tar data), we are able to eliminate a lot of the read and write work performed in the normal operation of tar. This makes tar much faster, as it doesn't need to context-switch back and forth into the kernel anywhere near as much, and only the stat of the requested input file/folder(s) needs to be read from disk instead of the actual file data.
The code below includes implementations of the close, read, and write POSIX functions. The macro OUT_FD controls which file descriptor we expect tar to use as the output file; currently it is set to stdout.

read was changed to just return the success value of count bytes instead of filling buf with the data. Given that the actual data isn't read, buf would not contain valid data to pass on to compression, so if compression were used we would calculate an incorrect size.

write was changed to sum the input count bytes into the global variable total and return the success value of count bytes only if the file descriptor matches OUT_FD; otherwise it calls the original wrapper, acquired via dlsym, to perform the syscall of the same name.

close still performs all of its original functionality, but if the file descriptor matches OUT_FD, it knows that tar is done attempting to write a tar file, so the total number is final and it prints it to stdout.
#define _GNU_SOURCE
#include <unistd.h>
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <stdlib.h>
#include <errno.h>
#include <dlfcn.h>
#include <string.h>

#define OUT_FD 1 /* stdout */

uint64_t total = 0;
ssize_t (*original_write)(int, const void *, size_t) = NULL;
int (*original_close)(int) = NULL;

void print_total(void)
{
    printf("%" PRIu64 "\n", total);
}

int close(int fd)
{
    if (!original_close)
        original_close = dlsym(RTLD_NEXT, "close");
    if (fd == OUT_FD)
        print_total();          /* the archive fd is closing: the tally is final */
    return original_close(fd);
}

/* Pretend every read succeeded, without touching the disk. */
ssize_t read(int fd, void *buf, size_t count)
{
    return count;
}

ssize_t write(int fd, const void *buf, size_t count)
{
    if (!original_write)
        original_write = dlsym(RTLD_NEXT, "write");
    if (fd == OUT_FD) {
        total += count;         /* tally the bytes tar would have written */
        return count;           /* ...but don't actually write them */
    }
    return original_write(fd, buf, count);
}
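A minimal compile line for the shim, following the usual LD_PRELOAD conventions (the file names here are assumptions; the repo linked below ships its own build and wrapper scripts):

gcc -shared -fPIC -o tarsize.so tarsize.c -ldl

The repo's tarsize.sh wrapper then runs tar with LD_PRELOAD pointing at the resulting shared object.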
Benchmark comparing a solution where the disk reads and all the syscalls of normal tar operation are performed against the LD_PRELOAD solution:
$ time tar -c '/media/storage/music/Macintosh Plus- Floral Shoppe (2011) [Flac]/' | wc -c
332308480
real 0m0.457s
user 0m0.064s
sys 0m0.772s
tarsize$ time ./tarsize.sh -c '/media/storage/music/Macintosh Plus- Floral Shoppe (2011) [Flac]/'
332308480
real 0m0.016s
user 0m0.004s
sys 0m0.008s
The code above, a basic build script to build it as a shared library, and a script using the LD_PRELOAD technique are provided in the repo: https://github.com/G4Vi/tarsize
Some info on using LD_PRELOAD: https://rafalcieslak.wordpress.com/2013/04/02/dynamic-linker-tricks-using-ld_preload-to-cheat-inject-features-and-investigate-programs/
answered Jan 29 at 23:00 by G4Vi; edited Jan 30 at 1:59
Code is good, if it works, but can you describe what it does? Please do not respond in comments; edit your answer to make it clearer and more complete. – G-Man, Jan 30 at 0:00