tar takes a long time before passing data to gzip
What is tar doing at the start, before it begins passing data on to gzip? Can I make it skip that step?
I'm writing a script to run on my Synology NAS box (running DSM 6.2.1-23824 Update 1, with tar version 1.28) to compress copies of virtual machine HDD images. The source files are stored as sparse files on a btrfs filesystem. I'm looking for a little compression, preferably keeping the sparseness, and as much speed as possible.
Although I am working with only one file at a time, the reason for using tar in the first place is its --sparse flag, since gzip cannot decompress back into a sparse file. The central command I'm trying to run is:
GZIP=-1 nice -n 19 tar --keep-old-files --sparse -czf "$destDir/$vmFolder/$file.tar.gz" "$file" 2>>"$log"
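To see why --sparse matters here, the following sketch builds a small sparse file and round-trips it both ways. It assumes GNU tar, GNU coreutils (truncate, stat) and a filesystem that supports holes; the file and directory names are made up for the demonstration. On such a system, the copy restored by gunzip should come back fully allocated, while the copy extracted by tar should stay sparse.

```shell
#!/usr/bin/env bash
# Sketch: why gzip alone loses sparseness but tar --sparse keeps it.
# Assumes GNU tar, GNU coreutils (truncate, stat) and a filesystem with
# sparse-file support. disk.img and the directory names are hypothetical.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# A 64 MiB file: a few bytes of real data followed by one large hole.
printf 'some real data\n' > disk.img
truncate -s 64M disk.img

# gzip alone: decompression writes every zero byte back to disk.
gzip -1 -c disk.img > disk.img.gz
mkdir plain
gzip -dc disk.img.gz > plain/disk.img

# tar --sparse: the archive records the hole map; extraction recreates the holes.
tar --sparse -czf disk.img.tar.gz disk.img
mkdir sparse
tar -xzf disk.img.tar.gz -C sparse

# Bytes actually allocated on disk = allocated blocks (%b) * block unit (%B).
allocated() { echo $(( $(stat -c %b "$1") * $(stat -c %B "$1") )); }

for f in disk.img plain/disk.img sparse/disk.img; do
  printf '%-16s apparent=%s allocated=%s\n' "$f" "$(stat -c %s "$f")" "$(allocated "$f")"
done
```

All three copies have identical contents; only the allocated size differs.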
However, given the size of the HDD images (2 GB to 120 GB), for many minutes after tar starts it reads the source as fast as it can, yet gzip is given nothing to work with. How long this lasts scales with the size of the source file.
Things I've tried to work around the issue:
- If I use gzip alone, output starts straight away, but I lose the sparseness information.
- If I use a pipe, as below, tar stalls in the same way:
nice -n 19 tar --keep-old-files --sparse -cf - "$file" 2>>"$log" | nice -n 19 gzip --fast > "$destDir/$vmFolder/$file.tar.gz" 2>>"$log"
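In the piped form, the exit status of the pipeline is gzip's alone, so a failing tar can go unnoticed by the script. Below is a bash-specific sketch that checks both sides via PIPESTATUS. compress_image is a hypothetical helper name whose arguments stand in for "$file", "$destDir/$vmFolder/$file.tar.gz" and "$log". The question's --keep-old-files is left out because GNU tar documents it as an extraction option (don't replace existing files when extracting).

```shell
#!/usr/bin/env bash
# Sketch (bash-specific): the piped form, but also reporting a tar failure.
# compress_image is a hypothetical helper: compress_image SRC DEST LOG
compress_image() {
  local src=$1 dest=$2 log=$3
  local -a rc
  nice -n 19 tar --sparse -cf - "$src" 2>>"$log" |
    nice -n 19 gzip --fast > "$dest" 2>>"$log"
  # PIPESTATUS must be copied immediately; any later command overwrites it.
  rc=("${PIPESTATUS[@]}")
  if [ "${rc[0]}" -ne 0 ] || [ "${rc[1]}" -ne 0 ]; then
    echo "compress_image: tar=${rc[0]} gzip=${rc[1]} for $src" >>"$log"
    return 1
  fi
}
```

Usage: compress_image "$file" "$destDir/$vmFolder/$file.tar.gz" "$log" || echo "failed: $file"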
Admittedly the NAS box only has an Intel Atom D2700, but the tar step itself shouldn't be CPU-intensive. I appreciate that gzip is CPU-intensive and will be a limiting factor, particularly on an old Atom CPU. I was hoping to use lz4 or lzop, but the Synology OS doesn't seem to have them, only gzip, 7z, and xz.
Note that the script can run as many of these commands in parallel as I like, using this semaphore script as a template, so it can use all CPU cores even with single-threaded gzip.
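If a hand-written semaphore becomes awkward, one swapped-in alternative (not the semaphore script referred to above) is to let xargs cap the number of concurrent jobs at the core count. This sketch assumes GNU find, GNU xargs (-0, -P) and nproc are available, which is not confirmed for DSM; compress_one and compress_all are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: one single-threaded tar|gzip job per CPU core via xargs -P.
# Assumes GNU find/xargs and nproc; function names are hypothetical.
compress_one() {
  # $1 = source image, $2 = destination directory
  tar --sparse -cf - "$1" | gzip --fast > "$2/$(basename "$1").tar.gz"
}
export -f compress_one   # make the function visible to the child bash processes

compress_all() {
  local src_dir=$1 dest_dir=$2
  # xargs appends each file name last, so inside the child shell
  # $1 is the destination directory and $2 is the image file.
  find "$src_dir" -maxdepth 1 -type f -print0 |
    xargs -0 -n 1 -P "$(nproc)" bash -c 'compress_one "$2" "$1"' _ "$dest_dir"
}
```

Usage: compress_all /path/to/images "$destDir/$vmFolder" (paths are placeholders).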
Edit: Testing my script without the --sparse option, but still using tar, does not show this problem: the data flows through to gzip immediately.
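One way to quantify the stall is to time how long tar takes to produce its first output byte, with and without --sparse. This is a sketch assuming bash and GNU date (for %N); first_byte_ms is a hypothetical helper. Because tar is cut off right after its first byte, the measurement does not include archiving the whole image.

```shell
#!/usr/bin/env bash
# Sketch: milliseconds until tar writes its first byte of output.
# Usage: first_byte_ms FILE [extra tar options...]
# Assumes GNU date (%N); first_byte_ms is a hypothetical name.
first_byte_ms() {
  local file=$1; shift
  local start end
  start=$(date +%s%N)
  # head exits after one byte; tar then stops on its next write (SIGPIPE/EPIPE).
  tar "$@" -cf - "$file" | head -c 1 > /dev/null
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}
```

Usage: compare first_byte_ms "$file" with first_byte_ms "$file" --sparse. If only the --sparse figure grows with the image size, that is consistent with tar making a full pass over the file before it emits anything, as the comments suggest.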
tar btrfs sparse-files
asked Dec 14 at 4:53 by BeowulfNode42, edited Dec 14 at 9:01
tar has to determine whether a file is sparse before archiving it. This means reading the entire file to find the non-sparse parts, which can take time depending on the size of the source files. I'm not writing this as an answer since it's just handwaving and I have nothing to test on.
– Kusalananda, Dec 14 at 7:54
@Kusalananda You may be on the right track. From the manual on Debian 9: "--sparse: When given this option, tar attempts to determine if the file is sparse prior to archiving it."
– Kamil Maciorowski, Dec 14 at 8:11
@Kusalananda If you are on a historic platform, you are right. On modern platforms, programs have been able to use the lseek() SEEK_HOLE feature since summer 2005. I am, however, not sure whether gtar has caught up yet; star has used SEEK_HOLE since summer 2005.
– schily, Dec 14 at 15:21
@schily According to the gtar source code, GNU tar will use SEEK_HOLE if the host's lseek() implementation supports it.
– Kusalananda, Dec 14 at 15:24
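The comments above touch on two separate steps: deciding whether a file is sparse at all, and finding where its holes are. The first needs no reading of the data, as this sketch shows by comparing allocated bytes (st_blocks) with the apparent size (st_size). It assumes GNU stat, and is_sparse is a hypothetical helper; it is only a heuristic, and it says nothing about where the holes lie, which is the part that needs either a full read or lseek() with SEEK_HOLE.

```shell
#!/usr/bin/env bash
# Sketch: is a file sparse? Compare allocated bytes with apparent size,
# without reading any file data. Assumes GNU stat; is_sparse is hypothetical.
is_sparse() {
  local size blocks unit
  size=$(stat -c %s "$1")     # apparent size in bytes
  blocks=$(stat -c %b "$1")   # number of allocated blocks
  unit=$(stat -c %B "$1")     # size in bytes of each block counted by %b
  # Fewer bytes allocated than the apparent size means some range is a hole.
  [ $(( blocks * unit )) -lt "$size" ]
}
```

Usage: is_sparse "$file" && echo "$file has holes"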