Triple compression and I only save 1% in space?
























I've been trying to save space on my Linux server, and I had a folder containing, in subfolders, 22GB of images.



So I decided to compress them.



First I used tar:



tar -zcf folder.tar folder 


Then gzip



gzip folder


And finally, for good measure, just in case, bzip2



bzip2 folder


And after all that, the total of all the folder.tar.gz.bzip2 files still came to 22GB - or, measured more precisely, a saving of just 1%!



Have I done something wrong here? I would expect many times more than a 1% saving!



How else can I compress the files?







  • think about this: if all data could be compressed again and again, you could compress all data into 1 ASCII symbol. Do you think that can work? – Bananguin, Apr 10 '13 at 18:19

  • @user1129682 BARF cs.fit.edu/~mmahoney/compression/barf.html – Dan Neely, Apr 10 '13 at 18:47

  • Note that you've only done double compression. Tar just combines the files. – user606723, Apr 10 '13 at 19:19

  • @ACarter If you read to the bottom, you'll see that the trick is they write files with very long filenames and very short contents. Every byte in the original contents adds four bytes to the filename: not a very good compression scheme! – amalloy, Apr 11 '13 at 0:18

  • @user606723 really he has made triple compression. Although the tar command created a file named folder.tar, it really should be folder.tar.gz because he is using the z flag there. – Carlos Campderrós, Apr 11 '13 at 6:46
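
For reference, a minimal sketch of the single-pass form that last comment points at (this assumes GNU tar and reuses the question's own folder name):

    # -z already pipes the archive through gzip, so one command is enough;
    # running gzip or bzip2 again on the result gains essentially nothing.
    tar -czf folder.tar.gz folder
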














asked Apr 10 '13 at 17:03 by ACarter, edited Apr 10 '13 at 17:12
8 Answers

















Accepted answer (32 votes) – answered Apr 10 '13 at 17:14 by Bratchley










Compression ratio is very dependent on what you're compressing. The reason text compresses down so well is that it doesn't even begin to fully utilize the range of numbers representable in the same binary space. So formats that do (e.g. compressed files) can store the same information in less space, just by virtue of using all those binary numbers that mean nothing in textual encodings; they can effectively represent whole progressions of characters in a single byte and get a good compression ratio that way.

If the files are already compressed, you're typically not going to see much advantage to compressing them again. If that actually saved you additional space, it's probably an indication that the first compression algorithm kind of sucks. Judging from the nature of the question, I'm going to assume a lot of these are media files and as such are already compressed (albeit with algorithms that prioritize speed of decompression), so you're probably not going to get much from them. Sort of a blood-from-a-stone scenario: they're already as small as they could be made without losing information.

If I'm super worried about space I just do a "bzip2 -9" and call it good. I've heard good things about the ratio on XZ though. I haven't used XZ myself (other than to decompress other people's stuff), but it's supposed to have a better ratio than bzip2 while taking a little longer to compress/decompress.
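
A minimal sketch of that approach (the exact invocations are assumptions, not necessarily what the answerer runs; -9 is the strongest preset for both tools, and is in fact bzip2's default):

    # archive without compression first, then apply one strong compressor once
    tar -cf folder.tar folder
    bzip2 -9 folder.tar       # -> folder.tar.bz2
    # or, usually a better ratio but slower to compress:
    # xz -9 folder.tar        # -> folder.tar.xz
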






  • Every time I've used xz it's given a lot better compression, but a lot slower to compress. For example 50% to 25%, but 20 seconds to 4 minutes (on text). – OrangeDog, Apr 11 '13 at 11:49

  • xz compared to bzip2 usually gives a better ratio and faster decompression, but takes more time to compress things, which makes it better for something that will be decompressed more than one time, like anything you plan to distribute among many users. – gelraen, Apr 12 '13 at 10:12

  • Isn't "bzip2 -9" the default setting? bzip2 -1 to -9 just specify the size of the "chunks" being compressed, from 100k to 900k. – Baard Kopperud, Apr 12 '13 at 14:52

















Answer (13 votes) – answered Apr 10 '13 at 17:58 by Marco













Your compression attempts failed because your data is already highly compressed and there's not much more to gain; see the other answers for more detailed explanations. However, if you can accept lossy compression, in contrast to the lossless compression you tried before, you can compress the images significantly. But since data is cut away, it cannot be undone.



Here's an example re-compressing all JPEG images using ImageMagick. Note that this will overwrite your files.

find image_directory -type f -name "*.jpg" -exec mogrify -quality 75% {} +
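
If you would rather keep the originals until you have checked the result, mogrify can also write its output into a separate directory (a hedged sketch, non-recursive, with a made-up output directory name):

    mkdir recompressed
    mogrify -path recompressed -quality 75 image_directory/*.jpg
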





  • +1 for being the only answer that talks about lossy and lossless. – Konerak, Apr 11 '13 at 8:09

















Answer (10 votes) – answered Apr 10 '13 at 17:12 by mrb













Most common image formats are already compressed (like jpg, png, gif), so you don't get much saving. 1% sounds about right.

Adding more compression can actually make the result (slightly) larger, because the compression algorithm gains nothing on already-compressed data, and then the format (e.g. gzip) has to add header and/or structure information to the output.

Sorry! If you're using PNGs, you can try to shrink your files using pngcrush.
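
A hedged sketch of how pngcrush could be run over a whole tree of PNGs; it writes a new file next to each original (the .crushed.png suffix is just an example), so nothing is overwritten until you decide to replace the files:

    find image_directory -type f -name '*.png' -exec sh -c '
      for f in "$@"; do
        pngcrush "$f" "${f%.png}.crushed.png"
      done
    ' sh {} +
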











Answer (6 votes) – answered Apr 10 '13 at 22:51 by Baard Kopperud













1) Many image and video formats are already compressed, so there is very little to gain by compressing them with some other program. This is especially true for JPEG. For very small pictures (in bytes) - or rather a large archive with many small pictures - there may be quite a bit to save, but in general, JPEG files are as compressed as they can get.

2) It's generally a bad idea to try to compress the same data repeatedly, whether that means compressing an already optimized file type (e.g. gzipping a JPEG file) or applying different - or the same - compression programs to the same file in series (as you've done).

3) When you compress a file, you sometimes end up with a larger file than you originally had (use touch to make an empty file, and try to bzip2 it; see the small demonstration after this list). It has to be that way, because otherwise you would be able to take some data, compress it again and again until nothing was left but an empty file, and still be able to uncompress it back to the original data later - but does that sound logical?

It's typically compressing already optimized (like JPEG) or already compressed data that causes growth in this way, especially when using the same compression program on the data several times.

4) The best way to save space is to find the compression program that gives the best gain for whatever data you have (the gain varies with the data), use only that program, and use it only once - but with its best (often slowest and most resource-demanding) setting. Currently the "best" (highest-gain) general-purpose compressor is probably xz, though bzip2 is not far behind. Make sure you select the best compression rate.

5) For images (like JPEG) you often use "lossy" compression - i.e. you lose some data (unlike with programs like xz, bzip2 and gzip, which are lossless). Repeatedly JPEG-compressing an image will therefore make the image smaller each time (unlike running something like bzip2 twice), but you will lose details in the image. There are also other things you can do to images - like changing the size (making them smaller) or the resolution (fewer pixels per inch) - that will make them "smaller", but again data will be lost.

Thus if the quality of the pictures is not that important and you absolutely want to save space, using a program like ImageMagick to batch-process all the images - making them smaller, less detailed and/or more heavily JPEG-compressed - may save you a lot of space. It will be lossy, though, and your pictures will lose detail.

6) A bit OT, but have you looked at thumbnail directories like ~/.thumbnails? If you have many pictures in your directories and use file browsers with picture preview, .thumbnails may contain lots of thumbnails of pictures you've browsed through at some point. Personally I've gained lots of disk space by routinely deleting files under the various hiding places for thumbnails...
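
The small demonstration promised under point 3 (a sketch; the exact byte count may vary with the bzip2 version):

    touch empty
    bzip2 -k empty           # -k keeps "empty" and writes "empty.bz2"
    ls -l empty empty.bz2    # empty is 0 bytes, empty.bz2 is ~14 bytes:
                             # the compressed format's own header/trailer is pure overhead
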







Answer (4 votes) – answered Apr 10 '13 at 17:12 by jordanm













Image formats such as png and jpeg are already compressed. The gain from compressing them again is minimal.
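
A quick way to check this on your own data (photo.jpg is a placeholder name; -c writes to stdout, so the original is left untouched):

    gzip -c photo.jpg > photo.jpg.gz
    ls -l photo.jpg photo.jpg.gz   # the .gz is typically barely smaller, sometimes even larger
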








Answer (4 votes) – answered Apr 10 '13 at 17:55 by h3rrmiller, edited Apr 10 '13 at 18:05













Another point worth raising: using multiple compression tools/algorithms can actually cause your final result to inflate in size and become larger than it needs to be. Meaning, if you compress 100GB down to 10GB and then try to compress it again, you may end up with ~15GB, depending on what you are compressing and what you are compressing it with.

Personally I never do anything more than tar cjvf container.tar.bz2 /target, simply because the amount of disk space saved by double compressing is minuscule.








Answer (4 votes)













As a mathematician, I feel like I should chime in and elaborate a bit. The question boils down to lossy compression versus lossless compression. Image compression like JPEG is lossy, and zipping is lossless.

Lossy - depending on how much information you are willing to lose, you can always "compress" a file down to a smaller size no matter what, but the trade-off is that you permanently lose some information, and when you "decompress" you will not get the original file back. And yes, with lossy compression you can compress again and again and get down to 1 byte, but the result will be completely useless.

Lossless - with this you will not lose any information at all, and when you "decompress" you get the original file back exactly. But here the trade-off is that a reduction in size is not guaranteed (easily proven using the pigeonhole principle). So some files will decrease in size, some will remain the same, and yes, some can actually increase in size. Lossless algorithms are therefore designed/optimized for specific kinds of data, so they work very well at (losslessly) compressing one kind of data and absolutely suck at others.

This is where my computer-science ignorance kicks in. I think the file zipping you are using is optimized for text, not for images, so it doesn't help with images. The images are already (lossily) compressed, and compressing them again won't help. If you want to (lossily) compress them again you might ruin the images and lose too much information... which is like saving them as JPEG with more emphasis on size than on quality.

I don't know if there is a lossless compression algorithm optimized for images, but that might help you. Or maybe there is an algorithm optimized for the specific types of images you are trying to compress - e.g. if they are black & white, contain certain color schemes, are always landscapes, or are all portraits.
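
A compact way to write down the pigeonhole argument used above (this formalization is an addition, not part of the original answer):

    There are 2^n bit-strings of length n, but only
        \sum_{k=0}^{n-1} 2^k = 2^n - 1
    bit-strings of length strictly less than n. So any lossless "compressor" that
    shrank every length-n input would have to map two different inputs to the same
    output, and the decompressor could not tell them apart. Hence some inputs must
    stay the same size or grow.
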






  • There is no advantage to be gained from optimizing an algorithm to a specific kind of data. Ultimately it's all words over an alphabet of 2 letters. If there are patterns they can be found, whether text or titty. However, there can be heuristic optimizations for data structure. I once wrote a little something that would interpret a compound PDF (i.e. a PDF with PDF images) and eliminate redundant font definitions and meta information. With this I can squeeze an average of 20% from my PDFs, without data loss, even though the original PDFs are already compressed. – Bananguin, May 20 '16 at 7:57

  • And just to spell out the pigeonhole principle: if any string could be compressed into a shorter string w/o data loss, ultimately every string could be compressed into a string of length 1. As the number of characters is limited (e.g. 8 bit), while the length of input strings is not, having |available characters|+1 strings (e.g. 257), all with pairwise different lengths, would result in at least one compressed string appearing twice, and a decompression algorithm cannot know which original string to reconstruct from it. => Not every string can be compressed into a shorter string w/o data loss. – Bananguin, May 20 '16 at 8:09

















Answer (1 vote)













Images, unless you're using RAW or TIFF, already have "built-in compression". Trying to compress them again will most likely do more harm than good by adding extra headers.






  • TIFF can be either lossily or losslessly compressed. Camera RAW formats are often compressed (easily shown by looking at file sizes; if they differ by more than a minuscule amount attributable to metadata and a possible embedded thumbnail, then the raw data is very likely compressed to some degree). – Michael Kjörling, Apr 12 '13 at 9:00










          8 Answers
          8






          active

          oldest

          votes








          8 Answers
          8






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          up vote
          32
          down vote



          accepted










          Compression ratio is very dependent of what you're compressing. The reason text compresses down so well is because it doesn't even begin to fully utilize the full range of numbers representable in the same binary space. So formats that do (e.g compressed files) can store the same information in less space just by virtue of using all those binary numbers that mean nothing in textual encodings and can effectively represent whole progressions of characters in a single byte and get a good compression ratio that way.



          If the files are already compressed, you're typically not going to see much advantage to compressing them again. If that actually saved you additional space it's probably an indication that the first compression algorithm kind of sucks. Judging from the nature of the question I'm going to assume a lot of these are media files and as such are already compressed (albeit with algorithms that prioritize speed of decompression) and so you're probably not going to get much from them. Sort of a blood from a stone scenario: they're already as small as they could be made without losing information.



          If I'm super worried about space I just do a "bzip2 -9" and call it good. I've heard good things about the ratio on XZ though. I haven't used XZ myself (other than to decompress other people's stuff), but it's supposed to have a better ratio than bzip2 but take a little longer to compress/decompress.






          share|improve this answer
















          • 2




            Every time I've used xs it's given a lot better compression, but a lot slower to compress. For example 50% to 25% but 20 seconds to 4 minutes (on text).
            – OrangeDog
            Apr 11 '13 at 11:49










          • xz compared to bzip2 usually gives better ratio and faster decompression, but takes more time to compress things, which makes it better for something that will be decompressed more then one time, like anything you plan to distribute among many users.
            – gelraen
            Apr 12 '13 at 10:12











          • Isn't "bzip2 -9" the default setting? For bzip2 -1 to -9 just specify the size of the "chunks" being compressed from 100k to 900k.
            – Baard Kopperud
            Apr 12 '13 at 14:52














          up vote
          32
          down vote



          accepted










          Compression ratio is very dependent of what you're compressing. The reason text compresses down so well is because it doesn't even begin to fully utilize the full range of numbers representable in the same binary space. So formats that do (e.g compressed files) can store the same information in less space just by virtue of using all those binary numbers that mean nothing in textual encodings and can effectively represent whole progressions of characters in a single byte and get a good compression ratio that way.



          If the files are already compressed, you're typically not going to see much advantage to compressing them again. If that actually saved you additional space it's probably an indication that the first compression algorithm kind of sucks. Judging from the nature of the question I'm going to assume a lot of these are media files and as such are already compressed (albeit with algorithms that prioritize speed of decompression) and so you're probably not going to get much from them. Sort of a blood from a stone scenario: they're already as small as they could be made without losing information.



          If I'm super worried about space I just do a "bzip2 -9" and call it good. I've heard good things about the ratio on XZ though. I haven't used XZ myself (other than to decompress other people's stuff), but it's supposed to have a better ratio than bzip2 but take a little longer to compress/decompress.






          share|improve this answer
















          • 2




            Every time I've used xs it's given a lot better compression, but a lot slower to compress. For example 50% to 25% but 20 seconds to 4 minutes (on text).
            – OrangeDog
            Apr 11 '13 at 11:49










          • xz compared to bzip2 usually gives better ratio and faster decompression, but takes more time to compress things, which makes it better for something that will be decompressed more then one time, like anything you plan to distribute among many users.
            – gelraen
            Apr 12 '13 at 10:12











          • Isn't "bzip2 -9" the default setting? For bzip2 -1 to -9 just specify the size of the "chunks" being compressed from 100k to 900k.
            – Baard Kopperud
            Apr 12 '13 at 14:52












          up vote
          32
          down vote



          accepted







          up vote
          32
          down vote



          accepted






          Compression ratio is very dependent of what you're compressing. The reason text compresses down so well is because it doesn't even begin to fully utilize the full range of numbers representable in the same binary space. So formats that do (e.g compressed files) can store the same information in less space just by virtue of using all those binary numbers that mean nothing in textual encodings and can effectively represent whole progressions of characters in a single byte and get a good compression ratio that way.



          If the files are already compressed, you're typically not going to see much advantage to compressing them again. If that actually saved you additional space it's probably an indication that the first compression algorithm kind of sucks. Judging from the nature of the question I'm going to assume a lot of these are media files and as such are already compressed (albeit with algorithms that prioritize speed of decompression) and so you're probably not going to get much from them. Sort of a blood from a stone scenario: they're already as small as they could be made without losing information.



          If I'm super worried about space I just do a "bzip2 -9" and call it good. I've heard good things about the ratio on XZ though. I haven't used XZ myself (other than to decompress other people's stuff), but it's supposed to have a better ratio than bzip2 but take a little longer to compress/decompress.






          share|improve this answer












          Compression ratio is very dependent of what you're compressing. The reason text compresses down so well is because it doesn't even begin to fully utilize the full range of numbers representable in the same binary space. So formats that do (e.g compressed files) can store the same information in less space just by virtue of using all those binary numbers that mean nothing in textual encodings and can effectively represent whole progressions of characters in a single byte and get a good compression ratio that way.



          If the files are already compressed, you're typically not going to see much advantage to compressing them again. If that actually saved you additional space it's probably an indication that the first compression algorithm kind of sucks. Judging from the nature of the question I'm going to assume a lot of these are media files and as such are already compressed (albeit with algorithms that prioritize speed of decompression) and so you're probably not going to get much from them. Sort of a blood from a stone scenario: they're already as small as they could be made without losing information.



          If I'm super worried about space I just do a "bzip2 -9" and call it good. I've heard good things about the ratio on XZ though. I haven't used XZ myself (other than to decompress other people's stuff), but it's supposed to have a better ratio than bzip2 but take a little longer to compress/decompress.







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Apr 10 '13 at 17:14









          Bratchley

          11.7k64386




          11.7k64386







          • 2




            Every time I've used xs it's given a lot better compression, but a lot slower to compress. For example 50% to 25% but 20 seconds to 4 minutes (on text).
            – OrangeDog
            Apr 11 '13 at 11:49










          • xz compared to bzip2 usually gives better ratio and faster decompression, but takes more time to compress things, which makes it better for something that will be decompressed more then one time, like anything you plan to distribute among many users.
            – gelraen
            Apr 12 '13 at 10:12











          • Isn't "bzip2 -9" the default setting? For bzip2 -1 to -9 just specify the size of the "chunks" being compressed from 100k to 900k.
            – Baard Kopperud
            Apr 12 '13 at 14:52












          • 2




            Every time I've used xs it's given a lot better compression, but a lot slower to compress. For example 50% to 25% but 20 seconds to 4 minutes (on text).
            – OrangeDog
            Apr 11 '13 at 11:49










          • xz compared to bzip2 usually gives better ratio and faster decompression, but takes more time to compress things, which makes it better for something that will be decompressed more then one time, like anything you plan to distribute among many users.
            – gelraen
            Apr 12 '13 at 10:12











          • Isn't "bzip2 -9" the default setting? For bzip2 -1 to -9 just specify the size of the "chunks" being compressed from 100k to 900k.
            – Baard Kopperud
            Apr 12 '13 at 14:52







          2




          2




          Every time I've used xs it's given a lot better compression, but a lot slower to compress. For example 50% to 25% but 20 seconds to 4 minutes (on text).
          – OrangeDog
          Apr 11 '13 at 11:49




          Every time I've used xs it's given a lot better compression, but a lot slower to compress. For example 50% to 25% but 20 seconds to 4 minutes (on text).
          – OrangeDog
          Apr 11 '13 at 11:49












          xz compared to bzip2 usually gives better ratio and faster decompression, but takes more time to compress things, which makes it better for something that will be decompressed more then one time, like anything you plan to distribute among many users.
          – gelraen
          Apr 12 '13 at 10:12





          xz compared to bzip2 usually gives better ratio and faster decompression, but takes more time to compress things, which makes it better for something that will be decompressed more then one time, like anything you plan to distribute among many users.
          – gelraen
          Apr 12 '13 at 10:12













          Isn't "bzip2 -9" the default setting? For bzip2 -1 to -9 just specify the size of the "chunks" being compressed from 100k to 900k.
          – Baard Kopperud
          Apr 12 '13 at 14:52




          Isn't "bzip2 -9" the default setting? For bzip2 -1 to -9 just specify the size of the "chunks" being compressed from 100k to 900k.
          – Baard Kopperud
          Apr 12 '13 at 14:52












          up vote
          13
          down vote













          You compression attempts failed because your data is already highly compressed and there's not much more to gain, see the other answers for more detailed explanations. However, if you can agree on lossy compression, in contrast to lossless like you tried before, you can compress the images significantly. But since data is cut away, it can not be undone.



          Here's an example re-compressing all JPEG images using imagemagick. Note that this will overwrite your files.



          find image_directory -type f -name "*.jpg" -exec mogrify -quality 75% +





          share|improve this answer
















          • 1




            +1 for being the only answer that talks about lossy and lossless.
            – Konerak
            Apr 11 '13 at 8:09














          up vote
          13
          down vote













          You compression attempts failed because your data is already highly compressed and there's not much more to gain, see the other answers for more detailed explanations. However, if you can agree on lossy compression, in contrast to lossless like you tried before, you can compress the images significantly. But since data is cut away, it can not be undone.



          Here's an example re-compressing all JPEG images using imagemagick. Note that this will overwrite your files.



          find image_directory -type f -name "*.jpg" -exec mogrify -quality 75% +





          share|improve this answer
















          • 1




            +1 for being the only answer that talks about lossy and lossless.
            – Konerak
            Apr 11 '13 at 8:09












          up vote
          13
          down vote










          up vote
          13
          down vote









          You compression attempts failed because your data is already highly compressed and there's not much more to gain, see the other answers for more detailed explanations. However, if you can agree on lossy compression, in contrast to lossless like you tried before, you can compress the images significantly. But since data is cut away, it can not be undone.



          Here's an example re-compressing all JPEG images using imagemagick. Note that this will overwrite your files.



          find image_directory -type f -name "*.jpg" -exec mogrify -quality 75% +





          share|improve this answer












          You compression attempts failed because your data is already highly compressed and there's not much more to gain, see the other answers for more detailed explanations. However, if you can agree on lossy compression, in contrast to lossless like you tried before, you can compress the images significantly. But since data is cut away, it can not be undone.



          Here's an example re-compressing all JPEG images using imagemagick. Note that this will overwrite your files.



          find image_directory -type f -name "*.jpg" -exec mogrify -quality 75% +






          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Apr 10 '13 at 17:58









          Marco

          24.3k580112




          24.3k580112







          • 1




            +1 for being the only answer that talks about lossy and lossless.
            – Konerak
            Apr 11 '13 at 8:09












          • 1




            +1 for being the only answer that talks about lossy and lossless.
            – Konerak
            Apr 11 '13 at 8:09







          1




          1




          +1 for being the only answer that talks about lossy and lossless.
          – Konerak
          Apr 11 '13 at 8:09




          +1 for being the only answer that talks about lossy and lossless.
          – Konerak
          Apr 11 '13 at 8:09










          up vote
          10
          down vote













          Most common image formats are already compressed (like jpg, png, gif), so you don't get much savings. 1% sounds about right.



          Adding more compression can actually make the result (slightly) larger, because the compression algorithm has no benefit on compressed data, and then the format (eg. gzip) has to add header and/or structure information to the output.



          Sorry! If you're using pngs, you can try to shrink your files using pngcrush.






          share|improve this answer
























            up vote
            10
            down vote













            Most common image formats are already compressed (like jpg, png, gif), so you don't get much savings. 1% sounds about right.



            Adding more compression can actually make the result (slightly) larger, because the compression algorithm has no benefit on compressed data, and then the format (eg. gzip) has to add header and/or structure information to the output.



            Sorry! If you're using pngs, you can try to shrink your files using pngcrush.






            share|improve this answer






















              up vote
              10
              down vote










              up vote
              10
              down vote









              Most common image formats are already compressed (like jpg, png, gif), so you don't get much savings. 1% sounds about right.



              Adding more compression can actually make the result (slightly) larger, because the compression algorithm has no benefit on compressed data, and then the format (eg. gzip) has to add header and/or structure information to the output.



              Sorry! If you're using pngs, you can try to shrink your files using pngcrush.






              share|improve this answer












              Most common image formats are already compressed (like jpg, png, gif), so you don't get much savings. 1% sounds about right.



              Adding more compression can actually make the result (slightly) larger, because the compression algorithm has no benefit on compressed data, and then the format (eg. gzip) has to add header and/or structure information to the output.



              Sorry! If you're using pngs, you can try to shrink your files using pngcrush.







              share|improve this answer












              share|improve this answer



              share|improve this answer










              answered Apr 10 '13 at 17:12









              mrb

              7,17912432




              7,17912432




















                  up vote
                  6
                  down vote













                  1) Many image-and video-formats are already compressed, so it's very little to gain by compressing them with some other program. This is especially true for JPEG. For very small pictures (in bytes) - or rather a large archive with many small pictures - there may quite a bit to save, but in general, JPEG-files are as compressed as they can get.



                  2) It's generally a bad idea to try to compress the same data repeatedly; whether it's compressing an already optimized filetype (e.g. gziping a jpeg-file), or applying a different or the same compression programs to the same file in serial (as you've done).



                  3) When you compress a file, you sometimes will end-up with a larger file than you originally had (use touch to make an empty file, and try to bzip2 it). It has to be that way; because else you would be able to take some data, compress it again and again until nothing was left but an empty file, and still be able to uncompress back to the original data later - but does that sound logical?



                  It's typically compressing already optimized (like jpeg) or already compressed data which will cause growing this way, especially using the same compression-programs on the data several times.



                  4) The best way to save data, is to find the compression-program that gives the best gain for whatever data you have (as the gain may vary depending on the data); and use only that compression-program and use it only once - but with it's best (often slowest and most resource-demanding) setting. Currently the "best" (giving most gain) compression-program is probably xzip, though bzip2 is not far behind. Make sure you select the best compression-rate.



                  5) For images (like jpeg) you often use "lossy" compression - ie. you loose some data (unlike when you use programs like xzip, bzip2 and gzip which are not lossy). Repeatedly JPEG-compressing an image will therefor make the image smaller each time it's used (unlike using something like bzip2 twice), but you will loose details in the image. There are also other things you can do to images - like changing the size (making it smaller) or resolution (less pixels per inch) - that'll make it "smaller", but again data will be lost.



                  Thus if the quality of the pictures are not that important and you absolutely want to save space, using a program like ImageMagic to batch-process all the images and making them smaller, less detailed and/or using higher jpeg-compression may save you lot of space. It will be lossy though, and your pictures will loose details.



                  6) A bit OT, but have you looked at stuff like thumbnails-directories - like ~/.thumbnails ? If you have many pictures in your directories and use file-browsers with picture-preview, .thumbnails may contain lots of thumbnails of pictures you've browsed through at some time. Personally I've gained lots of disk-space by routinely deleting files under various hiding-places for thumbnails...






                  share|improve this answer
























                    up vote
                    6
                    down vote













                    1) Many image-and video-formats are already compressed, so it's very little to gain by compressing them with some other program. This is especially true for JPEG. For very small pictures (in bytes) - or rather a large archive with many small pictures - there may quite a bit to save, but in general, JPEG-files are as compressed as they can get.



                    2) It's generally a bad idea to try to compress the same data repeatedly; whether it's compressing an already optimized filetype (e.g. gziping a jpeg-file), or applying a different or the same compression programs to the same file in serial (as you've done).



                    3) When you compress a file, you sometimes will end-up with a larger file than you originally had (use touch to make an empty file, and try to bzip2 it). It has to be that way; because else you would be able to take some data, compress it again and again until nothing was left but an empty file, and still be able to uncompress back to the original data later - but does that sound logical?



                    It's typically compressing already optimized (like jpeg) or already compressed data which will cause growing this way, especially using the same compression-programs on the data several times.



                    4) The best way to save data, is to find the compression-program that gives the best gain for whatever data you have (as the gain may vary depending on the data); and use only that compression-program and use it only once - but with it's best (often slowest and most resource-demanding) setting. Currently the "best" (giving most gain) compression-program is probably xzip, though bzip2 is not far behind. Make sure you select the best compression-rate.



                    5) For images (like jpeg) you often use "lossy" compression - ie. you loose some data (unlike when you use programs like xzip, bzip2 and gzip which are not lossy). Repeatedly JPEG-compressing an image will therefor make the image smaller each time it's used (unlike using something like bzip2 twice), but you will loose details in the image. There are also other things you can do to images - like changing the size (making it smaller) or resolution (less pixels per inch) - that'll make it "smaller", but again data will be lost.



                    Thus if the quality of the pictures are not that important and you absolutely want to save space, using a program like ImageMagic to batch-process all the images and making them smaller, less detailed and/or using higher jpeg-compression may save you lot of space. It will be lossy though, and your pictures will loose details.



                    6) A bit OT, but have you looked at stuff like thumbnails-directories - like ~/.thumbnails ? If you have many pictures in your directories and use file-browsers with picture-preview, .thumbnails may contain lots of thumbnails of pictures you've browsed through at some time. Personally I've gained lots of disk-space by routinely deleting files under various hiding-places for thumbnails...






                    share|improve this answer






















                      up vote
                      6
                      down vote










                      up vote
                      6
                      down vote









                      1) Many image-and video-formats are already compressed, so it's very little to gain by compressing them with some other program. This is especially true for JPEG. For very small pictures (in bytes) - or rather a large archive with many small pictures - there may quite a bit to save, but in general, JPEG-files are as compressed as they can get.



                      2) It's generally a bad idea to try to compress the same data repeatedly; whether it's compressing an already optimized filetype (e.g. gziping a jpeg-file), or applying a different or the same compression programs to the same file in serial (as you've done).



                      3) When you compress a file, you sometimes will end-up with a larger file than you originally had (use touch to make an empty file, and try to bzip2 it). It has to be that way; because else you would be able to take some data, compress it again and again until nothing was left but an empty file, and still be able to uncompress back to the original data later - but does that sound logical?



                      It's typically compressing already optimized (like jpeg) or already compressed data which will cause growing this way, especially using the same compression-programs on the data several times.



                      4) The best way to save data, is to find the compression-program that gives the best gain for whatever data you have (as the gain may vary depending on the data); and use only that compression-program and use it only once - but with it's best (often slowest and most resource-demanding) setting. Currently the "best" (giving most gain) compression-program is probably xzip, though bzip2 is not far behind. Make sure you select the best compression-rate.



                      5) For images (like jpeg) you often use "lossy" compression - ie. you loose some data (unlike when you use programs like xzip, bzip2 and gzip which are not lossy). Repeatedly JPEG-compressing an image will therefor make the image smaller each time it's used (unlike using something like bzip2 twice), but you will loose details in the image. There are also other things you can do to images - like changing the size (making it smaller) or resolution (less pixels per inch) - that'll make it "smaller", but again data will be lost.



                      Thus if the quality of the pictures are not that important and you absolutely want to save space, using a program like ImageMagic to batch-process all the images and making them smaller, less detailed and/or using higher jpeg-compression may save you lot of space. It will be lossy though, and your pictures will loose details.



                      6) A bit OT, but have you looked at stuff like thumbnails-directories - like ~/.thumbnails ? If you have many pictures in your directories and use file-browsers with picture-preview, .thumbnails may contain lots of thumbnails of pictures you've browsed through at some time. Personally I've gained lots of disk-space by routinely deleting files under various hiding-places for thumbnails...






                      share|improve this answer












                      1) Many image-and video-formats are already compressed, so it's very little to gain by compressing them with some other program. This is especially true for JPEG. For very small pictures (in bytes) - or rather a large archive with many small pictures - there may quite a bit to save, but in general, JPEG-files are as compressed as they can get.



                      2) It's generally a bad idea to try to compress the same data repeatedly; whether it's compressing an already optimized filetype (e.g. gziping a jpeg-file), or applying a different or the same compression programs to the same file in serial (as you've done).



                      3) When you compress a file, you sometimes will end-up with a larger file than you originally had (use touch to make an empty file, and try to bzip2 it). It has to be that way; because else you would be able to take some data, compress it again and again until nothing was left but an empty file, and still be able to uncompress back to the original data later - but does that sound logical?



                      It's typically compressing already optimized (like jpeg) or already compressed data which will cause growing this way, especially using the same compression-programs on the data several times.



                      4) The best way to save data, is to find the compression-program that gives the best gain for whatever data you have (as the gain may vary depending on the data); and use only that compression-program and use it only once - but with it's best (often slowest and most resource-demanding) setting. Currently the "best" (giving most gain) compression-program is probably xzip, though bzip2 is not far behind. Make sure you select the best compression-rate.



                      5) For images (like jpeg) you often use "lossy" compression - ie. you loose some data (unlike when you use programs like xzip, bzip2 and gzip which are not lossy). Repeatedly JPEG-compressing an image will therefor make the image smaller each time it's used (unlike using something like bzip2 twice), but you will loose details in the image. There are also other things you can do to images - like changing the size (making it smaller) or resolution (less pixels per inch) - that'll make it "smaller", but again data will be lost.



                      Thus if the quality of the pictures are not that important and you absolutely want to save space, using a program like ImageMagic to batch-process all the images and making them smaller, less detailed and/or using higher jpeg-compression may save you lot of space. It will be lossy though, and your pictures will loose details.



                      6) A bit OT, but have you looked at stuff like thumbnails-directories - like ~/.thumbnails ? If you have many pictures in your directories and use file-browsers with picture-preview, .thumbnails may contain lots of thumbnails of pictures you've browsed through at some time. Personally I've gained lots of disk-space by routinely deleting files under various hiding-places for thumbnails...







                      share|improve this answer












                      share|improve this answer



                      share|improve this answer










                      answered Apr 10 '13 at 22:51









                      Baard Kopperud

                      4,28832344




                      4,28832344




















                          up vote
                          4
                          down vote













                          Image formats such as png and jpeg are already compressed. The gain from compressing them again is minimal.






                          share|improve this answer
























                            up vote
                            4
                            down vote













                            Image formats such as png and jpeg are already compressed. The gain from compressing them again is minimal.






                            share|improve this answer






















                              up vote
                              4
                              down vote










                              up vote
                              4
                              down vote









                              Image formats such as png and jpeg are already compressed. The gain from compressing them again is minimal.






                              share|improve this answer












                              Image formats such as png and jpeg are already compressed. The gain from compressing them again is minimal.







                              share|improve this answer












                              share|improve this answer



                              share|improve this answer










                              answered Apr 10 '13 at 17:12









                              jordanm

                              29.1k27790




                              29.1k27790




















                                  up vote
                                  4
                                  down vote













                                  Another point worth raising: using multiple compression tools/algorithms can actually cause your final result to inflate in size and become larger than it needs to be. Meaning if you compress 100GB down to 10GB and then try to compress it again you may end up with ~15GB depending on what you are compressing and what you are compressing it with.



                                  Personally I never do anything more than tar cjvf container.tar.bz2 /target simply because the amount of disk space saved by double compressing is miniscule.

                                      edited Apr 10 '13 at 18:05
                                      answered Apr 10 '13 at 17:55
                                      h3rrmiller
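
If you want to compare the two approaches yourself, build the single-pass and the double-compressed archives side by side (folder stands in for the directory being archived):

tar -cjf single.tar.bz2 folder
tar -czf double.tar.gz folder && bzip2 double.tar.gz
du -h single.tar.bz2 double.tar.gz.bz2

With already-compressed images inside, the two archives typically come out within a fraction of a percent of each other, and the double-compressed one is sometimes a little larger.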

                                          up vote
                                          4
                                          down vote

As a mathematician, I feel like I should chime in and elaborate a bit. The question boils down to lossy compression versus lossless compression. Image compression like JPEG is lossy, while zipping is lossless.

Lossy - depending on how much information you are willing to lose, you can always "compress" a file down to a smaller size, but the trade-off is that you permanently lose some information: when you "decompress", you will not get the original file back. And yes, with lossy compression you can compress again and again and get down to 1 byte, but the result will be completely useless.

Lossless - with this you lose no information at all, and when you "decompress" you get the original file back exactly. The trade-off here is that a reduction in size is not guaranteed (easily proven using the pigeon-hole principle). Some files will decrease in size, some will stay the same, and some can actually increase in size. Lossless algorithms are therefore designed/optimized for specific kinds of data, so they compress one kind of data very well and do poorly on others.

This is where my computer science ignorance kicks in. The zip-style compressors you are using are optimized for general data such as text, not for images, so they don't help here. The images are already (lossily) compressed, and compressing them again won't help. If you were to (lossily) compress them again you might ruin the images and lose too much information, which is essentially like re-saving them as JPEG with more emphasis on size than quality.

I don't know offhand whether there is a lossless compression algorithm optimized for images, but that is what might help you. Or maybe there is an algorithm optimized for the specific types of images you are trying to compress, e.g. if they are black & white, use certain color schemes, or are all landscapes or portraits.

                                          answered Apr 12 '13 at 7:10
                                          Fixed Point
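
For the record, lossless re-optimizers for the common image formats do exist: jpegtran (from libjpeg) losslessly re-encodes a JPEG's entropy coding, and optipng does the analogous thing for PNG. A rough sketch, assuming both tools are installed and treating photo.jpg and image.png as stand-ins for your files:

jpegtran -optimize -copy none photo.jpg > photo-optimized.jpg
optipng -o2 image.png

The pixel data stays bit-for-bit identical; the savings come from better entropy coding (and, with -copy none, from dropping metadata), and are usually modest - single-digit percentages for typical JPEGs.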

• There is no advantage to be gained from optimizing an algorithm to a specific kind of data. Ultimately it's all words over an alphabet of 2 letters. If there are patterns, they can be found, whether the data is text or images. However, there can be heuristic optimizations for data structure. I once wrote a little something that would interpret a compound PDF (i.e. a PDF with PDF images) and eliminate redundant font definitions and meta information. With this I can squeeze an average of 20% from my PDFs, without data loss, even though the original PDFs are already compressed.
– Bananguin
May 20 '16 at 7:57

• And just to spell out the pigeon-hole principle: if every string could be compressed into a shorter string without data loss, then ultimately every string could be compressed into a string of length 1. Since the number of characters is limited (e.g. 8 bit) while the length of input strings is not, taking |available characters|+1 strings (e.g. 257) of pairwise different lengths would force at least one compressed string to appear twice, and a decompression algorithm cannot know which original string to reconstruct from it. => Not every string can be compressed into a shorter string without data loss.
– Bananguin
May 20 '16 at 8:09
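
The "already compressed data will not shrink further" part is easy to demonstrate with data that has no exploitable patterns at all; random bytes behave much like a finished JPEG here:

head -c 1000000 /dev/urandom > random.bin
gzip -c random.bin > random.bin.gz
ls -l random.bin random.bin.gz

The gzipped copy comes out slightly larger than the input, because when there is nothing left to squeeze the only change is the added container overhead.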

                                          up vote
                                          1
                                          down vote

Images, unless you're using RAW or TIFF, already have "built-in compression". Trying to compress them again will most likely do more harm than good by adding extra headers.

                                          answered Apr 10 '13 at 17:12
                                          tink

• TIFF can be either lossily or losslessly compressed. Camera RAW formats are often compressed (easily shown by looking at file sizes; if they differ by more than a minuscule amount attributable to metadata and a possible embedded thumbnail, then the raw data is very likely compressed to some degree).
– Michael Kjörling
Apr 12 '13 at 9:00
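
To see the difference between a format with and without built-in compression, gzip an uncompressed TIFF next to a JPEG (scan.tif and photo.jpg are stand-ins here, and the TIFF is assumed to have been saved without compression):

gzip -c scan.tif > scan.tif.gz
gzip -c photo.jpg > photo.jpg.gz
ls -l scan.tif scan.tif.gz photo.jpg photo.jpg.gz

The uncompressed TIFF typically shrinks a great deal, while the JPEG barely changes, or even grows by the few bytes of gzip header.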
