Triple compression and I only save 1% in space?

I've been trying to save space on my Linux server, and I had a folder containing, in subfolders, 22GB of images.

So I decided to compress them.

First I used tar:

tar -zcf folder.tar folder

Then gzip:

gzip folder

And finally, for good measure, just in case, bzip2:

bzip2 folder

And after all that, the total of all the folder.tar.gz.bzip2s still came to 22GB! With, using finer precision, a 1% space saving!

Have I done something wrong here? I would expect many times more than a 1% saving!

How else can I compress the files?

  • 28
    Think about this: if all data could be compressed again and again, you could compress all data into 1 ASCII symbol. Do you think that can work?
    – Bananguin
    Apr 10 '13 at 18:19

  • 2
    @user1129682 BARF cs.fit.edu/~mmahoney/compression/barf.html
    – Dan Neely
    Apr 10 '13 at 18:47

  • 19
    Note that you've only done double compression. Tar just combines the files.
    – user606723
    Apr 10 '13 at 19:19

  • 2
    @ACarter If you read to the bottom, you'll see that the trick is they write files with very long filenames and very short contents. Every byte in the original contents adds four bytes to the filename: not a very good compression scheme!
    – amalloy
    Apr 11 '13 at 0:18

  • 9
    @user606723 Really, he has done triple compression. Although the tar command created a file named folder.tar, it really should be folder.tar.gz because he is using the z flag there.
    – Carlos Campderrós
    Apr 11 '13 at 6:46














Question asked Apr 10 '13 at 17:03 by ACarter, edited Apr 10 '13 at 17:12

8 Answers

32 votes, accepted

Compression ratio depends heavily on what you're compressing. Text compresses so well because it doesn't come close to using the full range of values representable in the same binary space. Formats that do use that range (e.g. compressed files) can store the same information in less space: byte values that mean nothing in a textual encoding can effectively represent whole sequences of characters, which is how they get a good compression ratio.

If the files are already compressed, you're typically not going to see much advantage from compressing them again. If a second pass did save additional space, that would probably indicate that the first compression algorithm isn't very good. Judging from the nature of the question, I'm going to assume a lot of these are media files and as such are already compressed (albeit with algorithms that prioritize speed of decompression), so you're probably not going to get much from them. It's a blood-from-a-stone scenario: they're already about as small as they can be made without losing information.
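A rough way to see both effects at once is to compress some text and then compress the result again. The sketch below is only an illustration: it assumes gzip, base64, head and wc are available, and it uses base64-encoded random bytes as a stand-in for ordinary text (the file names are made up for the example).

```shell
# Build roughly 1.3 MB of text that uses only a small part of the byte range
# (base64 of random bytes; a stand-in, not the asker's data).
work=$(mktemp -d)
head -c 1000000 /dev/urandom | base64 > "$work/sample.txt"

# First pass: text in, compressed data out.
gzip -9 -c "$work/sample.txt" > "$work/sample.txt.gz"
# Second pass: compressed data in, nearly unchanged compressed data out.
gzip -9 -c "$work/sample.txt.gz" > "$work/sample.txt.gz.gz"

orig=$(wc -c < "$work/sample.txt")
once=$(wc -c < "$work/sample.txt.gz")
twice=$(wc -c < "$work/sample.txt.gz.gz")
echo "original:   $orig bytes"
echo "gzip once:  $once bytes"
echo "gzip twice: $twice bytes"
rm -rf "$work"
```

The first pass should remove roughly a quarter of the size, because each base64 character carries only about six bits of information in an eight-bit byte. The second pass has nothing left to exploit, so its output stays within a fraction of a percent of its input.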



If I'm super worried about space I just do a "bzip2 -9" and call it good. I've heard good things about the ratio of XZ, though. I haven't used XZ myself (other than to decompress other people's stuff), but it's supposed to have a better ratio than bzip2 at the cost of taking longer to compress.
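To compare the tools on your own data, one option is to run each of them once over the same file and look at the output sizes. This is a minimal sketch under assumptions: the sample file is generated with seq purely for illustration, and any tool that is not installed is skipped. Ratios and timings depend heavily on the input.

```shell
# Compare single-pass compressors on the same input.
sample=$(mktemp)
seq 1 200000 > "$sample"          # hypothetical sample data; substitute a real file
orig=$(wc -c < "$sample")
echo "original: $orig bytes"

for tool in gzip bzip2 xz; do
    if command -v "$tool" >/dev/null 2>&1; then
        # -9 asks for the strongest preset, -c writes to stdout and leaves the input alone
        size=$("$tool" -9 -c "$sample" | wc -c)
        echo "$tool -9: $size bytes"
    else
        echo "$tool: not installed, skipped"
    fi
done
rm -f "$sample"
```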






















  • 2
    Every time I've used xz it's given a lot better compression, but been a lot slower to compress. For example, 50% to 25% but 20 seconds to 4 minutes (on text).
    – OrangeDog
    Apr 11 '13 at 11:49

  • xz compared to bzip2 usually gives a better ratio and faster decompression, but takes more time to compress, which makes it better for something that will be decompressed more than once, like anything you plan to distribute among many users.
    – gelraen
    Apr 12 '13 at 10:12

  • Isn't "bzip2 -9" the default setting? For bzip2, -1 to -9 just specify the size of the "chunks" being compressed, from 100k to 900k.
    – Baard Kopperud
    Apr 12 '13 at 14:52

















13 votes

Your compression attempts failed because your data is already highly compressed and there's not much more to gain; see the other answers for more detailed explanations. However, if you can accept lossy compression, in contrast to the lossless compression you tried before, you can shrink the images significantly. But since data is discarded, it cannot be undone.

Here's an example that re-compresses all JPEG images using ImageMagick. Note that this will overwrite your files.

find image_directory -type f -name "*.jpg" -exec mogrify -quality 75% {} +
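Because the command above overwrites the originals, it can be worth trying it on a copy first. The following is a sketch, not part of the original answer: image_directory_q75 is a hypothetical name for the working copy, and the block does nothing unless the source directory exists, the copy does not exist yet, and ImageMagick's mogrify is installed. The copy needs temporary disk space, so a small subfolder may be a better first test than the full 22GB.

```shell
src=image_directory        # directory name taken from the command above
dst=image_directory_q75    # hypothetical name for the working copy

if [ -d "$src" ] && [ ! -e "$dst" ] && command -v mogrify >/dev/null 2>&1; then
    cp -R "$src" "$dst"
    # Recompress only the copy; the files under $src stay untouched.
    find "$dst" -type f -name "*.jpg" -exec mogrify -quality 75 {} +
    # Compare the total size of the original and the recompressed copy.
    du -sh "$src" "$dst"
    status=done
else
    echo "skipped: need an existing $src, no existing $dst, and ImageMagick's mogrify"
    status=skipped
fi
```

If the size and image quality of the copy look acceptable, the copy can replace the original directory.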





















  • 1




    +1 for being the only answer that talks about lossy and lossless.
    – Konerak
    Apr 11 '13 at 8:09

















10 votes













Most common image formats (like jpg, png, gif) are already compressed, so you don't get much savings. 1% sounds about right.

Adding more compression can actually make the result (slightly) larger, because the compression algorithm gains nothing on already-compressed data, and the format (e.g. gzip) still has to add header and/or structure information to the output.

Sorry! If you're using PNGs, you can try to shrink your files using pngcrush.
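A batch run over a directory tree might look like the sketch below. It is only an illustration under assumptions: image_directory is a hypothetical path, pngcrush must be installed, and file names are assumed not to contain newlines. Each file is replaced only when the crushed version is actually smaller; pngcrush is lossless, so the pixels do not change.

```shell
src=image_directory   # hypothetical directory containing PNG files

if [ -d "$src" ] && command -v pngcrush >/dev/null 2>&1; then
    find "$src" -type f -name "*.png" | while IFS= read -r f; do
        tmp="$f.crushed"
        # pngcrush reads the first file and writes the optimized result to the second
        if pngcrush "$f" "$tmp" >/dev/null 2>&1 &&
           [ "$(wc -c < "$tmp")" -lt "$(wc -c < "$f")" ]; then
            mv "$tmp" "$f"      # keep the smaller file
        else
            rm -f "$tmp"        # no gain (or an error): keep the original
        fi
    done
    result=processed
else
    echo "skipped: need an existing $src and pngcrush"
    result=skipped
fi
```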






6 votes













1) Many image and video formats are already compressed, so there's very little to gain by compressing them with some other program. This is especially true for JPEG. For very small pictures (in bytes), or rather a large archive with many small pictures, there may be quite a bit to save, but in general JPEG files are as compressed as they can get.

2) It's generally a bad idea to try to compress the same data repeatedly, whether that means compressing an already optimized file type (e.g. gzipping a JPEG file) or applying different or the same compression programs to the same file in series (as you've done).

3) When you compress a file, you will sometimes end up with a larger file than you originally had (use touch to make an empty file, and try to bzip2 it). It has to be that way; otherwise you would be able to take some data, compress it again and again until nothing was left but an empty file, and still be able to uncompress back to the original data later. Does that sound logical?

It's typically already optimized (like JPEG) or already compressed data that grows this way, especially when the same compression programs are used on the data several times.

4) The best way to save space is to find the compression program that gives the best gain for whatever data you have (as the gain may vary depending on the data), and to use only that program, only once, but with its best (often slowest and most resource-demanding) setting. Currently the "best" (giving most gain) compression program is probably xz, though bzip2 is not far behind. Make sure you select the best compression level.

5) For images (like JPEG) you often use "lossy" compression, i.e. you lose some data (unlike programs such as xz, bzip2 and gzip, which are lossless). Repeatedly JPEG-compressing an image can therefore make the image smaller each time (unlike using something like bzip2 twice), but you will lose details in the image. There are also other things you can do to images, like changing the size (making it smaller) or the resolution (fewer pixels per inch), that will make the file smaller, but again data will be lost.

Thus if the quality of the pictures is not that important and you absolutely want to save space, using a program like ImageMagick to batch-process all the images, making them smaller, less detailed and/or more heavily JPEG-compressed, may save you a lot of space. It will be lossy though, and your pictures will lose details.

6) A bit off-topic, but have you looked at thumbnail directories like ~/.thumbnails? If you have many pictures in your directories and use file browsers with picture preview, .thumbnails may contain lots of thumbnails of pictures you've browsed through at some time. Personally I've gained lots of disk space by routinely deleting files from the various hiding places for thumbnails...
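The experiment suggested in point 3 can be run as follows. This is a small sketch that falls back to gzip when bzip2 is not installed; the temporary directory and file name are made up for the example.

```shell
work=$(mktemp -d)
touch "$work/empty"                        # a zero-byte file, as in point 3
before=$(wc -c < "$work/empty")

# Prefer bzip2 as in the answer; fall back to gzip if it is missing.
if command -v bzip2 >/dev/null 2>&1; then tool=bzip2; else tool=gzip; fi

# -c writes the compressed stream to stdout so its size can be counted
after=$("$tool" -c "$work/empty" | wc -c)
echo "empty file: $before bytes, after $tool: $after bytes"
rm -rf "$work"
```

The "compressed" result is larger than the empty input because the format always needs a few bytes for its own header and trailer.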






4 votes













Image formats such as PNG and JPEG are already compressed. The gain from compressing them again is minimal.






4 votes













Another point worth raising: using multiple compression tools/algorithms can actually cause your final result to inflate in size and become larger than it needs to be. That is, if you compress 100GB down to 10GB and then try to compress it again, you may end up with slightly more than 10GB, depending on what you are compressing and what you are compressing it with.

Personally I never do anything more than tar cjvf container.tar.bz2 /target, simply because the amount of disk space saved by double compressing is minuscule.
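A single archive-and-compress pass can be sketched like this. To keep the example runnable with only tar and gzip, it swaps in the z (gzip) flag for the j (bzip2) flag used above; the two work the same way from tar's point of view. The folder name and its contents are hypothetical sample data.

```shell
work=$(mktemp -d)
mkdir "$work/folder"
seq 1 50000 > "$work/folder/numbers.txt"      # hypothetical sample content

# c = create, z = gzip while writing, f = archive name.
# Naming the result .tar.gz makes it obvious the archive is already compressed.
tar -czf "$work/folder.tar.gz" -C "$work" folder

# gzip -t checks that the file really is a valid gzip stream
gzip -t < "$work/folder.tar.gz" && echo "folder.tar.gz is gzip-compressed"

# t = list the contents without extracting
listing=$(tar -tzf "$work/folder.tar.gz")
echo "$listing"
rm -rf "$work"
```

Giving the archive a suffix that matches the compression flag avoids the situation in the question, where a file named folder.tar was in fact already gzip-compressed.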






4 votes













As a mathematician, I feel like I should chime in and elaborate a bit. The question boils down to lossy compression versus lossless compression. Image compression like JPEG is lossy, and zipping is lossless.

Lossy: depending on how much information you are willing to lose, you can always "compress" a file down to a smaller size no matter what, but the trade-off is that you permanently lose some information, and when you "decompress" you will not get the original file back. And yes, with lossy compression you can compress again and again and get down to 1 byte, but the result will be completely useless.

Lossless: with this you will not lose any information at all, and when you "decompress" you will get the original file back exactly. But here the trade-off is that a reduction in size is not guaranteed (easily proven using the pigeonhole principle). So some files will decrease in size, some will remain the same, and yes, some can actually increase in size. Lossless algorithms are therefore designed/optimized for specific kinds of data, so they compress one kind of data very well and do poorly on others.
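The counting behind the pigeonhole argument can be written out for bit strings. The symbols n (the input length in bits) and C (the compression map) are introduced here only for illustration.

```latex
\[
\#\{\text{bit strings of length } n\} = 2^{n},
\qquad
\#\{\text{bit strings of length} < n\} = \sum_{k=0}^{n-1} 2^{k} = 2^{n} - 1 .
\]
Since $2^{n} - 1 < 2^{n}$, no injective (losslessly reversible) map $C$ can send
every string of length $n$ to a strictly shorter string: at least one input of
each length must come out the same size or larger.
```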



This is where my computer science ignorance kicks in. I think the file zipping you are using is optimized for text, not for images, so it doesn't help with images. The images are already (lossily) compressed, and compressing them again won't help. If you (lossily) compress them again you might ruin the images and lose too much information, which is like saving them as JPEG with more emphasis on size than quality.

I don't know if there is a lossless compression algorithm optimized for images, but that might help you. Or maybe there is an algorithm optimized for the specific types of images you are trying to compress, for example if they are black & white, contain certain color schemes, are always landscapes, or are all portraits.


























  • There is no advantage to be gained from optimizing an algorithm for a specific kind of data. Ultimately it's all words over an alphabet of 2 letters. If there are patterns they can be found, whether in text or in pictures. However, there can be heuristic optimizations for data structure. I once wrote a little something that would interpret a compound PDF (i.e. a PDF with PDF images) and eliminate redundant font definitions and meta information. With this I can squeeze an average of 20% from my PDFs, without data loss, even though the original PDFs are already compressed.
    – Bananguin
    May 20 '16 at 7:57

  • And just to spell out the pigeonhole principle: if any string could be compressed into a shorter string w/o data loss, ultimately every string could be compressed into a string of length 1. As the number of characters is limited (e.g. 8 bit), while the length of input strings is not, having |available characters|+1 strings (e.g. 257), all with pairwise different lengths, would result in at least one compressed string appearing twice, and a decompression algorithm cannot know which original string to reconstruct from it. => Not every string can be compressed into a shorter string w/o data loss.
    – Bananguin
    May 20 '16 at 8:09

















1 vote













Images, unless you're using raw or TIFF, already have "built-in compression". Trying to compress them again will most likely do more harm than good by adding extra headers.






















  • 1
    TIFF can be either lossily or losslessly compressed. Camera RAW formats are often compressed (easily shown by looking at file sizes; if they differ by more than a minuscule amount attributable to metadata and a possible embedded thumbnail, then the raw data is very likely compressed to some degree).
    – Michael Kjörling
    Apr 12 '13 at 9:00










Accepted answer (32 votes): answered Apr 10 '13 at 17:14 by Bratchley

          11.7k64386




          11.7k64386







          • 2




            Every time I've used xs it's given a lot better compression, but a lot slower to compress. For example 50% to 25% but 20 seconds to 4 minutes (on text).
            – OrangeDog
            Apr 11 '13 at 11:49










          • xz compared to bzip2 usually gives better ratio and faster decompression, but takes more time to compress things, which makes it better for something that will be decompressed more then one time, like anything you plan to distribute among many users.
            – gelraen
            Apr 12 '13 at 10:12











          • Isn't "bzip2 -9" the default setting? For bzip2 -1 to -9 just specify the size of the "chunks" being compressed from 100k to 900k.
            – Baard Kopperud
            Apr 12 '13 at 14:52
























          up vote
          13
          down vote













          Your compression attempts failed because your data is already highly compressed and there's not much more to gain; see the other answers for more detailed explanations. However, if you can accept lossy compression, in contrast to the lossless compression you tried before, you can shrink the images significantly. But since data is thrown away, it cannot be undone.



          Here's an example that re-compresses all JPEG images using ImageMagick. Note that this will overwrite your files.



          find image_directory -type f -name "*.jpg" -exec mogrify -quality 75% {} +
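Since the command overwrites files in place, here is a rough Python sketch of the same idea with a safety net. It assumes ImageMagick's mogrify is installed and on the PATH; the names find_jpegs, recompress and backup_root are made up for this example. By default it only returns the command it would run, so nothing is touched until you pass dry_run=False.

```python
import shutil
import subprocess
from pathlib import Path


def find_jpegs(root):
    """Return all *.jpg files under root, like `find root -type f -name "*.jpg"`."""
    return sorted(p for p in Path(root).rglob("*.jpg") if p.is_file())


def recompress(root, backup_root, quality=75, dry_run=True):
    """Back up root to backup_root, then re-compress its JPEGs in place.

    With dry_run=True (the default) nothing is copied or modified;
    the command that would be executed is returned instead.
    """
    files = find_jpegs(root)
    command = ["mogrify", "-quality", str(quality)] + [str(p) for p in files]
    if not dry_run and files:
        shutil.copytree(root, backup_root)   # raises if backup_root already exists
        subprocess.run(command, check=True)  # needs ImageMagick on the PATH
    return command
```

Unlike find's "+" form, this sketch passes every file in a single invocation, so for a very large collection you would probably want to split the list into batches.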





















          • 1




            +1 for being the only answer that talks about lossy and lossless.
            – Konerak
            Apr 11 '13 at 8:09
























          answered Apr 10 '13 at 17:58









          Marco


















          up vote
          10
          down vote













          Most common image formats (like JPEG, PNG and GIF) are already compressed, so you don't get much savings. 1% sounds about right.



          Adding more compression can actually make the result (slightly) larger, because the compression algorithm gains nothing on already-compressed data, and the format (e.g. gzip) still has to add header and/or structure information to the output.



          Sorry! If you're using PNGs, you can try to shrink your files using pngcrush.
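The "slightly larger" effect is easy to reproduce. The sketch below uses Python's zlib module (the DEFLATE algorithm that gzip also uses) on pseudo-random bytes. The random payload is only a stand-in: it assumes that well-compressed image data looks statistically similar to noise, which is an approximation rather than a statement about any particular file.

```python
import random
import zlib

# Stand-in for already-compressed image data: 100,000 pseudo-random bytes.
# (Assumption: well-compressed data looks statistically similar to noise.)
rng = random.Random(0)
payload = bytes(rng.getrandbits(8) for _ in range(100_000))

once = zlib.compress(payload, 9)   # first pass over incompressible data
twice = zlib.compress(once, 9)     # second pass over the first pass's output

print("original:", len(payload))
print("one pass:", len(once))
print("two passes:", len(twice))
```

Each pass finds nothing to squeeze and just wraps the input in another small header and checksum, so the size creeps up instead of down.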
















              answered Apr 10 '13 at 17:12









              mrb





















                  up vote
                  6
                  down vote













                  1) Many image and video formats are already compressed, so there's very little to gain by compressing them with some other program. This is especially true for JPEG. For very small pictures (in bytes), or rather a large archive with many small pictures, there may be quite a bit to save, but in general JPEG files are about as compressed as they can get.



                  2) It's generally a bad idea to compress the same data repeatedly, whether that means compressing an already optimized file type (e.g. gzipping a JPEG file) or applying the same or different compression programs to the same file in series (as you've done).



                  3) When you compress a file, you will sometimes end up with a larger file than you started with (use touch to make an empty file, and try to bzip2 it). It has to be that way: otherwise you could take some data, compress it again and again until nothing was left but an empty file, and still be able to uncompress back to the original data later. Does that sound logical?



                  This kind of growth typically happens when compressing already optimized (like JPEG) or already compressed data, especially when running the same compression program over the data several times.



                  4) The best way to save space is to find the compression program that gives the best gain for whatever data you have (as the gain varies with the data), and to use only that program, and only once, but with its best (often slowest and most resource-demanding) setting. Currently the "best" (giving most gain) compression program is probably xz, though bzip2 is not far behind. Make sure you select the highest compression level.



                  5) For images (like JPEG) you often use "lossy" compression, i.e. you lose some data (unlike programs such as xz, bzip2 and gzip, which are lossless). Re-compressing an image as JPEG at a lower quality setting will therefore make it smaller each time (unlike running something like bzip2 twice), but you will lose detail in the image. There are also other things you can do to images, like changing the size (making it smaller) or resolution (fewer pixels per inch), that'll make them "smaller", but again data will be lost.



                  Thus, if the quality of the pictures is not that important and you absolutely want to save space, using a program like ImageMagick to batch-process all the images, making them smaller, less detailed and/or more heavily JPEG-compressed, may save you a lot of space. It will be lossy though, and your pictures will lose detail.



                  6) A bit off-topic, but have you looked at thumbnail directories like ~/.thumbnails? If you have many pictures in your directories and use file browsers with picture preview, .thumbnails may contain lots of thumbnails of pictures you've browsed through at some time. Personally I've gained lots of disk space by routinely deleting files under various hiding places for thumbnails...
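The empty-file experiment from point 3 can also be tried without touching the disk. This is a small Python sketch using the standard-library gzip, bz2 and lzma modules (lzma writes the .xz format); it only illustrates that every format has a fixed overhead, so the exact byte counts it prints are not the point.

```python
import bz2
import gzip
import lzma

empty = b""

# Each format has to write at least its header and trailer,
# so compressing zero bytes cannot produce zero bytes.
compressors = (("gzip", gzip.compress), ("bzip2", bz2.compress), ("xz", lzma.compress))
overhead = {name: len(compress(empty)) for name, compress in compressors}

for name, size in overhead.items():
    print(f"{name}: 0 bytes in, {size} bytes out")
```

The same fixed overhead is what makes a second or third compression pass over already-compressed data grow instead of shrink.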
















                      answered Apr 10 '13 at 22:51









                      Baard Kopperud





















                          up vote
                          4
                          down vote













                          Image formats such as png and jpeg are already compressed. The gain from compressing them again is minimal.
















                              answered Apr 10 '13 at 17:12









                              jordanm





















                                  up vote
                                  4
                                  down vote













                                  Another point worth raising: using multiple compression tools/algorithms can actually cause your final result to inflate and become larger than it needs to be. That is, if you compress 100GB down to 10GB and then try to compress it again, you may end up with more than 10GB, depending on what you are compressing and what you are compressing it with.



                                  Personally I never do anything more than tar cjvf container.tar.bz2 /target, simply because the amount of disk space saved by double compressing is minuscule.
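For anyone scripting this, a rough Python equivalent of that single-pass tar cjf call can be written with the standard-library tarfile module. The function name make_tar_bz2 and its parameters are made up for this sketch; the "w:bz2" mode is what makes tarfile archive and bzip2-compress in one step.

```python
import tarfile
from pathlib import Path


def make_tar_bz2(source_dir, archive_path):
    """Archive source_dir into a bzip2-compressed tarball in a single pass."""
    source = Path(source_dir)
    # "w:bz2" archives and compresses once, like `tar cjf archive_path source_dir`.
    with tarfile.open(archive_path, "w:bz2") as archive:
        # arcname keeps only the directory's own name inside the archive,
        # not the full path leading to it.
        archive.add(source, arcname=source.name)
    return Path(archive_path)
```

Calling make_tar_bz2("/target", "container.tar.bz2") should give roughly the same result as the command above, minus the verbose file listing.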














                                      edited Apr 10 '13 at 18:05

























                                      answered Apr 10 '13 at 17:55









                                      h3rrmiller





















                                          up vote
                                          4
                                          down vote













                                          As a mathematician, I feel like I should chime in and elaborate a bit. The question boils down to lossy versus lossless compression. Image compression like JPEG is lossy, and zipping is lossless.



                                          Lossy - depending on how much information you are willing to lose, you can always "compress" a file down to a smaller size, but the trade-off is that you permanently lose some information, and when you "decompress" you will not get the original file back. And yes, with lossy compression you can compress again and again and get down to 1 byte, but the result will be completely useless.



                                          Lossless - with this you do not lose any information at all, and when you "decompress" you get the original file back exactly. But here the trade-off is that a reduction in size is not guaranteed (easily proven using the pigeonhole principle). So some files will decrease in size, some will stay the same, and yes, some will actually increase in size. Lossless algorithms are therefore designed/optimized for specific kinds of data: they (losslessly) compress one kind of data very well and absolutely suck at others.



                                          This is where my computer science ignorance kicks in. I think the file zipping you are using is optimized for text, not for images, so it doesn't help with images. The images are already (lossy) compressed, and compressing them again won't help. If you (lossy) compress them again you might ruin the images and lose too much information... which is like saving them as JPEG with more emphasis on size than quality.



                                          I don't know if there is a lossless compression algorithm optimized for images, but that might help you. Or maybe there is an algorithm optimized for the specific type of images you are trying to compress, for example if they are black & white, contain certain color schemes, are always landscapes, or are all portraits.
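The pigeonhole argument mentioned above comes down to a simple count, which this small Python sketch spells out for bit strings. The function names are made up for the example; the only claim is the arithmetic itself.

```python
def count_strings(length_bits):
    """Number of distinct bit strings with exactly length_bits bits."""
    return 2 ** length_bits


def count_shorter_strings(length_bits):
    """Number of distinct bit strings strictly shorter than length_bits,
    counting the empty string as length 0."""
    return sum(2 ** k for k in range(length_bits))


n = 16
inputs = count_strings(n)
outputs = count_shorter_strings(n)
print(f"{inputs} inputs of {n} bits, but only {outputs} shorter outputs")
# prints "65536 inputs of 16 bits, but only 65535 shorter outputs"
```

There is always one fewer shorter string than there are inputs of a given length, so a lossless compressor that shrank every input would have to map two different inputs to the same output, and decompression could not tell them apart.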


























                                          • There is no advantage to be gained from optimizing an algorithm for a specific kind of data. Ultimately it's all words over an alphabet of 2 letters. If there are patterns they can be found, whether the data is text or images. However, there can be heuristic optimizations for data structure. I once wrote a little something that would interpret a compound PDF (i.e. a PDF with PDF images) and eliminate redundant font definitions and metadata. With this I can squeeze an average of 20% from my PDFs, without data loss, even though the original PDFs are already compressed.
                                            – Bananguin
                                            May 20 '16 at 7:57










                                          • And just to spell out the pigeonhole principle: if any string could be compressed into a shorter string without data loss, ultimately every string could be compressed into a string of length 1. As the number of characters is limited (e.g. 8 bits), while the length of input strings is not, having |available characters|+1 strings (e.g. 257), all with pairwise different lengths, would result in at least one compressed string appearing twice, and a decompression algorithm cannot know which original string to reconstruct from it. => Not every string can be compressed into a shorter string without data loss.
                                            – Bananguin
                                            May 20 '16 at 8:09














                                          up vote
                                          4
                                          down vote













                                          As a mathematician, I feel like I should chime in and elaborate a bit. The question boils down to lossy compression versus lossless compression. Image compression like jpeg is a lossy compression and zipping is lossless.



                                          Lossy - depending on how much information you are willing to lose, you can always "compress" a file down to a smaller size no matter what but the trade off is you will permanently lose some info and when you "decompress" you will not have the original file. And yes with lossy compression, you can compress again and again and get down to 1 byte but it'll be completely useless.



                                          Lossless - with this you will not lose any information at all and when you "decompress" you will have the original file exactly. But here the trade off is that a reduction in size is not guaranteed (easily proven using the pigeon-hole principle). So some file will decrease in size. Some will remain the same. And yes some can actually increase in size. So the lossless algorithms are designed/optimized for specific kind of data so they work at (losslessly) compressing one kind of data very well and absolutely suck at others.



                                          This is where my computer science ignorance kicks in. I think the file zipping you are using is optimized for text, not for images, so it doesn't help with images. The images are already (lossy) compressed, so compressing them again won't help. If you (lossy) compress them again, you might ruin the images and lose too much info... which is like saving them as JPEG with more emphasis on size than quality.
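The "compressing again won't help" effect can be sketched by stacking two lossless compressors, roughly like running gzip and then bzip2 on the same file. This is only an illustration under assumptions: the word list and sizes are invented, and zlib stands in for gzip.

```python
import bz2
import random
import zlib

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical pseudo-text: 50,000 words drawn at random from a tiny vocabulary.
words = [b"tar", b"gzip", b"bzip2", b"image", b"folder", b"server", b"space", b"file"]
original = b" ".join(rng.choice(words) for _ in range(50_000))

once = zlib.compress(original, 9)  # first pass removes the redundancy it can find
twice = bz2.compress(once, 9)      # second pass gets data that already looks random

print(f"original:          {len(original)} bytes")
print(f"after zlib:        {len(once)} bytes")
print(f"after zlib + bz2:  {len(twice)} bytes")
```

The first pass does nearly all the work; the second pass has almost nothing left to remove and mostly adds its own header. Already-compressed image files behave like the output of the first pass.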



                                          I don't know if there is a lossless compression algorithm optimized for images, but that might help you. Or maybe there is an algorithm optimized for the specific type of images you are trying to compress, e.g. if they are black & white, contain certain color schemes, are always landscapes, or are all portraits.
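For what it's worth, lossless image formats do exist: PNG, for example, runs a simple per-row filter before DEFLATE so that smooth pixel data turns into small, repetitive differences. The sketch below imitates PNG's "Sub" filter on invented data (a random walk standing in for one row of a smooth grayscale image); it is not a PNG encoder, and `unfilter` is a hypothetical helper name.

```python
import random
import zlib

rng = random.Random(2)  # fixed seed so the run is repeatable

# Stand-in for a smooth grayscale row: each pixel differs from its
# left neighbour by -1, 0 or +1 (wrapping around at 0 and 255).
pixels = bytearray()
value = 128
for _ in range(100_000):
    value = (value + rng.choice((-1, 0, 1))) % 256
    pixels.append(value)
raw = bytes(pixels)

# Sub filter: store each pixel as the difference to the pixel on its left (mod 256).
filtered = bytes([raw[0]] + [(raw[i] - raw[i - 1]) % 256 for i in range(1, len(raw))])

def unfilter(data: bytes) -> bytes:
    """Undo the Sub filter with a running sum mod 256, restoring the exact pixels."""
    out = bytearray()
    prev = 0
    for diff in data:
        prev = (prev + diff) % 256
        out.append(prev)
    return bytes(out)

raw_size = len(zlib.compress(raw, 9))
filtered_size = len(zlib.compress(filtered, 9))

print(f"zlib on raw pixels:      {raw_size} bytes")
print(f"zlib on filtered pixels: {filtered_size} bytes")
```

The filter is fully reversible, so nothing is lost, yet the general-purpose compressor does noticeably better on the filtered bytes. This only pays off on uncompressed pixel data; it will not shrink files that are already JPEGs.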


























                                          • There is no advantage to be gained from optimizing an algorithm for a specific kind of data. Ultimately it's all words over an alphabet of 2 letters. If there are patterns, they can be found, whether the data is text or pictures. However, there can be heuristic optimizations for data structure. I once wrote a little something that would interpret a compound PDF (i.e. a PDF with PDF images) and eliminate redundant font definitions and meta information. With this I can squeeze an average of 20% from my PDFs, without data loss, even though the original PDFs are already compressed.
                                            – Bananguin
                                            May 20 '16 at 7:57










                                          • And just to spell out the pigeonhole principle: if any string could be compressed into a shorter string without data loss, ultimately every string could be compressed into a string of length 1. As the number of characters is limited (e.g. 256 for 8-bit characters) while the length of input strings is not, taking |available characters|+1 strings (e.g. 257), all with pairwise different lengths, would result in at least one compressed string appearing twice, and a decompression algorithm cannot know which original string to reconstruct from it. => Not every string can be compressed into a shorter string without data loss.
                                            – Bananguin
                                            May 20 '16 at 8:09
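The counting behind the pigeonhole argument can be checked directly. A small sketch, with function names invented for illustration: it counts how many strings of a given length exist and how many strictly shorter strings (including the empty one) are available as compressed outputs.

```python
def count_strings(length: int, alphabet_size: int = 2) -> int:
    """Number of distinct strings of exactly this length."""
    return alphabet_size ** length

def count_shorter_strings(length: int, alphabet_size: int = 2) -> int:
    """Number of distinct strings strictly shorter than this length (empty string included)."""
    return sum(alphabet_size ** k for k in range(length))

n = 16
inputs = count_strings(n)            # 65536 possible 16-bit inputs
shorter = count_shorter_strings(n)   # 65535 possible shorter outputs

print(f"{inputs} inputs of {n} bits, but only {shorter} shorter bit strings")
```

There is always at least one more input than there are shorter outputs, so any scheme that shortens every input must map two inputs to the same output and can no longer be decompressed unambiguously.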












                                          answered Apr 12 '13 at 7:10









                                          Fixed Point






















                                          up vote
                                          1
                                          down vote













                                          Images, unless you're using raw or TIFF, already have "built-in compression". Trying to compress them again will most likely do more harm than good by adding extra headers.






















                                          • 1




                                            TIFF can be either lossily or losslessly compressed. Camera RAW formats are often compressed (easily shown by looking at file sizes; if they differ by more than a minuscule amount attributable to metadata and a possible embedded thumbnail, then the raw data is very likely compressed to some degree).
                                            – Michael Kjörling
                                            Apr 12 '13 at 9:00














                                          answered Apr 10 '13 at 17:12









                                          tink

























                                           



























































































