Merge possibly truncated gzipped log files


I have multiple log files from each day that I need to merge together. Each comes from a different server. The job that puts them there sometimes gets interrupted and files get truncated. In that case the file gets written with a different name next time it runs. So I may end up with a list of log files like:




  • server-1-log.gz (Yesterday's log file)


  • server-1-log.1.gz (Today's log file that got interrupted while transferring and is truncated)


  • server-1-log.2.gz (Today's log file re-transferred and intact)


  • server-2-log.gz (Yesterday's log file)


  • server-2-log.1.gz (Today's log file)

All the log files start with a time stamp on each line, so it is fairly trivial to sort and de-duplicate them. I've been trying to merge these files using the command:



zcat *.gz | sort | uniq | gzip > /tmp/merged.gz


The problem is that the truncated log file produces the following error from zcat:




gzip: server-1-log.1.gz: unexpected end of file




It turns out that zcat completely exits when it hits this error, without reading all the data from the other files. I end up losing the data that exists in the other good files because one of the files is corrupt. How can I fix this?



  • Can I tell zcat not to exit on errors? I don't see anything in the man page for it.

  • Can I fix truncated gzip files before calling zcat?

  • Can I use a different decompression program instead?






      asked Oct 20 '17 at 15:10 by Stephen Ostermiller




















          2 Answers
            Accepted answer (score: 1)






            I’m guessing you’re using the gzip script version of zcat. That just executes gzip -dc, which can’t be told to ignore errors and stops when it encounters one.



            The documented fix for an individual corrupted compressed file is itself to run it through zcat, so there is no separate repair step to fall back on.
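To illustrate what running a single damaged file through the decompressor gives you, here is a minimal sketch. It assumes GNU gzip behaviour, where (as far as I know) the data decoded before the error is still written out. The function name salvage_gz and its two arguments are hypothetical. Note that the last recovered line may be cut off mid-line, which matters if the result is later fed to sort and uniq.

```shell
#!/bin/sh
# salvage_gz DAMAGED RECOVERED   (hypothetical helper, not part of gzip)
# Write whatever gzip can still decode from DAMAGED into RECOVERED.
salvage_gz() {
    # gzip reports the damage on stderr and exits non-zero; the readable
    # prefix of the data is expected to be in RECOVERED by then.
    gzip -dc < "$1" > "$2" ||
        printf 'note: %s is damaged; %s holds the readable part\n' "$1" "$2" >&2
}
```

For example, `salvage_gz server-1-log.1.gz /tmp/server-1-partial.log` would leave the readable part of the truncated log in /tmp/server-1-partial.log.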



            To process your files, you can either loop over them (with a for loop or xargs as you found), or use Zutils which has a version of zcat which continues processing when it encounters errors.
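The for loop variant could look roughly like this. It is only a sketch: the function name merge_gz_logs and the warning text are made up for illustration, and it assumes a POSIX shell with gzip and sort available. Each file gets its own gzip process, so a truncated archive only affects its own output.

```shell
#!/bin/sh
# merge_gz_logs OUTPUT FILE...   (hypothetical helper)
# Decompress each FILE separately, then sort, de-duplicate and recompress.
merge_gz_logs() {
    out=$1
    shift
    for f in "$@"; do
        # A truncated file makes this one gzip exit non-zero;
        # report it and carry on with the next file.
        gzip -dc < "$f" ||
            printf 'warning: %s is damaged or truncated\n' "$f" >&2
    done | sort -u | gzip > "$out"
}
```

It would be called as `merge_gz_logs /tmp/merged.gz *.gz`. Here `sort -u` stands in for `sort | uniq`.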






            answered Oct 20 '17 at 15:27 by Stephen Kitt











            • Is the "continue processing on errors" from zutils standard, or does it need a command line argument for it?
              – Stephen Ostermiller
              Oct 20 '17 at 16:56










            • It’s the default behaviour, at least for zcat.
              – Stephen Kitt
              Oct 20 '17 at 17:08










            • Thanks. zutils just got installed on all my servers. Seems like a nice package with significant functionality over the standard stuff.
              – Stephen Ostermiller
              Oct 20 '17 at 17:13
















                Answer (score: 0)









                I found a way to do it. I can run each file through its own instance of zcat. To do so, I can use xargs -n 1 to start an instance of zcat for each file:



                echo *.gz | xargs -n 1 zcat | sort | uniq | gzip > /tmp/merged.gz


                The zcat instance for the truncated file still fails, but the others run to completion, so the failure doesn't kill the whole pipeline.
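One caveat with echo *.gz is that file names containing spaces or newlines get split by xargs. A possible variant, sketched under the assumption that your xargs supports -0 (GNU and BSD xargs do), passes the names NUL-separated instead. The function name merge_with_xargs is hypothetical, and gzip -dc is used in place of zcat since the two are equivalent for .gz files.

```shell
#!/bin/sh
# merge_with_xargs OUTPUT FILE...   (hypothetical helper)
merge_with_xargs() {
    out=$1
    shift
    # One gzip process per file; NUL separators keep unusual file names intact.
    printf '%s\0' "$@" | xargs -0 -n 1 gzip -dc | sort -u | gzip > "$out"
}
```

Usage would be `merge_with_xargs /tmp/merged.gz *.gz`. As with the original command, the gzip run on the truncated file still prints its error, and the remaining files are processed anyway.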






                answered Oct 20 '17 at 15:26 by Stephen Ostermiller



























                     
