Merge possibly truncated gzipped log files

I have multiple log files from each day that I need to merge together. Each comes from a different server. The job that puts them there sometimes gets interrupted and files get truncated. In that case the file gets written with a different name next time it runs. So I may end up with a list of log files like:

server-1-log.gz (Yesterday's log file)

server-1-log.1.gz (Today's log file that got interrupted while transferring and is truncated)

server-1-log.2.gz (Today's log file re-transferred and intact)

server-2-log.gz (Yesterday's log file)

server-2-log.1.gz (Today's log file)

All the log files start with a time stamp on each line, so it is fairly trivial to sort and de-duplicate them. I've been trying to merge these files using the command:

zcat *.gz | sort | uniq | gzip > /tmp/merged.gz

The problem is that the truncated log file produces the following error from zcat:

gzip: server-1-log.1.gz: unexpected end of file

It turns out that zcat completely exits when it hits this error, without reading all the data from the other files. I end up losing the data that exists in the other good files because one of the files is corrupt. How can I fix this?

Can I tell zcat not to exit on errors? I don't see anything in the man page for it.

Can I fix truncated gzip files before calling zcat?

Can I use a different decompression program instead?

asked Oct 20 '17 at 15:10

Stephen Ostermiller


2 Answers

accepted

I'm guessing you're using the gzip script version of zcat. That just executes gzip -dc, which can't be told to ignore errors and stops when it encounters one.

The documented fix for individual corrupted compressed files is to run them through zcat, so you won't get much help there...
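As a rough illustration of that point, the sketch below finds which files are damaged and salvages whatever part of one is still readable. The function names list_damaged_gz and salvage_gz are made up for this example; the only real commands assumed are gzip -t (integrity test) and gzip -dc (decompress to standard output).

```shell
# Print the names of the given .gz files that fail gzip's integrity test.
list_damaged_gz() {
    for f in "$@"; do
        gzip -t -- "$f" 2>/dev/null || printf '%s\n' "$f"
    done
}

# Write whatever can still be decompressed from one file to stdout.
# gzip exits non-zero on a truncated input, so that status is ignored here.
salvage_gz() {
    gzip -dc -- "$1" 2>/dev/null || true
}
```

For example, list_damaged_gz ./*.gz would be expected to print server-1-log.1.gz for the files in the question, and salvage_gz server-1-log.1.gz > partial.log keeps the readable part. The salvaged output may end in the middle of a line.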

To process your files, you can either loop over them (with a for loop or xargs as you found), or use Zutils, whose version of zcat continues processing when it encounters errors.
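A minimal sketch of the for-loop option, assuming a POSIX shell plus gzip, awk and sort. The function name merge_gz_logs and its argument order are invented for this example. Each file gets its own gzip process, so a truncated file only ends that one process. The awk 1 stage is an addition of mine: it terminates a cut-off final line so that it is not glued onto the first line of the next file.

```shell
# merge_gz_logs OUTPUT INPUT.gz...
# Decompress every input separately, then sort and de-duplicate the lines.
merge_gz_logs() {
    out=$1
    shift
    for f in "$@"; do
        # gzip reports a damaged file on stderr; the loop simply moves on.
        # awk 1 re-prints each line, adding a final newline if it is missing.
        gzip -dc -- "$f" | awk 1
    done | sort -u | gzip > "$out"
}
```

It could be called as merge_gz_logs /tmp/merged.gz ./*.gz, with the output kept outside the directory being globbed. A fragment left by the truncated file stays in the result as its own line, since it does not match the complete line from the intact copy.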

answered Oct 20 '17 at 15:27

Stephen Kitt


Is the "continue processing on errors" from zutils standard, or does it need a command line argument for it?
– Stephen Ostermiller
Oct 20 '17 at 16:56

It's the default behaviour, at least for zcat.
– Stephen Kitt
Oct 20 '17 at 17:08

Thanks. zutils just got installed on all my servers. Seems like a nice package with significant functionality over the standard stuff.
– Stephen Ostermiller
Oct 20 '17 at 17:13


I found a way to do it. I can run each file through its own instance of zcat. To do so, I can use xargs -n 1 to start an instance of zcat for each file:

echo *.gz | xargs -n 1 zcat | sort | uniq | gzip > /tmp/merged.gz

The single zcat still fails, but the other ones run to completion. It doesn't kill the whole pipe.
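The same idea can be hardened a little, as a sketch rather than a definitive version. Passing the names NUL-separated keeps file names containing spaces intact, which echo *.gz does not. This assumes an xargs that supports -0 (GNU and BSD do). The function name merge_gz_xargs is hypothetical, and the awk 1 stage is my addition to terminate a cut-off last line from a truncated file.

```shell
# merge_gz_xargs OUTPUT INPUT.gz...
# Run one gzip per input file via xargs, then sort and de-duplicate.
merge_gz_xargs() {
    out=$1
    shift
    printf '%s\0' "$@" |
        xargs -0 -n 1 sh -c 'gzip -dc -- "$1" | awk 1' sh |
        sort -u | gzip > "$out"
}
```

A call such as merge_gz_xargs /tmp/merged.gz ./*.gz would replace the pipeline above. sort -u does the work of sort | uniq in one step.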

answered Oct 20 '17 at 15:26

Stephen Ostermiller
