Merge possibly truncated gzipped log files

I have multiple log files from each day that I need to merge together. Each comes from a different server. The job that puts them there sometimes gets interrupted and files get truncated. In that case the file gets written with a different name next time it runs. So I may end up with a list of log files like:
server-1-log.gz (yesterday's log file)
server-1-log.1.gz (today's log file that got interrupted while transferring and is truncated)
server-1-log.2.gz (today's log file, re-transferred and intact)
server-2-log.gz (yesterday's log file)
server-2-log.1.gz (today's log file)
All the log files start with a time stamp on each line, so it is fairly trivial to sort and de-duplicate them. I've been trying to merge these files using the command:
zcat *.gz | sort | uniq | gzip > /tmp/merged.gz
The problem is that the truncated log file produces the following error from zcat:
gzip: server-1-log.1.gz: unexpected end of file
It turns out that zcat completely exits when it hits this error, without reading all the data from the other files. I end up losing the data that exists in the other good files because one of the files is corrupt. How can I fix this?
- Can I tell zcat not to exit on errors? I don't see anything in the man page for it.
- Can I fix truncated gzip files before calling zcat?
- Can I use a different decompression program instead?
Tags: gzip, corruption
asked Oct 20 '17 at 15:10 by Stephen Ostermiller
2 Answers
Accepted answer (score 1), answered Oct 20 '17 at 15:27 by Stephen Kitt
I'm guessing you're using the gzip script version of zcat. That just executes gzip -dc, which can't be told to ignore errors and stops when it encounters one.
The documented fix for individual corrupted compressed files is to run them through zcat, so you won't get much help there...
To process your files, you can either loop over them (with a for loop or xargs, as you found), or use zutils, which has a version of zcat that continues processing when it encounters errors.
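A minimal sketch of the loop approach mentioned above (assuming the standard gzip zcat): each file is decompressed by its own zcat process, so an error on the truncated file only loses that file's missing tail rather than aborting the whole pipeline.

```shell
# Decompress each archive in its own zcat process; a truncated file
# produces an error and partial output, but the loop moves on to the
# next file instead of killing the whole pipeline.
for f in *.gz; do
    zcat "$f"    # may warn "unexpected end of file" on a truncated archive
done | sort | uniq | gzip > /tmp/merged.gz
```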
Is the "continue processing on errors" from zutils standard, or does it need a command line argument for it? – Stephen Ostermiller Oct 20 '17 at 16:56
It's the default behaviour, at least for zcat. – Stephen Kitt Oct 20 '17 at 17:08
Thanks. zutils just got installed on all my servers. Seems like a nice package with significant functionality over the standard stuff. – Stephen Ostermiller Oct 20 '17 at 17:13
Answer (score 0), answered Oct 20 '17 at 15:26 by Stephen Ostermiller
I found a way to do it. I can run each file through its own instance of zcat. To do so, I can use xargs -n 1 to start an instance of zcat for each file:
echo *.gz | xargs -n 1 zcat | sort | uniq | gzip > /tmp/merged.gz
The zcat reading the truncated file still fails, but the others run to completion, so one corrupt file no longer kills the whole pipe.
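One caveat with the echo-into-xargs form (an editorial note, not part of the original answer): xargs splits its input on whitespace, so it can misbehave on filenames containing spaces or quotes. Assuming GNU xargs with -0 support, a slightly more robust sketch is:

```shell
# NUL-terminate each filename so xargs passes names through intact,
# and let sort -u replace the separate uniq stage. Each zcat still
# runs on a single file, so one truncated archive can't abort the rest.
printf '%s\0' *.gz | xargs -0 -n 1 zcat | sort -u | gzip > /tmp/merged.gz
```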