Merge possibly truncated gzipped log files
I have multiple log files from each day that I need to merge together. Each comes from a different server. The job that puts them there sometimes gets interrupted, and files get truncated. In that case the file gets written under a different name the next time the job runs, so I may end up with a list of log files like:

server-1-log.gz    (yesterday's log file)
server-1-log.1.gz  (today's log file that got interrupted while transferring and is truncated)
server-1-log.2.gz  (today's log file, re-transferred and intact)
server-2-log.gz    (yesterday's log file)
server-2-log.1.gz  (today's log file)
All the log files start with a time stamp on each line, so it is fairly trivial to sort and de-duplicate them. I've been trying to merge these files using the command:

zcat *.gz | sort | uniq | gzip > /tmp/merged.gz

The problem is that the truncated log file produces the following error from zcat:

gzip: server-1-log.1.gz: unexpected end of file

It turns out that zcat exits completely when it hits this error, without reading the data from the remaining files. I end up losing the data that exists in the good files because one of the files is corrupt. How can I fix this?
- Can I tell zcat not to exit on errors? I don't see anything in its man page.
- Can I fix truncated gzip files before calling zcat?
- Can I use a different decompression program instead?
Tags: gzip, corruption
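For reference (not part of the original question): gzip -t can report which archives are intact before any merging is attempted. A minimal sketch, using hypothetical sample files created in a temp directory, with one file deliberately cut down to its 10-byte gzip header to simulate truncation:

```shell
# Create sample data: one intact archive, one truncated (hypothetical names,
# for illustration only).
dir=$(mktemp -d)
printf '2017-10-20T10:00 ok\n' | gzip > "$dir/intact.gz"
printf '2017-10-20T11:00 ok\n' | gzip > "$dir/truncated.gz"
head -c 10 "$dir/truncated.gz" > "$dir/t" && mv "$dir/t" "$dir/truncated.gz"

# gzip -t checks integrity without writing any decompressed output.
for f in "$dir"/*.gz; do
    if gzip -t "$f" 2>/dev/null; then
        echo "GOOD $f"
    else
        echo "BAD  $f"
    fi
done
```

Note that gzip -t only reports integrity; it does not recover the readable prefix of a truncated file.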
asked Oct 20 '17 at 15:10 by Stephen Ostermiller
2 Answers
Accepted answer (score 1)
I'm guessing you're using the gzip script version of zcat. That just executes gzip -dc, which can't be told to ignore errors and stops when it encounters one.

The documented fix for individual corrupted compressed files is to run them through zcat, so you won't get much help there...

To process your files, you can either loop over them (with a for loop or xargs, as you found), or use zutils, which provides a version of zcat that continues processing when it encounters errors.

answered Oct 20 '17 at 15:27 by Stephen Kitt
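The loop approach mentioned above can be sketched as follows. This uses hypothetical sample data in a temp directory, with one file deliberately cut down to its 10-byte gzip header to simulate the truncation described in the question:

```shell
# Build sample logs; truncate one to simulate an interrupted transfer.
dir=$(mktemp -d)
printf '2017-10-20T10:00 alpha\n2017-10-20T11:00 beta\n' | gzip > "$dir/server-1-log.gz"
printf '2017-10-20T10:30 gamma\n2017-10-20T11:00 beta\n' | gzip > "$dir/server-2-log.gz"
printf '2017-10-20T12:00 delta\n' | gzip > "$dir/server-1-log.1.gz"
head -c 10 "$dir/server-1-log.1.gz" > "$dir/t" && mv "$dir/t" "$dir/server-1-log.1.gz"

# One zcat invocation per file: the truncated file only aborts its own zcat,
# so every line from the intact files still reaches the pipe.
for f in "$dir"/server-*.gz; do zcat "$f"; done 2>/dev/null | sort -u | gzip > "$dir/merged.gz"

zcat "$dir/merged.gz"
```

Only the data inside the truncated archive itself is lost; the three unique lines from the intact files survive the merge.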
Comments:

- Is the "continue processing on errors" from zutils standard, or does it need a command line argument for it? – Stephen Ostermiller, Oct 20 '17 at 16:56
- It's the default behaviour, at least for zcat. – Stephen Kitt, Oct 20 '17 at 17:08
- Thanks. zutils just got installed on all my servers. Seems like a nice package with significant functionality over the standard stuff. – Stephen Ostermiller, Oct 20 '17 at 17:13
Answer (score 0)
I found a way to do it: I can run each file through its own instance of zcat. To do so, I use xargs -n 1 to start an instance of zcat for each file:

echo *.gz | xargs -n 1 zcat | sort | uniq | gzip > /tmp/merged.gz

The zcat instance for the truncated file still fails, but the other ones run to completion, so it doesn't kill the whole pipe.

answered Oct 20 '17 at 15:26 by Stephen Ostermiller
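One caveat worth noting (my addition, not part of the answer): echo plus xargs splits the file names on whitespace, so a log name containing a space would break this pipeline. A null-delimited variant avoids that while keeping the one-zcat-per-file behaviour. A sketch with hypothetical sample files, one per "server", whose names deliberately contain spaces:

```shell
# Sample archives in a temp dir, with spaces in the names on purpose.
dir=$(mktemp -d)
printf 'line-a\n' | gzip > "$dir/log one.gz"
printf 'line-b\n' | gzip > "$dir/log two.gz"

# printf '%s\0' emits each name NUL-terminated; xargs -0 reads them safely,
# and -n 1 still gives every file its own zcat so one bad file cannot abort
# the rest of the pipeline.
printf '%s\0' "$dir"/log*.gz | xargs -0 -n 1 zcat | sort -u | gzip > "$dir/merged.gz"

zcat "$dir/merged.gz"
```

Note that xargs -0 is a GNU/BSD extension rather than strict POSIX, but it is available wherever GNU findutils is installed.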