cat on big files does not work

I'm trying to concatenate four big files into two. The files *_1P.gz contain the same number of lines as the corresponding *_2P.gz files.



The files A_1P.gz and A_2P.gz both contain 1104507560 lines.

The files B_1P.gz and B_2P.gz both contain 1182136972 lines.



However, cat A_1P.gz B_1P.gz > C_1P.gz| wc -l returns 186974687 lines, and cat A_2P.gz B_2P.gz > C_2P.gz| wc -l returns 182952523 lines. So both outputs are not only far smaller than the two input files (they should be more than 2B lines long, and they're less than 200M instead), but they also have different numbers of lines. The commands ran without showing any errors.

I can't understand what's happening; I generated those four big files with cat as well, and it worked properly.



  • What could the problem be?

  • What other options do I have to concatenate gzipped files without using cat?

I'm working on a CentOS server. I still have 197G of free space, so that shouldn't be an issue (or it should show an error, at least).







  • When you're counting lines, don't you want to count the uncompressed lines? Also the pipeline cat file1 file2 >file3 | wc -l does not make sense as wc would get no data. What's the command that you are actually using?
    – Kusalananda
    Jul 19 at 13:28










  • What command(s) did you use to count the lines in the original files? It's possible that you unintentionally used some wrapper that silently decompressed them first. Try showing the size in bytes (using wc -c) instead of lines.
    – JigglyNaga
    Jul 19 at 13:28











  • @Kusalananda I obtained the line count of the four big files doing zcat *P.gz | wc -l. The actual command was cat file1 file2 > file3; wc -l file3, but actually I didn't precede it with zcat, and that might be the root of my problem. If that's so, I'll feel really stupid...
    – LinuxBlanket
    Jul 19 at 13:31










  • @LinuxBlanket yes, you need to count the uncompressed lines, since lines are defined by \n and there is no reason to expect to have a specific number of \n characters in the compressed file.
    – terdon♦
    Jul 19 at 13:34






  • Obviously a cat eating big files is not a healthy diet. :)
    – Rui F Ribeiro
    Jul 19 at 13:58
















asked Jul 19 at 13:24 by LinuxBlanket
edited Jul 19 at 14:14 by Kusalananda







1 Answer
Note that the files are compressed. You therefore can't use wc -l on them directly to count the original number of lines; you have to decompress them first.



It's OK to use cat to concatenate gzip-compressed files: the gzip format allows several compressed members to follow one another, so the resulting file is a valid compressed file in itself. Decompressing it later yields the concatenation of the uncompressed data from the two files.



cat A_1P.gz B_1P.gz >C_1P.gz
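
The claim that the concatenated file is valid can be checked on small scratch files. The following is only a sketch: the file names and contents are made up for illustration and are not the files from the question.

```shell
#!/bin/sh
# Scratch files with illustrative names; not the files from the question.
tmpdir=$(mktemp -d)
printf 'a1\na2\n' > "$tmpdir/A.txt"
printf 'b1\nb2\nb3\n' > "$tmpdir/B.txt"
gzip -c "$tmpdir/A.txt" > "$tmpdir/A.gz"
gzip -c "$tmpdir/B.txt" > "$tmpdir/B.gz"

# Concatenate the compressed files byte for byte.
cat "$tmpdir/A.gz" "$tmpdir/B.gz" > "$tmpdir/C.gz"

# gzip -t tests the integrity of the result without writing any output.
if gzip -t "$tmpdir/C.gz"; then integrity=ok; else integrity=bad; fi

# Decompressing C.gz should give exactly A.txt followed by B.txt.
cat "$tmpdir/A.txt" "$tmpdir/B.txt" > "$tmpdir/expected.txt"
if gzip -dc "$tmpdir/C.gz" | cmp -s - "$tmpdir/expected.txt"; then
    same=yes
else
    same=no
fi

echo "integrity: $integrity, identical to concatenated plain text: $same"
rm -rf "$tmpdir"
```

The same two checks (gzip -t and a comparison of the decompressed stream) should also work on the real files, although on files of this size they will take a while.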


To count the number of lines in C_1P.gz:



zcat C_1P.gz | wc -l


or



gunzip -c C_1P.gz | wc -l


or



gzip -dc C_1P.gz | wc -l


Either way, note that we need to decompress the file to count its lines. Otherwise we would be counting the "random" newline bytes that happen to occur in the compressed data, and these have nothing to do with the lines in your uncompressed file.
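
To see the difference between the two ways of counting, here is a small sketch on generated data. The files and line counts are assumptions chosen for the demonstration, and the number reported for the raw compressed file depends on the compressed bytes, so no particular value should be expected there.

```shell
#!/bin/sh
# Hypothetical demo data in a scratch directory; names are illustrative only.
tmpdir=$(mktemp -d)
seq 1 1000 | gzip -c > "$tmpdir/A.gz"
seq 1 2500 | gzip -c > "$tmpdir/B.gz"
cat "$tmpdir/A.gz" "$tmpdir/B.gz" > "$tmpdir/C.gz"

# Correct: count lines in the decompressed stream.
lines_a=$(gzip -dc "$tmpdir/A.gz" | wc -l)
lines_b=$(gzip -dc "$tmpdir/B.gz" | wc -l)
lines_c=$(gzip -dc "$tmpdir/C.gz" | wc -l)

# Misleading: count newline bytes in the compressed data itself.
raw_c=$(wc -l < "$tmpdir/C.gz")

echo "A: $lines_a, B: $lines_b, C: $lines_c (sum of A and B: $((lines_a + lines_b)))"
echo "newline bytes inside the compressed C.gz: $raw_c"
rm -rf "$tmpdir"
```

The decompressed count of C.gz matches the sum of the two inputs, while the raw count is unrelated to either.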






  • ...yes, I realized thanks to your comment that, unlike for A_1P.gz and B_1P.gz line count, I didn't uncompress the file before counting lines, and doing zcat file | wc -l yielded the correct line number. I'm sorry for the silly question, I don't know how I didn't see it before...
    – LinuxBlanket
    Jul 19 at 13:54










  • @LinuxBlanket It's an easy mistake to make.
    – Kusalananda
    Jul 19 at 14:00










answered Jul 19 at 13:31 by Kusalananda (edited Jul 19 at 14:15)










