What is the quickest way to count the lines in a 4 TB file on Linux?
I have a 4 TB text file exported from Teradata, and I want to know how many records it contains.
text-processing cat wc
asked Mar 7 at 10:53 by Santosh Garole, edited Mar 7 at 10:56 by Jeff Schaller♦
Is each line a record? If yes, you can just use wc -l
– Panki
Mar 7 at 10:56
This doesn’t answer the stated question, but the fastest way would be to ask your Teradata system.
– Stephen Kitt
Mar 7 at 11:08
If the export happened to put a comment at the top, that'd make it pretty fast to find.
– Jeff Schaller♦
Mar 7 at 11:21
I tried using vim -R filename; it took around 1.5 hours.
– Santosh Garole
Mar 8 at 7:45
2 Answers
If this information is not already present as metadata in a separate file (or embedded in the data, or available through a query to the system you exported the data from), and if there is no index file of some description available, then the quickest way to count the number of lines is to run wc -l on the file.

You cannot really do it any quicker.

To count the number of records in the file, you will have to know what record separator is in use and count them with something like awk. Again, that is assuming this information is not already stored elsewhere as metadata, is not available through a query to the originating system, and the records themselves are not already enumerated and sorted within the file.
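As a minimal sketch of that record-counting idea: the separator byte and the file name below are assumptions, so substitute whatever separator the Teradata export actually uses.

# Count records rather than lines by setting RS to the record separator.
# "\036" (the ASCII record-separator byte) and export.txt are only
# placeholders for whatever the export really contains.
awk 'BEGIN { RS = "\036" } END { print NR }' export.txt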
answered Mar 7 at 10:58 by Kusalananda♦ (edited Mar 7 at 11:09)
You should not use line-based utilities such as awk and sed. These utilities issue a read() system call for every line of the input file (see that answer for why this is so). If you have lots of lines, that is a huge performance loss.

Since your file is 4 TB in size, I would guess it contains a great many lines. So even wc -l will produce a lot of read() system calls, since it reads only 16384 bytes per call (on my system). Still, that is an improvement over awk and sed. The best method, unless you write your own program, might simply be

cat file | wc -l

This is not a useless use of cat: cat reads chunks of 131072 bytes per read() system call (on my system), and wc -l will issue more calls, but against the pipe rather than the file directly. In any case, cat tries to read as much as possible per system call.
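If you want to verify the read() sizes on your own machine (the 16384 and 131072 figures above are from the answerer's system), strace can show them; "bigfile" below is just a placeholder name.

# Trace only read() calls; strace prints each call's requested size and
# return value on stderr, which is redirected into a log file here.
strace -e trace=read wc -l bigfile 2> wc-reads.log
strace -e trace=read cat bigfile > /dev/null 2> cat-reads.log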
answered Mar 7 at 12:16 by chaos (edited Mar 7 at 12:22)
Won't an I/O redirect be faster than cat and a pipe?
– RoVo
Mar 7 at 12:22
@RoVo Could be, have you tried it?
– chaos
Mar 7 at 12:25
Short test with 10 iterations of wc -l with a 701 MB file: wc -l file 1.7 s; wc -l < file 1.7 s; cat file | wc -l 2.6 s.
– RoVo
Mar 7 at 12:28
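A rough way to reproduce that comparison in bash; the file name is a placeholder, and results depend heavily on whether the file is already in the page cache.

# Time 10 repetitions of each variant; run "cat file > /dev/null" once
# beforehand if you want a warm-cache comparison.
time for i in $(seq 10); do wc -l file > /dev/null; done
time for i in $(seq 10); do wc -l < file > /dev/null; done
time for i in $(seq 10); do cat file | wc -l > /dev/null; done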