Extract middle section of lines of a text file?

I am writing a PHP script to parse a large text file and do database inserts from it. However, on my host the file is too large, and I hit the PHP memory limit.
The file has about 16,000 lines; I want to split it up into four separate files (at first) to see if I can load those.
The first part I can get with head -4000 file.txt. The middle sections are slightly trickier -- I was thinking about piping tail output into head ( tail -4001 file.txt | head -4000 > section2.txt ), but is there another/better way?
Actually my logic is messed up -- for section two, I would need to do something like tail -12001 file.txt | head -4000, and then lower the tail argument for the next sections. I'm getting mixed up already! :P
shell command-line text-processing
edited Dec 30 '18 at 20:34
Peter Mortensen
asked Oct 14 '11 at 16:56
user394
2 Answers
If you want to stick with tail and head without getting mixed up, you can have tail count lines from the beginning of the file instead of the end by giving it -n +N, which starts output at line N:
tail -n +4001 yourfile | head -4000
... But a better, automatic tool made just for splitting files is called... split! It's also a part of GNU coreutils, so any normal Linux system should have it. Here's how you can use it:
split -l 4000 yourInputFile thePrefixForOutputFiles
(See man split if in doubt.)
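A minimal sketch comparing the two approaches, scaled down to a 16-line file and sections of 4 lines so the result is easy to inspect (the arithmetic is the same for 16,000 and 4000). The helper name `lines_from`, the scratch directory, and the file names `sample.txt`, `section2.txt` and `part_` are assumptions made up for this illustration; it also assumes `seq` and `mktemp` are available.

```shell
# lines_from START COUNT FILE: print COUNT lines beginning at line START.
# (Hypothetical helper name, not part of the answer above.)
lines_from() {
    tail -n "+$1" "$3" | head -n "$2"
}

# Throwaway input in a scratch directory: line N holds the number N.
workdir=$(mktemp -d)
seq 1 16 > "$workdir/sample.txt"

# Section two by hand: start at line 5, keep 4 lines (lines 5-8).
lines_from 5 4 "$workdir/sample.txt" > "$workdir/section2.txt"

# All sections at once: split names the pieces part_aa, part_ab, ...
split -l 4 "$workdir/sample.txt" "$workdir/part_"

# The second piece from split should be identical to the hand-made one.
cmp "$workdir/section2.txt" "$workdir/part_ab" && echo "section 2 matches part_ab"
```

For the original file, the equivalent calls would be `lines_from 4001 4000 file.txt` and `split -l 4000 file.txt section_`.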
Combining head and tail as you did will work, but for this I would use sed:
sed -n '1,4000p' input_file # print lines 1-4000 of input_file
This lets you solve your problem with a quick shell function:
chunk_it() {
    step=4000   # lines per chunk
    start=1
    end=$step
    for n in {1..4} ; do
        # print lines start..end of the file into e.g. your_file.1-4000
        sed -n "${start},${end}p" "$1" > "$1.$start-$end"
        let start+=$step
        let end+=$step
    done
}
chunk_it your_file
Now you have your_file.1-4000 and your_file.4001-8000 and so on.
Note: requires bash
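A possible refinement, sketched here as an assumption rather than as part of the answer: after printing the range, sed still reads the rest of the file, so adding a `q` command at the last wanted line lets it stop early. The variant below also takes the chunk size as a parameter and derives the number of chunks from `wc -l`. The names `print_range` and `chunk_file` and the file `demo.txt` are invented for illustration, and this version avoids bash-only syntax.

```shell
# print_range START END FILE: print lines START..END, then stop reading.
print_range() {
    sed -n "${1},${2}p;${2}q" "$3"
}

# chunk_file FILE SIZE: write FILE.START-END pieces of SIZE lines each.
chunk_file() {
    # Arithmetic expansion strips the padding some wc implementations add.
    total=$(( $(wc -l < "$1") ))
    start=1
    while [ "$start" -le "$total" ]; do
        end=$((start + $2 - 1))
        # The last chunk may be shorter than SIZE.
        if [ "$end" -gt "$total" ]; then
            end=$total
        fi
        print_range "$start" "$end" "$1" > "$1.$start-$end"
        start=$((end + 1))
    done
}

# Demo on a throwaway 10-line file, in chunks of 4 lines.
workdir=$(mktemp -d)
seq 1 10 > "$workdir/demo.txt"
chunk_file "$workdir/demo.txt" 4
ls "$workdir"
```

With the original file this would be called as `chunk_file your_file 4000`.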
I like the sed way.
– fanchyna
Feb 20 '16 at 15:38
This doesn't work for me because sed doesn't exit. It prints out the lines I want to stdout, but I have to ctrl-c out, and as a result, I can't redirect it to a file. Any suggestion to make it usable?
– Brent212
Jun 30 '17 at 18:41
Figured it out! "sed -n '<start_line>,<end_line>w <output_file>' <input_file>" works for me.
– Brent212
Jun 30 '17 at 18:54
@Brent212 Another option to note is that you can also pipe it into less or redirect the output to a file.
– Kyle s
Dec 19 '18 at 19:54
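The two forms discussed in these comments can be compared side by side. This is only a sketch on a throwaway 16-line file; the scratch directory and the file names are made up for illustration. One guess about the hang described above (an assumption, since the exact command is not shown): if the input file argument is left off, sed waits for input on standard input and appears never to exit.

```shell
# Throwaway input in a scratch directory: line N holds the number N.
workdir=$(mktemp -d)
seq 1 16 > "$workdir/input.txt"

# Form 1: sed writes lines 5-8 itself using the w command.
# w treats the rest of the script line as the file name, so it comes last.
sed -n "5,8w $workdir/section2.txt" "$workdir/input.txt"

# Form 2: print the range and let the shell redirect it to a file.
sed -n '5,8p' "$workdir/input.txt" > "$workdir/section2_redirect.txt"

# Both files should hold the same four lines.
cmp "$workdir/section2.txt" "$workdir/section2_redirect.txt" && echo "both forms agree"
```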
edited Oct 14 '11 at 17:19
answered Oct 14 '11 at 17:13
rozcietrzewiacz
answered Oct 14 '11 at 17:16
Sorpigal