How to split a file into paragraphs and name the resulting pieces based on an identifier present in each paragraph
I have a big file with more than 3264880 lines. I want to split that file on the two strings "BEGIN JOB" and "END JOB" and write each block to its own file. The file name should be based on the Identifier that appears between the BEGIN JOB and END JOB lines.
Sample data
BEGIN JOB
Identifier "ADHOC_Extract"
DateModified "2018-10-02"
TimeModified "15.09.52"
BEGIN DSRECORD
Identifier "ROOT"
OLEType "CJobDefn"
Readonly "0"
Name "ADHOC_Extract"
END JOB
BEGIN JOB
Identifier "HOC_Extract"
DateModified "2018-11-02"
TimeModified "12.09.52"
BEGIN DSRECORD
Identifier "ROOT"
OLEType "CJobDefn"
Readonly "0"
Name "HOC_Extract"
END JOB
The expected output is two files, since my sample has just two blocks, but the real file will have more than 1000 such repeated patterns.
ADHOC_Extract.txt
BEGIN JOB
Identifier "ADHOC_Extract"
DateModified "2018-10-02"
TimeModified "15.09.52"
BEGIN DSRECORD
Identifier "ROOT"
OLEType "CJobDefn"
Readonly "0"
Name "ADHOC_Extract"
END JOB
HOC_Extract.txt
BEGIN JOB
Identifier "HOC_Extract"
DateModified "2018-11-02"
TimeModified "12.09.52"
BEGIN DSRECORD
Identifier "ROOT"
OLEType "CJobDefn"
Readonly "0"
Name "HOC_Extract"
END JOB
I am OK with writing a shell script for this.
text-processing awk sed split
edited 1 min ago
don_crissti
asked 29 mins ago
sirish
1 Answer
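#!/bin/bash
# Split test.txt into one file per job, named after the job's first Identifier.
begin=0
filename=
while IFS= read -r; do
    # A new job starts; its file name comes from the next Identifier line.
    if [[ $REPLY == "BEGIN JOB" ]]; then begin=1; continue; fi
    if [[ $begin == 1 && $REPLY =~ Identifier ]]; then
        filename=${REPLY#*\"}          # strip everything up to the first double quote
        filename=${filename%\"*}.txt   # strip the closing quote, then add .txt
        begin=0
        printf '%s\n' "BEGIN JOB" > "$filename"
        printf '%s\n' "$REPLY" >> "$filename"
    elif [[ -n $filename ]]; then
        # Skip any lines that appear before the first named job.
        printf '%s\n' "$REPLY" >> "$filename"
    fi
done < test.txt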
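The file name in the script above comes from two shell parameter expansions on the Identifier line: a prefix removal with `#` and a suffix removal with `%`. A minimal sketch of just that step, using a hypothetical variable `line` holding one line shaped like the sample data:

```shell
#!/bin/bash
# One Identifier line as it appears in the sample data.
line='Identifier "ADHOC_Extract"'

name=${line#*\"}    # remove the shortest prefix ending in a double quote
name=${name%\"*}    # remove the shortest suffix starting with a double quote
filename="$name.txt"

echo "$filename"    # prints ADHOC_Extract.txt
```

The double quotes have to be escaped inside the expansion; without the braces the shell treats `#*"` as literal text following `$REPLY`.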
That might turn out to be quite slow on a 3 million line file.
– steve
3 mins ago
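If the read loop does turn out to be slow, one possible alternative is a single awk pass. This is only a sketch, not something posted in the thread: `split_jobs` and `jobs.txt` are hypothetical names, and it assumes that the first Identifier line after each BEGIN JOB names the job and that the identifiers are unique and contain no slashes.

```shell
#!/bin/sh
# split_jobs FILE: hypothetical helper, writes one <Identifier>.txt per job
# into the current directory.
split_jobs() {
    awk '
        # A new job starts: buffer lines until its name is known.
        NF == 2 && $1 == "BEGIN" && $2 == "JOB" { injob = 1; out = ""; buf = $0; next }

        # Still looking for the first Identifier line of this job.
        injob && out == "" {
            if ($1 == "Identifier" && match($0, /"[^"]*"/)) {
                out = substr($0, RSTART + 1, RLENGTH - 2) ".txt"
                print buf > out      # creates or truncates the file
                print > out
            } else {
                buf = buf "\n" $0
            }
            next
        }

        # Inside a named job: copy lines, and release the file at END JOB.
        injob {
            print > out
            if (NF == 2 && $1 == "END" && $2 == "JOB") { close(out); injob = 0 }
        }
    ' "$1"
}

# Try it on a shortened copy of the sample data in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
cat > jobs.txt <<'EOF'
BEGIN JOB
Identifier "ADHOC_Extract"
DateModified "2018-10-02"
BEGIN DSRECORD
Identifier "ROOT"
END JOB
BEGIN JOB
Identifier "HOC_Extract"
DateModified "2018-11-02"
END JOB
EOF
split_jobs jobs.txt
ls
```

Calling `close(out)` at each END JOB keeps only one output file open at a time, which matters with more than 1000 jobs. Because a file is truncated when it is reopened, two jobs with the same Identifier would overwrite each other.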
answered 9 mins ago
Ipor Sircer