How to split a file into paragraphs and name the resulting pieces based on an identifier present in each paragraph

I have a big file with more than 3264880 lines. I want to split it into multiple files based on the two strings "BEGIN JOB" and "END JOB". Each output file should be named after the Identifier that appears between the BEGIN JOB and END JOB lines.



Sample data



BEGIN JOB
Identifier "ADHOC_Extract"
DateModified "2018-10-02"
TimeModified "15.09.52"
BEGIN DSRECORD
Identifier "ROOT"
OLEType "CJobDefn"
Readonly "0"
Name "ADHOC_Extract"
END JOB

BEGIN JOB
Identifier "HOC_Extract"
DateModified "2018-11-02"
TimeModified "12.09.52"
BEGIN DSRECORD
Identifier "ROOT"
OLEType "CJobDefn"
Readonly "0"
Name "HOC_Extract"
END JOB


The expected output is two files, since my sample has just two blocks, but the real file has more than 1000 such blocks.



ADHOC_Extract.txt

BEGIN JOB
Identifier "ADHOC_Extract"
DateModified "2018-10-02"
TimeModified "15.09.52"
BEGIN DSRECORD
Identifier "ROOT"
OLEType "CJobDefn"
Readonly "0"
Name "ADHOC_Extract"
END JOB

HOC_Extract.txt

BEGIN JOB
Identifier "HOC_Extract"
DateModified "2018-11-02"
TimeModified "12.09.52"
BEGIN DSRECORD
Identifier "ROOT"
OLEType "CJobDefn"
Readonly "0"
Name "HOC_Extract"
END JOB


I am fine with writing a shell script for this.










      text-processing awk sed split






asked 29 mins ago by sirish, edited 1 min ago by don_crissti


          1 Answer
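          #!/bin/bash
          # Split test.txt into one file per job, named after the job's first Identifier.
          begin=0
          filename=
          while IFS= read -r; do
              # New job: remember it and wait for its Identifier line.
              [ "$REPLY" = "BEGIN JOB" ] && begin=1 && continue
              if [ "$begin" = 1 ] && [[ "$REPLY" =~ Identifier ]]; then
                  filename=${REPLY#*\"}         # strip everything up to the first quote
                  filename=${filename%\"}.txt   # strip the closing quote, append .txt
                  begin=0
                  echo "BEGIN JOB" > "$filename"
                  echo "$REPLY" >> "$filename"
              elif [ -n "$filename" ]; then
                  echo "$REPLY" >> "$filename"
              fi
          done < test.txt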





          answered 9 mins ago by Ipor Sircer
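The file name in the script above comes from two parameter expansions applied to the Identifier line. A minimal sketch of that step in isolation, using a line from the question's sample (the variable names `line` and `name` are only for illustration):

```bash
#!/bin/bash
line='Identifier "ADHOC_Extract"'   # sample line from the question

name=${line#*\"}    # remove the shortest prefix that ends in a double quote
name=${name%\"}     # remove the trailing double quote
filename=$name.txt

echo "$filename"    # prints ADHOC_Extract.txt
```

This assumes the line ends right after the closing quote. Trailing spaces or a carriage return (a file with DOS line endings) would stay in the name.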





            That might turn out to be quite slow on a 3 million line file.
            – steve
            3 mins ago
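Following up on the comment above about speed: a single awk pass may be quicker than a shell read loop on a file of this size. This is a minimal sketch, not a tested solution for the real data. It assumes the layout shown in the question: BEGIN JOB and END JOB at the start of a line (possibly after spaces), and the first Identifier line of each block carrying the name. The file name `jobs_sample.txt` is made up for the demo.

```bash
#!/bin/bash
# Demo input: the two jobs from the question, written to a scratch file.
cat > jobs_sample.txt <<'EOF'
BEGIN JOB
Identifier "ADHOC_Extract"
DateModified "2018-10-02"
TimeModified "15.09.52"
BEGIN DSRECORD
Identifier "ROOT"
OLEType "CJobDefn"
Readonly "0"
Name "ADHOC_Extract"
END JOB

BEGIN JOB
Identifier "HOC_Extract"
DateModified "2018-11-02"
TimeModified "12.09.52"
BEGIN DSRECORD
Identifier "ROOT"
OLEType "CJobDefn"
Readonly "0"
Name "HOC_Extract"
END JOB
EOF

awk '
    # Start of a block: the output name is not known yet.
    /^ *BEGIN JOB/ { injob = 1; out = ""; next }

    # First Identifier line of the block: take the text between the quotes.
    injob && out == "" && /Identifier/ {
        split($0, q, "\"")
        out = q[2] ".txt"
        print "BEGIN JOB" > out     # the first write truncates an existing file
    }

    # Copy every line of the block once the name is known.
    injob && out != "" { print > out }

    # End of a block: close the file so only one output is open at a time.
    /^ *END JOB/ { if (out != "") close(out); injob = 0; out = "" }
' jobs_sample.txt
```

Calling close() after each END JOB keeps only one output file open, which matters because some awk implementations limit how many files can be open at once. If two jobs share the same Identifier, the later one overwrites the earlier file.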











