How shall I perform multiline matching and substitution using awk?

Votes: 1

In a text file, ignoring any trailing whitespace, I assume that a line which does not end with a digit has been broken, i.e. that it continues on the next line. I would like to find these line breaks and join the pieces back into one line. For example:



line 1
li
ne 2


There is a spurious line break between the second and third lines, so the file should become:



line 1
line 2


To find such line breaks, I need to do multiline matching. I tried to do it by changing the record separator, but the following doesn't work:



$ awk 'BEGIN{RS="";}; { if (match($0, /[^[:digit:] ] *\n/)) print $0; }' inputfile


I am also still unsure how to then concatenate two lines separated by such a line break.



Thanks.







text-processing awk gawk






asked yesterday by Tim











  • Setting RS to the empty string turns on paragraph mode (records are separated by runs of empty lines), not 'multiline matching', which is always on in awk. It's no wonder your script doesn't work: it just treats the whole file as a single record and prints it, terminated by an extra newline (ORS). Also, there's absolutely no point in using the match() function if you're not using its return value or the RSTART or RLENGTH variables.
    – mosvy
    yesterday
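
A minimal sketch of what the comment describes, assuming a POSIX awk and a made-up sample input (the function name show_records is hypothetical). It shows that RS="" splits the input on blank lines, and that match() reports where a pattern matched through its return value:

```shell
# Hypothetical helper: describe each paragraph-mode record.
show_records() {
  awk 'BEGIN { RS = "" }   # paragraph mode: blank lines separate records
  {
    n = split($0, lines, "\n")
    # match() returns the position of the first match, or 0 if there is none
    pos = match($0, /\n/)
    printf "record %d: %d line(s), first newline at %d\n", NR, n, pos
  }'
}

printf 'a\nb\n\nc\n' | show_records
```

A file without any empty line, like the one in the question, therefore arrives as one single record.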

















4 Answers
Votes: 1 (accepted)

You could run something along the lines of

awk 'BEGIN{RS=SUBSEP; ORS=""} {print gensub(/([^0-9])\n/,"\\1","g",$0)}' ex

  • RS=SUBSEP sets the record separator to a value that is never present in a text file, so the whole input file is slurped into $0
  • then do your favorite multiline transformation
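
Since gensub() is a GNU awk extension, here is a hedged sketch of the same slurp-then-transform idea for a plain POSIX awk. It is not the answer's own method: the function name join_slurped is hypothetical, and the pattern additionally tolerates trailing blanks after the final digit, as the question asks.

```shell
# Hypothetical helper: read everything, then delete each newline that ends
# a line whose last non-blank character is not a digit.
join_slurped() {
  awk '
  { buf = buf $0 "\n" }                 # slurp: collect the whole input
  END {
    # a non-digit, non-blank character, optional blanks, then a newline
    while (match(buf, /[^0-9 \n] *\n/)) {
      last = RSTART + RLENGTH - 1       # position of the matched newline
      buf = substr(buf, 1, last - 1) substr(buf, last + 1)
    }
    printf "%s", buf                    # buf already carries its newlines
  }'
}

printf 'line 1\nli\nne 2\n' | join_slurped
```

The loop rescans the buffer from the start after every join, which is fine for small files but slow for large ones. If the very last line does not end in a digit, its final newline is removed as well.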





answered yesterday by JJoao, edited 12 hours ago











  • Thanks. Do you know how to do matching without substitution in the multiline case?
    – Tim
    yesterday

  • I was wondering whether this reply fails in some cases. Why was it downvoted?
    – Tim
    yesterday

  • Is RS="\f" also a working solution?
    – Tim
    23 hours ago

  • This seems to add an empty line at the end of the output. I'm not sure exactly why at the moment.
    – Kusalananda (1 upvote)
    23 hours ago

  • @JJoao In general, print non-record data with printf and records with print. Since you're operating in "slurp mode" here (so to speak) and therefore do not really operate on records, it would be appropriate to use printf.
    – Kusalananda (1 upvote)
    12 hours ago
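
On the question of matching without substitution, one possible sketch (an assumption of how it could be done, not something posted in this thread) is to slurp the input and walk through it with match(), using RSTART and RLENGTH to locate each hit. The function name find_breaks and the report format are made up for the illustration; output goes through printf, as suggested in the last comment.

```shell
# Hypothetical helper: report the line number of every spurious line break
# without modifying the input.
find_breaks() {
  awk '
  { buf = buf $0 "\n" }                      # slurp the whole input
  END {
    rest = buf
    # the pattern includes the newline, so each match ends at a line break
    while (match(rest, /[^0-9 \n] *\n/)) {
      prefix = substr(rest, 1, RSTART + RLENGTH - 1)
      lineno += gsub(/\n/, "\n", prefix)     # gsub returns the newline count
      printf "spurious break at end of line %d\n", lineno
      rest = substr(rest, RSTART + RLENGTH)  # continue after this match
    }
  }'
}

printf 'line 1\nli\nne 2\n' | find_breaks
```

For the three-line example from the question this reports a break at the end of line 2.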
















Votes: 4

I would address it differently: by looping over the input until you find a "line-ending condition":

awk '{
  line=$0;
  while($0 !~ /[[:digit:]] *$/ && getline > 0) {
    line=line$0;
  }
  print line
}' < input

On an extended input file of:

line 1
li
ne 2
li
ne
number 3
line 4

Or, more verbosely (to see the trailing space):

$ cat -e input
line 1$
li$
ne 2$
li$
ne $
number 3$
line 4$

The output is:

line 1
line 2
line number 3
line 4





answered yesterday by Jeff Schaller, edited yesterday by qubert











  • Thanks. The script in your reply is very specific to the problem. I would like to see whether there is a more general script that lets me specify a multiline pattern and match (and substitute) its matches.
    – Tim
    yesterday

  • What "multiline patterns" are you thinking of?
    – RudiC
    yesterday
















Votes: 2

$ cat file
line 1
li
ne 2
lo
ng li
ne 3

$ awk 'line ~ /[0-9]$/ { print line; line = "" } { line = line $0 } END { print line }' file
line 1
line 2
long line 3

This accumulates an "output line" in the variable line, and whenever this variable ends with a digit, it is printed and reset. It is also printed at the very end to output the last line (whether complete or not).

Approximate sed equivalent (but with an explicit loop):

$ sed -e ':again' -e '/[0-9]$/{ p; d; }; N; s/\n//' -e 'tagain' file
line 1
line 2
long line 3
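
The accumulator approach can be made somewhat more general by passing the "this line is complete" condition in as a regular expression. The following is only a sketch under that assumption: the function name join_until and the awk variable end_re are hypothetical, and the rules are reordered so that a line is printed as soon as it is complete.

```shell
# Hypothetical helper: join input lines until the accumulated text matches
# the regular expression given as the first argument.
join_until() {
  awk -v end_re="$1" '
  { line = line $0 }                        # append the current input line
  line ~ end_re { print line; line = "" }   # complete: emit and start over
  END { if (line != "") print line }        # flush an unfinished last line
  '
}

printf 'line 1\nli\nne 2\n' | join_until '[0-9] *$'
```

Note that awk interprets backslash escapes in a value assigned with -v, so a pattern containing backslashes needs them doubled. Empty input lines are absorbed into the following line rather than preserved.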





answered 23 hours ago by Kusalananda




















Votes: 0

A small GNU sed solution?

sed ':L; /[0-9] *$/!{N; bL;}; s/\n//g' file





answered yesterday by RudiC, edited 23 hours ago by Kusalananda











  • Doesn't work for me?
    – andrew lorien
    23 hours ago















