Merging in Unix

I have a CSV-style file that uses vertical bars (|) as the delimiter, shown below. Some records are split across two lines, and I need to merge them back into single lines on Unix. The real file contains hundreds of thousands of records (four fields each); I have included only five here for ease of reading.



field1 |field2 | field3 |field4|
1|abc|def|ghi|
4|ijk|
|lmn|
5||opq|rst|
8|
uvw||xyz|
10|hjg|jsh|nbm|


I want the output to look like this:



field1|field2|field3|field4|
1|abc|def|ghi|
4|ijk||lmn|
5||opq|rst|
8|uvw||xyz|
10|hjg|jsh|nbm|


Can someone help me do this?







text-processing awk sed merge






edited Sep 25 at 20:54 by G-Man
asked Sep 25 at 17:59 by Sankar

  • So you want the leading and trailing spaces around the pipe symbols removed, as well as every newline except the one after each fourth pipe symbol? Is that correct?
    – Sam
    Sep 25 at 18:08

  • I’m sorry if you’re stuck with data that looks like this. While the answers presented so far will handle this mangled structure in the best case, the approach is very sensitive to data corruption. For example, if every record in a file is split across two lines (every line has two fields) and one line gets deleted or totally scrambled, the rebuilt output file will be wrong from that point on. You might want to specify that the first field (and only the first field) of each line is a number, so that error checking becomes possible. … (Cont’d)
    – G-Man
    Sep 25 at 21:09

  • (Cont’d) … P.S. Is it possible for parts of multiple records to be on the same line? For example, 1|abc|def| / ghi|4|ijk| / |lmn|? And is it possible for a field to be split across lines? For example, 10|hjg|j / sh|nbm|?
    – G-Man
    Sep 25 at 21:09
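
The error checking that G-Man suggests could be sketched in awk along the following lines. This is only an illustration under assumptions the question does not confirm: that the header line has been removed, that the first field of every complete record is a number, and that every merged record has exactly four fields followed by a trailing bar. The function name check_records is hypothetical.

```shell
# check_records: report merged records that do not look like
# "number|f2|f3|f4|". Assumes field1 is always numeric and that the
# header line is not part of the input (both are assumptions).
check_records() {
  awk -F'|' '
    NF != 5 || $1 !~ /^[0-9]+$/ {
      printf "line %d looks wrong: %s\n", NR, $0
      bad = 1
    }
    END { exit bad }   # non-zero exit status if anything was flagged
  ' "$@"
}

# Example: the second record is missing a field.
printf '%s\n' '1|abc|def|ghi|' '4|ijk||' '5||opq|rst|' | check_records ||
  echo "validation failed"
```

Running such a check on the merged output, rather than on the raw input, keeps the merge step and the sanity check independent of each other.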
















3 Answers

I'm assuming you don't want all those blank lines.

$ cat file
1|abc|def|ghi|
4|ijk|
|lmn|
5||opq|rst|
8|
uvw||xyz|
10|hjg|jsh|nbm|

$ awk -F'|' '{ while (NF < 5) { getline nextline; $0 = $0 nextline } } 1' file
1|abc|def|ghi|
4|ijk||lmn|
5||opq|rst|
8|uvw||xyz|
10|hjg|jsh|nbm|

Update for the question edit: remove whitespace around the field separator

awk -F'[[:blank:]]*[|][[:blank:]]*' -v OFS='|' '{
    while (NF < 5) { getline nextline; $0 = $0 nextline }
    $1 = $1; print
}' file
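
One edge case may be worth guarding against. If the input ends in the middle of a record, getline returns 0 and leaves nextline unchanged, so the loop above may never reach five fields or may re-append a stale line. A minimal sketch that stops at end of input by testing the return value of getline follows; the function name merge_records is made up for illustration, and like the answer it assumes that a complete record always contains four bars.

```shell
# merge_records: join physical lines until a record has 4 bars
# (5 awk fields when splitting on "|"), stopping at end of input.
merge_records() {
  awk -F'|' '
    {
      # (getline nextline) > 0 is true only when a line was actually read
      while (NF < 5 && (getline nextline) > 0)
        $0 = $0 nextline
      print
    }
  ' "$@"
}

# Example with a truncated final record ("8|" has no continuation):
printf '%s\n' '1|abc|def|ghi|' '4|ijk|' '|lmn|' '8|' | merge_records
```

With this guard a truncated last record is printed as it stands instead of being padded with old data, which makes it easier to spot afterwards.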






edited Sep 25 at 21:42
answered Sep 25 at 18:18 by glenn jackman

  • Genius solution! What do we call this process? May I kindly ask you to add some explanation for newbies like me? Thank you!
    – Shervan
    Sep 25 at 18:37

  • Is there any particular bit you're unclear about? I assume a while loop is clear. getline reads the next line into the given variable. Then I concatenate the current line with the next line, and we re-check the number of fields. Other awk help can be found on the awk tag info page.
    – glenn jackman
    Sep 25 at 18:40

  • Yes, $0 = $0 and the 1 at the end. Thank you for any clarification!
    – Shervan
    Sep 25 at 18:42

  • It's not $0=$0, it's "assign to $0 the concatenation of $0 and nextline". awk doesn't have a concatenation operator: other languages might want $0 = $0 + nextline, but with awk you just put strings or variables side-by-side. For clarity we can write $0 = ($0 nextline)
    – glenn jackman
    Sep 25 at 19:55

  • The 1 is a common awk idiom that means "print the current record". Follow the link I gave and do some reading: it's well documented.
    – glenn jackman
    Sep 25 at 19:56
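
The two idioms discussed in these comments can be tried in isolation. The snippet below is a small sketch with made-up sample strings, not part of the original answer.

```shell
# String concatenation in awk is plain juxtaposition; the parentheses
# are optional and only there for readability.
awk 'BEGIN { a = "4|ijk|"; b = "|lmn|"; c = (a b); print c }'

# A pattern with no action prints the record whenever the pattern is
# true. The constant 1 is always true, so these two commands produce
# the same output.
printf '%s\n' one two | awk '1'
printf '%s\n' one two | awk '{ print }'
```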













With GNU sed:

sed ':loop /\(.*|\)\{4\}.*/ !{N; s/\n//; b loop}; s/ *| */|/g' file

The command dissected:

:loop

The : signals a label that we can use for branches. "loop" is just the name that I chose for the label.

/\(.*|\)\{4\}.*/

This is a line-selector regex that matches lines containing 4 pipe symbols, each allowed to be preceded by zero or more arbitrary characters (.*|), with zero or more arbitrary characters allowed to follow the last pipe.

!{ ... }

Applies the commands in the braces to any line that did not match the previous regex.

N; s/\n//; b loop

N appends a newline and the next line from the source file to the current line in pattern space, then s/\n// removes that newline and b loop branches back to the label defined at the start, so the concatenated line is compared against the regex again.

Lastly

s/ *| */|/g

is applied to any line in pattern space before it is output. This removes any spaces around pipe symbols.
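
To see the building blocks of this explanation on their own, the two commands below isolate the effect of N and of the line-selector regex. They are a sketch using sample lines from the question, not part of the original answer.

```shell
# N appends a newline plus the next input line to the pattern space;
# here the embedded newline is made visible by replacing it with "+".
printf '%s\n' '4|ijk|' '|lmn|' | sed 'N; s/\n/+/'

# The address /\(.*|\)\{4\}/ selects lines with at least four bars;
# -n together with p prints only those lines.
printf '%s\n' '1|abc|def|ghi|' '4|ijk|' | sed -n '/\(.*|\)\{4\}/p'
```

The first command prints the two input lines joined by a plus sign, and the second prints only the complete record.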







edited Sep 26 at 7:26
answered Sep 25 at 18:25 by Sam

  • This code is not working!
    – Shervan
    Sep 25 at 18:35

  • It does for me, with GNU sed 4.4.
    – Sam
    Sep 25 at 18:37

  • sed --version reports: sed (GNU sed) 4.2.2 Copyright (C) 2012 Free Software Foundation, Inc.
    – Shervan
    Sep 25 at 18:38

  • Oh, man... the command is not at fault. You are definitely not typing it as displayed. You are using double quotes and your shell's history expansion feature is enabled.
    – Sam
    Sep 26 at 5:09

  • @Shervan is probably using csh or tcsh where that ! needs to be escaped, even inside single quotes.
    – Stéphane Chazelas
    Sep 26 at 7:37
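
If quoting the ! on the command line is the obstacle, one way to sidestep the shell is to keep the sed script in a file and run it with sed -f. The sketch below assumes a POSIX-style shell such as sh or bash for creating the file, and the temporary file is purely illustrative; any file written with an editor would do.

```shell
# Keep the sed script in a file so that no "!" ever reaches the shell's
# command-line parsing. The quoted 'EOF' delimiter stops the shell from
# expanding anything inside the heredoc.
script=$(mktemp)
cat > "$script" <<'EOF'
:loop
/\(.*|\)\{4\}/!{
N
s/\n//
b loop
}
s/ *| */|/g
EOF

printf '%s\n' '4|ijk|' '|lmn|' '5||opq|rst|' | sed -f "$script"
```

Writing one command per line also avoids any dependence on how a given sed parses labels and closing braces within a one-line script.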







































    If using Vim is an option:



    vim -Nesc 'g!/\(.*|\)\{4}$/j!' -cwq input.txt



    • -Nes runs Vim in silent Ex (script) mode with 'nocompatible' set, making it easier to automate


    • -c ... runs a Vim command after opening the file


    • g!/\(.*|\)\{4}$/j! - on every line (:g) that doesn't (!) match /\(.*|\)\{4}$/ (a regex matching 4 pipes separated by anything, ending the line), join the next line to it without inserting a space (:j!).


    • wq - save and quit.















        answered Sep 26 at 7:43









        muru




























             
