Delimit by space but ignore backslash space

5678
testing,\ group
[testing
ip\ 5.6.7.8
launch-wizard-1 0.0.0.0/0
456dlkjfa
1.2.3.4
test 1.2.3.4/32 4.3.2.0/23 4.3.2.0/23
default 4.3.2.0/23 4.3.2.0/23
launch-wizard-2 0.0.0.0/0
launch-wizard-3 0.0.0.0/0
2.3.4.5/32


I would like to get the first column of the above, but the catch is that I need to treat "\ " (backslash space) as part of the column, so awk '{print $1}' should give me



5678
testing,\ group
[testing
ip\ 5.6.7.8
launch-wizard-1
456dlkjfa
1.2.3.4
test
default
launch-wizard-2
launch-wizard-3
2.3.4.5/32
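
To make the problem concrete, here is a minimal sketch of what plain awk does with such input. The helper name naive_first_field is hypothetical, and the two input lines are modelled on the sample above: awk ends the first field at the first space, whether or not a backslash precedes it.

```shell
# Default awk splitting: every run of blanks separates fields, so the
# backslash in front of a space has no special meaning to awk.
naive_first_field() {
    awk '{ print $1 }'
}

printf '%s\n' 'ip\ 5.6.7.8' 'launch-wizard-1 0.0.0.0/0' | naive_first_field
# prints:
#   ip\
#   launch-wizard-1
```

The first line comes out as ip\ rather than the wanted ip\ 5.6.7.8, which is the behaviour the answers below work around.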









  • Is \ always treated as an escape character, or is only "\ " (backslash space) special? For instance, is a\\ b one field or two?
    – Gregory Nisbet
    Sep 25 at 23:49

  • @GregoryNisbet The \ I've put in is an escape character, not part of the real data.
    – GypsyCosmonaut
    Sep 26 at 0:13

  • If your data happened to contain a real backslash, how would it be represented?
    – Gregory Nisbet
    Sep 26 at 0:16

  • @GregoryNisbet Good question. Because I replaced only [[:space:]] with \[[:space:]], any \ in the original data is left untouched. After getting the first column, delimited only by plain spaces and not by \[[:space:]], I'd replace \[[:space:]] with [[:space:]] and be left with the original data again, including any \ it contains.
    – GypsyCosmonaut
    Sep 26 at 0:31















text-processing awk sed














edited Sep 25 at 15:54 by αғsнιη
asked Sep 25 at 15:00 by GypsyCosmonaut






















4 Answers
10 votes, accepted






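With GNU awk (gawk) you can use some zero-length assertions like \< or \>:

$ echo 'a\ b c' | gawk 'BEGIN{FS="\\> +"} {print $1}'
a\ b

Here \> matches only at the end of a word, so the line is split only where a field ends in a word character. Unfortunately gawk does not support the full-blown assertions from perl or PCRE (e.g. (?<!\\), (?<=\w), etc.):

$ echo 'a\ b, c' | perl -nle '@a=split /(?<!\\)\s+/, $_; print $a[0]'
a\ b,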













answered Sep 25 at 16:55 by mosvy (edited Sep 25 at 22:03)






















6 votes









You could substitute the escaped space ("\ ") with something else and change it back again afterwards.

sed 's/\\ /\\x20/g' data_file | awk '{ print $1; }' | sed 's/\\x20/\\ /g'















answered Sep 25 at 15:20 by RoVo











• Only with sed: sed 's/\\ /\\x20/g;s/ .*//;s/\\x20/\\ /g' data_file
  – ctac_
  Sep 25 at 16:26

• Or with awk alone, using the default value of the SUBSEP variable (\034) as the placeholder: awk '{gsub(/\\ /,SUBSEP,$0); val=$1; gsub(SUBSEP,"\\ ",val); print val}' file
  – glenn jackman
  Sep 25 at 18:29
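
The placeholder approaches above rely on the placeholder never occurring in the data. As an alternative sketch, not taken from any of the answers, the first field can be collected character by character in portable awk, so no placeholder is needed. The function name first_field_scan is hypothetical, and the sketch assumes that fields are separated by plain spaces and that lines do not start with a space.

```shell
# Walk each line from the left; "\ " is copied as part of the field,
# and the first space that is not preceded by a backslash ends it.
first_field_scan() {
    awk '{
        out = ""
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c == "\\" && substr($0, i + 1, 1) == " ") {
                out = out "\\ "   # keep the escaped space, skip past it
                i++
            } else if (c == " ") {
                break             # first unescaped space ends the field
            } else {
                out = out c
            }
        }
        print out
    }'
}

printf '%s\n' 'testing,\ group x y' 'default 4.3.2.0/23' | first_field_scan
# prints:
#   testing,\ group
#   default
```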

















5 votes









With just sed:

sed -r 's/^((([^\]*\\ ){1,})?[^ ]*).*/\1/' infile

Or shorter:

sed -r 's/^(([^\]*\\ )*[^ ]*).*/\1/' infile

The expression (([^\]*\\ ){1,})?[^ ]* matches:

• [^\]*\\ followed by a space: any run of characters other than a backslash, ending with a backslash followed by a space (note that a backslash does not need to be escaped inside a bracket expression, but it does outside one).
• ([^\]*\\ ){1,}: one or more occurrences of the above.
• (([^\]*\\ ){1,})?: the (...)? makes this part optional; ([^\]*\\ ){0,} or ([^\]*\\ )* could be used instead.
• ((([^\]*\\ ){1,})?[^ ]*): the optional part above followed by any run of non-space characters, captured as a group with \1 as its back-reference.
• ((([^\]*\\ ){1,})?[^ ]*).*: the group above followed by the rest of the line (.*).

The replacement part then prints only \1, which gives the output:

5678
testing,\ group
[testing
ip\ 5.6.7.8
launch-wizard-1
456dlkjfa
1.2.3.4
test
default
launch-wizard-2
launch-wizard-3
2.3.4.5/32
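
One detail of the expression above is worth trying out: [^\]* also matches spaces, so if an escaped space appears later on a line, the repeated group can run past the earlier unescaped spaces to reach it. The sketch below wraps the shorter expression and a stricter variant in two functions. The names first_field_sed and first_field_sed_strict and the variant itself are assumptions for illustration, not part of the answer, and -E is used here in place of -r.

```shell
# The shorter expression from the answer, unchanged apart from -E.
first_field_sed() {
    sed -E 's/^(([^\]*\\ )*[^ ]*).*/\1/'
}

# Variant (an assumption): the bracket expression also excludes the
# space, so the repeated group cannot cross an unescaped space.
first_field_sed_strict() {
    sed -E 's/^(([^\ ]*\\ )*[^ ]*).*/\1/'
}

printf '%s\n' 'testing,\ group x' | first_field_sed
# prints: testing,\ group
printf '%s\n' 'test 1.2.3.4/32 a\ b' | first_field_sed
# prints: test 1.2.3.4/32 a\ b
printf '%s\n' 'test 1.2.3.4/32 a\ b' | first_field_sed_strict
# prints: test
```

On the sample data from the question both versions give the same result, because no line there has an escaped space after the first column.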













answered Sep 25 at 15:53 by αғsнιη (edited Sep 25 at 16:00)




















5 votes









With GNU grep or compatible:

grep -Po '^(\\.|\S)*'

Or with ERE:

grep -Eo '^(\\.|[^\[:space:]])*'

That treats \ as a quoting operator, for whitespace as a delimiter, but also for itself. That is, on foo\\ bar input, it returns foo\\.
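
The difference between an escaped space and an escaped backslash can be checked with a short sketch. It assumes a grep that supports -o (GNU grep or compatible, as above), and the function name first_field_quoted is hypothetical.

```shell
# ERE form from the answer: a backslash quotes whatever character
# follows it; any other non-whitespace character is taken as is.
first_field_quoted() {
    grep -Eo '^(\\.|[^\[:space:]])*'
}

printf '%s\n' 'ip\ 5.6.7.8 rest' | first_field_quoted
# prints: ip\ 5.6.7.8
printf '%s\n' 'foo\\ bar' | first_field_quoted
# prints: foo\\
```

In the second case the first backslash quotes the second one, so the space after it is an ordinary delimiter and the field ends there.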
















answered Sep 25 at 22:03 by Stéphane Chazelas



























                                 
