Delimit by space but ignore backslash space

5678
testing,\ group
[testing
ip\ 5.6.7.8
launch-wizard-1 0.0.0.0/0
456dlkjfa
1.2.3.4
test 1.2.3.4/32 4.3.2.0/23 4.3.2.0/23
default 4.3.2.0/23 4.3.2.0/23
launch-wizard-2 0.0.0.0/0
launch-wizard-3 0.0.0.0/0
2.3.4.5/32

I would like to get the first column of the above, but the catch is that I need to treat "\ " (a backslash followed by a space) as part of the column, so awk '{print $1}' should give me:

5678
testing,\ group
[testing
ip\ 5.6.7.8
launch-wizard-1
456dlkjfa
1.2.3.4
test
default
launch-wizard-2
launch-wizard-3
2.3.4.5/32






text-processing awk sed






edited Sep 25 at 15:54 by αғsнιη
asked Sep 25 at 15:00 by GypsyCosmonaut











  • Is \ always treated as an escape character, or is only "\ " special? For instance, is a\\ b one field or two?
    – Gregory Nisbet
    Sep 25 at 23:49

  • @GregoryNisbet The \ I've put in is an escape character, not part of the real data.
    – GypsyCosmonaut
    Sep 26 at 0:13

  • If your data happened to contain a real backslash, how would it be represented?
    – Gregory Nisbet
    Sep 26 at 0:16

  • @GregoryNisbet Good question. Because I replaced only [[:space:]] with \[[:space:]], any \ in the original data is left untouched. After getting the first column, delimited only by plain spaces and not by \[[:space:]], I'd replace \[[:space:]] with [[:space:]] again and be left with the original data, including its \.
    – GypsyCosmonaut
    Sep 26 at 0:31
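
The final step described in the last comment (turning each escaped blank back into a plain blank once the first column has been extracted) could look roughly like the sketch below. The function name unescape_blanks is made up for illustration, and it assumes that only a backslash directly followed by whitespace should lose its backslash.

```shell
# unescape_blanks: drop the backslash in front of any whitespace character.
# A backslash followed by anything else is left alone.
unescape_blanks() {
  sed 's/\\\([[:space:]]\)/\1/g'
}

printf '%s\n' 'testing,\ group' 'ip\ 5.6.7.8' | unescape_blanks
```

Because only a backslash directly followed by whitespace is rewritten, a backslash that belongs to the real data and sits in front of some other character survives unchanged.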

















4 Answers
With GNU awk (gawk) you can use some zero-length assertions such as \< or \>:

$ echo 'a\ b c' | gawk 'BEGIN{FS="\\> +"} {print $1}'
a\ b

but unfortunately not the full-blown ones from Perl or PCRE (e.g. (?<!\\), (?<=\w), etc.):

$ echo 'a\ b, c' | perl -nle '@a=split /(?<!\\)\s+/, $_; print $a[0]'
a\ b,
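
Note that an end-of-word assertion only lets the separator match right after a word character, so a first field that ends in punctuation (like the "b," in the Perl example) would presumably not be split off by the gawk version. As a rough portable alternative, the sketch below walks each line character by character in plain awk. The function name first_field is made up for illustration, and treating a backslash as escaping whatever character follows it is an assumption rather than something the question specifies.

```shell
# first_field: print the first blank-delimited field of each input line,
# treating a backslash as escaping the character that follows it.
first_field() {
  awk '{
    out = ""
    n = length($0)
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      if (c == "\\" && i < n) {
        # keep the backslash together with the escaped character
        out = out c substr($0, i + 1, 1)
        i++
      } else if (c == " " || c == "\t") {
        break   # an unescaped blank ends the field
      } else {
        out = out c
      }
    }
    print out
  }'
}

printf '%s\n' 'a\ b, c' 'ip\ 5.6.7.8' 'test 1.2.3.4/32' | first_field
```

This avoids any dependence on gawk extensions, at the cost of being more verbose than a field-separator regex.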






edited Sep 25 at 22:03
answered Sep 25 at 16:55 by mosvy (accepted answer)






















You could substitute the escaped space with something else (here the literal text \x20) and change it back again afterwards.

sed 's/\\ /\\x20/g' data_file | awk '{ print $1; }' | sed 's/\\x20/\\ /g'






answered Sep 25 at 15:20 by RoVo











  • With sed only: sed 's/\\ /\\x20/g;s/ .*//;s/\\x20/\\ /g' data_file
    – ctac_
    Sep 25 at 16:26

  • Or with awk, using the default value of the SUBSEP variable, \034: awk '{gsub(/\\ /,SUBSEP,$0); val=$1; gsub(SUBSEP,"\\ ",val); print val}' file
    – glenn jackman
    Sep 25 at 18:29
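
The placeholder round trip from this answer and its comments can be tried on a few sample lines as sketched below. The wrapper name first_col is made up for illustration, and the approach assumes that the literal text \x20 never occurs in the real data, since it would be turned into a backslash-space on the way back.

```shell
# first_col: sed-only placeholder round trip.
#   1. replace every backslash-space with the literal text \x20
#   2. cut the line at the first remaining space
#   3. turn the placeholder back into backslash-space
first_col() {
  sed 's/\\ /\\x20/g; s/ .*//; s/\\x20/\\ /g'
}

printf '%s\n' 'testing,\ group' 'launch-wizard-1 0.0.0.0/0' 'ip\ 5.6.7.8 extra' | first_col
```

If the data might contain the placeholder text, a character that cannot appear in the input (as in the SUBSEP variant) is a safer choice.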

















With just sed:

sed -r 's/^((([^\]*\\ ){1,})?[^ ]*).*/\1/' infile

Or shorter:

sed -r 's/^(([^\]*\\ )*[^ ]*).*/\1/' infile

The pattern (([^\]*\\ ){1,})?[^ ]* matches as follows:

  • [^\]*\\ : any run of characters that are not a backslash, followed by a backslash and a space (note that \ does not need to be escaped inside a character class, but does outside one).

  • ([^\]*\\ ){1,}: the above, one or more times.

  • (([^\]*\\ ){1,})?: the above made optional with (...)?; ([^\]*\\ ){0,} or ([^\]*\\ )* would work as well.

  • ((([^\]*\\ ){1,})?[^ ]*): the optional part above followed by any run of non-space characters, captured as a group that is available as back-reference \1.

  • ((([^\]*\\ ){1,})?[^ ]*).*: the group above, followed by everything else (.*).

The replacement part then just prints \1, which gives the output:

5678
testing,\ group
[testing
ip\ 5.6.7.8
launch-wizard-1
456dlkjfa
1.2.3.4
test
default
launch-wizard-2
launch-wizard-3
2.3.4.5/32
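
One thing to watch with a class that excludes only the backslash: it can also match plain spaces, so a line whose later columns contain a backslash-space (for example a b\ c) would, as far as I can tell, come back whole instead of as just "a". The sample data has no such line, so the output above is unaffected. A possible variant, assuming a sed that accepts -E, excludes the space from that class as well; the function name first_col_strict is made up for illustration.

```shell
# first_col_strict: like the sed answer, but the run in front of each
# backslash-space may contain neither a backslash nor a space.
first_col_strict() {
  sed -E 's/^(([^\\ ]*\\ )*[^ ]*).*/\1/'
}

printf '%s\n' 'ip\ 5.6.7.8 x' 'a b\ c' | first_col_strict
```

With this change, only backslash-spaces that are part of the first column are kept; the match stops at the first unescaped space.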






edited Sep 25 at 16:00
answered Sep 25 at 15:53 by αғsнιη




















With GNU grep or compatible:

grep -Po '^(\\.|\S)*'

Or with ERE:

grep -Eo '^(\\.|[^\[:space:]])*'

That treats \ as a quoting operator, for whitespace as a delimiter, but also for itself. That is, on foo\\ bar input, it returns foo\\.
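
To see the quoting behaviour concretely, the ERE form of this approach (with the backslash excluded from the bracket expression) can be wrapped in a small function and fed a couple of test lines. This is only a sketch: it assumes a grep that supports -o, which POSIX does not require, and the name first_col_grep is made up for illustration.

```shell
# first_col_grep: print the leading run of "backslash + any character" pairs
# and characters that are neither whitespace nor a backslash.
first_col_grep() {
  grep -Eo '^(\\.|[^\[:space:]])*'
}

printf '%s\n' 'foo\\ bar' 'ip\ 5.6.7.8 x' | first_col_grep
```

In the first line the two backslashes form one escaped pair, so the space after them is a real delimiter; in the second line the backslash escapes the space, which therefore stays inside the field.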







answered Sep 25 at 22:03 by Stéphane Chazelas



























                                 
