AWK match on term. Columns don't line up

Been trying to figure this one out for a while now. I went through this site and googled like crazy. It would be greatly appreciated if someone could help.

I have some log files with no headers, and the columns are all over the place, meaning that an item like src=4.2.2.2 could be anywhere on a line. All objects in the file do have a something=xxx format.

Example log:

    src=1.1.1.1 sport=12312 dport=80 message=hacked
    dport=443 src=2.2.2.2 message=null sport=32432
    message=clean dport=21 sport=43434 src=3.3.3.3

I have used the match() function, but am trying to pull out multiple fields.

    gawk 'match($0, "src=([^ ]+)") { print substr($0, RSTART, RLENGTH) }' file

gives me the following:

    src=1.1.1.1
    src=2.2.2.2
    src=3.3.3.3

I would like to have multiple fields like src, dport and message, so the output lines up and looks like the following:

    src=1.1.1.1 dport=80 message=hacked
    src=2.2.2.2 dport=443 message=null
    src=3.3.3.3 dport=21 message=clean

Is this possible with gawk or something else?

Thanks!

asked Jan 3 at 15:21 by mrusenet

  • So basically you want to take a list with four fields in random order, remove/ignore one of them (sport) and output the other three in a predefined order (src, dport, message). Correct?
    – nohillside
    Jan 3 at 15:24

2 Answers

Awk solution (independent of item positions):

    awk 'function get_item(name) {
        match($0, name"=[^[:space:]]+")
        return substr($0, RSTART, RLENGTH)
    }
    { print get_item("src"), get_item("dport"), get_item("message") }' file

With this approach you can output just the items you need, in whatever order you choose.

The output:

    src=1.1.1.1 dport=80 message=hacked
    src=2.2.2.2 dport=443 message=null
    src=3.3.3.3 dport=21 message=clean

answered Jan 3 at 15:52 by RomanPerekhrest (accepted answer)

  • Elegant extension of the OP's methodology - I like it
    – steeldriver
    Jan 3 at 16:06

  • @steeldriver, thanks ...
    – RomanPerekhrest
    Jan 3 at 16:06

  • That's awesome. Should have just asked in here first.
    – mrusenet
    Jan 3 at 17:08

  • Another quick question. How would I include spaces in output? Like if "message=something something something"
    – mrusenet
    Jan 3 at 17:09

  • @mrusenet, "Another quick question" does not mean "quick solution". That's another story for another case, not the current one
    – RomanPerekhrest
    Jan 3 at 18:19
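
On the follow-up about values that contain spaces: the thread leaves this unanswered, so what follows is only a rough sketch and not part of the accepted answer. It assumes a value runs until the next blank-separated `key=` token or the end of the line, and that values never contain such a token themselves. The sample lines, the temporary file and the tab output separator are illustrative choices.

```shell
# Hypothetical sample: the first message value contains spaces.
log=$(mktemp)
cat > "$log" <<'EOF'
src=1.1.1.1 message=something bad happened dport=80
dport=443 src=2.2.2.2 message=null sport=32432
EOF

out=$(awk '
BEGIN { OFS = "\t" }    # tab-separate the output, since values may contain spaces
function get_item(name,    line, rest) {
    line = " " $0                        # make sure every key follows a blank
    if (!match(line, "[ \t]" name "="))  # the blank keeps "port" from matching inside "dport"
        return ""
    rest = substr(line, RSTART + 1)      # text from "name=" to the end of the line
    # the value stops where the next " key=" starts, if there is one
    if (match(rest, /[ \t]+[A-Za-z0-9_]+=/))
        rest = substr(rest, 1, RSTART - 1)
    sub(/[ \t]+$/, "", rest)             # drop trailing blanks
    return rest
}
{ print get_item("src"), get_item("dport"), get_item("message") }
' "$log")
rm -f "$log"
printf '%s\n' "$out"
```

For the first sample line this prints src=1.1.1.1, dport=80 and message=something bad happened, separated by tabs. A value that itself contains something like `word=` would be cut short, so this only fits logs where that cannot happen.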












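With GNU awk (NOTE: this relies on the lexical sort order of the "key" strings):

    gawk '{ split($0, a); asort(a); printf("%s\t%s\t%s\n", a[4], a[1], a[2]) }' file
    src=1.1.1.1 dport=80 message=hacked
    src=2.2.2.2 dport=443 message=null
    src=3.3.3.3 dport=21 message=clean

After asort(), the four fields are ordered dport, message, sport, src, so a[4], a[1] and a[2] are the src, dport and message items. The printf separates them with tabs (shown here as spaces).
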
answered Jan 3 at 15:48 by steeldriver
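
A variant that does not depend on sort order or on every line having exactly four fields is to index each item by its key and then print the keys you want. This is a minimal sketch under assumptions, not something taken from either answer: the temporary file, the `kv` array name and the column widths (15 and 9) are illustrative choices.

```shell
# Sample input copied from the question.
log=$(mktemp)
cat > "$log" <<'EOF'
src=1.1.1.1 sport=12312 dport=80 message=hacked
dport=443 src=2.2.2.2 message=null sport=32432
message=clean dport=21 sport=43434 src=3.3.3.3
EOF

out=$(awk '{
    split("", kv)                      # clear the map for each input line
    for (i = 1; i <= NF; i++) {
        eq = index($i, "=")            # position of the first "=" in the field
        if (eq)
            kv[substr($i, 1, eq - 1)] = $i   # store the whole key=value item under its key
    }
    # left-justify the first two columns in fixed widths so they line up
    printf "%-15s %-9s %s\n", kv["src"], kv["dport"], kv["message"]
}' "$log")
rm -f "$log"
printf '%s\n' "$out"
```

With the sample log, every line comes out in src, dport, message order, and the fixed widths make the dport and message columns start at the same position on each line. A key that is missing from a line simply prints as an empty column.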
