columns to data matrix awk

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
1
down vote

favorite












I'm looking for an awk solution to get from this table (with x number of names in column 1, and different lengths of column 2 which is ";" separated):



dataframe.txt:



name1 1;2;4;8
name2 4;5;7
name3 8
name4 11;12
namex 20;21


to this matrix with present 1 and non present 0:



matrix.txt:



 1 2 4 5 7 8 11 12 20 21
name1 1 1 1 0 0 1 0 0 0 0
name2 0 0 1 1 1 0 0 0 0 0
name3 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 1 1






share|improve this question





















  • previous examples please click edit and add that code/question-link to the question... it'd also help if you add what changes you attempted.. that'd show your research efforts.. regarding expected output shown, on what basis column is added? there is no 1 in column under 3.. is 1 2 3 4 5 6 7 8 11 12 20 21 known beforehand?
    – Sundeep
    May 9 at 10:52















up vote
1
down vote

favorite












I'm looking for an awk solution to get from this table (with x number of names in column 1, and different lengths of column 2 which is ";" separated):



dataframe.txt:



name1 1;2;4;8
name2 4;5;7
name3 8
name4 11;12
namex 20;21


to this matrix with present 1 and non present 0:



matrix.txt:



 1 2 4 5 7 8 11 12 20 21
name1 1 1 1 0 0 1 0 0 0 0
name2 0 0 1 1 1 0 0 0 0 0
name3 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 1 1






share|improve this question





















  • previous examples please click edit and add that code/question-link to the question... it'd also help if you add what changes you attempted.. that'd show your research efforts.. regarding expected output shown, on what basis column is added? there is no 1 in column under 3.. is 1 2 3 4 5 6 7 8 11 12 20 21 known beforehand?
    – Sundeep
    May 9 at 10:52













up vote
1
down vote

favorite









up vote
1
down vote

favorite











I'm looking for an awk solution to get from this table (with x number of names in column 1, and different lengths of column 2 which is ";" separated):



dataframe.txt:



name1 1;2;4;8
name2 4;5;7
name3 8
name4 11;12
namex 20;21


to this matrix with present 1 and non present 0:



matrix.txt:



 1 2 4 5 7 8 11 12 20 21
name1 1 1 1 0 0 1 0 0 0 0
name2 0 0 1 1 1 0 0 0 0 0
name3 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 1 1






share|improve this question













I'm looking for an awk solution to get from this table (with x number of names in column 1, and different lengths of column 2 which is ";" separated):



dataframe.txt:



name1 1;2;4;8
name2 4;5;7
name3 8
name4 11;12
namex 20;21


to this matrix with present 1 and non present 0:



matrix.txt:



 1 2 4 5 7 8 11 12 20 21
name1 1 1 1 0 0 1 0 0 0 0
name2 0 0 1 1 1 0 0 0 0 0
name3 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 1 1








share|improve this question












share|improve this question




share|improve this question








edited May 10 at 12:19









αғsнιη

14.8k82462




14.8k82462









asked May 9 at 10:43









newAWKer

62




62











  • previous examples please click edit and add that code/question-link to the question... it'd also help if you add what changes you attempted.. that'd show your research efforts.. regarding expected output shown, on what basis column is added? there is no 1 in column under 3.. is 1 2 3 4 5 6 7 8 11 12 20 21 known beforehand?
    – Sundeep
    May 9 at 10:52

















  • previous examples please click edit and add that code/question-link to the question... it'd also help if you add what changes you attempted.. that'd show your research efforts.. regarding expected output shown, on what basis column is added? there is no 1 in column under 3.. is 1 2 3 4 5 6 7 8 11 12 20 21 known beforehand?
    – Sundeep
    May 9 at 10:52
















previous examples please click edit and add that code/question-link to the question... it'd also help if you add what changes you attempted.. that'd show your research efforts.. regarding expected output shown, on what basis column is added? there is no 1 in column under 3.. is 1 2 3 4 5 6 7 8 11 12 20 21 known beforehand?
– Sundeep
May 9 at 10:52





previous examples please click edit and add that code/question-link to the question... it'd also help if you add what changes you attempted.. that'd show your research efforts.. regarding expected output shown, on what basis column is added? there is no 1 in column under 3.. is 1 2 3 4 5 6 7 8 11 12 20 21 known beforehand?
– Sundeep
May 9 at 10:52











2 Answers
2






active

oldest

votes

















up vote
2
down vote













Awk solution:



awk 'BEGIN
h = "1 2 3 4 5 6 7 8 11 12 20 21";
len = split(h, head);
print "tt" h


printf "%st", $1;
for (i = 1; i <= len; i++)
printf "%s%d", (i == 1? "" : OFS), ($2 ~ "\<" head[i] "\>");
print ""
' file



  • h = "1 2 3 4 5 6 7 8 11 12 20 21" - header line


  • len = split(h, head) - split h line into array head where indices are ordered positions starting from 1 and values are crucial values obtained via splitting; len contains an array size


  • print "tt" h - print the header line with leading tab characters


  • printf "%st", $1; - print the 1st field $1


  • for (i = 1; i <= len; i++) - iterate through head items


    • $2 ~ "\<" head[i] "\>" - check if the 2nd field $2 contains currently accessed item head[i]



The output:



 1 2 3 4 5 6 7 8 11 12 20 21
name1 1 1 0 1 0 0 0 1 0 0 0 0
name2 0 0 0 1 1 0 1 0 0 0 0 0
name3 0 0 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 0 0 1 1





share|improve this answer























  • thanks for the code and the comments! Unfortunately I get a matrix with only 0's. I tried changing printf "%s%d", (i == 1? "" : OFS ) this command to get the 1's but can't get it to work :/
    – newAWKer
    May 9 at 13:18











  • @newAWKer, it works fine. What's your awk --version?
    – RomanPerekhrest
    May 9 at 13:19










  • awk version 20070501
    – newAWKer
    May 9 at 13:22










  • @newAWKer, try to change ($2 ~ "\<" head[i] "\>") to ($2 ~ "\<" head[i] "\>" ? 1:0)
    – RomanPerekhrest
    May 9 at 13:31










  • I changed it to ($2 ~ head[i] ? 1:0) and then it worked! thanks a lot!!
    – newAWKer
    May 9 at 13:42


















up vote
0
down vote













Another awk approach which generate header first by itself and then fill the matrix with 1 for each key in array head exist in current input line otherwise with 0 if doesn't exist.



We used grep to fetch only -o numbers which matches the empty string at the left edge of a number b (same as < we used in awk for both left&right sides).



awk 'NR==FNR !head[$1]++; next 
printf $1; for (x in head) printf (x?FS:"") ($0 ~ "\<" x "\>") ; print ""
' <(grep -o 'b[0-9]+' infile) infile


The output is:



name1 1 1 1 0 0 1 0 0 0 0
name2 0 0 1 1 1 0 0 0 0 0
name3 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 1 1



You could generate the header and insert in first line in input file, then the awk would be easy as:



awk 'NR==1 split($0, head); next 
printf $1; for (x in head) printf (x?FS:"") ($2 ~ "\<" head[x] "\>" ) ;
print "" ' infile





share|improve this answer























    Your Answer







    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "106"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    convertImagesToLinks: false,
    noModals: false,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );








     

    draft saved


    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f442728%2fcolumns-to-data-matrix-awk%23new-answer', 'question_page');

    );

    Post as a guest






























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    2
    down vote













    Awk solution:



    awk 'BEGIN
    h = "1 2 3 4 5 6 7 8 11 12 20 21";
    len = split(h, head);
    print "tt" h


    printf "%st", $1;
    for (i = 1; i <= len; i++)
    printf "%s%d", (i == 1? "" : OFS), ($2 ~ "\<" head[i] "\>");
    print ""
    ' file



    • h = "1 2 3 4 5 6 7 8 11 12 20 21" - header line


    • len = split(h, head) - split h line into array head where indices are ordered positions starting from 1 and values are crucial values obtained via splitting; len contains an array size


    • print "tt" h - print the header line with leading tab characters


    • printf "%st", $1; - print the 1st field $1


    • for (i = 1; i <= len; i++) - iterate through head items


      • $2 ~ "\<" head[i] "\>" - check if the 2nd field $2 contains currently accessed item head[i]



    The output:



     1 2 3 4 5 6 7 8 11 12 20 21
    name1 1 1 0 1 0 0 0 1 0 0 0 0
    name2 0 0 0 1 1 0 1 0 0 0 0 0
    name3 0 0 0 0 0 0 0 1 0 0 0 0
    name4 0 0 0 0 0 0 0 0 1 1 0 0
    namex 0 0 0 0 0 0 0 0 0 0 1 1





    share|improve this answer























    • thanks for the code and the comments! Unfortunately I get a matrix with only 0's. I tried changing printf "%s%d", (i == 1? "" : OFS ) this command to get the 1's but can't get it to work :/
      – newAWKer
      May 9 at 13:18











    • @newAWKer, it works fine. What's your awk --version?
      – RomanPerekhrest
      May 9 at 13:19










    • awk version 20070501
      – newAWKer
      May 9 at 13:22










    • @newAWKer, try to change ($2 ~ "\<" head[i] "\>") to ($2 ~ "\<" head[i] "\>" ? 1:0)
      – RomanPerekhrest
      May 9 at 13:31










    • I changed it to ($2 ~ head[i] ? 1:0) and then it worked! thanks a lot!!
      – newAWKer
      May 9 at 13:42















    up vote
    2
    down vote













    Awk solution:



    awk 'BEGIN
    h = "1 2 3 4 5 6 7 8 11 12 20 21";
    len = split(h, head);
    print "tt" h


    printf "%st", $1;
    for (i = 1; i <= len; i++)
    printf "%s%d", (i == 1? "" : OFS), ($2 ~ "\<" head[i] "\>");
    print ""
    ' file



    • h = "1 2 3 4 5 6 7 8 11 12 20 21" - header line


    • len = split(h, head) - split h line into array head where indices are ordered positions starting from 1 and values are crucial values obtained via splitting; len contains an array size


    • print "tt" h - print the header line with leading tab characters


    • printf "%st", $1; - print the 1st field $1


    • for (i = 1; i <= len; i++) - iterate through head items


      • $2 ~ "\<" head[i] "\>" - check if the 2nd field $2 contains currently accessed item head[i]



    The output:



     1 2 3 4 5 6 7 8 11 12 20 21
    name1 1 1 0 1 0 0 0 1 0 0 0 0
    name2 0 0 0 1 1 0 1 0 0 0 0 0
    name3 0 0 0 0 0 0 0 1 0 0 0 0
    name4 0 0 0 0 0 0 0 0 1 1 0 0
    namex 0 0 0 0 0 0 0 0 0 0 1 1





    share|improve this answer























    • thanks for the code and the comments! Unfortunately I get a matrix with only 0's. I tried changing printf "%s%d", (i == 1? "" : OFS ) this command to get the 1's but can't get it to work :/
      – newAWKer
      May 9 at 13:18











    • @newAWKer, it works fine. What's your awk --version?
      – RomanPerekhrest
      May 9 at 13:19










    • awk version 20070501
      – newAWKer
      May 9 at 13:22










    • @newAWKer, try to change ($2 ~ "\<" head[i] "\>") to ($2 ~ "\<" head[i] "\>" ? 1:0)
      – RomanPerekhrest
      May 9 at 13:31










    • I changed it to ($2 ~ head[i] ? 1:0) and then it worked! thanks a lot!!
      – newAWKer
      May 9 at 13:42













    up vote
    2
    down vote










    up vote
    2
    down vote









    Awk solution:



    awk 'BEGIN
    h = "1 2 3 4 5 6 7 8 11 12 20 21";
    len = split(h, head);
    print "tt" h


    printf "%st", $1;
    for (i = 1; i <= len; i++)
    printf "%s%d", (i == 1? "" : OFS), ($2 ~ "\<" head[i] "\>");
    print ""
    ' file



    • h = "1 2 3 4 5 6 7 8 11 12 20 21" - header line


    • len = split(h, head) - split h line into array head where indices are ordered positions starting from 1 and values are crucial values obtained via splitting; len contains an array size


    • print "tt" h - print the header line with leading tab characters


    • printf "%st", $1; - print the 1st field $1


    • for (i = 1; i <= len; i++) - iterate through head items


      • $2 ~ "\<" head[i] "\>" - check if the 2nd field $2 contains currently accessed item head[i]



    The output:



     1 2 3 4 5 6 7 8 11 12 20 21
    name1 1 1 0 1 0 0 0 1 0 0 0 0
    name2 0 0 0 1 1 0 1 0 0 0 0 0
    name3 0 0 0 0 0 0 0 1 0 0 0 0
    name4 0 0 0 0 0 0 0 0 1 1 0 0
    namex 0 0 0 0 0 0 0 0 0 0 1 1





    share|improve this answer















    Awk solution:



    awk 'BEGIN
    h = "1 2 3 4 5 6 7 8 11 12 20 21";
    len = split(h, head);
    print "tt" h


    printf "%st", $1;
    for (i = 1; i <= len; i++)
    printf "%s%d", (i == 1? "" : OFS), ($2 ~ "\<" head[i] "\>");
    print ""
    ' file



    • h = "1 2 3 4 5 6 7 8 11 12 20 21" - header line


    • len = split(h, head) - split h line into array head where indices are ordered positions starting from 1 and values are crucial values obtained via splitting; len contains an array size


    • print "tt" h - print the header line with leading tab characters


    • printf "%st", $1; - print the 1st field $1


    • for (i = 1; i <= len; i++) - iterate through head items


      • $2 ~ "\<" head[i] "\>" - check if the 2nd field $2 contains currently accessed item head[i]



    The output:



     1 2 3 4 5 6 7 8 11 12 20 21
    name1 1 1 0 1 0 0 0 1 0 0 0 0
    name2 0 0 0 1 1 0 1 0 0 0 0 0
    name3 0 0 0 0 0 0 0 1 0 0 0 0
    name4 0 0 0 0 0 0 0 0 1 1 0 0
    namex 0 0 0 0 0 0 0 0 0 0 1 1






    share|improve this answer















    share|improve this answer



    share|improve this answer








    edited May 9 at 11:09


























    answered May 9 at 11:01









    RomanPerekhrest

    22.4k12144




    22.4k12144











    • thanks for the code and the comments! Unfortunately I get a matrix with only 0's. I tried changing printf "%s%d", (i == 1? "" : OFS ) this command to get the 1's but can't get it to work :/
      – newAWKer
      May 9 at 13:18











    • @newAWKer, it works fine. What's your awk --version?
      – RomanPerekhrest
      May 9 at 13:19










    • awk version 20070501
      – newAWKer
      May 9 at 13:22










    • @newAWKer, try to change ($2 ~ "\<" head[i] "\>") to ($2 ~ "\<" head[i] "\>" ? 1:0)
      – RomanPerekhrest
      May 9 at 13:31










    • I changed it to ($2 ~ head[i] ? 1:0) and then it worked! thanks a lot!!
      – newAWKer
      May 9 at 13:42

















    • thanks for the code and the comments! Unfortunately I get a matrix with only 0's. I tried changing printf "%s%d", (i == 1? "" : OFS ) this command to get the 1's but can't get it to work :/
      – newAWKer
      May 9 at 13:18











    • @newAWKer, it works fine. What's your awk --version?
      – RomanPerekhrest
      May 9 at 13:19










    • awk version 20070501
      – newAWKer
      May 9 at 13:22










    • @newAWKer, try to change ($2 ~ "\<" head[i] "\>") to ($2 ~ "\<" head[i] "\>" ? 1:0)
      – RomanPerekhrest
      May 9 at 13:31










    • I changed it to ($2 ~ head[i] ? 1:0) and then it worked! thanks a lot!!
      – newAWKer
      May 9 at 13:42
















    thanks for the code and the comments! Unfortunately I get a matrix with only 0's. I tried changing printf "%s%d", (i == 1? "" : OFS ) this command to get the 1's but can't get it to work :/
    – newAWKer
    May 9 at 13:18





    thanks for the code and the comments! Unfortunately I get a matrix with only 0's. I tried changing printf "%s%d", (i == 1? "" : OFS ) this command to get the 1's but can't get it to work :/
    – newAWKer
    May 9 at 13:18













    @newAWKer, it works fine. What's your awk --version?
    – RomanPerekhrest
    May 9 at 13:19




    @newAWKer, it works fine. What's your awk --version?
    – RomanPerekhrest
    May 9 at 13:19












    awk version 20070501
    – newAWKer
    May 9 at 13:22




    awk version 20070501
    – newAWKer
    May 9 at 13:22












    @newAWKer, try to change ($2 ~ "\<" head[i] "\>") to ($2 ~ "\<" head[i] "\>" ? 1:0)
    – RomanPerekhrest
    May 9 at 13:31




    @newAWKer, try to change ($2 ~ "\<" head[i] "\>") to ($2 ~ "\<" head[i] "\>" ? 1:0)
    – RomanPerekhrest
    May 9 at 13:31












    I changed it to ($2 ~ head[i] ? 1:0) and then it worked! thanks a lot!!
    – newAWKer
    May 9 at 13:42





    I changed it to ($2 ~ head[i] ? 1:0) and then it worked! thanks a lot!!
    – newAWKer
    May 9 at 13:42













    up vote
    0
    down vote













    Another awk approach which generate header first by itself and then fill the matrix with 1 for each key in array head exist in current input line otherwise with 0 if doesn't exist.



    We used grep to fetch only -o numbers which matches the empty string at the left edge of a number b (same as < we used in awk for both left&right sides).



    awk 'NR==FNR !head[$1]++; next 
    printf $1; for (x in head) printf (x?FS:"") ($0 ~ "\<" x "\>") ; print ""
    ' <(grep -o 'b[0-9]+' infile) infile


    The output is:



    name1 1 1 1 0 0 1 0 0 0 0
    name2 0 0 1 1 1 0 0 0 0 0
    name3 0 0 0 0 0 1 0 0 0 0
    name4 0 0 0 0 0 0 1 1 0 0
    namex 0 0 0 0 0 0 0 0 1 1



    You could generate the header and insert in first line in input file, then the awk would be easy as:



    awk 'NR==1 split($0, head); next 
    printf $1; for (x in head) printf (x?FS:"") ($2 ~ "\<" head[x] "\>" ) ;
    print "" ' infile





    share|improve this answer



























      up vote
      0
      down vote













      Another awk approach which generate header first by itself and then fill the matrix with 1 for each key in array head exist in current input line otherwise with 0 if doesn't exist.



      We used grep to fetch only -o numbers which matches the empty string at the left edge of a number b (same as < we used in awk for both left&right sides).



      awk 'NR==FNR !head[$1]++; next 
      printf $1; for (x in head) printf (x?FS:"") ($0 ~ "\<" x "\>") ; print ""
      ' <(grep -o 'b[0-9]+' infile) infile


      The output is:



      name1 1 1 1 0 0 1 0 0 0 0
      name2 0 0 1 1 1 0 0 0 0 0
      name3 0 0 0 0 0 1 0 0 0 0
      name4 0 0 0 0 0 0 1 1 0 0
      namex 0 0 0 0 0 0 0 0 1 1



      You could generate the header and insert in first line in input file, then the awk would be easy as:



      awk 'NR==1 split($0, head); next 
      printf $1; for (x in head) printf (x?FS:"") ($2 ~ "\<" head[x] "\>" ) ;
      print "" ' infile





      share|improve this answer

























        up vote
        0
        down vote










        up vote
        0
        down vote









        Another awk approach which generate header first by itself and then fill the matrix with 1 for each key in array head exist in current input line otherwise with 0 if doesn't exist.



        We used grep to fetch only -o numbers which matches the empty string at the left edge of a number b (same as < we used in awk for both left&right sides).



        awk 'NR==FNR !head[$1]++; next 
        printf $1; for (x in head) printf (x?FS:"") ($0 ~ "\<" x "\>") ; print ""
        ' <(grep -o 'b[0-9]+' infile) infile


        The output is:



        name1 1 1 1 0 0 1 0 0 0 0
        name2 0 0 1 1 1 0 0 0 0 0
        name3 0 0 0 0 0 1 0 0 0 0
        name4 0 0 0 0 0 0 1 1 0 0
        namex 0 0 0 0 0 0 0 0 1 1



        You could generate the header and insert in first line in input file, then the awk would be easy as:



        awk 'NR==1 split($0, head); next 
        printf $1; for (x in head) printf (x?FS:"") ($2 ~ "\<" head[x] "\>" ) ;
        print "" ' infile





        share|improve this answer















        Another awk approach which generate header first by itself and then fill the matrix with 1 for each key in array head exist in current input line otherwise with 0 if doesn't exist.



        We used grep to fetch only -o numbers which matches the empty string at the left edge of a number b (same as < we used in awk for both left&right sides).



        awk 'NR==FNR !head[$1]++; next 
        printf $1; for (x in head) printf (x?FS:"") ($0 ~ "\<" x "\>") ; print ""
        ' <(grep -o 'b[0-9]+' infile) infile


        The output is:



        name1 1 1 1 0 0 1 0 0 0 0
        name2 0 0 1 1 1 0 0 0 0 0
        name3 0 0 0 0 0 1 0 0 0 0
        name4 0 0 0 0 0 0 1 1 0 0
        namex 0 0 0 0 0 0 0 0 1 1



        You could generate the header and insert in first line in input file, then the awk would be easy as:



        awk 'NR==1 split($0, head); next 
        printf $1; for (x in head) printf (x?FS:"") ($2 ~ "\<" head[x] "\>" ) ;
        print "" ' infile






        share|improve this answer















        share|improve this answer



        share|improve this answer








        edited May 10 at 10:56


























        answered May 10 at 10:50









        αғsнιη

        14.8k82462




        14.8k82462






















             

            draft saved


            draft discarded


























             


            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f442728%2fcolumns-to-data-matrix-awk%23new-answer', 'question_page');

            );

            Post as a guest













































































            Popular posts from this blog

            How to check contact read email or not when send email to Individual?

            Displaying single band from multi-band raster using QGIS

            How many registers does an x86_64 CPU actually have?