How to put items according to a heading? [closed]

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
-2
down vote

favorite












I have a 5000+ space separated lines as below:



Item_A: Acou#1 Bla#5

Item_B: Acou#1 Elfa#2 Flq#2

Item_C: Acou#1 Bla#4 Elfa#2 Flq#2

Item_D: Agly#3 Bla#4 Elfa#2


I want to make a table with common headers for all and quantity each in a table as below,



 Acou Agly Bla Elfa Flq

Item_A: 1 0 5 0 0

Item_B: 1 0 0 2 2

Item_C: 1 0 4 2 2

Item_D: 0 3 4 2 0


I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.







share|improve this question













closed as too broad by Rui F Ribeiro, Jeff Schaller, Anthony Geoghegan, Jesse_b, Thomas May 30 at 16:22


Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Avoid asking multiple distinct questions at once. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.










  • 2




    Please detail what you have done until now and where exactly you need help. We are not a script "for hire" service, we are a Unix forum.
    – Rui F Ribeiro
    May 30 at 2:53











  • I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.
    – JanakSunuwar
    May 30 at 21:58














up vote
-2
down vote

favorite












I have a 5000+ space separated lines as below:



Item_A: Acou#1 Bla#5

Item_B: Acou#1 Elfa#2 Flq#2

Item_C: Acou#1 Bla#4 Elfa#2 Flq#2

Item_D: Agly#3 Bla#4 Elfa#2


I want to make a table with common headers for all and quantity each in a table as below,



 Acou Agly Bla Elfa Flq

Item_A: 1 0 5 0 0

Item_B: 1 0 0 2 2

Item_C: 1 0 4 2 2

Item_D: 0 3 4 2 0


I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.







share|improve this question













closed as too broad by Rui F Ribeiro, Jeff Schaller, Anthony Geoghegan, Jesse_b, Thomas May 30 at 16:22


Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Avoid asking multiple distinct questions at once. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.










  • 2




    Please detail what you have done until now and where exactly you need help. We are not a script "for hire" service, we are a Unix forum.
    – Rui F Ribeiro
    May 30 at 2:53











  • I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.
    – JanakSunuwar
    May 30 at 21:58












up vote
-2
down vote

favorite









up vote
-2
down vote

favorite











I have a 5000+ space separated lines as below:



Item_A: Acou#1 Bla#5

Item_B: Acou#1 Elfa#2 Flq#2

Item_C: Acou#1 Bla#4 Elfa#2 Flq#2

Item_D: Agly#3 Bla#4 Elfa#2


I want to make a table with common headers for all and quantity each in a table as below,



 Acou Agly Bla Elfa Flq

Item_A: 1 0 5 0 0

Item_B: 1 0 0 2 2

Item_C: 1 0 4 2 2

Item_D: 0 3 4 2 0


I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.







share|improve this question













I have a 5000+ space separated lines as below:



Item_A: Acou#1 Bla#5

Item_B: Acou#1 Elfa#2 Flq#2

Item_C: Acou#1 Bla#4 Elfa#2 Flq#2

Item_D: Agly#3 Bla#4 Elfa#2


I want to make a table with common headers for all and quantity each in a table as below,



 Acou Agly Bla Elfa Flq

Item_A: 1 0 5 0 0

Item_B: 1 0 0 2 2

Item_C: 1 0 4 2 2

Item_D: 0 3 4 2 0


I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.









share|improve this question












share|improve this question




share|improve this question








edited May 30 at 22:04
























asked May 30 at 1:50









JanakSunuwar

63




63




closed as too broad by Rui F Ribeiro, Jeff Schaller, Anthony Geoghegan, Jesse_b, Thomas May 30 at 16:22


Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Avoid asking multiple distinct questions at once. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.






closed as too broad by Rui F Ribeiro, Jeff Schaller, Anthony Geoghegan, Jesse_b, Thomas May 30 at 16:22


Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Avoid asking multiple distinct questions at once. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.









  • 2




    Please detail what you have done until now and where exactly you need help. We are not a script "for hire" service, we are a Unix forum.
    – Rui F Ribeiro
    May 30 at 2:53











  • I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.
    – JanakSunuwar
    May 30 at 21:58












  • 2




    Please detail what you have done until now and where exactly you need help. We are not a script "for hire" service, we are a Unix forum.
    – Rui F Ribeiro
    May 30 at 2:53











  • I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.
    – JanakSunuwar
    May 30 at 21:58







2




2




Please detail what you have done until now and where exactly you need help. We are not a script "for hire" service, we are a Unix forum.
– Rui F Ribeiro
May 30 at 2:53





Please detail what you have done until now and where exactly you need help. We are not a script "for hire" service, we are a Unix forum.
– Rui F Ribeiro
May 30 at 2:53













I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.
– JanakSunuwar
May 30 at 21:58




I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.
– JanakSunuwar
May 30 at 21:58










2 Answers
2






active

oldest

votes

















up vote
2
down vote



accepted










If you don't mind throwing GNU datamash into the mix, then you could simply serialize the entries and then crosstabulate them:



awk '
for (i=2;i<=NF;i++) split($i,a,"#"); print $1,a[1],a[2]' OFS='t' file |
datamash --filler=0 crosstab 1,2 count 3
Acou Agly Bla Elfa Flq
Item_A: 1 0 1 0 0
Item_B: 1 0 0 1 1
Item_C: 1 0 1 1 1
Item_D: 0 1 1 1 0


Alternatively, with GNU awk (which allows multidimensional arrays):



gawk '
BEGIN
OFS="t";
PROCINFO["sorted_in"] = "@ind_str_asc";


for (i=2;i<=NF;i++)
split($i,a,"#");
h[a[1]] = 1;
t[$1][a[1]] += a[2];


END
for (j in h) printf("t%s", j);
printf "n";
for (i in t)
printf("%s",i);
for (j in h)
printf("t%d", j in t[i] ? t[i][j] : 0);
printf "n";

' file
Acou Agly Bla Elfa Flq
Item_A: 1 0 5 0 0
Item_B: 1 0 0 2 2
Item_C: 1 0 4 2 2
Item_D: 0 3 4 2 0





share|improve this answer





















  • Add # to FS and you don't need to split: FS = "[ t#]+" -- then for (i=2; i<NF; i+=2) h[$i]=1; t[$1][$i] = $(i+1)
    – glenn jackman
    May 30 at 14:46











  • Thank you @steeldriver. It works great and don't need to sort columns alphabetically as well. Much appreciated.
    – JanakSunuwar
    May 30 at 22:01










  • @steeldriver: The alternate version works; however, there is an error from datamash as:datamash: invalid operation 'crosstab'
    – JanakSunuwar
    May 31 at 1:10


















up vote
3
down vote













BEGIN OFS = "t" 

# Collect headers from data
FNR == NR
for (i = 2; i <= NF; ++i)
if (!($i in heads))
heads[$i]
next


# Output header
FNR == 1
line = "Items"
for (j in heads)
line = line OFS j
print line



line = $1
# Iterate through the header items, testing each field against it
for (j in heads)
found = 0 # assume not found
for (i = 2; !found && i <= NF; ++i)
if ($i == j)
found = 1 # matches header
line = line OFS found

print line



Running this on your data (after having removed empty lines):



$ awk -f script.awk file file
Items Acou#1 Bla#4 Bla#5 Elfa#2 Agly#3 Flq#2
Item_A: 1 0 1 0 0 0
Item_B: 1 0 0 1 0 1
Item_C: 1 1 0 1 0 1
Item_D: 0 1 0 1 1 0


Note that you have to specify the input data file twice. This is because we're scanning it two times. In the first scan we're collecting the data items on each line (the FNR == NR block). On the second scan, we test each collected data item (the headers) against the data on each line.



The output is simply 0 if the field in the header is not present in the data on that line, and 1 if it is. This is not quite what you asked for, so...



A variation that truncates the headers at the # and uses the part after the # as the data to display:



BEGIN OFS = "t" 

# Collect headers from data
FNR == NR
for (i = 2; i <= NF; ++i)
split($i, h, "#")
if (!(h[1] in heads))
heads[h[1]]

next


# Output header
FNR == 1
line = "Items"
for (j in heads)
line = line OFS j
print line



line = $1
# Iterate through the header items, testing each field against it
for (j in heads)
found = 0 # assume not found
for (i = 2; !found && i <= NF; ++i)
split($i, h, "#")
if (h[1] == j)
found = h[2] # matches header

line = line OFS found

print line



Running it:



$ awk -f script.awk file file
Items Elfa Bla Acou Agly Flq
Item_A: 0 5 1 0 0
Item_B: 2 0 1 0 2
Item_C: 2 4 1 0 2
Item_D: 2 4 0 3 0


Note that the order of the columns is not necessarily sorted (since they are stored as keys in an associative array). I'm leaving it as an exercise to the reader to sort them.






share|improve this answer























  • Thank you for detail explanation @Kusalananda. Much appreciated and works great as you have explained.
    – JanakSunuwar
    May 30 at 21:59

















2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
2
down vote



accepted










If you don't mind throwing GNU datamash into the mix, then you could simply serialize the entries and then crosstabulate them:



awk '
for (i=2;i<=NF;i++) split($i,a,"#"); print $1,a[1],a[2]' OFS='t' file |
datamash --filler=0 crosstab 1,2 count 3
Acou Agly Bla Elfa Flq
Item_A: 1 0 1 0 0
Item_B: 1 0 0 1 1
Item_C: 1 0 1 1 1
Item_D: 0 1 1 1 0


Alternatively, with GNU awk (which allows multidimensional arrays):



gawk '
BEGIN
OFS="t";
PROCINFO["sorted_in"] = "@ind_str_asc";


for (i=2;i<=NF;i++)
split($i,a,"#");
h[a[1]] = 1;
t[$1][a[1]] += a[2];


END
for (j in h) printf("t%s", j);
printf "n";
for (i in t)
printf("%s",i);
for (j in h)
printf("t%d", j in t[i] ? t[i][j] : 0);
printf "n";

' file
Acou Agly Bla Elfa Flq
Item_A: 1 0 5 0 0
Item_B: 1 0 0 2 2
Item_C: 1 0 4 2 2
Item_D: 0 3 4 2 0





share|improve this answer





















  • Add # to FS and you don't need to split: FS = "[ t#]+" -- then for (i=2; i<NF; i+=2) h[$i]=1; t[$1][$i] = $(i+1)
    – glenn jackman
    May 30 at 14:46











  • Thank you @steeldriver. It works great and don't need to sort columns alphabetically as well. Much appreciated.
    – JanakSunuwar
    May 30 at 22:01










  • @steeldriver: The alternate version works; however, there is an error from datamash as:datamash: invalid operation 'crosstab'
    – JanakSunuwar
    May 31 at 1:10















up vote
2
down vote



accepted










If you don't mind throwing GNU datamash into the mix, then you could simply serialize the entries and then crosstabulate them:



awk '
for (i=2;i<=NF;i++) split($i,a,"#"); print $1,a[1],a[2]' OFS='t' file |
datamash --filler=0 crosstab 1,2 count 3
Acou Agly Bla Elfa Flq
Item_A: 1 0 1 0 0
Item_B: 1 0 0 1 1
Item_C: 1 0 1 1 1
Item_D: 0 1 1 1 0


Alternatively, with GNU awk (which allows multidimensional arrays):



gawk '
BEGIN
OFS="t";
PROCINFO["sorted_in"] = "@ind_str_asc";


for (i=2;i<=NF;i++)
split($i,a,"#");
h[a[1]] = 1;
t[$1][a[1]] += a[2];


END
for (j in h) printf("t%s", j);
printf "n";
for (i in t)
printf("%s",i);
for (j in h)
printf("t%d", j in t[i] ? t[i][j] : 0);
printf "n";

' file
Acou Agly Bla Elfa Flq
Item_A: 1 0 5 0 0
Item_B: 1 0 0 2 2
Item_C: 1 0 4 2 2
Item_D: 0 3 4 2 0





share|improve this answer





















  • Add # to FS and you don't need to split: FS = "[ t#]+" -- then for (i=2; i<NF; i+=2) h[$i]=1; t[$1][$i] = $(i+1)
    – glenn jackman
    May 30 at 14:46











  • Thank you @steeldriver. It works great and don't need to sort columns alphabetically as well. Much appreciated.
    – JanakSunuwar
    May 30 at 22:01










  • @steeldriver: The alternate version works; however, there is an error from datamash as:datamash: invalid operation 'crosstab'
    – JanakSunuwar
    May 31 at 1:10













up vote
2
down vote



accepted







up vote
2
down vote



accepted






If you don't mind throwing GNU datamash into the mix, then you could simply serialize the entries and then crosstabulate them:



awk '
for (i=2;i<=NF;i++) split($i,a,"#"); print $1,a[1],a[2]' OFS='t' file |
datamash --filler=0 crosstab 1,2 count 3
Acou Agly Bla Elfa Flq
Item_A: 1 0 1 0 0
Item_B: 1 0 0 1 1
Item_C: 1 0 1 1 1
Item_D: 0 1 1 1 0


Alternatively, with GNU awk (which allows multidimensional arrays):



gawk '
BEGIN
OFS="t";
PROCINFO["sorted_in"] = "@ind_str_asc";


for (i=2;i<=NF;i++)
split($i,a,"#");
h[a[1]] = 1;
t[$1][a[1]] += a[2];


END
for (j in h) printf("t%s", j);
printf "n";
for (i in t)
printf("%s",i);
for (j in h)
printf("t%d", j in t[i] ? t[i][j] : 0);
printf "n";

' file
Acou Agly Bla Elfa Flq
Item_A: 1 0 5 0 0
Item_B: 1 0 0 2 2
Item_C: 1 0 4 2 2
Item_D: 0 3 4 2 0





share|improve this answer













If you don't mind throwing GNU datamash into the mix, then you could simply serialize the entries and then crosstabulate them:



awk '
for (i=2;i<=NF;i++) split($i,a,"#"); print $1,a[1],a[2]' OFS='t' file |
datamash --filler=0 crosstab 1,2 count 3
Acou Agly Bla Elfa Flq
Item_A: 1 0 1 0 0
Item_B: 1 0 0 1 1
Item_C: 1 0 1 1 1
Item_D: 0 1 1 1 0


Alternatively, with GNU awk (which allows multidimensional arrays):



gawk '
BEGIN
OFS="t";
PROCINFO["sorted_in"] = "@ind_str_asc";


for (i=2;i<=NF;i++)
split($i,a,"#");
h[a[1]] = 1;
t[$1][a[1]] += a[2];


END
for (j in h) printf("t%s", j);
printf "n";
for (i in t)
printf("%s",i);
for (j in h)
printf("t%d", j in t[i] ? t[i][j] : 0);
printf "n";

' file
Acou Agly Bla Elfa Flq
Item_A: 1 0 5 0 0
Item_B: 1 0 0 2 2
Item_C: 1 0 4 2 2
Item_D: 0 3 4 2 0






share|improve this answer













share|improve this answer



share|improve this answer











answered May 30 at 11:28









steeldriver

31.2k34978




31.2k34978











  • Add # to FS and you don't need to split: FS = "[ t#]+" -- then for (i=2; i<NF; i+=2) h[$i]=1; t[$1][$i] = $(i+1)
    – glenn jackman
    May 30 at 14:46











  • Thank you @steeldriver. It works great and don't need to sort columns alphabetically as well. Much appreciated.
    – JanakSunuwar
    May 30 at 22:01










  • @steeldriver: The alternate version works; however, there is an error from datamash as:datamash: invalid operation 'crosstab'
    – JanakSunuwar
    May 31 at 1:10

















  • Add # to FS and you don't need to split: FS = "[ t#]+" -- then for (i=2; i<NF; i+=2) h[$i]=1; t[$1][$i] = $(i+1)
    – glenn jackman
    May 30 at 14:46











  • Thank you @steeldriver. It works great and don't need to sort columns alphabetically as well. Much appreciated.
    – JanakSunuwar
    May 30 at 22:01










  • @steeldriver: The alternate version works; however, there is an error from datamash as:datamash: invalid operation 'crosstab'
    – JanakSunuwar
    May 31 at 1:10
















Add # to FS and you don't need to split: FS = "[ t#]+" -- then for (i=2; i<NF; i+=2) h[$i]=1; t[$1][$i] = $(i+1)
– glenn jackman
May 30 at 14:46





Add # to FS and you don't need to split: FS = "[ t#]+" -- then for (i=2; i<NF; i+=2) h[$i]=1; t[$1][$i] = $(i+1)
– glenn jackman
May 30 at 14:46













Thank you @steeldriver. It works great and don't need to sort columns alphabetically as well. Much appreciated.
– JanakSunuwar
May 30 at 22:01




Thank you @steeldriver. It works great and don't need to sort columns alphabetically as well. Much appreciated.
– JanakSunuwar
May 30 at 22:01












@steeldriver: The alternate version works; however, there is an error from datamash as:datamash: invalid operation 'crosstab'
– JanakSunuwar
May 31 at 1:10





@steeldriver: The alternate version works; however, there is an error from datamash as:datamash: invalid operation 'crosstab'
– JanakSunuwar
May 31 at 1:10













up vote
3
down vote













BEGIN OFS = "t" 

# Collect headers from data
FNR == NR
for (i = 2; i <= NF; ++i)
if (!($i in heads))
heads[$i]
next


# Output header
FNR == 1
line = "Items"
for (j in heads)
line = line OFS j
print line



line = $1
# Iterate through the header items, testing each field against it
for (j in heads)
found = 0 # assume not found
for (i = 2; !found && i <= NF; ++i)
if ($i == j)
found = 1 # matches header
line = line OFS found

print line



Running this on your data (after having removed empty lines):



$ awk -f script.awk file file
Items Acou#1 Bla#4 Bla#5 Elfa#2 Agly#3 Flq#2
Item_A: 1 0 1 0 0 0
Item_B: 1 0 0 1 0 1
Item_C: 1 1 0 1 0 1
Item_D: 0 1 0 1 1 0


Note that you have to specify the input data file twice. This is because we're scanning it two times. In the first scan we're collecting the data items on each line (the FNR == NR block). On the second scan, we test each collected data item (the headers) against the data on each line.



The output is simply 0 if the field in the header is not present in the data on that line, and 1 if it is. This is not quite what you asked for, so...



A variation that truncates the headers at the # and uses the part after the # as the data to display:



BEGIN OFS = "t" 

# Collect headers from data
FNR == NR
for (i = 2; i <= NF; ++i)
split($i, h, "#")
if (!(h[1] in heads))
heads[h[1]]

next


# Output header
FNR == 1
line = "Items"
for (j in heads)
line = line OFS j
print line



line = $1
# Iterate through the header items, testing each field against it
for (j in heads)
found = 0 # assume not found
for (i = 2; !found && i <= NF; ++i)
split($i, h, "#")
if (h[1] == j)
found = h[2] # matches header

line = line OFS found

print line



Running it:



$ awk -f script.awk file file
Items Elfa Bla Acou Agly Flq
Item_A: 0 5 1 0 0
Item_B: 2 0 1 0 2
Item_C: 2 4 1 0 2
Item_D: 2 4 0 3 0


Note that the order of the columns is not necessarily sorted (since they are stored as keys in an associative array). I'm leaving it as an exercise to the reader to sort them.






share|improve this answer























  • Thank you for detail explanation @Kusalananda. Much appreciated and works great as you have explained.
    – JanakSunuwar
    May 30 at 21:59














up vote
3
down vote













BEGIN OFS = "t" 

# Collect headers from data
FNR == NR
for (i = 2; i <= NF; ++i)
if (!($i in heads))
heads[$i]
next


# Output header
FNR == 1
line = "Items"
for (j in heads)
line = line OFS j
print line



line = $1
# Iterate through the header items, testing each field against it
for (j in heads)
found = 0 # assume not found
for (i = 2; !found && i <= NF; ++i)
if ($i == j)
found = 1 # matches header
line = line OFS found

print line



Running this on your data (after having removed empty lines):



$ awk -f script.awk file file
Items Acou#1 Bla#4 Bla#5 Elfa#2 Agly#3 Flq#2
Item_A: 1 0 1 0 0 0
Item_B: 1 0 0 1 0 1
Item_C: 1 1 0 1 0 1
Item_D: 0 1 0 1 1 0


Note that you have to specify the input data file twice. This is because we're scanning it two times. In the first scan we're collecting the data items on each line (the FNR == NR block). On the second scan, we test each collected data item (the headers) against the data on each line.



The output is simply 0 if the field in the header is not present in the data on that line, and 1 if it is. This is not quite what you asked for, so...



A variation that truncates the headers at the # and uses the part after the # as the data to display:



BEGIN OFS = "t" 

# Collect headers from data
FNR == NR
for (i = 2; i <= NF; ++i)
split($i, h, "#")
if (!(h[1] in heads))
heads[h[1]]

next


# Output header
FNR == 1
line = "Items"
for (j in heads)
line = line OFS j
print line



line = $1
# Iterate through the header items, testing each field against it
for (j in heads)
found = 0 # assume not found
for (i = 2; !found && i <= NF; ++i)
split($i, h, "#")
if (h[1] == j)
found = h[2] # matches header

line = line OFS found

print line



Running it:



$ awk -f script.awk file file
Items Elfa Bla Acou Agly Flq
Item_A: 0 5 1 0 0
Item_B: 2 0 1 0 2
Item_C: 2 4 1 0 2
Item_D: 2 4 0 3 0


Note that the order of the columns is not necessarily sorted (since they are stored as keys in an associative array). I'm leaving it as an exercise to the reader to sort them.






share|improve this answer























  • Thank you for detail explanation @Kusalananda. Much appreciated and works great as you have explained.
    – JanakSunuwar
    May 30 at 21:59












up vote
3
down vote










up vote
3
down vote









BEGIN OFS = "t" 

# Collect headers from data
FNR == NR
for (i = 2; i <= NF; ++i)
if (!($i in heads))
heads[$i]
next


# Output header
FNR == 1
line = "Items"
for (j in heads)
line = line OFS j
print line



line = $1
# Iterate through the header items, testing each field against it
for (j in heads)
found = 0 # assume not found
for (i = 2; !found && i <= NF; ++i)
if ($i == j)
found = 1 # matches header
line = line OFS found

print line



Running this on your data (after having removed empty lines):



$ awk -f script.awk file file
Items Acou#1 Bla#4 Bla#5 Elfa#2 Agly#3 Flq#2
Item_A: 1 0 1 0 0 0
Item_B: 1 0 0 1 0 1
Item_C: 1 1 0 1 0 1
Item_D: 0 1 0 1 1 0


Note that you have to specify the input data file twice. This is because we're scanning it two times. In the first scan we're collecting the data items on each line (the FNR == NR block). On the second scan, we test each collected data item (the headers) against the data on each line.



The output is simply 0 if the field in the header is not present in the data on that line, and 1 if it is. This is not quite what you asked for, so...



A variation that truncates the headers at the # and uses the part after the # as the data to display:



BEGIN OFS = "t" 

# Collect headers from data
FNR == NR
for (i = 2; i <= NF; ++i)
split($i, h, "#")
if (!(h[1] in heads))
heads[h[1]]

next


# Output header
FNR == 1
line = "Items"
for (j in heads)
line = line OFS j
print line



line = $1
# Iterate through the header items, testing each field against it
for (j in heads)
found = 0 # assume not found
for (i = 2; !found && i <= NF; ++i)
split($i, h, "#")
if (h[1] == j)
found = h[2] # matches header

line = line OFS found

print line



Running it:



$ awk -f script.awk file file
Items Elfa Bla Acou Agly Flq
Item_A: 0 5 1 0 0
Item_B: 2 0 1 0 2
Item_C: 2 4 1 0 2
Item_D: 2 4 0 3 0


Note that the order of the columns is not necessarily sorted (since they are stored as keys in an associative array). I'm leaving it as an exercise to the reader to sort them.






share|improve this answer















BEGIN OFS = "t" 

# Collect headers from data
FNR == NR
for (i = 2; i <= NF; ++i)
if (!($i in heads))
heads[$i]
next


# Output header
FNR == 1
line = "Items"
for (j in heads)
line = line OFS j
print line



line = $1
# Iterate through the header items, testing each field against it
for (j in heads)
found = 0 # assume not found
for (i = 2; !found && i <= NF; ++i)
if ($i == j)
found = 1 # matches header
line = line OFS found

print line



Running this on your data (after having removed empty lines):



$ awk -f script.awk file file
Items Acou#1 Bla#4 Bla#5 Elfa#2 Agly#3 Flq#2
Item_A: 1 0 1 0 0 0
Item_B: 1 0 0 1 0 1
Item_C: 1 1 0 1 0 1
Item_D: 0 1 0 1 1 0


Note that you have to specify the input data file twice. This is because we're scanning it two times. In the first scan we're collecting the data items on each line (the FNR == NR block). On the second scan, we test each collected data item (the headers) against the data on each line.



The output is simply 0 if the field in the header is not present in the data on that line, and 1 if it is. This is not quite what you asked for, so...



A variation that truncates the headers at the # and uses the part after the # as the data to display:



BEGIN OFS = "t" 

# Collect headers from data
FNR == NR
for (i = 2; i <= NF; ++i)
split($i, h, "#")
if (!(h[1] in heads))
heads[h[1]]

next


# Output header
FNR == 1
line = "Items"
for (j in heads)
line = line OFS j
print line



line = $1
# Iterate through the header items, testing each field against it
for (j in heads)
found = 0 # assume not found
for (i = 2; !found && i <= NF; ++i)
split($i, h, "#")
if (h[1] == j)
found = h[2] # matches header

line = line OFS found

print line



Running it:



$ awk -f script.awk file file
Items Elfa Bla Acou Agly Flq
Item_A: 0 5 1 0 0
Item_B: 2 0 1 0 2
Item_C: 2 4 1 0 2
Item_D: 2 4 0 3 0


Note that the order of the columns is not necessarily sorted (since they are stored as keys in an associative array). I'm leaving it as an exercise to the reader to sort them.







share|improve this answer















share|improve this answer



share|improve this answer








edited May 30 at 7:17


























answered May 30 at 7:01









Kusalananda

102k13199313




102k13199313











  • Thank you for detail explanation @Kusalananda. Much appreciated and works great as you have explained.
    – JanakSunuwar
    May 30 at 21:59
















  • Thank you for detail explanation @Kusalananda. Much appreciated and works great as you have explained.
    – JanakSunuwar
    May 30 at 21:59















Thank you for detail explanation @Kusalananda. Much appreciated and works great as you have explained.
– JanakSunuwar
May 30 at 21:59




Thank you for detail explanation @Kusalananda. Much appreciated and works great as you have explained.
– JanakSunuwar
May 30 at 21:59


Popular posts from this blog

Peggy Mitchell

Palaiologos

The Forum (Inglewood, California)