How to put items according to a heading? [closed]

up vote
-2
down vote

favorite

I have a 5000+ space separated lines as below:

Item_A: Acou#1 Bla#5

Item_B: Acou#1 Elfa#2 Flq#2

Item_C: Acou#1 Bla#4 Elfa#2 Flq#2

Item_D: Agly#3 Bla#4 Elfa#2

I want to make a table with common headers for all and quantity each in a table as below,

 Acou Agly Bla Elfa Flq

Item_A: 1 0 5 0 0

Item_B: 1 0 0 2 2

Item_C: 1 0 4 2 2

Item_D: 0 3 4 2 0

I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.

edited May 30 at 22:04

asked May 30 at 1:50

JanakSunuwar

closed as too broad by Rui F Ribeiro, Jeff Schaller, Anthony Geoghegan, Jesse_b, Thomas May 30 at 16:22

Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Avoid asking multiple distinct questions at once. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.

2

Please detail what you have done until now and where exactly you need help. We are not a script "for hire" service, we are a Unix forum.
â€“Â Rui F Ribeiro
May 30 at 2:53

I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.
â€“Â JanakSunuwar
May 30 at 21:58

add a commentÂ |Â

up vote
-2
down vote

favorite

I have a 5000+ space separated lines as below:

Item_A: Acou#1 Bla#5

Item_B: Acou#1 Elfa#2 Flq#2

Item_C: Acou#1 Bla#4 Elfa#2 Flq#2

Item_D: Agly#3 Bla#4 Elfa#2

I want to make a table with common headers for all and quantity each in a table as below,

 Acou Agly Bla Elfa Flq

Item_A: 1 0 5 0 0

Item_B: 1 0 0 2 2

Item_C: 1 0 4 2 2

Item_D: 0 3 4 2 0

I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.

edited May 30 at 22:04

asked May 30 at 1:50

JanakSunuwar

closed as too broad by Rui F Ribeiro, Jeff Schaller, Anthony Geoghegan, Jesse_b, Thomas May 30 at 16:22

2

Please detail what you have done until now and where exactly you need help. We are not a script "for hire" service, we are a Unix forum.
â€“Â Rui F Ribeiro
May 30 at 2:53

I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.
â€“Â JanakSunuwar
May 30 at 21:58

add a commentÂ |Â

up vote
-2
down vote

favorite

I have a 5000+ space separated lines as below:

Item_A: Acou#1 Bla#5

Item_B: Acou#1 Elfa#2 Flq#2

Item_C: Acou#1 Bla#4 Elfa#2 Flq#2

Item_D: Agly#3 Bla#4 Elfa#2

I want to make a table with common headers for all and quantity each in a table as below,

 Acou Agly Bla Elfa Flq

Item_A: 1 0 5 0 0

Item_B: 1 0 0 2 2

Item_C: 1 0 4 2 2

Item_D: 0 3 4 2 0

I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.

edited May 30 at 22:04

asked May 30 at 1:50

JanakSunuwar

I have a 5000+ space separated lines as below:

Item_A: Acou#1 Bla#5

Item_B: Acou#1 Elfa#2 Flq#2

Item_C: Acou#1 Bla#4 Elfa#2 Flq#2

Item_D: Agly#3 Bla#4 Elfa#2

I want to make a table with common headers for all and quantity each in a table as below,

 Acou Agly Bla Elfa Flq

Item_A: 1 0 5 0 0

Item_B: 1 0 0 2 2

Item_C: 1 0 4 2 2

Item_D: 0 3 4 2 0

I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.

edited May 30 at 22:04

asked May 30 at 1:50

JanakSunuwar

edited May 30 at 22:04

asked May 30 at 1:50

JanakSunuwar

asked May 30 at 1:50

JanakSunuwar

asked May 30 at 1:50

JanakSunuwar

closed as too broad by Rui F Ribeiro, Jeff Schaller, Anthony Geoghegan, Jesse_b, Thomas May 30 at 16:22

2

Please detail what you have done until now and where exactly you need help. We are not a script "for hire" service, we are a Unix forum.
â€“Â Rui F Ribeiro
May 30 at 2:53

I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.
â€“Â JanakSunuwar
May 30 at 21:58

add a commentÂ |Â

2

Please detail what you have done until now and where exactly you need help. We are not a script "for hire" service, we are a Unix forum.
â€“Â Rui F Ribeiro
May 30 at 2:53

I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.
â€“Â JanakSunuwar
May 30 at 21:58

Please detail what you have done until now and where exactly you need help. We are not a script "for hire" service, we are a Unix forum.
â€“Â Rui F Ribeiro
May 30 at 2:53

I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.
â€“Â JanakSunuwar
May 30 at 21:58

add a commentÂ |Â

2 Answers
2

active

oldest

votes

up vote
2
down vote

accepted

If you don't mind throwing GNU datamash into the mix, then you could simply serialize the entries and then crosstabulate them:

awk '
 for (i=2;i<=NF;i++) split($i,a,"#"); print $1,a[1],a[2]' OFS='t' file | 
 datamash --filler=0 crosstab 1,2 count 3
 Acou Agly Bla Elfa Flq
Item_A: 1 0 1 0 0
Item_B: 1 0 0 1 1
Item_C: 1 0 1 1 1
Item_D: 0 1 1 1 0

Alternatively, with GNU awk (which allows multidimensional arrays):

gawk '
 BEGIN 
 OFS="t";
 PROCINFO["sorted_in"] = "@ind_str_asc";
 
 
 for (i=2;i<=NF;i++) 
 split($i,a,"#"); 
 h[a[1]] = 1;
 t[$1][a[1]] += a[2];
 
 
 END 
 for (j in h) printf("t%s", j);
 printf "n";
 for (i in t) 
 printf("%s",i);
 for (j in h) 
 printf("t%d", j in t[i] ? t[i][j] : 0);
 printf "n";
 
 ' file
 Acou Agly Bla Elfa Flq
Item_A: 1 0 5 0 0
Item_B: 1 0 0 2 2
Item_C: 1 0 4 2 2
Item_D: 0 3 4 2 0

answered May 30 at 11:28

steeldriver

31.2k34978

Add # to FS and you don't need to split: FS = "[ t#]+" -- then for (i=2; i<NF; i+=2) h[$i]=1; t[$1][$i] = $(i+1)
â€“Â glenn jackman
May 30 at 14:46

Thank you @steeldriver. It works great and don't need to sort columns alphabetically as well. Much appreciated.
â€“Â JanakSunuwar
May 30 at 22:01

@steeldriver: The alternate version works; however, there is an error from datamash as:datamash: invalid operation 'crosstab'
â€“Â JanakSunuwar
May 31 at 1:10

add a commentÂ |Â

up vote
3
down vote

BEGIN OFS = "t" 

# Collect headers from data
FNR == NR 
 for (i = 2; i <= NF; ++i)
 if (!($i in heads))
 heads[$i]
 next


# Output header
FNR == 1 
 line = "Items"
 for (j in heads)
 line = line OFS j
 print line



 line = $1
 # Iterate through the header items, testing each field against it
 for (j in heads) 
 found = 0 # assume not found
 for (i = 2; !found && i <= NF; ++i)
 if ($i == j)
 found = 1 # matches header
 line = line OFS found
 
 print line

Running this on your data (after having removed empty lines):

$ awk -f script.awk file file
Items Acou#1 Bla#4 Bla#5 Elfa#2 Agly#3 Flq#2
Item_A: 1 0 1 0 0 0
Item_B: 1 0 0 1 0 1
Item_C: 1 1 0 1 0 1
Item_D: 0 1 0 1 1 0

Note that you have to specify the input data file twice. This is because we're scanning it two times. In the first scan we're collecting the data items on each line (the FNR == NR block). On the second scan, we test each collected data item (the headers) against the data on each line.

The output is simply 0 if the field in the header is not present in the data on that line, and 1 if it is. This is not quite what you asked for, so...

A variation that truncates the headers at the # and uses the part after the # as the data to display:

BEGIN OFS = "t" 

# Collect headers from data
FNR == NR 
 for (i = 2; i <= NF; ++i) 
 split($i, h, "#")
 if (!(h[1] in heads))
 heads[h[1]]
 
 next


# Output header
FNR == 1 
 line = "Items"
 for (j in heads)
 line = line OFS j
 print line



 line = $1
 # Iterate through the header items, testing each field against it
 for (j in heads) 
 found = 0 # assume not found
 for (i = 2; !found && i <= NF; ++i) 
 split($i, h, "#")
 if (h[1] == j)
 found = h[2] # matches header
 
 line = line OFS found
 
 print line

Running it:

$ awk -f script.awk file file
Items Elfa Bla Acou Agly Flq
Item_A: 0 5 1 0 0
Item_B: 2 0 1 0 2
Item_C: 2 4 1 0 2
Item_D: 2 4 0 3 0

Note that the order of the columns is not necessarily sorted (since they are stored as keys in an associative array). I'm leaving it as an exercise to the reader to sort them.

edited May 30 at 7:17

answered May 30 at 7:01

Kusalananda

102k13199313

Thank you for detail explanation @Kusalananda. Much appreciated and works great as you have explained.
â€“Â JanakSunuwar
May 30 at 21:59

add a commentÂ |Â

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

up vote
2
down vote

accepted

If you don't mind throwing GNU datamash into the mix, then you could simply serialize the entries and then crosstabulate them:

awk '
 for (i=2;i<=NF;i++) split($i,a,"#"); print $1,a[1],a[2]' OFS='t' file | 
 datamash --filler=0 crosstab 1,2 count 3
 Acou Agly Bla Elfa Flq
Item_A: 1 0 1 0 0
Item_B: 1 0 0 1 1
Item_C: 1 0 1 1 1
Item_D: 0 1 1 1 0

Alternatively, with GNU awk (which allows multidimensional arrays):

gawk '
 BEGIN 
 OFS="t";
 PROCINFO["sorted_in"] = "@ind_str_asc";
 
 
 for (i=2;i<=NF;i++) 
 split($i,a,"#"); 
 h[a[1]] = 1;
 t[$1][a[1]] += a[2];
 
 
 END 
 for (j in h) printf("t%s", j);
 printf "n";
 for (i in t) 
 printf("%s",i);
 for (j in h) 
 printf("t%d", j in t[i] ? t[i][j] : 0);
 printf "n";
 
 ' file
 Acou Agly Bla Elfa Flq
Item_A: 1 0 5 0 0
Item_B: 1 0 0 2 2
Item_C: 1 0 4 2 2
Item_D: 0 3 4 2 0

answered May 30 at 11:28

steeldriver

31.2k34978

Add # to FS and you don't need to split: FS = "[ t#]+" -- then for (i=2; i<NF; i+=2) h[$i]=1; t[$1][$i] = $(i+1)
â€“Â glenn jackman
May 30 at 14:46

Thank you @steeldriver. It works great and don't need to sort columns alphabetically as well. Much appreciated.
â€“Â JanakSunuwar
May 30 at 22:01

@steeldriver: The alternate version works; however, there is an error from datamash as:datamash: invalid operation 'crosstab'
â€“Â JanakSunuwar
May 31 at 1:10

add a commentÂ |Â

up vote
2
down vote

accepted

If you don't mind throwing GNU datamash into the mix, then you could simply serialize the entries and then crosstabulate them:

awk '
 for (i=2;i<=NF;i++) split($i,a,"#"); print $1,a[1],a[2]' OFS='t' file | 
 datamash --filler=0 crosstab 1,2 count 3
 Acou Agly Bla Elfa Flq
Item_A: 1 0 1 0 0
Item_B: 1 0 0 1 1
Item_C: 1 0 1 1 1
Item_D: 0 1 1 1 0

Alternatively, with GNU awk (which allows multidimensional arrays):

gawk '
 BEGIN 
 OFS="t";
 PROCINFO["sorted_in"] = "@ind_str_asc";
 
 
 for (i=2;i<=NF;i++) 
 split($i,a,"#"); 
 h[a[1]] = 1;
 t[$1][a[1]] += a[2];
 
 
 END 
 for (j in h) printf("t%s", j);
 printf "n";
 for (i in t) 
 printf("%s",i);
 for (j in h) 
 printf("t%d", j in t[i] ? t[i][j] : 0);
 printf "n";
 
 ' file
 Acou Agly Bla Elfa Flq
Item_A: 1 0 5 0 0
Item_B: 1 0 0 2 2
Item_C: 1 0 4 2 2
Item_D: 0 3 4 2 0

answered May 30 at 11:28

steeldriver

31.2k34978

Add # to FS and you don't need to split: FS = "[ t#]+" -- then for (i=2; i<NF; i+=2) h[$i]=1; t[$1][$i] = $(i+1)
â€“Â glenn jackman
May 30 at 14:46

Thank you @steeldriver. It works great and don't need to sort columns alphabetically as well. Much appreciated.
â€“Â JanakSunuwar
May 30 at 22:01

@steeldriver: The alternate version works; however, there is an error from datamash as:datamash: invalid operation 'crosstab'
â€“Â JanakSunuwar
May 31 at 1:10

add a commentÂ |Â

up vote
2
down vote

accepted

If you don't mind throwing GNU datamash into the mix, then you could simply serialize the entries and then crosstabulate them:

awk '
 for (i=2;i<=NF;i++) split($i,a,"#"); print $1,a[1],a[2]' OFS='t' file | 
 datamash --filler=0 crosstab 1,2 count 3
 Acou Agly Bla Elfa Flq
Item_A: 1 0 1 0 0
Item_B: 1 0 0 1 1
Item_C: 1 0 1 1 1
Item_D: 0 1 1 1 0

Alternatively, with GNU awk (which allows multidimensional arrays):

gawk '
 BEGIN 
 OFS="t";
 PROCINFO["sorted_in"] = "@ind_str_asc";
 
 
 for (i=2;i<=NF;i++) 
 split($i,a,"#"); 
 h[a[1]] = 1;
 t[$1][a[1]] += a[2];
 
 
 END 
 for (j in h) printf("t%s", j);
 printf "n";
 for (i in t) 
 printf("%s",i);
 for (j in h) 
 printf("t%d", j in t[i] ? t[i][j] : 0);
 printf "n";
 
 ' file
 Acou Agly Bla Elfa Flq
Item_A: 1 0 5 0 0
Item_B: 1 0 0 2 2
Item_C: 1 0 4 2 2
Item_D: 0 3 4 2 0

answered May 30 at 11:28

steeldriver

31.2k34978

If you don't mind throwing GNU datamash into the mix, then you could simply serialize the entries and then crosstabulate them:

awk '
 for (i=2;i<=NF;i++) split($i,a,"#"); print $1,a[1],a[2]' OFS='t' file | 
 datamash --filler=0 crosstab 1,2 count 3
 Acou Agly Bla Elfa Flq
Item_A: 1 0 1 0 0
Item_B: 1 0 0 1 1
Item_C: 1 0 1 1 1
Item_D: 0 1 1 1 0

Alternatively, with GNU awk (which allows multidimensional arrays):

gawk '
 BEGIN 
 OFS="t";
 PROCINFO["sorted_in"] = "@ind_str_asc";
 
 
 for (i=2;i<=NF;i++) 
 split($i,a,"#"); 
 h[a[1]] = 1;
 t[$1][a[1]] += a[2];
 
 
 END 
 for (j in h) printf("t%s", j);
 printf "n";
 for (i in t) 
 printf("%s",i);
 for (j in h) 
 printf("t%d", j in t[i] ? t[i][j] : 0);
 printf "n";
 
 ' file
 Acou Agly Bla Elfa Flq
Item_A: 1 0 5 0 0
Item_B: 1 0 0 2 2
Item_C: 1 0 4 2 2
Item_D: 0 3 4 2 0

answered May 30 at 11:28

steeldriver

31.2k34978

answered May 30 at 11:28

steeldriver

31.2k34978

answered May 30 at 11:28

steeldriver

31.2k34978

answered May 30 at 11:28

steeldriver

31.2k34978

Add # to FS and you don't need to split: FS = "[ t#]+" -- then for (i=2; i<NF; i+=2) h[$i]=1; t[$1][$i] = $(i+1)
â€“Â glenn jackman
May 30 at 14:46

Thank you @steeldriver. It works great and don't need to sort columns alphabetically as well. Much appreciated.
â€“Â JanakSunuwar
May 30 at 22:01

@steeldriver: The alternate version works; however, there is an error from datamash as:datamash: invalid operation 'crosstab'
â€“Â JanakSunuwar
May 31 at 1:10

add a commentÂ |Â

Add # to FS and you don't need to split: FS = "[ t#]+" -- then for (i=2; i<NF; i+=2) h[$i]=1; t[$1][$i] = $(i+1)
â€“Â glenn jackman
May 30 at 14:46

Thank you @steeldriver. It works great and don't need to sort columns alphabetically as well. Much appreciated.
â€“Â JanakSunuwar
May 30 at 22:01

@steeldriver: The alternate version works; however, there is an error from datamash as:datamash: invalid operation 'crosstab'
â€“Â JanakSunuwar
May 31 at 1:10

Add # to FS and you don't need to split: FS = "[ t#]+" -- then for (i=2; i<NF; i+=2) h[$i]=1; t[$1][$i] = $(i+1)
â€“Â glenn jackman
May 30 at 14:46

Thank you @steeldriver. It works great and don't need to sort columns alphabetically as well. Much appreciated.
â€“Â JanakSunuwar
May 30 at 22:01

@steeldriver: The alternate version works; however, there is an error from datamash as:datamash: invalid operation 'crosstab'
â€“Â JanakSunuwar
May 31 at 1:10

add a commentÂ |Â

up vote
3
down vote

BEGIN OFS = "t" 

# Collect headers from data
FNR == NR 
 for (i = 2; i <= NF; ++i)
 if (!($i in heads))
 heads[$i]
 next


# Output header
FNR == 1 
 line = "Items"
 for (j in heads)
 line = line OFS j
 print line



 line = $1
 # Iterate through the header items, testing each field against it
 for (j in heads) 
 found = 0 # assume not found
 for (i = 2; !found && i <= NF; ++i)
 if ($i == j)
 found = 1 # matches header
 line = line OFS found
 
 print line

Running this on your data (after having removed empty lines):

$ awk -f script.awk file file
Items Acou#1 Bla#4 Bla#5 Elfa#2 Agly#3 Flq#2
Item_A: 1 0 1 0 0 0
Item_B: 1 0 0 1 0 1
Item_C: 1 1 0 1 0 1
Item_D: 0 1 0 1 1 0

The output is simply 0 if the field in the header is not present in the data on that line, and 1 if it is. This is not quite what you asked for, so...

A variation that truncates the headers at the # and uses the part after the # as the data to display:

BEGIN OFS = "t" 

# Collect headers from data
FNR == NR 
 for (i = 2; i <= NF; ++i) 
 split($i, h, "#")
 if (!(h[1] in heads))
 heads[h[1]]
 
 next


# Output header
FNR == 1 
 line = "Items"
 for (j in heads)
 line = line OFS j
 print line



 line = $1
 # Iterate through the header items, testing each field against it
 for (j in heads) 
 found = 0 # assume not found
 for (i = 2; !found && i <= NF; ++i) 
 split($i, h, "#")
 if (h[1] == j)
 found = h[2] # matches header
 
 line = line OFS found
 
 print line

Running it:

$ awk -f script.awk file file
Items Elfa Bla Acou Agly Flq
Item_A: 0 5 1 0 0
Item_B: 2 0 1 0 2
Item_C: 2 4 1 0 2
Item_D: 2 4 0 3 0

Note that the order of the columns is not necessarily sorted (since they are stored as keys in an associative array). I'm leaving it as an exercise to the reader to sort them.

edited May 30 at 7:17

answered May 30 at 7:01

Kusalananda

102k13199313

Thank you for detail explanation @Kusalananda. Much appreciated and works great as you have explained.
â€“Â JanakSunuwar
May 30 at 21:59

add a commentÂ |Â

up vote
3
down vote

BEGIN OFS = "t" 

# Collect headers from data
FNR == NR 
 for (i = 2; i <= NF; ++i)
 if (!($i in heads))
 heads[$i]
 next


# Output header
FNR == 1 
 line = "Items"
 for (j in heads)
 line = line OFS j
 print line



 line = $1
 # Iterate through the header items, testing each field against it
 for (j in heads) 
 found = 0 # assume not found
 for (i = 2; !found && i <= NF; ++i)
 if ($i == j)
 found = 1 # matches header
 line = line OFS found
 
 print line

Running this on your data (after having removed empty lines):

$ awk -f script.awk file file
Items Acou#1 Bla#4 Bla#5 Elfa#2 Agly#3 Flq#2
Item_A: 1 0 1 0 0 0
Item_B: 1 0 0 1 0 1
Item_C: 1 1 0 1 0 1
Item_D: 0 1 0 1 1 0

The output is simply 0 if the field in the header is not present in the data on that line, and 1 if it is. This is not quite what you asked for, so...

A variation that truncates the headers at the # and uses the part after the # as the data to display:

BEGIN OFS = "t" 

# Collect headers from data
FNR == NR 
 for (i = 2; i <= NF; ++i) 
 split($i, h, "#")
 if (!(h[1] in heads))
 heads[h[1]]
 
 next


# Output header
FNR == 1 
 line = "Items"
 for (j in heads)
 line = line OFS j
 print line



 line = $1
 # Iterate through the header items, testing each field against it
 for (j in heads) 
 found = 0 # assume not found
 for (i = 2; !found && i <= NF; ++i) 
 split($i, h, "#")
 if (h[1] == j)
 found = h[2] # matches header
 
 line = line OFS found
 
 print line

Running it:

$ awk -f script.awk file file
Items Elfa Bla Acou Agly Flq
Item_A: 0 5 1 0 0
Item_B: 2 0 1 0 2
Item_C: 2 4 1 0 2
Item_D: 2 4 0 3 0

Note that the order of the columns is not necessarily sorted (since they are stored as keys in an associative array). I'm leaving it as an exercise to the reader to sort them.

edited May 30 at 7:17

answered May 30 at 7:01

Kusalananda

102k13199313

Thank you for detail explanation @Kusalananda. Much appreciated and works great as you have explained.
â€“Â JanakSunuwar
May 30 at 21:59

add a commentÂ |Â

up vote
3
down vote

BEGIN OFS = "t" 

# Collect headers from data
FNR == NR 
 for (i = 2; i <= NF; ++i)
 if (!($i in heads))
 heads[$i]
 next


# Output header
FNR == 1 
 line = "Items"
 for (j in heads)
 line = line OFS j
 print line



 line = $1
 # Iterate through the header items, testing each field against it
 for (j in heads) 
 found = 0 # assume not found
 for (i = 2; !found && i <= NF; ++i)
 if ($i == j)
 found = 1 # matches header
 line = line OFS found
 
 print line

Running this on your data (after having removed empty lines):

$ awk -f script.awk file file
Items Acou#1 Bla#4 Bla#5 Elfa#2 Agly#3 Flq#2
Item_A: 1 0 1 0 0 0
Item_B: 1 0 0 1 0 1
Item_C: 1 1 0 1 0 1
Item_D: 0 1 0 1 1 0

The output is simply 0 if the field in the header is not present in the data on that line, and 1 if it is. This is not quite what you asked for, so...

A variation that truncates the headers at the # and uses the part after the # as the data to display:

BEGIN OFS = "t" 

# Collect headers from data
FNR == NR 
 for (i = 2; i <= NF; ++i) 
 split($i, h, "#")
 if (!(h[1] in heads))
 heads[h[1]]
 
 next


# Output header
FNR == 1 
 line = "Items"
 for (j in heads)
 line = line OFS j
 print line



 line = $1
 # Iterate through the header items, testing each field against it
 for (j in heads) 
 found = 0 # assume not found
 for (i = 2; !found && i <= NF; ++i) 
 split($i, h, "#")
 if (h[1] == j)
 found = h[2] # matches header
 
 line = line OFS found
 
 print line

Running it:

$ awk -f script.awk file file
Items Elfa Bla Acou Agly Flq
Item_A: 0 5 1 0 0
Item_B: 2 0 1 0 2
Item_C: 2 4 1 0 2
Item_D: 2 4 0 3 0

Note that the order of the columns is not necessarily sorted (since they are stored as keys in an associative array). I'm leaving it as an exercise to the reader to sort them.

edited May 30 at 7:17

answered May 30 at 7:01

Kusalananda

102k13199313

BEGIN OFS = "t" 

# Collect headers from data
FNR == NR 
 for (i = 2; i <= NF; ++i)
 if (!($i in heads))
 heads[$i]
 next


# Output header
FNR == 1 
 line = "Items"
 for (j in heads)
 line = line OFS j
 print line



 line = $1
 # Iterate through the header items, testing each field against it
 for (j in heads) 
 found = 0 # assume not found
 for (i = 2; !found && i <= NF; ++i)
 if ($i == j)
 found = 1 # matches header
 line = line OFS found
 
 print line

Running this on your data (after having removed empty lines):

$ awk -f script.awk file file
Items Acou#1 Bla#4 Bla#5 Elfa#2 Agly#3 Flq#2
Item_A: 1 0 1 0 0 0
Item_B: 1 0 0 1 0 1
Item_C: 1 1 0 1 0 1
Item_D: 0 1 0 1 1 0

The output is simply 0 if the field in the header is not present in the data on that line, and 1 if it is. This is not quite what you asked for, so...

A variation that truncates the headers at the # and uses the part after the # as the data to display:

BEGIN OFS = "t" 

# Collect headers from data
FNR == NR 
 for (i = 2; i <= NF; ++i) 
 split($i, h, "#")
 if (!(h[1] in heads))
 heads[h[1]]
 
 next


# Output header
FNR == 1 
 line = "Items"
 for (j in heads)
 line = line OFS j
 print line



 line = $1
 # Iterate through the header items, testing each field against it
 for (j in heads) 
 found = 0 # assume not found
 for (i = 2; !found && i <= NF; ++i) 
 split($i, h, "#")
 if (h[1] == j)
 found = h[2] # matches header
 
 line = line OFS found
 
 print line

Running it:

$ awk -f script.awk file file
Items Elfa Bla Acou Agly Flq
Item_A: 0 5 1 0 0
Item_B: 2 0 1 0 2
Item_C: 2 4 1 0 2
Item_D: 2 4 0 3 0

Note that the order of the columns is not necessarily sorted (since they are stored as keys in an associative array). I'm leaving it as an exercise to the reader to sort them.

edited May 30 at 7:17

answered May 30 at 7:01

Kusalananda

102k13199313

edited May 30 at 7:17

answered May 30 at 7:01

Kusalananda

102k13199313

answered May 30 at 7:01

Kusalananda

102k13199313

answered May 30 at 7:01

Kusalananda

102k13199313

Thank you for detail explanation @Kusalananda. Much appreciated and works great as you have explained.
â€“Â JanakSunuwar
May 30 at 21:59

add a commentÂ |Â

Thank you for detail explanation @Kusalananda. Much appreciated and works great as you have explained.
â€“Â JanakSunuwar
May 30 at 21:59

Thank you for detail explanation @Kusalananda. Much appreciated and works great as you have explained.
â€“Â JanakSunuwar
May 30 at 21:59

add a commentÂ |Â

搜尋此網誌

mjhjmtu