How to put items according to a heading? [closed]

Clash Royale CLAN TAG#URR8PPP
up vote
-2
down vote
favorite
I have a 5000+ space separated lines as below:
Item_A: Acou#1 Bla#5
Item_B: Acou#1 Elfa#2 Flq#2
Item_C: Acou#1 Bla#4 Elfa#2 Flq#2
Item_D: Agly#3 Bla#4 Elfa#2
I want to make a table with common headers for all and quantity each in a table as below,
Acou Agly Bla Elfa Flq
Item_A: 1 0 5 0 0
Item_B: 1 0 0 2 2
Item_C: 1 0 4 2 2
Item_D: 0 3 4 2 0
I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.
text-processing awk
closed as too broad by Rui F Ribeiro, Jeff Schaller, Anthony Geoghegan, Jesse_b, Thomas May 30 at 16:22
Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Avoid asking multiple distinct questions at once. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
add a comment |Â
up vote
-2
down vote
favorite
I have a 5000+ space separated lines as below:
Item_A: Acou#1 Bla#5
Item_B: Acou#1 Elfa#2 Flq#2
Item_C: Acou#1 Bla#4 Elfa#2 Flq#2
Item_D: Agly#3 Bla#4 Elfa#2
I want to make a table with common headers for all and quantity each in a table as below,
Acou Agly Bla Elfa Flq
Item_A: 1 0 5 0 0
Item_B: 1 0 0 2 2
Item_C: 1 0 4 2 2
Item_D: 0 3 4 2 0
I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.
text-processing awk
closed as too broad by Rui F Ribeiro, Jeff Schaller, Anthony Geoghegan, Jesse_b, Thomas May 30 at 16:22
Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Avoid asking multiple distinct questions at once. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
2
Please detail what you have done until now and where exactly you need help. We are not a script "for hire" service, we are a Unix forum.
â Rui F Ribeiro
May 30 at 2:53
I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.
â JanakSunuwar
May 30 at 21:58
add a comment |Â
up vote
-2
down vote
favorite
up vote
-2
down vote
favorite
I have a 5000+ space separated lines as below:
Item_A: Acou#1 Bla#5
Item_B: Acou#1 Elfa#2 Flq#2
Item_C: Acou#1 Bla#4 Elfa#2 Flq#2
Item_D: Agly#3 Bla#4 Elfa#2
I want to make a table with common headers for all and quantity each in a table as below,
Acou Agly Bla Elfa Flq
Item_A: 1 0 5 0 0
Item_B: 1 0 0 2 2
Item_C: 1 0 4 2 2
Item_D: 0 3 4 2 0
I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.
text-processing awk
I have a 5000+ space separated lines as below:
Item_A: Acou#1 Bla#5
Item_B: Acou#1 Elfa#2 Flq#2
Item_C: Acou#1 Bla#4 Elfa#2 Flq#2
Item_D: Agly#3 Bla#4 Elfa#2
I want to make a table with common headers for all and quantity each in a table as below,
Acou Agly Bla Elfa Flq
Item_A: 1 0 5 0 0
Item_B: 1 0 0 2 2
Item_C: 1 0 4 2 2
Item_D: 0 3 4 2 0
I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.
text-processing awk
edited May 30 at 22:04
asked May 30 at 1:50
JanakSunuwar
63
63
closed as too broad by Rui F Ribeiro, Jeff Schaller, Anthony Geoghegan, Jesse_b, Thomas May 30 at 16:22
Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Avoid asking multiple distinct questions at once. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
closed as too broad by Rui F Ribeiro, Jeff Schaller, Anthony Geoghegan, Jesse_b, Thomas May 30 at 16:22
Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Avoid asking multiple distinct questions at once. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
2
Please detail what you have done until now and where exactly you need help. We are not a script "for hire" service, we are a Unix forum.
â Rui F Ribeiro
May 30 at 2:53
I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.
â JanakSunuwar
May 30 at 21:58
add a comment |Â
2
Please detail what you have done until now and where exactly you need help. We are not a script "for hire" service, we are a Unix forum.
â Rui F Ribeiro
May 30 at 2:53
I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.
â JanakSunuwar
May 30 at 21:58
2
2
Please detail what you have done until now and where exactly you need help. We are not a script "for hire" service, we are a Unix forum.
â Rui F Ribeiro
May 30 at 2:53
Please detail what you have done until now and where exactly you need help. We are not a script "for hire" service, we are a Unix forum.
â Rui F Ribeiro
May 30 at 2:53
I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.
â JanakSunuwar
May 30 at 21:58
I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.
â JanakSunuwar
May 30 at 21:58
add a comment |Â
2 Answers
2
active
oldest
votes
up vote
2
down vote
accepted
If you don't mind throwing GNU datamash into the mix, then you could simply serialize the entries and then crosstabulate them:
awk '
for (i=2;i<=NF;i++) split($i,a,"#"); print $1,a[1],a[2]' OFS='t' file |
datamash --filler=0 crosstab 1,2 count 3
Acou Agly Bla Elfa Flq
Item_A: 1 0 1 0 0
Item_B: 1 0 0 1 1
Item_C: 1 0 1 1 1
Item_D: 0 1 1 1 0
Alternatively, with GNU awk (which allows multidimensional arrays):
gawk '
BEGIN
OFS="t";
PROCINFO["sorted_in"] = "@ind_str_asc";
for (i=2;i<=NF;i++)
split($i,a,"#");
h[a[1]] = 1;
t[$1][a[1]] += a[2];
END
for (j in h) printf("t%s", j);
printf "n";
for (i in t)
printf("%s",i);
for (j in h)
printf("t%d", j in t[i] ? t[i][j] : 0);
printf "n";
' file
Acou Agly Bla Elfa Flq
Item_A: 1 0 5 0 0
Item_B: 1 0 0 2 2
Item_C: 1 0 4 2 2
Item_D: 0 3 4 2 0
Add # to FS and you don't need to split:FS = "[ t#]+"-- thenfor (i=2; i<NF; i+=2) h[$i]=1; t[$1][$i] = $(i+1)
â glenn jackman
May 30 at 14:46
Thank you @steeldriver. It works great and don't need to sort columns alphabetically as well. Much appreciated.
â JanakSunuwar
May 30 at 22:01
@steeldriver: The alternate version works; however, there is an error from datamash as:datamash: invalid operation 'crosstab'
â JanakSunuwar
May 31 at 1:10
add a comment |Â
up vote
3
down vote
BEGIN OFS = "t"
# Collect headers from data
FNR == NR
for (i = 2; i <= NF; ++i)
if (!($i in heads))
heads[$i]
next
# Output header
FNR == 1
line = "Items"
for (j in heads)
line = line OFS j
print line
line = $1
# Iterate through the header items, testing each field against it
for (j in heads)
found = 0 # assume not found
for (i = 2; !found && i <= NF; ++i)
if ($i == j)
found = 1 # matches header
line = line OFS found
print line
Running this on your data (after having removed empty lines):
$ awk -f script.awk file file
Items Acou#1 Bla#4 Bla#5 Elfa#2 Agly#3 Flq#2
Item_A: 1 0 1 0 0 0
Item_B: 1 0 0 1 0 1
Item_C: 1 1 0 1 0 1
Item_D: 0 1 0 1 1 0
Note that you have to specify the input data file twice. This is because we're scanning it two times. In the first scan we're collecting the data items on each line (the FNR == NR block). On the second scan, we test each collected data item (the headers) against the data on each line.
The output is simply 0 if the field in the header is not present in the data on that line, and 1 if it is. This is not quite what you asked for, so...
A variation that truncates the headers at the # and uses the part after the # as the data to display:
BEGIN OFS = "t"
# Collect headers from data
FNR == NR
for (i = 2; i <= NF; ++i)
split($i, h, "#")
if (!(h[1] in heads))
heads[h[1]]
next
# Output header
FNR == 1
line = "Items"
for (j in heads)
line = line OFS j
print line
line = $1
# Iterate through the header items, testing each field against it
for (j in heads)
found = 0 # assume not found
for (i = 2; !found && i <= NF; ++i)
split($i, h, "#")
if (h[1] == j)
found = h[2] # matches header
line = line OFS found
print line
Running it:
$ awk -f script.awk file file
Items Elfa Bla Acou Agly Flq
Item_A: 0 5 1 0 0
Item_B: 2 0 1 0 2
Item_C: 2 4 1 0 2
Item_D: 2 4 0 3 0
Note that the order of the columns is not necessarily sorted (since they are stored as keys in an associative array). I'm leaving it as an exercise to the reader to sort them.
Thank you for detail explanation @Kusalananda. Much appreciated and works great as you have explained.
â JanakSunuwar
May 30 at 21:59
add a comment |Â
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
2
down vote
accepted
If you don't mind throwing GNU datamash into the mix, then you could simply serialize the entries and then crosstabulate them:
awk '
for (i=2;i<=NF;i++) split($i,a,"#"); print $1,a[1],a[2]' OFS='t' file |
datamash --filler=0 crosstab 1,2 count 3
Acou Agly Bla Elfa Flq
Item_A: 1 0 1 0 0
Item_B: 1 0 0 1 1
Item_C: 1 0 1 1 1
Item_D: 0 1 1 1 0
Alternatively, with GNU awk (which allows multidimensional arrays):
gawk '
BEGIN
OFS="t";
PROCINFO["sorted_in"] = "@ind_str_asc";
for (i=2;i<=NF;i++)
split($i,a,"#");
h[a[1]] = 1;
t[$1][a[1]] += a[2];
END
for (j in h) printf("t%s", j);
printf "n";
for (i in t)
printf("%s",i);
for (j in h)
printf("t%d", j in t[i] ? t[i][j] : 0);
printf "n";
' file
Acou Agly Bla Elfa Flq
Item_A: 1 0 5 0 0
Item_B: 1 0 0 2 2
Item_C: 1 0 4 2 2
Item_D: 0 3 4 2 0
Add # to FS and you don't need to split:FS = "[ t#]+"-- thenfor (i=2; i<NF; i+=2) h[$i]=1; t[$1][$i] = $(i+1)
â glenn jackman
May 30 at 14:46
Thank you @steeldriver. It works great and don't need to sort columns alphabetically as well. Much appreciated.
â JanakSunuwar
May 30 at 22:01
@steeldriver: The alternate version works; however, there is an error from datamash as:datamash: invalid operation 'crosstab'
â JanakSunuwar
May 31 at 1:10
add a comment |Â
up vote
2
down vote
accepted
If you don't mind throwing GNU datamash into the mix, then you could simply serialize the entries and then crosstabulate them:
awk '
for (i=2;i<=NF;i++) split($i,a,"#"); print $1,a[1],a[2]' OFS='t' file |
datamash --filler=0 crosstab 1,2 count 3
Acou Agly Bla Elfa Flq
Item_A: 1 0 1 0 0
Item_B: 1 0 0 1 1
Item_C: 1 0 1 1 1
Item_D: 0 1 1 1 0
Alternatively, with GNU awk (which allows multidimensional arrays):
gawk '
BEGIN
OFS="t";
PROCINFO["sorted_in"] = "@ind_str_asc";
for (i=2;i<=NF;i++)
split($i,a,"#");
h[a[1]] = 1;
t[$1][a[1]] += a[2];
END
for (j in h) printf("t%s", j);
printf "n";
for (i in t)
printf("%s",i);
for (j in h)
printf("t%d", j in t[i] ? t[i][j] : 0);
printf "n";
' file
Acou Agly Bla Elfa Flq
Item_A: 1 0 5 0 0
Item_B: 1 0 0 2 2
Item_C: 1 0 4 2 2
Item_D: 0 3 4 2 0
Add # to FS and you don't need to split:FS = "[ t#]+"-- thenfor (i=2; i<NF; i+=2) h[$i]=1; t[$1][$i] = $(i+1)
â glenn jackman
May 30 at 14:46
Thank you @steeldriver. It works great and don't need to sort columns alphabetically as well. Much appreciated.
â JanakSunuwar
May 30 at 22:01
@steeldriver: The alternate version works; however, there is an error from datamash as:datamash: invalid operation 'crosstab'
â JanakSunuwar
May 31 at 1:10
add a comment |Â
up vote
2
down vote
accepted
up vote
2
down vote
accepted
If you don't mind throwing GNU datamash into the mix, then you could simply serialize the entries and then crosstabulate them:
awk '
for (i=2;i<=NF;i++) split($i,a,"#"); print $1,a[1],a[2]' OFS='t' file |
datamash --filler=0 crosstab 1,2 count 3
Acou Agly Bla Elfa Flq
Item_A: 1 0 1 0 0
Item_B: 1 0 0 1 1
Item_C: 1 0 1 1 1
Item_D: 0 1 1 1 0
Alternatively, with GNU awk (which allows multidimensional arrays):
gawk '
BEGIN
OFS="t";
PROCINFO["sorted_in"] = "@ind_str_asc";
for (i=2;i<=NF;i++)
split($i,a,"#");
h[a[1]] = 1;
t[$1][a[1]] += a[2];
END
for (j in h) printf("t%s", j);
printf "n";
for (i in t)
printf("%s",i);
for (j in h)
printf("t%d", j in t[i] ? t[i][j] : 0);
printf "n";
' file
Acou Agly Bla Elfa Flq
Item_A: 1 0 5 0 0
Item_B: 1 0 0 2 2
Item_C: 1 0 4 2 2
Item_D: 0 3 4 2 0
If you don't mind throwing GNU datamash into the mix, then you could simply serialize the entries and then crosstabulate them:
awk '
for (i=2;i<=NF;i++) split($i,a,"#"); print $1,a[1],a[2]' OFS='t' file |
datamash --filler=0 crosstab 1,2 count 3
Acou Agly Bla Elfa Flq
Item_A: 1 0 1 0 0
Item_B: 1 0 0 1 1
Item_C: 1 0 1 1 1
Item_D: 0 1 1 1 0
Alternatively, with GNU awk (which allows multidimensional arrays):
gawk '
BEGIN
OFS="t";
PROCINFO["sorted_in"] = "@ind_str_asc";
for (i=2;i<=NF;i++)
split($i,a,"#");
h[a[1]] = 1;
t[$1][a[1]] += a[2];
END
for (j in h) printf("t%s", j);
printf "n";
for (i in t)
printf("%s",i);
for (j in h)
printf("t%d", j in t[i] ? t[i][j] : 0);
printf "n";
' file
Acou Agly Bla Elfa Flq
Item_A: 1 0 5 0 0
Item_B: 1 0 0 2 2
Item_C: 1 0 4 2 2
Item_D: 0 3 4 2 0
answered May 30 at 11:28
steeldriver
31.2k34978
31.2k34978
Add # to FS and you don't need to split:FS = "[ t#]+"-- thenfor (i=2; i<NF; i+=2) h[$i]=1; t[$1][$i] = $(i+1)
â glenn jackman
May 30 at 14:46
Thank you @steeldriver. It works great and don't need to sort columns alphabetically as well. Much appreciated.
â JanakSunuwar
May 30 at 22:01
@steeldriver: The alternate version works; however, there is an error from datamash as:datamash: invalid operation 'crosstab'
â JanakSunuwar
May 31 at 1:10
add a comment |Â
Add # to FS and you don't need to split:FS = "[ t#]+"-- thenfor (i=2; i<NF; i+=2) h[$i]=1; t[$1][$i] = $(i+1)
â glenn jackman
May 30 at 14:46
Thank you @steeldriver. It works great and don't need to sort columns alphabetically as well. Much appreciated.
â JanakSunuwar
May 30 at 22:01
@steeldriver: The alternate version works; however, there is an error from datamash as:datamash: invalid operation 'crosstab'
â JanakSunuwar
May 31 at 1:10
Add # to FS and you don't need to split:
FS = "[ t#]+" -- then for (i=2; i<NF; i+=2) h[$i]=1; t[$1][$i] = $(i+1)â glenn jackman
May 30 at 14:46
Add # to FS and you don't need to split:
FS = "[ t#]+" -- then for (i=2; i<NF; i+=2) h[$i]=1; t[$1][$i] = $(i+1)â glenn jackman
May 30 at 14:46
Thank you @steeldriver. It works great and don't need to sort columns alphabetically as well. Much appreciated.
â JanakSunuwar
May 30 at 22:01
Thank you @steeldriver. It works great and don't need to sort columns alphabetically as well. Much appreciated.
â JanakSunuwar
May 30 at 22:01
@steeldriver: The alternate version works; however, there is an error from datamash as:datamash: invalid operation 'crosstab'
â JanakSunuwar
May 31 at 1:10
@steeldriver: The alternate version works; however, there is an error from datamash as:datamash: invalid operation 'crosstab'
â JanakSunuwar
May 31 at 1:10
add a comment |Â
up vote
3
down vote
BEGIN OFS = "t"
# Collect headers from data
FNR == NR
for (i = 2; i <= NF; ++i)
if (!($i in heads))
heads[$i]
next
# Output header
FNR == 1
line = "Items"
for (j in heads)
line = line OFS j
print line
line = $1
# Iterate through the header items, testing each field against it
for (j in heads)
found = 0 # assume not found
for (i = 2; !found && i <= NF; ++i)
if ($i == j)
found = 1 # matches header
line = line OFS found
print line
Running this on your data (after having removed empty lines):
$ awk -f script.awk file file
Items Acou#1 Bla#4 Bla#5 Elfa#2 Agly#3 Flq#2
Item_A: 1 0 1 0 0 0
Item_B: 1 0 0 1 0 1
Item_C: 1 1 0 1 0 1
Item_D: 0 1 0 1 1 0
Note that you have to specify the input data file twice. This is because we're scanning it two times. In the first scan we're collecting the data items on each line (the FNR == NR block). On the second scan, we test each collected data item (the headers) against the data on each line.
The output is simply 0 if the field in the header is not present in the data on that line, and 1 if it is. This is not quite what you asked for, so...
A variation that truncates the headers at the # and uses the part after the # as the data to display:
BEGIN OFS = "t"
# Collect headers from data
FNR == NR
for (i = 2; i <= NF; ++i)
split($i, h, "#")
if (!(h[1] in heads))
heads[h[1]]
next
# Output header
FNR == 1
line = "Items"
for (j in heads)
line = line OFS j
print line
line = $1
# Iterate through the header items, testing each field against it
for (j in heads)
found = 0 # assume not found
for (i = 2; !found && i <= NF; ++i)
split($i, h, "#")
if (h[1] == j)
found = h[2] # matches header
line = line OFS found
print line
Running it:
$ awk -f script.awk file file
Items Elfa Bla Acou Agly Flq
Item_A: 0 5 1 0 0
Item_B: 2 0 1 0 2
Item_C: 2 4 1 0 2
Item_D: 2 4 0 3 0
Note that the order of the columns is not necessarily sorted (since they are stored as keys in an associative array). I'm leaving it as an exercise to the reader to sort them.
Thank you for detail explanation @Kusalananda. Much appreciated and works great as you have explained.
â JanakSunuwar
May 30 at 21:59
add a comment |Â
up vote
3
down vote
BEGIN OFS = "t"
# Collect headers from data
FNR == NR
for (i = 2; i <= NF; ++i)
if (!($i in heads))
heads[$i]
next
# Output header
FNR == 1
line = "Items"
for (j in heads)
line = line OFS j
print line
line = $1
# Iterate through the header items, testing each field against it
for (j in heads)
found = 0 # assume not found
for (i = 2; !found && i <= NF; ++i)
if ($i == j)
found = 1 # matches header
line = line OFS found
print line
Running this on your data (after having removed empty lines):
$ awk -f script.awk file file
Items Acou#1 Bla#4 Bla#5 Elfa#2 Agly#3 Flq#2
Item_A: 1 0 1 0 0 0
Item_B: 1 0 0 1 0 1
Item_C: 1 1 0 1 0 1
Item_D: 0 1 0 1 1 0
Note that you have to specify the input data file twice. This is because we're scanning it two times. In the first scan we're collecting the data items on each line (the FNR == NR block). On the second scan, we test each collected data item (the headers) against the data on each line.
The output is simply 0 if the field in the header is not present in the data on that line, and 1 if it is. This is not quite what you asked for, so...
A variation that truncates the headers at the # and uses the part after the # as the data to display:
BEGIN OFS = "t"
# Collect headers from data
FNR == NR
for (i = 2; i <= NF; ++i)
split($i, h, "#")
if (!(h[1] in heads))
heads[h[1]]
next
# Output header
FNR == 1
line = "Items"
for (j in heads)
line = line OFS j
print line
line = $1
# Iterate through the header items, testing each field against it
for (j in heads)
found = 0 # assume not found
for (i = 2; !found && i <= NF; ++i)
split($i, h, "#")
if (h[1] == j)
found = h[2] # matches header
line = line OFS found
print line
Running it:
$ awk -f script.awk file file
Items Elfa Bla Acou Agly Flq
Item_A: 0 5 1 0 0
Item_B: 2 0 1 0 2
Item_C: 2 4 1 0 2
Item_D: 2 4 0 3 0
Note that the order of the columns is not necessarily sorted (since they are stored as keys in an associative array). I'm leaving it as an exercise to the reader to sort them.
Thank you for detail explanation @Kusalananda. Much appreciated and works great as you have explained.
â JanakSunuwar
May 30 at 21:59
add a comment |Â
up vote
3
down vote
up vote
3
down vote
BEGIN OFS = "t"
# Collect headers from data
FNR == NR
for (i = 2; i <= NF; ++i)
if (!($i in heads))
heads[$i]
next
# Output header
FNR == 1
line = "Items"
for (j in heads)
line = line OFS j
print line
line = $1
# Iterate through the header items, testing each field against it
for (j in heads)
found = 0 # assume not found
for (i = 2; !found && i <= NF; ++i)
if ($i == j)
found = 1 # matches header
line = line OFS found
print line
Running this on your data (after having removed empty lines):
$ awk -f script.awk file file
Items Acou#1 Bla#4 Bla#5 Elfa#2 Agly#3 Flq#2
Item_A: 1 0 1 0 0 0
Item_B: 1 0 0 1 0 1
Item_C: 1 1 0 1 0 1
Item_D: 0 1 0 1 1 0
Note that you have to specify the input data file twice. This is because we're scanning it two times. In the first scan we're collecting the data items on each line (the FNR == NR block). On the second scan, we test each collected data item (the headers) against the data on each line.
The output is simply 0 if the field in the header is not present in the data on that line, and 1 if it is. This is not quite what you asked for, so...
A variation that truncates the headers at the # and uses the part after the # as the data to display:
BEGIN OFS = "t"
# Collect headers from data
FNR == NR
for (i = 2; i <= NF; ++i)
split($i, h, "#")
if (!(h[1] in heads))
heads[h[1]]
next
# Output header
FNR == 1
line = "Items"
for (j in heads)
line = line OFS j
print line
line = $1
# Iterate through the header items, testing each field against it
for (j in heads)
found = 0 # assume not found
for (i = 2; !found && i <= NF; ++i)
split($i, h, "#")
if (h[1] == j)
found = h[2] # matches header
line = line OFS found
print line
Running it:
$ awk -f script.awk file file
Items Elfa Bla Acou Agly Flq
Item_A: 0 5 1 0 0
Item_B: 2 0 1 0 2
Item_C: 2 4 1 0 2
Item_D: 2 4 0 3 0
Note that the order of the columns is not necessarily sorted (since they are stored as keys in an associative array). I'm leaving it as an exercise to the reader to sort them.
BEGIN OFS = "t"
# Collect headers from data
FNR == NR
for (i = 2; i <= NF; ++i)
if (!($i in heads))
heads[$i]
next
# Output header
FNR == 1
line = "Items"
for (j in heads)
line = line OFS j
print line
line = $1
# Iterate through the header items, testing each field against it
for (j in heads)
found = 0 # assume not found
for (i = 2; !found && i <= NF; ++i)
if ($i == j)
found = 1 # matches header
line = line OFS found
print line
Running this on your data (after having removed empty lines):
$ awk -f script.awk file file
Items Acou#1 Bla#4 Bla#5 Elfa#2 Agly#3 Flq#2
Item_A: 1 0 1 0 0 0
Item_B: 1 0 0 1 0 1
Item_C: 1 1 0 1 0 1
Item_D: 0 1 0 1 1 0
Note that you have to specify the input data file twice. This is because we're scanning it two times. In the first scan we're collecting the data items on each line (the FNR == NR block). On the second scan, we test each collected data item (the headers) against the data on each line.
The output is simply 0 if the field in the header is not present in the data on that line, and 1 if it is. This is not quite what you asked for, so...
A variation that truncates the headers at the # and uses the part after the # as the data to display:
BEGIN OFS = "t"
# Collect headers from data
FNR == NR
for (i = 2; i <= NF; ++i)
split($i, h, "#")
if (!(h[1] in heads))
heads[h[1]]
next
# Output header
FNR == 1
line = "Items"
for (j in heads)
line = line OFS j
print line
line = $1
# Iterate through the header items, testing each field against it
for (j in heads)
found = 0 # assume not found
for (i = 2; !found && i <= NF; ++i)
split($i, h, "#")
if (h[1] == j)
found = h[2] # matches header
line = line OFS found
print line
Running it:
$ awk -f script.awk file file
Items Elfa Bla Acou Agly Flq
Item_A: 0 5 1 0 0
Item_B: 2 0 1 0 2
Item_C: 2 4 1 0 2
Item_D: 2 4 0 3 0
Note that the order of the columns is not necessarily sorted (since they are stored as keys in an associative array). I'm leaving it as an exercise to the reader to sort them.
edited May 30 at 7:17
answered May 30 at 7:01
Kusalananda
102k13199313
102k13199313
Thank you for detail explanation @Kusalananda. Much appreciated and works great as you have explained.
â JanakSunuwar
May 30 at 21:59
add a comment |Â
Thank you for detail explanation @Kusalananda. Much appreciated and works great as you have explained.
â JanakSunuwar
May 30 at 21:59
Thank you for detail explanation @Kusalananda. Much appreciated and works great as you have explained.
â JanakSunuwar
May 30 at 21:59
Thank you for detail explanation @Kusalananda. Much appreciated and works great as you have explained.
â JanakSunuwar
May 30 at 21:59
add a comment |Â
2
Please detail what you have done until now and where exactly you need help. We are not a script "for hire" service, we are a Unix forum.
â Rui F Ribeiro
May 30 at 2:53
I used to grep lines containing "Acou", then, "Bla" and so on. Then edit in excel to quantify them and concatenate all separate files to one file. However, it took a lot of time.
â JanakSunuwar
May 30 at 21:58