columns to data matrix awk

Clash Royale CLAN TAG#URR8PPP
up vote
1
down vote
favorite
I'm looking for an awk solution to get from this table (with x number of names in column 1, and different lengths of column 2 which is ";" separated):
dataframe.txt:
name1 1;2;4;8
name2 4;5;7
name3 8
name4 11;12
namex 20;21
to this matrix with present 1 and non present 0:
matrix.txt:
1 2 4 5 7 8 11 12 20 21
name1 1 1 1 0 0 1 0 0 0 0
name2 0 0 1 1 1 0 0 0 0 0
name3 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 1 1
text-processing awk
add a comment |Â
up vote
1
down vote
favorite
I'm looking for an awk solution to get from this table (with x number of names in column 1, and different lengths of column 2 which is ";" separated):
dataframe.txt:
name1 1;2;4;8
name2 4;5;7
name3 8
name4 11;12
namex 20;21
to this matrix with present 1 and non present 0:
matrix.txt:
1 2 4 5 7 8 11 12 20 21
name1 1 1 1 0 0 1 0 0 0 0
name2 0 0 1 1 1 0 0 0 0 0
name3 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 1 1
text-processing awk
previous examplesplease click edit and add that code/question-link to the question... it'd also help if you add what changes you attempted.. that'd show your research efforts.. regarding expected output shown, on what basis column is added? there is no1in column under3.. is1 2 3 4 5 6 7 8 11 12 20 21known beforehand?
â Sundeep
May 9 at 10:52
add a comment |Â
up vote
1
down vote
favorite
up vote
1
down vote
favorite
I'm looking for an awk solution to get from this table (with x number of names in column 1, and different lengths of column 2 which is ";" separated):
dataframe.txt:
name1 1;2;4;8
name2 4;5;7
name3 8
name4 11;12
namex 20;21
to this matrix with present 1 and non present 0:
matrix.txt:
1 2 4 5 7 8 11 12 20 21
name1 1 1 1 0 0 1 0 0 0 0
name2 0 0 1 1 1 0 0 0 0 0
name3 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 1 1
text-processing awk
I'm looking for an awk solution to get from this table (with x number of names in column 1, and different lengths of column 2 which is ";" separated):
dataframe.txt:
name1 1;2;4;8
name2 4;5;7
name3 8
name4 11;12
namex 20;21
to this matrix with present 1 and non present 0:
matrix.txt:
1 2 4 5 7 8 11 12 20 21
name1 1 1 1 0 0 1 0 0 0 0
name2 0 0 1 1 1 0 0 0 0 0
name3 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 1 1
text-processing awk
edited May 10 at 12:19
ñÃÂsýù÷
14.8k82462
14.8k82462
asked May 9 at 10:43
newAWKer
62
62
previous examplesplease click edit and add that code/question-link to the question... it'd also help if you add what changes you attempted.. that'd show your research efforts.. regarding expected output shown, on what basis column is added? there is no1in column under3.. is1 2 3 4 5 6 7 8 11 12 20 21known beforehand?
â Sundeep
May 9 at 10:52
add a comment |Â
previous examplesplease click edit and add that code/question-link to the question... it'd also help if you add what changes you attempted.. that'd show your research efforts.. regarding expected output shown, on what basis column is added? there is no1in column under3.. is1 2 3 4 5 6 7 8 11 12 20 21known beforehand?
â Sundeep
May 9 at 10:52
previous examples please click edit and add that code/question-link to the question... it'd also help if you add what changes you attempted.. that'd show your research efforts.. regarding expected output shown, on what basis column is added? there is no 1 in column under 3.. is 1 2 3 4 5 6 7 8 11 12 20 21 known beforehand?â Sundeep
May 9 at 10:52
previous examples please click edit and add that code/question-link to the question... it'd also help if you add what changes you attempted.. that'd show your research efforts.. regarding expected output shown, on what basis column is added? there is no 1 in column under 3.. is 1 2 3 4 5 6 7 8 11 12 20 21 known beforehand?â Sundeep
May 9 at 10:52
add a comment |Â
2 Answers
2
active
oldest
votes
up vote
2
down vote
Awk solution:
awk 'BEGIN
h = "1 2 3 4 5 6 7 8 11 12 20 21";
len = split(h, head);
print "tt" h
printf "%st", $1;
for (i = 1; i <= len; i++)
printf "%s%d", (i == 1? "" : OFS), ($2 ~ "\<" head[i] "\>");
print ""
' file
h = "1 2 3 4 5 6 7 8 11 12 20 21"- header linelen = split(h, head)- splithline into arrayheadwhere indices are ordered positions starting from1and values are crucial values obtained via splitting;lencontains an array sizeprint "tt" h- print the header line with leading tab charactersprintf "%st", $1;- print the 1st field$1for (i = 1; i <= len; i++)- iterate throughheaditems$2 ~ "\<" head[i] "\>"- check if the 2nd field$2contains currently accessed itemhead[i]
The output:
1 2 3 4 5 6 7 8 11 12 20 21
name1 1 1 0 1 0 0 0 1 0 0 0 0
name2 0 0 0 1 1 0 1 0 0 0 0 0
name3 0 0 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 0 0 1 1
thanks for the code and the comments! Unfortunately I get a matrix with only 0's. I tried changingprintf "%s%d", (i == 1? "" : OFS )this command to get the 1's but can't get it to work :/
â newAWKer
May 9 at 13:18
@newAWKer, it works fine. What's yourawk --version?
â RomanPerekhrest
May 9 at 13:19
awk version 20070501
â newAWKer
May 9 at 13:22
@newAWKer, try to change($2 ~ "\<" head[i] "\>")to($2 ~ "\<" head[i] "\>" ? 1:0)
â RomanPerekhrest
May 9 at 13:31
I changed it to ($2 ~ head[i] ? 1:0) and then it worked! thanks a lot!!
â newAWKer
May 9 at 13:42
 |Â
show 3 more comments
up vote
0
down vote
Another awk approach which generate header first by itself and then fill the matrix with 1 for each key in array head exist in current input line otherwise with 0 if doesn't exist.
We used grep to fetch only -o numbers which matches the empty string at the left edge of a number b (same as < we used in awk for both left&right sides).
awk 'NR==FNR !head[$1]++; next
printf $1; for (x in head) printf (x?FS:"") ($0 ~ "\<" x "\>") ; print ""
' <(grep -o 'b[0-9]+' infile) infile
The output is:
name1 1 1 1 0 0 1 0 0 0 0
name2 0 0 1 1 1 0 0 0 0 0
name3 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 1 1
You could generate the header and insert in first line in input file, then the awk would be easy as:
awk 'NR==1 split($0, head); next
printf $1; for (x in head) printf (x?FS:"") ($2 ~ "\<" head[x] "\>" ) ;
print "" ' infile
add a comment |Â
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
2
down vote
Awk solution:
awk 'BEGIN
h = "1 2 3 4 5 6 7 8 11 12 20 21";
len = split(h, head);
print "tt" h
printf "%st", $1;
for (i = 1; i <= len; i++)
printf "%s%d", (i == 1? "" : OFS), ($2 ~ "\<" head[i] "\>");
print ""
' file
h = "1 2 3 4 5 6 7 8 11 12 20 21"- header linelen = split(h, head)- splithline into arrayheadwhere indices are ordered positions starting from1and values are crucial values obtained via splitting;lencontains an array sizeprint "tt" h- print the header line with leading tab charactersprintf "%st", $1;- print the 1st field$1for (i = 1; i <= len; i++)- iterate throughheaditems$2 ~ "\<" head[i] "\>"- check if the 2nd field$2contains currently accessed itemhead[i]
The output:
1 2 3 4 5 6 7 8 11 12 20 21
name1 1 1 0 1 0 0 0 1 0 0 0 0
name2 0 0 0 1 1 0 1 0 0 0 0 0
name3 0 0 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 0 0 1 1
thanks for the code and the comments! Unfortunately I get a matrix with only 0's. I tried changingprintf "%s%d", (i == 1? "" : OFS )this command to get the 1's but can't get it to work :/
â newAWKer
May 9 at 13:18
@newAWKer, it works fine. What's yourawk --version?
â RomanPerekhrest
May 9 at 13:19
awk version 20070501
â newAWKer
May 9 at 13:22
@newAWKer, try to change($2 ~ "\<" head[i] "\>")to($2 ~ "\<" head[i] "\>" ? 1:0)
â RomanPerekhrest
May 9 at 13:31
I changed it to ($2 ~ head[i] ? 1:0) and then it worked! thanks a lot!!
â newAWKer
May 9 at 13:42
 |Â
show 3 more comments
up vote
2
down vote
Awk solution:
awk 'BEGIN
h = "1 2 3 4 5 6 7 8 11 12 20 21";
len = split(h, head);
print "tt" h
printf "%st", $1;
for (i = 1; i <= len; i++)
printf "%s%d", (i == 1? "" : OFS), ($2 ~ "\<" head[i] "\>");
print ""
' file
h = "1 2 3 4 5 6 7 8 11 12 20 21"- header linelen = split(h, head)- splithline into arrayheadwhere indices are ordered positions starting from1and values are crucial values obtained via splitting;lencontains an array sizeprint "tt" h- print the header line with leading tab charactersprintf "%st", $1;- print the 1st field$1for (i = 1; i <= len; i++)- iterate throughheaditems$2 ~ "\<" head[i] "\>"- check if the 2nd field$2contains currently accessed itemhead[i]
The output:
1 2 3 4 5 6 7 8 11 12 20 21
name1 1 1 0 1 0 0 0 1 0 0 0 0
name2 0 0 0 1 1 0 1 0 0 0 0 0
name3 0 0 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 0 0 1 1
thanks for the code and the comments! Unfortunately I get a matrix with only 0's. I tried changingprintf "%s%d", (i == 1? "" : OFS )this command to get the 1's but can't get it to work :/
â newAWKer
May 9 at 13:18
@newAWKer, it works fine. What's yourawk --version?
â RomanPerekhrest
May 9 at 13:19
awk version 20070501
â newAWKer
May 9 at 13:22
@newAWKer, try to change($2 ~ "\<" head[i] "\>")to($2 ~ "\<" head[i] "\>" ? 1:0)
â RomanPerekhrest
May 9 at 13:31
I changed it to ($2 ~ head[i] ? 1:0) and then it worked! thanks a lot!!
â newAWKer
May 9 at 13:42
 |Â
show 3 more comments
up vote
2
down vote
up vote
2
down vote
Awk solution:
awk 'BEGIN
h = "1 2 3 4 5 6 7 8 11 12 20 21";
len = split(h, head);
print "tt" h
printf "%st", $1;
for (i = 1; i <= len; i++)
printf "%s%d", (i == 1? "" : OFS), ($2 ~ "\<" head[i] "\>");
print ""
' file
h = "1 2 3 4 5 6 7 8 11 12 20 21"- header linelen = split(h, head)- splithline into arrayheadwhere indices are ordered positions starting from1and values are crucial values obtained via splitting;lencontains an array sizeprint "tt" h- print the header line with leading tab charactersprintf "%st", $1;- print the 1st field$1for (i = 1; i <= len; i++)- iterate throughheaditems$2 ~ "\<" head[i] "\>"- check if the 2nd field$2contains currently accessed itemhead[i]
The output:
1 2 3 4 5 6 7 8 11 12 20 21
name1 1 1 0 1 0 0 0 1 0 0 0 0
name2 0 0 0 1 1 0 1 0 0 0 0 0
name3 0 0 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 0 0 1 1
Awk solution:
awk 'BEGIN
h = "1 2 3 4 5 6 7 8 11 12 20 21";
len = split(h, head);
print "tt" h
printf "%st", $1;
for (i = 1; i <= len; i++)
printf "%s%d", (i == 1? "" : OFS), ($2 ~ "\<" head[i] "\>");
print ""
' file
h = "1 2 3 4 5 6 7 8 11 12 20 21"- header linelen = split(h, head)- splithline into arrayheadwhere indices are ordered positions starting from1and values are crucial values obtained via splitting;lencontains an array sizeprint "tt" h- print the header line with leading tab charactersprintf "%st", $1;- print the 1st field$1for (i = 1; i <= len; i++)- iterate throughheaditems$2 ~ "\<" head[i] "\>"- check if the 2nd field$2contains currently accessed itemhead[i]
The output:
1 2 3 4 5 6 7 8 11 12 20 21
name1 1 1 0 1 0 0 0 1 0 0 0 0
name2 0 0 0 1 1 0 1 0 0 0 0 0
name3 0 0 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 0 0 1 1
edited May 9 at 11:09
answered May 9 at 11:01
RomanPerekhrest
22.4k12144
22.4k12144
thanks for the code and the comments! Unfortunately I get a matrix with only 0's. I tried changingprintf "%s%d", (i == 1? "" : OFS )this command to get the 1's but can't get it to work :/
â newAWKer
May 9 at 13:18
@newAWKer, it works fine. What's yourawk --version?
â RomanPerekhrest
May 9 at 13:19
awk version 20070501
â newAWKer
May 9 at 13:22
@newAWKer, try to change($2 ~ "\<" head[i] "\>")to($2 ~ "\<" head[i] "\>" ? 1:0)
â RomanPerekhrest
May 9 at 13:31
I changed it to ($2 ~ head[i] ? 1:0) and then it worked! thanks a lot!!
â newAWKer
May 9 at 13:42
 |Â
show 3 more comments
thanks for the code and the comments! Unfortunately I get a matrix with only 0's. I tried changingprintf "%s%d", (i == 1? "" : OFS )this command to get the 1's but can't get it to work :/
â newAWKer
May 9 at 13:18
@newAWKer, it works fine. What's yourawk --version?
â RomanPerekhrest
May 9 at 13:19
awk version 20070501
â newAWKer
May 9 at 13:22
@newAWKer, try to change($2 ~ "\<" head[i] "\>")to($2 ~ "\<" head[i] "\>" ? 1:0)
â RomanPerekhrest
May 9 at 13:31
I changed it to ($2 ~ head[i] ? 1:0) and then it worked! thanks a lot!!
â newAWKer
May 9 at 13:42
thanks for the code and the comments! Unfortunately I get a matrix with only 0's. I tried changing
printf "%s%d", (i == 1? "" : OFS ) this command to get the 1's but can't get it to work :/â newAWKer
May 9 at 13:18
thanks for the code and the comments! Unfortunately I get a matrix with only 0's. I tried changing
printf "%s%d", (i == 1? "" : OFS ) this command to get the 1's but can't get it to work :/â newAWKer
May 9 at 13:18
@newAWKer, it works fine. What's your
awk --version?â RomanPerekhrest
May 9 at 13:19
@newAWKer, it works fine. What's your
awk --version?â RomanPerekhrest
May 9 at 13:19
awk version 20070501
â newAWKer
May 9 at 13:22
awk version 20070501
â newAWKer
May 9 at 13:22
@newAWKer, try to change
($2 ~ "\<" head[i] "\>") to ($2 ~ "\<" head[i] "\>" ? 1:0)â RomanPerekhrest
May 9 at 13:31
@newAWKer, try to change
($2 ~ "\<" head[i] "\>") to ($2 ~ "\<" head[i] "\>" ? 1:0)â RomanPerekhrest
May 9 at 13:31
I changed it to ($2 ~ head[i] ? 1:0) and then it worked! thanks a lot!!
â newAWKer
May 9 at 13:42
I changed it to ($2 ~ head[i] ? 1:0) and then it worked! thanks a lot!!
â newAWKer
May 9 at 13:42
 |Â
show 3 more comments
up vote
0
down vote
Another awk approach which generate header first by itself and then fill the matrix with 1 for each key in array head exist in current input line otherwise with 0 if doesn't exist.
We used grep to fetch only -o numbers which matches the empty string at the left edge of a number b (same as < we used in awk for both left&right sides).
awk 'NR==FNR !head[$1]++; next
printf $1; for (x in head) printf (x?FS:"") ($0 ~ "\<" x "\>") ; print ""
' <(grep -o 'b[0-9]+' infile) infile
The output is:
name1 1 1 1 0 0 1 0 0 0 0
name2 0 0 1 1 1 0 0 0 0 0
name3 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 1 1
You could generate the header and insert in first line in input file, then the awk would be easy as:
awk 'NR==1 split($0, head); next
printf $1; for (x in head) printf (x?FS:"") ($2 ~ "\<" head[x] "\>" ) ;
print "" ' infile
add a comment |Â
up vote
0
down vote
Another awk approach which generate header first by itself and then fill the matrix with 1 for each key in array head exist in current input line otherwise with 0 if doesn't exist.
We used grep to fetch only -o numbers which matches the empty string at the left edge of a number b (same as < we used in awk for both left&right sides).
awk 'NR==FNR !head[$1]++; next
printf $1; for (x in head) printf (x?FS:"") ($0 ~ "\<" x "\>") ; print ""
' <(grep -o 'b[0-9]+' infile) infile
The output is:
name1 1 1 1 0 0 1 0 0 0 0
name2 0 0 1 1 1 0 0 0 0 0
name3 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 1 1
You could generate the header and insert in first line in input file, then the awk would be easy as:
awk 'NR==1 split($0, head); next
printf $1; for (x in head) printf (x?FS:"") ($2 ~ "\<" head[x] "\>" ) ;
print "" ' infile
add a comment |Â
up vote
0
down vote
up vote
0
down vote
Another awk approach which generate header first by itself and then fill the matrix with 1 for each key in array head exist in current input line otherwise with 0 if doesn't exist.
We used grep to fetch only -o numbers which matches the empty string at the left edge of a number b (same as < we used in awk for both left&right sides).
awk 'NR==FNR !head[$1]++; next
printf $1; for (x in head) printf (x?FS:"") ($0 ~ "\<" x "\>") ; print ""
' <(grep -o 'b[0-9]+' infile) infile
The output is:
name1 1 1 1 0 0 1 0 0 0 0
name2 0 0 1 1 1 0 0 0 0 0
name3 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 1 1
You could generate the header and insert in first line in input file, then the awk would be easy as:
awk 'NR==1 split($0, head); next
printf $1; for (x in head) printf (x?FS:"") ($2 ~ "\<" head[x] "\>" ) ;
print "" ' infile
Another awk approach which generate header first by itself and then fill the matrix with 1 for each key in array head exist in current input line otherwise with 0 if doesn't exist.
We used grep to fetch only -o numbers which matches the empty string at the left edge of a number b (same as < we used in awk for both left&right sides).
awk 'NR==FNR !head[$1]++; next
printf $1; for (x in head) printf (x?FS:"") ($0 ~ "\<" x "\>") ; print ""
' <(grep -o 'b[0-9]+' infile) infile
The output is:
name1 1 1 1 0 0 1 0 0 0 0
name2 0 0 1 1 1 0 0 0 0 0
name3 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 1 1
You could generate the header and insert in first line in input file, then the awk would be easy as:
awk 'NR==1 split($0, head); next
printf $1; for (x in head) printf (x?FS:"") ($2 ~ "\<" head[x] "\>" ) ;
print "" ' infile
edited May 10 at 10:56
answered May 10 at 10:50
ñÃÂsýù÷
14.8k82462
14.8k82462
add a comment |Â
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f442728%2fcolumns-to-data-matrix-awk%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
previous examplesplease click edit and add that code/question-link to the question... it'd also help if you add what changes you attempted.. that'd show your research efforts.. regarding expected output shown, on what basis column is added? there is no1in column under3.. is1 2 3 4 5 6 7 8 11 12 20 21known beforehand?â Sundeep
May 9 at 10:52