columns to data matrix awk
Clash Royale CLAN TAG#URR8PPP
up vote
1
down vote
favorite
I'm looking for an awk
solution to get from this table (with x number of names in column 1, and different lengths of column 2 which is ";" separated):
dataframe.txt:
name1 1;2;4;8
name2 4;5;7
name3 8
name4 11;12
namex 20;21
to this matrix with present 1
and non present 0
:
matrix.txt:
1 2 4 5 7 8 11 12 20 21
name1 1 1 1 0 0 1 0 0 0 0
name2 0 0 1 1 1 0 0 0 0 0
name3 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 1 1
text-processing awk
add a comment |Â
up vote
1
down vote
favorite
I'm looking for an awk
solution to get from this table (with x number of names in column 1, and different lengths of column 2 which is ";" separated):
dataframe.txt:
name1 1;2;4;8
name2 4;5;7
name3 8
name4 11;12
namex 20;21
to this matrix with present 1
and non present 0
:
matrix.txt:
1 2 4 5 7 8 11 12 20 21
name1 1 1 1 0 0 1 0 0 0 0
name2 0 0 1 1 1 0 0 0 0 0
name3 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 1 1
text-processing awk
previous examples
please click edit and add that code/question-link to the question... it'd also help if you add what changes you attempted.. that'd show your research efforts.. regarding expected output shown, on what basis column is added? there is no1
in column under3
.. is1 2 3 4 5 6 7 8 11 12 20 21
known beforehand?
â Sundeep
May 9 at 10:52
add a comment |Â
up vote
1
down vote
favorite
up vote
1
down vote
favorite
I'm looking for an awk
solution to get from this table (with x number of names in column 1, and different lengths of column 2 which is ";" separated):
dataframe.txt:
name1 1;2;4;8
name2 4;5;7
name3 8
name4 11;12
namex 20;21
to this matrix with present 1
and non present 0
:
matrix.txt:
1 2 4 5 7 8 11 12 20 21
name1 1 1 1 0 0 1 0 0 0 0
name2 0 0 1 1 1 0 0 0 0 0
name3 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 1 1
text-processing awk
I'm looking for an awk
solution to get from this table (with x number of names in column 1, and different lengths of column 2 which is ";" separated):
dataframe.txt:
name1 1;2;4;8
name2 4;5;7
name3 8
name4 11;12
namex 20;21
to this matrix with present 1
and non present 0
:
matrix.txt:
1 2 4 5 7 8 11 12 20 21
name1 1 1 1 0 0 1 0 0 0 0
name2 0 0 1 1 1 0 0 0 0 0
name3 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 1 1
text-processing awk
edited May 10 at 12:19
ñÃÂsýù÷
14.8k82462
14.8k82462
asked May 9 at 10:43
newAWKer
62
62
previous examples
please click edit and add that code/question-link to the question... it'd also help if you add what changes you attempted.. that'd show your research efforts.. regarding expected output shown, on what basis column is added? there is no1
in column under3
.. is1 2 3 4 5 6 7 8 11 12 20 21
known beforehand?
â Sundeep
May 9 at 10:52
add a comment |Â
previous examples
please click edit and add that code/question-link to the question... it'd also help if you add what changes you attempted.. that'd show your research efforts.. regarding expected output shown, on what basis column is added? there is no1
in column under3
.. is1 2 3 4 5 6 7 8 11 12 20 21
known beforehand?
â Sundeep
May 9 at 10:52
previous examples
please click edit and add that code/question-link to the question... it'd also help if you add what changes you attempted.. that'd show your research efforts.. regarding expected output shown, on what basis column is added? there is no 1
in column under 3
.. is 1 2 3 4 5 6 7 8 11 12 20 21
known beforehand?â Sundeep
May 9 at 10:52
previous examples
please click edit and add that code/question-link to the question... it'd also help if you add what changes you attempted.. that'd show your research efforts.. regarding expected output shown, on what basis column is added? there is no 1
in column under 3
.. is 1 2 3 4 5 6 7 8 11 12 20 21
known beforehand?â Sundeep
May 9 at 10:52
add a comment |Â
2 Answers
2
active
oldest
votes
up vote
2
down vote
Awk
solution:
awk 'BEGIN
h = "1 2 3 4 5 6 7 8 11 12 20 21";
len = split(h, head);
print "tt" h
printf "%st", $1;
for (i = 1; i <= len; i++)
printf "%s%d", (i == 1? "" : OFS), ($2 ~ "\<" head[i] "\>");
print ""
' file
h = "1 2 3 4 5 6 7 8 11 12 20 21"
- header linelen = split(h, head)
- splith
line into arrayhead
where indices are ordered positions starting from1
and values are crucial values obtained via splitting;len
contains an array sizeprint "tt" h
- print the header line with leading tab charactersprintf "%st", $1;
- print the 1st field$1
for (i = 1; i <= len; i++)
- iterate throughhead
items$2 ~ "\<" head[i] "\>"
- check if the 2nd field$2
contains currently accessed itemhead[i]
The output:
1 2 3 4 5 6 7 8 11 12 20 21
name1 1 1 0 1 0 0 0 1 0 0 0 0
name2 0 0 0 1 1 0 1 0 0 0 0 0
name3 0 0 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 0 0 1 1
thanks for the code and the comments! Unfortunately I get a matrix with only 0's. I tried changingprintf "%s%d", (i == 1? "" : OFS )
this command to get the 1's but can't get it to work :/
â newAWKer
May 9 at 13:18
@newAWKer, it works fine. What's yourawk --version
?
â RomanPerekhrest
May 9 at 13:19
awk version 20070501
â newAWKer
May 9 at 13:22
@newAWKer, try to change($2 ~ "\<" head[i] "\>")
to($2 ~ "\<" head[i] "\>" ? 1:0)
â RomanPerekhrest
May 9 at 13:31
I changed it to ($2 ~ head[i] ? 1:0) and then it worked! thanks a lot!!
â newAWKer
May 9 at 13:42
 |Â
show 3 more comments
up vote
0
down vote
Another awk
approach which generate header first by itself and then fill the matrix with 1 for each key in array head
exist in current input line otherwise with 0 if doesn't exist.
We used grep
to fetch only -o
numbers which matches the empty string at the left edge of a number b
(same as <
we used in awk
for both left&right sides).
awk 'NR==FNR !head[$1]++; next
printf $1; for (x in head) printf (x?FS:"") ($0 ~ "\<" x "\>") ; print ""
' <(grep -o 'b[0-9]+' infile) infile
The output is:
name1 1 1 1 0 0 1 0 0 0 0
name2 0 0 1 1 1 0 0 0 0 0
name3 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 1 1
You could generate the header and insert in first line in input file, then the awk
would be easy as:
awk 'NR==1 split($0, head); next
printf $1; for (x in head) printf (x?FS:"") ($2 ~ "\<" head[x] "\>" ) ;
print "" ' infile
add a comment |Â
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
2
down vote
Awk
solution:
awk 'BEGIN
h = "1 2 3 4 5 6 7 8 11 12 20 21";
len = split(h, head);
print "tt" h
printf "%st", $1;
for (i = 1; i <= len; i++)
printf "%s%d", (i == 1? "" : OFS), ($2 ~ "\<" head[i] "\>");
print ""
' file
h = "1 2 3 4 5 6 7 8 11 12 20 21"
- header linelen = split(h, head)
- splith
line into arrayhead
where indices are ordered positions starting from1
and values are crucial values obtained via splitting;len
contains an array sizeprint "tt" h
- print the header line with leading tab charactersprintf "%st", $1;
- print the 1st field$1
for (i = 1; i <= len; i++)
- iterate throughhead
items$2 ~ "\<" head[i] "\>"
- check if the 2nd field$2
contains currently accessed itemhead[i]
The output:
1 2 3 4 5 6 7 8 11 12 20 21
name1 1 1 0 1 0 0 0 1 0 0 0 0
name2 0 0 0 1 1 0 1 0 0 0 0 0
name3 0 0 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 0 0 1 1
thanks for the code and the comments! Unfortunately I get a matrix with only 0's. I tried changingprintf "%s%d", (i == 1? "" : OFS )
this command to get the 1's but can't get it to work :/
â newAWKer
May 9 at 13:18
@newAWKer, it works fine. What's yourawk --version
?
â RomanPerekhrest
May 9 at 13:19
awk version 20070501
â newAWKer
May 9 at 13:22
@newAWKer, try to change($2 ~ "\<" head[i] "\>")
to($2 ~ "\<" head[i] "\>" ? 1:0)
â RomanPerekhrest
May 9 at 13:31
I changed it to ($2 ~ head[i] ? 1:0) and then it worked! thanks a lot!!
â newAWKer
May 9 at 13:42
 |Â
show 3 more comments
up vote
2
down vote
Awk
solution:
awk 'BEGIN
h = "1 2 3 4 5 6 7 8 11 12 20 21";
len = split(h, head);
print "tt" h
printf "%st", $1;
for (i = 1; i <= len; i++)
printf "%s%d", (i == 1? "" : OFS), ($2 ~ "\<" head[i] "\>");
print ""
' file
h = "1 2 3 4 5 6 7 8 11 12 20 21"
- header linelen = split(h, head)
- splith
line into arrayhead
where indices are ordered positions starting from1
and values are crucial values obtained via splitting;len
contains an array sizeprint "tt" h
- print the header line with leading tab charactersprintf "%st", $1;
- print the 1st field$1
for (i = 1; i <= len; i++)
- iterate throughhead
items$2 ~ "\<" head[i] "\>"
- check if the 2nd field$2
contains currently accessed itemhead[i]
The output:
1 2 3 4 5 6 7 8 11 12 20 21
name1 1 1 0 1 0 0 0 1 0 0 0 0
name2 0 0 0 1 1 0 1 0 0 0 0 0
name3 0 0 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 0 0 1 1
thanks for the code and the comments! Unfortunately I get a matrix with only 0's. I tried changingprintf "%s%d", (i == 1? "" : OFS )
this command to get the 1's but can't get it to work :/
â newAWKer
May 9 at 13:18
@newAWKer, it works fine. What's yourawk --version
?
â RomanPerekhrest
May 9 at 13:19
awk version 20070501
â newAWKer
May 9 at 13:22
@newAWKer, try to change($2 ~ "\<" head[i] "\>")
to($2 ~ "\<" head[i] "\>" ? 1:0)
â RomanPerekhrest
May 9 at 13:31
I changed it to ($2 ~ head[i] ? 1:0) and then it worked! thanks a lot!!
â newAWKer
May 9 at 13:42
 |Â
show 3 more comments
up vote
2
down vote
up vote
2
down vote
Awk
solution:
awk 'BEGIN
h = "1 2 3 4 5 6 7 8 11 12 20 21";
len = split(h, head);
print "tt" h
printf "%st", $1;
for (i = 1; i <= len; i++)
printf "%s%d", (i == 1? "" : OFS), ($2 ~ "\<" head[i] "\>");
print ""
' file
h = "1 2 3 4 5 6 7 8 11 12 20 21"
- header linelen = split(h, head)
- splith
line into arrayhead
where indices are ordered positions starting from1
and values are crucial values obtained via splitting;len
contains an array sizeprint "tt" h
- print the header line with leading tab charactersprintf "%st", $1;
- print the 1st field$1
for (i = 1; i <= len; i++)
- iterate throughhead
items$2 ~ "\<" head[i] "\>"
- check if the 2nd field$2
contains currently accessed itemhead[i]
The output:
1 2 3 4 5 6 7 8 11 12 20 21
name1 1 1 0 1 0 0 0 1 0 0 0 0
name2 0 0 0 1 1 0 1 0 0 0 0 0
name3 0 0 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 0 0 1 1
Awk
solution:
awk 'BEGIN
h = "1 2 3 4 5 6 7 8 11 12 20 21";
len = split(h, head);
print "tt" h
printf "%st", $1;
for (i = 1; i <= len; i++)
printf "%s%d", (i == 1? "" : OFS), ($2 ~ "\<" head[i] "\>");
print ""
' file
h = "1 2 3 4 5 6 7 8 11 12 20 21"
- header linelen = split(h, head)
- splith
line into arrayhead
where indices are ordered positions starting from1
and values are crucial values obtained via splitting;len
contains an array sizeprint "tt" h
- print the header line with leading tab charactersprintf "%st", $1;
- print the 1st field$1
for (i = 1; i <= len; i++)
- iterate throughhead
items$2 ~ "\<" head[i] "\>"
- check if the 2nd field$2
contains currently accessed itemhead[i]
The output:
1 2 3 4 5 6 7 8 11 12 20 21
name1 1 1 0 1 0 0 0 1 0 0 0 0
name2 0 0 0 1 1 0 1 0 0 0 0 0
name3 0 0 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 0 0 1 1
edited May 9 at 11:09
answered May 9 at 11:01
RomanPerekhrest
22.4k12144
22.4k12144
thanks for the code and the comments! Unfortunately I get a matrix with only 0's. I tried changingprintf "%s%d", (i == 1? "" : OFS )
this command to get the 1's but can't get it to work :/
â newAWKer
May 9 at 13:18
@newAWKer, it works fine. What's yourawk --version
?
â RomanPerekhrest
May 9 at 13:19
awk version 20070501
â newAWKer
May 9 at 13:22
@newAWKer, try to change($2 ~ "\<" head[i] "\>")
to($2 ~ "\<" head[i] "\>" ? 1:0)
â RomanPerekhrest
May 9 at 13:31
I changed it to ($2 ~ head[i] ? 1:0) and then it worked! thanks a lot!!
â newAWKer
May 9 at 13:42
 |Â
show 3 more comments
thanks for the code and the comments! Unfortunately I get a matrix with only 0's. I tried changingprintf "%s%d", (i == 1? "" : OFS )
this command to get the 1's but can't get it to work :/
â newAWKer
May 9 at 13:18
@newAWKer, it works fine. What's yourawk --version
?
â RomanPerekhrest
May 9 at 13:19
awk version 20070501
â newAWKer
May 9 at 13:22
@newAWKer, try to change($2 ~ "\<" head[i] "\>")
to($2 ~ "\<" head[i] "\>" ? 1:0)
â RomanPerekhrest
May 9 at 13:31
I changed it to ($2 ~ head[i] ? 1:0) and then it worked! thanks a lot!!
â newAWKer
May 9 at 13:42
thanks for the code and the comments! Unfortunately I get a matrix with only 0's. I tried changing
printf "%s%d", (i == 1? "" : OFS )
this command to get the 1's but can't get it to work :/â newAWKer
May 9 at 13:18
thanks for the code and the comments! Unfortunately I get a matrix with only 0's. I tried changing
printf "%s%d", (i == 1? "" : OFS )
this command to get the 1's but can't get it to work :/â newAWKer
May 9 at 13:18
@newAWKer, it works fine. What's your
awk --version
?â RomanPerekhrest
May 9 at 13:19
@newAWKer, it works fine. What's your
awk --version
?â RomanPerekhrest
May 9 at 13:19
awk version 20070501
â newAWKer
May 9 at 13:22
awk version 20070501
â newAWKer
May 9 at 13:22
@newAWKer, try to change
($2 ~ "\<" head[i] "\>")
to ($2 ~ "\<" head[i] "\>" ? 1:0)
â RomanPerekhrest
May 9 at 13:31
@newAWKer, try to change
($2 ~ "\<" head[i] "\>")
to ($2 ~ "\<" head[i] "\>" ? 1:0)
â RomanPerekhrest
May 9 at 13:31
I changed it to ($2 ~ head[i] ? 1:0) and then it worked! thanks a lot!!
â newAWKer
May 9 at 13:42
I changed it to ($2 ~ head[i] ? 1:0) and then it worked! thanks a lot!!
â newAWKer
May 9 at 13:42
 |Â
show 3 more comments
up vote
0
down vote
Another awk
approach which generate header first by itself and then fill the matrix with 1 for each key in array head
exist in current input line otherwise with 0 if doesn't exist.
We used grep
to fetch only -o
numbers which matches the empty string at the left edge of a number b
(same as <
we used in awk
for both left&right sides).
awk 'NR==FNR !head[$1]++; next
printf $1; for (x in head) printf (x?FS:"") ($0 ~ "\<" x "\>") ; print ""
' <(grep -o 'b[0-9]+' infile) infile
The output is:
name1 1 1 1 0 0 1 0 0 0 0
name2 0 0 1 1 1 0 0 0 0 0
name3 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 1 1
You could generate the header and insert in first line in input file, then the awk
would be easy as:
awk 'NR==1 split($0, head); next
printf $1; for (x in head) printf (x?FS:"") ($2 ~ "\<" head[x] "\>" ) ;
print "" ' infile
add a comment |Â
up vote
0
down vote
Another awk
approach which generate header first by itself and then fill the matrix with 1 for each key in array head
exist in current input line otherwise with 0 if doesn't exist.
We used grep
to fetch only -o
numbers which matches the empty string at the left edge of a number b
(same as <
we used in awk
for both left&right sides).
awk 'NR==FNR !head[$1]++; next
printf $1; for (x in head) printf (x?FS:"") ($0 ~ "\<" x "\>") ; print ""
' <(grep -o 'b[0-9]+' infile) infile
The output is:
name1 1 1 1 0 0 1 0 0 0 0
name2 0 0 1 1 1 0 0 0 0 0
name3 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 1 1
You could generate the header and insert in first line in input file, then the awk
would be easy as:
awk 'NR==1 split($0, head); next
printf $1; for (x in head) printf (x?FS:"") ($2 ~ "\<" head[x] "\>" ) ;
print "" ' infile
add a comment |Â
up vote
0
down vote
up vote
0
down vote
Another awk
approach which generate header first by itself and then fill the matrix with 1 for each key in array head
exist in current input line otherwise with 0 if doesn't exist.
We used grep
to fetch only -o
numbers which matches the empty string at the left edge of a number b
(same as <
we used in awk
for both left&right sides).
awk 'NR==FNR !head[$1]++; next
printf $1; for (x in head) printf (x?FS:"") ($0 ~ "\<" x "\>") ; print ""
' <(grep -o 'b[0-9]+' infile) infile
The output is:
name1 1 1 1 0 0 1 0 0 0 0
name2 0 0 1 1 1 0 0 0 0 0
name3 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 1 1
You could generate the header and insert in first line in input file, then the awk
would be easy as:
awk 'NR==1 split($0, head); next
printf $1; for (x in head) printf (x?FS:"") ($2 ~ "\<" head[x] "\>" ) ;
print "" ' infile
Another awk
approach which generate header first by itself and then fill the matrix with 1 for each key in array head
exist in current input line otherwise with 0 if doesn't exist.
We used grep
to fetch only -o
numbers which matches the empty string at the left edge of a number b
(same as <
we used in awk
for both left&right sides).
awk 'NR==FNR !head[$1]++; next
printf $1; for (x in head) printf (x?FS:"") ($0 ~ "\<" x "\>") ; print ""
' <(grep -o 'b[0-9]+' infile) infile
The output is:
name1 1 1 1 0 0 1 0 0 0 0
name2 0 0 1 1 1 0 0 0 0 0
name3 0 0 0 0 0 1 0 0 0 0
name4 0 0 0 0 0 0 1 1 0 0
namex 0 0 0 0 0 0 0 0 1 1
You could generate the header and insert in first line in input file, then the awk
would be easy as:
awk 'NR==1 split($0, head); next
printf $1; for (x in head) printf (x?FS:"") ($2 ~ "\<" head[x] "\>" ) ;
print "" ' infile
edited May 10 at 10:56
answered May 10 at 10:50
ñÃÂsýù÷
14.8k82462
14.8k82462
add a comment |Â
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f442728%2fcolumns-to-data-matrix-awk%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
previous examples
please click edit and add that code/question-link to the question... it'd also help if you add what changes you attempted.. that'd show your research efforts.. regarding expected output shown, on what basis column is added? there is no1
in column under3
.. is1 2 3 4 5 6 7 8 11 12 20 21
known beforehand?â Sundeep
May 9 at 10:52