awk manipulation of a file

I have a file which is the output of several piped commands, something like this:
command1 input.txt| command2 | command3 | input file
The file is tab-separated.
After command3, my input file looks like this:
chr6 116732135 116741866 116732135 116732368 116741505 116741866 + 0.79 0.51 0.97 0.77 0.48 0.97 0.02 0.37 'chr6:116732136-116732368:+@chr6:116741506-116741866:+.A.withRI','chr6:116732136-116732368:+@chr6:116741506-116741866:+.B.up_chr6:116732136-116732368:+@chr6:116741506-116741866:+.B.dn' (0,0):10,(1,0):147,(1,1):1 0:148 (0,0):36,(1,0):161,(1,1):3 0:163,1:1 chr6 + 116732136,116732136 116741866,116741866 gene_id "ENSG00000196911.10"; transcript_id "ENST00000368564.6"; gene_type "protein_coding"; gene_name "KPNA5"; transcript_type "protein_coding"; transcript_name "KPNA5-202"; level 2; protein_id "ENSP00000357552.1"; transcript_support_level "1"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS5111.1"; havana_gene "OTTHUMG00000015448.4"; havana_transcript "OTTHUMT00000041967.2";
chr6 116732135 116741866 116732135 116732368 116741505 116741866 + 0.79 0.51 0.97 0.77 0.48 0.97 0.02 0.37 'chr6:116732136-116732368:+@chr6:116741506-116741866:+.A.withRI','chr6:116732136-116732368:+@chr6:116741506-116741866:+.B.up_chr6:116732136-116732368:+@chr6:116741506-116741866:+.B.dn' (0,0):10,(1,0):147,(1,1):1 0:148 (0,0):36,(1,0):161,(1,1):3 0:163,1:1 chr6 + 116732136,116732136 116741866,116741866 gene_id "ENSG00000196911.10"; transcript_id "ENST00000356348.6"; gene_type "protein_coding"; gene_name "KPNA5"; transcript_type "protein_coding"; transcript_name "KPNA5-201"; level 2; protein_id "ENSP00000348704.1"; transcript_support_level "1"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS5111.1"; havana_gene "OTTHUMG00000015448.4"; havana_transcript "OTTHUMT00000041969.2";
After command3, I used awk to split the last column on ;
This is the command:
command1 input.txt | command2 | command3 | awk -F "\t" -v OFS="\t" '{split($NF,a,";"); $NF=""; print $0,a[1],a[4]}'
I wanted to split the last field of the output of command3, then print all the fields except the last one, followed by a[1] and a[4] from the split. But this adds an extra tab between columns 1-25 and a[1],a[4]. How can I avoid that?
Thanks.
This is the output:
chr6 116732135 116741866 116732135 116732368 116741505 116741866 + 0.79 0.51 0.97 0.77 0.48 0.97 0.02 0.37 'chr6:116732136-116732368:+@chr6:116741506-116741866:+.A.withRI','chr6:116732136-116732368:+@chr6:116741506-116741866:+.B.up_chr6:116732136-116732368:+@chr6:116741506-116741866:+.B.dn' (0,0):10,(1,0):147,(1,1):1 0:148 (0,0):36,(1,0):161,(1,1):3 0:163,1:1 chr6 + 116732136,116732136 116741866,116741866 gene_id "ENSG00000196911.10" gene_name "KPNA5"
chr6 116732135 116741866 116732135 116732368 116741505 116741866 + 0.79 0.51 0.97 0.77 0.48 0.97 0.02 0.37 'chr6:116732136-116732368:+@chr6:116741506-116741866:+.A.withRI','chr6:116732136-116732368:+@chr6:116741506-116741866:+.B.up_chr6:116732136-116732368:+@chr6:116741506-116741866:+.B.dn' (0,0):10,(1,0):147,(1,1):1 0:148 (0,0):36,(1,0):161,(1,1):3 0:163,1:1 chr6 + 116732136,116732136 116741866,116741866 gene_id "ENSG00000196911.10" gene_name "KPNA5"
asked Aug 14 at 23:23
user3138373
1 Answer
So, given
$ printf 'foo\tbar\ta;b;c;d' |
    awk -F "\t" -v OFS="\t" '{split($NF,a,";"); $NF=""; print $0,a[1],a[4]}' | cat -A
foo^Ibar^I^Ia^Id$
(where I'm using cat -A to display the tabs as ^I for ease of visualization) you want to eliminate the double tab?
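If so, one way would be to decrement NF instead of assigning the empty string to $NF. Assigning "" keeps the last field in the record as an empty field, so its separator is still printed; decrementing NF drops the field altogether:
$ printf 'foo\tbar\ta;b;c;d' |
    awk -F "\t" -v OFS="\t" '{split($NF,a,";"); NF--; print $0,a[1],a[4]}' | cat -A
foo^Ibar^Ia^Id$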
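As far as I know, not every awk rebuilds $0 when NF is decremented (the GNU awk manual has a caution about this), so here is a rough sketch of the same idea that avoids touching NF. It assumes the same toy input as above and uses sub() to cut the last tab-separated field off $0; the variable name result_sub is only for illustration.

```shell
# Sketch: drop the last tab-separated field with sub() rather than NF--.
result_sub=$(printf 'foo\tbar\ta;b;c;d\n' |
  awk -F '\t' -v OFS='\t' '{
    split($NF, a, ";")       # a[1]="a", a[2]="b", a[3]="c", a[4]="d"
    sub(/\t[^\t]*$/, "")     # remove the last field and the tab before it from $0
    print $0, a[1], a[4]     # remaining fields, then the two split pieces
  }')
printf '%s\n' "$result_sub"
```

The split has to happen before the sub(), because modifying $0 re-splits the record and the original last field is gone afterwards.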
Another way would be to concatenate the strings instead of printing them as fields - you can do that by removing the , between them:
$ printf 'foo\tbar\ta;b;c;d' |
    awk -F "\t" -v OFS="\t" '{split($NF,a,";"); $NF=""; print $0 a[1],a[4]}' | cat -A
foo^Ibar^Ia^Id$
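A related variation, sketched here on a made-up, shortened row in the style of the question's data (the sample values and the variable name result_assign are assumptions, not the real file): instead of emptying $NF, overwrite it with a[1], so no empty field is created in the first place. Splitting on ; leaves a leading space on the later pieces, so the sketch trims it from a[4] as well.

```shell
# Sketch: reuse the last field for a[1] instead of blanking it.
result_assign=$(printf 'chr6\t+\tgene_id "G1"; transcript_id "T1"; gene_type "protein_coding"; gene_name "KPNA5";\n' |
  awk -F '\t' -v OFS='\t' '{
    split($NF, a, ";")       # a[1] is the gene_id piece, a[4] the gene_name piece
    sub(/^ +/, "", a[4])     # strip the leading space left over from "; "
    $NF = a[1]               # overwrite the last field, so there is no empty column
    print $0, a[4]
  }')
printf '%s\n' "$result_assign"
```

This still picks the pieces by position, like the original command, so it only works if gene_id and gene_name are always the first and fourth attributes.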
answered Aug 14 at 23:41
steeldriver