awk manipulation of a file

I have a file that is the output of several piped commands, something like this:

command1 input.txt| command2 | command3 | input file

The file is tab-separated

After command 3, my input file looks like this

chr6 116732135 116741866 116732135 116732368 116741505 116741866 + 0.79 0.51 0.97 0.77 0.48 0.97 0.02 0.37 'chr6:116732136-116732368:+@chr6:116741506-116741866:+.A.withRI','chr6:116732136-116732368:+@chr6:116741506-116741866:+.B.up_chr6:116732136-116732368:+@chr6:116741506-116741866:+.B.dn' (0,0):10,(1,0):147,(1,1):1 0:148 (0,0):36,(1,0):161,(1,1):3 0:163,1:1 chr6 + 116732136,116732136 116741866,116741866 gene_id "ENSG00000196911.10"; transcript_id "ENST00000368564.6"; gene_type "protein_coding"; gene_name "KPNA5"; transcript_type "protein_coding"; transcript_name "KPNA5-202"; level 2; protein_id "ENSP00000357552.1"; transcript_support_level "1"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS5111.1"; havana_gene "OTTHUMG00000015448.4"; havana_transcript "OTTHUMT00000041967.2";
chr6 116732135 116741866 116732135 116732368 116741505 116741866 + 0.79 0.51 0.97 0.77 0.48 0.97 0.02 0.37 'chr6:116732136-116732368:+@chr6:116741506-116741866:+.A.withRI','chr6:116732136-116732368:+@chr6:116741506-116741866:+.B.up_chr6:116732136-116732368:+@chr6:116741506-116741866:+.B.dn' (0,0):10,(1,0):147,(1,1):1 0:148 (0,0):36,(1,0):161,(1,1):3 0:163,1:1 chr6 + 116732136,116732136 116741866,116741866 gene_id "ENSG00000196911.10"; transcript_id "ENST00000356348.6"; gene_type "protein_coding"; gene_name "KPNA5"; transcript_type "protein_coding"; transcript_name "KPNA5-201"; level 2; protein_id "ENSP00000348704.1"; transcript_support_level "1"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS5111.1"; havana_gene "OTTHUMG00000015448.4"; havana_transcript "OTTHUMT00000041969.2";

After command3, I used awk to split the last column on ;
This is the command:

command1 input.txt | command2 | command3 | awk -F "\t" -v OFS="\t" '{split($NF,a,";"); $NF=""; print $0,a[1],a[4]}'

I wanted to split the last field of the output of command3, then print every field except the last one, followed by the split pieces a[1] and a[4]. However, this adds an extra tab between columns 1-25 and a[1],a[4]. How can I avoid that?

Thanks

This is the output:

chr6 116732135 116741866 116732135 116732368 116741505 116741866 + 0.79 0.51 0.97 0.77 0.48 0.97 0.02 0.37 'chr6:116732136-116732368:+@chr6:116741506-116741866:+.A.withRI','chr6:116732136-116732368:+@chr6:116741506-116741866:+.B.up_chr6:116732136-116732368:+@chr6:116741506-116741866:+.B.dn' (0,0):10,(1,0):147,(1,1):1 0:148 (0,0):36,(1,0):161,(1,1):3 0:163,1:1 chr6 + 116732136,116732136 116741866,116741866 gene_id "ENSG00000196911.10" gene_name "KPNA5"
chr6 116732135 116741866 116732135 116732368 116741505 116741866 + 0.79 0.51 0.97 0.77 0.48 0.97 0.02 0.37 'chr6:116732136-116732368:+@chr6:116741506-116741866:+.A.withRI','chr6:116732136-116732368:+@chr6:116741506-116741866:+.B.up_chr6:116732136-116732368:+@chr6:116741506-116741866:+.B.dn' (0,0):10,(1,0):147,(1,1):1 0:148 (0,0):36,(1,0):161,(1,1):3 0:163,1:1 chr6 + 116732136,116732136 116741866,116741866 gene_id "ENSG00000196911.10" gene_name "KPNA5"

asked Aug 14 at 23:23

user3138373

1 Answer (accepted)

So, given

$ printf 'foo\tbar\ta;b;c;d' |
    awk -F "\t" -v OFS="\t" '{split($NF,a,";"); $NF=""; print $0,a[1],a[4]}' | cat -A
foo^Ibar^I^Ia^Id$

(where I'm using cat -A to display the tabs as ^I for ease of visualization) you want to eliminate the double tab?

If so, one way would be to decrement NF instead of assigning the empty string to $NF, which empties the last field but still leaves its separator at the end of $0:

$ printf 'foo\tbar\ta;b;c;d' |
    awk -F "\t" -v OFS="\t" '{split($NF,a,";"); NF--; print $0,a[1],a[4]}' | cat -A
foo^Ibar^Ia^Id$
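
The NF-- form relies on awk rebuilding $0 when NF is assigned. gawk does this, but the GNU awk manual cautions that some awk versions do not. The following is only a sketch of one alternative that leaves NF alone and strips the last tab-separated field from $0 with sub(). The variable name out is mine, not part of the question:

```shell
# Sketch: strip the last tab-separated field with sub() rather than changing NF.
out=$(printf 'foo\tbar\ta;b;c;d\n' |
  awk -F '\t' -v OFS='\t' '{
    split($NF, a, ";")       # a[1] is "a" and a[4] is "d" for this sample
    sub(/\t[^\t]*$/, "")     # delete the final tab and the field after it from $0
    print $0, a[1], a[4]
  }')
printf '%s\n' "$out" | cat -A
```

On the sample line this should show foo^Ibar^Ia^Id$, the same as the NF-- version. A line with only one field contains no tab, so sub() would leave it unchanged.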

Another way would be to concatenate $0 and a[1] instead of printing them as separate fields; you can do that by removing the , between them:

$ printf 'foo\tbar\ta;b;c;d' |
    awk -F "\t" -v OFS="\t" '{split($NF,a,";"); $NF=""; print $0 a[1],a[4]}' | cat -A
foo^Ibar^Ia^Id$
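
On the attribute column shown in the question, splitting on a bare ; leaves a leading space on every piece after the first, and a[4] is gene_name only because it is the fourth attribute. As a rough sketch, assuming the attribute order could vary between lines, the pieces can be picked by key instead. The shortened sample line and the variable names (line, out, id, name) are illustrative, not taken from the real pipeline:

```shell
# Shortened, hypothetical sample: two leading columns plus the attribute column.
line=$(printf 'chr6\t+\tgene_id "ENSG00000196911.10"; transcript_id "ENST00000368564.6"; gene_type "protein_coding"; gene_name "KPNA5";')
out=$(printf '%s\n' "$line" |
  awk -F '\t' -v OFS='\t' '{
    n = split($NF, a, /; */)           # split on ";" plus any spaces that follow it
    id = name = ""                     # reset for every input line
    for (i = 1; i <= n; i++) {
      if (a[i] ~ /^gene_id /)   id = a[i]
      if (a[i] ~ /^gene_name /) name = a[i]
    }
    $NF = ""                           # empty the attribute column; $0 now ends in a tab
    print $0 id, name                  # concatenate id onto $0, so no second tab is added
  }')
printf '%s\n' "$out" | cat -A
```

This reuses the concatenation trick above, so the only tabs in the output are the original separators plus one between the two selected attributes.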

answered Aug 14 at 23:41

steeldriver
