Removing spaces from specific header fields to extract the correct columns from a large file
This is probably a simple problem, so please bear with me. I have a large tab-delimited file whose first lines look like this. (In this excerpt the Flags column is empty, but it is not empty in the full file, so it has to be kept. I suspect that this column, together with the following columns whose names contain spaces, is causing the problem.)
countChromosome Start Stop Ref/Alt Identifier Flags Read Depth (DP) Allele Counts Allele Frequencies # Alleles # Het # HomoVar AMR - Allele Counts AMR - Allele Frequencies AMR - # Alleles
1 10177 10177 -/C rs367896724 103152 2130 0.425319 5008 1490 320 250 0.360231 694
1 10235 10235 -/A rs540431307 78015 6 0.00119808 5008 6 0 1 0.00144092 694
1 10352 10352 -/A rs555500075 88915 2191 0.4375 5008 2025 83 285 0.410663 694
1 10504 10505 A/T rs548419688 9632 1 0.000199681 5008 1 0 0 0 694
1 10505 10506 C/G rs568405545 9676 1 0.000199681 5008 1 0 0 0 694
1 10510 10511 G/A rs534229142 9869 1 0.000199681 5008 1 0 1 0.00144092 694
1 10538 10539 C/A rs537182016 9203 3 0.000599042 5008 3 0 1 0.00144092 694
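One way to check whether the spaces in the header names and the empty Flags field really are the cause is to count the fields (NF) that awk sees under its default splitting and under tab splitting. The sketch below rebuilds the header and one of the rows above with printf; it assumes the columns are separated by single tabs and that Flags is empty in that row, and the function name make_sample is made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: writes the header and one data row from the question.
# Assumption: fields are separated by single tabs and Flags (column 6) is empty.
make_sample() {
  printf '%s\t' countChromosome Start Stop Ref/Alt Identifier Flags \
    'Read Depth (DP)' 'Allele Counts' 'Allele Frequencies' '# Alleles' \
    '# Het' '# HomoVar' 'AMR - Allele Counts' 'AMR - Allele Frequencies'
  printf '%s\n' 'AMR - # Alleles'
  printf '1\t10352\t10352\t-/A\trs555500075\t\t88915\t2191\t0.4375\t5008\t2025\t83\t285\t0.410663\t694\n'
}

# Default FS: any run of blanks separates fields, so "Read Depth (DP)" becomes
# three fields and the empty Flags field between two tabs disappears.
make_sample | awk '{ print "default FS:", NF }'   # default FS: 31, then 14

# Tab FS: one field per column, empty columns included.
make_sample | awk -F'\t' '{ print "tab FS:", NF }'  # tab FS: 15, then 15
```

Under the default splitting the header and the data row no longer have the same number of fields, which is why the same $N picks different columns in the two kinds of line.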
I want to extract some of the columns for a list of identifiers, which are stored in a text file as below:
rs555500075
rs548419688
rs568405545
rs534229142
I used this command:
fgrep -wf ids.txt original_file.txt | awk '{print $1"\t"$2"\t"$4"\t"$5"\t"$8"\t"$9"\t"$13}' > test1
However, the output looks like this:
countChromosome Start Ref/Alt Identifier Depth (DP) Frequencies
1 10352 -/A rs555500075 0.4375 5008 0.410663
1 10504 A/T rs548419688 0.000199681 5008 0
1 10505 C/G rs568405545 0.000199681 5008 0
1 10510 G/A rs534229142 0.000199681 5008 0.00144092
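The output above matches what awk's default field splitting would produce: "Read Depth (DP)" is split into three fields, and in the data rows the empty Flags field vanishes, so every later value moves one place to the left (for example, 0.4375 is printed as $8 although it sits in the Allele Frequencies column). A minimal sketch of the same pipeline with awk told to split on tabs follows. The sample file and ID list are stand-ins built from two rows in the question, and grep -F is used as the equivalent spelling of fgrep. The field numbers are kept as in the question; with tab splitting, $8, $9 and $13 are Allele Counts, Allele Frequencies and AMR - Allele Counts, so they may need adjusting to the columns actually wanted.

```shell
#!/bin/sh
# Stand-in files built from rows in the question (Flags left empty).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '1\t10352\t10352\t-/A\trs555500075\t\t88915\t2191\t0.4375\t5008\t2025\t83\t285\t0.410663\t694\n' > "$dir/original_file.txt"
printf '1\t10538\t10539\tC/A\trs537182016\t\t9203\t3\t0.000599042\t5008\t3\t0\t1\t0.00144092\t694\n' >> "$dir/original_file.txt"
printf 'rs555500075\n' > "$dir/ids.txt"

# -F'\t'      : every tab ends a field, so an empty Flags column is still $6
# -v OFS='\t' : commas in print emit tabs instead of the hand-written "\t" strings
grep -Fwf "$dir/ids.txt" "$dir/original_file.txt" |
  awk -F'\t' -v OFS='\t' '{ print $1, $2, $4, $5, $8, $9, $13 }' > "$dir/test1"

# Prints 1, 10352, -/A, rs555500075, 2191, 0.4375, 285 separated by tabs.
cat "$dir/test1"
```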
As mentioned above, I suspect that the Flags column (the 6th column) and the following columns whose names contain spaces are causing the problem. I have tried several ways to fix this, but none of them worked. Could you please help me solve it?
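Because the header names contain spaces but no tabs, another option is to split on tabs, look the columns up by their header names, and compare the IDs against the Identifier column only instead of grepping whole lines. This is a sketch under the assumptions that the file has exactly one header line and a column named Identifier; the column list in cols is a hypothetical example, and the files are stand-ins placed in a temporary directory.

```shell
#!/bin/sh
# Sketch: select columns by header name, match IDs on the Identifier column.
# Assumptions: tab-delimited file, one header line, an "Identifier" column.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Stand-in data built from rows in the question (Flags left empty).
{
  printf '%s\t' countChromosome Start Stop Ref/Alt Identifier Flags \
    'Read Depth (DP)' 'Allele Counts' 'Allele Frequencies' '# Alleles' \
    '# Het' '# HomoVar' 'AMR - Allele Counts' 'AMR - Allele Frequencies'
  printf '%s\n' 'AMR - # Alleles'
  printf '1\t10177\t10177\t-/C\trs367896724\t\t103152\t2130\t0.425319\t5008\t1490\t320\t250\t0.360231\t694\n'
  printf '1\t10352\t10352\t-/A\trs555500075\t\t88915\t2191\t0.4375\t5008\t2025\t83\t285\t0.410663\t694\n'
} > "$dir/original_file.txt"
printf 'rs555500075\nrs548419688\n' > "$dir/ids.txt"

# Hypothetical column selection; names must match the header text exactly.
cols='countChromosome,Start,Ref/Alt,Identifier,Read Depth (DP),Allele Frequencies,AMR - Allele Frequencies'

awk -F'\t' -v OFS='\t' -v cols="$cols" '
  NR == FNR { want[$1]; next }                # ids.txt: remember every wanted ID
  FNR == 1 {                                  # header line of the data file
    for (i = 1; i <= NF; i++) idx[$i] = i     # column name -> field number
    n = split(cols, name, ",")
    for (j = 1; j <= n; j++)
      if (!(name[j] in idx)) { print "unknown column: " name[j] > "/dev/stderr"; exit 1 }
  }
  FNR == 1 || ($idx["Identifier"] in want) {  # header plus rows whose ID is listed
    line = $idx[name[1]]
    for (j = 2; j <= n; j++) line = line OFS $idx[name[j]]
    print line
  }
' "$dir/ids.txt" "$dir/original_file.txt" > "$dir/test1"

cat "$dir/test1"
```

On the real data, only the awk command is needed, run with ids.txt and original_file.txt. The NR == FNR test assumes ids.txt is not empty; with an empty ID file the data file would be treated as the ID list.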
Many thanks in advance
linux awk grep text-formatting
asked 3 mins ago by Mary