Removing space from the specific fields of header line for extracting a correct file from a large original file

I have a problem that is probably simple, so please bear with me. I have a large tab-delimited file whose first lines look like this. (The Flags column is empty in this excerpt but not in the original file, so it has to be kept. I suspect that this column, and the following columns whose names contain spaces, are causing the problem.)



countChromosome Start Stop Ref/Alt Identifier Flags Read Depth (DP) Allele Counts Allele Frequencies # Alleles # Het # HomoVar AMR - Allele Counts AMR - Allele Frequencies AMR - # Alleles
1 10177 10177 -/C rs367896724 103152 2130 0.425319 5008 1490 320 250 0.360231 694
1 10235 10235 -/A rs540431307 78015 6 0.00119808 5008 6 0 1 0.00144092 694
1 10352 10352 -/A rs555500075 88915 2191 0.4375 5008 2025 83 285 0.410663 694
1 10504 10505 A/T rs548419688 9632 1 0.000199681 5008 1 0 0 0 694
1 10505 10506 C/G rs568405545 9676 1 0.000199681 5008 1 0 0 0 694
1 10510 10511 G/A rs534229142 9869 1 0.000199681 5008 1 0 1 0.00144092 694
1 10538 10539 C/A rs537182016 9203 3 0.000599042 5008 3 0 1 0.00144092 694


I want to extract some of the columns for a given list of identifiers, which are stored one per line in a text file (ids.txt) like this:



rs555500075
rs548419688
rs568405545
rs534229142


I used this command:



fgrep -wf ids.txt original_file.txt | awk '{print $1"\t"$2"\t"$4"\t"$5"\t"$8"\t"$9"\t"$13}' > test1


But the output of the above command looks like this:



countChromosome Start Ref/Alt Identifier Depth (DP) Frequencies
1 10352 -/A rs555500075 0.4375 5008 0.410663
1 10504 A/T rs548419688 0.000199681 5008 0
1 10505 C/G rs568405545 0.000199681 5008 0
1 10510 G/A rs534229142 0.000199681 5008 0.00144092
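
The shift seen in this output is consistent with awk's default field splitting, which treats any run of spaces or tabs as a single separator: an empty Flags field disappears, and header names that contain spaces break into several fields. Here is a minimal sketch to check this, assuming the file really is tab-delimited. The sample is a hypothetical excerpt cut down to the first eight columns of the header, and the variable names are made up for illustration.

```shell
# Hypothetical two-line sample: header plus one record, tab-delimited,
# with an empty Flags field (two consecutive tabs) in the record.
sample=$(mktemp)
printf 'countChromosome\tStart\tStop\tRef/Alt\tIdentifier\tFlags\tRead Depth (DP)\tAllele Counts\n' > "$sample"
printf '1\t10352\t10352\t-/A\trs555500075\t\t88915\t2191\n' >> "$sample"

# Default splitting: any run of blanks or tabs counts as one separator.
default_counts=$(awk '{print NF}' "$sample" | paste -sd' ' -)

# Tab splitting: every single tab is a separator, so empty fields survive
# and spaces inside a column name stay inside that field.
tab_counts=$(awk -F'\t' '{print NF}' "$sample" | paste -sd' ' -)

echo "default: $default_counts"
echo "tab:     $tab_counts"
rm -f "$sample"
```

If the field counts for the header and the data row disagree under default splitting but agree with `-F'\t'`, the field separator is the likely culprit, and the spaces in the header names would not need to be removed at all.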


As I mentioned above, I suspect that the Flags column (the 6th column) and the following columns whose names contain spaces are causing the problem. I have tried several approaches, but none of them worked. Could you please help me solve this?



Many thanks in advance
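
One possible approach, sketched under the assumption that the file is tab-delimited and that the identifier is always the 5th tab-separated field: let awk do both the filtering and the column selection, with the tab set as input and output separator. The file names `ids.txt`, `original_file.txt` and `test1` come from the command above; the temporary directory, the trimmed sample data and the `wanted` array name are only there to make the sketch self-contained.

```shell
# Hypothetical sample data (tab-delimited; Flags is empty in the data rows).
workdir=$(mktemp -d)
{
  printf 'countChromosome\tStart\tStop\tRef/Alt\tIdentifier\tFlags\tRead Depth (DP)\tAllele Counts\tAllele Frequencies\t# Alleles\t# Het\t# HomoVar\tAMR - Allele Counts\tAMR - Allele Frequencies\tAMR - # Alleles\n'
  printf '1\t10177\t10177\t-/C\trs367896724\t\t103152\t2130\t0.425319\t5008\t1490\t320\t250\t0.360231\t694\n'
  printf '1\t10352\t10352\t-/A\trs555500075\t\t88915\t2191\t0.4375\t5008\t2025\t83\t285\t0.410663\t694\n'
  printf '1\t10504\t10505\tA/T\trs548419688\t\t9632\t1\t0.000199681\t5008\t1\t0\t0\t0\t694\n'
} > "$workdir/original_file.txt"
printf 'rs555500075\nrs548419688\n' > "$workdir/ids.txt"

# First file (NR == FNR): remember every wanted identifier.
# Second file: print the header (FNR == 1) and every row whose 5th
# tab-separated field is one of the wanted identifiers.
awk -F'\t' -v OFS='\t' '
  NR == FNR                  { wanted[$1]; next }
  FNR == 1 || ($5 in wanted) { print $1, $2, $4, $5, $8, $9, $13 }
' "$workdir/ids.txt" "$workdir/original_file.txt" > "$workdir/test1"

result=$(cat "$workdir/test1")
printf '%s\n' "$result"
rm -rf "$workdir"
```

The field numbers are kept from the original command. With tab splitting, `$8`, `$9` and `$13` refer to Allele Counts, Allele Frequencies and AMR - Allele Counts, so they may need adjusting if other columns were intended. Matching on `$5` also avoids accidental hits when an identifier appears in some other column, which `fgrep -w` on the whole line cannot rule out.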









      linux awk grep text-formatting





      asked 3 mins ago









      Mary

























