awk manipulation of a file

I have a file that is the output of several piped commands, something like this:

command1 input.txt | command2 | command3 | input file

The file is tab-separated.

After command3, my input file looks like this:



chr6 116732135 116741866 116732135 116732368 116741505 116741866 + 0.79 0.51 0.97 0.77 0.48 0.97 0.02 0.37 'chr6:116732136-116732368:+@chr6:116741506-116741866:+.A.withRI','chr6:116732136-116732368:+@chr6:116741506-116741866:+.B.up_chr6:116732136-116732368:+@chr6:116741506-116741866:+.B.dn' (0,0):10,(1,0):147,(1,1):1 0:148 (0,0):36,(1,0):161,(1,1):3 0:163,1:1 chr6 + 116732136,116732136 116741866,116741866 gene_id "ENSG00000196911.10"; transcript_id "ENST00000368564.6"; gene_type "protein_coding"; gene_name "KPNA5"; transcript_type "protein_coding"; transcript_name "KPNA5-202"; level 2; protein_id "ENSP00000357552.1"; transcript_support_level "1"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS5111.1"; havana_gene "OTTHUMG00000015448.4"; havana_transcript "OTTHUMT00000041967.2";
chr6 116732135 116741866 116732135 116732368 116741505 116741866 + 0.79 0.51 0.97 0.77 0.48 0.97 0.02 0.37 'chr6:116732136-116732368:+@chr6:116741506-116741866:+.A.withRI','chr6:116732136-116732368:+@chr6:116741506-116741866:+.B.up_chr6:116732136-116732368:+@chr6:116741506-116741866:+.B.dn' (0,0):10,(1,0):147,(1,1):1 0:148 (0,0):36,(1,0):161,(1,1):3 0:163,1:1 chr6 + 116732136,116732136 116741866,116741866 gene_id "ENSG00000196911.10"; transcript_id "ENST00000356348.6"; gene_type "protein_coding"; gene_name "KPNA5"; transcript_type "protein_coding"; transcript_name "KPNA5-201"; level 2; protein_id "ENSP00000348704.1"; transcript_support_level "1"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS5111.1"; havana_gene "OTTHUMG00000015448.4"; havana_transcript "OTTHUMT00000041969.2";


After command3, I used awk to split the last column on ";". This is the command:

command1 input.txt | command2 | command3 | awk -F "\t" -v OFS="\t" '{split($NF,a,";"); $NF=""; print $0,a[1],a[4]}'


I want to split the last field of the output of command3, then print all fields except the last one, followed by a[1] and a[4] from the split. However, this puts an extra tab between columns 1-25 and a[1],a[4]. How can I avoid that?

Thanks

This is the output I get:



chr6 116732135 116741866 116732135 116732368 116741505 116741866 + 0.79 0.51 0.97 0.77 0.48 0.97 0.02 0.37 'chr6:116732136-116732368:+@chr6:116741506-116741866:+.A.withRI','chr6:116732136-116732368:+@chr6:116741506-116741866:+.B.up_chr6:116732136-116732368:+@chr6:116741506-116741866:+.B.dn' (0,0):10,(1,0):147,(1,1):1 0:148 (0,0):36,(1,0):161,(1,1):3 0:163,1:1 chr6 + 116732136,116732136 116741866,116741866 gene_id "ENSG00000196911.10" gene_name "KPNA5"
chr6 116732135 116741866 116732135 116732368 116741505 116741866 + 0.79 0.51 0.97 0.77 0.48 0.97 0.02 0.37 'chr6:116732136-116732368:+@chr6:116741506-116741866:+.A.withRI','chr6:116732136-116732368:+@chr6:116741506-116741866:+.B.up_chr6:116732136-116732368:+@chr6:116741506-116741866:+.B.dn' (0,0):10,(1,0):147,(1,1):1 0:148 (0,0):36,(1,0):161,(1,1):3 0:163,1:1 chr6 + 116732136,116732136 116741866,116741866 gene_id "ENSG00000196911.10" gene_name "KPNA5"









      awk






      asked Aug 14 at 23:23









      user3138373

          1 Answer
          accepted










So, given

$ printf 'foo\tbar\ta;b;c;d' |
    awk -F "\t" -v OFS="\t" '{split($NF,a,";"); $NF=""; print $0,a[1],a[4]}' | cat -A
foo^Ibar^I^Ia^Id$

(where I'm using cat -A to display the tabs as ^I for ease of visualization), you want to eliminate the double tab?
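To see where the second tab comes from, it may help to look at the record right after the assignment. The sketch below is only an illustration (the function name show_trailing_tab is made up here): it prints NF and the rebuilt $0 in brackets, showing that the field count is unchanged and the record now ends in a tab.

```shell
# Assigning "" to $NF keeps the field count, so the rebuilt $0 ends in OFS.
show_trailing_tab() {
  printf 'foo\tbar\ta;b;c;d\n' |
    awk -F '\t' -v OFS='\t' '{ $NF = ""; print NF ": [" $0 "]" }'
}

# cat -A (GNU coreutils) renders tabs as ^I and line ends as $.
show_trailing_tab | cat -A
```

With that trailing tab already in $0, the comma in print $0,a[1] contributes the second one.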



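If so, one way would be to decrement NF instead of assigning the empty string to $NF (the assignment leaves an empty last field in place, so $0 still ends in a tab, and print then adds another one before a[1]):

$ printf 'foo\tbar\ta;b;c;d' |
    awk -F "\t" -v OFS="\t" '{split($NF,a,";"); NF--; print $0,a[1],a[4]}' | cat -A
foo^Ibar^Ia^Id$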
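As far as I know, the GNU awk manual cautions that some awk versions do not rebuild $0 when NF is decremented. If that is a concern, one possible workaround (a sketch, not part of the original answer; strip_last_field is a made-up name) is to cut the last field off $0 with sub() after the split:

```shell
# Remove the last tab-separated field from $0 without touching NF directly.
strip_last_field() {
  awk -F '\t' -v OFS='\t' '{
    split($NF, a, ";")        # split the last field first, while it still exists
    sub(/\t[^\t]*$/, "")      # delete the final tab and everything after it
    print $0, a[1], a[4]
  }'
}

printf 'foo\tbar\ta;b;c;d\n' | strip_last_field | cat -A
```

If a record has only one field, the sub() matches nothing and the line is left as it was.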


Another way would be to concatenate the strings instead of printing them as separate fields. You can do that by removing the , between $0 and a[1]:

$ printf 'foo\tbar\ta;b;c;d' |
    awk -F "\t" -v OFS="\t" '{split($NF,a,";"); $NF=""; print $0 a[1],a[4]}' | cat -A
foo^Ibar^Ia^Id$
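Applied to data shaped like the question's, a variation on the same idea is to overwrite the last field with a[1] and then append a[4]. This is a sketch under assumptions: the input line is a shortened, hypothetical record using values from the question, and extract_gene_fields is a made-up name. Splitting on "; *" rather than ";" also drops the space that would otherwise lead a[4].

```shell
# Replace the annotation column with its first piece, then append the fourth.
extract_gene_fields() {
  awk -F '\t' -v OFS='\t' '{
    split($NF, a, /; */)   # "; *" swallows the space after each semicolon
    $NF = a[1]             # last field becomes: gene_id "..."
    print $0, a[4]         # a[4] is: gene_name "..."
  }'
}

# Shortened stand-in for one record of the real input (3 columns instead of 26).
printf 'chr6\t+\tgene_id "ENSG00000196911.10"; transcript_id "ENST00000368564.6"; gene_type "protein_coding"; gene_name "KPNA5";\n' |
  extract_gene_fields
```

This relies on gene_name being the fourth attribute, as it is in the sample lines; if the attribute order can vary, a lookup by key name would be safer.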





                answered Aug 14 at 23:41









                steeldriver
