How to print only the part of each row that starts with a particular character

I have a file with over 10,000 rows:



head samples 
https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192170/type/READ_SET_FASTQ/filename/HI.2613.007.Custom_0022.ED9_SD2A27-1_180_R1.fastq.gz.md5
https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192170/type/READ_SET_FASTQ_PE/filename/HI.2613.007.Custom_0022.ED9_SD2A27-1_180_R2.fastq.gz.md5
https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192171/type/READ_SET_FASTQ/filename/HI.2613.007.Custom_0021.ED4_KS1A29-7_338_R1.fastq.gz.md5
https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192171/type/READ_SET_FASTQ_PE/filename/HI.2613.007.Custom_0021.ED4_KS1A29-7_338_R2.fastq.gz.md5
https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192168/type/READ_SET_FASTQ/filename/HI.2613.007.Index_18.ED17_MO1A26-7_353_R1.fastq.gz.md5
https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192168/type/READ_SET_FASTQ_PE/filename/HI.2613.007.Index_18.ED17_MO1A26-7_353_R2.fastq.gz.md5
https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192169/type/READ_SET_FASTQ/filename/HI.2613.007.Index_14.ED14_IA2A35-2_310_R1.fastq.gz.md5


I want to print only the part of each line that starts with "HI." (the file name), without the trailing ".md5".



This is my desired output:



HI.2613.007.Custom_0022.ED9_SD2A27-1_180_R1.fastq.gz
HI.2613.007.Custom_0022.ED9_SD2A27-1_180_R2.fastq.gz
HI.2613.007.Custom_0021.ED4_KS1A29-7_338_R1.fastq.gz
HI.2613.007.Custom_0021.ED4_KS1A29-7_338_R2.fastq.gz
HI.2613.007.Index_18.ED17_MO1A26-7_353_R1.fastq.gz
HI.2613.007.Index_18.ED17_MO1A26-7_353_R2.fastq.gz









      text-processing awk grep






asked Aug 13 at 17:10 by Anna1364, edited Aug 13 at 17:37 by msp9011




















3 Answers

















                    accepted






Using awk:

awk -F'/' '$NF ~ /^HI\./ { print $NF }' infile

To remove the .md5 suffix, you could do:

awk -F'(/|[.]md5)' '$(NF-1) ~ /^HI\./ { print $(NF-1) }' infile

• In awk, $0 refers to the whole line (record), and $1, $2, $3, ... refer to the first, second, third, ... fields. $NF is the last field, so $(NF-1) is the second-to-last field. In the second command the trailing .md5 is itself a field separator, which leaves an empty last field, so the file name ends up in $(NF-1).

• The tilde operator ~ matches the left-hand operand, taken as a string, against the right-hand operand, taken as an (extended) regular expression: string ~ /regular-expression/
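The field handling described above can be tried on a small sample. This is only a sketch: the temporary file and the variable names are made up for illustration, and the two URLs are taken from the question.

```shell
# Hypothetical sample file holding two of the URLs from the question.
sample=$(mktemp)
printf '%s\n' \
  'https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192170/type/READ_SET_FASTQ/filename/HI.2613.007.Custom_0022.ED9_SD2A27-1_180_R1.fastq.gz.md5' \
  'https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192168/type/READ_SET_FASTQ_PE/filename/HI.2613.007.Index_18.ED17_MO1A26-7_353_R2.fastq.gz.md5' \
  > "$sample"

# Split on "/" only: the last field still carries the .md5 suffix.
with_suffix=$(awk -F'/' '$NF ~ /^HI\./ { print $NF }' "$sample")

# Split on "/" or a literal ".md5": the trailing ".md5" leaves an empty
# last field, so the file name is the second-to-last field.
without_suffix=$(awk -F'(/|[.]md5)' '$(NF-1) ~ /^HI\./ { print $(NF-1) }' "$sample")

printf '%s\n' "$with_suffix" "$without_suffix"
rm -f "$sample"
```

The bracket expression [.] is used here instead of a backslash so that the dot stays literal however the -F argument is unescaped.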


The sed solution:

sed 's:.*/\([^/]*\)\.md5$:\1:; /^HI\./!d' infile

• The pattern /\([^/]*\)\.md5$ matches the last slash, followed by any run of non-slash characters, followed by .md5 at the end of the line. The group \([^/]*\) captures everything between the last slash and .md5, and the replacement prints just that through its back-reference \1.

• /^HI\./!d then deletes the lines that do not start with HI. after the substitution.

• : is used as the delimiter of the s command because the input contains / characters.
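To see the grouping and the delete step at work, here is a small sketch that runs the substitution in both basic and extended regular expression syntax. The second input line, with its example.org URL, is made up so that there is something for /^HI\./!d to drop; the -E option is assumed to be available (it is in GNU and BSD sed).

```shell
# Hypothetical input: one URL from the question plus a made-up line whose
# file name does not start with "HI.".
input='https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192169/type/READ_SET_FASTQ/filename/HI.2613.007.Index_14.ED14_IA2A35-2_310_R1.fastq.gz.md5
https://example.org/filename/OTHER.sample_R1.fastq.gz.md5'

# Basic regular expressions: the group is \( \) and the back-reference is \1.
bre=$(printf '%s\n' "$input" | sed -e 's:.*/\([^/]*\)\.md5$:\1:' -e '/^HI\./!d')

# Extended regular expressions (-E): plain parentheses form the group.
ere=$(printf '%s\n' "$input" | sed -E -e 's:.*/([^/]*)\.md5$:\1:' -e '/^HI\./!d')

printf '%s\n' "$bre" "$ere"
```

Both variants should print the same single file name; the made-up OTHER line is removed by the delete step.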






answered Aug 13 at 17:16 by αғsнιη, edited Aug 13 at 17:49






















Try this:

awk -F '/' '$NF ~ /^HI/ { print substr($NF, 1, length($NF)-4) }' file.txt

• prints the last field if it starts with HI
• drops the last 4 characters of that field (.md5)

Output:

HI.2613.007.Custom_0022.ED9_SD2A27-1_180_R1.fastq.gz
HI.2613.007.Custom_0022.ED9_SD2A27-1_180_R2.fastq.gz
HI.2613.007.Custom_0021.ED4_KS1A29-7_338_R1.fastq.gz
HI.2613.007.Custom_0021.ED4_KS1A29-7_338_R2.fastq.gz
HI.2613.007.Index_18.ED17_MO1A26-7_353_R1.fastq.gz
HI.2613.007.Index_18.ED17_MO1A26-7_353_R2.fastq.gz
HI.2613.007.Index_14.ED14_IA2A35-2_310_R1.fastq.gz
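One thing to keep in mind with the substr() approach is that it drops the last four characters whatever they are. The sketch below compares it with a variant that uses sub() to strip .md5 only when it is really at the end of the field. The second input line (example.org, no .md5 suffix) is made up for the comparison, and the variable names are illustrative.

```shell
# Hypothetical input: the second line is made up and has no .md5 suffix.
input='https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192171/type/READ_SET_FASTQ/filename/HI.2613.007.Custom_0021.ED4_KS1A29-7_338_R1.fastq.gz.md5
https://example.org/filename/HI.no_suffix_R1.fastq.gz'

# substr() always removes the last four characters of the field.
by_length=$(printf '%s\n' "$input" |
  awk -F'/' '$NF ~ /^HI\./ { print substr($NF, 1, length($NF) - 4) }')

# sub() removes ".md5" only when the field actually ends with it.
by_suffix=$(printf '%s\n' "$input" |
  awk -F'/' '$NF ~ /^HI\./ { name = $NF; sub(/\.md5$/, "", name); print name }')

printf 'by length:\n%s\nby suffix:\n%s\n' "$by_length" "$by_suffix"
```

For input where every line ends in .md5, as in the question, both give the same result; they differ only on lines like the made-up second one.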





answered Aug 13 at 17:18 by msp9011, edited Aug 13 at 17:24




















awk -F"filename/" '{ gsub(/\.md5$/, ""); print $2 }'
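This one splits each line on the literal text filename/, so the file name is the second field. As written it prints that field for every line, without checking for the HI. prefix. A sketch that adds the check, assuming every URL contains filename/ exactly once as in the question (the second input line is made up and has no filename/ part):

```shell
# Hypothetical input: one URL from the question and a made-up line
# without "filename/" in it.
input='https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192168/type/READ_SET_FASTQ/filename/HI.2613.007.Index_18.ED17_MO1A26-7_353_R1.fastq.gz.md5
https://example.org/no/such/part.md5'

# $2 is whatever follows "filename/"; keep it only if it starts with "HI."
# and strip a trailing ".md5" before printing.
names=$(printf '%s\n' "$input" |
  awk -F'filename/' '$2 ~ /^HI\./ { sub(/\.md5$/, "", $2); print $2 }')

printf '%s\n' "$names"
```

On the made-up second line $2 is empty, so the pattern does not match and nothing is printed for it.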





answered Aug 14 at 17:11 by kalpesh, edited Aug 14 at 17:22 by Kusalananda



























                                         
