How to print only the part of each row that starts with a particular character

I have a file with over 10,000 rows:



head samples 
https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192170/type/READ_SET_FASTQ/filename/HI.2613.007.Custom_0022.ED9_SD2A27-1_180_R1.fastq.gz.md5
https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192170/type/READ_SET_FASTQ_PE/filename/HI.2613.007.Custom_0022.ED9_SD2A27-1_180_R2.fastq.gz.md5
https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192171/type/READ_SET_FASTQ/filename/HI.2613.007.Custom_0021.ED4_KS1A29-7_338_R1.fastq.gz.md5
https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192171/type/READ_SET_FASTQ_PE/filename/HI.2613.007.Custom_0021.ED4_KS1A29-7_338_R2.fastq.gz.md5
https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192168/type/READ_SET_FASTQ/filename/HI.2613.007.Index_18.ED17_MO1A26-7_353_R1.fastq.gz.md5
https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192168/type/READ_SET_FASTQ_PE/filename/HI.2613.007.Index_18.ED17_MO1A26-7_353_R2.fastq.gz.md5
https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192169/type/READ_SET_FASTQ/filename/HI.2613.007.Index_14.ED14_IA2A35-2_310_R1.fastq.gz.md5


I want to print only the part of each line that starts with "HI.", without the trailing ".md5".



This is my desired output:



HI.2613.007.Custom_0022.ED9_SD2A27-1_180_R1.fastq.gz
HI.2613.007.Custom_0022.ED9_SD2A27-1_180_R2.fastq.gz
HI.2613.007.Custom_0021.ED4_KS1A29-7_338_R1.fastq.gz
HI.2613.007.Custom_0021.ED4_KS1A29-7_338_R2.fastq.gz
HI.2613.007.Index_18.ED17_MO1A26-7_353_R1.fastq.gz
HI.2613.007.Index_18.ED17_MO1A26-7_353_R2.fastq.gz









      text-processing awk grep














asked Aug 13 at 17:10 by Anna1364 (edited Aug 13 at 17:37 by msp9011)




















          3 Answers
Accepted answer (score 3)










Using awk:

awk -F'/' '$NF ~ /^HI\./ { print $NF }' infile

To remove the .md5 suffix as well, you could do:

awk -F'(/|[.]md5)' '$(NF-1) ~ /^HI\./ { print $(NF-1) }' infile

• In awk, $0 refers to the whole line (record), and $1, $2, $3, ... refer to the first, second, third, ... fields. $NF is the last field, so $(NF-1) is the second-to-last field. With .md5 added to the field separator, the trailing .md5 leaves an empty last field, which is why the file name ends up in $(NF-1).

• The tilde operator ~ matches the string on its left against the (extended) regular expression on its right: string ~ /regular-expression/

The sed solution:

sed 's:.*/\([^/]*\)\.md5$:\1: ; /^HI\./!d' infile

• /\([^/]*\)\.md5$ matches the last slash, followed by any run of non-slash characters, followed by .md5 at the end of the line. The group \([^/]*\) captures everything between the last slash and .md5, and the back-reference \1 in the replacement prints just that.

• /^HI\./!d deletes the lines that do not start with HI. after the substitution has been applied.

• : is used as the delimiter of the s command because the pattern itself contains the / character.
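To try the commands above on a small input, something like the following sketch may help. The temporary file and the second URL (on example.org) are made up for illustration; the second line is there only to show that names not starting with HI. are filtered out. The sed script is split into two -e expressions here, which should behave the same as the single-script form.

```shell
# Build a small sample file in a temporary location (made-up test data).
sample=$(mktemp)
cat > "$sample" <<'EOF'
https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192170/type/READ_SET_FASTQ/filename/HI.2613.007.Custom_0022.ED9_SD2A27-1_180_R1.fastq.gz.md5
https://example.org/some/path/filename/OTHER.sample_R1.fastq.gz.md5
EOF

# awk, "/" as separator: the last field still carries the .md5 suffix.
awk_full=$(awk -F'/' '$NF ~ /^HI[.]/ { print $NF }' "$sample")

# awk, "/" or ".md5" as separator: the name is the second-to-last field.
awk_name=$(awk -F'(/|[.]md5)' '$(NF-1) ~ /^HI[.]/ { print $(NF-1) }' "$sample")

# sed: keep what lies between the last "/" and ".md5", then drop non-HI lines.
sed_name=$(sed -e 's:.*/\([^/]*\)\.md5$:\1:' -e '/^HI\./!d' "$sample")

printf '%s\n' "$awk_full" "$awk_name" "$sed_name"
rm -f "$sample"
```

The first printed line keeps the .md5 suffix; the other two should be the bare file name.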






answered Aug 13 at 17:16 by αғsнιη (edited Aug 13 at 17:49)






















Answer (score 1)













Try this:

awk -F '/' '$NF ~ /^HI/ { print substr($NF, 1, length($NF)-4) }' file.txt

• prints the last field if it starts with HI

• drops the last 4 characters (.md5)

Output:

HI.2613.007.Custom_0022.ED9_SD2A27-1_180_R1.fastq.gz
HI.2613.007.Custom_0022.ED9_SD2A27-1_180_R2.fastq.gz
HI.2613.007.Custom_0021.ED4_KS1A29-7_338_R1.fastq.gz
HI.2613.007.Custom_0021.ED4_KS1A29-7_338_R2.fastq.gz
HI.2613.007.Index_18.ED17_MO1A26-7_353_R1.fastq.gz
HI.2613.007.Index_18.ED17_MO1A26-7_353_R2.fastq.gz
HI.2613.007.Index_14.ED14_IA2A35-2_310_R1.fastq.gz
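Note that substr() cuts the last 4 characters whether or not they are actually .md5. If some lines might lack the suffix, a variant along these lines could be safer. This is only a sketch: the helper name strip_md5 and the example.org URLs are hypothetical, not part of the original answer.

```shell
# Hypothetical helper: reads URLs on stdin, prints the HI.* file names.
strip_md5() {
    awk -F'/' '$NF ~ /^HI[.]/ {
        name = $NF
        sub(/[.]md5$/, "", name)   # remove the suffix only when it is present
        print name
    }'
}

# One made-up URL with the suffix, one without.
with_suffix=$(printf '%s\n' 'https://example.org/filename/HI.1.sample_R1.fastq.gz.md5' | strip_md5)
without_suffix=$(printf '%s\n' 'https://example.org/filename/HI.1.sample_R2.fastq.gz' | strip_md5)

printf '%s\n' "$with_suffix" "$without_suffix"
```

With the substr() approach the second name would lose its last four characters ("q.gz"); here it is printed unchanged.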





answered Aug 13 at 17:18 by msp9011 (edited Aug 13 at 17:24)




















Answer (score 0)













awk -F'filename/' '{ sub(/\.md5$/, ""); print $2 }' infile
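A small sketch of why it is worth matching the suffix with a literal dot and an end-of-line anchor: in a regular expression an unescaped "." matches any character, so a pattern like ".md5" can also remove text from the middle of a name. The URL below is made up to trigger that case; it is not from the question's data.

```shell
url='https://example.org/filename/HI.1.cmd5sum.txt.md5'   # made-up URL

# Unescaped and unanchored: "cmd5" in the middle is removed as well.
loose=$(printf '%s\n' "$url" | awk -F'filename/' '{ gsub(".md5", ""); print $2 }')

# Literal dot, anchored at the end of the line: only the suffix is removed.
strict=$(printf '%s\n' "$url" | awk -F'filename/' '{ sub(/[.]md5$/, ""); print $2 }')

printf 'loose:  %s\nstrict: %s\n' "$loose" "$strict"
```

On the URLs shown in the question both forms should give the same result; the difference only appears when "md5" occurs elsewhere in the line.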





answered Aug 14 at 17:11 by kalpesh (edited Aug 14 at 17:22 by Kusalananda)



























                                         
