Extract lines from a file whose parameter value is greater than 100, with grep and awk [closed]

I have a file, and I need to write a script that produces an output file according to the following specifications:

The first 28 lines of the input file are a header, which has to be included in the output.
Some lines of the input contain a specific parameter ZH:<value> (e.g. ZH:100 or ZH:50). The column in which the ZH parameter appears differs from line to line.

My output file should contain the header lines and those lines that contain the ZH parameter with a value greater than 100 (e.g. ZH:105, ZH:200 and so on).

Lines that do not contain the ZH parameter are to be omitted.

awk sed grep

edited Feb 27 at 7:12
Rui F Ribeiro

asked Feb 27 at 7:11
David




closed as unclear what you're asking by Rui F Ribeiro, Haxiel, X Tian, Jeff Schaller, jimmij Feb 28 at 8:37


Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.










  • Please add samples of the input and output, tell us what you have done so far and where we might enlighten you. This is a Unix questions site, not a script-delivery service.

    – Rui F Ribeiro
    Feb 27 at 7:14












  • It is not possible to show a sample file here, as the files are very large and the lines are too long to post in the comments section.

    – David
    Feb 27 at 7:22











  • The file is a SAM file, obtained by alignment to a genome.

    – David
    Feb 27 at 7:23











  • I have very little knowledge of Linux, hence the question.

    – David
    Feb 27 at 7:24






  • @David Then cut the length of the lines so that they are manageable, while still allowing someone to properly test a solution against them. You've said nothing about your file apart from the fact that it has a header and fields. We don't even know whether it is comma- or tab-delimited, or whether it follows some known format. In a comment (not in the question), you mention SAM files. Be aware that most people here don't know what a SAM file is.

    – Kusalananda
    Feb 27 at 7:29
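Following Kusalananda's suggestion, here is one way to cut a postable sample out of a large file. It is a sketch that assumes the file follows the SAM layout: tab-separated, 11 mandatory columns (column 10 is SEQ and column 11 is QUAL, usually the longest), and optional fields in TAG:TYPE:VALUE form, such as ZH:i:105, from column 12 onward. The helper name make_sample and the choice of 5 data lines are illustrative; only the 28-line header comes from the question.

```shell
#!/bin/sh
# make_sample FILE (hypothetical helper name): print a small, postable sample
# of a SAM file: the 28 header lines unchanged, then the next 5 data lines
# with columns 10-11 (SEQ and QUAL, usually the longest) cut out.
make_sample() {
    head -n 28 "$1"
    # cut splits on tabs by default; 12- keeps every optional field
    sed -n '29,33p' "$1" | cut -f 1-9,12-
}
```

Usage: make_sample file > sample.txt, then paste the contents of sample.txt into the question.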



















2 Answers

Use head and grep:



(
# get header
head -n 28 file
# grep lines with a ZH value of 100 or more (three or more digits, no leading zero)
grep -Ew "ZH:.:[1-9][0-9]{2,}" file
) > outfile
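One caveat: the intended pattern [1-9][0-9]{2,} matches any number with three or more digits and no leading zero, so a line with exactly ZH:i:100 is kept as well, although the question asks for values greater than 100. A minimal sketch of a stricter variant, assuming the same ZH:<one character>:<number> layout; the function name filter_gt100 is hypothetical, and it only searches the lines after the header so that no header line can be printed twice.

```shell
#!/bin/sh
# filter_gt100 FILE (hypothetical helper name): print the 28 header lines,
# then every later line whose ZH:<type>:<value> field is strictly > 100.
# The alternatives in the regex cover:
#   10[1-9]         101-109
#   1[1-9][0-9]     110-199
#   [2-9][0-9]{2}   200-999
#   [1-9][0-9]{3,}  1000 and above
filter_gt100() {
    head -n 28 "$1"
    tail -n +29 "$1" |
        grep -Ew 'ZH:.:(10[1-9]|1[1-9][0-9]|[2-9][0-9]{2}|[1-9][0-9]{3,})'
}
```

Usage: filter_gt100 file > outfile. Spelling out the ranges keeps this a pure grep solution; for other thresholds, a numeric comparison in awk is easier to adjust.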






answered Feb 27 at 8:53

RoVo












    • It worked!! Thanks a lot. Could you explain the meaning of {2,} in your line?

      – David
      Feb 27 at 9:58











    • The {...} part defines the number of occurrences of a character or group: {2} means exactly 2 occurrences, {2,5} means between 2 and 5 occurrences, and {2,} means at least 2 occurrences. Note that you need extended regular expressions (-E) for this unescaped form to work; in basic regular expressions it is written \{2,\}.

      – RoVo
      Feb 27 at 10:09












    • You can use regexr.com for great explanations and for testing.

      – RoVo
      Feb 27 at 10:14
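To see the three quantifier forms from the comment above side by side, here is a small demonstration with grep -E on a few made-up numbers. The helper name show is hypothetical, and -x makes each pattern match a whole line, so only the number of digits decides.

```shell
#!/bin/sh
# show PATTERN (hypothetical helper name): print the sample lines that match
# PATTERN as a whole line (-x), using extended regular expressions (-E).
sample='7
42
123
123456'
show() { printf '%s\n' "$sample" | grep -Ex "$1"; }

show '[0-9]{2}'     # exactly 2 digits:   42
show '[0-9]{2,5}'   # 2 to 5 digits:      42, 123
show '[0-9]{2,}'    # at least 2 digits:  42, 123, 123456
```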


















Without a minimal example, it's hard to guess what you're after...

Anyway, this awk script might be helpful if you want to filter lines having the ZH parameter:

awk 'strtonum(gensub(/^.*ZH:.:([0-9]+).*$/, "\\1", "1"))>100' file

This prints all lines containing a field like ZH:<one character>:<some number> whose number is greater than 100.

gensub extracts the number associated with ZH; strtonum then converts it to a number, which is compared with 100. Note that gensub and strtonum are GNU awk (gawk) extensions.
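Since gensub and strtonum exist only in GNU awk, and the one-liner above neither keeps the 28 header lines nor restricts the comparison to lines that actually contain a ZH field (when the regex does not match, gensub returns the whole line unchanged, so the line is compared with whatever strtonum makes of it), here is a sketch in plain POSIX awk. It assumes tab-separated fields, as in SAM; the function name zh_filter is hypothetical.

```shell
#!/bin/sh
# zh_filter FILE (hypothetical helper name): print lines 1-28 unchanged, then
# only lines that have a field of the form ZH:<one character>:<digits> whose
# number is greater than 100. Fields are assumed to be tab-separated.
zh_filter() {
    awk -F '\t' '
        NR <= 28 { print; next }        # header lines go straight through
        {
            for (i = 1; i <= NF; i++) {
                # substr($i, 6) skips the five characters "ZH:<type>:"
                if ($i ~ /^ZH:.:[0-9]+$/ && substr($i, 6) + 0 > 100) {
                    print
                    break               # print each line at most once
                }
            }
        }
    ' "$1"
}
```

Usage: zh_filter file > outfile. Anchoring the field regex at both ends (^ZH:.:[0-9]+$) avoids matching a ZH: that merely appears inside some other field.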






answered Feb 27 at 8:43

oliv











