Extract lines from a file whose ZH parameter value is greater than 100, with grep and awk [closed]

I have a file, and I need a script that produces an output file according to the following specifications:
The first 28 lines of the input file are a header, which must be included in the output.
Some lines of the input carry a specific parameter of the form ZH:<value> (e.g. ZH:100 or ZH:50). The column in which the ZH parameter appears differs from line to line.
The output file should contain the header lines plus those lines whose ZH value is greater than 100 (e.g. ZH:105, ZH:200 and so on).
Lines that do not contain the ZH parameter are to be omitted.
awk sed grep
asked Feb 27 at 7:11 by David
edited Feb 27 at 7:12 by Rui F Ribeiro
closed as unclear what you're asking by Rui F Ribeiro, Haxiel, X Tian, Jeff Schaller, jimmij Feb 28 at 8:37
Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
Please add samples of the input and the expected output, tell us what you have tried so far, and say where we might enlighten you. This is a Unix questions site, not a script delivery service.
– Rui F Ribeiro
Feb 27 at 7:14
It is not possible to show a sample file here as the files are very big themselves and the lines are too long to post in the comments section.
– David
Feb 27 at 7:22
The file is a sam file, obtained by alignment to a genome
– David
Feb 27 at 7:23
I have very little knowledge of linux, hence the question
– David
Feb 27 at 7:24
@David Then cut the length of the line so that it's manageable, while still allowing someone to properly test a solution against it. You've said nothing about your file apart from that it has a header and fields. We don't even know whether it is comma or tab delimited, or if it follows some known format. In a comment (not in the question), you mention SAM files. Be aware that most people here don't know what a SAM file is.
– Kusalananda
Feb 27 at 7:29
2 Answers
Use head and grep:
(
# get header
head -n 28 file
# grep lines with a ZH value of 100 or more (three or more digits, no leading zero)
grep -Ew "ZH:.:[1-9][0-9]{2,}" file
) > outfile
answered Feb 27 at 8:53 by RoVo
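One caveat worth checking: a pattern such as [1-9][0-9]{2,} accepts any number with three or more digits, so a tag like ZH:i:100 is matched as well. If the threshold has to be strictly greater than 100, a longer alternation can express that. The sketch below is only an illustration; the sample lines and the `i` type letter are made-up test data, not taken from the asker's file.

```shell
# Hypothetical sample lines; only the ZH tag matters here.
sample=$(printf '%s\n' \
  'r1 ZH:i:50' \
  'r2 ZH:i:100' \
  'r3 ZH:i:101' \
  'r4 ZH:i:2000' \
  'r5 XS:i:500')

# Strictly greater than 100: 101-109, 110-199, 200-999, or four or more digits.
strict='ZH:.:(10[1-9]|1[1-9][0-9]|[2-9][0-9]{2}|[1-9][0-9]{3,})'

# -w makes sure the number is not cut short in the middle of a longer one.
matched=$(printf '%s\n' "$sample" | grep -Ew "$strict")
printf '%s\n' "$matched"
```

With this input only the r3 and r4 lines should be printed; r2 (exactly 100) and r5 (a different tag) are left out.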
It worked!! Thanks a lot. Could you explain the meaning of {2,} in your line?
– David
Feb 27 at 9:58
The {...} part defines the number of occurrences of a character or group. {2} means exactly 2 occurrences, {2,5} means between 2 and 5 occurrences, {2,} means at least 2 occurrences. Note that you need extended regex (-E) for it to work.
– RoVo
Feb 27 at 10:09
You can use regexr.com for great explanations and for testing.
– RoVo
Feb 27 at 10:14
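The three quantifier forms described in the comment can be tried out on a few digit strings. This is a small sketch with made-up input; the -x option is added here so that each pattern has to match the whole line, which makes the differences easy to see.

```shell
# Made-up test input: digit strings of length 1 to 6.
nums=$(printf '%s\n' 7 42 123 4567 89012 345678)

# {2}   : exactly two digits
exactly_two=$(printf '%s\n' "$nums" | grep -Ex '[0-9]{2}')
# {2,5} : between two and five digits
two_to_five=$(printf '%s\n' "$nums" | grep -Ex '[0-9]{2,5}')
# {2,}  : two digits or more
at_least_two=$(printf '%s\n' "$nums" | grep -Ex '[0-9]{2,}')

printf 'exactly two:\n%s\n'  "$exactly_two"
printf 'two to five:\n%s\n'  "$two_to_five"
printf 'at least two:\n%s\n' "$at_least_two"
```

In basic regular expressions (grep without -E) the same quantifier is written with backslashes, for example \{2,\}.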
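Without a minimal example, it's hard to guess what you're after...
Anyway, this awk script might be helpful if you want to filter lines by their ZH parameter (it needs GNU awk, since strtonum and gensub are gawk extensions):
awk '/ZH:.:[0-9]+/ && strtonum(gensub(/^.*ZH:.:([0-9]+).*$/, "\\1", "1"))>100' file
This prints all lines that include a field like ZH:<one character>:<some number> where the number is greater than 100.
gensub extracts the number associated with ZH, which is then converted to a number and compared with 100. The leading /ZH:.:[0-9]+/ test skips lines without the tag, for which gensub would return the whole line unchanged.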
answered Feb 27 at 8:43 by oliv
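The header copy and the numeric filter can also be folded into a single awk pass with a strict comparison. The following is a minimal sketch, not taken from either answer: it assumes the tag is a whole field of the form ZH:<one character>:<digits>, that fields are tab-separated as in SAM files, and that the header is exactly the first 28 lines, as the question states. The function name filter_zh and its arguments are hypothetical. Only POSIX awk features are used, so gawk should not be required.

```shell
# filter_zh FILE [HEADER_LINES] [THRESHOLD]   (hypothetical helper)
# Prints the first HEADER_LINES lines unchanged, then every later line
# whose ZH tag carries a number strictly greater than THRESHOLD.
filter_zh() {
  awk -F '\t' -v hdr="${2:-28}" -v min="${3:-100}" '
    NR <= hdr { print; next }          # copy header lines as they are
    {
      for (i = 1; i <= NF; i++) {
        # look for a whole field of the form ZH:<char>:<digits>
        if ($i ~ /^ZH:.:[0-9]+$/) {
          val = substr($i, 6) + 0      # digits start at position 6
          if (val > min) print
          break                        # assumes at most one ZH tag per line
        }
      }
    }
  ' "$1"
}
```

It could be called as `filter_zh input.sam > output.sam`, where both file names are placeholders. Since SAM header lines begin with `@`, replacing the `NR <= hdr` test with `/^@/` may be more robust than relying on a fixed line count.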