Extract lines from a file in which a parameter has a value greater than 100, with grep and awk [closed]

-1

I have a file, and I need to write a script that produces an output file meeting the following specifications:

The first 28 lines of the input file are a header, which has to be included in the output.
Some lines of the input contain a specific parameter of the form ZH:<value> (e.g. ZH:100 or ZH:50). The column in which the ZH parameter appears differs from line to line.

The output file should contain the header lines plus those lines whose ZH parameter has a value greater than 100 (e.g. ZH:105, ZH:200 and so on).

Lines that do not contain the ZH parameter are to be omitted.

edited Feb 27 at 7:12

Rui F Ribeiro

asked Feb 27 at 7:11

David

closed as unclear what you're asking by Rui F Ribeiro, Haxiel, X Tian, Jeff Schaller, jimmij Feb 28 at 8:37

Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.

Please add samples of the input and output, tell us what you have tried so far, and say where we might enlighten you. This is a Unix questions site, not a script delivery service.

– Rui F Ribeiro
Feb 27 at 7:14

It is not possible to show a sample file here as the files are very big themselves and the lines are too long to post in the comments section.

– David
Feb 27 at 7:22

The file is a sam file, obtained by alignment to a genome

– David
Feb 27 at 7:23

I have very little knowledge of linux, hence the question

– David
Feb 27 at 7:24

5

@David Then cut the length of the line so that it's manageable, while still allowing someone to properly test a solution against it. You've said nothing about your file apart from that it has a header and fields. We don't even know whether it is comma or tab delimited, or if it follows some known format. In a comment (not in the question), you mention SAM files. Be aware that most people here don't know what a SAM file is.

– Kusalananda
Feb 27 at 7:29

awk sed grep


2 Answers

Use head and grep:

(
# get the 28 header lines
head -n 28 file
# grep lines whose ZH value has three or more digits (i.e. >= 100)
grep -Ew "ZH:.:[1-9][0-9]{2,}" file
) > outfile
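A pattern that asks for three or more digits also accepts a value of exactly 100. If the threshold has to be strictly greater than 100, one possible variant is sketched below. It assumes the tag has the form ZH:<one character>:<integer> with no leading zeros; the sample file, its two-line header, and the record names r1 to r5 are made up for illustration.

```shell
#!/bin/sh
# Build a tiny stand-in for the real file (hypothetical contents):
# two header lines instead of 28, then tab-separated records.
sample=$(mktemp)
printf '@HD\tVN:1.0\n@SQ\tSN:chr1\n' > "$sample"
printf 'r1\tNM:i:0\tZH:i:100\n' >> "$sample"
printf 'r2\tZH:i:101\tNM:i:1\n' >> "$sample"
printf 'r3\tNM:i:2\n' >> "$sample"
printf 'r4\tZH:i:2500\n' >> "$sample"
printf 'r5\tZH:i:99\n' >> "$sample"

# Strictly greater than 100: 101-109, 110-199, 200-999, or four or more digits.
strict='ZH:.:(10[1-9]|1[1-9][0-9]|[2-9][0-9]{2}|[1-9][0-9]{3,})'

# Header first, then the matching records.
result=$( { head -n 2 "$sample"; grep -Ew "$strict" "$sample"; } )
printf '%s\n' "$result"
rm -f "$sample"
```

On this sample the output should be the two header lines followed by the r2 and r4 records. For the real file, the 2 passed to head would become 28.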

answered Feb 27 at 8:53

RoVo

It worked!! Thanks a lot. Could you explain the meaning of {2,} in your line?

– David
Feb 27 at 9:58

The {...} part defines the number of occurrences of a character or group. {2} means exactly 2 occurrences, {2,5} means between 2 and 5 occurrences, {2,} means at least 2 occurrences. Note that you need extended regex (-E) for it to work.
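The three quantifier forms can be compared side by side with a small sketch like the one below. The input strings are hypothetical, and -x (match the whole line) is used here instead of -w only so that the digit counts are easy to see.

```shell
#!/bin/sh
# Hypothetical test strings, one per line.
inputs=$(printf '%s\n' 'ZH:i:7' 'ZH:i:42' 'ZH:i:100' 'ZH:i:12345')

# {2}: exactly two digits after the first one (three digits in total).
exactly=$(printf '%s\n' "$inputs" | grep -Ex 'ZH:.:[1-9][0-9]{2}')
# {2,}: at least two digits after the first one (three or more in total).
at_least=$(printf '%s\n' "$inputs" | grep -Ex 'ZH:.:[1-9][0-9]{2,}')
# {1,2}: one or two digits after the first one (two or three in total).
between=$(printf '%s\n' "$inputs" | grep -Ex 'ZH:.:[1-9][0-9]{1,2}')

printf 'exactly {2}:\n%s\n' "$exactly"
printf 'at least {2,}:\n%s\n' "$at_least"
printf 'between {1,2}:\n%s\n' "$between"
```

With these inputs, {2} should select only ZH:i:100, {2,} should select ZH:i:100 and ZH:i:12345, and {1,2} should select ZH:i:42 and ZH:i:100.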

– RoVo
Feb 27 at 10:09

You can use regexr.com for great explanations and for testing.

– RoVo
Feb 27 at 10:14


Without a minimal example, it's hard to guess what you're after...

Anyway, this awk script might be helpful if you want to filter lines by their ZH parameter:

awk 'strtonum(gensub(/^.*ZH:.:([0-9]+).*$/, "\\1", "1"))>100' file

This prints all lines that include a field like ZH:<one character>:<some number> where the number is greater than 100.

gensub extracts the number associated with ZH, strtonum converts it to a number, and the result is compared with 100. Both functions are GNU awk (gawk) extensions.
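Where gawk is not available, the same idea can be sketched with the POSIX awk functions match and substr, and the header can be passed through in the same pass. This is only an illustration under assumptions: the tag is taken to look like ZH:<one letter>:<integer>, and the sample file, the header variable, and the record names are hypothetical.

```shell
#!/bin/sh
# Tiny stand-in for the real file (hypothetical contents):
# two header lines instead of 28, then tab-separated records.
sample=$(mktemp)
printf '@HD\tVN:1.0\n@SQ\tSN:chr1\n' > "$sample"
printf 'r1\tNM:i:0\tZH:i:100\n' >> "$sample"
printf 'r2\tZH:i:101\tNM:i:1\n' >> "$sample"
printf 'r3\tNM:i:2\n' >> "$sample"
printf 'r4\tZH:i:2500\n' >> "$sample"

result=$(awk -v header=2 '
    # Pass the header lines through unchanged.
    NR <= header { print; next }
    # Locate the ZH tag wherever it sits on the line.
    match($0, /ZH:[A-Za-z]:[0-9]+/) {
        # Skip the 5 characters of "ZH:x:" to keep only the digits.
        value = substr($0, RSTART + 5, RLENGTH - 5)
        # Adding 0 forces a numeric comparison.
        if (value + 0 > 100) print
    }
' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

On this sample the output should be the two header lines plus the r2 and r4 records; lines without a ZH tag never reach the comparison. For the real file, header=2 would become header=28.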

answered Feb 27 at 8:43

oliv
