How to print only the part of each row that starts with a particular character
I have a file with over 10,000 rows:
head samples
https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192170/type/READ_SET_FASTQ/filename/HI.2613.007.Custom_0022.ED9_SD2A27-1_180_R1.fastq.gz.md5
https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192170/type/READ_SET_FASTQ_PE/filename/HI.2613.007.Custom_0022.ED9_SD2A27-1_180_R2.fastq.gz.md5
https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192171/type/READ_SET_FASTQ/filename/HI.2613.007.Custom_0021.ED4_KS1A29-7_338_R1.fastq.gz.md5
https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192171/type/READ_SET_FASTQ_PE/filename/HI.2613.007.Custom_0021.ED4_KS1A29-7_338_R2.fastq.gz.md5
https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192168/type/READ_SET_FASTQ/filename/HI.2613.007.Index_18.ED17_MO1A26-7_353_R1.fastq.gz.md5
https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192168/type/READ_SET_FASTQ_PE/filename/HI.2613.007.Index_18.ED17_MO1A26-7_353_R2.fastq.gz.md5
https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192169/type/READ_SET_FASTQ/filename/HI.2613.007.Index_14.ED14_IA2A35-2_310_R1.fastq.gz.md5
I want to print only the part of each line that starts with "HI." (the file name, without the trailing .md5).
This is my desired output:
HI.2613.007.Custom_0022.ED9_SD2A27-1_180_R1.fastq.gz
HI.2613.007.Custom_0022.ED9_SD2A27-1_180_R2.fastq.gz
HI.2613.007.Custom_0021.ED4_KS1A29-7_338_R1.fastq.gz
HI.2613.007.Custom_0021.ED4_KS1A29-7_338_R2.fastq.gz
HI.2613.007.Index_18.ED17_MO1A26-7_353_R1.fastq.gz
HI.2613.007.Index_18.ED17_MO1A26-7_353_R2.fastq.gz
text-processing awk grep
edited Aug 13 at 17:37 by msp9011
asked Aug 13 at 17:10 by Anna1364
3 Answers
Using awk:
awk -F'/' '$NF ~ /^HI\./ { print $NF }' infile
To remove the .md5 suffix, you could do:
awk -F'(/|[.]md5)' '$(NF-1) ~ /^HI\./ { print $(NF-1) }' infile
Because each line ends in .md5, the last field is then empty and the file name is in $(NF-1).
In awk, $0 refers to the whole line (record), and $1, $2, $3, ... refer to the first, second, third, ... fields respectively. $NF refers to the last field, and accordingly $(NF-1) is the second-to-last field.
The tilde operator ~ matches the string on its left-hand side against the (extended) regular expression on its right-hand side: string ~ /regular-expression/
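To see how the field variables behave, here is a small sketch that splits one line from the question on "/" and prints NF, $(NF-1) and $NF. The function name show_fields and the variable line are made up for this illustration only.

```shell
# One sample line copied from the question's input.
line='https://genomequebec.mcgill.ca/nanuqMPS/readSetMd5Download/id/192170/type/READ_SET_FASTQ/filename/HI.2613.007.Custom_0022.ED9_SD2A27-1_180_R1.fastq.gz.md5'

# show_fields (hypothetical helper): split its argument on "/" and
# report the field count, the second-to-last field and the last field.
show_fields() {
  printf '%s\n' "$1" |
    awk -F'/' '{ print "NF=" NF; print "$(NF-1)=" $(NF-1); print "$NF=" $NF }'
}

show_fields "$line"
```

With "/" as the separator, the empty string between the two slashes of "https://" counts as a field too, so the file name ends up as the last field and "filename" as the one before it.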
The sed solution:
sed 's:.*/\([^/]*\)\.md5$:\1: ; /^HI\./!d' infile
Here /\([^/]*\)\.md5 matches the last slash followed by any run of non-slash characters that ends with .md5. The part \([^/]*\) (everything between the last slash and .md5) is captured as a group, and the replacement prints just that group through its back-reference \1.
Then /^HI\./!d deletes the lines that don't start with HI. from the result of the previous sed command.
We used a different sed delimiter, :, since the pattern contains the special / character.
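As a quick check of the two sed steps, the sketch below wraps them in a function and feeds it two made-up URLs, only one of which has a file name starting with HI. The name extract_names and the example.org URLs are assumptions for illustration; the two commands are passed as separate -e expressions, which should behave the same as the single ;-separated script.

```shell
# extract_names (hypothetical helper): read URLs on stdin and print
# the HI.* file names with the .md5 suffix removed.
extract_names() {
  sed -e 's:.*/\([^/]*\)\.md5$:\1:' -e '/^HI\./!d'
}

# Made-up input: the second line does not start with HI. after the
# substitution, so the second expression deletes it.
printf '%s\n' \
  'https://example.org/filename/HI.1.fastq.gz.md5' \
  'https://example.org/filename/XX.2.fastq.gz.md5' |
  extract_names
```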
answered Aug 13 at 17:16 by αғsнιη
edited Aug 13 at 17:49
Try this:
awk -F '/' '$NF ~ /^HI/ { print substr($NF, 1, length($NF)-4) }' file.txt
- prints the last field if it starts with HI
- drops the last 4 characters (.md5)
Output
HI.2613.007.Custom_0022.ED9_SD2A27-1_180_R1.fastq.gz
HI.2613.007.Custom_0022.ED9_SD2A27-1_180_R2.fastq.gz
HI.2613.007.Custom_0021.ED4_KS1A29-7_338_R1.fastq.gz
HI.2613.007.Custom_0021.ED4_KS1A29-7_338_R2.fastq.gz
HI.2613.007.Index_18.ED17_MO1A26-7_353_R1.fastq.gz
HI.2613.007.Index_18.ED17_MO1A26-7_353_R2.fastq.gz
HI.2613.007.Index_14.ED14_IA2A35-2_310_R1.fastq.gz
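Note that substr removes the last four characters whatever they are. If some lines might not end in .md5, a variant along these lines could be safer; it is only a sketch, with sub() swapped in for substr(), and the function name strip_md5 and the short sample paths are made up for illustration.

```shell
# strip_md5 (hypothetical helper): print the last "/"-separated field
# of lines whose file name starts with "HI.", removing a trailing
# ".md5" only when it is actually present.
strip_md5() {
  awk -F'/' '$NF ~ /^HI[.]/ { name = $NF; sub(/[.]md5$/, "", name); print name }'
}

# Made-up input: the second line has no .md5 suffix and is left whole.
printf '%s\n' 'a/b/HI.x.fastq.gz.md5' 'a/b/HI.y.fastq.gz' | strip_md5
```

With the substr version, the second sample line would lose the final four characters of its name instead.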
answered Aug 13 at 17:18 by msp9011
edited Aug 13 at 17:24
awk -F"filename/" '{ gsub(/\.md5$/, ""); print $2 }' infile
answered Aug 14 at 17:11 by kalpesh
edited Aug 14 at 17:22 by Kusalananda