Grep a range of values with specific starting characters
I have 10 GB files in which I want to count occurrences of some specific text, i.e. TY[0-9].
The file format is like:
ABC,2A,2018-07-06,2018-06-20 00:00:00
BCD,TY1,2018-07-06,2018-06-20 00:00:00
EFG,TY2,2018-07-06,2018-06-20 00:00:00
IGH,2A,2018-07-06,2018-06-20 00:00:00
I want to get the count of all text starting with TY. I tried using egrep but I am not able to get that:
egrep "^TY[0-9]" Filename
Tags: awk, grep
asked Jun 21 at 18:37
Developer
3 Answers
Accepted answer (3 votes)
Using awk to count the number of times the second comma-delimited field in the file starts with the string TY followed by a digit:
awk -F, '$2 ~ /^TY[[:digit:]]/ { n++ } END { print n }' filename
I'm wondering whether using cut in combination with grep would be quick? Cutting out the second column would give grep less data to work with, and so it may be quicker than just grep alone.
cut -d, -f2 filename | grep -c '^TY[[:digit:]]'
... but I'm not sure.
After some testing on my OpenBSD system, using a 1.1GB file, cut+grep is actually almost 50% quicker than awk (8 seconds vs. 15 seconds). And a pure grep solution (grep -Ec '\<TY[0-9]' filename, taken from glenn's solution) takes 13 seconds.
So if the string is to be picked out of the second field only, one may gain some time by extracting only that field before matching.
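Before timing anything on the real file, it may be worth checking that the three commands agree on a small sample. The following is only a sketch: the file name sample.csv and the function names are made up for illustration, the data lines are the ones from the question, and it assumes a grep that understands \< (GNU and BSD grep do, but it is not required by POSIX).

```shell
#!/bin/sh
# Hypothetical sample file built from the four lines shown in the question.
cat > sample.csv <<'EOF'
ABC,2A,2018-07-06,2018-06-20 00:00:00
BCD,TY1,2018-07-06,2018-06-20 00:00:00
EFG,TY2,2018-07-06,2018-06-20 00:00:00
IGH,2A,2018-07-06,2018-06-20 00:00:00
EOF

# awk: test only the second comma-delimited field.
# n+0 prints 0 instead of an empty line when nothing matches.
count_awk() {
    awk -F, '$2 ~ /^TY[[:digit:]]/ { n++ } END { print n+0 }' "$1"
}

# cut + grep: extract the second field, then count matching lines.
count_cut_grep() {
    cut -d, -f2 "$1" | grep -c '^TY[[:digit:]]'
}

# grep alone: word-boundary match anywhere on the line.
count_grep() {
    grep -Ec '\<TY[0-9]' "$1"
}

count_awk sample.csv       # 2
count_cut_grep sample.csv  # 2
count_grep sample.csv      # 2
```

In a shell where time is a keyword (bash, for example), something like time count_cut_grep filename should give a rough comparison on the real data; the figures will depend on the system and the grep/awk implementations.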
In your second example, why not cut -d, -f2 inputfile | grep -c [...] rather than | grep | wc -l? – DopeGhoti, Jun 21 at 18:59
@DopeGhoti Derrp. Yes. Thanks. Made it even quicker too. – Kusalananda, Jun 21 at 19:02
Answer (2 votes)
You want to use a word boundary instead of the start-of-line anchor:
$ grep -Ec '\<TY[0-9]' file
2
Note: that is a count of all lines with a "TY word". It is not a count of all "TY word"s. If you can have more than one per line, then:
$ grep -Eo '\<TY[0-9]' file | wc -l
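To illustrate the difference between the two counts, here is a small sketch using a made-up file (multi.csv, not from the question) in which one line holds two matching fields. The function names are hypothetical, and the sketch assumes a grep with -o and \< support, such as GNU or BSD grep.

```shell
#!/bin/sh
# Hypothetical input: the first line contains two "TY words".
cat > multi.csv <<'EOF'
TY1,TY2,2018-07-06
ABC,TY3,2018-07-06
IGH,2A,2018-07-06
EOF

# Number of lines that contain at least one match.
count_lines() {
    grep -Ec '\<TY[0-9]' "$1"
}

# Number of individual matches: -o prints each match on its own line,
# and wc -l counts those lines.
count_matches() {
    grep -Eo '\<TY[0-9]' "$1" | wc -l
}

count_lines multi.csv     # 2 (two lines contain a match)
count_matches multi.csv   # 3 (TY1, TY2, TY3)
```

Some wc implementations pad the number with leading spaces, so compare the result numerically rather than as a string if it is used in a script.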
Answer (1 vote)
If you want to find the number of occurrences of a ,-delimited field that consists of TY followed by one or more decimal digits, you could do:
<file perl -lne '$n += () = /(?<![^,])TY\d+(?![^,])/g; END{print 0+$n}'
Which on an input like:
TY1,TY2,TY,TYFOO
TY213,X-TY2,TY4
Would return 4 (TY1, TY2, TY213, TY4).
(?<!...) and (?!...) are respectively negative look-behind and look-ahead operators. So here, we're looking for TY followed by one or more (+) digits (\d), provided it's neither preceded nor followed by a character other than ,.
Another way to do it would be to convert ,s to newlines and count the number of resulting lines that consist of TY followed by one or more digits:
<file tr , '\n' | LC_ALL=C grep -xEc 'TY[[:digit:]]+'
(on my system, that's about 10 times as fast as the perl solution)
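As a quick check of the tr approach, the sketch below runs it on the two example lines from this answer and contrasts it with a word-boundary match. The file name fields.txt and the function names are made up for illustration, and the second function assumes a grep with -o and \< support.

```shell
#!/bin/sh
# Input taken from the example in this answer.
printf '%s\n' 'TY1,TY2,TY,TYFOO' 'TY213,X-TY2,TY4' > fields.txt

# One field per line, then count lines that are exactly TY plus digits (-x).
count_ty_fields() {
    tr , '\n' < "$1" | LC_ALL=C grep -xEc 'TY[[:digit:]]+'
}

# For contrast: a word-boundary match also counts the TY2 inside X-TY2,
# because - is not a word character.
count_ty_words() {
    grep -Eo '\<TY[0-9]+' "$1" | wc -l
}

count_ty_fields fields.txt   # 4 (TY1, TY2, TY213, TY4)
count_ty_words fields.txt    # 5 (the four above plus the TY2 in X-TY2)
```

Which of the two counts is wanted depends on whether a value such as X-TY2 should be treated as a TY field.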
Accepted answer (3 votes): answered Jun 21 at 18:47 by Kusalananda; edited Jun 21 at 19:02
Answer with 2 votes: answered Jun 21 at 18:45 by glenn jackman
Answer with 1 vote: answered Jun 21 at 18:51 by Stéphane Chazelas; edited Jun 21 at 19:03