How to make grep ignore lines without trailing newline character
I'd like to grep a file for a string, but ignore any matches on lines that do not end with a trailing newline character. In other words, if the file does not end with a newline character, I'd like to ignore the last line of the file.
What is the best way to do this?
I encountered this issue in a python script that calls grep via the subprocess
module to filter a large text log file before processing. The last line of the file might be mid-write, in which case I don't want to process that line.
Asked Jun 13 at 17:17 by dshin; edited Jun 13 at 18:25 by ilkkachu.
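A quick way to see the behaviour the question describes, using a made-up two-line input whose second line has no trailing newline (GNU grep assumed):
printf 'complete match\npartial match' | grep match
This prints both lines, including the unterminated one, which is exactly what the script needs to avoid.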
Arkadiusz Drabczyk (Jun 13 at 17:37): In practice only the last line can be missing the \n, so maybe it would be enough to ignore the last line?
dshin (Jun 13 at 17:37): What is the best way to conditionally ignore the last line?
Arkadiusz Drabczyk (Jun 13 at 17:39): Instead of doing grep string FILE, do head -n -1 FILE | grep 'string'.
dshin (Jun 13 at 17:40): That might throw out the last line incorrectly.
Arkadiusz Drabczyk (Jun 13 at 17:41): Oh, ok. We can test if the last character in the file is \n. What shell do you use?
4 Answers
Accepted answer (score 1), answered Jun 13 at 18:19 by ilkkachu, edited Jun 13 at 18:24:
grep is explicitly defined to ignore newlines, so you can't really use that. sed knows internally whether the current line (fragment) ends in a newline or not, but I can't see how it could be coerced to reveal that information. awk separates records by newlines (RS), but doesn't really care whether there was one; the default action of print is to print a newline (ORS) at the end in any case.
So the usual tools don't seem too helpful here.
However, sed does know when it's working on the last line, so if you don't mind losing the last intact line in cases where a partial one isn't seen, you could just have sed delete what it thinks is the last one. E.g.
sed -n -e '$d' -e '/pattern/p' < somefile    # or
< somefile sed '$d' | grep ...
If that's not an option, then there's always Perl. This should print only the lines that match /pattern/ and have a newline at the end:
perl -ne 'print if /pattern/ && /\n$/'
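For instance, with a two-line sample input whose second line is unterminated (made-up data, with 'match' as a stand-in pattern), the two approaches behave like this:
printf 'complete match\npartial match' | sed '$d' | grep match
printf 'complete match\npartial match' | perl -ne 'print if /match/ && /\n$/'
Both print only "complete match". As noted above, the sed variant would also drop a final line that does end in a newline.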
dshin (Jun 13 at 18:35): Thanks, I like the perl solution. It appears to be within a factor of 2 of the speed of grep, which is faster than the sed/python/gawk solutions. Is there a simple way to extend it to do the equivalent of egrep "pattern1|pattern2|pattern3"?
ilkkachu (Jun 13 at 18:39): @dshin, Perl regexes are pretty much an extension of extended regexes like what grep -E uses, so this|that works out of the box. (See this question and Perl's docs for details.)
ilkkachu (Jun 13 at 18:42): @dshin, also, since you're aiming for speed, you could try whether replacing the second regex with ... && substr($_, -1, 1) eq "\n"' would be faster.
dshin (Jun 13 at 18:44): Hm, your original is actually clocking in slightly faster for me, but they are very close.
ilkkachu (Jun 13 at 18:49): @dshin, all right, just goes to show that Perl's regex processing is pretty well optimized.
Answer (score 4), answered Jun 13 at 18:29 by Stéphane Chazelas, edited Jun 13 at 20:43:
With gawk (using EREs, similar to grep -E):
gawk '/pattern/ && RT' file
RT in gawk contains what is matched by RS, the record separator. With the default value of RS (\n), that would be \n, except for a non-delimited last record, where RT would then be empty.
With perl (Perl REs, similar to grep -P where available):
perl -ne 'print if /pattern/ && /\n\z/'
Note that contrary to gawk or grep, perl by default works on bytes, not characters. For instance, its . regexp operator would match each of the two bytes of a UTF-8-encoded character such as é. For it to work on characters as per the locale's definition of characters, like awk/grep do, you'd use:
perl -Mopen=locale -ne 'print if /pattern/ && /\n\z/'
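A quick way to see the RT behaviour on a sample input (requires gawk, since RT is a gawk extension; the data and pattern here are made up):
printf 'complete match\npartial match' | gawk '/match/ && RT'
Only "complete match" is printed, because RT is empty for the final, non-delimited record.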
Answer (score 1), answered Jun 13 at 17:58 by Arkadiusz Drabczyk:
Something like this could do the job:
#!/usr/bin/env sh
if [ "$(tail -c 1 FILE)" = "" ]
then
    printf "Trailing newline found\n"
    # grep the whole file
    # grep ....
else
    printf "No trailing newline found\n"
    # ignore the last line
    # head -n -1 FILE | grep ...
fi
We rely on the following characteristic of command substitution described in man bash:
Bash performs the expansion by executing command and replacing the command substitution with the standard output of the command, with any trailing newlines deleted.
dshin (Jun 13 at 18:02): Unfortunately, if the file is being written to, there is a potential race condition. The else-case race condition doesn't worry me too much, but the then-case race condition can lead to processing of a partial log line, which is the problem I'm trying to avoid.
dshin (Jun 13 at 18:09): I've solved this sort of race condition in similar contexts by first doing a du -b to get an exact byte size, and then doing a tail -c to only fetch that many bytes. I could do that here.
dshin (Jun 13 at 18:12): On the other hand, in practice skipping the last line unconditionally is probably going to be OK for my purposes. So if the simple trick I was hoping for doesn't exist, I might just do that.
Arkadiusz Drabczyk (Jun 13 at 18:13): @dshin: OK, I'm glad you found a solution that works for you.
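Building on the size-snapshot idea from the comments above, a minimal sketch (FILE and 'string' are placeholders; it uses head -c to pin a fixed prefix and GNU head -n -1 to drop the last line, so these are assumed variants of the tools rather than the commenter's exact commands):
size=$(wc -c < FILE)                                    # snapshot the current length in bytes
if [ "$(head -c "$size" FILE | tail -c 1)" = "" ]
then
    head -c "$size" FILE | grep 'string'                # prefix ends in a newline
else
    head -c "$size" FILE | head -n -1 | grep 'string'   # drop the partial last line
fi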
Answer (score 1), answered Jun 14 at 18:10 by thrig:
If you need speed, then using PCRE (or some other, possibly faster, regex library) from C would allow both a regular-expression match and a check for the trailing newline. Downsides: new code to maintain and debug, and time spent re-implementing portions of grep or perl, depending on the complexity of the expression and on whether features such as --only-matching are needed.
#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include <pcre.h>

#define MAX_OFFSET 3

int main(int argc, char *argv[])
{
    // getline
    char *line = NULL;
    size_t linebuflen = 0;
    ssize_t numchars;

    // PCRE
    const char *error;
    int erroffset, rc;
    int offsets[MAX_OFFSET];
    pcre *re;

    if (argc < 2) errx(1, "need regex");
    argv++;

    if ((re = pcre_compile(*argv, 0, &error, &erroffset, NULL)) == NULL)
        err(1, "pcre_compile failed at offset %d: %s", erroffset, error);

    // stop as soon as a line does not end with a newline
    while ((numchars = getline(&line, &linebuflen, stdin)) > 0) {
        if (line[numchars - 1] != '\n') break;
        rc = pcre_exec(re, NULL, line, numchars, 0, 0, offsets, MAX_OFFSET);
        if (rc > 0) fwrite(line, numchars, 1, stdout);
    }

    exit(EXIT_SUCCESS);
}
This is about 49% faster than perl -ne 'print if /.../ && /\n\z/'.
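To build and run it, something along these lines should work where the legacy PCRE (not PCRE2) development library is installed; the source, binary, and log file names here are made up:
cc -O2 -o nlgrep nlgrep.c -lpcre
./nlgrep 'pattern' < large.log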