How to make grep ignore lines without trailing newline character
I'd like to grep a file for a string, but ignore any matches on lines that do not end with a trailing newline character. In other words, if the file does not end with a newline character, I'd like to ignore the last line of the file.
What is the best way to do this?
I encountered this issue in a python script that calls grep via the subprocess
module to filter a large text log file before processing. The last line of the file might be mid-write, in which case I don't want to process that line.
Asked Jun 13 at 17:17 by dshin; edited Jun 13 at 18:25 by ilkkachu.
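A quick way to see the behaviour the question describes, using a made-up two-line input whose second line has no trailing newline (GNU grep assumed):
printf 'complete match\npartial match' | grep match
This prints both lines, including the unterminated one, which is exactly what the script needs to avoid.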
Arkadiusz Drabczyk (Jun 13 at 17:37): In practice only the last line can be missing the \n, so maybe it would be enough to ignore the last line?
dshin (Jun 13 at 17:37): What is the best way to conditionally ignore the last line?
Arkadiusz Drabczyk (Jun 13 at 17:39): Instead of doing grep string FILE, do head -n -1 FILE | grep 'string'.
dshin (Jun 13 at 17:40): That might throw out the last line incorrectly.
Arkadiusz Drabczyk (Jun 13 at 17:41): Oh, ok. We can test if the last character in the file is \n. What shell do you use?
4 Answers
Accepted answer (score 1), answered Jun 13 at 18:19 by ilkkachu, edited Jun 13 at 18:24:
grep is explicitly defined to ignore newlines, so you can't really use that. sed knows internally whether the current line (fragment) ends in a newline or not, but I can't see how it could be coerced to reveal that information. awk separates records by newlines (RS), but doesn't really care whether there was one; the default action of print is to print a newline (ORS) at the end in any case.
So the usual tools don't seem too helpful here.
However, sed does know when it's working on the last line, so if you don't mind losing the last intact line in cases where a partial one isn't seen, you could just have sed delete what it thinks is the last one. E.g.
sed -n -e '$d' -e '/pattern/p' < somefile    # or
< somefile sed '$d' | grep ...
If that's not an option, then there's always Perl. This should print only the lines that match /pattern/ and have a newline at the end:
perl -ne 'print if /pattern/ && /\n$/'
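For instance, with a two-line sample input whose second line is unterminated (made-up data, with 'match' as a stand-in pattern), the two approaches behave like this:
printf 'complete match\npartial match' | sed '$d' | grep match
printf 'complete match\npartial match' | perl -ne 'print if /match/ && /\n$/'
Both print only "complete match". As noted above, the sed variant would also drop a final line that does end in a newline.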
dshin (Jun 13 at 18:35): Thanks, I like the perl solution. It appears to be within a factor of 2 of the speed of grep, which is faster than the sed/python/gawk solutions. Is there a simple way to extend it to do the equivalent of egrep "pattern1|pattern2|pattern3"?
ilkkachu (Jun 13 at 18:39): @dshin, Perl regexes are pretty much an extension of extended regexes like what grep -E uses, so this|that works out of the box. (See this question and Perl's docs for details.)
ilkkachu (Jun 13 at 18:42): @dshin, also, since you're aiming for speed, you could try whether replacing the second regex with ... && substr($_, -1, 1) eq "\n"' would be faster.
dshin (Jun 13 at 18:44): Hm, your original is actually clocking in slightly faster for me, but they are very close.
ilkkachu (Jun 13 at 18:49): @dshin, all right, just goes to show that Perl's regex processing is pretty well optimized.
Answer (score 4), answered Jun 13 at 18:29 by Stéphane Chazelas, edited Jun 13 at 20:43:
With gawk (using EREs, similar to grep -E):
gawk '/pattern/ && RT' file
RT in gawk contains what is matched by RS, the record separator. With the default value of RS (\n), that would be \n, except for a non-delimited last record, where RT would then be empty.
With perl (Perl REs, similar to grep -P where available):
perl -ne 'print if /pattern/ && /\n\z/'
Note that contrary to gawk or grep, perl by default works on bytes, not characters. For instance, its . regexp operator would match each of the two bytes of a UTF-8-encoded character such as é. For it to work on characters as per the locale's definition of characters, like awk/grep do, you'd use:
perl -Mopen=locale -ne 'print if /pattern/ && /\n\z/'
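A quick way to see the RT behaviour on a sample input (requires gawk, since RT is a gawk extension; the data and pattern here are made up):
printf 'complete match\npartial match' | gawk '/match/ && RT'
Only "complete match" is printed, because RT is empty for the final, non-delimited record.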
Answer (score 1), answered Jun 13 at 17:58 by Arkadiusz Drabczyk:
Something like this could do the job:
#!/usr/bin/env sh
if [ "$(tail -c 1 FILE)" = "" ]
then
    printf "Trailing newline found\n"
    # grep the whole file
    # grep ....
else
    printf "No trailing newline found\n"
    # ignore the last line
    # head -n -1 FILE | grep ...
fi
We rely on the following characteristic of command substitution described in man bash:
Bash performs the expansion by executing command and replacing the command substitution with the standard output of the command, with any trailing newlines deleted.
dshin (Jun 13 at 18:02): Unfortunately, if the file is being written to, there is a potential race condition. The else-case race condition doesn't worry me too much, but the then-case race condition can lead to processing of a partial log line, which is the problem I'm trying to avoid.
dshin (Jun 13 at 18:09): I've solved this sort of race condition in similar contexts by first doing a du -b to get an exact byte size, and then doing a tail -c to only fetch that many bytes. I could do that here.
dshin (Jun 13 at 18:12): On the other hand, in practice skipping the last line unconditionally is probably going to be OK for my purposes. So if the simple trick I was hoping for doesn't exist, I might just do that.
Arkadiusz Drabczyk (Jun 13 at 18:13): @dshin: OK, I'm glad you found a solution that works for you.
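Building on the size-snapshot idea from the comments above, a minimal sketch (FILE and 'string' are placeholders; it uses head -c to pin a fixed prefix and GNU head -n -1 to drop the last line, so these are assumed variants of the tools rather than the commenter's exact commands):
size=$(wc -c < FILE)                                    # snapshot the current length in bytes
if [ "$(head -c "$size" FILE | tail -c 1)" = "" ]
then
    head -c "$size" FILE | grep 'string'                # prefix ends in a newline
else
    head -c "$size" FILE | head -n -1 | grep 'string'   # drop the partial last line
fi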
Answer (score 1), answered Jun 14 at 18:10 by thrig:
If you need speed, then using PCRE (or some other, possibly faster, regex library) from C would allow both a regular-expression match and a check for the trailing newline. Downsides: new code to maintain and debug, and time spent re-implementing portions of grep or perl, depending on the complexity of the expression and on whether features such as --only-matching are needed.
#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include <pcre.h>

#define MAX_OFFSET 3

int main(int argc, char *argv[])
{
    // getline
    char *line = NULL;
    size_t linebuflen = 0;
    ssize_t numchars;

    // PCRE
    const char *error;
    int erroffset, rc;
    int offsets[MAX_OFFSET];
    pcre *re;

    if (argc < 2) errx(1, "need regex");
    argv++;

    if ((re = pcre_compile(*argv, 0, &error, &erroffset, NULL)) == NULL)
        err(1, "pcre_compile failed at offset %d: %s", erroffset, error);

    // stop as soon as a line does not end with a newline
    while ((numchars = getline(&line, &linebuflen, stdin)) > 0) {
        if (line[numchars - 1] != '\n') break;
        rc = pcre_exec(re, NULL, line, numchars, 0, 0, offsets, MAX_OFFSET);
        if (rc > 0) fwrite(line, numchars, 1, stdout);
    }

    exit(EXIT_SUCCESS);
}
This is about 49% faster than perl -ne 'print if /.../ && /\n\z/'.
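To build and run it, something along these lines should work where the legacy PCRE (not PCRE2) development library is installed; the source, binary, and log file names here are made up:
cc -O2 -o nlgrep nlgrep.c -lpcre
./nlgrep 'pattern' < large.log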