How to make grep ignore lines without trailing newline character

I'd like to grep a file for a string, but ignore any match on a line that does not end with a trailing newline character. In other words, if the file does not end with a newline character, I'd like to ignore the last line of the file.

What is the best way to do this?

I ran into this in a Python script that calls grep via the subprocess module to filter a large text log file before processing it. The last line of the file might be mid-write, in which case I don't want to process that line.







  • In practice only the last line can be missing its \n, so maybe it would be enough to ignore the last line?
    – Arkadiusz Drabczyk
    Jun 13 at 17:37

  • What is the best way to conditionally ignore the last line?
    – dshin
    Jun 13 at 17:37

  • Instead of doing grep string FILE, do head -n -1 FILE | grep 'string'
    – Arkadiusz Drabczyk
    Jun 13 at 17:39

  • That might throw out the last line incorrectly.
    – dshin
    Jun 13 at 17:40

  • Oh, OK. We can test whether the last character in the file is \n. What shell do you use?
    – Arkadiusz Drabczyk
    Jun 13 at 17:41














asked Jun 13 at 17:17 by dshin
edited Jun 13 at 18:25 by ilkkachu











4 Answers
accepted










grep is explicitly defined to ignore newlines, so you can't really use it for this. sed knows internally whether the current line (fragment) ends in a newline, but I can't see how it could be coerced into revealing that information. awk separates records by newlines (RS) but doesn't really care whether the last one was there: print appends a newline (ORS) at the end in any case.

So the usual tools don't seem too helpful here.

However, sed does know when it's working on the last line, so if you don't mind also losing the last line when it is intact (that is, when there is no partial line), you could just have sed delete whatever it thinks is the last line. E.g.

    sed -n -e '$d' -e '/pattern/p' < somefile # or
    < somefile sed '$d' | grep ...

If that's not an option, then there's always Perl. This should print only the lines that match /pattern/ and have a newline at the end:

    perl -ne 'print if /pattern/ && /\n$/'
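Since the question mentions a Python caller, here is a minimal sketch of the same filter logic as the Perl one-liner (match the pattern and require a trailing newline), written in plain Python. It is not part of the answer: the function name `complete_matching_lines` and the sample data are made up for illustration, and it makes no claim about speed relative to grep or perl.

```python
import io
import re

def complete_matching_lines(stream, pattern):
    """Yield lines from a binary stream that match `pattern` and end with a newline."""
    regex = re.compile(pattern)
    for line in stream:
        # Only the last line can lack the trailing newline; it may be
        # mid-write, so it is skipped even if the pattern matches.
        if line.endswith(b"\n") and regex.search(line):
            yield line

# Demo on in-memory data: the last line matches but is incomplete.
sample = io.BytesIO(b"error: disk full\ninfo: ok\nerror: partial wri")
result = list(complete_matching_lines(sample, rb"error"))
print(result)  # [b'error: disk full\n']
```

Working on bytes keeps the newline check unambiguous regardless of the log's encoding.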





  • Thanks, I like the perl solution. It appears to be within a factor of 2 of the speed of grep, which is faster than the sed/python/gawk solutions. Is there a simple way to extend it to do the equivalent of egrep "pattern1|pattern2|pattern3"?
    – dshin
    Jun 13 at 18:35

  • @dshin, Perl regexes are pretty much an extension of extended regexes like the ones grep -E uses, so this|that works out of the box. (see this question and Perl's docs for details)
    – ilkkachu
    Jun 13 at 18:39

  • @dshin, also, since you're aiming for speed, you could check whether replacing the second regex with ... && substr($_, -1, 1) eq "\n"' would be faster.
    – ilkkachu
    Jun 13 at 18:42

  • Hm, your original is actually clocking in slightly faster for me, but they are very close.
    – dshin
    Jun 13 at 18:44

  • @dshin, all right, that just goes to show that Perl's regex processing is pretty well optimized
    – ilkkachu
    Jun 13 at 18:49

















With gawk (using EREs, similar to grep -E):

    gawk '/pattern/ && RT' file

RT in gawk contains the text matched by RS, the record separator. With the default value of RS (\n), that is \n, except for a non-delimited last record, where RT is empty.

With perl (perl REs, similar to grep -P where available):

    perl -ne 'print if /pattern/ && /\n\z/'

Note that, contrary to gawk or grep, perl by default works on bytes, not characters. For instance, its . regexp operator would match each of the two bytes of a UTF-8-encoded £ separately. For it to work on characters as defined by the locale, as awk and grep do, you'd use:

    perl -Mopen=locale -ne 'print if /pattern/ && /\n\z/'
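The byte-versus-character distinction can be seen with a small Python analogy. This is only an illustration using Python's `re` module, which has separate `str` and `bytes` modes; it does not show Perl's behaviour itself.

```python
import re

pound = "£"
encoded = pound.encode("utf-8")  # two bytes: 0xC2 0xA3

# On a str, "." matches once: one character.
char_matches = re.findall(r".", pound)
# On bytes, "." matches each byte separately.
byte_matches = re.findall(rb".", encoded)

print(len(char_matches), len(byte_matches))  # 1 2
```

A pattern such as `^.$` would therefore accept a line holding a single £ in character mode but reject it in byte mode.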





Something like this could do the job:

    #!/usr/bin/env sh

    if [ "$(tail -c 1 FILE)" = "" ]
    then
        printf "Trailing newline found\n"
        # grep whole file
        # grep ....
    else
        printf "No trailing newline found\n"
        # ignore last line
        # head -n -1 FILE | grep ...
    fi

We rely on the following characteristic of command substitution, described in man bash:

"Bash performs the expansion by executing command and replacing the command substitution with the standard output of the command, with any trailing newlines deleted."
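For a Python caller, the same last-byte test can be sketched without a shell. This is an illustration only: the helper name `ends_with_newline` and the file names are hypothetical, and like the shell version it says nothing about what happens if the file grows between the check and a later read.

```python
import os
import tempfile

def ends_with_newline(path):
    """Return True if the file is empty or its last byte is a newline."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            # Empty file: treated as "no partial line", like the shell test.
            return True
        f.seek(-1, os.SEEK_END)
        return f.read(1) == b"\n"

# Demo with two throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    complete = os.path.join(tmp, "complete.log")
    partial = os.path.join(tmp, "partial.log")
    with open(complete, "wb") as f:
        f.write(b"line 1\nline 2\n")
    with open(partial, "wb") as f:
        f.write(b"line 1\nline 2 (mid-wri")
    print(ends_with_newline(complete), ends_with_newline(partial))  # True False
```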







  • Unfortunately, if the file is being written to, there is a potential race condition. The else-case race condition doesn't worry me too much, but the then-case race condition can lead to processing of a partial log line, which is the problem I'm trying to avoid.
    – dshin
    Jun 13 at 18:02

  • I've solved this sort of race condition in similar contexts by first doing a du -b to get an exact byte size, and then doing a tail -c to only fetch that many bytes. I could do that here.
    – dshin
    Jun 13 at 18:09

  • On the other hand, in practice skipping the last line unconditionally is probably going to be OK for my purposes. So if the simple trick I was hoping for doesn't exist, I might just do that.
    – dshin
    Jun 13 at 18:12

  • @dshin: OK, I'm glad you found the right solution that works for you.
    – Arkadiusz Drabczyk
    Jun 13 at 18:13
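The size-snapshot idea from the comments can be sketched in Python as follows. The assumptions are mine rather than the thread's: the log is append-only, the function name `grep_complete_lines` is hypothetical, and the snapshot is read into memory for brevity, which a very large log might not allow.

```python
import os
import re
import tempfile

def grep_complete_lines(path, pattern):
    """Return matching lines from the newline-terminated part of a size snapshot."""
    regex = re.compile(pattern)
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size  # size at this moment
        data = f.read(size)                  # never read past the snapshot
    end = data.rfind(b"\n") + 1              # 0 if there is no newline at all
    # Everything after the last newline is a possibly partial line: drop it.
    lines = data[:end].split(b"\n")[:-1]
    return [line + b"\n" for line in lines if regex.search(line)]

# Demo: the last line matches but was cut off mid-write.
with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "app.log")
    with open(log, "wb") as f:
        f.write(b"error: one\ninfo: two\nerror: thr")
    print(grep_complete_lines(log, rb"error"))  # [b'error: one\n']
```

Because the check for the trailing newline and the filtering both run on the same bytes, a write that lands after the snapshot cannot slip a partial line into the result.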

















If you need speed, then using PCRE (or some other, possibly faster, regex library) from C allows both a regular expression match and a check for the trailing newline. Downsides: new code to maintain and debug, and time spent re-implementing portions of grep or perl, depending on the complexity of the expression or on whether features such as --only-matching are used.

    #include <err.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #include <pcre.h>
    #define MAX_OFFSET 3

    int main(int argc, char *argv[])
    {
        // getline
        char *line = NULL;
        size_t linebuflen = 0;
        ssize_t numchars;
        // PCRE
        const char *error;
        int erroffset, rc;
        int offsets[MAX_OFFSET];
        pcre *re;

        if (argc < 2) errx(1, "need regex");
        argv++;
        // errx rather than err: pcre_compile does not set errno
        if ((re = pcre_compile(*argv, 0, &error, &erroffset, NULL)) == NULL)
            errx(1, "pcre_compile failed at offset %d: %s", erroffset, error);

        while ((numchars = getline(&line, &linebuflen, stdin)) > 0) {
            // a line without a trailing newline can only be the last one: stop
            if (line[numchars-1] != '\n') break;
            rc = pcre_exec(re, NULL, line, numchars, 0, 0, offsets, MAX_OFFSET);
            // 0 also means a match (the offsets vector was too small for all groups)
            if (rc >= 0) fwrite(line, numchars, 1, stdout);
        }

        exit(EXIT_SUCCESS);
    }

This is about 49% faster than perl -ne 'print if /.../ && /\n\z/'.






Accepted answer (sed and Perl) by ilkkachu: answered Jun 13 at 18:19, edited Jun 13 at 18:24











Answer using gawk RT and perl by Stéphane Chazelas: answered Jun 13 at 18:29, edited Jun 13 at 20:43




















              up vote
              1
              down vote













              Something like this could do the job:



              #!/usr/bin/env sh

              if [ "$(tail -c 1 FILE)" = "" ]
              then
              printf "Trailing newline foundn"
              # grep whole file
              # grep ....
              else
              printf "No trailing newline foundn"
              # ignore last line
              # head -n -1 FILE | grep ...
              fi


              We rely on the following characteristic of command substitution
              described in man bash:




              Bash performs the expansion by executing command and replacing the
              command substitution with the standard output of the command, with any
              trailing newlines deleted.







              share|improve this answer

















              • 1




                Unfortunately if the file is being written to, there is a potential race condition. The else-case race condition doesn't worry me too much, but the then-case race condition can lead to processing of a partial log line, which is the problem I'm trying to avoid.
                – dshin
                Jun 13 at 18:02










              • I've solved this sort of race condition in similar contexts by first doing a du -b to get an exact byte size, and then doing a tail -c to only fetch that many bytes. I could do that here.
                – dshin
                Jun 13 at 18:09










              • On the other hand, in practice skipping the last line unconditionally is probably going to be ok for my purposes. So if the simple trick I was hoping for doesn't exist, I might just do that.
                – dshin
                Jun 13 at 18:12










              • @dshin: OK, I'm glad you found the right solution that works for you.
                – Arkadiusz Drabczyk
                Jun 13 at 18:13














              up vote
              1
              down vote













              Something like this could do the job:



              #!/usr/bin/env sh

              if [ "$(tail -c 1 FILE)" = "" ]
              then
              printf "Trailing newline foundn"
              # grep whole file
              # grep ....
              else
              printf "No trailing newline foundn"
              # ignore last line
              # head -n -1 FILE | grep ...
              fi


              We rely on the following characteristic of command substitution
              described in man bash:




              Bash performs the expansion by executing command and replacing the
              command substitution with the standard output of the command, with any
              trailing newlines deleted.







              share|improve this answer

















              Something like this could do the job:



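              #!/usr/bin/env sh

              # Command substitution strips trailing newlines, so the result is
              # empty when the last byte of FILE is a newline (or FILE is empty).
              if [ "$(tail -c 1 FILE)" = "" ]
              then
                  printf "Trailing newline found\n"
                  # grep whole file
                  # grep ....
              else
                  printf "No trailing newline found\n"
                  # ignore last line (a negative count requires GNU head)
                  # head -n -1 FILE | grep ...
              fi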


              We rely on the following characteristic of command substitution
              described in man bash:




              Bash performs the expansion by executing command and replacing the
              command substitution with the standard output of the command, with any
              trailing newlines deleted.
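
The check above can be wrapped in a small function. This is only a minimal sketch and not part of the original answer: the name grep_complete_lines is hypothetical, and sed '$d' is used here instead of head -n -1 to drop the last line, because negative line counts are a GNU head extension. It has the same race condition as the script above if the file is being appended to while it runs.

```shell
#!/bin/sh
# grep_complete_lines PATTERN FILE   (hypothetical helper name)
# Prints matching lines of FILE, skipping a final line that lacks a newline.
grep_complete_lines() {
    pattern=$1
    file=$2
    # Command substitution strips a trailing newline, so the result is
    # empty when the last byte is a newline (or the file is empty).
    if [ -z "$(tail -c 1 < "$file")" ]; then
        grep -e "$pattern" < "$file"
    else
        # sed '$d' deletes the last (unterminated) line before grep sees it.
        sed '$d' < "$file" | grep -e "$pattern"
    fi
}
```

It would be called as grep_complete_lines 'string' FILE, in place of grep 'string' FILE.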








              answered Jun 13 at 17:58









              Arkadiusz Drabczyk

              • Unfortunately if the file is being written to, there is a potential race condition. The else-case race condition doesn't worry me too much, but the then-case race condition can lead to processing of a partial log line, which is the problem I'm trying to avoid.
                – dshin
                Jun 13 at 18:02










              • I've solved this sort of race condition in similar contexts by first doing a du -b to get an exact byte size, and then doing a tail -c to only fetch that many bytes. I could do that here.
                – dshin
                Jun 13 at 18:09










              • On the other hand, in practice skipping the last line unconditionally is probably going to be ok for my purposes. So if the simple trick I was hoping for doesn't exist, I might just do that.
                – dshin
                Jun 13 at 18:12










              • @dshin: OK, I'm glad you found the right solution that works for you.
                – Arkadiusz Drabczyk
                Jun 13 at 18:13
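
The race discussed in these comments can be narrowed by recording the byte count once and reading only that prefix for both the newline check and the grep. The following is a minimal sketch under stated assumptions, not the commenter's actual code: it assumes the log is append-only, it uses wc -c rather than du -b to get the size, it relies on head -c (provided by GNU and BSD head, but not required by POSIX), and grep_snapshot is a hypothetical name.

```shell
#!/bin/sh
# grep_snapshot PATTERN FILE   (hypothetical helper name)
# Greps only the bytes that existed when the size was recorded.
grep_snapshot() {
    pattern=$1
    file=$2
    # Record the size once; bytes appended later are never read.
    # tr removes the padding that some wc implementations add.
    size=$(wc -c < "$file" | tr -d '[:space:]')
    # Check the last byte of the snapshot, not of the live file.
    if [ -z "$(head -c "$size" < "$file" | tail -c 1)" ]; then
        head -c "$size" < "$file" | grep -e "$pattern"
    else
        # The snapshot ends mid-line: drop that partial line.
        head -c "$size" < "$file" | sed '$d' | grep -e "$pattern"
    fi
}
```

Because both branches read the same fixed prefix, a line that is completed after the size was recorded is simply left for the next run instead of being processed half-written.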












                  If you need speed then using PCRE (or some other possibly faster regex library) from C would allow the use of both a regular expression and a check whether there is a newline. Downsides: new code to maintain and debug, and time spent re-implementing portions of grep or perl, depending on the complexity of the expression and on whether features such as --only-matching are used.



                  #include <err.h>
                  #include <stdio.h>
                  #include <stdlib.h>
                  #include <unistd.h>

                  #include <pcre.h>
                  #define MAX_OFFSET 3

                  int main(int argc, char *argv[])
                  {
                      // getline
                      char *line = NULL;
                      size_t linebuflen = 0;
                      ssize_t numchars;
                      // PCRE
                      const char *error;
                      int erroffset, rc;
                      int offsets[MAX_OFFSET];
                      pcre *re;

                      if (argc < 2) errx(1, "need regex");
                      argv++;
                      // errx rather than err: pcre_compile does not set errno
                      if ((re = pcre_compile(*argv, 0, &error, &erroffset, NULL)) == NULL)
                          errx(1, "pcre_compile failed at offset %d: %s", erroffset, error);

                      while ((numchars = getline(&line, &linebuflen, stdin)) > 0) {
                          // stop at a final line that has no trailing newline
                          if (line[numchars-1] != '\n') break;
                          rc = pcre_exec(re, NULL, line, numchars, 0, 0, offsets, MAX_OFFSET);
                          // 0 also means a match (offsets too small for all captures)
                          if (rc >= 0) fwrite(line, numchars, 1, stdout);
                      }

                      exit(EXIT_SUCCESS);
                  }



                  This is about 49% faster than perl -ne 'print if /.../ && /\n\z/'.







                  answered Jun 14 at 18:10









                  thrig

                       
