Trying to find files that contain only NULs, but getting some others

The files I am trying to find/list are:



  • Any size (0 bytes accepted)

  • Consist only of ASCII NUL characters (0x00)

  • If there are any characters other than 0x00, the file shouldn't be listed.

The command I have now is:



grep -RLP '[^\x00]' .


This works, but it also lists a file that consists of only two bytes: 0xFF, 0xFE. I don't know why.



Is there any better command to find such files?










  • Note the default system encoding for Ubuntu is UTF-8, not ASCII. Though up to byte 0x7F, they're identical.
    – wjandrea
    Aug 17 at 0:12














command-line text-processing






asked Aug 16 at 22:27 by pbies, edited Aug 17 at 1:32 by muru





















2 Answers

















Accepted answer (8 votes)










In short, grep is trying to interpret your file as Unicode data. The sequence 0xFF, 0xFE is the byte order mark (BOM) for little-endian UTF-16.

(In my testing, other sequences involving two 0xFF's or two 0xFE's etc. also failed to match the '[^\x00]' regex, since these bytes are never valid in UTF-8 and so are not treated as characters.)

Using a locale that doesn't use Unicode for character types should fix this, which you can accomplish by setting the LC_CTYPE environment variable. Use the C locale to force single-byte (ASCII) handling, with no Unicode decoding:

LC_CTYPE=C grep -RLP '[^\x00]' .
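To see why those two bytes behave differently from ordinary text, here is a minimal Python sketch. It is not how grep works internally, and the helper name is_valid_utf8 is made up for illustration; it only checks how the bytes decode:

```python
import codecs


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


suspect = b"\xff\xfe"

# The two bytes are exactly the little-endian UTF-16 byte order mark.
print(suspect == codecs.BOM_UTF16_LE)

# 0xFF and 0xFE never occur in valid UTF-8, so decoding fails.
print(is_valid_utf8(suspect))

# NUL bytes, by contrast, are valid UTF-8 (and valid ASCII).
print(is_valid_utf8(b"\x00\x00"))
```

This should print True, False, True: the file is a UTF-16 BOM, is not valid UTF-8, and a run of NULs is perfectly valid text.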



UPDATE: As pointed out by @steeldriver, grep still works line by line, so files that contain only NUL bytes and newlines will still be listed.

@DavidFoerster's solution using grep's -z option solves this problem: treating NUL bytes as the line separators does the trick.
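A rough model of the line-based behaviour, as a Python sketch. It only imitates splitting on newlines and byte-wise matching in the C locale; it is not grep's implementation, and the function name listed_in_line_mode is hypothetical:

```python
import re

# Byte-wise pattern: any byte other than NUL.
NON_NUL = re.compile(rb"[^\x00]")


def listed_in_line_mode(data: bytes) -> bool:
    # Model of a line-based "list files without a match" search:
    # the newline is only a separator, so it is never seen by the pattern.
    return not any(NON_NUL.search(line) for line in data.split(b"\n"))


print(listed_in_line_mode(b"\x00" * 8))      # only NULs
print(listed_in_line_mode(b"\xff\xfe"))      # the two-byte file from the question
print(listed_in_line_mode(b"\x00\n\x00\n"))  # NULs mixed with newlines
```

Under this model the output is True, False, True: the 0xFF 0xFE file is correctly rejected once bytes are matched individually, but the file with newlines is still listed because the newlines are consumed as separators.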



Alternatively, I came up with a short Python 3 script (allzeroes.py) to check whether the file's contents are all zeroes:



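#!/usr/bin/python3
import sys

# Expect exactly one argument: the file to check.
assert len(sys.argv) == 2
with open(sys.argv[1], 'rb') as f:
    # Read in 4 KiB blocks so large files are never loaded whole.
    for block in iter(lambda: f.read(4096), b''):
        # any() is true as soon as one byte is non-zero.
        if any(block):
            sys.exit(1)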


Which you can use with find to locate all matches recursively (the script must be executable and on your PATH, or referenced by its path):

$ find . -type f -exec allzeroes.py {} \; -print
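If you would rather avoid starting one Python process per file, the same check can be done by a single script that walks the tree itself. This is a sketch under assumptions: the names is_all_zero and find_all_zero are invented here, unreadable files are simply skipped, and the demo builds a throwaway directory with a few small files instead of scanning real data:

```python
import os
import tempfile


def is_all_zero(path, block_size=4096):
    """Return True if the file is empty or contains only 0x00 bytes."""
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            if any(block):
                return False
    return True


def find_all_zero(root):
    """Yield regular files under root that consist only of NUL bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Like `find -type f`: skip symlinks and non-regular files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                all_zero = is_all_zero(path)
            except OSError:
                # Assumption: unreadable files are silently ignored.
                continue
            if all_zero:
                yield path


# Demo on a temporary directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "empty": b"",
        "zero": b"\x00" * 100,
        "bom": b"\xff\xfe",
        "nl": b"\x00\n",
    }
    for name, content in samples.items():
        with open(os.path.join(tmp, name), "wb") as f:
            f.write(content)
    found = sorted(os.path.basename(p) for p in find_all_zero(tmp))
    print(found)
```

Only the empty file and the file of 100 NUL bytes are reported; the BOM-only file and the file containing a newline are both rejected, because every byte is compared against zero with no locale or line handling involved.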


I hope that helps.






  • +1, although since grep is line-based, this will also output files that consist entirely of newlines. You may be able to work around that by specifying null-terminated mode using -z (although that will slurp any regular text files wholly into memory). Also, I don't think -P is required here?
    – steeldriver
    Aug 17 at 1:23

















Second answer (2 votes)













You can abuse grep’s alternative null-terminated line mode and thus search for files that contain only empty lines:

grep -L -z -e . ...

Replace ... with the file set that you want to scan (here: -R .).

Explanation

  • -z, --null-data – Treat the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline. [1]

  • -e . – Use . as the search pattern, i.e. match any character.

  • -L, --files-without-match – Suppress normal output; instead print the name of each input file from which no output would normally have been printed. The scanning will stop on the first match. [1]

Test case

Set-up:

: > empty
truncate -s 100 zero
printf '%s' foo bar > foobar

Run test:

$ grep -L -z -e . empty zero foobar
empty
zero

[1] From the grep(1) manual page.
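The logic behind this trick can be modelled in a few lines of Python. This is only an illustration of the record-splitting idea, not grep's implementation, and listed_in_null_mode is a hypothetical name. One caveat, stated as an assumption rather than something from this answer: in a UTF-8 locale . may decline to match bytes that are not valid UTF-8, so combining the command with LC_ALL=C is probably the safer choice for arbitrary binary files.

```python
def listed_in_null_mode(data: bytes) -> bool:
    # Model of the null-terminated mode: records are separated by NUL bytes,
    # and the file is listed only if no record contains any byte at all.
    return not any(record for record in data.split(b"\x00"))


# The three files from the test case above, as byte strings.
test_files = {
    "empty": b"",
    "zero": b"\x00" * 100,
    "foobar": b"foobar",
}

for name, content in test_files.items():
    if listed_in_null_mode(content):
        print(name)
```

This prints empty and zero, matching the test run above. A file made of NULs and newlines is not listed under this model, because the newline now sits inside a record instead of acting as a separator.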






Accepted answer: answered Aug 16 at 23:23 by Filipe Brandenburger, edited Aug 17 at 16:16







Second answer (grep -z): answered Aug 17 at 9:18 by David Foerster



























             
