How can I run a grep on epub/mobi files?

Score: 3
Is there a way to do it, particularly on a set of multiple epub/mobi files in one directory?
Tags: grep
asked May 1 '14 at 16:51 by InquilineKea
5 Answers
Score: 6 (accepted)
You can grep these files directly by passing the -a option, which makes grep process binary files as if they were text:
grep -a "author" *.epub *.mobi
The above works on all of my 1000+ EPUB and MOBI files, giving the expected results.
EPUB and MOBI are both container formats. EPUB is essentially a .zip file with some structural requirements; MOBI is a Palm Database Format file.
Both formats allow for compressed or uncompressed data to be put in the containers.
If the data you are looking for is in a "file" within the container, and that file is compressed, you will need to provide the compressed string, not the expanded, uncompressed version of the string. In particular, if you are reading an EPUB/MOBI on an ebook reader, you will generally not find a word 'abcde' you just read by running grep -a 'abcde' on all EPUB and MOBI files, because the contents of the book are likely stored in compressed "files" in the container (not necessarily, though; compression is just an efficiency measure).
This is not a problem of grep being incapable of searching in these files, but of you not providing the correct search string. The same would happen if you read a file with Japanese text using some Japanese to English translation software and then hoped you could find the English words by grepping the original file. With -a and the correct Japanese (binary) word patterns, grep would work just fine.
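The effect described above can be reproduced without any ebook at hand. The following is a minimal sketch, not part of the original answer: it uses gzip as a stand-in for a compressed EPUB member (an assumption for illustration; ZIP archives normally use the same DEFLATE algorithm), and the helper name count_hits and the sample sentence are made up.

```shell
#!/bin/sh
# Sketch: a DEFLATE-compressed stream does not contain the plain search word.
# gzip stands in for a compressed member of an EPUB (illustrative assumption).

# count_hits WORD: read stdin as text (grep -a) and print how many lines match.
# grep exits non-zero when nothing matches, so "|| true" keeps the script going.
count_hits() {
    grep -a -c -e "$1" || true
}

# Made-up sample text containing the word we will search for.
sample='The quick brown fox abcde jumps over the lazy dog. '
text="$sample$sample$sample$sample"

plain_hits=$(printf '%s\n' "$text" | count_hits abcde)
packed_hits=$(printf '%s\n' "$text" | gzip -c | count_hits abcde)
unpacked_hits=$(printf '%s\n' "$text" | gzip -c | gzip -dc | count_hits abcde)

echo "uncompressed: $plain_hits"
echo "compressed: $packed_hits"
echo "decompressed again: $unpacked_hits"
```

The word is found in the plain text and again after decompression, but not in the compressed bytes, which is why grep -a on the container only matches data that happens to be stored uncompressed.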
answered May 2 '14 at 6:28 by Anthon
Score: 1
The epub format is a compressed binary file, so you must uncompress it before trying to parse the text. The MOBI format doesn't appear to be plain text either, so, no, I would say that epub and mobi files can't be grepped, since they are not plain text files. Use calibre or another reader that allows in-file searches.
answered May 1 '14 at 17:16 by Braiam
Score: 1
To search a compressed file you can use zgrep. This should work for epub since it is a compressed file. Here is some additional information on zgrep: http://manpages.ubuntu.com/manpages/oneiric/man1/zgrep.1.html
answered May 1 '14 at 17:25 by Andrew Stern
Comments:
The supported compressors are bzip2, gzip, lzip and xz. Neither MOBI nor EPUB files are in any of these formats. zgrep -a doesn't find anything more than plain grep -a would.
– Anthon, May 2 '14 at 6:35
This page seems to indicate that the epub is in zip format: mobileread.com/forums/showthread.php?t=31040. Also, gzip supports the zip format.
– Andrew Stern, May 2 '14 at 13:15
Of course EPUB is a zip file. gzip, however, only supports extracting zip files with a single member (read the gzip man page). Since the first file in an EPUB file, according to the standard, has to be the "mimetype" file (which has as its content the 20-byte string application/epub+zip), how is that going to help unless you search for any of those 3 words?
– Anthon, May 2 '14 at 13:32
epub is compressed in a zip format. It seems that gzip doesn't uncompress this file but unzip will. After decompression I found the text of the book in index_split_001.xhtml, but I don't know if that is true of every epub. It should be possible to unzip the contents of the file and then recompress them into a .gz file so that zgrep would work. I haven't found a simple one-line command to do this conversion.
– Andrew Stern, May 2 '14 at 13:44
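The points made in these comments can be checked from the command line. The following is a minimal sketch, assuming Info-ZIP's unzip is installed; the function names epub_mimetype and epub_grep are made up for illustration and book.epub is a placeholder file name. No recompression to .gz is needed, because unzip -p already writes the decompressed members to stdout.

```shell
#!/bin/sh
# Hypothetical helpers (the names are not from this thread); they assume
# Info-ZIP's unzip is available on PATH.

# epub_mimetype FILE: print the content of the archive member named "mimetype".
epub_mimetype() {
    unzip -p "$1" mimetype
}

# epub_grep PATTERN FILE: decompress every member to stdout, then search the
# decompressed stream. -a keeps grep from bailing out on binary members such
# as images.
epub_grep() {
    unzip -p "$2" | grep -a -e "$1"
}

# Example usage (book.epub is a placeholder):
#   unzip -l book.epub            # list the members; an EPUB has several
#   epub_mimetype book.epub       # the "mimetype" member discussed above
#   epub_grep 'abcde' book.epub   # search the decompressed book text
```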
Score: 1
This worked on Windows 7 with Cygwin; it searches text inside the zip archive:
c:> zipgrep "regex" file.epub
zipgrep is a shell script (c:/cygwin/bin/zipgrep). This also works:
c:> unzip -p "*.epub" | grep -a --color regex
-p extracts the files to a pipe (stdout).
grep-epub.sh script:
#!/bin/bash
# Usage: grep-epub PAT file.epub [more.epub ...]
PAT=${1:?"Usage: grep-epub PAT *.epub files to grep"}
shift
: ${1:?"Need epub files to grep"}
for i in "$@"; do
    echo "$0 $i"
    unzip -p "$i" "*.htm*" "*.xml" "*.opf" |  # unzip only html and content files to stdout
    perl -lpe 's![<][^>]{1,200}?[>]!!g;' |    # get rid of short html tags such as <b>
    grep -Pinaso ".{0,60}$PAT.{0,60}" |       # keep some context around matches
    grep -Pi --color "$PAT"                   # color the matches
done
answered Aug 29 '17 at 16:35 by mosh (edited Aug 30 '17 at 12:14)
Score: 0
One can combine former answers with find:
find . -name "*.epub" -exec zipgrep pattern {} \;
This way one can search in a directory tree, obviating the need for all files to be on the same directory level.
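When only the names of the matching books are wanted, the same idea can be wrapped in a small function. This is a sketch under assumptions rather than part of the answer above: the function name epub_find_matches is made up, it relies on Info-ZIP's unzip, and it pipes unzip -p into grep instead of calling zipgrep so that a single exit status decides whether a book matched.

```shell
#!/bin/sh
# epub_find_matches PATTERN [DIR]  (hypothetical helper, not from the thread)
# Print the path of every .epub below DIR (default: current directory) whose
# decompressed content matches PATTERN. Assumes Info-ZIP's unzip is installed.
# The body runs in a subshell, so its variables do not leak into the caller.
epub_find_matches() (
    pattern=$1
    dir=${2:-.}
    # find passes batches of file names to one inline sh script ("{} +").
    find "$dir" -type f -name '*.epub' -exec sh -c '
        pattern=$1
        shift
        for book do
            # -q: stop at the first match; only the exit status is needed.
            if unzip -p "$book" 2>/dev/null | grep -a -q -e "$pattern"; then
                printf "%s\n" "$book"
            fi
        done
    ' sh "$pattern" {} +
)

# Example usage (the pattern and directory are placeholders):
#   epub_find_matches 'abcde' ~/books
```

Note that unzip treats characters such as [ and * in the archive name as wildcards, so books with such characters in their file names may need renaming first.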
answered Jan 10 at 19:34 by lfd (edited 21 mins ago)