How can I run a grep on epub/mobi files?



Is there a way to do it, particularly on a set of multiple epub/mobi files in one directory?

asked May 1 '14 at 16:51

InquilineKea

grep


5 Answers

accepted

You can grep these files directly by passing the -a option, which makes grep treat binary files as text:

grep -a "author" *.epub *.mobi

The above works on all of my 1000+ EPUB and MOBI files, giving the expected results.

EPUB and MOBI are both container formats. EPUB is essentially a .zip file with some structural requirements; MOBI is a Palm Database Format file.
Both formats allow compressed or uncompressed data to be put in the container.

If the data you are looking for is in a "file" within the container and that file is compressed, you need to search for the compressed form of the string, not the expanded, uncompressed form. In particular, if you read a word 'abcde' in a book on an ebook reader, you will generally not find it by running grep -a 'abcde' on your EPUB and MOBI files, because the contents of the book are likely to be stored in compressed "files" in the container (likely, but not necessarily: compression is just an efficiency measure).

This is not a problem of grep being incapable of searching in these files, but of you not providing the correct search string. The same would happen if you read a file with Japanese text using some Japanese to English translation software and then hoped you could find the English words by grepping the original file. With -a and the correct Japanese (binary) word patterns, grep would work just fine.
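The difference between stored and compressed members can be checked with a small experiment. The following is a minimal Python sketch, not part of the original answer: it builds two tiny zip archives in memory with the standard zipfile module, one with a stored member and one with a deflated member, and tests whether a phrase is visible in the raw archive bytes, which is roughly what grep -a looks at. The member name chapter.xhtml and the sample text are made up for illustration.

```python
import io
import zipfile


def build_archive(text: str, compression: int) -> bytes:
    """Return the raw bytes of a one-member zip archive holding `text`."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=compression) as archive:
        # "chapter.xhtml" is a made-up member name for illustration.
        archive.writestr("chapter.xhtml", text)
    return buffer.getvalue()


def raw_contains(archive_bytes: bytes, needle: str) -> bool:
    """Mimic `grep -a`: look for the literal bytes in the raw file."""
    return needle.encode("utf-8") in archive_bytes


# Repetitive sample text so that deflate actually compresses it.
SAMPLE = "<p>It was a dark and stormy night.</p>\n" * 50

stored = build_archive(SAMPLE, zipfile.ZIP_STORED)
deflated = build_archive(SAMPLE, zipfile.ZIP_DEFLATED)

if __name__ == "__main__":
    print("found in stored archive:  ", raw_contains(stored, "stormy night"))
    print("found in deflated archive:", raw_contains(deflated, "stormy night"))
```

The phrase is found in the stored archive but not in the deflated one, even though both hold the same text.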

answered May 2 '14 at 6:28

Anthon


The EPUB format is a compressed binary file, so you must uncompress it before trying to parse the text. The MOBI format doesn't appear to be plain text either. So no, I would say that EPUB and MOBI files can't be grepped, since they are not plain text files. Use Calibre or another reader that allows in-file searches.

answered May 1 '14 at 17:16

Braiam


To search a compressed file you can use zgrep. This should work for epub since it is a compressed file. Here is some additional information on zgrep: http://manpages.ubuntu.com/manpages/oneiric/man1/zgrep.1.html

answered May 1 '14 at 17:25

Andrew Stern


The supported compressors are bzip2, gzip, lzip and xz. Neither MOBI nor EPUB files are in any of these formats. zgrep -a doesn't find anything more than plain grep -a would.
– Anthon
May 2 '14 at 6:35

This page seems to indicate that EPUB is in zip format: mobileread.com/forums/showthread.php?t=31040. Also, gzip supports the zip format.
– Andrew Stern
May 2 '14 at 13:15

Of course EPUB is a zip file. However, gzip only supports extracting zip files with a single member (read the gzip man page). Since the standard requires the first file in an EPUB to be the "mimetype" file (whose content is the 20-byte string application/epub+zip), how is that going to help unless you search for one of those three words?
– Anthon
May 2 '14 at 13:32

EPUB is compressed in zip format. It seems that gzip doesn't uncompress this file, but unzip will. After decompression I found the text of the book in index_split_001.xhtml, but I don't know whether that is true of every EPUB. It should be possible to unzip the contents of the file and then recompress them into a .gz file so that zgrep would work. I haven't found a simple one-line command to do this conversion.
– Andrew Stern
May 2 '14 at 13:44
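The points raised in this comment thread can be checked without unzip or gzip. Here is a hedged Python sketch using only the standard zipfile module: it lists the members of an EPUB and reads the "mimetype" entry mentioned above. The function name describe_epub and the file name book.epub are placeholders, not anything from the thread.

```python
import sys
import zipfile


def describe_epub(path):
    """Return (member names, content of the 'mimetype' member or None)."""
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
        mimetype = None
        if "mimetype" in names:
            mimetype = archive.read("mimetype").decode("ascii").strip()
        return names, mimetype


if __name__ == "__main__":
    # Usage (file name is a placeholder): python describe_epub.py book.epub
    for epub_path in sys.argv[1:]:
        names, mimetype = describe_epub(epub_path)
        print(epub_path, "mimetype:", mimetype)
        for name in names:
            print("   ", name)
```

Listing the members shows where the book text lives in a given EPUB, which, as noted above, is not guaranteed to be the same from one file to the next.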


This worked on Windows 7 with Cygwin to search text inside the zip archives:

c:> zipgrep "regex" file.epub

zipgrep is a shell script (c:/cygwin/bin/zipgrep). This also works:

c:> unzip -p "*.epub" | grep -a --color regex

-p makes unzip extract the files to a pipe (stdout).

grep-epub.sh script:

#!/bin/bash
PAT=${1:?"Usage: grep-epub PAT *.epub files to grep"}
shift
: ${1:?"Need epub files to grep"}
for i in "$@"; do
 echo "$0" "$i"
 unzip -p "$i" "*.*htm*" "*.xml" "*.opf" | # unzip only (x)html and content files to stdout
 perl -lpe 's![<][^>]{1,200}?[>]!!g;' | # get rid of small html tags such as <b>
 grep -Pinaso ".{0,60}$PAT.{0,60}" | # keep some context around matches
 grep -Pi --color "$PAT" # color the matches.
done
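For systems without unzip, perl, or a grep that supports -P, the same idea can be sketched in Python with only the standard library. This is an illustrative approximation, not a port of the script: the function name grep_epub, the suffix list, and the assumption that members are UTF-8 text are mine.

```python
import re
import zipfile

# Member suffixes to search; roughly the same file types the script targets.
TEXT_SUFFIXES = (".htm", ".html", ".xhtml", ".xml", ".opf")
# Same "strip small tags" idea as the perl step in the script.
TAG = re.compile(r"<[^>]{1,200}?>")


def grep_epub(path, pattern, context=60):
    """Yield (member name, snippet) for each case-insensitive match."""
    regex = re.compile(pattern, re.IGNORECASE)
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(TEXT_SUFFIXES):
                continue
            # Assumption: members are UTF-8; undecodable bytes are replaced.
            text = archive.read(name).decode("utf-8", errors="replace")
            text = TAG.sub("", text)
            for match in regex.finditer(text):
                start = max(match.start() - context, 0)
                end = match.end() + context
                yield name, text[start:end].replace("\n", " ")
```

Calling list(grep_epub("book.epub", "author")) would give the matching snippets with up to 60 characters of context on each side; book.epub is a placeholder name.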

edited Aug 30 '17 at 12:14

answered Aug 29 '17 at 16:35

mosh


One can combine former answers with find:

find . -name "*.epub" -exec zipgrep pattern {} \;

This way one can search in a directory tree, obviating the need for all files to be on the same directory level.
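If zipgrep is not available, a recursive search can also be sketched in Python with the standard library. This is a minimal example under stated assumptions: the function name epubs_containing is made up, the match is a case-sensitive literal rather than a regular expression, and files that are not valid zip archives are silently skipped.

```python
import zipfile
from pathlib import Path


def epubs_containing(root, needle):
    """Return paths of .epub files under `root` whose members contain `needle`."""
    needle_bytes = needle.encode("utf-8")
    matches = []
    for path in sorted(Path(root).rglob("*.epub")):
        try:
            with zipfile.ZipFile(path) as archive:
                # read() returns the decompressed bytes of each member.
                if any(needle_bytes in archive.read(name)
                       for name in archive.namelist()):
                    matches.append(path)
        except zipfile.BadZipFile:
            # Not a valid zip archive despite the extension; skip it.
            continue
    return matches
```

Like the find command, this descends into subdirectories, so the files do not all have to sit at the same level.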

edited 21 mins ago

answered Jan 10 at 19:34

lfd


