QPDF renders streams as gibberish

I have been trying a variety of programs to make a multilingual PDF (a Hebrew/English dictionary) machine readable. QPDF (as well as pretty much every other program) renders the text as gibberish. I have set --decode-level=all to no avail.

What could be the issue here?

asked Sep 16 at 16:07

Theodcyning

345

pdf conversion

1 Answer

I can't say a lot without seeing that PDF, but some basics:

A PDF contains objects, and some of those objects contain streams written in a simplified variant of PostScript, which place glyphs on a page. (You can see the objects by opening the PDF in a text editor, and if you decompress the streams, e.g. with mutool, you can read the streams in a text editor as well.)
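To make the "decompress the streams" step concrete, here is a minimal sketch in Python. It assumes the streams use the common Flate (zlib) compression and that the stream body can be found by searching for the stream/endstream keywords; the helper name inflate_streams and the tiny stand-in object are made up for illustration. Real files can use other filters, object streams or encryption, so treat this as a toy, not a parser.

```python
import re
import zlib


def inflate_streams(pdf_bytes):
    """Return the decompressed bodies of all Flate-compressed streams found."""
    results = []
    # A stream body sits between the "stream" and "endstream" keywords.
    pattern = re.compile(rb"stream\r?\n(.*?)\nendstream", re.DOTALL)
    for match in pattern.finditer(pdf_bytes):
        try:
            # decompressobj tolerates trailing bytes after the zlib data.
            results.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            # Not Flate data (or another filter is in use); skip it.
            pass
    return results


# Build a tiny stand-in for a PDF object so the sketch runs on its own.
content = b"BT /F1 12 Tf 72 720 Td <00240025> Tj ET"
compressed = zlib.compress(content)
fake_object = (
    b"4 0 obj\n<< /Length " + str(len(compressed)).encode("ascii")
    + b" /Filter /FlateDecode >>\nstream\n"
    + compressed
    + b"\nendstream\nendobj\n"
)

streams = inflate_streams(fake_object)
print(streams[0].decode("ascii"))
```

For a real file a dedicated tool is the safer route, for example qpdf --qdf --object-streams=disable in.pdf out.pdf, which writes a copy you can read in a text editor.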

It's really difficult to convert that back into the original text (I assume that's what you mean by "machine readable"), because any such attempt has to make assumptions about how the application that produced the PDF works. If that application just places glyphs in the order in which they appear in the original text, you can try to remap glyphs to characters and output the characters in that order.
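The "remap glyphs to characters" idea can be sketched like this. Everything specific here is an assumption: that the text is shown with hex strings and the Tj operator, that glyph codes are two bytes wide, and above all the GLYPH_TO_CHAR table, which is invented. For a real PDF you would have to take the table from the font's ToUnicode data, if it has any, or work it out by hand.

```python
import re

# Hypothetical glyph-code -> character table (invented for this sketch).
GLYPH_TO_CHAR = {
    0x0003: " ",
    0x0024: "A",
    0x0025: "B",
    0x05D0: "\u05d0",  # Hebrew alef
    0x05D1: "\u05d1",  # Hebrew bet
}


def remap_content_stream(stream_text, table, code_bytes=2):
    """Map the hex strings shown by Tj operators to characters, in stream order."""
    out = []
    step = code_bytes * 2  # two hex digits per byte
    for hex_string in re.findall(r"<([0-9A-Fa-f\s]+)>\s*Tj", stream_text):
        digits = re.sub(r"\s+", "", hex_string)
        for i in range(0, len(digits) - step + 1, step):
            code = int(digits[i:i + step], 16)
            # U+FFFD marks glyph codes the table does not know.
            out.append(table.get(code, "\ufffd"))
    return "".join(out)


sample = "BT /F1 12 Tf 72 720 Td <00240025> Tj <0003> Tj <05D005D1> Tj ET"
text = remap_content_stream(sample, GLYPH_TO_CHAR)
print(text)
```

Real content streams also use TJ arrays, literal strings in parentheses and a separate encoding per font, so a usable tool needs considerably more than this.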

If that program did something more complex, for example because you have two languages with different reading directions, such attempts will fail.
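A small illustration of why reading direction gets in the way. Assume, hypothetically, that the Hebrew glyphs were emitted in visual order (left to right across the page), so each Hebrew word comes out backwards when you read the stream in order. The naive fix below reverses each run of Hebrew letters. It is not the Unicode Bidirectional Algorithm, and the function names are made up.

```python
def is_hebrew(ch):
    """True for characters in the Unicode Hebrew block."""
    return "\u0590" <= ch <= "\u05ff"


def visual_to_logical(line):
    """Reverse each maximal run of Hebrew characters; leave the rest alone."""
    out = []
    run = []
    for ch in line:
        if is_hebrew(ch):
            run.append(ch)
        else:
            # A non-Hebrew character ends the current run.
            out.extend(reversed(run))
            run.clear()
            out.append(ch)
    out.extend(reversed(run))
    return "".join(out)


# "shalom" stored in visual order: final mem, vav, lamed, shin.
visual = "peace: \u05dd\u05d5\u05dc\u05e9"
logical = visual_to_logical(visual)
print(logical)
```

This only covers the simplest case. With several Hebrew words in a row the word order is still reversed, and digits, punctuation and vowel points need their own handling, which is where a custom program for your particular PDF comes in.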

So if you really need it, you'll have to look closely at how your PDF does things, and write a custom program to convert it back to text.

answered Sep 17 at 6:18

dirkt

14.9k2932
