QPDF renders streams as gibberish
I have been trying to use a variety of programs to make a multilingual PDF (a Hebrew/English dictionary) machine readable. QPDF (as well as pretty much every other program) renders the text as gibberish. I have set --decode-level=all to no avail.
What could be the issue here?
pdf conversion
asked Sep 16 at 16:07
Theodcyning
1 Answer
I can't say a lot without seeing that PDF, but some basics:
A PDF contains objects, and some objects contain streams written in a simplified variant of PostScript that places glyphs on a page. (You can see the objects by opening the PDF in a text editor, and if you decompress the streams, e.g. with mutool, you can also see the streams in a text editor.)
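To make that concrete, here is a minimal Python sketch. It is not taken from your PDF: the toy content stream, the font name /F1 and the glyph codes are made up for illustration. It compresses the stream the way a /FlateDecode filter would and then decompresses it again. What comes out is drawing operators with glyph codes, not Unicode text, which is probably why decoding the streams alone doesn't give you anything readable.

```python
import zlib

# A made-up content stream: select font F1 at 12pt, move to (72, 720),
# then show two glyphs given as hex codes. The codes are font-specific.
content = b"BT /F1 12 Tf 72 720 Td <00240025> Tj ET"

# Roughly what a PDF stores when the stream uses /Filter /FlateDecode.
stored = zlib.compress(content)

# Decompressing recovers the operators, but <00240025> is still
# a pair of glyph codes, not characters.
decoded = zlib.decompress(stored)
print(decoded.decode("ascii"))
```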
It's really difficult to convert that back into the original text (I assume that's what you mean by "machine readable"), because any such attempt has to make assumptions about how the rendering application works. If the rendering application just placed glyphs in the order in which they appear in the original text, you can try to remap glyphs to characters and output the characters in that order.
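A rough sketch of that remapping idea in Python follows. Everything specific here is an assumption for illustration: the TO_UNICODE table stands in for whatever mapping you can recover (for example from a font's /ToUnicode entry, if the PDF has one), the codes are assumed to be two bytes each, and only hex strings shown with the Tj operator are handled. A real content stream uses more operators than this.

```python
import re

# Hypothetical glyph-code -> character table for one font.
TO_UNICODE = {0x0001: "d", 0x0002: "o", 0x0003: "g"}

def extract_text(content, cmap):
    """Map glyph codes to characters, in the order they appear in the stream."""
    out = []
    # Find every hex string shown with Tj, e.g. "<000100020003> Tj".
    for hex_string in re.findall(r"<([0-9A-Fa-f]+)>\s*Tj", content):
        # Assumption: each glyph code is two bytes (four hex digits).
        for i in range(0, len(hex_string), 4):
            code = int(hex_string[i:i + 4], 16)
            # Unknown codes become U+FFFD so the gaps stay visible.
            out.append(cmap.get(code, "\ufffd"))
    return "".join(out)

stream = "BT /F1 12 Tf 72 720 Td <000100020003> Tj ET"
print(extract_text(stream, TO_UNICODE))
```

If the table is missing or wrong, you get exactly the kind of gibberish you describe, even though the glyphs look fine on screen.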
If the rendering application did something more complex, for example because the text mixes two languages with different reading directions, such attempts will fail.
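As an illustration of the reading-direction problem, assume the glyphs of a Hebrew word were placed left to right in visual order, so reading them in stream order yields the word backwards. The Python sketch below reverses each run of Hebrew letters to get back to logical order. The function name and the sample string are made up, and this is deliberately naive: it does not reorder several Hebrew words relative to each other, and it ignores digits, punctuation and vowel points, which is where a real converter gets complicated.

```python
import re

# One or more consecutive characters from the Unicode Hebrew block.
HEBREW_RUN = re.compile(r"[\u0590-\u05FF]+")

def reverse_hebrew_runs(visual):
    """Reverse each Hebrew run; leave everything else untouched."""
    return HEBREW_RUN.sub(lambda m: m.group(0)[::-1], visual)

# "shalom" as it might come out of the stream: final mem first, shin last.
visual = "\u05dd\u05d5\u05dc\u05e9 = peace"
logical = reverse_hebrew_runs(visual)
print(logical)
```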
So if you really, really need it, you'll have to look closely at how your PDF does things and write a custom program to convert it back to text.
answered Sep 17 at 6:18
dirkt