I want to get the text of a PDF/A-1b document, but the method GetPageText() returns special characters. The result is not alphanumeric. I have the same problem with the method GetPageContent().
When I try with a simple PDF I don't have this problem.
Thanks for your help.
Unfortunately, this PDF was generated without embedding the correct font encoding information (a CMap or Differences table), so there is basically nothing that can be done to associate each rendered glyph with the correct character. You will get the same result with Adobe Reader: select the text, copy it (Ctrl+C), and paste it into Notepad.
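If you need to detect such documents automatically (for example, to route them to OCR instead), a rough heuristic is to check how much of the extracted text consists of control, private-use, or unassigned characters. The sketch below is in Python and is only an illustration: `looks_garbled` and its `threshold` parameter are hypothetical names, not part of any library, and the check will not catch text that is wrong but still printable.

```python
import unicodedata

# Unicode general categories that rarely appear in genuine extracted text:
# Cc = control, Co = private use, Cn = unassigned, Cs = surrogate.
SUSPICIOUS_CATEGORIES = {"Cc", "Co", "Cn", "Cs"}


def looks_garbled(text: str, threshold: float = 0.3) -> bool:
    """Return True if more than `threshold` of the non-whitespace characters
    are control, private-use, unassigned, surrogate, or U+FFFD characters.

    Hypothetical helper; the 0.3 default is an arbitrary assumption.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        # Nothing to judge, so do not flag the text.
        return False
    suspicious = sum(
        1
        for c in chars
        if unicodedata.category(c) in SUSPICIOUS_CATEGORIES or c == "\ufffd"
    )
    return suspicious / len(chars) > threshold
```

You would pass it the string returned by your text extraction call and fall back to another approach when it returns True.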
Please let us know if you need further information.
With best regards,