Is there any method I can use to create a file in the hOCR standard?
The method "PdfAddGdPictureImageToPdfOCR(...)" returns only the recognized text.
The method "GetPageTextWithCoords(..)" from the "GdPicturePDF" class returns just the text with coordinates, but not as an hOCR file.
I need a way to extract an hOCR file from a searchable PDF. Is there any method I can use? Or should I write my own method based on the output of the methods mentioned above?
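If you end up writing your own method, the words and coordinates you get back (for example from GetPageTextWithCoords) have to be mapped onto the hOCR structure. Here is a minimal Python sketch of the writing side only, assuming the words are already grouped into lines and that each bounding box is in pixels with the origin at the top-left corner, which is what hOCR's `bbox` property expects. PDF coordinates usually have a bottom-left origin, so the y values may need flipping first (y_top = page_height - y). `Word` and `words_to_hocr` are hypothetical names; parsing the actual GdPicture output is not shown because its exact format isn't described here.

```python
from html import escape
from typing import List, NamedTuple, Tuple


class Word(NamedTuple):
    """One word; bbox in pixels, origin at the top-left (hOCR convention, assumed input)."""
    text: str
    x0: int
    y0: int
    x1: int
    y1: int


def union_bbox(words: List[Word]) -> Tuple[int, int, int, int]:
    """Smallest box enclosing all words of a line."""
    return (min(w.x0 for w in words), min(w.y0 for w in words),
            max(w.x1 for w in words), max(w.y1 for w in words))


def words_to_hocr(lines: List[List[Word]], page_width: int, page_height: int) -> str:
    """Build a minimal single-page hOCR (XHTML) document from pre-grouped lines of words."""
    out = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">',
        '<head>',
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />',
        '<meta name="ocr-system" content="custom-export" />',
        '<meta name="ocr-capabilities" content="ocr_page ocr_line ocrx_word" />',
        '</head>',
        '<body>',
        f'<div class="ocr_page" id="page_1" title="bbox 0 0 {page_width} {page_height}">',
    ]
    word_id = 0
    for line_no, line in enumerate(lines, start=1):
        if not line:
            continue  # skip empty lines instead of emitting an empty bbox
        x0, y0, x1, y1 = union_bbox(line)
        out.append(f'<span class="ocr_line" id="line_1_{line_no}" '
                   f'title="bbox {x0} {y0} {x1} {y1}">')
        for w in line:
            word_id += 1
            # escape() keeps characters such as & and < from breaking the XHTML
            out.append(f'<span class="ocrx_word" id="word_1_{word_id}" '
                       f'title="bbox {w.x0} {w.y0} {w.x1} {w.y1}">{escape(w.text)}</span>')
        out.append('</span>')
    out += ['</div>', '</body>', '</html>']
    return "\n".join(out)
```

Usage: `words_to_hocr([[Word("Hello", 10, 20, 60, 40)]], 200, 100)` returns the hOCR text, which you can then write to a `.hocr` file.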
Is there any plan to allow hOCR file creation with Tesseract? I've tried to use gdp.OCRTesseractSetVariable("tessedit_create_hocr", "1"); but it doesn't seem to save the file anywhere.
This functionality may not be too hard to implement, considering Tesseract already does it.
Thank you and have a nice weekend!
The feature is on our wish list, but we have not given it a high priority, so at the moment I cannot give a release date.
Could you describe what you want to do with the hOCR output?
Please note that you can access the full text by means of GetPageText:
http://guides.gdpicture.com/content/web ... eText.html
GetPageText will not retrieve all of the details an hOCR file may contain, but the plain text may be sufficient for your needs. Please note that GetPageText works with searchable PDFs and also with text PDFs.
- Extract the text layer in hOCR format.
- Manipulate the results.
- Create a searchable PDF from the hOCR file. There would be no need to run OCR again at that point, so it's a performance gain.
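For the "manipulate the results" step, a minimal Python sketch using the standard-library `html.parser` shows how the words and their `bbox` values can be read back out of an hOCR document. It assumes word-level output marked with the `ocrx_word` class (as Tesseract's hOCR output uses); `HocrWordParser` and `parse_hocr_words` are hypothetical helpers, not part of GdPicture.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional, Tuple

BBox = Tuple[int, int, int, int]
# Matches the bbox property inside a title attribute, e.g. "bbox 10 20 60 40; x_wconf 93"
BBOX_RE = re.compile(r"\bbbox\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)")


class HocrWordParser(HTMLParser):
    """Collects (text, bbox) pairs for every ocrx_word element."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True turns &amp; back into &
        self.words: List[Tuple[str, Optional[BBox]]] = []
        self._bbox: Optional[BBox] = None
        self._depth = 0  # >0 while inside an ocrx_word element (counts nested tags)
        self._parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            self._depth += 1  # nested markup such as <strong> inside a word
            return
        attr_map = dict(attrs)
        if "ocrx_word" in (attr_map.get("class") or "").split():
            m = BBOX_RE.search(attr_map.get("title") or "")
            self._bbox = tuple(int(v) for v in m.groups()) if m else None
            self._depth = 1
            self._parts = []

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:  # closed the ocrx_word element itself
                self.words.append(("".join(self._parts).strip(), self._bbox))

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)


def parse_hocr_words(hocr: str) -> List[Tuple[str, Optional[BBox]]]:
    """Return the words of an hOCR document with their bounding boxes, in document order."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words
```

The resulting list can be filtered, corrected or re-positioned in plain Python before being written back out as hOCR.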
Also, none of the providers I know of support this, or at least not well. That's a sweet spot for GdPicture to exploit.
I'm confident you could get this done with minimal effort, as Tesseract already supports it and you offer a wrapper around its libraries.
GdPicture now offers a completely new class for OCR: https://guides.gdpicture.com/content/we ... reOCR.html
Unfortunately, we do not provide the hOCR option, and we have no plans to support it in the short or medium term.