OCR output differences between v11 & v14

Discussions about Tesseract OCR integration in GdPicture.
RandallB
Posts: 1
Joined: Thu May 31, 2018 6:54 pm

Post by RandallB » Thu May 31, 2018 7:24 pm

We recently upgraded from version 11 to version 14, and I'm in the process of updating our source code to conform to the new changes. I've run a test comparing the OCR between v11 and v14 and found that the output from v14 is significantly poorer. I've attached a zip file containing text files labeled OCR-v11.txt and OCR-v14.txt. I've also included a copy of the function I modified for v14, along with a screenshot showing some of the differences between the two text files. There is a ~40K difference in the text extracted between the two files. Beyond what the C# function shows, I'm using whatever the installed library settings are. Since there is a large gap between the two versions, I assume some defaults have changed along the way. Please advise on how to make the OCR output of v14 at least equivalent to that of v11, or preferably better. I will not be able to upgrade our software until we resolve this.
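A difference like this can be put into numbers before filing a report. The following is a minimal Python sketch, independent of GdPicture and not part of the attached archive, that compares two OCR text dumps by size and by word-level similarity using the standard `difflib` module. The function names `compare_ocr_text` and `compare_ocr_files` are hypothetical, and reading the files as UTF-8 is an assumption about how the text was saved.

```python
import difflib
from pathlib import Path


def compare_ocr_text(old_text: str, new_text: str) -> dict:
    """Summarize how two OCR outputs differ in size and word content."""
    # Compare on whitespace-separated words so layout changes
    # (extra spaces, different line breaks) do not dominate the score.
    old_words = old_text.split()
    new_words = new_text.split()
    # autojunk=False disables difflib's heuristic that ignores very
    # frequent items, which would otherwise skew results on long texts.
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    return {
        "old_chars": len(old_text),
        "new_chars": len(new_text),
        "char_delta": len(new_text) - len(old_text),
        "word_similarity": matcher.ratio(),  # 1.0 means identical word sequences
    }


def compare_ocr_files(old_path: str, new_path: str) -> dict:
    """Read two text files (assumed UTF-8) and compare their contents."""
    old_text = Path(old_path).read_text(encoding="utf-8", errors="replace")
    new_text = Path(new_path).read_text(encoding="utf-8", errors="replace")
    return compare_ocr_text(old_text, new_text)
```

With the two files from the archive extracted to the working directory, `compare_ocr_files("OCR-v11.txt", "OCR-v14.txt")` would return the character counts and a similarity ratio between 0 and 1. The word-level comparison can be slow on very large files, so it may be worth running it on a few pages at a time.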

Please contact me if you need any further information or files. (The original PDF I'm using for the test is too large to upload to the site.)

Thank you...
Attachments
OCRTest.7z
(157.39 KiB)

Loïc
Site Admin
Posts: 5592
Joined: Tue Oct 17, 2006 10:48 pm
Location: France

Re: OCR output differences between v11 & v14

Post by Loïc » Sun Jun 03, 2018 5:59 am

Hello,

Version 14 improved OCR accuracy and speed overall, so you should first check that you are using our latest release. If the problem persists, we will need the input document to be able to reproduce the issue. You can share it through our helpdesk, which can be reached here: http://support.orpalis.com

With best regards,

Loïc
