Hi
I have been having problems performing OCR on multiple documents: the results I get differ depending on the order in which the documents are processed.
I believe this is caused by the "natural learning algorithm" (the engine adapting to what it has already read) that the Tesseract engine uses, as mentioned in other posts.
I am using the ActiveX version, which does not have an option to "clear" the "learnt" information. As a result, I assume, it learns from previous documents that can differ in size, font, quality, etc. This makes the results vary apparently at random, and quite significantly, depending on what has been read before.
This is far from ideal: you really want to get the same OCR results every time the same document is read!
Has this been fixed in a later version, or is there a workaround for this problem?
Thanks
Nigel
