Hello,
I saw the thread below this one asking about detecting the page language of a document being OCR'd, along with the admin's reply saying they have not looked into this feature. So I assume it does not exist in the current version of the Tesseract OCR engine plugin.
I guess I will have to come up with some way to automate that part of the OCR process. Does anyone have any neat tricks for automatically detecting what language a document is in? We will be OCR'ing hundreds of documents at a time, and they usually come from all over the world. If possible, I'd like to detect each document's language and then OCR it using the matching dictionary.
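One common trick is a two-pass approach: run a quick first OCR pass, guess the language from the recognized text by counting common function words ("stopwords"), then re-run OCR with that language. Below is a minimal sketch of the guessing step only. The stopword lists are tiny, hand-picked examples (an assumption for illustration, not a real dataset), and `guess_language` is a hypothetical helper, not part of the plugin. The keys are Tesseract's traineddata names (`eng`, `fra`, `deu`, `spa`), so the result could be passed to the `tesseract` command line via `-l`.

```python
import re

# Tiny, illustrative stopword lists keyed by Tesseract traineddata names.
# A real setup would need much larger lists (or a dedicated
# language-identification library) and more languages.
STOPWORDS = {
    "eng": {"the", "and", "of", "to", "is", "in", "that", "it", "for", "with"},
    "fra": {"le", "la", "les", "et", "des", "est", "une", "dans", "pour", "que"},
    "deu": {"der", "die", "und", "das", "ist", "nicht", "mit", "ein", "zu", "den"},
    "spa": {"el", "los", "las", "y", "es", "una", "por", "con", "para", "que"},
}


def guess_language(text: str, default: str = "eng") -> str:
    """Return the language whose stopwords occur most often in `text`.

    Falls back to `default` when no stopword from any list is found
    (e.g. the first OCR pass produced only numbers or garbage).
    """
    # Split into runs of Unicode letters (no digits or underscores).
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(1 for w in words if w in stops)
        for lang, stops in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default
```

For example, `guess_language("Le chat est dans la maison")` returns `"fra"`. Stopword counting is crude and works best on a page or more of text; short or badly recognized first passes may be misclassified.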
Thanks,
Ryan
