If so, the duplicate text comes from the fact that you are extracting both the original text and the OCR text (which is written transparently on top of the image).
Running OCR on a document that is not an image is not really useful, since it already contains extractable text in the first place.
To avoid text duplication, you can either convert your source PDF to an image (this is called rasterization) and then run OCR on it, or extract the text directly without running OCR.
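As a rough illustration of that either/or choice, here is a minimal Python sketch. It is not tied to any particular PDF or OCR library: `choose_text`, `run_ocr`, `min_chars` and `fake_ocr` are hypothetical names made up for this example, and you would plug in your own text extraction and OCR calls.

```python
from typing import Callable


def choose_text(native_text: str, run_ocr: Callable[[], str], min_chars: int = 1) -> str:
    """Return the page's own text if it has any, otherwise the OCR result.

    Only one source is ever returned, so the same words cannot show up twice.
    """
    if len(native_text.strip()) >= min_chars:
        # The page already has extractable text: skip OCR entirely.
        return native_text
    # No usable text (e.g. a scanned page): fall back to OCR on the rasterized page.
    return run_ocr()


# Stand-in OCR function (hypothetical); it records each call so we can see
# that OCR only runs when the page has no text of its own.
calls = []


def fake_ocr() -> str:
    calls.append("ocr")
    return "text recognized from scan"


digital = choose_text("Invoice 42", fake_ocr)  # page with real text
scanned = choose_text("   ", fake_ocr)         # page with no usable text
print(digital)
print(scanned)
```

The OCR step is passed in as a callable so that it only runs when it is actually needed. Raising `min_chars` is one possible way to treat pages with only a few stray characters as scans, but that threshold is an assumption you would have to tune for your documents.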