Here is a new version (VB.NET & C#) based on the built-in multitasking support of GdPicture 11.
Hi there,
In response to many customer requests, we are providing a VB.NET demo application that converts a multipage TIFF document to PDF/OCR using a predefined number of threads.
The app was created using Visual Studio 2010 (VB.NET).
- Expects the user to provide a multipage TIFF to convert to PDF/OCR, a valid dictionary path, and a language (default is English).
- Splits the input TIFF document into several TIFFs (1 file = 1 page).
- Performs OCR in multi-threaded mode (1 page = 1 thread) and creates 1 PDF per page.
- When OCR is done, merges the produced PDFs into a single PDF.
- Visual Studio 2010 or higher.
- Install GdPicture.NET 8.4.3 or higher.
- Open the app and replace "XXX" with a valid trial or commercial key.
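The split / OCR-per-page / merge flow listed above can be sketched roughly as follows. This is not the demo's actual code: it swaps in .NET 4's Parallel.For with a capped degree of parallelism in place of whatever threading the demo uses, and OcrPageToPdf, maxThreads and the file names are hypothetical placeholders. No GdPicture calls are shown; the real OCR and merge steps would go where the comments indicate.

```vbnet
Imports System
Imports System.IO
Imports System.Threading.Tasks

Module PipelineSketch
    ' Hypothetical placeholder: the real demo would OCR one page here and
    ' write a one-page PDF. This sketch only derives the output file name.
    Function OcrPageToPdf(ByVal pageTiffPath As String) As String
        Return Path.ChangeExtension(pageTiffPath, ".pdf")
    End Function

    Sub Main()
        ' Assumed user setting: upper bound on pages processed at once.
        Dim maxThreads As Integer = 4

        ' Stand-in for the one-page TIFFs produced by the split step.
        Dim pageTiffs As String() = {"page0001.tif", "page0002.tif", "page0003.tif"}
        Dim pagePdfs(pageTiffs.Length - 1) As String

        Dim options As New ParallelOptions()
        options.MaxDegreeOfParallelism = maxThreads

        ' Each page is processed independently. Writing the result to slot i
        ' keeps the outputs in page order regardless of which page finishes first.
        Parallel.For(0, pageTiffs.Length, options,
                     Sub(i)
                         pagePdfs(i) = OcrPageToPdf(pageTiffs(i))
                     End Sub)

        ' The merge step would consume pagePdfs here, in page order.
        Console.WriteLine(String.Join(", ", pagePdfs))
    End Sub
End Module
```

Storing each result by page index, instead of appending to a shared list, avoids both locking and any dependence on completion order.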
Feel free to post any question or comment.
- the app.
Thank you very much. This looks fantastic. Just one question. I am a bit confused with the license numbers. In your example, you use "oLicenseManager.RegisterKEY"... but I received TWO keys, one for GdPicture Image and another for the Tesseract Add-On. What is the correct way to set my license?
Just call RegisterKEY once for each of your license keys. The order does not matter.
I did not purchase the PDF Add-On. When it tries the "oGdPicturePDF.MergeDocuments(files, fileDest)" I get a message that I'm not licensed for the PDF Add-On. What would be the cleanest way to do this without the Add-On?
Unfortunately, there is no way to get this sample working without the GdPicture PDF plugin. I am sorry, I forgot to specify that.
In the next minor release, MergeDocuments() will generate PDF/A according to the input documents. I will soon upload a modified version of the demo to show how it is used.
Please find attached the version that supports PDF/A as output. It requires GdPicture.NET 8.5.15 or higher.
- Multi-thread TIFF to PDF-OCR PDFA.zip
- Multithread TIFF 2 PDF/OCR with PDF/A support.
When I extract and open this project, the "modGlobals.vb" file is missing.
Also, in the previous version, "MultiPageOCRThreading.zip", I found that the pages were not being processed in the proper order. The problem was in the "cmdRun_Click" event. When storing the individual pages, the sort order for the files goes off track if there are more than 9 pages, because the page numbers in the file names are not zero-padded.
I was able to correct this by adding a "Format" call to the "SaveAsTIFF" statement, as follows:
oGdPictureImaging.SaveAsTIFF(tiffID, tmp_path + "\page" + Format(i, "0000").ToString + ".tif", GdPicture.TiffCompression.TiffCompressionAUTO)
This way, the files are sorted in the correct page order.
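To illustrate the ordering problem and the fix, here is a small self-contained sketch that does not use GdPicture at all. It builds file names for 11 pages with and without zero padding and sorts both lists by name. The "page" prefix and ".tif" extension follow the statement above; the assumption is that the per-page files are later picked up in name order.

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Module PaddingSketch
    Sub Main()
        Dim plain As New List(Of String)
        Dim padded As New List(Of String)

        For i As Integer = 1 To 11
            ' Unpadded name, as in the original behaviour described above.
            plain.Add("page" & i.ToString() & ".tif")
            ' Zero-padded name, as in the corrected SaveAsTIFF call.
            padded.Add("page" & Format(i, "0000") & ".tif")
        Next

        ' Sort by name, character by character.
        plain.Sort(StringComparer.Ordinal)
        padded.Sort(StringComparer.Ordinal)

        ' Unpadded: page1.tif page10.tif page11.tif page2.tif ... page9.tif
        Console.WriteLine(String.Join(" ", plain.ToArray()))
        ' Padded: page0001.tif page0002.tif ... page0011.tif
        Console.WriteLine(String.Join(" ", padded.ToArray()))
    End Sub
End Module
```

With four digits of padding, name order matches page order for documents of up to 9999 pages.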
One more thing. Do you have any experience with the new .NET Framework 4 "System.Threading.Tasks" namespace or TaskFactory class? It seems to be very powerful, and hopefully, easier to implement?
Dim taskA = _
... multithreaded statements ...
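As a rough idea of what the Task-based approach could look like, here is a minimal sketch using Task.Factory.StartNew and Task.WaitAll from System.Threading.Tasks. ProcessPage is a hypothetical stand-in for the per-page OCR work, and the page count is made up; nothing here is taken from the demo or from the GdPicture API.

```vbnet
Imports System
Imports System.Threading.Tasks

Module TaskSketch
    ' Hypothetical stand-in for the per-page OCR work.
    Function ProcessPage(ByVal pageNumber As Integer) As String
        Return "page" & pageNumber.ToString("0000") & ".pdf"
    End Function

    Sub Main()
        Dim pageCount As Integer = 3
        Dim tasks(pageCount - 1) As Task(Of String)

        For i As Integer = 1 To pageCount
            ' Copy the loop variable so each lambda captures its own value.
            Dim pageNumber As Integer = i
            tasks(i - 1) = Task.Factory.StartNew(Function() ProcessPage(pageNumber))
        Next

        ' Block until every page has been processed.
        Task.WaitAll(tasks)

        ' Results are read in page order because the array is indexed by page.
        For Each t As Task(Of String) In tasks
            Console.WriteLine(t.Result)
        Next
    End Sub
End Module
```

Compared with managing Thread objects by hand, the tasks are scheduled on the thread pool and each result is available through the Result property, so no shared state is needed to collect the outputs.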
Are there any updates to this demo app? I am looking for a way to OCR a PDF, rather than a multipage TIFF, using multithreading or parallel processing. With .NET 4.0, is there any plan to handle OCR of a multipage document (PDF) in parallel mode internally in the toolkit? Something like this would be a great feature.