Sample Program doesn't produce searchable PDF file

Support for GdPicture Tesseract Plugin.

Postby luke92 » Tue Apr 05, 2011 2:51 pm

I tried to convert a PDF file to a PDF-OCR file using the "PDF to PDF-OCR" sample from GdPicture.NET.
I was able to produce a file, but its text isn't searchable. Do I have to modify the sample program to make it work?

Thanks for your help.
luke92
 
Posts: 6
Joined: Tue Apr 05, 2011 2:45 pm

Re: Sample Program doesn't produce searchable PDF file

Postby Loïc » Tue Apr 05, 2011 2:58 pm

Hi,

Please send the resulting PDF to http://support.gdpicture.com for investigation.

If you provided the correct dictionary path, the program should work.

Kind regards,

Loïc
Loïc Carrère, support team.
www.orpalis.com
Loïc
Site Admin
 
Posts: 4437
Joined: Tue Oct 17, 2006 10:48 pm
Location: France

Re: Sample Program doesn't produce searchable PDF file

Postby luke92 » Tue Apr 05, 2011 3:16 pm

I used the standard dictionary path C:\Programme\GdPicture.NET\Redist\Commons\OCR
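
A quick way to rule out the dictionary path is a small check like the one below. This is only a sketch: it assumes that "the folder exists and is not empty" is a reasonable first sanity check, and it does not verify which language files the OCR plugin actually needs. The class and method names are made up for illustration and are not part of GdPicture.

```csharp
using System;
using System.IO;

static class DictionaryPathCheck
{
    // Hypothetical helper: returns true if the folder exists and
    // contains at least one file. It does not inspect the files.
    public static bool LooksUsable(string dictionaryPath)
    {
        return Directory.Exists(dictionaryPath)
            && Directory.GetFiles(dictionaryPath).Length > 0;
    }

    static void Main()
    {
        // Path taken from the post above; adjust it to your installation.
        string path = @"C:\Programme\GdPicture.NET\Redist\Commons\OCR";

        Console.WriteLine(LooksUsable(path)
            ? "Dictionary folder exists and contains files."
            : "Dictionary folder is missing or empty.");
    }
}
```

If the check reports a missing or empty folder, the path is the first thing to fix. If it passes, the problem is more likely elsewhere.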
luke92
 
Posts: 6
Joined: Tue Apr 05, 2011 2:45 pm

Re: Sample Program doesn't produce searchable PDF file

Postby Loïc » Tue Apr 05, 2011 3:17 pm

OK. Please send the resulting PDF for investigation.
Loïc Carrère, support team.
www.orpalis.com
Loïc
Site Admin
 
Posts: 4437
Joined: Tue Oct 17, 2006 10:48 pm
Location: France

Re: Sample Program doesn't produce searchable PDF file

Postby ryancole11 » Wed May 04, 2011 9:15 pm

Can you please keep me informed about this? I am also trying to use that example code to turn a non-searchable PDF into an OCR'd, searchable PDF, and it is not producing one. The output is a PDF/A, but it has no embedded text. I know the OCR step is at least running with the dictionary files, because each page takes a couple of seconds to process. An example PDF should not be needed, because this fails for every PDF I have tested.

I am using C# and the .NET version of GdPicture Pro and Tesseract. Here's my code:

http://dpaste.org/uLWu/

Code:
String dictionaries = Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location) + @"\dictionaries";

// open the new pdf in the viewer
viewer.DisplayFromFile(out_file);

for (int x = 1; x <= viewer.PageCount; x++)
{
   Console.WriteLine("Performing image twain on page {0}", x);

   viewer.DisplayFrame(x);
   Int32 rasterized_page = viewer.GetNativeImage();

   if (x == 1)
      imaging.TwainPdfOCRStartEx(String.Format("{0}.ocr.pdf", out_file), "", "", "", "", "", PdfEncryption.PdfEncryptionNone, PdfRight.PdfRightCanModify);

   imaging.TwainAddGdPictureImageToPdfOCR(rasterized_page, TesseractDictionary.TesseractDictionaryEnglish, dictionaries);
}

// close the twaining
imaging.TwainPdfOCRStop();
viewer.CloseImage();
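
To confirm from code that an output file has no text layer, a rough check like the following may help. It is a heuristic sketch, not a real PDF parser: it assumes that OCR text needs a font resource, and that the page resource dictionaries are stored uncompressed, so the `/Font` name shows up in the raw bytes. A file that uses compressed object streams could give a false negative. The class and method names are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class PdfTextLayerCheck
{
    // Hypothetical heuristic: true if the raw bytes contain the "/Font" name.
    public static bool MentionsFont(byte[] pdfBytes)
    {
        // ISO-8859-1 maps every byte to exactly one char, so binary
        // stream data cannot break the string search.
        string raw = Encoding.GetEncoding("ISO-8859-1").GetString(pdfBytes);
        return raw.IndexOf("/Font", StringComparison.Ordinal) >= 0;
    }

    static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("usage: PdfTextLayerCheck <file.pdf>");
            return;
        }

        byte[] bytes = File.ReadAllBytes(args[0]);
        Console.WriteLine(MentionsFont(bytes)
            ? "Found a /Font entry; the file may contain text."
            : "No /Font entry found; the file is probably image-only.");
    }
}
```

Running it against both the input and the `.ocr.pdf` output shows whether the OCR pass added any font resources at all.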
ryancole11
 
Posts: 17
Joined: Fri May 21, 2010 7:19 pm

Re: Sample Program doesn't produce searchable PDF file

Postby Loïc » Thu May 05, 2011 6:24 pm

Hi,

Please send a standalone application that reproduces the issue, along with the input and output PDFs, to http://support.gdpicture.com

Kind regards,

Loïc
Loïc Carrère, support team.
www.orpalis.com
Loïc
Site Admin
 
Posts: 4437
Joined: Tue Oct 17, 2006 10:48 pm
Location: France

Re: Sample Program doesn't produce searchable PDF file

Postby ryancole11 » Thu May 05, 2011 6:25 pm

Alright, give me about 30 minutes. I'm in the middle of something at the moment.
ryancole11
 
Posts: 17
Joined: Fri May 21, 2010 7:19 pm

