Image cleaning before create searchable PDF

Discussions about Tesseract OCR integration in GdPicture.
metrofile
Posts: 1
Joined: Wed Aug 03, 2016 7:55 pm

Image cleaning before create searchable PDF

Post by metrofile » Thu Aug 04, 2016 7:45 pm

Hi.

How can I apply image-cleaning steps such as despeckle before the OCR to get the best possible result, and then undo the cleaning once the OCR is done?

I'm asking because we are not allowed to change the customer's image in any way. I need to keep the original image in the PDF, but still use cleanup steps to get a better OCR result.

David
Posts: 66
Joined: Mon Feb 08, 2016 3:12 pm

Re: Image cleaning before create searchable PDF

Post by David » Thu Aug 11, 2016 9:11 am

Hi,

The development team has implemented a new feature specifically for this case. The latest GdPicture.NET release fires an event when the image to be submitted to the OCR engine is ready.

You can handle this event and modify the image before it is passed to the OCR engine.
In a nutshell:
''' <summary>
''' Occurs when a bitmap is ready to be sent to the OCR engine. This event is fired after the BeforePageOcr() event.
''' This event is particularly useful to apply custom pre-processing to the image before recognizing it.
''' </summary>
''' <param name="PageNo">The number of the page to be processed.</param>
''' <param name="ImageID">A GdPicture image identifier. The bitmap that will be used by the OCR engine.</param>
''' <remarks>The event will be always triggered within the caller thread of the OcrPages() method.</remarks>
Public Event PageBitmapOcrReady(ByVal PageNo As Integer, ByVal ImageID As Integer)

For instance, here is a slight modification of the PDF to PDF OCR sample that implements an "invisible despeckle": the engine reads a cleaned-up image while the original image is written to the PDF.

Code:

private void Form1_Load(object sender, System.EventArgs e)
{
  // Go to http://www.gdpicture.com/download-gdpicture/ to get a 1 month trial key unlocking all features of the toolkit.
  LicenseManager oLicenseManager = new LicenseManager();
  // Please replace XXXX with a valid demo or commercial license key.
  oLicenseManager.RegisterKEY("XXXX");
  txtDictsPath.Text = oLicenseManager.GetRedistPath() + "OCR\\";
  _nativePdf.OcrPagesProgress += this.OcrPagesProgress;
  _nativePdf.BeforePageOcr += this.BeforePageOcr;
  _nativePdf.OcrPagesDone += this.OcrPagesDone;
  // New event: fired when the bitmap for the OCR engine is ready.
  _nativePdf.PageBitmapOcrReady += this.PageBitmapOcrReady;
}

// Despeckles the bitmap that will be used by the OCR engine and reports any failure.
private void PageBitmapOcrReady(int pageNo, int ImageID)
{
  GdPictureImaging oImaging = new GdPictureImaging();
  GdPictureStatus status = oImaging.FxDespeckle(ImageID);
  if (status != GdPictureStatus.OK)
  {
    MessageBox.Show("An error occurred on page " + pageNo + ". Status: " + status, "error", MessageBoxButtons.OK, MessageBoxIcon.Stop);
  }
}
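
If you need more than one cleanup step, the handler could run them in sequence and stop at the first failure. The following is only a rough sketch and not part of the GdPicture sample: OcrCleanup, RunSteps and CleanupStatus are hypothetical names, and CleanupStatus is a stand-in for GdPictureStatus so that the snippet compiles without the toolkit.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for GdPictureStatus (assumption, only so the sketch is self-contained).
public enum CleanupStatus { OK, GenericError }

// Hypothetical helper, not a GdPicture class.
public static class OcrCleanup
{
    // Runs each cleanup step on the given image identifier, in order,
    // and stops at the first step that does not report OK.
    // Returns the index of the failing step, or -1 when every step succeeded.
    public static int RunSteps(
        int imageId,
        IReadOnlyList<Func<int, CleanupStatus>> steps,
        out CleanupStatus status)
    {
        status = CleanupStatus.OK;
        for (int i = 0; i < steps.Count; i++)
        {
            status = steps[i](imageId);
            if (status != CleanupStatus.OK)
            {
                return i;
            }
        }
        return -1;
    }
}
```

Inside PageBitmapOcrReady you would use GdPictureStatus instead of the stand-in enum and pass the real operations as steps, for example oImaging.FxDespeckle from the sample above. Returning the failing step index lets the handler report which operation failed on which page.
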
You can download GdPicture.NET 12.0.28 including this new feature from the following location: http://www.gdpicture.com/download-gdpicture/

Regards,

David
