PageBitmapOcrReady Possible bug

Discussions about Tesseract OCR integration in GdPicture.

Post by MaxAssunc » Thu Apr 06, 2017 8:10 pm

I was creating a test application; the goal is to create some searchable PDFs.

When I use the PageBitmapOcrReady event to clean up the image before OCR, the application is unable to get OCR results from the sample page (a grayscale image).

Investigating the situation, I noticed that the result of ConvertTo1BppSauvola is not the same when it is called inside PageBitmapOcrReady.

If I just place a button and call ConvertTo1BppSauvola on the same TIF, the result is as expected (expected_result.tif, attached here).

When the call is made from inside PageBitmapOcrReady, the result is bad_result.tif, also attached here.

The code is the same...


Here is the example code from the CLEAR TIF button:


Code:

procedure TForm4.btn_clear_tifClick(Sender: TObject);
var
  ListaArquivo : TStringList;
  i:integer;
begin
  Set8087CW($133f); // sets floating-point handling and rounding modes, as per the component documentation
  GdPictureImaging2 := CreateComObject(CLASS_GdPictureImaging) as _GdPictureImaging;
  ImageID2:= GdPictureImaging2.CreateGdPictureImageFromFile('c:\test_pagebitmapocr\sample.tif');
  GdViewer1.DisplayFromGdPictureImage(ImageID2);
  ShowMessage('Click ok to apply Imagefix');
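  // "not" binds tighter than "<>" in Delphi, so "not X <> 1" is always true for a bit depth; test the value directly
  if GdPictureImaging2.GetBitDepth(ImageID2) <> 1 then begin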
    if GdPictureImaging2.IsGrayscale(ImageID2) then
      GdPictureImaging2.ConvertTo1BppSauvola_2(ImageID2, 0.55, 50, 4);
  end;
  GdViewer1.Redraw;
  if not DirectoryExists('c:\test_PageBitmapOcr') then
    if not ForceDirectories('c:\test_pagebitmapocr') then begin
      ShowMessage('Cannot create directory c:\test_PageBitmapOcr for saving the output');
      Exit;
    end;
  GdPictureImaging2.SaveAsTIFF(ImageID2,'c:\test_pagebitmapocr\expected_result.tif',4);
  ShowMessage('File created on c:\test_pagebitmapocr');
end;

And here is the code from the PageBitmapOcrReady event handler:

Code:

procedure TForm4.PageBitmapOcrReady(ASender: TObject; PageNo,  ImageID: Integer);
var
   GdPictureImaging: TGdPictureImaging;
begin
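  GdPictureImaging := TGdPictureImaging.Create(nil);
  try
    // "not" binds tighter than "<>" in Delphi, so the original "not X <> 1" test was always true
    if GdPictureImaging.GetBitDepth(ImageID) <> 1 then begin
      if GdPictureImaging.IsGrayscale(ImageID) then
        GdPictureImaging.ConvertTo1BppSauvola_2(ImageID, 0.55, 50, 4)
      else
        GdPictureImaging.ConvertTo1BppAT(ImageID);
    end;
    // save the image passed to the event (ImageID), not the button handler's ImageID2
    GdPictureImaging.SaveAsTIFF(ImageID, 'c:\test_pagebitmapocr\bad_result.tif', 4);
  finally
    GdPictureImaging.Free; // release the wrapper created above
  end;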
end;

Attachments
sample.tif
Sample used to create the PDF
expected_result.tif
Result of ConvertTo1BppSauvola called from a simple button click
bad_result.tif
Result of ConvertTo1BppSauvola called from inside the PageBitmapOcrReady event


Re: PageBitmapOcrReady Possible bug

Post by Loïc » Sat Apr 29, 2017 3:59 pm

Hello Max,

I think there is a misunderstanding about this event. This event will provide the image created by the toolkit for the OCR. In other words, this image is not your input image but an already binarized version optimized for OCR.

Please let us know if you need further assistance.
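
To illustrate the point in the reply above, here is a minimal sketch of a handler that only inspects the image it receives instead of binarizing it again. It reuses the GetBitDepth and SaveAsTIFF calls from the original post. The output file name ocr_input.tif is hypothetical, and the sketch assumes, as the original code does, that the ImageID passed to the event can be used with a separately created TGdPictureImaging instance. It is a method fragment meant to live in the same form unit (which already uses SysUtils and Dialogs), not a complete program.

```pascal
procedure TForm4.PageBitmapOcrReady(ASender: TObject; PageNo, ImageID: Integer);
var
  GdPictureImaging: TGdPictureImaging;
begin
  GdPictureImaging := TGdPictureImaging.Create(nil);
  try
    // Per the reply, the toolkit hands over an already binarized page,
    // so a bit depth of 1 is expected and no further thresholding is applied.
    if GdPictureImaging.GetBitDepth(ImageID) = 1 then
      // Hypothetical path: dump the image as received by the event for inspection.
      GdPictureImaging.SaveAsTIFF(ImageID, 'c:\test_pagebitmapocr\ocr_input.tif', 4)
    else
      // Unexpected case according to the reply; report it rather than guess.
      ShowMessage(Format('Page %d is not 1 bpp', [PageNo]));
  finally
    GdPictureImaging.Free; // release the wrapper created above
  end;
end;
```

Comparing ocr_input.tif with expected_result.tif should show whether the second Sauvola pass, rather than the toolkit, is what degrades the page.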
