Missing left spaces in extract text

Discussions about OCR & MICR support in GdPicture.
rassekst
Posts: 15
Joined: Thu Oct 04, 2007 6:58 pm


Post by rassekst » Fri Feb 22, 2013 10:52 am

Hi,

I store the OCR text in a text file:

Code:

// Reset the Tesseract engine and run OCR on the page using the German ("deu") language data.
oGdPictureImagingSource.OCRTesseractReinit();
oGdPictureImagingSource.OCRTesseractSetPassCount(5);
sOCR = oGdPictureImagingSource.OCRTesseractDoOCR(iImagePage, "deu", Application.StartupPath + "\\OCR", "");

if (oGdPictureImagingSource.GetStat() == GdPictureStatus.OK)
{
    oGdPictureImagingSource.OCRTesseractClear();

    // Write the recognized text as UTF-8; the stream is closed even if Write throws.
    using (System.IO.Stream fs = new System.IO.FileStream("Text.OCR", System.IO.FileMode.Create))
    {
        byte[] data = System.Text.Encoding.UTF8.GetBytes(sOCR);
        fs.Write(data, 0, data.Length);
    }
}

The problem is that the leading spaces are missing from the file. Every line is left-trimmed, so the position of the text in the extracted text does not match the text position in the PDF document.
What is causing this? When I use the method GetPageTextWithCoords(...), the coordinates are correct.

Regards
Steffen
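
Since the coordinates from GetPageTextWithCoords(...) are reported as correct, one possible workaround is to rebuild the left indentation from the word positions. The following is only a minimal sketch in plain C#, not a GdPicture feature: the WordBox class, the LayoutText.Rebuild method, and the charWidth and lineTolerance parameters are hypothetical names introduced for illustration, and converting the toolkit's coordinate output into WordBox items is left out because its exact format is not shown in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical container for one recognized word and its position on the page.
public sealed class WordBox
{
    public string Text;
    public double Left; // x position of the word's left edge
    public double Top;  // y position of the word's top edge
}

public static class LayoutText
{
    // Rebuilds plain text whose left padding follows each word's x position.
    // Assumptions: charWidth is an average character width in the same units
    // as Left, and lineTolerance is a positive line height used to bucket words.
    public static string Rebuild(IEnumerable<WordBox> words, double charWidth, double lineTolerance)
    {
        var result = new StringBuilder();

        // Rough line detection: words whose Top rounds to the same bucket share a line.
        var lines = words
            .GroupBy(w => (long)Math.Round(w.Top / lineTolerance))
            .OrderBy(g => g.Key);

        foreach (var line in lines)
        {
            var row = new StringBuilder();
            foreach (var word in line.OrderBy(w => w.Left))
            {
                // Target text column for this word.
                int column = (int)Math.Round(word.Left / charWidth);
                // Pad up to the column; keep at least one space between words.
                int pad = Math.Max(column - row.Length, row.Length == 0 ? 0 : 1);
                row.Append(' ', pad).Append(word.Text);
            }
            result.AppendLine(row.ToString());
        }
        return result.ToString();
    }
}

public static class Demo
{
    public static void Main()
    {
        // Made-up sample positions, for illustration only.
        var words = new List<WordBox>
        {
            new WordBox { Text = "Rechnung", Left = 40, Top = 10 },
            new WordBox { Text = "Nr.", Left = 140, Top = 10 },
            new WordBox { Text = "Betrag", Left = 0, Top = 30 }
        };
        Console.Write(LayoutText.Rebuild(words, 10, 20));
    }
}
```

Bucketing by a fixed line height is a simplification; words that sit close to a bucket boundary can end up on separate lines, so the tolerance would need tuning per document.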

Gabriela
Posts: 245
Joined: Wed Nov 22, 2017 9:52 am

Re: Missing left spaces in extract text

Post by Gabriela » Wed Jan 30, 2019 4:08 pm

Hello,

To be able to assist you here, we need the source document and a compilable, fully executable code snippet. You could also try the GdPictureOCR class, which offers completely new and refined OCR functionality:
http://guides.gdpicture.com/content/web ... rePDF.html
Kind regards,

Gabriela
GdPicture Team

rassekst
Posts: 15
Joined: Thu Oct 04, 2007 6:58 pm

Re: Missing left spaces in extract text

Post by rassekst » Mon Feb 04, 2019 1:07 pm

Hello,
Here is the source:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using GdPicture14;

namespace Test
{
    class Program
    {
        static void Main(string[] args)
        {
            GdPictureImaging oGdPictureImaging = new GdPictureImaging();
            int imageId = oGdPictureImaging.CreateGdPictureImageFromFile(args[0]);

            GdPictureOCR oGdPictureOCR = new GdPictureOCR();

            // oGdPictureOCR.Context = OCRContext.OCRContextSingleBlock;

            oGdPictureOCR.ResourceFolder = @".\OCR";
            if ((oGdPictureOCR.AddLanguage(OCRLanguage.German) == GdPictureStatus.OK) && (oGdPictureOCR.SetImage(imageId) == GdPictureStatus.OK))
            {
                // Running the OCR.
                string resID = oGdPictureOCR.RunOCR();

                string sOCR = oGdPictureOCR.GetOCRResultText(resID);

                System.IO.Stream fs = new System.IO.FileStream("test.txt", System.IO.FileMode.Create);
                byte[] data = System.Text.Encoding.UTF8.GetBytes(sOCR);
                fs.Write(data, 0, data.Length);
                fs.Close();
            }
            oGdPictureOCR.Dispose();
        }
    }
}

Regards,

Steffen

Gabriela
Posts: 245
Joined: Wed Nov 22, 2017 9:52 am

Re: Missing left spaces in extract text

Post by Gabriela » Mon Feb 04, 2019 1:25 pm

Hello, Steffen,

Thanks for the code snippet, but without the input document we cannot run any tests.
If you think the issue is in the toolkit, please create a ticket on the support platform, where you can also attach a source document:
https://support.gdpicture.com
Our development team can then analyze the issue further and, if there is one, handle it properly. They can also better guide you toward a solution. Thank you for your understanding.
Kind regards,

Gabriela
GdPicture Team
