
New methods

Support for GdPicture Tesseract Plugin.

Postby versilej » Mon Jun 27, 2011 5:53 pm

Hi,

I need the new syntax for converting a PDF into a searchable PDF. My previous code follows:

Code:
                GdPictureImaging gdImaging = new GdPictureImaging();
                GdViewer gdViewer = new GdViewer();
                lock (licenseLock)
                {                   
                    gdImaging.SetLicenseNumber(Properties.Settings.Default.GDPictureLicense);
                    gdImaging.SetLicenseNumberOCRTesseract(Properties.Settings.Default.GDTesserectLicense);                   
                    gdViewer.SetLicenseNumber(Properties.Settings.Default.GDPictureLicense);
                    gdViewer.DisplayFromFile(fileName);
                }
                // seems to take a second to get access to the gd library
                Thread.Sleep(500);
                if (fileName.IndexOf(".pdf") > -1)
                {
                    int newPDFID = gdImaging.PdfOCRStart(defPdfFilePath + "working" + core.ToString() + ".pdf", true, string.Empty, string.Empty, string.Empty, string.Empty, string.Empty);
                    for (int y = 1; y <= gdViewer.PageCount; y++)
                    {
                        if (stopRunning)
                        {
                            return;
                        }
                        imageID = gdViewer.PdfRenderPageToGdPictureImage(400, y);
                        gdImaging.ConvertTo1Bpp(imageID);
                        gdImaging.PdfAddGdPictureImageToPdfOCR(newPDFID, imageID, TesseractDictionary.TesseractDictionaryEnglish, defPdfFilePath + "OCR\\", string.Empty);
                        gdImaging.ReleaseGdPictureImage(imageID);
                    }
                    gdImaging.PdfOCRStop(newPDFID);
                }
                else if (fileName.IndexOf(".tif") > -1)
                {
                    gdImaging.TiffOpenMultiPageForWrite(false);
                    imageID = gdImaging.TiffCreateMultiPageFromFile(fileName);
                    gdImaging.PdfOCRCreateFromMultipageTIFF(imageID, TesseractDictionary.TesseractDictionaryEnglish, defPdfFilePath + "OCR\\", string.Empty, defPdfFilePath + "working" + core.ToString() + ".pdf", true, fileName, string.Empty, fileName, string.Empty, string.Empty);
                    gdImaging.ReleaseGdPictureImage(imageID);
                }
                if (stopRunning)
                {
                    return;
                }

                // determine if PDF is searchable now         
                gdViewer.DisplayFromFile(defPdfFilePath + "working" + core.ToString() + ".pdf");
                string pdfText = string.Empty;
                for (int y = 1; y <= (gdViewer.PageCount > defPages ? defPages : gdViewer.PageCount); y++)
                {
                    pdfText += gdViewer.PdfGetPageText(y);
                }
                gdViewer.CloseDocument();
versilej
 
Posts: 6
Joined: Thu Jun 09, 2011 8:17 am

Re: New methods

Postby versilej » Tue Jun 28, 2011 3:12 am

So I finally found the right demo to replace my code, but now I am repeatedly running into another problem: a System.OutOfMemoryException in the PdfAddGdPictureImageToPdfOCR function. Any hints? I have tried it both with and without loading the document into memory.

Code:
                GdPictureImaging gdImaging = new GdPictureImaging();
                GdPicturePDF gdPDF = new GdPicturePDF();
                lock (licenseLock)
                {                   
                    gdImaging.SetLicenseNumberUpgrade(Properties.Settings.Default.GD7PictureLicense, Properties.Settings.Default.GD8PictureLicense);
                    gdImaging.SetLicenseNumberOCRTesseract(Properties.Settings.Default.GD8TesserectLicense);                   
                    gdPDF.SetLicenseNumber(Properties.Settings.Default.GD8PictureLicense);
                }
                if (fileName.IndexOf(".pdf") > -1)
                {
                    if (gdPDF.LoadFromFile(fileName, true) == GdPictureStatus.OK)
                    {
                        int newPDFID = gdImaging.PdfOCRStart(defPdfFilePath + "working" + core.ToString() + ".pdf", true, string.Empty, string.Empty, string.Empty, string.Empty, string.Empty);
                        for (int y = 1; y <= gdPDF.GetPageCount(); y++)
                        {
                            if (stopRunning)
                            {
                                return;
                            }
                            gdPDF.SelectPage(y);
                            imageID = gdPDF.RenderPageToGdPictureImage(res, true);
                            gdImaging.ConvertTo1Bpp(imageID);
                            gdImaging.PdfAddGdPictureImageToPdfOCR(newPDFID, imageID, "eng", defPdfFilePath + "OCR\\", string.Empty);
                            gdImaging.ReleaseGdPictureImage(imageID);                           
                            Application.DoEvents();
                        }
                        gdImaging.PdfOCRStop(newPDFID);
                        gdPDF.CloseDocument();
                    }
                    else
                        throw new Exception("Failed to Load From File with gdPDF");
                }
                else if (fileName.IndexOf(".tif") > -1)
                {
                    gdImaging.TiffOpenMultiPageForWrite(false);
                    imageID = gdImaging.TiffCreateMultiPageFromFile(fileName);
                    gdImaging.PdfOCRCreateFromMultipageTIFF(imageID, "eng", defPdfFilePath + "OCR\\", string.Empty, defPdfFilePath + "working" + core.ToString() + ".pdf", true, fileName, string.Empty, fileName, string.Empty, string.Empty);
                    gdImaging.ReleaseGdPictureImage(imageID);
                }
                if (stopRunning)
                {
                    return;
                }
                // determine if PDF is searchable now 
                string pdfText = string.Empty;
                if (gdPDF.LoadFromFile(defPdfFilePath + "working" + core.ToString() + ".pdf", false) == GdPictureStatus.OK)
                {                   
                    for (int y = 1; y <= (gdPDF.GetPageCount() > defPages ? defPages : gdPDF.GetPageCount()); y++)
                    {
                        gdPDF.SelectPage(y);
                        pdfText += gdPDF.GetPageText();
                    }
                    gdPDF.CloseDocument();
                }
                if (pdfText.Length > defLength)
                {
                    // PDFfilepath + working.pdf needs to replace old pdf
                    string newFileName = fileName.Replace("tif", "pdf");
                    File.Copy(defPdfFilePath + "working" + core.ToString() + ".pdf", newFileName, true);
                    File.Delete(defPdfFilePath + "working" + core.ToString() + ".pdf");
                    if (fileName.IndexOf("tif") > -1 &&
                        delTiff)
                    {
                        File.Delete(fileName);
                    }
                }               
                // look for any text to determine searchability
                if (pdfText.Length > defLength)
                {
                    success = true;
                }
                else
                {
                    success = false;
                }
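
A side note on the listing above, unrelated to the exception itself: the checks `fileName.IndexOf(".pdf") > -1` and `fileName.Replace("tif", "pdf")` match the substring anywhere in the path and are case-sensitive, so a file named `scan.PDF` would be skipped and a folder named `notify` would be rewritten. Below is a minimal sketch of an extension-based alternative that uses only `System.IO.Path` from the .NET base class library. The class and method names (`FileKind`, `IsPdf`, `IsTiff`, `ToPdfPath`) are illustrative assumptions and not part of GdPicture.

```csharp
using System;
using System.IO;

// Illustrative helper (hypothetical names, not a GdPicture API).
public static class FileKind
{
    // True when the path's extension equals the given one, ignoring case.
    public static bool HasExtension(string path, string extension)
    {
        return string.Equals(Path.GetExtension(path), extension,
                             StringComparison.OrdinalIgnoreCase);
    }

    public static bool IsPdf(string path)
    {
        return HasExtension(path, ".pdf");
    }

    public static bool IsTiff(string path)
    {
        return HasExtension(path, ".tif") || HasExtension(path, ".tiff");
    }

    // Swaps only the extension, leaving any "tif" elsewhere in the path alone.
    public static string ToPdfPath(string path)
    {
        return Path.ChangeExtension(path, ".pdf");
    }
}
```

With this, `fileName.IndexOf(".pdf") > -1` would become `FileKind.IsPdf(fileName)`, and the `Replace("tif", "pdf")` call would become `FileKind.ToPdfPath(fileName)`.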

Re: New methods

Postby versilej » Tue Jun 28, 2011 3:37 am

Here is a screenshot. The failure is random: I have it processing the same PDF file over and over, and about 90% of the time it does not happen. It never happens with the multipage TIFF path, and it is always on the same function call.

Image

Re: New methods

Postby Loïc » Tue Jun 28, 2011 11:21 am

Hi,

We are investigating the issue. Could you send us the document you are converting via http://support.gdpicture.com?
Also, what value are you using for res?

Kind regards,

Loïc
Loïc Carrère, support team.
www.orpalis.com
Loïc
Site Admin
 
Posts: 4437
Joined: Tue Oct 17, 2006 10:48 pm
Location: France

Re: New methods

Postby versilej » Tue Jun 28, 2011 6:03 pm

It happens on many different documents. I tested quite a few of them and the problem keeps recurring. One thing I am not sure of is whether I should be creating a new gdImaging object for each thread or just using one global object. These documents are typically 10-30 pages and 10-20 MB.
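
On the per-thread versus global question: nothing in this thread says whether a GdPictureImaging instance can safely be shared between threads, so that remains a question for support. If one instance per thread turns out to be the right choice, the generic .NET pattern looks roughly like the sketch below. `OcrWorker` is a hypothetical stand-in class, not a GdPicture type, and `ThreadLocal<T>` requires .NET 4.0 or later.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for an object that should not be shared across threads.
public sealed class OcrWorker
{
    // Records which thread created this instance, for illustration only.
    public readonly int OwnerThreadId = Thread.CurrentThread.ManagedThreadId;
}

public static class PerThreadWorker
{
    // The factory runs once per thread, the first time that thread reads Value.
    private static readonly ThreadLocal<OcrWorker> worker =
        new ThreadLocal<OcrWorker>(() => new OcrWorker());

    // Each thread sees its own instance; repeated reads on one thread return the same object.
    public static OcrWorker Current
    {
        get { return worker.Value; }
    }
}
```

Each worker thread would then read `PerThreadWorker.Current` instead of touching a shared global object.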

Re: New methods

Postby versilej » Sat Jul 02, 2011 9:17 pm

Hello? I hate to be a bother, but I purchased these controls, and when I emailed support they told me to post here.

Re: New methods

Postby Loïc » Sat Jul 02, 2011 9:26 pm

Hi,

Sorry, we are running behind on support. Also, you did not reply to this question: "what is the value used in res?"

We have tested multi-process PDF/OCR creation without encountering any problem. It would be better for you to reproduce the problem in a standalone application and send it to http://support.gdpicture.com for further investigation. With only the code snippet and the information provided so far, we are unable to help further.

Kind regards,

Loïc
Loïc Carrère, support team.
www.orpalis.com

Re: New methods

Postby versilej » Sun Jul 03, 2011 8:12 pm

I sent both the PDF and the full application to support. The resolution is user-settable, but I have tried 200, 300, and 400, and the failure is purely random.

Re: New methods

Postby karlie » Thu Jul 28, 2011 10:14 pm

Any updates on the out-of-memory exceptions when creating searchable PDF files? After upgrading our product to version 8 of GdPicture.NET, we have been getting regular error reports from our customers with this exception:

System.Exception: Exception of type 'System.OutOfMemoryException' was thrown.
at System.String.CtorCharCount(Char c, Int32 count)
at Microsoft.VisualBasic.Strings.Space(Int32 Number)
at aq.a(c[]& A_0, Int32 A_1)
at GdPicture.GdPictureImaging.PdfAddGdPictureImageToPdfOCR(Int32 PdfID, Int32 ImageID, String Dictionary, String DictionaryPath, String CharWhiteList)

Are you still testing GdPicture.NET in the x86 configuration? Even though GdPicture version 8 is now AnyCPU, our program is still compiled as x86.
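
Since the question depends on whether the host process really runs as 32-bit, it may help to log the bitness at startup alongside any error report. Below is a minimal sketch using only `IntPtr.Size` and `Environment.Is64BitOperatingSystem` (.NET 4.0 or later). The `ProcessBitness` class is an illustrative name, not a GdPicture API.

```csharp
using System;

// Illustrative helper (hypothetical name) for reporting how the process was loaded.
public static class ProcessBitness
{
    // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit process.
    public static int Bits()
    {
        return IntPtr.Size * 8;
    }

    // Builds a one-line summary suitable for a log file or error report.
    public static string Describe()
    {
        int osBits = Environment.Is64BitOperatingSystem ? 64 : 32;
        return string.Format("{0}-bit process on a {1}-bit OS", Bits(), osBits);
    }
}
```

An x86 build reports a 32-bit process even on a 64-bit OS, while an AnyCPU build of that era normally follows the OS bitness.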
karlie
 
Posts: 4
Joined: Tue Mar 09, 2010 2:47 am

Re: New methods

Postby Loïc » Thu Jul 28, 2011 11:47 pm

Hi Karlie,

This problem has been fixed in GdPicture.NET 8.1.1. Please update!

Kind regards,

Loïc
Loïc Carrère, support team.
www.orpalis.com

Re: New methods

Postby vrtacic » Mon Aug 01, 2011 10:29 am

Hello,

At first I had the same error as karlie and versilej. I have updated, and the out-of-memory exceptions are now gone. However, I now have another problem with the same method, PdfAddGdPictureImageToPdfOCR.

The problem is: "Attempted to read or write protected memory. This is often an indication that other memory is corrupt."

Kind Regards

David


The .NET framework returned the following error:
System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.Exception: OCR exception: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
Source: GdPicture.NET
StackTrace: at j.c(IntPtr A_0, String A_1, String A_2, String A_3, IntPtr& A_4, Int32& A_5, Int32 A_6, Int32 A_7, IntPtr A_8, Int32 A_9, Int32 A_10, Int32& A_11, Int32 A_12, Int32 A_13, Int32 A_14, Int32 A_15, Int32 A_16, Int32 A_17, Int32 A_18, Int32 A_19, Int32 A_20, Int32 A_21, Int32 A_22, Int32 A_23)
at j.a(IntPtr A_0, String A_1, String A_2, String A_3, IntPtr& A_4, Int32& A_5, Int32 A_6, Int32 A_7, IntPtr A_8, Int32 A_9, Int32 A_10, Int32& A_11, Int32 A_12, Int32 A_13, Int32 A_14, Int32 A_15, Int32 A_16, Int32 A_17, Int32 A_18, Int32 A_19, Int32 A_20, Int32 A_21, Int32 A_22, Int32 A_23)
at aq.a(Int32 A_0, Int32 A_1, Int32 A_2, Int32 A_3, Int32 A_4, String A_5, String A_6, String A_7, IntPtr& A_8, Int32& A_9, Int32 A_10)
at aq.a(Int32 A_0, Int32 A_1, Int32 A_2, Int32 A_3, Int32 A_4, String A_5, String A_6, String A_7, IntPtr& A_8, Int32& A_9, Int32 A_10)
at GdPicture.GdPictureImaging.PdfAddGdPictureImageToPdfOCR(Int32 PdfID, Int32 ImageID, String Dictionary, String DictionaryPath, String CharWhiteList)
--- End of inner exception stack trace ---
at System.RuntimeMethodHandle._InvokeMethodFast(Object target, Object[] arguments, SignatureStruct& sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner)
at System.RuntimeMethodHandle.InvokeMethodFast(Object target, Object[] arguments, Signature sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner)
at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean skipVisibilityChecks)
at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
at System.RuntimeType.InvokeMember(String name, BindingFlags bindingFlags, Binder binder, Object target, Object[] providedArgs, ParameterModifier[] modifiers, CultureInfo culture, String[] namedParams)
at System.Type.InvokeMember(String name, BindingFlags invokeAttr, Binder binder, Object target, Object[] args)
at CDotNetType.bInvoke(CDotNetType* , Object gcrObj, SByte* pszNomMethode, CSLevel* pclPile, Int32 nNbParamPile, Int32 bValeurRetour, STOperationDotNet* pstOperation)
at CDotNetType.bInvoke(CDotNetType* , Object gcrObj, STMethodeDotNet* pstMethode, UInt32* pdwIdentifiant, CSLevel* pclPile, Int32 nNbParamPile, Int32 bValeurRetour, STOperationDotNet* pstOperation)
at CDotNetInstance.bAppelleMethode(CDotNetInstance* , STMethodeDotNet* pstMethode, UInt32* pdwIdentifiant, CSLevel* pclPile, Int32 nNbParamPile, Int32 bValeurRetour, STOperationDotNet* pstOperation)
vrtacic
 
Posts: 1
Joined: Mon Aug 01, 2011 10:07 am

Re: New methods

Postby Loïc » Tue Aug 02, 2011 1:52 am

Hi,

Please open a ticket at http://support.gdpicture.com with instructions to reproduce the problem. We especially need a code snippet and the document causing the problem.

Kind regards,

Loïc
Loïc Carrère, support team.
www.orpalis.com

