
Generate searchable PDF from scanner, images or existing PDF

Example requests & Code samples for GdPicture.NET.

Post by dchillman » Mon Jan 26, 2009 4:34 pm

The list of features for GdPicture.NET includes "Generate searchable PDF from scanner, images or existing PDF files". I am particularly interested in the ability to open an existing PDF document and process it to create a searchable PDF. After examining the online documentation, it wasn't obvious to me how this would be done. Do you have an example? Also, if this is possible, can the source be a stream rather than a file? Thanks, Dan
dchillman
 
Posts: 2
Joined: Wed Jan 21, 2009 10:03 pm

Re: Generate searchable PDF from scanner, images or existing PDF

Post by Loïc » Mon Feb 02, 2009 7:25 pm

- Sample 1: Creating multipage searchable PDF (PDF/A 1.4) from the content of the document feeder of a scanner:

Code:
        Dim ImageID As Integer
        Dim bContinue As Boolean
        Dim PdfID As Integer
        Dim oGdPictureImaging As New GdPicture.GdPictureImaging

        oGdPictureImaging.SetLicenseNumber("GDPICTURE.NET_LICENSE_KEY")
        oGdPictureImaging.SetLicenseNumberOCRTesseract("GDPICTURE_TESSERACT_PLUGIN_LICENSE")

        ' Open the default TWAIN source once; the Boolean result reports success.
        If oGdPictureImaging.TwainOpenDefaultSource(Me.Handle) Then
            oGdPictureImaging.TwainSetAutoFeed(True) 'Set AutoFeed Enabled
            oGdPictureImaging.TwainSetAutoScan(True) 'To  achieve the maximum scanning rate

            oGdPictureImaging.TwainSetResolution(200)
            oGdPictureImaging.TwainSetPixelType(TwainPixelType.TWPT_BW) 'Black & White
            oGdPictureImaging.TwainSetBitDepth(1) ' 1 bpp

            PdfID = oGdPictureImaging.PdfOCRStart("c:\pdfocr.pdf", True, "", "", "", "", "")

            Do
                ImageID = oGdPictureImaging.TwainAcquireToGdPictureImage(Me.Handle)
                If ImageID <> 0 Then
                    oGdPictureImaging.PdfAddGdPictureImageToPdfOCR(PdfID, ImageID, TesseractDictionary.TesseractDictionaryEnglish, "C:\Program Files\GdPicture.NET\Redist\OCR", "")
                    oGdPictureImaging.ReleaseGdPictureImage(ImageID)
                End If
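                ' A state at or below "source enabled" means no image transfer is pending, so ask the user.
                If oGdPictureImaging.TwainGetState <= TwainStatus.TWAIN_SOURCE_ENABLED Then
                    If MsgBox("Do you want to acquire more pages?", MsgBoxStyle.YesNo) = MsgBoxResult.Yes Then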
                        bContinue = True
                    Else
                        bContinue = False
                    End If
                Else
                    bContinue = True
                End If
            Loop While bContinue
            oGdPictureImaging.PdfOCRStop(PdfID)

            Call oGdPictureImaging.TwainCloseSource()
            MsgBox("Done !")
        Else
            MsgBox("Can't open default source; TWAIN state is: " & oGdPictureImaging.TwainGetState.ToString)
        End If





- Sample 2: Creating multipage searchable PDF (PDF/A 1.4) from a multipage TIFF image:

Code:
        Dim ImageID As Integer
        Dim oGdPictureImaging As New GdPicture.GdPictureImaging

        oGdPictureImaging.SetLicenseNumber("GDPICTURE.NET_LICENSE_KEY")
        oGdPictureImaging.SetLicenseNumberOCRTesseract("GDPICTURE_TESSERACT_PLUGIN_LICENSE")
        oGdPictureImaging.TiffOpenMultiPageAsReadOnly(True)
        ImageID = oGdPictureImaging.TiffCreateMultiPageFromFile("")
        oGdPictureImaging.PdfOCRCreateFromMultipageTIFF(ImageID, TesseractDictionary.TesseractDictionaryEnglish, _
            "C:\Program Files\GdPicture.NET\Redist\OCR", "", "c:\pdfocr.pdf", True, "", "", "", "", "")
        oGdPictureImaging.ReleaseGdPictureImage(ImageID)





- Sample 3: Creating single page searchable PDF (PDF/A 1.4) from image:

Code:
        Dim ImageID As Integer
        Dim oGdPictureImaging As New GdPicture.GdPictureImaging

        oGdPictureImaging.SetLicenseNumber("GDPICTURE.NET_LICENSE_KEY")
        oGdPictureImaging.SetLicenseNumberOCRTesseract("GDPICTURE_TESSERACT_PLUGIN_LICENSE")

        ImageID = oGdPictureImaging.CreateGdPictureImageFromFile("")
        oGdPictureImaging.SaveAsPDFOCR(ImageID, "c:\pdfocr.pdf", TesseractDictionary.TesseractDictionaryEnglish, _
            "C:\Program Files\GdPicture.NET\Redist\OCR", "", True, "", "", "", "", "")
        oGdPictureImaging.ReleaseGdPictureImage(ImageID)





- Sample 4: Creating multipage searchable PDF (PDF/A 1.4) from existing multipage PDF:

Code:
        Dim ImageID As Integer
        Dim oGdViewer As New GdPicture.GdViewer
        Dim oGdPictureImaging As New GdPicture.GdPictureImaging
        Dim PdfID As Integer

        oGdViewer.SetLicenseNumber("GDPICTURE.NET_LICENSE_KEY")
        oGdPictureImaging.SetLicenseNumber("GDPICTURE.NET_LICENSE_KEY")
        oGdPictureImaging.SetLicenseNumberOCRTesseract("GDPICTURE_TESSERACT_PLUGIN_LICENSE")


        oGdViewer.DisplayFromFile("")

        PdfID = oGdPictureImaging.PdfOCRStart("c:\pdfocr.pdf", True, "", "", "", "", "")
        For i As Integer = 1 To oGdViewer.PageCount
            ImageID = oGdViewer.PdfRenderPageToGdPictureImage(200, i)

            oGdPictureImaging.ConvertTo1Bpp(ImageID)
            oGdPictureImaging.PdfAddGdPictureImageToPdfOCR(PdfID, ImageID, TesseractDictionary.TesseractDictionaryEnglish, "C:\Program Files\GdPicture.NET\Redist\OCR", "")
            oGdViewer.ReleaseGdPictureImage(ImageID)
        Next
        oGdPictureImaging.PdfOCRStop(PdfID)
        oGdViewer.CloseDocument()
Loïc
Site Admin
 
Posts: 3410
Joined: Tue Oct 17, 2006 11:48 pm
Location: France

Re: Generate searchable PDF from scanner, images or existing PDF

Post by jloizagah » Wed May 27, 2009 6:37 pm

Hi...

I'm trying to use the example "Creating multipage searchable PDF (PDF/A 1.4) from existing multipage PDF", but when I copy the code, it seems that my oGdPictureImaging object does not have the methods PdfOCRStart, PdfAddGdPictureImageToPdfOCR and PdfOCRStop. Is this a version problem or something like that?

Best regards...
jloizagah
 
Posts: 5
Joined: Tue Mar 17, 2009 3:45 pm

Re: Generate searchable PDF from scanner, images or existing PDF

Post by Loïc » Tue Jun 30, 2009 12:47 pm

Hi,

You need to download the latest edition: http://www.gdpicture.com/download/downl ... urenet.php

Kind regards,

Loïc

Re: Generate searchable PDF from scanner, images or existing PDF

Post by mirkop » Tue Dec 01, 2009 6:44 pm

Hi Loic,

Could you post a sample that creates a multipage searchable PDF from an existing (multipage) PDF without using the GdViewer object? I need to create the file without a preview.

Mirko
mirkop
 
Posts: 34
Joined: Wed Jun 24, 2009 6:38 pm

Re: Generate searchable PDF from scanner, images or existing PDF

Post by Loïc » Tue Dec 01, 2009 6:53 pm

Hi Mirko,

The code I gave doesn't generate any preview. You can use it within a simple function in a formless application.

Kind regards,

Loïc

Re: Generate searchable PDF from scanner, images or existing PDF

Post by mirkop » Tue Feb 23, 2010 3:05 pm

Hi,

I'm using your sample code for creating a searchable PDF from an existing PDF, and it works.
I downloaded the latest version of GdPicture.

My application monitors a folder and converts the PDF files in it to searchable PDFs. The folder contains both searchable and non-searchable PDFs.
However, the new PDF is often larger than the original. This happens when the original file is already a searchable PDF.

I have sent the PDF file to esupport@gdpicture.com.

Mirko

Re: Generate searchable PDF from scanner, images or existing PDF

Post by Loïc » Tue Feb 23, 2010 3:51 pm

Hi Mirko,

This is normal behavior. This method creates an entirely new PDF from raster bitmaps. If the pages of the input PDF consist only of bitmaps, the resulting file may be slightly larger or smaller than the original.
However, if the input document is composed of vector objects (shapes and text), the resulting PDF will in many cases be larger. This kind of PDF usually doesn't need to be processed, because the text is already embedded in the document.

What I suggest is to try to extract the text of the original PDF using the GdViewer object. If the PDF contains text, you should not process it. If there is no text inside, you can perform OCR.
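
As a rough illustration of that check, here is a minimal sketch in plain VB.NET. It deliberately contains no GdPicture.NET calls: the module name, HasEmbeddedText and the getPageText callback are hypothetical names introduced only for this example, and you would need to wire the callback to whatever per-page text-extraction call your GdPicture.NET version provides. Pages are assumed to be numbered from 1, as in Sample 4.

```vbnet
Imports System

Module SearchablePdfCheck
    ' Returns True as soon as one page yields non-blank text.
    ' getPageText is a hypothetical callback (not part of GdPicture.NET):
    ' it receives a 1-based page number and returns that page's text.
    Function HasEmbeddedText(pageCount As Integer, getPageText As Func(Of Integer, String)) As Boolean
        For pageNo As Integer = 1 To pageCount
            If Not String.IsNullOrWhiteSpace(getPageText(pageNo)) Then
                Return True
            End If
        Next
        ' No page returned any text, so the document is a candidate for OCR.
        Return False
    End Function
End Module
```

If HasEmbeddedText returns True, the file can be skipped (or copied as-is); otherwise it can go through the OCR loop from Sample 4. Stopping at the first page with text keeps the check cheap, at the cost of treating a mixed document (some scanned pages, some text pages) as already searchable.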

Let me know if I am not clear enough.

With best regards,

Loïc Carrère

Re: Generate searchable PDF from scanner, images or existing PDF

Post by mirkop » Tue Feb 23, 2010 4:22 pm

Hi,

That's clear. I'll use your suggestion.

How can I extract the text from the PDF? Using PdfGetPageText()?

Mirko

Re: Generate searchable PDF from scanner, images or existing PDF

Post by Loïc » Tue Feb 23, 2010 4:23 pm

mirkop wrote: How can I extract the text from the PDF? Using PdfGetPageText()?

Yes, it is the fastest way.

Kind regards,

Loïc

Re: Generate searchable PDF from scanner, images or existing PDF

Post by mirkop » Tue Feb 23, 2010 4:32 pm

Hi Loic,

Thank you for your reply .. it works fine.

