
How to create searchable PDF

Example requests & Code samples for GdPicture ActiveX.

Post by Loïc » Sun Oct 12, 2008 6:44 pm

Several VB6 samples that create searchable PDF files using the GdPicture Pro Imaging SDK:

Note: The optional GdPicture OCR Tesseract Plugin is needed: http://www.gdpicture.com/products/plugi ... engine.php


- Sample 1: Creating multipage searchable PDF from the content of the document feeder of a scanner:


Code:
Dim nImageID As Long
Dim nCpt As Long
   
If Imaging1.TwainOpenDefaultSource() Then   
   Imaging1.TwainSetAutoFeed (True) 'Enable AutoFeed
   Imaging1.TwainSetAutoScan (True) 'Achieve the maximum scanning rate
   Imaging1.TwainSetCurrentResolution (300) 'Scan at 300 DPI
   Imaging1.TwainSetCurrentPixelType (TWPT_BW) 'Black & white scanning
   Imaging1.TwainSetCurrentBitDepth (1) '1 bpp scanning
     
   Imaging1.TwainPdfOCRStart ("output.pdf")
   While Imaging1.CreateImageFromTwain(Me.hWnd) <> 0
         nImageID = Imaging1.GetNativeImage
         'The AppData folder must contain the needed dictionary files
         Call Imaging1.TwainAddGdPictureImageToPdfOCR(nImageID, TesseractDictionaryEnglish, App.Path & "\AppData")
         Imaging1.CloseImage (nImageID)
   Wend
   Imaging1.TwainPdfOCRStop
     
   Call Imaging1.TwainCloseSource
Else
   MsgBox "can't open default source, twain state is: " & Trim(Str(Imaging1.TwainGetState))
End If



- Sample 2: Creating multipage searchable PDF from a multipage TIFF image:

Code:
Dim nImageID As Long

Imaging1.TiffOpenMultiPageAsReadOnly (True)
nImageID = Imaging1.CreateImageFromFile("multipage.tif")
'The AppData folder must contain the needed dictionary files
Call Imaging1.PdfOCRCreateFromMultipageTIFF(nImageID, "output.pdf", TesseractDictionaryEnglish, App.Path & "\AppData")
Call Imaging1.CloseImage(nImageID)



- Sample 3: Creating single page searchable PDF from image:

Code:
Imaging1.CreateImageFromFile ("image.tif")
Call Imaging1.SaveAsPDFOCR("output.pdf", TesseractDictionaryEnglish, App.Path & "\AppData")  'AppData includes dictionary files
Imaging1.CloseNativeImage



- Sample 4: Creating multipage searchable PDF from existing multipage PDF:


Code:
Dim nPage As Long
Dim oImaging As Object, oGdViewer As Object
Dim RasterizedPage As Long

Set oImaging = CreateObject("gdpicturepro5.Imaging")
Set oGdViewer = CreateObject("gdpicturepro5.GdViewer")

oGdViewer.SetLicenseNumber ("XXX")
oImaging.SetLicenseNumber ("XXX")
oGdViewer.LockControl = True
oGdViewer.PdfDpiRendering = 200
oGdViewer.DisplayFromPdfFile ("c:\test.pdf")

For nPage = 1 To oGdViewer.PageCount
    oGdViewer.DisplayFrame (nPage)

    RasterizedPage = oGdViewer.GetNativeImage

    If nPage = 1 Then
       oImaging.TwainPdfOCRStartEx ("c:\testocr.pdf")
    End If
    Call oImaging.TwainAddGdPictureImageToPdfOCR(RasterizedPage, TesseractDictionaryEnglish, App.Path & "\AppData") 'AppData includes dictionary files
Next nPage
oImaging.TwainPdfOCRStop

oGdViewer.CloseImage
Loïc
Site Admin
 
Posts: 3025
Joined: Tue Oct 17, 2006 11:48 pm
Location: France

Re: How to create searchable PDF

Post by Dantevios » Sun Jan 18, 2009 9:01 pm

What data type is Imaging1, and how did you instantiate it?
- Myself

Never mind, I figured it out. Imaging1 is an instance of GdPicturePro5.cImaging. In C# you instantiate it like this:

Code:
GdPicturePro5.cImaging cImage = new GdPicturePro5.cImaging();

I have figured out how to make a single-page searchable PDF out of a TIFF in C#. Here is the code for anyone looking for it:

Code:
GdPicturePro5.cImaging cImage = new GdPicturePro5.cImaging();
cImage.SetLicenseNumber("XXXXX"); //Replace XXXXX with your license #
cImage.SetLicenseNumberOCRTesseract("XXXXX"); //Replace XXXXX with your license #
cImage.CreateImageFromFile("C:\\input.tif");
cImage.SaveAsPDFOCR("C:\\output.pdf", GdPicturePro5.TesseractDictionary.TesseractDictionaryEnglish, "PATH TO YOUR UNZIPPED DICTIONARY FILES", "", "Mr. Smith", "Mr. Smith", "Mr. Smith", "Mr. Smith");
cImage.CloseNativeImage();


I know there are two topics on this forum from people asking for examples of how to create multipage searchable PDFs in C#, so I am also providing this example, which makes a multipage PDF out of a multipage TIFF:

Code:
int nDimage = 0;

GdPicturePro5.cImaging cImage = new GdPicturePro5.cImaging();

cImage.SetLicenseNumber("XXXXX"); //Replace XXXXX with your license #
cImage.SetLicenseNumberOCRTesseract("XXXXX"); //Replace XXXXX with your license #
cImage.TiffOpenMultiPageAsReadOnly(true);
nDimage = cImage.CreateImageFromFile("C:\\input.tif");
//The dictionary path must be a string, like the other arguments
cImage.PdfOCRCreateFromMultipageTIFF(nDimage, "C:\\output.pdf", GdPicturePro5.TesseractDictionary.TesseractDictionaryEnglish, "PATH TO YOUR DICTIONARY FILES", "", "Mr. Smith", "Mr. Smith", "Mr. Smith", "Mr. Smith");
cImage.CloseImage(nDimage);
Last edited by Dantevios on Sun Jan 18, 2009 9:50 pm, edited 2 times in total.
Dantevios
 
Posts: 4
Joined: Sun Jan 18, 2009 8:59 pm

Re: How to create searchable PDF

Post by Loïc » Sun Jan 18, 2009 9:05 pm

Thank you, Dante. This should be a great help to many users ;)

Loïc

Re: How to create searchable PDF

Post by dchillman » Wed Jan 21, 2009 10:11 pm

I am evaluating your tool for a slightly different purpose. I have a number of PDF files in a SharePoint list that may or may not be text-searchable. My requirement is to create a feature that loops through the list, opens each PDF, makes it text-searchable, and saves it back to the list. I can open each PDF file as a memory stream. Is it possible to pass that stream to your object, have it processed to make it text-searchable, and get back the updated stream, which I can then pass back to the SharePoint list? If so, can you post a code example showing how to handle a memory stream with your objects? Thanks.
dchillman
 
Posts: 2
Joined: Wed Jan 21, 2009 10:03 pm

Re: How to create searchable PDF

Post by Loïc » Fri Jan 23, 2009 1:06 pm

Hi,

You can't open a PDF from a stream object in GdPicture ActiveX. This feature is only available in GdPicture.NET.

To create a searchable PDF from an existing PDF document, see "Sample 4: Creating multipage searchable PDF from existing multipage PDF" in the first post of this topic.



Best regards,

Loïc
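
Editor's sketch (not part of the original posts): since the ActiveX control works on files rather than streams, one possible workaround for the SharePoint scenario is to round-trip each stream through temporary files. The C# helper below shows only that round trip. The class name TempFileRoundTrip, the method Run, and the process callback are hypothetical names introduced for illustration; the callback is where file-based OCR code such as Sample 4 would go. It also assumes .NET 4 or later (for Stream.CopyTo) and that the processing step accepts the .tmp paths it is given, which has not been verified against GdPicture.

```csharp
using System;
using System.IO;

static class TempFileRoundTrip
{
    // Writes 'input' to a temporary file, lets 'process' produce a second
    // file from it, and returns that second file's bytes as a MemoryStream.
    // 'process' is a hypothetical placeholder for file-based OCR code
    // (for example the Sample 4 sequence); it is not a GdPicture API.
    public static MemoryStream Run(Stream input, Action<string, string> process)
    {
        string inPath = Path.GetTempFileName();
        string outPath = Path.GetTempFileName();
        try
        {
            // Persist the incoming stream so a file-only API can read it.
            using (FileStream fs = File.Create(inPath))
            {
                input.CopyTo(fs);
            }

            process(inPath, outPath);

            // Load the result back into memory before the files are deleted.
            return new MemoryStream(File.ReadAllBytes(outPath));
        }
        finally
        {
            File.Delete(inPath);
            File.Delete(outPath);
        }
    }

    static void Main()
    {
        // Stand-in processing step: a plain file copy, so the sketch runs
        // without the SDK installed.
        byte[] data = { 1, 2, 3 };
        using (MemoryStream result = Run(new MemoryStream(data),
            (src, dst) => File.Copy(src, dst, true)))
        {
            Console.WriteLine(result.Length);
        }
    }
}
```

In real use, the copy lambda would be replaced by a method that opens the input path, runs the OCR calls, and writes the searchable PDF to the output path. The returned stream can then be saved back to the list.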

Re: How to create searchable PDF

Post by alexandres » Fri May 08, 2009 5:48 pm

Hi,

I'm evaluating your software for imaging and OCR.
My question is whether there is a way to use an OCR engine other than Tesseract to create searchable PDFs.
alexandres
 
Posts: 1
Joined: Fri May 08, 2009 5:38 pm

