
Generate searchable PDF from scanner, images or existing PDF

Post by dchillman » Mon Jan 26, 2009 3:34 pm

The list of features for GdPicture.NET includes "Generate searchable PDF from scanner, images or existing PDF files". I am particularly interested in the ability to open an existing PDF document and process it to create a searchable PDF document. After examining the online documentation, it wasn't obvious to me how this would be done. Do you have an example? Also, if it is possible, can the source be a stream rather than a file? Thanks, Dan

Post by Loïc » Mon Feb 02, 2009 6:25 pm

- Sample 1: Creating multipage searchable PDF (PDF/A 1.4) from the content of the document feeder of a scanner:

Code:
      'Assumes Imports GdPicture at the top of the file (TwainPixelType, TwainStatus)
      Dim ImageID As Integer
      Dim bContinue As Boolean
      Dim PdfID As Integer
      Dim oGdPictureImaging As New GdPicture.GdPictureImaging

      'Open the default TWAIN source once
      If oGdPictureImaging.TwainOpenDefaultSource(Me.Handle) Then
         oGdPictureImaging.TwainSetAutoFeed(True) 'Set AutoFeed enabled
         oGdPictureImaging.TwainSetAutoScan(True) 'To achieve the maximum scanning rate

         oGdPictureImaging.TwainSetResolution(200)
         oGdPictureImaging.TwainSetPixelType(TwainPixelType.TWPT_BW) 'Black & White
         oGdPictureImaging.TwainSetBitDepth(1) '1 bpp

         PdfID = oGdPictureImaging.PdfOCRStart("c:\pdfocr.pdf", True, "MyTitle", "MyAuthor", "MySubject", "MyKeywords", "MyCreator") 'We generate PDF/A

         Do
            ImageID = oGdPictureImaging.TwainAcquireToGdPictureImage(Me.Handle)
            If ImageID <> 0 Then
               'OCR the scanned page and append it to the output PDF
               oGdPictureImaging.PdfAddGdPictureImageToPdfOCR(PdfID, ImageID, "eng", "C:\Program Files\GdPicture.NET 8\Redist\OCR", "")
               oGdPictureImaging.ReleaseGdPictureImage(ImageID)
            End If
            If oGdPictureImaging.TwainGetState <= TwainStatus.TWAIN_SOURCE_ENABLED Then
               'No further page is ready to transfer: ask the user
               bContinue = (MsgBox("Do you want to acquire more pages?", MsgBoxStyle.YesNo) = MsgBoxResult.Yes)
            Else
               'More pages are waiting to be transferred
               bContinue = True
            End If
         Loop While bContinue
         oGdPictureImaging.PdfOCRStop(PdfID)

         Call oGdPictureImaging.TwainCloseSource()

         MsgBox("Done!")
      Else
         MsgBox("Can't open default source, TWAIN state is: " & oGdPictureImaging.TwainGetState.ToString)
      End If

      oGdPictureImaging.Dispose()

- Sample 2: Creating multipage searchable PDF (PDF/A 1.4) from a multipage TIFF image:

Code:
      Dim oGdPictureImaging As New GdPicture.GdPictureImaging
      'Path left empty in this sample; supply the path of the source multipage TIFF
      Dim ImageID As Integer = oGdPictureImaging.TiffCreateMultiPageFromFile("")
      If ImageID <> 0 Then
         'OCR every page of the TIFF and write the searchable PDF/A
         oGdPictureImaging.PdfOCRCreateFromMultipageTIFF(ImageID, "eng", "C:\Program Files\GdPicture.NET 8\Redist\OCR", "", "c:\pdfocr.pdf", True, "MyTitle", "MyAuthor", "MySubject", "MyKeywords", "MyCreator")
         oGdPictureImaging.ReleaseGdPictureImage(ImageID)
      End If

      oGdPictureImaging.Dispose()

- Sample 3: Creating single page searchable PDF (PDF/A 1.4) from image:

Code:
      Dim oGdPictureImaging As New GdPicture.GdPictureImaging
      'Path left empty in this sample; supply the path of the source image
      Dim ImageID As Integer = oGdPictureImaging.CreateGdPictureImageFromFile("")
      If ImageID <> 0 Then
         'The single image is handled as a one-page document
         oGdPictureImaging.PdfOCRCreateFromMultipageTIFF(ImageID, "eng", "C:\Program Files\GdPicture.NET 8\Redist\OCR", "", "c:\pdfocr.pdf", True, "MyTitle", "MyAuthor", "MySubject", "MyKeywords", "MyCreator")
         oGdPictureImaging.ReleaseGdPictureImage(ImageID)
      End If

      oGdPictureImaging.Dispose()

- Sample 4: Creating multipage searchable PDF (PDF/A 1.4) from existing multipage PDF:

Code:
      'Assumes Imports GdPicture at the top of the file (GdPictureStatus)
      Dim oGdPictureImaging As New GdPicture.GdPictureImaging
      Dim pdfOcrID As Integer
      Dim pdfInput As New GdPicture.GdPicturePDF

      If pdfInput.LoadFromFile("c:\test.pdf", False) = GdPictureStatus.OK Then
         pdfOcrID = oGdPictureImaging.PdfOCRStart("c:\pdfocr.pdf", True, "MyTitle", "MyAuthor", "MySubject", "MyKeywords", "MyCreator")

         For i As Integer = 1 To pdfInput.GetPageCount
            If pdfInput.SelectPage(i) Then 'Select the current page of the loop
               Dim rasterPageID As Integer = pdfInput.RenderPageToGdPictureImage(200, True) 'Pass False to skip form fields & annotations
               If rasterPageID <> 0 Then
                  oGdPictureImaging.ConvertTo1BppAT(rasterPageID) 'We generate bitonal PDF output; comment out this line to keep a true colour document
                  oGdPictureImaging.PdfAddGdPictureImageToPdfOCR(pdfOcrID, rasterPageID, "eng", "C:\Program Files\GdPicture.NET 8\Redist\OCR", "")
                  oGdPictureImaging.ReleaseGdPictureImage(rasterPageID)
               End If
            End If
         Next
         pdfInput.CloseDocument()
         oGdPictureImaging.PdfOCRStop(pdfOcrID)
      End If

      oGdPictureImaging.Dispose() 'Release the imaging object even if the PDF failed to load

Loïc Carrère, support team.
www.orpalis.com

Post by jloizagah » Wed May 27, 2009 5:37 pm

Hi,

I'm trying to use the example for creating a multipage searchable PDF (PDF/A 1.4) from an existing multipage PDF, but when I copy the code, it seems that my oGdPictureImaging object does not have the methods PdfOCRStart, PdfAddGdPictureImageToPdfOCR and PdfOCRStop. Is this a version problem or something like that?

Best regards,

Post by Loïc » Tue Jun 30, 2009 11:47 am

Hi,

You need to download the latest edition: http://www.gdpicture.com/download/downl ... urenet.php

Kind regards,

Loïc

Post by mirkop » Tue Dec 01, 2009 5:44 pm

Hi Loïc,

Could you post a sample for creating a multipage searchable PDF from an existing (multipage) PDF without using the GdViewer object? I need to create the file without a preview.

Mirko

Post by Loïc » Tue Dec 01, 2009 5:53 pm

Hi Mirko,

The code I gave doesn't generate any preview. You can use it within a simple function in a formless application.

Kind regards,

Loïc

Post by mirkop » Tue Feb 23, 2010 2:05 pm

Hi,

I'm using your sample code for creating a searchable PDF from an existing PDF, and it works.
I downloaded the latest version of GdPicture.

My application monitors a folder and converts the PDF files in it to searchable PDFs. The folder contains both searchable and non-searchable PDFs.
However, the new PDF is often larger than the original. This happens when the original file is already a searchable PDF.

I have sent the PDF file to esupport@gdpicture.com.

Mirko

Post by Loïc » Tue Feb 23, 2010 2:51 pm

Hi Mirko,

This is normal behavior. With this method, an entirely new PDF is created from raster bitmaps. If the pages of the input PDF are composed only of bitmaps, the resulting file may be slightly larger or smaller than the original.
However, if the input document is composed of vector objects (shapes & text), the resulting PDF will in many cases be larger. Usually this kind of PDF doesn't need to be processed, because the text is already embedded in the document.

What I suggest is to try to extract the text of the original PDF using the GdViewer object. If the PDF contains text, you should not process it. If there is no text inside, you can perform OCR.
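
The check described above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not toolkit code: PdfNeedsOcr and the getPageText callback are hypothetical names, and the callback stands in for whatever per-page text extraction call the toolkit provides, so no GdPicture signature is assumed here.

```vbnet
Imports System

Module OcrDecision
    ' Hypothetical helper: returns True when no page yields any visible text,
    ' which suggests an image-only PDF that is worth sending to OCR.
    ' getPageText is a stand-in for the toolkit's text extraction call (assumption).
    Public Function PdfNeedsOcr(pageCount As Integer, getPageText As Func(Of Integer, String)) As Boolean
        For pageNo As Integer = 1 To pageCount
            If Not String.IsNullOrWhiteSpace(getPageText(pageNo)) Then
                Return False ' Text found: treat the document as already searchable
            End If
        Next
        Return True ' No text on any page
    End Function

    Sub Main()
        ' Fake extractors, used only to exercise the helper
        Dim scannedOnly As Func(Of Integer, String) = Function(p) ""
        Dim hasText As Func(Of Integer, String) = Function(p) If(p = 2, "Invoice 42", "")

        Console.WriteLine(PdfNeedsOcr(3, scannedOnly)) ' prints True
        Console.WriteLine(PdfNeedsOcr(3, hasText))     ' prints False
    End Sub
End Module
```

Note that this sketch treats a document as searchable as soon as one page contains text. A folder with mixed documents (some pages scanned, some not) might call for a per-page decision instead.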

Let me know if I am not clear enough.

With best regards,

Loïc Carrère

Post by mirkop » Tue Feb 23, 2010 3:22 pm

Hi,

That's clear, I'll use your suggestion.

How can I extract the text from the PDF? Using PdfGetPageText()?

Mirko

Post by Loïc » Tue Feb 23, 2010 3:23 pm

"How can I extract the text from the PDF? Using PdfGetPageText()?"

Yes, it is the fastest way.

Kind regards,

Loïc

Post by mirkop » Tue Feb 23, 2010 3:32 pm

Hi Loic,

Thank you for your reply .. it works fine.