PDF + OCR not PDF/A

Support for GdPicture Tesseract Plugin.

PDF + OCR not PDF/A

Post by rromeijn » Tue Jul 06, 2010 1:20 pm

How can I save the image as a plain PDF with OCR?
This sample code

Imaging1.CreateImageFromFile ("image.tif")
Imaging1.SaveAsPDFOCR("output.pdf", TesseractDictionaryEnglish, App.Path & "\AppData")  'AppData includes dictionary files
Imaging1.CloseNativeImage


saves the image as PDF/A + OCR.
I want plain PDF + OCR.
rromeijn
 
Posts: 16
Joined: Fri Jun 18, 2010 4:21 pm

Re: PDF + OCR not PDF/A

Post by rromeijn » Mon Aug 16, 2010 11:24 am

A lot of views, but not one reply.
rromeijn
 
Posts: 16
Joined: Fri Jun 18, 2010 4:21 pm

Re: PDF + OCR not PDF/A

Post by eagleman » Mon Aug 16, 2010 3:22 pm

@rromeijn

I do the following:

imageID = Imaging1.CreateGdPictureImageFromFile("00000001.JPG");
iPdfId = Imaging1.TwainPdfStart("00000001.PDF", true, "", "", "", "", "");
Imaging1.TwainAddGdPictureImageToPdf(iPdfId, imageID);
Imaging1.TwainPdfStop(iPdfId);
Imaging1.ReleaseGdPictureImage(imageID);


Good luck.

Eagleman
eagleman
 
Posts: 27
Joined: Mon Jan 25, 2010 1:48 pm

Re: PDF + OCR not PDF/A

Post by rromeijn » Mon Aug 16, 2010 3:25 pm

Thanks,

but that saves the image as a PDF without OCR.
I need a PDF with OCR, but not PDF/A with OCR.
rromeijn
 
Posts: 16
Joined: Fri Jun 18, 2010 4:21 pm

Re: PDF + OCR not PDF/A

Post by Loïc » Mon Aug 16, 2010 3:31 pm

Hi,

This option is not available.
Why is PDF/A a problem for you? PDF/A is certified 100% PDF compliant.

A workaround is to remove the PDF/A flag by replacing the header information "%âãÏÓ" with " " in the generated PDF. But in my humble opinion there is no sense in doing that...
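
For reference, the header replacement described above could be sketched roughly as follows. This is only a sketch under assumptions: the class and method names are made up for illustration, the marker is assumed to be stored as the Latin-1 bytes 25 E2 E3 CF D3 within the first 1024 bytes of the file, and it is overwritten with the same number of spaces so that the byte offsets recorded in the file's cross-reference table stay unchanged. It has not been checked against GdPicture output, and it does not touch any metadata inside the file.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of GdPicture.
static class PdfHeaderPatch
{
    // "%âãÏÓ" as Latin-1 bytes (assumed encoding of the header comment).
    static readonly byte[] Marker = { 0x25, 0xE2, 0xE3, 0xCF, 0xD3 };

    // Overwrites the first occurrence of the marker near the start of the
    // file with spaces. Returns true if the file was changed.
    public static bool BlankBinaryMarker(string path)
    {
        byte[] data = File.ReadAllBytes(path);

        // Only scan the beginning of the file; the marker belongs to the header.
        int lastStart = Math.Min(data.Length, 1024) - Marker.Length;

        for (int i = 0; i <= lastStart; i++)
        {
            bool match = true;
            for (int j = 0; j < Marker.Length; j++)
            {
                if (data[i + j] != Marker[j]) { match = false; break; }
            }
            if (!match) continue;

            // Same length in, same length out: later byte offsets do not move.
            for (int j = 0; j < Marker.Length; j++) data[i + j] = 0x20;

            File.WriteAllBytes(path, data);
            return true;
        }
        return false; // marker not found, file left untouched
    }
}
```

It would be called after the PDF has been written, for example PdfHeaderPatch.BlankBinaryMarker("output.pdf"). Working on a copy of the file first is advisable, since the change is made in place.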

Kind regards,

Loïc
Loïc Carrère, support team.
www.orpalis.com
Loïc
Site Admin
 
Posts: 4445
Joined: Tue Oct 17, 2006 10:48 pm
Location: France

Re: PDF + OCR not PDF/A

Post by eagleman » Mon Aug 16, 2010 8:58 pm

To run OCR on an image and save it as a PDF:

imageID = Imaging1.CreateGdPictureImageFromFile("00000001.JPG");
// Second argument: true generates PDF/A, false a plain PDF (per the manual).
iPdfId = Imaging1.PdfOCRStart("00000001.PDF", false, "", "", "", "", "");
Imaging1.PdfAddGdPictureImageToPdfOCR(
    iPdfId,
    imageID,
    GdPicture.TesseractDictionary.TesseractDictionaryDutch,
    Application.StartupPath.ToString() + "\\OCR",
    "");
Imaging1.PdfOCRStop(iPdfId);
Imaging1.ReleaseGdPictureImage(imageID);


Eagleman

Note: According to the manual, the second parameter of PdfOCRStart is a boolean: True to generate the PDF in PDF/A format, otherwise False.
eagleman
 
Posts: 27
Joined: Mon Jan 25, 2010 1:48 pm

Re: PDF + OCR not PDF/A

Post by rromeijn » Tue Aug 17, 2010 8:30 am

Eagleman,

According to my manual, this function doesn't even exist.
rromeijn
 
Posts: 16
Joined: Fri Jun 18, 2010 4:21 pm

Re: PDF + OCR not PDF/A

Post by rromeijn » Tue Aug 17, 2010 8:34 am

Loïc,

As you know, the PDF/A format has several restrictions that do not exist in PDF 1.3 (hyperlinks are not allowed).
I also have a customer who can only display PDFs up to version 1.3 in his (expensive) software.

I will look into the option you described, but an option to save plain PDF would be nice.
rromeijn
 
Posts: 16
Joined: Fri Jun 18, 2010 4:21 pm

Re: PDF + OCR not PDF/A

Post by eagleman » Tue Aug 17, 2010 5:08 pm

@rromeijn,

Make sure you have the latest manual. Although the manual does not show a version number, its file name is "GdPicture_NET Document Imaging SDK.pdf" and it is about 7.1 MB.

The function I mentioned does exist. Try the code I wrote earlier.

Good luck.

Regards,
Eagleman
eagleman
 
Posts: 27
Joined: Mon Jan 25, 2010 1:48 pm
