Creating smaller PDFs

Moderator: peterZ

dpc
Posts: 379
Joined: 01 Apr 2011, 18:05
Number of books owned: 0
Location: Issaquah, WA

Re: Creating smaller PDFs

Post by dpc »

cday wrote: 02 Jan 2023, 05:16 My earlier Acrobat XI Standard version I think supports batch conversion of source PDF files
Would it be too much trouble if you could verify that?
cday
Posts: 445
Joined: 19 Mar 2013, 14:55
Number of books owned: 0
Country: UK

Re: Creating smaller PDFs

Post by cday »

dpc wrote: 02 Jan 2023, 15:28
cday wrote: 02 Jan 2023, 05:16 My earlier Acrobat XI Standard version I think supports batch conversion of source PDF files
Would it be too much trouble if you could verify that?
Confirmed in a quick test: I understand your surprise that a Standard version supports any form of batch processing!

I only noticed this late in my last scanning project, and it wouldn't actually have been useful to me then. I don't know whether any other forms of processing also support batch operation. As this is now quite an old Acrobat version, I'll send you the details in a PM.

Batch conversion to 'ClearScan' (now called 'Editable Text and Images', I think), where available, could be useful to someone with a large number of PDFs to convert, but probably only for high-quality input where no interaction will be needed during the conversion process.
BruceG
Posts: 99
Joined: 14 May 2014, 23:17
Number of books owned: 500
Country: Australia

Re: Creating smaller PDFs

Post by BruceG »

My Acrobat 9 Pro will also do batch ClearScan OCR of files in folders, though the files need to be selected.
ABBYY FineReader Corporate allows 5000 pages a month, using 2 cores, to be OCRed in a designated folder. So whatever is put in that folder over the month is processed, up to the 5000-page limit: approximately 250 pages a day.
dpc
Posts: 379
Joined: 01 Apr 2011, 18:05
Number of books owned: 0
Location: Issaquah, WA

Re: Creating smaller PDFs

Post by dpc »

Thanks for the info.

I use Acrobat XI Pro on Windows and I'd like to find some way to make it take all of the *.tif files in a specified directory and combine them into a single ClearScan PDF without any user interaction along the way. It looks like this can be done using OLE Automation, but I haven't done anything with OLE in decades.
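Whatever ends up doing the conversion (OLE Automation, an Acrobat action, or a script), it will need the *.tif files in page order first. A plain alphabetical sort puts scan_10.tif before scan_2.tif, so this minimal Python sketch uses a "natural" sort instead. The function names are hypothetical, and it does not talk to Acrobat or OLE at all; it only assumes that the page number appears as digits in each file name.

```python
import re
from pathlib import Path


def natural_key(name: str) -> list:
    """Sort key that compares digit runs as numbers, so that
    'scan_2.tif' sorts before 'scan_10.tif' (case-insensitive)."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so every odd-indexed chunk is a run of digits.
    parts = re.split(r"(\d+)", name.lower())
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def tiffs_in_page_order(folder: str) -> list[Path]:
    """Return every .tif/.tiff file directly inside `folder`
    (subfolders are ignored), in natural page order."""
    files = [p for p in Path(folder).iterdir()
             if p.is_file() and p.suffix.lower() in (".tif", ".tiff")]
    return sorted(files, key=lambda p: natural_key(p.name))
```

For example, `tiffs_in_page_order(r"C:\scans\book1")` (a hypothetical folder) would return that folder's TIFFs as `Path` objects, in order, ready to hand to whichever conversion step you settle on.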
cday
Posts: 445
Joined: 19 Mar 2013, 14:55
Number of books owned: 0
Country: UK

Re: Creating smaller PDFs

Post by cday »

dpc wrote: 02 Jan 2023, 15:28
cday wrote: 02 Jan 2023, 05:16 My earlier Acrobat XI Standard version I think supports batch conversion of source PDF files
Would it be too much trouble if you could verify that?
Confirmed. As the thread has continued on this general topic, I'll post the details here.

To set up batch conversion of multiple PDF files to ClearScan PDFs:

Launch Acrobat XI, and then select:

Edit > Text & Images [If the option is grayed out, open any file];

Text Recognition > In Multiple Files;

Add Files [Other options displayed: Add Folders and Add Open Files].

Then continue in the usual way.

Regarding dpc's new issue: I couldn't see a way to convert a folder of TIFF files directly to a single ClearScan PDF file, only to separate ClearScan PDF files. In view of the discovery reported above, though, it might be worth checking whether that option is available in your Adobe Acrobat Pro version. If it's a serious need, it might be worth considering whether a scripting utility such as AutoHotkey could provide a solution; it is said to be very powerful, though it would of course require some familiarisation. AutoHotkey has its own forum where the basic feasibility could no doubt be checked.

As you probably know, there are many scripting utilities, some with a recorder that captures keystrokes and mouse actions. If it isn't possible to detect the end of a processing step (the point where user action would normally be required), then, since the whole operation is to run unattended, it should be possible simply to insert a generous time delay.
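The "generous time delay" can be made a little smarter when a processing step writes its result to a known file. This Python sketch (Python rather than AutoHotkey, purely as an illustration) polls for that file and treats the step as finished once the file is non-empty and its size has stopped changing for a few checks, with the overall timeout acting as the generous-delay fallback. The function name and defaults are hypothetical, and the idea that a stable file size means the program has finished writing is an assumption: a program that pauses mid-write for longer than the settle period would fool it.

```python
import os
import time


def wait_for_output(path: str, timeout_s: float = 600.0,
                    poll_s: float = 2.0, settle_polls: int = 3) -> bool:
    """Wait until `path` exists, is non-empty, and its size has been
    unchanged for `settle_polls` consecutive checks.
    Returns True when that happens, False if `timeout_s` runs out first."""
    deadline = time.monotonic() + timeout_s
    last_size, stable = -1, 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # file not created yet (or briefly inaccessible)
        if size > 0 and size == last_size:
            stable += 1  # same non-zero size as the previous check
            if stable >= settle_polls:
                return True
        else:
            stable = 0  # missing, empty, or still growing: start over
        last_size = size
        time.sleep(poll_s)
    return False
```

In an unattended script you might call `wait_for_output(r"C:\scans\out\book1.pdf", timeout_s=1800)` (hypothetical path) before starting the next step, and stop or log an error if it returns False.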
Oliver
Posts: 7
Joined: 19 Dec 2022, 14:14
E-book readers owned: Kindle
Number of books owned: 300
Country: Deutschland

Re: Creating smaller PDFs

Post by Oliver »

Hello everyone,

That is a lot of information to take in for someone who has never worked with scanned books or created PDFs. Luckily, you all answered the other questions about OCR that came up along the way. Creating smaller PDFs and OCR are, of course, two very common topics.

I will try Adobe Acrobat and ABBYY FineReader once I have completed all the preparatory work, and I will use their trial periods to process the files in batches. But I don't think displaying the OCR layer will be an option for me.

Those programs make some mistakes, as I could see in your files, and Italian isn't a language I speak. I can follow these books about fungi in Italian because much of them is written in scientific terms and I know some Latin, but I am not able to correct the OCR. You can probably all imagine how much work it would be to correct a text in a language you don't know.

My university and its library are closed for two weeks over the winter. Normally the library is always open, but because of the high energy costs here in Europe this winter, they are trying to limit their energy costs this way.
But I will try scanning a book at a different resolution. I am not yet sure whether I will be able to change that setting, but I will see.

Kind regards
Oliver
cday
Posts: 445
Joined: 19 Mar 2013, 14:55
Number of books owned: 0
Country: UK

Re: Creating smaller PDFs

Post by cday »

One of the advantages of the Adobe Acrobat 'ClearScan' output option is that the output file will display an exact image of the scanned page, so even when text recognition is not perfect you will still be able to view the original page. It depends on how important complete searchability is for you.

Regarding output file size, I think the enhanced 'ClearScan' processing in the current Adobe Acrobat version should produce files about the same size as, or at most a little larger than, FineReader output that uses standard fonts, provided those fonts are embedded. FineReader's standard fonts will in general not match the original font(s), and its output may contain recognition errors. If the fonts are not embedded in the file, substitute fonts may be used when the file is viewed on another computer.

Adobe Acrobat is likely to be more expensive, but with careful planning, perhaps after some preliminary tests during a short free trial, you could complete a project that simply consists of reprocessing existing PDF files with standard settings very quickly: maybe even in a day, depending on how many files you have to convert!

Incidentally, I read somewhere in the current Adobe Acrobat support documentation that scanning at 300 DPI is now recommended for optimum results. Overall, I think you should concentrate on refining your scanning and post-processing to consistently obtain high-quality PDF files that can be processed in Acrobat or FineReader to add searchability and possibly also to optimise image compression. Using a flatbed scanner, it should be possible to obtain results that will convert very well. My earlier offer to convert some test files to Acrobat XI 'ClearScan' still stands. File sizes may be a bit larger in that earlier version, although in a file with a significant number of images the image content will dominate the overall file size.
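Since scanning resolution came up, a quick calculation shows why it matters so much for file size: the pixel count grows with the square of the DPI. This short Python sketch assumes an A4 page (210 x 297 mm) and three common colour depths; real book pages may well be smaller, and the figures are raw bitmap sizes before the JPEG, JBIG2 or other compression that Acrobat or FineReader would apply, so they are upper bounds rather than predictions of PDF size.

```python
import math

MM_PER_INCH = 25.4


def pixel_dims(width_mm: float, height_mm: float, dpi: int) -> tuple[int, int]:
    """Pixel width and height of a page scanned at `dpi`."""
    return (round(width_mm * dpi / MM_PER_INCH),
            round(height_mm * dpi / MM_PER_INCH))


def uncompressed_bytes(w: int, h: int, bits_per_pixel: int) -> int:
    """Raw bitmap size in bytes, with each row padded to a whole byte."""
    return math.ceil(w * bits_per_pixel / 8) * h


# Assumed example: an A4 page at two resolutions and three colour depths.
for dpi in (300, 600):
    w, h = pixel_dims(210, 297, dpi)
    for label, bpp in (("colour", 24), ("greyscale", 8), ("black & white", 1)):
        mb = uncompressed_bytes(w, h, bpp) / 1_000_000
        print(f"{dpi} DPI {label}: {w} x {h} px, {mb:.1f} MB uncompressed")
```

At 300 DPI an A4 page comes to 2480 x 3508 pixels, about 26.1 MB uncompressed in 24-bit colour; at 600 DPI it is 4961 x 7016 pixels and about 104.4 MB, four times as much. Dropping from 600 to 300 DPI therefore cuts the raw image data by roughly 75% before any compression is applied.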