Doubling Output resolution (DPI)

Old Man's Teeth
Posts: 4
Joined: 04 Mar 2014, 00:53

Doubling Output resolution (DPI)

Post by Old Man's Teeth »

Hi Everyone,

I'm new to this game -- found this community on Thursday and built my first scanner on Saturday using Daniel's basic plan. Thanks for sharing all this information!

The problem I'm having is that the processing stages are taking foreeeeeeever. I've tried Abbyy Reader 10 to convert my Scan Tailored tiff files to a searchable PDF, and that takes several hours. So I decided to try Acrobat Pro ClearScan and that also took a few hours. Why is it taking so long?

Does it have to do with pixel size? One inch of text in my photos spans around 500 pixels (about 500 dpi), and, as recommended in the video tutorial, I doubled the output resolution in Scan Tailor to 1000 dpi. Is this the reason my processing is so slow?
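For anyone who wants to check their own input DPI, here is a minimal sketch in Python: measure how many pixels a known length (a ruler laid on the page, say) spans in the photo, then divide by that length in inches. The function name and the sample numbers are illustrative assumptions, not anything taken from Scan Tailor.

```python
def estimate_dpi(pixel_span, inches):
    """Return dots per inch given a measured pixel span and its real length."""
    if inches <= 0:
        raise ValueError("inches must be positive")
    return pixel_span / inches


# Hypothetical measurement: a 2-inch stretch of ruler covers 1000 pixels.
print(estimate_dpi(1000, 2.0))
```

This value is what would be entered as the input DPI, rather than a guessed number.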

Why do I have to double the output resolution? Is there a standard output resolution I should shoot for? Why do my photos have such high dpi?

How long should the computer processing stage reasonably take for a 300 page text-based book?

Thanks!

OMT
rob
Posts: 773
Joined: 03 Jun 2009, 13:50
E-book readers owned: iRex iLiad, Kindle 2
Number of books owned: 4000
Country: United States
Location: Maryland, United States

Re: Doubling Output resolution (DPI)

Post by rob »

Well, typical OCR programs are optimized for images of around 300 dpi, so by going higher you're actually making the program's life more difficult and making yourself wait longer :) Lower the resolution of the images and you should get better times, but even then a 300-page book isn't going to be quick.
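To put rough numbers on why higher DPI means more work, here is a small Python sketch that counts pixels per page at different resolutions. The 6 x 9 inch page size is an assumed example, and actual OCR time need not scale exactly with pixel count, so treat this as an order-of-magnitude illustration only.

```python
def page_pixels(width_in, height_in, dpi):
    """Total pixel count of a page scanned or rendered at the given DPI."""
    return round(width_in * dpi) * round(height_in * dpi)


# Assumed page size for illustration: 6 x 9 inches.
for dpi in (300, 600, 1000):
    print(dpi, "dpi:", page_pixels(6, 9, dpi), "pixels")
```

Pixel count grows with the square of the DPI, so a 1000 dpi page holds about 11 times as many pixels as the same page at 300 dpi, and nearly 3 times as many as at 600 dpi.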
The Singularity is Near. ~ http://halfbakedmaker.org ~ Follow me as I build the world's first all-mechanical steam-powered computer.
spamsickle
Posts: 596
Joined: 06 Jun 2009, 23:57

Re: Doubling Output resolution (DPI)

Post by spamsickle »

I just ran 500 pages through Acrobat ClearScan in 45 minutes, so unless you're running a very old CPU I'd say a 300-page book shouldn't be taking hours. Try reducing your output DPI to 600 and see if that doesn't help. I've always ignored Tulon's advice and slammed 300 DPI on my input, and taken the default 600 DPI output, and it seems to have been close enough.
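As a back-of-the-envelope check, the figure above can be scaled to a 300-page book. This Python sketch assumes processing time is proportional to page count on the same machine with the same settings, which is only an approximation; the function name is made up for illustration.

```python
def estimate_minutes(ref_pages, ref_minutes, pages):
    """Scale a known run time linearly to a different page count."""
    return ref_minutes / ref_pages * pages


# 500 pages in 45 minutes works out to 5.4 seconds per page.
print(round(45 * 60 / 500, 1), "seconds per page")
print(round(estimate_minutes(500, 45, 300)), "minutes for 300 pages")
```

On that assumption a 300-page book would take roughly 27 minutes, well short of several hours.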
Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK

Re: Doubling Output resolution (DPI)

Post by Tulon »

There is probably no reason to go higher than 600 DPI on output, just as there is no reason to go higher than twice the input DPI. There is, however, a reason to have the output DPI higher than the input one. When doing geometric transformations on an image, you lose details. Upscaling the image before or during such transformations helps to preserve them. Another argument is that upscaling allows us to trade the loss of color resolution (grayscale -> B/W) for an increase in spatial resolution. There is probably no reason to go above 600 DPI on output because 600 DPI is already very high quality and you probably don't need more. If you are going to OCR Scan Tailor's output and then discard it, 300 DPI output might be fine for you. As for most OCRs being optimized for 300 DPI, I would say that any self-respecting OCR would appreciate a higher DPI input, provided the DPI is correctly specified in the file, which would be the case with Scan Tailor's output.
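The grayscale -> B/W trade-off can be illustrated with a toy one-dimensional example in Python. This is a sketch under simple assumptions (linear interpolation, a fixed threshold of 128, made-up pixel values) and is not Scan Tailor's actual algorithm: a row of gray pixels crossing an edge is binarized once directly and once after 2x upscaling.

```python
def upscale_linear(row, factor):
    """Upscale a 1-D row of gray values by linear interpolation."""
    n = len(row)
    out = []
    for j in range(n * factor):
        # Map the centre of output pixel j back into source coordinates.
        x = (j + 0.5) / factor - 0.5
        x = min(max(x, 0.0), n - 1.0)
        i = int(x)
        if i >= n - 1:
            out.append(float(row[n - 1]))
        else:
            t = x - i
            out.append(row[i] * (1 - t) + row[i + 1] * t)
    return out


def binarize(row, threshold=128):
    """Convert gray values to 0 or 1 with a fixed threshold."""
    return [1 if v >= threshold else 0 for v in row]


row = [0, 0, 160, 255, 255]  # hypothetical edge with one intermediate gray pixel
print(binarize(row))                     # [0, 0, 1, 1, 1]
print(binarize(upscale_linear(row, 2)))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

At the original resolution the edge snaps to a whole-pixel boundary (position 2.0), while in the 2x version it lands at position 2.5 in source-pixel units. The intermediate gray value has been converted into a finer edge position instead of being thrown away.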

However, messing with input DPIs is highly discouraged. You can't imagine how many times I have had to deal with complaints that were caused by people entering arbitrary values as input DPIs.
Scan Tailor experimental doesn't output 96 DPI images. It's just what your software shows when DPI information is missing. Usually what you get is input DPI times the resolution enhancement factor.
Old Man's Teeth
Posts: 4
Joined: 04 Mar 2014, 00:53

Re: Doubling Output resolution (DPI)

Post by Old Man's Teeth »

Thanks, Tulon. I won't be discarding the Scan Tailor output; the OCR software makes too many errors for me to discard the original image (plus the original just looks nicer). So I will aim for a focal length that gives me 300 dpi images, output 600 dpi in ST, and from there create PDFs, either with an OCR overlay or simply based on the image files. Does that make sense?

When creating a non-searchable PDF from the ST tiff files in Acrobat, what quality settings should I use to get a good product while keeping the file size down?
daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: Doubling Output resolution (DPI)

Post by daniel_reetz »

Old Man's Teeth wrote:When creating a non-searchable PDF from the ST tiff files in Acrobat, what quality settings should I use to get a good product while keeping the file size down?
Be sure to do a Google search of the forums - this is something that has been discussed often, other people are working on it and sharing info. I'm on a mobile phone, so I can't search it for you right now.
Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK

Re: Doubling Output resolution (DPI)

Post by Tulon »

Old Man's Teeth wrote:So I will aim for a focal length that gives me 300 dpi images, output 600 dpi in ST, and from there create PDFs
I wouldn't do that. Shoot at maximum zoom to get the maximum DPI, then set the output DPI to be twice that, or 600, whichever is less. There is no benefit in having the output DPI exactly twice the input DPI. You are not going to get exact pixel-to-pixel mapping anyway, because of deskew.
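The rule of thumb above (twice the input DPI or 600, whichever is less) can be written as a one-line Python helper. The function name and its default parameters are illustrative only and are not part of Scan Tailor.

```python
def choose_output_dpi(input_dpi, factor=2, cap=600):
    """Output DPI = input DPI times factor, but never more than cap."""
    return min(input_dpi * factor, cap)


for dpi in (250, 300, 500):
    print(dpi, "->", choose_output_dpi(dpi))
```

With the 500 dpi photos from the first post this gives 600 rather than 1000, and a 250 dpi input would give 500.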