Hi Everyone,
I'm new to this game -- found this community on Thursday and built my first scanner on Saturday using Daniel's basic plan. Thanks for sharing all this information!
The problem I'm having is that the processing stages are taking foreeeeeeever. I've tried Abbyy Reader 10 to convert my Scan Tailored tiff files to a searchable PDF, and that takes several hours. So I decided to try Acrobat Pro ClearScan and that also took a few hours. Why is it taking so long?
Does it have to do with pixel size? One inch of text in my photos works out to around 500 dpi and, as recommended in the video tutorial, I doubled the output resolution in Scan Tailor: 1000 dpi. Is this the reason my processing is so slow?
Why do I have to double the output resolution? Is there a standard output resolution I should shoot for? Why do my photos have such high dpi?
How long should the computer processing stage reasonably take for a 300 page text-based book?
Thanks!
OMT
Doubling Output resolution (DPI)
Moderator: peterZ
- rob
- Posts: 773
- Joined: 03 Jun 2009, 13:50
- E-book readers owned: iRex iLiad, Kindle 2
- Number of books owned: 4000
- Country: United States
- Location: Maryland, United States
Re: Doubling Output resolution (DPI)
Well, typical OCR programs are optimized for images of around 300 dpi, so by going higher you're actually making the program's life more difficult and making yourself wait longer. Lower the resolution of the images and you should get better times, but even so, a 300-page book isn't going to be quick.
The Singularity is Near. ~ http://halfbakedmaker.org ~ Follow me as I build the world's first all-mechanical steam-powered computer.
-
- Posts: 596
- Joined: 06 Jun 2009, 23:57
Re: Doubling Output resolution (DPI)
I just ran 500 pages through Acrobat Clearscan in 45 minutes, so unless you're running with a very old CPU I'd say a 300-page book shouldn't be taking hours. Try reducing your output DPI to 600 and see if that doesn't help. I've always ignored Tulon's advice and slammed 300 DPI on my input, and taken the default 600 DPI, and it seems to have been close enough.
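To put rough numbers on why 1000 DPI output is so much heavier than 600, here is a minimal Python sketch of how the pixel count per page grows with DPI. The 6 x 9 inch page size and the function names are assumptions made for illustration, not figures from this thread, and OCR time will not necessarily scale exactly with pixel count.

```python
def pixels_per_page(dpi, width_in=6.0, height_in=9.0):
    """Pixel count of one page at the given resolution.

    The 6 x 9 inch default page size is an assumed example.
    """
    return round(width_in * dpi) * round(height_in * dpi)


def pixel_ratio(dpi_a, dpi_b):
    """How many times more pixels a page has at dpi_a than at dpi_b.

    Pixel count grows with the square of the DPI, so the page size cancels out.
    """
    return (dpi_a / dpi_b) ** 2


if __name__ == "__main__":
    for dpi in (300, 600, 1000):
        megapixels = pixels_per_page(dpi) / 1e6
        print(f"{dpi:>4} dpi: {megapixels:.1f} megapixels per page")
    print(f"1000 vs 600 dpi: {pixel_ratio(1000, 600):.2f}x the pixels")
    print(f"1000 vs 300 dpi: {pixel_ratio(1000, 300):.2f}x the pixels")
```

Under these assumptions, going from 600 to 1000 DPI nearly triples the pixels the OCR program has to chew through on every page.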
Re: Doubling Output resolution (DPI)
There is probably no reason to go higher than 600 DPI on output, just as there is no reason to go higher than twice the input DPI. There is, however, a reason to have the output DPI higher than the input one. When doing geometric transformations on an image, you lose details. Upscaling the image before or during such transformations helps to preserve them. Another argument is that upscaling allows us to trade the loss of color resolution (grayscale -> B/W) for an increase in spatial resolution. There is probably no reason to go above 600 DPI on output because 600 DPI is already very high quality and you probably don't need more. If you are going to OCR Scan Tailor's output and then discard it, 300 DPI output might be fine for you. As for most OCRs being optimized for 300 DPI, I would say that any self-respecting OCR would appreciate a higher DPI input, provided the DPI is correctly specified in the file, which would be the case with Scan Tailor's output.
However, messing with input DPIs is highly discouraged. You can't imagine how many times I have had to deal with complaints that were caused by people putting arbitrary values as input DPIs.
Scan Tailor experimental doesn't output 96 DPI images. It's just what your software shows when DPI information is missing. Usually what you get is input DPI times the resolution enhancement factor.
-
- Posts: 4
- Joined: 04 Mar 2014, 00:53
Re: Doubling Output resolution (DPI)
Thanks, Tulon. I won't be discarding the Scan Tailor output: the OCR software makes too many errors to be able to discard the original image (plus the original just looks nicer). So I will aim for a focal length that gives me 300 dpi images, output 600 dpi in ST, and from there create PDFs, either with an OCR overlay or simply based on the image files. Does that make sense?
When creating a non-searchable PDF from the ST tiff files in Acrobat, what quality settings should I use to get a good product while keeping the file size down?
- daniel_reetz
- Posts: 2812
- Joined: 03 Jun 2009, 13:56
- E-book readers owned: Used to have a PRS-500
- Number of books owned: 600
- Country: United States
Re: Doubling Output resolution (DPI)
Old Man's Teeth wrote: When creating a non-searchable PDF from the ST tiff files in Acrobat, what quality settings should I use to get a good product while keeping the file size down?
Be sure to do a Google search of the forums - this is something that has been discussed often, other people are working on it and sharing info. I'm on a mobile phone, so I can't search it for you right now.
Re: Doubling Output resolution (DPI)
Old Man's Teeth wrote: So I will aim for a focal length that gives me 300 dpi images, output 600 dpi in ST, and from there create PDFs
I wouldn't do that. Shoot at maximum zoom to get the maximum DPI, then set the output DPI to be twice that, or 600, whichever is less. There is no benefit in having the output DPI exactly twice the input DPI. You are not going to get exact pixel-to-pixel mapping anyway, because of deskew.
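The rule of thumb above (twice the input DPI or 600, whichever is less) can be written as a tiny Python helper. This is only an illustrative sketch: the function name `suggested_output_dpi` and the `cap` parameter are made up here and are not part of Scan Tailor.

```python
def suggested_output_dpi(input_dpi, cap=600):
    """Return twice the input DPI, or the cap, whichever is less.

    The default cap of 600 comes from the advice in this thread;
    the function itself is a hypothetical helper, not a Scan Tailor API.
    """
    if input_dpi <= 0:
        raise ValueError("input_dpi must be positive")
    return min(2 * input_dpi, cap)


if __name__ == "__main__":
    for input_dpi in (250, 300, 500):
        print(f"input {input_dpi} dpi -> output {suggested_output_dpi(input_dpi)} dpi")
```

So photos that really are around 500 dpi would still go out at 600, not 1000, while a 250 dpi input would go out at 500.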