dtic wrote:
3. A script loads images to Scan Tailor.
4. manual step: user sets DPI value (Or could this be automated reliably already?) Time: < 1 minute.

As far as I know, DPI or PPI expresses the number of dots or pixels captured digitally per inch of the actual printed material. So I think we need the dimensions of a flat page of the book being processed to determine the optimal value. Then we can compute PPI by dividing, for example, the number of pixels across the width of the scanned flat page by the width of the actual flat page in inches. Or, for standard-sized books, an average pre-calculated value may be used...
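To make the PPI arithmetic concrete, here is a minimal sketch. The function names `estimate_ppi` and `estimate_ppi_mm` are made up for illustration and are not part of Scan Tailor or any script in this thread. It assumes you already know the pixel width of the flat page in the image and have measured the real page width.

```python
MM_PER_INCH = 25.4

def estimate_ppi(pixel_width, page_width_inches):
    """Pixels per inch = pixels across the page / real page width in inches."""
    if page_width_inches <= 0:
        raise ValueError("page width must be positive")
    return pixel_width / page_width_inches

def estimate_ppi_mm(pixel_width, page_width_mm):
    """Same calculation for a page width measured in millimetres."""
    return estimate_ppi(pixel_width, page_width_mm / MM_PER_INCH)

# A page that is 6 inches wide and covers 1800 pixels in the image:
print(estimate_ppi(1800, 6.0))  # 300.0
```

The result is only as good as the page-width measurement and the crop, so for a curved or skewed page it should be treated as an estimate.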
dtic wrote:
6. manual step: user readjusts selections. Time: this is the most time consuming step! The number of user actions can be decreased a bit using this script. But improving Scan Tailor could save a lot of time here. I think the main problems are when (1) Scan Tailor misses content in the header/footer of pages e.g. page numbers and (2) Scan Tailor positions content incorrectly, e.g. text on a title page starts further down from the top but will be top aligned by Scan Tailor in automatic mode. Scan Tailor enhanced has some tweaks for problem 2 but doesn't fix it completely.

I think the most practical way to determine both the page borders and the 3D structure of the pages is to use 4-5 laser line projectors and take additional photos of the pages with the laser lines superimposed on them. I think the key to automating the post-processing lies in this step (determining the page limits and 3D form properly).
Flat page images can then be obtained from the page corners and the 3D form of the page, using a distortion correction algorithm that includes an interpolation technique.
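As a rough illustration of the corner-based part of that idea, the sketch below resamples the quadrilateral between four known page corners into a rectangle, using bilinear interpolation to read pixel values. This is a deliberately simplified stand-in: it blends the four corners linearly, so it is neither a true perspective correction nor a 3D dewarp from laser lines. The names `sample_bilinear` and `flatten_page` are hypothetical, and the image is assumed to be a grayscale list of rows.

```python
import math

def sample_bilinear(image, x, y):
    """Interpolate a grayscale value at the fractional position (x, y)."""
    h, w = len(image), len(image[0])
    x = min(max(x, 0.0), w - 1.0)  # clamp to the image bounds
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def flatten_page(image, corners, out_w, out_h):
    """Resample the quad `corners` (TL, TR, BR, BL as (x, y)) to out_w x out_h."""
    tl, tr, br, bl = corners
    out = []
    for row in range(out_h):
        v = row / (out_h - 1) if out_h > 1 else 0.0
        line = []
        for col in range(out_w):
            u = col / (out_w - 1) if out_w > 1 else 0.0
            # Blend the four corners to find where this output pixel
            # lies in the photo (assumption: the page is roughly planar).
            weights = ((1 - u) * (1 - v), u * (1 - v), u * v, (1 - u) * v)
            sx = sum(wt * p[0] for wt, p in zip(weights, (tl, tr, br, bl)))
            sy = sum(wt * p[1] for wt, p in zip(weights, (tl, tr, br, bl)))
            line.append(sample_bilinear(image, sx, sy))
        out.append(line)
    return out
```

With laser-line data, the same resampling loop would apply, but the mapping from output pixel to photo position would come from the measured page surface instead of a plain corner blend.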
dtic wrote:
As long as step 10 is included in the workflow then, for some actual use cases, 100% accuracy isn't necessary since flaws in the finished document discovered later on can be handled by going back and manually redoing some step.

That is reasonable, you're right, dtic.