
Scan Tailor

Posted: 27 Sep 2009, 13:45
by daniel_reetz
My friend Mary M. just pointed me to this very interesting software package, Scan Tailor.
Scan Tailor is an interactive post-processing tool for scanned pages. It performs operations such as page splitting, deskewing, adding/removing borders, and others. You give it raw scans, and you get pages ready to be printed or assembled into a PDF or DjVu file. Scanning, optical character recognition, and assembling multi-page documents are out of the scope of this project.
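Deskewing is one of the operations listed above. This thread doesn't say how Scan Tailor does it. As a rough illustration only, here is the textbook projection-profile method, which is not necessarily Scan Tailor's algorithm. It tries a range of candidate angles, de-rotates the ink pixels by each one, and keeps the angle at which the pixels bunch most tightly into horizontal rows. The point list, the angle range, and the step size are all hypothetical.

```python
import math

def skew_score(points, angle_deg, bin_size=1.0):
    """De-rotate ink pixels by angle_deg and measure how sharply they bunch into rows."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    rows = {}
    for x, y in points:
        # y-coordinate of (x, y) after rotating it by -angle_deg about the origin
        y_rot = -x * sin_a + y * cos_a
        key = math.floor(y_rot / bin_size)
        rows[key] = rows.get(key, 0) + 1
    # The sum of squared row counts is largest when each text line lands in few rows
    return sum(count * count for count in rows.values())

def estimate_skew(points, max_angle=5.0, step=0.1):
    """Try angles in [-max_angle, max_angle] and return the best-scoring one (degrees)."""
    n = int(round(max_angle / step))
    candidates = [i * step for i in range(-n, n + 1)]
    return max(candidates, key=lambda angle: skew_score(points, angle))

# Synthetic "page": ten text lines drawn with a 2-degree skew (image y axis points down)
slope = math.tan(math.radians(2.0))
ink = [(x, 10 + 20 * line + x * slope) for line in range(10) for x in range(0, 500, 2)]
print(estimate_skew(ink))
```

To straighten the page, you would rotate the image by the opposite of the estimated angle. A real implementation would work on a binarized, downsampled image rather than a Python list of points.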

Re: Scan Tailor

Posted: 27 Sep 2009, 15:58
by spamsickle
This looks good. It's a bit slow, but most pages are recognized adequately in automatic mode. There are some problems: a few page numbers get clipped out, a few pages "recognize" bits of the facing page, and blank pages cause trouble (which can be made moot by running Scan Tailor after merging the left and right views). Still, it's robust and flexible enough that I'll probably stop using YAPP. I'll still be using ImageMagick: as far as I can tell, Scan Tailor only outputs TIFF files, and I still need to convert them to PDFs. Also, you can bloat the original images 10-20 times by choosing output parameters poorly (color at 600 DPI turns a 1.5 MB JPEG into a 25 MB TIFF), but the "mixed" mode does a good job of producing crisp text while still preserving greyscale images.
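The bloat follows from simple arithmetic. Uncompressed raster size is roughly width × height × DPI² × bits per pixel / 8. The sketch below uses a hypothetical 6 × 9 inch page. Real TIFF files are usually compressed (for example with LZW or Deflate, or CCITT Group 4 for black-and-white), so actual sizes will be smaller than these figures and will vary from page to page.

```python
def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Bytes needed for a raster page with no compression (rows padded to whole bytes)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px

# Hypothetical 6 x 9 inch page scanned/output at 600 DPI
for label, bpp in [("color (24-bit)", 24), ("grayscale (8-bit)", 8), ("black & white (1-bit)", 1)]:
    size = uncompressed_size_bytes(6, 9, 600, bpp)
    print(f"{label}: {size / 1e6:.1f} MB")
```

Going from 24-bit color to 1-bit black-and-white cuts the raw size by a factor of 24. Halving the DPI cuts it by another factor of 4.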

I need to play with it some more, but I think this is going to become my main post-processing engine, at least until something better comes along. Thanks for the tip.

Re: Scan Tailor

Posted: 28 Sep 2009, 11:27
by Turtle
That's a great find. It splits pages automatically pretty well. Too bad there's no option to use just one feature, such as page splitting, on its own. You have to run your pages through the whole process, which is very time-consuming.

Re: Scan Tailor

Posted: 28 Sep 2009, 12:28
by daniel_reetz
Mary's pointed me to a few interesting things now. Thanks for taking the time to check this out and come back with your experiences.

You know, if Scan Tailor had a few extra features, and especially a "camera model" (in other words, one that takes focal length and lens distortion into account), it could be a killer processor. You could probably get this done with Fulla from the Hugin suite, or some other panotools program.
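A "camera model" of this kind usually means correcting radial lens distortion, with pixel coordinates scaled by the focal length. Fulla and panotools use their own parameterization. The sketch below instead uses the common Brown-style radial polynomial, 1 + k1·r² + k2·r⁴, which is an assumption made for illustration. The values `focal_px`, `cx`, `cy`, `k1` and `k2` stand for hypothetical calibration results for a particular camera and lens.

```python
def distort_normalized(xu, yu, k1, k2):
    """Radial model: scale an ideal point by 1 + k1*r^2 + k2*r^4 (normalized coords)."""
    r2 = xu * xu + yu * yu
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return xu * factor, yu * factor

def undistort_normalized(xd, yd, k1, k2, iterations=20):
    """Invert distort_normalized by fixed-point iteration (fine for mild distortion)."""
    xu, yu = xd, yd  # first guess: assume no distortion
    for _ in range(iterations):
        r2 = xu * xu + yu * yu
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        xu, yu = xd / factor, yd / factor
    return xu, yu

def undistort_pixel(u, v, focal_px, cx, cy, k1, k2):
    """Map a pixel in the photo to where an ideal pinhole lens would have put it."""
    # Normalize by the focal length so k1, k2 don't depend on image resolution
    xd, yd = (u - cx) / focal_px, (v - cy) / focal_px
    xu, yu = undistort_normalized(xd, yd, k1, k2)
    return cx + xu * focal_px, cy + yu * focal_px
```

With this sign convention, a negative `k1` produces barrel distortion (straight text lines bow outward). A full correction would apply `undistort_pixel` to every output pixel's source position and resample the image.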

His page says he's looking for developer help. If only I had any worthwhile programming skillz...

Re: Scan Tailor

Posted: 28 Sep 2009, 16:25
by rob
Fascinating... I'm going to take a look!

Re: Scan Tailor

Posted: 29 Sep 2009, 01:29
by Mandor
Maybe you don't know this, but Scan Tailor was written as a "reply" to Scan Kromsator, a very powerful but very complicated and poorly documented program. Many users in Russia used SK for post-scan image processing.

Re: Scan Tailor

Posted: 29 Sep 2009, 02:08
by daniel_reetz
That is super-interesting, Mandor. I just found the abbreviated guide to Kromsator. I speak enough Russian to understand the instructions, but I don't recognize or understand the word "kromsator". Does it sound like anything to you?

Re: Scan Tailor

Posted: 29 Sep 2009, 02:18
by Mandor
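Well, you can use the Толковый словарь русского языка (Explanatory Dictionary of the Russian Language):
КРОМСАТЬ (-аю, -аешь; past passive participle кромсанный; imperfective, transitive; colloquial): to cut something into pieces roughly and carelessly. Example: кромсать хлеб ("to hack up bread").
So it means roughly "to hack something carelessly into pieces."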

Re: Scan Tailor

Posted: 29 Sep 2009, 02:45
by daniel_reetz
Thanks for the link and explanation. I usually use Katzner's dictionary, but it's in a box out in my workshop. I'll use the Толковый словарь from now on... it certainly looks more complete than the Promt online engine...

I love Russian for words like "неаккурат()" ("careless, sloppy").

Re: Scan Tailor

Posted: 29 Sep 2009, 11:00
by rob
Ha, the only Russian I know is "боже мой!" ("my God!")

Anyway, I compiled Scan Tailor on OS X, and it seems pretty interesting. However, it doesn't seem to handle the two major problems of shooting with cameras. The first is splitting the page properly (it almost always chooses the wrong side for the page [EDIT: I misinterpreted Scan Tailor's output and found it was actually selecting the proper side]). The second is keystone correction (as in, there is none). Here's an example of the auto-deskewed version of a page. Notice that no fixed skew amount will correct a keystoned image.
[Attached image: scantailor-nokeystoning.png]
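Rob's point is that keystoning is a perspective (projective) distortion, not a rotation. A rotation keeps every edge the same length, so no deskew angle can square up a page whose top edge photographs shorter than its bottom edge. A standard fix maps the four photographed page corners onto a rectangle with a 3×3 homography. The sketch below is plain Python with hypothetical corner coordinates. It is not code from Scan Tailor or PostProcessor.

```python
def solve(a, b):
    """Solve the linear system a·x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def homography(src, dst):
    """3x3 matrix H (with H[2][2] = 1) mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), rearranged to be linear in h
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]

def warp_point(h, x, y):
    """Apply homography h to point (x, y), including the perspective divide."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

# Hypothetical keystoned page: top edge 760 px wide, bottom edge 1000 px wide
PAGE_CORNERS = [(120, 50), (880, 50), (1000, 1350), (0, 1350)]  # TL, TR, BR, BL
TARGET = [(0, 0), (1000, 0), (1000, 1300), (0, 1300)]           # upright rectangle
H = homography(PAGE_CORNERS, TARGET)
print(warp_point(H, 500, 700))  # where a point mid-page lands after correction
```

To rectify a whole image, you would invert `H` and, for each output pixel, sample the photo at the inverse-mapped position.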
I really should work on PostProcessor again...

--Rob