Scan Tailor

Scan Tailor specific announcements, releases, workflows, tips, etc. NO FEATURE REQUESTS IN THIS FORUM, please.

Moderator: peterZ

intermediatic
Posts: 11
Joined: 23 Apr 2010, 23:14

Re: Scan Tailor

Post by intermediatic »

Hmm... indeed that could be the problem. The first type of page is 1649 x 1275 at 150 dpi, the second is 1100 x 850 at 100 dpi.

I tried it under Parallels (which is really the same thing as running Windows XP) and it was much faster, but I got the same problem.

A number of different sizes are indeed given in the initial import panel where I can fix DPIs. These seem to roughly match the correct DPI in the files. When I fix them, this seems to cause the problem.

Are you suggesting that I pre-process them in another program?

I suspect I'm doing something really stupid.
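
For reference, the physical size implied by the numbers quoted above is just pixels divided by DPI. A minimal Python sketch (the function name is made up for illustration) suggests both page types describe roughly the same 11 x 8.5 inch page, so the two DPI values look plausible on their own:

```python
def physical_size_inches(width_px, height_px, dpi):
    """Return the (width, height) in inches implied by a pixel size and a DPI."""
    return (width_px / dpi, height_px / dpi)

# The two page types quoted in the post above.
first = physical_size_inches(1649, 1275, 150)
second = physical_size_inches(1100, 850, 100)

print(first)   # roughly 10.99 x 8.5 inches
print(second)  # 11.0 x 8.5 inches
```

If both groups come out at the same physical size like this, the mismatch is more likely in what the files claim than in the scans themselves.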
intermediatic
Posts: 11
Joined: 23 Apr 2010, 23:14

Re: Scan Tailor

Post by intermediatic »

I found a program called smallimage (for the Mac) and preprocessed them. Now all the files are the same DPI. Great news! Except that now ALL the pages are postage-stamp size in a sea of white (see above picture). Judging from your method of estimating, all the images are roughly 100 x 100 DPI, but Scan Tailor won't let me use that value. The pages that Scan Tailor DID process on the earlier run looked awesome and were totally readable, so I'm not sure why it won't let me enter 100 x 100.
Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK

Re: Scan Tailor

Post by Tulon »

Scan Tailor won't accept anything less than 150 DPI, as in 98% of cases that would produce such poor results that you shouldn't even try. You can still fake a higher DPI, but you need to make it proportionally higher for other pages as well.
Anyway, the problem you describe is not about low DPIs. Pages ending up tiny is caused by them having exaggerated DPIs, for example a file claims it's 600 DPI while in reality it's 100 or 150. If you inspect the DPIs for each page in the "Fix DPI" dialog, you should be able to locate and fix such files there. Note that it doesn't change the files themselves, it just saves the correct values to Scan Tailor's project file. If you've got just two size groups, just apply the correct DPIs for each group. There is no point in trying to locate problematic pages individually if you can mass-fix them. And again, don't do it in the "Needs Fixing" tab, as it only contains obviously wrong pages. Do it in the "All Pages" tab instead. BTW, if the "Fix DPI" dialog is not appearing automatically, there is an option to force it to appear in the "Project File" dialog.
Scan Tailor experimental doesn't output 96 DPI images. It's just what your software shows when DPI information is missing. Usually what you get is input DPI times the resolution enhancement factor.
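
To illustrate the two points above (an exaggerated DPI shrinks a page, and faked DPIs have to be raised proportionally), here is a rough Python sketch. It is not Scan Tailor code; the function names are invented, and the 150 DPI floor is simply the figure mentioned in this post.

```python
MIN_DPI = 150  # lower limit mentioned in this post; assumed, not read from Scan Tailor

def implied_width_inches(width_px, claimed_dpi):
    """Physical width a page would have if the claimed DPI were true."""
    return width_px / claimed_dpi

def scale_dpis_proportionally(true_dpis, min_dpi=MIN_DPI):
    """Multiply every DPI by the same factor so the smallest reaches min_dpi.

    Keeping the ratio between groups preserves the relative page sizes.
    """
    factor = max(1.0, min_dpi / min(true_dpis.values()))
    return {name: dpi * factor for name, dpi in true_dpis.items()}

# A 1100 px wide scan of an 11 inch page is really 100 DPI.
print(implied_width_inches(1100, 100))  # 11.0 inches: correct
print(implied_width_inches(1100, 600))  # under 2 inches: a "postage stamp"

# Two size groups at 150 and 100 DPI; both are raised by the same factor.
print(scale_dpis_proportionally({"group_a": 150, "group_b": 100}))
# {'group_a': 225.0, 'group_b': 150.0}
```

The same factor is applied to every group so that pages keep their sizes relative to each other; raising only the low group would make those pages come out smaller than the rest.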
rob
Posts: 773
Joined: 03 Jun 2009, 13:50
E-book readers owned: iRex iLiad, Kindle 2
Number of books owned: 4000
Country: United States
Location: Maryland, United States

Re: Scan Tailor

Post by rob »

Misty wrote: I'm pretty late to the party here, but I got the chance to try out the version of Scan Tailor with Rob's dewarping algorithm. It seems to have somewhat... wacky results, at least with the book I tested. Could this be the result of something I did wrong? It's set to output DPI 600, black and white, default thickness. I know the algorithm isn't complete yet, so I may just be running into its current limitations.
Could you send me the original file? Unless the file in the post is at the original size (seems too small!). I can take a look at my standalone dewarper. I think I recall that Tulon removed some of the options to the dewarper.

What is the physical size of the original page?

Thanks,

--Rob
The Singularity is Near. ~ http://halfbakedmaker.org ~ Follow me as I build the world's first all-mechanical steam-powered computer.
Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: Scan Tailor

Post by Misty »

Rob: Sure. I've attached the output TIFF from Scan Tailor - do you only need that, rather than the original page?

The original page is about 5.5 x 8.5".
Attachments
0130_2010DW001.131.tiff
(188.52 KiB)
The opinions expressed in this post are my own and do not necessarily represent those of the Canadian Museum for Human Rights.
rob
Posts: 773
Joined: 03 Jun 2009, 13:50
E-book readers owned: iRex iLiad, Kindle 2
Number of books owned: 4000
Country: United States
Location: Maryland, United States

Re: Scan Tailor

Post by rob »

I ran my standalone dewarper (v1.1, which can be found here), with no extra parameters (using the other parameters resulted in worse output). It's not great, but it's much better than the output from Scan Tailor...

The main problem with dewarping this page is that the page doesn't contain a lot of full cross-page lines, which is what the dewarper relies on. I haven't yet found an adequate line estimator that will "nicely" handle partial lines.
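
To make the "full cross-page lines" point concrete, here is a toy Python sketch. It is not taken from the dewarper; the input format, the function name, and the 0.8 threshold are all invented for illustration. It only counts how many detected text lines span most of the page width, which is the kind of line described above.

```python
def full_width_lines(line_boxes, page_width, min_fraction=0.8):
    """Return the text lines whose horizontal extent covers most of the page.

    line_boxes: list of (x_start, x_end) pixel extents, one per detected line
                (hypothetical input; a real line detector would supply these).
    min_fraction: assumed threshold for calling a line "full width".
    """
    return [
        (x0, x1)
        for (x0, x1) in line_boxes
        if (x1 - x0) / page_width >= min_fraction
    ]

# A page made mostly of short, partial lines (e.g. a list or a contents page).
boxes = [(100, 400), (100, 950), (120, 300), (100, 520)]
usable = full_width_lines(boxes, page_width=1000)
print(len(usable), "of", len(boxes), "lines span the page")
```

On a page like this, only one line in four would qualify, which gives a feel for why a method built on full-width lines has little to work with.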

--Rob
no-params.png
(473.14 KiB)
Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: Scan Tailor

Post by Misty »

Thanks, Rob. It looks like dewarping won't help for this book, then; the dewarped output actually resulted in more errors when doing OCR, since the dewarped portions were fairly mild while it introduced some extra warping into previously readable sections.
leescott
Posts: 19
Joined: 29 Apr 2010, 03:17

Re: Scan Tailor

Post by leescott »

I found this forum through Google. So many people here are working hard on scanning and promoting the exchange of knowledge.
Dewarping and deskewing are attractive topics, and very promising ones for the people here.
I want to say a few words about these difficult problems. The aim of dewarping and deskewing is to make all text lines parallel to each other: first deskewing, then dewarping.
A more advanced requirement is to work like OCR software, dividing the picture into text, image, and other sections.
I found an interesting, simple thing: every text line has a virtual upper line and a virtual lower line, and for English there are also two virtual middle lines in every text line.
I hope this simple finding is helpful in making dewarping effective. Thanks, Rob, for all your efforts!