Postprocessing: The hardware/software divide

tibob
Posts: 24
Joined: 18 Nov 2011, 16:38
E-book readers owned: None
Number of books owned: 100

Re: Postprocessing: The hardware/software divide

Post by tibob »

Hello everyone,

I'm new to book scanners, and have built a simplified version of the standard book scanner. No drawers, no platen (I need to build one). I shot a book and tried to get something usable out of my "raw" pictures.

I was disappointed by Scan Tailor (it's promising, but there is no handling of keystoning, which I need badly) and bsw did not work for me (I wanted to crop + de-keystone every picture separately). Both projects seem to be dead.
As I wanted to play around with Qt, I started my own software, which I called "Yet Another Scan Wizard" (you have to give the project a name in Qt Creator...) ;-) This piece of code is not releasable and I don't know if it ever will be, as I can work on it for 1 hour a day at most.

Now I have read this thread and had a look at Scan Tailor's code. Scan Tailor is a very good basis and is written in Qt, but the code is not easy to read: there are very few comments, and tons of classes whose purpose I don't even know, like SystemLoadWidget. My idea would have been to replace/complete the "rotation" step with de-keystoning + rotation (as in bsw), but I did not even find the class corresponding to it... I ought to find it by reading the code carefully, but this is *not* an easy task.

So I think I will continue working on "Yet Another Scan Wizard", and if I get something valuable, I'll share it with the community.
And I've got to shoot better pictures (and build a platen) to ease the post processing ;-)
daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: Postprocessing: The hardware/software divide

Post by daniel_reetz »

I moved this topic into general discussion because, well, it's no longer ST specific.

tibob, I'm interested in your new package... but just out of curiosity... would you be interested in implementing keystone correction into ST instead? As I mentioned, I'll trade you a scanner frame for your efforts. In any case I'm interested in what you are producing.

BTW, be careful calling things dead around here. :D We have a real way of bringing up old threads and digging up old software. ;)
Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK

Re: Postprocessing: The hardware/software divide

Post by Tulon »

Mmm, actually dekeystoning, being a special case of dewarping, is there. It's a bit discouraging when something you worked on for a whole year goes unnoticed. It's not in the Deskew stage, where it logically should be, as putting it there requires more work on both the architectural and the image processing sides.

Scan Tailor is not completely dead either. I just spend significantly less time on it than I used to. I am also not accepting any feature requests, but that's hardly news.
Scan Tailor experimental doesn't output 96 DPI images. It's just what your software shows when DPI information is missing. Usually what you get is input DPI times the resolution enhancement factor.
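
To make that arithmetic concrete, here is a minimal Python sketch. It is not Scan Tailor code: the function names are made up for illustration, and the 96 fallback simply models a viewer that shows 96 DPI when the file carries no DPI information.

```python
def output_dpi(input_dpi, enhancement_factor):
    """Output DPI as described above: input DPI times the enhancement factor."""
    if input_dpi <= 0 or enhancement_factor <= 0:
        raise ValueError("DPI and factor must be positive")
    return input_dpi * enhancement_factor


def displayed_dpi(metadata_dpi, fallback=96):
    """What a viewer might report: the stored DPI, or a fallback if it is missing.

    The fallback value is an assumption about the viewing software,
    not something the image itself contains.
    """
    return metadata_dpi if metadata_dpi else fallback
```

For example, a 300 DPI input with a factor of 2 would give a 600 DPI output, while a file without DPI metadata would still be reported as 96 by such a viewer.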
tibob
Posts: 24
Joined: 18 Nov 2011, 16:38
E-book readers owned: None
Number of books owned: 100

Re: Postprocessing: The hardware/software divide

Post by tibob »

Sorry about calling Scan Tailor "dead": I misunderstood Tulon, and the devel mailing list has not been active for months.

@Tulon: I know the dewarping functionality of Scan Tailor, and it is great (and I really mean it). The problem is, as you said, that it comes too late in the processing to be useful to me: I need dewarping/dekeystoning before cropping.

@Daniel: for now, I won't code dekeystoning in Scan Tailor because I don't want to change its architecture (too much work). But I'm still playing with "yasw", and will share it as soon as I have something working.
daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: Postprocessing: The hardware/software divide

Post by daniel_reetz »

Well, I for one am very interested in what you come up with!
dtic
Posts: 464
Joined: 06 Mar 2010, 18:03

Re: Postprocessing: The hardware/software divide

Post by dtic »

Good discussion!

I'm not worried about using several programs. I think it might be better to concentrate on making the transition between them as quick and easy as possible. For that it would be useful to collect and share scripts for renaming, sorting, rotating and moving the images and preprocessing (cropping) for ScanTailor. And also collecting and sharing workflows, like JonEP does in the OP.
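
As an example of the kind of renaming script meant here, a rough Python sketch follows. It assumes a two-camera setup with one folder per camera, file lists that are already in shooting order, and the left camera capturing the first page of each spread; the function names and the "page_0001.jpg" naming scheme are made up for illustration.

```python
import shutil
from pathlib import Path


def plan_interleave(left_names, right_names, prefix="page", ext=".jpg"):
    """Return (source, new_name) pairs alternating left and right pages.

    Assumes both lists are sorted in shooting order. If the lists differ
    in length, the extra images of the longer list are ignored.
    """
    plan = []
    number = 1
    for left, right in zip(left_names, right_names):
        for source in (left, right):
            plan.append((source, f"{prefix}_{number:04d}{ext}"))
            number += 1
    return plan


def copy_interleaved(left_dir, right_dir, out_dir, ext=".jpg"):
    """Copy images from two camera folders into one numbered sequence."""
    left_dir, right_dir, out_dir = Path(left_dir), Path(right_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    lefts = sorted(p for p in left_dir.iterdir() if p.suffix.lower() == ext)
    rights = sorted(p for p in right_dir.iterdir() if p.suffix.lower() == ext)
    for source, new_name in plan_interleave(lefts, rights, ext=ext):
        # Copy rather than move, so the original shots stay untouched.
        shutil.copy2(source, out_dir / new_name)
```

Keeping the planning step separate from the copying step makes it easy to print the plan and check it before touching any files.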
JonEP wrote: I have finally figured out that it is very important not to allow the entire 1/2 platen to be included in the image taken by the camera, as this poses problems for scan tailor (it can't accurately find the book edges) [note -- I noticed Daniel's new machine is taking the entire 1/2 platen image].
I agree that ST not autodetecting the page/content correctly is an issue that needs to be worked around. But more careful monitoring of exact camera position/zoom can also be time consuming, at least when using simple cardboard type cradles that tend to move somewhat. I think this would be a more general solution: put a special color and/or pattern on the cradle that some software can detect and use for autocropping, as an ST preprocessing step.
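
A very rough Python sketch of that idea, assuming the cradle is painted in one known, fairly uniform color and the image is available as rows of (r, g, b) tuples. The function names and the tolerance value are illustrative only; real photos would need something more robust against shadows and uneven lighting.

```python
def is_marker(pixel, marker, tolerance=40):
    """True if every channel of pixel is within tolerance of the marker color."""
    return all(abs(p - m) <= tolerance for p, m in zip(pixel, marker))


def page_bounding_box(image, marker, tolerance=40):
    """Return (left, top, right, bottom) of the non-marker area, or None.

    image is a list of rows, each row a list of (r, g, b) tuples.
    right and bottom are exclusive, so they can be used as slice ends.
    """
    xs, ys = [], []
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if not is_marker(pixel, marker, tolerance):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # the whole image looks like cradle
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1


def crop(image, box):
    """Cut the bounding box out of the image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]
```

The cropped images could then be handed to ST, which would no longer see the cradle at all.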
revjoe
Posts: 22
Joined: 17 Jan 2011, 15:31

Re: Postprocessing: The hardware/software divide

Post by revjoe »

Tangenting back to the discussion of developing individual or platform tools, has anyone looked at the tools used by Project Gutenberg's Distributed Proofreaders?

As I see it, their tools are less image, and more text related (I think they are assuming that the initial OCR is as good as it will get, so they start from the position of working from that text, rather than doing image correction), but many of their processes I think overlap with what we do here, and their tools might as well.

Things like:
* Sorting Images
* OCR
* Swapping out common OCR errors
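
To give an idea of what the last item could look like, here is a small Python sketch. It is not one of their tools: the substitution table and the function name are made up for illustration, and a real table would have to be tuned per book and per OCR engine.

```python
import re

# Hypothetical substitution table (assumption, not taken from any real tool).
COMMON_FIXES = {
    "tbe": "the",
    "tlie": "the",
    "wbich": "which",
}


def fix_ocr_errors(text, fixes=COMMON_FIXES):
    """Replace whole-word OCR misreadings using a lookup table.

    Matching is case-sensitive and limited to whole words, so a table
    entry never rewrites part of a longer word.
    """
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, fixes)) + r")\b")
    return pattern.sub(lambda match: fixes[match.group(1)], text)
```

Restricting the match to whole words is a deliberately cautious choice: it misses some errors, but it avoids damaging text that was already correct.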

They also seem to have had a number of discussions over the same topics we are covering (cli vs. gui particularly). I am not sure they have solved all those themselves, but the reading was educational for me, and I thought I would pass the links along.

http://www.pgdp.net/phpBB2/viewforum.php?f=13

(you might have to sign up).
cfmorrill
Posts: 56
Joined: 17 Apr 2011, 21:20
Number of books owned: 0
Location: Charlottesville, Virginia

Re: Postprocessing: The hardware/software divide

Post by cfmorrill »

An interesting, thoughtful, thread. I suspect there will be more in this section of the forum when more of us get here. I've just finished a scanner and have started reading with the idea of beginning scans.

The state of book scanner software reminds me of some adventures I had cnc-ing a small milling machine nearly 20 years ago now. It was quite frustrating, but also rewarding when one finally understood a small part of what was going on.

My sense is that it would be a mistake to try and come up with one large software package. It simply asks too much of too few people.

So, we're going between pieces of software. Consequently, I'm finding the posts describing current workflows to be the most helpful, and I read them as much as I can.

I think it might be most helpful if we had a small "diy book scanner fair" on either coast of the US sometime this year, or maybe piggyback on a maker fair? We could bring a couple scanners together and show each other exactly how we do things and why. We could skype with folks overseas at such events and start to connect some names with faces.

Actually, get this, I think the biggest impediment to the diy book scanner movement currently is that so many of us do not list our locations. If I knew you lived next door, or even in the next state, it would help. I would know, for example, whether it would even make sense to get together in one area or whether we should all get better at youtube videos...

Finally, I'd also like to suggest that if any of us find something like ST helpful, ship the guy some cash to say thanks.

Charles
tibob
Posts: 24
Joined: 18 Nov 2011, 16:38
E-book readers owned: None
Number of books owned: 100

Re: Postprocessing: The hardware/software divide

Post by tibob »

Okay, you can have a first look at my work at https://github.com/tibob/yasw

You need git and Qt. Here are the (short) instructions for Linux:
git clone https://github.com/tibob/yasw.git
cd yasw/src/
qmake
make
(copy a few scanned pages here in yasw/src)
./yasw

Select an image,
- the first tab (Base Filter) does nothing, it's the base class all filters inherit from.
- the second tab (Rotation) is to rotate the image
- the third tab (Dekeystoning) is to transform the polygon (drag and drop the edges) into a rectangle.
Use the "preview" check box to see the result
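
For those curious about the math behind turning a four-corner polygon into a rectangle, here is a self-contained Python sketch of the usual perspective (homography) approach. It is not the yasw code, which is written in C++/Qt; the function names and the corner order (top-left, top-right, bottom-right, bottom-left) are assumptions made for this example.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate quadrilateral")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def quad_to_rect_homography(quad, width, height):
    """Return 8 coefficients mapping the quad corners onto a rectangle.

    quad lists the corners as (x, y) in the order top-left, top-right,
    bottom-right, bottom-left; the target is (0, 0)..(width, height).
    """
    rect = [(0, 0), (width, 0), (width, height), (0, height)]
    a, b = [], []
    for (x, y), (u, v) in zip(quad, rect):
        # Two equations per corner, one for each target coordinate.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b)


def apply_homography(h, x, y):
    """Map a source point (x, y) to its position in the rectangle."""
    w = h[6] * x + h[7] * y + 1
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)
```

To actually resample an image one would normally compute the mapping in the other direction (rectangle to polygon) and look up a source pixel for every output pixel, so that the result has no holes.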

A new filter can be created by subclassing BaseFilter. See the (incomplete) documentation in documentation/doxygen (run "make" in yasw/documentation, you will need doxygen) or read yasw/src/filter/rotation/* for a simple class.

This is very early work, my next steps are:
- implement cropping
- handling of a project (choose and sort source images; load and save parameters from filters; load and save projects)
- handling of output
- develop/port more filters like autocalibration (see the checkerboard thread), color adjustment.
JonEP
Posts: 81
Joined: 19 Apr 2010, 15:09

Re: Postprocessing: The hardware/software divide

Post by JonEP »

Any news on any of this?