My name is Gem, from Turkey. I'm new here, and very interested in designing amateur book scanner systems. Because I'm a researcher in Turkish Natural Language Processing, I need Turkish texts from various kinds of books to work on. As anyone know, traditional book scanning is a very time consuming task for human and destructive task for books. So I'm trying to find a speedier and gentle to run, practical and low-cost to construct solution for book scanning.
As far as I understand, the most important sub tasks of auto book scanning;
- flipping pages automatically
- obtaining as flat as possible photos of pages
- keeping books and pages at most suitable position while photographing
I have seen the videos of some nice and intelligent amateur auto book scanner designs from enthusiasts on the web. They mostly rely on electronic hardware and mechanical parts.
For scanning pages automatically, typically a turning wheel is used to fold the paper and a thin metal arm is used to flip the folded page. The biggest difficulties of these systems are:
- contact position of the folding wheels is changing during scanning so wheel must change its touching position according to the current page to scan.
- Towards the end of the scanning the wheel fold more than one page at once because there are not enough pages under the current page to provide enough friction resistance.
For keeping pages fixed for photographing, moving page holders are used in some designs. Because while scanning first pages there is a risk for scanned pages to flip back because of the force by the other pages haven't been scanned yet.
* * *
You have probably heard about "auto page flipping book scanner" BFS-Auto developed by some researchers of a Japanese University.
They've found a high speed and gentle solution for auto book scanning. Their system mostly rely on a software they developed. This system works in this way:
- The book is fixed by a press from its binding.
- Pages to scan are hold by an arm. Arm presses pages from top with metal bars (and heavy springs around them) on it.
- Pages are flipped by releasing the arm and some air blowing. To this purpose arm is shifted out the book by turning the long screw attached to it.
- Photos of the pages are taken by high-speed cameras (500 frames per second - perhaps the most costly part of the system) at their natural positions without being flattened. An extra photograph is taken with laser lines on the pages to capture the 3d form of them.
- Pages are reconstructed to its flat forms by using their 3d form information and their actual photos with a special software before passed to OCR.
The system is more advantageous in many directions, I think:
- speedier
- simple mechanism for page flipping
- no destruction on binding
But in my opinion, the biggest drawback will be its price. Because there are prices beginning from 15 K $ on the web for high speed cameras which take minimum 500 frames per second at a reasonable resolution.
* * *
Recently I've come across some OCR software with page curvature correction feature. Some examples:
DocScanner (Mac/Iphone/Ipad/Android):
http://www.docscannerapp.com/2011/01/24 ... ture/ebook
ABBY Fine Reader(Windows):
http://www.abbyy.com/ocr_sdk_windows/ke ... amera_ocr/
And maybe If I have enough time, I can try to develop an open source automatic perspective and color correction tool for book digitization. There are some source codes for inspiration and starting point
These days these are some ideas flying in my mind about if we can build a system with:
- an auto page flipping mechanism similar to that of BFS-Auto but with lower speed (because our photographing will be much slower than that of BFS-Auto)
- two cheap compact digital cameras with macro shooting (like canon a810)
- a 3d perspective correction software
And of course If I find or develop a free solution to page curvature correction I share here. I'm looking for it.
Finally, I'm really happy to find this Forum, there are a lot of useful information and good guys