Hi, I am Francesco, from Italy.
Programmer by trade, reader and lover of literature and linguistics.
Also an AVID book scanner. Surrounded as I am by people who treat me like a nutcase, I know that this is the only place where my toys/dreams won't be laughed at...
So be warned, STRONG bragging ahead...
=================
I started... ehm... typing texts in the late '80s.
Yes, typing: I had no scanner back then. I was working on a text database of ancient Italian poetry (from the origins to Dante).
The resulting "Duecento Archive" went online in 1993, and in 1994 it got its CGI full-text search interface, written in ANSI C.
It has now been online for 15 years, and boy, it shows its age!
http://www.silab.it/frox/200/ind_src.htm
Then I got my first scanner and things started to get serious.
For a few years I scanned for my own pleasure, then around 2002 I ran into a book that I thought could be interesting: an old Italian etymological dictionary whose copyright had expired.
I scanned it and developed some custom software to process the pages and cut each definition into a separate image.
The result was
http://www.etimo.it, a site with around 19,700 pages, each devoted to a single word and containing one (or more) portions of the scanned page.
Sample page:
http://www.etimo.it/?term=carta
If you browse the entries, you will see that the page detection, adjustment and cutting still have serious flaws.
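To give an idea of what the cutting step involves, here is a minimal sketch in Python. It is not the software behind etimo.it, only an illustration of one common approach: scan a binarized page row by row and split it wherever enough blank rows separate two blocks of ink. The page representation (a list of rows of 0/1 values), the function name `split_into_bands` and the `min_gap` parameter are all assumptions made for this example.

```python
def split_into_bands(page, min_gap=2):
    """Split a binarized page into horizontal bands of ink.

    `page` is a list of rows, each row a list of 0 (blank) or 1 (ink).
    Returns a list of (top, bottom) row ranges, bottom exclusive.
    A new band starts only after at least `min_gap` blank rows, so the
    normal spacing between lines of one entry does not split it.
    """
    bands = []
    start = None     # first inked row of the band being built
    last_ink = None  # most recent row that contained ink
    for y, row in enumerate(page):
        if any(row):
            if start is None:
                start = y
            last_ink = y
        elif start is not None and y - last_ink >= min_gap:
            # Enough blank rows seen: close the current band.
            bands.append((start, last_ink + 1))
            start = None
    if start is not None:
        # The page ended while a band was still open.
        bands.append((start, last_ink + 1))
    return bands


# A tiny 9-row "page": two entries separated by a two-row gap,
# the second entry containing a one-row gap between its lines.
sample_page = [
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
bands = split_into_bands(sample_page, min_gap=2)
```

On a real scan the hard part is everything this sketch ignores: skew, multiple columns, noise, and entries that are not separated by extra white space, which is where flaws like the ones visible on the site tend to come from.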
This was done with a flatbed scanner. I later bought a scanner with an automatic feeder (a Fujitsu fj 5120, one of the first reasonably priced ones), which required cutting the book apart but allowed tremendous scan speeds.
I started thinking big, about developing a system that would allow me to publish on-line facsimile editions of books.
The first test was an old Italian biographical dictionary:
2,000 pages, not split into entries and indexed only by the first three letters of the name:
http://www.biografo.it/?pageurl=car
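An index of this kind can be sketched in a few lines of Python. This is only an illustration under assumptions, not the code running on the site: the names `prefix_key` and `build_prefix_index`, the page numbers and the sample names are all made up for the example, and the accent-stripping rule is just one plausible way to normalize Italian names.

```python
import unicodedata
from collections import defaultdict


def prefix_key(name, length=3):
    """Return the lookup key for a name: its first `length` letters,
    lowercased, with accents, spaces and punctuation removed."""
    # NFD splits accented letters into base letter + combining mark;
    # combining marks are not alphabetic, so the filter drops them.
    decomposed = unicodedata.normalize("NFD", name)
    letters = [c for c in decomposed if c.isalpha()]
    return "".join(letters)[:length].lower()


def build_prefix_index(names_by_page, length=3):
    """Map each prefix to the sorted list of pages that hold a name
    starting with it. `names_by_page` maps page number -> names."""
    index = defaultdict(set)
    for page, names in names_by_page.items():
        for name in names:
            index[prefix_key(name, length)].add(page)
    return {key: sorted(pages) for key, pages in index.items()}


# Hypothetical page contents, for illustration only.
names_by_page = {
    101: ["Caravaggio", "Carducci"],
    102: ["Carlo Magno", "Casanova"],
    340: ["Dante", "D'Annunzio"],
}
index = build_prefix_index(names_by_page)
pages_for_car = index[prefix_key("car")]
```

A request for `car` then returns every whole page on which a matching name appears, which fits a book whose pages are not split into entries.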
The second test was a numismatics book, "Description historique des monnaies frappées sous l'Empire Romain" by Henry Cohen:
8 volumes, for a total of about 4,250 pages.
Here the indexing gets a bit more complex, with a large tree of emperors, but the pages are still kept whole.
http://www.virtualcohen.com/octavius-augustus-1
The search covers only the names of the emperors, not the text itself.
Finally, the largest project (still unfinished):
The Dizionario della Lingua Italiana by Niccolò Tommaseo, a huge nineteenth-century dictionary, roughly the Italian equivalent of the OED. 8 in-folio volumes, 7,300 pages cut into 120,000 definitions.
http://www.dizionario.org/d/index.php?pageurl=carta
I currently have only the letters A-G online, with 55,000 words; when completed it should be (afaik) the largest Italian dictionary online.
Apart from this large and slow effort, I have a few smaller books that I want to scan and put online.
Some of these books are ancient and/or rare, so cutting them apart is not an option. Hence my interest in a DIY photography-based system, and that's why I am here.
As chance would have it, I also love taking photos and I play quite a lot with CHDK, having developed some scripts like this one:
http://chdk.setepontos.com/index.php?topic=2877.0
and this one
http://chdk.setepontos.com/index.php/to ... 35342.html
So, it will take time, but hopefully, thanks to all your expertise, I will get my scanning rig built, and I hope I can somehow give something back to the project.