My second build in progress, could use some advice (Rebel T3)

Bookman
Posts: 5
Joined: 06 Aug 2013, 14:13
E-book readers owned: iPad, MacBook
Number of books owned: 0
Country: Australia

Post by Bookman »

Hi all. I've been interested in book scanning for a few years, ever since I found an Instructables page that linked to ScanTailor. My first build used a cheap point-and-shoot camera mounted on a ruler with rubber bands, photographing a book lying open on the floor so that both pages were captured in one shot. I used three warm yellow halogen flood lamps for lighting and (carefully!) pressed the camera's shutter button for each photo. OCR was a bit beyond me at the time, so I just split the images, trimmed them, and turned them into a PDF full of images (the books came out at about 3 GB each!).

So I'm ready for round two now. My main motivation is to scan textbooks so I don't have to carry them around; I tire very easily, and carrying heavy textbooks just wipes me out. This time I have bought a Rebel T3 (aka 1100D) with the kit lens (EF-S 18-55 III), and I've blown pretty much my entire budget on this camera. I also bought Prizmo 2 for OS X (which is like ScanTailor, but for the Mac, and includes OCR), two Philips CFL flood lights (23 W, 1200 lumens, "cool sunlight"; I figured two lights should do it), and a cheap tripod. My plan is to lean the tripod all the way over, put the two lights above it on either side, and photograph the book from within Prizmo using full tether mode (full computer tethering is the reason I bought a Canon DSLR). I tested the camera on a page from its own instruction manual, and Prizmo worked fantastically: it OCR'd everything. I'm still not sure how much fun it will be to process each page manually, or how often I'll have to fall back to image + searchable text when the text uses weird symbols (or whether Prizmo even lets me do that for some pages but not others!), but I will find out very soon.

What I would really like to do is photograph every second page, then flip the book over and photograph the remaining pages, but again I don't know if Prizmo allows this.
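If Prizmo can't do this itself, the reordering could be done on the image files before import. A minimal sketch, assuming the first pass captures the odd pages (1, 3, 5, ...) and the second pass, after the book is flipped, captures the even pages. The function name and the `even_reversed` flag are hypothetical; whether the second pass comes out back to front depends on how the book is flipped, so check a few files before trusting the result.

```python
def interleave_passes(odd_pages, even_pages, even_reversed=False):
    """Merge two single-sided capture passes into reading order.

    odd_pages:     images of pages 1, 3, 5, ... in capture order
    even_pages:    images of pages 2, 4, 6, ... in capture order
    even_reversed: True if the second pass was shot from the back of the book
    """
    if even_reversed:
        even_pages = list(reversed(even_pages))
    # A book has either as many odd pages as even ones, or exactly one more.
    if not 0 <= len(odd_pages) - len(even_pages) <= 1:
        raise ValueError("pass lengths don't match; a page was probably skipped")
    ordered = []
    for i, odd in enumerate(odd_pages):
        ordered.append(odd)
        if i < len(even_pages):
            ordered.append(even_pages[i])
    return ordered
```

The length check is a cheap way to catch a skipped or doubled page before hundreds of files end up in the wrong order.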

What I would really like is some feedback on my camera choice and the two flood lamps. I don't have any glass and I'm not sure if I need it, but I was thinking of pulling apart a broken flatbed scanner I own, liberating its glass, and just flipping the glass up for each page. Or I could buy some dead scanners from an auction site and salvage two pieces of glass, but with just one camera I don't know how necessary that would be. I don't even know if I need glass at all. Prizmo does straighten out pages, but if possible I would like to have them flat in the first place.

Once I'm familiar with Prizmo I'll post a review of it. It has already proven useful for OCRing scientific journal articles that don't already have a text layer (which is rare).

Sorry if this is at all rambling, I've had book scanning rattling around in my head for years now and I haven't actually talked about it with anyone yet :lol:
Bookman

Re: My second build in progress, could use some advice (Rebel T3)

Post by Bookman »

Results are good so far. I've decided to use a lazy Susan to rotate the book, as reaching over to lift it was too much bother. I had to trim the box down a bit so it didn't hang over the edge of the lazy Susan. I can't quite figure out where to place the flood lights I bought: putting both at the sides doesn't light the page very well (though the images still seem to import fine into Prizmo 2). I'm wondering whether CFL bulbs were actually a good idea; maybe I should have sprung for a few LED strips. OCR is still very good, though, even when the page isn't evenly lit. I'll post some pictures of my setup soon, as well as some example scans. This 2.0 build has come a long way since my original cheap point-and-shoot held to a wooden ruler.

Prizmo is also working quite well, and it's easy to correct the OCR. It doesn't seem to add bullets, and sometimes it asks me to convert to "higher resolution", which makes the formatting better, but I'm shooting 12.2-megapixel RAW, so I don't know what that's about. I'm also worried about the size of the Prizmo file: each image is about 12 MB, so a 1000-page book will come to 12 GB while in the Prizmo workspace. They export fine, though, at about 20 KB per page (more if the page has large figures). I'm pretty sure I can have a large book scanned and OCR'd in about 90 minutes.
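The storage numbers above are straight per-page multiplication. A small sketch for rerunning them on other books, assuming decimal units (1 GB = 1000 MB, as in the 12 MB × 1000 → 12 GB figure), a 1000-page book, and that the 90-minute estimate applies to a book of that size (the post only says "a large book"). The function name and defaults are illustrative.

```python
def scan_estimates(pages, workspace_mb_per_page=12.0, export_kb_per_page=20.0,
                   total_minutes=90.0):
    """Return (workspace size in GB, exported PDF size in MB, seconds per page).

    Defaults are the per-page figures quoted in the post; units are decimal
    (1 GB = 1000 MB, 1 MB = 1000 KB).
    """
    workspace_gb = pages * workspace_mb_per_page / 1000
    export_mb = pages * export_kb_per_page / 1000
    seconds_per_page = total_minutes * 60 / pages
    return workspace_gb, export_mb, seconds_per_page
```

With the defaults, `scan_estimates(1000)` gives 12 GB of workspace, about 20 MB of exported PDF (roughly 600 times smaller), and about 5.4 seconds per page, which is a handy yardstick when timing each turn of the lazy Susan.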

Another issue is fingerprints on the glass, since it gets handled with every page; I'll have to keep a microfibre cloth handy. I should probably oil the bearings in the lazy Susan too. And I still haven't decided what to do about the little figures in the book, like the small pictures of eyes or cartoon exclamation points next to important information.
daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: My second build in progress, could use some advice (Rebel T3)

Post by daniel_reetz »

Sometimes it really helps to post a picture of your setup to help visualize it. Do you have a picture of how it's currently set up?
Bookman

Re: My second build in progress, could use some advice (Rebel T3)

Post by Bookman »

Good idea. I've added a photo below. I have both lamps on, plus a room light (which has a solid shade around it). The setup is not as solid or polished as the ones I've seen on here; the box seems to move a bit with each turn. I've added Post-it note guides, but even so it shifts a bit with each turn. I've also added reminders for myself, one that says "turn page" on one side and one that says "flip glass" on the other, because it is a very repetitive task and I'd mix myself up otherwise. Even though things don't line up or orientate perfectly or reliably, it doesn't seem to matter, as Prizmo deals with it fine. The camera is basically parallel to the page, but I've done an entire book where it was shooting at a bit of an angle, and again Prizmo had no problems with it. I'll provide some example pages soon; I'll need to look at my bookshelf and find some good ones to use.

Image

And yes, at first having the lights pointed towards me is kind of blinding. I do it with sunglasses on and I quickly get used to it.

I think ultimately the lazy Susan is a bad idea, and that I'd be better off not using Prizmo to snap the pictures, but instead building a solid platen and doing one side of the book at a time. Then I'd somehow organise the files in the proper order and add them to Prizmo in separate projects of 100 pages at a time, OCR and check the pages, export each project as a 100-page PDF, and then combine the PDFs into one large PDF. Because I am shooting 12.2-megapixel RAW, having hundreds of photos open at the same time in Prizmo is just too much trouble (I only have 8 GB of memory).
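For scale: how Prizmo holds images in memory isn't known here, but a fully decoded 12.2-megapixel image at 16 bits per channel is about 73 MB (12.2 million pixels × 3 channels × 2 bytes), so a few hundred of them would not fit in 8 GB. The split-and-merge workflow described above could be sketched like this. `batches` uses only the standard language; `merge_pdfs` assumes the third-party pypdf package is installed, and both function names are hypothetical.

```python
def batches(files, size=100):
    """Split an ordered list of page images into consecutive chunks of `size`."""
    return [files[i:i + size] for i in range(0, len(files), size)]


def merge_pdfs(pdf_paths, out_path):
    """Concatenate exported PDFs in the given order into one file.

    Assumes pypdf is installed (pip install pypdf). It is imported here so
    that the batching helper above works without it.
    """
    from pypdf import PdfWriter

    writer = PdfWriter()
    for path in pdf_paths:
        writer.append(str(path))  # appends every page of this part
    with open(out_path, "wb") as fh:
        writer.write(fh)
```

Zero-padding the exported part names (for example a hypothetical part_001.pdf, part_002.pdf, ...) means a plain `sorted()` already puts them in the right order before merging.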
Bookman

Re: My second build in progress, could use some advice (Rebel T3)

Post by Bookman »

This is what the camera captures (cropped and downscaled). Note that I took it at a twisted angle, though it is roughly parallel with the glass. I also haven't configured Prizmo for this lens and camera at this focal length using the test pattern. The camera is 12.2 MP, but I cropped out quite a lot; the book occupies less than 50% of the shot. I'm also shooting in fully automatic mode, as I really don't yet know how to work the other settings on this DSLR:
Screen Shot 2013-08-16 at 1.23.04 AM.png
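Since the book fills less than half the frame, it's worth checking how many pixels actually land on the page. A rough sketch, assuming the 1100D's full-size output of 4272 × 2848 pixels and a hypothetical page edge length (11 inches below, purely illustrative); around 300 dpi is commonly recommended for OCR. Function names are hypothetical.

```python
SENSOR_LONG_PX = 4272   # Canon EOS 1100D / Rebel T3 full-size image, long side
SENSOR_SHORT_PX = 2848  # short side (4272 x 2848 is about 12.2 MP)


def effective_dpi(page_inches, fraction_of_frame, frame_px=SENSOR_LONG_PX):
    """Pixels per inch on the page when a page edge `page_inches` long spans
    `fraction_of_frame` of a frame side that is `frame_px` pixels long."""
    return frame_px * fraction_of_frame / page_inches


def pixels_on_book(area_fraction):
    """Megapixels that land on the book, given its share of the frame area."""
    return SENSOR_LONG_PX * SENSOR_SHORT_PX * area_fraction / 1e6
```

For example, if an 11-inch page edge spans 70% of the long side, that's about 272 dpi, and 50% of the frame area leaves only about 6.1 MP on the book, so zooming in or moving the camera closer would buy resolution for free.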
I select the four corners in Prizmo (no need to select the curvature, as the page was under glass):
Screen Shot 2013-08-16 at 1.23.52 AM.png
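Four corners are enough here because a page pressed flat under glass is a plane, and a plane photographed from any angle is related to the flat page by a perspective transform (homography) that is fixed by four point pairs. Prizmo's internals aren't documented in this thread; this is a generic plain-Python sketch of that standard technique, with hypothetical corner coordinates and function names. A real tool would then resample every pixel through the transform (OpenCV's `warpPerspective` does this, for example).

```python
def _solve(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corners (are three of them collinear?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def homography(src, dst):
    """Flattened 3x3 transform (h[8] == 1) mapping 4 src points onto 4 dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), rearranged to be linear in h
        a.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        b.append(u)
        # v = (h3 x + h4 y + h5) / (h6 x + h7 y + 1), likewise
        a.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        b.append(v)
    return _solve(a, b) + [1.0]


def map_point(h, x, y):
    """Apply the transform to one point, including the perspective divide."""
    w = h[6] * x + h[7] * y + h[8]
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)


# Hypothetical clicked corners (top-left, top-right, bottom-right, bottom-left)
corners = [(120, 80), (980, 130), (1010, 1400), (90, 1350)]
page = [(0, 0), (850, 0), (850, 1100), (0, 1100)]  # hypothetical output size
h = homography(corners, page)
```

Running each clicked corner through `map_point(h, x, y)` lands it on the matching corner of the output rectangle. Curved pages break the flat-plane assumption, which is why uncorrected book spines need the extra curvature step that the glass makes unnecessary.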
This is the screen Prizmo shows after it selects the regions to OCR. You can see that it missed some things, like the superscript 21 and the page number. You can add text-detection boxes around the parts it missed and manually correct mistakes in the OCR:
Screen Shot 2013-08-16 at 1.24.34 AM.png
Here is the PDF itself after output. I didn't correct anything manually for this example:
Screen Shot 2013-08-16 at 1.26.22 AM.png