Building a Book Scanner Rig

There are many options for building a rig. You can build an established design from scratch, or strike out on your own and make something completely new.

Picking Cameras

There is a wide variety of cameras that you can scan with. If you plan to use Pi Scan to control your cameras, use Canon PowerShot ELPH 160 cameras. If you are using some other setup, here are some general guidelines for choosing cameras.

Selecting the right camera is important, and the DIY book scanning community has debated the topic for years. No question gets asked more often, so nobody has thought about it more than this community. The result is a three-step process to help you decide.

Step 1. How many megapixels do you need?

A. Measure the books you intend to scan. Use the largest typical size, not the largest outliers. For example, most textbooks are around 9 x 11 in (22.86 cm x 27.94 cm).

B. Now multiply each dimension by the PPI (pixels per inch) that you intend to capture. 300 PPI is a safe minimum, and you can't go wrong capturing at a higher PPI. In our example, 9 x 300 = 2700 and 11 x 300 = 3300, so we need an image of at least 2700 x 3300 = 8,910,000 pixels, or about 9 megapixels. That assumes every pixel is used perfectly to capture every part of the page, which NEVER happens. So to be safe, add 20-30% for wasted pixels: 8.91 x 1.2 is about 10.7 megapixels and 8.91 x 1.3 is about 11.6. Rounding up, 12 megapixels is the minimum for at least 300 PPI capture.
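The Step 1 arithmetic fits in a few lines of Python, which makes it easy to rerun for other page sizes or PPI targets. This is a minimal sketch: `required_megapixels` is a hypothetical helper, and its `margin` parameter stands for the 20-30% allowance for wasted pixels.

```python
import math


def required_megapixels(width_in: float, height_in: float,
                        ppi: int = 300, margin: float = 0.3) -> float:
    """Estimate the sensor resolution (in megapixels) needed to capture a page.

    width_in, height_in: typical page size in inches (Step 1A).
    ppi: target pixels per inch on the page (Step 1B).
    margin: extra fraction for pixels wasted around the page, e.g. 0.2-0.3.
    """
    width_px = width_in * ppi
    height_px = height_in * ppi
    exact_mp = width_px * height_px / 1_000_000
    return exact_mp * (1 + margin)


# The 9 x 11 in textbook example at 300 PPI.
print(f"no margin: {required_megapixels(9, 11, margin=0):.2f} MP")  # 8.91 MP
for m in (0.2, 0.3):
    mp = required_megapixels(9, 11, margin=m)
    print(f"{m:.0%} margin: {mp:.1f} MP, round up to {math.ceil(mp)} MP")
# 20% margin: 10.7 MP, round up to 11 MP
# 30% margin: 11.6 MP, round up to 12 MP
```

Taking the upper end of the margin gives the 12 megapixel figure from Step 1.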

Step 2. How much control do you need?

If you're scanning just one book, or scanning a book only for its information content (as opposed to capturing the actual physical appearance of the book), you don't need very good captures. If the lighting or the camera settings change from shot to shot, you'll still get some kind of result. However, the more faithfully you want to capture the book, and the more pages you want to capture, the more control you need. So if you want to do a good job and care about more than just the raw text on each page, you need a camera that lets you control the following:

  1. Shutter speed
  2. White balance
  3. Aperture
  4. ISO
  5. Flash on/off
  6. Any custom image processing (sharpening, color enhancements, etc.)
  7. Focus (ideally being able to lock focus)
  8. Exposure compensation
  9. Zoom
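When comparing spec sheets, the checklist above can be applied mechanically. This is a minimal sketch: the lowercase control names and the `missing_controls` helper are hypothetical labels for the nine items, and the example camera's feature set is invented for illustration, not taken from any real model.

```python
from collections.abc import Iterable

# The nine controls from Step 2, in the order they are listed.
REQUIRED_CONTROLS = (
    "shutter speed",
    "white balance",
    "aperture",
    "iso",
    "flash on/off",
    "custom image processing",
    "focus",
    "exposure compensation",
    "zoom",
)


def missing_controls(supported: Iterable[str]) -> list[str]:
    """Return the required controls a camera does not let you set manually.

    Names are compared case-insensitively, ignoring surrounding whitespace.
    """
    available = {name.strip().lower() for name in supported}
    return [control for control in REQUIRED_CONTROLS if control not in available]


# Hypothetical camera that lacks manual aperture and zoom control.
camera = ["Shutter speed", "White balance", "ISO", "Flash on/off",
          "Custom image processing", "Focus", "Exposure compensation"]
print(missing_controls(camera))  # ['aperture', 'zoom']
```

An empty result means the camera covers every item on the list; anything returned is a control you would have to live without.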

Most DSLRs allow all of this control. Among compact cameras, only Canon PowerShot cameras that can run CHDK give you control over all of these parameters. To see whether a camera can run CHDK, check the list of supported cameras on the CHDK project's website.

One more factor to consider: ideally you want to run the cameras from an AC adapter instead of batteries, so check that an AC adapter is available for a camera model before you buy it.

Step 3. How much money do you have?

If you have a healthy budget, just buy DSLR cameras and use those. Buy the highest resolution you can afford, and try the “kit lens” that comes with the camera body as a starting place (they usually cost only $50-100 over the price of the camera body alone and perform reasonably well).

If you're on a budget, the aforementioned Canon compact cameras can often be bought for as little as $75 USD each and, with CHDK, produce incredibly high-quality images. They are by far the best “bang for the buck”, which is what DIY scanning is all about.

CHDK and Canon Cameras

Most cheap compact cameras have no software interface; they can be controlled only by manual or mechanical triggering. However, a team of volunteers has developed software that allows Canon compact cameras to be controlled and configured remotely. This software is called CHDK (Canon Hack Development Kit).

CHDK is loaded onto an SD card, which is then inserted into the camera. When the camera starts up, CHDK runs automatically. Because CHDK never makes any permanent changes to the camera, you can always remove the CHDK SD card to use the camera normally.

CHDK is an essential prerequisite for the software controllers listed below. The controllers run on a PC or Raspberry Pi and communicate over USB with the CHDK software running on the cameras. CHDK provides many enhanced capabilities, including the ability to configure the camera, capture photographs, and transfer the resulting images to the controller, all over USB.
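To make the PC side of that USB link more concrete, here is a minimal Python sketch that drives one camera through chdkptp, a command-line PTP client commonly used with CHDK. chdkptp is not named in this guide and is an assumption here, as are the executable name and the `rec` and `shoot` commands. The `-c` (connect to the first camera found) and `-e` (execute a CLI command, repeatable) options come from chdkptp's usage notes, but check `chdkptp -h` and the in-program `help` for your version. The helper names are hypothetical.

```python
import shutil
import subprocess
from collections.abc import Sequence


def chdkptp_argv(commands: Sequence[str], executable: str = "chdkptp") -> list[str]:
    """Build a chdkptp command line that connects to the first camera found
    (-c) and then runs each CLI command in order (one -e option per command)."""
    return [executable, "-c", *(f"-e{cmd}" for cmd in commands)]


def run_chdkptp(commands: Sequence[str], executable: str = "chdkptp") -> int:
    """Run chdkptp with the given commands and return its exit status."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable!r} was not found on PATH")
    completed = subprocess.run(chdkptp_argv(commands, executable), check=False)
    return completed.returncode


# Switch the camera to record mode, then take one picture (assumed commands).
print(chdkptp_argv(["rec", "shoot"]))  # ['chdkptp', '-c', '-erec', '-eshoot']
```

With a camera attached, `run_chdkptp(["rec", "shoot"])` would attempt a single capture. A full controller also has to manage several cameras, lock their settings, and download the images, which this sketch leaves out.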

Because CHDK is so useful, and there is no equivalent for other kinds of cheap point-and-shoot cameras, most users in the forums use Canon cameras in their rigs. With other kinds of cheap cameras, the only control option is some kind of mechanical or manual triggering.

Controlling the Cameras

The first task when digitizing books is capturing an image of each page and then putting those images in a convenient place. There are a few ways you can go about this task.
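One common "convenient place" is a single folder with the pages in reading order. The Python sketch below assumes a two-camera rig in which one camera photographs every left-hand page and the other every right-hand page, and that each camera's file names sort in shooting order (true for zero-padded names such as IMG_0001.JPG until the counter wraps). The function names and the page_0001 naming scheme are hypothetical.

```python
import shutil
from pathlib import Path


def plan_page_order(left: list[Path], right: list[Path],
                    out_dir: Path) -> list[tuple[Path, Path]]:
    """Interleave left-page and right-page captures into reading order.

    Each spread contributes its left page, then its right page. Returns
    (source, destination) pairs with sequential names; nothing is copied here.
    """
    if len(left) != len(right):
        raise ValueError(
            f"shot counts differ: {len(left)} left vs {len(right)} right")
    plan = []
    page = 1
    for left_img, right_img in zip(sorted(left), sorted(right)):
        for src in (left_img, right_img):
            plan.append((src, out_dir / f"page_{page:04d}{src.suffix.lower()}"))
            page += 1
    return plan


def copy_plan(plan: list[tuple[Path, Path]]) -> None:
    """Copy each source image to its new name, keeping the originals."""
    for src, dst in plan:
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
```

Planning before copying lets you inspect the pairing first. If one camera misfired and the shot counts differ, the function stops rather than silently shifting every later spread by one page.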

Images to eBooks

After capture, you will have a folder full of images. Turning those images into an eBook is called 'post-processing'. What it entails depends on your needs. Some people want to compress the images as much as possible and extract the book's text using OCR. Others just want to crop each image to the page and bind the images into a PDF. A free book called E-Book Enlightenment has sections about how to make eBooks. There are also a number of software tools to help you perform these tasks. Here are a few:
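For the crop-and-PDF route, a minimal Python sketch using the Pillow imaging library (a third-party package, not named in this guide) could look like the following. It assumes the rig holds the book in the same position for every shot, so one fixed crop box, measured by hand in pixels, fits all pages; dedicated post-processing tools detect the page in each image instead. `crop_and_bind` is a hypothetical name.

```python
from pathlib import Path

from PIL import Image  # third-party: pip install Pillow


def crop_and_bind(images: list[Path], box: tuple[int, int, int, int],
                  pdf_path: Path, ppi: float = 300.0) -> None:
    """Crop every image to the same (left, top, right, bottom) pixel box and
    save the cropped pages, in order, as one multi-page PDF.

    ppi is stored as the PDF's resolution, which sets the printed page size.
    """
    pages = []
    for path in images:
        with Image.open(path) as im:
            # convert() returns a loaded copy, so it outlives the open file.
            pages.append(im.convert("RGB").crop(box))
    if not pages:
        raise ValueError("no images to bind")
    first, rest = pages[0], pages[1:]
    first.save(pdf_path, "PDF", save_all=True, append_images=rest, resolution=ppi)
```

Setting `ppi` to the resolution you actually captured at keeps the PDF's page size close to the real book's.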

Questions? Ideas? Join us in the Forum.