Let's Make A DIY Book Scanner Test Chart

Moderator: peterZ

daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Let's Make A DIY Book Scanner Test Chart

Post by daniel_reetz »

Hey everyone, as discussed here, we should really make a DIY Book Scanner test chart. The idea is to give everyone a common standard with which to test their scanners.

I am going to quote bnz to start out the discussion, since I think his ideas were spot-on:
bnz wrote:I think the first step would be to have such test pages available as a PDF, with maybe some guidelines on how to do the printing (e.g., printer DPI, no cheap ink printers, etc.). I can imagine that a lot of people in this forum have access to some kind of professional printing device. While this may not yield exact results, they will probably be good enough for a start.

Regarding the page or pages: I imagine that it would be useful to have some text block (lorem ipsum or something like that) printed in different font sizes and different fonts. Then, maybe a few tables with again different font sizes and different border line strengths, some complex shaped b/w figure, a greyscale picture, and a color picture. Also, text with different colors and again in different sizes and fonts might be a good idea. These are just some initial ideas. There is probably lots to add.
Alright, I like pretty much everything bnz said, and I could make such a test chart in one day, maybe tomorrow night. Now for my questions:

Should we make it 8.5x11" (letter size) or should we make it A4? Both?

How many of you have or have access to a nice printer?

Color or black and white (or both)?

Any other ideas?

I really need feedback on this, so let's get started... give me your thoughts and I will make a test chart for us ASAP.
reggilbert
Posts: 49
Joined: 28 Sep 2010, 19:57
Number of books owned: 3000
Location: Buffalo, New York

Re: Let's Make A DIY Book Scanner Test Chart

Post by reggilbert »

I am not quite sure about what is being proposed in this thread.
daniel_reetz wrote:The idea is to give everyone a common standard with which to test their scanners.
I thought the idea, as proposed by bnz on the "DSLR like cameras" thread, was to determine the camera quality / price / resulting image sweet spot.
bnz wrote:what I'd really like to see are comparison shots in full resolution of the same single page, taken with the low-budget Canon pocket cameras running SDM/CHDK, with midrange non-DSLR cameras that have DSLR features, and with "normal" DSLR cameras like the Canon T2i/550D, all in the same conditions. It's better to judge for yourself
And Daniel replied:
daniel_reetz wrote:What we really need is a standardized test page so we can all compare. Does anyone have any suggestions on how to make such a thing? I'm considering making a Print On Demand book, but I could just as well print a huge number of test pages and ship them worldwide. That way we could standardize on camera performance for book scanning purposes.
This forum has been working on scanner designs for a year and a half and as a prospective scanner builder or owner I am confident that any cradle / platen combination I choose will serve. What I am less clear on is the ideal camera that has enough but not too much resolution for the least cost. So I am very interested in isolating performance of DIY scanner cameras in some testing regime.

Comment #1: For the testing regime to work (for the resulting images to be comparable) input has to be controlled. If end testers print out their test pages, the input would not be controlled, because of both the different types of printers and their current state of repair / effectiveness. The test pages have to either come from a central source, or, less ideally but perhaps acceptably, be produced on machines at least likely to be roughly comparable, say, the output devices at Kinkos. Going to somewhere to print a file seems like a lot more trouble than clicking on a link to have someone send a standard test page within a couple days. If necessary, I am willing to be that central sender and am willing to kick in $20 to a collective printing cost (none, if black and white) / envelope / postage kitty.

Comment #2: Assuming input is comparable, output can presumably be made comparable by specifying one or two standard camera outputs, say each camera's maximum-resolution JPEG and its RAW format, perhaps post-processed in some standard way (if necessary - I don't know much about camera output). If the point is to allow people to decide for themselves, per bnz, then the ultimate output is not the image file but what appears on forum members' computer monitors. None of those monitors will likely render the images accurately, but any single monitor will render all the images comparably to each other, which lets bnz's general user decide for themselves what camera to buy for their scanner.

Comment #3: Per Daniel's questions:

a) the best test-page size might be that of a standard not-too-big book page, because effective resolution has to account for cropping out all the excess generated by the mismatch between camera image shape and book page shape. So the test image should be the page size most commonly scanned in regular use

b) this is not pertinent if comment #1 is on the mark, but since Daniel asked, I have access to an allegedly 1200dpi monochrome laser, the Samsung ML-2851ND, and an allegedly 6000x2400, 4-ink, color inkjet, the Brother MFC-6490CW (btw, a cool device for having 11x17 scanning capability, and available for only $200 with free shipping and 2-year warranty)

c) the test pages should be black and white -- color test pages will be expensive to print centrally and so wildly diverse if printed locally as to not be usable for comparison purposes

d) other ideas:

--if the plan is to figure out a camera price / performance sweet spot, and that requires comparable images, wouldn't it be necessary to go beyond a standard test page and request certain conditions for scanning it? For example (I don't know, just throwing these out): use of some relative aperture (such as the most open, or the next-to-most open) / shutter speed / lighting intensity. Given the variation in the distance that scanners put the cameras from the page, the variation in the types of bulbs that are part of the scanner, and the additional background light that is present, I can see that specifying picture-taking conditions is a challenge. Yet for comparisons to be useful, this seems very important to me -- the effectiveness of OCR depends on a clean image, which depends on a high-contrast image. A perfectly great camera (irrespective of price, high or low) is imperfect for text OCR if the base images produced are too low-contrast, and that may be determined entirely by conditions rather than camera.

--if the plan is to figure out a camera price / performance sweet spot, we would want to assure a diverse sampling of cameras people might want to buy. We would want to figure out representative camera types that encompass all the major combinations of features. I'm no camera expert, so the following is the product of my rudimentary knowledge: the key features could be sensor size, megapixels, and lens characteristics like highest speed and maximum zoom. So the representative camera types would be combinations of sensor size categories, say 1/2.3, 1/1.7, micro 4/3s, and APS-C; megapixel categories of, say, 8-10, 10-12, and over 12; lens speed categories of, say, 2.0 or under, 2.0-3.0, and over 3.0; and zooms of, say, up to 4x and over 4x. Those categories would make for 72 theoretical combinations (though probably quite a few fewer in practice, since so few small-sensor cameras have lenses faster than 3.0 or megapixel counts over 12 million), which is perhaps on the outer edge of practical for this forum community. Collapsing the 1/2.3 and 1/1.7 sensor size categories would bring that down to 54 theoretical combinations, and fewer in practice, etc.
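
The category counts above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the category labels are copied from the post, and the merged "1/2.3 or 1/1.7" label is made up here for illustration.

```python
from itertools import product

# Category lists as proposed in the post above.
sensor_sizes = ["1/2.3", "1/1.7", "micro 4/3", "APS-C"]
megapixels = ["8-10", "10-12", "over 12"]
lens_speeds = ["2.0 or under", "2.0-3.0", "over 3.0"]
zooms = ["up to 4x", "over 4x"]


def count_combinations(*categories):
    """Count every theoretical camera type across the given category lists."""
    return sum(1 for _ in product(*categories))


all_types = count_combinations(sensor_sizes, megapixels, lens_speeds, zooms)

# Hypothetical merged label for the two small-sensor categories.
merged_sensors = ["1/2.3 or 1/1.7", "micro 4/3", "APS-C"]
collapsed_types = count_combinations(merged_sensors, megapixels, lens_speeds, zooms)

print(all_types, collapsed_types)
```

These are theoretical counts only; dropping combinations that no real camera offers would need an actual list of cameras.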
daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: Let's Make A DIY Book Scanner Test Chart

Post by daniel_reetz »

reggilbert wrote:I am not quite sure about what is being proposed in this thread.
daniel_reetz wrote:The idea is to give everyone a common standard with which to test their scanners.
I thought the idea, as proposed by bnz on the "DSLR like cameras" thread, was to determine the camera quality / price / resulting image sweet spot.
Let me clarify what I am thinking.

An idea that has been floating around the forums since the beginning is to have a standard chart or book that we can all compare our scanners with. It would be used in the following way:

1. Build scanner.
2. Put chart in scanner.
3. Scan it and post it here. This would include post-processing and an unprocessed image.

That means
4. Other people can see if they like the results. AND
5. Other people can help you get better results.
reggilbert wrote:--if the plan is to figure out a camera price / performance sweet spot
That is not my plan, because it is not the same as showing the total output of the scanner system. Each scanner is a little different, and the camera/software/entire workflow is a little different for each of us. That's because we all have slightly different goals and approaches. Now, many such goals and approaches have been worked out here - most of the work has been done. Having a common standard page will allow us to compare our scanners on equal input - input being the page only, because we all can't build a separate controlled diffused lighting rig, mount, etc to do a studio shot of a chart, to show the isolated performance of the camera. (and if you want to stretch that out, we can't all buy the same bulbs, diffusers, blacking cloth, etc to build the proper light-controlled environment to do so).

Of course, you're welcome to take on the work of making that happen. In that case, shooting a test chart that we self-print is probably not the best idea. There are already sites like DPreview and elsewhere that survey almost every kind of camera with test charts. You simply have to filter their data, all the shooting work has already been done. Example here: http://www.dpreview.com/reviews/Q408bud ... page11.asp . In fact, figuring out the camera price / performance sweet spot is the explicit goal of these websites. You could scrape their data, filter it according to current price on amazon VS whatever features you want, check for CHDK compatibility here, etc etc etc.
reggilbert wrote:Comment #1: For the testing regime to work (for the resulting images to be comparable) input has to be controlled. If end testers print out their test pages, the input would not be controlled, because of both the different types of printers and their current state of repair / effectiveness. The test pages have to either come from a central source, or, less ideally but perhaps acceptably, be produced on machines at least likely to be roughly comparable, say, the output devices at Kinkos. Going to somewhere to print a file seems like a lot more trouble than clicking on a link to have someone send a standard test page within a couple days. If necessary, I am willing to be that central sender and am willing to kick in $20 to a collective printing cost (none, if black and white) / envelope / postage kitty.
I'd be happy if you'd handle that, because I can't, but people should also have the option of printing locally, because there are a lot of places where going somewhere to print something isn't an option. One of the big lessons I've learned here is that there is an amazing range of resources available to people, but they are not at all evenly distributed. In my experience, approaches to evening that out (like using Home Depot as the basis for a scanner) are fraught with problems. And people in Europe or Africa will not want to wait two weeks for a test sheet when they can just print one out on their own.

So my idea (which might not be what ends up happening) is not to characterize cameras -- a well-covered space, in my opinion -- but to characterize scanners, which is something that we haven't done as well as we could be doing, if we all had a couple test pages to lay on top of a book.
Gerard
Posts: 154
Joined: 17 Oct 2010, 07:15
Number of books owned: 0
Location: Berlin (Germany)

Re: Let's Make A DIY Book Scanner Test Chart

Post by Gerard »

Hi,
for measuring the resolution I would suggest
http://en.wikipedia.org/wiki/Modulation ... d_imaging)

it is easier to make a sharp edge compared to a Siemens star
http://en.wikipedia.org/wiki/Siemens_star
wendel
Posts: 6
Joined: 04 Mar 2014, 00:53

Re: Let's Make A DIY Book Scanner Test Chart

Post by wendel »

How about adding some checkerboard squares of known size (e.g., 1 inch, 1 cm) or rulers, so the standard could be used to help determine DPI as well?

As for size... I would keep it to the area of a 6 x 9 book (or some other "smaller" book), but it could then be centered on larger sheets of paper too. Then it can be used across a wide range of sizes.
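
To illustrate how squares of known size would give the DPI: measure how many pixels one square spans in the scan and divide by its real width in inches. A minimal sketch follows; the function name and the sample pixel counts are made up for illustration.

```python
CM_PER_INCH = 2.54


def dpi_from_reference(pixels_measured, real_size, unit="inch"):
    """Estimate scan DPI from a printed feature of known physical size.

    pixels_measured: width of the feature in the scanned image, in pixels.
    real_size: width of the same feature on paper, in the given unit.
    """
    if unit == "inch":
        inches = real_size
    elif unit == "cm":
        inches = real_size / CM_PER_INCH
    else:
        raise ValueError("unit must be 'inch' or 'cm'")
    return pixels_measured / inches


# Hypothetical measurements: a 1 inch square spanning 300 px,
# and a 1 cm square spanning 118 px.
print(dpi_from_reference(300, 1.0))
print(round(dpi_from_reference(118, 1.0, unit="cm"), 2))
```

Measuring squares at several places on the page would also show whether the DPI drops toward the edges or corners.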
daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: Let's Make A DIY Book Scanner Test Chart

Post by daniel_reetz »

Yeah, DPI measurement spots across the page would be good. I like the idea of doing some standard book sizes, concentrically nested on the sheet.

I don't think an MTF chart is in order; rather, lots of text at different sizes would be great... maybe from 8pt up to 24pt, across the page?
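
For the nested book sizes, here is a rough sketch of the best-case DPI each outline could reach on a given camera, which is the number the DPI spots would be checked against. It assumes the page is shot in portrait and fills the frame with no margin, so real scans will come out lower. The 4000 x 3000 sensor and the list of sizes are example values, not anything agreed in this thread.

```python
def best_case_dpi(sensor_long_px, sensor_short_px, page_w_in, page_h_in):
    """Upper bound on scan DPI when the page just fits inside the frame.

    The long side of the sensor is lined up with the long side of the page;
    whichever dimension runs out of pixels first sets the limit.
    """
    page_long = max(page_w_in, page_h_in)
    page_short = min(page_w_in, page_h_in)
    return min(sensor_long_px / page_long, sensor_short_px / page_short)


# Example page sizes in inches (width, height).
book_sizes_in = {
    "6 x 9 book": (6.0, 9.0),
    "US letter": (8.5, 11.0),
    "A4": (8.27, 11.69),
}

# Assumed 12-megapixel 4:3 camera: 4000 x 3000 pixels.
for name, (width, height) in book_sizes_in.items():
    dpi = best_case_dpi(4000, 3000, width, height)
    print(f"{name}: about {dpi:.0f} DPI at best")
```

The gap between this figure and the DPI actually measured from the chart shows how much resolution is lost to cropping and framing.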
reggilbert
Posts: 49
Joined: 28 Sep 2010, 19:57
Number of books owned: 3000
Location: Buffalo, New York

Re: Let's Make A DIY Book Scanner Test Chart

Post by reggilbert »

daniel_reetz wrote:a standard chart or book that we can all compare our scanners with. It would be used in the following way:

1. Build scanner.
2. Put chart in scanner.
3. Scan it and post it here. This would include post-processing and an unprocessed image.

That means
4. Other people can see if they like the results. AND
5. Other people can help you get better results.
Both 4 and 5 would be helped if the physical test page, not just the digital source for it, were standard. I am willing to produce and send out standard test pages if a consensus develops that a standardized physical test page would be useful. If the requests for test pages became numerous, we would have to come up with a kitty to cover costs.

I would go down to 4pt for the smallest test font size, because some books have small type and even smaller footnotes that really are that size, and very small fonts show scan deficiencies most easily.
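
To put numbers on how demanding small type is: a point is 1/72 inch, so the nominal (em) height of a font in scanned pixels is the point size times the scan DPI divided by 72. A small sketch follows, with 300 DPI as an assumed example resolution; actual letters are smaller than the em height, so this is an upper bound.

```python
POINTS_PER_INCH = 72


def em_height_px(point_size, scan_dpi):
    """Nominal em height of a font in pixels at a given scan resolution."""
    return point_size * scan_dpi / POINTS_PER_INCH


# Font sizes mentioned in this thread, at an assumed 300 DPI scan.
for size in (4, 8, 24):
    print(f"{size}pt at 300 DPI: about {em_height_px(size, 300):.1f} px")
```

At that resolution 4pt type gets fewer than 17 pixels of em height, which is why it would expose scan deficiencies long before 8pt does.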

Some printing-industry technical advice may be useful for any test images we want to come up with. I am assuming here that ideally the test page simulates photos found in books. The following derives from only basic knowledge, so experts please forgive possibly muddled information. Most book images are printed via very specific screens (specific dot patterns) rather than either the continuous tone of film-paper photos or the typical resolutions of digital images. These screens are denoted in lines per inch (LPI), which may or may not be the same as dots per inch. But the thing is that they are specific, such as 90, 110, or 120 LPI. LPI is a requirement of the printer (that is, the printing company), related to how specific offset and other types of printing machines that produce books work. Just as with including multiple font sizes of identical text, it may be useful to include more than one version of a test image, each with different screening characteristics. There may be issues arising from the fact that these dot-patterned, printed images are then digitized via a scanning process. There may be further issues if the resolution of the test images in the standard digital file we come up with interacts variously with the capabilities of the consumer printers (for either the central or distributed test page production scenarios) on which the physical test pages are produced. Bottom line, it would be nice if someone who knows about the LPI issues involved with screened photos printed in books, and the issues involved in scanning those images and in printing them on consumer printers, could weigh in on the inclusion of images on our test page.
daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: Let's Make A DIY Book Scanner Test Chart

Post by daniel_reetz »

reggilbert wrote:Both 4 and 5 would be helped if the physical test page, not just the digital source for it, were standard.
We are in agreement here; however, I still think we need to give people the option of printing their own.
reggilbert wrote:I am willing to produce and send out standard test pages if a consensus develops that a standardized physical test page would be useful. If the requests for test pages became numerous, we would have to come up with a kitty to cover costs.
That's exactly the problem with taking this kind of thing on -- if it's successful, the costs grow with success. I used to make electronics packages for people here, but as they became more popular they ate up all my available time and I hated it.

It would be better to fund it outright and just do it until the funds run out. I would contribute some amount -- say $25, and you could contribute your time. I'd bet, if we weren't totally technically demanding, that we could get a lot of pages printed for $25, and then we could figure out what postage and envelopes for each page would cost.

That's also why I was originally trying to come up with some kind of print-on-demand model, where someone else would handle printing and shipping. The targets might not be perfect, but the effort would be low and we'd have a convenient standard.

Honestly, from experience, the best way to get this kind of program off the ground is to give people a free/easy option, like printing their own, or to make the ordering of a test pattern so simple that it is somehow easier than doing their own. It's better to get wide adoption than perfection right out the door, because it's easier to get perfection out of wide adoption, than vice versa.
StevePoling
Posts: 290
Joined: 20 Jun 2009, 12:19
E-book readers owned: SONY PRS-505, Kindle DX
Number of books owned: 9999
Location: Grand Rapids, MI

Re: Let's Make A DIY Book Scanner Test Chart

Post by StevePoling »

Dan, if you were to self-publish a book of test patterns surrounded by instructions for calibrating a DIY scanner and evaluating its scans, I'd be pleased to buy a copy.
reggilbert
Posts: 49
Joined: 28 Sep 2010, 19:57
Number of books owned: 3000
Location: Buffalo, New York

Re: Let's Make A DIY Book Scanner Test Chart

Post by reggilbert »

Dear fellow forum members,

In an earlier post in this thread I raised the possibility of technical complications with screened images on a proposed test page that would help us evaluate a scanner's ability to capture the majority of images found in books, including almost all photos in books. Today I was fixing a printing problem with my computer and came across a relevant passage in Acrobat Help. It is not precisely germane to the possible test page issue -- the passage deals with conversion from electronic image to printed image, rather than from printed image to electronic image -- but for me it confirms the possibility that test page images may need to be in particular resolutions. If anyone knows a graphics professional who prepares electronic documents for printing and might be willing to help, please consider asking them what we might want to take into account when choosing electronic images for inclusion on a test page to evaluate camera-based scanner effectiveness.

In the below Acrobat Help passage, it's interesting that the table's horizontal correspondences are not repeated. Presumably it is the 1200 dpi entry we are most concerned with, though the farther back in the 20th century you go the more you will have photos printed at the equivalent of 600, 300, and perhaps as far down as 100 dpi (just a guess: old newspapers, for example). "lpi" = lines per inch; "ppi" = pixels per inch:

"The resolution setting for color and grayscale images should be 1.5 to 2 times the line screen ruling at which the file will be printed. . . .

"Printer resolution        / Default line screen / Image resolution
300 dpi (laser printer)   / 60 lpi              / 120 ppi
600 dpi (laser printer)   / 85 lpi              / 170 ppi
1200 dpi (imagesetter)    / 120 lpi             / 240 ppi
2400 dpi (imagesetter)    / 150 lpi             / 300 ppi"
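
The rule quoted from Acrobat Help (image resolution of 1.5 to 2 times the line screen) is easy to tabulate. A short sketch follows, using the line screen values from the table above; the function name is made up for illustration. The table's image resolution column matches the 2x end of the range.

```python
def image_resolution_range(line_screen_lpi):
    """Return the (low, high) image resolution in ppi for a line screen.

    Applies the quoted rule of 1.5 to 2 times the line screen ruling.
    """
    return (1.5 * line_screen_lpi, 2.0 * line_screen_lpi)


# Default line screens from the quoted table.
for lpi in (60, 85, 120, 150):
    low, high = image_resolution_range(lpi)
    print(f"{lpi} lpi -> {low:g} to {high:g} ppi")
```

The same function could be applied to the 90, 110, or 120 LPI screens mentioned earlier in the thread when choosing resolutions for test images.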