Preserve color but make white background? (no OCR)



Posts: 445
Joined: 19 Mar 2013, 14:55
Number of books owned: 0
Country: UK

Re: Preserve color but make white background? (no OCR)

Post by cday »

db-inf wrote: 18 Oct 2022, 03:16
You really do not need to be an image processing professional to get good images. ImageMagick is said to be well documented, but the tool is so powerful that reading all of that documentation is an insurmountable task. But there is a very useful website with ready-made scripts, Fred's ImageMagick Scripts, from which I have let the textcleaner script loose on your two example images, with just the defaults, i.e. no options. Look for yourself.
When I quite unexpectedly started using the command line some years back, after discovering the useful NConvert image processing utility, I had a quick look at ImageMagick, and I am aware of Fred's ImageMagick scripts. However, coming from only a brief acquaintance with NConvert, I found that basic operations worked very differently in ImageMagick, and I didn't continue when I could do what I needed more conveniently with NConvert. That said, there is no doubt that ImageMagick supports many more specialist operations, and I believe it has an active community where expert help is available when needed.

I downloaded the textcleaner script to have a quick look, and the download was a Linux script, possibly because I mainly use Linux these days, although I suspect that there is also a Windows command line version. The script certainly includes many optional adjustments for optimising the enhancement of a source image, and the example enhancements shown are impressive. But to obtain those results it is presumably necessary to rerun the script after each change and check the result, which makes me think that a tool with a GUI preview, where the effect of each change can be seen immediately, would, other things being equal, allow faster progress?

Here is a link to the detailed textcleaner description; scroll down through the scripts index to reach it.
Posts: 379
Joined: 01 Apr 2011, 18:05
Number of books owned: 0
Location: Issaquah, WA

Re: Preserve color but make white background? (no OCR)

Post by dpc »

Adding my two cents to this thread...

Looking at your original images (the first two in this thread), the lighting is a bit uneven. It's not as bad as what you typically see from a single-light V-cradle scanner, but there's a noticeable difference between the right and left sides of the white background on these pages. If the workbooks you're scanning can be put on a flatbed scanner, you'd be better off using that instead of photographing pages lit by a light source that hits them unevenly.

Even lighting across the page matters in your situation because you want to replace the pixels that make up the background with pure white pixels to make life easy for your printer. The easiest way to do this is to replace all of the pixels whose intensity lies within a specific range. The narrower that range, the less chance that pixels of the same intensity will be found elsewhere, in the colored portions of your pages (or, if they are found elsewhere on the page, they should be pure white anyway). If the lighting across the page is uneven, the range of pixel intensities you'll need to replace will be wider. You may find that a background pixel on the poorly lit side of the page has the same intensity value as a color-filled block on the well-lit side. If you simply replace those intensity values with pure white across the entire image so that the background is a uniform white, the color-filled block on the other side of the page will end up with white splotches.

There are, of course, more advanced ways to handle situations like this, but they usually require trial-and-error handholding to get right for a particular page, and the same settings may then fail on other pages and degrade the sharpness of your text.
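A minimal sketch of the simple range-replacement approach described above, and of how uneven lighting defeats it. It assumes 8-bit RGB pixels held in plain nested lists and uses ITU-R BT.601 luma as the "intensity"; the helper names, the luma choice, and the sample pixel values are illustrative assumptions, not anything taken from a particular tool.

```python
# Sketch only: hypothetical helpers that whiten every pixel whose intensity
# lies in the range [threshold, 255]. Pixels are (R, G, B) tuples.

Pixel = tuple[int, int, int]
Image = list[list[Pixel]]

WHITE: Pixel = (255, 255, 255)


def luma(p: Pixel) -> float:
    """Approximate perceived intensity of an RGB pixel (BT.601 weights)."""
    r, g, b = p
    return 0.299 * r + 0.587 * g + 0.114 * b


def whiten_background(img: Image, threshold: float) -> Image:
    """Return a copy of img with every pixel whose luma >= threshold set to pure white.

    Darker pixels keep their original color.
    """
    return [[WHITE if luma(p) >= threshold else p for p in row] for row in img]


if __name__ == "__main__":
    # One row of a hypothetical unevenly lit page: dim background (left side),
    # a pale yellow block (luma about 203), dark text, bright background (right side).
    row = [[(200, 200, 200), (230, 200, 150), (30, 30, 30), (245, 245, 245)]]

    # A narrow range only catches the well-lit background.
    print(whiten_background(row, 240))
    # → [[(200, 200, 200), (230, 200, 150), (30, 30, 30), (255, 255, 255)]]

    # Widening the range to also catch the dim background wipes out the yellow block.
    print(whiten_background(row, 195))
    # → [[(255, 255, 255), (255, 255, 255), (30, 30, 30), (255, 255, 255)]]
```

The second call is the "white splotches" case: the pale block on one side has roughly the same intensity as the background on the other, so no single threshold can separate them.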

If you're unable to get even lighting across the page when capturing the image, you might want to read this thread about how to postprocess images with uneven lighting. The typical approach is to capture an image of a completely white page and then not move your light or camera while you photograph the pages of the book. The per-pixel intensity values of the white page tell you how much you need to boost each pixel of a page photograph to bring its background up to pure white. Note that you need to do this processing before any cropping, resizing, or de-skew operations in your postprocessing pipeline, because the white-page image and the page photograph must still line up pixel for pixel. Finally, unless you actually need a final image containing multi-colored pixels, it's usually easier to work with grayscale images.
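The white-page correction described above is usually called flat-field correction. A minimal sketch, assuming 8-bit grayscale images held as nested lists and a simple per-pixel gain of 255 divided by the white-reference value; the function name and that exact gain formula are illustrative assumptions, not the method of any specific program.

```python
# Sketch only: hypothetical flat-field correction for 8-bit grayscale images.

Gray = list[list[int]]


def flat_field_correct(page: Gray, white_ref: Gray) -> Gray:
    """Scale each page pixel so that the matching white-reference pixel maps to 255.

    page and white_ref must have identical dimensions and be captured with the
    same light and camera position (i.e. before any cropping, resizing or de-skewing).
    """
    if len(page) != len(white_ref) or any(len(a) != len(b) for a, b in zip(page, white_ref)):
        raise ValueError("page and white reference must have identical dimensions")
    corrected = []
    for page_row, ref_row in zip(page, white_ref):
        out_row = []
        for p, ref in zip(page_row, ref_row):
            gain = 255 / max(ref, 1)  # guard against a zero (black) reference pixel
            out_row.append(min(255, round(p * gain)))  # clip to the 8-bit range
        corrected.append(out_row)
    return corrected


if __name__ == "__main__":
    white_ref = [[200, 240], [200, 240]]  # the left side of the setup is dimmer
    page = [[200, 240], [150, 96]]  # top row: background; bottom row: printed content
    print(flat_field_correct(page, white_ref))  # → [[255, 255], [191, 102]]
```

In the usage lines, the dim left-hand background (200 under a reference of 200) and the brighter right-hand background (240 under a reference of 240) both come out at 255, so after correction a single narrow intensity range is enough to whiten the whole background.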