Fabian wrote: You're right, I should have chosen one of the many examples where no B&W version is offered. Unfortunately, your OCR'd copy is not suitable for my purposes because the original typeface is sacrificed. File size and copy-and-paste are important to me too, but I'd also like to preserve as much fidelity to the original as possible.
Is the bottom line, therefore, that each page must be individually processed in order to remove the tint? There is no program that will do it for me in one book-length pass?
cday wrote: I think that it may be possible to do what you want directly using the cross-platform freeware XnConvert, based on a quick test, although you would still need to OCR the resulting file if you need searchability. Note that you will also need to install the freeware utility Ghostscript and to set a DPI value in XnConvert to obtain suitable image quality. As there is little or no documentation other than the XnView forum, the easiest way to learn the program is to explore the interface fully and run some tests.
Out of interest, I did a quick test using the above freeware program, which can be run on a Mac, and obtained a proof-of-concept result, although as stated above the output file would have to be run through OCR software if searchability is needed. There are a number of practical considerations, though: a little knowledge of image processing would be quite helpful, and the fundamental issues that apply to creating multi-page PDF image files, in terms of output file size, still apply.
For my test I opened the downloaded colour file with the yellow tint in an image editor and applied a fairly aggressive levels adjustment to a typical page, to determine white and black points that effectively removed the yellow tint without causing too much collateral damage to the text. I then ran the file through XnConvert to apply that levels adjustment automatically to each page and save the resulting output as a PDF. The processing was direct PDF-to-PDF, without extracting the images from the file, as originally requested.
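For anyone unfamiliar with what a levels adjustment actually does, here is a minimal pure-Python sketch of the idea. It is not what XnConvert runs internally (I don't know its implementation), and the function names and the black/white point values are hypothetical, chosen only for illustration: every value at or below the black point becomes 0, every value at or above the white point becomes 255, and everything in between is stretched linearly.

```python
def apply_levels(values, black_point, white_point):
    """Stretch 8-bit values so black_point maps to 0 and white_point to 255.

    Values outside the two points are clipped. Integer arithmetic is used
    so the result is the same on every platform.
    """
    if not 0 <= black_point < white_point <= 255:
        raise ValueError("need 0 <= black_point < white_point <= 255")
    span = white_point - black_point
    result = []
    for v in values:
        clipped = min(max(v, black_point), white_point)
        result.append((clipped - black_point) * 255 // span)
    return result


def apply_levels_rgb(pixel, black_point, white_point):
    """Apply the same levels adjustment to each channel of an (R, G, B) pixel."""
    return tuple(apply_levels(pixel, black_point, white_point))


# Hypothetical sample values: dark text, a mid grey, and tinted paper.
print(apply_levels([40, 130, 215], black_point=60, white_point=200))
# A yellowish paper pixel: the blue channel is the lowest, so the white
# point has to sit at or below it for the tint to clip to pure white.
print(apply_levels_rgb((235, 225, 180), black_point=60, white_point=175))
```

The trade-off mentioned above shows up directly in the numbers: the lower the white point has to go to swallow the tint, the more faint or small text gets pushed to white along with the paper.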
Saving the output as colour or grayscale images produced, as expected, quite a large file even with JPEG compression, although there may be scope for experimenting with greater compression settings. I then repeated the test, converting the images to black and white after the levels adjustment and saving with Fax (CCITT 4) compression: as expected, that produced a much smaller file, and image quality was in fact only slightly degraded.
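To illustrate why the black and white route starts from so much less data, here is a rough sketch of the thresholding step. It does not implement CCITT Group 4 compression, which is a separate encoding applied on top; it only shows a row of 8-bit grey pixels being reduced to 1 bit per pixel. The function names and the threshold of 128 are assumptions made for this example, and the "1 means black" convention is just one common choice.

```python
def to_bilevel(values, threshold=128):
    """Turn 8-bit grey values into bits: 1 for black (below threshold), 0 for white."""
    return [1 if v < threshold else 0 for v in values]


def pack_bits(bits):
    """Pack a list of 0/1 bits into bytes, most significant bit first.

    The final byte is padded with zero (white) bits if needed.
    """
    packed = bytearray()
    for start in range(0, len(bits), 8):
        chunk = bits[start:start + 8]
        chunk = chunk + [0] * (8 - len(chunk))
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)


# A hypothetical 10-pixel row after the levels adjustment (0 = ink, 255 = paper).
row = [0, 255, 255, 0, 0, 255, 255, 255, 0, 255]
bits = to_bilevel(row)
packed = pack_bits(bits)
print(len(row), "bytes as 8-bit grey ->", len(packed), "bytes as 1-bit")
```

So even before any fax-style compression the raw data is roughly an eighth of the grayscale size, and a bilevel encoder then has long runs of identical pixels to work with.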
Sadly the image doesn't display without downloading: note that the file size of the composite image is irrelevant, as lossless compression was used to ensure that both images reflected the originals, I hope...
Although the output for the book text looks reasonable on a quick check, there are necessarily trade-offs involved in the processing, and some text with small character sizes in the front matter of the book is compromised in the output file, or possibly in some cases missing. The 248-page grayscale version had a file size of 98MB, and the black and white version around 8MB, which means I can upload it for inspection:
So proof of concept only, a lot of scope for experimentation and optimisation, and not an entirely easy ride for anyone with no image editing experience, but potentially a direct solution other than the need to OCR the resulting output file if searchability is required...