New problem
Pictorial pages with text: I want to split out the text to run through OCR (so that the final PDF is searchable and highlightable/copyable), but doing so right now yields an image in which I can't seem to keep the background around the text (plus, the original text is a grey color, which looks a lot better on the background).
Unfortunately, I can't find an option that lets me get the text split and the mask while keeping the original image to go in as the main layer on that page of the PDF. I suppose I COULD process everything but the splits, move that output folder, use "Save As" on the project and run the splits, then replace the split pages with the non-split versions before "binding" to PDF, but is there a better way?
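For what it's worth, the "replace the non-split versions before binding" step of that workaround could be scripted rather than done by hand. Below is a minimal sketch, assuming both runs write output files with identical names; the folder names, the `page_002.tif` file name, and the function `restore_unsplit_pages` are all hypothetical and not part of Scan Tailor or any OCR tool.

```python
import shutil
from pathlib import Path


def restore_unsplit_pages(split_dir, unsplit_dir, bind_dir, pictorial_pages):
    """Build bind_dir from the split run, then overwrite the listed
    pictorial pages with their versions from the unsplit run.

    All four arguments are hypothetical: two output folders from two
    separate runs, a destination folder, and a list of file names.
    """
    split_dir, unsplit_dir, bind_dir = Path(split_dir), Path(unsplit_dir), Path(bind_dir)
    bind_dir.mkdir(parents=True, exist_ok=True)

    # Start from every page produced by the split run.
    for page in sorted(split_dir.iterdir()):
        if page.is_file():
            shutil.copy2(page, bind_dir / page.name)

    # Put the unsplit image back for each pictorial page.
    # copy2 raises FileNotFoundError if a listed page is missing.
    restored = []
    for name in pictorial_pages:
        shutil.copy2(unsplit_dir / name, bind_dir / name)
        restored.append(name)
    return restored
```

Something like `restore_unsplit_pages("out_split", "out_unsplit", "to_bind", ["page_002.tif"])` would then leave `to_bind` ready for binding, with the split versions still available in `out_split` for OCR.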
Advanced - split text for OCR, but keep "original" image
Moderator: peterZ
- Posts: 24
- Joined: 09 Feb 2024, 22:21
- E-book readers owned: Nook Glowlight 4, Kindle Fire 5th gen
- Number of books owned: 100
- Country: USA
Re: Advanced - split text for OCR, but keep "original" image
OK, did a test: as long as I don't try to split these pages for better compression, everything works out fine.
Re: Advanced - split text for OCR, but keep "original" image
nightshift wrote: ↑26 Mar 2024, 14:44
Ok, did a test, as long as I don't try and split these pages for better compression, everything works out fine.
Premature, perhaps, to offer a suggestion on that, but I recently discovered a cross-platform program, new to me, that did an excellent job of making images searchable: NAPS2.
I know that there are other well-known OCR programs commonly used with Scan Tailor, and that you tend to prefer command-line tools, but if you have some time to spare you might try running a few test pages through it. The interface allows existing image files to be loaded. No difficult bending required!
Attached is a PDF file created from a few pages of my own early (flatbed) book scanning experiments.
Seems a useful tool for easily making scanned images searchable...