Homer software package

Heelgrasper · Post by **Heelgrasper** » 29 Feb 2012, 15:50

The Homer Book Scanner Open Project, http://bookscanner.pbworks.com/w/page/4 ... /FrontPage , includes a very useful software package for both Windows and Mac OS X that makes the processing part of book scanning quite easy, in particular for newcomers and people who just want to keep it simple.

The system is meant for a single-camera scanner, where you first take photos of all the left pages and then all the right pages (or the other way around). In that case Homer can rename and rotate the images to produce a single collection of image files in the right order.

If a scanner with two cameras is used, you’ll have to do that yourself. According to http://www.diybookscanner.org/forum/vie ... w&start=10 there should be a solution using a batch file (since you already have ImageMagick installed), but I can’t get it to work. That might be down to my poor computer skills. It would be extremely nice if this were built into Homer, but it isn’t. Perhaps someone who knows how it could be done could talk the people behind Homer into including it, or simply modify Homer to include it, if there are no license problems with doing that.
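For two-camera rigs, the renaming half of that job could be sketched in Python instead of a batch file. This is only a minimal sketch, not part of Homer and not the batch file from the linked thread. The folder layout, the function names and the four-digit output names are assumptions, and it assumes both cameras shot their pages in the same order with file names that sort correctly. Rotation is not covered here.

```python
from itertools import zip_longest
from pathlib import Path
import shutil


def interleave_pages(left_pages, right_pages):
    """Return pages in reading order: left[0], right[0], left[1], right[1], ..."""
    ordered = []
    for left, right in zip_longest(left_pages, right_pages):
        # zip_longest pads the shorter list with None, so skip the padding.
        if left is not None:
            ordered.append(left)
        if right is not None:
            ordered.append(right)
    return ordered


def merge_camera_folders(left_dir, right_dir, out_dir, pattern="*.jpg"):
    """Copy both camera folders into out_dir as 0001.jpg, 0002.jpg, ...

    Hypothetical helper: the originals are left untouched, and the
    four-digit naming is an assumption, not something Homer documents.
    """
    left = sorted(Path(left_dir).glob(pattern))
    right = sorted(Path(right_dir).glob(pattern))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    copied = []
    for number, source in enumerate(interleave_pages(left, right), start=1):
        target = out / f"{number:04d}{source.suffix.lower()}"
        shutil.copy2(source, target)  # copy rather than move, so a mistake is recoverable
        copied.append(target)
    return copied
```

Calling `merge_camera_folders("left", "right", "merged")` would then fill a `merged` folder with the interleaved copies. Adjust `pattern` if your camera writes a different extension, and swap the two folder arguments if your right-hand pages come first.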

Homer first of all provides an installer so a number of applications are easily installed. In Windows (my OS) it’s done by downloading a zip-file, un-zipping it and running an .exe file. It then installs the applications and you end up with two new icons on the desktop: One for Homer and one for ScanTailor.

The installed applications in Windows are (see homepage for details):

ImageMagick (for manipulating images)
Jpegtran (lossless JPEG transformation)
JBIG2 encoder (compression tool for bi-level images)
Tesseract-OCR
RubyInstaller (installs the Ruby programming language)
Hpricot (HTML parser)
RMagick (interface between the Ruby programming language and ImageMagick)
Pdfbeads (to create searchable PDF)
Cmdow.exe (command-line utility used in Homer)
ScanTailor (post-processing tool)
Homer (command-line bash script)

Secondly it provides the Homer command-line bash script for renaming and rotating images as well as making them into a searchable PDF. This can be downloaded as stand-alone if you already have the other applications installed.

ScanTailor is meant to be used as a step between renaming and rotating images and making the PDF. There’s a lot written about ScanTailor in the forums here but the tutorial video is a good starting point: http://vimeo.com/12524529

Finally you use Homer to very simply do OCR and create a PDF file.

The small tests I’ve done worked perfectly, and for any beginner it’s very easy to use. The only problem is the renaming and rotating of files from a two-camera scanner.

victoriaaustralia · Post by **victoriaaustralia** » 22 Mar 2012, 15:43

Thank you Jacob, this worked terrifically for me.
The package downloaded and installed flawlessly, and then opened and processed some book scans I had previously processed separately with ScanTailor. These TIFFs were trimmed and rotated; the initial images totalled 170 MB across 126 TIFF files. The resulting OCR'd PDF was 1.26 MB with excellent readability.

This is the clearest, easiest package that has been demonstrated on this forum - bravo Jacob.

My computer: Toshiba laptop running Vista.

Again, a flawless package that did everything as you would wish the first time (with the caveat that I had processed the images in a separate ScanTailor install; I have only trialled the Homer OCR and PDF-building component).

Bravo Jacob

Post by **daniel_reetz** » 25 Mar 2012, 15:35

Exciting stuff. If anyone finds the time to do a mini-review of how the software works, perhaps with a screenshot or two demonstrating the workflow, that would help a lot of people here. I'd dive right in, but I'm going to be in Zurich for the next week or two on a job...

victoriaaustralia · Post by **victoriaaustralia** » 28 Mar 2012, 16:25

Homer mini-review.

This is for a Windows Vista system.

Download the program from Jacob's link above.

The downloaded package installs a version of ScanTailor and the Homer program, which includes the Tesseract OCR program and pdfbeads.

(I had previously struggled to get these to work but had given up due to my inadequate programming experience and inability to make sense of the python environment - this just works, no command-line instructions or ruby or gem environment required!)

You will then have a copy of Homer and Scantailor on the desktop.

You can start with left and right picture streams, put them into Homer and move through the ScanTailor process.

However, my workflow is different: the left and right pages get renamed to their actual page numbers with Ant Renamer while they are still JPEGs straight from the camera. They are then combined into one folder, which then goes into ScanTailor.

If you are going to input your own processed pictures, you need to get the numbering correct, as Homer will not otherwise order the pages correctly. Homer does not like the format bookname123, bookname.

The input files need to be in the format 0001, 0002, 0003 within one folder.
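Since a misnumbered folder is easy to miss, a quick check before handing it to Homer might look like the sketch below. It is an illustration only: the function name is hypothetical, and the rule it enforces (four-digit names starting at 0001 with no gaps) is an assumption drawn from the format described above, not from Homer's documentation.

```python
import re
from pathlib import Path

# Assumed naming rule: exactly four digits before the extension, e.g. 0001.tif
FOUR_DIGITS = re.compile(r"\d{4}")


def check_numbering(filenames):
    """Return (bad_names, missing_numbers) for a list of file names."""
    bad_names = []
    seen = set()
    for name in filenames:
        stem = Path(name).stem
        if FOUR_DIGITS.fullmatch(stem):
            seen.add(int(stem))
        else:
            bad_names.append(name)

    # Every number from 1 up to the highest one found should be present.
    expected = set(range(1, max(seen) + 1)) if seen else set()
    missing_numbers = sorted(expected - seen)
    return sorted(bad_names), missing_numbers


bad, missing = check_numbering(["0001.tif", "0002.tif", "0004.tif", "bookname12.tif"])
print("Not in 0001-style format:", bad)
print("Missing page numbers:", missing)
```

To check a real folder, pass it something like `[p.name for p in Path("out").glob("*.tif")]`, where `out` stands for whatever folder you plan to drop onto Homer.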

After the ScanTailor portion of the process, you are left with an output folder containing the DPI-adjusted, rotated, trimmed files in TIFF format.

This folder is then dragged and dropped onto the Homer icon on the desktop. This will cause the folder to open, and you can then choose option 4 to proceed to the OCR and PDF creation part of the process. You can choose the language required at this stage as well.

This proceeds very well and outputs a completed pdf to the desktop.

In my case, the raw photographs from the camera totalled 73 MB across 51 JPEGs. These then became a folder containing 9.19 MB of TIFF files across 51 files, which in turn resulted in a 0.9 MB completed PDF (this was a black-and-white book). Excellent compression!
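To put rough numbers on that, the sizes quoted above can be turned into ratios. This is just arithmetic on the figures in this post; the function name is made up for illustration.

```python
def compression_ratio(input_mb, output_mb):
    """Return how many times smaller the output is than the input."""
    if output_mb <= 0:
        raise ValueError("output size must be positive")
    return input_mb / output_mb


# Sizes quoted in the mini-review, in megabytes.
camera_jpegs = 73.0
scantailor_tiffs = 9.19
final_pdf = 0.9

print(f"JPEG -> PDF: about {compression_ratio(camera_jpegs, final_pdf):.0f}x smaller")
print(f"TIFF -> PDF: about {compression_ratio(scantailor_tiffs, final_pdf):.0f}x smaller")
```

So the finished PDF is roughly 81 times smaller than the camera JPEGs and roughly 10 times smaller than the ScanTailor TIFFs.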

Pro

- This program just works. I am not smart enough to get Bindery, PDFbeads, BookFu, BookScan Wizard or any of the other presumably excellent options that require more programming knowledge to work. This program downloads and extracts and is very easy to use. It does not have a GUI but is controlled using plain English instructions in the DOS-type window. No command-line knowledge required.
- Excellent quality PDF output. The output is significantly smaller than anything I was previously able to create using a PDF printer like CutePDF or Adobe.

Con
- OCR does not seem to be working for my version. A text file is created as part of the processing, so the OCR process is occurring; it is just not getting into the finished PDF as a text layer. See the attached PDF page and text file as a demonstration of this current bug.
- There are no options to alter the numbering convention used.

Heelgrasper · Post by **Heelgrasper** » 28 Mar 2012, 19:27

The OCR worked as it should in my small test. It was a 25 page article, where the tif-files were about 34 MB (mixed, some color on most pages) and output turned out to be a 1,34 MB searchable pdf in good quality. I haven't experimented more since then since other things have taken priority.

As I mentioned in the first post it would just be nice if it could handle renaming/rotating files from a two-camera scanner. For me the major advantage was that it was so easy to install compared to how troublesome just installing PDFBeads seemed to me. And that when you have tif-files ready from ScanTailor it's very easy to create a searchable PDF.

There might be a "need" to install the newest version of ScanTailor after installing the Homer package, though, since I'm pretty sure the version you get is not the newest. But that's pretty easy to do.

victoriaaustralia · Post by **victoriaaustralia** » 28 Mar 2012, 20:09

I don't think the numbering convention or the rotation restriction is a problem, only that you need to be aware of it.

Would re-installing Scantailor help with the OCR component? Wouldn't this bug be happening more in the PDFbeads component of Homer?

Even if I can't get OCR to work, this program still produces far better looking and better compressed PDFs than any of the other options out there (open options, that is - I haven't tried Abby, Adobe Nine, Snapter or other proprietary options).

simon

Post by **daniel_reetz** » 29 Mar 2012, 11:40

Thank you for that mini-review, Homer looks pretty compelling!

I don't think reinstalling ST can help with OCR... I agree that seems wrong given the function of each program.

victoriaaustralia · Post by **victoriaaustralia** » 02 Apr 2012, 16:35

One workaround for the OCR not working in Homer that I have used is to take the PDF file and run it through pdfViewer, which for freeware does a very good job of creating a searchable OCR'd text layer in the PDF.

For a 39 page booklet, the initial non-OCR Homer output file was 1.21MB and the OCR'd file was 1.4MB.

So it is one more program or step to use, but it does create the best-looking OCR'd ebooks that I have been able to create so far.

Heelgrasper · Post by **Heelgrasper** » 02 Apr 2012, 17:48

The problem with the OCR must be in PDFBeads since, as I understand it, Tesseract produces output like it should. I just did a re-run of the material from my trial run and it worked just like last time, with a searchable PDF as the end result, so I'm a bit puzzled about what is going wrong.

victoriaaustralia · Post by **victoriaaustralia** » 02 Apr 2012, 19:30

Then it is probably something that I or my computer did incorrectly.

Could someone else test it and report?
