http://rubyforge.org/projects/pdfbeads/
It uses JBIG2 and JPEG2000 encoding, so the output PDF file is very small.
Example (301 pages, 4.1 MB): http://narod.ru/disk/27466236000/network.pdf.html
Optionally, PDFBeads can add a text layer from hOCR files to the PDF.
You need Ruby with RubyGems, ImageMagick, and jbig2enc.
On Linux:
gem install rmagick
gem install pdfbeads
For OCR, put an hOCR file (*.html or *.hocr) for every scan into the same directory as the scans. You also need to install hpricot.
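Before running PDFBeads, it may help to check that every scan really has its hOCR file next to it. Below is a minimal sketch of such a check. The function name missing_hocr is made up for illustration, and the script assumes that scans are *.tif or *.tiff files and that each hOCR file has the same base name as its scan; adjust it if your files are named differently.

```shell
# Print every scan that has no matching hOCR file next to it.
# Assumptions (not from the PDFBeads manual): scans are *.tif / *.tiff,
# and the hOCR file shares the scan's base name (page1.tif -> page1.html or page1.hocr).
missing_hocr() {
    dir="${1:-.}"
    for scan in "$dir"/*.tif "$dir"/*.tiff; do
        [ -e "$scan" ] || continue      # glob matched nothing, skip the literal pattern
        base="${scan%.*}"               # strip the extension
        if [ ! -e "$base.html" ] && [ ! -e "$base.hocr" ]; then
            echo "$scan"
        fi
    done
}
```

Calling missing_hocr with the directory of scans as its argument prints the scans that still need OCR output; no output means every scan has a companion file. hpricot is a Ruby gem, so it should install with gem install hpricot, the same way as the two gems above.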
The manual is available in Russian only:
http://rubyforge.org/docman/view.php/97 ... beads.html