PDFBeads — Convert Scanned Images to a Single PDF File


Postby Lazy_Kent » 14 Nov 2010, 14:23

Alexey Kryukov has written PDFBeads, a Ruby program:
http://rubyforge.org/projects/pdfbeads/

It uses JBIG2 and JPEG2000 encoding, so the output PDF file is very small.
Example (301 pages, 4.1 MB): http://narod.ru/disk/27466236000/network.pdf.html

Optionally, PDFBeads can add an hOCR text layer to the PDF.

You need Ruby with RubyGems, ImageMagick and jbig2enc.

In Linux:
Code:
gem install rmagick
gem install pdfbeads

On Windows, install the Windows versions of these programs.
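Before installing the gems, it may help to check that the external tools are actually on your PATH. Below is a minimal shell sketch; it assumes the usual executable names (`ruby` and `gem` for Ruby/RubyGems, `convert` for ImageMagick, `jbig2` for jbig2enc), which may differ on your system, and the function name `missing_tools` is hypothetical.

```shell
#!/bin/sh
# Sketch: report which PDFBeads prerequisites are missing from PATH.
# Assumption: the tool names passed in are the usual executables, e.g.
# ruby/gem (Ruby + RubyGems), convert (ImageMagick), jbig2 (jbig2enc).

# Print every argument that is not found on PATH, one per line.
# Exit status: 0 if all were found, 1 if any is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}
```

For example, `missing_tools ruby gem convert jbig2 || echo "install the tools listed above first"` prints the name of every tool it could not find.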

For OCR, put an hOCR file (*.html or *.hocr) for every scan into the same directory. You also need to install hpricot.

The manual is available in Russian only:
http://rubyforge.org/docman/view.php/97 ... beads.html
Lazy_Kent
 
Posts: 37
Joined: 26 Oct 2010, 10:06
Location: Moscow

Re: PDFBeads — Convert Scanned Images to a Single PDF File

Postby lupocos » 29 May 2011, 00:05

Hello!
I managed to install pdfbeads as well as Cuneiform on Windows.
Both are working, but I cannot get pdfbeads to add the hOCR layer to the final PDF (i.e. to make it searchable).
I have also installed the hpricot Ruby gem correctly.
Has anybody had any success creating a searchable PDF with Cuneiform + pdfbeads on Windows?
I can create the hOCR files with Cuneiform and the PDF file (JBIG2-encoded) with pdfbeads separately, but I cannot combine them into a single searchable PDF.

P.S.: By the way, I'm using the Cuneiform 1.1.0 binary that comes bundled with this Windows program: CuneiDjvu
http://www.djvu-soft.narod.ru/scan/cuneidjvu.htm (Russian, as usual... use Google Translate)

Thanks for all your help,
Cosimo


EDIT:
I finally managed to combine the hOCR text layer and the TIFF images into a PDF using pdfbeads!
I simply had to make sure that the .html and .tif files had exactly the same name apart from the extension (e.g., image_001.tif --> image_001.html) and were in the same directory.
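Since a single misnamed file is enough to lose the text layer, a quick check before running pdfbeads can list every scan that lacks a same-named hOCR file. This is a minimal sketch, assuming lowercase *.tif scans and hOCR files ending in .html or .hocr in the same directory; `unpaired_scans` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: list scans that have no hOCR file with the same base name.
# Assumptions: scans are *.tif, hOCR files end in .html or .hocr, and
# both live in the same directory.
unpaired_scans() {
    dir=${1:-.}                            # default: current directory
    for img in "$dir"/*.tif; do
        [ -e "$img" ] || continue          # no .tif files: glob stays literal
        base=${img%.tif}                   # image_001.tif -> image_001
        if [ ! -e "$base.html" ] && [ ! -e "$base.hocr" ]; then
            echo "${img##*/}"              # print the file name only
        fi
    done
}
```

Running `unpaired_scans /path/to/scans` prints nothing when every image has a matching hOCR file.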

pdfbeads rocks! :D
lupocos
 

Re: PDFBeads — Convert Scanned Images to a Single PDF File

Postby daniel_reetz » 31 May 2011, 18:40

I'm glad the solution is so uncomplicated. Thanks for sharing it back with us, lupocos.
daniel_reetz
 
Posts: 2739
Joined: 03 Jun 2009, 13:56

Re: PDFBeads — Convert Scanned Images to a Single PDF File

Postby seasalt » 04 Jun 2011, 19:23

Has anyone got PDFBeads working on a Mac who could post the steps (for a non-technical person)?
Thank you in advance.
seasalt
 

Re: PDFBeads — Convert Scanned Images to a Single PDF File

Postby Misty » 07 Jun 2011, 11:36

1. Install Xcode from your OS X install DVD, if you don't already have Xcode.

2. Install Homebrew using the instructions from https://github.com/mxcl/homebrew/wiki/installation
The installer can be run by pasting the following into your terminal and pressing Enter:
Code:
/usr/bin/ruby -e "$(curl -fsSL https://raw.github.com/gist/323731)"


3. Use Homebrew to install the extra tools you need:
Code:
brew install imagemagick tesseract jbig2enc


This installs PDFBeads' dependencies, as well as the Tesseract OCR engine, which can produce hOCR output compatible with PDFBeads.

4. Run the following command to install PDFBeads and the Ruby gems it needs:
Code:
gem install pdfbeads hpricot rmagick


If you're using the version of Ruby that comes with the OS, you will need to use 'sudo' to install, which will ask for your password. In that case, the command is:
Code:
sudo gem install pdfbeads hpricot rmagick
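Once Tesseract is installed, the per-scan hOCR files can be produced in one loop. This is a minimal sketch rather than part of the steps above: it assumes the scans are *.tif files and that your Tesseract build includes the standard `hocr` config file (depending on the version, the output is NAME.html or NAME.hocr). `ocr_to_hocr` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: run Tesseract on every scan so each image gets an hOCR file
# with the same base name (image_001.tif -> image_001.html or .hocr).
# Assumptions: scans are *.tif and your Tesseract build ships the
# standard "hocr" config file.
ocr_to_hocr() {
    dir=${1:-.}                            # default: current directory
    for img in "$dir"/*.tif; do
        [ -e "$img" ] || continue          # no .tif files found
        # Arguments: input image, output base name (no extension), config.
        tesseract "$img" "${img%.tif}" hocr || return 1
    done
}
```

For example, `ocr_to_hocr ~/scans` leaves one hOCR file next to each image, ready for pdfbeads to pick up.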
Last edited by Misty on 21 Oct 2011, 16:24, edited 13 times in total.
The opinions expressed in this post are my own and do not necessarily represent those of the Canadian Museum for Human Rights.
Misty
 
Posts: 481
Joined: 06 Nov 2009, 12:20
Location: Frozen Wasteland

Re: PDFBeads — Convert Scanned Images to a Single PDF File

Postby seasalt » 08 Jun 2011, 02:28

Many thanks, Misty. I'm further along, but not all the way there.

2. Install Homebrew
Following the install instructions at the link, I made sure in Terminal that I was in /usr/local.
Message received: successfully installed

3. In Terminal, I typed
gem update

and received error messages after a bit of processing:
Updating installed gems
Updating acts_as_ferret
WARNING: Installing to ~/.gem since /Library/Ruby/Gems/1.8 and
/usr/bin aren't both writable.
WARNING: ..../.gem/ruby/1.8/bin in your PATH, gem executables will not run.
ERROR: Error installing acts_as_ferret: bundler requires RubyGems version >= 1.3.6
Updating arel
Successfully installed arel-2.1.1
Updating builder
Successfully installed builder-3.0.0
Updating erubis
Successfully installed erubis-2.7.0
Updating mail
Successfully installed mail-2.3.0
Updating rack
Successfully installed rack-1.3.0
Updating rack-mount
Successfully installed rack-mount-0.8.1
Updating rack-test
Successfully installed rack-test-0.6.0
Updating rails
ERROR: Error installing rails:
bundler requires RubyGems version >= 1.3.6
Gems updated: arel, builder, erubis, mail, rack, rack-mount, rack-test
Installing ri documentation for arel-2.1.1...
Installing ri documentation for builder-3.0.0...
ERROR: While generating documentation for builder-3.0.0
... MESSAGE: Unhandled special: Special: type=17, text="<!-- HI -->"
... RDOC args: --ri --op /Users/..../.gem/ruby/1.8/doc/builder-3.0.0/ri --title Builder -- Easy XML Building --main README.rdoc --line-numbers --quiet lib CHANGES Rakefile README README.rdoc TAGS doc/releases/builder-1.2.4.rdoc doc/releases/builder-2.0.0.rdoc doc/releases/builder-2.1.1.rdoc --title builder-3.0.0 Documentation
(continuing with the rest of the installation)
Installing ri documentation for erubis-2.7.0...

It then continues until the end.

So I am not sure what the impact is.
I have not yet tried the next step, "gem install pdfbeads hpricot rmagick", in the Terminal window.

Any ideas?
I'm on OS X 10.6.x.

(Also, bandwidth usage was very high: 1 GB used up. Is that from the Homebrew install?)

Re: PDFBeads — Convert Scanned Images to a Single PDF File

Postby Misty » 08 Jun 2011, 10:17

Sorry about that. I'd forgotten that the default gem folder in Mac OS X requires superuser (sudo) access to write to. The error messages you showed are actually not very serious, even though they might sound bad.

You can go ahead with the rest of step 3, but use the command
Code:
sudo gem install pdfbeads hpricot rmagick


You will have to give it your password. This will install the required gems in a location where you can run them. Follow the rest of the steps as written.
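Whether sudo is needed comes down to whether the directories RubyGems writes to are writable by you; the warning in the log above names /Library/Ruby/Gems/1.8 and /usr/bin. A minimal sketch, assuming `gem environment gemdir` reports the install directory when no directories are passed; `gem_install_prefix` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print "sudo" if any given directory is not writable (or does
# not exist) for the current user, and nothing otherwise.
# Assumption: with no arguments, `gem environment gemdir` supplies the
# RubyGems install directory to check.
gem_install_prefix() {
    [ $# -gt 0 ] || set -- "$(gem environment gemdir)"
    for dir in "$@"; do
        if [ ! -w "$dir" ]; then
            echo sudo      # at least one directory needs superuser access
            return 0
        fi
    done
}
```

`$(gem_install_prefix /Library/Ruby/Gems/1.8 /usr/bin) gem install pdfbeads hpricot rmagick` then prepends sudo only when one of those directories is not writable by you; the substitution is left unquoted on purpose so that it expands to nothing otherwise.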

Edit: I just found a trick I wasn't familiar with. To install jbig2enc without waiting for Homebrew to add it officially, use the following command:
Code:
brew install https://raw.github.com/mistydemeo/homebrew/0c3427ee1e9be6aaaed5a15f8d0d6e63d610d2f1/Library/Formula/jbig2enc.rb
Last edited by Misty on 06 Jul 2011, 11:51, edited 1 time in total.

Re: PDFBeads — Convert Scanned Images to a Single PDF File

Postby seasalt » 08 Jun 2011, 21:51

Thanks, Misty.

I entered:
sudo gem install pdfbeads hpricot rmagick

and then my password,
and Ruby looks to be sorted now.

Then I continued with step 4 and entered:
brew update

Result:
Please `brew install git' first.

Yesterday I got a confirmation message that Homebrew was installed successfully.

Any ideas?

Re: PDFBeads — Convert Scanned Images to a Single PDF File

Postby Misty » 09 Jun 2011, 12:09

Homebrew was installed, but it looks like it doesn't install git by default, and git is needed for "brew update". Just run:
Code:
brew install git


And wait for that to finish. You'll then be able to run brew update without a problem and finish the steps.

Re: PDFBeads — Convert Scanned Images to a Single PDF File

Postby seasalt » 09 Jun 2011, 22:51

OK, we are a little further along, but there are 2 error messages.

I typed in:
brew install git

and got several messages that looked like successful installs, e.g.:
==> Downloading http://kernel.org/pub/software/scm/git/ ... .7.5.4.tar.
######################################################################## 100.0%
==> Caveats

Then:
Bash completion has been installed to:
/usr/local/etc/bash_completion.d

Emacs support has been installed to:
/usr/local/Cellar/git/1.7.5.4/share/doc/git-core/contrib/emacs

The rest of the "contrib" has been installed to:
/usr/local/Cellar/git/1.7.5.4/share/contrib
Error: The linking step did not complete successfully
The formula built, but is not symlinked into /usr/local
You can try again using `brew link git'
Error: Permission denied - /usr/local/etc/bash_completion.d
==> Summary
/usr/local/Cellar/git/1.7.5.4: 1062 files, 19M, built in 80 seconds

Any ideas?
(Thank you for all your help, Misty.)
