If you're at the point of running jbig2enc, you should already have processed the TIFF files you want to turn into a PDF. If you haven't processed any files with Scan Tailor yet, do that first! You can install Scan Tailor with Homebrew by running:
brew install scantailor
Then run it by typing "scantailor" at the command prompt. Don't worry, it's a GUI app.
jbig2enc runs in two steps. Step one takes a page, or several pages, and produces raw JBIG2 data. That's not very useful for you, since you probably want to actually read your page, not just admire the file in your folder. For that, jbig2enc comes with a handy utility that converts the raw data into a PDF for you.
If you want to make a book that has OCR text (so you can search or copy and paste text), or a mixture of text and colour pictures, then a program called PDFBeads is the best way to do that. I'll show you how to do it with jbig2enc first; PDFBeads will come in a follow-up post.
Let's assume you have a set of processed TIFF files, and they're located in Documents/mybook. Here's what you would do:
1) Open Terminal.
2) Change to the folder containing your pages. We said that's in Documents/mybook, so you want to change to that directory with the following command:
cd ~/Documents/mybook
cd stands for "change directory." The tilde (squiggly) represents your home folder, so it comes first, since Documents is inside your home folder.
3) This folder should contain a whole bunch of TIFF files. Let's say they're named image001.tiff, image002.tiff, image003.tiff, etc. You want jbig2enc to compress all of those and leave a single symbol file, under one basename, that you can use to create a single PDF. You can do that with the following command:
jbig2 -b mybook -p -s image*.tiff
Let's break down what that means:
jbig2: This is just the program name.
-b mybook: This sets the "basename" for the output files. Each page is saved separately, but jbig2enc also creates a single symbol file shared by all of your images, which makes it easier to combine them into a single PDF.
-p: This makes the output PDF-ready.
-s: This uses the symbol coder, which is best for making JBIG2 files from text.
image*.tiff: These are your images. The star is a wildcard, so it matches all .tiff files that start with "image".
jbig2enc will churn away for a bit, then tell you when it's done.
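The star wildcard is expanded by the shell in sorted (alphabetical) order, and as far as I know jbig2enc numbers the pages in the order it receives them, so your filenames decide the page order. That's why zero-padded names like image001.tiff matter: with plain image2.tiff and image10.tiff, page 10 would sort ahead of page 2. Here's a small sketch for checking the order before you run jbig2. The name preview_pages is a hypothetical helper I made up, and it's written for a POSIX shell like sh or bash (zsh reports a missing match with its own "no matches found" error instead).

```shell
# Hypothetical helper (not part of jbig2enc): list the pages in the order
# the shell will pass them to jbig2. Run it inside the folder with your TIFFs.
preview_pages() {
    n=0
    for f in image*.tiff; do
        # In sh/bash, a pattern with no matches is left as literal text,
        # so a nonexistent "file" means there were no TIFFs to find.
        if [ ! -e "$f" ]; then
            echo "No image*.tiff files found in $(pwd)" >&2
            return 1
        fi
        n=$((n + 1))
        printf '%d: %s\n' "$n" "$f"
    done
}
```

Type preview_pages after the cd in step 2; if the numbered list matches your page order, go ahead with the jbig2 command.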
4) OK, now it's time to make a PDF! This is pretty easy:
pdf.py mybook > mybook.pdf
"mybook" is the basename you used in the last step. The little arrow (">") tells the shell to send pdf.py's output to a file, and "mybook.pdf" is the name of the file to write. You can change all those names to whatever you want, of course.
pdf.py won't tell you when it's done, but once you get the command prompt back, it's finished! Take a look at the PDF file, and you should have no trouble reading it.
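If you end up doing this for lots of books, the two steps can be wrapped in one shell function. This is my own sketch, not something that ships with jbig2enc: the name make_jbig2_book is hypothetical, and the check for BASENAME.sym assumes jbig2enc's usual naming for the shared symbol file (which, as far as I know, is what pdf.py reads). It also writes the PDF to a temporary file first, so a failed run doesn't leave a half-written PDF lying around.

```shell
# Hypothetical helper: run both jbig2enc steps for one folder of pages.
# Usage: make_jbig2_book BASENAME IMAGE...
make_jbig2_book() {
    base=$1
    shift
    if [ "$#" -eq 0 ]; then
        echo "usage: make_jbig2_book BASENAME IMAGE..." >&2
        return 2
    fi
    # Step 1: shared basename (-b), PDF-ready output (-p), symbol coder (-s).
    jbig2 -b "$base" -p -s "$@" || return 1
    # Assumed: the shared symbol dictionary is written to BASENAME.sym.
    if [ ! -f "$base.sym" ]; then
        echo "expected $base.sym was not created" >&2
        return 1
    fi
    # Step 2: wrap the raw JBIG2 data in a PDF via a temporary file.
    if pdf.py "$base" > "$base.pdf.tmp"; then
        mv "$base.pdf.tmp" "$base.pdf"
    else
        rm -f "$base.pdf.tmp"
        return 1
    fi
}
```

With the function defined, the whole job for our example folder is cd ~/Documents/mybook followed by make_jbig2_book mybook image*.tiff.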
Making a PDF with OCR, or mixed image/text content, is a bit more complex. I'll show you how to do that in the next post.
The opinions expressed in this post are my own and do not necessarily represent those of the Canadian Museum for Human Rights.