PDFMaker 0.3 - help beta test!

Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Post by Misty »

EDIT: PDFMaker is not really supported anymore. If it's useful to you, use it by all means, but I would suggest that you use PDFBeads instead.

I finally have a version of my Scan Tailor PDF assembly script ready to go! (Took long enough, right? ;)) I'm interested in getting some feedback and bugtesting. If you prefer to use PDF for your e-books, or need PDF for compatibility, I think you'll find this utility very useful. I originally wrote it as an internal utility at the County of Brant Public Library to distribute PDF e-books on the County of Brant Public Library Digital Collections website. I'll be releasing the source under GPLv3 once a final release is ready.

PDFMaker is a pretty simple commandline program. It takes output TIFFs from Scan Tailor, and assembles them into a PDF ready for viewing on a computer or handheld device. Unlike most PDF compression software for this type of task, it treats image and text compression separately to get smaller file sizes.

Before running PDFMaker, you need to install ImageMagick and GraphicsMagick. Just install them to the default location and you'll be fine.

The syntax is easy. Run it from a DOS commandline by typing:

pdfmaker -d "outdir"
Where "outdir" is the full path to the Scan Tailor "out" folder containing your processed book pages. Make sure that this folder name is in quotes! e.g., "F:\out"

Your final PDF will be a file called out.pdf inside a folder called "out" within the folder you specified.
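
As a rough illustration of the folder layout described above (this is not PDFMaker's own code, and the helper name expected_pdf_path is made up for the example), the location of the finished PDF can be worked out like this:

```python
from pathlib import PureWindowsPath


def expected_pdf_path(outdir: str) -> PureWindowsPath:
    """Return where out.pdf should appear for a given Scan Tailor "out" folder."""
    # Per the description above: a folder called "out" inside the folder
    # passed to -d, containing a file called out.pdf.
    return PureWindowsPath(outdir) / "out" / "out.pdf"


print(expected_pdf_path(r"F:\out"))  # F:\out\out\out.pdf
```

So if you pass "F:\out", look for the result in F:\out\out.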

There are a few extra commandline options, which most people won't need to use. They are:

-picdpi X
This selects the resolution your book's illustrations are downsampled to, if you have any. The default is 100dpi, which is suitable for viewing onscreen but not for printing. Use 200 or 300 for printing. Accepted values are 600, 300, 200, 150, 120 and 100.

-quality X
This selects the JPEG quality for illustrations. The default is 85, on a scale of 1 to 100.
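
For anyone who wants to see the options and their defaults in one place, here is a minimal Python sketch of a parser that accepts the same switches. It is only an assumption about how such options could be validated, not the parser PDFMaker actually uses; the names build_parser, quality_value and ALLOWED_PICDPI are invented for this example.

```python
import argparse

# Values listed above for -picdpi.
ALLOWED_PICDPI = (600, 300, 200, 150, 120, 100)


def quality_value(text: str) -> int:
    """Parse a JPEG quality value and check it is on the 1-100 scale."""
    value = int(text)
    if not 1 <= value <= 100:
        raise argparse.ArgumentTypeError("quality must be between 1 and 100")
    return value


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented switches (hypothetical)."""
    parser = argparse.ArgumentParser(prog="pdfmaker")
    parser.add_argument("-d", dest="outdir", required=True,
                        help='full path to the Scan Tailor "out" folder')
    parser.add_argument("-picdpi", type=int, choices=ALLOWED_PICDPI,
                        default=100, help="illustration resolution in dpi")
    parser.add_argument("-quality", type=quality_value, default=85,
                        help="JPEG quality for illustrations, 1-100")
    return parser


args = build_parser().parse_args(["-d", r"F:\out", "-picdpi", "300"])
print(args.outdir, args.picdpi, args.quality)
```

Leaving out -picdpi and -quality falls back to 100 and 85, matching the defaults described above.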

Known issues
  • OCR is not integrated into the script right now. That will be a version 2.0 feature.
  • jbig2enc produces PDFs with an incorrect DPI value, which makes them unsuitable for OCR. This is in the process of being fixed.
  • PDFMaker is Windows only. Yes, this is a bug. ;) A multiplatform (Windows/Linux/Mac) Python rewrite will come in version 2.0
PDFMaker depends on the following software, with much appreciation to the respective authors:
  • jbig2enc, written by Adam Langley; binary by Steven Lee of RubyPDF
  • ImageMagick, (c) ImageMagick Studio LLC
  • GraphicsMagick, (c) GraphicsMagick Group
  • pdftk, (c) Sid Stewart
Edit: Would people prefer this be in this forum, or in Software?
Attachments
pdfmaker 0.3.zip
PDFMaker 0.3
(6.37 MiB) Downloaded 1599 times
Last edited by Misty on 04 Jul 2011, 11:48, edited 6 times in total.
The opinions expressed in this post are my own and do not necessarily represent those of the Canadian Museum for Human Rights.
Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK

Re: PDFMaker 0.1 - help beta test!

Post by Tulon »

Great news, Misty.

Until now, producing reasonably sized PDFs from ScanTailor's Mixed output was problematic even when using commercial software.

What are the file sizes you are getting with this tool?
Scan Tailor experimental doesn't output 96 DPI images. It's just what your software shows when DPI information is missing. Usually what you get is input DPI times the resolution enhancement factor.
Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: PDFMaker 0.1 - help beta test!

Post by Misty »

Text-only pages tend to vary from 10-50KB each. Images depend on the type of content and physical size, but usually add about 50-100KB per page at 100dpi.
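
To turn those per-page figures into a ballpark for a whole book, here is a quick back-of-the-envelope sketch. It assumes every page carries a text layer and that illustrated pages add the image cost on top; the 300-page, 20-illustration book is a made-up example, and estimate_pdf_size_kb is just a name chosen for the sketch.

```python
def estimate_pdf_size_kb(total_pages: int, illustrated_pages: int) -> tuple:
    """Return a (low, high) size estimate in KB from the per-page figures above."""
    text_low, text_high = 10, 50      # KB per page of text
    image_low, image_high = 50, 100   # extra KB per illustrated page at 100dpi
    low = total_pages * text_low + illustrated_pages * image_low
    high = total_pages * text_high + illustrated_pages * image_high
    return low, high


low, high = estimate_pdf_size_kb(total_pages=300, illustrated_pages=20)
print(f"roughly {low / 1024:.1f} to {high / 1024:.1f} MB")  # roughly 3.9 to 16.6 MB
```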
univurshul
Posts: 496
Joined: 04 Mar 2014, 00:53

Re: PDFMaker 0.1 - help beta test!

Post by univurshul »

Misty wrote: ...OCR is not integrated into the script right now. That will be a version 2.0 feature... A multiplatform (Windows/Linux/Mac) Python rewrite will come in version 2.0
...looking forward to v2.0 for Mac. (Many thanks for your hard work).
Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: PDFMaker 0.1 - help beta test!

Post by Misty »

I'll definitely let you know once a Mac version's ready for testing.

I noticed that the JPEG2000 compression doesn't seem to be working optimally. ImageMagick is consistently producing larger JPEG2000 illustrations (when the -gamma switch is not used) than regular JPEG files (when -gamma is used). I can only get comparable filesizes by making the JPEG2000s much lower quality than the equivalent JPEGs. While I can obtain better quality from proprietary JPEG2000 encoders, I think I may have to disable JPEG2000 at this point in time. Anyone know of a good open-source encoder?
Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: PDFMaker 0.1 - help beta test!

Post by Misty »

I've updated PDFMaker to 0.1.1. The only change is the removal of JPEG2000 compression; until I can resolve my issues in ImageMagick, it's not worth using. The -gamma option is no longer supported or necessary as a result.
strider1551
Posts: 126
Joined: 01 Mar 2010, 11:39
Number of books owned: 0
Location: Ohio, USA

Re: PDFMaker 0.1 - help beta test!

Post by strider1551 »

Congratulations Misty! There's nothing quite as uniquely terrifying as releasing a program and wondering how it will work for other people.

I tried pdfmaker-0.1.1 on a Windows XP machine, but ST Separator 2.7 is crashing. I get a dialog that "ST Separator has encountered a problem and needs to close." That, of course, causes everything else to get mad:

C:\Documents and Settings\Owner\Desktop\pdfmaker 0.1.1>pdfmaker.exe -d "C:\book_scan"

Running ST Separator... done
Running jbig2enc...Unable to open "C:\book_scan\txt\0001.tif"Traceback (most recent call last):
  File "pdf.py", line 168, in <module>
TypeError: usage() takes exactly 2 arguments (1 given)
Unable to open "C:\book_scan\txt\0002.tif"Traceback (most recent call last):
  File "pdf.py", line 168, in <module>
TypeError: usage() takes exactly 2 arguments (1 given)
Unable to open "C:\book_scan\txt\0003.tif"Traceback (most recent call last):
  File "pdf.py", line 168, in <module>
TypeError: usage() takes exactly 2 arguments (1 given)
Unable to open "C:\book_scan\txt\0004.tif"Traceback (most recent call last):
  File "pdf.py", line 168, in <module>
TypeError: usage() takes exactly 2 arguments (1 given)
Unable to open "C:\book_scan\txt\0005.tif"Traceback (most recent call last):
  File "pdf.py", line 168, in <module>
TypeError: usage() takes exactly 2 arguments (1 given)
Unable to open "C:\book_scan\txt\0006.tif"Traceback (most recent call last):
  File "pdf.py", line 168, in <module>
TypeError: usage() takes exactly 2 arguments (1 given)
 done
Running ImageMagick... done
Running pdftk... done!
All done! Finished processing files.
What is ST Separator designed to do? I couldn't find a good description of it.
Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: PDFMaker 0.1 - help beta test!

Post by Misty »

Oh hm, that's not good. I'll let Andrei know about that. Did it give any errors?

ST Separator handles separation of Scan Tailor files. It takes an input file from the Scan Tailor outdir and outputs a bitonal text-only page and an image-only page. It also has support for downsampling the image-only page to a lower DPI. Since a program already existed to do that, it was easier to draw on that functionality than to rewrite it myself.
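
To make the idea concrete, here is a minimal pure-Python sketch of that kind of split on a tiny grayscale grid. It is not ST Separator's actual code. It assumes text pixels are pure black (0), that illustrations contain no pure black, and that unwanted regions are filled with white (255); separate_page and is_blank are names invented for the example.

```python
BLACK, WHITE = 0, 255


def separate_page(pixels):
    """Split a grayscale page (rows of 0-255 ints) into text and image layers."""
    # Text layer: keep pure-black pixels, fill everything else with white.
    text_layer = [[BLACK if p == BLACK else WHITE for p in row] for row in pixels]
    # Image layer: blank out the pure-black text pixels, keep the rest.
    image_layer = [[WHITE if p == BLACK else p for p in row] for row in pixels]
    return text_layer, image_layer


def is_blank(layer):
    """True if a layer is entirely white, i.e. it holds no content."""
    return all(p == WHITE for row in layer for p in row)


page = [
    [255,   0,   0, 255],   # a bit of text
    [255, 120, 200, 255],   # a bit of illustration
]
text, image = separate_page(page)
print(text)             # [[255, 0, 0, 255], [255, 255, 255, 255]]
print(image)            # [[255, 255, 255, 255], [255, 120, 200, 255]]
print(is_blank(image))  # False
```

A page with no illustration at all would give an all-white image layer, which is_blank reports as True.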
strider1551
Posts: 126
Joined: 01 Mar 2010, 11:39
Number of books owned: 0
Location: Ohio, USA

Re: PDFMaker 0.1 - help beta test!

Post by strider1551 »

Misty wrote: I'll let Andrei know about that. Did it give any errors?
Nope. Running directly also crashes, with a message about division by zero at ST_Separator.MainForm.ShowProgress(), but I might not even be calling the program correctly.

For what it's worth, I use ImageMagick to separate out the text and image:

convert -opaque black "page.tif" "temp_graphics.tif"
convert +opaque black "page.tif" "temp_textual.tif"
I'm by no means suggesting you should change how you do things, especially if you're invested in ST Separator... just sharing ideas.
Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: PDFMaker 0.1 - help beta test!

Post by Misty »

I think reducing dependencies is always a good idea. -opaque and +opaque will fill the unwanted regions with pure white?

Edit: Also, out of curiosity, how do you deal with images where the separated image version is empty?