knappen wrote: Could someone please give an example of the command line I should write to simply convert a folder of Scan Tailor converted files with text & images into a compressed PDF file?
I just ran "pdfbeads * > abc.pdf" (without the quotes) inside the directory with the TIF files, and that's it...
PDFBeads — Convert Scanned Images to a Single PDF File
Moderator: peterZ
Re: PDFBeads — Convert Scanned Images to a Single PDF File
Thanks!
I ran the command loyukfai gave and ended up with a PDF file that was in fact a lot smaller than the result I got using Acrobat Pro on the same folder.
BUT: The images also look A LOT more compressed and blurred. Not sure if this is because the batch of TIFF images was a mix of greyscale/mixed/b&w encodings from Scan Tailor.
I still get the "Warning: the hpricot extension is not available. I'll not be able to create hidden text layer from hOCR files" message at the start and during the encoding process, and messages like
"TIFFetchCirectory; TIFFstream: Can not read Tiff directory.
TIFFReadDirectory: TIFFStream: Failed to read directory at offset 0.
Error in findTiffCompression: tif not opened"
show up a lot.
Re: PDFBeads — Convert Scanned Images to a Single PDF File
The hpricot thing is related to OCR; I think the OP has mentioned that.
In short, the scanned pages are just images, which cannot be searched. OCR recognizes the characters, and combining that recognized text with the page images is what makes the final PDF "searchable".
I'm primarily focused on making a portable version of PDFBeads right now and haven't yet looked into the OCR thing.
Cheers.
Re: PDFBeads — Convert Scanned Images to a Single PDF File
Thanks again.
I'm less concerned that I have to do the OCR scan with another program than with the fact that I get such low quality pictures in the PDF. Is there a way to manually choose the level of compression?
A portable version of PDFBeads sounds great!
Re: PDFBeads — Convert Scanned Images to a Single PDF File
There may be. One of the goals of PDFBeads is to produce smaller PDFs, so it might be defaulting to a lower image resolution.
You can see the options by typing pdfbeads --help
The -B option chooses the resolution for images. Try experimenting with various options to see what looks good to you. According to the documentation, the default value is 300 (is that right?) - try other values like -B 400 or -B 600
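To make that experimenting less tedious, here is a rough sketch of a loop that builds one PDF per -B value so you can compare them side by side. It only uses the -B option mentioned above; the out-B<dpi>.pdf file names and the function names are made up for this example, and it assumes pdfbeads is on your PATH and that you run it inside the folder with the TIFFs. If the sizes come out identical, it may be that intermediate files from the earlier run are being reused; I haven't verified that, so check pdfbeads --help.

```shell
#!/bin/sh
# Sketch only: one PDF per -B value so the results can be compared.
# Assumes pdfbeads is on the PATH and the current directory holds the TIFFs.
# The out-B<dpi>.pdf names are invented for this example.

pdf_name() {
    # Output file name for one -B value, e.g. 400 -> out-B400.pdf
    printf 'out-B%s.pdf\n' "$1"
}

try_resolutions() {
    for dpi in "$@"; do
        out=$(pdf_name "$dpi")
        if command -v pdfbeads >/dev/null 2>&1; then
            pdfbeads -B "$dpi" *.tif > "$out"
            ls -l "$out"    # show the file size for this run
        else
            # Dry run: just print the command that would have been used
            echo "pdfbeads not found; would run: pdfbeads -B $dpi *.tif > $out"
        fi
    done
}
```

Something like "try_resolutions 300 400 600" would then leave three PDFs in the folder to open and compare.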
knappen wrote: I still get the "Warning: the hpricot extension is not available. I'll not be able to create hidden text layer from hOCR files" message in the start and during the encoding process
This means you don't have the hpricot gem installed. You need that if you want OCR, otherwise it just means PDFBeads will bug you about it.
knappen wrote: "TIFFetchCirectory; TIFFstream: Can not read Tiff directory.
TIFFReadDirectory: TIFFStream: Failed to read directory at offset 0.
Error in findTiffCompression: tif not opened"
show up a lot.
Are you using the RubyPDF build of jbig2enc? That's a bug in it - it will work fine, it'll just give you error messages.
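On the hpricot warning quoted above: a quick way to see whether the gem is there is to ask RubyGems. This is only a sketch; the gem_status function name is made up, and it assumes the gem command that ships with Ruby is on your PATH (if it isn't, the gem is simply reported as missing). Whether "gem install hpricot" goes through cleanly on Windows is something I haven't checked.

```shell
#!/bin/sh
# Sketch only: report whether a Ruby gem is installed.
# Assumes the `gem` command from RubyGems is on the PATH; if it is not,
# the gem is reported as missing.

gem_status() {
    # `gem list -i PATTERN` exits 0 when a matching gem is installed
    if gem list -i "^$1\$" >/dev/null 2>&1; then
        echo "$1: installed"
    else
        echo "$1: missing (try: gem install $1)"
    fi
}

gem_status hpricot
```

If it reports the gem as missing and you don't need a hidden text layer, the warning can be left alone.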
The opinions expressed in this post are my own and do not necessarily represent those of the Canadian Museum for Human Rights.
Re: PDFBeads — Convert Scanned Images to a Single PDF File
I'm using the exact same software as suggested in the YouTube instruction video. I can't recall the jbig2enc being a RubyPDF build.
I tried -B 600 and the result is indeed something completely different. The size was tripled compared to the default value, but still a tad smaller than with Acrobat Pro. Will try some lower values too and see what I prefer.
I'll try to be a little more independent from now on, but there is one thing that I would really like to get your opinion on: What is the best choice in Scan Tailor to get quality results in PDFBeads for pages with both text and images? With the mixed mode you get crisp b/w text but an overexposed image; with greyscale you lose no image quality but get text that is impossible to convert to vector graphic OCR; the white margins + equalized illumination option is a compromise that is not 100% satisfactory...
Would PDFBeads be a program that offers a better solution to this?
Re: PDFBeads — Convert Scanned Images to a Single PDF File
I just checked, and it is RubyPDF's build that is recommended in that video. You can safely ignore the error messages; they might be a little irritating, but they're harmless.
-B 600 is probably overkill unless you're viewing your PDFs at a pretty high res. Experiment, I'm sure you'll find a value that looks good and still gets you a decent filesize.
PDFBeads isn't intended as a binarizer - it's intended to work on images processed via Scan Tailor. Are you finding that Scan Tailor's lightening of your images is overzealous? Can you post an example, and what you'd like it to look like?
Re: PDFBeads — Convert Scanned Images to a Single PDF File
I'm talking about effects that you get in mixed mode, like the one in the bottom right corner. I would obviously like it to be darker and not washed out.
I remember Tulon answering this in a Scan Tailor thread and saying that it would be difficult to solve the problem.
Re: PDFBeads — Convert Scanned Images to a Single PDF File
@loyukfai: Sorry, but I'm not planning to develop PDFMaker in its current incarnation. It was a pretty simple, slightly hacky script that I wrote before I knew any real programming. It was designed for Windows because I made it to scratch an itch at my Windows-centric employer; my current work environment is multi-platform, and I'm working in OS X. If I were to pick up PDFMaker again, it would be to rewrite it using a different programming language - and right now, that would probably be Ruby, which PDFBeads seems to have covered nicely.
One major thing I can see that could be improved in PDFBeads is installation on Windows. Coming from a Mac OS X environment, where Ruby is installed by default and dev tools are readily available, I wasn't aware how much harder it would be to get going on Windows. I'll see if I can help out getting a portable version of PDFBeads working.