Learning to Create Tiny DJVU files
Posted: 25 Apr 2012, 11:17
Hi all, I'm contemplating scanning all my physical books to djvu, and am practicing by converting some PDFs I have around (some of which were good scans, some not so good). I figure that will give me practice cleaning up images and creating the smallest high-quality representation possible. I'm only a few days in, but I've already made quite a few observations that I haven't seen spelled out anywhere.
I thought maybe other beginners might benefit, or maybe the advanced people here can tell me how wrong I am.
So, first topic.... I really like minidjvu over cjb2. I think the lossy output looks a little nicer, and I assume the shared dictionary across multiple pages is helping me size-wise. BUT, I had a hard time figuring out how to incorporate the mixed-mode (text-and-graphics) files into that mix. In other words, if one of the pages has a picture, I want to:
- send the bitonal part through minidjvu along with all the other pages at 600dpi
- send the colorful part through c44 at 120 or 300 dpi
- splice the colorful part into the correct page as the background
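A small sanity check on the resolutions in that list, as a minimal sketch: it assumes the background has to be subsampled by a whole number between 1 and 12 relative to the 600dpi bitonal layer. That limit is an assumption about the DjVu tools, not something confirmed in this post, and the function name `bg_ratio` is made up for illustration. The real check would be on pixel dimensions, so the dpi ratio is only a proxy.

```shell
#!/bin/sh
# Hypothetical helper: print the background subsampling ratio for a
# foreground/background dpi pair, or fail if the pair looks unusable.
# Assumption: the ratio should be a whole number between 1 and 12.
bg_ratio() {
    fg_dpi=$1
    bg_dpi=$2
    # Guard against a zero or negative background dpi.
    if [ "$bg_dpi" -le 0 ]; then
        echo "background dpi must be positive" >&2
        return 1
    fi
    # Reject pairs where the foreground dpi is not an exact multiple.
    if [ $((fg_dpi % bg_dpi)) -ne 0 ]; then
        echo "not an integer ratio: $fg_dpi/$bg_dpi" >&2
        return 1
    fi
    ratio=$((fg_dpi / bg_dpi))
    if [ "$ratio" -gt 12 ]; then
        echo "ratio $ratio is larger than 12" >&2
        return 1
    fi
    echo "$ratio"
}

bg_ratio 600 120   # prints 5
bg_ratio 600 300   # prints 2
```

Both of the background resolutions mentioned above (120 and 300 dpi) divide 600 evenly, so under this assumption either one should be fine.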
I also later realized that just having minidjvu create an indirect djvu file saves me the step of extracting the pages and dictionaries. That was a nice discovery!
So... here's basically what I do:
Code:
minidjvu -d 600 -i -r -l page*.tif indirect.djvu
# let's say page 3 has a diagram, which I've resampled to 120dpi and run through c44
# we need to get the BG44 chunk out of it
djvuextract page-003image.djvu BG44=page-003image.bg44
# now move the minidjvu output out of the way...
mv page-003.djvu page-003text.djvu
# ... and replace it with the combined text and graphics page:
djvumake page-003.djvu INFO=,,600 Sjbz=page-003text.djvu INCL=pages-001.iff BG44=page-003image.bg44
... and obviously when I'm done I convert it all to a bundled file with djvmcvt.
Does that seem like a sane way to proceed? (Don't worry, I have scripts that help out... I don't type all that stuff every time)
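For illustration, the per-page steps could be wrapped up along these lines. This is a hypothetical sketch, not the actual scripts mentioned: the names `run`, `splice_page`, `bundle`, `DRY_RUN` and `bundled.djvu` are invented here, and it assumes the file naming used in the commands above. By default it only prints the commands, so nothing is touched until `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical wrapper around the steps above. With DRY_RUN=1 (the default
# here) it only prints the commands, so they can be checked before running.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# splice_page PAGE DICT DPI
#   PAGE  page name without extension, e.g. page-003
#   DICT  shared dictionary written by minidjvu, e.g. pages-001.iff
#   DPI   resolution of the bitonal layer, e.g. 600
# Assumes PAGE.djvu (minidjvu output) and PAGEimage.djvu (c44 output) exist.
splice_page() {
    page=$1
    dict=$2
    dpi=$3
    # Pull the background chunk out of the c44 output.
    run djvuextract "${page}image.djvu" "BG44=${page}image.bg44"
    # Move the bitonal-only page aside, then rebuild it with the background.
    run mv "${page}.djvu" "${page}text.djvu"
    run djvumake "${page}.djvu" "INFO=,,${dpi}" "Sjbz=${page}text.djvu" \
        "INCL=${dict}" "BG44=${page}image.bg44"
}

# bundle INDEX OUTPUT: convert the indirect document to a bundled file.
bundle() {
    run djvmcvt -b "$1" "$2"
}

splice_page page-003 pages-001.iff 600
bundle indirect.djvu bundled.djvu
```

Keeping the dry run as the default is a deliberate choice in this sketch, since the `mv` step overwrites nothing but does rename the minidjvu output.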
Some other things that confuse(d) me: for the Sjbz argument to djvumake, you can just give it a djvu file and it pulls out the Sjbz chunk automatically (at least it seems to!). BUT you can't do the same for the BG44 argument, which is why the djvuextract step above is needed. That's too bad. It seems logical that it should accept djvu files for any arguments like that, and just extract the relevant chunk on the fly.
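Until djvumake does that itself, the "extract on the fly" behavior can be approximated in a wrapper. The sketch below is hypothetical: `is_djvu` and `bg44_arg` are invented names, and writing the chunk to `FILE.bg44` is an arbitrary choice. It relies on DjVu files starting with the bytes `AT&TFORM`, and on `head -c`, which is common but not strictly POSIX.

```shell
#!/bin/sh
# is_djvu FILE: succeed if FILE starts with the DjVu magic "AT&TFORM".
# NUL bytes are stripped so binary input does not upset the shell.
is_djvu() {
    [ "$(head -c 8 "$1" 2>/dev/null | tr -d '\000')" = "AT&TFORM" ]
}

# bg44_arg FILE: print a BG44=... argument suitable for djvumake.
# If FILE is a DjVu file, its BG44 chunk is first extracted to FILE.bg44
# (hypothetical naming); otherwise FILE is assumed to be a raw chunk already.
bg44_arg() {
    if is_djvu "$1"; then
        djvuextract "$1" "BG44=$1.bg44" >/dev/null || return 1
        echo "BG44=$1.bg44"
    else
        echo "BG44=$1"
    fi
}

# Usage sketch (not run here):
#   djvumake page-003.djvu INFO=,,600 Sjbz=page-003text.djvu \
#       INCL=pages-001.iff "$(bg44_arg page-003image.djvu)"
```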