Alternative Software Workflow

jradi

Re: Alternative Software Workflow

Post by jradi »

I still got the error. Strangely, it doesn't happen to all photos; sometimes it affects only 2 or 3 in a batch of 600. The most annoying thing about my workflow is that I have to wait until ABBYY is actually OCR'ing (which can take quite a long time) before I can leave the process alone. Until then, there's a chance that several of the pages won't be accepted and I'll be forced to recrop the photos.

The strange thing is that once a jpg is corrupted, if that's what happens, then no amount of tweaking and resaving the image will recover it. The only solution is to go back to the original jpg and crop/save from there.
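One way to avoid waiting on ABBYY might be to screen the cropped JPGs before OCR starts. The sketch below is an assumption, not part of the workflow described here: it only checks that each file begins with the JPEG start-of-image marker and ends with the end-of-image marker. That catches truncated saves, but certainly not every kind of corruption. The function names `looks_complete` and `find_suspect_jpegs` are hypothetical.

```python
from pathlib import Path

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def looks_complete(data):
    """Heuristic: True if the bytes start with SOI and end with EOI."""
    return data.startswith(SOI) and data.endswith(EOI)


def find_suspect_jpegs(folder):
    """Return the names of JPG files in `folder` that fail the heuristic."""
    suspects = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in {".jpg", ".jpeg"}:
            continue  # ignore anything that is not a JPEG by extension
        if not looks_complete(path.read_bytes()):
            suspects.append(path.name)
    return suspects
```

Any file it flags would still have to be recropped from the original JPG, as described above; the check only moves the discovery earlier.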

Maybe PageBuilder will be better...
xylon
Posts: 27
Joined: 04 Mar 2014, 00:52

Re: Alternative Software Workflow

Post by xylon »

I was approaching the problem from the perspective that ABBYY was the problem, not JPEGcrops.
jakegaisser
Posts: 63
Joined: 04 Mar 2014, 00:52

Re: Alternative Software Workflow

Post by jakegaisser »

daniel_reetz wrote: Surya is employing your method now (I hear from him via email much more often than the forum).

I know you've got a method worked out and everything, but PageBuilder will do your first two steps -- crop and rotate. From those JPGs you could use ABBYY. All you need is PageBuilder 2 and the Matlab Component Runtime. Apologies if you've already tried it. Just be sure to check the JPG output radio button.
The second link is dead; here is another:

Matlab component Runtime installer: http://www.sccn.ucsd.edu/~arno/download ... taller.exe
daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: Alternative Software Workflow

Post by daniel_reetz »

PageBuilder is presently unmaintained; I wouldn't waste time on it.
jakegaisser
Posts: 63
Joined: 04 Mar 2014, 00:52

Re: Alternative Software Workflow

Post by jakegaisser »

What are you currently using then? I have the Left and Right folders full of pictures for a book I scanned, and I am not sure what to do from here.

I need to get my images rotated, cropped, and into PDF format with the least amount of effort possible. (It does not need to be OCR'd; I am fine with just having the images in a PDF.)

edit: Metamorphose is a very powerful file renamer... it is also open source, I have put in a request for a feature so that you could easily rename all the files in one swoop, instead of having to do the left pages, and then the right.
daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: Alternative Software Workflow

Post by daniel_reetz »

I am using Scan Tailor.
spamsickle
Posts: 596
Joined: 06 Jun 2009, 23:57

Re: Alternative Software Workflow

Post by spamsickle »

jakegaisser wrote: Metamorphose is a very powerful file renamer... it is also open source, I have put in a request for a feature so that you could easily rename all the files in one swoop, instead of having to do the left pages, and then the right.
If you have Perl installed on Windows, you can do what I do. I start each new book in its own directory, within which I create subdirectories L, R, and Both.

Then, I have a little script called DIYmerge.cmd:

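REM Number the left-hand pages 0, 2, 4, ... and move them to Both
cd L
perl f:\Scripts\Perl\DIYrename0.plx 0 > DIYrename.cmd
call DIYrename.cmd
move *.jpg ..\Both
REM Number the right-hand pages 1, 3, 5, ... and move them to Both
cd ..\R
perl f:\Scripts\Perl\DIYrename0.plx 1 > DIYrename.cmd
call DIYrename.cmd
move *.jpg ..\Both
REM Return to the book directory
cd ..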
The Perl routine itself is pretty simple too:

use strict;
use warnings;

# glob an array of all the JPG files, sorted by name
my @files = sort glob('*.jpg');

# get starting page number from command line (0 for L, 1 for R)
my $page = $ARGV[0] // 0;

# print 'ren "file.jpg" "NNNN.jpg"' for each file in array;
# the quotes keep ren working if a file name contains spaces
foreach my $file (@files) {
      printf "ren \"%s\" \"%04d.jpg\"\n", $file, $page;
      $page += 2;
}
The only complication arises when the camera's image counter wraps. On my Canon, each time I pass another 10,000-picture mark, it creates a new directory on the SD card and restarts the image numbers at 0000, so my L and R directories end up with 9999-names that should come before 0000-names. In that case, I just do my own rename before running DIYmerge:

ren 0* 5*
ren 9* 4*
DIYmerge
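
The manual rename for the counter wrap could also be expressed as a sort key. This is a minimal sketch, not part of DIYmerge: it assumes every file name contains the four-digit camera counter and that one folder never holds 5000 or more consecutive shots, so a spread of 5000 or more between the lowest and highest number is taken to mean the counter wrapped. `wrap_sort` is a hypothetical name.

```python
import re


def wrap_sort(names):
    """Sort camera file names so post-wrap numbers follow pre-wrap ones."""

    def counter(name):
        # Pull the first four-digit run out of the file name
        match = re.search(r"\d{4}", name)
        if match is None:
            raise ValueError(f"no four-digit counter in {name!r}")
        return int(match.group())

    numbers = [counter(name) for name in names]
    # Assumption: a spread of 5000 or more means the counter wrapped
    wrapped = bool(numbers) and max(numbers) - min(numbers) >= 5000

    def key(name):
        number = counter(name)
        if wrapped and number < 5000:
            number += 10000  # low numbers were shot after the wrap
        return number

    return sorted(names, key=key)
```

For example, `wrap_sort(["0000.jpg", "9999.jpg"])` puts `9999.jpg` first, which is the same ordering the `ren 0* 5*` and `ren 9* 4*` pair produces.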
jakegaisser
Posts: 63
Joined: 04 Mar 2014, 00:52

Re: Alternative Software Workflow

Post by jakegaisser »

I wrote a script to rename, merge, and rotate. It requires ImageMagick to be installed.

I have:
D:\Book\L
D:\Book\R

I place this Windows batch script into D:\Book

makebook.bat:

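@echo off
setlocal
set count=0
REM Walk the L and R folders; quote the path in case it contains spaces
FOR /r %%A IN (*.jpg) DO CALL :NUMBER "%%A"
goto :EOF

:NUMBER
REM The first folder gets the odd page numbers, the second the even ones
IF "%count%"=="0" (
	set count=1
	set odd=1
) ELSE (
	IF NOT "%~p1"=="%PREVDIR%" (
		IF "%odd%"=="1" (
			set count=2
			set odd=0
		) ELSE (
			set count=1
			set odd=1
		)
	)
)
REM Zero-pad the page number to four digits
set NUM=000%count%
set NUM=%NUM:~-4%
REM ren cannot move a file to another folder, so move it next to this script
move "%~1" "%~dp0%NUM%.JPG"
REM ImageMagick's convert must come before Windows' own convert.exe on the PATH
IF "%odd%"=="1" (
convert "%~dp0%NUM%.JPG" -rotate 90 "%~dp0%NUM%.JPG"
) ELSE (
convert "%~dp0%NUM%.JPG" -rotate 270 "%~dp0%NUM%.JPG"
)
set /a count+=2
set PREVDIR=%~p1
goto :EOF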
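
For anyone who would rather not depend on batch quirks, the numbering logic could be sketched in Python. This is an illustration under assumptions, not a port of the script: it only plans the renames and rotations, and leaves the actual file moves and the ImageMagick call to the reader. It assumes the left folder holds the odd pages (rotated 90) and the right folder the even pages (rotated 270), mirroring the script above. `plan_book` is a hypothetical name.

```python
def plan_book(left_names, right_names):
    """Return (source, new_name, rotation) tuples in reading order."""
    plan = []
    sides = (
        (1, 90, "L", left_names),    # odd pages: 1, 3, 5, ...
        (2, 270, "R", right_names),  # even pages: 2, 4, 6, ...
    )
    for start, angle, folder, names in sides:
        for index, name in enumerate(sorted(names)):
            page = start + 2 * index
            plan.append((page, f"{folder}/{name}", f"{page:04d}.JPG", angle))
    plan.sort()  # order by page number so L and R interleave
    return [(source, target, angle) for _, source, target, angle in plan]
```

Printing the plan before touching any files makes it easy to check that the two folders interleave the way you expect.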
jakegaisser
Posts: 63
Joined: 04 Mar 2014, 00:52

Re: Alternative Software Workflow

Post by jakegaisser »

I am now trying to see if I can automate cropping... so far it does not look like I can. Maybe I can find a way to apply one crop to all of the images in both the L and R folders before renaming and rotating.
spamsickle
Posts: 596
Joined: 06 Jun 2009, 23:57

Re: Alternative Software Workflow

Post by spamsickle »

I think your way of doing the rename/merge is better than mine, because it doesn't require Perl. Old habits...

I wouldn't bother doing the rotates in pre-processing though, if you're using Scan Tailor. Scan Tailor will rotate the images faster than ImageMagick, with less wear and tear on your hard drive. Before I became aware of Scan Tailor, I was doing exactly what you're trying to do, using ImageMagick to rotate and crop. The "cropping" was hit and miss, because there was a bit of jitter in the scanning process, so I'd typically have to include a bit of slop in the dimensions to make sure I wasn't cropping content. For a while, I was using JPEGcrops to do the cropping without the slop, but now I'm doing all of my content-selection tweaking in Scan Tailor. Cropping as a pre-processing step doesn't really guarantee that Scan Tailor's content selection won't still need to be adjusted, and I'd rather do something once than twice.
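
The "slop" idea can be made concrete as padding a crop box and clamping it to the image bounds. A minimal sketch with hypothetical names; the box layout (left, top, right, bottom, in pixels) is an assumption for illustration, and the only ImageMagick detail relied on is the `WxH+X+Y` geometry form that its `-crop` option accepts.

```python
def pad_crop_box(box, slop, image_width, image_height):
    """Grow a (left, top, right, bottom) box by `slop` pixels, clamped to the image."""
    left, top, right, bottom = box
    return (
        max(0, left - slop),
        max(0, top - slop),
        min(image_width, right + slop),
        min(image_height, bottom + slop),
    )


def to_geometry(box):
    """Format a (left, top, right, bottom) box as a WxH+X+Y geometry string."""
    left, top, right, bottom = box
    return f"{right - left}x{bottom - top}+{left}+{top}"
```

The larger the jitter between shots, the larger `slop` has to be, which is exactly the trade-off that makes a fixed pre-crop hit and miss.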

I am still using ImageMagick on the back end, to convert Scan Tailor's TIFFs to PDFs:

mogrify -format PDF *.TIFF

then pdftk to merge the individual pages into the final book:

pdftk 0*.pdf cat output finalbookname.pdf
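
The two back-end commands could be chained from a script. A minimal sketch, assuming `mogrify` and `pdftk` are on the PATH, that the page files are named so they sort in reading order, and that `mogrify` writes its output with a `.pdf` extension. It uses only the options shown above; `build_commands` is a hypothetical helper that returns the argument lists rather than running them.

```python
from pathlib import Path


def build_commands(tiff_names, output_pdf):
    """Return (convert_cmd, merge_cmd) argument lists for the two steps."""
    tiffs = sorted(tiff_names)
    # Assumed output names: same stem as each TIFF, with a .pdf extension
    pdfs = [str(Path(name).with_suffix(".pdf")) for name in tiffs]
    convert_cmd = ["mogrify", "-format", "pdf", *tiffs]
    merge_cmd = ["pdftk", *pdfs, "cat", "output", output_pdf]
    return convert_cmd, merge_cmd
```

Each list can be passed to `subprocess.run(cmd, check=True)` in turn, so the merge only runs if the conversion succeeded.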