BookGapCheck: Quickly check if there are gaps or duplicates in a set of book page scan images

dtic
Posts: 464
Joined: 06 Mar 2010, 18:03

BookGapCheck: Quickly check if there are gaps or duplicates in a set of book page scan images

Post by dtic »

BookGapCheck: Quickly check if there are gaps or duplicates in a set of book page scan images

https://github.com/nod5/BookGapCheck
dpc
Posts: 379
Joined: 01 Apr 2011, 18:05
Number of books owned: 0
Location: Issaquah, WA

Re: BookGapCheck: Quickly check if there are gaps or duplicates in a set of book page scan images

Post by dpc »

Thanks for posting. It's an interesting tool and could be quite helpful. I first heard about this technique in the video describing the Google linear traveling book scanner in 2012 and thought it could be handy (Google open sources a DIY page scanner). The section of the original YouTube video describing this page number mosaic technique can be viewed here.

At one time I thought about doing something similar but using OCR and fully-automating the process. Before starting a scan you'd define a subrect of the page where the page numbers typically lie (as BookGapCheck does). After each page is scanned, the contents of that subrect (which hopefully include the page number) would be saved as a separate small jpg image and passed to a batch command that runs an OCR program and produces a txt file containing just the page number. Another program could read that txt file, compare its page number with previous results, and let the operator of the scanner know if they have missed or duplicated a page. It would also be possible to correlate an image file with the actual page number from the book, which could come in handy if you're searching a folder of several hundred image files for a particular page from the scanned book.
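For illustration, the comparison step at the end of that pipeline might look roughly like the Python sketch below. It is not part of BookGapCheck. It assumes the OCR step has already produced one integer per scanned image, in scan order and ascending; the function name `check_sequence` is made up for this example.

```python
def check_sequence(page_numbers):
    """Return (gaps, duplicates) for page numbers listed in scan order.

    Assumption: every entry is an int already read by some OCR step,
    and the pages were scanned in ascending order.
    """
    gaps = []        # page numbers that were expected but never seen
    duplicates = []  # page numbers that showed up more than once
    seen = set()
    previous = None
    for number in page_numbers:
        if number in seen:
            duplicates.append(number)
            continue
        seen.add(number)
        # Any jump larger than 1 means the pages in between are missing.
        if previous is not None and number > previous + 1:
            gaps.extend(range(previous + 1, number))
        previous = number
    return gaps, duplicates


if __name__ == "__main__":
    gaps, duplicates = check_sequence([1, 2, 3, 5, 5, 6, 9])
    print("missing:", gaps)
    print("duplicated:", duplicates)
```

Running the check after every new page, rather than once at the end, would let the operator rescan right away while the book is still on the scanner.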

As I was thinking of other clever ways to use this info I believe it was about that time that my lovely wife wanted to know when I was going to get around to painting the house, so I shelved that idea and haven't done anything with it since.
BillGill
Posts: 139
Joined: 18 Dec 2016, 17:13
E-book readers owned: Calibre, FBReader
Number of books owned: 7000
Country: USA

Re: BookGapCheck: Quickly check if there are gaps or duplicates in a set of book page scan images

Post by BillGill »

I'm not sure that using something like that would be worth the effort. Currently I check for gaps/duplicates by going to File Explorer and viewing each page to see if there are any skipped/duplicate pages. If I had to edit each image to get the page number it would take much longer, so there would be a large loss in efficiency.

If it could be automated that might make a difference, but I'm not sure how the page number detection would work.

Bill
dpc
Posts: 379
Joined: 01 Apr 2011, 18:05
Number of books owned: 0
Location: Issaquah, WA

Re: BookGapCheck: Quickly check if there are gaps or duplicates in a set of book page scan images

Post by dpc »

I think you only need to define the subrect of a page that contains the page number once. After you've done that the program assumes the page number is in the same location relative to the edge of the page on subsequent pages. At least I hope that's the case.

The program that I wrote that controls my DSLRs while scanning renames the image files to match the page number (based on an offset that I manually enter) so when I'm scanning I just need to occasionally compare the page number with the name of the image file being written to see if I'm still on track. Also makes it easier to locate the image file associated with a particular page number in a folder containing hundreds of images.
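A rough Python sketch of that kind of offset-based naming follows. It is not dpc's actual program; the function name, the `page_` prefix and the zero-padded format are assumptions chosen for the example. The offset is simply the printed page number of the first image.

```python
def page_filename(shot_index, offset, extension=".jpg"):
    """Build a file name from the 0-based shot index plus a manual offset.

    Assumption: one image per page, shot in page order, so the printed
    page number is always shot_index + offset.
    """
    page = shot_index + offset
    return f"page_{page:04d}{extension}"


if __name__ == "__main__":
    # The first image shows printed page 5, so the offset is 5.
    for index in range(3):
        print(page_filename(index, 5))
```

If the number printed on the page ever disagrees with the file name being written, a page was skipped or shot twice somewhere before that point.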
dtic
Posts: 464
Joined: 06 Mar 2010, 18:03

Re: BookGapCheck: Quickly check if there are gaps or duplicates in a set of book page scan images

Post by dtic »

dpc wrote: 12 Oct 2018, 13:43 Thanks for posting. It's an interesting tool and could be quite helpful. I first heard about this technique in the video describing the Google linear traveling book scanner in 2012 and thought it could be handy (Google open sources a DIY page scanner). The section of the original YouTube video describing this page number mosaic technique can be viewed here.
Thanks, I've seen that YouTube video but hadn't noticed (or forgot) that detail.
dpc wrote: 12 Oct 2018, 13:43 At one time I thought about doing something similar but using OCR and fully-automating the process.
I considered going that route but in the end liked the image grid/mosaic method better. Someone who wants OCR could pretty easily add code for a step that runs Tesseract on the BookGapCheck output image.
BillGill wrote: 13 Oct 2018, 09:29 If I had to edit each image to get the page number
You only open one single image and draw a rectangle around the page number. The program does the rest.
BillGill
Posts: 139
Joined: 18 Dec 2016, 17:13
E-book readers owned: Calibre, FBReader
Number of books owned: 7000
Country: USA

Re: BookGapCheck: Quickly check if there are gaps or duplicates in a set of book page scan images

Post by BillGill »

Is the location of that rectangle based on the image edges? The way I scan I wind up with the images moving around in the camera's field of view. Also I sometimes wind up with the images being in different orientations, at least as seen in Windows Explorer. With at least one of my scanners I wound up with the pages alternating which way was up.

Bill
dtic
Posts: 464
Joined: 06 Mar 2010, 18:03

Re: BookGapCheck: Quickly check if there are gaps or duplicates in a set of book page scan images

Post by dtic »

BillGill wrote: 15 Oct 2018, 09:24 Is the location of that rectangle based on the image edges?
Yes. Should work ok if the page numbers are even only roughly at the same x/y position in each photo. Probably easiest if you give it a try and see if it works well with the type of photos you have to work with.
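To make the geometry concrete, here is a small Python sketch of the two calculations involved: widening a rectangle measured from the image edges so it tolerates some drift, and placing the cropped tiles in a grid. This is not BookGapCheck's own code. The function names, the (left, top, right, bottom) box format and the margin value are assumptions, and the actual cropping and pasting would be done by whatever imaging library you prefer.

```python
def expand_box(box, margin, image_size):
    """Grow a (left, top, right, bottom) box by margin pixels on every
    side, clamped so it never extends past the image edges."""
    left, top, right, bottom = box
    width, height = image_size
    return (max(0, left - margin), max(0, top - margin),
            min(width, right + margin), min(height, bottom + margin))


def mosaic_layout(tile_count, tile_width, tile_height, columns):
    """Return (mosaic_size, positions), where positions[i] is the
    top-left corner at which tile i is pasted into the mosaic."""
    rows = -(-tile_count // columns)  # ceiling division
    positions = [((i % columns) * tile_width, (i // columns) * tile_height)
                 for i in range(tile_count)]
    return (columns * tile_width, rows * tile_height), positions


if __name__ == "__main__":
    # Same box for every photo, widened by 20 px to tolerate drift.
    box = expand_box((100, 50, 200, 90), 20, (1000, 800))
    size, positions = mosaic_layout(5, box[2] - box[0], box[3] - box[1], 3)
    print(box, size, positions)
```

A wider margin tolerates more movement between photos but makes each tile larger, so fewer page numbers fit on screen at once.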
dpc
Posts: 379
Joined: 01 Apr 2011, 18:05
Number of books owned: 0
Location: Issaquah, WA

Re: BookGapCheck: Quickly check if there are gaps or duplicates in a set of book page scan images

Post by dpc »

My mistake. I thought it based the subrect from the edge of the page and not the edge of the image. Still could be helpful for a number of scanner designs that constrain the page to the same area of the platen (i.e. camera frame) across the scan. It doesn't work for situations such as Bill's scanner, as well as pages that have the page number printed in the upper left corner on the left side pages, and the upper right corner on the right side pages. If you were to handle that left/right issue by allowing the user to specify two subrects (one for the left side pages, one for the right), that might cover Bill's case as well?
dtic
Posts: 464
Joined: 06 Mar 2010, 18:03

Re: BookGapCheck: Quickly check if there are gaps or duplicates in a set of book page scan images

Post by dtic »

dpc wrote: 15 Oct 2018, 16:14 It doesn't work for situations such as Bill's scanner
Not sure what the issue would be. If you draw the area a bit larger then a little movement in the position of the book relative to the camera is no problem. But if the book is moved around a lot between shots then yeah, this might not work. Or some previous step is needed that first crops all images down to only the book page.
dpc wrote: 15 Oct 2018, 16:14 as well as pages that have the page number printed in the upper left corner on the left side pages, and the upper right corner on the right side pages.
It captures the same area in every subsequent 10th image which means you get either only right corners or only left corners. If every 10th seems too sparse then the code could pretty easily be modified to do every 8th, 6th, 4th or 2nd image while still only drawing one rectangle.

I suppose if someone modified the code to get the same area from every image instead of every 10th image then there would be a left/right corner problem only solvable by drawing two rectangles.
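The parity argument can be checked with a few lines of Python. This is only a sketch, not BookGapCheck's code: it assumes the image files strictly alternate left page, right page in file order, with even 0-based indices being left-hand pages, and the function names are invented for the example.

```python
def sampled_indices(image_count, step, start=0):
    """0-based indices of the images sampled when taking every step-th file."""
    return list(range(start, image_count, step))


def sides(indices):
    """Which sides of the book the sampled images show.

    Assumption: even indices are left-hand pages, odd are right-hand.
    """
    return {"left" if i % 2 == 0 else "right" for i in indices}


def box_for(index, left_box, right_box):
    """Pick one of two user-drawn rectangles by page side, which is only
    needed when the sampling step is odd (for example every image)."""
    return left_box if index % 2 == 0 else right_box


if __name__ == "__main__":
    for step in (10, 8, 2, 1):
        print(step, sorted(sides(sampled_indices(100, step))))
```

Any even step keeps the sample on one side of the book, so a single rectangle is enough; a step of 1 (or any odd step) mixes both sides and would need the two-rectangle approach.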
reproman
Posts: 24
Joined: 03 Apr 2014, 21:20
E-book readers owned: 8 kindles
Number of books owned: 3000
Country: us

Re: BookGapCheck: Quickly check if there are gaps or duplicates in a set of book page scan images

Post by reproman »

A useful tool, especially when scanning large books of 500 pages or more.
Thanks,
J