BookReaper - HTML generation from PDF/DJVU

bookscanner
Posts: 12
Joined: 18 May 2016, 08:24
E-book readers owned: 2 Android tablets
Number of books owned: 500
Country: Mongolia

BookReaper - HTML generation from PDF/DJVU

Post by bookscanner »

This cross-platform tool automates common tasks in generating HTML content from PDF/DJVU documents: text/picture zone selection, OCR, text manipulation, and splitting the document into HTML articles:
breaper01.png
For PDF/DJVU manipulation and OCR it uses free tools: Ghostscript, DjVuLibre and Tesseract:
breaper02.png
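
To illustrate how these three tools can be chained, here is a minimal sketch that only builds the command lines for rendering one page to an image and running OCR on it. It is not taken from BookReaper; the function names, the page file naming and the 300 dpi default are assumptions made for the example. It also assumes the `gs`, `ddjvu` and `tesseract` executables are on the PATH (on Windows the Ghostscript console binary is usually `gswin64c`).

```python
from pathlib import Path


def ghostscript_cmd(pdf, page, out_png, dpi=300):
    """Render a single PDF page to a 24-bit PNG with Ghostscript."""
    return [
        "gs", "-dNOPAUSE", "-dBATCH", "-dSAFER",
        "-sDEVICE=png16m", f"-r{dpi}",
        f"-dFirstPage={page}", f"-dLastPage={page}",
        f"-sOutputFile={out_png}", str(pdf),
    ]


def ddjvu_cmd(djvu, page, out_tiff):
    """Render a single DJVU page to TIFF with ddjvu (part of DjVuLibre)."""
    return ["ddjvu", "-format=tiff", f"-page={page}", str(djvu), str(out_tiff)]


def tesseract_cmd(image, out_base, lang="eng"):
    """OCR an image; Tesseract appends '.txt' to out_base by itself."""
    return ["tesseract", str(image), str(out_base), "-l", lang]


def page_commands(document, page, workdir="."):
    """Return [render_command, ocr_command] for one page of a PDF or DJVU file."""
    doc = Path(document)
    work = Path(workdir)
    suffix = doc.suffix.lower()
    if suffix == ".pdf":
        image = work / f"page{page:04d}.png"   # hypothetical naming scheme
        render = ghostscript_cmd(doc, page, image)
    elif suffix in (".djvu", ".djv"):
        image = work / f"page{page:04d}.tif"
        render = ddjvu_cmd(doc, page, image)
    else:
        raise ValueError(f"unsupported document type: {doc.suffix}")
    return [render, tesseract_cmd(image, image.with_suffix(""))]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in order; for `page_commands("book.pdf", 1)` the recognized text would then end up in `page0001.txt`.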
You can manually select rectangular or polygonal zones (hold Ctrl and press and drag the left mouse button for a rectangular zone, or single-click for a polygonal one), define their attributes (type, title, tags, label), and OCR and edit their text (a Lens tool is available for the page image: hold Shift while moving the mouse):
breaper03.png
breaper04.png
Every zone has a label attribute, which defines the HTML article the zone belongs to, and a title (used as the HTML or image title).
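
The relation between zones, labels and articles can be sketched roughly as follows. This is only an illustration of the idea, not BookReaper's actual data model or output: the `Zone` class, the "text"/"picture" type values and the generated markup are all assumptions.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Zone:
    type: str                 # assumed values: "text" or "picture"
    label: str                # zones sharing a label form one HTML article
    title: str = ""           # heading for text zones, image title for pictures
    tags: list = field(default_factory=list)
    text: str = ""            # OCR result (text zones)
    image: str = ""           # cropped image file name (picture zones)


def group_by_label(zones):
    """Collect zones into articles, keeping first-seen label order."""
    articles = {}
    for zone in zones:
        articles.setdefault(zone.label, []).append(zone)
    return articles


def render_article(label, zones):
    """Produce a very simple HTML fragment for one article."""
    parts = [f'<article id="{escape(label)}">']
    for zone in zones:
        if zone.type == "picture":
            parts.append(f'<img src="{escape(zone.image)}" title="{escape(zone.title)}">')
        else:
            if zone.title:
                parts.append(f"<h2>{escape(zone.title)}</h2>")
            parts.append(f"<p>{escape(zone.text)}</p>")
    parts.append("</article>")
    return "\n".join(parts)
```

With this layout, zones picked from different pages end up in the same article simply by giving them the same label.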

Example of generated HTML content (click an image to view it at its original size): http://elboom.getforge.io/

Download (requires Java to run):
breaper_2016.05.18-20.24.zip
(7.48 MiB) Downloaded 355 times

Re: BookReaper - HTML generation from PDF/DJVU

Post by bookscanner »

Another example of generated content: http://elboom.getforge.io/avr_stab_001.html

Re: BookReaper - HTML generation from PDF/DJVU

Post by bookscanner »

Update: added many hotkeys, zone rotation, and editor font size adjustment:
breaper05.png
breaper_2016.05.20-21.10.zip
(7.5 MiB) Downloaded 361 times

Re: BookReaper - HTML generation from PDF/DJVU

Post by bookscanner »

Added spellcheck (the language is selected by right-click) and export to FB2.
The HTML and FB2 templates are in the "template" subdirectory.
breaper_2016.05.25-16.01.zip
(15.22 MiB) Downloaded 408 times