This cross-platform software automates common tasks in generating HTML content from PDF/DJVU documents: text/picture zone selection, OCR, text manipulation, and splitting the document into HTML articles.
Example of generated HTML content (click on the images to view them at original size): http://elboom.getforge.io/
Software (requires Java to run):
For PDF/DJVU manipulation and OCR it uses free tools: Ghostscript, DjVuLibre, and Tesseract.
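For anyone curious how these three tools fit together, here is a minimal sketch of the kind of command lines involved: rasterize one page with Ghostscript or DjVuLibre's ddjvu, then OCR the image with Tesseract. The helper names, the 300 dpi default, and the output formats are my assumptions for illustration; the post does not say how BookReaper itself calls the tools.

```python
from __future__ import annotations

from pathlib import Path


def render_pdf_page_cmd(pdf: Path, page: int, out_png: Path, dpi: int = 300) -> list[str]:
    """Build a Ghostscript command that rasterizes one PDF page to a 24-bit PNG.

    On Windows the console executable is usually gswin64c rather than gs.
    """
    return [
        "gs", "-dNOPAUSE", "-dBATCH", "-dSAFER",
        "-sDEVICE=png16m", f"-r{dpi}",
        f"-dFirstPage={page}", f"-dLastPage={page}",
        f"-sOutputFile={out_png}", str(pdf),
    ]


def render_djvu_page_cmd(djvu: Path, page: int, out_tiff: Path) -> list[str]:
    """Build a ddjvu (DjVuLibre) command that decodes one DjVu page to TIFF."""
    return ["ddjvu", "-format=tiff", f"-page={page}", str(djvu), str(out_tiff)]


def ocr_cmd(image: Path, out_base: Path, lang: str = "eng") -> list[str]:
    """Build a Tesseract command; Tesseract writes its text to out_base + '.txt'."""
    return ["tesseract", str(image), str(out_base), "-l", lang]
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed and on the PATH.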
The user can manually select rectangular or polygonal zones (hold the Ctrl key and press and drag the left mouse button for a rectangular zone, or use single clicks for a polygonal one), define the associated attributes (type, title, tags, label), and OCR and edit the zone's text. A Lens tool is available for the page image: hold Shift while moving the mouse.
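To make the zone idea concrete, here is a rough sketch of how a zone with these attributes could be modelled, with a rectangle stored as a four-point polygon so that both zone shapes share one hit test. The class, field defaults, and function names are hypothetical and are not taken from BookReaper's code.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    """A page zone: polygon vertices in page-image pixels plus its attributes."""
    points: list                      # [(x, y), ...] in drawing order
    type: str = "text"                # assumed values, e.g. "text" or "picture"
    title: str = ""
    tags: list = field(default_factory=list)
    label: str = ""
    text: str = ""                    # OCR result, editable afterwards

    def contains(self, x, y):
        """Even-odd (ray casting) test: is the point (x, y) inside the polygon?"""
        inside = False
        n = len(self.points)
        for i in range(n):
            x1, y1 = self.points[i]
            x2, y2 = self.points[(i + 1) % n]
            # Only edges that straddle the horizontal line through y can be crossed.
            if (y1 > y) != (y2 > y):
                x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                if x < x_cross:
                    inside = not inside
        return inside


def rect_zone(x0, y0, x1, y1, **attrs):
    """Build a rectangular zone from the two corners of a mouse drag."""
    left, right = sorted((x0, x1))
    top, bottom = sorted((y0, y1))
    corners = [(left, top), (right, top), (right, bottom), (left, bottom)]
    return Zone(points=corners, **attrs)
```

Normalizing the drag corners means the zone comes out the same whichever direction the mouse was dragged.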
Every zone has a label attribute, which defines the associated HTML article, and a title (the HTML or image title).
BookReaper - HTML generation from PDF/DJVU
Moderator: peterZ
-
- Posts: 12
- Joined: 18 May 2016, 08:24
- E-book readers owned: 2 Android tablets
- Number of books owned: 500
- Country: Mongolia
Re: BookReaper - HTML generation from PDF/DJVU
Another example of generated content: http://elboom.getforge.io/avr_stab_001.html
Re: BookReaper - HTML generation from PDF/DJVU
Update: added many hotkeys, zone rotation, and editor font size adjustment.
Re: BookReaper - HTML generation from PDF/DJVU
Added spellcheck (the language is selected by right-click) and export to FB2.
HTML and FB2 templates are in the "template" subdirectory.
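As an illustration of how label-based fragmentation and templates can work together, here is a small sketch that groups zones by their label and fills one HTML template per article. The template text, the placeholder names, and the rule of using the first non-empty zone title as the article heading are all assumptions of mine; the actual files in the "template" subdirectory may look quite different.

```python
from collections import defaultdict
from html import escape
from string import Template

# Hypothetical template; the real BookReaper templates are not shown in this thread.
ARTICLE_TEMPLATE = Template(
    '<article id="$label">\n<h1>$title</h1>\n$body\n</article>'
)


def build_articles(zones):
    """Group zone dicts by 'label' and render one HTML fragment per label.

    Each zone is a dict with 'label', 'title', 'type' and 'text' keys.
    Returns a dict mapping label -> HTML string, in first-seen label order.
    """
    by_label = defaultdict(list)
    for zone in zones:
        by_label[zone["label"]].append(zone)

    articles = {}
    for label, group in by_label.items():
        # Assumed rule: the first non-empty zone title becomes the heading.
        title = next((z["title"] for z in group if z["title"]), label)
        body = "\n".join(
            f"<p>{escape(z['text'])}</p>" for z in group if z["type"] == "text"
        )
        articles[label] = ARTICLE_TEMPLATE.substitute(
            label=escape(label, quote=True), title=escape(title), body=body
        )
    return articles
```

Escaping the OCR text before substitution keeps stray `<` or `&` characters from breaking the generated page.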