Scan Tailor

Scan Tailor specific announcements, releases, workflows, tips, etc. NO FEATURE REQUESTS IN THIS FORUM, please.

Moderator: peterZ

daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: Scan Tailor

Post by daniel_reetz »

Welcome, Tulon! Thanks for being here.
We should probably start translating Scan Tailor's documentation from Russian into English. I guess I am the only person capable of doing this?
I know enough Russian to be dangerous -- I can and will help toward the end of this month.

If you have the time, would you consider writing a little something on the Russian book scanning community? I have learned a lot of Russian from reading scanned Russian books, have scanned many Russian-language books, and have enjoyed many, many DjVu files from book scanning blogs (though it seems like the links always die too soon). I also enjoyed having ebooks (scanned and otherwise) supplied by my ISP while I lived in Obninsk. Maxnet was my provider, and they had every O'Reilly book under the sun, as well as dozens of others.

I think a lot of people around here would be very interested to know more about the Russian scanning/sharing efforts, legal and otherwise. Again, thanks for coming by and I hope we can work together!
Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK

Re: Scan Tailor

Post by Tulon »

Well, the book scanning community in Russia is quite prolific. I would say no fewer than 10 scanned books get published daily, although some of them are beautification efforts on scans released earlier. The largest and oldest legal site is http://www.lib.ru. It has existed since 1994 and hasn't changed its design since then :)
Then we have less legal sites that don't try to hide themselves, though. I am not sure if I should link to them from here. I heard that even linking to a site that links to copyright-infringing materials is a crime these days. Anyway, those sites contain almost exclusively titles in Russian. Now you are probably wondering, doesn't the Russian government try to shut down these sites? I would say no. If they wanted to, they would do it. Book publishers sometimes try to shut them down through legal action, or even by just buying them. I think they've given up on trying to shut them down, though. Maybe they've just run out of money :) I mean, book publishing is not as lucrative as music publishing.
Then we have sites that do try to hide their existence. I don't have access to most of them, and they don't appreciate revealing them anyway.

I am actually keeping my distance from the Russian book scanning community. In particular, I never publish anything, and don't even give out any copyrighted stuff privately. That's because I am too easy a target for legal action. People in Russia don't generally need to worry about it. Piracy is so widespread there that even ISPs typically run warez servers for their customers. I can't speak about Russia, since I moved away about 18 years ago, but that's definitely the case in Lithuania, where I lived until recently.
In fact, there was an event recently that may have been an entrapment attempt targeted at me personally. It failed, and after that, a smear campaign was launched against me. Well, "campaign" is probably too strong a term in this case. It's just a single individual who keeps posting slanderous comments about me and about Scan Tailor. Because daniel_reetz speaks Russian, he can read the whole story here. As a result of this event, I am afraid the Russian book scanning community will become suspicious and hostile towards foreigners.
Scan Tailor experimental doesn't output 96 DPI images. It's just what your software shows when DPI information is missing. Usually what you get is input DPI times the resolution enhancement factor.
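The arithmetic in that last line can be sketched as follows. This is only an illustration: the function name and the example values (300 DPI input, factor of 2) are assumptions, not Scan Tailor's actual code or settings.

```python
def output_dpi(input_dpi, enhancement_factor):
    """Usual output DPI: input DPI times the resolution enhancement factor."""
    return input_dpi * enhancement_factor


# Hypothetical example: a 300 DPI input with an enhancement factor of 2.
print(output_dpi(300, 2))  # prints 600
```

A viewer that reports 96 DPI for such a file is most likely showing its own default because the DPI metadata is missing, not a value computed this way.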
daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: Scan Tailor

Post by daniel_reetz »

Tulon wrote:Piracy is so widespread there, that even ISPs typically run warez servers for their customers. I can't speak about Russia, since I've moved from there like 18 years ago, but that's definitely the case in Lithuania, where I lived until recently.

(SNIP)

In fact, there was an event recently that may have been an entrapment attempt targeted at me personally. It failed, and after that, a smear campaign was launched against me. Well, "campaign" is probably too strong a term in this case. It's just a single individual who keeps posting slanderous comments about me and about Scan Tailor. Because daniel_reetz speaks Russian, he can read the whole story here. As a result of this event, I am afraid the Russian book scanning community will become suspicious and hostile towards foreigners.
When I lived in Obninsk, my friends actually helped me choose my ISP by the quality and quantity of their warez. That was in 2006, and I know it is still going on, though much of the city has been converted to ADSL at this point.

That was a nasty thread; I hope we never have such problems over here. The touched-up machine translation made it really hard to read. I agree with you -- if he's going to come after you, he should at least use English, so most everyone can understand the slander. ;)

Generally, I agree with you that some Russian communities can be very suspicious toward foreigners; that has certainly been my experience. I think the shining example of the total opposite is the CHDK project. They are really commendable and have been extremely open, patient, inclusive, and productive.
rob
Posts: 773
Joined: 03 Jun 2009, 13:50
E-book readers owned: iRex iLiad, Kindle 2
Number of books owned: 4000
Country: United States
Location: Maryland, United States

Re: Scan Tailor

Post by rob »

Hi Tulon,

I'm very, very impressed with Scan Tailor. I built it on an Intel Mac (native Qt 4 from DarwinPorts) and it works perfectly. The GUI is really good, very responsive, and the processing doesn't even take very long.

But anyway, as mentioned earlier, the only thing missing is dekeystoning and/or compensating for lens distortion. With scans from a flatbed scanner, of course, that probably wouldn't be an issue.

I would suggest only two improvements. First, in the thumbnail view, it would be nice if images starting from the top were always fully displayed. The problem I've been having is that I like to use the thumbwheel on the mouse to rapidly scan up and down the thumbnail images to check on content boxes. At a high speed, the images seem to march, so my eye is constantly attempting to track.

The second improvement would be support for multiple cores/multiple processors. Pretty much all machines now have at least two cores. On my Intel Core Duo Mac, I can see that of the two cores, only one is used at any time.

Anyway, I'm currently playing around with algorithms that will compensate for keystoning, distortion, and skew all at once, and hopefully compensate for illumination, but it relies on a calibration image that has to be inserted in the book every N pages (N probably 100 or so). See this thread for the idea. If you have any ideas, we would all be very happy!
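As an illustration of the keystone part only, here is a minimal sketch, assuming four reference points from a calibration image have already been located in the photo. It fits a perspective transform (homography) that maps those four points onto an upright rectangle. It does not model lens distortion or illumination, all function names and coordinates are hypothetical, and it is not the algorithm being developed in this thread.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest pivot to keep the elimination stable.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_homography(src, dst):
    """Return 8 coefficients mapping 4 src points onto 4 dst points."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (a*x + b*y + c) / (g*x + h*y + 1), rearranged to be linear.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        # v = (d*x + e*y + f) / (g*x + h*y + 1), rearranged to be linear.
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    return solve_linear(rows, rhs)


def apply_homography(coeffs, point):
    """Map one (x, y) point through the fitted transform."""
    a, b, c, d, e, f, g, h = coeffs
    x, y = point
    w = g * x + h * y + 1.0
    return ((a * x + b * y + c) / w, (d * x + e * y + f) / w)


# Hypothetical corner positions of a calibration target in a photo (pixels),
# and the upright rectangle they should map to.
photo_corners = [(120, 80), (880, 100), (950, 1300), (60, 1280)]
page_corners = [(0, 0), (800, 0), (800, 1200), (0, 1200)]
coeffs = fit_homography(photo_corners, page_corners)
```

With the coefficients in hand, every pixel of the corrected page can be looked up in the photo by fitting the transform in the opposite direction (page to photo) and sampling there.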

Thanks again for ST!

--Rob
The Singularity is Near. ~ http://halfbakedmaker.org ~ Follow me as I build the world's first all-mechanical steam-powered computer.
Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK

Re: Scan Tailor

Post by Tulon »

daniel_reetz wrote: That was a nasty thread, I hope we never have such problems over here. The touched-up machine translation made it really hard to read. I agree with you -- if he's going to come after you, at least use English, so most everyone can understand the slander.
Have you actually read beyond the first page? If so, you missed the point completely. What this person wrote to the forum is not really relevant, except that it seems to indicate he is the same person who contacted me privately and asked for things like the raw scans I use to test Scan Tailor, or for pointers to specific books I may have downloaded from the internet, stuff like that. Having received none of that, he proceeded to post slanderous comments about me on sourceforge.net. It's still ongoing, in fact. I delete them regularly. They are in English, of course. There were quite a few other suspicious things about him, which are documented in that thread. I didn't actually reveal all the details at once. Instead, I gave out information in small portions, allowing the forum audience to come to the same conclusions I did. That took 3 pages, and it worked. I don't really have the time or desire to repeat the whole process here. Besides, that person started that process himself, when he came to the forum with a slightly different story, but asking for the same things he had asked me about. I don't expect him to come here.
daniel_reetz wrote: Generally, I agree with you that some Russian communities can be very suspicious toward foreigners, it has certainly been my experience. I think the shining example of the total opposite is the CHDK project. They are really commendable and have been extremely open, patient, inclusive, and productive.
Well, hostility towards foreigners from Russians in general has different roots. It all started when NATO bombed Serbia. Then it was Iraq, renditions, torture - that kind of stuff. Such behaviour from the US government was probably quietly welcomed by the Russian government. After all, now they could do nasty things, saying to critics: "You call that nasty? Look what Bush just did - that's nasty."
Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK

Re: Scan Tailor

Post by Tulon »

rob wrote: Anyway, I'm currently playing around with algorithms that will compensate for keystoning, distortion, and skew all at once, and hopefully compensate for illumination, but it relies on a calibration image that has to be inserted in the book every N pages (N probably 100 or so). See this thread for the idea. If you have any ideas, we would all be very happy!
Well, users from the Russian Scan Tailor forums pointed me to this paper:
http://pubs.iupr.org/DATA/2009-IUPR-21Aug_1705.pdf
It describes exactly the thing you want, without any calibration images or stuff like that. It doesn't contain any details, though; it just references other relevant papers. If you are good at math (I am certainly not), we could try to implement that. For now, I am busy with other stuff, though. I need to finish manual picture zones and make despeckling less aggressive and adjustable.
rob wrote: The second improvement would be for multiple core/multiple processors. Pretty much all machines now have at least two cores. On my Intel Core Duo Mac, I can see that out of the two processors, only one is used at any time.
That's not really relevant any more. The next big thing is GPU processing (CUDA or OpenCL). That stuff accelerates things by orders of magnitude.
rob
Posts: 773
Joined: 03 Jun 2009, 13:50
E-book readers owned: iRex iLiad, Kindle 2
Number of books owned: 4000
Country: United States
Location: Maryland, United States

Re: Scan Tailor

Post by rob »

Hmm, that method seems to rely on the presence of text lines. I'm always worried that whatever algorithm is used will not take into account pages with only pictures or graphs.

I like the GPGPU idea -- I didn't know it had gone mainstream; I thought it was limited to only certain video boards.

--Rob
Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK

Re: Scan Tailor

Post by Tulon »

rob wrote: Hmm, that method seems to rely on the presence of text lines. I'm always worried that whatever algorithm is used will not take into account pages with only pictures or graphs.
For such cases we would just create a manual mode. I have heard (and even seen some screenshots) that Photoshop has such a plugin. For manual correction of lens distortion, I mean.
rob wrote: I like the GPGPU idea -- I didn't know it had gone mainstream, I thought it was limited to only certain video boards.
It has - we have been using it at work for a couple of years already. Well, they use it, not me. I am mostly doing server stuff and visual editing tools. We use high-end consumer graphics cards from NVIDIA. For those curious, we develop a video post-processing solution targeted at embedded advertising, emphasizing unattended server-based processing.
daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: Scan Tailor

Post by daniel_reetz »

Tulon wrote:
daniel_reetz wrote: That was a nasty thread, I hope we never have such problems over here. The touched-up machine translation made it really hard to read. I agree with you -- if he's going to come after you, at least use English, so most everyone can understand the slander.
Have you actually read beyond the first page? If so, you missed the point completely.
I read pages 65-68, and understood the story perfectly well. I just didn't want to drag that discussion over here, so I commented on an aspect not related to you personally. You can see that reflected in the sentiment I expressed: I hope we don't have such problems over here.
Tulon wrote:I don't really have the time or desire to repeat the whole process here.
Yep, we agree.
rob
Posts: 773
Joined: 03 Jun 2009, 13:50
E-book readers owned: iRex iLiad, Kindle 2
Number of books owned: 4000
Country: United States
Location: Maryland, United States

Re: Scan Tailor

Post by rob »

Wowzers... Mac OS X 10.6 includes OpenCL. If Scan Tailor starts including GPGPU options, I may actually have a reason to move to 10.6!

--Rob