November 13 update: Here.

Hundreds of my books from decades ago are stashed away in a storage locker elsewhere in my apartment complex. They might as well not exist for me. My wife and I just lack space. But what if I could scan a 300-page book in only five minutes for reading on cell phones and tablets? So I’ve just plunked down $199, plus $35 shipping, for a “smart scanner” said to allow the above. Imagine—a DIY digital library from my own paper books.

The $199 is the current price of the Czur scanner on Indiegogo, a big savings from the future price of $399 plus taxes and shipping. Yes, the Czur is a gamble. But if I’m wrong, I’ll err in plenty of company. Check out the TechCrunch write-up, for example, and watch the Czur video. Shipping is to start in January.

The Czur’s creators at CzurTek describe their baby as “the world’s first true smart scanner… Czur can scan books easily and connect to WiFi. Czur is faster than any scanner in the world, and also is a video projector.” A 32-bit MIPS CPU and fast software for scanning and correction allow you to do the job at a clip of a page a second or so, aided by a foot pedal included with the scanner. Yes, there’s supposed to be first-rate OCR. The Czur also stands out because of the WiFi capabilities you can use to create a book cloud for tablet, e-reader or cell phone, as well as for the visual presentation capabilities, complete with an HDMI port for direct connection to a projector.

Imagine the benefits not only for booklovers but also for special needs people such as those with dyslexia, who could feed the text into iPads and use such software as Voice Dream Reader. Even at the $400 expected retail price, a truly smart scanner could be a natural for schools and libraries and volunteers for groups such as Project Gutenberg and Distributed Proofreaders. I assume that PG, DP, the Digital Public Library of America and the Internet Archive know about the Czur, but just in case not, I’ll email all four organizations about it. Perhaps they and others could team up on mass purchases and arrange for discounts for volunteers. Needless to say, since the Czur will work with many kinds of documents, not just books, it could be of use in other areas ranging from medicine to law enforcement.

From $400, the price might eventually drop to a fraction of that, and large book publishers should take note. In an era when paper books will cost next to nothing to scan, just how much more piracy-proof will they be compared to e-books, with or without DRM? Will we see a technology war between content providers and scanner makers? Will the Big Five try to release books in scan-proof ink? And what do five-minute book scans on an affordable machine mean legally and legislatively?

Related: The DPLA’s guidance for librarians and others digitizing historical documents and the rest is here.

Update, Nov. 7: Welcome, Slashdotters! And in response to the skeptics, yes, we need to see the scanner in operation. But do watch the video, and you’ll see why this could be very special if the claims are true. Scanning speed is apparently several times faster than a somewhat similar $600 model from Fujitsu. And if you’re working through a lot of books, that could make a big difference.

Update, Nov. 8: Internet Archive founder Brewster Kahle gives thumbs-up to Czur scanner as an affordable tool for the masses.

34 COMMENTS

  1. Virtually all of my printed books are paperbacks. From the video it would appear that the Czur could scan these only if I were to cut off the spines of the books, so that the pages could lie flat under the scan head.

    Using the Czur to scan a paperback would be much better than cutting off the spine of a book and feeding the loose pages through a page scanner, but it is still going to be a pain to do it.

  2. @Anon: Thanks for sharing your thoughts. I very much encourage you and other community members to give me constructive feedback privately about the ads (as opposed to going off topic). Davidrothman@pobox.com. Telephone: 703-370-6540.

    Via email, for my enlightenment, please tell me why ads are bothering you. Are there specific ones you want blocked? We are not showing you dancing bears. That said, I’d rather not mess with AdSense as the main source of revenue. My goal is to focus elsewhere, and that means getting up the numbers so that, for example, we can rely in a major way on honestly labeled sponsored content that more or less looks like regular posts except for the disclaimers.

    Meanwhile, what to do about the little issue of paying writers who contribute every day? Would you like to work hour after hour for free? And please don’t say “exposure.” Would Chris or Paul’s landlord give them free rent for “exposure”? Yes, that’s a factor in why our regulars write for TeleRead, not to mention a chance to speak their minds, but it’s hardly enough of an incentive by itself.

    One thing I may experiment with is an arrangement to remove or reduce ads for people who donate to this site. What do you think? Again, please drop me an e-mail or call. I invite others to do the same.

    Thanks,
    David

  3. If you watch the video on the Indiegogo site, it explains that they correct for the curvature of the page, so it may work very well for paperbacks.

    My first thought is to scan my thousands of old photos. A traditional flatbed scanner takes forever and so-called photo scanners have poor quality. Even if it just does b&w, that will save me a ton of time.

  4. Scanner seems a little bit of a misnomer for this device. It’s really a camera with a 16 MP CMOS sensor, a built-in light, and a viewing screen mounted on a monopod with a base that looks like an H.G. Wells creation. It looks like it will image up to A3 size paper (textbooks?). Two points:
    –David, are you really going to digitize all those books in storage, and if you do, are you any more likely to read them or even access them than you are now? And what’s the cost/benefit there? 🙂
    –I can’t wait to see entrepreneurs posting Czur versions of books on Amazon under names like Amazing Publishing!
    –OK, one more point. You’re still going to have to proofread the vaguely implied “quality” OCR to get an accurate text. I guess you will be reading them.

  5. @cdiltz2525: If the hype pans out, this will not be an ordinary scanner system. We’re talking not only about the camera but also about high-speed OCRing and a speedy, comprehensive correction system. Will there still be errors? I assume so, just fewer. Can I live with them? Yes. This will be for my personal use. I won’t mess with every bleepin’ book, but there are enough about which I care, so that this will be worth the time. The added convenience will matter. As for sleazy folks posting “Czur versions of books” on Amazon or elsewhere, isn’t this more or less happening anyway? David

  6. @David I hope the OCR is good. Based on pre-processing, maybe it will be better. I’m skeptical but we shall see. Look forward to hearing your results with this device. Yes, the sleaze on Amazon and elsewhere is happening, you’re absolutely right. Lots of scanning going on. Will Czur increase the sleaze factor? Probably, unless Amazon (and others) try to become standard bearers, raise their standards for what they offer, and the information they supply about these offerings online, but that seems to be an incremental process likely woefully incomplete in the foreseeable future. 🙂 I still don’t get the personal book scanning angle but I understand. Space and books have always been a conundrum. I cull. It’s not a satisfying process.

  7. David, nothing to do with the scanner, but rather with the ads. I have turned off ad blocker for Teleread and also Nate’s Digital Reader site because I care about you guys, and find your writings valuable to me. There are not too many sites I can say that about.

  8. @cdiltz2525: I know you’re thinking the best for me in regard to the scanner, and I appreciate that. Stay tuned. If the scanner is a lemon, you can bet I’ll say so.

    Meanwhile, given my wife’s health issues and all the forms we must deal with, it’ll be nice to own a decent scanner and perhaps create a personal documents cloud—not just use it for books. My present scanner is a Canon I got on sale for maybe $60, and the software isn’t the best.

    Back to book-related uses. I suspect that some of our readers, especially older ones with many years of book purchases behind them and the desire to move to a smaller house or apartment, will appreciate our writings on this topic. What’s more, both older and younger readers may want to volunteer for worthy causes like Distributed Proofreaders or Project Gutenberg. And a good scanner could help.

    Thanks,
    David

  9. @Mary: I’m trying to encourage people to discuss housekeeping matters directly with me via email, but how ungrateful it would be of me not to thank you for your kind words about the ads. Let’s hope that Nate’s blog as well as ours will be among the white-listed. Please keep the feedback coming (via email)! Thanks again. David / davidrothman@pobox.com

  10. I prefer to remain anonymous and can therefore only comment via articles. If you wish, create an article about the issue and we could comment there.

    I object to any animated GIF/Flash/HTML5 ads. (I actually don’t like the ‘newest’ rotating title you have at the top of the screen, either, for the same reason.) I am using Chrome with Adblock Plus. I have the “Allow some non-intrusive advertising” checked and it will allow ads that meet those requirements.

    Plenty of sites I visit still show ads and that is fine. Put static image ads as horizontal top bars, vertical side bars, even embedded horizontal bars in the article, I don’t care. Animated ads screaming “LOOK AT ME” or popup window ads are unacceptable.

    I ran for years without ad blocking but finally had to install one because of the abuses of the ad industry. Instead of fixing the issue, sites have resorted to shaming/guilt displays like yours to ask me to allow all ads again.

    Sorry, the ad blocker will stay on. I have allowed acceptable ads to be shown, fix your ads and everyone will be happy. I like your site but may have to stop coming here since the guilt popup is just as bad as any ad popup.

    • Mr. Anonymous, I block ads regularly for 99% of the sites I browse, but I’ve set my ad-blocker to make an exception for TeleRead. Have you tried browsing TeleRead with your blocker set to whitelist TeleRead? Most ad-blockers that I know of make it easy to whitelist sites, often with a single button-push.

      While I’m not in charge of the advertising, to my knowledge TeleRead is very good about not permitting animations, pop-ups, or other obnoxious ads. I haven’t seen one animated ad on TeleRead for as long as I’ve been reading here. Indeed, David is usually quite good about taking down even offensive or questionable ads, such as ads for dating services or potential scams.

      I personally have a problem with permitting ad-blockers to show “acceptable ads,” because I don’t know that the criteria those blockers use for acceptability would match mine. Furthermore, I’m opposed to the idea of letting ad-blocker manufacturers take money from advertisers for the sake of permitting their ads through–especially since anyone who doesn’t pay doesn’t get through, even if their ads are completely acceptable by every other criterion.

      So, let the site operators themselves disable obnoxious ads–as TeleRead does–and I’ll choose to unblock them myself. “Acceptable ads” is unacceptable to me.

      Yes, we prompt people to try disabling their ad-blockers when they visit. So try it. Our ads will not give people an epileptic seizure. There’s no annoying monkey to punch anywhere in sight. And we don’t pull any tricks like disabling our content if the ads aren’t viewed. But without some kind of a reminder, most people won’t even bother to try disabling their ads–and given that ads are what provide TeleRead with one of its only sources of revenue, that’s how the writers (hopefully) get paid.

  11. @Anon: I myself hate animated ads. Point to something that lets me ban them from AdSense here—while still allowing static image-based ads—and I’ll eagerly oblige.

    More significantly, I don’t see any animated ads when I look at TeleRead. AdSense customizes what it serves up to individual readers. I’d like people to email me at davidrothman@pobox.com if they themselves are seeing animated ads. Supply advertisers’ URLs (using the right click method to pick them up—rather than following the link). I really really want animated ads off this site. I already have told AdSense to show only text in the top banner. If you’re seeing an animated ad there, I especially need to give AdSense a piece of my mind.

    As for the rotating titles of articles, it’s a matter of personal taste. I didn’t enable that feature originally because I worried it would rub people the wrong way, but then I noticed that on Nate’s site, it actually was useful in bringing to my attention things I would have missed.

    Now—as to how to communicate with me directly: Just set up an anonymous e-mail account. Besides, I am not now nor have I ever been a staffer of NASA, the CIA or any other intel agency.

    Anyway, hang around, knowing we share similar concerns. Within the limits of my schedule (priority #1 is my wife, with inoperable pancreatic cancer), I’ll be exploring alternatives such as specific arrangements for sponsored content. And again, more immediately, I want to kill off the animated AdSense ads if they are indeed coming through and if there’s a way.

    Thanks,
    David

  12. This scanner doesn’t seem to address the really difficult book-scanning issues. It merely seems to make the photographing part go fast.

    Anyone who does a lot of book scanning learns pretty quickly that the real problems are the OCR and text reflow with full sentence detection across line breaks and pages. Just claiming that the OCR is good doesn’t mean anything.

    They also seem to be dancing around the Mac-compatibility issue, saying that Macs can use the cloud. What does that even mean? Is *The Cloud* doing the OCR, formatting, etc.? What are the transfer issues involved? It’s certainly not going to scan, upload, process, and download everything at 1 sec/page. When people say the cloud is going to solve anyone’s issues, you should become very skeptical, very quickly.

    • @Jeff: The photographing part is said to go a lot faster. It could be several times the speeds of even some more expensive machines. That will matter with long books or many books. I suspect that OCRing will be speedy as well. As for text reflow and other important issues you so correctly mention—well, I’ll find out firsthand when I get my Czur. David

  13. Please do report on the speed and quality of real world book scanning. The document camera should be usable for other applications as well. For example, making still images of 3D objects and video of those objects in use. As others have already mentioned, the OCR would be critical but that relies on camera auto-focus, etc.
    This system seems to be Windows-specific so it would be good to know if an OS X version is in the offing or if other OCR software might be used. It would also be good to know something about the images that the OCR works on. DPI and format (TIFF?) would also predict disk space needed.
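
    As a rough upper bound on the disk space in question, the arithmetic can be sketched as below. This is only a back-of-the-envelope estimate under stated assumptions, none of them confirmed by CzurTek: the 4608×3456 sensor resolution quoted from the spec sheet elsewhere in this thread, 24-bit RGB, no compression, and one image per page. Compressed TIFF or JPG output would be much smaller. The function name is made up for illustration.

```python
# Rough upper bound on storage for uncompressed page images.
# Assumptions (not confirmed by CzurTek): 4608x3456 sensor output,
# 24-bit RGB, no compression, one image per page.

WIDTH_PX = 4608
HEIGHT_PX = 3456
BYTES_PER_PIXEL = 3  # 8 bits each for red, green and blue


def uncompressed_bytes(pages: int) -> int:
    """Return the total size in bytes of `pages` uncompressed RGB images."""
    return WIDTH_PX * HEIGHT_PX * BYTES_PER_PIXEL * pages


if __name__ == "__main__":
    per_page_mib = uncompressed_bytes(1) / 2**20
    book_gib = uncompressed_bytes(300) / 2**30
    print(f"per page: {per_page_mib:.1f} MiB; 300 pages: {book_gib:.1f} GiB")
```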

  14. So far nobody has mentioned one thing I noticed – when she scans, her left hand thumb is on the page, covering part of the margin, yet in the completed scans, it is not there, but the graphic in the margin (that was under her thumb!) is – that’s pretty magical! So color me extremely skeptical.

  15. Phone software currently focuses on cloud integration, so they can justify a small app fee. What’s really needed is a scanner stand for your phone integrated with something like this software – you could use a Bluetooth switch to activate each photo, already available tech. The stand and software would be what, $49?

  16. The scanner is 16 MPixels, but what does that mean on a standard book with two 8.5 x 11 pages? All it says is “Clear enough for daily use”. I have stuff that I want to archive in excellent quality. What would the resulting DPI be?

  17. It’s interesting enough that I will buy one myself and test it, but there is a fair amount of bullshitting and technobabbling going on in the presentation:

    “A 32-bit MIPS CPU and fast software for scanning and correction”

    They make it sound amazing, and it’s certainly better than some sort of Arduino-based solution, but “32-bit MIPS” says nothing about the power. The CPU of the PlayStation 1 was a 32-bit MIPS when it was released in 1994.

    If you trawl through some of the comments it’s obvious that their software solution is far from release ready, and we don’t know what feature set it will have.

    All of this is not that important though. It’s a camera with a stand that doesn’t look hideous. So if the software doesn’t materialize then I suppose we have to cobble together the software part ourselves.

  18. @Mark and @Peter and @Jeff and others: We won’t know if the scanner lives up to the hype until we try it, but forgetting technobabble, keep in mind the assertion that the speed per page is a second. I’m guessing that through multiple threading and/or buffering things could be kept moving along even if every page isn’t fully digested instantly. There is a mention of a cache arrangement. I also wonder about additional processing through the cloud. Still, it would be fascinating to know if the images are immediately available after you finish the photography, and if not, then how long will that 300-page book take. As for resolution, the spec sheet claims 4608X3456 for the CMOS itself. It’ll be interesting to see what the actual results are. Regarding image formats available, the Web site mentions JPG, PDF and TIFF (which isn’t to say that something else couldn’t be used internally to speed up processing of pages).
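
    For a rough sense of what that resolution means in DPI terms, the calculation can be sketched as below. It assumes the 4608×3456 spec-sheet figure and, optimistically, that the target fills the frame exactly with its long side along the sensor's long side. Real scans will have margins, so actual DPI would be lower. The function name and the example targets (a spread of two 8.5 x 11 pages, and an A3 sheet) are chosen for illustration only.

```python
def effective_dpi(width_px, height_px, width_in, height_in):
    """Best-case pixels per inch when the target exactly fills the frame.

    The long side of the sensor is matched to the long side of the target,
    and the smaller of the two ratios is returned as the limiting DPI.
    """
    long_px, short_px = max(width_px, height_px), min(width_px, height_px)
    long_in, short_in = max(width_in, height_in), min(width_in, height_in)
    return min(long_px / long_in, short_px / short_in)


if __name__ == "__main__":
    MM_PER_INCH = 25.4
    # Two 8.5 x 11 inch pages side by side make a 17 x 11 inch spread.
    spread = effective_dpi(4608, 3456, 17.0, 11.0)
    # A3 paper is 420 x 297 mm.
    a3 = effective_dpi(4608, 3456, 420 / MM_PER_INCH, 297 / MM_PER_INCH)
    print(f"letter spread: {spread:.0f} DPI, A3: {a3:.0f} DPI")
```

    Under these assumptions, both cases come out somewhat below the 300 DPI often used as a baseline for OCR and archiving.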

  19. I have yet to see an OCR that can accurately read and transcribe tables and columns of data, such as would be seen in counter-racks of old distribution-center cross-reference catalogs and price and parts catalogs. Getting those retyped in India for ten cents a page was the workaround… If you can’t get accurate OCR, it’s back to images and index/tagged metadata with data-dictionaries which is way more idiosyncratic than library science wants to allow.

    Is the OCR equal to or better than Google Books results? Column data-test, anyone?

    Imaging book pages works best from a customized cradle for the book in process (curves, skew, coloration, lighting), with a perfectly square reference image guide also containing opposed gray scales and focusing res-pairs or gridlines: you can get exposure and sizing / curvature / flatness reference corrections in one image, if, if, if you locate the reference square “optimally”. Maybe use several reference squares and tile your work if size is an issue? Remove the reference scale/square for an archival / publish image and then “auto-process” with data from the square-containing images. I can’t imagine a workflow that is totally automated and costing $199. I still use PS to manually adjust key shots as the “auto-everything” escapes my limited programming skills. I’ve worked with macro lenses and 35mm DSLR and polarizing filters (for lighting and with lens filters) for images on coated paper.

    Working with very old books is an experiment in building patience as your book may be the last surviving specimen. Look at HMML.org for inspiration. Think and work long term.
    – 30-

  20. @Steve: Perhaps some or all of the actual OCR processing is happening locally, but my hunch is that much or even all of it is farmed out to the cloud (remember, CzurTek talks about no need for local drivers). If so, the OCR could benefit from far more firepower than with simply local processing. And that could include the ability to handle column data and other complexities well. How many OCR systems in common use rely on cloud-based processing, not just cloud-based storage? Simply put, we need to think beyond existing paradigms. For all I know, CzurTek could have arranged for more cloud-based firepower for OCRing than Google might devote, assuming it’s using a cloud-based OCR approach at all.

    Maybe in a sense this is yet another cloud-related reversal or partial reversal of the distributed approach that characterized the personal computer revolution. And coming from of all countries China, which so often seeks to monitor and control information! One would hope that CzurTek will not provide the Chinese government with yet another monitoring tool, used in conjunction with the software to zero in on individuals and their data (in this case, both inside and outside China). I see no indication of that. It’s just something to wonder about. Hmm. Given NSA’s fondness for this kind of thing, maybe it can secretly subsidize a Czur competitor here in the States to facilitate snooping for the American intel community. Just joking. But these days you never know.

    Thanks,
    David

    Addendum: Potentially it isn’t just a brute-force issue. Perhaps the OCR algorithms are better.

  21. I am cautious about what this company may assume it can do with uploaded data. I recently used an online service to convert documents into online flip books. I uploaded original material that I own the copyright to. After (!!!) doing this, I received a notice that they had appropriated the material and rights to it, that it could not be erased, and that they would determine how and who could view it. I have no reason to believe that this company would do that, but it is something to have in mind.
    I love the idea and will get one if I can use it with my computer. I liked the appeal the CEO made to people to purchase as an investment in the development of the technology.

  22. Further to my comment above, I omitted the fact that the company that appropriated my innocently reformatted copyrighted material was based in China. That is not to say that Chinese companies are suspect, but that they are not necessarily operating under the same ethics and legal restrictions. The company that took advantage of me disclosed nothing up front and I was stupid enough to trust the service based on its web presentation.

  23. @Erica: Thank you so much for sharing your experiences. I myself broached the privacy issue earlier. How unfortunate that we may also have to worry about piracy or near piracy. Ideally the CzurTek people will share with us their TOS. If your case is well documented, I may want to publish it in TeleRead as a post rather than just your comment. As we become more and more cloud-oriented, we need to anticipate abuses of the kind you described. If you do want to write up the details in a more formal way and share with me the documentation, you can email me at davidrothman@pobox.com. I’d be very curious to know if the appropriated material is now on sale over in China. David

  24. It’s very odd to see an article promoting a product that the writer hasn’t even received yet.
    If they won’t send it to you to test, don’t promote it.

    And what is the OCR?
    Most hobbyist scanners use ABBYY. Not cheap but seems to be the best this side of Google Books.
    Are these guys licensing a tried and tested solution like that or do they really think they can cobble together something themselves as good?

    I’ve got a USB desktop scanner, Cost me $15 second hand, does 4800 dpi.

    Anyway, the labour intensive part of creating a decent ebook isn’t the physical scanning, it’s proofing and correcting. You can find all kinds of garbage ebooks online done automatically and uploaded with no corrections.

    • @Alan: Thanks, but our coverage of the Czur scanner uses words such as “gamble” and “if” (in regard to performance).

      Even the headline of the first post on the Czur used a question mark: Scan a book in five minutes? $199 ‘Smart scanner’ with foot pedal and WiFi support.

      Furthermore, please keep in mind the promised image distortion correction and other features which, if they live up to the ballyhoo, might increase OCR accuracy. I’m happy about your bargain purchase. But if the Czur scanner is as described, we’re talking about an entirely different kind of device.

      As for getting a product to test when one is available, I’ve paid for a Czur myself. I am gambling $234 of my own money with shipping included. Maybe I’ll be wrong. But if the Czur lives up to the hype, we’ve got a real game-changer here.

      Technology is forward-looking. I can’t imagine TeleRead not covering forthcoming products, just so they genuinely look promising and just so that we include caveats even then.

      In a nutshell, history can do only so much to educate TeleRead community members about product options—as opposed to looking ahead.
