Google is digitizing at least 1M books a year, reports the Economist in a must-read piece on the search giant, the snippet issue, and the future of the book. That works out to roughly 3,000 a day.

Now compare that statistic to the far-smaller output of the public domain community—for example, Distributed Proofreaders, whose total output over the years is probably fewer than 11,000 titles. It will be interesting to see how genuine public domain people can keep up in ways other than numbers, where the battle is already lost.

Surviving Google

Hope exists. Right now, organizations such as the Internet Archive and Project Gutenberg—the outlet for the hardworking volunteers at DP—enjoy a major advantage over Google. Their output is easier to read. Google’s love affair with PDF for downloadable files of public domain books remains intact for the moment. But undoubtedly this will change, and meanwhile it’s clear that Google is excited over the concept of networked books. One possibility is that noncommercial entities may want to team up to originate content—including localized varieties—not found on Google. The key is to get closer to users than the corporate giant can. Imagine requesting Google to digitize your favorite public domain book, when instead, if you’re involved with Gutenberg, you can simply go ahead and scan it in yourself. Or, via OurMedia, perhaps someday you’ll even be able to encourage authors to write your favorite books from scratch.

Image: CC-licensed from tenz1225. It shows the Book Search end of Google, not to be confused with the actual library-scanning activities.

(Via Alex at MobileRead.)

6 COMMENTS

  1. The Google books exercise is of enormous value for me but it is like searching a garbage can. The quality of Google books is appalling, the OCR is no better, and their policy of showing only snippets of books more than 100 years old that are no longer available in bookshops or libraries is very frustrating. What is more, at least 50 per cent of the snippets are not relevant to my search.
    As an individual I can do much better scans/photos with OCR and indexing. Thanks and shame for Google.

  2. I guess I’m missing something: why should I care if Google supplants PG and IA in putting public domain books online? They are still in the public domain, so nothing will prevent me from grabbing the book and making my own version or redistributing it in text form.

    Now obviously if Google tries to assert some sort of ownership of derivatives of its public domain scans, that’s a problem, but assuming nothing like that happens and they don’t care if I grab the public domain PDF and do my own OCR and convert it to the ebook format flavor of the week, what’s the problem?

  3. Thanks for your thoughts, Brian. Remember, Google is already watermarking public domain content with a corporate logo, and that just might be the opening for much more. I’m all in favor of all kinds of biz models, including Google’s. But I do worry about it preempting libraries and other institutions and Gutenberg-style groups in various respects. – David

  4. Google Doubter makes an interesting point re: the scan quality…putting aside the fact that the bloated file size is sometimes due to color reproductions of tan pages(!), some of the p.d. Google Books I tried reading were either missing pages or had them out of order. I still needed Project Gutenberg copies to be sure I got everything that was supposed to be there.

    I’m a big booster for digital facsimiles of classic editions, so in that respect I don’t mind PDF so much, but Google Books has a bit of a way to go in the quality control department before they’re a proper threat.

  5. I have been using Google Book search to conduct a simple fun research project that I hope to write about on this blog. It is a remarkably powerful and useful tool and Google in my opinion deserves praise for building it and letting people use it without charge. Government and non-profit groups have so far spectacularly failed in constructing a comprehensive e-book library. (Certainly the Internet Archive, Distributed Proofreaders, Gutenberg and other groups also deserve praise.)

    The Google tool has exasperating flaws as Google Doubter and Eric Wilson mention above. The database contains too many blurry unreadable scan images together with improperly cropped images and half-images. In addition there are sometimes fingers visible in scans and inaccurate optical character recognition (OCR) results. The snippets are too small and the graphical image companion for each snippet sometimes is cropped too tightly or shows the wrong text. (Google is under ferocious legal attack by publishers and author guilds and perhaps it has deliberately chosen a tiny snippet size.) The main problem hampering my research diversion is inaccurate/misleading publication dates.
