The Christian Science Monitor’s headline about Google says it all: "The field narrows for e-books. As Microsoft backs away from digitization, some worry that a single company could privatize world knowledge." Granted, there’s Amazon. But its library of the classics is tiny compared to Google’s, and it’s charging the public for e-book versions.

"If we assume that a healthy, diverse, and accessible body of information is essential to science, politics, creativity, literature, then we really have to step back and say, ‘Do we really want to put this one company in the position of being the filter for the world’s information?’" So worries Siva Vaidhyanathan, a media specialist and cultural historian at the University of Virginia. "I wouldn’t say Google is 100 percent of the digital book world," he says, "but it’s getting near 90 percent." See The Googlization of Everything, the site for his book in progress.

The financial angle: The same article quotes Brewster Kahle of the Internet Archive as saying that the Microsoft retreat shows the risk of depending on a for-profit company to pay for digitization. Exactly. We need a mix of approaches—philanthropic, for-profit and tax-funded.

The OCR angle: The Monitor quotes Lotfi Belkhir, CEO of Kirtas Technologies, a major digitizing company, as saying: "Google is doing a very, very poor job…. Their OCR is very inaccurate, the image quality is very poor. You find cutoff text…. You find dirty text. You find incomplete pages."

"He predicts that much of what Google has digitized so far will need to be rescanned someday to bring it up to acceptable quality," the Monitor reports.

Usual disclosure: I own a speck of Google stock for retirement purposes, and TeleRead carries Google-supplied ads. You’d never know it from the above, would you? I just call the shots as I see ’em, both pro-Google and anti.

Related: Getting the classics right: Is Google spending enough on OCR for 19th and early 20th century type?, a TeleBlog item.


4 COMMENTS

  1. Governments should be doing this. Especially nationalistic governments concerned with preserving their written history.

    After governments, all institutions of learning should be doing this, especially institutions of higher learning. I notice that a university guy complains about this. Well, why isn’t UVA doing something about it? Why are all those universities co-operating with Google rather than organizing their own collaborative effort?

    I think it’s just wrong for somebody in such a position to stand on the sidelines and say, ‘Let some private company take care of this,’ and then complain later on, ‘It’s a private company doing this!’

    Brewster Kahle, on the other hand, is a downright hero in ebook history. He has done plenty, and has a right to complain all he wants.

    As far as the quality of Google’s digitization, I have read that their methods were improved somewhere along the line. Rather than relying upon robots, they hired college students (in lieu of slaves) to turn the pages in between camera shots. I hope that’s right.

  2. The concerns towards the end of the CSM article are completely spurious. While I agree that we don’t want “one company in the position of being the filter for the world’s information”, regardless of how good that company is, that’s not what’s happening here. Google scanning books simply adds one more source for that information (two, if you count the copy they give the partner library) – it doesn’t prevent you from doing anything you already could do. More diversity would indeed be good, but some scanning is definitely better than none.

    Disclaimer: I work for Google (though not on anything book related). The above opinions are my own.

  3. Nick, it’s great you’re around to provide another perspective. Remember my own disclaimer: I’m a very, very small Google shareholder. All in the family, eh? I encourage you to speak up with your personal opinions as you did just then.

    My response would be that the libraries may well rely on Google exclusively as those paper copies decay. And how many libraries will arrange for digitization twice? There’s also the related issue of contractual limitations. From Wikipedia—not the ultimate authority but a handy reference in this case:

    “Google licensing of public domain works is also an area of concern [38], Google apparently is claiming a restrictive ‘No-Commercial use’ term in respect of the PDF electronic versions it provides, as well as using digital watermarking techniques with them. Some articles that are in the public domain, such as all works created by the U.S. Federal government, are still treated like other works under copyright, and therefore locked after 1922.[39].”

    That said, I love the idea of Google digitizing books if it can do it more efficiently than others and if it can improve accuracy. It’s just that I want to see more companies in the game. And I’d like Google and others to be doing the work as contractors for libraries, so the works can go online without restrictions. What’s more, if Google wants to do things independently, that’s fine, since I fear the power of D.C. and want private alternatives out there. But should the library world entrust so much of its fate to Google? I do indeed have problems with that. Google has stellar leadership, regardless of the criticisms I may make, but what happens if that changes?

    Firsthand, I can speak of future risks from corporate Big Brothers. Publishers Weekly wiped out tens of thousands of words I’d written in my PW E-Book Report blog, either for commercial reasons or because the editors felt uncomfortable with my enthusiasm for e-books or maybe because of corporate politics (PW also deleted the blogs of the former publisher and the woman who hired me).

    I’d also point out that under the control of GE and the like, TV networks are not nearly as adventurous as when they were more independent.

    See what I mean? Governments can censor, but so, in effect, can corporations, as I know firsthand. What happens if Google in the future quietly deletes old data in a way that favors corporate interests? Undoubtedly you would say, “Impossible. Bad for credibility.” Of course. But that didn’t stop PW from doing an Orwell act on my blog archives.

    Thanks, Nick, and keep commenting away (with both of us understanding you don’t speak officially for Google)!

    David
