A pair of interesting articles about Google Books came to my attention over the last day or so. First, in The Atlantic, Alexis Madrigal looks at how Google has been tweaking and updating its search algorithms to trawl the linkless world of text on paper, where searchers have radically different needs than those who search the web.
In the last couple of days, Google has rolled out a new tweak called “Rich Results,” which presents one extra-large search result if Google thinks that you’re searching for a specific book title.
Rich Results is the latest in a series of smaller front-end tweaks that have been matched by backend improvements. The book search algorithm now takes into account more than 100 “signals,” individual data categories that Google statistically integrates to rank your results. When you search for a book, Google Books doesn’t just look at word frequency or how closely your query matches the title of a book. It also weighs web search frequency, recent book sales, the number of libraries that hold the title, and how often an older book has been reprinted.
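To make the idea of “statistically integrating” signals concrete, here is a toy sketch of multi-signal ranking. The signal names, weights, and scoring formula below are invented for illustration; Google’s actual signals and how it combines them are not public.

```python
# Toy multi-signal ranking: each book gets a score that is a weighted sum
# of normalized signal values. Everything here is illustrative, not
# Google's real algorithm.

SIGNAL_WEIGHTS = {
    "query_title_match": 0.4,    # how closely the query matches the title
    "web_search_frequency": 0.25,
    "recent_sales": 0.15,
    "library_holdings": 0.1,     # number of libraries holding the title
    "reprint_count": 0.1,        # how often an older book was reprinted
}

def score(signals):
    """Combine one book's per-signal values (each normalized to 0..1)
    into a single relevance score."""
    return sum(SIGNAL_WEIGHTS[name] * value for name, value in signals.items())

def rank(books):
    """Return book titles ordered best-first by combined score.
    `books` maps title -> dict of normalized signal values."""
    return sorted(books, key=lambda title: score(books[title]), reverse=True)
```

The point of the sketch is simply that no single signal decides the ordering: a close title match can be outranked by a book that scores well across sales, holdings, and web popularity combined.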
Google Books is becoming a better and better tool for finding the knowledge you want within the books that are available. Leaving the copyright controversy aside, there has never been a better way to search for material inside books; the old card catalog system (or even a digital card catalog) just pales by comparison.
And perhaps we should leave the copyright controversy aside. At least, that’s the perspective offered by the other article that caught my eye. In The Guardian, Robert McCrum looks at the question of creating national digital libraries. He points to a recent presentation (reprinted in The New York Review of Books) by Robert Darnton, director of the Harvard University Library, which proposes the idea.
McCrum writes: “What’s remarkable about Darnton’s very short paper – a call to arms, really – is that by placing the ‘vexed question of copyright’ in a national perspective, and by putting the idea of ‘cultural commons’ to the service of the common good, Darnton debates an issue that usually generates heat, not light, in a way that sounds supremely rational. Neither Britain nor the US has plans for a national digital library, but Japan, France and the Netherlands all do and, as Darnton remarks, if they ‘can do it, why can’t the United States?’ I would add: why can’t Britain?”
Darnton’s reprinted speech is worth reading in its own right. He points out that freedom of access to information was an important principle to founding fathers Jefferson (who said “Knowledge is the common property of mankind,” and also made the oft-quoted analogy of knowledge to candles) and Adams.
And unlike in the 18th century, the Internet offers the potential of enormous freedom of access to information. So, Darnton suggests, we should take advantage of that freedom and create a national electronic library. For all the copyright controversy surrounding Google, it has at least shown what is possible. If a corporation can do it, why can’t the government, or other organizations that work toward public interests?
Darnton writes: “I propose that we dismiss the notion that a National Digital Library of America is far-fetched, and that we concentrate instead on what we can learn from others about issues such as: How can we deal with the problem of copyright and of orphan books, i.e., books whose copyright holders can’t be located? How can we cope with the complexities of metadata—that is, catalog-type information necessary to locate digital texts in the ever-changing environment of cyberspace? How can we find funding and develop a business plan that will resolve the long-term difficulties of collection management and preservation?”
McCrum suspects that publishers are still too much in shock from Google’s “audacious copyright snatch” to consider the idea, but points out that all that Google has really done is privatize what a national government or culture should be doing to begin with. It makes sense to me—after all, in the US we have a Library of Congress; why not an E-Library of Congress?