James Grimmelmann: HathiTrust single-handedly sinks orphan works reform


In a series of blog posts yesterday whose tone can only be described as “gleeful,” the Authors Guild has been showing that specific books aren’t orphans. So far, they’ve found copyright owners or literary agents for J.R. Salamanca’s The Lost Country, Albert Bandura’s Adolescent Aggression, and James Gould Cozzens’s Confusion. They didn’t track down Walter Lippmann’s The Communist World and Ours, but it appears that someone else did. The legwork involved wasn’t particularly intensive: some Google searches, some queries of standard copyright-related databases, and some phone calls.

This would be a dog-bites-man story, except for the fact that all of these books were on HathiTrust’s list of orphan works candidates. Oops. All of these books had gone through HathiTrust’s workflow, which was supposed to carry out “due diligence” to determine whether these works were likely to be orphans.

Once is a mistake, twice is bad luck, and three times is a sign of a broken process. The Authors Guild’s experiment demonstrates that HathiTrust’s orphan-tagging workflow cannot be relied on to identify genuinely orphan works with sufficient confidence to be usable. Out of 166 books originally on the list, at least four have been identified as non-orphans. A false positive rate of at least 2.4% isn’t going to be acceptable.

The workflow itself isn’t described in much detail, despite HathiTrust’s promise to “post as much of the project’s internal documentation as appropriate on this page.” It calls for:

  • A check that the book is not available on Amazon or Bookfinder.
  • A check that the author isn’t on the “live list.”
  • “Look for copyright holder contact information.”
  • “Attempt email contact.”
  • “Attempt phone contact.”

Whatever those last three steps comprise, they aren’t working. Whatever databases HathiTrust is checking for contact information aren’t sufficient.

On Twitter, Justin Grimes referred to these findings as “The ‘one example’ rule for invalidating arguments.” It’s true that these are individual books, not necessarily representative of the broader corpus of books scanned by Google and held by HathiTrust libraries. But this was also a sample chosen by HathiTrust itself. This was the libraries’ chance to put their best foot forward, to show that their process could be trusted, to show that there are real orphans out there. The results were not reassuring.

Legally, there are reasons why these non-orphans may not matter much in this case. Paul Aiken, Executive Director of the Authors Guild, has said that the lawsuit is primarily about the large-scale digitization (millions of books), not the much smaller Orphan Works Project (hundreds). The Authors Guild may have a hard time making legal claims specifically about the Project, for procedural reasons I’ll get into in future posts. Still, these discoveries are, as Eric Hellman said in a comment, “Major egg on the elephant’s face!”

And, looking to the broader picture, these revelations will discredit other efforts to make genuine orphan works more accessible. No one will ever be able to make the orphan works argument again without opponents bringing up the HathiTrust orphans that weren’t. Copyright owners will always regard such efforts with suspicion, as a pretext just for distributing the books, copyright be damned. And the idea of a “diligent search” sounds a lot less reassuring now that HathiTrust’s initial searches have been shown to be ineffective in multiple cases. The title of this post may be an exaggeration, but not by much.

I hope to update this post to deal with any responses from HathiTrust and the libraries, and with further developments.

Reprinted under Creative Commons Attribution License from The Laboratorium

2 Comments on James Grimmelmann: HathiTrust single-handedly sinks orphan works reform

  1. All of this raises the very good question of who should be authoritative with respect to the currency of a specific copyright. Since copyright is a government grant, it seems reasonable that government should be authoritative. Yes, more bureaucracy but what alternatives do we have? One camp will elect to err on the liberal side and the other camp will elect to err on the conservative side. Who will be objective? Who will establish the objective criteria? How exhaustive must a search for the copyright holder be? What responsibility does the copyright holder (or representative) have?

  2. Of course, a simple solution to the “problem” of orphan works would be to limit copyright to a fixed term, of say 50 years, after the date of first publication.

    I have never understood why patents run for 28 years, but copyright lasts for the life of the author plus 50 or 70 years.

    So, if you discover the cure for cancer, the new drug is protected by patent law for 28 years. If you write a book about how you discovered the cure for cancer, it might well be protected by copyright for 120 years.

    I just don’t get it.
