If you think Web items are forever, just read Adrienne LaFrance’s Atlantic piece headlined Raiders of the Lost Web: If a Pulitzer-finalist 34-part series of investigative journalism can vanish from the Web, anything can. Even Apple these days can’t keep its App Store online as reliably as it ought to. Nate Hoffelder discusses this in an aptly indignant post today: iBooks Goes Down in Apple’s Fourth Outrage in Less Than a Week. I doubt that a single iBook title vanished for the rest of eternity. But Apple’s mishaps, plural, show how dodgy the technology can be at times, even decades after the Internet’s creation.
Now—how to make digital content dependably available beyond the short term? While Brewster Kahle of the Internet Archive has done some noteworthy work on the bits-and-bytes side, the issues are not just technical. They are also economic. Later on, TeleRead Editor Chris Meadows will offer his own take on the Atlantic piece. But meanwhile I hope the Atlantic’s reportage can serve as a reminder of the need not just for robust national digital library systems but also for a national digital library endowment to help finance them. See my related articles in The Chronicle of Philanthropy, Library Journal and Education Week as well as a guest contribution to Jim Fallows’ blog on the Atlantic site.
Over the years my focus has generally been on the public library and K-12 sides. But I have also called for stable links, especially with networked books and multimedia in mind. And what about preservation of, say, blogs? Not all are three-month wonders. The TeleRead site, in one form or another, has been around since the 1990s. WordPress didn’t even exist then. Thank goodness it didn’t, even though we’re using it now. Databases get corrupted. We’re backed up with VaultPress, but I’d still feel much better if the right software and storage were available on the library side for long-term preservation of serious blogs. So far, to my knowledge, the Digital Public Library of America just has not followed up on my suggestion to develop some good creation software for blogs and integrate it with the library’s archives.
In a related vein, a paragraph in the Atlantic piece leapt out at me: “Digital information itself has all kinds of advantages. It can be read by machines, sorted and analyzed in massive quantities, and disseminated instantaneously. ‘Except when it goes, it really goes,’ said Jason Scott, an archivist and historian for the Internet Archive. ‘It’s gone gone. A piece of paper can burn and you can still kind of get something from it. With a hard drive or a URL, when it’s gone, there is just zero recourse.’” As reported in the Atlantic, “Every once in a while, Scott will get word of a site founded around 1997 that’s about to go under. ‘These are really long-term tragedies,’ he said. ‘Simply because they’re almost all gone.’” Well, TeleRead isn’t. Our domain (originally ending with an .org, not a .com) was created in May 1997. But as I’ve written, “TeleRead existed [earlier] with updates on the old Clark.net without a separate domain.” I am endlessly grateful to the Archive for preserving this site even with some gaps.
That, however, is not a full solution. While TeleRead itself is full of old essays that could stand on their own, we’ve linked externally. And many of our link targets are gone.
Undeniably, the same problem will arise in the growing number of e-books, including titles from major publishers, that link to Web sites. Simply put, the issue of long-term preservation is not just one of scanning the contents of the Library of Congress and storing them in digital form, not if we care about link targets and about the Web as a whole rather than only the preservation of specific books or sites. Priorities must be set. Who and what survives the triage? Even if we can store every byte from every blog, it will still help to be able to highlight the more promising items. So we’ll need not just machines but also human curators or at least rather sophisticated bots. That will cost.
Granted, the Library of Congress has entered the born-digital realm, but such efforts are far from what they need to be. No one, for example, even cared to return my phone call a few months ago when I left a message saying I owned the world’s oldest site devoted to e-book news and views of general interest. Among other things, TeleRead has helped lead the push for e-book standards, not just fought for well-stocked national digital libraries. But even the standards battle still goes on, complicated by proprietary digital rights management; and meanwhile, as the Atlantic notes, ephemeral technologies such as Flash have turned all too many sites into quicksand.
Let’s hope the resources will appear to enable librarians and technologists at LoC and elsewhere to do a better job. While not forgetting mass-level needs, such as those of K-12 and public libraries, I’ll continue to push for a national digital library endowment as a facilitator of preservation among its other purposes.
Image credit: Mark Roy. CC-licensed