ePub's tall shortcoming: How annotation needs linking and why we don't have it

By Aaron S. Miller, CTO of BookGlutton, a Web-based community of readers

March 29, 2008


Moderator: Aaron Miller is CTO of BookGlutton.com, a Web-based community for e-book readers. He has 11 years of experience building Web sites for startups and established clients, including WellsFargo.com, Playstation.com, and Macys.com. Welcome to the ranks of TeleBlog contributors, Aaron, and keep the ePub criticism coming! Let’s hope that the IDPF will listen to all sides. Also see Tamas Simon’s essay. – D.R.

Links, bookmarks and annotations all depend on one important thing: the ability to uniquely identify a specific passage or point in a book. And it’s easy with paper. We put daggers and numbers where our notes belong. We highlight, clip, underline. Sometimes we just gesture at a page. But with a digital book, it’s not so easy. A digital book, materially, is something less, so we expect more. Go figure.

Humans need a computer to understand our paper-bound notions of footnotes and margin-notes so that a computer can do what computers are good at. Then we can share those notes, add our own, hide them, rearrange them, count them, abstract them into graphs, delete them. Moreover, we want pica-perfect pointers into texts, maybe even pixel-pointers, so that we have no doubts about where we left off, which syllable we’re analyzing, or where we want to jump next. To a computer, a book is a model, an abstraction of what it really is, and the more computers agree on that abstraction and how to interact with it, the better off we bookish humans will be. Too bad it’s easier said than done.

Key revelations

Smart folks of the digital book world have figured out some key things lately:

XML is a book’s best friend. It’s extensible, document-centric, thriving. It’s being used for .mobi, .lit, .epub and more generalized things like DocBook, ODT, and DOCX. It can be criticized for bloat, but it’s open, extensible and a kind parent to XHTML. It happens to be a better fit for books than plain text.
Books are going Web. They’ve been online for a while, in huge numbers, but until now, no one has taken the time or spent the money to care for them. By “care” I mean care in presentation, due diligence in cataloging, and measurement of the benefits and drawbacks of various technologies.
E-books will be cool. Right now, they’re not. At least not iTunes cool. Right now, they’re in the position the MP3 was in 1997. This was when audiophiles scoffed at the format as inferior. Half of them observed that CDs sounded better, and the other half said vinyl sounded best, and then proceeded to make fun of the ones who preferred CDs. Now, it seems, music fans realize that we can all co-exist, and that MP3s are cool in their own right. Book-loving groups aren’t so unified.

Whiffs of potential

Still, we can sense the potential. People are realizing there’s more possibility than the miles of typography-bereft scrolling and the various shopping-cart sites hawking trade at twice the price of paper. Amazon, a web company, is scrambling to figure out how to bridge worlds, extending the tradition of PHB (Proprietary Hardware for Books) while simultaneously trying to leverage their Web properties. Meanwhile publishers can be overheard babbling about widgets and blogs, and when they actually figure out what they’re saying, we’ll see an A-ha moment about DRM.

From a development angle, browser technology is quickly approaching a tipping point where typography and presentation will rival that of print and E Ink. Unlike E Ink, Web technologies are based on software, and this creates freedom and speed. And unlike print, which seems to get cheapened and not cheaper every day, they’ll allow more at a lower cost. Someday we’ll all use something like E Ink, but not many of us will ever use E Ink as it is now.

More people can be seen firing up their MacBooks in Panera and Starbucks to get their dose of blogs and news. Younger generations, as any newspaper publisher will tell you, no longer read any news on paper.

Take note

This is all positive news. But in all this activity, no one has paid much attention, or even lip service, to a fundamental technology here: annotation. Granted, it’s not for everyone. But it rests upon the ability to point to fragments of documents, even as those fragments change.

The Web can be seen as an example of the perfect space to solve this problem, or a sad example of how annotation has been ignored, depending on one’s camp. Those in the Berners-Lee camp, if there is such a place, would look to the Semantic Web for standards and solutions. But those who look to Ted Nelson will tell you we didn’t implement everything we needed when we invented the Web. Nelson’s original concept included annotations and unbreakable links as part of the fabric of hypermedia. Now, we’re stuck improvising these things on top of a core infrastructure that was never intended for them. And we’re faced with the perplexing question: What happens to metadata when a resource disappears—or worse, when it changes?
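To make the "what happens when it changes?" question concrete, here is a minimal Python sketch of one way an annotation might survive an edit: store the annotated words plus a little surrounding text, then search for them again in the new version. This is an illustration under stated assumptions, not anything the ePub spec defines. The names `Anchor`, `make_anchor` and `reattach`, and the 12-character context window, are all made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Anchor:
    """Hypothetical annotation anchor: the passage plus nearby context."""
    prefix: str  # text just before the passage
    exact: str   # the annotated passage itself
    suffix: str  # text just after the passage


def make_anchor(text: str, start: int, end: int, context: int = 12) -> Anchor:
    """Capture text[start:end] along with up to `context` characters on each side."""
    return Anchor(
        prefix=text[max(0, start - context):start],
        exact=text[start:end],
        suffix=text[end:end + context],
    )


def reattach(text: str, anchor: Anchor) -> Optional[int]:
    """Return the passage's start offset in `text`, or None if it is orphaned.

    When the passage occurs more than once, prefer the occurrence whose
    surrounding text matches the stored prefix and suffix.
    """
    best, best_score = None, -1
    pos = text.find(anchor.exact)
    while pos != -1:
        end = pos + len(anchor.exact)
        before = text[max(0, pos - len(anchor.prefix)):pos]
        after = text[end:end + len(anchor.suffix)]
        score = int(before == anchor.prefix) + int(after == anchor.suffix)
        if score > best_score:
            best, best_score = pos, score
        pos = text.find(anchor.exact, pos + 1)
    return best


v1 = "It was the best of times, it was the worst of times."
start = v1.rindex("times")                   # annotate the second "times"
anchor = make_anchor(v1, start, start + len("times"))

v2 = "Preface. " + v1                        # the resource changed: offsets shifted
print(reattach(v2, anchor))                  # offset of the second "times" in v2
print(reattach("It was the best of days.", anchor))  # None: annotation is orphaned
```

A raw character offset would have pointed at the wrong place in `v2`. The stored context finds the right occurrence, and the second call shows the case no scheme can rescue: the text is simply gone.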

The TeleRead challenge

If you made it this far with all the links on this blog tempting you, you’re up for the challenge. We need to address some important questions in this space. How far do we really have to go? Are we making progress, or are we the Streaming Web Video of 1999? Back then, we had a multitude of proprietary formats, a lack of unity and direction and a lack of focus on what exactly the goals were. Sound familiar? Feel bad?

Before I get slammed for making such a sad comparison, consider that while it’s excellent that publishers, book producers, digital warehousers, developers, retailers, etc. are adding .epub to their lists of formats, that’s also exactly what they’re doing: adding to a list of formats.

The IDPF is a start, and they appear to move faster than the W3C, which is good. But keep in mind that plenty of standards for this have been proposed before. And while the IDPF is mindful of these and takes care to adopt from and interoperate with them, there is still a huge danger of falling short of dominance, which would make .epub something like, um, the .OGG of the digital book landscape. Ouch.

Two big problems, and you can read a more technically bent exploration of them on the BookGlutton blog, are:

The container format, a zip file. Great for portability, bad for browsers. You can’t easily link to anything in a zip file. But without a container, you can’t share and transfer the book structure. There’s a dilemma. It’s actually kind of a catch-22, because the way around it is to just share links. Which brings us to the second big problem:
Wiffly-waffly linking language in the spec. All this biz about UUIDs and fragment identifiers. Come on, now! What about XLink, XPointer, XPath, node collections, ranges, selections, CSS selectors, and all the myriad ways we could specify linking to something more granular than a unique ID?
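Both problems can be seen in a few lines of Python. The sketch below builds a tiny zip container in memory, then resolves a pointer made of three parts: a file inside the zip, a path to an element (using the limited XPath subset in Python's standard `xml.etree.ElementTree`), and a character range within that element. The file name `OEBPS/chapter1.xhtml`, the sample text and the `resolve` function are assumptions for illustration only. The pointer format is not taken from the ePub spec.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Sample chapter; only the first paragraph carries an id.
XHTML = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<p id="p1">Call me Ishmael.</p>'
    '<p>Some years ago, never mind how long precisely.</p>'
    '</body></html>'
)
NS = {"x": "http://www.w3.org/1999/xhtml"}


def build_container() -> bytes:
    """Create a minimal zip holding one XHTML file (not a complete ePub)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("OEBPS/chapter1.xhtml", XHTML)
    return buf.getvalue()


def resolve(container: bytes, member: str, path: str, start: int, end: int) -> str:
    """Open the zip, parse `member`, find the element at `path`, slice its text."""
    with zipfile.ZipFile(io.BytesIO(container)) as zf:
        root = ET.fromstring(zf.read(member))
    node = root.find(path, NS)
    if node is None:
        raise LookupError(f"no element at {path!r} in {member!r}")
    return "".join(node.itertext())[start:end]


book = build_container()

# The second paragraph has no id, so an id-only fragment cannot reach it.
# A positional path plus a character range can.
print(resolve(book, "OEBPS/chapter1.xhtml", "x:body/x:p[2]", 5, 14))  # years ago
```

Note that the whole container has to be in hand before the pointer means anything. A browser following a plain link has no way to reach inside the zip, and a bare ID could never have addressed two words in a paragraph that has no ID.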

I’m sure these issues have been raised in the IDPF, and will be discussed extensively by smarter people than myself. I’m also aware that discussing these things may reveal other flaws in the spec, and that many people may be afraid of that. So be it—we need to resolve all the flaws fairly quickly, in sync with standardization, not afterwards. Be brave, because, let’s face it—if the publishing world can’t settle down with a nice MP3-ish boy like the audio universe did, there’s no hope! Pointing, links, annotations—whatever we call it, we need decisions if we’re seriously proposing a standard.

4 COMMENTS

Tamas Simon March 30, 2008 at 8:17 pm

what about a URI scheme for books
something like isbn://ISBN_NUMBER_HERE/blahblah/blahblah

…or am I reinventing something that was already said by the specs?

re:MP3-ish format
Look what it did to the record industry…
Why would publishers be interested in digging their own grave?

Aaron S. Miller, CTO of BookGlutton, a Web-based community of readers March 31, 2008 at 2:35 pm

No URI scheme is mentioned either. See blog.bookglutton.com for a discussion of this. An isbn: scheme hadn’t occurred to me, because I presumed that a lot of books would not carry one (Creative Commons, Public Domain, Personal Copyright or Self-Published, etc.) Also, from what I understand, there’s been some disagreement on whether it would be better to have an open, filesystem-based scheme or a managed, identifier-based scheme (again, see BG blog for clarification of this distinction). If people like the idea of an open scheme, then maybe there’s no need for a scheme other than http:, otherwise, there is.
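For readers who want to see the two options side by side, here is a small Python sketch that splits a managed, identifier-based link and an open, http-based link into their parts using the standard `urllib.parse` module. The `isbn://` form follows the suggestion in the comment above and is hypothetical. The ISBN digits, the `example.org` address and the `describe` helper are placeholders invented for this example.

```python
from urllib.parse import urlsplit


def describe(uri: str) -> dict:
    """Split a link into the parts a reading system would need to interpret."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,  # an identifier, or a host on the Web
        "path": [seg for seg in parts.path.split("/") if seg],
        "fragment": parts.fragment,
    }


# Managed, identifier-based: something must map the ISBN to an actual file.
managed = describe("isbn://9780000000002/chapter1/para12")

# Open, filesystem-based: the link already says where the file lives.
open_style = describe("http://example.org/books/moby-dick/chapter1.xhtml#para12")

print(managed)
print(open_style)
```

The parsing is the easy part in both cases. The difference is who resolves the authority: a registry of identifiers in the first case, ordinary Web servers in the second.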

Aaron S. Miller, CTO of BookGlutton, a Web-based community of readers April 2, 2008 at 5:10 am

Tamas, Re: Look what it did to the record industry…

The myth of ruin is perpetuated all over the place, most recently in The Times (UK), where the bell tolls for the book industry:

Tracy Chevalier, author of Girl with a Pearl Earring (they made a movie out of that, must be good):
“For a while it will be great for readers because they will pay less and less but in the long run it’s going to ruin the information. People will stop writing. There’s a lot of ‘wait and see what the technology brings’ but the trouble is if you wait and see too long then it’s gone. That’s what happened to the music industry.”

[from: http://entertainment.timesonline.co.uk/tol/arts_and_entertainment/books/article3648813.ece]

The idea that open standards will bring about ruin is a myth. MP3 did not bring down the record industry; the record industry adapted. Writers will not stop writing, no matter what happens. Maybe writers who only do it for a profit will stop writing, but only because others who do it for free do it better. The same can be said for the record industry. Musicians will continue to distribute songs, with or without the labels. From an artist’s perspective, what’s the difference between CDbaby and Warner? One takes a fragment, one takes most of it.

Biman Jyoti Uzir July 22, 2010 at 1:48 am

For XPointers to work for annotations, a standard hierarchy of contents must first be specified. Otherwise, whenever a newer version of the same .epub changes the anchoring text, whether in major or minor ways, all annotations attached to the previous version would be orphaned.

And again, there must be some standard way of representing all such version-related changes for an .epub publication.
