Over four years ago I published an eBookWeb article entitled “OEBPS: The Universal Consumer eBook Format?”

Unfortunately, due to eBookWeb going defunct (a casualty of the “E-book Dark Ages” that resulted after the dotcom collapse), that article has essentially disappeared from the Internet.

So I am reposting the eBookWeb article here, not only for preservation purposes, but because its themes are still very relevant today, as will be briefly explained in this foreword.

DeLorean jokes

When I wrote that article, e-books were considered a lot like the DeLorean automobile — weird and impractical — the butt of many jokes. The DeLorean even played a prominently silly role in the movie trilogy Back To The Future.

But times have changed! Just as Google News is full of articles about an entrepreneur reviving the gull-wing-doored, stainless steel automobile to an enthusiastic public, so too e-books are finally being noticed and bought by an enthusiastic public. E-book sales are growing at a fast rate.

My 2003 article had three closely related themes:

  1. If e-books are to succeed in the marketplace, they need to be as easy to use by the public as music CDs. We need a universal, open standard e-book format. The industry will not thrive and grow when there are more than a dozen incompatible, proprietary formats competing with each other for the public’s attention.

  2. The universal, open standard e-book format can’t be just anything, but needs to fulfill all the critical requirements of both publishers and end-users.

  3. A viable candidate for such a universal e-book standard is an OEBPS Publication distributed in a container of some sort, such as a zip file.

The first theme is obvious, at least to many of us who have been around the e-book industry since the 1990s (I started publishing e-books in 1993). This theme needs no further explanation.

The second theme is likewise obvious although rarely discussed: no standard can be forced on the marketplace, but must sufficiently meet the critical needs of the important players before it can be embraced. In the case of e-books, the most important players are the publishers and end-users.

As far as I know, my 2003 article is the first, and I think still the only, published discussion of the requirements for a universal e-book format. As such, it is still quite relevant today. Adobe’s Peter Sorotokin, in a recent blog article talking about Adobe Digital Editions, informally presents a few semi-general requirements, but that blog article is nowhere near as comprehensive as the eBookWeb article. (To be fair, it was not Peter’s intent to make his blog article into a comprehensive “requirements” discussion anyway.)

For the third theme (which I’ll let the reposted eBookWeb article explain in detail), I am happy to report that 2007 has proven the 2003 recommendation to be on target. The IDPF recently developed the open standard EPUB format which almost exactly follows my recommendation. EPUB is an OEBPS (now called OPS) Publication in a zip-based container.

Since I’ve been involved with the IDPF standards work since 1999, as well as the now-in-cold-storage OpenReader project which was catalyzed by the eBookWeb article, I have good reason to believe it was the eBookWeb article that set the wheels in motion leading to IDPF finally developing and releasing the EPUB format.

What now follows is the original 2003 eBookWeb article. In a few places I’ve added comments (in […]), and updated links where I could, but otherwise the article remains faithful to the original, warts and all.


OEBPS: The Universal Consumer eBook Format?

by Jon Noring

Originally published 20 May 2003 at eBookWeb


Introduction

Looking at the ebook landscape today, I am troubled by the large and growing number of essentially incompatible, proprietary consumer e-book formats and associated ebook reading applications and hardware. And I don’t believe I am alone here. Publishers, both large and small, are now overwhelmed by the need to supply their content to end-users in these formats, many of which do not integrate well into their publishing workflow. They probably say to themselves “Oh no, not another one,” every time a new ebook-capable reading device is marketed which supports a new (and usually proprietary) ebook format. When will it ever end?

Likewise, end-users are equally confused by the myriad formats, and chagrined by the incompatibility between them, making it more difficult to use multiple devices, OS, and reading software of their choice. End-users clearly do not wish to be tied to any one hardware or software platform for the e-books they purchase; they want their e-books to be optimally readable on the systems of their choice, now and into the future.

This brings up the obvious question: Is a single, universal consumer e-book format possible, one which meets nearly all the needs of both publishers and end-users?

This article presents a vision for such a universal consumer e-book format, outlines the important requirements, and demonstrates that, yes, there now exists just such a format meeting these requirements: the Open eBook Publication Structure (OEBPS). [Note that OEBPS 1.2 will very soon be officially replaced by OPS 2.0.]

Admittedly this article is long, somewhat technical and undoubtedly quite dry (I’ve done my best to keep tech-talk to a minimum), but the importance of this topic requires a level of analysis going beyond the usual level of a brief and flashy news article. In addition, this article is primarily directed towards those in the ebook industry interested in this topic: publishers, retailers, ebook hardware and software developers, librarians/archivists, accessibility advocates, and a few other important e-book industry stakeholder groups. Nevertheless, end-users — those who buy and read e-books — should find this article to be of interest, and hopefully understandable.

Seven Requirements for a Universal Consumer eBook Format

It is important to first present the necessary requirements the universal ebook format must fulfill. These requirements are derived from the general needs of publishers and end-users outlined above (and balanced where they may conflict), along with the known needs of other stakeholders in the e-book universe (e.g., archivists), and various other obvious needs. I do not claim this list to be complete or the final word, but, from my perspective as a long-time e-book industry participant, this list appears to be fairly comprehensive and adequate for the purpose of this analysis.

These requirements will only be summarized, since fully explaining and justifying any one of them would take a full article in itself. However, I believe both publishers and end-users will readily see the necessity and logic behind these requirements, especially when viewed as a coherent whole (and not focusing on any particular requirement, oblivious to the others).

  • Typographic Richness: The format must have adequate internal structural resolution and presentation richness to allow (as the presentation system is capable) very high typographic quality presentation, up to the level we have come to expect for paper books. This must include the capability to include technical typography and more complex vector graphics for specific needs. Of course, the ability to include various types of multimedia (images, video, and sound) is necessary.

  • Adaptability: The format must allow optimum visual presentation by any hardware the end-user may possess, from very small screens of limited resolution and typographic capability (such as PDAs), to large, very high resolution screens capable of high typographic quality presentation. In addition, the format must allow end-users some latitude of control over the presentation parameters for personal needs and reading preferences, such as font size and other typographic settings. Enlarging the font size is especially critical for those with limitations in visual acuity (an Accessibility requirement, see next.) A corollary of this requirement is that the format must be fully reflowable (essentially “retypesettable on the fly”) in response to differing presentation hardware and end-user settings.

  • Accessibility: The format must be capable of high-quality presentation of the content in non-visual ways, such as text-to-speech and tactile (Braille).

  • International: The format must be capable of representing any language and glyph set in use today. The format is not universal unless it is truly international!

  • XML Compatibility: Publishing tools and publishing workflows are rapidly and inexorably moving towards XML, and the universal ebook format must be compatible in some way with an XML-based publishing workflow.

  • DRM Capability: Although end-users prefer not to purchase ebooks protected with DRM (Digital Rights Management), publishers are certainly interested in the DRM capability of the universal ebook format. Thus, the universal ebook format must allow inclusion of DRM protection technologies as needed.

  • Truly Open Standard (TOS): The format itself must be a “truly open standard”. A “truly open standard” is defined here to mean:

    • Fully published,

    • No licensing encumbrances (freely usable by all),

    • All component standards utilized by the standard are likewise “truly open standards”, and

    • Developed and maintained by a non-profit, independent (of any one company), industrial/trade organization representing a full cross-section of the various (and oftentimes competing) stakeholder groups.

    This is an especially important requirement (for various reasons, some of which are not readily apparent), and a future article on the necessity of this requirement is being contemplated.

OEBPS Meets All the Requirements (and More!)

As will be shown below, an e-book format which embeds a native OEBPS Publication (soon to be defined) meets all of these requirements. In addition, the OEBPS Specification provides several other advantages and features which bring out the full potential and power of e-books. (Unfortunately, space does not permit me to describe most of these other benefits — maybe a topic for a follow-up article.)

Obviously, before showing how OEBPS fulfills these requirements, a short tutorial on the vital essence of OEBPS, relevant to this article, is necessary. Those interested in learning more about OEBPS are encouraged to first read the OEBPS Specification FAQ, and then those with a technical bent (it will help to have a basic understanding of XML, XHTML and CSS) may wish to study the current online version of the OEBPS 1.2 Specification.

The OEBPS Specification is maintained by the Open eBook Forum [now IDPF as previously noted], a non-profit and independent ebook standards and trade organization representing a large number of companies and organizations with quite diverse (and oftentimes competing) interests in the ebook universe. The current membership of OeBF [now called IDPF] is given here.

In my words, a general one-sentence summary description of OEBPS is:

“The OEBPS Specification specifies a coherent, ebook-optimized framework for organizing XML documents containing book content into a powerful ebook representation of the work.”

Although seemingly abstract and admittedly a mouthful of prose, this pithy summary carries with it the vital essence of the power of OEBPS. The word framework is especially important, because without an overarching framework it is not possible to adequately represent the richness and specific intricacies of book publications using a simple collection of independent hypertext-linked XML documents.

Three distinct quantities in the OEBPS universe must be defined (they are relevant to this article): OEBPS Publication, OEBPS Package, and OEBPS Document. An OEBPS Publication is the complete set of files comprising an ebook publication conforming to the OEBPS Specification. An OEBPS Publication must include one OEBPS Package document (which is an XML document, not part of the book content itself, describing the Publication’s organizational framework), and at least one OEBPS Document (which is an XML document containing part or all of the book’s actual content.) Other auxiliary files, such as images, style sheets, etc., may also be present in the OEBPS Publication.
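To make these three terms concrete, here is a minimal sketch in Python that reads a simplified Package document and separates the content Documents (in reading order) from the auxiliary files. The markup is an illustrative approximation only: namespaces and metadata are omitted, the file names are hypothetical, and the media types shown are assumptions that should be checked against the OEBPS Specification.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical Package document. Namespaces and metadata are
# left out, so this is NOT a complete or validated OEBPS Package.
PACKAGE_XML = """\
<package unique-identifier="bookid">
  <manifest>
    <item id="ch1" href="chapter1.html" media-type="text/x-oeb1-document"/>
    <item id="ch2" href="chapter2.html" media-type="text/x-oeb1-document"/>
    <item id="css" href="style.css" media-type="text/x-oeb1-css"/>
    <item id="cover" href="cover.png" media-type="image/png"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>
"""


def manifest_items(root):
    """Map each manifest item id to its file reference."""
    return {item.get("id"): item.get("href") for item in root.iter("item")}


def reading_order(package_xml):
    """Return the content Documents in the order the spine lists them."""
    root = ET.fromstring(package_xml)
    items = manifest_items(root)
    return [items[ref.get("idref")] for ref in root.iter("itemref")]


def auxiliary_files(package_xml):
    """Return manifest files that are not part of the reading order."""
    root = ET.fromstring(package_xml)
    in_spine = {ref.get("idref") for ref in root.iter("itemref")}
    return [href for item_id, href in manifest_items(root).items()
            if item_id not in in_spine]


if __name__ == "__main__":
    print("Documents:", reading_order(PACKAGE_XML))
    print("Auxiliary:", auxiliary_files(PACKAGE_XML))
```

The Package document is the framework, and everything else in the Publication is reached through it.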

Now some mistakenly believe that OEBPS Documents are simply HTML files. This is not wholly correct. If the OEBPS Documents are restricted solely to HTML markup they are more correctly described as XHTML documents (XHTML is W3C’s XML-conforming version of HTML, the latest version is XHTML 1.1.) But an OEBPS Document is not restricted to only XHTML — it may be “Extended” by using non-HTML elements and attributes (“tags”) for richer content markup, and does so in a way which today’s XML-standards, CSS-aware browsers (such as IE6, Opera 7, Mozilla 1.3, and Netscape 7) will understand. An OEBPS Document may contain islands of specialized markup such as MathML (for high-quality representation of mathematical expressions, important for ebooks) and SVG (for vector graphics.) Even with this extensibility feature, a big advantage of OEBPS is that one may build high-quality OEBPS Publications leveraging well-known HTML markup practice and tools. Interestingly, a web site (which uses vanilla XHTML 1.1 pages — ignore JavaScript, Flash, and the other “dancing bears” stuff) can trivially be converted into an OEBPS Publication, showing that the XHTML and OEBPS worlds are not far apart. Publishers who now author ebooks in XHTML will find it trivially easy to upgrade to OEBPS.

With this short tutorial out of the way, let’s now look at how a native OEBPS Publication (“native” refers to the OEBPS Publication being available to the end-user’s ebook presentation system in its native, unaltered state) meets the seven requirements of a universal consumer e-book format:

  • Typographic Richness: Since OEBPS intelligently organizes book publications, and supports a substantial subset of the CSS2 Specification, it is possible to author OEBPS Publications with a high degree of typographic richness. And by using (as needed) MathML and SVG, it is possible to represent quite complex typographic layouts yet still allow substantial Adaptability (Requirement #2).

    Interestingly, the desktop version of Microsoft’s first generation ebook presentation system, MS Reader, foreshadows how OEBPS rendering engines can use OEBPS’ richness for high-quality typographic presentation. The proprietary ebook format for MS Reader, LIT, is (under-the-hood) essentially an OEBPS Publication that is only minimally digested.

  • Adaptability: Since OEBPS Documents are XML documents and essentially renderable in today’s XML-standards web browsers (as previously noted), it is clear that native OEBPS Publications are fully adaptable and reflowable to a large range of presentation hardware, from small PDA-size screens to large, high resolution desktop and laptop screens.

    To further illustrate this, in today’s web browsers, users can alter several aspects of web page presentation (such as font size, font family, and window size; Opera 7 is especially flexible in this regard). Likewise, those familiar with Microsoft Reader know that when the font size is changed, the whole e-book is optimally retypeset “on the fly.” (Microsoft Reader also demonstrates the adaptability of OEBPS in that the identical LIT document is nicely readable on both the desktop and PocketPC versions of MS Reader.)

  • Accessibility: Since OEBPS places all ebook content into XML documents, OEBPS Publications are naturally highly accessible. In addition, OEBPS implements a few other features which further aid accessibility.

  • International: By default, since OEBPS requires all OEBPS Documents to be XML (the XML specification requires XML processors to process UTF-8 and UTF-16 encodings of the Unicode character coding system), the documents are capable of representing all the currently used international character sets. Additional CSS2 properties further enhance internationalization.

  • XML Compatibility: Obviously, OEBPS is compatible with XML-based publishing workflows since OEBPS itself specifies XML for all book content. Publishers can directly author OEBPS Publications and use them as the “source” format for both direct distribution and for repurposing, or they can use a commercial integrated XML-based publishing workflow product and output as OEBPS.

  • DRM Capability: It is DRM-capable (this will be further discussed in a following section.)

  • Truly Open Standard (TOS): Without going into the details (but it should be obvious), the OEBPS Specification is a truly open standard, meeting all four of the defined TOS requirements previously noted.
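The International point above rests on XML processors accepting both UTF-8 and UTF-16. As a small sketch (the sample text and helper names are made up for illustration), the Python below encodes the same one-paragraph document both ways and checks that a standard XML parser recovers identical text from each.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content mixing several scripts.
TEXT = "Ελληνικά, 日本語, Русский"


def encode_document(text, encoding):
    """Serialize a one-paragraph XML document in the given encoding."""
    xml = f'<?xml version="1.0" encoding="{encoding}"?><p>{text}</p>'
    # For "utf-16", Python prepends a byte order mark, which the XML
    # parser uses to detect the encoding.
    return xml.encode(encoding)


def parsed_text(document_bytes):
    """Parse the bytes as XML and return the paragraph text."""
    return ET.fromstring(document_bytes).text


if __name__ == "__main__":
    for encoding in ("utf-8", "utf-16"):
        data = encode_document(TEXT, encoding)
        print(encoding, len(data), "bytes ->", parsed_text(data))
```

The bytes on disk differ, but the content the reading system sees is the same.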

A Small Issue, But Not a Real Problem

The savvy reader, familiar with ebook formats, will realize by now that a native OEBPS Publication is not a single file, but rather comprises a set of multiple files. Obviously, a native OEBPS Publication cannot be distributed by itself — publishers, distributors, retailers, and end-users require an ebook Publication to be in a single, distributable file. In addition, OEBPS does not itself provide a means for DRM encryption.

Thus, it is necessary to wrap (archive) a native OEBPS Publication into a single compressed binary file (such as using gzip) with optional DRM encryption of the contents. Unfortunately, OeBF has not yet developed such an OEBPS “wrapper” standard, and may not in the immediate future. [Note, IDPF recently published the OCF 1.0 Specification, which is a zip-based container of OEBPS/OPS Publications.]

Nevertheless, this is essentially a non-problem. The coding required of an OEBPS presentation system to unwrap an archive and access the native OEBPS Publication contained inside borders on the trivial. Thus, even if we have multiple wrapper standards, so long as they are TOS and all wrap pure native OEBPS Publications, this is essentially a non-issue. That is, it is not the wrapper which ultimately defines the ebook format, but what is inside the wrapper which truly defines the ebook format.
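To give a feel for how little code the unwrapping step takes, here is a minimal sketch in Python using a zip archive built in memory. The archive layout, the file names, and the rule of locating the Package document by an .opf extension are all assumptions made for illustration. They do not follow any published wrapper standard, and DRM is not addressed.

```python
import io
import zipfile


def build_wrapped_publication():
    """Build a tiny zip archive in memory standing in for a wrapped Publication."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Hypothetical contents: one Package document, one content Document.
        archive.writestr("book.opf", "<package><manifest/><spine/></package>")
        archive.writestr("chapter1.html", "<html><body><p>Hello</p></body></html>")
    return buffer.getvalue()


def unwrap(archive_bytes):
    """Return a {file name: bytes} mapping of everything inside the wrapper."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


def find_package(files):
    """Locate the Package document by its (assumed) .opf extension."""
    matches = [name for name in files if name.endswith(".opf")]
    if len(matches) != 1:
        raise ValueError("expected exactly one package document")
    return matches[0]


if __name__ == "__main__":
    files = unwrap(build_wrapped_publication())
    print("Package document:", find_package(files))
    print("All files:", sorted(files))
```

Once the files are unwrapped, the reading system works with the native Publication exactly as if it had never been archived.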

(A few of us are now informally discussing development of an open standards OEBPS Publication wrapper with optional DRM capability, and hopefully this article will catalyze its formal development.)

The DRM Aspect

Many publishers and self-published authors require their published content to be distributed with DRM protection. It is certainly possible to build into the native OEBPS Publication wrapper a DRM protection system. Microsoft LIT, as previously mentioned, is an excellent example proving this assertion since LIT is a DRM-protected wrapper of essentially an OEBPS Publication. No more need be said on this.

One argument that proprietary e-book format advocates will undoubtedly mention in rebuttal to this proposal is that proprietariness of the DRM wrapper is necessary for publication security. On the face of it, this argument appears compelling, but in reality it is a pipe dream.

Two notable examples prove that such “security by obscurity” will not work: Adobe PDF and Microsoft LIT. The DRM protections built into both of these ebook formats have recently been cracked (and the details published online), and tools are now being distributed (which are illegal in the U.S., a violation of DMCA) allowing anyone to bypass the DRM protection of PDF and LIT ebooks in their possession. No matter how Adobe and Microsoft continue to upgrade their DRM systems in response to cracking, the new systems will likely be cracked again and again because the ebooks are designed to be read on generally open hardware and OS platforms which by their nature are “open” (allowing many means by which the content can be digitally accessed.) So much for security by obscurity.

It is also noted that pending accessibility legislation may require (explicitly or implicitly) DRM standards to be published and non-proprietary, so as to allow full access to the ebook content by those with disabilities using third-party tools.

In addition, publishers will someday want the ability to digitally interlink their Publications (including with those of other publishers) so authorized linking systems must be able to bypass the DRM protection and access the internal OEBPS Publication to effect linking. Proprietary DRM solutions work against this. (Obviously, if all e-books, e-journals, and e-magazines use the same native OEBPS internal format, then universal digital linking between them will certainly become readily possible.)

And, finally, publishers and authors have a need, and a right, to be able to independently evaluate the robustness of DRM technologies used to protect their content. Only when the DRM technologies are TOS can this need be fully met. Publishers should never trust the “Trust us, our DRM is secure” promises of those touting proprietary or non-TOS DRM and content formats, especially when used on essentially open hardware and OS.

Where Do We Go From Here?

Even if a few of you are now intrigued regarding the assertion that OEBPS is a viable (and I believe outstanding) universal consumer ebook format (suitably wrapped of course), the question naturally arises: Where do we go from here?

Complicating the answer is the obvious “Catch-22” (or “chicken-egg”) situation where we currently have no native OEBPS ebooks being distributed, and no viable OEBPS presentation system on the market. In addition, various OEBPS authoring and verification tools have not yet been developed and marketed. (Note: there are two little-known, essentially experimental, Java-based OEBPS presentation systems deserving honorable mention: ION System’s eMonocle, and GlobalMentor’s Mentoract Reader. However, they do not appear to be ready for primetime, and are not presently being actively marketed for general ebook use in the ebook industry.)

First of all, those who want to see OEBPS as a universal consumer ebook format need to speak up. These include authors, publishers, librarians/archivists, open source advocates, accessibility activists, and most importantly end-users — those who buy and read ebooks. This article hopefully will catalyze activism in this area.

Second, those now developing commercial ebook-capable hardware (whether dedicated e-book readers or multipurpose devices) and commercial ebook presentation software for general OS, and who are considering developing their own proprietary e-book format to add to the gazillion others out there, should seriously reconsider and design their systems to render native OEBPS Publications. At first glance this might go against their business models, but in my estimation they are much more likely to succeed by hitching their wagon to native OEBPS and encourage other competitive reading systems to do the same, so as to make OEBPS, and thus the format their system supports, the dominant format standard for distributed e-books.

Third, open source advocates should understand the importance of developing cross-platform, open source licensed ebook presentation software for all the current mainstream OS out there: Windows, Mac, and Linux (and embedded versions thereof). Obviously, the ebook format must be TOS, and OEBPS is an obvious choice. (There is one known open source project developing a multi-platform OEBPS presentation system using the Mozilla Gecko codebase: OpenBERG, a spin-off inspired by the open source OEBPS browser advocacy group, LiberGNU.)

It is important to mention to both commercial and open source developers that a native OEBPS Publication presentation system is tantalizingly close to an XML-standards, CSS-aware browser. The differences are actually quite minor. In many ways it will be easier to build an OEBPS presentation system than a web browser since a lot of the functional web-oriented baggage web browsers are expected to include will not be needed by OEBPS presentation systems. Additionally, while web browsers are still expected to handle “crappy”, malformed HTML, OEBPS Documents are well-formed XML, with other constraints on the document structure lending them to easier and more standardized processing and rendering. (There will be the expectation, however, that OEBPS ebooks must be presented in a more book-like fashion, such as how MS Reader renders LIT documents, so any OEBPS presentation system should, by default, render the content into a series of discrete pages with no scrolling within each page. A future article is being contemplated to detail the features and functionality a native OEBPS presentation system should possess.)
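The well-formedness point can be illustrated with a short Python sketch. It assumes nothing about any real OEBPS reading system: it simply shows that a strict XML parser accepts a well-formed document and rejects “tag soup” outright, so the error-recovery logic that web browsers carry is not needed. The two sample strings are made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Every element is properly closed and nested.
WELL_FORMED = "<html><body><p>First paragraph.</p><p>Second.</p></body></html>"

# Typical sloppy HTML: the <p> elements are never closed.
TAG_SOUP = "<html><body><p>First paragraph.<p>Second.</body></html>"


if __name__ == "__main__":
    print("well-formed sample accepted:", is_well_formed(WELL_FORMED))
    print("tag soup accepted:", is_well_formed(TAG_SOUP))
```

A reading system can therefore hand every Document to an off-the-shelf XML parser and refuse anything that fails, rather than guessing at what the author meant.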

Thus, current rendering codebases, both open source (such as Gecko) and commercial (such as Opera’s blazingly fast, very compact, XML-standards, cross-platform codebase — my pick for the best one out there) can readily be adapted to build high-quality, cross-platform OEBPS e-book presentation systems.

Postscript

It is anticipated (and hoped) this article will generate vigorous discussion, with many counterpoints and alternative conclusions. This discussion is very much needed to move the e-book industry to the next stage in its development. Certainly, some of the points and assertions made in this article may eventually be proven to be incorrect, off base, maybe even silly. The important thing is not proving whether I am right or wrong, but what is the best for the long-term growth and viability of the e-book industry — that should be our focus and motivation in discussing this important issue. The seven requirements for a universal consumer e-book format can certainly serve as a starting point for framing the discussion; the requirements themselves are not claimed to be etched in stone, and certainly can be improved as we better understand the intricacies of this issue.

Of course, as part of this discussion, I hope that the fitness of OEBPS in being a viable universal consumer e-book format will be seriously discussed.

As a final comment, the e-book industry must not put its head in the sand and ignore the issue of a universal consumer e-book format. It will not go away on its own. One can certainly take a Free Market approach to the issue, to just let the chips fall where they may. But the question needs to be asked: Is what is good for one, the best for all? With respect to the ebook industry, I believe the answer to this is “not necessarily.”

(End of 2003 eBookWeb article)


Jon Noring is VP of Development for DigitalPulp Publishing

