[Image: James Garner looking under the hood of a Jeep]

As a onetime mechanical engineer with internal combustion engine experience, I sometimes like to peek under the hood.

Naturally, the new Adobe Digital Editions, presumably still under development, has piqued my interest. I’m especially curious about the XHTML-based “epub” format that’s supported by the reading system.[1]

While I sort of like what I see, I also have a few concerns that Adobe ideally will address in the final or next release of Digital Editions.

Now, I’ve known for almost a year that Adobe was developing an OEBPS-based reading system (OEBPS leverages XHTML), so Adobe’s preview announcement came as no surprise to me except that I expected it sooner. But I wasn’t sure how Adobe was going to implement OEBPS and the upcoming IDPF Open Container Format.

Checking under the hood

For deconstructing the “epub” format, I chose the XHTML title Adventures of Sherlock Holmes from the Digital Editions Sample eBook Library. It came with an “epub” file suffix, which is what the IDPF container specification recommends. With that clue, I used a ZIP application to inspect the file and, lo and behold, confirmed that Adobe’s “epub” is a ZIP file (as expected) which mostly, but not completely, conforms to the IDPF Container specification.[2]
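
A rough equivalent of that inspection, for readers who would rather script it than open a ZIP application, might look like the Python sketch below. It uses only the standard zipfile module. The entry names in the sample container are hypothetical stand-ins, not the actual contents of the Sherlock Holmes file, and the function names are mine.

```python
import io
import zipfile


def build_sample_epub() -> bytes:
    """Build a tiny in-memory ZIP standing in for an "epub" file (hypothetical contents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("META-INF/container.xml", "<container/>")
        archive.writestr("OEBPS/content.opf", "<package/>")
        archive.writestr("OEBPS/chapter1.html", "<html/>")
    return buffer.getvalue()


def inspect_epub(data: bytes) -> list[str]:
    """Return the entry names if the data is a ZIP archive, else an empty list."""
    stream = io.BytesIO(data)
    if not zipfile.is_zipfile(stream):
        return []
    with zipfile.ZipFile(stream) as archive:
        return archive.namelist()


if __name__ == "__main__":
    for name in inspect_epub(build_sample_epub()):
        print(name)
```

To try it on a real file, read the file's bytes and pass them to `inspect_epub`; an empty list means the file is not a ZIP archive at all.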

Next, I inspected the claimed-to-be OEBPS 1.2 Publication contained inside and was very disappointed and concerned. My dissection showed that it clearly does not conform to any flavor of OEBPS (including the new one now under development), not even close. Since the details of my discoveries of OEBPS nonconformance are fairly technical and nitpicky, I will list them in note #3 at the end of this article.

Following this, I checked to see if I could re-ZIP the file set, rename the ZIP file with an “epub” suffix, and have it render in the Digital Editions reader (round-tripping). I succeeded here, which is not unexpected but still nice to know, since I wanted to tweak the publication in various ways to see how Digital Editions handled the tweaks.
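
The round trip itself can be sketched in a few lines of Python. This is only an illustration of the unpack, tweak, and re-ZIP cycle under my own assumptions: the helper names and the two-file sample publication are hypothetical, and the sketch says nothing about which tweaks Digital Editions will accept.

```python
import io
import zipfile


def unpack(container: bytes) -> dict[str, bytes]:
    """Read every entry of a ZIP container into a name -> bytes mapping."""
    with zipfile.ZipFile(io.BytesIO(container)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


def repack(entries: dict[str, bytes]) -> bytes:
    """Write the entries into a fresh ZIP container, preserving their order."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, payload in entries.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


def tweak(container: bytes, name: str, new_payload: bytes) -> bytes:
    """Replace (or add) one entry and re-ZIP the whole file set."""
    entries = unpack(container)
    entries[name] = new_payload
    return repack(entries)


if __name__ == "__main__":
    # Hypothetical two-file publication, used only to exercise the round trip.
    original = repack({
        "OEBPS/chapter1.html": b"<html/>",
        "OEBPS/stylesheet.css": b"p { margin: 0 }",
    })
    modified = tweak(original, "OEBPS/stylesheet.css", b"p { margin: 1em }")
    print(sorted(unpack(modified)))
```

Writing the resulting bytes to a file whose name ends in “epub” completes the rename step described above.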

Of course, the first thing I tried was to ZIP up a 100% conforming OEBPS 1.0.1 Publication of one of my e-books in a way which conformed with the IDPF Container Format. No luck: Digital Editions refused to display anything (no error messages, just a blank page). This suggests (but does not conclusively show) that the current “beta” of Digital Editions is not designed to render any fully conforming OEBPS Publication “as is,” even though it could do so relatively easily.

(I’m not sure yet what changes I have to make to any OEBPS Publication to make it work in Digital Editions — this would take a lot of work to determine. But the Sherlock Holmes example indicates I may have to make a lot of changes away from OEBPS conformity, which, if this is indeed the case, is not good.)

The next thing I tried was to understand the role of the included XSL-FO style sheet in the Adobe example, and how it related to the also-supplied CSS style sheet. My experiments showed that most of the textual styling is done with CSS, but enabling multicolumn support when decreasing font size required the XSL-FO file to be there. This is troubling, since XSL-FO is not currently “blessed” by OEBPS. I also believe everything Digital Editions would need for multicolumn support can be done with certain CSS3 properties, which, though not yet supported by OEBPS (and still not a W3C Recommendation), are at least in the CSS “ballpark.” (Personally, I think the reading system should let the end user pick the number of columns plus other page layout formatting parameters, sort of like the user applying their own style sheet, but that’s for a future article.)

Does Digital Editions have any lungs? The acid test

[Image: Portrait of J.S. Bach]

Supposedly, J.S. Bach used the beginning of his best-known Baroque organ work, the Toccata and Fugue in D minor, to see if an organ he was playing had “good lungs,” an early “acid test.” Likewise, an important acid test for OEBPS conformance is whether the reading client supports the OEBPS “out-of-spine” (OOS) feature. Support or non-support of that feature tells a lot about the dedication of the developers to the OEBPS specification and the many innovations it enables: it indicates whether the developers want simply to emulate paper books or want e-books to be more than paper books.

I’ve planned for a while to write an article explaining what OOS is, why it greatly enhances the e-book experience, and why publishers and readers should care.

A quick explanation, though, is that OOS is simply publication content which does not appear in the main flow of the publication. It is usually amplificatory material, such as a note, sidebar, etc. The OEBPS working group recognized early on that many publications are somewhat non-linear (and some highly non-linear such as hypertext fiction — not to mention typical web sites), and created a mechanism by which such out-of-spine content can be “identified and linked to” so the reader client can present OOS content to the reader in powerful and innovative ways that cannot be done with paper and ink.

Microsoft Reader (which reads the LIT format) beautifully implements the OEBPS out-of-spine feature using pagelets. In my estimation, the pagelet feature is MS Reader’s greatest innovation, and one which Microsoft has amazingly squandered by not really telling anyone about it. The OpenReader Format specifications suite also supports out-of-spine content, and even web-site-style publications (which OEBPS does not support). The informative commentary in the OpenReader Binder Specification further describes how a reading system may implement the out-of-spine feature.

Anyway, I experimented with the Sherlock Holmes book to see if Digital Editions will render out-of-spine content. Unfortunately, Digital Editions simply ignores a link which points to content in the Publication which is not part of the “spine.” So the current version of Digital Editions fails the “acid test,” and this is very disappointing — and troubling. I hope Adobe will fix this and support OOS; if the company continues to ignore OOS, this should tell publishers that Adobe is not interested in implementing OEBPS as dozens of top experts envisioned it to be implemented. This is, in effect, “backdoor proprietization” of the OEBPS spec since it tells publication authors what OEBPS features they should not use, and renders many existing OEBPS Publications unusable in Digital Editions.
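
To make “not part of the spine” concrete, here is a minimal Python sketch that lists the content documents a package declares in its manifest but leaves out of its spine. The sample package is a hypothetical, stripped-down one (no namespace, no metadata) written just for this illustration, and the function names are mine. The sketch covers only the first half of the idea: it does not check that a spine document actually links to the out-of-spine content.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified package: two chapters in the spine, one note outside it.
SAMPLE_PACKAGE = """<package>
  <manifest>
    <item id="ch1" href="chapter1.html" media-type="text/x-oeb1-document"/>
    <item id="ch2" href="chapter2.html" media-type="text/x-oeb1-document"/>
    <item id="note1" href="note1.html" media-type="text/x-oeb1-document"/>
    <item id="css" href="stylesheet.css" media-type="text/x-oeb1-css"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>"""


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix so the sketch works with or without one."""
    return tag.rsplit("}", 1)[-1]


def out_of_spine_documents(
    package_xml: str,
    document_types: tuple[str, ...] = ("text/x-oeb1-document",),
) -> list[str]:
    """Return hrefs of manifest content documents that no spine itemref points to."""
    root = ET.fromstring(package_xml)
    spine_ids = {
        element.get("idref")
        for element in root.iter()
        if local_name(element.tag) == "itemref"
    }
    return [
        element.get("href")
        for element in root.iter()
        if local_name(element.tag) == "item"
        and element.get("media-type") in document_types
        and element.get("id") not in spine_ids
    ]


if __name__ == "__main__":
    print(out_of_spine_documents(SAMPLE_PACKAGE))  # prints ['note1.html']
```

A reading system that supports OOS would be expected to present a document like the hypothetical note1.html when a spine document links to it, rather than ignoring the link.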

Conclusion

Digital Editions and the associated “epub” format are a good attempt at rendering OEBPS but, as noted, have a long way to go to conform fully to OEBPS, and a shorter distance to go to conform fully to the IDPF Container format.

With respect to OEBPS nonconformance, I am quite troubled, since full conformance is relatively easy to achieve considering where Adobe is now in development with Digital Editions. Publishers should beware of building publications for the Digital Editions reader which are not fully OEBPS-conforming. They should be able to use the full suite of innovations in OEBPS, without any special add-ons, and expect Adobe Digital Editions to support them, especially since Adobe apparently wants to be a major player in the future development of OEBPS.

Anyone who builds a reading system to natively render OEBPS Publications should not pick and choose which features of OEBPS they will support — the reading system should, in good faith and only tempered by platform limitations, support them all and let publishers decide what features they want to use in their publications.

I am also troubled in that the “epub” format needlessly relies upon XSL-FO for some formatting. Since OEBPS does not “bless” XSL-FO (although OEBPS allows nonblessed resources when fallbacks are provided for them, which the example I looked at did not bother to provide), I see this as a proprietization of the OEBPS specification by Adobe, whether intentional or not. If Adobe really wants XSL-FO in OEBPS, the company should do the right thing and bring it up in the IDPF OEBPS Working Group for future inclusion rather than implementing it as they have.

Instead of forking the OEBPS spec, Adobe should be fervent in conforming fully and completely to the OEBPS specification, both in letter and in spirit, rather than ignoring certain features and tweaking it in proprietary directions. I hope this is their intention.


Referenced Notes

  • [1] Note my use of the techie phrase “reading system.” To evaluate Adobe Digital Editions, we have to consider the whole system which includes both the format and the reading client (in techese the “user agent”). The reading client side has already been addressed elsewhere, such as the recent TeleRead blog article by David Rothman, and the excellent article by Alexander Turcic at Mobileread. So I will focus on the “epub” format.

  • [2] Unfortunately, the Adobe “epub” container I looked at does not fully conform with the IDPF Container format. From a quick glance, I found two nonconformances:

    1. The “epub” container is missing the required file named “mimetype.”

    2. The contained OEBPS 1.2 Publication is seriously non-conforming, further detailed in the next note.
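
    A check for the first of these nonconformances can be sketched in Python as below. Only the presence test comes from the observation above. The other two checks (that “mimetype” is the first entry and is stored uncompressed) and the “application/epub+zip” content in the sample reflect my reading of the published container specification, so treat them as assumptions to verify against whichever draft you are testing. The function names are hypothetical.

```python
import io
import zipfile


def build_container(include_mimetype: bool) -> bytes:
    """Build a tiny in-memory container, with or without a 'mimetype' entry."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        if include_mimetype:
            # Assumed content and storage method; check them against the spec.
            archive.writestr(
                "mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED
            )
        archive.writestr(
            "META-INF/container.xml", "<container/>", compress_type=zipfile.ZIP_DEFLATED
        )
    return buffer.getvalue()


def mimetype_report(data: bytes) -> dict[str, bool]:
    """Report whether 'mimetype' is present, first, and stored uncompressed."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        infos = archive.infolist()
    names = [info.filename for info in infos]
    present = "mimetype" in names
    return {
        "present": present,
        "first_entry": bool(infos) and infos[0].filename == "mimetype",
        "stored_uncompressed": present
        and infos[names.index("mimetype")].compress_type == zipfile.ZIP_STORED,
    }


if __name__ == "__main__":
    print(mimetype_report(build_container(include_mimetype=False)))
```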

  • [3] In the Digital Editions “epub” edition of the Adventures of Sherlock Holmes, I found the following nonconformances to OEBPS:

    1. The OEBPS Package file DOCTYPE references the OEBPS 1.2 Package DTD, but clearly the Package file is not valid to it, nor does it fully conform to the OEBPS 1.2 Package requirements beyond XML validation.

      The OEBPS Package includes some constructs planned for the next version of OEBPS, such as NCX support, but also some that I don’t believe (though I’m not sure) are planned for the next version, specifically the “xsi:type” attribute.

    2. All the content documents are XHTML 1.0 Transitional, and do not include the required fallbacks to a supported OEBPS content document type, even for the planned next version which will support XHTML 1.1.

    3. The file “stylesheet.css” is clearly a part of the OEBPS Publication, but is not declared in the manifest in the Package. Likewise, the “add-on” XSL-FO style sheet is referenced from content documents, but is not mentioned in the manifest; if it were referenced in the manifest, a fallback to an approved media type (such as CSS) is required.
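
    The third item, files that belong to the Publication but are missing from the manifest, is easy to test mechanically. The Python sketch below compares a list of file names against the manifest hrefs. The package, the file list, and the name “layout.xsl” are hypothetical stand-ins for illustration and are not taken from the Adobe sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified package that declares only the chapter.
SAMPLE_PACKAGE = """<package>
  <manifest>
    <item id="ch1" href="chapter1.html" media-type="text/x-oeb1-document"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>"""


def undeclared_files(
    package_xml: str, publication_files: list[str], package_name: str
) -> list[str]:
    """Return files in the publication that the manifest does not declare."""
    root = ET.fromstring(package_xml)
    declared = {
        element.get("href")
        for element in root.iter()
        if element.tag.rsplit("}", 1)[-1] == "item"  # ignore any namespace prefix
    }
    return sorted(
        name
        for name in publication_files
        if name != package_name and name not in declared
    )


if __name__ == "__main__":
    files = ["content.opf", "chapter1.html", "stylesheet.css", "layout.xsl"]
    print(undeclared_files(SAMPLE_PACKAGE, files, "content.opf"))
    # prints ['layout.xsl', 'stylesheet.css']
```

    The comparison is by exact href string, so relative paths would need normalizing before a check like this is run against a real container.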


Technical Addendum

As an addendum to this article, I inspected one other “XHTML” e-book at the Adobe Digital Editions site, and noticed that it, too, had major OEBPS conformance problems, but overall the problems were different than in the Sherlock Holmes book.

This suggests, and I hope I am wrong, that the Digital Editions folk are not too concerned with strict adherence to a single standard, but rather will accept a wide range of junk, which does not help the e-book industry.

Even if Adobe’s intent for Digital Editions is not to support OEBPS 1.0.1 and 1.2, but rather the next version, the next version is in an advanced enough stage of development that Adobe can normalize all their demonstration “XHTML” Publications to the next version. For example, all the content documents must be valid XHTML 1.1 in UTF-8 encoding (no ISO-8859 encoding); the Package should be normalized and meet all known requirements (such as UTF-8 encoding) without any custom extensions; the CSS style sheets need to be declared in the manifest; the XSL-FO style sheet (which I noted they should not use) needs to be declared as part of the Publication with a fallback to CSS provided; etc.


6 COMMENTS

  1. One additional disadvantage of this not-quite support is that the existing toolkits and tool pipelines will not work.

    So, only Adobe’s tools would work, unless somebody goes out of the way to produce yet another tweaked output format.

    This, if deliberate, starts to look very much like Microsoft’s “Embrace, extend and extinguish” strategy.

  2. Thank you, I’ve been looking everywhere for info on the .epub format without any success.

    I like the straightforward interface of Digital Editions so far, but the lack of any kind of large-scale organisation will be a problem unless it’s addressed in subsequent releases.

    The complete lack of CHM support makes the library aspect useless for me, though: I _need_ a single point of access for my ebooks, I shouldn’t have to switch state depending on the format type.

  3. Thanks Michael. Yes, I invite the Adobe folk working on Digital Editions to discuss, possibly in a TeleRead blog article, their general goals, including those for supporting the OEBPS standard and the many innovative features in OEBPS, particularly “out-of-spine” content. In my estimation, out-of-spine is a “must enthusiastically support” feature, and one I will continue to be vocal on because of its importance.

    Another critical issue is Adobe-specific “add-ons” to OEBPS, which I also oppose. Innovations should be done through the IDPF OEBPS WG, not with vendor-specific add-ons which serve only to fragment the standards arena and lead to proprietization. The OEBPS specification (like OpenReader) is quite robust and can do pretty much anything Adobe would want in the next couple of years with Digital Editions. For what OEBPS can’t do, update the spec first so the innovation is industry-vetted.

    Embracing open standards brings along responsibilities as well as opportunities, and I hope Adobe will be a responsible participant in the development and use of open standards.

    Btw, as an addendum to my article, the next version OEBPS requires OEBPS reading systems to render out-of-spine content. Here’s specese from the latest draft. This is not final, but does reflect decisions the IDPF OEBPS Working Group has made:

    “Out-of-spine content refers to Content Documents 1) not included in the spine and 2) referenced, directly or indirectly, from a spine document or an OEBPS navigation structure. A Reading System must render, in some fashion, such out-of-spine content.”

    So I look forward to seeing how the next version of Digital Editions will render out-of-spine content. I suggest something similar to Microsoft Reader’s pagelet, or letting the end user pick from two or three possibilities. There’s definitely room for innovation.

  4. Thanks for the feedback. As we indicated in our FAQ, OEBPS support in our beta 1 release is experimental, and we anticipate compatibility with the in-preparation revision to IDPF OEBPS. To be clear, this means that publications prepared according to this standard (as revised) are expected to be viewable via our client software. Obviously this is contingent on timely completion of the revision.

    You mention OEBPS 1.2 but effectively there is no meaningful support in the industry for this format, which lacked key capabilities required for digital publications. What OEBPS support there is is largely based on the original OEBPS standard, and often lacking in even basic XML compatibility, and featuring other extensions and deviations. Therefore we plan to concentrate on supporting the upcoming OEBPS revision.

    Since there is not yet a working draft of the next OEBPS revision, there is as yet nothing to formally test compatibility against. However, several capabilities of this in-preparation OEBPS revision are experimentally supported in our software (inc. DAISY/NIMAS compatible declarative table of contents, SVG, and OpenType embedded font subsets). Additionally, as you noted, we support (but do not require) an “extra-experimental” XSL-FO based template capability.

    We look forward to helping to foster greater interoperability as the industry converges to widely-adopted eBook standards; PDF for final-form paginated content and OEBPS for reflow-centric content.
