My recent article on the non-breaking space paragraph issue created some controversy among book markup experts, many of whom disagree with the idea of using such a paragraph to represent a blank line. Typesetting expert Laura Brady suggested it would be a good idea to attend the BISG Ebook Accessibility Workshop to learn more about proper steps for creating accessibility-focused EPUB files. I’d certainly be happy to if I could, but given that it’s in New York and I’m in Indianapolis, it’s not really in the cards for me at this point. (I’m just glad I’m going to be able to attend BookExpo America next week in Chicago; for a while it was looking iffy.)

The discussion on that post has generated some interesting comments. Nate Hoffelder of The Digital Reader points out that the job of e-reader apps is not to make value judgements about whether a book uses the “correct” formatting—it is to display the book exactly as coded. The non-breaking space paragraphs date back to the early days of e-books, pre-dating the use of CSS and modern rules—but enough old e-books from those days are still out there that readers should nonetheless be able to display such books correctly.

Jim Chapman, developer of Freda, chimes in that EPUB3 is such an expansive standard that to create apps that support it in its entirety would take a huge budget.

This would matter less if we were happy for all e-book reader development to be done by big software enterprises, charging $19.99 a pop for their products. But the market expects its e-reader apps to be cheap (or even free/ad-supported). That necessarily means that they are lightweight simple programs … and as such they can only handle lightweight simple ebook representations. I, for one, would really welcome it if someone came up with an “EPUB0” standard that was just about putting the words on the page, with a decent minimum of support for images, references, tables of contents, titles, chapters, sections, block-quotes, footnotes, and so forth. It would not need to be large or complicated (actually, FB2 format shows the way!). All the rest (colour, alignment, font … ) is something that (in my opinion) we should let book user decide anyway, according to their preferences and reading environment.

Maciel “Nux” Jaros suggests using the WebView API to display HTML, but Chapman notes that aspects of WebView that make it good for viewing web content make it less than ideal for displaying e-books.

Therefore, reading XHTML and applying CSS styles is basically a ‘roll-your-own’ development task, which gives results just as bad as you might expect.

That is why basing EPUB on HTML+CSS was basically a bad decision by the standards-writers: books are not web-sites.

In another comment, Chapman discusses the problems further and adds:

I’m sorry to bang on about this – but I have personally spent months wrestling with this problem, and they have been wasted months. Every time that I have tried to switch Freda over to a Webkit/whatever approach, I’ve eventually hit a brick wall, and had to go back to my tried, tested (and clunky) custom parsing and rendering. I agree with you – it would be nice if this stuff worked. But it doesn’t.

In the original MobileRead thread where I was discussing the question of blank lines between paragraphs, JSWolf pointed out that section breaks aren’t the only reason to have blank lines within e-books. In some cases, people might want to set off text within a section, such as when the text of a sign or other object is quoted and centered.

Meanwhile, I heard back from Scrivener developer Ioa (aka “AmberV”) on the Literature & Latte forum in regard to my question about why Scrivener uses non-breaking space paragraphs to separate sections in the EPUBs it creates. For a bit of background, Scrivener is a text processing app whose users write stories one scene at a time. The scenes are arranged in outline-style trees, so that writers can reorder them simply by dragging them to different positions within the tree. When the book is created, each of those scenes ends up separated by a blank line.

Given that Scrivener keeps those scenes separate anyway, I asked why it separates them with a blank line in the final product rather than semantic markup like <hr />. Ioa explained that Scrivener actually doesn’t create e-books from the section layout directly. Before converting to EPUB, Scrivener translates the Scrivener project into a single rich-text document. It then converts that document to the HTML that goes into an EPUB file. Because RTF doesn’t have a semantic scene separator element, the EPUB created from it doesn’t get one either. Ioa adds:

Have you considered using MultiMarkdown with Scrivener? A lot of what I’m saying here is owing to the limitation of being an RTF based editor and trying to generate clean HTML out of that. MMD works by ignoring all of the rich text stuff and using Scrivener more like a plain-text editor with a simple syntax based heavily on Markdown. MMD itself does not have an ePub generator, but (a) the HTML5 it produces is super clean and semantic, and (b) there is another tool called Pandoc which can take MMD files created in Scrivener and turn them into ePubs—it does a pretty good job of it, too.

I was honestly a little surprised to learn that Scrivener generates its EPUBs from an RTF file. If you’re already creating separate sections within the editor itself, it seems like throwing information away to slap those sections together in one document before converting it to an e-book. But I suppose the whole section-based layout is actually meant to help writers organize their work as they write it, not necessarily to determine how the e-book is structured.

It seems that the question of proper e-book markup comes down to a dichotomy between simple users and power users. The EPUB markup standard is complex enough that people who know the code can do pretty much anything they want to in terms of e-book arrangement, within the restrictions of the format.

But people who don’t know how to code and aren’t interested in knowing how to code have tools they can use like Scrivener or Calibre to make e-books without having to know any of that sort of thing. And those tools take shortcuts. When you get right down to it, that’s the nature of any tool that simplifies a complicated process. They miss nuance in the name of being “good enough” for most people most of the time.

From that point of view, the MultiMarkdown suggestion isn’t really helpful. As a writer using Scrivener, I only really care that it produces an e-book that looks right to me. If I’m reading an e-book in an e-reading app, do I particularly care whether it was coded with <hr /> or with <p>&nbsp;</p> where a blank line is required? No, I just care that the blank line is there and it looks right. If it looks right to me, why would I want to bother learning more complicated procedures for the sake of it being technically correct in every aspect?
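
For the curious, the difference between the two is small enough that a tool could paper over it. Here is a minimal Python sketch of a post-processing step that swaps non-breaking space paragraphs for <hr />. It is not anything Scrivener, Calibre, or Freda actually does; the function name `replace_nbsp_paragraphs` and the regular expression are my own illustration, and a regex is only a rough stand-in for a real XHTML parser.

```python
import re

# Hypothetical pattern: a <p> element (attributes allowed) whose only content
# is one or more non-breaking spaces, written as &nbsp;, &#160;, &#xA0;, or the
# literal U+00A0 character, optionally surrounded by ordinary whitespace.
NBSP_PARAGRAPH = re.compile(
    r"<p\b[^>]*>\s*(?:(?:&nbsp;|&#160;|&#x[aA]0;|\u00a0)\s*)+</p>",
    re.IGNORECASE,
)


def replace_nbsp_paragraphs(xhtml: str, replacement: str = "<hr />") -> str:
    """Return xhtml with every non-breaking space paragraph replaced."""
    return NBSP_PARAGRAPH.sub(replacement, xhtml)


if __name__ == "__main__":
    sample = (
        "<p>Scene one ends.</p>\n"
        "<p>&nbsp;</p>\n"
        '<p class="first">Scene two begins.</p>'
    )
    print(replace_nbsp_paragraphs(sample))
```

Paragraphs that contain real text alongside a non-breaking space are left alone, as are paragraphs holding only an ordinary space. Whether <hr /> is the right replacement depends on what the blank line was meant to signal in the first place, which is exactly the judgment call a tool can't make for the writer.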

Telling people who use these tools that they should learn to code their e-books properly instead is neither helpful nor useful. Many writers have better things to do with their time—such as, for example, write. And if Scrivener looks good enough to them, why should they have to learn something new for the sake of adhering to a technical standard that will still look exactly the same in most e-readers? (As I noted in my previous column, the popular e-readers that honor the non-breaking space paragraph vastly outnumber the ones that don’t—especially now that Freda’s updated to honor it as well.)

Yes, Scrivener should figure out some way to generate semantically-correct, standards-compliant code for its EPUBs. And I strongly encourage everyone who feels that way to contact Literature & Latte and ask its developers to come up with some way to fix it. But until and unless they do, it’s going to keep right on breaking standards with non-breaking spaces.

As long as these quick-and-dirty e-book-creation methods like Scrivener are around, e-reading apps need to be able to support both users who use such shortcuts, and the ones who use advanced tools to create letter-perfect standards-compliant versions. That’s why I made a big deal about Freda needing to support non-breaking space paragraphs, and why I’m glad Jim Chapman went ahead and added the feature.

So, where do we go from here?

3 COMMENTS

  1. With EPUB (IDPF) and HTML (W3C) on a convergent path, we need to look at the history of web browsers and the history of web authoring tools as a foreshadowing of what is in store for ePublishing. Using a text editor isn’t the only or most common way to create either eBooks or web sites.
    To say that eBooks are not web sites may not be useful. One can certainly argue convincingly that an eBook is little more than a web site in a can (generic term for “container”).

  2. Hi Chris,

    I’m the developer of Scrivener for Mac (and the designer of the app). Ioa (“AmberV”) is our support guru and helps me refine a lot of the design issues, but he’s not actually a developer. We are in fact a very small team and I’m the sole developer of Scrivener for Mac and iOS (with another developer on the Windows version).

    It’s not true that Scrivener compiles everything into an RTF file – it doesn’t at all. When Ioa said that Scrivener compiles everything into a single rich text document, what he actually meant was that all of the rich text across Scrivener’s scenes is compiled into a single object internally before being converted to various HTML files for the ePub – no RTF is involved, so that was just a mix-up of terminology, I think. Ioa is a big MultiMarkdown fan and is a bit of an evangelist for it, whereas I don’t use it at all, so certainly no one is expected to learn such things to use Scrivener or generate a decent ePub file from it.

    One thing to remember is that it’s not quite accurate to say that Scrivener is designed to have separate scenes in each document – that’s just one way of using it. It’s entirely down to the user how to break down their book, so you could equally have a whole chapter in each document or a single paragraph. Scrivener therefore has to be agnostic about how the content is chopped up and let the user decide that at Compile time. Once that big text is built, it is then broken back down again into chapter-sized chunks for the different HTML files, for instance.

    I wouldn’t describe Scrivener’s ePub support as “quick-and-dirty” – believe me, many months of development went into it! One problem – relevant to the non-breaking space issue – is that, as a single independent developer, I have to use what resources are available to me. In this case, I use Apple’s HTML generator to turn the rich text data from the editor into HTML, and this is how Apple’s HTML generator treats those blank lines. This is also why Scrivener 2.x currently only supports ePub 2.0 – because Apple’s HTML generator only supports HTML4 whereas ePub 3.0 requires HTML5.

    For our next major (paid) update, which has been in the works for a long time now and I’ll start talking about more later in the year, I spent a long time on this issue. With no HTML5 converter available to me, and with writing my own HTML5 converter a Herculean task given the complexity of the format (when converting complex rich text objects, at least), I ended up using MultiMarkdown as a middle man, so that internally Scrivener will convert the rich text to MMD, then pipe that to HTML5 (MMD has very good HTML5 support). With this approach, I’m also able to provide much more control of the CSS to the user for those that want to get their hands dirty with that sort of thing. This should result in cleaner ePub files (as well as providing ePub 3.0 support) and should also make addressing issues such as the one you raise easier for us in the future.

    All the best,
    Keith
    Scrivener developer

  3. @Keith, thanks for the insights into the inner workings of Scrivener, especially the dependencies on various frameworks and their limitations. As I read, I wondered if you had looked into web authoring architectures such as those used by RapidWeaver and apps of that ilk. They use templates, plug-ins and other methods to distribute the work load to developers who specialize in one function or the other.
    I don’t suppose that Apple will share the frameworks that undergird Pages and iBooks Author. Even their Automator code for creating ePub from text is undocumented (see: http://www.macosxautomation.com/lion/epub/index.html).
