EPUB e-readers disagree: When non-breaking spaces break standards

May 3, 2016

Over the last few days, I’ve been involved in an interesting conversation on MobileRead. It started when I posed the question of why so many e-readers seem to ignore the extra blank lines used to separate sections in stories. I’ve complained about readers like Moon+ disregarding them before. When I generate EPUBs using Scrivener, it uses those extra blank lines, and so do many of the e-books I download from Baen. Trying to read those on a reader like Moon+ swiftly becomes an exercise in frustration, as there’s no visible sign of where one section ends and the next begins.

I reached out to Jim Chapman, developer of a Windows 10 e-reader app called Freda, about it. I really like everything else about Freda and want to give it a good review for TeleRead, but the current version has that same annoying behavior, so I asked whether it could be fixed.

Jim took a look at the way the e-book file did things and determined that it used a paragraph with a non-breaking space character within it to create the blank line: <p>&nbsp;</p>. Some of the other participants in the thread said that they coded section breaks in their e-books the same way. Effectively, this creates a paragraph composed of nothing but a single non-breaking space character, a character that is invisible and is written in HTML as the entity &nbsp;.
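Why would an app skip a paragraph that actually contains a character? One plausible cause, shown here as a small Python sketch (we don't know how Moon+ or Freda are really implemented), is that many programming languages count U+00A0 as ordinary whitespace, so a naive "is this paragraph empty?" test swallows it:

```python
import html

# &nbsp; decodes to U+00A0 NO-BREAK SPACE: a real character, not an empty string.
content = html.unescape("&nbsp;")
print(repr(content))          # '\xa0'
print(len(content))           # 1

# Python's str.isspace()/str.strip() treat U+00A0 as whitespace, so a reader
# that tested paragraphs this way would call <p>&nbsp;</p> empty and skip it.
print(content.isspace())      # True
print(content.strip() == "")  # True
```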

Jim said that he would put in a tweak in the next version of Freda, expected out in a couple of weeks, so that it would stop disregarding those non-breaking space paragraphs. (I’ll do a full review of that version for TeleRead.) However, he added:

Regarding the ‘right’ way to represent vertical space in epub files: It is pretty clear that using a p element containing only &nbsp; is an ugly hack, and is the wrong thing to do, in terms of web standards (see for instance the discussion at http://stackoverflow.com/questions/1…-editor-or-not ). The right thing is certainly to use CSS styles to add a margin of the appropriate size. I’d hope that the implementers of Scrivener etc. will get round to fixing their program at some point, to do the right thing. I don’t feel especially proud of having changed Freda to fit in with their broken interpretation of the xhtml standard 😉
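Jim's preferred fix, moving the vertical space into CSS, can be applied mechanically to existing books. The Python sketch below is illustrative only: it assumes spacer paragraphs are written exactly as <p>&nbsp;</p> (or with &#160; or a literal U+00A0) and are followed by a plain <p>. The class name section-start and the 1.5em margin are hypothetical choices, and a real converter would be safer using an HTML parser than a regular expression.

```python
import re

# Hypothetical class name and margin; any values work if the stylesheet matches.
SECTION_CLASS = "section-start"
SECTION_CSS = "p.section-start { margin-top: 1.5em; }"

# A spacer paragraph (only &nbsp;, &#160; or a literal U+00A0, plus ASCII
# whitespace) immediately followed by the opening <p> of the next paragraph.
SPACER_THEN_P = re.compile(
    r"<p>[ \t\r\n]*(?:&nbsp;|&#160;|\u00a0)+[ \t\r\n]*</p>[ \t\r\n]*<p>"
)

def spacers_to_margin_class(xhtml: str) -> str:
    """Drop each spacer paragraph and mark the following paragraph instead."""
    return SPACER_THEN_P.sub(f'<p class="{SECTION_CLASS}">', xhtml)

chapter = "<p>End of scene.</p>\n<p>&nbsp;</p>\n<p>Next scene.</p>"
print(spacers_to_margin_class(chapter))
```

The converted book then depends entirely on the reading app honoring SECTION_CSS, which is exactly the weakness raised in the replies that follow.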

This sparked an interesting discussion about the “right” way to do such things. MobileRead user Toxaris put it eloquently when he replied:

I don’t agree. It is ugly, but it is not wrong even when considering web standards. After all, an e-book is not a webpage even if it uses web technology.

Like I said, the purists use margins to create the empty lines and they are of course correct. It is the ‘best’ way to do it. However, the sad part is that a lot of reading programs and even some readers ignore parts or even the complete stylesheet and overrule it with their own. This of course removes all the section breaks you had. Using &nbsp; is a fail-safe method and completely legal/allowed. Empty paragraphs should be ignored according to the web-standards you mention and usually are. However, this is not an empty paragraph and should not be ignored according to the same standards.

So, unless all the readers/reading applications play ball, it is the only failsafe method. Personal feelings aside of course.
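Toxaris's distinction between an empty paragraph and an &nbsp; paragraph lines up with HTML's own rules: HTML defines "ASCII whitespace" as tab, line feed, form feed, carriage return and space, CSS white-space collapsing works on spaces, tabs and line breaks, and U+00A0 is in neither set. A reader app could therefore skip genuinely empty paragraphs while keeping spacers. A rough Python sketch of such a check follows; ParagraphCollector and classify_paragraph are hypothetical names, and the sketch ignores nested <p> elements and any CSS display rules.

```python
from html.parser import HTMLParser

# HTML's "ASCII whitespace": tab, line feed, form feed, carriage return, space.
# U+00A0 NO-BREAK SPACE is deliberately not part of this set.
ASCII_WHITESPACE = " \t\n\f\r"

class ParagraphCollector(HTMLParser):
    """Collects the decoded text of each <p> element (nested <p> not handled)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes &nbsp; to "\xa0"
        self.paragraphs = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._current = []

    def handle_endtag(self, tag):
        if tag == "p" and self._current is not None:
            self.paragraphs.append("".join(self._current))
            self._current = None

    def handle_data(self, data):
        if self._current is not None:  # ignore text outside paragraphs
            self._current.append(data)

def classify_paragraph(text):
    """'empty' may be skipped; 'spacer' should render as a blank line."""
    if text.strip(ASCII_WHITESPACE) == "":
        return "empty"
    if text.strip() == "":  # only non-ASCII whitespace such as U+00A0 remains
        return "spacer"
    return "text"

collector = ParagraphCollector()
collector.feed("<p>One.</p>\n<p>&nbsp;</p>\n<p> </p>\n<p>Two.</p>")
collector.close()
print([classify_paragraph(p) for p in collector.paragraphs])
# ['text', 'spacer', 'empty', 'text']
```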

It doesn’t do much good to set margins in the CSS if the e-reading app ignores them—and many of them do. You can disable that on some of them, but not all of them. Furthermore, many commercial e-books (such as those I mentioned from Baen) use this non-breaking space technique to insert extra blank lines, too—and Scrivener, intended as a one-stop shop for e-book creation from manuscript to EPUB, does it too. (I posted an inquiry about this in the Scrivener technical support forum, but no one has replied yet.) MobileRead poster Ripplinger pointed out that even Calibre retains these non-breaking-space paragraphs when you tell it to remove all spaces between lines. “It’s a simple solution that just works, even if not an elegant solution or correct.”

Books created with that method work just fine in Adobe Digital Editions, Nook, Kobo, iBooks, Google Play Books, eReader Prestigio, Aldiko (with “Use Advanced Formatting” unchecked), UB Reader, Marvin, Gerty, and Bibliovore. So if readers like Moon+ and the current version of Freda, which treat a non-breaking-space paragraph as empty and skip it, are supposedly honoring the XHTML standard for empty paragraphs, this seems to be one of those cases where the apps that adhere to the so-called “standard” are actually in the vast minority. (Just as EPUB is the “standard” for e-books, but the vast majority of commercial e-books actually sold are in Kindle format.)

I was discussing this with my friend and occasional TeleRead contributor Felix, and he pointed out that HTML actually has a semantic element for indicating a section break: <hr>, which by default produces a horizontal rule across the page, but can be defined to appear any way you want it to—including as a blank line. But from my point of view, you would run into the same problem as with using CSS to set an upper margin on your section: many e-readers simply ignore it.

It puts me in mind of the saying, “If it’s stupid but it works, then it isn’t stupid.” If &nbsp; is nonstandard and wrong, but it’s also simple to put in, and is likely to be honored even when the e-reading app throws CSS out the window, then is it really “nonstandard and wrong”? It’s like the “descriptivist” vs. “prescriptivist” debate about dictionaries: should they reflect the way people use the English language, or should they tell people how they should use the English language? If many more e-readers honor non-breaking space sections than not, it seems as though that is the new standard for separating sections.

So, the world of EPUB e-readers is kind of a mess when it comes to applying standards. Whether any given reader supports CSS at all is kind of a toss-up. When I look at all this, it just makes me wonder—if apps can’t even agree on simple matters like CSS and paragraph separation, how can they hope to tackle really big issues of support for multimedia, interactivity, and other ways to extend the format? Do we really even have an effective EPUB “standard” at all? Maybe all those people who worry about “improving” the next generation of e-book should set their sights a little lower and see if there’s a way to get the current generation of e-book readers to agree with each other first.

19 COMMENTS

JSWolf May 3, 2016 at 12:16 pm

Adobe Digital Editions, Nook, and Kobo are all the same thing. B&N and Kobo both use RMSDK to display ePub, and RMSDK is the Adobe code used in ADE.

Aaron Shepard May 3, 2016 at 3:34 pm

This whole issue seems a bit silly. If you want a sure method of indicating a break, just insert a line with three asterisks or such.

Blank lines don’t even work in print books, because they may fall at the bottom or top of a page, where they can be missed. The same on a screen. Hoping that a blank line will do the trick is just wishful thinking.

Bill Kempf May 3, 2016 at 4:14 pm

I’m not sure why ePub reader developers seem to think this is “non-standard”. The paragraph isn’t blank, and should not be removed. All browsers I’ve been able to check do the right thing here. For eReaders to behave differently is wrong, and certainly not according to the standard.

That said, though, this is really a bad practice. HTML is intended to be semantic markup, and empty paragraphs are not semantic. Felix is correct: the <hr> tag is the appropriate way to do this. CSS styling with a class attribute on the tag would also be acceptable, but should be the preferred approach. All eReaders should do something appropriate with <hr> even if they override the CSS (which is bad practice).

- Nate Hoffelder May 3, 2016 at 4:43 pm
  
  I agree on the first part, but not the second. It’s not up to the app developers to decide which parts of an ebook’s code are ignored; the app should display the ebook exactly as it is coded.
  
  While this might be a bad practice, I would say that it is actually outdated. It predates CSS and violates modern rules, yes, but that doesn’t matter in the here and now.
  
  Old ebooks which use this practice are still floating around, and developers are still using it. That means that the apps have to display it correctly, or they are failing to do their job.
  
Hrafn May 4, 2016 at 3:52 am

I suspect that the default reader on my Onyx Boox i62HD also ignores these.

I would also suggest that I quite frequently *want* my eReader to ignore CSS, as it allows me to set things like fonts, margins, etc. to my own reading comfort. I also, not infrequently, turn off CSS on webpages, especially where the column width is set unnecessarily narrow (this site, by no means the worst, only uses 1/3 of my screen’s available width for the main article).

- Bill Kempf May 6, 2016 at 9:02 am
  
  The C in CSS stands for cascading. You don’t have to turn off the publisher’s CSS just to apply your own fonts, margins, etc. The publisher’s CSS should be applied first, and your CSS second, and everything should work out great. Granted, the issue there is often having to apply CSS changes that are specific to the book in question. For example, you point out an issue you have with this website using 1/3 of your screen’s width. Because this site uses a complex multi-column layout with classes and identifiers specific to the site, you can’t fix this by applying a generic CSS file after the site’s CSS file. You’d need a site-specific CSS file that used the classes and identifiers coded on this site. While that might still be true for eBooks as well, I think it’s less likely. In general, eBooks don’t use complex layouts but instead just use semantic tags and simple styling, so a generic user CSS could easily be used to apply preferences.
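Bill's "publisher first, reader second" idea can be sketched as an ordered merge. One caveat from the CSS cascade rules: a true user-origin stylesheet loses to the publisher's normal (author) declarations unless its rules are marked !important, so this Python sketch assumes the reading app injects the reader's preferences as a later stylesheet of equal specificity. The stylesheets and the cascade function are hypothetical and reduced to property/value maps for a single selector.

```python
# Hypothetical publisher and reader rules for the same selector (p), with
# equal specificity, reduced to property -> value maps.
publisher_p = {"font-family": "Georgia, serif", "text-indent": "1.2em", "margin-top": "0"}
reader_p = {"font-family": "Literata, serif", "font-size": "120%"}

def cascade(*stylesheets):
    """Merge declarations in order: a later sheet wins for each property it sets."""
    computed = {}
    for sheet in stylesheets:
        computed.update(sheet)
    return computed

print(cascade(publisher_p, reader_p))
# {'font-family': 'Literata, serif', 'text-indent': '1.2em', 'margin-top': '0', 'font-size': '120%'}
```

The publisher's text-indent and margin-top survive because the reader's sheet never mentions them, which is the point Bill makes: you override only what you care about.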
  
Jim Chapman May 4, 2016 at 5:27 am

Responding to Chris’ closing sentence – and speaking as developer of an e-book reader app, I agree 100% that there is a woeful lack of standardisation for ‘e-book rendering as it is really done’. And I think that the EPUB standard-development process is a big part of the problem … with EPUB3, we have a standard that includes everything but the kitchen sink. To build a reading program that is truly compliant with EPUB3 (and all its content types, and all its scripting features, and … ) is a mammoth exercise. I doubt that it could be done for less than $100k; maybe ten times that much.

Even EPUB2 is quite heavy going (because it includes pretty much everything you can do in CSS, and that includes some seriously funky stuff … formatting dependent on parent element-types, block-boxes and overflow, float-over text … ). Doing even a half-assed job of parsing that mess is a heavy programming effort.

This would matter less if we were happy for all e-book reader development to be done by big software enterprises, charging $19.99 a pop for their products. But the market expects its e-reader apps to be cheap (or even free/ad-supported). That necessarily means that they are lightweight simple programs … and as such they can only handle lightweight simple ebook representations. I, for one, would really welcome it if someone came up with an “EPUB0” standard that was just about putting the words on the page, with a decent minimum of support for images, references, tables of contents, titles, chapters, sections, block-quotes, footnotes, and so forth. It would not need to be large or complicated (actually, FB2 format shows the way!). All the rest (colour, alignment, font … ) is something that (in my opinion) we should let the book’s user decide anyway, according to their preferences and reading environment.

All this is of course a personal opinion only; I’m quite aware that reasonable people might differ!

- Felix Pleşoianu May 4, 2016 at 9:40 am
  
  I was just about to jump in and mention the FB2 format (for which I wrote a primitive reader in HTML5), which is entirely semantic: appearance is fully up to the e-reading app. And guess what you do in FB2 to separate sections: you use an <empty-line/> tag. Funny how the authors of the format deemed it essential to have an explicit way to specify section breaks, rather than relying on three asterisks on a line or other such nonsense…
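FB2's approach keeps the markup purely semantic and leaves the rendering to the app. A minimal Python sketch of that division of labor, assuming an FB2 <section> whose children are only <p> and <empty-line/> (real FB2 files declare a default XML namespace, which the sketch strips before matching); render_plain and its break_marker parameter are hypothetical:

```python
import xml.etree.ElementTree as ET

# A simplified FB2 section (namespace omitted for brevity).
FB2_SECTION = """<section>
  <p>The last line of the scene.</p>
  <empty-line/>
  <p>The first line of the next scene.</p>
</section>"""

def local_name(tag):
    """Strip a '{namespace}' prefix such as FB2's default namespace."""
    return tag.rsplit("}", 1)[-1]

def render_plain(section_xml, break_marker=""):
    """Render a section to text lines; the app alone decides how a break looks."""
    lines = []
    for child in ET.fromstring(section_xml):
        name = local_name(child.tag)
        if name == "p":
            lines.append("".join(child.itertext()))
        elif name == "empty-line":
            lines.append(break_marker)
    return lines

print(render_plain(FB2_SECTION))
# ['The last line of the scene.', '', 'The first line of the next scene.']
print(render_plain(FB2_SECTION, break_marker="* * *"))
# ['The last line of the scene.', '* * *', 'The first line of the next scene.']
```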
  
  - JayPanoz May 6, 2016 at 8:16 am
    
    OK, here are the issues to take into account:
    
    1. this <empty-line/> is supposed to be in HTML 5;
    
    2. semantic meaning has changed, it was just a horizontal rule in HTML 4, it’s become a thematic break in HTML5 but its default styling/suggested rendering wasn’t updated accordingly;
    
    3. the three asterisks are actually called an “asterism” (⁂) or “dinkus” (***) and you can actually achieve that by styling since * * * is terrible in terms of accessibility;
    
    5. asterism | dinkus can mean the same as an empty line OR it can mean something else… so this is the reason why in some books, both might be used to bring out the nuances in the logical framework.
    
    Trust me, in some languages, this “three asterisks nonsense” makes a lot of sense, especially when the empty-line is displayed at the bottom/top of a page.
    
    - Felix Pleşoianu May 7, 2016 at 11:03 am
      
      And that’s why I’m insisting on the need for a *semantic* element to signal thematic breaks. Call it <hr>. Call it <empty-line/>. Heck, invent a completely new one called <dinkus>. Whatever. Just let the *app* style it as a dinkus, horizontal line, ornamental graphic or whatever. (I agree that a simple blank space doesn’t work.) Either way, what authors need is a way to *unambiguously convey their intentions*, because even if you manually insert a UTF-8 dinkus character, to a computer it means nothing, and sooner or later someone will mess it up.
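Once the app knows a break is a break, it can pick a rendering that survives pagination. Here is one possible policy, sketched in Python under a simple line-based page model (render_break and the marker strings are hypothetical): draw the preferred blank line mid-page, but switch to a visible dinkus when the break lands on the first or last line of a page, the case that JayPanoz and Aaron Shepard both warn about.

```python
BLANK = ""          # plain vertical space
DINKUS = "* * *"    # visible marker

def render_break(lines_used, lines_free, preferred=BLANK):
    """Return the text to draw for a section break at a given page position.

    lines_used: lines already filled on the current page.
    lines_free: lines still free on the page, including the one the break takes.
    """
    at_top = lines_used == 0
    at_bottom = lines_free <= 1
    if at_top or at_bottom:
        # A blank line at a page edge looks the same as no break at all.
        return DINKUS
    return preferred

print(repr(render_break(10, 20)))  # ''
print(render_break(0, 30))         # * * *
print(render_break(29, 1))         # * * *
```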
      
      - Maciej “Nux” Jaros May 7, 2016 at 1:34 pm
        
        Actually if you need semantic markup, then most books should only consist of `section` in which you have paragraphs `p`. If you have long chapters then just use `section` for chapter and in it add `section`. Then just use CSS to add whatever background image. Problem is e-reader creators would need to agree to let go and let creators of books make good products.
      - Felix Pleşoianu May 12, 2016 at 1:45 am
        
        Unhhh… How about NOPE. That’s the very opposite of semantic markup. Most books also have chapter titles — those are neither sections nor paragraphs. Also mottos. And if they’re non-fiction, you’re absolutely going to have citations, bullet point lists and subheadings. The very *concept* of SEMANTIC markup means each of these needs a tag of its own, so that the app can tell what each piece of text is MEANT to be. That’s what “semantics” *means*: the *meaning* of things.
        
        Why is that important? At the very least, because if the e-reading app knows what the author means, it can render each bit of text in a suitable style without ever needing CSS. You know, much like web browsers do by default. And while a web page without CSS is undoubtedly ugly, it’s nevertheless *fully functional* out of the box.
        
        The likes of “b” and “i” have their purpose. But don’t tell me that manually inserting a bullet point at the start of a paragraph turns it into a list item, because that’s not how it works at all.
      - Maciej “Nux” Jaros May 12, 2016 at 3:49 pm
        
        Chapters are sections. Sections inside a section are subsections. That’s how you structure things in HTML 5. For books this is really very simple. Most of them won’t have boxes of text flying around. They do not have articles or lists inside, and so on.
        
        Of course you are right that some books might need more complicated markup, especially how-to and science books. That’s not what I would read on an e-reader, but yes, those would still be e-books.
        
        Not sure what you were getting at at the end. I think you’ve misunderstood my previous comment. I would not suggest using `b` and `i` as they are not semantic elements. Semantic equivalents are of course `strong` and `em` (emphasis).
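Nux's nested-section structure lets an app derive scene breaks from the markup alone: the boundary between two sibling subsections is a break, and the app chooses how to draw it. A Python sketch, assuming a chapter written as a <section> containing <h1>, <p>, and nested <section> children, with no XHTML namespace (real EPUB content documents use the XHTML namespace, which this sketch ignores); flatten_chapter and the marker are hypothetical:

```python
import xml.etree.ElementTree as ET

CHAPTER = """<section>
  <h1>Chapter 1</h1>
  <section><p>Scene one.</p></section>
  <section><p>Scene two, <em>continued</em>.</p></section>
</section>"""

def flatten_chapter(section, marker="* * *"):
    """Flatten nested sections to text lines, marking sibling-subsection boundaries."""
    lines = []
    previous_was_section = False
    for child in section:
        if child.tag == "section":
            if previous_was_section:
                lines.append(marker)  # the app decides how a break looks
            lines.extend(flatten_chapter(child, marker))
            previous_was_section = True
        else:
            # Headings and paragraphs: keep their text, including inline markup.
            lines.append("".join(child.itertext()))
            previous_was_section = False
    return lines

print(flatten_chapter(ET.fromstring(CHAPTER)))
# ['Chapter 1', 'Scene one.', '* * *', 'Scene two, continued.']
```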
- Nux May 5, 2016 at 3:25 am
  
  @Jim Chapman It’s not a huge effort to support HTML and CSS. There are engines like WebKit which are ported to almost any platform. In an Android-based reader all you have to do is use a WebView to display HTML.
  
  As with standard HTML pages – creators of EPUB should be allowed to be wrong. If they want to use paragraphs with nbsp for spacing they are allowed to do that. If they want to use… E.g. yellow text on white background – they are allowed to do that. Their product will probably not sell, but that’s not a problem of e-readers.
  
  All e-readers should behave consistently and use same standards. We’ve been through this in browsers and there is no other way.
  
  Of course you can have an option in e-reader to override some styles, but default behaviour should be to respect them.
  
  - Jim Chapman May 5, 2016 at 8:50 am
    
    @Nux: The biggest problem with using WebView is that ebook users often expect to see a page-by-page view of the text, rather than a single vertically-scrolling window. Depending on the particular set of APIs that the platform’s WebView exposes, that may be impossible … or it may be difficult (= expensive) to code … or it may give terrible performance, particularly on mobile devices.
    
    There are other smaller problems (handling of image formats not supported by the WebView, management of bookmarks, following references, showing footnotes, interpreting paths within the archive, etc.) but it’s the ‘paginated’ view that is the killer … and is the reason why so many ebook reader apps (Bookviser, FBReader, Moon+, Freda, … ) do not use the approach that you suggest. To get a paginated view the developer has to hand-code the parse-and-present logic, normally in a way that does not fit the WebKit/MSHTML/EdgeHTML models.
    
    Therefore, reading XHTML and applying CSS styles is basically a ‘roll-your-own’ development task, which gives results just as bad as you might expect.
    
    That is why basing EPUB on HTML+CSS was basically a bad decision by the standards-writers: books are not web-sites.
    
    - Maciej “Nux” Jaros May 5, 2016 at 4:27 pm
      
      Most EPUB files are very simple HTML code. Performance would not be a problem.
      
      But yes I guess controlling pageflow might need a bit more work. There are some solutions to work from though: http://stackoverflow.com/questions/3636052/html-book-like-pagination
      
      I understand that e-readers are not simply something to display books. But having something to do most of the rendering work (a rendering engine), especially as standards get more complicated, is a clear win that browser vendors recognized a long time ago.
      
      - Jim Chapman May 6, 2016 at 4:29 am
        
        Exactly: those stackoverflow posts confirm my point. None of them actually solves the pagination problem (they don’t deal with images well, and can result in a line of text whose top half is on one page, and whose bottom half is on another). Nor do they allow hit-testing (for instance, to add a bookmark or highlight to a word).
        
        The underlying problem is that none of the engines (Webkit/EdgeHTML/MSHTML) gives you an API to hook/call-back ahead of the content-rendering stage, and none of them has any built-in support for rendering to a paginated canvas (or better yet, an off-screen canvas with overflow/bounds detection).
        
        The Artifex4 app is the best effort that I have seen at rendering using your WebView approach; it did a decent job … but even so, its performance was sometimes patchy, and it lacked some features (exactly the ones that I noted above as being problematic in this approach). The app has been off the market for a year or so.
        
        I’m sorry to bang on about this – but I have personally spent months wrestling with this problem, and they have been wasted months. Every time that I have tried to switch Freda over to a Webkit/whatever approach, I’ve eventually hit a brick wall, and had to go back to my tried, tested (and clunky) custom parsing and rendering. I agree with you – it would be nice if this stuff worked. But it doesn’t.
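Jim's complaint that the web-based pagination hacks can cut a line of text in half comes down to one rule a custom renderer has to enforce: pages are built from whole lines. A greedy Python sketch of just that rule, assuming lines have already been laid out with known heights (images, widows and orphans, and hit-testing are all left out; paginate is a hypothetical helper, not Freda's code):

```python
def paginate(line_heights, page_height):
    """Greedily assign whole lines to pages, never splitting a line across pages.

    Returns a list of pages, each a list of line indices.
    """
    pages, current, used = [], [], 0
    for index, height in enumerate(line_heights):
        if height > page_height:
            raise ValueError(f"line {index} is taller than a page")
        if current and used + height > page_height:
            pages.append(current)  # this line would overflow: start a new page
            current, used = [], 0
        current.append(index)
        used += height
    if current:
        pages.append(current)
    return pages

print(paginate([20, 20, 20, 20, 20], page_height=50))
# [[0, 1], [2, 3], [4]]
```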
- JayPanoz May 6, 2016 at 7:45 am
  
  Well, in that case, you should provide feedback really fast on epub revision https://github.com/IDPF/epub-revision/issues because the second draft has just been published and there are some big changes in there.
  
  Also, considering non-fiction books, for which some styles may serve the function of editorial design e.g. asides, pull-quotes, etc. because there is a lot of different types of content to manage, I don’t necessarily agree the whole styling should be up to the user as visual hints will be lost in overrides, degrading comprehension, etc.
  
  CSS overrides are currently being discussed on the epub revision repo, and some have been putting a lot of effort into this particular issue as it has a huge impact for this type of books. Maybe it’s time Reading Systems developers discover that books are not necessarily novels—and nope, fixed-layout is no solution as it wasn’t created for text-heavy non-fiction in the first place.
  
Chris Meadows May 7, 2016 at 3:18 pm

Added some further thoughts on this matter.
