A few weeks ago on The eBook Community, in three lengthy articles (1, 2, and 3), I outlined some thoughts and requirements for an open standards/open source e-book mastering system intended to be used by smaller ebook publishers. This article is a progress update. We are actually working on the system!

To summarize, the e-book authoring system is envisioned to enable “almost push button” conversion of a single and fairly simple master XML document into most, if not all, ebook formats in use today and tomorrow. Example formats of interest include OpenReader, OEBPS, native dotReader, Mobipocket, LIT, PDF, Plucker, Palm Reader, XHTML, etc.

The master XML document itself would be authorable in applications which smaller publishers and even individual authors will hopefully find comfortable and foolproof to use.

Since that first series of articles, we’ve made great progress on the design of the mastering format — the core of the system — and this blog article is an update of where things currently stand.

As noted in the prior articles, designing this system, and the “mastering format” in particular, is not a simple matter. Several competing requirements have to be met, which forces compromises, so the system can’t be all things to all people.

Thus, the system will only be able to handle the simpler types of books, such as fiction. However, I believe it will meet the “80-20” or even “90-10” rule for most smaller publishers, who tend to publish mostly fiction and simpler non-fiction, and thus prove useful to them.

Now to get a little more technical.

The core of the system is what’s termed the mastering “vocabulary” — the elements and attributes used in the XML master document. We’ve been working on the vocabulary (and the associated grammar) the last few weeks, trying to reconcile the requirements, and finally have something to show. A precursor to this latest vocabulary has already been successfully “field tested” with an independent ebook publisher (using epcEdit as the authoring tool) showing the current approach just may work.

Now to get really technical. For those who know XML, the current draft DTD of the “SimpleBook” vocabulary is available (note that this DTD will change often, so be sure to work with the latest version).
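
To illustrate what a fixed vocabulary and grammar mean in practice, here is a minimal Python sketch of a closed-vocabulary check. The element names (`book`, `title`, `chapter`, `p`, `em`) and the `ALLOWED_CHILDREN` table are hypothetical stand-ins, not the actual SimpleBook DTD. A real DTD also constrains element order and attributes, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary (NOT the real SimpleBook DTD): each element
# maps to the set of child elements it is allowed to contain.
ALLOWED_CHILDREN = {
    "book": {"title", "chapter"},
    "title": set(),
    "chapter": {"title", "p"},
    "p": {"em"},
    "em": set(),
}


def find_violations(xml_text):
    """Return messages for every element that appears where the vocabulary forbids it."""
    root = ET.fromstring(xml_text)
    if root.tag not in ALLOWED_CHILDREN:
        return [f"unknown root element <{root.tag}>"]

    problems = []
    stack = [root]
    while stack:
        parent = stack.pop()
        allowed = ALLOWED_CHILDREN[parent.tag]
        for child in parent:
            if child.tag not in allowed:
                # Report the misplaced or unknown element and do not descend into it.
                problems.append(f"<{child.tag}> not allowed inside <{parent.tag}>")
            else:
                stack.append(child)
    return problems


if __name__ == "__main__":
    sample = "<book><title>T</title><chapter><p>Hi <b>x</b></p></chapter></book>"
    for message in find_violations(sample):
        print(message)
```

Because the table is closed, anything outside it is reported rather than silently passed through, which is the property that makes shared style sheets and converters feasible.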

In addition, for those who use the latest version of Opera (version 9), a demonstration (and silly) XML master document along with a CSS style sheet (with silly styling to assist with document visualization) is viewable. (The latest Firefox renders this demo XML document pretty well except that we haven’t figured out how to get CSS to number lists. Forget using Internet Explorer, even version 7, since it has insufficient CSS support, particularly with selectors.)

We seek your help. Here are the areas where we need it:

  1. Finish the master vocabulary. No doubt it can be improved, so your feedback is appreciated!

  2. Work on adapting the master vocabulary to existing authoring applications. Some applications of interest include Word 2003/2007 (most publishers use Word), Vex (an open source WYSIWYM XML editor of great promise), and epcEdit (a fairly expensive tool, but it worked wonderfully, and it has a 60-day free trial period).

    Of course, the more technically savvy small publishers will be able to master using a simple UTF-8 capable text editor (such as EditPad Pro, BabelPad or even Windows Notepad.) Over time some publishers may find using a text editor will be the easiest way to author the master documents. I do. But for the time being we have to meet the small publishers on their turf, and that turf is rooted in word processors.

  3. Start working on scripts to convert the master into other formats, such as a standardized XHTML 1.1 which in turn can form the basis for other formats such as OpenReader, OEBPS, LIT, Mobipocket, etc.

  4. Start work on a library of CSS style sheets (for the master format as well as the equivalent standardized XHTML 1.1.)

    Note that the master vocabulary is being designed so tightly (non-extensible and fully structural) that it will allow developing an open library of style sheets so publishers need not have to hand-author their own CSS, but rather can use and/or adapt others.

  5. We need to rename the system and vocabulary from “SimpleBook” to something else since the name “SimpleBook” is already taken by another company for a totally different product. Any ideas?
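
The conversion step in item 3 could start from something like the following Python sketch, which maps a tiny master document to XHTML markup. The master element names and the `TAG_MAP` table are hypothetical and are not taken from the SimpleBook DTD. The sketch also omits the XHTML 1.1 doctype and any error handling.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical mapping from master elements to (XHTML tag, class attribute).
TAG_MAP = {
    "chapter": ("div", "chapter"),
    "p": ("p", None),
    "em": ("em", None),
}


def convert(node, parent_out, depth):
    """Copy one master element, and everything below it, into the XHTML tree."""
    if node.tag == "title":
        # The book title becomes h1; titles nested deeper become h2.
        tag, css_class = ("h1" if depth == 1 else "h2"), None
    else:
        tag, css_class = TAG_MAP[node.tag]
    out = ET.SubElement(parent_out, tag)
    if css_class:
        out.set("class", css_class)
    out.text = node.text
    out.tail = node.tail
    for child in node:
        convert(child, out, depth + 1)


def master_to_xhtml(xml_text):
    """Return an XHTML string for a master document rooted at <book>."""
    book = ET.fromstring(xml_text)
    html = ET.Element("html", {"xmlns": XHTML_NS})
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = book.findtext("title", default="")
    body = ET.SubElement(html, "body")
    for child in book:
        convert(child, body, depth=1)
    return ET.tostring(html, encoding="unicode")


if __name__ == "__main__":
    master = (
        "<book><title>Demo</title><chapter><title>One</title>"
        "<p>Read <em>this</em> now.</p></chapter></book>"
    )
    print(master_to_xhtml(master))
```

A standardized intermediate like this is what would let one set of downstream converters and style sheets serve every title, rather than each publisher writing their own.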

Let me know by email how you’d like to help. We will start a dedicated discussion group if there’s enough interest.

20 COMMENTS

  1. I do like ConciseBook, and will add it to the candidate list.

    Regarding NoringBook. Naw. (But thanks for suggesting it, I think <smile/>)

    Orca or OrcaBook is also a very good name. I’ll ask Lee Passey, who first thought of “Orca”, if he’d let us use that name for this purpose. He may have other plans for the name.

    Ron, can you elaborate on what O’Reilly does? It is my understanding that they use a variant of DocBook for mastering their books. DocBook is much more complicated than what “SimpleBook” needs (it is great for technical documentation and similar types of docs), but a subset could be chopped out and considered.

  2. O’Reilly didn’t give me that much information.

    I razzed them a bit because their PDFs that they sell aren’t formatted very well for the current crop of eInk eBook readers.

    They responded saying that all their books are stored as XML and that they can produce any type of book (print or eBook) that they want. When a dominant eBook format comes out, they would be able to easily support it.

  3. Wow, lots of interest in coming up with a name.

    At one time I had “unibook.org” (for UniBook) reserved, but that lapsed and it is now taken by a squatter.

    OpenBook is too close to Open eBook, etc.

    123write — aren’t there other products with ‘123’ in it?

    RiteBook (or RightBook) — I think of “RiteAid” :^(

    BookPress is interesting, but press almost always refers to a publisher, so it may be confusing.

    Eprint, I don’t know. I think of Palm, Adobe, and Sony with that.

    But the key is to keep thinking, and use the proposed names to spur more creativity. As Mulder might say, there is a name out there (I do like OrcaBook a lot, and will ask Lee, but Lee may want to reserve that for something else.)

  4. “2. Work on adapting the master vocabulary to existing authoring applications.”

    Any plans to work on something for InDesign? We currently publish PDFs from InDesign, and my understanding is that it has built in capability to work with DTDs. My ideal workflow would be to design for PDF and also export a format more e-reader friendly.

  5. Am I to understand correctly that this system is not software, just the engine that can be used in future software?

    I’m sad to see the industry has been so slow in trying, much less adopting, any open standard. I am working on a long-term web programming project and really, really would love to be writing documentation as-I-go, to not miss anything. But no way do I want to create something in a proprietary format, or even XML (too difficult when I’m already working on something else).

    It would be so nice to be able to open up a text editor and simply create content in OpenReader format without having to learn their DTD. The day that happens, the eBook industry will explode!

    I am with many people that do NOT want to see books replaced by electronics — it’s harder on the eyes for one thing, and it’s just NOT relaxing (sitting at a computer/sitting where I work — for me the same thing). But the ability to easily read and write documentation and technical books is another thing altogether, when you are working on the computer to begin with (I read programming-language manuals that way).

    I enjoy ThoutReader and hope I will soon be able to write the documentation for my project and read it as I go along.

    Kudos to you guys for contributing for the benefit of all of us.

  6. Because there is no money to build the “Simple Book” system at this time, it is important to build the core components and hope that the open source community will build the various converters and the adaptations to existing authoring tools that smaller publishers could use.

    The core to make this happen is the vocabulary itself, so the focus of effort is to get a vocabulary rich enough to meet the 80-20 or 90-10 rule for smaller publishers and to integrate into existing authoring systems. In addition, the vocabulary must be “rigid” enough (no looseness) to allow standardized style sheets and reliable conversion (“perfect rice every time”). As soon as one allows “loosey-goosey” markup, the system can no longer meet the various requirements and is essentially useless.

  7. I like what has been accomplished with the master document systems but I would recommend including Speech Synthesis Markup Language (SSML).

    http://www.w3.org/TR/speech-synthesis/

    Specifically, it includes a voice tag. This will allow speech engines to read the book aloud without the need for voice actors and an audio production studio.

    Most new E-Book readers include audio out, usually mp3 player function. With SSML the reader could actually read it or even include pronunciation of specific names and titles.

  8. Thanks, RockApe, for suggesting SSML. Although the design of the “Simple Book” is focused heavily towards quite fine description of both structural and text semantics, I’ve not looked specifically at integrating it with text-to-speech engines. I believe, though, that the high semantic/structural quality will be quite compatible with high-quality text-to-speech. For example, Simple Book requires inline highlighted items to be described for what they are, and not only that they are emphasized. This is important for properly presenting highlighted items: “is it a linguistic emphasis, or a title to a book?”

    Although I’d like to see publishers think about text-to-speech, they are so overwhelmed as it is producing a publication for visual presentation that they have little energy left to make their publications highly accessible. This is one reason for the Simple Book design, so they don’t have to worry about this yet be able to produce something that should be of high accessibility.

    Certainly, I hope those reading this who are familiar with the nuts-n-bolts of accessible markup will look over the Simple Book DTD and suggest any changes/additions to improve it for text-to-speech and tactile (Braille) delivery systems.

  9. An update on the new name to replace “SimpleBook”.

    After looking over a bunch of suggestions (all of them great!), getting feedback from a few people, thinking through the ramifications of the various names, etc., one candidate name is floating to the top of the list:

    BookX

    It’s a simple name, intended to be pronounced “books”, and includes the “X”, short for XML. This name also allows for a simpler subset of BookX, which may be needed, e.g., “BookX-Lite”. In addition, there are no trademark encumbrances to the name, and the bookx.net and bookx.org domains are available. The one downside is that we can’t get the bookx.com domain, which is held by an investment company (oddly enough).

    What does everyone think of BookX to be the replacement name for “SimpleBook”?
