If OpenReader is going to be open, that won’t mean “opens in any e-reader.” It won’t mean “the spec is open to all.” No, the open in OpenReader should stand for open-ended. Forget XHTML tags or DocBook and TEI. An OpenReader file should be able to include text marked up in any fashion whatsoever, so long as it’s well-formed XML and accompanied by rendering information. (Part I)

MathML and SVG are slated for inclusion in OpenReader, apparently from the get-go. Displaying this kind of information in an e-book is a good idea, and you know what? The same problem of specifying book markup vocabularies applies to these markup vocabularies as well.

Today we propose including MathML. This is super-critical for e-textbooks, and the math community has been working for years to get MathML settled, plug-ins constructed, and rendering and calculation built into apps. Hooray.

But why MathML and not, say, Chemical Markup Language, whose markup is intended for rendering molecular structures visually? And what happens when a HistoryTimelineML and a FlowchartML and an ElectricalEngineeringDiagramML all appear? Will these be included as well?

The math people were building on their years of expressing equations in TeX and LaTeX, and so they got their “standard way to mark up things in our field” settled first. That’s not sufficient reason for OpenReader to say math in XML is OK, but everything else (molecules and flowcharts and EE diagrams and so on) has to be graphics.

Let’s face it. There will be a never-ending stream of such visual vocabularies. Adding them all isn’t possible. Using plugins for this sounds plausible: “You can only view FlowchartML fragments if your e-reader has the FlowchartML plugin.” It works with browsers. But how does an e-book publisher ensure that the plugin works in every OR-compatible e-reader?

Well, in my opinion, the OpenReader spec should require it. It should say the e-reader has to accept any conversion plugin that takes a something-or-other-ML fragment as input and generates SVG as output.
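Here is a rough sketch of what such a conversion hook might look like, written in Python purely for illustration. Nothing in it comes from the OpenReader spec: the names `register_plugin` and `render_fragment`, the FlowchartML vocabulary and its `urn:example:flowchartml` namespace are all made up. The only idea carried over from above is the contract itself: a something-or-other-ML fragment goes in, SVG comes out.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

SVG_NS = "http://www.w3.org/2000/svg"
FLOWCHART_NS = "urn:example:flowchartml"  # hypothetical namespace, for illustration only

ET.register_namespace("", SVG_NS)  # serialize SVG elements without an ns0: prefix

# Assumed hook signature: the e-reader hands over the parsed XML fragment
# and gets an SVG root element back.
Converter = Callable[[ET.Element], ET.Element]
_plugins: Dict[str, Converter] = {}


def register_plugin(namespace: str, converter: Converter) -> None:
    """Associate a vocabulary's namespace with its to-SVG converter."""
    _plugins[namespace] = converter


def namespace_of(element: ET.Element) -> str:
    """Return the namespace URI of an element, or '' if it has none."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def render_fragment(fragment_xml: str) -> str:
    """Find the plugin for the fragment's vocabulary and return SVG markup."""
    root = ET.fromstring(fragment_xml)
    converter = _plugins.get(namespace_of(root))
    if converter is None:
        raise LookupError(f"no plugin registered for {namespace_of(root)!r}")
    svg = converter(root)
    if svg.tag != f"{{{SVG_NS}}}svg":
        raise ValueError("plugin did not return an SVG root element")
    return ET.tostring(svg, encoding="unicode")


def flowchart_to_svg(root: ET.Element) -> ET.Element:
    """Toy plugin: draw each <step> of the made-up FlowchartML as a labeled box."""
    steps = root.findall(f"{{{FLOWCHART_NS}}}step")
    svg = ET.Element(f"{{{SVG_NS}}}svg",
                     {"width": "160", "height": str(40 * len(steps))})
    for i, step in enumerate(steps):
        y = 40 * i
        ET.SubElement(svg, f"{{{SVG_NS}}}rect",
                      {"x": "5", "y": str(y + 5), "width": "150", "height": "30",
                       "fill": "none", "stroke": "black"})
        label = ET.SubElement(svg, f"{{{SVG_NS}}}text",
                              {"x": "15", "y": str(y + 25)})
        label.text = step.text
    return svg


register_plugin(FLOWCHART_NS, flowchart_to_svg)
print(render_fragment(
    '<flowchart xmlns="urn:example:flowchartml">'
    "<step>Start</step><step>Finish</step></flowchart>"
))
```

Keying the lookup on the XML namespace is just one plausible choice. The point is that the e-reader needs to know nothing about FlowchartML itself, only how to find a converter and what to expect back from it.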

More and more, I see I’m disliking a pre-decided vocabulary in the OR spec (see part I). And this dislike extends beyond text markup to these tricky-but-invaluable visual-oriented vocabularies. SVG seems essential to handling latecomers to the party, so I’ll opt for that. And there’s no sense in leaving out MathML if browsers are able to render it natively and the rendering engines are used in e-books. But for everything that’s coming after? Let’s leave the door open.

9 COMMENTS

  1. Roger speaks from the heart, and I appreciate the possibilities of flexibility. The trick is not to be so plug-in reliant that consumers will be confused. Also, I think it important to get a standard out in a timely way while continuing to give the world ample opportunity for input. Let’s see what Jon says. And other people, too! Folks, jump into the debate, which I appreciate Roger starting. Meanwhile see Paolo Biggio’s take. – David

  2. An ebook format that is “extensible” does appear desirable but it is also potentially extremely dangerous when improperly implemented. The extensibility must be very carefully defined and circumscribed because there is another property that is much more important than extensibility. The ebook format must be “safe”. What is safety? Consider some formats that are unsafe: “Microsoft Word” format can carry viruses; “HTML” can report information about usage to distant websites; some media formats can contain embedded URLs that cause your browser to open up on a distant website; some media formats are sufficiently complicated that readers crash when opening ill-formed examples. Unfortunately, if a format causes a reader to crash it reveals an avenue of possible exploitation. A crash might allow a denial-of-service attack, or the installation of viruses, Trojans, spyware, and other malware.

    The cybersphere is now replete with malicious software and media files. The user of an ebook reader should not be burdened with worries of malware infestations each time he or she acquires a new ebook. Indeed, the text of an ebook may describe a dangerous adventure; however, the simple act of opening an ebook should not actually be a dangerous adventure.

  3. Garson makes a useful point. Note that I suggest only that OpenReader require any XML vocabulary describing non-text content to be translated to SVG, and that OpenReader require that plugins for these special vocabularies provide standard hooks for the e-reader to supply the XML and receive back the SVG.

    If malicious SVG can be constructed, I assume the browser makers have had to deal with that and e-reader makers should be able to implement the same solutions. If there are ways for malware to take over a program by crashing it, I’d say the spec can’t provide any protection and that the e-reader makers themselves have to defend the program.

    Keeping one’s eye on vulnerabilities and the possibility of being attacked seems like a good idea no matter what. It’s certainly not one I’ve heard brought up in conjunction with e-book software or formats before.

  4. If an ebook format is “extensible” then the designer of the format immediately confronts issues of document “searchability”. Consider the intriguing example given by Roger Sperberg:

    <scripture passage="Mark 7:16" version="NKJV"> If anyone has ears to hear, let them hear! </scripture>

    Suppose an ebook user comes to a verse by Mark and wants to search for the next scriptural entry by Mark in the ebook. Can he or she do it? How does the search engine deal with these new tags? If a user searches for the word “passage” then he or she probably wants to find the word “passage” in the underlying text and not in a tag. The proper graphical rendering of tags is only part of the knowledge needed to properly make an “extensible” standard. Perhaps there should be a way to “extend” the search engine of the ebook to handle the new tags. But this can be difficult.

    Consider the example of Chemical Markup Language. Suppose a flexible ebook standard can be extended to accept and render Chemical Markup Language. Now suppose a user wants to search the ebook for molecules with hydroxyl groups, can he or she do it? Suppose the user selects one molecule and wants to find a molecule in the ebook with different chirality? These tasks might be reasonable for a chemistry textbook. Yet, an extension to the search engine is probably needed.

    I do not wish to dissuade designers from allowing for “extensibility” in an ebook standard, but I do wish to raise an issue. If the extension only changes the graphical output then in some ways it is a “shallow” form of extensibility. However, allowing for all “useful” forms of extension is probably too difficult.

  5. It seems to me that specifying the search capabilities in an e-book reader would be going too far, just as specifying the physical characteristics of an e-reading device, such as sub-pixel rendering, would be. I don’t think that’s appropriate for a spec.

    Aside from that, I will admit that I haven’t thought about the issues you raise, and I think they are provocative.

    I suppose if the molecule markup included “hydroxyl” or “chirality” as elements, attributes, attribute values or element content (I’m not familiar enough with either chemistry or CML to ascertain that), then the search engine could look for those strings in the original XML of the e-book. Maybe the advanced search would let you specify which of these you want to include in your searches.

    As I say, this would be encroaching on the role of the e-reader developers. It’s clear, though, that current search engines would need to be enhanced, as you say. Of course, the full XML is right there to look at, since that’s the e-book format. The graphic molecule on-screen would be rendered from a transformation of the CML into SVG.

    Again, note that XML editors have been searching XML files for many, many years, so it is not like e-reader makers have to come up with something no one has ever dealt with before.

    Garson, in your two posts you’ve raised issues that have been overlooked in previous discussions. I hope you will participate in the discussions of the OpenReader spec once this preliminary version Jon Noring has mentioned is made public.

  6. Garson, I second Roger’s enthusiasm for having you involved! We need to consider all the angles, and you’ve come up with some very important ones. Jon is the real mark-up expert—I’m not—and I am confident he’d agree. Meanwhile let the discussion continue! David

  7. I have a couple of points to make:

    1) Although I have nothing against chemistry, MathML is more important than it and some of those other things that were mentioned. Every student is supposed to learn at least some math in high school, whereas only a few take chemistry. Virtually half of a typical university’s depts use math in their teaching. I could go on. Of course, MathML wouldn’t want to go up against SVG!

    2) We at Design Science are heavily involved in MathML and would love to see a universal browser/word processor/presentation/whatever application plugin model capable of handling the requirements for MathML display. We believe that these requirements are not that hard to meet and that such a plugin mechanism would be used by developers, open and closed source, to add support for many, many XML-based languages. I know that there will be some that don’t want to hear it, but Microsoft’s Internet Explorer has such an interface and we found it powerful enough to handle MathML (see our MathPlayer plugin). It is not standards-based, like most Microsoft stuff, but the basic functionality is quite sound. Let’s create such a standard.

    Paul

  8. MathML has different characteristics than a FlowchartML, and it’s already in place for any e-reader that leans upon the available browser rendering capabilities. So I don’t have any compunction in saying an OpenReader-compliant e-reader should be able to render MathML. Computer too, I guess.

    It’s only once we get past MathML that I suggest SVG be the route of commonality for standardized plugins.

  9. Thanks for the positive responses to my posts on this thread from Roger Sperberg and David Rothman. I appreciate the wonderful work that you do on this blog and elsewhere and will try to participate constructively.
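
The search question raised in comments 4 and 5 comes down to where a search is allowed to look: in the text a reader sees, or in the tags and attributes around it. The sketch below, in Python with the standard `xml.etree.ElementTree` module, is only an illustration of that distinction and not a proposal for the spec. The `search` and `next_passage` helpers and the sample book are invented for the example; only the `<scripture>` element with its `passage` attribute comes from the thread, and the verse text of the other passages is deliberately left as a placeholder.

```python
import xml.etree.ElementTree as ET

# Made-up sample book built around the <scripture> example from comment 4.
BOOK = """<book>
  <p>This passage introduces the sermon.</p>
  <scripture passage="Mark 7:16" version="NKJV">If anyone has ears to hear, let them hear!</scripture>
  <p>Some commentary.</p>
  <scripture passage="Luke 8:8" version="NKJV">(verse text omitted)</scripture>
  <scripture passage="Mark 4:9" version="NKJV">(verse text omitted)</scripture>
</book>"""


def search(root, needle, scopes=("text",)):
    """Return (scope, tag) pairs for elements matching needle in the chosen scopes."""
    needle = needle.lower()
    hits = []
    for el in root.iter():
        if "text" in scopes:
            # el.text precedes the first child; each child's tail is text that
            # follows the child but still belongs to el.
            chunks = [el.text] + [child.tail for child in el]
            if any(c and needle in c.lower() for c in chunks):
                hits.append(("text", el.tag))
        if "tag" in scopes and needle in el.tag.lower():
            hits.append(("tag", el.tag))
        if "attribute" in scopes:
            if any(needle in name.lower() or needle in value.lower()
                   for name, value in el.attrib.items()):
                hits.append(("attribute", el.tag))
    return hits


def next_passage(root, current, book):
    """Return the next <scripture> after current whose passage is from book."""
    seen_current = False
    for el in root.iter("scripture"):
        if seen_current and el.get("passage", "").startswith(book + " "):
            return el
        if el is current:
            seen_current = True
    return None


root = ET.fromstring(BOOK)

# Default search looks only at the text a reader sees.
print(search(root, "passage"))
# An "advanced" search can opt in to attribute names and values.
print(search(root, "passage", scopes=("attribute",)))
# Tag-aware navigation: from Mark 7:16 to the next entry by Mark.
first = root.find("scripture")
print(next_passage(root, first, "Mark").get("passage"))
```

With the default scope, a search for “passage” finds only the paragraph that uses the word, not the three `<scripture>` elements that carry a `passage` attribute. Jumping to the next entry by Mark, on the other hand, only works because the helper reads the markup. Whether choices like these belong in a spec or in each e-reader is exactly the disagreement in the thread.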
