ePubWriter for small publishers and self-publishers? A TeleRead challenge to the open source community

Over at Twilight Times Books, publisher Lida Quillen recently came out with "advance promo" editions of The Solomon Scandals in both E and P. I can't tell you how much of a bother it's been to get Scandals properly into different formats, such as ePub, HTML and PDF. Imagine having to worry about corrections as they'll show up in half a dozen or so formats, each with its own rude surprises. Even now the job still isn't done. Tech complexities are no small reason why we call the existing files "advance promo copies."

Today's e-publishing tools, including pricey ones selling for hundreds of dollars, just don't work that well or fail to include enough capabilities. Oh how Lida and colleagues must hate the hassles of dealing with Word and RTF files so that paragraph breaks show up in the right places. And then there are other joys (sarcasm alert!) such as distinguishing between neutral quotes and the directional variety. Lida and colleagues are not at fault. It's the damn technology, which is still far, far more difficult and time-consuming to use than it should be.

A challenge to the open source community

With the above in mind, I wonder if the time hasn't come for the OpenOffice crowd or others in the open source community to consider designing a multilingual program from scratch for ePub creation and other publishing activities. The OpenDocument format has its purposes, but book publishing shouldn't be regarded as a major one. ePubWriter, as I'll call the proposed app, would offer all the capabilities of Writer but also output smoothly into ePub and HTML and, for printers, PDF. Ideally ePubWriter could even help deal with the inherent conflict between ePub (reflowable) and PDF (nonreflowable). Let there be an easy way to see exactly what the finished p-book will look like even if the ePub version will be reflowable. And let writers be able to tweak to their heart's content while seeing the final results of their changes.

Avoiding cognitive overload

Yes, yes, yes, I like the semantic approach, but it adds to the complexities of doing WYSIWYG—the very stuff so dear to most publishers and writers. The best systems might ask users to set defaults to cover many situations; perhaps there could even be standard choices to cover common structures and meanings found in novels and all that. I don’t know. I just know that the more effort writers put into quasi-programming, the worse their prose is likely to be. Talk about cognitive overload! It’s a problem for writers, too, not just readers.

I find it disconcerting that so many brilliant technical people lack empathy with writers and other civilians and create programs for themselves rather than the world at large. Ideally ePubWriter would be different and would be coded with lots and lots of participation from small publishers—ordinary publishers, not simply the technically inclined ones.

Potential funders

OpenOffice.org is strapped for resources. But maybe, along with the small publishers whom the OpenOffice people would be helping, literary houses included, the group can make a pitch to the MacArthur Foundation, the Mellon Foundation or similar organizations.

MacArthur, meanwhile, just may want to reconsider its strategy and focus on ePub-related efforts; that is the agreed-on standard of publishers. Sophie, which received a million from MacArthur, in addition to Mellon money, has been extraordinarily valuable as a way to demonstrate such capabilities as shared annotations, but now it’s time for the nonprofit world to bring these concepts to mainstream publishing. Such efforts could prove helpful not just to small presses here in the States, but also in developing countries, where the Net could revolutionize publishing and library use in places where existing resources are sparse or nonexistent. That means powerful, well-integrated programs that publishers can use for content creation for both E and P.

Don’t force writers and small publishers like Lida—wherever they are on the planet—to adjust to technology. Let the tech do the bending.

18 Comments on ePubWriter for small publishers and self-publishers? A TeleRead challenge to the open source community

  1. Small projects are starting to emerge. I’ve heard good things about Calibre (http://calibre.kovidgoyal.net/), though haven’t tried it yet. And eCub (http://www.juliansmart.com/ecub) is very straightforward to use, though I’ve yet to test it thoroughly.

    Perhaps the best place to focus developers’ energies would be on the DAISY pipeline (http://www.daisy.org/projects/pipeline/), which is already well developed. DAISY and epub are very closely related.

  2. Thanks, Arthur. Those are intriguing ideas. Together DAISY and Calibre and other projects could form something bigger, though I still love the idea of building something around the popular OpenOffice. Meanwhile, let me try to track down a link to a piece reporting problems with the organization behind OpenOffice. I’m not going to gloss over challenges.

    David

    Update: OpenOffice.org problems mentioned here.

  3. OpenOffice currently allows save-as to HTML, Palm DOC (Aportis) and PDF formats. Adding ePub doesn’t seem like it would be too big a challenge. Go for it, David.

    I’ve been using Calibre recently and it appears to do the job–but my FBR reader program now crashes every time I open it.

    Good luck on resolving these problems.

    Rob Preece
    Publisher, http://www.BooksForABuck.com

  4. You’re not ever getting what I think you want, not with ePub as the base format, simply because the format will break in many unpredictable ways if you try to use any approach other than the structural markup it was designed to support.

    All of the reflowable formats have that flaw in some way or another; it’s simply not really possible to get reliable WYSIWYG editors for anything that’s better at reflow than PDF.

    Sometimes features are mutually exclusive.

    OpenOffice has a feature that allows people to develop XSLT-based export filters, but it will always be a filter, and when going from a rich metadata format like ODT to a light format like ePub there will be loss and, if the source is too messy, unwanted side effects will happen. Since ePub is basically just a rebranded XHTML 1.1 in a zip container, there’s not that much to it.
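
To make the "XHTML in a zip container" point concrete, here is a minimal sketch in Python of packaging one XHTML chapter as an ePub 2 file. It is an illustration under stated assumptions, not a production tool: the file names (OEBPS/content.opf, chapter1.xhtml, toc.ncx), the title, the placeholder identifier and the function name write_epub are all made up for this example, and the output has not been run through a validator.

```python
import zipfile

BOOK_ID = "urn:uuid:00000000-0000-0000-0000-000000000000"  # placeholder identifier

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

CONTENT_OPF = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Sample Book</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">{BOOK_ID}</dc:identifier>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="ch1"/>
  </spine>
</package>
"""

TOC_NCX = f"""<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head><meta name="dtb:uid" content="{BOOK_ID}"/></head>
  <docTitle><text>Sample Book</text></docTitle>
  <navMap>
    <navPoint id="ch1" playOrder="1">
      <navLabel><text>Chapter 1</text></navLabel>
      <content src="chapter1.xhtml"/>
    </navPoint>
  </navMap>
</ncx>
"""

CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body><h1>Chapter 1</h1><p>It was a dark and stormy night.</p></body>
</html>
"""


def write_epub(path):
    """Write a one-chapter ePub 2 file to `path`."""
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry goes first and is stored without compression.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", CONTENT_OPF)
        zf.writestr("OEBPS/toc.ncx", TOC_NCX)
        zf.writestr("OEBPS/chapter1.xhtml", CHAPTER_XHTML)
```

Calling write_epub("sample.epub") produces the archive. The container itself is the easy part; the loss the commenter describes happens earlier, when rich word-processor styling is flattened into structural XHTML.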

  5. Without experience in the subject, I would think that OpenOffice would be sufficient, or close to sufficient, for writing. Perhaps all that’s missing is a write-up on how to use the tool with a minimum of semantic hints to produce clean p- and e- results. Issues might include:

    * Chapter headings that translate to heading elements/ePub anchors

    * Handling blockquotes

    * Use of soft hyphens

    * Quickly and easily verifying proper paragraph and page breaks

    * Handling character encoding issues

    As a technical person, I also feel required to speak up about the lack of “empathy”. In a lot of cases, it isn’t a lack of empathy that allows such holes in the community but rather a missing awareness, unfamiliarity with the discipline (I would posit that technical writers are far more likely to utilize semantic tools rather than WYSIWYG), and, most importantly, the lack of a “critical mass” of developers. Most people are going to be like me and question, “what’s wrong with the existing tools?”

    Who knows … maybe I’ll go play around with OpenOffice sometime and attempt to create a simple book that “just works”. If it does, maybe the MobileRead guys would accept a new Wiki entry.

    You were correct in identifying that OpenOffice needs an ePub plug-in, and maybe that is as simple as it sounds (assuming plug-ins have access to table-of-contents entries and the like). The task wasn’t made any easier by Adobe, though, with its unusual (although slightly reasonable) limits on file sizes (which also apply to the Sony Reader, by the way).

    One final note. While Calibre is a fantastic tool and the only reason why I ever considered the Sony Reader, it isn’t really an author’s tool. While an author/publisher could use Calibre to convert to ePub, it is the minimum-effort solution and the quality of the end result will reflect that.

  6. I wonder if an XML filter + an OO template would do the trick. In my experience (from a year or two ago), roundtripping with Docbook and OO was painful.

  7. Most of the existing tools are oriented towards format shifting, and I would include Word and Open Office add-ons in that category. There are already OpenOffice add-ons for FB2 and eReader ebook creation by the way.

    Calibre already imports OpenOffice ODT files. So one approach would simply be to collaborate with Calibre in producing an ODT style that Calibre can reliably format shift into ePub, MOBI, etcetera.

  8. @Daniel: It sounds as though you’ve given this some thought, and I hope you’d care to elaborate on your ideas.

    What features are mutually exclusive?

    Is there a technical reason you believe WYSIWYG editors can’t offer reflowable results or a social one? What’s wrong with proper use of page breaks (as opposed to hitting enter a number of times), font-relative margins/indents (as opposed to absolute distances), and the like? Is there any reason an author couldn’t write to one page size while an editor changes the page size to something much smaller to correct such WYSIWYG mistakes?

    What about Apple’s iWork (which I’ve only heard makes semantic styling natural) or Word 2007 (which is an enormous leap over previous versions of Word in this respect)?

    What types of metadata loss important to the layout are you referring to when going to XHTML? I ask only because none immediately spring to mind.

    Thanks for your patience, Daniel.

  9. WYSIWYG isn’t WYSIWYG unless the output format is static, and reflow depends on the output adapting dynamically to reader-specific conditions. With PDF there’s a finite set of conditions, and this means you can get a true representation of what your user will see.

    Structural doesn’t have to mean text-based; it just means you have to accept the fact that you can’t define page breaks, since page size becomes a relative factor. The same happens to font and text size: my Times New Roman might be different from your Times New Roman, or worse, I might not even have a font called Times New Roman. Everything you do on your screen can end up looking a lot different on my screen.

    The problem here is that every reader has a different design and a unique set of limitations, so you don’t get real consistency.

    There are two ways of doing it. The one preferred by the classical typography industry is to select which gizmos you support and then tailor the format to each and every one of those. The other is the structured way, where you simply put in a tag saying this is a headline and this block of text is a section, and let the reader deal with it in its own way and reflow completely based on its own rules.

  10. We have just such an application: IGP:FLIP (Front List Interactive Publishing). It allows a manuscript to be imported (.doc or .odt), templates applied and editorial processes carried out, and at any time PDF, HTML and ePub output can be generated. Alternatively it can also be used as an authoring and editing environment. It also outputs preconfigured packages for MS Reader, Mobipocket and Palm eReader.

    We are just setting up a public sandbox site for people to experiment, and for feedback, and hopefully kindly criticism. We are not quite ready for prime time yet and the launch is planned for 2nd week of January. If all goes well we intend to put up a site for registered users at no charge if they are making non-commercial books. We are looking at a low cost SaaS model for self, small and boutique publishers on a low monthly price.

    It is a fully blown publisher tool and designed to handle trade, academic and textbooks. It can be used simply, but it can also be pretty complicated and even includes eIndexing. We are a little behind on the tutorials so it may be a bit opaque at present.

    It uses a What You See is What You Mean interface, with predefined, and customizable templates for a lot of presentation subtlety in Print, different to eBooks and Online. For example margin and page floated images for print, stay in place, but present inline for Online and eBooks. You can instantly see a PDF or Online file (similar to the ePub output) at any time during the process. The whole thing uses XHTML (as XML) to allow XML agents to operate on the various components – although that may be getting a bit technical for this discussion.

    It is a relatively mature product and we use it daily with 150 people using it for commercial production for a range of activities including front list production, retrodigitization, text-book reflow and custom content assembly.
    If anyone is interested in trying it out they are welcome. It is working at present and the brave can see if they can make it work. The getting started tutorials will be up by the 10th Jan.

  11. IGP:FLIP may be a good product (I’ve not used it), but it will have a rough road among small publishers if its current pricing remains (starts at $650 per month). I know that the publishers for whom I do editing and composition work would not pay that kind of fee on top of their current production costs.

    As for OpenOffice, it’s a great product but not yet used by many publishers. There are lots of reasons, but I know that I tried to get some clients to consider it instead of MS Word, but Word is too-well entrenched.

    I would also add this: Although OO and Word can be used to format (compose) books, they are not really suited to that task, which is why there is the continued market for InDesign and QuarkXPress, among other DTP programs. FWIW, InDesign, beginning with version CS3 and now improved in CS4, includes export to Adobe Digital Editions and XHTML. Although not perfect, the export is a great start.

  12. I write simple-structured things: stories, essays. I don’t include graphics or charts or tables.

    My experience is that OO.org does fine. I either work in odt format or straight in html. When the draft is ‘finished’ (ha!) I save as html. This is the simplest possible html, and I strip out all extra stuff OO.org might add, so that I end up with only the elements p, h1-6, blockquote, i, b. I never format paragraphs except to modify the style sheets of the document in question, and that’s only to make rewrites easier; all this formatting will be stripped out of the html master that comes next.
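
The stripping step described above might be automated along the following lines. This is a hedged sketch using Python's standard html.parser module, not the commenter's actual procedure; the class name MinimalHtml, the function strip_html and the decision to drop everything inside head, script and style are assumptions made for illustration.

```python
from html import escape
from html.parser import HTMLParser

# The only elements kept, per the comment: p, h1-6, blockquote, i, b.
ALLOWED = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "blockquote", "i", "b"}
# Elements whose entire content is dropped (an assumption for this sketch).
SKIPPED = {"head", "script", "style"}


class MinimalHtml(HTMLParser):
    """Re-emit only allowed tags, without attributes, plus their text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1
        elif tag in ALLOWED and not self.skip_depth:
            self.out.append(f"<{tag}>")  # attributes are discarded

    def handle_endtag(self, tag):
        if tag in SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Text arrives with entities decoded, so re-escape &, < and >.
            self.out.append(escape(data, quote=False))


def strip_html(source):
    """Return `source` reduced to the allowed, attribute-free elements."""
    parser = MinimalHtml()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

For example, strip_html('<p class="western" style="margin: 0cm">Hello <font face="Arial"><i>world</i></font></p>') returns '<p>Hello <i>world</i></p>'. Unknown wrappers such as font or span are removed while their text is kept.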

    From html I then manually search-and-replace to make a LaTeX version, for pdf and .ps output. (A macro to do all the search-and-replace with one click would help, but so far I’ve been too lazy to make one.)
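
A one-click "macro" of the kind wished for above could look roughly like this. It is a sketch only, and it swaps literal search-and-replace for a tag lookup table so that text and markup can be handled separately. It assumes the input is already the minimal HTML described earlier (no attributes), and the tag mapping is an assumption: h1 to \chapter and h2 to \section suit a book-style LaTeX class but are not taken from the comment. The names TAG_MAP and html_to_latex are hypothetical.

```python
import re
from html import unescape

# Assumed mapping from minimal HTML tags to LaTeX; adjust to taste.
TAG_MAP = {
    "<p>": "", "</p>": "\n\n",
    "<h1>": "\\chapter{", "</h1>": "}\n\n",
    "<h2>": "\\section{", "</h2>": "}\n\n",
    "<blockquote>": "\\begin{quote}\n", "</blockquote>": "\\end{quote}\n\n",
    "<i>": "\\textit{", "</i>": "}",
    "<b>": "\\textbf{", "</b>": "}",
}

# Characters that have special meaning in LaTeX and must be escaped in text.
LATEX_SPECIALS = str.maketrans({
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
})


def html_to_latex(html_text):
    """Convert attribute-free minimal HTML into a LaTeX body fragment."""
    pieces = []
    # Splitting on a capturing group keeps the tags in the result list.
    for part in re.split(r"(<[^>]+>)", html_text):
        if part.startswith("<"):
            # Unknown tags are dropped rather than guessed at.
            pieces.append(TAG_MAP.get(part.lower(), ""))
        else:
            # Decode entities such as &amp;, then escape LaTeX specials.
            pieces.append(unescape(part).translate(LATEX_SPECIALS))
    return "".join(pieces).strip() + "\n"
```

The result is a body fragment only; a preamble and \begin{document} would still need to be wrapped around it by hand or by a template.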

    I personally don’t mess with epub. I like html better.

    OO.org will output .doc (ms-word 97/2000) or .rtf (I’m not sure which implementation, probably the one current when MS-Office 2000 came out?)

    OO.org 3+ now allows for extensions and plug-ins; Sun is hoping that it will now be as easy to make an extension as it is for Firefox, and that more will be available. If you really want epub, the place to ask is among the OO.org community. But so far, I haven’t seen that individuals have taken up OO.org extensions in anything like the numbers of Firefox extensions.

  13. Richard, thanks for your analysis. I need to go out and so won’t visit the IGP site directly, but if the $650 per month continues, no, that won’t be very small publisher-friendly. It makes the Adobe solution seem not quite so pricey after all. Certainly it would put FLIP in a different class from what I had in mind, and from the impression that Richard gave! I’ll welcome his reply. Part of being a real epubWriter is to be free or affordable.

    I agree with you on OpenOffice, which is exactly why I proposed a spin-off for writers and publishers—a fully integrated solution for the masses doing basic formatting.

    Glad I phrased the headline as a question mark, LOL.

    Happy New Year to you and others!
    David

  14. Pond, perhaps a little foundation money would grow OO’s interest in ePub, lol. This is another example of the gap between open source people and the world at large. Thanks for your observations.

    HNY,
    David

  15. IGP:FLIP pricing. OK, to clarify: the $650 price is for a dedicated hosted system, and is only going to be of interest to medium/larger sized publishers who need 20-30 editors/compositors on the system at the same time.

    For the small publishers we have (not yet announced, so I guess this is it!) a SaaS model that is $19.95 per month for 2 users. Based on our knowledge, even if a small publisher only needs to produce 1-3 books a month in PDF and ePub, this is a fraction of the cost of typesetting and then mucking around with unsatisfactory ePub generation.

    XML publishing is also not just about getting the formats out. The application also supports online authoring, editing and review. And will the work have to be done again in the future? An additional benefit of IGP:FLIP is that the XHTML uses a controlled grammar based on the microformats concept and it is capable of very sophisticated work (should it be required).

    For a slightly bigger operation we are looking at 5 users for $69.95/month, and may add one more price step for a 10-15 user operation, but that is definitely getting away from the small/boutique operator.

    Commenting on Daniel, Logan and Pond’s comments: We are an open source user and Open Office is part of our standard software. In fact we use OOo just as Pond states for importing manuscripts, as it is the best format translator around: doc to odt to xhtml. The problem is that the translation from their descriptive typographical styles to structural styles is difficult to the point of why bother. (This is also a problem with InDesign and similar layout producers.) So we do exactly as Pond states: strip all attributes and values and only leave the core XHTML elements. We then put frontmatter, parts, chapters, etc. into name-controlled divs and use CSS multi-selectors and a few inline styles to clean things up, and a novel-type trade book falls out very easily. With a good text editor and a bit of practice, this doesn’t have to take more than an hour or so.

    Then it starts to get tricky. Generated TOC’s text & components, images, counters, indexes, flow and float control, etc. This requires a bit more than just XSL transforms usually, depending on complexity requirements. So we take the approach the tagged content should match the requirement for the most sophisticated format (arguably print), and all other format generation is about dumbing-down as appropriate.

    Probably the most important ingredient is the XML. DocBook and TEI are legacy XMLs from before XHTML could do everything. PrinceXML, a CSS renderer, matches the performance of XSL-FO for most real issues. XSL and CSS with XHTML do amazing things, given a controlled structural XHTML styles grammar to ensure consistency and quality.

    There are still arguments for LaTeX and the like, but the problem is they isolate the authoring, editing and composition environments into a workflow rather than an XML state machine. Anyway that is a different discussion, and I have absolutely lost the plot on the reason for this post: $19.95 per month for small publishers.

  16. Pond, there is a TeX edition that can work directly with HTML or SGML/XML.

    The TeX engine, when used right, makes 10x better print output than anything short of a couple of hundred man-hours by a decent typographer.

    I would not call TEI and DocBook legacy, with XHTML being superior. DocBook and TEI do a purer content separation than 99.9% of all existing XHTML solutions, and there are far more tools for automatic translation (for instance, directly to the TeX preprocessor).

    I think I’ve argued for this before, but in reality it’s too damn hard to get the right output from a one-size-fits-all solution, so you’re almost forced to separate the writing and formatting systems.

  17. There is already a good OpenOffice Writer plugin that exports to the DTBook format:

    http://odt2dtbook.sourceforge.net

    The DTBook document grammar originates from the DAISY open standard for Digital Talking Books, and it is used in the IDPF’s ePub fileset along with XHTML. Please note that the NCX navigation center also comes from DAISY.

    http://daisy.org

    For your information, the DAISY Pipeline is a great open-source tool for chaining format conversions (command line, GUI, and server-side):

    http://daisymfc.sourceforge.net

    Regards, Daniel

  18. http://www.smashwords.com appears to be a great “all in one” solution, but I have just discovered it so I have not used it yet. Any thoughts or feedback?

