E-book formatting for authors: Reader contribution by Smashwords’ Mark Coker

Thanks to Mark Coker, head of Smashwords. We welcome other reader contributions. If you have something you'd like to submit, feel free to send it to paulkbiba@gmail.com.

One of my many joys of running Smashwords is working directly with authors every day who share my passion about the promise of e-books. Their feedback, dreams and frustrations are what guide our development. The biggest challenge these authors face getting their book into e-book form is that they're held hostage by their previous conceptions regarding how a book should be formatted. Traditional print formatting is very forgiving. If you use space marks or tabs instead of indents, for example, as long as the words are arranged where you want them on screen or in your PDF, the book prints reasonably well and all your bad formatting habits are forgiven. E-books aren't so forgiving, because for the most part, formatting is the enemy of good e-book formatting. If my statement sounds circular and nonsensical, allow me to elaborate.


In the e-book realm, authors must abandon the notion of the “page.” Pages have no meaning in e-book form, because pages become amorphous, shape-shifting creatures depending on the e-book reader; the reader’s choice of font size, font style or line spacing; or in the case of the iPhone, whether they’re holding it vertically or sideways.

When the notion of page disappears, it creates other problems for traditionally formatted books. The page numbers in your table of contents or index become meaningless. Your artificial page breaks, made via the common bad habit of multiple paragraph returns, create blank pages. Your forced page breaks disappear.

The secret to good e-book formatting is to keep it simple: A paragraph return at the end of a paragraph, a proper indent at the beginning of the paragraph, a couple paragraph returns between each chapter, things like that.
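The habits described above can be checked mechanically before a manuscript is converted. The following is only a minimal sketch, not Smashwords' actual tooling: it assumes the manuscript has been exported to plain text with one paragraph per line, the function name `find_formatting_problems` is hypothetical, and the threshold of four empty paragraphs is an arbitrary choice.

```python
import re


def find_formatting_problems(text):
    """Return (line_number, problem) pairs for common print-era habits.

    Assumes plain text with one paragraph per line; this is an
    illustrative check, not any converter's real validation logic.
    """
    problems = []
    blank_run = 0
    for number, line in enumerate(text.split("\n"), start=1):
        if line.strip() == "":
            blank_run += 1
            # Report once, when a run of empty paragraphs reaches four
            # (the threshold of four is an arbitrary assumption).
            if blank_run == 4:
                problems.append((number, "run of empty paragraphs used as a page break"))
            continue
        blank_run = 0
        if line.startswith("\t"):
            problems.append((number, "tab used for indent"))
        elif re.match(r" {2,}", line):
            problems.append((number, "spaces used for indent"))
    return problems


sample = "\tChapter one begins.\n   Second paragraph.\n\n\n\n\nChapter two."
for number, problem in find_formatting_problems(sample):
    print(number, problem)
```

A check like this only flags the problem; the fix is still to replace tabs and spaces with a real first-line indent in the word processor.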

For long form narrative books, which is what most people read, readers buy books for the words, not the formatting. Don’t let your formatting get in the way of the words.

For helpful formatting tips, read the Smashwords Style Guide.

29 Comments on E-book formatting for authors: Reader contribution by Smashwords’ Mark Coker

  1. Anonymous Coward // December 7, 2008 at 11:23 am //

    Mark, you are getting it wrong. The notion of page is not going anywhere. If you want to be able to quote a book, you need a universally agreeable page number. For TOC/picture index/tables index, you need it too. It may take one or more screens of iPhone or the device of your choice to display a page, but the page is here to stay. Take a look at Sony’s implementation of ePub to get a clue what I’m talking about – just pressing “Next” doesn’t necessarily get you to the next page. To the next screen, positively.

  2. Very timely. I just griped about rotten eBook formatting yesterday:
    http://mikecane2008.wordpress.com/2008/12/06/you-will-want-to-buy-ebooks/

  3. I’m all for better formatted ebooks! Too many ebooks I’ve seen are just one line after the next with no breaks and you barely notice a new chapter has begun. I’ve seen tables spread all over the place with no effort made to make the content understandable. I’ve seen links that go on forever because of poor html. I’ve seen them with footnotes and you don’t even know it til the end because there was no effort made in the text to announce their existence. Often someone overrides the reader’s options for justification, color, font by hard coding them into the book. LOTS of books don’t have tables of contents, or don’t have them linked to the software, which is extremely important to me on my Cybook.

    This is not only free ebooks. I have paid for several books with terrible formatting. I bought a book about ancient Italy and the map was about one inch square. A Cybook will display an image nearly as large as the screen. I can’t read a one inch square map. Yes, I do look at the maps provided in the front covers. It helps me understand the story, just like it’s meant to.

    To me, good formatting adds a lot to the reading experience. I’d like to see more people taking it seriously.

  4. Oh, and all these author sites where they accept the first file of a certain format and ONLY the first are ASKING for the worst formatted one. Better formatting takes time.

  5. Marcus Sundman // December 7, 2008 at 11:55 am //

    > A paragraph return at the end of a paragraph,

    Yes, paragraphs should be marked. Whether it’s by some weird escape code before/after each paragraph, or by surrounding each one with (<p>) and (</p>) is up to the format used.

    > a proper indent at the beginning of the paragraph,

    I have no idea what this even means. How things are indented is format-specific, but even so the actual indent should only be a default that can be overridden. Under no circumstances should spaces or tab-characters be used for indenting text.

    > a couple paragraph returns between each chapter

    No, no, no. Paragraphs are paragraphs and chapters are chapters. You must not “simulate” chapters by using empty paragraphs. In fact, empty paragraphs should be forbidden altogether, since there is no such thing really.

    Formatting in ebooks should be made just like in modern, style-based word processors. I.e., the author/publisher/whoever should mark which parts of the text are what. Then the reader software can display TOCs, footnotes, hyperlinks, chapter-breaks etc. as it sees fit. The author can provide wishes (or “hints”) of what defaults should be used all else being equal (or even different defaults for different screen sizes), but it should always be possible to override these.

  6. Smashwords prefers to “grind” its ebooks from a DOC or RTF file format:

    For best results, upload manuscripts in Microsoft Word .doc or .rtf format.

    That said, a simple hard return at the end of a paragraph creates a paragraph in the output.

    A “proper” paragraph indent is the use of the “first line indent” as opposed to using a tab.

    A couple of hard returns at the end of a chapter will create enough space that you get the “feel” of a chapter break.

    You can’t put in p-tags or any other xhtml tags in a word processing document and expect it to work.

    The tags that create TOCs and tables and forced page breaks are all (X)HTML codes that aren’t going to cooperate with a system that converts straight from a DOC or an RTF.

  7. Amazon’s Kindle compiler has one nice feature–after doing the initial conversion pass on your source file, it allows you to download and tweak the XHTML it cranks out. I’d like to see Smashwords (I’d pay for a software package that did this) take a basic XHTML document and produce the same output in all the formats. To simplify things, there could be an option to predefine a handful of tags, for example, title and chapter heading (preceded by a page break).

  8. It is very unfortunate that Smashwords advocates so many bad practices:
    – formatting isn’t the enemy of good formatting at all: you just need to rely on more semantic elements and less style elements
    – the concept of the page is important and a real support for paged-media is the next step for e-books if we want to create rich layouts that can adapt themselves to any screen (support for footnotes for example, which are quite different from endnotes, or multi-column layouts for newspapers)
    – people should avoid at all costs the idea of separating chapters using empty paragraphs: instead they should properly indicate that there is a chapter, which will also be useful to create a table of contents

    Services that rely on direct conversions (Amazon DTP, Smashwords) from DOC or RTF are crossing the line between simple and simplistic. The semantic markup is the most important aspect of a source format, both authors and publishers need to understand this as soon as possible.
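To illustrate the point about marking chapters instead of simulating them with empty paragraphs, here is a minimal sketch. The data layout (a list of title/paragraphs pairs), the `chap-N` anchor ids and the function name `render_book` are all assumptions made for illustration; no real converter is claimed to work this way.

```python
from html import escape


def render_book(chapters):
    """Build a linked table of contents and body from marked-up chapters.

    `chapters` is a list of (title, [paragraph, ...]) pairs.
    """
    toc = ["<ul>"]
    body = []
    for index, (title, paragraphs) in enumerate(chapters, start=1):
        anchor = f"chap-{index}"  # hypothetical id scheme
        toc.append(f'<li><a href="#{anchor}">{escape(title)}</a></li>')
        body.append(f'<h2 id="{anchor}">{escape(title)}</h2>')
        body.extend(f"<p>{escape(p)}</p>" for p in paragraphs)
    toc.append("</ul>")
    return "\n".join(toc + body)


book = [("Arrival", ["It was late."]), ("Departure", ["It was early.", "We left."])]
print(render_book(book))
```

Because each chapter is identified as a chapter, the table of contents falls out of the same data, and no empty paragraphs are needed to separate anything.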

  9. Amazon’s Kindle compiler has one nice feature–after doing the initial conversion pass on your source file, it allows you to download and tweak the XHTML it cranks out.

    Eugene, I wasn’t able to get this feature to work and, quite frankly, I found the whole Kindle process to be more of a PITA than any of the other formats I did—and I provided the XHTML document. It didn’t honor many of the tags the help file said it would and I had to reformat a good bit of it to reflect the tags it would honor.

    It is possible I wuz doin’ it rong, but I couldn’t find a faster/easier way to do it and get even close to the look I wanted.

  10. The XHTML I got back from the Amazon DTP was pretty much the same as what I was using, so that wasn’t an important step. I ran it through Mobipocket Creator until it looked the way I wanted it to, and didn’t notice any differences between what displayed in Mobipocket Reader and the Kindle preview mode (other than screen width). I haven’t looked at it on an actual Kindle, though. BTW, both Mobipocket Creator and ReaderWorks recognize the CSS attribute “page-break-before” and Mobipocket also has the proprietary tag “mbp:pagebreak.”

  11. Some excellent discussion here, thanks. I should probably clarify a few points.

    1. I’m advocating we move in the direction of simpler formatting, so that more books can be satisfactorily read on more devices and platforms than is currently the case. Project Gutenberg has done well here, IMHO. Their books may not be perfectly formatted, but they are eminently readable anywhere.

    2. My post is as much an indictment against bad Word processing habits as it is against unnecessarily complex formatting.

    3. My comments do not and cannot apply to all forms of writing. There are many types of books that are unreadable without rigid formatting. Those types of books will be slower to reach mass market adoption in ebook form.

    @ Anonymous Coward: Pages cannot persist in the ebook realm unless the world agrees that a page can only consist of a fixed horizontal and vertical dimension and a certain number of words. I think we agree location of information is important, but rather than location being defined as a page it should be defined as where that information can be found.

    @ Christine: yes, we all want good formatting that can add to the readability and enjoyment of the book. Too often, however, technologists develop elaborately complex solutions that fall flat and prevent readability. PDFs, for example, offer a horrible and inflexible reading experience unless the formatting is critical to the readability or printing of the content. Often, the content found in PDFs would be more readable if displayed as simple text or HTML.

    @ Marcus.. re: indents: I agree 100%, but keep in mind I work with self-published authors. The number one problem we see is authors using spaces or tabs for indents. re: how formatting should be done with style based (or, as Hadrien proposes, semantic) formatting: Yes, it would be wonderful if all books were created that way, and if all that intelligent formatting could translate into all the different reading formats and reading situations. I’m just not optimistic we’ll get there any time soon.

    @ Hadrien: Keep in mind, our focus is to take a single file and translate it reasonably well into multiple DRM-free ebook formats. We don’t strive for perfection, nor do we aim for mediocrity. The challenge we all face, especially as we see more and more works introduced from citizen authors, is that it’s difficult to divorce “the way people create” from “the way people *should* create.” The tips we offer our authors help them create a good looking multi-format ebook with minimal effort.

    Thanks all.
    mark

  12. Marcus Sundman // December 7, 2008 at 4:05 pm //

    > There are many types of books that are unreadable
    > without rigid formatting.

    Just out of curiosity, what might these be?

    > The number one problem we see is authors using
    > spaces or tabs for indents.

    If authors themselves don’t know what’s best for them then the publishers could help them by requiring documents to be in e.g. LaTeX. It’s very easy to (learn how to) do basic semantic markup with LaTeX, and usually faster than with wordprocessors.

    > formatting should be done with style based (or, as
    > Hadrien proposes, semantic) formatting

    Actually I meant the semantic aspects of modern style-based wordprocessors, not the styles as such. Hadrien and I are completely on the same page (pun intended).

  13. Marcus Sundman // December 8, 2008 at 1:06 am //

    > The notion of page is not going anywhere.

    I certainly hope it is, but apparently not fast enough. Unfortunately there are too many stupid people stuck on illogical notions of how something “has to be”.

    > If you want to be able to quote a book, you need a
    > universally agreeable page number.

    No, you don’t. Why on earth would you think one needs a page number for that? Have you even thought about it, or is it just some odd gut feeling you have?

    It’s obvious that page numbers are utterly illogical to use for pageless formats. Logical alternatives are letter numbers, word numbers and paragraph numbers. One could also combine any of them with (hierarchical) chapter numbers. (And just add a suitable SI prefix whenever such a number gets too big. E.g., if L = letter, then 1000 L = 1 kL and 1000 kL = 1 ML etc.)

    > For TOC/picture index/tables index, you need it too.

    No, you most certainly don’t. TOCs and other indices are of course direct links in any sensible digital format. Again, have you ever even thought about any of this? I know you have used the web since you managed to write that comment, but you seem to be utterly clueless about digital text.

  14. Bob Martinengo // December 8, 2008 at 9:04 am //

    Marcus,

    Get a clue yourself. If you think people are going to be more likely to accept your arguments if you call them stupid, you’re stupid. Just make your points and keep the snottiness for your ‘friends’.

    Thank you!

  15. Anonymous Coward // December 8, 2008 at 12:33 pm //

    @Mark:
    > Pages cannot persist in the ebook realm unless the world agrees that a page can only consist of a fixed horizontal and vertical dimension and a certain number of words.

    The world seems to agree on that part, Mark – you’re trying to change that. As I have mentioned, the whole of academic research revolves around being able to quote things in order to verify someone’s research. Again, I’m not inventing anything here – take a look at Sony Reader. It *can* reflow PDFs. When reflowed, it *may* take up to five *screens* of Sony device to show that particular *page*, but a page in PDF is always that page with a universally quotable page number.

  16. Anonymous Coward // December 8, 2008 at 12:35 pm //

    > Unfortunately there are too many stupid people stuck on illogical notions of how something “has to be”.

    The whole academic community that works with notion of research is used to quoting other people’s work using page numbers. If you take page numbers away, you’re all on your own convincing all of these “stupid people” that they need to adopt a different way of quoting each other’s works in order to make their research independently verifiable.

    And yes, calling people “stupid” when they express views different to yours really tells more about you than anything you say.

  17. Bob Martinengo // December 8, 2008 at 1:33 pm //

    Pages may continue on even when a document is born digital. According to a variety of sources, the accepted average word count for a printed page is 250 words. So, using 250-word ‘pages’ as markers, like mileage signs between cities, provides a handy way to judge distances between where you are and where you are going.

    For example, if a novel is published online and is 10 chapters and 100,000 words, it would be easier to think of it as 400 pages, even though it may never see print.

    Now, can’t we all just get along?
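The 250-word "page" arithmetic in this comment can be sketched as follows. The function names are hypothetical, and 250 is simply the average quoted in the comment, not a standard.

```python
WORDS_PER_PAGE = 250  # the average quoted in the comment above


def estimated_pages(word_count, words_per_page=WORDS_PER_PAGE):
    """Round up so a partial final page still counts as a page."""
    return -(-word_count // words_per_page)


def estimated_page_of(word_index, words_per_page=WORDS_PER_PAGE):
    """1-based 'page' containing the word at 0-based position word_index."""
    return word_index // words_per_page + 1


print(estimated_pages(100_000))  # 400, matching the novel example
print(estimated_page_of(62_500))
```

Such a marker depends only on word position, so it stays the same whatever the screen size or font setting.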

  18. Bob Martinengo // December 8, 2008 at 2:41 pm //

    Here is a nice article on designing digital documents that is relevant to this thread:

    http://www.bookbusinessmag.com/article/digital-directions-does-design-matter-digital-distribution-175953_1.html

    Digital Directions: Does Design Matter in Digital Distribution?
    By Andrew Brenneman
    Oct 1, 2008

    An important characteristic of digital content is its ability to deliver to multiple platforms simultaneously—to print, Web and mobile channels. Invariably, the same content will look different when viewed on various output devices, and it should. Each device has its own display characteristics, and the design of the presentation should be optimized for that device. [… more …]

  19. For born-digital books, the standard bibliographic information in a footnote could be followed by: “search on ‘search string'” instead of the page number. Especially with resources like Google Books, that kind of footnote would be a lot more useful.

  20. Marcus Sundman // December 8, 2008 at 7:06 pm //

    > If you think people are going to be more likely to
    > accept your arguments if you call them stupid,
    > you’re stupid.

    I agree, and I don’t think that. (I suspect my “rudeness” is an expression of frustration caused by my helplessness against an overwhelming stupidity in the world. I don’t really dislike even grossly stupid people as such, but I do hate stupidity.)

    I didn’t call any particular person stupid. If you yourself think you are one of the “stupid people stuck on illogical notions of how something “has to be”” then you are calling yourself stupid.

    However, if you truly get hung up on ad hominem arguments like “You are rude, therefore you are wrong.” then you are indeed one of those people.

    > The whole academic community that works with notion
    > of research is used to quoting other people’s work
    > using page numbers.

    Oh, c’mon! Different universities, journals, proceedings, etc. use different formats for references and bibliographies. Heck, even different faculties within a university often use different formats. And the formats vary depending on what the targets of the references are. E.g., now when papers refer to webpages (which are inherently pageless) they seldom include any page numbers, and when they do it’s usually more out of ignorance than anything else. (A webpage included as an appendix in a paged format will obviously have pages that can be referred to, but then you’re referring to the paged appendix and not the webpage directly and thus that doesn’t count in this context.)

    Very few universities, journals, conferences, etc. have already decided how to format references to ebooks, which are inherently pageless. Thus when they do make that decision they wouldn’t be changing anything if they decide to use e.g. paragraph numbers or letter numbers or just settle for chapter numbers for now. If some of them do decide to go with page numbers for some pageless format, such as webpages or ebooks, then that would indeed be a very stupid decision.

    Stupidity is not far off when cluelessness reigns, and unfortunately a large portion of the old academia suffers from total cluelessness regarding digital media. I know professors who can’t read their own email, but have assistants to print out messages on paper and afterwards type in handwritten/spoken responses. I know professors who think the web is a 1-way channel like broadcast TV/radio. These people have unfortunately got stuck in an earlier epoch (often partially without them even realizing it).

    As these people just barely lost the vote to keep an artificial “scroll number” back when pages were a new thing we can always hope they might have learned something from that. (Clarification: This paragraph is mostly a joke.)

    > calling people “stupid” when they express views
    > different to yours

    You are either ignorant or lying. I have never, ever in my life (as far back as I can remember, which excludes my first 5 or so years) called someone stupid for expressing a view that is different to mine!

    > using 250-word ‘pages’ as markers […] provides a
    > handy way to judge distances between where you are
    > and where you are going.

    ‘Paragraphs’, ‘words’ or ‘letters’ (with SI-prefix as needed, e.g. “kilowords”) are just as good, except for the fact that some people are not used to them yet. (Some are, though. E.g., some people are often given the task of writing an “N word essay/article” or even an “N letter essay/article”.)

    After thinking about it for a good 5 seconds I’d say that I prefer paragraphs for references (since then there’s a (remote) chance it’d work even with translated versions), and letters for length (since then the actual size is not so language-specific (although it’d still be specific to the type of grapheme used, so maybe there’s something better that would work similarly even with Asian, logograph-based languages)).

    Getting used to something like this is a non-issue. If you read/write much then you’d get used to it in no time flat, if you sometimes read/write you only need to know how to approximately convert to whatever metric you’re more familiar with, and if you hardly ever read/write then it doesn’t matter whether you’re used to any particular metric or not.

    If someone does decide to name and use some arbitrary number of words then I would highly recommend against using “page” or any other related word that already has a specific meaning. Having pages of different “pages” is just asking for trouble, and will cause needless confusion.

    > For born-digital books, the standard bibliographic
    > information in a footnote could be followed by:
    > “search on ’search string’” instead of the page
    > number.

    That is orthogonal to the “paged vs. pageless” issues.

  21. A mobile reading device doesn’t have to have a small screen; e-ink screens will soon (~5 yrs) become foldable/rollable so that, when expanded, they’ll provide a reading surface as big as a letter-sized paper.

    A page may not be the most logical division, but it certainly is a very psychological one. (Those who disagree perhaps don’t read much except blogs).

  22. Marcus Sundman // December 11, 2008 at 2:37 am //

    > e-ink screens will soon (~5 yrs) become […] as
    > big as a letter-sized paper.
    > A page may not be the most logical division, but it
    > certainly is a very psychological one.

    You are completely missing the point. It doesn’t matter what size some particular screen is. The relevant fact is that the same content will be shown in different text sizes, perhaps with different fonts or line heights and probably also on screens of different sizes. So, some cross-device “page size” metric is no longer the size of your actual page, but some arbitrary number of words, perhaps relative to the default font size the author/publisher specified and perhaps with different heading weights or chapter-related page-breaks, or somesuch arbitrarily chosen parameters that are more or less unrelated to the screen of some device.

    So, if you’re talking about this arbitrarily selected combination of parameters for constructing an artificial page metric then it’s in no way related to what is shown on the display of some particular device at some particular time. OTOH, if you’re talking about a page actually shown on the screen of some device at some particular time then the term is meaningless for other devices or even the same device with other settings.

    These facts are simple to understand, and as far as I can see there is no other logical conclusion than that using “page” as a metric is illogical and misleading (and thus probably counterproductive).

  23. Apologies for a slight twist to your post about a page not being a page. This has to do with what to actually put on the… page.

    I’m a novelist, and all I’ve been reading is that fiction ebooks don’t sell. True? Why?

    Thank you for any insight into this matter.

  24. Fiction e-books don’t sell? I expect that would come as some news to Fictionwise. And, for that matter, Amazon.

  25. Actually, anyone who has seen legal documents, and many government documents, which may go through many revisions, will know that “page” numbers are less important than section and paragraph numbers.

    Each section can contain an arbitrary amount of information, from a few lines, to a zillion piles of (possibly mind-numbing) data.

    Pages may mean less and less, and in my opinion were never a terribly good way to indicate where in a book we were talking about (since each size and edition of a print book can end up with the same information ending up with a different page number).

    But chapters? Sections? Bookmarks?

    Those are rad.

    When I read an ebook, I could care less what “page” I’m on. But I do jump around between chapters, and section markers, and bookmarks.

    And I use hyperlinks, etc.

    We don’t need to come up with a “standard” page size in order to communicate effectively about an electronic book. That’s just silly.

    We just need to tag (and perhaps anchor) content throughout the book, with human and/or machine readable tags. A naming convention might be nice, but even that wouldn’t be strictly necessary.

  26. Marcus Sundman // May 14, 2009 at 4:35 pm //

    > But chapters? Sections? Bookmarks?
    >
    > Those are rad.

    Indeed.

    > When I read an ebook, I could care less what “page”
    > I’m on.

    You probably mean that you couldn’t care less.

    > We don’t need to come up with a “standard” page size
    > in order to communicate effectively about an
    > electronic book.

    True, but it would be terribly efficient if people would agree on some sensible way to communicate about locations in books.

    > We just need to tag (and perhaps anchor) content
    > throughout the book, with human and/or machine
    > readable tags.

    1) That only works for stuff you can modify.
    2) Two versions of the same text can be tagged in conflicting ways.

    If we’d use paragraph, word and/or letter numbers it’d work with pretty much everything, and without the two problems above.
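As a rough illustration of paragraph and word numbers used as a location, here is a minimal sketch. It assumes paragraphs are separated by blank lines and words by whitespace; the function name `locate` is hypothetical, and punctuation is treated as part of a word.

```python
def locate(text, phrase):
    """Return (paragraph_number, word_number) of the first match, 1-based.

    Paragraphs are assumed to be separated by blank lines and words by
    whitespace; returns None when the phrase is not found.
    """
    target = phrase.split()
    if not target:
        return None
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    for p_number, paragraph in enumerate(paragraphs, start=1):
        words = paragraph.split()
        for start in range(len(words) - len(target) + 1):
            if words[start:start + len(target)] == target:
                return (p_number, start + 1)
    return None


sample = "It was a dark night.\n\nThe rain fell in sheets against the window."
print(locate(sample, "rain fell"))
```

The result does not change with font size, screen size or device, and nothing has to be added to the text itself.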

  27. A lot of controversy here, but I am looking for practical guidance in this field which is new to me. As an author, I would like:
    a) To create a work I am currently writing as an ebook from the start. Can I include footnotes, font variations etc. or not?
    b) To transform into an ebook a work already published which contains table of contents, footnotes, illustrations, varied paragraph formatting and an index. (I have it all in one PDF file.) Is this feasible now or in the foreseeable future?
    I have recently tried out the BeBook reader, sold in Europe (where I live and work). This would do none of the above things. It is not clear to me from the above discussion whether other readers currently available would be better in this respect.

  28. I love this page. it started as a discussion about ebook formatting and ended up as two children screaming at each other calling each other stupid. in the meantime, poor michael tracy is innocently still searching in the dark since july 09 for some practical guidance on creating ebooks. sheer comedy gold.

  29. Well, I had completely forgotten about this discussion, but I see no-one has answered my queries – so thanks to the last commentator for pointing this out.
    Fortunately there are other sources, so I am not completely in the dark. I hear that Kindle is now available in Europe, and the launch of the iPad may perhaps encourage e-book reading.
