I’m very pleased to share a new post by Paul Salvette. This time Paul will be talking about ePub 3 and how the new version of this popular file format can be used by authors and publishers.

For more helpful tips on ebook formatting, check out Paul’s book, How to Format Your Ebook for Kindle, Nook, Smashwords and Everything Else.

What the Heck is EPUB?


The EPUB format is an open standard that defines how the content and metadata of an eBook should be packaged and how eReading devices and software should render it for the reader. EPUB is the most widely adopted standard for eBooks and is the format used by the Barnes & Noble NOOK store, iBookstore, Sony eBook store, and many others. The notable exception is the Amazon Kindle store, which is discussed below. The open source philosophy of web design and development has fostered a culture of sharing best practices and lessons learned online. This has allowed developers to write new software and iron out bugs in web browsers, greatly enhancing the internet over the last 15 years as web pages evolved from the dull designs of the 1990s to the new digital reality of the 21st century. It is reassuring that eBook design and development is following a similar trend of openness and cooperation, which should greatly benefit the reader’s experience and the adoption of eBooks on an international scale.

Since eReaders and rendering software come in so many shapes and sizes, an eBook cannot be organized like a fixed-layout print book. It needs reflowable content to accommodate differently sized viewing windows on the reader’s screen (from a big desktop monitor to a tiny iPhone). That is why the EPUB format looks more like the source code of a website than an actual manuscript. You can head on over to my eBook formatting tutorial or buy my How-To guide if you’re looking for help on turning your manuscript into an eBook. Essentially, an EPUB file is a collection of XHTML and XML files (plus images) that defines the content, reading order, and metadata of an eBook. Alright, so what the heck are XML and XHTML? eXtensible Markup Language (XML) is a way to transport and store data, while XHTML is an XML-based version of the language used for web pages, one that is being phased out in favor of HTML5. Using this type of code allows eReading software and devices to render content on a wide variety of screens, and it’s great for standard fiction or non-fiction eBooks. This is what is widely used today: the EPUB 2.0.1 specification.
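
To make the “collection of XHTML and XML files” idea concrete, here is a minimal sketch in Python of how such a package could be assembled. The file names under OEBPS/, the sample title and identifier, and the helper names build_epub_skeleton and list_entries are all assumptions made for illustration. The result is only a skeleton: a real EPUB 2.0.1 file also needs complete metadata and a navigation file before it will pass validation.

```python
import io
import zipfile

# Tells the reading system where the package (OPF) file lives.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Package file: metadata, the list of files (manifest), and reading order (spine).
# Title and identifier are placeholders, not real book data.
CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="book-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Sample Book</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="book-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
  </metadata>
  <manifest>
    <item id="chapter1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="chapter1"/>
  </spine>
</package>
"""

# The actual content: an ordinary XHTML page, much like a web page.
CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body><h1>Chapter 1</h1><p>It was a dark and stormy night.</p></body>
</html>
"""


def build_epub_skeleton() -> bytes:
    """Return the bytes of a bare-bones EPUB-style zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry goes first and is stored uncompressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
        archive.writestr("OEBPS/content.opf", CONTENT_OPF,
                         compress_type=zipfile.ZIP_DEFLATED)
        archive.writestr("OEBPS/chapter1.xhtml", CHAPTER_XHTML,
                         compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def list_entries(epub_bytes: bytes) -> list:
    """List the files inside the archive, in the order they were added."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        return archive.namelist()


if __name__ == "__main__":
    for name in list_entries(build_epub_skeleton()):
        print(name)
```

Running the script simply prints the names of the files inside the package, which shows how little there is to an eBook beyond a handful of web-style documents.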

However, many readers want more than the standard content that would be available in a print book. Comic books, textbooks with complicated mathematical formulas, and languages whose writing systems are nothing like English (Korean, Japanese, Thai, Arabic, etc.) are just a few of the things that the EPUB format in use today has trouble with. Also, the next evolution of the internet is HTML5/CSS3, which is slowly replacing XHTML and plug-ins like Adobe Flash. This will make it easier to embed audio, video, and other media content. That is why the International Digital Publishing Forum set out to create a revised standard: the EPUB 3 specification.

What Can the New EPUB 3 Do for Me as an Author/Publisher?

The EPUB 3 specification was finalized in October 2011, and new software and eReaders that support this format will hopefully be out in 2012. Azardi has an EPUB 3 reader that is available for free if you are interested. The EPUB 3 specification allows for a variety of new features that EPUB 2.0.1 does not support, and they should greatly enhance the eBook experience. Here is a brief snapshot of some of the new things to expect:

  1. Audio/Video Support – Support for embedded audio and video using HTML5 tags is a feature that will make eBooks really stand out from print books. While eBook files will obviously become larger with embedded audio and video, this will hopefully be offset by the improved storage capacity of tablets and by better communications/cellular infrastructure that allows quick file transfer. Embedded audio and video should also mean better support for text-to-speech for readers with disabilities. [spec]
  2. Mathematical Equations – The current version of EPUB does not allow complex mathematical equations to be represented directly, which matters for the huge textbook market. Generally, an eBook designer has to embed the equation as an image inside the eBook, which bloats the size of the file and can make for a somewhat clunky reading experience. The new EPUB 3 supports MathML, which is a way to encode equations much as web pages are encoded. This will help eBooks gain wider acceptance in the textbook market. [spec]
  3. Headers and Footers – It is standard practice for print books and PDFs to have headers and footers on every page outside of the body content (e.g. the author’s name, book title, page number, etc.). With eBooks, you often only see the headers and footers that the eReader decides to put there (usually some piece of information extracted from the eBook’s metadata). With the EPUB format in use today, you can’t really define what should be a header or footer, because you have no idea what the size of the page will be for the reader. The new EPUB 3 specification, however, has new tags that let the designer define what should be in the header and footer of every page. [spec]
  4. JavaScript – JavaScript is a simple programming language that was designed for web pages in the 1990s. While the technology has been around for a while, the only place where it can be used inside an eBook is the iBookstore, and even then it is unclear how the current EPUB format is supposed to process it. The new EPUB 3 specification defines how the eReading device is supposed to handle JavaScript and what to do if there is no support for it. This will allow for all sorts of neat features in eBooks that we have taken for granted on the internet, such as pop-up footnotes, rotating pictures, dynamic color changes, and much more. This will provide a whole new level of interactivity and help give the children’s book market a much-needed jump into the digital world. [spec]
  5. Embedding Fonts and Advanced Text Styling – While you can embed fonts in the current version of EPUB, the new EPUB 3 specification provides better support for this feature. This is critical for eBooks to gain acceptance in foreign languages. The EPUB 3 specification makes use of advanced CSS3 properties to allow better control of how text is displayed (note: CSS, or Cascading Style Sheets, is a specification for styling how content should be presented, and it is generally used in web design and eBook design). For instance, you can decide whether or not text should be hyphenated at line breaks. It should be noted that there is a lot of hand-wringing over how licensing and legalities for fonts work in eBooks, so embed fonts with caution. [spec]
  6. Better Graphics and Perhaps Animation – The new EPUB 3 specification allows for better use of SVG images. Scalable Vector Graphics are useful because the quality doesn’t change as the image is made bigger. This is in contrast to bitmap graphics (like JPEG, PNG, GIF, etc.), which are a grid of pixels and so degrade in quality as the image is expanded or zoomed in on. Additionally, the new CSS3 specification has support for creating animation, and you can see an example here (note: viewing it with Internet Explorer is not recommended). Hopefully, developers will create EPUB 3 rendering software that can recognize the powerful animation features of CSS3.
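
Two of the features above, MathML equations and embedded audio, can be sketched as markup. The sample page below is an illustration only (the audio file name is made up), and the Python helper find_features is hypothetical: it parses the page and reports which of these elements it contains, roughly the kind of check a reading system might make before deciding how to display a page.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# A made-up content page: one equation (A = pi * r^2) encoded as MathML
# instead of an image, plus an HTML5 audio element with fallback text.
SAMPLE_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample page</title></head>
  <body>
    <p>The area of a circle:</p>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mi>A</mi><mo>=</mo><mi>&#960;</mi><msup><mi>r</mi><mn>2</mn></msup>
    </math>
    <audio src="audio/narration.mp3" controls="controls">
      Your reading system does not support embedded audio.
    </audio>
  </body>
</html>
"""


def find_features(xhtml_text: str) -> list:
    """Report which of the newer element types appear in an XHTML page."""
    root = ET.fromstring(xhtml_text)
    found = []
    # ElementTree addresses namespaced elements as {namespace}tag.
    if root.find(f".//{{{MATHML_NS}}}math") is not None:
        found.append("mathml")
    if root.find(f".//{{{XHTML_NS}}}audio") is not None:
        found.append("audio")
    if root.find(f".//{{{XHTML_NS}}}video") is not None:
        found.append("video")
    return found


if __name__ == "__main__":
    print(find_features(SAMPLE_PAGE))
```

The text inside the audio element is what a reader sees when the device cannot play the file, which is the same fallback idea the specification applies to JavaScript.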

It will be exciting to see what kind of software developers come up with to read eBooks written in the EPUB 3 format, and we will hopefully start seeing some of this new generation of eBooks soon. More interactivity will allow eBooks to gain even further market share over print books.

How Does This Tie In with the Amazon Kindle Store?

Amazon owns and still uses the MOBI/AZW/PRC format in its Kindle store, which is somewhat similar to the EPUB format. There are some very serious downsides to MOBI compared with EPUB, but Amazon continues to use its own proprietary file type, probably for business rather than technical reasons. With the release of the Kindle Fire, Amazon announced a new “Kindle Format 8”, which boasts functionality similar to some of the new EPUB 3 features. Beyond a quick press release, Amazon has not yet told the public how this format will work, but it appears that, like EPUB 3, it will support a lot of the things that HTML5 and CSS3 can handle. Luckily, Amazon provides its own software, KindleGen, which easily converts the EPUB format into MOBI for upload to the Kindle store. Hopefully, the conversion process from EPUB 3 to Kindle Format 8 will be equally simple, but we will have to wait and see.

Paul Salvette is an author who lives in Bangkok, Thailand, with his wife, Lisa, and newborn daughter, Monica. He grew up in the United States and served in the Navy from 2002 to 2009, with some time in Iraq. His day job involves working at a Thai foundation that focuses on poverty eradication, philanthropy, and education. He hopes to stay in Thailand until he is deported or dies of natural causes, whichever comes first.

Learn more about Paul at http://paulsalvette.com or follow him on Twitter @PaulSalvette.

(Via Password Incorrect.)

4 COMMENTS

  1. This is a nice, concise overview of EPUB 3 that I will cite to colleagues who would be put off by referring them to the standard itself. Your links to the new EPUB standard will satisfy both the skeptical and curious as well. Well done.

    What remains to be seen is how far we can go with CSS and JavaScript. That seems to depend more on eReaders than the standard itself, no? I believe that iBooks is based on WebKit, the HTML rendering engine used by Safari and Chrome. So, what about the other eReaders, both software-based and dedicated hardware? What do they use to render EPUB? I would expect that WebKit and Gecko (Firefox/Mozilla) would be well represented but I really don’t know.

  2. This opens up EPUB as a method of putting many textbooks and manuals into digital format – like the very weighty Logic textbook I have been hauling around for several months! PDF files don’t really work very well with small eReaders or iPhones.
    I think the possibilities EPUB 3 opens up can’t really be imagined yet: language tuition books with built-in audio or an instruction manual with video clips.

  3. @Andy, “JavaScript … Another attack vector for viruses.”
    Probably not so much since sandboxing is becoming de rigueur in mobile operating systems (following iOS) and even on desktop web browsers (following Chrome). The potential benefits of Javascript in EPUB are far greater than the dangers IMHO.

    I say “potential” because Javascript can be a great temptation to devolve into the trivial. Simply copying Javascript from web sites will not be sufficient. We need fresh new thinking as to how Javascript can enhance the reading experience instead of impeding it.
