How big should an e-book file be?

I asked myself this question after trying to troubleshoot a problem reading a Plucker file. Plucker Desktop lets you spider URLs to a certain depth, which means you could end up dealing with humongous files if you’re not careful. (Plucker Desktop actually offers lots of options for controlling file size.)

But what is a manageable file size? How big can an ebook be without taxing the constraints of the device or software or user experience? This question is complicated because ebooks are viewed on two different kinds of platforms: miniature devices and desktop/laptop based devices.

For the laptop, RAM and disk space are no longer hardware constraints. I’ve opened PDF files of several MB without a problem. On my main desktop, I have nearly one terabyte of storage and 3 gigs of RAM. It seems like overkill right now (I’m editing video with it), but in two or three years it will probably seem underpowered.

Portable devices still face significant hardware constraints, at least in comparison to desktops. My Nokia 770 has a 1 GB MMC memory card but only 64MB of onboard memory that the user can use. My Dell Axim had 128MB.

In the days of dial-up, the optimal size for a web page used to be 30-40KB, although nowadays blogs and longish articles frequently exceed 100 KB (for the record, the main TeleRead page is 178KB + 253KB for graphics). Actually, for me, the limitation is not the size of a web page but how many tabs I can leave open in Firefox.

E-books are typically binary files for offline reading. Look at the list of formats and file sizes for Cory Doctorow’s Eastern Standard Tribe. Some of the more esoteric formats can exceed 500 KB, but the popular ebook formats are in the 200-300 KB range. (The pbook version is 224 pages and contains 49,000 words.)

But Eastern Standard Tribe is a relatively small book, and my .imp version only has a single grayscale graphic. Many Gutenberg books are 1000+ pages, and each volume of the Mahabharata on blackmask is 2-3 MB. That’s without graphics. But what if you dealt with an ebook containing lots of multimedia?

First, images. How many images should an e-book contain? How big should the graphic files be? Second, audio, embedded flash and video. PDF and apparently dotReader have the capability to embed these multimedia objects. Good, yes, but if dotReader is ported to a PDA/phone, I’m guessing the bloat would render the e-book unusable. Still, the dotReader site notes for dotReader software “a notable memory management feature is the ability to display content only as it is being called up (as opposed to loading an entire document and then displaying it).” That sounds promising (and that may address hardware limitations), but it still does not help trim file size.

OK, we can agree that sound/Flash-enabled e-books are useful only in special contexts. But we’ve already grown accustomed to an online reading experience saturated with all kinds of graphics. Indeed, one significant advantage e-books have over pbooks is their ability to display graphics without significantly adding to the sticker price of the book. If a 224-page p-book had an illustration every three paragraphs (which is typical for blogs), you’re probably talking about coffee table book prices. If the e-book version had an illustration every three paragraphs, it might not cost more, but how much would it increase the size of the binary file? (And how convenient and practical is it for the user to save it on a memory card?)
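To get a feel for the answer, here is a back-of-the-envelope sketch in Python. Only the 49,000-word count comes from the book mentioned above; the paragraph count (1,000), the 40 KB per image, and the 6 bytes per word are assumptions picked purely for illustration, and compression is ignored.

```python
def estimate_ebook_bytes(words, paragraphs, paragraphs_per_image,
                         bytes_per_image, bytes_per_word=6):
    """Return (text_bytes, image_bytes) for an uncompressed e-book.

    bytes_per_word is an assumed average (letters plus a space).
    """
    text_bytes = words * bytes_per_word
    image_count = paragraphs // paragraphs_per_image
    return text_bytes, image_count * bytes_per_image


# 49,000 words is from the post; every other number here is an assumption.
text, images = estimate_ebook_bytes(
    words=49_000,
    paragraphs=1_000,
    paragraphs_per_image=3,
    bytes_per_image=40 * 1024,
)
print(f"text ~{text / 1024:.0f} KB, images ~{images / 1024 ** 2:.1f} MB")
# prints: text ~287 KB, images ~13.0 MB
```

Under these assumptions the pictures outweigh the text by a factor of roughly 45, so the image budget, not the word count, decides the file size.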

I am a content creator working on a DIY e-book, and have to admit I am clueless about how many graphics ought to be in an e-book, how compressed the graphics ought to be, or how big the final binary ought to be. Content creators need better guidelines (or benchmarks) about ebook output and file size. Not only do they need to figure out things like accessibility and screen real estate, they need to know what file size to aim for and how to optimize for low-end devices without the result looking crappy on higher-end devices. Obviously, these benchmarks are going to remain a moving target. For web designers, though, it was relatively easy to work within low-bandwidth constraints. You knew to fit everything on a page into 40KB, while providing a link to high-resolution versions of the graphics. The difference between designing for a browser and designing for an ebook reader is that a browser loads individual resources. All the graphics and multimedia remained available on the web even if they didn’t automatically load. But would it make sense for a zipped ebook to contain high-resolution and low-resolution versions of the same graphic?

To summarize, here’s what I’d like to know:

  1. When creating e-books, what should the optimal file size and resolution be for raster graphics?
  2. Do certain e-book platforms (such as PocketPC/Microsoft Reader or Palm/Plucker or Nokia/FBReader) have practical limits on the file size they can open?
  3. How big will a typical Sony Reader e-book be? (And how will that vary with the graphics?)
  4. How many e-books is a typical reader comfortable dealing with on his portable device? 5? 10? 50? 100? How many is the minimum the reader would expect to have on his memory card?
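Question 4 is partly arithmetic, so here is a rough Python sketch of how many books fit on a card. The 1 GB card and the 300 KB and 3 MB book sizes echo figures mentioned earlier in this post; the 20 MB "illustrated" size and the 10% of the card held back for other files are assumptions for illustration only.

```python
KB = 1024
MB = 1024 ** 2
GB = 1024 ** 3


def books_per_card(card_bytes, avg_book_bytes, reserve_percent=10):
    """How many books of a given average size fit on a memory card.

    reserve_percent is an assumed share of the card kept free for other files.
    """
    usable = card_bytes * (100 - reserve_percent) // 100
    return usable // avg_book_bytes


scenarios = [
    ("text-only novel, 300 KB", 300 * KB),
    ("very long text, 3 MB", 3 * MB),
    ("heavily illustrated, 20 MB (assumed)", 20 * MB),
]
for label, size in scenarios:
    print(f"{label}: {books_per_card(1 * GB, size)} books on a 1 GB card")
```

With these numbers a 1 GB card holds thousands of plain novels but only a few dozen heavily illustrated books, which is why the image guidelines matter more than the card size.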

9 COMMENTS

  1. Hi

    Just compare the iPod model with the Discman model.
    The Discman model said: take the CD you want to listen to, load it in your Discman and there you go.
    The iPod model, on the other hand, said: load your whole music library and have it with you all the time.
    This model worked out fine…
    I think ebook reader manufacturers should follow this model and aim to keep the user’s entire library on the device, always accessible.

  2. One feature of OpenReader, and the similar OEBPS, is that the textual content of a publication may be split up into a number of files. For example, each chapter in a “linear” book may be contained in its own content document. This partitioning of content allows limited resource user agents (ebook reading software) to only need to load in a chunk of text at a time, yet the overall presentation will be seamless as if all the content is in a single document.

  3. Hi,

    The size of the file is less important than the way the device/software manages it. On my Nokia 770, my biggest txt file is about 12 MB; I read it with FBReader from a zipped directory, and it loads a little more slowly than the others, but acceptably so, while navigation is as fast as with the others.
    I also have lots of scanned books that I read as JPEGs cut to 800×480 (either full page or half page) and embedded in a blank HTML file, the biggest being about 98 MB. They load and read as fast as a regular book since FBReader reads them a page at a time, while even a 1 MB PDF in Evince is not readable due to slowness, scrolling, zooming…
    On my Ebookwise 1150 I have the same books in imp format (the JPEGs now being 318×448 and all cut to half pages due to the lower resolution, embedded in HTML and converted to imp with librarian). Even the biggest, at ~100 MB, is as fast as any other book; it is just that, due to card limitations, I have to use more cards than on the Nokia, where a 1 GB card is acceptable.
    So size is not the determining factor, but fast rendering and acceptable storage.

    Liviu

  4. One obvious solution – allow the reader to decide if THEY want to view graphics, hear sounds, etc, and configure the eBook file(s) accordingly. It’s done routinely on the Web. In fact, if it’s not made part of standard e-reading software then I suspect that someone will soon manage to hack it in.

    Jon.

  5. > But would it make sense for a zipped ebook to contain high resolution
    > and low resolution versions of the same graphic?

    IMHO it would be better to use a multi-resolution capable format, such as JPEG 2000.

    > When creating e-books, what should the optimal file size and resolution
    > be for raster graphics?

    That’s a really hard question. If the images are small then they might look as good as possible now but will look bad in the future when the displays are better. If the images are large then they will look good in the future but the files will be too big and/or slow to use now.

    > How many is the minimum the reader would expect to have on his memory
    > card?

    As many as possible. 🙂 I have a ton of dictionaries, reference manuals and academic literature that I’d like to have on my e-book device(s) at all times. I think that the absolute minimum number of titles that my e-book device would have to fit is 100. It sounds feasible to me to keep the average file size below 10 MiB, so 100 titles should easily fit on a 1 GiB memory card. However, if books start containing many more pictures then the file sizes will increase drastically. Hopefully we’ll have bigger memory cards by then. 🙂

  6. > One obvious solution – allow the reader to decide if THEY want to view
    > graphics, hear sounds, etc, and configure the eBook file(s) accordingly.
    > It’s done routinely on the Web.

    Yes, but when you switch off images, sounds, animations, etc. on your browser then the browser won’t even download the image, sound, etc. files and thus the webpages get smaller. This won’t work as well with e-books, since the images, sounds, etc. would have to be included in the e-book file anyway, and thus be using memory even if they are not used. However, even if the files are equally big regardless of the device configuration it would still be possible to save a lot of CPU time (which might even be more valuable than memory) if the reader chooses not to process/display all contained media elements.

    I’m sure your guess, that e-book software that supports images, sounds, etc. will be able to easily switch off these additional media types, is correct. At least if the software is not made by Microsoft.

  7. Robert,

    Complex topic.

    Lots of readers already only load parts of the document into memory — there’s no need to wait for OpenReader. The PalmOS reader for Plucker does this, for instance, only loading the “records” that the user is currently reading.

    On the subject of file size, 2GB is an interesting size boundary for a single-file format. Files larger than that aren’t handled properly by lots of libraries that are used in standard desktop/laptop applications. One can get around that by using Apple-style multi-file documents which are in reality folders containing lots of sub-documents. Most systems handle this properly. It’s the approach I took with the document format design for UpLib. UpLib “books” can be very large; the UpLib version of Christoph Schiller’s Motion Mountain physics text (1253 pages) is a little over 1 GB. The book Common Lisp the Language, Second Edition (1097 pages) is 295 MB.

    Aside from that, network bandwidth is the determining factor for lots of document usage. If you can’t mail it to someone else as an attachment, because it’s too big — that’s a real determining factor for lots of people.

    > But would it make sense for a zipped ebook to contain high resolution and low resolution versions of the same graphic?

    Plucker does this — a small version of the image for smaller devices, a larger version for large screens. UpLib stores multiple rasterizations of each page image — one at 300 dpi or better, one at “screen size”, one small page thumbnail with a big page number on it. It can also store more versions for special uses.

  8. I think you should make a clear distinction between fiction and education reading here.

    Fiction books should come with minimal pictures (give them a “cover” for easy recognition) and small file sizes, 100-500k depending on the length of the book/series.
    The determining factor for me is the reflowability (to use the PDF term) of the document. Making a format that has to be zoomed and scrolled is a waste of time in my opinion.

    Educational reading is something that can eat up a lot more space. Files should be compressed, and images made “scalable” so that they can be viewed on every screen size without losing either detail (small screen) or smoothness (large screen).
    The maximum file size here is difficult to pin down… If you’re talking about black-and-white/grayscale e-ink screens, you could put the limit around 20MB; if you use full-color illustrations, you could end up with files that exceed 100MB even when compressed.

    Devices with colour screens will have to come with more built-in memory (ram) and should use fast and cheap cards (like CFII) for external storage. More RAM would also come in handy.
    For the greyscale readers, 1GB should be overkill and the speed of processor and memory should be less important.
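Several comments above describe the same idea: split the book into per-chapter pieces and load only the piece being read. Here is a minimal Python sketch of that idea using an ordinary zip archive. The file names and layout are hypothetical and do not follow the actual OEBPS, OpenReader or Plucker formats.

```python
import io
import zipfile


def build_demo_book():
    """Create a tiny zipped 'book' in memory, one file per chapter."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for number in range(1, 4):
            # Hypothetical naming scheme, chosen only for this example.
            archive.writestr(
                f"chapter{number:02d}.html",
                f"<h1>Chapter {number}</h1><p>Text of chapter {number}.</p>",
            )
    return buffer


def list_chapters(book_file):
    """Read only the archive's index, not the chapter contents."""
    with zipfile.ZipFile(book_file) as archive:
        return sorted(archive.namelist())


def read_chapter(book_file, name):
    """Decompress and return just the one requested chapter."""
    with zipfile.ZipFile(book_file) as archive:
        return archive.read(name).decode("utf-8")


book = build_demo_book()
chapters = list_chapters(book)
print(chapters)
print(read_chapter(book, chapters[1]))
```

Because each chapter is a separate archive member, the reader software only needs memory for the chapter on screen, however large the whole book is. The archive on the card is of course no smaller for it.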
