From Project Gutenberg e-text to PDF-based e-book for your Iliad


Modern E-ink based reading devices try to simulate the printed book. Where older devices lacked the definition to make typography something to worry about much, the resolution of devices such as the Iliad is high enough to make readers crave well laid out e-books. Luckily, with Project Gutenberg's HTML versions and Writer, good-looking PDFs are a matter of minutes.

The following tutorial may also be useful for those who wish to turn an e-book into a p-book.


Over the years it has been suggested that Project Gutenberg should preserve the page numbers of the books it scans. This, coupled with preserving edition statements, would allow students of texts to determine exactly which artifact they are citing.

But preserving page numbers in e-books somehow feels wrong. E-books are liquid; they are not tied to archaic concepts of pages. And so Project Gutenberg stubbornly kept discarding page numbers for a long time.

Enter e-paper based reading devices. Irex in particular wants its E-ink based reader, the Iliad, to emulate the printed page. The page flipping bar is designed exactly so that reading a text on the Iliad feels like flipping pages.

In a delicious twist of irony, the Distributed Proofreaders recently started to post e-books to Project Gutenberg that retain page numbers. That is unfortunate, because typically the pages of your new PDF version will not map exactly to the original pages; you will want to remove the page numbers.


In order to turn an HTML-based e-text from Project Gutenberg into a PDF file for your Iliad, you will need the following tools:

  • A web browser with which to download the book you’d like to read
  • Writer

Whew! Writer is part of the OpenOffice.org package. I do not believe you can download it separately.

You may be able to use other word processors. In this tutorial I will be using the following functionality:

  • the spell-checker, with dictionaries for the relevant languages
  • settings for page sizes, page margins and page numbers
  • “Search for Styles”
  • style editor that retains the custom styles from the HTML file
  • PDF export

Step 1: acquire the e-book

Go to Project Gutenberg (PG) and download your e-book in HTML format. Not all of PG’s e-books are stored as HTML; about 60% aren’t. For the sake of this tutorial, I’ll assume that the book you want is a PG e-text with an HTML version available.

PG has a search engine that will let you find books stored in a certain format. However, it is slow, and with thousands of books available in glorious HTML, there’s not much chance you will find what you are looking for this way.

For this tutorial I will use H. Beam Piper’s “Little Fuzzy” as an example. The book is fairly simple: it contains no illustrations, page numbers, footnotes and so forth. However, most of the techniques described below can also be used for more complex books. For instance, you can remove page numbers by using the Search for Styles function.
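If you would rather strip page numbers from the HTML file before it ever reaches Writer, something like the following might do. It is only a sketch, and it rests on an assumption that is not part of this tutorial: that the file marks each page number as a span with the class "pagenum", with no other spans nested inside it. Check the source of your own file first. The function name is my own invention.

```python
import re

# Assumption: page numbers look like <span class="pagenum">...</span>
# and contain no nested <span> elements. Adjust the class name to
# whatever your own file uses.
PAGENUM_RE = re.compile(
    r'<span\b[^>]*\bclass="pagenum"[^>]*>.*?</span>',
    re.IGNORECASE | re.DOTALL,
)


def strip_page_numbers(html):
    """Return the HTML with every matching page-number span removed."""
    return PAGENUM_RE.sub("", html)


if __name__ == "__main__":
    sample = '<p>He <span class="pagenum"><a id="Page_12">[12]</a></span>ran.</p>'
    print(strip_page_numbers(sample))
```

A regular expression is a blunt tool for HTML. For anything more complicated than these simple spans, Search for Styles in Writer is probably the safer route.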

Sometimes PG’s e-books have an external stylesheet. In those cases, be sure to save the page as “Complete web page” instead of “Just HTML”.

Start Writer and use the File / Open menu to load the e-book. Writer can edit HTML files.

It is possible to copy an e-book from your browser window and paste it into a new Writer document, but that has the disadvantage of discarding custom styles.

Upon loading the file, immediately use File / Export… to save it as a Writer document (.SXW). If you don’t do this, Writer will keep treating the file as a web document. A disadvantage of this is that some of the Writer functionality will be unavailable to you.

Close the HTML document and open the Writer version.

Step 2: remove the PG header and footer

Any given PG e-text contains a lot of legalese, both at the front and the back. These are known as the PG Header and the PG Footer; they outline the usage you may make of the PG trademark. Both are clearly marked as such, and you will likely want to remove them. You may also wish to remove the information about which volunteers worked on a book, which PG typically considers part of the book.
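For those who work from the plain-text edition, the same cut can be sketched in a few lines of Python. This is a rough illustration, not part of the Writer workflow. It assumes the header ends at a line starting with "*** START OF" and the footer begins at a line starting with "*** END OF"; the exact wording of these markers varies between e-texts, so treat the patterns as assumptions. The function name is hypothetical.

```python
import re

# Assumption: the marker lines start with "*** START OF" and "*** END OF".
# Older e-texts may use different wording.
START_RE = re.compile(r"^\*\*\* ?START OF .*$", re.MULTILINE)
END_RE = re.compile(r"^\*\*\* ?END OF .*$", re.MULTILINE)


def strip_pg_boilerplate(text):
    """Return the text between the PG header and footer markers.

    If either marker is missing, or they appear in the wrong order,
    the text is returned unchanged so nothing is lost by accident.
    """
    start = START_RE.search(text)
    end = END_RE.search(text)
    if not start or not end or end.start() < start.end():
        return text
    return text[start.end():end.start()].strip()


if __name__ == "__main__":
    sample = (
        "Header legalese\n"
        "*** START OF THIS PROJECT GUTENBERG EBOOK LITTLE FUZZY ***\n"
        "\n"
        "Chapter I\n"
        "\n"
        "*** END OF THIS PROJECT GUTENBERG EBOOK LITTLE FUZZY ***\n"
        "License text\n"
    )
    print(strip_pg_boilerplate(sample))
```

The credits line usually sits just after the start marker, so it survives this cut; remove it by hand if you wish.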

Step 3: set the language

Go to Tools / Options …. Select the Language Settings / Languages item. Select the default document language, and check the For the Current Document Only box.

[screenshot of the Language Settings dialog]

Step 4: get the basic layout right

Whatever you do in this step, don’t start applying font changes and so on just yet. Use this and the next step first to make sure all the styling of the HTML document is preserved, but in a form you prefer.

Go to Format / Page…. Regulars at Mobileread, a site that has a lot of Iliad owners posting to its forums, have found the following settings useful:

Setting          Value
Format           User
Width            13.00cm
Height           17.00cm
Margin left      0.60cm
Margin right     0.60cm
Margin top       0.50cm
Margin bottom    0.50cm
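If your tool wants points or inches instead of centimetres, or you want to know how much room is left for text, the arithmetic is simple. Here is a small sketch using the values from the table above; the dictionary keys and function names are mine, not Writer's.

```python
CM_PER_INCH = 2.54
POINTS_PER_INCH = 72  # the usual PostScript/PDF point

# Page settings from the table above, in centimetres.
PAGE = {
    "width": 13.0,
    "height": 17.0,
    "left": 0.6,
    "right": 0.6,
    "top": 0.5,
    "bottom": 0.5,
}


def cm_to_pt(cm):
    """Convert centimetres to points."""
    return cm / CM_PER_INCH * POINTS_PER_INCH


def text_area(page):
    """Return (width, height) of the area inside the margins, in cm."""
    width = page["width"] - page["left"] - page["right"]
    height = page["height"] - page["top"] - page["bottom"]
    return width, height


if __name__ == "__main__":
    w, h = text_area(PAGE)
    print(f"Text area: {w:.2f}cm x {h:.2f}cm")
    print(f"Page size: {cm_to_pt(PAGE['width']):.1f}pt x {cm_to_pt(PAGE['height']):.1f}pt")
```

With these margins the text area works out to roughly 11.8cm by 16cm, which is why the layout is described further down as margin-hugging.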


[screenshot of the title page]
Illustration: Little Fuzzy’s title page at this point.

Step 5: fixing styles

You now have an e-book suited more or less for the physical format of the Iliad screen. However, you may not like the way the default styles look. Here are some suggestions for changing styles.

Since I imported this text from an HTML original, all textual elements are associated with a certain style. You can add, edit and delete styles by opening the Styles and Formatting dialog: Format / Styles and Formatting…, or press F11. A document can contain many styles, and Writer makes it easier to navigate through them by organizing styles in categories. The two categories you will be concerned with are Custom Styles (the styles the PG volunteers added to the document) and Applied Styles (the styles that are actually used in the document).

[screenshot of the bottom part of the Styles and Formatting dialog]
The style catalog selector is hidden all the way at the bottom of the Styles and Formatting dialog.

Writer’s Edit / Find & Replace… has a Search for Styles option under More Options. Check the associated checkbox and the search field will turn into a style selector.

Search for each available custom style to see what it looks like. Make notes or screenshots, in case you edit a style later on and want to undo the change.

The example document is optimized for the web. Paragraphs are in large letters and are separated by empty lines. These features are not necessary for a printed book or for an e-book to be read on an e-paper device. So let’s change these.

Leave the Find & Replace dialog and conjure up the Format / Styles and Formatting… dialog once more. Choose the Applied Styles catalog. Select the style for regular paragraph text, right-click on it, and in the resulting context menu choose Modify….

The paragraph style for PG’s Little Fuzzy is called “Text Body”. First change the font to one that is less web optimized. In this tutorial I will pick that old chestnut Times New Roman, at a size of 10.5 points. PG’s Little Fuzzy uses Georgia, which is a pretty letter but optimized for a relatively low-quality screen.

Next move to the Indents & Spacing tab. Set the Spacing Below Paragraph to 0.00cm. Also set Indent First Line to 0.97cm. With a margin-hugging document like this, you may also wish to increase the line spacing a tad, or use a larger font. Experiment until it feels right.

Changing the “Text Body” style has also changed all its dependent styles. This is unfortunate, because longer pauses (signified by more whitespace) have now disappeared. The next step is therefore to reintroduce a vertical space in the “Text body.spacedTop” style.

Open the Modify dialog for “Text body.spacedTop”, select the Indents & Spacing tab, and set the Spacing Below Paragraph to 0.50cm.

With every modification, check and make sure the results are as you want them.

Step 6: page numbers

Go to the top of your document. Select Insert / Footer / All. The cursor should jump to the bottom of the page. Now select Insert / Field / Page Number. As you’ll notice, all pages now have page numbers.

Select the first page number, and click the Align Right icon in the Formatting toolbar. All page numbers will now rest against the right margin.

Give it the once-over

Check the document for problems. The spell-checker’s red squigglies may help you locate trouble spots. You can manually adjust the style of just one paragraph, word or even character by selecting the phrase you want to edit, then choosing the relevant Format menu. Make sure all non-ASCII characters, such as accented letters, em-dashes and curly quotes, are displayed correctly.
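To know which non-ASCII characters to look for, it can help to list them first. The following is a minimal sketch that works on any text you paste or read into a Python string; it is a helper of my own devising, not a Writer feature.

```python
import unicodedata
from collections import Counter


def non_ascii_inventory(text):
    """List each distinct non-ASCII character with its Unicode name and count.

    The result is sorted by code point, so accented letters come
    before dashes and curly quotes.
    """
    counts = Counter(ch for ch in text if ord(ch) > 127)
    return [
        (ch, unicodedata.name(ch, "UNKNOWN"), n)
        for ch, n in sorted(counts.items())
    ]


if __name__ == "__main__":
    sample = "caf\u00e9 \u2014 \u201cok\u201d \u2014"
    for ch, name, n in non_ascii_inventory(sample):
        print(f"U+{ord(ch):04X}  {name}  x{n}")
```

Each character on the list is then worth one quick search in the document to see whether it displays as intended.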

Export as PDF

Select File / Export as PDF.

Step last: you are done!

[screenshot of the title page]
Illustration: Little Fuzzy’s title page at the end of the road.

Go read the book! Here’s an example PDF.

This tutorial is generic enough to be used for other e-readers, or even to produce a PDF for a printed book.

The boring tail

Disclaimer 1: I do not own an Iliad or any other reader that requires paging. (I own a Palm Zire, and am quite content with the lack of “pages”.)

Disclaimer 2: I would not know “pretty” if it hit me in the face. If you want pretty e-books, I suggest you apply your own good taste. With the above tutorial I hope to have handed you the tools to do just that. At the Mobileread forums there are people with very strong opinions about what looks “good”.

Disclaimer 3: This tutorial is probably woefully incomplete. Please add your own tips in the Comments section or on the Mobileread Wiki. Also, Lulu’s forums might provide you with typesetting hints for p-books that apply equally to e-books for the Iliad.

Tip: acquaint yourself with your tools. Writer is a program of huge complexity that lets you do all kinds of book-like things with your document.

17 Comments on From Project Gutenberg e-text to PDF-based e-book for your Iliad

  1. Ultra-useful post, Branko—thanks! I myself, in fact, while owning a Librie, used OpenOffice’s conversion and save/as or export capabilities to create PDFs in the right size. I probably mentioned it in the TeleBlog, but not in the same detail you did. Thanks again. David

  2. What is the point of removing the Gutenberg header and footer? I understand moving the legalese to the end of the document – Gutenberg is already doing that in newer releases, with only a brief header, and I’ve done it myself so I don’t have to wade through it every time I open the document.

    However, if you actually *read* the legalese, one of the requirements/requests is that all the Gutenberg license language remain attached to the document. As a happy Gutenberg user I believe it’s a fair tradeoff for free-as-in-beer, non-DRMed texts to give credit where credit is due and leave the Gutenberg information attached.

    Even if you (that’s the general, not accusatory “you”) know where the text came from, you never know who you may end up sharing it with, and leaving the Gutenberg header/footer text in the document allows that, and also constitutes proof that the text is indeed in the public domain.

  3. “What is the point of removing the Gutenberg header and footer?”

    That way you remove ugly crud that diminishes the enjoyment of a good looking book.

    Also, if you had actually *read* the legalese, you would know that removing the license from the document is an actual requirement in some cases.

    “If you actually *read* the legalese, one of the requirements/requests is that all the Gutenberg license language remain attached to the document.”

    I must have missed that bit. Please show me where it says so.

  4. Agreed, if you feel you are creating a derivative work, you must remove all mention of PG. On the other hand:

    1.E. Unless you have removed all references to Project Gutenberg:

    1.E.1. The following sentence, with active links to, or other immediate access to, the full Project Gutenberg-tm License must appear prominently whenever any copy of a Project Gutenberg-tm work (any work on which the phrase “Project Gutenberg” appears, or with which the phrase “Project Gutenberg” is associated) is accessed, displayed, performed, viewed, copied or distributed:

    This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at

    And also

    1.E.4. Do not unlink or detach or remove the full Project Gutenberg-tm License terms from this work, or any files containing a part of this work or any other work associated with Project Gutenberg-tm.

    1.E.5. Do not copy, display, perform, distribute or redistribute this electronic work, or any part of this electronic work, without prominently displaying the sentence set forth in paragraph 1.E.1 with active links or immediate access to the full terms of the Project Gutenberg-tm License.

    IANAL, so I may be misunderstanding, but it does seem to ask that users leave the license agreement in place. And I reiterate, I move the full license agreement to the end of documents myself (when I’m not feeling lazy and just download and read as-is).

    I guess I don’t see the brief headers in the newer texts as all that ugly, either, any more than the usual copyright/publisher information found in printed books.

  5. Indeed, as you can see from the bit you quoted, if you remove the trademark license, you must remove all references to Project Gutenberg. If you leave in a reference to Project Gutenberg, you must leave the entire license intact.

    Which is exactly what I did: I removed the entire license, and all references to Project Gutenberg.

  6. Has the sample file been removed? When I try all I get is a 404 error from a server unrelated to

  7. Er, more like I forgot to check whether it was available in the first place. I am not at my computer right now, will check when I get home.

  8. I fixed the link, or so I hope.

  9. you don't know me // September 23, 2006 at 6:49 am //

    Step 2 is very rude.
    If you take something for free from somewhere, you don’t delete the references. Branko’s arguing from 2:39pm is nonsense.

  10. Does anyone agree with the anonymous coward?

  11. Technically, the whole point of a Gutenberg book is that it’s public domain. And the whole point of public domain is that you can do anything you want to with the text. That’s what public domain is.

  12. you don't know me // September 24, 2006 at 7:47 am //

    The books at Gutenberg are public domain. “A Gutenberg book” in that context is strange wording. I’m not affiliated with the Project Gutenberg; though, I read about conflicts with PG in this blog. I don’t care about this conflict. But it’s really bad manners to explicitly point out in a howto like this to delete the PG header/footer. At least, those in the PG header did the scanning, checking & they provide it for download. for example makes nice PDFs including a reference to PG. Those PDFs don’t look ugly.

  13. Presumably Project Gutenberg has a copyright on its trademark license. You aren’t just allowed to include it. There are rules you must follow, and if you do not want to follow those rules, but still want to copy the book, you MUST remove the license.

    It would seem that YDKM would prefer me to break the law.

  14. you don't know me // September 24, 2006 at 12:06 pm //

    I don’t want you to break the law! But am I right that Step 2 is all about your quarrel with PG?
    I thought this howto is about creating a PDF for the ILiad, and not some rant about something. That’s why I think Step 2 is just not necessary, especially this picky phrasing.

  15. My quarrel with PG? What is my quarrel with PG?

  16. As one of the people who did much of the PG work on Little Fuzzy I have to agree with Branko. You can’t move the PG header around, or do anything with it other than completely remove it. You can leave the PGDP credits line in if you want as it doesn’t infringe on the PG trademark issues. If you want to do anything commercial with the text you must remove the headers or execute an individual license with PG.

    Greg Weeks

  17. Re-reading these comments a couple of months later, I see that I may not have been clear about everything, so allow me to elucidate.

    First of all, I have no quarrel with PG. There are some bloggers at Teleread who have, in the past, maintained a critical attitude to Project Gutenberg, mainly because they saw room for improvement. There is nothing wrong with that, and it’s not the same as having a quarrel.

    Anyway, I am not one of them. I myself am a Project Gutenberg volunteer; I may not be as prolific as I sometimes would like to be, but I still have shepherded somewhere between 2 and 5 books back into the public domain.

    I do have a quarrel with some other PG volunteers, and I do sometimes wish that PG and DP would not treat their volunteers as disposable so easily, but I guess them’s just the breaks of working for a loosely bound, internet based volunteer organisation.

    There is another thing Greg Weeks touched upon: the credits line. I agree with those who say that credit for volunteer effort is important. Greg says you can leave the credits line intact. So why did I remove it from the sample document? I did so because I am not convinced that the people mentioned there would like to be mentioned in a non-PG version of the work. At DP there are many people who simply do not want to be mentioned in the credits line in the first place; because of false modesty, or because they do not want to see their name plastered over the internet in places where they do not have control over it, or because they don’t want to create the impression that they’ve been working on a book in their boss’ time, and so forth.

    Credit is a fragile thing; you should give it where it is due, but you should also make sure that those who receive it, wish to receive it.

