Moderator’s note: Here’s a chance for TeleBlog fans to put on their detective’s hats and crack this fun mystery. Given the caliber of our most attentive readers, I predict that the right answer will appear very soon. – David Rothman

The topic of text alteration during the creation of e-books has been discussed several times in the TeleRead blog—for example, here.

To offer a new example, consider a popular e-book in the Project Gutenberg archive that has been altered by the deletion of textual material from the original book. This modification was almost certainly not performed deliberately by the Gutenberg volunteers. Indeed, it seems likely that they are unaware of the changes to the original text.

Bowdlerization alert

The simplest explanation for the existence of the cuts in the Gutenberg text is that an altered edition was used for scanning and proofreading. The editorial goal of the cuts was not simplification; instead, the cuts appear to be a modern form of bowdlerization. You may wish to guess the name of this famous book. Here are three hints to help:

Hint 01: The book and its sequels inspired multiple films and even a comic strip.

Hint 02: In the 1930s a film inspired by the book caused a scandal by displaying full-frontal nudity. The reaction evoked by this movie helped to catalyze Hollywood’s movement toward self-censorship in the following decades.

Hint 03: There is a district in California whose name is derived from this famous book.

Not criticizing PG volunteers

Please note that I am not criticizing the superb work of the Project Gutenberg volunteers. But I do think that it would be wonderful if e-books created by scanning included careful descriptions of general provenance with edition numbers, publisher names, and dates (a hypothetical example of such a record appears below). For the work under discussion, I think that an e-book based on the original would be a great addition to the Project Gutenberg archive and other archives. Of course, the archiving of e-books based on expurgated editions is also valuable for understanding cultural mores, taboos, and more.
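As a sketch of what such a provenance record might look like (every field name and value here is hypothetical, not drawn from any actual archive entry):

```python
# Hypothetical provenance record for a scanned e-book; all fields and
# values below are illustrative, not taken from an actual archive entry.
provenance = {
    "title": "Example Novel",
    "author": "A. N. Author",
    "source_edition": "Second printing, revised",
    "publisher": "Example & Sons",
    "publication_year": 1917,
    "scan_source": "hardcover copy, binding intact",
    "known_alterations": "none identified against the first edition",
}
```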

I will reveal the title of the book if it is not deduced. Also, if there is a modicum of interest, I can describe the cuts that were made to the original text and suggest why they were made.

Another moderator’s note: The TeleBlog welcomes reader contributions, especially gems like the one above. We won’t use your real name if you prefer anonymity, as long as your facts are verifiable.

Photo credit: Creative Commons-licensed photo by Cindy Andrie.

Update, Feb. 13, 2007: Garson O’Toole (formerly known by a different pseudonym, Garson Poole—after a character in a Philip K. Dick story) okayed the use of his now-standard TeleName. Earlier the byline read simply, “A TeleRead reader.”

16 COMMENTS

  1. _Tarzan of the Apes_ by Edgar Rice Burroughs. The original 1914 book edition was revised for the Ballantine paperback published in the late 1960s or early 1970s, with the goal of reducing black stereotyping and slightly modernizing the prose. Similar revisions were made in many (but not all) of the following books in the series, in a hit-or-miss fashion that was totally misguided. Often elements that could cause offense were retained while innocuous uses of the word _black_ were deleted or changed to _African_ or the equivalent. Michael Hart has known about this for at least 10 years and was even supplied etexts of both the original and a variorum version.

  2. Assuming the first commenter is right, and that the mystery work is Tarzan of the Apes, does this mean the current PG etext version of this book is derived from the Ballantine paperback published in the 1960s or ’70s?

  3. This issue–using a post-1922 edition as a source for a scan–is a real problem. It seems to be an argument for scanning only the original text. However, it is often 10 times more convenient to scan a 1970 version that claims to be the same text than to scan a 1910 original.

    One underlying problem is that editions don’t state what kinds of changes they have made, whether minor or affecting the content itself. My IANAL understanding of derivative works is that slight editorial changes do comprise a wholly distinct work, worthy of being copyrighted. But how can scanners know what changes have taken place without being able to scan both editions and run a diff? (See the sketch below.)

    This points to another lunacy inherent in the concept of derivative works, and to the value of Bridgeman v. Corel in determining originality.
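    A minimal sketch of that comparison, assuming plain-text transcriptions of both editions are on hand (the filenames below are hypothetical), using Python’s standard difflib:

    ```python
    # A minimal sketch of diffing two editions of a text. It assumes you
    # have plain-text files of each edition; both filenames are hypothetical.
    import difflib
    import sys

    with open("edition_1914.txt", encoding="utf-8") as f:
        original = f.readlines()
    with open("edition_1970.txt", encoding="utf-8") as f:
        revised = f.readlines()

    # unified_diff yields only the lines that differ, with a little context,
    # so cuts and substitutions in the later edition stand out immediately.
    sys.stdout.writelines(
        difflib.unified_diff(original, revised,
                             fromfile="1914 first edition",
                             tofile="1970 reprint")
    )
    ```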

  4. Access to pre-1922 editions of popular works is now much easier thanks to Google Books and Microsoft Live Search. Many treasures can be found there (and some problems in the scans).

    My IANAL understanding is that editorial changes have to be substantial to qualify for a new copyright. Besides, on anything pre-1989(?) a new copyright notice would be required to establish copyright.

    The problem is that PG (until recently) chose to ignore the provenance of the books it scanned. I would bet that with a bit of effort one could uncover similar problems in later texts of almost all the classics and semiclassics. (A good place to start looking would be O. Henry and Jack London, neither of whom was very politically correct. Save us from the pious editors!)

    I tried to post a link to “Tarzan Censored,” an article detailing some of what was done to Burroughs’ books. The blogger software bounced it to the webmaster for approval due to the link. It’ll probably show up eventually. I should point out that Burroughs used dialog for characterization, and that the use of the N word always occurs in the mouths of villainous or otherwise disreputable characters. As to the Jewish villain in _Tarzan and the Golden Lion_, a good counterpoint is the very sympathetically portrayed elderly Jew in “The Moon Men,” especially in the significantly longer magazine version (which I believe is used for the PG text). Even more pertinent to his viewpoint is the posthumously published _Marcia of the Doorstep_, where a very stereotypically Jewish villain is given his comeuppance by an upright Jewish lawyer, who tells him something like, “I’d rather put someone like you behind bars any day than a Gentile. It’s people like you who make all of us look bad.”

  5. Robert, I think this also shows the need, in digitizing the Public Domain, to create “digital text masters” which are, text-wise, faithful to the original source paper books. These digital masters form a stable frame of reference; others can take the digital masters and create their own digital renditions for whatever purpose they want.

    The problem with the original PG philosophy is that there’s no requirement for textually faithful digital masters — or, alternatively, that anything can be a digital master. Thus, there’s no frame of reference to rely upon, and we see in the PG collection an “anything goes” philosophy, which has led to a host of unforeseen problems since Michael Hart got PG started (this is understandable; one can never foresee everything when launching a project). Now Distributed Proofreaders (DP) has added rigor to the process, and their work product is intended to be textually faithful to known source books. But DP did not come onto the scene until well after the most popular books were already digitized and archived at PG.

    Certainly some would argue that many of the public domain printings of classics are themselves edited from the original (whatever the original is!), and thus to require faithful digital masters (faithful to source printings) is silly. On the contrary, it is vitally important that we not add to the confusion — the well-known phrase “two wrongs don’t make a right” comes to mind.

    And certainly, for the most popular works which have multiple Public Domain editions, there’s no requirement to choose which edition is “canonical.” Simply consult with experts (usually literature professors) as to which edition(s) of a Work are worthy of digital mastering, and eventually digitally master them all. (Meaning we can have more than one digital master for a given Work.)

    Several of us are exploring an independent non-profit project to create very highly accurate “digital text masters” of the most popular Public Domain works in the English language (including translations — note that this project can diversify and create digital master repositories in other languages). This would include building a digital master repository allowing a range of interactivity with the digitized texts, as well as archiving the original, archival-quality page scans. Of course, we’ll generate a set of ebook versions from the digital masters, which will be useful in education and libraries (where provenance, accuracy, and rigorous curation are important), and for general readers who prefer to spend their time reading known accurate and faithful texts rather than something of unknown provenance and faithfulness.

    This project would work with PG, Distributed Proofreaders, the Internet Archive, academia, the library community, and various other organizations interested in digitizing the Public Domain, since there’s no need to compete — we see our role as complementary to what they are doing. The project’s focus will be quite narrow, maybe 500 to 1000 Works in the next decade, while PG and DP are focusing on creating digital texts for everything under the sun, including quite obscure works. Our view is that the most important Public Domain works should get extra special treatment.

    In addition, as a bonus, having a large repository of highly accurate digital text masters tied to known (and digitally scanned) original source books has benefits in research, especially for improving:

    1. OCR accuracy on older books,
    2. OCR post-processing algorithms, and
    3. various methods of proofing (e.g., online collaborative proofing like DP’s system).

    It is difficult to know (and thereby predict) the error rate of the processes used to produce digital texts without having some trusted, accurate text with which to compare. Projects taking page scans and using OCR to auto-generate digital text that will not be human-proofed certainly want to reduce the OCR error rate as much as possible (a sketch of such a comparison appears below). Thus, we see the “Digital Masters” collection as becoming a comprehensive test suite to improve the quality of digital text auto-generated from millions of books and other kinds of textual works in the not-so-distant future.

    Of course, anyone interested in this project can contact me privately: jon@noring.name. We are looking at innovative funding and revenue models, all the while working to make sure the work product is freely available to the public. Should we get this going, it will be built on a sound business plan with proper governance and a startup-experienced management team.
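    A rough sketch, in Python, of the error-rate measurement described above; the sample strings are hypothetical stand-ins, and the edit-distance routine is a standard dynamic-programming implementation, not code from the project itself:

    ```python
    # Character error rate (CER): the edit distance between OCR output and
    # a trusted digital master, divided by the master's length. The two
    # sample strings below are hypothetical stand-ins for real texts.
    def edit_distance(a: str, b: str) -> int:
        """Levenshtein distance, computed one dynamic-programming row at a time."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            curr = [i]
            for j, cb in enumerate(b, start=1):
                curr.append(min(prev[j] + 1,                 # deletion
                                curr[j - 1] + 1,             # insertion
                                prev[j - 1] + (ca != cb)))   # substitution
            prev = curr
        return prev[-1]

    master = "a pitiful remnant of what once had been a mighty tribe."
    ocr_output = "a pitifull remnant of what once had heen a nighty tribe."

    cer = edit_distance(ocr_output, master) / len(master)
    print(f"Character error rate: {cer:.2%}")
    ```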

  6. Munango-Keewati – We’ve had problems with spam (hundreds of thousands of spam comments). Alas, the anti-spam Dobermans sometimes eat things they shouldn’t.

    Feel free to email me (drNOSPAMteleread.org) what they deleted—and to email me in the future if this happens again.

    Sorry about that!

    Also, I hope you’ll email me about becoming a regular contributor to the main part of the blog. – David

  7. Jon,

    The “Digital Masters” project sounds quite worthwhile. The most difficult task would be to establish the most authoritative paper editions and translations. I hope you’ll consider including a wiki mechanism to allow knowledgeable lay people to contribute to this effort.

    A great deal of expertise lies outside the academic community, among literary hobbyists. As an example: With Burroughs, the most authoritative texts were the first edition hardcovers published during his lifetime, and later hardcover reprints that used those same printing plates (A.L. Burt and early G&D). However, one book (_The Mad King_?) scrambled a paragraph on the first or second page of the first edition, which was corrected in the next printing. Similarly, there are the apparently author-approved changes to _Tarzan and the Golden Lion_, which have appeared in all U.S. editions subsequent to WWII.

    In general, Digital Masters probably should be based on the author’s last revised edition (when known) or the first edition. Bibliographical notes should be included in each etext; obvious mistakes should be corrected, with those corrections noted at the end of the volume (as PG sometimes does).

    (Please excuse misspellings in previous posts; obviously I’m becoming too dependent on my spellchecker, which isn’t available here.)

  8. Munango-Keewati wrote:

    The “Digital Masters” project sounds quite worthwhile. The most difficult task would be to establish the most authoritative paper editions and translations. I hope you’ll consider including a wiki mechanism to allow knowledgeable lay people to contribute to this effort.

    Definitely!

    We want to involve, on an equal footing, interested lay people in addition to “credentialed” academics. For every popular Work there is almost always a significant number of lay experts whose knowledge rivals that of the professionals. We should definitely leverage all expertise.

    In addition, involvement does not stop with source book selection. We envision that lay and professional involvement will continue after a digital master is released for a Work. For example, we want to create an annotation repository.

    And to answer your other comment: note again that the project will NOT pick one particular source of a Work and call that “canonical” and ignore all others, but rather will identify all the sources (if more than one) that are considered worthy of digital mastering. So a particular Work might have more than one digital master. This is especially likely for translations, where each translation has its own unique character.

  9. Munango-Keewati said:

    My IANAL understanding is that editorial changes have to be substantial to qualify for a new copyright.

    This is not the case, at least not under U.S. law. As explained in Feist Publications, Inc. v. Rural Tel. Serv. Co., 499 U.S. 340 (1991), the touchstone for copyright protection is creativity. Any changes, however minor, qualify a work for a new copyright so long as the changes demonstrate some modicum of creativity, that is, the alterations “cannot be so mechanical or routine as to require no creativity whatsoever.”

    Thus, if someone were to decide to alter a text so that all upper-case letters were replaced with their lower-case equivalents (because it gives it an “artsy” feel), even though the changes might be quite extensive the new work would not qualify for copyright protection, because the changes were merely mechanical or routine. On the other hand, if someone were to alter a text by removing any ostensibly racist passages, the new work would (as a whole) qualify for copyright protection even if the changes were slight, because the decision as to what to excise does involve at least a modicum of creativity.

    In the second case we are left with a work which has two copyrights: the original author’s copyright, and the subsequent editor’s copyright. The original copyright may lapse, or the original author may choose not to enforce it, but the second copyright remains in force, and the new copyright holder may choose to enforce it. Of course, the new copyright only applies to the altered work, and does not revive the original copyright.

    You may be confusing the scope of copyright with the scope of fair use. The doctrine of fair use holds that you may legally make copies of copyrighted material, even against the wishes of the copyright holder, if the copies are made for purposes traditionally allowed, such as literary criticism, political debate, or parody. In analyzing whether any particular act of copying is fair use, the amount of the original copied is a relevant (but not determining) factor.

    In all likelihood, the versions of the Tarzan series published by Project Gutenberg are in violation of a current valid copyright, although it is extremely unlikely the current copyright holder would choose to enforce that right.

  10. Hello. I am the “TeleBlog Reader” who crafted the original question posed above about bowdlerized texts, together with the posting that provides the answer of “Tarzan of the Apes”. Congratulations to the delightfully astute Munango-Keewati for discovering the answer. This extended discussion is wonderful. Thanks to Munango-Keewati, Jon Noring, Robert Nagle, Lee Passey and all for thoughtful and insightful comments. I would like to add one comment about Edgar Rice Burroughs’s treatment of race. Following the precedent above, the comment will be in the form of a quiz question.

    Question: In “Tarzan of the Apes” there is a tribe that Tarzan attacks and that is portrayed as very cruel, but another group is portrayed as crueler still. Who were they?

    Answer: Here is the relevant quote from the book:

    To add to the fiendishness of their cruel savagery was the poignant memory of still crueler barbarities practiced upon them and theirs by the white officers of that arch hypocrite, Leopold II of Belgium, because of whose atrocities they had fled the Congo Free State—a pitiful remnant of what once had been a mighty tribe.

  11. Lee Passey:

    Thanks for the copyright citation. However, this case involves the copyrightability of the arrangement of facts in a telephone directory and does not seem to pertain directly to what we are discussing. Here’s my source:

    “To be copyrightable, a derivative work must be different enough from the original to be regarded as a ‘new work’ or must contain a substantial amount of new material. Making minor changes or additions of little substance to a preexisting work will not qualify the work as a new version for copyright purposes. The new material must be original and copyrightable in itself. Titles, short phrases, and format, for example, are not copyrightable.” –Copyright Registration for Derivative Works, U.S. Copyright Office (available at www.copyright.gov/circs/circ14.pdf)

    I was not trying to suggest that the revision in question wouldn’t qualify for a new copyright (some changes do seem significant), only that minor changes might not qualify.

    You wrote: “In all likelihood, the versions of the Tarzan series published by Project Gutenberg are in violation of a current valid copyright, although it is extremely unlikely the current copyright holder would choose to enforce that right.”

    No, the PG editions are probably not in violation of copyright law, because the bowdlerized edition was published when it was still necessary to include a copyright notice to gain protection. I no longer own a copy of the editions in question, and the search mechanism at loc.gov is not available right now, but I am reasonably certain they did not contain an updated copyright notice. If they did, that should have been a red flag for PG. As I recall, the only copyrights for the novel were the original 1914 and the renewal.

  12. I haven’t checked the original of PG’s _Tarzan of the Apes_, though I have checked a hardcopy of _The Story of Dr. Dolittle_ in the bowdlerized 1960s version that PG uses, and there was no new copyright notice for the changes (or indeed, any note that the text had been changed at all). I suspect that it was not all that common to re-copyright quietly bowdlerized texts, so I suspect PG is most likely safe for Tarzan as well. (The notice requirement wasn’t dropped until 1989.)

    I do have links to both the 1914 and the 1980s Ballantine texts of _Tarzan_ on my website.
