Project Gutenberg was launched by Michael Hart in July 1971 to create free electronic versions of literary works and disseminate them worldwide. The project got its first boost with the invention of the web in 1990, and its second boost with the creation of Distributed Proofreaders in 2000 to help digitize public domain books.

Volunteers choose a book currently being processed on the site and proofread a given page. People can proofread one page or several pages, as they wish. One page per day is a great goal. It doesn’t seem like much, but with hundreds of volunteers it really adds up.

A website launched in 2000

Distributed Proofreaders (DP) was founded in October 2000 by Charles Franks to support the digitization of public domain books.
From the website, one can access a program that lets several proofreaders work on the same book at the same time, each on different pages. The goal is to work together – from any region in the world – to significantly speed up the proofreading process.

Originally conceived to assist Project Gutenberg (PG), Distributed Proofreaders is now the main source of PG ebooks. In 2002, DP became an official PG site. In May 2006, DP became a separate legal entity and continues to maintain a strong relationship with PG.

Distributed Proofreaders counted 3,000 ebooks “preserved for the world” in February 2004, 5,000 ebooks in October 2004, 7,000 ebooks in May 2005, 10,000 ebooks in December 2006, 11,950 ebooks in January 2008, and 18,848 ebooks on 15 October 2010. Production was 196 titles in October 2010.

How all this works

Volunteers register and receive detailed instructions. For example, bold, italic or underlined words, and footnotes, are always treated the same way in every ebook. A discussion forum allows them to ask questions or seek help at any time.

Each time proofreaders go to the website, they choose the book they want. One page of the book appears in two forms side by side: the scanned image of one page and the text from that image as produced by OCR software. The proofreader can easily compare both versions, note the differences and fix them. OCR is usually 99% accurate, which makes for about 10 corrections a page. The proofreader saves each page as it is completed and can then either stop work or do another.
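The “about 10 corrections a page” figure follows from simple arithmetic if accuracy is counted per character. Here is a minimal sketch, assuming a page of about 1,000 characters. The article does not state a page length; 1,000 is simply the value implied by the two numbers given, and the function name is made up for illustration.

```python
def expected_corrections(chars_per_page: int, accuracy: float) -> float:
    """Expected number of wrong characters on a page,
    given a per-character OCR accuracy between 0 and 1."""
    return chars_per_page * (1.0 - accuracy)


# Assumption: about 1,000 characters per page (not stated in the article).
print(round(expected_corrections(1000, 0.99)))
```

A denser page would need proportionally more corrections at the same accuracy, so the figure is best read as an order of magnitude.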

The books are proofread twice, and the second time only by experienced proofreaders.

All the pages of the book are then formatted, combined and assembled by post-processors to make an ebook.

A project manager oversees the progress of a particular book through its different steps on the website.
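The steps above (two proofreading rounds, the second reserved for experienced proofreaders, then assembly by post-processors) can be sketched as a small state machine. This is only an illustration: the names Stage, Page, proofread and assemble are hypothetical and are not taken from the actual Distributed Proofreaders software.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Stage(Enum):
    FIRST_ROUND = 1       # open to any volunteer
    SECOND_ROUND = 2      # experienced proofreaders only
    POST_PROCESSING = 3   # ready to be formatted and assembled


@dataclass
class Page:
    number: int
    text: str                        # OCR text, replaced by each round's corrections
    stage: Stage = Stage.FIRST_ROUND


def proofread(page: Page, corrected_text: str, experienced: bool) -> None:
    """Save a proofread page and move it to the next stage."""
    if page.stage is Stage.POST_PROCESSING:
        raise ValueError("page has already been proofread twice")
    if page.stage is Stage.SECOND_ROUND and not experienced:
        raise PermissionError("second round is reserved for experienced proofreaders")
    page.text = corrected_text
    page.stage = Stage(page.stage.value + 1)


def assemble(pages: List[Page]) -> str:
    """Combine fully proofread pages, in page order, into one text."""
    if any(p.stage is not Stage.POST_PROCESSING for p in pages):
        raise ValueError("every page must pass both rounds first")
    return "\n".join(p.text for p in sorted(pages, key=lambda p: p.number))


# Usage: two pages of one book, each proofread twice, then assembled.
pages = [Page(2, "secnd page"), Page(1, "frst page")]
for page, fixed in zip(pages, ["second page", "first page"]):
    proofread(page, fixed, experienced=False)   # first round
    proofread(page, fixed, experienced=True)    # second round
print(assemble(pages))
```

Tracking the stage per page rather than per book reflects the point that several volunteers can work on different pages of the same book at the same time.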

The ebook is now ready to be posted with an index entry (title, subtitle, author, ebook number and character set) for the database. After the release, indexers continue the cataloguing process (author’s dates of birth and death, Library of Congress classification, etc.).

Everyone is welcome

New volunteers are most welcome, and will be guided by experienced volunteers for their first steps. There is a lot to do, and everyone is welcome.

As stated on DP’s website in 2005, “Remember that there is no commitment expected on this site. Proofread as often or as seldom as you like, and as many or as few pages as you like. We encourage people to do ‘a page a day’, but it’s entirely up to you! We hope you will join us in our mission of ‘preserving the literary history of the world in a freely available form for everyone to use’.”

Distributed Proofreaders Europe

Project Rastko, based in Belgrade, Serbia, launched Distributed Proofreaders Europe (DP Europe) in early 2004, together with Project Gutenberg Europe. DP Europe uses the software of the original Distributed Proofreaders.

Since its beginning, DP Europe has been a multilingual website, with its main pages translated into the main national European languages by volunteer translators. DP Europe was available in 12 languages as early as April 2004. The long-term goal is 60 languages representing all the (main) European languages.

In May 2005, DP Europe finished processing its 100th ebook. Ebooks were in several languages to reflect European linguistic diversity. 600 ebooks were ready in February 2009, and 732 ebooks on 15 October 2010.

DP Europe supports Unicode to be able to proofread ebooks in numerous languages. Unicode is an encoding system created in 1991 that assigns a unique number to every character in any language. Unicode is meant to replace ASCII, an encoding system dating back to 1968 that can only handle English and Latin in its “original” version, and a few European languages with accents in its “extended” version. In 2008, half of the files available on the internet were Unicode files.
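To make the difference concrete, here is a short sketch showing the unique number (code point) Unicode gives to three characters, and whether the original ASCII can represent each of them. The helper name describe and the choice of sample characters are only for illustration; UTF-8 is used here as one common way of storing Unicode text.

```python
def describe(ch: str):
    """Return the Unicode code point of a character, its UTF-8 bytes,
    and whether the original 7-bit ASCII can represent it."""
    code_point = f"U+{ord(ch):04X}"
    utf8_bytes = ch.encode("utf-8")
    try:
        ch.encode("ascii")
        in_ascii = True
    except UnicodeEncodeError:
        in_ascii = False
    return code_point, utf8_bytes, in_ascii


samples = [
    "A",       # plain Latin letter
    "\u00e9",  # é, a Latin letter with an accent
    "\u0416",  # Ж, a Cyrillic letter
]
for ch in samples:
    print(ch, *describe(ch))
```

Only the first character fits in the original ASCII. The accented and Cyrillic letters need Unicode, which is why a multilingual site like DP Europe relies on it.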

Distributed Proofreaders Canada

Distributed Proofreaders Canada (DPC) started production in December 2007, to digitize and proofread ebooks for Project Gutenberg Canada (PGC), founded on 1st July 2007, on Canada Day, by Michael Shepard and David Jones. There were 100 ebooks in March 2008, in English, French and Italian. 250 ebooks were available in February 2009, and 427 books on 15 October 2010.

A few numbers

# Over 33,000 high-quality proofread ebooks in Project Gutenberg, in several languages. (Quality more than quantity is what really matters to the reader, believe me.)

# 18,961 ebooks processed by Distributed Proofreaders (the original website) on 1st November 2010, since its creation in October 2000, with 2,352 active volunteers in October 2010.

# 739 ebooks processed by Distributed Proofreaders Europe on 1st November 2010, since its creation in January 2004.

# 433 ebooks processed by Distributed Proofreaders Canada on 1st November 2010, since its creation in December 2007, with 127 active volunteers in October 2010.

A few links

Distributed Proofreaders,
the original project that just celebrated its 10th anniversary, if you would like to join them to proofread ebooks.

Distributed Proofreaders Europe,
launched in early 2004, with an interface in several languages, and books to proofread in several languages.

Distributed Proofreaders Canada,
launched in December 2007.

Volunteers are most welcome on the three websites – with no geographic requirements because there are no borders for beautiful literary works to be “preserved for the world”.

Copyright © 2010 Marie Lebert

