Why (and How) I Scan Old Books

September 24, 2012


It was Flann O’Brien, under his byline of ‘Myles na Gopaleen’ in The Irish Times, who wrote: “When I want to read anything, I usually write it meself.”

I know this because I have The Best of Myles (1968), published in paperback by Picador, which I found second-hand at a church fête some weeks ago. A quick search of e-book sites reveals that there is, as yet, no other way to read it than on paper. All yez Kindlers, Koboists and Androghedans, as O’Brien might have described you, will have to find some other book. But not me. In a few minutes I will be readin’ Myles on me own blessed tablet, as happy as Larry. Because I am about to scan the book to PDF.

The history of my book-scanning attempts would fill a small book of its own. Flatbed scanners, cameras on tripods, cameras in plywood frames, and combination printer/scanners all played a part, but last year I bit the bullet and bought a dedicated Epson GT-S50 double-sided sheet-fed scanner. My scanning speed went up by a factor of four, and the scanning error rate fell by the same proportion. Scanning a typical book went from a laborious, complex chore of over an hour to a twenty-minute job. I didn't understand at the time why a dedicated sheet-fed scanner should cost three times as much as a printer with a scanner attached. Now I do.

The book is 400 pages long, so my first act will be to take a sharp knife, a steel rule and a cutting board, and divide it into three sections of about 130 pages each. Aligning the page edges of the non-bound side of each section, I will then trim off the binding about four millimetres from the edge, giving me a stack of loose sheets. Before scanning I will leaf through these one at a time to ensure they are loose, and not still connected in any way; then I will stack them into piles of about sixty sheets (120 pages). Starting the Epson Scan program that came with the scanner, I will enter a name for the output file, a size for the scanned pages, and an output format of PDF. Text enhancement and automatic OCR are already switched on.

I will insert the first set of pages into the sheet feeder, and top it up with the others as they begin to run out. At roughly thirty sheets a minute, it will take six or seven minutes to feed the book's 200 sheets. Orienting the pages and performing OCR takes a couple of minutes, and assembling them into a PDF file another minute or so. At this point I will have a perfectly legible PDF version of my paperback.
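For anyone planning a similar job, the batch arithmetic above can be checked with a few lines of Python. This is a hypothetical helper, not part of Epson Scan or any other tool; the defaults (three sections, piles of sixty sheets, thirty sheets a minute) are simply the figures quoted in this article, and the timing covers sheet feeding only, not OCR or PDF assembly.

```python
import math

def scan_plan(pages: int, sections: int = 3, sheets_per_pile: int = 60,
              sheets_per_minute: float = 30.0) -> dict:
    """Rough plan for a duplex sheet-fed scan (hypothetical helper; defaults from the article)."""
    sheets = math.ceil(pages / 2)              # each loose sheet carries two printed pages
    pages_per_section = pages / sections       # how the book is divided before trimming the binding
    piles = math.ceil(sheets / sheets_per_pile)  # loads for the feeder; the last pile may be short
    feed_minutes = sheets / sheets_per_minute  # feeding only: excludes OCR and PDF assembly
    return {
        "sheets": sheets,
        "pages_per_section": round(pages_per_section),
        "piles": piles,
        "feed_minutes": round(feed_minutes, 1),
    }

print(scan_plan(400))
# → {'sheets': 200, 'pages_per_section': 133, 'piles': 4, 'feed_minutes': 6.7}
```

For The Best of Myles that gives 200 sheets in four loads (the last only twenty sheets), with just under seven minutes of feeding.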

But, being a perfectionist, I will open it up in a PDF editor and tweak it a little. This involves straightening the pages, cropping some of the page margins, and re-scanning any pages that have been missed or gone badly askew. Newer and more expensively made books rarely give scanning problems, but older, cheaper ones sometimes do. I also run off a compact RTF copy of the text alone. The book, in PDF and RTF formats, is then ready to move into my Calibre collection. (Next week's article: Accessing Calibre on PC from an Android tablet.) Sensitive bibliophiles may want to skip the next part, where I throw the used pages in the recycling bin, and pretend instead that I preserve them for posterity.

If all goes well, the process will be over within half an hour, and The Best of Myles will be sitting on my Android tablet instead of on my desk—portable, searchable, back-up-able, and still highly readable. As O’Brien himself crowed in similar circumstances: ‘Do you mind the cuteness of me?’

11 COMMENTS

Yoda47 September 24, 2012 at 9:55 am

PDF, eww. Not re-flowable, and a much larger file size than ePub. Not to mention it being a proprietary format.

At least with OCR it can be converted easily.

Bree September 24, 2012 at 10:30 am

Thanks for sharing about the Epson scanner; I wasn't aware Epson made sheet-fed scanners. I've been happily using a Fujitsu ScanSnap, which I bought 2 years ago for scanning receipts and bills in order to reduce paper storage.

Thanks to 4 1/2 years working at an academic library, I've developed allergies to paper dust or mites or whatever it is that lurks in older books. So I have to be REALLY motivated to deal with them. I've only scanned a handful of older mass market paperbacks, but it's gone reasonably well. I've had some jamming with the 1980s paperbacks, as they tend to have textured paper that grabs funny. I dislike disassembling the paper book; even with a guillotine trimmer outdoors, it's a bit messy and tedious.

I’ve OCR’d and converted one of them to ePub, but that was tedious, too. I’d rather spend my time reading than converting and proofing.

So now I just read them in PDF. Yeah, PDF is unflowable, but the mass market PDFs display perfectly on the iPad, even more readable than they were in original form.

And yes, I recycle the books, too. I don’t know anyone who wants a chopped up paper book.

Steph September 24, 2012 at 12:57 pm

Sounds a bit like archaeology – the act of accurately recording the information destroys what was originally there.

Richard Bohn September 24, 2012 at 5:09 pm

Can you suggest a service that can scan an older book without sacrificing the binding? It may be mostly sentimental, but I think this is a valuable book in my collection!

Greg M. September 24, 2012 at 7:11 pm

I don’t like PDF files. I wouldn’t destroy a book to make one. It would be different if there were an easy and reliable OCR method. But a book is better than a PDF.

Jon Jermey September 24, 2012 at 7:51 pm

Yoda — OCR has become astonishingly accurate in the last few years. Most PDF books can now be OCR’d and proofread into an acceptable text copy in a few hours — less with practice.

Which makes the typos and other mistakes that turn up in many eBooks even more infuriating, of course.

psteve September 24, 2012 at 9:49 pm

Wow, one of my favorite books. Would love to have an epub of it. In a Vernor Vinge novel, whose name I forget right now, they have a huge machine that cuts the bindings from books and sends the pages through a wind tunnel, and as the pages turn over and over in the wind they are automatically scanned. This machine is eating libraries. Quite an image!

And I agree with others about PDFs; a sad format.

Paul StJohn Mackintosh September 25, 2012 at 5:36 am

Regardless of the process, that’s got to be one of the best choices for a new ebook in years. Making the best of Myles available for the ebook era?! What a crusade!

Jim Martin September 25, 2012 at 12:51 pm

A huge and not costly time-saver in trimming the binding of any book is to take the book(s) to a larger print shop in your community and ask them to use their knife-edge trimmer. This is a large guillotine-like machine that makes one fast, clean trim and it’s done.

Smaller print shops sometime don’t have the heavy-duty trimmer, hence my suggestion to use a “large” print shop. The cost has been from free to a dollar or two for several books.

KingPin April 4, 2013 at 10:54 pm

I would not destroy a book to scan it.
There is a difference between the book and its contents.
After some searching, I found this:
xcanex

Anyone tried it?

Adele February 10, 2014 at 11:56 am

Wow, that is time-consuming and cumbersome! I embarked on a similar project but outsourced the scanning and OCR conversion, with final output in .epub, to a cool NY-based company, Boundbookscanning.com.
Really good quality output and affordable prices!


The TeleRead community values your civil and thoughtful comments. We use a cache, so expect a delay. Problems? E-mail newteleread@gmail.com. Cancel reply
