How many episodes of “Law & Order” does it take to scan a 400-page book? With my clunky old scanner, the answer is 3 1/3. That’s a third of the time I spend in class each week.

It’s time to get a new scanner, but I’m faced with a predicament: I want something better than the $100-$300 models that are out there, but can’t afford a multi-thousand-dollar model that costs more than a year’s worth of rent. I’m looking for a flatbed that can give me grayscale scans at 300-400 dpi, with the largest possible bed so I can scan two pages of larger-size books at once. The faster the better. Any suggestions?

Some of the models I’ve looked at are below:

I’m thinking of something in the $1000 price range, up to $1500 if there’s something especially good.
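For a rough sense of what "faster" means in seconds per page, here is a minimal Python sketch of the arithmetic behind the opening figure. It assumes an episode fills a 60-minute slot (the post does not give an episode length), and the function names and the 10-second example are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def seconds_per_page(pages: int, episodes: Fraction, episode_minutes: int = 60) -> Fraction:
    """Average seconds spent per page when total time is counted in TV episodes.

    episode_minutes is an assumption: 60 treats an episode as a full hour slot.
    """
    total_seconds = episodes * episode_minutes * 60
    return total_seconds / pages


def hours_for_book(pages: int, sec_per_page: float) -> float:
    """Total hours needed to scan a book at a given per-page time."""
    return pages * sec_per_page / 3600


# 400 pages in 3 1/3 episodes, as stated at the top of the post
current = seconds_per_page(400, Fraction(10, 3))
print(f"current scanner: {float(current):.0f} s per page")

# Hypothetical faster scanner at 10 s per page (not a measured figure)
print(f"at 10 s per page: {hours_for_book(400, 10):.2f} hours")
```

Under the 60-minute assumption the old scanner works out to about half a minute per page, which includes page turning, so any per-page time a candidate scanner advertises should be compared with that in mind.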

Image by For Inspiration Only, uploaded to Flickr with a Creative Commons License.

22 COMMENTS

  1. Quinn, do you care about the distortion caused by trying to push a bound book on a flatbed scanner? The distortion happens because of the curvature near the gutter (and if you push too hard to flatten it, you can damage the binding or even break the scanner glass).

    This is one advantage of the Plustek OpticBook 3600. Even though it does only one page at a time, the scanning glass goes close to the edge, and the book can hang over the side, meaning one can usually get distortion-free scans of the page.

    Of course, the BookDrive DIY should also minimize gutter distortion, but that scanner is expen$ive for everyday home use. Now, for commercial projects, that’s another matter…

  2. The distortion isn’t a huge deal. ABBYY FineReader generally does all right with it. Even with some additional correction time for the OCR, I think it’d be worth it to do two pages at once, considering the volume of books I’m hoping to go through.

    Unless there’s a single-pager that goes super-fast (and is big enough to accommodate larger-size books), not being able to do two pages at once is a deal-killer.

  3. Scanning books face-open on a flatbed is a good way to destroy the spines, of course. You’re throwing them away afterwards, are you?

    I’d second the vote for the Plustek. I scan mainly fragile 19th-century books, admittedly. But they don’t die.

    And as a reminder, surely: binding methods and glue structure at the spine of modern books render them far more fragile than 19th-century case bindings.

  4. Thanks, Quinn, for clarifying. It all depends upon the use of the scans. If they are only intended for OCR, then in many cases using a flatbed works (although note Bill Tozier’s comment.)

    I’m looking at a scanning project where the scans themselves must be of archival quality, and not just for OCR, so for that distortion is an issue. Also, many of the books are valuable and held by libraries, so we have to be gentle on the spine. The OpticBook and the BookDrive DIY are two candidates — if the project ends up scanning a thousand books, then the BookDrive and similar scanners become very attractive. Right now the first phase of the project involves only ten volumes which need to be carefully scanned, so the Plustek is of interest.

  5. I’ll admit to qualms of conscience over what the scanning does to the books (Soviet linguistics stuff), but most of these are books that students run through the photocopier, year after year. I figure if I scan them once, then make the images available, it will save them from future torture.

  6. Thanks for the input, Jon. It does sound like I’m in an unusual position, prioritizing speed over image quality (beyond what I need for OCR).

    Any ideas of websites, message boards, etc. where I might get a lead on the fastest flatbed?

  7. I always thought the best way to scan a book is to cut the spine and put the pages in a feeder. One hour of manually turning pages costs about as much as another copy of the book.

    Or, put another way: get a $50 scanner with a feeder and buy $950 worth of book copies. Or rely only on the scan and discard the only copy each time.

    Why did you scan in grayscale? Monochrome is great: pure white and pure black for copies and OCR. Figures with shading are printed as dots anyway, so with a fine enough scan they look the same.

  8. Hi Marc,

    Ah, if only the sheet-feeder thing were an option. But most of what I’m looking to scan is old Soviet books, a lot of which had print runs of under 1000 copies.

    ABBYY FineReader tells you to scan things in grayscale instead of black & white, and my experience with the program says that’s not for nothing. Particularly when you’re scanning books on a flatbed (and inevitably having edge shadows, sometimes badly), the software sorts out letter vs. shadow better if it’s grayscale.

  9. Grayscale better handles the boundary between the black ink and white paper, and thus gives a higher “pseudo-resolution” than does bitonal (black and white). Putting it another way, it preserves more information. But if ABBYY can give good results with 300 dpi bitonal, then that should be considered since it may slightly speed up scanning (hard to say, the limiting factor is ultimately the time it takes to flip pages) — and the scanned images are smaller.

    My experiments with linear resolution and color depth show that if one is dealing a lot with very small type, such as 4 point (found in lots of scientific books), 600 dpi (grayscale or full color) may be necessary for OCR.

    In the scanning projects we are contemplating, the scans will be used for more than just OCR, and need to look their best, that is, to be archival quality. Here we will scan text at 600 dpi full color, and many illustrations at 1200 dpi. Lossless compression of the master page scans will be used. (And of course we need to minimize page distortion.) This will result in huge image sizes, but we are willing to live with that. Although there is disagreement whether this is overkill, for what we have in mind (scanning some rare books) it’s better to be on the overkill side. And note that the Internet Archive/Open Content Alliance is scanning at a pretty aggressive resolution and color depth, too, from 300-600 dpi full color, so we are not alone.

    Some of the issues of scanning are debated in the YahooGroup Distributed Scanners. Again, there is no agreement on resolution/color depth when the scans are intended to be used for more than just OCR, that is, where the scans are meant to be an archival representation of the book.

    (p.s., some wonder why full color for black and white text? First, it aids in image restoration tasks such as deskewing, and even conversion to gray-scale or black and white by picking the best color channel to use. In addition, with color calibration it will preserve the original “look” of the page, giving it more life. Again note that if the goal is archival quality, and not only OCR, this may be an important criterion.)

  10. I think the feasibility of bitonal OCR largely comes down to the scanner software, and how well it can deal with the edge shadow. My current home scanner can’t even do B&W, and doesn’t have software beyond Windows Scanner and Camera Wizard. The scanners at work can do bitonal, but the software leaves much to be desired. I have a sheet-feeder that does a decent job cleaning up the edge shadows from photocopied books, but the price of copying around here is high enough that it’s not a very good option to copy first, then scan the copies.

  11. “That’s a third of the time I spend in class *each week*.”

    I feel for you. 🙂 I’m partially sighted, so I scan a *lot* of books.

    A friend gave me the Hewlett Packard Scanjet 8200. Using my OCR software (OmniPage Pro 12), it takes me about twenty seconds to scan a page of text. If your OCR software gives you a similar or faster speed, that would cut at least an hour off the time it takes you to scan that 400-page book.

    The HP Scanjet 8200 offers 300-400 dpi greyscale (it also does 48-bit color at 4800 dpi), and it can scan up to 9″x14″. Shopping.com shows it selling in the $255-$580 range.

  12. Update: I ended up going with the HP Scanjet 8300. It’s around 8 seconds for a 300 dpi grayscale scan using the software (more if you count the time it takes to load, which it has to do for every scan), but only 4 seconds if you’re using the default Scanner and Camera Wizard. I love it.

  13. Hi, I’m a student looking to buy a fairly cheap (budget of $600) but very fast scanner for book scanning. I was wondering if you could advise me on whether the HP Scanjet 8300 or 8200 is faster. Also, any ideas on what I should go for in the current market?

  14. Maybe try this: http://snapter.atiz.com/download.php.
    This is $50 software (from Atiz) that takes digital camera pictures as input and outputs PDFs. It is supposed to handle the book spine, weird orientation, … I have not yet tried it, but I certainly will!

    Right now, I’m using an antique but great Xerox XE90fx multifunction, which has a parallel-port 1-bit scanner that does a real-life 7 ppm (including page flipping, button pressing and ABBYY OCR). However, the scanner stops when the toner runs out…

  15. Dear All.
    5 years ago I removed the side of a Canon LiDE 25, and I have done over 6000 pages with it. On average, including fiddling with the pages afterwards, I’d say it manages 2 pages a minute, for a minimal outlay. However, a newer model, say the LiDE 50, is USB 2 and would probably be twice as quick.
    I am working on a newer angle to this and will report later, but it will electrify the process.
    Regards,
    Peter Davenport

  16. This is a 6 year-old discussion about scanners, and the original poster specifically said in the comments that she was not interested in destructive scanning, and the companies in the link you listed only do destructive scanning.
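The resolution and color-depth trade-off debated in the comments above (bitonal vs. grayscale vs. full color, at 300 to 1200 dpi) can be made concrete with a little arithmetic. The Python sketch below estimates the uncompressed size of a single page image. The 6 x 9 inch page is a hypothetical example, not a figure from the thread, and the estimate ignores file headers, row padding and any compression, so real files will differ.

```python
def uncompressed_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Raw pixel data size in bytes for one scanned page.

    Ignores file headers, row padding and compression (all assumptions).
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Hypothetical 6 x 9 inch page; bit depths: 1 = bitonal, 8 = grayscale, 24 = full color
PAGE_W, PAGE_H = 6, 9
settings = [
    ("300 dpi bitonal", 300, 1),
    ("300 dpi grayscale", 300, 8),
    ("600 dpi full color", 600, 24),
    ("1200 dpi full color", 1200, 24),
]

for label, dpi, depth in settings:
    size_mb = uncompressed_bytes(PAGE_W, PAGE_H, dpi, depth) / 1_000_000
    print(f"{label:>20}: {size_mb:8.1f} MB per page")
```

Doubling the dpi quadruples the pixel count, and going from 8-bit grayscale to 24-bit color triples the data again, which is why the archival settings mentioned above produce such large masters compared with scans made only for OCR.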
