‘Checking Out the Machines Behind Book Digitization’

February 25, 2006

[Image: Kirtas BookScan 800]

A fascinating look at book-scanning technology appears in the Book Standard. Excerpt:

…there are essentially only two companies that sell the robotic equipment and hardware necessary to quickly scan and digitize large volumes of books: Kirtas Technologies and 4DigitalBooks. Not surprisingly, major players in the digitization game–Google, Amazon–often employ their own proprietary systems; a spokesman for Google, for instance, says the company uses “some really cool stuff we’ve developed.”

A detail about another kind of proprietary vs. nonproprietary: Will the same mindset at Google apply to e-book format standards? I can somewhat understand the use of proprietary scanners–Google needs something to distinguish it from the competition. Proprietary e-book standards I would not brook, however, given the damage to the flow of knowledge. As a very small shareholder in the company, I hope Google will “do no evil.” If Google is using its own proprietary approach for scanning systems, are library projects suffering? I’d welcome thoughts from others on this.

8 COMMENTS

ryanramseyer February 25, 2006 at 10:39 am

“I can somewhat understand the use of proprietary scanners–Google needs something to distinguish it from the competition. Proprietary e-book standards I would not brook, however, given the damage to the flow of knowledge. As a very small shareholder in the company, I hope Google will “do no evil.” If Google is using its own proprietary approach for scanning systems, are library projects suffering?”

I’m not sure I follow your train of thought. 1) Why would Google’s proprietary scanners distinguish them from the competition if the user never sees the scanner? (Do you mean that their scanners are just better-faster, and so they can do more scanning, and that this distinguishes them?) 2) Why would the proprietary “scanning system” have anything to do with the e-book standard they choose to employ?

Of course I agree that they should use an open standard. (I’d prefer a clean .txt file like PG that I could do what I want with.) I don’t, however, see the connection to the proprietary hardware they use to scan.

David Rothman February 25, 2006 at 11:01 am

Hey, we’re talking about two very different kinds of “proprietary” (I tweaked the post to be sure that was absolutely clear), but sometimes the same mindset can show up in both cases.

As for proprietary vs. nonproprietary scanning systems, keep in mind the inherent advantages of avoiding a Not Invented Here approach. More input from outsiders–both vendors and librarians in this case.

Even if the book scanning machines came from private companies and didn’t represent true standards, that might still be preferable to NIH.

As for .txt vs. an OpenReader-style approach, remember that OpenReader would allow standards AND good presentation. I want to see real italics, not *this*.

Thanks,
David

bowerbird February 25, 2006 at 7:40 pm

> As for .txt vs. an OpenReader-style approach,
> remember that OpenReader would allow standards
> AND good presentation. I want to see real italics,
> not *this*.

you’re confusing the file-format with the viewer-program.

html uses angle-brackets around an “i” for italics in the file,
and the viewer-program (your browser) does the formatting.

likewise, a viewer-program for a p.g.-style text-file could
easily change _this_ to its italicized form. (you used
asterisks, but asterisks more commonly mean *bold*
in plain-text markups like these.)

all in all, it’s a lot easier to have our _tools_ do this,
and not require humans to do “markup”. for instance,
here is a link to the project gutenberg website:

http://www.gutenberg.org

i didn’t do any of the clumsy “a href” stuff, i just
typed in the u.r.l., and your wordpress software
was smart enough to recognize it as a link, and
do the formatting for me. (at least it used to be;
i don’t know if this new version does that or not…)

but that’s what we need, software smart like that…

you’ll probably have some “yeah but” response,
because this kind of simplicity threatens your
philosophical mindset. i won’t bother to reply,
because you can’t convince someone who will
refuse to be convinced, but the movement to
plain-text writing with automatic markup is
fairly well unstoppable now. you keep calling
for “one standard format”, but do not seem to
recognize that we’ve had it all along — plain ascii.

-bowerbird
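
To make the viewer-side idea in the comment above concrete, here is a minimal Python sketch of a renderer that turns lightly marked plain text into HTML. The conventions it assumes (`_word_` for italics, `*word*` for bold, bare `http`/`https` URLs becoming links) and the name `render_line` are illustrative only; they are not taken from z.m.l., OpenReader, Project Gutenberg, or WordPress, and a real tool would need to handle many more cases, such as punctuation trailing a URL.

```python
import html
import re

# Hypothetical conventions for this sketch, not a published spec:
#   _text_ -> italics, *text* -> bold, bare http(s) URLs -> links.
URL_RE = re.compile(r"https?://[^\s<>]+")
ITALIC_RE = re.compile(r"(?<!\w)_(\S(?:.*?\S)?)_(?!\w)")
BOLD_RE = re.compile(r"(?<!\w)\*(\S(?:.*?\S)?)\*(?!\w)")


def _emphasize(chunk: str) -> str:
    """Escape a non-URL chunk, then convert _italic_ and *bold* markers."""
    chunk = html.escape(chunk, quote=False)
    chunk = ITALIC_RE.sub(r"<i>\1</i>", chunk)
    chunk = BOLD_RE.sub(r"<b>\1</b>", chunk)
    return chunk


def render_line(text: str) -> str:
    """Render one line of plain text as an HTML fragment."""
    parts = []
    pos = 0
    for match in URL_RE.finditer(text):
        # Text before the URL gets emphasis handling; the URL itself does not,
        # so underscores or asterisks inside a URL are left alone.
        parts.append(_emphasize(text[pos:match.start()]))
        url = html.escape(match.group(0), quote=True)
        parts.append('<a href="{0}">{0}</a>'.format(url))
        pos = match.end()
    parts.append(_emphasize(text[pos:]))
    return "".join(parts)


if __name__ == "__main__":
    print(render_line("have our _tools_ do this: http://www.gutenberg.org"))
```

The file on disk stays plain text; only the viewer decides how `_tools_` or a bare URL is displayed, which is the separation between file format and viewer program that the comment describes.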

David Rothman February 25, 2006 at 11:16 pm

As usual, Bowerbird, you’re thinking of the entire publishing world as having easy requirements. Beyond that, the right creation tools should make it a cinch even for small publishers to do OpenReader format. Other issues arise such as the language ones, the needs of the disabled, etc. The right viewer just isn’t enough. This is an eternal debate, and I know you’ll have a reply, so go ahead and have the last word. – David

bowerbird February 25, 2006 at 11:51 pm

no thanks… :+)

-bowerbird

bowerbird February 26, 2006 at 2:37 pm

oh well, since i’ve already broken my
new year’s resolution for 2006 by
commenting here in the first place
— pretty good, almost 2 months! —
might as well finish this thread right.

you say that, “as usual”, i’m thinking
too simplistically. after i remark that
it’s unwise for you to underestimate
your rivals, i’ll accept the “challenge”
implicit in your ad hominem charge…

let’s see if i can handle difficult books!

scan 3 books that you think would be
“too hard” for my zen markup system
to handle, and furnish me the scans
and the o.c.r., and i’ll show you that
i am able to handle the books nicely…
(at a bare minimum, as well as paper.)

i predict i’ll be able to handle at least 2.
if i _can’t_ handle a book, then you can
prove the superiority of your system by
showing you _can_ handle that book…

that’s the “difficult” test. of interest too
is the “average” test — i.e., how well i can
handle the range of “average” books…

so next, give me 10 numbers between
1 and 18,000. i will then add a constant
to those numbers, and the new numbers
will be the 10 e-texts from the p.g. library
that i will process using my zen markup.
i expect i’ll be able to handle at least 9…

again here, if any of these books should
prove “too difficult” for z.m.l. to handle,
i’ll look forward to how well openreader
handles those books…

i think my z.m.l. can give better benefits
_and_ lower costs than your openreader,
and that people will then see that it is a
no-brainer as to which they will choose.

why would people voluntarily take on the
burden of doing heavy markup when they
can get better e-books without it? why?

so, how about it, rothman? are you up for
a little head-to-head performance rumble?
a _battle_ between z.m.l. and openreader?

or will you recognize that it would be
_far_ too devastating for your precious
openreader to lose, so you chicken out?

-bowerbird

David Rothman February 26, 2006 at 3:14 pm

Hey, Bowerbird, when you take your ZML into the standards mainstream (and I don’t just mean the use of ASCII!), Jon and I will take it a little more seriously.

Beyond the still-remaining issue of ability to handle complex typography, there are others such as interactivity. An XMLish approach will be far, far more conducive to that.

OK, now you get the last word.

Thanks,
David

bowerbird February 26, 2006 at 5:54 pm

> Jon and I will take it a little more seriously.

it’s not important to me that you two “take it seriously”.
you’ve got a different agenda to promote…

> Beyond the still-remaining issue
> of ability to handle complex typography,

i’ve just challenged you on that issue,
and you (wisely) declined to do battle.
so that “issue” is _not_ “still-remaining”.

> there are others such as interactivity.
> An XMLish approach will be far,
> far more conducive to that.

that assertion is also false, and you will
likewise decline to take a challenge on it.

so i’ll proceed to issue one of those too:
name your “interactivity” contest, rothman,
and then show me your openreader solution.

let’s see if you’re willing to put some pride
on the line to back up all the hyping you do.
i’m ready to do some sparring. are you?

-bowerbird
