A fascinating look at book-scanning technology appears in the Book Standard. Excerpt:

…there are essentially only two companies that sell the robotic equipment and hardware necessary to quickly scan and digitize large volumes of books: Kirtas Technologies and 4DigitalBooks. Not surprisingly, major players in the digitization game–Google, Amazon–often employ their own proprietary systems; a spokesman for Google, for instance, says the company uses “some really cool stuff we’ve developed.”

Detail re another kind of proprietary vs. nonproprietary: Will the same mindset at Google apply to e-book format standards? I can somewhat understand the use of proprietary scanners–Google needs something to distinguish it from the competition. Proprietary e-book standards I would not brook, however, given the damage to the flow of knowledge. As a very small shareholder in the company, I hope Google will “do no evil.” If Google is using its own proprietary approach for scanning systems, are library projects suffering? I’d welcome thoughts from others on this.

Related: Ruling May Undercut Google in Fight Over Its Book Scans, in the New York Times.

8 COMMENTS

  1. “I can somewhat understand the use of proprietary scanners–Google needs something to distinguish it from the competition. Proprietary e-book standards I would not brook, however, given the damage to the flow of knowledge. As a very small shareholder in the company, I hope Google will “do no evil.” If Google is using its own proprietary approach for scanning systems, are library projects suffering?”

    I’m not sure I follow your train of thought. 1) Why would Google’s proprietary scanners distinguish them from the competition if the user never sees the scanner? (Do you mean that their scanners are just better-faster, and so they can do more scanning, and that this distinguishes them?) 2) Why would the proprietary “scanning system” have anything to do with the e-book standard they choose to employ?

    Of course I agree that they should use an open standard. (I’d prefer a clean .txt file like PG that I could do what I want with.) I don’t, however, see the connection to the proprietary hardware they use to scan.

  2. Hey, we’re talking about two very different kinds of “proprietary” (I tweaked the post to be sure that was absolutely clear), but sometimes the same mindset can show up in both cases.

    As for proprietary vs. nonproprietary scanning systems, keep in mind the inherent advantages of avoiding a Not Invented Here approach. More input from outsiders–both vendors and librarians in this case.

    Even if the book scanning machines came from private companies and didn’t represent true standards, that might still be preferable to NIH.

    As for .txt vs. an OpenReader-style approach, remember that OpenReader would allow standards AND good presentation. I want to see real italics, not *this*.

    Thanks,
    David

  3. > As for .txt vs. an OpenReader-style approach,
    > remember that OpenReader would allow standards
    > AND good presentation. I want to see real italics,
    > not *this*.

    you’re confusing the file-format with the viewer-program.

    html uses angle-brackets around an “i” for italics in the file,
    and the viewer-program (your browser) does the formatting.

    likewise, a viewer-program for a p.g.-style text-file could
    easily change _this_ to its italicized form. (you used
    asterisks, but asterisks more commonly mean *bold*
    in plain-text markups like these.)

    all in all, it’s a lot easier to have our _tools_ do this,
    and not require humans to do “markup”. for instance,
    here is a link to the project gutenberg website:

    http://www.gutenberg.org

    i didn’t do any of the clumsy “a href” stuff, i just
    typed in the u.r.l., and your wordpress software
    was smart enough to recognize it as a link, and
    do the formatting for me. (at least it used to be;
    i don’t know if this new version does that or not…)

    but that’s what we need, software smart like that…

    you’ll probably have some “yeah but” response,
    because this kind of simplicity threatens your
    philosophical mindset. i won’t bother to reply,
    because you can’t convince someone who will
    refuse to be convinced, but the movement to
    plain-text writing with automatic markup is
    fairly well unstoppable now. you keep calling
    for “one standard format”, but do not seem to
    recognize that we’ve had it all along — plain ascii.

    -bowerbird
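The viewer-side conversion described in comment 3 can be sketched in a few lines. This is only a minimal illustration in Python, not ZML, Project Gutenberg's tooling, or WordPress's actual auto-linking code: the function name `render_plain_text`, the `TOKEN` pattern, and the choice of `<i>`, `<b>`, and `<a>` as output are all assumptions made for the example. It treats `_this_` as italics, `*this*` as bold, and a bare URL as a link.

```python
import html
import re

# One combined pattern so a URL is consumed whole before the emphasis rules
# get a chance to misread underscores inside it. (Hypothetical pattern.)
TOKEN = re.compile(
    r'(?P<url>https?://[^\s<>"]+)'
    r'|(?<!\w)_(?P<ital>[^_\s](?:[^_]*[^_\s])?)_(?!\w)'
    r'|(?<!\w)\*(?P<bold>[^*\s](?:[^*]*[^*\s])?)\*(?!\w)'
)


def render_plain_text(text: str) -> str:
    """Turn lightly marked plain text into HTML for display."""

    def replace(match: re.Match) -> str:
        url = match.group("url")
        if url:
            # Keep sentence punctuation that follows a URL out of the link.
            trimmed = url.rstrip(".,;:!?)")
            tail = url[len(trimmed):]
            return f'<a href="{trimmed}">{trimmed}</a>{tail}'
        if match.group("ital"):
            return f"<i>{match.group('ital')}</i>"
        return f"<b>{match.group('bold')}</b>"

    # Escape &, <, > first so the source text cannot inject its own tags.
    return TOKEN.sub(replace, html.escape(text, quote=False))


if __name__ == "__main__":
    print(render_plain_text("see _this_ and *that* at http://www.gutenberg.org."))
    # see <i>this</i> and <b>that</b> at <a href="http://www.gutenberg.org">http://www.gutenberg.org</a>.
```

The file itself stays plain text; only the display layer changes. Whether rules like these scale to complex typography is exactly what the rest of the thread disputes, and the sketch does not settle it.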

  4. As usual, Bowerbird, you’re thinking of the entire publishing world as having easy requirements. Beyond that, the right creation tools should make it a cinch even for small publishers to do OpenReader format. Other issues arise such as the language ones, the needs of the disabled, etc. The right viewer just isn’t enough. This is an eternal debate, and I know you’ll have a reply, so go ahead and have the last word. – David

  5. oh well, since i’ve already broken my
    new year’s resolution for 2006 by
    commenting here in the first place
    — pretty good, almost 2 months! —
    might as well finish this thread right.

    you say that, “as usual”, i’m thinking
    too simplistically. after i remark that
    it’s unwise for you to underestimate
    your rivals, i’ll accept the “challenge”
    implicit in your ad hominem charge…

    let’s see if i can handle difficult books!

    scan 3 books that you think would be
    “too hard” for my zen markup system
    to handle, and furnish me the scans
    and the o.c.r., and i’ll show you that
    i am able to handle the books nicely…
    (at a bare minimum, as well as paper.)

    i predict i’ll be able to handle at least 2.
    if i _can’t_ handle a book, then you can
    prove the superiority of your system by
    showing you _can_ handle that book…

    that’s the “difficult” test. of interest too
    is the “average” test — i.e., how well i can
    handle the range of “average” books…

    so next, give me 10 numbers between
    1 and 18,000. i will then add a constant
    to those numbers, and the new numbers
    will be the 10 e-texts from the p.g. library
    that i will process using my zen markup.
    i expect i’ll be able to handle at least 9…

    again here, if any of these books should
    prove “too difficult” for z.m.l. to handle,
    i’ll look forward to how well openreader
    handles those books…

    i think my z.m.l. can give better benefits
    _and_ lower costs than your openreader,
    and that people will then see that it is a
    no-brainer as to which they will choose.

    why would people voluntarily take on the
    burden of doing heavy markup when they
    can get better e-books without it? why?

    so, how about it, rothman? are you up for
    a little head-to-head performance rumble?
    a _battle_ between z.m.l. and openreader?

    or will you recognize that it would be
    _far_ too devastating for your precious
    openreader to lose, so you chicken out?

    -bowerbird

  6. Hey, Bowerbird, when you take your ZML into the standards mainstream (and I don’t just mean the use of ASCII!), Jon and I will take it a little more seriously.

    Beyond the still-remaining issue of ability to handle complex typography, there are others such as interactivity. An XMLish approach will be far, far more conducive to that.

    OK, now you get the last word.

    Thanks,
    David

  7. > Jon and I will take it a little more seriously.

    it’s not important to me that you two “take it seriously”.
    you’ve got a different agenda to promote…

    > Beyond the still-remaining issue
    > of ability to handle complex typography,

    i’ve just challenged you on that issue,
    and you (wisely) declined to do battle.
    so that “issue” is _not_ “still-remaining”.

    > there are others such as interactivity.
    > An XMLish approach will be far,
    > far more conducive to that.

    that assertion is also false, and you will
    likewise decline to take a challenge on it.

    so i’ll proceed to issue one of those too:
    name your “interactivity” contest, rothman,
    and then show me your openreader solution.

    let’s see if you’re willing to put some pride
    on the line to back up all the hyping you do.
    i’m ready to do some sparring. are you?

    -bowerbird
