Clay Breshears, an Intel software network blogger offering his personal opinions, serves up a bunch of cons about e-books, but zeroes in on this:

"What really burns my biscuits is the ‘state-of-the-art’ in search capability that is so gosh darn literal.  If you don’t know the exact term you want to find, you’re surely outta luck.  For example (and this comes from a search engine since I don’t have an e-book reader), I was trying to find the online article "The ‘Anti-Java’ Professor and the Jobless Programmers," which I had read about a month prior.  Since I didn’t know the exact title, I tried "java unemployed programmer" as a search term.  After hundreds of hits for out-of-work Java programmers looking for jobs, I gave up.  I searched through my email archives with the same terms and came up blank.  Only after poring over my e-mail by hand did I find the URL I was looking for."

He continues:

"Perhaps a more relevant example would be looking for a quote by a character in a novel I had just read.  I knew about where this was in the book and what was going on when the quote was given, so I could open up the pages and look forward and backward, skimming text looking for the exact quote.  Since I only had an idea about the content of the quote and not the exact words I needed to identify what I wanted to find, I can only imagine wasting tens of minutes doing an electronic search with variations of the key words trying to locate what I wanted."

OK, gang, go to work while remaining civil. Even with the current limits of search in e-books, I myself can find things faster than in P. What’s more, I wouldn’t be surprised to see search technology eventually include Google’s fuzzy-style features—in addition to synonym capabilities. The real Google, of course, could offer this in Web-based titles, including networked books. And with Gears, who knows about fancy searches offline as well?

Related: Google now searching for synonyms, in Search Engine Land.


12 COMMENTS

  1. Hum–

    So, eBooks are worthless because a search feature, which doesn’t work at all in paper books, is not perfect in eBooks?

    I’m not sure I completely follow the reasoning. I do agree with David (and Josh) that search has plenty of room for improvement, but we’ll never reach perfection so we need to compare how search works with our eBooks to how it works with paper books.

    Rob Preece
    Publisher, http://www.BooksForABuck.com

  2. I thought the points you left out of the quote were more “neutrals” than “cons”. I’m all for new technology. My brother-in-law raves about ebooks. A friend loves them, too, and keeps technical journal articles on his. And I expect using them for pleasure reading would be great.

    It’s when I need something as a reference that I’m not ready to give up my paper. Searching for technical articles on “sorting” would be straightforward and the results would be plentiful. However, searching for articles about “order objects in monotonically increasing sequence” is going to yield fewer hits. Not to mention misspelling searches for “Clay Brashears”.

  3. The more general problem about e-books here is not search per se, but “non-linear” reading.

    That’s the one thing I dread the most when reading e-books, since whatever I do, whether typing a search term, taking the stylus to move pages ahead fast on my 770, or inputting page numbers on the Sony (the most useful hack I saw for the 500; I have no clue whether it is native in the 505), all these actions distract me from reading, while flipping pages never does.

    So if you read cover to cover, e-books are excellent, but if you want to jump around, the distraction is very annoying.

  4. Many thanks for the reply, Clay, though you may want to change the headline, “Why I will never own an electronic book.” Maybe “…for references” or “…until search gets better”? As I saw it, you did give cons since you felt compelled to show how you were overcoming them. Glad I spelled your name right even if, yes, your hypothetical example is of interest. 😉 Any chance you’d feel better about these things if your last name were Smith? Now that you’re famous, I hope you’ll stick around the TeleBlog AND keep sharing your thoughts on the pros and cons of E. I think it would be cool if Intel encouraged the development of ePub reader software that addressed your very valid concerns. While I find E better for searching than P, it could be much better. So thanks for raising these issues!

    David

  5. The solution is very simple. I mentioned the idea of a Cloud Library in a post last night. This would be the perfect way to have your Search and eBooks too. eBook copies are also kept in the Cloud and therefore can use *real* Search, such as Google, et al. Not everything has to be local.

  6. I would never want a ‘cloud’ library. I want all my books to be local. I can’t guarantee that if I am traveling, be it around town or on holiday, I will always have an internet connection available to me. Also, the trend in internet usage these days seems to be moving more toward pay-as-you-use rather than flat fees (which imho is a mistake), and that means every time you access the content you have paid for, you would be charged. I would rather download it once and then have it. Also, with storage media being so cheap these days and growing in capacity, I don’t see why I would need my stuff in a cloud when I could have it all on my machine. I have more than 200 books on my ebookwise right now and half of them are one-off reads I will delete when I am done (e.g. freebies from the net). Even if I kept all of them, it still is more books than I have in my print collection, which I keep ruthlessly pruned. I think a cloud library is over-complicating things. Why not just download it and have it?

  7. One reason that screen based books have such rich indexing is that the encoding process produces both the rendered text and the indexing; two for the price of one. It is as if the process of printing on paper also indexed the copy. Print books actually resist indexing and searching.

    But what print does provide, and the screen doesn’t, is rich authentication. When a reference is found in print you have nailed it since it is embedded in layers of physical evidence, immutable content and bibliographic codes that persistently reveal the source and intent of its production. Screen books, like touch screen voting, remain vulnerable and un-trusted with their ease of unmonitored deletions or revisions and uncertain provenance. And with screen searching you never know what you are missing.

  8. Note that the problem with non-linear reading seems to be the gizmo and not the content being digital. On a PC you just open two or more instances of the document you’re reading, and on the web nobody reads linearly.

    The search tools you can use on the PC on known formats are also much better than what anyone has put inside a traditional ebook app. You don’t really get any flexibility with an app like the Kindle.

  9. A search engine that automatically performed substitution of synonyms as part of an exploration strategy does sound desirable. Clay Breshears gives an example where synonym matching would have been useful. He says that a search for “java unemployed programmer” generated hundreds of fruitless hits. However, a small modification of his search expression to “java jobless programmer” came back with the reference that he wanted.

    Yet the obstacles in the way of synonym substitution mechanisms are significant. Consider several other search expressions that might have been automatically created based on search terms given above: “coffee, out of work, coder” or “island of Indonesia, laid off, software analyst” or “Dutch East Indies, without a job, software developer”. A simple substitution strategy leads to a combinatorial explosion since each combination must be evaluated. A single “fuzzy” synonym search might become too computationally expensive. Further, Breshears says that his first search attempt by itself was already yielding too many spurious matches. Imagine the enormous number of matches when synonyms are allowed.
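To put a rough number on that explosion, here is a minimal sketch. The synonym lists are hypothetical, borrowed loosely from the examples in this comment, and `expand_query` is an illustrative name rather than any real search engine's API.

```python
from itertools import product

# Hypothetical synonym sets, loosely based on the examples above.
SYNONYMS = {
    "java": ["java", "coffee", "island of Indonesia", "Dutch East Indies"],
    "unemployed": ["unemployed", "jobless", "out of work", "laid off", "without a job"],
    "programmer": ["programmer", "coder", "software analyst", "software developer"],
}


def expand_query(terms):
    """Return every query formed by swapping each term for one of its synonyms."""
    choices = [SYNONYMS.get(term, [term]) for term in terms]
    return [" ".join(combo) for combo in product(*choices)]


queries = expand_query(["java", "unemployed", "programmer"])
print(len(queries))  # 4 * 5 * 4 = 80 candidate queries from a three-word search
```

The count is the product of the synonym-list sizes, so each extra search term or synonym multiplies the work instead of adding to it.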

    One idea for reducing the computational load of searching with synonyms utilizes the technique of “canonicalization”. Groups of words within a set of synonyms with the same denotation would be mapped to a “canonical form”. The search index would be built using canonical forms. This method would simplify matching.
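A minimal sketch of that idea, assuming a tiny hand-made synonym table: both the documents and the query are mapped to canonical forms before matching, so a single lookup per term replaces the many expanded queries. The `CANONICAL` table, the sample documents, and the function names are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical table mapping each word to the canonical form of its synonym set.
CANONICAL = {
    "unemployed": "jobless",
    "jobless": "jobless",
    "coder": "programmer",
    "programmer": "programmer",
    "programmers": "programmer",
}


def canonical_tokens(text):
    """Lowercase the text, split it into words, and map each word to its canonical form."""
    return [CANONICAL.get(word, word) for word in re.findall(r"[a-z0-9]+", text.lower())]


def build_index(docs):
    """Build an inverted index: canonical form -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in canonical_tokens(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing the canonical form of every query word."""
    postings = [index.get(token, set()) for token in canonical_tokens(query)]
    return set.intersection(*postings) if postings else set()


DOCS = {
    1: "The 'Anti-Java' Professor and the Jobless Programmers",
    2: "Java coffee tasting notes",
}
index = build_index(DOCS)
print(search(index, "java unemployed programmer"))  # {1}
```

Here "unemployed" and "jobless" collapse to the same index entry, so the original query finds the article without any query expansion.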

    But it introduces another problem. Many words have multiple meanings. Consider “java” which might refer to “coffee” or “an island of Indonesia” or “a programming language”. If canonical forms are being used then three different forms are needed, and this makes it difficult to build an efficient search index structure.

    One might try to perform a “deeper parse” of the text being indexed and of the search terms to try and better identify the best synonym sets. For example, in the search expression “java jobless programmer” the co-occurrence of “java” and “programmer” suggests that java probably refers to the programming language. Of course, if you are looking for jobless programmers located on the island of Java then this assumption would be wrong.
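A toy version of that co-occurrence heuristic might look like the sketch below. The `SENSES` table and its cue words are invented for illustration, and a real system would need far more than counting overlapping words.

```python
# Hypothetical cue words for each sense of an ambiguous term.
SENSES = {
    "java": {
        "programming language": {"programmer", "code", "software", "compiler"},
        "coffee": {"cup", "brew", "espresso", "roast"},
        "island of Indonesia": {"indonesia", "jakarta", "island", "travel"},
    }
}


def guess_sense(word, query):
    """Pick the sense whose cue words overlap most with the rest of the query."""
    senses = SENSES.get(word)
    if not senses:
        return None
    context = set(query.lower().split()) - {word}
    scores = {sense: len(cues & context) for sense, cues in senses.items()}
    best = max(scores, key=scores.get)
    # With no supporting cue words at all, decline to guess.
    return best if scores[best] > 0 else None


print(guess_sense("java", "java jobless programmer"))  # programming language
```

The query gives the only evidence available, so a search for jobless programmers on the island of Java would still be misread unless it also contained a cue such as "island".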

    Of course it would be great to be able to search for “meaning” instead of for simple strings of text, and I do think major progress is possible. Good luck to the people trying to solve this bear of a problem!

  10. What, he’s never bought a single piece of electronic gear in the past 8 years? He’s never bought a router, a modem? He’s never picked up a commercial software program? He’s never bought a laptop or desktop computer in the past 8 years?

    I’m not sure what the precise time frame is, but it’s been years and years since any of these gizmos to be used on or connect to a computer came with a printed bound manual. They all have ebook manuals instead.

    So, Clay, yes — you do own an ebook, you own lots of them, and you have owned them for years.

  11. Google already does synonym searching. Just put a tilde before the term you want to synonymize.

    That said, this is an education issue not a technology issue. Searching on “java unemployed programmer” to find that specific article is like searching on “war” to find a specific article about battle formations in the Napoleonic era. The search phrase is far too generic to result in meaningful results.

    There are search tools that do let you do fuzzy and similar searches, but what the author doesn’t seem to realize is they add a ton of complexity. The time you could teach someone to use all the tricks in something like DTSearch would probably better be spent teaching them how to form better searches to begin with.

    OTOH, have you ever tried to find that quote you can almost remember in a 700 page physical book before? I have. I’ll take the vagaries and limitations of electronic search any day of the week.
