Searching for a Better Way: Part 3
This year, several companies have arrived on the scene promising to improve on traditional keyword searching through “meaning-based” search technology. Their premise is a good one: keyword searching is “dumb” and can return many irrelevant results.
When I search for the word “portal,” I mean web portal; I don’t want to see material about a science fiction game called Portals. When I’m seeking information about “The Big Tuna,” I don’t want to go on a fishing expedition. I just want to know whether Bill Parcells plans to coach again. (Note to self: suggest this meaning for “Big Tuna” to the Oingo staff.)
I’ve recently talked with three companies that are working on this problem. Two of them, Oingo and Simpli, are developing proprietary lexicons that allow users to zero in on a particular meaning for a given keyword. A third, ejemoni, is developing more ambitious technology that scans the text of a document and analyzes the relationships among its words, so that documents can be placed in specific categories describing what they are about overall.
All of these technologies have potentially widespread applications. As one contributor to Traffick Forums argued, however, at this stage they don’t do a lot that a conscientious searcher couldn’t do for themselves with the simple use of Boolean operators such as AND or NOT.
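To make the forum contributor's point concrete, here is a minimal sketch in Python of what AND and NOT do to a keyword search. The three sample documents and the function name boolean_search are invented for illustration; no real search engine's syntax or internals are implied.

```python
import re

# Toy document collection; the texts are invented for illustration.
DOCS = {
    "doc1": "Yahoo remains the most visited web portal",
    "doc2": "The Portals game is science fiction; each portal leads to a new world",
    "doc3": "Building a portal site for your company intranet",
}


def boolean_search(docs, must_have, must_not_have=()):
    """Return ids of documents that contain every word in must_have (AND)
    and none of the words in must_not_have (NOT)."""
    hits = []
    for doc_id, text in docs.items():
        # Lowercase the text and split it into a set of plain words.
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        has_all = all(w in words for w in must_have)
        has_banned = any(w in words for w in must_not_have)
        if has_all and not has_banned:
            hits.append(doc_id)
    return hits


# "portal" alone matches all three documents.
print(boolean_search(DOCS, ["portal"]))
# "portal NOT game" drops the science fiction game.
print(boolean_search(DOCS, ["portal"], ["game"]))
# "portal AND web" narrows to the web portal document.
print(boolean_search(DOCS, ["portal", "web"]))
```

The catch, of course, is that the searcher has to think of "game" as the word to exclude. A lexicon of meanings is meant to offer that choice up front.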
Let’s take a peek at these three entrants into the meaning-based search field. We’re sure to hear more from them in the future.
Oingo is more typical of a Silicon Valley startup than the other two: it’s brash, young, fun, and likely to outwork you if it can’t out-think you. They’ve got a working product, and they’ve got it now. Their team of linguists has built a large lexicon of common meanings for search terms, and the company is now offering its technology “open source” as a front end for any directory or site that wishes to use it. The default directory being used to demo the service is the ever-present Open Directory Project.
If you try Oingo, you’ll see where they’re headed. For the time being, however, it’s not about to replace my favorite search engines. (Lately, I have been using Google and Ixquick, two that I find tend to provide highly relevant results without a whole lot of effort in devising search terms.)
Down the road, however, the Oingo team feels it’s only a matter of time before a major search company finds the technology useful. This could well be true. Major search and portal companies today are not shy about adding on a combination of external technologies to ensure better results. Go2Net uses Direct Hit (the popularity engine); MSN offers Looksmart directory results; various others have chosen the Open Directory for categorized results. Meaning-based search is going to find its way into the mix one way or another.
SimpliFind has the same basic idea as Oingo, but seems to have a somewhat heavier complement of scientific muscle on board, drawn from the likes of Brown and Princeton Universities. Its lexicon, called WordNet, was developed over a long period of time by cognitive and linguistic scientists at Princeton.
A test of the product is satisfying. This technology is sure to find its way into many databases, and might become a force on the Internet.
Then again, holes in the database reinforce the fact that SimpliFind, like Oingo, is going to have to rely on considerable customer-driven customization and brute force to respond to very human twists and turns in language, history, commerce, and popular culture. I searched for “Watergate” and Simpli came back with “No Meaning Found.” Now there’s some social amnesia for you! (Oingo has them beat on that one, which underscores the fact that high-level cognitive science alone won’t be enough to make this technology practical.)
One question for the scientists: could XML (eXtensible Markup Language) make their current approach irrelevant? Tomorrow’s Internet is going to be more than a question of determining the different meanings for words in the English dictionary. XML may allow meanings to become hard-wired to ever more particular contexts, and thus make search technology ever more useful. We would then be able to search for documents, companies, publications, products, people, spare parts, geographic locations, stock prices, and so on, without seeing all the other junk with similar keywords. At least that’s what I read in The Economist magazine.
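As a rough illustration of what "hard-wired" meaning could look like, the Python sketch below searches a tiny XML snippet two ways: by plain keyword, and by keyword restricted to one tagged field. The tag names (catalog, item, product, person, nickname, description) and the helper functions are invented for this example and are not taken from any real XML vocabulary.

```python
import xml.etree.ElementTree as ET

# Invented sample data: two unrelated items that both mention "tuna".
CATALOG = """
<catalog>
  <item>
    <product>Tuna fishing rod</product>
    <description>Heavy-duty rod for big tuna</description>
  </item>
  <item>
    <person>Bill Parcells</person>
    <nickname>The Big Tuna</nickname>
    <description>Football coach</description>
  </item>
</catalog>
"""


def keyword_search(xml_text, term):
    """Plain keyword search: match the term anywhere in an item's text."""
    root = ET.fromstring(xml_text.strip())
    term = term.lower()
    return [
        item[0].text  # label each hit by its first child element
        for item in root.findall("item")
        if term in "".join(item.itertext()).lower()
    ]


def field_search(xml_text, field, term):
    """Tag-aware search: match the term only inside the named field."""
    root = ET.fromstring(xml_text.strip())
    term = term.lower()
    return [
        item[0].text
        for item in root.findall("item")
        if any(term in (node.text or "").lower() for node in item.findall(field))
    ]


# The keyword search returns the fishing rod and the coach alike.
print(keyword_search(CATALOG, "tuna"))
# Restricting the search to the nickname field returns only the coach.
print(field_search(CATALOG, "nickname", "tuna"))
```

All of this assumes that publishers tag their content consistently, which is a big assumption.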
At this point, we can only speculate about the power of ejemoni, another sophisticated startup working on meaning-based search. Ejemoni is well-financed by an influential angel investor, and has what some observers believe may be a major scientific breakthrough on its hands. The core idea appears to be the ability to find related documents by analyzing the content of whole documents and placing them into an overall category similar to the Library of Congress classification system.
A cool feature that may be made possible with ejemoni’s technology is the ability to highlight a whole paragraph or even several paragraphs of text, and search for related documents based on all of the words you highlight. The company stresses that the algorithm used by ejemoni will not simply be looking at keyword density but will genuinely analyze the meanings of documents based on word relationships. Obviously, there is a lot of potential in a search technology that works better as you feed it more words. It might even be able to approximate what Ask Jeeves only pretends to do, which is to understand your questions! And yes, Jeeves, ejemoni is already voice-recognition ready, according to the company.
At this stage, it’s too early to see ejemoni in action. We’ll catch up with this one again later.