Searching for a Better Way: Part 4 – Natural Language Searching
Phrasing web searches in the form of a question is this week’s topic.
Since Yahoo!’s IPO, we’ve been knee-deep in company names with exclamation points. But it looks like 2000 may be the year of the question mark.
By now many are familiar with the keyword-based interfaces needed to access information from any database, be this a library catalogue or a whole web index. Sure, you can get a little bit fancy, refining your search with the use of operators like AND, OR, and NOT. You can even search for whole phrases.
To a lot of people, though, that feels cumbersome, and many wish they could interact with databases more naturally, somewhat like Star Trek characters who turn to the computer and ask it a question: “Computer, where is Playa del Carmen?”
On TV, the computer hardly ever messes up. It never answers “Commander del Carmen is not on board the ship.” It usually says something like “Playa del Carmen was a town on the planet Earth, country Mexico.” On further prompting, more relevant information is supplied, albeit in a rather nasal voice: “It reached the height of its popularity in the period AD 2020-2050 as a resort city. A worldwide depression caused by a concentration of wealth in the hands of margarita-peddling bar owners in Playa del Carmen resulted in the Tequila Wars of 2183. Three weeks of armed struggle between bar owners and the World Economic Council resulted in the departure of thousands of wealthy, bored-looking German teens, and complete economic collapse for the bar owners. By the end of that century, little was left of Playa del Carmen.”
Is Jeeves the Answer?
University scientists have been investigating natural language query interfaces for some time.
But the first really significant deployment of natural language technology on the Internet – essentially, the ability to ask a search engine a question in normal English as if you were asking a human-like but enormously overeducated android – has been Ask Jeeves.
Jeeves is a popular Internet search tool. Based on the structure and content of a naturally worded question, it points you towards several secondary options that might help you narrow your search to the tool, database, category, or search engine that can help you most.
Its stock market valuation (the company is valued at about $2 billion), however, may be largely due to the usefulness of such an interface in responding to a pre-existing database of commonly asked questions. Blue-chip corporate partners plan on using the Ask Jeeves interface and technology to answer product questions online. Since this can save companies money they might otherwise need to spend on live customer support, it’s seen as a promising niche.
This tips us off to the underlying reality of Ask Jeeves’ “natural language” search technology. According to some industry watchers, the responses provided by Ask Jeeves are canned sets of resources which are largely assembled “brute force” by a team of developers in response to the most commonly-asked questions (in the case of the mass-market version of Jeeves, meant to connect users with Internet resources, it’s a database of millions of common questions). To be sure, there is a lot more to it than this, including a “matching algorithm,” but it’s hard to avoid the feeling that natural language isn’t the real point of Ask Jeeves.
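To make the "canned answers plus matching algorithm" idea concrete, here is a minimal sketch in Python. Nothing in it comes from Ask Jeeves itself: the three stored questions, the resource labels, the word-overlap (Jaccard) score, and the 0.3 threshold are all assumptions chosen purely for illustration.

```python
import re

# Hypothetical hand-built table: common questions -> editor-picked resources.
CANNED = {
    "Where can I find a map of Playa del Carmen?": ["a map resource"],
    "Where can I buy the movie Carmen on video?": ["a video store listing"],
    "What is the story of the opera Carmen?": ["an opera synopsis"],
}


def tokens(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_match(query, canned=CANNED, threshold=0.3):
    """Return (stored question, score) for the closest canned question.

    The score is word overlap (Jaccard similarity), an illustrative
    stand-in for whatever a real matching algorithm does. If no stored
    question reaches the threshold, the question returned is None.
    """
    q = tokens(query)
    best, best_score = None, 0.0
    for question in canned:
        c = tokens(question)
        union = q | c
        score = len(q & c) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = question, score
    if best_score < threshold:
        return None, best_score
    return best, best_score


question, score = best_match("Where is Playa del Carmen?")
print(question, round(score, 2))
```

Even this toy version shows why such a system can look smart without understanding anything: the answer is only as good as the stored questions someone thought to write down.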
Jeeves vs. Google, Round 2
Jeeves responds perfectly well, in fact, if you just type in keywords. Conversely, many major search tools will do OK if you type in a question. Google, for example, just ignores the common words such as “where” and “is”. I expected Jeeves to do better than Google on a “where” question (“Where is Playa del Carmen?”), but he didn’t. I was given “Where can I buy” the movie “Carmen” on video or DVD, and “What is the story of the opera Carmen” as leading search options. The best options offered were mostly in the drop-down box of About.com results, which raises the question: why not just go to About.com in the first place?
Google, though supposedly not trained to answer questions, gave me playadelcarmen.com as the first result, and a site promising a “map of Playa del Carmen” wasn’t far down the first page.
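The trick of ignoring common words is simple enough to sketch in a few lines of Python. This is only an illustration of the general idea of dropping "stop words" from a question; the word list below is an assumption made up for the example, not Google's actual list or method.

```python
import re

# Hypothetical stop-word list; real engines use their own (unpublished) lists.
STOP_WORDS = {"where", "is", "what", "how", "can", "i", "the", "a", "an", "of"}


def to_keywords(question):
    """Turn a natural-language question into a plain keyword list."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    return [w for w in words if w not in STOP_WORDS]


print(to_keywords("Where is Playa del Carmen?"))
```

With the question words stripped out, the query that remains is just the place name, which is all a keyword engine needs.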
The moral of the story: the kind of natural language technology to which we’re being exposed on the Internet at present has little to do with natural language, and a lot to do with cute icons and cuddly branding. Practically any major search engine could be modified to accommodate the “ask me a question” trick. If this feature proves popular enough, perhaps they will be.
That said, there is obviously a vast, pent-up demand for more of this sort of thing. The success of Jeeves should lure more university researchers into the action.
He’s more human than you think
In the meantime, let’s ask tougher questions about what Jeeves actually does. The fact that a team of developers works on linking common questions with answer sets should prompt us to face Jeeves off more squarely against other human-guided net tools like Yahoo’s directory, Looksmart’s editors (who also do custom question answering), About.com’s guides, Suite 101’s editors, or 4anything.com’s vertical sites. So why don’t we? Jeeves’ need to maintain the fiction of the smart butler-robot means that we don’t get to see who is behind the scenes pulling Jeeves’ strings. We get answers, but not accountability.
This column, like Jeeves, was more question than exclamation mark. Later on, we’ll look at powerful “off the radar” natural language technology, and try to figure out where it’s headed. For the time being, we’re collecting links, asking questions, and listening to experts in this field.