The New York Times has a great article on a new system developed by IBM named “Watson”. It’s a computer system that has scraped tens of millions of documents from the Internet and compiled them into a massive database of knowledge. It uses natural language parsing to interpret questions and generate answers. The cool thing is that it beat former Jeopardy contestants in 4 out of 6 mock Jeopardy sessions. Here are some quotes from the article that I found interesting:
It displayed remarkable facility with cultural trivia (“This action flick starring Roy Scheider in a high-tech police helicopter was also briefly a TV series” — “What is ‘Blue Thunder’?”), science (“The greyhound originated more than 5,000 years ago in this African country, where it was used to hunt gazelles” — “What is Egypt?”) and sophisticated wordplay (“Classic candy bar that’s a female Supreme Court justice” — “What is Baby Ruth Ginsburg?”).
Software firms and university scientists have produced question-answering systems for years, but these have mostly been limited to simply phrased questions. Nobody ever tackled “Jeopardy!” because experts assumed that even for the latest artificial intelligence, the game was simply too hard: the clues are too puzzling and allusive, and the breadth of trivia is too wide.
With Watson, I.B.M. claims it has cracked the problem — and aims to prove as much on national TV. The producers of “Jeopardy!” have agreed to pit Watson against some of the game’s best former players as early as this fall (emphasis mine). To test Watson’s capabilities against actual humans, I.B.M.’s scientists began holding live matches last winter.
I’d definitely watch that episode… especially if Watson were pitted against Ken Jennings.
Under the hood:
[IBM's] main breakthrough was not the design of any single, brilliant new technique for analyzing language. Indeed, many of the statistical techniques Watson employs were already well known by computer scientists. One important thing that makes Watson so different is its enormous speed and memory. Taking advantage of I.B.M.’s supercomputing heft, Ferrucci’s team input millions of documents into Watson to build up its knowledge base — including, he says, “books, reference material, any sort of dictionary, thesauri, folksonomies, taxonomies, encyclopedias, any kind of reference material you can imagine getting your hands on or licensing. Novels, bibles, plays.”
The full article is worth the read.

