The New York Times has a great article on a new system developed by IBM named “Watson”. It’s a computer system that has scraped tens of millions of documents from the Internet and compiled them into a massive knowledge base. It uses natural language parsing to interpret questions and generate answers. The cool thing is that it beat former Jeopardy contestants 4 out of 6 times in mock Jeopardy sessions. Here are some quotes from the article I found interesting:
It displayed remarkable facility with cultural trivia (“This action flick starring Roy Scheider in a high-tech police helicopter was also briefly a TV series” — “What is ‘Blue Thunder’?”), science (“The greyhound originated more than 5,000 years ago in this African country, where it was used to hunt gazelles” — “What is Egypt?”) and sophisticated wordplay (“Classic candy bar that’s a female Supreme Court justice” — “What is Baby Ruth Ginsburg?”).
Software firms and university scientists have produced question-answering systems for years, but these have mostly been limited to simply phrased questions. Nobody ever tackled “Jeopardy!” because experts assumed that even for the latest artificial intelligence, the game was simply too hard: the clues are too puzzling and allusive, and the breadth of trivia is too wide.
With Watson, I.B.M. claims it has cracked the problem — and aims to prove as much on national TV. The producers of “Jeopardy!” have agreed to pit Watson against some of the game’s best former players as early as this fall (emphasis mine). To test Watson’s capabilities against actual humans, I.B.M.’s scientists began holding live matches last winter.
I’d definitely watch that episode… especially if Watson was pitted against Ken Jennings.
Under the hood:
[IBM's] main breakthrough was not the design of any single, brilliant new technique for analyzing language. Indeed, many of the statistical techniques Watson employs were already well known by computer scientists. One important thing that makes Watson so different is its enormous speed and memory. Taking advantage of I.B.M.’s supercomputing heft, Ferrucci’s team input millions of documents into Watson to build up its knowledge base — including, he says, “books, reference material, any sort of dictionary, thesauri, folksonomies, taxonomies, encyclopedias, any kind of reference material you can imagine getting your hands on or licensing. Novels, bibles, plays.”
The full article is worth the read.
Having printf-style functions is very useful. I find myself periodically having to remember how to write variable argument functions, so I decided to just blog about it.
#include <stdarg.h> // or <cstdarg>
// Hide annoying naming differences between Windows and other platforms
#ifdef WIN32
#define my_vsnprintf _vsnprintf
#else
#define my_vsnprintf vsnprintf
#endif
// The function
void Foo( const char* format, ... )
{
// Parse the argument list
va_list args;
va_start( args, format );
// Calculate the final length of the formatted string
int len = my_vsnprintf( 0, 0, format, args );
// Allocate a buffer (including room for null termination)
char* target_string = new char[++len];
// Generate the formatted string
my_vsnprintf( target_string, len, format, args );
// <Do something with the formatted string>
// Clean up
delete [] target_string;
va_end( args );
}
Gotchas
We ran into a problem with vsnprintf() using the Denx Linux distro on a PowerPC processor: vsnprintf( 0, 0, format, args ) would modify the va_list, which would cause a crash on the second call to vsnprintf()… the one that does the actual formatting. The work-around is to make a temporary copy of the va_list when determining the formatted string length:
va_list args_copy;
va_copy( args_copy, args );
int len = my_vsnprintf( 0, 0, format, args_copy );
va_end( args_copy );
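Putting the two pieces together, here is a minimal sketch of how the whole thing might look with the work-around folded in. The name FormatString is hypothetical (it is not from the code above), and the sketch assumes a C++11 compiler whose vsnprintf follows C99 and returns the required length when given a null buffer, so it skips the my_vsnprintf macro.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not part of the original post): formats like printf
// and returns the result as a std::string.
// Assumes a C99-conforming vsnprintf, i.e. calling it with a null buffer and
// size 0 returns the number of characters the formatted string needs.
std::string FormatString( const char* format, ... )
{
    assert( format != nullptr );

    va_list args;
    va_start( args, format );

    // Measure the formatted length using a COPY of the argument list,
    // so the original va_list is still valid for the real formatting call
    va_list args_copy;
    va_copy( args_copy, args );
    int len = std::vsnprintf( nullptr, 0, format, args_copy );
    va_end( args_copy );

    std::string result;
    if ( len >= 0 )
    {
        // Allocate a buffer (including room for null termination)
        std::vector<char> buffer( static_cast<std::size_t>( len ) + 1 );

        // Generate the formatted string using the untouched original list
        std::vsnprintf( buffer.data(), buffer.size(), format, args );
        result.assign( buffer.data(), static_cast<std::size_t>( len ) );
    }

    // Clean up
    va_end( args );
    return result;
}
```

With this sketch, FormatString( "%d files in %s", 3, "src" ) returns "3 files in src". Using std::vector for the buffer means there is no delete [] to forget, and a negative return value from vsnprintf (an encoding error) simply yields an empty string.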
I have to hand it to Microsoft… if this is supposed to be a serious tool (the promo seems to sell it as such), then they should just throw in the towel now before they tarnish their image any more. The idea behind SongSmith is that you sing the melody and it will auto-generate the backing music… I’m sure you can already see where this is going. Many people have run the vocal tracks from popular songs through it with funny results. I’ll let you find the actual SongSmith demo video on YouTube yourself… I’m posting my favorite SongSmith results below.
(requires Adobe Flash plugin… click HERE to watch it on YouTube)
(requires Adobe Flash plugin… click HERE to watch it on YouTube)
(requires Adobe Flash plugin… click HERE to watch it on YouTube)
My friend Nick brought this video to my attention. It shows how small $100 million is compared to the entire US budget. What really stood out to me was how much of the budget is dominated by welfare handouts… looks like over 80%. I guess we are “all socialists now.”
(requires Adobe Flash plugin… click HERE to watch it on YouTube)
I found my Charlie Hunter CD recently and I’ve been enjoying listening to him again. He’s an amazing musician, quite apart from the fact that he plays the bass and guitar simultaneously. YouTube didn’t exist when I first got into him, and it’s nice now to be able to watch him play. Enjoy!
(requires Adobe Flash plugin… click HERE to watch it on YouTube)
If this video ever gets deleted from YouTube, you can download it HERE.
Well, I wasn’t planning on having two of these posts in a row be about Obama, but I guess the President is always an easy target.
President Obama claims that his 2010 budget will save American taxpayers $2 trillion over the next 10 years… $2 trillion compared to what? His numbers all depend on what the baseline is. According to Politifact, it appears Obama’s stretching the numbers; the site rates his claim as “barely true”.
Keep in mind that the deficit is a number that reflects income minus expenses over the course of a single year…
When we talk about the deficit getting “smaller,” though, you have to ask “smaller compared to what?”
The answer: smaller than it would have been without Obama’s proposed changes…
Obama’s critics contend he’s inflating the baseline. In particular, they say, Obama claims we would have spent a pile of money on “overseas contingency operations,” which means the wars in Iraq and Afghanistan. Obama’s budget then posits that he wouldn’t spend that much money…
“It’s the equivalent to assuming an expensive vacation, then not taking it, and saying you’ve cut your family’s budget,” he (Brian Riedl) said. “To claim savings off that baseline is ridiculous.”
The left-leaning Center on Budget and Policy Priorities… said a more realistic number for deficit savings was $900 billion, a little less than half of Obama’s estimate.
… Obama’s budget increases revenues by letting the Bush tax cuts expire… The Obama budget document shows a deficit reduction of $636 billion over 10 years from those tax increases.
You can read the entire article HERE.
After some digging, I finally found out how to create a regex for sed (stream editor) that will find a line that does NOT contain a particular string. First, I used ‘find’ to list all the *.cpp files in my source tree:
find . -name "*.cpp" -print
Then I piped the files to ‘sed’ via ‘xargs’ (Note: replace the ‘-e’ with ‘-i’ to actually modify the files in place):
find . -name "*.cpp" -print | xargs sed -e '/STRING_TO_IGNORE/! { d }'
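The trick is adding the ‘!’ (exclamation point) after the search expression, which makes the command that follows (here ‘d’, delete) apply to lines that do NOT match. Without it, ‘sed’ would apply the command to the lines that contain the string, not the ones without it.
This is different from the Perl-style negative lookahead syntax I’ve seen used elsewhere, /(?!STRING_TO_IGNORE)/, which sed’s regular expressions do not support.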
Here’s another example. Say you want to replace STRING1 with STRING2 only if the first characters of the line (ignoring white space) are NOT “//”… i.e. skip the string replacement in code comments:
sed -i '/^[ \t]*\/\/.*/! { s/STRING1/STRING2/ }'
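NOTE: ‘[ \t]*’ matches zero or more spaces or tabs (the ‘\t’ escape inside brackets works in GNU sed; other versions of sed may need a literal tab character instead).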



