ThreadNinja is a Linux library my team created that tracks pthread_create() and pthread_join() calls in an application. It prints a stacktrace where each thread is created and where it is joined. Any rogue (unjoined) threads are reported when the application exits. ThreadNinja is unobtrusive: it does NOT have to be compiled into the code. This means you can use it on applications you didn’t compile.
We found it useful and thought we’d share it. It’s by no means production code… just a tool. Hack on it, expand it, change it… whatever. It’s pretty small, so it should be easy to dive right in. We’ve released it under the BSD license.
Cut To The Chase
You can check out the source code from Google Code, or download the version 1.0 tarball directly (threadninja.tar.gz).
To build ThreadNinja, simply untar it and call make:
> tar -zxf threadninja.tar.gz
> make
Now, simply use LD_PRELOAD to run the application:
> LD_PRELOAD=/path/to/threadninja/build/libthreadninja.so.1 TheApplication
If you don’t see function names in the stacktraces that are generated, then the application’s function names aren’t visible at runtime and it needs to be rebuilt so that they are. For my test app, I had to compile with the -rdynamic option:
> g++ -Wall -rdynamic main.cpp -lpthread
This causes all of the executable’s global symbols to be added to its dynamic symbol table, which is where the application’s function names are looked up. For more info, look at the --export-dynamic option on the GNU linker (ld) man page.
The Story Behind ThreadNinja
My team was assigned to stabilize a large video application that runs as a Linux-based appliance. The application consisted of 100,000+ lines of code that were a tangle of build warnings, circular references, and many creative hacks. Our particular task was to fix a persistent set of seg-faults and memory leaks.
In C/C++, macros that take a variable number of arguments (called variadic macros) can be very useful. Having printf-style macros just makes certain things easier to read and understand. Below I’ll describe how to do this in a way that works on Windows and Linux, as well as supports empty argument lists.
Example
Consider the following log method:
void WriteLog( const LOG_LEVEL level, const char* file, const int line, const char* format, ... );
Pretty straight-forward: it allows you to specify the log level (NOTICE, WARNING, ERROR, etc), a file name and line number, and a user-defined description in “printf” format. To log a warning message, you would type:
WriteLog( LL_WARNING, __FILE__, __LINE__, "[WARNING] Failed to import descriptor '%s:%i'!", _descriptor, _id );
The entry in the log file would look like:
Core.cpp:715 [WARNING] Failed to import descriptor 'timing-engine:3'!
Creating a Variadic Macro
Calling WriteLog directly works fine, but it’s pretty verbose and annoying to use. One way to make it simpler is to wrap WriteLog with a variadic macro:
#define LOG_WARNING( format, ... ) \
    WriteLog( LL_WARNING, __FILE__, __LINE__, "[WARNING] " format, ##__VA_ARGS__ )
Then the example above would become:
LOG_WARNING( "Failed to import descriptor '%s:%i'!", _descriptor, _id );
The macro automatically fills in the file and line number where the log message was generated, and the log level is specified by the macro name (LOG_WARNING). This is much more straight-forward and I believe it makes the code easier to read.
Notice that __VA_ARGS__ is used to get the variable argument list and pass it to WriteLog. During compilation, the preprocessor replaces __VA_ARGS__ with the comma-separated list of arguments. The '##' that prefixes __VA_ARGS__ is vital if you want your macro to work like you expect. Without it, you would be required to have at least one argument after the format string. '##' has a special meaning in this case (it is a GNU extension, which Clang also understands, while Visual C++ drops the comma on its own): it causes the preceding comma to be deleted if there are no variable arguments. Without this, the following line would generate a syntax error:
LOG_WARNING( "This would cause a syntax error" );
Prefixing __VA_ARGS__ with '##' allows the above code to work just fine.
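To see the macro work end to end, here is a minimal sketch that can be compiled with g++ or clang++. The WriteLog body below is only a stand-in I made up for illustration (it stores the entry in a global string, g_lastLogEntry, instead of writing to a log file), and the LOG_LEVEL values other than LL_WARNING are assumptions.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

enum LOG_LEVEL { LL_NOTICE, LL_WARNING, LL_ERROR };

// Hypothetical stand-in for the real logger: keeps the last entry in
// memory so it can be inspected.
static std::string g_lastLogEntry;

void WriteLog( const LOG_LEVEL level, const char* file, const int line, const char* format, ... )
{
    (void)level; // a real logger would filter or route on the level

    // Expand the printf-style message
    char message[512];
    va_list args;
    va_start( args, format );
    vsnprintf( message, sizeof( message ), format, args );
    va_end( args );

    // Prefix the message with "file:line"
    g_lastLogEntry = std::string( file ) + ":" + std::to_string( line ) + " " + message;
}

// No trailing semicolon here, so the macro behaves like a normal
// statement when used inside if/else.
#define LOG_WARNING( format, ... ) \
    WriteLog( LL_WARNING, __FILE__, __LINE__, "[WARNING] " format, ##__VA_ARGS__ )
```

Both LOG_WARNING( "Something happened" ) and LOG_WARNING( "Failed to import descriptor '%s:%i'!", name, id ) compile against this. Note that the format argument has to be a string literal, because the macro relies on adjacent literals being concatenated with "[WARNING] ".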
We ran into a weird problem the other day where our Linux video display appliance would lose audio support when the process was restarted. The audio was supposed to play through a custom joystick-keyboard that was attached via USB (the keyboard is used by security guards to PTZ cameras, control monitors, etc). The audio could be heard just fine when the box first booted, but if the application restarted audio would be lost.
Looking at the logs, we found that our audio pipeline was failing to open /dev/dsp on the restart. We then used lsof to list the open file descriptors to see which process currently held /dev/dsp:
# lsof | grep /dev/dsp
ntpd 18857 root 16u CHR 14,3 180099 /dev/dsp
What!?!?… why the heck is NTP opening the sound device and how did it steal it from us??? After some discussion we started remembering a problem in the past with ntpd stealing our SNMP diagnostics port. This just didn’t make any sense.
Digging into our appliance code, we found this line:
system( "service ntpd restart" );
This would be called each time we were notified by the security system that the NTP server address had changed (which fired once each time the process was started so we could get the initial address). But this still didn’t explain why NTP took over ownership of our file descriptors on restart.
Long story short: system() is effectively a fork() followed by an exec() of the shell. fork() gives the child process a copy of all of the parent’s open file descriptors, and by default those descriptors stay open across exec(), so everything the shell starts inherits them too (i.e. the restarted ntpd process got a copy of the /dev/dsp file descriptor). To prevent this, you have to set the FD_CLOEXEC flag on the file descriptors you don’t want inherited.
For example:
fd = open( "/dev/dsp", O_RDWR );
fcntl( fd, F_SETFD, FD_CLOEXEC );
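A slightly more defensive version of the same idea, as a sketch: OpenPrivate and SetCloseOnExec are helper names I made up for illustration. It assumes a Linux or POSIX.1-2008 system where open() accepts O_CLOEXEC, which sets the flag at open time and so closes the window in which another thread could fork() between open() and fcntl().

```cpp
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: open a file so that the descriptor is NOT
// inherited by programs started via fork()/exec() or system().
// Returns the descriptor, or -1 on failure.
int OpenPrivate( const char* path, int flags )
{
    return open( path, flags | O_CLOEXEC );
}

// Hypothetical helper for descriptors that are already open.
// Reads the current descriptor flags first so that nothing else
// stored in them is lost. Returns true on success.
bool SetCloseOnExec( int fd )
{
    int fdFlags = fcntl( fd, F_GETFD );
    if( fdFlags == -1 )
        return false;
    return fcntl( fd, F_SETFD, fdFlags | FD_CLOEXEC ) != -1;
}
```

With this, the audio device would be opened with OpenPrivate( "/dev/dsp", O_RDWR ), and descriptors handed to us by other libraries could be fixed up afterwards with SetCloseOnExec().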
Conclusion: setting the FD_CLOEXEC flag on the /dev/dsp file descriptor fixed the problem for audio. However, most of the other file descriptors still got owned by ntpd. Did we go back and set the FD_CLOEXEC flag on all file descriptors, you ask? Nope. It turns out we had a script monitoring the NTP config file and restarting ntpd for us when the file got updated… we just had to update the config file and remove the system( "service ntpd restart" ) call.
Oh, and the reason audio worked on first boot but not subsequent restarts was due to a weird race condition around when /dev/dsp got opened.
Having printf-style functions is very useful. I find myself periodically having to remember how to write variable argument functions, so I decided to just blog about it.
#include <stdarg.h> // or <cstdarg>
// Hide annoying naming differences between Windows and other platforms
#ifdef WIN32
#define my_vsnprintf _vsnprintf
#else
#define my_vsnprintf vsnprintf
#endif
// The function
void Foo( const char* format, ... )
{
    // Parse the argument list
    va_list args;
    va_start( args, format );

    // Calculate the final length of the formatted string
    int len = my_vsnprintf( 0, 0, format, args );

    // Allocate a buffer (including room for null termination)
    char* target_string = new char[++len];

    // Generate the formatted string
    // (reusing args for a second call is not portable; see Gotchas below)
    my_vsnprintf( target_string, len, format, args );

    // <Do something with the formatted string>

    // Clean up
    delete [] target_string;
    va_end( args );
}
Gotchas
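We ran into a problem with vsnprintf() using the Denx Linux distro on a PowerPC processor: vsnprintf( 0, 0, format, args ) would modify the va_list, which would cause a crash on the second call to vsnprintf()… the one that does the actual formatting. This is permitted behavior: the C standard says the value of a va_list is indeterminate after it has been passed to a function like vsnprintf(), so reusing it only happens to work on some platforms. The workaround is to make a temporary copy of the va_list when determining the formatted string length: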
va_list args_copy;
va_copy( args_copy, args );
int len = my_vsnprintf( 0, 0, format, args_copy );
va_end( args_copy );
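Putting the function and the workaround together, here is a minimal sketch of a portable version. FormatString is a name I made up for illustration, and it assumes a C++11 compiler whose vsnprintf() follows C99 (so it calls vsnprintf directly rather than going through the my_vsnprintf macro above).

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: printf-style formatting into a std::string.
std::string FormatString( const char* format, ... )
{
    va_list args;
    va_start( args, format );

    // Measure the output using a copy, so 'args' itself is still
    // untouched for the real formatting call below.
    va_list args_copy;
    va_copy( args_copy, args );
    int len = vsnprintf( nullptr, 0, format, args_copy );
    va_end( args_copy );

    std::string result;
    if( len > 0 )
    {
        // +1 for the null terminator written by vsnprintf
        std::vector<char> buffer( len + 1 );
        vsnprintf( buffer.data(), buffer.size(), format, args );
        result.assign( buffer.data(), len );
    }

    va_end( args );
    return result;
}
```

Returning a std::string also removes the manual new/delete pair, so there is no buffer to leak if the code in the middle ever throws.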
A lot of the solutions I’ve found for recursively replacing text in files are implemented using shell scripts, perl, php, or some other inconvenient way. Rushi got it right by using the Linux command line. Here it is (slightly modified) from his blog:
find . -name "*.cpp" -print | xargs sed -i 's/[find]/[replace]/g'
where “[find]” and “[replace]” are the things you are searching for and substituting.
To search files with multiple file extensions, use:
find . -name "*.cpp" -o -name "*.h" -o -name "*.c" | xargs sed -i 's/[find]/[replace]/g'
ADDED 4-13-2009: See comments for other variations.
The title’s kind of a misnomer. This post is really to help me remember how to get a human-readable string from a Windows error code… I’m finally tired of always having to look it up. However, my current situation revolves around determining why a DLL (or *.so on Linux) failed to load, so that’s why this post is titled the way it is.
I like to disable the annoying default dialog that pops up in Windows when a library fails to load.
SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX | SEM_NOALIGNMENTFAULTEXCEPT | SEM_NOOPENFILEERRORBOX);
Now, here’s the code to get a user friendly text string:
#ifdef WIN32
    LPVOID pStr = 0;
    DWORD_PTR args[1] = { (DWORD_PTR)pFilename };
    FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
                  FORMAT_MESSAGE_FROM_SYSTEM |
                  FORMAT_MESSAGE_ARGUMENT_ARRAY,
                  NULL,
                  GetLastError(),
                  MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
                  (LPTSTR)&pStr,
                  0,
                  (va_list*)args);
    // TODO: Do something with the string pStr here.
    LocalFree(pStr);
#else
    // TODO: Call dlerror() and do something with the string.
#endif
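For the Linux branch, the TODO could be filled in along these lines. This is only a sketch: DescribeLoadFailure is a name I made up, it assumes the library is loaded with dlopen(), and on older glibc versions you may need to link with -ldl.

```cpp
#include <dlfcn.h>
#include <string>

// Hypothetical helper: try to load a shared library and, on failure,
// return the human-readable reason reported by dlerror().
// An empty string means the library loaded successfully.
std::string DescribeLoadFailure( const char* pFilename )
{
    dlerror(); // clear any stale error from an earlier call

    void* handle = dlopen( pFilename, RTLD_NOW );
    if( handle )
    {
        dlclose( handle );
        return std::string();
    }

    // dlerror() returns a pointer to a static buffer, so copy the
    // text before making any other dl*() call.
    const char* reason = dlerror();
    return reason ? reason : "unknown error";
}
```

RTLD_NOW is used here so that unresolved symbols show up at load time with a descriptive message, rather than later when the missing function is first called.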
So here’s a cool feature of GNU’s implementation of libc: you can get a stack backtrace (as an array of strings) dynamically in your code. This can be really useful when trying to determine the code path taken when an error occurs. Most times, it’s faster to just run the code in a debugger and use it to display a backtrace, but there are instances when doing it programmatically is your best option. For example, you could get a backtrace in your application’s exception handler and use it to augment error log messages.
First, you need to include execinfo.h in your code:
#include <execinfo.h>
Next, call the backtrace() function to get an array of void pointers that represents the current stack (the pointers are the return addresses for each stack frame).
void* tracePtrs[100];
int count = backtrace( tracePtrs, 100 );
The backtrace() function returns the number of entries in the array (read the man pages for more info about the array size).
Finally, you need to resolve the function names associated with the pointers. You have 2 options: backtrace_symbols() and backtrace_symbols_fd(). Both of these methods resolve the pointers to strings, but the difference is that backtrace_symbols() allocates the strings on the heap while backtrace_symbols_fd() writes the strings, one per line, straight to a file descriptor that you supply. Just keep in mind that backtrace_symbols() won’t work if the heap has been trashed.
Here’s an example using backtrace_symbols():
char** funcNames = backtrace_symbols( tracePtrs, count );
// Print the stack trace
for( int ii = 0; ii < count; ii++ )
    printf( "%s\n", funcNames[ii] );
// Free the string pointers
free( funcNames );
NOTE: Make sure you call free() on the array of strings returned from backtrace_symbols(). The individual strings live inside that same allocation, so they must not be freed one by one.
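Here are the pieces above combined into one function, as a sketch. CaptureStackTrace is a name I made up for illustration, and it assumes glibc (execinfo.h is not available in every libc). It copies the names into std::string objects so the caller doesn’t have to remember the free() call.

```cpp
#include <execinfo.h>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: capture the current call stack as strings.
std::vector<std::string> CaptureStackTrace( int maxFrames = 100 )
{
    // Collect the return addresses for each stack frame
    std::vector<void*> tracePtrs( maxFrames );
    int count = backtrace( tracePtrs.data(), maxFrames );

    // Resolve the addresses to printable strings
    std::vector<std::string> frames;
    char** funcNames = backtrace_symbols( tracePtrs.data(), count );
    if( funcNames )
    {
        for( int ii = 0; ii < count; ii++ )
            frames.push_back( funcNames[ii] );

        // One free() releases the array and all of its strings
        free( funcNames );
    }
    return frames;
}
```

The exact text of each frame depends on how the program was built. Without exported symbols (for example, linking without -rdynamic), you will typically see raw addresses instead of function names.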
For more information, here’s a good article from the Linux Journal.
Debugging C++ templates is difficult. Debugging C++ templates with GDB can be an act of torture for even seasoned GDB users. I like GDB, but there are some tricks you should know when using it to debug templates. In this post, I deal with setting breakpoints.
Breakpoint Basics:
Setting a breakpoint in GDB is supposed to be simple. Here we set a breakpoint at line 50 in file main.cpp:
(gdb) b main.cpp:50
Breakpoint 1 at 0x804937a: file main.cpp, line 50.
We can also use the function name and GDB will attempt to find the correct location for us:
(gdb) b DoSomething
Breakpoint 2 at 0x8049334: file main.cpp, line 150
Simple, right? Just wait…
Breakpoint Gotchas:
GDB’s breakpoint logic is pretty handy for simple projects, but it can break down fast when things get more complicated.
For example, let’s say your application is plugin-driven, with each plugin being a separate library. Now assume each plugin has a Plugin.cpp file under its own Source directory. Try to set a breakpoint in the Initialize() method of the Plugin class:
(gdb) b Initialize
Breakpoint 3 at 0x8049717: file main.cpp, line 230
Oops! There is an Initialize() method in main.cpp and GDB thought that’s where we wanted to put it: wrong!
Introduction
If you’ve ever done embedded development in C/C++, you are probably familiar with bitfields. They are a handy way to reference individual bits in things like hardware registers. The problem is that bitfields can lead to performance problems and race conditions if not used properly. I hope to highlight some of the issues you should consider when using them.
Usage
First, let’s assume you need to check various fields in a hardware register with the following layout:

You could define the following bitfield to represent this register:
struct HwReg
{
    unsigned int Base   : 16;
    unsigned int Offset : 8;
    unsigned int Rsvd   : 5;
    unsigned int Flag   : 1;
    unsigned int Type   : 2;
};
The total size of this data type is sizeof(unsigned int), with each line defining a different region (field) within that type (this looks confusing when you first look at it). The following code uses the HwReg bitfield to access a memory-mapped register:
1: volatile struct HwReg* pReg = (volatile struct HwReg*)0x80001005;
2:
3: if (pReg->Flag && pReg->Type == TYPE_1)
4: {
5:     unsigned int address = pReg->Base + pReg->Offset;
6: }
Line 1 defines a pointer to the physical hardware register as type HwReg. We can now use this pointer to easily access the register fields. If this isn’t clear, you can read more about bitfields HERE.
Performance Problems
The compiler can’t combine or cache bitfield accesses when the pointer is declared ‘volatile’ (and pointers to memory-mapped hardware registers almost always are). This means that every access to a member of the bitfield will require a read of the physical hardware register. This can be orders of magnitude slower than accessing main memory. In the code example above, the hardware register will be read 4 times: once for each field access.
The way to remedy this is to cache a copy of the register value and then operate on that. Consider the following code:
1: volatile unsigned int* pFullReg = (volatile unsigned int*)0x80001005;
2: unsigned int temp = *pFullReg;
3: struct HwReg* pReg = (struct HwReg*)&temp;
4:
5: if (pReg->Flag && pReg->Type == TYPE_1)
6: {
7:     unsigned int address = pReg->Base + pReg->Offset;
8: }
Line 1 defines a pointer to the physical hardware register. Line 2 performs the actual read into a local variable (the slowest part). This local copy now lives in ordinary memory (or a CPU register) and the CPU cache. Line 3 casts the cached value to the bitfield for easy access. Finally, all accesses to the register fields are on the cached value, which can be read very fast from L1 cache.
Another advantage to this approach is when the hardware requires locking before the register can be accessed. By caching the value, you can keep all the locking code localized to a single area of the function. Without caching, you would hold the lock for a longer period of time (possibly forcing other operations to block) and have to make sure to release the lock on every return path (more difficult with exceptions).
NOTE: Remember you are only working with a copy of the register value. If you update a value in the bitfield, you must still copy the updated value back to the register.
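The write-back step could look like the following sketch. SetOffset is a name I made up for illustration, and the code assumes unsigned int is 32 bits wide so that HwReg is exactly the size of the register. It uses memcpy() to move between the raw value and the bitfield, which sidesteps the pointer cast from the earlier example.

```cpp
#include <cstdint>
#include <cstring>

struct HwReg
{
    unsigned int Base   : 16;
    unsigned int Offset : 8;
    unsigned int Rsvd   : 5;
    unsigned int Flag   : 1;
    unsigned int Type   : 2;
};
static_assert( sizeof( HwReg ) == sizeof( std::uint32_t ), "HwReg must match the register size" );

// Hypothetical helper: read the register once, change one field in
// the cached copy, then write the whole value back with one store.
void SetOffset( volatile std::uint32_t* pFullReg, unsigned int newOffset )
{
    std::uint32_t raw = *pFullReg;            // the only hardware read

    HwReg reg;
    std::memcpy( &reg, &raw, sizeof( reg ) );

    reg.Offset = newOffset;                   // touches the copy only

    std::memcpy( &raw, &reg, sizeof( raw ) );
    *pFullReg = raw;                          // the only hardware write
}
```

However many fields are updated in the copy, the hardware only ever sees one read and one write.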
Race Conditions
As stated above, each access to a field value generates its own read/write operation. Even if the CPU architecture guarantees that an individual operation is atomic, updating multiple fields is not. Thus, in a multi-threaded application you must lock the entire block of code that operates on the bitfield. I again suggest caching the value, as you only need to lock the actual read/write of the entire register.
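A sketch of what that locking might look like, assuming C++11 and a single process-wide lock per register. The names g_regMutex, ReadRegisterLocked and UpdateRegisterLocked are made up for illustration, and plain masks are used for the update so the example doesn’t depend on a particular bitfield layout.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical lock guarding one hardware register
std::mutex g_regMutex;

// Take a snapshot of the register. The lock is held only for the
// read itself; the caller can decode the fields afterwards without
// blocking anyone.
std::uint32_t ReadRegisterLocked( volatile std::uint32_t* pFullReg )
{
    std::lock_guard<std::mutex> lock( g_regMutex );
    return *pFullReg;
}   // lock_guard releases the lock on every return path

// Read-modify-write. The lock must cover the whole sequence, or
// another thread's update could be lost between the read and write.
void UpdateRegisterLocked( volatile std::uint32_t* pFullReg,
                           std::uint32_t clearMask,
                           std::uint32_t setBits )
{
    std::lock_guard<std::mutex> lock( g_regMutex );
    std::uint32_t value = *pFullReg;
    value = ( value & ~clearMask ) | setBits;
    *pFullReg = value;
}
```

A pure read only needs the lock for the single load, while an update has to hold it from the read through the write. Either way the locking stays in one small function instead of being spread across every field access.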
Conclusion
Bitfields are a nice language construct that can help make it easier to write clean code (as opposed to using macros and bitmasks). Unfortunately, it’s all too easy to shoot yourself in the foot with bitfields if you don’t understand the pitfalls. As always, use caution when writing performance-critical code and make sure you understand how to use the available code constructs.
Happy coding!


