High-performance timing is hard… no doubt about it. I can’t tell you how many times I’ve seen high-performance timing code done wrong. Timing is one of those things where a little knowledge can be problematic; the code may work, but it either won’t perform or will exhibit “unexplained” behavior. The purpose of this post is to explain a foundational component to getting timing right: the clock. I won’t focus on theory… this post is meant to be pragmatic.
Note: I’m talking here about interval timing (i.e. accurately measuring the duration between 2 events). This is different than synchronizing different clocks or maintaining accurate wall time.
Anatomy of the Clock
The first mistake most people make when doing timing is to use functions like gettimeofday(), GetSystemTime(), etc. These functions return what is called “wall time”… time that corresponds to a calendar date/time. These clocks suffer from the follow limitations:
- They have a low resolution: “High-performance” timing, by my definition, requires clock resolutions into the microseconds or better.
- They can jump forwards and backwards in time: Computer clocks all tick at slightly different rates, which causes the time to drift. Most systems have NTP enabled which periodically adjusts the system clock to keep them in sync with “actual” time. The adjustment can cause the clock to suddenly jump forward (artificially inflating your timing numbers) or jump backwards (causing your timing calculations to go negative or hugely positive).
For interval timing, all that’s needed for a clock is a simple counter that increments at a stable rate. For high-performance timing, the rate this counter increments should be high. A related constraint is that the counter must be monotonic (can never “tick” backwards… ever). The counter may overflow and wrap back to 0, but using unsigned math in your timing calculations can compensate for that (see example below).
Something to note: since we are usually measuring short durations, the drift of the clock is so small that we aren’t concerned by it (what matters is the drift between successive reads, not total drift over time).
Continue reading »
We got an interesting application crash yesterday with a confusing message similar to this:
Fault bucket 42424242, type 1
Event Name: APPCRASH
Response: None
Cab Id: 0
Problem signature:
P1: MyApp.exe
P2: 1.42.42.42
P3: 598773cf
P4: StackHash_ac62
P5: 0.0.0.0
P6: 00000000
P7: c0000007
P8: 00000000
P9:
P10:
We spent some time wondering if our crypto libraries were the problem (we just made some changes recently), but concluded that was unlikely. So what the heck is the “StackHash” module? Did our trashed stack cause the kernel to think we were a different module? Nope.
The answer is that the Windows executive couldn’t identify the module we were in when the application crashed (it uses the instruction pointer to determine what code was executing). In this case, the kernel simply takes a hash of the stack so at least we might be able to identify if we’ve seen this exact crash before. Here’s the answer summarized by an engineer from Microsoft:
In the OS when I try to get a faulting module name it is possible that there is no module laoded (sic) at that address. For example in this case the EIP was zero. So in those cases where a module is not loaded and it is not also in the unloaded module list, I take a stack hash of the stack so that we can identify this crash from other crashes where also the module is not known.
I’ve been jealous of Rob’s screensaver for awhile now. I thought it was Mac only until I asked him about it… nope. I installed the Windows version today. What a beautiful piece of art! The creator, Jesson Yip, describes it like this:
Analogy is a typographic clock which fuses the immediacy of digital with the visual-spatial quality of analogue into a hybrid format. It presents an everyday object with a fresh twist.
Click on the image below to visit his site and download it. Enjoy!
The Windows development environment provided by VisualStudio has some neat tools for detecting memory leaks in code. You simply #define _CRTDBG_MAP_ALLOC before including your headers, and #include <crtdbg.h> as the last header:
#define _CRTDBG_MAP_ALLOC
// Include header files here
#include <crtdbg.h>
Then, you call _CrtDumpMemoryLeaks() before your application exits. If your program exits at many points, you can alternatively call _CrtSetDbgFlag( _CRTDBG_ALLOC_MEM_DF | _CRTDBG_LEAK_CHECK_DF ) at the beginning of you application, which will cause the leaks to also be printed when it exits. The results are printed to the Debug Window and look like the following:
Detected memory leaks!
Dumping objects ->
C:\PROGRAM FILES\VISUAL STUDIO\MyProjects\leaktest\leaktest.cpp(20) : {18}
normal block at 0×00780E80, 64 bytes long.
Data: < > CD CD CD CD CD CD CD CD CD CD CD CD CD CD CD CD
Object dump complete.
Cool, Huh?! However, some libraries don’t play nice with this, as I explain below.
Microsoft has started to release developer information for Windows 7 (the follow-on to Windows Vista). Of particular interest to me is the Windows 7 Developer Guide. It discusses many of the new features that will be available when this new version of Windows is released.
Of particular interest to me are the changes to DirectX 10, Media Foundation, and the new DirectX 11. Here are some highlights.
DirectX 11:
- “…resource creation and management has been optimized for multithreaded use, enabling more efficient dynamic texture management for streaming.”
- Several improvements have been made to the high-level shading language (HLSL), such as a limited form of dynamic linkage in shaders to improve specialization complexity, and object-oriented programming constructs like classes and interfaces.”
DirectX 10 improvements:
- “The pipeline also introduces the geometry shader stage, which offloads work entirely from the CPU to the GPU. This new stage enables you to create geometry, stream the data to memory, and render the geometry with no CPU interaction.”
- Predicated rendering performs occlusion culling to reduce the amount of geometry that is rendered. Instancing APIs can dramatically reduce the amount of geometry that needs to be transferred to the GPU by drawing multiple-instances of similar objects. Texture arrays enable the GPU to do texture swapping without CPU intervention.”
Media Foundation improvements:
- “…Media Foundation has been enhanced to provide better format support, including MPEG-4, as well as support for video capture devices and hardware codecs.”
- “In Windows 7, Media Foundation provides extensive format support that includes codecs for H.264 video, MJPEG, and MP3; new sources for MP4, 3GP, MPEG2-TS, and AVI; and new file sinks for MP4, 3GP, and MP3.”
- “In Windows Vista, Media Foundation exposed a relatively low-level set of APIs. These APIs are flexible, but may not be appropriate for performing tasks. Windows 7 adds new high-level APIs that make it simpler to write media applications in C++.”
The Sydney Morning Herald has a great piece on a computer malfunction that showed up during the 2008 Olympic opening ceremony in Beijing. The dreaded “Blue Screen of Death” (BSOD), familiar to Windows XP users, was projected on the stadium ceiling when one of the display computers crashed. Here’s one of the images:

It seems that Lenovo (the PC supplier for the games) chose Windows XP instead fo Vista. From the article:
Lenovo chairman, Yang Yuanqing, was quoted as saying that because of the complexity of the IT functions at the Games, it was decided to not use the the more recent operating system. “If it’s not stable, it could have some problems,” he said.
Ironically, former Microsoft CEO Bill Gates was in the crowd (he can run but he can’t hide).
Gizmodo has some more images and links to the incident.
Overview
When writing a shared library, it is sometimes useful to have a set of functions that get called when the library is loaded and unloaded. In Windows, this is done by implementing the DllMain function. This function is called by the loader whenever a DLL is loaded or unloaded into the address space of a process (and also when the process creates a new thread, but it is less common to handle this case). A value is passed in as an argument to the DllMain function that indicates which event is occurring: DLL load or unload.
On Linux, one must use the GCC __attribute__((constructor)) and __attribute__((destructor)) keywords (double underscores before and after) to explicitly declare functions to be called on load and unload. These keywords cause the compiler/linker to add the specified functions to the __CTOR_LIST__ and __DTOR_LIST__ (“ConstrucTOR LIST” and “DestrucTOR LIST” respectively) in the object file. Functions on the __CTOR_LIST__ are called by the loader when the library is loaded (either implicitly or by dlopen()). The main purpose for this list is to call the constructors on global objects in the library. Conversely, functions on the __DTOR_LIST__ are called when the library is unloaded (either implicitly or by dlclose()). By adding initialization and clean-up functions to this list, one can effectively replicate the DllMain functionality on Linux.
NOTE: There are many ways to “shoot yourself in the foot” with these methods (on both Windows and Linux) because certain things aren’t available to your library until loading is complete. Don’t use these methods unless you have a real need… just export an Initialize() and Destroy() function instead, and force the consuming application to call them. Please read the “Gotcha’s” section below.



