High-performance timing is hard… no doubt about it. I can’t tell you how many times I’ve seen high-performance timing code done wrong. Timing is one of those things where a little knowledge can be problematic; the code may work, but it either won’t perform or will exhibit “unexplained” behavior. The purpose of this post is to explain a foundational component of getting timing right: the clock. I won’t focus on theory… this post is meant to be pragmatic.

Note: I’m talking here about interval timing (i.e. accurately measuring the duration between 2 events). This is different from synchronizing different clocks or maintaining accurate wall time.

Anatomy of the Clock

The first mistake most people make when doing timing is to use functions like gettimeofday(), GetSystemTime(), etc. These functions return what is called “wall time”… time that corresponds to a calendar date/time. These clocks suffer from the following limitations:

  1. They have a low resolution: “High-performance” timing, by my definition, requires clock resolutions into the microseconds or better.
  2. They can jump forwards and backwards in time: Computer clocks all tick at slightly different rates, which causes the time to drift. Most systems have NTP enabled, which periodically adjusts the system clock to keep it in sync with “actual” time. The adjustment can cause the clock to suddenly jump forward (artificially inflating your timing numbers) or jump backwards (causing your timing calculations to go negative or hugely positive).

For interval timing, all that’s needed for a clock is a simple counter that increments at a stable rate. For high-performance timing, the rate this counter increments should be high. A related constraint is that the counter must be monotonic (can never “tick” backwards… ever). The counter may overflow and wrap back to 0, but using unsigned math in your timing calculations can compensate for that (see example below).

Something to note: since we are usually measuring short durations, the drift of the clock is so small that we aren’t concerned by it (what matters is the drift between successive reads, not total drift over time).

Using the Clock

To use the clock, you need access to 2 pieces of information: the clock value, and the rate (frequency) at which it increments. Once you have that information, it’s trivial to calculate the duration between 2 successive reads of the clock.

For example, consider the following:

- start: the clock value at the beginning of the interval to measure.

- end: the clock value at the end of the interval.

- frequency: the rate at which the clock increments, in ticks per second.

Calculating the duration of the interval (in seconds) is as simple as:

duration = (end - start) / frequency

duration will equal the floating-point interval time in seconds (e.g. 0.000237 = 237us).

A common use case I’ve seen is wanting to measure the duration between successive calls to a function. Here’s one way to implement that:

/* get_frequency() and read_clock() stand in for your platform's clock API. */
static float freq;       /* set once at startup: freq = (float) get_frequency(); */
static unsigned last;

void Foo(void)
{
    unsigned now = read_clock();
    float duration = (float)(now - last) / freq;  /* unsigned subtraction handles wrap-around */
    last = now;

    /* Your code here */
}

It is important to use unsigned variables for now and last or your calculations will go haywire if the clock ever wraps back to 0. Consider what happens if now is less than last. Using unsigned variables allows the math operation to underflow, which produces the correct result. If you don’t believe me, just write some test code and see for yourself… it’s an important concept to understand.
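Here’s a tiny sketch of such a test (the values are made up, chosen to straddle a 32-bit wrap):

#include <stdio.h>

int main(void)
{
    unsigned last = 4294967290u;    /* clock value just before a 32-bit wrap */
    unsigned now  = 10u;            /* clock value just after the wrap */

    unsigned elapsed = now - last;  /* underflows, yielding the true tick count */
    printf("elapsed ticks = %u\n", elapsed);  /* prints 16 (assuming 32-bit unsigned) */
    return 0;
}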

Don’t Use RDTSC As Your Clock

I’m dismayed by how many forum posts suggest that newbies use RDTSC for timing. Don’t get me wrong; the TSC isn’t bad in-and-of-itself. It’s just way too hard to get timing right with it unless you’re an expert… and if you’re reading this, chances are you aren’t :) . Stick to the higher-level APIs I present below.

Even if you are an expert, I still suggest avoiding RDTSC for the following reasons:

  1. It’s processor core specific: Using RDTSC will give different values on different processor cores. This causes non-monotonic clock behavior as a thread is migrated across cores during execution (i.e. the clock can tick backwards on two successive reads). Conversely, the clock may also appear to jump forwards if the thread moves to a different core.
  2. It doesn’t always tick at the same rate: On some systems, the clock frequency will vary as the CPU load changes. This is due to power saving features in the processor that throttle down the clock speed when the load is low (e.g. Intel SpeedStep).
  3. It’s unnecessary: there are better alternatives to RDTSC that are readily available (HPET and APIC).

“Game Timing and Multicore Processors” has more info, though it’s centered around Windows.

“Guidelines For Providing Multimedia Support” has a good summary of the hardware clocks on the PC platform, and justification for the creation of the HPET.

A more technical look into the past problems with RDTSC can be found in this email from Rick Brunner (AMD Fellow).

If you’re having trouble sleeping, you can look at Intel’s HPET design document.

Linux Clock

POSIX.1b defines realtime clock methods that you’ll find on most *NIX systems (the full spec can be viewed HERE). Specifically, you want to use clock_getres() and clock_gettime(). clock_getres() returns the resolution (tick granularity) of the clock, and clock_gettime() returns its current value. Most systems implement the CLOCK_MONOTONIC type, which provides a frequency-stable, monotonically-increasing counter. In my experience, the resolution of CLOCK_MONOTONIC is high on the 2.6 kernel. I recommend using this clock when building a high-performance timing solution on Linux.

The methods are defined in time.h, and you need to link against librt (pass ‘-lrt’ to gcc). The prototypes for the functions are:

int clock_getres(clockid_t clock_id, struct timespec *res);
int clock_gettime(clockid_t clock_id, struct timespec *tp);

A detailed description of these methods can be found HERE.
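Here’s a rough sketch of how the pieces fit together… timing an arbitrary block of work against CLOCK_MONOTONIC (compile with ‘-lrt’ as noted above):

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);

    /* ... the work you want to measure ... */

    clock_gettime(CLOCK_MONOTONIC, &end);

    /* convert the two timespecs into a floating-point duration in seconds */
    double duration = (double)(end.tv_sec - start.tv_sec)
                    + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    printf("duration = %.9f seconds\n", duration);
    return 0;
}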

I also suggest looking at clock_nanosleep(), but that’s a separate topic.

Windows Clock

On Windows, QueryPerformanceFrequency() and QueryPerformanceCounter() are the obvious choice. QueryPerformanceFrequency() returns (surprise!) the frequency of the counter. QueryPerformanceCounter() returns the current value of the counter. Just like CLOCK_MONOTONIC on Linux, the Windows performance counter is a high-frequency, stable, monotonically-increasing counter.

The methods are defined in windows.h. The prototypes are:

BOOL QueryPerformanceFrequency(LARGE_INTEGER *lpFrequency);
BOOL QueryPerformanceCounter(LARGE_INTEGER *lpPerformanceCount);
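And here’s a minimal sketch of the same interval measurement on Windows:

#include <stdio.h>
#include <windows.h>

int main(void)
{
    LARGE_INTEGER freq, start, end;

    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);

    /* ... the work you want to measure ... */

    QueryPerformanceCounter(&end);

    /* (end - start) in ticks, divided by ticks-per-second, gives seconds */
    double duration = (double)(end.QuadPart - start.QuadPart) / (double)freq.QuadPart;
    printf("duration = %.9f seconds\n", duration);
    return 0;
}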

Windows Timing Errata

Note: The following items do NOT affect the resolution of the Windows performance counter, but I think they’re still important to know when doing timing on Windows.

One problem you may run into on Windows is that the system clock interval defaults to 10 or 15ms (depending on the OS version). This clock is what drives all the timers and sleep functions for the platform. What this means is that, left at the default, your timers will trigger 5 to 7.5ms late, on average.

You can fix this by increasing the system clock resolution to 2ms (I remember reading a tech article by Microsoft saying that 2ms gave better system performance than 1ms, but I can’t find it for the life of me). You do this using timeGetDevCaps(), timeBeginPeriod(), and timeEndPeriod(). Sample code can be found HERE. You have to include mmsystem.h and link against Winmm.lib.
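Here’s a minimal sketch of that pattern (the 2ms target is just the value suggested above… adjust as needed):

#include <windows.h>
#include <mmsystem.h>   /* link against Winmm.lib */

void timing_sensitive_work(void)
{
    TIMECAPS tc;
    UINT period = 2;    /* target interval in ms; clamp to what the hardware supports */

    if (timeGetDevCaps(&tc, sizeof(tc)) == TIMERR_NOERROR)
    {
        if (period < tc.wPeriodMin)
            period = tc.wPeriodMin;

        timeBeginPeriod(period);

        /* ... timers and Sleep() now fire with ~2ms granularity ... */

        timeEndPeriod(period);  /* always pair with timeBeginPeriod() */
    }
}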

I feel it necessary to note that raising the system clock resolution (shortening the timer interval) negatively affects power consumption. The Windows 7 blog has an interesting breakdown of “Windows 7 Energy Efficiency”. Specifically, they noticed a 10% drop in battery life when the clock resolution was set to 1ms using timeBeginPeriod().

Maybe this last section belongs in a separate post, but whatever… :)


8 Responses to High-performance Timing on Linux / Windows

  1. Roberto Martinez says:

    According to the link of the documentation, it says:
    The gettimeofday() function shall obtain the current time, expressed as seconds and microseconds

    But you said:
    They have a low resolution: “High-performance” timing, by my definition, requires clock resolutions into the microseconds or better.

    Do you think the documentation is wrong?


  2. Tom says:

    @Roberto There’s a difference between “resolution” and “units”. gettimeofday() may be expressed as microseconds (units), but that doesn’t mean it “ticks” fast enough to measure sub-millisecond times (resolution).

    For example, assume I have a clock that “ticks” every second. Since it only ticks each second, it’s impossible for me to use it to measure sub-second intervals… it doesn’t have the resolution. If I sample the clock faster than a second, I’ll just get the previous value. However, I’m free to express the clock time in any units I want: hours, microseconds, Planck time, whatever. Say I sample the clock and it reads 1000 microseconds… then I sample it again 10 microseconds later… it still reads 1000… just because the units are microseconds doesn’t mean it has the resolution to update at that rate.

    Hope this helps!


  3. Frank Vaughn says:

    This is great. I was looking for MONOTONIC and got your article. Very clear and simple. Thanks…


