Introduction
If you’ve ever done embedded development in C/C++, you are probably familiar with bitfields. They are a handy way to reference individual bits in things like hardware registers. The problem is that bitfields can lead to performance problems and race conditions if not used properly. I hope to highlight some of the issues you should consider when using them.
Usage
First, let’s assume you need to check various fields in a hardware register with the following layout:

You could define the following bitfield to represent this register:
struct HwReg
{
    unsigned int Base   : 16;
    unsigned int Offset : 8;
    unsigned int Rsvd   : 5;
    unsigned int Flag   : 1;
    unsigned int Type   : 2;
};
The total size of this data type is sizeof(unsigned int). Each member declaration defines a different region (field) within that type, and the number after the colon gives the width of the field in bits. The syntax can look confusing at first. The following code uses the HwReg bitfield to access a memory-mapped register:
1: struct HwReg* pReg = (struct HwReg*)0x80001005;
2:
3: if (pReg->Flag && pReg->Type == TYPE_1)
4: {
5: void* address = pReg->Base + pReg->Offset;
6: }
Line 1 defines a pointer to the physical hardware register as type HwReg. We can now use this pointer to easily access the register fields. If this isn’t clear, you can read more about bitfields HERE.
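The usage above can be sketched in a form that also compiles and runs on a development host. This is only an illustration under stated assumptions: the helper name hwreg_target and the value given to TYPE_1 are hypothetical (the real encoding comes from the hardware documentation), and the register pointer is passed in as a parameter instead of being hard-coded. The compile-time check encodes the size claim made earlier.

```c
#include <assert.h>

struct HwReg
{
    unsigned int Base   : 16;
    unsigned int Offset : 8;
    unsigned int Rsvd   : 5;
    unsigned int Flag   : 1;
    unsigned int Type   : 2;
};

/* The five fields add up to 32 bits, so the struct should be the size of
   one unsigned int. Fail the build if the compiler lays it out otherwise. */
_Static_assert(sizeof(struct HwReg) == sizeof(unsigned int),
               "HwReg must be the size of one unsigned int");

/* Hypothetical encoding; the real value comes from the hardware manual. */
#define TYPE_1 1u

/* Returns Base + Offset when Flag is set and Type is TYPE_1, otherwise 0.
   pReg may point at a real register or at an ordinary variable. */
unsigned int hwreg_target(const volatile struct HwReg *pReg)
{
    if (pReg->Flag && pReg->Type == TYPE_1)
    {
        return pReg->Base + pReg->Offset;
    }
    return 0;
}
```

On a host machine the function can be called with the address of an ordinary struct HwReg variable; on the target it would receive the cast register address.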
Performance Problems
The compiler usually cannot optimize bitfield accesses to hardware registers, because pointers to memory-mapped registers are almost always declared ‘volatile’. This means that every access to a member of the bitfield requires a read of the physical hardware register, which can be orders of magnitude slower than accessing main memory. In the code example above, the hardware register can be read up to four times, once for each field access.
The way to remedy this is to cache a copy of the register value and then operate on that. Consider the following code:
1: unsigned int* pFullReg = (unsigned int*)0x80001005;
2: unsigned int temp = *pFullReg;
3: struct HwReg* pReg = (struct HwReg*)&temp;
4:
5: if (pReg->Flag && pReg->Type == TYPE_1)
6: {
7: void* address = pReg->Base + pReg->Offset;
8: }
Line 1 defines a pointer to the physical hardware register. Line 2 performs the actual read into a local variable (the slowest part). This local copy is now in main memory and the CPU cache. Line 3 casts the cached value to the bitfield for easy access. Finally, all accesses to the register fields are made on the cached value, which can be read very quickly from the L1 cache.
Another advantage of this approach appears when the hardware requires locking before the register can be accessed. By caching the value, you can keep all the locking code localized to a single area of the function. Without caching, you would hold the lock for a longer period of time (possibly forcing other operations to block) and would have to make sure to release the lock on every return path (which is more difficult with exceptions).
NOTE: Remember you are only working with a copy of the register value. If you update a value in the bitfield, you must still copy the updated value back to the register.
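The note above can be sketched as a read-modify-write helper. This is a minimal sketch, not a definitive implementation: the names HwRegView and hwreg_set_offset are hypothetical, and the register pointer is passed in so the code can be tried against an ordinary variable. It uses a union rather than the pointer cast shown earlier, which is one common way to view the same bits as either a raw word or named fields in C.

```c
struct HwReg
{
    unsigned int Base   : 16;
    unsigned int Offset : 8;
    unsigned int Rsvd   : 5;
    unsigned int Flag   : 1;
    unsigned int Type   : 2;
};

/* View the same bits either as a raw word or as named fields. */
union HwRegView
{
    unsigned int raw;
    struct HwReg bits;
};

/* Change the Offset field using one register read and one register write.
   reg is a hypothetical pointer to the memory-mapped register. */
void hwreg_set_offset(volatile unsigned int *reg, unsigned int newOffset)
{
    union HwRegView cached;

    cached.raw = *reg;               /* single slow read of the hardware */
    cached.bits.Offset = newOffset;  /* modifies the local copy only */
    *reg = cached.raw;               /* copy the updated value back */
}
```

Forgetting the final assignment would leave the hardware unchanged, which is the mistake the note warns about.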
Race Conditions
As stated above, each access to a field value generates its own read/write operation. Even if the CPU architecture guarantees that an individual operation is atomic, a sequence of updates to multiple fields is not. Thus, in a multi-threaded application you must lock the entire block of code that operates on the bitfield. I again suggest caching the value, as you only need to lock the actual read/write of the entire register.
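A rough sketch of that locking pattern follows. Everything named here is an assumption for illustration: the spinlock built on C11 atomic_flag stands in for whatever lock the platform actually provides (an RTOS mutex, disabling interrupts, and so on), and reg_snapshot and reg_update are hypothetical helpers. For a pure read, the lock covers only the single register read. For an update, this sketch holds the lock across the read, the modification, and the write, so that another thread cannot change the register in between.

```c
#include <stdatomic.h>

/* Hypothetical spinlock guarding the register. */
static atomic_flag g_regLock = ATOMIC_FLAG_INIT;

static void reg_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&g_regLock, memory_order_acquire))
    {
        /* spin until the current holder releases the lock */
    }
}

static void reg_unlock(void)
{
    atomic_flag_clear_explicit(&g_regLock, memory_order_release);
}

/* Take a snapshot: the lock covers only the one register read. The caller
   can decode fields from the returned copy without holding the lock. */
unsigned int reg_snapshot(const volatile unsigned int *reg)
{
    unsigned int value;

    reg_lock();
    value = *reg;
    reg_unlock();
    return value;
}

/* Update selected bits: the lock covers the read, the change, and the write. */
void reg_update(volatile unsigned int *reg,
                unsigned int clearMask, unsigned int setBits)
{
    unsigned int value;

    reg_lock();
    value = *reg;
    value = (value & ~clearMask) | setBits;
    *reg = value;
    reg_unlock();
}
```

Because both helpers acquire and release the lock in one place, there is a single return path to audit for a missed unlock.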
Conclusion
Bitfields are a nice language construct that can make it easier to write clean code (as opposed to using macros and bitmasks). Unfortunately, it’s all too easy to shoot yourself in the foot with bitfields if you don’t understand the pitfalls. As always, use caution when writing performance-critical code and make sure you understand how to use the available code constructs.
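For comparison, here is a sketch of the macro-and-bitmask style mentioned above. The bit positions are an assumption made for illustration (Base in bits 0-15, Offset in bits 16-23, reserved bits 24-28, Flag in bit 29, Type in bits 30-31) and would need to be confirmed against the hardware documentation. The helper names hwreg_get and hwreg_set are hypothetical.

```c
#include <stdint.h>

/* Assumed layout, for illustration only. */
#define HWREG_BASE_SHIFT    0u
#define HWREG_BASE_MASK     0xFFFFu
#define HWREG_OFFSET_SHIFT  16u
#define HWREG_OFFSET_MASK   0xFFu
#define HWREG_FLAG_SHIFT    29u
#define HWREG_FLAG_MASK     0x1u
#define HWREG_TYPE_SHIFT    30u
#define HWREG_TYPE_MASK     0x3u

/* Extract a field from a cached copy of the register value. */
static inline uint32_t hwreg_get(uint32_t raw, unsigned shift, uint32_t mask)
{
    return (raw >> shift) & mask;
}

/* Return a copy of raw with one field replaced by value. */
static inline uint32_t hwreg_set(uint32_t raw, unsigned shift,
                                 uint32_t mask, uint32_t value)
{
    return (raw & ~(mask << shift)) | ((value & mask) << shift);
}
```

With explicit shifts and masks the bit positions do not depend on how the compiler allocates bitfields, at the cost of more verbose code.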
Happy coding!


There appears to be an error in this blog post – ‘virtual’ should likely be ‘volatile’
Tom reply on July 29th, 2008 8:48 am:
Fixed. Thanks!
short & sweet.
The example that masks “volatile” (in the “Performance Problems” section) should not be practised, as volatile has its own special purpose and masking this property will give unexpected behaviour from the application.
In reply to nix:
Hmm, yes and no. You should know when you shouldn’t mask/cache (usually you shouldn’t) and when you can do so freely (for example, to read the status of the struct just once and then read the various flags from that register snapshot). But I have to agree that caching usually isn’t a good idea for volatile variables.
Hey, that was interesting. Great information here on bitfields. Thanks for writing about it.
I would say that you should still lock the entire operation, from the caching to the writing back from the cache. If you don’t, what happens when you read, release the lock, and then get interrupted by a process that manages to acquire the lock, cache, release the lock, perform the same block processing you were going to do, but with an updated register, reacquire the lock, and then write? When the first thread runs again, it will then clobber the register.
Nice article, especially the race condition section. There’s one other big problem you missed though. The order of allocation of bit-fields is implementation-defined, so it’s entirely possible, though unlikely, that the implementation puts the fields in some order other than what you specify. That means, for example, that Base may actually occupy bits 16-31, and your direct read from a hardware register won’t work. To use this reliably, you must have a thorough understanding of your implementation (compiler, etc.), and even then it’s not likely to be portable. See section 6.7.2.1 of the C99 standard for a list of other implementation-defined and unspecified behaviors to look out for.
I know it’s only an example but shouldn’t the register address be a modulo 4 address?