In this post, I outline a basic GDI video window class. GDI is inefficient for rendering video, but it’s also very convenient when writing simple tools on Windows. I find myself doing it often enough that it makes sense to do a brain-dump here. This is only an outline, but it should get you over the biggest hurdles.
Outline
- Video Window Class Definition
- Creating and Destroying the Window
- The Window Thread
- The Message Pump
- Rendering a Video Frame
Things to note:
- You will need a thread in your window class to “pump” window events.
- GDI does NOT do any color conversion, so you will need to convert each frame into the same format as the display (always RGB, usually 24 or 32 bits).
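To make the color conversion note concrete, here is a minimal sketch of one common case. It assumes the decoder hands you planar I420 (YUV 4:2:0) frames in studio-range BT.601 and that the display wants 32-bit pixels; neither assumption comes from the window class itself, and the helper names `yuv_to_bgrx` and `i420_to_bgrx` are made up for illustration. The output rows are top-down, which GDI accepts when the DIB header uses a negative height.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamp an intermediate value to the 0..255 range of one color channel.
static inline std::uint8_t clamp8(int v)
{
    return static_cast<std::uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
}

// Convert one studio-range BT.601 YUV pixel to B, G, R, X bytes
// (the in-memory byte order of a 32-bit DIB pixel).
static inline void yuv_to_bgrx(std::uint8_t y, std::uint8_t u, std::uint8_t v,
                               std::uint8_t *out)
{
    const int c = y - 16, d = u - 128, e = v - 128;
    out[0] = clamp8((298 * c + 516 * d + 128) / 256);            // blue
    out[1] = clamp8((298 * c - 100 * d - 208 * e + 128) / 256);  // green
    out[2] = clamp8((298 * c + 409 * e + 128) / 256);            // red
    out[3] = 0;                                                  // unused
}

// Convert a planar I420 frame (full-size Y plane plus quarter-size U and V
// planes) to a top-down 32-bit BGRX image. Width and height must be even.
std::vector<std::uint8_t> i420_to_bgrx(const std::uint8_t *yPlane,
                                       const std::uint8_t *uPlane,
                                       const std::uint8_t *vPlane,
                                       int width, int height)
{
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);
    std::vector<std::uint8_t> bgrx(static_cast<std::size_t>(width) * height * 4);
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            // Each chroma sample is shared by a 2x2 block of luma samples.
            const int chroma = (row / 2) * (width / 2) + (col / 2);
            yuv_to_bgrx(yPlane[row * width + col], uPlane[chroma], vPlane[chroma],
                        &bgrx[(static_cast<std::size_t>(row) * width + col) * 4]);
        }
    }
    return bgrx;
}
```

A per-pixel loop like this is slow, which fits the "simple tools" use case; a 24-bit target would also need each row padded to a multiple of 4 bytes.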
Video Window Class Definition
class VideoWindow
{
public:
    ...
private:
    // WINAPI gives the thread entry point the calling convention CreateThread() expects.
    static DWORD WINAPI _ThreadMain(LPVOID);
    static LRESULT WINAPI _MsgProc(HWND, UINT, WPARAM, LPARAM);
    CRITICAL_SECTION _lok;
    HANDLE _thread;
    HWND _hWnd;
    HDC _hDC;
    HDC _hFrameDC;
    HFONT _hFont;
    int _bpp; // bit depth
    volatile bool _exit;
    int _x, _y, _w, _h; // Window dimensions
    int _imgWidth, _imgHeight, _imgLength; // Frame dimensions
    unsigned char *_imgBits;
};
Creating and Destroying the Window
void VideoWindow::Create(int x, int y, int w, int h)
{
    EnterCriticalSection(&_lok);
    _x = x;
    _y = y;
    _w = w;
    _h = h;
    _hWnd = 0;
    _exit = false;
    // Default security, default stack size, run immediately, thread ID not needed.
    _thread = CreateThread( NULL, 0, (LPTHREAD_START_ROUTINE)_ThreadMain,
                            this, 0, NULL );
    LeaveCriticalSection(&_lok);
    // Wait for window to be created. Cheap, but it works.
    while (!_hWnd) {
        ::Sleep(100);
    }
}
The C99 spec requires that math.h should define the constant float value NAN. However, NAN isn’t defined in the Visual Studio version of math.h, so you have to define it yourself (VS only implements C89). It’s pretty straight-forward for 32-bit machines… here’s my implementation with cross-platform protection:
#ifdef WIN32
#ifndef NAN
static const unsigned long __nan[2] = {0xffffffff, 0x7fffffff};
#define NAN (*(const float *) __nan)
#endif
#endif
This code is adapted from an MSDN example.
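If the pointer cast makes you nervous, the same idea can be sketched without it. This is an alternative rather than the MSDN approach: it assumes a C++11 compiler and IEEE 754 single-precision floats, and `make_nan` and `is_nan` are names invented for this example. Copying the bit pattern with memcpy means the size of unsigned long stops mattering.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Build a quiet NaN from its IEEE 754 single-precision bit pattern:
// exponent bits all ones, top mantissa bit set.
inline float make_nan()
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    const std::uint32_t bits = 0x7fc00000u;
    float value;
    std::memcpy(&value, &bits, sizeof(value));
    assert(value != value); // sanity check: the pattern really is a NaN here
    return value;
}

// NaN is the only value that compares unequal to itself.
inline bool is_nan(float value)
{
    return value != value;
}
```

The self-comparison trick breaks under aggressive floating-point options (for example fast-math style flags), so keep those off for code that relies on it.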
Writing code that works on Windows, Linux, and Mac is frequently challenging. Socket programming is no exception. Modern versions of Linux and Mac have full implementations of the latest IPv6 socket API extensions defined in RFC 3493. Windows, however, has only a partial implementation of the original (deprecated) version, RFC 2553. This sounds worse than it is, but it’s something you have to consider.
Note: This post assumes you are already familiar with the socket extensions for IPv6 (RFC 3493).
Linux and Mac
Good news… they both fully support RFC 3493.
Windows
Windows IPv6 support varies based on which version you’re targeting. Microsoft started adding IPv6 in Windows 2000, and they’ve continued adding more of the socket extensions as time went on. Most of the core functionality is present in XP, and what’s missing is easily replaced by using Winsock calls directly (more on this later).
Windows gained IPv6 support while RFC 2553 was still the current standard. Since then, it has been obsoleted by RFC 3493. However, Microsoft doesn’t want to break existing code written against its API, so the older API lives on. One difference you will run into is that sockaddr_in6 and sockaddr_storage are laid out slightly differently on Windows than on Mac. The size of the structures across platforms is the same (the sa_family_t member is shorter where the length member is present), it’s just that the Windows structures don’t begin with the length member. Note that the length member is the 4.4BSD-style layout, which RFC 3493 allows but does not require; Linux omits it as well, so portable code should never depend on it. For example:
// Mac (4.4BSD-style layout)
struct sockaddr_in6 {
    uint8_t sin6_len; /* Length member (4.4BSD-style) */
    sa_family_t sin6_family;
    ...
};
struct sockaddr_storage {
    uint8_t ss_len; /* Length member (4.4BSD-style) */
    sa_family_t ss_family;
    ...
};
// Windows (Linux uses this layout too)
struct sockaddr_in6 {
    sa_family_t sin6_family;
    ...
};
struct sockaddr_storage {
    sa_family_t ss_family;
    ...
};
I’ve never had a problem with this, because the size of sockaddr_in6 is easily determined (sizeof(sockaddr_in6)) and I always end up casting sockaddr_storage to the specific type (sockaddr_in or sockaddr_in6) based on ss_family.
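As a rough sketch of that habit, the helpers below pick the concrete type from ss_family and never touch a length member, so they behave the same on every layout. The names `sockaddr_size` and `sockaddr_port` are hypothetical, and the header list is an assumption about a typical build.

```cpp
#include <cassert>
#include <cstddef>
#ifdef _WIN32
  #include <winsock2.h>
  #include <ws2tcpip.h>
#else
  #include <arpa/inet.h>
  #include <netinet/in.h>
  #include <sys/socket.h>
#endif

// Size of the concrete address stored in a sockaddr_storage,
// or 0 if the family is not one we handle.
inline std::size_t sockaddr_size(const sockaddr_storage &ss)
{
    switch (ss.ss_family) {
    case AF_INET:  return sizeof(sockaddr_in);
    case AF_INET6: return sizeof(sockaddr_in6);
    default:       return 0;
    }
}

// Port in host byte order, or -1 for a family we do not handle.
inline int sockaddr_port(const sockaddr_storage &ss)
{
    switch (ss.ss_family) {
    case AF_INET:
        return ntohs(reinterpret_cast<const sockaddr_in &>(ss).sin_port);
    case AF_INET6:
        return ntohs(reinterpret_cast<const sockaddr_in6 &>(ss).sin6_port);
    default:
        return -1;
    }
}
```

The value from `sockaddr_size()` is what you would pass as the address length to calls such as bind() or connect().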
Besides the data structure differences, it’s important to remember that Microsoft added IPv6 support over multiple versions. Support first appeared in Windows 2000, but more of the extensions have been added over time. Most of the core functionality was present in XP (including multicast), but not everything is implemented as of Windows 7. It’s annoying, but I will say that what’s missing is easily replaced by using Winsock calls directly.
Here’s the breakdown of IPv6 socket extensions by Windows version:
| Socket Extension | 2K | XP | Vista | 7 | Comments |
|---|---|---|---|---|---|
| if_indextoname() | | | x | x | GetAdaptersAddresses() for XP |
| if_nametoindex() | | | x | x | GetAdaptersAddresses() for XP |
| if_nameindex() | | | | | GetAdaptersAddresses() (XP, later) |
| if_freenameindex() | | | | | |
| getaddrinfo() | x | x | x | x | |
| getnameinfo() | x | x | x | x | |
| freeaddrinfo() | x | x | x | x | |
| gai_strerror() | x | x | x | x | |
| inet_pton() | | | x | x | WSAStringToAddress() (2000, XP) |
| inet_ntop() | | | x | x | WSAAddressToString() (2000, XP) |
| All IN6_IS_ADDR_* macros | | x | x | x | Ex: IN6_IS_ADDR_LOOPBACK() |
| struct sockaddr_storage | | x | x | x | |
| Multicast support | | x | x | x | |
As you can see, you may still have to call Winsock directly depending on what version of Windows you are targeting. In my opinion, programming IPv6 on Windows is a lot easier if you only support XP and later, but I know that’s not always possible.
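To show what "calling Winsock directly" can look like, here is a sketch of an inet_pton() stand-in. The wrapper name `parse_numeric_addr` is hypothetical, the Windows branch assumes WSAStartup() has already been called, and only the non-Windows branch has been exercised here, so treat the Winsock half as a starting point rather than tested code.

```cpp
#include <cassert>
#include <cstring>
#ifdef _WIN32
  #include <winsock2.h>
  #include <ws2tcpip.h>
#else
  #include <arpa/inet.h>
  #include <netinet/in.h>
  #include <sys/socket.h>
#endif

// Parse a numeric IPv4/IPv6 address string into a sockaddr_storage.
// Returns true on success; no DNS lookup is performed.
bool parse_numeric_addr(int family, const char *text, sockaddr_storage *out)
{
    assert(text && out);
    std::memset(out, 0, sizeof(*out));
#ifdef _WIN32
    // WSAStringToAddressA() takes a non-const string, so work on a copy.
    char buf[128];
    if (std::strlen(text) >= sizeof(buf))
        return false;
    std::strcpy(buf, text);
    INT len = sizeof(*out);
    return WSAStringToAddressA(buf, family, NULL,
                               reinterpret_cast<LPSOCKADDR>(out), &len) == 0;
#else
    void *dst;
    if (family == AF_INET)
        dst = &reinterpret_cast<sockaddr_in *>(out)->sin_addr;
    else if (family == AF_INET6)
        dst = &reinterpret_cast<sockaddr_in6 *>(out)->sin6_addr;
    else
        return false;
    if (inet_pton(family, text, dst) != 1)
        return false;
    out->ss_family = static_cast<sa_family_t>(family);
    return true;
#endif
}
```

Filling a whole sockaddr_storage, rather than a bare in6_addr as inet_pton() does, keeps the two branches symmetrical because that is what WSAStringToAddress() produces.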
Summary
Modern operating systems all support IPv6. However, for business reasons, Windows has a slightly older version of the socket API which requires special consideration. My goal was to enumerate those differences to help make the transition to IPv6 smoother. Writing cross-platform code can be a fun challenge at times, but it’s also a little tedious. Hopefully, this post helps ease the pain.
Here’s a simple function using getaddrinfo() that will take an IP address and return the address family (AF_INET for IPv4, AF_INET6 for IPv6, etc). It works on both Linux and Windows. This function will also accept hostnames and return the address family of the first address returned. You can disable this feature (and the corresponding DNS lookup) by passing the AI_NUMERICHOST flag.
#include <string.h>
#ifdef _WIN32
#include <winsock2.h>
#include <ws2tcpip.h> // getaddrinfo(); call WSAStartup() before using it
#else
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#endif

// Returns the address family of an address or hostname.
// AF_INET, AF_INET6, or -1 on error.
int getaddrfamily(const char *addr)
{
    struct addrinfo hint, *info = 0;
    memset(&hint, 0, sizeof(hint));
    hint.ai_family = AF_UNSPEC;
    // Uncomment this to disable DNS lookup
    //hint.ai_flags = AI_NUMERICHOST;
    int ret = getaddrinfo(addr, 0, &hint, &info);
    if (ret)
        return -1;
    int result = info->ai_family;
    freeaddrinfo(info);
    return result;
}
See RFC 3493 for more information on the latest socket API for dealing with IPv6.
Recently, I was tasked with picking an AAC audio codec library for one of our products. There were several libraries I had to evaluate, and I needed some quantitative metrics for doing the comparison. I’m not what professionals call an “expert listener”, so I had to do the best with what I had. While creating my test plan, I noticed that more people seemed interested in how I was doing testing rather than the actual results. So I decided to share my approach to audio codec testing.
Note: This is intended to be a pragmatic guide for engineers evaluating codecs. It is not a comprehensive treatment of the subject. The goal is to give readers a solid overview and some practical ideas.
Get Familiar with Psychoacoustics
Psychoacoustics is the study of how humans perceive sound. As you might expect, we humans don’t process sound in a perfect, linear fashion. The physical shape of the ear, the transfer function of the basilar membrane, and the psychological interpretation of the data all affect how we perceive sound (and by extension, how “good” an audio codec sounds to us).
I highly recommend you start by reading this excerpt from Surround Sound: Psychoacoustics Part 1, by Tomlinson Holman (he created THX for Lucasfilm).
Understand the Codec
Make sure you understand the codec you are testing; not necessarily the implementation, but what tools (i.e. methods) the codec uses for compression. Many codecs have different “profiles”, which describe what subset of the available tools is used (e.g. AAC). You should also have some idea how each compression tool works and any shortcomings it has. This will help guide you in selecting reference audio samples and knowing what artifacts to listen for.
For an introduction to modern audio compression, read Audio Coding: An Introduction to Data Compression Part 1, and Part 2 (discusses MP3 and AAC). I actually suggest buying the book “Introduction to Data Compression”, by Khalid Sayood.
Understand the API
Make sure you actually read the codec documentation and look at any available code samples. This step has more to do with due diligence than anything, as I haven’t seen a codec API we couldn’t work with, but you need to do this. This will also help you scope the work required to get a working encoder/decoder for future steps (if you’re lucky, the sample code can be used).
Choose the Reference Audio Samples
An effective test requires multiple audio samples with different characteristics. There are many types of artifacts a codec can introduce, and your choice of audio samples will dictate how easy they are to detect. It’s also important to pick samples that reflect the actual types of sound the codec will have to deal with. For example, if the final system will primarily be encoding speech, then you should choose more speech-oriented references as opposed to music samples.
Some characteristics you might consider:
- Transients (snare drum): Sensitive to pre-echo and noise “smearing”.
- Tonal structure (clarinet, saxophone): Sensitive to noise and “roughness”.
- Natural speech (male and female voices of various languages): Sensitive to distortion and smearing of “attacks”.
- Complex sound (bagpipes): Stresses the codec.
- High bandwidth (bagpipes): Loss of high frequencies and program-modulated high frequency noise.
It is also possible to use synthetic sounds and sweeps, but this is only recommended for the automated objective tests below.
As a basic guideline, you need 10-25 second “raw” samples recorded at the highest sample rate your system needs to work with. It is vital that the samples you choose have never been compressed with a lossy codec (MP3, AAC, etc.)… that would severely limit the quality of your test. For sample rate and size, I suggest 48kHz 16-bit PCM, but a lower rate/size makes sense if the final system is limited in this area. It also makes sense to use a sample rate of 44.1kHz, since many quality audio samples can be ripped losslessly from CD. Just keep in mind that the objective PEAQ test mentioned below requires 48kHz 16-bit PCM, so up-sampling may be required.
The audio samples can be stored in whatever container format you want (raw, WAV, etc) as long as your codec test application can unpack it. This is important to keep in mind… you don’t want to accidentally run the WAV header through the codec (yes, I’ve done this). The container format is more of a practical issue, but it was worth mentioning.
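One way to avoid feeding the header to the codec is to walk the RIFF chunk list and hand over only the body of the "data" chunk. The sketch below assumes the whole WAV file is already in memory; `find_wav_data` is a made-up name, and the code does not validate the "fmt " chunk, which a real tool should also check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Read a little-endian 32-bit value (RIFF stores all sizes little-endian).
static std::uint32_t read_le32(const unsigned char *p)
{
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Locate the PCM payload inside an in-memory WAV file.
// On success, stores the payload offset and length and returns true.
bool find_wav_data(const unsigned char *file, std::size_t fileLen,
                   std::size_t *dataOffset, std::size_t *dataLen)
{
    assert(file && dataOffset && dataLen);
    if (fileLen < 12 || std::memcmp(file, "RIFF", 4) != 0 ||
        std::memcmp(file + 8, "WAVE", 4) != 0)
        return false;

    std::size_t pos = 12; // first chunk follows the 12-byte RIFF/WAVE header
    while (fileLen - pos >= 8) {
        const std::uint32_t size = read_le32(file + pos + 4);
        const std::size_t body = pos + 8;
        if (std::memcmp(file + pos, "data", 4) == 0) {
            *dataOffset = body;
            // Clamp in case the header claims more bytes than the file holds.
            *dataLen = (size <= fileLen - body) ? size : fileLen - body;
            return true;
        }
        // Skip this chunk; chunk bodies are padded to an even length.
        const std::size_t skip = static_cast<std::size_t>(size) + (size & 1u);
        if (skip > fileLen - body)
            return false;
        pos = body + skip;
    }
    return false;
}
```

Searching for the chunk, instead of assuming a fixed 44-byte header, matters because some files carry extra chunks before the audio data.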
Generate Various Test Samples and Observe CPU Load
This step is pretty straight-forward: wrap the codec in an application and encode the reference audio samples at different bit rates. You should choose bit rates that represent the full spectrum of bit rates that will be used in the final system. While you’re encoding, track the CPU usage on the codec and how many cores it’s using. You may even want to do a separate test running many encodes in parallel (this works nicely if the CPU usage is too low to measure accurately). Make sure to consider application overhead and disk I/O when making measurements.
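A rough sketch of the measurement side, assuming you wrap one whole encode run in a callable: time it with std::clock() and divide by the duration of the audio. `CpuLoad` and `measure_cpu_load` are hypothetical names. On POSIX systems std::clock() reports CPU time for the whole process (all threads combined); Microsoft's C runtime reports elapsed wall-clock time instead, so on Windows something like GetProcessTimes() is the closer equivalent.

```cpp
#include <cassert>
#include <ctime>

// Result of timing one encode run.
struct CpuLoad {
    double cpuSeconds;     // processor time consumed during the run
    double realTimeFactor; // cpuSeconds / audioSeconds; below 1.0 is faster than real time
};

// Run `encode` once and report how much processor time it used relative to
// the duration of the audio it processed.
template <typename EncodeFn>
CpuLoad measure_cpu_load(EncodeFn encode, double audioSeconds)
{
    assert(audioSeconds > 0.0);
    const std::clock_t start = std::clock();
    encode();
    const std::clock_t stop = std::clock();

    CpuLoad load;
    load.cpuSeconds = static_cast<double>(stop - start) / CLOCKS_PER_SEC;
    load.realTimeFactor = load.cpuSeconds / audioSeconds;
    return load;
}
```

To keep disk I/O out of the number, load the reference sample into memory first and have the callable encode from that buffer.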
After encoding, you need to decode back to raw PCM. Clearly label your files so you know what bit rate each one was encoded with. These decoded test samples are what we will be comparing to the original reference samples.
Do a Subjective Test
How you conduct your subjective testing will depend on several factors, such as time constraints, cost, and the required test precision. At the low end, you could simply listen to the test samples in a pair of headphones and judge the quality yourself. For a high precision test, you could do a full ITU BS.1116 test using “expert listeners” in a controlled environment. While these examples represent the extremes, there are many permutations that can give you the desired quality of results.
The most common subjective test is called a “double-blind triple-stimulus with hidden reference” test. The listener hears three samples (commonly labeled A, B, and C) for a period of 10 to 25 seconds. A is always the original reference sample. The next two samples, B and C, are randomly assigned either the test sample from the codec or the original reference sample played again (called the “hidden reference”). The listener must then rate the difference between B and A, and C and A, not knowing which one is the test sample. The grading scale is:
- 5.0 Imperceptible
- 4.0 Perceptible, but not annoying
- 3.0 Slightly annoying
- 2.0 Annoying
- 1.0 Very annoying
Ideally, you would conduct several tests and average the results together. If you do the listening test yourself, your results will be limited to your listening skills and understanding of audio codec artifacts. Here’s a summary of factors that affect the quality of your results:
- The quality of the listener.
- The choice of audio samples.
- The number and duration of the testing.
- The testing environment, including speaker/headphone quality, room design, and listener placement.
- The quality of randomization of sample order to remove any correlation between samples.
- Proper statistical analysis of the combined test results.
A proper subjective test is both expensive and time consuming. It’s important to find the right balance for your particular needs.
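If you script your own listening sessions, the bookkeeping for a hidden-reference trial is small. The sketch below is an illustration, not part of any standard tool: `Trial`, `make_trial`, and `difference_grade` are invented names, and scoring a trial as "codec grade minus hidden reference grade" is the convention I am assuming for combining the two ratings.

```cpp
#include <cassert>
#include <random>

// One triple-stimulus trial: A is always the reference; exactly one of B and C
// is the codec output and the other is the hidden reference.
struct Trial {
    bool codecIsB; // kept secret from the listener until grading is done
};

// Randomly decide which slot (B or C) gets the codec output.
template <typename Rng>
Trial make_trial(Rng &rng)
{
    std::bernoulli_distribution coin(0.5);
    Trial t;
    t.codecIsB = coin(rng);
    return t;
}

// Difference grade for one trial: the grade given to the codec sample minus
// the grade given to the hidden reference. Zero means the listener heard no
// difference; negative means the codec sample was rated worse.
inline double difference_grade(const Trial &t, double gradeB, double gradeC)
{
    assert(gradeB >= 1.0 && gradeB <= 5.0 && gradeC >= 1.0 && gradeC <= 5.0);
    return t.codecIsB ? gradeB - gradeC : gradeC - gradeB;
}
```

A positive difference grade means the listener rated the hidden reference below the codec output, which is a useful hint that the listener could not reliably tell the two apart in that trial.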
Do an Objective Test
Evaluating a codec objectively requires testing methods that correlate well to actual human perception. You can’t simply measure the distortion introduced by the codec using traditional measurements like Signal-to-Noise ratio (S/N) and Total-Harmonic-Distortion (THD), because they don’t correlate well to perceived audio quality. Some distortion is imperceptible to the human ear, and codecs take advantage of this to increase the compression ratio.
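For reference, this is the kind of traditional measurement being dismissed. The sketch computes a plain signal-to-noise ratio between the reference and decoded samples, under the assumption that both are 16-bit PCM, equal in length, and already time-aligned (codecs usually add some delay that has to be trimmed first). `snr_db` is a name made up for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Signal-to-noise ratio in dB, treating (reference - decoded) as the noise.
// Returns +infinity when the two signals are identical.
double snr_db(const std::vector<std::int16_t> &reference,
              const std::vector<std::int16_t> &decoded)
{
    assert(!reference.empty() && reference.size() == decoded.size());
    double signal = 0.0, noise = 0.0;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        const double r = reference[i];
        const double e = r - decoded[i];
        signal += r * r; // energy of the reference
        noise += e * e;  // energy of the error
    }
    if (noise == 0.0)
        return std::numeric_limits<double>::infinity();
    return 10.0 * std::log10(signal / noise);
}
```

The number treats every sample of error the same, whether or not a listener could hear it, which is exactly why it says so little about a perceptual codec.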
Fortunately, the ITU has standardized an objective audio test called PEAQ (BS.1387). The acronym stands for Perceptual Evaluation of Audio Quality. PEAQ uses software to model the entire human auditory system (including blood flow noise in the inner ear) to generate a set of metrics that are used to give a final “quality” score. The original reference signal is compared to a signal run through the codec, and the result is a real number between 0.0 and -4.0. The result is interpreted on the following scale:
- 0.0 = Imperceptible
- -1.0 = Perceptible but not annoying
- -2.0 = Slightly annoying
- -3.0 = Annoying
- -4.0 = Very annoying
Obviously, values closer to zero are better.
The test was developed by a similar group of audio experts that developed BS.1116 (mentioned above) and the results have been validated against a long list of subjective tests done using expert listeners.
There are several free and commercial software packages available for doing PEAQ tests. The best free package I’ve found is AFsp from the McGill Telecommunications and Signal Processing Lab. There’s also peaqb, but there’s a comment that it gives incorrect results. AFsp worked great in my tests and included some helpful tools like CompAudio and InfoAudio.
Summary
Hopefully this post has given you a good starting point and some practical ideas for testing audio codecs. My goal was to provide a pragmatic approach with different options depending on what your actual evaluation needs are. This is in no way a comprehensive treatment of the subject; only an overview. I highly suggest reading some of the books I referenced if you’d like a deeper treatment of the subject. Either way, I hope you found this post helpful.
It always pays to read the documentation carefully. That’s the moral of this story, but I think the details are still worth reading.
We’ve been using the LGPL build of FFmpeg for a while now. However, every few months during testing the decoder will segfault. It’s always on some random box, in a release build, in the middle of the night, and nobody can reproduce it (sound typical?). Well, I think I fixed the problem today and I wanted to share what I found.
I ran our decoder through Valgrind while playing an H.264 stream and noticed several errors like the following:
==6757== Invalid read of size 4
==6757== at 0x4D92C50: ff_h264_decode_nal (in /usr/local/pelco/lib/libavcodec.so.52.59.0)
Hum… strange. MPEG-4 doesn’t have this problem.
After triple-checking all our buffer size calculations, memcpy lengths, buffer-overrun checks, etc I was at a loss to explain this. I even added an arbitrary 4 bytes to every allocation but still ran into problems. I finally decided to go line-by-line through our decoder source, cross-referencing with the avcodec documentation.
Found it!
Looking at the documentation for avcodec_decode_video2(...), I found the following note:
Warning:
The input buffer must be FF_INPUT_BUFFER_PADDING_SIZE larger than the actual read bytes because some optimized bitstream readers read 32 or 64 bits at once and could read over the end. The end of the input buffer buf should be set to 0 to ensure that no overreading happens for damaged MPEG streams.
Note:
You might have to align the input buffer avpkt->data. The alignment requirements depend on the CPU: on some CPUs it isn’t necessary at all, on others it won’t work at all if not aligned and on others it will work but it will have an impact on performance.
In practice, avpkt->data should have 4 byte alignment at minimum.
This must have been overlooked when the decoder was written… oops!
Solution
I ended up adding alignment and padding parameters to our buffer objects, which (internally) use the following methods for allocation:
Windows: _aligned_malloc(), _aligned_free() (requires malloc.h)
Linux: posix_memalign(), free() (requires stdlib.h)
I mention these methods because I think they’re pretty handy and not everyone knows about them. However, don’t forget to read the documentation before using them!
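Here is a minimal sketch of how those calls fit together. It is not our actual buffer class: `alloc_padded` and `free_aligned` are names invented for this example, and the padding is passed in as a plain parameter so the snippet builds without the libavcodec headers (in real code you would pass FF_INPUT_BUFFER_PADDING_SIZE).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#ifdef _WIN32
  #include <malloc.h>
#else
  #include <stdlib.h>
#endif

// Allocate `size` usable bytes followed by `padding` zeroed bytes, with the
// start address aligned to `alignment` (a power of two, at least sizeof(void*)).
// Returns a null pointer on failure. Release with free_aligned().
inline unsigned char *alloc_padded(std::size_t size, std::size_t padding,
                                   std::size_t alignment)
{
    assert(alignment >= sizeof(void *) && (alignment & (alignment - 1)) == 0);
    void *p = nullptr;
#ifdef _WIN32
    p = _aligned_malloc(size + padding, alignment);
#else
    if (posix_memalign(&p, alignment, size + padding) != 0)
        p = nullptr;
#endif
    if (p) // zero the tail so an over-reading bitstream reader sees zeros
        std::memset(static_cast<unsigned char *>(p) + size, 0, padding);
    return static_cast<unsigned char *>(p);
}

// Memory from _aligned_malloc() must go back through _aligned_free();
// memory from posix_memalign() is released with plain free().
inline void free_aligned(unsigned char *p)
{
#ifdef _WIN32
    _aligned_free(p);
#else
    free(p);
#endif
}
```

If the buffer is reused for packets of different sizes, remember to zero the padding again after each copy, since the "end" of the data moves.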
A colleague of mine found a pretty cool profiling tool for Windows called Very Sleepy. It’s simple, straight-forward, free, and it works… everything I like in a tool (this is why I like the Sysinternals stuff so much). It’s also got call-graph and 64-bit support. Definitely worth checking out.
Some artists from Industrial Light & Magic (ILM) gave the closing keynote at the GPU Technology Conference (GTC) in 2009… it’s well worth watching by itself (watch here). At GTC 2010, they presented a video talking about how the GPU and CUDA are helping to render effects faster. It’s a short video with lots of cool effects.
Here’s a copy of the video if it is ever removed (13MB, 3gp).
Here’s an interesting paper that came out of Stanford in 2005 about plenoptic lenses and light field photography. The paper describes how, by coupling micro-lenses with a standard lens, the full 4D light field can be captured on a digital image sensor (as opposed to the standard 2D light field captured with a standard lens alone). With the full 4D field, ray-tracing techniques can be applied to map the incoming light rays and manipulate the image in ways that are usually only possible at the time the photo is taken.
For example, here’s what the raw image looks like with the micro-lens:
If you zoom in, you can see the tiny micro-images:
Using the micro-images, one can calculate the ray vectors for the incoming light rays. This allows software to change the depth-of-field or the focal point of the image. The 2 pictures below show the processed image with different planes in focus:
This is pretty powerful stuff, especially when coupled with the GPU and CUDA. Adobe did a cool demo at GTC 2010 during the keynote where they do the image manipulation in real-time using the GPU… worth watching if you’re interested.







