Why Securing A Video Stream Is So Difficult

8:28 am Code Monkey, Tech and Security

An interesting article in New Scientist last Friday described how the contents of encrypted VoIP conversations can still be deduced via traffic analysis. The short version is that many spoken words have a signature even when they are encrypted. This signature comes from the size of the data packets used to carry the sound data. With a variable-bitrate codec, many phonemes have a distinct encoded size… by analyzing the packet sizes you can deduce the phonemes, and thus the spoken words.

This got me thinking that I should write about the complex problem of securing a video stream. There are many aspects to securing a video stream, with integrity, authenticity, and privacy being the most important. I’m not going to spend time on integrity and authenticity, because those are somewhat simpler problems to solve (integrity = digital signatures, authenticity = digital certificates). The main focus of this post is privacy: keeping an eavesdropper from deducing the contents of a video stream.


Terms

Privacy: The goal of privacy is to limit the amount of information that an attacker can deduce from the data stream (ideally, none). This is a much more difficult problem than it first appears to be. If you think all you have to do is encrypt the video, you REALLY need to keep reading.

Entropy: A measure of the minimum number of bits required to communicate a unit of information. Most forms of data (especially video) have a lot of redundant information in them that is not necessary to transmit to the receiver.

Compression: The process of removing redundant data so that the number of bits transmitted is as close as possible to the entropy value.
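To make the idea of entropy concrete, here is a minimal Python sketch that estimates the entropy of a byte string from its byte frequencies. The function name is made up for illustration, and this is only a zero-order estimate: it ignores correlations between neighboring bytes, which are exactly the kind of redundancy a video compressor exploits.

```python
import math
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Estimate Shannon entropy (bits per byte) from byte frequencies alone."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum of p * log2(1/p) over every byte value that actually occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


redundant = b"A" * 1000        # a single repeated symbol carries no information
uniform = bytes(range(256)) * 4  # every byte value appears equally often

print(entropy_bits_per_byte(redundant))  # 0.0
print(entropy_bits_per_byte(uniform))    # 8.0
```

A compressor aims to push the transmitted size toward the true entropy of the source; the closer it gets, the more the output size tracks how much information the input really contains.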

Encryption: The process of encoding data into a form that is impractical to decode without knowing the key. A good encryption algorithm ensures that patterns in the input have no corresponding pattern in the output. The input is called plaintext and the output is called ciphertext.

MPEG-4 and H.264 Video Compression

I’m going to talk specifically about video compressed with MPEG-4/H.264, but these concepts apply to any variable-bitrate compression algorithm. Also, to simplify writing, I’m going to refer to MPEG-4 and H.264 collectively as MPEG-4 (H.264 is Part 10 of the MPEG-4 standard, and the two have similar characteristics).

As I mentioned above, raw video has a tremendous amount of redundant data. Except for a few rare cases, the difference between two successive video frames is small (often the background is unchanging or panning in a predictable manner). MPEG-4 takes advantage of this by outputting the difference between frames; complete frames are only produced periodically. Since the difference between frames is usually small, MPEG-4 does a good job of reducing the video data to something close to its entropy value.
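The effect of frame differencing on encoded size can be illustrated with a toy model. This sketch is not how MPEG-4 works internally (a real encoder uses motion compensation, transforms, and quantization); it simply subtracts one synthetic frame from another and uses zlib as a stand-in compressor. The frame dimensions and helper names are assumptions chosen for illustration.

```python
import random
import zlib

WIDTH, HEIGHT = 64, 48  # tiny synthetic grayscale frame, one byte per pixel


def make_background(seed: int) -> bytes:
    """A noisy but static scene, so a full frame is hard to compress."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(WIDTH * HEIGHT))


def delta(prev: bytes, cur: bytes) -> bytes:
    """Per-pixel difference between two frames (modulo 256)."""
    return bytes((c - p) % 256 for p, c in zip(prev, cur))


def encoded_size(data: bytes) -> int:
    """Size after compression; zlib stands in for the real video codec."""
    return len(zlib.compress(data))


background = make_background(seed=1)

# Same scene with an 8x8 block of pixels changed, as if something moved.
moved = bytearray(background)
rng = random.Random(2)
for row in range(8):
    for col in range(8):
        moved[(10 + row) * WIDTH + (20 + col)] = rng.randrange(256)

full_frame = encoded_size(background)                      # like an I-frame
still_delta = encoded_size(delta(background, background))  # nothing changed
motion_delta = encoded_size(delta(background, bytes(moved)))

print(full_frame, still_delta, motion_delta)
```

The unchanged scene produces the smallest output, the scene with a moving block produces a larger one, and the full frame is far larger than either. Keep that ordering in mind for the rest of this post.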

Below is a graphical representation of an MPEG-4 video stream. I-frames are the complete frames, and P-frames are the ‘difference’ frames. The period between I-frames defines the Group Of Pictures (GOP) length.

Encoded MPEG-4 Video Stream

As you can see, MPEG-4 greatly reduces the amount of data that needs to be transmitted to the receiver. This is an important feature to have, because it allows more streams (or higher-quality streams… your choice) to be transmitted over the same network infrastructure.

Securing the Video

Okay, so how do we prevent an attacker from gaining any information about the contents of the video? The naive approach would be to simply encrypt the video using something like AES and call it good. True, this would prevent the attacker from decoding the raw video (assuming the key exchange infrastructure is designed correctly… but that’s another post). However, this may not be good enough to fulfill our requirement of privacy. For example, what if the attacker couldn’t watch the video, but was able to determine if there was movement in the video? Would this be valuable information? It could be, if the attacker is trying to determine the schedule of a security guard making his rounds.

First, let’s look at what an encrypted stream would look like to the attacker:

MPEG-4 Encrypted Video Stream

Assuming the video is transmitted using a common transport protocol like RTP, it is easy for an attacker to determine the frame boundaries: the marker bit and timestamp in the RTP header typically give them away, and schemes such as SRTP encrypt only the payload, leaving the header readable. The attacker still can’t decode the video to watch it, but he now has a vital piece of information: the encrypted frame size.
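As a rough sketch of how little work this takes, the Python below recovers per-frame sizes from a sequence of RTP packets using only the 12-byte fixed header. It assumes the packets arrive in order with no loss, that there are no CSRC entries or header extensions, and that the marker bit flags the last packet of each frame (the usual convention for video payloads). The helper that builds packets exists only to produce synthetic test data.

```python
import struct

RTP_HEADER_LEN = 12  # fixed header only; assumes CC = 0 and no extension


def make_rtp_packet(seq, timestamp, marker, payload, payload_type=96, ssrc=0x1234):
    """Build a synthetic RTP packet (version 2) for demonstration purposes."""
    byte0 = 2 << 6                              # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | payload_type   # marker bit + payload type
    header = struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)
    return header + payload


def frame_sizes(packets):
    """Sum payload bytes per frame, using the marker bit as the frame boundary."""
    sizes, current = [], 0
    for pkt in packets:
        _, byte1, _, _, _ = struct.unpack("!BBHII", pkt[:RTP_HEADER_LEN])
        current += len(pkt) - RTP_HEADER_LEN
        if byte1 & 0x80:  # marker set: this packet ends the frame
            sizes.append(current)
            current = 0
    return sizes


# Zero-filled payloads stand in for ciphertext; only their lengths matter here.
packets = [
    make_rtp_packet(1, 0, False, bytes(1400)),
    make_rtp_packet(2, 0, False, bytes(1400)),
    make_rtp_packet(3, 0, True, bytes(600)),     # end of a large frame
    make_rtp_packet(4, 3000, True, bytes(250)),  # a small single-packet frame
]

print(frame_sizes(packets))  # [3400, 250]
```

Note that nothing here touches the payload contents. The attacker never needs the key.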

The size of the encrypted frame matters, because there is a strong correlation between the size of the ciphertext and the size of the original plaintext. Most symmetric encryption algorithms (like AES) do nothing to mask the size of the data they encrypt. The encrypted output is therefore almost exactly the size of the original input (‘almost’, because a block cipher in a padded mode such as CBC rounds the data up to a multiple of the block size, which is 16 bytes for AES; stream-like modes such as CTR add no padding at all). The attacker can use the frame sizes to determine the GOP length simply by looking for periodic spikes in the stream. Since the GOP length rarely (if ever) changes, it’s trivial to find the I-frames… everything else is therefore a P-frame.
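Here is a hedged sketch of both points in Python. It assumes PKCS#7-style padding with a 16-byte block (which always adds between 1 and 16 bytes), and it uses a deliberately simple spike detector: any frame more than a few times the median size is treated as an I-frame. The frame sizes are synthetic, and the function names and threshold are illustrative choices, not a description of any particular tool.

```python
from collections import Counter
from statistics import median

BLOCK_SIZE = 16  # AES block size in bytes


def padded_size(plaintext_len: int) -> int:
    """Ciphertext length under PKCS#7-style padding (assumed for this sketch)."""
    return (plaintext_len // BLOCK_SIZE + 1) * BLOCK_SIZE


def estimate_gop_length(sizes, spike_factor=3.0):
    """Guess the GOP length as the most common gap between unusually large frames."""
    threshold = spike_factor * median(sizes)
    spikes = [i for i, size in enumerate(sizes) if size > threshold]
    gaps = Counter(b - a for a, b in zip(spikes, spikes[1:]))
    return gaps.most_common(1)[0][0] if gaps else None


# Synthetic plaintext frame sizes: one large I-frame every 12 frames,
# small P-frames with a little variation in between.
plain_sizes = [30000 if i % 12 == 0 else 1500 + (i * 37) % 400 for i in range(48)]
cipher_sizes = [padded_size(n) for n in plain_sizes]

print(padded_size(15), padded_size(16))   # 16 32
print(estimate_gop_length(cipher_sizes))  # 12
```

The padding shifts each size by at most 16 bytes, which is noise compared with the gap between an I-frame and a P-frame. A scene with heavy motion could push some P-frames over the threshold, but because the true I-frames recur at a fixed interval, the most common gap is still likely to be the GOP length.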

So, why is the original frame size important? Well, let’s look at what P-frame size tells us. When there is no movement in front of a camera, the differences between successive frames will be very small, and thus the P-frames will be small. When something changes in the frame (like a security guard entering the room), the difference between frames will be greater, and thus the P-frame size will increase. An encrypted stream with movement in it may look like:

MPEG-4 Encrypted Video With Motion

If an attacker can determine the size of the P-frames, he can determine the amount of change between frames and possibly deduce information about what’s going on. Thus, we have failed to meet the strict definition of privacy.
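A minimal sketch of that last step, in Python: given the observed encrypted frame sizes and a known GOP length, skip the I-frames and flag any P-frame that is much larger than the typical one. The sizes below are synthetic, and the function name, the `factor` threshold, and the assumption that the first frame is an I-frame are all illustrative.

```python
from statistics import median


def motion_frames(sizes, gop_length, first_i_frame=0, factor=2.0):
    """Return indices of P-frames whose size suggests movement in the scene."""
    p_frames = [
        (i, size)
        for i, size in enumerate(sizes)
        if (i - first_i_frame) % gop_length != 0  # skip the periodic I-frames
    ]
    if not p_frames:
        return []
    baseline = median(size for _, size in p_frames)
    return [i for i, size in p_frames if size > factor * baseline]


# Three GOPs of 12 frames: large I-frames, quiet P-frames of 1600 bytes,
# and a burst of larger P-frames while something moves through the scene.
sizes = [30016 if i % 12 == 0 else 1600 for i in range(36)]
for i in range(14, 20):
    sizes[i] = 6400

print(motion_frames(sizes, gop_length=12))  # [14, 15, 16, 17, 18, 19]
```

Map those frame indices to wall-clock time using the frame rate and you have a schedule of when something was moving in front of the camera, all without decrypting a single byte.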

Conclusion

Security is harder than it looks: much harder. Because of this, most engineers will get it wrong. As Bruce Schneier is fond of saying, no security is better than broken security, because at least you will be more careful with your data if you know it is unprotected.

Security is more than simply understanding algorithms; it’s a way of thinking. It’s understanding the flow of information. It’s learning to think like an attacker. Cryptography is just a tool, not a panacea. Relying only on cryptography for security is like trying to protect your house with a huge, electrified, steel front door covered in barbed wire… the attacker will simply break in through the window. You have to consider ALL the attack vectors.

So how do you really secure compressed video? Well, I can’t reveal that here. Solving these problems is what I’m paid to do… good old trade secrets stuff. I’ll tell you it is possible to do, but it’s not trivial. If you think you’ve solved it, be careful… you may still be revealing more than you think.
