Tuesday, October 21, 2014

GoLang Debugging - Turning Pennies Into G's

GDB Ain't Great

Our favorite application debugger is awesome. Don't get me wrong, I use it often. Almost daily. But, the fact remains that GDB is dependent on a predefined legacy application architecture. GDB wasn't designed to predict new application architectures. As a result, it isn't elegant at supporting alternatively designed stacks, concurrency models, or execution flows. 

That's essentially why GDB has been enhanced with extension capabilities like Sequences, Guile, and yes, even Python. Unfortunately, even the extension system is a bit lacking. Case in point? The existing GoLang GDB script uses a completely outdated extension API that causes exceptions. It wasn't super useful to begin with, either. 

donb@evian-les-bains:~/home/library/golang/go/src/pkg/regexp$ GOMAXPROCS=9 /usr/bin/gdb -q ./regexp.test
Reading symbols from ./regexp.test...done.
Loading Go Runtime support.
(gdb) run
Starting program: /home/donb/home/library/golang/go/src/pkg/regexp/regexp.test 
^C
Program received signal SIGINT, Interrupt.
runtime.usleep () at /home/donb/home/library/golang/go/src/pkg/runtime/sys_linux_amd64.s:77
77              RET
(gdb) info goroutines 
Python Exception <class 'gdb.error'> Attempt to extract a component of a value that is not a (null).: 
Error occurred in Python command: Attempt to extract a component of a value that is not a (null).
(gdb) 
In the above example, we see a common debugging scenario that has been broken for a year or two now. The debugging process is even documented on the GoLang website under Debugging Go Code With GDB. Yet, when one tries to reproduce the steps outlined in the documentation, the above error occurs. 

Why does this happen? The runtime-gdb.py script packaged with GoLang uses an outdated model for accessing gdb.Value objects, essentially treating them as dictionaries. Because the internals of the gdb.Value no longer support this model, or allow linking via Python generators in the fashion used in the script, simple commands will fail.

Go Routine Yourself

To solve these horrors, we really just need to write a class that handles retrieval of pertinent values from the GoLang runtime environment. For those not in the know, each GoLang application acts like a kernel, scheduling the execution of each GoRoutine, monitoring memory allocations, and preparing for garbage collection.

Before you start to become concerned about potentially severe user-land bloat, take a moment to realize that this is actually the correct architecture. Any robust production quality application must handle resources elegantly by silently monitoring for operating system signals, scheduling and managing thread synchronization, handling per-thread intercommunication, and efficiently allocating and deallocating subsystem resources transparently. GoLang accomplishes all these things, but with surprising eloquence and a new level of light-weight that even Magdalena Frackowiak would be jealous of.

GoLang code executes in tiny self-contained units called GoRoutines, which are co-routines handled by GoLang's internal scheduler. GoRoutines are not threads; they are simple co-routines that execute under an Operating System thread. The benefit of this is that GoRoutines can move transparently across OS threads (pthread, libthread (Solaris), etc.) with almost no cost to the application.

GoRoutines are managed by the C language structure 'struct G'. Operating system threads are managed with the C language structure 'struct M'. Therefore, for every OS thread M, one or more G may run within it. There are other abstractions within the GoLang scheduler, but since those aren't relevant to this discussion we'll leave those abstractions alone for now.

All The G

Internally, GoRoutines are managed under the runtime package. There is a variable called runtime.allg which points to a list of all GoRoutines in the system. A corresponding length variable, runtime.allglen, defines how large this array is. As GoRoutines die, they are marked by their status as Dead. But, unless it is overwritten at some point, the pointer in allg lives on. So, you can inspect what's left of a GoRoutine even after it has moved on to its next iteration.
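
To make that layout concrete, here is a minimal sketch that simulates allg and allglen in a plain Python byte buffer instead of a live process. The buffer, the addresses, and the helper names are all made up for illustration. The field offsets (status at 152, goid at 160) and the status value 6 for a dead GoRoutine are the ones used by the script later in this post, and 8-byte little-endian pointers (linux/amd64) are assumed.

```python
import struct

# Offsets into struct G, taken from the Allg class later in this post.
STATUS_OFF = 152
GOID_OFF = 160
G_DEAD = 6  # status value the post's script treats as 'gdead'

def build_fake_memory():
    """Build a hypothetical flat memory image: an allg array plus three G's."""
    mem = bytearray(4096)
    allg_base = 0x100                      # hypothetical address of the array
    g_addrs = [0x400, 0x600, 0x800]        # hypothetical G* values
    statuses = [4, 6, 3]                   # example status values; 6 is dead
    for i, (g, st) in enumerate(zip(g_addrs, statuses)):
        struct.pack_into("<Q", mem, allg_base + i * 8, g)   # allg[i] = G*
        struct.pack_into("<h", mem, g + STATUS_OFF, st)      # g->status
        struct.pack_into("<q", mem, g + GOID_OFF, 16 + i)    # g->goid
    return mem, allg_base, len(g_addrs)

def walk_allg(mem, allg_base, allglen):
    """Yield (address, goid, status) for every slot, dead ones included."""
    for i in range(allglen):
        (g,) = struct.unpack_from("<Q", mem, allg_base + i * 8)
        (status,) = struct.unpack_from("<h", mem, g + STATUS_OFF)
        (goid,) = struct.unpack_from("<q", mem, g + GOID_OFF)
        yield g, goid, status

mem, base, n = build_fake_memory()
rows = list(walk_allg(mem, base, n))
for g, goid, status in rows:
    tag = "dead" if status == G_DEAD else "live"
    print(hex(g), goid, status, tag)
```

Note that the dead entry is still reachable through its slot in the array, which is exactly why a dead GoRoutine can be inspected after the fact.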

To solve our problem with GDB, we have to inspect the allg variable. As can be seen in the existing runtime-gdb.py code, this used to be as easy as calling gdb.parse_and_eval. Now that it is no longer that easy, we have to use whatever resources are available to us to retrieve values from memory, even if it's a notorious pain in the ass.

Let's build a simple Python class around this idea. Because I solved my problem with this code last night in two hours while watching classic episodes of The Rockford Files, it isn't a super great solution. Regardless, it works and frankly because no one else has solved this problem for 2+ years, I don't care if you don't like it.

class Allg:
    __allglen = -1
    __position = 0
    __allg = 0

    __offsets = {
            'status': 152,
            'waitreason': 176,
            'goid': 160,
            'm': 200,
            'sched': 40,
            'sched.pc': 48,
            'sched.sp': 40,
            'stackguard': 120,
            'stackbase': 8,
        }

    def __init__(self):
        # first, fetch the number of entries in allg (dead goroutines included)
        self.__allglen = int(gdb.parse_and_eval("&{uint64}'runtime.allglen'"))
        print("found allglen = {0}".format(self.__allglen))

        # fetch the base address of the allg array
        s = "&*{uint64}(&'runtime.allg')"
        self.__allg = int(gdb.parse_and_eval(s))
        print("found allg = {0}".format(hex(self.__allg)))

    def fetch(self):
        if self.__position >= self.__allglen:
            return None

        s = "&*{uint64}(" + "{0}+{1})".format(self.__allg, self.__position*8)
        p = int(gdb.parse_and_eval(s))
        self.__position += 1
        return p

    def Status(self, a):
        s = "&*{int16}(" + "{0}+{1})".format(a, self.__offsets['status'])
        return int(gdb.parse_and_eval(s))

    def WaitReason(self, a):
        s = "&*{int64}(" + "{0}+{1})".format(a, self.__offsets['waitreason'])
        x = int(gdb.parse_and_eval(s))
        s = "&{int8}" + "{0}".format(x)
        return str(gdb.parse_and_eval(s))

    def Goid(self, a):
        s = "&*{int64}(" + "{0}+{1})".format(a, self.__offsets['goid'])
        return int(gdb.parse_and_eval(s))

    def M(self, a):
        s = "&*{uint64}(" + "{0}+{1})".format(a, self.__offsets['m'])
        return int(gdb.parse_and_eval(s))

    def Pc(self, a):
        s = "&*{uint64}(" + "{0}+{1})".format(a, self.__offsets['sched.pc'])
        return int(gdb.parse_and_eval(s))

    def Sp(self, a):
        s = "&*{uint64}(" + "{0}+{1})".format(a, self.__offsets['sched.sp'])
        return int(gdb.parse_and_eval(s))

    def Stackguard(self, a):
        s = "&*{uint64}(" + "{0}+{1})".format(a, self.__offsets['stackguard'])
        return int(gdb.parse_and_eval(s))

    def Stackbase(self, a):
        s = "&*{uint64}(" + "{0}+{1})".format(a, self.__offsets['stackbase'])
        return int(gdb.parse_and_eval(s))

Using the class Allg, I simply identify the address of the runtime.allg symbol in memory, and its corresponding size parameter, runtime.allglen. Once I store these parameters internally, I can just fetch every subsequent GoRoutine's address from the array. Since these routines are allocated sequentially in the array, I can fetch them using a simple iterator. Then, I just pass back the pointer of the actual G* structure. Any time the caller wants to learn more about a specific G*, they just pass back the address to any other function in the class, which will return the value for the corresponding G* field. 
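
Every accessor in the class builds the same kind of expression string, so the pattern can be factored into one helper. What follows is only a sketch of that idea and not part of the script above: field_expr, OFFSETS, and TYPES are hypothetical names, and the offsets are copied from the class, so they only hold for the Go build used in this post. The helper just builds strings, so it runs outside GDB; inside GDB the result would be handed to gdb.parse_and_eval.

```python
# Offsets copied from the Allg class above; valid only for that Go build.
OFFSETS = {
    'status': 152,
    'goid': 160,
    'waitreason': 176,
    'm': 200,
}

# Type used to read each field, matching the casts in the Allg methods.
TYPES = {
    'status': 'int16',
    'goid': 'int64',
    'waitreason': 'int64',
    'm': 'uint64',
}

def field_expr(g_addr, field):
    """Return the GDB expression string that reads `field` of the G at g_addr."""
    # Doubled braces are literal braces in str.format().
    return "&*{{{0}}}({1}+{2})".format(TYPES[field], g_addr, OFFSETS[field])

print(field_expr(4096, 'goid'))
print(field_expr(4096, 'status'))
```

Keeping the offsets and types in tables like this would also make it easier to update the script when the layout of struct G changes.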

This simple class makes data retrieval very easy. Let's look back at the class that gets invoked when we execute info goroutines on the GDB command line. 

class GoroutinesCmd(gdb.Command):
    "List all goroutines."
    __allg = None

    def __init__(self):
        gdb.Command.__init__(self, "info goroutines", gdb.COMMAND_STACK, gdb.COMPLETE_NONE)

    def invoke(self, _arg, _from_tty):
        self.__allg = Allg()

        # donb: we can retrieve the correctly sized pointer with a cast
        # (gdb) python \
        # print("{0}".format(gdb.parse_and_eval("&*{uint64}&'runtime.allg'")))
        while True:
            ptr = self.__allg.fetch()
            # print("fetched ptr = {0}".format(hex(ptr)))
            if not ptr:
                break

            st = self.__allg.Status(ptr)
            # print("status is {0}".format(st))
            w = self.__allg.WaitReason(ptr)
            # print("waitreason is {0}".format(w))
            #if st == 6:  # 'gdead'
                #print("skipping over dead goroutine")
                #continue

            s = ' '
            m = self.__allg.M(ptr)
            if m:
                s = '*'

            # if the status isn't "waiting" then the waitreason doesn't matter
            if st != 4:
                w = ''
            w2 = w.split('"')
            if len(w2) > 1:
                w = """waitreason="{0}\"""".format(w2[len(w2) - 2])

            pc = self.__allg.Pc(ptr)
            blk = gdb.block_for_pc(pc)
            goid = self.__allg.Goid(ptr)
            a = "fname={0} faddr={1}".format(blk.function, hex(pc))
            
            print(s, goid, "{0:8s}".format(sts[st]), a, "&g={0}".format(hex(ptr)), w)

How simple is that? Now, the routine can fetch each G* from within the invoke function's while loop, and print information regarding the runtime. 
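
The string handling inside invoke is easier to follow in isolation. Here is a sketch under a few assumptions: STATUS_NAMES is assumed to mirror the sts tuple from the stock runtime-gdb.py (only 4 for waiting and 6 for dead are confirmed by the code above), wait_reason and format_row are hypothetical helper names, and the pointer value 0x5d0e40 in the sample input is made up. It runs outside GDB because it only formats values that would already have been fetched.

```python
# Assumed to mirror the sts tuple in the stock runtime-gdb.py; only
# 4 ('waiting') and 6 ('dead') are confirmed by the code in this post.
STATUS_NAMES = ('idle', 'runnable', 'running', 'syscall',
                'waiting', 'moribund', 'dead')
G_WAITING = 4

def wait_reason(raw, status):
    """Pull the quoted reason out of GDB's rendering of a char pointer.

    GDB renders such a value as an address followed by the quoted
    string, for example: 0x5d0e40 "chan receive"
    """
    if status != G_WAITING:
        return ''          # the reason only matters for waiting goroutines
    parts = raw.split('"')
    if len(parts) > 1:
        return 'waitreason="{0}"'.format(parts[-2])
    return raw             # no quoted string found, pass the text through

def format_row(goid, status, fname, pc, g_addr, raw_reason, on_m):
    """Format one line in the style of the info goroutines output."""
    marker = '*' if on_m else ' '
    return "{0} {1} {2:8s} fname={3} faddr={4} &g={5} {6}".format(
        marker, goid, STATUS_NAMES[status], fname, hex(pc), hex(g_addr),
        wait_reason(raw_reason, status))

line = format_row(16, 4, 'runtime.park', 0x4134d9, 0xc208002120,
                  '0x5d0e40 "chan receive"', False)
print(line)
```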

donb@evian-les-bains:~/home/library/golang/go/src/pkg/regexp$ GOMAXPROCS=9 /usr/bin/gdb -q ./regexp.test
Reading symbols from ./regexp.test...done.
Loading Go Runtime support.
(gdb) run
Starting program: /home/donb/home/library/golang/go/src/pkg/regexp/regexp.test 
^C
Program received signal SIGINT, Interrupt.
runtime.usleep () at /home/donb/home/library/golang/go/src/pkg/runtime/sys_linux_amd64.s:77
77              RET
(gdb) info goroutines 
found allglen = 5
found allg = 0xc208018000
  16 waiting  fname=runtime.park faddr=0x4134d9 &g=0xc208002120 waitreason="chan receive"
* 17 syscall  fname=runtime.notetsleepg faddr=0x404a56 &g=0xc208002480 
  18 waiting  fname=runtime.park faddr=0x4134d9 &g=0xc208032240 waitreason="GC sweep wait"
  19 waiting  fname=runtime.park faddr=0x4134d9 &g=0xc2080325a0 waitreason="finalizer wait"
* 31 waiting  fname=runtime.gc faddr=0x40a0c6 &g=0xc2080326c0 waitreason="garbage collection"
(gdb) goroutine 31 bt
found allglen = 5
found allg = 0xc208018000
#0  0x000000000040a0c6 in runtime.gc () at /home/donb/home/library/golang/go/src/pkg/runtime/mgc0.c:2329
#1  0x000000000040a150 in runtime.gc () at /home/donb/home/library/golang/go/src/pkg/runtime/mgc0.c:2306
#2  0x00007fff00000000 in ?? ()
#3  0x000000c21531e000 in ?? ()
#4  0x000000000055fc00 in type.* ()
#5  0x0000000000000001 in ?? ()
#6  0x0000000000000000 in ?? ()
(gdb) 

We can even use the goroutine command with the same Python class to retrieve information about a specific GoRoutine, and then execute a gdb command based on that routine. Excellent!

Summary

This isn't a great way to debug GoLang. There is a lot left to be desired here. For example, stack backtraces are still difficult because of the execution architecture. GoLang's toolchain uses an internal Base Pointer (BP) and doesn't emit one when the binary is generated. This is a legacy of the Plan 9 Operating System assembler, which is intelligent on CISC ASM architectures such as x86/64 because it enables %RBP/%EBP to be used as a general register. 

But, as a result, GDB doesn't know how the heck to rewind the stack. In fact, you have to inspect the current function's SP adjustment code to identify how far to rewind the stack before popping off the return address. I've accomplished this (the basics) in another change I've made to the GDB script. But, I'll share that another time once I finish dealing with some of the gotchas of this method. 
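
The rewind step can be sketched without GDB at all. This is a simplified model and not the change mentioned above. It assumes an amd64 stack with 8-byte slots, that the PC is already past the function prologue so the whole frame has been allocated, and that each function's frame size (the amount its prologue subtracts from SP) is already known. The stack contents, addresses, and frame sizes are made up for illustration.

```python
WORD = 8  # bytes per stack slot on amd64

def unwind_one(read_word, sp, frame_size):
    """Step from a callee frame to its caller.

    With no frame pointer, the return address sits just above the
    callee's frame, at sp + frame_size. Popping it leaves the
    caller's SP one word higher.
    """
    ret_addr = read_word(sp + frame_size)
    caller_sp = sp + frame_size + WORD
    return ret_addr, caller_sp

def backtrace(read_word, pc, sp, frame_size_for, max_depth=16):
    """Collect PCs by unwinding until a PC has no known frame size."""
    pcs = [pc]
    for _ in range(max_depth):
        size = frame_size_for(pc)
        if size is None:
            break
        pc, sp = unwind_one(read_word, sp, size)
        pcs.append(pc)
    return pcs

# Hypothetical stack: the function at 0x1000 has a 0x20-byte frame and was
# called from 0x2000, which has a 0x10-byte frame and was called from 0x3000.
stack = {0x7000 + 0x20: 0x2000, 0x7000 + 0x20 + WORD + 0x10: 0x3000}
sizes = {0x1000: 0x20, 0x2000: 0x10}
pcs = backtrace(stack.__getitem__, 0x1000, 0x7000, sizes.get)
print([hex(p) for p in pcs])
```

In a real session the frame size would have to be recovered by disassembling each function's prologue, and a PC that is still inside the prologue needs special handling, which is where the gotchas come in.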

Regardless, for now, you have a simple way to print each GoRoutine during a GDB session. You also have an easy way to identify where in memory each G* exists, and can inspect them with ease, and that's a lot better than you've had it for the past couple of years! 

GoLang Security Auditing 

Are you worried about the real internal security surface of the GoLang application architecture? Are you worried about how the subtleties of the custom scheduler can affect data consistency across co-routines? Are you wondering if the split-stack architecture puts you at risk for memory segment collision under significant client-request pressure? Are you concerned that poorly-written third party libraries might subvert the otherwise mostly-sound GoLang security model?

Come check out Lab Mouse Security! I've been working with GoLang since the project was made public. I understand the internal runtime architecture, the compiler toolchain, and how the security model affects real-world applications. If you're interested in GoLang security, consider having Lab Mouse evaluate the security of your GoLang application today! 

Best,
Don A. Bailey
Founder
Lab Mouse Security
