Tuesday, April 18, 2017

The RISC-V Files: Supervisor -> Machine Privilege Escalation Exploit

The Demo

The following video demonstrates my original proof-of-concept exploit for the RISC-V privilege escalation logic flaw in the 1.9.1 version of the standard. The exploit lives in a patched Linux kernel, controlled through a simple userland application. The Linux kernel triggers the exploit and breaks out of Supervisor privilege in order to abuse the Machine level privilege. You may need to play the video in full-screen mode to view the console text. 

In the video, the userland application fakesyscall is used to control the exploit living in the Linux kernel. The first option passed to the app (and subsequently to the kernel) is 6. Option 6 simply tells the kernel to dump bytes of memory at a specific address in RAM. Option 8 then overwrites this same memory region with illegal opcodes. Option 6 is used again to verify that the opcodes have been overwritten. 

Finally, option 9 is used to tell the malicious kernel to trigger a call from its privilege layer (Supervisor) to Machine mode, which executes the overwritten instructions. This causes an unhandled exception in QEMU, which is displayed at the bottom of the screen at the end of the video ("unhandlable trap 2"). Trap 2 represents the illegal instruction trap, which is not supported in the Machine layer of this implementation (riscv64-system-qemu and riscv-pk). 

A Brief Introduction to RISC-V Privilege

The RISC-V privilege model was initially designed as an ecosystem that consists of four separate layers of privilege: User, Supervisor, Hypervisor, and Machine. The User privilege layer is, of course, the least privileged layer, where common applications are executed. Supervisor is the privilege layer where the operating system kernel (such as Linux, mach, or Amoeba) lives. The Hypervisor layer was intended to be the layer at which control subsystems for virtualization would live, but has been deprecated in more recent versions of the privilege specification. The Machine layer is the highest privileged layer in RISC-V, and has access to all resources in the system at all times. 

Full compromise of a system with a RISC-V core can't simply mean compromise of both the User and System privilege layers, which is the goal of most modern attacks. Rather, breaking out of the System layer into the Machine layer is required. This is because of the capability that the Machine layer will have in the future. 

The Hypervisor layer (H-Mode) is currently removed from the 1.10 privilege specification. The intent is that it may be re-added in a future revision of the privilege specification. Alternatively, it could be conglomerated with the Machine layer. Regardless, both layers are designed to control processor functionality that the Supervisor layer cannot access. This includes physical memory regions assigned to other hypervisor guests, restricted peripherals, Hypervisor and Machine registers, and other high-privileged objects. 

In the future, Machine mode may also be used as a subsystem similar to TrustZone or Intel SMM. Trusted keys may be used here to validate executable code running in the Hypervisor or Supervisor layer. It may also support Supervisor's verification of User layer applications. Other critical security goals can be achieved by leveraging the isolation and omnipotence of the Machine layer. Such functionality may be able to detect and disable a Supervisor layer exploit. Thus, escalating privileges from Supervisor layer to Machine layer as quickly as possible is imperative for future-proofing RISC-V exploits.

Resolving the Risk

Before we get into the technical details, it is important to note that the RISC-V team is aware of this privilege escalation problem. I presumed this when I discovered this vulnerability, as anyone with a background in operating system theory or CPU memory models will quickly observe the gap in security caused by the 1.9.1 privilege specification's memory definition. More on that later. 

Regardless, I was unable to find material supporting that the team knew of this security gap and, in my excitement, did not realize that a resolution to this issue was proposed 15 days prior to my HITB talk. Stefan O'Rear emailed me privately and pointed out the git commit for the proposal, which explained why I was unable to find it (I was using poor search terms in my haste). 

The proposal (for PMP: Physical Memory Protection) can be found here on github. In his email to me, Stefan points out that the image QEMU (and Bellard's riscvemu) executes, which contains the bootloader and the embedded Linux kernel/rootfs images, isn't designed for full Machine layer protection, and that it may not be updated with the PMP model in the near future. 

This is a reasonable perspective, but, academically, the exploit is still an important demonstration of flaws in CPU security logic. The target, itself, doesn't have to be an attempt at a perfectly secure system. It is more important that the exploit be proven practical and useful as an exercise. 

Besides, this was the first CPU level security implementation flaw I've ever discovered on my own accord. So, I had extra incentive to actually exploit it. ;-)

But PMP Existed!

Correct! For those familiar, there was a PMP definition in the v1.9.1 privilege specification of RISC-V. However, this implementation was considered incomplete and not capable of deployment. This is probably why the qemu-system-riscv* emulators don't support it currently. As the git commit declares, the PMP full proposal scheme was only introduced a couple weeks prior to this post. 

The Vulnerability

The technical vulnerability is actually quite simple, especially if the reader is familiar with common CPU models for memory protection. Each privilege layer is presumed to be isolated from all lower privileged layers during code execution, as one would expect. The CPU itself ensures that registers attributed to a specific privilege layer cannot be accessed from a less privileged layer. Thus, as a policy, Supervisor layer code can never access Machine layer registers. This segmentation helps guarantee that the state of each layer cannot be altered by lower privileged layers. 

However, the original privilege specification defined memory protection in two separate places. First, the mstatus register's VM field defines what memory protection model shall be used during code execution. This can be found in section 3.1.8 of privilege specification v1.9.1. Table 3.3 in that same section outlines the various memory protection/translation schemes currently defined by the RISC-V team. 

The second place where memory protection is defined isn't in the Machine layer at all, it's in the Supervisor layer. This is where things get tricky. Because the Supervisor layer is where a traditional Operating System kernel would execute, it must be able to alter page tables to support dynamic execution of kernel code and userland applications. Thus, the sptbr (Supervisor Page-Table Base Register), found in section 4.1.10, allows the Supervisor layer to control read and write access to the page tables. 

For those that are unfamiliar, page tables control translation of virtual memory addresses (va) to physical memory addresses (pa). Page tables also enforce access privileges for each page, e.g. whether the page is Read-Only, Write-Only, Executable, etc. 

Because the Machine layer of privilege's executable code resides in physical memory, and the Supervisor layer can create page tables that can access that physical memory, the Machine layer cannot protect itself from the Supervisor layer. 

The attack works this way:
  • A malicious Supervisor kernel determines the physical address of Machine layer code
  • The kernel creates a page table entry that grants itself read/write access to the Machine layer
  • The kernel overwrites Machine layer code with a beneficial implant
  • The kernel triggers a trap to Machine mode, causing the implant to be executed with Machine privileges
It's quite simple! 

The Exploit

The fun part about this vulnerability was not so much discovering it, but writing a useful exploit rather than simply a proof-of-concept that demonstrated code execution. At HITB2017AMS this past week, I used a simple PoC to show that implanted code was indeed executing in Machine mode. However, this is quite boring and has no real value beyond proving the vulnerability. 

A real exploit needs to allow code injection in a way that any arbitrary payload can be implanted and executed within the Machine context, from Supervisor context. To accomplish this, it was necessary to do the following:
  • Identify Machine layer code that the Supervisor can trigger at will
  • Identify an unused or little-used function in that code that can be altered without negative consequence
  • Ensure arbitrary payloads can be stored within this region  

Triggering Machine Layer Code

This is the simplest part of the process. Currently, booting a RISC-V system means using the Proxy Kernel (riscv-pk) as a bootloader. This code lives in the Machine layer and loads an embedded kernel (such as Linux or FreeBSD) into virtual memory. 

The riscv-pk must support the embedded kernel by providing resources, such as access to the console device, information about the RISC-V CPU core the kernel is running on, and other duties usually handled by mask ROM or flash. riscv-pk does this through the ecall instruction, the common instruction used to call the next most privileged layer in the processor. For example, an ecall executed at the User layer will likely be handled at the Supervisor layer. An ecall executed at the Supervisor layer will be handled by the Machine layer. (This is a simplistic explanation that can get more complex with trap redirection, but we won't dive into those waters at this moment). 

So, when the Supervisor (Linux kernel) executes ecall, the Machine layer's trap handler is executed in Machine mode. The code can be found in the riscv-pk at trap 9, the mcall_trap function, in machine/mtrap.c

Unused Functionality

Most of the functionality in mcall_trap must be preserved, to ensure the stability of the system. Overwriting arbitrary instructions here is frowned upon from an exploit developer perspective. Instead, we must target specific functionality to disturb as little of the ecosystem as possible. Fortunately, we can do so with the MCALL_SHUTDOWN feature. 

This feature does precisely what it sounds like, it performs an immediate system shut down as if someone hit an ACPI power-off button on a PC. Presumably, we would never do this in a system we've compromised. We want the system live so we can control it! Thus, this is the feature to overwrite. However, only a few instructions can be overwritten here as the functionality is small. Take a look at the assembly generated by this feature:

    80000dfc:   00008417                auipc   s0,0x8
    80000e00:   20440413                addi    s0,s0,516 # 80009000 <tohost>
    80000e04:   00100793                li      a5,1
    80000e08:   00f43023                sd      a5,0(s0)
    80000e0c:   00f43023                sd      a5,0(s0)
    80000e10:   ff9ff06f                j       80000e08 <mcall_trap+0x18c>

This only gives us 6 instructions to overwrite. Not much capability can be performed here! So, instead, we simply call another region of memory that can't be directly accessed by forcing a trap to mcall_trap

We can be a bit clever and overwrite the code that bootstraps the Proxy Kernel, do_reset. This function has zero value for an already running environment! So, why not reclaim the executable space? When reading the objdump of the current riscv-pk, we can see that 60 32bit instructions (or 120 16bit compressed instructions) can be stored here. If we simply jump to the do_reset address and perform our real work here, we can get away with quite a bit, especially if we can constantly update this region of memory with any payload we choose. 

Arbitrary Payloads 

Storing arbitrary payloads in this region simply means designing a sufficiently engineered implant stager in our patched malicious Linux (or other) kernel. This feature simply loads the physical memory addresses at which an implant should live, and installs the implant. Easy! There's not much to it. The only catch is ensuring our jump instructions know the address of the target physical memory address (and can reach the address using a single instruction). 

Linux Kernel Patch

The change to the Linux kernel is simple. We simply alter a system call to perform the implant installation and mtrap trigger. This can be done by augmenting any system call with two chunks of code:

                /* install implant at physical address a2 */
                else if(regs->a1 == 8)
                        uint8_t * c;
                        int i;
                        /* Overwrite an address a2 of maximum size 4096 with
                         * binary code pointed to by a4 of size a3.
                                "DONB: overwriting %p:%lx\n",
                                (const void * )regs->a2,
                        x = ioremap(regs->a2, 4096);
                        printk("DONB: remapped to %p\n", x);
                        r = -1;
                        if(!access_ok(VERIFY_READ, regs->a4, regs->a3))
                                printk("DONB: bad access_ok\n");
                                goto __bad_copy;
                        printk("DONB: access ok\n");
                        if(regs->a3 <= 0 || regs->a3 > 4096)
                                printk("DONB: bad a3\n");
                                goto __bad_copy;
                        printk("DONB: a3 ok\n");
                                (const void * )regs->a4,      
                                printk("DONB: bad copy from user\n");
                                goto __bad_copy;

                        printk("DONB: copy ok\n");


                        /* update the tlb */
                        __asm__("fence; fence.i");

The above code installs an implant at the given physical address in system call argument 2. Argument 4 contains a pointer to a userland buffer containing the binary to be written at the mapped virtual address. Argument 3 contains the size of the binary blob to be written. The last function ensures that the TLB is updated since we are altering instruction code, which guarantees that the CPU has the updated copy of our executable code and wont execute an out of date cache, once triggered.

                /* trigger implant overwritten at MCALL_SHUTDOWN */
                else if(regs->a1 == 9)
                        printk("DONB(8): ok, now try the m-hook\n");
                        /* MCALL_SHUTDOWN=6 */
                        __asm__("li a7, 6; ecall; mv %0, a0" : "=r" (r));
                        printk("DONB(8): returned = %d\n", r);

This code issues an ecall, causing mcall_trap to be executed from Machine mode context. This, in other words, executes our implant at a higher privilege level.

.global callreset
        auipc t0, 0
        addi t0, t0, -1578
        addi t0, t0, -1578
        jalr t0

Finally, the above code, written to the MCALL_SHUTDOWN feature in the mcall_trap function, calls our implant at do_reset. The code in my version of riscv-pk expects do_reset at address 0x800001a8 and the overwritten MCALL_SHUTDOWN code at 0x80000dfc. The differential between these two addresses requires two addi instructions to generate the proper negative offset. This can probably be done in a cleaner manner. 

The only requirement left is for the implant at do_reset to restore the stack and return, to avoid crashing by not properly adjusting the Machine mode memory layout. This can be accomplished by returning to the mcall_trap function at an address where it is performing this functionality. In my implementation, there is only one address where this occurs, 0x80000ccc. 

Gimme Code

For working demonstration code, please visit my github archive where I will track all of my RISC-V related security research. 

More to come!


Don A. Bailey
Lab Mouse Security
Mastodon: @donb@mastodon.social

Friday, November 18, 2016

Check Your (Root) Privilege - On CVE-2016-4484

A Cryptsetup Initrd Script Flaw

Recently, a programming flaw was found in the init scripts for certain Linux distributions. These scripts handle decryption of the system volume when full disk encryption is used to guard the system's data. There has been a lot of confusion as to whether this is a high priority vulnerability or not. I would qualify this bug as a security risk, but as a very low priority risk. In fact, if I were auditing a system with this flaw in it, I would likely mark it as "Risk Accepted" after a conversation with the customer.

As the reader may or may not know, the vulnerability in this script allows an adversary with access to the boot interface to gain a root shell. This is accomplished by abusing a flaw with the initrd scripts that accept passwords for decrypting the disk. After a somewhat short period of waiting for a valid password, the scripts literally just give up and decide to grant access to a shell. This is done in case the console user needs to administer the disk in some way.

Oh, No! Not a Shell!

It's important to note the technical attributes of a computing environment that determine whether access to that environment is privileged or not. Most importantly, access to an administrative shell is not equivalent to access to the underlying system objects. 

This is most evident in sandboxed environments or jails where an untrusted application (or user) is granted administrative privileges within that walled garden. Sure, they can screw up the walled garden all they want, but that does not affect the host environment without a secondary vulnerability in the operating system environment or kernel software. This can also be easily observed by terminal services environments where a "clean" operating system is presented to each user that logs in, and is automatically cleaned up and refreshed on logout. 

The Model

The points to acknowledge when evaluating a computing environment are:
  • Is the boot process trusted
  • Is the full-disk encryption integrity checked
  • Is initrd read only
These points are really all the reader needs to keep in mind when determining whether this is a security flaw. The answer is very simple once you put your computing device into the above context. 

For example, in a trusted boot model, the following steps should occur:
  • The first-stage bootloader (Boot1) in either ROM or locked Flash executes
  • Boot1 loads the next-stage bootloader (the featurefull bootloader, Boot2) into memory
  • Boot1 cryptographically validates the integrity of Boot2
  • If integrity check fails, halt; Otherwise, continue
  • Boot2 loads the next-stage executable (Kernel1) into memory
  • Boot2 cryptographically validates its configuration as well as the next-stage executable
  • If integrity check fails, halt; Otherwise, continue
  • Boot2 adjusts the launch of Kernel1 based on the secure configuration
  • Boot2 executes Kernel1
  • Kernel1 loads an operating system bootstrap image Initrd1 into memory
    • Typically Initrd1 is already cryptographically validated per Boot2's process
  • Kernel1 passes control to the init or init-alike application in Initrd1
At the end of this chain of events, the loaded mini-operating system image should not only be trusted, it should originate from an immutable environment. In other words, any applications executing within Initrd1 should not be able to alter the configuration or subvert the trust of any of the executable objects that have executed prior to it. In fact, it is possible to lock down an initrd such that manipulating key peripherals and kernel memory is not possible. 

The reader may at this point acknowledge that all objects loaded after this point are vulnerable to tampering. This will always be true unless the encrypted disk image is read-only. Even so, if the system relies on the console user to provide a password and the user does not have access to this password, a read-only image cannot be read and a read/write image can only be destroyed (presuming the image is properly integrity checked).

It's Not a Toomah

So, from the perspective of this model, gaining access to a root shell means absolutely nothing if the system was properly secured. If it was not secured, then abuse of the computing environment via a root shell is only a symptom of the underlying gaps in security and not a cause. Access to the computing device in an untrusted boot system will always yield privileged access regardless of whether or not a shell is immediately accessible. 

For example, a few years ago a team at iSEC Partners was able to manipulate a Verizon/Sprint femtocell simply by gaining access to the console. I reverse engineered the next model of the same femtocell which had two separate processing units (one PowerPC and one MIPS). The PowerPC side controlled the baseband while the MIPS side controlled the user configurable interface. While they went to great lengths to separate application layers for stability and security, access to the "secure" processor on the femtocell was as easy as attaching a JTAG adapter and interrupting the boot process to enable write on the read-only console. 

Why do I bring this up? Because this was not a hack. It was an abuse of a fundamental part of a poorly secured and over-engineered system. It was a symptom of flawed engineering and not the cause

But it Is Cancerous

The fundamental takeaway from this isn't that this bug is a security flaw (because it really isn't). The takeaway is that we have engineered systems that are untrustworthy by design. This was initially because we didn't have the technology, the cost-effectiveness, or the interest to engineer secure systems for consumers (or even mass distributed technologies like embedded systems for ATMs, in-flight entertainment, telematics, etc). But, now we do. However, the skill for implementing this seems to be isolated within engineering teams at Apple and Chromebook. 

The only way to make perceived vulnerabilities like CVE-2016-4484 go away is to provide the consumer (or engineering firm) with technology that ensures programming flaws such as the bugs in these init scripts will not have privileged side effects, if and when they are abused.

As always, if you need assistance ensuring your embedded systems are designed securely from the ground up, or want your trust model evaluated by skilled engineers and reverse engineers, Lab Mouse Security is available for consulting engagements.

Best wishes,
Don A. Bailey
CEO / Founder
Lab Mouse Security (The IoT Experts)

Monday, July 25, 2016

This Old Vulnerability #2: NetBSD and OpenBSD kernfs Kernel Memory Disclosure of 2005

Time is an Illusion

[Editor's Note: This is part one of a two part post, the second of which is Vineetha Paruchuri's guest co-post, which can be found: here]

It makes sense to me that physicists have been arguing against time as a physical construct for years now, because as humans we have a clear penchant for ignoring time altogether. More precisely, we seem to ignore history as if it never happened. And, when we do recall historical events, we somehow do so erroneously. This isn't just true in the world of politics or law, it's true in every facet of society. Tech, and sometimes especially tech, is no outlier. 

In 2005, I was bored, making silly bets with friends on IRC about how fast we could find exploitable bugs in "secure" operating systems. This was pretty common for us, as young hackers spend the majority of their time reading source code. A good friend pointed out that the increased scrutiny on the BSD variants was decreasing the number of exploitable integer overflow attacks on kernels. I argued that this was probably false, and that there were lots of bugs yet to be found. 

What's interesting is that this bug class is still prevalent today. In fact, it may be the most underreported bug class in the history of computing. In 2014, when I released the LZO and LZ4 memory corruption bugs, they are of the exact same class of exploitable integer issues. Because of pointer arithmetic, and how CPUs manage the indexing of memory, they are extremely difficult to find and remediate. The difficulty of this bug class caused the LZO vulnerability to persist in the wild for over 20 years, and allowed variants of LZO, such as LZ4, to be created with the exact same vulnerability

Finding the Bug

Back to my friends and I on IRC, we made a bet: Find an exploitable kernel vulnerability affecting any BSD variant within an hour. The winner gets bragging rights. I almost lost, having found the bug in literally 57 minutes and some seconds. 

The bug? An integer truncation flaw in the NetBSD and OpenBSD kernfs pseudo-filesystem. This file system provides access to kernel abstractions that the user can read to identify the state of the running kernel. In Linux terms, these abstractions would all be handled by procfs. On BSD, procfs was (is?) a pseudo-filesystem providing insight into only active processes, themselves. On Linux, procfs provides access to kernel objects ranging from the CPU, to VMM, processes, and even network abstractions. 

The flaw was discovered by trolling through NetBSD patches. In fact, I discovered the bug by identifying a patch for a similar integer problem committed days earlier, simply by chance. Because I constantly monitored the patches for all BSDs, it was easy to troll through the patches identifying ones may be valuable. An interesting commit tag caught my eye:

Revision 1.112 / (download) - annotate - [select for diffs]Thu Sep 1 06:25:26 2005 UTC (10 years, 10 months ago) by christos
Branch: MAIN 
CVS Tags: yamt-vop-base3yamt-vop-base2yamt-vop-basethorpej-vnode-attr-basethorpej-vnode-attr 
Branch point for: yamt-vop 
Changes since 1.111: +6 -6 lines
Diff to previous 1.111 (colored)

Also protect the ipsec ioctls from negative offsets to prevent panics
in m_copydata(). Pointed out by Karl Janmar. Move the negative offset
check from kernfs_xread() to kernfs_read().

As depicted above, the patch applied at revision 1.112 purports to resolve multiple integer related bugs from being triggered in the kernfs_xread function. It does so by moving the check for all valid read offsets to kernfs_read. One might think, at this point, that this is a solved problem. Presumably all bugs in the former function can be resolved by placing the check in the latter, parent function. 

However, there is an easy to spot problem in the patch. Consider the following code:

 void *v;
 struct vop_read_args /* {
  struct vnode *a_vp;
  struct uio *a_uio;
  int  a_ioflag;
  struct ucred *a_cred;
 } */ *ap = v;
 struct uio *uio = ap->a_uio;
 struct kernfs_node *kfs = VTOKERN(ap->a_vp);
 char strbuf[KSTRING], *bf;
 off_t off;
 size_t len;
 int error;

 if (ap->a_vp->v_type == VDIR)
  return (EOPNOTSUPP);

 /* Don't allow negative offsets */
 if (uio->uio_offset < 0)
  return EINVAL;

 off = uio->uio_offset;
 bf = strbuf;
 if ((error = kernfs_xread(kfs, off, &bf, sizeof(strbuf), &len)) == 0)
  error = uiomove(bf, len, uio);
 return (error);

Initially, this looks appropriate. The function now checks to see if the file descriptor associated with a kernfs file has a negative read offset. If a negative offset is identified, the function returns with an error. Otherwise, the offset is passed to kernfs_xread and presumed safe for all operations within that function. 

This should be fine, except for the function kernfs_xread, itself. Here is the definition of the function:

static int
kernfs_xread(kfs, off, bufp, len, wrlen)
 struct kernfs_node *kfs;
 int off;
 char **bufp;
 size_t len;
 size_t *wrlen;

In BSD variants, the off_t type is always a signed 64bit integer to accommodate for large files on modern file systems, regardless of whether the underlying architecture is 32bit or 64bit. The problem arises when the 64bit signed integer is checked for its sign bit, then passed to the kernfs_xread function. Passing the off_t to the function truncates the value to a 32bit signed integer. This means that the check for a negative 64bit integer is invalid. An adversary only need to set bit 31 of the 64bit offset to ensure that the value passed to kernfs_xread is negative. 

The result of this integer truncation bug can be observed at the end of kernfs_xread. At the end of this function, we have the following code, regardless of which type of kernfs pseudo-file is being read:

 len = strlen(*bufp);
 if (len <= off)
  *wrlen = 0;
 else {
  *bufp += off;
  *wrlen = len - off;
 return (0);

This code ensures that the size of the data copied back to userland is very large, and that the pointer to the data being copied will point outside the valid memory buffer for the given file. What's really great about this bug is that both kernel stack and kernel heap can be referenced, depending on which kernfs file is being read while triggering the bug. 

This allows an attacker to page through heap memory, which may contain the contents of privileged files, binaries, or even security tokens such as SSH private keys. Paging through stack memory is less immediately valuable, but allows an attacker to disclose other tokens (such as kernel stack addresses) that may be relevant to subsequent attacks. 

Patching the Bug

Though this vulnerability affected both NetBSD and OpenBSD, OpenBSD claimed that "it isn't a vulnerability" because they previously removed the kernfs filesystem from the default OpenBSD kernel. However, it was still build-able in the OpenBSD tree at the time, meaning that it was indeed a vulnerability in their source tree. It just wasn't a vulnerability by default. This was yet another misstep in a long standing career of misdirection by the core OpenBSD team. The NetBSD team reacted quickly, as kernfs was not only still integrated into the default kernel, it was mounted by default, allowing any unprivileged user access to abuse this bug. 

I sold this vulnerability to Ejovi Nuwere's security consulting firm, who ethically acquired software flaws in order to help promote their consulting practice. Tim Newsham reviewed the flaw and agreed that it was an interesting finding. Ejovi's team managed the relationship during patching and helped develop the resolution with the NetBSD team, who was quick to patch the bug. I was impressed with Ejovi's professionalism, and also appreciated the NetBSD team's fast work, and the fact that they didn't whine about the bug in the way OpenBSD did. 

The patch fixed the bug by performing the check on the truncated integer rather than the signed 64bit offset. 

@@ -922,18 +922,18 @@ kernfs_read(v)
  struct uio *uio = ap->a_uio;
  struct kernfs_node *kfs = VTOKERN(ap->a_vp);
  char strbuf[KSTRING], *bf;
- off_t off;
+ int off;
  size_t len;
  int error;
  if (ap->a_vp->v_type == VDIR)
   return (EOPNOTSUPP);
+ off = (int)uio->uio_offset;
  /* Don't allow negative offsets */
- if (uio->uio_offset < 0)
+ if (off < 0)
   return EINVAL;
- off = uio->uio_offset;
  bf = strbuf;
  if ((error = kernfs_xread(kfs, off, &bf, sizeof(strbuf), &len)) == 0)
   error = uiomove(bf, len, uio);

Breaking the Historical Cycle

While we considered the patch adequate at the time, we were wrong. The reason for this is based on the logic from the first This Old Vulnerability blog post: an integer doesn't need to be negative to create a negative offset or an over/underflow when applied to an arbitrary pointer in kernel memory. This is because the value of any given pointer does not start at address zero. This is a presumption often made in systems engineering. 

Tests presume a base address of zero, rather than the pointer's actual address, plus the offset into the pointer. If a 32bit pointer address points to 0xb0000000UL, an integer overflow will occur with an offset far less than would be required to set a sign bit. If this pointer address and a sufficient offset value are used in an inadequate expression, it may seem that the test would pass. Consider the following pseudo-example:

uint32_t * p = 0xb0000000UL;
uint32_t off = 0x60000000UL;
uint32_t * max_p = 0xb0008000UL;
if(off < 0 || p + off >= max_p)
        return EINVAL;

Some compilers will actually compile out the above code as it would be impossible to properly evaluate. But, if engineers don't notice this, or if there is no warning message printed by the compiler, or if an IDE is being used that doesn't adequately highlight the warning messages, this can result in critical flaws in software.

Testing this properly requires policy that evaluates both the base of the pointer and a ceiling for the pointer given the context of its usage. If a pointer points to a structure of a particular size, any expression that results in an address must be verified to land within that structure. This can be done by performing the operation, storing the result in the appropriate type, then evaluating the address as being within the structure in memory. 

As noted in the previous blog post, this requires organizational coding standards that enforce policies on how pointers expressions are evaluated and how they are tested. It also requires an evaluation of the context of each pointer. 

As always, these improvements are challenging to implement because they aren't simply a coding construct. This is an organizational problem that must be addressed at the management level along with each individual engineer's coding practices. Peer reviews must be accentuated with policies that guide auditing practices, and guarantee a higher level of success in catching and fixing these issues. For help, consider hiring Lab Mouse Security to assist with your internal code audits, and break the seemingly eternal cycle of exploitable integer vulnerabilities!

An Introduction

For those that don't know her, Vineetha Paruchuri is a brilliant up-and-coming information security researcher. She and I have been discussing the effects of security flaws that have persisted over decades, why langsec addresses some of the remediation/mitigation potential, but what gaps are still missing. 

This resulted in a guest post where Vineetha evaluates modern active models for the reduction of security flaws, rather than retrospective models which include code reviews, bug reports, etc. I highly suggest reading her guest blog as a co-piece to this one, and a primer for anyone interested in the modern movement to active, rather than passive, vulnerability reduction models. 

Don A. Bailey
Founder and CEO