Friday, November 18, 2016

Check Your (Root) Privilege - On CVE-2016-4484

A Cryptsetup Initrd Script Flaw

Recently, a programming flaw was found in the init scripts for certain Linux distributions. These scripts handle decryption of the system volume when full disk encryption is used to guard the system's data. There has been a lot of confusion as to whether this is a high priority vulnerability or not. I would qualify this bug as a security risk, but as a very low priority risk. In fact, if I were auditing a system with this flaw in it, I would likely mark it as "Risk Accepted" after a conversation with the customer.

As the reader may or may not know, the vulnerability in this script allows an adversary with access to the boot interface to gain a root shell. This is accomplished by abusing a flaw with the initrd scripts that accept passwords for decrypting the disk. After a somewhat short period of waiting for a valid password, the scripts literally just give up and decide to grant access to a shell. This is done in case the console user needs to administer the disk in some way.
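
The behavior described above can be modeled as a fail-open retry loop. The following C sketch is a hypothetical model and not the distribution's actual init script: `MAX_TRIES`, `unlock_volume`, `prompt_loop`, and the `outcome` names are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the passphrase prompt; names and limits are illustrative. */
enum outcome { BOOT_CONTINUES, DROP_TO_SHELL, HALT };

#define MAX_TRIES 3

/* Stand-in for the real unlock step: succeeds only on the right passphrase. */
static bool unlock_volume(const char *guess, const char *secret)
{
    return strcmp(guess, secret) == 0;
}

/* fail_open = true models the policy described above: give up and hand
 * the console user a shell. fail_open = false models halting instead. */
static enum outcome prompt_loop(const char *const guesses[], int n,
                                const char *secret, bool fail_open)
{
    for (int i = 0; i < n && i < MAX_TRIES; i++) {
        if (unlock_volume(guesses[i], secret))
            return BOOT_CONTINUES;
    }
    return fail_open ? DROP_TO_SHELL : HALT;
}
```

Under either policy the volume stays locked after the failed attempts; the only difference is whether the console user ends up in a shell inside the initrd.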

Oh, No! Not a Shell!

It's important to note the technical attributes of a computing environment that determine whether access to that environment is privileged or not. Most importantly, access to an administrative shell is not equivalent to access to the underlying system objects. 

This is most evident in sandboxed environments or jails where an untrusted application (or user) is granted administrative privileges within that walled garden. Sure, they can screw up the walled garden all they want, but that does not affect the host environment without a secondary vulnerability in the operating system environment or kernel software. This can also be easily observed in terminal services environments where a "clean" operating system is presented to each user that logs in, and is automatically cleaned up and refreshed on logout.

The Model

The points to acknowledge when evaluating a computing environment are:
  • Is the boot process trusted?
  • Is the full-disk encryption integrity checked?
  • Is initrd read only?
These points are really all the reader needs to keep in mind when determining whether this is a security flaw. The answer is very simple once you put your computing device into the above context. 

For example, in a trusted boot model, the following steps should occur:
  • The first-stage bootloader (Boot1) in either ROM or locked Flash executes
  • Boot1 loads the next-stage bootloader (the full-featured bootloader, Boot2) into memory
  • Boot1 cryptographically validates the integrity of Boot2
  • If integrity check fails, halt; Otherwise, continue
  • Boot2 loads the next-stage executable (Kernel1) into memory
  • Boot2 cryptographically validates its configuration as well as the next-stage executable
  • If integrity check fails, halt; Otherwise, continue
  • Boot2 adjusts the launch of Kernel1 based on the secure configuration
  • Boot2 executes Kernel1
  • Kernel1 loads an operating system bootstrap image Initrd1 into memory
    • Typically Initrd1 is already cryptographically validated per Boot2's process
  • Kernel1 passes control to the init or init-alike application in Initrd1
At the end of this chain of events, the loaded mini-operating system image should not only be trusted, it should originate from an immutable environment. In other words, any applications executing within Initrd1 should not be able to alter the configuration or subvert the trust of any of the executable objects that have executed prior to it. In fact, it is possible to lock down an initrd such that manipulating key peripherals and kernel memory is not possible. 
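
The chain above can be sketched in C as a loop in which each stage is validated before control is handed to it. This is a minimal sketch under loud assumptions: `struct stage`, `verify_chain`, and `placeholder_digest` are hypothetical names, and the digest is FNV-1a, which is NOT cryptographic and stands in for whatever signature or hash verification a real bootloader would use.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* A boot stage: its image bytes plus the digest the previous stage expects. */
struct stage {
    const uint8_t *image;
    size_t         len;
    uint64_t       expected_digest;
};

/* FNV-1a, used ONLY as a placeholder. It is not cryptographic; a real
 * chain would verify a signature or a cryptographic hash here. */
static uint64_t placeholder_digest(const uint8_t *p, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Walk the chain in order. Returns the number of stages that verified;
 * anything less than count means the chain halted at that index. */
static size_t verify_chain(const struct stage *stages, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (placeholder_digest(stages[i].image, stages[i].len)
                != stages[i].expected_digest)
            return i;   /* integrity check failed: halt here */
    }
    return count;       /* every stage validated before it would execute */
}
```

With stages for Boot2, Kernel1, and Initrd1, a return value of 3 means the whole chain validated, while a tampered Kernel1 makes `verify_chain` stop at index 1 so that nothing after Boot2 ever runs.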

The reader may at this point acknowledge that all objects loaded after this point are vulnerable to tampering. This will always be true unless the encrypted disk image is read-only. Even so, if the system relies on the console user to provide a password and the user does not have access to this password, a read-only image cannot be read and a read/write image can only be destroyed (presuming the image is properly integrity checked).

It's Not a Toomah

So, from the perspective of this model, gaining access to a root shell means absolutely nothing if the system was properly secured. If it was not secured, then abuse of the computing environment via a root shell is only a symptom of the underlying gaps in security and not a cause. Access to the computing device in an untrusted boot system will always yield privileged access regardless of whether or not a shell is immediately accessible. 

For example, a few years ago a team at iSEC Partners was able to manipulate a Verizon/Sprint femtocell simply by gaining access to the console. I reverse engineered the next model of the same femtocell which had two separate processing units (one PowerPC and one MIPS). The PowerPC side controlled the baseband while the MIPS side controlled the user configurable interface. While they went to great lengths to separate application layers for stability and security, access to the "secure" processor on the femtocell was as easy as attaching a JTAG adapter and interrupting the boot process to enable write on the read-only console. 

Why do I bring this up? Because this was not a hack. It was an abuse of a fundamental part of a poorly secured and over-engineered system. It was a symptom of flawed engineering and not the cause.

But it Is Cancerous

The fundamental takeaway from this isn't that this bug is a security flaw (because it really isn't). The takeaway is that we have engineered systems that are untrustworthy by design. This was initially because we didn't have the technology, the cost-effectiveness, or the interest to engineer secure systems for consumers (or even mass distributed technologies like embedded systems for ATMs, in-flight entertainment, telematics, etc). But, now we do. However, the skill for implementing this seems to be isolated within the engineering teams at Apple and behind Google's Chromebook.

The only way to make perceived vulnerabilities like CVE-2016-4484 go away is to provide the consumer (or engineering firm) with technology that ensures programming flaws such as the bugs in these init scripts will not have privileged side effects, if and when they are abused.

As always, if you need assistance ensuring your embedded systems are designed securely from the ground up, or want your trust model evaluated by skilled engineers and reverse engineers, Lab Mouse Security is available for consulting engagements.

Best wishes,
Don A. Bailey
CEO / Founder
Lab Mouse Security (The IoT Experts)

Monday, July 25, 2016

This Old Vulnerability #2: NetBSD and OpenBSD kernfs Kernel Memory Disclosure of 2005

Time is an Illusion

[Editor's Note: This is part one of a two part post, the second of which is Vineetha Paruchuri's guest co-post, which can be found here]

It makes sense to me that physicists have been arguing against time as a physical construct for years now, because as humans we have a clear penchant for ignoring time altogether. More precisely, we seem to ignore history as if it never happened. And, when we do recall historical events, we somehow do so erroneously. This isn't just true in the world of politics or law, it's true in every facet of society. Tech, and sometimes especially tech, is no outlier. 

In 2005, I was bored, making silly bets with friends on IRC about how fast we could find exploitable bugs in "secure" operating systems. This was pretty common for us, as young hackers spend the majority of their time reading source code. A good friend pointed out that the increased scrutiny on the BSD variants was decreasing the number of exploitable integer overflow attacks on kernels. I argued that this was probably false, and that there were lots of bugs yet to be found. 

What's interesting is that this bug class is still prevalent today. In fact, it may be the most underreported bug class in the history of computing. The LZO and LZ4 memory corruption bugs I released in 2014 are of the exact same class of exploitable integer issues. Because of pointer arithmetic, and how CPUs manage the indexing of memory, these bugs are extremely difficult to find and remediate. The difficulty of this bug class caused the LZO vulnerability to persist in the wild for over 20 years, and allowed variants of LZO, such as LZ4, to be created with the exact same vulnerability.

Finding the Bug

Back to my friends and me on IRC: we made a bet. Find an exploitable kernel vulnerability affecting any BSD variant within an hour. The winner gets bragging rights. I almost lost, having found the bug in literally 57 minutes and some seconds.

The bug? An integer truncation flaw in the NetBSD and OpenBSD kernfs pseudo-filesystem. This file system provides access to kernel abstractions that the user can read to identify the state of the running kernel. In Linux terms, these abstractions would all be handled by procfs. On BSD, procfs was (is?) a pseudo-filesystem providing insight into only active processes, themselves. On Linux, procfs provides access to kernel objects ranging from the CPU, to VMM, processes, and even network abstractions. 

The flaw was discovered by trawling through NetBSD patches. In fact, I discovered the bug by identifying a patch for a similar integer problem committed days earlier, simply by chance. Because I constantly monitored the patches for all BSDs, it was easy to trawl through the patches identifying ones that may be valuable. An interesting commit tag caught my eye:

Revision 1.112, Thu Sep 1 06:25:26 2005 UTC, by christos
Branch: MAIN
CVS Tags: yamt-vop-base3, yamt-vop-base2, yamt-vop-base, thorpej-vnode-attr-base, thorpej-vnode-attr
Branch point for: yamt-vop
Changes since 1.111: +6 -6 lines

Also protect the ipsec ioctls from negative offsets to prevent panics
in m_copydata(). Pointed out by Karl Janmar. Move the negative offset
check from kernfs_xread() to kernfs_read().

As depicted above, the patch applied at revision 1.112 purports to prevent multiple integer-related bugs from being triggered in the kernfs_xread function. It does so by moving the check for all valid read offsets to kernfs_read. One might think, at this point, that this is a solved problem. Presumably all bugs in the former function can be resolved by placing the check in the latter, parent function.

However, there is an easy to spot problem in the patch. Consider the following code:

 void *v;
 struct vop_read_args /* {
  struct vnode *a_vp;
  struct uio *a_uio;
  int  a_ioflag;
  struct ucred *a_cred;
 } */ *ap = v;
 struct uio *uio = ap->a_uio;
 struct kernfs_node *kfs = VTOKERN(ap->a_vp);
 char strbuf[KSTRING], *bf;
 off_t off;
 size_t len;
 int error;

 if (ap->a_vp->v_type == VDIR)
  return (EOPNOTSUPP);

 /* Don't allow negative offsets */
 if (uio->uio_offset < 0)
  return EINVAL;

 off = uio->uio_offset;
 bf = strbuf;
 if ((error = kernfs_xread(kfs, off, &bf, sizeof(strbuf), &len)) == 0)
  error = uiomove(bf, len, uio);
 return (error);

Initially, this looks appropriate. The function now checks to see if the file descriptor associated with a kernfs file has a negative read offset. If a negative offset is identified, the function returns with an error. Otherwise, the offset is passed to kernfs_xread and presumed safe for all operations within that function. 

This should be fine, except for the function kernfs_xread, itself. Here is the definition of the function:

static int
kernfs_xread(kfs, off, bufp, len, wrlen)
 struct kernfs_node *kfs;
 int off;
 char **bufp;
 size_t len;
 size_t *wrlen;

In BSD variants, the off_t type is always a signed 64bit integer to accommodate large files on modern file systems, regardless of whether the underlying architecture is 32bit or 64bit. The problem arises when the 64bit signed integer is checked for its sign bit, then passed to the kernfs_xread function. Passing the off_t to the function truncates the value to a 32bit signed integer. This means that the check for a negative 64bit integer is invalid. An adversary need only set bit 31 of the 64bit offset (while leaving bit 63 clear) to ensure that the value passed to kernfs_xread is negative.
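
The truncation can be reproduced outside the kernel. The sketch below is a user-space model, not the NetBSD source: `passes_sign_check` and `as_seen_by_callee` are hypothetical helpers, `int64_t` stands in for off_t, and `int32_t` stands in for the callee's `int off` parameter.

```c
#include <stdint.h>
#include <stdbool.h>

/* Models the pre-patch order: the 64-bit offset is tested for sign first. */
static bool passes_sign_check(int64_t uio_offset)
{
    return uio_offset >= 0;
}

/* The value the callee actually receives through its 32-bit parameter.
 * Out-of-range conversion to a signed type is implementation-defined in C;
 * common compilers simply keep the low 32 bits. */
static int32_t as_seen_by_callee(int64_t uio_offset)
{
    return (int32_t)uio_offset;
}
```

An offset of `1 << 31` passes the 64-bit sign check, yet on common compilers the callee sees it as `INT32_MIN`.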

The result of this integer truncation bug can be observed at the end of kernfs_xread. At the end of this function, we have the following code, regardless of which type of kernfs pseudo-file is being read:

 len = strlen(*bufp);
 if (len <= off)
  *wrlen = 0;
 else {
  *bufp += off;
  *wrlen = len - off;
 }
 return (0);

This code ensures that the size of the data copied back to userland is very large, and that the pointer to the data being copied will point outside the valid memory buffer for the given file. What's really great about this bug is that both kernel stack and kernel heap can be referenced, depending on which kernfs file is being read while triggering the bug. 

This allows an attacker to page through heap memory, which may contain the contents of privileged files, binaries, or even security tokens such as SSH private keys. Paging through stack memory is less immediately valuable, but allows an attacker to disclose other tokens (such as kernel stack addresses) that may be relevant to subsequent attacks. 

Patching the Bug

Though this vulnerability affected both NetBSD and OpenBSD, OpenBSD claimed that "it isn't a vulnerability" because they previously removed the kernfs filesystem from the default OpenBSD kernel. However, it was still build-able in the OpenBSD tree at the time, meaning that it was indeed a vulnerability in their source tree. It just wasn't a vulnerability by default. This was yet another misstep in a long standing career of misdirection by the core OpenBSD team. The NetBSD team reacted quickly, as kernfs was not only still integrated into the default kernel, it was mounted by default, allowing any unprivileged user access to abuse this bug. 

I sold this vulnerability to Ejovi Nuwere's security consulting firm, who ethically acquired software flaws in order to help promote their consulting practice. Tim Newsham reviewed the flaw and agreed that it was an interesting finding. Ejovi's team managed the relationship during patching and helped develop the resolution with the NetBSD team, who was quick to patch the bug. I was impressed with Ejovi's professionalism, and also appreciated the NetBSD team's fast work, and the fact that they didn't whine about the bug in the way OpenBSD did. 

The patch fixed the bug by performing the check on the truncated integer rather than the signed 64bit offset. 

@@ -922,18 +922,18 @@ kernfs_read(v)
  struct uio *uio = ap->a_uio;
  struct kernfs_node *kfs = VTOKERN(ap->a_vp);
  char strbuf[KSTRING], *bf;
- off_t off;
+ int off;
  size_t len;
  int error;
  if (ap->a_vp->v_type == VDIR)
   return (EOPNOTSUPP);
+ off = (int)uio->uio_offset;
  /* Don't allow negative offsets */
- if (uio->uio_offset < 0)
+ if (off < 0)
   return EINVAL;
- off = uio->uio_offset;
  bf = strbuf;
  if ((error = kernfs_xread(kfs, off, &bf, sizeof(strbuf), &len)) == 0)
   error = uiomove(bf, len, uio);
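
The patched ordering can be modeled the same way in user space. This is a sketch and not the kernel code: `check_offset_patched` is a hypothetical helper, with `int64_t` standing in for off_t and `int32_t` for `int`.

```c
#include <stdint.h>
#include <errno.h>

/* Models the patched order: narrow first, then test the value the
 * callee will actually receive. Returns 0 if the offset is accepted,
 * and leaves *out untouched when the offset is rejected. */
static int check_offset_patched(int64_t uio_offset, int32_t *out)
{
    int32_t off = (int32_t)uio_offset;  /* same narrowing the call performs */
    if (off < 0)
        return EINVAL;
    *out = off;
    return 0;
}
```

An offset with bit 31 set is now rejected. Note that the test is still only about sign: on common compilers an offset of 2^32 + 16 narrows to 16 and is accepted.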

Breaking the Historical Cycle

While we considered the patch adequate at the time, we were wrong. The reason for this is based on the logic from the first This Old Vulnerability blog post: an integer doesn't need to be negative to create a negative offset or an over/underflow when applied to an arbitrary pointer in kernel memory. This is because the value of any given pointer does not start at address zero. This is a presumption often made in systems engineering. 

Tests presume a base address of zero, rather than the pointer's actual address, plus the offset into the pointer. If a 32bit pointer address points to 0xb0000000UL, an integer overflow will occur with an offset far less than would be required to set a sign bit. If this pointer address and a sufficient offset value are used in an inadequate expression, it may seem that the test would pass. Consider the following pseudo-example:

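uint8_t * p = (uint8_t *)0xb0000000UL;      /* base address, nowhere near zero */
uint32_t off = 0x60000000UL;                /* sign bit is clear */
uint8_t * max_p = (uint8_t *)0xb0008000UL;  /* ceiling of the valid region */
/* off < 0 can never be true (unsigned); with 32bit pointers p + off wraps to 0x10000000 */
if(off < 0 || p + off >= max_p)
        return EINVAL;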

Some compilers will actually compile out the above code as it would be impossible to properly evaluate. But, if engineers don't notice this, or if there is no warning message printed by the compiler, or if an IDE is being used that doesn't adequately highlight the warning messages, this can result in critical flaws in software.

Testing this properly requires policy that evaluates both the base of the pointer and a ceiling for the pointer given the context of its usage. If a pointer points to a structure of a particular size, any expression that results in an address must be verified to land within that structure. This can be done by performing the operation, storing the result in the appropriate type, then evaluating the address as being within the structure in memory. 
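
One way to express such a test in C is sketched below. The helper names `range_in_object` and `addr_in_object` are hypothetical, and the approach (size-relative comparison by subtraction, plus an explicit wrap check on an unsigned address) is one common pattern rather than a prescribed standard.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* True only if [off, off + len) lies inside an object of obj_size bytes.
 * Written with subtraction so no intermediate value can wrap. */
static bool range_in_object(size_t obj_size, size_t off, size_t len)
{
    if (off > obj_size)
        return false;
    return len <= obj_size - off;
}

/* Address form: perform the operation in an unsigned integer type, then
 * confirm the result landed between the object's base and its ceiling. */
static bool addr_in_object(uintptr_t base, size_t obj_size, uintptr_t off)
{
    uintptr_t addr = base + off;   /* unsigned arithmetic wraps, never UB */
    if (addr < base)               /* wrapped around the address space */
        return false;
    return addr - base < obj_size;
}
```

With a base of 0xb0000000 and a 0x8000 byte object, `addr_in_object` rejects an offset of 0x60000000 whether or not the addition wraps, because the result is compared against the base and the size rather than against zero.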

As noted in the previous blog post, this requires organizational coding standards that enforce policies on how pointer expressions are evaluated and how they are tested. It also requires an evaluation of the context of each pointer.

As always, these improvements are challenging to implement because they aren't simply a coding construct. This is an organizational problem that must be addressed at the management level along with each individual engineer's coding practices. Peer reviews must be accentuated with policies that guide auditing practices, and guarantee a higher level of success in catching and fixing these issues. For help, consider hiring Lab Mouse Security to assist with your internal code audits, and break the seemingly eternal cycle of exploitable integer vulnerabilities!

An Introduction

For those that don't know her, Vineetha Paruchuri is a brilliant up-and-coming information security researcher. She and I have been discussing the effects of security flaws that have persisted over decades, how langsec addresses some of the remediation/mitigation potential, and what gaps still remain.

This resulted in a guest post where Vineetha evaluates modern active models for the reduction of security flaws, rather than retrospective models which include code reviews, bug reports, etc. I highly suggest reading her guest blog as a co-piece to this one, and a primer for anyone interested in the modern movement to active, rather than passive, vulnerability reduction models. 

Don A. Bailey
Founder and CEO

This Old Vulnerability: Guest Post: Vineetha Paruchuri on Modeling How Vulnerability is Created, Rather than Remediated

[Editor's Note: Vineetha's guest blog is a companion piece to the Lab Mouse post found here]

It all started on Twitter when I called Bailey out on his crappy taste in music (naturally, he vehemently disagrees with the “crappy” part). [Editor’s Note: My musical tastes are sublime and don't include Evanescence…] [Author’s Retort: N-O-P-E]

We got to ranting about InfoSec things in private. I initially felt that nuances in textual conversations usually get lost in translation, and that one might often need to explain further. It quickly became evident that this was not the case in our discussions.

Of course, like your typical hyper-rational engineers, we instinctively started modeling our behavior - analyzing why we seem to process information very similarly, how people intellectually process things in general, how that affects the code they write, or the way they visualize technical problems, or the way they interpret security concepts. This line of thought extended to our discussion on vulnerabilities.

For the better part of the past year, I have passively been mulling over specific combinations/variations of arguments from a couple of papers, because I saw immense potential for these ideas in practical scenarios. Visualizing these arguments from the perspective of vulnerability identification and disclosure (residual thoughts from my discussion with Bailey) gave me the much-needed context that tied some things together.

In most cases, at the core, all vulnerabilities boil down to something that the developer/architect/whoever overlooked, that someone else noticed. To simplify terminology, let’s call this “someone else” an attacker, and the “developer/architect/whoever” a systems designer. The system is ultimately designed for the end-user.

The attacker might see things that the systems designer missed, because attackers visualize the system quite differently. Further, the end-user might (un)intentionally perform some action(s) that might send the system into a state not initially modeled by the systems designer. In such cases when the system does not behave as expected (and also in other cases e.g. when the end-user doesn’t get the desired functionality), the end-user often figures out workarounds to get the job done. Such workarounds routinely circumvent established security mechanisms in place too; once the system is not in a documented state, there is no saying what security measures were bypassed because of the workaround.

In essence, when analyzing from the context of actor-behavior, vulnerabilities can be the result of any (or all) of the above factors, or some combination thereof. At a glance, it looks like delineating and formalizing these factors would have some value from the perspective of vulnerability analysis.

Based on the above reasoning, we can delineate the major factors contributing to software/system vulnerabilities from the actor-behavior standpoint as follows:

First, the issue of what the systems designer doesn’t see that others might see: the blind spots. In “It’s the Psychology Stupid: How Heuristics Explain Software Vulnerabilities and How Priming Can Illuminate Developer’s Blind Spots”, Oliveira et al. discuss the idea that “software vulnerabilities are blind spots in the developers’ heuristic decision-making process”.

Second, the issue of how the attacker-mindset differs from other actors’ in the system, and what that means. Quite a lot has been written on this topic (hacker behavior/motivations) from the perspective of sociology/psychology, law/policy, technology etc., but some interesting thoughts on how to cultivate an attacker-mindset, and what the “hacker methodology” is, are given in “What Hackers Learn That The Rest Of Us Don’t” by Sergey Bratus.

Third, the obvious existence of differential perceptions amongst various actors in the system, the resultant security circumvention and suboptimally-defended systems exposed to vulnerabilities. In “Mismorphism: A Semiotic Model Of Computer Security Circumvention (Extended Version)”, Smith et al. examine security circumvention using a model based on semiotic triads. How differential perceptions affect systems has been explored from the perspective of security circumvention in the paper, but it got me thinking about how the same idea can also be explored in settings not necessarily involving security circumvention.

Although not all of these arguments apply directly (they all certainly apply in other ways, more on that in another post, another time perhaps) to the vulnerability we are currently discussing, I briefly touched upon them because all these issues are interrelated, and the larger issue of vulnerability identification/mitigation is better served when such component-issues are discussed together. In essence, understanding the core logic behind each of these arguments and tailoring it to apply to specific contexts might help in better vulnerability detection and mitigation. Plus, anyone looking at the same issues now has a decent starting point on where to find relevant information in case they want to explore these issues further.

That said, in the context of the vulnerability that’s currently being discussed, apart from thinking about langsec (but of course! Again, more on that some other time), further analysis of the first issue listed above concerning developer blind spots could prove quite useful. The primary argument comes from the paper “It’s the Psychology Stupid: How Heuristics Explain Software Vulnerabilities and How Priming Can Illuminate Developer’s Blind Spots” by Oliveira et al.

The learnings from Oliveira’s paper directly play into the remedial measures Bailey touched upon in his post: enforcing organizational coding standards, evaluating the context of each pointer, improving coding practices, etc. Rather than looking at the issue retrospectively, such as in the context of code reviews, Oliveira et al.’s paper outlines how we can prime the developers to minimize such blind spots while coding (of course, code reviews can/should still be done, but increasing the quality of the code is always the primary goal).

Oliveira’s paper explores a new hypothesis that software vulnerabilities arise due to blind spots in developers’ heuristic decision-making processes. Another hypothesis (that neatly dovetails with the former) is also investigated in tandem, as to whether priming software developers on the spot (as opposed to drawing from previous security knowledge), and alerting developers to the possibility of vulnerabilities in real time would be effective in changing developer-perspective on security, eventually making security-thinking a part of developers’ repertoire of heuristics.

This paper points out, quite rightly, that “The frequent condemnation of security education and criticism on software developers, however, do not help to reason about the root causes of security vulnerabilities”.

Psychological research shows that, due to limitations in humans’ working memory capacity,  humans often engage in heuristic-based decision-making processes. Heuristics are simple computational models that help solve problems without needing to consider all the information available. Because of their relative simplicity, heuristics require less cognitive effort, and hence they are an adaptive response to humans’ short term working memory when dealing with complex problems with a large amount of information. In such situations, due to limitations in working memory capacity, humans make “simplified, suboptimal decisions regardless of the rich information available”. We need to consider such cognitive limitations if we want developers to come up with more secure code; security education and/or code reviews alone wouldn’t be effective in making code safer.

Oliveira’s paper proves this primary hypothesis, and suggests priming, as in explicitly cueing developers on-the-spot, as an effective mechanism to eventually incorporate security-thinking as a part of developers’ cognitive processing. One of the ways the paper proposes to do this is to have developer-interfaces (such as IDEs, text editors, compilers etc.) display security information pertinent to the context of the current working scenario.

Naturally, further research needs to be done regarding what specific security information is useful, and what interfaces work best, if there are other/better ways to prime developers etc., but the point here is that more security education and more code reviews alone are not the answer to preventing such vulnerabilities.

One needs to get to the root of the problem - be it addressing systemic insecurity in the coding language, mitigating developer blind spots, or bridging differentials in actor-perspectives.

So why should we care about mechanisms factoring in actor-behavior when code reviews, semantic checkers etc. work just fine?

First, they clearly don’t, at least not well enough (also, maybe things working just fine doesn’t quite cut it for some folks).

Second, this is also what someone dealing with enough vulnerability identification and mitigation might instinctually reason out (but since we technologists tend to trust empirical evidence better, the papers I cited should do the job?). For example, in the context of the current vulnerability, Bailey says the following:

“But, if engineers don't notice this, or if there is no warning message printed by the compiler, or if an IDE is being used that doesn't adequately highlight the warning messages, this can result in critical flaws in software.”

I know for a fact that he hadn’t read Oliveira’s paper before he wrote that (not even sure he has read it beyond the abstract even now). In fact, looking at what happened in the code and how the whole thing played out prompted me to think about how priming could apply here, and then I saw that Bailey reasoned it out the same way too!

So yes, even in the worst case, considering that such mechanisms factoring in actor-behavior would not be useful in any other context (while *I* think that they most certainly would be) - at least a few such subclasses of fairly intractable bugs (like the current one) can be caught/mitigated more effectively.

Third, solving for mitigating a vulnerability at the source would in turn facilitate more effective mechanisms for identifying vulnerabilities. For example, if we identify the primary factors causing such vulnerabilities, we could potentially leverage that knowledge toward building more effective systematic/automatable vulnerability identification mechanisms (yes, a few formal mechanisms currently exist, but their efficacy leaves a lot to be desired, because they’re acting more as band-aids than stemming from addressing the root cause; i.e. they’re often not solving the right problem [1]).

What I mean to say is...  

Hence why maybe... it’s about d*** time we started looking at these issues as more than just failures in coding constructs…

< quietly sashays away and lets Bailey deal with the aftermath of any fires she lit >

Vineetha Paruchuri
M.S., Computer Science
Dartmouth College

Author’s Note: Before all ye grammar pedants come out of the woodwork to get me, the “hence why maybe” thing was intentional. (BTW Bailey, I censored out my own “damn”, thank you. Now don’t censor this “damn”, or the one I just typed; ugh, this is turning so meta). So anyway, any other (grammar) mistakes that were overlooked are totally Bailey’s fault (he seems to take “The Editor” thing a tad too seriously; so go burn him for those if you must; bye now).

[1] “I checked it very thoroughly,” said the computer, “and that quite definitely is the answer. I think the problem, to be quite honest with you, is that you’ve never actually known what the question is.”