Saturday, April 29, 2017

The RISC-V Files: On Princeton MCM and Linus' Law

Princeton and RISC-V MCM

In the past week, a research team from Princeton's school of engineering released details on flaws they uncovered in the RISC-V memory consistency model (MCM). This is exciting work from Professor Margaret Martonosi and her team, because it demonstrates the value of using automation to uncover design flaws in complex systems. 

This strategy may be similar to the techniques used by commercial tools, such as Tortuga Logic's silicon security analysis utilities. Regardless, advances in capability from both commercial and academic teams have dramatically improved the state of the art in recent years. As a result, these bugs are being uncovered faster, and earlier in the engineering (or ratification) process, than ever before.
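
To make this concrete, memory-model analyses of this kind reason about small "litmus tests" and check which outcomes a proposed model permits. Below is a generic C11 sketch of the classic message-passing pattern; it is an illustration of the general problem, not one of the specific RISC-V tests from the Princeton work. With relaxed atomics, the compiler and a weakly ordered processor are both allowed to reorder the accesses, so the surprising outcome is legal unless the memory model (or explicit acquire/release ordering) forbids it.

    /* mp_litmus.c — generic message-passing litmus test (illustrative only;
     * not one of the RISC-V tests from the Princeton paper).
     * Build with: cc -std=c11 -pthread mp_litmus.c */
    #include <stdatomic.h>
    #include <pthread.h>
    #include <stdio.h>

    atomic_int data = 0;
    atomic_int flag = 0;

    void *producer(void *arg)
    {
        (void)arg;
        /* Relaxed stores: a weak memory model may make these visible
         * to the other core in either order. */
        atomic_store_explicit(&data, 42, memory_order_relaxed);
        atomic_store_explicit(&flag, 1, memory_order_relaxed);
        return NULL;
    }

    void *consumer(void *arg)
    {
        (void)arg;
        /* Relaxed loads: seeing flag == 1 does NOT guarantee data == 42
         * unless the memory model, or explicit fences, forbids the
         * reordering. */
        if (atomic_load_explicit(&flag, memory_order_relaxed) == 1 &&
            atomic_load_explicit(&data, memory_order_relaxed) == 0)
            puts("observed flag before data: the surprising outcome");
        return NULL;
    }

    int main(void)
    {
        pthread_t p, c;
        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }

Automated tools enumerate every outcome a candidate memory model allows for tests like this one, and flag the outcomes that software (compilers, language memory models, lock implementations) implicitly relies on being forbidden.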


Codasip and Linus' Law

To comment on the Martonosi team's findings, the Codasip team released a blog post describing their thoughts in the context of long-term RISC-V reliability and security. While I typically agree with the Codasip team, and have a great deal of respect for their engineering staff, I thought it imperative to comment on one aspect of their article: complex security landscapes are not made shallow when more eyes are focused on them.

This concept, colloquially known as Linus' Law ("given enough eyeballs, all bugs are shallow"), posits that flaws in complex (and open) systems become easier to observe, detect, and resolve as the number of users and engineers of that system increases. While this model does work for quality assurance (stability) purposes, it does not work well for subtleties that impact the security of complex systems.

While there are many reasons why this mantra fails with respect to security models, I'll focus on one example for the purposes of this blog post: Linus' Law largely implies that bugs will reveal themselves.


Security is Not Stability

Linus' Law presumes one of two things will occur to diminish the total number of bugs in a complex system:
  1. Many engineers hunt for flaws in source code
  2. A subset of N users out of T total users will observe and report any given bug

While there are hundreds of engineers working on the Linux code base, they are often constrained to the technology they are focused on improving or implementing. Though these engineers can identify problems within their own ecosystem, they are largely focused on the source code of their implementation, not on the resultant object code or machine code generated from that source, or on the effects their code will have on other parts of a running system (and vice versa). This level of visibility into a complex architecture is extremely challenging to acquire, and even more challenging to maintain.

This is why, while many engineers submit patches to the Linux kernel, only a handful of engineers are authorized to actually approve code for inclusion into each branch of the kernel. Put simply, only a few individuals are capable of observing complex security flaws, and these individuals are largely bogged down by engineering tasks that do not include the overarching analysis of subtle behaviors in the context of security.
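
To give one small, hedged illustration of the gap between source code and generated code (the function and scenario below are hypothetical): the routine appears to scrub a secret before returning, but an optimizing compiler may legally delete the memset as a dead store, so the behavior reviewers approve at the source level may not exist in the machine code at all.

    /* scrub.c — hypothetical example of a source-level cleanup that can
     * vanish from the generated object code. */
    #include <string.h>

    void handle_secret(void)
    {
        char key[32];

        /* ... fill 'key' with sensitive material and use it ... */

        /* This looks safe in code review, but because 'key' is never read
         * again the compiler may remove the memset entirely (dead-store
         * elimination), leaving the secret in stack memory. Functions like
         * explicit_bzero() and memset_s() exist precisely because of this
         * gap between source and machine code. */
        memset(key, 0, sizeof(key));
    }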

Yet, this point describes bugs that can be found prior to inclusion in a kernel release. But what happens when a bug does get through these checks and balances and ends up in the wild? This is where the "many users" part of Linus' Law comes into play. Someone, somewhere, out in production, will observe anomalous behavior. Hopefully, this user (or users) will also report the issue to the kernel team, or to their distribution maintainers. It's fine to presume this will happen, but it will likely only happen if the bug is actually triggered by the user.

In the case of complex security flaws, they are almost never triggered in the wild by accident. Exploiting a complex security flaw usually happens with intent, not arbitrarily. If one piece of a complex set of bugs leading to a critical gap in system security is triggered accidentally, it may never be observed as a flaw impacting security unless the specific chain of flaws is triggered all at once, and in a particular order. This is highly improbable in the real world, and results in a lot of simple bugs either being ignored as irrelevant, or being resolved in the context of stability and never flagged as security related, which affects who applies the patch and how quickly.
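
As a hypothetical sketch of how such a chain can look (this code is not drawn from any real project): each of the two defects below reads like a minor stability nit in isolation, yet they only become a security flaw when an attacker supplies lengths that trigger them together, in the right order.

    /* chain.c — hypothetical example of two "minor" bugs composing into a
     * heap overflow; not taken from any real codebase. */
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    char *assemble(const char *hdr, uint32_t hdr_len,
                   const char *body, uint32_t body_len)
    {
        /* Bug 1: the 32-bit addition can wrap around, so 'total' may be far
         * smaller than hdr_len + body_len. Alone, this looks like a bounds
         * quirk that "never happens" with sane inputs. */
        uint32_t total = hdr_len + body_len;

        char *buf = malloc(total);
        if (buf == NULL)
            return NULL;

        /* Bug 2: the copies trust the caller's lengths rather than 'total'.
         * Alone, this is "just" missing validation. Combined with Bug 1, it
         * writes past the undersized allocation. */
        memcpy(buf, hdr, hdr_len);
        memcpy(buf + hdr_len, body, body_len);
        return buf;
    }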

This is why applications like Ubuntu's whoopsie are imperative: they ensure that even the simplest bugs are not ignored. But it also requires that the team reviewing whoopsie bug/crash reports be capable of evaluating the risk of each flaw, then properly escalating the issue to someone with authority. So, there are still gaps even with this practice in place.

Thus, as we can see, Linus' Law works well to ensure the stability of complex systems, but it is very inefficient at identifying security flaws and guarding users against them.


That Lonesome Road

The real resolution to complex security-related issues is creating a team to perform a unified analysis of each technology used in a system, along with the overarching interactions between the technologies that make up the whole system. Using this model, fewer long-term flaws can make their way into system releases, and the ones that do are more likely to be simple bugs that can be detected using the presumptions in Linus' Law.

In addition, tools like the one built by Professor Martonosi's team, and commercial tools like Tortuga Logic's silicon security utilities, can greatly assist an internal security team, streamlining their workload and reducing errors by making better use of their time.

This path, however, requires a long-term commitment to security, and an understanding that security is not a separate discipline from engineering, but an effect of engineering stable systems. This is because a stable system is one that enforces rigid constraints around how data is accessed, stored, and processed. Insecure systems create arbitrary paths around these constraints, reducing the integrity of the system. Thus, any system with reduced integrity cannot be considered a stable system.

Though it comes at a cost, the positive effects of implementing a security program are long-lasting for both manufacturers and consumers, ensuring greater stability and system integrity not only for end-users, but for the global Internet.

For more information on architectural security analysis, please reach out to Lab Mouse Security. We specialize in architectural security for embedded systems, from wearable IoT, to Industrial IoT, and more! 

Don A. Bailey
Founder and CEO
Lab Mouse Security
