Today’s computer systems are distressingly insecure. As a result, the foundation upon which our information-based society is built is under constant threat, leaving everyone vulnerable to attacks. The causes of this unsatisfactory state of affairs are complex, but at this point mostly historical: our programming languages, compilers, runtime systems, and architectures were designed in an era of scarce resources, and they far too often trade security for efficiency or compatibility. In this post we discuss a promising line of research, which we call secure compilation, that aims to address this historical imbalance.
Programming Languages and Security
Mainstream low-level languages like C and C++ give up even the most basic safety guarantees for the sake of efficiency. The lack of memory safety in these languages leaves programmers bearing the burden of security: the smallest mistake in widely deployed C and C++ code can cause security vulnerabilities with disastrous consequences. Over the past 12 years, around 70% of all patched security vulnerabilities at Microsoft were memory safety issues. The C and C++ languages do not guarantee memory safety, and their compilation chains do not enforce it, because deployed hardware provides no good support for it and software checks can incur high overhead, e.g., 70-80% in SoftBoundCETS. Instead, much weaker, low-overhead mitigation techniques are deployed today, but they are routinely circumvented by practical attacks. Worse, ensuring memory safety alone would not be enough to make C and C++ safe: the standards for these languages specify many other undefined behaviors, and for many of them compilers produce code that behaves arbitrarily. These include, for instance, invalid unchecked type casts and integer overflows, which also often lead to security vulnerabilities.
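To make this concrete, here is a minimal, deliberately buggy C sketch of the kind of mistake at issue; the function and scenario are hypothetical. Because the unchecked copy is undefined behavior, the compiled program is allowed to do anything at all, including reporting a successful login:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical authentication check; the names and scenario are ours.
 * The missing bounds check on the copy is undefined behavior in C, so
 * the compiled code may do anything, including granting access. */
int check_password(const char *input) {
    char buf[16];
    int authenticated = 0;      /* lives next to buf in this frame     */
    strcpy(buf, input);         /* BUG: no bounds check; a long input  */
                                /* overflows buf and may overwrite     */
                                /* `authenticated` or worse            */
    if (strcmp(buf, "open sesame") == 0)
        authenticated = 1;
    return authenticated;
}

int main(void) {
    /* With a long enough input, the overflow can flip `authenticated`
     * even though the password is wrong -- classic stack smashing. */
    printf("%d\n", check_password("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"));
    return 0;
}
```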
Safe(r) Programming Languages
Languages such as Java, C#, ML, Haskell, and Rust are safer, providing type and memory safety by default as well as many useful abstractions for writing secure software (e.g., safe memory management, modules, and interfaces). These languages are an important step forward, but they are still not immune to low-level attacks. For a start, these languages often provide unsafe escape hatches (e.g., unsafe blocks in Rust, Obj.magic in OCaml, and unsafePerformIO in Haskell). Moreover, the run-time systems for these languages are large, complex, and often written in C and C++, which makes them an attractive target for attacks. Finally, all the safety guarantees of the source language are lost when interacting with low-level code, for instance via linked native libraries. So even if a critical program is secure with respect to the semantics of a high-level language, any mistake in the unsafe parts of the code, the run-time system, or linked low-level code can break its security.
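As a sketch of how this plays out, consider the C side of a hypothetical FFI binding (the function name and signature are ours). A single missing check in such a native stub is enough to corrupt memory that the safe language believes it fully controls:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical native helper exposed to a safe language through its
 * foreign function interface (e.g., declared `external` in OCaml or
 * called from an `unsafe` block in Rust); the name and signature are
 * ours. The safe caller allocates a 16-byte buffer and trusts this
 * function to respect that bound. */
void fast_decode(uint8_t *dst /* 16 bytes */, const uint8_t *src, size_t len) {
    /* BUG: trusts the caller-supplied `len` instead of the real size
     * of `dst`. If len > 16 this writes past the buffer and silently
     * corrupts whatever the managed runtime placed next to it --
     * none of the source language's guarantees survive this. */
    memcpy(dst, src, len);
}
```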
Beyond Safety
Other problems arise because today’s mainstream languages provide no way to specify intended security properties, such as data confidentiality. For instance, for cryptographic implementations it is crucial that the running time of the computation does not depend on the values of secret keys; otherwise side-channel attacks can be used to infer the keys. However, even if one manages to achieve such a secret independence property in the source language, an unaware compilation chain is likely to break the property when compiling the code. In fact, producing machine code for modern architectures that can withstand powerful side-channel attacks is a challenging open problem. All software implicitly trusts the abstractions of the hardware, and hardware bugs and loosely specified software-hardware interfaces can enable powerful side channels (e.g., Spectre, Meltdown, ZombieLoad, Fallout, RIDL, or SMoTherSpectre) and can even result in violations of memory safety.
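To illustrate the property at stake, here is a minimal C sketch, with names of our choosing, contrasting a comparison whose timing depends on the secret with the pattern cryptographic code typically uses to avoid that leak. As noted above, even the second version remains secret-independent only if the compilation chain preserves the discipline rather than, say, reintroducing an early exit:

```c
#include <stddef.h>
#include <stdint.h>

/* Variable-time comparison: it returns as soon as a byte differs, so
 * the running time reveals how long a prefix of the guess matches the
 * secret, which an attacker can exploit byte by byte. */
int leaky_compare(const uint8_t *secret, const uint8_t *guess, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (secret[i] != guess[i])
            return 0;
    return 1;
}

/* The usual source-level "constant-time" pattern: no secret-dependent
 * branches or memory accesses; differences are accumulated bitwise and
 * only the final result is inspected. */
int ct_compare(const uint8_t *secret, const uint8_t *guess, size_t n) {
    uint8_t diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= secret[i] ^ guess[i];
    return diff == 0;
}
```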
Towards Secure Compilation
Secure compilation is an emerging field that puts together advances in programming languages, security, compilers, systems, formal verification, and hardware architectures. Secure compilation aims to devise compilation chains that eliminate many of the security attacks above, and that allow sound reasoning about security properties in the source language. In particular, the secure compilation field targets:
- Identifying precise security goals and attacker models. Since there are many interesting security goals and many different kinds of attacks to defend against, secure compilation is very diverse. Secure compilation chains may focus on providing (some degree of) type and memory safety for unsafe low-level languages like C and C++ (e.g., HexType, SoftBoundCETS), or on providing mitigations that make exploiting security vulnerabilities more difficult (e.g., control-flow integrity). Other secure compilation chains use compartmentalization to limit the damage of an attack to only those components that encounter undefined behavior (e.g., via software fault isolation), or to enforce secure interoperability between code written in a safer language and the malicious or compromised low-level code it links against. Yet another kind of secure compilation tries to ensure that compilation and execution on a real machine do not introduce side-channel attacks.
- Designing secure languages. Better designed programming languages and language features can enable secure compilation in various ways. New languages such as Rust provide safer semantics, and compiler updates for old unsafe languages can turn undefined behavior into guaranteed errors. Components or modules in the source language can be used as units of compartmentalization in the compilation chain. The source language can also make it easier to specify the intended security properties. For instance, explicitly annotating secret data that external observers or other components should not be able to obtain (maybe indirectly through side channels) may give a secure compilation chain the freedom to more efficiently handle any data that is not influenced by secrets.
- Devising efficient enforcement and mitigation mechanisms. An important reason for the insecurity of today’s compilation chains is that enforcing security can incur prohibitive overhead or significant compatibility issues. To overcome these problems, the secure compilation community is investigating various efficient security enforcement mechanisms, such as statically checking low-level code, compiler passes, software rewriting (e.g., software fault isolation; a minimal sketch of this idea appears after this list), dynamic monitoring, and randomization/diversity. Another key enabler is the emergence of new hardware features that support efficient security enforcement: checks on pointer dereferencing (e.g., Intel MPX, Hardbound, WatchdogLite, Oracle SSM, HWASAN, or ARM MTE), protected enclaves (e.g., Intel SGX, ARM TrustZone, Sanctum, or Sancus), capability machines (e.g., CHERI), or micro-policy machines (e.g., PUMP, Dover CoreGuard). An important question is how these or future hardware features can be used to provide security guarantees at the source-language level efficiently and effectively.
- Developing effective formal verification techniques for secure compilation chains. Criteria for secure compilation are generally harder to prove than compiler correctness. As an example, showing a common criterion for secure compilation, called full abstraction, usually involves translating any low-level context attacking the compiled code into an equivalent high-level context attacking the original source code. Another example is preservation of secret-independent timing in the presence of side channels, as required for “constant-time” cryptographic implementations, which requires more complex simulation proofs than compiler correctness does. Finally, scaling such proofs up to even a simple compilation chain for a realistic language is a major challenge that requires substantial proof engineering in a proof assistant.
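To give a flavor of the enforcement mechanisms mentioned above, here is a minimal C sketch of the address masking at the heart of software fault isolation; the region size, names, and layout are ours, and in a real system the mask is inserted by a compiler pass or binary rewriter rather than written by hand:

```c
#include <stdint.h>
#include <stdio.h>

/* Minimal sketch of software fault isolation (SFI) address masking;
 * names and sizes are ours. The untrusted component gets a fixed
 * power-of-two region, and every store it performs goes through a
 * mask, so whatever address it computes, the write stays in bounds. */

#define SANDBOX_BITS 16
#define SANDBOX_SIZE (1u << SANDBOX_BITS)            /* 64 KiB region */

static uint8_t sandbox[SANDBOX_SIZE];

/* In a real SFI system this masking is inserted before each store by
 * the compilation chain, not called voluntarily by the component. */
static inline void sandboxed_store(uint32_t offset, uint8_t value) {
    sandbox[offset & (SANDBOX_SIZE - 1)] = value;    /* mask, then store */
}

int main(void) {
    /* Even a wildly out-of-range offset cannot escape the sandbox:
     * 0xDEADBEEF is masked down to an in-bounds offset. */
    sandboxed_store(0xDEADBEEFu, 42);
    printf("%u\n", sandbox[0xDEADBEEFu & (SANDBOX_SIZE - 1)]);
    return 0;
}
```

Because the mask costs roughly one instruction per store, the untrusted component cannot write outside its region no matter what addresses it computes, which is what makes it usable as a unit of compartmentalization.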
Community
Programming languages research can play a big role in secure compilation. A strong community is quickly crystallizing around the workshop on Principles of Secure Compilation (PriSC), organized every year together with POPL since January 2017. PriSC is an informal one-day workshop without proceedings, offering a forum for presenting and discussing recent and ongoing work. The previous PriSC edition, organized with POPL 2019 in Cascais, Portugal, brought together 54 participants, mostly PL researchers and enthusiasts, but also members of the security and systems communities with an interest in secure compilation. The next PriSC edition will be held with POPL 2020 in New Orleans.
In addition, a new Dagstuhl seminar series (2018, 2020) aims to identify interesting research directions as well as open challenges and to bring together the various communities interested in secure compilation. The secure compilation community includes, for instance, researchers building secure compilation chains, designing security enforcement and attack-mitigation mechanisms in both software and hardware, and developing formal verification techniques for secure compilation.
Catalin Hritcu is a researcher at Inria Paris, where he works on security foundations and formal verification. He was awarded an ERC Starting Grant on formally secure compilation and is also involved in the design of the F* verification system.
David Chisnall is a researcher at Microsoft Research in Cambridge, UK. He is a developer of the CHERI capability machine, a former member of the FreeBSD Core Team, the maintainer of the GNUstep Objective-C runtime, and a long-time contributor to the LLVM project.
Deepak Garg is a tenured faculty member at the Max Planck Institute for Software Systems (MPI-SWS) in Kaiserslautern and Saarbrücken, Germany. He is broadly interested in security, programming languages, formal logic, and systems.
Mathias Payer is a security researcher and assistant professor at EPFL in Switzerland, leading the HexHive laboratory. His research focuses on protecting applications in the presence of vulnerabilities, with a focus on memory corruption and type violations.
Disclaimer: These posts are written by individual contributors to share their thoughts on the SIGPLAN blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGPLAN or its parent organization, ACM.