
PL Perspectives

Perspectives on computing and technology from and for those with an interest in programming languages.

Memory safety bugs are the single largest source of critical vulnerabilities in modern software. Recent studies by Microsoft and the Chrome team found that roughly 70% of all critical vulnerabilities were caused by memory safety bugs. And, unfortunately, attackers are exploiting these bugs before we can patch them.

One way to improve software security is to eliminate this whole class of bugs by construction. Safe languages (whether high-level languages or lower-level languages that preserve memory safety) and verification are a great way to do this. Another way, and the topic of this blog post, is to ensure that bugs do not impact security.

The most promising way to ensure that bugs in, say, a third-party library do not become security vulnerabilities is to run the library in a  software sandbox. Software sandboxing, or software-based fault isolation (SFI), uses runtime checks to ensure that the library is confined to its own region of memory—and cannot access the rest of the program memory.

Consider an image decoding library like libjpeg or libpng. With sandboxing, we can restrict this library so it has access to the image it decodes and the bitmap it produces, and that’s it. Or, consider a spell-checking library like Hunspell. With sandboxing, we can restrict this library to just its dictionary and the text it checks. The application benefits from the library’s features, but doesn’t inherit its security flaws. Bugs like the recent out-of-bounds read in libpng would be bugs, not security vulnerabilities.

The idea that software sandboxing can be used to build secure systems from unsafe and untrusted components was present in Wahbe et al.’s original 1993 paper on SFI. Over the last three years, we’ve been working with Mozilla on deploying SFI in production—to sandbox third-party libraries in the Firefox browser.

The challenge when sandboxing libraries. The hard part of retrofitting a large system like Firefox with sandboxing is the last mile: ensuring that we properly verify and sanitize all the control and data flow between Firefox and the now-untrusted library. This is hard because Firefox was written assuming libraries are trusted, so the browser-library control and data flow are both unchecked and tightly coupled. Manually hardening the browser-library interface is slow and error-prone. We tried; others have too. It did not go well.

PL techniques to the rescue! The key to making library sandboxing both feasible and less error prone was to combine several PL techniques into a general sandboxing framework we call RLBox. RLBox uses a static information flow (or tainted) type system to make the library-application trust boundary explicit. This makes it clear—via type errors—when and where developers need to validate untrusted (or tainted) data. It also makes it possible for the framework to automatically insert certain kinds of runtime checks (e.g., to validate that pointers returned by the sandbox always point to the sandbox region). The type system borrows ideas from gradual typing to help developers migrate the code at the library boundary piecemeal and test it along the way.

In the rest of this post we’ll work through an example showing why migrating existing applications to securely use sandboxing is difficult. Then we will show how our RLBox sandboxing framework uses types to make the migration process fast, easy, and secure. We’ll conclude by describing the different domains that can benefit from tainted type systems and frameworks like RLBox, and outlining open problems related to sandboxing that could benefit from a PL perspective.

Retrofitting sandboxing is hard

To understand why migrating a program to eliminate trust in libraries is difficult and error-prone, let’s consider retrofitting the Firefox browser to use a sandboxed libjpeg library to render JPEG images.  Specifically, let’s consider updating the fill_input_buffer JPEG decoder callback function. This function is Firefox code that is invoked by the sandboxed libjpeg library.

 1: void fill_input_buffer (j_decompress_ptr jd) {
 2:   struct jpeg_source_mgr* src = jd->src;
 3:   nsJPEGDecoder* decoder = jd->client_data;
 4:   ...
 5:   src->next_input_byte = new_buffer;
 6:   ...
 7:   if (/* buffer is too small */) {
 8:     JOCTET* buf = (JOCTET*) realloc(...);
 9:     if (!buf) {
10:       decoder->mInfo.err->msg_code = JERR_OUT_OF_MEMORY;
11:       ...
12:     }
13:     ...
14:   }
15:   ...
16:   memmove(decoder->mBackBuffer + decoder->mBackBufferLen,
17:       src->next_input_byte, src->bytes_in_buffer);
18:   ...
19: }

libjpeg calls fill_input_buffer whenever it needs more bytes from Firefox.  Firefox saves the unused input bytes held by libjpeg to an internal back buffer (line 16), and sends these bytes to libjpeg along with the new input bytes.

When sandboxing libjpeg, we need to change this code so that any data returned by libjpeg is treated as unsanitized—data that may have malicious values designed to compromise Firefox. We must modify the Firefox code so that unsanitized data is correctly handled by the host Firefox process:

  • Sanitize jd, otherwise an attacker exploiting a flaw in libjpeg could set jd to an arbitrary value.  This could trick the host application into reading arbitrary memory contents with the read of jd->src on line 2. This gives the sandbox a “read gadget”.
  • Sanitize src, otherwise the attacker could similarly trick the host application into overwriting arbitrary memory contents with the write to src->next_input_byte on line 5 (a “write gadget”), and also use the memmove() on line 16 as a read gadget.
  • Sanitize jd->client_data to ensure it points to a valid Firefox nsJPEGDecoder object, else an attacker could cause type confusion. In C++, this type confusion could allow the attacker to hijack control flow when a virtual method is invoked on this object.
  • Adjust pointer representations for mInfo.err and decoder->mBackBuffer—many SFI systems, including Native Client and WebAssembly, have different pointer representations, so we must translate (or swizzle) pointers accordingly.
  • Ensure that multiple threads can’t invoke the callback on the same image, otherwise we have a data race that results in a use-after-free vulnerability.

Adding these checks manually to the hundreds of Firefox functions that use libjpeg is tedious. Worse, the checks are easy to miss, and if (when) we do miss one—and, as we describe in our USENIX Security 2020 distinguished paper, the checks above aren’t even all that are needed—an attacker could potentially bypass the sandbox.

The upfront engineering effort of modifying the browser this way is huge. Beyond adding security checks, we also have to retrofit all library calls, marshal data to and from the sandbox, and adjust data structures to account for differences such as pointer representation and even the sizes and layouts of data structures—the application binary interface (ABI)—between the application and sandbox.  Only then can we run tests to ensure our retrofitting didn’t break the application.

Luckily, we don’t have to do this manually. Type systems are meant to help with such tasks, and our sandboxing framework, RLBox, takes advantage of precisely this.

Using information flow types for robust library sandboxing

RLBox is a sandboxing framework, implemented as a C++ library, designed to make it easier for developers to securely retrofit library sandboxing in existing applications.  It does this by making data and control flow at the application-sandbox boundary explicit—using types—and by providing APIs to both mediate these flows and enforce security checks across the trust boundary.

RLBox mediates data flow using tainted types—it uses type wrappers to demarcate data originating from the sandbox, and ensure that application code cannot use this data unsafely. Tainted types impose a simple static information flow control (IFC) discipline on the application code. For example, while application code can add two tainted<int>s (to produce another tainted<int>), it cannot branch on such values or use them as indexes into an array. Instead, the application must validate tainted values before it can use them.
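
To make this discipline concrete, here is a minimal, self-contained sketch of how a tainted wrapper can enforce it. This is an illustration of the idea only, not RLBox’s actual implementation, and all the names in it are ours.

// Illustrative sketch only; not RLBox's actual implementation.
template <typename T>
class tainted {
  T val; // value originating in sandbox memory
public:
  explicit tainted(T v) : val(v) {}

  // Arithmetic on tainted values is allowed, but the result stays tainted.
  tainted<T> operator+(const tainted<T>& rhs) const {
    return tainted<T>(val + rhs.val);
  }

  // Deliberately no implicit conversion to T: application code cannot branch
  // on a tainted value or use it as an array index without first validating
  // it through an explicit verifier.
  template <typename Verifier>
  T copy_and_verify(Verifier verify) const {
    return verify(val); // the verifier returns a safe, untainted copy
  }
};

int main() {
  tainted<int> a(2), b(3);
  tainted<int> sum = a + b;   // fine: tainted + tainted yields tainted
  // if (sum > 4) { ... }     // rejected: no comparisons or branches on tainted values
  int safe = sum.copy_and_verify([](int v) {
    return (v >= 0 && v < 100) ? v : 0; // clamp to an application-chosen range
  });
  return safe == 5 ? 0 : 1;
}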

RLBox mediates control flow with explicit APIs for control transfers.  Calls into the sandbox must use sandbox_invoke(sbx_fn, args...).  Callbacks into the application can only use functions registered with the sandbox_callback(app_fn) API.  These APIs also impose a strict data flow discipline by forcing all sandbox function return values, and callback arguments, to be tainted.

As we show next, this tainted-type-driven approach addresses both the security and engineering challenges we mentioned earlier.

Using tainted types to prevent sandbox breakouts

RLBox uses static IFC to (1) identify unsafe control- and data-flows, and (2) disallow unsafe flows by turning them into type errors or by automatically inserting security checks.  Concretely, RLBox automatically sanitizes sandbox-supplied (tainted) pointers to ensure they point to sandbox memory, swizzles pointers that cross the trust boundary, and statically identifies locations where the developer must manually validate tainted data before it is used.

Consider, for example, the JPEG decoder callback from before. RLBox type errors would guide us to (1) mark values from the sandbox as tainted (e.g., the jd argument and src variable on line 2 below) and (2) copy and verify (otherwise tainted) values we need to use (e.g., jd->client_data on line 3 below).

 1: void fill_input_buffer (rlbox_sandbox& sandbox, tainted<j_decompress_ptr> jd) {
 2:   tainted<jpeg_source_mgr*> src = jd->src;
 3:   nsJPEGDecoder* decoder = jd->client_data.copy_and_verify(...);
 4:   ...
 5:   src->next_input_byte = new_buffer;
 6:   ...
 7:   if (/* buffer is too small */) {
 8:     JOCTET* buf = (JOCTET*) realloc(...);
 9:     if (!buf) {
10:       decoder->mInfo.err->msg_code = JERR_OUT_OF_MEMORY;
11:       ...
12:     }
13:     ...
14:   }
15:   ...
16:   size_t nr = src->bytes_in_buffer.copy_and_verify(...);
17:   memmove(decoder->mBackBuffer + decoder->mBackBufferLen,
18:       src->next_input_byte.copy_and_verify(...), nr);
19:   ...
20: }

Whenever we want to convert a tainted type to an untainted one, we need to copy the data out of the sandbox and validate it: on lines 3, 16, and 18 we do this by writing validators (as C++ lambdas passed to the copy_and_verify method).  As we describe in our paper, validators typically fall into one of two categories: preserving application invariants (e.g., memory safety) or enforcing library invariants.  On line 3, for example, we must ensure that decoder points to a valid nsJPEGDecoder object not used by a concurrent thread, while on line 16 we ensure that copying nr bytes won’t read past the mBackBuffer bounds.
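
To give a flavor of what such a validator looks like, here is a hedged, self-contained sketch of the “preserve application invariants” case from line 16: clamp an untrusted length so the copy cannot run past the back buffer. The buffer capacity, names, and clamping policy below are illustrative, not Firefox’s actual code.

// Hedged sketch of a typical validator; the names and sizes are illustrative.
#include <cstddef>
#include <cstring>

constexpr std::size_t kBackBufferCapacity = 4096; // assumed capacity
unsigned char back_buffer[kBackBufferCapacity];
std::size_t back_buffer_len = 0;

// An untrusted byte count is only safe if the copy it drives stays within the
// space remaining in the back buffer. Here we clamp; rejecting the input
// outright is an equally valid policy.
std::size_t verify_copy_len(std::size_t untrusted_len) {
  std::size_t space_left = kBackBufferCapacity - back_buffer_len;
  return untrusted_len <= space_left ? untrusted_len : space_left;
}

void append_untrusted_bytes(const unsigned char* sandbox_bytes,
                            std::size_t untrusted_len) {
  std::size_t nr = verify_copy_len(untrusted_len); // validate before use
  std::memmove(back_buffer + back_buffer_len, sandbox_bytes, nr);
  back_buffer_len += nr;
}

In the RLBox version of fill_input_buffer above, the same logic would live in the lambda passed to copy_and_verify on line 16.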

We must get validators right—a bug in a validator is often a security bug. In practice, though, validators are rare and short: The six libraries we sandboxed in our USENIX Security 2020 paper required 2-14 validators each, and these validators averaged only 2-4 lines of code. Most importantly, by making these validators explicit, RLBox makes code reviews easier: security engineers only need to review these validators.

What’s missing from this code snippet is almost as important: RLBox performs some checks automatically. We don’t write any security checks on lines 2, 5, and 10, for example. Instead, RLBox uses runtime checks to automatically swizzle and sanitize the src, src->next_input_byte, and decoder->mInfo.err pointers so that they point to sandbox memory. RLBox can do this whenever the check is not application specific; beyond pointer swizzling, for example, it automatically null-terminates C strings, bounds array accesses (to the sandbox boundary), and handles machine model conversions.
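
As an illustration of the kind of pointer check RLBox can apply without developer involvement, here is a hedged sketch of swizzling and sanitizing a sandbox-supplied pointer so that it can only land inside the sandbox’s memory region. The region layout and the modulo-based clamping are our own simplification, not RLBox’s actual implementation.

// Hedged sketch; the region layout and clamping scheme are our simplification.
#include <cstdint>

struct sandbox_region {
  std::uintptr_t base; // start of the sandbox's memory in the host address space
  std::uintptr_t size; // size of the sandbox's memory
};

// Swizzle a sandbox-relative pointer (an offset) into a host pointer, forcing
// the result to stay within [base, base + size). A corrupted or malicious
// offset therefore cannot make the host dereference memory outside the sandbox.
void* swizzle_and_sanitize(const sandbox_region& sbx, std::uint32_t sandbox_offset) {
  std::uintptr_t offset = static_cast<std::uintptr_t>(sandbox_offset) % sbx.size;
  return reinterpret_cast<void*>(sbx.base + offset);
}

Because a check like this depends only on the sandbox region, and not on application logic, RLBox can insert it automatically wherever a tainted pointer is used.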

Using tainted types to minimize engineering effort

Manually migrating an application to use library sandboxing is labor intensive, and demands a great deal of specific knowledge about the isolation mechanism. RLBox abstracts away many of these specifics (e.g., tainted types allow us to automatically handle ABI differences between the application and sandboxed library), making migration relatively simple and mechanical.

We still need to change the application code to use RLBox though.  In particular, we need to add a trust boundary at the library interface by turning all control transfers (i.e., library function calls and callbacks) into RLBox calls, and we need to write validators to sanitize data from the library, as we saw above.  Making these changes all at once is frustrating, error-prone—overlooking a single change might suddenly result in crashes or more subtle malfunctions—and hard to debug.

To allow developers to modify application code to use the RLBox API one line at a time, we again borrow ideas from the PL community—this time from gradual typing. Specifically, RLBox lets developers use escape hatches—similar to Haskell’s unsafePerformIO—while migrating their application code, and in turn allows them, at each step, to compile and test the application. Compile-time type errors guide the developer by pointing to the next required code change—e.g., data that needs to be validated before use, or control transfer code that needs to change to use the RLBox APIs. By the end of the process, the application is still fully functional, all the escape hatches have been removed, and the application-library interface has fully migrated to using tainted types.
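
Below is a hedged, self-contained sketch of what an escape hatch looks like, reusing the illustrative wrapper from earlier; RLBox’s real escape hatches are spelled with similarly loud names (e.g., UNSAFE_unverified). The point is only that an unverified read keeps the code compiling and testable mid-migration, and is easy to grep for and replace with a proper validator at the end.

// Hedged sketch extending the illustrative tainted wrapper with an escape hatch.
#include <cstddef>

template <typename T>
class tainted {
  T val;
public:
  explicit tainted(T v) : val(v) {}

  // Temporary escape hatch: strips taint with no validation. Loudly named so
  // it is easy to grep for and remove once migration is complete.
  T UNSAFE_unverified() const { return val; }

  // Final form: taint is only removed through an explicit validator.
  template <typename Verifier>
  T copy_and_verify(Verifier verify) const { return verify(val); }
};

int main() {
  tainted<std::size_t> len(64);
  std::size_t during_migration = len.UNSAFE_unverified();               // step 1
  std::size_t after_migration = len.copy_and_verify([](std::size_t v) { // step 2
    return v <= 4096 ? v : std::size_t{4096}; // assumed application-specific bound
  });
  return during_migration == after_migration ? 0 : 1;
}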

Using tainted types outside of library sandboxing

The security challenges we face when sandboxing libraries are not unique to library sandboxing. Indeed, RLBox’s type-driven approach builds on a lot of prior work on language-based security, especially coarse-grained IFC systems like (H)LIO, Nexus, Aeolus, COWL, and Co-Inflow, which enforce information flow at API boundaries (e.g., to protect sensitive user data in applications like BIBIFI and Civitas). In general, developers have to handle untrusted data and control flow in many domains—and such a tainted-type approach can help. We give four examples.

TEE runtimes: Applications running in trusted execution environments (TEEs), like Intel’s SGX and ARM’s TrustZone, interface with untrusted code by design—TEEs even consider the OS untrusted. Getting this code right is hard. And, indeed, TEE runtimes contain similar bugs: Van Bulck et al., for example, found that most frameworks, across several TEEs, were vulnerable to bugs that RLBox addresses by construction.

OS kernels: Operating system kernels handle untrusted data from user space. Bug-finding tools have found many vulnerabilities in kernels due to unchecked (or improperly checked) user space data (notably, pointers)—going back to MECA at the start of the century. Tainted types, and frameworks like RLBox, could automatically identify where user space data needs to be checked and even perform certain checks automatically (e.g., much like we ensure that sandbox pointers point to sandbox memory, we can ensure that user space pointers point to user space memory). Indeed, Johnson and Wagner’s bug-finding tool even used type inference to find such kernel bugs.

Browser IPC layers: Modern browser architectures separate different parts of the browser into sandboxed processes. Almost all separate the renderer parts—the portion of the browser that handles untrusted user content from HTML parsing, to JavaScript execution, to image decoding and rendering—from the chrome parts—the trusted portion of the browser that can access the file system, network, etc.—and restrict communication to a well-typed inter-process communication (IPC) layer. Like OS kernels, the browser chrome must validate all values coming from untrusted renderer processes; like kernels, browsers have been exploited because of unchecked (and improperly checked) untrusted data. Here, again, tainted types can help—and as a step in this direction, Mozilla started integrating tainted types into the Firefox IPC layer, as part of the IPDL interface description language used to generate boilerplate code for sending and receiving well-typed IPC messages.

Web applications: While much of the IFC community has focused on preventing information leaks in web applications, web applications by design handle untrusted data, and web developers—both server and client-side—need to sanitize user data before calling potentially dangerous APIs. For example, they need to sanitize data before performing database queries (to prevent SQL injection), rendering HTML (to prevent cross-site scripting (XSS) attacks), or invoking external commands (to prevent command injection attacks). Tainted types can help. And (sometimes) they are already used to this end: Google, for example, uses a type-driven discipline to catch SQL injection and XSS bugs server-side. Client-side, browsers are implementing Trusted Types (which are essentially the dual of RLBox’s tainted types) to help developers ensure that strings are always sanitized before being rendered as HTML (and thus JavaScript).

This list is by no means exhaustive; others have similarly observed that tainting can be used to catch and prevent bugs when handling untrusted data (e.g., see Xu et al.). As these ideas are increasingly being used in practice, this is only likely to grow.

Conclusions

Making library sandboxing work in practice required PL techniques. But this is not the first use of PL to address challenges in software sandboxing. Indeed, PL has historically played an important role in the evolution of SFI—from developing compilers (e.g., PSFI on top of CompCert) and verifiers (e.g., RockSalt for NaCl, VeriWasm for WebAssembly, and SFI in general) that provide high-assurance sandboxing, to language designs for sandboxed intermediate representations like PNaCl and WebAssembly, to semantics that can help us build faster and more secure SFI systems for sandboxing code written in arbitrary languages. Undoubtedly it will play an even greater role in the future.

Bio: Shravan Narayan is a PhD candidate in Computer Science at UC San Diego. Deian Stefan is an assistant professor at UC San Diego. They both work at the intersection of security, programming languages, and systems.

Acknowledgements: We thank Michael Hicks and Todd Millstein for their guidance and helpful suggestions on earlier versions of this post. A slightly modified version of this post appeared in a USENIX ;login: article. This post is based on joint work with Craig Disselkoen, Tal Garfinkel, Nathan Froyd, Eric Rahm, Sorin Lerner, Hovav Shacham, and many amazing folks at Mozilla (especially Bobby Holley, Mike Hommey, Tom Ritter, and Peter Van der Beken).

Disclaimer: These posts are written by individual contributors to share their thoughts on the SIGPLAN blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGPLAN or its parent organization, ACM.