PL Perspectives

Perspectives on computing and technology from and for those with an interest in programming languages.

This blog has had several recent posts about program synthesis, the task of automatically generating a program that does what we want. Synthesis is an exciting technique on the cutting edge of programming language design and implementation. In this post, I want to get concrete and show what’s involved in building a synthesis tool. By the end of this post, we’ll have a tool that can synthesize simple arithmetic expressions.

Getting started with Rosette

There are several great off-the-shelf frameworks for program synthesis, including the original Sketch language. For this post, we’re going to use Rosette, a language developed by Emina Torlak at UW, that adds synthesis and verification support to the Racket programming language. The nice thing about Rosette is that it’s an extension of Racket, so we’ll be able to use many of Racket’s nice features (like pattern matching) while building our synthesizer.

Following along: the code for this post is available on GitHub. If you’d like to follow along, you’ll need to install Racket and Rosette. Then you’ll be able to run Rosette programs either with the DrRacket IDE or the racket command-line interpreter.

Rosette’s key feature is programming with, and solving, constraints. In a Rosette program, some variables have unknown values we call symbolic constants. Rosette automatically determines the values of these constants according to the constraints we write.

For example, we can try to find an integer whose absolute value is 5:

#lang rosette/safe

; Compute the absolute value of `x`.
(define (absv x)
  (if (< x 0) (- x) x))

; Define a variable y bound to a symbolic constant
(define-symbolic y integer?)

; Solve a constraint saying |y| = 5.
(solve
  (assert (= (absv y) 5)))

This program outputs:

(model
  [y -5])

which is a model (an assignment of values to all the symbolic constants) in which y takes the value -5. The model satisfies the constraints we generated using assert, which in this case was just the single assertion (= (absv y) 5). Rosette found this value by using an SMT solver to solve the constraint. Of course, 5 would also have been a valid value for y — if there are multiple assignments that would work, Rosette chooses one arbitrarily.

Domain-specific languages

We’ve seen how to use Rosette to solve for constants, but program synthesis requires solving for programs. To bridge this gap, our synthesizer will target programs in a domain-specific language (DSL). A DSL is just a small programming language with exactly the features we are interested in.

To keep things simple, we’ll define a trivial DSL for arithmetic operations. The programs we synthesize in this DSL will be arithmetic expressions like (plus x y). To define our DSL, we first specify its syntax, using Racket structures (records):

(struct plus (left right) #:transparent)
(struct mul (left right) #:transparent)
(struct square (arg) #:transparent)

Our language has plus, mul, and square operators. The definitions give names to the fields of each operator, and the #:transparent annotation tells Racket to automatically generate niceties like string representations.

To define our DSL’s semantics, we’ll write an interpreter for our structures. The interpreter takes as input a program, performs the computations that program describes, and returns the output value. We can do this by recursing and pattern matching on the syntax structures we just created:

(define (interpret p)
  (match p
    [(plus a b) (+ (interpret a) (interpret b))]
    [(mul a b)  (* (interpret a) (interpret b))]
    [(square a) (expt (interpret a) 2)]
    [_ p]))

The recursion has a base case [_ p]—in Racket patterns, the underscore matches any value—that simply returns the input program p. This base case handles constants in our programs.

We can already use our interpreter to execute concrete programs in our DSL. For example, (interpret (plus (square 7) 3)) evaluates to 52, as we’d expect.

Synthesis with DSLs

Rosette automatically lifts our interpreter to work even with symbolic constants. For example, if z is bound to a symbolic constant, then evaluating (interpret (square (plus z 7))) returns an expression (* (+ 7 z) (+ 7 z)).

This lifting works no matter which part of the expression is symbolic. For example, we could create an expression that is either an addition or a multiplication, by creating a boolean symbolic constant and branching on it:

(define-symbolic z integer?)
(define-symbolic b boolean?)
(define expr (if b (plus z z) (mul z z)))

Running this expression through interpret returns the expression (ite b (+ z z) (* z z)), where ite evaluates to either its second or third argument depending on whether its first is true.

We now have everything we need to synthesize a simple program! Let’s try to find a program in our DSL that is equivalent to (square z) for every value of z. We use our expr from above to define the candidate programs we want to explore, by passing it as part of the assertion to Rosette’s synthesize form:

(synthesize
  #:forall (list z)
  #:guarantee (assert (= (interpret expr)
                         (interpret (square z)))))

and get back a model as output:

(model
  [b #f])

Our model is telling us that if we set b to false, then expr is equivalent to (square z). If we look back at our definition of expr, we see that setting b to false makes expr evaluate to (mul z z), which is indeed equivalent to (square z). The #:forall argument to synthesize means that it searched for a model that makes the assertion true for every possible value of z.

Sketching

In the example above, we would call the expr variable a sketch. It’s a template for the program we want to synthesize — it defines a set of possible programs, and the synthesizer searches for a program in that set that satisfies our assertions. In this case the set contained only two programs; in general, the more programs we include in the sketch, the more interesting a synthesis problem we can solve (but having more choices also makes synthesis more expensive).

One way to make our sketch more general is to allow it to choose both the operator (expr chose between plus or mil) and the arguments to it. We can do this using Rosette’s choose* form, which returns a single symbolic expression that can evaluate to any of choose*‘s arguments. For example, we could have written the definition of expr above as (choose* (plus z 2) (mul z 2)), which is an expression that can evaluate to either (plus z 2) or (mul z 2) (or equivalently, defines a set of two programs).

Let’s use this idea to synthesize a more interesting example. Suppose we’re given an expression (mul 10 z), and our job is to decompose it into the sum of two other expressions — in other words, find an equivalent expression of the shape (plus ?? ??). We’ll do this by first defining a new function ??expr that evaluates to a set of expressions from our DSL (generalizing the idea of expr above):

(define (??expr a b c)
  (define left (choose* a b c))
  (define right (choose* a b c))
  (choose* (plus left right) (mul left right) left))

In this code, each of left and right can evaluate to any of a, b, or c, and then the return value can evaluate to plus or mul expressions using those values, or to one of those values directly. For example, (??expr z 1 2) evaluates to a set of programs that includes (plus z 1), (plus z 2), (plus z z), (mul z 1), (mul z 2), (mul z z), and so on — any depth-1 expression involving z, 1, or 2, but not to nested expressions.

We can use ??expr and some fresh symbolic constants to define our sketch:

(define-symbolic p q integer?) ; new constants
(define sketch (plus (??expr z p q) (??expr z p q)))

This sketch contains all programs that are of the shape (plus ?? ??), where each ?? (which we call a hole) can be filled with depth-1 expressions involving z, p, and q. The new variables p and q allow the synthesizer to generate some constant values, and so this sketch actually contains an infinite number of programs(!), since p and q can be any integer.

Finally, we can call synthesize to find a program that satisfies our new constraints:

(define M
  (synthesize
    #:forall (list z)
    #:guarantee (assert (= (interpret sketch)
                           (interpret (mul 10 z))))))
(evaluate sketch M)

Here, we’re saving the result of the synthesis to a variable M, and then using Rosette’s evaluate form to apply it to the sketch, so we can see the actual program that was generated rather than a model. We get this output:

(plus (mul 9 z) (plus 0 z))

which is equivalent to (mul 10 z)! This time, the synthesizer chose both the operations to use, and their arguments, including synthesizing the constants 9 and 0 (as assignments to the symbolic constants p and q). Of course, there are many different solutions to this synthesis problem, so Rosette chose one arbitrarily.

Scaling up

We’ve barely scratched the surface of program synthesis in this post. But already, we’ve done something cool: we built our DSL without thinking about synthesis or symbolic reasoning at all, and Rosette gave us those features for free.

The idea of using DSLs and sketching underpins a huge variety of exciting program synthesis tools. A DSL for synthesizing implementations of cryptographic primitives ships in Chrome and Android today. Synthesized network configurations rule out bugs in critical networking infrastructure. A data movement DSL helps synthesize efficient GPU code for machine learning and high-performance computing. DSLs for education can help teach K–12 algebra and give better feedback on CS1 assignments. And there are many applications of synthesis beyond computer science in the world at large.

Each of these applications requires scaling synthesis up beyond the simple examples in this post. If you want to dig in more, there are detailed Rosette tutorials that go into building bigger tools, as well as tutorials for other synthesis frameworks like Sketch.

Bio: James Bornholt is an Assistant Professor in the Department of Computer Science at the University of Texas at Austin. His research is in programming languages and formal methods, with a focus on automated program verification and synthesis.

Disclaimer: These posts are written by individual contributors to share their thoughts on the SIGPLAN blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGPLAN or its parent organization, ACM.