Research projects unfold in many different ways. For our POPL 2010 paper From Program Verification to Program Synthesis, named at POPL 2020 as the Most Influential Paper for POPL that year, the research journey was launched by a question at a job interview, enabled by a fortuitous summer internship, scoped by deliberately focusing on the small, and driven, most of all, by curiosity, with a dash of very indirect revenge. This article describes that journey.
When we began working on our paper, two hot topics in PL research were program analysis and correctness. The hugely influential Proof Carrying Code, CompCert, and POPLMark Challenge projects were leading a resurgence in work on program verification. Another major trend was the increasing application of SAT and SMT solvers to PL problems. Despite SAT being NP-complete, many SAT instances can be solved efficiently in practice. By 2009, the Microsoft’s Z3 SMT solver was in use both inside and outside Microsoft for a wide set of problems involving program correctness, such as SDV, verified C, SAGE, and test generation. (Satisfiability Modulo Theories (SMT) extends SAT with special-case reasoning for theories such as bit-vector arithmetic, arrays, etc.)
Sumit had been interested in program analysis and correctness throughout his PhD studies. During Sumit’s job interview talk at Microsoft Research (MSR) in 2005, Nebojsa Jojic proposed an interesting challenge. He likened data flow analysis to equation solving, in which forward data flow analysis seeks to infer the postcondition given the precondition and the program, while backward data flow analysis seeks to infer the precondition given the postcondition and the program. Nebojsa asked Sumit what turned out to be a critical question: What happens if the precondition and the postcondition are known, but the program is unknown? “That would be program synthesis,” responded Sumit while wondering about the intriguing connections.
The Summer Internship
In early 2006, Saurabh had been working with Jeff and with Mike Hicks on a project on C module systems. Seeking a new research direction, Saurabh proposed doing a summer internship with Sumit, who had overlapped with Jeff as a student at UC Berkeley. With Sumit’s approval, Saurabh headed to MSR in the summer of 2006 to work with Sumit. The goal of Saurabh’s internship was to explore program verification using constraint solving, aiming to encode and prove verification conditions as more general SMT constraints. Nikolaj Bjørner and Leonardo de Moura had just started building Z3 that summer, and it quickly became apparent that Z3 should form a key part of the verification approach.
A Choice of Focus
After some initial success, Sumit, thinking back to his job interview, proposed Nebojsa’s problem to Saurabh: Could the verification techniques we were exploring also work if the program statements were unknown?
Saurabh was intrigued, but he immediately faced a conundrum. Several major projects at the time—such as FindBugs at UMD, SDV at Microsoft, and abstract interpretation at Airbus—applied PL techniques to production codebases, an appealing proposition. But, amazing as Z3 was (and is), synthesis plus verification of large programs was out of reach. Instead, we chose to focus on small yet complicated and interesting problems Z3 could scale to already. This choice enabled us to encode verification and synthesis of an entire program as a single SMT query! As it turns out, many useful and practical applications of program synthesis even today involve synthesizing various kinds of small, but entire, programs.
A Curious Pursuit and a Dash of Revenge
We began our synthesis work by building on our earlier verification work. We realized that if we looked at the kinds of verification problems we were passing to Z3 and squinted, encoded program statements seemed very close to, if not indistinguishable from, program constraints.
So, we got the idea that we could create Z3 queries in which programs were replaced by unknowns. But the space of possible program statements is large and broad, giving us no obvious way to encode an arbitrary statement as a Z3 variable. However, inspired by the Sketch synthesis tool, in which synthesis problems were regular programs with holes—unknowns to be solved for—and generators—which describe sets of statements or expressions—we developed the idea of using a template language to represent the space of possible expressions. Templates paved the way for DSLs (domain-specific languages), which have been the cornerstone for scalable and robust synthesis in various recent success stories.
Templates in our approach consisted of a triple <F, D, R>, where F is pre- and postcondition of the function to be synthesized; D is the set of possible program expressions the synthesizer can select from; and R describes the shape of the program, in terms of loops, sequencing, how many local variables are available, and, optionally, an upper bound on the number of times a certain operation can be performed. For example, for Bresenham’s line algorithm, F specifies that the output sequence of points is the discrete best fit to the desired line, D specifies that the valid expressions include variables and basic arithmetic, and R specifies that the program consists of initialization followed by a single loop, can use one temporary variable, and has no operation bounds. From such a template, our synthesis tool creates a synthesis condition, which is an SMT formula that asserts the unknown program, drawn from D in the shape of R, satisfies F and terminates. (See the paper for more details.)
In addition to Bresenham’s algorithm, Saurabh also applied the technique to synthesize sorting algorithms, dynamic programming problems, and one problem that Sumit had some history with: Strassen’s matrix multiplication algorithm. Back in 1995, Sumit took an introductory programming course, coincidentally at the same college Saurabh later attended. In one class, Sumit’s instructor said that two 2×2 matrices can be multiplied with 7 scalar multiplications instead of 8. Sumit asked how that was possible, but he was rebuffed! The instructor told him such premium material was only taught to CS students, but Sumit was in the EE program. Upset at this, Sumit wrote the college entrance exam again and was admitted to CS. Thus, many years later, Sumit had a dash of revenge: Not only does Sumit now hold the premium knowledge of how Strassen’s algorithm works, but he developed something his instructor could never dream of: An algorithm that can synthesize Strassen’s algorithm automatically and, if memory serves, five other variants of it.
For Saurabh, this POPL 2010 paper was the launching point for two companies he started. The first, 20n, employed program synthesis to design societally useful microorganisms. For example, 20n successfully showed how to make Tylenol from sugar by reprogramming yeast to brew medicines instead of alcohol. 20n was listed as an honorable mention in the top-10 from YC demo day 2015 by TechCrunch. His second company, Synthetic Minds, is still in stealth-mode, but Saurabh can reveal that it uses program synthesis in an even deeper way.
For Sumit, the title of this paper summarizes the journey of his career. This paper was written in the middle of his 20-year long research career, the first decade of which focused on verification and the second of which has focused on synthesis (a representative keynote). This POPL 2010 paper inspired several of his future synthesis papers, including the immediately next synthesis paper that he co-authored at ICSE 2010, which won the Most influential paper award at ICSE 2020 (a 7-minute retrospective). Moreover, in 2010, program synthesis wasn’t even mentioned in the call for papers for PL or SE conferences. Sumit lobbied successfully for it to be added, which has helped the synthesis community flourish. POPL 2010 and PLDI 2010 were the first major PL conferences with a session on synthesis.
For Jeff, this paper exemplifies the free flow of ideas, friendships, and strong collaboration across institutions and between academia and the corporate world. Saurabh interned with Sumit twice during the course of this work, and Saurabh was the catalyst for all of us to collaborate together, while laying the groundwork for his dissertation and later success after graduation. The three of us continued collaborating, leading to a subsequent paper in PLDI 2011, and Jeff continues research in program synthesis to this day, all launched by this POPL 2010 paper.
We would like to thank the conference chairs, reviewers, and award committee members for encouraging work that starts small but has big ideas that take time to mature. We also thank the research community for driving the tremendous, inspirational cross-disciplinary advances in program synthesis over the last ten years across various dimensions of program synthesis including techniques, user experiences, and diverse applications (to end-user programming, data science, developer productivity, personalized education, and interpretable AI), leading to synthesis competitions and several courses (Colorado, MIT, Purdue, UCSD, UW, etc).
What Comes Next?
Even a decade later, program synthesis is just getting started. Synthesis still has the unrealized potential to provide huge productivity increases for software developers and to democratize programming by enabling computer users, almost all of whom do not know how to program, to synthesize code to solve their problems. With impressive advances in AGI (such as OpenAI’s GPT-3 deep learning model), we have intriguing possibilities to leverage new classes of techniques that combine logical reasoning and machine learning. We enthusiastically welcome all researchers new to program synthesis to create the next big programming revolution.
Bios: Sumit Gulwani is a computer scientist seeking connections between ideas, between research & practice, and with people with varied roles. He is the inventor of the popular Flash Fill feature in Excel used by hundreds of millions of people.
Saurabh Srivastava is the founder of 20n and Synthetic Minds, both companies aiming to use technology, specifically program synthesis, to augment human capabilities. 20n aimed to do that for biologists, and Synthetic Minds is looking into the broader workforce. To learn more, please email him.
Jeffrey S. Foster is Professor and Associate Chair in the Computer Science Department at Tufts University. His research interests include program analysis and synthesis, type systems, security, and making fine chocolates.
Disclaimer: These posts are written by individual contributors to share their thoughts on the SIGPLAN blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGPLAN or its parent organization, ACM.