PL Perspectives

Perspectives on computing and technology from and for those with an interest in programming languages.

In a previous post, I talked about the potential of program synthesis as an approach to discovering truths about the natural world. As it happens, a team that Armando Solar-Lezama leads and that I am a part of has just won a US NSF Expeditions in Computing grant to work on this topic. The Expeditions program aims to fund “ambitious, fundamental research agendas that promise to define the future of computing and information.” The program has played an important role in PL research. The ExCAPE expedition enabled major progress in program synthesis: the problem of automatically discovering programs from specifications. The ongoing DeepSpec expedition is making the dream of formally verified software ever closer to reality. I am optimistic that our expedition will continue this tradition and produce results that are deeply grounded in PL while also being of fundamental interest to science and the broader society. 

My last post primarily focused on the application dimension of the project. In this post, I will talk about the central technical idea that we aim to pursue: the data- and specification-driven synthesis of neurosymbolic programs, or programs in which traditional code is melded together with neural modules

Bridging Program Synthesis and Deep Learning 

Program synthesis is a hot topic of research in the PL and formal methods (FM) communities. However, it is worth noting that machine learning is also a form of program synthesis, because a computable function learned from data is nothing but a program. Specifically, deep learning techniques can be seen to synthesize parameters for a particular class of low-level programs (neural networks) using a specific synthesis algorithm (stochastic gradient descent).

PL/FM-style synthesis and deep learning have different advantages. Regarding program representations, neural networks can represent arbitrarily complex patterns in data, but this representation is also inscrutable to humans. In contrast, the programs that PL/FM approaches generate are high-level and interpretable. As for synthesis algorithms, gradient-based parameter synthesis scales to extremely high-dimensional input and parameter spaces. On the other hand, synthesis techniques in PL/FM can use automated reasoning and the structure of the programming language to decompose problems and prune large parts of the search space. 

The vision of our project is to bring together these two kinds of approaches. We imagine neurosymbolic program representations that blend together neural networks and compositional, high-level code. We also imagine neurosymbolic program synthesis techniques that simultaneously synthesize the neural and symbolic parts of such programs, using a combination of deep learning and symbolic methods from PL/FM. 

Neurosymbolic program representations

A neurosymbolic program is like a normal program in a compositional, high-level language, except it is also allowed to invoke a set of neural networks as library routines. The inputs and outputs of these neural networks may themselves be constructed using neurosymbolic programs. 

Because they contain high-level, compositional elements, such programs are more human-interpretable, and more easily analyzed using symbolic tools from PL/FM. Also, while learning these programs, we can go by more than data; we can also provide a strong inductive bias for the learner using constraints that encode rich domain knowledge. Doing so can lead to more reliable and generalizable learning. 

Let’s now look at a few examples of neurosymbolic programs. 

Example: Symbolic Composition of Neural Modules

Consider a program

\lambda x. f(N_1(x), \dots, N_k(x))

where f is an expression in a symbolic domain-specific language (DSL), and N_1,…, N_k are neural networks. One could imagine using such a program to represent computations that operate on visual or auditory signals. The neural components of such a composition abstract low-level patterns in these signals into higher-level features. The symbolic component f performs a higher-level computation over these features. 

A few recent papers study the automatic synthesis of such programs. In the paper Houdini: Lifelong Learning from Program Synthesis, Valkov et al. synthesize programs that use higher-order functional combinators such as map and fold to orchestrate a set of neural networks. The neural modules process visual inputs; higher-level symbolic primitives use the outputs of the neural nets to perform algorithmic operations over images. 

In the paper Learning to Infer Graphics Programs from Hand-Drawn Images, Ellis et al. synthesize programs that translate drawings to Latex code. Given an input drawing, such a program uses a neural network to extract likely drawing primitives used to produce the drawing, then uses a symbolic program to generate concrete snippets of Latex. 

Example: Neural Composition of Symbolic Modules

Now think of a program

\lambda x. N(f_1(x),\dots, f_k(x))

where N is a neural network and f_1, …, f_k are expressions from a symbolic DSL. One could imagine using such a composition to perform data-driven prediction tasks over complex, semantics-rich structured data. Here, it is the symbolic functions f_i that are used to featurize the input data, for example, by extracting important substructures from it. The higher-level neural function N represents a way to aggregate the results of these queries and functions so as to maximize predictive power. Once again, the functions N and f_i are generated from data and constraints. 

In their paper Using Program Synthesis for Social Recommendations, Cheung, Solar-Lezama, and Madden give a method to synthesize programs of this form. A synthesized program takes as input a complex stream of events in a social network and predicts which events a particular user would prefer. Given a new event, the method first evaluates a set of (automatically synthesized) symbolic feature functions. A statistical model (an SVM because the paper was published in 2012; today it would be a neural net) is then used to assign weights to these features and generate the final label for the event.

Example: Algebraic Composition of Symbolic and Neural Modules

Finally, consider a program of the form

\lambda x. f(x) \oplus N(x)

which applies an algebraic operator \oplus to combine the outputs of a neural network N and a symbolic component f on the same inputs. Such a composition could capture settings in which a traditional symbolic program, constructed using a priori knowledge, approximately describes the best way to perform a task. The neural component adds a bit of additional, data-driven “spice” to the outputs to this program to boost performance in a particular execution environment. 

Verma et al’s paper Imitation-Projected Programmatic Reinforcement Learning offers an example of a method that synthesizes programs of this form. The programs here are additive compositions of a traditional programmatic controller (for example, a PID controller) and a neural network to control an agent that is exploring an unknown environment. The programmatic controller is reliable but is somewhat conservative. The addition of the neural component gives the agent some extra flexibility in decision-making and leads to higher performance, without completely compromising the guarantees coming from the programmatic controller.

Neurosymbolic Program Synthesis

Having discussed neurosymbolic program representations, we now turn to the problem of devising algorithms that produce them. The neurosymbolic synthesis problem generalizes classical machine learning as well as classical program synthesis. In the former context, one normally discovers models (programs) from noisy input-output examples that approximately describe a program’s behavior. In the latter, the specification that directs a synthesizer is a formal constraint on the syntax and semantics of a program. In the synthesis of neurosymbolic programs, the specification could include all of these things: noisy input-output data, constraints on the architectures of the neural modules, skeletal syntax (e.g., with “holes”) for the symbolic elements, and even formal semantic requirements such as the robustness of the overall program to perturbations to the inputs. 

It seems natural that algorithms for solving this synthesis problem would use a combination of gradient-based learning and symbolic search. Initial explorations of such combinations appear promising. For example, among the papers I mentioned above, Valkov et al. use a type-directed, top-down search through a space of programs to generate neurosymbolic programs in which certain parameters are missing, then uses gradient descent to learn values for those parameters. The Ellis et al. paper used the Sketch program synthesis system to generate programs operating over neurally generated features. The Cheung et al. paper used Sketch to generate symbolic features as well. The Verma et al. paper combined oracle-guided program synthesis with a constrained optimization technique called mirror descent to find optimal neurosymbolic controllers. These methods barely scrape the surface of what’s possible in this space; much is left for future research to discover.

Conclusion

Neurosymbolic program synthesis brings together two very different approaches to automating programming: the PL/FM approach of generating composable, human-interpretable code that can be plugged into a human-driven software engineering process, and the machine learning approach of discovering blackbox code that represents patterns not easily described in language. The problem has many pieces, including: 

  • how to design the DSLs that limit the symbolic parts of programs and the architectures for the neural parts, 
  • how to model uncertainty, 
  • how to effectively combine the combinatorial search over symbolic program components with the gradient-based optimization of neural network parameters, 
  • how to decide when to deploy a specific synthesis algorithm, 
  • how to ensure that any synthesized program satisfies worst-case or statistical guarantees. 

I expect that many exciting results on these topics will come out in the next few years, and I look forward to sharing some of these results through this blog!

Bio: Swarat Chaudhuri is an Associate Professor of Computer Science at UT Austin. His research background is in Formal Methods (program verification and synthesis). These days, he mostly works on problems in the intersection of Formal Methods and Machine Learning. Follow him on Twitter at @swarat.

Disclaimer: These posts are written by individual contributors to share their thoughts on the SIGPLAN blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGPLAN or its parent organization, ACM.