Given a program and an assertion as input, the problem of program verification is to find whether all the program’s states that reach the assertion satisfy its condition. Suppose the program was open, i.e., it had calls to functions whose bodies are missing. Open programs can be verified by finding constraints on the input and output of the functions and using them in place of the function calls. Such constraints are referred to as specifications. This problem instance can be found when programs are developed incrementally or verified module by module for scalability. It can also be the case that the functions are third-party libraries with no source code available. In all these applications, we must reason about function calls to functions whose bodies are unavailable.

Automatic open program verification poses two significant challenges. First, the least restrictive specifications are desirable for the applications. When specifications are least restrictive, it will allow for as many implementations of the functions as possible. The notion of least restriction can be logically captured by maximality. Informally, when specifications are maximal, the specification of one function cannot be weakened logically without strengthening the specification of another function. The second challenge is that open programs can have complex control flow such as loops, recursion, or both. Along with specifications, one also has to infer adequate inductive invariants for these control-flow structures, which is known to be challenging by itself.

This post provides a summary of our PLDI’21 distinguished paper “Specification Synthesis with Constrained Horn Clauses”. The technique presented in this paper can infer both maximal specifications and invariants. In the rest of the post, we illustrate the problem with a simple example and then give an overview of the technique

*An Example*

We demonstrate the problem on a simple C-like program shown on the left side of the listing below. An integer x is initialized in the program, then decremented in a non-deterministic loop (i.e., the number of iterations that the loop takes to terminate is unknown). Finally, it is checked if x’s value after the loop terminates does not exceed the return value of function f. However, the body of the function f is missing. The constraint on the output of f such that the assertion is not violated is the specification of f. Notice that the program has a loop. To verify it in Floyd-Hoare logic style, we need to additionally find an adequate inductive invariant for the loop.

int x = 19; while (*) { x = x – 1; } int y = f() assert(x ≤ y); |
∀ x. x=19 ⇒ |

We now describe a useful intermediate representation for program verification. The listing above (right) shows the program’s verification conditions, albeit the loop invariant and the specification are represented by uninterpreted relations **inv**(x) and **f**(y), respectively. Such a representation of programs is referred to as a system of Constrained Horn Clauses (CHCs). A solution to CHCs maps all the uninterpreted relations to some interpretations from a fixed set of predicates (linear integer arithmetic here). These interpretations are such that when the relations are substituted by corresponding interpretations in CHCs, they make each one valid. For example, **inv**(x) and **f**(y) mapped to x <= 19 and y >= 19, respectively, is a solution. So, a solution to CHCs provides invariants and specifications for the program that the CHCs are representing.

*Some solutions are more equal than others*

You can observe that a system of CHCs can have more than one solution. We now characterize the solutions based on their usefulness. For the example, consider **inv**(x) := x ≤ 19 and **f**(y) := false. It can be verified that this is indeed a solution, except that it makes the last implication vacuously valid. For the underlying program, mapping **f**(y) := false implies that the constraint on function f is such that it makes the assertion unreachable. Such a constraint is not very useful, and hence undesirable. Let us call such solutions *vacuous*. Consider another solution **inv**(x) := x ≤ 19 and **f**(y) := y = 19. Unlike the previous one, this is a non-vacuous solution as it does not make any CHC vacuously valid. The specification y = 19 for function f allows it to return only one value (viz. 19). However, f can return additional values (like 42) without violating the assertion with the same invariant x <= 19. Thus, f admits a weaker specification. Unlike it, the specification y ≥ 19 (with the same loop invariant x ≤ 19) cannot be weakened further, and hence is a maximal (and non-vacuous) solution. Such a maximal and non-vacuous solution for the CHCs is preferable for the applications mentioned at the beginning of this post.

CHCs are gaining popularity as an intermediate representation in program verification. Naturally, there are many techniques that can find solutions to CHCs (e.g., 1,2,3,4). However, these techniques can generate vacuous solutions as they are not aimed at solving CHCs representing the specification synthesis problem. The problem of generating non-vacuous specifications can be expressed in a suitable format such that constraint solvers (e.g., for Satisfiability Modulo Theories or Syntax-Guided Synthesis) can solve them. However, they do not guarantee maximality. The recent approach by Albarghouthi et al on maximal specification synthesis aims at finding maximal specifications in a counterexample-guided loop and relies on an external model checker for C. Our approach is orthogonal: it is based on invariant synthesis and works directly at the CHC level. Thus, many existing verification frontends for various languages which generate CHCs may readily use it!

*Our Approach*

The outline of our technique is depicted below. The technique has interactions between two modules: the non-vacuous solver and the maximality checker. As the names indicate, the non-vacuous solver finds a non-vacuous solution to a system of CHCs, and the maximality checker checks if a non-vacuous solution is maximal. When a non-vacuous solution is not maximal, the maximality checker constructs a new system of CHCs. This new system is such that a non-vacuous solution to it is weaker than the solution checked. The new system of CHCs is passed again to the non-vacuous solver. This cycle continues till a maximal non-vacuous solution is found.

The non-vacuous CHC solver module has a novel algorithm to find non-vacuous solutions. The algorithm works in a *guess-and-check* loop. It guesses a candidate solution by exploiting the fact that CHCs are implications. Then, it checks whether the candidate is a non-vacuous solution, i.e., whether all CHCs are non-vacuously valid. If not, it tries to guess new interpretations such that a CHC that is not valid becomes valid. The guessing part of the algorithm alternates between backward and forward reasoning. In backward reasoning, the algorithm attempts to find new interpretations of the relations on the left-hand side of a CHC using the interpretations from the right-hand side. In contrast, forward reasoning finds interpretations of the relations on the right-hand side. Effectively, the backward reasoning is trying to propagate the safety property towards the initial states of the program, like the weakest pre-condition computation. On the other hand, forward reasoning resembles the strongest post-condition inference. The alternation of these techniques lets the algorithm use some part of already learned inductive invariants to discover specifications or vice versa.

The problem of generating non-vacuous specifications can also be expressed in a suitable format such that SMT- and SyGuS-based solvers can solve them. The non-vacuous solver module has support for automatically converting the input CHCs to the format accepted by these solvers to generate non-vacuous solutions. Since these solvers do not guarantee maximality, the generated solutions need to be weakened incrementally using the maximality checker module. However, in our experiments, we found that our non-vacuous CHC solving algorithm performs significantly better than these other tools.

The technique has been implemented in a tool **HornSpec**, which is available for public use.

*Conclusion*

Our PLDI’21 paper proposes a technique to discover maximal and non-vacuous specifications and invariants by treating the problem as an instance of a CHC problem. The technique has a novel algorithm based on forward and backward reasoning to get non-vacuous solutions. Such solutions are iteratively generalized till they become maximal. So far, our technique works on input programs over integer variables and operations in linear arithmetic. Extending the technique to commonly used data structures like arrays and linked lists, as well as supporting bit-precise reasoning is an essential but challenging future work. We are currently working on new versions, and are looking forward to your feedback.

**Bio: ***Sumanth Prabhu is a PhD candidate at Indian Institute of Science, Grigory Fedyukovich, Kumar Madhukar, and Deepak D’Souza are professors at FSU, IIT Delhi, and IISc, respectively.*

**Disclaimer:** *These posts are written by individual contributors to share their thoughts on the SIGPLAN blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGPLAN or its parent organization, ACM.*