PL Perspectives

Perspectives on computing and technology from and for those with an interest in programming languages.

Soundness is something that everyone seems to talk about in program analysis. Indeed, soundness merely means “claim implies truth”—the analysis only tells the truth. Why would anyone accept anything less?

However, this straightforward notion becomes complicated and often counter-intuitive for static analyses. A static analysis establishes properties over all program executions. This immediately introduces confusion, especially when an analysis makes detailed, multi-value claims, instead of merely claiming “the program is (not) ok”. As an interesting case, consider a value-flow computation, i.e., an analysis that computes that variable x holds values v1, v2, …, vN. What is the exact claim of the analysis (and hence the soundness obligation) in this case? Some intuitively straightforward interpretations are:

(A) There is some execution of the program in which x holds v1, some (possibly different) execution in which it holds v2, etc., all the way to vN. [“All these values are realizable”]

(B) In every execution of the program, x can only hold one of v1, v2, …, vN. [“Only these values are realizable”]

Analyses that intend to claim variants of (A) are called must-analyses, while analyses that aspire to claim (B) are called may-analyses. May-analyses are statistically dominant: the vast majority of published static analysis papers intend to compute a superset of all possible values, rather than a guaranteed-realizable subset. The (B) claim is simply the highest-value application of static analysis, i.e., the kind of thing that we usually want to know about a program.

The Trouble with Soundness

The rather surprising fact about soundness is that no realistic, useful whole-program static analysis can soundly make claim (B). Our analyses cannot be telling the absolute truth over the question we most care about, unless they say things so general that they end up useless! The reason is that static analysis cannot model program behaviors that involve unknown code (e.g., dynamically generated/loaded code, binary/native code) or dynamic language features (e.g., reflection, eval, invokedynamic). For instance, the following Java code fragment invokes an unknown method, identified by string methodName, over an object, obj.

Method m = obj.getClass().getMethod(methodName);
m.invoke(obj);

String methodName could be a true run-time value—e.g., it could be read from a file or external resource. Object obj could itself be of a type not available during analysis—e.g., obj could be deserialized from the network and statically typed using an interface or root-of-hierarchy type.

Therefore, for an analysis to be sound, it needs to modify its claims. The typical soundness result qualifies both the kinds of acceptable programs (e.g., precluding those that use dynamic features) and the kinds of acceptable executions (e.g., precluding those that are hard to handle, such as interference by other threads, or throwing exceptions). The modified claim then becomes:

(B’) for programs in language subset Lang’ (of a full language Lang), for all executions in a subset Exec’, x can only hold one of v1, v2, …, vN.

This claim is carefully qualified and indeed can be made soundly, i.e., truthfully, and with a rigorous proof! However, several aspects are less than satisfactory:

  • The qualifications Lang’ and Exec’ eliminate the overwhelming majority of realistic programs “in the wild”. Almost all real-world programs employ dynamic features. For instance, in a recent study, 80% of a 461-program sample of representative Java programs employ reflection. If other dynamic features and native code are included, the percentage of Java programs that are covered soundly by a typical static analysis is effectively zero!

  • It is easy to misinterpret the claim of the analysis, unless one reads carefully the specification of Lang’ and Exec’. Consumers of analysis results are likely to be surprised or misled, e.g., expecting that the analysis gives a guarantee for all program executions or for all programs in a language.

  • Claim (B’) as a model of soundness creates perverse incentives for analysis designers. Covering more language features in a static analysis makes soundness proofs harder or impossible. It becomes easier to have soundness proofs for weaker analyses (which omit some language features, adjusting Lang’ accordingly) than for stronger analyses. Although it would be possible to specify an analysis for more features, yet prove soundness for only a subset of them, this is perceived as a negative (“your analysis is not even sound”) instead of as a positive—e.g., by reviewers of papers.

In fact, the tendency to freely omit language features from an analysis’s handling, while offering (B’-style) proofs of soundness, has led to the introduction of the term “soundy”. Despite the derogatory tone, “soundy” analyses are the current good case of static analyses! They are realistic analyses that handle all “normal” language features soundly.

Defensive Analysis

There is, however, a realistic way to make the strong original claim (B) of soundness. This is the way proposed in our ECOOP’18 “best paper”, on defensive analysis: a new static analysis architecture that aims to address the above shortcomings.

The cornerstone of defensive analysis is a reformulation of the analysis claim. Again, consider an analysis that computes that variable x holds values v1, v2, …, vN. The analysis claim becomes:

(C) the analysis computes results together with soundness markers: if variable x is marked “sound” by the analysis, then the strong soundness property, claim (B), holds. (I.e., in all executions of the program, x can only hold one of v1, v2, …, vN.)

Having soundness claims of the form (C) has several advantages.

  • The soundness theorem applies without (or with very few) language or execution qualifications, thus covering all (or most) real-world programs and behaviors.

  • The qualifications are no longer on the language features used or on the nature of dynamic executions (which are hard to discern in a large program), but on locations readily identifiable in the program text.

  • The extent of soundness is itself an analysis output: the percentage of variables that the analysis can mark “sound”. This is a measurable quantity that allows comparing analyses.

We call this quantity the “coverage” of the analysis. In a more general analysis form, coverage is a measure of program elements (e.g., variables, instructions, functions) that an analysis can mark “sound”.

Notably, claim (C) is just a specialization of the full claim (B). For every analysis that satisfies (C), we can directly produce one that satisfies (B), by merely having any variable without a soundness marker get assigned a special value Top, interpreted as “any possible value”. However, viewing the claim in terms of soundness markers emphasizes the obligation of the analysis to compute its own coverage.

Not All Is Rosy

Of course, this new model of analysis claims is not without its drawbacks. These come in two forms: First, it is challenging to produce an analysis that can (soundly) make claims of the form (C). Second, when the analysis does so, the soundness markers are likely to cover only a minority of the program’s variables, i.e., coverage may be small.

Technical Difficulty

Let’s look at the first issue. The challenge of making claim (C) is to distinguish parts of the program that are certain to stay “unpolluted” from behavior that the analysis cannot model statically. This is a hard goal because dynamic code (e.g., reflection, dynamically loaded classes) can do anything: it can add dynamically-generated subclasses with never-seen methods that get called (via dynamic dispatch or first-class functions) at unsuspecting program points; it can call any existing method or alter fields; worst of all, it can wreak havoc on all parts of the heap reachable from any reference it can access.

To address this challenge, a defensive analysis needs to keep extremely close tabs on where object values have been throughout their tracked lifetime. Consider the case that the analysis wants to place a soundness marker on a program expression x.fld at program instruction i. (Placing markers on expressions that involve fields at specific program points is equivalent to placing them on variables, as in the original formulation of claim (C): it is like adding a new variable at instruction i that just reads the value of the expression.)

For instance, for the analysis to be able to claim that program expression x.fld can only hold value v at instruction i, it needs to have tracked x.fld from an initial assignment (that gives it value v) and to establish several hard properties, throughout all subsequent program paths that reach i:

  • that the possible objects that x holds cannot escape into non-modeled code
  • that these objects cannot have their fld receive values other than v
  • that these objects cannot have their fld receive values from variables that do not themselves have a soundness marker.

These are heavy obligations, and will typically make a static analysis unscalable to large programs. It is certainly possible that future research will find great ways to establish such properties efficiently. For now, defensive analysis makes a specific form of claim (C), in which the soundness markers are implicit: any variable that would not have a soundness marker merely gets assigned an empty set of values. This allows the analysis to be lazy, performing as little work as possible, which results in great scalability, but may reduce coverage.

Effective Coverage

Obtaining realistic coverage is the second issue with the new model of analysis claims. Our current defensive analysis places soundness markers on under half the variables of realistic programs—in fact, just around 45% on average, for a liberal counting of variables. Is it true that some-55% of the variables of these programs can be affected by (adversarial, worst-case) dynamic code or reflection? Possibly yes, but intuitively the estimate seems rather conservative.

The current coverage is certainly good enough for several realistic applications (e.g., optimization) but in the future one would hope for certified-sound handling of the majority of a program’s text. It is an interesting challenge to see which techniques can help coverage rise dramatically. Could there be human-supplied assertions about the behavior of dynamic code? Is there overlap with techniques for secure compilation?

Conclusion

Telling the truth about all program behaviors collectively is hard: to say something useful, instead of highly general platitudes, one has to make assumptions. Conventional assumptions typically qualify the language features a program uses or the allowed dynamic behaviors—but these qualifications exclude the vast majority of realistic programs. A new kind of qualification applies to all programs but requires the analysis to compute the subset of the program for which it promises to tell the truth vs. that for which its estimate may be violated.


Bio: Yannis Smaragdakis is a Professor at the University of Athens. He loves hacking static analyses, especially in Datalog, and is a recipient of an NSF Career award and an ERC Consolidator grant.

Disclaimer: These posts are written by individual contributors to share their thoughts on the SIGPLAN blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGPLAN or its parent organization, ACM.