PL Perspectives

Perspectives on computing and technology from and for those with an interest in programming languages.

Have you ever found yourself debugging some code, where the input that triggers the bug is huge? Maybe you’re analysing data from a log file, and the program you’re using to do the analysis is crashing. The log file is massive, and you’ve no idea why the crash occurs. Often it’s easier to first cut the input down to size before you start debugging so that the specific problem is more visible. This is tedious to do by hand, but fortunately there are automated tools to help with this, called test-case reducers, or just reducers for short.

As well as being useful for inputs found in the wild, reducers are even more useful, indeed almost essential, when proactively looking for bugs using randomized testing. Randomized testing is excellent at finding bugs, but randomly-generated test cases tend to be large and hard to understand, usually containing a lot of irrelevant noise. A reducer can remove this noise, making the test cases easier to read, and thus the bug easier to understand.

We — the authors of this post — are really keen on test case reduction, and we think you should be too! In this post we’ll give a brief overview of how test case reduction works, and in follow-on posts we’ll talk about some interesting practical applications of test case reducers, as well as areas that present open research challenges.

A brief overview of test case reduction

Let’s assume we have some program and a test case that triggers a bug in it. A test case reducer’s job is to find a smaller and simpler test case that still triggers the bug. Ideally the resulting test case should be simple enough that it’s useful for understanding the root cause of the bug.

From a user’s perspective, the inputs to a test case reducer are:

  1. A procedure, triggersBug(testCase), for checking whether a particular bug in a program is triggered when the program is executed on a test case. This is typically a script that runs the program on the test case and then checks for symptoms of the bug, such as whether the program’s return code is non-zero, whether a segfault has occurred, or whether some specific error message is written to a log file.
  2. An initial test case that triggers the bug, i.e. a test case initial for which triggersBug(initial) holds.

The output from the test case reducer is a reduced test case that triggers the bug, i.e. a test case final that is simpler than initial, such that triggersBug(final) also holds.
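In practice, triggersBug is usually a small wrapper script. A minimal sketch in Python (the command and error message here are placeholders; real checks are often shell scripts tailored to the specific bug):

```python
import subprocess

def make_triggers_bug(program, symptom):
    """Build a triggersBug-style check for `program` (a command line;
    hypothetical here): the bug counts as triggered if the program
    exits with a non-zero return code, or if `symptom` appears in
    its standard error output."""
    def triggers_bug(test_case_path):
        result = subprocess.run(program + [test_case_path],
                                capture_output=True, text=True)
        return result.returncode != 0 or symptom in result.stderr
    return triggers_bug
```

A check like this must be fast and reliable: the reducer will run it hundreds or thousands of times.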

Most reducers implement an algorithm along the lines of this pseudo-code:

bestTestCase = initialTestCase;
while (keepTrying()) {
  for (evenSimplerTestCase : simpler(bestTestCase)) {
    if (triggersBug(evenSimplerTestCase)) {
      bestTestCase = evenSimplerTestCase;
      break;
    }
  }
}
return bestTestCase;

This algorithm is parameterised by two procedures:

  • simpler(testCase) returns a set of test cases that are simplified versions of testCase. These are candidates for becoming the new best test case, if they turn out to still trigger the bug. An example simplification might be removing some lines from a file, or replacing a node in a tree data structure with one of its children.
  • keepTrying() determines when the reduction process should terminate. This could be because e.g., the reducer has run out of simplifications to make, reduction has hit a built-in time limit, or the user has gotten impatient and terminated the process.

The above pseudocode omits the fact that the reducer will need to maintain state to keep track of which simplifications have been attempted so far, in order to guide which ones to try next and determine when to stop. There are also many opportunities for optimizing the process. For example, the order in which test cases are proposed by simpler() can have a substantial performance impact, and the results of triggersBug() can be cached to avoid needlessly running the program under test if the same test case is considered multiple times during a reduction run.
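Putting these pieces together, here is a minimal Python sketch of the loop, including the triggersBug cache just mentioned (the names and the termination policy are illustrative, not taken from any particular tool):

```python
def reduce_test_case(initial, triggers_bug, simpler, keep_trying=lambda: True):
    """Greedy reduction loop. `simpler` yields candidate simplifications
    of a test case; `triggers_bug` results are cached so the (possibly
    slow) program under test never runs twice on the same candidate."""
    cache = {}

    def check(test_case):
        key = repr(test_case)
        if key not in cache:
            cache[key] = triggers_bug(test_case)
        return cache[key]

    best = initial
    improved = True
    # Stop when no candidate improves on the current best (i.e. we have
    # run out of simplifications), or when keep_trying() gives up.
    while improved and keep_trying():
        improved = False
        for candidate in simpler(best):
            if check(candidate):
                best = candidate
                improved = True
                break
    return best
```

In a real reducer, simpler would propose many candidates at each step, and as noted above, the order in which it proposes them matters a great deal.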

Test case reducers can be broadly categorised as domain-agnostic or domain-specific.

Domain-agnostic reducers

A domain-agnostic reducer is one that can be applied to essentially any test case format, by treating a test case as a series of bytes and assuming nothing more about its expected structure.

The most famous test case reduction algorithm, Delta Debugging, is domain-agnostic. It was introduced by Zeller in his 1999 paper “Yesterday, my Program Worked. Today, it Does Not. Why?” as a method for understanding breaking changes in programs, and then adapted to work as a test case reduction algorithm in Zeller and Hildebrandt’s “Simplifying and Isolating Failure-Inducing Input” paper.

Delta debugging works by systematically attempting to delete contiguous sequences of lines (or, alternatively, bytes) from the test case. The algorithm initially tries to delete large chunks, in the hope of quickly eliminating most irrelevant parts of the test case, gradually considering successively smaller chunks until deleting any single line (or byte) stops the bug from triggering.
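A rough Python sketch of this chunk-deletion strategy (a simplification: the real ddmin algorithm also considers complements of chunks and has carefully analysed worst-case behaviour):

```python
def delete_chunks(lines, chunk_size):
    # Yield copies of `lines`, each with one contiguous chunk removed.
    for start in range(0, len(lines), chunk_size):
        yield lines[:start] + lines[start + chunk_size:]

def ddmin_style_reduce(lines, triggers_bug):
    """Try deleting large chunks first, halving the chunk size until
    no single-line deletion keeps the bug triggering."""
    chunk = max(len(lines) // 2, 1)
    while True:
        progress = True
        while progress:
            progress = False
            for candidate in delete_chunks(lines, chunk):
                if triggers_bug(candidate):
                    lines = candidate
                    progress = True
                    break
        if chunk == 1:
            return lines
        chunk = max(chunk // 2, 1)
```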

An example state-of-the-art implementation of Delta Debugging is Picire.

A domain-agnostic reducer has the advantage of being applicable across many domains, but its performance (in terms of the quality of reduced test cases) may be limited when applied to highly structured data.  For example, suppose we have found a program that crashes a C compiler, and that the program contains a three-argument function foo:

int foo(int a, int b, int c) {
  ...
}

Let’s say that at least one call to foo must be present in the program for the crash to occur, but that the reducer manages to delete enough statements from the body of foo such that the b parameter is no longer referenced. It would be nice if the reducer could go further and get rid of the irrelevant parameter, but this would require changing the function’s signature and patching up all calls to the function so that they do not supply an argument for b. These changes cannot be achieved via line-based deletions; they require domain-specific knowledge about functions in the C programming language.
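To make this concrete, here is a toy Python sketch of the kind of coordinated edit required, using naive regex matching (illustrative only: it would break on nested calls, string literals, and much else that a Clang-based tool handles properly via the AST):

```python
import re

def drop_parameter(source, func, index):
    """Drop the index-th parameter/argument of `func` everywhere it
    is declared or called. Naive: assumes no nested parentheses,
    strings, or line breaks inside argument lists."""
    def rewrite(match):
        args = [a.strip() for a in match.group(1).split(",")]
        del args[index]
        return func + "(" + ", ".join(args) + ")"
    return re.sub(re.escape(func) + r"\(([^)]*)\)", rewrite, source)

# The hypothetical scenario from the text: parameter b is unused.
before = "int foo(int a, int b, int c) { return a + c; }\n" \
         "int x = foo(1, 2, 3);"
after = drop_parameter(before, "foo", 1)
```

The point is that the definition and every call site must change together; no sequence of independent line deletions achieves this.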

Domain-specific reducers

Some reduction algorithms and tools can take advantage of domain-specific information, and others are crafted to work exclusively in certain domains. For instance:

Hierarchical Delta Debugging is a refinement of Delta Debugging that takes both a test case to reduce and a description of the test case’s input format (as a context-free grammar). By exploiting the grammar it can remove and simplify large, structured chunks of data, without wasting time making deletions that are guaranteed to make the test input invalid. Exploiting grammars also allows simplifications that would be impossible to find with standard Delta Debugging, as they cannot be expressed as single deletions of contiguous lines. The idea was originally proposed by Misherghi and Su in “HDD: hierarchical delta debugging” in 2006.
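To give a flavour of grammar-aware reduction, here is a toy Python sketch that deletes whole subtrees of a parse tree (a simplification: actual HDD applies Delta Debugging level by level over the tree):

```python
def serialize(tree):
    # A parse tree is a nested list of lists; leaves are token strings.
    if isinstance(tree, list):
        return "".join(serialize(child) for child in tree)
    return tree

def prune_candidates(tree):
    """Yield trees with one subtree deleted, at any depth. Deleting
    whole syntactic units keeps candidates plausible with respect to
    the grammar, rather than cutting across token boundaries."""
    if not isinstance(tree, list):
        return
    for i in range(len(tree)):
        yield tree[:i] + tree[i + 1:]
        for smaller in prune_candidates(tree[i]):
            yield tree[:i] + [smaller] + tree[i + 1:]

tree = ["a", ["b", "c"], "d"]   # parses the token string "abcd"
```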

Property-based testing libraries are used to generate instances of domain objects for testing arbitrary code. As a result they often need reducers capable of handling specific types that cannot easily be represented as sequences of bytes (e.g. a balanced tree, or a record of user data). Such a reducer is typically written by a user of the library, although some property-based testing libraries provide generic methods. For example, QuickCheck for Haskell offers generic reducers for algebraic data types that use their uniform representation as tagged trees of data. This allows simplifying transformations such as replacing a subtree of a data structure with a leaf node. It works well as long as invariants on the data type do not need to be maintained (e.g. replacing a node with a leaf would likely destroy a balancing-related invariant).

Hypothesis, a property-based testing tool for Python originally created by David, uses an approach called internal reduction. This involves a domain-specific reducer that operates on the choices made during test-case generation. By perturbing and simplifying these choices, it runs the generator again and again to search for shorter interesting test cases. Because reduced test cases are produced by the test case generator, they automatically satisfy any invariants that the generator ensures (see “Test-Case Reduction via Test-Case Generation: Insights from the Hypothesis Reducer” for details, or Hypothesis: A new approach to property-based testing for an overview of Hypothesis).
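The QuickCheck-style generic shrinking described above might be sketched in Python for a toy tree type as follows (the representation is invented for illustration, and note that, as discussed, nothing here preserves an invariant such as balance):

```python
def shrink_tree(tree):
    """QuickCheck-style generic shrinking sketch for a toy tree type:
    a leaf is an int, an internal node is a (left, right) pair."""
    if isinstance(tree, tuple):
        left, right = tree
        yield left                 # replace the node with a child subtree
        yield right
        for smaller in shrink_tree(left):
            yield (smaller, right)
        for smaller in shrink_tree(right):
            yield (left, smaller)
    elif tree != 0:
        yield 0                    # shrink leaf values towards zero
        yield tree // 2
```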

C-Reduce is a reducer that specifically targets C and C++ programs. It can be applied to arbitrary C and C++ code, and is particularly useful for reducing large programs generated by compiler fuzzing tools such as Csmith (paper, code) and YARPGen (paper, code, recent SIGPLAN blog post).  C-Reduce uses a combination of (a) domain-agnostic transformations, such as deleting chunks of text (in the style of Delta Debugging), (b) somewhat domain-specific transformations, such as replacing a region of the form {} with its interior, and (c) highly domain-specific transformations that use the Clang compiler framework to make sophisticated simplifications to C and C++ programs. Interestingly, C-Reduce can be invoked in a mode where transformations of type (c) are disabled, in which case it can be applied to programs in arbitrary languages. It might not sound like this would work well, but actually transformations of type (b) depend only on the language having C-like features (such as enclosing blocks in the { and } delimiters) – C-Reduce has been reported to work remarkably well for languages such as Java, JavaScript and OpenCL. Check out the C-Reduce paper and tool.
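Transformations of type (b) are easy to sketch. For example, here is a candidate generator that replaces one balanced {...} region with its interior (illustrative Python; C-Reduce’s actual passes are more sophisticated):

```python
def brace_spans(s):
    # Indices of matching { ... } pairs, innermost pairs first.
    stack, spans = [], []
    for i, ch in enumerate(s):
        if ch == "{":
            stack.append(i)
        elif ch == "}" and stack:
            spans.append((stack.pop(), i))
    return spans

def peel_braces(s):
    """Yield candidates in which one balanced {...} region has been
    replaced by its interior."""
    for start, end in brace_spans(s):
        yield s[:start] + s[start + 1:end] + s[end + 1:]
```

Nothing here is specific to C or C++, which is why such passes transfer to any language with C-like block syntax.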

GraphicsFuzz, of which Alastair is the main author, is a suite of tools for finding bugs in compilers for graphics shading languages, such as the OpenGL shading language. It does this by randomly applying semantics-preserving transformations to an original graphics program, producing a larger, more complex, but equivalent program that should cause the same image to be rendered. GraphicsFuzz has an associated reducer that knows exactly how to revert the transformations that were randomly applied. This means that when a bug is found by transforming an original program, the reducer can search for a minimally-transformed version of the original program that still triggers the bug. This is extremely domain-specific: the reducer doesn’t just know about the intricacies of the OpenGL shading language; it also knows all about the particular transformations that GraphicsFuzz can apply, and how to reverse them. Check out this series of blog posts about GraphicsFuzz, a research paper on GraphicsFuzz and its reducer, and a recent experience report about how the tool has been used in industry.
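The idea of reducing the list of applied transformations, rather than the transformed program itself, can be sketched generically (all names here are hypothetical; GraphicsFuzz’s real reducer knows much more about its specific transformations):

```python
def minimise_transformations(original, transforms, triggers_bug, apply_all):
    """Sketch of reduction by reversal: instead of editing the buggy
    program directly, greedily drop transformations from the list that
    produced it, keeping only those the bug actually needs.
    `apply_all(original, ts)` rebuilds a program from a transformation
    list; `triggers_bug` checks the rebuilt program."""
    kept = list(transforms)
    changed = True
    while changed:
        changed = False
        for i in range(len(kept)):
            candidate = kept[:i] + kept[i + 1:]
            if triggers_bug(apply_all(original, candidate)):
                kept = candidate
                changed = True
                break
    return kept
```

Because every candidate is built by re-applying transformations to the original program, every candidate is a valid program by construction.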

Summary

Test case reducers are fantastic tools that can make your life easier by simplifying complex inputs that trigger bugs, speeding up debugging. We have given a brief overview of how test case reducers work, the difference between domain-agnostic and domain-specific reducers, and examples of various practical tools for test case reduction. In a follow-on post we plan to look at some interesting uses of test case reducers that go beyond the domain of bugs.

Bio: Ally Donaldson is a Professor at Imperial College London, and is the main author of GraphicsFuzz and spirv-fuzz, two metamorphic fuzzing tools for graphics shader compilers, each with an associated test case reducer. David MacIver is the main author of Hypothesis, a popular property-based testing tool for Python that makes heavy use of test case reduction.

Disclaimer: These posts are written by individual contributors to share their thoughts on the SIGPLAN blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGPLAN or its parent organization, ACM.