PL Perspectives

Perspectives on computing and technology from and for those with an interest in programming languages.

Outside my life as a researcher, I run advanced trainings for professional software engineers. One thing I stress in my teaching is the importance of learning deep concepts over superficial instances. For instance, I teach programmers to tune out the noise of refactoring catalogs and learn how most familiar refactorings instead follow from a few primordial algebraic laws, such as how the identity X2*A = XA * XA, which flows easily from the axioms of high-school algebra, corresponds to replacing a function that accepts a boolean parameter with two functions. When I first started teaching working professionals, all refactorings I knew how to derive stemmed from the single-step reductions of a programming language, and from the algebraic laws of sums, products, and exponentials (functions).

Last year, when I was building my system that automatically generates programming tools from language semantics, I learned about defunctionalization, a transformation that replaces higher-order functions with equivalent first-order ones. Prof. Olivier Danvy had used it to show the correspondence between many paradigms for expressing programming language semantics. I quickly realized this transformation’s ubiquity as a refactoring technique, being implicitly used in tasks ranging from making a recursive function iterative to making a single-machine program distributed. And yet it was completely unheard of in the refactoring literature — and didn’t fit into my previous mental framework.

In this post, I’d like to give a summary demonstrating the technique and its uses, and how learning to recognize instances of defunctionalization and refunctionalization can help programmers more easily understand the tradeoffs in design choices. As a technique born in compilers research and nourished by formal semantics, defunctionalization is a prime example of how insights from PL research may be translated into practical programming advice.

When There are Too Many Functions

Consider the basic filter function, which inputs a list and outputs the sublist satisfying some condition. Every functional-programming student learns how to write something like the following code.

Version 1 (higher order):

filter :: (Int -> Bool) -> [Int] -> [Int]
filter f []     = []
filter f (x:xs) = if f x then x : filter f xs else filter f xs


filter isOdd          [1, 2, 3, 4, 5]
filter isPrime        [1, 2, 3, 4, 5]
filter (\x -> x < y)  [1, 2, 3, 4, 5]

This is a higher-order function, accepting the condition to check as a function passed by the caller. The code is totally agnostic to the function passed in, of which there may be infinitely many. Yet the filters you encounter in daily life do not work like this. When you go shopping online, you may click checkboxes to limit your search to shirts, pants, and boots, but you may not provide an arbitrary filter condition. In general, higher-order programming is challenging in today’s world of distributed applications; remote procedure calls with higher-order functional arguments would require serializing those functions, which is not easy to do safely or efficiently.

As an alternative, the programmer can design a data type to represent all filter functions of interest. Designing this type is a lot like designing a programming language: the programmer needs to decide what the filtering primitives are and how to combine them. Here’s one possible answer:

Version 2 (defunctionalized):

data Filter = IsOdd
            | IsPrime
            | LessThan Int
            | And Filter Filter

apply :: Filter -> Int -> Bool
apply IsOdd x   = isOdd x
apply IsPrime x = isPrime x
apply (LessThan y) x = x < y
apply (And f g) x = apply f x && apply g x

filter :: Filter -> [Int] -> [Int]
filter f [] = []
filter f (x:xs) =
  if apply f x then
    x : filter f xs
    filter f xs


filter IsOdd        [1, 2, 3, 4, 5]
filter IsPrime      [1, 2, 3, 4, 5]
filter (LessThan y) [1, 2, 3, 4, 5]

Version 2 is longer, yet it has the advantage that a Filter may be selected on one machine and used on another. It may seem that this is yet another problem which makes programming for many machines intrinsically more difficult than programming for one. But, in fact, the thought put into the design of this filter language is unnecessary. The above code can be automatically derived from Version 1.

Enter Defunctionalization

The above change is an example of defunctionalization, turning a higher-order function into a first-order one. In this section, we’ll replace the ad-hoc work of the previous change into a general and automated transformation.

Most of the work in this example was coming up with the cases in the Filter datatype and giving their implementations. These can be determined automatically from the code, but not the code we gave above defining the filter function. Rather, the insight of defunctionalization is to find all actual uses of the higher-order function. Each distinct use of the filter function yields a new case in the Filter datatype. The first two, IsOdd and IsPrime, are easy; they’d come from uses like these:

filter isOdd   [1, 2, 3, 4, 5]
filter isPrime [1, 2, 3, 4, 5]

The LessThan case is a little more interesting, as it takes an Int parameter.  Though it has infinitely many instantiations, that doesn’t mean the original code had infinitely many distinct uses. Rather, it comes from a single use which refers to outside information, e.g.:

filter (\x -> x < y) someList

In the above code, y is a free variable, defined by the outside context. This becomes the Int parameter in the LessThan case.

What happens when the free variables are functions, as in this code:

filter (\x -> f x && g x) someList

Then f and g must also be defunctionalized. But, since they have the same type as the overall filter, they themselves become filters, giving the recursive case (And Filter Filter).

All told, here’s the general procedure for defunctionalization:

  1. Collect all functions passed as an argument to the filter function.
  2. Create a data type, with one variant for each possible function, each with fields to store the free variables referenced by the corresponding function.
  3. Replace the invocation of the filter condition with an apply function, which determines what filter condition the data structure represents, and executes it.

Running this procedure on Version 1, with a suitable set of uses of the filter function, produces the code in Version 2. The “programming language” of filters has appeared in complete form, with no design effort required.

In choosing the defunctionalized form, we are making a tradeoff. The original version, called the direct or refunctionalized form, was open, meaning it’s easy to add a new kind of filter: just call it with a different lambda. In contrast, for the defunctionalized form, adding a new filter condition requires modifying the data structure that filter depends on. However, unlike functions, the data structures of the defunctionalized form may be serialized, with all the benefits that entails: printing, comparing, saving, and network transmission.

In the next section, we’ll observe many unwitting uses of defunctionalization in the wild. Superficially, these applications bear no resemblance to each other, although many of them yield some kind of state machine. Yet each application gives a design choice between two alternatives which offer a tradeoff between openness and serializability.

What Does Defunctionalization Enable?

Defunctionalization and refunctionalization have many applications in programming. Here, we show how defunctionalization explains several superficially-unrelated changes programmers often do, and sheds light on the tradeoffs made. Most of these combine defunctionalization with a more popular transformation, CPS (“continuation passing style”) conversion. In CPS, each function is given an extra argument, a function called a “continuation,” and will invoke this function with the return value instead of returning that value normally. The fusion of defunctionalization with CPS-conversion allows splitting actions into multiple pieces and do them at different times — or on different machines.


Changing a recursive function such as printing a tree into an iterative one can be tricky, requiring nested loops and an explicit stack. These stacks are precisely defunctionalized continuations, with the stack nature emerging because continuations may reference other continuations.

Recursive functions that have undergone CPS+defunctionalization tend to resemble state-machines. Prof. Danvy has also used the inverse transformation to turn some classic state-machine algorithms into previously-unknown stateless versions using clever recursion schemes.

Splitting a Computation across Multiple Machines

Imagine a banking website which contains code for closing an account and transferring its assets: amount = account1.withdrawAll(amount); account2.add(amount). Imagine if accounts 1 and 2 may have their data managed on different servers. Then the code would be altered to withdraw from the first account, and then send a message to the second server, signaling it to withdraw from the second account. This message is precisely a defunctionalized continuation.

A related use is in continuation-based web frameworks. For example, on the popular website Hacker News, a logged out user can post a comment, be taken to a login page, log in, and then be redirected to the original page with their comment posted. To implement this, the programmer needed only to write sequential code like “if not logged in, then go to login screen, then post comment;” the server would save everything to happen after logging in as a continuation to be invoked after completion of the login. However, because this continuation could not be serialized and sent to another machine, this decision limited the website to running on a single server, and would not work if the server rebooted before the user logged in.

To remedy this, the Hacker News team rewrote the website to store data about what action to take after login on the login page itself. This data, along with similar occurrences elsewhere on the website, can be seen as a defunctionalized continuation.

Understanding the correspondence between both versions of the website helps us understand how to gain the benefits of both. While Paul Graham, the celebrity programmer behind the original Hacker News code, boasted of his web framework’s simplicity and elegance, it’s likely that the code’s simplicity was marred by the need to manually define data structures for each distinct captured continuation. This suggests that language-level support for defunctionalization could be used to keep the beauty of explicit continuations while gaining the scalability of multiple machines.

A few systems, such as the MFlow and jmacro-rpc frameworks, have tried to circumvent this problem by sending continuations over the wire in a different way: by sending over information about the execution’s past, so that the other machine may replay a function call in order to get to where the first machine left off. This can be seen as an application of my ICFP 2018 paper on thermometer continuations.

Non-Blocking I/O

When a program reads from a file or network, it typically suspends to await data. To avoid lost throughput, programmers turn to non-blocking I/O, where the program signals its desire to read data, but receives the data later. One way to offer non-blocking I/O is to send the program a notification once the data is available, as in Linux’s SIGIO. The corresponding notification-handlers can be derived from the original blocking-I/O program by defunctionalizing a “delimited” portion of the continuation of the internals of the read function.

Fancy polymorphism in not-so-fancy languages

“Ordinary” polymorphism, as in Java’s generics, allows writing functions over containers with arbitrary type of data,  e.g.: writing a function over List<X>  for any type X. But to write functions over any container, such as a single function that works on both lists and trees, one must be able to abstract both List<X> and Tree<X> into F<X>, where F represents an arbitrary type function. A few languages, including Haskell, allow type variables that abstract over type functions (second-order polymorphism), but most languages with polymorphism do not. But there’s a trick for turning this kind of higher-order polymorphism into something that Java or OCaml can handle: turn types like List<Int> and Tree<Int> into App<List, Int> or App<Tree, Int>, demonstrated by the HighJ library.  It turns out that this is just defunctionalization at the type level.

Applications to Other Parts of PL Theory

Finally, no discussion would be complete without the original applications of defunctionalization:

  • Defunctionalization was introduced by John Reynolds in his 1972 paper “Definitional Intepreters for Higher-Order Programming Languages” as a way to compile higher-order languages such as ML into first-order languages such as C. Though typically eschewed today in favor of other techniques such as closure conversion,. It still finds uses today in whole-program optimizing compilers such as MLTon, and in PAKCS, which compiles the functional-logic programming language Curry into the “first-order” logic programming language Prolog.
  • Olivier Danvy and his collaborators popularized defunctionalization in the programming languages community through their work on the inter-derivability of different ways of specifying programming language semantics. A big insight was their explanation of evaluation contexts, “programs with holes” used in reduction semantics, explaining them as the defunctionalized continuations of small-step and big-step semantics.


Defunctionalization is a versatile technique, simple yet obscure, with applications in compilers, theoretical PL, software engineering at all levels, and perhaps even in language design. By both showing us the exact manner of how many pairs of programs are equivalent — and how many techniques for altering programs are equivalent, it demonstrates that understanding a subject more deeply means discovering more things are the same.

For more details about all the examples here, see the enriched transcript of the original talk.

Bio: Jimmy Koppel is a fifth-year Ph.D. student  in the Computer-Aided Programming group at MIT, with research focusing on meta-metaprogramming. He also runs a business training software engineers at the advanced level, and previously won the “20 Under 20” Thiel Fellowship for entrepreneurship.

Disclaimer: These posts are written by individual contributors to share their thoughts on the SIGPLAN blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGPLAN or its parent organization, ACM.