The SIGPLAN Research Highlights committee has chosen three outstanding articles published in SIGPLAN venues in 2019 as SIGPLAN Research Highlights.
As discussed in this earlier post, the committee selects nominated articles that meet the Communications of the ACM's criteria for its own monthly Research Highlights, which showcase top research to a broad computer science audience. The SIGPLAN RH committee nominates its selections to CACM; in recent years, more than two-thirds of nominated papers have been selected.
The three chosen SIGPLAN Research Highlights for 2019 showcase work on
- optimizing parallel program thread scheduling;
- deriving machine learning models for code artifacts; and
- identifying threats to validity in online experiments written in the PlanOut domain-specific language.
Each of these papers is freely available.
Provably and Practically Efficient Granularity Control
VITALY AKSENOV, INRIA, France, and ITMO University, Russia
ARTHUR CHARGUÉRAUD, INRIA & Université de Strasbourg, CNRS, ICube, France
MIKE RAINEY, Indiana University, USA, and INRIA, France
Determining the granularity of parallel tasks is a long-standing challenge in parallelizing programs. Overly fine-grained tasks incur excessive synchronization and scheduling overhead; overly coarse-grained tasks limit parallelism by impeding load balancing. Finding the grain size that hits the sweet spot usually requires non-trivial tuning effort, yet almost all parallel programming models (Cilk, OpenMP, TBB, etc.) require the programmer to identify this grain size manually, and the right choice is usually program- and machine-dependent.
This paper presents a solution to this problem: a practical and provably efficient mechanism that automatically finds the right granularity for parallel programs. The approach works by gradually coarsening the granularity until parity is achieved between the parallel and sequential versions. Its practical usability is evaluated against hand-tuned granularity selection on the PBBS benchmark suite; the results often come close to the hand-picked grain sizes and even outperform them in some cases. The authors also prove that their control algorithm achieves an execution time within a small constant factor of optimal, under assumptions that are met by a wide variety of programs.
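To make the tuning burden concrete, here is a minimal Python sketch (illustrative only, and not the authors' algorithm or code) of a chunked parallel sum in which the GRAIN_SIZE constant is exactly the kind of knob the paper automates away. The names and values are made up, and CPython threads will not actually speed up this computation; the sketch only shows where the knob lives.

```python
# Illustrative sketch, not the paper's technique: a chunked parallel sum
# whose GRAIN_SIZE is the hand-tuned constant that frameworks like Cilk,
# OpenMP, and TBB leave to the programmer. Too small, and per-task overhead
# dominates; too large, and load balancing suffers.
from concurrent.futures import ThreadPoolExecutor

GRAIN_SIZE = 10_000  # program- and machine-dependent; normally found by tuning

def chunked_sum(xs, grain=GRAIN_SIZE):
    # Split the input into chunks of `grain` elements; one task per chunk.
    chunks = [xs[i:i + grain] for i in range(0, len(xs), grain)]
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(sum, chunks))

if __name__ == "__main__":
    data = list(range(1_000_000))
    print(chunked_sum(data))        # 499999500000
    print(chunked_sum(data, 100))   # same result, far more scheduling overhead
```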
code2vec: Learning Distributed Representations of Code
URI ALON, Technion, Israel
MEITAL ZILBERSTEIN, Technion, Israel
OMER LEVY, Facebook AI Research, USA
ERAN YAHAV, Technion, Israel
Given a code snippet, what is the likely name of the function it implements? Can we predict properties of this snippet, such as how similar it is to other code snippets? Can we answer these questions without executing any code, given access to large code databases? This work presents a novel methodology for addressing these questions by learning from data a so-called distributed representation of code snippets, similar to the distributed representations of words or documents in natural language processing. The key technical innovation is a model that applies soft attention to the representations of all paths in a code snippet, and can thus learn to identify the paths most relevant to the task at hand. This highly cited paper reports significant advances over previous work in predicting method names, includes a qualitative evaluation of the interpretability of the learnt attention over the paths of a program, and comes with an open-source implementation as well as online interactive demos.
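For readers curious about the attention step, here is a small NumPy sketch of the attention-weighted aggregation described above. The path-context embeddings and the attention vector are random stand-ins for parameters that code2vec learns during training, and the real model additionally predicts the method name from the resulting code vector.

```python
# Minimal sketch of soft attention over path-context embeddings: scores are
# dot products with a learned attention vector, a softmax turns them into
# weights, and the code vector is the weighted sum. Values here are random
# placeholders, not learned parameters.
import numpy as np

rng = np.random.default_rng(0)
path_contexts = rng.normal(size=(5, 8))   # 5 path-context embeddings, dim 8
attention_vec = rng.normal(size=8)        # stand-in for the learned attention vector

scores = path_contexts @ attention_vec            # one score per path-context
weights = np.exp(scores - scores.max())
weights /= weights.sum()                          # softmax attention weights

code_vector = weights @ path_contexts             # attention-weighted sum
print("attention weights:", np.round(weights, 3))
print("code vector shape:", code_vector.shape)    # (8,)
```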
See also Eran’s PL Perspectives blog posts on this topic.
PlanAlyzer: Assessing Threats to the Validity of Online Experiments
EMMA TOSCH, University of Massachusetts Amherst, USA
EYTAN BAKSHY, Facebook, USA
EMERY D. BERGER, University of Massachusetts Amherst, USA
DAVID D. JENSEN, University of Massachusetts Amherst, USA
The paper presents PlanAlyzer, a first-of-its-kind static analyzer for the domain-specific language (DSL) PlanOut. PlanOut is a leading DSL for expressing online experiments, which internet companies rely on to design and engineer online processes. PlanAlyzer detects errors that can compromise the validity of an experiment's statistical conclusions; previously, such experiments were validated and analyzed only through ad hoc human effort. The paper catalogs frequent pitfalls that lead to erroneous analyses of experiments, including failures of randomization, treatment assignment, and causal sufficiency, and PlanAlyzer statically checks whether an input program exhibits one of these bad practices. Experimental results are reported for several years' worth of real PlanOut programs from Facebook; on a mutated subset of these programs, PlanAlyzer achieves 92% precision and recall.
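To give a flavor of the bugs PlanAlyzer targets, here is a hypothetical Python analogue (deliberately not PlanOut syntax, and not taken from the paper) of an experiment-assignment script in which treatment is not fully randomized; all names and attributes are invented for illustration.

```python
# Hypothetical illustration of a validity bug of the kind PlanAlyzer flags:
# the treatment is partly determined by a user attribute rather than by
# randomization alone, so some users can never be in the control arm and a
# naive comparison of outcomes across arms is confounded.
import hashlib

def bucket(userid: int, salt: str, buckets: int = 2) -> int:
    # Deterministic hash-based assignment, as experimentation systems use.
    digest = hashlib.sha256(f"{salt}:{userid}".encode()).hexdigest()
    return int(digest, 16) % buckets

def assign(user: dict) -> dict:
    if user["is_new_user"]:
        # BUG: not randomized; new users always receive the treatment,
        # confounding the measured effect with user tenure.
        variant = "new_onboarding"
    else:
        variant = ["control", "new_onboarding"][bucket(user["id"], "exp42")]
    return {"id": user["id"], "variant": variant}

print(assign({"id": 7, "is_new_user": True}))
print(assign({"id": 8, "is_new_user": False}))
```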
Bio: Michael Hicks is a Professor of Computer Science at the University of Maryland, the past SIGPLAN Chair (2015-2018), and the editor of this blog.
Disclaimer: These posts are written by individual contributors to share their thoughts on the SIGPLAN blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGPLAN or its parent organization, ACM.