Note: This is a cross-post from the ACM SIGARCH blog; see the original post for additional comments and discussion from the SIGARCH community.
When we started planning the ASPLOS’21 program committee in Spring 2020, we asked ourselves what we could do to make the review process better for everyone. In our opinion, the most impactful improvement would be to increase the signal available for each submission as we make decisions.
In our typical reviewing process, papers reach the PC meeting with five reviews, three from PC members and two from external reviewers. If the three PC members agree in either direction, the paper’s fate is quickly sealed. If there is disagreement between the three PC reviewers, it can be challenging for the committee to make a decision. Do the three PC reviewers have sufficient expertise on the paper topic, collectively or individually? Is one of them overly positive or negative? How should external reviews be factored in, especially since their authors are not as calibrated as PC members on overall submission quality? Without reading the paper, many PC members abstain from voting or vote in support of the most eloquent or senior reviewer in the room. A comment that “this may be an embarrassing paper” is often enough to trigger negative votes. The final decision often feels random.
This limited signal is problematic in earlier stages as well. Most papers receive two PC and one external review during the first reviewing phase (R1). If they are all positive or negative, it is easy to make a decision. However, it is difficult to judge a paper that has, for instance, two weak rejects and one weak accept R1 scores. If we reject a paper that would have otherwise gotten two positive R2 reviews, we have dropped a decent paper early. If we advance a paper that will get two negative R2 reviews, we have unnecessarily increased the reviewing load.
These observations led us to design a review process that ensures that a large number of well-calibrated PC members weigh in on all decisions for every paper. The challenge was to do so without significantly increasing the PC load. Extended abstracts were key in achieving this goal.
Extended abstracts in a nutshell
We asked authors to submit, together with their full paper, a two-page abstract (excluding references) that summarizes the key attributes of their work. We provided authors with a template for extended abstracts inspired by the questions from the Heilmeier Catechism and the submission format to the special issue of IEEE Micro for Top Picks from architecture conferences.
The template guided authors to summarize: 1) the problem they are addressing; 2) the shortcomings of the state of the art; 3) their key technical insights; 4) the artifacts they implemented; 5) their results and contributions; and 6) why ASPLOS is a good match for this work. We (the two co-chairs) read the extended abstracts and used them to quickly understand submissions and guide reviewer assignments. We found that most authors followed our guidelines and provided well-written digests of their work.
Extended abstracts in the review process
In R1, we assigned the extended abstracts for review. At least 5 PC members and 1 external reviewer reviewed each extended abstract to determine whether the submission belonged in the top half of this year’s submissions. Since each PC member reviewed 27 abstracts – nearly 7% of all submissions – they were sufficiently calibrated to make this decision. With 5 PC reviews per submission, we could solicit expert reviews on all aspects covered by the paper, as well as reviews that would assess the broad community interest in the paper’s contributions. Reviewers scored abstracts on a four-point scale (strong reject, weak reject, weak accept, strong accept) and provided feedback for authors.
We used the resulting 6 or more scores to advance 55% of submissions to R2. Most submissions advanced to R2 had positive scores from the majority of reviewers. However, any submission that was championed (strong accept) by even a single PC member or an external reviewer with strong expertise was also advanced to R2. This approach also created a “skin in the game” incentive: reviewers who championed an extended abstract were in effect voting to review the full paper. We increased the R1 advance rate from the initial target of 50% to 55% to accommodate championed papers and minimize the impact of noise towards the bottom of the pile.
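The advancement rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling we used: the names `Score`, `Review`, and `advances_to_r2` are hypothetical, the numeric encoding of the four grades is arbitrary, and "positive scores from the majority of reviewers" is read as strictly more than half of the scores being weak accept or better. The sketch does not model how the overall 55% advance rate was tuned.

```python
from dataclasses import dataclass
from enum import IntEnum


class Score(IntEnum):
    """Four-point R1 scale; the numeric values are an assumption."""
    STRONG_REJECT = 1
    WEAK_REJECT = 2
    WEAK_ACCEPT = 3
    STRONG_ACCEPT = 4


@dataclass(frozen=True)
class Review:
    score: Score
    is_pc_member: bool
    strong_expertise: bool = False  # only consulted for external reviewers


def is_championed(reviews):
    """A strong accept from a PC member, or from an external reviewer
    with strong expertise, champions the abstract."""
    return any(
        r.score == Score.STRONG_ACCEPT and (r.is_pc_member or r.strong_expertise)
        for r in reviews
    )


def has_positive_majority(reviews):
    """Assumed reading of 'majority': strictly more than half positive."""
    positive = sum(1 for r in reviews if r.score >= Score.WEAK_ACCEPT)
    return 2 * positive > len(reviews)


def advances_to_r2(reviews):
    """Advance if championed or if most reviewers scored it positively."""
    return is_championed(reviews) or has_positive_majority(reviews)


# Hypothetical example: five lukewarm PC reviews plus one expert champion.
reviews = [Review(Score.WEAK_REJECT, is_pc_member=True)] * 5
reviews.append(Review(Score.STRONG_ACCEPT, is_pc_member=False, strong_expertise=True))
print(advances_to_r2(reviews))
```

In this hypothetical example the abstract advances on the strength of the single expert champion, even though the majority of scores are negative.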
For continuity, we assigned full papers for R2 reviews primarily to those PC members who had reviewed the extended abstracts. We assigned at least 4 PC members and 1 external reviewer to review each paper and determine whether it should be accepted to the conference. While we solicited one or more external reviews per submission in each round to add expertise as needed, more than 80% of the reviews came from well-calibrated PC members.
We analyzed the stability of reviewer scores across R1 and R2. For the 222 submissions that advanced to R2, there were 1083 cases in which the same reviewer scored a submission in both phases. Reviewers assigned the same score in R1 (abstracts) and R2 (papers) in 45.5% of these cases. In 37.7%, the R1 and R2 scores differed by one grade. Only 1.5% exhibited a huge swing between the two phases, from strong accept to strong reject or vice versa. This stability of grades suggests that extended abstracts are a reasonable way to identify the top half of submissions. As expected, the grades trended negative as we moved into R2, since we asked reviewers to judge papers by different standards in each round (in R1, the top half; in R2, the top quintile).
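For readers who want to run the same kind of stability analysis on their own review data, here is a minimal sketch. It assumes each record is a pair of scores (R1, R2) from the same reviewer on the same submission, encoded 1 (strong reject) through 4 (strong accept). The function name and the sample data are hypothetical and are not our actual data.

```python
from collections import Counter


def score_shift_summary(pairs):
    """Return the fraction of (r1, r2) score pairs at each absolute shift.

    Scores are assumed to be integers from 1 (strong reject) to
    4 (strong accept), so the shift ranges from 0 to 3 grades.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (r1, r2) score pair")
    shifts = Counter(abs(r2 - r1) for r1, r2 in pairs)
    total = len(pairs)
    return {
        "same": shifts[0] / total,
        "one_grade": shifts[1] / total,
        "two_grades": shifts[2] / total,
        "full_swing": shifts[3] / total,  # strong accept <-> strong reject
    }


# Hypothetical sample: four reviewers scoring in both rounds.
sample = [(3, 3), (3, 2), (2, 4), (4, 1)]
print(score_shift_summary(sample))
```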
We also used extended abstracts during the online discussion prior to the PC meeting. We solicited a few additional reviews within a very short amount of time. The abstracts helped reviewers to swiftly check for expertise match or provide quick feedback on the importance of the problem or the originality of the insight. We also made the abstracts available to all non-conflicted PC members during the online PC meeting so they could read a concise version of the paper before voting. Finally, we made the abstracts of accepted papers publicly available in order to help our community quickly read and digest the key contents of the ASPLOS’21 proceedings.
Looking back and looking forward
So, did we achieve our initial goal? We think so. We involved 5 or more PC members with every paper, and the stronger signal for each submission made both the online and the PC meeting discussions lively and substantial. Decisions were well informed both when the PC reviewers converged and when the whole PC needed to vote. The PC load was reasonable compared to recent conferences in our field. Each PC member reviewed a total of 27 abstracts and 11 full papers. Assuming that the load of an abstract review is proportional to its length (⅕ of a full paper), the overall load for each PC member was equivalent to 16.4 papers. The informal feedback we collected was that, once they got used to this process, PC members found extended abstracts useful and were mostly happy with R1 decisions and their R2 review assignments.
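The load estimate is simple arithmetic, and the sketch below makes the assumption explicit so that other committees can plug in their own numbers. The function name and the `abstract_weight` parameter are illustrative. The ⅕ weight is the length-proportional assumption stated above, not a measured reviewing cost.

```python
def equivalent_paper_load(num_abstracts, num_papers, abstract_weight=1 / 5):
    """Express a reviewer's load in units of full-paper reviews.

    abstract_weight is the assumed cost of one abstract review
    relative to one full-paper review.
    """
    return num_abstracts * abstract_weight + num_papers


# ASPLOS'21 figures from the text: 27 abstracts and 11 full papers.
load = equivalent_paper_load(27, 11)
print(f"{load:.1f} paper-equivalents")  # 27 * 1/5 + 11 = 16.4
```

A committee that believes abstracts cost more to review than their length suggests can raise `abstract_weight` and see how quickly the load grows.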
We believe extended abstracts played an invaluable role in all phases of the reviewing process (and beyond). Naturally, there is plenty of room for experimentation. We had long debates about important parameters such as the right length for abstracts and papers, and the right number of PC and external reviewers. We would also like to see an online discussion at the end of the R1 review phase. We were forced to skip it as we cut two weeks from our reviewing process in order to accommodate earlier conferences most impacted by the COVID pandemic. Extended abstracts could also be combined with other ideas for improving our reviewing process, such as multiple submission deadlines per year.
We hope that our community continues to use extended abstracts and collects the multi-year data necessary to better assess their benefits. We also hope that the discussion around this article will generate further ideas on how to analyze or improve extended abstracts.
Frequently Asked Questions about Extended Abstracts
Q: Can a well-written abstract allow a flawed paper to reach the 2nd round of reviewing?
A: This is possible and it is absolutely fine. We advanced to R2 the abstracts that 5 well-calibrated PC members identified as most interesting in terms of their problem statement, technical insights, artifacts, and results. These are exactly the kind of papers we should thoroughly review in R2. For papers with promising technical insights and a flawed implementation, the detailed feedback in R2 reviews hopefully helps authors address the shortcomings and achieve the full potential of their idea. The stability of grades across R1 and R2 suggests that highly rated abstracts with flawed papers were not common.
It is also worth noting that our conventional process can advance a flawed paper to R2. If one of the three R1 reviewers misses the flaw or is lenient in her score anyway because of the idea’s potential, the paper may advance to the next round.
Q: Can a paper that would otherwise be accepted get rejected in the abstract reviewing phase?
A: This is possible but very unlikely. It can happen if the authors of a strong paper submit a poorly written abstract. We checked, and poorly written abstracts typically came with poorly written papers. In our process, a single PC member could champion an abstract into R2. If 5 well-calibrated PC members did not find the problem statement and technical insight in an abstract sufficient to argue that it belongs in the top 50% of submissions, it is very unlikely that the full paper would be exciting enough to make it into the top 20%. The large number of PC reviewers for each abstract also made lack of reviewer expertise and misunderstandings of the paper’s contributions significantly less likely.
Q: Do extended abstracts disadvantage non-native English speakers?
A: Our reviewing process has to some extent always favored native English speakers as papers are written in English and writing quality is one of the aspects we judge. Hence, the question is whether extended abstracts introduce an additional disadvantage. While a thorough, multi-year study would be necessary to authoritatively answer this question, our initial data suggest that this was not the case. We analyzed R1 and R2 acceptance rates across geographical regions. The differences in average acceptance rates across regions were mostly introduced during the R2 phase when full papers were reviewed.
Q: Do extended abstracts make collusion more or less effective?
A: By involving more PC members with each submission, extended abstracts reduce the likelihood of successful collusion. A single colluding PC member could advance a submission to R2, but would have a hard time getting a strong majority in a group of 5 PC reviewers for final acceptance. Since we did not use bidding for review assignments, the chances of a submission getting assigned to 3 or more colluding reviewers were quite low. You can read about our overall strategy to reduce the effectiveness of collusion in this document.
Q: What were all the ways extended abstracts were used in ASPLOS’21?
A: We made use of extended abstracts throughout and beyond the reviewing process:
- The PC chairs read all extended abstracts to guide reviewer assignment.
- The PC reviewed extended abstracts in R1 to identify the top 55% of submissions.
- The PC chairs used extended abstracts to solicit additional expert reviews during the online discussion.
- We made extended abstracts available to all non-conflicted PC members during the PC meeting for reference.
- We made all extended abstracts for accepted papers available to the public as a concise summary of the ASPLOS’21 proceedings.
Christos Kozyrakis is Professor of Electrical Engineering and Computer Science at Stanford University. He previously co-chaired the SIGARCH/TCCA R2 committee that evaluated our conference reviewing system and proposed a new process with multiple submission deadlines per architecture conference.
Emery Berger is Professor of Computer Science at the University of Massachusetts Amherst. He led PLDI to adopt full double-blind reviewing (blind to accept); the survey of reviewers from that conference formed the basis of a CACM article demonstrating the effectiveness of anonymity in double-blind review. He also curates https://double-blind.org/, which tracks and encourages the adoption of double-blind reviewing in Computer Science.