In the RLIBM project, we have been making a case for mandating correctly rounded elementary functions, which would enhance the portability and reproducibility of applications that use them. In a previous SIGPLAN blog post, we described our earlier results in the RLIBM project, where we developed a collection of correctly rounded and efficient elementary functions for the 32-bit float representation with a single rounding mode: *round-to-nearest-ties-to-even*.

While the *round-to-nearest-ties-to-even* rounding mode is the default IEEE-754 rounding mode, there are four other standard rounding modes specified in IEEE-754. Some of them are attractive for efficient hardware implementations. Further, there is active research exploring new variants of the floating point representation that trade off the range of representable values against the precision of each value. Such representations include bfloat16 and tensorfloat32, which are becoming widely used and also have hardware support. The key research question in this post is the following: how can we extend the RLIBM approach to create a single implementation that produces correctly rounded results for multiple configurations (e.g., float, tensorfloat32, and bfloat16)?

One possible method to develop correctly rounded implementations is to create one for each configuration of the floating point representation and for each rounding mode. This is a perfectly valid strategy if there are only a few such configurations. However, it is not appealing if we have to support hundreds of configurations. Further, using a polynomial that is correctly rounded for one particular configuration and rounding mode with other configurations can produce incorrect results due to *double rounding errors*, which we explain later in this article.

In contrast, it would be ideal to create a **single implementation** that produces correct results for all supported configurations. This is exactly what we propose in our POPL 2022 distinguished paper. As before in the RLIBM project, we make a case for generating polynomials that approximate the correctly rounded result of f(x) rather than the real value of f(x). Approximating the correctly rounded result accounts for both the approximation error and the rounding error. The RLIBM approach identifies the rounding interval for each correctly rounded result. Using this interval, it frames the problem of polynomial generation as a linear programming (LP) problem.

To create a single polynomial approximation that produces correct results for multiple representations and rounding modes, we propose to generate a polynomial that produces the correctly rounded result of f(x) using the non-standard **round-to-odd** rounding mode with **2 additional precision bits** compared to the largest floating point representation that we wish to support. We provide a proof that this method produces correctly rounded results for multiple representations and for all the standard rounding modes. A more detailed explanation of our approach can be found in our POPL 2022 distinguished paper.

## What is the Correctly Rounded Result of an Elementary Function?

Floating point representations can only represent a finite set of rational values. In contrast, the results of elementary functions (e.g., f(x) = e^{x}) are typically irrational. If the real value of f(x) cannot be exactly represented, we round it to a floating point value depending on the chosen rounding mode. The figure above shows an example of log_{2}(x_{1}) rounding to the floating point value v_{2} under the *round-to-nearest-ties-to-even* mode. In this case, v_{2} is the correctly rounded result of log_{2}(x_{1}) for the chosen floating point representation and rounding mode. An approximation function that produces the correctly rounded result of an elementary function f(x) for all inputs is called a correctly rounded elementary function.
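To make this concrete, here is a small Python sketch of the definition. It uses double precision `math.log2` as a stand-in oracle (it is close to, but not guaranteed to be, the correctly rounded double result; the RLIBM project uses the MPFR library as its oracle) and the `struct` round-trip trick to round a double down to float32 with round-to-nearest-ties-to-even:

```python
import math
import struct

def to_f32(x):
    """Round a Python float (a double) to float32, round-to-nearest-ties-to-even."""
    return struct.unpack('f', struct.pack('f', x))[0]

# Stand-in oracle: the real value of log2(10) is irrational, so it cannot be
# represented exactly; we approximate it in double precision and round to float32.
x1 = 10.0
v2 = to_f32(math.log2(x1))   # the float32 value that log2(10) rounds to
```

The round-trip through `struct.pack('f', ...)` performs the C double-to-float conversion, which rounds with the default nearest-even mode.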

## A Primer on Our Prior Work in the RLIBM Project

A common way to approximate an elementary function f(x) is polynomial approximation. Many existing approaches use the minimax method to generate a polynomial that approximates the real value of f(x). Such approaches conflate the problem of generating an oracle with the problem of generating efficient implementations. Hence, generating correctly rounded elementary functions has long been considered challenging (the Table Maker's Dilemma).

One of our key insights in the RLIBM project is to split the problem into two parts: (a) the task of generating an oracle (which may be slow) and (b) the task of generating efficient implementations that produce the oracle result. In the RLIBM project, we assume the presence of an oracle that provides the correctly rounded result (for example, the MPFR library, which is slow but produces correct results). Once we have such an oracle, we make a case for generating polynomials that approximate the correctly rounded result of f(x) instead of the real value. The key insight is that there is a range of values in the vicinity of the correctly rounded result v_{1}, known as the rounding interval, such that every value in the interval rounds to v_{1}. If the polynomial approximation produces any value in the rounding interval, then the result will round to the correctly rounded result. The rounding interval thus provides the maximum amount of freedom available for polynomial generation. The figure below shows the rounding interval for v_{2} in a gray box.
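A rounding interval is easy to compute for a concrete case. The sketch below (a simplification for positive normal float32 values, not RLIBM's actual machinery) finds the interval for round-to-nearest-ties-to-even by locating the two adjacent float32 values via bit-pattern arithmetic and taking midpoints; for values near 1.0 the midpoints are exactly representable in double:

```python
import struct

def f32_bits(x):
    """Bit pattern of a float32 value as an unsigned 32-bit integer."""
    return struct.unpack('I', struct.pack('f', x))[0]

def bits_f32(b):
    """Float32 value of an unsigned 32-bit bit pattern."""
    return struct.unpack('f', struct.pack('I', b))[0]

def to_f32(x):
    """double -> float32 with round-to-nearest-ties-to-even."""
    return struct.unpack('f', struct.pack('f', x))[0]

def rounding_interval(v):
    """Interval of real values (endpoints held as doubles) that round to the
    positive normal float32 v under round-to-nearest-ties-to-even. The
    endpoints belong to the interval only when v's bit pattern is even,
    because ties go to the even pattern."""
    b = f32_bits(v)
    below, above = bits_f32(b - 1), bits_f32(b + 1)
    return (below + v) / 2.0, (v + above) / 2.0, b % 2 == 0

lo, hi, closed = rounding_interval(1.0)
assert lo == 1 - 2**-25 and hi == 1 + 2**-24 and closed
assert to_f32(hi) == 1.0            # the tie at the endpoint rounds back to 1.0
assert to_f32(hi + 2**-40) != 1.0   # just beyond the interval: a different result
```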

Using this insight, the RLIBM approach generates polynomials by computing the correctly rounded result of f(x) for each input x using a chosen rounding mode. Next, it identifies the rounding interval with respect to the correctly rounded result and the rounding mode. This rounding interval [l, h] for each input x defines a constraint on the output of the polynomial P(x); if P(x) produces a value in [l, h], the value will round to the correctly rounded result of f(x). A polynomial of degree d that we want to generate can be written as P(x) = c_{0} + c_{1}x + c_{2}x^{2} + … + c_{d}x^{d}, where the coefficients c_{i} are the unknowns.

The RLIBM approach encodes the constraints for all inputs into a system of linear inequalities and uses an LP solver to solve for the coefficients of the polynomial. We have shown that this approach successfully generates correctly rounded functions for the 32-bit float type.
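The encoding step can be sketched as follows. Each input x contributes two linear constraints on the coefficients, l ≤ c_{0} + c_{1}x + … + c_{d}x^{d} ≤ h. The instance below is hypothetical (a degree-2 polynomial for e^{x} with illustrative intervals far wider than real rounding intervals), and instead of running an LP solver we simply check a hand-picked candidate against the constraints; in the actual approach, any feasible point found by the solver works:

```python
import math

# Hypothetical tiny instance: degree-2 polynomial for e^x on a few sample
# points, with illustrative intervals of width 0.1 (not real rounding intervals).
samples = [0.0, 0.25, 0.5, 0.75, 1.0]
intervals = [(math.exp(x) - 0.05, math.exp(x) + 0.05) for x in samples]

def violations(c, samples, intervals):
    """Inputs whose constraint l <= P(x) <= h the coefficient vector c violates."""
    bad = []
    for x, (l, h) in zip(samples, intervals):
        p = sum(ci * x**i for i, ci in enumerate(c))
        if not (l <= p <= h):
            bad.append(x)
    return bad

# Any feasible point of the LP yields a valid polynomial; this hand-picked
# candidate happens to satisfy every interval above.
candidate = (1.0, 1.0, 0.72)
assert violations(candidate, samples, intervals) == []
```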

## Why is it Challenging to Support Multiple Representations with a Single Polynomial Approximation?

A naive approach to generating a single approximation for multiple representations is to reuse a correctly rounded elementary function designed for a higher precision representation. Suppose that we have an implementation that produces the correctly rounded result of e^{x} for the 34-bit floating point representation (FP34) with the *round-to-nearest-ties-to-even* mode.

We might approximate e^{x} for the float representation by repurposing this function: convert the float input to an FP34 input, use the FP34 implementation to produce the correctly rounded FP34 result, and then round the result back down to float. This sounds especially appealing if we already have a correctly rounded math library for a significantly higher precision than our target representation (e.g., CR-LIBM). Unfortunately, this is not guaranteed to produce the correctly rounded result, due to **double rounding errors**.

## What is a Double Rounding Error?

Double rounding is the act of rounding twice: first rounding the original value to a finite precision representation and then rounding that result to a smaller precision representation. In some cases, double rounding produces a value different from the one obtained by directly rounding the original value to the smaller representation. This discrepancy is known as a double rounding error, illustrated by the figure above. The first part of the figure shows the result of rounding the original value (black star) to FP34 (green circle) and subsequently rounding that result to float (red box). This result is different from directly rounding the original value down to float (blue box).
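A double rounding error can be reproduced in a few lines of Python. As a stand-in for FP34 we use double precision as the intermediate representation, exact rational arithmetic via `Fraction` for the original value, and (for simplicity) a direct float32 rounder that only handles values in [1, 2):

```python
import struct
from fractions import Fraction

def to_f32(x):
    """double -> float32 with round-to-nearest-ties-to-even."""
    return struct.unpack('f', struct.pack('f', x))[0]

def direct_round_f32(x):
    """Correctly round an exact Fraction in [1, 2) straight to float32
    (23 fraction bits, round-to-nearest-ties-to-even)."""
    scaled = x * 2**23
    n, rem = divmod(scaled.numerator, scaled.denominator)
    frac = Fraction(rem, scaled.denominator)
    if frac > Fraction(1, 2) or (frac == Fraction(1, 2) and n % 2 == 1):
        n += 1
    return n / float(2**23)

# A value just above the midpoint of two adjacent float32 values:
x = Fraction(1) + Fraction(1, 2**24) + Fraction(1, 2**60)

direct = direct_round_f32(x)               # round once, straight to float32
intermediate = float(x)                    # first rounding: exact value -> double
double_rounded = to_f32(intermediate)      # second rounding: double -> float32

assert direct == 1 + 2**-23                # the correct float32 result
assert double_rounded == 1.0               # double rounding lands on the wrong value
```

The first rounding lands exactly on the midpoint between two float32 values, erasing the information that the original value was above the midpoint; the second rounding then resolves the apparent tie the wrong way.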

## How Do We Avoid Double Rounding Errors?

The main reason behind double rounding errors is the loss of information regarding the original value when rounding with one of the IEEE-754 standard rounding modes. However, if we were able to retain this information each time we round a value, we would be able to produce the correctly rounded result even after double rounding.

This is precisely what we do in our prototype, which we call RLIBM-ALL. Suppose our goal is to generate a single polynomial approximation that can produce correctly rounded results for all floating point representations with up to n bits. **Our key idea is to generate a polynomial for the (n+2)-bit floating point representation (which has 2 additional precision bits compared to the n-bit representation) using the round-to-odd rounding mode.** The two additional precision bits as well as the *round-to-odd* mode retain all the information necessary to produce the correctly rounded result for any representation with n bits or fewer.

## What is this Round-to-Odd Rounding Mode?

The *round-to-odd* rounding mode is a non-standard rounding mode that has not received the attention it deserves. It has been used in a few niche settings, such as correctly rounding decimal numbers to binary numbers and computing primitive operations in extended precision while still producing correct results for smaller representations.

This rounding mode works as follows. If a real value is exactly representable in the target representation, we represent it exactly. Otherwise, the value is rounded to an adjacent floating point value whose bit pattern is odd when interpreted as an integer.

Performing double rounding using the *round-to-odd* mode is guaranteed to produce the correctly rounded result in the smaller representation. For example, when we round a real value (black star in the above figure) to FP34 using the round-to-odd mode (green circle) and then round the result to float using the target rounding mode (red box), the final result is the correctly rounded result in the float type.
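Round-to-odd is a few lines of code. The sketch below (a simplification for positive values in [1, 2)) rounds to a representation with 25 fraction bits, i.e., 2 more than float32's 23, following the (n+2)-bit recipe, and then re-runs the exact value that exposed the double rounding error earlier:

```python
import struct
from fractions import Fraction

def to_f32(x):
    """double -> float32 with round-to-nearest-ties-to-even."""
    return struct.unpack('f', struct.pack('f', x))[0]

def round_to_odd(x, frac_bits):
    """Round a positive Fraction in [1, 2) to `frac_bits` fraction bits with
    round-to-odd: keep exact values; otherwise truncate and force the last
    bit of the result to 1 (an odd bit pattern)."""
    scaled = x * 2**frac_bits
    n = scaled.numerator // scaled.denominator   # truncate (x > 0)
    if Fraction(n) != scaled and n % 2 == 0:
        n += 1                                   # make the bit pattern odd
    return Fraction(n, 2**frac_bits)

# The value that suffered a double rounding error with nearest-even rounding:
x = Fraction(1) + Fraction(1, 2**24) + Fraction(1, 2**60)

odd = round_to_odd(x, 25)          # first rounding: round-to-odd, 25 fraction bits
result = to_f32(float(odd))        # second rounding: standard nearest-even to float32
assert result == 1 + 2**-23        # now matches the directly rounded result

# An exact midpoint stays exact under round-to-odd, so the float32 tie is
# still resolved correctly (ties-to-even gives 1.0):
tie = Fraction(1) + Fraction(1, 2**24)
assert to_f32(float(round_to_odd(tie, 25))) == 1.0
```

Note that `float(odd)` is exact here: a value with 25 fraction bits fits in a double without further rounding.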

To understand why double rounding with the *round-to-odd* mode produces correct results, we first have to understand how rounding works. Typically, we need three pieces of information when rounding a real value to an n-bit floating point representation: (1) the first n bits of the real value in the binary representation, (2) the (n+1)^{th} bit, known as the rounding bit, and (3) the result of an OR operation over all the remaining bits, known as the sticky bit.

When we round a value to an (n+2)-bit representation using the round-to-odd mode, the result precisely captures all three components, as shown in the above figure. Consequently, rounding the round-to-odd result to an n-bit or smaller representation produces the correctly rounded result. Hence, if our goal is to produce correctly rounded results for a representation with n bits or fewer, it suffices to generate the correct round-to-odd result in the (n+2)-bit representation. The formal proof of this claim can be found in our paper.
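The claim can also be checked empirically. The sketch below uses a hypothetical tiny format with 8 fraction bits as the target (standing in for a full n-bit representation) and randomly double-rounds exact values through a 10-bit (i.e., two extra bits) round-to-odd intermediate; exact `Fraction` arithmetic avoids any hidden rounding. It checks the nearest-even target mode only, not all five standard modes:

```python
import random
from fractions import Fraction

def round_nearest_even(x, frac_bits):
    """Correctly round a positive Fraction in [1, 2) to `frac_bits`
    fraction bits with round-to-nearest-ties-to-even."""
    scaled = x * 2**frac_bits
    n, rem = divmod(scaled.numerator, scaled.denominator)
    frac = Fraction(rem, scaled.denominator)
    if frac > Fraction(1, 2) or (frac == Fraction(1, 2) and n % 2 == 1):
        n += 1
    return Fraction(n, 2**frac_bits)

def round_to_odd(x, frac_bits):
    """Round-to-odd for a positive Fraction in [1, 2)."""
    scaled = x * 2**frac_bits
    n = scaled.numerator // scaled.denominator
    if Fraction(n) != scaled and n % 2 == 0:
        n += 1
    return Fraction(n, 2**frac_bits)

random.seed(0)
TARGET_BITS = 8                      # hypothetical small target format
for _ in range(10000):
    # random exact value in [1, 2) with far more bits than the target
    x = Fraction(2**40 + random.randrange(2**40), 2**40)
    direct = round_nearest_even(x, TARGET_BITS)
    via_odd = round_nearest_even(round_to_odd(x, TARGET_BITS + 2), TARGET_BITS)
    assert direct == via_odd         # double rounding via round-to-odd is safe
```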

## RLIBM-ALL, Our New Approach

Extending the RLIBM approach to integrate the *round-to-odd* mode is straightforward. The key insight of the RLIBM approach is to approximate the correctly rounded result, regardless of the chosen representation or rounding mode. To generate a polynomial that produces the correct *round-to-odd* result in the (n+2)-bit representation, we simply approximate the *round-to-odd* result in the (n+2)-bit representation! For each input in the n-bit representation, we compute the correctly rounded result of f(x) in the (n+2)-bit representation with the *round-to-odd* mode. Next, we compute the rounding interval for each result. The left graph in the figure below shows the rounding intervals in the gray area.

When the real value of f(x) is exactly representable in the (n+2)-bit representation and its bit pattern is even, the rounding interval is a singleton containing only the exact value of f(x) (red box in the above figure). Generating a polynomial that satisfies even a few of these singleton intervals while also satisfying the rounding intervals of millions of other inputs is challenging. However, notice that we get a singleton interval only when f(x) is exactly representable. Since all floating point inputs are rational and an exactly representable result is also rational, the problem of identifying singletons boils down to identifying rational inputs for which an elementary function produces a rational output, which is widely studied. We can use mathematical results about such inputs to quickly identify and filter them out.
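As a toy illustration of such a filter (using floating point checks for simplicity; the actual pipeline relies on number-theoretic characterizations rather than runtime tests): for log_{2}, the result is rational exactly when the input is a power of two, so those inputs can be set aside before polynomial generation.

```python
import math

# Candidate inputs; only powers of two give log2 results that are rational
# (in fact, integers) and hence exactly representable.
inputs = (0.5, 1.0, 2.0, 3.0, 4.0, 10.0)
singleton_inputs = [x for x in inputs if math.log2(x).is_integer()]
# singleton_inputs is handled specially; the rest go to the LP formulation
```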

Once all remaining rounding intervals are non-singleton, we proceed with the RLIBM approach as usual: encode the intervals into an LP problem and use an LP solver to generate a polynomial that produces the correct *round-to-odd* results of f(x) in the (n+2)-bit representation.

## Does RLIBM-ALL Produce Correct Results?

We created a prototype of RLIBM-ALL: a correctly rounded polynomial generator and a collection of elementary functions that produce correctly rounded results in FP34 with the *round-to-odd* rounding mode. The first table below lists the functions that RLIBM-ALL supports. Our experiments show that a single elementary function in RLIBM-ALL can produce correctly rounded results for more than 160 different floating point representations, including bfloat16, tensorfloat32, and the float type. Additionally, it produces correct results with all IEEE-754 standard rounding modes. The second table below shows the ability of different math libraries to produce correct float results for all standard rounding modes. A checkmark indicates that a particular implementation produces correct results for all inputs and rounding modes. An ✗ mark indicates that it produces a wrong result for at least one input.

Mainstream math libraries (i.e., glibc’s and Intel’s libm) do not produce correct results for all rounding modes due to the double rounding error. In contrast, CR-LIBM provides correctly rounded elementary functions for the double representation with different rounding modes. Hence, it produces correct results for all rounding modes for many elementary functions, but still does not produce correct results for some functions when repurposed for lower bit-widths.

## Are the Resulting Functions Efficient?

The resulting functions from RLIBM-ALL are also fast. The graphs above show the speedup of RLIBM-ALL producing float results over glibc's and Intel's libm. On average, RLIBM-ALL has a 1.05x speedup over glibc's float functions and a 1.1x speedup over its double functions, and a 1.3x and 1.5x speedup over Intel's float and double functions, respectively. Overall, RLIBM-ALL not only produces correctly rounded results for hundreds of representations and rounding modes, but is also faster than mainstream math libraries.

## Transition to Practice

We have been collaborating with the developers of mainstream libraries to push correctly rounded elementary functions into practice. As of now, LLVM’s libc has incorporated five correctly rounded elementary functions from RLIBM-ALL: Log, Log2, Log10, Exp, and Exp2.

## Summary

A single generic polynomial approximation suffices to produce correctly rounded results of an elementary function for hundreds of representations and rounding modes. To produce correctly rounded results for all representations with up to n bits and all standard rounding modes, we generate a polynomial for the (n+2)-bit representation using the *round-to-odd* mode. When we use the RLIBM approach to generate this polynomial, we can filter out the singleton intervals by identifying the inputs for which the result of the elementary function is a rational value. Our resulting math library, RLIBM-ALL, is fast and produces correctly rounded results for hundreds of representations and all standard rounding modes. Now that we have elementary functions that produce correctly rounded results for multiple representations, perhaps it is time to revisit the standard and make correctly rounded results mandatory rather than a recommendation. The RLIBM project page provides links to papers, talks, and implementations for various representations.

**Bio: **Santosh Nagarakatte is an associate professor and the undergraduate program director of computer science at Rutgers University. His research spans the hardware-software interface ranging from lightweight formal methods for compiler verification to correctly rounded math libraries. Jay Lim is currently a lecturer at Yale University. The research described in this article was carried out during his PhD working with Santosh Nagarakatte at Rutgers University.

**Acknowledgments: **We thank Todd Millstein for his feedback on the drafts of this article. We also thank collaborators Mridul Aanjaneya and John Gustafson for their inputs to the RLIBM project.

**Disclaimer:** *These posts are written by individual contributors to share their thoughts on the SIGPLAN blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGPLAN or its parent organization, ACM.*