Equality of Floating-Point Numbers

With floating-point numbers, exact, bit-for-bit equality is almost never what you want. The results of most floating-point operations, such as addition, multiplication, and trigonometric functions, cannot be represented exactly because floating-point numbers have limited precision. Furthermore, in most practical situations we only need a result that is "close enough", not one that is correct in every available digit.
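A quick way to see this is to evaluate a few expressions that are exact on paper. The snippet below is only a small illustration in Python, assuming IEEE 754 double precision (which is what CPython uses on common platforms):

```python
import math

# Neither 0.1, 0.2 nor 0.3 has an exact binary representation,
# so the rounded sum differs from the rounded constant.
print(0.1 + 0.2 == 0.3)        # False
print(0.1 + 0.2)               # 0.30000000000000004

# Squaring a rounded square root does not give back the input exactly.
print(math.sqrt(2) ** 2 == 2)  # False
```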

Say you want to compare two floating-point numbers $u$ and $v$ by looking at the error $|u-v|$. It is natural to compare this error to a bound that is relative to the size of the numbers,

$$|u-v| \leq \epsilon_{\text{rel}} \cdot \max(|u|, |v|)$$

Using $\max(|u|, |v|)$ ensures that this relation is symmetric in $u$ and $v$. This is a nice property to have: it would be unfortunate if $u$ could be close to $v$ while $v$ was not close to $u$. We could also use $\min(|u|, |v|)$, which would result in a stronger requirement, or $\tfrac{1}{2}(|u|+|v|)$, which would lead to a behaviour between the $\min$ and $\max$ expressions.
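To get a feel for how the three choices differ, here is a small Python sketch. The names rel_close, mean and the scale parameter are made up for this illustration and are not part of any library:

```python
def mean(a: float, b: float) -> float:
    """Arithmetic mean of two numbers."""
    return 0.5 * (a + b)

def rel_close(u: float, v: float, rel_tol: float, scale=max) -> bool:
    """Check |u - v| <= rel_tol * scale(|u|, |v|)."""
    return abs(u - v) <= rel_tol * scale(abs(u), abs(v))

u, v = 1.0, 1.1  # |u - v| is about 0.1

for rel_tol in (0.095, 0.097):
    # Strictest (min) first, most permissive (max) last.
    results = [rel_close(u, v, rel_tol, scale) for scale in (min, mean, max)]
    print(rel_tol, results)
# 0.095 [False, False, True]
# 0.097 [False, True, True]
```

Since all three scale functions are symmetric in their arguments, swapping u and v never changes the answer.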

In the inequality above, the quantity $\epsilon_{\text{rel}}$ controls how close the numbers must be to be considered approximately equal. Using $\epsilon_{\text{rel}}=10^{-k}$ means that roughly the $k$ most significant decimal digits are correct. For example, $|3 \cdot 10^8 - 299792458| \leq \epsilon_{\text{rel}} \cdot 3 \cdot 10^8$ is true for $\epsilon_{\text{rel}}=10^{-3}$ but not for $\epsilon_{\text{rel}}=10^{-4}$. Also, $|\pi-3.14159| \leq \epsilon_{\text{rel}} \cdot \pi$ is true for $\epsilon_{\text{rel}}=10^{-6}$ but not for $\epsilon_{\text{rel}}=10^{-7}$.
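The two examples can be checked by computing the relative error directly; the smallest $\epsilon_{\text{rel}}$ that passes the test is exactly this value. The helper rel_error below is a hypothetical name used only for this sketch:

```python
import math

def rel_error(u: float, v: float) -> float:
    """Return |u - v| / max(|u|, |v|); undefined when both inputs are zero."""
    return abs(u - v) / max(abs(u), abs(v))

for u, v in [(3e8, 299792458.0), (math.pi, 3.14159)]:
    err = rel_error(u, v)
    # -log10(err) is a rough count of agreeing significant digits.
    print(f"{err:.2e}  about {-math.log10(err):.1f} digits")
# 6.92e-04  about 3.2 digits
# 8.45e-07  about 6.1 digits
```

The first error lies between $10^{-4}$ and $10^{-3}$ and the second between $10^{-7}$ and $10^{-6}$, matching the tolerances quoted above.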

This way of checking closeness breaks down, however, when comparing numbers to zero or close to zero. For instance, is $10^{-8}$ close to zero? Since $|10^{-8} - 0|/10^{-8} = 1$, we see that it would require a relative tolerance of at least $1$ to be viewed as approximately equal according to the test above. In such cases it makes sense to look at the absolute error instead,

$$|u-v| \leq \epsilon_{\text{abs}}.$$
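The following Python lines illustrate the problem for $u = 10^{-8}$ and $v = 0$. The value chosen for abs_tol is an arbitrary choice for this example, not a recommendation:

```python
u, v = 1e-8, 0.0

# Relative test: the bound rel_tol * max(|u|, |v|) shrinks along with u,
# so the test only passes once rel_tol reaches 1.
for rel_tol in (1e-3, 0.5, 1.0):
    print(rel_tol, abs(u - v) <= rel_tol * max(abs(u), abs(v)))
# 0.001 False
# 0.5 False
# 1.0 True

# Absolute test: a fixed bound decides what counts as "zero".
abs_tol = 1e-6  # arbitrary choice for this example
print(abs(u - v) <= abs_tol)  # True
```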

Combining these two inequalities we get that $u$ and $v$ are approximately equal, $u \sim v$, when

$$|u-v| \leq \max\Big( \epsilon_{\text{rel}} \cdot \max(|u|, |v|), \epsilon_{\text{abs}} \Big).$$

This is the test proposed for approximate equality in PEP 485, a Python Enhancement Proposal from 2015. It is implemented as isclose in the math module (CPython implementation).
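As a rough sketch, the combined inequality can be written directly in Python and compared with math.isclose. The function approx_equal is a hypothetical name for this illustration; its defaults mirror those of math.isclose (rel_tol=1e-09, abs_tol=0.0). Unlike the library function, this sketch does not special-case infinities, so it should only be compared on finite inputs:

```python
import math

def approx_equal(u: float, v: float, rel_tol: float = 1e-9, abs_tol: float = 0.0) -> bool:
    """Combined test: |u - v| <= max(rel_tol * max(|u|, |v|), abs_tol)."""
    return abs(u - v) <= max(rel_tol * max(abs(u), abs(v)), abs_tol)

pairs = [(1.0, 1.0 + 1e-10), (1e-8, 0.0), (3e8, 299792458.0)]
for u, v in pairs:
    ours = approx_equal(u, v, rel_tol=1e-9, abs_tol=1e-6)
    theirs = math.isclose(u, v, rel_tol=1e-9, abs_tol=1e-6)
    print(u, v, ours, theirs)  # the last two columns should agree
```

In real code there is little reason to roll your own; calling math.isclose is the simpler choice.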

Some rules of thumb for choosing $\epsilon_{\text{rel}}$ and $\epsilon_{\text{abs}}$:

Use $\epsilon_{\text{rel}}=10^{-k}$ when you want (roughly) $k$ correct decimal digits.
Let $\epsilon_{\text{abs}}$ determine when a number is considered (close to) zero, $|u| \leq \epsilon_{\text{abs}} \Leftrightarrow u \sim 0$. Use $\epsilon_{\text{abs}}=0$ if you don't need to consider numbers close to zero.
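These rules of thumb might look as follows with math.isclose, whose rel_tol and abs_tol keyword arguments play the roles of $\epsilon_{\text{rel}}$ and $\epsilon_{\text{abs}}$. The tolerance values are example choices only:

```python
import math

x = math.sin(math.pi)  # tiny but nonzero, about 1.2e-16

# With abs_tol=0 (the default of math.isclose) only 0.0 itself is close to 0.0.
print(math.isclose(x, 0.0))                 # False
# An absolute tolerance states what "zero" means for this computation.
print(math.isclose(x, 0.0, abs_tol=1e-12))  # True

# rel_tol=1e-6 asks for roughly six correct significant digits.
print(math.isclose(math.pi, 3.14159, rel_tol=1e-6))  # True
print(math.isclose(math.pi, 3.141, rel_tol=1e-6))    # False
```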

Some extra resources to check out:

Comparing Floating Point Numbers by Bruce Dawson.
The Art of Computer Programming, Volume 2, Section 4.2.2, by Donald E. Knuth.
Theory behind floating point comparisons from the Boost C++ library.
What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg.

janmr blog

Equality of Floating-Point Numbers
December 3, 2023
