janmr blog

Equality of Floating-Point Numbers

December 3, 2023

When comparing floating-point numbers, exact, bit-for-bit equality is almost never what you want. The results of most floating-point operations, such as addition, multiplication and trigonometric functions, cannot be represented exactly because of the limited precision of floating-point numbers. Furthermore, in most practical situations we are only interested in a result that is "close enough", not one that is correct in every available digit.
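As a small illustration of why exact comparison is fragile, the following Python sketch uses two textbook cases (the variable names are only for illustration). It assumes the usual IEEE 754 double-precision floats that Python uses on common platforms.

```python
import math

# 0.1, 0.2 and 0.3 have no exact binary representation, so the rounded
# sum differs from the rounded literal 0.3 in the last bit.
a = 0.1 + 0.2
b = 0.3
exactly_equal = (a == b)
difference = abs(a - b)

# Squaring a square root does not give back the original value exactly.
roundtrip = math.sqrt(2.0) ** 2

print(exactly_equal)  # False
print(difference)     # tiny but nonzero, on the order of 1e-17
print(roundtrip)      # very close to 2.0, but not equal to it
```

The values are correct to about 16 significant digits, which is usually more than enough, yet a test with == rejects them.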

Say you want to compare two floating-point numbers $u$ and $v$ and consider the error $|u-v|$. It is natural to compare this error to some bound which is relative to the size of the numbers,

$|u-v| \leq \epsilon_{\text{rel}} \cdot \max(|u|, |v|)$

Using $\max(|u|, |v|)$ ensures that this relation is symmetric in $u$ and $v$. This is a nice property to have: it would be unfortunate if $u$ could be close to $v$ while $v$ was not close to $u$. We could also use $\min(|u|, |v|)$, which would result in a stronger requirement, or $\tfrac{1}{2}(|u|+|v|)$, which would lead to a behaviour between the $\min$ and $\max$ expressions.
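The effect of the scale factor can be sketched in Python. The function names close_rel, close_rel_one_sided and mean are made up for this illustration, and the one-sided variant is included only to show what a non-symmetric test looks like.

```python
def close_rel(u, v, eps_rel, scale=max):
    """Symmetric relative test: |u - v| <= eps_rel * scale(|u|, |v|)."""
    return abs(u - v) <= eps_rel * scale(abs(u), abs(v))


def mean(a, b):
    """Arithmetic mean of two magnitudes."""
    return 0.5 * (a + b)


def close_rel_one_sided(u, v, eps_rel):
    """Asymmetric variant that scales by |u| only (shown for contrast)."""
    return abs(u - v) <= eps_rel * abs(u)


u, v = 100.0, 110.0  # |u - v| = 10

# One-sided scaling gives different answers depending on argument order.
one_sided = (close_rel_one_sided(u, v, 0.095), close_rel_one_sided(v, u, 0.095))

# The same pair under the three symmetric scales, for two tolerances.
at_0093 = [close_rel(u, v, 0.093, s) for s in (min, mean, max)]
at_0097 = [close_rel(u, v, 0.097, s) for s in (min, mean, max)]

print(one_sided)  # (False, True)
print(at_0093)    # [False, False, True]
print(at_0097)    # [False, True, True]
```

For this pair the test passes from a tolerance of about 0.091 with max, about 0.095 with the mean, and 0.1 with min, which matches the ordering described above.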

In the inequality above, the quantity $\epsilon_{\text{rel}}$ controls how close the numbers must be to be considered approximately equal. Using $\epsilon_{\text{rel}}=10^{-k}$ means that roughly the $k$ most significant decimal digits are correct. For example, $|3 \cdot 10^8 - 299792458| \leq \epsilon_{\text{rel}} \cdot 3 \cdot 10^8$ is true for $\epsilon_{\text{rel}}=10^{-3}$ but not for $\epsilon_{\text{rel}}=10^{-4}$. Also, $|\pi-3.14159| \leq \epsilon_{\text{rel}} \cdot \pi$ is true for $\epsilon_{\text{rel}}=10^{-6}$ but not for $\epsilon_{\text{rel}}=10^{-7}$.
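The two examples can be checked numerically. The sketch below computes the relative error directly, which is (up to rounding) the smallest $\epsilon_{\text{rel}}$ for which the test passes. The helper name relative_error is an assumption made for illustration, and it assumes that $u$ and $v$ are not both zero.

```python
import math


def relative_error(u, v):
    """|u - v| / max(|u|, |v|); assumes u and v are not both zero."""
    return abs(u - v) / max(abs(u), abs(v))


err_c = relative_error(3e8, 299792458)      # speed of light vs. 3 * 10^8
err_pi = relative_error(math.pi, 3.14159)   # pi vs. a 6-digit truncation

# -log10 of the relative error is a rough count of agreeing decimal digits.
digits_c = -math.log10(err_c)
digits_pi = -math.log10(err_pi)

print(err_c, digits_c)    # roughly 6.9e-4, a little over 3 digits
print(err_pi, digits_pi)  # roughly 8.4e-7, a little over 6 digits
```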

This way of checking closeness breaks down, however, when one of the numbers is zero or very close to zero. For instance, is $10^{-8}$ close to zero? Since $|10^{-8} - 0| / \max(|10^{-8}|, |0|) = 1$, we see that it would require a relative tolerance of at least $1$ for the two to be viewed as approximately equal according to the test above. In such cases it makes sense to look at the absolute error instead,

$|u-v| \leq \epsilon_{\text{abs}}.$
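A minimal Python sketch of the comparison with zero, using the hypothetical helpers close_rel and close_abs for the two tests:

```python
def close_rel(u, v, eps_rel):
    """Relative test: |u - v| <= eps_rel * max(|u|, |v|)."""
    return abs(u - v) <= eps_rel * max(abs(u), abs(v))


def close_abs(u, v, eps_abs):
    """Absolute test: |u - v| <= eps_abs."""
    return abs(u - v) <= eps_abs


tiny = 1e-8

# The relative bound shrinks together with the numbers, so comparing with
# zero fails for any tolerance below 1.
rel_small = close_rel(tiny, 0.0, 1e-3)
rel_almost_one = close_rel(tiny, 0.0, 0.999)
rel_one = close_rel(tiny, 0.0, 1.0)

# The absolute bound does not depend on the size of the numbers.
abs_ok = close_abs(tiny, 0.0, 1e-6)

print(rel_small, rel_almost_one, rel_one)  # False False True
print(abs_ok)                              # True
```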

Combining these two inequalities we get that $u$ and $v$ are approximately equal, $u \sim v$, when

$|u-v| \leq \max\Big( \epsilon_{\text{rel}} \cdot \max(|u|, |v|), \epsilon_{\text{abs}} \Big).$

This is the function suggested for approximate equality in PEP 485, a Python Enhancement Proposal from 2015. It is implemented as isclose in the math module (CPython implementation).
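The combined inequality translates almost directly into Python. The sketch below is a plain transcription under the hypothetical name approx_equal, not the CPython source. Its defaults mirror those of math.isclose (rel_tol=1e-09, abs_tol=0.0), and unlike math.isclose it makes no attempt to handle infinities.

```python
import math


def approx_equal(u, v, eps_rel=1e-9, eps_abs=0.0):
    """|u - v| <= max(eps_rel * max(|u|, |v|), eps_abs), for finite u and v."""
    return abs(u - v) <= max(eps_rel * max(abs(u), abs(v)), eps_abs)


# (u, v, eps_rel, eps_abs) test cases, all finite.
cases = [
    (0.1 + 0.2, 0.3, 1e-9, 0.0),
    (1e-8, 0.0, 1e-9, 0.0),
    (1e-8, 0.0, 1e-9, 1e-6),
    (3e8, 299792458, 1e-3, 0.0),
    (3e8, 299792458, 1e-4, 0.0),
]

ours = [approx_equal(u, v, r, a) for u, v, r, a in cases]
stdlib = [math.isclose(u, v, rel_tol=r, abs_tol=a) for u, v, r, a in cases]

print(ours)    # [True, False, True, True, False]
print(stdlib)  # same results as the sketch for these inputs
```

In real code math.isclose is the better choice, since it also validates the tolerances and treats special values such as infinities consistently.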

Some rules of thumb for choosing $\epsilon_{\text{rel}}$ and $\epsilon_{\text{abs}}$:

• Use $\epsilon_{\text{rel}}=10^{-k}$ when you want (roughly) $k$ correct decimal digits.
• Let $\epsilon_{\text{abs}}$ determine when a number is considered (close to) zero, $|u| \leq \epsilon_{\text{abs}} \Leftrightarrow u \sim 0$. Use $\epsilon_{\text{abs}}=0$ if you don't need to consider numbers close to zero.
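As an example of applying these rules of thumb, suppose we want roughly six correct decimal digits and regard anything below $10^{-12}$ in magnitude as zero. Both tolerances, and the helper name same, are illustrative choices and not recommendations.

```python
import math

EPS_REL = 1e-6   # roughly six correct decimal digits
EPS_ABS = 1e-12  # magnitudes up to this value count as zero


def same(u, v):
    """Approximate equality with the tolerances chosen above."""
    return math.isclose(u, v, rel_tol=EPS_REL, abs_tol=EPS_ABS)


# sin(pi) is mathematically zero but evaluates to a tiny nonzero float.
residual = math.sin(math.pi)
zero_with_abs = same(residual, 0.0)
zero_rel_only = math.isclose(residual, 0.0, rel_tol=EPS_REL)  # abs_tol defaults to 0.0

# Away from zero the relative tolerance decides.
six_digits = same(2.0 / 3.0, 0.666667)
four_digits = same(2.0 / 3.0, 0.6667)

print(zero_with_abs, zero_rel_only)  # True False
print(six_digits, four_digits)       # True False
```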

Feel free to leave any question, correction or comment in this Mastodon thread.