janmr blog

Implementing Multiple-Precision Arithmetic, Part 1


This article is the first in a series dealing with algorithms for multiple-precision arithmetic. The goal is to present both a theoretical foundation with high-level algorithm descriptions (based on Section 4.3.1, The Classical Algorithms, of The Art of Computer Programming, Volume 2, by Donald E. Knuth) and a portable C++ implementation of the algorithms. The theory and high-level algorithms will be quite universal and generic, whereas the presented code will be just one way to implement the algorithms in a specific programming language.

We start out by considering only non-negative integers. A number u0u \geq 0 will be represented in radix b2b \geq 2 using the notation

u=(un1u1u0)b=i=0n1uibi,0ui<b.u = (u_{n-1} \ldots u_1 u_0)_b = \sum_{i=0}^{n-1} u_i b^i, \quad 0 \leq u_i < b.

We will call uu an nn-digit number and u0u_0, u1u_1, etc., its digits. Unless stated otherwise we will always have that the most-significant digit is non-zero, here un10u_{n-1} \neq 0, and we will represent zero with no digits, 0=()b0 = ()_b. We have bn1ubn1b^{n-1} \leq u \leq b^n-1, implying n=1+logbun=1+\lfloor \log_b u \rfloor for u1u \geq 1.

Let the word size of the data type T used for each digit be bTb_T. For instance, if T is a 32 bit unsigned integer, we have bT=232b_T = 2^{32}. We will implement the algorithms using b=bTb = b_T and exploit the fact that C++ does arithmetic on unsigned integers modulo bTb_T (see paragraph 3.9.1 (4) of the C++ standard). This makes it possible to implement portable algorithms. They will not be optimal with respect to speed, however, and it will be noted when specialized operations, such as add-with-carry instructions, would lead to more efficient implementations.

Data Structures

A non-negative number will be represented in C++ as an instance of the class NonNegativeInteger:

template <typename T, typename V=detail::SimpleDigitVector<T> >
class NonNegativeInteger {
  boost::shared_ptr<V> digitvec;
  // ...
  NonNegativeInteger(T value);
  // ...

The type argument T is used to represent each digit. It must be integer and unsigned, so unsigned char, unsigned short, unsigned int, unsigned long, and unsigned long long can all be used (the type long long int is not standard C++, but is, e.g., supported by GCC). If a digit type with 8, 16, 32, or 64 bits is needed, the boost integer types uint8_t, uint16_t, uint32_t, uint64_t (from the namespace boost) can be used with portability ensured (uint64_t is not always available, but the macro BOOST_NO_INT64_T will tell you if it is not).

The type argument V is used as the container type for the digits. The default container is SimpleDigitVector (which at this time is also the only container supported). This default container simply wraps an array of size (at least) the number of digits.

Note that the digit container digitvec of NonNegativeInteger is wrapped in a boost shared pointer. Consider the small code excerpt:

NonNegativeInteger<unsigned> a=value1(), b=value2(), c;
c = a;
b += a;
c += b;

Because of the shared pointer, the first assignment, c = a, is very cheap and will not result in copying the digit container. Instead, the digitvec of both a and c will refer to the same instance of the digit container (which gets a reference count of 2). This also makes it cheap to pass arguments by value and returning numbers by value. The next statement, b += a, can add a to b in place (assuming there is enough space in b's digitvec to contain the result) because it is possible to check if b is the only instance referring to its digit container. Contrarywise, the last statement, c += a, cannot add a to c in place because a refers to the same container as c does.


Adding radix bb-numbers is quite straightforward and is easily done using the familiar pencil-and-paper method.

To formalize, we first consider adding two nn-digit numbers, n1n \geq 1, u=(un1u1u0)bu=(u_{n-1} \ldots u_1 u_0)_b and v=(vn1v1v0)bv = (v_{n-1} \ldots v_1 v_0)_b, obtaining an (n+1)(n+1)-digit sum w=(wnw1w0)bw=(w_n \ldots w_1 w_0)_b. Note that we here may end up with wn=0w_n = 0 (in which case wn10w_{n-1} \neq 0). We have

w0(u0+v0)  mod  b,wi(ui+vi+ki)  mod  b,wnkn.k1(u0+v0)/b,ki+1(ui+vi+ki)/b,i=1,,n1, \begin{aligned} w_0 &\leftarrow (u_0 + v_0) \;\text{mod}\; b, \\ w_i &\leftarrow (u_i + v_i + k_i) \;\text{mod}\; b, \\ w_n &\leftarrow k_n. \end{aligned} \quad \begin{aligned} k_1 &\leftarrow \lfloor (u_0 + v_0)/b \rfloor, \\ k_{i+1} &\leftarrow \lfloor (u_i + v_i + k_i)/b \rfloor, \quad i = 1, \ldots, n-1, \\ \text{ } \end{aligned}

Note that x/b=[xb]\lfloor x/b \rfloor = [x \geq b] for x<2bx < 2 b, where [P][P] is equal to 11 if PP is true and equal to 00 if PP is false. This means ki{0,1}k_i \in \{0,1\} and furthermore ui+vi+kib1+b1+1=2b1u_i + v_i + k_i \leq b-1 + b-1 + 1 = 2 b - 1. Using that 0ui,vib10 \leq u_i, v_i \leq b-1 and (x  mod  b)+x/b=x(x \;\text{mod}\; b) + \lfloor x/b \rfloor = x it is quite easy to show that, in fact, w=u+vw = u + v.

The cases where uu or vv is zero makes addition trivial. Similarly, if the number of digits in uu and vv are different, is it quite easy to adjust the algorithm above.

Let us now look at some implementation details. How do we compute z=(x+y)  mod  bz = (x + y) \;\text{mod}\; b and k=(x+y)/bk = \lfloor (x + y)/b \rfloor for 0x,y<b0 \leq x, y < b? We have several options, some of which are

  1. Use 2bbT2 b \leq b_T. Then zz and kk can be computed directly.
  2. Use b=bTb = b_T and the CPU's add and add-with-carry instructions.
  3. Use b=bTb = b_T, but the computations must be done in some portable C++ way.

Option 1 is actually not an option because we insist on using b=bTb = b_T. Option 2 leads to the most efficient code, regarding both space and speed. The problem is that these special instructions are not directly accessible via the C++ standard. Some compilers, though, make it possible to use inline assembly. For instance, GCC has such capabilities.

Option 3 is the way to go. As mentioned earlier, C++ does calculations modulo bTb_T, so z(x+y)  mod  bz \leftarrow (x + y) \;\text{mod}\; b comes ‘for free’ as simply z = x + y in C++. Left is how to detect whether a carry occurs during an addition. One way to do that is the following. Consider z=(x+y)  mod  bz = (x + y) \;\text{mod}\; b for which there are two possibilities. Either z=x+yz = x + y (k=0k = 0) which implies zxz \geq x and zyz \geq y, or we have z+b=x+yz + b = x + y (k=1k = 1) which implies z=x(by)=y(bx)z = x - (b - y) = y - (b - x), leading to z<xz < x and z<yz < y. So k=[z<x]=[z<y]k = [z < x] = [z < y]. Another way to detect whether a carry occurs is to split xx and yy into a low and high part, and then adding the low and high parts seperately—keeping track of a possible intermediate carry, of course.


An algorithm for multiple-precision subtraction is similar to addition, as just considered. Again we set u=(un1u1u0)bu=(u_{n-1} \ldots u_1 u_0)_b and v=(vn1v1v0)bv = (v_{n-1} \ldots v_1 v_0)_b with n1n \geq 1. To ensure that the result w=uvw = u - v is a non-negative integer we furthermore require that uvu \geq v. We now have the algorithm:

w0(u0v0)  mod  b,wi(uiviki)  mod  b,k1[u0<v0],ki+1[ui<vi+ki],i=1,,n1.\begin{aligned} w_0 &\leftarrow (u_0 - v_0) \;\text{mod}\; b, \\ w_i &\leftarrow (u_i - v_i - k_i) \;\text{mod}\; b, \end{aligned} \quad \begin{aligned} k_1 &\leftarrow [u_0 < v_0], \\ k_{i+1} &\leftarrow [u_i < v_i + k_i], \quad i = 1, \ldots, n-1. \end{aligned}

Note that any digit of the result may end up being zero—we only know that 0wbnbn110 \geq w \geq b^n - b^{n-1} - 1. Note furthermore that kn=[u<v]=0k_n = [u < v] = 0. Verification of the algorithm is easily done using the fact that xy=((xy)  mod  b)b[x<y]x - y = ((x-y) \;\text{mod}\; b) - b [x < y] for 0x,y<b0 \leq x, y < b.

The options when implementing the algorithm is, again, much like for addition. Using the CPU's subtract-with-borrow instruction would be ideal here, but it cannot be done portably in C++. The borrow when calculating xyx - y can be computed as simply k[x<y]k \leftarrow [x < y], or, to avoid branching instructions, we can split xx and yy into two parts and do the subtraction for each part seperately.


We seek an algorithm to compute w=uvw = u v where

u=(um1u1u0)b,v=(vn1v1v0)b,w=(wm+n1w1w0)b.u = (u_{m-1} \ldots u_1 u_0)_b, \quad v = (v_{n-1} \ldots v_1 v_0)_b, \quad w = (w_{m+n-1} \ldots w_1 w_0)_b.

We first, however, consider the simpler operation zy+αuz \leftarrow y + \alpha u where

0α<b,y=(ym1y1y0)b,z=(zmz1z0)b.0 \leq \alpha < b, \quad y = (y_{m-1} \ldots y_1 y_0)_b, \quad z = (z_m \ldots z_1 z_0)_b.

The following algorithm suggests itself:

z0(y0+αu0)  mod  b,zi(yi+αui+ki)  mod  b,zmkm.k1(y0+αu0)/b,ki+1(yi+αui+ki)/b,i=1,,m1, \begin{aligned} z_0 &\leftarrow (y_0 + \alpha u_0) \;\text{mod}\; b, \\ z_i &\leftarrow (y_i + \alpha u_i + k_i) \;\text{mod}\; b, \\ z_m &\leftarrow k_m. \end{aligned} \quad \begin{aligned} k_1 &\leftarrow \lfloor (y_0 + \alpha u_0)/b \rfloor, \\ k_{i+1} &\leftarrow \lfloor (y_i + \alpha u_i + k_i)/b \rfloor, \quad i = 1, \ldots, m-1, \\ \text{ } \end{aligned}

If α=0\alpha = 0 then obviously zi=0z_i = 0. If 1αb11 \leq \alpha \leq b-1 then zmz_m may be zero, in which case zm10z_ {m-1} \neq 0, since

2bm1  z  bm1+(bm1)(b1)=bm+1b<bm+12 b^{m-1} \leq \; z \; \leq b^m-1 + (b^m-1)(b-1) = b^{m+1} - b < b^{m+1}

Note that ki<bk_i < b since yi+αui+kib1+(b1)(b1)+b1=b21y_i + \alpha u_i + k_i \leq b-1 + (b-1)(b-1) + b-1 = b^2 - 1. The algorithm above is easily verified, using that zi+bki+1=yi+αui+kiz_i + b k_{i+1} = y_i + \alpha u_i + k_i.

We now turn to w=uvw = u v and get

w=uv=j=0n1bjvju.w = u v = \sum_{j=0}^{n-1} b^j v_j u.

From this we see that we can compute ww by doing a number of (zy+αu)(z \leftarrow y + \alpha u)-type operations, if we start with w=0w=0 and then do in-place updates for each jj. The bjb^j-factor simply determines the ‘digit offset’ on which the updates should be done:

(wm1w1w0)b(000)b,(wm+jwj+1wj)b(wm+j1wj+1wj)b+vj(um1u1u0)b,\begin{aligned} (w_{m-1} \ldots w_1 w_0)_b &\leftarrow (0 \, \ldots \, 0 \, 0)_b, \\ (w_{m+j} \ldots w_{j+1} w_j)_b &\leftarrow (w_{m+j-1} \ldots w_{j+1} w_j)_b + v_j (u_{m-1} \ldots u_1 u_0)_b, \end{aligned}

for j=0,1,,n1j = 0, 1, \ldots, n-1. Note that wm+n1w_{m+n-1} may be zero, in which case wm+n20w_{m+n-2} \neq 0, since bm+n2w<bm+nb^{m+n-2} \leq w < b^{m+n}.

Now for some implementation details. We note that the only non-trivial computation is zy+αx+kz \leftarrow y + \alpha x + k, where 0α,k,x,y<b0 \leq \alpha, k, x, y < b, followed by computing z  mod  bz \;\text{mod}\; b and z/b\lfloor z/b \rfloor. Most CPUs have instructions that can multiply two word-size numbers and produce a double-word answer. As was the case for addition and subtraction, we don't have access to these instructions from standard C++. We have 0z<b20 \leq z < b^2 but every multiplication result in our portable C++ implementation must be smaller than bb. We can do this by using a new base number hh where h2=bh^2 = b (we assume that bb is chosen appropriately so hh is integer) and setting

z=(z3z2z1z0)h,y=(y1y0)h,α=(α1α0)h,x=(x1x0)h,k=(k1k0)h.z = (z_3 z_2 z_1 z_0)_h, \quad y = (y_1 y_0)_h, \quad \alpha = (\alpha_1 \alpha_0)_h, \quad x = (x_1 x_0)_h, \quad k = (k_1 k_0)_h.

We can now use the multiplication algorithm above on a ‘smaller scale’ to compute the product αx\alpha x. We can furthermore expand the methods slightly and incorporate the addition of yy and kk in an elegant way:

(z1z0)h(y1y0)h,ck0,tz0+α0x0+c,z0t  mod  b,ct/b,tz1+α0x1+c,z1t  mod  b,ct/b,z2c,ck1,tz1+α1x0+c,z1t  mod  b,ct/b,tz2+α1x1+c,z2t  mod  b,ct/b,z3c.\begin{aligned} (z_1 z_0)_h &\leftarrow (y_1 y_0)_h, \\ c &\leftarrow k_0, \\ t &\leftarrow z_0 + \alpha_0 x_0 + c, \quad z_0 \leftarrow t \;\text{mod}\; b, \quad c \leftarrow \lfloor t/b \rfloor, \\ t &\leftarrow z_1 + \alpha_0 x_1 + c, \quad z_1 \leftarrow t \;\text{mod}\; b, \quad c \leftarrow \lfloor t/b \rfloor, \\ z_2 &\leftarrow c, \\ c &\leftarrow k_1, \\ t &\leftarrow z_1 + \alpha_1 x_0 + c, \quad z_1 \leftarrow t \;\text{mod}\; b, \quad c \leftarrow \lfloor t/b \rfloor, \\ t &\leftarrow z_2 + \alpha_1 x_1 + c, \quad z_2 \leftarrow t \;\text{mod}\; b, \quad c \leftarrow \lfloor t/b \rfloor, \\ z_3 &\leftarrow c. \end{aligned}

We now have z  mod  b=(z1z0)hz \;\text{mod}\; b = (z_1 z_0)_h and z/b=(z3z2)h\lfloor z/b \rfloor = (z_3 z_2)_h.

Concluding Remarks

We have now covered addition, subtraction, and multiplication of non-negative integers of arbitrary magnitude. Left is how to do division, which will be the subject of the next article.

Update 2010-07-03: The code has undergone some changes since this article was written. See the project page for more information.