janmr blog

Evaluation of Powers

How do you efficiently compute xnx^n for a positive integer nn? Take x15x^{15} as an example. You could take xx and repeatedly multiply by xx 14 times. A better way to do it, however, is this:

  • t0=xt_0=x
  • t1=t0t0=x2t_1=t_0 \cdot t_0 = x^2
  • t2=t0t1=x3t_2=t_0 \cdot t_1 = x^3
  • t3=t1t2=x5t_3=t_1 \cdot t_2 = x^5
  • t4=t3t3=x10t_4=t_3 \cdot t_3 = x^{10}
  • t5=t3t4=x15t_5=t_3 \cdot t_4 = x^{15}

A shorter way to write this is x1,x2,x3,x5,x10,x15x^1,x^2,x^3,x^5,x^{10},x^{15}, where each quantity is obtained by multiplying two of the previous quantities together. We can write it even shorter as 1,2,3,5,10,15, where only the exponents are written. Here each number is obtained by adding together two of the previous numbers. This is called an addition chain and is at the heart of studying the optimal way of evaluating powers. There is no simple expression that computes the minimal number of multiplications a(n)a(n) needed to evaluate xxnn. A list, however, is available from The On-Line Encyclopedia of Integer Sequences, where the first 40 entries are

0, 1, 2, 2, 3, 3, 4, 3, 4, 4, 5, 4, 5, 5, 5, 4, 5, 5, 6, 5, 6, 6, 6, 5, 6, 6, 6, 6, 7, 6, 7, 5, 6, 6, 7, 6, 7, 7, 7, 6, …

We see that a(15)=5a(15)=5, which shows that the addition chain 1,2,3,5,10,15 is indeed the shortest possible, and it follows that the procedure shown above to compute x15x^{15} required the minimal number of multiplications.

Since the numbers in an addition chain grows the fastest by doubling the previous item, it is fairly easy to see that

(1)

a(n)log2(n).a(n) \geq \lceil \log_2(n) \rceil \quad .

(This, and other results related to the evaluation of powers and addition chains can be found in The Art of Computer Programming, Volume 2, Section 4.6.3.)

An algorithm that comes close to this optimal bound is the binary method, which relies on these simple relations:

  • x0=1x^0 = 1
  • x2k=(xk)2x^{2k} = (x^k)^2
  • x2k+1=xx2kx^{2k+1} = x \cdot x^{2k}

A recursive algorithm could readily be made from these, but we wish to have an iterative algorithm. The key here is to consider the slightly more general problem of evaluating yxny \cdot x^n. Here we have the relations:

  • yx0=yy \cdot x^0 = y
  • yx2k=y(x2)ky \cdot x^{2k} = y \cdot (x^2)^k
  • yx2k+1=(yx)x2ky \cdot x^{2k+1} = (y \cdot x) \cdot x^{2k}

This leads immediately to the following C++ code:

number_t power(number_t y, number_t x, unsigned n) {
  while (n) {
    if (n % 2 == 0) {
      x *= x;
      n /= 2;
    } else {
      y *= x;
      n--;
    }
  }
  return y;
}

A slightly improved version (and maybe a bit less elegant) is the following:

number_t power(number_t y, number_t x, unsigned n) {
  if (!n) return y;
  while (n > 1) {
    if (n & 1) y *= x;
    x *= x;
    n >>= 1;
  }
  return y*x;
}

This algorithm performs log2n\lfloor \log_2 n \rfloor multiplications of the type xx2x \leftarrow x^2 and ν(n)\nu(n) multiplications of the type yyxy \leftarrow y \cdot x, where ν(n)\nu(n) is the number of 1s in the binary representation of nn, so all in all it requires

log2n+ν(n)\lfloor \log_2 n \rfloor + \nu(n)

multiplications (sequence A056792 at OEIS).

So now we can evaluate yxny \cdot x^n fairly efficiently. To evaluate xnx^n we can simply use this routine by setting y=1y=1. But that wastes one multiplication because the first time we perform yyxy \leftarrow y \cdot x it will be redundant. Instead we could use power(x, x, n-1), but that could increase the number of multiplications for even nn. A good way to evaluate xnx^n is this:

number_t power(number_t x, unsigned n) {
  if (!n) return (T) 1;
  while (!(n & 1)) {
    x *= x;
    n >>= 1;
  }
  return power(x, x, n-1);
}

This way, when executing power(x, x, n-1), n will always be uneven. This saves one multiplication compared to using just power(1, x, n), so it requires

log2n+ν(n)1\lfloor \log_2 n \rfloor + \nu(n) - 1

multiplications (sequence A014701 at OEIS).

As mentioned above, this algorithm is not optimal, but it is not bad either. In fact, 15 is the smallest value of nn for which the binary algorithm does not use the minimal number of multiplications. Figure 1 below compares the number of multiplications needed by the binary algorithm to the minimal number possible.

Evaluation of powers
Figure 1. Number of multiplications used to evaluate the nth power.

Note also that we have only talked about minimizing the number of multiplications. What if a different cost is associated with each multiplication? For instance, the basic multiple-precision multiplication algorithm described in an earlier post has a cost propertional to m×nm \times n if the factors have mm and nn digits, respectively. Using this cost model, the binary algorithm is actually optimal. This was shown by R. L. Graham, A. C.-C. Yao, and F.-F. Yao in Addition chains with multiplicative cost, Discrete Math., 23 (1978), 115-119 (article available online).