Pippenger's Algorithm

Reduction of multi-scalar multiplication (MSM) to point doubling and addition operations

AKA "Bucket" method

Motivation

Improve upon double-and-add by reducing the total number of doubling and addition operations needed for MSM.

Algorithm Process

Introduction

Multi-Scalar Multiplication (MSM) refers to the efficient computation of expressions in the following form:

S = \sum_{i=1}^{n} k_i P_i
  • $k_i$: the $i$-th $\lambda$-bit scalar

  • $P_i$: the $i$-th point on an elliptic curve

Pippenger's algorithm speeds up MSM by dividing each scalar $k_i$ into multiple $s$-bit chunks, and then grouping and processing scalars by their corresponding bit positions (windows).

1. Scalar Decomposition

First, we decompose the scalars $k_i$:

\begin{align*} k_i &= k_{i,1} + 2^{s}k_{i,2} + ... + 2^{(\lceil\frac{\lambda}{s}\rceil -1)s}k_{i,\lceil\frac{\lambda}{s}\rceil } \\ k_i &= \sum_{j=1}^{\lceil\frac{\lambda}{s}\rceil }2^{(j - 1)s} k_{i,j} \end{align*}
  • $s$: the number of bits per window

  • $\lceil\frac{\lambda}{s}\rceil$: the total number of windows

Through this definition, we can rewrite the original MSM summation as follows:

\begin{align*} S &= \sum_{i=1}^{n} k_i P_i \\ &= \sum_{i=1}^{n} \sum_{j=1}^{\lceil\frac{\lambda}{s}\rceil } 2^{(j-1)s}k_{i,j} P_i \\ &= \sum_{j=1}^{\lceil\frac{\lambda}{s}\rceil } 2^{(j-1)s} \left( \sum_{i=1}^{n} k_{i,j} P_i \right) \\ &= \sum_{j=1}^{\lceil\frac{\lambda}{s}\rceil } 2^{(j-1)s} W_j \end{align*}

Note that $W_j$ is the sum for the $j$-th window.
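As a minimal sketch of the decomposition step (assuming, for illustration only, that scalars fit in a `uint64_t`; real implementations operate on multi-limb field elements):

```cpp
#include <cstdint>
#include <vector>

// Splits a scalar k into ceil(lambda / s) windows of s bits each,
// least-significant window first: k = sum_j 2^{(j-1)s} * windows[j-1].
// Assumes s < 64 so the mask shift is well-defined.
std::vector<uint32_t> DecomposeScalar(uint64_t k, uint32_t lambda, uint32_t s) {
  uint32_t window_num = (lambda + s - 1) / s;  // integer ceil(lambda / s)
  uint64_t mask = (uint64_t{1} << s) - 1;
  std::vector<uint32_t> windows(window_num);
  for (uint32_t j = 0; j < window_num; ++j) {
    windows[j] = static_cast<uint32_t>((k >> (j * s)) & mask);
  }
  return windows;
}
```

For example, `DecomposeScalar(13, 4, 2)` yields the windows `{1, 3}`, matching $13 = 1 + 3 \cdot 2^2$.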

2. Bucket Accumulation

Given our $W_j$, we define the buckets $B_{\ell,j}$ as follows:

B_{\ell, j} = \sum_{i: k_{i,j} = \ell} P_i

In other words, $B_{\ell, j}$ is the $\ell$-th bucket in window $W_j$: the sum of all points $P_i$ whose window digit $k_{i,j}$ equals $\ell$.

Note that $k_{i,j}$ lies in the range $[0, 2^s - 1]$; however, when $k_{i,j} = 0$ the term $0 \cdot P_i$ is the identity and contributes nothing, so no bucket is needed for 0. Therefore, the practical range of $\ell$ is $[1, 2^s - 1]$.

3. Bucket Reduction

Thus, in total, a window $W_j$ is restructured as follows:

W_j = \sum_{i=1}^{n} k_{i,j}P_i = \sum_{\ell=1}^{2^s - 1} \ell B_{\ell, j}

Note that the weighted terms $\ell B_{\ell,j}$ are not computed with further scalar multiplications; instead, a running sum over the buckets from highest $\ell$ to lowest produces the whole weighted sum using only point additions.

Complexity Cost:

  • Original: $(2^s - 1) \times \mathsf{ScalarMul} + (2^s - 2) \times \mathsf{PointAdd}$

  • Bucketed: $(2^{s+1} - 2) \times \mathsf{PointAdd}$

C++ Implementation

#include <vector>

// Computes sum_{l=1}^{m} l * buckets[l - 1] using only point additions.
// Iterates from the highest bucket down, maintaining a running suffix sum.
Bucket ReduceBucketPoints(const std::vector<Bucket>& buckets) {
  Bucket res = Bucket::Zero();
  Bucket sum = Bucket::Zero();
  // size_t is unsigned, so the loop ends when i wraps around to SIZE_MAX.
  for (size_t i = buckets.size() - 1; i != SIZE_MAX; --i) {
    sum += buckets[i];  // sum = B_{i+1} + B_{i+2} + ... + B_m
    res += sum;         // each bucket B_l ends up added l times in total
  }
  return res;
}

Example

W_1 = B_{1,1} + 2B_{2,1} + 3B_{3,1} + 4B_{4,1}
// Round 1
sum = B_(4,1);
res = B_(4,1);
// Round 2
sum = B_(3,1) + B_(4,1);
res = B_(4,1) + [B_(3,1) + B_(4,1)];
// Round 3
sum = B_(2,1) + B_(3,1) + B_(4,1);
res = B_(4,1) + [B_(3,1) + B_(4,1)] + [B_(2,1) + B_(3,1) + B_(4,1)];
// Round 4
sum = B_(1,1) + B_(2,1) + B_(3,1) + B_(4,1);
res = B_(4,1) + [B_(3,1) + B_(4,1)] + [B_(2,1) + B_(3,1) + B_(4,1)] + [B_(1,1) + B_(2,1) + B_(3,1) + B_(4,1)];

// Total:
res = B_(1,1) + 2*B_(2,1) + 3*B_(3,1) + 4*B_(4,1);

4. Window Reduction - Final MSM Result

With the finished individual window computations, we return to the total MSM summation and add all windows together:

S = \sum_{j=1}^{\lceil\frac{\lambda}{s}\rceil } 2^{(j-1)s} W_j

In practice, the computation is calculated recursively (Horner's rule) from $j = \left\lceil \frac{\lambda}{s} \right\rceil$ down to $j = 1$ like so:

T_j = \begin{cases} W_{\lceil \frac{\lambda}{s} \rceil} & j = \left\lceil \frac{\lambda}{s} \right\rceil \\ W_j + 2^s T_{j+1} & 1 \le j < \left\lceil \frac{\lambda}{s} \right\rceil \\ \end{cases}

so that $S = T_1$.

Complexity Cost

\left\lceil \frac{\lambda}{s} \right\rceil(s \times \mathsf{PointDouble} + \mathsf{PointAdd})

C++ Implementation

#include <cstdint>
#include <vector>

// Combines the per-window sums W_1, ..., W_{ceil(lambda/s)} (stored in
// |windows|) via Horner's rule: S = W_1 + 2^s(W_2 + 2^s(W_3 + ...)).
Bucket ReduceWindows(uint32_t lambda, uint32_t s, const std::vector<Bucket>& windows) {
  Bucket res = Bucket::Zero();
  uint32_t window_num = (lambda + s - 1) / s;  // integer ceil(lambda / s)
  for (uint32_t j = window_num - 1; j != UINT32_MAX; --j) {
    // Shift the accumulator up by s bits with s point doublings.
    for (uint32_t i = 0; i < s; ++i) {
      res = res * 2;
    }
    res += windows[j];
  }
  return res;
}

Complexity Analysis

  • To compute each $W_j$, bucket accumulation takes $n \times \mathsf{PointAdd}$: $\left\lceil \frac{\lambda}{s} \right\rceil(n \times \mathsf{PointAdd})$ in total

  • Bucket reduction for each $W_j$ needs $(2^{s+1} - 2) \times \mathsf{PointAdd}$: $\left\lceil \frac{\lambda}{s} \right\rceil((2^{s+1} - 2) \times \mathsf{PointAdd})$ in total

  • The final window reduction producing $S$ needs: $\left\lceil \frac{\lambda}{s} \right\rceil(s \times \mathsf{PointDouble} + \mathsf{PointAdd})$

Total complexity

\left\lceil \frac{\lambda}{s} \right\rceil(s \times \mathsf{PointDouble} + (n + 2^{s+1} - 1) \times \mathsf{PointAdd}) \approx \\ \lambda \times \mathsf{PointDouble} + \left\lceil \frac{\lambda}{s} \right\rceil(n + 2^{s+1} - 1)\times \mathsf{PointAdd}

Optimization Table (when $\lambda = 256$, $n = 2^{20}$)

| $s$ | PointAdd Count |
| --- | --- |
| 2 | 134,218,624 |
| 4 | 67,110,848 |
| 8 | 33,570,784 |
| 16 | 18,874,352 |
| 32 | 68,727,865,336 |
| 64 | 147,573,952,589,680,607,228 |
| 128 | 1,361,129,467,683,753,853,853,498,429,727,074,942,974 |

🔍 This clearly shows that selecting an optimal ss value is critical for performance.
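The counts above can be reproduced from the PointAdd term of the total cost, $\left\lceil \frac{\lambda}{s} \right\rceil (n + 2^{s+1} - 1)$. A small sketch (assuming $\lambda = 256$, and $s + 1 < 64$ so the shift fits in a `uint64_t`):

```cpp
#include <cstdint>

// PointAdd count for one choice of window size s:
// ceil(lambda / s) * (n + 2^{s+1} - 1). Requires s + 1 < 64.
uint64_t PointAddCount(uint64_t lambda, uint64_t s, uint64_t n) {
  uint64_t window_num = (lambda + s - 1) / s;
  return window_num * (n + (uint64_t{1} << (s + 1)) - 1);
}
```

For example, `PointAddCount(256, 16, uint64_t{1} << 20)` returns 18,874,352, the table's minimum: small $s$ pays for many windows, while large $s$ pays exponentially for $2^{s+1} - 1$ bucket additions per window.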

Example

Let's try to run Pippenger's on the following simple example:

\begin{align*} S &= k_1P_1 + k_2P_2 + k_3P_3\\ &=12P_1 + 9P_2 + 13P_3 \end{align*}

1. Scalar Decomposition

Let's define our bits per window $s$ to be 2. In this case, since the maximum number of bits needed to represent the values 12, 9, and 13 is 4, we will have $\lceil\frac{\lambda}{s}\rceil = \mathsf{max\_bits} / s = 4 / 2 = 2$ windows.

Note that in binary we have:

k_1 =12 = \underbrace{0}_{2^0}\underbrace{0}_{2^1}\underbrace{1}_{2^2}\underbrace{1}_{2^3}\\ k_2 = 9 = \underbrace{1}_{2^0}\underbrace{0}_{2^1}\underbrace{0}_{2^2}\underbrace{1}_{2^3} \\ k_3 =13 = \underbrace{1}_{2^0}\underbrace{0}_{2^1}\underbrace{1}_{2^2}\underbrace{1}_{2^3}

Decomposing our $k_1$, $k_2$, and $k_3$ results in the following:

\begin{align*} k_i &= k_{i,1} + 2^{s}k_{i,2} + ... + 2^{(\lceil\frac{\lambda}{s}\rceil -1)s}k_{i,\lceil\frac{\lambda}{s}\rceil } \\ \end{align*}
\begin{align*} \begin{array}{rlrlrl} k_1 = 12 &= \overbrace{\underbrace{0}_{2^0}\underbrace{0}_{2^1}}^{\text{window 1}}\overbrace{\underbrace{1}_{2^2}\underbrace{1}_{2^3}}^{\text{window 2}} & \quad k_2 = 9 &= \overbrace{\underbrace{1}_{2^0}\underbrace{0}_{2^1}}^{\text{window 1}}\overbrace{\underbrace{0}_{2^2}\underbrace{1}_{2^3}}^{\text{window 2}} & \quad k_3 = 13 &= \overbrace{\underbrace{1}_{2^0}\underbrace{0}_{2^1}}^{\text{window 1}}\overbrace{\underbrace{1}_{2^2}\underbrace{1}_{2^3}}^{\text{window 2}} \\[1ex] &= k_{1,1} + 2^sk_{1,2} & \quad &= k_{2,1} + 2^sk_{2,2} & \quad &= k_{3,1} + 2^sk_{3,2} \\[1ex] &= 0 + 3(2^2) & \quad &= 1 + 2(2^2) & \quad &= 1 + 3(2^2) \\[1ex] \end{array} \end{align*}

2+3. Bucket Accumulation + Reduction

W_j = \sum_{i=1}^{n} k_{i,j}P_i = \sum_{\ell=1}^{2^s - 1} \ell B_{\ell, j}

Given our $s$ of 2, we now have buckets $[B_{1,j}, B_{2,j}, B_{3,j}]$ for a given window $W_j$. Our windows are now restructured into buckets like so:

\begin{align*} W_1 &= \overbrace{1}^{k_{2,1}}P_2 + \overbrace{1}^{k_{3,1}}P_3 = 1(P_2+P_3) = B_{1,1}\\ W_2 &= \overbrace{3}^{k_{1,2}}P_1 + \overbrace{2}^{k_{2,2}}P_2 + \overbrace{3}^{k_{3,2}}P_3 = 2(P_2) + 3(P_1 + P_3) = 2B_{2,2} + 3B_{3,2} \end{align*}

From bottom up, in our bucket accumulation stage we compute the buckets:

| Bucket | Computation |
| --- | --- |
| $B_{1,1}$ | $P_2 + P_3$ |
| $B_{2,2}$ | $P_2$ |
| $B_{3,2}$ | $P_1 + P_3$ |

...and in our bucket reduction stage we compute the windows:

\begin{align*} W_1 &= B_{1,1}\\ W_2 &= (B_{2,2} + B_{2,2}) + (B_{3,2} + B_{3,2} + B_{3,2}) \end{align*}

4. Window Reduction - Final MSM Result

Putting all the windows together for the MSM sum, we get the following:

\begin{align*} S &=\sum_{j=1}^{\lceil\frac{\lambda}{s}\rceil } 2^{(j-1)s} W_j \\ &= W_1 + 2^2W_2 \\ \end{align*}
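Expanding the windows confirms that the example recovers the original sum:

```latex
\begin{align*} S &= W_1 + 2^2 W_2 \\ &= (P_2 + P_3) + 4(3P_1 + 2P_2 + 3P_3) \\ &= 12P_1 + 9P_2 + 13P_3 \end{align*}
```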


Written by Ryan Kim & Ashley Jeong of A41
