Pippenger's Algorithm

Reduction of multi-scalar multiplication (MSM) into double and add operations



Also known as the "bucket method".

Motivation

Improve upon double-and-add by reducing the total number of double and add operations needed for multi-scalar multiplication (MSM).

Algorithm Process

Introduction

Multi-Scalar Multiplication (MSM) refers to the efficient computation of expressions in the following form:

$$S = \sum_{i=1}^{n} k_i P_i$$

  • $k_i$: the $i$-th $\lambda$-bit scalar

  • $P_i$: the $i$-th point on an elliptic curve

Pippenger's algorithm speeds up MSM by dividing each scalar $k_i$ into multiple $s$-bit chunks, and then grouping and processing the chunks by their corresponding bit positions (windows).

1. Scalar Decomposition

First, we decompose the scalars $k_i$:

$$\begin{align*} k_i &= k_{i,1} + 2^{s}k_{i,2} + \dots + 2^{(\lceil\frac{\lambda}{s}\rceil - 1)s}k_{i,\lceil\frac{\lambda}{s}\rceil} \\ &= \sum_{j=1}^{\lceil\frac{\lambda}{s}\rceil} 2^{(j-1)s} k_{i,j} \end{align*}$$

  • $s$: the number of bits per window

  • $\lceil\frac{\lambda}{s}\rceil$: the total number of windows

Through this definition, we can rewrite the original MSM summation as follows:

$$\begin{align*} S &= \sum_{i=1}^{n} k_i P_i \\ &= \sum_{i=1}^{n} \sum_{j=1}^{\lceil\frac{\lambda}{s}\rceil} 2^{(j-1)s}k_{i,j} P_i \\ &= \sum_{j=1}^{\lceil\frac{\lambda}{s}\rceil} 2^{(j-1)s} \left( \sum_{i=1}^{n} k_{i,j} P_i \right) \\ &= \sum_{j=1}^{\lceil\frac{\lambda}{s}\rceil} 2^{(j-1)s} W_j \end{align*}$$

Note that $W_j = \sum_{i=1}^{n} k_{i,j} P_i$ is the sum for the $j$-th window.
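The decomposition above can be sketched in a few lines. This is a minimal illustration, not from the original text: it uses 64-bit integers in place of full $\lambda$-bit field elements, and `DecomposeScalar` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Split a scalar k into ceil(lambda / s) windows of s bits each,
// least-significant window first, so that k = sum_j 2^{(j-1)s} * k_j.
std::vector<uint64_t> DecomposeScalar(uint64_t k, uint32_t lambda, uint32_t s) {
  uint32_t window_num = (lambda + s - 1) / s;  // ceil(lambda / s)
  std::vector<uint64_t> windows(window_num);
  const uint64_t mask = (uint64_t{1} << s) - 1;
  for (uint32_t j = 0; j < window_num; ++j) {
    windows[j] = (k >> (j * s)) & mask;  // the j-th s-bit chunk of k
  }
  return windows;
}
```

For instance, with $\lambda = 4$ and $s = 2$, the scalar 13 decomposes into windows $\{1, 3\}$, since $13 = 1 + 3 \cdot 2^2$.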

2. Bucket Accumulation

Given our $W_j$, we denote buckets $B$ as the following:

$$B_{\ell, j} = \sum_{i:\, k_{i,j} = \ell} P_i$$

In other words, $B_{\ell, j}$ refers to the $\ell$-th bucket in window $W_j$, consisting of the sum of all points $P_i$ whose window digit $k_{i,j}$ equals $\ell$.

Note that $k_{i,j}$ must be within the range $[0, 2^s - 1]$; however, points whose digit $k_{i,j}$ is 0 contribute nothing to the window sum, so the bucket for 0 can be skipped. Therefore, the practical range of $k_{i,j}$ is $[1, 2^s - 1]$.

3. Bucket Reduction

Thus, in total, a window $W_j$ will be restructured as follows:

$$W_j = \sum_{i=1}^{n} k_{i,j}P_i = \sum_{\ell=1}^{2^s - 1} \ell B_{\ell, j}$$

Note that each $\ell B_{\ell,j}$ is computed through a running sum of successive point additions, not through further scalar multiplications.

Complexity Cost:

  • Original: $(2^s - 1) \times \mathsf{ScalarMul} + (2^s - 2) \times \mathsf{PointAdd}$

  • Bucketed: $(2^{s+1} - 2) \times \mathsf{PointAdd}$

C++ Implementation

Bucket ReduceBucketPoints(const std::vector<Bucket>& buckets) {
  Bucket res = Bucket::Zero();
  Bucket sum = Bucket::Zero();
  // Iterate from the highest bucket down. The running sum enters |res| once
  // per remaining iteration, so bucket B_l is added l times in total.
  // (size_t wraps to SIZE_MAX after 0, which terminates the loop.)
  for (size_t i = buckets.size() - 1; i != SIZE_MAX; --i) {
    sum += buckets[i];  // sum = B_{l} + B_{l+1} + ... + B_{2^s - 1}
    res += sum;         // res accumulates one copy of each running sum
  }
  return res;
}

Example

$$W_1 = B_{1,1} + 2B_{2,1} + 3B_{3,1} + 4B_{4,1}$$
// Round 1
sum = B_(4,1);
res = B_(4,1);
// Round 2
sum = B_(3,1) + B_(4,1);
res = B_(4,1) + [B_(3,1) + B_(4,1)];
// Round 3
sum = B_(2,1) + B_(3,1) + B_(4,1);
res = B_(4,1) + [B_(3,1) + B_(4,1)] + [B_(2,1) + B_(3,1) + B_(4,1)];
// Round 4
sum = B_(1,1) + B_(2,1) + B_(3,1) + B_(4,1);
res = B_(4,1) + [B_(3,1) + B_(4,1)] + [B_(2,1) + B_(3,1) + B_(4,1)] + [B_(1,1) + B_(2,1) + B_(3,1) + B_(4,1)];

// Total:
res = B_(1,1) + 2*B_(2,1) + 3*B_(3,1) + 4*B_(4,1);

4. Window Reduction - Final MSM Result

With the individual window computations finished, we return to the total MSM summation and add all windows together:

$$S = \sum_{j=1}^{\lceil\frac{\lambda}{s}\rceil} 2^{(j-1)s} W_j$$

In practice, the computation is calculated recursively from $j = \lceil\frac{\lambda}{s}\rceil$ down to $j = 1$ like so:

$$T_j = \begin{cases} W_j & j = \lceil\frac{\lambda}{s}\rceil \\ W_j + 2^s T_{j+1} & 1 \le j < \lceil\frac{\lambda}{s}\rceil \end{cases}$$

so that $S = T_1$.

Complexity Cost

$$\left\lceil \frac{\lambda}{s} \right\rceil (s \times \mathsf{PointDouble} + \mathsf{PointAdd})$$

C++ Implementation

Bucket ReduceWindows(uint32_t lambda, uint32_t s, const std::vector<Bucket>& windows) {
  Bucket res = Bucket::Zero();
  // Integer ceiling of lambda / s (plain lambda / s would truncate).
  uint32_t window_num = (lambda + s - 1) / s;
  for (uint32_t j = window_num - 1; j != UINT32_MAX; --j) {
    // s point doublings shift the accumulator up by one window.
    for (uint32_t i = 0; i < s; ++i) {
      res = res * 2;
    }
    res += windows[j];
  }
  return res;
}

Complexity Analysis

  • Bucket accumulation: computing the buckets of each window $W_j$ requires $n \times \mathsf{PointAdd}$, for a total of $\left\lceil \frac{\lambda}{s} \right\rceil (n \times \mathsf{PointAdd})$

  • Bucket reduction: each $W_j$ needs $(2^{s+1} - 2) \times \mathsf{PointAdd}$, for a total of $\left\lceil \frac{\lambda}{s} \right\rceil ((2^{s+1} - 2) \times \mathsf{PointAdd})$

  • Window reduction: the final $S$ needs $\left\lceil \frac{\lambda}{s} \right\rceil (s \times \mathsf{PointDouble} + \mathsf{PointAdd})$

Total complexity

$$\left\lceil \frac{\lambda}{s} \right\rceil (s \times \mathsf{PointDouble} + (n + 2^{s+1} - 1) \times \mathsf{PointAdd}) \approx \lambda \times \mathsf{PointDouble} + \left\lceil \frac{\lambda}{s} \right\rceil (n + 2^{s+1} - 1) \times \mathsf{PointAdd}$$
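The PointAdd term of this formula can be evaluated directly to compare choices of $s$. A small sketch (`PointAddCount` is a hypothetical helper; it is valid only while $2^{s+1}$ fits in 64 bits):

```cpp
#include <cassert>
#include <cstdint>

// PointAdd count from the total-complexity formula:
// ceil(lambda / s) * (n + 2^{s+1} - 1).
uint64_t PointAddCount(uint64_t lambda, uint64_t s, uint64_t n) {
  uint64_t window_num = (lambda + s - 1) / s;  // ceil(lambda / s)
  return window_num * (n + ((uint64_t{1} << (s + 1)) - 1));
}
```

For $\lambda = 254$ and $n = 2^{20}$, the count first falls as $s$ grows (fewer windows), then explodes once the $2^{s+1}$ bucket term dominates.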

Optimization Table (When $\lambda = 254$, $n = 2^{20}$)

| $s$ | PointAdd Count |
| --- | --- |
| 2 | 133,170,041 |
| 4 | 66,062,241 |
| 8 | 32,521,697 |
| 16 | 17,694,705 |
| 32 | 60,136,882,169 |
| 64 | 110,680,464,442,260,455,421 |
| 128 | 680,564,733,841,876,926,926,749,214,863,537,471,487 |

🔍 This clearly shows that selecting an optimal sss value is critical for performance.

Example

Let's try to run Pippenger's on the following simple example:

$$\begin{align*} S &= k_1P_1 + k_2P_2 + k_3P_3 \\ &= 12P_1 + 9P_2 + 13P_3 \end{align*}$$

1. Scalar Decomposition

Let's define our bits per window $s$ to be 2. Since the maximum number of bits needed to represent the values 12, 9, and 13 is 4, we will have $\lceil\frac{\lambda}{s}\rceil = \mathsf{max\_bits} / s = 4 / 2 = 2$ windows.

Note that in binary (least-significant bit first) we have:

$$\begin{align*} k_1 &= 12 = \underbrace{0}_{2^0}\underbrace{0}_{2^1}\underbrace{1}_{2^2}\underbrace{1}_{2^3} \\ k_2 &= 9 = \underbrace{1}_{2^0}\underbrace{0}_{2^1}\underbrace{0}_{2^2}\underbrace{1}_{2^3} \\ k_3 &= 13 = \underbrace{1}_{2^0}\underbrace{0}_{2^1}\underbrace{1}_{2^2}\underbrace{1}_{2^3} \end{align*}$$

Decomposing our $k_1$, $k_2$, and $k_3$ results in the following:

$$k_i = k_{i,1} + 2^{s}k_{i,2} + \dots + 2^{(\lceil\frac{\lambda}{s}\rceil - 1)s}k_{i,\lceil\frac{\lambda}{s}\rceil}$$

$$\begin{align*} k_1 = 12 &= k_{1,1} + 2^s k_{1,2} = 0 + 3(2^2) \\ k_2 = 9 &= k_{2,1} + 2^s k_{2,2} = 1 + 2(2^2) \\ k_3 = 13 &= k_{3,1} + 2^s k_{3,2} = 1 + 3(2^2) \end{align*}$$

where window 1 holds the two low bits of each scalar and window 2 holds the two high bits.

2+3. Bucket Accumulation + Reduction

$$W_j = \sum_{i=1}^{n} k_{i,j}P_i = \sum_{\ell=1}^{2^s - 1} \ell B_{\ell, j}$$

Given our $s$ of 2, we now have buckets $[B_{1,j}, B_{2,j}, B_{3,j}]$ for a given window $W_j$. Our windows are now restructured into buckets like so:

$$\begin{align*} W_1 &= \overbrace{1}^{k_{2,1}}P_2 + \overbrace{1}^{k_{3,1}}P_3 = 1(P_2 + P_3) = B_{1,1} \\ W_2 &= \overbrace{3}^{k_{1,2}}P_1 + \overbrace{2}^{k_{2,2}}P_2 + \overbrace{3}^{k_{3,2}}P_3 = 2(P_2) + 3(P_1 + P_3) = 2B_{2,2} + 3B_{3,2} \end{align*}$$

From bottom up, in our bucket accumulation stage we compute the buckets:

| Bucket | Computation |
| --- | --- |
| $B_{1,1}$ | $P_2 + P_3$ |
| $B_{2,2}$ | $P_2$ |
| $B_{3,2}$ | $P_1 + P_3$ |

...and in our bucket reduction stage we compute the windows:

$$\begin{align*} W_1 &= B_{1,1} \\ W_2 &= (B_{2,2} + B_{2,2}) + (B_{3,2} + B_{3,2} + B_{3,2}) \end{align*}$$

4. Window Reduction - Final MSM Result

Putting all the windows together for the MSM sum, we get the following:

$$\begin{align*} S &= \sum_{j=1}^{\lceil\frac{\lambda}{s}\rceil} 2^{(j-1)s} W_j \\ &= W_1 + 2^2 W_2 \\ &= (P_2 + P_3) + 4(2P_2 + 3P_1 + 3P_3) \\ &= 12P_1 + 9P_2 + 13P_3 \end{align*}$$

References

  • https://hackmd.io/lNOsNGikQgO0hLjYvH6HJA
  • https://eprint.iacr.org/2022/1321.pdf

Written by Ashley Jeong & Ryan Kim of A41