AKA "Bucket" method
Motivation
Improve upon the naive approach by reducing the total number of point double and add operations needed to compute an MSM.
Algorithm Process
Introduction
Multi-Scalar Multiplication (MSM) refers to the efficient computation of expressions in the following form:
$$S = \sum_{i=1}^{n} k_i P_i$$

- $k_i$: the $i$-th $\lambda$-bit scalar
- $P_i$: the $i$-th point on an elliptic curve
Pippenger's algorithm speeds up MSM by dividing each scalar $k_i$ into multiple $s$-bit chunks, and then grouping and processing scalars by their corresponding bit positions (windows).
1. Scalar Decomposition
First, we decompose the scalars $k_i$:
$$\begin{align*}
k_i &= k_{i,1} + 2^{s}k_{i,2} + \dots + 2^{(\lceil\frac{\lambda}{s}\rceil - 1)s}k_{i,\lceil\frac{\lambda}{s}\rceil} \\
&= \sum_{j=1}^{\lceil\frac{\lambda}{s}\rceil} 2^{(j-1)s} k_{i,j}
\end{align*}$$

- $s$: the number of bits per window
- $\lceil\frac{\lambda}{s}\rceil$: the total number of windows
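As a minimal sketch of this decomposition, assuming for simplicity that scalars fit in 64 bits (the `ExtractWindow` helper is hypothetical, introduced here only for illustration):

```cpp
#include <cstdint>

// Extracts k_{i,j}, the j-th s-bit window of scalar k (j is 1-indexed to
// match the notation above): (k >> ((j - 1) * s)) mod 2^s.
uint64_t ExtractWindow(uint64_t k, uint32_t j, uint32_t s) {
  const uint64_t mask = (uint64_t{1} << s) - 1;
  return (k >> ((j - 1) * s)) & mask;
}
```

For example, with $s = 2$, `ExtractWindow(13, 1, 2)` returns 1 and `ExtractWindow(13, 2, 2)` returns 3, matching $13 = 1 + 3 \cdot 2^2$.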
Through this definition, we can rewrite the original MSM summation as follows:
$$\begin{align*}
S &= \sum_{i=1}^{n} k_i P_i \\
&= \sum_{i=1}^{n} \sum_{j=1}^{\lceil\frac{\lambda}{s}\rceil} 2^{(j-1)s} k_{i,j} P_i \\
&= \sum_{j=1}^{\lceil\frac{\lambda}{s}\rceil} 2^{(j-1)s} \left( \sum_{i=1}^{n} k_{i,j} P_i \right) \\
&= \sum_{j=1}^{\lceil\frac{\lambda}{s}\rceil} 2^{(j-1)s} W_j
\end{align*}$$

Note that $W_j$ is the sum for the $j$-th window.
2. Bucket Accumulation
Given our $W_j$, we define the buckets $B$ as follows:
$$B_{\ell, j} = \sum_{i:\, k_{i,j} = \ell} P_i$$

In other words, $B_{\ell, j}$ is the $\ell$-th bucket in window $W_j$: the sum of all points $P_i$ sharing the same window scalar $k_{i,j} = \ell$.
Note that $k_{i,j}$ must lie in the range $[0, 2^s - 1]$; however, if $k_{i,j}$ is 0, the corresponding points contribute nothing to the sum, so the $\ell = 0$ bucket can be skipped entirely. Therefore, the practical range of $k_{i,j}$ is $[1, 2^s - 1]$.
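A minimal sketch of this accumulation step, under the same assumptions as the implementations below (a `Bucket` type offering `Bucket::Zero()` and `+=`, here also standing in for curve points, with the window scalars already extracted; `AccumulateBuckets` is a hypothetical helper):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Accumulates the points of one window into buckets: buckets[l - 1] sums
// all P_i whose window scalar k_{i,j} equals l. The l = 0 bucket is
// skipped entirely, since a zero scalar contributes nothing.
std::vector<Bucket> AccumulateBuckets(const std::vector<uint64_t>& window_scalars,
                                      const std::vector<Bucket>& points,
                                      uint32_t s) {
  std::vector<Bucket> buckets((std::size_t{1} << s) - 1, Bucket::Zero());
  for (std::size_t i = 0; i < points.size(); ++i) {
    const uint64_t l = window_scalars[i];
    if (l != 0) buckets[l - 1] += points[i];
  }
  return buckets;
}
```

This costs at most one point addition per input point, which is where the $n \times \mathsf{PointAdd}$ term in the complexity analysis below comes from.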
3. Bucket Reduction
Thus, in total, a window $W_j$ is restructured as follows:
$$W_j = \sum_{i=1}^{n} k_{i,j} P_i = \sum_{\ell=1}^{2^s - 1} \ell B_{\ell, j}$$

Note that each $\ell B_{\ell,j}$ term is computed through a running sum of successive point additions, not through further scalar multiplications.
Complexity Cost:
- Original: $(2^s - 1) \times \mathsf{ScalarMul} + (2^s - 2) \times \mathsf{PointAdd}$
- Bucketed: $(2^{s+1} - 2) \times \mathsf{PointAdd}$, since the running sum performs two point additions per bucket (one into the running sum, one into the result) across the $2^s - 1$ buckets
C++ Implementation
```cpp
// Computes Σ_ℓ ℓ·B_{ℓ,j} over the buckets of one window via the
// running-sum trick: iterating from the highest bucket down, `sum` holds
// the suffix sum B_ℓ + B_{ℓ+1} + ..., and adding it into `res` each round
// weights every bucket by its index.
Bucket ReduceBucketPoints(const std::vector<Bucket>& buckets) {
  Bucket res = Bucket::Zero();
  Bucket sum = Bucket::Zero();
  // size_t wraps to SIZE_MAX when decremented below zero, ending the loop.
  for (size_t i = buckets.size() - 1; i != SIZE_MAX; --i) {
    sum += buckets[i];
    res += sum;
  }
  return res;
}
```
Example
$$W_1 = B_{1,1} + 2B_{2,1} + 3B_{3,1} + 4B_{4,1}$$
```cpp
// Round 1
sum = B_(4,1);
res = B_(4,1);
// Round 2
sum = B_(3,1) + B_(4,1);
res = B_(4,1) + [B_(3,1) + B_(4,1)];
// Round 3
sum = B_(2,1) + B_(3,1) + B_(4,1);
res = B_(4,1) + [B_(3,1) + B_(4,1)] + [B_(2,1) + B_(3,1) + B_(4,1)];
// Round 4
sum = B_(1,1) + B_(2,1) + B_(3,1) + B_(4,1);
res = B_(4,1) + [B_(3,1) + B_(4,1)] + [B_(2,1) + B_(3,1) + B_(4,1)] + [B_(1,1) + B_(2,1) + B_(3,1) + B_(4,1)];
// Total:
res = B_(1,1) + 2*B_(2,1) + 3*B_(3,1) + 4*B_(4,1);
```
4. Window Reduction - Final MSM Result
With the finished individual window computations, we return to the total MSM summation and add all windows together:
$$S = \sum_{j=1}^{\lceil\frac{\lambda}{s}\rceil} 2^{(j-1)s} W_j$$

In practice, this is evaluated Horner-style from $j = \left\lceil \frac{\lambda}{s} \right\rceil$ down to $j = 1$, with the final result $S = T_1$:

$$T_j = \begin{cases}
W_j & j = \left\lceil \frac{\lambda}{s} \right\rceil \\
W_j + 2^s T_{j+1} & 1 \le j < \left\lceil \frac{\lambda}{s} \right\rceil
\end{cases}$$

Complexity Cost
$$\left\lceil \frac{\lambda}{s} \right\rceil(s \times \mathsf{PointDouble} + \mathsf{PointAdd})$$

C++ Implementation
```cpp
// Combines the per-window sums Horner-style: S = Σ_j 2^{(j-1)s} · W_j,
// processing windows from most significant to least significant.
// buckets[j] holds the (j+1)-th window sum W_{j+1} (0-indexed).
Bucket ReduceWindows(uint32_t lambda, uint32_t s, const std::vector<Bucket>& buckets) {
  Bucket res = Bucket::Zero();
  // Ceiling division: the total number of windows.
  uint32_t window_num = (lambda + s - 1) / s;
  // uint32_t wraps to UINT32_MAX when decremented below zero, ending the loop.
  for (uint32_t j = window_num - 1; j != UINT32_MAX; --j) {
    // Shift the accumulator left by one window: s point doublings.
    for (uint32_t i = 0; i < s; ++i) {
      res += res;
    }
    res += buckets[j];
  }
  return res;
}
```
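Putting the four steps together, here is a hedged end-to-end sketch that reuses `ReduceBucketPoints` and `ReduceWindows` above, plus the hypothetical `ExtractWindow` and `AccumulateBuckets` helpers sketched earlier (scalars again assumed to fit in 64 bits):

```cpp
// Full Pippenger MSM sketch: computes S = Σ k_i · P_i for λ-bit scalars
// and a window size of s bits.
Bucket PippengerMSM(const std::vector<uint64_t>& scalars,
                    const std::vector<Bucket>& points,
                    uint32_t lambda, uint32_t s) {
  const uint32_t window_num = (lambda + s - 1) / s;
  std::vector<Bucket> window_sums;
  window_sums.reserve(window_num);
  // Steps 1-3: per window, extract the window scalars, accumulate them
  // into buckets, and reduce the buckets into the window sum W_j.
  for (uint32_t j = 1; j <= window_num; ++j) {
    std::vector<uint64_t> window_scalars(scalars.size());
    for (size_t i = 0; i < scalars.size(); ++i) {
      window_scalars[i] = ExtractWindow(scalars[i], j, s);
    }
    window_sums.push_back(ReduceBucketPoints(
        AccumulateBuckets(window_scalars, points, s)));
  }
  // Step 4: combine the window sums into the final MSM result.
  return ReduceWindows(lambda, s, window_sums);
}
```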
Complexity Analysis
- Bucket accumulation: computing the buckets of each $W_j$ requires $n \times \mathsf{PointAdd}$, for a total of $\left\lceil \frac{\lambda}{s} \right\rceil(n \times \mathsf{PointAdd})$
- Bucket reduction: each $W_j$ needs $(2^{s+1} - 2) \times \mathsf{PointAdd}$, for a total of $\left\lceil \frac{\lambda}{s} \right\rceil((2^{s+1} - 2) \times \mathsf{PointAdd})$
- Window reduction: the final $S$ needs $\left\lceil \frac{\lambda}{s} \right\rceil(s \times \mathsf{PointDouble} + \mathsf{PointAdd})$
Total complexity

$$\left\lceil \frac{\lambda}{s} \right\rceil\left(s \times \mathsf{PointDouble} + (n + 2^{s+1} - 1) \times \mathsf{PointAdd}\right) \approx \lambda \times \mathsf{PointDouble} + \left\lceil \frac{\lambda}{s} \right\rceil(n + 2^{s+1} - 1) \times \mathsf{PointAdd}$$

Optimization Table (when $n = 2^{20}$)

[Table of total operation counts for varying window sizes $s$.]
🔍 This clearly shows that selecting an optimal $s$ value is critical for performance.
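To make the trade-off concrete, the dominant $\mathsf{PointAdd}$ term of the formula above can be evaluated directly. The sketch below assumes, purely for illustration, $\lambda = 256$ and $n = 2^{20}$:

```cpp
#include <cstdint>
#include <cstdio>
#include <initializer_list>

// Evaluates ceil(λ/s) * (n + 2^(s+1) - 1), the PointAdd term of the
// total-complexity formula, for several window sizes s.
int main() {
  const uint64_t lambda = 256;           // illustrative scalar bit-width
  const uint64_t n = uint64_t{1} << 20;  // illustrative number of points
  for (uint64_t s : {1, 4, 8, 12, 16, 20}) {
    const uint64_t windows = (lambda + s - 1) / s;
    const uint64_t adds = windows * (n + (uint64_t{1} << (s + 1)) - 1);
    std::printf("s = %2llu -> ~%llu PointAdds\n",
                static_cast<unsigned long long>(s),
                static_cast<unsigned long long>(adds));
  }
  return 0;
}
```

Under these assumptions the count is minimized around mid-sized windows: very small $s$ pays for many windows, while very large $s$ pays for exponentially many buckets.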
Example
Let's try running Pippenger's algorithm on the following simple example:
$$\begin{align*}
S &= k_1 P_1 + k_2 P_2 + k_3 P_3 \\
&= 12 P_1 + 9 P_2 + 13 P_3
\end{align*}$$

1. Scalar Decomposition
Let's define our bits per window $s$ to be 2. Since the maximum number of bits needed to represent the values 12, 9, and 13 is 4, we will have $\lceil\frac{\lambda}{s}\rceil = \mathsf{max\_bits} / s = 4 / 2 = 2$ windows.
Note that in binary we have:
$$\begin{align*}
k_1 = 12 &= \underbrace{0}_{2^0}\underbrace{0}_{2^1}\underbrace{1}_{2^2}\underbrace{1}_{2^3} \\
k_2 = 9 &= \underbrace{1}_{2^0}\underbrace{0}_{2^1}\underbrace{0}_{2^2}\underbrace{1}_{2^3} \\
k_3 = 13 &= \underbrace{1}_{2^0}\underbrace{0}_{2^1}\underbrace{1}_{2^2}\underbrace{1}_{2^3}
\end{align*}$$

Decomposing our $k_1$, $k_2$, and $k_3$ results in the following:
$$k_i = k_{i,1} + 2^{s}k_{i,2} + \dots + 2^{(\lceil\frac{\lambda}{s}\rceil - 1)s}k_{i,\lceil\frac{\lambda}{s}\rceil}$$

$$\begin{array}{rlrlrl}
k_1 = 12 &= \overbrace{\underbrace{0}_{2^0}\underbrace{0}_{2^1}}^{\text{window 1}}\overbrace{\underbrace{1}_{2^2}\underbrace{1}_{2^3}}^{\text{window 2}}
& \quad k_2 = 9 &= \overbrace{\underbrace{1}_{2^0}\underbrace{0}_{2^1}}^{\text{window 1}}\overbrace{\underbrace{0}_{2^2}\underbrace{1}_{2^3}}^{\text{window 2}}
& \quad k_3 = 13 &= \overbrace{\underbrace{1}_{2^0}\underbrace{0}_{2^1}}^{\text{window 1}}\overbrace{\underbrace{1}_{2^2}\underbrace{1}_{2^3}}^{\text{window 2}} \\[1ex]
&= k_{1,1} + 2^s k_{1,2}
& \quad &= k_{2,1} + 2^s k_{2,2}
& \quad &= k_{3,1} + 2^s k_{3,2} \\[1ex]
&= 0 + 3(2^2)
& \quad &= 1 + 2(2^2)
& \quad &= 1 + 3(2^2)
\end{array}$$

2+3. Bucket Accumulation + Reduction
$$W_j = \sum_{i=1}^{n} k_{i,j} P_i = \sum_{\ell=1}^{2^s - 1} \ell B_{\ell, j}$$

Given our $s$ of 2, we now have buckets $[B_{1,j}, B_{2,j}, B_{3,j}]$ for each window $W_j$. Our windows are now restructured into buckets like so:
$$\begin{align*}
W_1 &= \overbrace{1}^{k_{2,1}}P_2 + \overbrace{1}^{k_{3,1}}P_3 = 1(P_2 + P_3) = B_{1,1}\\
W_2 &= \overbrace{3}^{k_{1,2}}P_1 + \overbrace{2}^{k_{2,2}}P_2 + \overbrace{3}^{k_{3,2}}P_3 = 2(P_2) + 3(P_1 + P_3) = 2B_{2,2} + 3B_{3,2}
\end{align*}$$

From the bottom up, in our bucket accumulation stage we compute the buckets:

$$B_{1,1} = P_2 + P_3, \qquad B_{2,2} = P_2, \qquad B_{3,2} = P_1 + P_3$$
...and in our bucket reduction stage we compute the windows:
$$\begin{align*}
W_1 &= B_{1,1}\\
W_2 &= (B_{2,2} + B_{2,2}) + (B_{3,2} + B_{3,2} + B_{3,2})
\end{align*}$$

4. Window Reduction - Final MSM Result
Putting all the windows together for the MSM sum, we get the following:
$$\begin{align*}
S &= \sum_{j=1}^{\lceil\frac{\lambda}{s}\rceil} 2^{(j-1)s} W_j \\
&= W_1 + 2^2 W_2
\end{align*}$$