
Signed Bucket Index

Presentation: https://youtu.be/4bhCdM7B-Mk



AKA " method"

Note: this method is best used when negation in the group is cheap, as is the case for elliptic curves.

Motivation

When following a bucket-based method like Pippenger's algorithm, the amount of memory required for the buckets is demanding. How can we reduce the number of buckets per window from $2^s - 1$ to $2^{s-1}$, where $s$ is the number of bits per window, and reduce overhead?

Explanation

Naive Pippenger's

Let's take a look at an example of creating buckets with Pippenger's algorithm. Let's say the number of windows is $\lceil\frac{\lambda}{s}\rceil = 2$ and the bit size per window is $s = 3$. We'll look at the following example, where $k$ denotes scalars and $P$ refers to points on an elliptic curve for the MSM problem.

$$\begin{align*} S &= k_1P_1 + k_2P_2 + k_3P_3 + k_4P_4 + k_5P_5 + k_6P_6 + k_7P_7 \\ &= 57P_1 + 50P_2 + 43P_3 + 36P_4 + 29P_5 + 22P_6 + 15P_7 \end{align*}$$
$$\begin{align*}
k_1 = 57 &= \overbrace{\underbrace{1}_{2^0}\underbrace{0}_{2^1}\underbrace{0}_{2^2}}^{\text{window 1}}\overbrace{\underbrace{1}_{2^3}\underbrace{1}_{2^4}\underbrace{1}_{2^5}}^{\text{window 2}} = k_{1,1} + k_{1,2}2^s = 1 + 7(2^3) \\
k_2 = 50 &= \overbrace{\underbrace{0}_{2^0}\underbrace{1}_{2^1}\underbrace{0}_{2^2}}^{\text{window 1}}\overbrace{\underbrace{0}_{2^3}\underbrace{1}_{2^4}\underbrace{1}_{2^5}}^{\text{window 2}} = k_{2,1} + k_{2,2}2^s = 2 + 6(2^3) \\
k_3 = 43 &= \overbrace{\underbrace{1}_{2^0}\underbrace{1}_{2^1}\underbrace{0}_{2^2}}^{\text{window 1}}\overbrace{\underbrace{1}_{2^3}\underbrace{0}_{2^4}\underbrace{1}_{2^5}}^{\text{window 2}} = k_{3,1} + k_{3,2}2^s = 3 + 5(2^3) \\
k_4 = 36 &= \overbrace{\underbrace{0}_{2^0}\underbrace{0}_{2^1}\underbrace{1}_{2^2}}^{\text{window 1}}\overbrace{\underbrace{0}_{2^3}\underbrace{0}_{2^4}\underbrace{1}_{2^5}}^{\text{window 2}} = k_{4,1} + k_{4,2}2^s = 4 + 4(2^3) \\
k_5 = 29 &= \overbrace{\underbrace{1}_{2^0}\underbrace{0}_{2^1}\underbrace{1}_{2^2}}^{\text{window 1}}\overbrace{\underbrace{1}_{2^3}\underbrace{1}_{2^4}\underbrace{0}_{2^5}}^{\text{window 2}} = k_{5,1} + k_{5,2}2^s = 5 + 3(2^3) \\
k_6 = 22 &= \overbrace{\underbrace{0}_{2^0}\underbrace{1}_{2^1}\underbrace{1}_{2^2}}^{\text{window 1}}\overbrace{\underbrace{0}_{2^3}\underbrace{1}_{2^4}\underbrace{0}_{2^5}}^{\text{window 2}} = k_{6,1} + k_{6,2}2^s = 6 + 2(2^3) \\
k_7 = 15 &= \overbrace{\underbrace{1}_{2^0}\underbrace{1}_{2^1}\underbrace{1}_{2^2}}^{\text{window 1}}\overbrace{\underbrace{1}_{2^3}\underbrace{0}_{2^4}\underbrace{0}_{2^5}}^{\text{window 2}} = k_{7,1} + k_{7,2}2^s = 7 + 1(2^3)
\end{align*}$$

With our buckets, we create our windows $W$:

$$\begin{align*} W_1 &= P_1 + 2P_2 + 3P_3 + 4P_4 + 5P_5 + 6P_6 + 7P_7 = B_{1,1} + 2B_{2,1} + 3B_{3,1} + 4B_{4,1} + 5B_{5,1} + 6B_{6,1} + 7B_{7,1} \\ W_2 &= P_7 + 2P_6 + 3P_5 + 4P_4 + 5P_3 + 6P_2 + 7P_1 = B_{1,2} + 2B_{2,2} + 3B_{3,2} + 4B_{4,2} + 5B_{5,2} + 6B_{6,2} + 7B_{7,2} \end{align*}$$
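Here is a minimal sketch of the naive bucketing above; it models points as plain integers (so $kP$ is ordinary multiplication) purely to check the bookkeeping, and all names and point values are illustrative.

#include <cstdint>
#include <iostream>
#include <vector>

int main() {
  constexpr int kS = 3;           // bits per window
  constexpr int kNumWindows = 2;  // ceil(lambda / s) = 2
  // Toy "points": modeled as integers so k*P is ordinary multiplication.
  std::vector<int64_t> points = {101, 102, 103, 104, 105, 106, 107};
  std::vector<int64_t> scalars = {57, 50, 43, 36, 29, 22, 15};

  // Naive Pippenger's: one set of 2^s - 1 buckets per window.
  std::vector<std::vector<int64_t>> buckets(
      kNumWindows, std::vector<int64_t>((1 << kS) - 1, 0));

  for (size_t i = 0; i < scalars.size(); ++i) {
    for (int j = 0; j < kNumWindows; ++j) {
      int64_t digit = (scalars[i] >> (kS * j)) & ((1 << kS) - 1);  // k_{i,j}
      if (digit != 0) buckets[j][digit - 1] += points[i];          // B_{digit, j}
    }
  }

  // Window reduction: W_j = sum_m (m * B_{m,j}), then S = W_1 + 2^s * W_2.
  int64_t s_from_buckets = 0;
  for (int j = 0; j < kNumWindows; ++j) {
    int64_t window = 0;
    for (int m = 1; m < (1 << kS); ++m) window += m * buckets[j][m - 1];
    s_from_buckets += window << (kS * j);
  }

  int64_t s_direct = 0;
  for (size_t i = 0; i < scalars.size(); ++i) s_direct += scalars[i] * points[i];

  std::cout << (s_from_buckets == s_direct) << "\n";  // prints 1
  return 0;
}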

Introducing NAF for buckets

Note that any computation $kP$ can be reduced to $(k - 2^s)P + 2^sP$, where $s$ is the bit width of $k$.

This means we can define the following NAF bucket method $k' = \mathsf{NAF}(k, P)$ for window $W_j$ as such:

  1. If $0 < k < 2^{s-1}$:

    1. $k' = k$

    2. Accumulate $P$ into the $k$-th bucket $B_{k,j}$

  2. If $2^{s-1} \le k < 2^s$:

    1. $k' = k - 2^s$, then $kP = -k'(-P) + 2^sP$

    2. Accumulate $-P$ into the $-k'$-th bucket $B_{-k',j}$

    3. Add $P$ into the next window

Step 2c computes the $2^sP$ part of the $kP$ reduction and is also known as the "carry."
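Here is a minimal sketch of this rule; `ToSignedDigit` is an illustrative helper (not from any particular library) that converts one $s$-bit window digit into its signed form and reports the carry:

#include <cstdint>
#include <iostream>
#include <utility>

// Converts one s-bit window digit into its signed form.
// Returns {signed digit k', carry into the next window}.
std::pair<int64_t, int64_t> ToSignedDigit(int64_t k, int s) {
  int64_t radix = int64_t{1} << s;
  if (k < radix / 2) return {k, 0};  // case 1: keep k, no carry
  return {k - radix, 1};             // case 2: k' = k - 2^s (negative), carry 1
}

int main() {
  int s = 3;
  // Window-1 digits from the running example: k_{1,1}, ..., k_{7,1} = 1, ..., 7.
  for (int64_t k = 1; k <= 7; ++k) {
    auto [digit, carry] = ToSignedDigit(k, s);
    std::cout << k << " -> digit " << digit << ", carry " << carry << "\n";
  }
  // Digits 1..3 stay positive; 4..7 become -4..-1 with a carry of 1,
  // which is exactly how B_{1,1} = P_1 - P_7, ..., B_{4,1} = -P_4 arise below.
  return 0;
}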

Let's change our naive Pippenger's to follow this method and generate our buckets and windows:

(Figure) Stage 1: Separating previous window $W_1$ into new buckets

(Figure) Stage 2: Separating previous window $W_2$ into new buckets

Notice that this method requires the presence of an extra carry bucket $B_{1,3}$. This extra carry bucket should not exist; we'll take a look later at how to prevent it, after this naive-Pippenger's-to-NAF method is fully shown.

Bucket Accumulation

| Buckets | Computation |
| --- | --- |
| $B_{1,1}$ | $P_1 - P_7$ |
| $B_{2,1}$ | $P_2 - P_6$ |
| $B_{3,1}$ | $P_3 - P_5$ |
| $B_{4,1}$ | $-P_4$ |
| $B_{1,2}$ | $-P_1$ |
| $B_{2,2}$ | $P_7 - P_2$ |
| $B_{3,2}$ | $P_6 - P_4 - P_3$ |
| $B_{4,2}$ | $-P_5$ |
| $B_{1,3}$ | $P_5 + P_4 + P_3 + P_2 + P_1$ |

Bucket Reduction

$$\begin{align*} W_1 &= B_{1,1} + 2B_{2,1} + 3B_{3,1} + 4B_{4,1} \\ W_2 &= B_{1,2} + 2B_{2,2} + 3B_{3,2} + 4B_{4,2} \\ W_3 &= B_{1,3} \quad (\text{the extra carry bucket}) \end{align*}$$

Window Reduction

$$S = W_1 + 2^3W_2 + 2^6B_{1,3}$$
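As a quick check of this reduction, the same toy integer model can reproduce $S$ from the signed buckets in the table above; the point values are again illustrative.

#include <cstdint>
#include <iostream>

int main() {
  // Toy model: points as integers (P_1..P_7), so the identity can be checked
  // with plain arithmetic.
  int64_t P[8] = {0, 101, 102, 103, 104, 105, 106, 107};

  // Signed buckets from the table above (s = 3, so 2^{s-1} = 4 buckets per window).
  int64_t B11 = P[1] - P[7], B21 = P[2] - P[6], B31 = P[3] - P[5], B41 = -P[4];
  int64_t B12 = -P[1], B22 = P[7] - P[2], B32 = P[6] - P[4] - P[3], B42 = -P[5];
  int64_t B13 = P[5] + P[4] + P[3] + P[2] + P[1];  // the extra carry bucket

  int64_t W1 = B11 + 2 * B21 + 3 * B31 + 4 * B41;
  int64_t W2 = B12 + 2 * B22 + 3 * B32 + 4 * B42;

  // Window reduction with s = 3: S = W_1 + 2^3 W_2 + 2^6 B_{1,3}.
  int64_t S = W1 + (W2 << 3) + (B13 << 6);

  int64_t expected = 57 * P[1] + 50 * P[2] + 43 * P[3] + 36 * P[4] +
                     29 * P[5] + 22 * P[6] + 15 * P[7];
  std::cout << (S == expected) << "\n";  // prints 1
  return 0;
}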

Preventing the formation of an extra bucket

1. Ensuring last window bit size is within limit

The formation of an extra carry bucket depends on the true bit size $s_\mathsf{last}$ of the final window in relation to $\lambda$, the total number of bits of $k$, and $s$, the number of bits per window.

Safe Case: $s_\mathsf{last} < s - 1$

  • Example: $\lambda = 8$, $s = 5$ ⇒ $k = k_1 + 2^5 k_2$

  • Then $s_\mathsf{last} = 3$, and $0 \le k_2 < 2^3$

  • Since $k_2 + 1 < 2^4$, carry cannot occur ⇒ it's safe to assume the final carry is 0.

Carry Case: $s_\mathsf{last} \ge s - 1$

  • Example: $\lambda = 8$, $s = 4$ ⇒ $k = k_1 + 2^4 k_2$

  • Here $s_\mathsf{last} = 4$, and $0 \le k_2 < 2^4$

  • Since it is possible that $k_2 + 1 \ge 2^3$, a carry can occur

Thus, if the restriction $s_\mathsf{last} < s - 1$ holds, we can ensure that no new carry bucket is created.
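Here is a small illustrative helper (not from any particular library) that computes $s_\mathsf{last}$ and checks the restriction above:

#include <cstdint>
#include <iostream>

// Returns the true bit size of the final window for a lambda-bit scalar
// split into s-bit windows.
uint32_t LastWindowBits(uint32_t lambda, uint32_t s) {
  uint32_t r = lambda % s;
  return r == 0 ? s : r;
}

// True if the restriction s_last < s - 1 holds, i.e. the final window can
// never produce a carry and no extra bucket is needed.
bool NoExtraCarryBucket(uint32_t lambda, uint32_t s) {
  return LastWindowBits(lambda, s) < s - 1;
}

int main() {
  std::cout << NoExtraCarryBucket(8, 5) << "\n";  // 1: safe case (s_last = 3)
  std::cout << NoExtraCarryBucket(8, 4) << "\n";  // 0: carry case (s_last = 4)
  return 0;
}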

2. Naive Fix: Use Larger Bucket Only for Final Window

  • Use a larger bit-width bucket for the final window to absorb potential carry.

  • Cons: This is more complex and can be inefficient due to extra bucket space.

3. Use a Larger $\lambda$ Than the Actual Modulus Bit Size

Let's say we are working with BN254, where $\lambda = 256$ and $s = 16$. The actual scalar field is 254 bits.

We express $k$ as:

$$k = k_1 + 2^{16}k_2 + \dots + 2^{16\times15}k_{16}$$

Let $M$ be the modulus of BN254. Since the maximum value of $k$ is $M - 1$, we have:

$$1 \le k_{16} + 1 = \left\lfloor \frac{M-1}{2^{16\times15}} \right\rfloor + 1 < 2^{14}$$

This ensures that no overflow occurs in the final window.

M = 21888242871839275222246405745257275088548364400416034343698204186575808495617
print((M - 1) // (2**(16 * 15)) + 1 < 2 ** 14) # True

Yrrid's Trick

Yrrid, the ZPrize winner in 2022 for the category "Prize 1a: Accelerating MSM Operations on GPU", explains one more optimization they used in their solution to guarantee that no carry occurs in the final window. When the MSB of $k$ is set to 1, they apply the following transformation to reset the MSB to 0, thereby preventing any carry:

$$kP = (-k)(-P) = (M - k)(-P)$$

Here, $M$ is the modulus of the scalar field (i.e., the curve order), and $k \in [0, M)$.

Let $m_\mathsf{last}$ denote the $s$-bit chunk in the final window. Yrrid's trick is said to ensure that $m_\mathsf{last} + 1$ doesn't overflow.

Though this is what they state, it appears as if overflow can still occur as shown in the following example:

For example, in $\mathbb{F}_{11}$ with $\lambda = 4$, the values where the MSB of $k$ is set to 1 are 8, 9, and 10. This means that $-8$, $-9$, and $-10$ will be transformed into 3, 2, and 1, respectively. With this transformation, the new maximum value of $k$ becomes $2^3 - 1 = 7$. If $s = 2$, then:

$$k = k_1 + 2^2k_2$$

In this case, with the maximum value of $k$ being 7, $k_2$ will be at most 1, meaning it does not satisfy its limit constraint:

$$1 \le k_2 + 1 < 2^{2-1} = 2$$
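Here is a toy check of this counterexample (illustrative only); the loop applies the MSB-reset transformation over all of $\mathbb{F}_{11}$ and then tests the final-window constraint for $s = 2$:

#include <cstdint>
#include <iostream>

int main() {
  const int64_t M = 11;  // toy modulus (F_11), lambda = 4
  const int s = 2;       // bits per window

  int64_t max_k = 0;
  for (int64_t k = 0; k < M; ++k) {
    // Yrrid-style MSB reset: if the top bit (2^3) is set, replace k by M - k
    // and negate the point instead.
    int64_t reduced = (k >= 8) ? (M - k) : k;
    if (reduced > max_k) max_k = reduced;
  }
  std::cout << "max reduced scalar: " << max_k << "\n";  // 7

  // Final window digit k_2 for the worst case, plus an incoming carry of 1.
  int64_t k2 = max_k >> s;                                   // 1
  std::cout << (k2 + 1 < (int64_t{1} << (s - 1))) << "\n";   // 0: constraint violated
  return 0;
}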

Code Example

void FillDigits(uint32_t kLambda, uint32_t s, const ScalarField& f, std::vector<int64_t>* digits) {
  // The size of `digits` must be equal to ceil(lambda / s)
  // (lambda: total number of bits, s: window size in bits)
  CHECK_EQ(digits->size(), static_cast<size_t>((kLambda + s - 1) / s));

  // `radix` is 2^s — the maximum range per window
  uint64_t radix = uint64_t{1} << s;

  // `carry` to be propagated to the next window
  uint64_t carry = 0;
  size_t bit_offset = 0;
  for (size_t i = 0; i < digits->size(); ++i) {
    // mᵢ: s-bit value extracted from the current window, including the previous carry
    uint64_t m_i = f.ExtractBits(bit_offset, s) + carry;

    // Compute carry: if mᵢ is greater than or equal to radix / 2, carry = 1;
    // otherwise carry = 0.
    // This is based on the NAF rounding threshold (radix / 2)
    carry = (m_i + (radix >> 1)) >> s;

    // Convert mᵢ to its NAF form: mᵢ - carry * 2^s
    (*digits)[i] = static_cast<int64_t>(m_i) - static_cast<int64_t>(carry << s);
    bit_offset += s;
  }

  // Apply the final carry to the last digit.
  // This adjustment preserves the total scalar value and keeps the final
  // window non-negative, so no extra carry window is created.
  digits->back() += static_cast<int64_t>(carry << s);
}

void AddBasesToBuckets(uint32_t s, const std::vector<Point>& bases,
    const std::vector<std::vector<int64_t>>& scalar_digits,
    size_t window_index, std::vector<Bucket>& buckets) {
  for (size_t i = 0; i < scalar_digits.size(); ++i) {
    int64_t scalar = scalar_digits[i][window_index];
    if (0 < scalar) {
      // If the scalar is positive:
      // Place the corresponding base into the (scalar - 1)-th bucket
      //
      // Why `scalar - 1`?
      // Because positive scalars range from 1 to 2^{s-1} - 1 in NAF,
      // but bucket indices are 0-based (0 to 2^{s-1} - 2),
      // so we subtract 1 to get the correct index.
      buckets[static_cast<uint64_t>(scalar - 1)] += bases[i];
    } else if (0 > scalar) {
      // If the scalar is negative:
      // Place the *negated* base into the (-scalar - 1)-th bucket
      //
      // Why `-scalar - 1`?
      // Because negative scalars range from -1 to -2^{s-1},
      // and we want to map -1 → index 0, -2 → index 1, ..., just like for positives.
      buckets[static_cast<uint64_t>(-scalar - 1)] -= bases[i];
    }
  }
}
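For completeness, here is a sketch of the per-window bucket reduction that typically follows this accumulation, using the standard running-sum trick. It assumes the same `Bucket` type as above together with a hypothetical `Bucket::Zero()` and bucket-to-bucket addition; it is an illustration, not Tachyon's actual code.

// Reduces one window's buckets: W = sum_{i=1}^{B} i * buckets[i-1],
// computed with roughly 2B group additions via running sums.
Bucket ReduceWindow(const std::vector<Bucket>& buckets) {
  Bucket running_sum = Bucket::Zero();
  Bucket window_sum = Bucket::Zero();
  // Walk buckets from the largest index down, so each bucket is implicitly
  // multiplied by its (1-based) index.
  for (auto it = buckets.rbegin(); it != buckets.rend(); ++it) {
    running_sum += *it;
    window_sum += running_sum;
  }
  return window_sum;
}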

Here's a code snippet from Tachyon implementing NAF for bucket MSM:

https://github.com/kroma-network/tachyon/blob/e7b13063c6f110b9851655cca740ee62ffb41137/tachyon/math/elliptic_curves/msm/algorithms/pippenger/pippenger.h#L119-L123

    if (is_last_window) {
      bucket_size = size_t{1} << ctx_.window_bits;
    } else {
      bucket_size = size_t{1} << (ctx_.window_bits - 1);
    }

Note: I'm unsure why they don't just add $k_{i,j} = 2^{s-1}$ into the $B_{k,j}$ bucket. Perhaps this is a source of optimization when implementing MSM.

References

https://hackmd.io/jNXoVmBaSSmE1Z9zgvRW_w

Written by Ashley Jeong & ryan Kim of A41