Signed Bucket Index

Presentation: https://youtu.be/4bhCdM7B-Mk

AKA "NAF (non-adjacent form) method"

Note: this method is best used when negation for a group is cheap such as for an elliptic curve.

Motivation

When following a bucket-based method like Pippenger's algorithm, the amount of memory required for the buckets is demanding. How can we reduce the number of buckets per window from 2s12^s-1 to 2s12^{s-1}, where ss is the number of bits per window, and reduce overhead?

Explanation

Let's take a look at an example of creating buckets with Pippenger's algorithm. Let's say the number of windows λs=2\lceil\frac{\lambda}{s}\rceil=2 and the bit size per window s=3s = 3. We'll look at the following example where kk denotes scalars and PP refers to points on an elliptic curve for the MSM problem.

S=k1P1+k2P2+k3P3+k4P4+k5P5+k6P6+k7P7=57P1+50P2+43P3+36P4+29P5+22P6+15P7\begin{align*} S &= k_1P_1 + k_2P_2 + k_3P_3+ k_4P_4 + k_5P_5 + k_6P_6 + k_7P_7 \\ &= 57P_1 + 50P_2 + 43P_3 + 36P_4 + 29P_5 + 22P_6 + 15P_7 \\ \end{align*}
k1=57=120021022window 1123124125window 2=k1,1+k1,22s=1+7(23)k2=50=020121022window 1023124125window 2=k2,1+k2,12s=2+6(23)k3=43=120121022window 1123024125window 2=k3,1+k3,22s=3+5(23)k4=36=020021122window 1023024125window 2=k4,1+k4,22s=4+4(23)k5=29=120021122window 1123124025window 2=k5,1+k5,22s=5+3(23)k6=22=020121122window 1023124025window 2=k6,1+k6,22s=6+2(23)k7=15=120121122window 1123024025window 2=k7,1+k7,22s=7+1(23)\begin{align*} \begin{array}{ll} \begin{aligned} k_1 = 57 &= \overbrace{\underbrace{1}_{2^0}\underbrace{0}_{2^1}\underbrace{0}_{2^2}}^{\mathclap{\text{window 1}}} \overbrace{\underbrace{1}_{2^3}\underbrace{1}_{2^4}\underbrace{1}_{2^5}}^{\mathclap{\text{window 2}}} \\[1ex] &= k_{1,1} + k_{1,2}2^s \\ &= 1 + 7(2^3) \end{aligned} \quad\quad & \begin{aligned} k_2 = 50 &= \overbrace{\underbrace{0}_{2^0}\underbrace{1}_{2^1}\underbrace{0}_{2^2}}^{\mathclap{\text{window 1}}} \overbrace{\underbrace{0}_{2^3}\underbrace{1}_{2^4}\underbrace{1}_{2^5}}^{\mathclap{\text{window 2}}} \\[1ex] &= k_{2,1} + k_{2,1}2^s \\ &= 2 + 6(2^3) \end{aligned} \\[8ex] \begin{aligned} k_3 = 43 &= \overbrace{\underbrace{1}_{2^0}\underbrace{1}_{2^1}\underbrace{0}_{2^2}}^{\mathclap{\text{window 1}}} \overbrace{\underbrace{1}_{2^3}\underbrace{0}_{2^4}\underbrace{1}_{2^5}}^{\mathclap{\text{window 2}}} \\[1ex] &= k_{3,1} + k_{3,2}2^s \\ &= 3 + 5(2^3) \end{aligned} & \begin{aligned} k_4 = 36 &= \overbrace{\underbrace{0}_{2^0}\underbrace{0}_{2^1}\underbrace{1}_{2^2}}^{\mathclap{\text{window 1}}} \overbrace{\underbrace{0}_{2^3}\underbrace{0}_{2^4}\underbrace{1}_{2^5}}^{\mathclap{\text{window 2}}} \\[1ex] &= k_{4,1} + k_{4,2}2^s \\ &= 4 + 4(2^3) \end{aligned} \\[8ex] \begin{aligned} k_5 = 29 &= \overbrace{\underbrace{1}_{2^0}\underbrace{0}_{2^1}\underbrace{1}_{2^2}}^{\mathclap{\text{window 1}}} \overbrace{\underbrace{1}_{2^3}\underbrace{1}_{2^4}\underbrace{0}_{2^5}}^{\mathclap{\text{window 2}}} \\[1ex] &= k_{5,1} + k_{5,2}2^s \\ &= 5 + 3(2^3) \end{aligned} & \begin{aligned} k_6 = 22 &= \overbrace{\underbrace{0}_{2^0}\underbrace{1}_{2^1}\underbrace{1}_{2^2}}^{\mathclap{\text{window 1}}} \overbrace{\underbrace{0}_{2^3}\underbrace{1}_{2^4}\underbrace{0}_{2^5}}^{\mathclap{\text{window 2}}} \\[1ex] &= k_{6,1} + k_{6,2}2^s \\ &= 6 + 2(2^3) \end{aligned} \\[8ex] \begin{aligned} k_7 = 15 &= \overbrace{\underbrace{1}_{2^0}\underbrace{1}_{2^1}\underbrace{1}_{2^2}}^{\mathclap{\text{window 1}}} \overbrace{\underbrace{1}_{2^3}\underbrace{0}_{2^4}\underbrace{0}_{2^5}}^{\mathclap{\text{window 2}}} \\[1ex] &= k_{7,1} + k_{7,2}2^s \\ &= 7 + 1(2^3) \end{aligned} \end{array} \end{align*}

With buckets , we create our windows WW:

W1=P1+2P2+3P3+4P4+5P5+6P6+7P7=B1,1+2B2,1+3B3,1+4B4,1+5B5,1+6B6,1+7B7,1W2=P7+2P6+3P5+4P4+5P3+6P2+7P1=B1,2+2B2,2+3B3,2+4B4,2+5B5,2+6B6,2+7B7,2\begin{align*} W_1 &= P_1 + 2P_2 + 3P_3 + 4P_4 + 5P_5 +6P_6 + 7P_7 = B_{1,1} + 2B_{2,1} + 3B_{3,1} + 4B_{4,1} + 5B_{5,1} + 6B_{6,1} + 7B_{7,1}\\ W_2 &= P_7 + 2P_6 + 3P_5+4P_4 + 5P_3 + 6P_2 + 7P_1 = B_{1,2} + 2B_{2,2} + 3B_{3,2} + 4B_{4,2} + 5B_{5,2} + 6B_{6,2} + 7B_{7,2}\\ \end{align*}

Introducing NAF for buckets

Note that any computation kPkP can be reduced to (k2s)P+2sP(k-2^{s})P + 2^{s}P where ss is the bit width of kk.

This means we can define the following NAF bucket method k=NAF(k,P)k' = \mathsf{NAF}(k, P) for window WjW_j as such:

  1. If 0<k<2s10 < k < 2^{s-1}:

    1. k=kk' = k

    2. Accumulate PP into the kk-th bucket Bk,jB_{k,j}

  2. If 2s1k<2s2^{s-1} \le k < 2^s:

    1. k=k2sk' = k - 2^s, then kP=k(P)+2sPkP = -k'(-P) + 2^sP

    2. Accumulate P-P into the k-k'-th bucket Bk,jB_{-k', j}

    3. Add PP into the next window

Step 2c computes the 2sP2^{s}P part of the kPkP reduction and is also known as the "carry."

Let's change our naive Pippenger's to follow this method and generate our buckets and windows:

Stage 1: Separating previous window W1W_1 into new buckets
Stage 2: Separating previous window W2W_2 into new buckets

Notice that this method requires the presence of an extra carry bucket B1,3B_{1,3}. This extra carry bucket should not exist. We'll take a look later at how to prevent this after this Naive Pippenger's to NAF method is fully shown.

Bucket Accumulation

Buckets
Computation

B1,1B_{1,1}

P1P7P_1-P_7

B2,1B_{2,1}

P2P6P_2-P_6

B3,1B_{3,1}

P3P5P_3-P_5

B4,1B_{4,1}

P4-P_4

B1,2B_{1,2}

P1-P_1

B2,2B_{2,2}

P7P2P_7-P_2

B3,2B_{3,2}

P6P4P3P_6-P_4-P_3

B4,2B_{4,2}

P5-P_5

B1,3B_{1,3}

P5+P4+P3+P2+P1P_5+P_4 + P_3+P_2+P_1

Bucket Reduction

W1=B1,1+2B2,1+3B3,1+4B4,1W2=B1,2+2B2,2+3B3,2+4B4,2B1,3\begin{align*} W_1 &= B_{1,1} + 2B_{2,1} + 3B_{3,1} + 4B_{4,1} \\ W_2 &= B_{1,2} + 2B_{2,2} + 3B_{3,2} + 4B_{4,2} \\ &\sim* B_{1,3}* \end{align*}

Window Reduction

S=W1+22W2+24(B1,3)S = W_1 + 2^2W_2+ 2^4(B_{1,3})

Preventing the formation of an extra bucket

1. Ensuring last window bit size is within limit

The formation of an extra carry bucket depends on the true bit size slasts_\mathsf{last} of the final window in relation to λ\lambda, the total bits of kk, and ss, the bits per windows.

Safe Case: slast<s1s_\mathsf{last} < s - 1

  • Example: λ=8\lambda = 8, s=5s = 5k=k1+25k2k = k_1 + 2^5 k_2

  • Then slast=3s_\mathsf{last} = 3, and 0k2<230 \le k_2 < 2^3

  • Since k2+1<24k_2 + 1 < 2^4, carry cannot occur ⇒ It’s safe to assume the final carry is 0.

Carry Case: slasts1s_\mathsf{last} \ge s - 1

  • Example: λ=8\lambda = 8, s=4s = 4k=k1+24k2k = k_1 + 2^4 k_2

  • Here slast=4s_\mathsf{last} = 4, and 0k2<240 \le k_2 < 2^4

  • Since it is possible that k2+123k_2 + 1 \ge 2^3, carry can occur

Thus, if the restriction slast<s1s_\mathsf{last} < s - 1 is held, we can ensure there will be no new carry bucket created.

2. Naive Fix: Use Larger Bucket Only for Final Window

https://github.com/kroma-network/tachyon/blob/e7b13063c6f110b9851655cca740ee62ffb41137/tachyon/math/elliptic_curves/msm/algorithms/pippenger/pippenger.h#L119-L123
    if (is_last_window) {
      bucket_size = size_t{1} << ctx_.window_bits;
    } else {
      bucket_size = size_t{1} << (ctx_.window_bits - 1);
    }
  • Use a larger bit-width bucket for the final window to absorb potential carry.

  • Cons: This is more complex and can be inefficient due to extra bucket space.

3. Use a Larger λ\lambda Than the Actual Modulus Bit Size

Let’s say we are working with BN254, where λ=256\lambda = 256 and s=16s = 16. The actual scalar field is 254 bits.

We express kk as:

k=k1+216k2++216×15k16k = k_1 + 2^{16}k_2 + \dots + 2^{16\times15}k_{16}

Let MM be the modulus of BN254. Since the maximum value of kk is M1M - 1, we have:

1k16+1=M1216×15+1<2141 \le k_{16} + 1 = \lfloor \frac{M-1}{2^{16\times15}} \rfloor + 1 <2^{14}

This ensures that no overflow occurs in the final window.

M = 21888242871839275222246405745257275088548364400416034343698204186575808495617
print((M - 1) // (2**(16 * 15)) + 1 < 2 ** 14) # True

4. Use Yrrid’s Trick

Yrrid, the Zprize winner in 2022 for the category "Prize 1a: Accelerating MSM Operations on GPU" explains one more optimization they used for their solution to guarantee that no carry occurs in the final window. When the MSB of kk is set to 1, they apply the following transformation to reset the MSB to 0, thereby preventing any carry:

kP=(k)(P)=(Mk)(P)kP = (-k)(-P) = (M - k)(-P)

Here, MM is the modulus of the scalar field (i.e., curve order), and k[0,M)k \in [0, M).

Let mlastm_\mathsf{last} denote the ss-bit chunk in the final window. Yrrid's trick is said to ensure that we don't overflow to a mlast+1m_\mathsf{last} + 1 chunk.

For example, in F11\mathbb{F}_{11} with λ=4\lambda = 4, the values where the MSB of kk is set to 1 are 8, 9, and 10. This means that -8, -9, and -10 will be transformed into 3, 2, and 1, respectively. With this transformation, the new maximum value of kk becomes 231=72^3 - 1 = 7. If s=2s = 2, then:

k=k1+22k2k = k_1 + 2^2k_2

In this case with the maximum value of kk as 7, k2k_2 will be at most 1.

Code Example

Here's a code snippet from Tachyon implementing NAF for bucket MSM:

void FillDigits(uint32_t kLambda, uint32_t s, const ScalarField& f, std::vector<int64_t>* digits) {
  // The size of `digits` must be equal to ceil(lambda / s)
  // (lambda: total number of bits, s: window size in bits)
  CHECK_EQ(digits->size(), static_cast<size_t>(std::ceil(kLambda / s)));
  
  // `radix` is 2^s — the maximum range per window
  uint64_t radix = uint64_t{1} << s;

  // `carry` to be propagated to the next window
  uint64_t carry = 0;
  size_t bit_offset = 0;
  for (size_t i = 0; i < digits->size(); ++i) {
      // mᵢ: s-bit value extracted from the current window, including previous carry
    uint64_t m_i = f.ExtractBits(bit_offset, s) + carry;
  
    // Compute carry: if mᵢ is greater than or equal to radix / 2, carry = 1; 
    //.               otherwise carry = 0
    // This is based on the NAF rounding threshold (radix / 2)
    carry = (m_i + (radix >> 1)) >> s;
    
    // Convert mᵢ to its NAF form: mᵢ - carry * 2^s
    (*digits)[i] = static_cast<int64_t>(m_i) - static_cast<int64_t>(carry << s);
    bit_offset += s;
  }
  
  // Apply the final carry to the last digit
  // This adjustment preserves the total scalar value
  digits->back() += (carry << s);
}

void AddBasesToBuckets(uint32_t s, const std::vector<Point> bases, 
    const std::vector<std::vector<int64_t>>& digits,
    size_t window_index, std::vector<Bucket>& buckets) {
  for (size_t i = 0; i < scalar_digits.size(); ++i) {
    int64_t scalar = scalar_digits[i][window_index];
    if (0 < scalar) {
      // If the scalar is positive:
      // Place the corresponding base into the (scalar - 1)-th bucket
      //
      // Why `scalar - 1`?
      // Because scalars range from 1 to 2^{s-1} - 1 in NAF,
      // but bucket indices are 0-based (0 to 2^{s-1} - 2)
      // so we subtract 1 to get the correct index.
      buckets[static_cast<uint64_t>(scalar - 1)] += bases[i];
    } else if (0 > scalar) {
      // If the scalar is negative:
      // Place the *negated* base into the (-scalar - 1)-th bucket
      //
      // Why `-scalar - 1`?
      // Because negative scalars range from -1 to -2^{s-1},
      // and we want to map -1 → index 0, -2 → index 1, ..., just like for positives.
      buckets[static_cast<uint64_t>(-scalar - 1)] -= bases[i];
    }
  }
}

References

Written by ryan Kim & Ashley Jeong of A41

Last updated