GLV Decomposition

Splitting one scalar multiplication into two half-length ones, roughly halving the number of point-doubling operations

Note 1. The elliptic curve in use should have an efficiently computable endomorphism

Note 2. The elliptic curve in use must have the equation $y^2 = x^3 + b$ (i.e., $a = 0$)

Algorithm Process

We attempt to calculate the following elliptic curve point-scalar multiplication:

$kP$

...where $k$ is the scalar and the point $P$ is $(x, y)$.

1. Find a scalar $\lambda$

...that satisfies the following:

$\lambda P = (\beta x, y) = P'$

$\beta$ is a non-trivial cube root of unity in the base field; in other words, for a base field $\mathbb{F}_p$ we have $\beta^3 \equiv 1 \mod p$ with $\beta \neq 1$.

As $\beta$ is a cube root of unity, the elliptic curve equation (with $a = 0$) still holds:

$y^2 = (\beta x)^3 + b = \beta^3 x^3 + b = x^3 + b$

TODO: Explain how to find $\lambda$ (it is a root of $\lambda^2 + \lambda + 1 \equiv 0 \mod n$, where $n$ is the order of the group generated by $P$, mirroring $\beta^3 \equiv 1$ in the base field)

Note that the map $P \mapsto (\beta x, y)$ is an endomorphism of the curve, which is what makes this process work.

2. Decompose the scalar $k$

$k = k_1 + \lambda k_2 \mod n$

...where $n$ is the order of the group generated by $P$, and $k_1$ and $k_2$ are around half the bit length of $k$.

3. Calculate the total

This means, by definition, the following is true:

$kP = k_1P + \lambda k_2 P = k_1P + k_2P'$
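As a tiny worked example with toy numbers (not a real curve): in a group of order $n = 13$, $\lambda = 3$ is a valid eigenvalue since $3^2 + 3 + 1 = 13 \equiv 0 \pmod{13}$, and $k = 11$ decomposes as $k_1 = 5$, $k_2 = 2$:

```latex
k = k_1 + \lambda k_2 = 5 + 3 \cdot 2 \equiv 11 \pmod{13}
\quad\Longrightarrow\quad
11P = 5P + 2P'
```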

Comparison of Cost

In the GLV paper, the authors note that the decomposed scalar reduces the number of doubling operations by roughly half compared to other algorithms. However, determining the exact cost benefit of GLV is quite difficult. The authors state the following in their analysis section (paraphrased for simplicity):

Whereas a current "highly efficient algorithm" that uses exponent recoding and a sliding window algorithm costs 157 point doubles and 24 point additions for 160-bit scalar multiplication, GLV costs around 79 point doubles and 38 point additions. If a point double costs 8 field multiplications and a point addition costs 11 field multiplications (per Jacobian point form), then GLV has a run time of around 66% of the "highly efficient algorithm." The comparative run time improves further as the bit length grows; for example, for 512-bit scalar multiplication the figure drops to roughly 62%.

By this description, we can conclude that GLV roughly halves the total number of point doubles, while slightly increasing the number of point additions.

GLV Advantages

Parallelization

The separation of one scalar multiplication into two smaller ones means $k_1P$ and $k_2P'$ can be computed in parallel.

Reduced overhead from shared memory

Memory for intermediate computations can be shared by first calculating $k_1P$ and $k_2P$ together. Once both are finished, the endomorphism (multiplying the $x$-coordinate by $\beta$) is applied to the $k_2P$ result.

```cpp
// Assumes Point default-constructs to the identity element, and that
// Point::Split returns the GLV decomposition (k1, k2) with k = k1 + lambda*k2.
// Requires <algorithm> for std::max and <bit> for std::bit_width (C++20).
Point ScalarMultiply(const Point& p, int k) {
    auto [k1, k2] = Point::Split(k); // GLV scalar decomposition

    Point s = p;    // running 2^i multiple of p, shared by both halves
    Point p1, p2;   // accumulators for k1*P and k2*P, start at the identity

    int max_bits = std::max(
        static_cast<int>(std::bit_width(static_cast<unsigned int>(k1))),
        static_cast<int>(std::bit_width(static_cast<unsigned int>(k2)))
    );

    // Share intermediate computations when running double-and-add on 2 parts
    for (int i = 0; i < max_bits; ++i) {
        if ((k1 >> i) & 1) p1 += s;
        if ((k2 >> i) & 1) p2 += s;
        s = s.Double();
    }

    // Apply the endomorphism (x, y) -> (beta*x, y) to the k2 half, then combine
    return p1 + Point::Endo(p2);
}
```

MSM

GLV is essential for optimizing multi-scalar multiplication (MSM), as each scalar can be decomposed before running separate MSMs on the smaller parts. Moreover, when utilizing a bucket optimization technique (like Pippenger's) for MSM, if GLV is used before Signed Bucket Indexes, the number of buckets required per window is reduced to $\frac{1}{4}$ of the original number.

References

Written by Ashley Jeong and Ryan Kim of A41
