Plonk is a powerful tool that can be tweaked to support custom gates and lookup tables. However, as circuits grow larger, FFT becomes a bottleneck. HyperPlonk modifies Plonk to use multivariate polynomials over a boolean hypercube. This achieves two main advantages:
Eliminating the need for FFT.
Enabling efficient computation of high-degree custom gates without significant overhead, resulting in faster proving times.
Background
Why is FFT a bottleneck?
Modern SNARK
Polynomial IOP (Interactive Oracle Proof) refers to a verification method where the prover provides an oracle to the verifier. The verifier interacts by sampling random points, and in the final round, accesses the oracle to verify computations on these random points.
Here, the oracle can be thought of as a polynomial. However, since polynomials often have high degrees, they are not succinct. To address this, a PCS (Polynomial Commitment Scheme) is used. Instead of sending the full polynomial, the prover sends a commitment that represents the polynomial. During the final round of verification, the prover can reveal (or "open") the evaluations at the random points along with a proof, enabling succinct verification. This combination allows for succinct SNARKs.
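To make this composition concrete, here is a toy sketch (my own illustration, not a real scheme or any library's API) of how a commitment stands in for the oracle: the prover commits to a polynomial, the verifier samples a random point, and the prover opens the commitment there.

```python
import random

# Toy stand-ins for a PCS and one Polynomial-IOP round (illustrative only, not secure).
class ToyPCS:
    """A placeholder commitment scheme for a polynomial given as a coefficient list."""
    def commit(self, poly):
        return ("commitment", tuple(poly))          # a real PCS returns a short commitment

    def open(self, poly, x):
        y = sum(c * x**i for i, c in enumerate(poly))
        return y, ("opening-proof", x, y)           # evaluation plus a proof of correctness

    def verify(self, commitment, x, y, proof):
        _, poly = commitment                        # the toy scheme simply re-evaluates
        return sum(c * x**i for i, c in enumerate(poly)) == y

def poly_iop_round(pcs, poly, sample_challenge):
    cm = pcs.commit(poly)                # prover sends an "oracle" (really a commitment)
    x = sample_challenge()               # verifier replies with a random point
    y, proof = pcs.open(poly, x)         # prover opens the oracle at that point
    return pcs.verify(cm, x, y, proof)   # verifier checks the claimed evaluation

assert poly_iop_round(ToyPCS(), [1, 2, 3], lambda: random.randrange(1 << 16))
```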
Modern SNARKs are built by combining Polynomial IOP with PCS, structured as follows:
When is FFT used in Plonk?
| A | B | C |
|---|---|---|
| 1 | 1 | 2 |
| 1 | 2 | 3 |
| 2 | 3 | 5 |
| 3 | 5 | 8 |
For simplicity, the Wiring Check (Permutation Argument) is omitted in this example, as the focus is on why FFT is used.
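As a quick sanity check (an illustrative snippet, not part of the protocol itself), the trace above satisfies the addition-gate identity row by row:

```python
# Columns of the Fibonacci trace; each row must satisfy A[i] + B[i] = C[i],
# i.e., the gate identity A(X) + B(X) - C(X) = 0 over the whole domain.
A = [1, 1, 2, 3]
B = [1, 2, 3, 5]
C = [2, 3, 5, 8]

# In Plonk these columns are interpreted as univariate polynomials over a multiplicative
# subgroup; here we only check the gate identity value by value.
assert all(a + b == c for a, b, c in zip(A, B, C))
```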
The prover commits to the polynomials A, B, and C to assert that the following relationship holds. This step uses PCS.BatchCommit and corresponds to Oracle:
High-degree custom gates in Plonk
The verifier samples a random y.
Inverse FFT → High Degree FFT → Gate Evaluation → High Degree Inverse FFT.
To validate the proof, the verifier performs the following test. If the test is passed, the verifier accepts the proof:
From the above, the approximate total execution time for the Plonk prover can be calculated as follows:
Protocol Explanation
This image is taken from the paper, and describes HyperPlonk and HyperPlonk+.
ZeroCheck → SumCheck
In HyperPlonk, the way constraints are imposed is not fundamentally different from Plonk. For example, to represent an addition gate, the following expression can be used:
Plonk uses a ZeroCheck followed by a QuotientCheck, as shown below:
HyperPlonk instead uses a ZeroCheck followed by a SumCheck:
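(Sketched below in my own notation is the standard ZeroCheck-to-SumCheck reduction, not a quotation from the paper: a random challenge r from the verifier turns "vanishes everywhere on the hypercube" into a single sum.)

f(X) = 0 for all X ∈ {0,1}^μ   ⟺   Σ_{X ∈ {0,1}^μ} f(X) · eq(X, r) = 0 with high probability over a random r,

where eq(X, r) = ∏_{i=1}^{μ} ( X_i · r_i + (1 − X_i)·(1 − r_i) ).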
High-degree custom gates in HyperPlonk
| (X_1, X_2) | A | B | C |
|------------|---|---|---|
| (0, 0) | 1 | 1 | 2 |
| (0, 1) | 1 | 2 | 3 |
| (1, 0) | 2 | 3 | 5 |
| (1, 1) | 3 | 5 | 8 |
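To see how a column becomes a multivariate polynomial, here is a small sketch (my own illustration; `eq` and `mle_eval` are hypothetical helper names) that evaluates the multilinear extension of column C: it agrees with the table on the hypercube and is defined everywhere else as well.

```python
# Multilinear extension of a column indexed by points of the boolean hypercube.
def eq(x_bits, r):
    """Multilinear Lagrange basis: 1 when the boolean point x_bits matches r, 0 on other boolean points."""
    out = 1
    for b, ri in zip(x_bits, r):
        out *= b * ri + (1 - b) * (1 - ri)
    return out

def mle_eval(values, r):
    """Evaluate the multilinear extension of `values` (indexed by boolean points, MSB first) at r."""
    mu = len(r)
    return sum(values[idx] * eq([(idx >> (mu - 1 - j)) & 1 for j in range(mu)], r)
               for idx in range(len(values)))

C = [2, 3, 5, 8]                      # C(0,0)=2, C(0,1)=3, C(1,0)=5, C(1,1)=8
assert mle_eval(C, [0, 1]) == 3       # agrees with the table on the hypercube
print(mle_eval(C, [0.5, 0.5]))        # 4.5 -- defined at non-boolean points too (a real system works over a field)
```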
Why does using SumCheck eliminate the overhead for high-degree custom gates? Using the same example as above, the prover's claim can be expressed as follows:
In the SumCheck protocol, the univariate polynomial sent by the prover in each round has degree at most the degree of the multivariate polynomial in the corresponding variable. Thus, the prover only needs to send univariate polynomials whose degree is at most the degree of the custom gate plus 1.
Consider how such a round polynomial is constructed from a multilinear polynomial, i.e., one of degree 1 in each variable. A multilinear polynomial in 2 variables can be expressed as:
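(The following is my own worked illustration rather than the post's original figure.)

f(X_1, X_2) = a_0 + a_1·X_1 + a_2·X_2 + a_3·X_1·X_2

In the first SumCheck round, the prover sends

g_1(X_1) = f(X_1, 0) + f(X_1, 1) = (2a_0 + a_2) + (2a_1 + a_3)·X_1,

which has degree 1 in X_1, matching the degree of f in that variable.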
ProductCheck
(Figure: example evaluations of f(X), g(X), and v(X).)
Wiring Identity → MultiSetEquality
MultiSetEquality can now be replaced by the ProductCheck protocol described earlier. This enables efficient verification using the established steps for product consistency.
For handling Permutation PIOP on small fields, Section 3.6 of the paper provides a detailed explanation. Interested readers are encouraged to explore that section for additional insights.
This "shift through the domain" is achieved using a shift function, which for univariate polynomials is defined as:
For multivariate polynomials, the shift function can be represented using a mechanism similar to the one illustrated below (also from the paper):
This cyclically shifts all non-zero points in the boolean hypercube.
The XOR operation can be algebraically expressed as:
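(For two bits, the standard arithmetization is the following; shown for reference, the paper's exact formulation may differ.)

b_1 ⊕ b_2 = b_1 + b_2 − 2·b_1·b_2,   b_1, b_2 ∈ {0, 1}

In particular, XOR with a known constant bit is linear: b ⊕ 1 = 1 − b and b ⊕ 0 = b.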
Using this linearized shift function, MultiSetEquality Check can be efficiently performed. By leveraging the defined Polynomial IOP, this shift function enables the protocol to verify lookup arguments in HyperPlonk+ without significant overhead.
Conclusion
We have explored how HyperPlonk operates through examples. Unlike Plonk, HyperPlonk introduces a method to leverage the boolean hypercube for proofs. This approach eliminates the computational overhead caused by FFTs as the maximum degree of custom gates increases, resulting in faster proof generation.
Graph b above and the video screenshot both support this observation. While not covered, HyperPlonk has also contributed to the optimization of multivariate polynomial commitments, as seen in systems like Orion+. Despite these significant contributions, the Conclusions and Open Problems section of the paper highlights several areas for future work, introducing ideas for continued improvements and research.
The diagram above illustrates the FFT process for a polynomial of size 8. First, as shown on the left side of the diagram, a bit-reversal operation is performed. Then, butterfly operations are executed at each stage. The bit-reversal pass takes O(N), and the butterfly operations at each stage also require O(N). Since the number of stages is log N, the overall complexity is O(N·log N). Furthermore, FFT is notoriously difficult to parallelize: as the stages progress, the memory locations required to compute the outputs of the next stage become increasingly scattered. If calculations must be performed on hardware like GPUs with limited memory, these constraints can be critical.
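For concreteness, here is a minimal iterative radix-2 NTT sketch (an FFT over a prime field) showing the two phases discussed above: a bit-reversal pass followed by log N butterfly stages. The modulus and root of unity are illustrative choices, not parameters of any particular proving system.

```python
def bit_reverse_permute(a):
    """O(N) reordering pass: element i moves to the index whose bits are reversed."""
    n = len(a)
    bits = n.bit_length() - 1
    out = a[:]
    for i in range(n):
        j = int(format(i, f"0{bits}b")[::-1], 2)
        if j > i:
            out[i], out[j] = out[j], out[i]
    return out

def ntt(a, omega, p):
    """Evaluate the polynomial with coefficients `a` on the subgroup generated by `omega` mod p."""
    n = len(a)
    assert n & (n - 1) == 0, "size must be a power of two"
    a = bit_reverse_permute(a)
    length = 2
    while length <= n:                    # log N stages
        w_m = pow(omega, n // length, p)  # root of unity for this stage
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + length // 2):
                u, v = a[k], a[k + length // 2] * w % p
                a[k], a[k + length // 2] = (u + v) % p, (u - v) % p  # butterfly
                w = w * w_m % p
        length *= 2
    return a

# Example: mod p = 17, omega = 2 is a primitive 8th root of unity (2^8 = 256 ≡ 1 mod 17).
print(ntt([1, 1, 0, 0, 0, 0, 0, 0], 2, 17))  # evaluations of 1 + X: [2, 3, 5, 9, 0, 16, 14, 10]
```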
Let's walk through a simple circuit as an example. The circuit above represents a Fibonacci sequence starting with (1,1). Each column in the circuit is interpreted as a univariate polynomial. Since Plonk implements Univariate IOP + Univariate PCS, the protocol proceeds as follows:
PCS.BatchCommit({A, B, C}) → {C_A, C_B, C_C}
After Poly IOP for Wiring Identity, to assert that the gate identity is satisfied, the prover commits to Q. Since t is a known polynomial (pre-shared between prover and verifier), it does not require a separate commitment. This step corresponds to Poly IOP for Gate Identity → Poly IOP for Quotient Check:
PCS.Commit(Q) → C_Q, where A(X) + B(X) − C(X) = t(X) · Q(X) and t(X) = X^4 − 1 is the vanishing polynomial of the 4-row domain H = {1, ω, ω^2, ω^3}
The verifier samples a random x from the entire field space.
Since x is sampled randomly from the field rather than from the domain, the prover must first convert A, B, and C from their evaluation (Lagrange-basis) form to coefficient form using an inverse FFT:
Custom gates enable the representation of complex operations that cannot easily be expressed with simple addition and multiplication gates. Let's explore how they are implemented in Plonk.
Consider a polynomial of degree d (or a table with d+1 rows). In the earlier example, we dealt with a simple addition operation, where the degree of A+B−C was d. This simplicity made computation straightforward.
When dealing with more complex circuits, custom gates can be represented as polynomials f_i with selectors s_i. For a given domain H, each custom gate f_i must satisfy:
s_i(X) · f_i(X) = t(X) · Q_i(X)  for all X ∈ H,  where s_i(X) = 1 if f_i is enabled at X and 0 otherwise
Checking each gate fi individually is inefficient for the verifier. Instead, we use a common ZK technique: random linear combination. Starting from step 2 of the protocol:
Using y, the prover combines all custom gates into a single equation:
Σ_{i=0}^{m−1} y^i · s_i(X) · f_i(X) = t(X) · Q(X),  for all X ∈ H
Here, m is the number of gates.
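A toy sketch of this batching idea (illustrative only; the residual values and modulus are made up): if any single gate identity fails at some row, the y-combined residual is nonzero there with high probability.

```python
import random

p = 2**61 - 1  # an illustrative prime modulus

def combined_residual(gate_residuals, y):
    """gate_residuals[i][x] holds s_i(x) * f_i(x) on the domain; combine the gates with powers of y."""
    m = len(gate_residuals)
    domain_size = len(gate_residuals[0])
    return [sum(pow(y, i, p) * gate_residuals[i][x] for i in range(m)) % p
            for x in range(domain_size)]

# Two gates over a 4-row domain: gate 0 is satisfied everywhere, gate 1 fails at row 2.
residuals = [[0, 0, 0, 0], [0, 0, 7, 0]]
y = random.randrange(1, p)
assert any(v != 0 for v in combined_residual(residuals, y))  # the failure is not masked
```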
The degree of Q is calculated as:
Then the degree of Q is n·d − 1, where n is the maximum degree of the custom gates (i.e., the maximum number of polynomial multiplications); here, n is 3. Since the degree of Q can be much higher than d, Q needs to be decomposed into n smaller polynomials q_i to fit within the commitment scheme:
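(One common way to perform this decomposition, shown here as an illustration in the style of the original Plonk splitting rather than this post's exact formula, is to cut Q into chunks of degree at most d.)

Q(X) = q_0(X) + X^{d+1}·q_1(X) + X^{2(d+1)}·q_2(X) + … + X^{(n−1)(d+1)}·q_{n−1}(X),   with deg q_i ≤ d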
Before committing to q_i, the prover must compute f_i. For high-degree gates, this involves the following:
The degree becomes n·d for the sections highlighted in green, which significantly slows down computation. If n for the custom gate is set too small, the flexibility to express a diverse range of computations decreases; if n is set too large, it introduces significant overhead in the high-degree FFT, gate evaluation, and high-degree inverse FFT. Therefore, n must be set to an appropriate value. On the other hand, there is no such limitation in Halo2.
In summary, in a simple protocol as discussed earlier, Q could be committed before conducting the Inverse FFT. However, when custom gates are introduced, committing Q requires going through the following steps:
The verifier checks whether the following equation holds at the random point x. Here, s_i is assumed to be a polynomial already known to both the prover and the verifier:
If n is treated as a constant, the SumCheck protocol allows ZeroCheck to be performed in linear time. For reference, the complexity calculation here differs from the one provided in the paper. For more details, refer to Section 3.1 (Computing SumCheck for high-degree polynomials) and Section 3.8 of the original paper, which introduce further optimizations for the SumCheck protocol.
ProductCheck is a method to verify whether the product of a polynomial over a boolean hypercube equals a constant c:
∏_{X ∈ {0,1}^μ} h(X) = c,  where h(X) = f(X) / g(X)
For univariate polynomials, this is typically computed using a running product column. For multivariate polynomials, the process involves the following steps:
Define v to satisfy the following conditions:
v(0, X) = h(X),   v(1, X) = v(X, 0) · v(X, 1),   where v(1, …, 1) = 0
This allows v to be computed as shown in the diagram below:
Using ĥ(X) := v(1, X) − v(X, 0) · v(X, 1), perform ZeroCheck → SumCheck as described earlier. Since ĥ evaluates to zero at every point of {0,1}^3, this check ensures the validity of v.
Finally, the verifier queries v at (1, …, 1, 0) to confirm that the product of h equals c.
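Putting these steps together, here is a small sketch (my own illustration with μ = 2, reusing the 4-row values and plain rationals instead of field elements) that builds the table v and reads the claimed product off at (1, …, 1, 0):

```python
from fractions import Fraction

def build_v(h):
    """Build v on {0,1}^(mu+1) from the 2^mu values of h, keyed as (first bit, remaining bits as an int, MSB first)."""
    mu = len(h).bit_length() - 1
    v = {}
    for x in range(2**mu):
        v[(0, x)] = h[x]                 # v(0, X) = h(X)
    for x in range(2**mu - 1):           # fill v(1, X) for every X except the all-ones point
        # (X, 0) and (X, 1) re-read as (b, X'): the top bit of X becomes the new first bit
        top = (x >> (mu - 1)) & 1
        shifted = (x << 1) & (2**mu - 1)
        v[(1, x)] = v[(top, shifted | 0)] * v[(top, shifted | 1)]   # v(1, X) = v(X, 0) * v(X, 1)
    v[(1, 2**mu - 1)] = 0                # v(1, ..., 1) = 0 by convention
    return v

f = [Fraction(2), Fraction(3), Fraction(5), Fraction(8)]
g = [Fraction(1)] * 4
h = [fi / gi for fi, gi in zip(f, g)]    # h = f / g on the hypercube
v = build_v(h)
assert v[(1, 0b10)] == 240               # v(1, 1, 0) equals the product 2 * 3 * 5 * 8
```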
For univariate polynomials, verifying the Wiring Identity involves constructing s_σ as shown in the figure below. The green arrows denote how each label is moved.
This is then transformed into a MultiSetEquality using a random value α:
The equation above, taken from the Plookup paper, shows how the values f_i, s_i, t_{i+1}, s_{i+1} are queried at both the i-th and (i+1)-th indices. For univariate polynomials in Plonk, this can easily be expressed as:
next(X)=ω⋅X
Consider g_4(b_1, b_2, b_3, b_4) in the context of a 4-dimensional boolean hypercube. The function g_4 operates as follows:
This reduces the complexity, allowing g_4 to be expressed as a linear function rather than a quadratic one. This is depicted in the formula below:
This also implies that the domain of the circuit must be represented differently. Specifically, the all-zero point (0, 0) cannot be included, because the shift is not cyclic there. Instead, the cyclic domain should exclude (0, 0) and follow this order in the case of 4 or 8 rows:
The figure above is a capture from Benedikt Bünz's presentation at ZK Summit 8.