DIZK
Presentation: https://youtu.be/jbWOxNntmPQ
Introduction
Generating a zero-knowledge proof (ZKProof) is computationally expensive and involves significant memory overhead. DIZK addresses this by leveraging Apache Spark to generate Groth16 proofs across multiple machines in a distributed cluster environment.
Background
Please read Distributed FFT!
Protocol Explanation

Distributed Setup
In the Groth16 proving system, the setup phase is critical for generating the proving and verification keys. The most computationally intensive part of this process is evaluating the QAP polynomials at a secret point $\tau$. DIZK is designed to perform this operation efficiently in a distributed environment using Spark.
Step 1: Distributed Lagrange Evaluation
As part of the QAP instance reduction, we must evaluate the Lagrange basis polynomials $L_1(X), \dots, L_m(X)$ over the evaluation domain $D$ at the point $\tau$, producing the vector $(L_1(\tau), \dots, L_m(\tau))$.
Each executor computes $L_i(\tau)$ for its assigned indices $i$ in parallel and stores the results as a Resilient Distributed Dataset (RDD).
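A minimal PySpark sketch of this step is shown below. It assumes the evaluation domain is the multiplicative subgroup of $m$-th roots of unity in a toy prime field; the field, the value of $\tau$, and all names are illustrative and not taken from DIZK's Java/Spark codebase.

```python
# Minimal PySpark sketch of distributed Lagrange evaluation (illustrative, not DIZK's code).
# Domain: the multiplicative subgroup of m-th roots of unity in a toy prime field F_p.
from pyspark import SparkContext

p = 257                           # toy prime field (DIZK uses a pairing-friendly scalar field)
m = 8                             # domain size (power of two)
omega = pow(3, (p - 1) // m, p)   # primitive m-th root of unity (3 generates F_257^*)
tau = 123                         # toy secret evaluation point; chosen outside the domain

def lagrange_at_tau(i):
    """L_i(tau) = (tau^m - 1) * omega^i / (m * (tau - omega^i)) over F_p."""
    w_i = pow(omega, i, p)
    num = (pow(tau, m, p) - 1) * w_i % p
    den = m * (tau - w_i) % p
    return (i, num * pow(den, p - 2, p) % p)   # Fermat inverse

sc = SparkContext("local[*]", "lagrange-eval")
lag_rdd = sc.parallelize(range(m)).map(lagrange_at_tau)   # RDD of (i, L_i(tau))

# Sanity check: the Lagrange basis values at any point sum to 1.
assert sum(v for _, v in lag_rdd.collect()) % p == 1
```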
Step 2: QAP Instance Reduction
Input: the non-zero entries $\{((i, j), a_{i,j})\}$ of the R1CS matrix $A \in \mathbb{F}^{m \times n}$, and the Lagrange evaluations $\{(i, L_i(\tau))\}_{i \in [m]}$
Output: the QAP evaluations $\{A_j(\tau)\}_{j \in [n]}$, where $A_j(\tau) = \sum_{i=1}^{m} a_{i,j} \cdot L_i(\tau)$ (the same reduction is applied to the matrices $B$ and $C$)
Naive (Strawman) Approach:
Join the matrix entries $\{(i, (j, a_{i,j}))\}$ with the Lagrange evaluations $\{(i, L_i(\tau))\}$ by the row index $i$
Map each pair to the product $a_{i,j} \cdot L_i(\tau)$, keyed by the column index $j$
Reduce by $j$ to compute $A_j(\tau) = \sum_i a_{i,j} \cdot L_i(\tau)$
This join produces, for every non-zero entry of row $i$, a pair $\big(i, ((j, a_{i,j}), L_i(\tau))\big)$, so the work assigned to a join key $i$ is proportional to the number of non-zero entries in row $i$.
However, the actual matrix is only *almost* sparse. For example, the first row may be nearly full while every other row contains only a handful of non-zero entries:

$$A = \begin{pmatrix} * & * & * & \cdots & * \\ * & 0 & 0 & \cdots & 0 \\ 0 & * & 0 & \cdots & 0 \\ 0 & 0 & * & \cdots & 0 \end{pmatrix}$$
As a result, certain executors—such as those handling the first row in the example—become overloaded, causing out-of-memory errors or straggler tasks.
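To make the strawman concrete, here is a toy PySpark version of the three steps above; the matrix, the field, and all names are illustrative and not taken from DIZK's codebase.

```python
# Strawman join for the QAP instance reduction (toy PySpark sketch, not DIZK's actual code).
from pyspark import SparkContext

p = 257
sc = SparkContext("local[*]", "strawman-join")

# Non-zero R1CS entries keyed by row index i: (i, (j, a_ij)).  Row 0 is "dense".
entries = sc.parallelize(
    [(0, (j, 1)) for j in range(6)] +          # dense row 0
    [(1, (0, 5)), (2, (3, 7)), (3, (2, 4))]    # sparse rows
)

# Lagrange evaluations keyed by the same row index i: (i, L_i(tau)) (toy values).
lag = sc.parallelize([(i, (i * 11 + 3) % p) for i in range(4)])

# 1. Join by row index i  -> (i, ((j, a_ij), L_i(tau)))
# 2. Map to (j, a_ij * L_i(tau))
# 3. Reduce by column index j -> A_j(tau)
A_at_tau = (entries.join(lag)
                   .map(lambda kv: (kv[1][0][0], kv[1][0][1] * kv[1][1] % p))
                   .reduceByKey(lambda x, y: (x + y) % p))

print(sorted(A_at_tau.collect()))
# All pairs for row 0 land on the single task that handles join key 0,
# which is exactly the skew problem described above.
```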
To address this imbalance, the Spark ecosystem offers techniques such as blockjoin and skewjoin.
Blockjoin replicates the smaller RDD (typically the Lagrange evaluations $\{(i, L_i(\tau))\}$) across all executors so that tasks can be executed evenly: every executor holds a full copy of the small RDD alongside its local partition of matrix entries.
Each machine is then assigned the set of pairs $\{(a_{i,j}, L_i(\tau))\}$ for its local entries. While this approach ensures even data distribution, it significantly increases memory usage, since every executor performing the join must hold the entire replicated RDD.
Carson Lee mentioned that while this approach resolves the join imbalance issue, it may also include unnecessary data in the final result.
Skewjoin identifies skewed keys and splits them across multiple partitions, replicating the corresponding RDD entries. However, since each replicated value $L_i(\tau)$ is used only once per matrix entry, this ends up being roughly equivalent to a naive join and offers no real benefit.
Solution: Hybrid Join Strategy
Phase 1: Identify Dense Rows
Tag any row with more than a fixed threshold $T$ of non-zero entries as a dense row
Let $R$ be the set of all such dense row indices
Phase 2: Execute Hybrid Join
Dense rows ($i \in R$):
Broadcast the few values $\{L_i(\tau) : i \in R\}$ and join them with the dense-row entries locally, blockjoin-style; since only $|R|$ values are replicated, the memory overhead stays small
Map each pair to $(j, a_{i,j} \cdot L_i(\tau))$
Sparse rows ($i \notin R$):
Same join and map as strawman
Final aggregation:
Union both results into a single collection of pairs $(j, a_{i,j} \cdot L_i(\tau))$
Reduce by $j$ to compute each $A_j(\tau) = \sum_i a_{i,j} \cdot L_i(\tau)$ (sketched below)
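Below is a toy PySpark sketch of the hybrid strategy, under my reading that dense rows receive the broadcast (blockjoin-style) treatment while sparse rows go through the ordinary shuffle join; the threshold and all names are illustrative rather than DIZK's actual implementation.

```python
# Hybrid join sketch for the QAP instance reduction (illustrative only).
from pyspark import SparkContext

p = 257
THRESHOLD = 3          # toy density threshold T
sc = SparkContext("local[*]", "hybrid-join")

entries = sc.parallelize(
    [(0, (j, 1)) for j in range(6)] +          # dense row 0
    [(1, (0, 5)), (2, (3, 7)), (3, (2, 4))]    # sparse rows
)
lag = sc.parallelize([(i, (i * 11 + 3) % p) for i in range(4)])  # (i, L_i(tau))

# Phase 1: identify dense rows R (rows with more than THRESHOLD non-zero entries).
row_counts = entries.map(lambda kv: (kv[0], 1)).reduceByKey(lambda a, b: a + b)
dense_rows = set(row_counts.filter(lambda kv: kv[1] > THRESHOLD).keys().collect())

# Phase 2a: dense rows -> broadcast their few L_i(tau) values and join map-side.
dense_lag = sc.broadcast(dict(lag.filter(lambda kv: kv[0] in dense_rows).collect()))
dense_part = (entries.filter(lambda kv: kv[0] in dense_rows)
                     .map(lambda kv: (kv[1][0], kv[1][1] * dense_lag.value[kv[0]] % p)))

# Phase 2b: sparse rows -> ordinary shuffle join, exactly as in the strawman.
sparse_part = (entries.filter(lambda kv: kv[0] not in dense_rows)
                      .join(lag)
                      .map(lambda kv: (kv[1][0][0], kv[1][0][1] * kv[1][1] % p)))

# Final aggregation: union both parts and reduce by column index j.
A_at_tau = dense_part.union(sparse_part).reduceByKey(lambda x, y: (x + y) % p)
print(sorted(A_at_tau.collect()))
```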
Step 3: Distributed fixMSM
In the final part of the setup, the evaluated values must be converted into group elements on an elliptic curve. This is done using Fixed-base Multi-Scalar Multiplication (fixMSM).
Lookup Table Generation: Precompute a table of scalar multiples of the fixed base point $G$ (e.g., $d \cdot 2^{cw} \cdot G$ for every window $w$ and digit $d$)
Broadcast: Distribute this lookup table to all executors
Parallel Evaluation: Each executor uses the lookup table to compute scalar multiplications for its assigned chunk
This method aligns well with Spark's execution model, ensuring balanced workload across executors. By precomputing and broadcasting the table, DIZK avoids redundant computation and achieves both memory and runtime efficiency.
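The sketch below illustrates this pattern (precompute once, broadcast, map). It is a toy illustration rather than DIZK's Java implementation: the elliptic-curve group is replaced by a toy additive group $\mathbb{Z}_q$ so the example runs without a curve library, and the window size and constants are arbitrary.

```python
# Fixed-base MSM sketch (illustrative, not DIZK's actual code).
from pyspark import SparkContext

q = 2**61 - 1                  # toy group order (stand-in for the curve group)
G = 123456789                  # toy "fixed base point"
WINDOW_BITS = 4
SCALAR_BITS = 64
WINDOWS = (SCALAR_BITS + WINDOW_BITS - 1) // WINDOW_BITS

def g_add(a, b): return (a + b) % q
def g_mul(k, P): return k * P % q          # stand-in for scalar multiplication

# Lookup table: table[w][d] = (d << (WINDOW_BITS*w)) * G, so that
# s*G = sum over windows w of table[w][w-th base-2^WINDOW_BITS digit of s].
table = [[g_mul(d << (WINDOW_BITS * w), G) for d in range(1 << WINDOW_BITS)]
         for w in range(WINDOWS)]

sc = SparkContext("local[*]", "fixmsm")
table_bc = sc.broadcast(table)             # broadcast the table once to all executors

def fixed_base_mul(scalar):
    acc = 0                                # identity of the toy group
    for w in range(WINDOWS):
        digit = (scalar >> (WINDOW_BITS * w)) & ((1 << WINDOW_BITS) - 1)
        acc = g_add(acc, table_bc.value[w][digit])
    return acc

scalars = sc.parallelize(range(1, 1001))   # e.g. the evaluated QAP values
points = scalars.map(fixed_base_mul)

# Sanity check against direct scalar multiplication.
assert points.collect() == [g_mul(s, G) for s in range(1, 1001)]
```

A broadcast variable is cached read-only on each executor, so the table is shipped once per machine rather than once per task.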
Distributed Prover
In Groth16, the prover takes as input the proving key $pk$, the public input $x$, and the witness $w$, and generates the proof $\pi$. This process consists of two main stages:
Generate the QAP witness vector from $x$ and $w$
Construct the proof $\pi$ by combining the proving key $pk$ with the witness values through linear combinations (multi-scalar multiplications)
Step 1: QAP Witness Generation
In Groth16's QAP-based construction, the QAP witness corresponds to the coefficient vector of the rational function $h(X)$, defined as:

$$h(X) = \frac{\Big(\sum_j z_j A_j(X)\Big)\cdot\Big(\sum_j z_j B_j(X)\Big) - \sum_j z_j C_j(X)}{Z_D(X)}$$

$Z_D(X)$ is the vanishing polynomial over the domain $D$; the auxiliary domain $D'$ is carefully chosen so that $Z_D$ does not vanish on it.
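For instance, if $D$ is the multiplicative subgroup of $m$-th roots of unity generated by $\omega$ and $D'$ is taken to be a multiplicative coset $g \cdot D$ with $g \notin D$ (a common choice, assumed here since the text does not fix the domains), then $Z_D$ is not only non-vanishing but constant on $D'$:

$$Z_D(X) = X^m - 1, \qquad Z_D(g\,\omega^i) = g^m\,\omega^{im} - 1 = g^m - 1 \neq 0,$$

so the component-wise division below amounts to multiplying by a single precomputed inverse.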
The numerator consists of three composite polynomial terms, which are computationally intensive to evaluate.
To compute this efficiently, DIZK performs the following steps:
Evaluate the numerator polynomials $\sum_j z_j A_j(X)$, $\sum_j z_j B_j(X)$, and $\sum_j z_j C_j(X)$ over the domain $D$
Convert the evaluations to coefficients via inverse FFT
Re-evaluate them over the disjoint domain $D'$ via forward FFT
Perform component-wise division by the evaluations of $Z_D$ over $D'$, followed by interpolation to recover the coefficients of $h(X)$
These FFT operations are implemented using Distributed FFT.
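The following toy walk-through (my own illustration, not DIZK code) runs the four steps above end to end, with naive $O(m^2)$ transforms standing in for the distributed FFTs; the field $\mathbb{F}_{257}$, the size-8 domain, and all values are arbitrary.

```python
# Toy walk-through of recovering h(X) via evaluation on a coset D' = g*D (illustrative).
p = 257
m = 8                                    # domain size |D|
omega = pow(3, (p - 1) // m, p)          # primitive m-th root of unity, D = {omega^i}
g = 3                                    # coset shift; ord(3) = 256, so g is not in D
inv = lambda x: pow(x, p - 2, p)

def coset_dft(coeffs, shift):
    """Evaluate a degree-<m polynomial at the m points shift * omega^i."""
    scaled = [c * pow(shift, k, p) % p for k, c in enumerate(coeffs)]
    return [sum(scaled[k] * pow(omega, i * k, p) for k in range(m)) % p for i in range(m)]

def coset_idft(values, shift):
    """Recover the degree-<m polynomial from its values at shift * omega^i."""
    inv_m, inv_w, inv_s = inv(m), inv(omega), inv(shift)
    coeffs = [sum(values[i] * pow(inv_w, i * k, p) for i in range(m)) * inv_m % p
              for k in range(m)]
    return [c * pow(inv_s, k, p) % p for k, c in enumerate(coeffs)]

# Step 1: evaluations of the three numerator polynomials over D.  For a satisfied
# instance, a*b = c holds on every point of D, which we mimic with toy values.
a_on_D = [(3 * i + 1) % p for i in range(m)]
b_on_D = [(5 * i + 2) % p for i in range(m)]
c_on_D = [a_on_D[i] * b_on_D[i] % p for i in range(m)]

# Step 2: inverse FFT -> coefficient form (each of degree < m).
a_cf, b_cf, c_cf = (coset_idft(v, 1) for v in (a_on_D, b_on_D, c_on_D))

# Step 3: forward FFT onto the disjoint coset D' = g*D.
a_D2, b_D2, c_D2 = (coset_dft(cf, g) for cf in (a_cf, b_cf, c_cf))

# Step 4: divide pointwise by Z_D(g*omega^i) = g^m - 1 (constant and non-zero on D'),
# then interpolate on D' to recover the coefficients of h(X).
zD_on_coset_inv = inv((pow(g, m, p) - 1) % p)
h_on_D2 = [(a_D2[i] * b_D2[i] - c_D2[i]) * zD_on_coset_inv % p for i in range(m)]
h_cf = coset_idft(h_on_D2, g)

# Sanity check: h(X) * (X^m - 1) + c(X) == a(X) * b(X) as polynomials over F_p.
def poly_mul(u, v):
    out = [0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] = (out[i + j] + ui * vj) % p
    return out

lhs = poly_mul(h_cf, [p - 1] + [0] * (m - 1) + [1])            # h(X) * (X^m - 1)
lhs = [(x + y) % p for x, y in zip(lhs, c_cf + [0] * (len(lhs) - m))]
rhs = poly_mul(a_cf, b_cf) + [0]                               # pad a*b to the same length
assert lhs == rhs
```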
The most computationally demanding part is the evaluation of the first polynomial term, $\sum_j z_j A_j(X)$, over the domain $D$:
Input: the non-zero entries $\{((i, j), a_{i,j})\}$ of the matrix $A$, and the witness vector $z = (z_1, \dots, z_n)$
Output: the evaluations $\sum_j a_{i,j} \cdot z_j$ for each domain index $i \in [m]$
Naive (Strawman) Approach:
Join the matrix entries $\{(j, (i, a_{i,j}))\}$ with the witness values $\{(j, z_j)\}$ on the column index $j$
Map each pair to the product $a_{i,j} \cdot z_j$, keyed by the row index $i$
Reduce by $i$ to compute $\sum_j a_{i,j} \cdot z_j$
This results in the same type of computational skew encountered during QAP instance reduction in the setup phase. However, the key difference here is that the bottleneck occurs across columns rather than rows.
Fortunately, the matrix $A$'s sparsity structure is fixed for a given circuit, even though the witness vector $z$ changes with each input. Thus, DIZK can reuse the sparsity analysis from the setup phase to optimize this join, as sketched below.
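A small sketch of how that reuse might look in PySpark (my own illustration, with an arbitrary threshold): the per-column counts are computed once per circuit, and the resulting dense-column set is broadcast so that every subsequent prover run can partition the join without re-scanning the matrix.

```python
# Sketch: compute the dense-column set once per circuit and reuse it across proofs.
from pyspark import SparkContext

THRESHOLD = 3
sc = SparkContext("local[*]", "sparsity-analysis")

# Non-zero entries keyed by column index j: (j, (i, a_ij)).
entries = sc.parallelize(
    [(0, (i, 1)) for i in range(6)] +            # dense column 0 (a variable used everywhere)
    [(1, (0, 5)), (3, (2, 7)), (2, (3, 4))]
).cache()                                        # the matrix is fixed for the circuit

col_counts = entries.map(lambda kv: (kv[0], 1)).reduceByKey(lambda a, b: a + b)
dense_cols = set(col_counts.filter(lambda kv: kv[1] > THRESHOLD).keys().collect())
dense_cols_bc = sc.broadcast(dense_cols)         # reused by every subsequent prover run

print(dense_cols_bc.value)                       # {0}
```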
Solution: Hybrid Join Strategy
Phase 1: Identify Dense Columns
Mark any column with more than the threshold $T$ of non-zero entries as a dense column
Let $C$ denote the set of all such dense column indices
Phase 2: Hybrid Join Execution
Dense columns ($j \in C$):
Broadcast the few witness values $\{z_j : j \in C\}$ and join them with the dense-column entries locally, as in the setup
Map to $(i, a_{i,j} \cdot z_j)$
Sparse columns ($j \notin C$):
Same join and map as strawman
Merge and Reduce:
Combine both outputs into a single collection of pairs $(i, a_{i,j} \cdot z_j)$
Reduce by $i$ to obtain $\sum_j a_{i,j} \cdot z_j$ for each domain point $i$
Step 2: Distributed varMSM
In the final step of proof generation, the prover performs an inner product between a scalar vector and a vector of elliptic curve base points. DIZK distributes this variable-base multi-scalar multiplication (varMSM) using Pippenger's algorithm, which is well-suited for this scenario.
Chunk Distribution: The set of scalar-base point pairs is partitioned evenly across a set of executors
Local Computation: Each executor $k$ runs Pippenger's algorithm locally on its chunk to compute a partial result $P_k = \sum_{(s,\,Q) \in \text{chunk}_k} s \cdot Q$
Global Reduction: The driver aggregates the partial results from all executors: $P = \sum_k P_k$
This design allows DIZK to scale to large witness sizes and high-throughput proof generation with minimal overhead and excellent parallelism.
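Below is a toy end-to-end sketch of that pattern (chunk, local Pippenger, global reduce). As in the fixMSM sketch, a toy additive group $\mathbb{Z}_q$ stands in for the elliptic-curve group so the example runs without a curve library, and the bucket/window parameters are arbitrary rather than taken from DIZK.

```python
# Distributed varMSM sketch: Pippenger-style bucket method per chunk, then a global reduce.
import random
from pyspark import SparkContext

q = 2**61 - 1
WINDOW_BITS = 4
SCALAR_BITS = 64

def g_add(a, b): return (a + b) % q

def pippenger(pairs):
    """Compute sum(s * P) over (scalar, point) pairs with the bucket method."""
    windows = (SCALAR_BITS + WINDOW_BITS - 1) // WINDOW_BITS
    total = 0
    for w in reversed(range(windows)):
        for _ in range(WINDOW_BITS):           # shift accumulator: total <- 2^WINDOW_BITS * total
            total = g_add(total, total)
        buckets = [0] * (1 << WINDOW_BITS)
        for s, P in pairs:
            d = (s >> (WINDOW_BITS * w)) & ((1 << WINDOW_BITS) - 1)
            if d:
                buckets[d] = g_add(buckets[d], P)
        running, window_sum = 0, 0             # sum_d d * buckets[d] via running sums
        for d in reversed(range(1, 1 << WINDOW_BITS)):
            running = g_add(running, buckets[d])
            window_sum = g_add(window_sum, running)
        total = g_add(total, window_sum)
    return total

sc = SparkContext("local[*]", "varmsm")
rng = random.Random(0)
pairs = [(rng.randrange(1 << SCALAR_BITS), rng.randrange(q)) for _ in range(10000)]

partials = (sc.parallelize(pairs, numSlices=8)                    # chunk distribution
              .mapPartitions(lambda it: [pippenger(list(it))]))   # local Pippenger per chunk
result = partials.reduce(g_add)                                   # global reduction at the driver

assert result == sum(s * P for s, P in pairs) % q
```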
Conclusion

As shown in the figure above, we observe that increasing the number of executors allows the system to support larger instance sizes.

From the graph above, we can observe the following:
When the number of executors is fixed, the setup time and prover time increase linearly with the instance size.
When the instance size is fixed, the setup time and prover time decrease roughly in inverse proportion to the number of executors, i.e., the system achieves close to linear speedup.
DIZK shows that zkSNARKs, which were traditionally constrained by memory and computational limits, can be effectively scaled using distributed computing frameworks like Apache Spark. By parallelizing key components such as FFT, Lagrange evaluation, and MSM, DIZK enables proof generation for circuits with billions of constraints, significantly expanding the practical applicability of zero-knowledge proofs. Its modular design also provides reusable distributed primitives that can benefit other proof systems like STARKs.