Pianist (Plonk vIA uNlimited dISTribution) is a distributed proof generation system based on Plonk.
Protocol Explanation
Notation
M: number of machines
N: size of the entire circuit
T: size of each sub-circuit, where $T = N/M$ is assumed to be a power of 2
Arithmetic Constraint System for Each Party
Each participant is responsible for a portion $C_i$ of the global circuit $C$, and constructs a local constraint system to prove correctness of that sub-circuit.
This system consists of two main parts:
Gate Constraint — ensures that gate computations are correct
Copy Constraint — ensures that wire connections are consistent
Gate Constraint
Plonk targets fan-in 2 arithmetic circuits, where each gate has at most two inputs (left and right) and one output.
For each sub-circuit $C_i$ assigned to machine $P_i$, the following univariate polynomials are defined for a circuit size $T$:
$a_i(X)$: left input
$b_i(X)$: right input
$o_i(X)$: output
Each of these is expressed in the Lagrange basis over $\Omega_X$:

$$a_i(X) = \sum_{j=0}^{T-1} a_{i,j} \cdot L_j(X), \qquad b_i(X) = \sum_{j=0}^{T-1} b_{i,j} \cdot L_j(X), \qquad o_i(X) = \sum_{j=0}^{T-1} o_{i,j} \cdot L_j(X)$$

where $L_j(X)$ is the $j$-th Lagrange basis polynomial over $\Omega_X$, and $a_{i,j}, b_{i,j}, o_{i,j}$ are the wire values of the $j$-th gate.
Finally, the gate constraint is defined using selector polynomials that encode each gate's type:

$$g_i(X) := q_{a,i}(X) \cdot a_i(X) + q_{b,i}(X) \cdot b_i(X) + q_{o,i}(X) \cdot o_i(X) + q_{ab,i}(X) \cdot a_i(X) b_i(X) + q_{c,i}(X)$$

and it must hold at all evaluation points in the domain:
$$g_i(X) = 0 \quad \forall X \in \Omega_X$$

where $\Omega_X = \{\omega_X^0, \ldots, \omega_X^{T-1}\}$ and $\omega_X$ is a primitive $T$-th root of unity.
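To make this concrete, here is a minimal sketch in Python that checks the gate constraint in evaluation form over a toy field ($p = 17$, $T = 4$); the wire values, selectors, and field size are illustrative assumptions, not the paper's parameters.

```python
# Toy check of the Plonk-style gate constraint over a small field.
p = 17          # tiny prime field F_p (illustrative only)
T = 4           # sub-circuit size
w = 4           # w has multiplicative order 4 mod 17, so Omega_X = {1, 4, 16, 13}
domain = [pow(w, k, p) for k in range(T)]

# One multiplication gate followed by three addition gates, given as wire
# values a (left), b (right), o (output) at each domain point.
a = [3, 2, 5, 1]
b = [5, 6, 7, 8]
o = [(3 * 5) % p, (2 + 6) % p, (5 + 7) % p, (1 + 8) % p]

# Selector tuples (q_a, q_b, q_o, q_ab, q_c) pick the gate type per row.
mul = (0, 0, p - 1, 1, 0)   # enforces a*b - o = 0
add = (1, 1, p - 1, 0, 0)   # enforces a + b - o = 0
selectors = [mul, add, add, add]

for k in range(T):
    qa, qb, qo, qab, qc = selectors[k]
    g = (qa * a[k] + qb * b[k] + qo * o[k] + qab * a[k] * b[k] + qc) % p
    assert g == 0, f"gate constraint violated at point {domain[k]}"
print("g_i vanishes on all of Omega_X")
```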
Copy Constraint
One of the core ideas in Plonk is modeling wire consistency using a permutation argument. To enable this, the verifier sends two random field elements $\eta, \gamma \in \mathbb{F}$. The prover then constructs a running product polynomial $z_i(X)$ with $z_i(\omega_X^0) = 1$ and

$$z_i(\omega_X^{k+1}) = z_i(\omega_X^k) \cdot \frac{f_i(\omega_X^k)}{f'_i(\omega_X^k)}$$

where, in the standard Plonk form,

$$f_i(X) = (a_i(X) + \eta \cdot k_a X + \gamma)(b_i(X) + \eta \cdot k_b X + \gamma)(o_i(X) + \eta \cdot k_o X + \gamma)$$

$$f'_i(X) = (a_i(X) + \eta \cdot \sigma_{a,i}(X) + \gamma)(b_i(X) + \eta \cdot \sigma_{b,i}(X) + \gamma)(o_i(X) + \eta \cdot \sigma_{o,i}(X) + \gamma)$$

and $\sigma_{a,i}, \sigma_{b,i}, \sigma_{o,i}$ encode the wiring permutation of $C_i$.
Here, $k_a = 1$, and $k_b, k_o$ are distinct non-residue constants.
What are non-residue constants?
In general, when constructing permutation or lookup arguments over a finite field $\mathbb{F}$, non-residue constants are used to apply distinct "shifts". Specifically, if $\omega$ is a primitive $n$-th root of unity generating a multiplicative subgroup $H \subset \mathbb{F}$, then any $k \notin H$ is considered a non-residue, or a coset shift. One can then define a new coset domain $kH = \{kh \mid h \in H\}$ that is disjoint from $H$.
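The following few lines illustrate the coset idea with the same toy parameters as above (the subgroup and shift are assumptions for demonstration):

```python
# Tiny demonstration that a shift k outside H moves the subgroup H to a
# disjoint coset kH (illustrative field; any k not in H works as a shift).
p = 17
w = 4                                  # generator of a subgroup H of order 4
H = {pow(w, j, p) for j in range(4)}   # H = {1, 4, 16, 13}
k = 2                                  # 2 is not in H
kH = {(k * h) % p for h in H}          # coset {2, 8, 15, 9}
assert k not in H and H.isdisjoint(kH)
print("H =", sorted(H), " kH =", sorted(kH))
```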
The prover must prove the following holds:
$$z_i(\omega_X^T) = \prod_{k=0}^{T-1} \frac{f_i(\omega_X^k)}{f'_i(\omega_X^k)} = 1$$
This is enforced via the following constraints:
$$p_{i,0}(X) := L_0(X) \cdot (z_i(X) - 1)$$

$$p_{i,1}(X) := z_i(X) \cdot f_i(X) - z_i(\omega_X \cdot X) \cdot f'_i(X)$$
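Below is a minimal sketch of the running-product check in evaluation form, simplified to a single wire column; the values of $\eta$, $\gamma$, the wires, and the permutation are illustrative assumptions.

```python
# A minimal running-product check in evaluation form (single wire column).
p = 17
T = 4
eta, gamma = 3, 5

vals = [7, 7, 2, 9]    # wire values; positions 0 and 1 carry the same wire
ids = [1, 2, 3, 4]     # distinct position labels (toy stand-ins for k_a * w^j)
sigma = [2, 1, 3, 4]   # permutation swapping positions 0 and 1

f = [(vals[j] + eta * ids[j] + gamma) % p for j in range(T)]      # identity side
fp = [(vals[j] + eta * sigma[j] + gamma) % p for j in range(T)]   # permuted side

# Running product: z[0] = 1 and z[j+1] = z[j] * f[j] / f'[j] (Fermat inverse).
z = [1]
for j in range(T):
    z.append(z[-1] * f[j] * pow(fp[j], p - 2, p) % p)

assert z[T] == 1, "copy constraint violated"   # z_i(w_X^T) = 1
# Mirrors p_{i,1}: z_i(X) * f_i(X) = z_i(w_X * X) * f'_i(X) on the domain.
assert all(z[j] * f[j] % p == z[j + 1] * fp[j] % p for j in range(T))
print("running product closes:", z)
```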
Quotient Polynomial
To consolidate all constraints into a single polynomial relation, we prove the existence of a polynomial $h_i(X)$ satisfying:

$$g_i(X) + \lambda \cdot p_{i,0}(X) + \lambda^2 \cdot p_{i,1}(X) = V_X(X) \cdot h_i(X)$$

where $\lambda$ is a random challenge from the verifier and the vanishing polynomial over the evaluation domain is defined as:

$$V_X(X) = X^T - 1$$
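As a sanity check, the divisibility claim can be sketched with naive polynomial long division over a toy field; real provers compute $h_i$ with FFTs over cosets, so this is only to illustrate that a constraint polynomial vanishing on $\Omega_X$ is divisible by $X^T - 1$ (the example polynomial is an assumption).

```python
# Sketch: computing the quotient h_i(X) = c(X) / (X^T - 1) by long division
# over F_p, where c(X) stands in for the combined constraint polynomial.
p, T = 17, 4

def poly_divmod(num, den, p):
    """Naive long division of polynomials (coefficient lists, low degree first)."""
    num = num[:]
    q = [0] * (len(num) - len(den) + 1)
    inv_lead = pow(den[-1], p - 2, p)
    for i in range(len(q) - 1, -1, -1):
        q[i] = num[i + len(den) - 1] * inv_lead % p
        for j, d in enumerate(den):
            num[i + j] = (num[i + j] - q[i] * d) % p
    return q, num  # quotient, remainder

V_X = [p - 1] + [0] * (T - 1) + [1]          # X^T - 1 as a coefficient list

# c(X) = (X^T - 1) * (2X + 3): vanishes on Omega_X by construction.
c = [0] * (T + 2)
for i, v in enumerate(V_X):
    c[i] = (c[i] + 3 * v) % p
    c[i + 1] = (c[i + 1] + 2 * v) % p

h, r = poly_divmod(c, V_X, p)
assert all(x == 0 for x in r), "c(X) not divisible by V_X: some constraint fails"
print("h(X) coefficients (low to high):", h)   # expect [3, 2], i.e. 2X + 3
```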
Constraint System for Data-parallel Circuit
This section presents the core technique that aggregates locally proven gate and copy constraints from each machine into a single unified SNARK proof.
Assuming the sub-circuits $C_0, C_1, \ldots, C_{M-1}$ are independent, the goal is to:
Aggregate local proof polynomials into a single bivariate polynomial, and
Transform the distributed proof system into a globally verifiable SNARK
Naive Approach
The most straightforward idea is to aggregate sub-polynomials using powers of Y:

$$S(Y, X) = \sum_{i=0}^{M-1} Y^i \cdot s_i(X), \qquad s \in \{a, b, o, z, \ldots\}$$

The problem is that constraints such as the gate constraint multiply wire polynomials together, and products like $Y^i a_i(X) \cdot Y^j b_j(X)$ create cross terms $Y^{i+j}$ that mix different machines' witnesses. Pianist instead aggregates with the Lagrange basis $\{R_i(Y)\}$ over a second multiplicative domain $\Omega_Y = \{\omega_Y^0, \ldots, \omega_Y^{M-1}\}$:

$$S(Y, X) = \sum_{i=0}^{M-1} R_i(Y) \cdot s_i(X)$$

Because $R_i(Y) \cdot R_j(Y) \equiv 0 \pmod{V_Y(Y)}$ for $i \neq j$ and $R_i(Y)^2 \equiv R_i(Y) \pmod{V_Y(Y)}$, all cross terms are multiples of $V_Y(Y)$ and can be absorbed into a second quotient term, yielding a combined identity of the form

$$G(Y,X) + \lambda \cdot P_0(Y,X) + \lambda^2 \cdot P_1(Y,X) = V_X(X) \cdot H_X(Y,X) + V_Y(Y) \cdot H_Y(Y,X)$$

where $V_Y(Y) = Y^M - 1$ is the vanishing polynomial for the Y-domain.
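The key algebraic fact, that cross terms $R_i(Y) R_j(Y)$ for $i \neq j$ vanish on $\Omega_Y$, can be verified directly (toy parameters again):

```python
# On the domain Omega_Y, R_i(y) is a 0/1 indicator, so cross terms
# R_i(y) * R_j(y) with i != j vanish on the domain, i.e. the cross-term
# polynomials are multiples of V_Y(Y). (Toy parameters, a sketch only.)
p, M = 17, 4
wy = 4
omega_Y = [pow(wy, i, p) for i in range(M)]

def R(i, y):
    """i-th Lagrange basis polynomial over Omega_Y, evaluated at y."""
    num, den = 1, 1
    for j in range(M):
        if j != i:
            num = num * (y - omega_Y[j]) % p
            den = den * (omega_Y[i] - omega_Y[j]) % p
    return num * pow(den, p - 2, p) % p

for y in omega_Y:
    vals = [R(i, y) for i in range(M)]
    assert sorted(vals) == [0, 0, 0, 1]          # indicator on the domain
    assert all(vals[i] * vals[j] % p == 0
               for i in range(M) for j in range(M) if i != j)
print("cross terms R_i * R_j vanish on Omega_Y")
```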
Constraint System for General Circuit
In the data-parallel setting, there are no connections between sub-circuits, so each machine can independently prove its portion. However, real-world circuits such as those in zkEVMs have highly interconnected gates.
In this general setting, the same wire value may appear across multiple sub-circuits, requiring a cross-machine copy constraint. To support this in a distributed setting, we must modify the permutation argument. The key idea is to include not just the position within a sub-circuit ($X$), but also the machine index ($Y$).
Modified Copy Constraint
The global permutation check requires each machine to construct a running product as before, but its final value

$$z_i^* := \prod_{k=0}^{T-1} \frac{f_i(\omega_X^k)}{f'_i(\omega_X^k)}$$

no longer needs to equal 1 on its own; correctness only requires the local products to cancel globally:

$$\prod_{i=0}^{M-1} z_i^* = 1$$

To achieve this, the permutation polynomials and the identity terms in $f_i, f'_i$ are extended with Y-dependent terms that encode the machine index; these extra terms are only required in the General Circuit case. The master node then enforces the global product using an auxiliary polynomial $W(Y)$ built from the final values $z_0^*, \ldots, z_{M-1}^*$.
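A sketch of the telescoping check follows; the local products $z_i^*$ are placeholder values chosen so that the global product closes.

```python
# Cross-machine product check: each machine's local product z_i* need not be 1,
# but the master's telescoping W over the z_i* must close to 1.
p, M = 17, 4
z_star = [3, 6, 2, 0]             # placeholder local products (toy values)
prod = 3 * 6 * 2 % p              # product of the first three machines
z_star[3] = pow(prod, p - 2, p)   # fix the last so the global product is 1

W = [1]
for i in range(M):
    W.append(W[-1] * z_star[i] % p)   # W(w_Y^{i+1}) = W(w_Y^i) * z_i*

assert W[M] == 1, "global copy constraint violated"
print("W:", W)
```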
The soundness of this protocol is formally proven in Appendix A.
Communication Analysis
Except for $H_Y(Y,X)$ and $W(Y)$, all other polynomials can remain distributed across the $M$ machines, with only their commitments needing to be sent.
$W(Y)$ can be computed by the master node using the final values $(z_0^*, \ldots, z_{M-1}^*)$ received from all machines.
For $H_Y(Y,X)$, the prover does not need to send the full polynomial. Instead, the master collects evaluations at $X = \alpha$ from each machine and reconstructs $H_Y(Y, \alpha)$. For example:

$$A(Y, \alpha) = \sum_{i=0}^{M-1} R_i(Y) \cdot a_i(\alpha)$$
Therefore, the overall communication complexity is O(M).
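For illustration, here is how the master could rebuild $A(Y, \alpha)$ from the single field elements $a_i(\alpha)$ sent by the machines; the toy parameters and evaluations are assumptions.

```python
# The master rebuilds A(Y, alpha) from one field element a_i(alpha) per
# machine: O(M) communication rather than shipping whole polynomials.
p, M = 17, 4
wy = 4
omega_Y = [pow(wy, i, p) for i in range(M)]

a_at_alpha = [5, 11, 2, 7]    # each machine P_i sends only a_i(alpha)

def A(y):
    """A(y, alpha) = sum_i R_i(y) * a_i(alpha), with R_i the Lagrange basis."""
    total = 0
    for i in range(M):
        num, den = 1, 1
        for j in range(M):
            if j != i:
                num = num * (y - omega_Y[j]) % p
                den = den * (omega_Y[i] - omega_Y[j]) % p
        total = (total + num * pow(den, p - 2, p) * a_at_alpha[i]) % p
    return total

# On the Y-domain, A(w_Y^i, alpha) recovers exactly machine i's evaluation.
assert [A(y) for y in omega_Y] == a_at_alpha
```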
Proving Time Analysis
Each prover $P_i$ computes $z_i(X)$ and $h_i(X)$ in $O(T \log T)$
The master node $P_0$ computes $W(Y)$ and $H_Y(Y, \alpha)$ in $O(M \log M)$
So the maximum latency per machine is $O(T \log T + M \log M)$
The total proving time across all machines is $O(N \log T + M \log M)$
This is a practical improvement over the original Plonk prover time complexity of $O(N \log N)$.
Distributed KZG
The original KZG commitment scheme is designed for univariate polynomials. Since Pianist models the entire proof system as a bivariate polynomial $f(Y,X)$, we need to extend KZG to support multiple variables. Furthermore, due to the distributed nature of the system, each machine must compute only a portion of the commitment and proof, while the master node assembles the full result.
Suppose we have the following bivariate polynomial:

$$f(Y, X) = \sum_{i=0}^{M-1} R_i(Y) \cdot f_i(X)$$

where machine $P_i$ holds only its own slice $f_i(X)$. Each machine computes a partial commitment to $R_i(Y) \cdot f_i(X)$ under the bivariate SRS, and the master multiplies the partial commitments together to obtain the commitment to $f(Y,X)$; opening proofs are assembled in the same distributed fashion.
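A toy sketch of the aggregation step follows: each machine exponentiates its slice into a "commitment" in a small group of order $p$, and the master multiplies the partial commitments. Real KZG uses a pairing-friendly elliptic curve and a structured reference string; the group, generator, and trapdoor values below are stand-ins.

```python
# Toy sketch of distributed commitment aggregation (NOT a secure scheme).
p = 17                       # toy scalar field, as in the earlier snippets
q, g = 103, 64               # 64 has multiplicative order 17 mod 103, so
                             # exponent arithmetic works mod p = 17
tau_X, tau_Y = 6, 10         # stand-in "trapdoor" evaluation points

M, wy = 4, 4
omega_Y = [pow(wy, i, p) for i in range(M)]

def R_at(i, y):
    """i-th Lagrange basis polynomial over Omega_Y, evaluated at y."""
    num, den = 1, 1
    for j in range(M):
        if j != i:
            num = num * (y - omega_Y[j]) % p
            den = den * (omega_Y[i] - omega_Y[j]) % p
    return num * pow(den, p - 2, p) % p

def eval_poly(coeffs, x):
    return sum(c * pow(x, k, p) for k, c in enumerate(coeffs)) % p

f_slices = [[1, 2], [3, 0], [0, 5], [4, 4]]   # machine P_i holds only f_i(X)

# Each machine computes its partial commitment g^{R_i(tau_Y) * f_i(tau_X)} ...
partials = [pow(g, R_at(i, tau_Y) * eval_poly(f_slices[i], tau_X) % p, q)
            for i in range(M)]

# ... and the master multiplies them into the commitment to f(Y, X).
com = 1
for c in partials:
    com = com * c % q

full_exp = sum(R_at(i, tau_Y) * eval_poly(f_slices[i], tau_X) for i in range(M)) % p
assert com == pow(g, full_exp, q)
print("aggregated commitment equals g^{f(tau_Y, tau_X)}")
```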
Robust Collaborative Proving System for Data-parallel Circuit
Pianist introduces a new proof model to ensure security even in malicious settings.
This is called the Robust Collaborative Proving System (RCPS) — a protocol that ensures the final proof can still be correctly constructed and verified even when some machines are faulty or adversarial.
Evaluation
While standard Plonk (on a single machine) hits a memory limit at 32 transactions, Pianist can scale up to 8192 transactions simply by increasing the number of machines. With 64 machines, Pianist is able to prove 8192 transactions in just 313 seconds. This demonstrates that the number of transactions included in a single proof scales proportionally with the number of machines. Moreover, the time taken by the master node to gather and finalize the proof is just 2–16 ms, which is negligible compared to the overall proving time. This highlights that communication overhead in Pianist's parallel architecture is minimal.
While Figure 1 shows the performance on data-parallel circuits (e.g., zkRollups), Figure 2 extends the evaluation to general circuits with arbitrary subcircuit connections. In all circuit sizes tested, the proving time decreases nearly linearly with the number of machines, demonstrating strong scalability even in general circuits. In smaller circuits, the marginal benefit of scaling beyond 4–8 machines may diminish, but for larger circuits, the benefits of parallelization become increasingly significant.
In other words, Pianist performs even better on larger circuits, showcasing a desirable scalability property.
As shown above, the memory usage per machine decreases significantly as the number of machines increases. For small circuits, memory savings are minor since the total memory requirement is already low, but for large circuits, scaling the number of machines directly determines whether proof generation is even feasible.
In conclusion, Pianist transforms the original Plonk proving system into a fully distributed, scalable framework that:
Enables parallel proving of large circuits across multiple machines,
Maintains constant ($O(1)$) proof size and verifier time, with only $O(1)$ communication per machine, and
Offers high practical utility for real-world zk systems like zkRollups and zkEVMs.