Pianist
Introduction
Pianist (Plonk vIA uNlimited dISTribution) is a distributed proof generation system based on Plonk.
Protocol Explanation
Notation
$M$: number of machines
$N$: size of the entire circuit
$T$: size of each sub-circuit, where $N = M \cdot T$ and $T$ is a power of 2
Arithmetic Constraint System for Each Party
Each participant $\mathcal{P}_i$ is responsible for a portion $\mathcal{C}_i$ of the global circuit $\mathcal{C}$, and constructs a local constraint system to prove correctness of that sub-circuit.
This system consists of two main parts:
Gate Constraint — ensures that gate computations are correct
Copy Constraint — ensures that wire connections are consistent
Gate Constraint
Plonk targets fan-in 2 arithmetic circuits, where each gate has at most two inputs (left and right) and one output.
For the sub-circuit assigned to machine $i$, the following univariate polynomials are defined over an evaluation domain of size $T$:
$a_i(X)$: left input
$b_i(X)$: right input
$o_i(X)$: output
Each of these is expressed in the Lagrange basis as:
$$a_i(X) = \sum_{j=0}^{T-1} a_{i,j} \cdot L_j(X), \qquad b_i(X) = \sum_{j=0}^{T-1} b_{i,j} \cdot L_j(X), \qquad o_i(X) = \sum_{j=0}^{T-1} o_{i,j} \cdot L_j(X)$$
where $L_j(X)$ is the $j$-th Lagrange polynomial over the domain, and the coefficient form of each polynomial can be computed in $O(T \log T)$ time using the Number Theoretic Transform (NTT).
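To make the NTT step concrete, here is a minimal Python sketch over a toy field ($\mathbb{F}_{17}$, domain size $T = 4$; real systems use a large SNARK-friendly prime). The helper names and wire values are illustrative, not taken from the paper.

```python
# Minimal sketch: recover the coefficients of a_i(X) from its values on the domain
# {1, w, w^2, w^3} with an inverse NTT, then check that the polynomial really
# interpolates the wire values. Toy field F_17, domain size T = 4.
def ntt(vals, omega, p):
    # recursive Cooley-Tukey: evaluate the polynomial with coefficients `vals`
    # at the points 1, omega, omega^2, ..., omega^(n-1)
    n = len(vals)
    if n == 1:
        return list(vals)
    even = ntt(vals[0::2], omega * omega % p, p)
    odd  = ntt(vals[1::2], omega * omega % p, p)
    out, w = [0] * n, 1
    for k in range(n // 2):
        t = w * odd[k] % p
        out[k] = (even[k] + t) % p
        out[k + n // 2] = (even[k] - t) % p
        w = w * omega % p
    return out

def intt(evals, omega, p):
    # inverse NTT: domain evaluations -> monomial coefficients
    n, inv_n = len(evals), pow(len(evals), p - 2, p)
    return [v * inv_n % p for v in ntt(evals, pow(omega, p - 2, p), p)]

p, T = 17, 4
omega = pow(3, (p - 1) // T, p)      # primitive T-th root of unity (3 generates F_17*)
a_vals = [5, 7, 1, 2]                # machine i's left-input wire values a_{i,j}
a_coeffs = intt(a_vals, omega, p)    # coefficient form of a_i(X)

# a_i(omega^j) must reproduce the j-th wire value
assert all(sum(c * pow(omega, j * k, p) for k, c in enumerate(a_coeffs)) % p == a_vals[j]
           for j in range(T))
```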
All gate operations are encoded in a single polynomial equation:
$$g_i(X) = q_{a,i}(X)\,a_i(X) + q_{b,i}(X)\,b_i(X) + q_{o,i}(X)\,o_i(X) + q_{ab,i}(X)\,a_i(X)\,b_i(X) + q_{c,i}(X)$$
The selector coefficients vary depending on the gate type:
Addition Gate: $q_{a,i} = q_{b,i} = 1$, $q_{o,i} = -1$, $q_{ab,i} = q_{c,i} = 0$, so the constraint reads $a_i + b_i - o_i = 0$
Multiplication Gate: $q_{ab,i} = 1$, $q_{o,i} = -1$, $q_{a,i} = q_{b,i} = q_{c,i} = 0$, so the constraint reads $a_i \cdot b_i - o_i = 0$
Public Input Gate: the selectors pin a wire to a public value $x$, e.g. $q_{a,i} = 1$, $q_{c,i} = -x$ with the remaining selectors set to $0$, so the constraint reads $a_i - x = 0$
Finally, the gate constraint must hold at all evaluation points in the domain:
$$g_i(\omega_X^j) = 0 \quad \text{for all } j \in \{0, 1, \ldots, T-1\},$$
where $\omega_X$ is a primitive $T$-th root of unity and the evaluation domain is $\mathbb{H}_X = \{\omega_X^0, \omega_X^1, \ldots, \omega_X^{T-1}\}$.
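As a sanity check on this selector encoding, here is a small hedged Python sketch (toy field, illustrative values) verifying that the gate relation holds pointwise for an addition gate, a multiplication gate, and a public-input gate encoded as described above.

```python
# Minimal sketch: check q_a*a + q_b*b + q_o*o + q_ab*a*b + q_c == 0 at each gate.
# Toy field and values; the selector layout follows the description above, not paper code.
p = 17

gates = [
    # addition gate: 5 + 7 = 12
    dict(qa=1, qb=1, qo=-1, qab=0, qc=0, a=5, b=7, o=12),
    # multiplication gate: 3 * 4 = 12
    dict(qa=0, qb=0, qo=-1, qab=1, qc=0, a=3, b=4, o=12),
    # public-input gate pinning the left wire to the public value 9
    dict(qa=1, qb=0, qo=0, qab=0, qc=-9, a=9, b=0, o=0),
]

for g in gates:
    lhs = (g["qa"] * g["a"] + g["qb"] * g["b"] + g["qo"] * g["o"]
           + g["qab"] * g["a"] * g["b"] + g["qc"]) % p
    assert lhs == 0
```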
Copy Constraint
One of the core ideas in Plonk is modeling wire consistency using a permutation argument. To enable this, the verifier sends two random field elements $\beta, \gamma$. The prover then constructs the following running product polynomial $z_i(X)$:
$$z_i(\omega_X^0) = 1, \qquad z_i(\omega_X^{j+1}) = z_i(\omega_X^j) \cdot \frac{f_i(\omega_X^j)}{f'_i(\omega_X^j)}$$
Where:
$f_i(X) = (a_i(X) + \beta X + \gamma)(b_i(X) + \beta k_1 X + \gamma)(o_i(X) + \beta k_2 X + \gamma)$: actual wiring values
$f'_i(X) = (a_i(X) + \beta \sigma_{a,i}(X) + \gamma)(b_i(X) + \beta \sigma_{b,i}(X) + \gamma)(o_i(X) + \beta \sigma_{o,i}(X) + \gamma)$: permuted reference wiring, where $\sigma_{a,i}, \sigma_{b,i}, \sigma_{o,i}$ encode the wiring permutation
Here, $k_1$ and $k_2$ are distinct non-residue constants, so that the position labels $X$, $k_1 X$, and $k_2 X$ of the three wire columns never collide.
What are non-residue constants? In general, when constructing permutation or lookup arguments over a finite field $\mathbb{F}$, non-residue constants are used to apply distinct "shifts". Specifically, if $\omega$ is a primitive $T$-th root of unity generating a multiplicative subgroup $\mathbb{H} = \{1, \omega, \ldots, \omega^{T-1}\}$, then any $k \in \mathbb{F}^* \setminus \mathbb{H}$ is considered a non-residue, or a coset shift. One can then define a new coset domain $k\mathbb{H} = \{k, k\omega, \ldots, k\omega^{T-1}\}$ that is disjoint from $\mathbb{H}$.
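The coset-shift idea can be checked directly on a toy field. The sketch below (assumed parameters $p = 17$, $T = 4$, shifts $k_1 = 2$, $k_2 = 3$) verifies that the subgroup and its two shifted cosets are pairwise disjoint.

```python
# Minimal sketch: the subgroup H of order T and two shifted cosets k1*H, k2*H are
# pairwise disjoint, so the three wire columns get non-overlapping position labels.
p, T = 17, 4
omega = pow(3, (p - 1) // T, p)             # generator of the order-T subgroup H
H   = {pow(omega, j, p) for j in range(T)}  # {1, 13, 16, 4}
k1, k2 = 2, 3                               # coset shifts outside H (and k1/k2 not in H)
k1H = {k1 * h % p for h in H}
k2H = {k2 * h % p for h in H}
assert H.isdisjoint(k1H) and H.isdisjoint(k2H) and k1H.isdisjoint(k2H)
```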
The prover must prove the following holds:
$$\prod_{j=0}^{T-1} \frac{f_i(\omega_X^j)}{f'_i(\omega_X^j)} = 1$$
This is enforced via the following constraints over $\mathbb{H}_X$:
$$p_{0,i}(X) = L_0(X)\,\big(z_i(X) - 1\big) = 0$$
$$p_{1,i}(X) = z_i(X)\, f_i(X) - z_i(\omega_X X)\, f'_i(X) = 0$$
where $L_0(X)$ is the Lagrange polynomial that equals $1$ at $\omega_X^0$ and $0$ elsewhere on the domain.
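To see why the final product being 1 certifies the wiring, here is a hedged toy sketch of the underlying grand-product identity (random challenges, a hypothetical 4-wire example); it is not the paper's code, just the arithmetic behind $z_i$.

```python
# Minimal sketch of the grand product behind the permutation argument: if equal wire
# values sit at positions connected by the permutation sigma, then the product of
# (value + beta*position + gamma) equals the product of (value + beta*sigma(position) + gamma),
# so the running product z ends at 1.
import random

p = 2**61 - 1                      # toy prime, not the field used in practice
values = [9, 9, 4, 4]              # positions 0,1 share a value, and so do positions 2,3
sigma  = [1, 0, 3, 2]              # permutation connecting the equal positions
beta, gamma = random.randrange(1, p), random.randrange(1, p)

num = den = 1
for j, v in enumerate(values):
    num = num * (v + beta * j        + gamma) % p
    den = den * (v + beta * sigma[j] + gamma) % p
assert num == den                  # equivalently, the running product num/den ends at 1
```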
Quotient Polynomial
To consolidate all constraints into a single polynomial relation, we prove the existence of a quotient polynomial $h_i(X)$ satisfying, for a random verifier challenge $\lambda$:
$$g_i(X) + \lambda\, p_{0,i}(X) + \lambda^2\, p_{1,i}(X) = Z_X(X)\, h_i(X)$$
where the vanishing polynomial $Z_X(X)$ over the evaluation domain $\mathbb{H}_X$ is defined as:
$$Z_X(X) = \prod_{j=0}^{T-1} (X - \omega_X^j) = X^T - 1$$
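The existence of the quotient is equivalent to the combined constraint vanishing on the whole domain. The sketch below (toy field $\mathbb{F}_{17}$, illustrative coefficient values) divides such a polynomial by $Z_X(X) = X^T - 1$ and checks that the remainder is zero and the quotient is recovered.

```python
# Minimal sketch: a constraint polynomial that vanishes on all of H_X is exactly
# divisible by Z_X(X) = X^T - 1, and the quotient h_i(X) is what the prover commits to.
# Coefficients are stored lowest-degree first, over a toy field F_17.
p, T = 17, 4

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out

def poly_divmod(num, den):
    num = num[:]                           # long division over F_p
    q = [0] * (len(num) - len(den) + 1)
    inv_lead = pow(den[-1], p - 2, p)
    for k in range(len(q) - 1, -1, -1):
        coef = num[k + len(den) - 1] * inv_lead % p
        q[k] = coef
        for j, d in enumerate(den):
            num[k + j] = (num[k + j] - coef * d) % p
    return q, num[:len(den) - 1]           # quotient, remainder

Z_X = [-1 % p] + [0] * (T - 1) + [1]       # X^T - 1
h   = [2, 5, 1]                            # pretend quotient h_i(X)
constraint = poly_mul(Z_X, h)              # a polynomial vanishing on all of H_X
quot, rem = poly_divmod(constraint, Z_X)
assert all(r == 0 for r in rem) and quot == h
```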
Constraint System for Data-parallel Circuit
This section presents the core technique that aggregates locally proven gate and copy constraints from each machine into a single unified SNARK proof.
Assuming the sub-circuits are independent, the goal is to:
Aggregate local proof polynomials into a single bivariate polynomial, and
Transform the distributed proof system into a globally verifiable SNARK
Naive Approach
The most straightforward idea is to aggregate sub-polynomials using powers of $Y$:
$$A(Y, X) = \sum_{i=0}^{M-1} Y^i\, a_i(X), \qquad B(Y, X) = \sum_{i=0}^{M-1} Y^i\, b_i(X), \qquad O(Y, X) = \sum_{i=0}^{M-1} Y^i\, o_i(X)$$
A multiplication gate would then be expressed as:
$$A(Y, X)\, B(Y, X) = \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} Y^{i+j}\, a_i(X)\, b_j(X)$$
This introduces cross-terms such as $a_i(X)\, b_j(X)$ for $i \neq j$, making it difficult to reason about and distribute the proof efficiently.
Bivariate Lagrange Polynomial Aggregation
To address this, inspired by Caulk, we use Lagrange basis polynomials in $Y$:
$$A(Y, X) = \sum_{i=0}^{M-1} R_i(Y)\, a_i(X), \qquad B(Y, X) = \sum_{i=0}^{M-1} R_i(Y)\, b_i(X), \qquad O(Y, X) = \sum_{i=0}^{M-1} R_i(Y)\, o_i(X)$$
where $R_i(Y)$ is the $i$-th Lagrange polynomial over the $Y$-domain $\mathbb{H}_Y = \{\omega_Y^0, \ldots, \omega_Y^{M-1}\}$, i.e. $R_i(\omega_Y^i) = 1$ and $R_i(\omega_Y^j) = 0$ for $j \neq i$.
Now, a multiplication gate becomes, at every $Y = \omega_Y^i$:
$$A(\omega_Y^i, X)\, B(\omega_Y^i, X) = a_i(X)\, b_i(X)$$
This removes any cross-terms and allows clean aggregation of constraints.
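Here is a hedged numeric illustration of why the Lagrange-in-$Y$ basis kills cross-terms: fix some evaluation point in $X$ (so each $a_i(X)$ and $b_i(X)$ is just a field element), aggregate with the Lagrange basis over a toy $Y$-domain, and check that $A \cdot B$ at $Y = \omega_Y^i$ is exactly $a_i b_i$. All values are illustrative.

```python
# Minimal sketch: at Y = omega_Y^i the basis R_i(Y) is 1 and every other R_j(Y) is 0,
# so A(omega_Y^i, X) = a_i(X) and the product A*B picks up only a_i(X)*b_i(X).
p, M = 17, 4
omega_y = pow(3, (p - 1) // M, p)
ydom = [pow(omega_y, i, p) for i in range(M)]

def lagrange_basis(i, y):
    num = den = 1
    for j in range(M):
        if j != i:
            num = num * (y - ydom[j]) % p
            den = den * (ydom[i] - ydom[j]) % p
    return num * pow(den, p - 2, p) % p

a_machines = [3, 8, 1, 6]   # a_i(X) evaluated at some fixed X, one value per machine
b_machines = [4, 2, 9, 5]   # b_i(X) at the same X
for i, y in enumerate(ydom):
    A = sum(lagrange_basis(j, y) * a_machines[j] for j in range(M)) % p
    B = sum(lagrange_basis(j, y) * b_machines[j] for j in range(M)) % p
    assert (A * B - a_machines[i] * b_machines[i]) % p == 0   # no cross-terms survive
```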
Global Constraint Definitions
Gate Constraint:
$$\mathcal{G}(Y, X) = \sum_{i=0}^{M-1} R_i(Y)\, g_i(X)$$
Copy Constraint:
$$\mathcal{P}_0(Y, X) = \sum_{i=0}^{M-1} R_i(Y)\, p_{0,i}(X), \qquad \mathcal{P}_1(Y, X) = \sum_{i=0}^{M-1} R_i(Y)\, p_{1,i}(X)$$
Quotient Polynomial:
$$H(Y, X) = \sum_{i=0}^{M-1} R_i(Y)\, h_i(X)$$
Verifier Strategy
The verifier checks that, for every $i \in \{0, \ldots, M-1\}$, the combined constraint holds at $Y = \omega_Y^i$:
$$\mathcal{G}(\omega_Y^i, X) + \lambda\, \mathcal{P}_0(\omega_Y^i, X) + \lambda^2\, \mathcal{P}_1(\omega_Y^i, X) = Z_X(X)\, H(\omega_Y^i, X)$$
where $Z_X(X) = X^T - 1$ is the vanishing polynomial for the $X$-domain.
Constraint System for General Circuit
In the data-parallel setting, there are no connections between sub-circuits, so each machine can independently prove its portion. However, real-world circuits such as those in zkEVMs have highly interconnected gates.
In this general setting, the same wire value may appear across multiple sub-circuits, requiring a cross-machine copy constraint. To support this in a distributed setting, we must modify the permutation argument. The key idea is to include not just the position within a sub-circuit (X), but also the machine index (Y).
Modified Copy Constraint
The global permutation check requires each machine to construct a running product that satisfies:
$$z_i(\omega_X^0) = 1, \qquad z_i(\omega_X^{j+1}) = z_i(\omega_X^j) \cdot \frac{f_i(\omega_X^j)}{f'_i(\omega_X^j)}$$
Where:
$f_i(X) = (a_i(X) + \beta\,\mathsf{id}_{a,i}(X) + \gamma)(b_i(X) + \beta\,\mathsf{id}_{b,i}(X) + \gamma)(o_i(X) + \beta\,\mathsf{id}_{o,i}(X) + \gamma)$: actual wire expression, where the $\mathsf{id}$ polynomials now encode global wire labels that include both the machine index and the position
$f'_i(X) = (a_i(X) + \beta\,\sigma_{a,i}(X) + \gamma)(b_i(X) + \beta\,\sigma_{b,i}(X) + \gamma)(o_i(X) + \beta\,\sigma_{o,i}(X) + \gamma)$: permuted reference wiring, where $\sigma$ is the global permutation across all machines
Now, the final value of the running product for each machine,
$$z_i^{\star} = \prod_{j=0}^{T-1} \frac{f_i(\omega_X^j)}{f'_i(\omega_X^j)},$$
is no longer guaranteed to be 1. Thus, the constraints must be modified: the boundary constraint $L_0(X)\,(z_i(X) - 1) = 0$ is kept, the product-step constraint $z_i(X)\, f_i(X) - z_i(\omega_X X)\, f'_i(X) = 0$ is required everywhere except at the last point of the domain, and at the last point the running product is instead tied to the claimed final value $z_i^{\star}$.
The master node then collects the final values $z_i^{\star}$ from each machine and checks:
$$\prod_{i=0}^{M-1} z_i^{\star} = 1$$
To do this, a new polynomial $W(Y)$ is constructed as a running product over the machine index:
$$W(\omega_Y^0) = z_0^{\star}, \qquad W(\omega_Y^{i}) = \prod_{k=0}^{i} z_k^{\star}$$
This leads to the following constraints over the $Y$-domain: $W$ starts at $z_0^{\star}$, each step multiplies in the next machine's final value, and the last value satisfies $W(\omega_Y^{M-1}) = 1$, which is exactly the statement that the global product of all $z_i^{\star}$ equals 1.
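To illustrate the cross-machine check, here is a hedged toy sketch (two machines, one shared wire, random challenges; the values and layout are hypothetical): each machine's local running product ends at some $z_i^{\star}$ that is generally not 1, but the master node's product over all final values is 1.

```python
# Minimal sketch: a wire value (9) lives at global position 0 on machine 0 and at
# global position 5 on machine 1; the global permutation swaps those two positions.
# Each machine's local product z_i* is generally not 1, but the product over machines is.
import random

p = 2**61 - 1
beta, gamma = random.randrange(1, p), random.randrange(1, p)

values = [[9, 4], [7, 9]]     # per-machine wire values
ids    = [[0, 1], [4, 5]]     # global position labels (machine index folded in)
sigma  = [[5, 1], [4, 0]]     # global permutation: position 0 <-> position 5, others fixed

finals = []
for m in range(2):
    z = 1
    for v, pid, sid in zip(values[m], ids[m], sigma[m]):
        z = z * (v + beta * pid + gamma) % p                  # numerator factor
        z = z * pow(v + beta * sid + gamma, p - 2, p) % p     # divide by denominator factor
    finals.append(z)

# the master node only needs the M final values z_i* and checks their product
assert finals[0] * finals[1] % p == 1
```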
Quotient Polynomial
All constraints are combined into a single relation by proving the existence of a quotient polynomial $h_i(X)$ for each machine such that:
$$g_i(X) + \lambda\, p_{0,i}(X) + \lambda^2\, p_{1,i}(X) = Z_X(X)\, h_i(X)$$
where $Z_X(X) = X^T - 1$ is the vanishing polynomial for the $X$-domain and $p_{0,i}, p_{1,i}$ are the modified copy constraints above.
Global Constraint Definitions
Copy Constraints:
$$\mathcal{P}_0(Y, X) = \sum_{i=0}^{M-1} R_i(Y)\, p_{0,i}(X), \qquad \mathcal{P}_1(Y, X) = \sum_{i=0}^{M-1} R_i(Y)\, p_{1,i}(X)$$
Here, $F(Y, X)$ and $F'(Y, X)$, the aggregated wire and permuted-wire expressions appearing inside them, are defined as:
$$F(Y, X) = \sum_{i=0}^{M-1} R_i(Y)\, f_i(X), \qquad F'(Y, X) = \sum_{i=0}^{M-1} R_i(Y)\, f'_i(X)$$
Quotient Polynomial: All constraints are bundled into a single bivariate relation using $H(Y, X)$:
$$\mathcal{G}(Y, X) + \lambda\, \mathcal{P}_0(Y, X) + \lambda^2\, \mathcal{P}_1(Y, X) = Z_X(X)\, H(Y, X), \qquad H(Y, X) = \sum_{i=0}^{M-1} R_i(Y)\, h_i(X)$$
The cross-machine product check on $W(Y)$ is handled analogously over the $Y$-domain.
Polynomial IOP for Data-parallel and General Circuits
$\mathcal{P} \rightarrow \mathcal{V}$: Send oracles for the wire polynomials $A(Y,X)$, $B(Y,X)$, $O(Y,X)$
$\mathcal{V} \rightarrow \mathcal{P}$: Send random challenges $\beta, \gamma$
$\mathcal{P} \rightarrow \mathcal{V}$: Send oracles for the permutation polynomial $Z(Y,X)$ 🔶 and the running product $W(Y)$
$\mathcal{V} \rightarrow \mathcal{P}$: Send random challenge $\lambda$
$\mathcal{P} \rightarrow \mathcal{V}$: Send oracle for the quotient polynomial $H(Y,X)$
$\mathcal{V} \rightarrow \mathcal{P}$: Send random challenge $\alpha$
$\mathcal{P} \rightarrow \mathcal{V}$: Send oracle for the remaining polynomial in $Y$, obtained by fixing $X = \alpha$
$\mathcal{V}$ queries all oracles at the challenge point and checks that the aggregated gate and copy constraints 🔶 together with the cross-machine product constraint are divisible by the corresponding vanishing polynomials, i.e. that the quotient relation holds at the queried point.
🔶 The terms marked in orange are only required in the General Circuit case.
The soundness of this protocol is formally proven in Appendix A of the Pianist paper.
Communication Analysis
Except for $W(Y)$ and the final polynomial in $Y$ sent in the last round, all other polynomials can be distributed across machines, with only their commitments needing to be sent to the master node.
For $W(Y)$, it can be computed by the master node using the final values $z_i^{\star}$ received from all machines.
For the final polynomial in $Y$, the prover does not need to send the full polynomial. Instead, the master node collects each machine's evaluation at $X = \alpha$ and reconstructs the polynomial in $Y$ by interpolation over the $Y$-domain, as sketched below.
Therefore, the overall communication complexity is $O(M)$.
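Here is a hedged sketch of that reconstruction step (toy field, hypothetical reported values): from one evaluation per machine, the master interpolates the unique polynomial of degree less than $M$ over the $Y$-domain.

```python
# Minimal sketch: from the single field element each machine sends (its local evaluation
# at the challenge alpha), the master node interpolates the unique degree-<M polynomial
# over the Y-domain, without ever receiving a full polynomial from any machine.
p, M = 17, 4
omega_y = pow(3, (p - 1) // M, p)             # primitive M-th root of unity in F_17
ydom = [pow(omega_y, i, p) for i in range(M)]
evals = [6, 11, 2, 9]                          # value machine i reports at X = alpha

def w_at(y):                                   # W(y) via Lagrange interpolation
    acc = 0
    for i in range(M):
        num = den = 1
        for j in range(M):
            if j != i:
                num = num * (y - ydom[j]) % p
                den = den * (ydom[i] - ydom[j]) % p
        acc = (acc + evals[i] * num * pow(den, p - 2, p)) % p
    return acc

# the interpolated polynomial agrees with every machine's report
assert all(w_at(ydom[i]) == evals[i] for i in range(M))
```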
Proving Time Analysis
Each prover computes its local wire, permutation, and quotient polynomials, together with their commitments, in $O(T \log T)$ time.
The master node computes $W(Y)$ and the aggregated commitments and evaluations in $O(M \log M)$ time.
So the maximum latency per machine is $O(T \log T + M \log M)$.
The total proving time across all machines is $O(N \log T + M \log M)$.
This is a practical improvement over the original Plonk prover time complexity of $O(N \log N)$.

Distributed KZG
The original KZG commitment scheme is designed for univariate polynomials. Since Pianist models the entire proof system as a bivariate polynomial $f(Y, X)$, we need to extend KZG to support multiple variables. Furthermore, due to the distributed nature of the system, each machine must compute only a portion of the commitment and proof, while the master node assembles the full result.
Suppose we have the following bivariate polynomial:
$$f(Y, X) = \sum_{i=0}^{M-1} R_i(Y)\, f_i(X)$$
Each machine $i$ holds a slice defined as:
$$f_i(X) = \sum_{j=0}^{T-1} f_{i,j}\, L_j(X)$$
$\mathsf{KeyGen}$: Generates the public parameters $\mathsf{pp}$:
$$\mathsf{pp} = \Big( \{ g^{\tau_Y^i \tau_X^j} \}_{0 \le i < M,\; 0 \le j < T},\; h,\; h^{\tau_X},\; h^{\tau_Y} \Big)$$
where $\tau_X, \tau_Y$ are secret trapdoors and $g, h$ generate the two pairing groups.
$\mathsf{Commit}$: Each prover $\mathcal{P}_i$ computes:
$$\mathsf{com}_{f_i} = g^{R_i(\tau_Y)\, f_i(\tau_X)}$$
The master node aggregates:
$$\mathsf{com}_f = \prod_{i=0}^{M-1} \mathsf{com}_{f_i} = g^{f(\tau_Y, \tau_X)}$$
$\mathsf{Open}$ (at a point $(\delta, \alpha)$): Each $\mathcal{P}_i$ computes:
$$\pi_{X,i} = g^{R_i(\tau_Y)\, q_{X,i}(\tau_X)}, \qquad q_{X,i}(X) = \frac{f_i(X) - f_i(\alpha)}{X - \alpha}$$
Then sends $\big(f_i(\alpha), \pi_{X,i}\big)$ to the master node $\mathcal{P}_0$.
$\mathcal{P}_0$ computes:
$$\pi_X = \prod_{i=0}^{M-1} \pi_{X,i}, \qquad f(Y, \alpha) = \sum_{i=0}^{M-1} R_i(Y)\, f_i(\alpha)$$
Then computes:
$$z = f(\delta, \alpha), \qquad \pi_Y = g^{q_Y(\tau_Y)}, \qquad q_Y(Y) = \frac{f(Y, \alpha) - z}{Y - \delta}$$
Sends $(z, \pi_X, \pi_Y)$ to the verifier $\mathcal{V}$.
The verifier parses the proof as $(z, \pi_X, \pi_Y)$ and checks:
$$e\big(\mathsf{com}_f \cdot g^{-z},\; h\big) \stackrel{?}{=} e\big(\pi_X,\; h^{\tau_X - \alpha}\big) \cdot e\big(\pi_Y,\; h^{\tau_Y - \delta}\big)$$
Proof: the verification equation holds because the exponents on both sides agree. Writing $f(Y, X) - z = (X - \alpha)\, q_X(Y, X) + (Y - \delta)\, q_Y(Y)$ with $q_X(Y, X) = \sum_i R_i(Y)\, q_{X,i}(X)$:
The exponent in $e\big(\mathsf{com}_f \cdot g^{-z}, h\big)$ is $f(\tau_Y, \tau_X) - z$.
The exponent in $e\big(\pi_X, h^{\tau_X - \alpha}\big)$ is $q_X(\tau_Y, \tau_X)\,(\tau_X - \alpha)$.
The exponent in $e\big(\pi_Y, h^{\tau_Y - \delta}\big)$ is $q_Y(\tau_Y)\,(\tau_Y - \delta)$.
Substituting $(Y, X) = (\tau_Y, \tau_X)$ into the decomposition shows the two sides are equal.
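The distributed structure of Commit and the aggregation step can be illustrated without pairings. The toy Python sketch below works directly in a small prime-order group ($g = 2$ of order 11 modulo 23), standing in for the elliptic-curve group of a real KZG setup; all parameters and names are illustrative. It shows that each machine can commit using only its own slice, and that the master node obtains the commitment to $f(\tau_Y, \tau_X)$ by simply multiplying the machines' commitments.

```python
# Toy model of distributed KZG aggregation (illustrative only). A real deployment
# commits in an elliptic-curve group via an SRS {g^(tau_Y^i * tau_X^j)} and never
# knows tau; here we use exponents directly in a small group just to show that the
# master node assembles the global commitment by multiplying per-machine commitments.
q, r, g = 23, 11, 2          # group modulus, prime group order (scalar field), generator
tau_x, tau_y = 3, 7          # "secret" SRS trapdoors (exposed here only for the toy)

M = 2
ydom = [1, r - 1]            # Y-domain of size 2 in F_11 (the two square roots of 1)

def R(i, y):                 # Lagrange basis polynomial R_i over the Y-domain
    j = 1 - i
    return (y - ydom[j]) * pow((ydom[i] - ydom[j]) % r, r - 2, r) % r

f = [[4, 9, 1], [5, 0, 7]]   # machine i's slice f_i(X), as coefficients over F_11

def eval_poly(coeffs, x):
    return sum(c * pow(x, k, r) for k, c in enumerate(coeffs)) % r

# Commit: machine i computes g^(R_i(tau_Y) * f_i(tau_X)) from its own slice alone
coms = [pow(g, R(i, tau_y) * eval_poly(f[i], tau_x) % r, q) for i in range(M)]

# the master node aggregates with one group multiplication per machine
com_full = 1
for c in coms:
    com_full = com_full * c % q

# the aggregate equals a direct commitment to f(tau_Y, tau_X) = sum_i R_i(tau_Y) f_i(tau_X)
f_full = sum(R(i, tau_y) * eval_poly(f[i], tau_x) for i in range(M)) % r
assert com_full == pow(g, f_full, q)
```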
Robust Collaborative Proving System for Data-parallel Circuit
Pianist introduces a new proof model to ensure security even in malicious settings.
This is called the Robust Collaborative Proving System (RCPS) — a protocol that ensures the final proof can still be correctly constructed and verified even when some machines are faulty or adversarial.
📌 For technical details, see Section 5 of the paper.
Conclusion

While standard Plonk (on a single machine) hits a memory limit at 32 transactions, Pianist can scale up to 8192 transactions simply by increasing the number of machines. With 64 machines, Pianist is able to prove 8192 transactions in just 313 seconds. This demonstrates that the number of transactions included in a single proof scales proportionally with the number of machines. Moreover, the time taken by the master node to gather and finalize the proof is just 2–16 ms, which is negligible compared to the overall proving time. This highlights that communication overhead in Pianist's parallel architecture is minimal.

While Figure 1 shows the performance on data-parallel circuits (e.g., zkRollups), Figure 2 extends the evaluation to general circuits with arbitrary subcircuit connections. In all circuit sizes tested, the proving time decreases nearly linearly with the number of machines, demonstrating strong scalability even in general circuits. In smaller circuits, the marginal benefit of scaling beyond 4–8 machines may diminish, but for larger circuits, the benefits of parallelization become increasingly significant. In other words, Pianist performs even better on larger circuits, showcasing a desirable scalability property.

As shown above, the memory usage per machine decreases significantly as the number of machines increases. For small circuits, memory savings are minor since the total memory requirement is already low, but for large circuits, scaling the number of machines directly determines whether proof generation is even feasible.
In conclusion, Pianist transforms the original Plonk proving system into a fully distributed, scalable framework that:
Enables parallel proving of large circuits across multiple machines,
Maintains constant proof size, verifier time, and per-machine communication overhead (all $O(1)$), and
Offers high practical utility for real-world zk systems like zkRollups and zkEVMs.
References
Written by Ryan Kim from A41