Pianist

Introduction

Pianist (Plonk vIA uNlimited dISTribution) is a distributed proof generation system based on Plonk.

Protocol Explanation

Notation

  • MM: number of machines

  • NN: size of the entire circuit

  • TT: size of each sub-circuit, where T=NMT = \frac{N}{M}, a power of 2

Arithmetic Constraint System for Each Party

Each participant is responsible for a portion CiC_i of the global circuit CC, and constructs a local constraint system to prove correctness of that sub-circuit.

This system consists of two main parts:

  1. Gate Constraint — ensures that gate computations are correct

  2. Copy Constraint — ensures that wire connections are consistent


Gate Constraint

Plonk targets fan-in 2 arithmetic circuits, where each gate has at most two inputs (left and right) and one output.

For each sub-circuit CiC_i assigned to machine PiP_i, the following univariate polynomials are defined for a circuit size TT:

  • ai(X)a_i(X): left input

  • bi(X)b_i(X): right input

  • oi(X)o_i(X): output

Each of these is expressed in the Lagrange basis as:

ai(X)=j=0T1ai,jLj(X)a_i(X) = \sum_{j=0}^{T-1} a_{i,j} L_j(X)

where Lj(X)L_j(X) is a Lagrange polynomial, and can be computed in O(TlogT)O(T \log T) time using the Number Theoretic Transform (NTT).

All gate operations are encoded in a single polynomial equation:

gi(X):=qa,i(X)ai(X)+qb,i(X)bi(X)+qo,i(X)oi(X)+qab,i(X)ai(X)bi(X)+qc,i(X)g_i(X) := q_{a,i}(X)a_i(X) + q_{b,i}(X)b_i(X) + q_{o,i}(X)o_i(X) + q_{ab,i}(X)a_i(X)b_i(X) + q_{c,i}(X)

The qq coefficients vary depending on the gate type:

  • Addition Gate

    (qa,i,qb,i,qo,i,qab,i,qc,i)=(1,1,1,0,0)gi(X)=ai(X)+bi(X)oi(X)(q_{a,i}, q_{b,i}, q_{o,i}, q_{ab,i}, q_{c,i}) = (1, 1, -1, 0, 0) \quad \Rightarrow \quad g_i(X) = a_i(X) + b_i(X) - o_i(X)
  • Multiplication Gate

    (qa,i,qb,i,qo,i,qab,i,qc,i)=(0,0,1,1,0)gi(X)=ai(X)bi(X)oi(X)(q_{a,i}, q_{b,i}, q_{o,i}, q_{ab,i}, q_{c,i}) = (0, 0, -1, 1, 0) \quad \Rightarrow \quad g_i(X) = a_i(X) b_i(X) - o_i(X)
  • Public Input Gate

    (qa,i,qb,i,qo,i,qab,i,qc,i)=(0,0,1,0,ini,j)gi(X)=oi(X)+qc,i(X)(q_{a,i}, q_{b,i}, q_{o,i}, q_{ab,i}, q_{c,i}) = (0, 0, -1, 0, \mathsf{in}_{i,j}) \quad \Rightarrow \quad g_i(X) = -o_i(X) + q_{c,i}(X)

Finally, the gate constraint must hold at all evaluation points in the domain:

gi(X)=0XΩXg_i(X) = 0 \quad \forall X \in \Omega_X

where ΩX={ωX0,,ωXT1}\Omega_X = \{ \omega_X^0, \dots, \omega_X^{T-1} \}.


Copy Constraint

One of the core ideas in Plonk is modeling wire consistency using a permutation argument. To enable this, the verifier sends two random field elements η,γF\eta, \gamma \in \mathbb{F}. The prover then constructs the following running product polynomial:

zi(ωXj)={1if j=0k=0j1fi(ωXk)fi(ωXk)otherwisez_i(\omega_X^j) = \begin{cases} 1 & \text{if } j = 0 \\ \prod_{k=0}^{j-1} \frac{f_i(\omega_X^k)}{f'_i(\omega_X^k)} & \text{otherwise} \end{cases}

Where:

  • fi(X)f_i(X): actual wiring values

    fi(X):=(ai(X)+ησa,i(X)+γ)(bi(X)+ησb,i(X)+γ)(oi(X)+ησc,i(X)+γ)f_i(X) := (a_i(X) + \eta \cdot \sigma_{a,i}(X) + \gamma)(b_i(X) + \eta \cdot \sigma_{b,i}(X) + \gamma)(o_i(X) + \eta \cdot \sigma_{c,i}(X) + \gamma)
  • fi(X)f'_i(X): permuted reference wiring

    fi(X):=(ai(X)+ηkAX+γ)(bi(X)+ηkBX+γ)(oi(X)+ηkOX+γ)f'_i(X) := (a_i(X) + \eta k_A X + \gamma)(b_i(X) + \eta k_B X + \gamma)(o_i(X) + \eta k_O X + \gamma)

Here, kA=1k_A = 1, and kB,kOk_B, k_O are distinct non-residue constants.

The prover must prove the following holds:

zi(ωXT)=k=0T1fi(ωXk)fi(ωXk)=1z_i(\omega_X^T) = \prod_{k=0}^{T-1} \frac{f_i(\omega_X^k)}{f'_i(\omega_X^k)} = 1

This is enforced via the following constraints:

pi,0(X):=L0(X)(zi(X)1)p_{i,0}(X) := L_0(X) \cdot (z_i(X) - 1)
pi,1(X):=zi(X)fi(X)zi(ωXX)fi(X)p_{i,1}(X) := z_i(X) \cdot f_i(X) - z_i(\omega_X \cdot X) \cdot f'_i(X)

Quotient Polynomial

To consolidate all constraints into a single polynomial relation, we prove the existence of a polynomial hi(X)h_i(X) satisfying:

gi(X)+λpi,0(X)+λ2pi,1(X)=VX(X)hi(X)g_i(X) + \lambda \cdot p_{i,0}(X) + \lambda^2 \cdot p_{i,1}(X) = V_X(X) \cdot h_i(X)

where the vanishing polynomial over the evaluation domain is defined as:

VX(X)=XT1V_X(X) = X^T - 1

Constraint System for Data-parallel Circuit

This section presents the core technique that aggregates locally proven gate and copy constraints from each machine into a single unified SNARK proof.

Assuming the sub-circuits C0,C1,,CM1C_0, C_1, \dots, C_{M-1} are independent, the goal is to:

  • Aggregate local proof polynomials into a single bivariate polynomial, and

  • Transform the distributed proof system into a globally verifiable SNARK


Naive Approach

The most straightforward idea is to aggregate sub-polynomials using powers of YY:

A(Y,X):=i=0M1Yiai(X)A(Y, X) := \sum_{i=0}^{M-1} Y^i a_i(X)

A multiplication gate would then be expressed as:

A(Y,X)B(Y,X)C(Y,X)=(i=0M1Yiai(X))(i=0M1Yibi(X))(i=0M1Yici(X))A(Y, X) B(Y, X) - C(Y, X) = \left( \sum_{i=0}^{M-1} Y^i a_i(X) \right) \left( \sum_{i=0}^{M-1} Y^i b_i(X) \right) - \left( \sum_{i=0}^{M-1} Y^i c_i(X) \right)

This introduces cross-terms such as Yi+jai(X)bj(X)Y^{i+j} a_i(X) b_j(X), making it difficult to reason about and distribute the proof efficiently.


Bivariate Lagrange Polynomial Aggregation

To address this, inspired by Caulk, we use Lagrange basis polynomials in YY:

A(Y,X):=i=0M1Ri(Y)ai(X)A(Y, X) := \sum_{i=0}^{M-1} R_i(Y) a_i(X)

Now, a multiplication gate becomes:

A(Y,X)B(Y,X)C(Y,X)=(i=0M1Ri(Y)ai(X))(i=0M1Ri(Y)bi(X))(i=0M1Ri(Y)ci(X))A(Y, X) B(Y, X) - C(Y, X) = \left( \sum_{i=0}^{M-1} R_i(Y) a_i(X) \right) \left( \sum_{i=0}^{M-1} R_i(Y) b_i(X) \right) - \left( \sum_{i=0}^{M-1} R_i(Y) c_i(X) \right)

This removes any cross-terms and allows clean aggregation of constraints.


Global Constraint Definitions

  • Gate Constraint:

G(Y,X):=Qa(Y,X)A(Y,X)+Qb(Y,X)B(Y,X)+Qo(Y,X)O(Y,X)+Qab(Y,X)A(Y,X)B(Y,X)+Qc(Y,X)G(Y, X) := Q_a(Y, X) A(Y, X) + Q_b(Y, X) B(Y, X) + Q_o(Y, X) O(Y, X) + Q_{ab}(Y, X) A(Y, X) B(Y, X) + Q_c(Y, X)
  • Copy Constraint:

P0(Y,X):=L0(X)(Z(Y,X)1)P_0(Y, X) := L_0(X) \cdot (Z(Y, X) - 1)
P1(Y,X):=Z(Y,X)S{A,B,O}(S(Y,X)+ησs(Y,X)+γ)Z(Y,ωXX)S{A,B,O}(S(Y,X)+ηksX+γ)P_1(Y, X) := Z(Y, X) \prod_{S \in \{A, B, O\}} (S(Y, X) + \eta \cdot \sigma_s(Y, X) + \gamma) - Z(Y, \omega_X X) \prod_{S \in \{A, B, O\}} (S(Y, X) + \eta \cdot k_s X + \gamma)
  • Quotient Polynomial:

G(Y,X)+λP0(Y,X)+λ2P1(Y,X)=VX(X)HX(Y,X)G(Y, X) + \lambda P_0(Y, X) + \lambda^2 P_1(Y, X) = V_X(X) \cdot H_X(Y, X)

Verifier Strategy

The verifier checks that for every i[0,M1]i \in [0, M-1], the constraint holds at Y=ωYiY = \omega_Y^i:

G(Y,X)+λP0(Y,X)+λ2P1(Y,X)VX(X)HX(Y,X)=VY(Y)HY(Y,X)G(Y, X) + \lambda P_0(Y, X) + \lambda^2 P_1(Y, X) - V_X(X) H_X(Y, X) = V_Y(Y) H_Y(Y, X)

where VY(Y)=YM1V_Y(Y) = Y^M - 1 is the vanishing polynomial for the YY-domain.

Constraint System for General Circuit

In the data-parallel setting, there are no connections between sub-circuits, so each machine can independently prove its portion. However, real-world circuits such as those in zkEVMs have highly interconnected gates.

In this general setting, the same wire value may appear across multiple sub-circuits, requiring a cross-machine copy constraint. To support this in a distributed setting, we must modify the permutation argument. The key idea is to include not just the position within a sub-circuit (X), but also the machine index (Y).


Modified Copy Constraint

The global permutation check requires each machine to construct a running product that satisfies:

i=0M1j=0T1fi(ωXj)fi(ωXj)=?1\prod_{i=0}^{M-1} \prod_{j=0}^{T-1} \frac{f_i(\omega_X^j)}{f'_i(\omega_X^j)} \stackrel{?}{=} 1

Where:

  • fi(X)f_i(X): actual wire expression

    fi(X)=(ai(X)+ηYδY,a,i(X)+ηXδX,a,i(X)+γ)(bi(X)+ηYδY,b,i(X)+ηXδX,b,i(X)+γ)(oi(X)+ηYδY,o,i(X)+ηXδX,o,i(X)+γ)\begin{aligned} f_i(X) = & (a_i(X) + \eta_Y \delta_{Y,a,i}(X) + \eta_X \delta_{X,a,i}(X) + \gamma) \\ & (b_i(X) + \eta_Y \delta_{Y,b,i}(X) + \eta_X \delta_{X,b,i}(X) + \gamma) \\ & (o_i(X) + \eta_Y \delta_{Y,o,i}(X) + \eta_X \delta_{X,o,i}(X) + \gamma) \end{aligned}
  • fi(X)f'_i(X): sorted reference wiring

    fi(X)=(ai(X)+ηYY+ηXkAX+γ)(bi(X)+ηYY+ηXkBX+γ)(oi(X)+ηYY+ηXkOX+γ)\begin{aligned} f'_i(X) = & (a_i(X) + \eta_Y Y + \eta_X k_A X + \gamma) \\ & (b_i(X) + \eta_Y Y + \eta_X k_B X + \gamma) \\ & (o_i(X) + \eta_Y Y + \eta_X k_O X + \gamma) \end{aligned}

Now, the final value of the running product for each machine,

zi=zi(ωXT1)fi(ωXT1)fi(ωXT1)z_i^* = z_i(\omega_X^{T-1}) \cdot \frac{f_i(\omega_X^{T-1})}{f'_i(\omega_X^{T-1})}

is no longer guaranteed to be 1. Thus, the constraints must be modified as follows:

pi,0(X):=L0(X)(zi(X)1)p_{i,0}(X) := L_0(X) \cdot (z_i(X) - 1)
pi,1(X):=(1LT1(X))(zi(X)fi(X)zi(ωXX)fi(X))p_{i,1}(X) := (1 - L_{T-1}(X)) \cdot (z_i(X) f_i(X) - z_i(\omega_X X) f'_i(X))

The master node then collects the final values from each machine (z0,,zM1)(z_0^*, \dots, z_{M-1}^*) and checks:

i=0M1j=0T1fi(ωXj)fi(ωXj)=i=0M1zi=?1\prod_{i=0}^{M-1} \prod_{j=0}^{T-1} \frac{f_i(\omega_X^j)}{f'_i(\omega_X^j)} = \prod_{i=0}^{M-1} z_i^* \stackrel{?}{=} 1

To do this, a new polynomial zlast(X)z_{\mathsf{last}}(X) is constructed as:

zlast(ωYi)=ωi={1if i=0k=0i1zkotherwisez_{\mathsf{last}}(\omega_Y^i) = \omega_i = \begin{cases} 1 & \text{if } i = 0 \\ \prod_{k=0}^{i-1} z_k^* & \text{otherwise} \end{cases}

This leads to the following constraints:

pi,2(X):=ω01p_{i,2}(X) := \omega_0 - 1
pi,3(X):=LT1(X)(ωizi(X)fi(X)ω(i+1)modMfi(X))p_{i,3}(X) := L_{T-1}(X) \cdot (\omega_i z_i(X) f_i(X) - \omega_{(i+1) \mod M} f_i'(X))

Quotient Polynomial

All constraints are combined into a single relation by proving the existence of hi(X)h_i(X) such that:

gi(X)+λpi,0(X)+λ2pi,1(X)+λ3pi,2(X)=VX(X)hi(X)g_i(X) + \lambda p_{i,0}(X) + \lambda^2 p_{i,1}(X) + \lambda^3 p_{i,2}(X) = V_X(X) \cdot h_i(X)

where VX(X)=XT1V_X(X) = X^T - 1 is the vanishing polynomial for the X-domain.


Global Constraint Definitions

  • Copy Constraints:

P0(Y,X):=L0(X)(Z(Y,X)1)P1(Y,X):=(1LT1(X))[Z(Y,X)F(Y,X)Z(Y,ωXX)F(Y,X)]P2(Y):=R0(Y)(W(Y)1)P3(Y,X):=LT1(X)[W(Y)Z(Y,X)F(Y,X)W(ωYY)F(Y,X)]\begin{aligned} P_0(Y, X) &:= L_0(X) \cdot (Z(Y, X) - 1) \\ P_1(Y, X) &:= (1 - L_{T-1}(X)) \cdot \left[ Z(Y, X) F(Y, X) - Z(Y, \omega_X X) F'(Y, X) \right] \\ P_2(Y) &:= R_0(Y) \cdot (W(Y) - 1) \\ P_3(Y, X) &:= L_{T-1}(X) \cdot \left[ W(Y) Z(Y, X) F(Y, X) - W(\omega_Y Y) F'(Y, X) \right] \end{aligned}

Here, F(Y,X)F(Y, X) and F(Y,X)F'(Y, X) are defined as:

F(Y,X):=S{A,B,O}(S(Y,X)+ηYδY,S(Y,X)+ηXδX,S(Y,X)+γ)F(Y, X) := \prod_{S \in \{A, B, O\}} (S(Y, X) + \eta_Y \delta_{Y,S}(Y, X) + \eta_X \delta_{X,S}(Y, X) + \gamma)
F(Y,X):=S{A,B,O}(S(Y,X)+ηYY+ηXkSX+γ)F'(Y, X) := \prod_{S \in \{A, B, O\}} (S(Y, X) + \eta_Y Y + \eta_X k_S X + \gamma)
  • Quotient Polynomial: All constraints are bundled into a single bivariate relation using HY(Y,X)H_Y(Y, X):

G(Y,X)+λP0(Y,X)+λ2P1(Y,X)+λ3P2(Y)+λ4P3(Y,X)VX(X)HX(Y,X)=VY(Y)HY(Y,X)G(Y, X) + \lambda P_0(Y, X) + \lambda^2 P_1(Y, X) + \lambda^3 P_2(Y) + \lambda^4 P_3(Y, X) - V_X(X) H_X(Y, X) = V_Y(Y) H_Y(Y, X)

Polynomial IOP for Data-parallel and General Circuits

  1. PV\mathcal{P} \to \mathcal{V}: Send oracles for {A(Y,X),B(Y,X),C(Y,X)}\{A(Y, X), B(Y, X), C(Y, X)\}

  2. VP\mathcal{V} \to \mathcal{P}: Send random challenges ηY,ηX,γ\textcolor{orange}{\eta_Y}, \eta_X, \gamma

  3. PV\mathcal{P} \to \mathcal{V}: Send oracles for {Z(Y,X),W(Y)}\{Z(Y, X), \textcolor{orange}{W(Y)}\}

  4. VP\mathcal{V} \to \mathcal{P}: Send random challenge λ\lambda

  5. PV\mathcal{P} \to \mathcal{V}: Send oracle for HX(Y,X)H_X(Y, X)

    HX(Y,X)=i=0M1Ri(Y)gi(X)+λpi,0(X)+λ2pi,1(X)+λ4pi,3(X)VX(X)H_X(Y, X) = \sum_{i=0}^{M-1} R_i(Y) \cdot \frac{g_i(X) + \lambda p_{i,0}(X) + \lambda^2 p_{i,1}(X) + \textcolor{orange}{\lambda^4 p_{i,3}(X)}}{V_X(X)}
  6. VP\mathcal{V} \to \mathcal{P}: Send random challenge α\alpha

  7. PV\mathcal{P} \to \mathcal{V}: Send oracle for HY(Y,α)H_Y(Y, \alpha)

    HY(Y,α)=G(Y,α)+λP0(Y,α)+λ2P1(Y,α)+λ3P2(Y)+λ4P3(Y,α)VX(α)HX(Y,α)VY(Y)H_Y(Y, \alpha) = \frac{G(Y, \alpha) + \lambda P_0(Y, \alpha) + \lambda^2 P_1(Y, \alpha) + \textcolor{orange}{\lambda^3 P_2(Y) + \lambda^4 P_3(Y, \alpha)} - V_X(\alpha) H_X(Y, \alpha)}{V_Y(Y)}
  8. V\mathcal{V} queries all oracles at X=α,Y=βX = \alpha, Y = \beta and checks:

    G(β,α)+λP0(β,α)+λ2P1(β,α)+λ3P2(β)+λ4P3(β,α)VX(α)HX(β,α)=?VY(β)HY(β,α)G(\beta, \alpha) + \lambda P_0(\beta, \alpha) + \lambda^2 P_1(\beta, \alpha) + \textcolor{orange}{\lambda^3 P_2(\beta) + \lambda^4 P_3(\beta, \alpha)} - V_X(\alpha) H_X(\beta, \alpha) \stackrel{?}{=} V_Y(\beta) H_Y(\beta, \alpha)

🔶 The terms marked in orange are only required in the General Circuit case.

The soundness of this protocol is formally proven in Appendix A.


Communication Analysis

Except for HY(Y,X)H_Y(Y, X) and W(Y)W(Y), all other polynomials can be distributed across MM machines, with only their commitments needing to be sent.

  • For W(Y)W(Y), it can be computed by the master node using the final values (z0,,zM1)(z_0^*, \dots, z_{M-1}^*) received from all machines.

  • For HY(Y,X)H_Y(Y, X), the prover does not need to send the full polynomial. Instead, the prover collects evaluations at X=αX = \alpha from each machine and reconstructs HY(Y,α)H_Y(Y, \alpha). For example:

    A(Y,α)=i=0M1Ri(Y)ai(α)A(Y, \alpha) = \sum_{i=0}^{M-1} R_i(Y) \cdot a_i(\alpha)

Therefore, the overall communication complexity is O(M)O(M).


Proving Time Analysis

  • Each prover PiP_i computes zi(X)z_i(X) and hi(X)h_i(X) in O(TlogT)O(T \log T)

  • The master node P0P_0 computes W(Y)W(Y) and HY(Y,α)H_Y(Y, \alpha) in O(MlogM)O(M \log M)

  • So the maximum latency per machine is O(TlogT+MlogM)O(T \log T + M \log M)

  • The total proving time across all machines is O(NlogT+MlogM)O(N \log T + M \log M)

This is a practical improvement over the original Plonk prover time complexity of O(NlogN)O(N \log N).

Distributed KZG

The original KZG commitment scheme is designed for univariate polynomials. Since Pianist models the entire proof system as a bivariate polynomial f(Y,X)f(Y, X), we need to extend KZG to support multiple variables. Furthermore, due to the distributed nature of the system, each machine must compute only a portion of the commitment and proof, while the master node assembles the full result.

Suppose we have the following bivariate polynomial:

f(Y,X)=i=0M1j=0T1fi,jRi(Y)Lj(X)(where fi,j=fi(ωj))f(Y, X) = \sum_{i=0}^{M-1} \sum_{j=0}^{T-1} f_{i,j} R_i(Y) L_j(X) \quad \text{(where } f_{i,j} = f_i(\omega^j) \text{)}

Each machine holds a slice fi(X)f_i(X) defined as:

fi(X)=j=0T1fi,jLj(X)f_i(X) = \sum_{j=0}^{T-1} f_{i,j} L_j(X)

DKZG.KeyGen(1λ,M,T)\mathsf{DKZG.KeyGen}(1^\lambda, M, T)

Generates the public parameters pp\mathsf{pp}:

pp=([1]1,(([Ri(τY)Lj(τX)]1)i=0M1)j=0T1,[1]2,[τX]2,[τY]2)\mathsf{pp} = \left( [1]_1, \left(\left( [R_i(\tau_Y) L_j(\tau_X)]_1 \right)_{i=0}^{M-1}\right) _{j=0}^{T-1}, [1]_2, [\tau_X]_2, [\tau_Y]_2 \right)

DKZG.Commit(f,pp)\mathsf{DKZG.Commit}(f, \mathsf{pp})

Each prover Pi\mathcal{P}_i computes:

comfi=j=0T1fi,j[Ri(τY)Lj(τX)]1\mathsf{com}_{f_i} = \sum_{j=0}^{T-1} f_{i,j} \cdot [R_i(\tau_Y) L_j(\tau_X)]_1

The master node P0\mathcal{P}_0 aggregates:

comf=i=0M1comfi=[f(τY,τX)]1\mathsf{com}_f = \sum_{i=0}^{M-1} \mathsf{com}_{f_i} = [f(\tau_Y, \tau_X)]_1

DKZG.Open(f,β,α,pp)\mathsf{DKZG.Open}(f, \beta, \alpha, \mathsf{pp})

  1. Each Pi\mathcal{P}_i computes:

    • fi(α)f_i(\alpha)

    • q0(i)(X)=fi(X)fi(α)Xαq_0^{(i)}(X) = \frac{f_i(X) - f_i(\alpha)}{X - \alpha}

    • π0(i)=[Ri(τY)q0(i)(τX)]1\pi_0^{(i)} = [R_i(\tau_Y) \cdot q_0^{(i)}(\tau_X)]_1

    Then sends fi(α),π0(i)f_i(\alpha), \pi_0^{(i)} to P0\mathcal{P}_0

  2. P0\mathcal{P}_0 computes:

    • π0=i=0M1π0(i)\pi_0 = \sum_{i=0}^{M-1} \pi_0^{(i)}

    • f(Y,α)=i=0M1Ri(Y)fi(α)f(Y, \alpha) = \sum_{i=0}^{M-1} R_i(Y) f_i(\alpha)

  3. Then computes:

    • f(β,α)f(\beta, \alpha)

    • q1(Y)=f(Y,α)f(β,α)Yβq_1(Y) = \frac{f(Y, \alpha) - f(\beta, \alpha)}{Y - \beta}

    • π1=[q1(τY)]1\pi_1 = [q_1(\tau_Y)]_1

    Sends (z=f(β,α),πf=(π0,π1))(z = f(\beta, \alpha), \pi_f = (\pi_0, \pi_1)) to V\mathcal{V}


DKZG.Verify(comf,β,α,z,πf,pp)\mathsf{DKZG.Verify}(\mathsf{com}_f, \beta, \alpha, z, \pi_f, \mathsf{pp})

The verifier V\mathcal{V} parses πf=(π0,π1)\pi_f = (\pi_0, \pi_1) and checks:

e(comf[z]1,[1]2)=?e(π0,[τXα]2)e(π1,[τYβ]2)e(\mathsf{com}_f - [z]_1, [1]_2) \stackrel{?}{=} e(\pi_0, [\tau_X - \alpha]_2) \cdot e(\pi_1, [\tau_Y - \beta]_2)

Proof:

  • The exponent in e(comf[z]1,[1]2)e(\mathsf{com}_f - [z]_1, [1]_2) is f(τY,τX)f(β,α)f(\tau_Y, \tau_X) - f(\beta, \alpha)

  • The exponent in e(π0,[τXα]2)e(\pi_0, [\tau_X - \alpha]_2) is:

    i=0M1Ri(τY)(fi(τX)fi(α))=f(τY,τX)f(τY,α)\sum_{i=0}^{M-1} R_i(\tau_Y) \cdot (f_i(\tau_X) - f_i(\alpha)) = f(\tau_Y, \tau_X) - f(\tau_Y, \alpha)
  • The exponent in e(π1,[τYβ]2)e(\pi_1, [\tau_Y - \beta]_2) is:

    f(τY,α)f(β,α)f(\tau_Y, \alpha) - f(\beta, \alpha)

Robust Collaborative Proving System for Data-parallel Circuit

Pianist introduces a new proof model to ensure security even in malicious settings.

This is called the Robust Collaborative Proving System (RCPS) — a protocol that ensures the final proof can still be correctly constructed and verified even when some machines are faulty or adversarial.

📌 For technical details, see Section 5 of the paper.

Conclusion

While standard Plonk (on a single machine) hits a memory limit at 32 transactions, Pianist can scale up to 8192 transactions simply by increasing the number of machines. With 64 machines, Pianist is able to prove 8192 transactions in just 313 seconds. This demonstrates that the number of transactions included in a single proof scales proportionally with the number of machines. Moreover, the time taken by the master node to gather and finalize the proof is just 2–16 ms, which is negligible compared to the overall proving time. This highlights that communication overhead in Pianist's parallel architecture is minimal.


While Figure 1 shows the performance on data-parallel circuits (e.g., zkRollups), Figure 2 extends the evaluation to general circuits with arbitrary subcircuit connections. In all circuit sizes tested, the proving time decreases nearly linearly with the number of machines, demonstrating strong scalability even in general circuits. In smaller circuits, the marginal benefit of scaling beyond 4–8 machines may diminish, but for larger circuits, the benefits of parallelization become increasingly significant. In other words, Pianist performs even better on larger circuits, showcasing a desirable scalability property.


As shown above, the memory usage per machine decreases significantly as the number of machines increases. For small circuits, memory savings are minor since the total memory requirement is already low, but for large circuits, scaling the number of machines directly determines whether proof generation is even feasible.


In conclusion, Pianist transforms the original Plonk proving system into a fully distributed, scalable framework that:

  • Enables parallel proving of large circuits across multiple machines,

  • Maintains constant proof size, verifier time, and communication overhead (all O(1)O(1)), and

  • Offers high practical utility for real-world zk systems like zkRollups and zkEVMs.

References

Written by ryan Kim from A41

Last updated