GKR

Presentation: https://youtu.be/OTCxQ9qIzDY

Overview

Although interactive proofs such as sumcheck allow the verifier to efficiently check solutions to problems in complexity classes beyond NP, real-world computing entities cannot practically solve large instances of such problems.

Still, interactive proofs can be very useful for problems that are actually solvable in practice (even by the verifier), as long as the verifier's cost is much lower than solving the problem itself and the prover's work is not asymptotically larger than what it takes to merely solve it.

Let \mathcal{C} be a layered circuit with S gates (the circuit size) and depth d. The GKR protocol is important in the sense that:

  • \mathcal{P}-cost is nearly O(S); it is not much more than the cost of evaluating the circuit.

  • \mathcal{V}-cost is O(d \log S); this is even less than the time needed to read the entire circuit.

    • Note: This requires the assumption that the circuit is log-space uniform, which implies that it has a succinct implicit description. That is why \mathcal{V} can verify without reading the entire \mathcal{C}.

  • It functions as a general-purpose protocol: any \mathcal{P} and \mathcal{V} can run an IP for any arithmetic circuit evaluation instance.

The protocol starts with \mathcal{P} sending the claimed output values to \mathcal{V} as the first message.

At a high level, the protocol iterates from the first layer (the output layer) down to the last layer (the input layer), where in each iteration i the goal is to reduce a claim about the values of the gates in layer i to a claim about the values of the gates in layer i+1.

Proceeding all the way down, the initial claim is eventually reduced to a claim about the input values, which are known to \mathcal{V} and can therefore be checked by \mathcal{V} itself. The reduction in each iteration is carried out with the sumcheck protocol.

A circuit is in layered form when every gate takes its inputs from the layer directly below it; in particular, every leaf appears in the last layer (at depth d). A circuit that is not layered can be transformed into a layered one with a size blowup of at most a factor of d.


Protocol details

Let \mathcal{C} be a fan-in-2 layered circuit over a finite field \mathbb{F}, with

  • S : the size of the circuit (the total number of gates)

  • S_i : the number of gates in layer i (each gate in layer i is labeled by 0, \dots, S_i - 1)

  • d : the depth

  • n : the input length (i.e. S_d)

and each gate is either addition or multiplication.

Without loss of generality, let S_i = 2^{k_i}. Let W_i : \{0,1\}^{k_i} \rightarrow \mathbb{F} be the mapping that takes as input the binary representation of a gate label in layer i and outputs the value of that gate.

Define the "wiring predicates" \mathsf{in}_{1,i}, \mathsf{in}_{2,i} : \{0,1\}^{k_i} \rightarrow \{0,1\}^{k_{i+1}}, which take as input a gate label in layer i and output the label of its first (respectively, second) input gate in the layer below.

Lastly, define the "gate switches" \mathsf{add}_i, \mathsf{mult}_i : \{0,1\}^{k_i + 2k_{i+1}} \rightarrow \{0,1\}, which take as input an output-gate label together with two input-gate labels and indicate the gate type. For example, supposing that gate a in layer i is an addition gate,

\mathsf{add}_i(a,b,c) = \begin{cases} 1 & \text{if } (b,c) = (\mathsf{in}_{1,i}(a), \mathsf{in}_{2,i}(a)) \\ 0 & \text{otherwise} \end{cases}

A tilde on top of a function name denotes the multilinear extension of that mapping.

(TODO: Make polynomial extension page and set backlinks to it in pages Spartan, Lasso, etc.)
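To make the multilinear extension concrete, here is a minimal Python sketch (over a toy prime field, with illustrative helper names such as eq and mle_eval that are not from the paper) of evaluating \widetilde{W} at an arbitrary point via \widetilde{W}(z) = \sum_{x \in \{0,1\}^k} \widetilde{\mathsf{eq}}(z, x) \cdot W(x).

```python
# Minimal sketch: evaluating a multilinear extension over a toy prime field.
# Helper names (MOD, eq, mle_eval) are illustrative, not from the GKR paper.
MOD = 5  # toy field F_5

def eq(z, x):
    """Multilinear equality polynomial eq(z, x): equals 1 iff z == x on {0,1}^k."""
    acc = 1
    for zi, xi in zip(z, x):
        acc = acc * (zi * xi + (1 - zi) * (1 - xi)) % MOD
    return acc

def mle_eval(W, z):
    """Evaluate the multilinear extension of W : {0,1}^k -> F at z in F^k.
    W is given as a dict keyed by bit tuples."""
    return sum(eq(z, x) * v for x, v in W.items()) % MOD

# Example: W maps the 2-bit labels of a 4-gate layer to the gate values.
W = {(0, 0): 3, (0, 1): 1, (1, 0): 4, (1, 1): 2}
assert mle_eval(W, (1, 0)) == 4      # agrees with W on Boolean inputs ...
print(mle_eval(W, (2, 3)))           # ... and is also defined on all of F_5^2
```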

Lemma 1

The multilinear extension of the gate-value mapping W_i can be written as follows.

\widetilde{W}_i(z) = \sum_{b,c \in \{0,1\}^{k_{i+1}}} \left[ \widetilde{\mathsf{add}}_i(z,b,c)\left(\widetilde{W}_{i+1}(b) + \widetilde{W}_{i+1}(c)\right) + \widetilde{\mathsf{mult}}_i(z,b,c)\left(\widetilde{W}_{i+1}(b) \cdot \widetilde{W}_{i+1}(c)\right) \right]

Proof.

Both sides clearly agree on \{0,1\}^{k_i}. Since both sides are multilinear in z, and the multilinear extension of a mapping on \{0,1\}^{k_i} is unique, the two sides are identical as polynomials.

Figure: a circuit over \mathbb{F}_5 with only multiplication gates.
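As a sanity check of Lemma 1, here is a small Python sketch of a two-layer circuit over \mathbb{F}_5 with only multiplication gates (so the \widetilde{\mathsf{add}}_i term vanishes); the specific gate values, wiring, and helper names are illustrative assumptions.

```python
# Sanity check of Lemma 1 on a tiny circuit over F_5 with only multiplication
# gates: layer i has 2 gates, layer i+1 has 4 gates.  All names and values
# here are illustrative assumptions.
from itertools import product

MOD = 5

def eq(z, x):
    acc = 1
    for zi, xi in zip(z, x):
        acc = acc * (zi * xi + (1 - zi) * (1 - xi)) % MOD
    return acc

def mle(table, z):
    """Evaluate the multilinear extension of a map {0,1}^k -> F_5 given as a dict."""
    return sum(eq(z, x) * v for x, v in table.items()) % MOD

# Layer i+1 gate values, keyed by 2-bit labels.
W_next = {(0, 0): 2, (0, 1): 3, (1, 0): 1, (1, 1): 4}
# Layer i wiring: gate 0 multiplies gates (0,0) and (0,1); gate 1 multiplies (1,0) and (1,1).
wiring = {(0,): ((0, 0), (0, 1)), (1,): ((1, 0), (1, 1))}
W_cur = {a: (W_next[b] * W_next[c]) % MOD for a, (b, c) in wiring.items()}

# mult_i as a Boolean table on {0,1}^{k_i + 2 k_{i+1}}; add_i is identically zero here.
mult = {a + b + c: (1 if wiring[a] == (b, c) else 0)
        for a in wiring
        for b in product((0, 1), repeat=2)
        for c in product((0, 1), repeat=2)}

def lemma1_rhs(z):
    """Right-hand side of Lemma 1 (the add term vanishes for this circuit)."""
    total = 0
    for b in product((0, 1), repeat=2):
        for c in product((0, 1), repeat=2):
            total += mle(mult, z + b + c) * mle(W_next, b) * mle(W_next, c)
    return total % MOD

# The identity holds at every point of F_5, not just on {0,1}.
for z0 in range(MOD):
    assert mle(W_cur, (z0,)) == lemma1_rhs((z0,))
print("Lemma 1 verified on all of F_5")
```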

Protocol

  1. \mathcal{P} sends a function D : \{0,1\}^{k_0} \rightarrow \mathbb{F} claimed to equal W_0 (the function that maps output gate labels to output values).

  2. \mathcal{V} picks a random r_0 \stackrel{\$}{\leftarrow} \mathbb{F}^{k_0} and lets m_0 \leftarrow \widetilde{D}(r_0). The remainder of the protocol is devoted to confirming that m_0 = \widetilde{W}_0(r_0).

  3. Now iterate for i = 0, 1, \dots, d-1. Each iteration is devoted to reducing the claim on \widetilde{W}_i(r_i) to a claim on \widetilde{W}_{i+1}(r_{i+1}).

    1. Define the (2k_{i+1})-variate polynomial: f_{r_i}^{(i)}(b,c) = \widetilde{\mathsf{add}}_i(r_i, b, c) \left( \widetilde{W}_{i+1}(b) + \widetilde{W}_{i+1}(c) \right) + \widetilde{\mathsf{mult}}_i(r_i, b, c) \left( \widetilde{W}_{i+1}(b) \cdot \widetilde{W}_{i+1}(c) \right)

    2. \mathcal{P}'s claim is equivalent to the statement that \sum_{b, c \in \{0,1\}^{k_{i+1}}} f_{r_i}^{(i)}(b, c) = m_i.

    3. So that \mathcal{V} may check this claim, \mathcal{P} and \mathcal{V} apply the sumcheck protocol to f_{r_i}^{(i)}, up until \mathcal{V}'s final check in that protocol, where \mathcal{V} must evaluate f_{r_i}^{(i)} at a randomly chosen point (b^*, c^*) \in \mathbb{F}^{k_{i+1}} \times \mathbb{F}^{k_{i+1}}.

      1. Note that \mathcal{V} only needs to be able to evaluate \widetilde{\mathsf{add}}_i and \widetilde{\mathsf{mult}}_i by itself, not the entire polynomial f_{r_i}^{(i)} (the two remaining evaluations of \widetilde{W}_{i+1} are handled in the following steps).

    4. Now we need to reduce the two claims on \widetilde{W}_{i+1}(b^*) and \widetilde{W}_{i+1}(c^*) to one claim (a sketch of this step appears after the protocol description). Let \ell be the unique line satisfying \ell(0) = b^* and \ell(1) = c^*. \mathcal{P} sends a univariate polynomial q of degree at most k_{i+1} to \mathcal{V}, claimed to equal \widetilde{W}_{i+1} restricted to \ell.

    5. \mathcal{V} now performs the final check in the sumcheck protocol, using q(0) and q(1) in place of \widetilde{W}_{i+1}(b^*) and \widetilde{W}_{i+1}(c^*).

      1. Note that we assume here that for each layer i, \mathcal{V} can evaluate the multilinear extensions \widetilde{\mathsf{add}}_i and \widetilde{\mathsf{mult}}_i in polylogarithmic time. This is necessary for keeping the verifier cost sublinear in the circuit size S.

    6. \mathcal{V} chooses r^* \in \mathbb{F} at random and sets r_{i+1} \leftarrow \ell(r^*) and m_{i+1} \leftarrow q(r^*).

      1. Note that if q is as claimed, then q(r^*) = \widetilde{W}_{i+1}(\ell(r^*)) = \widetilde{W}_{i+1}(r_{i+1}). The claim has thus been successfully reduced to one about the next layer.

  4. After the final round, \mathcal{V} directly checks m_d = \widetilde{W}_d(r_d). As \mathcal{V} knows the input vector, it can evaluate \widetilde{W}_d at any point in O(n) time, where n = 2^{k_d} is the input length.
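The following is an illustrative Python sketch of steps 3.4–3.6, the reduction of the two evaluation claims on \widetilde{W}_{i+1} to a single one; the toy field, gate values, and helper names are assumptions, and the honest prover is simulated by direct evaluation.

```python
# Illustrative sketch of protocol steps 3.4-3.6 ("two claims to one") over a
# toy prime field.  The field, layer values, and helper names are assumptions.
import random

MOD = 97  # toy prime, larger than the degree bound k_{i+1}

def eq(z, x):
    acc = 1
    for zi, xi in zip(z, x):
        acc = acc * (zi * xi + (1 - zi) * (1 - xi)) % MOD
    return acc

def mle(table, z):
    return sum(eq(z, x) * v for x, v in table.items()) % MOD

def line(b, c, t):
    """The unique line l with l(0) = b and l(1) = c, evaluated at t."""
    return tuple((bi + t * (ci - bi)) % MOD for bi, ci in zip(b, c))

def interpolate(points, t):
    """Lagrange-evaluate at t the low-degree polynomial through the (x, y) points."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (t - xm) % MOD
                den = den * (xj - xm) % MOD
        total = (total + yj * num * pow(den, -1, MOD)) % MOD
    return total

k = 2                                                     # k_{i+1}
W_next = {(0, 0): 7, (0, 1): 11, (1, 0): 5, (1, 1): 20}   # layer i+1 gate values

# The sumcheck leaves V with claims about W~_{i+1} at two random points b*, c*.
b_star = tuple(random.randrange(MOD) for _ in range(k))
c_star = tuple(random.randrange(MOD) for _ in range(k))

# P sends q = W~_{i+1} restricted to the line l; it has degree at most k, so
# k + 1 evaluations determine it (an honest prover is simulated here).
q_points = [(t, mle(W_next, line(b_star, c_star, t))) for t in range(k + 1)]
q = lambda t: interpolate(q_points, t)

# V uses q(0) and q(1) in place of W~_{i+1}(b*) and W~_{i+1}(c*) ...
assert q(0) == mle(W_next, b_star) and q(1) == mle(W_next, c_star)

# ... then samples r*; the single remaining claim is m_{i+1} = W~_{i+1}(r_{i+1}).
r_star = random.randrange(MOD)
r_next = line(b_star, c_star, r_star)
m_next = q(r_star)
assert m_next == mle(W_next, r_next)
print("two evaluation claims reduced to one:", r_next, m_next)
```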

For a more detailed explanation and an analysis of costs and soundness, please refer to Sections 4.1 and 4.2 of the original paper.

In particular, the prover cost is reduced from O(S^3) for a naive application of the sumcheck protocol to O(S^2) for an optimized method that exploits the fact that \widetilde{\mathsf{add}}_i and \widetilde{\mathsf{mult}}_i are sparse and structured and can therefore be evaluated very efficiently. That is, since most of the evaluations are guaranteed to be zero, instead of passing over the entire evaluation domain, the prover passes only over the values that are known to be nonzero. Furthermore, the prover cost can be reduced to O(S) when the computation is data parallel, as shown in the next section.
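One way to see this sparsity: \widetilde{\mathsf{add}}_i can be written as a sum of one \widetilde{\mathsf{eq}}-term per addition gate, so evaluating it takes time proportional to the number of gates in the layer rather than to the size of the full Boolean domain. Below is a hedged Python sketch of this (the toy field, wiring, and helper names are assumptions).

```python
# Sketch of the sparse form of add~_i: one eq-term per addition gate, compared
# against a brute-force sum over the full {0,1}^{k_i + 2 k_{i+1}} hypercube.
# The toy field, the wiring, and the names are illustrative assumptions.
from itertools import product
import random

MOD = 97

def eq(z, x):
    acc = 1
    for zi, xi in zip(z, x):
        acc = acc * (zi * xi + (1 - zi) * (1 - xi)) % MOD
    return acc

# Addition gates of layer i: gate label -> (left input label, right input label).
add_gates = {
    (0, 0): ((0, 0), (0, 1)),
    (1, 1): ((1, 0), (1, 1)),
}

def add_tilde_sparse(z, b, c):
    """Evaluate add~_i(z, b, c) by summing one term per addition gate."""
    return sum(eq(z, a) * eq(b, b_in) * eq(c, c_in)
               for a, (b_in, c_in) in add_gates.items()) % MOD

def add_tilde_dense(z, b, c):
    """Reference: the multilinear extension computed by brute force."""
    total = 0
    for a in product((0, 1), repeat=2):
        for bb in product((0, 1), repeat=2):
            for cc in product((0, 1), repeat=2):
                bit = 1 if add_gates.get(a) == (bb, cc) else 0
                total += bit * eq(z + b + c, a + bb + cc)
    return total % MOD

z, b, c = (tuple(random.randrange(MOD) for _ in range(2)) for _ in range(3))
assert add_tilde_sparse(z, b, c) == add_tilde_dense(z, b, c)
print("sparse and dense evaluations of add~_i agree")
```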

Similarly, in \mathcal{V}'s cost analysis, the cost of evaluating \widetilde{\mathsf{add}}_i and \widetilde{\mathsf{mult}}_i is treated as a lower-order term and omitted. This relies on the regular, repeated wiring patterns that commonly arise in circuits for real-world problems of interest. Even without structured wiring patterns, there is the option of letting the verifier itself commit to \widetilde{\mathsf{add}}_i and \widetilde{\mathsf{mult}}_i (with O(S) preprocessing time) using a polynomial commitment scheme and letting the prover open the commitment whenever needed, which turns the interactive proof into an interactive argument.

Leveraging Data Parallelism

When the circuit is data parallel, that is, when the same sub-computation is applied independently to different pieces of data and the results are then possibly aggregated, the performance of GKR improves greatly. Fortunately, data-parallel computation is pervasive in real-world computing, including ZK-era use cases such as LogUp-GKR, Lasso, Libra, and Ceno.

Let \mathcal{C} be a circuit of size S with an arbitrary wiring pattern, and let \mathcal{C}' be a "super-circuit" of size B \cdot S that applies \mathcal{C} independently to B = 2^b inputs before aggregating the results. Gate labels consist of two indices, one for the position within the layer and one for the copy, e.g. a = (a_1, a_2) \in \{0,1\}^{k_i} \times \{0,1\}^{b}.

Let h denote the polynomial \mathbb{F}^{k_i + b} \rightarrow \mathbb{F} defined by

h(a_1, a_2) := \sum_{b_1, c_1 \in \{0,1\}^{k_{i+1}}} g(a_1, a_2, b_1, c_1)

where

g(a_1, a_2, b_1, c_1) := \widetilde{\mathsf{add}}_i(a_1, b_1, c_1) \left( \widetilde{W}_{i+1}^\prime(b_1, a_2) + \widetilde{W}_{i+1}^\prime(c_1, a_2) \right) + \widetilde{\mathsf{mult}}_i(a_1, b_1, c_1) \cdot \widetilde{W}_{i+1}^\prime(b_1, a_2) \cdot \widetilde{W}_{i+1}^\prime(c_1, a_2)

Then h extends W_i', just as in the single-copy case. However, since h is not multilinear as before, we multi-linearize h and obtain the following representation of the multilinear extension of W_i'. For any z \in \mathbb{F}^{k_i + b},

\widetilde{W}_i^\prime(z) = \sum_{(a_1, a_2, b_1, c_1) \in \{0,1\}^{k_i + b + 2k_{i+1}}} g_z^{(i)}(a_1, a_2, b_1, c_1)

where

g_z^{(i)}(a_1, a_2, b_1, c_1) := \widetilde{\mathsf{eq}}_{k_i + b}(z, (a_1, a_2)) \cdot \left[ \widetilde{\mathsf{add}}_i(a_1, b_1, c_1) \left( \widetilde{W}_{i+1}^\prime(b_1, a_2) + \widetilde{W}_{i+1}^\prime(c_1, a_2) \right) + \widetilde{\mathsf{mult}}_i(a_1, b_1, c_1) \cdot \widetilde{W}_{i+1}^\prime(b_1, a_2) \cdot \widetilde{W}_{i+1}^\prime(c_1, a_2) \right]

Note that because the same circuit structure is repeated across copies, \widetilde{\mathsf{add}}_i and \widetilde{\mathsf{mult}}_i do not depend on a_2, which significantly reduces the cost of evaluating them for both parties and mitigates the bottleneck.

The protocol is the same as the single-copy version, except that at layer i the sumcheck protocol is applied to g_{r_i}^{(i)} instead.
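As a concrete sanity check of the multi-linearized identity above, the following Python sketch takes B = 2 copies of a tiny two-layer sub-circuit (k_i = k_{i+1} = b = 1) and verifies the identity at a random point; the wiring, values, and names are illustrative assumptions.

```python
# Sanity check of the data-parallel identity with B = 2 copies of a tiny
# two-layer sub-circuit (k_i = k_{i+1} = b = 1).  Layout and names are
# illustrative assumptions, not the paper's notation.
from itertools import product
import random

MOD = 97

def eq(z, x):
    acc = 1
    for zi, xi in zip(z, x):
        acc = acc * (zi * xi + (1 - zi) * (1 - xi)) % MOD
    return acc

def mle(table, z):
    return sum(eq(z, x) * v for x, v in table.items()) % MOD

# Per-copy wiring: layer-i gate 0 adds layer-(i+1) gates (0, 1); gate 1 multiplies them.
# Note that add_i / mult_i take only (a1, b1, c1): they do not depend on the copy index a2.
add_i  = {(0, 0, 1): 1}
mult_i = {(1, 0, 1): 1}

# Layer-(i+1) values of the super-circuit, indexed by (b1, a2): gate b1 of copy a2.
W_next = {(0, 0): 3, (1, 0): 7, (0, 1): 10, (1, 1): 4}

# Layer-i values implied by the wiring, indexed by (a1, a2).
W_cur = {}
for a2 in (0, 1):
    W_cur[(0, a2)] = (W_next[(0, a2)] + W_next[(1, a2)]) % MOD
    W_cur[(1, a2)] = (W_next[(0, a2)] * W_next[(1, a2)]) % MOD

def g_z(z, a1, a2, b1, c1):
    """g_z^{(i)} at a Boolean point (where the MLEs agree with the underlying tables)."""
    bracket = (add_i.get((a1, b1, c1), 0) * (W_next[(b1, a2)] + W_next[(c1, a2)])
               + mult_i.get((a1, b1, c1), 0) * W_next[(b1, a2)] * W_next[(c1, a2)])
    return eq(z, (a1, a2)) * bracket % MOD

# Check W~'_i(z) = sum over the hypercube of g_z, at a random z in F^{k_i + b}.
z = (random.randrange(MOD), random.randrange(MOD))
rhs = sum(g_z(z, a1, a2, b1, c1)
          for a1, a2, b1, c1 in product((0, 1), repeat=4)) % MOD
assert mle(W_cur, z) == rhs
print("data-parallel identity verified at a random point")
```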

See Section 5 of the paper for a detailed cost analysis.

