Sumcheck

Overview

The sumcheck protocol is a cryptographic proof protocol designed to efficiently prove and verify whether the sum of a multivariate polynomial over a specified domain equals a claimed value, given :

  • A multivariate polynomial g(X1,X2,,Xv)g(X_1,X_2, \dots, X_{v})

  • A finite domain BvB^v for any BFB \subseteq \mathbb{F}

  • A claimed sum

C1:=b1Bb2BbvBg(b1,,bv)C_1 := \sum_{b_1 \in B}\sum_{b_2 \in B} \dots \sum_{b_v \in B} g(b_1, \dots, b_v)

or in short C1:=bBvg(b)C_1 := \sum_{b \in B^v}g(b). We denote the true correct sum by HH.

Throughout this article, we assume that B={0,1}B = \{0,1\} which is the case where the evaluation domain is a boolean hypercube. degj(g)\deg_j(g) denotes the degree of gg in variable XjX_j. We denote the prover and the verifier by P\mathcal{P} and V\mathcal{V}.

The goal is for P\mathcal{P} to convince V\mathcal{V} that the claimed C1=HC_1 = H, without having V\mathcal{V} evaluate gg on all {0,1}v\{0,1\}^v elements (which would be computationally expensive).

Protocol

  1. In the first round, P\mathcal{P} sends the univariate polynomial g1(X1)g_1(X_1) and claims it equals the following:

(x2,,xv){0,1}v1g(X1,x2,,xv)\sum_{(x_2, \dots, x_v) \in \{0,1\}^{v-1}}g(X_1, x_2, \dots, x_{v})

Note that g1(0)+g1(1)=C1g_1(0) + g_1(1) = C_1 if P\mathcal{P} is honest.

  1. V\mathcal{V} checks if g1(0)+g1(1)=C1g_1(0) + g_1(1) = C_1. If not, V\mathcal{V} rejects.

  2. V\mathcal{V} checks if deg(g1)deg1(g)\deg(g_1) \le \deg_1(g). If not, V\mathcal{V} rejects.

  3. V\mathcal{V} chooses a random r1Fr_1 \in \mathbb{F} and sends it to P\mathcal{P}. Both V\mathcal{V} and P\mathcal{P} set C2:=g1(r1)C_2 := g_1(r_1).

  4. Now in the jjth round, for 1<j<v1<j<v, P\mathcal{P} sends the univariate polynomial gj(Xj)g_j(X_j) and claims it equals the following:

(xj+1,,xv){0,1}vjg(r1,,rj1,Xj,xj+1,,xv)\sum_{(x_{j+1}, \dots, x_v) \in \{0,1\}^{v-j}}g(r_1,\dots,r_{j-1}, X_j, x_{j+1}, \dots, x_{v})

Note that gj(0)+gj(1)=Cjg_j(0) + g_j(1) = C_j if P\mathcal{P} is honest.

  1. V\mathcal{V} checks if gj(0)+gj(1)=Cjg_j(0) + g_j(1) = C_j. If not, V\mathcal{V} rejects.

  2. V\mathcal{V} checks if deg(gj)degj(g)\deg(g_j) \le \deg_j(g). If not, V\mathcal{V} rejects.

  3. V\mathcal{V} chooses a random rjFr_j \in \mathbb{F} and sends it to P\mathcal{P}. Both V\mathcal{V} and P\mathcal{P} set Cj+1:=gj(rj)C_{j+1} := g_j(r_j).

This process continues iteratively, reducing the dimension at each step until reaching a single variable.

  1. In round vv, P\mathcal{P} sends gv(Xv)g_v(X_v) to V\mathcal{V} claiming it equals:

g(r1,,rv1,Xv)g(r_1, \dots, r_{v-1}, X_v)
  1. V\mathcal{V} checks if gv(0)+gv(1)=Cvg_{v}(0) + g_{v}(1) = C_v. If not, V\mathcal{V} rejects.

  2. V\mathcal{V} chooses a random rvFr_v \in \mathbb{F} and evaluates g(r1,,rv)g(r_1, \dots, r_v) with a single oracle query to gg. V\mathcal{V} checks that gv(rv)=g(r1,,rv)g_v(r_v) = g(r_1, \dots, r_v). If not, V\mathcal{V} rejects.

Completeness

Let si(X)s_i(X) be the polynomial that P\mathcal{P} would have sent in round ii if it were honest :

si(Xi)=(xi+1,,xv){0,1}vig(r1,,ri1,Xi,xi+1,,xv)s_i(X_i) = \sum_{(x_{i+1}, \dots, x_v) \in \{0,1\}^{v-i}}g(r_1,\dots,r_{i-1}, X_i, x_{i+1}, \dots, x_{v})

If C1C_1 and g1(X)g_1(X) that P\mathcal{P} provided in the first round is equal to HH and s1(X)s_1(X), respectively, then g1(0)+g1(1)g_1(0) + g_1(1) is equal to C1C_1. V\mathcal{V} checks if this holds, then randomly samples r1r_1 and sends it to P\mathcal{P}. The only remaining part is merely a recursive invocation of another sumcheck protocol on s1(r1)s_1(r_1) where the claimed value is C2:=g1(r1)C_2 := g_1(r_1). At the bottom of the recursion, which is the last round of the protocol, P\mathcal{P} passes the verification as gv(rv)=sv(rv)=g(r1,,rv)g_v(r_v) = s_v(r_v) = g(r_1, \dots, r_v). Thus, the sumcheck protocol has perfect completeness.

Soundness

What is the soundness error, that is, what is the probability that P\mathcal{P} passes the protocol even though the first value given in the first step is wrong, i.e. C1HC_1 \neq H?

If the following occurs at any round of the protocol, P\mathcal{P} can pass the verification even if the first claimed value is wrong:

  • For the challenge ri$Fr_i \stackrel{\$}{\leftarrow} \mathbb{F} sampled by V\mathcal{V}, luckily gi(ri)=si(ri)g_i(r_i) = s_i(r_i) for the gisig_i \neq s_i that P\mathcal{P} had sent in the previous round.

Once the scenario outlined above occurs, P\mathcal{P} can proceed onto the remaining rounds with a valid transcript to deceive V\mathcal{V}. The soundness error δs\delta_s is bounded by vd/Fvd / |\mathbb{F}|, which can be calculated by the Schwartz-Zippel lemma. In other words, V\mathcal{V} can be convinced that gi(X)=si(X)g_i(X) = s_i(X) if gi(ri)=si(ri)g_i(r_i) = s_i(r_i) for a random sampled rir_i.

Cost

We denote the cost of evaluation oracle query to gg by TT. In reality, TT is equal to the cost to evaluate gg at a single input in Fv\mathbb{F}^v.

Communication cost.

At each round, P\mathcal{P} generates an univariate polynomial gi(X)g_i(X) whose degree is at most degi(g)\deg_i(g). To uniquely satisfy such polynomial, P\mathcal{P} needs to send degi(g)\deg_i(g) evaluations on gg.

Prover cost.

The cost of computing such prescribed messages at each round is equal to evaluating gi(X)g_i(X) at degi(g)\deg_i(g) points. Letting the oracle query cost be TT and assuming degi(g)=O(1)\deg_i(g) = O(1), the prover's runtime cost at round ii is O(2vT)O(2^v \cdot T).

Verifier cost.

Also assuming degi(g)=O(1)\deg_i(g) = O(1), the total verifier cost is dominated by TT; the cost of evaluating gg only once.

Combining with Fiat-Shamir transform, we have successfully reduced a complicated interactive proof into a simple evaluation task for V\mathcal{V}. Unfortunately, in many real world problems, V\mathcal{V} does not have access to a constant cost oracle query to gg, or even a single evaluation is prohibitively expensive.

The key part of sumcheck-based protocols is about how we will grant V\mathcal{V} oracle access. There should be either a way V\mathcal{V} can efficiently evaluate gg, or we should run another protocol for the value g(r)g(r) (refer to polynomial commitment scheme).

Example

#SAT

For a nn-variate boolean formula ϕ\phi, whose size SS is poly(n)\mathsf{poly} (n), calculate

x{0,1}nϕ(x)\sum_{x \in \{0,1\}^n} \phi(x)

The fastest algorithm ever known to solve #SAT\#\mathsf{SAT} is exponential, which means it's not much better than brute-forcing it.

In addition to that, there is no polynomial time deterministic & non-interactive algorithm V\mathcal{V}. This is because even if P\mathcal{P} provides every valid input it knows as a witness, there is no way V\mathcal{V} can check in polynomial time whether there are more valid inputs. #SATNP\#\mathsf{SAT} \notin \mathsf{NP}.

However there exists an interactive proof for this, which enables V\mathcal{V} to verify if the claimed value is correct within polynomial time. #SATIP\#\mathsf{SAT} \in \mathsf{IP}.

This implies that the interactive proof is more powerful than merely proving an NP statement. It is known that NPPSPACE=IP\mathsf{NP} \sube \mathsf{PSPACE} = \mathsf{IP}, where PSPACE\mathsf{PSPACE} denotes the set of all problems that can be solved by Turing machines with polynomial space complexity.

Protocol

Transform ϕ\phi into an arithmetic circuit ψ(x)\psi(x). AND gate can be extended to yzy \cdot z, OR gate to y+zyzy + z - y\cdot z, NOT gate to 1y1-y. Now extend circuit ψ\psi to a nn-variate polynomial gg, which means that x{0,1}ng(x)=x{0,1}nϕ(x)\sum_{x\in\{0,1\}^{n}} g(x) = \sum_{x\in\{0,1\}^{n}} \phi(x).

Degree of gg is poly(S)\mathsf{poly} (S). Run the sumcheck protocol on gg.


References

Written by ryan Kim and Carson Lee from A41

Last updated