Lasso

Lasso by Srinath Setty, Justin Thaler and Riad Wahby


Overview

This paper introduces $\mathsf{Lasso}$, a lookup argument that allows an untrusted prover to commit to a vector $a \in \mathbb{F}^m$ and prove that all entries of $a$ reside in some predetermined table $t \in \mathbb{F}^n$.

$\mathsf{Lasso}$'s performance characteristics unlock the so-called "lookup singularity".

A key difference between $\mathsf{Lasso}$ and other lookup arguments is that, if the table $t$ is structured (as we will explain later), then no party needs to commit to $t$, enabling the use of enormous tables (e.g., of size $2^{128}$ or larger). The most common use cases of lookup arguments, such as range checks, bitwise operations, and big-number arithmetic, benefit enormously in cost from this characteristic of $\mathsf{Lasso}$.

Lookup arguments.

Imagine that a prover wishes to establish that at no point in a program's execution did any integer ever exceed $2^{128}$.

A naive approach to accomplish this inside a circuit-SAT instance is to have the circuit take 128 field elements as advice inputs, which constitute the binary representation of $x$. The circuit checks that all 128 advice elements $b_i$ are in $\{0,1\}$ and that they indeed form the binary representation of $x$, i.e., $x = \sum^{127}_{i=0} 2^i \cdot b_i$. A single overflow check thus turns into at least 129 constraints and an additional 128 field elements in the witness.
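To make the constraint count concrete, here is a minimal Python sketch (my own illustration, not from the paper) of what the naive circuit enforces: 128 booleanity constraints plus one recomposition constraint.

```python
# Naive range check: x < 2^128 is enforced by 128 boolean advice values b_i
# together with the recomposition constraint x = sum_i 2^i * b_i.
def naive_range_check(x: int, bits: list) -> bool:
    assert len(bits) == 128
    booleanity = all(b * (b - 1) == 0 for b in bits)                     # 128 constraints
    recomposition = x == sum(b * (1 << i) for i, b in enumerate(bits))   # 1 constraint
    return booleanity and recomposition

x = 1234567890
assert naive_range_check(x, [(x >> i) & 1 for i in range(128)])
```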

Lookup tables offer a better approach. The verifier initializes a lookup table containing all integers between $0$ and $2^{128} - 1$. The overflow check above then amounts to simply confirming that $x$ is in the table. Of course, a table of size $2^{128}$ is far too large to be materialized, even by the prover. This paper is devoted to describing techniques that enable such a table lookup without either party ever explicitly materializing the table. Indeed, the table in this example is highly structured in a way that the techniques introduced below can exploit.

$\mathsf{Lasso}$'s starting point is $\mathsf{Spark}$, an optimal polynomial commitment scheme for sparse multilinear polynomials from Spartan. Just like Spartan, $\mathsf{Lasso}$ can be instantiated with any multilinear PCS. In particular, for SNARKs that have the prover commit to the witness using a multilinear PCS, $\mathsf{Lasso}$ can be applied seamlessly to provide a lookup argument for either R1CS or Plonkish circuit satisfiability.

Jolt and Lasso.

$\mathsf{Jolt}$ is a RISC-V zkVM that achieves substantial improvements in front-end design by exploiting $\mathsf{Lasso}$. $\mathsf{Jolt}$ replaces the instruction-execution part of a zkVM circuit with $\mathsf{Lasso}$, looking each instruction up in a gigantic evaluation table rather than evaluating it in a circuit. For each of the primitive RISC-V instructions, the resulting evaluation table has the structure required to apply $\mathsf{Lasso}$, yielding a significant reduction in cost. Thaler states that $\mathsf{Lasso}$ and $\mathsf{Jolt}$ together essentially achieve a vision outlined by Barry WhiteHat called the lookup singularity: transforming arbitrary computer programs into circuits that only perform lookups.


Preliminaries

As these were already covered in previous presentations, I skip the preliminaries on multilinear extensions, SNARKs, polynomial commitment schemes, polynomial IOPs, and the sumcheck protocol.

Sparsity.

A multilinear polynomial $g: \mathbb{F}^l \rightarrow \mathbb{F}$ in $l$ variables is $m$-sparse if $g(x) \neq 0$ for at most $m$ values of $x \in \{0,1\}^l$. In other words, $g$ has at most $m$ non-zero coefficients in the multilinear Lagrange polynomial basis. If $m \ll 2^l$, only a tiny fraction of the possible coefficients are non-zero. If $m = \Theta(2^l)$, we refer to such $g$ as a dense polynomial.

Dense representation of multilinear polynomials.

The fact that the MLE of a function is unique offers the following method to represent any multilinear polynomial, which is briefer than its point-value or coefficient representation.

A multilinear polynomial $g: \mathbb{F}^l \rightarrow \mathbb{F}$ can be represented uniquely by the list of tuples $L$ such that for all $i \in \{0,1\}^l$, $(\mathsf{to\text{-}field}(i), g(i)) \in L$ if and only if $g(i) \neq 0$.

Here, $\mathsf{to\text{-}field}$ is the canonical injection from $\{0,1\}^l$ to $\mathbb{F}$, and $\mathsf{to\text{-}bits}$ is its inverse mapping. You can think of them as conversions between a field element and its binary representation.
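A tiny sketch of these two maps (hypothetical helpers for $l = 4$ variables; the MSB-first bit order is an arbitrary choice of mine):

```python
# to_field maps a bit string in {0,1}^l to the field element (here: integer)
# it represents; to_bits is its inverse on that range.
l = 4

def to_field(bits):
    return sum(b << (l - 1 - j) for j, b in enumerate(bits))   # MSB first

def to_bits(x):
    return tuple((x >> (l - 1 - j)) & 1 for j in range(l))

assert to_field((1, 0, 1, 1)) == 11
assert to_bits(11) == (1, 0, 1, 1)
```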

We refer to $L$ as the dense representation of a sparse polynomial $g$, written $\mathsf{DenseRepr}(g)$.

Definition 1.

A multilinear polynomial $g$ in $l$ variables is a sparse multilinear polynomial if $|\mathsf{DenseRepr}(g)|$ is sub-linear in $\Theta(2^l)$. Otherwise, it is a dense multilinear polynomial.


Technical overview

For a lookup argument from $a \in \mathbb{F}^m$ into $t \in \mathbb{F}^n$, it suffices for the prover to show that it knows a sparse matrix $M \in \mathbb{F}^{m \times n}$ such that (1) in each row of $M$, exactly one cell has value 1 and the rest are zero, and (2) $M \cdot t = a$.

Below is a simple example demonstrating that $a = [2, 3, 0]$ is a valid lookup into the table $t = [0, 1, 2, 3]$. Note that $M$ is indeed sparse.

$$\begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix} \cdot \begin{bmatrix} 0 \\ 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \\ 0 \end{bmatrix}$$
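A minimal Python sketch of this check (illustrative only): every row of $M$ must be one-hot, and $M \cdot t$ must reproduce $a$.

```python
t = [0, 1, 2, 3]                         # the lookup table
a = [2, 3, 0]                            # the claimed lookups
M = [[0, 0, 1, 0],
     [0, 0, 0, 1],
     [1, 0, 0, 0]]

# (1) each row of M is one-hot, (2) M * t = a
assert all(sorted(row) == [0] * (len(t) - 1) + [1] for row in M)
assert [sum(mij * tj for mij, tj in zip(row, t)) for row in M] == a
```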

Proving the matrix-vector multiplication above is equivalent to confirming that

Equation (1):

$$\sum_{y \in \{0,1\}^{\log n}} \widetilde{M}(r, y) \cdot \tilde{t}(y) = \tilde{a}(r)$$

for an $r \in \mathbb{F}^{\log m}$ randomly sampled by the verifier.

To prove this equation with the sumcheck protocol, the verifier must be able to evaluate $\widetilde{M}(r, y) \cdot \tilde{t}(y)$ at a random point $y$ succinctly, i.e., with cost independent of $n$, given that the lookup table may be very large. Clearly, we cannot achieve this by committing to $\widetilde{M}$ and $\tilde{t}$ with an ordinary PCS.

$M$ is a sparse matrix in which only $m$ of the $m \times n$ entries are non-zero. Hence, the claim above is efficiently provable by having the prover commit to the sparse polynomial $\widetilde{M}$ using $\mathsf{Surge}$, a generalization of $\mathsf{Spark}$ developed for $\mathsf{Lasso}$, and, instead of committing to the enormous $t$, sending only its succinct description, assuming $t$ is structured in the sense defined below. Once we have such a commitment scheme, we immediately obtain the desired lookup argument.


Spark

$\mathsf{Spark}$, as explained above, is a polynomial commitment scheme for sparse polynomials. Let $g$ be a ($\log N$)-variate multilinear polynomial with sparsity $m$. The prover commits to the unique dense representation of $g$, i.e., the list that specifies only the multilinear Lagrange basis polynomials with non-zero coefficients. When the verifier requests an evaluation $g(r)$, the prover returns the claimed evaluation $v$ along with an evaluation proof.

Let $c$ be such that $N = m^c$, without loss of generality. There is a simple algorithm that takes as input the dense representation of $g$ and outputs $g(r)$ in $O(c \cdot m)$ time.

Algorithm 1.

Decompose the $\log N = c \log m$ variables of $r$ into $c$ blocks of size $\log m$ each, writing $r = (r_1, ..., r_c) \in (\mathbb{F}^{\log m})^c$. Then any ($\log N$)-variate Lagrange basis polynomial evaluated at $r$ can be expressed as a product of $c$ smaller Lagrange basis polynomials. After an $O(c \cdot m)$ pre-computation that fills a write-once memory, each basis polynomial can be evaluated with $c$ memory lookups plus a few extra field operations.

Example.

Let $N = 2^6$, $c = 3$ and $m = 2^2$.

Let $g(x) = 1$ for $x = 000001, ..., 000100$, and $g(x) = 0$ for all other $x \in \{0,1\}^6$. Subscripts 1, 2, and 3 denote the three 2-bit blocks of the 6-bit binary string. $\mathcal{X}_i(r)$ denotes $\widetilde{\mathsf{eq}}(i, r)$.

Observe that the evaluation of $g$ at some $r$ can be decomposed as follows:

$$g(r) = \sum^{4}_{i=1} 1 \cdot \mathcal{X}_{\mathsf{to\text{-}bits}(i)}(r) = \sum^4_{i=1} 1\cdot \left(\mathcal{X}_{\mathsf{to\text{-}bits}(i)_{1}}(r_{1}) \cdot \mathcal{X}_{\mathsf{to\text{-}bits}(i)_{2}}(r_{2}) \cdot \mathcal{X}_{\mathsf{to\text{-}bits}(i)_{3}}(r_{3})\right)$$

The memory is composed of $c$ $(= 3)$ subtables $t_1, t_2$ and $t_3$, where each $t_i$ is a table of size $m$ populated by the values $\mathcal{X}_{0,0}(r_i)$ through $\mathcal{X}_{1,1}(r_i)$. Each subtable can be computed by the prover in $O(m)$ time, leading to a total computation cost of $O(c \cdot m)$.

| $t_1$ | $t_2$ | $t_3$ |
| --- | --- | --- |
| $\mathcal{X}_{0,0}(r_1)$ | $\mathcal{X}_{0,0}(r_2)$ | $\mathcal{X}_{0,0}(r_3)$ |
| $\mathcal{X}_{0,1}(r_1)$ | $\mathcal{X}_{0,1}(r_2)$ | $\mathcal{X}_{0,1}(r_3)$ |
| $\mathcal{X}_{1,0}(r_1)$ | $\mathcal{X}_{1,0}(r_2)$ | $\mathcal{X}_{1,0}(r_3)$ |
| $\mathcal{X}_{1,1}(r_1)$ | $\mathcal{X}_{1,1}(r_2)$ | $\mathcal{X}_{1,1}(r_3)$ |

Notice that this is cheaper by a factor of $\log m$ than the naive approach, whose cost is $O(m \log N)$ since there are $m$ Lagrange basis polynomials with $\log N$ variables each.
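Here is a small Python sketch of Algorithm 1 on this exact example (a toy prime field and illustrative names; my own illustration, not code from the paper). It precomputes the three $\mathcal{X}$-tables of size $m = 4$, spends $c = 3$ lookups per non-zero entry, and checks the result against the naive evaluation.

```python
import random

p = 2**61 - 1                                    # toy prime field

def chi(bits, r):                                # multilinear Lagrange basis eq~(bits, r)
    out = 1
    for b, rj in zip(bits, r):
        out = out * (rj if b else (1 - rj)) % p
    return out

def eval_sparse(dense_repr, r, c, log_m):
    blocks = [r[i * log_m:(i + 1) * log_m] for i in range(c)]
    # write-once memory: one chi-table of size m per block of r (O(c*m) work)
    tables = [[chi([(b >> (log_m - 1 - j)) & 1 for j in range(log_m)], blk)
               for b in range(2 ** log_m)] for blk in blocks]
    acc = 0
    for idx_bits, val in dense_repr:             # one pass over the m non-zero entries
        term = val
        for i in range(c):                       # c memory lookups per entry
            chunk = idx_bits[i * log_m:(i + 1) * log_m]
            term = term * tables[i][int("".join(map(str, chunk)), 2)] % p
        acc = (acc + term) % p
    return acc

# the example above: N = 2^6, c = 3, m = 2^2, g = 1 on x = 000001, ..., 000100
dense = [([0, 0, 0, 0, 0, 1], 1), ([0, 0, 0, 0, 1, 0], 1),
         ([0, 0, 0, 0, 1, 1], 1), ([0, 0, 0, 1, 0, 0], 1)]
r = [random.randrange(p) for _ in range(6)]
naive = sum(val * chi(bits, r) for bits, val in dense) % p   # O(m * log N) work
assert eval_sparse(dense, r, c=3, log_m=2) == naive
```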

The efficiency of the algorithm above comes from the fact that $g$ is sparse. $\mathsf{Spark}$ is essentially a SNARK proving that the prover correctly ran this algorithm on the committed description of $g$. For simplicity, the explanation below treats the special case $c = 2$ and then expands it to the general result.


Detailed explanation

Let $D$ be an $m$-sparse ($2\log m$)-variate multilinear polynomial, as we have fixed $c = 2$. For now, you can think of $D$ as an $m \times m$ matrix with only $m$ non-zero entries.

Given its dense representation, one can express its evaluation as below, by decomposing each ($2\log m$)-variate Lagrange basis polynomial into two ($\log m$)-variate Lagrange basis polynomials.

$$D(r_x, r_y) = \sum_{(i,j)\in\{0,1\}^{\log m} \times \{0,1\}^{\log m} : D(i,j) \neq 0} D(i,j) \cdot \widetilde{\mathsf{eq}}(i, r_x) \cdot \widetilde{\mathsf{eq}}(j, r_y)$$

Let's introduce three ($\log m$)-variate multilinear polynomials $\mathsf{val}$, $\mathsf{row}$ and $\mathsf{col}$, which extend the dense representation of $D$. That is, $k \in \{0,1\}^{\log m}$ iterates over the $m$ non-zero entries of $D$ in some canonical order, and for each $k$ there exist $i, j$ such that $\mathsf{row}(k) = \mathsf{to\text{-}field}(i)$, $\mathsf{col}(k) = \mathsf{to\text{-}field}(j)$ and $\mathsf{val}(k) = D(i, j)$.

$\mathsf{val}$ maps $k$ to the value of each non-zero entry, while $\mathsf{row}$ and $\mathsf{col}$ map $k$ to its location in $D$ (or, equivalently, to the indices into the two subtables of Algorithm 1).

The equation above can then be simplified to:

Equation (2):

$$D(r_x, r_y) = \sum_{k\in\{0,1\}^{\log m}} \mathsf{val}(k) \cdot \widetilde{\mathsf{eq}}(\mathsf{to\text{-}bits}(\mathsf{row}(k)), r_x) \cdot \widetilde{\mathsf{eq}}(\mathsf{to\text{-}bits}(\mathsf{col}(k)), r_y)$$

Let $E_{rx}(k) = \widetilde{\mathsf{eq}}(\mathsf{to\text{-}bits}(\mathsf{row}(k)), r_x)$ and $E_{ry}(k) = \widetilde{\mathsf{eq}}(\mathsf{to\text{-}bits}(\mathsf{col}(k)), r_y)$, each a ($\log m$)-variate multilinear polynomial. We define a PIOP for proving the evaluation of $D$ at $(r_x, r_y)$, assuming $\mathsf{val}$ has been committed to beforehand.


  1. $\mathcal{P} \rightarrow \mathcal{V}$ : send $E_{rx}(k)$ and $E_{ry}(k)$ as oracles.

  2. $\mathcal{P} \leftrightarrow \mathcal{V}$ : run the sumcheck reduction to reduce the statement

$$v = \sum_{k\in\{0,1\}^{\log m}} \mathsf{val}(k) \cdot E_{rx}(k) \cdot E_{ry}(k)$$

to the following checks:

  • $\mathsf{val}(r_z) \stackrel{?}{=} v_{\mathsf{val}}$

  • $E_{rx}(r_z) \stackrel{?}{=} v_{E_{rx}}$ and $E_{ry}(r_z) \stackrel{?}{=} v_{E_{ry}}$,

where the claimed values are provided by $\mathcal{P}$ and $r_z$ is sampled at random by $\mathcal{V}$ at the end of the sumcheck protocol.

  3. $\mathcal{V}$ : check that the three equalities above hold with one oracle query to each of the polynomials.
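To make the objects in this PIOP concrete, here is a small Python sketch (toy field, illustrative data, my own illustration) that builds $E_{rx}$ and $E_{ry}$ from the dense representation $(\mathsf{row}, \mathsf{col}, \mathsf{val})$ and checks that the sum the sumcheck protocol is run on equals $D(r_x, r_y)$ evaluated directly from its non-zero entries.

```python
import random

p = 2**61 - 1
log_m = 2
m = 2 ** log_m

def eq(bits, r):                                 # eq~(bits, r) for a boolean index
    out = 1
    for b, rj in zip(bits, r):
        out = out * (rj if b else (1 - rj)) % p
    return out

def to_bits(x):
    return [(x >> (log_m - 1 - j)) & 1 for j in range(log_m)]

# dense representation of an m-sparse (2 log m)-variate polynomial D (made-up values)
row, col, val = [0, 1, 2, 3], [2, 3, 0, 0], [5, 7, 1, 9]

r_x = [random.randrange(p) for _ in range(log_m)]
r_y = [random.randrange(p) for _ in range(log_m)]

# the prover's auxiliary polynomials, as tables over {0,1}^{log m}
E_rx = [eq(to_bits(row[k]), r_x) for k in range(m)]
E_ry = [eq(to_bits(col[k]), r_y) for k in range(m)]

# the quantity the sumcheck is applied to ...
v = sum(val[k] * E_rx[k] * E_ry[k] for k in range(m)) % p
# ... equals D(r_x, r_y) evaluated directly from the non-zero entries
D = sum(val[k] * eq(to_bits(row[k]) + to_bits(col[k]), r_x + r_y) for k in range(m)) % p
assert v == D
```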


A key step in making the PIOP above sound is enabling the verifier to check that the committed $E_{rx}$ and $E_{ry}$ are built correctly, i.e.,

  • $\forall k \in \{0,1\}^{\log m},\ E_{rx}(k) = \widetilde{\mathsf{eq}}(\mathsf{to\text{-}bits}(\mathsf{row}(k)), r_x)$

  • $\forall k \in \{0,1\}^{\log m},\ E_{ry}(k) = \widetilde{\mathsf{eq}}(\mathsf{to\text{-}bits}(\mathsf{col}(k)), r_y)$.

Checking the above is equivalent to checking that an untrusted prover correctly ran Algorithm 1 (in the case $c = 2$, where the two subtables correspond to $E_{rx}$ and $E_{ry}$), i.e., that the prover initialized the memory with the correct values and that all subsequent memory reads were performed correctly. We introduce offline memory checking, a protocol that reduces such a memory-consistency argument to a multiset equality check.


Offline memory checking

Offline memory checking is a protocol between an untrusted memory and a trusted checker, played by the prover and the verifier, respectively, in the context of our protocol.

The checker maintains two local sets: $\mathsf{RS}$ and $\mathsf{WS}$. $\mathsf{RS}$ is initially empty. For a memory of size $\mathsf{M}$ (in our setting $\mathsf{M} = N^{1/c}$), $\mathsf{WS}$ is initialized as follows: for every address $i \in [\mathsf{M}]$, the tuple $(i, v_i, 0)$ is included. Here, the first element is the address, the second is the value, and the third is the access count of the memory slot.

For each read operation at address $a$, suppose that the untrusted memory responds with a value-count pair $(v, t)$. Then the checker updates its local state as follows:

  1. $\mathsf{RS} \leftarrow \mathsf{RS} \cup \{(a, v, t)\}$,

  2. store $(v, t+1)$ at address $a$ in the untrusted memory, and

  3. $\mathsf{WS} \leftarrow \mathsf{WS} \cup \{(a, v, t+1)\}$.

Claim 1.

If, for every read operation, the untrusted memory returns the tuple last written to that location, then there exists a set $\mathsf{S}$ of cardinality $\mathsf{M}$, consisting of tuples of the form $(k, v_k, t_k)$ for all $k \in [\mathsf{M}]$, such that $\mathsf{WS} = \mathsf{RS} \cup \mathsf{S}$; here $\mathsf{S}$ is simply the final memory view. Conversely, if the untrusted memory ever returns, for a read at some $k \in [\mathsf{M}]$, a value $v$ that does not equal the value initially written to cell $k$, then there does not exist any set $\mathsf{S}$ such that $\mathsf{WS} = \mathsf{RS} \cup \mathsf{S}$.


I skip the proof of the claim, but the example below demonstrates how the forward direction holds. Three memory slots with addresses 1, 2 and 3 are initialized with the values 100, 101 and 102, and three read operations are performed at addresses 2, 2 and 3.

| Step | $\mathsf{RS}$ | $\mathsf{WS}$ |
| --- | --- | --- |
| Initialization | (empty) | (1, 100, 0), (2, 101, 0), (3, 102, 0) |
| Read address 2 | (2, 101, 0) | (1, 100, 0), (2, 101, 0), (2, 101, 1), (3, 102, 0) |
| Read address 2 | (2, 101, 0), (2, 101, 1) | (1, 100, 0), (2, 101, 0), (2, 101, 1), (2, 101, 2), (3, 102, 0) |
| Read address 3 | (2, 101, 0), (2, 101, 1), (3, 102, 0) | (1, 100, 0), (2, 101, 0), (2, 101, 1), (2, 101, 2), (3, 102, 0), (3, 102, 1) |

After the three reads, the final memory view is $\mathsf{S} = \{(1, 100, 0), (2, 101, 2), (3, 102, 1)\}$.

You can see that there exists an $\mathsf{S}$ such that $\mathsf{WS} = \mathsf{RS} \cup \mathsf{S}$, and that this $\mathsf{S}$ is exactly the final memory view.
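The bookkeeping in this example is easy to reproduce in code. A small Python simulation (my own illustration) of an honest untrusted memory, checking the multiset identity $\mathsf{WS} = \mathsf{RS} \cup \mathsf{S}$:

```python
from collections import Counter

memory = {1: (100, 0), 2: (101, 0), 3: (102, 0)}           # address -> (value, count)
WS = Counter((a, v, t) for a, (v, t) in memory.items())     # initialization writes
RS = Counter()

for addr in [2, 2, 3]:                                      # the three reads
    v, t = memory[addr]               # honest memory returns the last-written tuple
    RS[(addr, v, t)] += 1
    memory[addr] = (v, t + 1)         # write back with an incremented count
    WS[(addr, v, t + 1)] += 1

S = Counter((a, v, t) for a, (v, t) in memory.items())      # final memory view
assert WS == RS + S                   # WS = RS ∪ S as multisets
```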


Now we define the counter polynomials. Given a memory of size $\mathsf{M}$ and the list of $m$ addresses involved in the read operations, one can compute two vectors $C_r \in \mathbb{F}^m$ and $C_f \in \mathbb{F}^{\mathsf{M}}$, where $C_r[k]$ stores the 'correct' count that would have been returned for the $k$-th read operation and $C_f[j]$ stores the 'correct' final count stored at memory location $j$ after the $m$ read operations. Computing these requires $O(m)$ operations over $\mathbb{F}$.

Let $\textsf{read\_ts} = \widetilde{C_r}$, $\textsf{write\_cts} = \widetilde{C_r} + 1$, and $\textsf{final\_cts} = \widetilde{C_f}$. We refer to these polynomials as counter polynomials.
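A minimal sketch of how $C_r$ and $C_f$ are derived from the list of read addresses (illustrative; addresses are 0-indexed here):

```python
def counter_vectors(mem_size, addresses):
    C_f = [0] * mem_size          # final count per memory cell
    C_r = []                      # count returned by each read
    for addr in addresses:
        C_r.append(C_f[addr])     # the 'correct' count for this read
        C_f[addr] += 1
    return C_r, C_f

# the example above, with addresses shifted to 0, 1, 2: reads at 1, 1, 2
C_r, C_f = counter_vectors(3, [1, 1, 2])
assert C_r == [0, 1, 0] and C_f == [0, 2, 1]
# read_ts, write_cts = read_ts + 1, and final_cts are the MLEs of these vectors
```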

The evaluation proof

Claim 2.

Regarding a ($2 \log \mathsf{M}$)-variate multilinear polynomial $D$, suppose that ($\mathsf{row}, \mathsf{col}, \mathsf{val}$) is committed in advance, and

$$E_{rx}, E_{ry}, \textsf{read\_ts}_{\mathsf{row}}, \textsf{final\_cts}_{\mathsf{row}}, \textsf{read\_ts}_{\mathsf{col}}, \textsf{final\_cts}_{\mathsf{col}}$$

are the polynomials sent by the prover at the beginning of the evaluation proof. Notice that the verifier can compute $\textsf{write\_cts}_{\mathsf{row}}$ and $\textsf{write\_cts}_{\mathsf{col}}$ on its own.

If $E_{rx}$ is correctly computed, i.e., $\forall k \in \{0,1\}^{\log m},\ E_{rx}(k) = \widetilde{\mathsf{eq}}(\mathsf{to\text{-}bits}(\mathsf{row}(k)), r_x)$,

then the following holds: $\mathsf{WS} = \mathsf{RS} \cup \mathsf{S}$, where

  • $\mathsf{WS} = \{(\textsf{to-field}(i), \widetilde{\mathsf{eq}}(i, r_x), 0): i \in \{0,1\}^{\log \mathsf{M}} \} \cup \{ (\mathsf{row}(k), E_{rx}(k), \textsf{write\_cts}_{\mathsf{row}}(k)) : k \in \{0,1\}^{\log m} \}$;

  • $\mathsf{RS} = \{(\mathsf{row}(k), E_{rx}(k), \textsf{read\_ts}_{\mathsf{row}}(k)) : k \in \{0,1\}^{\log m}\}$; and

  • $\mathsf{S} = \{(\textsf{to-field}(i), \widetilde{\mathsf{eq}}(i, r_x), \textsf{final\_cts}_{\textsf{row}}(i)): i \in \{0,1\}^{\log \mathsf{M}} \}$.

The high-level takeaway from the memory checking above is this: treating $E_{rx}$ as the values produced by an untrusted memory, the checker (verifier) initializes the memory with $\widetilde{\mathsf{eq}}(i, r_x)$, the expected 'correct' values, and then, by checking that the memory lookups producing each $E_{rx}(k)$ were performed correctly, the verifier is convinced that the provided $E_{rx}$ is correct. The same logic applies to $E_{ry}$.

In reality, the verifier does not perform this check manually. Instead, to check that $\mathcal{H}_{\tau, \gamma}(\mathsf{WS}) = \mathcal{H}_{\tau, \gamma}(\mathsf{RS}) \cdot \mathcal{H}_{\tau, \gamma}(\mathsf{S})$, the two parties run a grand product argument: a multiset equality check performed via a GKR protocol whose circuit is simply a binary tree of multiplication gates. Since running the GKR protocol reduces to a single evaluation of the input vector at a random point sampled by the verifier, and the verifier effectively has oracle access to all of the required polynomials (assuming they are committed by the prover, or can be evaluated by the verifier itself), the memory check above can be carried out efficiently with a few evaluation proofs. For a more detailed explanation of the grand product argument, refer to Appendix E of the paper.
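The multiset hash $\mathcal{H}_{\tau, \gamma}$ is a grand product of per-tuple fingerprints. The exact fingerprint is given in the paper's appendix; the sketch below uses one standard Reed-Solomon-style choice (my assumption, for illustration) just to show why the product check certifies $\mathsf{WS} = \mathsf{RS} \cup \mathsf{S}$:

```python
import random

p = 2**61 - 1
gamma, tau = random.randrange(p), random.randrange(p)

def H(multiset):                                   # grand product of tuple fingerprints
    out = 1
    for a, v, t in multiset:
        out = out * ((a * gamma * gamma + v * gamma + t - tau) % p) % p
    return out

# the multisets from the offline memory checking example above
RS = [(2, 101, 0), (2, 101, 1), (3, 102, 0)]
S  = [(1, 100, 0), (2, 101, 2), (3, 102, 1)]
WS = [(1, 100, 0), (2, 101, 0), (2, 101, 1), (2, 101, 2), (3, 102, 0), (3, 102, 1)]

assert H(WS) == H(RS) * H(S) % p                   # holds because WS = RS ∪ S
```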


Generalization and Specialization

We will generalize the protocol above, $\mathsf{Spark}$, so that it supports an arbitrary value of $c$ rather than the fixed value 2, which means the gigantic lookup table can be decomposed into even smaller subtables of size $N^{1/c}$.

Subsequently, we will specialize $\mathsf{Spark}$ for $\mathsf{Lasso}$, with some cost optimizations that exploit the properties of lookup arguments. Eventually, we will further generalize it by relaxing the conditions that make a table 'structured' enough for $\mathsf{Lasso}$ to be applied. With all of these modifications, we obtain $\mathsf{Surge}$, the sparse polynomial commitment scheme used in $\mathsf{Lasso}$, and its final PIOP.

  1. Supporting $\log N = c \log m$ variables, rather than $2 \log m$.

$c$ can be interpreted as the number of subtables resulting from the decomposition of the original table of size $N$. We can simply factor $\widetilde{\mathsf{eq}}_{c \log m}$ into a product of $c$ terms, instead of two, each of which is a ($\log m$)-variate multilinear Lagrange basis polynomial. Hence, from this point on we write $\mathsf{dim}_i(x)$ for $i \in [c]$ to denote the injection from the boolean hypercube to an index into the $i$-th subtable, instead of $\mathsf{row}(x)$ and $\mathsf{col}(x)$.

  2. Specializing for Lasso.

As we saw in Equation (1), the sparse $m \times N$ matrix $M$ that we want to commit to can only have 1 as its non-zero values. This means that the polynomial $\mathsf{val}(x)$ is identically 1, so committing to it is unnecessary, saving one committed polynomial per lookup argument.

Furthermore, if we let $c = 2$ and sample $r_x$ from $\mathbb{F}^{\log m}$ and $r_y$ from $\mathbb{F}^{\log N}$ (so that $r_x$ and $r_y$ correspond to the row and column indices of the matrix $M$), then since each row has exactly one non-zero entry, $\mathsf{to\text{-}bits}(\mathsf{row}(k))$ is simply $k$. Hence $E_{rx}(k) = \widetilde{\mathsf{eq}}(k, r_x)$ can be evaluated by the verifier on its own, so there is no need to commit to it or prove its well-formedness.

$$\begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix} \cdot \begin{bmatrix} 0 \\ 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \\ 0 \\ 0 \end{bmatrix}$$

$M$ is a $4 \times 4$ matrix with sparsity 4. Let $k$ be a two-bit binary string indexing the 4 non-zero entries, i.e., $k = 00, 01, 10, 11$. The table below describes the mappings $\mathsf{row}$ and $\mathsf{col}$. It is clear that $\mathsf{row}(k)$ simply iterates from 0 to 3, which is in effect equal to $k$, because there is exactly one non-zero entry per row. Therefore it is unnecessary to commit to the polynomial $E_{rx}(k)$, just as with $\mathsf{val}(k)$. Check the table below.

| $k$ | $\mathsf{row}(k)$ | $\mathsf{col}(k)$ | $\mathsf{val}(k)$ |
| --- | --- | --- | --- |
| 00 | 0 | 2 | 1 |
| 01 | 1 | 3 | 1 |
| 10 | 2 | 0 | 1 |
| 11 | 3 | 0 | 1 |

  3. Generalizing the structure of lookup tables.

We redefine the term 'structured' to capture more comprehensively the properties a lookup table may have. If (1) a lookup table $t$ can be expressed as a tensor product of $c \geq 2$ smaller tables (i.e., it is decomposable), and (2) the multilinear extension of each subtable can be evaluated quickly (i.e., it is MLE-structured), then we regard $t$ as structured and $\mathsf{Lasso}$ can be applied.

Suppose that there is an integer $k \geq 1$ and $\alpha = k \cdot c$ tables $T_1, ..., T_{\alpha}$, each of size $N^{1/c}$, along with an $\alpha$-variate multilinear polynomial $g$ such that the following holds:

$$T[r] = g(T_1[r_1], ..., T_k[r_1], T_{k+1}[r_2], ..., T_{2k}[r_2], ..., T_{\alpha-k+1}[r_c], ..., T_{\alpha}[r_c]).$$

That is, each entry of $T$ can be expressed succinctly in terms of corresponding entries of $T_1, ..., T_{\alpha}$, where $r = (r_1, ..., r_c)$ is the decomposition of the index $r$ into $c$ chunks.


Example.

The table for the range check in the first example ($[0, ..., 2^{128} - 1]$) can easily be expressed in this way, with, for instance, $k = 1$ and $c = 32$:

$$T = [0, ..., 2^{128} - 1], \quad T_1 = T_2 = ... = T_{32} = [0, ..., 2^4 - 1],$$
$$T[r] = 2^{124} \cdot T_1[r_1] + 2^{120} \cdot T_2[r_2] + ... + T_{32}[r_{32}].$$
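In code, the decomposition above is just 4-bit chunking (a sketch of my own; the subtables here are the identity map on $[0, 2^4)$):

```python
W, C = 4, 32                                       # chunk width and number of chunks

def decompose(x):
    # r_1, ..., r_32, most significant chunk first; each indexes a subtable of size 2^4
    return [(x >> (W * (C - 1 - i))) & ((1 << W) - 1) for i in range(C)]

def recombine(chunks):
    # T[r] = 2^124 * T_1[r_1] + 2^120 * T_2[r_2] + ... + T_32[r_32], with T_i the identity
    return sum(chunk << (W * (C - 1 - i)) for i, chunk in enumerate(chunks))

x = 0x0123456789abcdef_0123456789abcdef
assert recombine(decompose(x)) == x
assert all(0 <= r_i < 2**W for r_i in decompose(x))
```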

Going back to the lookup argument, we rewrite Equation (1) as below:

$$\sum_{j \in \{0,1\}^{\log N}} \widetilde{M}(r, j) \cdot T[j] = \tilde{a}(r)$$

The equation above holds if and only if $M \cdot t = a$, up to a soundness error of $\log m / |\mathbb{F}|$. In $\mathsf{Lasso}$, after the prover commits to $\widetilde{M}$ (by committing to $\mathsf{dim}_1, ..., \mathsf{dim}_c$) and $\tilde{a}$, the verifier sends a randomly sampled $r$ and checks whether the equation above holds.

Let $\mathsf{nz}(i)$ denote the unique non-zero column in row $i$ of $M$. We can rewrite the LHS as follows:

$$\sum_{j \in \{0,1\}^{\log N}} \widetilde{M}(r, j) \cdot T[j] = \sum_{i \in \{0,1\}^{\log m}} \widetilde{\mathsf{eq}}(i, r) \cdot T[\mathsf{nz}(i)]$$

And using the decomposability of $T$,

$$\sum_{i \in \{0,1\}^{\log m}} \widetilde{\mathsf{eq}}(i, r) \cdot T[\mathsf{nz}(i)] = \sum_{i \in \{0,1\}^{\log m}} \widetilde{\mathsf{eq}}(i, r) \cdot g(T_1[\mathsf{dim}_1(i)], ..., T_k[\mathsf{dim}_1(i)], T_{k+1}[\mathsf{dim}_2(i)], ..., T_{2k}[\mathsf{dim}_2(i)], ..., T_{\alpha-k+1}[\mathsf{dim}_c(i)], ..., T_{\alpha}[\mathsf{dim}_c(i)])$$

Now the prover should prove that

  • for some committed $E_1, ..., E_{\alpha}$, the sum $\sum_{i} \widetilde{\mathsf{eq}}(i, r) \cdot g(E_1(i), ..., E_{\alpha}(i))$ equals $\tilde{a}(r)$, using the sumcheck protocol, and

  • $E_1, ..., E_{\alpha}$ are built from correct lookups into the subtables $T_1, ..., T_{\alpha}$, using offline memory checking.

Assuming the verifier can evaluate the MLE of each subtable quickly, both parties can run the protocol using only the short descriptions of the subtables, without ever fully materializing the entire lookup table $T$.
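A toy Python sketch of this decomposed lookup (my own illustration, with made-up parameters: $c = 2$, $k = 1$, subtables of size $2^4$, so the 'huge' table is the identity table of size $2^8$): each looked-up value $T[\mathsf{nz}(i)]$ is reconstructed from the $E_i$, which only read the small subtables.

```python
W, c = 4, 2                                 # subtable width and number of chunks
T_sub = list(range(2 ** W))                 # T_1 = T_2 = identity table of size 2^4

def g(chunks):                              # recombines subtable values into T[j]
    out = 0
    for v in chunks:
        out = (out << W) + v
    return out

nz = [200, 37, 255]                         # nz(i): the looked-up indices into T
# dim_k(i): the index of lookup i into the k-th subtable
dim = [[(j >> (W * (c - 1 - k))) & (2 ** W - 1) for j in nz] for k in range(c)]
# E_k(i) = T_k[dim_k(i)]: the values the prover claims were read from subtable k
E = [[T_sub[d] for d in dim_k] for dim_k in dim]

for i, j in enumerate(nz):
    assert g([E[k][i] for k in range(c)]) == j   # equals T[j], since T is the identity table
```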

The final PIOP of $\mathsf{Surge}$ is described below, followed by the $\mathsf{Lasso}$ lookup argument that uses it.


$\mathcal{P}$ has committed to $c$ multilinear polynomials $\mathsf{dim}_1, ..., \mathsf{dim}_c$, each over $\log m$ variables.

  1. $\mathcal{P} \rightarrow \mathcal{V}$ : $2\alpha$ different ($\log m$)-variate multilinear polynomials $E_1, ..., E_{\alpha}$, $\textsf{read\_ts}_1, ..., \textsf{read\_ts}_{\alpha}$, and $\alpha$ different ($\log N / c$)-variate multilinear polynomials $\textsf{final\_cts}_1, ..., \textsf{final\_cts}_{\alpha}$.

    // $E_i$ is purported to specify the values of each of the $m$ reads into $T_i$.

    // $\textsf{read\_ts}$ and $\textsf{final\_cts}$ are the "counter polynomials" for each memory slot of the subtables after the computation.

  2. $\mathcal{V}$ and $\mathcal{P}$ apply the sumcheck protocol to the polynomial $h(k) := \widetilde{\mathsf{eq}}(r,k) \cdot g(E_1(k), ..., E_{\alpha}(k))$, which reduces checking the sum of $h(k)$ over the boolean hypercube to the checks $E_i(r_z) \stackrel{?}{=} v_{E_i}$ for $i = 1, ..., \alpha$.

  3. $\mathcal{V}$ : check that the above equalities hold with one oracle query to each $E_i$.

    // Up to this point, the correctness of the $E_i$ is not yet guaranteed. $\mathcal{V}$ and $\mathcal{P}$ run offline memory checking to verify that $E_i(j)$ equals $T_i[\mathsf{dim}_i(j)]$ for all $j \in \{0,1\}^{\log m}$.

  4. $\mathcal{V} \rightarrow \mathcal{P}$ : $\tau, \gamma \in_{R} \mathbb{F}$.

    // In practice, the $c$ instances of the sumcheck protocol below can be reduced to a single run of a sumcheck applied to a random linear combination of the polynomials.

  5. $\mathcal{V} \leftrightarrow \mathcal{P}$ : for $i = 1, ..., \alpha$, run a GKR-based grand product protocol to reduce the check $\mathcal{H}_{\tau, \gamma}(\mathsf{WS}) \stackrel{?}{=} \mathcal{H}_{\tau, \gamma}(\mathsf{RS}) \cdot \mathcal{H}_{\tau, \gamma}(\mathsf{S})$ to the following checks:

$$E_i(r_i') \stackrel{?}{=} v_{E_i}, \quad \mathsf{dim}_i(r_i') \stackrel{?}{=} v_i, \quad \textsf{read\_ts}_i(r_i') \stackrel{?}{=} v_{\textsf{read\_ts}}, \quad \textsf{final\_cts}_i(r_i'') \stackrel{?}{=} v_{\textsf{final\_cts}}$$

  6. $\mathcal{V}$ : check the equalities above with one oracle query to each of $E_i$, $\mathsf{dim}_i$, $\textsf{read\_ts}_i$ and $\textsf{final\_cts}_i$.

The $\mathsf{Lasso}$ lookup argument built on $\mathsf{Surge}$ is then:

  • Input : a polynomial commitment to the multilinear polynomial $\tilde{a} : \mathbb{F}^{\log m} \rightarrow \mathbb{F}$, and a description of a structured table $T$ of size $N$.

  • $\mathcal{P}$ sends a $\mathsf{Surge}$-commitment to the multilinear extension $\widetilde{M}$ of a matrix $M \in \{0,1\}^{m \times N}$. This consists of $c$ different ($\log m$)-variate multilinear polynomials $\mathsf{dim}_1, ..., \mathsf{dim}_c$.

  • $\mathcal{V}$ picks a random $r \in \mathbb{F}^{\log m}$ and sends it to $\mathcal{P}$. $\mathcal{V}$ makes one evaluation query to $\tilde{a}$ to learn $\tilde{a}(r)$.

  • $\mathcal{P}$ and $\mathcal{V}$ apply $\mathsf{Surge}$, allowing $\mathcal{P}$ to prove that $\sum_{y \in \{0,1\}^{\log N}} \widetilde{M}(r,y)\cdot T[y] = \tilde{a}(r)$.


Cost

Cost of Spark

It is worth noting that while at first glance increasing $c$ seems to increase the commitment size and verification cost by a factor linear in $c$, most PCSs have efficient batching for evaluation proofs, so the factor of $c$ can be avoided (i.e., the prover and verifier costs for verifying polynomial evaluations do not grow with $c$).

Prover time

The prover can compute its messages for the sumcheck protocol with $O(b \cdot k \cdot \alpha \cdot m)$ field operations, where $\alpha = k \cdot c$ and $b$ is the number of monomials in $g$. For many tables of practical interest, the $b$ and $k$ factors can be eliminated (when $g$ has total degree 1 or 2, which is the case for the range-check example above), giving a total cost of $O(c \cdot m)$.

For the offline memory checking argument, the prover's costs are similar to $\mathsf{Spark}$'s: $O(\alpha \cdot m + \alpha \cdot N^{1/c})$ field operations, plus committing to a low-order number of field elements.

Verifier cost

The verifier performs $O(k \cdot \log m)$ field operations. The costs of the memory checking argument, which can be batched, are identical to those of $\mathsf{Spark}$.


Written by Carson Lee