Lasso


by Srinath Setty, Justin Thaler and Riad Wahby


Overview

This paper introduces $\mathsf{Lasso}$, a lookup argument that allows an untrusted prover to commit to a vector $a \in \mathbb{F}^m$ and prove that all entries of $a$ reside in some predetermined table $t \in \mathbb{F}^n$.

$\mathsf{Lasso}$'s performance characteristics unlock the so-called "lookup singularity".

A key difference of $\mathsf{Lasso}$ from other lookup arguments is that, if the table $t$ is structured (as we will explain later), then no party needs to commit to $t$, enabling the use of enormous tables (e.g. of size $2^{128}$ or larger). The most common use cases of lookup arguments, such as range checks, bitwise operations and big-number arithmetic, obtain huge cost benefits from this characteristic of $\mathsf{Lasso}$.

Lookup arguments.

Imagine that a prover wishes to establish that at no point in a program's execution did any integer ever exceed $2^{128}$.

A naive approach to accomplishing this inside a circuit-SAT instance is to have the circuit take 128 field elements as advice inputs, which constitute the binary representation of $x$. The circuit checks that all 128 advice elements $b_i$ are in $\{0,1\}$ and that they indeed equal the binary representation of $x$, i.e., $x = \sum_{i=0}^{127} 2^i \cdot b_i$. A single overflow check thus turns into at least 129 constraints and 128 additional field elements in the witness.
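To make the constraint count concrete, here is a toy sketch of the naive check in plain Python (not circuit code; the helper name and sample value are mine):

```python
# Toy model of the naive overflow check: 128 booleanity constraints
# b_i * (b_i - 1) = 0, plus one reconstruction constraint, 129 in total.
def naive_overflow_constraints(x, bits):
    constraints = [b * (b - 1) == 0 for b in bits]                       # 128 checks
    constraints.append(x == sum(2**i * b for i, b in enumerate(bits)))  # 1 more
    return constraints

x = 12345
bits = [(x >> i) & 1 for i in range(128)]
checks = naive_overflow_constraints(x, bits)
assert len(checks) == 129 and all(checks)
```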

Lookup tables offer a better approach. The verifier initializes a lookup table containing all integers between $0$ and $2^{128} - 1$; the overflow check above then amounts to simply confirming that $x$ is in the table. Of course, a table of size $2^{128}$ is far too large to be materialized, even by the prover. This paper is devoted to describing techniques that enable such a table lookup without the table ever being explicitly materialized by either party. Indeed, the table in this example is highly structured, in a way that can be exploited by the techniques we will introduce.

$\mathsf{Lasso}$'s starting point is $\mathsf{Spark}$, an optimal polynomial commitment scheme for sparse multilinear polynomials from Spartan. Like Spartan, $\mathsf{Lasso}$ can be instantiated with any multilinear PCS. In particular, for SNARKs that have the prover commit to the witness using a multilinear PCS, $\mathsf{Lasso}$ can be applied seamlessly to provide a lookup argument for either R1CS or Plonkish circuit satisfiability.

Jolt and Lasso.

$\mathsf{Jolt}$ is a RISC-V zkVM that achieves substantial improvements in front-end design by exploiting $\mathsf{Lasso}$. $\mathsf{Jolt}$ replaces the instruction-execution part of a zkVM circuit with $\mathsf{Lasso}$, letting the circuit look up into a gigantic evaluation table rather than evaluating each instruction in-circuit. For each of the primitive RISC-V instructions, the resulting evaluation table has the structure required to apply $\mathsf{Lasso}$ for a significant reduction in cost. Thaler states that $\mathsf{Lasso}$ and $\mathsf{Jolt}$ together essentially achieve a vision outlined by Barry WhiteHat called the "lookup singularity": transforming arbitrary computer programs into circuits that only perform lookups.


Preliminaries

As these topics were already covered in previous presentations, I skip the parts on multilinear extensions, SNARKs, polynomial commitment schemes, polynomial IOPs and the sumcheck protocol.

Sparsity.

$m$-sparse refers to multilinear polynomials $g: \mathbb{F}^l \rightarrow \mathbb{F}$ in $l$ variables such that $g(x) \neq 0$ for at most $m$ values of $x \in \{0,1\}^l$. In other words, $g$ has at most $m$ non-zero coefficients in its multilinear Lagrange polynomial basis. If $m \ll 2^l$, only a tiny fraction of the possible coefficients are non-zero. If $m = \Theta(2^l)$, we refer to such a $g$ as a dense polynomial.

Dense representation of multilinear polynomials.

The fact that the MLE of a function is unique offers the following method to represent any multilinear polynomial, which is briefer than its point-value representation or coefficient representation.

Definition 1.

A multilinear polynomial $g: \mathbb{F}^l \rightarrow \mathbb{F}$ can be represented uniquely by the list of tuples $L$ such that, for all $i \in \{0,1\}^l$, $(\mathsf{to\text{-}field}(i), g(i)) \in L$ if and only if $g(i) \neq 0$.

Here, $\mathsf{to\text{-}field}$ is the canonical injection from $\{0,1\}^l$ to $\mathbb{F}$ and $\mathsf{to\text{-}bits}$ is the inverse mapping. You can think of them as transformations between a field element and its binary representation.

We refer to $L$ as the dense representation of the sparse polynomial $g$ and write $L = \mathsf{DenseRepr}(g)$.

A multilinear polynomial $g$ in $l$ variables is a sparse multilinear polynomial if $|\mathsf{DenseRepr}(g)|$ is sub-linear in $\Theta(2^l)$. Otherwise, it is a dense multilinear polynomial.
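As a concrete illustration, here is a minimal Python sketch of a dense representation and the naive evaluation it enables (the helper names and the toy prime standing in for $\mathbb{F}$ are my own):

```python
import random

P = 2**61 - 1  # toy prime standing in for the field F

def to_bits(i, l):
    """to-bits: the l-bit binary representation of an index i (MSB first)."""
    return [(i >> (l - 1 - j)) & 1 for j in range(l)]

def eq(bits, r):
    """The Lagrange basis eq~(bits, r) = prod_j (b_j r_j + (1-b_j)(1-r_j))."""
    out = 1
    for b, x in zip(bits, r):
        out = out * ((b * x + (1 - b) * (1 - x)) % P) % P
    return out

# Dense representation of an m-sparse polynomial g over {0,1}^6:
# only the (to-field(i), g(i)) pairs with g(i) != 0 are stored.
dense_repr = [(1, 1), (2, 1), (3, 1), (4, 1)]  # g = 1 on x = 000001..000100

def evaluate_sparse(dense_repr, r):
    """Naive evaluation from the dense representation: O(m * l) field ops."""
    return sum(v * eq(to_bits(i, len(r)), r) for i, v in dense_repr) % P

r = [random.randrange(P) for _ in range(6)]
print(evaluate_sparse(dense_repr, r))
```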


Technical overview

For a lookup argument from $a \in \mathbb{F}^m$ into $t \in \mathbb{F}^n$, it suffices for the prover to show that it knows a sparse matrix $M \in \mathbb{F}^{m \times n}$ such that (1) in each row of $M$, exactly one cell has value 1 and the rest are zeroes, and (2) $M \cdot t = a$.

Below is a simple example demonstrating that $a = [2, 3, 0]$ is a valid lookup into the table $t = [0, 1, 2, 3]$. You can see that $M$ is indeed sparse.

$$\begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix} \cdot \begin{bmatrix} 0 \\ 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \\ 0 \end{bmatrix}$$

Proving the matrix-vector multiplication above is equivalent to confirming that

Equation (1):

$$\sum_{y \in \{0,1\}^{\log n}} \widetilde{M}(r, y) \cdot \tilde{t}(y) = \tilde{a}(r)$$

for an $r \in \mathbb{F}^{\log m}$ randomly sampled by the verifier.

To prove this equation via the sumcheck protocol, the verifier must be able to evaluate $\widetilde{M}(r, y) \cdot \tilde{t}(y)$ at some random $y$ succinctly, i.e. with a cost independent of $n$, given that the lookup table may be very large. Clearly, we cannot achieve this by committing to $\widetilde{M}$ and $\tilde{t}$ with a normal PCS.

$M$ is a sparse matrix: only $m$ of its $m \times n$ entries are non-zero. Hence the statement above is efficiently provable by having the prover commit to the sparse polynomial $\widetilde{M}$ using $\mathsf{Surge}$, a generalization of $\mathsf{Spark}$ developed for $\mathsf{Lasso}$, and by sending only a succinct description of the enormous $t$ instead of committing to it, assuming $t$ is structured in the way we define below. Once we have such a commitment scheme, we instantly obtain the desired lookup argument.
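A tiny sketch of this reduction (function names are illustrative): build the one-hot matrix $M$ from the lookups and check $M \cdot t = a$ directly.

```python
def lookup_matrix(a, t):
    """Row i of M is one-hot at the (first) position of a[i] in t."""
    rows = []
    for v in a:
        j = t.index(v)  # raises ValueError if v is not in the table
        rows.append([1 if col == j else 0 for col in range(len(t))])
    return rows

t = [0, 1, 2, 3]
a = [2, 3, 0]
M = lookup_matrix(a, t)
Mt = [sum(M[i][j] * t[j] for j in range(len(t))) for i in range(len(a))]
assert Mt == a
```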


Spark

$\mathsf{Spark}$, as explained above, is a polynomial commitment scheme for sparse polynomials. Let $g$ be a ($\log N$)-variate multilinear polynomial with sparsity $m$. The prover commits to the unique dense representation of $g$: the list specifying only the multilinear Lagrange bases with non-zero coefficients. When the verifier requests an evaluation $g(r)$, the prover returns the claimed evaluation $v$ along with an evaluation proof.

Let $c$ be such that $N = m^c$, without loss of generality. There is a simple algorithm that takes as input the dense representation of $g$ and outputs $g(r)$ in $O(c \cdot m)$ time.

Algorithm 1.

Decompose the $\log N = c \log m$ variables of $r$ into $c$ blocks, each of size $\log m$, writing $r = (r_1, ..., r_c) \in (\mathbb{F}^{\log m})^c$. Then any ($\log N$)-variate Lagrange basis polynomial evaluated at $r$ can be expressed as a product of $c$ smaller Lagrange basis polynomials. After an $O(c \cdot m)$-cost pre-computation of a write-once memory, each basis polynomial can be evaluated with $c$ memory lookups and a few extra field operations.

Example.

Let $N = 2^6$, $c = 3$ and $m = 2^2$.

Let $g(x) = 1$ for $x = 000001, ..., 000100$, and $g(x) = 0$ for every other $x \in \{0,1\}^6$. Subscripts 1, 2 and 3 denote the three 2-bit partitions of a 6-bit binary string, and $\mathcal{X}_i(r)$ denotes $\widetilde{\mathsf{eq}}(i, r)$.

Observe that the evaluation of $g$ at some $r$ can be decomposed in this way:

$$g(r) = \sum^{4}_{i=1} 1 \cdot \mathcal{X}_{\mathsf{to\text{-}bits}(i)}(r) = \sum^{4}_{i=1} 1 \cdot \left( \mathcal{X}_{\mathsf{to\text{-}bits}(i)_1}(r_1) \cdot \mathcal{X}_{\mathsf{to\text{-}bits}(i)_2}(r_2) \cdot \mathcal{X}_{\mathsf{to\text{-}bits}(i)_3}(r_3) \right)$$

The memory is composed of $c\,(=3)$ subtables $t_1$, $t_2$ and $t_3$, where each $t_i$ is a table of size $m$ populated by the values $\mathcal{X}_{0,0}(r_i)$ through $\mathcal{X}_{1,1}(r_i)$, as shown below. Each subtable can be computed by the prover in $O(m)$ time, for a total pre-computation cost of $O(c \cdot m)$.

| $t_1$ | $t_2$ | $t_3$ |
| --- | --- | --- |
| $\mathcal{X}_{0,0}(r_1)$ | $\mathcal{X}_{0,0}(r_2)$ | $\mathcal{X}_{0,0}(r_3)$ |
| $\mathcal{X}_{0,1}(r_1)$ | $\mathcal{X}_{0,1}(r_2)$ | $\mathcal{X}_{0,1}(r_3)$ |
| $\mathcal{X}_{1,0}(r_1)$ | $\mathcal{X}_{1,0}(r_2)$ | $\mathcal{X}_{1,0}(r_3)$ |
| $\mathcal{X}_{1,1}(r_1)$ | $\mathcal{X}_{1,1}(r_2)$ | $\mathcal{X}_{1,1}(r_3)$ |

Notice that this is cheaper by a factor of $\log m$ than the naive approach, whose cost is $O(m \log N)$, since there are $m$ Lagrange basis polynomials with $\log N$ variables each.

The efficiency of the algorithm above comes from the fact that $g$ is sparse. $\mathsf{Spark}$ is merely a SNARK arguing that the prover correctly ran this algorithm on the committed description of $g$. For simplicity, the demonstrations below treat the special case $c = 2$ and then expand it to a general result.
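Below is a runnable sketch of Algorithm 1 on this example, assuming a most-significant-chunk-first block ordering (the helpers repeat the earlier sketch; all names are mine):

```python
import random

P = 2**61 - 1
def to_bits(i, l): return [(i >> (l - 1 - j)) & 1 for j in range(l)]
def eq(bits, r):
    out = 1
    for b, x in zip(bits, r):
        out = out * ((b * x + (1 - b) * (1 - x)) % P) % P
    return out

c, log_m = 3, 2
dense_repr = [(1, 1), (2, 1), (3, 1), (4, 1)]   # g = 1 on x = 1..4, N = 2^6
r_blocks = [[random.randrange(P) for _ in range(log_m)] for _ in range(c)]

# Write-once memory: subtable t_i[b] = X_b(r_i) for all m values b; O(m) each.
subtables = [[eq(to_bits(b, log_m), ri) for b in range(2**log_m)] for ri in r_blocks]

# Each non-zero term now costs c lookups plus c multiplications: O(c * m) total.
total = 0
for idx, val in dense_repr:
    term = val
    for i in range(c):
        chunk = (idx >> (log_m * (c - 1 - i))) & (2**log_m - 1)
        term = term * subtables[i][chunk] % P
    total = (total + term) % P

# Cross-check against the naive O(m log N) evaluation.
r = [x for block in r_blocks for x in block]
naive = sum(v * eq(to_bits(i, c * log_m), r) for i, v in dense_repr) % P
assert total == naive
```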

Detailed explanation

Let $D$ be an $m$-sparse ($2\log m$)-variate multilinear polynomial, fixing the value of $c$ to 2. For now, you can think of $D$ as an $m \times m$ matrix with only $m$ non-zero elements.

Given its dense representation, one can express its evaluation as below, decomposing each ($2\log m$)-variate Lagrange basis polynomial into two ($\log m$)-variate Lagrange basis polynomials:

$$D(r_x, r_y) = \sum_{(i,j) \in \{0,1\}^{\log m} \times \{0,1\}^{\log m} :\ D(i,j) \neq 0} D(i,j) \cdot \widetilde{\mathsf{eq}}(i, r_x) \cdot \widetilde{\mathsf{eq}}(j, r_y)$$

Let's introduce three ($\log m$)-variate multilinear polynomials $\mathsf{val}$, $\mathsf{row}$ and $\mathsf{col}$, which extend the dense representation of $D$. That is, $k \in \{0,1\}^{\log m}$ iterates over the $m$ non-zero elements of $D$ in some canonical order, and for each $k$ there exist some $i, j$ such that $\mathsf{row}(k) = \mathsf{to\text{-}field}(i)$, $\mathsf{col}(k) = \mathsf{to\text{-}field}(j)$ and $\mathsf{val}(k) = D(i, j)$.

$\mathsf{val}$ maps $k$ to the value of each non-zero element, while $\mathsf{row}$ and $\mathsf{col}$ map $k$ to its location in $D$ (or, equivalently, to the index into each of the two subtables in Algorithm 1).

Using these polynomials, the equation above can be simplified to:

Equation (2):

$$D(r_x, r_y) = \sum_{k \in \{0,1\}^{\log m}} \mathsf{val}(k) \cdot \widetilde{\mathsf{eq}}(\mathsf{to\text{-}bits}(\mathsf{row}(k)), r_x) \cdot \widetilde{\mathsf{eq}}(\mathsf{to\text{-}bits}(\mathsf{col}(k)), r_y)$$

Let $E_{rx}(k) = \widetilde{\mathsf{eq}}(\mathsf{to\text{-}bits}(\mathsf{row}(k)), r_x)$ and $E_{ry}(k) = \widetilde{\mathsf{eq}}(\mathsf{to\text{-}bits}(\mathsf{col}(k)), r_y)$, each of which is ($\log m$)-variate. We now define a PIOP for proving the evaluation of $D$ at $(r_x, r_y)$, assuming $\mathsf{val}$ has been committed to beforehand.

$\mathcal{P} \rightarrow \mathcal{V}$ : $E_{rx}(k)$ and $E_{ry}(k)$ as oracles.

$\mathcal{P} \leftrightarrow \mathcal{V}$ : run the sumcheck reduction to reduce the statement

$$v = \sum_{k \in \{0,1\}^{\log m}} \mathsf{val}(k) \cdot E_{rx}(k) \cdot E_{ry}(k)$$


to the following checks:

$$\mathsf{val}(r_z) \stackrel{?}{=} v_{\mathsf{val}}, \quad E_{rx}(r_z) \stackrel{?}{=} v_{E_{rx}}, \quad E_{ry}(r_z) \stackrel{?}{=} v_{E_{ry}},$$

where the claimed values are provided by $\mathcal{P}$ and $r_z$ is randomly sampled by $\mathcal{V}$ at the end of the sumcheck protocol.

$\mathcal{V}$ : check that the three equalities above hold with an oracle query to each of the polynomials.

A key part of achieving a sound argument with the PIOP above is enabling the verifier to check that the committed $E_{rx}$ and $E_{ry}$ are built correctly, i.e.

$$\forall k \in \{0,1\}^{\log m},\ E_{rx}(k) = \widetilde{\mathsf{eq}}(\mathsf{to\text{-}bits}(\mathsf{row}(k)), r_x)$$

$$\forall k \in \{0,1\}^{\log m},\ E_{ry}(k) = \widetilde{\mathsf{eq}}(\mathsf{to\text{-}bits}(\mathsf{col}(k)), r_y)$$

Checking the above is equivalent to checking that an untrusted prover ran Algorithm 1 correctly (in the case $c = 2$, where the two subtables correspond to $E_{rx}$ and $E_{ry}$), i.e. that the prover initialized the memory with the correct values and that the subsequent memory reads occurred correctly. We introduce offline memory checking, a protocol that reduces such a memory-consistency argument to a multiset equality check.



Offline memory checking

Offline memory checking is a protocol between an untrusted memory and a trusted checker, played in our protocol's context by the prover and the verifier, respectively.

The checker maintains two local multisets: $\mathsf{RS}$ and $\mathsf{WS}$. $\mathsf{RS}$ is initially empty. For an $\mathsf{M}$-sized memory (here $\mathsf{M} = N^{1/c}$), $\mathsf{WS}$ is initialized as follows: for all $i \in [\mathsf{M}]$, the tuple $(i, v_i, 0)$ is included, where the first element is the address, the second is the value and the third is the access count of the memory slot.

For each read operation at address $a$, suppose that the untrusted memory responds with a value-count pair $(v, t)$. Then the checker updates its local state as follows:

- $\mathsf{RS} \leftarrow \mathsf{RS} \cup \{(a, v, t)\}$;
- store $(v, t+1)$ at address $a$ in the untrusted memory; and
- $\mathsf{WS} \leftarrow \mathsf{WS} \cup \{(a, v, t+1)\}$.

Claim 1.

If, for every read operation, the untrusted memory returns the tuple last written to that address, then there exists a set $\mathsf{S}$ of cardinality $\mathsf{M}$, consisting of tuples of the form $(k, v_k, t_k)$ for all $k \in [\mathsf{M}]$, such that $\mathsf{WS} = \mathsf{RS} \cup \mathsf{S}$; this $\mathsf{S}$ is simply the final memory view. Conversely, if the untrusted memory ever returns a value $v$ for a read at some address $k \in [\mathsf{M}]$ that does not equal the value initially written to slot $k$, then there does not exist any set $\mathsf{S}$ such that $\mathsf{WS} = \mathsf{RS} \cup \mathsf{S}$.


I skip the proof of the claim, but the example below demonstrates how the first direction holds. Three memory slots with addresses 1, 2 and 3 are initialized with values 100, 101 and 102, and three read operations are performed at addresses 2, 2 and 3.

| Operation | $\mathsf{RS}$ | $\mathsf{WS}$ |
| --- | --- | --- |
| (initialization) | $\varnothing$ | $(1, 100, 0)\ (2, 101, 0)\ (3, 102, 0)$ |
| Read address 2 | $(2, 101, 0)$ | $(1, 100, 0)\ (2, 101, 0)\ (2, 101, 1)\ (3, 102, 0)$ |
| Read address 2 | $(2, 101, 0)\ (2, 101, 1)$ | $(1, 100, 0)\ (2, 101, 0)\ (2, 101, 1)\ (2, 101, 2)\ (3, 102, 0)$ |
| Read address 3 | $(2, 101, 0)\ (2, 101, 1)\ (3, 102, 0)$ | $(1, 100, 0)\ (2, 101, 0)\ (2, 101, 1)\ (2, 101, 2)\ (3, 102, 0)\ (3, 102, 1)$ |

After the three reads, the final memory view is $\mathsf{S} = \{(1, 100, 0),\ (2, 101, 2),\ (3, 102, 1)\}$. You can see that there exists an $\mathsf{S}$ such that $\mathsf{WS} = \mathsf{RS} \cup \mathsf{S}$, and that this $\mathsf{S}$ is exactly the final memory view.

Now we define the counter polynomials. Given the size-$\mathsf{M}$ memory and the list of $m$ addresses involved in read operations, one can compute two vectors $C_r \in \mathbb{F}^m$ and $C_f \in \mathbb{F}^{\mathsf{M}}$, where $C_r[k]$ stores the 'correct' count that would have been returned for the $k$-th read operation and $C_f[j]$ stores the 'correct' final count at memory location $j$ after the $m$ read operations. Computing these requires $O(m)$ operations over $\mathbb{F}$.

Let $\mathsf{read\_ts} = \widetilde{C_r}$, $\mathsf{write\_cts} = \widetilde{C_r} + 1$, and $\mathsf{final\_cts} = \widetilde{C_f}$. We refer to these as the counter polynomials.
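The bookkeeping above is easy to reproduce in code. The sketch below models an honest memory in plain Python, with multisets as `Counter`s, and also computes the counter vectors $C_r$ and $C_f$ just defined (a cheating memory would break the final assertion):

```python
from collections import Counter

memory = {1: (100, 0), 2: (101, 0), 3: (102, 0)}          # addr -> (value, count)
WS = Counter((a, v, t) for a, (v, t) in memory.items())   # initial writes
RS = Counter()
C_r = []                                                  # per-read 'correct' counts

for addr in [2, 2, 3]:                                    # the three reads
    v, t = memory[addr]                                   # untrusted memory responds
    RS[(addr, v, t)] += 1
    C_r.append(t)
    memory[addr] = (v, t + 1)                             # write back count + 1
    WS[(addr, v, t + 1)] += 1

S = Counter((a, v, t) for a, (v, t) in memory.items())    # final memory view
C_f = [memory[a][1] for a in sorted(memory)]              # final counts per address

assert WS == RS + S        # the multiset identity WS = RS ∪ S from Claim 1
assert C_r == [0, 1, 0] and C_f == [0, 2, 1]
```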


The evaluation proof

Returning to our ($2\log \mathsf{M}$)-variate multilinear polynomial, suppose that ($\mathsf{row}, \mathsf{col}, \mathsf{val}$) are committed in advance, and that

$$E_{rx},\ E_{ry},\ \mathsf{read\_ts}_{\mathsf{row}},\ \mathsf{final\_cts}_{\mathsf{row}},\ \mathsf{read\_ts}_{\mathsf{col}},\ \mathsf{final\_cts}_{\mathsf{col}}$$

are the polynomials sent by the prover at the beginning of the evaluation proof. Notice that the verifier can compute $\mathsf{write\_cts}_{\mathsf{row}}$ and $\mathsf{write\_cts}_{\mathsf{col}}$ on its own (as $\mathsf{read\_ts}_{\mathsf{row}} + 1$ and $\mathsf{read\_ts}_{\mathsf{col}} + 1$).

Claim 2.

If $E_{rx}$ is correctly computed, i.e. $\forall k \in \{0,1\}^{\log m},\ E_{rx}(k) = \widetilde{\mathsf{eq}}(\mathsf{to\text{-}bits}(\mathsf{row}(k)), r_x)$, then $\mathsf{WS} = \mathsf{RS} \cup \mathsf{S}$, where

$$\mathsf{WS} = \{(\mathsf{to\text{-}field}(i),\ \widetilde{\mathsf{eq}}(i, r_x),\ 0) : i \in \{0,1\}^{\log \mathsf{M}}\} \cup \{(\mathsf{row}(k),\ E_{rx}(k),\ \mathsf{write\_cts}_{\mathsf{row}}(k)) : k \in \{0,1\}^{\log m}\};$$

$$\mathsf{RS} = \{(\mathsf{row}(k),\ E_{rx}(k),\ \mathsf{read\_ts}_{\mathsf{row}}(k)) : k \in \{0,1\}^{\log m}\};$$

$$\mathsf{S} = \{(\mathsf{to\text{-}field}(i),\ \widetilde{\mathsf{eq}}(i, r_x),\ \mathsf{final\_cts}_{\mathsf{row}}(i)) : i \in \{0,1\}^{\log \mathsf{M}}\}.$$

The high-level conclusion of the memory checking above: when $E_{rx}$ is the polynomial produced by an untrusted memory, the checker (verifier) initializes the memory with the expected 'correct' values $\widetilde{\mathsf{eq}}(i, r_x)$, and then, by checking that the memory lookups were performed correctly for each $E_{rx}(k)$, the verifier is convinced that the provided $E_{rx}$ is correct. The same logic applies to $E_{ry}$.

In reality, the verifier does not perform the check above manually. Instead, to check that $\mathcal{H}_{\tau, \gamma}(\mathsf{WS}) = \mathcal{H}_{\tau, \gamma}(\mathsf{RS}) \cdot \mathcal{H}_{\tau, \gamma}(\mathsf{S})$, the two parties run a grand product argument: a multiset equality check performed via a GKR protocol whose circuit is simply a binary tree of multiplication gates. Since running a GKR protocol reduces to a single evaluation of the input vector at a random point sampled by the verifier, and the verifier effectively has oracle access to all of the required polynomials (they are either committed by the prover or evaluable by the verifier itself), the memory check above can be done efficiently with a few evaluation proofs. For a more detailed explanation of the grand product argument, refer to Appendix E of the paper.
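To make the hashed check concrete, here is a sketch of one standard instantiation of $\mathcal{H}_{\tau,\gamma}$ (the Reed-Solomon-style fingerprint used in Spartan-flavored memory checking; the paper's exact formulation may differ in details): map each tuple $(a, v, t)$ to $a \cdot \gamma^2 + v \cdot \gamma + t - \tau$ and take the grand product.

```python
import random
from collections import Counter

P = 2**61 - 1
tau, gamma = random.randrange(P), random.randrange(P)

def H(multiset):
    """Grand product of the fingerprints of a multiset of (addr, val, count)."""
    out = 1
    for (a, v, t), mult in multiset.items():
        h = (a * gamma * gamma + v * gamma + t - tau) % P
        out = out * pow(h, mult, P) % P
    return out

# WS, RS, S as computed in the previous sketch; WS = RS ∪ S implies the identity.
WS = Counter({(1,100,0):1, (2,101,0):1, (2,101,1):1, (2,101,2):1, (3,102,0):1, (3,102,1):1})
RS = Counter({(2,101,0):1, (2,101,1):1, (3,102,0):1})
S  = Counter({(1,100,0):1, (2,101,2):1, (3,102,1):1})
assert H(WS) == H(RS) * H(S) % P
```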


Generalization and Specialization

We will generalize the protocol above, $\mathsf{Spark}$, so that it supports an arbitrary value of $c$ rather than the fixed value 2, which means the gigantic lookup table can be decomposed into even smaller subtables of size $N^{1/c}$. Subsequently, we will specialize $\mathsf{Spark}$ for $\mathsf{Lasso}$ with some cost optimizations that exploit the properties of lookup arguments. Finally, we will further generalize $\mathsf{Spark}$ by relaxing the conditions that make a table 'structured' enough for $\mathsf{Lasso}$ to apply. With these modifications altogether, we obtain $\mathsf{Surge}$, the sparse polynomial commitment scheme for $\mathsf{Lasso}$, and its final PIOP.

1. Supporting $\log N = c \log m$ variables, rather than $2 \log m$

$c$ can be interpreted as the number of subtables resulting from the decomposition of the original table of size $N$. We simply factor $\widetilde{\mathsf{eq}}_{c \log m}$ into a product of $c$ terms instead of two, each of which is a ($\log m$)-variate multilinear Lagrange basis polynomial. Hence, from this point on, we use the notation $\mathsf{dim}_i(x)$ for $i \in [c]$, representing the injection from the boolean hypercube to an index into each subtable, in place of $\mathsf{row}(x)$ and $\mathsf{col}(x)$.

2. Specializing for Lasso

As we saw in Equation (1), the sparse $m \times N$ matrix $M$ that we want to commit to can only have 1 as its non-zero value. This means the polynomial $\mathsf{val}(x)$ is fixed to 1 and a commitment to it is unnecessary, saving one committed polynomial per lookup argument.

Furthermore, if we let $c = 2$ and sample $r_x$ from $\mathbb{F}^{\log m}$ and $r_y$ from $\mathbb{F}^{\log N}$ (so that $r_x$ and $r_y$ index the rows and columns of the matrix $M$, respectively), then, since each row has exactly one non-zero element, $\mathsf{to\text{-}bits}(\mathsf{row}(k))$ is simply $k$. Hence $E_{rx}(k) = \widetilde{\mathsf{eq}}(k, r_x)$ can be evaluated by the verifier on its own, and there is no need to commit to it or prove its well-formedness.

$$\begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix} \cdot \begin{bmatrix} 0 \\ 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \\ 0 \\ 0 \end{bmatrix}$$

$M$ here is a $4 \times 4$ matrix with sparsity 4. Let $k$ be a 2-bit binary string indexing the 4 non-zero elements, i.e. $k = 00, 01, 10, 11$. The table below describes the mappings $\mathsf{row}$ and $\mathsf{col}$. The mapping $\mathsf{row}(k)$ is simply an iteration from 0 to 3, which is virtually equal to $k$, because there is exactly one non-zero element per row. Therefore it is unnecessary to commit to the polynomial $E_{rx}(k)$, as well as $\mathsf{val}(k)$.

| $k$ | $\mathsf{row}(k)$ | $\mathsf{col}(k)$ | $\mathsf{val}(k)$ |
| --- | --- | --- | --- |
| 00 | 0 | 2 | 1 |
| 01 | 1 | 3 | 1 |
| 10 | 2 | 0 | 1 |
| 11 | 3 | 0 | 1 |

3. Generalizing the structure of lookup tables

We redefine the term 'structured' to be more comprehensive and precise about the properties a lookup table may have. If (1) a lookup table $t$ can be expressed as a tensor product of $c \geq 2$ smaller tables (i.e. it is decomposable), and (2) the multilinear extension of each subtable can be evaluated quickly (i.e. it is MLE-structured), we regard such a table as structured, and $\mathsf{Lasso}$ can be applied.

Formally, suppose that there is an integer $k \geq 1$ and $\alpha = k \cdot c$ tables $T_1, ..., T_{\alpha}$ of size $N^{1/c}$, along with an $\alpha$-variate multilinear polynomial $g$, such that

$$T[r] = g(T_1[r_1], ..., T_k[r_1],\ T_{k+1}[r_2], ..., T_{2k}[r_2],\ ...,\ T_{\alpha-k+1}[r_c], ..., T_{\alpha}[r_c]).$$

This means each element of $T$ can be briefly expressed via the corresponding elements of $T_1, ..., T_{\alpha}$.


Example.

The table for the range check in the first example ($[0, ..., 2^{128} - 1]$) can easily be expressed in this way with, for instance, $k = 1$ and $c = 32$:

$$T = [0, ..., 2^{128} - 1], \qquad T_1 = T_2 = ... = T_{32} = [0, ..., 2^4 - 1],$$

$$T[r] = 2^{124} \cdot T_1[r_1] + 2^{120} \cdot T_2[r_2] + ... + T_{32}[r_{32}].$$
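In code, the decomposition looks as follows (a sketch with identity subtables $T_i[x] = x$ over $[0, 2^4)$ and most-significant-chunk-first ordering, matching the weights above):

```python
def T_decomposed(chunks):
    """g(T_1[r_1], ..., T_32[r_32]) = sum_i 2^{4*(32-i)} * chunk_i, with T_i[x] = x."""
    assert len(chunks) == 32 and all(0 <= x < 16 for x in chunks)
    return sum(2 ** (4 * (31 - i)) * x for i, x in enumerate(chunks))

value = 0x0123_4567_89AB_CDEF_0123_4567_89AB_CDEF   # any 128-bit table entry
chunks = [(value >> (4 * (31 - i))) & 0xF for i in range(32)]
assert T_decomposed(chunks) == value   # T[r] recovered without building T
```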


Going back, we rewrite Equation (1), now over the structured table $T$, as below:

$$\sum_{j \in \{0,1\}^{\log N}} \widetilde{M}(r, j) \cdot T[j] = \tilde{a}(r)$$

The equation above holds if and only if $M \cdot T = a$, up to a soundness error of $\log m / |\mathbb{F}|$. In $\mathsf{Lasso}$, after committing to $\widetilde{M}$ (by committing to $\mathsf{dim}_1, ..., \mathsf{dim}_c$) and $\tilde{a}$, the verifier sends a randomly sampled $r$ to check whether the above holds.

Let $\mathsf{nz}(i)$ denote the unique non-zero column in row $i$ of $M$. We can rewrite the LHS as follows:

$$\sum_{j \in \{0,1\}^{\log N}} \widetilde{M}(r, j) \cdot T[j] = \sum_{i \in \{0,1\}^{\log m}} \widetilde{\mathsf{eq}}(i, r) \cdot T[\mathsf{nz}(i)]$$

And, using the decomposability of $T$,

$$\sum_{i \in \{0,1\}^{\log m}} \widetilde{\mathsf{eq}}(i, r) \cdot T[\mathsf{nz}(i)] = \sum_{i \in \{0,1\}^{\log m}} \widetilde{\mathsf{eq}}(i, r) \cdot g(T_1[\mathsf{dim}_1(i)], ..., T_k[\mathsf{dim}_1(i)], T_{k+1}[\mathsf{dim}_2(i)], ..., T_{2k}[\mathsf{dim}_2(i)], ..., T_{\alpha-k+1}[\mathsf{dim}_c(i)], ..., T_{\alpha}[\mathsf{dim}_c(i)])$$
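Here is a quick numeric check of this rewriting on the $4 \times 4$ example from the specialization step (toy field, helpers as in the earlier sketches, $\mathsf{nz}$ listed by hand):

```python
import random

P = 2**61 - 1
def to_bits(i, l): return [(i >> (l - 1 - j)) & 1 for j in range(l)]
def eq(bits, r):
    out = 1
    for b, x in zip(bits, r):
        out = out * ((b * x + (1 - b) * (1 - x)) % P) % P
    return out

T = [0, 1, 2, 3]     # the table
a = [2, 3, 0, 0]     # the lookups (one per row of M)
nz = [2, 3, 0, 0]    # nz(i): unique non-zero column of row i

r = [random.randrange(P) for _ in range(2)]                     # r in F^{log m}
lhs = sum(eq(to_bits(i, 2), r) * T[nz[i]] for i in range(4)) % P
rhs = sum(eq(to_bits(i, 2), r) * a[i] for i in range(4)) % P    # a~(r)
assert lhs == rhs
```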

Now the prover should prove that:

- for some $E_1, ..., E_{\alpha}$, the sum above equals $\tilde{a}(r)$, using the sumcheck protocol; and
- $E_1, ..., E_{\alpha}$ were built with correct lookups into the subtables $T_1, ..., T_{\alpha}$, using offline memory checking.

Assuming that the verifier can evaluate the MLE of each subtable quickly, both parties can participate in the protocol with only the short descriptions of the subtables, without ever fully materializing the entire lookup table $T$ throughout the protocol.

The resulting protocol, $\mathsf{Surge}$, proceeds as follows.

$\mathcal{P}$ has committed to $c$ multilinear polynomials $\mathsf{dim}_1, ..., \mathsf{dim}_c$, each over $\log m$ variables.

$\mathcal{P} \rightarrow \mathcal{V}$ : $2\alpha$ different ($\log m$)-variate multilinear polynomials $E_1, ..., E_{\alpha}$, $\mathsf{read\_ts}_1, ..., \mathsf{read\_ts}_{\alpha}$, and $\alpha$ different ($\log N / c$)-variate multilinear polynomials $\mathsf{final\_cts}_1, ..., \mathsf{final\_cts}_{\alpha}$.

// $E_i$ is purported to specify the values of each of the $m$ reads into $T_i$.

// $\mathsf{read\_ts}$ and $\mathsf{final\_cts}$ are the "counter polynomials" for each memory slot of the subtables after the computation.

$\mathcal{V} \leftrightarrow \mathcal{P}$ : apply the sumcheck protocol to the polynomial $h(k) := \widetilde{\mathsf{eq}}(r, k) \cdot g(E_1(k), ..., E_{\alpha}(k))$, which reduces the check of the sum of $h(k)$ over the boolean hypercube to: $E_i(r_z) \stackrel{?}{=} v_{E_i}$ for $i = 1, ..., \alpha$.

$\mathcal{V}$ : check that the above equalities hold with one oracle query to each $E_i$.

// Up to this point, the correctness of the $E_i$ is not yet guaranteed. $\mathcal{V}$ and $\mathcal{P}$ run offline memory checking to verify that $E_i(j)$ equals $T_i[\mathsf{dim}_i(j)]$ for all $j \in \{0,1\}^{\log m}$.

$\mathcal{V} \rightarrow \mathcal{P}$ : $\tau, \gamma \in_R \mathbb{F}$.

// In practice, the $c$ instances of the protocol below can be reduced to a single run of a sumcheck applied to a random linear combination of the polynomials.

$\mathcal{V} \leftrightarrow \mathcal{P}$ : for $i = 1, ..., \alpha$, run a GKR-based grand product protocol to reduce the check $\mathcal{H}_{\tau, \gamma}(\mathsf{WS}) \stackrel{?}{=} \mathcal{H}_{\tau, \gamma}(\mathsf{RS}) \cdot \mathcal{H}_{\tau, \gamma}(\mathsf{S})$ to the following checks:

$$E_i(r_i') \stackrel{?}{=} v_{E_i}, \quad \mathsf{dim}_i(r_i') \stackrel{?}{=} v_i, \quad \mathsf{read\_ts}_i(r_i') \stackrel{?}{=} v_{\mathsf{read\_ts}}, \quad \mathsf{final\_cts}_i(r_i'') \stackrel{?}{=} v_{\mathsf{final\_cts}}$$

$\mathcal{V}$ : check the equalities above with an oracle query to each of $E_i$, $\mathsf{dim}_i$, $\mathsf{read\_ts}_i$ and $\mathsf{final\_cts}_i$.

The final PIOP for $\mathsf{Lasso}$ is described below.

Input : a polynomial commitment to the multilinear polynomial $\tilde{a} : \mathbb{F}^{\log m} \rightarrow \mathbb{F}$, and a description of a structured table $T$ of size $N$.

1. $\mathcal{P}$ sends a $\mathsf{Surge}$-commitment to the multilinear extension $\widetilde{M}$ of a matrix $M \in \{0,1\}^{m \times N}$. This consists of $c$ different ($\log m$)-variate multilinear polynomials $\mathsf{dim}_1, ..., \mathsf{dim}_c$.

2. $\mathcal{V}$ picks a random $r \in \mathbb{F}^{\log m}$ and sends it to $\mathcal{P}$. $\mathcal{V}$ makes one evaluation query to $\tilde{a}$ to learn $\tilde{a}(r)$.

3. $\mathcal{P}$ and $\mathcal{V}$ apply $\mathsf{Surge}$, allowing $\mathcal{P}$ to prove that $\sum_{y \in \{0,1\}^{\log N}} \widetilde{M}(r, y) \cdot T[y] = \tilde{a}(r)$.

It is worth noting that, while at first glance increasing $c$ seems to grow the commitment size and verification cost by a factor linear in $c$, most PCSs have efficient batching for evaluation proofs, so this factor can be omitted (i.e. the prover and verifier costs for verifying polynomial evaluations do not grow with $c$).



Cost

Cost of Spark

Prover time

The prover can compute its messages for the sumcheck protocol with $O(b \cdot k \cdot \alpha \cdot m)$ field operations, where $\alpha = k \cdot c$ and $b$ is the number of monomials in $g$. For many tables of practical interest, the $b$ and $k$ factors can be eliminated (namely when $g$ has total degree 1 or 2, which is the case for the range-check example above), giving an $O(c \cdot m)$ total cost.

For the offline memory checking argument, the prover costs are similar to $\mathsf{Spark}$: $O(\alpha \cdot m + \alpha \cdot N^{1/c})$ field operations, plus committing to a low-order number of field elements.

Verifier cost

The verifier performs $O(k \cdot \log m)$ field operations. The costs of the memory checking argument, which can be batched, are identical to $\mathsf{Spark}$.


References

- Srinath Setty, Justin Thaler and Riad Wahby, "Unlocking the lookup singularity with Lasso": https://eprint.iacr.org/2023/1216.pdf
- Justin Thaler, "Proofs, Arguments, and Zero-Knowledge": https://people.cs.georgetown.edu/jthaler/ProofsArgsAndZK.pdf
- Srinath Setty, "Spartan: Efficient and general-purpose zkSNARKs without trusted setup": https://eprint.iacr.org/2019/550.pdf

Written by Carson Lee

