In recent STARKs, there has been a trend of using smaller fields to achieve faster performance, such as Goldilocks (64-bit) and smaller 32-bit fields like Baby Bear and Mersenne 31. Naturally, this leads to the question: Can we use the smallest field, the binary field?
Given that computers are optimized for bit operations, using a binary field could promise better performance. Additionally, if we can express constraints efficiently using the binary field, this would allow us to handle binary operations in commonly used hash functions like Keccak more effectively.
While the paper attempted to use binary fields, it faced challenges:
The binary field was too small to serve as an "alphabet" in Reed-Solomon (RS) codes compared to the block length, necessitating embedding.
As noted in , using a shift function to traverse the entire evaluation domain introduced overhead.
This article will explore how overcomes these issues.
Background
Embedding Overhead
Embedding overhead refers to the inefficiency caused when representing values smaller than a given field F and filling the space difference with zero padding or similar techniques. This overhead often occurs during range checks or when binary operations are arithmetized.
Binary Field
Addition in a binary field operates as follows:
0+0≡0(mod2)0+1≡1(mod2)1+0≡1(mod2)1+1≡0(mod2)
This is equivalent to the XOR operation. Since x + (-x) = 0 defines -x as the negative of x, and 0 + 0 = 0 and 1 + 1 = 0, the negative of any element in a binary field is the element itself. In other words, subtraction is equivalent to addition:
Furthermore, a doubling operation such as 2⋅a always equals 0, since it becomes a multiple of 2. Multiplication in a binary field is therefore defined as follows:
For example, let’s represent 135 using a binary tower. Here, X2 corresponds to 24, X1 corresponds to 22, and X0 corresponds to 2:
135=X2X1X0+X1+X0+1∵135=24⋅22⋅2+22+2+1
By continually substituting Xi with Xi−1, we can transform it into the form of a multilinear polynomial:
hX1+l∈T2≅F24, where h,l∈T1≅F22=(h1X0+h2)X1+(l1X0+l2)=h1X1X0+h2X1+l1X0+l2, where hi,li∈T0≅F2
The equation above shows how we can transform a case of T2 to a multilinear polynomial, while the equation below shows how we can transform a case of T3 to a multilinear polynomial.
hX2+l∈T3≅F28, where h,l∈T2≅F24=(h1X1+h2)X2+l1X1+l2, where hi,li∈T1≅F22 if i≤2=((h3X0+h4)X1+h5X0+h6)X2+(l3X0+l4)X1+l5X0+l6=h3X2X1X0+h4X2X1+h5X2X0+h6X2+l3X1X0+l4X1+l5X0+l6,where hi,li∈T0≅F2 if i>2
This allows for the canonical mapping of bitstrings. (For example, in F7 the 3-bit space maps both 0 and 7 to 0.)
v=i=0∑2i−1bi⋅2i
Here, the set of bi are coefficients of a multilinear polynomial.
In the extension field, addition can be performed using XOR as in the binary field. However, for multiplication, the following steps must be used:
Multiply both sides (for i≥0):
(aXi+b)(cXi+d)=acXi2+(ad+bc)Xi+bd
Substitute Xi2 with Xi−1Xi+ac:
acXi2+(ad+bc)Xi+bd=(ad+bc+acXi−1)Xi+ac+bd
If i−1=0, stop. Otherwise, repeat the process for ac, ad, bc, and bd.
Note that the classic method for constructing extension fields is as follows:
F2128:=F2[X]/f(X), where f(X) is irreducible and degf(X)=128
In this extension field, addition is still computed using XOR. For multiplication, polynomials of degree 127 are multiplied and then divided by f(X).
Using Binary Towers for extension field operations provides an advantage over doing so with the classic method due to the following reasons:
In circuits, various data sizes (e.g., bits) can be represented efficiently with Binary Towers.
When expressing F216 using 16 instances of F2, efficient embedding methods (packing) are available. In contrast, classic methods polynomial and are often less efficient.
Multiplying an element of F216 with an element of F2128 is faster than multiplying two elements of F2128. The cost is 8⋅Θ(16log23). In contrast, classic methods require embedding F216 into F2128, making them slower.
Protocol Explanation
Block-level Encoding-based PCS
g(X0,…,Xl−1)∈Tι[X0,…,Xl−1]≤1 and evaluation point r=(r0,…,rl−1)∈Tτl, let t be an (m0=2l0)×(m1=2l1) matrix consisting of the coefficients in the multilinear Lagrange basis. (For simplicity, let ι=0,κ=4,τ=7. As aforementioned in the Binary Towers section, Tι=F2,Tκ=F216,Tτ=F2128. The evaluation g(r) can then be computed as follows:
Here, l=(l0=7)+l1, and ti∈(Tιm1)m0 is the i-th row of t.
From the m0×m1 matrix t, pack each row of length m1 into a new m0×2κm1 matrix pack(t).
If the rate is ρ, encode each row of the m0×2κm1 matrix pack(t) into rows of length n=2κm1⋅ρ−1, forming an m0×n matrix u. The prover sends a Merkle hash-based commitment of the u matrix to the verifier. Note that while the polynomial represented by the t matrix is g, the polynomial represented by pack(t) is a new polynomial g′.
The verifier samples r=(r0,…,rl−1)←F2128l and sends it to the prover.
The prover computes s=g(r0,…,rl−1) and t′=⊗i=l1l−1(1−ri,ri)⋅(ti)i=0m0−1, and sends both to the verifier. Note that r is over F2128, t is over F2, and t′ is over F2128. As mentioned in Binary Towers, multiplication in this setting is performed efficiently.
The verifier samples ji←{0,…,n−1} for γ times, where γ is the repetition count, and sends J={j0,…,jγ−1} to the prover (Remember n=2κm1⋅ρ−1).
The verifier uses Merkle proofs to verify that {(ui,j)}j∈J is correct.
Finally, the verifier checks that packing and encoding were performed correctly:
. Given that ji ranges between 0 and n=16c⋅ρ−1, testing is performed at the block level in units of F216. This is referred to as block-level testing.
Pack Virtual Polynomial
Let us revisit the block-level encoding described earlier. In a Polynomial IOP, the polynomial that the verifier requests the prover to open is g∈F2[X0,…,Xl−1]≤1. However, the polynomial that the prover actually commits to and opens is g′, which is derived by packing g into g′∈F2κ[X0,…,Xl−κ−1]≤1. The packing functionpackκ is defined as follows:
Here, g~ is the multilinear extension (MLE) of g, and βv represents the multilinear Lagrange basis. For example, when κ=2 and l=4, the equation becomes:
By applying a sumcheck protocol, this is replaced with an evaluation claim:
g~(r∣∣r′)⋅β~(r)=?s
Thus, we obtain g~(r∣∣r′), allowing us to query the original polynomial g through its virtual polynomialg′.
Shift Virtual Polynomial
As introduced in the previous article on HyperPlonk, the shift polynomial helps traverse the evaluation domain. For b∈{0,…,l−1} and shift offset o∈Bb, the shift mappingsb,o:Bl→Bl is defined such that, for u:=sb,o(v), the condition {u}≡{v}+{o}(mod2b) holds. For example, shifting by o={1,0}∈B2 (a single step) works as follows:
This allows us to query the shift polynomialshiftb,o(g) using the virtual polynomialg and s-indb,o.
Conclusion
While this article does not cover all the details, it addresses the key aspects of utilizing the binary field. To enable RS encoding using the binary field, we applied a modified version of Brakedown with packing and used the shift operator to traverse the domain efficiently.
The image above is captured from the Binius paper. The first row, Hyrax, shows the PCS used in Lasso, while the second and third rows show FRI, the PCS used in Plonky3. In Plonky3, even when data smaller than 31 bits is used, there is no speed difference, so the data is omitted for efficiency. In the case of Binius, while it is slower than Plonky3 Keccak-256 for 32-bit data, it significantly reduces the commitment time for 1-bit data.
The image above is captured from the Binius paper. It turns out Binius is almost 2 times faster to commit, around 2~3 times faster to prove, and around 5~6 times faster to verify.
A binary tower is a hierarchical structure of finite fields, where each field in the hierarchy is an extension of the previous field, and the base field is F2. In , it is stated that an extension field can be constructed as
Notably, using the , the number of multiplications can be reduced from 4 to 3:
According to , multiplication with this binary tower structure can be computed in Θ((2i)log23).
In Brakedown, instead of Reed-Solomon (RS) codes, linear codes with linear encoding time are used. In comparison, Binius adopts the overall style of Brakedown while leveraging RS codes with the , which states that the maximum distance for [n,k,d]-codes is d≤n−k+1. Furthermore, instead of using for RS encoding in a binary field, additive FFT () is used; however, the size of the alphabet must be at least n, but the size of the alphabet F2 is far smaller than n. To address this, we construct F216 using, for example, 16 instances of F2. This method is called packing. Since the encoding is performed in blocks of F216, this is referred to as block-level encoding.