Binius

This article aims to intuitively explain the objectives and process of the Binius protocol.

Introduction

In recent STARKs, there has been a trend of using smaller fields to achieve faster performance, such as Goldilocks (64-bit) and smaller 32-bit fields like Baby Bear and Mersenne 31. Naturally, this leads to the question: Can we use the smallest field, the binary field?

Given that computers are optimized for bit operations, using a binary field could promise better performance. Additionally, if we can express constraints efficiently using the binary field, this would allow us to handle binary operations in commonly used hash functions like Keccak more effectively.

While the STARK paper attempted to use binary fields, it faced challenges:

  1. The binary field was too small to serve as an "alphabet" in Reed-Solomon (RS) codes compared to the block length, necessitating embedding.

  2. As noted in HyperPlonk, using a shift function to traverse the entire evaluation domain introduced overhead.

This article will explore how Binius overcomes these issues.

Background

Embedding Overhead

Embedding overhead refers to the inefficiency caused when representing values smaller than a given field F\mathbb{F} and filling the space difference with zero padding or similar techniques. This overhead often occurs during range checks or when binary operations are arithmetized. For example, in the Keccak-256 circuit, out of 2,531 columns, 2,328 columns each represent a single bit, but since each column can hold up to 64 bits, there is a massive embedding overhead.

Binary Field

Addition in a binary field operates as follows:

0+00(mod2)0+11(mod2)1+01(mod2)1+10(mod2)0 + 0 \equiv 0 \pmod 2 \\ 0 + 1 \equiv 1 \pmod 2 \\ 1 + 0 \equiv 1 \pmod 2 \\ 1 + 1 \equiv 0 \pmod 2 \\

This is equivalent to the XOR operation. Since x + (-x) = 0 defines -x as the negative of x, and 0 + 0 = 0 and 1 + 1 = 0, the negative of any element in a binary field is the element itself. In other words, subtraction is equivalent to addition:

00=0+00(mod2)01=0+11(mod2)10=1+01(mod2)11=1+10(mod2)0 - 0 = 0 + 0 \equiv 0 \pmod 2 \\ 0 - 1 = 0 + 1 \equiv 1 \pmod 2 \\ 1 - 0 = 1 + 0 \equiv 1 \pmod 2 \\ 1 - 1 = 1 + 1 \equiv 0 \pmod 2

Furthermore, a doubling operation such as 2a2 \cdot a always equals 0, since it becomes a multiple of 2. Multiplication in a binary field is therefore defined as follows:

000(mod2)010(mod2)100(mod2)111(mod2)0 \cdot 0 \equiv 0 \pmod 2 \\ 0 \cdot 1 \equiv 0 \pmod 2 \\ 1 \cdot 0 \equiv 0 \pmod 2 \\ 1 \cdot 1 \equiv 1 \pmod 2

This is equivalent to the AND operation.

Binary Tower

A binary tower is a hierarchical structure of finite fields, where each field in the hierarchy is an extension of the previous field, and the base field is F2\mathbb{F}_2​. In An Iterated Quadratic Extension of GF(2), it is stated that an extension field can be constructed as

T0:=F2T1:=T0[X0]/(X02+X0+1)F22T2:=T1[X1]/(X12+X1X0+1)F24T3:=T2[X2]/(X22+X2X1+1)F28Tι:=Tι1[Xι1]/(Xι12+Xι1Xι2+1)F22ιT_0 := \mathbb{F}_2 \\T_1 := T_0[X_0] / (X_0^2 + X_0 + 1) \cong \mathbb{F}_{2^2}\\T_2 := T_1[X_1] / (X_1^2 + X_1 X_0 + 1) \cong \mathbb{F}_{2^4} \\T_3 := T_2[X_2] / (X_2^2 + X_2 X_1 + 1) \cong \mathbb{F}_{2^8}\\ \cdots \\T_\iota := T_{\iota - 1}[X_{\iota - 1}] / (X_{\iota-1}^2 + X_{\iota-1} X_{\iota - 2} + 1) \cong \mathbb{F}_{2^{2^\iota}}

This can be illustrated as follows:

For example, let’s represent 135 using a binary tower. Here, X2X_2 corresponds to 242^4, X1X_1 ​corresponds to 222^2, and X0X_0​ corresponds to 22:

135=X2X1X0+X1+X0+1135=24222+22+2+1135 =X_2X_1X_0 + X_1 + X_0 + 1 \because 135 = 2^4\cdot2^2\cdot2 + 2^2 + 2 +1 \\

By continually substituting XiX_i​ with Xi1X_{i-1}​, we can transform it into the form of a multilinear polynomial:

hX1+lT2F24, where h,lT1F22=(h1X0+h2)X1+(l1X0+l2)=h1X1X0+h2X1+l1X0+l2, where hi,liT0F2\begin{align*} & h X_1 + l \in T_2 \cong \mathbb{F}_{2^4} \text{, where }h,l \in T_1 \cong \mathbb{F}_{2^2} \\ &= (h_1X_0 + h_2)X_1 + (l_1X_0 + l_2) \\ &= h_1X_1X_0 + h_2X_1 +l_1X_0 + l_2\text{, where } h_i, l_i \in T_0 \cong \mathbb{F}_2 \\ \end{align*}

The equation above shows how we can transform a case of T2T_2 to a multilinear polynomial, while the equation below shows how we can transform a case of T3T_3 to a multilinear polynomial.

hX2+lT3F28, where h,lT2F24=(h1X1+h2)X2+l1X1+l2, where hi,liT1F22 if i2=((h3X0+h4)X1+h5X0+h6)X2+(l3X0+l4)X1+l5X0+l6=h3X2X1X0+h4X2X1+h5X2X0+h6X2+l3X1X0+l4X1+l5X0+l6,where hi,liT0F2 if i>2\begin{align*} & hX_2 + l \in T_3 \cong \mathbb{F}_{2^8} \text{, where } h, l \in T_2 \cong \mathbb{F}_{2^4} \\ &= (h_1X_1 + h_2)X_2 + l_1X_1 + l_2 \text{, where } h_i, l_i \in T_1 \cong \mathbb{F}_{2^2} \text{ if } i \le 2 \\ &= ((h_3X_0 + h_4)X_1 + h_5X_0 + h_6)X_2 + (l_3X_0 + l_4)X_1 + l_5X_0 + l_6 \\ &= h_3X_2X_1X_0 + h_4X_2X_1 + h_5X_2X_0+h_6X_2+l_3X_1X_0 + l_4X_1 + l_5X_0 + l_6 \text{,} \\ &\text{where } h_i, l_i \in T_0 \cong \mathbb{F}_2 \text{ if } i > 2 \end{align*}

This allows for the canonical mapping of bitstrings. (For example, in F7\mathbb{F}_7 the 3-bit space maps both 0 and 7 to 0.)

v=i=02i1bi2iv = \sum_{i = 0}^{2^i - 1} b_i \cdot 2^i

Here, the set of bib_i are coefficients of a multilinear polynomial.

In the extension field, addition can be performed using XOR as in the binary field. However, for multiplication, the following steps must be used:

  1. Multiply both sides (for i0i \geq 0):

(aXi+b)(cXi+d)=acXi2+(ad+bc)Xi+bd(aX_i + b)(cX_i + d) = ac{X_i}^2 + (ad + bc)X_i + bd
  1. Substitute Xi2{X_i}^2 with Xi1Xi+acX_{i-1}X_i + ac:

acXi2+(ad+bc)Xi+bd=(ad+bc+acXi1)Xi+ac+bdac{X_i}^2 + (ad + bc)X_i + bd = (ad + bc + acX_{i-1})X_i + ac + bd
  1. If i1=0i - 1 = 0, stop. Otherwise, repeat the process for acac, adad, bcbc, and bdbd.

Notably, using the Karatsuba algorithm, the number of multiplications can be reduced from 4 to 3:

(aXi+b)(cXi+d)=((a+b)(c+d)acbd+acXi1)Xi+ac+bd(aX_i + b)(cX_i + d) = ((a + b)(c + d) - ac - bd + acX_{i-1})X_i + ac + bd

According to On efficient inversion in tower fields of characteristic two, multiplication with this binary tower structure can be computed in Θ((2i)log23)\Theta((2^i)^{log_2 3}).

Note that the classic method for constructing extension fields is as follows:

F2128:=F2[X]/f(X), where f(X) is irreducible and degf(X)=128\mathbb{F}_{2^{128}} := \mathbb{F}_2[X] / f(X)\text{, where } f(X) \text{ is irreducible and } \deg f(X) = 128

In this extension field, addition is still computed using XOR. For multiplication, polynomials of degree 127 are multiplied and then divided by f(X)f(X).

Using Binary Towers for extension field operations provides an advantage over doing so with the classic method due to the following reasons:

  • In circuits, various data sizes (e.g., bits) can be represented efficiently with Binary Towers.

  • When expressing F216\mathbb{F}_{2^{16}} using 16 instances of F2\mathbb{F}_2​, efficient embedding methods (packing) are available. In contrast, classic methods polynomial and are often less efficient.

  • Multiplying an element of F216\mathbb{F}_{2^{16}} with an element of F2128\mathbb{F}_{2^{128}} is faster than multiplying two elements of F2128\mathbb{F}_{2^{128}}. The cost is 8Θ(16log23)8 \cdot \Theta(16^{\log_2 3}). In contrast, classic methods require embedding F216\mathbb{F}_{2^{16}}​ into F2128\mathbb{F}_{2^{128}}​, making them slower.

Protocol Explanation

Block-level Encoding-based PCS

In Brakedown, instead of Reed-Solomon (RS) codes, linear codes with linear encoding time are used. In comparison, Binius adopts the overall style of Brakedown while leveraging RS codes with the Singleton bound, which states that the maximum distance for [n,k,d][n, k, d]-codes is dnk+1d \leq n - k + 1. Furthermore, instead of using radix-2 FFT for RS encoding in a binary field, additive FFT (paper) is used; however, the size of the alphabet must be at least nn, but the size of the alphabet F2\mathbb{F}_2​ is far smaller than nn. To address this, we construct F216\mathbb{F}_{2^{16}}​ using, for example, 16 instances of F2\mathbb{F}_2​. This method is called packing. Since the encoding is performed in blocks of F216\mathbb{F}_{2^{16}}​, this is referred to as block-level encoding.

g(X0,,Xl1)Tι[X0,,Xl1]1g(X_0, \dots, X_{l-1}) \in T_{\iota}[X_0, \dots, X_{l-1}]^{\le1} and evaluation point r=(r0,,rl1)Tτl\bm{r} = (r_0, \dots, r_{l-1}) \in T_\tau^l, let tt be an (m0=2l0)×(m1=2l1)(m_0 = 2^{l_0}) \times (m_1 = 2^{l_1}) matrix consisting of the coefficients in the multilinear Lagrange basis. (For simplicity, let ι=0,κ=4,τ=7\iota = 0, \kappa = 4, \tau = 7. As aforementioned in the Binary Towers section, Tι=F2,Tκ=F216,Tτ=F2128 T_\iota = \mathbb{F}_{2}, T_\kappa = \mathbb{F}_{2^{16}}, T_\tau = \mathbb{F}_{2^{128}}. The evaluation g(r)g(\bm{r}) can then be computed as follows:

g(r)=i=l1l1(1ri,ri)(ti)i=0m01i=0l11(1ri,ri)g(\bm{r}) = \otimes_{i = l_1}^{l - 1}(1 - r_i, r_i)\cdot (t_i)_{i=0}^{m_0 - 1} \cdot \otimes_{i = 0}^{l_1 - 1}(1 - r_i, r_i)

Here, l=(l0=7)+l1l = (l_0 = 7) + l_1, and ti(Tιm1)m0\bm{t_i} \in (T_\iota^{m_1})^{m_0} is the ii-th row of t\bm{t}.

  1. From the m0×m1m_0 \times m_1 ​matrix t\bm{t}, pack each row of length m1m_1 ​into a new m0×m12κm_0 \times \frac{m_1}{2^\kappa} matrix pack(t)\mathsf{pack}(\bm{t}).

  2. If the rate is ρ\rho, encode each row of the m0×m12κm_0 \times \frac{m_1}{2^\kappa} matrix pack(t)\mathsf{pack}(\bm{t}) into rows of length n=m12κρ1n = \frac{m_1}{2^\kappa}\cdot\rho^{-1}, forming an m0×nm_0 \times n matrix uu. The prover sends a Merkle hash-based commitment of the uu matrix to the verifier. Note that while the polynomial represented by the t\bm{t} matrix is gg, the polynomial represented by pack(t)\mathsf{pack}(\bm{t}) is a new polynomial gg'.

  3. The verifier samples r=(r0,,rl1)F2128l\bm{r} = (r_0, \dots,r_{l-1}) \leftarrow \mathbb{F_{2^{128}}^l}​ and sends it to the prover.

  4. The prover computes s=g(r0,,rl1)s = g(r_0, \dots, r_{l-1}) and t=i=l1l1(1ri,ri)(ti)i=0m01\bm{t'} = \otimes_{i = l_1}^{l-1}(1 - r_i, r_i)\cdot (\bm{t_i})_{i = 0}^{m_0 - 1}, and sends both to the verifier. Note that r\bm{r} is over F2128\mathbb{F}_{2^{128}}, t\bm{t} is over F2\mathbb{F}_2​, and t\bm{t'} is over F2128\mathbb{F}_{2^{128}}. As mentioned in Binary Towers, multiplication in this setting is performed efficiently.

  5. The verifier samples ji{0,,n1}j_i \leftarrow \{0, \dots, n - 1\} for γ\gamma times, where γ\gamma is the repetition count, and sends J={j0,,jγ1}\bm{J} = \{j_0, \dots, j_{\gamma - 1}\} to the prover (Remember n=m12κρ1n = \frac{m_1}{2^\kappa}\cdot\rho^{-1}).

  6. The verifier uses Merkle proofs to verify that {(ui,j)}jJ\big\{(u_{i, j})\big\}_{j \in \bm{J}}​ is correct.

  7. Finally, the verifier checks that packing and encoding were performed correctly: . Given that jij_i​ ranges between 00 and n=c16ρ1n = \frac{c}{16} \cdot \rho^{-1}, testing is performed at the block level in units of F216\mathbb{F}_{2^{16}}​. This is referred to as block-level testing.

Pack Virtual Polynomial

Let us revisit the block-level encoding described earlier. In a Polynomial IOP, the polynomial that the verifier requests the prover to open is gF2[X0,,Xl1]1g \in \mathbb{F}_2[X_0, \dots, X_{l-1}]^{\leq 1}. However, the polynomial that the prover actually commits to and opens is gg', which is derived by packing gg into gF2κ[X0,,Xlκ1]1g' \in \mathbb{F}_{2^{\kappa}}[X_0, \dots, X_{l - \kappa - 1}]^{\leq 1}. The packing function pack~κ\widetilde{\mathsf{pack}}_{\kappa}​ is defined as follows:

pack~κ(g)(X0,,Xlκ1):=vBκg~(v0,,vκ1,X0,,Xlκ1)βv\widetilde{\mathsf{pack}}_{\kappa}(g)(X_0, \dots, X_{l - \kappa - 1}) :=\sum_{v\in B_\kappa}\tilde{g}(v_0, \dots, v_{\kappa - 1}, X_0, \dots, X_{l - \kappa - 1}) \cdot \beta_v

Here, g~\tilde{g} is the multilinear extension (MLE) of gg, and βv\beta_v represents the multilinear Lagrange basis. For example, when κ=2\kappa = 2 and l=4l = 4, the equation becomes:

pack~2(g)(X0,X1)=g~(1,1,X0,X1)8+g~(0,1,X0,X1)4+g~(1,0,X0,X1)2+g~(0,0,X0,X1)1\begin{align*} \widetilde{\mathsf{pack}}_2(g)(X_0, X_1) =&\tilde{g}(1, 1, X_0, X_1)\cdot 8 + \tilde{g}(0, 1, X_0, X_1)\cdot 4 + \\ &\tilde{g}(1, 0, X_0, X_1)\cdot 2 + \tilde{g}(0, 0, X_0, X_1)\cdot 1 \end{align*}

Using a sumcheck protocol, this computation can be replaced with an evaluation claim such that:

pack~κ(g)(r)=?s\widetilde{\mathsf{pack}}_\kappa(g)(\bm{r'}) \stackrel{?}= s'

Replacing βv\beta_v with β~Fκ[X0,,Xκ1]1\tilde{\beta} \in \mathbb{F}_\kappa[X_0, \dots, X_{\kappa - 1}]^{\le 1}, the following equation is defined:

pack~κ(g)(r)(Y0,,Yκ1):=g~(Y0,,Yκ1,r0,,rlκ1)β~(Y0,,Yκ1)\widetilde{\mathsf{pack}}_{\kappa}(g)(\bm{r'})(Y_0, \dots, Y_{\kappa - 1}) := \tilde{g}(Y_0, \dots, Y_{\kappa - 1}, {r'}_0, \dots, {r'}_{l - \kappa - 1}) \cdot \tilde{\beta}(Y_0, \dots, Y_{\kappa - 1})

The following sum holds:

YBκpack~κ(g)(r)(Y0,,Yκ1)=s\sum_{Y \in B_{ \kappa}}\widetilde{\mathsf{pack}}_{\kappa}(g)(\bm{r'})(Y_0, \dots, Y_{\kappa - 1}) = s'

For example, when κ=2\kappa = 2:

pack~2(g)(r)=s=g~(1,1,r0,r1)Y1Y0+g~(0,1,r0,r1)Y1+g~(1,0,r0,r1)Y0+g~(0,0,r0,r1)1\begin{align*} \widetilde{\mathsf{pack}}_2(g)(\bm{r'}) = s' =&\tilde{g}(1, 1, {r'}_0, {r'}_1)\cdot Y_1Y_0 + \tilde{g}(0, 1, {r'}_0, {r'}_1)\cdot Y_1 + \\ &\tilde{g}(1, 0, {r'}_0, {r'}_1)\cdot Y_0 + \tilde{g}(0, 0, {r'}_0, {r'}_1)\cdot 1 \end{align*}

By applying a sumcheck protocol, this is replaced with an evaluation claim:

g~(rr)β~(r)=?s\tilde{g}(\bm{r} || \bm{r'}) \cdot \tilde{\beta}(\bm{r}) \stackrel{?}= s

Thus, we obtain g~(rr)\tilde{g}(\bm{r} || \bm{r'}), allowing us to query the original polynomial gg through its virtual polynomial gg'.

Shift Virtual Polynomial

As introduced in the previous article on HyperPlonk, the shift polynomial helps traverse the evaluation domain. For b{0,,l1}\bm{b} \in \{0, \dots, l-1\} and shift offset oBbo \in B_b​, the shift mapping sb,o:BlBls_{b, o}: B_l \rightarrow B_l​ is defined such that, for u:=sb,o(v)u := s_{b, o}(v), the condition {u}{v}+{o}(mod2b)\{u\} \equiv \{v\} + \{o\} \pmod{2^b} holds. For example, shifting by o={1,0}B2o = \{1, 0\} \in B_2​ (a single step) works as follows:

sb,o(0,0,0)=(1,0,0)sb,o(1,0,0)=(0,1,0)sb,o(0,1,0)=(1,1,0)sb,o(1,1,0)=(0,0,0)sb,o(0,0,1)=(1,0,1)sb,o(1,0,1)=(0,1,1)sb,o(0,1,1)=(1,1,1)sb,o(1,1,1)=(0,0,1)s_{b, o}(0, 0, 0) = (1, 0, 0) \\ s_{b, o}(1, 0, 0) = (0, 1, 0) \\ s_{b, o}(0, 1, 0) = (1, 1, 0) \\ s_{b, o}(1, 1, 0) = (0, 0, 0) \\ s_{b, o}(0, 0, 1) = (1, 0, 1) \\ s_{b, o}(1, 0, 1) = (0, 1, 1) \\ s_{b, o}(0, 1, 1) = (1, 1, 1) \\ s_{b, o}(1, 1, 1) = (0, 0, 1) \\

The shift operator is defined as:

shiftb,o(g):=gsb,o\text{shift}_{b, o}(g) := g\circ s_{b,o}

The shift indicator s-indb,o\text{s-ind}_{b, o}​ can be expressed as follows: (Refer to the section 4.3. in the paper to see how it is algebraically expressed)

s-indb,o(x,y){1if {y}?{x}+{o}(mod2b)0otherwise\text{s-ind}_{b, o}(x, y) \rightarrow \begin{cases} 1& \text{if } \{y\} \stackrel{?}\equiv \{x\} + \{o\} \pmod{2^b}\\ 0 &\text{otherwise} \end{cases}

Thus, the shift polynomial shift~b,o(g)\widetilde{\mathsf{shift}}_{b, o}(g) is defined as:

shift~b,o(g)(v):=uBbg~(u0,,ub1,vb,,vl1)s-ind~b,o(u0,,ub1,v0,,vb1)\widetilde{\text{shift}}_{b, o}(g)(\bm{v}) := \sum_{u \in B_b}\tilde{g}(u_0, \dots, u_{b-1}, v_b, \dots, v_{l-1})\cdot \widetilde{\text{s-ind}}_{b, o}(u_0, \dots, u_{b-1}, v_0, \dots, v_{b-1})

For the earlier example:

shift~b,o(g)(0,0,0)=g~(1,0,0)shift~b,o(g)(1,0,0)=g~(0,1,0)shift~b,o(g)(0,1,0)=g~(1,1,0)shift~b,o(g)(1,1,0)=g~(0,0,0)shift~b,o(g)(0,0,1)=g~(1,0,1)shift~b,o(g)(1,0,1)=g~(0,1,1)shift~b,o(g)(0,1,1)=g~(1,1,1)shift~b,o(g)(1,1,1)=g~(0,0,1)\widetilde{\text{shift}}_{b, o}(g)(0, 0, 0) = \tilde{g}(1, 0, 0) \\ \widetilde{\text{shift}}_{b, o}(g)(1, 0, 0) = \tilde{g}(0, 1, 0) \\ \widetilde{\text{shift}}_{b, o}(g)(0, 1, 0) = \tilde{g}(1, 1, 0) \\ \widetilde{\text{shift}}_{b, o}(g)(1, 1, 0) = \tilde{g}(0, 0, 0) \\ \widetilde{\text{shift}}_{b, o}(g)(0, 0, 1) = \tilde{g}(1, 0, 1) \\ \widetilde{\text{shift}}_{b, o}(g)(1, 0, 1) = \tilde{g}(0, 1, 1) \\ \widetilde{\text{shift}}_{b, o}(g)(0, 1, 1) = \tilde{g}(1, 1, 1) \\ \widetilde{\text{shift}}_{b, o}(g)(1, 1, 1) = \tilde{g}(0, 0, 1) \\

By applying the sumcheck protocol, we get the evaluation claim:

shift~b,o(g)(r)=?s\widetilde{\text{shift}}_{b,o}(g)(\bm{r'}) \stackrel{?}= s'

This leads to:

shift~b,o(g)(r)(Y0,,Yb1):=g~(Y0,,Yb1,rb,,rl1)s-ind~b,o(Y0,,Yb1,r0,,rb1)\widetilde{\text{shift}}_{b,o}(g)(\bm{r'})(Y_0, \dots, Y_{b- 1}) := \tilde{g}(Y_0, \dots, Y_{b- 1}, {r'}_b, \dots, {r'}_{l - 1}) \cdot \widetilde{\text{s-ind}}_{b,o}(Y_0, \dots, Y_{b - 1}, {r'}_0, \dots, {r'}_{b-1})

The following sum holds:

YBκshift~b,o(g)(r)(Y0,,Yb1)=s\sum_{Y \in B_{ \kappa}}\widetilde{\text{shift}}_{b,o}(g)(\bm{r'})(Y_0, \dots, Y_{b - 1}) = s'

This allows us to query the shift polynomial shiftb,o(g)\text{shift}_{b, o}(g) using the virtual polynomial gg and s-indb,o\text{s-ind}_{b,o}.

Conclusion

While this article does not cover all the details, it addresses the key aspects of utilizing the binary field. To enable RS encoding using the binary field, we applied a modified version of Brakedown with packing and used the shift operator to traverse the domain efficiently.

The image above is captured from the Binius paper. The first row, Hyrax, shows the PCS used in Lasso, while the second and third rows show FRI, the PCS used in Plonky3. In Plonky3, even when data smaller than 31 bits is used, there is no speed difference, so the data is omitted for efficiency. In the case of Binius, while it is slower than Plonky3 Keccak-256 for 32-bit data, it significantly reduces the commitment time for 1-bit data.

The image above is captured from the Binius paper. It turns out Binius is almost 2 times faster to commit, around 2~3 times faster to prove, and around 5~6 times faster to verify.

Written by ryan Kim from A41

Last updated