Cairo

Presentation: https://youtu.be/zNWatOe9Hmo

Introduction

Let us assume we are designing a zkDSL. There are two major architectural approaches to consider.


1. ASIC Approach

This is the Circom-style approach, where a program is taken as input and compiled into a circuit. The resulting circuit can prove only the execution of that specific program, making it essentially an application-specific circuit (ASIC).

In other words, whenever the program changes, a new circuit must be generated.


2. CPU (von Neumann) Approach

This is the Cairo-style approach, where a single universal circuit models a CPU architecture. In this model, a program is loaded into memory, and the program counter (PC) fetches, decodes, and executes instructions sequentially.

Thus, the same circuit can verify the execution of different programs.

The name Cairo originates from CPU AIR


When designing a DSL for zero-knowledge proving, the two most critical factors are programmability and efficiency.

Let us now explore why Cairo chose the CPU-based (von Neumann) approach to achieve this balance.

Description

ASIC vs CPU Approach

ASIC Approach

For example, to prove that xyx \ne y, the statement can be arithmetized as follows:

a:(xy)a=1\exists a : (x - y) \cdot a = 1

In this ASIC approach, a circuit is generated specifically for a single program, which makes it highly efficient for that particular computation.

However, inefficiencies arise when handling conditional statements and loops.


Conditional Statement

A conditional statement can be expressed using a selector variable ss:

s×arithtrue+(1s)×arithfalses \times \mathsf{arith_{true}} + (1 - s) \times \mathsf{arith_{false}}

At runtime, only one branch is actually executed, but in the circuit, both branches must be represented as algebraic constraints.


Loop Statement

A loop is typically unrolled up to a predefined maximum bound BB, and the loop body is repeated BB times during arithmetization.

Even if the loop terminates earlier at runtime, all BB iterations are still represented as constraints — leading to redundant computations and reduced efficiency.


Summary

As a result, the ASIC approach has the following limitations:

  • Inefficient: unnecessary constraints arise from conditionals and loops

  • Non-reusable: every time the program changes, the circuit must be recompiled


CPU Approach

In contrast, the CPU approach introduces some overhead for instruction fetching and decoding, but Cairo minimizes this overhead through several architectural design choices.


1. Metrics-based Instruction Set Design

Cairo designs its instruction set to minimize the number of trace cells used.

  • For example, suppose we define two instruction sets: ISAA\mathsf{ISA_A}, which includes a specific instruction instr\mathsf{instr}, and ISAB\mathsf{ISA_B}, which does not. Let nin_i denote the number of trace cells per cycle for instruction set ISAi\mathsf{ISA}_i , and kik_i denote the total number of cycles required to execute a program PP. Then, the total number of trace cells produced during execution is given by ni×kin_i \times k_i.

  • In general, Cairo selects an instruction set such that:

kA×nA<kB×nBk_A \times n_A < k_B \times n_B

This ensures that, on average, the overall proof cost is minimized. In other words, the instruction set is chosen to optimize average proving efficiency.


2. Low-degree AIR Constraint

Cairo’s AIR (Algebraic Intermediate Representation) constraints are designed to remain mostly quadratic (degree 2). When a constraint of degree 3 is unavoidable, additional trace cells are introduced to reduce it back to degree 2 through algebraic transformation.

This approach ensures that the STARK proof system operates efficiently, maintaining low-degree polynomial constraints throughout the entire trace.


3. Builtins

If new instructions are added but rarely used, they introduce unnecessary overhead. Conversely, implementing every primitive operation purely as Cairo instructions would also be inefficient.

To resolve this trade-off, Cairo introduces the concept of builtins.

Each builtin has its own independent memory segment, and the Cairo program simply writes values into this segment to trigger the corresponding operation automatically under the AIR constraints.

For example, using a range-check builtin, one can efficiently verify that a value lies within a specific range — without introducing a new instruction.

Thus, Cairo’s builtins enable operation-level modularity that allows efficient computation through memory-based invocation rather than instruction expansion.


Nondeterminism

Consider an NP-complete problem, such as SAT, and consider the following two algorithms:

Algorithm A takes a SAT instance and a candidate assignment as input, and returns True if the assignment satisfies the formula.

Algorithm B takes a SAT instance and iterates over all possible assignments. If it finds one that satisfies the formula, it returns True; otherwise, it returns False.

If a prover wants to convince a verifier that a certain SAT formula is satisfiable, either algorithm can be used — since if either one returns True, the formula is indeed satisfiable.

Of course, Algorithm A is much more efficient, so in practice, this approach is preferred for proof construction.

This style of proving — where the prover demonstrates that Algorithm A would return True — is called nondeterministic programming.

In other words, instead of showing the entire computation, the prover can guess a potential solution and only prove that the guess is correct. This allows the proof to be much shorter and more efficient.

Cairo supports this style of nondeterministic proving through a mechanism called “Hint.”

For example, if the computation involves y=xy = \sqrt{x}, the prover does not need to show the full algorithm for computing the square root. Instead, the prover can guess a value yy and then prove that y2=xy^2 = x.

In other words, Cairo focuses on proving the correctness of results, not the process of computation, thereby significantly reducing the overall proving cost.


Register

In a typical physical computing system, memory access is expensive, so general-purpose registers are used to store frequently accessed values.

However, in Cairo, memory access is extremely cheap (See why it's cheap later), and therefore no general-purpose registers are needed. All values are accessed directly through memory cells.

Cairo provides only three address registers, which serve as pointers:

  • pc: Program Counter — points to the current instruction.

  • ap: Allocation Pointer — used to allocate new memory cells during execution.

  • fp: Frame Pointer — used to access function arguments and local variables. At the start of a function, fp\mathsf{fp} is initialized to the same value as ap\mathsf{ap}. During function execution, it remains constant, and when the function returns, it is reset to its previous value.


Instruction

Cairo’s instructions operate according to the rules described in Section 4.5. An instruction occupies one word if it has no immediate value, and two words if it includes an immediate value.

To encode an instruction, the following constraint must hold: (I'll skip the definitions of each term)

inst=off~dst+216off~op0+232off~op1+248f~0\mathsf{inst} = \widetilde{off}_{dst} + 2^{16} \cdot \widetilde{off}_{op0} + 2^{32} \cdot \widetilde{off}_{op1} + 2^{48} \cdot \widetilde{f}_0

Because each term must lie within the range [0,216)[0, 2^{16}) a permutation range-check (see Section 9.9) is required to enforce this constraint.

For this representation to be valid, Cairo also requires that:

char(F)>263\mathrm{char}(\mathbb{F}) > 2^{63}

This ensures that all 63-bit instruction words fit within the field F\mathbb{F} without wrap-around or overflow.


Cairo Machine

The Cairo Machine is not a general-purpose computer designed to perform arbitrary computation. Instead, its purpose is to verify the correctness of a computation.

Cairo defines two kinds of machines:

  1. Deterministic Cairo Machine

  2. Non-deterministic Cairo Machine

Here, let FP=Z/P\mathbb{F}_P = \mathbb{Z}/P, and F\mathbb{F} is an extension field of FP \mathbb{F}_P.


Deterministic Cairo Machine

A Deterministic Cairo Machine takes the following inputs:

  1. The number of steps TNT \in \mathbb{N}

  2. A memory function m:FFm: \mathbb{F} \to \mathbb{F}

  3. A sequence of states S=(S0,S1,,ST)S = (S_0, S_1, \dots, S_T), where each state is defined as Si=(pci,api,fpi)F3S_i = (\mathsf{pc}_i, \mathsf{ap}_i, \mathsf{fp}_i) \in \mathbb{F}^3

If, for every ii, the transition SiSi+1S_i \to S_{i+1} is valid according to Cairo’s transition rules, the machine outputs “accept”; otherwise, it outputs “reject.”


Example

Suppose the following input is given:

  • T=10T = 10

  • Initial state S0=(0,5,5)S_0 = (0, 5, 5)

  • Initial memory function mm:

i
m(i)

0

0x48307ffe7fff8000

1

0x010780017fff7fff

2

-1

3

1

4

1


1. First Instruction Execution

m(pc0)=m(0)=0x48307ffe7fff8000m(\mathsf{pc}_0) = m(0) = \text{0x48307ffe7fff8000}

Decoded fields:

Field
Bit Range
Value

offdst\mathsf{off_{dst}}

0–16

0

offop0\mathsf{off_{op0}}

16–32

-1

offop1\mathsf{off_{op1}}

32–48

-2

dstreg\mathsf{dst_{reg}}

48–49

0

op0reg\mathsf{op0_{reg}}

49–50

0

op1src\mathsf{op1_{src}}

50–53

4

reslogic\mathsf{res_{logic}}

53–55

1

pcupdate\mathsf{pc_{update}}

55–58

0

apupdate\mathsf{ap_{update}}

58–60

2

opcode\mathsf{opcode}

60–63

4

According to Section 4.5 transition rules:

  • op0=m(ap0+offop0)=m(ap01)=m(4)\mathsf{op0} = m(\mathsf{ap_0 + off_{op0}}) = m(\mathsf{ap_0} - 1) = m(4)

  • op1=m(ap1+offop1)=m(ap12)=m(3)\mathsf{op1} = m(\mathsf{ap_1 + off_{op1}}) = m(\mathsf{ap_1} - 2) = m(3)

  • res=op0+op1=m(4)+m(3)\mathsf{res} = \mathsf{op0} + \mathsf{op1} = m(4) + m(3)

  • dst=m(ap0+offdst)=m(ap0+0)=m(5)\mathsf{dst} = m(\mathsf{ap_0 + off_{dst}}) = m(\mathsf{ap_0} + 0) = m(5)

  • assert(res=dst)\textcolor{red}{\mathsf{assert(res = dst)}}

  • pc1=pc0+1\mathsf{pc_1} = \mathsf{pc_0} + 1

  • ap1=ap0+1\mathsf{ap_1} = \mathsf{ap_0} + 1

  • fp1=fp0\mathsf{fp_1} = \mathsf{fp_0}

Hence, m(5)=m(4)+m(3)\textcolor{red}{m(5) = m(4) + m(3)}, and the state updates to S1=(1,6,5)S_1 = (1, 6, 5).


2. Second Instruction Execution

m(pc1)=m(1)=0x010780017fff7fffm(\mathsf{pc}_1) = m(1) = \text{0x010780017fff7fff}

Decoded fields:

Field
Bit Range
Value

offdst\mathsf{off_{dst}}

0–16

-1

offop0\mathsf{off_{op0}}

16–32

-1

offop1\mathsf{off_{op1}}

32–48

1

dstreg\mathsf{dst_{reg}}

48–49

1

op0reg\mathsf{op0_{reg}}

49–50

1

op1src\mathsf{op1_{src}}

50–53

1

reslogic\mathsf{res_{logic}}

53–55

0

pcupdate\mathsf{pc_{update}}

55–58

2

apupdate\mathsf{ap_{update}}

58–60

0

opcode\mathsf{opcode}

60–63

0

Transition rules:

  • op0=m(fp1+offop0)=m(fp11)=m(4)\mathsf{op0} = m(\mathsf{fp_1 + off_{op0}}) = m(\mathsf{fp_1} - 1) = m(4)

  • op1=m(pc1+offop1)=m(pc1+1)=m(2)\mathsf{op1} = m(\mathsf{pc_1 + off_{op1}}) = m(\mathsf{pc_1} + 1) = m(2)

  • res=op1=m(2)\mathsf{res} = \mathsf{op1} = m(2)

  • dst=m(fp1+offdst)=m(fp11)=m(2)\mathsf{dst} = m(\mathsf{fp_1 + off_{dst}}) = m(\mathsf{fp_1} - 1) = m(2)

  • pc2=pc1+res=pc1+m(2)\mathsf{pc_2} = \mathsf{pc_1} + \mathsf{res} = \mathsf{pc_1} + m(2)

  • ap2=ap1\mathsf{ap_2} = \mathsf{ap_1}

  • fp2=fp1\mathsf{fp_2} = \mathsf{fp_1}

Since m(2)=1m(2) = -1, we have pc2=0\mathsf{pc_2} = 0, which means the program counter loops back to the beginning.

Thus, this program repeatedly alternates between pc=0\mathsf{pc} = 0 and pc=1.\mathsf{pc} = 1.


3. Memory Configuration for “accept”

To be accepted, memory should be given as follows: (and it's (continuous) read-only memory)

i
m(i)

0

0x48307ffe7fff8000

1

0x010780017fff7fff

2

-1

3

1

4

1

5

2

6

3

7

5

8

8

9

13


Non-deterministic Cairo Machine

A Non-deterministic Cairo Machine takes the following inputs:

  1. The number of steps TNT \in \mathbb{N}

  2. A partial memory function m:AFm^*: A^* \to \mathbb{F}, where AFPA^* \subseteq \mathbb{F}_P

  3. Initial and final values of program counter and allocation pointer:

pcI,pcF,apI,apF\mathsf{pc_I}, \mathsf{pc_F}, \mathsf{ap_I}, \mathsf{ap_F}

(The initial frame pointer fp\mathsf{fp} is set to apI\mathsf{ap_I}.)


If we define the initial and final states as:

S0=(pcI,apI,apI),ST=(pcF,apF,)S_0 = (\mathsf{pc_I}, \mathsf{ap_I}, \mathsf{ap_I}), \quad S_T = (\mathsf{pc_F}, \mathsf{ap_F}, *)

then, if there exists a full memory extension m:FFm: \mathbb{F} \to \mathbb{F} of mm^* such that a valid transition sequence S0S1STS_0 \to S_1 \to \dots \to S_T exists, the machine accepts; otherwise, it rejects. (This is non-determinism!)


Here, mm^* represents the public memory, which includes:

  1. Program bytecode

    • m(progbase+i)=bim^*(\mathsf{prog_{base}} + i) = b_i for i[0,b)i \in [0, |b|)

    • pcI=progbase+progstart\mathsf{pc_I} = \mathsf{prog_{base}} + \mathsf{prog_{start}}

    • pcF=progbase+progend\mathsf{pc_F} = \mathsf{prog_{base}} + \mathsf{prog_{end}}

  2. Program input/output necessary for verification


A STARK prover then constructs a proof asserting that:

“The nondeterministic Cairo Machine accepts given the input (T,m,pcI,pcF,apI,apF)(T, m^*, \mathsf{pc_I}, \mathsf{pc_F}, \mathsf{ap_I}, \mathsf{ap_F}).”


Cairo Runner

The Cairo Runner executes a compiled Cairo program and produces the following outputs:

  • Inputs that cause the Non-deterministic Cairo Machine to accept: (T,m,pcI,pcF,apI,apF)(T, m^*, \mathsf{pc_I}, \mathsf{pc_F}, \mathsf{ap_I}, \mathsf{ap_F})

  • Inputs that cause the Deterministic Cairo Machine to accept: (T,m,S)(T, m, S)


While the Cairo Machine supports random-access memory by definition, an efficient Cairo AIR implementation requires that memory accesses be continuous in address order.

To achieve this, the Cairo Runner introduces the concept of relocatable memory segments.


Memory Segment Management

Whenever memory allocation is required during program execution, the Cairo Runner creates and assigns a new memory segment.

The size of each segment does not need to be specified in advance.

After program execution finishes, the Runner performs a relocation phase, where each segment is assigned a concrete base address so that the entire memory space can be linearized into one continuous address range.


Relocation Is Not Proven

The relocation process is not included in the proof. It is performed locally by the Cairo Runner and does not appear in the STARK proof itself.

In a read-write memory model, this could be problematic — a malicious prover could intentionally create overlapping segments, causing one segment’s write operation to inadvertently change the value in another segment.


Why This Is Safe in Cairo

In Cairo, memory is read-only (immutable). Once a value is written, it cannot be modified.

Therefore, even if two segments overlap, it does not matter which segment a value is read from — as long as the value itself is consistent.

In other words, as far as the Cairo execution model is concerned, overlapping segments and non-overlapping segments are indistinguishable.


Memory Layout

Function Call Stack

  • Function arguments provided by the caller are stored at memory addresses [fp3],[fp4],[\mathsf{fp} - 3], [\mathsf{fp} - 4], \dots

  • Pointer to the caller’s frame is stored at [fp2][\mathsf{fp} - 2]

  • Return address — the instruction to be executed after the function returns — is stored at [fp1][\mathsf{fp} - 1]

  • Local variables allocated by the function are stored at [fp],[fp+1],[\mathsf{fp}], [\mathsf{fp} + 1], \dots

Additionally, function return values are stored in memory at [ap1],[ap2],[\mathsf{ap} - 1], [\mathsf{ap} - 2], \dots, where ap\mathsf{ap} is the value of the allocation pointer at the end of the function.

The figure above illustrates the call stack for the following code:

f:
  call g
  ret
g:
  call h
  call h
  ret
h:
  ret

Nondeterministic Continuous Read-Only Random-Access Memory

Description

Cairo implements a Nondeterministic Continuous Read-Only Random-Access Memory model. In this model, memory is read-only — once written, values cannot be modified — and memory accesses must be contiguous.

What does “contiguous” mean? It means:

  1. If x<yx < y,

  2. and there is a memory access at xx,

  3. and there is a memory access at yy,

then there must also exist a memory access at every address aa such that x<a<yx < a < y.

In this model, the cost depends on the number of memory accesses, not on how much total memory is used. Therefore, developers do not need to worry about freeing or reusing memory cells.


Constraints

Let

L1={(ai,vi)}i=0n1,L2={(ai,vi)}i=0n1L_1 = \{(a_i, v_i)\}_{i=0}^{n-1}, \quad L_2 = \{(a_i', v_i')\}_{i=0}^{n-1}

be two lists of memory accesses.

The following constraints ensure Cairo’s continuous, read-only memory behavior:

  • Continuity

    (ai+1ai)(ai+1ai1)=0for all i[0,n1)(a_{i+1}' - a_i')(a_{i+1}' - a_i' - 1) = 0 \quad \text{for all } i \in [0, n-1)

    → Ensures consecutive addresses differ by exactly 1.

  • Single-valuedness

    (vi+1vi)(ai+1ai1)=0for all i[0,n1)(v_{i+1}' - v_i')(a_{i+1}' - a_i' - 1) = 0 \quad \text{for all } i \in [0, n-1)

    → Ensures that if two accesses share the same address, their values must be identical.

  • Permutation

    The permutation argument enforces that the two memory access lists correspond to the same mapping.

    • Initial value

      (z(a0+α×v0))×p0=z(a0+α×v0)(z - (a_0' + \alpha \times v_0')) \times p_0 = z - (a_0 + \alpha \times v_0)
    • Final value

      pn1=1p_{n-1} = 1
    • Cumulative product step

      (z(ai+α×vi))×pi=z(ai+α×vi)for all i[1,n)(z - (a_{i}' + \alpha \times v_{i}')) \times p_i = z - (a_i + \alpha \times v_i) \quad \text{for all } i \in [1, n)

If these constraints hold, L1L_1 forms a continuous read-only memory.


Summary

  • Each memory access uses five trace cells: (ai,vi,ai+1,vi+1,z)(a_i, v_i, a_{i+1}, v_{i+1}, z)

  • Each instruction performs three memory accesses: op0,op1,dst\mathsf{op0}, \mathsf{op1}, \mathsf{dst}


Public Memory

Description

The program’s input, output, and bytecode that are required for verification must be shared with the verifier. These are provided as part of the input mm^* to the Non-deterministic Cairo Machine.

The prover must then demonstrate the existence of an extended memory function mm that is consistent with mm^*.


Constraints

  • Append A|A^*| pairs of (0,0)(0, 0) to L1L_1, and append {(a,m(a))}aA\{(a, m^*(a))\}_{a \in A^*} to L2L_2.

  • Modify the final value constraint as follows:

aA(z(a+α×m(a)))zA\frac{\prod_{a \in A^*} (z - (a + \alpha \times m^*(a)))}{z^{|A^*|}}

This ensures that the public memory mm^* is properly incorporated into the permutation argument used to verify memory consistency.


Program Input & Output

  • Program input: The data that serves as the program’s input. This data does not need to be shared with the verifier.

  • Program output: The data produced during program execution. This data is shared with the verifier.


For example, suppose the program input is nn, and the goal is to compute the nn-th Fibonacci number. Depending on how the program output is defined, the proof statement differs:

Output Contents
Meaning of the Statement

both nn and yy

“The nn-th Fibonacci number is yy.”

only yy

“I know some nn such that the nn-th Fibonacci number is yy.” (but nn is not revealed)

only nn

“I have computed the nn-th Fibonacci number.” (but the result is hidden)


Conclusion

Cairo demonstrates how a universal, proof-friendly CPU architecture can bridge the gap between programmability and efficiency in zero-knowledge proving systems.

Unlike ASIC-style zkDSLs, which generate a dedicated circuit for each program, Cairo models computation at the CPU level — allowing any program to be proven on the same proving system. This shift from “program-specific circuits” to “program-verifying machines” is what makes Cairo fundamentally scalable and expressive.

Reference

Written by ryan Kim from A41

Last updated