Some (very special) computer programs naturally correspond to arithmetic circuits. One example is the computer program that performs naive multiplication of matrices over some field. But most computer programs are neither straight-line nor algebraic. They often involve conditional statements, operations such as integer division or floating point arithmetic that do not naturally correspond to finite field arithmetic, and more. In these cases, frontend overhead will be substantial.
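To make the matrix-multiplication example concrete, here is a minimal sketch (in Python, with an arbitrary illustrative prime; the names are mine, not any frontend's) of naive matrix multiplication over a prime field. Every addition and multiplication in the inner loop maps directly to one gate of an arithmetic circuit, and the sequence of operations never depends on the input values, which is exactly what makes the program straight-line.

```python
# Minimal sketch: naive matrix multiplication over a prime field. The prime P
# and the function name are illustrative, not taken from any particular frontend.
P = 2**61 - 1  # a Mersenne prime, chosen only for illustration

def matmul_mod_p(A, B, n, p=P):
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0
            for k in range(n):
                # one multiplication gate and one addition gate per iteration
                acc = (acc + A[i][k] * B[k][j]) % p
            C[i][j] = acc
    return C

# Unrolling the loops (the bounds depend only on n, never on the matrix entries)
# yields an arithmetic circuit with about n^3 multiplication and addition gates.
```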
One popular frontend approach is to produce circuits that essentially execute, step by step, some simple CPU, also called a virtual machine (VM). Frontend designers specify a set of “primitive operations” analogous to assembly instructions for real computer processors. Developers who want to use the frontend write “witness-checking programs” either directly in the assembly language or in a higher-level language such as Solidity, and have their programs automatically compiled into assembly code.
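The toy sketch below, which is not the instruction set of any real frontend, illustrates the CPU-emulator idea: a tiny VM whose step function the frontend would arithmetize once, so that a single circuit can check the execution of any program written for it.

```python
# Toy VM, for illustration only -- not Cairo, RISC-V, or any real instruction set.
P = 2**61 - 1  # illustrative prime field modulus

def run(program, registers, steps, p=P):
    """Execute `steps` primitive instructions, recording one trace row per step."""
    trace, pc = [], 0
    for _ in range(steps):
        trace.append((pc, tuple(registers)))          # the circuit checks each row
        op, dst, a, b = program[pc]
        if op == "add":
            registers[dst] = (registers[a] + registers[b]) % p
        elif op == "mul":
            registers[dst] = (registers[a] * registers[b]) % p
        elif op == "jnz":                             # conditional control flow: the
            pc = a if registers[b] != 0 else pc + 1   # circuit must handle every opcode
            continue
        pc += 1
    return trace

# Example: repeatedly square register 0 (register 1 holds a nonzero flag).
trace = run([("mul", 0, 0, 0), ("jnz", None, 0, 1)], [3, 1], steps=6)
```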
For example, StarkWare’s Cairo is a very limited assembly language in which the instructions roughly permit addition and multiplication over a finite field, function calls, and reads and writes to an immutable (i.e., write-once) memory. The Cairo VM is a von Neumann architecture, meaning that the circuits produced by the frontend essentially take a Cairo program as public input and “run” the program on the witness. The Cairo language is Turing-complete — despite its limited instruction set, it can simulate more standard architectures, although doing so may be expensive. The Cairo frontend turns Cairo programs executing T primitive instructions into what is called a “degree-2 AIR with T rows and about 50 columns.” What exactly this means is not important for this post, but as far as SNARK provers are concerned, this corresponds to circuits with between 50 and 100 gates for each of the T steps of the Cairo CPU.
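The details of AIRs are beyond the scope of this post, but the following illustrative sketch (not Cairo’s actual constraint system) conveys the flavor: the execution trace is a table with one row per step and one column per “register,” and a degree-2 constraint is, roughly speaking, a polynomial identity of degree at most 2 in the trace cells that must hold between consecutive rows.

```python
# Illustration only -- not Cairo's actual AIR. One column ("acc"), T rows,
# and a single degree-2 transition constraint: acc_{i+1} = acc_i * acc_i.
P = 2**61 - 1

def check_squaring_air(trace, p=P):
    for cur, nxt in zip(trace, trace[1:]):
        # degree 2: the constraint multiplies at most two trace cells together
        if (nxt["acc"] - cur["acc"] * cur["acc"]) % p != 0:
            return False
    return True

# A valid 4-row trace of repeated squaring, starting from 3.
trace = [{"acc": pow(3, 2**i, P)} for i in range(4)]
assert check_squaring_air(trace)
```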
RISC Zero takes a similar approach to Cairo, but with the virtual machine being the so-called RISC-V architecture, an open-source architecture with a rich software ecosystem that is growing in popularity. Because RISC-V is a very simple instruction set, designing an efficient SNARK frontend that supports it may be relatively tractable (at least compared to complicated architectures such as x86 or ARM). As of May, RISC Zero is turning programs executing T primitive RISC-V instructions into degree-5 AIRs with 3T rows and 160 columns. This corresponds to circuits with at least 500 gates per step of the RISC-V CPU. Further improvements are anticipated in the near future.
The various zkEVM projects (e.g., zkSync 2.0, Scroll, Polygon’s zkEVM) take the virtual machine to be (duh) the Ethereum Virtual Machine. The process of turning every EVM instruction into an equivalent “gadget” (i.e., an optimized circuit implementing the instruction) is substantially more involved than for the simpler Cairo and RISC-V architectures. For this and other reasons, some of the zkEVM projects are not directly implementing the EVM instruction set but rather compiling high-level Solidity programs into other assembly languages before turning those into circuits. Performance results from these projects are pending.
“CPU emulator” projects such as RISC Zero and Cairo produce a single circuit that can handle all programs in the associated assembly language. Alternative approaches are “ASIC-like,” producing different circuits for different programs. This ASIC-like approach can yield smaller circuits for some programs, especially when the assembly instruction that the program executes at each timestep does not depend on the program’s input. For example, it can potentially avoid frontend overhead entirely for straight-line programs such as naive matrix multiplication. But the ASIC approach also seems highly limited; as far as I know, it’s not known how to use it to support loops without predetermined iteration bounds.
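As a sketch of the ASIC-like approach (using a made-up gate-list representation, not any project’s actual circuit format): because the operations performed by naive matrix multiplication are fixed in advance, the frontend can simply unroll the loops at circuit-generation time. If the number of iterations depended on the witness, this unrolling would not be possible without fixing an a priori bound.

```python
# Made-up gate-list format, for illustration only.
def unroll_matmul_gates(n):
    gates = []  # each gate: (op, output wire, input wire 1, input wire 2)
    for i in range(n):
        for j in range(n):
            acc = "zero"
            for k in range(n):
                prod = f"m_{i}_{j}_{k}"
                gates.append(("mul", prod, f"A_{i}_{k}", f"B_{k}_{j}"))
                total = f"s_{i}_{j}_{k}"
                gates.append(("add", total, acc, prod))
                acc = total
            gates.append(("out", f"C_{i}_{j}", acc, None))
    return gates

# For 2x2 matrices this emits exactly 2*(2**3) + 2**2 = 20 gates -- no per-step
# CPU-emulation machinery, hence no frontend overhead for this program.
assert len(unroll_matmul_gates(2)) == 20
```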
The final component of frontend overhead comes from the fact that all SNARKs use circuits that operate over finite fields. The CPU on your laptop can multiply or add two integers with a single machine instruction. If a frontend outputs a circuit over a field of large enough characteristic, it can essentially simulate that multiplication or addition via the corresponding field operation. However, implementing the field operation on a real CPU will typically require many machine instructions (although some modern processors do have native support for certain fields).
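The simulation argument is simple, as the sketch below shows (using the well-known prime 2^255 – 19 purely for illustration): as long as the true integer result is smaller than the field characteristic, integer arithmetic and field arithmetic agree, so one field gate can stand in for one machine instruction. The cost shows up on the other side: computing that field operation on real hardware takes many 64-bit instructions.

```python
p = 2**255 - 19  # a large, well-known prime, used here only for illustration

# Values a CPU adds or multiplies with a single machine instruction:
x, y = 2**63 - 25, 2**62 + 11

# Because x*y < 2^126 < p, the field operations never "wrap around",
# so they coincide with ordinary integer arithmetic.
assert (x + y) % p == x + y
assert (x * y) % p == x * y

# But each of these mod-p operations, run on a 64-bit CPU, is itself implemented
# with several limb-by-limb machine operations (see the estimate further below).
```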
Some SNARK backends enable more flexible field choices than others. For example, if the backend makes use of a cryptographic group G, the size of the circuit’s field has to match the order of (i.e., the number of elements in) G, which can be limiting. In addition, not all fields support practical FFT algorithms.
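The FFT point can be made concrete: the most common radix-2 FFTs over a prime field need the field size minus one to be divisible by a large power of two, so that the field contains multiplicative subgroups of large power-of-two order. A quick check of this “two-adicity,” with two example primes, looks like the sketch below.

```python
def two_adicity(p):
    """Largest k such that 2^k divides p - 1."""
    n, k = p - 1, 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return k

print(two_adicity(2**64 - 2**32 + 1))  # 32 -- very friendly to radix-2 FFTs
print(two_adicity(2**255 - 19))        # 2  -- poorly suited to radix-2 FFTs
```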
There is only one implemented SNARK, Brakedown, that natively supports computations over arbitrary (large enough) fields. Along with its descendants, it has the fastest known concrete prover performance even over fields that other SNARKs support, but its proofs are currently too large for many blockchain applications. Recent work seeks to improve the proof size, but the prover is asymptotically slower and there appear to be barriers to practicality.
Some projects have chosen to work over fields with particularly fast arithmetic. For example, Plonky2 and others use a field of characteristic 2^64 – 2^32 + 1 because arithmetic in this field can be implemented several times faster than in less structured fields. However, using such a small characteristic may lead to challenges in efficiently representing integer arithmetic via field operations. (For instance, multiplying a 32-bit unsigned integer by any number larger than 2^32 might yield a result greater than the field characteristic. Hence, the field choice naturally supports arithmetic only on 32-bit numbers.)
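To see why this particular field is fast, note that 2^64 is congruent to 2^32 – 1, and 2^96 to –1, modulo 2^64 – 2^32 + 1, so the 128-bit product of two field elements can be reduced with a handful of shifts and additions plus one conditional correction, and never a division. The sketch below is a simplified Python rendering of this standard trick, not any library’s production code.

```python
GOLDILOCKS = 2**64 - 2**32 + 1  # the prime used by Plonky2 and several other projects

def reduce128(n):
    """Reduce a 128-bit product modulo GOLDILOCKS without any division.

    Uses 2^64 = 2^32 - 1 (mod p) and 2^96 = -1 (mod p); real implementations
    replace the final `%` with a single conditional addition or subtraction of p.
    """
    lo = n & (2**64 - 1)            # bits   0..63
    mid = (n >> 64) & (2**32 - 1)   # bits  64..95
    hi = n >> 96                    # bits 96..127
    return (lo + mid * (2**32 - 1) - hi) % GOLDILOCKS

a, b = 0x123456789ABCDEF0, 0xFEDCBA9876543210
assert reduce128(a * b) == (a * b) % GOLDILOCKS

# The flip side: only 32-bit values are guaranteed not to wrap, since
# (2^32 - 1)^2 < GOLDILOCKS but products of larger values can exceed it.
u = v = 2**32 - 1
assert (u * v) % GOLDILOCKS == u * v
```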
But no matter what, for all popular SNARKs today to achieve 128 bits of security (without a significant increase in verification costs), they have to work over a field of size larger than 2^128. As far as I can tell, this means that each field operation is going to require at least about ten 64-bit machine multiplications, and considerably more additions and bitwise operations. Hence, one should factor in at least an order of magnitude frontend overhead due to the need for circuits that operate over finite fields.
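A rough count, assuming field elements are stored as k 64-bit limbs and schoolbook arithmetic is used: the raw product alone takes k^2 machine multiplications, and a Montgomery-style reduction adds roughly k^2 + k more, for about 2k^2 + k in total, which is 10 when k = 2 (i.e., a ~128-bit field). The sketch below counts the multiplications for the product step; the reduction estimate is stated in the closing comment.

```python
MASK = 2**64 - 1

def to_limbs(x, k):
    """Split x into k 64-bit limbs, least significant first."""
    return [(x >> (64 * i)) & MASK for i in range(k)]

def schoolbook_mul(a, b, k):
    """Multiply two k-limb integers, counting 64x64-bit machine multiplications."""
    al, bl = to_limbs(a, k), to_limbs(b, k)
    acc, muls = 0, 0
    for i in range(k):
        for j in range(k):
            acc += (al[i] * bl[j]) << (64 * (i + j))  # one hardware multiply per limb pair
            muls += 1
    return acc, muls

a, b = 2**127 - 3, 2**126 + 7        # two ~128-bit field elements -> k = 2 limbs
prod, muls = schoolbook_mul(a, b, k=2)
assert prod == a * b and muls == 4   # 4 multiplications for the product alone
# Montgomery reduction modulo a ~128-bit prime costs roughly k^2 + k further
# multiplications, so a full field multiplication lands at about 2k^2 + k = 10.
```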
To summarize, existing frontends that use a virtual machine abstraction produce circuits with roughly 100 to 1,000 gates per step of the virtual machine, and possibly more for more complicated virtual machines. On top of that, finite field arithmetic is at least 10x slower than analogous instructions on modern processors (with possible exceptions if the processor has built-in support for certain fields). An “ASIC frontend approach” might reduce some of these overheads but is currently limited in the kinds of programs it can support.