A new era in SNARK design: Releasing Jolt

4.9.24

Today, a16z crypto research and engineering teams released an initial implementation of Jolt, a new approach to SNARK design that is already up to 2x faster than the state of the art, with more improvements still to come. See this post for a high-level overview to this body of work, as well as this post for benchmarks and code; as well as this post for technical specifics about Jolt’s architecture and performance compared to existing zkVMs.

Today we’re releasing Jolt, arguably the fastest zkVM to date, which offers up to a 2x improvement over the current state of the art. This initial implementation is only a starting point, and we expect significant improvements coming in the weeks and months to come.

As important as speed is, it’s just half the story: compared to any zkVM today, Jolt is simpler, and easier to audit and extend. This reduces the likelihood of errors and will facilitate building high-assurance implementations. Its performance and extensibility mean that more people will be able to apply it in more applications.

Challenges in SNARK design. Jolt is a zkVM (zero knowledge virtual machine) – a SNARK that lets the prover prove that it correctly ran a specified computer program, where the program is written in the assembly language of some simple CPU. zkVMs offer a fantastic developer experience: They make SNARKs usable by anyone who can write a computer program, eliminating the need for in-depth cryptographic knowledge. But usability comes at a steep price. Today’s zkVMs are remarkably complicated and offer terrible performance for the prover. Today, proving a computer program was run correctly is millions of times slower than simply running the program.

To address these challenges, Jolt deviates significantly from how prior zkVMs have been designed. In this post, I explain how Jolt’s approach is different, and why it has important advantages. (For an overview of how to use Jolt and how we engineered Jolt into reality, see this other post by my colleagues.)

The sum-check protocol and the lookup singularity. At the heart of Jolt is the sum-check protocol. While today’s popular SNARKs use univariate polynomials and so-called polynomial IOPs that use only one or two rounds, the sum-check protocol is based on multivariate polynomials and uses many rounds of interaction. It exploits this interaction to mitigate key performance bottlenecks for the prover, without significantly increasing costs for the verifier.

At a higher level of abstraction, Jolt builds on our earlier work called Lasso, which we implemented and released in August 2023. Lasso is a lookup argument: a way for the prover to cryptographically commit to many values and prove that those values all reside in a predetermined table. Like Jolt itself, Lasso uses the sum-check protocol to ensure that this proof is fast to compute.

Jolt uses Lasso to instantiate the lookup singularity. This means that, unlike the zkVMs deployed today, where protocol designers hand-specify optimized constraint systems to capture the primitive instructions of the CPU, Jolt simply applies Lasso to the table containing all evaluations of the instruction. This table is far too big for anyone to explicitly write down, but in Lasso neither the prover nor the verifier ever needs to. This is because the table is highly structured: any lookup into the big table can be answered by performing a handful of lookups into vastly smaller tables.

Jolt’s performance. On CPUs, the initial Jolt implementation is about 6x faster than RISC Zero, and up to 2x faster than the recently-released SP1 (although the comparison is somewhat complicated). The initial Jolt implementation uses an elliptic-curve-based commitment scheme called Hyrax. (We use Hyrax because it is simple and transparent: Comparable prover performance can be obtained using other curve-based commitment schemes with much lower verification costs.)

Several straightforward engineering optimizations are still unimplemented, and these will improve Jolt’s speed by another factor of about 1.5x in the next few weeks. More significantly, we will soon replace Hyrax with a hashing-based scheme called Binius whose development was partially inspired by Lasso and Jolt. The Binius commitment scheme is especially fast when committing to small values, which are the only values that need to be committed in Jolt. Incorporating Binius will yield an additional 5x speedup for the Jolt prover and perhaps much more.

Additional benefits. While Jolt is already the fastest zkVM to date, it has exciting benefits beyond performance. For example, because Jolt instantiates the lookup singularity, it will take only a couple of days of developer time to extend the instruction set to richer ISAs like RV64IM (which supports 64-bit rather than the 32-bit data types that Jolt supports today).

That’s a major benefit of handling all instructions via lookups: to add a primitive instruction to the VM, one specifies a “decomposition” of the new instruction into “sub-instructions” that operate on smaller data types. Specifying these decompositions is a light lift from a developer’s perspective.

Many additional benefits of Jolt’s approach can be found in #1 of my companion FAQ on Jolt’s initial implementation.

Implications. Upon the release of the Lasso and Jolt papers in September 2023, I made several claims that were met with vocal skepticism (see my companion FAQ for details). It’s worth revisiting these claims here.

Claims validated by Jolt’s initial performance

1. We’ve been approaching zkVM design the wrong way.

Before Jolt, and even still today, many in the community claimed that very simple instruction sets designed to be “SNARK-friendly” lead to faster zkVMs (see Cairo, Miden, Valida, etc.). In fact, Jolt is nearly as fast per instruction (and in some cases, faster) than prior zkVMs that have much simpler instruction sets. And Jolt is just as fast for any instruction set for which the instructions satisfy a natural decomposability property. This includes RISC-V, an instruction set that is far more complicated than purportedly SNARK-friendly ones.

This suggests that the characteristics making instruction sets compatible with conventional hardware also makes them friendly to sum-check-based SNARKs like Jolt. The prevailing approach to zkVM design, designing custom “SNARK-friendly” instruction sets, is thus doubly misguided: it hurts rather than helps performance, and it severs access to the extensive tooling developed for established ISAs such as RISC-V. See #7 of my companion FAQ for further discussion.

2. To obtain a fast prover, SNARKs should use the sum-check protocol to minimize the prover’s commitment costs. Under the hood, this entails using multivariate polynomials rather than univariate ones.

This is the top-level design philosophy behind Lasso and Jolt.

Committing to data is expensive for the prover. On top of that, the prover has to prove that the committed data is “well-formed,” that is, that the data complies with some predefined criteria. In other words, the prover must show that they’ve committed to the “correct” data, and not to incorrect or invalid data.

The more data the prover commits, the more expensive it is to compute the commitment, and the more work the prover has to do to prove the data is well-formed.

The sum-check protocol uses multiple rounds of interaction (which are later removed via the Fiat-Shamir transformation) to minimize the amount of committed data. This makes it fast to commit and to prove that the commitment is well-formed.

In contrast, today’s popular SNARKs all use constant-round polynomial IOPs. This is not the best path to obtain fast provers. Evidence shows that polynomial IOPs either need a lot of rounds (i.e., the sum-check protocol), or they need to commit a lot of data. Many people still haven’t understood this because one can give SNARKs with one-round or two-round polynomial IOPs, they’re just expensive. Jolt should help people finally catch on.

Jolt’s speedups over current methods may seem modest now, but they are poised to accelerate as known optimizations are implemented. And additional improvements should come as the techniques underlying Jolt attract additional attention and scrutiny.

3. Elliptic curve commitment schemes are not slower than today’s popular hashing-based SNARKs.

Repeating what I’ve said previously, until today, the main reason people have believed hashing-based commitment schemes are faster than curve-based ones is that popular SNARKs like Plonk require the prover to commit to random field elements rather than small field elements, and curve-based commitments are very slow in this context. (Plonk requires the prover to commit to 11 field elements per gate, 8 of which are random.)

Lasso and Jolt show that it’s not necessary to have the prover commit to random field elements. They also significantly reduce the number of committed field elements. Given this, curve-based SNARKs can be faster than the current generation of hashing-based ones.

When I made this claim in September, it was met with considerable skepticism. Jolt’s initial performance shows that this reaction was unfounded. In my companion FAQ, I explain exactly where the most prominent (and most incorrect) analyses erred.

This should not surprise anyone. Contrary to common belief, a wealth of clear evidence already showed that curve-based SNARKs can be faster than current hashing-based ones.

The zk-Bench paper found that, on simple benchmarks like field exponentiation, even Plonk and Groth16 are much faster than Starky (a hashing-based SNARK with a reputation for speed, which is not even recursive). Moreover, this holds even when the exponentiation for Starky is defined over the 64-bit Goldilocks field, and for Plonk and Groth16 it’s defined over the 254-bit scalar field of BN254 (though the benchmark did not maximally optimize Starky).
Other benchmarks have also found Groth16 to be significantly faster than plonky2 (a recursive hashing-based SNARK).
Nova, a curve-based SNARK, already outperforms today’s hashing-based SNARKs on some benchmarks. Perversely, one of these have been prominently cited as evidence that hashing-based SNARKs are the fastest available. In fact, the benchmark shows that Nova is about 60x faster than recursive hashing-based SNARKs like plonky2, and slightly faster than non-recursive ones like Starky (in this benchmark, Starky was run on a circuit that doesn’t correctly implement SHA256 and is at least 2x too small. This error has not been rectified).
Commitment schemes based on elliptic curves are known to be faster at committing to small values than hashing-based schemes that use Poseidon or even Poseidon2 as the hash function (as used by plonky2, RISC Zero, and many other projects).

Binius, with its extremely fast commitment for small values and newly-reduced proof size, is poised to finally move hashing-based schemes in front of curve-based schemes, perhaps by a significant margin (at least for applications that can naturally be represented over fields of characteristic two). But this will only hold when the Binius commitment scheme is combined with sum-check-based polynomial IOPs like Lasso and Jolt. Criticism of curve-based schemes predating Binius, Lasso, and Jolt was unfounded.

4. zkVMs are slow.

zkVMs provide an excellent developer experience, allowing anyone aiming to deploy a SNARK to write their witness-checking program in any high-level programming language that compiles to the assembly language of the VM (i.e., RISC-V for Jolt). However, they come with massive prover overhead.

Relative to native execution of a RISC-V program, how much slower is the Jolt prover today? The answer is about 500,000 times slower.

Since Jolt is 6x (or more) faster than deployed zkVMs (RISC Zero, Stone, etc.), this means that today’s deployments are actually millions of times slower than native execution. Marketing departments can say whatever they want. The reality is that such huge overheads render today’s deployed zkVMs untenable for their intended applications.

Where does this 500,000 number come from? The Jolt prover spends about 1/5 of its time on commitments (though this fraction will increase as we optimize it further), and this involves about 100 group operations per step of the RISC-V CPU (which is defined over 32-bit data types). Each group operation translates to about 6 field multiplications in a 256-bit field. And each field operation (using Montgomery modular multiplication) takes about 150 CPU cycles on CPUs with 32-bit registers. So proving each step of the RISC-V CPU requires 5 ⋅ 100 ⋅ 6 ⋅ 150 = 450,000 cycles of the RISC-V CPU.

This calculation is broadly consistent with how long the Jolt prover takes to run on an M3 Max laptop today: Single-threaded, the Jolt prover proves about 9,000 RISC-V cycles per second (9 kHz), which is about 500,000 times fewer than the M3’s clock speed of about 4 billion clock cycles per second. With 16 threads, Jolt achieves about 90 kHz today.

This poor performance is why all deployed zkVMs today use “precompiles,” or hand-optimized protocols for specific computations that come up over and over again, like SHA2 or Keccak evaluations. But overreliance on precompiles is dangerous: designing pre-complies is exactly the error-prone and time-intensive process that zkVMs are meant to obviate.

The overhead of zkVMs will need to decrease to a factor of 10,000 relative to native execution – meaning 50 times faster than Jolt’s current speed – before zkVMs can approach their potential as a cornerstone technology for blockchain scalability.

5. What’s special about the sum-check protocol?

Why is the sum-check protocol able to minimize commitment costs for the prover? One answer is that it is a many-round interactive proof. In SNARK applications, this typically means logarithmically many rounds, though proofs can be short even if the number of rounds is set to a moderately large constant (say, 6 or more).

And the sum-check protocol is not just any many-round interactive proof: it is the most efficient and powerful one. For example, in certain settings we know that it achieves optimal tradeoffs between round complexity and verifier costs.

In SNARK design, round complexity ultimately doesn’t matter, because all interaction is removed via the Fiat-Shamir transformation. (In general, applying Fiat-Shamir to many-round protocols can lead to a major security loss even when modeling hash functions as random oracles, but this does not happen for the sum-check protocol because it satisfies a technical security notion called round-by-round soundness.)

So there’s no reason not to use many rounds. But today’s SNARK deployments don’t. Or, more precisely, almost all of the interaction they do use is contained “inside” the polynomial commitment scheme, where it doesn’t help make the prover fast.

Specifically, today’s popular SNARKs are based on polynomial IOPs with only one or two rounds. It is possible to “simulate” the functionality of the sum-check protocol with just one or two rounds, but this comes at the price of much higher prover costs.

For example, the “univariate sum-check protocol” is a one-round polynomial IOP that solves a problem analogous to the one solved by the sum-check protocol itself. (The name is confusing; the “univariate sum-check protocol” has nothing to do with the actual sum-check protocol, aside from solving a similar problem.) The univariate protocol demands extra work from the prover, like computing a quotient polynomial and committing to it, and it imposes restrictions on the finite field used.

Embracing the sum-check protocol. Interaction is a tool to reduce commitment costs for the prover. We must use it to the fullest. Transitioning to SNARKs based on the sum-check protocol and multivariate polynomials will require time and effort, but it’s necessary to achieve scalable provers.

***

Just as the path to fast SNARK provers goes through the sum-check protocol, the path to fast zkVMs goes through Jolt. The initial implementation is now complete, but there is a vast amount of building left to do. Check out the (partial) list of tasks on our roadmap, and get in touch if you’d like to contribute.

***

Justin Thaler is Research Partner at a16z and an Associate Professor in the Department of Computer Science at Georgetown University. His research interests include verifiable computing, complexity theory, and algorithms for massive data sets.

***

The views expressed here are those of the individual AH Capital Management, L.L.C. (“a16z”) personnel quoted and are not the views of a16z or its affiliates. Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by a16z. While taken from sources believed to be reliable, a16z has not independently verified such information and makes no representations about the current or enduring accuracy of the information or its appropriateness for a given situation. In addition, this content may include third-party advertisements; a16z has not reviewed such advertisements and does not endorse any advertising content contained therein.

This content is provided for informational purposes only, and should not be relied upon as legal, business, investment, or tax advice. You should consult your own advisers as to those matters. References to any securities or digital assets are for illustrative purposes only, and do not constitute an investment recommendation or offer to provide investment advisory services. Furthermore, this content is not directed at nor intended for use by any investors or prospective investors, and may not under any circumstances be relied upon when making a decision to invest in any fund managed by a16z. (An offering to invest in an a16z fund will be made only by the private placement memorandum, subscription agreement, and other relevant documentation of any such fund and should be read in their entirety.) Any investments or portfolio companies mentioned, referred to, or described are not representative of all investments in vehicles managed by a16z, and there can be no assurance that the investments will be profitable or that other investments made in the future will have similar characteristics or results. A list of investments made by funds managed by Andreessen Horowitz (excluding investments for which the issuer has not provided permission for a16z to disclose publicly as well as unannounced investments in publicly traded digital assets) is available at https://a16z.com/investment-list/.

Charts and graphs provided within are for informational purposes solely and should not be relied upon when making any investment decision. Past performance is not indicative of future results. The content speaks only as of the date indicated. Any projections, estimates, forecasts, targets, prospects, and/or opinions expressed in these materials are subject to change without notice and may differ or be contrary to opinions expressed by others. Please see https://a16z.com/disclosures/ for additional important information.