The path to secure and efficient zkVMs: How to track progress
zkVMs (zero-knowledge virtual machines) promise to “democratize SNARKs” by allowing anyone, even those with no specialized SNARK expertise, to prove that they have correctly run any program on a given input (or witness). Their core strength lies in the developer experience, but currently they present enormous challenges in both security and performance. For zkVMs to fulfill their promise, designers will have to overcome these challenges. In this post, I lay out the likely stages for their development, the completion of which will take several years. Don’t let anyone tell you differently.
The challenges
On the security side, zkVMs are highly complex software projects still riddled with bugs. On the performance side, proving correct execution of a program can be hundreds of thousands of times slower than running it natively, making real-world deployment for most applications untenable right now.
Despite these realities, much of the blockchain industry portrays zkVMs as ready for immediate deployment. Indeed, some projects already pay significant computation costs to generate proofs of onchain activity. Because of the bugs, this is merely an expensive way of pretending a system is secured by a SNARK, when in fact it is either secured by permissioning or, far worse, vulnerable to attack.
The truth is, we are still years away from meeting even the most basic goals for a secure and performant zkVM. This post proposes a series of staged, concrete goals to track our collective progress — goals that cut through hype and help the community focus on genuine advancements.
Stages of security
Background
SNARK-based zkVMs typically include two main components:
- Polynomial Interactive Oracle Proof (PIOP): The interactive proof framework used to prove statements about polynomials (or constraints derived from them).
- Polynomial Commitment Scheme (PCS): Ensures that the prover cannot lie about polynomial evaluations without being detected.
zkVMs essentially encode valid execution traces as constraint systems — broadly meaning that they enforce correct register and memory usage by a virtual machine — and then apply a SNARK to prove these constraints are satisfied.
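To make this division of labor concrete, here is a minimal Rust sketch of how the two components typically compose into a zkVM SNARK. All trait and type names here are hypothetical, for illustration only; they are not drawn from any particular zkVM codebase.

```rust
/// A polynomial over some prime field (field elements simplified to u64).
pub struct Polynomial {
    pub coeffs: Vec<u64>,
}

/// An execution trace: the VM's registers and memory at every step.
pub struct ExecutionTrace {
    // registers, memory operations, program counter per cycle, ...
}

/// Polynomial commitment scheme (PCS): binds the prover to a polynomial
/// so it cannot later lie about that polynomial's evaluations.
pub trait PolynomialCommitmentScheme {
    type Commitment;
    type EvalProof;

    fn commit(&self, poly: &Polynomial) -> Self::Commitment;
    /// Prove that the committed polynomial evaluates to the returned
    /// value at `point`.
    fn open(&self, poly: &Polynomial, point: u64) -> (u64, Self::EvalProof);
    fn verify(
        &self,
        commitment: &Self::Commitment,
        point: u64,
        value: u64,
        proof: &Self::EvalProof,
    ) -> bool;
}

/// PIOP: an interactive protocol that reduces "this constraint system is
/// satisfied" to a small number of polynomial-evaluation claims.
pub trait Piop {
    /// Encode the execution trace as the polynomials that the constraint
    /// system (correct register/memory usage) talks about.
    fn trace_polynomials(&self, trace: &ExecutionTrace) -> Vec<Polynomial>;
    /// The evaluation claims the verifier must check, as
    /// (polynomial index, evaluation point) pairs, given its challenges.
    fn evaluation_claims(&self, challenges: &[u64]) -> Vec<(usize, u64)>;
}

/// A zkVM SNARK is, roughly, a PIOP whose polynomial "oracles" are
/// instantiated with a commitment scheme.
pub struct Snark<P: Piop, C: PolynomialCommitmentScheme> {
    pub piop: P,
    pub pcs: C,
}
```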
The only path to ensuring that systems as complicated as zkVMs are free of bugs is formal verification. Below is a breakdown of security stages. Stage 1 focuses entirely on correct protocols, while Stages 2 and 3 focus on correct implementations.
Security stage 1: Correct protocols
Stage 1a
A formally verified proof of soundness for the PIOP.
Stage 1b
A formally verified proof that the PCS is binding under some cryptographic assumption or an idealized model.
Stage 1c
If using Fiat-Shamir, a formally verified proof that the succinct argument obtained by combining the PIOP and the PCS is secure in the random oracle model (augmented with other cryptographic assumptions as appropriate).
Stage 1d
A formally verified proof that the constraint system to which the PIOP is applied is equivalent to the VM’s semantics.
Stage 1e
A comprehensive “gluing together” of all these pieces into a single, formally verified proof of a secure SNARK for running any program specified by the VM’s bytecode. If the protocol intends to achieve zero knowledge, this property must also be formally verified, ensuring that no sensitive information about the witness is revealed.
Recursion Caveat: If the zkVM uses recursion, every PIOP, commitment scheme, and constraint system involved anywhere in that recursion must be verified for the relevant sub-stage to be considered complete.
Security stage 2: Correct verifier implementation
A formally verified proof that an actual implementation of the zkVM verifier (in Rust, Solidity, etc.) matches the protocol verified in Stage 1. Achieving this ensures the implemented protocol is sound (rather than merely the design on paper, or an inefficient specification written in, say, Lean).
The reason Stage 2 focuses on the verifier implementation alone (and not the prover) is twofold. First, getting the verifier right is already sufficient for soundness (i.e., for ensuring that the verifier cannot be convinced that a false statement is true). Second, zkVM verifier implementations are over an order of magnitude simpler than prover implementations.
Security stage 3: Correct prover implementation
A formally verified proof that an actual implementation of the zkVM prover correctly generates proofs for the proof system verified in Stages 1 and 2. This ensures completeness — that is, any system using the zkVM will not get “stuck” on a statement that cannot be proven. If the protocol is meant to achieve zero knowledge, the prover implementation must be formally verified to satisfy this property as well.
Projected timelines
- Stage 1 Progress: We can expect incremental achievements over the next year (e.g., ZKLib is one such effort). But no zkVM is likely to fully meet Stage 1 for at least two years.
- Stages 2 and 3: These can advance in parallel with some aspects of Stage 1. For example, some teams have shown that an implementation of the Plonk verifier matches the protocol in a paper (although the paper’s protocol itself might not be fully verified). Nevertheless, I do not expect any zkVM to reach Stage 3 in fewer than four years — and likely longer.
Key caveats: Fiat-Shamir security and verified bytecode
One major complication is that there are open research questions surrounding the security of the Fiat-Shamir transformation. All three stages treat Fiat-Shamir and random oracles as if their security is unassailable, but in reality the entire paradigm may turn out to have vulnerabilities. This is due to the difference between the random oracle idealization and actual hash functions used in practice. In the worst case, a system that has reached Stage 2 could later be found completely insecure due to Fiat-Shamir issues. This is cause for serious concern and continued research. We may need to modify the transformation itself to better guard against such vulnerabilities.
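For concreteness, here is a minimal sketch of the transformation itself: the verifier’s random challenges are replaced by hashes of the transcript so far. It uses the `sha2` crate purely for illustration; production systems make far more careful, domain-separated choices, and the security of substituting a concrete hash for the idealized random oracle is exactly the gap discussed above.

```rust
use sha2::{Digest, Sha256};

/// Minimal Fiat-Shamir transcript sketch: interaction is removed by
/// deriving each "verifier challenge" as a hash of all prover messages
/// sent so far.
pub struct Transcript {
    state: Vec<u8>,
}

impl Transcript {
    pub fn new(protocol_label: &[u8]) -> Self {
        Transcript { state: protocol_label.to_vec() }
    }

    /// Absorb a prover message (e.g., a serialized polynomial commitment).
    /// Omitting any message from the transcript is a classic source of
    /// "weak Fiat-Shamir" vulnerabilities.
    pub fn absorb(&mut self, message: &[u8]) {
        self.state.extend_from_slice(message);
    }

    /// Derive the next challenge deterministically. The random oracle
    /// idealization assumes this behaves like a truly random function;
    /// a concrete hash like SHA-256 only approximates that.
    pub fn challenge(&mut self) -> [u8; 32] {
        let digest: [u8; 32] = Sha256::digest(&self.state).into();
        self.state.extend_from_slice(&digest); // ratchet the state forward
        digest
    }
}
```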
Systems without recursion are on somewhat firmer theoretical ground because certain known attacks involve circuits reminiscent of those used in recursive proofs. But the risk remains a fundamental open question.
Another caveat is that proving you’ve correctly run a computer program (specified via bytecode) has limited value if the bytecode itself is flawed. Consequently, the utility of zkVMs depends heavily on methods for generating formally verified bytecode — an enormous challenge that falls outside the scope of this post.
On post-quantum security
Over at least the next five years (and likely longer), quantum computers do not pose a serious threat, whereas bugs are an existential risk. So the primary focus now should be meeting the security and performance stages laid out in this post. If we can meet them faster with SNARKs that are not quantum-secure, then we should do so. We should use those until either post-quantum SNARKs catch up, or there is serious concern that a cryptographically relevant quantum computer is imminent.
Concrete security levels
100 bits of classical security is the absolute bare minimum for deploying a SNARK that secures anything of value (and some deployments still fail to meet even this low bar). Even this should not be considered acceptable: standard cryptographic practice is to use 128 bits of security and up. If SNARK performance were where it truly needs to be, we wouldn’t be skimping on security to eke out incremental performance gains.
Stages of performance
The current situation
Currently, zkVM provers incur overhead factors approaching one million times the cost of native execution. If a program takes X cycles to run, proving correct execution will cost on the order of X times 1 million CPU cycles. This was the case roughly one year ago, and it remains the case today (despite misconceptions to the contrary).
Popular narratives often frame this overhead in ways that sound acceptable. For example, you might hear:
- “Generating proofs for all of Ethereum mainnet costs under a million dollars per year.”
- “We nearly have real-time proof generation of Ethereum blocks using a cluster of dozens of GPUs.”
- “Our latest zkVM is 1000× faster than its predecessor.”
While technically accurate, these claims can be misleading without proper context. For instance:
- 1000× faster than an older zkVM still leaves it very slow in absolute terms. This is much more a statement of how bad things were than how good things are.
- There are already proposals to increase the amount of computation processed by Ethereum mainnet by a factor of 10. This will render current zkVM performance far too slow.
- What people are calling “nearly real-time proving of Ethereum blocks” is still much slower than what many blockchain applications demand (e.g., Optimism has a block time of 2s, substantially faster than Ethereum’s 12s block time).
- “Dozens of GPUs running at all times, without fail” falls short of acceptable liveness guarantees.
- These prover times are often reported for proof sizes that are too large for many applications (e.g., over 1 MB).
- Under a million dollars per year to prove all activity on Ethereum mainnet reflects the fact that Ethereum full nodes perform only about $25 worth of compute per year.
For applications beyond blockchains, such overheads are plainly too high. No amount of parallelization or engineering can offset such a massive overhead.
We should aim for slowdowns no worse than 100,000× relative to native execution as a basic baseline — and even that is only the first step. True mainstream adoption probably requires overheads close to 10,000× or lower.
Measuring performance
SNARK performance has three primary components:
1. Intrinsic efficiency of the underlying proof system.
2. Application-specific optimizations (e.g., pre-compiles).
3. Engineering and hardware speedups (e.g., GPUs, FPGAs, or multi-core CPUs).
While (2) and (3) are vital for real-world deployments, they’re generally available to any proof system, so they don’t necessarily reflect progress on fundamental overheads. For instance, adding GPU acceleration plus pre-compiles to a zkEVM can easily yield a 50× speedup over a purely CPU-based, no-pre-compile approach — enough to make an intrinsically less efficient system appear superior to one that simply hasn’t received the same polishing.
Consequently, this post focuses on measuring how well a SNARK performs absent specialized hardware and pre-compiles. It’s a departure from current benchmarking approaches, which often lump all three factors into a single “headline number.” This is akin to judging diamonds by how many hours they’ve been polished rather than by their innate clarity.
The goal here is to isolate the intrinsic overhead of general-purpose proof systems — minimizing barriers to entry for underexplored techniques and allowing the community to cut through confounding factors to focus on true progress in proof-system design.
Performance stages
Below are my proposed milestones for performance, organized into five stages. First, we need to slash prover overhead on CPUs by multiple orders of magnitude. Only then should the focus turn to further reductions via hardware. Memory usage must improve as well.
In all of the stages below, developers should not have to tailor their code to the zkVM setting to achieve the necessary performance. Developer experience is a primary benefit of zkVMs. Sacrificing DevEx to meet performance benchmarks defeats the point of both the benchmark and the zkVM itself.
These metrics focus on prover costs. However, any prover metric can be met trivially if one allows unbounded verifier costs (i.e., no upper limit on proof size or verification time). Consequently, for a system to qualify for the stages described, it is crucial to specify maximum values for both proof size and verification time.
Stage 1 requirements: “Reasonably non-trivial verification costs”
- Proof size: The proof must be smaller than the witness.
- Verification time: Verifying the proof must be no slower than running the program natively (i.e., executing the computation without a proof of correctness).
These are bare minimum succinctness requirements. They ensure that proof sizes and verification times are not worse than the trivial approach of sending the witness to the verifier and having the verifier directly check it for correctness.
Stage 2 and beyond
- Maximum proof size: 256 KB
- Maximum verification time: 16 ms
These cutoffs are intentionally generous to accommodate novel, fast-prover techniques that may come with higher verification costs. At the same time, they exclude proofs so expensive that few projects would be willing to include them on a blockchain.
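As a concrete restatement of these qualification criteria, here is a minimal Rust sketch; the struct and method names are hypothetical, and the numbers simply encode the thresholds above.

```rust
/// Measured verifier-side costs for a candidate proof system on some
/// benchmark statement.
pub struct VerifierCosts {
    pub proof_size_bytes: u64,
    pub verification_time_ms: u64,
    pub witness_size_bytes: u64,
    pub native_execution_time_ms: u64,
}

impl VerifierCosts {
    /// Stage 1 ("reasonably non-trivial"): beat the trivial protocol of
    /// sending the witness and having the verifier re-execute the program.
    pub fn meets_stage_1(&self) -> bool {
        self.proof_size_bytes < self.witness_size_bytes
            && self.verification_time_ms <= self.native_execution_time_ms
    }

    /// Stage 2 and beyond: absolute cutoffs of 256 KB and 16 ms.
    pub fn meets_stage_2_cutoffs(&self) -> bool {
        self.proof_size_bytes <= 256 * 1024 && self.verification_time_ms <= 16
    }
}
```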
Speed stage 1
Single-threaded proving must be at most one hundred thousand times slower than native execution, measured across a range of applications (beyond just proving Ethereum blocks), without relying on pre-compiles.
Concretely, think of a RISC-V processor running at roughly 3 billion cycles per second on a modern laptop. Achieving Stage 1 means you can prove about 30,000 RISC-V cycles per second (single-threaded) on that same laptop.
Verifier costs must be “reasonably non-trivial” as defined earlier.
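The arithmetic behind that 30,000 figure, as a quick sketch. The 3 GHz clock rate is an assumption about a typical modern laptop; the 10,000× and 1,000× factors correspond to the later speed stages introduced below.

```rust
/// Native single-core throughput assumed above: ~3 billion cycles/sec.
const NATIVE_CYCLES_PER_SEC: u64 = 3_000_000_000;

/// Prover throughput implied by a given slowdown relative to native.
fn proved_cycles_per_sec(slowdown: u64) -> u64 {
    NATIVE_CYCLES_PER_SEC / slowdown
}

fn main() {
    assert_eq!(proved_cycles_per_sec(100_000), 30_000);    // Speed Stage 1
    assert_eq!(proved_cycles_per_sec(10_000), 300_000);    // Speed Stage 2
    assert_eq!(proved_cycles_per_sec(1_000), 3_000_000);   // Speed Stage 3
}
```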
Speed stage 2
Single-threaded proving must be at most ten thousand times slower than native execution.
Alternatively, because some promising SNARK approaches (especially those over binary fields) are hindered by current CPUs and GPUs, you can reach this stage using FPGAs (or even ASICs) by comparing:
- The number of RISC-V cores an FPGA can emulate at native speed, vs.
- The number of FPGAs required to emulate and prove RISC-V execution in (near) real-time.
If the latter number is at most a factor of 10,000 more than the former, you qualify for Stage 2.
Proof size must be at most 256 KB and verifier time at most 16 ms on standard CPUs.
Speed stage 3
In addition to achieving Speed Stage 2, also achieve a proving overhead of under 1000× (for a wide range of applications) using automatically synthesized and formally verified pre-compiles. Essentially, dynamically customize an instruction set for each program to accelerate proving, but do so in an easy-to-use and formally verified manner.
(For a deeper discussion of why pre-compiles are a double-edged sword — and why “hand-rolled” pre-compiles are not a sustainable approach — see the next section.)
Memory stage 1
Stage 1 speed is achieved with less than 2 GB of memory required for the prover (while also achieving zero knowledge).
This is critical for many mobile devices or browsers and thus opens up countless client-side zkVM use cases. Client-side proving matters because our phones are our constant link to the real world: they track our locations, credentials, and more. If a proof requires more than 1-2 GB of memory to produce, it’s simply too much for most mobile devices today. Hitting that 2 GB threshold opens the door to real-time, on-device SNARK proofs for everything from privacy-preserving location checks to portable, verifiable credentials.
Two clarifications are in order:
- The 2 GB space bound should hold even for large statements (those requiring many trillions of CPU cycles to run natively). A proof system that achieves the space bound only for small statements lacks broad applicability.
- It is easy to keep the prover’s space below 2 GB of memory if the prover is very slow. So to make Memory Stage 1 non-trivial, I am requiring that Speed Stage 1 be met within the 2 GB space bound.
Memory stage 2
Stage 1 speed is achieved with less than 200 MB of memory usage (10x better than Memory Stage 1).
Why push below 2 GB? Consider a non-blockchain example: Every time you visit a website via HTTPS, you download certificates for authentication and encryption. Instead, websites could send zk-proofs of possession of these certificates. Large sites may issue millions of these proofs per second. If each proof requires 2 GB of memory to generate, you’re talking about petabytes of RAM in total. Pushing memory usage further down is vital for non-blockchain deployments.
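The rough arithmetic behind the “petabytes” figure, under the hypothetical assumption that each proof takes on the order of a second to generate, so that millions of proofs per second means roughly a million provers running concurrently:

```rust
fn main() {
    // Hypothetical load: ~1 million proofs being generated at once.
    let concurrent_proofs: u64 = 1_000_000;
    let memory_per_proof_bytes: u64 = 2 * 1024 * 1024 * 1024; // 2 GB each
    let total_bytes = concurrent_proofs * memory_per_proof_bytes;
    // ~2.1 * 10^15 bytes, i.e., roughly 2 petabytes of RAM.
    println!("{} PB of RAM", total_bytes / 1_000_000_000_000_000);
}
```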
Pre-compiles: The last mile or a crutch?
In zkVM design, a pre-compile refers to a special-purpose SNARK (or constraint system) tailored to specific functions — such as Keccak/SHA hashing or elliptic-curve group operations for digital signatures. Within Ethereum — where most of the heavy lifting involves Merkle hashing and signature checks — a few hand-crafted pre-compiles can reduce prover overhead. But relying on them as a crutch will not get SNARKs where they need to go.
Why pre-compiles are a crutch
- Still too slow for most applications: Even with hashing and signature pre-compiles, current zkVMs remain too slow, both within and outside blockchains, because of deep inefficiencies in the core proof system.
- Security failures: Hand-written pre-compiles that are not formally verified are almost certainly riddled with bugs, risking catastrophic security failures.
- Suboptimal developer experience: In most zkVMs today, adding new pre-compiles means hand-authoring constraint systems for every function — essentially returning to a 1960s-style workflow. Even when using existing pre-compiles, developers must refactor code to call each one. We should be optimizing for security and developer experience, not sacrificing both to chase incremental performance gains. Doing so merely proves that performance isn’t where it needs to be.
- Confounding benchmarks: Benchmarking on workloads dominated by a handful of repetitive cryptographic operations risks selecting for projects that have simply spent the most time optimizing hand-written constraint systems for that application. This is not the best path to advancing the science of SNARK design.
- I/O overheads and no RAM: Although pre-compiles improve performance for heavy cryptographic tasks, they may not deliver meaningful speedups for more diverse workloads, because they incur major overhead for passing inputs/outputs and they cannot use RAM.
Even within blockchain contexts, as soon as you move beyond a monolithic L1 like Ethereum — say, you want to build a slew of cross-chain bridges — you’re confronted with different hash functions and signature schemes. Throwing pre-compile after pre-compile at the problem doesn’t scale and introduces massive security risks.
For all of these reasons, our priority should be making the underlying zkVM more efficient. Whatever techniques produce the best zkVMs will also produce the best pre-compiles.
I do believe pre-compiles will remain critical in the long run, but only once they’re automatically synthesized and formally verified. That way, we can maintain the developer-experience benefits of a zkVM while avoiding catastrophic security risks. This perspective is reflected in Speed Stage 3.
Projected timelines
I expect a small number of zkVMs to achieve Speed Stage 1 and Memory Stage 1 later this year. I also expect Speed Stage 2 within the next two years, though it’s not yet clear we can get there without new ideas that have yet to appear in the research literature.
I expect that the remaining stages (Speed Stage 3 and Memory Stage 2) will take several years to meet.
Although I have separately identified stages for zkVM security and performance in this post, these aspects of zkVMs are not entirely independent. As more vulnerabilities are discovered in zkVMs, I expect that some will be fixable only with a substantial loss in performance. Until a zkVM reaches Security Stage 2, performance results should be viewed as tentative.
***
zkVMs hold immense promise for making zero-knowledge proofs truly universal, but they remain in their infancy — rife with security challenges and debilitating performance overheads. Hype and marketing spin make it difficult to measure genuine progress. By articulating clear security and performance milestones, I hope to provide a roadmap that cuts through the noise. We’ll get there, but it will take time and sustained effort in both research and engineering.
***
Justin Thaler is Research Partner at a16z and an Associate Professor in the Department of Computer Science at Georgetown University. His research interests include verifiable computing, complexity theory, and algorithms for massive data sets.