1. Relative to native execution of a RISC-V program, how fast do you think the Jolt prover can be, once Jolt is re-implemented to use D&P’s commitments?
Here is a crude and speculative estimate. The Jolt prover commits to about 800 bits of data per step of the RISC-V CPU, for 32-bit data types with the multiplication extension. (Two caveats: first, some RISC-V instructions are handled via multiple pseudoinstructions; e.g., a division instruction is handled by having the prover provide the quotient and remainder, and checking both via a multiplication, an addition, and an inequality check. Second, this number might change slightly once we work out the lookup tables’ decompositions over GF[2^128].)
Using D&P’s commitment scheme and assuming recursion will not be a bottleneck, for a T-step computation, the main commitment costs are as follows.
First, apply additive FFTs to about 200 T bytes of data in total. More precisely, the Ligero/Brakedown prover does O([latex]\sqrt{T}[/latex]) independent FFTs, each of size O([latex]\sqrt{T}[/latex]). (This involves less total work than a single FFT of length O(T), and parallelizes better.) Second, hash about 200 T bytes using a standard hash function like Keccak or SHA2.
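As a rough sanity check on the FFT claim (my own accounting, not D&P’s, treating a length-n FFT as costing about [latex]c \cdot n \log n[/latex] field operations for some constant c), the total work for [latex]\sqrt{N}[/latex] independent FFTs, each of length [latex]\sqrt{N}[/latex], applied to a committed vector of total length N is about

[latex]\sqrt{N} \cdot c\,\sqrt{N}\,\log\sqrt{N} = \tfrac{1}{2}\, c\, N \log N,[/latex]

roughly half the [latex]c\, N \log N[/latex] cost of a single length-N FFT, and the small FFTs can all be run in parallel with no coordination between them.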
Empirically, D&P find that the FFT and hashing are roughly equal contributors to prover time.
Using an estimate of about 70 RISC-V cycles per byte hashed by Keccak, we can expect these operations to be about 30,000 times slower than simply running the RISC-V program with no proof of correctness. That is, for the Jolt prover to prove it correctly ran a RISC-V program Ψ, the prover (when itself implemented in RISC-V) will take at least 30,000 times more cycles than Ψ itself.
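For concreteness, here is the back-of-the-envelope arithmetic behind the 30,000 figure (my accounting, using the numbers above and the observation that the FFTs contribute about as much as the hashing):

[latex]\underbrace{200\,T \text{ bytes} \times 70 \text{ cycles/byte}}_{\text{hashing}} \approx 14{,}000\,T \text{ cycles}, \qquad \times\, 2 \text{ (for the FFTs)} \approx 28{,}000\,T \approx 30{,}000\,T \text{ cycles},[/latex]

versus the roughly T cycles needed to simply run the T-step program.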
These commitment costs are “fast” enough that the bottleneck for the prover may be the field operations it performs in the sum-check protocol (even when incorporating optimizations for small-characteristic fields). So as a rough and speculative ballpark, I’d guess the Jolt prover (when itself implemented in RISC-V) will be about 50,000 times slower than simply running the RISC-V program.
This entire calculation is a little bit silly: When Jolt is deployed, the prover itself is not likely to be implemented in RISC-V. But it does give a rough sense of how one might think about estimating prover overhead for a zkVM.
Although a 50,000-fold slowdown may seem large, it is 20x faster than the 1-million-fold overhead I optimistically estimated was achievable about 18 months ago. The majority of this improvement is due to the smaller amount of committed data (and smaller size of each committed value) unlocked by Lasso and Jolt. The rest is due to better commitment schemes (e.g., improved ability to use fast hash functions and leverage smallness of committed values in hashing-based SNARKs).
2. D&P give fast SNARKs for Keccak and Grøstl. Why these hash functions? What other hash functions are amenable to these techniques?
Keccak/SHA3 is an obvious choice, as it is a NIST standard, an Ethereum pre-compile, and a key bottleneck for Type-1 zkEVMs. (SHA3 is the official NIST standard, Keccak is the Ethereum pre-compile, and they differ only in minor ways that are irrelevant for SNARK costs.)
D&P consider Grøstl because it should lead to an even faster prover, while maintaining many of the benefits of Keccak. In particular, Grøstl has been subjected to strong cryptanalytic scrutiny because it made it to the final round of NIST’s SHA-3 competition (even though it wasn’t selected in the end), and because it uses the AES S-box. And thanks to Intel’s AES acceleration instructions (AES-NI), Grøstl runs even faster than Keccak on Intel chips.
D&P’s prover should be faster for Grøstl than Keccak because Grøstl is essentially natively defined over GF[2^8], which means D&P’s prover can commit to fewer field elements than with Keccak. (See Item #9 below for details on how this benefits the prover.) The upshot is that Grøstl should be even more amenable than Keccak to (recursive) SNARKs, because it’s faster both for the prover and on chip.
D&P’s SNARKs are not at all specific to Keccak and Grøstl. The techniques should be applicable to plenty of other hash functions. For example, D&P expect SHA2 to also be as nice as Keccak, but have not yet worked through the details.
3. I thought that Lasso/Jolt were targeted at elliptic-curve based commitment schemes?
No, Lasso and Jolt are not specific to curve-based commitments. But their benefits over prior work, as of a couple months ago, were most obvious when combined with curve-based commitments. This is because curve-based commitments pay a particularly high price when the prover has to commit to random field elements, so Lasso/Jolt’s novel ability to avoid this has its most dramatic performance implications when these commitments are used.
In short, while no one had previously designed SNARKs using curve-based commitments to take advantage of small values being committed, this had, to some extent, already been exploited by hashing-based commitments that work over small fields.
But there are two ways that Lasso and Jolt improve over prior work even when using hashing-based commitments. First, D&P show that hashing-based commitments can benefit from only small field elements being committed in much stronger ways than previously known. For example, whereas today’s commitment schemes incur the same prover cost to commit to a 1-bit value as a 32-bit value, with D&P’s scheme it is nearly 32 times cheaper to commit to a 1-bit value. (See my other post for details.) Second, not only do Lasso and Jolt ensure the prover commits only to small field elements, they also ensure that the prover commits to fewer field elements than with non-sum-check-based SNARKs. Indeed, in Jolt we carefully counted the total bit-complexity across all committed field elements and confirmed that it is substantially less than what is done in existing zkVMs.
Another technical nuisance caused us to highlight curve-based commitments when Lasso/Jolt were released a few months ago: The only hashing-based commitment scheme with polylogarithmic proof size, FRI, was targeted at univariate polynomials, while Lasso/Jolt use multilinear polynomials. (See this recent blog post for an exposition.) Several known transformations adapt FRI into a commitment scheme for multilinear polynomials, but these add overheads (in prover time, proof size, or both) that we consider highly unattractive. The recent BaseFold scheme allows “direct” multilinear commitments with polylogarithmic proof size, although the resulting prover is slower than Brakedown’s and the proofs are bigger than FRI’s.
In contrast to FRI, the Ligero/Brakedown commitment scheme directly applies to multilinear polynomials and has a very fast prover. But previously, applying recursion to lower its proof size was difficult because the verifier does a lot of hash operations, making recursion expensive. By giving much faster SNARKs for hashing, D&P’s work will substantially reduce the cost of this recursion.
4. Haven’t you said that curve-based commitment schemes are faster than hashing-based ones (when committing only to small values, as in Lasso/Jolt)? Doesn’t that contradict your endorsement of D&P’s SNARKs?
First, as I’ve said previously, there are important SNARK applications where hashing-based SNARKs would definitely not be the most performant, because it makes sense to work over the base field of the elliptic curve group. When working over these fields, curve-based commitments are faster. Proving any statement about elliptic curve cryptosystems (including knowledge of ECDSA digital signatures authorizing blockchain transactions) falls into this category.
Second, even in applications where it is reasonable to work over fields of small characteristic, the performance comparison of hashing-based schemes to curve-based ones is complicated. For example, it depends heavily on how fast the hash function used in the hashing-based scheme is. Today, many (but not all) projects use slower hash functions like Poseidon to enable recursion. With such hash functions, hashing-based schemes are definitely slower than curve-based ones (when committing to small values as in Lasso/Jolt). Even with fast hash functions, it’s not at all clear that they are faster (as per my previous comments).
But D&P boost hashing-based commitments by speeding them up when working over fields of characteristic two and allowing the prover to better leverage the smallness of committed values, compared to existing hashing-based schemes like FRI. Hence, my current expectation is that Ligero/Brakedown over fields of characteristic two will be the way to go, unless one is proving statements natively defined over other finite fields.
In summary, until today, the main reason people have believed that hashing-based commitment schemes are faster than curve-based ones is that popular SNARKs like Plonk require the prover to commit to random field elements, rather than small field elements, and curve-based commitment schemes are very slow in this context. Lasso and Jolt show that it is not necessary to have the prover commit to random field elements. And in this context, the comparison is, at a minimum, more nuanced. Up until today, curve-based schemes have actually been faster, but with D&P’s improvements, the reverse is now true (except for statements natively defined over a large field).
5. Haven’t you said that hashing-based commitment schemes have low security?
There is nothing inherently insecure about hashing-based commitment schemes like FRI or Ligero/Brakedown. But projects have pervasively prioritized performance over security by deploying FRI at configurations at which known attacks are close to feasible, and assuming that these known attacks on FRI are exactly optimal.
One nice thing about the Ligero/Brakedown commitment is that the main conjectures about FRI (namely, its conjectured security at proximity parameters beyond the Johnson bound) are not relevant, so there is no temptation for SNARK designers to base security on such conjectures.
Similarly, I have long been apprehensive about the use of ostensibly “SNARK-friendly” hashes like Poseidon in SNARK deployments. The security of these hashes (even the ones that have been around the longest) has not been scrutinized nearly as much as standard hashes like Keccak.
In both cases above, my view is that projects have been jeopardizing security in order to paper over the performance deficiencies of today’s SNARKs. The easiest way to eliminate these practices is to simply develop better-performing SNARKs.
Relatedly, I view the current practice of hand-designing “SNARK-friendly” VMs, and constraint systems implementing those VMs, as a major security issue (as well as a massive consumer of developer resources), due to the error-prone nature of designing the constraint systems and of developing new compilers from high-level languages to the assembly code of the custom VMs. I hope that Jolt renders this practice obsolete, by showing that standard instruction sets are indeed just as SNARK-friendly, and eliminating any need or incentive to hand-design constraint systems implementing those instruction sets.
The way to eliminate a practice with negative security implications is to come up with more performant SNARKs that render the practice irrelevant.
6. D&P use the Reed-Solomon code in their implemented polynomial commitment scheme, but Brakedown can use any code. Is it worth exploring other codes for possible performance benefits?
7. In what ways are Ligero/Brakedown better for the prover than FRI?
D&P’s dramatically improved prover time when committing to very small values is unique to Ligero/Brakedown, at least for the time being. In addition:
- D&P not only improve prover time when committing to small values, they also improve prover space. This is a major bottleneck, especially for hashing-based SNARKs. For example, Polygon’s zkEVM prover today requires over 250 GB of space to prove a circuit that processes a batch of transactions consuming roughly 10 million gas.
- Ligero/Brakedown has much more flexibility in terms of the error-correcting code used. Indeed, much of D&P’s improvements for committing to small values can be obtained simply by using concatenated codes within Ligero/Brakedown.
- When using the Reed-Solomon code, the Ligero/Brakedown prover does many small FFTs rather than one big FFT. This saves a factor of two in runtime of the FFT, and is better for parallelization.
FRI also needs to do both an FFT and an IFFT (at a technical level, this stems from FRI needing to actually evaluate the committed polynomial at many points). The Ligero/Brakedown prover can skip the IFFT (this stems from Ligero/Brakedown’s superior flexibility in its choice of error-correcting code). Skipping the IFFT saves another roughly 33% of prover time when using a “Reed-Solomon blowup factor” of 2, which is the blowup factor that optimizes for prover time (see the rough accounting after this list).
- Ligero/Brakedown evaluation proofs don’t require the prover to do extra Merkle hashing. FRI does (although most invocations of FRI amortize the cost of evaluation proofs over many committed polynomials).
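One rough way to see the 33% figure mentioned above (my accounting, treating the cost of a length-n transform as roughly proportional to n and ignoring logarithmic factors): with a blowup factor of 2, a FRI prover performs an IFFT over a domain of size n plus an FFT over a domain of size 2n, about [latex]n + 2n = 3n[/latex] units of transform work, while the Ligero/Brakedown prover performs only the length-2n FFT, about [latex]2n[/latex] units, for a savings of roughly [latex]n / 3n = 1/3[/latex].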
8. Can you sketch how D&P ensure that committing to a GF[2] element is cheaper than committing to a GF[2^2] element, which in turn is cheaper than a GF[2^4] element, and so on?
The time it takes the prover to commit to a bunch of values with D&P’s commitment scheme roughly depends only on the total number of bits required to specify those values, where b bits are required to specify a value between 0 and 2^b. Other prover costs in D&P’s SNARKs do grow with the number of field elements used to encode those bits (see #9 below for details). Here is how D&P achieve this.
The Brakedown polynomial commitment scheme encodes (subvectors of) the vector of values to be committed with any desired error-correcting code. Say a bunch of the values to be committed are in GF[2], but we’d like the code itself to operate over the larger field GF[2^16]. (There are many technical reasons we’d want to do this, and in fact, it’s necessary if we want to apply certain codes to vectors of length up to 2^16.)
To accomplish this, we can simply use code concatenation, which entails breaking all of the GF[2] values into chunks of size 16, and “packing” each chunk of 16 GF[2] values into a single GF[2^16] field element. This reduces the number of field elements to be committed by a factor of 16. We can then apply any error-correcting code that works over the field GF[2^16] (this is called the “outer code”). Each symbol of the resulting codeword can then be “unpacked” into sixteen GF[2] elements, and the result encoded with an “inner code” that’s defined over GF[2].
Effectively, the concatenated approach lowers the length of the committed vector (as measured by number of field elements) by a factor of 16 but requires the prover to do packing and unpacking operations, as well as apply the inner code to each (unpacked) symbol of the outer codeword.
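To make the packing and unpacking steps concrete, here is a minimal Rust sketch (illustrative only, not D&P’s code; the function names are mine, and the bit ordering is just one convention). It relies on the fact, discussed in #14 below, that under a tower-basis representation, packing sixteen GF[2] values into one GF[2^16] element amounts to concatenating their bit representations.

```rust
/// Pack 16 GF(2) values (each 0 or 1) into a single GF(2^16) element,
/// represented as a u16. Under a tower-basis convention this is just bit
/// concatenation: bit i of the result is the i-th GF(2) value.
fn pack_gf2_into_gf2_16(bits: &[u8; 16]) -> u16 {
    let mut packed: u16 = 0;
    for (i, &b) in bits.iter().enumerate() {
        debug_assert!(b <= 1, "each input must be a GF(2) element");
        packed |= (b as u16) << i;
    }
    packed
}

/// Unpack a GF(2^16) symbol (e.g., a symbol of the outer codeword) back
/// into 16 GF(2) elements, to which an inner code over GF(2) can be applied.
fn unpack_gf2_16(symbol: u16) -> [u8; 16] {
    core::array::from_fn(|i| ((symbol >> i) & 1) as u8)
}

fn main() {
    // 16 one-bit values to be committed, packed into one GF(2^16) element.
    let bits = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1];
    let packed = pack_gf2_into_gf2_16(&bits);
    assert_eq!(unpack_gf2_16(packed), bits);
    // The outer code (e.g., Reed-Solomon over GF(2^16)) operates on a vector
    // of such packed symbols, which is 16x shorter than the original bit
    // vector; each codeword symbol is then unpacked and re-encoded with an
    // inner code over GF(2), as described above.
    println!("packed symbol: {:#06x}", packed);
}
```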
This simple approach, of applying Brakedown with concatenated codes, already achieves many of the benefits of D&P’s work. But D&P take a different approach that leads to an even faster prover (at the cost of slightly bigger proofs). For example, D&P’s actual approach avoids the cost of applying the inner code to each unpacked symbol of the outer codeword.
9. Since D&P’s commitment scheme makes it so cheap to commit to values in {0, 1}, why not just have the prover commit to bit-decompositions of all values arising in the computation? That is, why not implement the entire computation with a Boolean circuit, and have the SNARK assign a “full” field element to each input bit and gate in the circuit?
In their SNARK for Keccak, D&P do actually have the prover only commit to values in {0, 1}, but this is not necessarily a good idea in general.
It is true that D&P’s commitment time roughly grows with the sum of the bit-complexities of all the committed values, independent of how many field elements those values are split across (and this is why it turns out to be a reasonable idea to commit only to one-bit values in a SNARK for Keccak).
However, this does not mean that all costs are independent of the number of field elements committed. In particular, the size of the commitment scheme’s evaluation proofs grows with the square root of the number of committed field elements.
Another cost that grows with the number of field elements committed is the number of terms that need to be summed within some applications of the sum-check protocol in D&P’s SNARKs. Roughly speaking, committing to x times more field elements entails the sum-check prover summing x times as many terms. There are optimizations available that mitigate this overhead, but the mitigation is not perfect. That is, the sum-check prover is still likely to be slower if x one-bit values are committed, compared to when those values are packed into a single x-bit field element before committing.
D&P mitigate this latter cost of committing to many one-bit values by providing sum-check-based protocols to pack many one-bit values into a single field element after those values have been committed. This lets them avoid paying much in terms of sum-check-prover time for the many committed values, while still reaping their benefits (in particular, once a bit decomposition is committed, certain operations like bitwise AND do not cost any additional commitments when they are proved via sum-check).
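To give a cartoon of what such a packing protocol might assert (this is my own illustrative formulation, not necessarily the exact one D&P use): suppose the committed one-bit values are the evaluations of a multilinear polynomial [latex]b[/latex] over [latex]\{0,1\}^{n}[/latex], and let [latex]\{\beta_y\}_{y \in \{0,1\}^{k}}[/latex] be a GF[2]-basis of the extension field GF[2^{2^k}]. Then a “packed” polynomial in [latex]n - k[/latex] variables can be defined as

[latex]p(x) = \sum_{y \in \{0,1\}^{k}} \beta_y \cdot b(y, x),[/latex]

so that each evaluation of p packs [latex]2^k[/latex] one-bit values into a single field element. A claimed evaluation [latex]p(r)[/latex] then reduces, by linearity (or by a small sum-check over the [latex]2^k[/latex] terms), to evaluations of the already-committed [latex]b[/latex], so the packing requires no additional commitments.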
10. What specifically are the benefits of working over fields of characteristic two exploited by D&P?
There are many. Here is a partial list.
- D&P heavily exploit tower field constructions. In the context of fields of characteristic two, this refers to constructing GF[2^2] as a degree-2 extension of GF[2], then GF[2^4] as a degree-2 extension of GF[2^2], then GF[2^8] as a degree-2 extension of GF[2^4], and so forth. Especially efficient tower constructions are known for fields of characteristic two. (A minimal sketch of multiplication in such a tower appears after this list.)
- The sum-check protocol computes [latex]\sum_{x \in \{0, 1\}^n} g(x)[/latex] for a multivariate polynomial g. The Boolean hypercube {0, 1}^n and its subcubes have power-of-two size, so subfields align nicely with subcubes. D&P exploit this to make it easy to pack many small-field elements into a single element of a larger extension field.
- D&P currently use Reed-Solomon codes in the Ligero/Brakedown polynomial commitment scheme. Efficient Reed-Solomon encoding requires additive FFTs, which work very nicely in fields of characteristic two, but not in other fields. However, using other codes would avoid the need for FFTs completely.
- Fields of characteristic two are handled nicely by real hardware. Real-world computers are based on power-of-two-sized data types. You can perfectly fit maximal information in registers, cache lines, and so on, with no padding. Intel even has primitive instructions built into chips that make arithmetic in GF[2^8] especially fast (Galois Field New Instructions [GFNI]). This can be leveraged to achieve very fast GF[2^k] arithmetic, even for k > 8, when tower constructions are used.
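To give a feel for the first bullet, here is a minimal Rust sketch of multiplication in a recursively built binary tower field (a standard Wiedemann-style construction; this is purely illustrative, is not D&P’s implementation, and uses none of the hardware acceleration mentioned in the last bullet; the function name tower_mul is mine).

```rust
/// Multiply two elements of a binary tower field of size 2^bits, where `bits`
/// is a power of two (1, 2, 4, ..., 128) and each element is stored in the
/// low `bits` bits of a u128.
///
/// The tower is built recursively: each level is a degree-2 extension of the
/// previous one via X_k^2 = X_{k-1} * X_k + 1 (with X_0 = 1), so an element
/// at one level is a pair (lo, hi) of elements of the previous level,
/// representing lo + hi * X_k.
fn tower_mul(a: u128, b: u128, bits: u32) -> u128 {
    if bits == 1 {
        // Base case: GF(2) multiplication is AND.
        return a & b & 1;
    }
    let half = bits / 2;
    let mask = (1u128 << half) - 1;
    let (a_lo, a_hi) = (a & mask, a >> half);
    let (b_lo, b_hi) = (b & mask, b >> half);

    let lo_lo = tower_mul(a_lo, b_lo, half);
    let hi_hi = tower_mul(a_hi, b_hi, half);
    // Karatsuba: a_lo*b_hi + a_hi*b_lo (addition is XOR in characteristic two).
    let cross = tower_mul(a_lo ^ a_hi, b_lo ^ b_hi, half) ^ lo_lo ^ hi_hi;
    // Reduce X_k^2 = X_{k-1} * X_k + 1, where X_{k-1} is the element of the
    // half-size field with only its "middle" bit set (or 1 at the base level).
    let prev_gen = 1u128 << (half / 2);
    let hi_hi_shifted = tower_mul(hi_hi, prev_gen, half);

    let lo = lo_lo ^ hi_hi;          // coefficient of 1
    let hi = cross ^ hi_hi_shifted;  // coefficient of X_k
    lo | (hi << half)
}

fn main() {
    // Sanity checks in GF(4) = {0, 1, X, X+1} with X^2 = X + 1:
    assert_eq!(tower_mul(0b10, 0b10, 2), 0b11); // X * X     = X + 1
    assert_eq!(tower_mul(0b10, 0b11, 2), 0b01); // X * (X+1) = 1
    // Every nonzero element of GF(2^16) has multiplicative order dividing
    // 2^16 - 1, so raising any nonzero element to that power gives 1.
    let x: u128 = 0xBEEF;
    let mut pow = 1u128;
    for _ in 0..(u16::MAX as u32) {
        pow = tower_mul(pow, x, 16);
    }
    assert_eq!(pow, 1);
    println!("tower field sanity checks passed");
}
```

The point of the sketch is only that every operation bottoms out in XORs, ANDs, and shifts on power-of-two-sized words, which is exactly the kind of arithmetic that commodity hardware (and instructions like GFNI and carry-less multiply) handles well.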
11. Aren’t there limits to how small one can make SNARK proofs by recursively composing SNARKs using D&P’s commitment scheme with themselves?
Yes, the “recursion threshold” of SNARKs using Ligero/Brakedown commitments is somewhat high. Here, the recursion threshold refers to the proof size below which recursively applying a Ligero/Brakedown-based SNARK to the verifier no longer yields a shorter proof. I expect the recursion threshold to be on the order of a few MBs.
If smaller proofs are desired, I expect that this can be achieved by composing with other, smaller-proof SNARKs, as detailed in #12 below. (In the event this turns out to be wrong, it should not be viewed as a failure of Binius, but rather as an indictment of the scalability of today’s popular SNARKs. If they can’t prove in a reasonable amount of time that a couple of MBs worth of data was hashed, how can we call them scalable?)
Regardless, there are reasons beyond lowering proof size that fast recursive composition is important. Most notably, it is a crucial tool for controlling prover space requirements. Because (non-recursive) SNARKs are highly space-intensive for the prover, people will break large computations up into small pieces, prove each piece separately, and use recursive proving to “link” the pieces together (see this blog post from RISC Zero for an exposition of this). D&P’s fast SNARKs for standard hash functions like Keccak enable this recursive proving to be done quickly, even if the proofs are somewhat large.
12. Suppose you want to compose a SNARK using D&P’s commitment scheme with an elliptic-curve-based SNARK like Plonk or Groth16, in order to reduce verifier costs before posting the proof on-chain. Won’t that require proving statements involving “non-native” field arithmetic, since D&P’s verifier operates over GF[2^{128}], while curve-based SNARKs use large prime-order fields?
Yes, this is a potential challenge. However, I expect that ways will be found to address it.
One possibility is first composing with a SNARK using a hashing-based polynomial commitment scheme with shorter proofs (perhaps FRI, transformed into a multilinear polynomial commitment scheme, or BaseFold) before composing with a curve-based SNARK. Note that FRI can work natively over fields of characteristic two, and in fact this setting was considered in the original FRI paper. Limitations on the use of such fields in currently popular SNARKs come from the use of non-sum-check-based polynomial IOPs, as opposed to FRI itself.
This does not eliminate the issue of non-native field arithmetic, but substantially mitigates it, because the FRI verifier does fewer total operations (for large enough polynomials), and especially fewer field operations, compared to the Ligero/Brakedown verifier.
13. Can SNARKs using D&P’s commitment scheme be combined with folding schemes like Nova?
This will run into the same issue as #12, namely, that folding schemes use elliptic curves, which are typically defined over large prime-order fields, while D&P’s commitment scheme uses power-of-two sized fields.
I expect significant further advancements in folding schemes, and that they will play a major role in the future of SNARK design. However, it may be the case that they simply don’t combine well with hashing-based SNARKs over fields of very small characteristic.
As it stands, elliptic-curve-based commitments and folding schemes should be used for statements natively defined over large fields, or whenever prover space is paramount (folding schemes like Nova are far superior to other SNARKs in terms of prover space, roughly because they can break large computations up into much smaller pieces than other SNARKs). And hashing-based schemes over small-characteristic fields should be used otherwise.
It is also still possible that future advances to folding schemes cause them to overtake hashing-based schemes. Indeed, Nova is already faster than the current generation of hashing-based SNARKs on some benchmarks (despite many claims that current hashing-based SNARKs are faster than curve-based ones).
14. Can’t one “pay by the bit” with hashing-based polynomial commitment schemes simply by concatenating many small field elements together before hashing them? That is, why is D&P’s result not obvious?
Hashing-based commitment schemes do two operations, both of which can be a bottleneck in practice: FFTs (or more generally, the encoding operation of an error-correcting code) and hashing.
Even if all the values in the vector being FFT-ed are small (say a few bits), the twiddle factors in the transformation will still be larger, so one still pays for arithmetic over larger values. Moreover, the vector resulting from the FFT will have “extra” field elements relative to the input to the FFT, and these extra field elements won’t be as small (even worse, if one uses a non-systematic code, as is the case for D&P, then the entire encoded vector won’t be small anymore, not just the “extra” elements). It is the *result* of the FFT that must be hashed when computing a commitment. So not only does the cost of field arithmetic in the FFT not fully benefit from small values being committed, neither does the cost of hashing.
The sensible way to ensure that small values translate to a faster FFT and faster Merkle hashing is to pack many of these small values into a single field element, so that one transforms a shorter vector of bigger values rather than a longer vector of smaller values. Tower fields of characteristic two are designed so that this packing operation literally corresponds to simply concatenating the binary representations of the small values together, to get the binary representation of the (concatenated) field element. This is exactly what D&P do.
However, things are not as simple as the above paragraph might suggest. Most of the complications arise not due to the commitment process itself (which just involves Merkle-hashing of one or more FFT-transformed vectors), but rather due to what happens during the polynomial evaluation proof. At that point in the polynomial commitment scheme, one has multiple (sub)fields floating around, such as the alphabet for Reed-Solomon encoding, the extension field from which random values are drawn during the evaluation procedure, etc. It is the interplay of these different fields that is complicated to define and analyze.
***
Justin Thaler is Research Partner at a16z and an Associate Professor in the Department of Computer Science at Georgetown University. His research interests include verifiable computing, complexity theory, and algorithms for massive data sets.