New paper alert: Atomic and fair data exchange via blockchain

Ertem Nusret Tas, István András Seres, Yinuo Zhang, Márk Melczer, Mahimna Kelkar, Joseph Bonneau, Valeria Nikolaenko

In a new paper, we outline a protocol for building fair data markets – including for expired blobs under EIP-4844, also known as “protodanksharding.”

Our Fair Data Exchange protocol enables a storage server to transfer a data file to a client atomically: The client receives the file if and only if the server receives an agreed-upon payment. In short, this new approach allows you to buy a potentially large blob of data (that has the properties you think it has) with minimal trust assumptions in a decentralized and open market. To date, it’s been hard to sell data without revealing it publicly while also guaranteeing that the seller receives payment once the data has been sent. This new scheme protects both sides of the transaction. It also introduces a new cryptographic primitive, Verifiable Encryption under Committed Key (VECK), which generalizes verifiable encryption to symmetric key primitives.

Background

Fair and secure protocols to purchase access to data are essential to unlock the massive potential of global data markets. But most real-world approaches for accessing data today are either subscription-based, where the client pays the server in advance and must trust the reputation of the server to deliver the data; or altruistic, where either the server provides the data free of charge or the data is exchanged between the users themselves. The former approach does not safeguard the client from a malicious server that receives payment without fulfilling the data request, while the latter lacks incentives for clients to offer data for download, often leading to free-riding and limited capacity. Most of these systems do not provide data integrity guarantees to users.

Blockchains – originally conceived of as payment systems – can also be used for data storage, guaranteeing that every transaction in a block is accessible to every participant in the network (e.g., OP_RETURN in Bitcoin or calldata in Ethereum). As append-only, immutable, and distributed ledgers, blockchains can provide data storage that is robust against faults and abuses of power. But by themselves (without any form of sharding or roll-ups), blockchains are highly limited in storage capacity, since all validators replicate all data, making on-chain storage expensive. (As of March 15, 2024, storing 1MB on the Ethereum blockchain in calldata would cost roughly 2,500 USD.)
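To see how a figure of that order can arise, here is a back-of-the-envelope sketch. The 16 gas per non-zero calldata byte comes from Ethereum's gas schedule; the gas price (40 gwei) and ETH price (3,700 USD) are illustrative assumptions rather than figures from the paper, and 1MB is taken to mean 2^20 bytes of non-zero data.

```python
GAS_PER_NONZERO_CALLDATA_BYTE = 16  # Ethereum gas schedule (EIP-2028)
DATA_BYTES = 2**20                  # assumption: "1MB" read as 2^20 bytes, all non-zero
GAS_PRICE_GWEI = 40                 # assumption: illustrative gas price
ETH_PRICE_USD = 3_700               # assumption: illustrative ETH price


def calldata_cost_usd(num_bytes: int, gas_price_gwei: float, eth_price_usd: float) -> float:
    """Cost in USD of posting num_bytes of non-zero calldata (base transaction fee ignored)."""
    gas = num_bytes * GAS_PER_NONZERO_CALLDATA_BYTE
    eth = gas * gas_price_gwei * 1e-9  # 1 gwei = 1e-9 ETH
    return eth * eth_price_usd


cost = calldata_cost_usd(DATA_BYTES, GAS_PRICE_GWEI, ETH_PRICE_USD)
print(f"{cost:,.0f} USD")  # roughly 2,500 USD under these assumptions
```

The result scales linearly with both prices, so the estimate moves a lot from day to day.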

While many blockchain projects are working to improve capacity and reduce costs, these systems will likely always be limited as increased capacity is at odds with maintaining security and decentralization (the “blockchain trilemma”).

Limited on-chain capacity has led to the development of layer-2 systems, which perform computation off-chain. The main blockchain (now called a layer-1) verifies off-chain computation using non-interactive validity proofs (zk-rollups) or interactive fraud proofs (optimistic rollups). Data is stored on-chain in traditional rollups, but the cost of on-chain storage has led to designs that store data off-chain as well (often called validiums). In this approach, the layer-1 blockchain stores only a succinct commitment to the data. This commitment can be used to verify proofs-of-storage or proofs-of-replication, attesting to the persistence of the data off-chain.

While these protocols provide a natural mechanism to pay to store data, they don’t provide a means to pay for serving the data when clients request it. Today’s systems only incentivize storage, and assume servers will provide download access “for free.” This is problematic for two reasons: First, transferring the data comes with its own costs, for which servers should be compensated. Second, without any incentives, malicious servers might store data (and receive payment for doing so) but never respond to legitimate download requests. Storage is useless unless the data is made available for access.

Fair data exchange 

We’d like a way for clients to pay for data with a guarantee that the server will provide it if payment is collected. This is an instance of the more general fair exchange problem, which is known to require a trusted third party (e.g., a blockchain). In principle, a blockchain can solve it in a straightforward way: a smart contract keeps the client’s payment in escrow and releases it to the server once the server posts data with some specified properties on-chain. But this is not efficient if the data is large, because all of the data must be posted on-chain for the smart contract to verify it.
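The straightforward approach can be sketched as follows. This is a hypothetical Python model of an escrow contract, not real contract code: the class `NaiveEscrow`, its method names, and the use of a SHA-256 digest as the "specified property" are all assumptions made for illustration.

```python
import hashlib


class NaiveEscrow:
    """Hypothetical escrow: pays the server only when the full data is posted on-chain."""

    def __init__(self, expected_digest: bytes, price: int) -> None:
        self.expected_digest = expected_digest
        self.price = price
        self.locked = 0
        self.onchain_bytes = 0  # bytes the chain had to carry for this exchange

    def deposit(self, amount: int) -> None:
        """Client locks the agreed payment."""
        if amount != self.price:
            raise ValueError("deposit must equal the agreed price")
        self.locked = amount

    def post_data(self, data: bytes) -> int:
        """Server posts the whole file; the contract checks it and releases the payment."""
        self.onchain_bytes += len(data)  # the data is on-chain whether or not the check passes
        if self.locked == 0:
            raise RuntimeError("no payment in escrow")
        if hashlib.sha256(data).digest() != self.expected_digest:
            raise ValueError("data does not match the agreed digest")
        payout, self.locked = self.locked, 0
        return payout


data = b"\x01" * 100_000
escrow = NaiveEscrow(hashlib.sha256(data).digest(), price=100)
escrow.deposit(100)
payout = escrow.post_data(data)
print(payout, escrow.onchain_bytes)  # 100 100000
```

The exchange is atomic, but the on-chain footprint grows linearly with the file, which is exactly the cost the rest of this post tries to avoid.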

Fair data exchange (FDE), the protocol we introduce in the paper, relies on a blockchain to enforce the atomicity and fairness of the exchange, and uses VECK (Verifiable Encryption under Committed Key), a new cryptographic primitive, to ensure that the client receives the correct data (matching an agreed-upon commitment) off-chain before releasing the payment for the decrypting key. The protocol is trust-minimized and requires only constant-sized on-chain communication: 3 signatures, 1 verification key, and 1 secret key, with most of the data stored and communicated off-chain. 

The protocol also supports exchanging only a subset of the data. Further, it can amortize the server’s work across multiple clients, and offers a general framework to design alternative FDE protocols using different commitment and encryption schemes. 

Here’s how it works, in three essential steps:

  1. The Client requests the data from the Server, specifying a KZG commitment of the expected data and a public encryption key.
  2. The Server encrypts the data using the Client’s specified key and sends the encrypted data to the Client off-chain, along with a zero-knowledge proof (VECK proof) that the encrypted data matches the specified commitment.
  3. The Server sells the decryption key on-chain, using a simple atomic transfer.
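The message flow of these steps can be sketched with a toy model. Everything below is a stand-in chosen for illustration, not the paper's construction: a SHA-256 digest replaces the KZG commitment, a hash-based keystream replaces the real encryption scheme, the VECK proof is omitted entirely, the key pair (`sk`, `vk`) is generated by the server, and the group parameters `P` and `G` are toy values with no security. The `Escrow` class is a hypothetical model of the on-chain part.

```python
import hashlib
import secrets

P = 2**127 - 1  # toy prime modulus (assumption; not a secure choice)
G = 3           # toy base (assumption)


def commit(data: bytes) -> bytes:
    """Stand-in for the KZG commitment: a plain SHA-256 digest."""
    return hashlib.sha256(data).digest()


def keystream_xor(key: int, data: bytes) -> bytes:
    """Stand-in cipher: XOR with a SHA-256 keystream (encrypts and decrypts)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key.to_bytes(16, "big") + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], pad))
    return bytes(out)


class Escrow:
    """Hypothetical contract: releases the payment iff the revealed sk matches vk."""

    def __init__(self, vk: int, price: int) -> None:
        self.vk, self.price = vk, price
        self.locked = 0
        self.revealed_sk = None

    def deposit(self, amount: int) -> None:
        if amount != self.price:
            raise ValueError("deposit must equal the agreed price")
        self.locked = amount

    def reveal(self, sk: int) -> int:
        if self.locked == 0:
            raise RuntimeError("no payment in escrow")
        if pow(G, sk, P) != self.vk:
            raise ValueError("secret key does not match the verification key")
        self.revealed_sk = sk  # now public, so the client can read it
        payout, self.locked = self.locked, 0
        return payout


# Step 1: the client names the commitment it expects.
data = b"blob contents " * 10
expected = commit(data)

# Step 2 (off-chain): the server encrypts and sends the ciphertext.
# A real run would also send a VECK proof here; this toy has none.
sk = secrets.randbelow(P - 2) + 1
vk = pow(G, sk, P)
ciphertext = keystream_xor(sk, data)

# Step 3 (on-chain): payment and key change hands in one atomic step.
escrow = Escrow(vk, price=100)
escrow.deposit(100)
payout = escrow.reveal(sk)

plaintext = keystream_xor(escrow.revealed_sk, ciphertext)
assert commit(plaintext) == expected
```

Only `vk`, `sk`, and the payment touch the chain here; the file itself never does. Without the proof in step 2, though, the client in this toy learns whether the ciphertext was honest only after paying, which is the gap VECK is designed to close.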

The bulk of the communication takes place off-chain, and the server’s work is reusable across different clients requesting the same data. The key technical challenge is in step 2: efficiently proving that encrypted data matches a specified commitment. This is where our new VECK primitive comes in (which may be of independent interest for other protocols).

Applications

FDE can be used to pay for access to data stored using EIP-4844 (protodanksharding) and full danksharding data availability schemes for Ethereum. These protocols allow data to expire, meaning that after a predetermined period of time (4,096 epochs; ≈18 days), validators are no longer obliged to store it. Our protocol may encourage nodes to continue storing the data, because they can profit from selling it later via FDE, which could alleviate some concerns around the loss of expired data. Both protocols also commit to data via KZG polynomial commitments, exactly what our FDE construction is optimized to support.
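The "≈18 days" figure follows from Ethereum's consensus timing. The 32 slots per epoch and 12 seconds per slot used below are the standard mainnet parameters; they are not stated in this post.

```python
EPOCHS = 4096
SLOTS_PER_EPOCH = 32   # Ethereum mainnet consensus parameter
SECONDS_PER_SLOT = 12  # Ethereum mainnet consensus parameter

retention_seconds = EPOCHS * SLOTS_PER_EPOCH * SECONDS_PER_SLOT
retention_days = retention_seconds / 86_400

print(retention_seconds)        # 1572864
print(f"{retention_days:.1f}")  # 18.2
```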

We instantiate VECK in two ways: our first protocol uses the exponential ElGamal encryption scheme, while the second uses the Paillier encryption scheme to encrypt the exchanged data. We also provide an open-source implementation (see the repo here) of our protocol with both instantiations of VECK, demonstrating our protocol’s efficiency and practicality on Ethereum as well as on Bitcoin via adaptor signatures. Concretely, the on-chain cost of a single FDE run on Ethereum is ~200,000 gas for the server (under 10 USD) and ~30,000 gas for the client (under 1 USD).
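For readers unfamiliar with exponential ElGamal, here is a minimal textbook sketch of the scheme on its own. It is not the paper's VECK construction or its implementation. The group parameters are toy values chosen so the example runs instantly, and the function names are hypothetical.

```python
import secrets

P = 467  # toy safe prime, P = 2*Q + 1 (assumption; far too small for real use)
Q = 233  # prime order of the subgroup generated by G
G = 4    # generator of the order-Q subgroup


def keygen() -> tuple[int, int]:
    """Return (secret key, public key)."""
    sk = secrets.randbelow(Q - 1) + 1
    return sk, pow(G, sk, P)


def encrypt(pk: int, m: int) -> tuple[int, int]:
    """Encrypt a small integer m 'in the exponent': (g^r, g^m * pk^r)."""
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), (pow(G, m, P) * pow(pk, r, P)) % P


def add(ct_a: tuple[int, int], ct_b: tuple[int, int]) -> tuple[int, int]:
    """Multiplying ciphertexts componentwise adds the plaintexts (mod Q)."""
    return (ct_a[0] * ct_b[0]) % P, (ct_a[1] * ct_b[1]) % P


def decrypt(sk: int, ct: tuple[int, int]) -> int:
    """Recover g^m, then find m by brute force; only feasible for small m."""
    c1, c2 = ct
    g_m = (c2 * pow(c1, -sk, P)) % P  # negative exponent needs Python 3.8+
    for m in range(Q):
        if pow(G, m, P) == g_m:
            return m
    raise ValueError("plaintext not found")


sk, pk = keygen()
total = decrypt(sk, add(encrypt(pk, 5), encrypt(pk, 7)))
print(total)  # 12
```

Two properties are visible even in the toy: ciphertexts can be combined without decrypting them, and decryption requires solving a small discrete logarithm, so messages must come from a small range.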

Beyond these, FDE may be used for any scenario in which a client wants to buy very large amounts of data (with some expected properties) in a decentralized and open market with minimal trust assumptions. Our protocols also support purchasing data privately.

Future work

Directions for future work could include:

  • Exploring other combinations for commitment and encryption schemes,
  • Optimized implementations (see the paper for open issues) to make the protocol faster both for the client and for the server,
  • Generalizing the scheme to allow verifiable pre-processing of the data, including the case when the data is distributed, and
  • Designing pricing mechanisms for a decentralized data marketplace.

Finally, if clients and servers are rational, no one is griefed in our protocol. But irrational or malicious clients may cause honest servers to keep sending encrypted data and then never request the decryption key. Although this problem was out of scope for the paper, it will have to be mitigated in practice.

***

The paper emerged from a16z crypto’s summer internship program, and is joint work with four summer interns. For details, proofs, and citations, read the paper.

***

Ertem Nusret Tas is a PhD student in electrical engineering at Stanford University working with Prof. David Tse on the analysis of blockchain.

István András Seres is a PhD student in computer science at Eötvös Loránd University.

Yinuo Zhang is a PhD student in computer science at the University of California, Berkeley. 

Márk Melczer is a PhD student in computer science at Eötvös Loránd University. 

Mahimna Kelkar is a PhD student in computer science at Cornell University. 

Joseph Bonneau is a Research Partner at a16z crypto. His research focuses on applied cryptography and blockchain security. He has taught cryptocurrency courses at the University of Melbourne, NYU, Stanford, and Princeton, and received a PhD in computer science from the University of Cambridge and BS/MS degrees from Stanford.

Valeria Nikolaenko is a Research Partner at a16z crypto. Her research focuses on cryptography and blockchain security. She has also worked on topics such as long-range attacks in PoS consensus protocols, signature schemes, post-quantum security, and multi-party computation. She holds a PhD in Cryptography from Stanford University under the advisorship of Professor Dan Boneh, and worked on the Diem blockchain as part of the core research team.
