Table of contents
- General resources
- Defining consensus
- Synchronous protocols
- Partially synchronous protocols
- Asynchronous protocols
- DAG protocols
- Permissionless protocols
Editor’s note: a16z crypto has had a long series of the “canons” — from our our original Crypto Canon to our DAO Canon and NFT Canon to, more recently, our Zero Knowledge Canon (also with an annotated reading list).
We’ve now culled and curated a set of resources for anyone seeking to understand, go deeper, and/or build with consensus — the systems of agreement that enable cryptocurrencies to work, determining the validity of transactions and governance of the blockchain.
Consensus protocols are a central piece of everything that is going on in the world of blockchain. But, the literature can sometimes be hard to get a handle on, so here we share a list of links that should help get you up to date on the foundations as well as the latest. We’ll categorize the links below depending on the type of protocol discussed. First, though, a list of some general resources, which give a great overview of existing research.
Decentralized Thoughts. This blog is run by Ittai Abraham and Kartik Nayak but also has many contributions from other leading researchers. It starts right from the basics, but you can also find simple explanations of recent papers.
Consensus in 50 pages. Notes by Andrew Lewis-Pye covering the key results from the classical consensus literature. The version at this link is under construction and is frequently updated. See also the a16z crypto seminars based on these notes (Part I, Part II).
Foundations of Distributed Consensus and Blockchains. A preliminary draft of the textbook by Elaine Shi.
Foundations of Blockchains. A lecture series on YouTube by Tim Roughgarden.
Blockchain Foundations. Lecture notes focused on proof-of-work and proof-of-stake protocols by Dionysis Zindros.
The three consensus problems studied most are Byzantine Broadcast, Byzantine Agreement, and State Machine Replication (the problem that blockchain protocols solve). For an explanation of the relationship among these problems, see either Consensus in 50 Pages (listed above), or these blogs at Decentralized Thoughts: “What Is Consensus?” and “Consensus for State Machine Replication.”
The Byzantine Generals Problem (1982) by Leslie Lamport, Robert Shostak, and Marshall Pease.
This paper introduces the well-known “Byzantine Generals Problem.” It’s still worth a read, but better versions of some of the proofs can be found elsewhere. For the proof that one can solve the problem for any number of faulty processors given a public-key infrastructure (PKI), a simpler and more efficient version can be found in the paper by Dolev and Strong (see below in the section on “synchronous protocols”). For the famous impossibility result that, in the absence of a PKI, the problem is unsolvable unless less than a third of processors display Byzantine faults, a more understandable proof can be found in the paper by Fischer, Lynch, and Merritt (also below).
Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial (1990) by Fred Schneider.
You should also take a look at this older paper, which treats the problem of State-Machine-Replication (SMR) – the problem solved by blockchain protocols.
The following links are categorized according to the sort of protocol considered, starting with permissioned protocols (as considered in most of the classical literature). Permissioned protocols are those in which all participants are known from the start of the protocol execution. In the links below, permissioned protocols are further classified according to the model of message reliability: either synchronous, partially synchronous, or asynchronous.
For an explanation of these terms, see: “Synchrony, Asynchrony and Partial Synchrony” at Decentralized Thoughts. For a summary of results obtained in the different models, see the Decentralized Thoughts Cheat Sheet. |
We are in the “synchronous’” setting when message delivery is reliable, that is, messages are always delivered and there exists some finite known bound on the maximum time for message delivery. For a formal definition, see the links given above.
Authenticated Algorithms for Byzantine Agreement (1983) by Danny Dolev and H. Raymond Strong.
There are two significant proofs here. There is a proof that one can solve Byzantine Broadcast for any number of faulty processors given a public-key infrastructure (PKI). For another exposition of this, see “Dolev-Strong Authenticated Broadcast” at Decentralized Thoughts. There is also a proof that f+1 rounds are necessary to solve Byzantine Broadcast if up to f processors may be faulty. For a simpler proof see A Simple Bivalency Proof that t-Resilient Consensus Requires t+1 Rounds by Marcos Aguilera and Sam Toueg.
Easy impossibility Proofs for Distributed Consensus Problems (1986) by Michael Fischer, Nancy Lynch, and Michael Merritt.
See also recent talks that cover this, by Andrew Lewis-Pye and Tim Roughgarden.
Bounds on Information Exchange for Byzantine Agreement (1985) by Danny Dolev and Rüdiger Reischuk.
There aren’t that many forms of impossibility proof in the consensus literature. This is an important one that shows how to put a lower bound on the number of messages that need to be sent to solve consensus problems.
“The Phase King Protocol,” from the paper Bit Optimal Distributed Consensus (1992) by Piotr Berman, Juan Garay, and Kenneth Perry.
If you want to see a protocol solving Byzantine Agreement in the synchronous setting without PKI, this is probably the most informative. For a recent blogpost that explains this clearly, see “Phase-King through the lens of Gradecast: A simple unauthenticated synchronous Byzantine Agreement” at Decentralized Thoughts.
Roughly, we are in the “partially synchronous” setting when message delivery is sometimes reliable and sometimes not. Protocols are required to ensure “safety” at all times but need only be “live” during intervals when message delivery is reliable. The standard way to model this is to assume the existence of an unknown “Global Stabilization Time” (GST) after which messages will always be delivered within a known time bound. For a formal definition, see the links in the box above.
Consensus in the Presence of Partial Synchrony (1988) by Cynthia Dwork, Nancy Lynch, and Larry Stockmeyer.
This is the classic paper that introduces the partially synchronous setting and proves many of the key results.
The Latest Gossip on BFT consensus (2018) by Ethan Buchman, Jae Kwon, and Zarko Milosevic.
Given the right presentation, the Tendermint protocol (described in this paper) is sufficiently simple that it is a good way to learn State-Machine-Replication in the partially synchronous setting. A very simple presentation can be found in Consensus in 50 pages (see above), and there are also clear presentations in talks by Andrew Lewis-Pye and Tim Roughgarden.
Streamlet: Textbook Streamlined Blockchains (2020) by Benjamin Chan and Elaine Shi.
This paper describes a blockchain protocol that is specifically designed to be easy to teach. You can find a lecture by Elaine Shi on it here.
Casper the Friendly Finality Gadget (2017) by Vitalik Buterin and Virgil Griffith.
This is the protocol that forms the backbone of Ethereum’s present approach to proof-of-stake. It is essentially a “chained” version of Tendermint. For an explanation of “chaining” see the Hotstuff paper listed below.
HotStuff: BFT Consensus in the Lens of Blockchain (2018) by Maofan Yin, Dahlia Malkhi, Michael K. Reiter, Guy Golan Gueta, and Ittai Abraham.
This was essentially the protocol that Facebook’s Libra project (renamed Diem) originally intended to implement. The advantage over Tendermint is that the protocol is optimistically responsive, which means that confirmed blocks can be produced at “network speed” when leaders are honest, that is, there is no requirement to spend a predefined minimum time producing each confirmed block. You can also watch a talk by Ittai Abraham on this here.
Expected Linear Round Synchronization: The Missing Link for Linear Byzantine SMR (2020) by Oded Naor and Idit Keidar.
This paper addresses the issue with Hotstuff that it doesn’t establish any efficient mechanism for “view synchronization.” This blog by Dahlia Malkhi and Oded Naor gives an overview of work on the view synchronization problem. See also this further optimisation by Andrew Lewis-Pye and Ittai Abraham.
Paxos Made Simple (2001) by Leslie Lamport.
If you don’t want to jump straight in with recent blockchain protocols such as Tendermint, an alternative is to start with Paxos (which doesn’t deal with Byzantine failures) and then move on to PBFT, which is the next link on our list (and which does).
Practical Byzantine Fault Tolerance (1999) by Miguel Castro and Barbara Liskov.
This is the classic PBFT protocol. A great talk on the protocol by Barbara Liskov can be found here.
In the “asynchronous” setting, messages are guaranteed to arrive but could take any finite amount of time. For a formal definition, see the links in the box above.
Impossibility of Distributed Consensus with One Faulty Process (1985) by Michael Fischer, Nancy Lynch, and Michael Paterson.
The FLP Theorem (named after the authors) is probably the most famous impossibility result in the literature on consensus protocols: No deterministic protocol solves Byzantine Agreement (or SMR) in the asynchronous setting when even a single unknown processor may be faulty. You can find a nice presentation in a lecture by Tim Roughgarden here.
“Bracha’s Broadcast,” first appeared in the paper Asynchronous Byzantine Agreement Protocols (1987) by Gabriel Bracha.
One way of getting around the FLP impossibility theorem is to weaken the termination requirement. Bracha’s Broadcast is a deterministic protocol that functions in the asynchronous setting by solving a weaker form of Byzantine Broadcast that doesn’t require termination in the case the broadcaster is faulty. While Bracha’s Broadcast first appears in the paper above, the paper also shows how to use the broadcast protocol to solve Byzantine Agreement with the help of randomness. If you just want to learn Bracha’s Broadcast, then a clear presentation can be found here.
FastPay: High-Performance Byzantine Fault Tolerant Settlement (2020) by Mathieu Baudet, George Danezis, and Alberto Sonnino.
This paper describes how to implement a payment system in the asynchronous setting using reliable broadcast (and without the need to establish a total ordering).
If you really need to solve Byzantine Agreement or SMR in the asynchronous setting, then the FLP result means you will have to use some form of randomness. As well as Bracha’s paper (listed above), the following two links are classics from the literature that describe how to solve Byzantine Agreement using randomness:
Validated Asynchronous Byzantine Agreement with Optimal Resilience and Asymptotically Optimal Time and Word Communication (2018) by Ittai Abraham, Dahlia Malkhi, and Alexander Spiegelman.
An alternative route to understanding how to solve SMR (and Byzantine Agreement) in the asynchronous setting is to jump in with the paper above, which modifies Hotstuff. If you already understand Hotstuff, then the modification is quite simple. One can’t run standard Hotstuff in the asynchronous setting because, after a leader is selected, the adversary can just withhold messages from that leader. Since honest parties don’t know whether the leader is dishonest and isn’t sending messages, or whether the leader is honest and their messages are being delayed, eventually they are forced to try and make progress another way. To solve the issue, we simply have all parties act as leader simultaneously. Once a super-majority of parties successfully completes a standard “view” of the Hotstuff protocol, we retrospectively select a leader at random. If they have produced a confirmed block, then we use that one, discarding the rest.
Dumbo-MVBA: Optimal Multi-valued Validated Asynchronous Byzantine Agreement, Revisited (2020) by Yuan Lu, Zhenliang Lu, Qiang Tang, and Guiling Wang.
This paper optimizes the previous one by Abraham, Malkhi, and Spiegelman, reducing the expected communication complexity.
The Honey Badger of BFT Protocols (2016) by Andrew Miller, Yu Xia, Kyle Croman, Elaine Shi, and Dawn Song.
In Search for an Optimal Authenticated ByzantineAgreement (2020) by Alexander Spiegelman.
The advantage of asynchronous protocols is that they are able to make progress even when message delivery is not reliable. A disadvantage is that communication costs are not optimal (in various ways) when network conditions are good. The paper above addresses the question “to what extent can we get the best of both worlds.”
There is a flurry of recent work on permissioned DAG-based protocols. These are protocols in which the set of confirmed blocks forms a directed acyclic graph, rather than being linearly ordered. Generally, these operate in either the asynchronous or partially synchronous settings.
In this a16z crypto seminar, Andrew Lewis-Pye gives an overview of DAG-based consensus.
The following four papers describe DAG protocols that achieve an efficient total ordering on transactions. DAG-Rider operates in the asynchronous setting and is similar to Cordial Miners but has higher latency and lower expected (amortized) communication complexity. Narwhal is a mempool protocol, and Tusk is an SMR protocol operating on top of Narwhal that improves the efficiency of DAG-Rider in certain respects. Bullshark is similar but optimized to take advantage of good network conditions when those occur in the partially synchronous setting.
All You Need is DAG (2021) by Idit Keidar, Lefteris Kokoris-Kogias, Oded Naor, and Alexander Spiegelman.
This is the paper that introduces the DAG-Rider protocol.
Narwhal and Tusk: A DAG-based Mempool and Efficient BFT Consensus (2022) by George Danezis, Lefteris Kokoris-Kogias, Alberto Sonnino, and Alexander Spiegelman.
Bullshark: DAG BFT Protocols Made Practical (2022) by Alexander Spiegelman, Neil Giridharan, Alberto Sonnino, and Lefteris Kokoris-Kogias.
Cordial Miners: Blocklace-Based Ordering Consensus Protocols for Every Eventuality (2022) by Idit Keidar, Oded Naor, and Ehud Shapiro.
It’s a fun fact that one doesn’t actually need a blockchain to implement a decentralized payments system — the latter is a strictly easier task (see this paper for a proof). Before analyzing how to establish a total ordering on transactions, the Cordial Miners paper above first describes a deterministic (and very elegant) DAG protocol that successfully implements payments in the asynchronous setting.
Permissionless protocols are those with permissionless entry: Anybody is free to join in the process of reaching consensus, and the set of participants may even be unknown at any point during the protocol execution.
Bitcoin: A Peer-to-Peer Electronic Cash System (2008) by Satoshi Nakamoto.
You’ve heard of this one. Here also is a blog post by Kartik Nayak that intuitively analyzes the need for different aspects of the protocol, such as proof-of-work, and how network synchrony plays a role in the protocol.
Bitcoin and Cryptocurrency Technologies (2016) by Arvind Narayanan, Joseph Bonneau, Edward Felten, Andrew Miller, and Steven Goldfeder.
This textbook gives a nice introduction to Bitcoin for those new to the space. There is also an associated free Coursera course.
On a more technical level, the following three papers analyze security and liveness for Bitcoin, using slightly different modeling assumptions. The “Bitcoin Backbone” paper is the most famous. Heavy notation makes it hard to read, but the basic idea behind the proof is not as complicated as it initially seems. The proof by Dongning Guo and Ling Ren explains the basic ideas and is shorter and simpler.
Everything is a Race and Nakamoto Always Wins (2020) by Amir Dembo, Sreeram Kannan, Ertem Nusret Tas, David Tse, Pramod Viswanath, Xuechao Wang, and Ofer Zeitouni.
In this paper, the authors perform an elegant security analysis for Bitcoin that works by showing that the most obvious attack of racing to build a longer chain is the most effective. The analysis also extends to Ouroboros, SnowWhite, and Chia (all listed below).
Then the three following papers describe different forms of attack on Bitcoin and the old proof-of-work Ethereum.
Majority is not Enough: Bitcoin Mining is Vulnerable (2014) by Ittay Eyal and Emin Güun Sirer.
This is the well-known “selfish mining” paper.
Eclipse Attacks on Bitcoin’s Peer-to-Peer Network (2015) by Ethan Heilman, Alison Kendler, Aviv Zohar, and Sharon Goldberg.
Low-Resource Eclipse Attacks on Ethereum’s Peer-to-Peer Network (2018) by Yuval Marcus, Ethan Heilman, and Sharon Goldberg.
FruitChains: A Fair Blockchain (2017) by Rafael Pass and Elaine Shi.
The paper above is a response to the issue of selfish mining. The authors describe a protocol such that the honest strategy for miners is a form of approximate equilibrium.
Prism: Deconstructing the Blockchain to Approach Physical Limits (2019) by Vivek Bagaria, Sreeram Kannan, David Tse, Giulia Fanti, and Pramod Viswanath.
In Bitcoin, blocks play multiple roles in the sense that they are used to list transactions but also in reaching consensus in block ordering. In the paper above, the authors deconstruct Nakamoto’s blockchain into its basic functionalities and show how to construct a proof-of-work protocol with high throughput and low latency.
The two following papers show how to implement longest chain proof-of-stake protocols with provable guarantees.
Algorand: Scaling Byzantine Agreements for Cryptocurrencies (2017) by Yossi Gilad, Rotem Hemo, Silvio Micali, Georgios Vlachos, and Nickolai Zeldovich.
This paper shows how to implement a classical BFT-style protocol as a proof-of-stake protocol. Here is a talk on Algorand by Silvio Micali.
Combining GHOST and Casper (2020) by Vitalik Buterin, Diego Hernandez, Thor Kamphefner, Khiem Pham, Zhi Qiao, Danny Ryan, Juhyeok Sin, Ying Wang, and Yan X Zhang.
Three Attacks on Proof-of-Stake Ethereum (2022) by Caspar Schwarz-Schilling, Joachim Neu, Barnabé Monnot, Aditya Asgaonkar, Ertem Nusret Tas, and David Tse.
The present version of Ethereum needs more analysis. This paper describes some attacks.
The Chia Network Blockchain (2019) by Bram Cohen and Krzysztof Pietrzak.
This paper shows how to build a longest chain protocol using proof of space and time.
Byzantine Generals in the Permissionless Setting (2021) by Andrew Lewis-Pye and Tim Roughgarden.
In this paper, the authors develop a framework for the analysis of permissionless protocols that allows one to do things such as prove impossibility results for permissionless protocols, and to clearly delineate the general capabilities of proof-of-work and proof-of-stake protocols.
***
Andrew Lewis-Pye is a Professor at the London School of Economics. He has worked in various fields, including mathematical logic, network science, population genetics, and blockchain. For the last four years his research focus has been on blockchain, where his principal interests are in consensus protocols and tokenomics. You can find him on Twitter @AndrewLewisPye .
Acknowledgments: Many thanks to Ling Ren, Ittai Abraham, Kartik Nayak, Valeria Nikolaenko, Alexander Spiegelman, and Mathieu Baudet for useful suggestions. (And to Tim Roughgarden for suggesting the piece as the latest in a16z crypto’s Canons series!)
***
The views expressed here are those of the individual AH Capital Management, L.L.C. (“a16z”) personnel quoted and are not the views of a16z or its affiliates. Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by a16z. While taken from sources believed to be reliable, a16z has not independently verified such information and makes no representations about the enduring accuracy of the information or its appropriateness for a given situation. In addition, this content may include third-party advertisements; a16z has not reviewed such advertisements and does not endorse any advertising content contained therein.
This content is provided for informational purposes only, and should not be relied upon as legal, business, investment, or tax advice. You should consult your own advisers as to those matters. References to any securities or digital assets are for illustrative purposes only, and do not constitute an investment recommendation or offer to provide investment advisory services. Furthermore, this content is not directed at nor intended for use by any investors or prospective investors, and may not under any circumstances be relied upon when making a decision to invest in any fund managed by a16z. (An offering to invest in an a16z fund will be made only by the private placement memorandum, subscription agreement, and other relevant documentation of any such fund and should be read in their entirety.) Any investments or portfolio companies mentioned, referred to, or described are not representative of all investments in vehicles managed by a16z, and there can be no assurance that the investments will be profitable or that other investments made in the future will have similar characteristics or results. A list of investments made by funds managed by Andreessen Horowitz (excluding investments for which the issuer has not provided permission for a16z to disclose publicly as well as unannounced investments in publicly traded digital assets) is available at https://a16z.com/investments/.
Charts and graphs provided within are for informational purposes solely and should not be relied upon when making any investment decision. Past performance is not indicative of future results. The content speaks only as of the date indicated. Any projections, estimates, forecasts, targets, prospects, and/or opinions expressed in these materials are subject to change without notice and may differ or be contrary to opinions expressed by others. Please see https://a16z.com/disclosures for additional important information.