Building Helios: Fully trustless access to Ethereum

Noah Citron

One of the main reasons we use blockchains is trustlessness. This property promises to allow us self-sovereign access to our wealth and data. For the most part, blockchains like Ethereum have delivered on this promise — our assets are truly ours. 

However, there are concessions we’ve made for the sake of convenience. One such area is in our use of centralized RPC (remote procedure call) servers. Users typically access Ethereum through centralized providers like Alchemy. These companies run high-performance nodes on cloud servers so that others can easily access chain data. When a wallet queries its token balances or checks whether a pending transaction has been included in a block, it almost always does so through one of these centralized providers. 

The trouble with the existing system is that users need to trust the providers, and there is no way to verify the correctness of their queries.

Enter Helios, a Rust-based Ethereum light client we developed that provides fully trustless access to Ethereum. Helios — which uses Ethereum’s light client protocol, made possible by the recent switch to proof of stake — converts data from an untrusted centralized RPC provider into a verifiably safe, local RPC. Helios works together with centralized RPCs to make it possible to verify their authenticity without running a full node. 

The tradeoff between portability and decentralization is a common pain point, but our client – which we’ve made available to the public to build on and more – syncs in around two seconds, requires no storage, and allows users to access secure chain data from any device (including mobile phones and browser extensions). But what are the potential pitfalls of relying on centralized infrastructure? We cover how they might play out in this post, walk through our design decisions, and also outline a few ideas for others to contribute to the codebase.

The pitfalls of centralized infrastructure: theoretical creatures in Ethereum’s “dark forest”

A (theoretical) creature lurks in the dark forest. This one doesn’t hunt for its prey in the Ethereum mempool, but instead sets its traps by mimicking centralized infrastructure that we’ve come to rely on. The users who get caught in this trap don’t make any mistakes: They visit their favorite decentralized exchange, set a reasonable slippage tolerance, and buy and sell tokens as usual… They do everything right, but still fall victim to a new kind of sandwich attack, a trap meticulously set at the very entrance to Ethereum’s dark forest: RPC providers.

Before we elaborate, let’s look at how trades work on decentralized exchanges. When users send a swap transaction, they provide several parameters to the smart contract — which tokens to swap, the swap amount, and most importantly, the minimum number of tokens a user must receive for the transaction to go through. This last parameter specifies that the swap must satisfy a “minimum output,” or revert. This is often known as “slippage tolerance,” as it effectively sets the maximum price change that can occur between when the transaction is sent to the mempool and when it is included in a block. If this parameter is set too low, the user accepts the possibility of receiving fewer tokens. This situation can also lead to a sandwich attack, where an attacker effectively sandwiches the bid between two malicious swaps. The swaps drive up the spot price and forces the user’s trade to execute at a less favorable price. The attacker then sells immediately to collect a small profit.

So long as this minimum output parameter is set near the fair value, you are safe from sandwich attacks. But what if your RPC provider doesn’t provide an accurate price quote from the decentralized exchange smart contract? A user can then be tricked into signing a swap transaction with a lower minimum output parameter, and, to make matters worse, sends the transaction straight to the malicious RPC provider. Instead of broadcasting this transaction to the public mempool, where dozens of bots compete to perform the sandwich attack, the provider can withhold it and send the attack transaction bundle directly to Flashbots, securing the profits for themselves.

The root cause of this attack is trusting somebody else to fetch the state of the blockchain. Experienced users have traditionally solved this problem by running their own Ethereum nodes — a time and resource-intensive endeavor that, at the very least, requires a constantly-online machine, hundreds of gigabytes of storage, and around a day to sync from scratch. This process is certainly easier than it used to be; groups like Ethereum on ARM have worked tirelessly to make it possible to run nodes on low-cost hardware (such as a Raspberry Pi with an external hard drive strapped to it). But even with these relatively minimal requirements, running a node is still difficult for most users, particularly for those using mobile devices.

It’s important to note that centralized RPC provider attacks, although entirely plausible, are generally simple phishing attacks — and the one we describe has yet to happen. Even though the track records of larger providers like Alchemy give us little reason to doubt them, it’s worth doing some further research before adding unfamiliar RPC providers to your wallet.

Introducing Helios: fully trustless access to Ethereum

By introducing its light client protocol (made possible by the recent switch to Proof of Stake), Ethereum opened exciting new possibilities for quickly interacting with the blockchain and verifying RPC endpoints with minimal hardware requirements. In the month’s since The Merge, we’ve seen a new crop of light clients emerge independently of each other (Lodestar, Nimbus, and the JavaScript-based Kevlar) that have taken different approaches in service of the same goal: efficient and trustless access, without using a full node.

Our solution, Helios, is an Ethereum light client that syncs in around two seconds, requires no storage, and provides fully trustless access to Ethereum. Like all Ethereum clients, Helios consists of an execution layer and a consensus layer. Unlike most other clients, Helios tightly couples both layers so that users only need to install and run a single piece of software. (Erigon is moving in this direction as well, by adding a consensus layer light client built directly into their archive node). 

So how does it work? The Helios consensus layer uses a previously known beacon chain blockhash and a connection to an untrusted RPC to verifiably sync to the current block. The Helios execution layer uses these authenticated beacon chain blocks in tandem with an untrusted execution layer RPC to prove arbitrary information about the chain state such as account balances, contract storage, transaction receipts, and smart contract call results. These components work together to serve users a fully trustless RPC, without the need to run a full node.

…At the consensus layer

The consensus layer light client conforms to the beacon chain light client specification, and makes use of the beacon chain’s sync committees (introduced ahead of the Merge in the Altair hard fork). The sync committee is a randomly selected subset of 512 validators that serve for ~27-hour periods. 

When a validator is on a sync committee, they sign every beacon chain block header that they see. If more than two-thirds of the committee signs a given block header, it is highly likely that that block is in the canonical beacon chain. If Helios knows the makeup of the current sync committee, it can confidently track the head of the chain by asking an untrusted RPC for the most recent sync committee signature. 

Thanks to BLS signature aggregation, only a single check is required to validate the new header. If the signature is valid and has been signed by more than two-thirds of the committee, it’s safe to assume the block was included in the chain (of course it can be reorg’d out of the chain, but tracking block finality can provide stricter guarantees).

There’s an obvious missing piece in this strategy, though: how to find the current sync committee. This starts with acquiring a root of trust called the weak subjectivity checkpoint. Don’t let the name intimidate you — it just means an old blockhash that we can guarantee was included in the chain at some point in the past. There is some interesting math behind exactly how old the checkpoint can be; worst-case analysis suggests around two weeks, while more practical estimates suggest many months. 

If the checkpoint is too old, there are theoretical attacks that can trick nodes into following the wrong chain. Acquiring a weak subjectivity checkpoint is out of band for the protocol. Our approach with Helios provides an initial checkpoint hardcoded into the codebase (which can easily be overridden); it then saves the most recent finalized blockhash locally to use as the checkpoint in the future, whenever the node is synced. 

Conveniently, beacon chain blocks can be hashed to produce a unique beacon blockhash. This means it’s easy to ask a node for a full beacon block, and then prove the block contents are valid by hashing it and comparing to a known blockhash. Helios uses this property to fetch and prove certain fields inside the weak subjectivity checkpoint block, including two very important fields: the current sync committee, and the next sync committee. Critically, this mechanism allows light clients to fast forward through the blockchain’s history.

Now that we have a weak subjectivity checkpoint, we can fetch and verify the current and next sync committees. If the current chain head is within the same sync committee period as the checkpoint, then we immediately start verifying new blocks with the signed sync committee headers. If our checkpoint is several sync committees behind, we can:

  1. Use the next sync committee after our checkpoint to fetch and verify a block that originates one sync committee in the future.
  2. Use this new block to fetch the new next sync committee.
  3. If still behind, return to step 1.

Each iteration of this process allows us to fast forward through 27 hours of the chain’s history, start with any blockhash in the past, and sync to the current blockhash.

…At the execution layer

The goal of the execution layer light client is to take beacon block headers that are verified by the consensus layer, and use them along with an untrusted execution layer RPC to provide verified execution layer data. This data can then be accessed via an RPC server that is locally hosted by Helios.

Here’s a simple example of fetching an account’s balance, starting with a quick primer on how state is stored in Ethereum. Each account contains a couple fields, such as the contract code hash, nonce, storage hash, and balance. These accounts are stored in a large, modified Merkle-Patricia tree called the state tree. If we know the root of the state tree, we can validate merkle proofs to prove the existence (or exclusion) of any account within the tree. These proofs are effectively impossible to forge.

Helios has an authenticated state root from the consensus layer. Using this root and merkle proof requests to the untrusted execution layer RPC, Helios can locally verify all data stored on Ethereum.

We apply different techniques to verify all sorts of data used by the execution layer; used together, these allow us to authenticate all data retrieved from the untrusted RPC. While an untrusted RPC can deny access to data, it can no longer serve us incorrect results.

Using Helios in the wild

The tradeoff between portability and decentralization is a common pain point — but since Helios is so lightweight, users can access secure chain data from any device (including mobile phones and browser extensions). The ability to run Helios anywhere makes it possible for more people to access trustless Ethereum data, regardless of their hardware. This means users can use Helios as their RPC provider in MetaMask, and trustlessly access dapps without any other changes. 

Further, Rust’s support for WebAssembly makes it easily possible for app developers to embed Helios inside Javascript applications (like wallets and dapps). These integrations would make Ethereum safer, and reduce our need to trust centralized infrastructure.

We can’t wait to see what the community comes up with. But in the meantime, there are lots of ways to contribute to Helios — if you’re not interested in contributing to the codebase, you can also build software that integrates Helios to take advantage of its benefits. These are just a few of the ideas we’re excited about:

  • Support fetching light client data directly from the P2P network rather than via RPC
  • Implement some of the missing RPC methods
  • Build a version of Helios that compiles to WebAssembly
  • Integrate Helios directly into wallet software
  • Build a web dashboard to view your token balances that fetches data from Helios embedded into the website with WebAssembly
  • Implement the engine API so that Helios’ consensus layer can be connected to an existing execution layer full node

Check out the codebase to get started — we welcome your bug reports, feature requests, and code. And if you build something more, share it with us on Twitter, Telegram, or Farcaster @a16zcrypto.

 

***
The views expressed here are those of the individual AH Capital Management, L.L.C. (“a16z”) personnel quoted and are not the views of a16z or its affiliates. Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by a16z. While taken from sources believed to be reliable, a16z has not independently verified such information and makes no representations about the current or enduring accuracy of the information or its appropriateness for a given situation. In addition, this content may include third-party advertisements; a16z has not reviewed such advertisements and does not endorse any advertising content contained therein.

This content is provided for informational purposes only, and should not be relied upon as legal, business, investment, or tax advice. You should consult your own advisers as to those matters. References to any securities or digital assets are for illustrative purposes only, and do not constitute an investment recommendation or offer to provide investment advisory services. Furthermore, this content is not directed at nor intended for use by any investors or prospective investors, and may not under any circumstances be relied upon when making a decision to invest in any fund managed by a16z. (An offering to invest in an a16z fund will be made only by the private placement memorandum, subscription agreement, and other relevant documentation of any such fund and should be read in their entirety.) Any investments or portfolio companies mentioned, referred to, or described are not representative of all investments in vehicles managed by a16z, and there can be no assurance that the investments will be profitable or that other investments made in the future will have similar characteristics or results. A list of investments made by funds managed by Andreessen Horowitz (excluding investments for which the issuer has not provided permission for a16z to disclose publicly as well as unannounced investments in publicly traded digital assets) is available at https://a16z.com/investments/.

Charts and graphs provided within are for informational purposes solely and should not be relied upon when making any investment decision. Past performance is not indicative of future results. The content speaks only as of the date indicated. Any projections, estimates, forecasts, targets, prospects, and/or opinions expressed in these materials are subject to change without notice and may differ or be contrary to opinions expressed by others. Please see https://a16z.com/disclosures for additional important information

podcast: web3 with a16z

a show about building the next internet, from a16z crypto

newsletter:
web3 weekly

a newsletter from a16z crypto that's your go-to guide on the next internet