Pro tip: Implementing stateful invariant testing with Halmos

Antonio Viggiano

Editor’s note: A frequently-asked question we get about Halmos, a16z crypto’s open source formal verification tool, is: “Can you use it for stateful invariant testing?” It’s not supported out of the box; but in this post, independent security researcher Antonio Viggiano shares a new approach to stateful invariant testing, discovered while working with our engineering team on Halmos’s performance. See his blog, Fuzzy, for more on fuzz testing, invariant testing, symbolic testing, and formal verification.

Formal verification and invariant testing are two different approaches to evaluating whether a program works as expected. Unlike formal verification, invariant testing evaluates the program’s behavior for some inputs, while formal verification can check it for all possible inputs. Formal verification may seem like the more efficient, more comprehensive solution. But in reality, these processes are complimentary: They have different strengths and provide the greatest assurance when used together. 

Halmos, an open source formal verification tool developed by a16z crypto, can help developers bridge the gap between invariant testing and formal verification by allowing them to reuse the same properties written for invariant testing for formal specifications through symbolic testing. Similarly, we can use Halmos for invariant testing to prove that protocol invariants hold up to some arbitrary number of transactions. Though invariant testing is not the intended use case, developers looking for creative ways to expand their quality assurance toolset can also use Halmos for stateless testing and, now, stateful testing with a few small adjustments.  

In this post, we dive into implementing stateful invariant testing using Halmos, a symbolic testing tool for EVM smart contracts. First, we present a brief overview of symbolic testing and stateful invariant testing; then discuss how Halmos compares to traditional fuzz testing tools. Finally, we look at a practical case study on a simplified version of the Multi-Collateral DAI system, and how Halmos can be used to enhance the reliability and security of smart contracts.

A brief overview of symbolic testing using Halmos

Symbolic testing involves executing tests with symbolic values, significantly reducing the need for extensive specifications. Unlike conventional testing, which confirms program functionality for specific inputs, symbolic testing evaluates a program against all conceivable inputs. This level of rigor and comprehensiveness is uniquely useful in a blockchain setting, where a program’s correctness is critically important. 

Halmos is a symbolic testing tool for EVM smart contracts, developed by a16z crypto. It simplifies the process by using existing tests as the basis for formal specifications, instead of requiring new specifications or languages.

Source: Symbolic testing with Halmos: Leveraging existing tests for formal verification 

The field of symbolic testing is closely related to fuzz testing with the difference that, instead of trying random inputs in order to break system properties or invariants, Halmos will check an invariant against the whole range of valid inputs. Because of this, Halmos can provide a higher assurance than a fuzzer. While Foundry, Echidna, or Medusa, can verify the presence of a bug relatively fast, they cannot, in general, assert their absence. 

When tests are run through Halmos, it either confirms the tests are valid for all inputs or provides examples where they fail. This approach makes use of existing unit tests or fuzz tests for formal verification, soo developers can focus on improving and extending the testing suite, rather than creating new specifications.

What about stateful invariant testing?

My previous research has shown that complex protocols may require many different function calls with input in a specific range before invariants are broken — sometimes as many as 10 different transactions. This shows how stateful invariant tests can play a crucial role in the quality assurance strategies that a project should adopt alongside stateless invariant tests.

Stateful testing consists of maintaining the system’s internal state across multiple test cases of function executions. This methodology can more closely resemble real-world scenarios, as it takes into account the impact of previous transactions on new ones and explores the dependencies of different paths to break system invariants. 

Most fuzzers support stateful fuzzing out of the box, which involves evaluating a collection of invariant expressions using random sequences of specified function calls from predetermined contracts. These invariants are checked after every function call execution and are an effective method for revealing logic errors in intricate protocol states and atypical scenarios.

Halmos natively supports stateless invariant testing, which operates by resetting the system state between each test case, but it does not support stateful testing by default. The tool was conceived to be plugged into existing Foundry property-based tests, which generally don’t retain state between multiple executions. 

While it’s not the intended use case, there is a way to make stateful testing work with Halmos.

Echidna-output-of-a-stateful-fuzzing-campaign-on-a-protocol-1.pngSource: Echidna output of a stateful fuzzing campaign on a protocol 

Implementing stateful invariant testing with Halmos

In order to circumvent the current limitations (or, rather, design decisions) of Halmos, we choose to use the strengths of symbolic testing and execute function calls symbolically

This means that, instead of implementing a target contract to fuzz against (which is usually what we would do with Echidna), we implement a target function that receives an array of function selectors that can execute any arbitrary function call. Then, we simply loop over this array, filter for the selectors we’re interested in, create symbolic inputs Halmos cheat codes, and bind them to valid ranges in order to optimize for the SMT-solving phase.

Source: aviggiano/property-based-testing-benchmark 

As a result of this setup, Halmos will now execute this test with a series of N symbolic function selectors and execute these functions from our target contract, with the size of the sequence equal to the number of loop unrolling limits: 

In order to evaluate symbolic execution tools for EVM bytecode, we implement a simplified version of the Multi-Collateral DAI system created by Palina Tolmach against the MiniVat smart contract. In this example, Halmos was able to find a counterexample with concrete values for the broken invariant in 12 seconds, although the timeout can grow rapidly for a higher number of loop unrolling limit. 

This experiment can be thought of as running Echidna with a sequence length of N, which yields a similar output:

Running Halmos with an array of function selectors can be useful in checking for invariants that depend on a sequence of multiple function calls. This approach also demonstrates the flexibility of Halmos as a symbolic execution tool.

By applying this strategy, we hope further research will help determine the applicability of Halmos in assessing protocol-level invariants of complex projects. A wider, more flexible range of quality assurance options, which includes invariant testing, fuzzing, and formal verification, can help ensure critical protocols behave as intended.