Symbolic testing with Halmos: Leveraging existing tests for formal verification

2.2.23

Formal verification vs testing
Specification overhead
Bridging the gap between testing and formal verification with symbolic testing and Halmos
Example: Testing the isPowerOfTwo() function
Limitation: Bounded symbolic execution
Demo: Formal verification of ERC721A with Halmos
Verification results
Experiment with injected bugs
Related work

While you’re reading hover Bold⁽¹⁾ text for more info.

Formal verification vs testing
Specification overhead
Bridging the gap between testing and formal verification with symbolic testing and Halmos
Example: Testing the isPowerOfTwo() function
Limitation: Bounded symbolic execution
Demo: Formal verification of ERC721A with Halmos
Verification results
Experiment with injected bugs
Related work

While you’re reading hover Bold⁽¹⁾ text for more info.

Formal verification — the process of using mathematical methods to “inspect” a program or smart contract across any number of inputs — is generally seen as the more concise, more comprehensive alternative to traditional testing for writing higher quality, more secure code. But in reality, formal verification is an open-ended and interactive process. Much like unit testing, developers must dynamically define and layer on formal specifications, iterating on their approach as their code and analyses evolve. Further, formal verification is only as effective as its specifications, which can be time consuming to write (and often come with a steep learning curve).

For many developers who find the process daunting, unit tests are often the more cost-effective and time-efficient route to sussing out a program’s correctness. In practice, formal verification is not a more comprehensive alternative to unit testing, but a complementary one. These processes have different strengths and limitations, providing even greater assurance when used together. Developers will always need to write unit tests — so what if we could use the same properties for formal verification?

Halmos, our open source formal verification tool, allows developers to reuse the same properties written for unit tests for formal specifications through symbolic testing. Rather than having to create a robust set of properties from the start, developers can avoid duplicative work and improve their tests a few specifications at a time without starting from scratch. We designed this tool for use, alongside others in the formal verification process, as an on-ramp to formal verification; developers can start with a few analyses with minimal effort, before adding more later in the process.

In this post, we cover the challenges of formal verification, and the potential to bridge the gap between unit testing and formal verification with symbolic testing. We then walk through a demo of Halmos using existing smart contract code, and take a quick look at other formal verification tools (many open source) available to developers.

Formal verification vs testing

Formal verification — generally favored by blockchain developers for its rigor and comprehensiveness — is the process of proving the correctness of a program by verifying that it satisfies certain correctness properties. The properties, which are specific to the program, are usually provided externally and expressed using a formal language or notation that is supported by the verification tool being used. Developers often perceive formal verification as a push-button solution for testing properties across all possible scenarios automatically, but in reality, formal verification can be a labor-intensive and highly interactive process.

Like formal verification, unit testing involves evaluating whether a program works as expected; testing, however, only checks the program’s behavior for some inputs, while formal verification can check it for all possible inputs. Both testing and formal verification require a description of the expected behavior of the program (with test cases used in testing and formal specifications used in formal verification). But, when used together, they can provide a more thorough examination of a program. Testing, for instance, is effective at finding simple bugs and mistakes, but is generally quicker and easier to perform. Formal verification, while more cumbersome to use, is powerful enough to prove the absence of errors or reveal subtle bugs that are easy to miss in testing or code reviews.

Specification overhead

One of the main challenges of formal verification is the overhead of writing and maintaining formal specifications. This process often involves the time-consuming task of manually writing the specifications using a specialized language (that many developers will need to learn first). The process is also incremental, typically starting with writing simple properties and verifying them first, then gradually adding more complex properties on top. Like testing, it is an open-ended process, with no clear stopping point, and one can only add as many properties as possible within the available time frame. Additionally, when developers change the code as it’s being verified, they must also update their existing specifications, leading to extra maintenance efforts. These factors can make formal verification a daunting task for some developers hesitant to commit to the extra overhead.

And although testing and formal verification can improve code quality when used together, both require (sometimes similar) descriptions of a program’s expected behavior in different languages and formats. Writing and maintaining these descriptions is labor-intensive, and maintaining two different forms of the same description can feel like duplicated effort. This raises the following question: Is it possible to describe the expected behavior in a way that developers can use for both testing and verification?

Bridging the gap between testing and formal verification with symbolic testing and Halmos

Symbolic testing, the practice of running tests with symbolic inputs, is an effective formal verification method that reduces specification overhead. This approach enables the use of the same test cases for both testing and formal verification. Unlike traditional testing, which verifies that a program works correctly for a limited set of inputs, symbolic testing checks the program for all possible inputs, hence a program that passes symbolic testing can be considered formally verified.

Halmos is a formal verification tool designed for symbolic testing. Instead of requiring separate specifications or learning a new language, Halmos uses existing tests as formal specifications. Running tests through Halmos will automatically verify they pass for all possible inputs, or provide counterexamples. This not only eliminates the need for additional specification writing, but also allows for the reuse of tests written for unit testing or fuzzing, for formal verification purposes.

Developers thus have greater flexibility to choose from a range of quality assurance options, including unit testing, fuzzing, and formal verification, depending on their current needs. For instance, tests may quickly identify simple mistakes, possibly with the help of a fuzzer that generates random inputs, and then Halmos can further increase confidence in the program’s correctness across all inputs.

Example: Testing the isPowerOfTwo() function

As an example, consider the following isPowerOfTwo() function, which determines whether a given number is a power of two. This function uses a bit manipulation algorithm for efficiency, but it can be challenging to prove its correctness, particularly in the case where the input is not a power of two.

Imagine the following test for the isPowerOfTwo() function: it compares the actual output of the function with the expected output for a given input. This is a parameterized test (also known as a property-based test), meaning that you can easily run it with different input values, possibly with fuzzing tools like Foundry.

You can use this test to examine the isPowerOfTwo() function through unit testing or fuzz testing, running it with a selection of inputs. Tests like these cannot formally prove the correctness of the function, as it is computationally infeasible to run the test for every possible input.

Halmos, however, allows developers to reuse these pre-existing tests for formal verification with only a little extra effort. The tool checks that tests pass for all possible inputs by performing symbolic execution of the test and then verifying that the assertion is never violated, (or, if the assertion is violated, by providing a counterexample). This means that if the test passes Halmos, the correctness of the function is formally verified, meaning the algorithm is correctly implemented and has been accurately translated by the compiler into bytecode.

Limitation: Bounded symbolic execution

It is generally not possible to perform fully automatic, complete symbolic testing, as this would require solving the halting problem, which is known to be undecidable. One reason for this is that it is often impossible to determine automatically how many times a loop should iterate symbolically. As a result, fully automatic formal verification is generally undecidable.

Given these fundamental limitations, Halmos prioritizes automation over completeness. To this end, Halmos is designed to perform bounded symbolic reasoning for unbounded loops (where the number of iterations depends on program inputs) or variable-length arrays (including strings). This sacrifices some completeness, but allows Halmos to avoid requiring the user to provide additional annotations such as loop invariants.

For example, consider the following iterative version of the isPowerOfTwo() function, which features an unbounded while loop, where the number of loop iterations is determined by the minimum number of bits required to represent the input number.

Halmos symbolically iterates this unbounded loop only up to a specified bound. For instance, if the bound is set to 3, Halmos will iterate the loop at most 3 times and will not consider input values that would cause the loop to iterate more than 3 times (i.e., any values greater than or equal to 2^3). In this particular case, setting the bound to 256 or higher would allow Halmos to be complete.

Demo: Formal verification of ERC721A with Halmos

To demonstrate the capabilities of Halmos, we used it to symbolically test and formally verify ERC721A, a highly gas-optimized implementation of the ERC721 standard that allows for batch minting at almost the same cost as single minting. ERC721A includes several innovative optimizations to achieve this efficiency; for example, gas can be saved by delaying updates to the token ownership data until the token is transferred, not at the time of minting. This requires the use of complex data structures and algorithms to efficiently retrieve ownership information from the lazy data structure. And although this optimization improves gas efficiency, it also increases code complexity and makes it challenging to prove the implementation’s correctness. This makes ERC721A a good candidate for formal verification, as it can increase confidence in the implementation and benefit potential users.n>

Symbolic testing properties

Since the existing tests for ERC721A were written in JavaScript with Hardhat (which is not currently supported by Halmos), we wrote new tests in Solidity for the main entry point functions: mint(), burn(), and transfer(). These tests checked that each function correctly updates token ownership and balance, and only affects the relevant users without altering the balance or ownership of other users. The latter is non-trivial to prove due to the use of the lazy data structure algorithm in ERC721A. For example, the following test checks that the transfer() function correctly updates the ownership of the specified token:

Another test checks that the transfer() function does not alter the balance for other addresses, which is challenging to prove as mentioned earlier:

Verification results

We conducted a verification experiment using Halmos on the ERC721A smart contract by writing a total of 19 tests. The tests were run through Halmos with a loop unrolling bound of 3, which took 16 minutes to complete. The breakdown of the verification time can be seen in the table below. The experiment was conducted on a MacBook Pro with an M1 Pro chip and 16 GB of memory.

Test	Time (s)
testBurnBalanceUpdate	6.67
testBurnNextTokenIdUnchanged	1.40
testBurnOtherBalancePreservation	5.69
testBurnOtherOwnershipPreservation	189.70
testBurnOwnershipUpdate	3.81
testBurnRequirements	71.95
testMintBalanceUpdate	0.20
testMintNextTokenIdUpdate	0.18
testMintOtherBalancePreservation	0.26
testMintOtherOwnershipPreservation	5.74
testMintOwnershipUpdate	1.38
testMintRequirements	0.09
testTransferBalanceUnchanged	9.03
testTransferBalanceUpdate	53.53
testTransferNextTokenIdUnchanged	4.47
testTransferOtherBalancePreservation	19.57
testTransferOtherOwnershipPreservation	430.61
testTransferOwnershipUpdate	18.71
testTransferRequirements	149.18

While most of the tests were completed in a matter of seconds, a few of them took several minutes. These time-consuming tests were challenging to verify due to the complex nature of the cases to be considered, and were closely related to the correctness of the lazy data structure algorithm.

Overall, the results of this experiment indicate that Halmos is able to effectively verify the correctness of smart contract code. It provides increased confidence in the code’s integrity, despite the complexity and potential edge cases of the smart contract.

Experiment with injected bugs

To demonstrate the effectiveness of Halmos’s bounded reasoning, we used it to detect bugs in a previous version of the ERC721A contract. This version had an issue that mishandled arithmetic overflow and potentially allowed for batch-minting a large number of tokens, which could break the integrity of the lazy data structure and result in some users losing their token ownership (an issue resolved in the current version). We ran the same set of 19 tests for ERC721A on the buggy version, and Halmos was able to find a counterexample for the properties of the mint() function. Specifically, Halmos provided input values that led to the exploit scenario described above. The results of this experiment indicate that, despite its incompleteness, Halmos’s bounded reasoning can be an effective way of detecting and preventing exploitable bugs in smart contracts.

Related work

Formal verification tools for Ethereum smart contract bytecode have been developed by various teams. These tools, including Halmos, can be used to help ensure the security and correctness of smart contracts. Comparing and understanding the different features, capabilities, and limitations of these tools can help developers choose the most appropriate one for their unique needs.

While Halmos is a valuable addition to the toolset available for smart contract verification, it is meant to complement (not replace) existing tools. Developers can combine Halmos with other tools to achieve a higher level of assurance in their contracts. Below, we compare Halmos with a few selected formal tools that support EVM bytecode.

K is a powerful formal verification framework that enables deductive verification and interactive theorem proving. Its underlying theory and implementation provide a high level of expressiveness, making it suitable for verifying complex programs and algorithms. However, it should be noted that K is not designed with heavy emphasis on automated verification and lacks certain automation features which can require more manual effort during the verification process. To use the K framework, formal specifications are written in the K language, which must be learned prior to use. Its strength is particularly useful in verifying complex systems, which may be challenging to analyze using automated reasoning.
Certora is a formal verification tool for smart contracts that focuses on automated verification and supports bounded model checking, similar to Halmos. To use Certora, developers must learn their new language, CVL, in order to write specifications. This language allows for the concise description of many critical properties through contract invariants, a feature that Halmos currently does not support. Despite being a closed-source, proprietary tool, Certora provides robust formal verification tooling, with ongoing development and good user support.

Halmos, on the other hand, is an open-source tool that is smaller in scale and currently lacks some features provided by Certora, but it is meant to serve as a public good and is intended as a niche solution for quickly verifying smart contracts without the need for extensive setup and maintenance.
HEVM is another formal verification tool that is similar to Halmos. It was previously part of DappTools, which is a precursor of Foundry. Both HEVM and Halmos have the feature of not requiring a separate specification and can symbolically execute existing tests to identify assertion violations. This allows users to use both tools interchangeably or run them in parallel for the same tests, providing them with multiple options for symbolic testing.

It’s worth noting that, despite their similarities, HEVM and Halmos have been independently developed and differ in their implementation details; particularly in terms of optimizations and symbolic reasoning strategies. Additionally, HEVM is written in Haskell, while Halmos is written in Python, providing exposure to the rich Python ecosystem. Having the ability to use both tools gives users more flexibility and options to ensure the security and correctness of smart contracts.

Halmos is open source, and currently in its beta phase. We are actively working on new features and functionality, including Foundry cheat codes and several other key usability features. We would greatly appreciate your thoughts on which features are most important, and welcome any feedback, suggestions, and contributions to make Halmos a better tool for everyone.

The views expressed here are those of the individual AH Capital Management, L.L.C. (“a16z”) personnel quoted and are not the views of a16z or its affiliates. Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by a16z. While taken from sources believed to be reliable, a16z has not independently verified such information and makes no representations about the current or enduring accuracy of the information or its appropriateness for a given situation. In addition, this content may include third-party advertisements; a16z has not reviewed such advertisements and does not endorse any advertising content contained therein.

This content is provided for informational purposes only, and should not be relied upon as legal, business, investment, or tax advice. You should consult your own advisers as to those matters. References to any securities or digital assets are for illustrative purposes only, and do not constitute an investment recommendation or offer to provide investment advisory services. Furthermore, this content is not directed at nor intended for use by any investors or prospective investors, and may not under any circumstances be relied upon when making a decision to invest in any fund managed by a16z. (An offering to invest in an a16z fund will be made only by the private placement memorandum, subscription agreement, and other relevant documentation of any such fund and should be read in their entirety.) Any investments or portfolio companies mentioned, referred to, or described are not representative of all investments in vehicles managed by a16z, and there can be no assurance that the investments will be profitable or that other investments made in the future will have similar characteristics or results. A list of investments made by funds managed by Andreessen Horowitz (excluding investments for which the issuer has not provided permission for a16z to disclose publicly as well as unannounced investments in publicly traded digital assets) is available at https://a16z.com/investments/.

Charts and graphs provided within are for informational purposes solely and should not be relied upon when making any investment decision. Past performance is not indicative of future results. The content speaks only as of the date indicated. Any projections, estimates, forecasts, targets, prospects, and/or opinions expressed in these materials are subject to change without notice and may differ or be contrary to opinions expressed by others. Please see https://a16z.com/disclosures for additional important information.