1. Introduction

1.1 Formal Verification of Microprocessors

Nowadays, microprocessors are in use in many safety-critical environments, such as cars or planes. We therefore consider the correctness of such components as a matter of vital importance.

Verifying the correctness of microprocessors used to be done by extensive tests. However, the state space of modern microprocessors is huge and tests never attain full coverage, especially for 64-bit designs. We therefore think formal verification is the sole way to obtain a guarantee.

This formal verification should be done such that any third party is able to verify the correctness with low effort, i.e., we aim to provide a proof of correctness that can be checked mechanically. In particular, we think that all critical designs should be delivered in form of a four-tuple: 1) the design itself, 2) a specification, 3) a human-readable proof, and 4) a machine-verified proof. Moreover, we think that there will be a considerable market for such four-tuples.

Let us motivate why we distinguish human-readable proofs and machine-readable proofs and why we demand for both. This is not a common demand. In industrial environments, low-effort but automatized verification is preferred.

However, proofs written for theorem proving systems tend to be hard to read. This becomes worse the higher the grade of automatization of the theorem proving system is. We think that this leads to two drawbacks: Without a human-readable proof, one completely depends on the soundness of the theorem proving system. This includes that one depends on the clarity and accuracy of the specification language of the theorem proving system.

The second drawback is that automatized design verification is of no aid in understanding the designs. In contrast to that, we have experienced that writing proofs, in particular the human-readable proofs, is producing generic theories and design approaches previously unknown. We therefore claim that providing human-readable proofs will aid automatizing the actual design process, since generic theories allow for the development of non-specialized tools with diversified use.

In this thesis, we present proofs of correctness for complex microprocessors. Designing microprocessors is considered an error-prone process. Due to the complexity of the designs, errors often remain undiscovered even in case extensive testing is done. A well known example for this is the Pentium FDIV bug [Coe95, Pra95].

1.2 Related Work

There are many publications on the formal verification of sequential machines, e.g., Cohn verified the VIPER processor [Coh87], Joyce verified the Tamarack [Joy88a, Joy88b], Hunt verified the FM8501 [Hun94], and Windley verified the AVM-1 [Win95].

In [HP96, PH94], Hennessy and Patterson describe a 32-bit RISC architecture, the DLX, which serves as basis for many microprocessor verification projects. In [MP95], Mueller and Paul describe sequential DLX designs at gate level, including a machine with precise interrupts.

The formal verification of a pipelined processor is reported in [BS89]: Bickford and Srivas verify a three stage DLX-like RISC processor. In [LO96], Levitt and Olukotun verify a five-stage DLX pipeline by transforming it back into a sequential machine by removing stalling and rollback logic.

In [Hos00], Hosabettu verifies both in-order and out-of-order DLX implementations that are not synthesizeable. The pipelined implementation has a trivial stalling logic. The verification is done using the completion Related Work function approach and PVS.

Further literature on the verification of pipelined machines is [LO96], which covers automatic verification of pipelined microprocessors, [BM96] provides a manual proof of a DLX pipeline, Burch, Dill [BD94] verify a very simple pipeline. Henzinger et.al. [HQR98] use refinement mappings in order to model-check a RISC pipeline.

Besides PVS, there are more theorem proving systems that are applied for hardware verification, such as HOL [CGM86] or ACL2 [KM96]. There has been much success in verifying complete, complex systems using theorem provers [BS89, HGS99, SH99]. However, theorem proving systems always involve much manual work.

Recent papers show the correctness of complex designs or schedulers in theorem proving systems such as PVS. Hosabettu et al. [HGS99] prove both safety and liveness of Tomasulo's algorithm using PVS. Swada and Hunt [SH99] provide an ACL2 proof of a complete design implementing a Tomasulo scheduler with reorder buffer.

Henzinger et al. [HQR98] verify a simple pipelined processor using a model checker. McMillan [McM98] partly automates the proof by refinement of Tomasulo's algorithm presented in [DP97] with the help of compositional model checking. This technique is improved in [McM99b] by theorem proving methods to support an arbitrary register size and number of function units.

In the literature cited above, the complex designs are verified at very high levels of abstraction. In particular, there is even not much literature on details of actually implementing complex microprocessors. Gate-level descriptions of microprocessors usually never go beyond simple machines, with the exception of [Lei99] and [MP00]: In [Lei99], Holger Leister presents out-of-order designs and evaluates the architectures regarding hardware cost and performance. The correctness is argued using paper-and-pencil proofs but not verified by means of machine.

In [MP00], Silvia M. Mueller and Wolfgang J. Paul present gate-level designs of pipelined DLX implementations including a machine with full IEEE floating point arithmetic and interrupts. The correctness of the machines is argued as follows: The authors build a sequential machine but with the structure of a pipelined machine. This machine is called prepared sequential machine. The authors transform this prepared sequential machine into a pipelined machine by adding interlock and forwarding hardware. This is supported by introducing the concept of a stall engine. The stall engine encapsulates the logic required for generating clock enable signals for the individual pipeline stages.

The correctness of the pipelined machine is argued as follows: given the correctness of the prepared sequential machine, the authors prove the pipeline to be correct by arguing that it simulates the prepared sequential machine. This is done using a scheduling function. This function maps a configuration of the physical machine to a configuration of the abstract reference machine.

1.3 Contribution

In this thesis, we provide a rigorously formal approach to hardware verification. The designs presented in this thesis include state of the art schedulers, such as the Tomasulo scheduler [Tom67] and speculation. In contrast to most of the literature, the designs we provide are very close to gate level. In particular, we are synthesizing some of the designs for the XILINX FPGA series.

These designs are of high complexity, and so are the proofs. In contrast to [MP95, Lei99, MP00], the proofs are machine verified using the theorem proving system PVS [CRSS94]. However, we never present the original PVS proof in this thesis. We aim to provide proofs that come close to comprehensible paper-and-pencil proofs in the tradition of [KP95, MP95, MP00]. We aim to maintain the full formal reasoning of the PVS proofs, to the extent that the proofs are reviewable on a line-per-line basis. This resulted in several PVS proofs to be re-written due to better readability of the paper version of the proof.

In order to verify sequential machines, we extend the data consistency invariant given in [MP00] by defining a "correct value" of an implementation register such as IR.2. Given the correctness of functional components such as the ALU, this allows for an almost fully automated proof of the data consistency of the prepared sequential machine using PVS. We argue that the correct functional components provide correct results if given correct inputs.

We extend the stall engine concept presented in [MP00] by providing a fully generic stall engine design. In contrast to [MP00], our stall engine design supports an arbitrary number of stages and allows for stalling (and therefore clocking) all stages independently. We formally verify data Organization consistency and liveness properties for this stall engine.

Using this extended stall engine, we can significantly improve the process of transforming the prepared sequential machine into the pipelined machine by providing a tool that does this transformation automatically. This includes the generation for forwarding and interlock hardware. In particular, the transformation of the PC environment of the DLX with Delayed PC, i.e., removing the DP C register, turns out to be a special case of adding forwarding.

We then prove the data consistency of the pipelined machine. We do so by showing that the inputs of the pipeline stages are correct. Using this fact, we argue the correctness of the output values as we do for the sequential prepared machine, since the functional components of the machines are identical.

We present a generic approach to speculative execution and propose a data consistency criterion for such a machine. We then apply this method in order to implement and prove DLX pipelines with branch prediction and precise interrupts. It is a well-known fact that both techniques are implemented using speculation [SP88]. However, to the best of our knowledge, implementing both techniques as an instance of a generic speculation mechanism is done for the first time.

Besides the in-order pipelines, we verify the correctness of the Tomasulo scheduling algorithm with reorder buffer as described in [KMP99]. The reorder buffer realizes in-order termination which allows implementing precise interrupts. The proof of correctness covers the arguments neccessary to show the uniqueness of the tags.

Furthermore, we rigorously prove the liveness of all machines we design, i.e., we prove that any given instruction sequence is executed within a finite amount of time. Although critical, liveness issues are often not covered in the open literature.

1.4 Organization

Chapter 2 describes basic concepts. We introduce the mathematical hardware model, and describe the implementation and verification of basic circuits, such as adders. We use these basic circuits in order to implement and verify an ALU. We then provide a formal specification of a DLX RISC microprocessor without interrupts and floating point instructions.

In chapter 3, we describe how we model the hardware of a microprocessor. We describe the extended stall engine for the prepared sequential machine. We introduce the functions used in order to model the registers, the circuits between the registers and the forwarding logic. We use this formalism in order to implement and verify a prepared sequential DLX. We also show the liveness of the prepared sequential machine.

In chapter 4, we describe how the stall engine is modified in order to get a pipelined machine. We describe how to add the forwarding and interlock hardware and prove the correctness of the pipelined machine. This comprises of both data consistency and liveness.

In chapter 5, we describe a generic approach to speculative execution. We prove its data consistency and liveness. We implement two machines as examples: the first machine guesses whether branches are taken or not. The second machine guesses whether we have an interrupt or not. We prove that this realizes precise interrupts according to the specification given in [MP00].

In chapter 6, we describe the results of verifying an out-of-order DLX with Tomasulo scheduler as presented in [Krö99].