
Chapter 1   Introduction

The performance of today's microprocessors is astonishing. Besides progress in wafer technology, the development of sophisticated scheduling algorithms has contributed greatly to the improvements achieved in recent years. One of the major scheduling algorithms used in recent CPUs was specified as early as 1967 by Robert M. Tomasulo [Tom67]. However, concrete data on the impact of the algorithm on hardware cost and cycle time has so far been missing.

This thesis therefore gives a detailed implementation of the Tomasulo scheduling algorithm for the DLX RISC architecture [HP96]. The design is based on a machine presented in [Lei98] and fully supports precise interrupts by means of a reorder buffer [SP88]. Cost and cycle time are calculated and evaluated using the formal model presented in [MP95]. The results are compared to other DLX implementations.

1.1   Results

The Tomasulo scheduling algorithm is one of the most competitive scheduling algorithms. It achieves CPI rates as low as 1.1, as shown by simulations on common benchmarks in [Ger98, Del98]. This thesis shows that adding a Tomasulo scheduler has no impact on the cycle time of the CPU design.

The Tomasulo scheduling algorithm with precise interrupts is known to be expensive in terms of hardware cost. A complete CPU core design comprises about 236,000 gate equivalents, which is about twice as many as a pipelined design with equal function units requires. Relative to the total cost of a CPU design (including the first-level cache), however, this is an increase of only 26 percent, in exchange for 44 percent higher performance.
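
To put the two percentages side by side, the following minimal sketch computes the relative gain in performance per unit of cost. It assumes that design quality can be judged by the plain ratio of performance to cost. That ratio is an assumption made here for illustration only and is not the quality measure of the formal model in [MP95]. The function name performance_per_cost_gain is hypothetical.

```python
def performance_per_cost_gain(cost_increase: float, perf_increase: float) -> float:
    """Return the factor by which performance per unit cost changes.

    Both arguments are relative increases over the baseline design,
    e.g. 0.26 for a 26 percent higher cost.
    Assumption: quality is the plain ratio performance / cost.
    """
    return (1.0 + perf_increase) / (1.0 + cost_increase)


# Figures quoted above: 26 percent higher total cost, 44 percent higher performance.
gain = performance_per_cost_gain(cost_increase=0.26, perf_increase=0.44)
print(f"performance per cost changes by a factor of {gain:.3f}")
```

Under this assumed measure, the Tomasulo design delivers roughly 14 percent more performance per unit of total cost than the pipelined design.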

1.2   Outline

Chapter 2 describes some basic concepts, such as the hardware model, and gives a rough overview of the design. It also includes a brief introduction to the Tomasulo scheduling algorithm itself. Chapter 3 presents the gate-level implementation details of everything except the memory system, which is presented separately in chapter 4. Chapter 5 analyzes the cost and cycle time of the design. The results are compared to other DLX implementations in order to evaluate the impact of different scheduling algorithms on overall design quality. Chapter 6 contains the correctness proof for the hardware.