Chapter 1 Introduction
The performance of today's microprocessors is astonishing. Besides
progress in wafer technology, sophisticated scheduling algorithms have
contributed substantially to the improvements achieved in recent years.
One of the major scheduling algorithms used in recent CPUs was specified
as early as 1967 by Robert M. Tomasulo [Tom67].
However, concrete data on the impact of the algorithm on hardware cost and
cycle time has so far been missing.
This thesis therefore presents a detailed implementation of the Tomasulo
scheduling algorithm for the DLX RISC architecture [HP96]. The design
is based on a machine presented in [Lei98] and fully supports precise
interrupts by means of a reorder buffer [SP88]. Cost and cycle time are
calculated and evaluated with the formal model presented in [MP95], and
the results are compared to other DLX implementations.
1.1 Results
The Tomasulo scheduling algorithm is one of the most competitive scheduling
algorithms. It achieves CPI rates as low as 1.1, as shown by simulations on
common benchmarks in [Ger98, Del98]. This thesis shows that adding a
Tomasulo scheduler has no impact on the cycle time of the CPU design.
The Tomasulo scheduling algorithm with precise interrupts is known to be
expensive in terms of hardware cost. A complete CPU core design comprises
about 236,000 gate equivalents, which is about twice as many as a pipelined
design with the same function units requires. Relative to the total cost of
a CPU design (including the first-level cache), however, this is an increase
of only 26 percent, in return for 44 percent higher performance.
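To put the two percentages side by side, the following minimal sketch computes the relative performance per unit cost implied by the figures above. The plain ratio used here is an assumption made for illustration only; it is not the quality measure of the formal model in [MP95], and the function name is hypothetical.

```python
def performance_per_cost(cost_increase: float, performance_increase: float) -> float:
    """Relative performance per unit cost, with the baseline design at 1.0.

    Both arguments are fractional increases over the baseline
    (e.g. 0.26 for a 26 percent higher cost).
    """
    return (1.0 + performance_increase) / (1.0 + cost_increase)


# Figures quoted in the text: 26 percent higher total cost,
# 44 percent higher performance.
ratio = performance_per_cost(0.26, 0.44)
print(f"{ratio:.3f}")
```

Under this simple measure, a value above 1.0 means that the performance gain outweighs the additional cost.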
1.2 Outline
Chapter 2 describes some basic concepts, such as the hardware model, and
gives a rough overview of the design. It also includes a terse introduction
to the Tomasulo scheduling algorithm itself. Chapter 3 presents the
gate-level implementation details of everything except the memory system,
which is presented separately in Chapter 4. Chapter 5 analyzes the cost and
cycle time of the design. The results are compared to other DLX
implementations in order to evaluate the impact of different scheduling
algorithms on overall design quality. Chapter 6 contains the correctness
proof for the hardware.