Chapter 7 Perspective
Several aspects of the Tomasulo scheduler have not been examined in this
thesis. They are left for further research.
- Modern CPUs with a Tomasulo scheduler issue up to eight instructions
in one cycle. Extending the hardware to handle multiple instruction
issue is the subject of a thesis by Mark A. Hillebrand [Hil99].
- The design presented in this thesis stalls on each conditional
branch until the branch operand is available. To make effective use
of a scheduling algorithm, branch prediction is required to eliminate
these stalls. This is also part of the thesis of Mark A. Hillebrand
[Hil99].
- The Tomasulo scheduling algorithm has built-in support for function
units with variable latency. This allows floating point units to share
expensive components such as the rounder, which results in a slightly
lower IPC but large cost savings.
Further cost savings are possible by encoding the rounding mode in the
instruction word, which removes the need to forward it. Another
cost-saving opportunity is to drop the forwarding of the interrupt mask:
the floating point rounder itself could access the special purpose
register SR (status register) directly and stall if the register is not
available.
- The present design already supports 64-bit wide floating point
operands. The extension to 64-bit wide integer operands is therefore
available at very low extra cost. Simulations should show how much
IPC improvement wider integer operands make possible.
- The hardware model used in this thesis does not take wiring into
account. An improved model presented in [PS98] includes the
significant impact of wiring on cost and delay.
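
To give a rough feel for the item on conditional branches above, the
following Python sketch estimates how branch stalls affect IPC. It is a
toy model, not part of the design in this thesis: the function names,
the single-issue base CPI of 1, and the assumption that a mispredicted
branch costs exactly as many cycles as an unpredicted one are all
illustrative assumptions, and the numbers in the usage note are not
measurements.

```python
def cycles_per_instruction(branch_fraction, stall_cycles,
                           prediction_accuracy=0.0, base_cpi=1.0):
    """Toy CPI estimate (hypothetical model, not taken from the thesis).

    branch_fraction     -- share of instructions that are conditional branches
    stall_cycles        -- cycles lost per branch that is not predicted correctly
    prediction_accuracy -- share of branches predicted correctly
                           (0.0 models a design that always stalls)
    base_cpi            -- CPI in the absence of branch stalls (assumed 1.0)
    """
    if not 0.0 <= branch_fraction <= 1.0:
        raise ValueError("branch_fraction must lie in [0, 1]")
    if not 0.0 <= prediction_accuracy <= 1.0:
        raise ValueError("prediction_accuracy must lie in [0, 1]")
    if stall_cycles < 0 or base_cpi <= 0:
        raise ValueError("stall_cycles must be >= 0 and base_cpi > 0")
    # Only branches that are not predicted correctly pay the penalty.
    penalised = branch_fraction * (1.0 - prediction_accuracy)
    return base_cpi + penalised * stall_cycles


def instructions_per_cycle(branch_fraction, stall_cycles,
                           prediction_accuracy=0.0, base_cpi=1.0):
    """IPC is the reciprocal of the CPI estimate above."""
    return 1.0 / cycles_per_instruction(branch_fraction, stall_cycles,
                                        prediction_accuracy, base_cpi)
```

For example, with an assumed 20% branches and 3 lost cycles per stalled
branch, the model gives a CPI of about 1.6 without prediction and about
1.06 with 90% prediction accuracy. Whether the real design behaves this
way would have to be shown by simulation.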