However, in-order execution does not fully utilize all functional parts of a CPU. The rule of in-order execution prohibits that subsequent instructions overtake previous instructions. In figure 2.1, instruction I2 blocks the execute stage for four cycles, since the division function unit has a long latency. Instruction I3 has to be stalled upon the begin of its execution, since the execution stage is blocked by I2 and since it requires the result of I2 (data dependence).
Figure 2.2: Out-of-order execution example. Instructions I2 and I4 cause a WAW hazard, instructions I2 and I3 cause a RAW hazard, and instructions I3 and I4 cause a WAR hazard.
Figure 2.3 gives an overview of the basic data paths of an in-order design and of the same design after adding Tomasulo scheduling with reorder buffer. The Tomasulo scheduling algorithm requires the following data structures and transfer paths:
Basic structure of an in-order design After adding Tomasulo scheduling
The reservation stations provide the operands for the function units. They are basically a queue for the issued instructions. Each reservation station RSi holds exactly one instruction and its operands and has the following components (figure 2.4):
When an instruction completes, both the result and the exception flags are written into the reorder buffer entry pointed to by this reorder buffer tag. In each cycle, the entry at the head of the reorder buffer is tested. If it is valid (i.e., the instruction has completed), a check for exceptions is performed and the data is written into the register file. Depending on the type of the interrupt (abort/repeat/continue), the result of Ii is written into the register file before executing the interrupt service routine.
Name Width Purpose valid 1 valid =1 Û data field contains a valid value data 64 result data dest 4 address of the destination register
ALU reservation station for I2 register file t global operand 1 operand 2 R1 R2 R3 op tag full tag valid data tag valid data tag valid data tag valid data tag valid data 0 - - 0 - - - - - - - 1 0 - 1 9 ROB-0 0 - 1 + ROB-1 1 - 1 9 ROB-0 0 - ROB-1 0 - - 1 9 ROB-0 0 - 2 + ROB-1 1 - 1 9 ROB-0 0 - ROB-1 0 - - 1 9 ROB-0 0 - 3 + ROB-1 1 - 1 9 ROB-0 1 11 ROB-1 0 - - 1 9 ROB-0 0 - 4 - - 0 - - - - - - ROB-1 0 - - 1 9 ROB-0 0 - 5 - - 0 - - - - - - ROB-1 0 - - 1 9 - 1 11 6 - - 0 - - - - - - - 1 20 - 1 9 - 1 11
reorder buffer common t global entry 0 entry 1 data bus ROB.head ROB.tail valid data dest valid data dest req ack tag valid data 0 ROB-0 ROB-1 0 - gpr R3 - - - - - - - 0 - 1 ROB-0 ROB-2 0 - gpr R3 0 - gpr R1 - - - 0 - 2 ROB-0 ROB-2 0 - gpr R3 0 - gpr R1 Mem - - 0 - 3 ROB-0 ROB-2 0 - gpr R3 0 - gpr R1 ALU Mem ROB-0 1 11 4 ROB-0 ROB-2 1 11 gpr R3 0 - gpr R1 - ALU ROB-1 1 20 5 ROB-1 ROB-2 - - - - 1 20 gpr R1 - - - 0 - 6 ROB-2 ROB-2 - - - - - - - - - - - 0 -