Previous Contents Next

Chapter 6   Correctness

The correctness proof of the Tomasulo scheduling algorithm with reorder buffer itself is already in [Ger98]. It is based on a proof for the Tomasulo scheduler without reorder buffer in [Mül97a]. The proof presented here uses a slightly different notation. The correctness proof has two parts: The first part shows that data consistency is maintained. The second part proves that the algorithm terminates, i.e., the hardware is deadlock-free.

6.1   Data Consistency

6.1.1   Definitions

Let I1 to In be a sequence of n instructions. Data consistency intuitively means that the results generated by any given instruction Ii in the program conform to the semantics of an abstract DLX machine. These semantics are defined as transition rules on an abstract configuration set C(i) of the machine. This configuration includes all DLX registers, which are R0(i) to R31(i), SPR, FPR, and PC(i), and the rest of the program to be executed. A shorthand for all register files is RF[](i). The start configuration C(0) is the configuration after the reset.

This abstract DLX machine processes one instruction with each transition. Let the source registers of instruction Ii be S1(i) to Ss(i)(i) with values s1(i) to ss(i)(i) and the destination registers D1(i) to Dd(i)(i) with values d1(i) to dd(i)(i). We use s(i) source and d(i) destination registers to handle double precision floating point instructions without disturbing distinction of cases. For example, the instruction

FPR0:=FPR2+FPR4

has six source operands (FGR2 to FGR5 and RM and MASK) and three destination operands (FGR0, FGR1, and IEEEf). Let opi be the operation of instruction Ii, this is a double precision floating point addition in the example.

Now, Ii can be specified as

Ii: (D1(i),..,Dd(i)(i)) = opi(S1(i),..,Ss(i)(i))

with values

Ii: (d1(i),..,dd(i)(i)) = opi(s1(i),..,ss(i)(i))

As mentioned above, the abstract DLX machine processes instruction Ii in transition i, which results in configuration C(i). Registers not written by instruction Ii are passed unmodified from configuration C(i-1). Let last(r,i) be index of the last instruction prior Ii which modified register r:

last(r,i) = max{j<i | Ij has destination register r }

This allows an easy criterion for data consistency for any given (non-abstract) machine.

Criterion 1 For any given source register, the value of this register must be the result of the last instruction writing it:

sx(i) = dy(last(Sx(i),i))   " x Î {1,..,s(i)}, Dy(i)=Sx(i)

There are some exceptions from this rule, e.g., the register R0, which are left out for sake of simplicity. Furthermore, if there is no such instruction, this value is undefined, therefore, it is assumed that all registers, which are used as source register, are initialized with proper values by the program.

The rest of the section will prove that this condition holds for the machine presented in chapter 3. The denominators of the abstract DLX machine for the registers and the values are now used for the Tomasulo DLX configuration. This is done in analogy to the proof in [Mül97a] with the help of four invariants.

Invariant 1 During the issue of instruction Ii, source operand Sx(i) is in the register file, as indicated by RF[r].valid=1. In this case, the data item of the register holds the desired value, i.e., it holds the result of the last instruction writing RF[r].

Invariant 2 During the issue of instruction Ii, source operand r:=Sx(i) is, as indicated by RF[r].valid=0, the result of an previous instruction Ij, which has not yet retired. In this case, the tag item of the register holds Ij.tag, i.e., the tag of the instruction Ij. This instruction is the last instruction writing RF[r].

Invariant 3 Let instruction Ii be an instruction in reservation station RSl. If there is an operand x in this reservation station which is not yet valid (RSl.opx.valid=0), there must be a previous instruction Ij, which produces the value for the operand and has not yet completed. The tag of this instruction is the tag in RSl.opx.tag.

Invariant 4 Let invariant 2 or 3 apply for Ii and source operand r:=Sx(i) with tag Sx(i).tag. If the CDB is valid and if the CDB tag is equal to the tag of the operand, the result of the last instruction writing the operand is on the CDB. If the ROB entry pointed to by the tag of the operand is valid, the result of the last instruction writing the operand is in this ROB entry.

With the help of these invariants, the proof of the data consistency is done by induction over n. For n=0, the claim is obvious. The induction for n>0 requires a distinction of two cases. Let instruction Ii read source operand Sx(i). This is possible in two different phases.

6.2   Termination

6.2.1   Definitions

The termination proof will show that an instruction sequence I1 to In is processed in a finite amount of clock cycles. The following shorthands are used: fetch(i), issue(i), disp(i), compl(i) and term(i) denote the cycles in with instruction Ii is fetched, issued, dispatched, completed and terminated, respectively.

6.2.2   Termination

Lemma 1 All instructions are issued in program order and terminate in program order, i.e., issue(i)<issue(j) and term(i)<term(j) for any i<j.

In-order termination is a conclusion of the fact that an instruction terminates when leaving the ROB. The ROB is a fifo queue and is filled in program order. Consequently, to prove termination, it is sufficient to prove that a finite a exists with term(i)+a ³ term(j) for any i<j, i.e., to find an upper bound for the number of cycles of an instruction. A weak upper bound for a can be determined by summing up the maximum time which an instruction spends in each stage after all previous instructions terminated:

a £ a0 + ... + a4

This bound is proved by induction over n. The induction hypothesis is that instruction Ii obeys to this bound.

Fetch After all previous instructions terminated, instruction Ii can be fetched if there is no stall from the instruction memory. If lmem<¥ is the maximum latency of the memory, a0 = lmem.

Issue Instruction Ii can be issued iff there is no issue stall (the conditions for an issue stall are in section 3.5.6, page ??). This is obviously true one cycle after all previous instructions terminated and if there is no stall from the instruction memory. Since lmem is the maximum latency of the memory, a1 = lmem+1.

Dispatch Instruction Ii can be dispatched iff all its operands are valid and if the function unit is able to accept data. Obviously, the operands are valid after all previous instructions terminated. However, the function unit still might be blocked by instructions after Ii, i.e, instructions Ij with j>i. Let imax be the maximum number of instructions the function unit can hold (including the producer pipeline stage), and l the maximum latency of the function unit. For iterative function units, imax · l is a weak bound for the number of cycles until the function unit is empty. This bound is valid for pipelined function units, too. However, before an instruction can leave the function unit, it is necessary to wait for the CDB. Since the CDB is allocated round robin, this takes n cycles at most, with n being the number of function units. Thus, it takes a2=imax · l · n cycles at most until the function unit is empty. As Ii is the oldest valid instruction in the reservation stations (all previous already terminated), Ii is dispatched afterwards.

Completion As shown above, an instruction Ii in a function unit leaves it (completes) after at most a3=imax · l · n cycles.

Termination A valid (completed) instruction in the ROB terminates one cycle after all previous instructions terminated. Thus, a4 is one cycle.


Previous Contents Next