Of data hazards, bypasses and stalls

Remember that a data hazard occurs when an instruction needs the result of a previous instruction before that result has been written back.

In the following example:

        add a0, t0, t1         ; a0  ←  t0 + t1
        sub a1, a0, t2         ; a1  ←  a0 - t2

the add instruction stores its result into register a0 when it reaches the Writeback stage at clock cycle 5,
but the sub instruction wants to read a0 when it reaches the Decode stage at clock cycle 3.
Of course, at cycle 3, the result of the first instruction has not been written into the register file yet.
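
To make the timing concrete, here is a minimal sketch of the cycle arithmetic in Python. It assumes the classic five-stage pipeline (Fetch, Decode, Execute, Memory, Writeback) with one instruction entering Fetch per cycle and no stall. The names `STAGES` and `stage_cycle` are made up for this illustration.

```python
# Five-stage pipeline, one new instruction entering Fetch per clock cycle.
STAGES = ["Fetch", "Decode", "Execute", "Memory", "Writeback"]

def stage_cycle(instr_index: int, stage: str) -> int:
    """Cycle (1-based) at which instruction number instr_index (0-based)
    occupies the given stage, assuming the pipeline never stalls."""
    return instr_index + STAGES.index(stage) + 1

add_writeback = stage_cycle(0, "Writeback")  # add is the first instruction
sub_decode = stage_cycle(1, "Decode")        # sub follows right behind it

print(add_writeback, sub_decode)             # prints "5 3"
print(add_writeback - sub_decode)            # prints "2": the gap in cycles
```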

Instead of waiting two cycles for the value to be available, we could create a shortcut, a way for the second instruction to get the value of a0 directly out of a later stage of the pipeline. In other words, a bypass.

In practice, it's not just one bypass that we need but several of them!

The forwarding unit

The Execute stage needs the correct value of both rs1 and rs2, and each of them can come from three different sources:

  • The register file. This is the canonical source of the value; no bypass is used.
  • The output of the previous instruction. This means the value is present in the EX/MEM pipeline register, which is equivalent to saying that it comes from the Memory stage.
  • The output of the instruction before that. In this case, the value is in the MEM/WB pipeline register, that is, in the Writeback stage.

Any instruction before that will already have gone through the Writeback stage.

Here is how we determine the correct source for the value of rs1:
rs1 EX forwarding

The forwarding unit is always active: it does not care whether the register is actually needed by the ALU operation or not.

The logic is not very complex. First, the source register number is compared with the destination register numbers of the previous two instructions (the one in MEM and the one in WB). Then we also check that these two instructions actually want to write a value back. A store instruction, for instance, does not write anything back to the register file, so it cannot be the source of a bypass.

After these two tests, we know whether rs1 is present in MEM or in WB (or both), and it is only a question of priority to know where we want to take it from. MEM has the highest priority because it holds the most recent value. If the value is not present there, we try to take it from WB, and otherwise we default back to what was fetched from the register file.
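
The selection described above can be sketched in Python as follows. This is only a behavioural model under assumptions, not the actual Astorisc logic: the names `Source`, `forward_select` and its parameters are hypothetical, and registers are identified by their number (a0 is x10 in the RISC-V ABI).

```python
from enum import Enum

class Source(Enum):
    REGFILE = 0  # no bypass: use the value read from the register file
    MEM = 1      # value held in the EX/MEM pipeline register
    WB = 2       # value held in the MEM/WB pipeline register

def forward_select(rs: int,
                   mem_rd: int, mem_writes_back: bool,
                   wb_rd: int, wb_writes_back: bool) -> Source:
    """Pick the source of a register operand for the Execute stage."""
    # Test 1: register numbers match. Test 2: the instruction writes back.
    in_mem = mem_writes_back and mem_rd == rs
    in_wb = wb_writes_back and wb_rd == rs
    # Priority: MEM (most recent result) first, then WB, then register file.
    if in_mem:
        return Source.MEM
    if in_wb:
        return Source.WB
    return Source.REGFILE

# sub a1, a0, t2 is in EX while add a0, t0, t1 is in MEM:
A0 = 10
choice = forward_select(A0, mem_rd=A0, mem_writes_back=True,
                        wb_rd=0, wb_writes_back=False)
# choice is Source.MEM
```

The sketch follows the two tests from the text only; it does not special-case the hardwired zero register x0.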

The logic is exactly the same for rs2:
rs2 EX forwarding

There is another place where a bypass is required: the Memory stage. When a load into a register is immediately followed by a store of the same register, the loaded value must be forwarded as well.

rs2 MEM forwarding
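
As an illustration, the Memory stage case could be modelled like this in Python. It is a sketch under assumptions: the function and parameter names are hypothetical, and it assumes that the load has reached WB by the time the store is in MEM.

```python
def mem_forward_rs2(store_rs2: int, wb_rd: int, wb_writes_back: bool) -> bool:
    """True when the data to store (rs2 of the instruction in MEM) must be
    taken from the instruction in WB instead of the value carried along
    the pipeline."""
    return wb_writes_back and wb_rd == store_rs2

# lw a0, 0(t0) immediately followed by sw a0, 4(t1):
# the store is in MEM while the load is in WB, and a0 is register x10.
A0 = 10
forward = mem_forward_rs2(store_rs2=A0, wb_rd=A0, wb_writes_back=True)
# forward is True: the store must use the freshly loaded value
```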

The bypasses

All right, we have all those signals coming out of the forwarding unit, but what do we do with them?

We wire the bypasses, of course!

The first bypasses are the ones in the EX stage, and they look like this:
EX bypasses

The selection signals will enable only one buffer for each of rs1 and rs2, so that the correct value can be sent to the ALU or further down the pipeline.

In the MEM stage, it's the same principle, only simpler because there is only one register and two sources:
MEM bypasses
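
In software terms, each bypass behaves like a multiplexer. The Python sketch below models the enable signals as booleans, one per buffer. All names are hypothetical, and the check that exactly one buffer is enabled is an assumption of this model, standing in for what the forwarding unit is expected to guarantee.

```python
def ex_bypass(en_regfile: bool, en_mem: bool, en_wb: bool,
              regfile_value: int, mem_value: int, wb_value: int) -> int:
    """Value sent to the ALU (or further down the pipeline) for one operand."""
    # Exactly one buffer may drive the shared wire.
    if [en_regfile, en_mem, en_wb].count(True) != 1:
        raise ValueError("exactly one bypass buffer must be enabled")
    if en_mem:
        return mem_value
    if en_wb:
        return wb_value
    return regfile_value

def mem_bypass(en_wb: bool, rs2_value: int, wb_value: int) -> int:
    """Value to store in the MEM stage: only two possible sources."""
    return wb_value if en_wb else rs2_value

# The operand is taken from the MEM stage rather than the register file:
operand = ex_bypass(False, True, False,
                    regfile_value=0, mem_value=42, wb_value=7)
# operand is 42
```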

The stall logic unit

There's one situation where we cannot rely on a bypass alone, though: an instruction that uses a register in the ALU directly after a load into that register. The second instruction needs to wait for the value to be loaded from memory, and for that it needs to be stalled.

Load stall

We stall if the ALU wants, as one of its inputs, the content of a register that would come from the previous instruction, and that previous instruction is a load. All the other cases are covered by the bypasses we just implemented above!
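
The stall condition can be written down as a small Python predicate. This is a sketch with hypothetical names: `alu_uses_rs1` and `alu_uses_rs2` are assumed flags saying whether the instruction in EX feeds that register to the ALU, and `mem_is_load` is an assumed flag for the instruction in MEM.

```python
def must_stall(ex_rs1: int, ex_rs2: int,
               alu_uses_rs1: bool, alu_uses_rs2: bool,
               mem_rd: int, mem_is_load: bool) -> bool:
    """True when the instruction in EX needs, as an ALU input, a register
    that the load currently in MEM has not fetched from memory yet."""
    needs_rs1 = alu_uses_rs1 and ex_rs1 == mem_rd
    needs_rs2 = alu_uses_rs2 and ex_rs2 == mem_rd
    return mem_is_load and (needs_rs1 or needs_rs2)

A0, T1, T2 = 10, 6, 7  # RISC-V ABI register numbers

# lw a0, 0(t0) followed by add a1, a0, t2: the ALU needs a0, so we stall.
stall_add = must_stall(A0, T2, True, True, mem_rd=A0, mem_is_load=True)

# lw a0, 0(t0) followed by sw a0, 4(t1): the ALU only computes t1 + 4,
# a0 is just the data to store, so the MEM bypass is enough.
stall_store = must_stall(T1, A0, True, False, mem_rd=A0, mem_is_load=True)
# stall_add is True, stall_store is False
```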

The stall mechanism

Let's keep things simple for now and assume that a memory access always completes in one cycle, so a one-cycle stall is enough. During that cycle, the instruction in the EX stage cannot advance and must stay there, while the load instruction in the MEM stage moves on to the WB stage. That leaves a bubble between them.

If the instruction in EX cannot advance, neither can those in the previous stages, Decode and Fetch.

On all the pipeline registers between EX and MEM, a stall signal prevents a new value from being clocked in by disabling the Write Enable input:
EX stall

The same signal is applied to the other pipeline registers in ID/EX and IF/ID, as well as to the fetch unit so that the PC does not get incremented.

There is just one pipeline register that we need to treat differently: the register that conveys the actual instruction's actions from EX to MEM. That one must be cleared to become a nop, our bubble.

EX bubble
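
To see the whole mechanism at work, here is a minimal behavioural sketch in Python of one clock edge. It is a simplification under assumptions: each pipeline register is reduced to a single instruction label, the names `step` and `NOP` are hypothetical, and the PC is assumed to advance by 4 bytes per instruction.

```python
NOP = "nop"

def step(state: dict, fetched: str, stall: bool) -> dict:
    """Compute the next pipeline state from the current one.
    Keys: pc, if_id, id_ex, ex_mem, mem_wb (instruction held in each register)."""
    if stall:
        return {
            "pc": state["pc"],          # fetch unit frozen: PC not incremented
            "if_id": state["if_id"],    # Write Enable off: value held
            "id_ex": state["id_ex"],    # Write Enable off: stays in EX
            "ex_mem": NOP,              # actions cleared: this is the bubble
            "mem_wb": state["ex_mem"],  # the load moves on to WB
        }
    return {
        "pc": state["pc"] + 4,
        "if_id": fetched,
        "id_ex": state["if_id"],
        "ex_mem": state["id_ex"],
        "mem_wb": state["ex_mem"],
    }

before = {"pc": 12, "if_id": "or a2, a1, a1", "id_ex": "add a1, a0, t2",
          "ex_mem": "lw a0, 0(t0)", "mem_wb": NOP}
after = step(before, fetched="xor a3, a2, a2", stall=True)
# after["id_ex"] is still "add a1, a0, t2", after["ex_mem"] is "nop",
# and after["mem_wb"] is "lw a0, 0(t0)"
```

On the following cycle the stall is released, and the add can pick up a0 from the WB stage through the regular bypass.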

This concludes the part on how data hazards will be handled in Astorisc. Stay tuned for the next part, where we'll tackle the control hazards induced by jump and branch operations.