Given the datapath and delays in the following figure, calculate the time it takes to execute the following
instructions. You may assume that the delay of anything not mentioned in the table (including wires and
ALU control) is zero.
(a) lw $t0, 8($t1)
(b) bne $t0, $t1, 0x100
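The execution time of each instruction is the sum of the delays along its critical path. The sketch below illustrates that bookkeeping only: the delay values are hypothetical placeholders (the figure's table is not reproduced here), and the component lists assume a textbook single-cycle MIPS datapath, so both must be replaced with what the figure actually shows (for example, adding any muxes or adders that appear in the table).

```python
# Hypothetical component delays in picoseconds. These are placeholders,
# NOT the values from the figure's delay table.
DELAYS_PS = {
    "imem": 200,           # instruction memory read
    "regfile_read": 100,   # register file read
    "alu": 150,            # ALU operation
    "dmem": 200,           # data memory access
    "regfile_write": 100,  # register file write
}

# Assumed critical paths through a textbook single-cycle datapath.
CRITICAL_PATH = {
    # lw: fetch, read base register, compute address, read memory, write rt
    "lw": ["imem", "regfile_read", "alu", "dmem", "regfile_write"],
    # bne: fetch, read both registers, compare in the ALU
    "bne": ["imem", "regfile_read", "alu"],
}


def execution_time_ps(instr, delays=DELAYS_PS):
    """Sum the delays of the components on the instruction's critical path."""
    return sum(delays[component] for component in CRITICAL_PATH[instr])
```

With the placeholder delays, `execution_time_ps("lw")` adds five component delays and `execution_time_ps("bne")` adds three; the real answers depend entirely on the figure.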
Assume the following instruction mix for a 5-stage MIPS pipeline:
• The processor’s base CPI = 1.
• We use an “always predict not taken” scheme for branch prediction.
• 55% of the branches are not taken.
• 35% of the load instructions are immediately followed by an instruction that uses the loaded value.
• There are no other stalls in the pipeline.
• A branch misprediction has a 4-cycle penalty.
• A stall due to a load instruction has a 2-cycle overhead.
Calculate the CPI of this pipeline.
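The CPI is the base CPI plus the average stall cycles per instruction contributed by each hazard. The sketch below encodes that sum; the rates and penalties come from the list above, while `branch_frac` and `load_frac` (the fractions of all instructions that are branches and loads) are parameters that must be taken from the instruction mix, since they are not stated in the text here.

```python
def pipeline_cpi(branch_frac, load_frac,
                 base_cpi=1.0,
                 not_taken_rate=0.55,   # 55% of branches are not taken
                 mispredict_penalty=4,  # cycles per mispredicted branch
                 load_use_rate=0.35,    # loads followed by a dependent instruction
                 load_stall=2):         # cycles per load-use stall
    """Average CPI = base CPI + branch stall cycles + load stall cycles."""
    # With "predict not taken", every taken branch is a misprediction.
    mispredict_rate = 1.0 - not_taken_rate
    branch_stalls = branch_frac * mispredict_rate * mispredict_penalty
    load_stalls = load_frac * load_use_rate * load_stall
    return base_cpi + branch_stalls + load_stalls
```

For example, `pipeline_cpi(branch_frac=0.20, load_frac=0.25)` uses a purely hypothetical mix of 20% branches and 25% loads; substitute the fractions from the actual instruction mix.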
Consider the MIPS assembly code given below.
        xor   r0, r0, r0
        addiu r1, r0, 10
        j     L1
loop:   lw    r3, 0(r2)
        mul   r4, r3, r3
        mul   r3, r3, r1
        addiu r0, r0, 1
        div   r3, r4, r3
        sw    r3, 0(r2)
        addiu r2, r2, 4
L1:     bne   r0, r1, loop
We want to run this code on a 5-stage pipelined processor, with some modifications. The processor is a
typical 5-stage pipeline (F-D-X-M-W), with the following exceptions:
• The multiplier block used to execute the mul instruction is pipelined into four stages:
This means that a multiply instruction runs through the pipeline as follows: F-D-X0-X1-X2-X3-M-W, and
up to four multiply instructions may be in flight at a time. All other instruction types are blocked from the
execute stage while any of the multiply stages is in use.
• The divider block used to execute the div instruction is iterative and takes four cycles:
This means that a divide instruction runs through the pipeline as follows: F-D-X0-X0-X0-X0-M-W. All other
instructions are blocked from the execute stage while a division is being done.
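The stage sequences described above can be written down as a small lookup, which is handy when filling in a pipeline diagram row by row. This is a minimal sketch: the function name `stage_sequence` is an illustrative choice, and it lists only the unstalled stages, without modelling hazards or stalls.

```python
def stage_sequence(opcode):
    """Return the unstalled pipeline stages for one instruction."""
    if opcode == "mul":
        # Pipelined multiplier: four distinct execute stages.
        execute = ["X0", "X1", "X2", "X3"]
    elif opcode == "div":
        # Iterative divider: the same execute stage is occupied for four cycles.
        execute = ["X0"] * 4
    else:
        # Every other instruction uses the ordinary single execute stage.
        execute = ["X"]
    return ["F", "D"] + execute + ["M", "W"]
```

Both `mul` and `div` take eight cycles from fetch to writeback when unstalled, versus five for other instructions; the difference between them is that several `mul` instructions can overlap in X0-X3, while a `div` holds X0 for all four cycles.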
(a) Stalling for Structural Hazards
Draw a pipeline diagram (table) showing the execution of the MIPS code through the first iteration
of the loop, without bypassing. Assume data hazards and structural hazards are resolved using only
stalling. Assume branches are predicted not taken until they are resolved in the execute stage. What is
the CPI of the entire program?
Hint: Fill in this pipeline diagram
(b) Bypassing for Data Hazards
Draw a pipeline diagram similar to the one in part (a), but now assume the processor has data bypassing.
What is the CPI of the entire program?
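In both parts, the CPI of the entire program is the total cycle count divided by the number of instructions actually executed. The sketch below counts those dynamic instructions by walking the control flow of the listing; it treats r0 as an ordinary writable register, as the listing does. The `total_cycles` argument is an assumption of this sketch: it has to come from your own completed pipeline diagram, extended over all iterations.

```python
def dynamic_instruction_count():
    """Count executed instructions by following the listing's control flow."""
    r0 = 0           # xor r0, r0, r0
    r1 = r0 + 10     # addiu r1, r0, 10
    executed = 3     # xor, addiu, j L1
    while True:
        executed += 1        # bne r0, r1, loop
        if r0 == r1:
            break            # branch falls through, program ends
        # Loop body: lw, mul, mul, addiu r0, div, sw, addiu r2
        executed += 7
        r0 += 1              # effect of addiu r0, r0, 1
    return executed


def program_cpi(total_cycles):
    """CPI = total cycles / dynamic instruction count."""
    return total_cycles / dynamic_instruction_count()
```

The count covers the three setup instructions, ten passes through the seven-instruction loop body, and eleven executions of `bne` (ten taken, one not taken).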