372 Computer Architecture and Organization
assumed that instruction fetch, decode, any necessary operand fetch, execution of the instruction
and, finally, result storage are all completed in identical time-slices. What would happen if
these time-slices were uneven? In that case, all related operations would remain idle until the longest
step finished. Recall our earlier case of one car per two days: every operation except body fabrication
needed one day, but because the body-fabrication stage needed two days, the whole process was
delayed. Therefore, the pipeline performs efficiently only if all related operations consume more or
less the same amount of time. Let us now investigate the real-life situation related to this assumption.
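The car-assembly analogy can be sketched in a few lines of Python (the stage names and timings below are illustrative, not taken from the text): in a lock-step pipeline, the steady-state output rate is set by the slowest stage.

```python
# Hypothetical stage durations in days, echoing the car analogy:
# every stage takes one day except body fabrication, which takes two.
stage_times = {"chassis": 1, "body": 2, "paint": 1, "final assembly": 1}

# A pipeline advances in lock-step, so each time-slice must be as long
# as the slowest stage: the bottleneck sets the throughput.
cycle = max(stage_times.values())
print(f"one car every {cycle} days")  # bottleneck dominates

# If every stage were rebalanced to one day, throughput would double.
balanced_cycle = 1
print(f"balanced: one car every {balanced_cycle} day")
```

Doubling the speed of any stage other than body fabrication would not help at all, which is exactly why the text insists that all stages consume more or less the same time.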
In general, the operands of the related instructions are available in the internal registers of the
processor, and the results of the instructions are likewise stored in those internal registers.
Therefore, the last four operations, namely decode, operand fetch, execute and result storage, may be
taken as internal operations of the processor and consume more or less the same time.
However, the very first part, the instruction fetch, must be carried out from an external memory source.
From our discussions in Chapter 7, we know that this is the slowest process and may consume ten times
more time than any internal operation of the processor.
In such a situation, our cache memory comes to the rescue. In the case of the L1 cache (the cache
located on the same chip as the processor), the access time is more or less the same as the access
time of the processor registers or of processor operations (adding two numbers, for example). Therefore,
most modern processors that implement the pipeline strategy are equipped with an on-chip L1 cache to
speed up the instruction fetch cycle.
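The benefit of an on-chip L1 cache for the fetch stage can be estimated with the standard average-access-time formula; the hit rate and cycle counts below are assumed purely for illustration.

```python
# Assumed figures: a register-speed L1 hit costs 1 cycle, while a miss
# that goes out to main memory costs an extra 10 cycles (the "ten times
# slower" external access mentioned in the text).
hit_time = 1        # cycles for an L1 hit (assumed)
miss_penalty = 10   # extra cycles on an L1 miss (assumed)
hit_rate = 0.95     # assumed L1 hit rate

# Average fetch time = hit_time + miss_rate * miss_penalty
avg_fetch = hit_time + (1 - hit_rate) * miss_penalty
print(f"average fetch time: {avg_fetch:.2f} cycles")
```

With these assumed numbers the average fetch costs only 1.5 cycles instead of 11, which is why an occasional cache miss does not ruin the pipeline's timing.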
The reader may ask what the situation would be during a cache miss. That can always
happen, but it would not be frequent. This and some other issues are labelled as pipeline hazards,
which we shall discuss now.
12.3 PIPELINE PERFORMANCE
In Sections 12.2.1 and 12.2.2, we considered an ideal environment for explaining the basic prin-
ciples of the pipeline strategy. However, in real-life situations, we find quite a few circumstances
where many of our simplified assumptions do not hold good. In the following sections, we shall try to
identify those situations and describe the related problems. We shall also discuss some of the methods
to overcome these problems.
12.3.1 Stalling
Stalling is a generalized term for a pipeline hazard. Whenever the processor is not able to complete
an assigned cycle within the predefined time-slice, it demands extra time, which forces the other related
cycles to wait idle. This general condition is designated as stalling. For example, let us reconsider the
example illustrated in Figure 12.4, with a minor variation.
Let us assume that, for whatever reason, the execution of instruction 3 was not completed within its
assigned time-slice t6; it takes an extra time-slice t7, and only then is it complete, as illustrated in
Figure 12.5. The reason may be an I/O wait for some operand, or the instruction itself might be such
(like multiply or divide) that its execution could not be completed within t6. Therefore, all cycles of
the following instructions must remain idle (inactive) during t7 until instruction 3 completes its
execution. This may be referred to as the processor stalling during t7.
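This scenario can be reproduced with a small scheduling sketch (the five-stage structure and unit timings are illustrative): giving instruction 3 one extra execute slice pushes the completion of every later instruction back by exactly one slice.

```python
# Toy schedule for a 5-stage pipeline (fetch, decode, operand fetch,
# execute, store). Every stage occupancy takes one time-slice, except
# that one chosen instruction's execute stage takes `1 + extra` slices.
def finish_times(n_instr, stages=5, long_exec_instr=3, extra=1):
    finish = []
    prev = [0] * stages  # time at which each stage becomes free
    for i in range(1, n_instr + 1):
        t = 0
        for s in range(stages):
            dur = 1 + (extra if (i == long_exec_instr and s == 3) else 0)
            start = max(t, prev[s])  # wait for the stage to be free
            t = start + dur
            prev[s] = t
        finish.append(t)
    return finish

print(finish_times(6))            # instruction 3 stalls the rest
print(finish_times(6, extra=0))   # ideal pipeline for comparison
```

In the ideal run the six instructions finish at slices 5 through 10; with the stall, instruction 3 finishes at slice 8 and every later instruction is delayed by one slice, just as Figure 12.5 depicts.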
Figure 12.5 The processor stalls
The reader may ask here why the other instructions have to wait because of instruction 3. Why should
instruction 6 not be decoded after its fetch, during t7? After all, instruction decoding does not need
the help of the ALU, which might be engaged by instruction 3 and therefore unavailable to the next
executable instruction, i.e., instruction 4. The question is very much justified. To answer it, we have
to visualize the data flow and data storage provisions adopted in the pipeline architecture, as shown
in Figure 12.6. Note that in this diagram the operations are indicated by boxes with rounded corners,
and the buffers are shown by square boxes with sharp corners.
We know that after completion of the fetch cycle, the processor stores the fetched instruction in the
instruction register (IR). From our discussions in the previous chapters, we also know that this IR
cannot accommodate more than one instruction at a time, and the instruction must remain in the IR till
the completion of its execution; otherwise, the proper control signals might not be generated by the
control unit (see the discussions in Chapter 11). However, in the case of micro-programmed control,
adopted by most modern processors, the duty of the data in the IR is over once the start address has
been generated. We assume that this start address is stored in the decode buffer, as shown in Figure 12.6.
The next step is the operand fetch, where required (it is not necessary for all instructions). In
general, these operands are stored in the ALU registers, which must remain unchanged till the
completion of the ALU operation and the transfer of its result. Finally, after the completion of the
ALU operation, the result register is copied to the destination indicated by the instruction itself,
and the result buffer becomes free to accommodate another result.
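A minimal sketch of these single-entry inter-stage buffers (the class and variable names are invented for illustration) shows why an occupied buffer forces the preceding stage to wait:

```python
# Each inter-stage buffer holds at most one item; a stage may only pass
# its output forward when the next buffer is free. A full buffer is
# exactly the back-pressure that propagates a stall up the pipeline.
class Buffer:
    def __init__(self, name):
        self.name = name
        self.item = None

    def free(self):
        return self.item is None

    def put(self, item):
        if not self.free():
            raise RuntimeError(f"{self.name} occupied: stage must stall")
        self.item = item

    def take(self):
        item, self.item = self.item, None
        return item

ir = Buffer("IR")              # fetched instruction
decode_buf = Buffer("decode")  # micro-program start address
alu_regs = Buffer("ALU regs")  # fetched operands
result_buf = Buffer("result")  # ALU result awaiting write-back

ir.put("instr 3")
decode_buf.put(ir.take())      # decoding frees the IR for the next fetch
print(ir.free())               # True: fetch of the next instruction may begin
```

If the decode buffer were still occupied, the `put` would raise instead, and the fetch stage behind it would have to hold its instruction in the IR — which is precisely the chain of waiting described next.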
Therefore, referring to Figure 12.5, we can now visualize that extending the execution of instruction
3 till the end of t7 stalls (or halts) the execution of instruction 4 (the ALU is not available), the
operand fetch of instruction 5 (the ALU registers are not available) and the decoding of instruction 6
(there is no space to store the information decoded from the opcode of instruction 6). This is a
generalized example of the stalling of a pipeline, and now we can investigate the different types of
hazards encountered in the pipeline architecture.
12.3.2 Types of Hazards
In a pipeline architecture, we expect one instruction to complete at every clock cycle, although each
instruction physically consumes multiple clock cycles. Figure 12.3(b) or Figure 12.4 illustrates this
expectation under ideal processing conditions. However, as mentioned before, in real-life situations,
these ideal conditions may not always hold.
Food for Thought: At this stage, the reader may note that the durations of execution of different
instructions of any processor are not uniform. In some processors adopting the pipeline architecture,
there is a tendency to introduce some amount of uniformity within these durations by incorporating a
specific number of wait-states in the shorter instructions. This is especially true in the case of
RISC architecture.
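The wait-state padding mentioned in the box can be sketched as follows; the opcode names and cycle counts are invented for illustration.

```python
# Assumed raw execution times, in cycles, for three instruction types.
raw_cycles = {"add": 1, "load": 2, "mul": 4}

# Pad every shorter instruction with wait-states up to the longest one,
# so that each pipeline slot consumes a uniform number of cycles.
target = max(raw_cycles.values())
padded = {op: target for op in raw_cycles}
wait_states = {op: target - c for op, c in raw_cycles.items()}
print(wait_states)  # wait-states inserted per instruction type
```

The uniformity simplifies pipeline control at the cost of slowing the short instructions down, which is the trade-off the box alludes to.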
Figure 12.6 Data flow in a pipeline architecture