Pipelining 381
main memory then a conflict takes place as to which cycle would be allowed first to access it. This
condition is explained through Figure 12.10. As a matter of fact, this diagram is identical to Figure 12.4.
The only difference is the shaded column representing t5, which is the target of our present discussion.
As we can observe, during t5, the fetch cycle of instruction 5, the operand fetch cycle of instruction 3
and the result storing cycle of instruction 1 are performed concurrently. If we assume that all three
operations access the external memory, then a structural hazard arises out of the conflict over
access to the external memory.
In modern processors that implement pipeline architecture, this conflict is resolved by providing
separate caches for instructions and data. Moreover, the data paths are designed so that result storage
and operand fetch do not conflict. Details related to these matters are discussed in the following section.
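The conflict at t5 and its resolution can be sketched as a simple resource check. This is a minimal illustration, not the book's method: the stage names, the two resource maps and the function `structural_hazards` are assumptions introduced here. A hazard is reported whenever more than one stage claims the same resource in a single time slice.

```python
from collections import Counter

# Five pipeline stages, in order, as used in the time-slice discussion.
STAGES = ["fetch", "decode", "operand_fetch", "execute", "store_result"]

# Hypothetical resource maps (assumptions for illustration only).
# Worst case from the text: instruction fetch, operand fetch and result
# storing all go to the single external memory.
UNIFIED = {
    "fetch": "main_memory",
    "operand_fetch": "main_memory",
    "store_result": "main_memory",
}

# Separate instruction cache, plus distinct read and write paths for data.
SPLIT = {
    "fetch": "instruction_cache",
    "operand_fetch": "data_read_path",
    "store_result": "data_write_path",
}

def structural_hazards(resource_map):
    """Return the resources claimed by more than one stage in one time slice."""
    usage = Counter(resource_map[s] for s in STAGES if s in resource_map)
    return {resource: n for resource, n in usage.items() if n > 1}

print(structural_hazards(UNIFIED))  # {'main_memory': 3}
print(structural_hazards(SPLIT))    # {}
```

With the unified map, three stages compete for the same memory in one slice; with the split map, no resource is claimed twice.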
12.7 CONTROLS AND DATA PATHS
Through the discussions so far on pipeline architecture, the reader must have realized that for its
proper implementation, each of the following processor or control unit operations must be performed
independently, without affecting any of the other listed operations going on concurrently.
• Read an instruction from the instruction cache and store it within IR.
• Decode an instruction loaded within IR.
• Perform a given ALU operation.
• Increment the program counter by 1.
• Read from two different registers of the register set simultaneously.
• Write to any register of the register set.
• Read or write data using the data cache.
• Load an input of the ALU with its own result (operand forwarding).
To implement all these features for a successful pipeline operation, some modifications in the data
path and controls are necessary. We have already discussed operand forwarding in detail
(Figure 12.7). This and the other details necessary for smooth pipeline operation, satisfying all the
criteria listed above, are illustrated in Figure 12.11. Note that in this illustration, other details related
to the processor are not shown.
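As a reminder of what operand forwarding achieves, the following minimal sketch selects each ALU input either from the register set or from the ALU result of the preceding instruction. It is not a reproduction of Figure 12.7; the function `select_alu_input`, the register names and the values are hypothetical.

```python
def select_alu_input(src_reg, regs, prev_dest, prev_result):
    """Pick one ALU input.

    If the previous instruction's destination is the register we need,
    take its ALU result directly (forwarding); otherwise read the
    register set. All names here are illustrative assumptions.
    """
    if prev_dest is not None and src_reg == prev_dest:
        return prev_result
    return regs[src_reg]

regs = {"R1": 4, "R2": 6, "R3": 0}

# Instruction 1: R3 = R1 + R2 (result not yet written back to R3).
result1 = regs["R1"] + regs["R2"]

# Instruction 2: R4 = R3 + R1, executed in the very next time slice.
a = select_alu_input("R3", regs, prev_dest="R3", prev_result=result1)
b = select_alu_input("R1", regs, prev_dest="R3", prev_result=result1)
result2 = a + b

print(result2)  # 14
```

Without forwarding, the second instruction would read the stale value 0 from R3 and produce 4 instead of 14.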
To highlight some of the features shown in Figure 12.11, we start from its right side, where the
instruction queue and instruction decoder along with the control signal buffers are shown. The address
for the next instruction is generated by the program counter (PC), as usual, and released to the
instruction cache (at the left side of ALU in Figure 12.11). Note that according to the demand of the
pipeline architecture, we are providing the instruction cache and the data cache separately, preferably
located within the processor (L1 cache). The PC incrementing module may work independently, and the
address code released through the memory addressing register for instructions [MAR(I)] does not affect
the functioning of any other part of the processing modules. The multiple buffers within the control
signals' buffers module store the traces of the relevant control signals for the decoded opcode placed
within the instruction decoder, which is another independent operation.
FOOD FOR THOUGHT
In Chapters 9 and 10, we have already discussed data path design and micro-operations
using different control signals. The reader is reminded here that generally a processor is
designed around one, two or three buses. Although multiple data paths undoubtedly help in
rapid and efficient data communication, the complexity of implementing them cannot be
overlooked.
For transacting data or operands, two paths are provided. One is for the data cache and the other
is for the external data memory located within main memory. To write to the data cache, the data memory
register (Write) is used, while to read from the data cache, the data memory register (Read) has to be
used. For external data access, the MAR(D) register is used to generate the necessary address codes.
The purposes of the A-bus, B-bus and C-bus have already been discussed. Together they allow reading
simultaneously from two general purpose registers within the register array and, at the same time,
writing to any one of the general purpose registers within the same register array. All these features
allow independent data movements from the register array to other places as and when necessary.
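The three-bus arrangement can be pictured as a register array with two read ports and one write port. The sketch below is a toy model under stated assumptions: the class `RegisterFile`, its method `cycle` and the rule that reads see the values held before the same slice's write are all introduced here for illustration and are not taken from Figure 12.11.

```python
class RegisterFile:
    """Toy register array: two read ports (A-bus, B-bus), one write port (C-bus)."""

    def __init__(self, n_registers=8):
        self.regs = [0] * n_registers

    def cycle(self, read_a, read_b, write=None):
        """Perform one time slice: two reads and, optionally, one write.

        `write` is a (register index, value) pair or None.
        Assumption: both reads see the values held before this slice's write.
        """
        a_bus = self.regs[read_a]
        b_bus = self.regs[read_b]
        if write is not None:
            index, value = write
            self.regs[index] = value  # value arriving on the C-bus
        return a_bus, b_bus

rf = RegisterFile()
rf.cycle(0, 0, write=(1, 10))
rf.cycle(0, 0, write=(2, 32))

# One slice: read registers 1 and 2 while writing register 3.
a, b = rf.cycle(1, 2, write=(3, 99))
print(a, b, rf.regs[3])  # 10 32 99
```

Because the two reads and the write use separate buses, none of the three accesses has to wait for another within the slice.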
In Figure 12.10, we have highlighted the operations during the time-slice t5, indicating a probable
structural hazard. Let us now investigate the effectiveness of our data path design with respect to
that problem. To start with, time-slice t5 in Figure 12.10 performs the following functions:
• Fetch Instruction 5
• Decode Instruction 4
• Get operand for Instruction 3
• Execute Instruction 2
• Store result of Instruction 1.
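The list above follows from a simple rule, sketched below under the assumption of an ideal five-stage pipeline in which instruction i enters the fetch stage in time-slice i and advances one stage per slice without stalls. The function name `occupancy` and the stage labels are illustrative choices.

```python
STAGES = ["Fetch", "Decode", "Get operand for", "Execute", "Store result of"]

def occupancy(t, n_instructions):
    """List (stage, instruction number) pairs active in time slice t (1-based).

    Assumes an ideal pipeline: instruction i enters Fetch in slice i and
    moves one stage forward in every slice, with no stalls.
    """
    active = []
    for depth, stage in enumerate(STAGES):
        instr = t - depth  # instruction currently occupying this stage
        if 1 <= instr <= n_instructions:
            active.append((stage, instr))
    return active

for stage, instr in occupancy(5, n_instructions=8):
    print(f"{stage} Instruction {instr}")
```

For t = 5 this prints the same five operations as the list; for earlier slices such as t = 2 only the first stages are occupied, because the pipeline is still filling.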
Fetching of Instruction 5 needs the PC and MAR(I), the output of which goes directly to the instruction
cache. The PC is then incremented by one, which is also performed independently. Decoding of Instruction 4
...
Figure 12.11 Data path suitable for pipeline operation