372 Computer Architecture and Organization
assumed that instruction fetch, decode, any necessary operand fetch, execution of the instruction
and, finally, result storage are all completed in identical time-slices. What would happen if
these time-slices were uneven? In that case, all related operations would remain idle until the longest
step finished. Recall our earlier case of one car per two days: every operation except body fabrication
needed one day, but because the body-fabrication stage needed two days, the whole process was
delayed. Therefore, the pipeline performs efficiently only if all related operations consume more or
less the same amount of time. Let us now investigate the real-life situation related to this assumption.
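The car-assembly analogy can be sketched in a few lines of Python (the stage names and timings below are illustrative, not taken from the text): in a lock-step pipeline, the steady-state output rate is set by the slowest stage.

```python
# Hypothetical stage durations in days, echoing the car analogy:
# every stage takes one day except body fabrication, which takes two.
stage_times = {"chassis": 1, "body": 2, "paint": 1, "final assembly": 1}

# A pipeline advances in lock-step, so each time-slice must be as long
# as the slowest stage: the bottleneck sets the throughput.
cycle = max(stage_times.values())
print(f"one car every {cycle} days")  # bottleneck dominates

# If every stage were rebalanced to one day, throughput would double.
balanced_cycle = 1
print(f"balanced: one car every {balanced_cycle} day")
```

Doubling the speed of any stage other than body fabrication would not help at all, which is exactly why the text insists that all stages consume more or less the same time.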
In general, the operands of the related instructions are available in the internal registers of the
processor, and the results of the instructions are likewise stored in those internal registers.
Therefore, the last four operations, namely decode, operand fetch, execute and result storage, may be
taken as internal operations of the processor and consume more or less the same time.
However, the very first part, the instruction fetch, must be carried out from an external memory source.
From our discussions in Chapter 7, we know that this is the slowest process and may consume ten times
more time than any internal operation of the processor.
In such a situation, our cache memory comes to the rescue. In the case of the L1 cache (the cache
located on the same chip as the processor), the access time is more or less the same as the access
time of the processor registers or of processor operations (adding two numbers, for example). Therefore,
most modern processors that implement the pipeline strategy are equipped with an on-chip L1 cache to
speed up the instruction fetch cycle.
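The benefit of an on-chip L1 cache for the fetch stage can be estimated with the standard average-access-time formula; the hit rate and cycle counts below are assumed purely for illustration.

```python
# Assumed figures: a register-speed L1 hit costs 1 cycle, while a miss
# that goes out to main memory costs an extra 10 cycles (the "ten times
# slower" external access mentioned in the text).
hit_time = 1        # cycles for an L1 hit (assumed)
miss_penalty = 10   # extra cycles on an L1 miss (assumed)
hit_rate = 0.95     # assumed L1 hit rate

# Average fetch time = hit_time + miss_rate * miss_penalty
avg_fetch = hit_time + (1 - hit_rate) * miss_penalty
print(f"average fetch time: {avg_fetch:.2f} cycles")
```

With these assumed numbers the average fetch costs only 1.5 cycles instead of 11, which is why an occasional cache miss does not ruin the pipeline's timing.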
The reader may ask what the situation would be during a cache miss. That can always
happen, but it would not be frequent. This and some other issues are labelled as pipeline hazards,
which we shall discuss now.
12.3 PIPELINE PERFORMANCE
In Sections 12.2.1 and 12.2.2, we considered an ideal environment for explaining the basic prin-
ciples of the pipeline strategy. However, in real-life situations, we find quite a few circumstances
where many of our simplified assumptions do not hold good. In the following sections, we shall try to
identify those situations and describe the related problems. We shall also discuss some of the methods
to overcome these problems.
12.3.1 Stalling
Stalling is a generalized term for a pipeline hazard. Whenever the processor is not able to complete
an assigned cycle within the predefined time-slice, it demands extra time, which forces the other related
cycles to wait idle. This general condition is designated as stalling. For example, let us reconsider the
example illustrated in Figure 12.4, with a minor variation.
Let us assume that, for whatever reason, the execution of instruction 3 was not completed within its
assigned time-slice t6; it takes an extra time-slice t7, and only then is it complete, as illustrated in
Figure 12.5. The reason may be an I/O wait for some operand, or the instruction itself might be such
(like multiply or divide) that its execution could not be completed within t6. Therefore, all cycles of
the following instructions must remain idle (inactive) during t7 until instruction 3 completes its
execution. This may be referred to as the processor stalling during t7.
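This scenario can be reproduced with a small scheduling sketch (the five-stage structure and unit timings are illustrative): giving instruction 3 one extra execute slice pushes the completion of every later instruction back by exactly one slice.

```python
# Toy schedule for a 5-stage pipeline (fetch, decode, operand fetch,
# execute, store). Every stage occupancy takes one time-slice, except
# that one chosen instruction's execute stage takes `1 + extra` slices.
def finish_times(n_instr, stages=5, long_exec_instr=3, extra=1):
    finish = []
    prev = [0] * stages  # time at which each stage becomes free
    for i in range(1, n_instr + 1):
        t = 0
        for s in range(stages):
            dur = 1 + (extra if (i == long_exec_instr and s == 3) else 0)
            start = max(t, prev[s])  # wait for the stage to be free
            t = start + dur
            prev[s] = t
        finish.append(t)
    return finish

print(finish_times(6))            # instruction 3 stalls the rest
print(finish_times(6, extra=0))   # ideal pipeline for comparison
```

In the ideal run the six instructions finish at slices 5 through 10; with the stall, instruction 3 finishes at slice 8 and every later instruction is delayed by one slice, just as Figure 12.5 depicts.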
Figure 12.5 The processor stalls
The reader may ask here why the other instructions have to wait because of instruction 3. Why should
instruction 6 not be decoded after its fetch, during t7? After all, instruction decoding does not need
the help of the ALU, which might be engaged by instruction 3 and therefore unavailable to the next
executable instruction, i.e., instruction 4. The question is very much justified. To answer it, we have
to visualize the data flow and data storage provisions adopted in the pipeline architecture, as shown
in Figure 12.6. Note that in this diagram the operations are indicated by boxes with rounded corners,
and the buffers are shown by square boxes with sharp corners.
We know that after completion of the fetch cycle, the processor stores the fetched instruction in the
instruction register (IR). From our discussions in the previous chapters, we also know that this IR
cannot accommodate more than one instruction at a time, and the instruction must remain in the IR till
the completion of its execution; otherwise, the proper control signals might not be generated by the
control unit (see the discussions in Chapter 11). However, in the case of micro-programmed control,
adopted by most modern processors, the duty of the data in the IR is over once the start address has
been generated. We assume that this start address is stored in the decode buffer, as shown in Figure 12.6.
The next step is the operand fetch, where required (it is not necessary for all instructions). In
general, these operands are stored in the ALU registers, which must remain unchanged till the
completion of the ALU operation and the transfer of its result. Finally, after the completion of the
ALU operation, the result register is copied to the destination indicated by the instruction itself,
and the result buffer becomes free to accommodate another result.
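A minimal sketch of these single-entry inter-stage buffers (the class and variable names are invented for illustration) shows why an occupied buffer forces the preceding stage to wait:

```python
# Each inter-stage buffer holds at most one item; a stage may only pass
# its output forward when the next buffer is free. A full buffer is
# exactly the back-pressure that propagates a stall up the pipeline.
class Buffer:
    def __init__(self, name):
        self.name = name
        self.item = None

    def free(self):
        return self.item is None

    def put(self, item):
        if not self.free():
            raise RuntimeError(f"{self.name} occupied: stage must stall")
        self.item = item

    def take(self):
        item, self.item = self.item, None
        return item

ir = Buffer("IR")              # fetched instruction
decode_buf = Buffer("decode")  # micro-program start address
alu_regs = Buffer("ALU regs")  # fetched operands
result_buf = Buffer("result")  # ALU result awaiting write-back

ir.put("instr 3")
decode_buf.put(ir.take())      # decoding frees the IR for the next fetch
print(ir.free())               # True: fetch of the next instruction may begin
```

If the decode buffer were still occupied, the `put` would raise instead, and the fetch stage behind it would have to hold its instruction in the IR — which is precisely the chain of waiting described next.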
Therefore, referring to Figure 12.5, we can now visualize that extending the execution of instruction
3 till the end of t7 stalls (or halts) the execution of instruction 4 (the ALU is not available), the
operand fetch of instruction 5 (the ALU registers are not available) and the decoding of instruction 6
(there is no space to store the information decoded from the opcode of instruction 6). This is a
generalized example of the stalling of a pipeline, and now we can investigate the different types of
hazards encountered in the pipeline architecture.
12.3.2 Types of Hazards
In a pipeline architecture, we expect one instruction to complete at every clock cycle, although each
instruction physically consumes multiple clock cycles. Figure 12.3(b) or Figure 12.4 illustrates this
expectation under ideal processing conditions. However, as mentioned before, in real-life situations,
these ideal conditions may not always hold.
Food for Thought: At this stage, the reader may note that the durations of execution of different
instructions of any processor are not uniform. In some processors adopting the pipeline architecture,
there is a tendency to introduce some amount of uniformity within these durations by incorporating a
specific number of wait-states in the shorter instructions. This is especially true in the case of
RISC architecture.
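The wait-state padding mentioned in the box can be sketched as follows; the opcode names and cycle counts are invented for illustration.

```python
# Assumed raw execution times, in cycles, for three instruction types.
raw_cycles = {"add": 1, "load": 2, "mul": 4}

# Pad every shorter instruction with wait-states up to the longest one,
# so that each pipeline slot consumes a uniform number of cycles.
target = max(raw_cycles.values())
padded = {op: target for op in raw_cycles}
wait_states = {op: target - c for op, c in raw_cycles.items()}
print(wait_states)  # wait-states inserted per instruction type
```

The uniformity simplifies pipeline control at the cost of slowing the short instructions down, which is the trade-off the box alludes to.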
Figure 12.6 Data flow in a pipeline architecture