CPU pipeline

Instruction sets keep getting more complex, and processors keep getting faster clock speeds, which means that most CPU instructions need more than one clock tick to execute. This is usually because the CPU first needs to understand which instruction it is executing and what its operands are, generate the signals required to fetch those operands, perform the operation, and then save the result, and no more than one of these steps can be done per clock tick.
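
To make those steps a bit more tangible, here is a small sketch in Rust (the stage names are invented for illustration; real CPUs split the work differently) that walks a single instruction through each step, one clock tick at a time:

```rust
// Hypothetical processing steps for a single instruction; real CPUs
// divide the work differently depending on the architecture.
#[derive(Debug)]
enum Stage {
    Fetch,        // read the instruction from memory
    Decode,       // understand the instruction and its operands
    ReadOperands, // get the operand values
    Execute,      // perform the operation
    WriteBack,    // save the result
}

fn main() {
    let stages = [
        Stage::Fetch,
        Stage::Decode,
        Stage::ReadOperands,
        Stage::Execute,
        Stage::WriteBack,
    ];

    // Without a pipeline, each step of each instruction takes its own tick.
    let mut tick = 0;
    for stage in &stages {
        tick += 1;
        println!("tick {}: {:?}", tick, stage);
    }
    println!("one instruction needed {} ticks", tick);
}
```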

Processors usually solve this by creating a CPU pipeline. This means that while one instruction is being analyzed and executed, the next instruction is already being fetched and analyzed, so the steps of consecutive instructions overlap. This has some complications, as you might imagine.
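
As a rough illustration, the following sketch simulates a hypothetical five-stage pipeline with no stalls and prints which stage each instruction occupies on every tick, so you can see the overlap:

```rust
fn main() {
    let stages = ["fetch", "decode", "read operands", "execute", "write back"];
    let instructions = ["i1", "i2", "i3", "i4"];

    // On tick t (0-based), instruction n sits in stage t - n, if that stage exists.
    // Total ticks: one per stage for the first instruction, plus one extra tick
    // for each additional instruction.
    let total_ticks = stages.len() + instructions.len() - 1;

    for tick in 0..total_ticks {
        print!("tick {}:", tick + 1);
        for (n, name) in instructions.iter().enumerate() {
            if tick >= n && tick - n < stages.len() {
                print!("  {} in {}", name, stages[tick - n]);
            }
        }
        println!();
    }
}
```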

First, if an instruction requires the output of a previous instruction, it may have to wait until that result is ready. It can also happen that the instruction being executed is a jump to another place in memory, so the instructions already in the pipeline are no longer the right ones: new instructions need to be fetched from RAM and the pipeline needs to be flushed.
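
The cost of those flushes can sometimes be observed from regular code. The following experiment (a hypothetical example, not a precise benchmark) counts elements above a threshold with a data-dependent branch, first in random order and then in sorted order; on many CPUs the sorted run is noticeably faster because the branch becomes predictable and the pipeline is rarely flushed. Compile it with optimizations (cargo run --release), and keep in mind that some compilers turn the branch into branch-free code, in which case both runs take about the same time:

```rust
use std::time::Instant;

// Counts elements above the threshold with a data-dependent branch. When the
// CPU guesses the branch direction wrong, the instructions it had already
// started on the wrong path are discarded (the pipeline is flushed).
fn count_big(data: &[u8]) -> u64 {
    let mut count = 0;
    for &x in data {
        if x >= 128 {
            count += 1;
        }
    }
    count
}

fn main() {
    // Tiny xorshift generator, so the example needs no external crates.
    let mut state: u32 = 0x1234_5678;
    let mut next = move || {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        state
    };

    let mut data: Vec<u8> = (0..4_000_000).map(|_| (next() >> 24) as u8).collect();

    let start = Instant::now();
    let hits_random = count_big(&data);
    let random_order = start.elapsed();

    data.sort_unstable(); // sorted data makes the branch outcome predictable

    let start = Instant::now();
    let hits_sorted = count_big(&data);
    let sorted_order = start.elapsed();

    println!("random order: {:?} ({} hits)", random_order, hits_random);
    println!("sorted order: {:?} ({} hits)", sorted_order, hits_sorted);
}
```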

Overall, what this technique achieves is the ability to complete one instruction per clock cycle in ideal conditions (once the pipeline is full). Most modern processors do this, since it enables much faster execution without requiring any clock speed improvement.
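
In back-of-the-envelope terms, if a hypothetical pipeline has S stages and executes N instructions with no stalls, a non-pipelined design needs about N * S ticks, while the pipelined one needs S + N - 1: once the first instruction has filled the pipeline, one instruction finishes per tick. A quick check of those numbers:

```rust
fn main() {
    let stages = 5u64; // hypothetical 5-stage pipeline
    let instructions = 1_000u64;

    // Every instruction runs all of its stages one after another.
    let without_pipeline = instructions * stages;
    // The first instruction fills the pipeline, then one finishes per tick.
    let with_pipeline = stages + (instructions - 1);

    println!("without pipeline: {} ticks", without_pipeline); // 5000
    println!("with pipeline:    {} ticks", with_pipeline);    // 1004
}
```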

A further benefit of this approach is that dividing the instruction processing into steps makes each step easier to implement. And since the circuit for each step is physically smaller, the electrical signals can propagate across it and settle in less time, which makes it possible for the clock to run faster.

In any case, though, dividing each instruction's execution into more steps increases the complexity of the CPU wiring, since the wiring has to deal with the potential concurrency issues between instructions that are in flight at the same time. When an instruction requires the output of the previous one to work, four different things can happen. The first, and worst, option is that the behavior is simply left undefined. This is not what we want, and avoiding it complicates the wiring of the processor.

The most important wiring needed to fix this is the logic that detects the dependency in the first place, and that on its own already makes the wiring more complex. Once the CPU can detect the hazard, the easiest fix is to simply wait for the output without advancing the pipeline. This is called stalling; it hurts the performance of the CPU, but it works correctly.
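
As a minimal sketch of the detection part, here is a toy model (much simpler than real hazard-detection logic, which tracks several in-flight instructions) where each instruction writes one register and reads two, and one stall is needed whenever an instruction reads the register written by the instruction right before it:

```rust
// Grossly simplified model: an instruction writes one register and reads two.
// If it reads the register written by the instruction immediately before it,
// the result is not ready yet and the pipeline inserts one stall (bubble).
struct Instr {
    writes: char,
    reads: [char; 2],
}

fn stalls_needed(program: &[Instr]) -> usize {
    program
        .windows(2)
        .filter(|pair| pair[1].reads.contains(&pair[0].writes))
        .count()
}

fn main() {
    let program = [
        Instr { writes: 'a', reads: ['x', 'y'] }, // a = x + y
        Instr { writes: 'b', reads: ['a', 'z'] }, // b = a + z  (needs 'a': stall)
        Instr { writes: 'c', reads: ['x', 'z'] }, // c = x + z  (independent)
    ];

    println!("stalls needed: {}", stalls_needed(&program)); // 1
}
```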

Some processors handle this by adding extra input paths that carry previous results forward before they have been written back, in case they need to be used; this is usually called forwarding or bypassing, and it greatly increases the complexity of the pipeline. Another option is to detect instructions that are safe to run earlier and execute them before the instruction that requires the output of the previous one. This last option is called out-of-order execution, and it also increases the complexity of the CPU.
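
Using the same toy model as before (again just a sketch, not how a real scheduler works), we can see how moving an independent instruction between the producer and the consumer hides the bubble, which is roughly what out-of-order hardware does on the fly:

```rust
// Same toy model: an instruction writes one register, reads two, and stalls
// one tick if it reads the register written by the instruction right before it.
#[derive(Clone)]
struct Instr {
    writes: char,
    reads: [char; 2],
}

fn stalls_needed(program: &[Instr]) -> usize {
    program
        .windows(2)
        .filter(|pair| pair[1].reads.contains(&pair[0].writes))
        .count()
}

fn main() {
    let a = Instr { writes: 'a', reads: ['x', 'y'] }; // a = x + y
    let b = Instr { writes: 'b', reads: ['a', 'z'] }; // b = a + z (depends on a)
    let c = Instr { writes: 'c', reads: ['x', 'z'] }; // c = x + z (independent)

    // Program order: b immediately follows a, so it has to wait for 'a'.
    let in_order = [a.clone(), b.clone(), c.clone()];
    // Reordered: the independent instruction fills the gap, so no bubble.
    let reordered = [a, c, b];

    println!("in order:  {} stalls", stalls_needed(&in_order));  // 1
    println!("reordered: {} stalls", stalls_needed(&reordered)); // 0
}
```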

So, in conclusion, to improve the speed of a CPU beyond simply making its clock run faster, we have the option of creating an instruction pipeline. This makes it possible to complete one instruction per clock tick (ideally) and sometimes even allows the clock to run faster. It increases the complexity of the CPU, though, making it much more expensive.

And what are the pipelines of current processors like, you might ask? Well, they come in different lengths and behaviors, but in the case of some high-end Intel chips, pipelines can be longer than 30 steps. This makes them run really fast, but it greatly increases their complexity and price.

When you develop applications, one way to avoid slowing down the pipeline is to try to perform the operations that do not require previous results first, and only then use the generated results. In practice this is very difficult to do by hand, and some compilers will actually reorder the instructions for you.
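
As a rough, hypothetical illustration of that advice, the following two functions compute the same sum of squares (up to floating-point rounding). The first one forms a single chain of additions where each step depends on the previous result, while the second keeps four independent running sums so the CPU can overlap them in the pipeline. Whether it actually runs faster depends on your compiler flags and CPU, so treat it as a sketch rather than a guaranteed win; compile it with optimizations to see any difference:

```rust
// One long dependency chain: every addition waits for the previous one.
fn sum_squares(data: &[f64]) -> f64 {
    let mut total = 0.0;
    for &x in data {
        total += x * x;
    }
    total
}

// Four independent accumulators: the additions of different chains do not
// depend on each other, so the pipeline can work on several at once. With
// floats the compiler cannot do this reordering itself, because it would
// change the rounding of the result.
fn sum_squares_unrolled(data: &[f64]) -> f64 {
    let mut acc = [0.0f64; 4];
    let chunks = data.chunks_exact(4);
    let remainder = chunks.remainder();
    for chunk in chunks {
        for i in 0..4 {
            acc[i] += chunk[i] * chunk[i];
        }
    }
    let mut total = acc.iter().sum::<f64>();
    for &x in remainder {
        total += x * x;
    }
    total
}

fn main() {
    let data: Vec<f64> = (0..1_000_000).map(|i| i as f64 * 1e-6).collect();
    println!("single chain: {}", sum_squares(&data));
    println!("four chains:  {}", sum_squares_unrolled(&data));
}
```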
