Chapter 8

Simple Embedded Processors

Abstract

This application example chapter concentrates on the key topic of integrating processors into FPGA designs. These range from simple 8-bit microprocessors up to large IP processor cores that involve an element of hardware-software co-design. This chapter takes the reader through the basics of implementing a behavioral microprocessor model for evaluating algorithms, through to the practicalities of structurally correct models that can be synthesized and implemented on an FPGA.

Keywords

Simple processor

Microprocessors

8.1 Introduction

This application example chapter concentrates on the key topic of integrating processors into FPGA designs. These range from simple 8-bit microprocessors up to large IP processor cores that involve an element of hardware-software co-design. This chapter takes the reader through the basics of implementing a behavioral microprocessor model for evaluating algorithms, through to the practicalities of structurally correct models that can be synthesized and implemented on an FPGA.

One of the major challenges facing hardware designers in the 21st century is the problem of hardware-software co-design. This has moved on from a basic partitioning mechanism based on standard hardware architectures to the current situation where the algorithm itself can be optimized at a compilation level for performance or power by implementing appropriately at different levels with hardware or software as required. This aspect suits FPGAs perfectly, as they can handle fixed hardware architecture that runs software compiled onto memory, they can implement optimal hardware running at much faster rates than a software equivalent could, and there is now the option of configurable hardware that can adapt to the changing requirements of a modified environment.

8.2 A Simple Embedded Processor

8.2.1 Embedded Processor Architecture

A useful way to explore embedded processors is to consider a generic microcontroller in the context of an FPGA platform. Take the simple example of a generic 8-bit microcontroller shown in Figure 8.1.

Figure 8.1 Simple microcontroller.

As can be seen from Figure 8.1, the microcontroller is a general-purpose microprocessor with a simple clock (clk) and reset (clr), and three 8-bit ports (A, B, and C). Within the microcontroller itself, there need to be the following basic elements:

1. A control unit: this is required to manage the clock and reset of the processor, manage the data flow and instruction set flow, and control the port interfaces. There will also need to be a program counter (PC).

2. An ALU: a microcontroller will need to be able to carry out at least some rudimentary processing which is carried out in the ALU (Arithmetic Logic Unit).

3. An Address Bus.

4. A Data Bus.

5. Internal Registers.

6. An instruction decoder.

7. A ROM to hold the program.

While each of these individual elements (1-6) can be implemented simply enough using a standard FPGA, the ROM presents a specific difficulty. If we implement a ROM as a set of registers, then obviously this will be hugely inefficient in an FPGA architecture. However, in most modern FPGA platforms, there are blocks of RAM on the FPGA that can be accessed and it makes a lot of sense to design a RAM block for use as a ROM by initializing it with the ROM values on reset and then using that to run the program.

This aspect of the embedded core raises an important issue, which is the reduction in efficiency of using embedded rather than dedicated cores. There is usually a compromise involved and in this case it is that the ROM needs to be implemented in a different manner, in this case with a hardware penalty. The second issue is what type of memory core to use.

In an FPGA RAM, the memory can usually be organized in a variety of configurations to vary the depth (number of memory addresses) and the width (width of the data bus). For example, a 512-address RAM block with an 8-bit data width holds the same number of bits (4096) as a 256-address RAM block with a 16-bit data width.

If the equivalent microcontroller ROM is, say, 12 bits wide and 256 words deep, then we can use a 256 × 16 RAM block and ignore the top 4 bits of each word. The resulting embedded microcontroller core architecture could be of the form shown in Figure 8.2.

Figure 8.2 Embedded microcontroller architecture.
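The idea of using a 256 × 16 RAM block as a 12-bit program ROM can be sketched in VHDL as follows. This is only an illustrative sketch: the entity name prog_rom, its port names, and the two placeholder words are assumptions made for this example and are not part of the processor developed later in the chapter.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 256 x 16 program store; only the lower 12 bits are used.
entity prog_rom is
  port (
    clk  : in  std_logic;
    addr : in  std_logic_vector(7 downto 0);   -- 256 locations
    data : out std_logic_vector(11 downto 0)   -- 12-bit instruction word
  );
end entity prog_rom;

architecture rtl of prog_rom is
  type rom_array is array (0 to 255) of std_logic_vector(15 downto 0);
  -- Placeholder contents (assumed values); a real program would go here.
  constant rom : rom_array := (
    0      => x"0003",
    1      => x"0204",
    others => (others => '0')
  );
begin
  -- A registered (synchronous) read is the style that synthesis tools
  -- commonly map onto block RAM rather than onto individual registers.
  process (clk) is
  begin
    if rising_edge(clk) then
      -- The top 4 bits of each 16-bit word are simply ignored.
      data <= rom(to_integer(unsigned(addr)))(11 downto 0);
    end if;
  end process;
end architecture rtl;
```

Whether a given tool actually places this array in block RAM depends on the tool and device, so the synthesis report should be checked.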

8.2.2 Basic Instructions

When we program a microprocessor of any type, there are three different ways of representing the code that will run on the processor. These are machine code (1s and 0s), assembler (low level instructions such as LOAD, STORE), and high level code (such as C, Fortran, or Pascal). Regardless of the language used, the code will always be compiled or assembled into machine code at the lowest level for programming into memory. High level code (e.g., C) is compiled and assembler code is assembled (as the name suggests) into machine code for the specific platform.

Clearly a detailed explanation of a compiler is beyond the scope of this book, but the same basic process can be seen in an assembler and this is useful to discuss in this context. Every processor has a basic Instruction Set which is simply the list of functions that can be run in a program on the processor. Take the simple example of the following pseudocode expression:

b = a + 2;

In this example, we are taking the variable a and adding the integer value 2 to it, and then storing the result in the variable b. In a processor, the use of a variable is simply a memory location that stores the value, and so to load a variable we use an assembler command as follows:

LOAD a

What is actually going on here? Whenever we retrieve a variable value from memory, the implication is that we are going to put the value of the variable in the register called the accumulator (ACC). The command “LOAD a” could be expressed in natural language as “LOAD the value of the memory location denoted by a into the accumulator register ACC.”

The next stage of the process is to add the integer value 2 to the accumulator. This is a simple matter, as instead of an address, the value is simply added to the current value stored in the accumulator. The assembly language command would be something like:

ADD #x02

Notice that we have used the x to denote a hexadecimal number. If we wished to add a variable, say called c, then the command would be the same, except that it would use the address c instead of the absolute number. The command would therefore be:

ADD c

Now we have the value of a+2 stored in the accumulator register (ACC). This could be stored in a memory location, or put onto a port (e.g., PORT A). It is useful to notice that for a number we use the key character # to indicate that we are adding the value and not using the argument as the address. In the pseudocode example, we are storing the result of the addition in the variable called b, so the command would be something like this:

STORE b

While this is superficially a complete definition of the instruction set requirements, there is one specific design detail that has to be decided on for any processor. This is the number of instructions and the data bus size. If we have a set of instructions with the number of instructions denoted by N, then the number of bits in the opcode (n) must conform to the following rule:

\[ N \le 2^{n} \qquad (8.1) \]

In other words, the number of bits provides the number of unique different codes that can be defined, and this defines the size of the instruction set possible. For example, if n = 3, then with 3 bits there are 8 possible unique opcodes, and so the maximum size of the instruction set is 8.
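Since n bits provide 2^n unique codes, the smallest usable opcode width for a given instruction count can be computed at elaboration time. The following is a minimal sketch; the package name opcode_width and the function name min_opcode_bits are hypothetical and are not used elsewhere in this chapter.

```vhdl
package opcode_width is
  -- Returns the smallest n such that 2**n >= n_instr.
  function min_opcode_bits (n_instr : positive) return natural;
end package opcode_width;

package body opcode_width is
  function min_opcode_bits (n_instr : positive) return natural is
    variable bits     : natural  := 0;
    variable capacity : positive := 1;  -- always equal to 2**bits
  begin
    -- Keep doubling the number of available codes until it is
    -- large enough to cover every instruction.
    while capacity < n_instr loop
      capacity := capacity * 2;
      bits     := bits + 1;
    end loop;
    return bits;
  end function min_opcode_bits;
end package body opcode_width;
```

For the n = 3 case above, min_opcode_bits(8) returns 3. An instruction set of 10 entries needs one more bit, so min_opcode_bits(10) returns 4.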

8.2.3 Fetch Execute Cycle

The standard method of executing a program in a processor is to store the program in memory and then follow a strict sequence of events to carry out the instructions. The first stage is to use the program counter to increment the program line; this then calls up the next command from memory in the correct order, and then the instruction can be loaded into the appropriate register for execution. This is called the fetch execute cycle.

What is happening at this point? First the contents of the program counter (PC) are loaded into the memory address register (MAR). The data in the memory location are then retrieved and loaded into the memory data register (MDR). The contents of the MDR can then be transferred into the instruction register (IR). In a basic processor, the PC can then be incremented by one (or in fact this could take place immediately after the PC has been loaded into the MAR). Once the opcode (and arguments if appropriate) are loaded, the instruction can be executed. Essentially, each instruction has its own state machine and control path, which is linked to the instruction register (IR) and a sequencer that defines all the control signals required to move the data correctly around the memory and registers for that instruction. We will discuss registers in the next section, but in addition to the program counter (PC), instruction register (IR), and accumulator (ACC) mentioned already, we require at a minimum the two memory registers introduced above: the Memory Data Register (MDR) and the Memory Address Register (MAR).

For example, consider the simple command LOAD a, from the previous example. What is required to actually execute this instruction? First, the opcode is decoded and this defines that the command is a LOAD command. The next stage is to identify the address. As the command has not used the # symbol to denote an absolute number, the argument a is the address of the variable. The next stage, therefore, is to load the value in location a into the MDR, by setting MAR = a and then retrieving the value of a from the RAM. This value is then transferred to the accumulator (ACC).
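The register transfers for fetching and executing a single LOAD can be traced in a small simulation-only sketch. It assumes a 16-bit word with the opcode in the top 4 bits and the address in the lower 12 bits, an 8-word memory, and the entity name fetch_execute_demo. All of these are illustrative choices for this example, not a definition of the final hardware.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fetch_execute_demo is
end entity fetch_execute_demo;

architecture sim of fetch_execute_demo is
begin
  process is
    type mem_array is array (0 to 7) of std_logic_vector(15 downto 0);
    -- Word 0 holds "LOAD 3" (opcode 0000, address 3); word 3 holds the data.
    variable mem : mem_array := (
      0      => "0000000000000011",
      3      => "0000000000001100",
      others => (others => '0')
    );
    variable pc, mar      : natural := 0;
    variable mdr, ir, acc : std_logic_vector(15 downto 0) := (others => '0');
  begin
    -- Fetch
    mar := pc;          -- MAR <= PC
    pc  := pc + 1;      -- PC may be incremented once MAR holds the address
    mdr := mem(mar);    -- MDR <= memory(MAR)
    ir  := mdr;         -- IR  <= MDR

    -- Execute (only LOAD is handled in this sketch)
    if ir(15 downto 12) = "0000" then
      mar := to_integer(unsigned(ir(11 downto 0)));  -- MAR <= address field
      mdr := mem(mar);                               -- MDR <= memory(MAR)
      acc := mdr;                                    -- ACC <= MDR
    end if;

    assert acc = "0000000000001100"
      report "LOAD did not place the value at address 3 in ACC"
      severity failure;
    wait;  -- stop after one instruction
  end process;
end architecture sim;
```

Variables are used here so that each transfer takes effect immediately. In real hardware each step would occupy one or more clock cycles under the control of the sequencer.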

8.2.4 Embedded Processor Register Allocation

The design of the registers partly depends on whether we wish to clone a “real” device or create a modified version that has more custom behavior. In either case there are some mandatory registers that must be defined as part of the design. We can assume that we need an accumulator (ACC), a program counter (PC), and the three input/output ports (PORTA, PORTB, PORTC). Also, we can define the instruction register (IR), Memory Address Register (MAR), Memory Data Register (MDR).

In addition to the data for the ports, we need to have a definition of the port direction and this requires three more registers for managing the tristate buffers into the data bus to and from the ports (DIRA, DIRB, DIRC). In addition to this, we can define a number (essentially arbitrary) of registers for general purpose usage. In the general case the naming, order, and numbering of registers does not matter; however, if we intend to use a specific device as a template, and perhaps use the same bit code, then it is vital that the registers are configured in exactly the same way as the original device and in the same order.
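As an illustration of how a direction register can manage the tristate buffers of a port, the following sketch drives each pin only when the matching direction bit is set. The entity name io_port, its port names, and the convention that '1' means output are all assumptions made for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity io_port is
  port (
    dir      : in    std_logic_vector(7 downto 0);  -- e.g. DIRA; '1' = output (assumed)
    data_out : in    std_logic_vector(7 downto 0);  -- e.g. contents of PORTA
    data_in  : out   std_logic_vector(7 downto 0);  -- value currently on the pins
    pins     : inout std_logic_vector(7 downto 0)
  );
end entity io_port;

architecture rtl of io_port is
begin
  -- One tristate buffer per pin, enabled by the matching direction bit.
  gen_bits : for i in pins'range generate
    pins(i) <= data_out(i) when dir(i) = '1' else 'Z';
  end generate gen_bits;

  -- The pins can always be read back, whatever the direction.
  data_in <= pins;
end architecture rtl;
```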

In this example, we do not have a base device to worry about, and so we can define the general purpose registers (24 in all) with the names REG0 to REG23. In conjunction with the general purpose registers, we need to have a small decoder to select the correct register and put the contents onto the data bus (F).
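A possible form for the general purpose registers and their selection decoder is sketched below. The entity name reg_file, the 5-bit sel input, the load and valid controls, and the 8-bit default width are hypothetical; they simply follow the tristate bus style used for the other blocks in this chapter.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity reg_file is
  generic (
    width : positive := 8                          -- assumed data width
  );
  port (
    clk     : in    std_logic;
    nrst    : in    std_logic;
    sel     : in    std_logic_vector(4 downto 0);  -- selects REG0 to REG23
    load    : in    std_logic;                     -- capture bus into selected register
    valid   : in    std_logic;                     -- drive selected register onto bus
    reg_bus : inout std_logic_vector(width - 1 downto 0)
  );
end entity reg_file;

architecture rtl of reg_file is
  type reg_array is array (0 to 23) of std_logic_vector(width - 1 downto 0);
  signal regs  : reg_array;
  signal index : integer range 0 to 31;
begin
  -- Decoder: convert the select lines into a register index.
  index <= to_integer(unsigned(sel));

  -- Only a valid index (0 to 23) may drive the bus.
  reg_bus <= regs(index) when (valid = '1' and index < 24)
             else (others => 'Z');

  process (clk, nrst) is
  begin
    if nrst = '0' then
      regs <= (others => (others => '0'));
    elsif rising_edge(clk) then
      if load = '1' and index < 24 then
        regs(index) <= reg_bus;
      end if;
    end if;
  end process;
end architecture rtl;
```

Select values 24 to 31 are ignored in this sketch, since 5 select bits can address more registers than the 24 that exist.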

8.2.5 A Basic Instruction Set

In order for the device to operate as a processor, we must define some basic instructions in the form of an instruction set. For this simple example we can define some very basic instructions that will carry out basic program elements, ALU functions, memory functions. These are summarized in the following list of instructions:

 LOAD arg This command loads an argument into the accumulator. If the argument has the prefix # then it is the absolute number, otherwise it is the address and this is taken from the relevant memory address.
Examples:
LOAD #01
LOAD abc

 STORE arg This command stores the contents of the accumulator into memory. If the argument has the prefix # then it is the absolute address, otherwise it is a variable name and the address associated with that variable is used.
Examples:
STORE #01
STORE abc

 ADD arg This command adds an argument to the accumulator. If the argument has the prefix # then it is the absolute number, otherwise it is the address and this is taken from the relevant memory address.
Examples:
ADD #01
ADD abc

 NOT This command carries out the NOT function on the accumulator.

 AND arg This command ands an argument with the accumulator. If the argument has the prefix # then it is the absolute number, otherwise it is the address and this is taken from the relevant memory address.
Examples:
AND #01
AND abc

 OR arg This command ors an argument with the accumulator. If the argument has the prefix # then it is the absolute number, otherwise it is the address and this is taken from the relevant memory address.
Examples:
OR #01
OR abc

 XOR arg This command xors an argument with the accumulator. If the argument has the prefix # then it is the absolute number, otherwise it is the address and this is taken from the relevant memory address.
Examples:
XOR #01
XOR abc

 INC This command carries out an increment by one on the accumulator.

 SUB arg This command subtracts an argument from the accumulator. If the argument has the prefix # then it is the absolute number, otherwise it is the address and this is taken from the relevant memory address.
Examples:
SUB #01
SUB abc

 BRANCH arg This command allows the program to branch to a specific point in the program. This may be very useful for looping and program flow. If the argument has the prefix # then it is the absolute number, otherwise it is the address and this is taken from the relevant memory address.
Examples:
BRANCH #01
BRANCH abc

In this simple instruction set, there are 10 separate instructions. This implies, from the rule given in equation (8.1) previously in this chapter, that we need at least 4 bits to describe each of the instructions in the list above. Given that we wish to have 8 bits for each data word, the program memory must be stored in a ROM whose words are at least 12 bits wide. In order to cater for a greater number of instructions, and also to allow different addressing modes to be specified (such as the difference between absolute numbers and variables), we can therefore suggest a 16-bit system for the program memory.

Notice that at this stage there are no definitions for port interfaces or registers. We can extend the model to handle this behavior later.

8.2.6 Structural or Behavioral?

So far in the design of this simple microprocessor, we have not specified details beyond a fairly abstract structural description of the processor in terms of registers and busses. At this stage we have a decision about the implementation of the design with regard to the program and architecture.

One option is to take a program (written in assembly language) and simply convert this into a state machine that can easily be implemented in a VHDL model for testing out the algorithm. Using this approach, the program can be very simply modified and recompiled based on simple rules that restrict the code to the use of registers and techniques applicable to the processor in question. This can be useful for investigating and developing algorithms, but it is more idealized than the final implementation: a processor-plus-memory configuration has control signals and memory access delays that this simple state machine does not model, and that a dedicated hardware design would largely avoid.
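As an illustration of this first option, the three-instruction program LOAD a, ADD #x02, STORE b from earlier in the chapter could be converted by hand into a state machine along the following lines. This is a sketch only: the entity name add_two_program, the 8-bit width, and the use of ports for the variables a and b are assumptions made for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity add_two_program is
  port (
    clk  : in  std_logic;
    nrst : in  std_logic;
    a    : in  unsigned(7 downto 0);   -- variable a
    b    : out unsigned(7 downto 0)    -- variable b
  );
end entity add_two_program;

architecture behav of add_two_program is
  -- One state per line of the assembly program
  type step is (do_load, do_add, do_store);
  signal pc  : step;
  signal acc : unsigned(7 downto 0);
begin
  process (clk, nrst) is
  begin
    if nrst = '0' then
      pc  <= do_load;
      acc <= (others => '0');
      b   <= (others => '0');
    elsif rising_edge(clk) then
      case pc is
        when do_load =>      -- LOAD a
          acc <= a;
          pc  <= do_add;
        when do_add =>       -- ADD #x02
          acc <= acc + 2;
          pc  <= do_store;
        when do_store =>     -- STORE b
          b  <= acc;
          pc <= do_load;     -- loop back and run the program again
      end case;
    end if;
  end process;
end architecture behav;
```

Each instruction completes in a single clock cycle here, which is exactly the kind of idealization that hides the memory access delays of a real processor.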

Another option is to develop a simple model of the processor that does have some of the features of the final implementation of the processor, but still uses an assembly language description of the model to test. This has advantages in that no compilation to machine code is required, but there are still not the detailed hardware characteristics of the final processor architecture that may cause practical issues on final implementation.

The third option is to develop the model of the processor structurally and then the machine code can be read in directly from the ROM. This is an excellent approach that is very useful for checking both the program and the possible quirks of the hardware/software combination, as the architecture of the model reflects directly the structure of the model to be implemented on the FPGA.

8.2.7 Machine Code Instruction Set

In order to create a suitable instruction set for decoding instructions for our processor, the assembly language instruction set needs to have an equivalent machine code instruction set that can be decoded by the sequencer in the processor. The resulting opcode/instruction table is given here:

Command       Opcode (Binary)
LOAD arg      0000
STORE arg     0001
ADD arg       0010
NOT           0011
AND arg       0100
OR arg        0101
XOR arg       0110
INC           0111
SUB arg       1000
BRANCH arg    1001

8.2.8 Structural Elements of the Microprocessor

Taking the abstract design of the microprocessor given in Figure 8.2 we can redraw with the exact registers and bus configuration as shown in the structural diagram in Figure 8.3. Using this model we can create separate VHDL models for each of the blocks that are connected to the internal bus and then design the control block to handle all the relevant sequencing and control flags to each of the blocks in turn. Before this can be started, however, it makes sense to define the basic criteria of the models and the first is to define the basic type. In any digital model (as we have seen elsewhere in this book) it is sensible to ensure that data can be passed between standard models and so in this case we shall use the std_logic_1164 package from the IEEE library, which is the standard for digital models.

Figure 8.3 Structural model of the microprocessor busses and major blocks.

In order to use this package, each signal shall be defined in VHDL with the basic type std_logic, and the package shall be made visible with the clause use ieee.std_logic_1164.all in the header of each of the models in the processor.

Finally, each block in the processor shall be defined as a separate block for implementation in VHDL or Verilog.

8.3 A Simple Embedded Processor Implemented in VHDL

8.3.1 Processor Functions Package

In order to simplify the VHDL for each of the individual blocks, a set of standard functions has been defined in a package called processor_functions. This is used to define useful types and functions for this set of models. The VHDL for the package is given below:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;  -- provides the unsigned type used for reg_zero

package processor_functions is
  -- Word length and opcode length; declared first so that the
  -- declarations below can refer to them.
  constant n     : integer := 16;
  constant oplen : integer := 4;

  -- not, and, or and xor are reserved words in VHDL, so these four
  -- opcodes carry an op_ prefix.
  type opcode is (load, store, add, op_not, op_and, op_or, op_xor,
                  inc, sub, branch);

  function decode (word : std_logic_vector) return opcode;

  -- One memory word for every address expressible in n - oplen bits
  type memory_array is array (0 to 2**(n - oplen) - 1) of
    std_logic_vector (n - 1 downto 0);

  constant reg_zero : unsigned (n - 1 downto 0) :=
    (others => '0');
end package processor_functions;

package body processor_functions is
  function decode (word : std_logic_vector) return opcode is
    variable op_bits    : std_logic_vector (oplen - 1 downto 0);
    variable opcode_out : opcode;
  begin
    -- The opcode is held in the top oplen bits of the word
    op_bits := word (n - 1 downto n - oplen);
    case op_bits is
      when "0000" => opcode_out := load;
      when "0001" => opcode_out := store;
      when "0010" => opcode_out := add;
      when "0011" => opcode_out := op_not;
      when "0100" => opcode_out := op_and;
      when "0101" => opcode_out := op_or;
      when "0110" => opcode_out := op_xor;
      when "0111" => opcode_out := inc;
      when "1000" => opcode_out := sub;
      when "1001" => opcode_out := branch;
      -- undefined codes leave opcode_out at its default value (load)
      when others => null;
    end case;
    return opcode_out;
  end function decode;
end package body processor_functions;

8.3.2 The Program Counter

The program counter (PC) needs to have the system clock and reset connections, and the system bus (defined as inout so as to be readable and writable by the PC register block). In addition, there are several control signals required for correct operation. The first is the signal to increment the PC (PC_inc), the second is the control signal to load the PC with a specified value (PC_load) and the final is the signal to make the register contents visible on the internal bus (PC_valid). This signal ensures that the value of the PC register will appear to be high impedance (Z) when the register is not required on the processor bus. The system bus (PC_bus) is defined as a std_logic_vector, with direction inout to ensure the ability to read and write. The resulting VHDL entity is given here:

library ieee;
use ieee.std_logic_1164.all;
use work.processor_functions.all;  -- provides the bus width n

entity pc is
  port (
    clk      : in    std_logic;
    nrst     : in    std_logic;
    pc_inc   : in    std_logic;
    pc_load  : in    std_logic;
    pc_valid : in    std_logic;
    pc_bus   : inout std_logic_vector (n - 1 downto 0)
  );
end entity pc;

The architecture for the program counter must handle all of the various configurations of the program counter control signals and also the communication of the data into and from the internal bus correctly. The PC model has an asynchronous part and a synchronous section. If the PC_valid goes low at any time, the value of the PC_bus signal should be set to Z across all of its bits. Also, if the reset signal goes low, then the PC should reset to zero.

The synchronous part of the model is the increment and load functionality. When the clk rising edge occurs, then the two signals PC_load and PC_inc are used to define the function of the counter. The precedence is that if the increment function is high, then regardless of the load function, the counter will increment. If the increment function (PC_inc) is low, then the PC will load the current value on the bus, if and only if the PC_load signal is also high. The resulting VHDL is given as:

library ieee;
use ieee.numeric_std.all;  -- unsigned arithmetic for the counter

architecture rtl of pc is
  signal counter : unsigned (n - 1 downto 0);
begin
  -- Drive the bus only while pc_valid is high; otherwise release it
  pc_bus <= std_logic_vector (counter)
    when pc_valid = '1' else (others => 'Z');

  process (clk, nrst) is
  begin
    if nrst = '0' then
      counter <= (others => '0');
    elsif rising_edge (clk) then
      if pc_inc = '1' then
        -- increment takes precedence over load
        counter <= counter + 1;
      elsif pc_load = '1' then
        counter <= unsigned (pc_bus);
      end if;
    end if;
  end process;
end architecture rtl;

8.3.3 The Instruction Register

The instruction register (IR) has the same clock and reset signals as the PC, and also the same interface to the bus (IR_bus), defined as a std_logic_vector of mode inout, with an enable signal (IR_valid) to place the register contents on the bus. The IR also has two further control signals, the first being the command to load the instruction register (IR_load), and the second being to load the required address onto the system bus (IR_address). The final connection is the decoded opcode that is to be sent to the system controller. This is defined using the enumerated type opcode from the processor_functions package. The basic VHDL for the entity of the IR is given as follows:

library ieee;
use ieee.std_logic_1164.all;
use work.processor_functions.all;

entity ir is
  port (
    clk        : in    std_logic;
    nrst       : in    std_logic;
    ir_load    : in    std_logic;
    ir_valid   : in    std_logic;
    ir_address : in    std_logic;
    ir_opcode  : out   opcode;
    ir_bus     : inout std_logic_vector (n - 1 downto 0)
  );
end entity ir;

The function of the IR is to decode the opcode in binary form and then pass it to the control block. If IR_valid is low, then the bus value should be set to Z for all bits. If the reset signal (nrst) is low, then the internal register value should be set to all 0s.

On the rising edge of the clock, the value on the bus shall be sent to the internal register and the output opcode shall be decoded asynchronously when the value in the IR changes. The resulting VHDL architecture is given here:

architecture rtl of ir is
  signal ir_internal : std_logic_vector (n - 1 downto 0);
begin
  -- Drive the bus only while ir_valid is high; otherwise release it
  ir_bus <= ir_internal
    when ir_valid = '1' else (others => 'Z');

  -- The opcode is decoded whenever the register contents change
  ir_opcode <= decode (ir_internal);

  -- Note: ir_address is not used in this basic version
  process (clk, nrst) is
  begin
    if nrst = '0' then
      ir_internal <= (others => '0');
    elsif rising_edge (clk) then
      if ir_load = '1' then
        ir_internal <= ir_bus;
      end if;
    end if;
  end process;
end architecture rtl;

In this VHDL, notice that we have used the function decode from the processor_functions package defined previously. This looks at the top 4 bits of the word held in the IR and decodes the relevant opcode for passing to the controller.

8.3.4 The Arithmetic and Logic Unit

The Arithmetic and Logic Unit (ALU) has the same clock and reset signals as the PC, and also the same interface to the bus (ALU_bus), defined as a std_logic_vector of mode inout, with an enable signal (ALU_valid) to place the accumulator contents on the bus. The ALU also has a 3-bit command input (ALU_cmd), which is decoded to select one of the eight individual functions required of the ALU. The ALU also contains the Accumulator (ACC), which is a std_logic_vector of the size defined for the system bus width. There is also a single bit output ALU_zero which goes high when all the bits in the accumulator are zero. The basic VHDL for the entity of the ALU is given as follows:

library ieee;
use ieee.std_logic_1164.all;
use work.processor_functions.all;

entity alu is
  port (
    clk       : in    std_logic;
    nrst      : in    std_logic;
    alu_cmd   : in    std_logic_vector (2 downto 0);
    alu_zero  : out   std_logic;
    alu_valid : in    std_logic;
    alu_bus   : inout std_logic_vector (n - 1 downto 0)
  );
end entity alu;

The function of the ALU is to decode the ALU_cmd in binary form and then carry out the relevant function on the data on the bus and the current data in the accumulator. If ALU_valid is low, then the bus value should be set to Z for all bits. If the reset signal (nrst) is low, then the internal register value should be set to all 0s. On the rising edge of the clock, the command shall be decoded and the accumulator updated from the value on the bus accordingly. The resulting VHDL architecture is given here:

library ieee;
use ieee.numeric_std.all;  -- unsigned arithmetic for add and increment

architecture rtl of alu is
  signal acc : std_logic_vector (n - 1 downto 0);
begin
  -- Drive the bus only while alu_valid is high; otherwise release it
  alu_bus <= acc
    when alu_valid = '1' else (others => 'Z');
  alu_zero <= '1' when unsigned (acc) = reg_zero else '0';

  process (clk, nrst) is
  begin
    if nrst = '0' then
      acc <= (others => '0');
    elsif rising_edge (clk) then
      case alu_cmd is
        -- load the bus value into the accumulator
        when "000" => acc <= alu_bus;
        -- add the acc to the bus value
        when "001" => acc <= std_logic_vector (unsigned (acc) + unsigned (alu_bus));
        -- not the bus value
        when "010" => acc <= not alu_bus;
        -- or the acc to the bus value
        when "011" => acc <= acc or alu_bus;
        -- and the acc to the bus value
        when "100" => acc <= acc and alu_bus;
        -- xor the acc to the bus value
        when "101" => acc <= acc xor alu_bus;
        -- increment acc
        when "110" => acc <= std_logic_vector (unsigned (acc) + 1);
        -- store the acc value: acc is unchanged here, and it is placed
        -- on the bus by the concurrent assignment when alu_valid is high
        when "111" => null;
        when others => null;
      end case;
    end if;
  end process;
end architecture rtl;

8.3.5 The Memory

The processor requires a RAM memory, with an address register (MAR) and a data register (MDR). There therefore needs to be a load signal for each of these registers: MDR_load and MAR_load. As it is a memory, there also needs to be an enable signal (M_en), and also a signal to denote Read or Write modes (M_rw). Finally, the connection to the system bus is a standard inout vector as has been defined for the other registers in the microprocessor.

The basic VHDL for the entity of the memory block is given here:

library ieee;
use ieee.std_logic_1164.all;
use work.processor_functions.all;

entity memory is
  port (
    clk       : in    std_logic;
    nrst      : in    std_logic;
    mdr_load  : in    std_logic;
    mar_load  : in    std_logic;
    mar_valid : in    std_logic;
    m_en      : in    std_logic;
    m_rw      : in    std_logic;
    mem_bus   : inout std_logic_vector (n - 1 downto 0)
  );
end entity memory;

The memory block has three aspects. The first is the function in which the memory address is loaded into the memory address register (MAR). The second function is either reading from or writing to the memory using the memory data register (MDR). The final function, or aspect, of the memory is to store the actual program that the processor will run. In the VHDL model, we will achieve this by using a constant array to store the program values.

The resulting basic VHDL architecture is given as follows:

library ieee;
use ieee.numeric_std.all;  -- unsigned and to_integer for address indexing

architecture rtl of memory is
  signal mdr : std_logic_vector (n - 1 downto 0);
  signal mar : unsigned (n - oplen - 1 downto 0);
begin
  -- mar_valid is the bus enable declared in the entity; it places
  -- the MDR contents on the bus
  mem_bus <= mdr
    when mar_valid = '1' else (others => 'Z');

  process (clk, nrst) is
    variable contents : memory_array;
    constant program : memory_array :=
      (
        0 => "0000000000000011",
        1 => "0010000000000100",
        2 => "0001000000000101",
        3 => "0000000000001100",
        4 => "0000000000000011",
        5 => "0000000000000000",
        others => (others => '0')
      );
  begin
    if nrst = '0' then
      mdr <= (others => '0');
      mar <= (others => '0');
      contents := program;
    elsif rising_edge (clk) then
      if mar_load = '1' then
        mar <= unsigned (mem_bus (n - oplen - 1 downto 0));
      elsif mdr_load = '1' then
        mdr <= mem_bus;
      elsif m_en = '1' then
        if m_rw = '0' then
          -- read: memory contents into the MDR
          mdr <= contents (to_integer (mar));
        else
          -- write: MDR into the memory contents
          contents (to_integer (mar)) := mdr;
        end if;
      end if;
    end if;
  end process;
end architecture rtl;

We can look at some of the VHDL in a bit more detail and explain what is going on at this stage. There are two internal signals to the block, mdr and mar (the data and address, respectively). The first aspect to notice is that we have defined the MAR as an unsigned rather than as a std_logic_vector. We have done this to make indexing direct. The MDR remains as a std_logic_vector. We can use an integer directly, but an unsigned translates easily into a std_logic_vector.

signal mdr : std_logic_vector (n - 1 downto 0);
signal mar : unsigned (n - oplen - 1 downto 0);

The second aspect is to look at the actual program itself. We clearly have the possibility of a large array of addresses, but in this case we are defining a simple three line program:

c = a + b

The binary code is shown below:

0 => "0000000000000011"
1 => "0010000000000100"
2 => "0001000000000101"
3 => "0000000000001100"
4 => "0000000000000011"
5 => "0000000000000000"
others => (others => '0')

For example, consider the line of the declared value for address 0. The 16 bits are defined as 0000000000000011. If we split this into the opcode and data parts we get the following:

Opcode  0000
Data    000000000011 (decimal 3)

In other words, this means LOAD the variable from address 3. Similarly, the second line is ADD from 4, finally the third command is STORE in 5. In addresses 3, 4, and 5, the three data variables are stored.
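The same split can be checked for all three program words with a short simulation-only sketch. The entity name program_fields_check is hypothetical, and the field positions (opcode in bits 15 to 12, address in bits 11 to 0) follow the 16-bit word format described above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity program_fields_check is
end entity program_fields_check;

architecture sim of program_fields_check is
  subtype word is std_logic_vector(15 downto 0);
  type word_array is array (0 to 2) of word;
  constant program : word_array := (
    0 => "0000000000000011",   -- LOAD  3
    1 => "0010000000000100",   -- ADD   4
    2 => "0001000000000101"    -- STORE 5
  );
begin
  process is
    variable address : natural;
  begin
    -- Print the address field of each instruction word
    for i in program'range loop
      address := to_integer(unsigned(program(i)(11 downto 0)));
      report "word " & integer'image(i) &
             ": address field = " & integer'image(address);
    end loop;

    -- Check the opcode field of each instruction word
    assert program(0)(15 downto 12) = "0000"
      report "word 0 should be LOAD" severity failure;
    assert program(1)(15 downto 12) = "0010"
      report "word 1 should be ADD" severity failure;
    assert program(2)(15 downto 12) = "0001"
      report "word 2 should be STORE" severity failure;
    wait;
  end process;
end architecture sim;
```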

8.3.6 Microcontroller Controller

The operation of the processor is controlled in detail by the sequencer, or controller block. The function of this part of the processor is to take the current program counter address, look up the relevant instruction from memory, move the data around as required, setting up all the relevant control signals at the right time, with the right values. As a result, the controller must have the clock and reset signals (as for the other blocks in the design), a connection to the global bus, and finally all the relevant control signals must be output. An example entity of a controller is given here:

library ieee;
use ieee.std_logic_1164.all;
use work.processor_functions.all;

entity controller is
  generic (
    n : integer := 16
  );
  port (
    clk         : in    std_logic;
    nrst        : in    std_logic;
    ir_load     : out   std_logic;
    ir_valid    : out   std_logic;
    ir_address  : out   std_logic;
    pc_inc      : out   std_logic;
    pc_load     : out   std_logic;
    pc_valid    : out   std_logic;
    mdr_load    : out   std_logic;
    mar_load    : out   std_logic;
    mar_valid   : out   std_logic;
    m_en        : out   std_logic;
    m_rw        : out   std_logic;
    alu_cmd     : out   std_logic_vector (2 downto 0);
    control_bus : inout std_logic_vector (n - 1 downto 0)
  );
end entity controller;

Using this entity, the control signals for each separate block are then defined, and these can be used to carry out the functionality requested by the program. The architecture for the controller is then defined as a basic state machine to drive the correct signals. The basic state machine for the processor is defined in Figure 8.4.

Figure 8.4 Basic processor controller state machine.

We can implement this using a basic VHDL architecture that represents the states with a new enumerated state type and uses a case statement to manage the flow of the state machine. The basic VHDL architecture follows; it includes the synchronous machine control section (reset and clock) and a placeholder for the next-state logic.

architecture rtl of controller is
  type states is (s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10);
  signal current_state, next_state : states;
  -- opcode is assumed to be declared elsewhere; it holds the opcode
  -- field of the current instruction
begin
  -- state register with asynchronous, active-low reset
  state_sequence : process (clk, nrst) is
  begin
    if nrst = '0' then
      current_state <= s0;
    elsif rising_edge (clk) then
      current_state <= next_state;
    end if;
  end process state_sequence;

  state_machine : process (current_state, opcode) is
  begin
    -- state machine goes here
  end process state_machine;
end architecture rtl;

You can see from this VHDL that the first process (state_sequence) manages the transition of the current_state to the next_state and also the reset condition. Notice that this is a synchronous machine and as such waits for the rising_edge of the clock, and that the reset is asynchronous. The second process (state_machine) waits for a change in the state or the opcode and this is used to manage the transition to the next state, although the actual transition itself is managed by the state_sequence process. This process is given in the VHDL here:

state_machine : process (current_state, opcode) is
begin
  -- reset all the control signals
  ir_load    <= '0';
  ir_valid   <= '0';
  ir_address <= '0';
  pc_inc     <= '0';
  pc_load    <= '0';
  pc_valid   <= '0';
  mdr_load   <= '0';
  mar_load   <= '0';
  mar_valid  <= '0';
  m_en       <= '0';
  m_rw       <= '0';
  -- the following signals are used below but are not ports of the entity
  -- shown earlier; they are assumed to be declared elsewhere
  mdr_valid  <= '0';
  acc_valid  <= '0';
  acc_load   <= '0';
  alu_add    <= '0';
  alu_sub    <= '0';

  case current_state is
    when s0 =>
      pc_valid <= '1';
      mar_load <= '1';
      pc_inc   <= '1';
      pc_load  <= '1';
      next_state <= s1;
    when s1 =>
      m_en <= '1';
      m_rw <= '1';
      next_state <= s2;
    when s2 =>
      mdr_valid <= '1';
      ir_load   <= '1';
      next_state <= s3;
    when s3 =>
      mar_load   <= '1';
      ir_address <= '1';
      if opcode = store then
        next_state <= s4;
      else
        next_state <= s6;
      end if;
    when s4 =>
      mdr_load  <= '1';
      acc_valid <= '1';
      next_state <= s5;
    when s5 =>
      m_en <= '1';
      next_state <= s0;
    when s6 =>
      m_en <= '1';
      m_rw <= '1';
      if opcode = load then
        next_state <= s7;
      else
        next_state <= s8;
      end if;
    when s7 =>
      mdr_valid <= '1';
      acc_load  <= '1';
      next_state <= s0;
    when s8 =>
      m_en <= '1';
      m_rw <= '1';
      if opcode = add then
        next_state <= s9;
      else
        next_state <= s10;
      end if;
    when s9 =>
      alu_add <= '1';
      next_state <= s0;
    when s10 =>
      alu_sub <= '1';
      next_state <= s0;
  end case;
end process state_machine;

8.3.7 Summary of a Simple Microprocessor Implemented in VHDL

Now that the important elements of the processor have been defined, it is a simple matter to instantiate them in a basic VHDL netlist and create a microprocessor using these building blocks. It is also a simple matter to modify the functionality of the processor by changing the address/data bus widths or by extending the instruction set.

8.4 A Simple Embedded Processor Implemented in Verilog

As in the case of the VHDL model we can implement common functions in a series of Verilog files for use in the key blocks of the processor. The architecture has been implemented in a slightly different manner, to illustrate a different approach. In both cases an internal bus has been used, which is analogous to the approach taken in early processors; however, a more direct approach can also be taken where the internal registers are accessed directly.
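The internal bus approach can be illustrated with a minimal sketch. The module names (bus_reg, bus_demo), the parameter WIDTH, and the ANSI-style port declarations are assumptions made for this illustration and do not come from the listings in this chapter. Each register drives the shared bus only while its valid input is high and releases it to high impedance otherwise, so that exactly one block drives the bus at a time.

```verilog
// Illustrative sketch of a shared tri-state bus (names are hypothetical).
module bus_reg #(parameter WIDTH = 8) (
  input  wire             clk,
  input  wire             load,   // capture the bus value on the rising edge
  input  wire             valid,  // drive the stored value onto the bus
  inout  wire [WIDTH-1:0] bus
);
  reg [WIDTH-1:0] q;

  // Drive the bus only when selected; otherwise release it (high impedance).
  assign bus = valid ? q : {WIDTH{1'bz}};

  always @(posedge clk)
    if (load) q <= bus;
endmodule

module bus_demo;
  reg clk = 0;
  reg a_load = 0, a_valid = 0, b_load = 0, b_valid = 0;
  reg [7:0] ext = 8'hzz;          // external driver, released by default
  wire [7:0] bus;

  assign bus = ext;
  bus_reg a (.clk(clk), .load(a_load), .valid(a_valid), .bus(bus));
  bus_reg b (.clk(clk), .load(b_load), .valid(b_valid), .bus(bus));

  always #5 clk = ~clk;

  initial begin
    // Load register a from the external driver.
    ext = 8'h2A; a_load = 1;
    @(posedge clk); #1 a_load = 0; ext = 8'hzz;
    // Copy a to b over the bus: a drives, b loads.
    a_valid = 1; b_load = 1;
    @(posedge clk); #1 a_valid = 0; b_load = 0;
    // Let b drive the bus and read the value back.
    b_valid = 1;
    #1 $display("bus = %h", bus);   // prints: bus = 2a
    $finish;
  end
endmodule
```

The registers in the following sections use the same conditional assignment to place their contents on the bus, each with its own valid signal.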

8.4.1 The Program Counter

The program counter (PC) needs to have the system clock and reset connections, and the system bus (defined as inout so as to be readable and writable by the PC register block). In addition, there are several control signals required for correct operation. The first is the signal to increment the PC (pc_inc), the second is the control signal to load the PC with a specified value (pc_load), and the last is the signal to make the register contents visible on the internal bus (pc_valid). When pc_valid is low, the PC output appears as high impedance (Z), so the register does not interfere with the processor bus when it is not required. The bus connection (named data in the listing below) is declared as an inout port so that the PC can both read from and write to the bus.

The model for the program counter must handle all of the various configurations of the program counter control signals and also the communication of the data into and from the internal bus correctly. The PC model has a combinational part and a synchronous part. Whenever pc_valid is low, every bit of the bus output is set to Z. If the reset signal (nrst) is low, the PC resets to zero; in the listing below this is checked on the rising clock edge.

The synchronous part of the model is the increment and load functionality. When the clk rising edge occurs, then the two signals pc_load and pc_inc are used to define the function of the counter. The precedence is that if the increment function is high, then regardless of the load function, the counter will increment. If the increment function (pc_inc) is low, then the PC will load the current value on the bus, if and only if the pc_load signal is also high.

The resulting Verilog code is given below:

`define N 8

module pc (clk, nrst, pc_inc, pc_valid, pc_load, data);

  input clk;
  input nrst;
  input pc_inc;
  input pc_valid;
  input pc_load;

  inout [`N-1:0] data;

  wire [`N-1:0] data;

  reg [`N-1:0] counter;

  // drive the bus only when pc_valid is high, otherwise release it
  assign data = pc_valid ? counter : {`N{1'bz}};

  always @ (posedge clk) begin
    if (nrst == 0) begin
      counter <= 0;
    end
    else begin
      if (pc_inc == 1) begin
        // increment takes precedence over load
        counter <= counter + 1;
      end
      else begin
        if (pc_load == 1) begin
          counter <= data;
        end
        else begin
          // neither increment nor load: this listing clears the counter;
          // a PC that should hold its value would omit this branch
          counter <= 0;
        end
      end
    end
  end

endmodule

We can test this using a test bench that first resets the program counter (PC) to initialize it, increments, then loads in a set value (in this case 4), resets the counter and finally sets the valid signal to low so as to disable the output. This is shown in the following test bench Verilog:

`define N 8

module pc_tb ();
  // declare the counter signals
  reg clk;
  reg nrst;
  reg pc_valid;
  reg pc_load;
  reg pc_inc;
  wire [`N-1:0] data;
  reg [`N-1:0] datareg;

  // Set up the initial variables and reset
  initial begin
    $display ("time\tclk reset inc load valid data");
    $monitor ("%g\t%b %b %b %b %b %b",
      $time, clk, nrst, pc_inc, pc_load, pc_valid, data);
    clk = 1;       // initialize the clock to 1
    nrst = 1;      // set the reset to 1 (not reset)
    pc_valid = 0;
    pc_inc = 0;
    pc_load = 0;
    datareg = 4;   // value to load into the counter
    #5 nrst = 0;   // reset = 0 : resets the counter
    #10 nrst = 1;  // reset back to 1 : counter can start
    #10 pc_inc = 1;
    #10 pc_inc = 0;
    #10 pc_load = 1;
    #10 datareg = 8'bzzzzzzzz;  // release the bus
    #10 pc_load = 0;
    #10 pc_inc = 1;
    pc_valid = 1;               // counter value is driven onto the bus
    #50 pc_valid = 0;           // disable the bus output
    #200 $finish;               // Finish the simulation
  end

  // Clock generator
  always begin
    #5 clk = ~clk;  // Toggle the clock every 5 time units
  end

  assign data = datareg;

  // Connect DUT to test bench
  pc DUT (clk, nrst, pc_inc, pc_valid, pc_load, data);

endmodule

The resulting waveform shows the behavior as predicted (Figure 8.5):

Figure 8.5 Basic processor PC simulation.

8.4.2 The Instruction Register

The Instruction Register (IR) in a simple microprocessor is a simple register with enough bits for the address and opcode combined. For example, if the address requires 8 bits, and the opcode also requires 8 bits, then the Instruction Register needs to be 16 bits wide (8 + 8). If the output from the Memory Data Register goes onto the main bus, then this can be read into the instruction register, which is 16 bits wide in this case.

The current value of the instruction register also can be read by the Memory Address Register (MAR) or the Program Counter (PC) and so the stored value needs to be of type inout, so that it can be made valid onto the internal system bus.

The instruction register (IR) therefore has clock and reset signals, and also the same interface to the internal processor bus (ir_bus), declared as an inout port. The IR also has two further control signals, the first being the command to load the instruction register (ir_load), and the second being to make the stored instruction available on the system bus (ir_valid). The instruction consists of the opcode and address, which can be used by the controller or Program Counter.

The code for the Instruction Register is therefore given in the following listing:

`define OP 8
`define ADDR 8

module ir (clk, nrst, ir_valid, ir_load, ir_bus);

  input clk;
  input nrst;
  input ir_valid;
  input ir_load;

  inout [`OP+`ADDR-1:0] ir_bus;

  wire [`OP+`ADDR-1:0] ir_bus;

  reg [`OP+`ADDR-1:0] ir_reg;

  // drive the bus only when ir_valid is high, otherwise release it
  assign ir_bus = ir_valid ? ir_reg : {(`OP+`ADDR){1'bz}};

  always @ (posedge clk) begin
    if (nrst == 0) begin
      ir_reg <= 0;
    end
    else begin
      if (ir_load == 1) begin
        ir_reg <= ir_bus;
      end
    end
  end

endmodule

We can test this by loading in a sample instruction, and then setting it valid so that it is then seen on the bus. The test bench to achieve this is shown here:

`define OP 8
`define ADDR 8

module ir_tb ();
  // declare the test bench signals
  reg clk;
  reg nrst;
  reg ir_valid;
  reg ir_load;

  wire [`OP+`ADDR-1:0] data;
  reg [`OP+`ADDR-1:0] datareg;

  // Set up the initial variables and reset
  initial begin
    $display ("time\tclk reset load valid data");
    $monitor ("%g\t%b %b %b %b %b",
      $time, clk, nrst, ir_load, ir_valid, data);
    clk = 1;       // initialize the clock to 1
    nrst = 1;      // set the reset to 1 (not reset)
    ir_valid = 0;
    ir_load = 0;
    datareg = 16'b0000000000001111;  // sample instruction
    #5 nrst = 0;   // reset = 0 : clears the register
    #10 nrst = 1;  // reset back to 1 : normal operation
    #10 ir_load = 1;
    #10 ir_load = 0;
    #10 datareg = 16'bzzzzzzzzzzzzzzzz;  // release the bus
    ir_valid = 1;                        // register drives the bus
    #50 ir_valid = 0;                    // disable the bus output
    #200 $finish;                        // Finish the simulation
  end

  // Clock generator
  always begin
    #5 clk = ~clk;  // Toggle the clock every 5 time units
  end

  assign data = datareg;

  // Connect DUT to test bench
  ir DUT (clk, nrst, ir_valid, ir_load, data);

endmodule

The resulting waveform shows the behavior as predicted (Figure 8.6):

Figure 8.6 Basic processor instruction register simulation.
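Once an instruction has been loaded, the controller and the address path need its two fields separately. The following is a minimal sketch of that split, assuming the opcode occupies the upper bits and the address the lower bits of the word (the MAR in this chapter takes its address from the lower bits). The module names ir_fields and ir_fields_tb are hypothetical.

```verilog
`define OP 8
`define ADDR 8

// Illustrative sketch: split an instruction word into opcode and address.
// Assumption: opcode in the upper `OP bits, address in the lower `ADDR bits.
module ir_fields (ir_word, opcode, address);
  input  [`OP+`ADDR-1:0] ir_word;
  output [`OP-1:0]       opcode;
  output [`ADDR-1:0]     address;

  assign opcode  = ir_word[`OP+`ADDR-1:`ADDR];
  assign address = ir_word[`ADDR-1:0];
endmodule

module ir_fields_tb;
  reg  [`OP+`ADDR-1:0] word;
  wire [`OP-1:0]       opcode;
  wire [`ADDR-1:0]     address;

  ir_fields dut (word, opcode, address);

  initial begin
    word = 16'h0103;
    #1 $display("opcode=%h address=%h", opcode, address);
    // prints: opcode=01 address=03
  end
endmodule
```

Because both fields are simple part-selects, the split costs no logic; it only names the wires.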

8.4.3 Memory Data Register

The memory data register is used to handle the data transferred to and from the memory unit. This can be handled either with a single bidirectional bus or with separate data input and output connections for the memory. In this case we use a separate input from the memory (mem_bus), so the MDR becomes a simple register that captures the memory output when its control signal mdr_load is high. The Memory Data Register (MDR) in a simple microprocessor needs enough bits for the address and opcode combined. For example, if the address requires 8 bits, and the opcode also requires 8 bits, then the register needs to be 16 bits wide (8 + 8). If the output from the Memory Data Register goes onto the main bus, then this can be read into the instruction register, which is also 16 bits wide in this case.

The Memory Data Register (MDR) therefore has clock and reset signals, and also the same interface to the internal processor bus (mdr_bus), declared as an inout port. The MDR also has a further control signal, to make the required data available on the system bus (mdr_valid). The data consists of the opcode and address, which can be used by the controller, Accumulator, or Instruction Register.

The code for the Memory Data Register (MDR) is therefore given in the following listing:

`define OP 8
`define ADDR 8

module mdr (clk, nrst, mdr_load, mdr_valid, mem_bus, mdr_bus);

  input clk;
  input nrst;
  input mdr_valid;
  input mdr_load;

  inout [`OP+`ADDR-1:0] mdr_bus;
  input [`OP+`ADDR-1:0] mem_bus;

  wire [`OP+`ADDR-1:0] mdr_bus;
  wire [`OP+`ADDR-1:0] mem_bus;

  reg [`OP+`ADDR-1:0] mdr_reg;

  // drive the bus only when mdr_valid is high, otherwise release it
  assign mdr_bus = mdr_valid ? mdr_reg : {(`OP+`ADDR){1'bz}};

  always @ (posedge clk) begin
    if (nrst == 0) begin
      mdr_reg <= 0;
    end
    else begin
      if (mdr_load == 1) begin
        mdr_reg <= mem_bus;
      end
    end
  end

endmodule

We can test this by loading in a sample instruction, and then setting it valid so that it is seen on the bus. The test bench to achieve this is shown here:

`define OP 8
`define ADDR 8

module mdr_tb ();
  // declare the test bench signals
  reg clk;
  reg nrst;
  reg mdr_valid;
  reg mdr_load;

  wire [`OP+`ADDR-1:0] data;
  reg [`OP+`ADDR-1:0] memory;

  // Set up the initial variables and reset
  initial begin
    $display ("time\tclk reset load valid data");
    $monitor ("%g\t%b %b %b %b %b",
      $time, clk, nrst, mdr_load, mdr_valid, data);
    clk = 1;       // initialize the clock to 1
    nrst = 1;      // set the reset to 1 (not reset)
    mdr_valid = 0;
    mdr_load = 0;
    memory = 16'b0000000000001111;  // sample memory output
    #5 nrst = 0;   // reset = 0 : clears the register
    #10 nrst = 1;  // reset back to 1 : normal operation
    #10 mdr_load = 1;
    #10 mdr_load = 0;
    #10 memory = 16'bzzzzzzzzzzzzzzzz;
    mdr_valid = 1;       // register drives the bus
    #50 mdr_valid = 0;   // disable the bus output
    #200 $finish;        // Finish the simulation
  end

  // Clock generator
  always begin
    #5 clk = ~clk;  // Toggle the clock every 5 time units
  end

  // Connect DUT to test bench
  mdr DUT (clk, nrst, mdr_load, mdr_valid, memory, data);

endmodule

The resulting waveform shows the behavior as predicted (Figure 8.7):

Figure 8.7 Basic processor memory data register (MDR) simulation.

8.4.4 Memory Address Register

The memory address register is used to handle the address transferred to the memory unit. The address can either be taken from the internal bus or be supplied through a dedicated input. In this case we take it from the bus, so the MAR becomes a simple register that captures the required address from the IR or PC when its control signal mar_load is high. The Memory Address Register (MAR) in a simple microprocessor needs enough bits for the address. For example, if the address requires 8 bits, then the register needs to be 8 bits wide.

The Memory Address Register (MAR) therefore has clock and reset signals, and a connection to the internal processor bus (mar_bus). Because the MAR only reads from the bus, the listing below declares mar_bus as an input, and only the lower 8 bits are used.

The code for the Memory Address Register (MAR) is therefore given in the listing below:

`define ADDR 8
`define OP 8

module mar (clk, nrst, mar_load, mar_bus, address);

  input clk;
  input nrst;
  input mar_load;

  input [`OP+`ADDR-1:0] mar_bus;
  output [`ADDR-1:0] address;

  wire [`OP+`ADDR-1:0] mar_bus;
  reg [`ADDR-1:0] address;

  always @ (posedge clk) begin
    if (nrst == 0) begin
      address <= 0;
    end
    else begin
      if (mar_load == 1) begin
        // keep only the address field (lower bits) of the bus
        address <= mar_bus[`ADDR-1:0];
      end
    end
  end

endmodule

We can test this by placing a sample value on the bus and asserting mar_load, so that the lower 8 bits appear on the address output. The test bench to achieve this is shown below:

`define OP 8
`define ADDR 8

module mar_tb ();
  // declare the test bench signals
  reg clk;
  reg nrst;
  reg mar_load;

  reg [`OP+`ADDR-1:0] data;
  wire [`ADDR-1:0] address;

  // Set up the initial variables and reset
  initial begin
    $display ("time\tclk reset load data address");
    $monitor ("%g\t%b %b %b %b %b",
      $time, clk, nrst, mar_load, data, address);
    clk = 1;       // initialize the clock to 1
    nrst = 1;      // set the reset to 1 (not reset)
    mar_load = 0;
    data = 16'b0000000000001111;  // sample bus value
    #5 nrst = 0;   // reset = 0 : clears the register
    #10 nrst = 1;  // reset back to 1 : normal operation
    #10 mar_load = 1;
    #10 mar_load = 0;
    #10 data = 16'bzzzzzzzzzzzzzzzz;  // release the bus
    #200 $finish;                     // Finish the simulation
  end

  // Clock generator
  always begin
    #5 clk = ~clk;  // Toggle the clock every 5 time units
  end

  // Connect DUT to test bench
  mar DUT (clk, nrst, mar_load, data, address);

endmodule

The resulting waveform shows the behavior as predicted (Figure 8.8):

Figure 8.8 Basic processor memory address register (MAR) simulation.

8.4.5 The Arithmetic and Logic Unit

The Arithmetic and Logic Unit (ALU) has the same clock and reset signals as the PC, and also the same interface to the bus (alu_bus), declared as an inout port. The ALU also has an opcode input (alu_op), which is decoded to select one of the 8 individual functions required of the ALU; three bits would be enough for this, although the listing below declares alu_op with the full opcode width. The ALU also contains the Accumulator (ACC), which is a register of the size defined for the system bus width. There is also a single-bit output alu_zero, which goes high when all the bits in the accumulator are zero.

The function of the ALU is to decode alu_op and then carry out the relevant function on the data on the bus and the current data in the accumulator. If alu_valid is low, the bus value is set to Z for all bits. If the reset signal (nrst) is low, the accumulator is set to all 0. On the rising edge of the clock, the command is decoded and applied to the value on the bus. The resulting Verilog model is given as follows:

`define OP 8
`define ADDR 8

module alu (clk, nrst, alu_op, alu_zero, alu_valid, alu_bus);

  // Interface Definitions
  input clk;        // Clock Input
  input nrst;       // reset (active Low)
  output alu_zero;  // ALU is zero
  input alu_valid;  // ALU output is valid

  inout [`OP+`ADDR-1:0] alu_bus;  // ALU bus
  input [`OP-1:0] alu_op;         // ALU OP code

  // Register Definitions
  reg alu_zero;  // ALU is zero

  reg [`OP+`ADDR-1:0] acc;      // Accumulator
  reg [`OP+`ADDR-1:0] alu_reg;  // Accumulator output register

  // drive the bus only when alu_valid is high, otherwise release it
  assign alu_bus = alu_valid ? alu_reg : {(`OP+`ADDR){1'bz}};

  always @ (posedge clk) begin

    if (nrst == 0) begin
      acc <= 0;
    end
    else begin
      case (alu_op)
        8'h00 : acc <= alu_bus;        // load
        8'h01 : acc <= acc + alu_bus;  // add
        8'h02 : acc <= ~alu_bus;       // bitwise NOT of the bus value
        8'h03 : acc <= acc | alu_bus;  // OR
        8'h04 : acc <= acc & alu_bus;  // AND
        8'h05 : acc <= acc ^ alu_bus;  // XOR
        8'h06 : acc <= acc + 1;        // increment
        8'h07 : alu_reg <= acc;        // copy accumulator to output register
        default : acc <= 0;
      endcase
      // acc is read here before the update above takes effect, so
      // alu_zero follows the accumulator one clock cycle later
      if (acc == 0) begin
        alu_zero <= 1;
      end
      else begin
        alu_zero <= 0;
      end
    end
  end

endmodule

We can test this by loading in a sample instruction, and then setting it valid so that it is then seen on the bus. The test bench to achieve this is shown here, and in this case after initializing the accumulator to all zeros, the hex value 0012 is loaded, with the binary equivalent 0000 0000 0001 0010, which is seen in Figure 8.9.

Figure 8.9 Basic processor ALU simulation.

`define OP 8
`define ADDR 8

module alu_tb ();
  // declare the test bench signals
  reg clk;
  reg nrst;
  reg alu_valid;
  wire alu_zero;

  wire [`OP+`ADDR-1:0] alu_bus;
  reg [`OP-1:0] opcode;
  reg [`OP+`ADDR-1:0] alu_reg;

  assign alu_bus = alu_reg;

  // Set up the initial variables and reset
  initial begin
    $display ("time\tclk reset zero valid opcode bus");
    $monitor ("%g\t%b %b %b %b %b %b",
      $time, clk, nrst, alu_zero, alu_valid, opcode, alu_bus);
    clk = 1;        // initialize the clock to 1
    nrst = 1;       // set the reset to 1 (not reset)
    alu_valid = 0;
    opcode = 8'h00; // load the accumulator from the bus
    alu_reg = 16'bz;
    #5 nrst = 0;    // reset = 0 : clears the accumulator
    #10 nrst = 1;   // reset back to 1 : normal operation
    alu_valid = 1;
    #10 alu_valid = 0;       // disable the ALU bus output
    #10 alu_reg = 16'h0012;  // value to load into the accumulator
    #200 $finish;            // Finish the simulation
  end

  // Clock generator
  always begin
    #5 clk = ~clk;  // Toggle the clock every 5 time units
  end

  // Connect DUT to test bench
  alu DUT (clk, nrst, opcode, alu_zero, alu_valid, alu_bus);

endmodule

The resulting waveform shows the behavior as predicted (Figure 8.9).
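In the ALU listing, alu_zero is assigned inside the clocked block from the accumulator value before the clock edge, so the flag follows the accumulator one cycle later. If the flag is wanted in the same cycle, one option is to derive it combinationally. The sketch below shows this under assumed names (acc_zero, load, d, zero) that are not part of the chapter's ALU.

```verilog
// Illustrative sketch: accumulator with a combinational zero flag.
// Module and port names are hypothetical.
module acc_zero (clk, nrst, load, d, acc, zero);
  input         clk;
  input         nrst;   // reset (active low), sampled on the clock edge
  input         load;   // capture d on the rising edge
  input  [15:0] d;
  output [15:0] acc;
  output        zero;

  reg [15:0] acc;

  // The flag tracks the accumulator directly, with no extra register.
  assign zero = (acc == 16'd0);

  always @(posedge clk) begin
    if (nrst == 0)
      acc <= 0;
    else if (load)
      acc <= d;
  end
endmodule
```

The trade-off is that the comparison now sits in the combinational path from the accumulator to whatever logic uses the flag.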

8.4.6 The Memory

The processor requires a RAM memory, with an address register (MAR) and a data register (MDR). There therefore needs to be a load signal for each of these registers: mdr_load and mar_load. As it is a memory, there also needs to be an enable signal (m_en), and also a signal to denote Read or Write modes (m_rw). Finally, the connection to the system bus is a standard inout vector as has been defined for the other registers in the microprocessor.

As there is a full description of a sample memory in Chapter 11, Memory, the code is not repeated at this point.
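For orientation only, a minimal sketch of such a memory is given here. It is not the memory described in Chapter 11. The module name, the port names other than m_en and m_rw, the use of separate data input and output ports, and the convention that m_rw = 1 means read are all assumptions; the read convention follows the VHDL controller earlier in this chapter, which asserts m_rw when fetching.

```verilog
`define OP 8
`define ADDR 8

// Illustrative sketch of a small synchronous RAM (names are hypothetical).
// Assumption: m_rw = 1 reads from the memory, m_rw = 0 writes to it.
module ram (clk, m_en, m_rw, address, data_in, data_out);
  input                  clk;
  input                  m_en;      // memory enable
  input                  m_rw;      // 1 = read, 0 = write (assumed)
  input  [`ADDR-1:0]     address;   // from the MAR
  input  [`OP+`ADDR-1:0] data_in;   // word to store
  output [`OP+`ADDR-1:0] data_out;  // word read, to the MDR

  reg [`OP+`ADDR-1:0] data_out;

  // One word per address: 2^ADDR words in total.
  reg [`OP+`ADDR-1:0] mem [0:(1<<`ADDR)-1];

  always @(posedge clk) begin
    if (m_en) begin
      if (m_rw)
        data_out <= mem[address];   // read
      else
        mem[address] <= data_in;    // write
    end
  end
endmodule
```

With this arrangement data_out would connect to the mem_bus input of the MDR shown earlier.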

8.4.7 Microcontroller Controller

The operation of the processor is controlled in detail by the sequencer, or controller block. The function of this part of the processor is to take the current program counter address, look up the relevant instruction from memory, move the data around as required, setting up all the relevant control signals at the right time, with the right values. As a result, the controller must have the clock and reset signals (as for the other blocks in the design), a connection to the global bus, and finally all the relevant control signals must be output.

Using this module interface, the control signals for each separate block are then defined, and these can be used to carry out the functionality requested by the program. The controller is then defined as a basic state machine to drive the correct signals. The basic state machine for the processor is defined in Figure 8.4, shown previously in this chapter.

The outline Verilog Controller is shown below; however, there are so many states it has been cut down to illustrate the architecture of the model.

`define OP 8
`define ADDR 8

module controller (
  clk, nrst,
  ir_load, ir_valid, ir_address,
  pc_inc, pc_load, pc_valid,
  mdr_load,
  mar_load, mar_valid,
  m_en, m_rw,
  alu_op,
  alu_valid
);

  // Interface Definitions
  input clk;   // Clock Input
  input nrst;  // reset (active Low)

  output ir_load;
  output ir_valid;
  input [`ADDR-1:0] ir_address;

  output pc_inc;
  output pc_load;
  output pc_valid;

  output alu_valid;  // ALU output is valid

  output mdr_load;
  output mar_load;
  output mar_valid;
  output m_en;
  output m_rw;

  output [`OP-1:0] alu_op;  // ALU OP code

  // Register Definitions
  reg ir_load;
  reg ir_valid;

  reg pc_inc;
  reg pc_load;
  reg pc_valid;

  reg alu_valid;  // ALU output is valid

  reg mdr_load;
  reg mar_load;
  reg mar_valid;
  reg m_en;
  reg m_rw;

  reg [`OP-1:0] alu_op;  // ALU OP code

  reg [3:0] state;  // state variable

  // State encoding for the outline below
  parameter s0 = 4'd0, s1 = 4'd1, s2 = 4'd2, s3 = 4'd3;

  // Placeholder branch condition for the outline; in a full controller
  // this would be a test on the opcode of the current instruction
  wire choice = 1'b0;

  // Control signal outputs for each state
  always @(posedge clk) begin
    if (nrst == 0) begin
      ir_load <= 0;   ir_valid <= 0;
      pc_inc <= 0;    pc_load <= 0;   pc_valid <= 0;
      alu_valid <= 0; alu_op <= 0;
      mdr_load <= 0;  mar_load <= 0;  mar_valid <= 0;
      m_en <= 0;      m_rw <= 0;
    end
    else begin
      case (state)
        s0: begin
          mar_load <= 1;
          pc_load <= 1;
          pc_inc <= 1;
        end

        // Complete State Definitions Here

        // Catch-all to avoid unknown conditions
        default: ;
      endcase
    end
  end

  // State sequencing
  always @(posedge clk or negedge nrst) begin
    if (nrst == 0)
      state <= s0;
    else
      case (state)
        s0: state <= s1;
        s1: if (choice)
              state <= s3;
            else
              state <= s2;
        s2: state <= s0;
        s3: state <= s0;
        default: state <= s0;
      endcase
  end

endmodule
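The next-state decisions of the full state machine can be sketched in Verilog by following the VHDL state machine given earlier in this chapter. The module name next_state_logic and the 4-bit opcode encodings (LOAD = 0000, STORE = 0001, ADD = 0010, taken from the sample program) are assumptions for this illustration; the Verilog blocks in this section use 8-bit opcodes, so the constants would need adapting.

```verilog
// Illustrative sketch of the next-state logic, mirroring states s0 to s10
// of the VHDL controller. Names and opcode encodings are assumptions.
module next_state_logic (state, opcode, next_state);
  input  [3:0] state;
  input  [3:0] opcode;
  output [3:0] next_state;

  reg [3:0] next_state;

  parameter LOAD = 4'b0000, STORE = 4'b0001, ADD = 4'b0010;

  always @(state or opcode) begin
    case (state)
      4'd0:  next_state = 4'd1;                             // PC to MAR
      4'd1:  next_state = 4'd2;                             // read memory
      4'd2:  next_state = 4'd3;                             // MDR to IR
      4'd3:  next_state = (opcode == STORE) ? 4'd4 : 4'd6;  // address to MAR
      4'd4:  next_state = 4'd5;                             // ACC to MDR
      4'd5:  next_state = 4'd0;                             // write memory
      4'd6:  next_state = (opcode == LOAD) ? 4'd7 : 4'd8;   // read operand
      4'd7:  next_state = 4'd0;                             // MDR to ACC
      4'd8:  next_state = (opcode == ADD) ? 4'd9 : 4'd10;
      4'd9:  next_state = 4'd0;                             // add
      4'd10: next_state = 4'd0;                             // subtract
      default: next_state = 4'd0;                           // recover to s0
    endcase
  end
endmodule
```

Keeping the next-state decisions in a purely combinational block, separate from the state register, matches the two-process structure of the VHDL version.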

8.4.8 Summary of a Simple Verilog Microprocessor

Now that the important elements of the processor have been defined, it is a simple matter to instantiate them in a complete Verilog model and create a microprocessor using these building blocks. It is also a simple matter to modify the functionality of the processor by changing the address/data bus widths or by extending the instruction set.

8.5 Soft Core Processors on an FPGA

While the previous example of a simple microprocessor is useful as a design exercise and helps in understanding how microprocessors operate, in practice most FPGA vendors provide standard processor cores as part of an embedded development kit that includes compilers and other libraries. For example, this could be the MicroBlaze core from Xilinx or the Nios core supplied by Altera. In all these cases the basic idea is the same: a standard configurable core is instantiated in the design, and code is compiled using a standard compiler and downloaded to the processor core in question.

Each soft core is different and rather than describe the details of a particular case, in this section the general principles will be covered and the reader is encouraged to experiment with the offerings from the FPGA vendors to see which suits their application the best.

In any soft core development system there are several key functions that are required to make the process easy to implement. The first is the system building function. This enables a core to be designed into a hardware system that includes memory modules, control functions, DMA functions, data interfaces, and interrupts. The second is the choice of processor types to implement. A basic Nios II or similar embedded core will typically have a performance in the region of 100-200 MIPS, and the processor design tools will allow the size of the core to be traded off against the hardware resources available and the performance required.

8.6 Summary

The topic of embedded processors on FPGAs would be suitable for a complete book in itself. In this chapter the basic techniques have been described for implementing a simple processor directly on the FPGA, and the approach for implementing soft cores on FPGAs has been introduced.
