Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 7

Memory System

Abstract

This chapter introduces the memory system in typical microcontrollers, including some special features such as boot loader and memory remapping support, and the memory map defined by the ARMv6-M architecture. The memory system architecture topics include endian support, memory attributes, and access permissions. It also covers information of how program codes relate to the memory system characteristics such as data types and data alignment.

Keywords

Boot loader in microcontrollers; Data alignment; Little and big endian; Memory access permissions; Memory attributes; Memory bus; Memory map; Memory remapping

7.1. Memory Systems in Microcontrollers

All processor systems need memories. In typical microcontrollers we need Non-Volatile Memory (NVM) for program storage, such as flash memories or mask ROM, as well as memory space such as SRAM (Static Random Access Memory) in which we can easily write and read back. SRAM is typically used for data variables, stack memory, as well as heap memory for dynamic memory allocation (e.g., when using alloc() function in C language).

In most microcontrollers, you can find these memories integrated in the microcontroller chip. This makes these microcontrollers much easier to use (requires fewer external connections and reduces costs for the final embedded products). However, the on-chip flash and SRAM memory sizes are limited. Many low cost microcontrollers have around 128 KB (or less) of flash memory and around 32 KB (or less) of SRAM size.

A number of microcontrollers also have a boot loader ROM which enables the microcontroller to execute a small program provided by the Micro-Controller Unit (MCU) vendor before starting the user applications stored in the flash memory. The boot loader ROM might provide various boot options and flash programming utility, as well as setting up factory calibration data for internal clock source calibration, or calibration data for internal voltage references. Some of the microcontroller designs do not allow the boot loader to be modified or erased by the software developers.

If the project requires more memories in the system, the system designer would need to select a microcontroller product which supports external memory interface. Please note that many microcontroller products are not designed to support off-chip memory systems. Even with microcontrollers that support external memories, each memory access to the off-chip memory system can take multiple clock cycles, and therefore the system performance is likely to be lower than placing all data in the on-chip memory systems.

Traditional microcontrollers require separate NVM and SRAM because NVM-like flash memories require a complex programming sequence to update, therefore is not suitable for data storage (e.g., data variables, stack, which need to be updated very frequently).

Recently, some microcontroller products start to use FRAM (Ferroelectric RAM) or MRAM (Magnetoresistive RAM). These technologies enable a single memory block to be used for both program code and data storage, and have the advantage that the memory system can be powered down completely and then resume operations without losing the data in the RAM (traditional approach requires the SRAM to be put into a state retention mode which still incurs leakage current). While existing Cortex^®-M processor-based microcontroller products do not use these memory technologies, this can be done (and have been demonstrated experimentally¹) as the Cortex-M processors do not restrict the types of memory technologies used for the implementation.

One important aspect of NVM memories in microcontrollers is that many NVM technologies are relatively slow compared to SRAM access speed. As a result, the bus interface for the flash memories or FRAM memories needs to insert wait states to the bus system when the processor bus is running faster than the maximum access speed of the memory. For example, typically on-chip flash memory has access speed ranged from 25 to 50 MHz (some high-speed flash memory technologies can run at over 100 MHz, but they are rarely used for ultra low power microcontroller devices as their power consumption is relatively high).

7.2. Bus Systems in the Cortex^®-M0 and Cortex-M0+ Processors

The Cortex-M0 and Cortex-M0+ processors have a 32-bit system bus interface with 32-bit address lines (4 GB address space). The system bus is based on a bus protocol called AHB-Lite (Advanced High-performance Bus), which is a protocol defined in the AMBA^® (Advanced Microcontroller Bus Architecture) standard. The AMBA standard is developed by ARM^® and is widely used in semiconductor industry.

The system bus interface is a generic design that can be connected to different types of memories with suitable memory interface logic. The bus interface can support read/write transfers with 32-, 16-, and 8-bit data, and support wait states and slave responses (can be OKAY or ERROR). Technically the memory devices connected to the processor can be any size and can be different width. For example, the memory devices can be 8-bit, 16-bit, or 64-bit memory, but would require additional hardware to bridge between different bus sizes. Typically 32-bit on-chip memories are used to keep the design’s complexity at minimum.

While the AHB-Lite protocol provides high-performance accesses to the memory system, very often a secondary bus segment can also be found for slower devices including peripherals, as shown in Figure 7.1. In ARM microcontrollers, the peripheral bus system is normally based on the APB (Advanced Peripheral Bus) protocol. The APB is connected to the AHB-Lite via a bus bridge and may run at a different clock speed compared to the AHB system bus. The data path on the APB is also 32-bit, but the address lines are often less than 32-bit as the peripheral address space is relatively small.

Figure 7.1 Separation of system and peripheral bus in a simple 32-bit microcontroller.

Due to the separation of main system bus and peripheral bus, and in some cases with separated clock frequency controls, an application might need to initialize some clock control hardware in the microcontroller before accessing some of the peripherals. In some cases, there can also be multiple peripheral bus segments in a microcontroller running at different clock frequencies. Beside from allowing some part of the system running in a slower speed, the separation of bus segments also provide possibilities of power reduction by allowing clock signal to a peripheral system to be stopped completely.

Depending on the microcontroller designs, some high-speed peripherals might be connected to the AHB-Lite system bus instead of the APB. This is because the AHB-Lite protocol requires less number of clock cycles for each transfer when compared to the APB. The bus protocol behavior affects the system operation and programmer’s view on the memory system in a number of ways. This will be covered in Section 7.9.

7.3. Memory Map

7.3.1. Overview

The 4 GB memory space of the Cortex^®-M0 and Cortex-M0+ processors is architecturally divided into a number of regions (Figure 7.2). Each region has its recommended usage, and the memory access behavior could be dependent on which memory region you are accessing to. This memory region definition helps software porting between different ARM^® Cortex microcontrollers as they all have the similar arrangements.

Figure 7.2 Architecturally defined memory map of the Cortex^®-M0/M0+ processor.

Although having an architectural defined memory map, the actual usage of the memory map is very flexible. There are only a few limitations, for example: a few memory regions which are allocated for peripherals do not allow program code execution, and there are a number of internal components that have fixed memory addresses to ensure software portability.

Next we will have a look into the usage of each region.

7.3.2. Code Region (0x00000000–0x1FFFFFFF)

The size of the code region is 512 MB. It is primarily used to store program code, including the initial exception vector table at address 0x00000000 which is a part of the program image. This region can also be used for data memory (connection to RAM).

7.3.3. SRAM Region (0x20000000–0x3FFFFFFF)

The SRAM region is the located in the next 512 MB of the memory map. It is primarily used to store data, including stack. It can also be used to store program codes. For example, in some cases you might want to copy program codes from slow external memory to the SRAM and execute it from there. Despite the name given to this region is called “SRAM,” the actual memory devices being used could be SRAM, SDRAM or other types or read–write memory.

7.3.4. Peripheral Region (0x40000000–0x5FFFFFFF)

The Peripheral region also has the size of 512 MB. It is primarily used for peripherals, and can also be used for data storage. However, program execution is not allowed in the Peripheral region. The peripherals connected to this memory region can either be AHB-Lite peripheral or APB peripherals (via a bus bridge).

7.3.5. RAM Region (0x60000000–0x9FFFFFFF)

The RAM region consists of two 512 MB blocks, which results in total of 1 GB space. Both 512 MB memory blocks are primarily used to stored data, and in most cases the RAM region can be used as a 1 GB continuous memory space. The RAM region can also be used for program code execution. The only differences between the two halves of the RAM region are the memory attributes, which might cause differences in caching behavior if a system level cache (level-2 cache) is used. More about memory attributes will be covered in later part of this chapter.

7.3.6. Device Region (0xA0000000–0xDFFFFFFF)

The external device region consists of two 512 MB memory blocks, which results in a total of 1 GB space. Both 512 MB memory blocks are primarily used for peripherals and I/O usages. The device region does not allow program execution, but it can be used for general data storage. Similar to the RAM region, the two halves of the device region have different memory attributes.

7.3.7. Internal Private Peripheral Bus (0xE0000000–0xE00FFFFF)

The internal Private Peripheral Bus (PPB) memory space is allocated for peripherals inside the processor, such as the interrupt controller Vectored Interrupt Controller (NVIC), as well as the debug components. The internal PPB memory space is 1 MB in size, and program execution is not allowed in this memory range.

Within the PPB memory range, a special range of memory is defined as the System Control Space (SCS). The SCS address is from 0xE000E000 to 0xE000EFFF. It contains the interrupt control registers, system control registers, debug control registers, etc. The NVIC registers are part of the SCS memory space. The SCS also contains an optional timer called the SysTick. This will be covered in Chapter 10 (Section 10.3, The SysTick Timer).

7.3.8. Reserved Memory Space (0xE0100000–0xFFFFFFFF)

The last section of the memory map is a 511 MB reserved memory space. This may be used in some microcontrollers for microcontroller vendor specific usages.

7.3.9. System Level Design

Although all the Cortex-M Processors have this fixed memory map, the usage of the memory is very flexible. For example, it can have multiple SRAM memory blocks placed in SRAM region as well as other locations like the CODE region, and it can also execute program code from external memory components located in CODE/SRAM/RAM region. Microcontroller vendors can also add their own system level memory features like system level cache if needed.

So how does the memory map of a typical real system look like?

For a typical microcontroller developed with the Cortex-M0/M0+ processor, normally you can find:

• Flash memory (for program code)

• Internal SRAM (for data)

• Internal peripherals

• External memory interface (for external memories as well as external peripherals, optional)

• There could also be other external peripherals interface

After putting all these components together, an example microcontroller could be illustrated as in Figure 7.3, with the nonexecutable memory regions highlighted in yellow.

Figure 7.3 shows some of the possibilities of how memory regions can be used. However, in many low cost microcontrollers the system designs do not have any external memory interface or SD (Secure Digital) card interface. In these cases, some of the memory regions like the external RAM or the external device regions might be unused.

7.4. Program Memory, Boot Loader, and Memory Remapping

7.4.1. Program Memory and Boot Loader

In microcontroller products, usually the program memory of the Cortex^®-M0 or Cortex-M0+ processor is implemented with on-chip flash memory. However, it is also possible that the program is stored externally or using other types of memory devices (e.g., external Quad SPI flash, EEPROM).

When the Cortex-M processor comes out of reset, it accesses the vector table in address zero for initial Main Stack Pointer value and reset vector value, and then starts the program execution from the reset vector. In order to ensure the system start up correctly, a valid vector table and a valid program memory must be available in the system to prevent the processor from executing rogue program code. In many designs the required vector table and boot code are provided by a flash memory starting from address zero. However, an off-the-shelf microcontroller product might not have any program in the flash memory before it is programmed. In order to allow the processor start up correctly, some Cortex-M microcontrollers come with a boot loader, a small program located on the microcontroller chip that executes after power-up and branch to the user’s application in the flash memory only if the flash is programmed.

Figure 7.3 Example usage of various memory regions in a microcontroller design.

The boot loader is preprogrammed by the chip manufacturer. Sometimes it is stored on the on-chip flash memory with a memory section separated from user applications (to allow update of user program without affecting the boot loader), or stored on an NVM separated from the user programmable flash memory. The boot loader feature is not always needed, even if the microcontroller does not boot up correctly due to the lacking of a valid program image in the flash memory, a debugger can still be able to connect to the processor via a debug connection and reprogram the flash memory.

7.4.2. Memory Remap

When a boot loader is present, it is possible that the microcontroller vendor would implement a memory map switching feature called “remap” on the system bus. The switching of the memory map is controlled by a hardware register, which is programmed when the boot loader is executed. There are various types of remap arrangements. One common remap arrangement is to allow the boot loader to be mapped to the start of the memory during power-up using address alias, as shown in Figure 7.4.

Figure 7.4 An example of memory-remap implementation with boot loader.

The boot loader might also support additional features like hardware initialization (clock and PLL setup), supporting of multiple boot configurations, firmware protection or even flash erase utilities. The memory remap feature is implemented on the system bus and is not a part of the Cortex-M0/M0+ processor, therefore different microcontrollers from different vendors have different implementations.

Another common type of remap features implemented on some ARM microcontrollers allows an SRAM block to be remapped to address 0x0 (Figure 7.5). Normally NVM used on microcontrollers like flash memory is slower than SRAM. When the microcontroller is running at high clock rate, wait states would be required if the program is executed from the flash memory. By allowing an SRAM memory block to be remapped to address 0x0, then the program can be copied to SRAM and execute at maximum speed. This also avoids wait states in vector table fetch which affects interrupt latency.

Figure 7.5 A different example of memory-remap implementation—SRAM for fast program accesses.

In some other cases, the memory remapping technique is being used in Cortex-M0 microcontrollers to allow the vector table (see Section 8.5 in Chapter 8) to be modified dynamically during runtime. For this usage, a small part of the SRAM can be mapped into address 0x0 as an address alias and used for storing vector table entries. Since the Cortex-M0+ processor has the vector table relocation feature (see Section 9.2.4 Vector Table Offset Register in Chapter 9), the system level memory remap is not essential because the users can define part of the on-chip SRAM or user flash memory as vector table.

7.5. Data Memory

The data memory in Cortex^®-M processors is used for software variables, stack memory, and in some cases, heap memory. Sometimes local variables in C functions could be stored onto the stack memory. The heap memory is needed when the applications use C functions that require dynamically allocated memory space (e.g., alloc(), malloc() functions). Other data variables like global variables and static variables are normally statically allocated in the beginning of the RAM space.

In most embedded applications without Operating Systems (OS), only one stack is used (only the Main Stack Pointer is required). In this case the data memory can be arranged as shown in Figure 7.6.

Since the stack operation is based on full descending stack arrangement, and heap memory allocation is ascending, it is common to put the stack at the end of the memory block and heap memory just after normal memory to get the most flexible arrangement.

For embedded applications with embedded OS, each task might have their own stack memory range (see Figure 3.9 in Chapter 3). It is also possible that each task has its own allocated memory space, with each memory space containing a memory layout which consists of stack, heap, and data.

Figure 7.6 An example of common SRAM usage.

7.6. Little Endian and Big Endian Support

The Cortex^®-M0 and Cortex-M0+ processors support either little endian or big endian memory format. The choice is made by the microcontroller vendor when the chip is designed, and cannot be changed by embedded programmers. Software developers must configure their development tools project options to match the endianness of the targeted microcontroller.

The big endian mode supported on the Cortex-M0/M0+ processor is called Byte-Invariant big endian mode, or “BE8” big endian mode. It is one of the big endian modes in ARM architectures. Traditional ARM processors like ARM7TDMI™ use a different big endian mode called Word-Invariant big endian mode, or “BE32.” The difference between the two is on the hardware interface level and does not affect programmer’s view.

Most of the Cortex-M Processor-based microcontrollers are using little endian configuration. With little endian arrangement, the lowest byte of a word-size data is stored in bit 0 to bit 7 (Figure 7.7).

While in big endian configuration, the lowest byte of a word-size data is stored in bit 24 to bit 31 (Figure 7.8).

Both memory configurations support data handling of different sizes. The Cortex-M processors can generate byte, half-word, and word transfers. When the memory is accessed, the memory interface selects the data lanes based on the transfer size and the lowest 2 bits of the address. For little endian systems, the data access can be illustrated by the following diagram (Figure 7.9).

Similarly, a big endian system support data access of different size (Figure 7.10).

Note that there are two exceptions in big endian configurations:

1. the instruction fetch is always in little endian, and

2. the accesses to PPB address space are always in little endian.

7.7. Data Type

The Cortex^®-M processors support different data types by providing various memory access instructions for different transfer sizes, and by providing a 32-bit AHB-Lite interface which supports 32-bit, 16-bit, and 8-bit transfers. For example, in C language development, the following data types are commonly used (Table 7.1).

Figure 7.9 Data access in little endian system.

Figure 7.10 Data access in big endian system.

Table 7.1

Commonly used data types in C language development

Type	Number of bits in ARM^®	Instructions
“char”, “unsigned char”	8	LDRB, LDRSB, STRB
“enum”	8/16/32 (smallest is chosen)	LDRB, LDRH, LDR, STRB, STRH, STR
“short”, “unsigned short”	16	LDRH, LDRSH, STRH
“int”, “unsigned int”	32	LDR, STR
“long”, “unsigned long”	32	LDR, STR

If “stdint.h” in C99 is used, the following commonly used data types are available (Table 7.2).

Table 7.2

Commonly used data types provided in “stdint.h” in C99

Type	Number of bits in ARM^®	Instructions
“int8_t”, “uint8_t”	8	LDRB, LDRSB, STRB
“int16_t”, “uint16_t”	16	LDRH, LDRSH, STRH
“int32_t”, “uint32_t”	32	LDR, STR

For other data type that requires larger size (e.g., int64_t, uint64_t), the C compilers automatically convert the data transfer into multiple memory access instructions.

Note that for peripheral register accesses, the data types being used should match the hardware register sizes. Otherwise the peripheral might ignore the transfer or not functioning as expected. In most cases, peripherals connected to the peripheral bus (APB) should be accessed using word-size transfers. This is due to the fact that APB protocol does not have transfer size signals, hence all the transfers are assumed to be word size. Therefore peripheral registers accessed via the APB are normally declared as “volatile unsigned integer” or “volatile uint32_t” if “stdint.h” is used.

7.8. Memory Attributes and Memory Access Permission

The Cortex^®-M Processors can be used with a wide range of memory systems and devices. In order to make porting of software between different devices easier, a number of memory attribute settings are available for each regions in the memory map. Memory attributes are characteristics of the memory accesses; they can affect data and instruction accesses to memory as well as accesses to peripherals.

In the ARMv6-M architecture, which is used by the Cortex-M0 and Cortex-M0+ processors, a number of memory access attributes are defined for different memory regions (these attributes are also available on ARMv7-M architecture):

Executable—The executable attribute defines whether program execution is allowed in that memory region. If a memory region is defined as nonexecutable, in ARM documentation it is marked as eXecute Never (XN).

Bufferable—When a data write is carried out to a bufferable memory region, the write transfer can be buffered, which means the processor can continue to execute next instruction without waiting for the current write transfer to complete.

Cacheable—If a cache device is present on the system, it can keep a local copy of the data during a data transfer, and reuse it next time the same memory location is accessed to speed up the system. The cache device can be a cache memory unit, or could be a small buffer in a memory controller.

Shareable—The shareable attribute defines whether a memory region can be accessed by more than one processor. If a memory region is shareable, the memory system needs to ensure coherency between memory accesses by multiple processors in this region.

For most users of the Cortex-M0 and Cortex-M0+ processor-based products, only the XN attribute is relevant as it defines which regions can be used for program execution. The other attributes are used only if cache unit or multiple processors are used. Since the Cortex-M0 and Cortex-M0+ processors do not have an internal cache unit, in most cases these memory attributes are not used. If a system level cache is used, or when the memory controller has a build-in cache, then these memory attributes signals exported by the processor via the AHB interface could be used.

Base on the memory attributes, various memory types are architecturally defined, and is used to define what type of devices could be used in each memory region:

Normal memory—Normal memories can be shareable or nonshareable, and can be either cacheable or noncacheable. For memories with cacheable, the caching behavior can be further divided into Write Through (WT) or Write Back Write Allocate (WBWA).

Device memory—Device memories are noncacheable. They can be shareable or nonshareable.

Strongly Ordered (SO) memory—A memory region that is nonbufferable, noncacheable and transfer to/from SO region takes effect immediately. Also, the orders of SO transfers on the memory interface must be identical to the orders of the corresponding memory access instructions (i.e., no access reordering for speed optimization—please note that the Cortex-M0 and Cortex-M0+ processors do not have such access reordering feature anyway). SO memory regions are always shareable in terms of architectural definition.

The memory attribute and memory types for each memory region in the Cortex-M processors are defined in the architecture (Table 7.3), and the attribute for some of the regions can be overridden with configuration settings in the MPU (Memory Protection Unit) if available. During the memory accesses, the memory attributes are exported from the processor to the AHB system, which can be used by a system level cache controller (L2 cache) when applicable.

The PPB memory region is defined as SO. This means the memory region is nonbufferable and noncacheable. In the Cortex-M0 and Cortex-M0+ processors, operations following an access to SO region are not started until the access is completed. This behavior is important for changing registers in the SCS, where we often expected the operations of changing a control register should take place immediately before next instruction is executed. Please note that memory attributes and permissions for SCS cannot be changed by MPU.

Table 7.3

Default memory attribute map defined by the architecture

Address	Region	Memory type	Cache	XN	Shareable	Descriptions
0x00000000–0x1FFFFFFF	CODE	Normal	WT	–	–	Memory for program code including vector table
0x20000000–0x3FFFFFFF	SRAM	Normal	WBWA	–	–	SRAM, typically used for data and stack memory
0x40000000–0x5FFFFFFF	Peripheral	Device	–	XN	–	Typically used for on-chip devices
0x60000000–0x7FFFFFFF	RAM	Normal	WBWA	–	–	Normal memory with Write Back, Write Allocate cache attributes
0x80000000–0x9FFFFFFF	RAM	Normal	WT	–	–	Normal memory with Write Through cache attributes
0xA0000000–0xBFFFFFFF	Device	Device	–	XN	S	Shareable device memory
0xC0000000–0xDFFFFFFF	Device	Device	–	XN	–	Nonshareable device memory
0xE0000000–0xE00FFFFF	PPB	Strongly ordered	–	XN	S	Internal Private Peripheral Bus
0xE0100000–0xFFFFFFFF	Reserved	Reserved	–	–	–	Reserved (Vendor-specific usage)

In some other ARM processors like the Cortex-M3 processor, there can also be default memory access permission for each region. Since the Cortex-M0 processor does not have separated privileged and nonprivileged (user) access level, the processor is in privilege access level all the time and therefore does not have a memory map for default memory access permission. The Cortex-M0+ processor, however, has the optional unprivileged execution level and therefore has the default access permission as shown in Table 7.4.

In practice, most of the memory attributes and memory type definitions are unimportant (apart from the XN attribute and access permissions) to users of Cortex-M0 and Cortex-M0+ microcontrollers. However, if the software code has to be reused on high-end processors, especially on systems with multiple processors and cache memories, these details can be important.

Table 7.4

Memory access permission

Memory region	Default permission	Note
CODE, SRAM, Peripheral, RAM, Device	Accessible for both privileged and unprivileged code.	Access permission can be overridden by MPU configurations
System Control Space including NVIC, MPU, SysTick	Accessible for privileged code only. Attempts to access these registers from unprivileged code result in HardFault exception.	Cannot be overridden by MPU configurations

7.9. Effect of Hardware Behavior to Programming

The design of the processor hardware and the behavior of the bus protocol affect the software in a number of ways. In previous section we have already mentioned that peripherals connected to the APB are usually accessed using word-size transfers due to the nature of the APB protocol. In this section we will look into other aspects.

7.9.1. Data Alignment

The Thumb^® instruction set supported by the Cortex^®-M0 and Cortex-M0+ processors can only generate aligned transfers. It means that the transfer address must be a multiple of the transfer size. For example, a word-size (32-bit) transfer can only access addresses like 0x0, 0x4, 0x8, 0xC, etc. Similarly, a half-word transfer can only access addresses like 0x0, 0x2, 0x4, etc. All byte data accesses are aligned. Examples of aligned and unaligned data accesses are shown in Figure 7.11.

Figure 7.11 Examples of aligned and unaligned transfers (for little endian memory configuration).

If the program executed attempts to generate an unaligned transfer, this will result in a fault exception and cause the HardFault handler to be executed. In normal cases, C compilers do not generate any unaligned transfers, but an unaligned transfer can still be generated if a C program directly manipulated a pointer (example in Appendix G, Section G.6.3).

Unaligned transfers can also be generated accidentally when programming in assembly language, for example, when load/store instructions of wrong transfer size is used. In the case of a half-word data located in address 0x1002, which is an aligned data, it can be accessed using LDRH, LDRSH, or STRH instructions without problems. But if the program code used LDR or STR instruction to access this data, an unaligned access fault would be triggered.

7.9.2. Access to Invalid Addresses

Unlike most 8-bit or 16-bit processors, a memory access to an invalid memory address generates a fault exception on ARM^® Cortex-M-based microcontrollers. This provides better program error detection and allows software bugs to be detected earlier.

In an AHB system connected to a Cortex-M processor, the address decoding logic detects the address being accessed and the bus system response with an error signal if the access is going to an invalid location. The bus error can be caused by either data accesses or instruction fetches. When the processor detects the error response, it can trigger a HardFault exception to handle the error.

One exception to this behavior is the branch shadows for instruction fetch. Due to the pipeline nature of the Cortex-M processors, instructions are fetched in advance. Therefore if the program execution reaches the end of a valid memory region and a branch is executed, there might be chances that the addresses beyond the valid instruction memory region could have been fetched and result in a bus error response in the AHB system. However, in this case the bus fault would be ignored if the faulted instruction is not executed due to the branch.

7.9.3. Use of Multiple Load and Store Instructions

The multiple load and store instructions in the Cortex-M processor can greatly increase the system performance when used correctly. For example, it can be used to speed up data transfer processes or can be used as a way to adjust memory pointer automatically.

However, when handling peripheral accesses, typical use of LDM or STM instructions should be avoided. If the Cortex-M0 or Cortex-M0+ processor received an interrupt request during the execution of LDM or STM instruction, the LDM or STM instruction will be abandoned and the interrupt service will start. At the end of the interrupt service, the program execution will return to the interrupted LDM or STM instruction and restart again from the first transfer of the interrupted LDM or STM.

As a result of this restart behavior, some of the transfers in this interrupted LDM or STM instruction could be carried out twice. It is not a problem for normal memory devices. However, if the access is carried on a peripheral, then the repeating of the transfer could cause error. For example, if the LDM instruction is used for reading a data in a FIFO (First-In-First-Out) buffer, then some of the data in the FIFO could be lost as the read operation is repeated.

As a precaution, we should avoid the use of LDM or STM instruction on peripheral accesses unless we are sure that the restart behavior does not cause incorrect operation to the peripheral.

7.9.4. Wait States

Some of the memory accesses might take several clock cycles to complete. For example, the flash memory used in a low power microcontroller might have a maximum access speed of just around 20 MHz while the microcontroller can run at over 40 MHz. When this happens, the flash memory interface would need to insert wait states to the bus system so that the processor will wait for the transfer to complete.

The wait states can affect the systems in a number of ways:

• The performance of the system is reduced.

• The energy efficiency of the system can be reduced because the performance is reduced.

Figure 7.12 Performance of an example system based on the Cortex^®-M0 processor with various wait states for flash memory.

• The interrupt latency of the system increases.

• The system behavior is less deterministic in terms of program execution timing.

For example, assume the flash memory system of an MCU with Cortex-M0 processor has an access speed of 50 ns (20 MHz), the performance curve of the device could look like the one shown in Figure 7.12.

As you can see from Figure 7.12, the performance is not linear because the flash memory access speed could limit the maximum performance. In order to solve this problem, many microcontroller vendors introduce flash prefetch hardware in the design so that multiple words of instructions are fetched from the flash memory each time, and when the processor is still consuming the instructions in the buffer, the next set of instruction fetches can start. This technique reduces the performance drop when the frequency increases. For example, Figure 7.13 shows the improvement with a simple prefetch logic design.

Figure 7.13 Performance comparison for a simple MCU with flash prefetch logic and without prefetch logic.

Further performance improvement is possible with more complex designs or with a system level cache.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 7. Memory System

Create new playlist

Sign In

Sign Up