CHAPTER 16 Debugging Components

Introduction

The Cortex-M3 processor comes with a number of debugging components used to provide debugging features such as breakpoint, watchpoint, Flash patch, and trace. If you are an application developer, there might be a chance that you’ll never need to know the details about these debugging components, because they are normally used only by debugger tools. This chapter will introduce you to the basics of each debug component. If you want to know details about things such as the actual programmer’s model, refer to the Cortex-M3 Technical Reference Manual (Ref 1).

All the debug trace components, as well as the FPB, can be programmed via the Cortex-M3 Private Peripheral Bus (PPB). In most cases, the components will only be programmed by the debugging host. It is not recommended for applications to try accessing the debug components (except stimulus port registers in the ITM), because this could interfere with the debugger’s operation.

The Trace System in the Cortex-M3

The Cortex-M3 trace system is based on the CoreSight architecture. Trace results are generated in the form of packets, which can be of various lengths (in terms of number of bytes). The trace components transfer the packets using Advanced Trace Bus (ATB) to the Trace Port Interface Unit (TPIU), which formats the packets into Trace Interface Protocol. The data are then captured by an external trace capture device such as a Trace Port Analyzer (TPA).

image

Figure 16.1 The Cortex-M3 Trace System

There are up to three trace sources in a standard Cortex-M3 processor: ETM, ITM, and DWT. Note that the ETM in the Cortex-M3 is optional, so some Cortex-M3 products do not have instruction trace capability. During operation, each trace source is assigned a 7-bit ID value (ATID), which is transferred along with the trace packets during merging on the ATB so that the packets can be separated back into multiple trace streams when they reach the debug host.

Unlike most standard CoreSight components, the debug components in the Cortex-M3 processor include the logic for merging ATB streams. In standard CoreSight systems the ATB packet merger, called the ATB funnel, is a separate block.

Before using the trace system, the Trace Enable (TRCENA) bit in the Debug Exceptions and Monitor Control Register (DEMCR) must be set to 1 (see Table 15.2 or D.37). Otherwise the trace system will be disabled. In normal operations that do not require tracing, clearing the TRCENA bit can disable some of the trace logic and reduce power consumption.
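As a minimal sketch of this step in C, the function below sets or clears TRCENA through a pointer, so the same code can be exercised off-target. The DEMCR address (0xE000EDFC) and the TRCENA bit position (bit 24) are taken from the ARMv7-M register map rather than from this chapter, and the macro and function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed from the ARMv7-M register map (not stated in this chapter). */
#define DEMCR_ADDR   0xE000EDFCu
#define DEMCR_TRCENA (1u << 24)

/* Set or clear TRCENA without disturbing the other DEMCR bits. */
static inline void trace_set_enabled(volatile uint32_t *demcr, bool enable)
{
    if (enable) {
        *demcr |= DEMCR_TRCENA;
    } else {
        *demcr &= ~DEMCR_TRCENA;
    }
}
```

On a target, the call would look like `trace_set_enabled((volatile uint32_t *)DEMCR_ADDR, true);`. In practice the debugger normally performs this write.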

Trace Components: Data Watchpoint and Trace

The DWT has a number of debugging functionalities:

  1. It has four comparators, each of which can be configured as follows:
    • Hardware watchpoint (generates a watchpoint event to processor to invoke debug modes such as halt or debug monitor)
    • ETM trigger (causes the ETM to emit a trigger packet in the instruction trace stream)
    • PC sampler event trigger
    • Data address sampler trigger
    • The first comparator can also be used to compare against the clock cycle counter (CYCCNT) instead of comparing to a data address
  2. Counters for counting the following:
    • Clock cycles (CYCCNT)
    • Folded instructions
    • Load Store Unit (LSU) operations
    • Sleep cycles
    • Cycles per instruction (CPI)
    • Interrupt overhead
  3. PC sampling at regular intervals
  4. Interrupt events trace

When used as a hardware watchpoint or ETM trigger, the comparator can be programmed to compare either data addresses or program counter values. When programmed for any other function, it compares data addresses.

Each of the comparators has three corresponding registers:

  • COMP (compare) register
  • MASK register
  • FUNCTION control register

The COMP register is a 32-bit register holding the value that the data address (or program counter value, or CYCCNT) is compared against. The MASK register determines which low-order bits of the data address, if any, are ignored during the compare (see Table 16.1).

Table 16.1 Encoding of the DWT Mask Registers

MASK    Ignore Bit
0       All bits are compared
1       Ignore bit [0]
2       Ignore bit [1:0]
3       Ignore bit [2:0]
15      Ignore bit [14:0]
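The effect of the MASK value can be modeled in a few lines of C. This is a host-side sketch of the comparison rule in Table 16.1, not a description of the hardware implementation. The function name is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when addr matches comp with the lowest `mask` bits ignored,
 * following Table 16.1 (mask 0 compares all bits, mask 15 ignores [14:0]). */
static inline bool dwt_addr_matches(uint32_t addr, uint32_t comp, unsigned mask)
{
    /* Guard the shift so an out-of-range mask cannot cause undefined behavior. */
    uint32_t ignore = (mask >= 32u) ? 0xFFFFFFFFu : ((1u << mask) - 1u);
    return (addr & ~ignore) == (comp & ~ignore);
}
```

For example, with MASK set to 3 a comparator programmed with 0x20000000 matches any access in the range 0x20000000 to 0x20000007.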

The comparator’s FUNCTION register determines its function. To avoid unexpected behavior, the MASK register and the COMP register should be programmed before this register is set. If the comparator’s function is to be changed, you should disable the comparator by setting FUNCTION to 0 (disable), then program the MASK and COMP registers, and then enable the FUNCTION register in the last step.
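The programming order described above can be sketched as follows. The struct is a hypothetical view of one comparator's three registers, which the Cortex-M3 Technical Reference Manual places at consecutive word addresses (COMP0 at 0xE0001020). The encoding of the FUNCTION value is left to the caller because it is not listed in this chapter.

```c
#include <stdint.h>

/* Illustrative layout of one DWT comparator's register triplet. */
typedef struct {
    volatile uint32_t comp;      /* value to compare against */
    volatile uint32_t mask;      /* number of low address bits to ignore */
    volatile uint32_t function;  /* 0 = disabled; other values per the TRM */
} dwt_comparator_t;

/* Reprogram a comparator in the safe order: disable, set up, enable. */
static inline void dwt_comparator_program(dwt_comparator_t *c, uint32_t comp,
                                          uint32_t mask, uint32_t function)
{
    c->function = 0u;        /* 1. disable the comparator first */
    c->mask     = mask;      /* 2. program MASK ... */
    c->comp     = comp;      /*    ... and COMP while it is disabled */
    c->function = function;  /* 3. enable the new function last */
}
```

Writing FUNCTION last means the comparator never runs with a half-updated COMP and MASK pair.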

The DWT counters are typically used for profiling application code. They can be programmed to emit events (in the form of trace packets) when a counter overflows. One typical application is to use the CYCCNT register to count the number of clock cycles required for a specific task, for benchmarking purposes.
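A common way to use CYCCNT for benchmarking is to read it before and after the task and subtract. The helper below is a small sketch of that subtraction. It relies only on CYCCNT being a free-running 32-bit counter, so the result is correct across one wraparound as long as the task takes fewer than 2^32 cycles. The register addresses in the comment are from the Cortex-M3 Technical Reference Manual, not from this chapter.

```c
#include <stdint.h>

/* On target, the two samples would come from DWT_CYCCNT (0xE0001004, per the
 * TRM) after setting CYCCNTENA, bit 0 of DWT_CTRL (0xE0001000):
 *
 *     uint32_t start = *(volatile uint32_t *)0xE0001004u;
 *     task();
 *     uint32_t end = *(volatile uint32_t *)0xE0001004u;
 */
static inline uint32_t cyccnt_elapsed(uint32_t start, uint32_t end)
{
    return end - start;  /* unsigned subtraction wraps modulo 2^32 */
}
```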

The TRCENA bit in the DEMCR must be set to 1 before the DWT is used. If the DWT is being used to generate a trace, the DWTEN bit in the ITM Control register should also be enabled.

Trace Components: Instrumentation Trace Macrocell

The ITM has the following functionalities:

  • Software can directly write console messages to ITM stimulus ports and output them as trace data.
  • The DWT can generate trace packets and output them via the ITM.
  • The ITM can generate timestamp packets that are inserted into a trace stream to help the debugger find out the timing of events.

Since the ITM uses a trace port to output data, if the microcontroller or SoC does not have TPIU support, the traced information cannot be output. Therefore, it is necessary to check whether the microcontroller or SoC has all the required features before you use the ITM. In the worst case, if these features are not available you can still use the NVIC debug register or a UART to output console messages.

To use the ITM, the TRCENA bit in the DEMCR must be set to 1. Otherwise the ITM will be disabled and ITM registers cannot be accessed.

In addition, there is also a lock register in the ITM. You need to write the access key 0xC5ACCE55 (CoreSight ACCESS) to this register before programming the ITM. Otherwise, all write operations to the ITM will be ignored.

Finally, the ITM itself has a Control register that enables its individual features. The Control register also contains the ATID field, which is an ID value for the ITM in the ATB. This ID value must be different from the IDs of the other trace sources so that the debug host receiving the trace packets can separate the ITM’s trace packets from other trace packets.
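The unlock and Control register steps can be sketched in C as shown below. The bit positions used here (ITM enable at bit 0, timestamp enable at bit 1, DWTEN at bit 3, and ATID in bits [22:16]) are assumptions taken from the Cortex-M3 Technical Reference Manual and should be checked against the manual for your device. The function names are illustrative, and a debugger normally does this work.

```c
#include <stdbool.h>
#include <stdint.h>

#define ITM_LOCK_KEY       0xC5ACCE55u    /* access key from the text */
#define ITM_TCR_ITMENA     (1u << 0)      /* assumed bit positions, per TRM */
#define ITM_TCR_TSENA      (1u << 1)
#define ITM_TCR_DWTEN      (1u << 3)
#define ITM_TCR_ATID_SHIFT 16
#define ITM_TCR_ATID_MASK  0x7Fu          /* ATID is a 7-bit field */

/* Build a Control register value with the ITM enabled. */
static inline uint32_t itm_tcr_value(uint8_t atid, bool dwt, bool timestamps)
{
    uint32_t v = ITM_TCR_ITMENA;
    v |= ((uint32_t)atid & ITM_TCR_ATID_MASK) << ITM_TCR_ATID_SHIFT;
    if (dwt)        v |= ITM_TCR_DWTEN;
    if (timestamps) v |= ITM_TCR_TSENA;
    return v;
}

/* Unlock the ITM first; writes to its registers are ignored otherwise. */
static inline void itm_configure(volatile uint32_t *lock_access,
                                 volatile uint32_t *control,
                                 uint8_t atid, bool dwt, bool timestamps)
{
    *lock_access = ITM_LOCK_KEY;
    *control = itm_tcr_value(atid, dwt, timestamps);
}
```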

Software Trace with the ITM

One of the main uses of the ITM is to support debug message output (such as printf). The ITM contains 32 stimulus ports, allowing different software processes to output to different ports, and the messages can be separated later at the debug host. Each port can be enabled or disabled by the Trace Enable register and can be programmed (in groups of eight ports) to allow or disallow user processes to write to it.

Unlike UART-based text output, using the ITM to output does not cause much delay for the application. A FIFO buffer is used inside the ITM, so writing output messages can be buffered. However, it is still necessary to check whether the FIFO is full before you write to it.
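A non-blocking write to a stimulus port might look like the sketch below. It assumes, per the Cortex-M3 Technical Reference Manual, that reading a stimulus port returns 1 in bit 0 when the FIFO can accept data, and that the ports form an array of 32 words (starting at 0xE0000000 on the device). The function name and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Try to write one word to an ITM stimulus port.
 * ports        : base of the 32 stimulus port registers
 * trace_enable : current value of the ITM Trace Enable register
 * Returns false if the port is disabled or the FIFO is full. */
static inline bool itm_try_put(volatile uint32_t *ports, uint32_t trace_enable,
                               unsigned port, uint32_t word)
{
    if (port >= 32u || (trace_enable & (1u << port)) == 0u) {
        return false;               /* port out of range or not enabled */
    }
    if ((ports[port] & 1u) == 0u) {
        return false;               /* FIFO full, caller may retry later */
    }
    ports[port] = word;             /* a word write emits a 32-bit packet */
    return true;
}
```

Returning instead of spinning lets the caller decide whether to retry or drop the message. The stimulus ports also accept byte and halfword writes, which produce shorter packets for character output.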

The output messages can be collected at the trace port interface or the Serial-Wire Viewer (SWV) interface on the TPIU. There is no need to remove code that generates the debug messages from the final code, because if the TRCENA control bit is low, the ITM will be inactive and debug messages will not be output. You can also switch on the output messages in a “live” system and use the Trace Enable register in the ITM to limit which ports are enabled so that only some of the messages are output.

Hardware Trace with ITM and DWT

The ITM is also used to output hardware trace packets. The packets are generated by the DWT, and the ITM acts as a trace packet merging unit. To use DWT trace, you need to enable the DWTEN bit in the ITM Control register; the rest of the DWT trace settings still need to be programmed in the DWT.

image

Figure 16.2 Merging of Trace Packets on the ITM and TPIU

ITM Timestamp

The ITM has a timestamp feature that lets trace capture tools recover timing information: the ITM inserts delta timestamp packets into the trace when a new trace packet enters its FIFO. A timestamp packet is also generated when the timestamp counter overflows.

The timestamp packets provide the time difference (delta) from the previous event. Using the delta timestamp packets, the trace capture tools can establish when each packet was generated and hence reconstruct the timing of the various debug events.
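On the host side, reconstructing absolute times from deltas is a running sum. The sketch below shows only that arithmetic. It is not a decoder for the actual packet format, and the function name is illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert delta timestamps into absolute times.
 * out[i] is the sum of deltas[0..i], in the same units as the deltas. */
static inline void timestamps_accumulate(const uint32_t *deltas,
                                         uint64_t *out, size_t n)
{
    uint64_t now = 0u;  /* 64-bit accumulator so long captures do not wrap */
    for (size_t i = 0; i < n; i++) {
        now += deltas[i];
        out[i] = now;
    }
}
```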

Trace Components: Embedded Trace Macrocell

The ETM block is used for providing instruction traces. It is optional and might not be available on some Cortex-M3 products. When it is enabled and when trace operation starts, it generates instruction trace packets. A FIFO buffer is provided in the ETM to allow enough time for the trace stream to be captured.

To reduce the amount of data generated by the ETM, it does not always output exactly what address the processor has reached/executed. It usually outputs information about program flow and outputs full addresses only if needed (e.g., if a branch has taken place). Since the debugging host should have a copy of the binary image, it can then reconstruct the instruction sequence the processor has carried out.

The ETM also interacts with other debugging components such as the DWT. The comparators in the DWT can be used to generate trigger events in the ETM or to control the trace start/stop.

Unlike the ETM in traditional ARM processors, the Cortex-M3 ETM does not have its own address comparators, because the DWT can carry out the comparison for ETM. Furthermore, since the data trace functionality is carried out by the DWT, the ETM design in the Cortex-M3 is quite different from traditional ETM for other ARM cores.

To use the ETM in the Cortex-M3, the following setup is required (handled by debug tools):

  1. The TRCENA bit in the Debug Exceptions and Monitor Control Register (DEMCR) must be set to 1 (see Table 15.2 or D.37).
  2. The ETM needs to be unlocked so that its control registers can be programmed. This can be done by writing the value 0xC5ACCE55 to the ETM LOCK_ACCESS register.
  3. The ATB ID register (ATID) should be programmed to a unique value so that the trace packet output via the TPIU can be separated from packets from other trace sources.
  4. The NIDEN input signal of the ETM must be set to high. The implementation of this signal is device specific. Refer to the datasheet from your chip’s manufacturer for details.
  5. Program the ETM control registers for trace generation.

Trace Components: Trace Port Interface Unit

The TPIU is used to output trace packets from the ITM, DWT, and ETM to the external capture device (for example, a trace port analyzer). The Cortex-M3 TPIU supports two output modes:

  • Clocked mode, using up to 4-bit parallel data output ports
  • Serial-Wire Viewer (SWV) mode, using single-bit SWV output (see footnote 1)

In clocked mode, the actual number of bits being used on the data output port can be programmed to different sizes. This will depend on the chip package as well as the number of signal pins available for trace output in the application. The maximum trace port size supported by the chip can be determined from one of the registers in the TPIU. In addition, the speed of trace data output can also be programmed.

In SWV mode, the SWV protocol is used. This reduces the number of output signals, but the maximum bandwidth for trace output will also be reduced.

To use the TPIU, the TRCENA bit in the DEMCR must be set to 1, and the protocol (mode) selection register and trace port size control registers need to be programmed by the trace capture software.
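Two of the values that trace capture software has to compute can be sketched as pure functions. Both rules are assumptions taken from the Cortex-M3 Technical Reference Manual rather than from this chapter: the port size register uses a one-hot encoding (bit n-1 selects an n-bit port), and the asynchronous clock prescaler divides the reference clock by the 13-bit register value plus one. The function names are hypothetical.

```c
#include <stdint.h>

/* One-hot port size encoding: bit (width - 1) set selects a width-bit port. */
static inline uint32_t tpiu_port_size_value(unsigned width_bits)
{
    if (width_bits < 1u || width_bits > 32u) {
        return 0u;  /* invalid width */
    }
    return 1u << (width_bits - 1u);
}

/* Prescaler value for a requested output rate: rate = clk / (value + 1).
 * Integer division truncates, so choose a rate that divides the clock
 * evenly. The result is clamped to the 13-bit field. */
static inline uint32_t tpiu_prescaler_for(uint32_t clk_hz, uint32_t rate_hz)
{
    if (rate_hz == 0u || rate_hz > clk_hz) {
        return 0u;  /* cannot run faster than the reference clock */
    }
    uint32_t value = clk_hz / rate_hz - 1u;
    return (value > 0x1FFFu) ? 0x1FFFu : value;
}
```

For instance, a 72 MHz reference clock and a 2 Mbit/s SWV output give a prescaler value of 35.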

The Flash Patch and Breakpoint Unit

The FPB has two functions:

  • Hardware breakpoint (generates a breakpoint event to the processor to invoke debug modes such as halt or debug monitor)
  • Patch instruction or literal data from Code memory space to SRAM

The FPB contains eight comparators:

  • Six instruction comparators
  • Two literal comparators

What Are Literal Loads?

When we program in assembly language, we very often need to load immediate data values into a register. When the immediate value is large, the operation cannot fit into a single instruction. For example:

image

Since no instruction has an immediate value space of 32 bits, we need to put the immediate data in a different memory space, usually after the program code region, and then use a PC relative load instruction to read the immediate data into the register. So what we get in the compiled binary code will be something like this:

image

or with Thumb-2 instructions:

image

Since we are likely to use more than one literal value in our code, the assembler or compiler will usually generate a block of literal data, commonly called a literal pool.

In the Cortex-M3, literal loads are data read operations carried out on a data bus (the D-Code bus or the System bus, depending on the memory location).

The FPB has a Flash Patch control register that contains an enable bit to enable the FPB. In addition, each comparator comes with a separate enable bit in its comparator control register. Both of the enable bits must be set to 1 for a comparator to operate.

The comparators can be programmed to remap addresses from Code space to the SRAM memory region. When this function is used, the REMAP register needs to be programmed to provide the base address of the remapped contents. The upper three bits of the REMAP register (bit[31:29]) are hardwired to 3’b001, which limits the remap base address to the range 0x20000000 to 0x3FFFFF80, so it is always within the SRAM memory region.

When the instruction address or the literal address hits the address defined by the comparator, the read access is remapped to the table pointed to by the REMAP register.
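The address arithmetic behind the remap can be sketched as below. The first function checks the hardwired upper bits described above. The second assumes, per the Cortex-M3 Technical Reference Manual, that the remap table holds one word per comparator in comparator order, so comparator n reads its replacement from the REMAP base plus 4n. Both function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* The REMAP base must have bit[31:29] equal to 3'b001 (SRAM region). */
static inline bool fpb_remap_base_valid(uint32_t base)
{
    return (base >> 29) == 0x1u;
}

/* Address of the replacement word for a given comparator number,
 * assuming one word per comparator in the remap table. */
static inline uint32_t fpb_remap_slot(uint32_t base, unsigned comparator)
{
    return base + 4u * comparator;
}
```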

Using the remap function, it is possible to create some “what if” test cases in which the original instruction or a literal value is replaced by a different one, even if the program code is in ROM or Flash memory. An example use is to allow execution of a program or subroutine in the SRAM region by patching program ROM in the Code region so that a branch to the test program or subroutine can take place. This makes it possible to debug a ROM-based device.

image

Figure 16.3 Flash Patch: Remap of Instructions and Literal Read

Alternatively, the six instruction address comparators can be used to generate breakpoints as well as to invoke halt mode debug or debug monitor exceptions.
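As a rough sketch of how a debugger might encode a breakpoint, the function below builds a comparator register value for a Thumb instruction address. The field layout is an assumption from the Cortex-M3 Technical Reference Manual and is not given in this chapter: the enable bit is bit 0, the word address is held in bits [28:2], and bits [31:30] select whether the breakpoint applies to the lower or upper halfword of that word. The names are hypothetical.

```c
#include <stdint.h>

#define FPB_COMP_ENABLE        (1u << 0)
#define FPB_COMP_ADDR_MASK     0x1FFFFFFCu  /* bits [28:2], Code region only */
#define FPB_COMP_REPLACE_LOWER (1u << 30)   /* breakpoint on lower halfword */
#define FPB_COMP_REPLACE_UPPER (2u << 30)   /* breakpoint on upper halfword */

/* Build a comparator value for a breakpoint at a halfword-aligned address.
 * Returns 0 (comparator disabled) if the address is outside Code space. */
static inline uint32_t fpb_comp_breakpoint(uint32_t addr)
{
    if (addr >= 0x20000000u) {
        return 0u;  /* the FPB only compares addresses in the Code region */
    }
    uint32_t replace = (addr & 2u) ? FPB_COMP_REPLACE_UPPER
                                   : FPB_COMP_REPLACE_LOWER;
    return replace | (addr & FPB_COMP_ADDR_MASK) | FPB_COMP_ENABLE;
}
```

The FPB's global enable bit must also be set before the comparator takes effect, as described earlier in this section.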

The AHB Access Port

The AHB-AP is a bridge between the debug interface module (SWJ-DP or SW-DP) and the Cortex-M3 memory system. For the most basic data transfers between the debug host and the Cortex-M3 system, three registers in the AHB-AP are used:

  • Control and Status Word (CSW)
  • Transfer Address Register (TAR)
  • Data Read/Write (DRW)
image

Figure 16.4 Connection of the AHB-AP in the Cortex-M3

The CSW register can control the transfer direction (read/write), transfer size, transfer types, and so on. The TAR register is used to specify the transfer address, and the DRW register is used to carry out the data transfer operation (transfer starts when this register is accessed).

The data register DRW represents exactly what is shown on the bus. For half word and byte transfers, the required data will have to be manually shifted to the correct byte lane by debugger software. For example, if you want to carry out a data transfer of half word size to address 0x1002, you need to have the data on bit [31:16] of the DRW register. The AHB-AP can generate unaligned transfers, but it does not rotate the result data based on address offset. So the debugger software will have to either rotate the data manually or split an unaligned data access into several accesses if needed.
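The byte lane shifting that debugger software has to perform can be sketched as follows. The helpers assume naturally aligned byte, halfword, or word transfers and a little-endian memory system. Unaligned accesses would need the extra rotation or splitting mentioned above. The function names are illustrative.

```c
#include <stdint.h>

/* Mask covering `size_bytes` bytes (1, 2, or 4). */
static inline uint32_t ahbap_lane_mask(unsigned size_bytes)
{
    return (size_bytes >= 4u) ? 0xFFFFFFFFu
                              : ((1u << (size_bytes * 8u)) - 1u);
}

/* Place write data on the byte lanes selected by the low address bits. */
static inline uint32_t ahbap_drw_from_data(uint32_t data, uint32_t addr,
                                           unsigned size_bytes)
{
    unsigned shift = (addr & 3u) * 8u;
    return (data & ahbap_lane_mask(size_bytes)) << shift;
}

/* Extract read data from the byte lanes selected by the low address bits. */
static inline uint32_t ahbap_data_from_drw(uint32_t drw, uint32_t addr,
                                           unsigned size_bytes)
{
    unsigned shift = (addr & 3u) * 8u;
    return (drw >> shift) & ahbap_lane_mask(size_bytes);
}
```

With the example from the text, a halfword 0xBEEF written to address 0x1002 is presented to DRW as 0xBEEF0000.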

Other registers in the AHB-AP provide additional features. For example, the AHB-AP provides four banked registers and an automatic address increment function so that accesses to nearby memory locations or sequential transfers can be sped up.

In the CSW register, there is one bit called MasterType. This is normally set to 1 so that hardware receiving the transfer from the AHB-AP knows that it is from the debugger. However, the debugger can pretend to be the core by clearing this bit. In this case the device attached to the AHB system should behave as though it is being accessed by the processor. This is useful for testing peripherals with FIFOs, which can behave differently when accessed by the debugger.

ROM Table

The ROM table is used to allow auto detection of debug components inside a Cortex-M3 chip. The Cortex-M3 processor is the first product based on the ARMv7-M architecture. It has a defined memory map and includes a number of debug components. However, in newer Cortex-M devices, or if the chip designers have modified the default debug components, the memory map for the debug devices could be different. To allow debug tools to detect the components in the debug system, a ROM table is included; it provides information on the NVIC and debug block addresses.

The ROM table is located at address 0xE00FF000. Using the contents of the ROM table, the memory locations of system and debug components can be calculated. The debug tool can then check the ID registers of the discovered components and determine what is available on the system.

For the Cortex-M3, the first entry in the ROM table (0xE00FF000) should contain the offset to the NVIC memory location. (The default value in the ROM table’s first entry is 0xFFF0F003; bit[1:0] means that the device exists and there is another entry in the ROM table following. The NVIC address can be calculated by adding the offset to the ROM table base address and discarding the carry out of bit 31: 0xE00FF000 + 0xFFF0F000 = 0xE000E000.)

The default ROM table for the Cortex-M3 is shown in Table 16.2. However, since chip manufacturers can add, remove, or replace some of the optional debug components with other CoreSight debug components, the value you find on your Cortex-M3 device could be different.

Table 16.2 Cortex-M3 Default ROM Table Values

image image

The lowest two bits (LSB) of the value indicate whether the device exists. In normal cases, the NVIC, DWT, and FPB should always be there, so the last two bits are always 1. However, the TPIU and the ETM could be taken out by the chip manufacturer and might be replaced with other debugging components from the CoreSight product family.

The upper part of the value indicates the address offset from the ROM table base address. For example:

image
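The same calculation can be written as a short C sketch. It treats bit 0 of an entry as the "present" flag and the upper 20 bits as a two's complement offset from the ROM table base, which is the CoreSight ROM table entry convention. The function names are illustrative, and the entry values in the usage note are examples only.

```c
#include <stdbool.h>
#include <stdint.h>

#define ROM_TABLE_BASE 0xE00FF000u  /* ROM table address given in the text */

/* Bit 0 of a ROM table entry indicates that the component is present. */
static inline bool rom_entry_present(uint32_t entry)
{
    return (entry & 1u) != 0u;
}

/* Component base address: table base plus the offset in bits [31:12].
 * The 32-bit addition wraps, which is how negative offsets are expressed. */
static inline uint32_t rom_entry_address(uint32_t table_base, uint32_t entry)
{
    return table_base + (entry & 0xFFFFF000u);
}
```

For the default first entry, `rom_entry_address(ROM_TABLE_BASE, 0xFFF0F003u)` evaluates to 0xE000E000, the NVIC location quoted earlier.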

For debug tool development, it is necessary to determine the address of debug components from the ROM table. Some Cortex-M3 devices might have a different setup of the debug component connection that can result in different base addresses. By calculating the correct device address from this ROM table, the debugger can determine the base address of the provided debug component, and then from the component ID of those components the debugger can determine the type of debug components that are available.

Footnote 1: Not available on early versions of Cortex-M3 products that are based on Cortex-M3 revision 0.
