Digging deeper – SystemView data breakpoints

So far, we've determined where our processor is stuck, but we haven't uncovered anything to help us determine what needs to be changed to get the system operational again. Here are the steps we need to take to uncover the root cause of the issue:

  1. Let's take a look at the assertion again. Our goal is to work out exactly why it is failing. This is the line in question:
configASSERT( ( portAIRCR_REG & portPRIORITY_GROUP_MASK ) <= ulMaxPRIGROUPValue );
  2. Using SystemView's memory viewer, analyze the value of portAIRCR_REG in port.c:

  3. Since this is a hardcoded memory location, we can use Set Data Breakpoint, which will pause execution each time the memory location is written. This can be a quick way to track down every place a variable is accessed, without searching through the code:

  4. Upon restarting the MCU, the write breakpoint is immediately hit. Although the program counter is pointing to HAL_InitTick, the actual data write to the 0xE000ED0C address was done in the previous function, that is, HAL_NVIC_SetPriorityGrouping. This is exactly what we expect since the assert is related to interrupt priority groups:

  5. Some quick searching through the code for NVIC_PRIORITYGROUP_4 reveals the following comment in stm32f7xx_hal_cortex.c:
* @arg NVIC_PRIORITYGROUP_4: 4 bits for preemption priority
* 0 bits for subpriority
Priority grouping: The interrupt controller (NVIC) allows the bits that define each interrupt's priority to be split between bits that define the interrupt's preemption priority bits, as well as the bits that define the interrupt's sub-priority. For simplicity, all bits must be defined to be preemption priority bits. The following assertion will fail if this is not the case (if some bits represent a sub-priority).

Based on this information, there should be 0 bits for the subpriority. So, why was the value of the priority bits in portAIRCR_REG non-zero? 

From the ARM® Cortex® -M7 Devices Generic User Guide, we can see that to achieve 0 bits of subpriority, the value of the AIRCR register masked with 0x00000700 must read as 0 (it had a value of 3 when we looked at the value in memory):


Here is the explanation for PRIGROUP in the same manual. Notice that PRIGROUP must be set to 0b000 for 0 subpriority bits:
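As a rough sketch of what those register fields mean in code, the helpers below isolate PRIGROUP from an AIRCR value and evaluate the same comparison that the assertion makes. The function names are hypothetical and exist only for illustration; the mask restates the 0x00000700 value quoted above, and the shift of 8 follows from the position of that mask. The limit is passed in as a parameter, so nothing here assumes what ulMaxPRIGROUPValue actually contained on the target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PRIGROUP occupies bits [10:8] of AIRCR, hence the 0x00000700 mask. */
#define PRIORITY_GROUP_MASK  ( 0x07UL << 8UL )
#define PRIGROUP_SHIFT       ( 8UL )

/* Hypothetical helper: return the PRIGROUP field (0..7) of an AIRCR value. */
static uint32_t prigroup_field( uint32_t aircr )
{
    return ( aircr & PRIORITY_GROUP_MASK ) >> PRIGROUP_SHIFT;
}

/* Hypothetical helper: same comparison as the configASSERT() expression,
 * with the register value and the limit supplied by the caller. */
static bool priority_group_check_passes( uint32_t aircr, uint32_t max_prigroup_value )
{
    return ( aircr & PRIORITY_GROUP_MASK ) <= max_prigroup_value;
}
```

Feeding in the AIRCR value read from the memory viewer gives the PRIGROUP of 3 seen earlier. Whether the check passes then depends entirely on the limit it is compared against.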

This certainly warrants further investigation. Why was the value of PRIGROUP 3 instead of 0? Let's take another look at that configASSERT() line:

configASSERT( ( portAIRCR_REG & portPRIORITY_GROUP_MASK ) <= ulMaxPRIGROUPValue );

Note the following definition of ulMaxPRIGROUPValue in port.c. It is defined as static, which means it has a permanent home in memory:

#if( configASSERT_DEFINED == 1 )
    static uint8_t ucMaxSysCallPriority = 0;
    static uint32_t ulMaxPRIGROUPValue = 0;
#endif /* configASSERT_DEFINED */

Let's set up another data breakpoint for ulMaxPRIGROUPValue and restart the MCU again, but this time, we'll watch each time it is accessed:

  • As expected, the variable is accessed by the BaseType_t xPortStartScheduler( void ) function in port.c.
  • The curious part about the data access breakpoint is that it is also hit when the program counter is inside SEGGER_RTT.c, which doesn't look right since ulMaxPRIGROUPValue is a static variable that is private to port.c.
  • Looking at the debugger, the problem is staring right at us:
    • The ulMaxPRIGROUPValue static variable is being stored at 0x2000 0750.
    • The data write breakpoint was hit with the stack pointer at 0x2000 0740.
    • The stack has been overrun:

We've just uncovered a stack overflow. It manifested itself as a write into a static variable (which happened to trigger a configASSERT in an unrelated part of the system). This type of wildly unexpected behavior is a common side effect of stack overflows.
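To make the mechanism concrete, here is a small host-side model of it. This is only a sketch: a 16-word array stands in for RAM, one word below the stack region plays the role of the static variable, and all names and sizes are made up for illustration rather than taken from the project. The stack is full-descending, as on Cortex-M, so each push moves the stack pointer toward lower addresses.

```c
#include <assert.h>
#include <stdint.h>

#define RAM_WORDS    16u  /* toy RAM size, in words                        */
#define VAR_INDEX     4u  /* word that holds the "static variable"         */
#define STACK_LIMIT   5u  /* lowest word that really belongs to the stack  */

static uint32_t ram[ RAM_WORDS ];
static uint32_t sp = RAM_WORDS;  /* starts one past the top of the stack */

/* Push one word. Like real hardware, this does not check the stack limit;
 * the assert only keeps the model inside the array. */
static void push( uint32_t value )
{
    assert( sp > 0u );
    sp--;
    ram[ sp ] = value;
}

/* True once the stack pointer has moved below the region reserved for it. */
static int stack_overrun( void )
{
    return sp < STACK_LIMIT;
}
```

The region reserved for the stack holds 11 words. The twelfth push lands on ram[ VAR_INDEX ], silently replacing whatever the "static variable" held, which mirrors the data write breakpoint firing from code that has nothing to do with the variable.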

Currently, the size of each task stack in main.c is set to 128 words (1 word = 4 bytes, so 512 bytes). Increasing this to 256 words (1 KB) gives us plenty of headroom.

This example is fairly representative of what happens when functionality is added to an existing task that was previously working properly. If the new functionality requires more functions to be called (with each having local variables), those variables will consume stack space. In this example, the problem only showed up after adding the SEGGER print functionality to an existing task. Because there wasn't additional stack space available, the task overflowed its stack and corrupted memory that was being used by another part of the system.

The problem in this example would likely have been caught if we had the stack overflow hooks set up; it would certainly have been caught if the MPU port was being used.
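For reference, a minimal stack overflow hook might look like the sketch below. FreeRTOS calls vApplicationStackOverflowHook() only when configCHECK_FOR_STACK_OVERFLOW is set to 1 or 2 in FreeRTOSConfig.h. The TaskHandle_t typedef here is a stand-in so the sketch compiles off-target; on the MCU, it would be removed and FreeRTOS.h and task.h included instead. The overflow_task_name buffer is a hypothetical addition that gives the debugger something to inspect.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the real FreeRTOS type (assumption for off-target builds). */
typedef void *TaskHandle_t;

/* Hypothetical buffer: name of the task that overflowed its stack. */
static char overflow_task_name[ 16 ];

/* Called by the kernel when it detects that a task has overflowed its stack. */
void vApplicationStackOverflowHook( TaskHandle_t xTask, char *pcTaskName )
{
    ( void ) xTask;

    /* Keep a copy of the name, since the task's own memory may be corrupt. */
    strncpy( overflow_task_name, pcTaskName, sizeof( overflow_task_name ) - 1u );
    overflow_task_name[ sizeof( overflow_task_name ) - 1u ] = '\0';

    /* On target, a common next step is to disable interrupts and spin here
     * so that the debugger halts with the offending task's name available. */
}
```

The kernel performs these checks at context switches, so the hook reports an overflow after it has happened rather than preventing it.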
