Chapter 6

Taking a Look Under the Hood

Publisher Summary

Whether one is trying to obtain a driver’s license or a pilot’s license, sooner or later one has to look under the hood (or the cowling, for pilots). It is not necessary to understand how each part of the engine works, nor how to fix it; mechanics are generally happy to take care of that. But a basic understanding of what is going on helps one to be a better driver or pilot. If one understands the machine, one can control it better, diagnose little problems and do a little maintenance. Working with a compiler is not that dissimilar: sooner or later one has to start looking under the hood to get the best performance out of it. This chapter reviews the basics of string declaration in order to introduce the memory allocation techniques used by the MPLAB® C compiler for the PIC24, and delves into the details of the engine compartment.

Whether you are trying to get a driver’s license or a pilot’s license, sooner or later you have to start looking under the hood, or the cowling for pilots. You don’t have to understand how each part of the engine works nor how to fix it – mechanics will be happy to do that for you. But a basic understanding of what is going on will help you be a better driver/pilot. If you understand the machine, you can control it better – it’s that simple. You can diagnose little problems, and you can do a little maintenance.

Working with a compiler is not that dissimilar – sooner or later you have to start looking under the hood if you want to get the best performance out of it. Since the very first lesson we have been peeking inside the engine compartment. This time we will delve into a little bit more detail.

Flight Plan

In this lesson we will review the basics of string declaration as an excuse to introduce the memory allocation techniques used by the MPLAB® C compiler for the PIC24. The Harvard architecture of the PIC24 poses some interesting challenges that require innovative solutions. We will use several tools, including the Embedded Memory window, the Watches window and the Map file, to investigate how the MPLAB C compiler and linker operate in combination to generate the most compact and efficient code.

Preflight Checklist

As in previous lessons, we will make use of the MPLAB X IDE, the MPLAB C Compiler for the PIC24, a programmer/debugger of your choice and pretty much any demo board that can host a PIC24FJ128GA010 microcontroller.

Use the New Project Setup checklist to create a new project called 6-Strings and a new source file similarly called strings.c.

The Flight

Strings are treated in the C language as simple ASCII character arrays. The characters composing a string are assumed to be stored sequentially in memory, in consecutive 8-bit elements of the array. After the last character of the string an additional byte containing a value of zero (represented in character notation as '\0') is added as a termination flag.

Notice, however, that this is just a convention that applies to the standard C string manipulation library string.h. It would be entirely possible, for example, to define a new library and store strings in arrays where the first element is used to record the length of the string; in fact, Pascal programmers would be very familiar with this method. Additionally, if you are developing international applications, i.e. applications that communicate using languages that require large character sets (like Chinese, Japanese, Korean…), you might want to consider using Unicode, a technology that allocates multiple bytes per character, instead of plain ASCII. The MPLAB C library stdlib.h provides basic support for the translation from/to multi-byte strings defined by the ANSI90 standard.
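To make the two conventions concrete, here is a minimal sketch that compares them. The pstring type and both function names are hypothetical, introduced only for illustration; they are not part of string.h or of any MPLAB C library.

```c
#include <stddef.h>

/* Hypothetical Pascal-style string: the first element records the length. */
typedef struct {
    unsigned char len;        /* number of characters in use (0..255) */
    char          text[255];  /* the characters, no terminator needed */
} pstring;

/* Length of a zero-terminated C string: scan until the '\0' flag is found. */
size_t cstr_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Length of a length-prefixed string: simply read the first element. */
size_t pstr_length(const pstring *p)
{
    return p->len;
}
```

The zero-terminated form costs one extra byte and a scan to find the length; the length-prefixed form gives the length immediately but, in this sketch, caps it at 255 characters.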

Let’s get started by reviewing the declaration of a variable containing a single character:

char c;

As we have seen from the previous lessons, this is how we declare an 8-bit integer (character), which is treated as a signed value (−128..+127) by default.

We can declare and initialize it with a numerical value:

char c = 0x41;

Or, we can declare and initialize it with an ASCII value:

char c = 'A';

Note the use of the single quotes for ASCII character constants. The result is the same, and for the C compiler there is absolutely no distinction between the two declarations – characters are numbers.

We can now declare and initialize a string as an array of 8-bit integers (characters):

char s[5] = {'H', 'E', 'L', 'L', 'O'};

In this example, we initialized the array using the standard notation for numerical arrays. However, we could have also used a far more convenient notation (a shortcut) specifically created for string initializations:

char s[5] = "HELLO";

To further simplify things, and save you from having to count the number of characters composing the string (thus preventing human errors), you can use the following notation:

char s[] = "HELLO";

The MPLAB C compiler will automatically determine the number of characters required to store the string, whilst automatically adding a termination character (zero) that will be useful to the string manipulation routines later to correctly identify the string length. So, the example above is, in truth, equivalent to the following declaration:

char s[6] = {'H', 'E', 'L', 'L', 'O', '\0'};
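The difference between the storage reserved for a string and its length can be checked with the sizeof operator and strlen(). The following is only a sketch; the names greeting, greeting_storage() and greeting_length() are made up for the example.

```c
#include <stddef.h>
#include <string.h>

char greeting[] = "HELLO";   /* the compiler counts the characters for us */

/* Bytes reserved for the array: 5 characters plus the terminator. */
size_t greeting_storage(void)
{
    return sizeof(greeting);
}

/* Characters found before the terminator, as seen by string.h. */
size_t greeting_length(void)
{
    return strlen(greeting);
}
```

By contrast, an array declared with an explicit size of 5 and initialized with "HELLO" has no room left for the terminator, so the string.h functions cannot be used safely on it.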

Assigning a value to a char (8-bit integer) variable and performing arithmetic upon it is no different to performing the same operation on any integer type:

char c;    // declare c as an 8-bit signed integer
c = 'a';   // assign value 'a' from the ASCII table
c++;       // increment, it will be changed into a 'b'

The same operations can be performed upon any element within an array of characters (string). However, there is no simple shortcut, similar to the one used above for the initialization, that can assign a new value to an entire string:

char s[15];  // declare s as a string of 15 characters

s = "Hello!"; // Error! This does not work!

By including the string.h file at the top of your source file, you will gain access to numerous useful functions that will allow you to:

• copy the content of a string into another…
strcpy(s, "HELLO");  // s : "HELLO"

• append (or concatenate) two strings…
strcat(s, " WORLD");  // s : "HELLO WORLD"

• determine the length of a string…
i = strlen(s);  // i : 11

• and many more.
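None of these functions checks whether the destination array is large enough. As a sketch of how a program might guard a concatenation, the hypothetical helper below (append_checked() is not part of string.h) refuses to append when the result would not fit.

```c
#include <string.h>

/* Hypothetical helper: append src to dest only if the result, including
   the terminator, fits in a buffer of dest_size bytes.
   Returns 1 on success, 0 if dest was left untouched. */
int append_checked(char *dest, size_t dest_size, const char *src)
{
    if (strlen(dest) + strlen(src) + 1 > dest_size)
        return 0;               /* would overflow the destination array */
    strcat(dest, src);
    return 1;
}
```

With a 15-character array holding "HELLO", appending " WORLD" succeeds (11 characters plus the terminator), while a further four characters would be rejected.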

Memory Space Allocation

Just like with numerical initializations, every time a string variable is declared and initialized, as in:

char s[] = "Flying with the PIC24";

three things happen:

1. The MPLAB C linker reserves a contiguous set of memory locations in RAM (data space) to contain the variable: 22 bytes in the example above. This space is part of the ndata (near) data section.

2. The MPLAB C linker stores the initialization value in a 22-byte-long table (in program memory). This space is part of the init code section.

3. The MPLAB C compiler creates a small routine that will be called before the main program (part of the crt0 code we mentioned in previous chapters) to copy the values, thereby initializing the variable.

In other words, the string "Flying with the PIC24" ends up using twice the space you would expect, as one copy of it is stored in Flash program memory and space is reserved for it in RAM memory, too. Additionally, you must consider the initialization code and the time spent in the actual copying process. If the string is not supposed to be manipulated during the program, but is only used “as is”, transmitted out over a serial port or sent to a display, then there is no need to waste precious resources. Declaring the string as a constant will save RAM space and initialization code/time:

const char s[] = "Flying with the PIC24";

Now, the MPLAB C linker will only allocate space in program memory, in the const code section, where the string will be accessible via the Program Space Visibility window – an advanced feature of the PIC24 architecture that we will review shortly.

The string will be treated by the compiler as a direct pointer into program memory and, as a consequence, there will be no need to waste RAM space.
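A typical "as is" use might look like the sketch below, where a constant string is sent out one character at a time. The put_char() hook and its capture buffer are placeholders invented for this example; a real project would write to a UART or a display instead.

```c
/* A constant string: on the PIC24 no RAM copy is needed for it. */
const char banner[] = "Flying with the PIC24";

/* Placeholder output hook (an assumption of this sketch): it only
   records the characters so that the example can run anywhere. */
static char out_buf[32];
static int  out_len = 0;

static void put_char(char c)
{
    if (out_len < (int)sizeof(out_buf) - 1)
        out_buf[out_len++] = c;
}

/* Send a zero-terminated string; returns the number of characters sent.
   The const qualifier lets the same function accept strings in RAM too. */
int send_string(const char *p)
{
    int n = 0;
    while (*p != '\0')
    {
        put_char(*p++);
        n++;
    }
    return n;
}
```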

In the previous examples of this lesson, we saw other strings implicitly defined as constants:

strcpy(s, "HELLO");

The string "HELLO" was implicitly defined as being of const char type, and was similarly assigned to the const section in program memory to be accessible via the Program Space Visibility window.

Note that if the same constant string is used multiple times throughout the program, the MPLAB C compiler will automatically store only one copy in the const section to optimize memory use, even if all optimization features of the compiler have been turned off.

Program Space Visibility

The PIC24 architecture is somewhat different from most other 16-bit microcontroller architectures you might be familiar with. It was designed for maximum efficiency according to the Harvard model, as opposed to the more common Von Neumann model. The big difference between the two is that in the Harvard model there are two completely separate and independent buses available: one for access to the program memory (Flash) and one for access to the data memory (RAM). The net result is a doubling of bandwidth, since when the data bus is in use during the execution of one instruction, the program memory bus is available to fetch the next instruction code and initiate its decoding. In traditional Von Neumann architectures, the two activities must instead be interleaved, with a consequent penalty in performance. The drawback of this architectural choice is that access to constants and data stored in program memory requires special considerations.

The PIC24 architecture offers two methods to read data from program memory: using special table access instructions (tblrd), and through a second mechanism called Program Space Visibility or PSV (Figure 6.1). This is a window of 32 Kbytes of program memory accessible via the data memory bus. In other words, the PSV is a bridge between the program memory bus and the data memory bus.

Figure 6.1 PIC24F Program Space Visibility (PSV) window

Notice that although the PIC24 uses a 24-bit-wide program memory bus, it operates only on a 16-bit-wide data bus. The mismatch between the two buses makes the PSV “bridge” a little more interesting. In practice, the PSV connects only the lower 16 bits of the program memory bus to the data memory bus. The upper portion (8 bits) of each program memory word is not accessible using the PSV window. However, when using the table access instructions, all parts of the program memory word become accessible, but at the cost of having to differentiate the manipulation of data in RAM (using direct addressing) from the manipulation of data in program memory (using the special table access instructions).

The PIC24 programmer can therefore choose between the more convenient, but relatively memory-inefficient, method for transferring data between the two buses of the PSV, or the more memory-efficient, but less transparent, solution offered by the table access instructions.

The designers of the MPLAB C compiler considered the trade-offs and chose to use both mechanisms albeit using them to solve different problems at different times:

• The PSV is used to manage constant arrays (numeric and strings) so that a single type of pointer (to the data memory bus) can be used uniformly for constants and variables.

• The table access mechanism is used to perform the variable initializations (limited to the crt0 segment) for maximum compactness and efficiency.
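The memory cost of the two mechanisms can be estimated with a little arithmetic. The sketch below assumes that data read through the PSV occupies two bytes of every three-byte program word, while data prepared for the table instructions is packed three bytes per word; it ignores any bookkeeping records the linker may add. Both function names are hypothetical.

```c
/* Program memory bytes consumed by n data bytes read through the PSV:
   each 24-bit (3-byte) word exposes only its lower 16 bits (2 bytes). */
unsigned psv_flash_bytes(unsigned n)
{
    return ((n + 1) / 2) * 3;
}

/* The same data packed for the table access instructions:
   all 3 bytes of each program word carry data. */
unsigned table_flash_bytes(unsigned n)
{
    return ((n + 2) / 3) * 3;
}
```

For example, 56 bytes of constants cost 84 bytes of Flash through the PSV, but only 57 bytes when packed for table access.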

Investigating Memory Allocation

We will start investigating these issues with the following short snippet of code:

/*
** Strings
*/
#include <config.h>
#include <string.h>
// 1. variable declarations
const char a[] = "Learn to fly with the PIC24";

char b[100] = "Initialized";

// 2. main program
main()
{
  strcpy(b, "MPLAB C compiler"); // assign new content to b
} //main

Build the project for debugging using the Debug>Project Debug command.

Then open the Watches window and add to it the two strings a and b (Figure 6.2):

• You can select Debug>New Watch … from the main menu.

• You can press the CTRL+SHIFT+F7 keyboard shortcut.

• You can put the cursor in the editor window on each variable and right click with your mouse and select New Watch … from the context menu.

Figure 6.2 The Watches window showing the strings a and b

By default, MPLAB X will show each element of the array as a hexadecimal value but you can customize the view by right clicking on the string name (before expanding it) and selecting Display Value as: and picking the Character option as I do in Figure 6.3.

Figure 6.3 The Watches window context menu

You will be presented with the Watches window context menu.

Now by clicking on the little + (enclosed in a box) icon you will be able to expand the view to show each individual element (Figure 6.4).

Figure 6.4 The string b expanded to show the array contents

Back in Figure 6.2, you might have noticed that the address of the string a (0x892B) appears to be a much higher value than the address of the string b (0x850). This reflects the fact that the constant string a is using only the minimum amount of space required in the Flash program memory of the PIC24 and will be accessed through the PSV space (starting at address 0x8000) and no RAM has been assigned to it. In contrast, the address of string b is clearly within the address space of the PIC24 RAM memory.

Notice also how the string b appears to be already initialized when we begin our debugging session (or after each reset). In reality, MPLAB X is allowing the crt0 code to execute before starting our debugging session, so we don’t have a chance to observe how the array is empty at the very beginning and how its initial value is copied from a location in Flash memory just before the call to the function main().

Note

Only the most curious and patient readers will be able to see how the initialization of the string b is performed using the Table Read (tblrd) assembly instruction to extract the data from the program memory (Flash) and to store the values in the allocated space in data memory (RAM).

While single stepping (select Debug>StepOver from the main menu) through the short main(), notice how the contents of the string b get overwritten as the strcpy() function is called (Figure 6.5).

Note

Although the string.h library contains dozens of functions, you will be pleased to know that the MPLAB C linker is wisely appending to our executable code only the functions that are actually being used.

Figure 6.5 The string b contents updated after the call to strcpy()

Looking at the Map

Another tool we have at our disposal to help us understand how strings (and in general any array variable) are initialized and allocated in memory is the map file. This text file, produced by the MPLAB C linker, can be easily inspected with the MPLAB X editor and is designed specifically to help you understand and resolve memory allocation issues.

To find this file, look for it in the main project directory where all the project source files are. Select File>Open File and then browse until you reach the project directory. There you will find a file called .map (no name, just the .map extension) (Figure 6.6). Make sure the Open dialog box is currently filtering for All Files in the Files of type field or you won’t be able to see it.

Figure 6.6 The Open dialog box

Map files tend to be pretty long and verbose but, by learning to inspect only a few critical sections, you will be able to find a lot of useful data. The Program Memory Usage summary, for example, is found among the very first few lines.

section    address   length (PC units)   length (bytes)   (dec)
-------    -------   -----------------   --------------   -----
.text      0x200     0x90                0xd8             (216)
.const     0x290     0x38                0x54             (84)
.dinit     0x2c8     0x4c                0x72             (114)
.text      0x314     0x16                0x21             (33)
.isr       0x32a     0x2                 0x3              (3)

  Total program memory used (bytes):  0x1c2  (450) <1%

This is a list of small sections of code assembled by the MPLAB C linker in a specific order and position (dictated by a linker script file).
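The two length columns are related by a fixed ratio: each PIC24 instruction word is 24 bits (3 bytes) wide but occupies two program counter address units. A small sketch of the conversion follows; pc_units_to_bytes() is a name chosen only for this example.

```c
/* Convert a length expressed in PC units into bytes of program memory:
   every 2 PC units correspond to one 24-bit word, i.e. 3 bytes. */
unsigned long pc_units_to_bytes(unsigned long pc_units)
{
    return (pc_units * 3) / 2;
}
```

Applied to the summary above, 0x90 PC units give 0xd8 (216) bytes and 0x38 PC units give 0x54 (84) bytes.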

Most section names are pretty intuitive, others are … historical:

• .text section: where all the code generated from your source files by the MPLAB C compiler will be placed (the name of this section has been used since the original implementation of the very first C compiler).

• .const section: where the constants (integers and strings) will be placed for access via the PSV.

• .dinit section: where the RAM variables’ initialization data (used by the crt0 code) will be found.

• .isr: where the Interrupt Service Routine (in this case a default one) will be found.

It’s in the .const section that the constant string a, as well as the (implicit) constant string “MPLAB C compiler”, is stored for access via the PSV window.

You can confirm this by inspecting the Embedded Memory window at the address 0x8290, remembering to set the Memory mode to Program and the Format option to PSV Data.
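The address 0x8290 is simply the program memory address of the .const section (0x290) seen through the PSV window, which starts at data address 0x8000. A minimal sketch of this mapping follows, assuming the usual PIC24F arrangement in which the lower 15 bits of the program address select the offset within the window and the remaining upper bits select the 32-Kbyte page; the function names are hypothetical.

```c
/* Data-space address at which a program memory address appears
   once its 32-Kbyte page is selected for the PSV window. */
unsigned int psv_data_address(unsigned long prog_addr)
{
    return (unsigned int)(0x8000u | (prog_addr & 0x7FFFu));
}

/* Page number that must be selected to reach that program address. */
unsigned int psv_page(unsigned long prog_addr)
{
    return (unsigned int)(prog_addr >> 15);
}
```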

Observe the “groups of two” character grouping of the string “MPLAB C compiler” in Figure 6.7. Remember how the PSV allows us to use only 16 bits of each 24-bit program memory word.

Figure 6.7 Inspecting the .const and .dinit memory sections

The .dinit section is where the b variable’s initialization string is to be found. This section follows immediately after the .const section, so you will be able to see it in the same Figure 6.7.

Observe how this string is prepared for access via the table instructions and therefore uses each and every one of the 24 bits available in each program memory word. Note the “groups of three” character grouping of the string “Initialized”.

The next part of the map file we might want to inspect is the Data Memory Usage (RAM) summary.

section    address   alignment gaps   total length   (dec)
-------    -------   --------------   ------------   -----
.icd       0x800     0x50             0x50           (80)
.ndata     0x850     0                0x64           (100)

  Total data memory used (bytes):  0xb4  (180) 2%

In our simple example, it contains only two sections:

1. .icd, a small area of 80 bytes reserved for use by the in-circuit debugger, starting at address 0x800, the first location available in the PIC24 RAM.

2. .ndata, containing only one variable: b, for which 100 bytes are reserved immediately following .icd.

Pointers

Pointers are variables used to refer indirectly (i.e. point) to other variables or part of their content. Pointers and strings go hand in hand in C programming, as they are a powerful mechanism for working on any array data type. So powerful, in fact, that they are also one of the most dangerous tools in the programmer’s hands and the source of a disproportionately large share of programming bugs. Some programming languages, such as Java, have gone to the extreme of completely banning the use of pointers in an effort to make the language more robust and verifiable.

The MPLAB C compiler takes advantage of the PIC24 16-bit architecture to manage with ease large amounts of data memory (up to 32 Kbytes of RAM in GA and GB models). In particular, thanks to the PSV window, the MPLAB C compiler doesn’t make any distinction between pointers to data memory objects and const objects allocated in program memory space. This allows a single set of standard functions to manipulate variables and/or generic memory blocks as needed from both spaces.

Note

This will come as a big relief to those of you who have previously attempted to program 8-bit PIC® microcontrollers in C.

The following classic program example will compare the use of pointers versus indexing to perform sequential access to an array of integers:

int *pi;     // define a pointer to an integer
int i;       // index/counter
int a[10];   // an array of integers

// 1. sequential access using array indexing
for(i=0; i<10; i++)
    a[i] = i;

// 2. sequential access using a pointer
pi = a;
for(i=0; i<10; i++)
{
    *pi = i;
    pi++;
}

In 1. we performed a simple for loop and each time round the loop we used i as an index into the array. To perform the assignment the compiler will have to take the value of i, multiply it by the size of the array element in bytes (2) and add the resulting offset to the initial address of the array a.

In 2. we initialized a pointer to point to the initial address of the array a. At each time round the loop we used the pointer (*) to perform the assignment, then we simply incremented the pointer.

Comparing the two cases, we see how, by using the pointer, we can save at least one multiplication step for each time round the loop. If the array element is used more times inside the loop, the performance improvement is going to be proportionally greater.
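The equivalence between the two forms of access can be sketched as follows. The function names are illustrative only, and sizeof(int) is used instead of a fixed 2 so that the example also runs on a PC, where an int is usually larger than on the PIC24.

```c
#include <stddef.h>

/* Byte offset of element i from the start of an int array:
   the multiplication the compiler performs to evaluate a[i]. */
size_t element_offset(size_t i)
{
    return i * sizeof(int);
}

/* Fill an array through a pointer: no index multiplication is needed,
   the pointer just advances by one element per iteration. */
void fill_with_pointer(int *a, int count)
{
    int *pi = a;
    int i;
    for (i = 0; i < count; i++)
    {
        *pi = i;
        pi++;
    }
}
```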

Pointer syntax can become very “concise” in C, allowing for some pretty effective code to be written, but also opening the door to more bugs.

As a minimum, you should become familiar with the most common contractions. The previous snippet of code is more often reduced to the following:

// 2. sequential access to array using pointers
for(i=0, pi=a; i<10; i++)
    *pi++ = i;

Also note that an empty pointer, that is, a pointer without a target, is assigned a special value NULL, which is implementation-specific and defined in stddef.h.

The Heap

One of the advantages offered by the use of pointers is the ability to manipulate objects that are defined dynamically (at run time) in memory. The heap is the area of data memory reserved for such use, and a set of functions, part of the standard C library stdlib.h, provide the tools to allocate and free the memory blocks. They include as a minimum the two fundamental functions:

void *malloc(size_t size);

which takes a block of memory of requested size from the heap and returns a pointer to it; and

void free(void *ptr);

which returns the block of memory pointed to by ptr to the heap.

The MPLAB C linker places the heap in the RAM memory space left unused above all project global variables and the reserved stack space. Although the amount of memory left unused is known to the linker and listed in the map file of each project, you will have to explicitly instruct the linker to reserve an exact amount for use by the heap.

Use the File>Project Properties menu command to open the Project Properties dialog box, select the pic30-ld (MPLAB C Linker) tab, and then define the heap size in bytes.

As a general rule, allocate the largest amount of memory possible as this will allow the malloc() function to make the most efficient use of the memory available. After all, if it is not assigned to the heap it will remain unused.
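As a minimal sketch of how the two functions are typically used together, the hypothetical helper below copies a string into a block taken from the heap. Note the check on the value returned by malloc(): on a small microcontroller the heap is easily exhausted, and when no heap has been reserved at all the allocation can be expected to fail.

```c
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated copy of src, or NULL if src is NULL or the
   heap cannot provide a large enough block.
   The caller returns the block to the heap with free(). */
char *copy_to_heap(const char *src)
{
    size_t n;
    char *p;

    if (src == NULL)
        return NULL;
    n = strlen(src) + 1;        /* characters plus the terminator */
    p = malloc(n);
    if (p != NULL)
        memcpy(p, src, n);
    return p;
}
```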

MPLAB C Memory Models

The PIC24 architecture allows for a very efficient (compact) instruction encoding for all operations performed on data memory within the first 8 Kbytes of addressing space. This is referred to as the near memory area and in the case of the PIC24FJ128GA010 it corresponds to the group of SFRs (within the first 2 Kbytes) and the following 6 Kbytes of general purpose RAM. Only the top 2 Kbytes of RAM are actually outside the near space.

Access to memory beyond the 8-Kbyte limit requires the use of indirect addressing methods (pointers) and could be less efficient if not properly planned for. The stack (and with it all the local variables used by C functions) and the heap (used for dynamic memory allocation) are naturally accessed via pointers and are correspondingly ideal candidates to be placed in the upper RAM space. This is exactly what the linker will attempt to do by default. It will also try to place all the global variables defined in a project in the near memory space for maximum efficiency. If a variable cannot be placed within the near memory space it has to be “manually” declared with a far attribute, so that the compiler will generate the appropriate access code. This behavior is referred to as the Small Data Memory model. This is the alternative to the Large Data Memory model, where each variable is assumed to be far unless the near attribute is specified.

In practice, while using the PIC24FJ128GA010, you will use almost exclusively the default small memory model, and only on rare occasions will you find it necessary to identify a variable with the far attribute. We will observe one such case in lesson number 12, where a very large array that would otherwise not fit in the near memory space will have to be declared as far. As a consequence, not only will the compiler generate the correct addressing instructions, but the linker will also push it to an upper area of RAM, giving priority to the other global variables and allowing them to be accessed in the near space.

Since access to elements of an array (explicitly via pointers or by indexing) is performed via indirect addressing anyway, there will be no performance or code size penalty.
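A declaration of this kind might be sketched as shown below. The attribute is written in the GCC-style syntax used by the Microchip 16-bit compilers; the FAR_DATA macro, the compiler-detection symbols it tests, and the array name are assumptions of this example, added so that the same code also builds with a PC compiler that does not know the attribute.

```c
/* Assumption: __XC16__ or __C30__ is predefined when building for the PIC24;
   on any other compiler the attribute is simply dropped. */
#if defined(__XC16__) || defined(__C30__)
#define FAR_DATA __attribute__((far))
#else
#define FAR_DATA
#endif

#define TABLE_SIZE 2048

int big_table[TABLE_SIZE] FAR_DATA;   /* large array, pushed out of near space */
int counter;                          /* small global, stays in near space     */

/* The array is accessed by indexing (indirect addressing) in either case,
   so the far attribute adds no penalty here. Returns the last entry written. */
int fill_far_table(void)
{
    for (counter = 0; counter < TABLE_SIZE; counter++)
        big_table[counter] = counter;
    return big_table[TABLE_SIZE - 1];
}
```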

A similar concept applies to the program memory space. In fact, within each compiled module, functions are called by making use of a more compact addressing scheme that relies on a maximum range of +/−32 Kbytes. Program memory models (small and large) define the default behavior of the compiler/linker with regard to the addressing of functions within or outside this 32-Kbyte range.

Post-Flight Briefing

In the C language, strings are defined as simple arrays of characters, but the C language standard has no concept of different memory regions (RAM vs. Flash), nor of the particular mechanisms required to cross the bridge between different buses in a Harvard architecture. The programmer using the MPLAB C compiler needs a basic understanding of the trade-offs of the various mechanisms available and the allocation strategies adopted to make the most out of the precious resources (RAM especially) available to embedded control applications.

Notes for the C Experts

The const attribute is normally used in the C language, together with most other variable types, only to assist the compiler in catching common parameter usage errors. When a parameter is passed to a function as a const, or a variable is declared as a const, the compiler can in fact help flag any subsequent attempt to modify it. The MPLAB C use of the PSV only extends these semantics in a very natural way, allowing for a more efficient implementation, as we have seen.
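A short sketch of the conventional use of const on a function parameter follows; count_char() is a hypothetical function written for this example.

```c
#include <stddef.h>

/* Count how many times c occurs in the string s.
   The const qualifier promises that s is only read, never modified,
   so the function can safely be handed a constant string. */
size_t count_char(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
    {
        if (*s == c)
            n++;
        /* An assignment such as *s = ' '; placed here would be
           rejected by the compiler because s points to const char. */
    }
    return n;
}
```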

Notes for the Assembly Experts

• The string.h library contains many useful block manipulation functions that can be useful, via the use of pointers, to perform operations on any type of data array, not just strings, such as memcpy(), memcmp(), memset() and memmove().

• The ctype.h library, on the other hand, contains functions that help discriminate individual characters according to their position in the ASCII table, to discriminate lower case from upper case, and/or convert between the two cases.
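The two libraries can be sketched in action as follows. The helper names are made up for the example; only toupper(), memcpy() and memcmp() come from the standard libraries.

```c
#include <ctype.h>
#include <string.h>

/* Convert a string to upper case in place using ctype.h. */
void to_upper(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}

/* memcpy() works on any array type, here an array of integers:
   the size is given in bytes, hence the multiplication by sizeof(int). */
void copy_ints(int *dest, const int *src, size_t count)
{
    memcpy(dest, src, count * sizeof(int));
}

/* memcmp() compares two blocks byte by byte; returns 1 if they match. */
int ints_equal(const int *a, const int *b, size_t count)
{
    return memcmp(a, b, count * sizeof(int)) == 0;
}
```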

Notes for the PIC Microcontroller Experts

Since the PIC24 program memory is implemented using Flash technology, programmable with a single supply voltage even at run time and during code execution, it is possible to design boot-loaders, which are applications that automatically update part or all of their own code. It is also possible to utilize sections of the Flash program memory as a non-volatile memory storage area, as long as you stay within some pretty basic limitations. To write to the Flash program memory you will need to utilize the table access methods and exercise extreme caution. The PSV window is a read-only mechanism and, as we have seen before, it gives access only to 16 of the 24 bits of each program memory location.

Also note that the memory can only be written in complete rows of 64 words each and must first be erased in blocks of eight rows (512 words) each. This can make frequent updates impractical if single words or, as is more usual, small data structures are being managed.
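To get a feel for the granularity involved, the sketch below computes how many rows must be written and how many blocks must be erased to store a given number of program words, using the row and block sizes quoted above. The macro and function names are hypothetical.

```c
#define WORDS_PER_ROW    64u    /* smallest unit that can be written */
#define WORDS_PER_BLOCK  512u   /* smallest unit that can be erased  */

/* Rows that must be written to store n program words. */
unsigned rows_needed(unsigned n)
{
    return (n + WORDS_PER_ROW - 1) / WORDS_PER_ROW;
}

/* Blocks that must be erased first to store n program words. */
unsigned blocks_needed(unsigned n)
{
    return (n + WORDS_PER_BLOCK - 1) / WORDS_PER_BLOCK;
}
```

Updating even a single word therefore means erasing one full block of 512 words and rewriting at least one row of 64.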

Tips & Tricks

String manipulation can be fun in C once you realize how to make the zero termination character work for you efficiently. Take for example, the mycpy() function below:

void mycpy(char *dest, char *src)
{
    while(*dest++ = *src++);
}

This is quite a dangerous piece of code, as there is no limit to how many characters could be copied. Additionally, there is no check as to whether the dest pointer is pointing to a buffer that is large enough, and you can imagine what would happen should the src string be missing the termination character. It would be very easy for this code to continue beyond the allocated variable spaces and to corrupt the entire contents of the data RAM, including the all-precious SFRs.

As a minimum, you should verify that pointers passed to your functions have been initialized before use. Compare them with the NULL value (declared in stdlib.h and/or stddef.h) to catch the error.

Add a limit to the number of bytes to be copied. It is reasonable to assume that you will know the size of the strings/arrays used by your program and if you don’t, use the sizeof() operator. A better implementation of mycpy() would be the following:

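void mycpy(char *dest, char *src, int max)
{
    if ((dest != NULL) && (src != NULL) && (max > 0))
    {
        // copy at most max-1 characters, leaving room for the terminator
        while ((--max > 0) && (*src))
            *dest++ = *src++;
        *dest = '\0';   // always terminate the destination string
    }
}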
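The same limit can also be expressed with the standard strncpy() function, as in the following sketch. The wrapper name bounded_copy() is hypothetical; the explicit termination is needed because strncpy() does not add a terminator when the source is longer than the limit.

```c
#include <string.h>

/* Copy src into a buffer of dest_size bytes, truncating if necessary
   and always terminating the result. */
void bounded_copy(char *dest, size_t dest_size, const char *src)
{
    if ((dest == NULL) || (src == NULL) || (dest_size == 0))
        return;
    strncpy(dest, src, dest_size - 1);   /* copies at most dest_size-1 chars */
    dest[dest_size - 1] = '\0';          /* guarantee the terminator         */
}
```

Passing sizeof() of the destination array as the second argument keeps the limit in step with the declaration, should the array size ever change.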

Exercises

Why not try developing new string manipulation functions to perform the following operations:

• Search for a string in an array of strings, sequentially.

• Implement a binary search.

• Develop a simple hash table management library.

Books

• Wirth, N., 1976. Algorithms + Data Structures = Programs, Prentice-Hall, Englewood Cliffs, NJ.
With unparalleled simplicity, Wirth (the father of the Pascal programming language) takes you from the basics of programming all the way up to writing your own compiler.

Links

• http://en.wikipedia.org/wiki/Pointers#Support_in_various_programming_languages Learn more about pointers and see how they are managed in various programming languages.
