CHAPTER 5


Variables: Identifiers, Basic Scalar Data Types, and Literals

Classic Data Types

C programmers will find that packetC provides a variety of familiar data types that follow C99 conventions. The use of identifiers, literals, and basic scalar data types is almost identical to what you would expect from C. In a few instances, such as enumeration types and structure bit-fields, there are subtle differences introduced to clean up ambiguities in C99 or to address some specific issues related to network structures required by packetC. Those differences don't appear visually or grammatically as much as they do in the implementation expectations and meaning. Where deviations occur, they will be highlighted to help the C programmer note the important changes.

This chapter provides a quick reference for identifiers, literals, and basic data types in packetC, highlighting syntax as well as any important points specific to packetC. In addition, a short section gives an overview of how network byte order affects all data types in packetC. Programmers porting from one platform to another often grapple with big-endian versus little-endian systems and, in particular, fight with inconsistencies when creating network structures, which are big-endian in Internet Protocol networks. Because packetC is focused on networks, the language is always big-endian no matter which underlying platform the application runs on. Big-endian ordering applies from processing within packetC through to transmission on the network. This awareness is critical as one moves from simple data types to complex ones, or even when simply specifying the value of a literal.

Identifiers and a Few Fundamentals

Identifiers in packetC are similar to C99 identifiers but there are some differences. In particular, there are some constraints on packetC identifiers that can introduce some issues in code brought directly from C environments; however, one will quickly find that packetC has simple rules for identifiers.

Given the following definitions:

  • letter = ‘a’..‘z’ or ‘A’..‘Z’
  • underscore = ‘_’
  • dollar sign = ‘$’
  • digit = ‘0’..‘9’

Identifiers must obey the following rules:

  • The initial character of an identifier must be a letter.
  • Any subsequent characters can be letters, digits, or underscores.
  • packetC identifiers are case sensitive.
  • Dollar signs may appear in the system-declared identifiers that a program references; they are not available to user-defined data types.

The following represent all possible values present in a packetC identifier:

  • Digit = ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’
  • Letter = ‘A’|‘B’|‘C’|‘D’|‘E’|‘F’|‘G’|‘H’|‘I’|‘J’|‘K’|‘L’|‘M’|‘N’|‘O’|‘P’|‘Q’|‘R’|‘S’|‘T’|‘U’|‘V’|‘W’|‘X’|‘Y’|‘Z’
    |‘a’|‘b’|‘c’|‘d’|‘e’|‘f’|‘g’|‘h’|‘i’|‘j’|‘k’|‘l’|‘m’|‘n’|‘o’|‘p’|‘q’|‘r’|‘s’|‘t’|‘u’|‘v’|‘w’|‘x’|‘y’|‘z’
  • OtherChar = ‘_’|‘$’
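The identifier rules above can be sketched as a small checker. The sketch is written in plain C rather than packetC, and the function names are hypothetical; it covers only user-defined identifiers, so a dollar sign is rejected.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers: these names are illustrative, not part of packetC. */
static bool is_letter(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static bool is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Returns true if `name` follows the rules for a user-defined identifier:
 * a letter first, then any mix of letters, digits, and underscores. */
bool is_user_identifier(const char *name)
{
    if (name == NULL || !is_letter(name[0]))
        return false;

    for (size_t i = 1; name[i] != '\0'; i++) {
        char c = name[i];
        if (!is_letter(c) && !is_digit(c) && c != '_')
            return false;   /* '$' and every other character is rejected */
    }
    return true;
}
```

Because identifiers are case sensitive, srcPort and srcport would both pass this check and still name two different variables.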

Throughout this chapter, examples of literals being constructed in a number of forms are shown. The grammar productions for those literals are shown below, along with a list of the possible operators within packetC:

  • Decimal = Digit [ Decimal ]
  • HexDigit = Digit |‘A’|‘B’|‘C’|‘D’|‘E’|‘F’|‘a’|‘b’|‘c’|‘d’|‘e’|‘f’
  • Hexadecimal = ( HexDigit | Digit ) [ Hexadecimal ]
  • HexEntry = ‘0x’ Hexadecimal
  • Binary = ( ‘0’ | ‘1’ ) [ Binary ]
  • BinaryEntry = ‘0b’ Binary
  • String = ( Letter | ‘\x’ ( HexDigit | Digit ) ( HexDigit | Digit ) | ‘\’ Digit Digit Digit ) [ String ]
  • Operator = ‘<’|‘<=’|‘>’|‘>=’|‘=’|‘!=’|‘+’|‘-’|‘*’|‘/’|‘%’|‘==’|‘(’|‘)’|‘[’|‘]’|‘.’|
    ‘,’|‘++’|‘--’|‘~’|‘<<’|‘>>’|‘&’|‘^’|‘|’|‘!’|‘&&’|‘||’|‘+=’|‘-=’|‘*=’|‘/=’|‘%=’|
    ‘<<=’|‘>>=’|‘&=’|‘^=’|‘|=’| sizeof | lock | unlock | offset | packet_offset | ref | deref
  • Hex Example:             0x00010203
                             0x04050607
                             0x08090a0b
  • Decimal Example:         12345
                             213
                             12333
                             5
  • Binary Example:          0b0101001010101010101010101
                             0b0101001010101111101010101
                             0b0101111110101010101010101

Basic Scalar Types

packetC supports four basic scalar types. All four are unsigned; packetC has no signed variants. Without signed values, the lowest possible number is 0, and adding 1 to the maximum value of a type wraps around to 0, since the language has no carry flags. Other than that, packetC uses straightforward C scalar types.

The unsigned integer types are: byte, short, int and long.
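The wraparound behavior described above can be illustrated in plain C with fixed-width unsigned types. This is only a sketch: it assumes byte corresponds to uint8_t and int to uint32_t, and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed correspondence (not packetC syntax): byte ~ uint8_t, int ~ uint32_t. */

/* Adds two byte values; the result wraps modulo 256 and no carry is kept. */
uint8_t byte_add(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Subtracts two 32-bit values; going below 0 wraps to the top of the range. */
uint32_t int_sub(uint32_t a, uint32_t b)
{
    return a - b;
}
```

With these definitions, byte_add(255, 1) yields 0 and int_sub(0, 1) yields 0xFFFFFFFF, which is the "1 more than the maximum is also 0" behavior seen from both directions.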


While packetC supports an array of bit-level operations, bit is not a scalar type. Instead, bits are within bit-fields that comprise a fully padded basic scalar type as discussed in Chapter 8 and shown in Chapter 19 descriptor data type examples.

Examples of variable declarations are shown below:

int   myIpAddr;
byte  count;
short srcPort;
long  bigCounter;

Literals

Four categories of literals are present in packetC: integral, network, string, and character literals. Each plays an important role in packetC, and some vary in interesting ways from their representations in C. Integral type literals do not differ from C99; rather, packetC pins down their representation so that C programmers have a clear understanding where integral literal syntax varies on some platforms. Network literals are new to packetC, yet very familiar to those working in networking. String and character literals follow C99; however, they involve escape sequences that can differ from one platform to another. Furthermore, as will be discussed later, a string literal can be leveraged by the compiler to provide the size of a byte array at compile time.

Integral value literals can be expressed in binary, hexadecimal and decimal radices.

byte b = 0b00000110;    // binary radix
byte h = 0xa5;          // hexadecimal radix
byte d = 255;           // decimal radix

Network literals, new with packetC, allow the user to specify an integer as a dotted address (or dotted quad address) to indicate an IPv4 32-bit address or network mask.

int  myInt = 162.150.1.1;
int  myMask = 255.255.255.0;

String literals are defined within double quotes. A string can be assigned to an array of byte, which must have appropriate space to receive the accompanying sequence of characters.

byte myStr[8] = "a string";         // Not null terminated
byte ntStr[9] = "a string\x00";     // null-terminated

Standard escape sequences are also supported within string literals and character literals.

    \n     New line
    \r     Carriage return
    \ddd   ASCII character in decimal (ddd range 0 to 255)
    \xhh   ASCII character in hexadecimal (hh range 0x00 to 0xFF)

Character literals are enclosed in single quotes and can be assigned to a byte variable. The quoted value may be a printable ASCII character or an unprintable value introduced using one of the escape sequence forms.

byte myChar = 'a';               // assign lowercase letter A
byte singleQuoteChar = '\'';     // assign quote character
byte alert = '\a';               // assign alert character
byte nullTerm = '\x00';          // assign to hex 0x00
byte terminator = '\000';        // assign to decimal 0

Integral Type Literals

In packetC, integral type literals can have values expressed in binary, hexadecimal, and decimal radices. The default radix is decimal. Binary and hexadecimal literals are indicated by the leading characters ‘0b’ and ‘0x’, respectively.

byte b = 0b00000110;       // binary radix
byte h = 0xa5;             // hexadecimal radix
byte d = 255;              // decimal radix
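As a rough illustration of how the three radix forms map to the same kind of value, the following plain C sketch parses a literal's text using the prefixes described above. The function name is hypothetical, and postfix type modifiers are deliberately not handled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical parser: accepts "0b..." (binary), "0x..." (hexadecimal) or
 * plain decimal digits. Returns false on a malformed literal or on overflow. */
bool parse_integral_literal(const char *text, uint64_t *value)
{
    unsigned radix = 10;                 /* decimal is the default radix */

    if (text[0] == '0' && text[1] == 'x') {
        radix = 16;
        text += 2;
    } else if (text[0] == '0' && text[1] == 'b') {
        radix = 2;
        text += 2;
    }

    if (*text == '\0')
        return false;                    /* a prefix alone is not a literal */

    uint64_t result = 0;
    for (; *text != '\0'; text++) {
        char c = *text;
        unsigned digit;

        if (c >= '0' && c <= '9')
            digit = (unsigned)(c - '0');
        else if (c >= 'a' && c <= 'f')
            digit = (unsigned)(c - 'a') + 10;
        else if (c >= 'A' && c <= 'F')
            digit = (unsigned)(c - 'A') + 10;
        else
            return false;                /* not a digit in any radix */

        if (digit >= radix)
            return false;                /* e.g. '2' in a binary literal */
        if (result > (UINT64_MAX - digit) / radix)
            return false;                /* would not fit in 64 bits */

        result = result * radix + digit;
    }

    *value = result;
    return true;
}
```

Parsing "0b00000110", "0xa5", and "255" with this sketch yields 6, 165, and 255, matching the three declarations shown above.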

In addition to the prefix notation identified above, for byte, short, and long a decimal literal can use a postfix notation modifier. By default, a literal will be of type int; however, with the ‘b’, ‘s’, or ‘l’ following, a literal may be used to indicate byte, short, or long, respectively. Uppercase ‘B’, ‘S’, and ‘L’ may be used as well.

byte b = ~0b;                // byte literal using negation
byte b = ~(byte)0;           // byte literal as above using casting
short s = 12s;               // short literal
long l = 255l;               // long literal
long l1 = (1L << 33);        // ensure all bits preserved
const long l2 = (7L << 7);   // 64-bit constant with lower 3 bits set, 7 bits shifted

Integral type literals appear in a variety of forms, given the numerous representations based on leading prefixes and postfix magnitude indicators, interleaved with casting and other packetC operators. A key goal of packetC is that no code is ambiguous in meaning, which makes behavior predictable and simplifies audits for secure code. Wherever the type of a literal might be unclear, it is suggested that the literal use postfix notation. Remember that packetC always pads literals with zeros in the most significant bits as they are cast to larger data types, just as you would expect. Furthermore, packetC requires strict typing, so compilers will highlight type violations.

short s = 12b;        // error due to strict typing as type is short and literal is byte
short s = (short) 12b; // byte properly cast back to short literal, not recommended
long l = 4096;        // ambiguous declaration as 4096 presumed int, legal but not recommended
byte b = 1050;        // error - literal value greater than container
int i = 4096 + (int) 17s; // legal - integral value literals presumed int, short literal cast.
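The container check and zero padding described above can be sketched in plain C. The widths used here (byte 8 bits, short 16 bits, int 32 bits, long 64 bits) are assumptions of this sketch, and the function names are hypothetical rather than anything a packetC compiler exposes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if `value` can be stored in an unsigned container that is
 * `width_bits` wide without losing any set bits. */
bool literal_fits(uint64_t value, unsigned width_bits)
{
    if (width_bits >= 64)
        return true;                     /* every 64-bit value fits */
    return (value >> width_bits) == 0;   /* no bits above the container */
}

/* Widening an unsigned value pads the most significant bits with zeros. */
uint16_t widen_byte_to_short(uint8_t b)
{
    return (uint16_t)b;
}
```

Under these assumptions, 1050 does not fit in 8 bits, which corresponds to the byte b = 1050 error above, and a value such as 1 shifted left by 33 only survives in a 64-bit container.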

The most important advice with packetC integral type literals is to think about future readers of code with an eye for removing any ambiguity in your meaning about the value a literal represents.

Network Literals

Network literals, new with packetC, allow the user to specify an integer as a dotted address (or dotted quad address) to indicate an IPv4 32-bit address or network mask. Such a literal consists of 4 individual integer values (ranging from 0 to 255), separated by dots (‘.’). packetC associates such a value with the 4-byte int data type, associating the literal's leftmost individual value with the most-significant integer byte, the rightmost individual literal value with an integer's least significant byte, and so forth.

int  myInt  =  162.150.1.1;    // 162 is stored in myInt's most significant byte
int  myInt  =  192.168.0.1;    // 192 is stored in myInt's most significant byte

// Error
int addr  =  256.150.5.1;      // ERROR: Value 256 too large for byte (255 max)
short  s1  =  163.98.1.1;      // ERROR: dotted quad form is only allowed for int
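The byte placement of a network literal can be sketched in plain C as follows. The function name is hypothetical; it packs four values into a 32-bit integer with the leftmost value in the most significant byte and rejects any value above 255.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: packs a.b.c.d into one 32-bit value, leftmost value
 * in the most significant byte. Returns false if any value exceeds 255. */
bool pack_dotted_quad(unsigned a, unsigned b, unsigned c, unsigned d,
                      uint32_t *out)
{
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return false;

    *out = ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8)  |  (uint32_t)d;
    return true;
}
```

Packing 192, 168, 0, 1 this way produces 0xc0a80001, while 256, 150, 5, 1 is rejected for the same reason as the first error case above.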

In networking, IP addresses are at the heart of managing almost any device or programming networking applications. As use moves beyond a single device into a network, the notion of ranges of addresses becomes critical. IP routing manages groups of addresses that share a common high-order portion of the address as subnets to simplify routing tables. Routing decisions based on portions of the address were introduced with Classless Inter-Domain Routing (CIDR) blocks, leading to what many see as the requirement of entering both an IP address and a network mask into systems. As packetC applications often include routing features, easy interaction with CIDR blocks and masks is critical to simplifying the developer's application.

When performing simple network masks, such as those related to class A, class B, and class C (e.g., /8, /16, /24) networks, the role of network literals isn't as pronounced.

int  myCIDR  =  255.0.0.0;       // A class A /8 Network Mask Using packetC Network Literal
int  myCIDR  =  0xff000000;      // A class A /8 Network Mask Using Integral Value Literal

When performing complex network masks, the benefit of packetC network literals becomes clear.

int  myCIDR  =  255.255.224.0;  // A /19 Network Mask Using packetC Network Literal
int  myCIDR  =  0xffffe000;     // A /19 Network Mask Using Integral Value Literal
int  myCIDR  =  0b11111111111111111110000000000000;
int  myHost  =  0.0.31.255;     // The Host Mask For A /19 Network Using
                                // packetC Network Literal
int  myHost  =  0x00001fff;     // A /19 Host Mask Using Integral Value Literal
int  myHost  =  0b00000000000000000001111111111111;
int  myHost  =  0.0.0.255;      // Simple /24 Host Mask
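The relationship between a CIDR prefix length, its network mask, and its host mask can be sketched in plain C. All three function names are hypothetical and serve only to make the arithmetic behind the literals explicit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Network mask for a prefix length of 0..32: that many leading one bits. */
uint32_t prefix_to_mask(unsigned prefix_len)
{
    if (prefix_len == 0)
        return 0;                        /* avoid an undefined 32-bit shift */
    if (prefix_len >= 32)
        return UINT32_C(0xFFFFFFFF);
    return (uint32_t)(UINT32_C(0xFFFFFFFF) << (32 - prefix_len));
}

/* Host mask: the complement of the network mask. */
uint32_t host_mask(unsigned prefix_len)
{
    return (uint32_t)~prefix_to_mask(prefix_len);
}

/* Two addresses are in the same subnet if their network portions match. */
bool same_subnet(uint32_t a, uint32_t b, unsigned prefix_len)
{
    uint32_t mask = prefix_to_mask(prefix_len);
    return (a & mask) == (b & mask);
}
```

For a /19 prefix this gives 0xffffe000 (255.255.224.0) and 0x00001fff (0.0.31.255), so 192.168.0.1 and 192.168.31.1 share a subnet while 192.168.32.1 falls outside it.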

As network headers are often being constructed or manipulated in packetC, the goal of network literals is to provide as many tools as possible to the developer, not only to simplify construction but also to provide representations that help remove ambiguity from the meaning of an assignment. For example, when reading code, an integer assigned the value 0xc0a80001 does not make it obvious that the upstream routing IP address 192.168.0.1 is being hard-coded into the application.

String Literals

In packetC, characters and strings do not have their own data types. They do, however, play a central role in the language, and string literals are key to data initialization, assignment, and evaluation. Further reading will show that a single character occupies one byte, whereas a string is an array of bytes.

String literals can be defined within double quotes and can be assigned to an array of byte, which must have appropriate space to receive the accompanying sequence of characters. It is a fatal error to attempt to assign a string to a byte array that has insufficient space to accommodate the string. A byte array without a dimension specified will be assigned the magnitude of the string literal at compile time.

byte myStr[8]  = "a string";            // a simple string literal
byte myStr[]   = "a string";            // string literal specifies magnitude of 8 for myStr

// Error
byte badStr[7] = "a string";           // Insufficient space for string

Unlike C, packetC never automatically inserts a null terminator (a byte containing the decimal value zero) at the end of a string, nor does it expect a variable to reserve an extra byte for a null. Null values in network data are perfectly valid, so evaluating a string does not necessarily stop when a null is found in either operand. Later chapters address packetC searchset data types and their interaction with null termination. Furthermore, two byte arrays (strings) can be compared directly using the standard C equality operators, without a function call, whether null values are present or not. Users can embed null terminators inside strings by using escape sequences. In packetC, a null value may be inserted anywhere within the literal, and there may be more than one null value.

byte myStr[8] = "a string";             // not a null-terminated string
byte ntStr[9] = "a string\x00";         // null-terminated string
byte enStr[10] = "embed\x00null";      // embedded null terminator followed by word null

In addition to the representations for null shown above, two pre-defined string literals related to null are defined in packetC and are found in cloudshield.ph.

const byte NULL_STOP[1]  = "\x00";
const byte NULL_REGEX[4] = ".*?\x00";
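For readers coming from C, the difference between length-based comparison of byte arrays and C's null-terminated string comparison can be sketched as follows. This is plain C, not packetC, and bytes_equal is a hypothetical helper that approximates what comparing two byte arrays with an equality operator means when nulls carry no special meaning.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compares two byte arrays by explicit length. Embedded zero bytes are
 * ordinary data here and never end the comparison early. */
bool bytes_equal(const uint8_t *a, size_t a_len,
                 const uint8_t *b, size_t b_len)
{
    return a_len == b_len && memcmp(a, b, a_len) == 0;
}
```

Given the 10-byte arrays "embed\0null" and "embed\0NULL", C's strcmp stops at the embedded zero and reports them equal, whereas the length-based comparison correctly reports a difference in the bytes after the null.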

Byte arrays with a single dimension or multiple dimensions can accept string literal assignments. A packetC compiler may impose an implementation-defined maximum length for string literals.

// 2D array of string literals
byte       twoD[2][3]      = {"owl", "cat"};

// 2D array of string literals imported from file
const byte twoD[50][80]    = {
#include "words.px"
};

The character literals section shows the full set of packetC escape sequences, which can also appear within string literals by using the backslash character as an escape character. For example, a double quote character in a string is represented by the two-character escape sequence \". Similarly, \x before two digits in the 0-9A-F range is an escape representation of a hexadecimal value, not to be confused with the 0x00 form of an integral value literal. A single string literal can include any combination of predefined escape sequences, such as \f, and numeric escape sequences, such as \x07.

byte myStr[8]   = "a string";           // not a null-terminated string
byte ntStr[9]   = "a string\x00";       // null-terminated string
byte dQuote[16] = "a double quote:\"";  // contains escaped double quote
byte aStr[12]   = "alert\ainside";      // predefined escape '\a'
byte enStr[10]  = "embed\x00null";     // embedded null terminator
byte twoD[2][3] = {"owl", "cat"};       // 2D array of string literals
byte third[9]   = "my string";          // a simple string literal assignment
// collection of strings with mix of literals and variables
byte many[4][9] = { "just some", " of ", third, " values." };

A string literal cannot cross multiple text lines unless a continuation character ('\') appears as the final character on a line to be continued. When string literals are continued in this fashion, neither the continuation character nor the newline character following it is present in the string.

byte goodContinueStr[64] = "abc\
def";                                // string is stored as "abcdef"

// Error
byte badContinueStr[64] = "abc
def";                                 // no closing quote or continue character

When a single-dimension byte array is initialized by a string literal with an unsized dimension, it receives the size of the literal string (without any automatic null termination). An array with multiple unsized dimensions can have its sizes defined by a compound initialization clause that includes string assignments but the strings must be the same size if the most rapidly varying dimension is defined in this way.

byte unsized[] = "str determines size";  // array of 19 bytes
byte good[][] = {"owl", "cat"};          // array is [2][3]

// Error
byte bad[][] = {"bird", "owl"};          // array[2][?] inconsistent sizes

Character Literals

Many aspects of character literals have been previewed in the string literals shown earlier, but a closer look at characters is worthwhile. As stated earlier, packetC does not have a character data type per se; it uses the basic scalar data type byte to store characters. Because all values are unsigned, this has no real impact in packetC: the equivalent of typedef byte char; would introduce char as a data type for a developer. Doing so is not recommended for the sake of code clarity, as packets are always collections of bytes, even if they store characters or strings.

A solitary ASCII character, surrounded by single quotes can be assigned to a byte variable and can be used anywhere that a byte numeric value could be. The character values specified by the single quote form and the string values specified by the double quote form can both use the two-character and four-character escape sequences described below. All the escape sequences, including those for non-printing characters and those that use hexadecimal or decimal values, begin with the backslash character.

  • \a:      Alert
  • \b:      Backspace
  • \f:      Form feed
  • \n:      New line
  • \r:      Carriage return
  • \t:      Horizontal tab
  • \v:      Vertical tab
  • \':      Single quotation mark
  • \":      Double quotation mark
  • \\:      Backslash
  • \ddd:    ASCII character in decimal (ddd range 0 to 255)
  • \xhh:    ASCII character in hexadecimal (hh range 0x00 to 0xFF)
// Example
byte myChar  = 'a';
byte singleQuoteChar = '\'';           // assign quote character
byte alert = '\a';                     // assign alert character
// Hex and decimal numeric values are legal
byte nullTerm    = '\x00';                // assign to hex 0x00
byte terminator = '\000';              // assign to decimal 0

For a complete listing of ASCII values and the associated escape sequences and hexadecimal or decimal values that will be assigned to a character based upon using a particular escape sequence, refer to the ASCII reference chart found in the references chapter.

Because a character literal is interchangeable with a numeric literal that would fit into a byte variable, it can be used with the same operators that are legal for an integer literal.

byte myChar = '\t' + 1;  // valid, not clear that a TAB (0x09) becomes a NEW LINE (0x0A)
myChar += 'A';         // useful if it's clear the previous value is a character offset from 'A'
if( '\t' >= 9 ) {…}    // fairly confusing meaning
if( (myChar >= 'A') && (myChar <= 'Z') ) {…}    // makes sense as it defines
                                                // a range of ASCII chars

While this can lead to some interesting examples, it should be used only where it is removing ambiguity from what is being declared.

Network Byte Order

All values in packetC are stored in network byte order, which follows a big endian byte-allocation. Additionally, packetC bits are stored in a little-endian bit-allocation order. This matches the ordering of bytes and bits with how packets are represented in IP and Ethernet networks. The packetC compiler and/or execution environment will accommodate adjustments to ensure this for a packetC programmer even if the underlying system does not natively support network byte order computation or packet delivery.

Unlike C, in packetC the exact number of bytes allocated and the binary bit representation of an integer is consistent across all target platforms. This ensures that basic scalar types are consistent across all platforms. This also means that functionality related to adapting to platform representations is unnecessary in packetC. The compiler understands the address and alignment when performing type casting based upon strict rules. Byte level unions and bit level access all follow network byte ordering as shown below in Figure 5-1 for the integer maxRateLimit.


Figure 5-1. Byte and bit alignment of variable maxRateLimit
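To make the byte ordering concrete for C programmers, the following plain C sketch stores and loads a 32-bit value in network byte order regardless of the host's native endianness. The function names are hypothetical; packetC itself handles this ordering on the programmer's behalf.

```c
#include <stdint.h>

/* Writes `value` into `out` with the most significant byte first
 * (network byte order), independent of the host's endianness. */
void store_be32(uint32_t value, uint8_t out[4])
{
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}

/* Reads four bytes in network byte order back into a 32-bit value. */
uint32_t load_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Storing 0xc0a80001 this way yields the bytes 192, 168, 0, 1 in that order, which is the order in which the address 192.168.0.1 appears in an IP header.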

While C programmers are rarely concerned about the exact binary representation of data, in networking and packetC it is of tremendous interest, since it affects everything in the data plane. This applies at the bit and byte level as well as to the overall packing and allocation of structures and higher-order data sets in packetC. Without this foundation, much of what packetC addresses may be misunderstood by traditional C programmers who are not comfortable with embedded systems.

Throughout packetC, a lot of focus is placed on bit- and byte-level representation as well as on working with it through bit-fields. No matter which platform you are on, the conversion between character or string literals and integral value and network literals will always be consistent. Furthermore, bit-level masking and access when moving literals into bit-fields found within structures will always map to the big-endian representation shown above. This is very useful when performing simple conversions, such as switching ASCII characters between upper and lower case: ‘A’ can be swapped to ‘a’ by manipulating a single bit, with no concern for the bit-level representation in packetC, since it will always be consistent.

byte toUpperCase (byte inChar)
{
   // Value 0xdf is 0b11011111; the AND clears the bit at the zero's position.
   if( (inChar >= 'a') && (inChar <= 'z') )
     return (inChar & 0xdf);
   else
     return inChar;
};

byte toLowerCase (byte inChar)
{
   // Value 0x20 is 0b00100000; the OR sets the bit at the one's position.
   if( (inChar >= 'A') && (inChar <= 'Z') )
      return (inChar | 0x20);
   else
      return inChar;
};

Unsupported Types

packetC does not support the following data types:

  • Boolean
  • Floating Point
  • Pointer
  • Register

While Boolean is not a native type within packetC, a typedef for Boolean, typedef int bool, is provided and offers similar functionality. In addition, declarations for true and false are also provided in cloudshield.ph.

While floating point data may need to be processed, floating point is not supported in packetC. Libraries can be implemented to add relevant functionality for the limited floating-point processing required in networking.

There is no equivalent to a pointer in packetC as a data type. Developers looking for similar functionality for building dynamic linkages should refer to sections on references in packetC for more information on those forms of handles used with data sets.

The register data type is redundant in packetC, as local variables within packet scope are expected to be implemented by the underlying execution platform as closely as possible to registers, given the implied data plane performance expectations.
