Chapter 2: Introduction to the packetC Language

packetC Language Design Considerations

The primary objective of the packetC design is to define a language that allows software developers to use familiar, high-level language constructs to express coding solutions for packet-processing applications in general, and for CloudShield-enabled platforms in particular.

While C provided widespread familiarity of syntax, the underlying emphases of C and packetC as programming languages are different. The following differences weighed heavily upon the design considerations of packetC:

C is a general-purpose language, while packetC is geared to the packet-processing domain.
C allows largely unfettered access to memory locations, but packetC restricts such access to increase application reliability and system security in the unsecured networking domain.
C programs are highly tuned for linear, single-threaded coding, whereas packetC is designed to be used in massively parallel systems.
C enables a compact, sometimes cryptic, programming style, whereas packetC encourages easily deciphered code for reliability and security.

Although they are related, the two languages therefore have significant differences in their type models and semantics. Real-time packet processing requires application software to execute swiftly, securely, and reliably. Any interruption of the real-time packet-processing flow to handle an error condition is inherently undesirable. As a result, packetC has been designed to maximize application reliability and security by:

Simplifying and constraining the type declaration system to prevent unforeseen type conflicts
Avoiding type coercions or promotions to prevent unexpected data truncations or expansions
Supporting a strong typing model with restrictive type casting to prevent unexpected side effects
Connecting declaration source code location to declaration scope in a clear, intuitive way
Requiring switch statements to exhibit clear control flow
Enforcing a try-catch-throw model of exception handling that addresses all thrown exceptions
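
To make the concern about coercions and promotions concrete, the following is a minimal sketch in plain C (not packetC) of the kind of silent truncation the list above is meant to rule out. The function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds two 8-bit counters the way plain C permits: both operands are
   promoted to int, and the result is silently truncated on return. */
uint8_t add_u8_truncating(uint8_t a, uint8_t b)
{
    return a + b;  /* implicit int -> uint8_t conversion drops high bits */
}

/* The same addition with the overflow made explicit.
   Returns 0 on success, -1 if the sum does not fit in 8 bits. */
int add_u8_checked(uint8_t a, uint8_t b, uint8_t *out)
{
    assert(out != NULL);
    unsigned sum = (unsigned)a + (unsigned)b;
    if (sum > UINT8_MAX) {
        return -1;  /* caller must handle the error; nothing is truncated */
    }
    *out = (uint8_t)sum;
    return 0;
}
```

In this sketch, add_u8_truncating(200, 100) compiles without complaint under default settings and yields 44 rather than 300, while the checked variant forces the caller to deal with the overflow.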

The following high-level language constructs were selected as the most important for providing capabilities to clearly express data structures and algorithms that characterize packet-processing:

User-defined types that aggregate data (specifically, structures and unions)
High-level constructs for expressing conditional algorithm control flow (e.g., if, while and switch statements)
An intuitive way to express arbitrarily complex arithmetic expressions in symbolic fashion
A means for decomposing complex programs into smaller, cohesive functions

packetC Language Similarities

While much has been said about how packetC differs from C, these differences are highlighted precisely because the two languages have so much in common. Without such highlighting, many C programmers would struggle to notice that large sections of a packetC program are not actually C. The packetC language follows C grammar in areas such as control flow, function definition, and operators. Furthermore, many of the ambiguous or risky aspects of C have been addressed by the community in C++, and as a result packetC follows C++ mechanisms such as strict type enforcement, error handling, and, to some extent, memory management and templates. Several packetC-unique components, such as packets, databases, and search sets, take on an object-oriented character, with methods associated with each of these objects. When learning packetC, comparing it to the broader progression of C language variations should guide an understanding of the methodologies it employs, while its strict C99 grammar forms a sound foundation.

Key similarities to consider when learning packetC are as follows:

packetC is a case-sensitive language, e.g., “IPVersion4” and “IpVersion4” are not the same.
A semicolon “;” is used to delineate the end of statements in packetC.
Strong typing follows C++ behavior at compile time.
packetC has the full complement of C control flow (if-else, while, switch, and so on).
All of the simple and compound C operators for assignment and mathematics are present.
Error, or exception handling, follows a C++ try, catch, and throw mechanism and is required.
Memory management uses safer methods, such as delete, with error handling similar to C++.
A C pre-processor enables familiar features such as #define, #ifdef and #include.
Both C and C++ comments are supported. // Ignore Rest of Line /* Already Gone! */

Despite such similarities, crucial differences distinguish packetC. These differences simplify the development of network applications; boost performance through parallel processing and through processing of packets in a logical network form; improve security; reduce errors; and assure accuracy. In each area of deviation, packetC's design addresses issues that affect the complexity of development, security (which often drives the complexity of debugging and auditing), or the complexity of problem representation. The result is a language that simplifies the development cycle through its changes, yet maintains and builds on the developer community's familiarity with C and its variations.

Virtual Machine—packetC Behavior

The packetC language is designed to be compiled into optimized bytecodes that are executed by a packetC native processor or by an appropriate Virtual Machine (VM). Bytecode output for a packetC virtual machine allows disparate hardware platforms to execute programs in a predictable manner. This use of p-code follows an approach familiar to Java programmers. Because packetC expects networking, parallel processing, and security feature sets to be present in the underlying processor, a bytecode representation can employ the specialized instructions required and leave the implementation to a packetC native processor or to a virtual machine providing an equivalent implementation.

In this form, the virtual machine is less like a virtual machine representing an entire PC found in computing virtualization, and more like a lightweight bytecode virtualization layer found in emulating embedded systems or Java programs. This underlying representation is in contrast to C, where the underlying platform often bleeds through to the application to resolve conflicts such as with big- or little-endian machines or operating system behavior such as sockets versus streams. A packetC developer benefits greatly from this deviation from C. For example, the virtual machine bytecode approach ensures the consistency of programming network protocols in a network byte and bit order representation within packetC across all platforms. Furthermore, packet receipt and transmit are handled and buffered regardless of the design or variety of hardware and operating system software implementing the interfaces.

Thus, packetC code is assumed to be executed in a runtime environment that either provides or emulates:

Arithmetic and logical operations for unsigned integer operands with sizes of 8, 16, 32, and 64 bits.
Structures in which the fields that are declared first are stored at lower addresses.
Multiple-byte integers stored in big-endian order (network order) with the most significant byte stored in the lowest numbered address.
Little-endian bit fields with bytes stored in big-endian order.
Management of packet receipt, buffering, queuing and transmission.
Basic packet structure interpretation and underlying functions for IP packet cleanup.
Fundamental primitives for structured and unstructured content analysis to support database and search set expectations.

These elements may be provided by a hardware platform, operating system, packetC virtual machine environment, or the compiler itself. The packetC developer does not need to address these areas as any packetC system must provide these capabilities such that code does not change from one platform to another.
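
For contrast, the sketch below shows the kind of explicit byte-order and bit-position handling that a portable C program has to write by hand when it reads the first few bytes of an IPv4 header. It is plain C rather than packetC, and the type and function names (ipv4_summary, ipv4_summarize, read_be16) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads a 16-bit big-endian (network order) value, independent of the
   byte order of the host the code runs on. */
uint16_t read_be16(const uint8_t *buf)
{
    assert(buf != NULL);
    return (uint16_t)(((unsigned)buf[0] << 8) | buf[1]);
}

/* Hypothetical holder for the few IPv4 fields decoded in this sketch. */
struct ipv4_summary {
    uint8_t  version;       /* upper 4 bits of byte 0 */
    uint8_t  header_words;  /* lower 4 bits of byte 0 (IHL, 32-bit words) */
    uint16_t total_length;  /* bytes 2-3, network order */
};

/* Decodes the fields above from a raw packet buffer.
   Returns 0 on success, -1 if the buffer is missing or too short. */
int ipv4_summarize(const uint8_t *pkt, size_t len, struct ipv4_summary *out)
{
    assert(out != NULL);
    if (pkt == NULL || len < 4) {
        return -1;
    }
    out->version      = (uint8_t)(pkt[0] >> 4);
    out->header_words = (uint8_t)(pkt[0] & 0x0F);
    out->total_length = read_be16(pkt + 2);
    return 0;
}
```

Given the bytes 0x45 0x00 0x00 0x54, the sketch reports version 4, a header of 5 words, and a total length of 84. In a packetC system, the runtime environment described above is what guarantees this layout, so the shifting and masking need not be repeated in application code.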

Digging a Little Deeper into packetC vs. C

The preceding sections highlight the key areas to focus on when learning packetC. The C language was decades old when packetC was designed and what is interesting is how many variations of C really exist. Not including what are considered different languages, such as C++ and C#, the standards bodies redefined many variations of the language, and just about every compiler implementation introduced its own deviations to C. As a result, the C language is not a monolithic entity. It is instructive to compare and contrast the C antecedents of packetC grammar.

The packetC language is C-like in the sense that it uses C-language symbols for arithmetic and logical operators, uses the C operator precedence hierarchy, and uses familiar C keywords for conditional constructs, such as while, for, and if-then-else. When the specification describes packetC as following the practice of “C,” it means the practice specified by the C99 variant of the language, as defined by ISO/IEC 9899:1999, authored by JTC (Joint Technical Committee) 1/SC 22/WG 14. The specification's authors used the Committee draft of May 6, 2005, as their reference. In a few instances, the specification states that packetC follows “Standard C,” to indicate that packetC follows the older language as defined in ISO/IEC 9899:1990. “Standard C” in this definition dates back to what many programmers think of as The C Programming Language, a little white book by Brian Kernighan and Dennis Ritchie.

Developers who are unfamiliar with many of the premises of how C works, or who believe packetC introduces severe execution and coding restrictions, should make sure to measure these deviations against C99. Through the decades of C programming and numerous compiler implementations, dozens of variations came into use. Unfortunately, this has led to programs not working the same way on two systems and to chronic problems in code reliability, security, and support. In packetC, the C99 language standard was chosen as the basis for all parts that are based on C, since the team determined it to be the best modern representation of the language, with the clearest documentation of the historical C issues that it addressed. In many ways, a C programmer wanting to learn packetC should not only use packetC language documentation, but also leverage C99 resources for learning particular coding practices and assumptions about implementation details. This is not to undermine the benefit of the massive amount of open source information on C, but to highlight the critical nature of a strictly defined C grammar with minimal ambiguity, which was a prime development criterion for both C99 and packetC.

Case Sensitivity and Identifiers

The packetC language is case sensitive, just like C. For C programmers, this might not seem like a big deal, or a topic requiring much focus at the start of a book. However, case sensitivity is an important discussion for packetC because it highlights a struggle that the packetC language designers faced, namely, security. Having two identifiers with the same name but different case, such as myPacketData and mypacketdata, risks undermining secure application development practices. A hallmark of packetC is security, and C developers will learn to program differently under restrictions such as the absence of pointers in packetC. On the question of case sensitivity, however, the packetC designers chose to follow C.

In packetC, identifiers such as keywords, functions, and variables with differing cases resolve to different objects. This is familiar to C programmers, yet can lead to some security concerns with packetC because of ambiguity, as mentioned above. Since code will often be ported or brought from other systems into packetC, portability and consistency with C were prioritized over the possible security implications. While case sensitivity can lead to conflicts through simple mistakes, this wasn't seen as much different from the uncontrollable case where a single character is changed between two similar variables such as myPacketData and myPackatDate. This led to a requirement that compilers be responsible for introducing warnings where these potentially problematic gray areas of secure code occur.
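
The hazard is easy to reproduce in plain C, which shares this rule. The following short sketch (function name hypothetical) declares the two identifiers from the discussion above and updates the wrong one; nothing in the language objects.

```c
#include <assert.h>

/* Two distinct variables whose names differ only in case. */
int case_demo(void)
{
    int myPacketData = 1;
    int mypacketdata = 2;

    /* Suppose myPacketData was intended here: the typo compiles cleanly
       and quietly updates the other variable instead. */
    mypacketdata += 10;

    assert(myPacketData != mypacketdata);  /* they are separate objects */
    return myPacketData;                   /* still 1, the update was lost */
}
```

This is the kind of gray area in which a compiler warning, as required above, is the only line of defense.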

Object Orientation and Control Flow

One of the hardest challenges in developing a new language, one that maps a set of requirements not previously combined in a single high-level language onto a familiar grammar, was selecting which language base to start with. While a language in the C family is the natural starting point, where should the basis begin? Based on the syntax and desired operators, C (and, as described above, C99) became the clear choice. What is not obvious until one really digs into studying input and output, along with the flow of an application, is how object-oriented packet processing really is. Even more so, in parallel processing systems many copies of a similar object, such as a packet, might be in process by the greater application at any given time. The notions of objects, contexts, and scope became key to a successful representation of the processing paradigm in packetC. These concepts, however, migrate rapidly away from C99 and into the realm of C++. As elements of packetC beyond basic statements, operators, and variables are introduced, the reader should be able to discern the object and method representations brought forth from C++.

In packetC, the packet is the single most central piece of data being evaluated and processed by an application. The packet is both one of the simplest data types, an array of bytes, and one of the most complex objects in the language. Methods are available on the packet object for operations such as the insertion or deletion of bytes within the packet. Packet descriptors provide a structural representation of headers within packets and act as aliased accesses to the packet object. Furthermore, a packet may be copied and placed into a queue, introducing another context to process the replicated packet. Notation such as pkt.replicate; and its result follow a very different representation from standard C, including the C++-style error handling associated with the failure of a method like replicate. While packetC does not allow for inheritance and polymorphism and allows only limited cases of encapsulation, a firm understanding of object-oriented principles from C++ will greatly help the packetC developer.
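
To give a feel for what insert and delete operations on a packet involve, here is a minimal sketch in plain C of a byte buffer with those two operations. It is not the packetC packet object: the struct, the fixed capacity, the function names, and the integer error codes are all assumptions made for illustration, standing in for the methods and exception handling that packetC provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PKT_CAPACITY 64  /* hypothetical fixed buffer size for the sketch */

/* A packet modeled as an array of bytes plus its current length. */
struct packet {
    uint8_t data[PKT_CAPACITY];
    size_t  len;
};

/* Inserts n bytes from src at offset, shifting the tail toward the end.
   Returns 0 on success, -1 if the offset is invalid or space runs out. */
int pkt_insert(struct packet *p, size_t offset, const uint8_t *src, size_t n)
{
    assert(p != NULL && src != NULL);
    if (offset > p->len || n > PKT_CAPACITY - p->len) {
        return -1;
    }
    memmove(p->data + offset + n, p->data + offset, p->len - offset);
    memcpy(p->data + offset, src, n);
    p->len += n;
    return 0;
}

/* Deletes n bytes starting at offset, closing the gap.
   Returns 0 on success, -1 if the range falls outside the packet. */
int pkt_delete(struct packet *p, size_t offset, size_t n)
{
    assert(p != NULL);
    if (offset > p->len || n > p->len - offset) {
        return -1;
    }
    memmove(p->data + offset, p->data + offset + n, p->len - offset - n);
    p->len -= n;
    return 0;
}
```

Every caller of this C sketch has to remember to test the return value; the try-catch-throw model described earlier exists so that a failed packet operation cannot be ignored in the same way.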

When control flow is discussed, it is often assumed that the discussion is heading toward a description of if statements. This is not the case here. In packetC, the control flow discussion at the macro level is really about the application as a whole and ties into the Virtual Machine and systems expectations section covered earlier. In C, programs are often referred to as control-oriented, in that a program starts and is then in control, either proceeding as a single line of evaluation or introducing threads and other mechanisms to handle aspects that may be parallelized. If a system is going to have more than one code base running at a time, even if these are copies of the same application, they are generally different programs that have chosen a shared means of communicating. In C++, language extensions have been introduced for managing advanced control flow, including concurrency to manage shared memory and multi-processor systems. In packetC, the language has a concurrent control flow where each packet executes its own copy of the application, namely function main(), from the start. A packetC program is developed as a module that includes the definition of the concurrent code as well as shared memory, defined as variables outside of main() in global address space. The underlying system, and not packetC, is expected to handle most of the concurrent aspects of processing.

The control flow of a packetC application differs from C in that it more closely follows the control flow of an embedded interrupt service routine. Much as a keyboard device driver on a personal computer only does work when a key is pressed, packetC control flow only does work when a packet arrives, and all work is solely focused on that packet. If no packets arrive on a system, no execution of main() will occur. If multiple packets arrive, multiple concurrent copies of main() will start executing. Although this seems quite complex, the changes packetC introduces for scoping and for complex global objects such as databases and search sets work to simplify it. By comparison, in languages such as C++ that introduce concurrent control mechanisms within the language itself, an already complex system becomes almost impossible to code or debug.
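
The execution model can be approximated in plain C as a handler that a platform invokes once per arriving packet. The sketch below is a sequential simulation only: on_packet stands in for main(), dispatch stands in for the underlying system, and the global counters stand in for module-level shared variables. All of these names are hypothetical, and a real packetC system would run the invocations concurrently and take responsibility for coordinating shared access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shared state, analogous to variables declared outside main(). */
unsigned long packets_seen = 0;
unsigned long bytes_seen = 0;

/* A received packet as the dispatcher sees it (hypothetical type). */
struct packet_view {
    const uint8_t *data;
    size_t len;
};

/* Stand-in for the per-packet entry point: all work concerns one packet. */
void on_packet(const struct packet_view *pkt)
{
    assert(pkt != NULL);
    packets_seen += 1;
    bytes_seen += pkt->len;
}

/* Sequential stand-in for the platform: one on_packet() call per arrival.
   With count == 0 the handler never runs, mirroring the idle system. */
void dispatch(const struct packet_view *arrivals, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        on_packet(&arrivals[i]);
    }
}
```

Note that the application code in this sketch never loops waiting for input; it is only ever entered because a packet exists.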

In data plane programming, the content of the packet often dictates the control flow through the application. Even the simplest router implementation will process a packet differently when it is addressed to the router itself, indicating, for example, that a table update has been sent or that a ping packet has arrived to check health. Requiring every packet to follow exactly the same flow restricts a system. As such, packetC drives toward being a data-driven language, where packets are the key component of data, as opposed to a language in which code dictates actions, for the simple reason that a program operating on in-network gear cannot dictate when traffic will arrive.
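
The router example can be sketched in plain C as a classifier whose result depends entirely on the packet contents. The router address, the action names, and the decision to treat any ICMP packet as a ping and any OSPF packet as a table update are simplifying assumptions for illustration; the field offsets and protocol numbers are those of the standard IPv4 header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROUTER_ADDR 0xC0A80001u  /* 192.168.0.1, a made-up router address */
#define PROTO_ICMP  1            /* IPv4 protocol number for ICMP */
#define PROTO_OSPF  89           /* IPv4 protocol number for OSPF */

enum action { ACT_FORWARD, ACT_REPLY_PING, ACT_UPDATE_TABLE, ACT_DROP };

/* Reads a 32-bit big-endian (network order) value. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Chooses an action from the packet contents alone. */
enum action classify(const uint8_t *pkt, size_t len)
{
    assert(pkt != NULL || len == 0);
    if (len < 20 || (pkt[0] >> 4) != 4) {
        return ACT_DROP;               /* too short, or not IPv4 */
    }
    uint8_t  proto = pkt[9];           /* protocol field */
    uint32_t dst   = read_be32(pkt + 16);  /* destination address */

    if (dst != ROUTER_ADDR) {
        return ACT_FORWARD;            /* transit traffic */
    }
    switch (proto) {                   /* traffic for the router itself */
    case PROTO_ICMP: return ACT_REPLY_PING;
    case PROTO_OSPF: return ACT_UPDATE_TABLE;
    default:         return ACT_DROP;
    }
}
```

Nothing in the sketch decides when it runs or which branch is taken next; both are determined by whatever traffic happens to arrive.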

While the simple interrupt service routine example helps to articulate the point, further evaluation of packetC shows that it is closer to modules within larger multi-tasking systems. Programming data plane applications is more complex than just awaiting a packet, although many applications may do just that. Some packetC applications need to perform other activities, such as background tasks. In packetC, the notion of a packet initiating processing is a bit of a misnomer in complex applications: packets may be created by applications, and the resultant control flow based upon a given packet is data-driven, so many packets become simply messages that prompt contexts to perform concurrent processing of tasks not specific to any packet. A network device must be more than a responder to input; it must advance to an autonomous system that is able to make decisions based upon factors such as time and historical information. For that, the ability to create events through messages to itself that spawn processing is critical. With packetC, the notion of a secure, autonomous agent in the network is fundamental to the processing paradigm.

Memory Layout

Much of the detailed discussion in this book focuses on memory layout. From the introduction of bit fields in packetC with precise expectations on their treatment, to descriptors representing complex stacks of headers that arrive in network byte order, to the simpler discussion of endianness, packetC and memory layout are intertwined. Taking the time to consider how applications develop expectations for the construction of a data element as simple as an int is critical in network programming.

Memory used for variables can be thought of as a contiguous sequence of bits, each of which is capable of storing a single binary digit (0 or 1). In packetC, groups of 8 bits (bytes) are stored adjacent to one another in network byte order. Therefore each byte can be assumed to always be aligned in multi-byte variables in the form depicted in Figure 2-1.

Figure 2-1. Multi-byte alignment of data

The packetC compiler generates executable code which maps data entities to memory locations. For example:

int maxRateLimit = 65000;

causes the compiler to allocate a few bytes to represent maxRateLimit.

Unlike C, the exact number of bytes allocated and the binary bit representation of an integer are consistent across all target platforms in packetC. The compiler uses the address of the first byte at which maxRateLimit is allocated to refer to it. The above assignment causes the value 65000 to be stored at address 32 as an integer in the four bytes allocated, as in Figure 2-2.

Figure 2-2. Consistent bit and byte ordering for packetC variables
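
Since the figure is not reproduced here, the layout can be worked out from the text: 65000 is 0xFDE8, so four bytes in network order hold 0x00, 0x00, 0xFD, 0xE8, with the most significant byte at the lowest address. The plain C sketch below (function names hypothetical) produces that layout explicitly, which is what a C program must do to get the same bytes on every host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stores value into out[0..3] in big-endian (network) order:
   the most significant byte goes to the lowest address. */
void store_be32(uint8_t *out, uint32_t value)
{
    assert(out != NULL);
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}

/* Reassembles the value from the same four bytes. */
uint32_t load_be32(const uint8_t *in)
{
    assert(in != NULL);
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Calling store_be32 with 65000 fills the buffer with 00 00 FD E8, and load_be32 on those bytes returns 65000 again, regardless of the byte order of the machine running the code.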

While C programmers are rarely concerned about the exact binary representation of data, in networking and packetC this is an item of extreme interest since it affects everything in the data plane. This applies both at the bit and byte level to the overall packing and allocation of structures and higher order data sets in packetC. Without this foundation, much of what packetC is addressing may often be misunderstood by traditional C programmers not comfortable with embedded systems.

Summary

This chapter touches on some areas to keep in mind as you dig into learning packetC. The environmental aspects surrounding the language are as important as the grammar itself. Parallel processing, memory layout, and running in a virtual machine are just a few concepts that affect packetC as they change the execution environment around the application. Throughout this book, the packetC grammar will be introduced so that these rules can be understood, and any deviations from C or C++ for common grammar will be highlighted. Learning the differences packetC introduces is critical to developing high-performance, safe code for packet processing.
