R E P R I N T  7


Portable Bit Fields in packetC
by Ralph Duncan, Peder Jungck and Dwight Mulcahy

Abstract

Network packets place some protocol data in bit fields that are smaller than typical processor operand sizes. C language structures can represent such protocols, but the uncertain layout and endian-specific nature of C's bit fields cause problems. Research has ranged from alternative bit field constructs, through specialized bit registers, to analytic techniques that identify programs' implicit subword usage. This paper describes the packetC language's two-fold approach to handling protocols in a portable way. The language addresses bit field layout and operation uncertainties as language design matters that can be overcome with a container-oriented approach and unambiguous layout rules. It tackles the problems of endianness and of packet bit field processing by two means. First, at the language design level, packetC imposes big-endian byte allocation order for structure and packet array storage. Second, the language is built around a packet processing model that involves triggering a parallel copy of a program after the host system assembles the entire packet in a byte array, locates standard protocols within that packet and saves protocol location information. By providing both portable protocol representation and protocol layer offset calculation, packetC frees engineering resources to pursue other packet processing tasks.

Categories and Subject Descriptors D.3.3 [Programming Languages]: Language Constructs and Features – data types and structures

General Terms Algorithms, Languages.

Keywords bit fields; byte allocation order; network processing; programming languages; structures

1. Introduction

Structure bit fields, such as those found in the C language [1], are useful for many system programming tasks. They allow the programmer to manipulate individual bits or bit collections smaller than integer operands, facilitating high-level language control of entities like system status words or special-purpose registers.

Research supported by US Air Force contract FA 7037-04-C-0011.

This is relevant to computer network applications that process packets, because packets contain standard protocols, such as TCP/IP or IPv4, that provide routing, service and standards data. Such protocols and their component fields are naturally mapped to C-like structures. To minimize communications overhead, protocols represent information in as few bits as feasible. Thus, they often contain fields with fewer than 8 bits, as the fragment of the IPv4 (Internet Protocol version 4) header in Figure 1 shows.

However, there are significant difficulties in using C-style bit fields for network packet processing. (Throughout the paper we use C99, as opposed to standard C [2] or other variants, as synonymous with the C language.)

• C rules that define bit fields allow compiler implementers a great deal of leeway with container size, padding and boundaries: a C structure that matches a packet protocol with one C compiler may fail to match it with another.

• C bit field structure declarations match the byte-allocation order of the target processor. Thus, an application written to run on a big-endian processor must be recoded to run on a little-endian machine [3].


Figure 1. IPv4 Protocol (first 90 bits).

packetC [4] is a C-like language developed by CloudShield Technologies for reliable, embedded network programming, particularly for deep packet inspection. Because the language is both designed for packet processing and intended to be platform-independent, providing portable bit field programming is important for the language. packetC's approach to these problems can be summarized as:

• Center all bit field organization on containers (unsigned integers) with explicit sizes and require implementations to adhere to unambiguous rules for padding, straddling and boundaries.

• Require users to define structures in big-endian byte allocation order (including those with bit fields) and require compliant compilers to make an algorithmic correction to access bit fields for programs running on little-endian processors.

• Require that the host system pre-scan the packet, locating standard protocols and storing their packet offsets in a special, user-accessible data structure. Thus, responsibility for recognizing protocol bit fields shifts from the user to the runtime support system.

The following sections discuss these areas, first presenting C99 practice, then contrasting it with the packetC approach.

2. C's Bit Field Layout Uncertainties

The bit field implementation freedom afforded by C leads to a variety of uncertainties, as illustrated by the example below (all quoted remarks below refer to section 6.7.2.1, clause 10 of the C99 Specification):

    struct structTag {
           unsigned int nonbitfield;
           unsigned char first:  4;
           unsigned int second:  2;
           unsigned int third:   4;
    } myStruct;

• Handling ‘Straddles’ – The entire field named third cannot fit in the byte allocated for fields first and second. Does the compiler let it ‘straddle’ bytes, with 2 bits in the byte allocated for the first two fields and the remaining bits in a trailing, contiguous byte? Possibly, but perhaps not, since the C99 Specification declares that straddle behavior is implementation-defined: “If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined.”

• Container size – If packing and straddling are not an issue, does the compiler reliably place a bit field within a container of the user-specified size? We cannot be sure, since the Spec. says an implementation can use “any addressable storage unit large enough” to accommodate the bit field.

• Bit field layout – Does the compiler allocate the topmost fields in the declaration to the least significant bytes of the corresponding portion of the structure? How is the containing unit aligned? These matters, too, are left to the implementation: “the order of allocation of bit-fields within a unit is implementation-defined. The alignment of the addressable storage unit is unspecified.”
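One way to see these uncertainties in practice is to ask a particular compiler what it actually did. The C sketch below is illustrative only and is not taken from the paper: `struct probe`, `count_raw_bits` and `first_nonzero_byte` are hypothetical names, 8-bit bytes are assumed, and every bit field is declared `unsigned int` (rather than the `unsigned char` used for first above) because C99 only guarantees `int`, `unsigned int` and `_Bool` as bit field types. After zeroing a structure and setting one field to all ones, the helpers report how many raw bits changed and in which byte the change begins.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical probe structure mirroring the example above. */
struct probe {
    unsigned int nonbitfield;
    unsigned int first  : 4;
    unsigned int second : 2;
    unsigned int third  : 4;
};

/* Count the 1 bits in the raw storage of a probe structure
   (assumes 8-bit bytes). */
static int count_raw_bits(const struct probe *s)
{
    unsigned char raw[sizeof(struct probe)];
    int count = 0;

    memcpy(raw, s, sizeof raw);
    for (size_t i = 0; i < sizeof raw; i++)
        for (int bit = 0; bit < 8; bit++)
            count += (raw[i] >> bit) & 1;
    return count;
}

/* Index of the first nonzero raw byte, or -1 if every byte is zero.
   The answer for a given field can differ between compilers and targets. */
static int first_nonzero_byte(const struct probe *s)
{
    unsigned char raw[sizeof(struct probe)];

    memcpy(raw, s, sizeof raw);
    for (size_t i = 0; i < sizeof raw; i++)
        if (raw[i] != 0)
            return (int)i;
    return -1;
}
```

Calling `first_nonzero_byte` after setting each field in turn, on two different compilers or targets, is a quick way to find out whether a structure that matches a protocol on one platform still matches it on another.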

These matters are critical for packet applications, because they process protocols. For example, if an application copies the contiguous bytes of a packet that comprise a protocol into a structure, it is desirable to know that the compiler will organize the structure in a predictable way that matches the protocol layout. Moreover, it is desirable to be able to move the application to new target processors or compilers without recoding to reflect new bit field implementation peculiarities. Providing predictable and intuitive layouts is the key to the packetC approach described in the next section.

3. packetC's Container-based Bit Field Layout

packetC design goals for providing bit fields emphasized providing language users with a clear, unambiguous way to specify:

• The size of the storage unit that holds related bits.

• Bit field padding (esp. for network protocol-specific padding, as opposed to padding needed to align structure fields for memory references).

• Packing behavior without uncertainties about straddling multiple storage units.

The resulting packetC rules produce the following syntax, using the structure from the previous example (the lettered comments map to the lettered points that follow):

    struct structTag {
         int nonbitfield;
         bits short {                 // (a)
                first:  4;            // (b)
                second: 2;
                third:  4;
                pad:    6;            // (c)
         } optionalContainerName;     // (d)
     } myStruct;

a) Bit fields are explicitly grouped inside containers, which have one of packetC's four unsigned integer types: byte, short, int and long.

b) Since a bit field is always part of a container with a type, individual bit fields declare only their name and size but not a type.

c) Pad fields are always declared explicitly and, given packetC's emphasis on embedded system runtime reliability, a group of bit fields, including pad fields, must always sum to the size of their container. Pad fields cannot be accessed for test or set operations.

d) The optional container name can be used to access and manipulate the bit field collection as a whole.

This approach removes layout size, straddling and boundary uncertainties by guaranteeing that the storage unit size is the one specified by the user, by precluding straddling and by ensuring that every bit in the container has been explicitly defined by the user.
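For readers without a packetC compiler, the container idea can be approximated in C. The sketch below is an illustration under stated assumptions, not packetC's implementation: it stores the container as a `uint16_t`, assumes that the first declared field occupies the most significant bits (consistent with the big-endian shift code in section 7), and uses hypothetical helper names (`get_field`, `set_field`). The C11 static assertion mimics the rule that field widths, including padding, must sum to the container size.

```c
#include <stdint.h>

/* Field widths from the packetC example; pad is explicit. */
enum { FIRST_W = 4, SECOND_W = 2, THIRD_W = 4, PAD_W = 6, CONTAINER_W = 16 };

/* Mimic packetC's rule: fields plus padding exactly fill the container. */
_Static_assert(FIRST_W + SECOND_W + THIRD_W + PAD_W == CONTAINER_W,
               "bit fields must sum to the container size");

/* Shift that places each field; assumption: the first-declared field
   sits in the top bits of the container. */
enum {
    FIRST_SHIFT  = CONTAINER_W - FIRST_W,     /* 12 */
    SECOND_SHIFT = FIRST_SHIFT - SECOND_W,    /* 10 */
    THIRD_SHIFT  = SECOND_SHIFT - THIRD_W     /*  6 */
};

/* Read the field of the given width located at the given shift. */
static uint16_t get_field(uint16_t container, int shift, int width)
{
    return (uint16_t)((container >> shift) & ((1u << width) - 1u));
}

/* Return a copy of the container with one field replaced; bits of
   `value` that do not fit in the field are discarded. */
static uint16_t set_field(uint16_t container, int shift, int width,
                          uint16_t value)
{
    uint16_t mask = (uint16_t)(((1u << width) - 1u) << shift);
    return (uint16_t)((container & ~mask) | ((value << shift) & mask));
}
```

Because the container size and every field position are spelled out, the result does not depend on how a particular C compiler would have laid out equivalent bit fields.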

Other packetC language rules for bit fields follow:

• A bit field width expression must be greater than zero and no greater than the number of bits in the associated integer storage type.

• packetC bit fields are unsigned.

• If all of the bit fields in a collection cannot be packed into the specified type of container, it is a fatal error.

• The same rules govern the alignment of an integer structure field, whether it is being used to store a bit field collection or not.

• Unnamed bit fields are not allowed.

packetC rules also clarify how bit fields act in operations, as the next section describes.

4. Bit Field Operation Semantics in packetC

In order to have portable bit fields, users must have predictable bit field behavior in operations, as well as predictable data layouts. The packetC user can test and set bit fields, using operators for assignment, equality, inequality and the relational operators. As it was with data layout, the bit field container is the locus of operations. Our design goals were to produce the same logical results as C but to be more explicit about the mechanics through which those results are reached.

When an n-bit packetC bit field acts as an operand it behaves as if it occupied the least significant n bits of an integer the size of its container, with any other bits set to zero. Thus, it acts as the sole bit field in a temporary container. packetC differs significantly from C in that there are no type promotions and no implicit type conversions (other than ascribing types to literals). The consequences of these combined rules follow.

• A bit field used as an operand takes the type of its container.

• The type of a bit field (container) used in a binary operation must match the type of the other operand.

• A type cast on a bit field changes the size of its container, not the width of the bit field: it is still an n-bit field but occupies the least significant bits of a differently sized container.

• When a bit field is used in binary operations all the bits in its temporary container are used (which causes results to match those of C).

Defining bit field semantics in this way makes some otherwise obscure mechanics clear. For example, consider when bit fields of different sizes are compared below.

    struct s1 {
           bits short {
               a04:4;
               a12:12;
           } con1;
    } sa;
    sa.con1.a12  =  0xabc;
    sa.con1.a04  =  0xd;
    if ( sa.con1.a04  >  sa.con1.a12 ) {…}
    // expression above evaluates to 'false'

If only the 4 bits of bit field a04 were used, the conditional expression would be true, since 0xd > 0xc. However, the comparison effectively takes place in a 16-bit container, so bit field a12's high bits are also used.
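The same comparison can be mimicked in C by making the zero-extension explicit. This is a minimal sketch, assuming each operand is first copied into the low bits of a 16-bit temporary as the rules above describe; `as_operand` and `a04_greater_than_a12` are hypothetical helpers, not packetC facilities.

```c
#include <stdint.h>

/* Value of an n-bit field when used as an operand: the field's bits sit in
   the least significant n bits of a container-sized temporary, with all
   other bits zero. */
static uint16_t as_operand(uint16_t field_bits, int width)
{
    return (uint16_t)(field_bits & ((1u << width) - 1u));
}

/* The comparison from the example: a04 (4 bits) > a12 (12 bits),
   carried out on two 16-bit temporaries. */
static int a04_greater_than_a12(uint16_t a04, uint16_t a12)
{
    return as_operand(a04, 4) > as_operand(a12, 12);
}
```

With a04 = 0xd and a12 = 0xabc the function returns 0, because all twelve bits of a12 take part in the comparison.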

packetC explicitly states how bit field assignments are made:

• An assignment expression result has the type of the LHS container, even if the LHS bit field cannot store all of that result.

• When bit fields appear on both sides of an assignment operator, given a LHS bit field, lbf, with length L1 and a right-hand-side bit field, rbf, with length L2:

    • If ( L1 <= L2 ) lbf gets the least significant L1 bits of rbf.

    • If ( L1 > L2 ) lbf bits 0:L2-1 = rbf and lbf bits L2:L1-1 = 0.
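The two assignment cases above collapse into a single masking step, which the following C sketch makes concrete. It is an illustration only: `low_mask` and `assign_field` are hypothetical names, and field values are held in `uint16_t` temporaries on the assumption of a 16-bit container.

```c
#include <stdint.h>

/* Mask with the low `width` bits set (1 <= width <= 16). */
static uint16_t low_mask(int width)
{
    return (uint16_t)((1u << width) - 1u);
}

/* New value of an l1-bit LHS field after `lbf = rbf`, where rbf is an
   l2-bit field. If l1 <= l2 the result is the low l1 bits of rbf; if
   l1 > l2 the result is rbf with bits l2..l1-1 cleared. */
static uint16_t assign_field(int l1, uint16_t rbf, int l2)
{
    uint16_t source = (uint16_t)(rbf & low_mask(l2)); /* rbf as an operand */
    return (uint16_t)(source & low_mask(l1));         /* keep what fits    */
}
```

For example, assigning a 12-bit field holding 0xabc to a 4-bit field leaves 0xc, while assigning a 4-bit field holding 0xd to a 12-bit field leaves 0x00d.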

These rules make operations on bit fields clearer than they are in C, at the likely cost of explicit type casts to get the operands of binary operations into containers of the same size. Thus, packetC language rules prevent some of the layout uncertainties that bedevil C implementations and they say more precisely how bit field contents are to be compared and assigned. However, these rules cannot prevent similar problems with endian-specific byte allocation order, which is discussed next.

5. Byte Allocation Order in C

Recall that packet protocol information arrives in byte-by-byte fashion in big-endian order. Thus, a structure holding protocol information will present the data in an intuitive way if its field organization mirrors the order in which the information arrived. Recalling the IPv4 protocol fragment shown in the introduction, an application expects to encounter a byte with the version and header length information before the two bytes with type of service data. Two programming language matters are especially relevant for this: field allocation order and byte allocation order.

Field allocation order is the order in which a structure's declared fields are mapped to consecutive memory addresses. Both C and packetC map the first structure fields declared to the lowest byte address and the last declared to the highest byte address. This matches the expected order of network protocol contents. Thus, field allocation order is not a problem but byte allocation order is.
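Field allocation order can be checked directly in C with `offsetof`. The sketch below is a small illustration with hypothetical names (`struct header_sketch`, `fields_in_declaration_order`); it uses whole-byte fields only, so no bit field layout questions arise.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header with whole-byte fields only (no bit fields). */
struct header_sketch {
    uint8_t  versionAndLength;
    uint8_t  flags;
    uint16_t length;
};

/* Returns 1 when the fields appear at increasing offsets in declaration
   order, which C guarantees for structure members. */
static int fields_in_declaration_order(void)
{
    return offsetof(struct header_sketch, versionAndLength) == 0
        && offsetof(struct header_sketch, flags)
               > offsetof(struct header_sketch, versionAndLength)
        && offsetof(struct header_sketch, length)
               > offsetof(struct header_sketch, flags);
}
```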

C structures do not exhibit the same byte allocation order when the same code is compiled and run on big-endian and little-endian processors. C user operations on structure fields that correspond to whole integer values, like int or short, do not show effects due to host processor endianness. However, operations on values, like bit fields, that can be sub-elements of integer storage units or can straddle such units do exhibit endian-specific characteristics.

To see why this is so, consider the following simplified case. Suppose we try to construct a 4 byte packet sequence by setting a 32-bit integer value, as shown below.

    unsigned int bytes4 = 0xabcdef12, *p = &bytes4;

C compilers for both big and little endian processors treat the leftmost portions of the literal as the most significant bit values and place them in memory accordingly. Observe what happens when this value is mapped to a C structure with the following bit fields:

    typedef struct sTag {
           unsigned int first : 8;
           unsigned int second: 24;
    } sType;
    sType myStruct, *pStruct = (sType*)p;

gcc compilers on big and little-endian processors running Linux (v. 3.3.3 on a Sparc64 and v. 3.4.5 on an i686 platform, respectively) both pack the two bit fields into a single 32-bit int. However, the programs on the two processors output different values for them. (The C program output below shows the most significant half-byte value to the left of the least significant half):

    Big Endian             Little Endian
    first  = 0xab          first  = 0x12
    second = 0xcdef12      second = 0xabcdef

The two processors store the byte sequences as shown below, where the lowest numbered byte addresses appear before higher numbered ones when read from left to right. The big-endian list is shown with big-endian bit allocation order (the most significant half of a byte appears to the left of the least significant one), while the little-endian list shows the least significant byte, and the least significant half of each byte, to the left.

    // Big Endian:     a b | c d | e f | 1 2
    // Little Endian:  2 1 | f e | d c | b a
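The byte sequences above can be observed on any host by copying the integer's storage into a byte array. The following C sketch is illustrative; `bytes_in_memory` and `host_is_big_endian` are hypothetical helper names, and 8-bit bytes are assumed.

```c
#include <stdint.h>
#include <string.h>

/* Copy the four bytes of a 32-bit value out in increasing address order. */
static void bytes_in_memory(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, 4);
}

/* Returns 1 on a big-endian host, 0 on a little-endian host and -1 for
   any other byte order. */
static int host_is_big_endian(void)
{
    unsigned char b[4];

    bytes_in_memory(0xabcdef12u, b);
    if (b[0] == 0xab && b[1] == 0xcd && b[2] == 0xef && b[3] == 0x12)
        return 1;
    if (b[0] == 0x12 && b[1] == 0xef && b[2] == 0xcd && b[3] == 0xab)
        return 0;
    return -1;
}
```

Only on a big-endian host does the byte at the lowest address (0xab) match the first byte that a network sender would have transmitted.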

For this reason, when C network applications use bit fields, they often employ ifdef constructs to define both big-endian and little-endian structure forms of a protocol header. Alternatively, some developers use macros to deliver big and little-endian results, although this solution can be unwieldy, as shown by [5].
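The ifdef technique typically looks something like the C sketch below, which declares the first byte of an IPv4 header twice. This is an illustration rather than code from any particular application: `struct ipv4_first_byte` and `parse_first_byte` are hypothetical names, `__BYTE_ORDER__` and `__ORDER_BIG_ENDIAN__` are macros predefined by GCC and Clang, and the sketch assumes the bit allocation order those compilers use (most significant bit first on big-endian targets, least significant bit first on little-endian ones), which C itself does not guarantee.

```c
#include <string.h>

/* First byte of an IPv4 header as C bit fields. The declaration order is
   selected at compile time. Assumption: bit fields are allocated from the
   most significant bit on big-endian targets and from the least
   significant bit on little-endian ones, as GCC and Clang do. */
struct ipv4_first_byte {
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    unsigned int version      : 4;
    unsigned int headerLength : 4;
#else
    unsigned int headerLength : 4;
    unsigned int version      : 4;
#endif
};

/* Overlay one packet byte on the structure and return it by value. */
static struct ipv4_first_byte parse_first_byte(unsigned char byte)
{
    struct ipv4_first_byte h;

    memset(&h, 0, sizeof h);
    memcpy(&h, &byte, 1);   /* the byte lands at the lowest address */
    return h;
}
```

Every protocol structure that contains bit fields needs the same duplicated declaration, which is the maintenance burden a single imposed byte order avoids.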

Clearly, it is preferable to code one version of an application, rather than two, so packetC chooses one endianness that best matches its overall processing approach, then uses relatively minor compiler adjustments to compensate.

6. packetC Processing Model and Byte Allocation Order

Several distinctive aspects of CloudShield Technologies' model of packet processing shaped the packetC approach to byte allocation order and bit field access. In this model parallelism is at the packet level: multiple copies of a program run in parallel asynchronously. Each program thread or context is triggered when the underlying system has prepared a packet in the form of an unsigned byte array, has located any standard protocols inside the packet and has prepared a Packet Information Block (PIB). The PIB contains detailed information about the presence of various layer protocols, where they are located in the packet, and what their contents are (Figure 2).

On CloudShield products [11] these functions are performed by dedicated hardware and firmware components, though packetC can be implemented on any system, including an ordinary desktop computer, that performs those functions. This approach affects packetC byte allocation design in two ways.

First, since the packet contents appear in network order, packetC structures and unions are required to be in packed, big-endian order to match the packet's organization and facilitate rapidly reading or writing protocol information between the packet array and user structures.

For example, packetC includes a distinctive form of structure, termed a descriptor, that can be dynamically overlain on the packet array in a way that aligns it with a protocol present in that particular packet. As shown below, the descriptor is a structure with an additional location clause that specifies where it begins within the packet array.

  descriptor ipv4Descr {
       bits byte {
           version : 4;
           headerLength: 4;
       }
       short typeOfService;
       …
  } ipv4Header at pib.L4_offset;

Thus, the packet array, descriptors and structures share the same big-endian organization.
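The effect of overlaying a descriptor at a stored offset can be approximated in plain C. The sketch below is an analogy under stated assumptions, not a rendering of packetC's descriptor mechanism or of the real PIB: `struct pib_sketch`, its `ipv4_offset` member, `struct ipv4_start` and `read_ipv4_start` are all hypothetical, and the two 4-bit fields are decoded with shifts and masks rather than with bit field declarations.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the part of the PIB that records where a
   protocol header starts within the packet array. */
struct pib_sketch {
    size_t ipv4_offset;
};

/* Fields decoded from the first byte of an IPv4 header. */
struct ipv4_start {
    uint8_t version;        /* upper 4 bits of the byte */
    uint8_t headerLength;   /* lower 4 bits, counted in 32-bit words */
};

/* Decode the header start at the offset recorded in the PIB stand-in.
   The caller must ensure that the offset lies inside the packet. */
static struct ipv4_start read_ipv4_start(const uint8_t *packet,
                                         const struct pib_sketch *pib)
{
    uint8_t first = packet[pib->ipv4_offset];
    struct ipv4_start s;

    s.version      = (uint8_t)(first >> 4);
    s.headerLength = (uint8_t)(first & 0x0f);
    return s;
}
```

The program never searches for the header: it trusts the offset that the system stored before the program was triggered, which is the division of labor the processing model describes.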


Figure 2. Packet information produced for packetC processing model.

In addition, since the protocols have already been located within the packet, the primary requirement for arbitrary bit-length information is for bit fields in standard protocols to be as predictably organized and located in structures and unions as they are in the packet.

7. Managing packetC Byte Order Mechanics

If the target processor uses little-endian organization, a packetC compiler is obligated to byte swap fields that are entire integers and to adjust code for testing or setting a bit field accordingly. This frees the user from needing to code separate big and little-endian solutions. In practice, the differences in code sequences that a packetC compiler needs to produce for big and little-endian situations are minor for bit field access.
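The byte swapping of whole-integer fields can be illustrated in C with shift-based loads and stores, which give the same result on big and little-endian hosts. This is a sketch of the general technique with hypothetical helper names (`load_be16`, `load_be32`, `store_be32`); it is not a description of the code a packetC compiler actually emits.

```c
#include <stdint.h>

/* Read a 16-bit big-endian (network order) integer from a byte array. */
static uint16_t load_be16(const uint8_t *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

/* Read a 32-bit big-endian integer from a byte array. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write a 32-bit integer to a byte array in big-endian order. */
static void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}
```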

For example, to produce the value of any given packetC bit field, let the total width in bits of its predecessor fields within its container be fore and the total width of subsequent fields be aft. Both values are known at compile-time; one, neither or both can be zero. One approach is to produce the desired value by a pair of shifts on a copy of the container value, using slightly different computations for big and little-endian scenarios.

  // big endian
  value = (container << fore) >> (aft + fore)
  // little endian
  value = (container << aft) >>  (aft + fore)

Consider this packetC structure with a 2-byte container:

    struct   sType {
            bits  short {
                first: 3;
                second:8;
                third: 5;
            } aCon;
    } aStruct;

Figure 3 shows how bit field values are easily isolated on big and little-endian machines using a slightly different pair of shifts. The figure depicts the three fields' bits using boldface, italics or underlining, respectively as in the assignments below:

    aStruct.aCon.first  = 3;
    aStruct.aCon.second = 0xf6;
    aStruct.aCon.third  = 0x19;

Figure 3. Shift code to isolate bit fields on big and little-endian platforms.

The example shows that the algorithmic adjustment needed to access big-endian bit fields on a little-endian machine is trivial. In practice the need to pipe scalars through byte swapping routines, such as htonl, is a more likely source of compiler complication for targeting a packetC compiler to a little-endian platform.
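The two shift pairs given earlier in this section can be written out in C for a 16-bit container. The sketch below is hedged in one important respect: the paper's figure is not reproduced here, so the container values used to exercise the little-endian form assume that fields are packed starting from the least significant bit. The function names `extract_big` and `extract_little` are hypothetical.

```c
#include <stdint.h>

/* Big-endian form: value = (container << fore) >> (aft + fore).
   The cast after the left shift discards bits that leave the 16-bit
   container, because C promotes uint16_t operands to int. */
static uint16_t extract_big(uint16_t container, int fore, int aft)
{
    uint16_t shifted = (uint16_t)(container << fore);
    return (uint16_t)(shifted >> (aft + fore));
}

/* Little-endian form: value = (container << aft) >> (aft + fore).
   Assumption: the container value was packed least significant bit
   first, so `aft` bits lie above the field and `fore` bits below it. */
static uint16_t extract_little(uint16_t container, int fore, int aft)
{
    uint16_t shifted = (uint16_t)(container << aft);
    return (uint16_t)(shifted >> (aft + fore));
}
```

With first = 3, second = 0xf6 and third = 0x19, a container packed most significant bit first holds 0x7ED9, and `extract_big(0x7ED9, 3, 5)` yields 0xf6. Packed least significant bit first, the container holds 0xCFB3, and `extract_little(0xCFB3, 3, 5)` yields the same 0xf6.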

In sum, the packetC approach to bit fields is to define an unambiguous approach to container layouts, to impose a big-endian byte allocation order that matches a ‘packet-centric’ orientation and to require packetC compilers to manage differences on behalf of little endian processors. The next section reviews other approaches to managing bit fields for packet processing.

8. Related Research

Recent research includes efforts to explicitly define bit fields in various C dialects or in new languages, as well as research into recognizing implicit bit field use.

J. Wagner and R. Leupers described an enhanced C dialect and compiler for the Infineon NP [6]. This scheme maps C arrays with the register qualifier to a special register file that supports variable bit-width operands and alignments. Through intrinsics (or compiler-known functions) and bit pointers, the user triggers special instructions on these registers, specifying bit offsets and widths. This replaces the usual C bit field scheme with an equivalent system for operating on the contiguous bit subsets of integer operands.

L. George and M. Blume discuss the NOVA language for the IXP network processor in [7]. The NOVA layout construct describes a given bit field in two forms: packed and unpacked. The packed form, despite some syntactic differences, approximates a C bit field description, although an overlay construct provides additional functionality to define alternative organizations for a given bit range within a layout. The unpacked form accords a word of storage or a nested unpacked form to each field. NOVA provides pack and unpack operations to manage the two forms. Two related layout examples from the paper are shown below.

  layout ipv6_address =
       { a1:32, a2:32, a3:32, a4:32 };

  layout ipv6_header = {
       version:              4,
       priority:             4,
       flow_label:          24,
       payload_length:      16,
       next_header:          8,
       hop_limit:            8,
       src_address:   ipv6_address,
       dst_address:   ipv6_address
  };

A. Inoue, et al. define Valen-C [8], a language intended to support hardware design. Valen-C requires specifying the bit width of operands (and the mantissa and exponent sizes of floating point numbers). The Valen-C compiler uses a machine description to ascertain data path width and machine characteristics. The compiler guarantees that n-bit operands will be implemented with adequate precision, by mapping the operands to sufficiently large memory units or by utilizing multiple instructions to handle subsets of the operands.

R. Gupta, E. Mehofer and Y. Zhang detail an analytic approach to identifying bit sections (subwords) used as operands in programs [9]. First, local analysis identifies the implicit bit field assignments effected by C bit operations (e.g., bitwise AND, OR, NOT). Their graph representation replaces these operations with the equivalent explicit assignments to bit fields. Then, more global analysis of control flow interjects split nodes at the latest point that individual bit sections are needed and combine nodes at the earliest point that individual bit sections can be collapsed into a single operand. The resulting representation can be used for optimized code generation with appropriate instruction sets.

The PL8 language and its antecedent, pl.8, exhibit a concern with unambiguous bit field layouts, as W. Gellerich, et al., demonstrate in [10]. As shown in the code sample below, prefix numbers indicate the nesting level of fields, a BIT type is available, and parenthesized values indicate a field's length.

  DCL 1 ExampleRecord
         2  LongWord BIT (64)
        .2  BitLayout
            3 Flag1   BIT (1)
            3 Flag2   BIT (1)
            3 *       BIT (30)
            3 RegWord BIT (32);

The dot syntax prefix indicates that the associated field (and its constituents at a deeper nesting level) redefines the preceding field. Thus, the redefined field, LongWord, serves to unambiguously specify collected field lengths, somewhat like a packetC container. PL8 is intended as a language for developing firmware for IBM RISC architectures, particularly IO management and error recovery. Thus, precise bit field layout is important to the language developers because they must match OS structures, such as IO status descriptors. It is not clear how PL8 handles byte allocation order.

The packetC language fits into this spectrum of research in at least two ways: by exploring data structures that match network processing needs and by making high-level language constructs exploit specialized network processing hardware. packetC's concern for fine-tuning bit fields as a language construct seems most closely shared by the PL8 and NOVA approaches. Our scheme also reflects hardware support for the application domain, as do the Infineon NP and IXP NP/NOVA efforts described above. As the conclusions below suggest, the packetC language has its own distinctive combination of goals and tentative solutions.

9. Conclusions and Future Research

Three aspects of C-style bit fields and structures were primary concerns for packetC's designers:

• Precise bit field layouts are needed to predictably match packet protocol header layouts.

• It is desirable to avoid having to recode structures when moving packetC source code from a big-endian host to a little-endian one or vice versa.

• Design choices regarding bit fields and structures should fit into a coherent scheme for packet processing.

packetC achieves bit field layout precision by combining an intuitive container concept with a set of unambiguous rules. Advantages of this approach include closeness to existing C syntax and a clear picture of how bit field operations and assignments work.

Basic choices for managing byte allocation order include imposing big or little-endian order, requiring the user to recode when moving from one type of host to the other or hiding the differences behind an interface. Choosing big-endian order as a standard is a natural path for packetC, because a packet arrives and fills up a buffer in big-endian order. If packed structures are mapped to the buffer, then they are in big-endian order. As the packet goes, so goes everything else.

The most distinctive aspect of packetC bit fields is how they fit into an overall scheme for packet processing. packetC is designed to support a distinctive model for packet processing in general and deep packet inspection in particular. Although it should be possible for packetC compilers to target a variety of platforms, the initial target is CloudShield products that are designed to support this processing model [11].

First, as sketched earlier, the model presumes that, before an instance of the user program is executed, the system has already examined the packet, located the protocols and stored their locations for the user. Thus, the packetC emphasis on bit fields is not primarily to help users find protocols: that has already been done by specialized hardware and firmware. Having bit field layouts that match protocol layout in the stored packet array is mainly to facilitate rapid protocol reading, writing and modification. In such a scheme finding protocols is more the preamble than the main text.

Second, the model assumes that deep packet inspection will involve applying a host of specialized functions to the packet payload, as well as to the protocols – searching for patterns, comparing character sequences, storing and searching portions of packets. For these functions it is desirable to have absolute control over the layout of a variety of data structures, including C-style structures with bit fields.

We have implemented a first prototype packetC compiler targeted to our current multiprocessor product. Our likely research directions in the near future include improving the quality of our emitted code (especially for unusual language constructs), fine-tuning new language features and developing compilers for other targets.

Acknowledgments

Peder Jungck's vision of packet processing drove both CloudShield Technologies' product development and the existence of a packetC language. He, Dwight Mulcahy and Ralph Duncan are the co-authors of the packetC language. Professors Rajiv Gupta and Rainer Leupers provided helpful citations and papers. The errors in the paper are solely the authors'.

References

[1] ISO/IEC 9899:1999. Standard for the C programming language. May 2005 version. (‘C99’).

[2] ANSI X3.159-1989. Programming language C. (‘Std. C’).

[3] D. Cohen, On holy wars and a plea for peace. USC/ISI IEN 137. April 1, 1980. Available at http://www.ietf.org/rfc/ien/ien137.txt. Retrieved November 5, 2008.

[4] CloudShield Technologies. packetC Programming Language Specification. Rev. 1.128, October 10, 2008.

[5] B. Gerst, Cleanup bitfield endianess mess. November 3, 2002 posting in Linux-kernel archives, retrieved November 6, 2008 as http://www.ussg.iu.edu/hypermail/linux/kernel/0211.0/0927.html.

[6] J. Wagner and R. Leupers: C compiler design for a network processor. IEEE Trans. On CAD, 20(11): 1-7, 2001.

[7] L. George and M. Blume. Taming the IXP network processor. In Proceedings of the ACM SIGPLAN '03 Conference on Programming Language Design and Implementation, San Diego, California, USA, ACM, pp. 26-37, June 2003.

[8] A. Inoue, H. Tomiyama, E.F. Nurprasetyo and H. Yasuura. A programming language for processor based embedded systems. In Proc. 5th Asia Pacific Conference on Hardware Description Languages, Seoul, Korea, pp. 89-94, July 8-10, 1998.

[9] R. Gupta, E. Mehofer and Y. Zhang. A Representation for bit section based analysis and optimization. In Proceedings of the International Conference on Compiler Construction, LNCS 2304, Grenoble, France, Springer Verlag, pp. 62-77, April 2002.

[10] W. Gellerich, T. Hendel, R. Land, H. Lehmann, M. Mueller, P. H. Oden and H. Penner: The GNU 64-bit PL8 Compiler: toward an open standard environment for firmware development, IBM Journal of Research and Development, 48(3/4), 2004.

[11] CloudShield Technologies. CS-2000 Technical Specifications. Product datasheet available from CloudShield Technologies, 212 Gibraltar Dr., Sunnyvale, CA, USA 94089, 2006.

R. Duncan, P. Jungck, and D. Mulcahy. “Portable Bit fields in packetC.” 2009. Printed by permission.
