CHAPTER 23


Developing Large Applications in packetC

Planning for Large Projects in packetC

Developing large applications in packetC is quite similar to developing large applications in languages such as C++ from the standpoint of code organization and development team collaboration. There are several grammar changes that affect the standards a team would follow, as well as new areas of concern regarding performance, security, and networking that are not often present in other application domains. The single most important aspect of developing large applications in packetC is the planning itself.

In packetC, the notion of a packet module containing a function main, with static linking of library modules and dynamic linking of shared library modules, follows patterns familiar from Linux and Windows application development. Organizing functionality into libraries and designing appropriate APIs for those libraries should follow the same team and programming best practices. In addition, naming conventions conforming to your adopted style guide should be applied to file names and directories as well as to functions. Within packetC, additional areas of documentation and communication need to be addressed when using and designing libraries, namely inlined parameter passing and impacts on the packet. Performance is a critical design factor in packetC, and functions with inlined parameters and code are extremely beneficial for performance; however, they can be problematic if libraries are not designed expecting this usage or if API documents do not properly describe their impact on the parameters or the system. With regard to the system, library modules may have access to the packet (pkt), the packet information block (pib), or system information (sys), any of which can affect the outcome of subsequent processing or even the network results once the packetC application completes for the current packet. Clear coordination on these impacts is therefore critical. Just as one would coordinate access to global memory regions allocated and shared in C or C++, protecting the integrity of global resources like pkt, pib, and sys, and ensuring clear coordination on issues affecting performance, security, or the networking aspects of the processing, are key to packetC development team success.
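
As a brief, hedged illustration (the file and function names below are hypothetical, not part of any shipped library), a library header can document its impact on pib and sys alongside each prototype so that including applications know exactly what a call may change:

//
// flowTools.ph -- hypothetical library header, shown only to illustrate
// documenting system impacts alongside each prototype.
//
// classifyFlow()
//    Reads:  IPv4/TCP header fields from the current packet (pkt)
//    Writes: may change pib.action from its current value when the flow
//            matches an internal block list
//    Note:   intended for use with inlined parameters; callers should not
//            assume their argument copies are untouched after the call.
//
int classifyFlow( int srcAddr, int dstAddr, short srcPort, short dstPort );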

Furthermore, as applications grow to the point where they become multiple applications, or have code that remains resident while portions are reprovisioned through shared libraries, it becomes even more important to design for issues that occur in a network environment but may not occur elsewhere. For example, with shared libraries and reprovisioned applications that dynamically link to them, the notion of data initialization is key. A library maintaining a list of active flows may already exist and be supporting more than the current application at the startup of the newly provisioned application. Shared libraries should therefore provide functions that return a set of status indicators so that a newly provisioned application can ascertain the state of the shared library built up from prior packets. A program that expects all data to be reinitialized when it starts will run into problems when leveraging shared libraries. Libraries also change over time, and packetC developers should plan for versioning of libraries just as in Linux with shared objects or Windows with DLLs. Versioning is critical because the underlying implementation may change the results expected by the calling application. Version numbers should appear in the names of libraries as well as within the source code, using pre-processor controls around prototype files so that run-time and linker controls pick up the change and can warn developers of possible issues. The worst thing that can happen is for a shared library to be updated with nothing in place to make the incompatibility visible to loaders or to the logic of the packetC application that uses it.
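
A minimal sketch of the status-indicator idea, assuming a hypothetical shared flow-tracking library (the names and bit values below are illustrative only):

// Status indicators a shared library might return so that a newly provisioned
// application can discover state built up from packets that arrived earlier.
const int FLOWLIB_HAS_ACTIVE_FLOWS = 0x01;   // rows already exist in the shared flow database
const int FLOWLIB_TABLE_NEAR_FULL  = 0x02;   // inserts may soon fail with ERR_DB_FULL
const int FLOWLIB_VERSION_2_2      = 0x100;  // implementation level, for compatibility checks

// Called at application startup instead of assuming that all shared data was
// reinitialized when this application was provisioned.
int flowLibStatus();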

Last, when developing large programs, resources will become a factor. Resources in the simplest form are available memory and processor budget, although the picture becomes more complex when threads and the number of applications are involved. Resource management matters in two dimensions: first within your application, and then in bounding your application to live with others. Living within the resources your target platform provides, shared among the components of your application, can itself be a tough chore. Because most packetC code consumes processor budget based upon the traffic mix being seen (e.g., network signaling and login packets often need more processing than media or data packets), knowing the deployment environments is often as critical as knowing how much processing the application consumes for a given packet. Additionally, different target systems will vary in performance and available resources, as well as in whether you are leveraging multiple processors or blades to execute your application. When designing for a performance target, knowing the amount of processing resources available may be significant, as it may involve designing functionality to synchronize state among blades. At the other end of the spectrum, requirements may include capping resource usage so that the application can be loaded alongside other applications in a single processor while maintaining certain performance metrics. In this case, modeling your application is important not only within your team, but for administrators and users of your finished application, to assure it meets expectations in a shared environment. Imagine ensuring the response time and processor utilization goals of a complex video processing application in Windows when you don’t control what else is running; the same scenario can occur with packetC deployment environments.

The concepts suggested above are not foreign to traditional application development teams, although they take on several new nuances within a real-time packet processing environment where dynamic change is constantly in play. When moving on from building small applications that perform a single functional role in the network to engaging in a large application, either alone or in a team, consider stepping up to a new way of designing the application from the outset.

Things to Consider in Large Application Development

The previous section discussed some of the issues of a large application environment when designing and developing your application. This section provides some tips and guidelines to consider when building your application; they are beneficial to even the smallest application.

Follow a Common Style

Developers often believe that they will be the only ones reading their code and that they will always remember why they wrote it and what they did. With the Internet, code snippets appear everywhere, from a comment on a blog to a posting of an application as a new open source project. As time goes by, the reasons why a function was written, or what the genesis of an application was, often become muddled even to the original developer. As small outcroppings of functions turn into library modules, these functions take on a life of their own, read electronically for inclusion into programs far from the review of their developers. As such, following a consistent style in developing code is important from day one. Consistent naming conventions for variables that help discern their data type, and library function names that start with a common prefix so that multiple libraries don’t collide, are small examples of important style attributes in a large program. Chapter 3 in this book provides a suggested style guide for packetC programs.
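
As a small example of the kind of conventions that pay off, the flow-tracking program later in this chapter marks globals with a trailing underscore; adding a common prefix to a library’s functions (the prefix flowLib below is hypothetical) keeps multiple libraries from colliding:

// Globals carry a trailing underscore so their scope is obvious at the point of use.
int totalPacketCount_ = 0;

// Every function a library exports shares one prefix, so two libraries linked
// into the same application cannot collide on common names like lookup or reset.
int  flowLibLookup( int srcAddr, int dstAddr );
void flowLibReset();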

Plan Out Modularity in Your Programs

Whether you choose to develop static or shared libraries, or simply segregate related functionality into include files, developing an architecture for the organization of code beyond a single huge application source file is critical. Study what modules are available from the community, what is provided with software development kits, and what components of your application might be re-usable in other applications you build or that others would need, and segregate them early in the project. Often, early construction of a series of libraries within an application is skipped because it is determined that the solution won’t grow large enough to require them. All too often, details appear during development and functionality grows, introducing a new element of work: carving up code and renaming functions as they are removed from a packet module and placed into other libraries. Plan early for functionality to be spread across libraries; it will save that re-work as complexity grows and ensure that functionality is positioned for reuse in the future.

Set Up the Production Environment Early

All too often, a project appears to be a one-time development effort, and the team that starts it expects to be the one that finishes it. Due to numerous circumstances, good and bad, this often doesn’t describe the life of most large applications. Furthermore, when estimating how much time it will take to develop an application, we fail to allow for the unexpected, such as an operating system crash on a development system. Getting into the habit of using version control systems on a machine separate from those where code is being developed ensures not only that backups are performed regularly, but that the project can be developed by more than one individual and easily re-created on a new system, just as is true of any large programming project in any programming language.

Three key aspects come into play when developing in a team and building an environment for a large project. These are centralized team code, shared design and build information, and code version compatibility.

The first step is to ensure that there is a stable repository where code is checked in and a build environment that can be re-created separately from the client systems where you are developing your portion of the code. This helps ensure that the code that is written can be rebuilt, which may be as simple as ensuring that all the files are really on the network and not hidden in an include file in a directory that was never checked in. Moreover, addressing build-production releases without version control and snapshots established early in a project can become cumbersome and time-consuming at the point in a project where time is most precious.

Once you have code in a common area, it is important to share any build-related files, so that parameter settings in a graphical development environment, project files, and make files are documented and placed with the source code and all team members build the project in a consistent fashion. Design documentation should likewise be shared, version-controlled, and updated in the common repositories.

Last, functionality will change in a large application, and APIs often drift from their original design specifications no matter how diligently team members try to keep design documents in sync. As such, adopting a method of ensuring compatibility of libraries and functions is critical. Often this can be accomplished through pre-processor directives that identify the version of functions and the expectations of a function call. Much as pre-processor directives guard against include files being included more than once, they can be used to detect incompatibilities at compile time. For example, if the main application includes libPacketFlow and its API was reviewed at version 2.2 by the including application’s developer, simple protections like a #define libPacketFlow_VER_2_2 in the including application and an #ifdef libPacketFlow_VER_2_2 in the provided include files can protect compatibility, provided the developer of libPacketFlow renames the guard every time the designed functional roles or function prototypes change, as sketched below. The key is to build electronic mechanisms of ensuring coordination rather than relying on casual tracking of emails and verbal comments when designs change.
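
A minimal sketch of that arrangement follows; the library name, macro, and prototype are illustrative only:

// ---- In the including application, before pulling in the library header ----
#define libPacketFlow_VER_2_2            // the API level this application was written against
#include "libPacketFlow.ph"

// ---- Inside libPacketFlow.ph, maintained by the library developer ----
#ifdef libPacketFlow_VER_2_2
// Prototypes for the 2.2 functional roles. When the library moves to 2.3 and
// changes behavior, this guard is renamed; applications still defining the old
// macro lose these prototypes and fail to build, surfacing the incompatibility
// at compile time rather than in the network.
int packetFlowMatch( int srcAddr, int dstAddr, short srcPort, short dstPort );
#endif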

Leverage Include Files Well

If it isn’t obvious yet, designing large applications without breaking them into smaller, easily managed chunks is a disaster waiting to happen. Using include files to break up functionality is key to success, but it can also be the best way to make code impossible to understand. Include files are great because a single line in the packet module can pull in a large mass of functionality while keeping the application’s logic understandable. At the same time, poor naming conventions or layers of includes buried within included files will make tracking down bugs, finding the source of a broken function, or even tracing the flow of an application impossible. Include file names need to be representative of what is included, and nested includes should be avoided whenever possible. If a developer performing a code audit, or even you a year from now, cannot tell where to look for a function called in main simply by reading the set of #includes at the top of the packet module, something is wrong. Furthermore, if the function is not visible in that include file, or lacks information specific enough that a search of the include file would find the reference, tracking it down in the future will be next to impossible. Plan the use of include files well, and avoid their misuse; a sketch of the goal follows.
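
The aim is for the top of the packet module to read like a table of contents. A sketch with hypothetical file names, following the include style of the example later in this chapter:

packet module trafficMonitor;            // hypothetical application name

// Each include names exactly what it provides, so a reader of main() can go
// straight to the right file without chasing nested includes.
#include <cloudshield.ph>
#include "protocols.ph"
#include "flowTracking.ph"               // flow table, counters, and match logic
#include "controlPlaneExport.ph"         // pragmas exporting metadata to the control plane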

Be Careful, Be Clear, and Be Code

In packetC, performance is a critical aspect of any program. As such, there is a great tendency to avoid run-time performance costs for things that could be done ahead of time. Simple examples include variable initialization, where const is pre-pended to a declaration to ensure it is initialized at compile and load time as opposed to run time (for example, at the entrance to a function), and the inline operator applied to parameters in a function call so that the function is in-lined rather than a call stack being generated. In other areas, global variables are used to track data sets and set flags for communication with a separate control-plane processor providing non-real-time assistance in the concurrent and post-processing of data collected by the system. When considering whether to compute a large algorithm in the data plane or just pass along the parameters so that the algorithm’s result can be produced for an operator’s console, handing off the processing may be a good idea. When this turns into large sets of code substitution built from cleverly written pre-processor macros and a litany of #defines that the best of coders would never understand on a good day, something has gone haywire. As a developer of an application, it is critical that code be carefully conceived, clear about what is and is not being done within the real-time packetC code, and most of all truly be code. The magical benefits of the pre-processor can also be the bane of your existence when trying to repair code in the future or to debug complex interactions that mysteriously change from one compiled release to the next. Fear the macro, trust in code, and when looking for optimizations, think of what you can avoid doing in real time rather than how to outsmart the system, or you will in the end cost the team precious time.
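
A small example of the first point, using only constructs that appear in the program later in this chapter: the const declarations cost nothing per packet, whereas assigning the same values inside main would spend processor budget every time a packet arrives.

// Initialized at compile and load time; no per-packet cost.
const byte TCP_PROTOCOL    = 6;
const int  FLOW_TABLE_SIZE = 100;

// By contrast, an assignment placed at the top of main() would be executed
// again for every packet the application processes.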

It’s All About Data-Driven Code—Follow the Flow

In traditional programming environments, the application is generally in charge of its world. The application starts up, initializes all data, opens user or electronic interfaces to begin accepting work, and controls how it gets the work done. Generally a request is analyzed in detail to learn what is being asked before processing starts on the best way to accomplish the goal. Unless the applications you have written in traditional computing environments have been interrupt service routines (ISRs), the world of packetC may seem a bit different. Processor budgets in real-time systems don’t allow one to learn all there is to know about a packet upon arrival before beginning work; furthermore, a packet is rarely a whole transaction or the entire request being awaited. In addition, a network like the Internet is a fluid beast where processing never stops to wait for your application to be inserted into the network before requests start over and begin to flow. As such, the state machine that is the logic of your application, and the conditions placed upon it as the program begins, must not only start beneficial processing but also get a sense of the current state of the network; this is key to the success of the application. If a packetC application, especially a large one, isn’t modeled effectively from a flow-oriented point of view, with a review of the state transitions not only within the code but also in the out-of-order presentation of data arriving via packets, corner cases will abound and debugging the last defect will be rough. It is important in this environment to model the flow of an application and track the key use cases and test cases for how they would flow and affect the creation and state of the metadata that transforms knowledge from packets into flows, transactions, and eventually an application scenario of interest. This may feel like CS100 concepts long abandoned for other methods of object modeling in applications. However, in the world of flow-oriented code, where data drives the results and code simply tries to glean from the data what to do, the flow is all you have. Organize it into a common theme you can analyze as a team.
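
One way to keep that model concrete, sketched here with hypothetical state values, is to enumerate the per-flow states up front and treat every packet as a possible transition, including transitions for flows the application never saw open:

// Hypothetical per-flow states for modeling the TCP handshake as a small state machine.
const byte FLOW_UNKNOWN     = 0;   // traffic for a flow established before this application was provisioned
const byte FLOW_SYN_SEEN    = 1;   // SYN observed, awaiting completion of the handshake
const byte FLOW_ESTABLISHED = 2;
const byte FLOW_CLOSING     = 3;   // FIN or RST observed

Stored as a field in each flow’s record, such a value gives the team one shared picture of the transitions against which use cases and test cases can be walked.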

Programs Large and Small—Plan Appropriately

In traditional CPU-based programming, programs often become large and complex. Since all of the capabilities for writing large programs exist in packetC, the tendency can be to presume that packetC applications will grow to be equally large. Throughout this book, performance is discussed often, and the separation of control plane and data plane and their processing roles is presented. At all times, it is important to remember that real-time data plane processing should focus only on what is really required in the data plane. If you can calculate information up front, do so. If you can grab some data and pass it along to a control-plane processor, let it perform the parsing and massaging of the data. If at all possible, keep applications small and to the point. The following is a simple yet effective application that maintains a view into the total number of active flows being processed by the system. The counters that are collected are exported to the control plane using pragmas, and the tables are simple and succinct, tracking only the relevant information. Consider this example as a view into a real-world program that can rapidly be extended into something more complex, yet at the same time represents something simple and elegant comprising a complete application.

What becomes important, and is often overlooked until too late in the project, is the dynamic role of control-plane and management applications and their near real-time involvement with the packetC application. While simple tasks like crafting a data file for an access control list from a graphical user interface, or basic graphing and reporting of events and activity, may already have been conceptualized, interactive solutions can have great benefit. Consider a user-based billing and control system focused on providing customers with traffic management features based on time, such as prioritizing gaming traffic in the evenings or when business traffic volume is low. Instead of having the packetC application evaluate the business traffic volume and apply per-user performance boosts, why not use packetC simply to provide traffic metadata to a near real-time system that can make its analysis and then provide dynamic updates to packetC database tables? In this manner, the packetC application can focus on real-time tasks such as collecting metadata or providing additive services on a per-session basis, while the adjacent management system provides continuous trend analysis and updates, potentially even at 15-second intervals or less. This simple concept of balancing the throughput-oriented and computational requirements between packetC and more traditional computing environments often has two major benefits: dramatically simpler packetC code, yielding easier debugging and performance benefits, and often a much more linearly scalable modular architecture.

///////////////////////////////////////////////////////////////////////////////////////////
//
//    Program: TCP Flow Tracking Example
//
//    Revision: 1.0 - January 20, 2009
//
//     Author: Tim King
//
//     Description: This program maintains counters on the number of
//                  Active & Total number of TCP Flows.
//
//                          Duplicate SYNs are recorded but not mitigated against.
//
///////////////////////////////////////////////////////////////////////////////////////////

packet  module  tcpFlows;

#include <cloudshield.ph>
#include "protocols.ph"

//
// Constants
//
const byte TCP_PROTOCOL    = 6;
const int  FLOW_TABLE_SIZE = 100;

//
// Global variables
//
int totalPacketCount_ = 0;
%pragma control totalPacketCount_ (export);

int totalFlowCount_ = 0;
%pragma control totalFlowCount_ (export);

int activeFlowCount_ = 0;
%pragma control activeFlowCount_ (export);

int duplicateSYNCount_ = 0;
%pragma control duplicateSYNCount_ (export);

int insertDBFull_ = 0;
%pragma control insertDBFull_ (export);

// declare record structure
struct FlowStruct
   {
       int    srcAddr;
       int    dstAddr;
       short  srcPort;
       short  dstPort;
       byte   protocol;
   };

// declare Flow database
database FlowStruct flowTable[FLOW_TABLE_SIZE];

// *********************************************
//                         MAIN
// *********************************************
void main($PACKET pkt, $PIB pib, $SYS sys)
{
   //local variables
   record FlowStruct insertRecord =
       {
          {0.0.0.0, 0.0.0.0, 0, 0, 0},
          {255.255.255.255, 255.255.255.255, 0xFFFF, 0xFFFF, 0x0}
       };

   FlowStruct pktData;
  
   int flowTableRow;

   // Increment packet counter
   ++totalPacketCount_;

   // Set the default action to be forward the packet
   pib.action = FORWARD_PACKET;

   if ( (pib.l3Type == 1) &&
        pib.flags.l3CheckSumValid &&
        pib.flags.l4CheckSumValid &&
        (ipv4.protocol == TCP_PROTOCOL)){

       pktData.srcAddr  = ipv4.sourceAddress;
       pktData.dstAddr  = ipv4.destinationAddress;
       pktData.srcPort  = tcp.sourcePort;
       pktData.dstPort  = tcp.destinationPort;
       pktData.protocol = ipv4.protocol;

       try {
          flowTableRow = flowTable.match( pktData );
          if ( tcp.flags & 0x01 ){
              flowTable[flowTableRow].delete();
              activeFlowCount_--;                       // Decrement Active Flow Counter.
          }
          else if ( tcp.flags & 0x02 ){
                     duplicateSYNCount_++;              // Duplicate SYN observed.
          };
       }
       catch (ERR_DB_NOMATCH){
          if ( tcp.flags & 0x02 ){                      // Check to see if this is a SYN
              try {
                 insertRecord.data = pktData;           // Insert record into database.
                 flowTable.insert( insertRecord );
                 activeFlowCount_++;                    // Increment Active Flow Counter.
                 totalFlowCount_++;                     // Increment the Total Flow Counter.
              }
              catch( ERR_DB_FULL){
                 insertDBFull_++;
              };
          };    // end if
       };    // end catch

   }    // end if
}    // end main

Following on the discussion of utilizing control-plane resources, consider the application shown above as a simple example of watching traffic flows to provide a view into communications. The metadata stored within the database tables can be queried and viewed by a control-plane system, along with the global variables that provide a sense of the load level.

From a developer’s point of view, the application above appears overly simple, yet following the earlier suggestion of breaking up the workload, this logic may be all that is required, since dynamic APIs give the control plane access for longer-term operations. An example is flow time-out, where flows left in a database table may sit stale for minutes before they can be considered for removal due to lack of further communications. While the packetC application can perform this operation, so can the control plane, and whether a flow is cleansed at minute three or fifteen seconds later will generally have little impact on resources; removing garbage collection from the data plane, however, can be significant, as sketched below.
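
As a hedged sketch building on the example above (the added field is hypothetical), each flow row could record the value of totalPacketCount_ when it was last touched; the control plane can then age out rows that have fallen far behind, keeping garbage collection entirely out of the data plane:

// Hypothetical extension of FlowStruct: note when the flow was last seen so the
// control plane, not the data path, decides when it has gone stale.
struct AgedFlowStruct
   {
       int    srcAddr;
       int    dstAddr;
       short  srcPort;
       short  dstPort;
       byte   protocol;
       int    lastSeenPacket;   // copy of totalPacketCount_ at the most recent update
   };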

Additionally, as an architect, consider the times when it is most appropriate to break up a large application into multiple smaller applications. For example, the tracking of metadata can be deployed on a processor operating passively in the network, so that traffic volumes exceeding the rate of the processor may fail to be processed but will not degrade network traffic performance. If the solution requires both metadata production and active inline controls, such as the example of providing acceleration services for gaming, two small applications may work better as a design. In this broader example, the application above can passively produce metadata for an out-of-band system that may be watching multiple systems across the network. When the time comes to provide the improved network service, another simple packetC application may be deployed actively in-line with all network traffic, redirecting the gaming traffic onto an MPLS circuit that provides better peering and performance for the gaming provider. Both packetC applications remain small and simple, yet a broad set of computational requirements can actively play a role in dynamically changing the operation of the real-time system. The key point is to design the out-of-band components alongside and during the development of the data-plane architecture, so that work which isn’t truly real-time is done outside the data plane whenever possible to achieve maximum performance. Don’t fall into the large-application lure of packetC because of its familiarity and its ability to incorporate vast C libraries.
