“It’s the economy, stupid!” was the phrase that drove strategy in Bill Clinton’s 1992 presidential campaign. “It’s about the business!” is the phrase that determines design choices for event-processing applications. Most of the benefits of implementing event-processing applications come from transforming, or significantly improving, the business.
In this chapter we apply the concepts and ideas of the previous chapters to propose a collection of best practices in event processing. We also draw upon our experiences working with different companies in different domains. The chapter is organized in the following main parts:
Starting out —Points to keep in mind when starting to develop an event-processing application for the business.
Pilot project —How to select a pilot project for an event-processing application.
Best practices —Best practices to set up in your business for designing and implementing event-processing applications.
The following points are self-evident but should be kept in mind when building applications for an enterprise that has not, as yet, acquired software tools that are explicitly labeled “event processing,” “EDA,” or “CEP”:
The enterprise is already event-driven—it has event-driven business processes even if IT applications explicitly called “event processing” don’t exist in the enterprise. The organization may not, however, have systematically evaluated the types of valuable event-driven business processes that it can implement today or that it will be able to implement in the near future.
Most IT applications in the enterprise are, and will remain, hybrid systems with time-, request-, and event-driven components. Event-processing applications are built using mature IT technologies that already exist in the enterprise and (possibly) new components. Event processing adds value to earlier investments in IT technologies.
Some enterprises have developed event-processing components, including CEP components, for their businesses. For example, defense departments and space organizations have immense expertise in event processing in their domains. Existing components and expertise within the enterprise should be used in developing new applications.
The benefits and costs of event-processing technologies are directly related to the business. Justifications of conventional IT projects don’t emphasize benefits such as situation awareness, event detection, and accurate response, which are central to event processing (see Chapter 4).
Next, we discuss the following steps in developing event-processing applications:
1. Identify applications suitable for event processing.
2. Identify user communities.
3. Identify scenarios and responses.
4. Identify data sources.
5. Identify events and data transformations.
6. Estimate costs and benefits and plan for the future.
Two early steps that help introduce applications that are explicitly labeled CEP, EDA, or event processing are to identify the applications in your enterprise that are most likely to benefit from the addition of event-driven components, and to identify the different communities of users in your enterprise who will benefit from the addition of event-driven functionality.
A first step is to determine whether a new business application or functionality can benefit from event-processing technologies (see Chapter 5). Here we summarize the main points. As you saw in Chapter 1, the system drivers for event-processing applications are timeliness, agility, and information availability. The technology-push and consumer-pull drivers for event-driven interactions are summarized in Chapter 2 by the PC-cubed (price, pervasiveness, performance, connectedness, celerity, and complexity) trends; these trends influence the role that EDA will play in the future. The informal expectations that people have about event-driven interactions are different from those for time- and request-driven interactions. The expectations that businesspeople have about components of a proposed application indicate whether time-, request-, or event-driven processing is appropriate for those components. The presence of several A-E-I-O-U features, described in Chapter 5, in a business domain indicates that EDA is suitable for that domain. Chapter 5 also shows how the framework is applied to evaluate the suitability of EDA for a variety of business domains. The effort that goes into this initial screening of a business problem will yield returns in later steps.
The value proposition of EDA is a business value proposition. EDA applications have a visible impact on people in lines of business. Even applications in which the entire process, from data acquisition to response, is automatic have a tangible impact on the business. A place to start identifying business domains that benefit from event processing is with the businesspeople who will be most affected.
People in different roles are affected by the addition of event-driven functionality. Some roles benefit in the short term and others over the longer term. Consider, for example, the installation of smart electric meters as the electric grid is upgraded to the smart grid. Smart meters provide an immediate, visible benefit to the metering and billing part of the business because smart meters communicate to computers in the utility, obviating the need for meter-readers to travel to customer sites. Smart meters also enable new applications that provide greater benefits over the longer term. Two-way communication between meters and the utility enables the utility to control appliances in the home, and this allows customers and the utility to jointly reduce demand when the system is about to get overloaded. Responsive demand allows the utility to build fewer power-generation plants and transmission lines and reduces carbon emissions. These savings are much higher than the savings from fewer meter-readers; however, these savings accrue over the longer term. The grid is getting “smarter” in stages, with different groups of people benefiting at different points in its evolution. Similarly, EDA functionality installed at trading desks provides immediate benefits to traders, whereas functionality added later also benefits other groups of users such as corporate risk managers.
Note: Multiple groups of people will be affected by the addition of event-driven functionality. Some groups benefit in the short term while others benefit over the longer term.
EDA applications can be transformational and provide significant benefits to many different constituencies in the enterprise. When a business unit identifies the benefits of event processing, the benefits become apparent to related business units. For example, farmers put National Animal Identification System (NAIS) RFID tags on their livestock to help officials track infected animals; but farmers use the same tags to improve management of their animals. An application that uses accelerometers in buildings to identify areas damaged by earthquakes can also be used to study building dynamics and weaknesses in welds. Components such as sensors used in an application for one business unit can be used for a different kind of application for a different unit. Managing staged rollout of integrated enterprise-wide EDA applications that deliver tangible benefits to different groups at each stage is a business, technical, and project-management challenge.
Many event-driven business processes interact with people outside the business. (The “O” in the A-E-I-O-U list of features stands for “outside.”) The smart grid has different types of user groups, including rate payers, the metering and billing organization in the utility, the transmission and distribution organization, power-generation companies, and the Independent Systems Operator (ISO) that coordinates operations on the grid. Some user groups, such as the metering and transmission organizations, are within the utility, and other user groups, such as rate payers, the ISO, and power-generation organizations, are outside the utility. Early steps in developing an event-driven application are to identify the user communities within and outside the enterprise and then estimate the different measures of business benefits that each of these groups derives from the application.
The complexity and the uncertainty of design parameters and cost/benefit measures depend, in part, on the type of application. Likewise, the business case for an event-processing application depends on its characteristics. Next, let’s look at two characteristics of applications and their impact on business cases and cost/benefit measures:
Is the application a new (“greenfield”) application or an improvement to an existing application?
Does the application respond continuously to events or does it respond only to rare, but critical, events?
The strategy for developing event-processing functionality depends on whether it improves existing processes or enables entirely new processes. Technologies such as RFID (radio frequency identification) can transform business processes but can be understood, nevertheless, within the framework of more familiar devices such as barcode readers. The benefits and costs of applications that improve existing business processes are clearer than for applications that enable totally new processes. For example, the return on investment (ROI) from using RFID bands on patients in hospitals can be estimated with greater certainty than the ROI from EDA technologies to manage wind energy.
Many business applications, such as the smart grid, have both types of features: they improve existing processes and enable new solutions. The smart grid improves existing metering and billing processes by automatically transmitting meter readings to the utility. Some features of the smart grid are totally new because technologies—such as distributed energy resources from wind—are being deployed on scales never seen before or because new technologies, such as phasor measurement instruments, are becoming available as commercial off-the-shelf (COTS) devices.
The grid that has operated with truly remarkable success for a century has to become “smarter” and more event-driven to deal with the issues of large amounts of transient “green” power, reduced standby capacity from fossil fuel generators, and increasing peak demand. As reported by Reuters (see Appendix A), system operators curtailed power to interruptible customers by over 1,000 megawatts within 10 minutes when wind dropped dramatically in Texas in February 2008; the operators sensed a change—a reduction in wind power—and responded in an appropriate, timely fashion.
The ROI of smart meters, purely from a billing perspective, can be estimated from records of costs before and after traditional meters are replaced by smart meters in target areas. Predicting ROI from making the grid smarter to deal with distributed resources such as wind and solar energy is more difficult; however, these resources cannot be exploited without adding “smartness” and more event-processing capability to the grid.
An advantage of using event processing to improve existing functionality, as opposed to developing totally new business processes, is that changes in business processes engendered by incremental additions of functionality are less radical than those engendered by completely new functionality. This allows application developers to expend more effort on technology and less on the difficult problem of managing change to the business. On the other hand, an advantage of using event processing to create totally new business applications is that doing so demonstrates the transformative power of event processing, and the benefits are huge. A good practice is to demonstrate ROI from event-processing technologies by first taking less-risky, more-technology-oriented steps to improve existing functionality, and then taking more-difficult, business-oriented steps to develop new transformative processes.
Note: You can demonstrate ROI from event-processing technologies in stages by first taking technology-focused steps to improve existing functionality and then developing transformative event-driven business processes.
RFID applications—whether for tracking patients in hospitals, pallets in warehouses, packages in transshipment points, or other objects in different tracking applications—respond to events continuously. By contrast, applications that warn about shaking during earthquakes respond rarely. Some applications combine routine responses to frequent, customary events with critically important responses to rare, unusual situations. For instance, smart electric grids upload energy consumption recorded in smart meters on a routine basis and also help manage rare brownout situations.
Applications that provide the greatest benefit by responding to critical rare events can be designed to also provide value by detecting, recording, and exploiting information in frequent, routine events. Routine events are used to develop models of the enterprise and its environment. All of us observe and record (subconsciously) events in dealing with our families and colleagues, and we use the recorded events to build informal models of them; then we use these models to predict (informally) how they will behave in different situations. Enterprises use business intelligence to build models from logs of recorded events. Scientific instruments that provide immense value when they detect critical rare events also provide value from continuous measurement. The Large Hadron Collider in Switzerland will provide colossal value when (and if) it is used to detect the Higgs boson particle; however, the instrument also provides great value from its continuous ongoing measurements. Seismological networks provide the most value in identifying regions of the greatest shaking after rare, severe earthquakes; however, commonplace events, continuously recorded by the networks, are the raw materials from which seismological models are built. Airplane cockpits display continuous measurements and also issue alarms. Business activity monitoring (BAM) dashboards provide value when they display key performance indicators (KPIs) that indicate situations that require response; however, dashboards also deliver benefits on a continuing basis because they provide evidence that nothing that requires immediate action has occurred.
Applications that are required to respond to routine events can be extended, often with little additional cost, to also respond to rare but important events. The initial specification of the application may have restricted attention to routine operations; however, a little creative analysis may show how the same application can be used to respond to exceptional threats and opportunities. A baggage-handling application that responds to the routine events of baggage movement can be extended to intercept packages containing contraband or dangerous material in seconds. Advanced metering infrastructures for electricity and water that routinely detect and record the resources consumed can be used to detect unusual situations. Demonstrating ROI for an application that responds both to routine and rare events is easier than demonstrating ROI for an application that responds only to rare events; likewise, the ROI for an application that responds to routine events can be increased by extending it to deal with massive, but rare, threats and opportunities.
Tip: Extend applications required to respond to rare, critical events to also respond to frequent, routine events. Likewise, extend applications required to respond to routine events to detect and respond to rare events that may occur.
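To make this concrete, here is a minimal sketch, in Python, of a single handler that extracts routine value from every event while also watching for a rare, critical pattern in the same stream. The event fields, thresholds, and alarm response are illustrative assumptions, not a prescribed design.

```python
from dataclasses import dataclass

@dataclass
class MeterReading:
    meter_id: str
    kilowatt_hours: float  # consumption since the last reading

class MeteringHandler:
    """Processes routine readings and also watches for a rare, critical pattern."""

    SURGE_THRESHOLD = 100.0  # illustrative: a surge suggests brownout risk

    def __init__(self):
        self.totals = {}        # routine value: per-meter consumption totals
        self.recent_surges = 0  # rare-event detection state (deliberately simplistic)

    def handle(self, reading: MeterReading):
        # Routine path: every event updates billing and model-building state.
        self.totals[reading.meter_id] = (
            self.totals.get(reading.meter_id, 0.0) + reading.kilowatt_hours
        )
        # Rare path: the same event stream is scanned for a critical pattern.
        if reading.kilowatt_hours > self.SURGE_THRESHOLD:
            self.recent_surges += 1
            if self.recent_surges >= 3:  # several surges => possible brownout risk
                self.raise_alarm()
                self.recent_surges = 0

    def raise_alarm(self):
        print("ALERT: demand surge across meters; start load-shedding review")

handler = MeteringHandler()
for kwh in [5.0, 120.0, 130.0, 140.0]:
    handler.handle(MeterReading("meter-42", kwh))
```

The point of the sketch is that the routine path and the rare path share one event stream, so the marginal cost of adding the second path is small.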
After identifying the application domain, continue the emphasis on users by identifying the scenarios and responses that are most valuable to them. The practice of identifying use cases and scenarios is especially important when developing event-processing applications because many scenarios are driven by natural agents or adversaries over which the enterprise has no control. Scenarios for intrusion detection describe different types of intruders and their strategies; scenarios for smart homes describe the many situations in which older people may need help; and scenarios for the smart grid specify a variety of natural and manmade conditions. The scenarios describe what nature may do (windstorms, earthquakes), what hostile people may do (attempt to manipulate your customers’ bank accounts and appliances), and what systems may do inadvertently.
In some cases clients are unwilling to describe scenarios in any detail. Traders in energy, stock, or foreign exchange may refuse to describe the scenarios they care about, the responses they wish to take, the events they want to identify, and even the data sources they want to monitor. In these cases, designers work with generic or “sanitized” scenarios, and this is often difficult, especially when debugging applications that cannot be revealed entirely.
Responses are also events, and responses are intertwined with other steps in event-processing applications. A system that secures an area from intruders must initiate a response when a possible intruder is detected; however, the application’s function doesn’t end with the initiation of the response—the application continues to respond to events such as movements of the intruder, movements of security personnel, and alarms being turned on. Applications that alert traders about opportunities also respond to actions that the trader takes in response to the alert. Since the event-processing applications are so tightly interwoven with business problems (detecting intruders and exploiting trading opportunities), businesspeople have to put in a lot of effort to identify and describe scenarios.
Note: The effort that goes into specifying scenarios is primarily for describing the business aspects of the scenarios and only secondarily for describing the IT aspects.
A common response of EDA applications is to inform people about events by updating BAM dashboards; sending alerts by e-mail, instant messaging, phone calls, or audible alarms; and providing tools, such as maps, for dealing with the event. For example, an accounting application responds to deviations of actual and planned expenditures by sending alert messages that contain locations in data cubes that help resolve the problems. When radioactive material is moved from a safe to an unsafe location in a hospital, an application responds by generating audible alarms and sending alert messages containing information about the situation. Spend time to identify the scenarios that describe how people want to be alerted, the devices that they want used, the situations under which they want to be alerted, and the times at which they want to be interrupted.
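A sketch of such preference-based routing, assuming a hypothetical preference schema with interruption windows and per-topic subscriptions, might look like this:

```python
from datetime import datetime, time
from typing import Optional

# Hypothetical preference record: interruption windows per channel and the
# topics the user has subscribed to. The schema is an assumption.
PREFERENCES = {
    "trader_pat": {
        "channels": [(time(8, 0), time(18, 0), "instant_message"),
                     (time(18, 0), time(23, 59), "email")],
        "topics": {"price_spike", "order_fill"},
    },
}

def route_alert(user: str, topic: str, message: str,
                now: Optional[datetime] = None) -> None:
    """Deliver an alert on the channel the user prefers at the current time."""
    now = now or datetime.now()
    prefs = PREFERENCES.get(user)
    if prefs is None or topic not in prefs["topics"]:
        return  # the user did not ask to be interrupted for this situation
    for start, end, channel in prefs["channels"]:
        if start <= now.time() <= end:
            print(f"[{channel}] to {user}: {message}")
            return
    print(f"[queued] for {user}: {message}")  # outside all interruption windows

# Illustrative timestamp; in production `now` would default to the clock.
route_alert("trader_pat", "price_spike", "EURUSD moved 0.8% in 2 minutes",
            now=datetime(2024, 1, 5, 9, 30))
```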
Many people use event-driven processing to monitor their social networks. Twitter is an example of a “social activity monitoring” application that displays information about people that a user is interested in (or to use the vernacular, the people that the user is “following”). The number of participants in social networks has increased dramatically in the last 5 years, and the number of social activity monitoring dashboards now exceeds the number of BAM dashboards. A generation has grown up with interactive games that process thousands of events per second. People use event processing routinely in nonwork settings such as news alerts. When you identify scenarios and responses, you should also look at the ways in which customers of your proposed application use event processing in social settings.
Some EDA applications respond without human intervention. Cross-trading applications, described in Chapter 2, match buy and sell orders in milliseconds without human interaction. Search engines automatically select advertisements to be displayed on web pages within a second. Some enterprise software applications have interfaces that automatically trigger workflows when specified events occur, and responses for most applications include automatic updates to information repositories. Even for applications that respond entirely automatically, the scenarios must identify the business benefits of proposed applications, such as the expected revenue from a different advertisement-selection algorithm.
Best practices in developing event-processing applications are the same as those for any other application except that event-processing applications emphasize different benefits and costs than those emphasized in conventional IT applications. Responses in event-processing applications usually result in action—not merely in the transfer of information—and this action often involves interactions with the world outside the enterprise. Thus, an event-processing application may change the ways in which the enterprise interacts with the outside world. A goal of this step is to understand how the dynamics of interactions between the enterprise and its environment will change as a consequence of the event-processing application.
The first three steps in developing event-processing applications, covered in the previous sections, identify what the needs are: the business applications for which event-processing components are appropriate, the different user communities, and the responses the users need. The next steps identify how those needs can be met. Of course, application development doesn’t (and shouldn’t) progress in a strict waterfall from one stage to the next.
The variety, number, and cost-effectiveness of data sources are increasing by the day. Sensors are becoming more sensitive and accurate even though they consume less power and have smaller form factors than ever before. The costs of sensors such as temperature, pressure, and strain gauges and accelerometers have dropped substantially compared to the costs of other goods and services. Increasing numbers of products have event emitters built in at the factory—building event emitters into a product, as it is being manufactured, is cheaper than inserting emitters into finished products. Vendors are implementing more software products, such as business processes and databases, with event emitters. Some enterprise software applications have publish-and-subscribe interfaces that allow other components to subscribe for events generated by the applications.
Many organizations offer web services or other application programming interfaces (APIs) for accessing valuable information, including news, blogs, business data, prices, trends, weather forecasts, journal papers, abstracts, and patents. Websites can be “screen scraped” even if they do not provide APIs for acquiring information. (Screen-scraping is best avoided, because it is unreliable; if screen-scraping is necessary, the results should be checked frequently in case the schemas at the website change.) Enterprises have access to a great deal of valuable information on the Internet, and the quantity continues to grow dramatically. Event-processing applications that could not have been built just a few years ago because data sources didn’t exist have become viable today. The PC-cubed trends tell us that event-processing applications will become even more cost-effective in the future.
Some applications delegate the acquisition of information to the public; this delegation is sometimes called “crowd-sourcing.” The idea is similar to the “wisdom of crowds”—use the collective intelligence of a lot of individuals. The idea in crowd-sourcing information acquisition is to use the “senses of crowds.” The U.S. Geological Survey’s website includes a “Did You Feel It?” page (http://earthquake.usgs.gov/eqcenter/dyfi) that gets data from thousands of people to estimate where shaking occurred after an earthquake. Some applications detect traffic congestion by fusing data from hundreds of drivers. Crowd-sourcing is invaluable in many applications.
The application must have enough data sources to ensure high accuracy and completeness. The application can improve accuracy by verifying information from one data source with information from another. “Sanity checks” of data acquired from external sources also help weed out errors. The application must also have enough data to detect significant events. It is helpful to list the possible sources that an enterprise could use for a given application and then cull the list later.
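As a simple illustration of cross-source verification, the following sketch fuses readings from two hypothetical temperature sources, applies a plausibility check, and refuses to guess when the sources disagree; the ranges and tolerances are assumptions.

```python
def fused_temperature(primary: float, secondary: float,
                      max_divergence: float = 2.0):
    """Cross-check two independent readings; fuse them or flag a conflict.

    The plausibility range, tolerance, and averaging rule are illustrative
    assumptions; real applications need domain-specific sanity checks.
    """
    # Sanity check: weed out physically implausible readings first.
    for value in (primary, secondary):
        if not -60.0 <= value <= 60.0:
            raise ValueError(f"implausible reading: {value}")
    # Verification: the sources must roughly agree before we trust either.
    if abs(primary - secondary) > max_divergence:
        return None  # sources conflict; escalate rather than guess
    return (primary + secondary) / 2.0

print(fused_temperature(21.3, 21.8))  # sources agree: fused value 21.55
print(fused_temperature(21.3, 35.0))  # sources conflict: None, needs review
```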
Though the numbers and varieties of data sources continue to grow at explosive rates, an enterprise may not have creatively and systematically attempted to identify the data sources that can help it. There are beneficial data sources within and outside the enterprise, in hardware and in software, with and without APIs, generated by devices and generated by people.
Tip: Identify invaluable, unexploited data sources that are available today and that will become available in the near future by encouraging business and IT people to brainstorm together.
Now that we have identified the input (data sources) and output (desired responses), we next design the steps that process the input to produce the output. Initially, we merely sketch the steps in the computation to determine whether the desired output can be computed from the given input in the specified time, or whether we need more or different data sources, or whether we need to modify responses.
Many of the steps that transform input to output may be implementable using components already in your enterprise’s software stack. An inventory of the enterprise’s IT components—such as BAM portals, business intelligence (BI) tools, rule engines, databases, and enterprise service buses (see Chapters 9 and 10)—helps in determining the additional components, if any, that are required. Your enterprise may have high-performance versions of these components (such as real-time BI, low-latency rule engines, and in-memory databases) and may have in-house expertise in these technologies. Moreover, your enterprise may have developed highly tuned event-processing applications in its core competence. A component that builds shake maps (that identify the degree of shaking at different points due to earthquakes) is highly specialized; it is unlikely that there is any benefit in reimplementing the component by tailoring other COTS components. You can leverage specialized components by implementing the application using architectures that allow you to snap in the different types of components required by your application (see Chapter 8).
A preliminary mapping of computational steps to resources in the enterprise, and components that the enterprise can acquire, helps in estimating the feasibility and cost of the proposed application.
Tip: Make an inventory of the enterprise’s existing event-processing applications, the IT technologies (including high-performance versions) that it already has, and its expertise in technologies related to event processing.
Many commercial and open source components for event processing have been developed recently. For example, alert engines are components that gather news (including events about crises such as hurricanes), organize and display information on mobile phones or desktops, and send alerts to devices based on the user’s preferences (such as mobile phone during the day and e-mail messages at night). Communication-enabled business process (CEBP) systems identify the best group of people to deal with an event based on which people are available at the current time, their locations, and their skill sets. CEBP also sets up collaboration tools such as teleconferences, calendars, e-mail, and wikis for the taskforce created to deal with an event. You can acquire software tools or use publicly available services for a wide variety of technologies, including natural-language processing, image and video analysis, geospatial analysis and display, time-series analysis, and signal processing. The variety and power of specialized components for event processing continues to increase, driven partially by growth in consumer applications. The design challenge is to select a proprietary or open source service or component from a vendor for each component in the design and then to integrate these new components with components in the enterprise’s portfolio.
The events generated by an application may be valuable to other applications, possibly in other divisions of the enterprise. The event objects that describe these events may prove to be invaluable in the future; applications that are not even on the drawing board could find these event objects to be useful. Furthermore, event objects stored in repositories could be used later by BI, statistics, and model-building tools. You saw in the example of the smart grid that enterprise-wide event-processing applications can be developed in stages over decades, and certain types of events generated by an application for smart metering today may be used by very different applications in 5 or 10 years.
Tip: Assess the value of monitoring events in a business application so that the information can be used by applications in other divisions in the enterprise, by BI, and by future business functions.
The benefits of EDA applications—such as event detection, faster response, accurate responses, and situation awareness—are not emphasized in project justifications for conventional IT applications. The business case for an event-processing project is similar to that for other IT projects, but the emphasis placed on each benefit differs.
The PC-cubed trends—technology push for lower price, greater performance, and increasing pervasiveness of EDA technologies coupled with consumer pull for technologies that deal with greater complexity, connectedness, and celerity—tell us to expect substantial changes during the lifetime of an EDA application. The lifetime of a smart electric meter is at least a decade; in that time, home energy-management systems, electrical appliances, and communication technologies connecting homes to utilities will change. Utilities cannot postpone installing smart meters. Overdesigning today’s meter to deal with possible new requirements in the next decade is expensive. On the other hand, replacing millions of meters in a utility’s service area, when the meters no longer meet requirements, is expensive, too.
Estimating the benefits and costs of any IT system over its lifetime is a challenge. This estimation is, however, particularly difficult for event-processing applications because IT components are often intertwined with long-lasting infrastructures and business processes. Bridges last for centuries, and smart bridges, by virtue of their smartness, will last even longer. Computers, however, become obsolete within 5 years.
A design question that impacts the long-term benefits of an event-processing application is: How general-purpose should the application be? A house remains much the same over the decades of its life, but a smart house is a programmable house: it can be changed by reprogramming, possibly by a programmer at the opposite side of the globe. A door is a door, but a “smart” door is configurable and reconfigurable. Should you design a smart door for today’s requirements, or should it have general-purpose sense-and-response capability to satisfy requirements over the door’s lifetime, or should it be a plug-and-play door so that sensors, responders, and processing units can be plugged in and popped out of the door? How much work will a homeowner or “smart” handyman or handywoman have to do to reprogram a smart door? The different measures of benefits and costs, described in Chapter 4, can be estimated only approximately over the long lifetimes of these systems; nevertheless, the estimates help in scoping out the proposed application.
Note: Estimating the benefits and costs of event-processing applications over their lifetimes is difficult because the applications are often tightly intertwined with long-lasting infrastructures and business processes.
An event-processing project in one division of an enterprise serves as a catalyst for other divisions to rethink the structures of their business processes. Making business event objects in one division visible to other divisions leads to integration of event-driven business processes across multiple divisions; this, in turn, makes the entire enterprise more responsive and agile. The potentially transformative power of event processing can result in viral dissemination of the technology. The potential benefit of event objects generated in one business unit for units across the enterprise makes design and estimation of benefits difficult. It also makes managing mission-creep important since multiple business units may want to add functionality to initial designs. Issues about uncertain requirements during the lifetime of the application (think smart grid) are bound to be raised. So, complete the implementation of event-processing applications in stages, and demonstrate the tangible business benefits of each stage before going on to the next.
An incremental, layered approach to implementing event-processing functionality reduces risk. A pilot project or proof of concept (PoC) at each stage helps to demonstrate that the functionality for that stage can be implemented and integrated into the business. Evaluations of costs and benefits after a pilot project is completed help determine whether initial cost/benefit estimates were reasonable.
Less time is required for a pilot project for a well-defined, “shrink-wrapped,” mature application in a vertical business domain than for a new application implemented by integrating application-independent components. For example, today electric utilities put a great deal of thought and effort into designing test systems for evaluating advanced metering infrastructures (AMIs) that collect and analyze data from smart electric meters. That’s because the utilities are integrating components such as smart meters, communications networks, and billing systems to form advanced-metering applications. They are pioneers. In a decade, professional services companies, products companies, and electric utilities will have developed expertise in implementing AMI systems. At that point, smart metering will have become a well-defined business application with generally accepted design principles and standards; so, much less effort will be required to implement proofs of concept. Many utilities, however, do not have the luxury of waiting for a decade.
A pilot project must be a sufficiently complete representation of the actual task so that extrapolations from the pilot project to the actual task are credible. On the other hand, the pilot project should be small enough that it can be finished quickly. This tension is common to all system development; however, there are some issues that are accentuated when developing event-processing applications. One of these issues is the effort required from the businesspeople in specifying and evaluating the pilot project. Since the benefits of EDA applications are directly visible to people in lines of business, the evaluation of a pilot project must be carried out with the aid of businesspeople; however, they often have more-pressing responsibilities.
The authors have participated in, or observed, designs of pilot projects in many business domains, including electric-power trading and commodity trading. In some cases, the decision to evaluate event-processing software for trading was taken by management—not by the traders themselves; however, the value of the application is that it helps traders. An event-processing vendor insisted on commitments of time from a few representative traders to help design and evaluate the PoC. These traders provided critical feedback about what they did and did not want in the application. The commitment from the traders proved absolutely necessary for a useful PoC.
Getting commitment from management for evaluating a trading PoC is sometimes easier than getting the same commitment from the traders themselves. There are many reasons for this, including the manner in which managers and traders are compensated for their work. Traders are under pressure to go back to their trading and get immediate results rather than spend time on a PoC for an application which may provide value months later. A best practice is to get commitments of time from end users as well as management to specify and evaluate pilot projects.
Note: Getting commitment from management for time from end users to specify and evaluate a pilot project is sometimes easier than getting the same commitment from end users themselves; a best practice is to get commitments from both users and management.
Another characteristic of a proof of concept for an event-processing application is that benefits and costs of the PoC must be evaluated along multiple dimensions such as better situation awareness and more rapid, accurate responses. Evaluating each of the REACTS (relevance, effort, accuracy, completeness, timeliness, and security) benefits provided by a PoC helps in estimating the ROI of the final application. For example, a PoC evaluation for a trading application should measure, or estimate, parameters such as: How much time does a trader need to tailor the application to satisfy that trader’s specific needs? What is the change in risk exposure due to traders using the application? What are the security weaknesses of the application exposed by the PoC? There will be cases where the PoC delivers inaccurate data, doesn’t detect events, and reacts late—the value of a PoC is in measuring how much better the proposed system is than current business practice. The REACTS metrics aren’t usually emphasized in justifications for conventional IT projects.
The size and duration of a pilot project for any application should be determined carefully; this is especially true for an event-processing application because it interacts with the environment outside the enterprise. A pilot project for an AMI for electricity must evaluate the ability of the system to get measurements from meters to the utility in a timely fashion under different environmental conditions such as thunderstorms and windstorms. These pilot projects span thousands of homes over many months to ensure that likely natural and manmade situations are experienced. Designs and redesigns of some applications never stop. For example, applications to detect movement of hazardous radiation material are improved continuously to deal with new types of threat scenarios.
This section looks at how material presented in previous chapters is used to design event-processing applications systematically. Best practices are derived from the apparently trite, but useful, dictum: “It’s all about the business!” There are many best practices in designing event-processing applications; we focus attention on the following:
Stepwise development of event-processing functionality —The activities of validating assumptions, testing implementations, and measuring benefits and costs are particularly important for event-processing applications because they are so intertwined with business activity. The assumptions that must be tested are not merely about the software but, more importantly, also about the business. Stepwise development validates assumptions about the business, tests implementations of changed business processes as well as software, and measures or estimates business benefits at each step.
Using models of the business and its environment —Models play a central role in event-processing applications. An application that triggers business processes to check on events that appear to signal fraud uses a model of what is, and is not, fraudulent. Alerts to traders about opportunities are based on models of the market. The development and use of models are critical parts of designing event-processing applications.
Designing for long-term business benefits —Business event processes are useful to the business for the long term. Event-processing applications in healthcare, pharmaceuticals, smart bridges, smart grids, trading, and national security will remain critical for decades; however, the requirements for these applications will change with changing business conditions. A good practice is to design event-processing applications for the long term. Business event processes must be designed so that they can be monitored, administered, configured, and reconfigured over a long term.
Stepwise development adds functionality incrementally and validates or modifies assumptions at each step before going on to the next. There are some situations for which stepwise development is inappropriate, and we discuss those situations later in this section. Where possible, a good practice is to follow the sequence of steps outlined next.
Business activity monitoring (BAM) applications are good candidates for adding event-processing functionality in steps because they can be overlaid on top of existing applications. A BAM application acquires and processes raw events and displays KPIs. It provides visibility into a business process without controlling it. The response of the IT part of a BAM application is to display information—the response does not trigger workflow, invoke web services, or take other active steps. The active response of a BAM application, however, consists of the actions executed by the person who sees the KPI display.
The advantages of stepwise development of BAM applications are that each step has bounded scope; each step delivers tangible business value; and each step focuses on different concerns. We recommend that each step focus on the following concerns:
1. Data sources and display —The initial BAM application focuses on acquiring and displaying data. At this step developers identify data sources, build connectors to the data sources, and organize and display the data. Performance indicators at this step are simple aggregates of the acquired data. The value proposition of this initial application is better situation awareness and consequently more accurate, timely responses to situations that may arise. At this step, the person with the display acts not only as the component that detects events from patterns of data but also as the component that responds to the detected events.
2. BAM networks —The next steps deal with integrating multiple applications into BAM networks that mirror the organization of the enterprise. Systemwide situation awareness requires events from multiple applications to be fused together to provide a holistic picture. Information local to a specific business unit is shown on displays for people in that unit, whereas displays for executives who manage multiple units show aggregate information across all the units that they manage. The value proposition for this step of application development is that the organization, as a whole, has better situation awareness of itself and its environment.
3. Detecting events —In the first step (“Data sources and display”) and the second step (“BAM networks”), people are responsible for detecting patterns in the data that indicate significant events, and they are responsible for initiating responses to these events. The technology in the first two steps merely aggregates and displays data—it doesn’t detect patterns. In the next step, the application uses algorithms for detecting patterns in the data to identify significant events and then sends alerts about events to different people and different devices depending on the time of day, business roles of people, their skill sets, and the type of the alert. In this step, the application also proactively determines and displays the tools and data that a user will need to respond to an event. For example, an application may point to locations in repositories, such as data cubes, containing information helpful in responding to the event. At this step, responses to alerts are still carried out by people though detection of events is done with the aid of software; a minimal sketch of this detection step follows the list.
4. Automatic responses —The final steps focus on triggering automatic responses to detected events. In this step, the application automatically invokes business processes, triggers workflows, and initiates other activities, in addition to displaying data and sending alerts.
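The following sketch illustrates the progression from step 1 to step 3 for a single, hypothetical KPI: each event updates a simple aggregate for display, and a pattern over a window of events, rather than any single reading, triggers an alert. The window size, threshold, and KPI are assumptions.

```python
from collections import deque

class OrderBacklogMonitor:
    """Step 1 aggregates a KPI for display; step 3 adds software pattern detection."""

    def __init__(self, window: int = 5, threshold: float = 100.0):
        self.window = deque(maxlen=window)  # recent backlog measurements
        self.threshold = threshold          # illustrative KPI limit

    def on_event(self, backlog_size: float):
        self.window.append(backlog_size)
        kpi = sum(self.window) / len(self.window)  # step 1: simple aggregate
        self.display(kpi)
        # Step 3: a pattern over the data, not a single reading, signals an
        # event: the backlog has stayed above the limit for the whole window.
        if len(self.window) == self.window.maxlen and min(self.window) > self.threshold:
            self.alert(kpi)

    def display(self, kpi: float):
        print(f"dashboard KPI: average backlog = {kpi:.1f}")

    def alert(self, kpi: float):
        print(f"ALERT: backlog above {self.threshold} for the full window ({kpi:.1f})")

monitor = OrderBacklogMonitor()
for size in [90, 105, 110, 120, 130, 140]:
    monitor.on_event(size)
```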
The approach given here is cautious and incremental: crawl before walking, walk before running, and evaluate return on investment at every step. There are, however, many problems for which the business case for a complete event-processing solution is so compelling that directly implementing step 4—a full event-processing solution with automatic support for sensing, analyzing, and responding—is the right thing to do. Further, some applications cannot be implemented in a sequence of incremental steps. For example, a first incremental step of developing a BAM application is not helpful for businesses, such as algorithmic trading, that require responses in less than a second.
Event-processing applications rely on formal or informal models of the business and its environment. In Chapter 2 we discussed a model that a mother on a trip has about her family. The model allows her to conclude that no emergencies have happened at home if she hasn’t heard from her family. Your colleagues, friends, and doctors have informal models of you, and these models play key roles in their interactions with you. Routine medical checkups provide data that doctors use to build baseline models of your health. Your doctor’s actions are based on this model: for example, your doctor may recommend, based on the model, that you do not run a marathon.
An event-processing application that supports trading is based on a model of markets and trading. When a trader specifies that a certain pattern of stock prices signals a buy opportunity, the trader is using a model to predict the probable direction of stock prices. Applications that warn about impending hurricanes are based on weather models. When the Food and Drug Administration decides that spinach from certain regions of the country needs to be destroyed because of salmonella, it makes the decision based on a model of food distribution and disease.
You have expectations of an airline’s behavior when you register for alerts at the airline’s website—for example, you expect to be alerted within minutes if your flight is canceled. You estimate the current state of your flight and take actions based on your expectations. In the absence of an alert, you drive to the airport, expecting that the flight is operational. If you find when you get there that your flight was canceled several hours earlier, you conclude that your model of the airline is inaccurate, and you update it.
A key aspect of designing an event-processing application is identifying the implicit or explicit models that users of the application employ. The application uses the models to estimate the current situation and predict the future. The model, as a central construct, helps guide systematic design even when users don’t couch their requirements in terms of models. The construct helps designers determine whether data sources are adequate and whether expectations of system, environment, and user behavior are accurate enough to execute responses effectively.
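One recurring use of such models is detecting the absence of an expected event, as in the traveling mother and airline examples above. The sketch below encodes the simplest possible model, an expected maximum gap between events from each source; the interval and source names are illustrative assumptions.

```python
class ExpectationModel:
    """Tracks when each event source was last heard from and flags silence.

    The 'model' here is just an expected maximum gap between events; the
    gap and the source names are illustrative assumptions.
    """

    def __init__(self, expected_gap_seconds: float):
        self.expected_gap = expected_gap_seconds
        self.last_seen = {}

    def observe(self, source: str, timestamp: float):
        self.last_seen[source] = timestamp

    def silent_sources(self, now: float):
        # The absence of an expected event is itself a significant event.
        return [s for s, t in self.last_seen.items()
                if now - t > self.expected_gap]

model = ExpectationModel(expected_gap_seconds=60.0)
model.observe("meter-7", timestamp=0.0)
model.observe("meter-9", timestamp=50.0)
print(model.silent_sources(now=90.0))  # ['meter-7'] has gone quiet
```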
Models of hurricanes, earthquakes, electricity distribution, air traffic, and many other systems are complex. An event-processing application may require powerful computers to execute models and compare what-if scenarios so that the system can respond to events with celerity.
Note: The model, as a central construct, helps guide systematic design even when users don’t couch their requirements in terms of models.
Event-processing applications are used for many years. They are built so that they can be administered and reconfigured while they are in operation. You should keep the different business benefit measures in mind when you design a system. We now focus attention on a few issues that we haven’t covered in detail: designing for performance, tailoring the application to suit the user, systems administration including security issues, and build versus buy tradeoffs.
Most event-processing applications can be executed on parallel computers. An event-processing network (EPN) can be mapped naturally onto a network of computers, with each node of the computer network being responsible for executing a phase of the EPN. Event objects are processed in a series of steps, and while one step of an event object is being executed, another step of a different event object can be executed concurrently in a different computational thread or process. Clusters of computers and multiprocessors are becoming commonplace, powerful, and inexpensive, and they are suitable platforms for event processing.
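A minimal sketch of this pipelined style, using Python threads and queues to stand in for nodes of a computer network, is shown below; the stage names and stage logic are placeholders.

```python
import queue
import threading

# A tiny event-processing network: each stage runs in its own thread and
# passes event objects downstream through a queue, so different events can
# occupy different stages concurrently.

def stage(name, inbox, outbox, transform):
    while True:
        event = inbox.get()
        if event is None:      # sentinel: shut the pipeline down
            if outbox:
                outbox.put(None)
            break
        result = transform(event)
        if outbox:
            outbox.put(result)
        else:
            print(f"{name} response: {result}")

q1, q2 = queue.Queue(), queue.Queue()
stages = [
    threading.Thread(target=stage,
                     args=("enrich", q1, q2,
                           lambda e: {**e, "source": "sensor-net"})),
    threading.Thread(target=stage,
                     args=("respond", q2, None,
                           lambda e: f"alert for event {e['id']}")),
]
for t in stages:
    t.start()
for i in range(3):
    q1.put({"id": i})
q1.put(None)                   # propagate shutdown through the network
for t in stages:
    t.join()
```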
Many EDA applications are distributed systems, because sensors are located where data is generated and responders are located where responses are executed. An important aspect of design is defining the distributed structure—determining what information should be sent where, and what computations should be done at each location. There is a conceptual simplicity to designs in which all components, apart from sensors and responders, are located at a single site.
Storing all data at a central site simplifies replay, because computations can be replayed by data from central event logs. Central event logs also simplify debugging, forensics, and what-if analyses. On the other hand, sending all data to a central site may require excessive communication bandwidth and consequent expense. In many applications, satisfactory accuracy is obtained by sending summary data from the periphery of the network to internal nodes. For example, seismological applications can have thousands of sensors deployed over a wide region, with each sensor being capable of sending measurements several times per second. The system can function effectively with each sensor sending infrequent short messages containing summary data except when a sensor detects an unusual pattern. Since unusual patterns are infrequent, the system needs low average bandwidth. Similarly, there are applications in telecommunications, homeland security, and defense that function effectively with only fractions of sensor data sent to processing nodes.
One approach for dealing with the tradeoff between easy replay and forensics on the one hand and costs of communication on the other is as follows. Each site sends only summary information to central sites but stores information about all events that occurred at that site during a time window. The sites ship data logs to central sites when bandwidth is available. Simulations, what-if analyses, and forensics studies are executed using data stored at central sites except in those relatively rare cases where that data is insufficient.
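The sketch below illustrates this pattern for a hypothetical edge node: unusual events go upstream in full detail immediately, routine events are summarized, and raw events are buffered locally until logs can be shipped. The threshold, batch size, and buffer size are assumptions.

```python
from collections import deque

class EdgeNode:
    """Sends only summaries upstream; buffers raw events for later shipping."""

    def __init__(self, unusual_threshold: float = 5.0, buffer_size: int = 10_000):
        self.unusual_threshold = unusual_threshold
        self.raw_buffer = deque(maxlen=buffer_size)  # recent raw events, kept locally
        self.pending = []                            # routine events awaiting summary

    def on_measurement(self, value: float) -> None:
        self.raw_buffer.append(value)
        if abs(value) > self.unusual_threshold:
            # Unusual pattern: send full detail upstream immediately.
            self.send_upstream({"type": "unusual", "value": value})
        else:
            self.pending.append(value)
            if len(self.pending) >= 100:  # periodic short summary of routine data
                self.send_upstream({"type": "summary",
                                    "mean": sum(self.pending) / len(self.pending)})
                self.pending.clear()

    def ship_logs(self) -> list:
        # Called when bandwidth is available; central logs support replay,
        # debugging, forensics, and what-if analyses.
        batch = list(self.raw_buffer)
        self.raw_buffer.clear()
        return batch

    def send_upstream(self, message: dict) -> None:
        print("to central site:", message)

node = EdgeNode()
for reading in [0.1, 0.2, 9.7, 0.3]:  # one unusual spike among routine readings
    node.on_measurement(reading)
print("shipped", len(node.ship_logs()), "raw events for central storage")
```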
Business costs and benefits are also used to determine what information to save and what to discard. Cameras connected to traffic signals, at airports, at border crossings, and aboard unmanned aerial vehicles record information at enormous data rates, millisecond by millisecond, and year by year. Science experiments, such as the Large Hadron Collider and telescope networks, also capture information at high bit rates for years and decades. Even though costs of storage continue to decrease, storing all information in perpetuity is not always cost-effective. Here, too, a best practice is to estimate the probable business benefits from alternatives such as storing all the raw information, storing only summary information, or storing detailed recent information and coarse-grained summaries of old information.
The performance of event-driven systems benefits from technologies such as in-memory databases and streaming databases. In-memory databases can operate an order of magnitude faster than conventional databases, and streaming databases offer high-performance operations on streams of events. The design decision of whether or not to use these technologies depends on the business benefits of timeliness (see, for example, the value-time functions in Figure 4-2). For some applications, the rate-determining component may inherently require seconds to complete; so, reducing the time for a database operation from a second to a millisecond may only reduce the overall response time by a tiny fraction.
Note: A best practice is to sketch all the steps in an event-processing flow; estimate—however approximately—the times required for each step; estimate the value-time function that determines the value of a response as a function of the time to respond; determine the performance requirement for each step; and only then select the technologies appropriate for each step.
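A back-of-the-envelope version of this practice can itself be written down, as in the sketch below; the step names, times, and the shape of the value-time function are illustrative assumptions rather than measured numbers.

```python
# Illustrative latency budget for an event-processing flow. Step names and
# times are assumptions; the value-time function follows the general shape
# discussed in Chapter 4 (value decays as the response is delayed).

steps_ms = {
    "sense": 50,
    "transport": 200,
    "detect_pattern": 30,
    "database_lookup": 1000,  # the rate-determining step in this sketch
    "respond": 100,
}

def value_of_response(latency_ms: float) -> float:
    """Assumed value-time function: full value up to 1 s, linear decay to 0 at 5 s."""
    if latency_ms <= 1_000:
        return 1.0
    return max(0.0, 1.0 - (latency_ms - 1_000) / 4_000)

total = sum(steps_ms.values())
print(f"end-to-end latency: {total} ms, value: {value_of_response(total):.2f}")
# Shaving the database step from 1000 ms to 1 ms pays off only if it moves
# the total across a steep region of the value-time function.
```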
An important aspect of design is determining how to help end users tailor an application to meet their own individual needs. As we discussed in earlier chapters, if end users cannot tune applications to satisfy their changing requirements, then the applications may provide irrelevant, inaccurate data. A best practice is to work with business users to understand their willingness to configure and reconfigure an application. What are the user interfaces and programming notations with which different groups of end users are familiar? Are they power users of spreadsheets? Do they prefer SQL? Or do they not have the time or inclination to configure the application? Do users want to turn data sources on and off based on their levels of trust? Can IT staff or professional services develop business templates that users can fill in, or do users need more flexibility? These questions are important in designing any system, but they are particularly important in event-processing systems.
Note: A best practice is to design configurability into every component and to think, at every stage of design, about how business users will tailor the system to meet their needs.
Features for administering and managing an event-processing application are similar to those for any continuously running application. Long-running applications should have plug-and-play capability that allows components such as sensors to register with the application and then interact with it. For example, a seismic application that allows any accelerometer to send signals to the application must have mechanisms for registering new sensors, where registration steps include giving each distinct sensor a unique ID and recording sensor parameters such as the type of sensor, the owner, and location.
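A registration mechanism of this kind might look like the following sketch, where the recorded parameters follow the text and the ID scheme and in-memory store are assumptions.

```python
import uuid

class SensorRegistry:
    """Registers plug-and-play sensors and records their parameters.

    The parameter set (type, owner, location) follows the text; the ID
    scheme and in-memory store are illustrative assumptions.
    """

    def __init__(self):
        self.sensors = {}

    def register(self, sensor_type: str, owner: str, location: str) -> str:
        sensor_id = str(uuid.uuid4())  # each distinct device gets a unique ID
        self.sensors[sensor_id] = {
            "type": sensor_type, "owner": owner, "location": location,
        }
        return sensor_id

    def is_known(self, sensor_id: str) -> bool:
        # Readings carrying unregistered IDs are discarded upstream.
        return sensor_id in self.sensors

registry = SensorRegistry()
sid = registry.register("accelerometer", "seismo lab", "34.14N,118.12W")
print(registry.is_known(sid), registry.is_known("spoofed-id"))
```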
A key aspect of plug-and-play is security; for example, an application must identify and discard information from clearly faulty sensors, and it must ensure that a single device cannot act as though it is thousands of different devices. Many event-processing applications are mission critical. The consequences of successful attacks by hackers or inadvertent failures are enormous in defense, national intelligence, financial trading, and smart systems such as smart grids, smart homes, and smart roads. Fraudulent activities that manipulate event-processing systems have been carried out by insiders familiar with system operation.
Note: You have to pay attention to security at every step of the design.
“It’s about the business!” is the phrase that determines design choices, including the build-versus-buy choice, for event-processing applications. The effort, time, and costs of understanding and transforming the business far exceed the costs of buying or building software components.
Much of the effort in implementing event-processing functionality is expended in integration—integrating new event-processing components with other components in your enterprise’s software portfolio, and integrating new event-processing functionality with the rest of the business. The buy-versus-build decision depends in part on the ease with which the components you buy or build can be integrated with the enterprise’s existing IT infrastructure. (Of course, the buy-versus-build tradeoff for a shrink-wrapped application that contains event-processing components is the same as for any other shrink-wrapped application.)
There are several reasons to buy event-processing software and hardware components, including the following:
A vendor may have specialized expertise in a business domain, such as financial trading, and demonstrated understanding of the many important business issues in event-processing applications in that domain.
The application may require very rapid response times and the ability to serve high data rates. Some vendors have software tools that have been honed, over years of experience, to deliver extremely high performance. Developing equivalent tools in-house will take time.
An event-processing component may be used by many different types of users who need to tailor the component for their specific roles. Developing flexible interfaces for business users to configure components takes time. Buying a component with a very flexible configuration mechanism is likely to save money in the long run.
The central point of this chapter is to focus on benefits and costs to the business at every stage of implementing an event-processing application. There are many different axes along which costs and benefits of EDA applications are measured, and you should consider all of these axes.
The success of an event-driven application depends on how well the application is integrated with the business. Effective application integration into the business takes time and effort from different groups of people, including IT staff and people in lines of business. Event-driven applications transform business processes—they don’t merely improve current practices. As a consequence, the development of event-driven applications must be careful and systematic.
The chapter suggested a sequence of steps to take in implementing an event-driven application. Developing applications in a sequence of steps reduces risk and helps ensure that the final applications are efficient. There are several reasons for considering acquisition of specialized event-driven software components, and some of the key reasons were highlighted.