Requirements and challenges
This chapter explores the challenges that are associated with the application's requirements. The following user-centric statement summarizes the high-level business requirement:
"As a Walmart business unit associate, I need the ability to request information about granular activities or events that are related to the operations of my business unit, and to receive that information within 2 seconds of my request."
The following sections review the technical details of reaching this objective. Factors include data volume, structure, storage type, and input/output (I/O) rates.
3.1 Volume
As described in 2.3.2, “Walmart Event Processing System (EPS)” on page 10, many business areas can subscribe as publishers to the event processing framework. The overall quantity of events can be large, so the volume of data becomes a consideration. The projections for event publishing were 5 million events per day upon initial application deployment and rapidly increasing to 500 million events per day, as more publishers come online. Additionally, the application required retention periods of 7 - 30 days for the events, further increasing the overall volume.
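To put these projections in perspective, the retained volume is roughly the daily publishing rate multiplied by the retention period. The following Python sketch works through that arithmetic. The function name is illustrative, and the calculation assumes a constant daily rate across the whole retention window, which is a simplification.

```python
def retained_events(events_per_day: int, retention_days: int) -> int:
    """Events held in the store at steady state (assumes a constant daily rate)."""
    return events_per_day * retention_days

# Daily rates and retention periods taken from the projections in this section.
for per_day in (5_000_000, 500_000_000):
    for days in (7, 30):
        total = retained_events(per_day, days)
        print(f"{per_day:>11,} events/day x {days:>2} days = {total:>14,} events")
```

At the projected peak rate and the longest retention period, the store would hold about 15 billion events.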
3.2 Searching
In a situation like this, capturing and storing high volumes of data is typically not a major concern. However, searching through large numbers of these records quickly can be challenging. The application services large volumes of events that include dynamic indexable attributes that are associated with business data and metadata. The searches against the data are also very dynamic, with criteria for event types, attributes, and search range all being highly variable. Additionally, any search results with multiple entries must be returned in chronological order. For these reasons, the search capability was immediately identified as the primary problem to be addressed.
3.3 Service Level Agreement
The term "quickly" is relative and depends on many factors. Most importantly for this application, the searches are primarily driven by human interaction: a user of a client web application who expects near-real-time results. So, the Service Level Agreement (SLA) for the responses is set to a maximum of 2 seconds to maintain a quality user experience. Within the context of this publication, the term "quickly" relates to this metric of less than or equal to 2 seconds.
3.4 Data Repository
The application team initially implemented the data repository component of this application on a distributed database product. However, they encountered many problems with stability and were unable to achieve the response-time goals. Other distributed database products were considered, but the application team instead approached the Walmart z/OS Services Engineering team (zServices team) with the challenge. That team had previously demonstrated success with other zServices products and shown how those products operate at scale.
All zServices products that store data use centralized storage that is shared among numerous systems. The file storage that is used is Virtual Storage Access Method with Record Level Sharing (VSAM/RLS), which allows concurrent read and update accessibility from numerous processes. This unique characteristic of the z/OS platform allows broad access to data. At the same time, it avoids many of the complexities and challenges with managing distributed data. For more information about Walmart's use of VSAM/RLS, see How Walmart Became a Cloud Services Provider with IBM CICS, http://www.redbooks.ibm.com/abstracts/sg248347.html?Open.
3.5 Data Structure
This section examines the event object and what it represents. Then it reviews how the format of the event can be adjusted to accommodate processing expectations.
3.5.1 Event object
The events are simply collections of information that is presented in a canonical format. At present, either JavaScript Object Notation (JSON) or Extensible Markup Language (XML) formats are used. The following simplified example of an event is represented in JSON, because that format is most commonly used:
{
  "eventType": "US|000001721|0001|2019-02-11T08:00:35.002Z",
  "eventID": "c3BlZWRiYWxs==",
  "producerID": "2e7d0f3f00d5acdc497b358779fa1b6f",
  "statID": "0000000042",
  "locID": "13355"
}
This sample event includes these details:
eventType - includes operating region, business unit, subunit, date/timestamp
eventID - identifies specific event
producerID - entity that generated the event
statID - value that represents some status
locID - identifies a specific geographic location
In computer systems, the values that are being processed are commonly cryptic and carry little meaning to a human observer. To better understand what this event might actually represent, some assumptions can be made for what each of the values in this event signifies (Table 3-1):
Table 3-1 Event object
Value label   Value                                        Description
eventType     US|000001721|0001|2019-02-11T08:00:35.002Z   Identifies this event as being generated by the Transportation division in the U.S. at 08:00:35 on February 11, 2019.
eventID       c3BlZWRiYWxs==                               A specialized value that provides uniqueness to the event entry.
producerID    2e7d0f3f00d5acdc497b358779fa1b6f             Unique string that identifies a particular vehicle in the fleet.
statID        0000000042                                   Code that represents a status of "ARRIVED".
locID         13355                                        Location number that equates to a Distribution Center in Derry, Maine.
With these value associations established, the preceding event can be reproduced as the following human-readable statement:
"The U.S. Transportation fleet vehicle number 2e7d0f3f00d5acdc497b358779fa1b6f arrived at the Derry, Maine Distribution Center at 08:00:35 on February 11, 2019."
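The translation from raw values to a readable statement can be sketched in Python. This is only an illustration: the lookup tables and function names are hypothetical, and the mappings (including the reading of 000001721 as the Transportation business unit) simply restate the assumptions in Table 3-1.

```python
import json
from datetime import datetime

# Hypothetical lookup tables; the mappings mirror the assumptions in Table 3-1.
REGIONS = {"US": "U.S."}
BUSINESS_UNITS = {"000001721": "Transportation"}
STATUSES = {"0000000042": "ARRIVED"}
LOCATIONS = {"13355": "Derry, Maine Distribution Center"}

EVENT_JSON = """
{
  "eventType": "US|000001721|0001|2019-02-11T08:00:35.002Z",
  "eventID": "c3BlZWRiYWxs==",
  "producerID": "2e7d0f3f00d5acdc497b358779fa1b6f",
  "statID": "0000000042",
  "locID": "13355"
}
"""

def parse_event_type(event_type):
    """Split eventType into region, business unit, subunit, and timestamp."""
    region, unit, subunit, stamp = event_type.split("|")
    # Replace the trailing "Z" so that older Python versions can parse the value.
    when = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    return region, unit, subunit, when

def describe(event):
    """Render the event as a human-readable sentence by using the lookup tables."""
    region, unit, _subunit, when = parse_event_type(event["eventType"])
    return (
        f"The {REGIONS[region]} {BUSINESS_UNITS[unit]} fleet vehicle number "
        f"{event['producerID']} {STATUSES[event['statID']].lower()} at the "
        f"{LOCATIONS[event['locID']]} at {when:%H:%M:%S} on "
        f"{when:%B} {when.day}, {when.year}."
    )

event = json.loads(EVENT_JSON)
print(describe(event))
```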
3.5.2 Format conversion
The team decided at an early point to convert the canonical format of the events to a columnar structure in the data store. This change accommodates the search function as follows:
The columns (or positional fields) represent the indexable attributes of the events, potentially along with more fields to include other relevant characteristics.
Subgroups of these fields can then be defined as composite keys. This approach facilitates the use of initial search parameters for quick identification. Then, other attributes can be located within those entries to further filter results.
Of particular note in this scheme, each event is timestamped and all searches are based on time ranges. An "event ID" attribute is also included in the event to provide uniqueness. Therefore, both the timestamp (which is part of the eventType value) and the event ID are included in the composite key. Other attributes can be included in the composite key based on the published events and search mechanisms that are provided, but those scenarios are not covered here.
The columnar structure and composite key are denoted in a definition list, which is used by the Event Processing System (EPS) service. In the following example, the columns or fields that are associated with each item in the JSON event are defined. Additionally, a top-level entry identifies the fields that are included in the composite key. (Notice the Len= value of the eventKey entry, which is the sum of the lengths of the eventType and eventID fields.)
ID=001,Col=0000001,Len=000056,Type=C,Name=eventKey
ID=001,Col=0000001,Len=000042,Type=X,Name=eventType
ID=001,Col=0000002,Len=000014,Type=C,Name=eventID
ID=001,Col=0000004,Len=000032,Type=C,Name=producerID
ID=001,Col=0000005,Len=000010,Type=C,Name=statID
ID=001,Col=0000006,Len=000020,Type=C,Name=locID
The conversion of the event object into this columnar format effectively "flattens" the information into an individual row of concatenated values. The primary identifying components are located at the beginning of the entry. Put another way, the event can now be viewed as a key and an associated payload, as represented in Figure 3-1.
Figure 3-1 Key and Payload
The new format now fits naturally, as a key and value, into the VSAM/RLS Key-Sequenced Data Set (KSDS) structure that can be used for searching.
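The flattening step can be sketched in Python as follows. The field names and lengths come from the definition list; the left-justified, space-padded layout and the `flatten` function itself are assumptions made for illustration, not the EPS implementation.

```python
from typing import Dict, Tuple

# Field names and lengths copied from the definition list.
FIELDS = [
    ("eventType", 42),
    ("eventID", 14),
    ("producerID", 32),
    ("statID", 10),
    ("locID", 20),
]
KEY_FIELDS = ("eventType", "eventID")  # together they form the 56-character eventKey

def flatten(event: Dict[str, str]) -> Tuple[str, str]:
    """Return (key, payload) as fixed-width strings.

    Assumption: values are left-justified and padded with spaces.
    """
    def cell(name: str, width: int) -> str:
        value = event[name]
        if len(value) > width:
            raise ValueError(f"{name} exceeds {width} characters")
        return value.ljust(width)

    key = "".join(cell(n, w) for n, w in FIELDS if n in KEY_FIELDS)
    payload = "".join(cell(n, w) for n, w in FIELDS if n not in KEY_FIELDS)
    return key, payload

event = {
    "eventType": "US|000001721|0001|2019-02-11T08:00:35.002Z",
    "eventID": "c3BlZWRiYWxs==",
    "producerID": "2e7d0f3f00d5acdc497b358779fa1b6f",
    "statID": "0000000042",
    "locID": "13355",
}
key, payload = flatten(event)
print(len(key), len(payload))
```

Because the key fields come first and have fixed widths, two records can be compared on their first 56 characters alone, which is what makes a key-sequenced store a natural fit.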
This data structure was not an explicit functional requirement for the service. On the other hand, it was a necessary technical condition for achieving the non-functional requirements of the application. This structure proved to be highly relevant in subsequent testing and design decisions.
 
These event examples have been highly simplified for demonstration purposes and to protect proprietary information.
3.6 Fundamental I/O Requirements
Ultimately, the ability to satisfy the application's response-time goals depends on the ability of the EPS search service to process all relevant records within that time frame. The overall volume of data includes 500 million events per day for up to 30 days. Nonetheless, search criteria restrictions in the client interface effectively limit this processing to subsets of the data. Each search request is restricted to the following boundaries:
A single operational region
A single business unit
A 24-hour period
As reviewed in 3.5, “Data Structure” on page 14, each of the preceding values is included in the key for each event in a KSDS VSAM/RLS data store. As a result, direct access to the relevant divisions of the data is possible, and the search service I/O activity is also confined to the pertinent subset of events.
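The effect of keyed access can be illustrated with a short Python sketch in which a sorted list stands in for the key-sequenced data store. This is not VSAM code. The `search` function, the sample keys, and the assumption that the request also fixes the subunit are all hypothetical. The sketch relies on the fact that timestamps in this fixed format sort lexicographically in chronological order.

```python
from bisect import bisect_left

def search(sorted_keys, region, unit, subunit, start, end):
    """Return keys for one region/unit/subunit with start <= timestamp < end.

    sorted_keys must be in ascending order, as in a key-sequenced store.
    """
    prefix = f"{region}|{unit}|{subunit}|"
    lo = bisect_left(sorted_keys, prefix + start)  # first key at or after start
    hi = bisect_left(sorted_keys, prefix + end)    # first key at or after end
    return sorted_keys[lo:hi]

# Sample keys: eventType followed by a 14-character event ID (made-up values).
keys = sorted([
    "US|000001721|0001|2019-02-10T23:59:59.999Zaaaaaaaaaaaaaa",
    "US|000001721|0001|2019-02-11T08:00:35.002Zc3BlZWRiYWxs==",
    "US|000001721|0001|2019-02-11T17:30:00.000Zbbbbbbbbbbbbbb",
    "US|000001721|0001|2019-02-12T00:00:00.000Zcccccccccccccc",
    "US|000009999|0001|2019-02-11T09:00:00.000Zdddddddddddddd",
])

hits = search(keys, "US", "000001721", "0001",
              "2019-02-11T00:00:00.000Z", "2019-02-12T00:00:00.000Z")
for k in hits:
    print(k)
```

Only the two events from February 11 for that business unit are returned, and they come back in chronological order because the store is ordered by key. Events for other days or other business units are never read.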
The application team projected the maximum number of events for a given operational region/business-unit combination to be 5 million per day. In reality, most of these groupings were projected to generate about 2 million events per day. Additionally, it was expected that even for the largest data sets, the typical search pattern would be limited to ranges that hover around 2 million events.
This information, together with the 2-second response-time SLA established in 3.3, “Service Level Agreement” on page 14, determines the processing requirements of the search service. Table 3-2 shows the calculations for the typical and outlier cases.
Table 3-2 Processing requirements of the search service
Total Events to Search   Response Time SLA   Required Processing Rate (Total Events/Response Time SLA)
2,000,000 (Typical)      2 seconds           1,000,000 per second
5,000,000 (Outlier)      2 seconds           2,500,000 per second
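The same calculation can be expressed as a small Python sketch, which also derives the average time that is available to process each event. The function names are illustrative.

```python
def required_rate(total_events: int, sla_seconds: float) -> float:
    """Events that must be processed per second to meet the SLA."""
    return total_events / sla_seconds

def budget_per_event_us(total_events: int, sla_seconds: float) -> float:
    """Average time available per event, in microseconds."""
    return sla_seconds / total_events * 1_000_000

SLA_SECONDS = 2
for label, total in (("Typical", 2_000_000), ("Outlier", 5_000_000)):
    rate = required_rate(total, SLA_SECONDS)
    budget = budget_per_event_us(total, SLA_SECONDS)
    print(f"{label}: {rate:,.0f} events per second, {budget:.1f} microseconds per event")
```

In the typical case the service has, on average, 1 microsecond per event; in the outlier case that budget shrinks to 0.4 microseconds.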
The "typical" scenario of 1 million events per second was deemed acceptable and became the primary requirement. The outlier scenario was reserved as a stretch goal. The following chapters examine these issues:
Initial approaches to this challenge
How asynchronous processing with the IBM CICS asynchronous API was eventually used to achieve these goals