Homegrown asynchronous solution
As described in Chapter 4, “Our initial sequential approach” on page 19, it quickly became apparent that serially searching the collection of events would not be sufficient. Our response time goals were still out of reach for our initial application. This chapter explores how we satisfied the SLA by using asynchronous methods to allow searches to run in parallel.
5.1 Parallel processing
The evolution of computer systems to multi-processor and multi-core systems led to the advent of parallelism in computing. Operations were split into smaller pieces that ran simultaneously on multiple processors or cores. This same concept can be applied to get around the constraints of sequential I/O rates in our scenario.
 
In Computer Science, Parallelism and Concurrency are similar concepts, but they are not identical. The general distinction is as follows:
Parallelism involves literal simultaneous processing (in other words, each process on an individual core).
Concurrency involves perceived simultaneous processing (in other words, time slicing of some of the processing).
In reality, both concepts can come into play for the solutions that are described in this document, depending on conditions. However, for the sake of simplicity, this document focuses on the Parallelism concept.
In Chapter 4, “Our initial sequential approach” on page 19, our preliminary tests established baseline rates for serial I/O in standard CICS APIs, then for native VSAM/RLS I/O operations.
We observed the rates that are listed in Table 5-1:
Table 5-1 Sequential I/O Rates
I/O Type | Serial I/O rate per second
CICS APIs | 5,000
Native VSAM/RLS | 60,000
With parallelism, you can achieve higher I/O rates than a single serial process affords, which makes it possible to meet the prescribed processing goals. When multiple read tasks run simultaneously, the effective read rate becomes:
(the base serial read rate) × (the number of tasks)
Conversely, the number of parallel tasks that are needed to sustain a particular throughput can be calculated as follows, rounded up to the next whole task:
(the total number of events per second) ÷ (the base serial read rate)
For example, Table 5-2 represents the number of parallel tasks that would be required to process 1 million events per second:
Table 5-2 Required Parallel Tasks
I/O Type | Total events | Serial read rate (events per second) | Parallel tasks (Total events/Serial read rate)
CICS API | 1,000,000 | 5,000 | 200
Native VSAM/RLS | 1,000,000 | 60,000 | 17
Looking at this another way, the total number of events processed per second by using parallel processing can be projected for a particular number of tasks for each I/O type. In the example in Table 5-3, 200 parallel tasks are assumed:
Table 5-3 Max read rate at 200 tasks
I/O Type | Serial read rate (events per second) | Task quantity | Effective read rate (Serial read rate * Task quantity)
CICS API | 5,000 | 200 | 1,000,000
Native VSAM/RLS | 60,000 | 200 | 12,000,000
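The arithmetic behind Table 5-2 and Table 5-3 can be reproduced with a short sketch. It is written in Python purely for illustration and is not part of the service. The function names are hypothetical, and the calculation assumes ideal linear scaling, which a real system only approximates.

```python
import math

# Baseline serial read rates (events per second) from Table 5-1.
SERIAL_READ_RATES = {"CICS API": 5_000, "Native VSAM/RLS": 60_000}


def required_tasks(target_events_per_second: int, serial_rate: int) -> int:
    """Parallel tasks needed to reach the target rate, rounded up to a whole task."""
    return math.ceil(target_events_per_second / serial_rate)


def effective_read_rate(serial_rate: int, task_count: int) -> int:
    """Idealized aggregate read rate, assuming perfectly linear scaling."""
    return serial_rate * task_count


for io_type, rate in SERIAL_READ_RATES.items():
    print(io_type, required_tasks(1_000_000, rate), effective_read_rate(rate, 200))
# CICS API 200 1000000
# Native VSAM/RLS 17 12000000
```

The rounding step explains why the native VSAM/RLS row shows 17 tasks: 1,000,000 ÷ 60,000 is approximately 16.7, and a fractional task cannot be started.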
These calculations suggest that the base requirement of processing 1 million events per second can be satisfied, and also that much higher throughputs can be achieved. However, managing parallelism like this presents many challenges. The number of tasks, the amount of work each task should do, the collation of results, and numerous other details must be considered. The following sections describe the mechanisms that we used to address these challenges.
5.2 Asynchronous processing
Parallel and asynchronous processing are admittedly two distinct concepts. But they can be used together. Specifically for this use-case, asynchronous processing mechanisms implement the parallelized I/O tasks. These tasks must be managed by some authoritative (or parent) process, and this process must be free (non-blocked) to manage multiple tasks within a specific request.
Additionally, search results from individual I/O tasks can be returned to the client in the order that they are processed by the service (in proper collation order, of course). That way, we avoid additional blocking to the client.
As covered in Chapter 2, “Background” on page 5, many techniques can be used to accomplish asynchrony in z/OS and CICS applications. The following section describes the techniques that are used by Walmart.
5.3 Design
As described in Chapter 3, “Requirements and challenges” on page 13, the search service must locate relevant events in the data store. Then it returns those events to the client in chronological order. Chapter 4, “Our initial sequential approach” on page 19 gave a description of the fundamental function of the search service, which still applies in this chapter. But now, we need the ability to split up and parallelize the I/O to the data store, so that we can process high volumes of entries within the response-time SLA. The remainder of this chapter focuses on the mechanics that accomplish this.
The following high-level steps show the sequence of events for parallel processing each search request:
1. Receive request from client
2. Use search parameters to identify and start I/O tasks
3. Monitor I/O tasks for completion
4. Gather and return results to client in the proper order
To accomplish these steps, the search service includes this task hierarchy:
A parent task receives the client request and determines how many child tasks to start.
The child tasks scour portions of the data store for relevant records and return those records to the parent task.
The parent task then returns the results to the client. This general design is depicted in Figure 5-1.
Figure 5-1 High-Level Design
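The parent and child hierarchy can be modeled outside CICS with a thread pool. The following Python sketch is only an analogy: `parent_search` and `child_search` are hypothetical names, integers stand in for event timestamps, and a thread pool stands in for the CICS child tasks. It shows the one property that matters for this design, which is that results are gathered in interval order regardless of which child finishes first.

```python
from concurrent.futures import ThreadPoolExecutor


def child_search(interval, events):
    """Hypothetical child: return the events whose timestamp is in [start, end)."""
    start, end = interval
    return [e for e in events if start <= e < end]


def parent_search(intervals, events):
    """Hypothetical parent: start one child per interval, gather in interval order."""
    with ThreadPoolExecutor(max_workers=len(intervals)) as pool:
        futures = [pool.submit(child_search, iv, events) for iv in intervals]
        results = []
        for future in futures:  # submission order matches chronological order
            results.extend(future.result())
    return results


events = [5, 12, 17, 23, 31, 38]  # time sequenced, like the data store
intervals = [(0, 10), (10, 20), (20, 30), (30, 40)]
print(parent_search(intervals, events))
# [5, 12, 17, 23, 31, 38]
```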
Other components are needed to coordinate the activities between the main sections of this design. The focus for this document is the asynchronous mechanisms. So, these components are now explored from the perspective of a standard asynchronous design pattern, which includes the following activities:
Prepare Data for Child
Initiate Child
Check for Completion
Retrieve Data from Child
Perform Housekeeping
5.3.1 Prepare Data for Child
An HTTP GET request from the client initiates the EPS search service. The CICS WEB RECEIVE and WEB EXTRACT commands are run to retrieve the search criteria. Then, control is passed to the parent program logic.
As discussed in Chapter 3, “Requirements and challenges” on page 13, each search request is predicated on a time range. Based on this time range, a proprietary algorithm determines the quantity and duration of intervals in the data to search. These intervals equate to the number of asynchronous child processes that the parent task initiates.
All events in the data store are time sequenced. Each child task must process a fraction of the overall time range of the search. As a result, each child task is assigned two values: a start value, for which a STARTBR command is issued, and an end value, which marks the final READNEXT of its assigned range.
Figure 5-2 gives a simple view of the search request example from Chapter 4, “Our initial sequential approach” on page 19.
Figure 5-2 Search intervals
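The algorithm that chooses the intervals is proprietary and is not described here. As a stand-in, the following Python sketch assumes the simplest possible policy, equal-width intervals, to show how a time range maps to one start value and one end value per child. The function name `split_range` and the millisecond units are assumptions for illustration only.

```python
def split_range(start_ms: int, end_ms: int, child_count: int):
    """Split [start_ms, end_ms) into child_count contiguous (start, end) intervals.

    Equal-width slicing is an assumption; the real algorithm is proprietary.
    """
    if child_count < 1 or end_ms <= start_ms:
        raise ValueError("need at least one child and a non-empty range")
    span = end_ms - start_ms
    # Integer arithmetic keeps the boundaries exact and the intervals gap-free.
    bounds = [start_ms + span * i // child_count for i in range(child_count + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


print(split_range(0, 1000, 4))
# [(0, 250), (250, 500), (500, 750), (750, 1000)]
```

Each tuple corresponds to one child task: the first value is where the browse starts, and the second value is the boundary at which the child stops reading.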
Because we want to retrieve the results of the child tasks in chronological order, the parent task creates an array to store information about each child task. Various units of information are placed into this array and used by the parent task to manage the child tasks. Relevant components of the elements in this array include these items:
Array index value
Interval start and end times that are assigned to child
Status flag/value
Example 5-1 shows the layout of an entry in this array.
Example 5-1 Array in parent task
***********************************************************************
* Start TSQ Table List. *
* Maximum 255 entries. *
***********************************************************************
TT_DSECT DSECT
TT_START DS 0CL09
TT_S_HH DS CL02 TS Start HH
TT_S_MM DS CL02 TS Start MM
TT_S_SS DS CL02 TS Start SS
TT_S_MS DS CL03 TS Start MS (not part of TSQ name)
DS CL07 Align
*
TT_END DS 0CL09 TS End time
TT_E_HH DS CL02 TS End HH
TT_E_MM DS CL02 TS End MM
TT_E_SS DS CL02 TS End SS
TT_E_MS DS CL03 TS End MS
DS CL07 Align
*
TT_209 DS 0CL09 TS Resume time
TT_R_HH DS CL02 TS Resume HH
TT_R_MM DS CL02 TS Resume MM
TT_R_SS DS CL02 TS Resume SS
TT_R_MS DS CL03 TS Resume MS
DS CL03 Align
*
*
TT_STAT DS CL01 TS Table entry status
* A - active
* I - inactive
* R - resume (after 209)
* S - started
* C - completed
*
TT_IDX DS CL03 TS Index number
*
TT_E EQU *-TT_DSECT Entry length
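For readers who are less familiar with DSECT notation, the management array can be pictured as a list of small records. The following Python sketch is a simplified model and not a translation of Example 5-1. The class and function names are hypothetical, the resume time is omitted, and numbering the entries from 1 is an assumption. The status codes and the 255-entry limit come from the comments in Example 5-1.

```python
from dataclasses import dataclass

# Status codes listed in the comments of Example 5-1.
STATUS_CODES = {
    "A": "active",
    "I": "inactive",
    "R": "resume (after 209)",
    "S": "started",
    "C": "completed",
}
MAX_ENTRIES = 255  # "Maximum 255 entries" in the Example 5-1 header


@dataclass
class ChildEntry:
    index: int   # array index value, also used in the TS queue name
    start: str   # interval start as HHMMSSmmm (9 characters)
    end: str     # interval end as HHMMSSmmm (9 characters)
    status: str  # one of STATUS_CODES

    def __post_init__(self):
        if not 1 <= self.index <= MAX_ENTRIES:
            raise ValueError("index out of range")
        if self.status not in STATUS_CODES:
            raise ValueError("unknown status code")
        for stamp in (self.start, self.end):
            if len(stamp) != 9 or not stamp.isdigit():
                raise ValueError("timestamps must be 9 digits (HHMMSSmmm)")


def build_table(intervals, status):
    """Create one entry per (start, end) interval, numbered from 1 (an assumption)."""
    return [ChildEntry(i, s, e, status) for i, (s, e) in enumerate(intervals, start=1)]


table = build_table([("120000000", "120500000"), ("120500000", "121000000")], "A")
print([(entry.index, entry.start, entry.status) for entry in table])
# [(1, '120000000', 'A'), (2, '120500000', 'A')]
```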
The parent task then uses its EIBTRNID and EIBTASKN values, along with the array index value of the associated child task, to take these actions (Example 5-2):
1. Create a unique channel name.
2. Store the relevant search request information for this child task through a PUT CONTAINER.
3. Repeat these steps for each child task that needs to be initiated.
Example 5-2 TS and CHANNEL name in parent task
***********************************************************************
* TSQ name for Child response. The TSQ name will also serve as the *
* CHANNEL name. Child task uses the CHANNEL name as the TS response. *
***********************************************************************
DS 0F
TS_TSQ DS 0CL16 TS Queue name
TS_TRAN DS CL04 TS Tran ID
TS_TASKN DS CL07 TS Task Number
TS_IDX DS CL03 TS Index number
TS_SP DS CL02 TS Spaces
DS 0F
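The 16-character name in Example 5-2 is the concatenation of four fixed-width fields. The following Python sketch composes such a name to make the layout concrete. The field widths come from Example 5-2. Representing the task number and index as zero-padded decimal digits is an assumption, and the function name and the transaction ID `EPSP` are hypothetical.

```python
def build_queue_name(tran_id: str, task_number: int, index: int) -> str:
    """Compose the 16-character channel and TS queue name.

    Layout (widths from Example 5-2): 4-character transaction ID,
    7-character task number, 3-character index, 2 trailing spaces.
    Zero-padded decimal digits are an assumption about the field contents.
    """
    if len(tran_id) != 4:
        raise ValueError("transaction ID must be 4 characters")
    if not 0 <= task_number <= 9_999_999:
        raise ValueError("task number must fit in 7 digits")
    if not 0 <= index <= 999:
        raise ValueError("index must fit in 3 digits")
    return f"{tran_id}{task_number:07d}{index:03d}  "


name = build_queue_name("EPSP", 12345, 7)
print(repr(name), len(name))
# 'EPSP0012345007  ' 16
```

Because the parent's task number is part of the name, two concurrent requests for the same transaction cannot collide, and the index keeps the names of sibling child tasks distinct.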
The updated design diagram in Figure 5-3 shows the addition of these components.
Figure 5-3 Prepare Data for Child
5.3.2 Initiate Child
After creating the set of request containers, the parent program issues a START TRANSID for each child task (Example 5-3). The START TRANSID commands include the CHANNEL option, which indicates the constructed name that corresponds with the array index for each particular child task. After the START commands run, the parent task updates the Status Flag portion of the array entry for each child to indicate that it has been started.
Example 5-3 START TRANSID CHANNEL for parent task
***********************************************************************
* Issue START for Child task providing the Channel name, which will *
* be used as the TSQ response queue name. *
***********************************************************************
SY_0138 DS 0H
MVC TS_CHILD,EIBTRNID Move current TranID
MVC TS_CHILD+1,EC_CHILD Move child Identifier
*
MVI TT_STAT,C'S' Move 'started' indicator
MVC TS_IDX,TT_IDX Move index number
*
EXEC CICS START X
TRANSID (TS_CHILD) X
CHANNEL (TS_TSQ) X
NOHANDLE
*
When a child task starts, an ASSIGN CHANNEL command runs and gets the channel name that was created by the parent task. The child task then uses this name to create a Temporary Storage (TS) queue into which it posts response information (Example 5-4).
Example 5-4 ASSIGN CHANNEL in child task
***********************************************************************
* Issue ASSIGN for CHANNEL name, which is used as the TS response. *
***********************************************************************
SY_0000 DS 0H
EXEC CICS ASSIGN CHANNEL(CHANNEL) NOHANDLE
MVC TS_QNAME,CHANNEL Move CHANNEL name to TSQNAME
*
The child task issues the GET CONTAINER command to acquire the search request information that was passed from the parent task. Then the child task performs a GETMAIN SHARED operation to establish a location to store the result set from its search assignment (Example 5-5).
Example 5-5 Issue GETMAIN SHARED in child task
***********************************************************************
* Issue GETMAIN for Result Set in SHARED storage *
***********************************************************************
GM_0020 DS 0H
ST R14,GM_REG Save return register
*
EXEC CICS GETMAIN X
SET(R1) X
FLENGTH(G_LENGTH) X
INITIMG(HEX_00) X
SHARED X
NOHANDLE
*
L R14,GM_REG Load return register
BCR B'1111',R14 Return to caller
*
Then, the child task issues the STARTBR (Example 5-6) and READNEXT (Example 5-7) commands until it has processed its entire assigned interval.
Example 5-6 STARTBR for parallel I/O child task
***********************************************************************
* Issue STARTBR on Primary Column Index when this service is not *
* defined to the ECM zPARM as HP I/O ‘yes’. *
***********************************************************************
SY_0085 DS 0H
*
EXEC CICS STARTBR X
FILE (WF_FCT) X
RIDFLD (DF_KEY) X
GTEQ X
NOHANDLE
*
CLC EIBRESP,=F'13' NOTFND condition?
BRC B'1000',ER_20401 ... yes, STATUS(204)
OC EIBRESP,EIBRESP Normal condition?
BC B'0111',ER_50701 ... no, File I/O error
*
***********************************************************************
* GET for HP I/O or READNEXT for API method. *
***********************************************************************
SY_0090 DS 0H
CLI WS_HPIO,C'Y' ECM HP I/O enabled?
BRC B'0111',SY_0093 ... no, use EIP services
*
CLI HP_STAT,C'Y' HP I/O active?
BRC B'0111',ER_50712 ... no, exit stage left.
Example 5-7 READNEXT for parallel I/O child task
***********************************************************************
* Issue READNEXT until EOF or key range is exceeded. *
***********************************************************************
SY_0093 DS 0H
MVC WF_LEN,=H'32700' Move record length
L R10,FF_ADDR Load record address
*
EXEC CICS READNEXT X
FILE (WF_FCT) X
RIDFLD(DF_KEY) X
INTO (DF_DATA) X
LENGTH(WF_LEN) X
NOHANDLE
*
CLC EIBRESP,=F'20' ENDFILE condition?
BRC B'1000',SY_0899 ... yes, set EOF
CLC EIBRESP,=F'13' NOTFND condition?
BRC B'1000',SY_0899 ... yes, set EOF
*
OC EIBRESP,EIBRESP Normal condition?
BRC B'0111',ER_50702 ... no, File I/O error
*
 
An additional container is used during this process and it has the other search criteria for fields in the payload section of the records. This container is not relevant to the asynchronous mechanisms that we are discussing. It is omitted from this design description to maintain simplicity and avoid confusion.
After the child task processes its assigned interval and stores all of the results, the task runs a WRITEQ TS operation that uses the name that was obtained from the earlier ASSIGN CHANNEL command (Example 5-8).
Example 5-8 Issue WRITEQ TS for response in child task
***********************************************************************
* Put response information in TSQ for Parent task to process. *
***********************************************************************
TS_0030 DS 0H
EXEC CICS WRITEQ TS X
QNAME (TS_QNAME) X
FROM (TS_REC) X
ITEM (TS_ITEM) X
LENGTH (TS_LEN) X
MAIN X
NOHANDLE
*
TS_0099 DS 0H
L R14,TS_REG Load return register
BCR B'1111',R14 Return to caller
*
The data that is written to the TS queue includes status information and the address of the result set that is in the GETMAIN SHARED area. The updated design diagram in Figure 5-4 shows the addition of these components.
Figure 5-4 Initiate Child
5.3.3 Check for Completion
After all required child tasks are instantiated, the parent task begins to process the responses. The results must be returned to the client in chronological order, so the internal array is used to sequence the processing of responses.
As mentioned in the previous section, a child task completes its processing by issuing a WRITEQ TS command. This command creates a TS queue whose unique name is derived from information in the management array, which was created by the parent task. The TS queue name includes the array index value that is associated with the corresponding child task. The parent task uses this information to issue a READQ TS command against that unique TS queue name.
When the TS queue does not exist, this condition indicates that the child task has not completed. In this case, the parent task issues a STIMERM macro (SVC 47) with a default of 50 milliseconds. Then the parent task branches back to the READQ TS and attempts to process that child response again. It repeats this process until it receives a response from that child task. Then the parent task proceeds through the remaining entries in the array using the same method. If a total processing time of 30 seconds is reached, the request is ended. See Example 5-9 on page 32.
Example 5-9 Synchronicity in parent task
***********************************************************************
* Started entry found. Issue READQ for the TS_TSQ name. *
* If the TSQ is not available, issue a STIMERM for 50ms and continue *
* this cycle for 600 times (30 seconds), then issue a Time-Out. *
***********************************************************************
SY_0220 DS 0H
LA R1,TS_L Load TSQ record length
STH R1,TS_LEN Save TSQ record length
MVC TS_IDX,TT_IDX Move TSQ index number
*
EXEC CICS READQ TS X
QNAME (TS_TSQ) X
INTO (TS_REC) X
LENGTH(TS_LEN) X
ITEM (TS_ITEM) X
NOHANDLE
*
OC EIBRESP,EIBRESP Normal response?
BRC B'1000',SY_0230 ... yes, continue process
*
L R1,SM_COUNT Load STIMERM count
LA R1,1(,R1) Add 1
ST R1,SM_COUNT Save STIMERM count
C R1,SM_MAX Max STIMERM time?
BRC B'1011',SY_0282 ... yes, log a Time-Out
*
***********************************************************************
* STIMERM Macro does not support relative addressing, so I'm coding *
* the instructions with the necessary adjustments. *
***********************************************************************
OC MS_WAIT,MS_WAIT Wait set already?
BRC B'0111',*+10 ... yes, bypass default
MVC MS_WAIT,=F'5' ... no, set 50 ms to interval
LA R8,STIMERID Load STIMER ID
LA R9,MS_WAIT Load wait time
*
* STIMERM SET,BINTVL=(R9),WAIT=YES,ID=(R8)
*
SY_0225 DS 0H STIMERM invocation
LAE R1,SM_LIST Set up list address
MVC 0(4,R1),=X'11000001' Flag byte and LVL#
ST R8,4(,R1) Store ID address in SM_LIST
ST R9,8(,R1) Store Interval address in list
LA 0,4 Load Option byte into R0
SLL 0,24 Shift Option (bit 5 on)
SVC 47 Issue STIMERM SET SVC
*
BRC B'1111',SY_0220 Continue READQ for same Child
*
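The control flow of Example 5-9 is a bounded polling loop: try to read, and if the queue is not there yet, wait briefly and try again until a retry limit is reached. The following Python sketch models that loop only. A dictionary stands in for temporary storage, `time.sleep` stands in for the STIMERM wait, and all names are hypothetical. For simplicity, the sketch counts retries per call, whereas the service applies its 30-second limit to the request as a whole.

```python
import threading
import time


def wait_for_response(queues, queue_name, interval_s=0.05, max_attempts=600):
    """Poll until queue_name appears in queues, or raise TimeoutError.

    The defaults mirror the 50 ms wait and 600 retries (30 seconds) of Example 5-9.
    """
    attempts = 0
    while True:
        record = queues.get(queue_name)  # stands in for READQ TS
        if record is not None:
            return record
        attempts += 1
        if attempts >= max_attempts:
            raise TimeoutError(f"no response on {queue_name!r}")
        time.sleep(interval_s)  # stands in for the STIMERM wait


# Simulate a child task that posts its response after about 120 ms.
queues = {}
threading.Timer(0.12, queues.__setitem__, args=("EPSP0012345001  ", {"status": "C"})).start()
print(wait_for_response(queues, "EPSP0012345001  "))
# {'status': 'C'}
```

Polling keeps the parent logic simple, at the cost of up to one wait interval of added latency for each child response.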
The updated design diagram in Figure 5-5 shows the objects that are related to the process of checking for completion.
Figure 5-5 Check for Completion
5.3.4 Retrieve Data from Child
After the READQ TS command runs successfully, the address of the result set (a GETMAIN SHARED address) is obtained from the TS queue data. This result set from the child is sent to the client by using chunked message transfer through a WEB SEND command. After the WEB SEND command is complete, the parent increments the index by one. Then the parent processes the next array entry and repeats this process until the last child response has been sent to the client. The updated design diagram in Figure 5-6 shows these additional data-pull relationships.
Figure 5-6 Retrieve Data from Child
5.3.5 Perform Housekeeping
Along with managing client requests, task coordination, and response processing, the service must also do resource management. In particular, the service supervises and properly reclaims the various types of storage that are employed in this design. This process can be quite complex. When handled improperly, this process might cause storage to be orphaned, which has a negative impact on both the service and the CICS region or system.
Even under normal circumstances, TS queues and GETMAIN SHARED storage areas are not released or freed when the parent task or the child task terminates. For this reason, extra logic for housekeeping is necessary in the parent task. After it completes the processing of each child task response, the parent task must run a DELETEQ TS operation to clean up the TS queue (Example 5-10). Then it runs FREEMAIN to release storage that was directly obtained by the child process (Example 5-11).
Example 5-10 DELETEQ of response TS queue in parent task
***********************************************************************
* Issue DELETEQ TS for Child TS queue *
***********************************************************************
TS_0010 DS 0H
ST R14,TS_REG Save return register
*
EXEC CICS DELETEQ TS X
QNAME(TS_TSQ) X
NOHANDLE
*
L R14,TS_REG Load return register
BCR B'1111',R14 Return to caller
Example 5-11 FREEMAIN of SHARED storage in parent task
***********************************************************************
* Issue FREEMAIN for Response Array buffer *
* Check for resume status. *
***********************************************************************
SY_0280 DS 0H
MVC TS_IDX,TT_IDX Move index number
L R3,TS_RA_A Load Child message buffer
BRAS R14,FM_0010 Issue FREEMAIN
*
CLI TT_STAT,C'R' Resume interval?
BRC B'0111',SY_0290 ... no, get next entry
*
LH R1,TS_ITEM Load Item Number
A R1,=F'1' Add 1 to Item Number
STH R1,TS_ITEM Save Item Number
BRC B'1111',SY_0220 ... yes, process same entry
*
***********************************************************************
* Issue FREEMAIN for Child message buffer *
***********************************************************************
FM_0010 DS 0H
ST R14,FM_REG Save return register
LTR R3,R3 Zero address?
BRC B'1000',FM_0099 ... yes, bypass FREEMAIN
*
FM_0020 DS 0H
EXEC CICS FREEMAIN X
DATAPOINTER(R3) X
NOHANDLE
*
FM_0099 DS 0H
L R14,FM_REG Load return register
BCR B'1111',R14 Return to caller
*
However, abnormal conditions must also be considered. Any premature termination of the service might also orphan storage and lead to instability. To address this risk, another level of housekeeping is incorporated into the design.
An independent background task is defined as an Interval Control Element (ICE) to run periodically for each service. As described earlier in this chapter, the EIBTRNID and EIBTASKN values are used to establish unique channel names that tasks can use. These names are used in TS queue definitions. This information also is used by the background housekeeping process. The background task takes the following actions:
Issue INQUIRE TSQUEUE START and NEXT commands to browse TS queues.
Check EIBTRNID value to identify associated service instance.
Use the EIBTASKN value on an INQUIRE TASK command to determine whether parent task is active.
If parent task is no longer active,
 – Issue READQ TS against queue name to get GETMAIN SHARED address.
 – Issue FREEMAIN to release storage.
 – Issue DELETEQ TS to release TS queue.
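The background sweep can be modeled in a few lines. The following Python sketch is a simplified illustration and not the actual housekeeping program. Dictionaries stand in for temporary storage and for shared storage, a set stands in for the INQUIRE TASK check, and the function name, record layout, and queue names are hypothetical. The sketch assumes the 16-character name layout from Example 5-2, with the parent task number in positions 5 through 11.

```python
def sweep_orphans(ts_queues, shared_storage, active_tasks, tran_id):
    """Release queues and storage whose parent task is no longer active."""
    removed = []
    for name in list(ts_queues):  # stands in for INQUIRE TSQUEUE START/NEXT
        if not name.startswith(tran_id):
            continue  # queue belongs to another service
        task_field = name[4:11]
        if not task_field.isdigit():
            continue  # not a name built by this design
        if int(task_field) in active_tasks:
            continue  # parent still running; leave its queues alone
        record = ts_queues[name]  # stands in for READQ TS
        shared_storage.pop(record["address"], None)  # stands in for FREEMAIN
        del ts_queues[name]  # stands in for DELETEQ TS
        removed.append(name)
    return removed


ts_queues = {
    "EPSP0000101001  ": {"address": 0x1000},  # parent task 101 has ended
    "EPSP0000202001  ": {"address": 0x2000},  # parent task 202 is active
    "OTHR0000303001  ": {"address": 0x3000},  # a different service
}
shared_storage = {0x1000: b"results", 0x2000: b"results", 0x3000: b"results"}
print(sweep_orphans(ts_queues, shared_storage, active_tasks={202}, tran_id="EPSP"))
# ['EPSP0000101001  ']
```

The order of operations matters: the queue record holds the only reference to the shared storage address, so the storage must be freed before the queue is deleted.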
This process adds even more components to the design. The updated design diagram in Figure 5-7 shows these additional parts.
Figure 5-7 Perform Housekeeping
5.4 Summary
This chapter described the main components and provided high-level views of the original design. This design uses asynchronous methods to run searches in parallel and reach the I/O rates that the application requires. The projections of throughput rates that can be achieved by parallelizing the search activity held true. The objective of processing at least 1 million events per second was accomplished.
Even the simplified description of the design in this chapter reveals the complexity of the solution. In Chapter 6, “IBM CICS asynchronous solution” on page 37, the same general design is described, but it is based on the CICS asynchronous API instead of custom-built mechanics.