Chapter 5. Homegrown asynchronous solution


As described in Chapter 4, “Our initial sequential approach” on page 19, it quickly became apparent that serially searching the collection of events would not be sufficient. Our response time goals were still out of reach for our initial application. This chapter explores how we satisfied the SLA by using asynchronous methods to allow searches to run in parallel.

5.1 Parallel processing

The evolution of computer systems to multi-processor and multi-core systems led to the advent of parallelism in computing. Operations were split into smaller pieces that ran simultaneously on multiple processors or cores. This same concept can be applied to get around the constraints of sequential I/O rates in our scenario.

In Computer Science, Parallelism and Concurrency are similar concepts, but they are not identical. The general distinction is as follows:

•Parallelism involves literal simultaneous processing (in other words, each process on an individual core).

•Concurrency involves perceived simultaneous processing (in other words, time slicing of some of the processing).

In reality, both concepts can come into play for the solutions that are described in this document, depending on conditions. However, for the sake of simplicity, this document focuses on the Parallelism concept.

In Chapter 4, “Our initial sequential approach” on page 19, our preliminary tests established baseline rates for serial I/O in standard CICS APIs, then for native VSAM/RLS I/O operations.

We observed the rates that are listed in Table 5-1:

Table 5-1 Sequential I/O Rates

	CICS APIs	Native VSAM/RLS
Serial I/O rate per second	5,000	60,000

With parallelism, you can achieve higher I/O rates than a single serial process affords, which makes it possible to meet the prescribed processing goals. When multiple read tasks run simultaneously, the effective read rate becomes:

(the base serial read rate) X (the number of tasks)

Conversely, the number of parallel tasks that are needed to reach a particular throughput can be calculated as follows, rounded up to the next whole task:

(the total number of events per second) ÷ (the base serial read rate)

For example, Table 5-2 represents the number of parallel tasks that would be required to process 1 million events per second:

Table 5-2 Required Parallel Tasks

I/O Type	Total events (per second)	Serial read rate (events per second)	Parallel tasks (Total events ÷ Serial read rate, rounded up)
CICS API	1,000,000	5,000	200
Native VSAM/RLS	1,000,000	60,000	17

Looking at this another way, the total number of events processed per second by using parallel processing can be projected for a particular number of tasks for each I/O type. In the example in Table 5-3, 200 parallel tasks are assumed:

Table 5-3 Max read rate at 200 tasks

I/O Type	Serial read rate (events per second)	Task quantity	Effective read rate (Serial read rate * Task quantity)
CICS API	5,000	200	1,000,000
Native VSAM/RLS	60,000	200	12,000,000
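
The arithmetic behind Table 5-2 and Table 5-3 can be reproduced with a few lines of code. The following Python sketch is only an illustration of the two formulas; the function names are hypothetical and are not part of the service. It assumes that the serial read rates from Table 5-1 scale linearly with the number of tasks, which is the same idealization that the tables make.

```python
# Illustrative only: reproduces the arithmetic of Tables 5-2 and 5-3.
# Function names are hypothetical; linear scaling is an assumption.

SERIAL_RATES = {"CICS API": 5_000, "Native VSAM/RLS": 60_000}  # Table 5-1


def required_tasks(events_per_second: int, serial_rate: int) -> int:
    """Tasks needed to reach a target rate, rounded up to a whole task."""
    return (events_per_second + serial_rate - 1) // serial_rate


def effective_rate(serial_rate: int, tasks: int) -> int:
    """Projected read rate when `tasks` serial readers run in parallel."""
    return serial_rate * tasks


for io_type, rate in SERIAL_RATES.items():
    print(io_type, required_tasks(1_000_000, rate), effective_rate(rate, 200))
```

Note that the Native VSAM/RLS figure of 17 tasks is a rounded-up value: 1,000,000 ÷ 60,000 is approximately 16.7.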

These calculations suggest that the base requirement of processing 1 million events per second can be satisfied, and also that much higher throughputs can be achieved. However, managing parallelism like this presents many challenges. The number of tasks, the amount of work each task should do, the collation of results, and numerous other details must be considered. The following sections describe the mechanisms that we used to address these challenges.

5.2 Asynchronous processing

Parallel processing and asynchronous processing are distinct concepts, but they can be used together. In this use case, asynchronous processing mechanisms implement the parallelized I/O tasks. These tasks must be managed by an authoritative (or parent) process, and this process must remain free (non-blocked) so that it can manage multiple tasks within a specific request.

Additionally, search results from individual I/O tasks can be returned to the client in the order that they are processed by the service (in proper collation order, of course). That way, we avoid additional blocking to the client.

As covered in Chapter 2, “Background” on page 5, many techniques can be used to accomplish asynchrony in z/OS and CICS applications. The following section describes the techniques that are used by Walmart.

5.3 Design

As described in Chapter 3, “Requirements and challenges” on page 13, the search service must locate relevant events in the data store. Then it returns those events to the client in chronological order. Chapter 4, “Our initial sequential approach” on page 19 gave a description of the fundamental function of the search service, which still applies in this chapter. But now, we need the ability to split up and parallelize the I/O to the data store, so that we can process high volumes of entries within the response-time SLA. The remainder of this chapter focuses on the mechanics that accomplish this.

The following high-level steps show the sequence of events for parallel processing each search request:

1. Receive request from client

2. Use search parameters to identify and start I/O tasks

3. Monitor I/O tasks for completion

4. Gather and return results to client in the proper order

To accomplish these steps, the search service includes this task hierarchy:

•A parent task receives the client request and determines how many child tasks to start.

•The child tasks scour portions of the data store for relevant records and return those records to the parent task.

The parent task then returns the results to the client. This general design is depicted in Figure 5-1.

Figure 5-1 High-Level Design
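
The parent and child hierarchy can also be modelled outside CICS to make the coordination pattern concrete. The following Python sketch is an analogy only: threads stand in for CICS child tasks, an in-memory list stands in for the data store, and every name in it is hypothetical. It shows the one property that matters for this design, which is that the parent collects child results in the order in which the intervals were assigned, regardless of the order in which the children finish.

```python
# Analogy only: Python threads stand in for CICS child tasks.
# All names are hypothetical; this is not the service's implementation.
from concurrent.futures import ThreadPoolExecutor


def child_scan(events, lo, hi):
    """Child: return the events whose timestamp falls in [lo, hi]."""
    return [event for event in events if lo <= event[0] <= hi]


def parent_search(events, intervals):
    """Parent: start one child per interval, then gather results in order.

    `events` is a list of (timestamp, payload) tuples sorted by timestamp.
    `intervals` is a non-empty, chronologically ordered list of (lo, hi).
    """
    with ThreadPoolExecutor(max_workers=len(intervals)) as pool:
        futures = [pool.submit(child_scan, events, lo, hi)
                   for lo, hi in intervals]
        results = []
        # Iterating in submission order keeps the output chronological,
        # even if a later child completes before an earlier one.
        for future in futures:
            results.extend(future.result())
    return results
```

The sketch illustrates coordination, not performance: it makes no claim about I/O rates, which in the real service depend on the CICS and VSAM/RLS behavior described earlier.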

Other components are needed to coordinate the activities between the main sections of this design. The focus for this document is the asynchronous mechanisms. So, these components are now explored from the perspective of a standard asynchronous design pattern, which includes the following activities:

•Prepare Data for Child

•Initiate Child

•Check for Completion

•Retrieve Data from Child

•Perform Housekeeping

5.3.1 Prepare Data for Child

An HTTP GET request from the client initiates the EPS search service. The CICS WEB RECEIVE and WEB EXTRACT commands are run to retrieve the search criteria. Then, control is passed to the parent program logic.

As discussed in Chapter 3, “Requirements and challenges” on page 13, each search request is predicated on a time range. Based on this time range, a proprietary algorithm determines the quantity and duration of intervals in the data to search. These intervals equate to the number of asynchronous child processes that the parent task initiates.

All events in the data store are time sequenced. Each child task must process a fraction of the overall time range of the search. As a result, each child task is given two values: the key at which its STARTBR command is issued, and the key that marks the final READNEXT of its assigned range.
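
The algorithm that chooses the number and duration of intervals is proprietary and is not shown in this chapter. As a stand-in, the following Python sketch splits a time range into equal-width, contiguous intervals and produces the start and end key for each child. The equal-width policy, the function names, and the use of a nine-character HHMMSSmmm key (modelled on the start and end time fields of the array entry in Example 5-1) are assumptions made for illustration.

```python
# Sketch under stated assumptions: equal-width intervals and an
# HHMMSSmmm key. The real interval algorithm is proprietary.

def format_key(ms_since_midnight: int) -> str:
    """Format milliseconds since midnight as a 9-character HHMMSSmmm key."""
    hh, rem = divmod(ms_since_midnight, 3_600_000)
    mm, rem = divmod(rem, 60_000)
    ss, ms = divmod(rem, 1_000)
    return f"{hh:02d}{mm:02d}{ss:02d}{ms:03d}"


def split_range(start_ms: int, end_ms: int, children: int):
    """Split inclusive [start_ms, end_ms] into contiguous child intervals.

    Returns a list of (start_key, end_key) pairs in chronological order.
    """
    total = end_ms - start_ms + 1
    if children < 1 or children > total:
        raise ValueError("children must be between 1 and the range width")
    base, extra = divmod(total, children)
    intervals = []
    lo = start_ms
    for i in range(children):
        width = base + (1 if i < extra else 0)  # spread any remainder
        hi = lo + width - 1
        intervals.append((format_key(lo), format_key(hi)))
        lo = hi + 1
    return intervals
```

For example, `split_range(0, 3_599_999, 4)` divides the first hour of the day into four 15-minute intervals, each with its own start key and end key.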

Figure 5-2 gives a simple view of the search request example from Chapter 4, “Our initial sequential approach” on page 19.

Figure 5-2 Search intervals

Because we want to retrieve the results of the child tasks in chronological order, the parent task creates an array to store information about each child task. Various units of information are placed into this array and used by the parent task to manage the child tasks. Each element in this array includes these items:

•Array index value

•Interval start and end times that are assigned to child

•Status flag/value

Sample code that shows the creation of the array is shown in Example 5-1.

Example 5-1 Array in parent task

***********************************************************************
* Start TSQ Table List.                                               *
* Maximum 255 entries.                                                *
***********************************************************************
TT_DSECT DSECT
TT_START DS    0CL09
TT_S_HH  DS    CL02                TS Start HH
TT_S_MM  DS    CL02                TS Start MM
TT_S_SS  DS    CL02                TS Start SS
TT_S_MS  DS    CL03                TS Start MS (not part of TSQ name)
         DS    CL07                Align
TT_END   DS    0CL09               TS End time
TT_E_HH  DS    CL02                TS End HH
TT_E_MM  DS    CL02                TS End MM
TT_E_SS  DS    CL02                TS End SS
TT_E_MS  DS    CL03                TS End MS
         DS    CL07                Align
TT_209   DS    0CL09               TS Resume time
TT_R_HH  DS    CL02                TS Resume HH
TT_R_MM  DS    CL02                TS Resume MM
TT_R_SS  DS    CL02                TS Resume SS
TT_R_MS  DS    CL03                TS Resume MS
         DS    CL03                Align
TT_STAT  DS    CL01                TS Table entry status
*                                  A - active
*                                  I - inactive
*                                  R - resume (after 209)
*                                  S - started
*                                  C - completed
TT_IDX   DS    CL03                TS Index number
TT_E     EQU   *-TT_DSECT          Entry length

The parent task then uses its EIBTRNID and EIBTASKN values, along with the array index value of the associated child task, to take these actions (Example 5-2):

1. Create a unique channel name.

2. Store the relevant search request information for this child task through a PUT CONTAINER.

3. Repeat these steps for each child task that needs to be initiated.

Example 5-2 TS and CHANNEL name in parent task

***********************************************************************
* TSQ name for Child response. The TSQ name will also serve as the    *
* CHANNEL name. Child task uses the CHANNEL name as the TS response.  *
***********************************************************************
         DS    0F
TS_TSQ   DS    0CL16               TS Queue name
TS_TRAN  DS    CL04                TS Tran ID
TS_TASKN DS    CL07                TS Task Number
TS_IDX   DS    CL03                TS Index number
TS_SP    DS    CL02                TS Spaces
         DS    0F
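
The 16-character layout in Example 5-2 can be illustrated with a small Python sketch that assembles a name from its four fields. The function name and the sample values are hypothetical. The sketch also assumes that the task number and index are stored as zero-padded decimal digits, which the field lengths suggest but the listing does not confirm.

```python
# Sketch of the Example 5-2 name layout:
#   4-char transaction ID + 7-char task number + 3-char index + 2 spaces.
# Zero-padded decimal digits are an assumption, not confirmed by the source.

def make_channel_name(tranid: str, task_number: int, index: int) -> str:
    """Build the 16-character name shared by the channel and the TS queue."""
    if not 1 <= len(tranid) <= 4:
        raise ValueError("transaction ID must be 1 to 4 characters")
    if not 0 <= task_number <= 9_999_999:
        raise ValueError("task number must fit in 7 digits")
    if not 0 <= index <= 999:
        raise ValueError("index must fit in 3 digits")
    name = f"{tranid:<4}{task_number:07d}{index:03d}" + "  "
    assert len(name) == 16
    return name


# Hypothetical values: transaction EPS1, parent task 12345, child index 7.
print(repr(make_channel_name("EPS1", 12345, 7)))
```

Because the parent's transaction ID and task number are part of the name, names from concurrent requests do not collide, and the index distinguishes the children of one request.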

The updated design diagram in Figure 5-3 shows the addition of these components.

Figure 5-3 Prepare Data for Child

5.3.2 Initiate Child

After creating the set of request containers, the parent program issues a START TRANSID for each child task (Example 5-3). The START TRANSID commands include the CHANNEL option, which indicates the constructed name that corresponds with the array index for each particular child task. After the START commands run, the parent task updates the Status Flag portion of the array entry for each child to indicate that it has been started.

Example 5-3 START TRANSID CHANNEL for parent task

***********************************************************************
* Issue START for Child task providing the Channel name, which will   *
* be used as the TSQ response queue name.                             *
***********************************************************************
SY_0138  DS    0H
         MVC   TS_CHILD,EIBTRNID   Move current TranID
         MVC   TS_CHILD+1,EC_CHILD Move child Identifier
         MVI   TT_STAT,C'S'        Move 'started' indicator
         MVC   TS_IDX,TT_IDX       Move index number
         EXEC CICS START                                               X
               TRANSID (TS_CHILD)                                      X
               CHANNEL (TS_TSQ)                                        X
               NOHANDLE
*

When a child task starts, an ASSIGN CHANNEL command runs and gets the channel name that was created by the parent task. The child task then uses this name to create a Temporary Storage (TS) queue into which it posts response information (Example 5-4).

Example 5-4 ASSIGN CHANNEL in child task

***********************************************************************
* Issue ASSIGN for CHANNEL name, which is used as the TS response.    *
***********************************************************************
SY_0000  DS    0H
         EXEC CICS ASSIGN CHANNEL(CHANNEL) NOHANDLE
         MVC   TS_QNAME,CHANNEL    Move CHANNEL name to TSQNAME

The child task issues the GET CONTAINER command to acquire the search request information that was passed from the parent task. Then the child task performs a GETMAIN SHARED operation to establish a location to store the result set from its search assignment (Example 5-5).

Example 5-5 Issue GETMAIN SHARED in child task

***********************************************************************
* Issue GETMAIN for Result Set in SHARED storage                      *
***********************************************************************
GM_0020  DS    0H
         ST    R14,GM_REG          Save return register
*
         EXEC CICS GETMAIN                                             X
               SET(R1)                                                 X
               FLENGTH(G_LENGTH)                                       X
               INITIMG(HEX_00)                                         X
               SHARED                                                  X
               NOHANDLE
*
         L     R14,GM_REG          Load return register
         BCR   B'1111',R14         Return to caller
*

Then, the child task issues the STARTBR (Example 5-6) and READNEXT (Example 5-7) commands until it has processed its entire assigned interval.

Example 5-6 STARTBR for parallel I/O child task

***********************************************************************
* Issue STARTBR on Primary Column Index when this service is not      *
* defined to the ECM zPARM as HP I/O 'yes'.                           *
***********************************************************************
SY_0085  DS    0H
         EXEC CICS STARTBR                                             X
               FILE (WF_FCT)                                           X
               RIDFLD (DF_KEY)                                         X
               GTEQ                                                    X
               NOHANDLE
         CLC   EIBRESP,=F'13'      NOTFND condition?
         BRC   B'1000',ER_20401    ... yes, STATUS(204)
         OC    EIBRESP,EIBRESP     Normal condition?
         BC    B'0111',ER_50701    ... no, File I/O error
***********************************************************************
* GET for HP I/O or READNEXT for API method.                          *
***********************************************************************
SY_0090  DS    0H
         CLI   WS_HPIO,C'Y'        ECM HP I/O enabled?
         BRC   B'0111',SY_0093     ... no, use EIP services
         CLI   HP_STAT,C'Y'        HP I/O active?
         BRC   B'0111',ER_50712    ... no, exit stage left.

Example 5-7 READNEXT for parallel I/O child task

***********************************************************************
* Issue READNEXT until EOF or key range is exceeded.                  *
***********************************************************************
SY_0093  DS    0H
         MVC   WF_LEN,=H'32700'    Move record length
         L     R10,FF_ADDR         Load record address
         EXEC CICS READNEXT                                            X
               FILE (WF_FCT)                                           X
               RIDFLD(DF_KEY)                                          X
               INTO (DF_DATA)                                          X
               LENGTH(WF_LEN)                                          X
               NOHANDLE
         CLC   EIBRESP,=F'20'      ENDFILE condition?
         BRC   B'1000',SY_0899     ... yes, set EOF
         CLC   EIBRESP,=F'13'      NOTFND condition?
         BRC   B'1000',SY_0899     ... yes, set EOF
         OC    EIBRESP,EIBRESP     Normal condition?
         BRC   B'0111',ER_50702    ... no, File I/O error

An additional container is used during this process and it has the other search criteria for fields in the payload section of the records. This container is not relevant to the asynchronous mechanisms that we are discussing. It is omitted from this design description to maintain simplicity and avoid confusion.

After the child task processes its assigned interval and stores all of the results, the task runs a WRITEQ TS operation that uses the name that was obtained from the earlier ASSIGN CHANNEL command (Example 5-8).

Example 5-8 Issue WRITEQ TS for response in child task

***********************************************************************
* Put response information in TSQ for Parent task to process.         *
***********************************************************************
TS_0030  DS    0H
         EXEC CICS WRITEQ TS                                           X
               QNAME (TS_QNAME)                                        X
               FROM (TS_REC)                                           X
               ITEM (TS_ITEM)                                          X
               LENGTH (TS_LEN)                                         X
               MAIN                                                    X
               NOHANDLE
*
TS_0099  DS    0H
         L     R14,TS_REG          Load return register
         BCR   B'1111',R14         Return to caller
*

The data that is written to the TS queue includes status information and the address of the result set that is in the GETMAIN SHARED area. The updated design diagram in Figure 5-4 shows the addition of these components.

Figure 5-4 Initiate Child

5.3.3 Check for Completion

After all required child tasks are instantiated, the parent task begins to process the responses. The results must be returned to the client in chronological order, so the internal array is used to sequence the processing of responses.

As mentioned in the previous section, a child task completes its processing by issuing a WRITEQ TS command. This command creates a uniquely-named TS queue that is derived from information in the management array, which was created by the parent task. The information in the TS queue name includes the array index value that is associated with the corresponding child task. The parent task uses this information to issue a READQ TS command to that unique TS queue name.

When the TS queue does not exist, this condition indicates that the child task has not completed. In this case, the parent task issues a STIMERM macro (SVC 47) with a default of 50 milliseconds. Then the parent task branches back to the READQ TS and attempts to process that child response again. It repeats this process until it receives a response from that child task. Then the parent task proceeds through the remaining entries in the array using the same method. If a total processing time of 30 seconds is reached, the request is ended. See Example 5-9 on page 32.

Example 5-9 Synchronicity in parent task

***********************************************************************
* Started entry found. Issue READQ for the TS_TSQ name.               *
* If the TSQ is not available, issue a STIMERM for 50ms and continue  *
* this cycle for 600 times (30 seconds), then issue a Time-Out.       *
***********************************************************************
SY_0220  DS    0H
         LA    R1,TS_L             Load TSQ record length
         STH   R1,TS_LEN           Save TSQ record length
         MVC   TS_IDX,TT_IDX       Move TSQ index number
         EXEC CICS READQ TS                                            X
               QNAME (TS_TSQ)                                          X
               INTO (TS_REC)                                           X
               LENGTH(TS_LEN)                                          X
               ITEM (TS_ITEM)                                          X
               NOHANDLE
         OC    EIBRESP,EIBRESP     Normal response?
         BRC   B'1000',SY_0230     ... yes, continue process
         L     R1,SM_COUNT         Load STIMERM count
         LA    R1,1(,R1)           Add 1
         ST    R1,SM_COUNT         Save STIMERM count
         C     R1,SM_MAX           Max STIMERM time?
         BRC   B'1011',SY_0282     ... yes, log a Time-Out
***********************************************************************
* STIMERM Macro does not support relative addressing, so I'm coding   *
* the instructions with the necessary adjustments.                    *
***********************************************************************
         OC    MS_WAIT,MS_WAIT     Wait set already?
         BRC   B'0111',*+10        ... yes, bypass default
         MVC   MS_WAIT,=F'5'       ... no, set 50 ms to interval
         LA    R8,STIMERID         Load STIMER ID
         LA    R9,MS_WAIT          Load wait time
*        STIMERM SET,BINTVL=(R9),WAIT=YES,ID=(R8)
SY_0225  DS    0H                  STIMERM invocation
         LAE   R1,SM_LIST          Set up list address
         MVC   0(4,R1),=X'11000001' Flag byte and LVL#
         ST    R8,4(,R1)           Store ID address in SM_LIST
         ST    R9,8(,R1)           Store Interval address in list
         LA    0,4                 Load Option byte into R0
         SLL   0,24                Shift Option (bit 5 on)
         SVC   47                  Issue STIMERM SET SVC
         BRC   B'1111',SY_0220     Continue READQ for same Child
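
The control flow of Example 5-9 (read, and if the queue is absent, wait 50 milliseconds and retry up to 600 times) can be restated as a short Python sketch. This is a model of the loop only, not of CICS or STIMERM behavior. The function and parameter names are hypothetical, and the `read_queue` callable stands in for the READQ TS command by returning `None` while the queue does not exist. For simplicity the sketch counts polls for one child, whereas the text describes a 30-second budget for the request as a whole.

```python
# Model of the polling loop in Example 5-9. Names are hypothetical.
# read_queue() stands in for READQ TS: it returns None until the child
# has written its response, then returns the response record.
import time
from typing import Callable, Optional


def wait_for_child(read_queue: Callable[[], Optional[bytes]],
                   interval_s: float = 0.05,
                   max_polls: int = 600,
                   sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Poll until the child's response exists or the poll budget is spent."""
    polls = 0
    while True:
        record = read_queue()
        if record is not None:
            return record                 # normal response: continue
        polls += 1
        if polls >= max_polls:            # same test as C R1,SM_MAX
            raise TimeoutError("child did not respond in time")
        sleep(interval_s)                 # stands in for STIMERM WAIT=YES
```

Polling keeps the parent logic simple, at the cost of up to one interval of added latency per child and some wasted READQ TS attempts while a child is still running.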

The updated design diagram in Figure 5-5 shows the objects that are related to the process of checking for completion.

Figure 5-5 Check for Completion

5.3.4 Retrieve Data from Child

After the READQ TS command runs successfully, the address of the result set (a GETMAIN SHARED address) is obtained from the TS queue data. This result set from the child is sent to the client by using chunked message transfer through a WEB SEND command. After the WEB SEND command is complete, the parent increments the index by one. Then the parent processes the next array entry and repeats this process until the last child response has been sent to the client. The updated design diagram in Figure 5-6 shows these additional data-pull relationships.

Figure 5-6 Retrieve Data from Child

5.3.5 Perform Housekeeping

Along with managing client requests, task coordination, and response processing, the service must also do resource management. In particular, the service supervises and properly reclaims the various types of storage that are employed in this design. This process can be quite complex. When handled improperly, this process might cause storage to be orphaned, which has a negative impact on both the service and the CICS region or system.

Even under normal circumstances, TS queues and GETMAIN SHARED storage areas are not released or freed when the parent task or the child task terminates. Therefore, extra housekeeping logic is necessary in the parent task. After it completes the processing of each child task response, the parent task must run a DELETEQ TS operation to clean up the TS queue (Example 5-10). Then it runs FREEMAIN to release the shared storage that was obtained by the child task (Example 5-11).

Example 5-10 DELETEQ of response TS queue in parent task

***********************************************************************
* Issue DELETEQ TS for Child TS queue                                 *
***********************************************************************
TS_0010  DS    0H
         ST    R14,TS_REG          Save return register
*
         EXEC CICS DELETEQ TS                                          X
               QNAME(TS_TSQ)                                           X
               NOHANDLE
*
         L     R14,TS_REG          Load return register
         BCR   B'1111',R14         Return to caller

Example 5-11 FREEMAIN of SHARED storage in parent task

***********************************************************************
* Issue FREEMAIN for Response Array buffer                            *
* Check for resume status.                                            *
***********************************************************************
SY_0280  DS    0H
         MVC   TS_IDX,TT_IDX       Move index number
         L     R3,TS_RA_A          Load Child message buffer
         BRAS  R14,FM_0010         Issue FREEMAIN
         CLI   TT_STAT,C'R'        Resume interval?
         BRC   B'0111',SY_0290     ... no, get next entry
         LH    R1,TS_ITEM          Load Item Number
         A     R1,=F'1'            Add 1 to Item Number
         STH   R1,TS_ITEM          Save Item Number
         BRC   B'1111',SY_0220     ... yes, process same entry
***********************************************************************
* Issue FREEMAIN for Child message buffer                             *
***********************************************************************
FM_0010  DS    0H
         ST    R14,FM_REG          Save return register
         LTR   R3,R3               Zero address?
         BRC   B'1000',FM_0099     ... yes, bypass FREEMAIN
*
FM_0020  DS    0H
         EXEC CICS FREEMAIN                                            X
               DATAPOINTER(R3)                                         X
               NOHANDLE
*
FM_0099  DS    0H
         L     R14,FM_REG          Load return register
         BCR   B'1111',R14         Return to caller
*

However, abnormal conditions must also be considered. Any premature termination of the service might also orphan storage and lead to instability. To address this risk, another level of housekeeping is incorporated into the design.

An independent background task is defined as an Interval Control Element (ICE) to run periodically for each service. As described earlier in this chapter, the EIBTRNID and EIBTASKN values are used to establish unique channel names that tasks can use. These names are used in TS queue definitions. This information also is used by the background housekeeping process. The background task takes the following actions:

•Issue INQUIRE TSQUEUE START and NEXT commands to browse TS queues.

•Check EIBTRNID value to identify associated service instance.

•Use the EIBTASKN value on an INQUIRE TASK command to determine whether parent task is active.

•If parent task is no longer active,

– Issue READQ TS against queue name to get GETMAIN SHARED address.

– Issue FREEMAIN to release storage.

– Issue DELETEQ TS to release TS queue.
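
The background sweep that is listed above can be modelled with plain data structures to show the order of the checks. In the following Python sketch, a dictionary stands in for the set of TS queues (name to stored GETMAIN SHARED address), one set stands in for allocated shared storage, and another set stands in for the task numbers that INQUIRE TASK would report as active. All names and sample values are hypothetical, and the name layout is the one from Example 5-2 with the assumption that the task number is stored as decimal digits.

```python
# Model of the background housekeeping sweep. Names are hypothetical.
# ts_queues:      queue name -> address of the child's shared result set
# shared_storage: set of addresses that are currently allocated
# active_tasks:   task numbers that are still running

def sweep_orphans(ts_queues: dict, shared_storage: set,
                  active_tasks: set, service_tranid: str) -> list:
    """Reclaim queues and storage whose parent task is no longer active."""
    reclaimed = []
    for name in list(ts_queues):             # browse the TS queues
        if name[0:4] != service_tranid:      # not owned by this service
            continue
        task_field = name[4:11]
        if not task_field.isdigit():         # does not follow the layout
            continue
        if int(task_field) in active_tasks:  # parent still running
            continue
        address = ts_queues[name]            # READQ TS: get stored address
        shared_storage.discard(address)      # FREEMAIN the shared storage
        del ts_queues[name]                  # DELETEQ TS
        reclaimed.append(name)
    return reclaimed
```

Checking the owner and the parent's status before touching storage is the important design point: freeing an area that an active parent has not yet sent to the client would corrupt an in-flight response.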

This process adds even more components to the design. The updated design diagram in Figure 5-7 shows these additional parts.

Figure 5-7 Perform Housekeeping

5.4 Summary

This chapter has described the main components and provided high-level views of the original design. This design uses asynchronous methods to run the searches in parallel and reach the I/O rates that the application requires. The projections of throughput rates that can be achieved by parallelizing the search activity held true. The objective of processing at least 1 million events per second was accomplished.

Even the simplified description of the design in this chapter reveals the complexity of the solution. In Chapter 6, “IBM CICS asynchronous solution” on page 37, the same general design is described, but it is based on the CICS asynchronous API instead of custom-built mechanics.
