Operation
This chapter describes the basic tasks for running and maintaining zCX. It includes the following sections:
7.1, "Software maintenance"
7.2, "Automation"
7.3, "Backup and recovery"
7.4, "Diagnosis"
7.5, "Monitoring with RMF on zCX instance level"
7.6, "Configuring Grafana to monitor zCX containers"
7.1 Software maintenance
zCX itself is a component of z/OS, but the Docker images that are used within zCX are not part of z/OS. As a result, different methods are required to update the software levels of zCX versus updating the application software that runs within the containers.
7.1.1 Maintenance for zCX
The application of service to zCX is a two-step process. In the first step, you install the maintenance into the z/OS system. In the second step, the installed service has to be picked up by the individual zCX instances.
Install zCX service into the system
zCX is a component of z/OS, so fixes to zCX are delivered as PTFs and installed through SMP/E, the z/OS component that is most commonly used to install products and fixes. You can, and should, order and install maintenance for zCX as part of your normal maintenance procedures for z/OS as a whole.
 
For more information on SMP/E, see the following link in the IBM Knowledge Center:
https://www.ibm.com/support/knowledgecenter/SSLTBW_2.4.0/com.ibm.zos.v2r4.gim/gim.htm
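The exact JCL depends on your installation. The following job is only a rough sketch of an SMP/E APPLY in check mode; the job card, the CSI data set name, and the target zone name are placeholders that you must replace with the values that are used in your environment:
//SMPAPPLY JOB (ACCNT),LANGER,CLASS=A,MSGCLASS=T
//* Sketch only: replace the CSI data set name and the target zone name
//APPLY    EXEC PGM=GIMSMP
//SMPCSI   DD DISP=SHR,DSN=YOUR.GLOBAL.CSI
//SMPOUT   DD SYSOUT=*
//SMPRPT   DD SYSOUT=*
//SMPCNTL  DD *
  SET BOUNDARY(TARGET1).
  APPLY PTFS GROUPEXTEND CHECK.
/*
After you verify the CHECK run, you can run the same job without the CHECK operand to apply the PTFs.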
Activate Service for zCX instances
The installation of PTFs with SMP/E does not activate this service for any zCX instance that is provisioned. If you want to pick up the new service, it is not sufficient to restart an instance. Instead, to activate the service you must run the upgrade workflow in z/OSMF for each zCX instance separately.
To run the upgrade workflow, you must create a new workflow in z/OSMF as described in Chapter 4.8.2, “Select the zCX provisioning workflow” on page 61. The file path for the workflow definition file that upgrades a zCX instance is as follows:
/usr/lpp/zcx_zos/workflows/upgrade.xml
You can start the upgrade workflow without stopping the instance in advance. However, the final steps of the workflow require you to stop and restart the instance.
Step 1 of the workflow retrieves the instance information. Then, Step 2 checks for the upgrade version and shows you the currently active software level and also the level that is activated after this workflow runs, as shown in Figure 7-1 on page 143.
Figure 7-1 Upgrade workflow ROOT binary information
After Step 2, you can run the rest of the workflow with automation. The workflow halts as it reaches the steps to stop and restart the zCX instance. You must manually stop and restart the instance to finish the workflow.
When the instance comes up again, it displays messages about formatting of the root file system, as shown in Example 7-1.
Example 7-1 zCX Startup messages after upgrade
GLZV002I zCX ZCXVL01 is formatting available space for use by
DSN=ZCX.REDB.ZCXVL01.ROOT2
GLZV003I zCX ZCXVL01 formatting complete for
DSN=ZCX.REDB.ZCXVL01.ROOT2
You can then issue a MODIFY command DISPLAY,DISKVER to verify that the current root version is the expected one that was outlined as target ROOT binary information in the workflow:
Example 7-2 zCX modify command DISPLAY,DISKVER
F ZCXVL01,DISPLAY,DISKVER
GLZC008I Disk Version information for zCX instance ZCXVL01
DevNo Data Set Name Version
1 ZCX.REDB.ZCXVL01.ROOT2 20190730T123523Z
3.5.1 1.7.1 HZDC7C0 oa58015
2 ZCX.REDB.ZCXVL01.CONF 20190905T184238Z
3 ZCX.REDB.ZCXVL01.SWAP1 20190905T184152Z
4 ZCX.REDB.ZCXVL01.DATA1 20190905T184156Z
5 ZCX.REDB.ZCXVL01.DLOG1 20190905T184159Z
Total number of disks: 5
Rollback service
There is one more step at the end of the upgrade workflow, named On error run this step to restore backed up files. This step is not meant to be run under normal circumstances. You would use this step to switch back to the backup version of the root file system in the following immediate circumstances only:
The upgrade workflow had failed at some point. OR
The instance does not come up with the new level.
In contrast, if problems arise later (after the workflow has already been finished) you can run the rollback workflow to restore the previous service level. The file path for the definition file of this workflow is as follows:
/usr/lpp/zcx_zos/workflows/rollback.xml
7.1.2 Maintenance for containers
Software that runs within the containers of zCX must be serviced separately from zCX itself. There are many sources for these containers, starting with images pulled from Docker Hub or the proprietary registries of any vendor and continuing up to containers built by your organization. For this reason, the procedures to install and activate service to the software of a container might vary.
Maintenance without Docker swarm
Applying maintenance to a container instance normally replaces the image that is used to start this container with a newer one.
The examples in this section show the upgrade from Grafana 5.2.0-f2 to 5.2.0-f3. Grafana is a graphical data visualizer that we use to monitor our zCX instance in Chapter 7.6, “Configuring Grafana to monitor zCX containers” on page 166.
Notice that no persistent data is used for Grafana in Example 7-3 on page 144. This approach keeps the example simple and focused on the maintenance and not on setting up Grafana. If you have persistent data that is defined for your Grafana container, the docker run command reflects that. (For information on the use of persistent data, see Chapter 10, “Persistent data” on page 231.)
Example 7-3 Starting a Grafana container
VL1:/home/admin>docker run -d -p 3000:3000 --name grafana ibmcom/grafana-s390x:5.2.0-f2
d5bd946914840f7d8618c594f8676540b6e96b2ffacdd9598c9f0a6980897b0f
To update this container to a newer level, you must pull the newer version of the Docker image from your registry, by using the tag that points to the correct version. In Example 7-4 on page 144, we upgrade our Grafana from the previously installed version 5.2.0-f2 to the new version 5.2.0-f3.
Example 7-4 Pull a new version of Grafana from the registry
VL1:/home/admin>docker pull ibmcom/grafana-s390x:5.2.0-f3
5.2.0-f3: Pulling from ibmcom/grafana-s390x
b766debbc269: Pull complete
6a13c128fd9c: Pull complete
d6662734a61f: Pull complete
b6c6781ca03e: Pull complete
638b1f753742: Pull complete
Digest: sha256:752c2a42cec21c88c1821cd7afa9a6e593a5646ad035893adb0e03da350d2a0d
Status: Downloaded newer image for ibmcom/grafana-s390x:5.2.0-f3
Having done this, you must stop the running container, remove it, and run a container with the newer image version as shown in Example 7-5 on page 145.
Example 7-5 Stopping Grafana and restarting with new image level
VL1:/home/admin>docker stop grafana
grafana
 
VL1:/home/admin>docker rm grafana
grafana
 
VL1:/home/admin>docker run -d -p 3000:3000 --name grafana ibmcom/grafana-s390x:5.2.0-f3
08acc3fcd0f898f6b758072cdadcbcbdc0f364ddd91fb98a667c470b3caab2da
Any changes that were made inside the container are lost when the container is replaced. Those changes must be made again in the new container.
Therefore, avoid keeping customizations in files within the container. If it is not possible or not desired to specify the needed configuration through parameters on the docker run command, consider the following approach:
Mount a Docker volume to that container to hold the configurations.
For more information on this approach, see Chapter 10, “Persistent data” on page 231.
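The following lines are a minimal sketch of this approach for the Grafana example. The volume name is arbitrary, and /var/lib/grafana is assumed as the directory in which Grafana keeps its databases and plug-ins; adjust the mount point for your application:
docker volume create grafana-data
docker run -d -p 3000:3000 --name grafana -v grafana-data:/var/lib/grafana ibmcom/grafana-s390x:5.2.0-f2
When the container is later replaced with the 5.2.0-f3 image, you mount the same volume again, so the customizations survive the replacement.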
Maintenance through the use of Docker swarm
If you run your Docker container as a Docker swarm service, you can use the swarm capabilities to upgrade your container to a new software level. The preceding point about files that were changed within the container also applies when you apply maintenance by using Docker swarm.
For more details about setting up Docker swarm and a Docker swarm service see Chapter 11, “Swarm on zCX” on page 239. Example 7-6 uses the docker service create command to create a service that runs Grafana.
Example 7-6 Starting a Docker swarm service
VL1:/home/admin>docker service create --name grafana --replicas 1 -p 3000:3000 ibmcom/grafana-s390x:5.2.0-f2
xj4hbcpgcwp6hmoosirbdyrk5
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Service converged
 
VL1:/home/admin>docker service ps grafana
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
mqqiyxmgt8sr grafana.1 ibmcom/grafana-s390x:5.2.0-f2 sc74cn15.pbm.ihost.com Running Running 51 seconds ago
To update the level of a swarm service, you must first pull the image with the new level to the zCX instances where the service will be deployed as shown in the preceding section in Example 7-4 on page 144.
Then, you issue the docker service update command with the --image parameter and point to the new image level, as shown in Example 7-7 on page 146.
Example 7-7 Update Docker service to new image level
VL1:/home/admin>docker service update --image ibmcom/grafana-s390x:5.2.0-f3 grafana
grafana
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Service converged
 
VL1:/home/admin>docker service ps grafana
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
8sqvw3k0mwt2 grafana.1 ibmcom/grafana-s390x:5.2.0-f3 sc74cn15.pbm.ihost.com Running Running about a minute ago
zy5tf3y49goz \_ grafana.1 ibmcom/grafana-s390x:5.2.0-f2 sc74cn15.pbm.ihost.com Shutdown Shutdown about a minute ago
nqfqtgij1a2v \_ grafana.1 ibmcom/grafana-s390x:5.2.0-f2 sc74cn15.pbm.ihost.com Shutdown Shutdown 2 minutes ago
mqqiyxmgt8sr \_ grafana.1 ibmcom/grafana-s390x:5.2.0-f2 sc74cn15.pbm.ihost.com Shutdown Shutdown 6 minutes ago
If your swarm service consists of multiple instances, the docker service update command provides options to control the sequence, parallelism, and delay for updating the individual instances:
Table 7-1 Options to control docker service update
Option                             Meaning
--update-delay duration            Delay between updates (ns|us|ms|s|m|h)
--update-failure-action string     Action on update failure ("pause"|"continue"|"rollback")
--update-max-failure-ratio float   Failure rate to tolerate during an update
--update-monitor duration          Duration after each task update to monitor for failure (ns|us|ms|s|m|h)
--update-order string              Update order ("start-first"|"stop-first")
--update-parallelism uint          Maximum number of tasks updated simultaneously (0 to update all at once)
These options help you to ensure that only one instance is affected if an error occurs in the update process.
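For example, the following command (the values are only illustrative) updates one task at a time, waits 30 seconds between tasks, and rolls the service back if the update fails:
docker service update --image ibmcom/grafana-s390x:5.2.0-f3 --update-parallelism 1 --update-delay 30s --update-failure-action rollback grafana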
7.1.3 Building and maintaining your own image
In the prior examples, the Grafana image from Docker Hub was used directly. If you want to customize your Grafana image, you can easily build your own image based on the public image, and then add some configurations.
Creating the Dockerfile
The first step in building your own image is to create the Dockerfile within a separate directory. This directory must contain only the files that are used for the Docker image that you want to build.
Example 7-8 Creating a directory to build your own image
VL1:/home/admin>mkdir mygrafana
VL1:/home/admin>cd mygrafana
VL1:/home/admin/mygrafana>touch Dockerfile
VL1:/home/admin/mygrafana>vi Dockerfile
The last command of Example 7-8 starts the commonly used vi editor. For a first example, you can just put some environment variables into the file to control operation of Grafana. To enhance this example, we add a read.me file, which demonstrates how you can add file changes to a custom image. Example 7-9 shows an example of a Dockerfile to build a Grafana image, including an additional read.me file.
Example 7-9 Contents of Dockerfile for a first custom Grafana image
#Base Image is grafana
FROM ibmcom/grafana-s390x:5.2.0-f3
#Define Environment Variables
ENV GF_SERVER_ROOT_URL http://grafana.server.name
ENV GF_INSTALL_PLUGINS grafana-clock-panel,grafana-simple-json-datasource
#Insert a read.me file
COPY read.me /tmp/read.me
For the docker build command to succeed, the read.me file must exist in the mygrafana directory. Example 7-10 shows how to build the image.
Example 7-10 Building the custom Grafana image
VL1:/home/admin/mygrafana>ls -l
total 8
-rw-rw-r-- 1 admin admin 301 Sep 10 15:44 Dockerfile
-rw-rw-r-- 1 admin admin 96 Sep 10 15:53 read.me
 
VL1:/home/admin/mygrafana>docker build -t redb_grafana:1.1a .
Sending build context to Docker daemon 3.072kB
Step 1/4 : FROM ibmcom/grafana-s390x:5.2.0-f3
---> 2a9beae25fc2
Step 2/4 : ENV GF_SERVER_ROOT_URL http://grafana.server.name
---> Running in 345d40c0f526
Removing intermediate container 345d40c0f526
---> 8c1eef1e2f99
Step 3/4 : ENV GF_INSTALL_PLUGINS grafana-clock-panel,grafana-simple-json-datasource
---> Running in ec16249ded79
Removing intermediate container ec16249ded79
---> 873b60d2c00c
Step 4/4 : COPY read.me /tmp/read.me
---> 453eb921eb28
Successfully built 453eb921eb28
Successfully tagged redb_grafana:1.1a
 
VL1:/home/admin/mygrafana>docker images
REPOSITORY TAG IMAGE ID CREATED     SIZE
redb_grafana 1.1a f21a2eb2caf5 2 hours ago     942MB
ibm_zcx_zos_cli_image latest 4f886cf032e4 4 days ago     363MB
To build an image with a newer level of Grafana, change the FROM statement of the Dockerfile to reflect the new version that you want to use. The same is true for changed environment variables or for changes in the included files. You can then run a new docker build and create a new tag, such as redb_grafana:1.2a, as sketched in the following example.
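For example, after you change the FROM statement to point to a newer base image tag (the new tag itself depends on what is available in your registry), you rebuild and tag the image as follows:
VL1:/home/admin/mygrafana>docker build -t redb_grafana:1.2a .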
Distributing and deploying the image
After building a new image, you can start your container by using the new image as shown in Example 7-11.
Example 7-11 Starting a container with a new image
VL1:/home/admin>docker run -d -p 3000:3000 --name grafana redb_grafana:1.1a
065501676a8e7cdbad7468cfab8b746774a6d9828e7841c9d9e573b320a47414
Likewise, you can use this image to create a Docker service as described before.
To have this image available for different zCX instances, it must be tagged and sent to your registry as shown in Example 7-12. Each connected zCX can pull the image from that registry. (For information on setting up a private registry, see Chapter 6, “Private registry implementation” on page 107.)
Example 7-12 Pushing your image to the registry
VL1:/home/admin/mygrafana>docker tag redb_grafana:1.1a sc74cn15.pbm.ihost.com:5000/redb_grafana:1.1a
 
VL1:/home/admin/mygrafana>docker push sc74cn15.pbm.ihost.com:5000/redb_grafana:1.1a
The push refers to repository [sc74cn15.pbm.ihost.com:5000/redb_grafana]
7d8ef5f54238: Pushed
d4f91fc67624: Pushed
a467a22dc7e9: Pushed
a56b05807b81: Pushed
ce7cd584af89: Pushed
cd2bab4d3640: Pushed
1.1a: digest: sha256:3ea5907c8a5f6ea8e78f6d51124a3813f8ece2e3bfc66996692fbdb1f5875ec7 size: 1790
If you have not set up a registry to which to push the image, you can still make the image available in a different zCX. To do so, you save the image to a tar file as shown in Example 7-13, transfer it to the target zCX, and load it there.
Example 7-13 Save your image to a tar file and send to a different zCX instance
VL1:/home/admin>docker save -o /tmp/rbg11a.tar redb_grafana:1.1a
 
VL1:/home/admin>scp -P 8022 /tmp/rbg11a.tar [email protected]:/tmp/rbg11a.tar
[email protected]'s password:
rbg11a.tar
To use scp in this way, you first create a user on the receiving zCX by using the adduser command.
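For example, a command similar to the following one creates such a user; the user name is only illustrative, and depending on how your zCX CLI environment is set up, the command might require sudo:
VL2:/home/admin>sudo adduser transferuser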
On the target zCX instance, you can then load the image by using the docker load command as shown in Example 7-14 on page 149.
Example 7-14 Load your image from a tar file
VL2:/home/admin>docker load -i /tmp/rbg11a.tar
cd2bab4d3640: Loading layer [==================================================>] 240.3MB/240.3MB
ce7cd584af89: Loading layer [==================================================>] 114.6MB/114.6MB
a56b05807b81: Loading layer [==================================================>] 602.1MB/602.1MB
a467a22dc7e9: Loading layer [==================================================>] 4.608kB/4.608kB
d4f91fc67624: Loading layer [==================================================>] 37.04MB/37.04MB
a16efe2c6d33: Loading layer [==================================================>] 2.56kB/2.56kB
Loaded image: redb_grafana:1.1a
7.2 Automation
This section shows the information that is required to automate the zCX started task through a z/OS automation product such as System Automation for z/OS.
It also shows methods to automate the container workload that runs within a zCX instance, a topic that lies outside the scope of z/OS automation.
7.2.1 Automating zCX instances
If you want to automate the operation of zCX, you must define the dependencies of zCX, the start and stop commands, and the messages that indicate a successful start or stop to your automation product.
Starting zCX
zCX needs to read the start.json file from the zFS file system, and it must connect to the TCP/IP stack. These dependencies must be defined for starting zCX: OMVS and zFS must be up and running, and so must the TCP/IP stack.
To start the zCX instance, you use the start command that is provided by the provisioning workflow and which looks similar to Example 7-15:
Example 7-15 Starting a zCX instance
S GLZ,JOBNAME=ZCXVL01,CONF='/global/zcx/instances/ZCXVL01/start.json'
The instance has finished startup and is ready to work when the following message appears:
 
GLZB001I zCX instance ZCXVL01 initialization is complete. Code date 11/08/19.
Stopping zCX
To stop a zCX instance, you use the stop command that is provided by the provisioning workflow. Example 7-16 shows a typical stop command.
Example 7-16 Stopping a zCX instance
P ZCXVL01
When zCX has successfully stopped, it issues the following message:
 
GLZB002I zCX instance ZCXVL01 has ended.
The shutdown of a zCX might take some minutes to complete. Therefore, automation should wait at least 3 minutes before it takes more aggressive actions to stop the instance.
For a shutdown, zCX signals the containers that are running inside it to stop and gives them 10 seconds to end gracefully before they are forcibly terminated.
If zCX does not end gracefully, you can force it to stop by using the command that is shown in Example 7-17 on page 150:
Example 7-17 Forcing a zCX instance
FORCE ZCXVL01,ARM
In this case, you know that the zCX instance is down when the following message appears:
 
IEF450I ZCXVL01 ZCXVL01 - ABEND=SA22 U0000 REASON=00000000 TIME=16.59.02
Moving zCX to a different system
Instances of zCX are not bound to a specific system. You can start them on any system that has these characteristics:
Runs within the same sysplex.
Fulfills the prerequisites to run zCX.
Has access to the VSAM linear data sets of the zCX instance.
Has access to the folder in the UNIX file system that contains the start.json for this instance.
When you move a zCX instance from one system to another, make sure that the target system has enough free memory to satisfy the additional fixed memory requirement of the zCX instance.
7.2.2 Automating containers
Using the --restart option of docker run
Docker gives you the ability to start containers automatically after zCX completes its startup and to restart them after a failure. To achieve this, use the --restart option of the docker run command as shown in Example 7-18 on page 150.
Example 7-18 Using the --restart option on docker run command
VL1:/home/admin>docker run -d -p 3000:3000 --name grafana --restart always redb_grafana:1.1a
418c8df46ec61c58b6137f5fda381b1822b9457ac259748ee92620a1a398bbe5
The keywords for the --restart option are shown in Table 7-2.
Table 7-2 Keywords for --restart option of docker run
Keyword            Meaning
no                 No automated start of the container.
on-failure[:tries] Restart the container after a failure. Retry up to the specified number of tries, or indefinitely if no count is specified.
unless-stopped     Restart the container after a failure and also at startup of zCX, if the container was not in a stopped state when zCX was last stopped.
always             Restart the container after a failure and also at startup of zCX, regardless of the container's state when zCX was last stopped.
Docker delays the restart of a failing container by 100 ms and doubles this delay each time the container fails again. The delay is reset to 100 ms when the container stays up for at least 10 seconds after a restart.
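For example, the following variation of Example 7-18 restarts the Grafana container after a failure for at most five attempts:
VL1:/home/admin>docker run -d -p 3000:3000 --name grafana --restart on-failure:5 redb_grafana:1.1a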
Using Docker swarm
With Docker swarm, you define a service that runs your container. Multiple instances of the container can run in parallel. Also, you can use multiple zCX instances, even spread over different LPARs, to run your container instances.
To use Docker swarm, you first must set up the swarm, as described in Chapter 11, “Swarm on zCX” on page 239. When the swarm is set up, you can run a service in that swarm with the docker service create command as shown in Example 7-19.
Example 7-19 Create a docker service
VL1:/home/admin>docker service create -p 3000:3000 --name grafana redb_grafana:1.1a
image redb_grafana:1.1a could not be accessed on a registry to record
its digest. Each node will access redb_grafana:1.1a independently,
possibly leading to different nodes running different
versions of the image.
 
eg8w5ppyt2d1mmqj046w1bpu5
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Service converged
The preceding sample used a Docker swarm that is spread over two zCX instances with no private registry set up. For this reason, swarm issues a message that warns that each node accesses the image independently, which can lead to different nodes running different versions of the image.
If your container uses Docker volumes that are mounted to it, spreading your service over multiple zCX instances does not work properly, because volumes cannot be shared across multiple zCX instances.
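If such a service must run in a swarm anyway, one possible workaround (shown here only as a sketch) is to pin the service to the single zCX node that owns the volume by using a placement constraint:
docker service create --name grafana -p 3000:3000 --constraint 'node.hostname==sc74cn15.pbm.ihost.com' redb_grafana:1.1a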
7.3 Backup and recovery
Table 7-3 describes the two approaches for backing up the data for your zCX environment.
Table 7-3 Options for backing up zCX data
Option: Back up at the instance level.
Details: You can easily manage backup and recovery for a whole instance, but you cannot recover single containers within that instance.
Option: Back up at the container level.
Details: This is a more complex way to back up the containers and their data. You still must do the instance-level backup in case you must recover zCX itself. But to recover single containers without affecting the other containers that run in this zCX instance, you must have additional backups.
7.3.1 Backup and recovery on zCX instance level
The instance-level backup is a very convenient way to back up and recover a full zCX instance with all containers that run inside it. It is especially suitable to handle a zCX instance that matches one of these descriptions:
Only one container runs in it. OR
All containers that run in it belong to a single application.
Data belonging to a zCX instance
All data that makes up a zCX instance lives within the VSAM linear data sets that have been defined by the provisioning or the add-volume workflows. These VSAM linear data sets are the ones whose name starts with the high-level qualifier for this zCX. This qualifier is set during the provisioning workflow. Example 7-20 shows a list of these data sets.
Example 7-20 VSAM linear data sets that belong to a zCX instance
DSLIST - Data Sets Matching ZCX.REDB.ZCXVL01
Command ===>
Command - Enter "/" to select action
------------------------------------------------------
ZCX.REDB.ZCXVL01.CONF
ZCX.REDB.ZCXVL01.CONF.DATA
ZCX.REDB.ZCXVL01.DATA1
ZCX.REDB.ZCXVL01.DATA1.DATA
ZCX.REDB.ZCXVL01.DLOG1
ZCX.REDB.ZCXVL01.DLOG1.DATA
ZCX.REDB.ZCXVL01.ROOT
ZCX.REDB.ZCXVL01.ROOT.DATA
ZCX.REDB.ZCXVL01.SWAP1
ZCX.REDB.ZCXVL01.SWAP1.DATA
ZCX.REDB.ZCXVL01.ZFS
ZCX.REDB.ZCXVL01.ZFS.DATA
***************************** End of Data Set list
In addition, the zCX instance must know the directory locations of the following components:
The UNIX file system where the start.json for this instance is stored.
The properties file of the instance.
Except for the start.json file that is needed for the start of zCX, these files are not used at run time but rather for the z/OSMF workflows to maintain the zCX instance. Example 7-21 shows a list of these files.
Example 7-21 UNIX files and directories belonging to a zCX instance
Pathname . : /global/zcx/instances/ZCXVL01
EUID . . . : 311
Command Filename
-------------------------------------------
.
..
config
config.bkup
start.json
start.json.bkup
tmp
FFDC
ZCX-appliance-version
ZCX-ZCXVL01-user.properties
ZCX-ZCXVL01-user.properties.bkup
 
Pathname . : /global/zcx/cfg/properties
EUID . . . : 311
Command Filename
---------------------------------------------
ZCXVL01-user.properties
Be aware that there are techniques to attach external data to an application, for example, by using NFS or a database that is located outside of this zCX instance. This external data is not covered by the backup and recovery processes that are described here.
Backing up the data of a zCX instance
DFSMShsm is not able to back up all of the VSAM linear data sets that belong to your zCX instance while the instance is running, because the data sets are opened by zCX.
You can stop the instance to do the HSM backups, and then restart it. If that is not an option, you can back up the VSAM linear data sets by using DFSMSdss as shown in Example 7-22.
Example 7-22 Using DFSMSdss to manually back up the VSAM linear data sets for a zCX instance
//ZCXPRV8D JOB (ACCNT),LANGER,CLASS=A,MSGCLASS=T,MSGLEVEL=(1,1),
// REGION=32M
//BACKUP EXEC PGM=ADRDSSU
//BACKUP DD DISP=(,CATLG),DSN=ZCX.REDB.BACKUP.ZCXVL01,
// SPACE=(CYL,(5000,5000)),DSNTYPE=LARGE
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
DUMP DS(INCLUDE(ZCX.REDB.ZCXVL01.*)) -
OUTDDNAME(BACKUP) TOL(ENQF) CC(PREFERRED) ZCOMPRESS(PREFERRED)
You can include the zFS data set that belongs to this zCX instance in this backup, too. For this zFS, you also have the option of doing the backup by using your normal DFSMShsm procedures.
Be aware that the properties file of the instance might be located in a directory outside of the zFS that you are backing up. In that case, you must back up the properties file separately.
As of z/OS 2.4, you can use DFSMShsm to back up the UNIX files. The UNIX files don't contain data for the containers. Instead, they contain the configuration data for the z/OSMF workflows and the start script for zCX. The zCX instance itself does not change these files, except for the FFDC directory, where first failure data capture information is written by zCX for specific error events. Example 7-23 demonstrates the use of DFSMShsm to back up the UNIX files.
Example 7-23 Using DFSMShsm to back up the UNIX files for a zCX instance
//ZCXPRV8L JOB (ACCNT),LANGER,CLASS=A,MSGCLASS=T,MSGLEVEL=(1,1)
//USS EXEC PGM=BPXBATCH
//STDOUT DD SYSOUT=*
//STDERR DD SYSOUT=*
//STDIN DD DUMMY
//STDPARM DD *
SH
hbackup '/global/zcx/instances/ZCXVL01/*';
hbackup '/global/zcx/cfg/properties/ZCXVL01-user.properties';
Recover a zCX instance
To recover a zCX instance, you must recover all the VSAM linear data sets of that instance. The instance must be stopped for a recovery.
You then recover the VSAM linear data sets from a backup that was created with DFSMSdss as shown in Example 7-24 on page 154. The backup of the VSAM linear data sets also included the zFS that contains the configuration information of the zCX instance. You must unmount this zFS before doing the recovery, if it is still mounted. Also, after recovery, you must mount the zFS again.
If the zFS of the instance is not part of the backup, you can skip the UNMOUNT and REMOUNT steps.
Example 7-24 Using DFSMSdss to recover the VSAM linear data sets of a zCX instance
//ZCXPRV8R JOB (ACCNT),LANGER,CLASS=A,MSGCLASS=T,MSGLEVEL=(1,1),
// REGION=32M
//UNMOUNT EXEC PGM=IKJEFT01
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
UNMOUNT FILESYSTEM('ZCX.REDB.ZCXVL01.ZFS')
//RECOVER EXEC PGM=ADRDSSU
//BACKUP DD DISP=SHR,DSN=ZCX.REDB.BACKUP.ZCXVL01
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
RESTORE DS(INCLUDE(**)) INDD(BACKUP)
//REMOUNT EXEC PGM=IKJEFT01
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
MOUNT FILESYSTEM('ZCX.REDB.ZCXVL01.ZFS') TYPE(ZFS) MODE(RDWR) -
MOUNTPOINT('/global/zcx/instances/ZCXVL01') -
AUTOMOVE
 
Attention: Be aware that the backups were taken against open VSAM linear data sets. When you recover the instance from these backups, additional actions might be necessary to bring some of the containers' data inside the instance to a consistent point in time.
If the instance's zFS was not part of the restored data sets, you must recover those files separately. Again, be aware that the properties file of the instance might be located outside of the zFS that was backed up for this instance.
As of z/OS 2.4, you can use DFSMShsm to recover UNIX files on a file level as shown in Example 7-25.
Example 7-25 Using DFSMShsm to recover the UNIX files of a zCX instance
//ZCXPRV8R JOB (ACCNT),LANGER,CLASS=A,MSGCLASS=T,MSGLEVEL=(1,1)
//RECOVER EXEC PGM=IKJEFT01
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
HRECOVER '/global/zcx/instances/ZCXVL01/*' REPLACE
HRECOVER '/global/zcx/cfg/properties/ZCXVL01-user.properties' REPLACE
After this recovery has finished, you can start the zCX instance again.
7.3.2 Backup and recovery on container level
Backup and recovery at the container level is a more complex approach, and it might vary depending on the type of application and the type of data that is used by the application.
You also must make sure that the image of the container itself is backed up. Normally, the image should also exist in your registry. If for some reason, you have images that are not pushed to a registry, you can save the image to an external file by using the docker save command as shown in Example 7-13 on page 148.
If the container application provides its own methods for backing up the application's data, it is preferable that you use these methods for backup and recovery.
In other cases, you have various options for backing up container data depending on where it resides.
Backing up application data that resides in the container file system
You can use the docker cp command to copy data between the container file system and the zCX file system. In this way, you can back up application data into a backup directory. Then, in a second step, you tar these backed-up files. Example 7-26 shows how to do this kind of backup.
Example 7-26 Backup container file system data by using docker cp and tar
VL1:/home/admin>mkdir smallapp.backup
VL1:/home/admin>docker cp smallapp:/var/appdata ./smallapp.backup/appdata
VL1:/home/admin>cd smallapp.backup
VL1:/home/admin/smallapp.backup>tar -cf backup.tar appdata
VL1:/home/admin/smallapp.backup>tar -tvf backup.tar
drwxr-xr-x admin/admin 0 2019-09-11 15:01 appdata/
-rw-r--r-- admin/admin 35330 2019-09-11 15:01 appdata/more.data
-rw-r--r-- admin/admin 184214 2019-09-11 15:01 appdata/smallapp.log
-rw-r--r-- admin/admin 9 2019-09-11 14:59 appdata/smallapp.pid
-rw-r--r-- admin/admin 10550 2019-09-11 15:00 appdata/userdata.db
When you must restore this data, you unpack the tar file and use the docker cp command to copy the data back into the container again as shown in Example 7-27.
Example 7-27 Recover container file system data by using tar and docker cp
VL1:/home/admin/smallapp.backup>tar -xf backup.tar
VL1:/home/admin/smallapp.backup>docker cp appdata smallapp:/var
Backing up application data that resides in a Docker volume
Instead of keeping the data inside your container, you can define a Docker volume and mount it at the container directory that is to contain the data, as shown in Example 7-28.
Example 7-28 Using docker volume as persistent storage
VL1:/home/admin>docker volume create --name smallapp-appdata
smallapp-appdata
VL1:/home/admin>docker run -it --name smallapp -v smallapp-appdata:/var/appdata ubuntu
The advantage of using a Docker volume is the persistency of the data:
Without a Docker volume: If for any reason the container is stopped and removed, data that you store in the file system of the container itself is lost.
With a Docker volume: If you use a volume, the data resides outside the container and survives the stopping and removal of the container.
Backing up and recovering the data can be done in the same way as described in the preceding section.
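One way to do this even when the application container is stopped is to mount the volume into a temporary helper container, create the tar file there, and copy it out with docker cp. The following lines are only a sketch of this idea; the helper container name is arbitrary:
VL1:/home/admin>docker run --name volbackup -v smallapp-appdata:/appdata ubuntu tar -cf /backup.tar -C / appdata
VL1:/home/admin>docker cp volbackup:/backup.tar ./smallapp.backup/backup.tar
VL1:/home/admin>docker rm volbackup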
7.4 Diagnosis
This section describes ways to discover problems in the scope of zCX and to identify the component that might be the source of the problem. The second part of this section describes the procedures to get problem determination data.
7.4.1 Ways to check the health of zCX
If you suspect something might be wrong with your zCX instance, there are some places to look first for possible causes.
SSH into zCX
Check whether you are still able to log on to the zCX by using ssh. If you are not able to ssh into the zCX, you must search for symptoms from the outside. In this case, you can jump to “Checking the zCX joblog” on page 158.
Checking the top command
The UNIX top command shows memory and CPU statistics and usage information per process. Figure 7-2 on page 156 shows a sample output of this command.
Figure 7-2 Output of the top command
The top lines about %Cpu, Memory, and Swap show values for the whole zCX instance, not for the specific container that you are in. In contrast, the process list shows the processes of the container that you are currently working in.
 
Note: You are already within the ibm_zcx_zos_cli container when you log in to zCX. The ibm_zcx_zos_cli container provides the shell functions for zCX.
The top command updates its display periodically. To stop the display, press control-C.
In the ssh CLI where you are logged on, you can enter the top command to get a first view of the zCX instance. You can then enter the top command for the different containers by using the docker exec command as shown in Example 7-29. Most containers should be able to run the command. There might be special containers for which the command is not implemented.
Example 7-29 Execute top command within a container
docker exec -it registry top
In this way, you might identify a misbehaving process at this layer. Pay special attention to the system's memory and swap usage, and to the %CPU usage at the instance level. Also, monitor the %CPU and %MEM values of the processes within the different containers.
Checking Docker stats
If your checks in the top display do not give any hints about a problem situation, you can enter a similar command for the Docker containers that are running: the docker stats command, which shows output that is similar to Example 7-30.
Example 7-30 Docker stats output
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
52869f058a54 grafana 0.01% 20.93MiB / 1.675GiB 1.22% 4.34MB / 4.35MB 67.3MB / 200MB 17
6134d99615c6 prometheus 0.71% 23.45MiB / 1.675GiB 1.37% 2.44MB / 1.15MB 5.98MB / 8.19kB 16
2475f664e9aa cadvisor 9.79% 59.66MiB / 1.675GiB 3.48% 98.4MB / 1.71GB 104MB / 84MB 20
ddabf91531ac nodeexporter 0.00% 3.492MiB / 1.675GiB 0.20% 1.01kB / 0B 12MB / 0B 6
a0fb146c85ae registry 0.00% 5.504MiB / 1.675GiB 0.32% 3.74kB / 0B 111MB / 0B 13
cae0df807ac6 ibm_zcx_zos_cli 0.76% 27.42MiB / 1.675GiB 1.60% 534kB / 761kB 528MB / 4.1MB 19
This command shows you how the different containers behave in terms of CPU, memory, network, and disk activity. If you see unexpected numbers, you can dig deeper into the activities of the respective container.
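By default, docker stats refreshes its output continuously. If you want a single snapshot, for example to capture the values for later comparison, you can add the --no-stream option:
VL1:/home/admin>docker stats --no-stream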
Checking Docker logs
The docker logs command allows you to view the log of a container, as shown in Example 7-31 on page 158.
Example 7-31 Using the docker logs command
VL2:/home/admin>docker logs --tail 10 grafana
t=2019-09-11T19:41:47+0000 lvl=info msg="Initializing InternalMetricsService" logger=server
t=2019-09-11T19:41:47+0000 lvl=info msg="Initializing AlertingService" logger=server
t=2019-09-11T19:41:47+0000 lvl=info msg="Initializing HTTPServer" logger=server
t=2019-09-11T19:41:47+0000 lvl=info msg="Initializing CleanUpService" logger=server
t=2019-09-11T19:41:47+0000 lvl=info msg="Initializing NotificationService" logger=server
t=2019-09-11T19:41:47+0000 lvl=info msg="Initializing ProvisioningService" logger=server
t=2019-09-11T19:41:47+0000 lvl=info msg="Initializing RenderingService" logger=server
t=2019-09-11T19:41:47+0000 lvl=info msg="Initializing TracingService" logger=server
t=2019-09-11T19:41:47+0000 lvl=info msg="Initializing Stream Manager"
t=2019-09-11T19:41:47+0000 lvl=info msg="HTTP Server Listen" logger=http.server address=0.0.0.0:3000 protocol=http subUrl= socket=
This information might reveal more details about problems that the container might have experienced. You can limit the output of the command by using the options in Table 7-4 on page 158.
Table 7-4 Options for the docker logs command
Option         Meaning
--tail amount  Shows the given number of log lines from the end of the log. The default is all.
--since when   Starts showing the log from the given point in time, either in the format yyyy-mm-ddThh:mm:ss or in a relative format such as 30m for 30 minutes.
--until when   Stops showing the log at the given point in time.
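For example, the following commands (the time values are only illustrative) show the last 30 minutes of the Grafana log, or only a specific time window:
VL1:/home/admin>docker logs --since 30m grafana
VL1:/home/admin>docker logs --since 2019-09-11T19:00:00 --until 2019-09-11T19:30:00 grafana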
Checking the zCX joblog
The zCX started task writes its log information to SYSPRINT. For example, if there is an error in network communication, you might find it in this log. You might also see messages about memory constraints here. You can view the SYSPRINT output of your zCX task by using a spool display application such as SDSF. You should also review the joblog of the zCX started task.
Checking CPU consumption of zCX
You can use monitoring tools like RMF to look at the CPU consumption of zCX on a started-task level. It might indicate whether the task is affected by other work within the system. For more information on using RMF, see Section 7.5, “Monitoring with RMF on zCX instance level” on page 160.
Checking system health
In RMF, you can also check for memory constraints in the system. At this time, checking the memory consumption of your zCX task itself has little troubleshooting value, because zCX allocates and fixes all its memory at startup time. Nonetheless, checking the system’s overall memory usage might be helpful. For example, if your system uses most of its memory before zCX gets into the system, starting a zCX instance might constrain the system further. In turn, that affects the performance of zCX.
Besides memory, you can check other metrics to confirm system health, such as CPU usage, zIIP usage, DASD response times, and network response times.
7.4.2 Gathering problem data for zCX
If you must gather diagnostic data for deeper problem analysis, the following steps can help you to collect it.
MODIFY instance,DISPLAY,DISKVER
Issue this system console command to get information about the currently active root version of the zCX instance.
Example 7-32 MODIFY instance,DISPLAY,DISKVER
F ZCXVL01,DISPLAY,DISKVER
GLZC008I Disk Version information for zCX instance ZCXVL01
DevNo Data Set Name Version
1 ZCX.REDB.ZCXVL01.ROOT 20190730T123523Z
3.5.1 1.7.1 HZDC7C0 oa58015
2 ZCX.REDB.ZCXVL01.CONF 20190911T181812Z
3 ZCX.REDB.ZCXVL01.SWAP1 20190905T174439Z
4 ZCX.REDB.ZCXVL01.DATA1 20190905T174442Z
5 ZCX.REDB.ZCXVL01.DLOG1 20190905T174446Z
Total number of disks: 5
MODIFY instance,DISPLAY,CONFIG
This system console command shows some information about the allocated memory and CPU, and also the location of the start.json for this zCX instance.
Example 7-33 MODIFY instance,DISPLAY,CONFIG
F ZCXVL01,DISPLAY,CONFIG
GLZC003I Configuration information for zCX instance ZCXVL01
File Path: /global/zcx/instances/ZCXVL01/start.json
FFDC Path: /global/zcx/instances/ZCXVL01/FFDC
Dump Path: /global/zcx/instances/ZCXVL01/FFDC/zcx-guest.dmp
Memory size: 2GB
Number of CPUs: 4
Number of Disks: 5
Number of Networks: 1
CTRACE Parmlib Member: CTIGLZ00
MODIFY instance,DISPLAY,DISK
This system console command shows the sizes of all file systems (backed by VSAM linear data sets) that are allocated to the zCX instance.
Example 7-34 MODIFY instance,DISPLAY,DISK
F ZCXVL01,DISPLAY,DISK
GLZC004I Disk information for zCX instance ZCXVL01
DevNo Size Encrypted? Data set Name
1 4GB No ZCX.REDB.ZCXVL01.ROOT
2 3MB No ZCX.REDB.ZCXVL01.CONF
3 2GB No ZCX.REDB.ZCXVL01.SWAP1
4 20GB No ZCX.REDB.ZCXVL01.DATA1
5 1001MB No ZCX.REDB.ZCXVL01.DLOG1
Total number of disks: 5
MODIFY instance,DISPLAY,NET
This system console command shows the configured IP address and MTU size for the zCX instance.
Example 7-35 MODIFY instance,DISPLAY,NET
F ZCXVL01,DISPLAY,NET
GLZC005I Network information for zCX instance ZCXVL01
DevNo Stack MTU IP Address
0 TCPIP 1492 129.40.23.82
Total number of networks: 1
MODIFY instance,DUMP,GUEST
This system console command writes a dump of the Linux guest of zCX to the instance's directory in the z/OS UNIX file system.
Example 7-36 MODIFY instance,DUMP,GUEST
F ZCXVL01,DUMP,GUEST
GLZC010I A Dump Guest command for zCX instance ZCXVL01 has been accepted.
GLZC011I A complete dump of the guest memory for zCX
instance ZCXVL01 has been written to file
/global/zcx/instances/ZCXVL01/FFDC/zcx-guest.dmp
SVC dump
If the zCX task suffered an abend and wrote an SVC dump, that dump can also be a valuable source for problem diagnosis. If no dump is available and you need a dump of the started task, you can request one by running the command that is shown in Example 7-37:
Example 7-37 Requesting an SVC dump
DUMP COMM='meaningful title for dump'
020 IEE094D SPECIFY OPERAND(S) FOR DUMP COMMAND
 
20,JOBNAME=ZCXVL01,END
IEE600I REPLY TO 020 IS;JOBNAME=ZCXVL01,END
IEA794I SVC DUMP HAS CAPTURED:
DUMPID=007 REQUESTED BY JOB (*MASTER*)
DUMP TITLE=meaningful title for dump
IEA611I COMPLETE DUMP ON DUMP.D190912.H19.SC74.#MASTER#.S00007
DUMPID=007 REQUESTED BY JOB (*MASTER*)
FOR ASIDS(0001,00B7)
INCIDENT TOKEN: PLEX75 SC74 09/12/2019 19:36:03
The IEA611I message tells you which data set your dump has been written to.
Joblog and Syslog
You should save the joblog of the zCX instance that you want to diagnose. Also, you should extract the syslog from the time that the problem occurred.
7.5 Monitoring with RMF on zCX instance level
Monitoring of the zCX environment consists of two areas. On the one hand, you can monitor zCX as a started task within z/OS, and on the other hand you can monitor the containers that run within that zCX instance.
The main metrics that you might want to monitor from outside of zCX are the CPU consumption (CP and zIIP) and the disk I/O rates and response times. Memory is not something that you would monitor for the zCX started task, because the entire memory of the task is fixed in real storage when zCX starts.
If RMF is available on your system, you can access it from the z/OS system programmer applications panel by using option 12 of the ISPF Primary Options Panel, which is shown in Example 7-38.
Example 7-38 ISPF Primary Options Panel
Menu Utilities Compilers Options Status Help
--------------------------------------------------------------------------
ISPF Primary Option Menu
Option ===> 12
0 Settings Terminal and user parameters User ID . : ZCXPRV8
1 View Display source data or listings Time. . . : 04:18
2 Edit Create or change source data Terminal. : 3278
3 Utilities Perform utility functions Screen. . : 1
4 Foreground Interactive language processing Language. : ENGLISH
5 Batch Submit job for language processing Appl ID . : PDF
6 Command Enter TSO or Workstation commands TSO logon : IKJACCT
7 Dialog Test Perform dialog testing TSO prefix: ZCXPRV8
9 IBM Products IBM program development products System ID : SC74
10 SCLM SW Configuration Library Manager MVS acct. : ACCNT#
11 Workplace ISPF Object/Action Workplace Release . : ISPF 7.4
12 z/OS System z/OS system programmer applications
13 z/OS User z/OS user applications
Enter X to Terminate using log/list defaults
After you navigate to the z/OS system programmer applications panel, select option 9, which is shown in Example 7-39 on page 161.
Example 7-39 ISPF z/OS System Programmer Primary Option Menu
z/OS System Programmer Primary Option Menu
Option ===> 9
1 GDDM PQM GDDM Print Queue Manager
2 HCD HCD I/O configuration
5 APPC Admin APPC Administration Dialog
6 WLM Work Load Manager
7 FFST FFST dump formatting
8 Infoprint Srv Infoprint Server
9 RMF RMF
10 SMP/E SMP/E
11 TCP/IP NPF TCP/IP NPF
Within RMF, we focus on Monitor III, which you select with option 3. You can access complete documentation for the available RMF reports through the following IBM Knowledge Center link: https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.4.0/com.ibm.zos.v2r4.erbb500/abstract.htm
Example 7-40 RMF Primary Panel
RMF - Performance Management z/OS V2R4 RMF
Selection ===> 3
Enter selection number or command on selection line.
1 Postprocessor Postprocessor reports for Monitor I, II, and III (PP)
2 Monitor II Snapshot reporting with Monitor II (M2)
3 Monitor III Interactive performance analysis with Monitor III (M3)
U USER User-written applications (add your own ...) (US)
R RMF SR Performance analysis with the Spreadsheet Reporter
N News What's new in z/OS V2R4 RMF
T TUTORIAL X EXIT
RMF Home Page: http://www.ibm.com/systems/z/os/zos/features/rmf/
5650-ZOS Copyright IBM Corp. 1994, 2019.
Licensed Materials - Property of IBM
Example 7-41 shows the primary menu for RMF Monitor III.
Example 7-41 RMF Monitor III main menu
RMF Monitor III Primary Menu z/OS V2R4 RMF
Selection ===>
Enter selection number or command on selection line.
S SYSPLEX Sysplex reports and Data Index (SP)
1 OVERVIEW WFEX, SYSINFO, and Detail reports (OV)
2 JOBS All information about job delays (JS)
3 RESOURCE Processor, Device, Enqueue, and Storage (RS)
4 SUBS Subsystem information for HSM, JES, and XCF (SUB)
U USER User-written reports (add your own ...) (US)
O OPTIONS T TUTORIAL X EXIT
5650-ZOS Copyright IBM Corp. 1986, 2019.
Licensed Materials - Property of IBM
7.5.1 RMF overview display
First, select the Overview menu and then Job Usage as shown in the following examples.
Example 7-42 RMF Overview Report Selection Menu
RMF Overview Report Selection Menu
Selection ===>
Enter selection number or command for desired report.
Basic Reports
1 WFEX Workflow/Exceptions (WE)
2 SYSINFO System information (SI)
3 CPC CPC capacity
Detail Reports
4 DELAY Delays (DLY)
4A USAGE Job Usage (USG)
5 GROUP Group response time breakdown (RT)
6 ENCLAVE Enclave resource consumption and delays (ENCL)
7 OPD OMVS process data
10 SPACEG Storage space (SPG)
11 SPACED Disk space (SPD)
12 LOCKSP Spin locks (LSP)
13 LOCKSU Suspend locks (LSU)
 
Example 7-43 RMF Job Usage
RMF V2R4 Job Oriented Usage Line 1 of 143
Command ===> Scroll ===> CSR
Samples: 100 System: SC74 Date: 09/15/19 Time: 20.15.00 Range: 100 Se
Service --- I/O --- --- CPU --- - Storage - ----- QScan ----
Jobname CX Class Conn EXCP Total TCB Total Fixed Total Resct Time
ZCXED01 SO SYSSTC 197.0 0.00 40.47 40.10 544K 527K 0 0.000 0
ZCXME02 SO SYSSTC 176.5 0.00 45.56 45.27 545K 527K 0 0.000 0
ZCXJN04 SO SYSSTC 169.9 0.00 45.32 44.97 544K 527K 0 0.000 0
ZCXZB01 SO SYSSTC 169.3 0.00 43.89 43.47 545K 527K 0 0.000 0
ZCXRJ02 SO SYSSTC 150.4 0.00 45.80 45.50 545K 527K 0 0.000 0
ZCXEM01 SO SYSSTC 135.0 0.07 46.44 46.06 544K 527K 0 0.000 0
ZCXZB02 SO SYSSTC 130.3 0.00 48.57 48.36 545K 527K 0 0.000 0
ZCXME01 SO SYSSTC 123.1 0.00 57.56 57.01 1072K 1053K 0 0.000 0
ZCXRJ01 SO SYSSTC 73.65 0.00 81.05 80.80 545K 527K 0 0.000 0
ZCXSM01 SO SYSSTC 62.87 0.04 44.92 44.53 545K 527K 0 0.000 0
XCFAS S SYSTEM 0.366 5.64 0.01 0.01 7822 2129 0 0.000 0
ZCXPRV8 T TSO1 0.347 5.72 0.04 0.04 161 23 1 0.000 241
ZCXVL01 SO SYSSTC 0.209 0.19 2.84 2.84 544K 527K 0 0.000 0
RMFGAT SO SYSSTC 0.124 0.08 0.16 0.16 29138 255 0 0.000 0
OMVS S SYSTEM 0.032 3.48 0.11 0.11 641K 5895 0 0.000 0
CATALOG S SYSTEM 0.021 0.60 0.01 0.01 4579 698 0 0.000 0
As you can see in Example 7-43, RMF shows information about disk I/O, CPU usage, memory usage, and GRS Queue Scans for each task in the system. You can use the F10 and F11 keys to scroll through the time intervals. This scrolling action is available in all of the RMF panels that show a time range in the heading.
The I/O column shows the cumulative time in seconds within the interval that the task was connected to a device. You can also see the average number of EXCPs (Execute Channel Programs) that were executed per second within the interval.
The CPU column shows the time in seconds that the task used a processor in the interval. The first value is the sum of all processor usage. The second value shows only the processor time that was consumed under the task's TCBs. This display does not distinguish between zIIP and CP processors.
The Storage column displays the number of 4 KB memory frames that were allocated to the task and the number of these frames that are fixed in main memory. For zCX, almost all memory is fixed all the time, so these values do not vary significantly over time.
The QScan column displays information about GRS queue scans that were performed by the task in the interval. For zCX, this value is normally zero.
7.5.2 RMF CPC capacity
A second view that you can choose from the RMF Overview Report Selection Menu is the CPC capacity report. This report shows the processor usage of the different LPARs on the machine for the different processor types.
Example 7-44 RMF CPC Capacity
RMF V2R4 CPC Capacity Line 1 of 46
Command ===> Scroll ===> CSR
Samples: 100 System: SC74 Date: 09/16/19 Time: 04.35.00 Range: 100 Se
Partition: ARIES22 8561 Model 716
CPC Capacity: 2953 Weight % of Max: **** 4h Avg: 220 Group: N/A
Image Capacity: 1477 WLM Capping %: 0.0 4h Max: 261 Limit: N/A
MT Mode IIP: 2 Prod % IIP: 99.7 AbsMSUCap: N
Partition --- MSU --- Cap Proc Logical Util % - Physical Util % -
Def Act Def Num Effect Total LPAR Effect Total
*CP 46.0 0.8 62.6 63.4
ARIES02 0 1477 N N N 8.0 100 100 0.0 50.0 50.0
ARIES16 0 1 N N N 2.0 0.4 0.4 0.0 0.0 0.1
ARIES2B 0 98 N N N 8.0 6.7 6.9 0.1 3.3 3.5
ARIES21 0 2 N N N 2.0 0.6 0.6 0.0 0.1 0.1
ARIES22 0 250 N N N 4.0 33.8 34.0 0.0 8.5 8.5
ARIES23 0 7 N N N 4.0 0.9 1.0 0.0 0.2 0.3
ARIES24 0 4 N N N 4.0 0.5 0.5 0.0 0.1 0.1
ARIES25 0 4 N N N 4.0 0.5 0.6 0.0 0.1 0.1
ARIES26 0 2 N N N 2.0 0.5 0.6 0.0 0.1 0.1
ARIES27 0 2 N N N 2.0 0.7 0.7 0.0 0.1 0.1
ARIES28 0 1 N N N 4.0 0.1 0.2 0.0 0.0 0.0
ARIES29 0 1 N N N 2.0 0.2 0.3 0.0 0.0 0.0
PHYSICAL 0.5 0.5
*IFL 70.0 0.0 13.9 13.9
ARIES1B N N N 8.0 100 100 0.0 13.8 13.8
ARIES12 N N N 8.0 0.1 0.1 0.0 0.0 0.0
ARIES13 N N N 16.0 0.0 0.1 0.0 0.0 0.0
ARIES14 N N N 8.0 0.1 0.1 0.0 0.0 0.0
ARIES15 N N N 16.0 0.0 0.1 0.0 0.0 0.0
ARIES17 N N N 6.0 0.3 0.4 0.0 0.0 0.0
ARIES18 N N N 4.0 0.1 0.2 0.0 0.0 0.0
ARIES28 N N N 4.0 0.0 0.0 0.0 0.0 0.0
PHYSICAL 0.0 0.0
*ICF 8.0 0.1 0.0 0.1
ARIES2C N N N 2.0 0.0 0.0 0.0 0.0 0.0
ARIES2D N N N 2.0 0.0 0.0 0.0 0.0 0.0
ARIES2E N N N 2.0 0.0 0.0 0.0 0.0 0.0
ARIES2F N N N 2.0 0.0 0.0 0.0 0.0 0.0
PHYSICAL 0.1 0.1
*IIP 33.0 0.0 44.8 44.8
ARIES02 N N N 1.0 100 100 0.0 6.2 6.2
ARIES2B N N N 4.0 4.1 4.1 0.0 1.0 1.0
ARIES21 N N N 2.0 0.3 0.3 0.0 0.0 0.0
ARIES22 N N N 4.0 99.8 99.8 0.0 24.9 24.9
ARIES23 N N N 4.0 49.8 49.8 0.0 12.5 12.5
ARIES24 N N N 4.0 0.1 0.1 0.0 0.0 0.0
ARIES25 N N N 4.0 0.2 0.2 0.0 0.1 0.1
ARIES26 N N N 2.0 0.0 0.0 0.0 0.0 0.0
ARIES27 N N N 2.0 0.0 0.0 0.0 0.0 0.0
ARIES28 N N N 4.0 0.0 0.0 0.0 0.0 0.0
ARIES29 N N N 2.0 0.0 0.0 0.0 0.0 0.0
PHYSICAL 0.0 0.0
Example 7-44 shows one table for each type of processor. Each table starts with a summary line for that processor type where the processor type is preceded with an asterisk, like *CP. This line shows the number of processors of that type that are physically available in this machine and also the overall utilization of this processor pool.
After this summary line, one line per LPAR follows, for all LPARs for which at least one processor of this type is defined. In these LPAR statistics lines, you see the number of logical processors of that type that are defined to the LPAR. You also view their utilization as seen by the LPAR itself (logical utilization); and the percentage of the overall physical utilization.
If a processor type on an LPAR has a very high logical percentage, it means that the system is using nearly all processor resources that are defined to this system. If the physical percentage of a processor pool sum shows a very high percentage, this means that the machine has used almost all installed capacity of that processor type.
Very high percentages can cause poor responsiveness for your zCX instance. zCX itself runs mainly on zIIP processors. Nonetheless, it depends on other components of the z/OS system that run on CPs, such as TCP/IP.
7.5.3 RMF job information
To check performance of the zCX instance, navigate back to the RMF III main menu and choose option 2 (JOBS). You must enter the name of your zCX instance in the Jobname field before you select any options.
Example 7-45 RMF Job Report Selection Menu
RMF Job Report Selection Menu
Selection ===> 1
Enter selection number or command and jobname for desired job report.
Jobname ===> ZCXED01_
1 DEVJ Delay caused by devices (DVJ)
1A DSNJ .. Data set level (DSJ)
2 ENQJ Delay caused by ENQ (EJ)
3 HSMJ Delay caused by HSM (HJ)
4 JESJ Delay caused by JES (JJ)
5 JOB Delay caused by primary reason (DELAYJ)
6 MNTJ Delay caused by volume mount (MTJ)
7 MSGJ Delay caused by operator reply (MSJ)
8 PROCJ Delay caused by processor (PJ)
9 QSCJ Delay caused by QUIESCE via RESET command (QJ)
10 STORJ Delay caused by storage (SJ)
11 XCFJ Delay caused by XCF (XJ)
These reports can also be selected by placing the cursor on the
corresponding delay reason column of the DELAY or JOB reports and
pressing ENTER or by using the commands from any panel.
Selection 1 (DEVJ) shows the main device delay that is affecting the selected job. This information might be of greatest value for problem determination in zCX.
Example 7-46 RMF Job Delays (Device)
RMF V2R4 Job Delays Line 1 of 1
Command ===> Scroll ===> CSR
Samples: 100 System: SC74 Date: 09/15/19 Time: 20.15.00 Range: 100 Sec
Job: ZCXED01 Requested delay: Excessive pending time on volume BH5ZCD.
Probable causes: 1) Contention with another system for use of the volume.
2) Overutilized channel, control unit or head of string.
-------------------------- Volume BH5ZCD Device Data --------------------------
Number: 09959 Active: 82% Pending: 66% Average Users
Device: 33909 Connect: 16% Delay DB: 58% Delayed
Shared: Yes Disconnect: 0% Delay CM: 1% 0.3
PAV: 6.6H
--------------------------- Job Performance Summary ---------------------------
Service WFL -Using%- DLY IDL UKN ---- % Delayed for ---- Primary
CX ASID Class P Cr % PRC DEV % % % PRC DEV STR SUB OPR ENQ Reason
SO 0175 SYSSTC 1 82 53 45 21 0 20 13 8 0 0 0 0 ZCXRJ02
If the task is delayed for I/O, this display shows the most active device for the task and suggests possible reasons for the delay.
Selection 8 (PROCJ) provides a similar display for processor delays.
Example 7-47 RMF Job Delays (CPU)
RMF V2R4 Job Delays Line 1 of 1
Command ===> Scroll ===> CSR
Samples: 100 System: SC74 Date: 09/15/19 Time: 20.15.00 Range: 100 Sec
Job: ZCXED01 Primary delay: Job is waiting to use the processor.
Probable causes: 1) Higher priority work is using the system.
2) Improperly tuned dispatching priorities.
------------------------- Jobs Holding the Processor --------------------------
Job: ZCXRJ02 Job: ZCXSM01 Job: ZCXZB01
Holding: 11% Holding: 10% Holding: 10%
PROC Using: 67% PROC Using: 62% PROC Using: 69%
DEV Using: 31% DEV Using: 20% DEV Using: 40%
--------------------------- Job Performance Summary ---------------------------
Service WFL -Using%- DLY IDL UKN ---- % Delayed for ---- Primary
CX ASID Class P Cr % PRC DEV % % % PRC DEV STR SUB OPR ENQ Reason
SO 0175 SYSSTC 1 82 53 45 21 0 20 13 8 0 0 0 0 ZCXRJ02
If the task is delayed by processors, you can see which jobs are contributing to the delay.
7.6 Configuring Grafana to monitor zCX containers
One of the most common monitoring tools for Docker container environments is Prometheus, with visualization by Grafana. Both cAdvisor and Node-Exporter are needed as data providers for Prometheus. Each of these four components runs in a separate container and plays one of the following roles:
Node-Exporter exposes metrics about the Linux operating system.
cAdvisor exposes metrics about containers.
Prometheus collects the data of the preceding components.
Grafana visualizes data that it pulls from Prometheus.
The solution that is demonstrated in this section is designed for a single zCX instance. Therefore, you must repeat this installation for every zCX instance that you want to monitor. For ease of distribution to different zCX instances, consider pushing the images to a private registry. Detailed instructions on setup and use of a private registry can be found in Chapter 6, “Private registry implementation” on page 107.
See Section 5.2, “Get an image from Docker Hub” on page 96 for detailed instructions on how to get images into your zCX, such as the images that are mentioned in the following paragraphs.
7.6.1 Install Node-Exporter
To use Node-Exporter, you must build your own image, because there are different versions of Node-Exporter and the image ibmcom/node-exporter-s390x:v0.16.0-f3 that is available on Docker Hub is not suitable for the purpose that is described here.
To build the node-exporter image, you start by creating a directory named nodeexporter and create a file named Dockerfile within that directory. The contents that are needed for this file are shown in Example 7-48.
Example 7-48 Dockerfile for Node-Exporter
########## Dockerfile for Node Exporter #########
# To build this image, from the directory containing this Dockerfile
# (assuming that the file is named Dockerfile):
# docker build -t <image_name> .
# Base Image
FROM s390x/ubuntu:18.04
RUN apt-get update && apt-get install -y prometheus-node-exporter
EXPOSE 9100
# attention: The following command is a very long line and not multiple lines
# please be sure to eliminate any new-line characters that might exist
# from cutting and pasting it
CMD prometheus-node-exporter --path.procfs="/host/proc" --path.sysfs="/host/sys" --collector.diskstats --collector.loadavg --collector.meminfo --collector.netdev --collector.netstat --collector.stat --collector.time --collector.uname --collector.vmstat --collector.filesystem.ignored-mount-points="^/(sys|proc|dev|host|etc)($$|/)"
# this is the first line after the CMD command
To build the image, navigate to the directory that contains the preceding Dockerfile, and run the following command (note the final period, which is mandatory):
docker build -t nodeexporter:latest .
Before you can start the Node-Exporter, you first must run the following command:
docker network create monitoring
You need to do this one time only. Subsequently, you can start Node-Exporter by using the following command:
docker run --name nodeexporter -v /proc:/host/proc:ro -v /sys:/host/sys:ro \
  -v /media:/rootfs/media:ro -v /etc/hostname:/etc/host_hostname:ro -p 9100:9100 -d \
  --network monitoring nodeexporter:latest
 
If Node-Exporter started correctly, you should be able to navigate with a browser to your zCX address on port 9100 and see the page that is shown in Figure 7-3.
Figure 7-3 Node Exporter starting page
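If you prefer a command-line check instead of a browser, a quick way to confirm that Node-Exporter is serving data is to request its metrics endpoint. The following command is a minimal sketch; <zCX-address> is a placeholder for the address of your zCX instance:
# Request the first few Prometheus-format metric lines from Node-Exporter
curl -s http://<zCX-address>:9100/metrics | head
If the container is running correctly, the output consists of Prometheus-format metric lines.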
7.6.2 Install cAdvisor
To install cAdvisor, you can use the image ibmcom/cadvisor-s390x:0.33.0, which is available on Docker Hub. The command to start cAdvisor is as follows:
docker run -v /proc:/rootfs/proc:ro -v /media:/rootfs/media:ro \
  -v /var/run/docker.sock:/var/run/docker.sock:ro -v /sys:/sys:ro \
  -v /var/lib/docker/:/var/lib/docker:ro -v /dev/disk/:/dev/disk:ro \
  -p 8080:8080 -d --network monitoring --name=cadvisor ibmcom/cadvisor-s390x:0.33.0
You can verify that cAdvisor is running properly by navigating with your browser to the address of your zCX instance on port 8080 to see the cAdvisor main page, as shown in Figure 7-4.
Figure 7-4 cAdvisor Entry Page
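As with Node-Exporter, you can also verify cAdvisor from a command line. The following sketch assumes that <zCX-address> is a placeholder for the address of your zCX instance; cAdvisor exposes its Prometheus metrics under the /metrics path on the same port:
# Request the first few Prometheus-format metric lines from cAdvisor
curl -s http://<zCX-address>:8080/metrics | head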
7.6.3 Install Prometheus
To install Prometheus, use the ibmcom/prometheus-s390x:v2.8.0 image, which is available on Docker Hub.
Prometheus needs configuration file settings to enable it to use the data providers node-exporter and cadvisor that you started before. The easiest way to accomplish this is to build your own image.
To build your own Prometheus image, first create a directory, for example, Prometheus, and then navigate into that directory. Next, create the configuration file named prometheus.yml in this directory by using the command touch prometheus.yml. Then, edit the file by using vi. The contents of the file are shown in Example 7-49. You can also download the file as instructed in Appendix A, “Obtaining the additional material” on page 251.
Example 7-49 file prometheus.yml
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. Default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  external_labels:
    monitor: 'my-project'

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090', 'cadvisor:8080', 'nodeexporter:9100']
Next, create a Dockerfile using the command touch Dockerfile and edit it using vi:
Example 7-50 Dockerfile for Prometheus
#Base Image is prometheus
FROM ibmcom/prometheus-s390x:v2.8.0
#Insert the config file
COPY prometheus.yml /etc/prometheus/prometheus.yml
Having done this, you can issue the following command to build the image:
docker build -t prometheus:v2.8.0 .
Be sure to include the dot at the end of the command.
Because Prometheus is the component that stores the monitored data, you should store this data to persistent storage by using a Docker volume. You can create a Docker volume by using the command docker volume create prometheus-data.
The command to start your Prometheus instance would then be the following:
docker run --name prometheus --network monitoring -v prometheus-data:/prometheus -p 9090:9090 -d prometheus:v2.8.0
To verify that Prometheus is up and running, point a browser to the zCX address on port 9090, as shown in Figure 7-5 on page 171.
Figure 7-5 Initial view of Prometheus
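Besides the browser view, you can also verify from a command line that Prometheus has discovered its scrape targets by using the Prometheus HTTP API. The following command is a sketch only; <zCX-address> is a placeholder for the address of your zCX instance:
# List the scrape targets that Prometheus currently knows about
curl -s http://<zCX-address>:9090/api/v1/targets
In the returned JSON, the localhost:9090, cadvisor:8080, and nodeexporter:9100 targets should all show "health": "up" after the first scrape interval has passed.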
7.6.4 Install Grafana
You can install Grafana by using the image ibmcom/grafana:5.2.0-f3 available on Docker Hub.
Because Grafana stores several types of configuration data, consider using persistent storage to run the container. To achieve this, you create a Docker volume by using the following command:
docker volume create grafana-data
Mount this volume through the following run command for Grafana:
docker run --name grafana -d -p 3000:3000 -v grafana-data:/var/lib/grafana ibmcom/grafana:5.2.0-f3
To verify that Grafana is up and running, connect your browser to the zCX address on port 3000. If you used the default Grafana setup, you can log on with user ID "admin" and password "admin". Grafana prompts you to change the password.
Figure 7-6 Grafana Logon Window
After you change the password, you are redirected to the home page of your Grafana server. You can set preferences for your user by hovering over the user button (see the red arrow in Figure 7-7 on page 172) and selecting "Preferences". From there, we changed the UI Theme from "dark" to "light" for better readability in this documentation.
Figure 7-7 Grafana home page
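At this point, all four monitoring containers should be running. As a quick check, the following commands are a sketch of how you can list the containers from the zCX Docker CLI and query the Grafana health endpoint; <zCX-address> and <password> are placeholders for your environment:
# List the running containers with their status and published ports
docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'
# Query the Grafana health endpoint
curl -s http://<zCX-address>:3000/api/health
The health endpoint returns a small JSON document that reports the state of the Grafana server and its database.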
7.6.5 Adding Prometheus as data source to Grafana
To use Grafana for visualization of data that is collected by Prometheus, you must add your Prometheus instance as a data source to your Grafana server.
There are two ways to get to the Add data source panel, as shown in Figure 7-8 on page 173.
Figure 7-8 Grafana select adding a data source
In the panel for adding data sources, first enter the name that you want to give to your Prometheus data source. Then, select "Prometheus" as the type of the data source from the drop-down list, as shown in Figure 7-9 on page 174. After you select the type Prometheus, enter the URL of your Prometheus instance, including the port number. Finish this panel by clicking Save & Test. You see a message that confirms that the data source is working.
Figure 7-9 Adding Prometheus data source to Grafana
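As an alternative to the panel, you can add the data source through the Grafana HTTP API. The following command is a hedged sketch only: it assumes the default admin user and uses <password> and <zCX-address> as placeholders, pointing the data source at the Prometheus port that you published earlier:
# Create a Prometheus data source through the Grafana API
curl -s -u admin:<password> -H 'Content-Type: application/json' \
  -X POST http://<zCX-address>:3000/api/datasources \
  -d '{"name":"Prometheus","type":"prometheus","url":"http://<zCX-address>:9090","access":"proxy"}'
With access set to proxy, the Grafana server fetches the data from Prometheus itself, so the Prometheus port does not have to be reachable from your workstation browser.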
7.6.6 Creating a first dashboard
To add your first dashboard, import the sample that is provided by the IBM zCX team. The file is named IBM zCX for z_OS Monitoring.json. You download this file to your workstation as part of the material that is documented in Appendix A, “Obtaining the additional material” on page 251.
Next, select Manage from the dashboard side menu as shown in Figure 7-10 on page 175.
Figure 7-10 Managing dashboards
Now, click Import in the upper right of the window as shown in Figure 7-11.
Figure 7-11 Select to import a dashboard
On the next screen, click Upload .json File as in Figure 7-12 on page 176. Then, select the downloaded JSON file on your workstation.
Figure 7-12 Upload .json File
The last step is to insert your Prometheus instance as data source and click Import.
Figure 7-13 Import IBM zCX for z/OS Monitoring.json
Now, the IBM zCX for z/OS Monitoring dashboard is displayed as an option in the list of available dashboards on the dashboard home page as shown in Figure 7-14 on page 177.
Figure 7-14 Grafana Dashboard Home Page
Click the dashboard title to open the dashboard.
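If you want to confirm from a command line that the dashboard was imported, the Grafana search API can list it. This is a sketch under the same assumptions as before (<password> and <zCX-address> are placeholders):
# Search for dashboards whose title contains "zCX"
curl -s -u admin:<password> 'http://<zCX-address>:3000/api/search?query=zCX'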
7.7 Monitoring with Grafana on container level
Now that you have created your first dashboard, this section shows you how to adjust it to meet your needs and what the different graphs mean.
The actual dashboard that you see might differ from what is shown in this book, as there might be updates to the dashboard after the book has been published.
The following paragraphs are meant to give an impression of what this monitoring solution can do for you. At the upper left of each graph in the dashboard, you see a small info icon. Hover over the icon to get a short description of what that graph shows.
7.7.1 Adjusting the dashboard
When you first open the dashboard, it looks similar to what is seen in Figure 7-15 on page 178.
Figure 7-15 First view of IBM zCX for z/OS Monitoring dashboard
You can easily resize any tile in the dashboard by positioning your mouse pointer over the small angle in the lower right corner of the tile and, while holding the left mouse button, moving your mouse. The corner of the tile follows the mouse pointer.
To move a tile on your dashboard, click and hold the title line of the tile, and then drag the tile to the desired position. The other tiles of the dashboard reorganize themselves around the newly positioned tile.
When you are satisfied with the changes, you can save them by clicking the diskette icon at the top of the window.
Figure 7-16 on page 178 shows the points where you position the mouse to resize or move the tile.
Figure 7-16 Spots to resize and rearrange tiles on the dashboard
7.7.2 Instance level data
The first tiles on the dashboard shown in Figure 7-16 on page 178 show information on the zCX instance level as seen from within the zCX.
Appliance Uptime
The uptime tile shows how long the zCX instance has been up and running.
Containers
The containers tile shows how many containers are currently running within this zCX instance.
Appliance CPU Utilization gauge
This tile shows the percentage of the available CPU capacity that is in use, from the Linux perspective. To interpret this value, keep in mind that you configured a number of virtual processors for the zCX instance. This does not mean that your zCX instance has exclusive access to these processors; it gets only a share of the processors that are available to z/OS, based on the decisions of the z/OS Workload Manager.
The share that z/OS gives to zCX is seen by zCX as 100% CPU, and the percentage that is shown in the tile is relative to this share.
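If you want to reproduce this value outside of the dashboard, you can send a PromQL query to the Prometheus HTTP API. The following command is a sketch only: it assumes the metric name node_cpu_seconds_total, which Node-Exporter 0.16 and later uses (older releases name the metric node_cpu), and uses <zCX-address> as a placeholder:
# Overall CPU utilization in percent over the last five minutes
curl -sG 'http://<zCX-address>:9090/api/v1/query' \
  --data-urlencode 'query=100 * (1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])))'
The expression takes the idle share of all virtual processors over the last five minutes and subtracts it from 100% to obtain the utilization.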
Appliance Used Memory gauge
This tile shows the amount of memory within the zCX instance that is currently in use for processes, as a percentage of all memory available to the zCX instance.
Appliance Used Swap Space gauge
This tile shows the percentage of swap memory that is in use.
Appliance Used Swap Space
This tile shows the absolute value for the swap space that is in use.
Appliance CPU Utilization over time
Figure 7-17 Appliance CPU Utilization over time
The graph for CPU utilization shows the CPU utilization over time, distinguished between system and user usage.
Click the word system or user in the legend to the right of the graph to limit the display to that part of the CPU usage. Click the legend entry again to show the full graph.
Appliance Used Memory
Figure 7-18 Appliance Used Memory
This graph shows the memory usage of the zCX instance.
Appliance Network Traffic
The Network Traffic tile shows the amount of data that is sent and received over the network by the zCX instance. If you hover over the graph, it shows the numeric values for the point in time where your mouse is located, as shown in Figure 7-19.
Figure 7-19 Appliance Network Traffic
Because of the way Grafana formats this graph, sent bytes are shown as negative values; the actual values are positive.
Appliance Disk IO
Figure 7-20 Appliance Disk IO
These tiles show the sum of all disk I/O that the zCX instance does. These amounts are divided into read and write I/O.
Appliance Load Averages
Figure 7-21 Appliance Load Averages
This tile shows the average number of processes that are currently active in the upper row and the average number of processes that are waiting to use the processor in the bottom row.
The three columns display these averages taken over 1, 5, and 15 minutes.
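The usual sources for Linux load averages are the Node-Exporter metrics node_load1, node_load5, and node_load15; the exact expressions in the supplied dashboard may differ. As a sketch, you can query one of them directly through the Prometheus HTTP API (<zCX-address> is a placeholder):
# One-minute load average of the zCX instance
curl -sG 'http://<zCX-address>:9090/api/v1/query' --data-urlencode 'query=node_load1'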
7.7.3 Container level data
Container CPU Utilization %
Figure 7-22 Container CPU Utilization %
This graph shows the percentage of the available CPU that is used by the different containers that run within this zCX. You can click a container in the legend of the graph to show only the graph for this single container.
Container Memory Usage
Figure 7-23 Container Memory Usage
This graph shows the amount of memory that is in use by each container. It shows the real memory footprint, excluding swapped-out memory.
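Both of these graphs are built from cAdvisor data. As a hedged sketch (the exact expressions in the supplied dashboard may differ), per-container CPU and memory figures can be queried from standard cAdvisor metrics such as container_cpu_usage_seconds_total and container_memory_working_set_bytes; <zCX-address> is a placeholder:
# Per-container CPU usage in percent of one CPU over the last five minutes
curl -sG 'http://<zCX-address>:9090/api/v1/query' \
  --data-urlencode 'query=sum by (name) (rate(container_cpu_usage_seconds_total{name=~".+"}[5m])) * 100'
# Per-container working-set memory in bytes
curl -sG 'http://<zCX-address>:9090/api/v1/query' \
  --data-urlencode 'query=container_memory_working_set_bytes{name=~".+"}'
The name=~".+" filter keeps only series that carry a container name, which excludes the cgroup aggregate series that cAdvisor also exports.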
Received Network Traffic per Container
Figure 7-24 Received Network Traffic per Container
This graphic shows the amount of data received per second over the network by each container.
Sent Network Traffic per Container
Figure 7-25 Sent Network Traffic per Container
This graphic shows you the amount of data that is sent per second over the network by each container.
Bytes read per second for each Container
Figure 7-26 Bytes read per Second for each Container
This graph shows the amount of data that is read per second from the file system by each container.
Bytes written per Second by each Container
Figure 7-27 Bytes written per Second by each Container
This graph shows the bytes per second that are written to the file system by each container.
Used Memory for each Container
Figure 7-28 Used Memory for each Container
This table shows the current memory usage of each container as numeric values.
Memory Limit for each Container
Figure 7-29 Memory Limit for each Container
These two tables show the memory limits for each container.
The table Memory Limit for each Container shows the amount of memory that a container is limited to (for example, by the --memory option of a docker run command). If no limit applies to a container, the table shows the value 0 MB for that container.
The table Remaining memory % for each Container shows the amount of memory that is still available to the container before it reaches the memory limit that is shown in the previous table. If no memory limit applies to the container, a hyphen (-) is shown in this table for the container.
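These two tables are also fed from cAdvisor data. As a sketch only (the dashboard's exact expressions may differ), the configured limit is available in the cAdvisor metric container_spec_memory_limit_bytes; <zCX-address> is a placeholder:
# Configured memory limit in bytes for each named container
curl -sG 'http://<zCX-address>:9090/api/v1/query' \
  --data-urlencode 'query=container_spec_memory_limit_bytes{name=~".+"}'
Keep in mind that containers without an explicit limit do not report a usable limit value, which is why the tables show 0 MB or a hyphen for such containers.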