Maintaining and managing IBM zAware
This chapter provides answers to many questions you might have about managing your IBM zAware environment.
The following topics are discussed:
Managing IBM zAware components
Starting and stopping the IBM zAware application
Managing IBM zAware disks
Managing connections from monitored clients
Maintaining IBM zAware data
Managing IBM zAware firmware
Disaster recovery considerations
Daily and weekly management tasks
Checklist for adding a monitored client
System Logger commands for IBM zAware support
Problem determination for IBM zAware
 
 
 
5.1 Managing IBM zAware components
For those with lengthy experience in mainframe environments, the concept of having a function that does not require significant customization or day-to-day management might seem novel. It might recall the similar challenge of giving up control of your dispatching priorities to Workload Manager when WLM compatibility mode support was removed.
However, that is the intended mode of operation for IBM zAware. IBM zAware might be considered as an appliance, wherein you simply focus on how to use it and all the details about how it works are hidden from you. Across the IT industry, this concept of “black box” appliances, in which only their externally visible behavior is considered and not their implementation or inner workings, is becoming more and more common.
To give you an idea of the differences between z/OS and IBM zAware, in terms of day-to-day management tasks, Table 5-1 provides several comparisons.
Table 5-1 Comparison of z/OS and IBM zAware management actions
Task/Function | z/OS | IBM zAware
Syslog | Must be archived and managed. | Log is not accessible to you. Archiving and retention is handled transparently.
EREP | Should be analyzed and archived. | N/A
SMF | Must be offloaded, analyzed, and retained for differing periods of time. | N/A
RMF | Performance must be monitored and managed. | Ensure that configuration matches IBM guidelines. No performance tools to be monitored.
Fixes and preventative service | You have to order and apply preventative service and monitor for and apply HIPER service. | Service is managed by IBM hardware engineer, like all other hardware service.
Database backups | You have to schedule and manage backups and drive recovery if necessary. | IBM zAware creates its own database backups as necessary and performs recoveries if necessary.
Compared to managing a z/OS environment, managing an IBM zAware environment is by design much simpler. Getting used to this “hands-off” paradigm can sometimes be a challenge for traditional mainframe staff.
Another concept to keep in mind is that IBM zAware is not a production application in the traditional sense. For example, it does not contain operational data, so you might even decide not to back it up. Users who are logged on to it are not able to update any data, so bringing the LPAR down while users are logged on does not raise any loss of data or loss of update issues. And because it can be brought down and back quickly, taking an outage to implement a change does not have the same level of impact that recycling a z/OS image would have. These are all concepts that might take some time to get comfortable with.
Having said that, there are things in an IBM zAware environment that you need to manage. And there are processes that are conceptually similar to z/OS, but the mechanics of how they are carried out are different. The objective of this chapter is to provide you with all the information you need to effectively manage your IBM zAware environment.
5.2 Starting and stopping the IBM zAware application
IBM zAware does not have a traditional operating system console. Instead, the only interfaces you have for managing it are the HMC and the IBM zAware GUI. Because IBM zAware is contained in a special purpose LPAR, starting it consists of activating the LPAR from the HMC. After the LPAR has been activated, the LPAR status on the HMC will change to Operating. However, it takes between three and ten minutes to initialize all the components inside the partition. When initialization completes, you will be able to log on to the IBM zAware GUI, and any monitored clients will be able to re-establish their connections to the IBM zAware LPAR.
Similar to how the IBM zAware LPAR is started, you also use the HMC to stop it. The shutdown process is simple. It is not necessary to manually stop the connections from the monitored clients. The benefit of leaving the connections in place is that the connected System Logger will interpret the interruption as a temporary event and will continue to accept messages, storing them in its buffers while it attempts to re-establish the connection to IBM zAware.
When System Logger is able to reconnect, it will send all the messages that were queued. The Analysis view will be updated with the messages that were issued during the period when the IBM zAware application was inactive.
If you quiesce the connections before you recycle the IBM zAware LPAR, System Logger will stop buffering messages, and the connections to IBM zAware will not be automatically re-established when it comes back up.
There is no mechanism for determining whether any users are currently logged on to IBM zAware. However, even if there are users logged on, they cannot be in the middle of any updates, and stopping and starting IBM zAware does not take long, so they will quickly be able to log back on again.
When you perform a Deactivate from the HMC, a request is sent to the IBM zAware application to begin stopping all of its components. The application closes the connections to all monitored clients, quiesces any database requests, and then shuts down. This can take a few minutes, depending on the activity within IBM zAware at the time you issue the Deactivate request. When the deactivation completes, you will see a status of Success on the HMC.
Note that System Logger will only try to restart the connection for about 20 minutes. If it is not able to reconnect within that time, it will quiesce the connection and discard any messages that were queued in its buffers. Therefore, if you need to recycle the IBM zAware LPAR, try to ensure that it is stopped and restarted as quickly as possible.
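The timing constraint described above comes down to simple arithmetic. The following Python sketch is illustrative only and is not an IBM-provided tool: the function name and the sample durations are assumptions, while the approximate 20-minute retry period and the three-to-ten-minute initialization time come from this section.

```python
# Illustrative arithmetic only; not an IBM-provided tool.
LOGGER_RETRY_WINDOW_MIN = 20  # approximate retry period described in the text


def recycle_fits_retry_window(shutdown_min, startup_min,
                              window_min=LOGGER_RETRY_WINDOW_MIN):
    """Return True if the planned outage is shorter than the retry window."""
    return shutdown_min + startup_min < window_min


# Hypothetical planning figures: a 3-minute Deactivate plus the worst-case
# 10-minute initialization mentioned earlier in this section.
print(recycle_fits_retry_window(3, 10))   # True
print(recycle_fits_retry_window(12, 10))  # False
```

Because the 20-minute figure is approximate, it is prudent to leave a comfortable margin rather than plan an outage that only just fits.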
 
Performing a planned shutdown of IBM zAware: It is common to restart z/OS LPARs by simply performing an Activate, without first performing a Deactivate. That is because you will have performed a planned shutdown of z/OS prior to performing the Activate.
However, the only way to perform a planned shutdown of IBM zAware is to perform a Deactivate from the HMC. If you perform an Activate without a preceding Deactivate, you introduce the risk of corrupting the IBM zAware database.
5.2.1 Starting and stopping the IBM zAware Analytics Engine
Among other things, the Analytics Engine manages the connections from the monitored clients. If you have a reason to temporarily close the connections from all monitored clients, you can stop the Analytics Engine from the System Status window as shown in Figure 5-1.
During the time the Analytics Engine is stopped, monitored clients cannot connect to the IBM zAware application, and no real-time data will be processed. When the Analytics Engine status is stopped, the connections for all monitored clients change to INACTIVE in the IBM zAware System Status window. On the z/OS end, the connection status changes to YES - CONNECTING.
Figure 5-1 Starting and stopping Analytics Engine
This has a number of benefits compared to stopping the connections from the z/OS end:
All connections can be stopped from a single place, rather than having to issue a command on every monitored client.
Because the monitored clients see this as a temporary error, they will continue to accept and buffer new messages for later transmission to IBM zAware.
The connections can be re-enabled from a single place. Note that you cannot restart the connections directly from IBM zAware. However, System Logger will automatically try to restart the connection for up to 20 minutes. If you restart the Analytics Engine within 20 minutes, the connections should all be restarted with no further manual intervention required.
Other than this situation of closing the connections from all monitored clients, we do not see any other case where you might need to stop the Analytics Engine. If you are recycling the IBM zAware LPAR, then performing the Deactivate from the HMC will stop the Analytics Engine and all other IBM zAware components. Therefore there is no need to manually stop it as part of shutting down the IBM zAware LPAR.
5.2.2 IBM zAware LPAR automation considerations
Most z/OS installations have developed automation routines to manage the shutdown and startup of their z/OS systems. Part of the reason for that is that the startup and shutdown typically involves tens or hundreds of started tasks and trying to control all of them manually is not a realistic option.
However, IBM zAware does not involve any of this complexity, so there is no need for automation to be involved in the startup or shutdown of an IBM zAware LPAR. IBM zAware does provide an API, and that API can be used for monitoring the health of the LPAR to some extent. But the API does not provide any functions to start or stop the IBM zAware LPAR, so there are no considerations for involving your z/OS automation tools in stopping or starting IBM zAware.
5.2.3 IBM zAware support for dynamic configuration changes
z/OS provides extensive support for dynamic configuration changes. For example, you can add and remove devices, configure channels offline and online, and add more general purpose or specialty engines, all without an IPL. The primary reason for this flexibility is the stringent requirement for the highest levels of availability.
IBM zAware is not expected to have the same level of availability requirements that z/OS does. And the number of people using an IBM zAware LPAR at a given time is likely to be a small fraction of the number that are typically using a z/OS system.
For this reason, and also to keep IBM zAware as simple to operate as possible, most configuration changes (adding new network adapters, taking an adapter offline for service, adding engines, and so on) require that you deactivate the IBM zAware LPAR, make the required changes to the LPAR profile, and then reactivate the LPAR again. Because it only takes a few minutes to recycle the IBM zAware LPAR, this should not be a significant inconvenience. See “Adding disks to the IBM zAware LPAR” on page 161 for information about IBM zAware support for adding disks using Dynamic I/O Reconfiguration support.
5.3 Managing IBM zAware disks
The IBM zAware databases are kept on CKD DASD that you assign to the IBM zAware LPAR. This section discusses manual actions related to the IBM zAware disks that you might need to perform.
5.3.1 User ID authority requirements
All IBM zAware GUI user IDs have one of two levels of authority (known as “roles” in IBM zAware terminology):
USER
ADMIN
All disk management actions must be performed by a user ID defined with ADMIN authority. Attempts to access the disk administration functions from a user ID with insufficient authority will result in errors being returned to the user.
5.3.2 Adding disks to the IBM zAware configuration
All IBM zAware disk management functions (except for backing up the disks) are performed from the IBM zAware GUI. To access these functions, click Administration → Configuration → Data Storage. This will bring you to the panel shown in Figure 5-2.
Figure 5-2 Data Storage Devices list
The first list contains all devices that are accessible to the IBM zAware LPAR. The list shows the device number, status, and device type of each device. The columns can be sorted by clicking their headings so you can group the devices by their status or device type, or list them by device number.
 
Use care when adding devices to avoid overwriting a device: If you did not specify an Explicit Device Candidate List in HCD, the list of devices is likely to include those that are in use by other LPARs.
Exercise caution when adding devices, because if you accidentally add a device that is in use by another LPAR, IBM zAware will overwrite the disk contents to create its file system.
Note that this list is only for information and status purposes. If you want to make a change to the configuration, you must click Add and Remove Devices. This results in the Add and Remove Devices page (shown in Figure 5-3) being displayed.
Figure 5-3 Add and Remove Devices page
This is the window to use to add or remove devices from IBM zAware. The left pane contains all the devices that are not currently in use by this LPAR. The devices in the right pane are those that are currently in use by this LPAR.
Follow these steps to add disks to the IBM zAware configuration (this will automatically add the IBM zAware file system to the device and remove any other data that might have been present on the device):
1. In the left pane, click each device you want to add to the IBM zAware file system.
Each device will become highlighted.
2. Click the Add button.
This will move those devices to the right pane. At this point nothing has been done to the devices.
3. Click OK.
 
Be aware: This step will erase any data on the device.
4. You will now be taken back to the previous window. If you sort the “Status” column, the devices that were just added will show as Being Added and then In Use when they have been initialized and the file system has been allocated and formatted.
After you add a device, the storage summary information near the top of the page will be updated. The Total storage used (%) value will decrease, because you have added more space. The Total capacity (GB) and Total storage used (GB) values will increase.
 
Volume serial number is changed when a disk is added: As part of the formatting that takes place when you add a disk to the IBM zAware configuration, the volume serial number will be changed to 0Xdddd, where dddd is the device number of that volume.
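As a small illustration of the naming rule in the note above, the following Python sketch derives the volume serial number from a device number. It assumes that dddd is rendered as four uppercase hexadecimal digits and that the prefix is the literal characters shown in the note; the function name is hypothetical.

```python
def zaware_volser(device_number):
    """Build the volume serial described above: '0X' plus the device number."""
    if not 0 <= device_number <= 0xFFFF:
        raise ValueError("device number must fit in four hexadecimal digits")
    # Assumption: the device number appears as four uppercase hex digits.
    return "0X{:04X}".format(device_number)


print(zaware_volser(0x9487))  # 0X9487
```

A helper like this can be useful when you need to recognize volumes that have been initialized by IBM zAware in a list of volume serial numbers.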
5.3.3 Removing disks from the IBM zAware configuration
The same panels that you use to add disks to IBM zAware are used to remove disks from it, as shown in Figure 5-4.
Figure 5-4 Add and Remove Devices window
Again, the devices that IBM zAware is not currently using are listed in the left pane, and the devices that are currently in use are contained in the right pane. When you indicate that a disk is to be removed from the IBM zAware configuration, any data on that disk is automatically moved by IBM zAware to another device within the file system. There is no data loss. To remove a disk, perform the following steps:
1. Click each device you want to remove from the IBM zAware file system.
Each device will become highlighted.
2. Click the Remove button.
This will remove the device from the right box titled “Devices In Use.” At this point nothing has been done to the device.
3. Click OK.
You will be taken back to the first window. The devices that you removed will now show a status of Pending Removal. The storage summary fields will temporarily be blanked out and the Add and Remove Devices and Apply Pending Removals buttons will both be grayed out. IBM zAware now identifies any used file space that must be moved from the device to the remaining devices. Depending on the amount of data to be moved, this might take a few minutes.
When that processing completes, the Apply Pending Removals button will become live.
4. Click Apply Pending Removals.
A warning pop-up message will appear, describing the process that is about to take place, and pointing out that it might take some time to complete. You must click OK for processing to proceed.
When the processing completes, the device status will change to Available, meaning that it is no longer part of the IBM zAware configuration.
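The add and remove procedures move a device through a small set of statuses. The following Python sketch models that life cycle as a lookup table. The status names are taken from the text; the event names are hypothetical labels for the GUI actions, and the assumption that an unused device starts with a status of Available is inferred from the final step above.

```python
# Status names come from the text; event names are hypothetical labels.
TRANSITIONS = {
    ("Available", "add_confirmed"): "Being Added",        # Add, then OK
    ("Being Added", "format_complete"): "In Use",         # file system formatted
    ("In Use", "remove_confirmed"): "Pending Removal",    # Remove, then OK
    ("Pending Removal", "removal_applied"): "Available",  # Apply Pending Removals
}


def next_status(status, event):
    """Return the status that follows the given event, or raise ValueError."""
    try:
        return TRANSITIONS[(status, event)]
    except KeyError:
        raise ValueError(
            "{!r} is not valid for a device that is {!r}".format(event, status))


status = "Available"
for event in ("add_confirmed", "format_complete",
              "remove_confirmed", "removal_applied"):
    status = next_status(status, event)
    print(event, "->", status)
```

The table makes one point from the procedure explicit: a device returns to Available only after Apply Pending Removals has completed, not when you click OK on the Add and Remove Devices page.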
Adding disks to the IBM zAware LPAR
IBM zAware is not able to initiate a dynamic I/O reconfiguration on the CPC that it is running on. However, if a z/OS or z/VM LPAR performs such a reconfiguration, any disk devices that are added to the IBM zAware LPAR will automatically appear in the Devices Available list on the Data Storage tab on the IBM zAware GUI.
If you are running IBM zAware on a CPC that does not have any z/OS or z/VM LPARs, then you must perform a power-on reset (POR) of the CPC to pick up a new I/O configuration. Following the POR, the IBM zAware LPAR will be able to see the new full set of disk devices that are defined to it.
5.3.4 Backing up the IBM zAware file system
The IBM zAware file system contains information from the monitored clients. It is constantly being updated as new data arrives from the monitored clients. And it contains the information that is required for it to successfully analyze the message traffic on each system. Specifically, the file system contains the following information:
The model database, containing information about every monitored client
The instrumentation data database, containing the information that will be used to update the model database
Information about the sysplex topology
The XML files, containing the Analysis results from the past
The IBM zAware file system is not a published interface, and you generally do not require any knowledge or understanding of it. However, there is one aspect of how it works that is important for you to know. The bulk of the IBM zAware information is kept in the file system on the CKD disks. However, IBM zAware also keeps some information in the Support Element of the CPC that it is operating on. And there is information in the Support Element that must match information in the file system on the CKD disks.
One part of that information is the device number of the volume that each part of the file system resides on. If there is a mismatch between the information in the Support Element and the file system on disk, or if either of those is missing, IBM zAware will not use the file system on the affected volumes. This imposes several restrictions on you:
If you back up an IBM zAware disk, and then need to use that backup, it must be restored to a device with the same device number as the original volume.
If you try to move IBM zAware to another CPC, the information that resided in the Support Element on the original CPC will not reside in the Support Element on the new CPC, meaning that it will not be able to use the CKD disks.
If you mirror the CKD disks to another disk subsystem, IBM zAware will not be able to use the mirrored file system because the device number of the mirror will be different from the device number of the primary device.
This does not mean, however, that you cannot back up the IBM zAware file system. It simply means that you need to keep these restrictions in mind when you are considering your backup options.
You have the following choices for handling a failure that impacts your IBM zAware file system:
Do not take any backups of the file system. If you lose the file system, you would use archived message data from the connected systems to rebuild the database.
Create full volume backups of the IBM zAware volumes from z/OS using a product such as DFSMSdss.
Create backups of the IBM zAware volumes using a product such as IBM FlashCopy®.
Because IBM zAware does not provide a facility to perform logical file system backups, any backup that you perform will have to be a physical backup, at the volume level. In the following sections we briefly discuss the considerations for each of these options.
Do not take any backups
If you choose not to back up the volumes, and a hardware problem affects one or more of the volumes containing the IBM zAware file system, you will effectively lose access to all the information in the file system.
In this case, you would have to remove all the devices from the IBM zAware configuration, and then add them back again, to reinitialize the database. You would then connect the monitored clients and perform a bulk load for each system. Next, you would go through the Assign process. Finally, you would train them all to create the new models. This effectively initializes the IBM zAware application, which will be able to continue with its analysis. However, in the Analysis view you would only see analysis results appearing after the new model was created. Remember that IBM zAware does not create Analysis results for data that is submitted using the bulk load process.
The benefit of this approach is that it is simple. There is no ongoing cost or overhead in creating backups that you might never need.
The disadvantages of this approach are:
If you have many monitored clients, having to run a bulk load on all of them is a laborious and time-consuming process.
You will lose all the Analysis results.
It is likely to be some time before you have IBM zAware running again.
Create full volume backups using DFSMSdss
The next option is to create full volume backups using a product like DFSMSdss. Given that the IBM zAware file system is probably spread across multiple volumes, you need to back up every volume at the same point in time. One way (although not a desirable way) to do this would be to deactivate the IBM zAware LPAR while the backup is running to ensure that no updates are taking place during the backup.
A better option would be to use the DFSMSdss FlashCopy Consistency Group feature. This allows you to take a consistent point-in-time backup across multiple volumes without having to quiesce activity to those volumes. Depending on your DASD subsystem, the use of FlashCopy places restrictions on the location of the source and target volumes.
Whichever option you select, it is vital that the source volumes are consistent for the backup. If you just perform a normal disk copy while the volume is being updated, it is likely that the resulting output volume will not be usable if you ever need to use it.
The benefits of this option (if you use the FlashCopy Consistency Group feature) are:
The backup will be consistent, meaning that there should be no database integrity issues if you need to use it.
It does not require manual intervention to stop IBM zAware from updating its disks.
The disadvantage is that the IBM zAware volumes must be online to the z/OS system.
If you do not want to dedicate another set of disks to act as the FlashCopy target, you can take full volume dumps to tape. However, the only way to achieve consistency across all IBM zAware volumes in that case is to deactivate the LPAR while the backups are running.
Create a copy using FlashCopy Consistency Groups
This option is similar to the DFSMSdss option, with the exception that the IBM zAware volumes do not need to be online to any z/OS system. Note, however, that they do need to be defined in the I/O configuration for the z/OS system that you will run the backup job on. Figure 5-5 shows an example configuration, where the source volumes (device numbers 9487-9489) are in the configuration of both the IBM zAware LPAR and the z/OS LPAR. The target volumes (device numbers 948A-948C), however, are only in the z/OS configuration.
Figure 5-5 IBM zAware devices defined to z/OS for backup
A sample set of JCL to create a consistent backup of three volumes is shown in Example 5-1.
Example 5-1 Sample job to take point in time FlashCopy of IBM zAware disks
//KYNEFFC JOB (0,0),'FC IBM ZAWARE',CLASS=A,MSGCLASS=X,NOTIFY=&SYSUID
//*********************************************************************
//* ESTABLISH FLASHCOPY RELATIONSHIP *
//* SDEVN - SOURCE FLASHCOPY VOLUME *
//* TDEVN - TARGET FLASHCOPY VOLUME *
//*********************************************************************
//STEP1 EXEC PGM=IKJEFT01,REGION=256K
//SYSTSPRT DD SYSOUT=*
//SYSUADS DD DSN=SYS1.UADS,DISP=SHR
//SYSLBC DD DSN=SYS1.BRODCAST,DISP=SHR
//SYSTSIN DD *
FCESTABL SDEVN(X'9487') TDEVN(X'948A') ACTION(FREEZE)
FCESTABL SDEVN(X'9488') TDEVN(X'948B') ACTION(FREEZE)
FCESTABL SDEVN(X'9489') TDEVN(X'948C') ACTION(FREEZE)
//*
//*********************************************************************
//* WITHDRAW FLASHCOPY RELATIONSHIP *
//* DEVN - ANY FLASHCOPY VOLUME *
//*
//*********************************************************************
//STEP2 EXEC PGM=IKJEFT01,REGION=256K
//SYSTSPRT DD SYSOUT=*
//SYSUADS DD DSN=SYS1.UADS,DISP=SHR
//SYSLBC DD DSN=SYS1.BRODCAST,DISP=SHR
//SYSTSIN DD *
FCWITHDR DEVN(X'9487') ACTION(THAW)
/*
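If your IBM zAware file system spans many volumes, you might prefer to generate the SYSTSIN command lines rather than type them. The following Python sketch is one possible way to do that. It only reproduces the command syntax used in Example 5-1; the function name is hypothetical, and like the example it issues a single THAW against the first source device. Whether one THAW is sufficient for your configuration is not something this sketch checks.

```python
def flashcopy_commands(pairs):
    """Return (establish_lines, thaw_lines) in the style of Example 5-1.

    pairs: list of (source_device_number, target_device_number) tuples.
    """
    if not pairs:
        raise ValueError("at least one source/target pair is required")
    establish = [
        "FCESTABL SDEVN(X'{:04X}') TDEVN(X'{:04X}') ACTION(FREEZE)".format(src, tgt)
        for src, tgt in pairs
    ]
    # As in Example 5-1, the THAW names only the first source device.
    thaw = ["FCWITHDR DEVN(X'{:04X}') ACTION(THAW)".format(pairs[0][0])]
    return establish, thaw


step1, step2 = flashcopy_commands(
    [(0x9487, 0x948A), (0x9488, 0x948B), (0x9489, 0x948C)])
for line in step1 + step2:
    print(line)
```

With the device numbers from Figure 5-5, the generated lines match the SYSTSIN input of STEP1 and STEP2 in Example 5-1.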
Restoring the IBM zAware file system from a backup
If you have made a full volume backup or a FlashCopy of the IBM zAware devices, and there was a problem with one IBM zAware device, then every device must be restored. This is because the file system spans across the multiple devices, and restoring an individual device will not restore the database.
During any restore process, the IBM zAware partition must be deactivated.
When restoring the volumes, ensure that each backup is restored to the same volume (same device number) that it resided on at the time of the backup. If you restore to different device numbers, the information in the file system will not match the device number that the file system resides on, and IBM zAware will not use that file system.
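The two restore rules described above (restore every device, and restore each one to its original device number) lend themselves to a quick check before you start. The following Python sketch is a minimal illustration; the function name and the data structures are assumptions made for this example and are not part of any IBM tool.

```python
def check_restore_plan(backed_up_devices, restore_plan):
    """Return a list of problems; an empty list means the plan follows the rules.

    backed_up_devices: device numbers that held the file system at backup time.
    restore_plan: mapping of original device number -> device number to restore to.
    """
    problems = []
    for device in sorted(backed_up_devices):
        if device not in restore_plan:
            problems.append(
                "{:04X}: not restored (every device must be restored)".format(device))
        elif restore_plan[device] != device:
            problems.append(
                "{:04X}: restored to {:04X} (device number must match)".format(
                    device, restore_plan[device]))
    return problems


# Hypothetical plan: 9488 is sent to the wrong device and 9489 is missing.
for problem in check_restore_plan(
        {0x9487, 0x9488, 0x9489},
        {0x9487: 0x9487, 0x9488: 0x948B}):
    print(problem)
```

The sketch does not verify that the IBM zAware partition is deactivated; that remains a manual check on the HMC.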
After the restore, the information in the database will be current as of the time of the backup. This means that the days between the backup and the restore date will have no analysis results. This will show in the Analysis Results panel as bars with no anomaly score and no message IDs.
Remember that you always have the fallback of initializing a new IBM zAware database and restoring the contents using a bulk load, as described in “Do not take any backups” on page 162. This is not ideal, but it is a fallback if all else fails. However, for this to be an option, you need to ensure that archived message data is kept for at least 90 days on all monitored clients.
5.3.5 Sharing disks between IBM zAware LPARs
It is not possible to share IBM zAware disks between multiple IBM zAware LPARs.
Consider that you are currently running IBM zAware in LPAR A10. The information stored in the Support Element is specific to that IBM zAware LPAR. This means that if you initialize IBM zAware in a different LPAR, for example A20, that LPAR will not have any information about the disk volumes. If you try to add the disks that were being used by the A10 LPAR to the IBM zAware in LPAR A20, it will consider those to be new disks and will initialize them.
5.3.6 Backup IBM zAware LPARs
We cannot think of a reason why you would want to move IBM zAware from one LPAR to another LPAR on the same CPC. But if you do make such a move, be aware that you will effectively be starting with an empty set of databases. This is because IBM zAware in the target LPAR will initialize any disks that you add to it, as described in 5.3.5, “Sharing disks between IBM zAware LPARs” on page 164.
Also, you cannot use any backups to recreate the volumes because the information in the backups will be specific to the LPAR that created that file system.
Disaster recovery considerations for IBM zAware are discussed in 5.7, “Disaster recovery considerations” on page 178.
5.4 Managing connections from monitored clients
There must be a connection from each monitored client to the IBM zAware LPAR. In a sysplex, even though all the systems might be connected to the SYSPLEX.OPERLOG log stream, each system that is a monitored client of IBM zAware still needs its own connection to IBM zAware. For an example, consider the diagrams in Figure 5-6.
Figure 5-6 Connections to IBM zAware Server
Diagram A in Figure 5-6 shows two sysplexes in which all systems are monitored clients of IBM zAware and each system has a connection to IBM zAware. Diagram B shows that you can connect a subset of systems in a sysplex as monitored clients. This sysplex contains three test systems and two production systems, but only the production systems are to be monitored clients of IBM zAware. Therefore, only the two production systems in that sysplex have a connection to IBM zAware.
During normal processing, there should be no need to stop the connection between a monitored client and IBM zAware. If the connection goes down for some reason, System Logger will try for up to 20 minutes to reconnect. If the reason that the connection went down was that the IBM zAware LPAR reIPLed itself (perhaps after some service was applied), or because you deactivated and reactivated the LPAR, then it should be back before System Logger gives up trying to connect. During this 20-minute window, Logger will continue to accept new messages and save them in its buffer, so no message data will be lost.
If the 20 minutes expires and Logger is still unable to restart the connection, or if some other event causes it to give up, Logger will issue an IXG386I message with one of the following texts:
STATUS: IP ADDRESS RETRIEVE FAILED
STATUS: SOCKET CREATE FAILED
STATUS: SOCKET CONNECT FAILED
STATUS: SOCKET VALIDATION FAILED
It is advisable to implement some automation that would monitor for these messages and, at a minimum, update the status of the IBM zAware resource to reflect the fact that at least one system has lost connectivity.
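As a starting point for such automation, the following Python sketch shows the matching logic only. It assumes that your automation can hand the complete text of a message to a function; the function name and the abbreviated sample message are hypothetical, and the real IXG386I message contains additional fields that are not shown here.

```python
# Status texts listed above for message IXG386I.
FAILURE_STATUSES = (
    "IP ADDRESS RETRIEVE FAILED",
    "SOCKET CREATE FAILED",
    "SOCKET CONNECT FAILED",
    "SOCKET VALIDATION FAILED",
)


def connection_failure_status(message_text):
    """Return the failure status named in an IXG386I message, or None."""
    # Collapse line breaks and repeated blanks so multi-line text still matches.
    normalized = " ".join(message_text.split())
    if "IXG386I" not in normalized:
        return None
    for status in FAILURE_STATUSES:
        if "STATUS: " + status in normalized:
            return status
    return None


# Hypothetical, abbreviated message text; the real message has more fields.
sample = "IXG386I (text omitted) STATUS: SOCKET CONNECT FAILED"
print(connection_failure_status(sample))  # SOCKET CONNECT FAILED
```

A non-None result is the trigger for updating the status of the IBM zAware resource in your automation product.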
Another potential reason for the connection between a monitored client and IBM zAware becoming disconnected is that the z/OS system might be IPLed. When z/OS is shut down, it will close its connection to IBM zAware, and any messages that are still in its buffer at that point will be purged.
After the IPL, provided that the IBM zAware server is defined in the IXGCNFxx member and the required network connectivity is in place, System Logger will attempt to restart the connection. In fact, it will even try to restart the connection if that connection was quiesced prior to the IPL. Depending on how long it takes to start OMVS and TCP/IP, you might see some IXG3xx messages, indicating that Logger is trying to start the connection to IBM zAware. It will continue to attempt to start the connection for 20 minutes or until it is successful, whichever occurs first. During this time, Logger will be accepting messages and storing them in its buffer, waiting for the connection to IBM zAware to become active. If the buffer fills up during this period, new messages are not added to the buffer.
Although this explanation covers most of the situations where the connection might be lost, it is possible that other events can cause the link to go down. To detect this situation, there are a number of things you can do:
Check the IBM zAware GUI System Status window. Investigate systems showing a status of Inactive that you expect to be active.
Check the IBM zAware GUI Analysis view. You would normally expect to see at least a small light blue bar for each system. If any system has an anomaly score of 0.0 and a Unique Msg IDs count of 0, that is an indication of a possible connection problem.
Issue a D LOGGER,C,LSN=SYSPLEX.OPERLOG,D command. The output will contain the ZAI CLIENT: field, which can have a value of:
 – YES - QUIESCED
 – YES - CONNECTING
 – YES - CONNECTED
If any of these tests indicate that the connection is unavailable, but both z/OS and the IBM zAware LPARs are available, try restarting the connection using the SETLOGR FORCE,ZAICONNECT,LSN=SYSPLEX.OPERLOG command. If this is not successful, refer to 5.11, “Problem determination for IBM zAware” on page 183 for information about troubleshooting connection problems. Appendix C, “Using automation to monitor IBM zAware connections” on page 219 contains a REXX exec that can be used by your z/OS system automation product to query the status of the IBM zAware connections.
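The check and the follow-up action described above can be sketched in a few lines of Python. This is an illustration of the decision logic only, not a replacement for the REXX exec in Appendix C. It assumes that the output of the DISPLAY command is available as text; the function names are hypothetical, and only the ZAI CLIENT: field and its three values are taken from the text.

```python
import re

# Matches the ZAI CLIENT: field and the three values listed above.
ZAI_CLIENT_PATTERN = re.compile(
    r"ZAI CLIENT:\s*(YES - (?:QUIESCED|CONNECTING|CONNECTED))")


def zai_client_status(display_output):
    """Return the ZAI CLIENT value found in the command output, or None."""
    match = ZAI_CLIENT_PATTERN.search(display_output)
    return match.group(1) if match else None


def suggested_command(display_output, log_stream="SYSPLEX.OPERLOG"):
    """Return the reconnect command if the client is not connected, else None."""
    status = zai_client_status(display_output)
    if status in ("YES - QUIESCED", "YES - CONNECTING"):
        return "SETLOGR FORCE,ZAICONNECT,LSN=" + log_stream
    return None


# Hypothetical fragment of command output; real output contains more fields.
print(suggested_command("ZAI CLIENT: YES - QUIESCED"))
```

The function returns None when the field is not found, so the automation does not issue a command for a log stream that is not defined as an IBM zAware client. As the text notes, the command is only worth trying when both z/OS and the IBM zAware LPAR are available.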
 
Access to the SERVAUTH profile: If you have AT-TLS activated for Secure Sockets, you will need to give System Logger access to the SERVAUTH profile EZB.INITSTACK.sysname.tcpname in RACF. This will let Logger connect to the IP stack and open a socket connection with the IBM zAware server before AT-TLS becomes active. If you do not add access to this profile, the attempt to start the connection might fail during an IPL.
You might also consider using the IBM zAware Application Programming Interface (API) to check that IBM zAware is receiving and analyzing information for each interval. This offers the benefit of not requiring manual intervention unless a problem is detected. It also is likely to detect a potential problem in a more timely manner than manual checking. For information about using the API, refer to 6.3, “IBM Tivoli NetView for z/OS” on page 196.
If you have many monitored clients, monitoring the status of the connections from so many systems might be a challenge. The easiest way to view the connection status of all systems is to use the IBM zAware System Status window. Additionally, if you use your automation to consolidate information from across the enterprise, you might consider getting every system to forward the status of its IBM zAware connections to a focal point. That focal point would provide an at-a-glance status of the health of all IBM zAware connections.
Note that if you stop the Logger connection to the IBM zAware server using the SETLOGR FORCE,ZAIQUIESCE,LSN=SYSPLEX.OPERLOG command, any messages received for transmission to IBM zAware will not be sent, and any messages that are already in the buffer will be purged.
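The buffering behavior described in this section can be pictured with a toy model. The Python sketch below is not how System Logger is implemented. The class name is hypothetical, and counting the capacity in messages is an assumption made only to keep the example small. It shows the two behaviors stated in the text: new messages are not added when the buffer is full, and a quiesce purges whatever is queued.

```python
from collections import deque

class ZaiSendBuffer:
    """Toy model of the buffering behavior described in this section."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity   # hypothetical limit, counted in messages
        self.pending = deque()     # messages waiting to be sent
        self.rejected = 0          # messages that were not buffered
        self.quiesced = False

    def accept(self, message: str) -> bool:
        """Queue a message; return False if it is not buffered."""
        if self.quiesced or len(self.pending) >= self.capacity:
            self.rejected += 1
            return False
        self.pending.append(message)
        return True

    def quiesce(self) -> int:
        """Model ZAIQUIESCE: purge queued messages and stop accepting new ones."""
        purged = len(self.pending)
        self.pending.clear()
        self.quiesced = True
        return purged

buffer = ZaiSendBuffer(capacity=2)
for text in ("first", "second", "third"):
    print(text, buffer.accept(text))
print("purged:", buffer.quiesce())
```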
5.5 Maintaining IBM zAware data
Most of the management of the information in the IBM zAware database and file system is handled automatically by IBM zAware, based on the customization that you provide.
This section discusses how you get data into IBM zAware, what influence you have over database updates, and the considerations for how long to retain the data in IBM zAware.
5.5.1 Bulk Data Load Utility
The Bulk Data Load Utility reads data sets containing historical message data that has been extracted from syslog or OPERLOG. Each message that is read is then written to a log stream. That log stream has been defined with the ZAI parameters, indicating that all blocks written to it are to be forwarded to IBM zAware.
z/OS installations typically store their system log in one of these places:
In the OPERLOG log stream. The messages are kept in the log stream until they reach their retention period as specified in the log stream attributes.
In a set of sequential data sets that are extracted from OPERLOG using the IEAMDBLG program.
In the JES spool as SYSLOG files.
In a set of sequential data sets that are created from the syslog spool files.
Regardless of where you keep your message data, you need to prepare files that will be input to the Bulk Data Load Utility. The utility can handle both 2-digit year and 4-digit year formats, and files that contain carriage control characters or do not contain carriage control.
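Before you submit the bulk load, it can be useful to know how many days the input file actually covers. The following Python sketch is one way to count them. It assumes that each record carries a Julian date stamp of the form yyddd (2-digit year) or yyyyddd (4-digit year). Extracting that stamp from a record depends on your file layout and is not shown. This is an illustration only and is not the logic used by the Bulk Data Load Utility.

```python
from datetime import date, datetime

def parse_julian(stamp: str) -> date:
    """Convert a 'yyddd' or 'yyyyddd' stamp to a calendar date."""
    if len(stamp) == 5:
        return datetime.strptime(stamp, "%y%j").date()
    if len(stamp) == 7:
        return datetime.strptime(stamp, "%Y%j").date()
    raise ValueError(f"unexpected date stamp: {stamp!r}")

def days_covered(stamps) -> int:
    """Count the distinct calendar days present in an iterable of stamps."""
    return len({parse_julian(s) for s in stamps})

# The same day written in both formats, plus the following day.
print(parse_julian("13135"))
print(parse_julian("2013135"))
print(days_covered(["13135", "2013135", "13136"]))
```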
 
JES3 consideration: At the time of writing, the Bulk Data Load Utility does not support JES3 DLOG data sets.
If you have JES3 and want to prime the IBM zAware database with historical message data, you must use OPERLOG and extract the input files from OPERLOG.
The Bulk Data Load Utility consists of a batch job (provided in member AIZBLK in SYS1.SAMPLIB) that performs the following tasks:
Defines a model log stream.
Runs a REXX exec that reads the messages from the input data set and writes them to a temporary log stream. The REXX exec is provided in member AIZBLKE in SYS1.SAMPLIB.
Deletes the temporary log stream and the model log stream.
Copy the job JCL and the REXX exec to your own libraries in preparation for any customization. The JCL clearly documents the changes you need to make to the JCL.
The job can be run multiple times for each system or sysplex. Any duplicate message information will simply overlay the existing entries in the database, so you do not need to be concerned if you accidentally send the same data more than once.
5.5.2 Creating and updating the system model
Having bulk loaded your data into IBM zAware and assigned the data to the correct sysplex, the next step is to create the model database with the message information for your system. This is known as “training” the system. The training process is used both for loading the initial set of information about a system (known as priming the database) and for then updating the model with information from the messages that are received in real time.
It is vital to remember the objective of IBM zAware: to make you aware of differences between the current message behavior and what the system normally looks like. But anyone who has experience managing z/OS systems knows that there are many “normals”. For example, weekdays look different than weekends. Prime shift is different than night shift. Month-end might look different than the rest of the month.
Therefore, to build a model of what the system normally looks like, IBM zAware needs as much historical information (within reason) as possible. If it does not find sufficient message diversity and sufficient patterns (for example, IPLs, subsystems starting and stopping, the network coming up and down, and so forth), it will refuse to create a model of that system.
The alternative is that it will have an inaccurate picture of the normal activity on a system, and will flag normal messages as anomalies. If that were to happen, people would learn to ignore the analysis provided by IBM zAware, and thereby potentially miss critical messages.
Initiating training
To initiate a manual training for a system, go to the Training Sets panel, as shown in Figure 5-7.
Figure 5-7 Training Sets panel
There will be a line for every system that has assigned data. To initiate training for a system, select the radio button beside the system and click the Actions drop-down. Within the list, select Request Training. The Training Progress field for that system will change to In Progress. To obtain an update, click Refresh. If the training attempt is successful, the Last Training Result field will change to Complete and the date and time in the Current Model Built field will reflect the current time and date (in local time).
If the training attempt fails, there are a number of things you must consider:
Did the bulk load job complete successfully? If not, address the problem, rerun the job, perform the Assign again, and then try training the system again.
Did the input file to the bulk load contain sufficient message data? If the file contained less than 90 days’ worth of data, try increasing the number of days in the input file and run the bulk load, Assign, and training again.
If the training still does not work, check the training period (see “Controlling training options” on page 170). Try increasing the value, disconnecting and reconnecting that system, and training the system again. Note that this option affects every system that is to be trained, so set it back to the previous value after this training attempt.
If the training attempt still fails, go to the Notifications panel, scroll to the end, and find the AIFTnnnn messages relating to the training attempt. You might see a message similar to that shown in Figure 5-8 on page 170. Refer to IBM System z Advanced Workload Analysis Reporter (IBM zAware) Guide, SC27-2623, for help in understanding the reason for the failure.
Figure 5-8 Notifications window showing training attempt
At this point, if the system still does not train successfully, one thing that you must consider is whether this system is a good candidate for monitoring with IBM zAware.
If IBM zAware is unable to find sufficient message diversity and repeating message patterns, the value of the analysis of the message activity on that system will be compromised.
The Message Analysis Program documented in Appendix A, “Syslog Message Analysis Program” on page 207, can be used to identify the number of unique messages in the input file and the average and peak message rates.
As discussed in 3.2, “Selecting which systems to monitor with IBM zAware” on page 96, you could “cheat” and run a program that will generate repeating patterns of message IDs. That might be successful in getting the training to complete successfully. However, you need to consider whether this will just result in inaccurate analysis results from IBM zAware for that system, because the underlying system message activity is still not well suited to IBM zAware.
If you believe that the input file you provided meets the criteria for successfully creating a model, take a nondisruptive dump of the IBM zAware LPAR as soon as the training attempt fails and send the dump to IBM. The process for doing this is described in 5.11.4, “Sending diagnostic information to IBM” on page 185.
Controlling training options
As discussed previously, IBM zAware has three logical databases:
Instrumentation data database
This database contains information about the data that was loaded using a bulk load, or transmitted in realtime. The information in this database is the input when you train a system.
Be aware that the full message text for each message is not stored in the database. An analysis is performed on the message data as it arrives, and the required information is extracted and saved in the instrumentation data database.
Model database
This database contains the models of normal message behavior for each monitored client.
The analysis results
The XML files that contain the information that is presented on the Analysis View window are kept in the IBM zAware file system.
You can change parameters for the amount of data that will be used to train a system. The parameters apply to all the systems defined to the IBM zAware application.
The retention period for the information in each of these three databases is controlled from the Administration > Configuration > Analytics page, as shown in Figure 5-9.
Figure 5-9 Configure Settings - Analytics page
In addition to the retention period for the three databases, there are also options to let you control the training period and the training interval. The considerations for these fields are explained here:
Instrumentation data retention time
Set this value to reflect the number of days that you believe are required to build an accurate model of the normal behavior for each system.
 
Note: The values on this window apply to all monitored systems. If some systems require higher values than other systems, use those higher values.
Training models retention time
This value determines the time period that is used when analyzing how anomalous a message is. If your systems’ message behavior is exactly the same every month, it is probably sufficient to keep a few months’ worth of data. If periods like year-end or quarter-end are different, set this to be a little over a year.
Analysis results retention time
This controls how far back in time you can go on the Analysis view and see the results for each interval. You need to decide how likely it is that you will want to go back and look at the message analysis. For example, you might want to look at the analysis for this time last year. Or, possibly, back to the last time you upgraded the operating system or a major subsystem.
Training period
This controls how many days of data, going backwards starting from yesterday, are used when a training activity is performed. Note that there is a relationship between this value and the Instrumentation data retention time value. There is no point in setting the Training period field to a value larger than the Instrumentation data retention time.
The Training period value should be at least as large as the Training interval value.
 
Changing the Training period value: If you change the Training period value, all systems must disconnect and reconnect to IBM zAware to pick up the new values.
Training interval
This interval determines how often IBM zAware will perform an automatic training for each system.
Give some consideration to the Training interval value. That is, if you decide to exclude message data for “bad” days from the training process, you might want to set the training interval to a larger value so that you do not have to define the excluded days to IBM zAware so frequently (this is discussed in “When to exclude a day from training” on page 173).
Alternatively, consider the case where you install a new product on the day after the last training, and that product produces a new set of messages. IBM zAware will mark those messages as anomalous until you next update the model database. You can initiate a manual training for that system, but if you rely solely on the automatic training to update the model, then you will have a period of time when the results from the IBM zAware analysis might be misleading.
 
Purge consideration: If you decrease any of the retention period values, the data will not actually be purged from the database until the daily database purge process runs.
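The relationships between the values on the Analytics page can be checked mechanically before you save them. The Python sketch below encodes the two rules stated above: the Training period should not exceed the Instrumentation data retention time, and it should be at least as large as the Training interval. The class and field names are hypothetical, all values are assumed to be in days, and the numbers in the example are illustrative only.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class AnalyticsSettings:
    instrumentation_retention_days: int
    training_period_days: int
    training_interval_days: int

def check_settings(settings: AnalyticsSettings) -> List[str]:
    """Return a list of warnings; an empty list means both rules hold."""
    warnings = []
    if settings.training_period_days > settings.instrumentation_retention_days:
        warnings.append(
            "Training period exceeds Instrumentation data retention time")
    if settings.training_period_days < settings.training_interval_days:
        warnings.append(
            "Training period is smaller than Training interval")
    return warnings

# Illustrative values only.
print(check_settings(AnalyticsSettings(365, 90, 30)))
print(check_settings(AnalyticsSettings(60, 90, 120)))
```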
Determining the Training period
The Training period you use might need to be changed, depending on the age of the data you use in the bulk load process. The last day of a Training period is always yesterday, and this affects how much of the bulk load data is eligible for inclusion in the model creation.
For example, suppose that you initiate the Training on May 15, the bulk load data contains 90 days’ worth of data, and you want to include all available instrumentation data in the Training. The default Training period is 90 days. Because the Training period ends on the day before training is requested and looks back the number of days defined in the Training period value, the default does not reach the oldest bulk-loaded data if that data ends earlier than yesterday. In this example, change the Training period value to 104 (and then disconnect and reconnect that monitored client) before initiating the training. This is shown in Figure 5-10.
Figure 5-10 Determining the Training period
To include all the data from a bulk load, you must know the first day of the data and subtract that from the current day to get the Training period. Remember that this Training period affects all systems that will be trained from that point.
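The subtraction described above can be written as a short calculation. In the following Python sketch, the function name is hypothetical. The year and the January 31 start date are assumptions chosen so that the result matches the value of 104 used in the May 15 example.

```python
from datetime import date

def training_period_days(first_day_of_data: date, training_day: date) -> int:
    """Days between the first day of the data and the day training is requested."""
    return (training_day - first_day_of_data).days

# Assumed dates: data starts on January 31, training is requested on May 15.
print(training_period_days(date(2013, 1, 31), date(2013, 5, 15)))  # 104
```

With these dates, a Training period of 104 days ending on May 14 starts on January 31, so the whole of the bulk-loaded data falls inside the window.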
When to exclude a day from training
IBM zAware learns what a system normally looks like from the message data that you feed to it. If an event occurs that IBM zAware has not seen before, or has not seen frequently, that message will be assigned a high anomaly score. But if a message occurs many times, IBM zAware will think that message is normal for this system.
To avoid having that occur, IBM zAware provides the ability to identify abnormal days for each system, so that message data from those days will be excluded when IBM zAware updates its model of that system.
If you decide to use this capability, navigate to the Training Sets panel. Select the system that you want to exclude some days for and click the Actions drop-down. In the drop-down, select Manage Model Dates. This presents you with the window shown in Figure 5-11.
Figure 5-11 Excluding dates from training
On this panel, select the anomalous days that should be excluded from the next training attempt for this system.
 
One day for one system: The granularity for excluding message data is one day for one system. That is, if the whole sysplex was experiencing a problem on a given day, and you do not want those messages becoming part of the model, you need to go through this process for each system in the sysplex.
It is not possible to exclude a time range less than one calendar day. And there is no mechanism to exclude only certain messages.
5.5.3 Moving systems between sysplexes in IBM zAware
Although it is not something that would occur often, IBM zAware does support the ability to move a system from one sysplex to another. When you move a system, IBM zAware changes the topology of the sysplex information contained in its database.
To move a system from one sysplex to another:
1. Go to the Administration > Configuration > Sysplex Topology window.
2. Select the system that you want to move.
3. Click Move Selected Systems.
4. Select the sysplex that you want to move the system to. Note that the target sysplex must have already connected to the IBM zAware LPAR.
5. Click OK.
After the move is successful, when you view the Analysis panel and look at the sysplex you moved the system to, the system will now appear grouped under that sysplex.
5.5.4 Data management granularity
The retention period controls that IBM zAware provides apply to all monitored clients. When the retention period is reached, any data older than that for all clients will be purged.
There is no ability to have different retention periods for different monitored clients. There is also no ability to delete data for a monitored client before the retention period expires. So, for example, if you use a test system to perform initial testing of IBM zAware, and then decide to only connect production and QA systems to IBM zAware, you cannot delete the data for the test system until it times out based on the retention values.
Similarly, IBM zAware does not provide an ability to export the information for a monitored client. You can certainly move a monitored client from one IBM zAware to another. However, in the new IBM zAware, you need to perform a bulk load and start fresh, just as you did when you connected that system to the first IBM zAware.
5.6 Managing IBM zAware firmware
It is to be expected that fixes and enhancements will be provided for IBM zAware over time. You can manage IBM zAware firmware similar to the way you manage your CFCC code.
Determining current IBM zAware firmware level
If you have a problem with IBM zAware, or simply want to know the IBM zAware firmware level that you are currently using, you can retrieve this information from the Hardware Management Console (HMC).
Select the System Information option under Change Management, as shown in Figure 5-12.
Figure 5-12 Selecting System Information option
If the zAware feature has been ordered and installed on the zEC12, the Microcode Control Level (MCL) for the zAware firmware can be displayed on the resulting window, as shown in Figure 5-13.
In this example, the activated MCL level of the IBM zAware code is shown to be H09126 - 012.
Figure 5-13 Displaying IBM zAware firmware MCL level
Applying IBM zAware firmware updates
IBM zAware firmware updates (distributed by IBM as Microcode Control Levels or MCLs) can be retrieved onto the CPC Support Element non-disruptively. However, activating the updates will result in IBM zAware automatically reIPLing itself.
The process of activating new service on a CPC can vary from a few minutes to over an hour, and it is not possible to predict the exact point during that process when the IBM zAware MCLs will be activated. Your IBM Customer Engineers will tell you when they start the activate, and the point at which it has completed. The reIPL of IBM zAware, if it is needed, will occur during that interval.
When IBM zAware performs the IPL, the connections from all monitored systems will be reset. This results in messages similar to those shown in Example 5-2 on each monitored system.
Example 5-2 System Logger messages when IBM zAware reIPLs itself
IXG384I ZAI LOGSTREAM CLIENT ERROR OCCURRED 700
FOR LOGSTREAM SYSPLEX.OPERLOG
REASON: OMVS BPX-SERVICE ERROR - LOGGER WILL RETRY THE REQUEST,
DIAG=00000000
When the connection is reset in this manner, System Logger attempts to restart the connection for up to 20 minutes. If IBM zAware does not complete its reinitialization within this time, System Logger issues an IXG386I message and stops attempting to restart the connection.
Your z/OS automation product should monitor for that message and raise an alert so that the operator can investigate the problem. Resolution might be as simple as issuing a SETLOGR FORCE,ZAICONNECT,LSN=SYSPLEX.OPERLOG command. If that is not successful, use the System Logger IXG3xx messages to determine the cause of the problem.
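The automation rule described above can be sketched as follows. This Python example is a minimal illustration and does not represent any particular automation product. The function name is hypothetical, and the text after the message ID in the sample line is a placeholder, not the real IXG386I message text. The function returns the reconnect command for review rather than issuing it.

```python
from typing import Optional

RECONNECT_TEMPLATE = "SETLOGR FORCE,ZAICONNECT,LSN={log_stream}"

def suggested_action(console_line: str,
                     log_stream: str = "SYSPLEX.OPERLOG") -> Optional[str]:
    """Return the reconnect command if the line carries IXG386I, else None."""
    if "IXG386I" in console_line.split():
        return RECONNECT_TEMPLATE.format(log_stream=log_stream)
    return None

# Placeholder text after the message ID; the real message text is not shown.
print(suggested_action("IXG386I <message text>"))
print(suggested_action("IXG384I ZAI LOGSTREAM CLIENT ERROR OCCURRED 700"))
```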
Moving to a new IBM zAware release
It is important to remember that IBM zAware is unlikely to be considered a mission-critical application. We expect that most customers will not have a test IBM zAware LPAR. Furthermore, because IBM zAware automatically reIPLs whenever it detects that an MCL or upgrade has been applied, it is not possible to run more than one level of IBM zAware concurrently on a CPC. Refer to 3.2, “Selecting which systems to monitor with IBM zAware” on page 96 for more considerations about the need for test IBM zAware LPARs.
When IBM zAware MCLs or new releases are installed, perform the normal testing to ensure that it is performing as expected. If it is not, evaluate whether the problem is significant enough to warrant backing out the new code. Keep in mind, however, that backing out the IBM zAware changes is likely to result in also backing out service to other CPC components.
5.7 Disaster recovery considerations
In addition to the license for a normal production IBM zAware, you can also license IBM zAware on a CPC that will be used in case of a disaster.
IBM zAware can be beneficial in detecting unusual messages during a disaster recovery (DR) test. One of the top priorities in a DR test is to ensure that everything is working the same in the disaster recovery site as it normally works. By spotting anomalous messages, IBM zAware can be a powerful part of your DR test tool kit.
In the case of a DR test or a real disaster, you would set up your IBM zAware LPAR and its disks and then perform bulk loads from the monitored systems that are part of your DR plan. In fact, the steps to get IBM zAware up and running are identical to when you initially installed and implemented it in your production environment.
At the time of writing, you cannot use mirrored IBM zAware disks in your DR site. In 5.3.4, “Backing up the IBM zAware file system” on page 161, we discuss the relationship between IBM zAware information that is stored on the CPC Support Element and its file system. Although it is possible to mirror the IBM zAware disks, when you add them to the IBM zAware in the DR site, it will view them as new disks and format them, thereby removing the data on them.
5.8 Daily and weekly management tasks
As you become more familiar with IBM zAware, you will probably develop your own list of management activities. This section provides suggestions that you can use to get your list started. You can add to and fine-tune these lists based on your own experiences.
5.8.1 Daily management tasks
As stated in 5.1, “Managing IBM zAware components” on page 154, IBM zAware is designed to be largely self-maintaining. However, there are a small number of daily tasks that should be carried out in relation to IBM zAware:
Using the Analysis view, check for any bars that contain an unusually high number of message IDs or that are dark blue, yellow, or orange. These are all indicators of anomalous message behavior and should be investigated.
While in the Analysis view, check that there is a bar for every monitored client for every interval. If any clients have intervals with no bars, the likely cause is that the client was IPLed. However, check to ensure that this is the cause of any missing analysis data.
Use the System Status view to verify that all monitored clients have a Status of Active. Have automation in place on each monitored client to monitor the status of the connections. In addition, it is also prudent to take the time to ensure that the IBM zAware view of the status of each connection is consistent with the z/OS view.
Check the Notifications window for messages. It is most likely that any messages will be related to training attempts. If any messages indicate a failed training attempt or other problem with IBM zAware, follow the guidance for the message as documented in IBM System z Advanced Workload Analysis Reporter (IBM zAware) Guide, SC27-2623.
After you have addressed the messages, delete them.
5.8.2 Weekly management tasks
There is also a small number of tasks to perform less frequently.
Using the Data Storage tab in the Configure Settings window, check the Total Storage usage (%). If it exceeds a threshold (perhaps 80%), add more volumes to the IBM zAware configuration. And if you are backing up the IBM zAware volumes, remember to add the new volumes to your backup jobs.
If the IBM zAware disks become completely full, attempts to send more log data to that LPAR will fail. The monitored clients will present the following messages:
IXG372I ZAI LOGSTREAM CLIENT MANAGER ERROR
FOR LOGSTREAM SYSPLEX.OPERLOG
FUNCTION=BPX4AIO ERRNO=0000008C ERRNOJR=76697242
Note that this message is presented any time System Logger finds that IBM zAware will not accept message traffic, so this message does not necessarily mean that the IBM zAware disks are full. However, if the disks are full, you would see this message on all connected clients.
If you are backing up the IBM zAware disks, check the backup job to ensure it is completing successfully.
If you decide to follow a strategy of excluding abnormal days from training, use the Manage Model Dates function on the Training Sets window to provide IBM zAware with the list of dates that you want to exclude.
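The storage check in the first weekly task can be expressed as a simple threshold test. In this Python sketch, the function name and the returned text are hypothetical, and the 80% default is the example threshold suggested above rather than a fixed IBM recommendation. The usage figure would be read from the Data Storage tab.

```python
def storage_action(total_usage_percent: float,
                   threshold_percent: float = 80.0) -> str:
    """Suggest an action based on the Total Storage usage (%) value."""
    if not 0.0 <= total_usage_percent <= 100.0:
        raise ValueError("usage must be between 0 and 100")
    if total_usage_percent > threshold_percent:
        return "add volumes and update backup jobs"
    return "no action"

print(storage_action(85.0))
print(storage_action(42.5))
```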
5.9 Checklist for adding a monitored client
As you become more familiar with IBM zAware, you will probably want to add more monitored clients. The process for this will vary from one enterprise to another, depending on your specific requirements. However, the following checklist can be helpful as a base to get you started with your own checklist:
Ensure that the system that you plan to add has at least 90 days of archived syslog or OPERLOG.
Ensure that the system has all necessary IBM zAware-related service applied.
If necessary, add the system IP address to the firewall that is protecting the IBM zAware LPAR.
Check that the IBM zAware file system has sufficient spare space for the additional data.
Verify that the IBM zAware LPAR has enough memory (see “Memory” on page 101 for more information about memory requirements for IBM zAware LPARs).
Update the z/OS system’s TCP/IP hosts file, if necessary.
Ensure that the z/OS LPAR is able to see the IBM zAware IP address.
Ensure that all required security accesses have been defined (see 3.4.4, “Security” on page 109 for more information).
Modify the OPERLOG log stream attributes to add the ZAI keywords if this has not already been done.
Update the IXGCNFxx member to point at the IBM zAware LPAR, and activate that member using the SET IXGCNF=xx command.
Remember to update the IEASYSxx member to explicitly point at the IXGCNFxx member.
Extract 90 days’ worth of syslog or OPERLOG into the file that you will input to the Bulk Data Load Utility job.
Issue the D LOGGER,STATUS,ZAI,VERIFY command to ensure that System Logger can connect to IBM zAware.
Issue the SETLOGR FORCE,ZAICONNECT,LSN=SYSPLEX.OPERLOG command to start the connection between System Logger and IBM zAware.
Submit the Bulk Data Load Utility job and ensure that it completes successfully.
Use the D LOGGER,C,LSN=SYSPLEX.OPERLOG,D command to monitor the amount of data that is queued in the buffers, waiting to be sent to IBM zAware.
When all data has been sent, log on to the IBM zAware GUI, navigate to the Priming Data tab, and assign the data that you just bulk loaded to the correct sysplex.
After a few minutes, check the System Status window to verify that all systems have connected again after the Assign. In particular, ensure that the new system that you are adding has reconnected.
Navigate to the Training Sets window and initiate training for your new system.
Update the automation on the new monitored client to monitor the status of the connections to the IBM zAware LPAR.
5.10 System Logger commands for IBM zAware support
The PTF to enable the System Logger support for IBM zAware added a number of new commands to System Logger. These commands can be used to stop and start the connection to IBM zAware, and to check the status of the connection. To be able to quickly and successfully manage your IBM zAware configuration and address any problems, become familiar with these commands.
The first command is the most basic one. It updates the information that System Logger gets from the IXGCNF Parmlib member. If you update that member, or are adding the IBM zAware information for the first time, you can activate it with the following command:
SET IXGCNF=xx
Be aware that if System Logger currently has an active connection to an IBM zAware LPAR, or is trying to activate such a connection, and the new IXGCNF member points to a different host name or IP address for the IBM zAware server, the attempt to activate the new member will be rejected. If you want to change the IP address or host name value in the IXGCNFxx member, you must first quiesce the connection to IBM zAware.
You can display the current IXGCNF settings using the following command:
D LOGGER,IXGCNF
Note that this will return information about aspects of System Logger other than simply the IBM zAware information.
To obtain somewhat more information about the IBM zAware support in System Logger, issue the following command:
D LOGGER,STATUS,ZAI
This command returns some of the same information as the D LOGGER,IXGCNF command. However, it also provides information about whether any log streams have the ZAI attribute, and whether those log streams are connected to this system.
If this system is a monitored client, the output from that command should contain “ZAI LOGSTREAM CLIENTS: ACTIVE”. The output also provides information about buffer use if messages are queued waiting to be sent to IBM zAware. It is most likely that you will see non-zero buffer use numbers when you are running a bulk load job or if the connection to IBM zAware is temporarily unavailable.
To confirm that this system is able to successfully connect to the IBM zAware LPAR, use the following command:
D LOGGER,STATUS,ZAI,VERIFY
This will prompt Logger to initiate the connection to IBM zAware. If there is already an active connection, Logger will verify that the connection is operational. If Logger is unable to start the connection, refer to 5.11, “Problem determination for IBM zAware” on page 183 for help with determining the cause of the problem.
To verify that System Logger is currently connected to IBM zAware and confirm that messages are being successfully transmitted, use the following command:
D LOGGER,C,LSN=SYSPLEX.OPERLOG,D
The output from this command will include, among other things, the status of the ZAI client. For a monitored client, you normally expect this to show a status of CONNECTED. It also provides a count of the number of blocks that were sent to IBM zAware successfully, and the number of blocks that System Logger was unable to transmit.
Normally, System Logger will automatically start the connection to IBM zAware after an IPL, assuming that you provided an IXGCNFxx member with the required information. However, if you need to manually start the connection, you can do so using the following command:
SETLOGR FORCE,ZAICONNECT,LSN=SYSPLEX.OPERLOG
Note that this command can be issued even if the connection to IBM zAware is already active. Additionally, this command does not cause System Logger to purge any messages from its buffers.
Finally, to stop the connection to IBM zAware for some reason, issue the following command:
SETLOGR FORCE,ZAIQUIESCE,LSN=SYSPLEX.OPERLOG
This command will cause the connection to go inactive. It will also purge any messages that are queued in the System Logger buffers, waiting to be sent to IBM zAware.
There is currently no command to cause System Logger to close the connection to IBM zAware, but to continue accepting messages into its buffers for eventual transmission to IBM zAware. However, shutting down the IBM zAware LPAR appears as a temporary connection error to System Logger, and it will continue to buffer messages to be sent over to IBM zAware. For this reason, there will probably be few cases where you would want to use the ZAIQUIESCE command. If you want to permanently stop a monitored client from sending data to IBM zAware, update the IXGCNFxx member used by that system to specify a SERVER value of NONE.
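If your automation or tooling needs these commands for more than one log stream, it might help to generate them from a single place. The Python sketch below collects the commands shown in this section into a dictionary. The function name and the dictionary keys are hypothetical. The command strings themselves are the ones given above, with the log stream name and the IXGCNF suffix as parameters.

```python
from typing import Dict

def zai_commands(log_stream: str = "SYSPLEX.OPERLOG",
                 ixgcnf_suffix: str = "xx") -> Dict[str, str]:
    """Build the System Logger commands discussed in this section."""
    return {
        "activate_ixgcnf": f"SET IXGCNF={ixgcnf_suffix}",
        "display_ixgcnf": "D LOGGER,IXGCNF",
        "display_zai_status": "D LOGGER,STATUS,ZAI",
        "verify_connection": "D LOGGER,STATUS,ZAI,VERIFY",
        "display_log_stream": f"D LOGGER,C,LSN={log_stream},D",
        "connect": f"SETLOGR FORCE,ZAICONNECT,LSN={log_stream}",
        "quiesce": f"SETLOGR FORCE,ZAIQUIESCE,LSN={log_stream}",
    }

for name, command in zai_commands().items():
    print(f"{name}: {command}")
```

Remember that the quiesce command purges any messages queued in the System Logger buffers, so treat that entry with more care than the others.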
5.10.1 IBM zAware messages
The IBM zAware notifications window shown in Figure 5-14 provides information about various IBM zAware activities, particularly attempts to perform training for the monitored systems. The messages are documented in Appendix D of IBM System z Advanced Workload Analysis Reporter (IBM zAware) Guide, SC27-2623. If you encounter a message that is not included in your level of that document, retrieve the most recent update from ResourceLink.
Figure 5-14 IBM zAware notifications window
You can see in the figure that some messages are informational (the ones ending in I), and some are error messages (the ones ending in E). For error messages that require action by IBM, IBM zAware attempts to gather the required documentation and automatically send it to IBM for analysis. The “response” section of the documentation for each message will indicate whether a user action is required, or if IBM zAware will automatically open an incident and send supporting documentation to IBM.
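If you copy notification text out of the GUI for your own records, the suffix convention just described is easy to apply programmatically. The following Python sketch classifies a message by the last character of its identifier. The function name and the sample identifiers are hypothetical and are not real IBM zAware message IDs; the only rule taken from the text is that a trailing I means informational and a trailing E means error.

```python
def message_severity(message_id: str) -> str:
    """Classify a message ID by its final character (I or E)."""
    suffix = message_id.strip()[-1:].upper()
    if suffix == "I":
        return "informational"
    if suffix == "E":
        return "error"
    # Any other suffix is outside the convention described in the text.
    return "unknown"


if __name__ == "__main__":
    # Made-up identifiers, used only to exercise the function.
    for sample in ("XXXX0001I", "XXXX0002E"):
        print(sample, message_severity(sample))
```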
You can use the Hardware Messages for the CPC that the IBM zAware LPAR resides on to determine whether any calls to IBM were made by the CPC, as shown in Figure 5-15.
Figure 5-15 HMC Hardware Messages
5.11 Problem determination for IBM zAware
IBM zAware is designed to be a “closed application”. This means that IBM zAware does not have a console where you can issue commands or look at messages. For people who are used to working with z/OS, this is a different paradigm, and you might wonder how you can investigate perceived problems with IBM zAware or your configuration. There are facilities to help you perform some problem determination.
5.11.1 Problems during IBM zAware initial setup
The initial setup of IBM zAware is straightforward, with few things that can go wrong. In our case, the only challenges we encountered were related to connecting to the IBM zAware LPAR, either from a web browser or from our z/OS systems.
When you perform an Activate against the IBM zAware LPAR, the Activate will probably run for longer than you might be used to with z/OS. In our case, the activation took about 3 minutes. At that point, the LPAR status on the HMC changed to Operating. We took this to mean that IBM zAware was ready for use and immediately tried logging on.
However, after the LPAR completes initializing, the application server still has additional work to do before it is ready to accept browser sessions. Wait a few minutes after you see the Operating status before you attempt to start a session with IBM zAware.
If the IBM zAware initialization fails for some reason, the status will be reflected on the HMC. If you are unable to determine the reason for the failure, open an incident with IBM. See 5.11.4, “Sending diagnostic information to IBM”, for information about reporting IBM zAware problems to IBM.
5.11.2 Confirming connectivity to IBM zAware
A few minutes after the status of the IBM zAware LPAR changes to Operating, it should be possible to log on to the IBM zAware GUI. Point your browser at the URL that represents the IBM zAware LPAR, remembering to include “/zAware” after the IP address or domain name. Note that the “/zAware” is case sensitive.
If the browser is unable to connect to IBM zAware, try pinging the IBM zAware LPAR. If the ping is unsuccessful, issue a TRACERT command (if using Windows), or traceroute (if using Linux) and bring the output to your network team.
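A scripted check can complement ping and traceroute, because a successful ping shows only that the LPAR is reachable at the network level, not that the GUI port is accepting connections. The following Python sketch builds the GUI URL with the case-sensitive “/zAware” suffix and attempts a TCP connection. The https scheme, the default port of 443, and the host name are assumptions made for this sketch; substitute the values that apply to your installation.

```python
import socket


def zaware_gui_url(host: str) -> str:
    """Build the GUI URL; "/zAware" is case sensitive (https is an assumption)."""
    return f"https://{host}/zAware"


def tcp_reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    host = "zaware.example.com"  # hypothetical host name
    print(zaware_gui_url(host), tcp_reachable(host))
```

If the TCP check fails shortly after the LPAR status changes to Operating, allow a few more minutes for the application server to finish initializing before you involve your network team.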
To verify that you can connect to IBM zAware from System Logger, issue the D LOGGER,STATUS,ZAI,VERIFY command. If the command fails, issue a PING command from that z/OS system to ensure that it has network connectivity to the IBM zAware LPAR. If the PING is successful, check the return and reason codes from the IXG387I message.
Assuming that the required network connectivity is in place (and that there are no firewall issues), two possible reasons for System Logger being unable to connect to IBM zAware are:
No disks have yet been added to the IBM zAware LPAR. Because there is nowhere to store the messages that would be sent over from System Logger, IBM zAware will not allow the connection to be started. In this case, log on to the IBM zAware GUI and add some disks.
The analytics engine on IBM zAware has been stopped. If the engine is stopped, System Logger will not be able to connect successfully and you will observe messages similar to those shown in Example 5-3.
Example 5-3 Attempting to start connection when analytics engine is stopped
IXG372I ZAI LOGSTREAM CLIENT MANAGER ERROR 396
FOR LOGSTREAM SYSPLEX.OPERLOG
FUNCTION=BPX1CON ERRNO=00000468 ERRNOJR=76630291
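When you report a connection failure, it helps to have the fields of the IXG372I message separated out. The following Python sketch extracts the log stream name, the failing function, and the error codes from the text of Example 5-3. The function name is our own, and the conversion assumes that the ERRNO value is hexadecimal, as its eight-digit format suggests. Interpreting the codes still requires the z/OS UNIX System Services documentation.

```python
import re

EXAMPLE = """IXG372I ZAI LOGSTREAM CLIENT MANAGER ERROR 396
FOR LOGSTREAM SYSPLEX.OPERLOG
FUNCTION=BPX1CON ERRNO=00000468 ERRNOJR=76630291"""


def parse_ixg372i(text: str) -> dict:
    """Extract the log stream, failing function, and error codes."""
    fields = dict(re.findall(r"\b(FUNCTION|ERRNO|ERRNOJR)=(\S+)", text))
    stream = re.search(r"FOR LOGSTREAM (\S+)", text)
    return {
        "logstream": stream.group(1) if stream else None,
        "function": fields.get("FUNCTION"),
        "errno_hex": fields.get("ERRNO"),
        # Assumption: ERRNO is displayed in hexadecimal.
        "errno_dec": int(fields["ERRNO"], 16) if "ERRNO" in fields else None,
        "errnojr": fields.get("ERRNOJR"),
    }


if __name__ == "__main__":
    print(parse_ixg372i(EXAMPLE))
```

In Example 5-3, the failing function is BPX1CON, the z/OS UNIX connect service, which is consistent with System Logger being unable to open a socket to IBM zAware.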
5.11.3 Checking for problem notifications on the IBM zAware GUI
Certain events can cause a notification to be generated on the IBM zAware GUI. If there are notifications that have not been removed yet, there will be a small lightning symbol near the top right of the GUI and the Notifications window will show the messages, as shown in Figure 5-16.
Figure 5-16 IBM zAware Notifications window
Refer to Appendix D of IBM System z Advanced Workload Analysis Reporter (IBM zAware) Guide, SC27-2623, for information about the error and the recommended action. If the message is not yet documented in that book, open an incident with IBM.
5.11.4 Sending diagnostic information to IBM
IBM zAware is designed to automatically gather diagnostic information and Call Home to IBM when certain problems are encountered. Assuming that Call Home is enabled on your CPC, this should result in the IBM support center contacting you to obtain more information about the event. To determine whether a Call Home was generated for a given problem, use the HMC to check whether there are hardware messages outstanding against the CPC that IBM zAware runs on. The hardware message will identify the IBM zAware LPAR name and the date and time that the problem was encountered and reported to IBM.
If the problem did not result in a Call Home, or the problem is such that IBM zAware is not aware of the problem, take a dump of the IBM zAware LPAR and submit it to IBM. Follow these steps:
Take an LPAR dump of the IBM zAware LPAR.
Log on to the Support Element of the CPC where IBM zAware is running. Expand the list of partitions and select the IBM zAware LPAR. Then expand the Service option in the bottom pane of the window, as shown in Figure 5-17.
Figure 5-17 Navigating to LPAR dump option
Click Dump LPAR Data. You will be presented with the window shown in Figure 5-18.
Figure 5-18 LPAR Dump options
Ensure that the Not disruptive option is selected and click OK. The dump might take some time. When the dump completes, click OK.
Deselect the IBM zAware LPAR and select the CPC instead. The bottom pane will now change and look similar to that shown in Figure 5-19.
Figure 5-19 Opening a PMV
Click Report a Problem in the Service drop-down. This will display the selection shown in Figure 5-20.
Figure 5-20 Reporting a PMV
Select the Type V Viewable PMH(PMV) option. Enter descriptive text in the Problem Description box. Then click Request Service.
You will be presented with a window requesting contact information. Complete the information and click Request Service.
Depending on the RSF profile settings for the system, you might be required to authorize the service request through the hardware messages window. If you are, go to the Hardware Messages icon. There should be an entry with a Message Text that says Problem reported by customer. Select that entry and click Details. View the details to ensure this is the PMV that you opened, and then click Request Service as shown in Figure 5-21.
Figure 5-21 Authorizing service request
You are presented with a window confirming your contact information. Click Request Service.
Allow a little time for the PMV to be processed in IBM. Then go back to the Support Element window. In the Service drop-down, click View Service History as shown in Figure 5-22.
Figure 5-22 Retrieving the PMV number
You will be presented with a window showing the service history for the CPC. Your call should be at the top of the list. Select your call and click the View drop-down. Then click Problem Summary. You will be presented with a window similar to that shown in Figure 5-23. Record the PMH number and click OK.
Figure 5-23 Viewing the Problem Summary
This will bring you back to the Service History window. Close that window.
In the Support Element window, select Transmit Service Data under the Service list. You will be presented with a window similar to that shown in Figure 5-24.
Figure 5-24 Transmitting service data to IBM
Select the CPC firmware embedded framework dump data option. The right side of the window will change, and you will be presented with a place to enter the PMH number. Click Send.
You will be presented with a window confirming that the information will be sent to IBM. Click OK.
Remember to log off the Support Element when you are finished.

1 The year format (2-digit or 4-digit) is controlled by the HARDCOPY HCFORMAT keyword in the CONSOLxx member of Parmlib.