Troubleshooting performance issues

Troubleshooting performance issues is a rather vast and complicated area, which we will only be able to touch upon lightly. In general, performance becomes a concern when a transaction is unable to be executed within a reasonable or expected time or when the system is unable to process expected volumes. Sometimes, this may be specific to a single transaction; other times, it may be something that affects every composite instance running on the server.

Server-wide performance issues

When server-wide performance problems occur, they impact most or all of the transactions currently being executed on your infrastructure. For example, if the dehydration store database is down, this will definitely have an impact on all the running instances.

Consider asking yourself a few questions to determine whether there is an overarching infrastructure problem:

  • Is logging in to Oracle Enterprise Manager Fusion Middleware Control extremely slow?
  • Are all composite instances getting completed in an unusually long period of time?
  • Are the logs or your dehydration database growing unusually quickly?
  • Are you seeing an exceptionally high number of errors in the logs?

These are usually indications that there may be a server-wide performance problem. Performance issues impacting the entire server are usually related to either one or a few specific issues. In this case, take a look at the following:

  • Check available disk space: Lack of available disk space is known to have an adverse effect on an instance execution. Sometimes, this is due to excessive logging that fills up the disk or logs not being rotated.

    The following code is an example of how to view the available disk space. Here, the /u01 mount point on which our software is installed has 10.8 GB available, so space is not an issue:

    oracle@soahost1:/home/oracle> df -m
    Filesystem  1M-blocks  Used Available Use% Mounted on
    /dev/sda6         730   387       306  56% /
    /dev/sda7         244    35       197  15% /home
    /dev/sda5         996   804       141  86% /var
    /dev/sda3        1984    38      1844   2% /tmp
    /dev/sda2        4841  3514      1077  77% /usr
    /dev/sda1          99    12        83  13% /boot
    /dev/sdb        20159  8267     10869  44% /u01
    tmpfs            3992     0      3992   0% /dev/shm
    soadb.packt.com:/u01/share
                    40318     37175      1095  98%
  • Check CPU, memory, and I/O utilization: Some processes can hog the CPU, be it a backup job, bug in AdminServer, unauthorized code deployments, or your monitoring tool agent. For example, some intelligent monitoring agents, such as the Oracle Enterprise Manager Cloud Control Agent, perform constant queries against the database to retrieve metrics and statistics, which may result in undue burden on your environment. Also, ensure that there is sufficient memory available and that SWAP space (Unix) or virtual memory (Windows) is not actively being used.

    By reviewing the output of the vmstat Linux command in the following example, we see that SWAP utilization is 0 MB (good!), the amount of free memory is 87 MB (keep an eye on it!), the number of bytes in and bytes out at this point in time (to measure I/O) is 6 and 49, respectively (not bad!), and the CPU is 94 percent idle (no problem here!):

    oracle@soahost1:/home/oracle> vmstat -S m
    procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     0  0   1095     87    153    454    0    0     6    49   26   34  4  1 94  2  0
  • Check operating system resources and logs: For example, in Linux, the /var/log/messages log reveals critical events at the operating system level. This log file can reveal errors that may have a direct impact on the stability, behavior, and/or performance of SOA Infrastructure. In the following example, it appears that we have exceeded certain ulimit resources:
    root@soahost1:/root> cat /var/log/messages
    Aug 31 20:53:22 uslx286 sshd[22480]: fatal: setresuid 10000: Resource temporarily unavailable

    The lsof command in Linux can list the number of open files. Too many may exhaust operating system resources:

    root@soahost1:/root> lsof | wc -l
    6064

    The ps command in Linux can list the number of running processes. Ultimately, too many may also exhaust operating system resources:

    root@soahost1:/root> ps -A | wc -l
    297
  • Check JVM available memory and frequency of full garbage collection: Not enough available heap may result in excessive garbage collection, or depending on the garbage collection algorithm being used, increased pause time may occur, which could also result in higher CPU utilization. We recommend you review the Oracle Fusion Middleware Performance and Tuning for Oracle WebLogic Server 12c Release 1 documentation at http://docs.oracle.com/cd/E24329_01/web.1211/e24390/wls_tuning.htm#PERFM173 for details regarding additional JVM tuning.
  • Check connection pools: When connection pools are exhausted, this may lead to transactional failures. However, the errors are apparent in both the logs and in the Flow Trace of the composite instances.
  • Check database performance: Often under heavy load, a larger number of writes to the database are experienced, which have a direct impact on database server I/O. Tools such as Oracle Enterprise Manager Database Control or Oracle Enterprise Manager Cloud Control can provide insight into the performance of the database. Other times, data growth can also contribute to slow console performance. Oracle Database tuning is a huge topic in itself, but running a few queries, such as the one shown here, can be useful to determine whether your database has adequate free space to process instance requests:
    SELECT a.tablespace_name, 
           b.autoextensible, 
           b.increment_by,
           SUM(a.bytes)/1024/1024 free_space_mb, 
           SUM(a.blocks) free_blocks,
           SUM(b.bytes)/1024/1024 allocated_space_mb 
    FROM sys.dba_free_space a, 
         sys.dba_data_files b 
    WHERE a.tablespace_name = b.tablespace_name
    AND a.tablespace_name ='DEV_SOAINFRA'
    GROUP BY a.tablespace_name,
             b.autoextensible, 
             b.increment_by;

    The output of this query in Figure 9.4 shows that the AUTOEXTENSIBLE column of SOAINFRA data space is TRUE (good!) and the amount of free space is ~25 MB (worrisome) of a total allocated space of 500 MB:

    Server-wide performance issues

    Figure 9.4: Database schema free and allocated space query

Hardware may not be able to keep up with all the transactions, but by monitoring the various hardware, operating system, and infrastructure metrics, over time you should be able to determine whether you have maxed out your current hardware capacity or not.

Is your environment always slow or does it only demonstrate poor performance under heavy loads? Monitoring the performance of your environment is not a one-time activity. You should have monitoring tools in place to help you pinpoint unusual or high utilization and compare them to average usage times. In Chapter 6, Monitoring Oracle SOA Suite 12c, we proposed using Oracle Enterprise Manager Cloud Control, which provides end-to-end monitoring and alerting (and even some administration) capabilities for both your SOA Infrastructure and operating system. By adding the SOA Management Pack, you would also be able to get composite-specific information of all your environments through a single console.

SOA composite instance performance

When we refer to composite instance performance, we mean to say that a particular SOA composite is behaving poorly. In these cases, performance issues are often isolated to one (or a few) composites, but not all. For example, this could be due to the response time from external systems, a badly designed process, due to a suddenly poor performing database, or a queue that the SOA composite is using to read/write data.

Consider a scenario where your composite begins processing an instance by consuming messages from a JMS queue. During peak load, if sufficient threads are not available to the polling adapter, the overall performance of the composite is degraded. There may be no problem with your infrastructure and it may be capable of handling extra load by allocating more threads. The easiest way to overcome this bottleneck is to increase the value of the adapter.jms.receive.threads property that controls the number of threads from its default value of 1. This is a binding property that can be set in an individual composite:

<property name="adapter.jms.receive.threads" type="xs:string" many="false">4</property>

Note

Starting with Oracle SOA Suite 12c, Mediator instance metadata is no longer stored in the database, but payloads are, depending on the audit level.

We cannot address all scenarios that may cause a specific composite to behave poorly, as they are unlimited. However, we can provide the means and direction to identify poorly behaving composites and help retrieve the specific composite instance ID of the offending instance so that we may navigate to its details and find out more about what's going on.

Average, minimum, and maximum duration of BPEL/BPMN components

The queries in this section will help you get an idea of the performance of runtime components. The majority of the commands performed provide an easy-to-read format and may initially look complex. However, the queries themselves are relatively basic.

This first query, when run against the [PREFIX]_SOAINFRA schema, returns the name of the BPEL or BPMN component, the partition it is deployed to, its state, as well as average, minimum, and maximum execution time along with the total count. The following query is filtered to retrieve statistics for the last 24 hours via the SYSDATE-1 clause. If preferred, you can further limit the query to a specific composite (COMPOSITE_NAME LIKE '%<composite_name>%') or component (COMPONENT_NAME LIKE '%<component_name>%'):

--------------------------------------------------
-- BPEL/BPMN AVG/MIN/MAX                        --
--------------------------------------------------
SELECT domain_name,
       component_name,
       DECODE(state,'5','COMPLETE','9','STALE','10','FAULTED') AS state,
       TO_CHAR(AVG((TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date),12,2)) * 60 * 60) + (TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 15, 2)) * 60) + TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18, 4))), '999990.000') AS avg,
       TO_CHAR(MIN((TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date),12,2)) * 60 * 60) + (TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 15, 2)) * 60) + TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18, 4))), '999990.000') AS min,
       TO_CHAR(MAX((TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date),12,2)) * 60 * 60) + (TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 15, 2)) * 60) + TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18, 4))), '999990.000') AS max,
       COUNT(1) AS count
FROM   cube_instance
WHERE  creation_date >= SYSDATE - 1
--AND    component_name LIKE '%%'
--AND    composite_name LIKE '%%'
GROUP BY domain_name, 
         component_name,
         state
ORDER BY component_name, 
         state;

The output of the preceding query is shown in Figure 9.5. Here, we can see that the ChangeAddress BPEL component had 112 executions within the last 24 hours with an average execution time of 0.279 seconds and a maximum execution time of 7.3 seconds:

Average, minimum, and maximum duration of BPEL/BPMN components

Figure 9.5: Output of the SQL query returning aggregated statistics of the BPEL/BPMN component performance

Note

Poor execution times of BPEL, BPMN, and Mediator components may not be due to the slowness of the infrastructure or code, but could be due to poor response time from the invocation of external services or resources.

This query comes in very handy as it is simple to use and quickly retrieves good, aggregated information of your overall component instance performance.

The duration of a single BPEL/BPMN component instance

In other cases, the performance may be isolated to a single transaction, which could be related to one or more composite and/or component. Regardless, you will have to start somewhere and have a general idea of the composite that is reportedly responding poorly.

This following SQL query lists all BPEL and BPMN instances within the last 24 hours (refer to SYSDATE - 1 in the WHERE clause) that have been completed in over 10 seconds (refer to 10 in the WHERE clause):

--------------------------------------------------
-- BPEL/BPMN COMPONENT INSTANCE DURATION TIMES  --
--------------------------------------------------
SELECT cmpst_id,
       TO_CHAR(creation_date, 'YYYY-MM-DD HH24:MI') AS creation_date,
       component_name,
       componenttype,
       DECODE(state, '5', 'COMPLETE', '9', 'STALE', '10', 'FAULTED') AS state,
       TO_CHAR((TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 12, 2)) * 60 * 60) + (TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 15, 2)) * 60) + TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18, 4)),'999990.000') AS duration  
FROM   cube_instance
WHERE  TO_CHAR(creation_date, 'YYYY-MM-DD HH24:MI') >= TO_CHAR(SYSDATE - 1,'YYYY-MM-DD HH24:MI')  
AND    (TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 12, 2)) * 60 * 60) + (TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 15, 2)) * 60) + TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18, 4)) > 10
--AND    component_name LIKE '%%'
--AND    composite_name LIKE '%%'
ORDER BY component_name,
         creation_date;

You can further limit the query to a specific composite or component as needed.

Figure 9.6 shows the output of this query. Now, we can see that we have 7 instances that have taken longer than 10 seconds, and one in particular took 26.7 seconds. The good thing is that the query outputs the composite instance ID! Now we can log in to the console and navigate to the Flow Trace of that particular instance to find out exactly what activity took so long:

The duration of a single BPEL/BPMN component instance

Figure 9.6: Output of the SQL query returning individual BPEL/BPMN component instance performance

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.188.178.181