Troubleshooting performance issues is a rather vast and complicated area, which we will only be able to touch upon lightly. In general, performance becomes a concern when a transaction cannot be executed within a reasonable or expected time, or when the system is unable to process expected volumes. Sometimes, this may be specific to a single transaction; other times, it may be something that affects every composite instance running on the server.
When server-wide performance problems occur, they impact most or all of the transactions currently being executed on your infrastructure. For example, if the dehydration store database is down, this will definitely have an impact on all the running instances.
Consider asking yourself a few questions to determine whether there is an overarching infrastructure problem: Is every composite affected, or just one? Is the environment always slow, or only under heavy load? Did the slowdown coincide with a recent change? If the problem spans all transactions, these are usually indications of a server-wide performance problem. Performance issues impacting the entire server are usually related to either one or a few specific issues. In this case, take a look at the following:
The following command shows how to view the available disk space. Here, the /u01 mount point on which our software is installed has 10.8 GB available, so space is not an issue:
oracle@soahost1:/home/oracle> df -m
Filesystem                 1M-blocks  Used Available Use% Mounted on
/dev/sda6                        730   387       306  56% /
/dev/sda7                        244    35       197  15% /home
/dev/sda5                        996   804       141  86% /var
/dev/sda3                       1984    38      1844   2% /tmp
/dev/sda2                       4841  3514      1077  77% /usr
/dev/sda1                         99    12        83  13% /boot
/dev/sdb                       20159  8267     10869  44% /u01
tmpfs                           3992     0      3992   0% /dev/shm
soadb.packt.com:/u01/share     40318 37175      1095  98%
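If you are only interested in the mount point hosting your middleware, you can target it directly and request human-readable units:

oracle@soahost1:/home/oracle> df -h /u01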
By reviewing the output of the vmstat Linux command in the following example, we can see that there is no active swapping, since both the swap-in (si) and swap-out (so) rates are 0 (good!), the amount of free memory is 87 MB (keep an eye on it!), the blocks read from and written to disk per second (bi and bo, a measure of I/O) are 6 and 49, respectively (not bad!), and the CPU is 94 percent idle (no problem here!):
oracle@soahost1:/home/oracle> vmstat -S m
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0   1095     87    153    454    0    0     6    49   26   34  4  1 94  2  0
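Keep in mind that a single snapshot can be misleading. vmstat can also sample repeatedly, for example, every 5 seconds for 10 iterations, which makes trends under load much easier to spot:

oracle@soahost1:/home/oracle> vmstat -S m 5 10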
The /var/log/messages log reveals critical events at the operating system level. This log file can expose errors that have a direct impact on the stability, behavior, and/or performance of the SOA Infrastructure. In the following example, it appears that we have exceeded certain ulimit resources:

root@soahost1:/root> cat /var/log/messages
Aug 31 20:53:22 uslx286 sshd[22480]: fatal: setresuid 10000: Resource temporarily unavailable
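To see the limits currently in effect, you can check them for your shell session or for a specific running process (<pid> below is a placeholder for a real process ID):

oracle@soahost1:/home/oracle> ulimit -a
oracle@soahost1:/home/oracle> cat /proc/<pid>/limits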
The lsof command in Linux lists open files; piping its output through wc -l gives a count. Too many open files may exhaust operating system resources:
root@soahost1:/root> lsof | wc -l
6064
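To narrow the count to the operating system user that runs the SOA servers (oracle in our examples), filter by user:

root@soahost1:/root> lsof -u oracle | wc -l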
The ps command in Linux lists running processes, and wc -l counts them. Ultimately, too many processes may also exhaust operating system resources:
root@soahost1:/root> ps -A | wc -l
297
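Because each WebLogic Server instance is a single Java process with many threads, it may also be worth counting threads rather than processes when hunting for resource exhaustion:

root@soahost1:/root> ps -eLf | wc -l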
The following query, run against the database, reports the free and allocated space of the DEV_SOAINFRA tablespace and whether its datafiles can extend automatically:

SELECT a.tablespace_name,
       b.autoextensible,
       b.increment_by,
       SUM(a.bytes)/1024/1024 free_space_mb,
       SUM(a.blocks)          free_blocks,
       SUM(b.bytes)/1024/1024 allocated_space_mb
  FROM sys.dba_free_space a,
       sys.dba_data_files b
 WHERE a.tablespace_name = b.tablespace_name
   AND a.tablespace_name = 'DEV_SOAINFRA'
 GROUP BY a.tablespace_name, b.autoextensible, b.increment_by;
The output of this query, shown in Figure 9.4, indicates that the AUTOEXTENSIBLE column for the SOAINFRA tablespace is TRUE (good!) and that only ~25 MB of free space remains (worrisome!) out of a total allocated space of 500 MB.
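If free space continues to shrink and autoextension is capped or disabled, a DBA can add a datafile to the tablespace. The following is a minimal sketch; the datafile path and sizes are hypothetical and should be adjusted to your environment:

-- Sketch only: the datafile path and sizes below are hypothetical.
ALTER TABLESPACE DEV_SOAINFRA
  ADD DATAFILE '/u01/oradata/soadb/dev_soainfra02.dbf'
  SIZE 500M
  AUTOEXTEND ON NEXT 100M MAXSIZE 2G;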
Your hardware may not be able to keep up with all the transactions, but by monitoring hardware, operating system, and infrastructure metrics over time, you should be able to determine whether you have maxed out your current hardware capacity.
Is your environment always slow, or does it only demonstrate poor performance under heavy load? Monitoring the performance of your environment is not a one-time activity. You should have monitoring tools in place to help you pinpoint unusual or high utilization and compare it against average usage. In Chapter 6, Monitoring Oracle SOA Suite 12c, we proposed using Oracle Enterprise Manager Cloud Control, which provides end-to-end monitoring and alerting (and even some administration) capabilities for both your SOA Infrastructure and operating system. By adding the SOA Management Pack, you would also be able to get composite-specific information about all your environments through a single console.
When we refer to composite instance performance, we mean that a particular SOA composite is behaving poorly. In these cases, performance issues are often isolated to one (or a few) composites, but not all. For example, this could be due to slow response times from external systems, a badly designed process, a suddenly underperforming database, or a queue that the SOA composite uses to read or write data.
Consider a scenario where your composite begins processing an instance by consuming messages from a JMS queue. During peak load, if sufficient threads are not available to the polling adapter, the overall performance of the composite degrades. There may be nothing wrong with your infrastructure; it may well be capable of handling the extra load if more threads are allocated. The easiest way to overcome this bottleneck is to increase the adapter.jms.receive.threads property, which controls the number of polling threads, from its default value of 1. This is a binding property that can be set on an individual composite:
<property name="adapter.jms.receive.threads" type="xs:string" many="false">4</property>
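For context, the following sketch shows where such a property sits within the composite.xml service binding; the service name and JCA configuration file referenced here are hypothetical:

<!-- Sketch only: the service name and JCA config file are hypothetical. -->
<service name="ConsumeOrderMessage">
  <binding.jca config="ConsumeOrderMessage_jms.jca">
    <property name="adapter.jms.receive.threads" type="xs:string"
              many="false">4</property>
  </binding.jca>
</service>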
We cannot address every scenario that may cause a specific composite to behave poorly, as they are practically unlimited. However, we can provide the means and direction to identify poorly behaving composites, and help you retrieve the composite instance ID of the offending instance so that you can navigate to its details and find out more about what is going on.
The queries in this section will help you get an idea of the performance of the runtime components. Most of them are formatted to produce easy-to-read output, which may make them look complex at first; the queries themselves, however, are relatively basic.
This first query, when run against the [PREFIX]_SOAINFRA schema, returns the name of the BPEL or BPMN component, the partition it is deployed to, its state, and its average, minimum, and maximum execution times, along with the total instance count. The query is filtered to retrieve statistics for the last 24 hours via the SYSDATE - 1 clause. If preferred, you can further limit it to a specific composite (COMPOSITE_NAME LIKE '%<composite_name>%') or component (COMPONENT_NAME LIKE '%<component_name>%'):
--------------------------------------------------
--             BPEL/BPMN AVG/MIN/MAX            --
--------------------------------------------------
SELECT domain_name,
       component_name,
       DECODE(state, '5', 'COMPLETE', '9', 'STALE', '10', 'FAULTED') AS state,
       TO_CHAR(AVG((TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 12, 2)) * 60 * 60)
                 + (TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 15, 2)) * 60)
                 +  TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18, 4))),
               '999990.000') AS avg,
       TO_CHAR(MIN((TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 12, 2)) * 60 * 60)
                 + (TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 15, 2)) * 60)
                 +  TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18, 4))),
               '999990.000') AS min,
       TO_CHAR(MAX((TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 12, 2)) * 60 * 60)
                 + (TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 15, 2)) * 60)
                 +  TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18, 4))),
               '999990.000') AS max,
       COUNT(1) AS count
  FROM cube_instance
 WHERE creation_date >= SYSDATE - 1
   --AND component_name LIKE '%%'
   --AND composite_name LIKE '%%'
 GROUP BY domain_name, component_name, state
 ORDER BY component_name, state;
The output of the preceding query is shown in Figure 9.5. Here, we can see that the ChangeAddress BPEL component had 112 executions within the last 24 hours, with an average execution time of 0.279 seconds and a maximum execution time of 7.3 seconds.
This query comes in very handy, as it is simple to use and quickly retrieves aggregated information about your overall component instance performance.
In other cases, poor performance may be isolated to a single transaction, which could involve one or more composites and/or components. Regardless, you will have to start somewhere, and you should have a general idea of which composite is reportedly responding poorly.
The following SQL query lists all BPEL and BPMN instances created within the last 24 hours (refer to SYSDATE - 1 in the WHERE clause) that took longer than 10 seconds to complete (refer to the 10 in the WHERE clause):
--------------------------------------------------
-- BPEL/BPMN COMPONENT INSTANCE DURATION TIMES  --
--------------------------------------------------
SELECT cmpst_id,
       TO_CHAR(creation_date, 'YYYY-MM-DD HH24:MI') AS creation_date,
       component_name,
       componenttype,
       DECODE(state, '5', 'COMPLETE', '9', 'STALE', '10', 'FAULTED') AS state,
       TO_CHAR((TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 12, 2)) * 60 * 60)
             + (TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 15, 2)) * 60)
             +  TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18, 4)),
               '999990.000') AS duration
  FROM cube_instance
 WHERE TO_CHAR(creation_date, 'YYYY-MM-DD HH24:MI') >= TO_CHAR(SYSDATE - 1, 'YYYY-MM-DD HH24:MI')
   AND (TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 12, 2)) * 60 * 60)
     + (TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 15, 2)) * 60)
     +  TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18, 4)) > 10
   --AND component_name LIKE '%%'
   --AND composite_name LIKE '%%'
 ORDER BY component_name, creation_date;
You can further limit the query to a specific composite or component as needed.
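For example, to restrict the results to a single composite, uncomment the composite filter in the WHERE clause and supply its name (OrderProcessing below is a hypothetical example):

-- Hypothetical composite name, for illustration only.
AND composite_name LIKE '%OrderProcessing%'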
Figure 9.6 shows the output of this query. Now, we can see that we have seven instances that took longer than 10 seconds, and one in particular took 26.7 seconds. The good thing is that the query outputs the composite instance ID! With it, we can log in to the console and navigate to the Flow Trace of that particular instance to find out exactly which activity took so long.