Application lifecycle maintenance
An IBM Business Process Manager (BPM) application consists of a set of Business Process Definitions (BPDs), human or system tasks, services, and so on. IBM BPM application developers are responsible for creating and testing the application code.
The application lifecycle does not end when the code is deployed to the production system and responsibility shifts to the operations team. The application must be maintained, updated, and tested over its entire lifespan, from the beginning of development until it is withdrawn from service. This chapter describes the application lifecycle (primarily from the operations point of view) and emphasizes testing and maintenance.
This chapter includes the following topics:
2.1 Application development versus operations roles
An IBM BPM application has many points of contact with different roles. Application developers develop the code and run unit and functional verification tests to ensure that the solution meets the functional requirements.
Further system integration testing (SIT) and user acceptance testing (UAT) are performed to ensure that the application’s behavior at the system level also meets the solution requirement. Non-functional requirements, such as performance, scalability, security, and globalization, also are tested.
It is debatable where the performance and stress testing teams (SIT and UAT) belong from an organizational point of view. In large enterprises, there often are dedicated testing teams (functional and non-functional). For the purposes of this chapter, we consider these teams a part of IBM BPM operations.
IBM BPM and WebSphere administrators are responsible for deploying and maintaining IBM BPM applications on the test and production systems. They also are responsible for versioning and for performing instance migrations when new snapshots are deployed.
Database administrators (DBAs) also are an important part of IBM BPM operations. Organizationally, DBAs often belong to a different team. However, DBAs must be fully engaged with the rest of the IBM BPM operations team to ensure that IBM BPM applications perform well in production.
The application developers and operations teams must work together for successful testing and problem determination.
2.2 Application lifecycle: Operations point of view
It is important to ensure that the quality of process applications meets the functional and non-functional requirements of the business. To produce reliable process-based solutions that achieve the required business outcomes, you must run functional and non-functional tests. These tests provide confidence that the applications are developed and deployed correctly, that the system is configured properly, and that the system performs and scales well under normal operation and under stress.
A part of quality assurance is to establish a baseline against which future system changes can be tested. These changes take the following forms:
IBM BPM product upgrades
Server modifications
Network changes
Application upgrades
Application server upgrades
Database upgrades
System load dynamics changing over time
It also is important to provide visibility into system scalability to allow for budget planning that is based on anticipated growth in system load (see Figure 2-1 on page 11).
 
Figure 2-1 Application lifecycle: From design to production
2.2.1 Versioning and continuing testing
New versions of the application can be developed to introduce new capabilities or to address issues, such as simple bugs or security improvements. As the owner of operations, you must create policies about application versioning that are based on business requirements, such as deciding whether in-progress process instances must be finished on the current snapshot version or be migrated to the new version. More testing also can be considered when new versions of the application are delivered. The versions can progress through the SIT and UAT testing environments before going into the production environment (see Figure 2-2).
Figure 2-2 Application versioning and testing
The process of testing and versioning should be iterated over the lifespan of the application. Figure 2-2 also shows one iteration of an application or system update or upgrade that involves comprehensive testing before the update is deployed into production.
2.2.2 Reviewing and planning
The production system capacity must be planned based on performance testing results. Over time, you must work with business owners to anticipate business growth. For example, if the business anticipates an annual growth rate of 10% and the number of concurrent users is 500, you should expect to support 550 concurrent users within one year. If the current production system is expected to support future workloads, it must be provisioned based on the projected growth rate.
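The compound effect of an annual growth rate is easy to underestimate when planning more than one year ahead. The following sketch shows the calculation (the function name and its parameters are illustrative, not from any IBM tooling):

```python
def projected_users(current_users, annual_growth_rate, years):
    """Project the concurrent-user count under compound annual growth."""
    return current_users * (1 + annual_growth_rate) ** years

# 500 concurrent users growing at 10% per year:
print(projected_users(500, 0.10, 1))  # about 550 after one year
print(projected_users(500, 0.10, 3))  # about 665 after three years
```

Provisioning that is based on the one-year figure alone undershoots a three-year planning horizon by more than 100 users in this example.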
The operations team should review business needs and system capacity regularly (quarterly or semi-annually). Business requirements change over time. Estimates that are based on previous reviews must be adjusted based on the current state of the business. Future growth must be reevaluated so that production system capacity can be planned accordingly.
2.3 Types of tests and their usage
Many terms are used to describe the various types of tests, and some are used incorrectly. For example, the term load test often is used incorrectly to describe other types of performance testing. This section defines the terminology that is associated with testing and describes the tests themselves.
2.3.1 IBM BPM application tests
The following tests are frequently performed for IBM BPM applications:
Performance test
A performance test attempts to push the system to its limit to determine the maximum load that can be achieved. The result of a performance test is used for capacity planning of a specific environment. For example, if the maximum load in the test is 50 transactions per second, the capacity of the system cannot go beyond that tested throughput.
Load test
A load test is similar to a performance test, except that it might not reach the maximum capacity. A load test ensures that a specific volume of work (for example, a predefined throughput rate or number of concurrent users) can be sustained on a specified topology while meeting a service level agreement.
Endurance test
An endurance test subjects a system to a significant load over a long period to determine how the system behaves under sustained use. Typically, an endurance test runs for 8 hours or more. The focus of an endurance test is to determine whether flaws appear in the system only gradually over time. For example, a slow memory leak can develop into a serious out-of-memory situation over 48 hours of testing.
Soak test
A soak test is another name for an endurance test.
Stress test
A stress test involves testing a system with heavy load or high user count. The purpose of this test is to discover how a system behaves under heavier than normal usage. For example, a tax-related system must handle high load just before the tax filing due date, which is April 15 in the United States. A retailer must prepare for the rush of customers and much higher volume during seasonal shopping.
Failover test
A failover test examines how the system behaves when the system, or a part of it, fails for any reason. The purpose of the failover test is to identify issues that are associated with system failure and to see how the system recovers from normal or abnormal failures. In a highly available topology, failures in one cluster member should result in the load being routed to the remaining nodes. A single-node failure should not bring down the entire IBM BPM cell. System failures can be simulated through random or specific types of failures.
Unit test
A unit test examines the functionality of individual components, such as coaches, coach views, human tasks, and system activities. Unit tests often are performed by the developer to ensure that a function or method does what it is designed to do. For Business Process Model and Notation (BPMN) processes, unit tests can be performed in the process center during the initial development phase. For better testing and to avoid negatively affecting the process center’s performance and stability, testing should be run on a process server that is dedicated for testing.
User Acceptance Test
User acceptance testing (UAT) is one of the final phases of IBM BPM solution testing. During UAT, line-of-business users test the IBM BPM processes and services. This type of testing ensures that required tasks in real-world scenarios function according to specifications. UAT is often considered beta testing of the system.
Integration test
The integration test focuses on the entire process, including external systems, and involves the process portal or external user interfaces. Integration testing scenarios must cover all possible paths in the process and consider timers, events, gateways, and so on.
Instance migration test
Before a snapshot is deployed into the production environment, an instance migration test is performed. This test ensures that instances of the current snapshot can be migrated to the new snapshot version. It is important that the instance migration test covers problematic cases to ensure that these cases are handled properly. Use the wsadmin BPMCheckOrphanTokens command to detect the possibility of orphaned tokens before a new snapshot is deployed and to identify whether to delete or move each token.
Each process application can use different migration patterns. The migration can drain, which allows in-flight instances to finish on the current snapshot version. An alternative approach is to migrate all process instances to the newly deployed snapshot.
Browser and mobile test
Browser testing exercises coaches on the browsers that users will use in the production environment. For mobile users, the same testing is applied on mobile devices, such as smart phones and tablets.
2.3.2 Functional and non-functional tests
Functional testing is a quality assurance process that checks whether the process application works as designed. Functional tests use the functional requirements as the exit criteria. For IBM BPM applications, functional tests typically include unit tests, integration tests, user acceptance tests, instance migration tests, globalization tests, and mobile and browser tests.
Non-functional testing tests a process application against its non-functional requirements, such as performance, scalability, security, and failover. Non-functional tests determine how a system operates, rather than specific behaviors of the system. For IBM BPM applications, non-functional tests typically include performance testing, stress testing, endurance testing, and security testing.
2.3.3 Performance and stress tests
Many people use the terms performance test and stress test interchangeably as though they are the same. However, performance and stress tests are different.
Performance testing focuses on testing the most commonly executed paths. The test uses metrics such as the throughput of processes and tasks completed per unit of time, and the number of concurrent users. Performance tests end when the service level agreement (SLA) is violated.
Stress testing focuses on whether the system survives under heavy load. In real-world scenarios, a system can be under a stressful workload for a short period, often because of bursts of incoming requests, such as batch jobs. If the system survives the burst of activity without falling over, it can recover and continue to work when the volume of workload returns to normal. During stress testing, performance metrics often are not tracked as long as the system behaves normally. The exit criterion of stress testing is that the system stays alive.
2.3.4 Black-box and white-box testing
Black-box testing is a testing methodology that examines the system without looking into specific internal components. Black-box testing does not require knowledge of how the system works, and it can be applied to any of the other test types. Test cases for black-box testing are built around requirements and specifications.
White-box testing scenarios are designed with knowledge of how internal components work. White-box testing can be used in situations where black-box testing is insufficient or not feasible. For example, failover testing is difficult to reproduce in a normal testing environment because it can take too long for failures to occur. For failover testing, failure points can be triggered at specific points in the process applications. For instance, network connections to the backend server are pulled for a period to simulate network failures. White-box testing also can be applied to unit testing, integration, and user acceptance testing.
2.4 Performance testing methodology
Performance testing must be planned and executed end-to-end. As shown in Figure 2-3 on page 15, performance objectives should be defined at the beginning of the project, based on the non-functional requirements. It is important to clearly define exit criteria for testing to determine whether the IBM BPM application meets the specified non-functional requirements.
Figure 2-3 Performance Testing Methodology
Many testing scenarios can be defined to evaluate an IBM BPM solution's performance. For most applications, a few key scenarios are sufficient to represent typical application usage by business users.
With scenarios defined, the next step is to develop and debug testing scripts by using your preferred testing tool. Testing tool options include HP Enterprise's LoadRunner, IBM Rational® Performance Tester (RPT), and Apache JMeter. LoadRunner and RPT are licensed application load testing tools, and JMeter is a no-cost open source tool that is distributed under the Apache License.
Before testing is performed, a performance testing environment must be set up and testing data must be prepared in the environment. The performance testing environment should be provisioned with the same hardware and software as the intended production environment. A common mistake is to conduct performance testing on an inadequately provisioned environment (which might be shared with other testing activities).
Prepare the test environment with realistic data, including user and group definitions and their entitlements. Populate the database with active and completed process instances and tasks. Performance results can vary widely, depending on the data that is used in the test.
Ensure that the application code is well-debugged before serious performance testing is run. Buggy application code generates exceptions and errors. Performance testing of buggy code is equivalent to testing how fast errors are generated by the system. It is not a productive use of the testing resources.
Performance testing and tuning is an iterative process. The data-driven performance tuning process starts with measurement. Traces and logs are gathered and analyzed during and after the test. System resources, such as CPU and memory utilization, should be monitored and analyzed. A typical IBM BPM configuration is a multi-tiered environment that consists of the following tiers:
Browser
Web server, such as IBM HTTP Server
Process Server, including Application Target
Message Engine and support cluster members
Database server
Other back end servers
During performance testing, you must monitor and analyze all of these tiers. Bottlenecks in any of these tiers can cause system bottlenecks.
Ideally, tuning changes are made one at a time. Test, analyze, tune a parameter setting, and test again. With too many changes, it can be difficult to determine whether tuning changes help or hurt performance.
During performance testing and tuning, always address the biggest performance bottleneck first. The performance benefits of other tuning changes might not be clearly visible until the bigger bottleneck is fixed.
2.4.1 Defining objectives
Non-functional requirements are agreements between the business stakeholders and the IT department that provides the IBM BPM solutions. It is important to engage with business users, business analysts, and executive sponsors. This communication clearly defines the non-functional requirements on which the performance targets are based.
User-related targets include the following examples:
Up to 5,000 business users can access the system.
On average, there are 1,000 concurrent users who are using the system at any time (10% of whom work on human task A, 10% of whom work on human task B, and so on).
From 9 AM - 9:30 AM every day, all 5,000 users log in to the system.
The response times of task list searches should be less than five seconds.
On average, each business user works on and completes five tasks per hour.
Business users often take three minutes to complete each activity.
In addition to user-related non-functional requirements, the following requirements describe how many business processes and tasks are generated:
There are 10,000 process instances created every day (the peak hour is 1 PM). During this peak hour, 40% of the process instances are generated.
Each process contains 10 tasks on average. Details about these tasks also are specified.
Each process takes an average of five days to complete. The completed instances and tasks are kept in the system for two weeks.
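The peak-hour arrival rate follows directly from these figures. The sketch below derives it; the 8-hour working day that is used for the average rate is an assumption for comparison, not part of the stated requirements:

```python
daily_instances = 10_000      # process instances created per day
peak_hour_share = 0.40        # 40% of instances arrive during the 1 PM peak hour

peak_rate_per_hour = daily_instances * peak_hour_share   # 4,000 instances/hour
peak_rate_per_second = peak_rate_per_hour / 3600         # about 1.1 per second

# Assuming, for comparison, that the full daily volume arrives over an 8-hour day:
avg_rate_per_hour = daily_instances / 8                  # 1,250 instances/hour
print(peak_rate_per_hour, round(peak_rate_per_second, 2), avg_rate_per_hour)
```

The peak hour runs at more than three times the daily average rate, so a load test that is sized only to the average understates the 1 PM burst.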
Performance targets can be defined based on non-functional requirements. The performance targets should be defined in the following areas:
User- and group-related targets in terms of concurrent and accessible user counts.
Throughput-related targets, such as how many process instances and tasks are to be created and completed per unit of time.
Response time targets as service level agreements.
In addition to business stakeholders, the previous production system is another important source of non-functional requirements. If the IBM BPM solution is designed to replace an existing system, you can obtain throughput and usage data from that system. You can use that data to produce concrete performance targets or to validate requirements from the business owners.
Testing scenarios
It is important to understand how business users conduct their daily business by using the capabilities that are provided by the IBM BPM application. Their usage can be broken into a few complex end-to-end scenarios or into many smaller individual scenarios. Either way, the scenarios must be clearly defined so that results can be measured against the previously defined performance objectives.
For example, the following scenarios can be defined for an account opening application at a bank:
Users log in to the system.
Bank associates go through steps to collect customer information, enter that information into the system, and then continue the rest of account opening business process.
The system validates customer information and checks for potential fraud; if an anomaly is detected, the request is routed to human users to perform more checks.
The branch manager approves account opening requests.
It takes resources and effort to turn scenarios into test scripts and make them run as performance workloads. For practical reasons, create no more than a dozen scenarios to cover the most common application usage patterns. Avoid creating too many fine-grained scenarios.
2.4.2 Develop testing scripts
Based on defined scenarios, testing scripts must be developed. That process and the tools that are used are described in this section.
Testing tools
To run performance testing, we need tools to generate load to drive the application to do the work it is designed to perform. We must simulate activities from human users or from external services (such as REST calls or UCA messages). The following tools are used for load generation:
HP LoadRunner
IBM Rational Performance Tester
Apache JMeter (open source tool)
Testing script development
You record scripts by stepping through the application as defined in the testing scenarios. Each tool has its own mechanism for script recording. The script programming languages also differ from one tool to another. Script development is a non-trivial task, so it is important to understand the load testing skills of the team that develops the scripts.
After the initial scripts are recorded, you must develop them into testing scripts that implement the scenarios as defined in previous steps. Development includes the following typical activities:
Organize HTTP activities into transactions and measure each transaction's response time. For example, task list searches and task claims should be grouped into separate transactions and be measured separately.
Static content, such as images and HTML code, can be removed or moved outside of measured transactions. After first use, static content should be cached in the client browser so that it does not affect runtime performance.
Different users should be added to the script to simulate multi-user testing. Different user IDs should be used in the testing. Users should be organized into groups, if applicable.
Input data should be added to the script to simulate different usage patterns.
Think times and pacing times should be carefully set and added to the script to pace simulated user’s activities.
IBM BPM's notification feature is implemented as CometD long polls, which time out every 30 seconds. When the script is recorded, you can see many such long-poll HTTP calls because the recording can take time to complete. During script editing, these long polls should be reduced or removed (this setting is configurable). For more information and product documentation, see the following website:
Finally, developing testing scripts is an iterative process that requires close cooperation between the test script developer and application code owner. Scripts must be debugged and validated to ensure that they are correctly implemented.
2.4.3 Think and pacing times
At an abstract level, an IBM BPM user often performs the tasks that are shown in Figure 2-4. A user queries for a list of available tasks to work on, chooses one of them to claim, works on the task, and then completes the task. The user then repeats these activities.
Figure 2-4 Simulated user’s activities
Think time is the time that the real user waits between the actions. For example, a user can take 30 seconds between the task list being displayed in the process portal to the time they choose a task on which to work. There also are more fine grained think times between user interface steps that are specific to the application that is tested.
Think times can be expressed as constant times, simple randomized times, or more sophisticated timers. A think time of 5 seconds means that the load generator injects a 5-second sleep time between the steps. With randomized think times, you specify the minimum and the maximum sleep times, and the mean converges to the midpoint (if the test runs long enough). For example, a randomized think time of 3 - 7 seconds still means 5 seconds of think time on average over the test run. Randomized think times reflect user behavior more realistically than constant think times.
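A minimal sketch of a randomized think time, using Python's standard library (load testing tools provide equivalent timers):

```python
import random

def think_time(min_s=3.0, max_s=7.0):
    """Uniformly distributed think time; the long-run mean is (min_s + max_s) / 2."""
    return random.uniform(min_s, max_s)

random.seed(42)  # fixed seed so the demonstration is repeatable
samples = [think_time() for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(round(mean, 2))  # converges toward 5.0 seconds
```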
Pacing time is the total time between iterations. For example, if a user is expected to finish five tasks per hour, it means that the user takes an average of 20 minutes to complete the round trip of querying, claiming, and completing a task.
JMeter
JMeter does not support pacing times out of the box. However, you can use cumulative think times to implement pacing times in JMeter.
JMeter's timers have a scope, and they are evaluated before each sampler in that scope. If there is more than one timer in a scope, all timers are evaluated and the cumulative value of the timers within the same scope is used. For example, suppose a scope contains sampler A with a 10-second timer, followed by sampler B with a 30-second timer. JMeter evaluates both timers, so a cumulative delay of 40 seconds is applied before both sampler A and sampler B. To avoid confusion, avoid having multiple timers within one scope.
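Because JMeter has no built-in pacing timer, a common workaround is to compute the residual delay at the end of each iteration so that the iteration's total duration hits the pacing target. A language-neutral sketch of that calculation (the function name is illustrative):

```python
def pacing_delay(target_pacing_s, elapsed_work_s):
    """Delay needed so that one iteration (work plus delay) lasts target_pacing_s.

    If the scripted work already overran the pacing target, do not sleep at all.
    """
    return max(0.0, target_pacing_s - elapsed_work_s)

# A 120-second pacing target where the scripted steps took 14 seconds:
print(pacing_delay(120, 14))   # 106 seconds of additional delay
print(pacing_delay(120, 150))  # 0.0 -- the iteration overran the target
```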
Pacing time and throughput time
Pacing times and think times determine how fast simulated users work on the processes and tasks. A few virtual users with no or low think times can put a heavy load on the test system. Therefore, it is important to set think times and pacing times deliberately.
Real human users can take relatively long times to process tasks that are complex, and the process instance can take days, weeks, or even months to complete. For example, a home mortgage application process typically takes several weeks to go through all the steps from the initial application until the closing of the mortgage loan.
For performance testing, it is not realistic to run tests longer than a few hours, at most. It is a usual practice to shorten think times and pacing times to make virtual users work much faster than real users. The number of concurrent users in performance testing is then equivalent to a higher number of concurrent real human users. When reporting testing results in terms of users, it is important to address the following areas:
Distinguish between virtual users as compared to real human users
Explain the relationship between these two metrics based on think time and pacing times
For example, the pacing time of a virtual user’s round-trip time is 120 seconds between tasks, and a real human user takes about 20 minutes to finish the job. This means that a virtual user works at 10 times the rate of real human users. In this case, a test with 500 virtual users is equivalent to 5,000 real human users.
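The conversion between virtual and real users can be stated as a simple ratio of pacing times. A sketch of the calculation (the function name is illustrative):

```python
def real_user_equivalent(virtual_users, real_pacing_s, virtual_pacing_s):
    """Real-user load represented by virtual users that iterate at a faster pace."""
    speedup = real_pacing_s / virtual_pacing_s
    return virtual_users * speedup

# Real users finish a task every 20 minutes; virtual users every 120 seconds:
print(real_user_equivalent(500, 20 * 60, 120))  # 5000.0 real-user equivalents
```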
2.4.4 Testing data considerations
Before testing is run, the IBM BPM system must be pre-populated with users, groups, and process instances. The data that is used in the test should reflect a realistic production environment. Several considerations regarding testing data drive performance testing.
Simulated users
Consider the following points when deciding to use simulated users:
Load testing tools, such as LoadRunner and RPT, are licensed based on the number of virtual users, which often is a limiting factor. Use as many users as possible to simulate real application usage scenarios, especially for test cases, such as logins.
Use distinct user names, IDs, credentials, entitlements, and so on. Avoid the use of a single user ID (or only a few) for all simulated virtual users. Reusing the same user rather than unique users might not give accurate performance results.
Organize users into groups or departments of correct sizes.
Ensure realistic think times are used between requests, as described in 2.4.3, “Think and pacing times” on page 18.
Input data
The following considerations are related to the input data that is used in tests:
Use realistic input data in terms of size and data complexity. For example, if your application contains industry standard schemas (such as HIPAA schemas), use them in your test data.
Use various numbers of fields and nesting levels to cover a range of input data.
Use various data sizes.
Active and completed instances
Pre-populate the system with a mix of active and completed processes, tasks, and instances. In typical real production environments, it is common for the system to contain work-in-progress business processes. For example, a loan application takes weeks to complete. At any specific time, we should see loan applications at different stages of processing. Consider the following examples when creating your processes for testing:
Assign tasks to groups and users. Having hundreds or even thousands of users in a single group usually is not realistic. Without proper group structure, the task list can become unrealistically deep, which affects the task search times.
Start with the same data for each test. This task can be achieved by cleaning up and restoring the test system after each test. Another approach is to always start with an empty system and pre-populate it with the same set of data before each test. If data pre-population takes too much time, the second approach might not be ideal.
Back-end services
Typically, simulated mock services are used in performance and other non-functional testing. Consider the following criteria:
Ensure that mock services return realistic response data.
Ensure that the response data is of complexity and size that reflects real production data.
Pay attention to the response times of the simulated back-end services (they should be within the SLA for production).
2.4.5 Drain down and steady state testing
There are two popular ways of conducting performance testing: One is to measure the time that it takes to process a set of process instances or other inbound requests. The other is to maintain a steady state over the entire testing period. There are advantages and disadvantages to both approaches.
Drain down testing
Drain down testing is easier to set up than steady state performance testing. For drain down testing, you populate the input queue or the database with many inbound requests. You then run the test and let the system work through the requests until all of them are processed. For example, you create 10,000 process instances and then test how long it takes for the system to process and complete all of those instances.
The use of this approach for testing includes the following drawbacks:
System behavior can be different, depending on the state of the processes.
Be careful about choosing the period to measure performance. For example, most process instance creations occur at the beginning of the test.
The system can gradually idle at the end of the test run. Much work is completed at the beginning of the test. As time increases, the number of work items becomes smaller, which can make it difficult to calculate proper averages. Figure 2-5 shows an example of drain down testing.
Figure 2-5 Drain down testing
It is difficult to plan for large-scale tests because all of the process instances are created at the beginning of the test.
Steady state testing
The steady state testing method maintains a consistent flow of work into the system. During the test, approximately the same number of active process instances and tasks exist as at the beginning of the test. You can think of steady state testing as a manufacturer's assembly line. New orders come in at one end, and the assembled products exit at the other end. At any time, there should be the same number of work-in-progress products on the assembly line, and workers at each station focus on their specific tasks.
Steady state testing requires careful planning. The test driver should control the flow of work into the system by using the following criteria:
First, pre-populate the system with a mix of process instances at different states.
Partition users and groups so they complete processes at about the same rate.
Create process instances and tasks at the same rate as existing instances are completed. The same amount of work should be in the task queues or other internal database tables throughout the test.
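The creation side of the last point can be sketched as an evenly spaced arrival schedule that the test driver follows (an illustrative sketch; real load testing tools express this through arrival-rate or pacing settings):

```python
def steady_arrival_times(rate_per_second, duration_s, start_s=0.0):
    """Evenly spaced instance-creation times that hold a constant arrival rate."""
    interval = 1.0 / rate_per_second
    count = int(duration_s * rate_per_second)
    return [start_s + i * interval for i in range(count)]

# Create 2 process instances per second for 10 seconds:
schedule = steady_arrival_times(2.0, 10.0)
print(len(schedule), schedule[:4])  # 20 creation times: [0.0, 0.5, 1.0, 1.5]
```

If the completion rate matches this arrival rate, the number of in-flight instances stays level for the duration of the test.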
Steady state testing often is more realistic than the drain down approach. It is rare for businesses to start from scratch with all work that is created at the beginning. Running steady state performance testing includes the following key advantages:
It is easy to measure the throughput of the system. Calculate the average instances and tasks that are completed over a period (such as five minutes of steady state run) and you determine the throughput of the system.
Tests can be run for a short or long period and the results should show consistent throughput.
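Measuring throughput from a steady-state window is then a simple division (a sketch; the names are illustrative):

```python
def throughput_per_second(completed_in_window, window_s):
    """Average completions per second over a steady-state measurement window."""
    return completed_in_window / window_s

# 1,500 tasks completed during a 5-minute steady-state window:
print(throughput_per_second(1500, 5 * 60))  # 5.0 tasks per second
```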
Steady state testing example
The Human Services Benchmark sample can be found in IBM Bluemix® DevOps Services, which is available at this website:
Documents and sample artifacts for IBM BPM performance testing also are available at this website. You can use the information and adapt or augment it with your own applications and services to fit your specific requirements.
The sample is delivered for IBM BPM V8.5.7 and uses IBM Rational Performance Tester. However, the methodology for setting up steady state testing can be applied to other releases and with different load testing tools.
2.4.6 Performance metrics
There are several performance metrics for characterizing the performance of IBM BPM, as described in this section.
Response times
Response times measure how long users wait for their activities, such as refreshing task lists, completing tasks, and selecting the submit buttons. Response times include time that is spent in the browser, in the network for sending requests and receiving responses, and on the server handling the request. In some cases, response times can also include time that is spent in processing the back-end services and database operations.
Response time targets often are part of service level agreements (SLAs). Business users typically expect response times of no more than 3 seconds.
Response times can be measured as averages or percentiles (typically the 90th or 95th percentile); a measurement plan can include both.
If the measured average response time is significantly higher than the corresponding 90th percentile, it can mean that large outliers are lifting the average. This result can be a cause for further investigation.
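This sanity check can be sketched with Python's statistics module; the 1.5x threshold and the sample data are illustrative assumptions, not IBM guidelines:

```python
import statistics

def outlier_check(response_times_ms, factor=1.5):
    """Flag a run whose mean response time is pulled well above the
    90th percentile by a few large outliers."""
    mean = statistics.fmean(response_times_ms)
    # quantiles(n=10) returns the nine decile cut points;
    # index 8 is the 90th percentile.
    p90 = statistics.quantiles(response_times_ms, n=10)[8]
    return mean, p90, mean > factor * p90

# 95 fast responses plus five 10-minute outliers lift the mean
# far above the 90th percentile.
samples = [200] * 95 + [600_000] * 5
mean, p90, suspicious = outlier_check(samples)
print(mean, p90, suspicious)  # 30190.0 200.0 True
```

A flagged run is a prompt to investigate the outliers (for example, back-end service timeouts or garbage collection pauses), not proof of a defect.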
Throughput
Throughput defines how many processes, tasks, or other units of work are completed per unit of time (such as per second, per hour, or per day).
Consider peaks and averages when throughputs are defined. For example, the rate at which tasks are created can peak at 10 AM and 1 PM each day. It is important to test that the system can handle the peak throughput rate, which can be significantly higher than the average throughput rate over the entire work day.
Business users do not always see throughput as an important performance metric (partly because it is not readily visible to many of them). You must identify and connect to the correct business process owner in your organization to obtain the throughput information.
Concurrent users
The number of concurrent users is a metric that readily resonates with business process owners and the executive sponsor. However, it also is an often misrepresented metric, so it is important to report it correctly. The number of simulated virtual users might not correspond one-to-one to real human users. As described in 2.4.3, “Think and pacing times” on page 18, a simulated user can exert a heavier load on the system than a real human user. Take both think times and pacing times into consideration when calculating supported concurrent users as a performance metric.
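One hedged way to relate virtual users to real users is to compare their request rates, which are governed by the think and pacing times. The timings below are illustrative assumptions, not measured values:

```python
def requests_per_hour(avg_request_seconds, think_seconds, pacing_seconds):
    """Requests one user issues per hour, given the average time the
    system spends on a request plus the user's think and pacing delays."""
    cycle_seconds = avg_request_seconds + think_seconds + pacing_seconds
    return 3600 / cycle_seconds

# A real user thinks 30 s per step and paces 60 s between iterations;
# a virtual user with zero think and pacing time hammers the server.
real = requests_per_hour(2, 30, 60)    # ~39 requests per hour
virtual = requests_per_hour(2, 0, 0)   # 1800 requests per hour
load_factor = virtual / real
print(round(load_factor))  # one such virtual user loads the system
                           # like about 46 real users
```

Reporting "100 concurrent users" without stating the think and pacing times therefore says little about the actual load that was applied.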
Resource utilizations
Performance metrics that track the resource utilizations on the IBM BPM and DB servers also are important for performance testing. The following typical metrics are captured by monitoring tools:
CPU utilization on both IBM BPM and database servers
Memory utilization
I/O and network utilization
 
2.5 Tuning the system and caches
Optimal tuning is an iterative exercise. It is also a collaboration between the Application Development and Operations teams. Tuning continues during application performance testing and after the application is in production. Changes in how the production system is used can require retuning, whether scaling up or scaling down.
The following common usage patterns can require a change in tuning parameters:
New applications
Increase in number of concurrent users
Increase in daily or weekly task and instance generation
Increase in the size of data that is sent to IBM BPM (for example, more document attachments or larger JMS messages to be processed)
Total number of tasks and instances in the system
Total number of applications and IBM BPM assets in the applications
For more information, see IBM BPM Performance Tuning, SG24-8216-00. The relevant sections are section 2.5 Large Business Objects and Chapter 4: Performance Tuning and configuration. The publication is available at the following website:
The application cache settings and their definitions are documented for the latest release. Review these settings during load testing scenarios. For more information about the configuration options, see the following documentation:
Some performance metrics improve after data is deleted. With less data in the system, database queries can perform better. For more information, see Chapter 5, “Maintaining IBM BPM-dependent systems” on page 69.
2.6 Application versioning
IBM BPM applications are written with two authoring clients. BPMN applications (IBM BPM Standard) are written with the IBM Process Designer; BPEL applications are written in the Business Process Execution Language (BPEL) with the IBM Integration Designer. The applications contain all code, logic, and assets for your business process. This section describes considerations that the Operations team uses to ensure quality delivery of applications and business data.
For more information about deploying applications, see IBM BPM Adoption: From Project to Program with IBM Business Process Manager, SG24-7973, which is available at this website:
2.6.1 Naming conventions and exports
IBM BPM features two application types: BPEL and BPMN. The IBM Integration Designer creates the BPEL applications; the exports are project interchange (PI) files. BPMN applications are written with the Process Designer; the exports are process export (TWX) files.
BPMN applications that are written for IBM BPM Standard are stored in the Process Center. It is a best practice to use a naming convention for the snapshots and exports. For more information, see the application snapshot naming convention best practice documentation that is available at the following websites:
Naming conventions help reduce collisions when snapshots are installed and allow for easier maintenance and troubleshooting when issues occur. For example, if the export features a version and date in the name, it assists with identifying instances and the snapshot on which they were created.
2.6.2 Toolkit dependencies
Toolkits are a way to have common services and assets that are shared by several applications. The sizing of the environment depends on the total number of applications and dependent toolkits that are needed during run time. The cache setting <branch-context-max-cache-size> controls the number of snapshots in memory. It is important that applications use the latest toolkits to reduce the number of toolkit versions in production.
For example, suppose that a toolkit named Common Utilities contains logging and other services that are shared by the applications. There are 25 applications in production. These applications rely on the toolkit, and four versions of Common Utilities are active in production (v1.1, 1.4, 2.0, 2.2). Because this toolkit is used by all applications, four copies of Common Utilities must be kept in memory. This amount can be reduced to one copy if all applications used the latest version (v2.2).
This example has only one toolkit; however, it is common for an application to have several toolkits and for those toolkits to depend on other toolkits. The dependency chain can be long. To reduce the number of active versions of toolkits, the Operations team can work with the Applications team to help move applications to the latest toolkit versions.
For more information about guidelines for toolkit best practices, see Section 5.5 of Business Process Management Design Guide: Using IBM Business Process Manager, SG24-8282, which is available at this website:
2.6.3 Dependent assets
When IBM BPM applications are written, it is common to use additional libraries. These libraries can come from third-party sources or from an approved library of code within the company. These files are added to the IBM BPM application and are contained within the application and thus within the export of the application. It is recommended that the added assets are versioned and that the versions included in each IBM BPM application are known.
The following assets are included in an IBM BPM application:
CSS files
JavaScript libraries
Graphic images
Compiled Java code
XML and XSLT files
A list of the third-party dependencies is important for the Operations team. It can be used to get notifications of updates, such as newly found security vulnerabilities or critical bug fixes. Collaboration between the Applications and Operations teams extends further. For example, a common library within the company might need to conform to new corporate standards, and all dependent assets must be updated by a certain date.
The Process Center is a development environment and repository for your applications. As application milestones are reached, it is recommended that exports of the application (PI or TWX) be saved in the company’s official code repository. This export is another backup and an official archive of assets.
2.7 Instance migration and application deployment
When an application is ready to deploy to a target runtime environment, there are some considerations for active instances and various recommendations for successful application deployments.
2.7.1 General checklist
When there is an application to deploy, the Operations team might be responsible for deploying it. This responsibility depends on your company’s separation of duties and on the environment into which the application is deployed. For example, the production system might be firewalled; therefore, only select individuals follow a strict process to ensure quality and auditing.
It is a reasonable expectation for the Operations team to have a checklist to review with the Application Development Team. The Operations team is responsible for maintaining the system and ensuring the highest operational run time. A best practice is to have a requirements checklist before approving the deployment of an application to a runtime environment.
Table 2-1 lists the example application deployment requirements as a guideline for consistent application deployments. These requirements are checked and verified before code is deployed into the target environment.
Table 2-1 Example application deployment requirements

Activity                                        Test   Staging   Production
Naming convention followed for snapshot names    X       X          X
Unit testing of application                      X       X          X
User functional                                          X          X
Load testing: stress                                                X
Load testing: endurance                                             X
Team role bindings                               X       X          X
Clean up instances before deployment                     X          X
Instance migration policy                        X       X          X
2.7.2 Offline and online deployments
Depending on the topology of the system and security requirements, deploying an application can be online or offline. The runtime process servers are visible to the Process Center when online. In this configuration, applications can be installed with the web-based Process Center console.
Offline means that the runtime system is not directly connected to the Process Center. Deployment to an offline server is done by command line access and requires the export file to be physically moved to the target system. For more information about deploying applications, see the following resources:
Closed production with Process Center
For the production system, some customers have a dedicated Process Center from which to deploy code. This environment is isolated and is effectively an offline server with respect to the primary development Process Center. The production Process Center has an online connection to the production runtime system and can deploy applications in the online fashion. One advantage of this scenario is that the Process Designer gives a graphical view of tokens on active processes. The need for this option has lessened because, as of IBM BPM 8.5, troubleshooting issues in production can be done with the BPM REST UI helper. You can access the console by using the following URL:
For more information about troubleshooting tips, see Chapter 6, “Problem determination and remediation” on page 87.
2.7.3 Closed and in-flight instances
When deploying an application to a runtime server, consider what happens to the instances on the server. There are three options for addressing instances on the target system that belong to earlier snapshots: leave, migrate, and delete. The delete option is not available in systems that are configured as production. Discuss with the application development team which option is needed for the application version you are installing.
Some applications have small changes and migrating instances is appropriate. Some applications have larger modifications and need multiple running versions of the application. Deleting instances is a common request for a test environment. For load test scenarios that test the same application again but with a new version, it is advised to purge the instance data from the previous test. For information about data clean up, see Chapter 5, “Maintaining IBM BPM-dependent systems” on page 69.
Avoiding orphaned tokens and tasks
For cases where the next application (snapshot) is to migrate instances, ask the Application Team if a migration policy file (BPMN) is needed. If long-running instances are used or the application changed in a significant way, a migration policy file might be needed.
For IBM BPM, the migration policy file is necessary to move tokens to prevent instance failures. For example, if an activity is divided into two new activities, a policy file is needed to move tokens to the appropriate activity. Without the policy file, tokens can become orphaned, and instances fail and require manual intervention. For more information, see Chapter 6, “Problem determination and remediation” on page 87. Recovery options include moving the token, changing the instance data, or rolling back the snapshot installation.
For more information about commands to help prevent orphaned tokens and tasks, see the following resources:
BPMN Application:
BPEL Application:
 
Note: Back up the production databases before new applications are installed to production. This step is important for your recovery planning. With a backup, the IBM BPM database can be rolled back to the state before the snapshot installation. The rollback option can be used in the following cases to restore production functionality:
If during the migration of instances, a significant portion of instances were orphaned in an unrecoverable state because an application change was not considered during testing.
If business data was accidentally deleted when unclosed instances were ended and deleted before the new snapshot went into production.
2.7.4 Multiple active versions of snapshots
There are times in which two or more versions of an application can be active in Production. Discuss with the development team how each of the versions handles new and existing instances. One option is to have existing instances on the old snapshot complete, and allow only the creation of instances on the new snapshot. For more information about these settings, see the following website:
It is reasonable for the Operations team to require the Development team to perform the following tasks:
Test various scenarios with data
Test the migration of instance data
Create orphan token policy files for changes to applications
Running tests before the new version is in production reduces the risks of having failed instances that need an emergency manual recovery.
Migrating product versions with active instances
When migrating to the next major version of IBM BPM, in-flight instances must be considered. The two options are to migrate the in-flight instances or to perform a drain and fill approach. The drain and fill approach runs two versions of IBM BPM in parallel production environments: new instances are created on the current version of IBM BPM, and the older instances complete on the previous version.
One configuration option with drain and fill is to have two different portals for interaction with users. A second option is to use the federated process server, which allows for a unified process portal for users. The interaction between the new and the old system is then transparent to users.
For more information about the federated process server setup and installation, see this website:
In some cases, the drain and fill approach is not appropriate for your application. Some considerations are the length of running instances and how IBM BPM interacts with external systems of record. For example, if the external system of record uses some of IBM BPM’s primary keys (for example, task ID and instance ID), this approach cannot be used. Work with the Application Development team to develop a strategy when migrating to the next version of IBM BPM.
2.8 Capacity planning
Any sizing and capacity planning methodology is an estimate that is based on input variables. It is not exact and it should be validated against the actual variables in your system. There are a few guidelines that are described in this section that are based on the authors’ experience in working with many customer situations.
Capacity planning should be data-driven and based on performance testing of the actual application to be deployed in the production. IBM BPM is a middleware product. The performance of the application and capacity of the system depends on the application’s implementation, choices of system capabilities that are used, and the type of data that is processed. No estimation is better than actual measurement.
With proper performance testing, capacity planning is relatively straightforward. Complete the following steps to guide performance testing:
1. Provision the performance testing environment with a hardware configuration that is similar to the production environment.
2. Run performance tests and gradually increase the load until SLAs are violated.
3. Capture CPU, memory, and other resource utilizations.
4. Use the testing results to project the required hardware configurations for the IBM BPM server and the DB servers.
Consider the following points when extrapolating results:
Use conservative scaling factors for adding cluster nodes. Horizontal scaling typically starts to decline rapidly when the cell contains too many nodes; for example, six nodes. If you have a large IBM BPM cell, run your application-specific performance testing to ensure that the cell scales well.
It becomes risky to extrapolate beyond twice what was measured. Many parameters can affect an IBM BPM solution’s scalability.
All tiers should be considered, including the web tier, database, and backend service providers. IBM BPM is a multi-tiered environment. Any of these tiers can become a bottleneck.
Consider the capacity of the network and I/O. For example, the storage subsystem of the database server can become a bottleneck after its I/O capacity is exhausted. The network between the IBM BPM server and the database server can often reach maximum capacity at a high transaction rate.
Memory considerations
JVM heap sizes should be tuned during performance testing. For the physical memory sizing, we use the following standard formula (rounded up to the next 4 GB boundary). It is the minimum physical memory that is needed for the server:
2GB (OS) + for each JVM: (Maximum Heap + 1 GB)
For example, if the JVM settings for the AppTarget, Support, and Message Engine cluster members are 4 GB, 2 GB, and 2 GB (respectively), the total physical memory of the IBM BPM server should be at least 2 + (4 + 1) + (2 + 1) + (2 + 1) = 13 GB, rounded up to 16 GB.
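The sizing formula can be sketched as a small helper; this is an illustrative calculation (heap sizes in GB), not an IBM-supplied tool:

```python
import math

def min_physical_memory_gb(jvm_max_heaps_gb, os_gb=2,
                           per_jvm_overhead_gb=1, boundary_gb=4):
    """Minimum server memory: OS reserve plus (maximum heap + 1 GB)
    per JVM, rounded up to the next 4 GB boundary."""
    total = os_gb + sum(heap + per_jvm_overhead_gb
                        for heap in jvm_max_heaps_gb)
    return math.ceil(total / boundary_gb) * boundary_gb

# AppTarget 4 GB, Support 2 GB, Message Engine 2 GB:
# 2 + (4 + 1) + (2 + 1) + (2 + 1) = 13 GB, rounded up to 16 GB.
print(min_physical_memory_gb([4, 2, 2]))  # 16
```

Remember that this result is the minimum; verbose garbage collection analysis during performance testing can show that individual heaps need to grow.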
For database server memory sizing, consult each DB vendor’s recommendation. In general, the database server must be provisioned with enough physical memory to accommodate buffer pools and other in-memory cache to support efficient execution of typical database queries.
Disk and file systems
Capacity planning should also consider disk and file system spaces for the IBM BPM server and the DB server.
IBM BPM server storage capacity
The IBM BPM server uses the file system to store log files, Lucene indexes, and other temporary files. A good standard is for each server to reserve at least 100 GB of free disk space. For example, if an out-of-memory error ever occurs, the JVM can trigger heapdumps of multiple gigabytes, depending on the size of the Java heap.
The following sample directories include log data:
install_root/profiles/profile_name/logs/server_name
install_root/profiles/profile_name/
For more information about how to gather logs and which directories contain log and temporary information, see the following resources:
IBM BPM MustGather:
Things to know before deleting temporary, cache, and log files in WebSphere Application Server:
Database server storage capacity
IBM BPM uses databases to store persistent data, such as process instances, tasks, searchable variables, and deployed artifacts. Many tables are defined in IBM BPM databases and some of these tables require more storage than others. For sizing estimates, we can focus on the storage requirement of these large tables.
The following key variables affect the size of the IBM BPM DB:
Number of processes: Total number of process instances in the database, including all active and completed (but not yet deleted) processes.
Number of tasks: Total number of user and system tasks in the database, including all active and completed (but not yet deleted) tasks.
Number of documents: If embedded ECM is used, you must know how many documents are stored in the database.
Average size of process instances.
Average size of tasks.
Average size of documents.
Database transaction and archive logs.
How do you obtain values for these variables? You can get some of the data points, such as the number of processes, tasks, and documents, from solution architects and business stakeholders. You can also find the average size information directly from the database. For example, the top nine tables related to BPD process instances and tasks, sorted by size, are listed in Table 2-2 on page 31. The data was captured after running performance testing with an internal IBM lab workload that is named BPMBench v8. The top nine tables account for 70% of the total IBM BPM database size.
Table 2-2 IBM BPM database table sizes

Table name                   Total size (MB)  Data size (MB)  Index size (MB)  Row count   Comment
LSW_TASK_EXECUTION_CONTEXT   2,935            2,915           16               372,931     Task related
LSW_BPD_INSTANCE_DATA        868              860             4                100,000     Process related
LSW_TASK                     430              213             217              372,931     Task related
LSW_BPD_INSTANCE_VARIABLES   413              287             126              2,000,000   Searchable variables, process related
BPM_TASK_INDEX               75               24              47               372,931     Task related
LSW_BPD_INSTANCE             75               31              40               100,000     Process related
LSW_TASK_NARR                68               38              26               372,931     Task related
LSW_TASK_ADDR                48               21              27               372,931     Task related
LSW_BPD_INSTANCE_VARS_PIVOT  43               39              4                100,000     Process related
For typical BPMN applications, the top-most tables in terms of size are almost always related to tasks and process instances. As the table names imply, table LSW_TASK_EXECUTION_CONTEXT stores the execution contexts of all tasks (including all variables that are used in the task), and table LSW_BPD_INSTANCE_DATA stores the contexts of BPD process instances.
The amount of DB storage per process and per task can be calculated from the data that is listed in Table 2-2 on page 31. In this example, the DB storage per process instance is 14.3 KB, and the DB storage per task is 9.8 KB. No embedded ECM documents are in the processes. For applications with documents, the corresponding document table can appear at the top of the list.
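The per-unit figures can be reproduced from Table 2-2 by summing the process-related and task-related table sizes and dividing by the corresponding counts (a sketch; sizes in MB, with 1 MB taken as 1,024 KB):

```python
# Sizes in MB from Table 2-2, grouped by the Comment column.
process_tables_mb = [868, 413, 75, 43]      # process-related tables
task_tables_mb = [2935, 430, 75, 68, 48]    # task-related tables
process_count = 100_000
task_count = 372_931

kb_per_process = sum(process_tables_mb) * 1024 / process_count
kb_per_task = sum(task_tables_mb) * 1024 / task_count
print(round(kb_per_process, 1))  # 14.3
print(round(kb_per_task, 1))     # 9.8
```

Run the same calculation against your own database to get per-unit sizes that reflect your application's variables and task complexity.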
The formula to estimate the size of IBM BPM database is shown in Example 2-1. The total size is the sum of processes, tasks, and documents divided by the percentage that is accounted for by these tables.
Example 2-1 Formula for estimating size of IBM BPM database
Total BPM DB size =
(num_processes x avg_process_size +
num_tasks x avg_task_size +
num_documents x avg_document_size) / %_of_top_tables
For example, by using averages that are calculated from Table 2-2 on page 31, the total size of BPMDB is estimated to be the following size for a production system with 1,000,000 process instances and 4,000,000 tasks:
(1,000,000 * 14 KB + 4,000,000 * 9.8 KB) / 70% = 72.5 GB
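The same arithmetic can be written as a short helper; the 70% share of the top tables is the value measured in the BPMBench workload and varies by application:

```python
def bpm_db_size_gb(num_processes, avg_process_kb,
                   num_tasks, avg_task_kb,
                   num_documents=0, avg_document_kb=0,
                   top_tables_share=0.70):
    """Estimate total BPMDB size (GB) with the Example 2-1 formula."""
    top_tables_kb = (num_processes * avg_process_kb
                     + num_tasks * avg_task_kb
                     + num_documents * avg_document_kb)
    total_kb = top_tables_kb / top_tables_share
    return total_kb / (1024 * 1024)  # KB -> GB

# 1,000,000 processes at 14 KB and 4,000,000 tasks at 9.8 KB:
print(round(bpm_db_size_gb(1_000_000, 14, 4_000_000, 9.8), 1))  # 72.5
```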
When requesting storage for the DB, you must take variability into account. Build a margin for error into estimates (a minimum of 2X).
Also, the method that is described in this section covers only the estimation of BPMDB. A similar approach can be used to estimate the sizes of PDW and other databases.
For more information about purging and archiving data in IBM BPM, see Chapter 4, “Purging and archiving in IBM BPM systems” on page 51.
Application repository tables
Applications contain the coaches, business process definitions (BPDs), managed assets, and other artifacts to run the application. Depending on the size of the applications and the number of snapshots deployed, the tables that contain these assets can consume significant disk space.
The tables to watch are LSW_SNAPSHOT and LSW_PO_VERSIONS. Customers who migrated from versions where the snapshot delete command was not available (WebSphere Lombardi Edition and IBM BPM 7.5.1) should plan to delete snapshots. For more information, see Chapter 4, “Purging and archiving in IBM BPM systems” on page 51.
Review capacity regularly
Review the resource usages by the IBM BPM and database servers regularly, such as on a quarterly or bi-annual basis. Create a checklist to review and include the following areas:
IBM BPM server CPU utilization, averages, and peaks
IBM BPM server memory utilization, averages, and peaks
IBM BPM server free disk spaces
DB server CPU utilization, averages, and peaks
DB server memory utilization, averages, and peaks
DB server free disk spaces
2.9 Anti-patterns
A few common pitfalls can be navigated around or bypassed entirely. To do so, avoid the anti-patterns in this section when you plan and run testing of your IBM BPM solutions. The list that is featured in this section is far from complete. For more information, see other relevant publications and references.
2.9.1 Run testing on the Process Center
The Process Center features two-fold capabilities. It is a repository server for storing and managing authored artifacts and a Playback server for playing back newly created artifacts, such as coaches, coach views, tasks, integration services, and BPDs.
As a Playback server, it has all the capabilities of a process server. Running performance tests on the Process Center can overload it. This situation negatively affects its performance and thus other developers’ productivity.
Performance testing results from the Process Center are also invalid for the following reasons:
The Process Center is not provisioned for proper performance testing.
The application can run a path in Process Center that is different from the application in Process Server. For example, some runtime caches are disabled on the Process Center to enable playing back artifacts under development.
Instead of running tests on the Process Center, set up a dedicated testing Process Server and run tests on it.
2.9.2 Run performance testing on an inadequately provisioned environment
For proper performance testing of the end-to-end IBM BPM application, the testing environment should be provisioned and configured in the same way as the production environment. For numerous reasons (cost, poor planning, and so on) some customers run performance testing on a much smaller environment than the target production environment. The results are often extrapolated to estimate whether the production environment can handle the load. Relying on extrapolation includes the following pitfalls:
Performance results do not always scale linearly.
Resource bottlenecks, such as DB I/O, can be detected by testing only.
It is important to plan ahead to obtain production-like systems for testing. It takes time and effort to overcome funding obstacles, bureaucracies, skill gaps, and so on.
2.9.3 Run performance testing before application code is debugged
For performance testing, the application code should be reasonably debugged and free of application errors or exceptions. Performance testing of buggy code tests how fast errors and exceptions are generated.
2.9.4 No or low think times between activities
Testing scripts with no or low think times can create high load on the system that is under test. Such testing scenarios are good for generating stressful situations, but they do not reflect realistic human activities.
2.9.5 Use unrealistic data
An unrealistic data anti-pattern includes the following attributes:
All virtual users use a single user ID.
Hundreds (or even thousands) of users are grouped into a single group, which results in a deep task list.
Users are granted extraordinary entitlements.
There are too many or too few active instances or tasks.
The use of unrealistic testing data can produce testing results that are confusing and difficult to interpret.
2.9.6 Common mistakes for IBM BPM testing
IBM BPM testing can include the following common mistakes:
Lack of realistic business requirements
Focus on rarely used operations
Measurement period covering only the task or instance creation
Database server not properly tuned or under provisioned
Testing on overcommitted virtualized hardware
2.9.7 Application goes into production without load testing
Allowing applications to go into production without any load testing is one of the worst anti-patterns. Without proper load testing, there is a high risk of problems in the production environment, which ultimately cost more than the savings from bypassing load tests.