Capacity planning with IBM Content Capacity Planner
In this chapter, we briefly discuss capacity planning and the use cases for the IBM Enterprise Content Manager system capacity planning tool. This tool is called IBM Content Capacity Planner, formerly known as Scout.
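We cover the following topics:
•IBM Content Capacity Planner
•IBM FileNet Disksizing Tool spreadsheet
•Performance-related reference documentation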
8.1 IBM Content Capacity Planner
When you introduce a new system or extend an existing one, choosing the correct hardware is an important consideration in your planning. The IBM sales team supports you in planning the capacity of the system during this phase.
The sales team uses IBM Content Capacity Planner to model transactions and to answer questions such as the following:
•Based on the projected use of the IBM FileNet ECM system, what servers are needed?
•Given a certain hardware configuration, how busy will the servers be?
IBM Content Capacity Planner is generally used by IBM FileNet ECM System Engineers, IBM FileNet ECM Lab Services, and IBM FileNet ECM Partners.
After modeling a workload, IBM Content Capacity Planner produces utilization reports that show the demand placed upon a certain set of hardware by that workload.
Figure 8-1 illustrates the basic modeling process for capacity planning.
Figure 8-1 Basic modeling process for capacity planning
IBM Content Capacity Planner uses at least two input sources. One is the hardware configuration, and the other is the defined workload that consists of one or multiple transactions. The output from IBM Content Capacity Planner consists of performance charts. If the system utilization of all components is below a threshold, the system is deemed adequate to meet the workload requirements. The results are documented. If system utilization is at or above the threshold, you need to change the hardware configuration.
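The adequacy check described above can be sketched in a few lines of Python. This is only an illustration of the decision rule, not how IBM Content Capacity Planner is implemented. The component names and sample values are hypothetical, and the default threshold of 0.4 is the value used in the sample output later in this chapter.

```python
from typing import Dict, List

THRESHOLD = 0.4  # assumed planning threshold (40% utilization)


def overloaded_components(
    utilization: Dict[str, List[float]], threshold: float = THRESHOLD
) -> Dict[str, float]:
    """Return the peak utilization of every component that reaches the threshold."""
    peaks = {name: max(samples) for name, samples in utilization.items()}
    return {name: peak for name, peak in peaks.items() if peak >= threshold}


def is_adequate(
    utilization: Dict[str, List[float]], threshold: float = THRESHOLD
) -> bool:
    """The configuration is adequate only if every component stays below the threshold."""
    return not overloaded_components(utilization, threshold)


# Hypothetical utilization samples (fraction of capacity) per component.
sample = {
    "content_engine": [0.10, 0.35, 0.30],
    "database": [0.20, 0.45, 0.25],
}
print(is_adequate(sample))
print(overloaded_components(sample))
```

In this sketch, any component listed by overloaded_components indicates that the hardware configuration needs to be changed and the model rerun.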
When defining a workload in a presales situation, the details of a model might not be obvious. Therefore, it might be easiest to develop your general model first and refine it as you learn more details.
You might want to start with a moderate hardware configuration. As you define your workload, you can see the result in the chart immediately after adding each transaction and scale the hardware along with the transactions. This provides a better understanding of the cost per modeled transaction. In addition, a chart option lets you view utilization by transaction function to get the explicit cost per modeled transaction function.
When modeling the workload, IBM Content Capacity Planner provides a walk-through wizard for a quick start that helps you to configure the basic parameters of the components that you want to size. We found it useful to run the wizard and save the result to another file. The wizard helps you learn which transaction functions to add to your workload, but it creates a simplified model. Some of the less frequently used functions can be obtained only by manually adding them to your workload from the Transaction Templates in the tree view.
8.1.1 Example use cases for IBM Content Capacity Planner
Use IBM Content Capacity Planner to help you prepare for the following tasks:
•A new system is planned, and you need to select the hardware.
•During a system implementation, the IBM Content Capacity Planner sizing is refined to reflect the latest requirements.
•An existing system is extended. Additional users and additional applications are rolled out.
•An existing system needs to be migrated to new hardware. This can occur in conjunction with reorganization and moving into new buildings, system consolidation, new outsourcing contracts, or simply replacing outdated hardware.
•The current system needs to be analyzed. For example, a client wants to know what additional workload the system can handle or requests a detailed performance analysis. In this case, current production data is available and can be used by IBM Content Capacity Planner.
8.1.2 Capacity planning for new systems
In this section, we list typical questions for sizing a system.
In Chapter 2, “Solution examples and design methodology” on page 17, the following P8 Content Manager solutions were introduced: policy document management process, invoice archiving, email archiving, insurance claim processing, and social enterprise content management. Each solution focuses on a different functionality:
•Versioning and document management
•Scanning and processing via a business process and records management
•High volume ingestion and storage using IBM Content Collector
•Ingestion, storage, and compliance
•Social features of IBM FileNet Content Manager in conjunction with IBM Connections
Each system sizing is individual. Avoid the “one size fits all” approach after sizing one initial IBM FileNet Content Manager environment. Each solution is built to fulfill defined functional requirements and therefore has different requirements for hardware, number of CPUs, memory, disk capacity, and network bandwidth.
We concentrate on general sizing questions. The typical questions to ask the client when preparing to size a system usually fit into the following categories:
•Client environment
•Content ingestion
•User activities
•Configuring records management
•Business process management specifics
Client environment
The following list provides questions to ask during sizing that are related to the client environment:
•Does the client prefer specific hardware? If yes, which vendor?
•Are there standard machine types that the client wants to use? If yes, what is the standard server, which processor, and how many CPUs?
•What application server will be used?
•What database server will be used?
•What are the default working hours? You can override this default value in each transaction if needed.
Content ingestion
The following list provides questions to ask during sizing that are related to content ingestion:
•If content is ingested through scanning:
– What are the scanning hours?
– What is the average number of scanned documents during the scanning hours?
– What is the total number of documents usually scanned?
– What is the average size (in KB) of a scanned document?
– In how many batches are these scanned documents processed?
– How many documents are in a batch?
•If content is ingested through file import:
– What are the importing hours?
– What is the average number of documents imported during that time?
– What is the total number of documents usually imported?
– What is the average size (in KB) of an imported file?
•If ingested content is email via IBM Content Collector for Emails:
– Will original emails be archived?
– What is the average email size (in KB)?
– What is the average properties set?
– What is the number of duplicate email pointers?
– What is the number of original attachments?
– What is the number of duplicate attachment pointers?
– What is the average size of attachments (in KB)?
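The answers to the ingestion questions above translate into rates that can be sanity-checked before they are entered into a model. The following Python sketch is a back-of-the-envelope calculation under simple assumptions (documents arrive evenly across the ingestion window, and 1 hour is 3,600 seconds). The function name and the sample figures are hypothetical and are not part of IBM Content Capacity Planner.

```python
from typing import Tuple


def ingestion_rate(
    total_documents: int, avg_size_kb: float, hours: float
) -> Tuple[float, float]:
    """Return (documents per hour, average KB per second) for an ingestion window.

    Assumes documents arrive evenly across the window.
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    docs_per_hour = total_documents / hours
    kb_per_second = total_documents * avg_size_kb / (hours * 3600)
    return docs_per_hour, kb_per_second


# Hypothetical answers: 30,000 scanned documents of 50 KB each in a 3-hour window.
docs_per_hour, kb_per_second = ingestion_rate(30_000, 50, 3)
print(f"{docs_per_hour:.0f} documents/hour, {kb_per_second:.1f} KB/s")
```

Real ingestion is rarely even, so treat the result as an average and ask the client separately about peaks.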
User activities
After the content is ingested, corresponding actions are started. The content can be processed by IBM Case Foundation or simply stored and used for retrieval later. A user can work on the content using a custom application or FileNet Workplace XT. How the user uses the content might determine the sizing of the system.
The following questions relate to user activities:
•For logon and logoff activities:
– How many times does a user generally log on and log off per day or per week?
– Are there peak hours of logon and logoff activities during the day or during the week?
– Are there different logon and logoff behaviors for different users (for example, are there different behaviors for power users compared to occasional users)?
•For search, browsing, and retrieval activities (the same questions can be asked for different user groups):
– At what times do browsing and retrieval take place?
– Are there peak hours during the day?
– Are there deadlines (such as all orders have to be reviewed by noon)?
– What is the average document size of the documents to be retrieved?
– How many searches are usually performed per day?
– How many documents are returned on average per search action?
– How many custom properties (metadata fields) are retrieved on average per document?
– How many folders are browsed on average per day by a user?
– How many folders are accessed via a bookmark?
– How many documents are retrieved per day by a user?
•For new document creation:
– Will new documents be created evenly during the work hours?
– How many documents on average will be created during the work hours?
– What is the average document size (in KB)?
•For check-out and check-in activities:
– Will check-out and check-in be distributed evenly during the work hours?
– What is the number of documents checked out and in during the work hours?
– What is the average document size (in KB)?
Note: After documents are checked out, they usually are viewed. This viewing is modeled as an additional retrieval.
•For metadata modification activities:
– Are there major updates of metadata? If yes, in what time frame?
– How many documents are usually updated during the working hours?
– Before they are updated, how many properties are retrieved?
Configuring records management
We distinguish records management actions by the Records Manager role and by the users who declare records. Records can be declared through a system step in a business process or manually by users. Ask the following questions when sizing an IBM FileNet P8 records management solution:
•For Records Managers:
– What is the logon and logoff pattern of the Records Managers?
– How many searches for records are performed in a certain time period?
– How many browse actions in the file plan are performed?
– How many times are details retrieved? Examples of details are access security, detail, history, holds, and so on.
•For general users who declare records:
– How many existing documents are declared as records in a certain time period?
– How many new documents are declared as records in a certain time period?
Business process management
If the solution involves business process management, ask the following questions for each workflow:
•What is the time pattern for launching workflows?
•How many metadata fields does the workflow contain?
•What is the average field length (in bytes) of the metadata?
•How many workflows are launched in the time pattern?
•How many user steps does a workflow contain?
•How many system steps does a workflow contain?
•How often are workflow fields updated?
•How often are users updating their views?
Note: IBM Techline provides dedicated ECM sizing questionnaires that cover all these questions and many more questions for other IBM FileNet products:
•Industry Solutions ECM Sizing Questionnaire Oct2012, PRS5034
•IBM FileNet P8 Platform Sizing Questionnaire Jan2012, PRS3071
8.1.3 IBM Content Capacity Planner output
For every server and certain infrastructure components, IBM Content Capacity Planner produces a utilization chart for one day. The system is adequately handling the workload if the CPU utilization is below 40%. This threshold is used because response time does not grow linearly with utilization (queuing theory): it rises slowly at low utilization and steeply as utilization approaches 100%.
By sizing the system for 40%, you leave enough headroom for the system to handle temporary peaks with acceptable wait times.
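To illustrate why a conservative threshold leaves room for peaks, the following Python sketch uses the textbook single-server M/M/1 queue, in which the mean response time is the service time divided by (1 - utilization). Choosing this model is our assumption for illustration only; the source does not state which queuing model IBM Content Capacity Planner applies.

```python
def response_time_factor(utilization: float) -> float:
    """Mean response time as a multiple of the service time for an M/M/1 queue."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in the range [0, 1)")
    return 1.0 / (1.0 - utilization)


for rho in (0.2, 0.4, 0.6, 0.8, 0.9):
    factor = response_time_factor(rho)
    print(f"{rho:.0%} utilization -> {factor:.2f}x service time")
```

Under this model, a request at 40% utilization takes about 1.67 times its service time, whereas at 80% it takes 5 times and at 90% it takes 10 times as long. A short peak on top of a 40% baseline therefore costs far less than the same peak on top of an already busy system.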
Figure 8-2 shows a sample output of IBM Content Capacity Planner with the threshold at 0.4 (40%).
Figure 8-2 Sample IBM Content Capacity Planner output
You can see the Content Platform Engine load throughout the day. In the morning hours between 8:30 a.m. and 11:30 a.m., the system load is higher due to scanning activities. From 11:30 a.m. to 4:30 p.m., the activity level is lower, because only retrieval and processing activities occur. Between 3 a.m. and 4 a.m., prefetching takes place. Documents that are needed for the next day are retrieved and loaded into the cache for better performance.
8.1.4 Predictions from a baseline
When sizing a new system, IBM Content Capacity Planner converts a certain workload to utilization data for a selected, dedicated kind of server.
If you are sizing a system upgrade, you already have current data (an existing baseline) available on which you can perform additional modeling. Examples are migration to new hardware, added applications, or added users.
The first step is to collect baseline data for the involved systems. For the Content Platform Engine baseline, you use the System Manager Dashboard. The dashboard is a tool that gathers performance data and provides current Content Platform Engine utilization data. If an Image Services system is also involved, data can be exported by the integrated performance data collection function (perf_mon). The baseline data can be imported to IBM Content Capacity Planner, and the utilization data can be used as the basic workload.
Regular capacity planning is important for business continuity. Baseline data must be collected and analyzed on a regular basis. That information is helpful in forecasting upgrades of the actual IBM FileNet environment at the client site. Upgrades of hardware and software can then be scheduled and managed with minimum or no interruption to the production system during normal working hours.
Figure 8-3 is an extension of the capacity planning process. It includes the collection and importation of baseline data from running systems.
Figure 8-3 Import from a baseline
For example, we show an existing IBM FileNet P8 system, including IBM FileNet Image Services. The client is planning to roll out another application on Content Platform Engine that is expected to double its workload. In addition to that, a third-party application is installed that adds about 20% additional load.
For modeling purposes, we import the current Content Platform Engine utilization with a factor of two, import the Image Services utilization, and add an application that accounts for an increased workload of 20%.
Figure 8-4 shows the utilization for the Image Services system.
Figure 8-4 Utilization of an Image Services system
The chart shows the workload summary after importing the three workload profiles: one for Content Platform Engine, one for Image Services, and one for the additional third-party application. The various colors represent single services that run simultaneously. The chart illustrates the imported workload together with the new application workload.
The result is that, with the additional application, the Image Services server exceeds its threshold at 7:30 a.m. It needs to be scaled up with two additional CPUs.
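The kind of projection performed in this example can be sketched in Python as follows. This is a simplified illustration, not the calculation that IBM Content Capacity Planner performs. The baseline values are invented, each list holds one utilization sample per time slot, and we assume that the third-party load can be expressed as a fraction of a server's existing baseline.

```python
from typing import List

THRESHOLD = 0.4  # assumed planning threshold (40% utilization)


def project(
    baseline: List[float], scale: float = 1.0, extra_fraction: float = 0.0
) -> List[float]:
    """Scale a baseline profile and add a load expressed as a fraction of it."""
    return [u * (scale + extra_fraction) for u in baseline]


def slots_at_or_over(profile: List[float], threshold: float = THRESHOLD) -> List[int]:
    """Return the indexes of the time slots that reach the threshold."""
    return [slot for slot, u in enumerate(profile) if u >= threshold]


# Hypothetical baselines (fraction of CPU capacity per time slot).
cpe_baseline = [0.05, 0.10, 0.15]
is_baseline = [0.10, 0.30, 0.35]

cpe_projected = project(cpe_baseline, scale=2.0)          # workload doubles
is_projected = project(is_baseline, extra_fraction=0.20)  # 20% additional load

print(slots_at_or_over(cpe_projected))
print(slots_at_or_over(is_projected))
```

With these invented numbers, the doubled Content Platform Engine profile stays below the threshold, while the Image Services profile reaches it in the last time slot, which is the kind of finding that leads to a scale-up decision.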
8.1.5 Best practices
The following bullets summarize several recommendations when working with IBM Content Capacity Planner:
•When initially performing an IBM Content Capacity Planner sizing, the client will not have the exact answer to all of the questions; therefore, make assumptions and document them. Get the clients to sign off on the assumptions used for sizing. Be conservative when making assumptions. Configure the system for peak loads.
•Add a document to the IBM Content Capacity Planner calculations describing which data was provided by the client, which assumptions were made, and what the IBM Content Capacity Planner output was. Also, document how the IBM Content Capacity Planner input fields were calculated from the data given by the client. This helps you to review an IBM Content Capacity Planner calculation after a certain amount of time and helps you to understand why transactions were modeled in a particular way at a later refinement.
•Use project variables to ensure consistency throughout your transactions.
•When you start, you might want to choose medium performance hardware to better see the effects of the configured transactions.
•If you are unsure about the parameters of a transaction, use the online help. Use the Help topic icon that lists the details quickly.
•Split a complex scenario into several steps to reduce complexity.
•After changing parameters, immediately check the output to learn what effect the change has created, which gives you an idea of the costs of the transactions.
•Common mistakes are defining workload hourly instead of daily (and therefore creating eight times the load) or making mistakes when entering the number of transactions (for example, entering 1,000,000 instead of 100,000).
•If the system looks misconfigured, change the chart to the Transaction Functions view instead of the Average Utilization view. The Transaction Functions view allows you to compare the utilization by function and helps you to localize the function that most influences the system load.
Figure 8-5 shows an example in which the Content Platform Engine is under a heavy load.
Figure 8-5 Content Platform Engine under heavy load (utilization is more than 90%)
We want to discover what transaction led to the workload. So, we switch to the Transaction Functions view.
Figure 8-6 on page 265 shows the result and the transaction responsible for the workload.
Figure 8-6 Transaction Functions view (showing the transactions causing the workload)
As shown in Figure 8-6, the IBM FileNet P8 4.x Java Create Documents transaction creates the most intense workload. When we verify the input for this transaction, in this example, we find a typographical error in the number of input documents and correct it.
With the correction made, we see in Figure 8-7 on page 266 that the system operates well under the threshold.
Figure 8-7 Normal system workload
For international users, two troubleshooting tips might be helpful:
•If you encounter IBM Content Capacity Planner runtime errors, change the regional settings of the operating system to English.
•Do not use region-specific special characters. If you do, IBM Content Capacity Planner might not be able to open the files and informs you at which line the problem occurs. In that case, you can edit the IBM Content Capacity Planner files (.sct), which are in XML format, to remove the special characters.
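If a file cannot be opened, a small script can help locate the offending characters before you edit the file by hand. The following Python sketch lists every line that contains a character outside the ASCII range. It treats the file as plain text and makes two assumptions that are ours, not the product's: that "special characters" means non-ASCII characters, and that the file is encoded in UTF-8. Work on a copy of the file.

```python
import sys
from pathlib import Path
from typing import List, Tuple


def find_non_ascii(path: str, encoding: str = "utf-8") -> List[Tuple[int, str]]:
    """Return (line number, offending characters) for each line with non-ASCII characters."""
    hits = []
    # Bytes that cannot be decoded are replaced by U+FFFD, which is also reported.
    text = Path(path).read_text(encoding=encoding, errors="replace")
    for number, line in enumerate(text.splitlines(), start=1):
        bad = "".join(ch for ch in line if ord(ch) > 127)
        if bad:
            hits.append((number, bad))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_non_ascii.py <copy-of-model-file>
    for number, bad in find_non_ascii(sys.argv[1]):
        print(f"line {number}: {bad!r}")
```

The script only reports line numbers; it does not modify the file, so you decide how to replace each character.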
8.2 IBM FileNet Disksizing Tool spreadsheet
In addition to sizing hardware by calculating the utilization derived from a modeled workload, another important point is the sizing of disk space for the managed content. The IBM FileNet P8 Disksizing Tool spreadsheet enables you to enter key system values and then produces the estimated required disk space.
Figure 8-8 shows an extract of the spreadsheet that contains the input system values and the output, which is the estimated disk space required for IBM FileNet Content Platform Engine and additional components.
Figure 8-8 Extract of IBM P8 Disksizing Tool spreadsheet
There are also several additional sizing spreadsheets provided by IBM for other IBM FileNet P8 products.
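Before filling in a disk sizing spreadsheet, it can be useful to have a rough order-of-magnitude figure for the content itself. The following Python sketch multiplies daily volume, average document size, and retention period. It is not the formula used by the IBM FileNet P8 Disksizing Tool spreadsheet, which is not described here. The function, its parameters, and the sample values are hypothetical, and the result covers raw content only (no database, index, or backup space unless you add it through the overhead parameter).

```python
def estimate_content_gb(
    docs_per_day: int,
    avg_size_kb: float,
    working_days_per_year: int,
    years: float,
    overhead: float = 0.0,
) -> float:
    """Rough content volume in GB (1 GB = 1024 * 1024 KB).

    overhead is an optional fraction added on top, for example 0.2 for 20%.
    """
    total_kb = docs_per_day * avg_size_kb * working_days_per_year * years
    return total_kb * (1 + overhead) / (1024 ** 2)


# Hypothetical input: 10,000 documents per day, 50 KB each, 250 days a year, 5 years.
content_gb = estimate_content_gb(10_000, 50, 250, 5)
print(f"about {content_gb:.0f} GB of raw content")
```

Use such a figure only as a plausibility check for the values that the spreadsheet produces.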
8.3 Performance-related reference documentation
In this section, we provide additional references about where to find performance-related material.
8.3.1 Standard product documentation
This documentation is available for IBM FileNet P8 Content Manager:
•IBM FileNet P8 Performance Tuning Guide
Provides information about tuning parameters that can help improve the performance of your IBM FileNet P8 system. This document covers operating system, database, and application server parameters and IBM FileNet P8 component parameters to help you tune an existing system. You can retrieve this white paper directly from the following website:
•IBM FileNet P8 Performance Tuning
There are several web pages that provide additional information for improving the performance of IBM FileNet P8 components. Go to the following website:
•For the latest performance-related documentation and technical papers, go to the product documentation website for the IBM FileNet P8 Platform:
8.3.2 Benchmark papers
These papers describe system performance tests of specific configurations performed either by independent companies or in the IBM FileNet test environment. All these documents are available only to IBM technical sales, IBM Lab Services, or IBM Business Partners. For more details, contact your IBM Sales Team or IBM Business Partner.
8.4 Conclusion
This chapter offers a brief look at the concepts and process for defining and sizing hardware, network bandwidth, and storage by using IBM Content Capacity Planner. It also offers hints about the input information that is needed to size an IBM FileNet Content Manager environment.
Note: The sizing with IBM Content Capacity Planner will only be as good as the provided input information.
Now that you have a general understanding of how to plan and lay out an IBM FileNet Content Manager environment, we explore the basic deployment concepts of IBM FileNet Content Manager in Chapter 9, “Deployment” on page 271.