Application design
In this chapter, we discuss useful principles for designing IBM FileNet Content Manager (P8 Content Manager) applications.
We discuss the following topics:
– Available P8 Content Engine APIs
– Transports available with the APIs
– Minimizing round-trips
– Creating a custom AddOn
– Exploiting the active content event model
– Logging
Note: Although the technical components that make up the server pieces are called the Content Platform Engine, there are still many separate aspects, including APIs, for content and process as of the writing of this book. For clarity, we use the earlier term “Content Engine” when talking specifically about content matters in this chapter.
6.1 IBM FileNet P8 applications
P8 Content Manager includes a number of standard applications, and many more applications are available as add-ons to the basic product. The applications are aimed at different audiences and use cases. In this section, we introduce a few of the applications to serve as examples for application development.
6.1.1 IBM Administration Console for Content Platform Engine
The IBM Administration Console for Content Platform Engine (ACCE) is a powerful tool for administrators to use in performing routine setup, maintenance, and specialized tasks. ACCE is implemented as a rich web application sharing much infrastructure scaffolding with IBM Content Navigator.
Because it is an administrator’s tool that can be used for extraordinary and powerful low-level changes, ACCE strikes a balance. It exposes low-level details of the IBM FileNet Content Manager, yet it remains usable through extensive task wizards and other user interface help.
Although ACCE is an administrator’s tool, it uses the normal Content Engine APIs, and you are subject to normal security access checks. The administrator running ACCE typically has a high level of security access, but ACCE does not and cannot provide any additional privileges. It therefore serves as a good example of the actions that can be done with custom applications that also use the Content Engine APIs.
Recommendations: ACCE replaces an earlier thick client tool called IBM FileNet Enterprise Manager. IBM FileNet Enterprise Manager is still shipped because many administrators are familiar with it. However, the primary administrator tool is ACCE, and you should become familiar with it in preference to IBM FileNet Enterprise Manager. New features will be added to IBM FileNet Enterprise Manager only as exceptions.
6.1.2 IBM Content Navigator
In contrast to ACCE (see 6.1.1, “IBM Administration Console for Content Platform Engine”), IBM Content Navigator is intended for the wider audience of non-administrator users. Even though it is generic in nature, it still provides a comfortable and productive user interface for accomplishing a variety of everyday tasks. IBM Content Navigator is a rich web application. The user interface uses modern Web 2.0 and Ajax technologies to closely model a desktop application experience. It provides easy-to-use windows and wizards for navigating and searching for documents and folders. IBM Content Navigator can connect to other IBM content repositories and is included as the standard client in several IBM ECM products.
In addition to being a ready-to-use client application, you can also easily customize IBM Content Navigator user interface elements or extend it with entirely new features. See “IBM Content Navigator extensions”.
Recommendations: You might have experience with using and extending earlier generations of IBM FileNet Content Manager web clients, including Application Engine, Workplace, or FileNet Workplace XT. For any new client application development work, you need to strongly consider basing it on IBM Content Navigator.
6.2 Application technologies
Content Manager comes with a set of applications that you can use as is. The operations and interfaces provided by these applications might not always satisfy your enterprise’s business requirements. In many circumstances, you have to create custom applications to fulfill your business needs. Your applications will be designed with specific business goals in mind, and those come in many varieties. We do not attempt to cover business goals here. Instead, we discuss more general technical application technologies.
6.2.1 Traditional Java thick clients
Content Manager’s Content Platform Engine consists of Java Platform, Enterprise Edition (Java EE) components, but your client application can be a Java thick client. By thick client, we mean an application running in its own Java virtual machine (JVM) launched from the desktop. It can be a simple command-line program or have a full-featured graphical user interface. Because it is launched from the local client machine, there are virtually no security restrictions on what a thick client application can do.
A thick client application normally consists of directories or Java archive (JAR) files of Java classes, both for the application and for supporting utility libraries. One of the biggest problems in using thick clients is the logistical hurdle of keeping all of the copies of the application up-to-date. This trait is not unique to Java applications; it is the same for any thick client technology. Because of this problem, however, thick clients are best suited for use by a small number of users or for mature and stable software.
Recommendations: Limit the use of thick client applications to exploratory code or utilities with a limited user population.
6.2.2 Java applets
A specialized form of thick client is a Java application that runs inside a security-constrained Java environment in a web browser. This application is called an applet. An applet can have most of the rich interactions of a traditional thick client, but it has advantages and disadvantages.
The most obvious disadvantage is that users must run a Java-capable web browser with Java applets enabled. All major web browsers are Java-capable, but for security reasons, organizational policies sometimes forbid enabling Java applets.
Recommendations: Avoid the use of Java applets. For most needs, modern JavaScript toolkits and Ajax technologies can serve just as well and have fewer deployment problems.
6.2.3 Java EE web applications and other components
The technology underlying much of the enterprise software development these days is Java Platform, Enterprise Edition (Java EE). Java EE helps you make efficient use of resources by providing common services, such as security, high availability, transaction management, and scalability. Because the platform provides these services with mechanisms for configuring them when the applications are deployed, you are free to concentrate on business logic in your applications. The Content Platform Engine, which is implemented as Java EE components, uses many common features of Java EE. You can write Content Platform Engine applications with traditional thick client Java applications or even non-Java client technologies, but the tightest integration will naturally be available when your application is integrated with a Java EE application server.
There are many standardized technologies available in Java EE, but two are particularly worth mentioning because they often show up in typical Java EE application development: servlets and Enterprise JavaBeans (EJBs).
•The Java EE servlet container is often thought of as the container for web applications because it represents the tier where Java EE presentation logic is generally placed. Web applications are perhaps the most popular use for servlets, but it is not necessary to have an actual web interface to use servlets. For example, the IBM Content Management Interoperability Services (CMIS) provider is implemented using a servlet, and any user interface is provided by the CMIS client applications. The servlet container is appropriate for application components that receive and respond to outside requests and that optionally preserve some state on the server side between requests.
•The Java EE EJB container provides what are often thought of as enterprise-level services. For example, EJBs can have declarative security and transactional properties, provide transparent load balancing across servers, and provide nearly transparent access to relational databases. EJBs are frequently used to encapsulate reusable business logic and seldom, if ever, contain any presentation logic.
In recent years, web services have expanded and matured. That has blurred the line between what needs to be implemented in the web tier and what needs to be implemented in the EJB tier.
6.2.4 .NET components
Just as the Java community has standardized on Java EE as a software component architecture, Microsoft has popularized the .NET environment. .NET shares many concepts with Java and Java EE, but, from the point of view of the Content Platform Engine, only clients can be written using .NET technology. .NET is fundamentally incompatible with Java and Java EE except when interacting via a common protocol. In Content Manager, the common protocol is the web services transport of the IBM FileNet P8 Content Engine APIs or the direct use of IBM FileNet P8 Content Engine Web Services.
6.3 Principles for application design
In this section, we present principles to consider when designing your own applications. Obviously, situations vary, and not all of these principles apply to every situation. Our intention is to give you a brief survey, which will have a bearing on your designs and that might even suggest new application designs to you.
6.3.1 Available P8 Content Manager APIs
One of the goals of the Content Manager is to make all features available through robust APIs. Content Manager applications add their own utility layers, often with significant amounts of application logic, but interaction with the server always comes down to a set of calls to published and documented APIs. Those APIs are also available to you for custom application development. If you see a feature in a P8 Content Manager application, you can be confident that your custom application can do the same or similar things via the APIs.
This section describes the APIs available in the IBM FileNet P8 4.0 release and later. We do not discuss the compatibility APIs (for Java and COM) that exist to help in the transition of applications written for earlier P8 Content Manager releases. Both of those compatibility APIs are now deprecated, but no date for their complete removal has been announced as of this writing.
Recommendations: Use the current IBM FileNet P8 APIs for any new development and, where possible, for additions to existing applications. Avoid extending your use of the compatibility APIs any longer than necessary.
Many specific details are treated lightly in this section. That is intentional because there is a separate IBM Redbooks publication, Developing Applications with IBM FileNet P8 APIs, SG24-7743-00, that provides in-depth descriptions, details, and illustrative code samples.
Java API
Content Manager provides a full-featured Java API. Any feature that is available in the server is completely available to Java programmers. This access includes routine operations, such as retrieving and updating Document objects, and specialized operations, such as adding a custom class or property to an object store’s metadata definitions.
In simplified terms, an API object can be thought of as containing the following information:
•Something that identifies the object residing on the server. Typically, this is an object store reference and an object ID or path.
•Some number of locally cached properties. These might have been fetched from the server, or they might have been set locally. A property value that has been set or changed in the API object and not yet sent to the server is said to be dirty, because its value does not match what is persisted on the server.
•Some number of pending actions. When you call a method that implies a change to the object (including simple property value changes), the change is not made immediately. Instead, a representation of that change is added to the API object’s list of pending actions. For example, if you call the method Document.checkin(), a Checkin pending action is added to the API object.
Dirty property values and pending actions are not sent to the server until an explicit call is made to do so. If an API object is discarded without that call, the changes are never made on the server. The most common method of sending changes to the server is to call the save() method on an API object. There is also a batching mechanism for sending updates to multiple objects in a single round-trip over the network. Batching provides improved performance and provides transactional atomicity for all of the changes in the batch.
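The client-side behavior described above can be pictured with a small, self-contained toy model. This is only a sketch: the classes FakeServer and ApiObject below are hypothetical stand-ins written for illustration and are not part of the Content Engine Java API. The model assumes that property changes and the checkin() call accumulate locally and that nothing reaches the server until save() is called.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these classes belong to the Content Engine Java API.
class PendingActionsSketch {

    /** Hypothetical stand-in for the server: counts round-trips and holds persisted state. */
    static class FakeServer {
        int roundTrips = 0;
        final Map<String, Object> persisted = new LinkedHashMap<>();
        final List<String> appliedActions = new ArrayList<>();

        void execute(Map<String, Object> dirty, List<String> actions) {
            roundTrips++;                      // one request, one response
            persisted.putAll(dirty);
            appliedActions.addAll(actions);
        }
    }

    /** Hypothetical stand-in for a client-side API object. */
    static class ApiObject {
        private final FakeServer server;
        private final Map<String, Object> dirtyProperties = new LinkedHashMap<>();
        private final List<String> pendingActions = new ArrayList<>();

        ApiObject(FakeServer server) { this.server = server; }

        /** Caches the value locally and records a single "Update" pending action. */
        void setProperty(String name, Object value) {
            dirtyProperties.put(name, value);
            if (!pendingActions.contains("Update")) {
                pendingActions.add("Update");
            }
        }

        /** Records the intent to check in; nothing is sent yet. */
        void checkin() { pendingActions.add("Checkin"); }

        boolean isDirty() { return !dirtyProperties.isEmpty(); }

        /** Sends dirty values and pending actions in one round-trip, then clears them. */
        void save() {
            server.execute(dirtyProperties, pendingActions);
            dirtyProperties.clear();
            pendingActions.clear();
        }
    }

    public static void main(String[] args) {
        FakeServer server = new FakeServer();
        ApiObject doc = new ApiObject(server);
        doc.setProperty("DocumentTitle", "Quarterly report");
        doc.checkin();
        System.out.println("Round-trips before save: " + server.roundTrips);
        doc.save();
        System.out.println("Round-trips after save: " + server.roundTrips);
        System.out.println("Actions applied on server: " + server.appliedActions);
    }
}
```

If the ApiObject in this model were discarded before save(), the FakeServer would never see the title change or the checkin, which mirrors the behavior described for real API objects.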
Recommendations: Use only exposed and supported classes and interfaces in the API. Do not use internal implementation classes; in particular, do not make calls into anything in the com.filenet.apiimpl.* packages.
.NET API
Content Manager provides a full-featured .NET API, which you can use to write programs in any .NET-compatible language. With a couple of exceptions, any feature that is available in the server is completely available to .NET programmers. The exceptions are mainly custom code that must be executed within the server, for example, EventActionHandler. Because the Content Platform Engine server is a Java EE application, internally executed custom code is limited to Java-compatible technologies.
The principles behind the .NET API are the same as those behind the Java API (see “Java API”), so we do not repeat that discussion here. One significant feature available only with the .NET API is the use of Kerberos to perform authentication via Microsoft Windows Integrated Login. This is only possible when the client application is running on Microsoft Windows and the Content Platform Engine is using Microsoft Active Directory. In practice, that latter constraint usually means that the Content Platform Engine is also running on Microsoft Windows.
Recommendations: Use only exposed and supported classes and interfaces in the API. Do not use internal implementation classes; in particular, do not make calls into anything in the FileNet.Apiimpl.* namespaces.
Web services
Modern, loosely coupled frameworks, such as a service-oriented architecture (SOA), favor web services protocols for connecting components. Content Manager provides Content Engine Web Services (CEWS) for accessing nearly all features available in the Content Engine server.
Typically, if you as a programmer want to use a web services interface, you obtain the interface description in the form of a Web Services Description Language (WSDL) file. You run the WSDL file through a toolkit to generate programming language objects for interacting with the web services interface. You then usually build up a library of utilities to provide abstraction layers, caching, security controls, and other conveniences. The Java and .NET APIs provided by Content Manager are already exactly equivalent to that, and both APIs can use web services as a transport (see 6.3.2, “Transports available with the APIs”). Consequently, there is not as much motivation to use CEWS directly, although there are still a few occasions where the direct use of CEWS might be useful:
•You have an application already using CEWS, and no plans exist for immediately porting it to the Java or .NET API.
•You are building an application component as part of a framework in which the use of web services is the model for communicating with external systems.
•Although a rare occurrence, you might be using a language or technology that can make use of web services but is not compatible with the use of a Java or .NET API.
For these occasions, the direct use of CEWS is a good choice and is supported.
In theory, you can take the WSDL file for CEWS and use any current web services toolkit to generate the interfaces that you will use on your end. In practice, however, toolkits are still individualistic in their handling of various WSDL features, and it is difficult to write a WSDL for a complex service that is usable by a wide cross-section of web services toolkits. Check the latest hardware and software support documentation corresponding to the product version you are using, and use only a supported toolkit.
Recommendations: To decide which API to use to implement your application, follow this approach:
•If you are writing handler code that runs inside the Content Platform Engine server, it must be written in Java or a supported scripting language.
•If you have a case where you must use CEWS, use it. However, if you can possibly avoid using CEWS, avoid it.
•If your development organization has more experience in .NET or Java, choose the corresponding Content Engine API.
•If it is still a toss-up, choose Java (because of greater flexibility in API transports that you might find handy later).
We intentionally do not list performance as a way to choose the API since it really is dominated by the other factors.
Content Management Interoperability Services
Content Management Interoperability Services (CMIS) is an industry standard for accessing content repositories and performing mainstream document management tasks. It defines a set of REST and web services interfaces and has fundamental constructs for documents, folders, and properties. IBM has created a CMIS provider for IBM FileNet Content Manager that is included in the Content Manager license (although it is a separately installed item).
Many third-party components include the ability to use CMIS to connect to content repositories. In such cases, your integration work can be relatively straightforward and simple.
CMIS is also well-suited for use in scenarios where you might already be considering using a REST or web services interface for mainstream document management functions. Instead of having to write those services yourself and interfacing to Content Manager repositories by using Content Engine APIs, you can instead use the CMIS provider as the service layer. There are readily available vendor and open source toolkits to help you construct the client side to communicate with a CMIS service layer.
IBM Content Navigator extensions
IBM Content Navigator is the current generation user client for all IBM content repositories. It builds on the extensive experience gained from previous generations of user clients across several product lines.
A key strength of Content Navigator is that it was built with extensibility in mind. We expect that a large percentage of application developers will find it useful to start with Content Navigator and then customize and extend it to meet their custom application needs.
By customization, we mean altering the visual appearance or behavior of an existing Content Navigator component. By extension, we mean adding new features, large or small, to a Content Navigator environment.
Content Navigator is a browser-based web application. Its layered architecture consists of these components:
•A collection of visual widgets written using the Dojo JavaScript toolkit and the dijit component libraries.
•A layout framework for arranging visual components into logical desktops and pages.
•A browser-resident JavaScript model view controller (MVC) layer for orchestrating the flow of information.
•Mid-tier server-based components for interfacing to repositories and providing other services.
Each of those layers has available customization and extension points for application developer use.
Recommendations: Use Content Navigator as your application framework for content-centric applications that need a rich and modern user interface. Extend Content Navigator with features you need that are not already part of Content Navigator.
In addition to being a highly customizable and extensible application framework, most of the visual widget components used in the user interface layer can also be easily adapted for use outside of the Content Navigator environment. The reuse of those widgets can represent considerable development time savings even if you do not choose to use Content Navigator itself.
All of the customization and extension topics in this subsection are covered in extensive detail in Customizing and Extending IBM Content Navigator, SG24-8055.
6.3.2 Transports available with the APIs
When designing any multi-tiered application, you must carefully consider how information will be conveyed back and forth between the client side and the server side of the network connection. Different frameworks for remote calls typically come with different advantages and constraints.
In the Content Engine APIs, the framework mechanisms are called transports. The APIs were designed so that all API operations are completely independent of the transport used. (The few exceptions deal with the propagation of security and transaction contexts.) A benefit of this independence is that applications can be written without considering the transport. The selection of a transport is a configuration decision when the application is deployed (the API finds out about it through the URI used for the Connection object).
There are two available transports: Content Engine Web Services (CEWS) and Enterprise JavaBeans (EJB). EJB transport is available only for the Java API. CEWS transport is available for both APIs. For most situations, the EJB transport has slightly better performance, but the CEWS transport can be used in more environments. With one exception, the transport is stateless, which means that the APIs operate on the basis of a single request and response for each interaction, and no client state is maintained by the server after a request has been serviced. The exception is that recent releases of the Content Engine Java API can be configured to use a stateful session bean when uploading multiple chunks of content over EJB transport.
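Because the transport follows from the connection URI, an application can keep the URI in deployment configuration and never mention the transport in code. The following sketch illustrates that idea only. It assumes that an http or https scheme implies CEWS transport and that any other scheme (for example, a vendor-specific EJB naming scheme) implies EJB transport. The class, the property name ce.connection.uri, and the host names and paths are hypothetical; check the product documentation for the exact URI forms that your application server requires.

```java
import java.net.URI;
import java.util.Locale;

// Illustrative helper only; not part of the Content Engine APIs.
class TransportFromUriSketch {

    enum Transport { CEWS, EJB }

    /**
     * Guesses the transport implied by a connection URI.
     * Assumption: http/https means CEWS; any other scheme means EJB.
     */
    static Transport guessTransport(String connectionUri) {
        String scheme = URI.create(connectionUri).getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("Connection URI has no scheme: " + connectionUri);
        }
        String lower = scheme.toLowerCase(Locale.ROOT);
        return (lower.equals("http") || lower.equals("https")) ? Transport.CEWS : Transport.EJB;
    }

    public static void main(String[] args) {
        // Hypothetical system property and default value; the URI is a deployment-time setting.
        String uri = System.getProperty("ce.connection.uri",
                "https://ce.example.com:9443/wsi/FNCEWS40MTOM/");
        System.out.println(uri + " -> " + guessTransport(uri));
    }
}
```

Keeping the URI outside the code means that switching transports is a configuration change rather than a rebuild, which is the independence that the API design aims for.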
EJB transport
The EJB transport internally uses EJB method calls. The method calls are made on the client side and transported by the application server to the server side of the network connection. Although many people think of EJBs using Java Remote Method Invocation (RMI) as the remote communications mechanism, that is not necessarily the case. Application server vendors are free to provide whatever implementation they like as long as they meet the EJB requirements, and many vendors use something other than RMI. In any case, the details of the application server’s implementation are transparent to the API, and the API does not need to have facilities for controlling things, such as clustering or server affinity of the EJB, because those things are configured within the application server.
CEWS transport
As its name implies, the CEWS transport uses web services protocols. In fact, the CEWS transport uses the same Content Engine Web Services (CEWS) protocol that we mentioned in “Web services”. That means XML over HTTP or HTTPS. Because HTTP and HTTPS use only a single port for the entire conversation and use a strict client/server interaction model, it is generally easier to configure a firewall or reverse proxy through which to allow CEWS transport requests to pass.
Web services attachments are used for carrying pieces of content between the client and server sides. Attachment handling has undergone many changes over the years, and different environments and tools support different standards:
•When using either API, you must select the CEWS endpoint that supports Message Transmission Optimization Mechanism (MTOM) attachments (recognizable because it has MTOM in the endpoint name: FNCEWS40MTOM).
•There is another attachment format called SOAP that is less efficient in a couple of ways than MTOM. Nonetheless, it is sometimes useful to temporarily use the SOAP endpoint (FNCEWS40SOAP) as a troubleshooting step if you suspect problems at the transport layer. That is seldom actually the case, but it does not hurt to rule it out.
Comparing the transports
Consider the following information when deciding which transport to use:
•Because it usually employs a binary protocol likely to have been engineered for high performance, the EJB transport typically has better performance than the CEWS transport in the same environment. In particular, processor utilization is likely to be a bit higher with CEWS transport due to XML parsing activity. The actual performance difference is extremely dependent upon the specific mix of API calls your application makes.
•The EJB used by the EJB transport automatically propagates any active transactional context to the server. In contrast, transaction propagation is not possible when using CEWS transport. Whether transaction propagation is desirable depends upon the application. The Content Platform Engine always treats incoming client requests transactionally, so most applications do not need to worry about it at all.
•The EJB used by the EJB transport automatically propagates any ambient JAAS authentication context to the server. If you are already using a JAAS-based authentication scheme, either in isolation or as part of a single sign-on (SSO) framework, Content Manager is likely to participate in that scheme with few or no configuration changes if you use EJB transport.
•In contrast, there is no general framework for propagating an authentication context when using CEWS transport. Although a standard called WS-Security provides a high-level framework for adding authentication schemes, CEWS transport can only support schemes backed by specific implementation programming in the Content Platform Engine server. Content Manager directly supports WS-Security Username token and Kerberos token authentication schemes. The latter can be used to facilitate integration with Microsoft Windows applications. Custom authentication schemes can also be implemented by using the IBM FileNet Web Services Extensible Authentication Framework (WS-EAF).
Specific details of using Kerberos and WS-EAF are provided in the Web Service Extensible Authentication Framework Developer’s Guide section of the online help files, IBM FileNet P8 Documentation.
•CEWS transport, which is based on HTTP or HTTPS, uses just one or two TCP/IP ports for all interactions. There are also commercially available products for examining and validating web services traffic. Therefore, many administrators find it easier and more secure to open their firewalls to CEWS transport requests. In contrast, EJB transport might use a vendor-specific binary protocol. Such protocols often employ a range of TCP/IP ports. These factors typically lead to a greater willingness to allow CEWS transport to pass through firewalls and a reluctance to do the same for EJB transport.
•In cases where CEWS transport is using Username token authentication, the credentials will appear on the wire unprotected unless you use Transport Layer Security or Secure Sockets Layer (TLS/SSL), which we strongly advise.
•With EJB transport, content is uploaded or downloaded in chunks. With CEWS transport, the entire content is uploaded as part of a single HTTP request. For download, however, CEWS transport also generally chunks content.
Note: It used to be recommended to use CEWS transport for upload of large content. However, recent releases have included some optimization work using a stateful EJB call when uploading content chunks. That translates directly to less work needed on the Content Engine server side once the chunks have been uploaded. Although EJB transport still chunks content on both upload and download, the performance overhead of the chunking itself is typically quite small. Do not use the presence of large content as your sole reason for selecting a particular transport.
Recommendations: To decide which transport to use, follow this approach:
•If you are using the .NET API, you must use the CEWS transport.
•If you are using the Java API and need one of the features that is only provided by EJB transport (security or transaction context propagation), use EJB transport.
•If you are writing a Java application that is hosted in a Java EE application server, it is generally easier to configure EJB transport.
•However, EJB transport is only supported between homogeneous types of Java EE application servers on the client and server. So, if you have heterogeneous types of application servers, you must use CEWS transport.
•If you are writing a Java thick client application, it might be easier to configure CEWS transport.
These considerations are mainly about the simplicity of runtime configuration and deployment. For almost everything, your application coding is exactly the same regardless of transport.
6.3.3 Minimizing round-trips
The number and nature of network round-trips, that is, requests from the client to get a response from the server, usually dominate the performance picture of the application. There are simple and powerful tools available in the APIs to reduce your round-trips, and API logging can be used to assess how well you are doing.
Recommendations: When developing your application, allow some time in the schedule to examine the working application for opportunities to eliminate round-trips. That can sometimes be done with simple code tweaks, but it will sometimes require a bit of refactoring of your logic.
Get or fetch
When many people think about interacting with an object from the server, they first think about doing a round-trip to fetch the object. That is a necessity for many things, but there are several cases where you do not need that initial fetch. For example, if you are only going to use an object so you can set the value of an object-valued property on another object, you really only need a reference. If you somehow know that the object already exists, you can skip the round-trip to fetch it. (If it turned out that you were wrong and it did not already exist, the referential integrity mechanisms in Content Engine will throw an exception when you try to save the referencing object.)
The APIs have a mechanism called fetchless instantiation. There are three types of Factory methods for creating programming language objects that reference Content Engine objects, and you can tell them apart by the word used as the beginning of the method name:
•create indicates that a new Content Engine object is to be created. No round-trip is done as the result of this Factory method call. A save() call must eventually be done.
•fetch indicates that a round-trip is immediately made to the Content Platform Engine to verify that the object exists and to return an initial set of properties. Fine-tuning of the properties returned can be controlled via an optional PropertyFilter. See “Property filters”.
•get indicates that no round-trip will be made. This is a fetchless instantiation. The API assumes that the object exists. There is no initial set of property values available, so you need to request any property values that you need. If you know that you always need some property values immediately, there is no advantage to fetchless instantiation.
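The round-trip cost of the three kinds of Factory methods can be compared with a toy model. The sketch below is not the Content Engine API: FakeServer, ObjectRef, and the three static methods are hypothetical stand-ins that borrow the create, fetch, and get prefixes, and the property name Container is chosen only for illustration. The model assumes that the server checks object-valued properties for referential integrity at save time, as described above.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these classes belong to the Content Engine APIs.
class FetchlessSketch {

    /** Hypothetical client-side reference with a local property cache. */
    static class ObjectRef {
        final String id;
        final Map<String, Object> properties = new HashMap<>();
        ObjectRef(String id) { this.id = id; }
    }

    /** Hypothetical server that counts round-trips and enforces referential integrity. */
    static class FakeServer {
        int roundTrips = 0;
        final Map<String, Map<String, Object>> stored = new HashMap<>();

        Map<String, Object> fetch(String id) {
            roundTrips++;
            Map<String, Object> props = stored.get(id);
            if (props == null) {
                throw new IllegalStateException("No such object: " + id);
            }
            return new HashMap<>(props);
        }

        void save(ObjectRef obj) {
            roundTrips++;
            for (Object value : obj.properties.values()) {
                if (value instanceof ObjectRef && !stored.containsKey(((ObjectRef) value).id)) {
                    throw new IllegalStateException(
                            "Referenced object does not exist: " + ((ObjectRef) value).id);
                }
            }
            stored.put(obj.id, new HashMap<>(obj.properties));
        }
    }

    /** "create": a new object, no round-trip until it is saved. */
    static ObjectRef createInstance(String newId) { return new ObjectRef(newId); }

    /** "fetch": one immediate round-trip that verifies existence and loads properties. */
    static ObjectRef fetchInstance(FakeServer server, String id) {
        ObjectRef ref = new ObjectRef(id);
        ref.properties.putAll(server.fetch(id));
        return ref;
    }

    /** "get": fetchless instantiation, no round-trip and no cached properties. */
    static ObjectRef getInstance(String id) { return new ObjectRef(id); }

    public static void main(String[] args) {
        FakeServer server = new FakeServer();
        server.stored.put("folder-1", new HashMap<>());   // pretend this folder already exists

        ObjectRef folder = getInstance("folder-1");       // reference only, no round-trip
        ObjectRef doc = createInstance("doc-1");          // no round-trip
        doc.properties.put("Container", folder);          // object-valued property
        server.save(doc);                                 // the only round-trip so far

        System.out.println("Round-trips used: " + server.roundTrips);
    }
}
```

Replacing getInstance with fetchInstance in this model would add a round-trip without changing the outcome, which is exactly the saving that fetchless instantiation offers when only a reference is needed.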
Property filters
Property filters are optional parameters to a number of methods that fetch objects or properties from the Content Platform Engine. They allow highly granular control of the objects or properties being returned.
It is easy to understand how returning fewer properties can improve performance, but, less obviously, you can also improve performance by returning more properties and objects. The savings come if you can return multiple objects in a single round-trip instead of making multiple round-trips to perform the same work. A property filter can do just that. Over time, most application developers learn which properties and objects they need, so this can be an efficient way to perform most or all of your retrievals in just a few round-trips.
Most of the Content Engine API calls that can take a property filter also accept a null value. In these cases, the API still works correctly, but it might make additional round-trips in the background as your application progresses. It is designed that way so that you can get your application working quickly and optimize the performance later.
Batching
The Content Engine APIs contain two separate but similar batching mechanisms:
•A RetrievingBatch is used to fetch multiple, possibly unrelated, objects from the Content Platform Engine in a single round-trip. Object references and property filters are added to the batch, and retrieveBatch() is called to trigger the round-trip.
•An UpdatingBatch is used to group multiple updates in a single round-trip to the Content Platform Engine. Instead of calling save() on individual objects, the objects are added to the batch, and updateBatch() is called to trigger the round-trip. Updates are performed as an atomic transaction.
Recommendations: Unless it leads to tortured application logic, it is a good idea to accumulate multiple changes to objects before calling save(), and it is also a good idea to batch updates to multiple objects in an UpdatingBatch.
As a general rule, plan to carry no more than 50 - 100 items in a batch. Somewhere in that range, the overhead associated with batching itself tends to neutralize any performance benefits. Since specifics of application workload can change for various reasons, consider making the batch sizes configurable so that a code change is not needed for that adjustment.
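A configurable batch size might be sketched as follows. The class name BatchPlanner and the system property app.batch.size are assumptions made for illustration; the code only partitions pending work and does not call the Content Engine API.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits pending work into batches of a configurable maximum size. */
class BatchPlanner {

    /** Reads the batch size from a hypothetical system property, defaulting to 50. */
    static int configuredBatchSize() {
        return Integer.getInteger("app.batch.size", 50);
    }

    /** Partitions items into consecutive lists of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice so that each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

Each inner list would then be added to one UpdatingBatch and sent with a single updateBatch() call, so changing the property value changes the number of round-trips without a code change.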
6.3.4 Parallel processing
With any client/server application arrangement, there is likely to be a significant amount of time where the client is simply waiting for a response from the server. If your application is handling a large workload, it might benefit from being split up into a number of parallel work items. Not every application activity is amenable to that sort of splitting, but many are or can easily be adapted.
The usual way of splitting up work is to use multiple threads inside a single process, but it is sometimes adequate to simultaneously execute multiple single-threaded processes. Depending on the nature of the work being done, it might be necessary to have a single overall coordinator thread or process that dispatches specific work items to worker threads or processes. In other cases, it is possible to assign each thread or process a specific piece of work in its start-up parameters.
Recommendations: When splitting the application into multiple threads or processes, make the number of these threads or processes configurable. That eliminates the need for a code change if you discover that the optimal number of threads or processes changes over time.
There is a recurring application pattern that involves issuing a query for objects matching particular criteria and then performing an action on each result object. The issuance of the query and accumulation of results are good jobs for the coordinator thread or process. Disjoint sets of result objects can be handed to worker threads or processes for action.
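The coordinator and worker arrangement can be sketched with a standard Java thread pool. The names here (Coordinator, the system property app.worker.threads) are hypothetical, and the action is a placeholder for whatever update the application performs on each result object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

class Coordinator {

    /** Worker count comes from a hypothetical system property so that it can be tuned without a code change. */
    static int configuredWorkers() {
        return Math.max(1, Integer.getInteger("app.worker.threads", 4));
    }

    /** Hands disjoint chunks of the results to worker threads and returns the number of items processed. */
    static <T> int dispatch(List<T> results, int chunkSize, int workers, Consumer<T> action) {
        if (chunkSize < 1 || workers < 1) {
            throw new IllegalArgumentException("chunkSize and workers must be >= 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (int start = 0; start < results.size(); start += chunkSize) {
                List<T> chunk = results.subList(start, Math.min(start + chunkSize, results.size()));
                Callable<Integer> task = () -> {
                    for (T item : chunk) {
                        action.accept(item);
                    }
                    return chunk.size();
                };
                pending.add(pool.submit(task));
            }
            int processed = 0;
            for (Future<Integer> f : pending) {
                processed += f.get(); // the coordinator waits for each worker to finish
            }
            return processed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The action must be safe to call from several threads at once, because chunks are processed concurrently.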
Alternatively, you might have an application that must process a large number of objects, but your performance constraints are to operate as a background task. That is, you want the processing to move forward, but you do not want to interfere with foreground work by placing an undue load on the server machines. In that case, single-threaded processing might be a better match. In some cases, you can easily distinguish the already processed objects from those still needing processing. For example, your criteria might include some property value being null, and your action might include setting that property to a non-null value. In such cases, you can use a non-continuable query instead of a continuable, paged query. Non-continuable queries have lower server overhead than continuable queries. Just be sure to include a TOP qualifier in the SELECT clause, for example, “TOP 50”. Choose a number that is convenient for the batch sizes that you plan to use for the actions.
Recommendations: When the semantics of iterative processing allow it, use a non-continuable query for best performance. This approach generally does not work when multiple threads or processes perform the update actions in parallel.
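The loop implied by this approach can be sketched against an in-memory stand-in. ToyStore is a hypothetical model, not the Content Engine API, and the property name ProcessedOn in the illustrative query text is an assumption. The point is the control flow: the same bounded query is reissued until it returns nothing, and the action itself removes objects from the result set.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BackgroundSweep {

    /** Illustrative query text only; ProcessedOn is a hypothetical custom property. */
    static String query(int top) {
        return "SELECT TOP " + top + " Id FROM Document WHERE ProcessedOn IS NULL";
    }

    /** In-memory stand-in for a repository: object id mapped to a marker (null means not yet processed). */
    static final class ToyStore {
        final Map<String, String> marker = new LinkedHashMap<>();
        int queriesIssued = 0;

        /** Mimics a non-continuable query: at most 'top' ids whose marker is still null. */
        List<String> selectTopUnprocessed(int top) {
            queriesIssued++;
            List<String> ids = new ArrayList<>();
            for (Map.Entry<String, String> e : marker.entrySet()) {
                if (e.getValue() == null) {
                    ids.add(e.getKey());
                    if (ids.size() == top) {
                        break;
                    }
                }
            }
            return ids;
        }
    }

    /** Reissues the same bounded query until it comes back empty; returns the number of objects processed. */
    static int sweep(ToyStore store, int top) {
        if (top < 1) {
            throw new IllegalArgumentException("top must be >= 1");
        }
        int processed = 0;
        while (true) {
            List<String> ids = store.selectTopUnprocessed(top);
            if (ids.isEmpty()) {
                return processed;
            }
            for (String id : ids) {
                store.marker.put(id, "done"); // the action sets the property to a non-null value
                processed++;
            }
        }
    }
}
```

Because each pass relies on the previous pass having changed the selection criteria, two workers running this loop at the same time could pick up the same objects, which is why the approach suits single-threaded processing.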
6.3.5 Client-side transactions
All work performed by the Content Engine in a database or other storage is done transactionally, which means that you never get partially successful calls to Content Engine. The call either completely succeeds or completely fails. This is important for maintaining the consistency of the data in the repositories. You do not need to do anything to get that sort of transactional behavior inside Content Engine. Actually, there is no way to avoid it, because it is hardcoded into Content Engine logic.
There is another type of transaction that you can control in your application. If you use the Java API with EJB transport, you can include Content Engine activity within a client-side transaction. This feature is unavailable when using CEWS transport. (See “CEWS transport” on page 201.) The client-side transaction can be started implicitly by the Java EE container or started explicitly through your use of a javax.transaction.UserTransaction object.
Content Manager follows the Java EE model for transactions, and Java EE in turn follows industry standards for distributed transactions. In this context, the relevant facts are that a transaction is started, operations performed by a transactional resource (in this case, Content Engine) are tagged with the transaction identifier, and the transaction is either committed or rolled back. All changes tagged with a certain transaction identifier are committed or rolled back as an atomic unit.
Now that we have described the use of client-side transactions, here are a few reasons to avoid them:
•Client-side transactions tend to create or magnify performance problems. The overall transaction times are longer simply due to network latency and other factors inherent in the interaction between client and server. Longer transaction times mean that resources all the way into the database are being held for longer periods of time. This greatly increases the chances for resource contention and slows overall system throughput.
•Most of the tasks that applications want to do in a client-side transaction can be done more efficiently with the API batching mechanism using an UpdatingBatch object. A batch is performed as an atomic transaction, but the transactional control is on the Content Platform Engine side.
•API batches can be used with all APIs and transports, so it is a more flexible mechanism than client-side transactions.
After some analysis, it almost always turns out that applications using client-side transactions can be rewritten to use API batching. For the few cases where client-side transactions are genuinely needed, they are supported as described. The case where you might be forced into a client-side transaction is when your application must include transactional resources outside of Content Manager. For example, if you must include P8 Content Manager updates atomically with updates to a stand-alone database, that is a motive for using a client-side transaction. If you find yourself using a client-side transaction that you cannot avoid, do your best to minimize the amount of time that the transaction is active.
Recommendations: Avoid using client-side transactions. Instead, rely on the inherent transactional behavior of the Content Engine server.
6.3.6 Creating a custom AddOn
If you plan to use your application in multiple environments, either in your own organization or by distributing it to others, you need to be able to re-create the classes, properties, and perhaps some instance data from your repository.
We discuss the process of moving from development to production environments in Chapter 9, “Deployment” on page 271. For situations where you want to deliver your application as a package, you can consider developing an AddOn. An AddOn is a bundle of exported data with optional pre-installation and post-installation scripts. The scripts are run automatically before or after the AddOn is installed. The scripts can be used for any programmatic activity that you need to customize the data in the target environment. An AddOn also has information about other AddOns that must be installed as prerequisites.
An AddOn is created by creating an instance of the Content Engine AddOn class. When saved, the AddOn is stored within the global configuration database (GCD). Available AddOns are accessible via the Domain object’s AddOns property. An available AddOn can then be installed into an object store, which means that the data is imported and the post-installation script is run. IBM FileNet ACCE has menu actions and wizards for manipulating AddOns, including selecting which AddOns to install when an object store is created.
6.3.7 Using the JDBC interface for reporting
In addition to programming language APIs, P8 Content Manager also presents a read-only Java Database Connectivity (JDBC) interface. This interface is not an interface directly to the relational database tables used in the repository. Rather, it is a view into the object model represented by the Content Engine metadata. In the JDBC interface, queries follow a model analogous to that of the native APIs, where each metadata class looks like a database table and each property looks like a database column.
The JDBC interface follows the JDBC specifications and programming models, but the motivation for its development was primarily for use by reporting tools. The JDBC interface is also purely read-only. Therefore, the JDBC interface is not a good choice for use in application development. For general application programming, the native APIs provide a richer interface.
Recommendations: Avoid the use of the Java API JDBC provider except for integrating with off-the-shelf report generation packages and similar products that require a JDBC interface. For developer-written code, use the facilities of the Content Engine .NET or Java API. If you have an administrative need to perform reporting and counts that cannot be done in a performant way using the APIs, you might need to query directly against the underlying database.
6.3.8 Exploiting the active content event model
Content Manager provides a unique active content capability that proactively moves content and content-related business tasks through a business process without requiring human initiation. You probably have several objects that are controlled mostly by your application, but you also want to know when another application tries to change these objects. When that happens, you might want to either prevent the change or perform follow-up actions to ensure data consistency in an application-specific way. One well-known follow-up action is to launch a workflow activity so that an affiliated IBM Case Foundation system can coordinate a complex chain of events.
As a programmer or an administrator, your exposure to active content is via the Content Manager’s event subscriptions model. You create and register a subscription for various events. The subscriptions can be created for individual object instances or for an entire class of objects. The subscribed events represent updates (or at least attempted updates) to an object.
When an event occurs in Content Engine, any active subscriptions link the event to an EventAction and ultimately to your code. Your code receives parameters that describe the event that occurred as well as the state of the object when the event occurred. For some events, you get both before and after snapshots of the object.
Event subscriptions come in three types: change preprocessor, synchronous, and asynchronous. It is up to you as the creator of the subscription to decide which type to use:
•For a change preprocessor, which happens synchronously as an update request arrives at the Content Platform Engine server, your handler is allowed to make simple changes to the incoming object before it is passed along to the main part of the server. A change preprocessor runs under the security context of the original calling user.
•For a synchronous event subscription, your event action handler is called after the change has been made to the object, but before it is committed (in the transactional sense). You are not allowed to make changes to the object, but you have the opportunity to veto the change by throwing an exception.
•For an asynchronous event subscription, your event action handler is called after the change to the object has been committed to the database. Your handler does not run within the context of the original transaction of the update request. Instead, it has its own transaction started by Content Platform Engine. You can make changes to the triggering object, but those changes are just normal, additional changes as you might make from a client program. Your handler cannot veto the original change, because it has already happened and been committed.
Recommendations: The event model is powerful and can be a useful component of your overall solution design. Become familiar with the three types of handlers. Do not be tempted to violate the usage rules for the three types. Even if something non-compliant happens to work in a simple test, it can fail later in mysterious ways.
By using the event subscription model in Content Engine, you can create handlers that monitor changes to objects not just from your application or components, but from all sources.
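The ordering of the three subscription types, and what each is allowed to do, can be sketched with a toy pipeline. This is a simplified model and not the Content Engine event API: ToyEventPipeline and its fields are hypothetical, a veto is reported here as a false return value rather than an exception to the caller, and asynchronous handlers are simply run inline after the simulated commit instead of in their own transaction.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical model of one object's update path through the three handler types. */
class ToyEventPipeline {
    final Map<String, String> committed = new HashMap<>(); // stand-in for the stored object
    final List<Consumer<Map<String, String>>> preprocessors = new ArrayList<>();
    final List<Consumer<Map<String, String>>> synchronous = new ArrayList<>();
    final List<Consumer<Map<String, String>>> asynchronous = new ArrayList<>();

    /** Applies an update; returns true if committed, false if a synchronous handler vetoed it. */
    boolean update(Map<String, String> incoming) {
        Map<String, String> pending = new HashMap<>(incoming);

        // 1. Change preprocessors may make simple changes to the incoming object.
        for (Consumer<Map<String, String>> p : preprocessors) {
            p.accept(pending);
        }

        // 2. Synchronous handlers see a read-only view; throwing an exception vetoes the change.
        try {
            for (Consumer<Map<String, String>> s : synchronous) {
                s.accept(Collections.unmodifiableMap(pending));
            }
        } catch (RuntimeException veto) {
            return false; // nothing was committed
        }

        // 3. Commit, then notify asynchronous handlers; they can no longer veto.
        committed.putAll(pending);
        Map<String, String> snapshot = Collections.unmodifiableMap(new HashMap<>(committed));
        for (Consumer<Map<String, String>> a : asynchronous) {
            try {
                a.accept(snapshot);
            } catch (RuntimeException ignored) {
                // A failing asynchronous handler does not undo the committed change.
            }
        }
        return true;
    }
}
```

For example, a preprocessor could default a missing Status value, a synchronous handler could reject an update with an empty Title, and an asynchronous handler could record each committed title for follow-up work.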
6.3.9 Logging
The Content Manager APIs and the server have built-in logging that focuses on providing details of round-trips between the client and server. The reason for that focus is that those details are typically interesting information for resolving both performance and functional problems. The main purpose of the logging is to have artifacts for diagnosing problems when hands-on debugging is not possible. Those logs are intended to be examined by IBM Support and development engineers. They are not documented in detail, but you might easily develop an informal familiarity with them if you work with them.
When designing logging for your own applications, you are likely to have similar goals. You might want to consider the following points:
•Determine the interesting interactions in your application. Focus your logging efforts on those interactions first. You can always add more logging as your application evolves or as you become more familiar with the types of problems that occur in production. Think of logging those interesting interactions as a unit, whether they are all contained within a certain software module or not.
•Do not log uninteresting details. Logs can become quite large, and many details that are logged turn out to be distracting clutter when you are looking at log files later. If something is likely to help solve a problem, log it. If there is just a remote possibility that it will help, skip it.
•Be careful about tying things to source code. It is fine to assume that the people looking at the logs will have access to the source code to see what entries mean, but only do that if that is actually true. Otherwise, log entries must be reasonably self-explanatory so that you can teach someone what they mean.
•Log the impossible. In any application, there are conditions that are supposed to be impossible. It is tempting to silently ignore those conditions in program logic. If one of those conditions actually happens, it must be logged, because it is an indication of a design flaw or something seriously strange in the runtime environment.
•Pick a few severity and verbosity levels. It is probably better to have fewer rather than more levels of granularity in your controls for logging. Modern logging toolkits often give you the freedom to control things with many levels. Do you really need them all? You probably do not. You probably do not need much more than “on”, “off”, and perhaps one level in between. For each combination, ask yourself who will really use it and why it is better than another combination that you already need. One reason to have an intermediate level is that voluminous logging usually has an impact on performance. You can sometimes get ideas for narrowing your focus by using only intermediate logging.
•When logging error conditions, log the entire exception message and stack trace, including any nested exceptions. Some people consider it a security risk to display this information to users, but this is not a problem for logs seen only by administrators.
•Make it possible to reconfigure logging dynamically without restarting the application. Some logging toolkits have this capability built in. If they do not, code your application layer so that it periodically checks for a change.
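Several of these points can be sketched with the standard java.util.logging package. This is one possible arrangement for application-level logging, not the logging built into Content Manager. The logger name app.roundtrips and the three setting names are assumptions made for illustration.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.logging.Level;
import java.util.logging.Logger;

class AppLogging {
    static final Logger LOG = Logger.getLogger("app.roundtrips"); // hypothetical logger name

    /** Only three settings: "off", "summary" (intermediate), and "detail". Can be called at any time while running. */
    static void setVerbosity(String setting) {
        switch (setting) {
            case "off":
                LOG.setLevel(Level.OFF);
                break;
            case "summary":
                LOG.setLevel(Level.INFO);
                break;
            case "detail":
                // Handlers have their own levels; the attached handlers must also allow FINE for output to appear.
                LOG.setLevel(Level.FINE);
                break;
            default:
                throw new IllegalArgumentException("Unknown verbosity: " + setting);
        }
    }

    /** Renders the message and stack trace of an exception, including nested causes. */
    static String fullTrace(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }

    /** Logs an error condition with the entire exception chain. */
    static void logFailure(String what, Throwable t) {
        LOG.log(Level.SEVERE, what + System.lineSeparator() + fullTrace(t));
    }

    /** Logs a condition that the program logic considers impossible instead of silently ignoring it. */
    static void logImpossible(String condition) {
        LOG.severe("Supposedly impossible condition reached: " + condition);
    }
}
```

A periodic task that rereads a configuration value and calls setVerbosity would give dynamic reconfiguration without restarting the application.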
Recommendations: The 5.2 release introduces a new Java class, HandlerCallContext, with several logging-related methods. That class is intended for use by event handler code and other custom code that runs inside the Content Engine server. Use those logging methods so that your logging output is integrated with trace and error logging from the Content Engine server itself. Enable it with the Handlers trace logging subsystem in ACCE.
6.3.10 Creating a data model
Designing applications goes along with the design of how you plan to store your permanent data. In P8 Content Manager, the available mechanisms in the repository fully support your use of object-oriented programming models. We describe here just a few of the items that might be overlooked by developers unfamiliar with an object-oriented persistence layer. In certain cases, there are features that are not commonly available in object-oriented programming languages.
Inheritance
Repository classes support a convenient inheritance model. You can define new subclasses that add properties or change various characteristics of existing properties for the subclass.
You can also add new properties to most system classes, although it usually makes more sense to define a subclass just for that purpose and extend it (by adding properties or further subclassing) for your application’s needs.
Property value constraints
The repository metadata model also allows you to define default values and constraints on properties that will be enforced by the server. For example, you can define an integer property constrained to a specific range or set of allowed values. Although you might traditionally put that validation logic into your application, having it in the metadata ensures that no other application can put invalid data in those properties. After the constraints are in the metadata, your application can read the metadata and use that to guide application layer validation.
Object-valued properties
One of the more powerful features of the data model is object-valued properties (OVPs). When one object needs to reference another object, use OVPs instead of storing the ID or path to the object. By using OVPs, you can directly navigate from object to object. For an OVP, the metadata provides type safety by only allowing you to point to objects of a certain class (or subclass), just like an object reference in a programming language. The server provides features for referential integrity and configurable cascading deletion (automatically controlling the deletion of pointed-to objects or preventing the deletion of pointing-to objects).
Reflective properties
A particularly useful form of OVP is a reflective property, also known as an association property. More than one object can point to a particular other object. When that happens, the reflective property mechanism is used to simplify the bookkeeping and let Content Engine perform most of the work. The usual examples have a parent and many children. Suppose you have an Invoice object with many LineItem child objects. With the reflective property mechanism, define an Invoice property on the LineItem class and a LineItems property on the Invoice class. The naming is just a convention that works well in practice. Any property names can be used. To affiliate a new LineItem with the Invoice, you need only populate the Invoice property on the LineItem object. Because it was created as a reflective property, the LineItems property on the Invoice class automatically reflects the new line item being added. When you access the multi-valued property (the LineItems property in our example), the Content Engine automatically performs a query for applicable objects with the appropriate value in the single-valued property (the Invoice property in our example).
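The Invoice and LineItem example can be modeled in plain Java to show the idea. This is an analogy only: the classes below are hypothetical stand-ins rather than repository classes, and the Store.lineItemsOf method plays the role of the query that the server performs when the multi-valued property is read.

```java
import java.util.ArrayList;
import java.util.List;

class ReflectiveDemo {

    static final class Invoice {
        final String number;

        Invoice(String number) {
            this.number = number;
        }
    }

    static final class LineItem {
        final String description;
        final Invoice invoice; // the single-valued side: the only value the application sets

        LineItem(String description, Invoice invoice) {
            this.description = description;
            this.invoice = invoice;
        }
    }

    /** Stand-in for the repository. */
    static final class Store {
        private final List<LineItem> saved = new ArrayList<>();

        void save(LineItem item) {
            saved.add(item);
        }

        /** The multi-valued side: never stored, computed by query each time it is read. */
        List<LineItem> lineItemsOf(Invoice invoice) {
            List<LineItem> result = new ArrayList<>();
            for (LineItem item : saved) {
                if (item.invoice == invoice) {
                    result.add(item);
                }
            }
            return result;
        }
    }
}
```

Nothing is ever written to the invoice when a line item is added, which is the bookkeeping that the reflective property mechanism takes over.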
Many-to-many relationships
Especially because of reflective properties, it is easy to use OVPs to model one-to-many and many-to-one relationships. You might find the need to model a many-to-many relationship. The usual solution for that is to use an intermediate object to express a single pair of relationships. The system class, ReferentialContainmentRelationship (RCR), is an example of this solution for the special case of containing objects in folders. A single object can be contained in many folders, and a folder can contain many objects. The document class has a reflective property, Containers, which identifies all the RCRs (and, therefore all the containment relationships) that reference a specific document instance. The folder class likewise has a Containees property.
You can see that this intermediate relationship object, combined with reflective properties, is a powerful tool for simplifying your modeling of many-to-many relationships. Not only does it express the relationship, but it can also have properties specific to that particular relationship. For example, an RCR has a property, ContainmentName, that gives a unique name to a contained object for the purposes of path-based navigation. When you use an intermediate object for a relationship, you can add whatever properties are appropriate to your business needs. Both ReferentialContainmentRelationship and DynamicReferentialContainmentRelationship classes are subclassable, and you can use them for your own relationships if they happen to fit the folder containment model. Other good choices for the intermediate object are subclasses of the custom object and link system classes.
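The intermediate-object pattern can be sketched in plain Java as follows. The class and method names are hypothetical and only loosely mirror folder containment: one Relationship instance records one folder and document pair, carries a property of its own, and both directions are derived by searching the relationships.

```java
import java.util.ArrayList;
import java.util.List;

class ContainmentDemo {

    /** Intermediate object: one instance per (folder, document) pair, with its own property. */
    static final class Relationship {
        final String folder;
        final String document;
        final String containmentName; // a property that belongs to the relationship itself

        Relationship(String folder, String document, String containmentName) {
            this.folder = folder;
            this.document = document;
            this.containmentName = containmentName;
        }
    }

    private final List<Relationship> relationships = new ArrayList<>();

    /** Files a document in a folder by creating one relationship object. */
    void file(String folder, String document, String containmentName) {
        relationships.add(new Relationship(folder, document, containmentName));
    }

    /** One direction: all folders that contain the given document. */
    List<String> foldersContaining(String document) {
        List<String> folders = new ArrayList<>();
        for (Relationship r : relationships) {
            if (r.document.equals(document)) {
                folders.add(r.folder);
            }
        }
        return folders;
    }

    /** The other direction: containment names of everything filed in the given folder. */
    List<String> namesIn(String folder) {
        List<String> names = new ArrayList<>();
        for (Relationship r : relationships) {
            if (r.folder.equals(folder)) {
                names.add(r.containmentName);
            }
        }
        return names;
    }
}
```

Neither the folder nor the document stores a list of the other; adding or removing a relationship object is the only update needed.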
Custom objects
You will often find yourself with a need to hold a collection of related properties for one reason or another. In a database programming environment, you might create a new table with rows representing the collection of information. The Content Manager solution for this is to create a subclass of the custom object class. The custom object system class has only a few properties of its own, and it exists specifically to be subclassed for this use. The invoice and line item example used for reflective properties can also be modeled this way.
As part of the persistence architecture, the Content Engine stores all custom objects, regardless of class, in a single database table. It sometimes happens that different kinds of custom objects are used in significantly different ways by applications. For example, an object store might have numerous custom objects that represent business object entities, and it might also have custom objects that represent configuration items. The latter custom objects are relatively few in number and can get lost in the volume of business objects. That can result in performance problems at the database level. Because of this occasional database issue, the 5.2 release introduces custom root classes. A custom root class has its own table in the database but otherwise is similar to a custom object subclass.
Recommendations: When contemplating the use of custom objects in your data model design, consider using a subclass of CmAbstractPersistable as a custom root class. This is useful if your objects will not be typical business objects.