Chapter 9. Data conversion

Data conversion

In this chapter, we provide information about data conversion for IBM Content Manager OnDemand (Content Manager OnDemand). We describe the reasons for data conversion and describe the interface that Content Manager OnDemand uses to convert data.

In this chapter, we cover the following topics:

•Overview of data conversion

•Generic Transform Interface

9.1 Overview of data conversion

To work with data conversion, understand the data conversions that are required, and when and how to convert the data. Perform detailed planning before you build your solution so that you achieve a design that remains efficient for many years.

In this section, we describe why you might need data conversion, when to convert the data stream, and how to convert the data.

9.1.1 Why convert data streams

You might want to convert data streams for many reasons:

•Certain data streams, such as Hewlett-Packard (HP) Printer Command Language (PCL) or Xerox metacode, are printer-specific and cannot be displayed. Before you archive or display the documents, these data streams must be transformed into a compatible format.

•The archived data stream might need to comply with a company’s internal rules or regulations. Therefore, the produced data streams must be transformed into the defined and required final format before they are archived.

•The documents might need to be accessible to users outside the company. In that case, the documents must be displayable with standard tools that are available on most, if not all, clients, such as a web browser or Adobe Acrobat Reader.

•The documents might need to be manipulated so that only part of the document is displayed in a personalized way.

9.1.2 When to convert data streams

The decision of when to convert data streams relies mainly on the use of the system. Typically, converting data at load time requires more time to process the print stream file, and converting data at retrieval time causes the user retrieval to be a little slower. The decision might depend on how many documents are retrieved, compared to how many documents are loaded daily. It might also depend on legal requirements about the format of stored data.

AFP to PDF

If AFP documents must be presented in Portable Document Format (PDF) over the web, it is more efficient from a storage perspective to store the documents in their native format and convert them to PDF at retrieval time, because AFP documents are stored more efficiently than PDF documents.

The PDF print stream, when it is divided into separate customer statements, is larger than AFP because each statement contains its own set of structures that are required by the PDF architecture to define a document.

Elapsed time and processor time are also essential factors in the decision-making process. The amount of time (elapsed and CPU) that is needed to convert the document depends on how large the document is and how many resources or fonts are associated with the document.
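The trade-off between converting at load time and at retrieval time can be made concrete with a rough count of conversions. The following minimal sketch assumes hypothetical daily volumes and a hypothetical fixed CPU cost per conversion. The class and method names are illustrative only and are not part of Content Manager OnDemand.

```java
final class ConversionTimingSketch {

    /** Daily CPU seconds if every document is converted once, at load time. */
    static double costAtLoad(long docsLoadedPerDay, double secondsPerConversion) {
        return docsLoadedPerDay * secondsPerConversion;
    }

    /** Daily CPU seconds if a document is converted each time it is retrieved. */
    static double costAtRetrieval(long retrievalsPerDay, double secondsPerConversion) {
        return retrievalsPerDay * secondsPerConversion;
    }

    /** Returns "load" or "retrieval", whichever costs less CPU time per day. */
    static String cheaperPoint(long docsLoadedPerDay, long retrievalsPerDay,
                               double secondsPerConversion) {
        return costAtLoad(docsLoadedPerDay, secondsPerConversion)
                <= costAtRetrieval(retrievalsPerDay, secondsPerConversion)
                ? "load" : "retrieval";
    }

    public static void main(String[] args) {
        // Hypothetical volumes: 100,000 documents loaded and 2,000 retrieved per day,
        // with an assumed 0.5 CPU seconds per conversion.
        System.out.println("At load:      " + costAtLoad(100_000, 0.5) + " s/day");
        System.out.println("At retrieval: " + costAtRetrieval(2_000, 0.5) + " s/day");
        System.out.println("Cheaper at:   " + cheaperPoint(100_000, 2_000, 0.5));
    }
}
```

With these assumed numbers, converting at retrieval time uses far less CPU time per day, at the price of a slower individual retrieval. Storage efficiency and any legal requirements about the stored format still need to be weighed separately.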

9.1.3 How to convert the data

Content Manager OnDemand uses the Generic Transform Interface to integrate Content Manager OnDemand with third-party transform solutions.

Consider the following information about target flows:

•HTML might be used to make documents viewable in a standard web browser, but an HTML document is not always displayed identically in every browser. Test the output against your requirements and the browsers in your environment before you implement it.

•PDF might be used as a way to make documents available through standard and no-charge tools, such as Adobe Acrobat Reader. The transformed documents must be displayable, saveable, and printable the same way regardless of the environment on which the user works.

•XML is an intermediate text-based data format for the manipulation of documents, regardless of the source data stream, and displays the documents totally or partially in a personalized way. The use of XML usually involves additional development, including scripts and stylesheets.

9.2 Generic Transform Interface

Content Manager OnDemand uses the Generic Transform Interface to manage third-party data transforms for the Content Manager OnDemand Web Enablement Kit (ODWEK) application programming interface (API) set. This interface is used with the document retrieval APIs.

The ODWEK Java API provides industry-standard Java classes that can be used by a customer to write a custom web application that can access data that is stored on the Content Manager OnDemand server. This custom application can, for example, permit the user to log on to a Content Manager OnDemand server, get a list of folders, search a specific folder, generate a hit list of matching documents, and retrieve those documents for viewing. Many APIs provide advanced functionalities.

For more information, see the following resources:

•IBM TechDoc Best practices for building Web Applications using IBM Content Manager OnDemand Java APIs:

https://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101203

This document, which is prepared by the Content Manager OnDemand development team, provides recommendations about how to use the ODWEK Java APIs. Use this document to understand how the ODWEK Java APIs interface with the Java virtual machine (JVM) and Content Manager OnDemand systems to avoid common coding mistakes.

•IBM Content Manager OnDemand Web Enablement Kit Java APIs: The Basics and Beyond, SG24-7646:

This publication provides basic and advanced information about how to use the ODWEK Java APIs to develop custom applications.

9.2.1 Overview

Before version 8.5.0.0, the ODWEK Java APIs provided a tight integration with only a few specific transforms: AFP2PDF, AFP2HTML, and AFP2XML. These transform engines were used by ODWEK clients to generate different document types for display purposes. Although this capability provided invaluable functionality, it meant that new transform engines were not readily integrated into ODWEK.

To meet this requirement, a highly flexible interface was added to the ODWEK Java APIs that allows a developer to easily implement a third-party document transform solution.

The new ODWEK Interface allows a client developer to implement an external program to transform a document in one of two ways:

•If the transform vendor provides a basic command-line executable file, it is implemented in an XML interface, which supports the retrieval of all of the document details that are stored in Content Manager OnDemand, and also allows specific options to be passed to the transform.

•The ODWEK Java APIs also provide a Java interface that a client developer can use to add even more flexibility to their client solution. The Java interface allows a client developer to get the document byte stream from ODWEK, then use any methods that they want to convert the document. These methods can include calls to web services that allow remote transformation. After the document is transformed, the resulting data can be returned to ODWEK, where it is passed back to the client that made the request.

9.2.2 Configuration

To enable the Generic Transform Interface in ODWEK, an XML document must be created and defined in the ODConfig.Properties object. This XML document is identified by the <ODConfig.TransformXML> key name and must include the fully qualified path to the XML file where the transforms are defined.

After you create the XML configuration for the Generic Transform Interface, as described in 9.2.3, “Basic implementation: Executable interface”, you can enable this functionality in your ODWEK environment, as shown in Example 9-1.

Example 9-1 Enabling Generic Transform Interface in the ODWEK environment

Properties props = new Properties();
// Fully qualified path to the XML file that contains the transform details
props.setProperty(ODConfig.TRANSFORMS_XML, "transform.xml");

ODConfig odConfig = new ODConfig(ODConstant.PLUGIN, // AfpViewer
        ODConstant.APPLET,  // LineViewer
        null,               // MetaViewer
        10,                 // MaxHits
        "",                 // AppletDir
        "ENU",              // Language
        null,               // TempDir
        "c:\\tracedir",     // TraceDir
        4,                  // TraceLevel
        props);             // Additional properties

9.2.3 Basic implementation: Executable interface

The basic implementation of the Generic Transform Interface involves an XML configuration to define a transform to ODWEK that uses the command-line (cmdline) executable functionality. With this configuration, you can request details that Content Manager OnDemand stored for the document to be passed in the specified cmdline options and to also pass through transform-specific options, as specified in the ODTransform.xml file.

Example 9-2 shows a sample of the ODTransform.xml file that can be used in this implementation.

Example 9-2 ODTransform.xml sample

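<transform>
    <TransformName>MyTXFRM_EXE</TransformName>
    <TransformDescription>Transform Cmdline Executable</TransformDescription>
    <OutputMimeType>application/pdf</OutputMimeType>
    <CmdLineExe>c://opt//txfrm.exe</CmdLineExe>
    <CmdParms>
        <RECORDLENGTH>-lm</RECORDLENGTH>
        <CARRIAGECONTROL>-x</CARRIAGECONTROL>
        <CODEPAGE>-a</CODEPAGE>
        <OUTPUTFILE>-o</OUTPUTFILE>
    </CmdParms>
    <Passthru>
        <Cmdlineparm>-r PDF</Cmdlineparm>
    </Passthru>
</transform>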

In this example, you can see that we defined a transform that is named MyTXFRM_EXE, which calls the transform command txfrm.exe, which is defined in the <CmdLineExe> tag.

The <TransformName> is used as the viewer name when you call the ODWEK Retrieve APIs. From this configuration, ODWEK knows that the transform requires RECORDLENGTH, CARRIAGECONTROL, CODEPAGE, and OUTPUTFILE information from Content Manager OnDemand, and it sets each value on the cmdline by using the option that is specified in the related XML tag.

Also, the txfrm.exe requires additional information to be passed on the cmdline. The -r that is specified in the <Cmdlineparm> tag has no meaning to Content Manager OnDemand, so it is passed through and set on the cmdline call to the txfrm.exe.
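The mapping that is described above, where each XML tag names a value from Content Manager OnDemand and its text supplies the command-line option, can be illustrated with a small stand-alone sketch. This sketch is not ODWEK code. The class name, the method name, the sample document values (133, A, 500, out.pdf), and the choice to skip tags that have no stored value are all assumptions that are made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TransformCommandSketch {

    /**
     * Assembles a command line: executable, then one "option value" pair per
     * requested tag (in the order the tags were listed), then pass-through options.
     */
    static List<String> buildCommand(String exe,
                                     Map<String, String> cmdParms,
                                     Map<String, String> documentValues,
                                     List<String> passthru) {
        List<String> command = new ArrayList<>();
        command.add(exe);
        for (Map.Entry<String, String> parm : cmdParms.entrySet()) {
            String value = documentValues.get(parm.getKey());
            if (value != null) {              // assumption: skip tags with no stored value
                command.add(parm.getValue()); // option, for example -lm
                command.add(value);           // value stored for the document
            }
        }
        command.addAll(passthru);             // options that have no meaning to OnDemand
        return command;
    }

    public static void main(String[] args) {
        // Tag-to-option mapping, as described for the sample transform
        Map<String, String> cmdParms = new LinkedHashMap<>();
        cmdParms.put("RECORDLENGTH", "-lm");
        cmdParms.put("CARRIAGECONTROL", "-x");
        cmdParms.put("CODEPAGE", "-a");
        cmdParms.put("OUTPUTFILE", "-o");

        // Hypothetical values that the server might hold for one document
        Map<String, String> documentValues = new LinkedHashMap<>();
        documentValues.put("RECORDLENGTH", "133");
        documentValues.put("CARRIAGECONTROL", "A");
        documentValues.put("CODEPAGE", "500");
        documentValues.put("OUTPUTFILE", "out.pdf");

        List<String> command = buildCommand("c:/opt/txfrm.exe", cmdParms,
                documentValues, Arrays.asList("-r", "PDF"));
        System.out.println(String.join(" ", command));
    }
}
```

Keeping the tag-to-option mapping in an ordered map preserves the order in which the options appear on the command line, which matters for transforms that parse their arguments positionally.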

In the custom Java code, the call to retrieve the data from ODWEK passes the <TransformName> that is specified in the XML and looks like the following line:

byte[] transformedDocument = ODHit.retrieve("MyTXFRM_EXE");

From this example definition, ODWEK calls the specified transform with the following cmdline executable file. Details for the items within “< >” are provided by ODWEK from the Content Manager OnDemand data definitions:

"c:/opt/txfrm.exe -lm <record len> -x <carriage control> -a <codepage> -o <output file name> -r PDF"

9.2.4 V9.5 enhancement: Customizing values that are returned from ODWEK

For certain transforms, values that are returned from ODWEK might not be consistent with the command-line values that are expected by the transform. For example, a transform might have a fixed set of options to specify a carriage control type. The values that are returned by ODWEK when the <CARRIAGECONTROL> tag is included in the <CmdParms> are 'A' (ANSI), 'M' (Machine), and 'N' (None). The following command is produced by the XML in Example 9-2:

c:/opt/txfrm.exe -lm 133 -x A -a 500 -o <outputfilename> -r PDF <datafilename>

Because the <CARRIAGECONTROL> tag is present, ODWEK returns the document’s corresponding value ("-x A", "-x M", or "-x N"), depending on the carriage control type (CC Type) that is defined in this document’s application definition. If the transform defines a different set of acceptable values, for example 2, 4, and 0, to specify the document’s carriage control, you can map those values by substituting the XML that is shown in Figure 9-1.

Figure 9-1 Sample XML with custom options

Note: The <CARRIAGECONTROL> node was replaced by three entries, one for each CC Type. When the CC Type that is returned by ODWEK is ANSI, the command includes "-x 2" instead of "-x A".

This type of substitution can be used to specify the RECFM (Record Format), PRMode, TRC, and CC Type.
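The substitution logic for the carriage control case can be sketched in a few lines of Java. This sketch is an illustration and not ODWEK code. The tag names are taken from Table 9-2, the class and method names are hypothetical, and the assignment of 4 to Machine and 0 to None is an assumption (the text confirms only that ANSI maps to "-x 2").

```java
import java.util.HashMap;
import java.util.Map;

final class CarriageControlMappingSketch {

    /** Maps the CC Type code ('A', 'M', or 'N') to the name of its substitute tag. */
    static String tagFor(char ccType) {
        switch (ccType) {
            case 'A': return "DOCUMENT_CC_ANSI";
            case 'M': return "DOCUMENT_CC_MACHINE";
            case 'N': return "DOCUMENT_CC_NONE";
            default:  throw new IllegalArgumentException("Unknown CC Type: " + ccType);
        }
    }

    /**
     * Returns the custom option and value if one is configured for this CC Type;
     * otherwise falls back to the default form, "-x" followed by the code.
     */
    static String optionFor(char ccType, Map<String, String> customOptions) {
        String custom = customOptions.get(tagFor(ccType));
        return custom != null ? custom : "-x " + ccType;
    }

    public static void main(String[] args) {
        Map<String, String> customOptions = new HashMap<>();
        customOptions.put("DOCUMENT_CC_ANSI", "-x 2");
        customOptions.put("DOCUMENT_CC_MACHINE", "-x 4"); // assumed value
        customOptions.put("DOCUMENT_CC_NONE", "-x 0");    // assumed value

        System.out.println(optionFor('A', customOptions));   // custom mapping applies
        System.out.println(optionFor('A', new HashMap<>())); // default form
    }
}
```

The same lookup pattern would apply to the RECFM, PRMode, and TRC substitutions, each with its own set of tag names.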

9.2.5 V9.5 enhancement: Application Group and Application-specific XML

As of version 9.5.0.2, ODWEK provides additional options under the <transform> node that allow the transform command parameters to be generated based on an Application Group, or an Application Group and Application pair.

Figure 9-2 shows a sample transform.xml that can be used in this implementation.

Figure 9-2 Sample XML with <ApplicationGroup><Application> tags

Figure 9-3 shows the transform commands that are generated based on the sample XML and Application Group and Application of the document that is retrieved.

Figure 9-3 Table of generated commands

Note: Inheritance is not supported. If an <ApplicationGroup> node is matched, only those options within that node are used for the transform; no parameters that are identified for a parent transform node are used. Similarly, if an <Application> node is matched within an <ApplicationGroup> node, only those options are used for the transform; nothing from the <ApplicationGroup> node is used.
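The no-inheritance rule in the note amounts to a "most specific node wins, with no merging" lookup. The following sketch models that rule with plain Java collections. It is an illustration only: the GroupNode class, the method name, and the sample parameter lists are hypothetical and do not reflect how ODWEK stores the parsed XML.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TransformNodeSelectionSketch {

    /** Hypothetical in-memory model of one ApplicationGroup node. */
    static final class GroupNode {
        final List<String> parms;
        final Map<String, List<String>> applicationParms = new HashMap<>();

        GroupNode(List<String> parms) {
            this.parms = parms;
        }
    }

    /**
     * Selects the parameters of the most specific matching node.
     * Nothing is merged from a less specific level.
     */
    static List<String> selectParms(List<String> transformParms,
                                    Map<String, GroupNode> groups,
                                    String agName, String applName) {
        GroupNode group = groups.get(agName);
        if (group == null) {
            return transformParms;            // no group match: transform-level options
        }
        List<String> appl = group.applicationParms.get(applName);
        return appl != null ? appl : group.parms; // application match beats group match
    }

    public static void main(String[] args) {
        List<String> transformParms = Arrays.asList("-a", "500", "-r", "PDF");
        GroupNode statements = new GroupNode(Arrays.asList("-r", "PDF"));
        statements.applicationParms.put("APPL1", Arrays.asList("-x", "2"));

        Map<String, GroupNode> groups = new HashMap<>();
        groups.put("AG1", statements);

        System.out.println(selectParms(transformParms, groups, "AG1", "APPL1"));
        System.out.println(selectParms(transformParms, groups, "AG1", "OTHER"));
        System.out.println(selectParms(transformParms, groups, "AG2", "APPL1"));
    }
}
```

Because nothing is inherited, every <ApplicationGroup> or <Application> node must repeat any option that the transform still needs, including options that are already defined at the parent level.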

9.2.6 Advanced implementation: Custom Java interface

By using the advanced implementation of the Generic Transform Interface, client developers can write a Java interface to ODWEK that can handle the transform requests in a programmatic way, offering the most application flexibility. Developers can create a class and implement the transformData() method to accept document data and details from Content Manager OnDemand and transform the data in any way they choose.

Example 9-3 shows a sample of the ODTransform.xml file that can be used in this implementation.

Example 9-3 Sample ODTransform.xml file

<Transforms>
    <transform>
        <TransformName>MYTXFRM</TransformName>
        <TransformDescription>GENERIC Transform Engine.</TransformDescription>
        <ClientClass>com.companyA.corp.TransformClient</ClientClass>
        <OutputMimeType>application/pdf</OutputMimeType>
        <CmdParms>
            <AG_NAME>agName</AG_NAME>
            <APPL_NAME>applName</APPL_NAME>
            <RECORDFORMAT>recfmt</RECORDFORMAT>
            <RECORDLENGTH>LineLength</RECORDLENGTH>
            <CODEPAGE>CodePage</CODEPAGE>
        </CmdParms>
    </transform>
</Transforms>

Similar to the basic implementation, the developer uses this XML stanza to set up the required details for document transformation and how those details are passed to the Java transform interface. Example 9-4 shows an example of how the Java interface can be used with the XML stanza to create a document transform request. The example is a code snippet of how the Client Class that is defined in Example 9-3 might be written to transform data.

Example 9-4 Client Class code snippet for transform data

//*******************************************************************
// Testcase: CustomTransform
// This class tests the ODWEK Generic Transform's Custom
// Java Interface by implementing the required transformData method.
// transformData is called by ODWEK when its corresponding custom
// viewer is called via ODHit.retrieve.
//*******************************************************************
import java.util.*;
import com.ibm.edms.od.*;

public class CustomTransform {

    public static HashMap transformData(HashMap odMap) throws Exception {
        System.out.println("Inside transformData method");

        // List this transform name from the XML file
        System.out.println(" Transform name: " +
            (String)odMap.get(ODTransform.TXFRM_REQ_NAME));

        // List the property keys and values ODWEK read from the transform XML
        // file and provided to this Custom Class
        System.out.println(" Transform properties:");
        Properties gtProps = (Properties)odMap.get(ODTransform.TXFRM_REQ_PROPS);
        Enumeration<?> enumeration = gtProps.keys();
        List<String> list = new ArrayList<String>();
        while (enumeration.hasMoreElements()) {
            list.add((String)enumeration.nextElement());
        }
        Collections.sort(list);
        for (String key : list)
            System.out.println(String.format("%25s = %-25s", key,
                gtProps.getProperty(key)));

        // Retrieve the native document from ODWEK
        byte[] inDoc = (byte[])odMap.get(ODTransform.TXFRM_REQ_DATA);
        System.out.println(" Native document size: " +
            (inDoc == null ? null : inDoc.length));

        // Fail with a clear message instead of a NullPointerException later
        if (inDoc == null)
            throw new IllegalArgumentException("No document data in transform request");

        // Retrieve the document resources from ODWEK
        byte[] inRes = (byte[])odMap.get(ODTransform.TXFRM_REQ_RES);
        System.out.println(" Native doc resource size: " +
            (inRes == null ? null : inRes.length));

        // Normally this is where you do the transform or do something with the
        // byte data.
        // Let's just concat the resources if there are any to the doc
        byte[] transformedDoc;
        if (inRes != null) {
            transformedDoc = new byte[inRes.length + inDoc.length];
            System.arraycopy(inRes, 0, transformedDoc, 0, inRes.length);
            System.arraycopy(inDoc, 0, transformedDoc, inRes.length,
                inDoc.length);
        }
        else
            transformedDoc = inDoc;
        System.out.println(" Concatenated resources to doc size: " +
            transformedDoc.length);

        // Send the transformed data back to ODWEK
        HashMap rtnMap = new HashMap();
        rtnMap.put(ODTransform.TXFRM_RESP_DATA, transformedDoc);
        return rtnMap;
    }
}

Example 9-4 shows how to set up the HashMap to pass document byte arrays in and out of this custom interface, and how to define a custom Java class that contains the transformData() method.

This code retrieves the raw document data from ODWEK, gathers all of the document details that Content Manager OnDemand might store from loading the data, and then transforms the document data. The transformed document data can be passed back through ODWEK to the original client request.
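Because the contract is only "request map in, response map out", the transform logic can be exercised outside ODWEK before it is deployed. The following sketch shows one way to do that. It is not the ODWEK API: the key strings are stand-ins for the ODTransform.TXFRM_* constants (whose actual values are not shown in this chapter), the class name and the toy upper-casing transform are hypothetical, and a real client class must keep the transformData(HashMap) signature that ODWEK calls.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

final class TransformContractSketch {

    // Stand-in keys (assumption): real code uses the ODTransform.TXFRM_* constants.
    static final String REQ_NAME = "TXFRM_REQ_NAME";
    static final String REQ_PROPS = "TXFRM_REQ_PROPS";
    static final String REQ_DATA = "TXFRM_REQ_DATA";
    static final String RESP_DATA = "TXFRM_RESP_DATA";

    /** Toy transform: upper-cases the document text and returns it under the response key. */
    static Map<String, Object> transformData(Map<String, Object> request) {
        byte[] in = (byte[]) request.get(REQ_DATA);
        if (in == null) {
            throw new IllegalArgumentException("Request contains no document data");
        }
        String text = new String(in, StandardCharsets.UTF_8).toUpperCase(Locale.ROOT);
        Map<String, Object> response = new HashMap<>();
        response.put(RESP_DATA, text.getBytes(StandardCharsets.UTF_8));
        return response;
    }

    /** Builds a request map that is shaped like the one a transform class receives. */
    static Map<String, Object> sampleRequest(String documentText) {
        Properties props = new Properties();
        props.setProperty("CodePage", "500");     // hypothetical document detail
        Map<String, Object> request = new HashMap<>();
        request.put(REQ_NAME, "MYTXFRM");
        request.put(REQ_PROPS, props);
        request.put(REQ_DATA, documentText.getBytes(StandardCharsets.UTF_8));
        return request;
    }

    public static void main(String[] args) {
        Map<String, Object> response = transformData(sampleRequest("hello"));
        byte[] out = (byte[]) response.get(RESP_DATA);
        System.out.println(new String(out, StandardCharsets.UTF_8));
    }
}
```

Keeping the conversion logic in a method that depends only on maps and byte arrays makes it straightforward to cover with ordinary unit tests; the ODWEK-facing class can then delegate to it.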

Table 9-1 lists the XmlTagNames for the transformation specification.

Table 9-1 XmlTagNames for the transform specification

XmlTagname	ODConstant	Description
TransformName	TransFormName	Name of the transform. It is used as the viewer argument that is passed to ODWEK Retrieve APIs.
TransformDescription	TRANSFORM_DESC	Description of the transform.
ClientClass	TRANSFORM_CLIENTCLASS	The class name of the custom interface class.
CmdLineExe	TRANSFORM_CMDLINEEXE	Fully qualified name of the transform executable file.
OutputMimeType	TRANSFORM_MIMETYPE	The MIME type of the data as it is returned from the transform.
OutputExtension	TRANSFORM_OUTPUTEXT	The extension of the data that is returned from the transform.
CmdParms	TRANSFORM_PARMS	The mappings of Content Manager OnDemand values to custom variables. See the constant keywords that are shown in Table 9-2.
Passthru	TRANSFORM_PASSTHRU	These values are passed through ODWEK directly to the transform.
Cmdlineparm	TRANSFORM_PASSTHRU_CMDLINE	These values are passed through ODWEK directly to the transform command line.

Table 9-2 describes the XML tags that can be specified within <CmdParms>. Each tag requests a specific value from Content Manager OnDemand and maps it to the command-line option with which that value is passed to the transform.

Table 9-2 XmlTags detailed information

XmlTagname	ODConstant	Description
RECORDFORMAT	DOCUMENT_RECORD_FORMAT	The record format of the document as stored in Content Manager OnDemand.
RECORDLENGTH	DOCUMENT_RECORD_LENGTH	The record length of the document as stored in Content Manager OnDemand.
CARRIAGECONTROL	DOCUMENT_CARRIAGE_CONTROL	The carriage control of the document as stored in Content Manager OnDemand.
TRC_EXIST	DOCUMENT_TRC_EXIST	The TRC settings as stored in Content Manager OnDemand.
DOCROTATION	DOCUMENT_ROTATION	The rotation of the document as stored in Content Manager OnDemand.
AG_NAME	AGNAME	The Content Manager OnDemand application group where the document is stored.
APPL_NAME	APPLNAME	The Content Manager OnDemand application where the document is stored.
CODEPAGE	DOCUMENT_CODEPAGE	The code page of the document as stored in Content Manager OnDemand.
LINEDELIMITER	DOCUMENT_LINE_DELIMITER	The line delimiter of the document as stored in Content Manager OnDemand.
INPUTFILE	TXFRM_INPUT_FILE	The Inputfile parameter to be used by the transform.
OUTPUTFILE	TXFRM_OUTPUT_FILE	The OutputFile parameter that is used by the transform.
V9.5 enhancements
DOCUMENT_CC_ANSI	DOCUMENT_CC_ANSI	Used instead of <CARRIAGECONTROL> to define the command-line option and value when the document’s CC Type is “ANSI” as stored in Content Manager OnDemand.
DOCUMENT_CC_MACHINE	DOCUMENT_CC_MACHINE	Used instead of <CARRIAGECONTROL> to define the command-line option and value when the document’s CC Type is “Machine” as stored in Content Manager OnDemand.
DOCUMENT_CC_NONE	DOCUMENT_CC_NONE	Used instead of <CARRIAGECONTROL> to define the command-line option and value when the document’s CC Type is “None” as stored in Content Manager OnDemand.
RECORDFORMATFIXED	DOCUMENT_RECORDFORMAT_FIXED	Used instead of <RECORDFORMAT> to define the command-line option and value when the document’s RECFM is “Fixed” as stored in Content Manager OnDemand.
RECORDFORMATVARIABLE	DOCUMENT_RECORDFORMAT_VARIABLE	Used instead of <RECORDFORMAT> to define the command-line option and value when the document’s RECFM is “Variable” as stored in Content Manager OnDemand.
RECORDFORMATSTREAM	DOCUMENT_RECORDFORMAT_STREAM	Used instead of <RECORDFORMAT> to define the command-line option and value when the document’s RECFM is “Stream” as stored in Content Manager OnDemand.
PRMODENONE	DOCUMENT_PRMODENONE	Used instead of <PRMODE> to define the command-line option and value when the document’s PRMode is “None” as stored in Content Manager OnDemand.
PRMODESOSI1	DOCUMENT_PRMODESOSI1	Used instead of <PRMODE> to define the command-line option and value when the document’s PRMode is “SOSI1” as stored in Content Manager OnDemand.
PRMODESOSI2	DOCUMENT_PRMODESOSI2	Used instead of <PRMODE> to define the command-line option and value when the document’s PRMode is “SOSI2” as stored in Content Manager OnDemand.
TRC_YES	DOCUMENT_TRCYES	Used instead of <TRC_EXIST> to define the command-line option and value when the document’s TRC is “Yes” as stored in Content Manager OnDemand.
TRC_NO	DOCUMENT_TRCNO	Used instead of <TRC_EXIST> to define the command-line option and value when the document’s TRC is “No” as stored in Content Manager OnDemand.

Table 9-3 provides information about the OnDemand client HashMap keys that are used for advanced Java implementation.

Table 9-3 OnDemand client hashmap key and descriptions

HashMap key	Description
TXFRM_RESP_DATA	This key is the HashMap key for the transformed data byte[] to be returned to ODWEK.
TXFRM_REQ_NAME	Name of transform for this request.
TXFRM_REQ_METHOD	The method name that is used in the custom Java class. The transformData() method must exist in the client class.
TXFRM_REQ_DATA	The original Content Manager OnDemand Document data that is contained in this request.
TXFRM_REQ_PROPS	The document details as specified or requested in the transform.xml file.
