Data conversion
In this chapter, we provide information about data conversion for IBM Content Manager OnDemand (Content Manager OnDemand). We describe the reasons for data conversion and describe the interface that Content Manager OnDemand uses to convert data.
In this chapter, we cover the following topics:
9.1 Overview of data conversion
To work with data conversion, understand the data conversions that are required, and when and how to convert the data. Perform detailed planning before you build your solution so that you achieve a design that remains efficient for many years.
In this section, we describe why you might need data conversion, when to convert the data stream, and how to convert the data.
9.1.1 Why convert data streams
You might want to convert data streams for many reasons:
Certain data streams, such as Hewlett-Packard (HP) Printer Command Language (PCL) or Xerox metacode, are printer-specific and cannot be displayed. Before you archive or display the documents, these data streams must be transformed into a compatible format.
The archived data stream might need to comply with a company’s internal rules or regulations. Therefore, the produced data streams must be transformed into the defined and required final format before they are archived.
The documents might need to be accessible by a user that is outside of the company. The document must be displayed through standard tools that are available on any or at least most of the clients, such as an Internet browser or Adobe Acrobat Reader.
The documents might need to be manipulated so that only part of the document is displayed in a personalized way.
9.1.2 When to convert data streams
The decision of when to convert data streams relies mainly on the use of the system. Typically, converting data at load time requires more time to process the print stream file, and converting data at retrieval time causes the user retrieval to be a little slower. The decision might depend on how many documents are retrieved, compared to how many documents are loaded daily. It might also depend on legal requirements about the format of stored data.
AFP to PDF
If a requirement exists to present Advanced Function Presentation (AFP) documents in Portable Document Format (PDF) over the web, it is more efficient from a storage perspective to store the documents in their native format and convert them to PDF at retrieval time. AFP documents are stored more efficiently than PDF documents.
The PDF print stream, when it is divided into separate customer statements, is larger than AFP because each statement contains its own set of structures that are required by the PDF architecture to define a document.
Elapsed time and processor time are also essential factors in the decision-making process. The amount of time (elapsed and CPU) that is needed to convert the document depends on how large the document is and how many resources or fonts are associated with the document.
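The trade-off between converting at load time and at retrieval time can be estimated with simple arithmetic. The following Java sketch is illustrative only: the class name, the daily volumes, and the per-document conversion time are hypothetical figures, not measurements from any Content Manager OnDemand system.

```java
// Sketch only: every figure below is a hypothetical assumption.
class ConversionCostSketch {

    /** Total conversion time per day, in seconds, for a given document count. */
    static double dailySeconds(long documentsPerDay, double secondsPerDocument) {
        return documentsPerDay * secondsPerDocument;
    }

    /** Returns "load" or "retrieval", whichever point converts fewer documents per day. */
    static String cheaperPoint(long loadedPerDay, long retrievedPerDay) {
        return loadedPerDay <= retrievedPerDay ? "load" : "retrieval";
    }

    public static void main(String[] args) {
        long loadedPerDay = 100_000;      // assumed daily load volume
        long retrievedPerDay = 2_000;     // assumed daily retrieval volume
        double secondsPerDocument = 0.5;  // assumed average conversion time

        System.out.println("Convert at load time:      "
                + dailySeconds(loadedPerDay, secondsPerDocument) + " s/day");
        System.out.println("Convert at retrieval time: "
                + dailySeconds(retrievedPerDay, secondsPerDocument) + " s/day");
        System.out.println("Fewer conversions at "
                + cheaperPoint(loadedPerDay, retrievedPerDay) + " time");
    }
}
```

A count of conversions is only one input. Conversion at retrieval time adds its cost to each user request, and storage size or legal requirements about the stored format can outweigh the processing cost.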
9.1.3 How to convert the data
Content Manager OnDemand uses the Generic Transform Interface to integrate Content Manager OnDemand with third-party transform solutions.
Consider the following information about target formats:
PDF might be used as a way to make documents available through standard and no-charge tools, such as Adobe Acrobat Reader. The transformed documents must be displayable, saveable, and printable the same way regardless of the environment in which the user works.
HTML might be used with the same intent, but an HTML document is not always displayed identically across web browsers. Test with the browsers and environments that your users actually have before you implement the solution.
XML is an intermediate text-based data format that is used to manipulate documents, regardless of the source data stream, and to display them in whole or in part in a personalized way. The use of XML usually involves additional development, including scripts and stylesheets.
9.2 Generic Transform Interface
Content Manager OnDemand uses the Generic Transform Interface to manage third-party data transforms for the Content Manager OnDemand Web Enablement Kit (ODWEK) application programming interface (API) set. This interface is used with the document retrieval APIs.
The ODWEK Java API provides industry-standard Java classes that can be used by a customer to write a custom web application that can access data that is stored on the Content Manager OnDemand server. This custom application can, for example, permit the user to log on to a Content Manager OnDemand server, get a list of folders, search a specific folder, generate a hit list of matching documents, and retrieve those documents for viewing. Many APIs provide advanced functionalities.
For more information, see the following resources:
IBM TechDoc Best practices for building Web Applications using IBM Content Manager OnDemand Java APIs:
This document, which is prepared by the Content Manager OnDemand development team, provides recommendations about how to use the ODWEK Java APIs. Use this document to understand how the ODWEK Java APIs interface with the Java virtual machine (JVM) and Content Manager OnDemand systems to avoid common coding mistakes.
IBM Content Manager OnDemand Web Enablement Kit Java APIs: The Basics and Beyond, SG24-7646:
This publication provides basic and advanced information about how to use the ODWEK Java APIs to develop custom applications.
9.2.1 Overview
Before version 8.5.0.0, the ODWEK Java APIs provided a tight integration with only a few specific transforms: AFP2PDF, AFP2HTML, and AFP2XML. These transform engines were used by ODWEK clients to generate different document types for display purposes. Although this capability provided invaluable functionality, it meant that new transform engines were not readily integrated into ODWEK.
To remove this limitation, a highly flexible interface was added to the ODWEK Java APIs so that a developer can easily implement a third-party document transform solution.
The new ODWEK interface allows a client developer to implement an external program to transform a document in one of two ways:
If the transform vendor provides a basic command-line executable file, it is defined to ODWEK through an XML interface, which supports the retrieval of all of the document details that are stored in Content Manager OnDemand and also allows specific options to be passed to the transform.
The ODWEK Java APIs also provide a Java interface that a client developer can use to add even more flexibility to their client solution. The Java interface allows a client developer to get the document byte stream from ODWEK, then use any methods that they want to convert the document. These methods can include calls to web services that allow remote transformation. After the document is transformed, the resulting data can be returned to ODWEK, where it is passed back to the client that made the request.
9.2.2 Configuration
To enable the Generic Transform Interface in ODWEK, an XML document must be created and defined in the ODConfig.Properties object. This XML document is identified by the <ODConfig.TransformXML> key name and must include the fully qualified path to the XML file where the transforms are defined.
After you configure your XML configuration for the Generic Transform Interface, as described in 9.2.3, “Basic implementation: Executable interface” on page 211, you can enable this functionality in your ODWEK environment, as shown in Example 9-1.
Example 9-1 Enabling Generic Transform Interface in the ODWEK environment
Properties props = new Properties();
props.setProperty(ODConfig.TRANSFORMS_XML, "transform.xml"); /*Fully qualified path to XML file containing transform details.*/
ODConfig odConfig = new ODConfig(ODConstant.PLUGIN, //AfpViewer
ODConstant.APPLET, //LineViewer
null, //MetaViewer
10, //MaxHits
"", //AppletDir
"ENU", //Language
null, //TempDir
"c:\\tracedir", //TraceDir
4, //TraceLevel
props); //Additional properties
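Because the property value must be a fully qualified path, it can help to resolve the file name before it is stored in the properties. The following sketch uses only the Java standard library. The class name and the property key string are placeholders for illustration; real code would use the ODConfig constant that is shown in Example 9-1.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

class TransformXmlPathSketch {
    // Placeholder key for illustration only; it is not an ODWEK constant.
    static final String HYPOTHETICAL_KEY = "transforms.xml.path";

    /** Stores the absolute, normalized form of the given path in a new Properties object. */
    static Properties withTransformXml(String xmlPath) {
        Path absolute = Paths.get(xmlPath).toAbsolutePath().normalize();
        Properties props = new Properties();
        props.setProperty(HYPOTHETICAL_KEY, absolute.toString());
        return props;
    }

    public static void main(String[] args) {
        // A relative name is resolved against the current working directory.
        Properties props = withTransformXml("transform.xml");
        System.out.println(props.getProperty(HYPOTHETICAL_KEY));
    }
}
```

Resolving the path in this way removes any dependency on the working directory of the application server at the time that ODWEK reads the file.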
9.2.3 Basic implementation: Executable interface
The basic implementation of the Generic Transform Interface uses an XML configuration to define a transform to ODWEK that runs as a command-line (cmdline) executable file. With this configuration, you can request that details that Content Manager OnDemand stored for the document are passed with the specified cmdline options, and you can pass through transform-specific options, as specified in the ODTransform.xml file.
Example 9-2 shows a sample of the ODTransform.xml file that can be used in this implementation.
Example 9-2 ODTransform.xml sample
<Transforms>
  <transform>
    <TransformName>MyTXFRM_EXE</TransformName>
    <TransformDescription>Transform Cmdline Executable</TransformDescription>
    <OutputMimeType>application/pdf</OutputMimeType>
    <OutputExtension>pdf</OutputExtension>
    <CmdParms>
      <RECORDLENGTH>-lm</RECORDLENGTH>
      <CARRIAGECONTROL>-x</CARRIAGECONTROL>
      <CODEPAGE>-a</CODEPAGE>
      <OUTPUTFILE>-o</OUTPUTFILE>
    </CmdParms>
    <CmdLineExe>c://opt//txfrm.exe</CmdLineExe>
    <Passthru>
      <!-- Use tag cmdlineparm to declare additional cmdline variables that the transform might require -->
      <Cmdlineparm>-r PDF</Cmdlineparm>
    </Passthru>
  </transform>
</Transforms>
In this example, you can see that we defined a transform that is named MyTXFRM_EXE, which calls the transform command txfrm.exe, which is defined in the <CmdLineExe> tag.
The <TransformName> value is used as the viewer name when you call the ODWEK Retrieve APIs. From this configuration, ODWEK knows that the transform requires RECORDLENGTH, CARRIAGECONTROL, CODEPAGE, and OUTPUTFILE information from Content Manager OnDemand, and can set it on the cmdline by using the options that are specified in each related XML tag.
Also, the txfrm.exe requires additional information to be passed on the cmdline. The -r that is specified in the <Cmdlineparm> tag has no meaning to Content Manager OnDemand, so it is passed through and set on the cmdline call to the txfrm.exe.
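The mapping from the XML tags to a command line can be illustrated with a small stand-alone program. The following sketch is not how ODWEK is implemented. It parses a trimmed copy of the sample XML with the standard Java DOM parser and assembles the tokens in the way that the text describes. The class name, the document values (133, A, 500), the output file name, and the choice to skip options that have no value are all assumptions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class CommandLineSketch {
    // Trimmed copy of the sample XML; the executable path uses single slashes.
    static final String SAMPLE_XML =
          "<Transforms><transform>"
        + "<TransformName>MyTXFRM_EXE</TransformName>"
        + "<CmdParms>"
        + "<RECORDLENGTH>-lm</RECORDLENGTH>"
        + "<CARRIAGECONTROL>-x</CARRIAGECONTROL>"
        + "<CODEPAGE>-a</CODEPAGE>"
        + "<OUTPUTFILE>-o</OUTPUTFILE>"
        + "</CmdParms>"
        + "<CmdLineExe>c:/opt/txfrm.exe</CmdLineExe>"
        + "<Passthru><Cmdlineparm>-r PDF</Cmdlineparm></Passthru>"
        + "</transform></Transforms>";

    /** Builds the command tokens for the first transform that is defined in the XML. */
    static List<String> build(String xml, Map<String, String> documentValues) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Transform XML cannot be parsed", e);
        }
        List<String> command = new ArrayList<>();
        command.add(text(doc, "CmdLineExe"));

        // Each child of <CmdParms> maps a document detail to a command-line option.
        NodeList parms = doc.getElementsByTagName("CmdParms").item(0).getChildNodes();
        for (int i = 0; i < parms.getLength(); i++) {
            Node parm = parms.item(i);
            if (parm.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            String value = documentValues.get(parm.getNodeName());
            if (value != null) { // assumption: skip options that have no value
                command.add(parm.getTextContent().trim());
                command.add(value);
            }
        }
        // Pass-through text is appended unchanged, split on whitespace.
        for (String token : text(doc, "Cmdlineparm").trim().split("\\s+")) {
            command.add(token);
        }
        return command;
    }

    static String text(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    /** Builds the command for the sample XML with hypothetical document values. */
    static List<String> sample() {
        Map<String, String> values = new HashMap<>();
        values.put("RECORDLENGTH", "133");
        values.put("CARRIAGECONTROL", "A");
        values.put("CODEPAGE", "500");
        values.put("OUTPUTFILE", "out.pdf");
        return build(SAMPLE_XML, values);
    }

    public static void main(String[] args) {
        // Prints: c:/opt/txfrm.exe -lm 133 -x A -a 500 -o out.pdf -r PDF
        System.out.println(String.join(" ", sample()));
    }
}
```

The options appear in the order in which their tags occur under <CmdParms>, and the pass-through options follow them.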
In the custom Java code, the call to retrieve the data from ODWEK includes the <TransformName> value that is specified in the XML and looks like the following line:
byte[] transformedDocument = ODHit.retrieve("MyTXFRM_EXE");
From this example definition, ODWEK calls the specified transform with the following command line. Details for the items within “< >” are provided by ODWEK from the Content Manager OnDemand data definitions:
c:/opt/txfrm.exe -lm <record len> -x <carriage control> -a <codepage> -o <output file name> -r PDF
9.2.4 V9.5 enhancement: Customizing values that are returned from ODWEK
For certain transforms, values that are returned from ODWEK might not be consistent with the command-line values that are expected by the transform. For example, a transform might have a fixed set of options to specify a carriage control type. The values that are returned by ODWEK when the <CARRIAGECONTROL> tag is included in the <CmdParms> are 'A' (ANSI), 'M' (Machine), and 'N' (None). The following command is produced by the XML in Example 9-2:
c:/opt/txfrm.exe -lm 133 -x A -a 500 -o <outputfilename> -r PDF <datafilename>
Because the <CARRIAGECONTROL> tag is present, ODWEK returns the document’s corresponding value - "-x A", or "-x M", or "-x N", depending on the carriage control type (CC Type) that is defined in this document’s application definition. If the transform defines a different set of acceptable values, for example 2, 4, and 0, to specify the document’s carriage control, you can map those values by substituting the following XML as shown in Figure 9-1.
Figure 9-1 Sample XML with custom options
Note: The <CARRIAGECONTROL> node was replaced by three values. When the CC Type that is returned by ODWEK matches ANSI, rather than an 'A', the command includes "-x 2".
This type of substitution can be used to specify the RECFM (Record Format), PRMode, TRC, and CC Type.
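The effect of the substitution can be sketched in plain Java. This code only models the behavior that is described above and is not part of ODWEK. The class and method names are hypothetical, and the pairing of Machine with 4 and None with 0 is an assumption that is taken from the order of the example values; only the ANSI mapping to "-x 2" is stated in the text.

```java
class CarriageControlMappingSketch {

    /** Default behavior: the option is followed by the CC Type letter (A, M, or N). */
    static String defaultOption(String option, char ccType) {
        return option + " " + ccType;
    }

    /** Substituted behavior: each CC Type carries its own complete option and value. */
    static String mappedOption(char ccType, String ansi, String machine, String none) {
        switch (ccType) {
            case 'A': return ansi;
            case 'M': return machine;
            case 'N': return none;
            default:
                throw new IllegalArgumentException("Unknown CC Type: " + ccType);
        }
    }

    public static void main(String[] args) {
        System.out.println(defaultOption("-x", 'A'));                   // -x A
        System.out.println(mappedOption('A', "-x 2", "-x 4", "-x 0"));  // -x 2
    }
}
```

The same pattern applies to the other substitutable details (RECFM, PRMode, and TRC): one tag per possible stored value, each of which carries the full option text that the transform expects.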
9.2.5 V9.5 enhancement: Application Group and Application-specific XML
In version 9.5.0.2, ODWEK now provides additional options under the <transform> node that allow the transform command parameters to be generated based on an Application Group, or an Application Group and Application pair.
Figure 9-2 on page 213 shows a sample transform.xml that can be used in this implementation.
Figure 9-2 Sample XML with <ApplicationGroup><Application> tags
Figure 9-3 shows the transform commands that are generated based on the sample XML and Application Group and Application of the document that is retrieved.
Figure 9-3 Table of generated commands
 
Note: Inheritance is not supported. If an <ApplicationGroup> node is matched, only those options within that node are used for the transform; no parameters that are identified for a parent transform node are used. Similarly, if an <Application> node is matched within an <ApplicationGroup> node, only those options are used for the transform; nothing from the <ApplicationGroup> node is used.
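The no-inheritance rule means that option sets are selected, never merged. The following sketch models that selection with ordinary Java collections. It is an illustration under assumptions, not ODWEK code: the class name, the group and application names (AG1, APP1), and the option values are hypothetical, and the fallback to the group-level options when no application matches is inferred from the note.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TransformScopeSketch {
    private final List<String> transformOptions;
    private final Map<String, List<String>> groupOptions = new HashMap<>();
    private final Map<String, Map<String, List<String>>> applicationOptions = new HashMap<>();

    TransformScopeSketch(List<String> transformOptions) {
        this.transformOptions = transformOptions;
    }

    void addGroup(String group, List<String> options) {
        groupOptions.put(group, options);
    }

    void addApplication(String group, String application, List<String> options) {
        applicationOptions.computeIfAbsent(group, g -> new HashMap<>())
                .put(application, options);
    }

    /** Returns exactly one option set: the most specific match, with no merging. */
    List<String> resolve(String group, String application) {
        Map<String, List<String>> apps = applicationOptions.get(group);
        if (apps != null && apps.containsKey(application)) {
            return apps.get(application);
        }
        if (groupOptions.containsKey(group)) {
            return groupOptions.get(group);
        }
        return transformOptions;
    }

    static TransformScopeSketch sample() {
        TransformScopeSketch sketch = new TransformScopeSketch(Arrays.asList("-x", "A"));
        sketch.addGroup("AG1", Arrays.asList("-x", "2"));
        sketch.addApplication("AG1", "APP1", Arrays.asList("-x", "4"));
        return sketch;
    }

    public static void main(String[] args) {
        TransformScopeSketch sketch = sample();
        System.out.println(sketch.resolve("AG1", "APP1")); // [-x, 4]
        System.out.println(sketch.resolve("AG1", "APP2")); // [-x, 2]
        System.out.println(sketch.resolve("AG2", "APP1")); // [-x, A]
    }
}
```

A practical consequence is that any option that every document needs must be repeated in each <ApplicationGroup> and <Application> node.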
9.2.6 Advanced implementation: Custom Java interface
By using the advanced implementation of the Generic Transform Interface, client developers can write a Java interface to ODWEK that can handle the transform requests in a programmatic way, offering the most application flexibility. Developers can create a class and implement the transformData() method to accept document data and details from Content Manager OnDemand and transform the data in any way they choose.
Example 9-3 shows a sample of the ODTransform.xml file that can be used in this implementation.
Example 9-3 Sample ODTransform.xml file
<Transforms>
<transform>
<TransformName>MYTXFRM</TransformName>
<TransformDescription>GENERIC Transform Engine.</TransformDescription>
<ClientClass>com.companyA.corp.TransformClient</ClientClass>
<OutputMimeType>application/pdf</OutputMimeType>
<OutputExtension>pdf</OutputExtension>
<CmdParms>
<AG_NAME>agName</AG_NAME>
<APPL_NAME>applName</APPL_NAME>
<RECORDFORMAT>recfmt</RECORDFORMAT>
<RECORDLENGTH>LineLength</RECORDLENGTH>
<CARRIAGECONTROL>CC</CARRIAGECONTROL>
<CODEPAGE>CodePage</CODEPAGE>
</CmdParms>
</transform>
</Transforms>
Similar to the basic implementation, the developer uses this XML stanza to set up the required details for document transformation and how those details are passed to the Java transform interface. Example 9-4 shows an example of how the Java interface can be used with the XML stanza to create a document transform request. The example is a code snippet of how the Client Class that is defined in Example 9-3 might be written to transform data.
Example 9-4 Client Class code snippet for transform data
//*******************************************************************
// Testcase: CustomTransform
//
// This class tests the ODWEK Generic Transform's Custom
// Java Interface by implementing the required transformData method.
//
// transformData is called by ODWEK when its corresponding custom
// viewer is called via ODHit.retrieve.
//*******************************************************************
import java.util.*;
import com.ibm.edms.od.*;

public class CustomTransform {
    public static HashMap transformData(HashMap odMap) throws Exception {
        System.out.println("Inside transformData method");
        // List this transform name from the XML file
        System.out.println(" Transform name: " +
            (String)odMap.get(ODTransform.TXFRM_REQ_NAME));

        // List the property keys and values ODWEK read from the transform XML
        // file and provided to this Custom Class
        System.out.println(" Transform properties:");
        Properties gtProps = (Properties)odMap.get(ODTransform.TXFRM_REQ_PROPS);
        Enumeration<?> enumeration = gtProps.keys();
        List<String> list = new ArrayList<String>();
        while (enumeration.hasMoreElements()) {
            list.add((String)enumeration.nextElement());
        }
        Collections.sort(list);
        for (String key : list)
            System.out.println(String.format("%25s = %-25s", key,
                gtProps.getProperty(key)));

        // Retrieve the native document from ODWEK
        byte[] inDoc = (byte [])odMap.get(ODTransform.TXFRM_REQ_DATA);
        System.out.println(" Native document size: " + (inDoc == null ? null:
            inDoc.length));

        // Retrieve the document resources from ODWEK
        byte[] inRes = (byte [])odMap.get(ODTransform.TXFRM_REQ_RES);
        System.out.println(" Native doc resource size: " + (inRes == null ? null:
            inRes.length));

        // Normally this is where you do the transform or do something with the
        // byte data.
        // Let's just concat the resources if there are any to the doc
        byte[] transformedDoc;
        if (inRes != null) {
            transformedDoc = new byte[inRes.length + inDoc.length];
            System.arraycopy(inRes, 0, transformedDoc, 0, inRes.length);
            System.arraycopy(inDoc, 0, transformedDoc, inRes.length,
                inDoc.length);
        }
        else
            transformedDoc = inDoc;
        System.out.println(" Concatenated resources to doc size: " +
            transformedDoc.length);

        // Send the transformed data back to ODWEK
        HashMap rtnMap = new HashMap();
        rtnMap.put(ODTransform.TXFRM_RESP_DATA, transformedDoc);
        return rtnMap;
    }
}
Example 9-4 on page 214 shows how to set up the HashMap to pass document byte arrays in and out of this custom interface, and how to define a custom Java class that contains the transformData() method.
This code retrieves the raw document data from ODWEK, gathers all of the document details that Content Manager OnDemand might store from loading the data, and then transforms the document data. The transformed document data can be passed back through ODWEK to the original client request.
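The request and response contract of transformData() can be exercised without an ODWEK installation by substituting plain string keys for the ODTransform constants. The following sketch does that and also rejects a request that carries no document data, a case that the sample class does not guard against. The class name and the three key strings are placeholders, not ODWEK values.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

class NullSafeTransformSketch {
    // Placeholder keys that stand in for the ODTransform.TXFRM_* constants.
    static final String REQ_DATA = "REQ_DATA";
    static final String REQ_RES = "REQ_RES";
    static final String RESP_DATA = "RESP_DATA";

    /** Prepends the resources (if any) to the document and returns the result map. */
    static Map<String, Object> transformData(Map<String, Object> request) {
        byte[] document = (byte[]) request.get(REQ_DATA);
        byte[] resources = (byte[]) request.get(REQ_RES);
        if (document == null) {
            throw new IllegalArgumentException("Request contains no document data");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (resources != null) {
            out.write(resources, 0, resources.length);
        }
        out.write(document, 0, document.length);

        Map<String, Object> response = new HashMap<>();
        response.put(RESP_DATA, out.toByteArray());
        return response;
    }

    public static void main(String[] args) {
        Map<String, Object> request = new HashMap<>();
        request.put(REQ_RES, new byte[] {1, 2});
        request.put(REQ_DATA, new byte[] {3, 4, 5});
        byte[] result = (byte[]) transformData(request).get(RESP_DATA);
        System.out.println("Transformed size: " + result.length); // Transformed size: 5
    }
}
```

In a real client class, the body between reading the request and writing the response is where the call to a conversion library or a remote web service would go.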
Table 9-1 lists the XmlTagNames for the transformation specification.
Table 9-1 XmlTagNames for the transform specification
| XmlTagname | ODConstant | Description |
|---|---|---|
| TransformName | TransFormName | Name of the transform. It is used as the viewer argument that is passed to ODWEK Retrieve APIs. |
| TransformDescription | TRANSFORM_DESC | Description of the transform. |
| ClientClass | TRANSFORM_CLIENTCLASS | The class name of the custom interface class. |
| CmdLineExe | TRANSFORM_CMDLINEEXE | Fully qualified name of the transform executable file. |
| OutputMimeType | TRANSFORM_MIMETYPE | The MIME type of the data as it is returned from the transform. |
| OutputExtension | TRANSFORM_OUTPUTEXT | The extension of the data that is returned from the transform. |
| CmdParms | TRANSFORM_PARMS | The mappings of OD Values to custom variables. See the constant key words that are shown in Table 9-2. |
| Passthru | TRANSFORM_PASSTHRU | These values are passed through ODWEK directly to the transform. |
| Cmdlineparm | TRANSFORM_PASSTHRU_CMDLINE | These values are passed through ODWEK directly to the transform command line. |
Table 9-2 provides information about the XMLTags. These XML tags are used to pass specific values to the transform command line. These XML tags allow the mapping of the command-line option where the specified value can be passed.
Table 9-2 XmlTags detailed information
| XmlTagname | ODConstant | Description |
|---|---|---|
| RECORDFORMAT | DOCUMENT_RECORD_FORMAT | The record format of the document as stored in Content Manager OnDemand. |
| RECORDLENGTH | DOCUMENT_RECORD_LENGTH | The record length of the document as stored in Content Manager OnDemand. |
| CARRIAGECONTROL | DOCUMENT_CARRIAGE_CONTROL | The carriage control of the document as stored in Content Manager OnDemand. |
| TRC_EXIST | DOCUMENT_TRC_EXIST | The TRC settings as stored in Content Manager OnDemand. |
| DOCROTATION | DOCUMENT_ROTATION | The rotation of the document as stored in Content Manager OnDemand. |
| AG_NAME | AGNAME | The Content Manager OnDemand application group where the document is stored. |
| APPL_NAME | APPLNAME | The Content Manager OnDemand application where the document is stored. |
| CODEPAGE | DOCUMENT_CODEPAGE | The code page of the document as stored in Content Manager OnDemand. |
| LINEDELIMITER | DOCUMENT_LINE_DELIMITER | The line delimiter of the document as stored in Content Manager OnDemand. |
| INPUTFILE | TXFRM_INPUT_FILE | The InputFile parameter that is used by the transform. |
| OUTPUTFILE | TXFRM_OUTPUT_FILE | The OutputFile parameter that is used by the transform. |
| V9.5 enhancements | | |
| DOCUMENT_CC_ANSI | DOCUMENT_CC_ANSI | Used instead of <CARRIAGECONTROL> to define the command-line option and value when the document’s CC Type is “ANSI” as stored in Content Manager OnDemand. |
| DOCUMENT_CC_MACHINE | DOCUMENT_CC_MACHINE | Used instead of <CARRIAGECONTROL> to define the command-line option and value when the document’s CC Type is “Machine” as stored in Content Manager OnDemand. |
| DOCUMENT_CC_NONE | DOCUMENT_CC_NONE | Used instead of <CARRIAGECONTROL> to define the command-line option and value when the document’s CC Type is “None” as stored in Content Manager OnDemand. |
| RECORDFORMATFIXED | DOCUMENT_RECORDFORMAT_FIXED | Used instead of <RECORDFORMAT> to define the command-line option and value when the document’s RECFM is “Fixed” as stored in Content Manager OnDemand. |
| RECORDFORMATVARIABLE | DOCUMENT_RECORDFORMAT_VARIABLE | Used instead of <RECORDFORMAT> to define the command-line option and value when the document’s RECFM is “Variable” as stored in Content Manager OnDemand. |
| RECORDFORMATSTREAM | DOCUMENT_RECORDFORMAT_STREAM | Used instead of <RECORDFORMAT> to define the command-line option and value when the document’s RECFM is “Stream” as stored in Content Manager OnDemand. |
| PRMODENONE | DOCUMENT_PRMODENONE | Used instead of <PRMODE> to define the command-line option and value when the document’s PRMode is “None” as stored in Content Manager OnDemand. |
| PRMODESOSI1 | DOCUMENT_PRMODESOSI1 | Used instead of <PRMODE> to define the command-line option and value when the document’s PRMode is “SOSI1” as stored in Content Manager OnDemand. |
| PRMODESOSI2 | DOCUMENT_PRMODESOSI2 | Used instead of <PRMODE> to define the command-line option and value when the document’s PRMode is “SOSI2” as stored in Content Manager OnDemand. |
| TRC_YES | DOCUMENT_TRCYES | Used instead of <TRC_EXIST> to define the command-line option and value when the document’s TRC is “Yes” as stored in Content Manager OnDemand. |
| TRC_NO | DOCUMENT_TRCNO | Used instead of <TRC_EXIST> to define the command-line option and value when the document’s TRC is “No” as stored in Content Manager OnDemand. |
Table 9-3 provides information about the OnDemand client HashMap keys that are used for advanced Java implementation.
Table 9-3 OnDemand client hashmap key and descriptions
| HashMap key | Description |
|---|---|
| TXFRM_RESP_DATA | This key is the HashMap key for the transformed data byte[] to be returned to ODWEK. |
| TXFRM_REQ_NAME | Name of transform for this request. |
| TXFRM_REQ_METHOD | The method name that is used in the custom Java class. The transformData() method must exist in the client class. |
| TXFRM_REQ_DATA | The original Content Manager OnDemand document data that is contained in this request. |
| TXFRM_REQ_PROPS | The document details as specified or requested in the transform.xml file. |
 