Chapter 14. Customizing and extending the text miner application

The text miner application is a powerful text analysis tool that provides invaluable insight into your unstructured content. It provides many different views from which to analyze your data, ranging from views for time series and trend pattern analysis to various facet correlation views. You can customize the text miner application or extend it. For example, you can add one or more of your own text analytic views that are customized to your specific data and visualization needs. This chapter addresses how to customize and extend the application.

This chapter includes the following sections:

•Customizing the text miner application

•Reasons for extending the text miner application

•Sample plug-ins for text miner views

•Customizing the sample text miner plug-in

•Testing the customized plug-in

14.1 Customizing the text miner application

The previous chapters show you how to use the text miner application. This section provides tips to help you customize the text miner application to meet your business requirements. Customize the application while you examine it during the testing stage. Do not modify the text miner application after the system goes into production.

Limiting user access to specific collections: You can limit user access to specific collections for security purposes. To limit user access, see “Limiting user access to the text analytics collection” on page 647.

14.1.1 Analytics Customizer

Content Analytics provides the Analytics Customizer application to help you customize parts of the text miner application for your company. The advantage of using Analytics Customizer is that you can quickly update the properties that are used frequently by using a GUI. You do not have to edit the configuration file directly. You can examine and change the properties during the testing period.

Setting the customizerDisabled property: Set the customizerDisabled property to true after you finalize the customization of your text miner application and prepare the system to go to production.

Accessing the Analytics Customizer depends on your deployment. If you use the deployed Jetty web server, click the Analytics Customizer link in the administration console (Figure 14-1).

Figure 14-1 Linking to the Analytics Customizer in the administration console

When you open the Analytics Customizer, you can update the following options on each tab:

Server: Specify the host name, protocol (HTTP or HTTPS), logging level, and query timeout in seconds.

Screen: Specify images and texts, links, and paths to view in the window.

Query options: Specify the Query Options tab or the File Type filterResults tab.

Results: Specify what to show in the search result.

Images: Specify the image files for the data sources.

You can also set the Preferences for the views.

After you update the values, click Close in the Analytics Customizer window, and then click Exit.

For more information, go to the IBM Content Analytics Information Center at the following address, and search on customizing applications:

http://publib.boulder.ibm.com/infocenter/analytic/v2r2m0/index.jsp

14.1.2 Modifying the URI link in the Documents view

In the Documents view, the text miner application shows the document link that you can click to view the data in the data source. The document link is constructed based on the indexed Uniform Resource Identifier (URI).

Depending on the data source, the URI stored in the index is different as described in the “URI formats in the index” topic in the IBM Content Analytics Information Center at the following address:

http://publib.boulder.ibm.com/infocenter/analytic/v2r2m0/index.jsp

Sometimes you need to change the document link so that it does not direct you to the data source. Alternatively, you might want to remove the document link from the search result in the Documents view. In this case, the regular expression URL filter helps to achieve your requirement. If you configure the regular expression URL filter properly, you can replace the URL that is used in the document link.

For example, you store the original data as an XML file in the file system. The XML file is stored as the following URI in the index:

file://c:/shared/document1.xml

To show the file from a web server so that it is easier to read and the system does not have to fetch the XML file directly from the document link, use the following link:

http://example.com/data/document1.xml

Displaying the XML files: To use this example, you must first configure the web server to display the XML files in a browser. This configuration is a separate task.

In this example, you can set the regular expression filter as follows:

|^file://c:/shared|http://example.com/data|

You can disable all document links so that users cannot click to see the data source. To disable all document links, set the regular expression filter to |.*||, which replaces any string (.*) with an empty string.
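The effect of these two filter rules can be illustrated with a small JavaScript sketch. It is not how Content Analytics implements the filter. The applyUrlFilter function is a hypothetical helper that assumes a rule has the form |pattern|replacement| and that the pattern itself contains no vertical bar.

```javascript
// Hypothetical helper (not part of Content Analytics): applies a rule of the
// form |pattern|replacement| to a URI, replacing the first match only.
function applyUrlFilter(rule, uri) {
  var parts = rule.split("|"); // "|a|b|" becomes ["", "a", "b", ""]
  if (parts.length !== 4 || parts[0] !== "" || parts[3] !== "") {
    throw new Error("Expected a rule of the form |pattern|replacement|");
  }
  return uri.replace(new RegExp(parts[1]), parts[2]);
}

var uri = "file://c:/shared/document1.xml";

// Rewrite the file system prefix to the web server prefix
var rewritten = applyUrlFilter("|^file://c:/shared|http://example.com/data|", uri);
console.log(rewritten); // http://example.com/data/document1.xml

// Replace the whole URI with an empty string, which leaves no link target
var disabled = applyUrlFilter("|.*||", uri);
console.log(JSON.stringify(disabled)); // ""
```

The sketch shows only the substitution itself. The additional configuration tasks in Content Analytics still apply.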

After you determine how you want to change the URL that is used in the document link, you must perform additional tasks in the Content Analytics configuration. For more information, go to the IBM Content Analytics Information Center at the following address, and search on configuring a regular expression URL filter for search results:

http://publib.boulder.ibm.com/infocenter/analytic/v2r2m0/index.jsp

14.2 Reasons for extending the text miner application

You might extend the text miner application for one of two reasons.

First, you might use several tools (of which Content Analytics is only a part) in the analysis of your data. If these tools are accessible by using a web browser or provide a programmable GUI, you can incorporate these tools into the text miner application and make them accessible from their own tabbed views. In this way, the text miner application consolidates most, if not all, of your various analytic tools into a single portal. This approach greatly simplifies the switching back and forth between tasks.

The second reason is that you realize that a more suitable visualization technique might better serve your data and that you have the programmatic means to implement such a view. In this case, you want to create your own view of the data provided by Content Analytics. To access the data, you can use the Content Analytics REST API to search and retrieve any of the information stored in a text analytics collection. After the information is retrieved, you can use your own programmatic means to manipulate and display the data.

The remainder of this chapter concentrates on the latter use case and demonstrates an example extension of the text miner application.

14.3 Sample plug-ins for text miner views

Extending the text miner application follows a plug-in architecture where each plug-in creates a view tab on the results menu bar. The analyticsViewPlugin samples in the ES_INSTALL_ROOT/samples/ directory provide the following example plug-ins:

myFirstPlugin: A simple plug-in that uses the Dojo Toolkit only. The Dojo Toolkit is already installed with the text miner web application.

mySecondPlugin: A simple plug-in that uses the Dojo Toolkit and Adobe® Flex files.

TiaraPlugin: An example of an interactive, visual text summarization view that shows the topic keywords for facets over a specified time period.

In this chapter, you modify the first sample plug-in by adding your own visualizations of the data. Therefore, you must perform the following steps to enable the first sample plug-in in the text miner application:

1. Copy the entire contents of the samples/analyticsViewPlugin directory into the ES_NODE_ROOT/master_config/searchapp/analytics/plugin directory. If the plugin directory does not exist, create one.

2. In the plugin.xml configuration file, uncomment the definitions for the myFirstPlugin sample.

3. Restart the text miner application:

– If you use the provided Jetty web server, enter the following command, where node_ID identifies the search server:

esadmin session searchapp.node_ID restart

– If you use WebSphere Application Server, enter the following command:

esadmin config sync

4. Stop and restart the text miner application.

The myFirstPlugin sample is displayed to the right of the last text mining view that is provided with the product (Figure 14-2). (We use the sample text analytics collection throughout this example.) To the right of the Reports tab is the highlighted My First Plugin tab. In the Facet Navigation pane, the Product facet is the currently selected facet. The table on the right side shows the list of its facet values, their document frequency counts, and their associated correlation values.

Figure 14-2 MyFirstPlugin enabled in the text miner application

You have now successfully enabled your first sample plug-in. Continue with the next section where you modify this sample to include the custom visualization of the faceted data.

14.4 Customizing the sample text miner plug-in

This section shows how to add your own visualization to the myFirstPlugin sample. More specifically, you add a facet cloud, which is a word cloud that shows the relative importance of the facet values based on their frequency counts.

Word clouds show more frequently used words in a larger font and sometimes in a different color. When viewing a word cloud, you can quickly see which words were used more frequently and, therefore, which words are potentially more important. Word clouds are generally considered quicker and more intuitive to comprehend than the entries in a conventional list table.

14.4.1 Changing the view tab title

A simple modification is to change the title of the view tab. To change the title of the view tab, edit the plugin.xml file, which is the same file that was used to uncomment the myFirstPlugin sample. Change the title attribute from “My First Plugin” to “Facet Cloud” as highlighted in Figure 14-3.

Figure 14-3 Changing the title of a plug-in in the plugin.xml file

14.4.2 Customizing the plug-in template HTML file

Each plug-in has a JavaScript widget (.js) file and corresponding template HTML (.html) file. The template HTML file provides the overall layout of the view. The JavaScript widget references key elements of the template and populates those elements with data.

Edit the myFirstPlugin.html file in the ES_NODE_ROOT/master_config/searchapp/analytics/plugin/myFirstPlugin/templates directory. Example 14-1 shows the edited file.

Example 14-1 Updated HTML template with the facet cloud table added

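<div>
    <div style="font-size: large;">Cloud for selected facet:
        <span dojoAttachPoint="selectedFacetSpan" style="color: blue;"></span>
    </div>
    <!-- Facet cloud table: its body is filled by the JavaScript widget -->
    <table>
        <tbody dojoAttachPoint="cloudBody"></tbody>
    </table>
    <!-- Conventional facet list table from the original sample -->
    <table>
        <thead>
            <tr>
                <th>Facet name</th>
                <th>Frequency</th>
                <th>Correlation</th>
            </tr>
        </thead>
        <!-- Keep the dojoAttachPoint attribute of the original sample on this tbody -->
        <tbody></tbody>
    </table>
</div>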

In addition to minor changes and additions in style specifications, notice that a new table was added before the conventional facet list table. The body of the facet cloud table is assigned a dojoAttachPoint of “cloudBody.” This name indicates how and where the facet cloud table will be referenced and updated with data by the myFirstPlugin.js JavaScript widget.

14.4.3 Customizing the JavaScript widget

The myFirstPlugin JavaScript widget does all of the work. It updates the HTML template file with data in response to certain events.

Update the myFirstPlugin.js file in the ES_NODE_ROOT/master_config/searchapp/analytics/plugin/myFirstPlugin directory:

1. Add the function shown in Example 14-2 to the JavaScript file.

2. In the _onLoad function, insert a line to call the function as shown in Example 14-2.

Example 14-2 The _renderFacetCloud JavaScript function

_renderFacetCloud: function(facetValues) {
    dojo.empty(this.cloudBody);
    if (!facetValues || !facetValues.length) return; // Nothing to display
    var maxFacets = 75; // Maximum number of facet values to display in the cloud
    var htmlCloud = ""; // Contains the html for the cloud (to be incrementally built)
    if (maxFacets > facetValues.length) maxFacets = facetValues.length;
    // The facet values are assumed to be sorted by descending frequency (weight)
    var minValue = facetValues[maxFacets-1];
    var offset = minValue.weight-1;
    var maxValue = facetValues[0];
    // Divide the range of facet frequencies into 7 groups (seven increasing font sizes)
    var factor = Math.round((maxValue.weight - minValue.weight + 1) / 7);
    if (factor == 0) factor = 1;
    var rands = []; // Array to hold the random indexes that were already selected
    var numRands = 0; // Current number of random indexes selected
    for (var i=0;i<maxFacets;i++) {
        // Randomly select a facet that has not been selected yet
        var rand=Math.floor(Math.random()*maxFacets);
        while (true) {
            var j = 0;
            for (;j<numRands;j++) {
                if (rand == rands[j]) {
                    // Already used: draw again and recheck from the start
                    rand=Math.floor(Math.random()*maxFacets);
                    break;
                }
            }
            if (j >= numRands) {
                // Not used yet: remember the index and stop searching
                rands[j] = rand;
                numRands = numRands + 1;
                break;
            }
        }
        // Add the randomly selected facet value to our cloud and set
        // its span class font size dependent on its frequency (weight)
        var facetValue = facetValues[rand];
        var fontsize = Math.round((facetValue.weight - offset) / factor);
        if (fontsize <= 0) fontsize = 1;
        if (fontsize > 7) fontsize = 7;
        htmlCloud = htmlCloud + "<SPAN class=\"topicCloudSize"+fontsize+"\">";
        htmlCloud = htmlCloud + facetValue.label;
        htmlCloud = htmlCloud + " </SPAN> ";
    }
    // Add our html cloud as a cell in the cloud table (body)
    var tr = dojo.create("tr", null, this.cloudBody);
    dojo.create("td", {innerHTML: htmlCloud}, tr);
},

3. In the _onLoad function, after the facet values have been retrieved, insert a call to the _renderFacetCloud function (Example 14-3).

Example 14-3 Inserting the call to the _renderFacetCloud function

_onLoad: function(data) {
    if(data && data["es:apiResponse"] && data["es:apiResponse"]["ibmsc:facet"]) {
        var facetValues = data["es:apiResponse"]["ibmsc:facet"]["ibmsc:facetValue"];
        this._renderFacetCloud(facetValues);
        // The remainder of the existing _onLoad function follows unchanged

4. Increase the number of facet values that will be returned from the default of 10 to 100. Do this step near the top of the file when defining the variable for the facet string. Insert the count parameter set to 100 as shown in Example 14-4.

Example 14-4 Increasing the count of facet values returned

var _facet = {

"id": facetId,

"count": "100", // We want 100 maximum facet values returned instead of the default of 10

"namespace": (facetType == "subcategory" ? "keyword" : facetType) // show keywords if subcategory

};

5. Save the myFirstPlugin.js file.
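To make the font size arithmetic of the _renderFacetCloud function easier to follow, the following sketch repeats it outside of the widget. The response object only mimics the structure that the _onLoad function reads. The labels and weights are invented for illustration, and fontSizeFor is a hypothetical helper, not part of the sample plug-in.

```javascript
// Mock response that mimics the structure read in the _onLoad function.
// The labels and weights are invented for illustration only.
var data = {
  "es:apiResponse": {
    "ibmsc:facet": {
      "ibmsc:facetValue": [
        { "label": "orange juice", "weight": 72 },
        { "label": "apple juice", "weight": 40 },
        { "label": "lemon tea", "weight": 3 }
      ]
    }
  }
};

// Hypothetical helper that repeats the font size arithmetic of _renderFacetCloud
function fontSizeFor(weight, minWeight, maxWeight) {
  var offset = minWeight - 1;
  var factor = Math.round((maxWeight - minWeight + 1) / 7);
  if (factor == 0) factor = 1;
  var fontsize = Math.round((weight - offset) / factor);
  if (fontsize <= 0) fontsize = 1;
  if (fontsize > 7) fontsize = 7;
  return fontsize;
}

// As in the sample, the values are assumed to be sorted by descending weight
var facetValues = data["es:apiResponse"]["ibmsc:facet"]["ibmsc:facetValue"];
var maxWeight = facetValues[0].weight;                      // 72
var minWeight = facetValues[facetValues.length - 1].weight; // 3

var sizes = [];
for (var i = 0; i < facetValues.length; i++) {
  sizes.push(fontSizeFor(facetValues[i].weight, minWeight, maxWeight));
}
console.log(sizes); // [ 7, 4, 1 ]
```

With these weights, the factor is Math.round(70 / 7) = 10 and the offset is 2, so the weights 72, 40, and 3 map to the font sizes 7, 4, and 1.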

14.4.4 Updating the style sheet for the plug-in

In the _renderFacetCloud JavaScript function, you might have noticed the SPAN tags that are placed around the facet values as they are added to the cloud. Each SPAN tag refers to one of seven style classes that control the size and color of the font for the facet value. The last task is to add those styles to the common.css file for the text miner application.

1. Edit the common.css file in the ES_INSTALL_ROOT/jetty/searchapp/analytics/ directory. At the end of the file, add the entries shown in Example 14-5.

Example 14-5 Adding facetCloud styles to the common.css file

/**********************/

/* Topic Cloud Styles */

/**********************/

.topicCloudSize1 {

font-size: 9pt;

color: #b4b4b4;

font-family: Arial

}

.topicCloudSize2 {

font-size: 12pt;

color: #6E96C8;

font-family: Arial

}

.topicCloudSize3 {

font-size: 14pt;

color: #85BEE7;

font-family: Arial

}

.topicCloudSize4 {

font-size: 16pt;

/* color: #5b00b7; */

color: #3EE9DB;

font-family: Arial

}

.topicCloudSize5 {

font-size: 18pt;

/* color: #400040; */

color: #41B041;

font-family: Arial

}

.topicCloudSize6 {

font-size: 20pt;

color: blue;

font-family: Arial

}

.topicCloudSize7 {

font-size: 22pt;

color: #dd00dd;

font-family: Arial

}

2. Save the file.
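The SPAN markup that refers to these style classes can be sketched in isolation as follows. The escapeHtml and cloudSpan functions are hypothetical helpers. The escaping step is an addition that the sample function does not perform. It is shown here on the assumption that facet labels might contain characters such as < or &, which would otherwise be interpreted as markup when assigned through innerHTML.

```javascript
// Hypothetical helper: escapes characters that have a special meaning in HTML
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: builds the SPAN markup for one facet value.
// fontsize is expected to be 1 through 7, matching the topicCloudSize classes.
function cloudSpan(label, fontsize) {
  return "<SPAN class=\"topicCloudSize" + fontsize + "\">" +
         escapeHtml(label) + " </SPAN> ";
}

var span = cloudSpan("R&D <beta>", 3);
console.log(span); // <SPAN class="topicCloudSize3">R&amp;D &lt;beta&gt; </SPAN>
```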

14.5 Testing the customized plug-in

To test the changes made to the sample plug-in, follow these steps:

1. Restart the text miner application.

– If you use the provided Jetty web server, enter the following command, where node_ID identifies the search server:

esadmin session searchapp.node_ID restart

– If you use WebSphere Application Server, enter the following command:

esadmin config sync

2. Stop and restart the text miner application.

You then see the customized plug-in with a facet cloud being displayed as shown in the example in Figure 14-4.

Figure 14-4 Custom plug-in showing a facet cloud
