Chapter 3. Sample use cases


Three use cases are provided in this chapter. These use cases were chosen because they represent many of the challenges that enterprises face when managing metadata at massive scale.

Although these use cases provide an excellent representation of how the IBM Spectrum Discover platform can be used to optimize storage systems, enable data governance, and fulfill analytical needs that challenge organizations with billions of files or objects, they are in no way the only use cases possible. In fact, the extensible architecture of the IBM Spectrum Discover platform enables metadata management for a nearly limitless number and variety of use cases.

This chapter includes the following topics:

•3.1, “Storage optimization”

•3.2, “Data governance”

•3.3, “Healthcare and life sciences use cases”

•3.4, “Summary”

3.1 Storage optimization

In this section, we describe how to use IBM Spectrum Discover for the following tasks:

•Gain insights into unstructured data.

•Decrease storage capital expenditure (CapEx) by identifying aged data that can be relocated or deleted.

•Map data to business priorities, organization, and projects.

•Reduce storage operation expenditures (OpEx) by improving storage administrator productivity.

The reduction in storage OpEx is compounded when data is stored across dissimilar storage systems or hybrid multi-cloud environments.

3.1.1 Gaining insight into unstructured data

IBM Spectrum Discover provides two methods to easily explore cataloged data. The first method is a standard search box that allows for quickly finding data that meets the search criteria. The second method is a visual exploration of the data.

The visual exploration capabilities, which are provided by the IBM Spectrum Discover GUI, enable administrators and users to quickly locate data that might, for all practical purposes, be lost. In this section, we show how CapEx can be greatly decreased by identifying aged data that can be relocated or even deleted.

Visual exploration

Complete the following steps to start a visual exploration of the cataloged data in IBM Spectrum Discover:

1. Select the search icon in the navigation menu of the GUI. The search icon is highlighted by a red box on the left side in Figure 3-1.

Figure 3-1 Starting a visual exploration

2. Select the wanted groups in the “or start a visual exploration” section. Then, click the circled-arrow to continue (see Figure 3-1).

The GUI presents available tags that are based on the selected groups. If a tag that you need is not available, click Add more search criteria to select more groups, as shown in Figure 3-2.

Figure 3-2 Adding groups to your visual search

3. Click Submit.

4. Within the groups chosen, select the tags that are to be used to isolate the wanted data, as shown in Figure 3-3.

Figure 3-3 Tags selected

5. Click the circled-arrow. The GUI now displays the results of the visual exploration, as shown in Figure 3-4.

Figure 3-4 Visual exploration results

6. Now that data is narrowed down to some number of groupings, based on search criteria, you can drill down to individual records by selecting one or more groups and clicking Convert to individual record mode, as shown in Figure 3-4.

The resulting table provides all records that meet the search criteria for the groups that were selected, as shown in Figure 3-5.

Figure 3-5 Individual record table

Next, we describe generating a report for the records in our results.

Generating a report

We now want to report on data that was not accessed in the past year.

Note: In our example, none of the records in the record table were accessed within the past year. We apply the one-year filter anyway, only to show the flexibility of IBM Spectrum Discover.

Complete the following steps:

1. Continuing with our data that is shown in Figure 3-5 on page 36, click the funnel icon that is next to Generate Report to show the available filters that can be applied.

2. Click the twistie that is next to Last Accessed Time and set the start date to the date of interest, as shown in Figure 3-6.

Figure 3-6 Setting Last Access Time

3. Click OK.

4. Click Apply to update the record table.

Now, the record table shows only files that were last accessed by the specified date, as shown in Figure 3-7.

Figure 3-7 Last Accessed Time results

5. Columns can also be added, or removed, by clicking the twistie that is next to Columns. In our example, we add Permissions and then, click Apply to update the record table. Also, we now click the funnel icon again to hide the filter section.

Note: Our source storage system is not integrated into LDAP. Therefore, owner is reported as the UID.

6. We are now ready to generate a report of all of the records that were not accessed within the last year. To select the records to be included in the report, click the check box next to the path, or select individual records, as shown in Figure 3-8.

Figure 3-8 Ready to generate a report

7. Click Generate Report. Enter a name for the report in the pop-up window, as shown in Figure 3-9.

Figure 3-9 Naming the report

8. Click Submit to generate the report. Click OK after the report is generated.

9. Click Reports in the Navigation Menu and locate the report that was created. To act on the report, click the wanted action on the right side of the report in the Reports window. Reports can be viewed, downloaded, rerun, or deleted. The results are downloaded into a spreadsheet. The generated report is shown in Figure 3-10.

Figure 3-10 Report generated

3.1.2 Mapping data to business priorities

By using IBM Spectrum Discover, custom metadata values can be added to a set of the records based on filter criteria. For example, you can add a project name to records based on their location within the file system, owner ID, and so on. This process is done by using an auto-tagging policy.

Auto-tagging

To begin the auto-tagging process, we must create a tag to be associated with the policy we build to conduct the auto-tagging process.

Project name tag

Complete the following steps to create a project name tag:

1. Click the Metadata icon in the Navigation Menu.

2. Within the Metadata window, select the Tags link at the top of the window. In the Tags window, click Add and the New Organizational Tags pop-up window opens, as shown in Figure 3-11.

Figure 3-11 New Organizational Tags pop-up

3. Within the New Organizational Tags window, enter a unique name for the tag that is created.

Note: Tag names cannot contain spaces.

4. Select Open from the Type pull-down menu.

Note: An Open tag can be anything that describes groups of records and is not restricted to predefined values. Therefore, a project name, department name, or sensor serial number can be used.

5. Leave the Values field empty because it is not required for this open tag.

6. Click Submit to save the tag.

Now that a tag is created to use during project naming, we can build our policy to conduct the tagging.

Project naming policy

Complete the following steps to create a project naming policy:

1. Click Policies at the upper left corner of the Metadata window. Because a policy for project naming does not exist, we build a policy by clicking Add Policy in the right side of the policy listing, as shown in Figure 3-12.

Figure 3-12 Add Policy

2. In the Add new policy window (see Figure 3-13 on page 42), move the slider from Inactive to Active so that the policy can be run. Enter a name for the policy in the Name field.

Note: Policy names cannot contain spaces.

3. By using the Policy Type pull-down menu, choose AUTOTAG.

Next, we must provide a filter for the data we want to auto-tag. In our scenario, we know that user 1001 owns a large amount of data for the project that we want to tag; therefore, our filter selects that user's data within the catalog.

4. After the filter is created, select Extract tag from path. This selection makes the Field and Depth pull-down menus visible.

5. In the Field pull-down menu, select the tag that was created. The tag now is associated with the policy.

6. Determine where the project names are within the path to the data. In our example, the project name is derived from the position of the seventh forward slash (/) in the path.

Note: The directory that is below the root of the file system is where to begin counting forward slashes; for example:

root/folder1/subfolder2/subfolder3/subfolder4/…

If the depth is set to 4, the project name is subfolder3.
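The counting rule in this note can be sketched in a few lines of Python. This sketch only illustrates the convention of the example path; it is not how IBM Spectrum Discover implements the Depth setting. The function name tag_from_path is hypothetical, and the sketch assumes that the path is written as in the note, starting with the root element and without a leading slash.

```python
from typing import Optional


def tag_from_path(path: str, depth: int) -> Optional[str]:
    """Return the path component that precedes the depth-th forward slash.

    Assumption: the path starts with the root element, as in the note,
    so the first slash that is counted is the one that follows the root.
    """
    parts = path.split("/")
    # The depth-th slash exists only if at least one element follows it.
    if 1 <= depth < len(parts):
        return parts[depth - 1]
    return None  # the path is not deep enough to contain a project name


print(tag_from_path("root/folder1/subfolder2/subfolder3/subfolder4", 4))  # subfolder3
```

Counting slashes rather than directories explains why a depth of 4 yields subfolder3 and not subfolder4 in the example path.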

Figure 3-13 Add new policy

We run this policy immediately. If the policy is to be scheduled, select the wanted frequency. This provides options for selecting time of day, days of week, and date, as required. Figure 3-14 shows an example for the Schedule portion of the Add new policy window.

Figure 3-14 Scheduling a policy

7. Click Save after the schedule is set. If no schedule is selected, Now is chosen automatically.

Progress for the policy is reported on the Policies window, as shown in Figure 3-15.

Figure 3-15 Policies Progress

Because we elected to run our policy now, it ran to completion. To confirm the results, we perform another visual exploration. To return to the search window, use the Navigation Menu search icon. The newly created tag is included in the choices for the visual exploration, as shown in Figure 3-16.

Note: It takes approximately 30 minutes before the newly created tag is updated with project names.

Figure 3-16 Search with project naming tag

Select the tag created for the project naming and click the circled-arrow. On the next window of the visual exploration, select the project name of interest, as shown in Figure 3-17. Then, click the circled-arrow.

Figure 3-17 Select a project

After the results are displayed (see Figure 3-18), select the project name and click Convert to individual record mode.

Figure 3-18 Convert project name

After the individual records display, click the funnel icon. Then, click the twistie for columns, select the project name tag, and click Apply (see Figure 3-19).

Figure 3-19 Setting displayed columns

Click the funnel icon again to clear the side pane from the window.

The Results window now includes a column for the project name, as shown in Figure 3-20.

Figure 3-20 Display records with project names

3.1.3 Reducing storage operation expenditures

The IBM Spectrum Discover GUI dashboard, or home page, provides access to information about the connected storage systems in a manner that enables storage administrators to quickly ascertain high-level details that are required for day-to-day operations. For example, the Datasource Capacities section shows used and free space, but enhances that view by indicating the number of files to move or archive, based on user-defined policies. This feature enables a storage administrator to determine whether action is required, at a glance, as shown in Figure 3-21.

Figure 3-21 Datasource capacity

To further reduce the time that is required for administrative duties, these high-level views offer drill-down capabilities. Within minutes, a storage administrator can perform the required actions by using policies and action agents, or generate detailed reports.

The Records Indexed section of the IBM Spectrum Discover GUI dashboard provides the total capacity that is required for the records within the catalog. It also displays the total number of records, but again, provides drill-down capabilities that enable the storage administrator to get more information as required.

The next section of the IBM Spectrum Discover GUI dashboard is dedicated to highlighting space that might be wasted because of duplicate files. The duplicate file information section is shown on the lower right side of Figure 3-21.

Note: Identifying potential duplicate files can be resource-intensive on IBM Spectrum Discover. Therefore, the background task that provides this information is disabled by default. To enable this capability, see the instructions that are linked in the duplicate file information section or IBM Knowledge Center.

The lower portion of the IBM Spectrum Discover GUI dashboard displays capacity used, based on the tags available, as shown in Figure 3-22.

Figure 3-22 Capacity Used By

To make the most efficient use of the storage administrator's time and provide greater flexibility, custom tags (such as the ProjectNamingTag) are displayed in the Capacity Used By view. The custom tag is the bottom option in the pull-down menu that is shown in Figure 3-22.

To further enhance the insight that is provided by the Capacity Used By views, a pull-down menu allows for displaying the number of files that are cataloged within each option of the view, as shown in Figure 3-23.

Figure 3-23 Capacity Used By file counts

3.2 Data governance

Data governance is the overall management of data availability, accessibility, relevancy, usability, quality, and security in an enterprise. It also can help you manage your business data throughout its lifecycle.

One role that IBM Spectrum Discover plays in data governance is to ensure that your data storage systems are compliant with governance policies. IBM Spectrum Discover reduces the risk that might be buried in unstructured data stores. It also can accelerate the investigation into potentially fraudulent activities while making regulatory audits faster and more thorough.

3.2.1 Use case scenario

Suppose that a financial services organization maintains storage in multiple locations and in multiple data storage entities. The organization recently adopted a data governance policy to store all personally identifiable information (PII) in a particular data store in a single directory or folder.

The CIO’s organization is given the responsibility of ensuring this policy is adhered to across the organization. The CIO immediately appoints someone to be the organization’s data steward and assigns them the task of identifying problem areas in the context of data governance.

3.2.2 Data stewardship with IBM Spectrum Discover

The data steward finds that IBM Spectrum Discover enables fast data exploration and identification of PII data by using regular expression searches of the data contents. The data steward outlines a plan that includes the following tasks:

•Document the various components that can categorize a file or object as PII.

•Identify regular expressions to be used when searching files or objects that might match potential PII in their contents.

•Create IBM Spectrum Discover tags to be used in identifying a file or object as containing PII.

•Design IBM Spectrum Discover policies that use the regular expressions and tags to manifest the identification of files or objects that contain PII.

•Schedule regular report generation with a format and frequency that are suitable for providing to the CIO.

After discussing and receiving approval for this plan from management, the data steward moves ahead with its implementation. Each step of their plan is described in the following sections.

3.2.3 Documenting the various PII components

Suppose the PII stored by the organization for individuals, in addition to their name, is limited to the following components:

•Email

•Address and ZIP code

•Credit card account number

•CVV number

•Social security number or tax identification number

3.2.4 Identifying regular expressions for the PII components

IBM Spectrum Discover provides several predefined regular expressions for content search policies. Those expressions that correspond to the PII components of concern to the organization are listed in Table 3-1.

Table 3-1 PII Components and the associated IBM Spectrum Discover regular expression names

PII component	Regular expression names
Email	EmailID
ZIP Code	USZIPCode
Credit card number	MasterCard, VisaCard, AmexCard
CVV code	CVV-Number
US Social Security number	US-SSN

The data steward notices that their organization stores US Social Security numbers such that the three groups of digits are delimited by spaces rather than hyphens. Therefore, the IBM Spectrum Discover predefined US-SSN regular expression is not sufficient for their needs.

The data steward can redefine the US-SSN regular expression or define another regular expression to identify US Social Security numbers that are delimited by blanks. The latter approach is described in “Regular expressions” on page 20. As shown in that example, the new regular expression is named US-SSNbd (the “bd” suffix stands for blank-delimited).
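As an illustration only, a blank-delimited pattern might look like the following Python sketch. The actual US-SSNbd definition is given in the referenced section and is not reproduced here; both patterns, the variable names, and the sample strings are assumptions that we chose for this sketch.

```python
import re

# Hypothetical pattern: three digit groups (3-2-4) separated by single blanks.
SSN_BLANK_DELIMITED = re.compile(r"\b\d{3} \d{2} \d{4}\b")
# For comparison, the same digit groups separated by hyphens.
SSN_HYPHEN_DELIMITED = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

samples = ["SSN: 123 45 6789", "SSN: 123-45-6789", "Phone: 555 0100"]
for text in samples:
    blank = bool(SSN_BLANK_DELIMITED.search(text))
    hyphen = bool(SSN_HYPHEN_DELIMITED.search(text))
    print(f"{text!r}: blank-delimited={blank}, hyphen-delimited={hyphen}")
```

A pattern of this kind can also match other digit sequences that happen to share the same grouping, so the results are best treated as candidates for review rather than confirmed PII.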

3.2.5 Creating tags to identify files or objects that include PII

Only a few individuals across the organization have sufficient permission or privilege to view the PII data. That is, to protect their customers’ information, few members can open files or objects and examine their content. Therefore, reports that identify data governance problem areas should not display the actual PII. Instead, governance reports need only to identify files or objects that contain PII.

As a result, the data steward uses a single IBM Spectrum Discover tag, containsPII, for this identification. This tag is set to true if any PII data is detected and false otherwise.

3.2.6 Creating policies to identify files or objects that include PII

With the tag defined for identifying PII-containing files or objects, we define the policy by using the IBM Spectrum Discover user interface, as shown in Figure 3-24.

Figure 3-24 Adding a policy for use in identifying storage that contains PII

When defining the policy, the data steward selects CONTENT SEARCH in the Policy Type pull-down menu and then selects the agent, contentsearchagent, which is associated with CONTENT SEARCH policies.

All regular expressions in the Search Expression pull-down menu that apply to identifying PII data are selected. Because of the sensitivity of the data and the potential for exposure to users that do not have a need-to-know, the data steward selects True/False from the Value pull-down menu. The policy is scheduled to be run every Thursday in preparation for meeting with the CIO on Fridays.

3.2.7 Defining and scheduling regular reports for governance

After the new findPII policy runs, the containsPII tag’s values are updated. The data steward must identify any files that are outside of the PII-restricted storage area. To do so, they browse to the Search window and search for files that are identified as containing PII; that is, those files that include a containsPII value of True.

The search is further refined to exclude files that are in the restricted area; that is, we want to identify PII information that is outside of the designated PII storage location. The data steward enters the specific query and runs the search, as shown in Figure 3-25.

Figure 3-25 Searching for files that contain PII but are outside the designated PII storage area

After the governance report is generated (see Figure 3-26), it can be rerun on-demand. This feature eliminates the time that is required to generate the search syntax and configuration.

Figure 3-26 Generated governance report

The data steward can now select these results, generate a report, and name it governanceViolationReport (for example). The report is saved in the Reports window and as a result, can be rerun ad hoc, as shown in Figure 3-26 on page 51.

Reports can be downloaded in CSV format or, by using the IBM Spectrum Discover REST API, in JSON format. With this report, the data steward is now prepared to take action in moving or eliminating data with PII that does not adhere to the organization’s data governance policies.

3.2.8 Summary

The example use case that is described in this section is only one data governance scenario where IBM Spectrum Discover can be a powerful tool in managing and enforcing an organization’s data governance policies.

Other governance scenarios

Imagine the initial scenario that the data steward might be faced with: locating all of the PII in the organization’s data center and then, working with storage management to identify a technology or location that is suitable for storing and protecting PII data. This task is dramatically simplified with IBM Spectrum Discover.

The data steward can run the same findPII policy, which identifies all files or objects across all storage systems that are owned by the organization. The subsequent search criteria are simplified by reporting PII-containing data only. Such a report can then be provided to the storage administrators to aid their planning in moving and protecting the organization’s PII data.

Similarly, any audit of the organization's PII policy can be accelerated by using IBM Spectrum Discover to help identify PII data. Auditors can supply their own regular expression searches to better satisfy any concerns in a fast and efficient manner.

Ensuring data protection with IBM Spectrum Discover Exploration

IBM Spectrum Discover can be used not only to “flag” data as containing PII, but also to collect PII data. Users of IBM Spectrum Discover can be assured that PII is not exposed during metadata collection, exploration, or analysis.

All metadata that is collected by IBM Spectrum Discover is encrypted by using industry standard practices. In-flight data is protected by TLS throughout its journey to the IBM Spectrum Discover system. Inside the system, the DBMS that contains any metadata also is protected by in-place encryption.

3.3 Healthcare and life sciences use cases

IBM Spectrum Discover supports a wide range of use cases that can be applied to healthcare and life sciences applications. The technological advances in this industry led to vast amounts of unstructured data that contains rich metadata.

In this section, we showcase working with Variant Call Format (VCF) files and Digital Imaging and Communications in Medicine (DICOM) files. We add custom metadata tags, add custom regular expressions, create and run custom policies, and run search and report operations on the data. All of these processes use the content search features that are delivered with IBM Spectrum Discover.

3.3.1 Variant Call Format use case

The VCF specifies the format of a text file that is used in bioinformatics for storing gene sequence variations. The standard is in version 4.3. A typical vcf file header is shown in Example 3-1.

Example 3-1 Example of a typical vcf file header

##fileformat=VCFv4.3

##fileDate=20090805

##source=myImputationProgramV3.1

##reference=file:///seq/references/1000GenomesPilot-NCBI36.fasta

##contig=<ID=20,length=62435964,assembly=B36,md5=f126cdf8a6e0c7f379d618ff66beb2da,species="Homo sapiens",taxonomy=x>

##phasing=partial

##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">

##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">

##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">

##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">

##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">

##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">

##FILTER=<ID=q10,Description="Quality below 10">

##FILTER=<ID=s50,Description="Less than 50% of samples have data">

##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">

##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">

##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">

##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003

20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.

20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3

20 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4

20 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:51,51 0/0:61:2

20 1234567 microsat1 GTC G,GTCT 50 PASS NS=3;DP=9;AA=G GT:GQ:DP 0/1:35:4 0/2:17:2 1/1:40:3

Our example data is primarily of vcf v3.2 format. Most of our example data includes the line that contains the vcf file header that is shown in Example 3-2.

Example 3-2 Example of a typical vcf file header

##format=VCFv3.2

As a first example of how to work with IBM Spectrum Discover contentsearch policies and custom tags, we define the necessary infrastructure and categorize the data based on the vcf format level.

Creating custom metadata tags

Log on with a user ID that includes data admin or data user rights and browse to Metadata → Tags in the web UI, as shown in Figure 3-27.

Figure 3-27 Creating custom metadata tags

Click Add and create an organizational tag of type Characteristics, as shown in Figure 3-28.

Figure 3-28 Creating Characteristics tag

Click the Metadata → Regular Expressions tab and create a regex by clicking Add Regex, as shown in Figure 3-29.

Figure 3-29 Adding regex

Because we want our regular expression to return the value that follows the search pattern as the value for our tag, our regex must look similar to the regex that is shown in Example 3-3.

Example 3-3 Regex policy

format=(.*)$

At times, the special characters of a regex are difficult to work with (see Figure 3-30).

Figure 3-30 Our example regex
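Before a regex is entered in the GUI, it can help to try it outside the product. The following Python sketch applies the regex from Example 3-3 to a few header lines taken from Example 3-1 and Example 3-2. It assumes that the pattern is evaluated line by line; how the content search agent applies the pattern internally is not described here, so treat the sketch as a sanity check only.

```python
import re

# The regex from Example 3-3; the capture group holds the text after "format=".
VCF_FORMAT = re.compile(r"format=(.*)$")

header_lines = [
    "##format=VCFv3.2",
    "##fileformat=VCFv4.3",
    "##phasing=partial",
]

for line in header_lines:
    match = VCF_FORMAT.search(line)
    value = match.group(1) if match else None
    print(f"{line} -> {value}")
```

Because the pattern is not anchored at the start of the line, it matches both the ##format= line and the ##fileformat= line, and it ignores lines such as ##phasing=partial.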

Browse to Metadata → Policies to create a policy by clicking Add Policy. The policy takes the Metadata Tag and the Regular Expression as input, as shown in Figure 3-31.

Figure 3-31 Creating a policy

The following information is needed to successfully create our example vcf policy. We highlight the names used in our lab test environment to help map back to the fields that are shown in Figure 3-31:

•Name

Choose a meaningful name for your policy. This name is shown in the IBM Spectrum Discover Metadata → Policy window.

Lab test value: vcf_format_pol

•Policy Type

The Policy Type can be AUTOTAG, DEEP_INSPECT, or CONTENT SEARCH. For our example, we want the CONTENT SEARCH agent to work with our data. Select CONTENT SEARCH.

Pre-completed from the previous window selection: CONTENT SEARCH

•Filter

This filter value controls the range of your data to which this policy is applied. Carefully choose the filter to ensure that the policy is applied to only the part of your data for which you intended it. A narrow filter improves performance and reduces policy runtime. In our example, we apply the policy to all *.vcf files.

Lab test value: filetype='vcf'

•Agent

This action agent is the agent with which the policy works. The action agent for content search is included in the IBM Spectrum Discover documentation. It must be installed before this step.

Lab test value: contentsearchagent

•Tag

This tag is the tag with which the policy works. We created it as shown in Figure 3-28 on page 54.

Lab test value: vcf_format

•Search Expression

This menu allows you to select one or multiple regular expressions from the list.

Lab test value: vcf_format

•Value

This setting can be True/False or Value matching expression. We are extracting the format of the .vcf file; therefore, we select Value matching expression.

Lab test value: Value matching expression

Choose to activate the policy while creating it. You can also add a schedule to the policy to run it repeatedly.

Click Save to save the policy.

The UI returns to the Metadata → Policies window and shows the running policy (see Figure 3-32).

Figure 3-32 vcf_format_pol Policy is running

After the policy completes successfully, we can search through our newly enriched metadata. Browse to the Search page and search for the .vcf files, as shown in Example 3-4.

Example 3-4 Search syntax for newly enriched vcf files

filetype='vcf' and vcf_format is not null

The search results resemble the results that are shown in Figure 3-33.

Figure 3-33 All *.vcf files with vcf_format detected

Click the Funnel icon that is to the right of Generate Report for more information.

Open the Columns row on the right side of the window to add the newly created custom metadata tag to the report, as shown in Figure 3-34 on page 60.

Figure 3-34 Adding vcf_format to the report

Click Apply to re-create the report with the added custom metadata tag. The newly created report is displayed, as shown in Figure 3-35.

Figure 3-35 Report with vcf_format column

You can also directly use a search query to create a report that shows (for example) all *.vcf files with a VCF format of 3.2. Our example search expression is shown in Example 3-5.

Example 3-5 Example of a vcf_format search

filetype='vcf' and vcf_format like 'VCFv3.2'

The report resembles the example that is shown in Figure 3-36.

Figure 3-36 All vcf files with v3.2 format

Another search query creates a report that shows (for example) all *.vcf files with a VCF format of 4.0. Our example search expression is shown in Example 3-6.

Example 3-6 Example of a vcf_format search

filetype='vcf' and vcf_format like 'VCFv4.0'

The report resembles the example that is shown in Figure 3-37.

Figure 3-37 All vcf files with v4.0 format

The results page also shows how many hits the search produced, as shown in Figure 3-38.

Figure 3-38 The 46 items in the search

Our example produces the results with the respective search terms as listed in Table 3-2.

Table 3-2 vcf_format search hits

Search expression	Number of hits
filetype='vcf'	46
filetype='vcf' and vcf_format is not null	46
filetype='vcf' and vcf_format like 'VCFv3.2'	44
filetype='vcf' and vcf_format like 'VCFv4.0'	2
vcf_format='VCFv3.2'	44
vcf_format='VCFv4.0'	2

With this simple analysis, we verified that all .vcf files are accounted for. The specific syntax of the strings that we are looking for in the data can make it challenging to create a 100% foolproof regex. It is always a good idea to perform a cross-check of the counts.
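The cross-check can be written down explicitly. The following Python sketch uses the hit counts from Table 3-2; the variable names are ours, and the sketch only restates the arithmetic behind the verification.

```python
# Hit counts copied from Table 3-2.
hits = {
    "filetype='vcf'": 46,
    "filetype='vcf' and vcf_format is not null": 46,
    "filetype='vcf' and vcf_format like 'VCFv3.2'": 44,
    "filetype='vcf' and vcf_format like 'VCFv4.0'": 2,
}

total = hits["filetype='vcf'"]
tagged = hits["filetype='vcf' and vcf_format is not null"]
per_version = (
    hits["filetype='vcf' and vcf_format like 'VCFv3.2'"]
    + hits["filetype='vcf' and vcf_format like 'VCFv4.0'"]
)

# Files without a tag value, and tagged files whose value is neither version.
untagged = total - tagged
unexpected = tagged - per_version
print(f"untagged={untagged}, unexpected_values={unexpected}")
```

If either number is nonzero, some files were missed by the regex or carry a value that none of the searches cover, which is a signal to revisit the regex or the filter.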

To continue to work with the data, we run the search that is shown in Example 3-7 and generate a report from it.

Example 3-7 vcf_format search for report generation

vcf_format is not null

Run the search and click Generate Report, as shown in Figure 3-39.

Figure 3-39 Search Results window

Enter a meaningful name for the report and click Submit to generate it, as shown in Figure 3-40.

Figure 3-40 Generate report

Click the Reports icon on the left side of the IBM Spectrum Discover UI. The list with available reports is displayed, as shown in Figure 3-41.

Figure 3-41 Available Reports

Click Download Report in the Actions column, as shown in Figure 3-42. You can open the report, which is a comma-separated values (.csv) file, directly in a third-party application; for example, Microsoft Excel. You can also download the report and save it to disk.

Figure 3-42 Download or open *.csv

The report contains the metadata tags that are available for the selected files and objects, as shown in Figure 3-43.

Figure 3-43 *.csv file in Excel with the vcf_format column
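A downloaded report can also be processed with a script instead of a spreadsheet. The following Python sketch counts how many records carry each vcf_format value. The in-memory sample stands in for a downloaded report; the filename column and the sample rows are assumptions, and a real report contains more columns.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_file, column):
    """Count how often each value of the named column occurs in a CSV report."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Hypothetical stand-in for a downloaded report.
sample_report = io.StringIO(
    "filename,vcf_format\n"
    "a.vcf,VCFv3.2\n"
    "b.vcf,VCFv3.2\n"
    "c.vcf,VCFv4.0\n"
)

counts = count_by_column(sample_report, "vcf_format")
for value, n in counts.most_common():
    print(value, n)
```

To process a report that was saved to disk, pass an open file object instead of the in-memory sample, for example the result of open with newline="" as recommended for the csv module.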

3.3.2 Digital Imaging and Communications in Medicine use case

Digital Imaging and Communications in Medicine (DICOM) is a standard for handling, storing, printing, and transmitting information in medical imaging. It includes a file format definition and a network communications protocol.

In this example, we focus on the file format definition and use it to extract metadata from the DICOM files for use in IBM Spectrum Discover. The DICOM files in our example are stored as *.dcm files. Other file extensions, such as dcm30, or no extension at all are also possible.

In this example, we also describe the use of the REpresentational State Transfer (REST) application programming interface (API). For more information about the use of the IBM Spectrum Discover REST API, see IBM Knowledge Center.

A typical .dcm file header resembles the example that is shown in Example 3-8.

Example 3-8 Example of the beginning of a typical dcm file header

(0008, 0005) Specific Character Set CS: 'ISO_IR 100'

(0008, 0008) Image Type CS: ['ORIGINAL', 'PRIMARY', 'AXIAL', 'CT_SOM5 SPI']

(0008, 0012) Instance Creation Date DA: '20150127'

(0008, 0013) Instance Creation Time TM: '164740.587000'

(0008, 0016) SOP Class UID UI: CT Image Storage

(0008, 0018) SOP Instance UID UI: 1.3.6.1.4.1.9590.100.1.2.261253712234905766225279748492392506610

(0008, 0020) Study Date DA: '20150127'

(0008, 0021) Series Date DA: '20160923'

(0008, 0022) Acquisition Date DA: '20150127'

(0008, 0023) Content Date DA: '20150127'

(0008, 002a) Acquisition DateTime DT: '20150127164818.751986'

(0008, 0030) Study Time TM: '155258.755000'

(0008, 0031) Series Time TM: '130446.484542'

(0008, 0032) Acquisition Time TM: '164818.751986'

(0008, 0033) Content Time TM: '164818.751986'

(0008, 0050) Accession Number SH: u''

(0008, 0060) Modality CS: 'CT'

(0008, 0070) Manufacturer LO: u'Siemens Healthcare'

(0008, 0080) Institution Name LO: u'Siemens Healthcare GmbH MED CT'

(0008, 0081) Institution Address ST: u'Siemensstrasse 1'

(0008, 0090) Referring Physician's Name PN: u'User'

(0008, 1030) Study Description LO: u'UNKNOWN'

(0008, 103e) Series Description LO: u'CL514_BODYAXW 5.0 I31f 2 B31f 0.6 (0.6) [A,0]'

(0008, 1050) Performing Physician's Name PN: u'User'

(0008, 1090) Manufacturer's Model Name LO: u'Somatom Definition Flash'

[…]

Most of our example data contains the line that is shown in Example 3-9 in the header.

Example 3-9 Example of a typical dcm file header

(0010, 0010) Patient's Name PN: u'734_201_164'

As a first example of how to work with IBM Spectrum Discover content search policies and custom tags, we define the necessary infrastructure and categorize the data based on the Patient's Name information in the DICOM file metadata.

Creating custom metadata tags

Log on with a user ID that includes data admin or data user rights and browse to Metadata → Tags in the web UI, as shown in Figure 3-44.

Figure 3-44 Creating custom metadata tags

Click Add and create an organizational tag with the name dicom_pname and the type Characteristics, as shown in Figure 3-45.

Figure 3-45 Creating Characteristics tag

As an alternative to creating a characteristics tag with the web UI, the REST API can be used. Log on to the IBM Spectrum Discover console with an SSH client and run the commands that are shown in Example 3-10.

Example 3-10 Create characteristics tag with REST API

export SD_USER=sdadmin

export SD_PASSWORD=Passw0rd

export OVA=localhost

gettoken

tcurl_json https://localhost/policyengine/v1/tags -d '{"tag": "dicom_pname", "type": "Characteristics", "value": "[]"}' -X POST

Tag dicom_pname added

[moadmin@art3mis ~]$

Verify the successful addition of the dicom_pname tag with the command that is shown in Example 3-11.

Example 3-11 Verify characteristics tag creation with REST API

tcurl_json https://localhost/policyengine/v1/tags/dicom_pname | jq

% Total % Received % Xferd Average Speed Time Time Time Current

Dload Upload Total Spent Left Speed

100 64 100 64 0 0 214 0 --:--:-- --:--:-- --:--:-- 214

{

"tag": "dicom_pname",

"type": "Characteristics",

"value": "[]"

}

[moadmin@art3mis ~]$

Click Metadata → Regular Expressions tab and create a regex by clicking Add Regex, as shown in Figure 3-46.

Figure 3-46 Adding regex

Because we want our regular expression to return the value that follows the actual search pattern as the value for our tag, our regex must look similar to the example that is shown in Example 3-12.

Example 3-12 Regex policy

^.*Patient.s\sName\s+[A-Z]+:\s(.*)$

Sometimes, the special characters of a regex are difficult to work with (see Figure 3-47).

Figure 3-47 Our (Redpaper) example regex
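
Before a pattern such as the one in Example 3-12 is stored, it can help to try it against a sample header line on the command line. The following is a minimal sketch and not part of the IBM Spectrum Discover tooling: the function name extract_dicom_value is hypothetical, each \s is rewritten as the POSIX class [[:space:]] so that sed -E accepts it, and we assume that the content search agent sees header text in the form that is shown in Example 3-9.

```shell
#!/bin/sh
# Hypothetical helper: apply the Patient's Name pattern to one header line
# and print what the capture group (.*) would return as the tag value.
# Nothing is printed when the line does not match the pattern.
extract_dicom_value() {
    printf '%s\n' "$1" |
        sed -nE 's/^.*Patient.s[[:space:]]Name[[:space:]]+[A-Z]+:[[:space:]](.*)$/\1/p'
}

# Sample line copied from Example 3-9.
extract_dicom_value "(0010, 0010) Patient's Name PN: u'734_201_164'"
```

For the sample line, the capture group holds everything after "PN: ", so the sketch prints u'734_201_164'.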

As an alternative to creating a regex with the web UI, the REST API can be used. Log on to the IBM Spectrum Discover console with an SSH client and run the commands that are shown in Example 3-13.

Example 3-13 Create regex with REST API

tcurl_json https://localhost/policyengine/v1/regex -d '{"regex_id": "dicom_pname", "pattern": "^.*Patient.s\sName\s+[A-Z]+:\s(.*)$", "description": "Find DICOM Patients Name"}' -X POST

Regex dicom_pname added

[moadmin@art3mis ~]$

Verify the successful addition of the dicom_pname regex with the command shown in Example 3-14.

Example 3-14 Verify regex creation with REST API

tcurl_json https://localhost/policyengine/v1/regex/dicom_pname | jq

% Total % Received % Xferd Average Speed Time Time Time Current

Dload Upload Total Spent Left Speed

100 123 100 123 0 0 425 0 --:--:-- --:--:-- --:--:-- 427

{

"pattern": "^.*Patient.s\sName\s+[A-Z]+:\s(.*)$",

"description": "Find DICOM Patients Name",

"regex_id": "dicom_pname"

}

[moadmin@art3mis ~]$

Click Metadata → Policies to create a policy by clicking Add Policy. The policy takes the Metadata Tag and the Regular Expression as input, as shown in Figure 3-48.

Figure 3-48 Creating a policy

The following information is needed to successfully create our example DICOM policy. We highlight the names that are used in our lab test environment to help map back to the fields that are shown in Figure 3-48:

•Name

Choose a meaningful name for your policy. This name is shown in the IBM Spectrum Discover Metadata → Policy window.

Lab test value: dicom_pname_pol

•Policy Type

The Policy Type can be AUTOTAG, DEEP_INSPECT, or CONTENT SEARCH. For our example, we want the CONTENT SEARCH agent to work with our data. Select CONTENT SEARCH.

Pre-completed from previous window selection: CONTENT SEARCH

•Filter

This filter value controls the range of your data to which this policy is applied. Choose the filter carefully to ensure that the policy is applied only to the part of your data that you intend it for. A narrow filter has a positive effect on performance and reduces policy runtime. In our example, we apply the policy to all *.dcm files.

Lab test value: filetype='dcm'

•Agent

This action agent works with the policy. The action agent for content search is included in the IBM Spectrum Discover documentation. It must be installed before this step.

Lab test value: contentsearchagent

•Tag

This is the tag with which the policy works. We created it as shown in Figure 3-45. The value of the DICOM metadata is written into this tag.

Lab test value: dicom_pname

•Search Expression

This menu allows you to select one or multiple regex from the list.

Lab test value: dicom_pname

•Value

The setting for Value can be True/False or Value matching expression. We are extracting the Patient's Name from the .dcm file; therefore, we select Value matching expression.

Lab test value: Value matching expression

Choose to activate the policy while creating it. You can also add a schedule to the policy to run it repeatedly.

Click Save to save the policy.

The UI returns to the Metadata → Policies window and shows the running policy, as shown in Figure 3-49 on page 74.

As an alternative to creating a policy with the web UI, the REST API can be used. Log on to the IBM Spectrum Discover console with an SSH client and prepare a JavaScript Object Notation (JSON) file, as shown in Example 3-15.

Example 3-15 Create JSON file

cat dicom_pname_pol.json | jq

{

"pol_id": "dicom_pname_pol",

"action_id": "CONTENTSEARCH",

"action_params": {

"agent": "contentsearchagent",

"search_tags": [

{

"match_type": "value",

"tag": "dicom_pname",

"patterns": [

"dicom_pname"

]

}

]

},

"pol_filter": "filename LIKE '%dcm%'",

"schedule": "NOW",

"pol_state": "active"

}

Using jq in the command allows you to verify that the syntax of your JSON file is correct. Run the command that is shown in Example 3-16 to create the policy.

Example 3-16 Create policy with REST API

tcurl_json https://localhost:443/policyengine/v1/policies -X POST -d @dicom_pname_pol.json

Policy 'dicom_pname_pol' added

[moadmin@art3mis ~]$

Verify the successful addition of the dicom_pname_pol policy with the command that is shown in Example 3-17.

Example 3-17 Verify policy creation with REST API

tcurl_json https://localhost:443/policyengine/v1/policies/dicom_pname_pol | jq

% Total % Received % Xferd Average Speed Time Time Time Current

Dload Upload Total Spent Left Speed

100 587 100 587 0 0 1577 0 --:--:-- --:--:-- --:--:-- 1582

{

"pol_status": "running",

"action_params": "{"agent": "contentsearchagent", "search_tags": [{"patterns": ["dicom_pname"], "tag": "dicom_pname", "match_type": "value"}]}",

"pol_id": "dicom_pname_pol",

"schedule": "NOW",

"collection_list": null,

"pol_state": "active",

"execution_info": "{"submitted_count": 120000, "total_count": 512447, "start_time": "2019-06-27_22:43:29", "failed_count": 1, "completed_count": 12000}",

"explicit": "true",

"policy_owner": null,

"last_updated_by": null,

"pol_filter": "filename LIKE '%dcm%'",

"action_id": "CONTENTSEARCH"

}

[moadmin@art3mis ~]$
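
The execution_info field in this output carries enough information to estimate how far a running policy has progressed. As a rough sketch, the percentage can be computed with integer shell arithmetic. The function name policy_progress is hypothetical, and we assume that completed_count is the number of files already processed and total_count is the number of files that are selected by the filter.

```shell
#!/bin/sh
# Hypothetical helper: print completed/total as a percentage with one
# decimal place, truncated (not rounded), using integer arithmetic only.
policy_progress() {
    completed=$1
    total=$2
    if [ "$total" -le 0 ]; then
        echo "0.0"
        return
    fi
    tenths=$(( completed * 1000 / total ))
    echo "$(( tenths / 10 )).$(( tenths % 10 ))"
}

# Values taken from the execution_info field in Example 3-17.
policy_progress 12000 512447
```

With the values from Example 3-17, the sketch prints 2.3, that is, roughly 2.3% of the selected files were processed at the time of the query.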

Regardless of how the policy was created, the web UI shows the policy running, as shown in Figure 3-49.

Figure 3-49 dicom_pname_pol Policy is running

After the policy completes successfully, we can search through our newly enriched metadata. Browse to the Search page and search for the .dcm files with the search expression that is shown in Example 3-18.

Example 3-18 Search syntax for newly enriched dcm files

filetype='dcm' and dicom_pname is not null

The search resembles the example that is shown in Figure 3-50.

Figure 3-50 All *.dcm files with dicom_pname detected

Click the Funnel icon to the right of Generate Report for more information. Open the Columns row on the right side of the window to add the newly created custom metadata tag to the report, as shown in Figure 3-51.

Figure 3-51 Adding dicom_pname tag to the report

Click Apply to re-create the report with the added custom metadata tag. The newly created report is displayed, as shown in Figure 3-52.

Figure 3-52 Report with dicom_pname column

We get 10,000 hits. We create a search query for a specific patient that is named Anonymous3159, as shown in Example 3-19.

Example 3-19 Example of a dicom_pname search

dicom_pname like '%Anonymous3159%'

The report shows all of the DICOM files that belong to patient Anonymous3159, as shown in Figure 3-53.

Figure 3-53 Search results for specific patient

We get 1,284 hits. You can repeat this process to add the metadata tag dicom_imagetype to IBM Spectrum Discover. The regex is shown in Example 3-20.

Example 3-20 dicom_imagetype regex

^.*Image\sType\s+[A-Z]+:\s(.*)$

After you repeat these steps, you can search for all of the files that are of a specific imagetype, as shown in Example 3-21.

Example 3-21 Example of a combined dicom_pname and dicom_imagetype search

dicom_pname like '%Anonymous3159%' and dicom_imagetype is not null

After adding the column dicom_imagetype to the report (see Figure 3-53 on page 78), the report resembles the example that is shown in Figure 3-54.

Figure 3-54 All dcm files of patient "Anonymous3159" showing dicom_imagetype

Search for all DICOM files of patient Anonymous3159 that are of the image type ORIGINAL, SECONDARY, AXIAL. A successful search query is shown in Example 3-22.

Example 3-22 Example of a search query for combined dicom_pname and dicom_imagetype

dicom_pname like '%Anonymous3159%' and dicom_imagetype like '%ORIGINAL%SECONDARY%AXIAL%'

The report resembles the example that is shown in Figure 3-55.

Figure 3-55 All dcm files of patient Anonymous3159 of imagetype ORIGINAL SECONDARY AXIAL
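
The % characters in the search expression of Example 3-22 act as wildcards for any run of characters, so the three image type keywords must appear in this order but can be separated by quotation marks, commas, and spaces. The following sketch is an illustration only: the function like_match is hypothetical and merely mimics the wildcard behavior by translating % to the shell wildcard *, which is not how IBM Spectrum Discover evaluates a search. The first sample value is an assumption that is modeled on the Image Type line in Example 3-8.

```shell
#!/bin/sh
# Hypothetical helper: succeed when VALUE ($1) matches a LIKE-style
# PATTERN ($2), where % stands for any run of characters and _ for a
# single character.
like_match() {
    glob=$(printf '%s' "$2" | tr '%_' '*?')
    case $1 in
        $glob) return 0 ;;
        *)     return 1 ;;
    esac
}

pattern='%ORIGINAL%SECONDARY%AXIAL%'
for value in "['ORIGINAL', 'SECONDARY', 'AXIAL']" \
             "['ORIGINAL', 'PRIMARY', 'AXIAL', 'CT_SOM5 SPI']"; do
    if like_match "$value" "$pattern"; then
        echo "match:    $value"
    else
        echo "no match: $value"
    fi
done
```

Only the first sample value matches, because the second value does not contain SECONDARY between ORIGINAL and AXIAL.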

We get 1,284 hits. As a control, our counter-calculation confirms that all available files for patient Anonymous3159 are of the same DICOM Image Type. Enter a meaningful name for the report and click Submit to generate the report, as shown in Figure 3-56.

Figure 3-56 Generate report

Click the Reports icon on the left side of the IBM Spectrum Discover UI. The list with available reports is displayed, as shown in Figure 3-57.

Figure 3-57 Available Reports

Click your newly created report. Click Download Report in the Actions column, as shown in Figure 3-58. You can open the report as a .csv file directly in a third-party application; for example, Microsoft Excel. You can also download and save the report.

Figure 3-58 Download or open *.csv

The report contains the metadata tags that are available for the selected files and objects, as shown in Figure 3-59.

Figure 3-59 *.csv file in Excel with the dicom columns

Certain file types, including DICOM, can contain a large amount of metadata. Manually creating tags, regex, and policies can take some time. Next, we describe how to partially automate creating multiple tags, regex, and policies.

Start with inspecting the DICOM metadata headers.

The DICOM header looks similar to our example that is shown in Example 3-8 on page 67 at the beginning of this use case. Develop a naming scheme that is derived from the metadata headers. Our example is shown in Example 3-23.

Example 3-23 Example naming scheme for DICOM tags, regex, and policies

Specific Character Set dicom_scset

Image Type dicom_itype

Instance Creation Date dicom_icdate

Instance Creation Time dicom_ictime

SOP Class UID dicom_sopcuid

SOP Instance UID dicom_sopiuid

Study Date dicom_studate

Series Date dicom_serdate

Acquisition Date dicom_adate

Content Date dicom_contdate

Acquisition DateTime dicom_adatetime

Study Time dicom_stutime

Series Time dicom_sertime

Acquisition Time dicom_atime

Content Time dicom_conttime

Accession Number dicom_accnum

Modality dicom_moda

Manufacturer dicom_manu

Institution Name dicom_iname

Institution Address dicom_iaddress

Referring Physician's Name dicom_refname

Study Description dicom_studesc

Series Description dicom_serdesc

Performing Physician's Name dicom_physname

Manufacturer's Model Name dicom_manmodname

With this naming scheme, we can start to create a multitude of tags by using the REST API, as shown in Example 3-24.

Example 3-24 Example tag creation following a naming scheme

tcurl_json https://localhost/policyengine/v1/tags -d '{"tag": "dicom_scset", "type": "Characteristics", "value": "[]"}' -X POST

tcurl_json https://localhost/policyengine/v1/tags -d '{"tag": "dicom_itype", "type": "Characteristics", "value": "[]"}' -X POST

tcurl_json https://localhost/policyengine/v1/tags -d '{"tag": "dicom_icdate", "type": "Characteristics", "value": "[]"}' -X POST

tcurl_json https://localhost/policyengine/v1/tags -d '{"tag": "dicom_ictime", "type": "Characteristics", "value": "[]"}' -X POST

tcurl_json https://localhost/policyengine/v1/tags -d '{"tag": "dicom_sopcuid", "type": "Characteristics", "value": "[]"}' -X POST

tcurl_json https://localhost/policyengine/v1/tags -d '{"tag": "dicom_sopiuid", "type": "Characteristics", "value": "[]"}' -X POST

tcurl_json https://localhost/policyengine/v1/tags -d '{"tag": "dicom_studate", "type": "Characteristics", "value": "[]"}' -X POST

tcurl_json https://localhost/policyengine/v1/tags -d '{"tag": "dicom_serdate", "type": "Characteristics", "value": "[]"}' -X POST

tcurl_json https://localhost/policyengine/v1/tags -d '{"tag": "dicom_adate", "type": "Characteristics", "value": "[]"}' -X POST

tcurl_json https://localhost/policyengine/v1/tags -d '{"tag": "dicom_contdate", "type": "Characteristics", "value": "[]"}' -X POST

tcurl_json https://localhost/policyengine/v1/tags -d '{"tag": "dicom_adatetime", "type": "Characteristics", "value": "[]"}' -X POST

tcurl_json https://localhost/policyengine/v1/tags -d '{"tag": "dicom_stutime", "type": "Characteristics", "value": "[]"}' -X POST

tcurl_json https://localhost/policyengine/v1/tags -d '{"tag": "dicom_sertime", "type": "Characteristics", "value": "[]"}' -X POST

tcurl_json https://localhost/policyengine/v1/tags -d '{"tag": "dicom_atime", "type": "Characteristics", "value": "[]"}' -X POST

tcurl_json https://localhost/policyengine/v1/tags -d '{"tag": "dicom_conttime", "type": "Characteristics", "value": "[]"}' -X POST

tcurl_json https://localhost/policyengine/v1/tags -d '{"tag": "dicom_accnum", "type": "Characteristics", "value": "[]"}' -X POST

tcurl_json https://localhost/policyengine/v1/tags -d '{"tag": "dicom_moda", "type": "Characteristics", "value": "[]"}' -X POST

tcurl_json https://localhost/policyengine/v1/tags -d '{"tag": "dicom_manu", "type": "Characteristics", "value": "[]"}' -X POST

tcurl_json https://localhost/policyengine/v1/tags -d '{"tag": "dicom_iname", "type": "Characteristics", "value": "[]"}' -X POST

tcurl_json https://localhost/policyengine/v1/tags -d '{"tag": "dicom_iaddress", "type": "Characteristics", "value": "[]"}' -X POST

tcurl_json https://localhost/policyengine/v1/tags -d '{"tag": "dicom_refname", "type": "Characteristics", "value": "[]"}' -X POST

tcurl_json https://localhost/policyengine/v1/tags -d '{"tag": "dicom_studesc", "type": "Characteristics", "value": "[]"}' -X POST

tcurl_json https://localhost/policyengine/v1/tags -d '{"tag": "dicom_serdesc", "type": "Characteristics", "value": "[]"}' -X POST

tcurl_json https://localhost/policyengine/v1/tags -d '{"tag": "dicom_physname", "type": "Characteristics", "value": "[]"}' -X POST

tcurl_json https://localhost/policyengine/v1/tags -d '{"tag": "dicom_manmodname", "type": "Characteristics", "value": "[]"}' -X POST
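
The commands in Example 3-24 differ only in the tag name, so they can also be generated from a list. The following sketch only prints the commands (a dry run) so that they can be reviewed before they are run. The function name tag_payload and the shortened name list are assumptions for this illustration; tcurl_json is the helper that is used in the examples of this section.

```shell
#!/bin/sh
# Hypothetical helper: build the JSON body for one Characteristics tag.
tag_payload() {
    printf '{"tag": "%s", "type": "Characteristics", "value": "[]"}\n' "$1"
}

# Shortened list for illustration; extend it with the full naming scheme.
for name in dicom_scset dicom_itype dicom_icdate; do
    # Dry run: print each command instead of sending the request.
    printf "tcurl_json https://localhost/policyengine/v1/tags -d '%s' -X POST\n" "$(tag_payload "$name")"
done
```

The printed lines can be redirected into a file and run after review, which keeps the tag names in one place when the naming scheme grows.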

We also create many regex, as shown in Example 3-25.

Example 3-25 Example regex creation following a naming scheme

tcurl_json https://localhost/policyengine/v1/regex -d '{"regex_id": "dicom_scset", "pattern": "^.*Specific\sCharacter\sSet\s+[A-Z]+:\s(.*)$", "description": "Find DICOM Specific Character Set"}' -X POST

tcurl_json https://localhost/policyengine/v1/regex -d '{"regex_id": "dicom_itype", "pattern": "^.*Image\sType\s+[A-Z]+:\s(.*)$", "description": "Find DICOM Image Type"}' -X POST

tcurl_json https://localhost/policyengine/v1/regex -d '{"regex_id": "dicom_icdate", "pattern": "^.*Instance\sCreation\sDate\s+[A-Z]+:\s(.*)$", "description": "Find DICOM Instance Creation Date"}' -X POST

tcurl_json https://localhost/policyengine/v1/regex -d '{"regex_id": "dicom_ictime", "pattern": "^.*Instance\sCreation\sTime\s+[A-Z]+:\s(.*)$", "description": "Find DICOM Instance Creation Time"}' -X POST

tcurl_json https://localhost/policyengine/v1/regex -d '{"regex_id": "dicom_sopcuid", "pattern": "^.*SOP\sClass\sUID\s+[A-Z]+:\s(.*)$", "description": "Find DICOM SOP Class UID"}' -X POST

tcurl_json https://localhost/policyengine/v1/regex -d '{"regex_id": "dicom_sopiuid", "pattern": "^.*SOP\sInstance\sUID\s+[A-Z]+:\s(.*)$", "description": "Find DICOM SOP Instance UID"}' -X POST

tcurl_json https://localhost/policyengine/v1/regex -d '{"regex_id": "dicom_studate", "pattern": "^.*Study\sDate\s+[A-Z]+:\s(.*)$", "description": "Find DICOM Study Date"}' -X POST

tcurl_json https://localhost/policyengine/v1/regex -d '{"regex_id": "dicom_serdate", "pattern": "^.*Series\sDate\s+[A-Z]+:\s(.*)$", "description": "Find DICOM Series Date"}' -X POST

tcurl_json https://localhost/policyengine/v1/regex -d '{"regex_id": "dicom_adate", "pattern": "^.*Acquisition\sDate\s+[A-Z]+:\s(.*)$", "description": "Find DICOM Acquisition Date"}' -X POST

tcurl_json https://localhost/policyengine/v1/regex -d '{"regex_id": "dicom_contdate", "pattern": "^.*Content\sDate\s+[A-Z]+:\s(.*)$", "description": "Find DICOM Content Date"}' -X POST

tcurl_json https://localhost/policyengine/v1/regex -d '{"regex_id": "dicom_adatetime", "pattern": "^.*Acquisition\sDateTime\s+[A-Z]+:\s(.*)$", "description": "Find DICOM Acquisition DateTime"}' -X POST

tcurl_json https://localhost/policyengine/v1/regex -d '{"regex_id": "dicom_stutime", "pattern": "^.*Study\sTime\s+[A-Z]+:\s(.*)$", "description": "Find DICOM Study Time"}' -X POST

tcurl_json https://localhost/policyengine/v1/regex -d '{"regex_id": "dicom_sertime", "pattern": "^.*Series\sTime\s+[A-Z]+:\s(.*)$", "description": "Find DICOM Series Time"}' -X POST

tcurl_json https://localhost/policyengine/v1/regex -d '{"regex_id": "dicom_atime", "pattern": "^.*Acquisition\sTime\s+[A-Z]+:\s(.*)$", "description": "Find DICOM Acquisition Time"}' -X POST

tcurl_json https://localhost/policyengine/v1/regex -d '{"regex_id": "dicom_conttime", "pattern": "^.*Content\sTime\s+[A-Z]+:\s(.*)$", "description": "Find DICOM Content Time"}' -X POST

tcurl_json https://localhost/policyengine/v1/regex -d '{"regex_id": "dicom_accnum", "pattern": "^.*Accession\sNumber\s+[A-Z]+:\s(.*)$", "description": "Find DICOM Accession Number"}' -X POST

tcurl_json https://localhost/policyengine/v1/regex -d '{"regex_id": "dicom_moda", "pattern": "^.*Modality\s+[A-Z]+:\s(.*)$", "description": "Find DICOM Modality"}' -X POST

tcurl_json https://localhost/policyengine/v1/regex -d '{"regex_id": "dicom_manu", "pattern": "^.*Manufacturer\s+[A-Z]+:\s(.*)$", "description": "Find DICOM Manufacturer"}' -X POST

tcurl_json https://localhost/policyengine/v1/regex -d '{"regex_id": "dicom_iname", "pattern": "^.*Institution\sName\s+[A-Z]+:\s(.*)$", "description": "Find DICOM Institution Name"}' -X POST

tcurl_json https://localhost/policyengine/v1/regex -d '{"regex_id": "dicom_iaddress", "pattern": "^.*Institution\sAddress\s+[A-Z]+:\s(.*)$", "description": "Find DICOM Institution Address"}' -X POST

tcurl_json https://localhost/policyengine/v1/regex -d '{"regex_id": "dicom_refname", "pattern": "^.*Referring\sPhysician.s\sName\s+[A-Z]+:\s(.*)$", "description": "Find DICOM Referring Physicians Name"}' -X POST

tcurl_json https://localhost/policyengine/v1/regex -d '{"regex_id": "dicom_studesc", "pattern": "^.*Study\sDescription\s+[A-Z]+:\s(.*)$", "description": "Find DICOM Study Description"}' -X POST

tcurl_json https://localhost/policyengine/v1/regex -d '{"regex_id": "dicom_serdesc", "pattern": "^.*Series\sDescription\s+[A-Z]+:\s(.*)$", "description": "Find DICOM Series Description"}' -X POST

tcurl_json https://localhost/policyengine/v1/regex -d '{"regex_id": "dicom_physname", "pattern": "^.*Performing\sPhysician.s\sName\s+[A-Z]+:\s(.*)$", "description": "Find DICOM Performing Physicians Name"}' -X POST

tcurl_json https://localhost/policyengine/v1/regex -d '{"regex_id": "dicom_manmodname", "pattern": "^.*Manufacturer.s\sModel\sName\s+[A-Z]+:\s(.*)$", "description": "Find DICOM Manufacturers Model Name"}' -X POST
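
All of the patterns in Example 3-25 follow the same template: the header name with each space written as \s and each apostrophe replaced by the . wildcard, followed by the two-letter type code and a capture group for the value. The following sketch derives the pattern from a header name. The function name dicom_pattern is hypothetical, and the template is inferred from the examples in this section rather than taken from the product documentation. Header names that contain other regex special characters would need more escaping.

```shell
#!/bin/sh
# Hypothetical helper: turn a DICOM header name into the regex template
# used in Example 3-25 (space -> \s, apostrophe -> . wildcard).
dicom_pattern() {
    escaped=$(printf '%s' "$1" | sed -e "s/'/./g" -e 's/ /\\s/g')
    printf '%s%s%s\n' '^.*' "$escaped" '\s+[A-Z]+:\s(.*)$'
}

dicom_pattern "Study Date"
dicom_pattern "Referring Physician's Name"
```

The two calls print the same patterns that are used for dicom_studate and dicom_refname in Example 3-25.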

Finally, we create many policies by using the tags from Example 3-24 on page 83 and the regex from Example 3-25 on page 84. To create many JSON files, we use our helper script, as shown in Example 3-26. It creates JSON files that can be used to create policies for this specific example. The policy filter is fixed to all *.dcm files. The output is a JSON file that follows the naming scheme with an added _pol.json at the end; for example, dicom_scset_pol.json.

Example 3-26 JSON file creation helper script

cat create_json.sh

# Usage: ./create_json.sh <name>, for example ./create_json.sh dicom_scset

# Writes <name>_pol.json, a CONTENTSEARCH policy that uses <name> as tag and regex.

echo "{" > "${1}_pol.json"

echo " \"pol_id\": \"${1}_pol\"," >> "${1}_pol.json"

echo " \"action_id\": \"CONTENTSEARCH\"," >> "${1}_pol.json"

echo " \"action_params\": {" >> "${1}_pol.json"

echo " \"agent\": \"contentsearchagent\"," >> "${1}_pol.json"

echo " \"search_tags\": [" >> "${1}_pol.json"

echo " {" >> "${1}_pol.json"

echo " \"match_type\": \"value\"," >> "${1}_pol.json"

echo " \"tag\": \"${1}\"," >> "${1}_pol.json"

echo " \"patterns\": [" >> "${1}_pol.json"

echo " \"${1}\"" >> "${1}_pol.json"

echo " ]" >> "${1}_pol.json"

echo " }" >> "${1}_pol.json"

echo " ]" >> "${1}_pol.json"

echo " }," >> "${1}_pol.json"

echo " \"pol_filter\": \"filename LIKE '%dcm%'\"," >> "${1}_pol.json"

echo " \"schedule\": \"NOW\"," >> "${1}_pol.json"

echo " \"pol_state\": \"active\"" >> "${1}_pol.json"

echo "}" >> "${1}_pol.json"

You can use the jq tool to verify the syntax of your JSON files by running the command that is shown in Example 3-27.

Example 3-27 Verification of JSON syntax with jq

cat dicom_scset_pol.json | jq

{

"pol_id": "dicom_scset_pol",

"action_id": "CONTENTSEARCH",

"action_params": {

"agent": "contentsearchagent",

"search_tags": [

{

"match_type": "value",

"tag": "dicom_scset",

"patterns": [

"dicom_scset"

]

}

]

},

"pol_filter": "filename LIKE '%dcm%'",

"schedule": "NOW",

"pol_state": "active"

}

By using the naming scheme, we run the helper script to create the JSON files, as shown in Example 3-28.

Example 3-28 Calling the JSON helper script

./create_json.sh dicom_scset

./create_json.sh dicom_itype

./create_json.sh dicom_icdate

./create_json.sh dicom_ictime

./create_json.sh dicom_sopcuid

./create_json.sh dicom_sopiuid

./create_json.sh dicom_studate

./create_json.sh dicom_serdate

./create_json.sh dicom_adate

./create_json.sh dicom_contdate

./create_json.sh dicom_adatetime

./create_json.sh dicom_stutime

./create_json.sh dicom_sertime

./create_json.sh dicom_atime

./create_json.sh dicom_conttime

./create_json.sh dicom_accnum

./create_json.sh dicom_moda

./create_json.sh dicom_manu

./create_json.sh dicom_iname

./create_json.sh dicom_iaddress

./create_json.sh dicom_refname

./create_json.sh dicom_studesc

./create_json.sh dicom_serdesc

./create_json.sh dicom_physname

./create_json.sh dicom_manmodname

With the prepared JSON files, we can run the policy creation commands as shown in Example 3-29.

Example 3-29 Creating the policies

tcurl_json https://localhost:443/policyengine/v1/policies -X POST -d @dicom_scset_pol.json

tcurl_json https://localhost:443/policyengine/v1/policies -X POST -d @dicom_itype_pol.json

tcurl_json https://localhost:443/policyengine/v1/policies -X POST -d @dicom_icdate_pol.json

tcurl_json https://localhost:443/policyengine/v1/policies -X POST -d @dicom_ictime_pol.json

tcurl_json https://localhost:443/policyengine/v1/policies -X POST -d @dicom_sopcuid_pol.json

tcurl_json https://localhost:443/policyengine/v1/policies -X POST -d @dicom_sopiuid_pol.json

tcurl_json https://localhost:443/policyengine/v1/policies -X POST -d @dicom_studate_pol.json

tcurl_json https://localhost:443/policyengine/v1/policies -X POST -d @dicom_serdate_pol.json

tcurl_json https://localhost:443/policyengine/v1/policies -X POST -d @dicom_adate_pol.json

tcurl_json https://localhost:443/policyengine/v1/policies -X POST -d @dicom_contdate_pol.json

tcurl_json https://localhost:443/policyengine/v1/policies -X POST -d @dicom_adatetime_pol.json

tcurl_json https://localhost:443/policyengine/v1/policies -X POST -d @dicom_stutime_pol.json

tcurl_json https://localhost:443/policyengine/v1/policies -X POST -d @dicom_sertime_pol.json

tcurl_json https://localhost:443/policyengine/v1/policies -X POST -d @dicom_atime_pol.json

tcurl_json https://localhost:443/policyengine/v1/policies -X POST -d @dicom_conttime_pol.json

tcurl_json https://localhost:443/policyengine/v1/policies -X POST -d @dicom_accnum_pol.json

tcurl_json https://localhost:443/policyengine/v1/policies -X POST -d @dicom_moda_pol.json

tcurl_json https://localhost:443/policyengine/v1/policies -X POST -d @dicom_manu_pol.json

tcurl_json https://localhost:443/policyengine/v1/policies -X POST -d @dicom_iname_pol.json

tcurl_json https://localhost:443/policyengine/v1/policies -X POST -d @dicom_iaddress_pol.json

tcurl_json https://localhost:443/policyengine/v1/policies -X POST -d @dicom_refname_pol.json

tcurl_json https://localhost:443/policyengine/v1/policies -X POST -d @dicom_studesc_pol.json

tcurl_json https://localhost:443/policyengine/v1/policies -X POST -d @dicom_serdesc_pol.json

tcurl_json https://localhost:443/policyengine/v1/policies -X POST -d @dicom_physname_pol.json

tcurl_json https://localhost:443/policyengine/v1/policies -X POST -d @dicom_manmodname_pol.json

The web UI shows the running policies in the Metadata → Policies view, as shown in Figure 3-60.

Figure 3-60 Many policies running

Example 3-30 shows how to use the REST API to delete a multitude of policies.

Example 3-30 Delete a multitude of policies

tcurl_json https://localhost:443/policyengine/v1/policies/dicom_itype_pol -X DELETE

tcurl_json https://localhost:443/policyengine/v1/policies/dicom_icdate_pol -X DELETE

tcurl_json https://localhost:443/policyengine/v1/policies/dicom_ictime_pol -X DELETE

tcurl_json https://localhost:443/policyengine/v1/policies/dicom_sopcuid_pol -X DELETE

tcurl_json https://localhost:443/policyengine/v1/policies/dicom_sopiuid_pol -X DELETE

tcurl_json https://localhost:443/policyengine/v1/policies/dicom_studate_pol -X DELETE

tcurl_json https://localhost:443/policyengine/v1/policies/dicom_serdate_pol -X DELETE

tcurl_json https://localhost:443/policyengine/v1/policies/dicom_adate_pol -X DELETE

tcurl_json https://localhost:443/policyengine/v1/policies/dicom_contdate_pol -X DELETE

tcurl_json https://localhost:443/policyengine/v1/policies/dicom_adatetime_pol -X DELETE

tcurl_json https://localhost:443/policyengine/v1/policies/dicom_stutime_pol -X DELETE

tcurl_json https://localhost:443/policyengine/v1/policies/dicom_sertime_pol -X DELETE

tcurl_json https://localhost:443/policyengine/v1/policies/dicom_atime_pol -X DELETE

tcurl_json https://localhost:443/policyengine/v1/policies/dicom_conttime_pol -X DELETE

tcurl_json https://localhost:443/policyengine/v1/policies/dicom_accnum_pol -X DELETE

tcurl_json https://localhost:443/policyengine/v1/policies/dicom_moda_pol -X DELETE

tcurl_json https://localhost:443/policyengine/v1/policies/dicom_manu_pol -X DELETE

tcurl_json https://localhost:443/policyengine/v1/policies/dicom_iname_pol -X DELETE

tcurl_json https://localhost:443/policyengine/v1/policies/dicom_iaddress_pol -X DELETE

tcurl_json https://localhost:443/policyengine/v1/policies/dicom_refname_pol -X DELETE

tcurl_json https://localhost:443/policyengine/v1/policies/dicom_studesc_pol -X DELETE

tcurl_json https://localhost:443/policyengine/v1/policies/dicom_serdesc_pol -X DELETE

tcurl_json https://localhost:443/policyengine/v1/policies/dicom_physname_pol -X DELETE

tcurl_json https://localhost:443/policyengine/v1/policies/dicom_manmodname_pol -X DELETE
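
Because the policy names follow the naming scheme, the list in Example 3-30 can also be produced by a loop. The following sketch prints the commands by default and runs them only when DRY_RUN is set to 0, which leaves room for a review before anything is deleted. The variable name DRY_RUN, the function name delete_policy, and the shortened name list are assumptions for this illustration.

```shell
#!/bin/sh
# Assumed convention: DRY_RUN=1 (default) prints commands, DRY_RUN=0 runs them.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: delete the policy that belongs to one scheme name.
delete_policy() {
    url="https://localhost:443/policyengine/v1/policies/${1}_pol"
    if [ "$DRY_RUN" = "1" ]; then
        echo "tcurl_json $url -X DELETE"
    else
        tcurl_json "$url" -X DELETE
    fi
}

# Shortened list for illustration; extend it with the full naming scheme.
for name in dicom_itype dicom_icdate dicom_ictime; do
    delete_policy "$name"
done
```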

3.4 Summary

The use cases that are described in this chapter are only example scenarios where IBM Spectrum Discover can be a powerful tool to help system administrators optimize the use of storage systems, manage and enforce an organization’s data governance policies, or use deep inspection to provide insight to data contents.

The flexible and extensible architecture of IBM Spectrum Discover makes it an excellent metadata management system for almost any use case that requires greater insight into data that is contained within the physical confines of an enterprise’s data center, across clouds, or both simultaneously.
