Job to install wrapper stored procedures
The sample we used is listed in “JCL job sample” on page 180.
The in-database models are described in “IBM Netezza Analytics stored procedures” on page 190.
JCL job sample
The base we used to register the IBM Netezza Analytics Wrapper Stored Procedures is listed in Example B-1.
Example B-1 JCL job sample to install wrapper stored procedures
//INZADDL JOB <job parameters>
//*********************************************************************
//* JOB NAME = INZADDL
//*
//* Licensed Materials - Property of IBM
//* 5697-DA5
//* (C) COPYRIGHT IBM Corp. 2015.
//*
//* US Government Users Restricted Rights
//* Use, duplication or disclosure restricted by GSA ADP Schedule
//* Contract with IBM Corporation
//*
//* DISCLAIMER OF WARRANTIES :
//* Permission is granted to copy and modify this Sample code provided
//* that both the copyright notice,- and this permission notice and
//* warranty disclaimer appear in all copies and modified versions.
//*
//* THIS SAMPLE CODE IS LICENSED TO YOU AS-IS.
//* IBM AND ITS SUPPLIERS AND LICENSORS DISCLAIM ALL WARRANTIES,
//* EITHER EXPRESS OR IMPLIED, IN SUCH SAMPLE CODE, INCLUDING THE
//* WARRANTY OF NON-INFRINGEMENT AND THE IMPLIED WARRANTIES OF
//* MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT
//* WILL IBM OR ITS LICENSORS OR SUPPLIERS BE LIABLE FOR ANY DAMAGES
//* ARISING OUT OF THE USE OF OR INABILITY TO USE THE SAMPLE CODE OR
//* COMBINATION OF THE SAMPLE CODE WITH ANY OTHER CODE. IN NO EVENT
//* SHALL IBM OR ITS LICENSORS AND SUPPLIERS BE LIABLE FOR ANY LOST
//* REVENUE, LOST PROFITS OR DATA, OR FOR DIRECT, INDIRECT, SPECIAL,
//* CONSEQUENTIAL,INCIDENTAL OR PUNITIVE DAMAGES, HOWEVER CAUSED AND
//* REGARDLESS OF THE THEORY OF LIABILITY,-, EVEN IF IBM OR ITS
//* LICENSORS OR SUPPLIERS HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH
//* DAMAGES.
//*
//* STATUS = Version 5
//*
//* DESCRIPTION: This JCL creates DB2 for z/OS objects
//* required by
//* IBM DB2 Analytics Accelerator
//* for z/OS, Version 05.01.00.
//*
//*********************************************************************
//*
//* Pseudocode =
//* INZACO0 Step Create INZA Wrapper Procedures and
//* grant execute to PUBLIC.
//*
//* Notes =
//* PRIOR TO RUNNING THIS JOB, customize it for your system:
//* (1) Review GRANT statements issued by this job and
//* change GRANT TO PUBLIC if needed.
//* (2) Add a valid job card
//* (3) Locate and change all occurrences of the following strings
//* as indicated:
//* (A) The subsystem name '!DSN!' to the name of your DB2
//* (B) 'DSN!!0' to the prefix of the target library for DB2
//* (C) 'DSNTIA!!' to the plan name for DSNTIAD on your DB2
//* (D) '!WLMENV!' to the name of the WLM environment for
//* the stored procedures
//*
//*********************************************************************
//JOBLIB DD DISP=SHR,
// DSN=DSN!!0.SDSNLOAD
//*
//INZACO0 EXEC PGM=IKJEFT01,DYNAMNBR=20,COND=(4,LT)
//SYSTSPRT DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//SYSTSIN DD *
DSN SYSTEM(!DSN!)
RUN PROGRAM(DSNTIAD) PLAN(DSNTIA!!) -
LIB('DSN!!0.RUNLIB.LOAD')
END
//SYSIN DD *
CREATE PROCEDURE "INZA"."DECTREE" (
IN ACCELERATOR_NAME VARCHAR(128) CCSID UNICODE FOR MIXED DATA,
IN PARAMETER CLOB(65536) CCSID UNICODE FOR MIXED DATA,
OUT RETVAL INTEGER,
INOUT MESSAGE CLOB(65536) CCSID UNICODE FOR MIXED DATA)
DYNAMIC RESULT SETS 2
EXTERNAL NAME "AQT02ISP"
LANGUAGE C
MODIFIES SQL DATA
PARAMETER STYLE DB2SQL
FENCED
WLM ENVIRONMENT !WLMENV!
ASUTIME NO LIMIT
COLLID SYSACCEL
RUN OPTIONS
'TRAP(ON),HEAP(32K,32K,ANY,KEEP,8K,4K),POSIX(ON),
STACK(128K,128K,ANY,KEEP,512K,128K),BELOW(4K,4K,FREE),
XPLINK(ON),ALL31(ON),ENVAR("_CEE_ENVFILE_S=DD:AQTENV")'
STAY RESIDENT NO
COMMIT ON RETURN N O
PROGRAM TYPE MAIN
SECURITY USER
PARAMETER CCSID UNICODE;
GRANT EXECUTE ON PROCEDURE "INZA"."DECTREE" TO PUBLIC;
 
CREATE PROCEDURE "INZA"."DROP_ALL_MODELS" (
IN ACCELERATOR_NAME VARCHAR(128) CCSID UNICODE FOR MIXED DATA,
IN PARAMETER CLOB(65536) CCSID UNICODE FOR MIXED DATA,
OUT RETVAL SMALLINT,
INOUT MESSAGE CLOB(65536) CCSID UNICODE FOR MIXED DATA)
DYNAMIC RESULT SETS 2
EXTERNAL NAME "AQT02ISP"
LANGUAGE C
MODIFIES SQL DATA
PARAMETER STYLE DB2SQL
FENCED
WLM ENVIRONMENT !WLMENV!
ASUTIME NO LIMIT
COLLID SYSACCEL
RUN OPTIONS
'TRAP(ON),HEAP(32K,32K,ANY,KEEP,8K,4K),POSIX(ON),
STACK(128K,128K,ANY,KEEP,512K,128K),BELOW(4K,4K,FREE),
XPLINK(ON),ALL31(ON),ENVAR("_CEE_ENVFILE_S=DD:AQTENV")'
STAY RESIDENT NO
COMMIT ON RETURN NO
PROGRAM TYPE MAIN
SECURITY USER
PARAMETER CCSID UNICODE;
GRANT EXECUTE ON PROCEDURE "INZA"."DROP_ALL_MODELS" TO PUBLIC;
 
CREATE PROCEDURE "INZA"."DROP_MODEL" (
IN ACCELERATOR_NAME VARCHAR(128) CCSID UNICODE FOR MIXED DATA,
IN PARAMETER CLOB(65536) CCSID UNICODE FOR MIXED DATA,
OUT RETVAL INT,
INOUT MESSAGE CLOB(65536) CCSID UNICODE FOR MIXED DATA)
DYNAMIC RESULT SETS 2
EXTERNAL NAME "AQT02ISP"
LANGUAGE C
MODIFIES SQL DATA
PARAMETER STYLE DB2SQL
FENCED
WLM ENVIRONMENT !WLMENV!
ASUTIME NO LIMIT
COLLID SYSACCEL
RUN OPTIONS
'TRAP(ON),HEAP(32K,32K,ANY,KEEP,8K,4K),POSIX(ON),
STACK(128K,128K,ANY,KEEP,512K,128K),BELOW(4K,4K,FREE),
XPLINK(ON),ALL31(ON),ENVAR("_CEE_ENVFILE_S=DD:AQTENV")'
STAY RESIDENT NO
COMMIT ON RETURN NO
PROGRAM TYPE MAIN
SECURITY USER
PARAMETER CCSID UNICODE;
GRANT EXECUTE ON PROCEDURE "INZA"."DROP_MODEL" TO PUBLIC;
 
CREATE PROCEDURE "INZA"."GROW_DECTREE" (
IN ACCELERATOR_NAME VARCHAR(128) CCSID UNICODE FOR MIXED DATA,
IN PARAMETER CLOB(65536) CCSID UNICODE FOR MIXED DATA,
OUT RETVAL INTEGER,
INOUT MESSAGE CLOB(65536) CCSID UNICODE FOR MIXED DATA)
DYNAMIC RESULT SETS 2
EXTERNAL NAME "AQT02ISP"
LANGUAGE C
MODIFIES SQL DATA
PARAMETER STYLE DB2SQL
FENCED
WLM ENVIRONMENT !WLMENV!
ASUTIME NO LIMIT
COLLID SYSACCEL
RUN OPTIONS
'TRAP(ON),HEAP(32K,32K,ANY,KEEP,8K,4K),POSIX(ON),
STACK(128K,128K,ANY,KEEP,512K,128K),BELOW(4K,4K,FREE),
XPLINK(ON),ALL31(ON),ENVAR("_CEE_ENVFILE_S=DD:AQTENV")'
STAY RESIDENT NO
COMMIT ON RETURN NO
PROGRAM TYPE MAIN
SECURITY USER
PARAMETER CCSID UNICODE;
GRANT EXECUTE ON PROCEDURE "INZA"."GROW_DECTREE" TO PUBLIC;
 
CREATE PROCEDURE "INZA"."GROW_REGTREE" (
IN ACCELERATOR_NAME VARCHAR(128) CCSID UNICODE FOR MIXED DATA,
IN PARAMETER CLOB(65536) CCSID UNICODE FOR MIXED DATA,
OUT RETVAL INTEGER,
INOUT MESSAGE CLOB(65536) CCSID UNICODE FOR MIXED DATA)
DYNAMIC RESULT SETS 2
EXTERNAL NAME "AQT02ISP"
LANGUAGE C
MODIFIES SQL DATA
PARAMETER STYLE DB2SQL
FENCED
WLM ENVIRONMENT !WLMENV!
ASUTIME NO LIMIT
COLLID SYSACCEL
RUN OPTIONS
'TRAP(ON),HEAP(32K,32K,ANY,KEEP,8K,4K),POSIX(ON),
STACK(128K,128K,ANY,KEEP,512K,128K),BELOW(4K,4K,FREE),
XPLINK(ON),ALL31(ON),ENVAR("_CEE_ENVFILE_S=DD:AQTENV")'
STAY RESIDENT NO
COMMIT ON RETURN NO
PROGRAM TYPE MAIN
SECURITY USER
PARAMETER CCSID UNICODE;
GRANT EXECUTE ON PROCEDURE "INZA"."GROW_REGTREE" TO PUBLIC;
 
CREATE PROCEDURE "INZA"."KMEANS" (
IN ACCELERATOR_NAME VARCHAR(128) CCSID UNICODE FOR MIXED DATA,
IN PARAMETER CLOB(65536) CCSID UNICODE FOR MIXED DATA,
OUT RETVAL BIGINT,
INOUT MESSAGE CLOB(65536) CCSID UNICODE FOR MIXED DATA)
DYNAMIC RESULT SETS 2
EXTERNAL NAME "AQT02ISP"
LANGUAGE C
MODIFIES SQL DATA
PARAMETER STYLE DB2SQL
FENCED
WLM ENVIRONMENT !WLMENV!
ASUTIME NO LIMIT
COLLID SYSACCEL
RUN OPTIONS
'TRAP(ON),HEAP(32K,32K,ANY,KEEP,8K,4K),POSIX(ON),
STACK(128K,128K,ANY,KEEP,512K,128K),BELOW(4K,4K,FREE),
XPLINK(ON),ALL31(ON),ENVAR("_CEE_ENVFILE_S=DD:AQTENV")'
STAY RESIDENT NO
COMMIT ON RETURN NO
PROGRAM TYPE MAIN
SECURITY USER
PARAMETER CCSID UNICODE;
GRANT EXECUTE ON PROCEDURE "INZA"."KMEANS" TO PUBLIC;
 
CREATE PROCEDURE "INZA"."LIST_MODELS" (
IN ACCELERATOR_NAME VARCHAR(128) CCSID UNICODE FOR MIXED DATA,
IN PARAMETER CLOB(65536) CCSID UNICODE FOR MIXED DATA,
OUT RETVAL INTEGER,
INOUT MESSAGE CLOB(65536) CCSID UNICODE FOR MIXED DATA)
DYNAMIC RESULT SETS 2
EXTERNAL NAME "AQT02ISP"
LANGUAGE C
MODIFIES SQL DATA
PARAMETER STYLE DB2SQL
FENCED
WLM ENVIRONMENT !WLMENV!
ASUTIME NO LIMIT
COLLID SYSACCEL
RUN OPTIONS
'TRAP(ON),HEAP(32K,32K,ANY,KEEP,8K,4K),POSIX(ON),
STACK(128K,128K,ANY,KEEP,512K,128K),BELOW(4K,4K,FREE),
XPLINK(ON),ALL31(ON),ENVAR("_CEE_ENVFILE_S=DD:AQTENV")'
STAY RESIDENT NO
COMMIT ON RETURN NO
PROGRAM TYPE MAIN
SECURITY USER
PARAMETER CCSID UNICODE;
GRANT EXECUTE ON PROCEDURE "INZA"."LIST_MODELS" TO PUBLIC;
 
CREATE PROCEDURE "INZA"."MODEL_EXISTS" (
IN ACCELERATOR_NAME VARCHAR(128) CCSID UNICODE FOR MIXED DATA,
IN PARAMETER CLOB(65536) CCSID UNICODE FOR MIXED DATA,
OUT RETVAL INT,
INOUT MESSAGE CLOB(65536) CCSID UNICODE FOR MIXED DATA)
DYNAMIC RESULT SETS 2
EXTERNAL NAME "AQT02ISP"
LANGUAGE C
MODIFIES SQL DATA
PARAMETER STYLE DB2SQL
FENCED
WLM ENVIRONMENT !WLMENV!
ASUTIME NO LIMIT
COLLID SYSACCEL
RUN OPTIONS
'TRAP(ON),HEAP(32K,32K,ANY,KEEP,8K,4K),POSIX(ON),
STACK(128K,128K,ANY,KEEP,512K,128K),BELOW(4K,4K,FREE),
XPLINK(ON),ALL31(ON),ENVAR("_CEE_ENVFILE_S=DD:AQTENV")'
STAY RESIDENT NO
COMMIT ON RETURN NO
PROGRAM TYPE MAIN
SECURITY USER
PARAMETER CCSID UNICODE;
GRANT EXECUTE ON PROCEDURE "INZA"."MODEL_EXISTS" TO PUBLIC;
 
CREATE PROCEDURE "INZA"."NAIVEBAYES" (
IN ACCELERATOR_NAME VARCHAR(128) CCSID UNICODE FOR MIXED DATA,
IN PARAMETER CLOB(65536) CCSID UNICODE FOR MIXED DATA,
OUT RETVAL INTEGER,
INOUT MESSAGE CLOB(65536) CCSID UNICODE FOR MIXED DATA)
DYNAMIC RESULT SETS 2
EXTERNAL NAME "AQT02ISP"
LANGUAGE C
MODIFIES SQL DATA
PARAMETER STYLE DB2SQL
FENCED
WLM ENVIRONMENT !WLMENV!
ASUTIME NO LIMIT
COLLID SYSACCEL
RUN OPTIONS
'TRAP(ON),HEAP(32K,32K,ANY,KEEP,8K,4K),POSIX(ON),
STACK(128K,128K,ANY,KEEP,512K,128K),BELOW(4K,4K,FREE),
XPLINK(ON),ALL31(ON),ENVAR("_CEE_ENVFILE_S=DD:AQTENV")'
STAY RESIDENT NO
COMMIT ON RETURN NO
PROGRAM TYPE MAIN
SECURITY USER
PARAMETER CCSID UNICODE;
GRANT EXECUTE ON PROCEDURE "INZA"."NAIVEBAYES" TO PUBLIC;
 
CREATE PROCEDURE "INZA"."PMML_MODEL" (
IN ACCELERATOR_NAME VARCHAR(128) CCSID UNICODE FOR MIXED DATA,
IN PARAMETER CLOB(65536) CCSID UNICODE FOR MIXED DATA,
OUT RETVAL INTEGER,
INOUT MESSAGE CLOB(65536) CCSID UNICODE FOR MIXED DATA)
DYNAMIC RESULT SETS 2
EXTERNAL NAME "AQT02ISP"
LANGUAGE C
MODIFIES SQL DATA
PARAMETER STYLE DB2SQL
FENCED
WLM ENVIRONMENT !WLMENV!
ASUTIME NO LIMIT
COLLID SYSACCEL
RUN OPTIONS
'TRAP(ON),HEAP(32K,32K,ANY,KEEP,8K,4K),POSIX(ON),
STACK(128K,128K,ANY,KEEP,512K,128K),BELOW(4K,4K,FREE),
XPLINK(ON),ALL31(ON),ENVAR("_CEE_ENVFILE_S=DD:AQTENV")'
STAY RESIDENT NO
COMMIT ON RETURN NO
PROGRAM TYPE MAIN
SECURITY USER
PARAMETER CCSID UNICODE;
GRANT EXECUTE ON PROCEDURE "INZA"."PMML_MODEL" TO PUBLIC;
 
CREATE PROCEDURE "INZA"."PREDICT_DECTREE" (
IN ACCELERATOR_NAME VARCHAR(128) CCSID UNICODE FOR MIXED DATA,
IN PARAMETER CLOB(65536) CCSID UNICODE FOR MIXED DATA,
OUT RETVAL BIGINT,
INOUT MESSAGE CLOB(65536) CCSID UNICODE FOR MIXED DATA)
DYNAMIC RESULT SETS 2
EXTERNAL NAME "AQT02ISP"
LANGUAGE C
MODIFIES SQL DATA
PARAMETER STYLE DB2SQL
FENCED
WLM ENVIRONMENT !WLMENV!
ASUTIME NO LIMIT
COLLID SYSACCEL
RUN OPTIONS
'TRAP(ON),HEAP(32K,32K,ANY,KEEP,8K,4K),POSIX(ON),
STACK(128K,128K,ANY,KEEP,512K,128K),BELOW(4K,4K,FREE),
XPLINK(ON),ALL31(ON),ENVAR("_CEE_ENVFILE_S=DD:AQTENV")'
STAY RESIDENT NO
COMMIT ON RETURN NO
PROGRAM TYPE MAIN
SECURITY USER
PARAMETER CCSID UNICODE;
GRANT EXECUTE ON PROCEDURE "INZA"."PREDICT_DECTREE" TO PUBLIC;
 
CREATE PROCEDURE "INZA"."PREDICT_KMEANS" (
IN ACCELERATOR_NAME VARCHAR(128) CCSID UNICODE FOR MIXED DATA,
IN PARAMETER CLOB(65536) CCSID UNICODE FOR MIXED DATA,
OUT RETVAL BIGINT,
INOUT MESSAGE CLOB(65536) CCSID UNICODE FOR MIXED DATA)
DYNAMIC RESULT SETS 2
EXTERNAL NAME "AQT02ISP"
LANGUAGE C
MODIFIES SQL DATA
PARAMETER STYLE DB2SQL
FENCED
WLM ENVIRONMENT !WLMENV!
ASUTIME NO LIMIT
COLLID SYSACCEL
RUN OPTIONS
'TRAP(ON),HEAP(32K,32K,ANY,KEEP,8K,4K),POSIX(ON),
STACK(128K,128K,ANY,KEEP,512K,128K),BELOW(4K,4K,FREE),
XPLINK(ON),ALL31(ON),ENVAR("_CEE_ENVFILE_S=DD:AQTENV")'
STAY RESIDENT NO
COMMIT ON RETURN NO
PROGRAM TYPE MAIN
SECURITY USER
PARAMETER CCSID UNICODE;
GRANT EXECUTE ON PROCEDURE "INZA"."PREDICT_KMEANS" TO PUBLIC;
 
CREATE PROCEDURE "INZA"."PREDICT_NAIVEBAYES" (
IN ACCELERATOR_NAME VARCHAR(128) CCSID UNICODE FOR MIXED DATA,
IN PARAMETER CLOB(65536) CCSID UNICODE FOR MIXED DATA,
OUT RETVAL INTEGER,
INOUT MESSAGE CLOB(65536) CCSID UNICODE FOR MIXED DATA)
DYNAMIC RESULT SETS 2
EXTERNAL NAME "AQT02ISP"
LANGUAGE C
MODIFIES SQL DATA
PARAMETER STYLE DB2SQL
FENCED
WLM ENVIRONMENT !WLMENV!
ASUTIME NO LIMIT
COLLID SYSACCEL
RUN OPTIONS
'TRAP(ON),HEAP(32K,32K,ANY,KEEP,8K,4K),POSIX(ON),
STACK(128K,128K,ANY,KEEP,512K,128K),BELOW(4K,4K,FREE),
XPLINK(ON),ALL31(ON),ENVAR("_CEE_ENVFILE_S=DD:AQTENV")'
STAY RESIDENT NO
COMMIT ON RETURN NO
PROGRAM TYPE MAIN
SECURITY USER
PARAMETER CCSID UNICODE;
GRANT EXECUTE ON PROCEDURE "INZA"."PREDICT_NAIVEBAYES" TO PUBLIC;
 
CREATE PROCEDURE "INZA"."PREDICT_REGTREE" (
IN ACCELERATOR_NAME VARCHAR(128) CCSID UNICODE FOR MIXED DATA,
IN PARAMETER CLOB(65536) CCSID UNICODE FOR MIXED DATA,
OUT RETVAL INTEGER,
INOUT MESSAGE CLOB(65536) CCSID UNICODE FOR MIXED DATA)
DYNAMIC RESULT SETS 2
EXTERNAL NAME "AQT02ISP"
LANGUAGE C
MODIFIES SQL DATA
PARAMETER STYLE DB2SQL
FENCED
WLM ENVIRONMENT !WLMENV!
ASUTIME NO LIMIT
COLLID SYSACCEL
RUN OPTIONS
'TRAP(ON),HEAP(32K,32K,ANY,KEEP,8K,4K),POSIX(ON),
STACK(128K,128K,ANY,KEEP,512K,128K),BELOW(4K,4K,FREE),
XPLINK(ON),ALL31(ON),ENVAR("_CEE_ENVFILE_S=DD:AQTENV")'
STAY RESIDENT NO
COMMIT ON RETURN NO
PROGRAM TYPE MAIN
SECURITY USER
PARAMETER CCSID UNICODE;
GRANT EXECUTE ON PROCEDURE "INZA"."PREDICT_REGTREE" TO PUBLIC;
 
CREATE PROCEDURE "INZA"."PREDICT_TWOSTEP" (
IN ACCELERATOR_NAME VARCHAR(128) CCSID UNICODE FOR MIXED DATA,
IN PARAMETER CLOB(65536) CCSID UNICODE FOR MIXED DATA,
OUT RETVAL BIGINT,
INOUT MESSAGE CLOB(65536) CCSID UNICODE FOR MIXED DATA)
DYNAMIC RESULT SETS 2
EXTERNAL NAME "AQT02ISP"
LANGUAGE C
MODIFIES SQL DATA
PARAMETER STYLE DB2SQL
FENCED
WLM ENVIRONMENT !WLMENV!
ASUTIME NO LIMIT
COLLID SYSACCEL
RUN OPTIONS
'TRAP(ON),HEAP(32K,32K,ANY,KEEP,8K,4K),POSIX(ON),
STACK(128K,128K,ANY,KEEP,512K,128K),BELOW(4K,4K,FREE),
XPLINK(ON),ALL31(ON),ENVAR("_CEE_ENVFILE_S=DD:AQTENV")'
STAY RESIDENT NO
COMMIT ON RETURN NO
PROGRAM TYPE MAIN
SECURITY USER
PARAMETER CCSID UNICODE;
GRANT EXECUTE ON PROCEDURE "INZA"."PREDICT_TWOSTEP" TO PUBLIC;
 
CREATE PROCEDURE "INZA"."PRUNE_DECTREE" (
IN ACCELERATOR_NAME VARCHAR(128) CCSID UNICODE FOR MIXED DATA,
IN PARAMETER CLOB(65536) CCSID UNICODE FOR MIXED DATA,
OUT RETVAL INTEGER,
INOUT MESSAGE CLOB(65536) CCSID UNICODE FOR MIXED DATA)
DYNAMIC RESULT SETS 2
EXTERNAL NAME "AQT02ISP"
LANGUAGE C
MODIFIES SQL DATA
PARAMETER STYLE DB2SQL
FENCED
WLM ENVIRONMENT !WLMENV!
ASUTIME NO LIMIT
COLLID SYSACCEL
RUN OPTIONS
'TRAP(ON),HEAP(32K,32K,ANY,KEEP,8K,4K),POSIX(ON),
STACK(128K,128K,ANY,KEEP,512K,128K),BELOW(4K,4K,FREE),
XPLINK(ON),ALL31(ON),ENVAR("_CEE_ENVFILE_S=DD:AQTENV")'
STAY RESIDENT NO
COMMIT ON RETURN NO
PROGRAM TYPE MAIN
SECURITY USER
PARAMETER CCSID UNICODE;
GRANT EXECUTE ON PROCEDURE "INZA"."PRUNE_DECTREE" TO PUBLIC;
 
CREATE PROCEDURE "INZA"."PRUNE_REGTREE" (
IN ACCELERATOR_NAME VARCHAR(128) CCSID UNICODE FOR MIXED DATA,
IN PARAMETER CLOB(65536) CCSID UNICODE FOR MIXED DATA,
OUT RETVAL INTEGER,
INOUT MESSAGE CLOB(65536) CCSID UNICODE FOR MIXED DATA)
DYNAMIC RESULT SETS 2
EXTERNAL NAME "AQT02ISP"
LANGUAGE C
MODIFIES SQL DATA
PARAMETER STYLE DB2SQL
FENCED
WLM ENVIRONMENT !WLMENV!
ASUTIME NO LIMIT
COLLID SYSACCEL
RUN OPTIONS
'TRAP(ON),HEAP(32K,32K,ANY,KEEP,8K,4K),POSIX(ON),
STACK(128K,128K,ANY,KEEP,512K,128K),BELOW(4K,4K,FREE),
XPLINK(ON),ALL31(ON),ENVAR("_CEE_ENVFILE_S=DD:AQTENV")'
STAY RESIDENT NO
COMMIT ON RETURN NO
PROGRAM TYPE MAIN
SECURITY USER
PARAMETER CCSID UNICODE;
GRANT EXECUTE ON PROCEDURE "INZA"."PRUNE_REGTREE" TO PUBLIC;
 
CREATE PROCEDURE "INZA"."REGTREE" (
IN ACCELERATOR_NAME VARCHAR(128) CCSID UNICODE FOR MIXED DATA,
IN PARAMETER CLOB(65536) CCSID UNICODE FOR MIXED DATA,
OUT RETVAL INTEGER,
INOUT MESSAGE CLOB(65536) CCSID UNICODE FOR MIXED DATA)
DYNAMIC RESULT SETS 2
EXTERNAL NAME "AQT02ISP"
LANGUAGE C
MODIFIES SQL DATA
PARAMETER STYLE DB2SQL
FENCED
WLM ENVIRONMENT !WLMENV!
ASUTIME NO LIMIT
COLLID SYSACCEL
RUN OPTIONS
'TRAP(ON),HEAP(32K,32K,ANY,KEEP,8K,4K),POSIX(ON),
STACK(128K,128K,ANY,KEEP,512K,128K),BELOW(4K,4K,FREE),
XPLINK(ON),ALL31(ON),ENVAR("_CEE_ENVFILE_S=DD:AQTENV")'
STAY RESIDENT NO
COMMIT ON RETURN NO
PROGRAM TYPE MAIN
SECURITY USER
PARAMETER CCSID UNICODE;
GRANT EXECUTE ON PROCEDURE "INZA"."REGTREE" TO PUBLIC;
 
CREATE PROCEDURE "INZA"."SPLIT_DATA" (
IN ACCELERATOR_NAME VARCHAR(128) CCSID UNICODE FOR MIXED DATA,
IN PARAMETER CLOB(65536) CCSID UNICODE FOR MIXED DATA,
OUT RETVAL DOUBLE,
INOUT MESSAGE CLOB(65536) CCSID UNICODE FOR MIXED DATA)
DYNAMIC RESULT SETS 2
EXTERNAL NAME "AQT02ISP"
LANGUAGE C
MODIFIES SQL DATA
PARAMETER STYLE DB2SQL
FENCED
WLM ENVIRONMENT !WLMENV!
ASUTIME NO LIMIT
COLLID SYSACCEL
RUN OPTIONS
'TRAP(ON),HEAP(32K,32K,ANY,KEEP,8K,4K),POSIX(ON),
STACK(128K,128K,ANY,KEEP,512K,128K),BELOW(4K,4K,FREE),
XPLINK(ON),ALL31(ON),ENVAR("_CEE_ENVFILE_S=DD:AQTENV")'
STAY RESIDENT NO
COMMIT ON RETURN NO
PROGRAM TYPE MAIN
SECURITY USER
PARAMETER CCSID UNICODE;
GRANT EXECUTE ON PROCEDURE "INZA"."SPLIT_DATA" TO PUBLIC;
 
CREATE PROCEDURE "INZA"."TWOSTEP" (
IN ACCELERATOR_NAME VARCHAR(128) CCSID UNICODE FOR MIXED DATA,
IN PARAMETER CLOB(65536) CCSID UNICODE FOR MIXED DATA,
OUT RETVAL BIGINT,
INOUT MESSAGE CLOB(65536) CCSID UNICODE FOR MIXED DATA)
DYNAMIC RESULT SETS 2
EXTERNAL NAME "AQT02ISP"
LANGUAGE C
MODIFIES SQL DATA
PARAMETER STYLE DB2SQL
FENCED
WLM ENVIRONMENT !WLMENV!
ASUTIME NO LIMIT
COLLID SYSACCEL
RUN OPTIONS
'TRAP(ON),HEAP(32K,32K,ANY,KEEP,8K,4K),POSIX(ON),
STACK(128K,128K,ANY,KEEP,512K,128K),BELOW(4K,4K,FREE),
XPLINK(ON),ALL31(ON),ENVAR("_CEE_ENVFILE_S=DD:AQTENV")'
STAY RESIDENT NO
COMMIT ON RETURN NO
PROGRAM TYPE MAIN
SECURITY USER
PARAMETER CCSID UNICODE;
GRANT EXECUTE ON PROCEDURE "INZA"."TWOSTEP" TO PUBLIC;
 
//*
IBM Netezza Analytics stored procedures
The following sections provide information about the five in-database models as referenced in 4.3, “Nodes supporting SQL generation for DB2 Accelerator” on page 88 and in the following publications:
IBM Netezza Analytics Release 3.2.0.0, In-Database Analytics Developer’s Guide (November 20, 2014)
IBM Netezza Analytics Release 3.0.1, IBM Netezza In-Database Analytics Reference Guide
K-means
The K-means algorithm is the most widely used clustering algorithm that uses an explicit distance measure to partition the data set into clusters.
The main concept behind the K-means algorithm is to represent each cluster by the vector of mean attribute values of all training instances assigned to that cluster, called the cluster’s center. There are direct consequences of such a cluster representation:
The algorithm handles continuous attributes only, although workarounds for discrete attributes are possible.
Both the cluster formation and cluster modeling processes can be performed in a computationally efficient way by applying the specified distance function to match instances against cluster centers.
The algorithm operates by performing several iterations of the same basic process. Each training instance is assigned to the closest cluster with respect to the specified distance function, applied to the instance and cluster center. All cluster centers are then recalculated as the mean attribute value vectors of the instances assigned to particular clusters.
The cluster centers are initialized by randomly picking k training instances, where k is the desired number of clusters. The iterative process should terminate when there are either no or sufficiently few changes in cluster assignments. In practice, however, specifying the number of iterations, typically a number in the range of 3 - 36, is sufficient.
The range of reasonable distance (or dissimilarity) measures is the same for the kNN algorithm, including all the standard difference-based measures: Euclidean, Manhattan, Canberra, and Maximum. For k-means, two additional distance measures are available: Norm_Euclidean and Mahalanobis. The Norm_Euclidean measure is the default used for k-means clustering.
This stored procedure builds a K-means clustering model. The model is saved to the database in a set of tables and registered in the database model metadata. Use the Model Management functions to further manipulate the model or access the model tables.
The clusters are built iteratively by partitioning data into k (or less) separate clusters according to their distance to the cluster center, then recalculating the cluster centers. The K-means clustering algorithm stops after <maxiter> iterations.
The output table <outtable> is created with the following columns: id, cluster_id, distance. The id column matches the <id> column of the input table. Each input table record is associated with a cluster, where the distance from the record to the cluster center is the smallest. The cluster ID and the distance to the cluster center are given in the columns cluster_id and distance.
Examples
Examples are as follows:
CALL nza..KMEANS('model=adult_mdl, intable=nza..adult,
outtable=adult_out, id=id, target=income, transform=S,
distance=euclidean, k=3, maxiter=5, randseed=12345,
idbased=false');
CALL nza..DROP_MODEL('model=adult_mdl');
CALL nza..DROP_TABLE('adult_out');
KMEANS
--------
3
(1 row)
DROP_MODEL
------------
t
(1 row)
DROP_TABLE
------------
t
(1 row)
Naive Bayes
The naive Bayes classifier is a simpler classification algorithm than most, which makes it quick and easy to apply. Although it does not compete with more sophisticated algorithms with respect to classification accuracy, in some cases it is able to deliver similar results in a fraction of the computation time.1
This stored procedure builds a naive Bayes model. The model is saved to the database in a set of tables and registered in the database model metadata. Use the Model Management functions to further manipulate the model or access the model tables.
Building a naive Bayes model works only on nominal <target> column. Only VARCHAR and NVARCHAR columns are treated as nominal attributes. Discretization is applied on numeric attributes according to the <disc> and <bins> parameters.
The output of the building process is a contingency table that gathers statistics about the class labels per attribute and attribute value. The contingency table has following columns: attributename, val, class, classvalcount, classcount, totalcount.
This algorithm also supports missing data.
Examples
Examples are as follows:
CALL nza..NAIVEBAYES('model=NB_soybean,
intable=nza..soybean_train, id=instance, target=class');
CALL nza..DROP_MODEL('model=NB_soybean');
NAIVEBAYES
------------
1881
(1 row)
DROP_MODEL
------------
t
(1 row)
CALL nza..NAIVEBAYES('model=NB_iris, intable=nza..iris,
coldeftype=cont,
incolumn=PETALWIDTH:nom;PETALLENGTH:ignore;class:nom:targ
et;id:id');
CALL nza..DROP_MODEL('model=NB_iris');
NAIVEBAYES
------------
126
(1 row)
DROP_MODEL
------------
t
(1 row)
Decision Tree
A decision tree is a hierarchical structure that represents a classification model using a “divide and conquer” approach. Internal tree nodes represent splits applied to decompose the data set into subsets, and terminal nodes, also referred to as leaves, assign class labels to sufficiently small or uniform subsets. Splits are specified by logical conditions based on selected single attributes, with a separate outgoing branch corresponding to each possible outcome.
The concept of decision tree construction is to select splits that decrease the impurity of class distribution in the resulting subsets of instances, and increase the domination of one or more classes over the others. The goal is to find a subset containing only or mostly instances of one class after a small number of splits, so that a leaf with that class label is created. This approach promotes simple trees, which typically generalize better.
Creating and using decision tree models involves three major algorithmic sub-tasks:
Decision tree growing
Decision tree pruning
Decision tree prediction
This Decision Tree stored procedure builds a decision tree model by growing and (optionally) pruning the tree. The model is saved to the database in a set of tables and registered in the database model metadata. Use the Model Management functions to further manipulate the model or access the model tables.
A top-down tree-growing algorithm is used with the following features:
Binary splits (equality-based for nominal attributes, inequality-based for continuous attributes).
No missing value handling.
Entropy or Gini index for split evaluation.
Stop criteria is satisfied at a node with a uniform class.
Stop criteria is satisfied when further splits do not improve the class impurity by at least <minimprove>.
Stop criteria is satisfied when the number of instances is less than <minsplit>.
Stop criteria is satisfied when the tree depth reaches <maxdepth>.
A bottom-up reduced error pruning algorithm is used. It bases on the prediction accuracy of the model against the validation data set. The pruning is activated when parameter <valtable> is specified.
Examples
Examples are as follows:
CALL nza..DECTREE('model=adult_tree, intable=nza..adult_train,
id=id, target=income, minsplit=1000, eval=entropy,
valtable=nza..adult_prune, qmeasure=wAcc');
CALL nza..DROP_MODEL('model=adult_tree');
DECTREE
---------
13
(1 row)
DROP_MODEL
------------
t
(1 row)
Regression Tree
Regression trees are decision trees adapted to the regression task; regression trees store numeric target attribute values instead of class labels in leaves, and use appropriately modified split selection and stop criteria.
As with decision trees, regression tree nodes decompose the data into subsets, and regression tree leaves correspond to sufficiently small or sufficiently uniform subsets. Splits are selected to decrease the dispersion of target attribute values, so that they can be reasonably well predicted by their mean values at leaves. The resulting model is piecewise-constant, with fixed predicted values assigned to regions to which the domain is decomposed by the tree structure.2
Creating and using regression tree models involves three major algorithmic subtasks:
Regression tree growing
Regression tree pruning
Regression tree prediction
This Regression Tree stored procedure builds a regression tree model. The model is saved to the database in a set of tables and registered in the database model metadata. Use the Model Management functions to further manipulate the model or access the model tables.
A top-down tree-growing algorithm is used with the following features:
Binary splits (equality-based for nominal attributes, inequality-based for continuous attributes).
No missing value handling.
Variance for split evaluation.
Stop criteria is satisfied when further splits do not improve the variance by at least <minimprove>.
Stop criteria is satisfied when the number of instances is less than <minsplit>.
Stop criteria is satisfied when the tree depth reaches <maxdepth>.
Examples
Examples are as follows:
CALL nza..GROW_REGTREE('model=wrt, intable=nza..weatherr,
id=instance, target=grade, minsplit=2, maxdepth=3');
CALL nza..DROP_MODEL('model=wrt');
GROW_REGTREE
--------------
7
(1 row)
DROP_MODEL
------------
t
(1 row)
TwoStep
TwoStep clustering is a data mining algorithm for large data sets. It is faster than traditional methods because it typically scans a data set only once before it saves the data to a clustering feature (CF) tree. TwoStep clustering can make clustering decisions without repeated data scans, whereas other clustering methods scan all data points, which requires multiple iterations. Non-uniform points are not gathered, so each iteration requires a reinspection of each data point, regardless of the significance of the data point. Because TwoStep clustering treats dense areas as a single unit and ignores pattern outliers, it provides high-quality clustering results without exceeding memory constraints.
The TwoStep algorithm has the following advantages:
It automatically determines the optimal number of clusters. You do not have to manually create a different clustering model for each number of clusters.
It detects input columns that are not useful for the clustering process. These columns are automatically set to supplementary. Statistics are gathered for these columns but they do not influence the clustering algorithm.
The configuration of the CF tree can be granular, so that you can balance between memory usage and model quality, according to the environment and needs.
To cluster data, the TwoStep clustering algorithm does the following actions:
1. The algorithm scans all data and builds a CF tree. This tree is built by arranging the input records in a way so that similar records become part of the same tree node. If a memory issue occurs, the tree is rebuilt with an increased threshold and outliers are removed.
2. The leaves of the CF tree are clustered hierarchically in memory. The clustering is done by calculating the n * (n-1) / 2 distances between each pair of leaves and merging the two clusters with the smallest distance. The process of calculating distances between clusters and merging the closest two is repeated until the root of the tree is reached. All data is contained in one cluster, thus forming a binary tree.
Starting with the root node of the binary tree, the child node with the worst quality is added. The process of adding child nodes continues until the number of clusters that is determined automatically or that is specified by the user is reached. The number of clusters that is determined automatically is also the optimal number of clusters.
3. The clustering result is refined by a final pass over the data where each record is assigned to the closest cluster. This behavior is similar to the behavior of the K-means algorithm.
This TwoStep stored procedure builds a TwoStep clustering model. The model is saved to the database in a set of tables and registered in the database model metadata. Use the Model Management functions to further manipulate the model or access the model tables.
The clustering model is built in two steps:
1. The input data records are distributed into a balanced tree according to their distance to the data records already in the tree node. The tree size is limited: outliers are removed and similar tree nodes are merged. Then, k clusters are determined out of the tree.
2. The input data records are assigned again to the nearest of the k clusters. The output table <outtable> is created with following columns: id, cluster_id, distance. The id column matches the <id> column of the input table. Each input table record is associated with a cluster, where the distance from the record to the cluster is the smallest. The cluster ID and the distance to the cluster are given in the columns cluster_id and distance.
Examples
Examples are as follows:
CALL nza..TWOSTEP('model=adult_mdl, intable=nza..adult,
id=id, target=income');
CALL nza..DROP_MODEL('model=adult_mdl');
TWOSTEP
---------
6
(1 row)
DROP_MODEL
------------
t
(1 row)
 

1 This information is from the IBM Netezza Analytics Release 3.2.0.0, In-Database Analytics Developer’s Guide.
2 This information is from IBM Netezza Analytics Release 3.2.0.0, In-Database Analytics Developer’s Guide.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.17.79.206