ODBC connectivity between SPSS Modeler and DB2 for z/OS
SPSS Modeler must communicate with DB2 for z/OS. This appendix provides some basic terminology and context and then gives two alternatives of how to configure the Open Database Connectivity (ODBC) layer. ODBC is a standard application programming interface (API) for client applications to access a database management system (DBMS).
ODBC related terms and concepts
An ODBC driver is a driver used for database connectivity. It allows the client application to communicate with the database management system (DBMS) using an interoperable protocol that is independent from the used programming language of the client application and also from the used DBMS. Basically, an ODBC driver is similar to a device driver for a piece of hardware, with the ODBC Driver Manager orchestrating the registered ODBC drivers for different data sources in a way that is similar to the operating system orchestrating the installed device drivers for the used hardware components.
An ODBC Data Source Name (DSN) represents a connection to a database. The related data structure contains information such as DB2 location name, IP address, port number, and so on.
ODBC Driver Manager
In Microsoft Windows, an ODBC Driver Manager is integrated as part of the operating system. Driver manager configuration can be started by selecting the Windows Start menu and then typing ODBC in the Search field, as shown in Figure A-1.
Figure A-1 Using the Windows Start menu to open ODBC Data Sources program
The Windows configuration program opens (Figure A-2).
Figure A-2 Windows ODBC Data Source Administrator allowing management of ODBC DSNs
If an ODBC client application (such as SPSS Modeler client or MicroStrategy) looks for ODBC data sources, it asks the ODBC Driver Manager, which manages all ODBC drivers found on the platform.
UNIX based operating systems do not include an integrated ODBC Driver Manager. SPSS SDAP contains an ODBC driver manager on UNIX platforms (based on Progress DataDirect). Other ODBC driver manager products are also available on UNIX.
ODBC gateway mode
If a DBMS is not performing the SQL query execution alone, but forwards the request to and the reply from a different DBMS, the used ODBC driver must support ODBC gateway mode. This is the case with all IBM DB2 drivers in service.
If connecting a client application to z/OS DB2 without using accelerator, gateway mode is not needed. If connecting to z/OS DB2 with accelerator, it is needed.
SPSS SDAP prior Version 7.1 shows an error similar to the following message:
[ODBC DB2 Wire Protocol driver] Unhandled CodePoint: 1C03
In SPSS Modeler, the Unhandled Codepoint message is displayed if DataDirect or SDAP ODBC driver not supporting DB2 Analytics Accelerator (with ODBC gateway mode) is used, as shown in Figure A-3.
Figure A-3 SPSS Modeler message
Although Progress DataDirect does not support gateway mode, for this book, SDAP 7.11 worked to some degree. However, if being able to request service or being able to open a defect report if something is not working, use DB2 ODBC drivers for now.
The major disadvantage of DB2 ODBC drivers compared to SDAP is that they are slightly more difficult to install.
The generation of an efficient INSERT FROM SELECT SQL statement using SPSS SDAP 7.11, and exporting data from accelerated tables to a DB2 base table, with an AOT for intermediate results in between proved unsuccessful for our tests. The ODBC driver returned an error message to the SPSS Modeler server and the SPSS Modeler failed back to using INSERT via VALUES SQL, which is significantly less efficient.
Using the IBM DB2 10.5 for a logical unit of work (LUW) ODBC driver and the INSERT FROM SELECT SQL statements, this worked as expected and used the optimized code within DB2 to copy the data from the Accelerator to a base table.
We found that when using the SPSS SDAP driver, it removes error messages returned by IBM Netezza Analytics stored procedures. So, if doing in-database modeling and using DB2 ODBC driver, the system will report issues such as the record ID attribute does not contain unique values, or some other attribute contains NULL values, although that is not supported by the used IBM Netezza Analytics algorithm. If using SPSS SDAP driver, only the information that something went wrong with the model build is reported to the user.1
To visualize this, we created a stream with invalid input data and a database modeling DB2 z/OS Decision Tree node. Figure A-4 on page 169 indicates what is displayed if using DB2 ODBC driver (in this case, the user learns from the message that the used column "AMOUNT" in the input stream either does not contain unique values or it contains NULL values). Actually that is the attribute used as the Record ID, but we deliberately chose an attribute that is not unique.
Figure A-4 Error reporting if using IBM DB2 ODBC driver
Running the same stream by using SDAP ODBC driver produced the result shown in Figure A-5.
Figure A-5 Error reporting if using IBM SDAP ODBC driver (not supported for DB2 Analytics Accelerator)
Here, the SPSS Modeler does not understand from the communication with the database that the DECTREE stored procedure failed; it reports that the model build stored procedure completed successfully.
The other two IBM Netezza Analytics stored procedures (PRUNE_DECTREE, PMML_MODEL) fail also because they cannot find a model to work on, but that is also not reported. Only at the point SPSS Modeler tries to read the generated PMML model using an SQL SELECT query, it fails with a message that the specified table does not exist.
In summary, to avoid such issues, invest the time to install DB2 ODBC. SDAP works most of the time, but (in our tests) not always, and it is not supported with IDAA.
ODBC driver used by SPSS Modeler server
The ODBC driver used is always the one available within the operating system that hosts the modeler server. Although this might be evident, it is easy to forget.
Figure A-6 shows a flow chart of database connectivity from the perspective of the SPSS Modeler.
Figure A-6 Database connectivity by SPSS Modeler
In this sample configuration, the local server (which is used by default) uses the DataDirect ODBC driver through Microsoft ODBC Driver Manager from the Windows OS; the Linux on z based SPSS Modeler server used the DB2 ODBC driver.
Installation and configuration of IBM DB2 ODBC driver
Various DB2 LUW offerings contain the IBM DB2 ODBC driver, plus the stand-alone command-line interface (CLI) driver that has a smaller footprint (IBM Data Server Driver for ODBC/CLI) but also limited functionality. For SPSS Modeler connectivity, all DB2 ODBC or CLI drivers in service should work.
The setup of the environment for this publication uses the IBM DB2 for Linux, UNIX, and Windows version 10.5 client.
Base installation and DB2 instance create
We completed the base installation of the DBMS, including the db2inst1 instance, with the commands in the following procedure, using Linux on z Systems and SUSE Linux Enterprise Server 11 Linux distribution. As user root, the base installation of DB2 LUW used for the contained ODBC driver and the DB2 Communications Database (CDB) was as follows:
1. Extracted (unpack) the DB2 package.
2. Changed to (cd) the directory containing db2_install.
3. Used defaults to run ./db2install.
4. Continued with the following commands:
a. groupadd -p 999 db2iadm1
b. useradd -u 1004 -g db2iadm1 -m -d /home/db2inst1 db2inst1
c. cd /opt/ibm/db2/V10.5/instance
d. ./db2icrt db2inst1 -s client -u db2inst1 -p db2inst1
Note that db2setup can do some of these steps in the background, however it requires X11 for its graphical user interface (GUI), and not everyone has that available for server management.
The DB2 instance is just used for connectivity to SPSS Modeler server. Although having enough reachable free space in the file system for the modeler server process is important, this DB2 instance usually does not consume significant file system storage resources (except for potential error reporting), because it does not process significant amounts of persistent data on its own.
Starting DB2
To first start DB2, change the userid to db2inst1 and type in db2start.
DB2 communication database and BIND
For every DB2 subsystem that should be reachable by SPSS Modeler server, an entry in the DB2 Communications Database is required, which corresponds to an ODBC Data Source Name (DSN), which is a data structure describing the data path and connectivity information related to that database connection.
The DB2 directory can be created as shown in Example A-1.
Example A-1 Connect from DB2 LUW to DB2 z/OS
db2inst1@zbapi:~> db2 catalog tcpip node MYNODE remote MYNODE.ibm.com server 446 ostype mvs
DB20000I The CATALOG TCPIP NODE command completed successfully.
DB21056W Directory changes may not be effective until the directory cache is
refreshed.
db2inst1@zbapi:~> db2 catalog database MYDB as MYALIAS at node MYNODE authentication server
DB20000I The CATALOG DATABASE command completed successfully.
DB21056W Directory changes may not be effective until the directory cache is
refreshed.
db2inst1@zbapi:~> db2 catalog database MYDB as MYALIAS at node MYNODE authentication server
DB20000I The CATALOG DATABASE command completed successfully.
DB21056W Directory changes may not be effective until the directory cache is
refreshed.
db2inst1@zbapi:~> db2 connect to MYDB user tsobenk
Enter current password for tsobenk:
 
Database Connection Information
 
Database server = DB2 z/OS 11.1.5
SQL authorization ID = TSOBENK
Local database alias = MYDB
Next, do a BIND by going to the DB2 subdirectory that contains the bind scripts and running the command shown in Example A-2.
Example A-2 MVS BIND from DB2 LUW
db2inst1@zbapi:~> cd /opt/ibm/db2/V10.5/bnd
db2inst1@zbapi:~> db2 bind '@ddcsmvs.lst' blocking all sqlerror
continue grant public action add
 
LINE MESSAGES FOR ddcsmvs.lst
------ --------------------------------------------------------------------
SQL0061W The binder is in progress.
 
LINE MESSAGES FOR db2clist.bnd
------ --------------------------------------------------------------------
SQL0038W The bind option SQLERROR CONTINUE has been
activated since it is required when binding this DB2-supplied
 
list file to DB2/MVS, SQL/DS, or OS/400.
SQL0038W The bind option SQLERROR CONTINUE has been
activated since it is required when binding this DB2-supplied
 
list file to DB2/MVS, SQL/DS, or OS/400.
 
LINE MESSAGES FOR ddcsmvs.lst
------ --------------------------------------------------------------------
SQL0091N Binding was ended with "0" errors and "2" warnings.
DB2 Connect license
If you see error message SQL1598N (An attempt to connect to the database server failed because of a licensing problem), you must purchase and activate an IBM DB2 Connect™ license. If you installed an traditional DB2 version, you can use the db2licm utility to install the license, as shown in Example A-3.
Example A-3 Installing the DB2 Connect license
db2licm -a db2consv_zs.lic
SPSS Modeler configuration for use of DB2 ODBC driver
For SPSS Modeler, a requirement is that some DB2 libraries must be found by the dynamic loader. On current Linux distributions, this can be achieved by adding a db2.conf file to the /etc/ld.so.conf.d/ subdirectory (as user root), with the contents pointing to the 64-bit library folder in DB2, as shown in Example A-4.
Example A-4 Contents of /etc/ld.so.conf.d/db2.conf file
/opt/ibm/db2/V10.5/lib64/
Next, complete the following steps:
1. As root, use the ldconfig command.
2. In the modelersrv.sh file, add lines similar to those shown in Example A-5.
Example A-5 Part of modelersrv.sh file in SPSS Modeler server directory
DB2_INSTALL_PATH=/opt/ibm/db2/V10.5
export DB2_INSTALL_PATH
DB2_INSTANCE_HOME=/home/db2inst1
export DB2_INSTANCE_HOME
# ...
ODBC_DRIVER_MANAGER_PATH=/opt/ibm/db2/V10.5/lib64/
3. Go to the SPSS Modeler server installation directory
(the default is /usr/IBM/SPSS/ModelerServer/17.1 for version 17.1).
4. Go to the bin/ subdirectory.
5. Change the used ODBC library shown in Example A-6.
Example A-6 Changing symbolic links to the used SPSS ODBC library (for DB2 ODBC driver)
rm libspssodbc.so
ln -s libspssodbc_db2cli.so libspssodbc.so
6. Verify that all shared libraries are correctly resolved, similar to Example A-7.
Example A-7 Use "ldd" to check whether all libraries used by "libspssodbc.so" are found
zbapi:/usr/IBM/SPSS/ModelerServer/17.1/bin # ldd libspssodbc.so
libdb2o.so.1 => /opt/ibm/db2/V10.5/lib64/libdb2o.so.1 (0x000003fffb59d000)
libdl.so.2 => /lib64/libdl.so.2 (0x000003fffb598000)
libunicodelib.so => /data/IBM/SPSS/ModelerServer/17.1/bin/./libunicodelib.so (0x000003fffb58f000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x000003fffb43c000)
libm.so.6 => /lib64/libm.so.6 (0x000003fffb3a2000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x000003fffb38f000)
libc.so.6 => /lib64/libc.so.6 (0x000003fffb204000)
libcrypt.so.1 => /lib64/libcrypt.so.1 (0x000003fffb1c5000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x000003fffb1a6000)
libpam.so.0 => /lib64/libpam.so.0 (0x000003fffb195000)
libdb2dascmn.so.1 => /opt/ibm/db2/V10.5/lib64/libdb2dascmn.so.1 (0x000003fffb178000)
libdb2g11n.so.1 => /opt/ibm/db2/V10.5/lib64/libdb2g11n.so.1 (0x000003fffab03000)
libdb2genreg.so.1 => /opt/ibm/db2/V10.5/lib64/libdb2genreg.so.1 (0x000003fffaac8000)
libdb2install.so.1 => /opt/ibm/db2/V10.5/lib64/libdb2install.so.1 (0x000003fffaac1000)
libdb2locale.so.1 => /opt/ibm/db2/V10.5/lib64/libdb2locale.so.1 (0x000003fffaa9c000)
libdb2osse.so.1 => /opt/ibm/db2/V10.5/lib64/libdb2osse.so.1 (0x000003fffa56d000)
libdb2osse_db2.so.1 => /opt/ibm/db2/V10.5/lib64/libdb2osse_db2.so.1 (0x000003fffa516000)
libdb2sdbin.so.1 => /opt/ibm/db2/V10.5/lib64/libdb2sdbin.so.1 (0x000003fffa48d000)
libdb2trcapi.so.1 => /opt/ibm/db2/V10.5/lib64/libdb2trcapi.so.1 (0x000003fffa480000)
/lib/ld64.so.1 (0x000002aace099000)
libicuuc.so.51 => /data/IBM/SPSS/ModelerServer/17.1/bin/./libicuuc.so.51 (0x000003fffa2d6000)
libaudit.so.0 => /lib64/libaudit.so.0 (0x000003fffa2ba000)
librt.so.1 => /lib64/librt.so.1 (0x000003fffa2ae000)
libicudata.so.51 => /data/IBM/SPSS/ModelerServer/17.1/bin/./libicudata.so.51 (0x000003fff8d63000)
7. If no errors are produced, start (or restart) the SPSS Modeler server.
After connecting to this server using SPSS Modeler client, you should see the DB2 ODBC based data sources.
Installation and configuration of SPSS SDAP driver
Use of SDAP driver is not yet officially supported. With SDAP 7.1, we know DB2 Analytics Accelerator does not work because of the missing support for ODBC gateway mode. Although SDAP 7.11 partially works, it is still unsupported. Setup is slightly less complicated than what is required for the IBM DB2 ODBC driver though, which is fully serviced. Issues we found with SPSS SDAP are that it is significantly slower in some special cases and that an important part of the error reporting is removed; as a result, developing SPSS streams doing in-database modeling is more difficult.
To install and configure the SPSS SDAP driver, complete the following steps:
1. Go to the SPSS Modeler server installation directory (for version 17.1, the default is /usr/IBM/SPSS/ModelerServer/17.1).
2. Go to the bin/ subdirectory.
3. Change the used ODBC library, as shown in Example A-8.
Example A-8 Changing symbolic links to the used SPSS ODBC library (for SPSS SDAP or Progress DataDirect ODBC driver)
rm libspssodbc.so
ln -s libspssodbc_datadirect_utf16.so libspssodbc.so
4. In the modelersrv.sh file, make sure you source the odbc.sh file from the SDAP package, as shown in Example A-9 (assuming SDAP is installed to /root/SDAP711/).
Example A-9 File "modelersrv.sh" needs to source "SDAP711/odbc.sh" file
. /root/SDAP711/odbc.sh
5. In the SDAP directory, run the setodbcpath.sh script, then edit the odbc.ini file. In our environment, we used the file shown in Example A-10.
Example A-10 Sample odbc.ini file used for SDAP ODBC
ODBC Data Sources]
CB01=Montpellier CB01 - EBCDIC - z/OS DB2
 
[ODBC]
IANAAppCodePage=4
InstallDir=/root/SDAP711
Trace=0
TraceFile=odbctrace.out
TraceDll=/root/SDAP711/lib/XEtrc27.so
 
[CB01]
Driver=/root/SDAP711/lib/XEdb227.so
Description=IBM Corp. 7.1 DB2 Wire Protocol
AccountingInfo=
AddStringToCreateTable=
AlternateID=
AlternateServers=
ApplicationName=
ApplicationUsingThreads=1
AuthenticationMethod=0
BulkBinaryThreshold=32
BulkCharacterThreshold=-1
BulkLoadBatchSize=1024
BulkLoadFieldDelimiter=
BulkLoadRecordDelimiter=
CatalogSchema=
CharsetFor65535=0
ClientHostName=
ClientUser=
#Collection applies to z/OS and iSeries only
Collection=
ConcurrentAccessResolution=0
ConnectionReset=0
ConnectionRetryCount=0
ConnectionRetryDelay=3
CurrentFuncPath=
#Database applies to DB2 UDB only
Database=<database_name>
DefaultIsolationLevel=1
DynamicSections=1000
EnableBulkLoad=0
EncryptionMethod=0
FailoverGranularity=0
FailoverMode=0
FailoverPreconnect=0
GrantAuthid=PUBLIC
GrantExecute=1
GSSClient=native
HostNameInCertificate=
IpAddress=200.1.1.20
KeyPassword=
KeyStore=
KeyStorePassword=
LoadBalanceTimeout=0
LoadBalancing=0
#Location applies to z/OS and iSeries only
Location=BA00CB00
LogonID=
MaxPoolSize=100
MinPoolSize=0
Password=
PackageCollection=NULLID
PackageNamePrefix=DD
PackageOwner=
Pooling=0
ProgramID=
QueryTimeout=0
ReportCodePageConversionErrors=0
TcpPort=446
TrustStore=
TrustStorePassword=
UseCurrentSchema=0
ValidateServerCertificate=1
WithHold=1
XMLDescribeType=-10
EBCDIC and z/OS DB2 are just comments, but viewable from SPSS Modeler client. In our installation (with complete odbc.ini file, where we show only the first entry), it looks similar to Figure A-7.
Figure A-7 SPSS Modeler client GUI shows available data sources if odbc.ini shown in Example A-10 on page 175 is used
 

1 IBM Netezza Analytics error reporting works only with SPSS Modeler 18; it does not perform in SPSS Modeler 17.1.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.119.128.113