Chapter 3: The Fundamentals of Using Python with CAS

Connecting to CAS

Running CAS Actions

Specifying Action Parameters

CAS Action Results

Working with CAS Action Sets

Details

Getting Help

Dealing with Errors

SWAT Options

CAS Session Options

Conclusion

The SAS SWAT package includes an object-oriented interface to CAS as well as utilities to handle results, format data values, and upload data to CAS. We covered the installation of SWAT in an earlier chapter, so let’s jump right into connecting to CAS.

There is a lot of detailed information about parameter structures, error handling, and authentication in this chapter. If you feel like you are getting bogged down, you can always skim over this chapter and come back to it later when you need more formal information about programming using the CAS interface.

Connecting to CAS

To connect to a CAS host, you need some form of authentication. CAS supports various authentication mechanisms, but the different forms are beyond the scope of this book, so we use user name and password authentication in all of our examples. This form of authentication assumes that you have a login account on the CAS server that you are connecting to. The disadvantage of using a user name and password is that you typically include your password in the source code. However, Authinfo is a solution to this problem, so we’ll show you how to store authentication information using Authinfo as well.

Let’s make a connection to CAS using an explicit user name and a password. For this example, we use an IPython shell. As described previously, to run IPython, you use the ipython command from a command shell or the Anaconda menu in Windows.

The first thing you need to do after starting IPython is to import the SWAT package. This package contains a class called CAS that is the primary interface to your CAS server. It requires at least two arguments: the CAS host name or IP address, and the port number that CAS is running on. Since we use user name and password authentication, we must specify them as the next two arguments. If there are no connection errors, you should now have an open CAS session that is referred to by the conn variable.

In [1]: import swat

 

In [2]: conn = swat.CAS('server-name.mycompany.com', 5570,

                        'username', 'password')

 

In [3]: conn

Out[3]: CAS('server-name.mycompany.com', 5570, 'username',

            protocol='cas', name='py-session-1',

            session='ffee6422-96b9-484f-a868-03505b320987')

As you can see in Out[3], we display the string representation of the CAS object. You see that it echoes the host name, the port, the user name, and several fields that were not specified. The name and session fields are created once the session is created. The session value contains a unique ID that can be used to make other connections to that same session. The name field is a user-friendly name that is used to tag the session on the server to make it easier to distinguish when querying information about current sessions. This is discussed in more depth later in the chapter.

We mentioned using Authinfo rather than specifying your user name and password explicitly in your programs. The Authinfo specification is based on an older file format called Netrc. Netrc was used by FTP programs to store user names and passwords so that you don’t have to enter authentication information manually. Authinfo works the same way, but adds a few extensions.

The basic format of an Authinfo file follows: (The format occupies two lines to enhance readability.)

host server-name.mycompany.com port 5570

     user username password password

Here, server-name.mycompany.com is the host name of your CAS server (an IP address can also be used), 5570 is the port number of the CAS server, username is your user ID on that machine, and password is your password on that machine. If you don’t specify a port number, the same user name and password are used on any port on that machine. Each CAS host requires a separate host definition line. In addition, the host name must match exactly what is specified in the CAS constructor. There is no DNS name expansion if you use a shortened name such as server-name.
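
To make these matching rules concrete, here is a minimal sketch of how entries in this format could be parsed and matched against a host and port. It is an illustration only and not the parser that SWAT uses. The function names parse_authinfo and find_credentials are hypothetical, and the sketch ignores the extensions that Authinfo adds to Netrc (for example, it assumes that no value contains whitespace).

```python
def parse_authinfo(text):
    """Parse Authinfo-style text into a list of dicts, one per 'host' entry."""
    tokens = text.split()
    entries = []
    current = None
    # Tokens come in keyword/value pairs; each 'host' keyword starts a new entry.
    for key, value in zip(tokens[0::2], tokens[1::2]):
        if key == 'host':
            current = {'host': value}
            entries.append(current)
        elif current is not None:
            current[key] = value
    return entries


def find_credentials(entries, host, port):
    """Return (user, password) for the first entry matching host and port."""
    for entry in entries:
        # The host name must match exactly; no DNS expansion is attempted.
        if entry['host'] != host:
            continue
        # An entry without a port applies to every port on that host.
        if 'port' in entry and int(entry['port']) != port:
            continue
        return entry.get('user'), entry.get('password')
    return None


sample = """
host server-name.mycompany.com port 5570
     user username password password
host other-server.mycompany.com user someone password secret
"""

entries = parse_authinfo(sample)
print(find_credentials(entries, 'server-name.mycompany.com', 5570))
print(find_credentials(entries, 'other-server.mycompany.com', 9999))
print(find_credentials(entries, 'server-name', 5570))
```

The first lookup matches on both host and port, the second matches because its entry has no port, and the third returns None because the shortened host name is not expanded.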

By default, the Authinfo file is accessed from your home directory under the name .authinfo (on Windows, the name _authinfo is used). It also must have permissions that are set up so that only the owner can read it. This is done using the following command on Linux.

chmod 0600 ~/.authinfo
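
If you create the file from Python rather than from the shell, the same owner-only permissions can be applied with the standard library. The following sketch assumes a POSIX system and writes to a temporary directory rather than to your real home directory. The helper names write_private_file and is_owner_only are hypothetical.

```python
import os
import stat
import tempfile


def write_private_file(path, text):
    """Create (or overwrite) a file that only its owner can read and write."""
    # Request mode 0600 at creation time so the file is never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, 'w') as handle:
        handle.write(text)
    # Re-apply the mode in case the file already existed with looser permissions.
    os.chmod(path, 0o600)


def is_owner_only(path):
    """Return True if neither the group nor other users have any access."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


# Demonstrate in a throwaway directory instead of the real home directory.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, '.authinfo')
    write_private_file(
        path,
        'host server-name.mycompany.com port 5570 '
        'user username password password\n')
    owner_only = is_owner_only(path)

print(owner_only)
```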

On Windows, the file permissions should be set so that the file isn’t readable by the Everyone group. Once that file is in place and has the correct permissions, you should be able to make a connection to CAS without specifying your user name and password explicitly.

In [1]: import swat

 

In [2]: conn = swat.CAS('server-name.mycompany.com', 5570)

 

In [3]: conn

Out[3]: CAS('server-name.mycompany.com', 5570, 'username',

            protocol='cas', name='py-session-1',

            session='ffee6422-96b9-484f-a868-03505b320987')

After connecting to CAS, we can continue to a more interesting topic: running CAS actions.

Running CAS Actions

In the previous section, we made a connection to CAS, but didn’t explicitly perform any actions. However, after the connection was made, many actions were performed to obtain information about the server and what resources are available to the CAS installation. One of the things queried for is information about the currently loaded action sets. An action set is a collection of actions that can be executed. Actions can do various things such as return information about the server setup, load data, and perform advanced analytics. To see what action sets and actions are already loaded, you can call the help action on the CAS object that we previously created.

In [4]: out = conn.help()

NOTE: Available Action Sets and Actions:

NOTE:    accessControl

NOTE:       assumeRole - Assumes a role

NOTE:       dropRole - Relinquishes a role

NOTE:       showRolesIn - Shows the currently active role

NOTE:       showRolesAllowed - Shows the roles that a user is

                               a member of

NOTE:       isInRole - Shows whether a role is assumed

NOTE:       isAuthorized - Shows whether access is authorized

NOTE:       isAuthorizedActions - Shows whether access is

            authorized to actions

NOTE:       isAuthorizedTables - Shows whether access is authorized

                                 to tables

NOTE:       isAuthorizedColumns - Shows whether access is authorized

                                  to columns

NOTE:       listAllPrincipals - Lists all principals that have

                                explicit access controls

NOTE:       whatIsEffective - Lists effective access and

                              explanations (Origins)

NOTE:       listAcsData - Lists access controls for caslibs, tables,

                          and columns

NOTE:       listAcsActionSet - Lists access controls for an action

                               or action set

NOTE:       repAllAcsCaslib - Replaces all access controls for

                              a caslib

NOTE:       repAllAcsTable - Replaces all access controls for a table

NOTE:       repAllAcsColumn - Replaces all access controls for

                              a column

NOTE:       repAllAcsActionSet - Replaces all access controls for

                                 an action set

NOTE:       repAllAcsAction - Replaces all access controls for

                              an action

NOTE:       updSomeAcsCaslib - Adds, deletes, and modifies some

                               access controls for a caslib

NOTE:       updSomeAcsTable - Adds, deletes, and modifies some

                              access controls for a table

NOTE:       updSomeAcsColumn - Adds, deletes, and modifies some

                               access controls for a column

NOTE:       updSomeAcsActionSet - Adds, deletes, and modifies some

                                  access controls for an action set

NOTE:       updSomeAcsAction - Adds, deletes, and modifies some

                               access controls for an action

NOTE:       remAllAcsData - Removes all access controls for a

                            caslib, table, or column

 

... truncated ...

This prints out a listing of all of the loaded action sets and the actions within them. It also returns a CASResults structure that contains the action set information in tabular form. The results of CAS actions are discussed later in this chapter.

The help action takes arguments that specify which action sets and actions you want information about. To display help for an action set, use the actionset keyword parameter. The following code displays the help content for the builtins action set.

In [5]: out = conn.help(actionset='builtins')

NOTE: Information for action set 'builtins':

NOTE:    builtins

NOTE:       addNode - Adds a machine to the server

NOTE:       removeNode - Remove one or more machines from the server

NOTE:       help - Shows the parameters for an action or lists all

                   available actions

NOTE:       listNodes - Shows the host names used by the server

NOTE:       loadActionSet - Loads an action set for use in this

                            session

NOTE:       installActionSet - Loads an action set in new sessions

                               automatically

NOTE:       log - Shows and modifies logging levels

NOTE:       queryActionSet - Shows whether an action set is loaded

NOTE:       queryName - Checks whether a name is an action or

                        action set name

NOTE:       reflect - Shows detailed parameter information for an

                      action or all actions in an action set

NOTE:       serverStatus - Shows the status of the server

NOTE:       about - Shows the status of the server

NOTE:       shutdown - Shuts down the server

NOTE:       userInfo - Shows the user information for your connection

NOTE:       actionSetInfo - Shows the build information from loaded

                            action sets

NOTE:       history - Shows the actions that were run in this session

NOTE:       casCommon - Provides parameters that are common to many

                        actions

NOTE:       ping - Sends a single request to the server to confirm

                   that the connection is working

NOTE:       echo - Prints the supplied parameters to the client log

NOTE:       modifyQueue - Modifies the action response queue settings

NOTE:       getLicenseInfo - Shows the license information for a

                             SAS product

NOTE:       refreshLicense - Refresh SAS license information from

                             a file

NOTE:       httpAddress - Shows the HTTP address for the server

                          monitor

Notice that help is one of the actions in the builtins action set. To display the Help for an action, use the action keyword argument. You can display the Help for the help action as follows:

In [6]: out = conn.help(action='help')

NOTE: Information for action 'builtins.help':

NOTE: The following parameters are accepted.

      Default values are shown.

NOTE:    string action=NULL,

NOTE:       specifies the name of the action for which you want help.

            The name can be in the form 'actionSetName.actionName' or

            just 'actionName'.

NOTE:    string actionSet=NULL,

NOTE:       specifies the name of the action set for which you

            want help. This parameter is ignored if the action

            parameter is specified.

NOTE:    boolean verbose=true

NOTE:       when set to True, provides more detail for each parameter.

Looking at the printed notes, you can see that the help action takes the parameters actionset, action, and verbose. We have previously seen the actionset and action parameters. The verbose parameter is enabled by default, which means that you get a full description of all of the parameters of the action. You can suppress the parameter descriptions by specifying verbose=False as follows:

In [7]: out = conn.help(action='help', verbose=False)

NOTE: Information for action 'builtins.help':

NOTE: The following parameters are accepted.

      Default values are shown.

NOTE:    string action=NULL,

NOTE:    string actionSet=NULL,

NOTE:    boolean verbose=true

In addition to the Help system that is provided by CAS, the SWAT module also enables you to access the action set and action information using mechanisms supplied by Python and IPython. Python supplies the help function to display information about Python objects. This same function can be used to display information about CAS action sets and actions. We have been using the help action on our CAS object. Let’s see what the Python help function displays.

In [8]: help(conn.help)

Help on builtins.Help in module swat.cas.actions object:

 

class builtins.Help(CASAction)

 |  Shows the parameters for an action or lists all available actions

 |

 |  Parameters

 |  ----------

 |  action : string, optional

 |      specifies the name of the action for which you want help.

 |      The name can be in the form 'actionSetName.actionName' or

 |      just 'actionName'.

 |

 |  actionset : string, optional

 |      specifies the name of the action set for which you want help.

 |      This parameter is ignored if the action parameter is

 |      specified.

 |

 |  verbose : boolean, optional

 |      when set to True, provides more detail for each parameter.

 |      Default: True

 |

 |  Returns

 |  -------

 |  Help object

 

... truncated ...

That code snippet is a little confusing because the Python function and the CAS action have the same name, but you can see that the information displayed by Python’s Help system is essentially the same as what CAS displayed. You can also use the IPython/Jupyter Help system (our preferred method) by following the action name with a question mark.

In [9]: conn.help?

Type:           builtins.Help

String form:    ?.builtins.Help()

File: swat/cas/actions.py

Definition:     ?.help(_self_, action=None,

                               actionset=None,

                               verbose=True, **kwargs)

Docstring:

Shows the parameters for an action or lists all available actions

 

Parameters

----------

action : string, optional

    specifies the name of the action for which you want help. The name

    can be in the form 'actionSetName.actionName' or just 'actionName'.

 

actionset : string, optional

    specifies the name of the action set for which you want help. This

    parameter is ignored if the action parameter is specified.

 

verbose : boolean, optional

    when set to True, provides more detail for each parameter.

    Default: True

 

Returns

-------

Help object

 

... truncated ...

These methods of getting help work both on actions and action sets. For example, we know that there is a builtins action set that the help action belongs to. The CAS object has an attribute that maps to the builtins action set just like the help action. We can display the help for the builtins action set as follows:

In [10]: conn.builtins?

Type:        Builtins

String form: <swat.cas.actions.Builtins object at 0x7f7ad35b9048>

File: swat/cas/actions.py

Docstring:

System

 

Actions

-------

builtins.about            : Shows the status of the server

builtins.actionsetinfo    : Shows the build information from loaded

                            action sets

builtins.addnode          : Adds a machine to the server

builtins.cascommon        : Provides parameters that are common to

                            many actions

builtins.echo             : Prints the supplied parameters to the

                            client log

builtins.getgroups        : Shows the groups from the authentication

                            provider

builtins.getlicenseinfo   : Shows the license information for a SAS

                            product

builtins.getusers         : Shows the users from the authentication

                            provider

builtins.help             : Shows the parameters for an action or

                            lists all available actions

builtins.history          : Shows the actions that were run in this

                            session

builtins.httpaddress      : Shows the HTTP address for the server

                            monitor

builtins.installactionset : Loads an action set in new sessions

                            automatically

builtins.listactions      : Shows the parameters for an action or

                            lists all available actions

builtins.listnodes        : Shows the host names used by the server

builtins.loadactionset    : Loads an action set for use in this

                            session

builtins.log              : Shows and modifies logging levels

builtins.modifyqueue      : Modifies the action response queue

                            settings

builtins.ping             : Sends a single request to the server to

                            confirm that the connection is working

builtins.queryactionset   : Shows whether an action set is loaded

builtins.queryname        : Checks whether a name is an action or

                            action set name

builtins.reflect          : Shows detailed parameter information for

                            an action or all actions in an action set

builtins.refreshlicense   : Refresh SAS license information from a

                            file

builtins.refreshtoken     : Refreshes an authentication token for this

                            session

builtins.removenode       : Remove one or more machines from the

                            server

builtins.serverstatus     : Shows the status of the server

builtins.setlicenseinfo   : Sets the license information for a SAS

                            product

builtins.shutdown         : Shuts down the server

builtins.userinfo         : Shows the user information for your

                            connection

Each time that an action set is loaded into the server, the information about the action set and its actions is reflected back to SWAT. SWAT then creates attributes on the CAS object that map to the new action sets and actions, including all of the documentation and Help system hooks. The advantage is that the documentation about actions can never get out of date in the Python client.
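
The general idea of turning reflected metadata into attributes can be sketched in plain Python. The following toy class is not SWAT's implementation. The class name ToyActionSet, the variable toy_builtins, and the tuple that each toy action returns are all assumptions made for this illustration. It only shows how attribute lookup, docstrings, and tab completion can all be driven by the same table of action names and descriptions.

```python
class ToyActionSet:
    """Toy stand-in for an action set built from reflected metadata."""

    def __init__(self, name, actions):
        self._name = name
        # Store names in lowercase so that lookups are case-insensitive.
        self._actions = {key.lower(): desc for key, desc in actions.items()}

    def __getattr__(self, attr):
        # Called only when normal attribute lookup fails.
        if attr.startswith('_'):
            raise AttributeError(attr)
        name = attr.lower()
        if name not in self._actions:
            raise AttributeError(attr)

        def action(**params):
            # A real client would send the call to the server; the toy
            # version just reports what would have been called.
            return ('%s.%s' % (self._name, name), params)

        action.__name__ = name
        action.__doc__ = self._actions[name]
        return action

    def __dir__(self):
        # Including the action names here is what makes tab completion work.
        return sorted(set(super().__dir__()) | set(self._actions))


toy_builtins = ToyActionSet('builtins', {
    'echo': 'Prints the supplied parameters to the client log',
    'loadActionSet': 'Loads an action set for use in this session',
    'ping': 'Sends a single request to the server to confirm '
            'that the connection is working',
})

print(toy_builtins.loadactionset.__doc__)
print([name for name in dir(toy_builtins) if not name.startswith('_')])
print(toy_builtins.echo(a=1))
```

Because the docstrings and the attribute list come from the same metadata table, they cannot drift apart, which is the property that the paragraph above describes.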

In addition to the documentation, since the action sets and actions appear as attributes on the CAS object, you can also use tab completion to display what is in an action set.

In [11]: conn.builtins.<tab>

conn.builtins.about             conn.builtins.loadactionset

conn.builtins.actionsetinfo     conn.builtins.log

conn.builtins.addnode           conn.builtins.modifyqueue

conn.builtins.cascommon         conn.builtins.ping

conn.builtins.echo              conn.builtins.queryactionset

conn.builtins.getgroups         conn.builtins.queryname

conn.builtins.getlicenseinfo    conn.builtins.reflect

conn.builtins.getusers          conn.builtins.refreshlicense

conn.builtins.help              conn.builtins.refreshtoken

conn.builtins.history           conn.builtins.removenode

conn.builtins.httpaddress       conn.builtins.serverstatus

conn.builtins.installactionset  conn.builtins.setlicenseinfo

conn.builtins.listactions       conn.builtins.shutdown

conn.builtins.listnodes         conn.builtins.userinfo

Now that we have seen how to query the server for available action sets and actions, and we know how to get help for the actions, we can move on to some more advanced action calls.

Specifying Action Parameters

We have already seen a few action parameters being used on the help action (action, actionset, and verbose). When a CAS action is added to the CAS object, all action parameters are mapped to Python keyword parameters. Let’s look at the function signature of help to see the supported action parameters.

In [12]: conn.help?

Type:           builtins.Help

String form:    ?.builtins.Help()

File: swat/cas/actions.py

Definition:     ?.help(_self_, action=None,

                               actionset=None,

                               verbose=True, **kwargs)

... truncated ...

You see that there is one positional argument (_self_). It is simply the Python object that help is being called on. The rest of the arguments are keyword arguments that are converted to action parameters and passed to the action. The **kwargs argument is always specified on actions as well. It enables the use of extra parameters for debugging and system use. Now let’s look at the descriptions of the parameters (the following output is continued from the preceding help? invocation).

Docstring:

Shows the parameters for an action or lists all available actions

 

Parameters

----------

action : string, optional

    specifies the name of the action for which you want help. The name

    can be in the form 'actionSetName.actionName' or just

    'actionName'.

 

actionset : string, optional

    specifies the name of the action set for which you want help. This

    parameter is ignored if the action parameter is specified.

verbose : boolean, optional

    when set to True, provides more detail for each parameter.

    Default: True

You see that action and actionset are declared as strings, and verbose is declared as a Boolean. Action parameters can take many types of values. The following table shows the supported types:

CAS Type | Python Type | Description
Boolean | bool | Value that indicates true or false. This should always be specified using Python’s True or False values.
double | float, swat.float64 | 64-bit floating point number
int32 | int, swat.int32 | 32-bit integer
int64 | long (Python 2), int (Python 3), swat.int64 | 64-bit integer
string | unicode (Python 2), str (Python 3) | Character content. Note that if a byte string is passed as an argument, SWAT attempts to convert it to Unicode using the default encoding.
value list | list or dict | Collection of items. Python lists become indexed CAS value lists. Python dicts become keyed CAS value lists.
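
As a rough illustration of the table, the following Python 3 sketch maps a Python value to the name of the corresponding CAS type. The function name cas_type_name is hypothetical, and choosing between int32 and int64 by checking the range of the value is an assumption made for this example. The text does not describe how SWAT makes that choice.

```python
def cas_type_name(value):
    """Return the CAS type name from the table for a Python 3 value."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return 'boolean'
    if isinstance(value, int):
        # Assumption: pick the smallest integer type that can hold the value.
        if -2**31 <= value < 2**31:
            return 'int32'
        if -2**63 <= value < 2**63:
            return 'int64'
        raise OverflowError('integer does not fit in 64 bits')
    if isinstance(value, float):
        return 'double'
    if isinstance(value, str):
        return 'string'
    if isinstance(value, (list, dict)):
        return 'value list'
    raise TypeError('no CAS type for %s' % type(value).__name__)


for sample in (True, 1776, 2**60, 3.14159, 'text', ['a', 'b'], {'k': 'v'}):
    print(repr(sample), '->', cas_type_name(sample))
```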

The easiest way to practice more complex arguments is by using the echo action. This action simply prints the value of all parameters that were specified in the action call. The following code demonstrates the echo action with all of the parameter types in the preceding table.

In [13]: out = conn.echo(

   ...:                 boolean_true = True,

   ...:                 boolean_false = False,

   ...:                 double = 3.14159,

   ...:                 int32 = 1776,

   ...:                 int64 = 2**60,

   ...:                 string = u'I like snowmen! \u2603',

   ...:                 list = [u'item1', u'item2', u'item3'],

   ...:                 dict = {'key1': 'value1',

   ...:                         'key2': 'value2',

   ...:                         'key3': 3}

   ...:                 )

NOTE: builtin.echo called with 8 parameters.

NOTE:    parameter 1: int32 = 1776

NOTE:    parameter 2: boolean_false = false

NOTE:    parameter 3: list = {'item1', 'item2', 'item3'}

NOTE:    parameter 4: boolean_true = true

NOTE:    parameter 5: int64 = 1152921504606846976

NOTE:    parameter 6: double = 3.14159

NOTE:    parameter 7: string = 'I like snowmen! ☃'

NOTE:    parameter 8: dict = {key1 = 'value1', key3 = 3,

                              key2 = 'value2'}

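You might notice that the parameters are printed in a different order than what was specified in the echo call. This is because, in the Python version used for this example, keyword parameters are stored in a dictionary that does not keep its keys in a specified order. Python 3.6 and later preserve the order of keyword arguments, so the order that you see might differ from this output.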

You might also notice that the printed syntax is not Python syntax. It is a pseudo-code syntax more similar to the Lua programming language. Lua is used in other parts of CAS as well (such as the history action), so most code-like objects that are printed from CAS are in Lua or syntax that is like Lua. However, the syntax of the two languages (as far as parameter data structures go) is similar enough that it is easy to see the mapping from one to the other. The biggest differences are in the value list parameters. Indexed lists in the printout use braces, whereas Python uses square brackets. Also, in the keyed list, Python’s keys must be quoted, and the separator that is used between the key and the value is a colon (:) rather than an equal sign (=).

The complexity of the parameter structures is unlimited. Lists can be nested inside dictionaries, and dictionaries can be nested inside lists. A demonstration of nested structures in echo follows:

In [14]: out = conn.echo(

   ...:                 list = ['item1',

   ...:                         'item2',

   ...:                         {

   ...:                          'key1': 'value1',

   ...:                          'key2': {

   ...:                                   'value2': [0, 1, 1, 2, 3]

   ...:                                  }

   ...:                         }

   ...:                        ])

NOTE: builtin.echo called with 1 parameters.

NOTE:    parameter 1: list = {'item1', 'item2',

                              {key1 = 'value1',

                               key2 = {value2 = {0, 1, 1, 2, 3}}}}
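
The mapping between a Python parameter structure and the printed notation can be sketched with a small formatter. This is only an approximation written for illustration and is not how CAS produces its log output. The function name to_cas_notation is hypothetical, and the sketch does not handle line wrapping or quoting of special characters.

```python
def to_cas_notation(value):
    """Render a Python value in the Lua-like notation seen in the echo output."""
    if isinstance(value, bool):
        # Booleans are printed in lowercase.
        return 'true' if value else 'false'
    if isinstance(value, str):
        return "'" + value + "'"
    if isinstance(value, dict):
        # Keyed lists: unquoted keys, with '=' between the key and the value.
        items = ['%s = %s' % (key, to_cas_notation(item))
                 for key, item in value.items()]
        return '{' + ', '.join(items) + '}'
    if isinstance(value, (list, tuple)):
        # Indexed lists: braces instead of square brackets.
        return '{' + ', '.join(to_cas_notation(item) for item in value) + '}'
    return str(value)


nested = ['item1',
          'item2',
          {'key1': 'value1',
           'key2': {'value2': [0, 1, 1, 2, 3]}}]

rendered = to_cas_notation(nested)
print('list = ' + rendered)
```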

Nested dictionary parameters are fairly common in CAS and can create some confusion with the nesting levels and differences between keyword arguments and dictionary literals. Because of this, some utility functions have been added to SWAT to aid in the construction of nested parameters.

Constructing Nested Action Parameters

While specifying dictionary parameters, you can suffer from cognitive dissonance when you switch from keyword arguments to dictionary literals. In keyword arguments, you don’t quote the name of the argument, but in dictionary literals, you do quote the key value. Also, in keyword arguments, you use an equal sign between the name and value, whereas in dictionary literals, you use a colon. Because of this potential confusion, we prefer to use the dict constructor with keyword arguments when nesting action parameters. The following code repeats the preceding example, but uses the dict constructor in Python for the nested dictionaries.

In [15]: out = conn.echo(

    ...:                 list = ['item1',

    ...:                         'item2',

    ...:                         dict(

    ...:                             key1 = 'value1',

    ...:                             key2 = dict(

    ...:                                       value2 = [0, 1, 1, 2, 3]

    ...:                                    )

    ...:                         )

    ...:                        ])

NOTE: builtin.echo called with 1 parameters.

NOTE:    parameter 1: list = {'item1', 'item2',

                              {key1 = 'value1',

                               key2 = {value2 = {0, 1, 1, 2, 3}}}}
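
The two spellings build identical Python objects, which you can confirm without a CAS connection. The variable names in this short check are chosen only for illustration.

```python
# Dictionary literal: quoted keys, with a colon between key and value.
literal = {'key1': 'value1',
           'key2': {'value2': [0, 1, 1, 2, 3]}}

# dict constructor: unquoted keyword names, with an equal sign between
# name and value, just like the keyword arguments of the action call.
constructed = dict(key1='value1',
                   key2=dict(value2=[0, 1, 1, 2, 3]))

same = (literal == constructed)
print(same)

# One limitation of the constructor form: a key that is a Python reserved
# word, or that is not a valid identifier, can be written only as a literal.
reserved = {'class': 'value'}
print(reserved)
```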

The SWAT package also includes a utility function called vl (for “value list”). This function returns an enhanced type of dictionary that enables you to build nested structures quickly and easily. It can be used directly in place of the dict call in the preceding code in its simplest form, but it can also be used outside of the action to build up parameter lists before the action call.

The primary feature of the dictionary object that vl returns is that it automatically adds any key to the dictionary when the key is accessed. For example, the following code builds the same nested structure that the previous example does.

In [16]: params = swat.vl()

 

In [17]: params.list[0] = 'item1'

 

In [18]: params.list[1] = 'item2'

 

In [19]: params.list[2].key1 = 'value1'

 

In [20]: params.list[2].key2.value2 = [0, 1, 1, 2, 3]

 

In [21]: params

Out[21]:

{'list': {0: 'item1',

  1: 'item2',

  2: {'key1': 'value1', 'key2': {'value2': [0, 1, 1, 2, 3]}}}}

As you can see in Out[21], just by accessing the key names and index values as if they existed, the nested parameter structure is automatically created behind the scenes. However, note that this does make it fairly easy to introduce errors into the structure with typographical errors. The object will create key values with mistakes in them since it has no way of telling a good key name from a bad one.
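
To see why keys appear simply by being accessed, consider the following toy dictionary subclass. It is a simplified sketch of the general technique and is not the implementation behind swat.vl. The class name AutoDict is hypothetical.

```python
class AutoDict(dict):
    """Toy dictionary that creates nested dictionaries on first access."""

    def __missing__(self, key):
        # Called by dict when self[key] is read and the key does not exist.
        value = self[key] = AutoDict()
        return value

    def __getattr__(self, name):
        # Called only for names that are not real attributes or methods, so
        # names such as 'keys' or 'items' still refer to the dict methods.
        if name.startswith('_'):
            raise AttributeError(name)
        return self[name]

    def __setattr__(self, name, value):
        self[name] = value

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name)


params = AutoDict()
params.list[0] = 'item1'
params.list[1] = 'item2'
params.list[2].key1 = 'value1'
params.list[2].key2.value2 = [0, 1, 1, 2, 3]
print(params)

# A typographical error is accepted silently and creates a new key.
typo = AutoDict()
typo.lst[0] = 'item1'
print(typo)
```

The last two statements show the drawback mentioned above: the misspelled name lst is accepted without complaint.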

Using the special dictionary returned by vl does create some structures that might be surprising. If you look at Out[21], you see that the list parameter, which was a Python list in the previous example, is now a dictionary with integer keys. This discrepancy makes no difference to SWAT. It automatically converts dictionaries with integer keys into Python lists.
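
A conversion of this kind can be sketched as a recursive function. The exact rules that SWAT applies are not described here, so the following is an assumption-laden illustration: the function name listify is hypothetical, and it converts a dictionary only when its keys are exactly the integers 0 through n-1.

```python
def listify(value):
    """Recursively turn dicts keyed by 0..n-1 into lists (illustrative only)."""
    if isinstance(value, dict):
        converted = {key: listify(item) for key, item in value.items()}
        keys = list(converted)
        all_ints = all(isinstance(key, int) and not isinstance(key, bool)
                       for key in keys)
        # Assumption: convert only when the keys are exactly 0, 1, ..., n-1.
        if keys and all_ints and sorted(keys) == list(range(len(keys))):
            return [converted[index] for index in range(len(keys))]
        return converted
    if isinstance(value, list):
        return [listify(item) for item in value]
    return value


built = {'list': {0: 'item1',
                  1: 'item2',
                  2: {'key1': 'value1', 'key2': {'value2': [0, 1, 1, 2, 3]}}}}

converted = listify(built)
print(converted)
```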

Python’s ** operator passes the contents of a dictionary as keyword arguments. Using it with our vl object, as follows, we get the same output from the echo action as we did previously.

In [22]: out = conn.echo(**params)

NOTE: builtin.echo called with 1 parameters.

NOTE:    parameter 1: list = {'item1', 'item2',

                              {key2 = {value2 = {0, 1, 1, 2, 3}},

                               key1 = 'value1'}}

In addition to constructing parameters, you can also tear them down using Python syntax. For example, the following code deletes the value2 key of the list[2].key2 parameter.

In [23]: del params.list[2].key2.value2

 

In [24]: params

Out[24]: {'list': {0: 'item1', 1: 'item2',

                   2: {'key1': 'value1', 'key2': {}}}}

With the ability to construct CAS action parameters under our belt, there are a couple of features of the CAS parameter processor that can make your life a bit easier. We look at those in the next section.

Automatic Type Casting

So far, we have constructed arguments using either the exact data types expected by the action or the arbitrary parameters in echo. However, the CAS action parameter processor on the server is flexible enough to allow passing in parameters of various types. If possible, those parameters are converted to the proper type before they are used by the action.

The easiest form of type casting to demonstrate is the conversion of strings to numeric values. If an action parameter takes a numeric value, but you pass in a string that contains a numeric representation as its content, the CAS action processor parses out the numeric value and sends that value to the action. This behavior can be seen in the following action calls to history, which shows the action call history. The first call uses integers for first and last, but the second call uses strings. In either case, the result is the same due to the automatic conversion on the server side.

# Using integers

In [25]: out = conn.history(first=5, last=7)

NOTE: 5: action session.sessionname / name='py-session-1',

             _apptag='UI', _messageLevel='error'; /* (SUCCESS) */

NOTE: 6: action builtins.echo...; /* (SUCCESS) */

NOTE: 7: action builtins.echo...; /* (SUCCESS) */

 

# Using strings as integer values

In [26]: out = conn.history(first='5', last='7')

NOTE: 5: action session.sessionname / name='py-session-1',

             _apptag='UI', _messageLevel='error'; /* (SUCCESS) */

NOTE: 6: action builtins.echo...; /* (SUCCESS) */

NOTE: 7: action builtins.echo...; /* (SUCCESS) */
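
The conversion happens on the server, and its exact rules are not spelled out here. As a loose analogy only, the following sketch shows the same idea in Python using the built-in int parser. The function name coerce_to_int is hypothetical.

```python
def coerce_to_int(value):
    """Accept an int, or a string containing an integer (illustrative only)."""
    if isinstance(value, bool):
        raise TypeError('expected an integer, not a Boolean')
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        # int() raises ValueError if the string is not an integer literal.
        return int(value.strip())
    raise TypeError('cannot convert %s to an integer' % type(value).__name__)


# Both spellings of the arguments produce the same values.
from_ints = (coerce_to_int(5), coerce_to_int(7))
from_strings = (coerce_to_int('5'), coerce_to_int('7'))
print(from_ints == from_strings)
```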

Although the server can do some conversions between types, it is generally a good idea to use the correct type. There is another type of automatic conversion that adds syntactical enhancement to action calls. This is the conversion of a scalar-valued parameter to a dictionary value. This is described in the next section.

Scalar Parameter to Dictionary Conversion

Many times when using an action parameter that requires a dictionary as an argument, you use only the first key in the dictionary to specify the parameter. For example, the history action takes a parameter called casout. This parameter specifies an output table to put the history information into. The specification for this parameter follows: (You can use conn.history? in IPython to see the parameter definition.)

casout : dict or CASTable, optional

    specifies the settings for saving the action history to an

    output table.

 

    casout.name : string or CASTable, optional

        specifies the name to associate with the table.

 

    casout.caslib : string, optional

        specifies the name of the caslib to use.

 

    casout.timestamp : string, optional

        specifies the timestamp to apply to the table. Specify

        the value in the form that is appropriate for your

        session locale.

 

    casout.compress : boolean, optional

        when set to True, data compression is applied to the table.

        Default: False

 

    casout.replace : boolean, optional

        specifies whether to overwrite an existing table with the same

        name.

        Default: False

        

        ... truncated ...

The first key in the casout parameter is name and indicates the name of the CAS table to create. The complete way of specifying this parameter with only the name key follows:

In [27]: out = conn.history(casout=dict(name='hist'))

This is such a common idiom that the server enables you to pass just the value of the first key (name, in this case) in place of the whole dictionary. In other words, rather than having to use dict to create a nested dictionary, you could simply do the following:

In [28]: out = conn.history(casout='hist')

Of course, if you need to use any other keys in the casout parameter, you must use the dict form. This conversion of a scalar value to a dictionary value is common when specifying input tables and variable lists of tables, which we see later on.
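One way to picture the conversion is as a function that wraps a scalar in a dictionary under the parameter's first key. The sketch below is only a mental model written in plain Python. The expand_scalar helper is hypothetical, and the real conversion happens in the CAS parameter processor on the server, not in client code like this.

```python
def expand_scalar(value, first_key):
    """Model of scalar-to-dictionary conversion for a dictionary parameter."""
    if isinstance(value, dict):
        # Already in the long form: keep every key the caller supplied.
        return dict(value)
    # Short form: the scalar is taken as the value of the first key.
    return {first_key: value}


short_form = expand_scalar('hist', 'name')
long_form = expand_scalar(dict(name='hist', replace=True), 'name')

print(short_form)
print(long_form)
```

The second call shows why the dict form is still needed: a key such as replace has no scalar shorthand.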

Now that we have spent some time on the input side of CAS actions, let’s look at the output side.

CAS Action Results

Up to now, all of our examples have stored the result of the action calls in a variable, but we have not done anything with the results yet. Let’s start by using our example of all of the CAS parameter types.

In [29]: out = conn.echo(

   ....:                 boolean_true = True,

   ....:                 boolean_false = False,

   ....:                 double = 3.14159,

   ....:                 int32 = 1776,

   ....:                 int64 = 2**60,

   ....:                 string = u'I like snowmen! \u2603',

   ....:                 list = [u'item1', u'item2', u'item3'],

   ....:                 dict = {'key1': 'value1',

   ....:                         'key2': 'value2',

   ....:                         'key3': 3}

   ....:                 )

Displaying the contents of the out variable gives:

In [30]: out

Out[30]:

[int32]

 

 1776

 

[boolean_false]

 

 False

 

[list]

 

 ['item1', 'item2', 'item3']

 

[boolean_true]

 

 True

 

[int64]

 

 1152921504606846976

 

[double]

 

 3.14159

 

[string]

 

 'I like snowmen! ☃'

 

[dict]

 

 {'key1': 'value1', 'key2': 'value2', 'key3': 3}

 

+ Elapsed: 0.000494s, mem: 0.0546mb

The object that is held in the out variable is an instance of a Python class called CASResults. The CASResults class is a subclass of collections.OrderedDict. This class is a dictionary-like object that preserves the order of the items in it. If you want only a plain Python dictionary, you can convert it as follows. Note that a plain dictionary is guaranteed to keep the ordering of the items only on Python 3.7 and later; on earlier versions, you lose the ordering.

In [31]: dict(out)

Out[31]:

{'boolean_false': False,

 'boolean_true': True,

 'dict': {'key1': 'value1', 'key2': 'value2', 'key3': 3},

 'double': 3.14159,

 'int32': 1776,

 'int64': 1152921504606846976,

 'list': ['item1', 'item2', 'item3'],

 'string': 'I like snowmen! ☃'}

In either case, you can traverse and modify the result just as you could any other Python dictionary. For example, if you wanted to walk through the items and print each key and value explicitly, you could do the following:

In [32]: for key, value in out.items():

   ....:     print(key)

   ....:     print(value)

   ....:     print('')

   ....:

int32

1776

 

boolean_false

False

 

list

['item1', 'item2', 'item3']

 

boolean_true

True

 

int64

1152921504606846976

 

double

3.14159

 

string

I like snowmen! ☃

 

dict

{'key1': 'value1', 'key3': 3, 'key2': 'value2'}

Although the object that is returned by an action is always a CASResults object, the contents of that object depend completely on the purpose of that action. They can be as simple as key/value pairs of scalars or as complex as a deeply nested structure of dictionaries, such as our parameters in the previous section. Actions that perform analytics typically return one or more DataFrames that contain the results.
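Because the contents vary from action to action, a quick first step with an unfamiliar result is to list each key together with the type of its value. The following sketch uses a collections.OrderedDict as a stand-in for a CASResults object so that it runs without a server; summarize is a hypothetical helper, not a SWAT function.

```python
from collections import OrderedDict


def summarize(results):
    """Map each result key to the type name of its value, keeping key order."""
    return OrderedDict(
        (key, type(value).__name__) for key, value in results.items()
    )


# Stand-in for the object returned by an action call.
out = OrderedDict([
    ('int32', 1776),
    ('list', ['item1', 'item2', 'item3']),
    ('dict', {'key1': 'value1'}),
])

summary = summarize(out)
for key, kind in summary.items():
    print(key, '->', kind)
```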

Since the results objects are simply Python dictionaries, we assume that you are able to handle operations on them. But we will take a closer look at DataFrames in the next section.

Using DataFrames

The DataFrames that are returned by CAS actions are extensions of the DataFrames that are defined by the Pandas package. Largely, both work the same way. The only difference is that the DataFrames returned by CAS contain extra metadata that is found in typical SAS data sets. This metadata includes things such as SAS data format names, the SAS data type, and column and table labels.

One of the builtins actions that returns a DataFrame is help. This action returns a DataFrame that is filled with the names and descriptions of all the actions that are installed on the server. Each action set gets its own key in the result. Let’s look at some output from help.

The following code runs the help action, lists the keys in the CASResults object that is returned, verifies that it is a SASDataFrame object using Python’s type function, and displays the contents of the DataFrame (some output is reformatted slightly for readability):

In [33]: out = conn.help()

 

In [34]: list(out.keys())

Out[34]:

['accessControl',

 'builtins',

 'loadStreams',

 'search',

 'session',

 'sessionProp',

 'table',

 'tutorial']

 

In [35]: type(out['builtins'])

Out[35]: swat.dataframe.SASDataFrame

 

In [36]: out['builtins']

Out[36]:

                name                              description

0            addNode             Adds a machine to the server

1         removeNode  Remove one or more machines from the...

2               help  Shows the parameters for an action o...

3          listNodes  Shows the host names used by the server

4      loadActionSet  Loads an action set for use in this ...

5   installActionSet  Loads an action set in new sessions ...

6                log        Shows and modifies logging levels

7     queryActionSet    Shows whether an action set is loaded

8          queryName  Checks whether a name is an action o...

9            reflect  Shows detailed parameter information...

10      serverStatus           Shows the status of the server

11             about           Shows the status of the server

12          shutdown                    Shuts down the server

13          userInfo  Shows the user information for your ...

14     actionSetInfo  Shows the build information from loa...

15           history  Shows the actions that were run in t...

16         casCommon  Provides parameters that are common ...

17              ping  Sends a single request to the server...

18              echo  Prints the supplied parameters to th...

19       modifyQueue  Modifies the action response queue s...

20    getLicenseInfo  Shows the license information for a ...

21    refreshLicense  Refresh SAS license information from...

22       httpAddress  Shows the HTTP address for the serve...

We can store this DataFrame in another variable to make it a bit easier to work with. Much like Pandas DataFrames, CASResults objects enable you to access keys as attributes (as long as the name of the key doesn’t collide with an existing attribute or method). This means that we can access the builtins key of the out variable in either of the following ways:

In [37]: blt = out['builtins']

 

In [38]: blt = out.builtins

Which syntax you use depends on personal preference. The dot syntax is a bit cleaner, but the bracketed syntax works regardless of the key value (including white space, or name collisions with existing attributes). Typically, you might use the attribute-style syntax in interactive programming, but the bracketed syntax is better for production code.
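The name-collision caveat is easy to demonstrate with a small stand-in class. AttrResults below is a hypothetical illustration of attribute-style key access on an ordered dictionary. It is not SWAT's implementation, but it shows why a key that shares its name with a method can be reached only through brackets.

```python
from collections import OrderedDict


class AttrResults(OrderedDict):
    """Stand-in: exposes keys as attributes when no real attribute exists."""

    def __getattr__(self, name):
        # Python calls this only after normal attribute lookup has failed,
        # so real methods such as keys() always win over stored keys.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


out = AttrResults()
out['builtins'] = 'builtins table'
out['keys'] = 'value stored under the key named keys'

print(out.builtins)        # attribute access finds the stored key
print(out['keys'])         # brackets always return the stored value
print(callable(out.keys))  # dot syntax returns the dict method instead
```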

Now that we have a handle on the DataFrame, we can do typical DataFrame operations on it such as sorting and filtering. For example, to sort the builtins actions by the name column, you might do the following.

In [39]: blt.sort_values('name')

Out[39]:

                name                              description

11             about           Shows the status of the server

14     actionSetInfo  Shows the build information from loa...

0            addNode             Adds a machine to the server

16         casCommon  Provides parameters that are common ...

18              echo  Prints the supplied parameters to th...

20    getLicenseInfo  Shows the license information for a ...

2               help  Shows the parameters for an action o...

15           history  Shows the actions that were run in t...

22       httpAddress  Shows the HTTP address for the serve...

5   installActionSet  Loads an action set in new sessions ...

3          listNodes  Shows the host names used by the server

4      loadActionSet  Loads an action set for use in this ...

6                log        Shows and modifies logging levels

19       modifyQueue  Modifies the action response queue s...

17              ping  Sends a single request to the server...

7     queryActionSet    Shows whether an action set is loaded

8          queryName  Checks whether a name is an action o...

9            reflect  Shows detailed parameter information...

21    refreshLicense  Refresh SAS license information from...

1         removeNode  Remove one or more machines from the...

10      serverStatus           Shows the status of the server

12          shutdown                    Shuts down the server

13          userInfo  Shows the user information for your ...

If we want to combine all of the output DataFrames into one DataFrame, we can use the concat function in the Pandas package. We use the values method of the CASResults object to get all of the values in the dictionary (which are DataFrames, in this case). Then we concatenate them using concat with the ignore_index=True option so that it creates a new unique index for each row.

In [40]: import pandas as pd

 

In [41]: pd.concat(out.values(), ignore_index=True)

Out[41]:

                    name                              description

0             assumeRole                           Assumes a role

1               dropRole                      Relinquishes a role

2            showRolesIn          Shows the currently active role

3       showRolesAllowed  Shows the roles that a user is a mem...

4               isInRole          Shows whether a role is assu...

137          queryCaslib           Checks whether a caslib exists

138            partition                       Partitions a table

139          recordCount  Shows the number of rows in a Cloud ...

140       loadDataSource  Loads one or more data source interf...

141               update                  Updates rows in a table

 

[142 rows x 2 columns]
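Concatenating the values alone discards the action set that each row came from. If you want to keep that information, concat also accepts a mapping and uses its keys as an outer index level, which can then be turned into a column. The sketch below uses two small hand-built DataFrames in place of the real help output; the actionset and row level names are choices made for this example.

```python
import pandas as pd

# Toy frames standing in for the per-action-set DataFrames in the results.
frames = {
    'builtins': pd.DataFrame({
        'name': ['echo', 'help'],
        'description': ['Prints the supplied parameters',
                        'Shows the parameters for an action'],
    }),
    'simple': pd.DataFrame({
        'name': ['summary'],
        'description': ['Generates descriptive statistics'],
    }),
}

# The mapping keys become the outer index level named 'actionset'.
combined = (
    pd.concat(frames, names=['actionset', 'row'])
      .reset_index(level='actionset')   # move the key into a column
      .reset_index(drop=True)           # renumber the rows 0..n-1
)
print(combined)
```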

 

In addition to result values, the CASResults object also contains information about the return status of the action. We look at that in the next section.

Checking the Return Status of CAS Actions

In a perfect world, we always get the parameters to CAS actions correct and the results that we want are always returned. However, in the real world, errors occur. There are several attributes on the CASResults object that can tell you the return status information of a CAS action. They are described in the following table:

Attribute Name   Description
severity         An integer value that indicates the severity of the return
                 status. A zero status means that the action ran without
                 errors or warnings. A value of 1 means that warnings were
                 generated. A value of 2 means that errors were generated.
reason           A string value that indicates the class of error that
                 occurred. Reason codes are described in the “Details”
                 section later in this chapter.
status           A text message that describes the error that occurred.
status_code      An encoded integer that contains information that can be
                 used by Technical Support to help determine the cause of
                 the error.

In addition to the attributes previously described, the messages attribute contains any messages that were printed during the execution of the action. While you likely saw the messages as they were being printed by the action, it can sometimes be useful to have them accessible on the CASResults object for programmatic inspection. Let’s use the help action for help on an action that does exist and also an action that doesn’t exist to see the status information attributes in action.

The first example, as follows, asks for help on an existing action. The returned status attributes are all zeros for numeric values and None for reason and status. The messages attribute contains a list of all messages that are printed by the server.

In [42]: out = conn.help(action='help')

NOTE: Information for action 'builtins.help':

NOTE: The following parameters are accepted.

      Default values are shown.

NOTE:    string action=NULL,

NOTE:       specifies the name of the action for which you want help.

            The name can be in the form 'actionSetName.actionName'

            or just 'actionName'.

NOTE:    string actionSet=NULL,

NOTE:       specifies the name of the action set for which you

            want help. This parameter is ignored if the action

            parameter is specified.

NOTE:    boolean verbose=true

NOTE:       when set to True, provides more detail for each parameter.

 

In [43]: print(out.status)

None

 

In [44]: out.status_code

Out[44]: 0

 

In [45]: print(out.reason)

None

 

In [46]: out.severity

Out[46]: 0

 

In [47]: out.messages

Out[47]:

["NOTE: Information for action 'builtins.help':",

 'NOTE: The following parameters are accepted. Default values are shown.',

 'NOTE:    string action=NULL,',

 "NOTE:       specifies the name of the action for which you want help. The name can be in the form 'actionSetName.actionName' or just 'actionName'.",

 'NOTE:    string actionSet=NULL,',

 'NOTE:       specifies the name of the action set for which you want help. This parameter is ignored if the action parameter is specified.',

 'NOTE:    boolean verbose=true',

 'NOTE:       when set to True, provides more detail for each parameter.']

Now let’s ask for help on a nonexistent action.

In [48]: out = conn.help(action='nonexistent')

ERROR: Action 'nonexistent' was not found.

ERROR: The action stopped due to errors.

 

In [49]: out.status

Out[49]: 'The specified action was not found.'

 

In [50]: out.status_code

Out[50]: 2720406

 

In [51]: out.reason

Out[51]: 'abort'

 

In [52]: out.severity

Out[52]: 2

 

In [53]: out.messages

Out[53]:

["ERROR: Action 'nonexistent' was not found.",

 'ERROR: The action stopped due to errors.']

In this case, all of the attributes contain information about the error that was generated. You can use this information about the CASResults object to capture and handle errors more gracefully in your programs.
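As a sketch of what such handling might look like, the function below turns the four status attributes into a one-line summary. It uses types.SimpleNamespace objects as stand-ins for CASResults so that it runs without a server, and describe_status is a hypothetical helper rather than part of SWAT. The attribute values are the ones shown in the two sessions above.

```python
from types import SimpleNamespace


def describe_status(result):
    """Return a one-line summary of an action's return status."""
    if result.severity == 0:
        return 'ok'
    level = 'warning' if result.severity == 1 else 'error'
    return '%s (%s): %s [code %s]' % (
        level, result.reason, result.status, result.status_code)


# Stand-ins carrying the same attributes as a CASResults object.
succeeded = SimpleNamespace(severity=0, reason=None, status=None,
                            status_code=0)
failed = SimpleNamespace(severity=2, reason='abort',
                         status='The specified action was not found.',
                         status_code=2720406)

print(describe_status(succeeded))
print(describe_status(failed))
```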

If you prefer to use exceptions rather than status codes, you can set the cas.exception_on_severity option to 1 to raise exceptions on warnings, or you can set the option to 2 to raise exceptions on errors. The options system is covered in detail later in this chapter.

In [54]: swat.set_option('cas.exception_on_severity', 2)

 

In [55]: try:

   ....:     out = conn.help(action='nonexistent')

   ....: except swat.SWATCASActionError as err:

   ....:     print(err.response)

   ....:     print('')

   ....:     print(err.connection)

   ....:     print('')

   ....:     # Since this action call fails before producing

   ....:     # results, this will be empty. In actions that

   ....:     # fail partway through, this may contain results

   ....:     # up to the point of failure.

   ....:     print(err.results)

   ....:

ERROR: Action 'nonexistent' was not found.

ERROR: The action stopped due to errors.

CASResponse(messages=[],

disposition=CASDisposition(

    debug=0x88bfc196:TKCASA_GEN_ACTION_NOT_FOUND, reason=abort,

    severity=2, status=The specified action was not found.,

    status_code=2720406), performance=CASPerformance(cpu_system_time=0.0, cpu_user_time=0.0,

    data_movement_bytes=0, data_movement_time=0.0,

    elapsed_time=0.000279, memory=50080, memory_os=8441856,

    memory_quota=12111872, system_cores=32, system_nodes=1,

    system_total_memory=202931654656))

 

CAS('server-name.mycompany.com', 5570, 'username',

    protocol='cas', name='py-session-1',

    session='292319d5-151f-f241-b27c-c3b6a93c1814')

As you can see, working with results from CAS actions is the same as the workflow with any other Python framework. You connect to a CAS host, run a CAS action (either using keyword arguments, building the parameters ahead of time, or using a mixture of methods), check the return status, and process the dict-like CASResults object that is returned.

Now that we understand the basics of the workflow, let’s look at how to add additional action sets and actions to your CAS session.

Working with CAS Action Sets

In the previous sections, we have already seen that a CAS session has access to multiple action sets that each contain multiple actions. However, all of the action sets we have seen so far have been installed automatically when we connect to CAS. We haven’t shown how to load additional action sets in order to do additional operations such as advanced analytics, machine learning, streaming data analysis, and so on.

In order to load new action sets, we must first see what action sets are available on our server. We can use the actionsetinfo action to do that. We are going to use the all=True option to see all of the action sets that are installed on the server rather than only the ones that are currently loaded.

# Run the actionsetinfo action.

In [56]: asinfo = conn.actionsetinfo(all=True)

 

# Filter the DataFrame to contain only action sets that

# have not been loaded yet.

In [57]: asinfo = asinfo.setinfo[asinfo.setinfo.loaded == 0]

 

# Create a new DataFrame with only columns between

# actionset and label.

In [58]: asinfo = asinfo.loc[:, 'actionset':'label']

 

In [59]: asinfo

Out[59]:

Action set information

 

            actionset                                 label

0              access                                     

1         aggregation                                     

2              astore                                     

3            autotune                                     

4            boolRule                                     

5         cardinality                                     

6          clustering                                     

7        decisionTree

 

…                   …                                     …

 

41                svm                                      

42         textMining                                     

43          textParse                                     

44          transpose                                     

45          varReduce                                     

46            casfors               Simple forecast service

47          tkcsestst                         Session Tests

48             cmpcas                                     

49             tkovrd                     Forecast override

50            qlimreg            QLIMREG CAS Action Library

51              panel                            Panel Data

52           mdchoice           MDCHOICE CAS Action Library

53             copula  CAS Copula Simulation Action Library

54       optimization                          Optimization

55        localsearch             Local Search Optimization

 

[56 rows x 2 columns]

Depending on your installation and licensing, the list varies from system to system. One very useful action set that should be automatically available on all systems is the simple action set. This action set contains actions for simple statistics such as summary statistics (max, min, mean, and so on), histograms, correlations, and frequencies. To load an action set, use the loadactionset action:

In [60]: conn.loadactionset('simple')

NOTE: Added action set 'simple'.

Out[60]:

[actionset]

 

 'simple'

 

+ Elapsed: 0.0175s, user: 0.017s, mem: 0.255mb

As you can see, this action returns a CASResults object as described in the previous section. It contains a single key called actionset that contains the name of the action set that was loaded. Typically, you do not need this return value, but it can be used to verify that the action set has been loaded. If you attempt to load an action set that cannot be loaded for some reason (such as incorrect name, no license, or no authorization), the CASResults object is empty.
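If you do want to verify the load programmatically, checking the actionset key is enough. The sketch below uses plain dictionaries in place of CASResults objects; action_set_loaded is a hypothetical helper, and the case-insensitive comparison is an assumption made for this example.

```python
def action_set_loaded(result, name):
    """Return True if the result reports that the named action set loaded."""
    # An empty result (no 'actionset' key) means that the load failed.
    loaded = result.get('actionset')
    return loaded is not None and str(loaded).lower() == name.lower()


success = {'actionset': 'simple'}   # shape of a successful result
failure = {}                        # shape of a failed result

print(action_set_loaded(success, 'simple'))
print(action_set_loaded(failure, 'simple'))
```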

Now that we have loaded the simple action set, we can get help on it using the usual Python methods.

In [61]: conn.simple?

Type:        Simple

String form: <swat.cas.actions.Simple object at 0x7f3cdf7c07f0>

File: swat/cas/actions.py

Docstring:

Analytics

 

Actions

-------

simple.correlation : Generates a matrix of Pearson product-moment

                     correlation coefficients

simple.crosstab    : Performs one-way or two-way tabulations

simple.distinct    : Computes the distinct number of values of the

                     variables in the variable list

simple.freq        : Generates a frequency distribution for one or

                     more variables

simple.groupby     : Builds BY groups in terms of the variable value

                     combinations given the variables in the variable

                     list

simple.mdsummary   : Calculates multidimensional summaries of numeric

                     variables

simple.numrows     : Shows the number of rows in a Cloud Analytic

                     Services table

simple.paracoord   : Generates a parallel coordinates plot of the

                     variables in the variable list

simple.regression  : Performs a linear regression up to 3rd-order

                     polynomials

simple.summary     : Generates descriptive statistics of numeric

                     variables such as the sample mean, sample

                     variance, sample size, sum of squares, and so on

simple.topk        : Returns the top-K and bottom-K distinct values of

                     each variable included in the variable list based

                     on a user-specified ranking order

Once an action set has been loaded, it cannot be unloaded. The overhead for keeping an action set loaded is minimal, so this issue doesn’t make a significant difference.

That is really all there is to loading action sets. We still do not have data in our system, so we cannot use any of the simple statistics actions yet. Let’s review some final details about options and dealing with errors in the next section. The following chapter then covers the ways of loading data and using the analytical actions on those data sets.

Details

We have covered the overall workings of connecting to a CAS host, running CAS actions, working with the results of CAS actions, and loading CAS action sets. However, there are some details that we haven’t covered. Although these items aren’t necessary for using SWAT and CAS, they can be quite useful to have in your tool belt.

Getting Help

Even though we have already covered the methods for getting help from CAS, it is an important topic to recap. Every object in the SWAT package uses the standard Python method of surfacing documentation. This includes the help function in Python (for example, help(swat.CAS)), the ? suffix operator in IPython and Jupyter (for example, swat.CAS?), and any other tool that uses Python’s docstrings.

In addition, action sets and actions that are loaded dynamically also support the same Python, IPython, and Jupyter Help system hooks (for example, conn.summary?).

Keep in mind that tab completion on the CAS objects and other objects in the SWAT package can be a quick reminder of the attributes and methods of that object.

These Help system hooks should be sufficient to help you get information about any objects in the SWAT package, CAS action sets, and CAS actions and their parameters. If more detailed information is needed, it is available in the official SAS Viya documentation on the SAS website.

Dealing with Errors

The issue of CAS action errors was discussed to some extent previously in the chapter. There are two methods for dealing with CAS action errors: return codes and exceptions. The default behavior is to surface return codes, but SWAT can be configured to raise exceptions. In the case of return codes, they are available in the severity attribute of the CASResults object that is returned by CAS action methods. The possible values are shown in the following table:

Severity   Description
0          An action was executed with no warnings or errors.
1          Warnings were generated.
2          An error occurred.

In addition to the severity attribute, the CASResults object has an attribute named reason, which is a string that contains the general reason for the warning or error. The possible reasons are shown in the following table:

Reason           Description
ok               The action was executed with no warnings or errors.
abort            The action was aborted.
authentication   The action could not authenticate user credentials.
authorization    The action was unable to access a resource due to
                 permissions settings.
exception        An exception occurred during the execution of an action.
expired-token    An authentication token expired.
io               An input/output error occurred.
memory           Out of memory.
network          Networking failure.
session-retry    An action restarted and results already returned should
                 be ignored.
unknown          The reason is unknown.

The last two attributes to note are status and status_code. The status attribute of CASResults contains a human-readable formatted message that describes the action result. The status_code is a numeric code that can be supplied to Technical Support if further assistance is required. The code contains information that might be useful to Technical Support for determining the source of the problem.
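In code, the severity values and reason strings from the preceding tables can be captured as an enumeration and a lookup dictionary. This is a sketch of one possible arrangement, not something SWAT provides: Severity, REASONS, and explain are hypothetical names, and treating a reason of None as ok is an assumption based on the successful call shown earlier in the chapter.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity values as listed in the severity table."""
    OK = 0
    WARNING = 1
    ERROR = 2


# Reason strings and descriptions copied from the reason table.
REASONS = {
    'ok': 'The action was executed with no warnings or errors.',
    'abort': 'The action was aborted.',
    'authentication': 'The action could not authenticate user credentials.',
    'authorization': 'The action was unable to access a resource due to '
                     'permissions settings.',
    'exception': 'An exception occurred during the execution of an action.',
    'expired-token': 'An authentication token expired.',
    'io': 'An input/output error occurred.',
    'memory': 'Out of memory.',
    'network': 'Networking failure.',
    'session-retry': 'An action restarted and results already returned '
                     'should be ignored.',
    'unknown': 'The reason is unknown.',
}


def explain(severity, reason):
    """Combine a severity value and a reason string into one message."""
    label = Severity(severity).name
    detail = REASONS.get(reason or 'ok', 'Unrecognized reason.')
    return '%s: %s' % (label, detail)


print(explain(2, 'abort'))
print(explain(0, None))
```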

Using CAS Action Exceptions

As mentioned previously, it is possible for SWAT to raise an exception when an error or warning occurs in a CAS action. This is enabled by setting an option in the SWAT package. We haven’t covered SWAT options yet; they are covered later in this chapter. However, the simplest way to enable exceptions is to submit the following code:

In [62]: swat.options.cas.exception_on_severity = 2

This causes SWAT to throw a swat.SWATCASActionError exception when the severity of the response is 2. For exceptions to be raised on warnings and errors, you set the value of the option to 1. Setting it to None disables this feature.
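The three settings amount to a threshold test on the severity of the response. The sketch below restates that rule in plain Python so that the combinations are easy to check. It is a model of the behavior described above, not SWAT's code, and should_raise is a hypothetical name.

```python
def should_raise(severity, threshold):
    """Model of the option: raise when severity reaches the threshold.

    A threshold of None disables exceptions entirely.
    """
    return threshold is not None and severity >= threshold


# Evaluate every combination of severity (0, 1, 2) and option value.
outcomes = {
    (severity, threshold): should_raise(severity, threshold)
    for threshold in (None, 1, 2)
    for severity in (0, 1, 2)
}

for (severity, threshold), raised in sorted(outcomes.items(), key=str):
    print('severity=%s option=%s raises=%s' % (severity, threshold, raised))
```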

The swat.SWATCASActionError exception contains several attributes that contain information about the context of the exception when it was raised. The attributes are described in the following table:

Attribute Name   Description
message          The formatted status message from the CAS action.
response         The CASResponse object that contains the final response
                 from the CAS host.
connection       The CAS connection object that the action was running in.
results          The compiled result up to the point of the exception.

The message attribute is simply the same value as the status attribute of the response. The response attribute is an object that we haven’t discussed yet: CASResponse. This object isn’t seen when using the CAS action methods on a CAS object. It is used behind the scenes when compiling the results of a CAS action, and then it is discarded. It is possible to use more advanced methods of traversing the responses from CAS where you deal with CASResponse objects directly, but that is not discussed until much later in this book. For now, it is sufficient to know that the CASResponse object has several attributes, including one named disposition, which contains the same result code fields that the CASResults object also contains.

The connection attribute of swat.SWATCASActionError contains the CAS connection object that executed the action. And finally, the results attribute contains the results that have been compiled up to that point. Normally, this is a CASResults object, but there are options on the CAS action methods that we haven’t yet discussed that cause other data values to be inserted into that attribute.

Catching a swat.SWATCASActionError is just like catching any other Python exception. You use a try/except block, where the except statement specifies swat.SWATCASActionError (or any of its parent classes). In the following code, we try to get help on a nonexistent action. The action call is wrapped in a try/except block in which the except statement captures the exception as the variable err. In the except section, the message attribute of the exception is printed. Of course, you can use any of the other fields to handle the exception in any way you prefer.

In [63]: try:

    ...:     out = conn.help(action='nonexistent')

    ...: except swat.SWATCASActionError as err:

    ...:     print(err.message)

    ...:

ERROR: Action 'nonexistent' was not found.

ERROR: The action stopped due to errors.

The specified action was not found.

In addition to CAS action errors, there are other types of errors that you might run into while working with CAS. Let’s look at how to resolve CAS action parameter problems in the next section. But first, let’s reset the exception option back to the default value.

In [64]: swat.reset_option('cas.exception_on_severity')

Resolving CAS Action Parameter Problems

CAS action parameter problems can come in many forms: invalid parameter names, invalid parameter values, incorrect nesting of parameter lists, and so on. It can sometimes be difficult to immediately identify the problem. We provide you with some tips in this section that hopefully simplify your correction of CAS action parameter errors.

Let’s start with an action call that creates a parameter error. We haven’t covered the actions being used in this example, but you don’t need to know what they do in order to see what the error is and how to fix it.

In [65]: out = conn.summary(table=dict(name='test',

   ....:                               groupby=['var1', 'var2', 3]))

ERROR: An attempt was made to convert parameter 'table.groupby[2]' from int64 to parameter list, but the conversion failed.

ERROR: The action stopped due to errors.

In the preceding action call, we see that we get an error concerning table.groupby[2]. When you start building parameter structures that are deeply nested, it can be difficult to see exactly what element the message refers to. One of the best tools to track down the error is the cas.trace_actions option. We haven’t reached the section on setting SWAT options, but you can simply submit the following code in order to enable this option:

In [66]: swat.set_option('cas.trace_actions', True)

With this option enabled, we see all of the actions and action parameters printed out in a form that matches the error message from the server. Let’s run the summary action from In[65] again.

In [67]: out = conn.summary(table=dict(name='test',

   ....:                               groupby=['var1', 'var2', 3]))

[simple.summary]

   table.groupby[0] = "var1" (string)

   table.groupby[1] = "var2" (string)

   table.groupby[2] = 3 (int64)

   table.name    = "test" (string)

 

ERROR: An attempt was made to convert parameter 'table.groupby[2]' from int64 to parameter list, but the conversion failed.

ERROR: The action stopped due to errors.

This time we can see from the printed output that table.groupby[2] is the value 3. According to the definition of the summary action, those values must be strings, so that is the source of the error. We can now go into our action call and change the 3 to the proper value.
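To get the same dotted-path view of a parameter structure without enabling tracing, you can flatten the nested dictionary yourself. The following is a minimal sketch that uses only the Python standard library. The flatten_params helper is a hypothetical name of our own, not part of SWAT, and its output only approximates the trace format.

```python
def flatten_params(value, prefix=''):
    """Yield (path, value) pairs for a nested dict/list parameter structure.

    Hypothetical helper (not part of SWAT): the paths imitate the
    'table.groupby[2]' style used in the server's error messages.
    """
    if isinstance(value, dict):
        for key in sorted(value):
            path = f'{prefix}.{key}' if prefix else str(key)
            yield from flatten_params(value[key], path)
    elif isinstance(value, (list, tuple)):
        for i, item in enumerate(value):
            yield from flatten_params(item, f'{prefix}[{i}]')
    else:
        yield prefix, value


# The same parameters that were passed to the summary action above.
params = dict(table=dict(name='test', groupby=['var1', 'var2', 3]))

flat = list(flatten_params(params))
for path, val in flat:
    print(f'{path} = {val!r} ({type(val).__name__})')
```

Looking up the path from the error message in this listing points you directly at the offending value and its Python type.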

If you still do not see the problem, it might be a good idea to separate the parameter construction from the action call as we saw in the section on specifying action parameters. Let’s build the action parameters, one at a time, including the erroneous value.

In [68]: params = swat.vl()

 

In [69]: params.table.name = 'test'

 

In [70]: params.table.groupby[0] = 'var1'

 

In [71]: params.table.groupby[1] = 'var2'

 

In [72]: params.table.groupby[2] = 3

 

In [73]: out = conn.summary(**params)

[simple.summary]

   table.groupby[0] = "var1" (string)

   table.groupby[1] = "var2" (string)

   table.groupby[2] = 3 (int64)

   table.name    = "test" (string)

 

ERROR: An attempt was made to convert parameter 'table.groupby[2]' from int64 to parameter list, but the conversion failed.

ERROR: The action stopped due to errors.

Of course, in this case the error is pretty obvious since we entered it on a line by itself. But you might have parameters that are built programmatically, where the problem is not so obvious. Our parameters are now held in an object whose syntax maps directly to the output that is created by the cas.trace_actions option as well as the error message from the server. When we see this error message, we can simply display the offending value directly from the params variable to see what the problem is and correct it.

In [74]: params.table.groupby[2]

Out[74]: 3

 

In [75]: params.table.groupby[2] = 'var3'
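When parameter values are generated by a program, it can be helpful to check them before the action is ever called. The sketch below assumes, as stated earlier, that every groupby entry must be a string. The non_string_items function is a hypothetical helper of our own, not a SWAT function.

```python
def non_string_items(values):
    """Return (index, value) pairs for entries that are not strings."""
    return [(i, v) for i, v in enumerate(values) if not isinstance(v, str)]


# A list such as one that might be built programmatically.
groupby = ['var1', 'var2', 3]

bad = non_string_items(groupby)
for index, value in bad:
    print(f'groupby[{index}] is {type(value).__name__}: {value!r}')

# Replace the offending entry, then check again.
groupby[2] = 'var3'
still_bad = non_string_items(groupby)
```

The reported index is the same one that appears in the server's error message, so the two are easy to correlate.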

Now let’s move on to the remaining class of errors.

Handling Other Errors

All of the other errors that you encounter in SWAT are raised as swat.SWATError exceptions. Reasons for these errors include the inability to connect to the CAS host, out-of-memory errors, and any other errors that can occur in a networking environment. These can all be handled in the standard Python way by using try/except blocks in your code to capture them.
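The following sketch shows the general shape of such a try/except block. So that it runs even on a machine without SWAT installed, it falls back to a stand-in exception class. The call_with_fallback and failing_call functions are hypothetical names used only for illustration.

```python
try:
    from swat import SWATError  # the real class when SWAT is installed
except ImportError:
    class SWATError(Exception):
        """Stand-in so that this sketch runs without SWAT installed."""


def call_with_fallback(func, default=None):
    """Run func() and return default if a SWAT error is raised."""
    try:
        return func()
    except SWATError as exc:
        print('SWAT error:', exc)
        return default


def failing_call():
    # Hypothetical failure that simulates, for example, a lost connection.
    raise SWATError('could not connect to the CAS host')


result = call_with_fallback(failing_call, default='fallback')
```

In real code, the function that you pass in would create the connection or run the action, and the except block would log the problem or retry as appropriate.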

SWAT Options

As we have seen more than once in this chapter, the cas.exception_on_severity option is one way of changing the behavior of the SWAT package. But there are many others. The options system in SWAT is modeled after the options system in the Pandas package. Most of the function names and behaviors are the same between the two. The primary functions that are used to get, to set, and to query options are shown in the following table.

Function Name     Description
describe_option   Prints the description of one or more options.
get_option        Gets the current value of an option.
set_option        Sets the value of one or more options.
reset_option      Resets the value of one or more options back to the default.
option_context    Creates a context manager that enables you to set options temporarily in a particular context.

The first thing you might want to do is run swat.describe_option with no arguments. This prints out a description of all of the available options. Because the full listing can be rather lengthy, only a portion is displayed here:

In [76]: swat.describe_option()

cas.dataset.auto_castable : boolean

    Should a column of CASTable objects be automatically

    created if a CASLib and CAS table name are columns in the data?

    NOTE: This applies to all except the 'tuples' format.

    [default: True] [currently: True]

 

 

cas.dataset.bygroup_as_index : boolean

    If True, any by group columns are set as the DataFrame index.

    [default: True] [currently: True]

 

 

cas.dataset.bygroup_collision_suffix : string

    Suffix to use on the By group column name when a By group column

    is also included as a data column.

    [default: _by] [currently: _by]

 

    ... truncated ...

As you can see, the option information is formatted very much like options in Pandas. The description includes the full name of the option, the expected values or data type, a short description of the option, the default value, and the current value.

Since we have already used the cas.exception_on_severity option, let’s look at its description using describe_option, as follows:

In [77]: swat.describe_option('cas.exception_on_severity')

cas.exception_on_severity : int or None

    Indicates the CAS action severity level at which an exception

    should be raised. None means that no exception should be raised.

    1 would raise exceptions on warnings. 2 would raise exceptions

    on errors.

    [default: None] [currently: None]

As you can see from the description, by default, this option is set to None, which means that exceptions are never thrown. The current value is also None because we reset the option earlier. We can also get the current value of the option by using swat.get_option as follows:

In [78]: print(swat.get_option('cas.exception_on_severity'))

None

Setting options is done using the swat.set_option function. This function accepts parameters in multiple forms. The most explicit way to set an option is to pass in the name of the option as a string followed by the value of the option in the next argument. We have seen this already when we set the cas.exception_on_severity option.

In [79]: swat.set_option('cas.exception_on_severity', 2)

Another form of setting options works only if the last segment of the option name (for example, exception_on_severity for cas.exception_on_severity) is unique among all of the options. If so, then you can use that name as a keyword argument to swat.set_option. The following code is equivalent to the last example:

In [80]: swat.set_option(exception_on_severity=2)

In either of the forms, it is possible to set multiple options with one call to swat.set_option. In the first form, you simply continue to add option names and values as consecutive arguments. In the second form, you add additional keyword arguments. Also, you can mix both. The only caveat is that if you mix them, you must put the positional arguments first just like with any Python function.

In [81]: swat.set_option('cas.dataset.index_name', 'Variable',

   ....:                 'cas.dataset.format', 'dataframe',

   ....:                 exception_on_severity=2,     

   ....:                 print_messages=False)

The next function, swat.reset_option, resets options back to their default value:

In [82]: swat.reset_option('cas.exception_on_severity')

 

In [83]: print(swat.get_option('cas.exception_on_severity'))

None

Note that we used the print function in the preceding code since IPython does not display a None value as a result. Nonetheless, you can see that the value of cas.exception_on_severity was set back to the default of None.

Just as with swat.describe_option, you can specify multiple names of options to reset. In addition, executing swat.reset_option with no arguments resets all of the option values back to their default values.

The final option function is swat.option_context. Again, this works just like its counterpart in Pandas. It enables you to specify option settings for a particular context in Python using the with statement. For example, if we wanted to turn on CAS action tracing for a block of code, the swat.option_context in conjunction with the Python with statement sets up an environment where the options are set at the beginning of the context and are reset back to their previous values at the end of the context. Let’s see this in action using the cas.trace_actions option:

In [84]: swat.reset_option('trace_actions')

 

In [85]: swat.get_option('trace_actions')

Out[85]: False

 

In [86]: with swat.option_context('cas.trace_actions', True):

   ....:     print(swat.get_option('cas.trace_actions'))

   ....:

True

 

In [87]: swat.get_option('cas.trace_actions')

Out[87]: False

As you can see in the preceding example, cas.trace_actions was False before the with context was run. It was True inside the with context, and afterward, it went back to False. The swat.option_context arguments work the same way as those of swat.set_option, so you can specify as many options for the context as you prefer. The with contexts can even be nested.
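To make the save-and-restore behavior concrete, here is a minimal stand-in written with only the Python standard library. It is not SWAT's implementation. The _options dictionary and this option_context function are simplified assumptions that mimic only the positional name/value form of the arguments.

```python
from contextlib import contextmanager

# Simplified stand-in for an options registry (not SWAT's internals).
_options = {'cas.trace_actions': False, 'cas.print_messages': True}


@contextmanager
def option_context(*args):
    """Temporarily set options given as name, value, name, value, ..."""
    if len(args) % 2:
        raise ValueError('expected alternating option names and values')
    new = dict(zip(args[::2], args[1::2]))
    saved = {name: _options[name] for name in new}
    _options.update(new)
    try:
        yield
    finally:
        # Restore the previous values even if the block raised an error.
        _options.update(saved)


with option_context('cas.trace_actions', True):
    inside = _options['cas.trace_actions']
    with option_context('cas.print_messages', False):
        nested = dict(_options)

after = dict(_options)
print(inside, nested, after)
```

Restoring the values in a finally clause is what guarantees that the options return to their previous settings when the block ends, whether it ends normally or with an exception.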

Partial Option Name Matches

As we have seen with swat.set_option and swat.option_context, you can use keyword arguments if the last segment of the option name is unique among all the options. You can also use the same technique with the option names using positional string arguments. In addition, in all of the functions, you can specify any number of the trailing segments as long as they match only one option name. For example, all of the following lines of code are equivalent:

In [88]: swat.set_option('cas.dataset.max_rows_fetched', 500)

 

In [89]: swat.set_option('dataset.max_rows_fetched', 500)

 

In [90]: swat.set_option('max_rows_fetched', 500)

The swat.describe_option function also works with patterns that match the beginning of multiple option names. This means that you can display all cas.dataset options by just giving the cas.dataset prefix as an argument to swat.describe_option.

In [91]: swat.describe_option('cas.dataset')

cas.dataset.max_rows_fetched : int

    The maximum number of rows to fetch with methods that use

    the table.fetch action in the background (i.e. the head, tail,

    values, etc. of CASTable).

    [default: 3000] [currently: 500]

 

 

 

cas.dataset.auto_castable : boolean

    Should a column of CASTable objects be automatically

    created if a CASLib and CAS table name are columns in the data?

    NOTE: This applies to all except the 'tuples' format.

    [default: True] [currently: True]

This same technique also works to reset a group of options using swat.reset_option. In either case, you must specify full segment names (for example, cas.dataset), and not just any substring (for example, cas.data).
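The matching rules described in this section can be sketched in a few lines of Python. This is an illustration of the rules only, not SWAT's actual code. The resolve_option and options_with_prefix functions are hypothetical names, and OPTION_NAMES is just a sample of option names.

```python
# A sample of option names; the real list is longer.
OPTION_NAMES = [
    'cas.dataset.drop_index_name',
    'cas.dataset.format',
    'cas.dataset.index_name',
    'cas.dataset.max_rows_fetched',
    'cas.exception_on_severity',
    'cas.trace_actions',
    'interactive_mode',
]


def resolve_option(partial, names=OPTION_NAMES):
    """Resolve trailing segments to exactly one full option name."""
    matches = [n for n in names
               if n == partial or n.endswith('.' + partial)]
    if len(matches) != 1:
        raise KeyError(f'{partial!r} matched {len(matches)} options')
    return matches[0]


def options_with_prefix(prefix, names=OPTION_NAMES):
    """Return all options under a prefix made of full segments."""
    return [n for n in names
            if n == prefix or n.startswith(prefix + '.')]


full = resolve_option('max_rows_fetched')
same = resolve_option('dataset.max_rows_fetched')
index_name = resolve_option('index_name')
dataset_group = options_with_prefix('cas.dataset')
partial_segment = options_with_prefix('cas.data')
print(full, index_name, len(dataset_group), partial_segment)
```

Requiring a period at the segment boundary is what keeps index_name from also matching drop_index_name, and what keeps cas.data from matching the cas.dataset options.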

The swat.options Object

In addition to the option functions in the SWAT package, there is an object-based interface as well: swat.options. This method of setting options is just like using the Pandas options object.

Much as describe_option does when it is called without any arguments, the ? Help syntax in IPython displays all of the option descriptions when you enter swat.options?.

In [92]: swat.options?

Type:           AttrOption

String form:    <swat.utils.config.AttrOption object at 0x269a0d0>

File: swat/utils/config.py

Definition:     swat.options(self, *args, **kwargs)

Docstring

cas.dataset.auto_castable : boolean

    Should a column of CASTable objects be automatically

    created if a CASLib and CAS table name are columns in the data?

    NOTE: This applies to all except the 'tuples' format.

    [default: True] [currently: True]

 

cas.dataset.bygroup_as_index : boolean

    If True, any by group columns are set as the DataFrame index.

    [default: True] [currently: True]

 

cas.dataset.bygroup_collision_suffix : string

    Suffix to use on the By group column name when a By group column

    is also included as a data column.

    [default: _by] [currently: _by]

 

... truncated ...

Tab completion can also be used to see what options are available.

In [93]: swat.options.<tab>

swat.options.cas.dataset.auto_castable

swat.options.cas.dataset.bygroup_as_index

swat.options.cas.dataset.bygroup_collision_suffix

swat.options.cas.dataset.bygroup_columns

swat.options.cas.dataset.bygroup_formatted_suffix

swat.options.cas.dataset.drop_index_name

swat.options.cas.dataset.format

swat.options.cas.dataset.index_adjustment

swat.options.cas.dataset.index_name

swat.options.cas.dataset.max_rows_fetched

swat.options.cas.exception_on_severity

swat.options.cas.hostname

swat.options.cas.port

swat.options.cas.print_messages

swat.options.cas.protocol

swat.options.cas.trace_actions

swat.options.cas.trace_ui_actions

swat.options.encoding_errors

swat.options.interactive_mode

swat.options.tkpath

Getting the value of an option is as simple as entering the full name of an option as displayed in the tab-completed output.

In [94]: swat.options.cas.trace_actions

Out[94]: False

You set options as you would set any other Python variable.

In [95]: swat.options.cas.trace_actions = True

Just as with set_option and get_option, if the final segment of the option name is unique among all options, you can shorten your swat.options call to include only that segment. This is shown below by eliminating the cas portion of the option name.

In [96]: swat.options.trace_actions = False

The swat.options object also defines a callable interface that returns a context manager like option_context. The following is equivalent to using with swat.option_context(…).

In [97]: with swat.options(trace_actions=True):

   ....:     out = conn.help(action='loadactionset')

   ....:

[builtins.help]

   action = "loadactionset" (string)

The interface that you use is purely a personal preference. The only thing that the swat.options interface is missing is a way to reset options. You must fall back to reset_option to do that.

In addition to the SWAT options, CAS server sessions also have options. Let’s look at those in the next section.

CAS Session Options

The session on the CAS host has options that can be set to change certain behaviors for the current session. These options are set using the setsessopt action. You can view them and get current values using listsessopt and getsessopt. The best way to see all of the options that are available is to use Python’s Help system on the setsessopt action.

In [98]: conn.setsessopt?

 

...

 

Docstring

Sets a session option

 

Parameters

----------

actionmaxtime : int64, optional

    specifies the maximum action time.

    Default: -1

    Note: Value range is -1 <= n <= 86400

 

apptag : string, optional

    specifies the string to prefix to log messages.

    Default:

 

caslib : string, optional

    specifies the caslib name to set as the active caslib.

    Default:

 

collate : string, optional

    specifies the collating sequence for sorting.

    Default: UCA

    Values: MVA, UCA

 

fmtcaslib : string, optional

    specifies the caslib where persisted format libraries are retained.

    Default: FORMATS

 

fmtsearch : string, optional

    specifies the format library search order.

    Default:

 

fmtsearchposition : string, optional

    specifies the position in the format library list where additions

    are made.

    Default: APPEND

    Values: APPEND, CLEAR, INSERT, REPLACE

 

locale : string, optional

    specifies the locale to use for sorting and formatting.

    Default: en_US

 

logflushtime : int64, optional

    specifies the log flush time, in milliseconds. A value of -1

    indicates to flush logs after each action completes. A value of 0

    indicates to flush logs as they are produced.

    Default: 100

    Note: Value range is -1 <= n <= 86400

 

maxtablemem : int64, optional

    specifies the maximum amount of physical memory, in bytes, to

    allocate for a table. After this threshold is reached, the server

    uses temporary files and operating system facilities for memory

    management.

    Default: 16777216

 

metrics : boolean, optional

    when set to True, action metrics are displayed.

    Default: False

 

... truncated ...

There are options for setting the session locale, collation order, time-outs, memory limits, and so on. The metrics option is simple to demonstrate. Let’s get its current value using getsessopt:

In [99]: out = conn.getsessopt('metrics')

 

In [100]: out

Out[100]:

[metrics]

 

 0

 

+ Elapsed: 0.000365s, mem: 0.0626mb

The output is our usual CASResults object with a key that matches the requested option name. In this case, the metrics option is returned as an integer value of zero (corresponding to a Boolean false). You can get the actual value of the metrics option by accessing that key of the CASResults object.

In [101]: out.metrics

Out[101]: 0

Setting the values of options is done using setsessopt with keyword arguments for the option names. You can specify as many options in setsessopt as you need.

In [102]: conn.setsessopt(metrics=True, collate='MVA')

NOTE: Executing action 'sessionProp.setSessOpt'.

NOTE: Action 'sessionprop.setsessopt' used (Total process time):

NOTE:       real time               0.000370 seconds

NOTE:       cpu time                0.000000 seconds (0.00%)

NOTE:       total nodes             1 (32 cores)

NOTE:       total memory            188.99G

NOTE:       memory                  98.19K (0.00%)

Out[102]: + Elapsed: 0.000334s, mem: 0.0959mb

Notice that the metrics option takes effect immediately: performance metrics for the action are now printed to the output. Checking the value of collate, you see that it has been set to MVA.

In [103]: conn.getsessopt('collate').collate

NOTE: Executing action 'sessionProp.getSessOpt'.

NOTE: Action 'sessionprop.getsessopt' used (Total process time):

NOTE:       real time               0.000302 seconds

NOTE:       cpu time                0.000000 seconds (0.00%)

NOTE:       total nodes             1 (32 cores)

NOTE:       total memory            188.99G

NOTE:       memory                  49.91K (0.00%)

Out[103]: 'MVA'
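Unlike SWAT options, the session options shown here have no counterpart to option_context. If you want a session option to apply only to a block of code, one possibility is to combine getsessopt and setsessopt in a small context manager of your own. The sketch below is an assumption on our part rather than a SWAT feature. The session_options function is a hypothetical name, and FakeConnection is a stand-in that imitates only the two calls used here so that the example runs without a CAS server.

```python
from contextlib import contextmanager


@contextmanager
def session_options(conn, **new_values):
    """Temporarily set CAS session options, then restore the old values."""
    # Read each current value by using the key that matches the option name.
    saved = {name: conn.getsessopt(name)[name] for name in new_values}
    conn.setsessopt(**new_values)
    try:
        yield conn
    finally:
        conn.setsessopt(**saved)


class FakeConnection:
    """Hypothetical stand-in for a CAS connection (not part of SWAT)."""

    def __init__(self):
        self.opts = {'metrics': 0, 'collate': 'UCA'}

    def getsessopt(self, name):
        return {name: self.opts[name]}

    def setsessopt(self, **kwargs):
        self.opts.update(kwargs)


fake_conn = FakeConnection()
with session_options(fake_conn, collate='MVA'):
    during = fake_conn.opts['collate']
after = fake_conn.opts['collate']
print(during, after)
```

With a real connection, you would pass the conn object in place of fake_conn. Keep in mind that each entry and exit then costs additional action calls to the server.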

Conclusion

We have covered a lot of territory in this chapter, but you should now have the tools that you need in order to connect to CAS, call CAS actions, and traverse the results. We also covered the possible error conditions that you might run into, and what to do when they happen. Finally, we demonstrated some of the SWAT client and CAS session options to control certain behaviors of both areas. Now that we have that all out of the way, we can move on to something a little more interesting: data and how to get it into CAS.

1 Technically, these parameters can also be specified by setting environment variables CASHOST and CASPORT, and not specified in the CAS constructor.

2 The name _self_ is used instead of the more typical self argument to prevent possible name collisions with action parameters.
