The SAS SWAT package includes an object-oriented interface to CAS as well as utilities to handle results, format data values, and upload data to CAS. We have already covered the installation of SWAT in an earlier chapter, so let’s jump right into connecting to CAS.
There is a lot of detailed information about parameter structures, error handling, and authentication in this chapter. If you feel like you are getting bogged down, you can always skim over this chapter and come back to it later when you need more formal information about programming using the CAS interface.
In order to connect to a CAS host, you need some form of authentication. There are various authentication mechanisms that you can use with CAS. The different forms of authentication are beyond the scope of this book, so we use user name and password authentication in all of our examples. This form of authentication assumes that you have a login account on the CAS server that you are connecting to. The disadvantage of using a user name and password is that you typically include your password in the source code. However, Authinfo is a solution to this problem, so we’ll show you how to store authentication information using Authinfo as well.
Let’s make a connection to CAS using an explicit user name and a password. For this example, we use an IPython shell. As described previously, to run IPython, you use the ipython command from a command shell or the Anaconda menu in Windows.
The first thing you need to do after starting IPython is to import the SWAT package. This package contains a class called CAS that is the primary interface to your CAS server. It requires at least two arguments: the CAS host name or IP address, and the port number that CAS is running on. Since we use user name and password authentication, we must specify them as the next two arguments. If there are no connection errors, you should now have an open CAS session that is referred to by the conn variable.
In [1]: import swat
In [2]: conn = swat.CAS('server-name.mycompany.com', 5570,
'username', 'password')
In [3]: conn
Out[3]: CAS('server-name.mycompany.com', 5570, 'username',
protocol='cas', name='py-session-1',
session='ffee6422-96b9-484f-a868-03505b320987')
As you can see in Out[3], we display the string representation of the CAS object. You see that it echoes the host name, the port, the user name, and several fields that were not specified. The name and session fields are created once the session is created. The session value contains a unique ID that can be used to make other connections to that same session. The name field is a user-friendly name that is used to tag the session on the server to make it easier to distinguish when querying information about current sessions. This is discussed in more depth later in the chapter.
We mentioned using Authinfo rather than specifying your user name and password explicitly in your programs. The Authinfo specification is based on an older file format called Netrc. Netrc was used by FTP programs to store user names and passwords so that you don’t have to enter authentication information manually. Authinfo works the same way, but adds a few extensions.
The basic format of an Authinfo file follows: (The format occupies two lines to enhance readability.)
host server-name.mycompany.com port 5570
user username password password
Where server-name.mycompany.com is the host name of your CAS server (an IP address can also be used), 5570 is the port number of the CAS server, username is your user ID on that machine, and password is your password on that machine. If you don’t specify a port number, the same user name and password are used on any port on that machine. Each CAS host requires a separate host definition line. In addition, the host name must match exactly what is specified in the CAS constructor. There is no DNS name expansion if you use a shortened name such as server-name.
By default, the Authinfo file is accessed from your home directory under the name .authinfo (on Windows, the name _authinfo is used). It also must have permissions that are set up so that only the owner can read it. This is done using the following command on Linux.
chmod 0600 ~/.authinfo
On Windows, the file permissions should be set so that the file isn’t readable by the Everyone group. Once that file is in place and has the correct permissions, you should be able to make a connection to CAS without specifying your user name and password explicitly.
In [1]: import swat
In [2]: conn = swat.CAS('server-name.mycompany.com', 5570)
In [3]: conn
Out[3]: CAS('server-name.mycompany.com', 5570, 'username',
protocol='cas', name='py-session-1',
session='ffee6422-96b9-484f-a868-03505b320987')
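The Authinfo file format and the owner-only permission requirement described above can be sketched in plain Python. This is a minimal illustration, not SWAT's actual Authinfo reader: the parse_authinfo helper is hypothetical, a temporary directory stands in for your home directory, and the permission check assumes a POSIX system such as Linux.

```python
import os
import stat
import tempfile

# The same entry shown earlier, split over two lines for readability.
AUTHINFO_TEXT = (
    "host server-name.mycompany.com port 5570\n"
    "user username password password\n"
)

def parse_authinfo(text):
    """Hypothetical helper: group whitespace-separated key/value pairs by host."""
    tokens = text.split()
    entries = []
    for key, value in zip(tokens[0::2], tokens[1::2]):
        if key == 'host':
            entries.append({'host': value})   # each 'host' starts a new entry
        elif entries:
            entries[-1][key] = value
    return entries

# Write the file to a temporary directory (a stand-in for ~/.authinfo).
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, '.authinfo')
with open(path, 'w') as f:
    f.write(AUTHINFO_TEXT)

# Equivalent of `chmod 0600`: read and write for the owner only.
os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)

with open(path) as f:
    entries = parse_authinfo(f.read())
mode = stat.S_IMODE(os.stat(path).st_mode)

print(entries)
print(oct(mode))
```

Because the sketch splits on any whitespace, it reads the entry the same way whether it is written on one line or two.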
After connecting to CAS, we can continue to a more interesting topic: running CAS actions.
In the previous section, we made a connection to CAS, but didn’t explicitly perform any actions. However, after the connection was made, many actions were performed to obtain information about the server and what resources are available to the CAS installation. Among the things queried is information about the currently loaded action sets. An action set is a collection of actions that can be executed. Actions can do various things such as return information about the server setup, load data, and perform advanced analytics. To see what action sets and actions are already loaded, you can call the help action on the CAS object that we previously created.
In [4]: out = conn.help()
NOTE: Available Action Sets and Actions:
NOTE: accessControl
NOTE: assumeRole - Assumes a role
NOTE: dropRole - Relinquishes a role
NOTE: showRolesIn - Shows the currently active role
NOTE: showRolesAllowed - Shows the roles that a user is
a member of
NOTE: isInRole - Shows whether a role is assumed
NOTE: isAuthorized - Shows whether access is authorized
NOTE: isAuthorizedActions - Shows whether access is
authorized to actions
NOTE: isAuthorizedTables - Shows whether access is authorized
to tables
NOTE: isAuthorizedColumns - Shows whether access is authorized
to columns
NOTE: listAllPrincipals - Lists all principals that have
explicit access controls
NOTE: whatIsEffective - Lists effective access and
explanations (Origins)
NOTE: listAcsData - Lists access controls for caslibs, tables,
and columns
NOTE: listAcsActionSet - Lists access controls for an action
or action set
NOTE: repAllAcsCaslib - Replaces all access controls for
a caslib
NOTE: repAllAcsTable - Replaces all access controls for a table
NOTE: repAllAcsColumn - Replaces all access controls for
a column
NOTE: repAllAcsActionSet - Replaces all access controls for
an action set
NOTE: repAllAcsAction - Replaces all access controls for
an action
NOTE: updSomeAcsCaslib - Adds, deletes, and modifies some
access controls for a caslib
NOTE: updSomeAcsTable - Adds, deletes, and modifies some
access controls for a table
NOTE: updSomeAcsColumn - Adds, deletes, and modifies some
access controls for a column
NOTE: updSomeAcsActionSet - Adds, deletes, and modifies some
access controls for an action set
NOTE: updSomeAcsAction - Adds, deletes, and modifies some
access controls for an action
NOTE: remAllAcsData - Removes all access controls for a
caslib, table, or column
... truncated ...
This prints out a listing of all of the loaded action sets and the actions within them. It also returns a CASResults structure that contains the action set information in tabular form. The results of CAS actions are discussed later in this chapter.
The help action takes arguments that specify which action sets and actions you want information about. To display help for an action set, use the actionset keyword parameter. The following code displays the help content for the builtins action set.
In [5]: out = conn.help(actionset='builtins')
NOTE: Information for action set 'builtins':
NOTE: builtins
NOTE: addNode - Adds a machine to the server
NOTE: removeNode - Remove one or more machines from the server
NOTE: help - Shows the parameters for an action or lists all
available actions
NOTE: listNodes - Shows the host names used by the server
NOTE: loadActionSet - Loads an action set for use in this
session
NOTE: installActionSet - Loads an action set in new sessions
automatically
NOTE: log - Shows and modifies logging levels
NOTE: queryActionSet - Shows whether an action set is loaded
NOTE: queryName - Checks whether a name is an action or
action set name
NOTE: reflect - Shows detailed parameter information for an
action or all actions in an action set
NOTE: serverStatus - Shows the status of the server
NOTE: about - Shows the status of the server
NOTE: shutdown - Shuts down the server
NOTE: userInfo - Shows the user information for your connection
NOTE: actionSetInfo - Shows the build information from loaded
action sets
NOTE: history - Shows the actions that were run in this session
NOTE: casCommon - Provides parameters that are common to many
actions
NOTE: ping - Sends a single request to the server to confirm
that the connection is working
NOTE: echo - Prints the supplied parameters to the client log
NOTE: modifyQueue - Modifies the action response queue settings
NOTE: getLicenseInfo - Shows the license information for a
SAS product
NOTE: refreshLicense - Refresh SAS license information from
a file
NOTE: httpAddress - Shows the HTTP address for the server
monitor
Notice that help is one of the actions in the builtins action set. To display the Help for an action, use the action keyword argument. You can display the Help for the help action as follows:
In [6]: out = conn.help(action='help')
NOTE: Information for action 'builtins.help':
NOTE: The following parameters are accepted.
Default values are shown.
NOTE: string action=NULL,
NOTE: specifies the name of the action for which you want help.
The name can be in the form 'actionSetName.actionName' or
just 'actionName'.
NOTE: string actionSet=NULL,
NOTE: specifies the name of the action set for which you
want help. This parameter is ignored if the action
parameter is specified.
NOTE: boolean verbose=true
NOTE: when set to True, provides more detail for each parameter.
Looking at the printed notes, you can see that the help action takes the parameters actionset, action, and verbose. We have previously seen the actionset and action parameters. The verbose parameter is enabled, which means that you will get a full description of all of the parameters of the action. You can suppress the parameter descriptions by specifying verbose=False as follows:
In [7]: out = conn.help(action='help', verbose=False)
NOTE: Information for action 'builtins.help':
NOTE: The following parameters are accepted.
Default values are shown.
NOTE: string action=NULL,
NOTE: string actionSet=NULL,
NOTE: boolean verbose=true
In addition to the Help system that is provided by CAS, the SWAT module also enables you to access the action set and action information using mechanisms supplied by Python and IPython. Python supplies the help function to display information about Python objects. This same function can be used to display information about CAS action sets and actions. We have been using the help action on our CAS object. Let’s see what the Python help function displays.
In [8]: help(conn.help)
Help on builtins.Help in module swat.cas.actions object:
class builtins.Help(CASAction)
| Shows the parameters for an action or lists all available actions
|
| Parameters
| ----------
| action : string, optional
| specifies the name of the action for which you want help.
| The name can be in the form 'actionSetName.actionName' or
| just 'actionName'.
|
| actionset : string, optional
| specifies the name of the action set for which you want help.
| This parameter is ignored if the action parameter is
| specified.
|
| verbose : boolean, optional
| when set to True, provides more detail for each parameter.
| Default: True
|
| Returns
| -------
| Help object
... truncated ...
It gets a little confusing in that code snippet because both the name of the function in Python and the name of the action are the same, but you see that the information displayed by Python’s Help system is essentially the same as what CAS displayed. You can also use the IPython/Jupyter Help system (our preferred method) by following the action name with a question mark.
In [9]: conn.help?
Type: builtins.Help
String form: ?.builtins.Help()
File: swat/cas/actions.py
Definition: ?.help(_self_, action=None,
actionset=None,
verbose=True, **kwargs)
Docstring:
Shows the parameters for an action or lists all available actions
Parameters
----------
action : string, optional
specifies the name of the action for which you want help. The name
can be in the form 'actionSetName.actionName' or just 'actionName'.
actionset : string, optional
specifies the name of the action set for which you want help. This
parameter is ignored if the action parameter is specified.
verbose : boolean, optional
when set to True, provides more detail for each parameter.
Default: True
Returns
-------
Help object
... truncated ...
These methods of getting help work both on actions and action sets. For example, we know that there is a builtins action set that the help action belongs to. The CAS object has an attribute that maps to the builtins action set just like the help action. We can display the help for the builtins action set as follows:
In [10]: conn.builtins?
Type: Builtins
String form: <swat.cas.actions.Builtins object at 0x7f7ad35b9048>
File: swat/cas/actions.py
Docstring:
System
Actions
-------
builtins.about : Shows the status of the server
builtins.actionsetinfo : Shows the build information from loaded
action sets
builtins.addnode : Adds a machine to the server
builtins.cascommon : Provides parameters that are common to
many actions
builtins.echo : Prints the supplied parameters to the
client log
builtins.getgroups : Shows the groups from the authentication
provider
builtins.getlicenseinfo : Shows the license information for a SAS
product
builtins.getusers : Shows the users from the authentication
provider
builtins.help : Shows the parameters for an action or
lists all available actions
builtins.history : Shows the actions that were run in this
session
builtins.httpaddress : Shows the HTTP address for the server
monitor
builtins.installactionset : Loads an action set in new sessions
automatically
builtins.listactions : Shows the parameters for an action or
lists all available actions
builtins.listnodes : Shows the host names used by the server
builtins.loadactionset : Loads an action set for use in this
session
builtins.log : Shows and modifies logging levels
builtins.modifyqueue : Modifies the action response queue
settings
builtins.ping : Sends a single request to the server to
confirm that the connection is working
builtins.queryactionset : Shows whether an action set is loaded
builtins.queryname : Checks whether a name is an action or
action set name
builtins.reflect : Shows detailed parameter information for
an action or all actions in an action set
builtins.refreshlicense : Refresh SAS license information from a
file
builtins.refreshtoken : Refreshes an authentication token for this
session
builtins.removenode : Remove one or more machines from the
server
builtins.serverstatus : Shows the status of the server
builtins.setlicenseinfo : Sets the license information for a SAS
product
builtins.shutdown : Shuts down the server
builtins.userinfo : Shows the user information for your
connection
Each time that an action set is loaded into the server, the information about the action set and its actions is reflected back to SWAT. SWAT then creates attributes on the CAS object that map to the new action sets and actions, including all of the documentation and Help system hooks. The advantage is that the documentation about actions can never get out of date in the Python client.
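The idea of building attributes and documentation from reflected information can be sketched without a server. The following is a hypothetical illustration of the pattern, not SWAT's implementation: the Connection class, the register_actionset method, and the make_action helper are all invented for this example, and the stub actions only return what they would send.

```python
from types import SimpleNamespace

def make_action(actionset, name, description):
    """Build a callable whose docstring comes from the reflected description."""
    def action(**kwargs):
        # A real client would send a request to the server here; this
        # stub just returns the action name and parameters.
        return {'action': '%s.%s' % (actionset, name), 'params': kwargs}
    action.__name__ = name
    action.__doc__ = description
    return action

class Connection:
    """Hypothetical stand-in for a client connection object."""

    def register_actionset(self, actionset, actions):
        namespace = SimpleNamespace()
        for name, description in actions.items():
            func = make_action(actionset, name, description)
            setattr(namespace, name, func)   # reachable as demo.builtins.ping
            setattr(self, name, func)        # and directly as demo.ping
        setattr(self, actionset, namespace)

demo = Connection()
demo.register_actionset('builtins', {
    'ping': 'Sends a single request to the server to confirm '
            'that the connection is working',
    'echo': 'Prints the supplied parameters to the client log',
})

print(demo.builtins.ping.__doc__)
print(demo.echo(a=1))
```

Since the attributes are ordinary Python attributes with docstrings, tools such as help, the IPython question mark, and tab completion work on them without any extra effort.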
In addition to the documentation, since the action sets and actions appear as attributes on the CAS object, you can also use tab completion to display what is in an action set.
In [11]: conn.builtins.<tab>
conn.builtins.about conn.builtins.loadactionset
conn.builtins.actionsetinfo conn.builtins.log
conn.builtins.addnode conn.builtins.modifyqueue
conn.builtins.cascommon conn.builtins.ping
conn.builtins.echo conn.builtins.queryactionset
conn.builtins.getgroups conn.builtins.queryname
conn.builtins.getlicenseinfo conn.builtins.reflect
conn.builtins.getusers conn.builtins.refreshlicense
conn.builtins.help conn.builtins.refreshtoken
conn.builtins.history conn.builtins.removenode
conn.builtins.httpaddress conn.builtins.serverstatus
conn.builtins.installactionset conn.builtins.setlicenseinfo
conn.builtins.listactions conn.builtins.shutdown
conn.builtins.listnodes conn.builtins.userinfo
Now that we have seen how to query the server for available action sets and actions, and we know how to get help for the actions, we can move on to some more advanced action calls.
We have already seen a few action parameters being used on the help action (action, actionset, and verbose). When a CAS action is added to the CAS object, all action parameters are mapped to Python keyword parameters. Let’s look at the function signature of help to see the supported action parameters.
In [12]: conn.help?
Type: builtins.Help
String form: ?.builtins.Help()
File: swat/cas/actions.py
Definition: ?.help(_self_, action=None,
actionset=None,
verbose=True, **kwargs)
... truncated ...
You see that there is one positional argument (_self_). It is simply the Python object that help is being called on. The rest of the arguments are keyword arguments that are converted to action parameters and passed to the action. The **kwargs argument is always specified on actions as well. It enables the use of extra parameters for debugging and system use. Now let’s look at the descriptions of the parameters (the following output is continued from the preceding help? invocation).
Docstring:
Shows the parameters for an action or lists all available actions
Parameters
----------
action : string, optional
specifies the name of the action for which you want help. The name
can be in the form 'actionSetName.actionName' or just
'actionName'.
actionset : string, optional
specifies the name of the action set for which you want help. This
parameter is ignored if the action parameter is specified.
verbose : boolean, optional
when set to True, provides more detail for each parameter.
Default: True
You see that action and actionset are declared as strings, and verbose is declared as a Boolean. Action parameters can take many types of values. The following table shows the supported types:
CAS Type | Python Type | Description |
Boolean | bool | Value that indicates true or false. This should always be specified using Python’s True or False values. |
double | float, swat.float64 | 64-bit floating point number |
int32 | int, swat.int32 | 32-bit integer |
int64 | long (Python 2), int (Python 3), swat.int64 | 64-bit integer |
string | unicode (Python 2), str (Python 3) | Character content. Note that if a byte string is passed as an argument, SWAT attempts to convert it to Unicode using the default encoding. |
value list | list or dict | Collection of items. Python lists become indexed CAS value lists. Python dicts become keyed CAS value lists. |
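As a rough guide to the table above, the following sketch labels a Python value with the CAS type it lines up with. The cas_type_name function is hypothetical, and the rule that integers fitting in 32 bits are labeled int32 while larger ones are labeled int64 is an assumption made for illustration, not documented SWAT behavior. The sketch assumes Python 3.

```python
def cas_type_name(value):
    """Return the CAS type name that a Python value lines up with.

    Assumption for illustration only: integers that fit in 32 bits are
    labeled int32, and larger integers are labeled int64.
    """
    if isinstance(value, bool):      # check bool first: bool is a subclass of int
        return 'boolean'
    if isinstance(value, int):
        return 'int32' if -2**31 <= value < 2**31 else 'int64'
    if isinstance(value, float):
        return 'double'
    if isinstance(value, (str, bytes)):
        return 'string'
    if isinstance(value, (list, dict)):
        return 'value list'
    raise TypeError('no CAS type for %r' % type(value))

for sample in (True, 1776, 2**60, 3.14159, 'text', ['a', 'b'], {'k': 'v'}):
    print(repr(sample), '->', cas_type_name(sample))
```

When the automatic choice is not what you want, the swat.int32, swat.int64, and swat.float64 types listed in the table let you state the type explicitly.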
The easiest way to practice more complex arguments is by using the echo action. This action simply prints the value of all parameters that were specified in the action call. The following code demonstrates the echo action with all of the parameter types in the preceding table.
In [13]: out = conn.echo(
...: boolean_true = True,
...: boolean_false = False,
...: double = 3.14159,
...: int32 = 1776,
...: int64 = 2**60,
...: string = u'I like snowmen! \u2603',
...: list = [u'item1', u'item2', u'item3'],
...: dict = {'key1': 'value1',
...: 'key2': 'value2',
...: 'key3': 3}
...: )
NOTE: builtin.echo called with 8 parameters.
NOTE: parameter 1: int32 = 1776
NOTE: parameter 2: boolean_false = false
NOTE: parameter 3: list = {'item1', 'item2', 'item3'}
NOTE: parameter 4: boolean_true = true
NOTE: parameter 5: int64 = 1152921504606846976
NOTE: parameter 6: double = 3.14159
NOTE: parameter 7: string = 'I like snowmen! ☃'
NOTE: parameter 8: dict = {key1 = 'value1', key3 = 3,
key2 = 'value2'}
You might notice that the parameters are printed in a different order than what was specified in the echo call. This is because, in the Python version used for this example, keyword parameters are stored in a dictionary that doesn’t keep keys in a specified order. (Python 3.6 and later preserve the order in which keyword arguments are passed.)
You might also notice that the printed syntax is not Python syntax. It is a pseudo-code syntax more similar to the Lua programming language. Lua is used in other parts of CAS as well (such as the history action), so most code-like objects that are printed from CAS are in Lua or a syntax that is like Lua. However, the syntax of the two languages (as far as parameter data structures go) is similar enough that it is easy to see the mapping from one to the other. The biggest differences are in the value list parameters. Indexed lists in the printout use braces, whereas Python uses square brackets. Also, in the keyed list, Python’s keys must be quoted, and the separator that is used between the key and the value is a colon (:) rather than an equal sign (=).
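The mapping between the two notations can be made concrete with a small formatter that renders Python values in the Lua-like form seen in the notes. This is only a rough sketch for simple values, not how CAS produces its output, and the to_lua_like function is a hypothetical name. It assumes Python 3.7 or later, where dictionaries keep their insertion order.

```python
def to_lua_like(value):
    """Render a Python value in the Lua-like notation used in the printed notes."""
    if isinstance(value, bool):              # before int: bool is a subclass of int
        return 'true' if value else 'false'
    if isinstance(value, dict):              # keyed list: unquoted key, equal sign
        items = ('%s = %s' % (k, to_lua_like(v)) for k, v in value.items())
        return '{' + ', '.join(items) + '}'
    if isinstance(value, (list, tuple)):     # indexed list: braces, not brackets
        return '{' + ', '.join(to_lua_like(v) for v in value) + '}'
    if isinstance(value, str):
        return "'%s'" % value
    return str(value)

print(to_lua_like(['item1', 'item2', 'item3']))
print(to_lua_like({'key1': 'value1', 'key2': 'value2', 'key3': 3}))
print(to_lua_like(False))
```

The first call prints {'item1', 'item2', 'item3'} and the second prints {key1 = 'value1', key2 = 'value2', key3 = 3}, which matches the form of the list and dict parameters in the echo notes apart from key order.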
The complexity of the parameter structures is unlimited. Lists can be nested inside dictionaries, and dictionaries can be nested inside lists. A demonstration of nested structures in echo follows:
In [14]: out = conn.echo(
...: list = ['item1',
...: 'item2',
...: {
...: 'key1': 'value1',
...: 'key2': {
...: 'value2': [0, 1, 1, 2, 3]
...: }
...: }
...: ])
NOTE: builtin.echo called with 1 parameters.
NOTE: parameter 1: list = {'item1', 'item2',
{key1 = 'value1',
key2 = {value2 = {0, 1, 1, 2, 3}}}}
Nested dictionary parameters are fairly common in CAS and can create some confusion with the nesting levels and differences between keyword arguments and dictionary literals. Because of this, some utility functions have been added to SWAT to aid in the construction of nested parameters.
While specifying dictionary parameters, you can suffer from cognitive dissonance when you switch from keyword arguments to dictionary literals. In keyword arguments, you don’t quote the name of the argument, but in dictionary literals, you do quote the key. Also, in keyword arguments, you use an equal sign between the name and value, whereas in dictionary literals, you use a colon. Because of this potential confusion, we prefer to use the dict constructor with keyword arguments when nesting action parameters. The following code repeats the preceding example, but uses Python’s dict constructor for the nested dictionaries.
In [15]: out = conn.echo(
...: list = ['item1',
...: 'item2',
...: dict(
...: key1 = 'value1',
...: key2 = dict(
...: value2 = [0, 1, 1, 2, 3]
...: )
...: )
...: ])
NOTE: builtin.echo called with 1 parameters.
NOTE: parameter 1: list = {'item1', 'item2',
{key1 = 'value1',
key2 = {value2 = {0, 1, 1, 2, 3}}}}
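That the two spellings build the same structure can be checked in plain Python, with no server involved. The variable names below are only for illustration.

```python
# Nested structure written with dictionary literals: quoted keys and colons.
literal = {
    'key1': 'value1',
    'key2': {
        'value2': [0, 1, 1, 2, 3]
    }
}

# The same structure written with the dict constructor: bare names and equal signs.
constructed = dict(
    key1='value1',
    key2=dict(
        value2=[0, 1, 1, 2, 3]
    )
)

print(literal == constructed)   # prints True
```

One limitation of the dict constructor form is that every key must be a valid Python identifier and not a reserved word. A key such as class or one that contains a space still requires a dictionary literal.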
The SWAT package also includes a utility function called vl (for “value list”). This function returns an enhanced type of dictionary that enables you to build nested structures quickly and easily. It can be used directly in place of the dict call in the preceding code in its simplest form, but it can also be used outside of the action to build up parameter lists before the action call.
The primary feature of the dictionary object that vl returns is that it automatically adds any key to the dictionary when the key is accessed. For example, the following code builds the same nested structure that the previous example does.
In [16]: params = swat.vl()
In [17]: params.list[0] = 'item1'
In [18]: params.list[1] = 'item2'
In [19]: params.list[2].key1 = 'value1'
In [20]: params.list[2].key2.value2 = [0, 1, 1, 2, 3]
In [21]: params
Out[21]:
{'list': {0: 'item1',
1: 'item2',
2: {'key1': 'value1', 'key2': {'value2': [0, 1, 1, 2, 3]}}}}
As you can see in Out[21], just by accessing the key names and index values as if they existed, the nested parameter structure is automatically created behind the scenes. However, note that this does make it fairly easy to introduce errors into the structure with typographical errors. The object will create key values with mistakes in them since it has no way of telling a good key name from a bad one.
Using the special dictionary returned by vl does create some structures that might be surprising. If you look at Out[21], you see that the list parameter, which was a Python list in the previous example, is now a dictionary with integer keys. This discrepancy makes no difference to SWAT. It automatically converts dictionaries with integer keys into Python lists.
Using Python’s ** operator to pass a dictionary as keyword arguments, we can hand the contents of our vl object to the echo action and get the same output as we did previously.
In [22]: out = conn.echo(**params)
NOTE: builtin.echo called with 1 parameters.
NOTE: parameter 1: list = {'item1', 'item2',
{key2 = {value2 = {0, 1, 1, 2, 3}},
key1 = 'value1'}}
In addition to constructing parameters, you can also tear them down using Python syntax. For example, the following code deletes the value2 key of the list[2].key2 parameter.
In [23]: del params.list[2].key2.value2
In [24]: params
Out[24]: {'list': {0: 'item1', 1: 'item2',
2: {'key1': 'value1', 'key2': {}}}}
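Both operations, unpacking with ** and deleting with del, are ordinary Python and can be tried without a CAS connection. In the sketch below, show_params is a hypothetical stand-in for an action call that simply returns the keyword arguments it receives, and a plain dictionary is used in place of the vl object.

```python
def show_params(**kwargs):
    """Hypothetical stand-in for an action call: return the keyword arguments."""
    return kwargs

params = {'list': ['item1', 'item2',
                   {'key1': 'value1', 'key2': {'value2': [0, 1, 1, 2, 3]}}]}

# Unpacking the dictionary is the same as writing the keyword by hand.
same = show_params(**params) == show_params(list=params['list'])
print(same)   # prints True

# Tear part of the structure down again with del.
del params['list'][2]['key2']['value2']
print(params)
```

With a plain dictionary, the deletion uses index notation throughout, whereas the vl object also accepts the attribute notation shown in In [23].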
With the ability to construct CAS action parameters under our belt, there are a couple of features of the CAS parameter processor that can make your life a bit easier. We look at those in the next section.
So far, we have constructed arguments using either the exact data types expected by the action or the arbitrary parameters in echo. However, the CAS action parameter processor on the server is flexible enough to allow passing in parameters of various types. If possible, those parameters are converted to the proper type before they are used by the action.
The easiest form of type casting to demonstrate is the conversion of strings to numeric values. If an action parameter takes a numeric value, but you pass in a string that contains a numeric representation as its content, the CAS action processor parses out the numeric value and sends it to the action. This behavior can be seen in the following action calls to history, which shows the action call history. The first call uses integers for first and last, but the second call uses strings. In either case, the result is the same due to the automatic conversion on the server side.
# Using integers
In [25]: out = conn.history(first=5, last=7)
NOTE: 5: action session.sessionname / name='py-session-1',
_apptag='UI', _messageLevel='error'; /* (SUCCESS) */
NOTE: 6: action builtins.echo...; /* (SUCCESS) */
NOTE: 7: action builtins.echo...; /* (SUCCESS) */
# Using strings as integer values
In [26]: out = conn.history(first='5', last='7')
NOTE: 5: action session.sessionname / name='py-session-1',
_apptag='UI', _messageLevel='error'; /* (SUCCESS) */
NOTE: 6: action builtins.echo...; /* (SUCCESS) */
NOTE: 7: action builtins.echo...; /* (SUCCESS) */
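The conversion happens on the server, so the following is only a client-side analogy of the idea, not the CAS parameter processor itself. The coerce_numeric function is a hypothetical helper that turns a string holding a numeric representation into the corresponding number and leaves other values alone.

```python
def coerce_numeric(value):
    """Hypothetical helper: convert a numeric string to an int or a float."""
    if isinstance(value, str):
        try:
            return int(value)
        except ValueError:
            return float(value)   # raises ValueError if the text is not numeric
    return value

# Integers and their string forms end up as the same values.
print(coerce_numeric(5) == coerce_numeric('5'))
print(coerce_numeric('7'), coerce_numeric('3.5'))
```

A string with no numeric content, such as 'seven', raises ValueError in this sketch, which is one reason to pass the correct type in the first place.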
Although the server can do some conversions between types, it is generally a good idea to use the correct type. There is another type of automatic conversion that adds syntactical enhancement to action calls. This is the conversion of a scalar-valued parameter to a dictionary value. This is described in the next section.
Many times when using an action parameter that requires a dictionary as an argument, you use only the first key in the dictionary to specify the parameter. For example, the history action takes a parameter called casout. This parameter specifies an output table to put the history information into. The specification for this parameter follows: (You can use conn.history? in IPython to see the parameter definition.)
casout : dict or CASTable, optional
specifies the settings for saving the action history to an
output table.
casout.name : string or CASTable, optional
specifies the name to associate with the table.
casout.caslib : string, optional
specifies the name of the caslib to use.
casout.timestamp : string, optional
specifies the timestamp to apply to the table. Specify
the value in the form that is appropriate for your
session locale.
casout.compress : boolean, optional
when set to True, data compression is applied to the table.
Default: False
casout.replace : boolean, optional
specifies whether to overwrite an existing table with the same
name.
Default: False
... truncated ...
The first key in the casout parameter is name and indicates the name of the CAS table to create. The complete way of specifying this parameter with only the name key follows:
In [27]: out = conn.history(casout=dict(name='hist'))
This is such a common idiom that the server lets you pass just the value of the first key (here, name) in place of the whole dictionary. In other words, rather than having to use dict to create a nested dictionary, you could simply do the following:
In [28]: out = conn.history(casout='hist')
Of course, if you need to use any other keys in the casout parameter, you must use the dict form. This conversion of a scalar value to a dictionary value is common when specifying input tables and variable lists of tables, which we see later on.
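The scalar-to-dictionary shortcut can be pictured as a small promotion step. The server performs this conversion, so the promote_to_dict function below is a hypothetical client-side sketch of the rule, with the name of the first key passed in explicitly.

```python
def promote_to_dict(value, first_key):
    """Hypothetical sketch: wrap a scalar in a dictionary under the first key."""
    if isinstance(value, dict):
        return value                 # already in the full form
    return {first_key: value}

# The short form and the full form describe the same casout parameter.
short_form = promote_to_dict('hist', 'name')
full_form = promote_to_dict(dict(name='hist'), 'name')
print(short_form == full_form)   # prints True

# Other keys are only available in the full form.
print(promote_to_dict(dict(name='hist', replace=True), 'name'))
```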
Now that we have spent some time on the input side of CAS actions, let’s look at the output side.
Up to now, all of our examples have stored the result of the action calls in a variable, but we have not done anything with the results yet. Let’s start by using our example of all of the CAS parameter types.
In [29]: out = conn.echo(
....: boolean_true = True,
....: boolean_false = False,
....: double = 3.14159,
....: int32 = 1776,
....: int64 = 2**60,
....: string = u'I like snowmen! \u2603',
....: list = [u'item1', u'item2', u'item3'],
....: dict = {'key1': 'value1',
....: 'key2': 'value2',
....: 'key3': 3}
....: )
Displaying the contents of the out variable gives:
In [30]: out
Out[30]:
[int32]
1776
[boolean_false]
False
[list]
['item1', 'item2', 'item3']
[boolean_true]
True
[int64]
1152921504606846976
[double]
3.14159
[string]
'I like snowmen! ☃'
[dict]
{'key1': 'value1', 'key2': 'value2', 'key3': 3}
+ Elapsed: 0.000494s, mem: 0.0546mb
The object that is held in the out variable is an instance of a Python class called CASResults. The CASResults class is a subclass of collections.OrderedDict, a dictionary-like object that preserves the order of the items in it. If you want only a plain Python dictionary, you can convert it as follows. Plain dictionaries are guaranteed to keep insertion order only in Python 3.7 and later, so on older versions you lose the ordering of the items.
In [31]: dict(out)
Out[31]:
{'boolean_false': False,
'boolean_true': True,
'dict': {'key1': 'value1', 'key2': 'value2', 'key3': 3},
'double': 3.14159,
'int32': 1776,
'int64': 1152921504606846976,
'list': ['item1', 'item2', 'item3'],
'string': 'I like snowmen! ☃'}
In either case, you can traverse and modify the result just as you could any other Python dictionary. For example, if you wanted to walk through the items and print each key and value explicitly, you could do the following:
In [32]: for key, value in out.items():
....: print(key)
....: print(value)
....: print('')
....:
int32
1776
boolean_false
False
list
['item1', 'item2', 'item3']
boolean_true
True
int64
1152921504606846976
double
3.14159
string
I like snowmen! ☃
dict
{'key1': 'value1', 'key3': 3, 'key2': 'value2'}
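Because CASResults is described as a subclass of collections.OrderedDict, the same operations can be tried on a plain OrderedDict without a server. The sketch below uses a few of the values from the echo example. The variable names are chosen for illustration only.

```python
from collections import OrderedDict

# A stand-in for an action result, with keys in the order they arrived.
out = OrderedDict([
    ('int32', 1776),
    ('boolean_false', False),
    ('list', ['item1', 'item2', 'item3']),
    ('double', 3.14159),
])

keys_in_order = list(out.keys())   # order of insertion is preserved
plain = dict(out)                  # conversion to a plain dictionary

# Results can be modified like any other dictionary.
out['extra'] = 'added later'
del out['boolean_false']

for key, value in out.items():
    print(key)
    print(value)
    print('')
```

The plain dictionary is a separate (shallow) copy, so adding or deleting keys on out afterward does not change it.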
Although the object that is returned by an action is always a CASResults object, the contents of that object depend completely on the purpose of that action. It could be as simple as key/value pairs of scalars or as complex as a nested structure of dictionaries, such as our parameters in the previous section. Actions that perform analytics typically return one or more DataFrames that contain the results.
Since the results objects are simply Python dictionaries, we assume that you are able to handle operations on them. But we will take a closer look at DataFrames in the next section.
The DataFrames that are returned by CAS actions are extensions of the DataFrames that are defined by the Pandas package. Largely, both work the same way. The only difference is that the DataFrames returned by CAS contain extra metadata that is found in typical SAS data sets. This metadata includes things such as SAS data format names, the SAS data type, and column and table labels.
One of the builtins actions that returns a DataFrame is help. This action returns a DataFrame that is filled with the names and descriptions of all the actions that are installed on the server. Each action set gets its own key in the result. Let’s look at some output from help.
The following code runs the help action, lists the keys in the CASResults object that is returned, verifies that it is a SASDataFrame object using Python’s type function, and displays the contents of the DataFrame (some output is reformatted slightly for readability):
In [33]: out = conn.help()
In [34]: list(out.keys())
Out[34]:
['accessControl',
'builtins',
'loadStreams',
'search',
'session',
'sessionProp',
'table',
'tutorial']
In [35]: type(out['builtins'])
Out[35]: swat.dataframe.SASDataFrame
In [36]: out['builtins']
Out[36]:
name description
0 addNode Adds a machine to the server
1 removeNode Remove one or more machines from the...
2 help Shows the parameters for an action o...
3 listNodes Shows the host names used by the server
4 loadActionSet Loads an action set for use in this ...
5 installActionSet Loads an action set in new sessions ...
6 log Shows and modifies logging levels
7 queryActionSet Shows whether an action set is loaded
8 queryName Checks whether a name is an action o...
9 reflect Shows detailed parameter information...
10 serverStatus Shows the status of the server
11 about Shows the status of the server
12 shutdown Shuts down the server
13 userInfo Shows the user information for your ...
14 actionSetInfo Shows the build information from loa...
15 history Shows the actions that were run in t...
16 casCommon Provides parameters that are common ...
17 ping Sends a single request to the server...
18 echo Prints the supplied parameters to th...
19 modifyQueue Modifies the action response queue s...
20 getLicenseInfo Shows the license information for a ...
21 refreshLicense Refresh SAS license information from...
22 httpAddress Shows the HTTP address for the serve...
We can store this DataFrame in another variable to make it a bit easier to work with. Much like Pandas DataFrames, CASResults objects enable you to access keys as attributes (as long as the name of the key doesn’t collide with an existing attribute or method). This means that we can access the builtins key of the out variable in either of the following ways:
In [37]: blt = out['builtins']
In [38]: blt = out.builtins
Which syntax you use depends on personal preference. The dot syntax is a bit cleaner, but the bracketed syntax works regardless of the key value (including white space, or name collisions with existing attributes). Typically, you might use the attribute-style syntax in interactive programming, but the bracketed syntax is better for production code.
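To see why the bracketed syntax is the safer choice, consider the following sketch of a dictionary with attribute-style access. The AttrDict class is a hypothetical stand-in that we define only for illustration; it is not how SWAT implements CASResults, but it shows the same kind of name collision.

```python
class AttrDict(dict):
    """Dictionary whose keys can also be read as attributes.

    Hypothetical stand-in for illustration; not the SWAT implementation.
    """

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal attribute lookup
        # fails, so existing methods such as keys() always win over a
        # key that has the same name.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


out = AttrDict()
out['builtins'] = 'builtins table'
out['keys'] = 'a key named keys'
out['my key'] = 'a key with a space'

print(out.builtins)        # same value as out['builtins']
print(callable(out.keys))  # True: this is the dict method, not the stored value
print(out['keys'])         # bracket syntax still reaches the stored value
print(out['my key'])       # cannot be written with dot syntax at all
```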
Now that we have a handle on the DataFrame, we can do typical DataFrame operations on it such as sorting and filtering. For example, to sort the builtins actions by the name column, you might do the following.
In [39]: blt.sort_values('name')
Out[39]:
name description
11 about Shows the status of the server
14 actionSetInfo Shows the build information from loa...
0 addNode Adds a machine to the server
16 casCommon Provides parameters that are common ...
18 echo Prints the supplied parameters to th...
20 getLicenseInfo Shows the license information for a ...
2 help Shows the parameters for an action o...
15 history Shows the actions that were run in t...
22 httpAddress Shows the HTTP address for the serve...
5 installActionSet Loads an action set in new sessions ...
3 listNodes Shows the host names used by the server
4 loadActionSet Loads an action set for use in this ...
6 log Shows and modifies logging levels
19 modifyQueue Modifies the action response queue s...
17 ping Sends a single request to the server...
7 queryActionSet Shows whether an action set is loaded
8 queryName Checks whether a name is an action o...
9 reflect Shows detailed parameter information...
21 refreshLicense Refresh SAS license information from...
1 removeNode Remove one or more machines from the...
10 serverStatus Shows the status of the server
12 shutdown Shuts down the server
13 userInfo Shows the user information for your ...
If we want to combine all of the output DataFrames into one DataFrame, we can use the concat function in the Pandas package. We use the values method of the CASResults object to get all of the values in the dictionary (which are DataFrames, in this case). Then we concatenate them using concat with the ignore_index=True option so that it creates a new unique index for each row.
In [40]: import pandas as pd
In [41]: pd.concat(out.values(), ignore_index=True)
Out[41]:
name description
0 assumeRole Assumes a role
1 dropRole Relinquishes a role
2 showRolesIn Shows the currently active role
3 showRolesAllowed Shows the roles that a user is a mem...
4 isInRole Shows whether a role is assu...
137 queryCaslib Checks whether a caslib exists
138 partition Partitions a table
139 recordCount Shows the number of rows in a Cloud ...
140 loadDataSource Loads one or more data source interf...
141 update Updates rows in a table
[142 rows x 2 columns]
In addition to result values, the CASResults object also contains information about the return status of the action. We look at that in the next section.
In a perfect world, we always get the parameters to CAS actions correct and the results that we want are always returned. However, in the real world, errors occur. There are several attributes on the CASResults object that can tell you the return status information of a CAS action. They are described in the following table:
Attribute Name | Description |
severity | An integer value that indicates the severity of the return status. A zero status means that the action ran without errors or warnings. A value of 1 means that warnings were generated. A value of 2 means that errors were generated. |
reason | A string value that indicates the class of error that occurred. Reason codes are described in the “Details” section later in this chapter. |
status | A text message that describes the error that occurred. |
status_code | An encoded integer that contains information that can be used by Technical Support to help determine the cause of the error. |
In addition to the attributes previously described, the messages attribute contains any messages that were printed during the execution of the action. While you likely saw the messages as they were being printed by the action, it can sometimes be useful to have them accessible on the CASResults object for programmatic inspection. Let’s use the help action for help on an action that does exist and also an action that doesn’t exist to see the status information attributes in action.
The first example, as follows, asks for help on an existing action. The returned status attributes are all zeros for numeric values and None for reason and status. The messages attribute contains a list of all messages that are printed by the server.
In [42]: out = conn.help(action='help')
NOTE: Information for action 'builtins.help':
NOTE: The following parameters are accepted.
Default values are shown.
NOTE: string action=NULL,
NOTE: specifies the name of the action for which you want help.
The name can be in the form 'actionSetName.actionName'
or just 'actionName'.
NOTE: string actionSet=NULL,
NOTE: specifies the name of the action set for which you
want help. This parameter is ignored if the action
parameter is specified.
NOTE: boolean verbose=true
NOTE: when set to True, provides more detail for each parameter.
In [43]: print(out.status)
None
In [44]: out.status_code
Out[44]: 0
In [45]: print(out.reason)
None
In [46]: out.severity
Out[46]: 0
In [47]: out.messages
Out[47]:
["NOTE: Information for action 'builtins.help':",
'NOTE: The following parameters are accepted. Default values are shown.',
'NOTE: string action=NULL,',
"NOTE: specifies the name of the action for which you want help. The name can be in the form 'actionSetName.actionName' or just 'actionName'.",
'NOTE: string actionSet=NULL,',
'NOTE: specifies the name of the action set for which you want help. This parameter is ignored if the action parameter is specified.',
'NOTE: boolean verbose=true',
'NOTE: when set to True, provides more detail for each parameter.']
Now let’s ask for help on a nonexistent action.
In [48]: out = conn.help(action='nonexistent')
ERROR: Action 'nonexistent' was not found.
ERROR: The action stopped due to errors.
In [49]: out.status
Out[49]: 'The specified action was not found.'
In [50]: out.status_code
Out[50]: 2720406
In [51]: out.reason
Out[51]: 'abort'
In [52]: out.severity
Out[52]: 2
In [53]: out.messages
Out[53]:
["ERROR: Action 'nonexistent' was not found.",
'ERROR: The action stopped due to errors.']
In this case, all of the attributes contain information about the error that was generated. You can use this information from the CASResults object to capture and handle errors more gracefully in your programs.
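One way to act on these attributes is a small helper that inspects the severity and raises a regular Python exception when it is too high. The following is only a sketch: check_result is a name that we made up, and the two SimpleNamespace objects are stand-ins that carry the same attribute values as the two help calls shown previously, so the code runs without a CAS server.

```python
from types import SimpleNamespace


def check_result(result, fail_on=2):
    """Raise RuntimeError if result.severity is at or above fail_on."""
    if result.severity >= fail_on:
        raise RuntimeError('{} (reason={}, status_code={})'.format(
            result.status, result.reason, result.status_code))
    return result


# Stand-ins for CASResults objects (assumptions for illustration).
ok = SimpleNamespace(severity=0, reason=None, status=None, status_code=0)
failed = SimpleNamespace(severity=2, reason='abort',
                         status='The specified action was not found.',
                         status_code=2720406)

check_result(ok)              # passes silently
try:
    check_result(failed)      # severity 2 triggers the exception
except RuntimeError as err:
    print(err)
```

Passing fail_on=1 would make the helper treat warnings as failures as well.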
If you prefer to use exceptions rather than status codes, you can set the cas.exception_on_severity option to 1 to raise exceptions on warnings, or you can set the option to 2 to raise exceptions on errors. The options system is covered in detail later in this chapter.
In [54]: swat.set_option('cas.exception_on_severity', 2)
In [55]: try:
....: out = conn.help(action='nonexistent')
....: except swat.SWATCASActionError as err:
....: print(err.response)
....: print('')
....: print(err.connection)
....: print('')
....: # Since this action call fails before producing
....: # results, this will be empty. In actions that
....: # fail partway through, this may contain results
....: # up to the point of failure.
....: print(err.results)
....:
ERROR: Action 'nonexistent' was not found.
ERROR: The action stopped due to errors.
CASResponse(messages=[],
disposition=CASDisposition(
debug=0x88bfc196:TKCASA_GEN_ACTION_NOT_FOUND, reason=abort,
severity=2, status=The specified action was not found.,
status_code=2720406), performance=CASPerformance(cpu_system_time=0.0, cpu_user_time=0.0,
data_movement_bytes=0, data_movement_time=0.0,
elapsed_time=0.000279, memory=50080, memory_os=8441856,
memory_quota=12111872, system_cores=32, system_nodes=1,
system_total_memory=202931654656))
CAS('server-name.mycompany.com', 5570, 'username',
protocol='cas', name='py-session-1',
session='292319d5-151f-f241-b27c-c3b6a93c1814')
As you can see, working with results from CAS actions is the same as the workflow with any other Python framework. You connect to a CAS host, run a CAS action (either using keyword arguments, building the parameters ahead of time, or using a mixture of methods), check the return status, and process the dict-like CASResults object that is returned.
Now that we understand the basics of the workflow, let’s look at how to add additional action sets and actions to your CAS session.
In the previous sections, we have already seen that a CAS session has access to multiple action sets that each contain multiple actions. However, all of the action sets we have seen so far have been installed automatically when we connect to CAS. We haven’t shown how to load additional action sets in order to do additional operations such as advanced analytics, machine learning, streaming data analysis, and so on.
In order to load new action sets, we must first see what action sets are available on our server. We can use the actionsetinfo action to do that. We are going to use the all=True option to see all of the action sets that are installed on the server rather than only the ones that are currently loaded.
# Run the actionsetinfo action.
In [56]: asinfo = conn.actionsetinfo(all=True)
# Filter the DataFrame to contain only action sets that
# have not been loaded yet.
In [57]: asinfo = asinfo.setinfo[asinfo.setinfo.loaded == 0]
# Create a new DataFrame with only columns between
# actionset and label.
In [58]: asinfo = asinfo.loc[:, 'actionset':'label']
In [59]: asinfo
Out[59]:
Action set information
actionset label
0 access
1 aggregation
2 astore
3 autotune
4 boolRule
5 cardinality
6 clustering
7 decisionTree
… … …
41 svm
42 textMining
43 textParse
44 transpose
45 varReduce
46 casfors Simple forecast service
47 tkcsestst Session Tests
48 cmpcas
49 tkovrd Forecast override
50 qlimreg QLIMREG CAS Action Library
51 panel Panel Data
52 mdchoice MDCHOICE CAS Action Library
53 copula CAS Copula Simulation Action Library
54 optimization Optimization
55 localsearch Local Search Optimization
[56 rows x 2 columns]
Depending on your installation and licensing, the list varies from system to system. One very useful action set that should be automatically available on all systems is the simple action set. This action set contains actions for simple statistics such as summary statistics (max, min, mean, and so on), histograms, correlations, and frequencies. To load an action set, use the loadactionset action:
In [60]: conn.loadactionset('simple')
NOTE: Added action set 'simple'.
Out[60]:
[actionset]
'simple'
+ Elapsed: 0.0175s, user: 0.017s, mem: 0.255mb
As you can see, this action returns a CASResults object as described in the previous section. It contains a single key called actionset that contains the name of the action set that was loaded. Typically, you do not need this return value, but it can be used to verify that the action set has been loaded. If you attempt to load an action set that cannot be loaded for some reason (such as incorrect name, no license, or no authorization), the CASResults object is empty.
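If your program depends on an action set, you can turn that verification into a small helper. The following sketch relies only on the behavior described above: the result contains an actionset key on success and is empty on failure. The load_actionset function and the FakeConnection class are hypothetical names that we use for illustration; the fake connection exists only so that the example runs without a CAS server.

```python
def load_actionset(conn, name):
    """Load an action set and raise if the server did not confirm it."""
    out = conn.loadactionset(name)
    if 'actionset' not in out:
        raise RuntimeError('Could not load action set {!r}'.format(name))
    return out['actionset']


class FakeConnection(object):
    """Hypothetical stand-in for a CAS connection, for illustration only."""

    available = {'simple'}

    def loadactionset(self, name):
        # Mimic the behavior described in the text: one key on success,
        # an empty result on failure.
        return {'actionset': name} if name in self.available else {}


fake_conn = FakeConnection()
print(load_actionset(fake_conn, 'simple'))

try:
    load_actionset(fake_conn, 'notinstalled')
except RuntimeError as err:
    print(err)
```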
Now that we have loaded the simple action set, we can get help on it using the usual Python methods.
In [61]: conn.simple?
Type: Simple
String form: <swat.cas.actions.Simple object at 0x7f3cdf7c07f0>
File: swat/cas/actions.py
Docstring:
Analytics
Actions
-------
simple.correlation : Generates a matrix of Pearson product-moment
correlation coefficients
simple.crosstab : Performs one-way or two-way tabulations
simple.distinct : Computes the distinct number of values of the
variables in the variable list
simple.freq : Generates a frequency distribution for one or
more variables
simple.groupby : Builds BY groups in terms of the variable value
combinations given the variables in the variable
list
simple.mdsummary : Calculates multidimensional summaries of numeric
variables
simple.numrows : Shows the number of rows in a Cloud Analytic
Services table
simple.paracoord : Generates a parallel coordinates plot of the
variables in the variable list
simple.regression : Performs a linear regression up to 3rd-order
polynomials
simple.summary : Generates descriptive statistics of numeric
variables such as the sample mean, sample
variance, sample size, sum of squares, and so on
simple.topk : Returns the top-K and bottom-K distinct values of
each variable included in the variable list based
on a user-specified ranking order
Once an action set has been loaded, it cannot be unloaded. The overhead for keeping an action set loaded is minimal, so this issue doesn’t make a significant difference.
That is really all there is to loading action sets. We still do not have data in our system, so we cannot use any of the simple statistics actions yet. Let’s review some final details about options and dealing with errors in the next section. The following chapter then gets into the ways of loading data and using the analytical actions on those data sets.
We have covered the overall workings of connecting to a CAS host, running CAS actions, working with the results of CAS actions, and loading CAS action sets. However, there are some details that we haven’t covered. Although these items aren’t necessary for using SWAT and CAS, they can be quite useful to have in your tool belt.
Even though we have already covered the methods for getting help from CAS, it is an important topic to recap. Every object in the SWAT package uses the standard Python method of surfacing documentation. This includes the help function in Python (for example, help(swat.CAS)), the ? suffix operator in IPython and Jupyter (for example, swat.CAS?), and any other tool that uses Python’s docstrings.
In addition, action sets and actions that are loaded dynamically also support the same Python, IPython, and Jupyter Help system hooks (for example, conn.summary?).
Keep in mind that tab completion on the CAS objects and other objects in the SWAT package can be a quick reminder of the attributes and methods of that object.
These Help system hooks should be sufficient to help you get information about any objects in the SWAT package, CAS action sets, and CAS actions and their parameters. If more detailed information is needed, it is available in the official SAS Viya documentation on the SAS website.
The issue of CAS action errors was discussed to some extent previously in the chapter. There are two methods for dealing with CAS action errors: return codes and exceptions. The default behavior is to surface return codes, but SWAT can be configured to raise exceptions. In the case of return codes, they are available in the severity attribute of the CASResults object that is returned by CAS action methods. The possible values are shown in the following table:
Severity | Description |
0 | An action was executed with no warnings or errors. |
1 | Warnings were generated. |
2 | An error occurred. |
In addition to the severity attribute, the CASResults object has an attribute named reason, which is a string that contains the general reason for the warning or error. The possible reasons are shown in the following table:
Reason | Description |
ok | The action was executed with no warnings or errors. |
abort | The action was aborted. |
authentication | The action could not authenticate user credentials. |
authorization | The action was unable to access a resource due to permissions settings. |
exception | An exception occurred during the execution of an action. |
expired-token | An authentication token expired. |
io | An input/output error occurred. |
memory | Out of memory. |
network | Networking failure. |
session-retry | An action restarted and results already returned should be ignored. |
unknown | The reason is unknown. |
The last two attributes to note are status and status_code. The status attribute of CASResults contains a human-readable formatted message that describes the action result. The status_code is a numeric code that can be supplied to Technical Support if further assistance is required. The code contains information that might be useful to Technical Support for determining the source of the problem.
As mentioned previously, it is possible for SWAT to raise an exception when an error or warning occurs in a CAS action. This is enabled by setting an option in the SWAT package. We haven’t covered SWAT options yet; they are covered in the next section of this chapter. However, the simplest way to enable exceptions is to submit the following code:
In [62]: swat.options.cas.exception_on_severity = 2
This causes SWAT to throw a swat.SWATCASActionError exception when the severity of the response is 2. For exceptions to be raised on warnings and errors, you set the value of the option to 1. Setting it to None disables this feature.
The swat.SWATCASActionError exception contains several attributes that contain information about the context of the exception when it was raised. The attributes are described in the following table:
Attribute Name | Description |
message | The formatted status message from the CAS action. |
response | The CASResponse object that contains the final response from the CAS host. |
connection | The CAS connection object that the action was running in. |
results | The compiled result up to the point of the exception. |
The message attribute is simply the same value as the status attribute of the response. The response attribute is an object that we haven’t discussed yet: CASResponse. This object isn’t seen when using the CAS action methods on a CAS object. It is used behind the scenes when compiling the results of a CAS action, and then it is discarded. It is possible to use more advanced methods of traversing the responses from CAS where you deal with CASResponse objects directly, but that is not discussed until much later in this book. For now, it is sufficient to know that the CASResponse object has several attributes, including one named disposition, which contains the same result code fields that the CASResults object also contains.
The connection attribute of swat.SWATCASActionError contains the CAS connection object that executed the action. And finally, the results attribute contains the results that have been compiled up to that point. Normally, this is a CASResults object, but there are options on the CAS action methods that we haven’t yet discussed that cause other data values to be inserted into that attribute.
Catching a swat.SWATCASActionError is just like catching any other Python exception. You use a try/except block, where the except statement specifies swat.SWATCASActionError (or any of its parent classes). In the following code, we try to get help on a nonexistent action. The action call is wrapped in a try/except block in which the except statement captures the exception as the variable err. In the except section, the message attribute of the exception is printed. Of course, you can use any of the other fields to handle the exception in any way you prefer.
In [63]: try:
...: out = conn.help(action='nonexistent')
...: except swat.SWATCASActionError as err:
...: print(err.message)
...:
ERROR: Action 'nonexistent' was not found.
ERROR: The action stopped due to errors.
The specified action was not found.
In addition to CAS action errors, there are other types of errors that you might run into while working with CAS. Let’s look at how to resolve CAS action parameter problems in the next section. But first, let’s reset the exception option back to the default value.
In [64]: swat.reset_option('cas.exception_on_severity')
CAS action parameter problems can come in many forms: invalid parameter names, invalid parameter values, incorrect nesting of parameter lists, and so on. It can sometimes be difficult to immediately identify the problem. We provide you with some tips in this section that hopefully simplify your correction of CAS action parameter errors.
Let’s start with an action call that creates a parameter error. We haven’t covered the actions being used in this example, but you don’t need to know what they do in order to see what the error is and how to fix it.
In [65]: out = conn.summary(table=dict(name='test',
....: groupby=['var1', 'var2', 3]))
ERROR: An attempt was made to convert parameter 'table.groupby[2]' from int64 to parameter list, but the conversion failed.
ERROR: The action stopped due to errors.
In the preceding action call, we see that we get an error concerning table.groupby[2]. When you start building parameter structures that are deeply nested, it can be difficult to see exactly what element the message refers to. One of the best tools to track down the error is the cas.trace_actions option. We haven’t reached the section on setting SWAT options, but you can simply submit the following code in order to enable this option:
In [66]: swat.set_option('cas.trace_actions', True)
With this option enabled, we see all of the actions and action parameters printed out in a form that matches the error message from the server. Let’s run the summary action from In[65] again.
In [67]: out = conn.summary(table=dict(name='test',
....: groupby=['var1', 'var2', 3]))
[simple.summary]
table.groupby[0] = "var1" (string)
table.groupby[1] = "var2" (string)
table.groupby[2] = 3 (int64)
table.name = "test" (string)
ERROR: An attempt was made to convert parameter 'table.groupby[2]' from int64 to parameter list, but the conversion failed.
ERROR: The action stopped due to errors.
This time we can see from the printed output that table.groupby[2] is the value 3. According to the definition of the summary action, those values must be strings, so that is the source of the error. We can now go into our action call and change the 3 to the proper value.
If you still do not see the problem, it might be a good idea to separate the parameter construction from the action call as we saw in the section on specifying action parameters. Let’s build the action parameters, one at a time, including the erroneous value.
In [68]: params = swat.vl()
In [69]: params.table.name = 'test'
In [70]: params.table.groupby[0] = 'var1'
In [71]: params.table.groupby[1] = 'var2'
In [72]: params.table.groupby[2] = 3
In [73]: out = conn.summary(**params)
[simple.summary]
table.groupby[0] = "var1" (string)
table.groupby[1] = "var2" (string)
table.groupby[2] = 3 (int64)
table.name = "test" (string)
ERROR: An attempt was made to convert parameter 'table.groupby[2]' from int64 to parameter list, but the conversion failed.
ERROR: The action stopped due to errors.
Of course, in this case the error is pretty obvious since we entered it in a line by itself. But you might have parameters that are built in programmatic ways, where the problem is not so obvious. Now our parameters are held in an object that has a syntax that maps perfectly to the output that is created by the cas.trace_actions option as well as the error message from the server. When we see this error message, we can simply display the element directly from the params variable to see what the problem is and correct it.
In [74]: params.table.groupby[2]
Out[74]: 3
In [75]: params.table.groupby[2] = 'var3'
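When parameters are built programmatically, you can also catch this kind of problem before the action is ever submitted. The following sketch checks a list of values and reports every element that is not a string, using the same path notation as the server's error message. The non_string_items function is our own illustration, and a plain dictionary stands in for the parameter object.

```python
def non_string_items(values, path):
    """Return (path, value) for each list element that is not a string."""
    return [('{}[{}]'.format(path, index), value)
            for index, value in enumerate(values)
            if not isinstance(value, str)]


# Plain-dict version of the parameters that caused the error.
params = {'table': {'name': 'test', 'groupby': ['var1', 'var2', 3]}}

bad = non_string_items(params['table']['groupby'], 'table.groupby')
print(bad)
```

The single entry that is reported, table.groupby[2] with the value 3, is the same element that the server complained about.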
Now let’s move on to the remaining class of errors.
All of the other errors that you encounter in SWAT are raised as swat.SWATError exceptions. Reasons for these errors include the inability to connect to the CAS host, out-of-memory errors, and any other errors that can occur in a networking environment. These can all be handled in the standard Python way of using try/except blocks in your code to capture them.
As we have seen more than once in this chapter, the cas.exception_on_severity option is one way of changing the behavior of the SWAT package. But there are many others. The options system in SWAT is modeled after the options system in the Pandas package. Most of the function names and behaviors are the same between the two. The primary functions that are used to get, to set, and to query options are shown in the following table.
Function Name | Description |
describe_option | Prints the description of one or more options. |
get_option | Gets the current value of an option. |
set_option | Sets the value of one or more options. |
reset_option | Resets the value of one or more options back to the default. |
option_context | Creates a context manager that enables you to set options temporarily in a particular context. |
The first thing you might want to do is run swat.describe_option with no arguments. This prints out a description of all of the available options. Printing the description of all options can be rather lengthy. Only a portion is displayed here:
In [76]: swat.describe_option()
cas.dataset.auto_castable : boolean
Should a column of CASTable objects be automatically
created if a CASLib and CAS table name are columns in the data?
NOTE: This applies to all except the 'tuples' format.
[default: True] [currently: True]
cas.dataset.bygroup_as_index : boolean
If True, any by group columns are set as the DataFrame index.
[default: True] [currently: True]
cas.dataset.bygroup_collision_suffix : string
Suffix to use on the By group column name when a By group column
is also included as a data column.
[default: _by] [currently: _by]
... truncated ...
As you can see, the option information is formatted very much like options in Pandas. The description includes the full name of the option, the expected values or data type, a short description of the option, the default value, and the current value.
Since we have already used the cas.exception_on_severity option, let’s look at its description using describe_option, as follows:
In [77]: swat.describe_option('cas.exception_on_severity')
cas.exception_on_severity : int or None
Indicates the CAS action severity level at which an exception
should be raised. None means that no exception should be raised.
1 would raise exceptions on warnings. 2 would raise exceptions
on errors.
[default: None] [currently: None]
As you can see from the description, by default, this option is set to None, which means that exceptions are never thrown. The current value is also None because we reset the option earlier. We can also get the current value of the option by using swat.get_option as follows:
In [78]: print(swat.get_option('cas.exception_on_severity'))
None
Setting options is done using the swat.set_option function. This function accepts parameters in multiple forms. The most explicit way to set an option is to pass in the name of the option as a string followed by the value of the option in the next argument. We have seen this already when we set the cas.exception_on_severity option.
In [79]: swat.set_option('cas.exception_on_severity', 2)
Another form of setting options works only if the last segment of the option name (for example, exception_on_severity for cas.exception_on_severity) is unique among all of the options. If so, then you can use that name as a keyword argument to swat.set_option. The following code is equivalent to the last example:
In [80]: swat.set_option(exception_on_severity=2)
In either of the forms, it is possible to set multiple options with one call to swat.set_option. In the first form, you simply continue to add option names and values as consecutive arguments. In the second form, you add additional keyword arguments. Also, you can mix both. The only caveat is that if you mix them, you must put the positional arguments first just like with any Python function.
In [81]: swat.set_option('cas.dataset.index_name', 'Variable',
....: 'cas.dataset.format', 'dataframe',
....: exception_on_severity=2,
....: print_messages=False)
The next function, swat.reset_option, resets options back to their default value:
In [82]: swat.reset_option('cas.exception_on_severity')
In [83]: print(swat.get_option('cas.exception_on_severity'))
None
Note that we used the print function in the preceding code since IPython does not display a None value as a result. Nonetheless, you can see that the value of cas.exception_on_severity was set back to the default of None.
Just as with swat.describe_option, you can specify multiple names of options to reset. In addition, executing swat.reset_option with no arguments resets all of the option values back to their default values.
The final option function is swat.option_context. Again, this works just like its counterpart in Pandas. It enables you to specify option settings for a particular context in Python using the with statement. For example, if we wanted to turn on CAS action tracing for a block of code, the swat.option_context in conjunction with the Python with statement sets up an environment where the options are set at the beginning of the context and are reset back to their previous values at the end of the context. Let’s see this in action using the cas.trace_actions option:
In [84]: swat.reset_option('trace_actions')
In [85]: swat.get_option('trace_actions')
Out[85]: False
In [86]: with swat.option_context('cas.trace_actions', True):
....: print(swat.get_option('cas.trace_actions'))
....:
True
In [87]: swat.get_option('cas.trace_actions')
Out[87]: False
As you can see in the preceding example, cas.trace_actions was False before the with context was run. It was True inside the with context, and afterward it went back to False. The swat.option_context arguments work the same way as swat.set_option. So you can specify as many options for the context as you prefer, and they can even be nested in multiple with contexts.
As we have seen with swat.set_option and swat.option_context, you can use keyword arguments if the last segment of the option name is unique among all the options. The same shortening works for option names that are passed as positional string arguments. In addition, in all of the functions, you can specify any number of the trailing segments as long as they match only one option name. For example, all of the following lines of code are equivalent:
In [88]: swat.set_option('cas.dataset.max_rows_fetched', 500)
In [89]: swat.set_option('dataset.max_rows_fetched', 500)
In [90]: swat.set_option('max_rows_fetched', 500)
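One way to picture the trailing-segment rule is as a lookup that compares whole dot-separated segments from the right and accepts a name only when exactly one option matches. The sketch below is an illustration of that rule in plain Python, not SWAT's own code. The resolve_option function is hypothetical, and OPTION_NAMES is just a subset of the option names shown in this chapter.

```python
# A subset of the option names shown in this chapter (illustration only).
OPTION_NAMES = [
    'cas.dataset.max_rows_fetched',
    'cas.dataset.format',
    'cas.dataset.index_name',
    'cas.trace_actions',
    'cas.trace_ui_actions',
    'cas.hostname',
]


def resolve_option(name, known=OPTION_NAMES):
    """Return the single full option name whose trailing segments equal name."""
    wanted = name.split('.')
    # Compare whole segments from the right, not arbitrary substrings.
    matches = [full for full in known
               if full.split('.')[-len(wanted):] == wanted]
    if len(matches) != 1:
        raise KeyError('%r matches %d options' % (name, len(matches)))
    return matches[0]


# All three spellings resolve to the same full option name.
full_names = [resolve_option(n) for n in ('cas.dataset.max_rows_fetched',
                                          'dataset.max_rows_fetched',
                                          'max_rows_fetched')]
```

In this sketch, a partial segment such as 'actions' matches nothing and raises a KeyError, because 'actions' is not a complete segment of any option name.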
The swat.describe_option function also works with patterns that match the beginning of multiple option names. This means that you can display all cas.dataset options by just giving the cas.dataset prefix as an argument to swat.describe_option.
In [91]: swat.describe_option('cas.dataset')
cas.dataset.max_rows_fetched : int
The maximum number of rows to fetch with methods that use
the table.fetch action in the background (i.e. the head, tail,
values, etc. of CASTable).
[default: 3000] [currently: 500]
cas.dataset.auto_castable : boolean
Should a column of CASTable objects be automatically
created if a CASLib and CAS table name are columns in the data?
NOTE: This applies to all except the 'tuples' format.
[default: True] [currently: True]
This same technique also works to reset a group of options using swat.reset_option. In either case, you must specify full segment names (for example, cas.dataset), and not just any substring (for example, cas.data).
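The difference between a full segment prefix and an arbitrary substring can be made concrete with a small sketch. This is plain Python written for illustration, not SWAT's implementation. The options_under function is hypothetical, and the names list is a subset of the options shown in this chapter.

```python
def options_under(prefix, known):
    """Return the option names whose leading segments equal the given prefix."""
    wanted = prefix.split('.')
    # Split on dots so that only complete segments can match.
    return [full for full in known
            if full.split('.')[:len(wanted)] == wanted]


names = [
    'cas.dataset.auto_castable',
    'cas.dataset.max_rows_fetched',
    'cas.trace_actions',
    'interactive_mode',
]

dataset_options = options_under('cas.dataset', names)  # full segments match
no_options = options_under('cas.data', names)          # partial segment does not
```

Here dataset_options holds the two cas.dataset names, while no_options is empty because 'data' is only part of the segment 'dataset'.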
In addition to the option functions in the SWAT package, there is an object-based interface as well: swat.options. This method of setting options is just like using the Pandas options object.
Much like using describe_option without any arguments, you can also display all of the option descriptions through Python’s Help system with swat.options?.
In [92]: swat.options?
Type: AttrOption
String form: <swat.utils.config.AttrOption object at 0x269a0d0>
File: swat/utils/config.py
Definition: swat.options(self, *args, **kwargs)
Docstring
cas.dataset.auto_castable : boolean
Should a column of CASTable objects be automatically
created if a CASLib and CAS table name are columns in the data?
NOTE: This applies to all except the 'tuples' format.
[default: True] [currently: True]
cas.dataset.bygroup_as_index : boolean
If True, any by group columns are set as the DataFrame index.
[default: True] [currently: True]
cas.dataset.bygroup_collision_suffix : string
Suffix to use on the By group column name when a By group column
is also included as a data column.
[default: _by] [currently: _by]
... truncated ...
Tab completion can also be used to see what options are available.
In [93]: swat.options.<tab>
swat.options.cas.dataset.auto_castable
swat.options.cas.dataset.bygroup_as_index
swat.options.cas.dataset.bygroup_collision_suffix
swat.options.cas.dataset.bygroup_columns
swat.options.cas.dataset.bygroup_formatted_suffix
swat.options.cas.dataset.drop_index_name
swat.options.cas.dataset.format
swat.options.cas.dataset.index_adjustment
swat.options.cas.dataset.index_name
swat.options.cas.dataset.max_rows_fetched
swat.options.cas.exception_on_severity
swat.options.cas.hostname
swat.options.cas.port
swat.options.cas.print_messages
swat.options.cas.protocol
swat.options.cas.trace_actions
swat.options.cas.trace_ui_actions
swat.options.encoding_errors
swat.options.interactive_mode
swat.options.tkpath
Getting the value of an option is as simple as entering the full name of an option as displayed in the tab-completed output.
In [94]: swat.options.cas.trace_actions
Out[94]: False
You set options as you would set any other Python variable.
In [95]: swat.options.cas.trace_actions = True
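To see how attribute access can map onto dotted option names, consider the following sketch. It is an illustration in plain Python under stated assumptions, not the way swat.options is actually implemented: the OptionNamespace class and the store dictionary are hypothetical, and the sketch handles only full option names.

```python
class OptionNamespace:
    """Hypothetical attribute-style view over a dict of dotted option names."""

    def __init__(self, store, prefix=''):
        # Bypass our own __setattr__ while storing internal state.
        object.__setattr__(self, '_store', store)
        object.__setattr__(self, '_prefix', prefix)

    def __getattr__(self, name):
        full = self._prefix + name
        if full in self._store:
            return self._store[full]          # a complete option name
        if any(key.startswith(full + '.') for key in self._store):
            # An intermediate segment such as 'cas': return a nested view.
            return OptionNamespace(self._store, full + '.')
        raise AttributeError(name)

    def __setattr__(self, name, value):
        full = self._prefix + name
        if full not in self._store:
            raise AttributeError(name)
        self._store[full] = value


store = {'cas.trace_actions': False, 'cas.dataset.max_rows_fetched': 3000}
options = OptionNamespace(store)

before = options.cas.trace_actions
options.cas.trace_actions = True      # assignment writes through to the store
after = options.cas.trace_actions
```

Each attribute lookup consumes one segment of the option name, so the chain of attributes reads exactly like the dotted name.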
Just as with set_option and get_option, if the final segment of the option name is unique among all options, you can shorten your swat.options call to include only that segment. This is shown below by eliminating the cas portion of the option name.
In [96]: swat.options.trace_actions = False
The swat.options object also defines a callable interface that returns a context manager like option_context. The following is equivalent to using with swat.option_context(…).
In [97]: with swat.options(trace_actions=True):
....: out = conn.help(action='loadactionset')
....:
[builtins.help]
action = "loadactionset" (string)
The interface that you use is purely a personal preference. The only thing that the swat.options interface is missing is a way to reset options. You must fall back to reset_option to do that.
In addition to the SWAT options, CAS server sessions also have options. Let’s look at those in the next section.
The session on the CAS host has options that can be set to change certain behaviors for the current session. These options are set using the setsessopt action. You can view them and get current values using listsessopt and getsessopt. The best way to see all of the options that are available is to use Python’s Help system on the setsessopt action.
In [98]: conn.setsessopt?
...
Docstring
Sets a session option
Parameters
----------
actionmaxtime : int64, optional
specifies the maximum action time.
Default: -1
Note: Value range is -1 <= n <= 86400
apptag : string, optional
specifies the string to prefix to log messages.
Default:
caslib : string, optional
specifies the caslib name to set as the active caslib.
Default:
collate : string, optional
specifies the collating sequence for sorting.
Default: UCA
Values: MVA, UCA
fmtcaslib : string, optional
specifies the caslib where persisted format libraries are retained.
Default: FORMATS
fmtsearch : string, optional
specifies the format library search order.
Default:
fmtsearchposition : string, optional
specifies the position in the format library list where additions
are made.
Default: APPEND
Values: APPEND, CLEAR, INSERT, REPLACE
locale : string, optional
specifies the locale to use for sorting and formatting.
Default: en_US
logflushtime : int64, optional
specifies the log flush time, in milliseconds. A value of -1
indicates to flush logs after each action completes. A value of 0
indicates to flush logs as they are produced.
Default: 100
Note: Value range is -1 <= n <= 86400
maxtablemem : int64, optional
specifies the maximum amount of physical memory, in bytes, to
allocate for a table. After this threshold is reached, the server
uses temporary files and operating system facilities for memory
management.
Default: 16777216
metrics : boolean, optional
when set to True, action metrics are displayed.
Default: False
... truncated ...
There are options for setting the session locale, collation order, time-outs, memory limits, and so on. The metrics option is simple to demonstrate. Let’s get its current value using getsessopt:
In [99]: out = conn.getsessopt('metrics')
In [100]: out
Out[100]:
[metrics]
0
+ Elapsed: 0.000365s, mem: 0.0626mb
The output is our usual CASResults object with a key that matches the requested option name. In this case, the metrics option is returned as an integer value of zero (corresponding to a Boolean false). You can get the actual value of the metrics option by accessing that key of the CASResults object.
In [101]: out.metrics
Out[101]: 0
Setting the values of options is done using setsessopt with keyword arguments for the option names. You can specify as many options in setsessopt as you need.
In [102]: conn.setsessopt(metrics=True, collate='MVA')
NOTE: Executing action 'sessionProp.setSessOpt'.
NOTE: Action 'sessionprop.setsessopt' used (Total process time):
NOTE: real time 0.000370 seconds
NOTE: cpu time 0.000000 seconds (0.00%)
NOTE: total nodes 1 (32 cores)
NOTE: total memory 188.99G
NOTE: memory 98.19K (0.00%)
Out[102]: + Elapsed: 0.000334s, mem: 0.0959mb
Notice that the metrics option takes effect immediately. Performance metrics for the action are now printed to the output. Checking the value of collate, you see that it has been set to MVA.
In [103]: conn.getsessopt('collate').collate
NOTE: Executing action 'sessionProp.getSessOpt'.
NOTE: Action 'sessionprop.getsessopt' used (Total process time):
NOTE: real time 0.000302 seconds
NOTE: cpu time 0.000000 seconds (0.00%)
NOTE: total nodes 1 (32 cores)
NOTE: total memory 188.99G
NOTE: memory 49.91K (0.00%)
Out[103]: 'MVA'
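Because getsessopt and setsessopt are ordinary action calls, they can be wrapped in a small helper that records the old values before applying new ones. The following is a minimal sketch under stated assumptions. The swap_session_options function is hypothetical, and FakeConnection is a stand-in that mimics only the two calls used here so that the code runs without a CAS server. With a real connection, the sketch assumes that the CASResults object returned by getsessopt supports dictionary-style key lookup, and that the values it returns are accepted by setsessopt.

```python
class FakeConnection:
    """Hypothetical stand-in for a swat.CAS connection (no server needed)."""

    def __init__(self):
        self._opts = {'metrics': 0, 'collate': 'UCA'}

    def getsessopt(self, name):
        # The real action returns a CASResults object keyed by the option name.
        return {name: self._opts[name]}

    def setsessopt(self, **kwargs):
        self._opts.update(kwargs)


def swap_session_options(conn, **new_values):
    """Set session options and return the values they had beforehand."""
    previous = {name: conn.getsessopt(name)[name] for name in new_values}
    conn.setsessopt(**new_values)
    return previous


conn = FakeConnection()
previous = swap_session_options(conn, collate='MVA')
current = conn.getsessopt('collate')['collate']

conn.setsessopt(**previous)           # put the earlier value back
restored = conn.getsessopt('collate')['collate']
```

Keeping the previous values in a dictionary makes it easy to restore the session to its earlier state with a single setsessopt call.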
We have covered a lot of territory in this chapter, but you should now have the tools that you need in order to connect to CAS, call CAS actions, and traverse the results. We also covered the possible error conditions that you might run into, and what to do when they happen. Finally, we demonstrated some of the SWAT client and CAS session options to control certain behaviors of both areas. Now that we have that all out of the way, we can move on to something a little more interesting: data and how to get it into CAS.
1 Technically, these parameters can also be specified by setting environment variables CASHOST and CASPORT, and not specified in the CAS constructor.
2 The name _self_ is used instead of the more typical self argument to prevent possible name collisions with action parameters.