Chapter 5: The CASAction and CASTable Objects

Getting Started with the CASAction Objects

Setting Nested Parameters

Setting Parameters as Attributes

Retrieving and Removing Action Parameters

First Steps with the CASTable Object

Manually Creating a CASTable Object

CASTable Action Interface

Setting CASTable Parameters

Managing Parameters Using the Method Interface

Managing Parameters Using the Attribute Interface

Materializing CASTable Parameters

Conclusion

All of the CAS action calls that we have covered so far look like method calls on an object (such as conn.loadtable(…), conn.columninfo(…), and so on). However, looks can be deceiving. Python has the ability to make any object look like a function. All of these action calls are actually instances of CASAction objects that are being called.
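To make the idea of a callable object concrete, here is a minimal sketch in plain Python. The EchoAction class is hypothetical and is not part of SWAT. It only illustrates how defining the __call__ method lets an instance be invoked with function-call syntax.

```python
class EchoAction:
    """Hypothetical stand-in for an action object (not a SWAT class)."""

    def __init__(self, name):
        self.name = name

    def __call__(self, **params):
        # A real CASAction would send a request to the server here.
        # This sketch only reports what it was asked to run.
        return {'action': self.name, 'params': params}


fetch = EchoAction('fetch')

# The instance is invoked with ordinary function-call syntax.
result = fetch(to=5)
print(callable(fetch), result)
```

Because the object exists before it is called, it can carry state between calls, which is the property that the rest of this chapter relies on.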

Another commonly used object in SWAT is the CASTable object. The CASTable object is the most important object in the SWAT package besides the CAS connection object. It keeps your CAS table settings in one object. Also, it enables you to directly call CAS actions on the table object rather than always having to supply them as a parameter to the action. There are other more advanced features that we discuss in the next chapter.

Both the CASAction and CASTable objects manage parameters using the same methods, which is the reason for discussing them together here. In this chapter, we first look at creating CASAction instances manually and interacting with them before we call the action on the server. Then we’ll move on to managing CASTable parameters and running actions on the CASTable objects directly.

Getting Started with the CASAction Objects

The fetch action, which we’ve used several times already, has several options to experiment with. Here is a partial listing of the options from the IPython Help facility.

In [1]: import swat

 

In [2]: conn = swat.CAS('server-name.mycompany.com', 5570)

 

In [3]: conn.fetch?

 

 

 

Parameters

----------

 

table : dict or CASTable

    specifies the table name, caslib, and other common parameters.

 

...

 

from, from_ : int64, optional

    specifies the ordinal position of the first row to return.

    Default: 1

 

to : int64, optional

    specifies the ordinal position of the last row to return.

    Default: 20

 

format : boolean, optional

    when set to True, formats are applied to the variables.

    Default: False

 

maxrows : int32, optional

    specifies the maximum number of rows to return.

    Default: 1000

 

sastypes : boolean, optional

    when set to True, converts data to fixed-width character and

    double data types.

    Default: True

 

sortlocale : string, optional

    Locale to use for comparisons during sort.

 

sortby : list of dicts, optional

    specifies the variables and variable settings for sorting results.

 

    sortby[*].name : string

        specifies the variable name to use for sorting.

 

    sortby[*].order : string, optional

        specifies whether the ascending or descending value for the

        variable is used.

        Default: ASCENDING

        Values: ASCENDING, DESCENDING

 

    sortby[*].formatted : string, optional

        specifies whether the formatted or raw value for the variable

        is used.

        Default: RAW

        Values: FORMATTED, RAW

 

usebinary : boolean, optional

    Default: False

 

index : boolean, optional

    When set to True, adds a column named Index to the results that is

    to identify each row.

    Default: True

 

fetchvars : list of dicts, optional

    fetchvars[*].name : string

        specifies the name for the variable.

 

... truncated ...

Until now, we have specified these options only in the action call itself. However, an alternative is to create the CASAction instance manually and then to apply parameters, one at a time. To do this, you simply capitalize the first character of the action name.

In [4]: fa = conn.Fetch()

 

In [5]: type(fa)

Out[5]: swat.cas.actions.table.Fetch

 

In [6]: type(fa).__bases__

Out[6]: (swat.cas.actions.CASAction,)

This instance that we created is equivalent to the instance that you get by accessing conn.fetch(…). It just hasn’t been executed yet. Let’s call the Fetch instance with a table parameter as we’ve done in previous examples.

In [7]: out = conn.loadtable('data/iris.csv', caslib='casuser')

 

In [8]: fa(table=dict(name='data.iris', caslib='casuser'), to=5)

Out[8]:

[Fetch]

 

 Selected Rows from Table DATA.IRIS

 

    sepal_length  sepal_width  petal_length  petal_width species

 0           5.1          3.5           1.4          0.2  setosa

 1           4.9          3.0           1.4          0.2  setosa

 2           4.7          3.2           1.3          0.2  setosa

 3           4.6          3.1           1.5          0.2  setosa

 4           5.0          3.6           1.4          0.2  setosa

 

+ Elapsed: 0.00368s, user: 0.004s, mem: 1.65mb

Of course, the instance is reusable, so we can call it again with other options.

In [9]: fa(table=dict(name='data.iris', caslib='casuser'), to=5,

  ....:    sortby=['sepal_length', 'sepal_width'])

Out[9]:

[Fetch]

 

 Selected Rows from Table DATA.IRIS

 

    sepal_length  sepal_width  petal_length  petal_width species

 0           4.3          3.0           1.1          0.1  setosa

 1           4.4          2.9           1.4          0.2  setosa

 2           4.4          3.0           1.3          0.2  setosa

 3           4.4          3.2           1.3          0.2  setosa

 4           4.5          2.3           1.3          0.3  setosa

 

+ Elapsed: 0.0155s, user: 0.012s, sys: 0.003s, mem: 8.58mb

As you can see, the call syntax for actions can get verbose rather quickly. An alternative way of setting parameters can improve readability.

The CASAction class defines two methods for setting and getting parameters: set_params and get_params. Each also exists in a singular form, set_param and get_param, with exactly the same syntax. To set parameters on an action instance, you use set_params. The most basic usage of set_params is to specify the parameter name as a string, followed by the parameter value as the next argument. For example, to set the table and to parameters as in the fetch call in the preceding example, we do the following:

In [10]: fa.set_params('table', dict(name='data.iris',

   ....:                             caslib='casuser'),

   ....:               'to', 5)

 

In [11]: fa

Out[11]: ?.table.Fetch(table=dict(caslib='casuser',

                                  name='data.iris'), to=5)

As you can see, when we print the action instance, the parameters that we set are embedded in it. If we call the action instance now, those parameters are automatically used in the action call to the server.

In [12]: fa()

Out[12]:

[Fetch]

 

 Selected Rows from Table DATA.IRIS

 

    sepal_length  sepal_width  petal_length  petal_width species

 0           5.1          3.5           1.4          0.2  setosa

 1           4.9          3.0           1.4          0.2  setosa

 2           4.7          3.2           1.3          0.2  setosa

 3           4.6          3.1           1.5          0.2  setosa

 4           5.0          3.6           1.4          0.2  setosa

 

+ Elapsed: 0.00317s, user: 0.002s, mem: 1.64mb

In addition to the “name string followed by value” form, you can also set parameters using two-element tuples of name/value pairs or dictionaries of name/value pairs, or just using keyword arguments. Which one you use is a personal choice. Each of the following methods is equivalent to the method that was used to set parameters in the previous example:

# Tuples method

In [13]: fa.set_params(('table', dict(name='data.iris',

   ....:                              caslib='casuser')),

   ....:               ('to', 5))

 

# Dictionary method

In [14]: fa.set_params({'table': dict(name='data.iris',

   ....:                              caslib='casuser'),

   ....:                'to': 5})

 

# Keyword argument method

In [15]: fa.set_params(table=dict(name='data.iris',

   ....:                          caslib='casuser'),

   ....:               to=5)

Although you can mix all of these forms in a single call, doing so is not recommended because the result is hard to read.
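One way to picture why the four argument styles are interchangeable is to imagine them all being reduced to a single dictionary. The following is a sketch under that assumption only. The collect_params function is hypothetical and is not how SWAT is necessarily implemented.

```python
def collect_params(*args, **kwargs):
    """Hypothetical helper: merge the four argument styles into one dict."""
    params = {}
    args = list(args)
    while args:
        item = args.pop(0)
        if isinstance(item, dict):
            # Dictionary of name/value pairs
            params.update(item)
        elif isinstance(item, tuple):
            # Two-element (name, value) tuple
            name, value = item
            params[name] = value
        else:
            # Name string followed by its value as the next argument
            if not args:
                raise ValueError('missing value for parameter %r' % item)
            params[item] = args.pop(0)
    # Keyword arguments
    params.update(kwargs)
    return params


table = dict(name='data.iris', caslib='casuser')

by_pairs = collect_params('table', table, 'to', 5)
by_tuples = collect_params(('table', table), ('to', 5))
by_dict = collect_params({'table': table, 'to': 5})
by_keywords = collect_params(table=table, to=5)

print(by_pairs == by_tuples == by_dict == by_keywords)
```

All four calls produce the same dictionary, so the choice between them comes down to readability.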

Even though setting options on a CASAction object in this manner cleans up our syntax, it doesn’t solve the problem of nested parameters such as the table parameter. Such parameters still use the nested dictionary syntax that can become difficult to read. The good news is that all of the previously discussed forms of parameter setting, except for the keyword argument method, support a nested key syntax. Let’s look at nested parameters in the next section.

Setting Nested Parameters

Rather than setting only top-level parameters, as in our previous examples, we can use a dot-separated notation to indicate subparameters. For example, if we want to set a table name of data.iris and a caslib of casuser, we can use the parameter names table.name and table.caslib as top-level parameter names.

In [16]: fa = conn.Fetch()

 

In [17]: fa.set_params('table.name', 'data.iris',

   ....:               'table.caslib', 'casuser')

 

In [18]: fa

Out[18]: ?.table.Fetch(table=dict(caslib='casuser', name='data.iris'))

As you can see from the preceding output, the dot-separated key names expand into levels of a hierarchy in the parameter structure. We can do this with the sortby parameter as well. There is a little trick to sortby though since it uses a list of dictionaries as its argument. To specify items of a list, you use integers as the key name.

In [19]: fa.set_params('sortby.0.name', 'petal_length',

   ....:               'sortby.0.formatted', 'raw',

   ....:               'sortby.1.name', 'petal_width',

   ....:               'sortby.1.formatted', 'raw')

 

In [20]: fa

Out[20]: ?.table.Fetch(sortby=dict(0=dict(formatted='raw',

                                          name='petal_length'),

                                   1=dict(formatted='raw',

                                          name='petal_width')),

                       table=dict(caslib='casuser', name='data.iris'))

The printed action representation might look a bit odd because the numeric keys are inserted into a dictionary rather than a list. However, lists and numeric-indexed dictionaries are equivalent as action parameters.
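The expansion of dot-separated names into a hierarchy can be sketched in a few lines of plain Python. The expand_dotted function below is a hypothetical illustration, not SWAT code. It shows why numeric name components end up as integer keys of a dictionary.

```python
def expand_dotted(flat):
    """Hypothetical helper: expand dot-separated keys into nested dicts."""
    nested = {}
    for dotted_name, value in flat.items():
        # Numeric components become integer keys
        parts = [int(p) if p.isdigit() else p
                 for p in dotted_name.split('.')]
        target = nested
        for part in parts[:-1]:
            # Create intermediate levels on demand
            target = target.setdefault(part, {})
        target[parts[-1]] = value
    return nested


params = expand_dotted({
    'table.name': 'data.iris',
    'table.caslib': 'casuser',
    'sortby.0.name': 'petal_length',
    'sortby.1.name': 'petal_width',
})
print(params)
```

The sortby entry in the result is a dictionary keyed by 0 and 1, which mirrors the shape of the action representation shown above.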

We can now call the Fetch instance and see that the parameters set on the instance are now applied.

In [21]: fa(to=5)

Out[21]:

[Fetch]

 

 Selected Rows from Table DATA.IRIS

 

    sepal_length  sepal_width  petal_length  petal_width species

 0           4.6          3.6           1.0          0.2  setosa

 1           4.3          3.0           1.1          0.1  setosa

 2           5.0          3.2           1.2          0.2  setosa

 3           5.8          4.0           1.2          0.2  setosa

 4           4.7          3.2           1.3          0.2  setosa

 

+ Elapsed: 0.0141s, user: 0.012s, sys: 0.002s, mem: 8.58mb

You might notice in the preceding code that we used keyword parameters on the action call itself even when we set parameters in set_params. This enables you to add or override parameters just for that call; those parameters are not embedded in the CASAction instance.
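The difference between stored parameters and per-call parameters can be sketched as follows. The StoredParams class is hypothetical and only models the behavior described above: call-time keywords are merged into a copy, so the stored parameters stay as they were.

```python
class StoredParams:
    """Hypothetical action-like object that keeps parameters between calls."""

    def __init__(self, **params):
        self.params = dict(params)

    def __call__(self, **overrides):
        # Merge into a copy so that the stored parameters are left untouched
        merged = dict(self.params)
        merged.update(overrides)
        return merged


action = StoredParams(table='data.iris')

sent = action(to=5)
print(sent)            # parameters used for this one call
print(action.params)   # stored parameters
```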

Although the nested parameter syntax is convenient and cleans up the syntax, its one downside is that the names contain periods (.), which are not allowed in keyword parameter names. However, there is yet another way to specify parameters that does enable you to use period-separated names. Let’s look at that next.

Setting Parameters as Attributes

Rather than calling a method to set parameters on a CASAction object, you can simply set the attributes directly. This can actually be done at two levels in the action instance: 1) on the params attribute of the action instance, or 2) directly on the action instance. Let’s look at the params version first.

In the previous section, we used the set_params method to set top-level and nested action parameters. Using the dot-separated syntax from that section, and applying that parameter name directly to the params attribute of the action instance, we can obtain the same effect.

In [22]: fa = conn.Fetch()

 

In [23]: fa.params.table.name = 'data.iris'

 

In [24]: fa.params.table.caslib = 'casuser'

 

In [25]: fa

Out[25]: ?.table.Fetch(table=dict(caslib='casuser',

                                  name='data.iris'))

Unfortunately, this won’t work with the list syntax of the sortby parameter because Python won’t accept a number as an attribute name. However, you can specify list indexes using bracket notation.

In [26]: fa.params.sortby[0].name = 'petal_width'

 

In [27]: fa.params.sortby[0].formatted = 'raw'

 

In [28]: fa.params.sortby[1].name = 'petal_length'

 

In [29]: fa.params.sortby[1].formatted = 'raw'

 

In [30]: fa

Out[30]: ?.table.Fetch(sortby=[dict(formatted='raw',

                                    name='petal_width'),

                               dict(formatted='raw',

                                    name='petal_length')],

                       table=dict(caslib='casuser',

                                  name='data.iris'))

To avoid entering “fa.params.sortby” repeatedly, you can store the sortby parameter in a variable and act on it separately. Since the variable contains a reference to the underlying parameter structure, it embeds the parameters in the action instance.

In [31]: sortby = fa.params.sortby

 

In [32]: sortby[0].name = 'petal_length'

 

In [33]: sortby[0].formatted = 'raw'

 

In [34]: sortby[1].name = 'petal_width'

 

In [35]: sortby[1].formatted = 'raw'

Although this method might produce the nicest looking syntax for setting parameters, it also has a better chance of failing due to name collisions. The params attribute is a subclass of Python’s dictionary. That means that if an action parameter name matches the name of a dictionary method or attribute, you might see some surprising behavior. Here is an example of setting a fictional parameter named pop, which is also a dictionary method.

In [36]: fa.params.pop = 'corn'

 

In [37]: fa

Out[37]: ?.table.Fetch(pop='corn',

                       sortby=[dict(formatted='raw',

                                    name='petal_length'),

                               dict(formatted='raw',

                                    name='petal_width')],

                       table=dict(caslib='casuser',

                                  name='data.iris'))

 

In [38]: fa.params.pop

Out[38]: <bound method xadict.pop of {'pop': 'corn', 'table': {'caslib': 'casuser', 'name': 'data.iris'}, 'sortby': {0: {'name': 'petal_length', 'formatted': 'raw'}, 1: {'name': 'petal_width', 'formatted': 'raw'}}}>

Although setting the pop parameter works, you’ll see that if you try to get the value of the pop parameter, you’ll get the dictionary method returned instead. However, if you use the dictionary key syntax (for example, fa.params['pop']), you get the correct value back. This leads us to the second method of setting parameters as attributes. That is, you can set them directly on the CASAction instance.
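The collision is a general property of dictionaries that expose keys as attributes, and it can be reproduced without CAS. The AttrDict class below is a hypothetical, simplified stand-in for the parameter dictionary. It is not SWAT’s actual implementation.

```python
class AttrDict(dict):
    """Hypothetical dictionary with attribute access (a simplified stand-in)."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        # Store every attribute assignment as a dictionary key
        self[name] = value


params = AttrDict()
params.to = 5
params.pop = 'corn'

print(params.to)              # attribute lookup falls through to the key
print(params['pop'])          # key syntax returns the stored value
print(callable(params.pop))   # attribute syntax finds the dict method
```

Python looks for real attributes and methods first, so a key named pop can be stored but is shadowed by dict.pop whenever it is read as an attribute.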

Everything that we have just covered in setting action parameters on the params attribute also works directly on the CASAction instance. Essentially, it’s a shortcut that saves you from entering .params for each parameter; it is just a slightly less formal form. The following example starts with a Fetch instance that has no parameters set.

In [39]: fa

Out[39]: ?.table.Fetch()

 

In [40]: fa.table.name = 'data.iris'

 

In [41]: fa.table.caslib = 'casuser'

 

In [42]: fa

Out[42]: ?.table.Fetch(table=dict(caslib='casuser', name='data.iris'))

 

In [43]: fa(to=5)

Out[43]:

[Fetch]

 

 Selected Rows from Table DATA.IRIS

 

    sepal_length  sepal_width  petal_length  petal_width species

 0           5.1          3.5           1.4          0.2  setosa

 1           4.9          3.0           1.4          0.2  setosa

 2           4.7          3.2           1.3          0.2  setosa

 3           4.6          3.1           1.5          0.2  setosa

 4           5.0          3.6           1.4          0.2  setosa

 

+ Elapsed: 0.00447s, user: 0.003s, sys: 0.002s, mem: 1.64mb

In addition to setting parameters, we can also get, delete, or check the existence of attributes. Let’s see how in the next section.

Retrieving and Removing Action Parameters

Just like Python dictionaries, the parameters on CASAction objects can be retrieved and removed. You can also check for the existence of a parameter name. The methods that are used to retrieve action parameters by name are get_params and get_param. To remove parameters, you use del_params or del_param. And finally, to check for the existence of parameters, you use has_params or has_param.

All of the previously mentioned methods accept any number of strings as parameter names. The get_param method returns the value of the parameter, and get_params returns a dictionary of all parameter/value pairs that are requested. The parameter names can be top-level names, or you can specify a subparameter using the dot-separated notation from set_params.

In [44]: fa = conn.Fetch(to=5, table=dict(name='data.iris',

   ....:                                  caslib='casuser'))

 

In [45]: fa

Out[45]: ?.table.Fetch(table=dict(caslib='casuser', name='data.iris'),

                       to=5)

 

In [46]: fa.get_param('to')

Out[46]: 5

 

In [47]: fa.get_params('to', 'table.name')

Out[47]: {'table.name': 'data.iris', 'to': 5}

To delete action parameters, you simply specify the names of the parameters to delete in either del_param or del_params. Again, the key names can be top-level names or the dot-separated names of subparameters.

In [48]: fa.del_params('to', 'table.caslib')

 

In [49]: fa

Out[49]: ?.table.Fetch(table=dict(name='data.iris'))

Finally, to check the existence of parameters, you use has_params or has_param. In each case, all parameter names that are requested must exist in order for the method to return True.

In [50]: fa.has_param('table.caslib')

Out[50]: False

 

In [51]: fa.has_param('table.name')

Out[51]: True

 

In [52]: fa.has_param('table.name', 'to')

Out[52]: False
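The deletion and existence checks shown above can be modeled on a plain nested dictionary. The helpers below (_walk, has_all, and delete_all) are hypothetical names used only for this sketch. They mimic the behavior in which every requested name must exist for the check to succeed.

```python
def _walk(params, dotted_name):
    """Return the parent dict and final key for a dot-separated name."""
    parts = dotted_name.split('.')
    target = params
    for part in parts[:-1]:
        target = target[part]
    return target, parts[-1]


def has_all(params, *names):
    """True only if every requested name exists."""
    for name in names:
        try:
            parent, key = _walk(params, name)
        except (KeyError, TypeError):
            return False
        if not isinstance(parent, dict) or key not in parent:
            return False
    return True


def delete_all(params, *names):
    """Remove each requested name."""
    for name in names:
        parent, key = _walk(params, name)
        del parent[key]


params = {'to': 5, 'table': {'name': 'data.iris', 'caslib': 'casuser'}}
delete_all(params, 'to', 'table.caslib')

print(params)
print(has_all(params, 'table.name'), has_all(params, 'table.name', 'to'))
```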

It is also possible to retrieve and delete parameters using the attribute syntax. We have already seen an example of getting parameters when we create the intermediate variable for sortby in order to minimize keystrokes. Here are some other examples:

In [53]: fa = conn.Fetch(to=5, table=dict(name='data.iris',

   ....:                                  caslib='casuser'))

 

In [54]: fa.table.name

Out[54]: 'data.iris'

 

In [55]: fa.table

Out[55]: {'caslib': 'casuser', 'name': 'data.iris'}

 

In [56]: del fa.table.caslib

 

In [57]: fa.table

Out[57]: {'name': 'data.iris'}

However, attribute syntax is unreliable for checking whether a parameter exists. The parameters dictionary is a bit magical: whenever you request a key that does not exist, the dictionary automatically creates an object for it behind the scenes. Without this magic, the attribute setting method wouldn’t work; Python would always raise attribute exceptions for nested names. The side effect is that merely looking at a parameter can create it. Therefore, to check for the existence of parameters, always use has_param or has_params.
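The side effect of automatic key creation is easy to demonstrate with a small dictionary subclass. AutoDict is a hypothetical class written for this sketch, using Python’s __missing__ hook. It is only an analogy for the behavior described above.

```python
class AutoDict(dict):
    """Hypothetical dictionary that creates missing keys when they are read."""

    def __missing__(self, key):
        # dict calls __missing__ when a key lookup fails
        value = self[key] = AutoDict()
        return value


params = AutoDict()
params['table']['name'] = 'data.iris'   # 'table' is created on first access

before = 'sortby' in params
params['sortby']                         # merely reading the key creates it
after = 'sortby' in params

print(before, after)
```

The membership test changes its answer after a read, which is why a dedicated existence check is the safer choice.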

Setting parameters and calling CAS actions is pretty much all there is to CASAction objects. Let’s move on to something more interesting: the CASTable object. It supports all of the same parameter setting and getting methods of CASAction objects, but it also has the ability to clean up the duplication of code that results when specifying table parameters on action calls.

First Steps with the CASTable Object

The first task we need to do before we work with CASTable objects is to create a data table in CAS. Let’s use one of the loadtable examples from the previous chapter that loads some data and returns a CASTable object.

In [58]: out = conn.loadtable('data/iris.csv', caslib='casuser')

NOTE: Cloud Analytic Services made the file data/iris.csv available as table DATA.IRIS in caslib CASUSER(username).

 

In [59]: out

Out[59]:

[caslib]

 

 'CASUSER(username)'

 

[tableName]

 

 'DATA.IRIS'

 

[casTable]

 

 CASTable('DATA.IRIS', caslib='CASUSER(username)')

 

+ Elapsed: 0.000495s, user: 0.001s, mem: 0.123mb

We have mentioned previously that the CASResults object is a subclass of the Python OrderedDict class. Therefore, any of the keys that are seen in the key/value pairs can be accessed using Python’s dictionary syntax.

In [60]: out['tableName']

Out[60]: 'DATA.IRIS'

 

In [61]: out['caslib']

Out[61]: 'CASUSER(username)'

 

In [62]: out['casTable']

Out[62]: CASTable('DATA.IRIS', caslib='CASUSER(username)')

In addition, the CASResults class enables you to access the keys as attributes as long as the key is a valid attribute name and doesn’t collide with an existing attribute.

In [63]: out.casTable

Out[63]: CASTable('DATA.IRIS', caslib='CASUSER(username)')
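Attribute access on an ordered dictionary can be sketched with a short subclass. The Results class below is hypothetical and is not SWAT’s CASResults. It shows both the convenience and the collision rule: a key is reachable as an attribute only when no real attribute or method already has that name.

```python
from collections import OrderedDict


class Results(OrderedDict):
    """Hypothetical results container (a simplified stand-in)."""

    def __getattr__(self, name):
        # Reached only when no real attribute or method has this name
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


out = Results(caslib='CASUSER(username)', tableName='DATA.IRIS')

print(out['tableName'], out.tableName)
print(callable(out.keys))   # 'keys' is still the dictionary method
```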

If you look at the last output from the preceding code, you’ll see that the CASTable object points to the DATA.IRIS table in the CASUSER(username) caslib. Also, the CASTable object is automatically bound to the session object that ran the loadtable action. That means that any actions that are executed on the CASTable object also run in that session. It also means that any action sets that get loaded into that session are automatically available on the CASTable object.

We used the tableinfo, columninfo, and fetch actions frequently in the previous chapter. Each time, we specified the table in the action call that was executed on the connection. Rather than doing that, you can execute the actions directly on the CASTable object.

In [64]: out.casTable.tableinfo()

Out[64]:

[TableInfo]

 

         Name  Rows  Columns Encoding CreateTimeFormatted  

 0  DATA.IRIS   150        5    utf-8  03Nov2016:12:07:32

 

      ModTimeFormatted JavaCharSet    CreateTime       ModTime  

 0  03Nov2016:12:07:32        UTF8  1.793794e+09  1.793794e+09

 

    Global  Repeated  View     SourceName     SourceCaslib  

 0       0         0     0  data/iris.csv  CASUSER(username)

 

    Compressed Creator Modifier

 0           0  username

 

+ Elapsed: 0.000651s, mem: 0.103mb

 

In [65]: out.casTable.columninfo()

Out[65]:

[ColumnInfo]

 

          Column  ID     Type  RawLength  FormattedLength  NFL  NFD

 0  sepal_length   1   double          8               12    0    0

 1   sepal_width   2   double          8               12    0    0

 2  petal_length   3   double          8               12    0    0

 3   petal_width   4   double          8               12    0    0

 4       species   5  varchar         10               10    0    0

 

+ Elapsed: 0.00067s, mem: 0.169mb

 

In [66]: out.casTable.fetch(to=5)

Out[66]:

[Fetch]

 

 Selected Rows from Table DATA.IRIS

 

    sepal_length  sepal_width  petal_length  petal_width species

 0           5.1          3.5           1.4          0.2  setosa

 1           4.9          3.0           1.4          0.2  setosa

 2           4.7          3.2           1.3          0.2  setosa

 3           4.6          3.1           1.5          0.2  setosa

 4           5.0          3.6           1.4          0.2  setosa

 

+ Elapsed: 0.00349s, user: 0.003s, mem: 1.64mb

As you can see, calling actions on a table is much more concise and doesn’t require you to know the names of the table or the caslib. We briefly showed you the summary action in the previous chapter as well. Executing the summary action on our table now appears as follows:  

In [67]: out.casTable.summary()

Out[67]:

[Summary]

 

 Descriptive Statistics for DATA.IRIS

 

          Column  Min  Max      N  NMiss      Mean    Sum       Std  

 0  sepal_length  4.3  7.9  150.0    0.0  5.843333  876.5  0.828066

 1   sepal_width  2.0  4.4  150.0    0.0  3.054000  458.1  0.433594

 2  petal_length  1.0  6.9  150.0    0.0  3.758667  563.8  1.764420

 3   petal_width  0.1  2.5  150.0    0.0  1.198667  179.8  0.763161

 

      StdErr       Var      USS         CSS         CV     TValue  

 0  0.067611  0.685694  5223.85  102.168333  14.171126  86.425375

 1  0.035403  0.188004  1427.05   28.012600  14.197587  86.264297

 2  0.144064  3.113179  2583.00  463.863733  46.942721  26.090198

 3  0.062312  0.582414   302.30   86.779733  63.667470  19.236588

 

            ProbT

 0  3.331256e-129

 1  4.374977e-129

 2   1.994305e-57

 3   3.209704e-42

 

+ Elapsed: 0.0269s, user: 0.026s, sys: 0.004s, mem: 1.74mb

Now that you see how easily this works, we can try a new action: correlation.

In [68]: out.casTable.correlation()

Out[68]:

[CorrSimple]

 

 Summary Statistics in Correlation Analysis for DATA.IRIS

 

        Variable      N      Mean    Sum    StdDev  Minimum  Maximum

 0  sepal_length  150.0  5.843333  876.5  0.828066      4.3      7.9

 1   sepal_width  150.0  3.054000  458.1  0.433594      2.0      4.4

 2  petal_length  150.0  3.758667  563.8  1.764420      1.0      6.9

 3   petal_width  150.0  1.198667  179.8  0.763161      0.1      2.5

 

[Correlation]

 

 Pearson Correlation Coefficients for DATA.IRIS

 

        Variable  sepal_length  sepal_width  petal_length  

 0  sepal_length      1.000000    -0.109369      0.871754

 1   sepal_width     -0.109369     1.000000     -0.420516

 2  petal_length      0.871754    -0.420516      1.000000

 3   petal_width      0.817954    -0.356544      0.962757

 

    petal_width

 0     0.817954

 1    -0.356544

 2     0.962757

 3     1.000000

 

+ Elapsed: 0.0066s, user: 0.003s, sys: 0.008s, mem: 1.73mb

From these examples, you can see that any action that takes a table definition as an argument can be executed directly on the CASTable object. Even if the action doesn’t take a table parameter, you can call it on the table. The CASTable object acts like a CAS connection object.

In [69]: out.casTable.userinfo()

Out[69]:

[userInfo]

 

 {'anonymous': False,

  'groups': ['users'],

  'hostAccount': True,

  'providedName': 'username',

  'providerName': 'Active Directory',

  'uniqueId': 'username',

  'userId': 'username'}

 

+ Elapsed: 0.000232s, mem: 0.0656mb

In addition to the action interface, you can also call many of the Pandas DataFrame methods and attributes. For example, rather than using the columninfo action to get column information, you can use the columns and dtypes attributes or the info method. We’ll cover a few here. In the next chapter, we discuss the DataFrame compatibility features in more detail.

In [70]: out.casTable.columns

Out[70]: Index(['sepal_length', 'sepal_width', 'petal_length',

                'petal_width', 'species'], dtype='object')

 

In [71]: out.casTable.dtypes

Out[71]:

sepal_length     double

sepal_width      double

petal_length     double

petal_width      double

species         varchar

dtype: object

 

In [72]: out.casTable.info()

CASTable('DATA.IRIS', caslib='CASUSER(username)')

Data columns (total 5 columns):

                N   Miss     Type

sepal_length  150  False   double

sepal_width   150  False   double

petal_length  150  False   double

petal_width   150  False   double

species       150  False  varchar

dtypes: double(4), varchar(1)

data size: 8450

vardata size: 1250

memory usage: 8528

Even the describe method works the same way as in DataFrame objects complete with the percentiles, include, and exclude options. Of course, with the power of CAS behind this, you can retrieve the statistics computed by the describe method on data sets that are much larger than those that are supported by a conventional Pandas DataFrame.

In [73]: out.casTable.describe(include=['all'], percentiles=[.4, .8])

Out[73]:

       sepal_length sepal_width petal_length petal_width    species

count           150         150          150         150        150

unique           35          23           43          22          3

top               5           3          1.5         0.2  virginica

freq             10          26           14          28         50

mean        5.84333       3.054      3.75867     1.19867        NaN

std        0.828066    0.433594      1.76442    0.763161        NaN

min             4.3           2            1         0.1     setosa

40%             5.6           3          3.9        1.15        NaN

50%             5.8           3         4.35         1.3        NaN

80%            6.55         3.4         5.35         1.9        NaN

max             7.9         4.4          6.9         2.5  virginica

All of these attributes and methods call CAS actions in the background and reformat the results to the familiar DataFrame output types. So, if you are familiar with Pandas DataFrames, you should feel comfortable working with the CASTable objects this way.

So far, every table that we have used has resulted from a CAS action that loaded the table. But what if you want to wrap a previously loaded table in a CASTable object? The next section shows how.

Manually Creating a CASTable Object

If a table is already loaded in CAS, you can wrap the table in a CASTable object. There is a CASTable method on CAS connection objects that creates CASTable objects that are registered with that connection. So, to create a CASTable object that references our DATA.IRIS table, we do the following.

# Create the CASTable object manually

In [74]: newtbl = conn.CASTable('data.iris', caslib='casuser')

 

# Verify the result

In [75]: newtbl

Out[75]: CASTable('data.iris', caslib='casuser')

 

In [76]: newtbl.columninfo()

Out[76]:

[ColumnInfo]

 

         Column  ID     Type  RawLength  FormattedLength  NFL  NFD

0  sepal_length   1   double          8               12    0    0

1   sepal_width   2   double          8               12    0    0

2  petal_length   3   double          8               12    0    0

3   petal_width   4   double          8               12    0    0

4       species   5  varchar         10               10    0    0

Now that we have shown the basics of CASTable objects, let’s dig deeper into the action and parameter interfaces.

CASTable Action Interface

The action interface on CASTables is just like the action interface on CAS connection objects. The only difference is that if you call an action on a CASTable, any parameter that is marked as a table definition or a table name is automatically populated with the information from the CASTable parameters. We have already seen this in our previous examples with actions such as tableinfo, columninfo, and summary. The tableinfo action takes a table name and a caslib name as parameters, and the columninfo and summary actions take a table definition as a parameter. In either case, they were automatically populated.
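The automatic population of the table parameter can be pictured with a small sketch. SketchTable is a hypothetical class, and the sketch simplifies matters by assuming that every action takes a parameter named table. The real interface decides which parameters to fill based on how the action defines them.

```python
class SketchTable:
    """Hypothetical table object that fills in the table parameter."""

    def __init__(self, name, caslib=None):
        self.params = {'name': name}
        if caslib is not None:
            self.params['caslib'] = caslib

    def __getattr__(self, action_name):
        # Any unknown public attribute is treated as an action name
        if action_name.startswith('_'):
            raise AttributeError(action_name)

        def run(**params):
            # Supply the table definition unless the caller gave one
            params.setdefault('table', dict(self.params))
            return action_name, params

        return run


iris = SketchTable('data.iris', caslib='casuser')
print(iris.summary())
print(iris.fetch(to=5))
```

The caller never repeats the table name or caslib; the table object adds them to each action call.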

We can see the result of this population of parameters by using an action class. As we mentioned at the beginning of this chapter, you can create an instance of an action by capitalizing the action name.

# Load the table

In [77]: out = conn.loadtable('data/iris.csv', caslib='casuser')

NOTE: Cloud Analytic Services made the file data/iris.csv available as table DATA.IRIS in caslib CASUSER(username).

 

# Store the CASTable object in a new variable

In [78]: iris = out.casTable

 

# Create an instance of the summary action

In [79]: summ = iris.Summary()

 

# Display the summary action definition

In [80]: summ

Out[80]: ?.simple.Summary(__table__=CASTable('DATA.IRIS',

                                caslib='CASUSER(username)'))

As you can see, the table is now part of the action object. You can then call the action to verify that it executed on our table.

In [81]: summ()

Out[81]:

[Summary]

 

 Descriptive Statistics for DATA.IRIS

 

          Column  Min  Max      N  NMiss      Mean    Sum       Std  

 0  sepal_length  4.3  7.9  150.0    0.0  5.843333  876.5  0.828066

 1   sepal_width  2.0  4.4  150.0    0.0  3.054000  458.1  0.433594

 2  petal_length  1.0  6.9  150.0    0.0  3.758667  563.8  1.764420

 3   petal_width  0.1  2.5  150.0    0.0  1.198667  179.8  0.763161

 

      StdErr       Var      USS         CSS         CV     TValue  

 0  0.067611  0.685694  5223.85  102.168333  14.171126  86.425375

 1  0.035403  0.188004  1427.05   28.012600  14.197587  86.264297

 2  0.144064  3.113179  2583.00  463.863733  46.942721  26.090198

 3  0.062312  0.582414   302.30   86.779733  63.667470  19.236588

 

            ProbT

 0  3.331256e-129

 1  4.374977e-129

 2   1.994305e-57

 3   3.209704e-42

 

+ Elapsed: 0.00671s, user: 0.007s, sys: 0.002s, mem: 1.73mb

As we mentioned previously, just like CASAction objects, CASTable objects support the same parameter setting techniques. Let’s look at some examples of managing parameters on CASTable objects.

Setting CASTable Parameters

So far, we’ve demonstrated how to set the table name and the caslib of a CASTable object. However, there are many more possible parameters to use on both input and output tables. To see the full listing of possible parameters, use the IPython ? operator on an existing CASTable object. Here is a partial listing:

In [82]: iris?

 

...

 

Parameters

----------

name : string or CASTable

    specifies the name of the table to use.

caslib : string, optional

    specifies the caslib containing the table that you want to use

    with the action. By default, the active caslib is used. Specify a

    value only if you need to access a table from a different caslib.

where : string, optional

    specifies an expression for subsetting the input data.

groupby : list of dicts, optional

    specifies the names of the variables to use for grouping

    results.

groupbyfmts : list, optional

    specifies the format to apply to each group-by variable. To

    avoid specifying a format for a group-by variable, use "" (no

    format).

    Default: []

orderby : list of dicts, optional

    specifies the variables to use for ordering observations within

    partitions. This parameter applies to partitioned tables or it

    can be combined with groupBy variables when groupByMode is set to

    REDISTRIBUTE.

computedvars : list of dicts, optional

    specifies the names of the computed variables to create. Specify

    an expression for each parameter in the computedvarsprogram parameter.

computedvarsprogram : string, optional

    specifies an expression for each variable that you included in

    the computedvars parameter.

groupbymode : string, optional

    specifies how the server creates groups.

    Default: NOSORT

    Values: NOSORT, REDISTRIBUTE

computedondemand : boolean, optional

    when set to True, the computed variables specified in the

    compVars parameter are created when the table is loaded instead

    of when the action begins.

    Default: False

singlepass : boolean, optional

    when set to True, the data does not create a transient table in

    the server. Setting this parameter to True can be efficient, but

    the data might not have stable ordering upon repeated runs.

    Default: False

importoptions : dict, optional

    specifies the settings for reading a table from a data source.

 

... truncated ...

Let’s create a CASTable object that includes a where parameter to subset the rows, and that also includes the computedvars and computedvarsprogram parameters to create computed columns. We haven’t used computedvars or computedvarsprogram previously. The computedvarsprogram parameter is a string that contains code to create the values for the computed columns. The computedvars parameter is a list of the variable names that are created by computedvarsprogram and that show up as computed columns in the table.

One way to think of a CASTable object is as a “client-side view.” Even though you are subsetting rows and creating computed columns, you do not modify the table on the server side at all. These parameters are simply stored on the CASTable object and are automatically sent as table parameters when actions are called on the CASTable object. The referenced CAS table is always the same, but any methods or CAS actions that are called on the CASTable object are performed using the view of the data from that object.

In [83]: iris = conn.CASTable('data.iris', caslib='casuser',

   ....:                      where='''sepal_length > 6.8 and

   ....:                               species = "virginica"''',

   ....:                      computedvars=['length_factor'],

   ....:                      computedvarsprogram='''length_factor =

   ....:                             sepal_length * petal_length;''')

 

In [84]: iris

Out[84]: CASTable('data.iris', caslib='casuser',

                  where='sepal_length > 6.8 and

                         species = "virginica"',

                  computedvars=['length_factor'],

                  computedvarsprogram='length_factor =

                          sepal_length * petal_length;')

 

 

# Use the fetchvars= parameter to only fetch specified columns

In [85]: iris.fetch(fetchvars=['sepal_length', 'petal_length',

   ....:                       'length_factor'])

Out[85]:

[Fetch]

 

 Selected Rows from Table DATA.IRIS

 

     sepal_length  petal_length  length_factor

 0            6.9           5.4          37.26

 1            6.9           5.1          35.19

 2            7.1           5.9          41.89

 3            7.6           6.6          50.16

 4            7.3           6.3          45.99

 5            7.2           6.1          43.92

 6            7.7           6.7          51.59

 7            7.7           6.9          53.13

 8            6.9           5.7          39.33

 9            7.7           6.7          51.59

 10           7.2           6.0          43.20

 11           7.2           5.8          41.76

 12           7.4           6.1          45.14

 13           7.9           6.4          50.56

 14           7.7           6.1          46.97

 

+ Elapsed: 0.0135s, user: 0.01s, sys: 0.007s, mem: 2.55mb
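The “client-side view” idea can be modeled in a few lines of plain Python. The following is only a sketch and is not SWAT code: TableView and action_params are hypothetical names that we use to show how parameters stored on an object might be merged into each action call without changing anything on the server.

```python
# Plain-Python illustration only; this is not how SWAT is implemented.
class TableView:
    """Hypothetical stand-in for an object that stores table parameters."""

    def __init__(self, name, **params):
        self.name = name
        self.params = dict(params)

    def action_params(self, **action_kwargs):
        """Build the parameters that would accompany an action call."""
        # The stored parameters become the table definition.
        table = {'name': self.name}
        table.update(self.params)
        # Action-specific arguments are added alongside the table.
        merged = {'table': table}
        merged.update(action_kwargs)
        return merged


view = TableView('data.iris', caslib='casuser',
                 where='sepal_length > 6.8')
call = view.action_params(to=5)
print(call)
```

The stored parameters travel with every call that is built from the object, while the object itself (and, in the real case, the table in the server) stays unchanged.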

CASTable objects can also be used as output tables.

In [86]: outtbl = conn.CASTable('summout', caslib='casuser',    

   ....:                        promote=True)

 

In [87]: iris.summary(casout=outtbl)

Out[87]:

[OutputCasTables]

 

               casLib     Name  Rows  Columns  

 0  CASUSER(username)  summout     5       15

 

                                   casTable

 0  CASTable('summout', caslib='CASUSER(...

 

+ Elapsed: 0.0179s, user: 0.012s, sys: 0.011s, mem: 2.72mb

 

In [88]: outtbl.fetch()

Out[88]:

[Fetch]

 

 Selected Rows from Table SUMMOUT

 

         _Column_  _Min_  _Max_  _NObs_  _NMiss_     _Mean_   _Sum_  

 0   sepal_length   6.90   7.90    15.0      0.0   7.360000  110.40

 1    sepal_width   2.60   3.80    15.0      0.0   3.126667   46.90

 2   petal_length   5.10   6.90    15.0      0.0   6.120000   91.80

 3    petal_width   1.60   2.50    15.0      0.0   2.086667   31.30

 4  length_factor  35.19  53.13    15.0      0.0  45.178667  677.68

 

       _Std_  _StdErr_      _Var_       _USS_       _CSS_       _CV_  

 0  0.337639  0.087178   0.114000    814.1400    1.596000   4.587485

 1  0.353486  0.091270   0.124952    148.3900    1.749333  11.305524

 2  0.501711  0.129541   0.251714    565.3400    3.524000   8.197898

 3  0.241622  0.062386   0.058381     66.1300    0.817333  11.579305

 4  5.527611  1.427223  30.554484  31044.4416  427.762773  12.235003

 

          _T_         _PRT_

 0  84.424990  2.332581e-20

 1  34.257443  6.662768e-15

 2  47.243615  7.680656e-17

 3  33.447458  9.278838e-15

 4  31.654945  1.987447e-14

 

+ Elapsed: 0.00403s, user: 0.003s, mem: 1.67mb

You might notice that the outtbl variable worked with the fetch action even when the CASTable object contained output table parameters (such as promote=True). The CASTable objects are polymorphic and send only the parameters that are appropriate for the current context. In this case, the context was an input table parameter, so the promote=True parameter was removed automatically before the action call was made.
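To make the idea of context-dependent parameters concrete, here is a minimal sketch in plain Python. It is not SWAT's implementation. The function name params_for_context and the two sets of parameter names are assumptions chosen for illustration; the real package decides which parameters belong to which context on its own.

```python
# Hypothetical groupings for illustration; not SWAT's actual lists.
OUTPUT_ONLY = {'promote', 'replace'}
INPUT_ONLY = {'where', 'groupby', 'computedvars', 'computedvarsprogram'}


def params_for_context(params, context):
    """Return a copy of params without keys that do not fit the context."""
    if context == 'input':
        drop = OUTPUT_ONLY
    elif context == 'output':
        drop = INPUT_ONLY
    else:
        raise ValueError("context must be 'input' or 'output'")
    return {k: v for k, v in params.items() if k not in drop}


outtbl_params = {'name': 'summout', 'caslib': 'casuser', 'promote': True}

# Used as an input table: the output-only key is dropped.
as_input = params_for_context(outtbl_params, 'input')

# Used as an output table: everything here is kept.
as_output = params_for_context(outtbl_params, 'output')
```

In this sketch, as_input no longer contains the promote key, which mirrors what happened when outtbl was passed to the fetch action.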

In addition to setting parameters in the constructor, you can also set parameters on existing instances. Just as with CASAction objects, these options can be set using: 1) the function interface, and 2) the attribute/dictionary interface.

Managing Parameters Using the Method Interface

The formal way of setting parameters is to use the set_param, get_param, has_param, and del_param methods. These methods work the same way as they do with CASAction objects.

Just as with CAS action calls, CASTable definitions can be verbose. Here is an example:

conn.CASTable('data.iris', caslib='casuser',

                           where='''sepal_length > 6.8 and

                                    species = "virginica"''',

                                  computedvars=['length_factor'],

                                  computedvarsprogram='''length_factor

                                    = sepal_length * petal_length;''')

This CASTable definition uses only a simple where clause and specifies a single computed column. However, a definition for a table with more computed columns and other parameters would quickly become hard to read. We can use the same technique of setting individual parameters from the CASAction section to enhance readability. Let’s start with creating a CASTable object with a name and a caslib. Then, we add the where parameter and the computed column parameters separately.

In [89]: iris = conn.CASTable('data.iris', caslib='casuser')

 

In [90]: iris.set_param('where',

            'sepal_length > 6.8 and species = "virginica"')

 

In [91]: iris.set_param('computedvars', ['length_factor'])

 

In [92]: iris.set_param('computedvarsprogram',

            'length_factor = sepal_length * petal_length;')

 

In [93]: iris

Out[93]: CASTable('data.iris', caslib='casuser',

                  computedvars=['length_factor'],

                  computedvarsprogram='length_factor =

                                       sepal_length * petal_length;',

                  where='sepal_length > 6.8 and

                         species = "virginica"')

 

In [94]: iris.fetch(to=5, fetchvars=['sepal_length', 'petal_length',

                                    'length_factor'])

Out[94]:

[Fetch]

 

 Selected Rows from Table DATA.IRIS

 

    sepal_length  petal_length  length_factor

 0           6.9           5.4          37.26

 1           6.9           5.1          35.19

 2           7.1           5.9          41.89

 3           7.6           6.6          50.16

 4           7.3           6.3          45.99

 

+ Elapsed: 0.0167s, user: 0.015s, sys: 0.004s, mem: 2.48mb

All of the forms of set_params that work on CASAction objects work here as well. These include name/value pairs, two-element tuples, keyword arguments, and dictionaries. Therefore, all of the following are equivalent:

# String / value pairs

In [95]: iris.set_params('where', '''sepal_length > 6.8 and

   ....:                             species = "virginica"''',

   ....:                 'computedvars', ['length_factor'],

   ....:          'computedvarsprogram', '''length_factor =

   ....:                       sepal_length * petal_length;''')

 

# Tuples

In [95]: iris.set_params(('where', '''sepal_length > 6.8 and

   ....:                              species = "virginica"'''),

   ....:                 ('computedvars', ['length_factor']),

   ....:                 ('computedvarsprogram', '''length_factor =

   ....:                            sepal_length * petal_length;'''))

 

# Keyword arguments

In [96]: iris.set_params(where='''sepal_length > 6.8 and

   ....:                          species = "virginica"''',

   ....:                 computedvars=['length_factor'],

   ....:                 computedvarsprogram='''length_factor =

   ....:                           sepal_length * petal_length;''')

 

# Dictionaries

In [97]: iris.set_params({'where': '''sepal_length > 6.8 and

   ....:                              species = "virginica"''',

   ....:                  'computedvars': ['length_factor'],

   ....:                  'computedvarsprogram': '''length_factor =

   ....:                            sepal_length * petal_length;'''})
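One way to see why the four calling styles above are equivalent is to reduce each of them to the same dictionary. The following plain-Python sketch does that. The function normalize_params is a hypothetical helper written for this illustration; it is not part of SWAT and does not claim to match how set_params is implemented.

```python
def normalize_params(*args, **kwargs):
    """Collapse name/value pairs, tuples, dicts, and keywords into a dict."""
    out = {}
    i = 0
    while i < len(args):
        arg = args[i]
        if isinstance(arg, dict):
            # A dictionary contributes all of its items.
            out.update(arg)
            i += 1
        elif isinstance(arg, tuple):
            # A two-element tuple is one (name, value) pair.
            key, value = arg
            out[key] = value
            i += 1
        else:
            # A bare name is followed by its value in the next position.
            if i + 1 >= len(args):
                raise ValueError('missing value for %r' % (arg,))
            out[arg] = args[i + 1]
            i += 2
    # Keyword arguments are applied last.
    out.update(kwargs)
    return out


pairs = normalize_params('where', 'sepal_length > 6.8',
                         'computedvars', ['length_factor'])
tuples = normalize_params(('where', 'sepal_length > 6.8'),
                          ('computedvars', ['length_factor']))
keywords = normalize_params(where='sepal_length > 6.8',
                            computedvars=['length_factor'])
dicts = normalize_params({'where': 'sepal_length > 6.8',
                          'computedvars': ['length_factor']})
```

All four results compare equal, which is the sense in which the calling styles are interchangeable.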

You can also check for the existence of parameters and then retrieve them just like on CASAction objects.

In [98]: iris.has_param('where')

Out[98]: True

 

In [99]: iris.has_param('groupby')

Out[99]: False

 

In [100]: iris.get_param('where')

Out[100]: 'sepal_length > 6.8 and species = "virginica"'

 

In [101]: iris.get_params('where', 'computedvars')

Out[101]: {'computedvars': ['length_factor'],

          'where': 'sepal_length > 6.8 and species = "virginica"'}

Finally, you can delete table parameters using del_param or del_params.

In [102]: iris

Out[102]: CASTable('data.iris', caslib='casuser',

                  computedvars=['length_factor'],

                  computedvarsprogram='length_factor =

                                       sepal_length * petal_length;',

                  where='sepal_length > 6.8 and

                         species = "virginica"')

 

In [103]: iris.del_params('computedvars', 'computedvarsprogram')

 

In [104]: iris

Out[104]: CASTable('data.iris', caslib='casuser',

                   where='sepal_length > 6.8 and

                          species = "virginica"')

You might recall that in addition to a function interface, CASAction objects also enable you to set parameters using an attribute interface. This is true of CASTable objects as well.

Managing Parameters Using the Attribute Interface

CASTable parameters can be changed using an attribute-style interface as well as the previously described function-style interface.

In [105]: iris = conn.CASTable('data.iris', caslib='casuser')

In [106]: iris

Out[106]: CASTable('data.iris', caslib='casuser')

 

In [107]: iris.params.where = '''sepal_length > 6.8 and

   ....:                         species = "virginica"'''

 

In [108]: iris

Out[108]: CASTable('data.iris', caslib='casuser',

                   where='sepal_length > 6.8 and

                          species = "virginica"')

If your parameters include lists, you can use array indexing syntax to set them individually.

In [109]: iris.params.groupby[0] = 'species'

 

In [110]: iris.params.groupby[1] = 'sepal_length'

 

In [111]: iris

Out[111]: CASTable('data.iris', caslib='casuser',

                   groupby=['species', 'sepal_length'],

                   where='sepal_length > 6.8 and

                          species = "virginica"')

Retrieving parameter values and deleting them also work with Python’s attribute syntax.

In [112]: iris.params.groupby

Out[112]: {0: 'species', 1: 'sepal_length'}

 

In [113]: del iris.params.groupby

 

In [114]: del iris.params.where

 

In [115]: iris

Out[115]: CASTable('data.iris', caslib='casuser')

You might notice that the groupby parameter is returned as a dictionary. That is because setting the items of a list one index at a time, as we did in the previous example, produces a dictionary with integer keys as the underlying structure. When the parameters are passed to CAS, they are converted to an ordered list automatically. So, dictionaries with integer keys are equivalent to lists as far as SWAT is concerned.
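The equivalence between integer-keyed dictionaries and lists can be illustrated with a small conversion function. This is a plain-Python sketch under our own assumptions, not the code that SWAT uses; int_keyed_dict_to_list is a hypothetical name.

```python
def int_keyed_dict_to_list(value):
    """Convert {0: a, 1: b, ...} to [a, b, ...]; return other values as-is."""
    is_int_dict = (isinstance(value, dict) and len(value) > 0 and
                   all(isinstance(k, int) for k in value))
    if is_int_dict:
        # Order the items by key so that index 0 comes first.
        return [value[k] for k in sorted(value)]
    return value


groupby = {1: 'sepal_length', 0: 'species'}
as_list = int_keyed_dict_to_list(groupby)
print(as_list)
```

Note that this sketch simply orders the values by key, so a dictionary with gaps in its keys (such as 0 and 2) would produce a shorter list rather than an error. How SWAT treats such gaps is not covered here.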

Parameters can also be set directly on the CASTable object rather than on the params level. You must be more careful with this form because the chance of name collisions is much greater with other methods and attributes on the CASTable object.

In [116]: iris

Out[116]: CASTable('data.iris', caslib='casuser')

 

In [117]: iris.groupby = ['species']

 

In [118]: iris.where = '''sepal_length > 6.8 and

   .....:                 species = "virginica"'''

 

In [119]: iris

Out[119]: CASTable('data.iris', caslib='casuser',

                 groupby=['species'],

                 where='sepal_length > 6.8 and species = "virginica"')

That covers just about anything you need to do with CASTable parameters. We’ll show you how to materialize them in a real table in the server in the next section.

Materializing CASTable Parameters

We mentioned previously that CASTable objects are essentially client-side views of the data in a CAS table. Setting parameters on a CASTable object has no effect on the table in the server. Once you have created a CASTable object with all of your computed columns and filters, you might want to materialize them onto the server as an in-memory table so that you can access the result from other CASTable references. You can use the partition action to do this. This is not the only use for the partition action, but it works for this case as well. Just as with loadtable, the partition action output has a casTable key that contains a reference to the new CASTable object.

In [122]: sub_iris = iris.partition()

 

In [123]: sub_iris

[caslib]

 

 'CASUSER(username)'

 

[tableName]

 

 '_T_ZX6QZEVP_6FIIJT25_2EYMPQQARS'

 

[rowsTransferred]

 

 0

 

[shuffleWaitTime]

 

 0.0

 

[minShuffleWaitTime]

 

 1e+300

 

[maxShuffleWaitTime]

 

 0.0

 

[averageShuffleWaitTime]

 

 0.0

 

[casTable]

 

 CASTable('_T_ZX6QZEVP_6FIIJT25_2EYMPQQARS',

          caslib='CASUSER(username)')

 

+ Elapsed: 0.00355s, user: 0.003s, mem: 1.62mb

 

In [124]: sub_iris = sub_iris.casTable

 

In [125]: sub_iris.fetch()

Out[125]:

[Fetch]

 

 Selected Rows from Table _T_ZX6QZEVP_6FIIJT25_2EYMPQQARS

 

     sepal_length  sepal_width  petal_length  petal_width    species

 0            6.9          3.1           5.4          2.1  virginica

 1            6.9          3.1           5.1          2.3  virginica

 2            7.1          3.0           5.9          2.1  virginica

 3            7.6          3.0           6.6          2.1  virginica

 4            7.3          2.9           6.3          1.8  virginica

 5            7.2          3.6           6.1          2.5  virginica

 6            7.7          3.8           6.7          2.2  virginica

 7            7.7          2.6           6.9          2.3  virginica

 8            6.9          3.2           5.7          2.3  virginica

 9            7.7          2.8           6.7          2.0  virginica

 10           7.2          3.2           6.0          1.8  virginica

 11           7.2          3.0           5.8          1.6  virginica

 12           7.4          2.8           6.1          1.9  virginica

 13           7.9          3.8           6.4          2.0  virginica

 14           7.7          3.0           6.1          2.3  virginica

 

+ Elapsed: 0.0034s, user: 0.003s, mem: 1.57mb
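In the example above, the server generated the name of the materialized table. If you prefer to choose the name yourself, one possibility is to pass an output table definition to the action. The sketch below only builds that definition as a dictionary. The helper casout_spec is a hypothetical name, and we are assuming that the partition action accepts a casout parameter with name, caslib, and replace keys, as output table parameters commonly do. Check the action's Help listing before relying on it.

```python
def casout_spec(name, caslib=None, replace=True):
    """Build an output table definition as a plain dictionary."""
    spec = {'name': name, 'replace': replace}
    if caslib is not None:
        spec['caslib'] = caslib
    return spec


spec = casout_spec('sub_iris', caslib='casuser')
print(spec)

# With a live connection (not run here), the call might look like this,
# assuming the partition action accepts casout=:
#
#     sub_iris = iris.partition(casout=spec).casTable
```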

Conclusion

In this chapter, we introduced the CASAction and CASTable objects, and showed you various ways of setting parameters on CASAction and CASTable instances. Depending on your coding style or how your parameters are being generated, you can choose the appropriate method for setting your action parameters. Now that we have seen the basics of the CASAction and CASTable objects, let’s move on to advanced usage of the CASTable objects.
