Decision tables

If there is a tool every business person—from a CEO to a secretary—knows how to use, it is a spreadsheet. In fact, most of the time, they know more about spreadsheets than most people in the IT department. If one of the goals of Drools is to be a business-oriented rule engine, then what could be better than to provide first-class integration with spreadsheets?

DSLs are very powerful, but, without a proper UI, the users still need to write their rules by themselves. Even if, by using a DSL, the available options to write rules are narrowed down to very specific sentences, the probability of syntax errors, misplaced statements, invalid code, and so on is still high.

Decision tables, on the other hand, provide a much more constrained environment than DSLs, which mitigates most of these risks.

What is a decision table?

A decision table in Drools is a document stored in an XLS (Microsoft Excel) or CSV (Comma-Separated Values) file, which defines a set of rules using a very compact syntax.

The advantage of using XLS and not any other spreadsheet format is that many of the office suite products already support it. An XLS file can be edited nowadays with any of the most popular office suites such as MS Office, LibreOffice, OpenOffice, and so on.

A decision table in Drools requires a specific structure in order to be executed. This structure helps the compiler identify the different sections of the spreadsheet, each of which plays a different role in the rules that are ultimately generated when the decision table is compiled. That's right, just like with DSL/DSLR, a decision table is first converted into DRL before it is compiled as part of a KIE Container.

Following our simple categorization scenario, where customer categories were assigned according to age, the rules could easily be rewritten using a very simple decision table, such as the one shown next:

[Figure: decision table for the simple customer categorization scenario]

Even if we haven't talked about the structure of a decision table yet, it's quite simple to understand what's going on by simply looking at it. The tabular nature of a decision table makes it easy to read and modify.

Let's now analyze what the different sections of a decision table are and what they mean.

Decision tables structure

There are two main keywords when defining a decision table: RuleSet and RuleTable. The RuleSet (B2) keyword identifies where the decision table actually begins. The column where this keyword is used is also important; it determines the column that has to be used for any of the other keywords in the sheet. The RuleTable (B6) keyword identifies the beginning of a group of rules.

Tip

Only the first worksheet of an XLS file will be scanned for rule definitions.

RuleSet section

The cell after RuleSet (C2) is optional and defines the package name for all the rules contained in this sheet. If empty, the default package name is rule_table.

The section following RuleSet can be used to define DRL constructs (except for rules) and rule attributes for all the rules contained in the document.

In our example, two key-value pair entries (B3-C3 and B4-C4) are used to specify the Java imports required by the rules and a global attribute of NO-LOOP. The available keywords in this section are:

| Keyword | Value | Usage |
|---|---|---|
| RuleSet | The package name for the generated DRL file. Optional, the default is rule_table. | Must be the first entry. |
| Sequential | true or false. If true, then salience is used to ensure that rules fire from the top down. | Optional, at most once. If omitted, no firing order is imposed. |
| EscapeQuotes | true or false. If true, then quotation marks are escaped so that they appear literally in the DRL. | Optional, at most once. If omitted, quotation marks are escaped. |
| Import | A comma-separated list of Java classes to import. | Optional, may be used repeatedly. |
| Variables | Declarations of DRL globals, that is, a type followed by a variable name. Multiple global definitions must be separated with a comma. | Optional, may be used repeatedly. |
| Functions | One or more function definitions, according to DRL syntax. | Optional, may be used repeatedly. |
| Queries | One or more query definitions, according to DRL syntax. | Optional, may be used repeatedly. |
| Declare | One or more declarative types, according to DRL syntax. | Optional, may be used repeatedly. |

Tip

Keywords inside a Drools decision table are case-insensitive.

Along with the previous keywords, a set of attributes could also be specified in the RuleSet section. These attributes will affect the behavior of all the rules present in the current document. A list of all available attributes is:

| Keyword | Value |
|---|---|
| PRIORITY | An integer defining the "salience" value for the rule. Overridden by the "Sequential" flag. |
| DURATION | A long integer value defining the "duration" value for the rule. |
| TIMER | A timer definition. See "Timers and Calendars". |
| ENABLED | A Boolean value. true enables the rule; false disables the rule. |
| CALENDARS | A calendars definition. See "Timers and Calendars". |
| NO-LOOP | A Boolean value. true inhibits looping of rules due to changes made by its consequence. |
| LOCK-ON-ACTIVE | A Boolean value. true inhibits additional activations of all rules with this flag set within the same ruleflow or agenda group. |
| AUTO-FOCUS | A Boolean value. true for a rule within an agenda group causes activations of the rule to automatically give the focus to the group. |
| ACTIVATION-GROUP | A string identifying an activation (or XOR) group. |
| AGENDA-GROUP | A string identifying an agenda group, which has to be activated by giving it the "focus". |
| RULEFLOW-GROUP | A string identifying a ruleflow group. |

All these attributes can only be used once per decision table.

Note

Attributes in the RuleSet section affect the entire package where the rules are defined. This may include rules defined in other assets outside the decision table. For more fine-grained control, the attributes can instead be used as columns in the RuleTable section, where their value can be configured independently for each rule.

Let's now move on to the section where the rules themselves are defined: the RuleTable section.

RuleTable section

The second most important keyword after RuleSet is RuleTable. This keyword, which must be in the same column as RuleSet, identifies a section where rule templates are present. A single sheet in a decision table document could contain multiple RuleTable entries.

A string can be appended to the content of the RuleTable cell (B6) to specify a common prefix shared by the names of all the generated rules. The name of each generated rule is composed of this value and the number of the row where the rule is defined.

The row after RuleTable specifies the column type. There are five supported types of columns:

| Keyword | Value | Usage |
|---|---|---|
| NAME | Provides the name for the rule generated from the row, overriding the default name. | Optional, at most one column |
| DESCRIPTION | A text, resulting in a comment within the generated rule. | Optional, at most one column |
| CONDITION | Code snippet and interpolated values for constructing a constraint within a pattern in a condition. | At least one per rule table |
| ACTION | Code snippet and interpolated values for constructing an action for the consequence of the rule. | At least one per rule table |
| METADATA | Code snippet and interpolated values for constructing a metadata entry for the rule. | Optional, any number of columns |

In addition, all the attributes introduced in the previous section can also be used in this row.

In our example, we have three conditions (B7, C7, and D7) and only one action (E7). Each condition corresponds to a pattern or a constraint inside a pattern. Each action represents the code to be executed in the right-hand side of the generated rules.

The cells in the row after the column type have a different meaning according to the type of the column.

For columns of type CONDITION, the values in these cells represent a pattern in the left-hand side of the generated rules. If multiple constraints inside a single pattern are intended, the cells can be merged into one (just like B8 in our example). The second row below a CONDITION column is used to specify one or more constraints in a pattern. A special variable called $param can be used to mark the parts of the cell that will be interpolated with the values further down in the column. If the cells below specify a comma-separated list of values (as opposed to a single value, as in our example), the variables $1, $2, and so on, can be used to access each individual value. A text value matching the pattern forall(delimiter){snippet} can also be used to expand the list of values: the snippet is repeated once for each value, with the value inserted in place of the symbol $, and the expansions are joined by the given delimiter.
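The interpolation rules for CONDITION cells can be illustrated with a small, self-contained Java sketch. It is not the code Drools uses internally; the class and method names are hypothetical, and treating $param as "the whole cell value" when the cell holds a list is an assumption made only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: mimics the interpolation rules described in the text.
// This is NOT Drools' implementation; all names here are hypothetical.
final class ConditionInterpolationSketch {

    // Matches forall(delimiter){snippet}
    private static final Pattern FORALL =
            Pattern.compile("forall\\(([^)]*)\\)\\{([^}]*)\\}");

    /** Expands a constraint snippet with the value(s) found in a data cell. */
    static String interpolate(String snippet, String cellValue) {
        // A data cell may hold a single value or a comma-separated list.
        String[] values = cellValue.split(",");
        for (int i = 0; i < values.length; i++) {
            values[i] = values[i].trim();
        }

        // forall: repeat the inner snippet once per value, replacing "$",
        // and join the expansions with the given delimiter.
        Matcher m = FORALL.matcher(snippet);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(snippet, last, m.start());
            List<String> parts = new ArrayList<>();
            for (String v : values) {
                parts.add(m.group(2).replace("$", v));
            }
            out.append(String.join(m.group(1), parts));
            last = m.end();
        }
        out.append(snippet.substring(last));

        // $param: replaced with the cell value (assumption: the whole cell).
        String result = out.toString().replace("$param", cellValue.trim());

        // $1, $2, ...: replaced from the highest index down so that "$1"
        // does not clobber the prefix of "$10".
        for (int i = values.length; i >= 1; i--) {
            result = result.replace("$" + i, values[i - 1]);
        }
        return result;
    }
}
```

For instance, calling interpolate("age >= $1, age <= $2", "18, 21") in this sketch yields age >= 18, age <= 21.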

For columns of type ACTION, the value of the cell in the next row is optional and, if present, it represents an object reference, a global variable, or a bound variable from the left-hand side.

The second row after an ACTION column is the action's code. This code, which also accepts interpolation variables, will be appended to the right-hand side of the corresponding rule. If the preceding cell contained an object reference (that is, it was not empty), the code in this cell is appended to the reference by adding a leading period and an ending semicolon. If the object reference cell was empty, the value in this cell—with its variables interpolated—is used as is. The forall construct is also allowed in this cell.
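The way an ACTION cell is combined with its optional object reference can be sketched in a few lines of Java. This is only an illustration of the rule stated above (leading period and trailing semicolon when a reference is present, snippet used as is otherwise); the class name, method name, and sample snippets are hypothetical and are not taken from the example spreadsheet.

```java
// Illustrative sketch only: NOT Drools' implementation; names are hypothetical.
final class ActionSnippetSketch {

    /**
     * Builds the right-hand-side statement for one rule.
     *
     * @param objectRef content of the row below ACTION (may be null or blank)
     * @param code      content of the second row below ACTION
     * @param cellValue data cell used to interpolate $param
     */
    static String buildAction(String objectRef, String code, String cellValue) {
        // String.replace is a literal replacement, so other "$" variables
        // such as "$c" are left untouched.
        String interpolated = code.replace("$param", cellValue.trim());
        if (objectRef == null || objectRef.trim().isEmpty()) {
            // No object reference: the interpolated snippet is used as is.
            return interpolated;
        }
        // Object reference present: reference + "." + snippet + ";"
        return objectRef.trim() + "." + interpolated + ";";
    }
}
```

With an object reference of $c and a snippet of setCategory(Customer.Category.$param), a data cell containing GOLD produces $c.setCategory(Customer.Category.GOLD); in this sketch.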

For columns of type METADATA, the first row below is ignored and the second row is used as the value of the generated rule metadata. Interpolation variables are also allowed in this cell. After interpolation, an @ character is prepended to the value of this cell, and the result is added to the metadata section (between the name of the rule and the when keyword) of the generated rule.

For columns of type NAME and DESCRIPTION, the two rows below the column type row are not used.

The third row after the column type row is used to provide a friendly name for the column. Drools will not use the values in this row at all, but having this row makes decision tables easier to read.

From the fourth row on, non-blank entries provide data for interpolation as described earlier. A blank cell results in the omission of the corresponding condition/action/metadata statement for this rule.

Coming back to our scenario

In the previous section, we introduced how a decision table could be used for our simple scenario of customer classification by age. Let's now analyze how the example decision table, shown next, gets converted into DRL:

[Figure: the example decision table for customer classification by age]

The preceding spreadsheet can be found as part of the sources bundle associated with this chapter, along with the corresponding unit tests.

Note

As opposed to DSL, decision tables in Drools require a specific dependency that allows the translation to DRL. This dependency is org.drools:drools-decisiontables.

As we already know, the entries in the RuleSet section of our decision table specify the package name, the Java import, and the no-loop attribute of the rules defined in it.

The rules, according to the spreadsheet, are composed of three conditions applied to a single pattern. In this case, the pattern is of type Customer. Notice that we have bound a variable to our pattern: $c. The first two conditions make reference to the age attribute of our Customer class. When a condition is only composed of a binary operator (such as ==, >, <, and so on), the use of $param is optional. The condition in cell B9 could have been written as age >. In these cases, Drools will understand that the interpolation value has to be placed at the end of the condition. When the operator is ==, things are even simpler: it's enough to just name the attribute we want to use for the comparison—that is, age.

When a data cell is left empty (like C14 in our example), the associated condition will not be included in the generated rule. In our scenario, the rule in row 14 doesn't impose a maximum value for the age of a customer; condition C9 is then not required.

The spreadsheet also shows that the generated rules will contain a single action composed of a modify statement. This statement is used to set the category of the matching customer $c.

Starting from row 11, we find the definition of four rules. Column A for those rows is just a descriptive name of the rule and it is ignored by Drools. The values in columns B, C, D, and E provide the interpolation data used for the conditions and actions.

If we look closer, the values for the third condition (D11-D14) look suspicious. They all have the same "NA" value. For these types of fixed values, we have three options: we can repeat the value for each of the rules, we can merge all the cells together to avoid repeating the same value over and over, or we can set the value in the constraint itself.

The latter option presents some challenges though. If we change the value of the condition (in D9) to "category == Customer.Category.NA" we still need to come up with a value for the cells D11:D14; otherwise, the entire condition will be omitted in the generated rules. The problem is that, if we do set a value in these cells, Drools will recognize that the condition doesn't contain any interpolation variable and will assume that we are trying to use an implicit " == $param " operator. The generated code will then become invalid.

A possible solution to deal with conditions without interpolation variables is to append them to some other condition by using a comma. In our example, we could modify the B9 condition to look like "age > $param, category == Customer.Category.NA" or "category == Customer.Category.NA, age > $param". The condition in cell C9 is not a good candidate in this case because there is a blank cell in this column. As we can see, a condition cell is not restricted to a single DRL constraint.
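The implicit-operator behavior discussed in this section can be made concrete with a short Java sketch. It only reproduces the rules as the text describes them (an explicit $param, a trailing binary operator, or a bare attribute name that implies ==); it is not Drools' own logic, and the class and method names are hypothetical.

```java
// Illustrative sketch only: NOT Drools' implementation; names are hypothetical.
final class ImplicitOperatorSketch {

    /** Turns a condition snippet plus a data-cell value into a constraint. */
    static String toConstraint(String snippet, String value) {
        String s = snippet.trim();
        if (s.contains("$param")) {
            // Explicit interpolation variable.
            return s.replace("$param", value);
        }
        if (s.matches(".*(==|!=|<=|>=|<|>)$")) {
            // Snippet ends with a binary operator: the value goes at the end.
            return s + " " + value;
        }
        // No variable and no trailing operator: an implicit "== value" is assumed.
        return s + " == " + value;
    }
}
```

In this sketch, both "age > $param" and "age >" with a value of 18 give age > 18, and a bare "age" gives age == 18. A snippet such as "category == Customer.Category.NA" with any non-blank data cell falls into the last branch and ends up with a second == appended, which is the kind of invalid constraint the text warns about. The combined form "age > $param, category == Customer.Category.NA" avoids the problem because it contains an interpolation variable.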

Taking the rules in rows 11 and 14 as an example, let's see what the generated DRL for these rules looks like:

package chapter07.dtable.simple;
import org.drools.devguide.eshop.model.Customer;
no-loop true

rule "Simple Customer Categorization_11"
when
    $c: Customer(age > 18, age <= 21, category == Customer.Category.NA)
then
    modify($c) { setCategory(Customer.Category.NA)}
end

rule "Simple Customer Categorization_14"
when
    $c: Customer(age > 40, category == Customer.Category.NA)
then
    modify($c) { setCategory(Customer.Category.GOLD)}
end

The preceding listing shows the result of converting rows 11 and 14 to DRL. There are some important things to note:

  • The package name is the one specified by the RuleSet keyword.
  • The import statement and the global no-loop attribute also match the entries used in the RuleSet section.
  • Because we didn't use a NAME column for our rules, the default name was used. The default name is composed of the RuleTable value and the row number that originated the rule.
  • Given that row 14 contained a blank cell, the corresponding condition is not present in the generated rule.
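To tie these observations together, the following Java sketch assembles a rule from one data row in the way just described: the name is the RuleTable value plus the row number, and blank cells drop their condition. It is a simplified illustration and not the Drools compiler; the class and method names are hypothetical, and the snippets passed in the usage note (in particular the content of cell D9 and of the action cell) are assumptions inferred from the generated DRL.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: NOT the Drools spreadsheet compiler; names are hypothetical.
final class RowToRuleSketch {

    static String toRule(String tableName, int row, String pattern,
                         String[] constraintSnippets, String[] cells,
                         String actionSnippet, String actionCell) {
        List<String> constraints = new ArrayList<>();
        for (int i = 0; i < constraintSnippets.length; i++) {
            String cell = cells[i];
            if (cell == null || cell.trim().isEmpty()) {
                continue; // a blank data cell omits the whole condition
            }
            constraints.add(constraintSnippets[i].replace("$param", cell.trim()));
        }
        // Default rule name: RuleTable value + "_" + spreadsheet row number.
        return "rule \"" + tableName + "_" + row + "\"\n"
                + "when\n"
                + "    " + pattern + "(" + String.join(", ", constraints) + ")\n"
                + "then\n"
                + "    " + actionSnippet.replace("$param", actionCell.trim()) + "\n"
                + "end\n";
    }
}
```

Calling toRule with the table name "Simple Customer Categorization", row 14, the pattern "$c: Customer", the snippets "age > $param", "age <= $param", and "category == Customer.Category.$param", the cells "40", "" and "NA", and the action "modify($c) { setCategory(Customer.Category.$param)}" with "GOLD" reproduces the text of the second rule in the listing, without the age <= constraint.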

Decision table troubleshooting

Because decision tables introduce a level of indirection between what the user writes and the DRL that actually gets generated, dealing with errors can be challenging.

As an example, let's assume that there is a typo in the condition present in C9. Instead of the correct value "age <= $param", let's assume that we inadvertently wrote "age =< $param". When the decision table containing this typo is compiled, it will generate the following error message:

Error while creating KieBase[
Message [id=1, level=ERROR, path=chapter07/dtable-simple/customer-classification-simple.xls, line=8, column=0 text=[ERR 102] Line  8:29 mismatched input '=' in rule "Simple Customer Categorization_11"],
Message [id=2, level=ERROR, path=chapter07/dtable-simple/customer-classification-simple.xls, line=16, column=0 text=[ERR 102] Line 16:29 mismatched input '=' in rule "Simple Customer Categorization_12"],
Message [id=3, level=ERROR, path=chapter07/dtable-simple/customer-classification-simple.xls, line=24, column=0 text=[ERR 102] Line 24:29 mismatched input '=' in rule "Simple Customer Categorization_13"],
Message [id=4, level=ERROR, path=chapter07/dtable-simple/customer-classification-simple.xls, line=0, column=0 text=Parser returned a null Package]]

The messages make reference to errors in three different rules: Simple Customer Categorization_11, Simple Customer Categorization_12, and Simple Customer Categorization_13. In all the cases, the error is the same: "mismatched input '='". The rule from row 14 is not reported because its cell in column C is blank, so the faulty condition is never generated for it. The problem here is that each error makes reference to the line and column inside the generated DRL, but we don't actually know what that DRL looks like.

One of the ways, and probably the best way, to deal with errors in a decision table's generated DRL is to dump it into a place where it can be analyzed.

A decision table can easily be converted into DRL by using the class org.drools.decisiontable.DecisionTableProviderImpl from the drools-decisiontables project:

InputStream dtableIS = //get the input stream to the decision table file
DecisionTableProviderImpl dtp = new DecisionTableProviderImpl();
String drl = dtp.loadFromInputStream(dtableIS, null);

DecisionTableProviderImpl defines a loadFromInputStream method that takes two arguments:

  • The InputStream to the decision table file
  • An optional org.kie.internal.builder.DecisionTableConfiguration instance that allows us to configure some aspects of the DRL conversion

The sources bundle associated with this chapter has a working example of the preceding code.

Being able to reproduce the DRL generated from a decision table is very helpful when we deal with errors. The line and column numbers in the error messages can be traced to the DRL generated by the DecisionTableProviderImpl class.
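Once the generated DRL is available as a String, a small helper that prefixes each line with its number makes it easier to find the position an error message points at. The following is a minimal sketch using only the Java standard library; the class and method names are hypothetical and are not part of Drools.

```java
// Illustrative helper: not part of Drools; names are hypothetical.
final class DrlLineNumberSketch {

    /** Returns the given DRL text with a 1-based line number in front of each line. */
    static String withLineNumbers(String drl) {
        // \R matches any line break; -1 keeps trailing empty lines.
        String[] lines = drl.split("\\R", -1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            sb.append(String.format("%4d: %s\n", i + 1, lines[i]));
        }
        return sb.toString();
    }
}
```

Printing withLineNumbers(drl) for the DRL string obtained from the decision table lets us look up the lines mentioned in messages such as "Line 8:29" directly.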

Enhanced decision tables

The example we just covered shows the basics of decision tables in Drools. There are many more interesting things we can do to make the life of the users of these spreadsheets easier. Most of the nice features spreadsheets support are also supported by Drools' decision tables. The features we are talking about are: collapsed/fixed/merged rows and columns, functions, colors, links between cells, and so on. By combining these features, we can create much more customized spreadsheets that enhance the overall experience of the user.

As an example, we can take the original decision table introduced in this chapter and apply some changes so that it looks like the one below:

[Figure: the enhanced version of the customer classification decision table]

This enhanced version of the original decision table can be found in the source bundle with the name customer-classification-enhanced.xls.

Some of the enhancements present in this new version of the decision table are:

  • Rows 3, 4, 8, and 9 are hidden to avoid showing cells with technical content.
  • D11:D14 are merged to avoid duplicated "NA" values.
  • E11:E14 are now using a drop-down to select a value between NA, BRONZE, SILVER, or GOLD. This drop-down is not visible in the image but we can check this in the spreadsheet associated with this example in the source bundle.
  • Cells in column B use a conditional format that marks a cell in red (for example, B13) when its value overlaps with the upper bound of the previous rule.

This example shows only a few of all the possibilities decision tables bring to the table to create a more elegant and concise way to define rules. The source bundle includes the decision table version of our advanced classification rules that uses the number of orders of a customer in order to set its category. The name of this decision table file is customer-classification-advanced.xls.

Decision tables provide an excellent way to easily create a considerable number of rules without too much work. Once the structure of the decision table is defined, the only job the rule author has is to add, update, or delete values in its cells. The possibility of making mistakes while authoring rules is still there, but, compared to DSLs/DSLRs, the risk is much lower.

Another advantage of decision tables over DSL is that we don't need any special UI for the former. Business users, most of the time, are already familiar with spreadsheets. There is no need to introduce a new UI to users before they can start writing their own rules.

But decision tables are not ideal for every situation: one of the biggest limitations is that rules we can model using decision tables must have the same structure. For cases like scoring, categorization, and classification, where the structure of the rules is almost the same and the only thing that changes is the values of their constraints, decision tables are a very efficient option. For situations where the structure of the rules doesn't necessarily remain the same, decision tables give us no benefits at all.

Another limitation decision tables have is that the structure of both the rules and the data is tightly coupled; they can't be reused separately from each other.

For situations where more flexibility is required, there is another option we may want to consider: rule templates.
