Chapter 9. Introduction to PHREAK

As we already know, the pattern matching algorithm behind Drools 6 is called PHREAK. This algorithm is an evolution of the one used in previous versions of Drools: the RETE (also known as RETEOO) algorithm.

Even if, from previous chapters, we already have some idea of what PHREAK is and how it works, understanding its internals in more detail will give us the opportunity to write better and more performant rules. Another advantage of knowing how Drools works internally is that it will considerably increase our options when troubleshooting our knowledge assets.

One of the major drawbacks of PHREAK (in contrast to RETE) is that the former is a brand new algorithm that was developed for Drools and by the Drools team itself. The disadvantages of this young algorithm are the lack of adoption and the scarce documentation it provides. But don't be alarmed! PHREAK has so far shown itself to be a production-quality algorithm, able to deal with the complexities of critical applications in a variety of scenarios.

The idea of this chapter is to provide an introduction to the PHREAK algorithm, its characteristics, and particularities. To better understand the algorithm, different concrete examples will be presented and explained throughout this chapter.

The topics covered in this chapter are:

  • A PHREAK introduction
  • PHREAK network and nodes
  • Concrete examples of different PHREAK networks
  • Queries and backward-chaining reasoning in the context of PHREAK

Note

Given the finite space we have, the goal of this chapter is to serve as an introduction to Drools' PHREAK algorithm. Most of the concepts in this chapter are oversimplified to make them easier to explain and learn. After grasping the initial ideas behind PHREAK, you are recommended to head to Drools' documentation at: (http://docs.jboss.org/drools/release/6.3.0.Final/drools-docs/html_single/#ReteOO), if a deeper understanding is required.

Let's start with an introduction to the PHREAK algorithm and its components.

Introducing PHREAK

One of the first assumptions Drools' newcomers make is that rules in a DRL file are evaluated in the same order they are defined, as if each rule were some kind of if statement from an imperative language forming a sequential evaluation structure. But evaluating rule conditions in sequence is neither efficient nor scalable at all. Even worse, adding inference capabilities into this scenario would also be a nightmare.

In the mid-seventies, Dr. Charles L. Forgy introduced a new pattern matching algorithm to be used in production systems, called RETE (https://en.wikipedia.org/wiki/Rete_algorithm). The RETE algorithm sacrificed memory for increased speed, providing an improvement in several orders of magnitude over traditional pattern-matching algorithms. Ever since, multiple production rule systems have been using a derived or customized version of RETE as their pattern matching internal algorithm. This chapter is based on Drools' implementation of RETE and its latest evolution: PHREAK.

PHREAK shares most of the concepts present in RETE— especially with the object-oriented version of RETE implemented since Drools 2.0—and it is impossible to explain one without the other. We will first start with the common concepts between the two algorithms and then focus on the improvements of PHREAK over RETE.

Like RETE, PHREAK decomposes the patterns and constraints present in the left-hand side of the rules to create a directed network of nodes:

Introducing PHREAK

In PHREAK, the first level of nodes in the network is composed of entry-point (EP) nodes. Entry-points were introduced in Chapter 6, Complex Event Processing, as a way to partition the source of our facts. Each entry-point defined in a knowledge base will correspond to a node in the generated network.

Whenever a fact is inserted into a session, it will enter the PHREAK network through the corresponding entry-point node where the fact was inserted; if not specified, the default entry-point in Drools is called DEFAULT. The fact will then traverse each of the connected nodes in the network. Each node will perform some kind of evaluation on the fact that will determine whether it should be propagated or not to the next node.

The nodes in the PHREAK network are divided into three different categories, according to the type of evaluation they perform on the fact that is traversing it: Object Type Nodes (OTN), Alpha Nodes, and Beta Nodes. Every possible path in the PHREAK network ends either in a Rule Terminal Node or a Query Terminal Node (QTN). The former are in charge of the execution of the right-hand side logic of the rule they represent (in PHREAK there is one Rule Terminal Node for each of the rules in the knowledge base), and the latter are associated with the execution of the queries present in the knowledge base.

The following sections in this chapter are focused on the three types of evaluation nodes present in the PHREAK network: Object Type Nodes, Alpha Nodes, and Beta Nodes.

Object Type Nodes

Object Type Nodes perform a type evaluation on the fact being tested. This evaluation can be seen as an instanceof operation over the fact. Only when the current fact is an instance of the type (Java class) the node represents, will the fact be propagated to the next node/s in the network. The PHREAK network will contain as many Object Type Nodes as distinct classes used in the patterns of the rules it represents. This implies that rules using the same class in a pattern will share the same Object Type Node in the PHREAK network. So, when a particular Object Type Node is being evaluated, all the rules related to it are being evaluated at the same time.

Note

It is important to note that a single fact may satisfy multiple Object Type Nodes in a network. For example, a fact of type java.lang.String will satisfy an Object Type Node of type java.lang.String, but it will also satisfy another node of type java.lang.Object.

As an example, let's analyze the PHREAK network generated by the following two rules:

rule "Sample Rule 1"
when
    $c: Customer()
then
    channels["customer-channel"].send($c);
end

rule "Sample Rule 2"
when
    $o: Order()
then
    channels["order-channel"].send($o);
end

The preceding rules each define a simple pattern. The patterns in this case are of two different classes: Customer and Order, which means that the generated PHREAK network will contain two different Object Type Nodes:

Object Type Nodes

The preceding network is composed of a single Entry-point Node (EP), three Object Type Nodes (OTN), and two Rule Terminal Nodes (RTN). The extra Object Type Node in the network—of type InitialFactImpl—is used to support some special patterns in Drools that will be covered later in this chapter.

When a fact is inserted through the only entry-point in the knowledge base, the three Object Type nodes will be evaluated. If the fact is of type Customer or Order, the corresponding Rule Terminal Node will be executed. It is important to remember at this point that Drools separates the rule evaluation phase from the rule execution phase. The execution of a Rule Terminal Node will not execute the right-hand side code from the corresponding rule; it will just notify the Drools Agenda that a new match for the corresponding rule is ready to be executed.

Tip

The code bundled with this book contains a module called phreak-inspector used to generate the PHREAK diagrams for this chapter. The code associated with this chapter shows how phreak-inspector can be used and how the diagrams in this chapter can be recreated.

The Object Type sub-network (composed of all the Object Type Nodes in the network) is always present in PHREAK and its depth is always 1: we will never find an Object Type Node followed by another Object Type Node.

Alpha Nodes

In Drools, a pattern may contain no or multiple constraints. Each individual constraint a pattern has is represented in the PHREAK network as an Alpha Node. This type of node is then in charge of the evaluation of the particular constraint it represents. If the constraint evaluates to true, the next node in the network will be evaluated:

rule "Sample Rule 1"
when
    $c: Customer(age > 30, category == Category.GOLD)
then
    channels["customer-channel"].send($c);
end

The preceding rule contains a single pattern with two constraints: one on the age attribute and another on the category attribute. Each of these constraints will be represented as an Alpha Node in the corresponding PHREAK network:

Alpha Nodes

When a Customer fact is inserted in the network shown earlier, the first Alpha Node will be evaluated. If the age of the Customer is less than or equal to 30, the propagation will stop and the next Alpha Node in this path will not be evaluated at all. If the age of the Customer is indeed greater than 30, the next Alpha Node will be evaluated and, if it also evaluates to true (meaning that the Customer's category is GOLD), a match will be created in the Agenda for the Sample Rule 1 rule.

The order of the Alpha Nodes in the PHREAK network depends on the order in which the corresponding constraints are defined in DRL. If the age constraint precedes the category constraint in a rule, the age Alpha Node will precede the category Alpha Node in the generated network.

Alpha Node sharing

Just like with Object Type Nodes, Alpha Nodes can be shared among multiple rules (or patterns inside a rule) in a knowledge base. If the same constraint is used in more than one pattern, Drools will optimize the creation of the PHREAK network and a single Alpha Node will be used.

Let's assume the following two rules:

rule "Sample Rule 1"
when
    $c: Customer(age > 30, category == Category.GOLD)
then
    channels["gold-customer-channel"].send($c);
end

rule "Sample Rule 2"
when
    $c: Customer(age > 30, category == Category.SILVER)
then
    channels["silver-customer-channel"].send($c);
end

The earlier two rules have the same condition (age > 30) duplicated. The corresponding PHREAK network for these two rules will look like the following one:

Alpha Node sharing

As we can see, the Alpha Node that corresponds to the duplicated constraint only appears once in the network. In addition to saving some memory, Alpha Node sharing reduces the evaluation time of a knowledge base because a condition that is present in multiple rules needs to be evaluated just once.

We have already mentioned that the order of the constraints in a pattern dictates the order of the corresponding Alpha Node in the PHREAK network. This concept is particularly important regarding Alpha Node sharing. For an Alpha Node to be shared, the order of the constraint must be the same in all the patterns. For example, the following two rules are semantically identical to the two rules introduced before, but the order of the constraints in the pattern in the second rule was altered:

rule "Sample Rule 1"
when
    $c: Customer(age > 30, category == Category.GOLD)
then
    channels["gold-customer-channel"].send($c);
end

rule "Sample Rule 2"
when
    $c: Customer(category == Category.SILVER, age > 30)
then
    channels["silver-customer-channel"].send($c);
end

If we analyze the PHREAK network generated by the new version of our rules, we will notice something interesting:

Alpha Node sharing

Because the order of the age constraint is not the same in the two rules, the corresponding PHREAK network will contain a duplicated alpha node. The major drawback here is that certain facts (for example, a SILVER Customer) will trigger the same evaluation over the same logical constraint multiple times.

Constraint JIT compilation

In order to evaluate the constraints of the rules in a knowledge base, Drools heavily relies on MVEL. MVEL, by default, uses an interpreter to evaluate the constraint expressions. This means that, most of the time, the evaluation of a constraint is not really happening at the Java bytecode level, which means that it is not as efficient as it could be.

Fortunately, Drools provides a way to compile an MVEL expression into Java bytecode. Because this compilation happens in runtime, it is usually referred as JIT (just-in-time) compilation. But of course, there is a catch. Otherwise, why does Drools not simply compile to bytecode all the constraints in a knowledge base? One of the major drawbacks of JIT compilation is that it could be memory-intensive and may create problems related to the permanent generation heap.

The way Drools deals with this drawback in JIT compilation is by using a threshold on the number of times a constraint is evaluated before it is compiled into bytecode. The assumption here is that, if a constraint is never evaluated, or is only evaluated a couple of times, it is more efficient to evaluate it in interpreted mode than to JIT-compile it. In Drools 6.3 the default threshold is 20 times.

There are two common ways in Drools to tune the default JIT threshold according to the specific needs of an application:

  • By using the drools.jittingThreshold system property when running our application—that is, -Ddrools.jittingThreshold=10 to reduce the threshold to 10 times
  • By setting the desired threshold value in the KieBaseConfiguration object used to create our KIE Bases:
    KieBaseConfiguration kbConfig = KieServices.Factory.get().newKieBaseConfiguration();
    kbCofig.setOption(ConstraintJittingThresholdOption.get(10);

The entire JIT compilation can be disabled by setting a negative threshold value.

Beta Nodes

The rules covered so far in this chapter were constrained to none or a single pattern. We know so far that the type of the pattern is converted by Drools into an Object Type Node and that each of the constraints they contain is translated into an Alpha Node. But what happens when a rule is composed of more than one pattern? The answer is simple: a Beta Node representing the join operation between each pair of patterns is created.

A Beta Node has two inputs and one or more outputs and it will wait until data is available in both inputs before moving to the next node/s in the network.

As an example, let's evaluate the PHREAK network generated by the following rule:

rule "Sample Rule 1"
when
    $p: Provider(rating > 50)
    $pr: ProviderRequest()
then
    channels["request-channel"].send($pr);
end

The rule is straightforward: for each Provider with a rating greater than 50 and the ProviderRequest fact, an activation of the rule will be placed in Drools Agenda. The corresponding PHREAK network for this rule will look like the following:

Beta Nodes

In the preceding PHREAK network, we can easily identify the two Object Type Nodes for our two patterns and the Alpha Node for the rating constraint. We can also see that the patterns in the network are joined by a Beta Node. Because there is no explicit relation between the Provider and the ProviderRequest in our sample rule, the Beta Node will create the Cartesian product of all the Provider and ProviderRequest facts it has in its inputs. Each of the generated tuples will be propagated to the next node. In this case, the Rule Terminal Node will determine that a new activation for the rule should be placed in the Agenda.

If we want to avoid the full Cartesian product of our facts, we can set a constraint on the provider attribute of the Provide rRequest pattern:

rule "Sample Rule 1"
when
    $p:  Provider(rating > 50)
    $pr: ProviderRequest(provider == $p)
then
    channels["request-channel"].send($pr);
end

Now, the rule is not interested in any ProviderRequest, but only in those who are related to the Provider matched in the previous pattern. When a constraint involves a variable that is bound to something that is external to the pattern where it is defined, the evaluation of the constraint doesn't happen inside an Alpha Node but instead inside the Beta Node performing the corresponding joining. In our sample, the PHREAK network will now look like the following:

Beta Nodes

The Beta Node in the preceding network will not create the full Cartesian product of the facts coming from its inputs this time. Only those tuples matching the constraint in the Beta Node will be forwarded to the next node in the network.

Beta Node sharing

It shouldn't come as a surprise at this point that Beta Nodes can also be shared among multiple rules in the PHREAK network. As an example, let's see how Drools models the PHREAK network for the following two rules:

rule "Sample Rule 1"
when
    $p:  Provider(rating > 50)
    $pr: ProviderRequest()
then
    channels["provider-channel"].send($pr);
end

rule "Sample Rule 2"
when
    $p:  Provider(rating > 50)
    $pr: ProviderRequest()
    $o:  Order()
then
    channels["order-channel"].send($o);
end

The first two patterns in the preceding rules are identical. We already know that the Alpha Node belonging to the constraint on the first pattern will be shared among the rules. What we didn't know yet was that the Beta Node for the join between the Provider and ProviderRequest patterns will also be shared:

Beta Node sharing

As we can see, the Beta Node at the top is shared among both rules.

It is important to note that, in order for a Beta Node to be shared, the order of the patterns in the rules matters. The concept is similar to the order of the constraints in a pattern when we were talking about Alpha Node sharing: if the order is not the same, Drools will not optimize it. For example, if we modify the second rule in our example by inverting the order of the first two patterns, the semantics of the rule doesn't change, but its underlying implementation in PHREAK does:

rule "Sample Rule 2"
when
    $pr: ProviderRequest()
    $p:  Provider(rating > 50)
    $o:  Order()
then
    channels["order-channel"].send($o);
end

The resulting PHREAK network in this case will look like the following:

Beta Node sharing

Because the order of the first two patterns in our rules is no longer the same, a new Beta Node now appears in the corresponding PHREAK network. But the order of the patterns is not the only thing that matters when it comes to Beta Node sharing: the nodes previous to a shared Beta Node must also be the same in order for Drools to optimize it. As an example, let's go back to our original rules, but let's change the rating constraint of the Provider in the second rule to 60:

rule "Sample Rule 2"
when
    $p:  Provider(rating > 60)
    $pr: ProviderRequest()
    $o:  Order()
then
    channels["order-channel"].send($o);
end

What we have basically done is to break the shared Alpha Node both rules had. This means that our Beta Nodes for the first two patterns have different inputs now:

Beta Node sharing

The preceding network now shows two different Alpha Nodes that result in two different Beta Nodes.

As we can see, Beta Node sharing is not particularly easy to achieve; when designing our rules, we must always try to keep the same order between patterns and constraints inside the patterns when they are reused among multiple rules. DSL, Templates, and Decision Tables are all good alternatives to ensure that this order is kept for large knowledge bases.

Or between patterns

A curious thing happens in the PHREAK network when the or conditional element is used between patterns. As a matter of fact, Drools doesn't really understand the or conditional element at all; what it does is to convert it into a semantically equivalent set of sub-rules inside the PHREAK network.

As an example, let's assume we have a rule to detect when a Customer that is not GOLD has a SuspiciousOperation or an Order bigger than $100,000. This rule can be written as follows:

rule "Sample Rule 1"
when
    $c: Customer(category != Category.GOLD)
    (
        Order(customer == $c, total > 10000) or
        SuspiciousOperation(customer == $c)
    )
then
    channels["suspicious-customer"].send($c);
end

Ideally, we should have used the conditional element exists to make sure we don't have multiple activations of this rule when multiple Orders or SuspiciousOperations for a customer exist. But, given that we haven't yet introduced the exists conditional element in this chapter, we will not use it.

The preceding rule will be converted by Drools into the following PHREAK network:

Or between patterns

The network above shows the two sub-rules generated by Drools, evidenced by the presence of a duplicated Rule Terminal Node for the single rule we had. What Drools has basically done is to split the original rule into the following two rules:

rule "Sample Rule 1.1"
when
    $c: Customer(category != Category.GOLD)
    Order(customer == $c, total > 10000)
then
    channels["suspicious-customer"].send($c);
end

rule "Sample Rule 1.2"
when
    $c: Customer(category != Category.GOLD)
    SuspiciousOperation(customer == $c)
then
    channels["suspicious-customer"].send($c);
end

Each sub-rule in the PHREAK network is now independent: both of them can be independently activated and fired. This explains why there is no such thing as a short-circuit between patterns in Drools.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.55.198