Fraud detection

It will be easier to explain these concepts using an example of a fraud detection system. Fraud in banking systems is becoming a major concern. The amount of online transactions are increasing every day. An automatic system for fraud detection is needed. The system should analyze various events happening in a bank, and based on a set of rules, raise an appropriate alarm.

This problem cannot be solved by the standard Drools rule engine. The volume of events is huge and they happen asynchronously. If we simply insert them into the knowledge session, we would soon run out of memory. While the Rete algorithm behind Drools doesn't have any theoretical limitation on the number of objects in a session, we could use the processing power more wisely. Drools Fusion is the right candidate for this kind of task.

Problem description

Let's consider the following set of business requirements for the fraud detection system:

  • If a notification is received from a customer about a stolen card, block this account and any withdrawals from this account.
  • Check each transaction against a blacklist of account numbers. If the transaction is transferring money from/to such an account, then flag this transaction as suspicious with the maximum severity.
  • If there are two large debit transactions from the same account within a 90-second period and each transaction is withdrawing more than 300 percent of the average monthly (30 days) withdrawal amount, flag these transactions as suspicious with minor severity.
  • If there is a sequence of three consecutive and increasing debit transactions originating from a same account within a 3-minute period and these transactions are together withdrawing more than 90 percent of the account's average balance over 30 days, then flag those transactions as suspicious with minor severity and suspend the account.
  • If the number of withdrawals over a day is 500 percent higher than the average number of withdrawals over a 30-day period and the account is left with less than 10 percent of the average balance over a month (30 days), then flag the account as suspicious with minor severity.
  • Perform a duplicate transactions check; if two transactions occur in a time window of 15 seconds that have the same source/destination account number, are of the same amount, and just differ in transaction ID, then flag those transactions as duplicates.

The following are the things that you need to monitor:

  • Monitor the average withdrawal amount over all accounts for 30 days
  • Monitor the average balance across all accounts

Design and modeling

Looking at the requirements we'll need a way of flagging a transaction as suspicious. This state can be added to the existing Transaction type or we can externalize this state to a new event type. We'll do the latter. The following new events will be defined:

  • TransactionCreatedEvent: This is an event that is triggered when a new transaction is created. It contains a transaction identifier, source account number, destination account number, and actual amount transferred.
  • TransactionCompletedEvent: This is an event that is triggered when an existing transaction has been processed. It contains the same fields as the TransactionCreatedEvent class.
  • AccountUpdatedEvent: This is an event triggered when an account has been updated. It contains the account number, current balance, and transaction identifier of a transaction that initiated this update.
  • SuspiciousAccount: This is an event triggered when there is some sort of suspicion around the account. It contains the account number and severity of the suspicion. The severity is an enumeration that can have two values: MINOR and MAJOR. This event's implementation is shown in the following code listing.
  • SuspiciousTransaction: Similar to SuspiciousAccount, this is an event that flags a transaction as suspicious. It contains a transaction identifier and severity level.
  • LostCardEvent: This is an event indicating that a card was lost. It contains an account number.

One of the SuspiciousAccount events described is as follows. It also defines the SuspiciousAccountSeverity enumeration that encapsulates various severity levels that the event can represent. The event will define two properties. One of them is already mentioned severity and the other one, accountNumber, will identify the account.

/**
 * marks an account as suspicious
 */
public class SuspiciousAccount implements Serializable {
  public enum SuspiciousAccountSeverity {
    MINOR, MAJOR
  }

  private final Long accountNumber;
  private final SuspiciousAccountSeverity severity;

  public SuspiciousAccount(Long accountNumber,
      SuspiciousAccountSeverity severity) {
    this.accountNumber = accountNumber;
    this.severity = severity;
  }

  private transient String toString;

  @Override
  public String toString() {
    if (toString == null) {
      toString = new ToStringBuilder(this).appendSuper(
          super.toString()).append("accountNumber",
          accountNumber).append("severity", severity)
          .toString();
    }
    return toString;
  }

Code listing 1: Implementation of the Suspic iousAccount event

Please note that the equals and hashCode methods in SuspiciousAccount from this code listing are not overridden. This is because an event represents an active entity, which means that each instance is unique. The toString method is implemented using org.apache.commons.lang.builder.ToStringBuilder. All these event classes are lightweight; they have no references to other domain classes (no object reference, only a number, accountNumber, is this case). They are also implementing the Serializable interface. This makes them easier to transfer between JVMs. As a best practice, this event is immutable; the two properties (accountNumber and severity) are marked as final. They can be set only through a constructor (there are no setter methods but only two getter methods). The getter methods were excluded from this code listing.

The events themselves don't carry a time of occurrence, a time stamp (they easily could if we needed it, and we'll see how in the next set of code listings). When the event is inserted into the knowledge session, the rule engine assigns such a time stamp to FactHandle that is returned (do you remember session.insert(..) returns a FactHandle?). In fact there is a special implementation of FactHandle called EventFactHandle. It extends DefaultFactHandle (which is used for normal facts) and adds a few additional fields; for example, startTimestamp and duration. Both contain millisecond values and are of type long.

We now have the event classes and we know that there is a special FactHandle for events. However, it is still unknown how to tell Drools that our class represents an event. Drools type declarations provide this missing link. As was explained in Chapter 2, Writing Basic Rules, type declarations can define new types; here we'll see how to enhance existing types. For example, to specify that the class TransactionCreatedEvent is an event, so we have to write:

declare TransactionCreatedEvent
  @role( event )
end

Code listing 2: Event role declaration (the cep.drl file)

This code can reside inside a normal .drl file. If our event had a time stamp property or a duration property, we could map it into startTimestamp or duration properties of EventFactHandle by using the following mapping:

  @duration( durationProperty )

Code listing 3: Duration property mapping

The name in brackets is the actual name of the property of our event that will be mapped to the duration property of EventFactHandle. This can be done similarly for the startTimestamp property.

Note

Since an event's state should not be changed (only unpopulated values can be populated), think twice before declaring existing beans as events. Modification to a property may result in an unpredictable behavior.

Fraud detection rules

Let's imagine that the system processes thousands of transactions at any given time. It is clear that this is challenging in terms of time and memory consumption. It is simply not possible to keep all data (transactions, accounts, and so on) in memory. A possible solution would be to keep all accounts in memory, since there won't be that many of them (in comparison to transactions) and keep only transactions for a certain period. With Drools Fusion we can do this very easily by declaring that a Transaction is an event.

The transaction will then be inserted into the knowledge session through a custom entry point. As the name suggests it is an entry point into the knowledge session. So far we were always inserting the facts into the session using the default entry point (when calling session.insert(..)) The advantage of using a custom entry point is that it applies only to a specific rule(s) (no other rules will see the event). This makes sense, especially if there are large quantities of data and only some rules are interested in them. We'll look at entry points in the following example.

If you are still concerned about the volume of objects in memory, this solution can be easily partitioned, for example, by account number. There might be more servers; each processing only a subset of accounts (a simple routing strategy might be accountNumber module totalNumberOfServersInCluster). Then each server would receive only appropriate events.

Notification

The requirement we're going to implement here is essentially to block an account whenever a LostCardEvent type is received. This rule will match on two facts. One of type Account and one of type LostCardEvent. The rule will then set the the status of this account to blocked. The implementation of the rule is as follows:

rule notification
  when
    $account : Account( status != Account.Status.BLOCKED )
    LostCardEvent( accountNumber == $account.number )
      from entry-point LostCardStream
  then
    modify($account) {
      setStatus(Account.Status.BLOCKED)
    };
end

Code listing 4: Notification rule that blocks an account (the cep.drl file)

As we already know, Account is an ordinary fact from the knowledge session. The second fact, LostCardEvent, is an event from an entry point called LostCardStream. Whenever a new event is created and goes through the custom entry point LostCardStream, this rule tries to match (checks if its conditions can be satisfied). If there is an Account fact in the knowledge session that didn't match with this event yet, and all conditions are met, the rule is activated. The consequence sets the status of the account to blocked in a modify block.

Since we're updating the account in the consequence and also matching on it in the condition, we have added a constraint that matches only on nonblocked accounts. This not only makes sense but also it is a way to prevent looping (see status != Account.Status.BLOCKED).

Test configuration setup

Following the best practice that every code/rule needs to be tested, we'll now set up a class for writing unit tests. All rules will be written in a file called cep.drl. When creating this file, just ensure it is on the classpath. The creation of the knowledgeBase object won't be shown. It is similar to the previous tests that we've written. We just need to change slightly the default knowledge base configuration:

    KnowledgeBaseConfiguration config = KnowledgeBaseFactory
        .newKnowledgeBaseConfiguration();
    config.setOption( EventProcessingOption.STREAM );

Code listing 5: Enabling event processing mode on knowledge base configuration

This will enable the event processing mode. The KnowledgeBaseConfiguration event is then used when creating the knowledge base KnowledgeBaseFactory.newKnowledgeBase(config).

Part of the setup is also clock initialization. We already know that every event has a time stamp. This time stamp comes from a clock, which is inside the knowledge session. Drools supports several clock types, for example, a real-time clock or a pseudo clock. The real-time clock is the default and should be used in normal circumstances. The pseudo clock is especially useful for testing, as we have complete control over the time. The initialize method shown in Code listing 6 sets up a pseudo clock. This is done by setting the clock type on a KnowledgeSessionConfiguration event and passing this object to the newStatefulKnowledgeSession method of knowledgeBase. The initialize method then makes this clock available as a test instance variable called clock when calling session.getSessionClock(), as we can see in the following code listing:

public class CepTest {
  static KnowledgeBase knowledgeBase;
  StatefulKnowledgeSession session;
  Account account;
  FactHandle accountHandle;
  SessionPseudoClock clock;
  TrackingAgendaEventListener trackingAgendaEventListener;
  WorkingMemoryEntryPoint entry;

  @Before
  public void initialize() throws Exception {
    KnowledgeSessionConfiguration conf =
      KnowledgeBaseFactory.newKnowledgeSessionConfiguration();
    conf.setOption( ClockTypeOption.get( "pseudo" ) );    
    session = knowledgeBase.newStatefulKnowledgeSession(conf,
        null);
    clock = (SessionPseudoClock) session.getSessionClock();

    trackingAgendaEventListener =
        new TrackingAgendaEventListener();
    session.addEventListener(trackingAgendaEventListener);

    account = new Account();
    account.setNumber(123456l);
    account.setBalance(BigDecimal.valueOf(1000.00));
    accountHandle = session.insert(account);

Code listing 6: Unit tests setup (the CepTest.java file)

The initialize method also creates an event listener and passes it into the session. The event listener is called TrackingAgendaEventListener. It simply tracks all rule executions. It is useful for unit testing to verify that a rule is fired or not. Its implementation is as follows:

public class TrackingAgendaEventListener extends
    DefaultAgendaEventListener {
  List<String> rulesFiredList = new ArrayList<String>();

  @Override
  public void afterActivationFired(
      AfterActivationFiredEvent event) {
    rulesFiredList.add(event.getActivation().getRule()
        .getName());
  }

  public boolean isRuleFired(String ruleName) {
    for (String firedRuleName : rulesFiredList) {
      if (firedRuleName.equals(ruleName)) {
        return true;
      }
    }
    return false;
  }

  public void reset() {
    rulesFiredList.clear();
  }
}

Code listing 7: Agenda Event listener that tracks all rules that have been fired

Note

Please note that the DefaultAgendaEventListener event comes from the package org.drools.event.rule that is part of the knowledge-api.jar file as opposed to coming from the package org.drools.event that is part of the old API in drools-core.jar.

All Drools agenda event listeners must implement the AgendaEventListener interface. Our TrackingAgendaEventListener interface extends the DefaultAgendaEventListener interface so that we don't have to implement all methods defined in the AgendaEventListener interface. Our listener just overrides the afterActivationFired method that will be called by Drools every time a rule's consequence has been executed. Our implementation of this method adds the fired rule name into a list of fired rules, rulesFiredList. Then the convenience method isRuleFired takes ruleName as a parameter and checks if this rule has been executed/fired. The reset method is useful for clearing out the state of this listener; for example, after the session.fireAllRules call.

Now, lets get back to the test configuration setup. The last part of the initialize method from Code Listing 6 is the account object creation (account = new Account(); ...). This is for convenience so that not every test has to create one. The account balance is set to 1000. The account is inserted into the knowledge session and its FactHandle is stored so that the account object can be easily updated.

Testing the notification rule

The test infrastructure is now fully set up and we can write a test for the notification rule from Code listing 4:

  @Test
  public void notification() throws Exception {
    session.fireAllRules();
    assertNotSame(Account.Status.BLOCKED,account.getStatus());

    entry = session
        .getWorkingMemoryEntryPoint("LostCardStream");
    entry.insert(new LostCardEvent(account.getNumber()));
    session.fireAllRules();
    assertSame(Account.Status.BLOCKED, account.getStatus());
  }

Code listing 8: Notification rule's unit test (the CepTest.java file)

The test verifies that the account is not blocked by accident first. Then it gets the LostCardStream entry point from the session by performing session.getWorkingMemoryEntryPoint("LostCardStream"). Then the code listing demonstrates how an event can be inserted into the knowledge session through an entry point, entry.insert(new LostCardEvent(...)).

Note

Note that in a real application you'll probably want to use Drools Pipeline for inserting events into the knowledge session. It can be easily connected to an existing enterprise service bus (ESB ) or a JMS topic or queue.

Drools entry points are thread-safe, meaning that each entry point can receive facts from a different thread. In this case it makes sense to start the engine in the fireUntilHalt mode in a separate thread like this:

    new Thread(new Runnable() {
      public void run() {
        session.fireUntilHalt();
      }
    }).start();

Code listing 9: Continuous execution of rules

The engine will then continuously execute activations until the session.halt() method is called.

The test then verifies that the status of the account is blocked. If we perform simply session.insert(new LostCardEvent(..)), the test would fail, because the rule wouldn't see the event.

Monitoring averageBalanceQuery

In this section we'll look at how to write some monitoring rules/queries over the data that is in the knowledge session. Let's say that we want to know what is the average balance across all accounts. Since all of them are in the knowledge session, we could use collect (that was introduced in the The collect conditional element section of Chapter 6, Working with Stateful Session) to collect all accounts into a collection and then iterate over this collection, sum all balances and then divide it by the number of accounts. Another more preferred solution is to use the neighbor of collect, accumulate. The following is a query that calculates the average balance across all accounts:

query averageBalanceQuery
  accumulate( Account($balance : balance);
    $averageBalance : average($balance) )  
end

Code listing 10: Query for calculating the average balance over all accounts (the cep.drl file)

Similar to collect, accumulate iterates over objects in the knowledge session that meet given criteria; however, in the case of accumulate, it performs some action on each individual object before returning the result or results. In our example, the action is average($balance). Finally, the result is returned as $averageBalance variable. The average balance is updated whenever there is a new account or an existing account is updated or retracted from the knowledge session. Similar to collect, you can think of it as a continuous query. Other useful functions that can be used within accumulate are: count, which is used for counting objects; min/max, which is used for finding the minimum/maximum value, sum, which is used for calculating the sum of all values, and others. Some of them will be shown in the following examples. We'll also define a new one.

Note

Note that the accumulate function can take any code block (written in the current dialect). This means that it is, for example, valid to write sum($account.getBalance().multiply($account.getInterestRate())).

Furthermore, you can specify constraints on the result of accumulate. For example, let's say that we want the averageBalance object to be at least 300. We could write accumulate( Account($balance : balance); $averageBalance : average($balance); $averageBalance > 300 ).

The accumulate element has many forms; we could have also written Number( $averageBalance : doubleValue ) from accumulate( Account($balance : balance), average($balance) ); however, the use of these other forms is discouraged.

Testing the averageBalanceQuery

The test for this averageBalanceQuery query is as follows. First, it will use the default setup, which includes one account in the knowledge session that has a balance of 1000. Then, it will add another account into the knowledge session and verify that the average balance is correct:

  @Test
  public void averageBalanceQuery() throws Exception {
    session.fireAllRules();
    assertEquals(account.getBalance(), getAverageBalance());

    Account account2 = new Account();
    account2.setBalance(BigDecimal.valueOf(1400));
    session.insert(account2);
    session.fireAllRules();
    assertEquals(BigDecimal.valueOf(1200.00),
        getAverageBalance());
  }

  BigDecimal getAverageBalance() {
    QueryResults queryResults = session
        .getQueryResults("averageBalanceQuery");
    return BigDecimal.valueOf((Double) queryResults
        .iterator().next().get("$averageBalance"));
  }

Code listing 11: Test for the averageBalanceQuery method (the CepTest.java file)

The getAverageBalance method gets the query results and extracts the $averageBalance variable.

Two large withdrawals

We'll now look at the next requirement. A rule that will flag two transactions as suspicious if they are withdrawing more than 300 percent of the average withdrawn amount over 30 days. The problem is how to find out the average withdrawn amount for an account over 30 days. This is when sliding time windows or sliding length windows come in handy. They allow us to match only those events that originated within the window. In the case of time windows the session clock's time minus the event's time stamp must be within the window time. In the case of length windows only the n most recent events are taken into account. Time/Length windows also have another very important reason. They allow Drools to automatically retract events that are no longer needed—those that are outside of the window. This applies to events that were inserted into the knowledge session through a custom entry point.

The average withdrawn amount can be calculated by averaging the amounts of TransactionCompletedEvents. We are only interested in transactions that have already been successfully completed. We can now match only those transactions that happened within the last 30 days: over window:time( 30d ) from entry-point TransactionStream. If we, for example, wanted the 10 most recent events, we'd write over window:length( 10 ) from entry-point TransactionStream.

We know how to calculate the average withdrawn amount. All that remains is to find two transactions happening over 90 seconds that are withdrawing 300 percent or more. TransactionCreatedEvent can be used to find those transactions. The implementation is as follows:

rule twoLargeWithdrawals
dialect "mvel"
  when
    $account : Account( )
    accumulate( TransactionCompletedEvent( fromAccountNumber
      == $account.number, $amount : amount ) over
      window:time( 30d ) from entry-point TransactionStream,
      $averageAmount : average( $amount ) )
    $t1 : TransactionCreatedEvent( fromAccountNumber ==
      $account.number, amount > $averageAmount * 3.00 ) over
      window:time(90s) from entry-point TransactionStream
    $t2 : TransactionCreatedEvent( this != $t1,
      fromAccountNumber == $account.number,
      amount > $averageAmount * 3.00 ) over
      window:time(90s) from entry-point TransactionStream
  then
    insert(new SuspiciousAccount($account.number,
      SuspiciousAccountSeverity.MINOR));
    insert(new SuspiciousTransaction($t1.transactionUuid,
      SuspiciousTransactionSeverity.MINOR));
    insert(new SuspiciousTransaction($t2.transactionUuid,
      SuspiciousTransactionSeverity.MINOR));
end

Code listing 12: Implementation of the twoLargeWithdrawals rule (the cep.drl file)

The rule is matching on an Account object, calculating the $averageAmount variable for this account and finally matching on two different TransactionCreatedEvents (we make sure that they are different by performing != $t1). These events represent transactions from this account, which have an amount 300 percent larger than the $averageAmount variable; this is enforced with this constraint: amount > $averageAmount * 3.00. These events must occur in a time window of 90 seconds as can be seen: over window:time(90s) from entry-point TransactionStream. The consequence then inserts three new events into the knowledge session. They flag the account and transactions as suspicious with minor severity.

As you may have noticed, in this rule we've used one stream, TransactionStream, for getting two types of events. This is completely valid. If the performance is your primary concern, ideally every event should have its own stream. That way no rule will see events that it is not interested in.

Note

Note that if using a real-time clock, think twice about the length of the time window. Under a heavy load, the CPU might be so busy that the event won't be processed in the expected time window (the event's startTimestamp may not be accurate). In that case the sliding length window makes more sense.

Testing the twoLargeWithdrawals rule

As usual, our unit test will exercise some of the corner cases where the rule is most likely to break. It will follow the sequence of events presented in the following timeline figure:

Testing the twoLargeWithdrawals rule

Figure 1: Time diagram – sequence of events

Each event is represented by an arrow pointing down. At the base of the arrow is the amount that is being withdrawn. The first two events are of type TransactionCompletedEvent and their task is to build the average amount that was withdrawn. The average will be 500. The following events are of type TransactionCreatedEvent and they are the ones we want to keep an eye on. The first two of them meet the time constraint of 90 seconds, but the first isn't three times greater than the average. Therefore, our rule won't be activated. The next event comes after 91 seconds, which doesn't meet the time window constraint. Finally, the last two events meet all constraints, and we can verify that the rule fired and that the account and transactions were marked as suspicious. The test implementation is as follows:

  @Test
  public void twoLargeWithdrawals() throws Exception {
    entry = session
        .getWorkingMemoryEntryPoint("TransactionStream");
    transactionCompletedEvent(400);
    clock.advanceTime(5, TimeUnit.DAYS);
    transactionCompletedEvent(600);
    clock.advanceTime(11, TimeUnit.DAYS);

    transactionCreatedEvent(100);
    clock.advanceTime(30, TimeUnit.SECONDS);
    transactionCreatedEvent(1600);
    assertNotFired("twoLargeWithdrawals");

    clock.advanceTime(91, TimeUnit.SECONDS);
    transactionCreatedEvent(2100);
    assertNotFired("twoLargeWithdrawals");

    clock.advanceTime(30, TimeUnit.SECONDS);
    transactionCreatedEvent(1700);
    assertFired("twoLargeWithdrawals");
  }

Code listing 13: Test for the twoLargeWithdrawals rule (the CepTest.java file)

For brevity, commonly used code snippets have been refactored into helper methods. For example, the creation of TransactionCompletedEvent and its insertion into the session has been refactored into the transactionCompletedEvent method as shown in the following code listing:

  private void transactionCompletedEvent(
      double amountTransferred) {
    entry.insert(new TransactionCompletedEvent(BigDecimal
        .valueOf(amountTransferred), account.getNumber()));
  }

Code listing 14: Helper method that creates TransactionCompletedEvent and inserts it into the knowledge session (the CepTest.java file)

The event is initialized with the transferred amount and source account number, as you may imagine the transactionCreatedEvent method from Code Listing 13 is similar.

Another helper method, assertFired, takes a rule name as an argument, fires a rule that matches this name, and verifies that the rule fired using trackingAgendaEventListener:

  private void assertFired(String ruleName) {
    session.fireAllRules(new RuleNameEqualsAgendaFilter(
        ruleName));
    assertTrue(trackingAgendaEventListener
        .isRuleFired(ruleName));
  }

Code listing 15: Helper method for verifying that a rule with specified name has fired (the CepTest.java file)

The agenda filter RuleNameEqualsAgendaFilter was already used in Chapter 4, Transforming Data. Do not use the deprecated org.drools.base.RuleNameEqualsAgendaFilter, otherwise you'll get compilation errors. The logic is the same; however, the deprecated agenda filter doesn't use the new API.

As you may imagine the assertNotFired method is similar to the assertFired method. If we now run the twoLargeWithdrawals test, everything should pass as expected.

Sequence of increasing withdrawals

We'll now focus on the next requirement from the list. Among other things, it talks about an account's average balance over 30 days. We shouldn't have any problem calculating this. Thinking about the implementation of the rule, it seems that more rules are calculating these averages. We should be able to separate this logic into another rule that will calculate this information and store it into some common data structure. Other rules will just match on this data structure and use the calculated averages. We've a plan. Now, let's define this data structure. Drools type declarations can be used for this purpose. The declaration may look as follows:

declare AccountInfo
  number : Long
  averageBalance : BigDecimal
  averageAmount : BigDecimal
end

Code listing 16: The AccountInfo type declaration (the cep.drl file)

Please note that in this use of the declare keyword, we're not modifying existing type (as was the case in Code listing 2) but adding a completely new one.

The common data structure is there; we can write the rule that will populate it. Our rule will match on an Account object, calculate its average balance over 30 days, and will set this calculated amount into the AccountInfo object:

rule averageBalanceOver30Days
no-loop true
  when
    $account : Account( )
    accumulate( AccountUpdatedEvent( accountNumber ==
      $account.number, $balance : balance ) over
      window:time( 30d ) from entry-point AccountStream,
      $averageBalance : average($balance) )
    $accountInfo : AccountInfo( number == $account.number )
  then
    modify($accountInfo) {
      setAverageBalance($averageBalance)
    };
end

Code listing 17: Rule that calculates the average balance for an account over 30 days (the cep.drl file)

The averageBalanceOver30Days rule accumulates AccocuntUpdateEvents in order to calculate the average balance over 30 days. Finally, the consequence sets calculated $averageBalance into the $accountInfo variable. Note the use of the no-loop rule attribute. It is needed, otherwise the rule will loop, because the AccountInfo fact is being matched in condition as well as being modified in the consequence. Later, we'll see how to eliminate the no-loop attribute.

Average balance test

The AccountInfo object needs to be added into the knowledge session before the averageBalanceOver30Days rule can be activated. Since it is an internal type, we cannot simply make a new instance of this class (for example, to call new AccountInfo()). This type will only be created at runtime, when the knowledge package is compiled. The Drools team thought about this and they have added a method to the KnowledgeBase called getFactType, which returns an object implementing the org.drools.definition.type.FactType interface. This interface encapsulates the type information about an internal type. It allows us to create new instances, get a list of fields, set/get their properties, even get a map of field-value pairs, and set the values from such a map.

The AccountInfo bean may be used by many rules, so we'll add it into our unit test initialize method that is called before every test method execution. First, let's add types to our test class that will be needed:

  FactType accountInfoFactType;
  Object accountInfo;
  FactHandle accountInfoHandle;

Code listing 18: CepTest unit test class properties (the CepTest.java file)

Now, the following AccountInfo setup logic can be added at the end of the initialize method. The following code listing will demonstrate how a new instance of an internal type can be created and its properties can be set:

    accountInfoFactType = knowledgeBase.getFactType(
      "droolsbook.cep", "AccountInfo");
    accountInfo = accountInfoFactType.newInstance();
    accountInfoFactType.set(accountInfo, "number",
      account.getNumber());
    accountInfoFactType.set(accountInfo, "averageBalance",
      BigDecimal.ZERO);
    accountInfoFactType.set(accountInfo, "averageAmount",
      BigDecimal.ZERO);    
    accountInfoHandle = session.insert(accountInfo);

Code listing 19: The AccountInfo internal type setup (the CepTest.java file)

The first line gets the fact type from the knowledge session. The getFactType method takes the .drl file package name and the name of the fact type. Then a new accountInfoFactType.newInstance() instance is created. The accountInfoFactType object is then used to set properties on the accountInfo instance. Finally, accountInfo is inserted into the session and its fact handle is kept.

Similarly, initialization code of AccountInfo might be needed in a real application. When the application starts up, AccountInfo should be preinitialized with some reasonable data.

The unit test of the averagebalanceover30days method is as follows, which will create some AccountUpdatedEvents and verify that they are used to calculate the correct average balance:

  @Test
  public void averageBalanceOver30Days() throws Exception {
    entry = session
        .getWorkingMemoryEntryPoint("AccountStream");

    accountUpdatedEvent(account.getNumber(), 1000.50,1000.50);
    accountUpdatedEvent(account.getNumber(), -700.40, 300.10);
    accountUpdatedEvent(account.getNumber(), 500, 800);
    accountUpdatedEvent(11223344l, 700, 1300);

    assertFired("averageBalanceOver30Days");
    assertEquals(BigDecimal.valueOf(700.20).setScale(2),
       accountInfoFactType.get(accountInfo,"averageBalance"));
  }

Code listing 20: Unit test for the averageBalanceOver30Days rule (the CepTest.java file)

The test first obtains the AccountStream entry point for inserting the events. It uses the accountUpdateEvent helper method to create AccountUpdatedEvents. This method takes the account number, amount transferred, and balance. These parameters are passed directly into the event's constructor as was the case in the previous unit test. The test also creates one unrelated AccountUpdatedEvent to verify that it won't be included in the calculation. Finally, the test verifies that the rule has been fired and the average is of the expected value 700.20 ((1000.50 + 300.10 + 800)/3 = 2100.60 / 3 = 700.20).

However, when we run the test, it fails as soon as it gets to creating the knowledge base with this error:

java.lang.RuntimeException: Rule Compilation error : [Rule name='averageBalanceOver30Days']
… The method setAverageBalance(BigDecimal) in the type AccountInfo is not applicable for the arguments (Number)

Drools is informing us that there is some problem with the rule's consequence. We're trying to set $averageBalance, which is of type Number (in fact java.lang.Double) into a property that is of type BigDecimal. It seems that the average function does not return BigDecimal as we'd like. Luckily, Drools is open source, so we can look under the hood. As was mentioned in the previous sections, the accumulate element supports pluggable functions (average, sum, count, and so on). These functions are implementations of the org.drools.runtime.rule.AccumulateFunction interface.

If we look at the average function's implementation in class AverageAccumulateFunction, we'll notice that its state consists of two fields: count of type int and total of type double. Here lies the problem. Our domain model uses BigDecimal as a best practice when working with floating point numbers; however, average casts all numbers to primitive doubles. We will now write our own implementation of AccumulateFunction that knows how to work with BigDecimal. This function will be called bigDecimalAverage and will be used as follows (note the last line):

    accumulate( AccountUpdatedEvent( accountNumber ==
      $account.number, $balance : balance ) over
      window:time( 30d ) from entry-point AccountStream,
      $averageBalance : bigDecimalAverage($balance) )

Code listing 21: Part of the averageBalanceOver30Days rule that calculates the average balance using the new bigDecimalAverage accumulate function (the cep.drl file)

The knowledge base setup needs to be modified so that Drools knows about our new accumulate function implementation. A new KnowledgeBuilderConfiguration object will hold this information:

KnowledgeBuilderConfiguration builderConf =
  KnowledgeBuilderFactory.newKnowledgeBuilderConfiguration();
builderConf.setOption(AccumulateFunctionOption.get(
  "bigDecimalAverage",
  new BigDecimalAverageAccumulateFunction()));

Code listing 22: Section of unit test's setupClass method (the CepTest.java file)

An AccumulateFunctionOption is set with the new accumulate function, BigDecimalAverageAccumulateFunction, on the knowledge builder configuration. This configuration can be passed to the KnowledgeBuilderFactory.newKnowledgeBuilder(builderConf) factory method that is used to create the knowledge base.

Let's move to the implementation of the accumulate function. We'll first need some value holder for the count and total fields. This value holder will encapsulate all information that the accumulate function invocation needs. The function itself must be stateless:

  /**
   * value holder that stores the total amount and how many
   * numbers were aggregated
   */
  public static class AverageData implements Externalizable {
    public int count = 0;
    public BigDecimal total = BigDecimal.ZERO;

    public void readExternal(ObjectInput in)
        throws IOException, ClassNotFoundException {
      count = in.readInt();
      total = (BigDecimal) in.readObject();
    }

    public void writeExternal(ObjectOutput out)
        throws IOException {
      out.writeInt(count);
      out.writeObject(total);
    }

  }

Code listing 23: The AverageData value holder (the BigDecimalAverageAccumulateFunction.java file)

Note that the AverageData holder is a static member class of BigDecimalAverageAccumulateFunction. The value holder implements the Externalizable interface so that it can be serialized. Finally, the implementation of BigDecimalAverageAccumulateFunction that will define the behavior of our custom function:

public class BigDecimalAverageAccumulateFunction implements
    AccumulateFunction {

  /**
   * creates and returns a context object
   */
  public Serializable createContext() {
    return new AverageData();
  }

  /**
   * initializes this accumulator
   */
  public void init(Serializable context) throws Exception {
    AverageData data = (AverageData) context;
    data.count = 0;
    data.total = BigDecimal.ZERO;
  }

  /**
   * @return true if this accumulator supports reverse
   */
  public boolean supportsReverse() {
    return true;
  }

  /**
   * accumulate the given value, increases count
   */
  public void accumulate(Serializable context, Object value) {
    AverageData data = (AverageData) context;
    data.count++;
    data.total = data.total.add((BigDecimal) value);
  }

  /**
   * retracts accumulated amount, decreases count
   */
  public void reverse(Serializable context, Object value)
      throws Exception {
    AverageData data = (AverageData) context;
    data.count++;
    data.total = data.total.subtract((BigDecimal) value);
  }

  /**
   * @return currently calculated value
   */
  public Object getResult(Serializable context)
      throws Exception {
    AverageData data = (AverageData) context;
    return data.count == 0 ? BigDecimal.ZERO : data.total
        .divide(BigDecimal.valueOf(data.count),
            RoundingMode.HALF_UP);
  }

Code listing 24: Custom accumulate function – BigDecimalAverageAccumulateFunction

The createContext method (at the beginning of Code listing 24) creates a new instance of the AverageData value holder. The init method initializes the accumulate function. supportsReverse informs the rule engine whether this accumulate function supports the retracting of objects (when a fact is being removed from the knowledge session, session.retract(..), or an existing fact, session.update(..)-, is modified). If it doesn't, the rule engine will have to do more work, and if an object is being retracted, the calculation will have to start over. The accumulate/reverse methods are there to execute/reverse the accumulate action (in this case, the calculation of count and total). The getResult method calculates the result. Our implementation uses a hardcoded rounding mode of type HALF_UP. This can be easily customized if needed.

Most, if not all, Drools pluggable components implement the Externalizable interface. This is also the case with the AccumulateFunction interface. We have to implement the two methods that this interface defines. As BigDecimalAverageAccumulateFunction is stateless, its readExternal and writeExternal methods are empty (they are not shown in the code listing).

If we now run the test for the averageBalanceOver30Days rule, it now fails with a different error message telling us that java.lang.Object cannot be casted to BigDecimal. This can be simply fixed by adding a cast into the consequence, setAverageBalance((BigDecimal)$averageBalance). The test should now pass without any errors. Note that instead of defining a custom accumulate function, we could have used a different form of the accumulate construct that uses inline custom code. However, this is not recommended. Please look into the Drools documentation for more information.

Looping prevention – property reactive

As promised, when we're looking at Code listing 17, we'll look at how to drop the no-loop attribute. We'll see another alternative option for how to prevent looping.

Facts can be annotated with the @propertyReactive annotation. For example:

declare AccountInfo
  @propertyReactive
  number : Long
  averageBalance : BigDecimal
  averageAmount : BigDecimal
end

Code listing 25: The AccountInfo property reactive type declaration (the cep.drl file)

What this does is it affects the behavior of the modify call. Normally, when we call modify with a fact, Drools will simply reevaluate every condition that triggers on this fact. However, facts that are annotated with the @propertyReactive annotation behave differently. When such a fact is modified, Drools analyzes which properties have been modified (note that it only works with a modify block) and then reevaluates only those conditions that trigger on these properties. For example, the AccountInfo( number == $account.number ) condition would only be reevaluated if we modify the number property of AccountInfo and not the averageBalance property as is done in the rule from Code listing 17. This means that by adding the @propertyReactive annotation to the AccountInfo fact, we can now drop the no-loop attribute from the averageBalanceOver30Days rule.

You can also explicitly specify for each rule condition on which properties it should trigger on. For example, AccountInfo( number == $account.number ) @watch( averageAmount, !averageBalance ). Here we're telling the condition to trigger on the number property (implicitly) and the averageAmount property (explicitly) and also not to trigger on the averageBalance property. See the documentation for all the possibilities.

Caveat: if you look at the rule in Code listing 17, please note that the Account fact has no constraints (nothing is inside the brackets); this means that if we declare it with the @propertyReactive annotation, it wouldn't trigger on any property modifications. We'd have to explicitly tell it to trigger at least on the number property; in other words, $account : Account( ) @watch( number ). Currently, Drools does not automatically recognize that another condition within the same rule is using a property of the Account fact (like in this case: AccountInfo( number == $account.number )).

After a little side trip, we can now continue with writing the rule sequenceOfIncreasingWithdrawals. To refresh our memory; it is about three consecutive increasing debit transactions. With the arsenal of Drools keywords, we've learned so far that it should be no problem to implement this rule. To make it more interesting, we'll use temporal operators. The temporal operators (after and before) are a special type of operator that know how to work with events (their time stamp and duration properties). In our case we'll simply match on three transactions that happened one after another (with no transactions in between):

rule sequenceOfIncreasingWithdrawals
  when
    $account:Account($number : number)
    $t1:TransactionCreatedEvent(fromAccountNumber == $number)
      from entry-point TransactionStream    
    $t2:TransactionCreatedEvent(amount > $t1.amount,
      fromAccountNumber == $number, this after[0, 3m] $t1)
      from entry-point TransactionStream
    not (TransactionCreatedEvent(fromAccountNumber == $number,
      this after $t1, this before $t2 )
      from entry-point TransactionStream)
    $t3:TransactionCreatedEvent(amount > $t2.amount,
      fromAccountNumber == $number, this after[0, 3m] $t2 )
      from entry-point TransactionStream
    not (TransactionCreatedEvent(fromAccountNumber == $number,
      this after $t2, this before $t3 )
      from entry-point TransactionStream)
    AccountInfo(number == $number, $t1.amount + $t2.amount
      + $t3.amount > averageBalance * BigDecimal.valueOf(0.9))
  then
    insert(new SuspiciousAccount($number,
      SuspiciousAccountSeverity.MAJOR));
    insert(new SuspiciousTransaction($t1.transactionUuid,
      SuspiciousTransactionSeverity.MAJOR));
    insert(new SuspiciousTransaction($t2.transactionUuid,
      SuspiciousTransactionSeverity.MAJOR));
    insert(new SuspiciousTransaction($t3.transactionUuid,
      SuspiciousTransactionSeverity.MAJOR));
end

Code listing 26: Implementation of the sequenceOfIncreasingWithdrawals rule (the cep.drl file)

For example, $t2, in the code we have just seen, is TransactionCreatedEvent that is withdrawing more than $t1; they are from the same account and temporal operator after; (this after[0, 3m] $t1) ensures that event $t2 occurred after event $t1, but within 3 minutes. The next line, not (TransactionCreatedEvent( this after $t1, this before $t2 ) from ... ), is making sure that no event occurred between events $t1 and $t2.

Note

Please note that instead of using sliding time windows to check that two events happened within 3 minutes (over window:time(3m)), we're using temporal operators (this after[0, 3m] $t1). They are much cheaper in terms of used resources.

Operators in Drools are pluggable. This means that the temporal operators we've just seen are simply one of many implementations of the org.drools.runtime.rule.EvaluatorDefinition interface. Others are, for example, soundslike, matches, coincides, meets, metby, overlaps, overlappedby, during, includes, starts, startedby, finishes, or finishedby. Please see Appendix B, Creating Custom Operators, on how to define a custom operator.

As we've seen, operators support parameters that can be specified within the square brackets. Each operator can interpret these parameters differently. It may also depend on an event's time stamp and duration (events we've used in our examples are so-called point in time events and they don't have any duration). For example, this before[1m30s, 2m] $event2 means that the time when this event finished and $event2 started is between 1 minute, 30 seconds and 2 minutes. Please consult the documentation on Drools Fusion for more details on each operator.

The last line of the sequenceOfIncreasingWithdrawals rule's condition tests whether the three matched transactions are withdrawing more than 90 percent of the average balance. The rule's consequence marks these transactions and account as suspicious.

Testing the sequenceOfIncreasingWithdrawals rule

The unit test for the sequenceOfIncreasingWithdrawals rule will follow this sequence of events:

Testing the sequenceOfIncreasingWithdrawals rule

Figure 2: Time diagram – sequence of events

We're using averageBalance preinitialized to 1000. All withdrawals fit into the time window of 3 minutes. The first three withdrawals are not increasing and their sum is not over 90 percent of the average balance. The first, third, and fourth events meet all constraints; they are increasing over 90 percent except one, and they are not consecutive. The second, third, and fourth events are not over 90 percent. Finally, the last three events meet all constraints and the rule should fire. The test method implementation is as follows:

  @Test
  public void sequenceOfIncreasingWithdrawals()
      throws Exception {
    entry = session
        .getWorkingMemoryEntryPoint("TransactionStream");    
    accountInfoFactType.set(accountInfo, "averageBalance",
        BigDecimal.valueOf(1000));
    session.update(accountInfoHandle, accountInfo);

    transactionCreatedEvent(290);
    clock.advanceTime(10, TimeUnit.SECONDS);
    transactionCreatedEvent(50);
    clock.advanceTime(10, TimeUnit.SECONDS);
    transactionCreatedEvent(300);
    assertNotFired("sequenceOfIncreasingWithdrawals");

    clock.advanceTime(10, TimeUnit.SECONDS);
    transactionCreatedEvent(350);
    assertNotFired("sequenceOfIncreasingWithdrawals");

    clock.advanceTime(10, TimeUnit.SECONDS);
    transactionCreatedEvent(400);
    clock.advanceTime(1, TimeUnit.MICROSECONDS);
    assertFired("sequenceOfIncreasingWithdrawals");
  }

Code listing 27: Unit test for the sequenceOfIncreasingWithdrawals rule (the CepTest .java file)

At the beginning of the test averageBalance, the property of AccountInfo is set to 1000. The knowledge session is updated. The test executes successfully.

High activity

The next rule should catch fraudulent activities involving lots of small transactions, especially when the number of transactions over a day is more than 500 percent of the average number of transactions and the account's balance is less than 10 percent of the average balance. Let's pretend that the AccountInfo object has all averages that we need to be calculated and ready to be used in other rules. We'll be able to use just the AccountInfo object to see if the conditions are met for an Account object:

rule highActivity
  when
    $account : Account( )
    $accountInfo : AccountInfo( number == $account.number,
      numberOfTransactions1Day > averageNumberOfTransactions *
      BigDecimal.valueOf(5.0), $account.balance <
      averageBalance * BigDecimal.valueOf(0.1) )
  then
    insert(new SuspiciousAccount($account.getNumber(),
      SuspiciousAccountSeverity.MINOR));
end

Code listing 28: Implementation of the highActivity rule (the cep.drl file)

The rule looks simple, thanks to decomposition! It will fire if the number of transactions per one day is greater than 500 percent of the average number of transactions per one day over 30 days (numberOfTransactions1Day > averageNumberOfTransactions*500%) and if 10 percent of the average balance over 30 days is greater than the account's balance (averageBalance*10% > account's balance).

Testing the highActivity rule

The test for the highActivity rule is divided into four parts. The first one tests cases with a low number of transactions and a low average balance. The second part tests cases with a low number of transactions, the third part tests cases with a low average balance, and the fourth part tests the successful execution of the rule. The account's balance is set to 1000 by the initialize method. averageNumberOfTransactions of AccountInfo is set to 10. That means for a successful rule execution, averageBalance of accountInfo needs to be over 10,000 and numberOfTransactions1Day needs to be over 50.

  @Test
  public void highActivity() throws Exception {
    accountInfoFactType.set(accountInfo,
        "averageNumberOfTransactions",BigDecimal.valueOf(10));
    accountInfoFactType.set(accountInfo,
        "numberOfTransactions1Day", 40l);
    accountInfoFactType.set(accountInfo, "averageBalance",
        BigDecimal.valueOf(9000));
    session.update(accountInfoHandle, accountInfo);
    assertNotFired("highActivity");

    accountInfoFactType.set(accountInfo, "averageBalance",
        BigDecimal.valueOf(11000));
    session.update(accountInfoHandle, accountInfo);
    assertNotFired("highActivity");

    accountInfoFactType.set(accountInfo,
        "numberOfTransactions1Day", 60l);
    accountInfoFactType.set(accountInfo, "averageBalance",
        BigDecimal.valueOf(6000));
    session.update(accountInfoHandle, accountInfo);
    assertNotFired("highActivity");

    accountInfoFactType.set(accountInfo, "averageBalance",
        BigDecimal.valueOf(11000));
    session.update(accountInfoHandle, accountInfo);
    assertFired("highActivity");
  }

Code listing 29: Unit test for the highActivity rule (the CepTest.java file)

While implementing the rules in this chapter, we've seen how to store some intermediate calculation results into an internal type; in our case it is AccountInfo. This is a common practice. Another alternative would be to extract this calculation into a query and then call this query within our rule. For example, imagine that the AccountInfo object is missing the numberOfTransactions1Day value and we need to calculate it. The query might look like this:

query numberOfTransactions1DayQuery(Long accountNumber,
    Number sum)
  accumulate( TransactionCompletedEvent( accountNumber :=
    fromAccountNumber ) over window:time(1d) from entry-point
    TransactionStream; sum : count(1) )
end

Code listing 30: Query that calculates the number of transactions per one day (the cep.drl file)

The query has two arguments, and an account number and a sum. It accumulates all TransactionCompletedEvents that are debiting the given account and happened within the last day. They are simply counted up, and this count is returned as the sum argument. Please note the use of the := symbol. It is a called the unification symbol; it is a special symbol that allows us to use the accountNumber object as input as well as an output argument, depending on how we'll call this query. If we call this query while passing in the account number, this would be the same as writing fromAccountNumber == accountNumber; however, if we call this query without passing in the accountNumber object, this would be the same as writing accountNumber : fromAccountNumber. Note that in our case it does not make sense to use the account number as an output argument, as we're using it in a place where it is scoped to the accumulate construct. It is just for illustration. Using this query, our rule would then become:

...
$account : Account( $accountNumber : number )
numberOfTransactions1DayQuery( $accountNumber := accountNumber, $numberOfTransactions1Day := sum )
$accountInfo : AccountInfo( number == $accountNumber,
    $numberOfTransactions1Day > averageNumberOfTransactions *
    BigDecimal.valueOf(5.0), ...

Code listing 31: Excerpt of the highActivity rule using a query (the cep.drl file)

As you can see, the numberOfTransactions1DayQuery object is being called with $accountNumber as an input parameter and $numberOfTransactions1Day as an output parameter (it is an output parameter because it has not been defined yet). In both cases the unification symbol is being used as it is necessary here. It is not possible to use field constraints when calling a query.

Queries are also very useful at removing duplication from rules. If two or more rules share common conditions, they can be sometimes refactored into a common query and the rules can then call this query.

This concludes rule implementations for the fraud detection system. We haven't implemented all rules specified in the requirements section, but they shouldn't be hard to do. I am sure that you can now implement a lot more sophisticated rules.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.236.174