Chapter 11: Identifying Anomalies Using the Relative Size Factor Test

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 11

Identifying Anomalies Using the Relative Size Factor Test

The previous chapter introduced tests to identify abnormally large subsets and subsets that had experienced explosive growth. The tests concluded with a test of the dollar totals for all the days in a year. The focus in the previous chapter was on size. In this chapter we compare large amounts to a benchmark to see how large they are relative to some norm, hence the name the relative size factor test. The relative size factor test is a powerful test for detecting errors. The test identifies subsets where the largest amount is out of line with the other amounts for that subset. This difference could be because the largest record either (a) actually belongs to another subset, or (b) belongs to the subset in question, but the numeric amount is incorrectly recorded.

The relative size factor (RSF) test is an important error-detecting test. An airline auditor reported at an IATA conference that his airline had found errors that amounted to around $1 million as a result of running this test on their accounts payable data. This test was developed in the mid-1990s after I learned of a case where a company in Cleveland wired $600,000 in error to the bank account of a charity. The $600,000 was supposed to have gone to a vendor. Once the “wrong bank account” error was discovered the company contacted the charity, which claimed that the money had already been spent and was largely unrecoverable. The $600,000 was significantly more than any amount that had been donated to the charity before. Had the company run a reasonableness test before authorizing the transfer, it would have seen that the $600,000 was much more than the $3,000 it donated to the charity every quarter.

The RSF test identifies subsets where the largest amount is significantly larger than the other items in the subset. The largest amount would be significantly larger if the numbers were (say) $8,000, $200, $200, and $150. The RSF calculation uses the ratio of the largest amount ($8,000) to the second largest amount ($200) to give a ratio of 40. The RSF formula is shown in Equation (11.1).

(11.1)

The formula in Equation (11.1) divides the largest amount by the second largest amount to give a ratio that is greater than or equal to 1.00. The output table usually includes some additional information to help the investigator to identify suspicious or questionable transactions. The additional information could include (a) the subset name or number, (b) the largest amount for the subset, (c) the second largest amount for the subset, (d) the record count for the subset, and (e) the relative size factor. The results for the InvoicesPaid data table are shown in Figure 11.1.

Figure 11.1 The Relative Size Factors of the InvoicesPaid Data

Relative Size Factor Test Findings

This forensic analytics test has most often been run using the largest and second largest numbers in the various subsets. Forensic investigators can adapt this formula to bring attention to the outliers, depending on what is seen to be an outlier. Examples of adaptations include (a) the largest amount divided by the average amount, (b) the largest divided by the average where the average excludes the largest number, and (c) the smallest number divided by the average (which is used when looking for understatements). Some notes and findings from the use of the RSF test are outlined here:

A frequent finding is the detection of the decimal point error in accounts payable data. This happens when an amount such as $3200.00 is entered into the system as $320000 (the decimal point is omitted) and the vendor is paid 100 times the actual invoice amount. One telltale sign of this error occurring is an amount paid that has no cents to the right of the decimal point. If $421.69 is entered as $42,169 then the amount paid has no cents. This error might only occur once in every 5,000 transactions but in a data set of 200,000 records there would be a good return from identifying 40 such errors. This error is reasonably easy to detect and the InvoicesPaid data shows some likely candidates. A partial decimal point error could occur if $421.69 is paid as $4216.90 and the telltale sign here is a RSF of 10 for the vendor.
Large RSFs are more indicative of error when the subset has many records. The more records in the subset, the more the largest amount stands out from a large crowd. The general rule is that the larger the crowd, the more suspicious the RSF.
This test showed some valuable findings in a forensic audit of purchasing card transactions in Washington, DC, where the merchant was the subset variable and the numeric field was the amount charged. Investigators found interesting and suspicious items, and possible errors where the RSFs were equal to 10, 5, 4, 3, 2, and 1. An RSF of 1.00 occurs when the largest and second largest items are equal.
An investigation of perpetual inventory records in Toronto, Ontario, using the extended inventory value (cost times quantity) as the Amount and the location as the Subset, showed an RSF of 800 for one location. A review of the data showed that this very large amount was a $500,000 error. The error would have caused the location's profits to have been overstated and would have earned the local manager a performance bonus based on the incorrect profit number.
The RSF test gave some valuable findings in an investigation of insurance claims in Houston, Texas. In the investigation the subset variable was the insurance adjuster (a person in the company who approves claims payments) and the approved claim was the numeric amount. Two adjusters were found to have RSFs of 6.00 and claims authorized of around $30,000 each where the $30,000 amount was far in excess of their authorized limits.
An investigation of health care payments by a company in Cleveland, Ohio, showed some interesting results. The subset variable was the CPT code (a five digit code describing the tasks and services provided by medical practitioners). The test was run twice using two different numeric amounts. Using the Amount claimed as the numeric amount gave RSFs as high as 200, and with Amount paid as the numeric amount gave RSFs as high as 6. This means that for some identical procedures the amount paid to a medical practitioner was up to six times as high for the largest payment when compared to the second largest payment. Both the largest and second largest numbers could be excessive, so for this type of data, a modified RSF could be calculated using the largest and the average amount for each subset.
An investigation of onboard beverage sales by an airline based in Texas showed some interesting findings. These beverage sales have few controls because the beverages in the first-class cabin are free and the beverages in the economy-class cabin are sold to passengers. The analysis used the flight number as the subset variable because a Friday evening flight from LAX to Las Vegas would have a higher alcohol consumption than a Monday morning flight from New York City to Washington, DC. The investigators looked for understatements and the formula used was the minimum amount of sales for a flight number divided by the average amount of sales for the flight number.
An investigation of sales data at a sportswear manufacturer in Oregon showed some interesting results. The subset variable was the SKU (stock keeping unit) number and the numeric variable was the selling price per unit. The goal was to find sales of goods at prices much below the average price. The investigators were concerned that internal salespeople might be selling goods to “friends” at a discount to the usual selling price. As with the beverage sales, the formula was the minimum amount per unit for an SKU divided by the average selling price per unit for the SKU. The results showed some sales at prices close to zero dollars.

Running the RSF Test

The test seems just like the largest subsets test with a complication or two. This test is actually quite difficult to program. One complication is that a subset with only one record cannot have an RSF because there is no second largest number. It is usually a good idea to delete all records less than 1.00, or 10.00 to avoid small numbers influencing the results. Including negative numbers could give us a negative RSF, which is quite meaningless. Also, small positive numbers might give highly inflated RSFs if we (say) divided $8,000 by $0.50.

It is quite difficult to identify the second largest amount in each subset. The main steps in running the test are to delete the small and irrelevant numbers, and then to delete all the subsets with only one record. We then need to identify the largest and second largest numbers for each subset. We also need to establish a rule that would apply if the largest and the second largest numbers were both equal. The logic that could be used with any data analysis program is:

1. Sort the data by Subset and by Amount descending and delete all numbers less than (say) 1.00.

2. Identify the subsets that have only one numeric record and delete these records.

3. Identify the largest Amount for each subset and also calculate the Count for each subset.

4. Identify those cases where any Amount appeared more than once in a subset.

5. Identify those subsets where the count of the largest amount was two or more and calculate the RSF of those subsets to be 1.00. Save these results.

6. Delete all the subsets with RSFs equal to 1.00 from the main data table.

7. Identify and remove the maximum Amount from each subset.

8. Identify the maximum Amount of the remaining records (which will be the second largest number).

9. Calculate the RSF and the other statistics to be included in the results table (e.g., the Subset number, the largest Amount, the second largest Amount, the count, and the RSF itself).

It might be possible to run the RSF test in Access with fewer than 11 queries. However, 11 queries make it easier to follow the logic of each query. Several queries include the Join command and the series of queries ends with a Union query to tidy things up. The tests can be run in Excel and surprisingly this test is easier to program in Excel than it is to program in Access.

Running the Relative Size Factor Test in Access

In this section, we use the InvoicesPaid data table. This goal is to identify subsets where the largest Amount is much larger than the other numbers in the subset. The VendorNum field will be the subset field and Amount will be the numeric field. The first step is to sort the data and to only keep the records that are greater than or equal to $1.00. This is done with the query shown in Figure 11.2.

Figure 11.2 This Step Sorts the Data and Keeps Those Records that Are $1.00 and Larger

The qryRSF1 query (results not shown) gives us the largest, the second largest, and all the other records for each subset. The second step is to create a master list of those vendors that have more than one record. Any subset with only one record cannot possibly have a second largest amount.

The query to create a master list of vendors with valid RSFs is shown in Figure 11.3. The result of this query is a master vendor list of 6,457 vendors. This vendor count is much less than the vendor count for the largest subsets test shown in Figure 10.2. The largest subsets test included all vendors (26,166 vendors) whereas qryRSF2 limits the results to vendors with two or more transactions of $1.00 or more. There were about 20,000 vendors with only one transaction (one record) or where the second transaction was for less than $1.00.

Figure 11.3 The Creation of a Master List of Subsets with More than One Record

The next step is to keep only the records in qryRSF1 that match a subset listed in qryRSF2. This will give a data table of Amounts ≥1.00 for vendors with more than one record. This is done using a Join and qryRSF3; Figure 11.4 shows the details.

Figure 11.4 The Query to Keep Only Those Subsets with More than One Record

Figure 11.4 shows the query used to keep only those subsets with more than one record and only amounts greater than or equal to 1.00. The Join used is an inner join and this is the default join unless an outer join is specified by using the second or third radio button. The Join Properties dialog box is shown only for informational purposes. The results of qryRSF3 are shown in Figure 11.5.

Figure 11.5 The Data that Will Be Used to Calculate the RSFs

Figure 11.5 shows that there were 165,257 records remaining after deleting amounts less than 1.00 and those subsets with only one record. The next step is to identify the maximum amount for each subset. The maximum and the second largest amounts might be equal in some cases.

The query in Figure 11.6 calculates the largest amount for each subset using the Max function in Access. The query is run against qryRSF3 and the maximum field is named MaxAmount. This name is shorter than the default name assigned by Access. This query will take longer to run than the preceding queries because the three queries that precede this query need to be run first. The record count should match the record count for qryRSF2. The count of Amount is included in the final results. The next step is to identify the second largest amount in each subset. This would be straightforward if the second largest number was never equal to the largest number. The goal now is to identify the cases where the largest and second largest are equal and to calculate an RSF of 1.00 for these cases. The query to see whether any amounts are duplicated at all in the subset is shown in Figure 11.7.

Figure 11.6 The Query Used to Identify the Largest (Maximum) Amount

Figure 11.7 The Query to Identify the Duplicates in the Subsets

The query in Figure 11.7 produces a table of all the duplicates in all the subsets. The last page of the results is shown in Figure 11.8.

Figure 11.8 The Last Page of the Query Identifying the Number Duplications

The result of qryRSF5 in Figure 11.8 shows all the cases of number duplication. The cases of interest are those cases where the maximum amount is duplicated. Our results in qryRSF4 gives the maximum for each subset. The next step is to find all those instances where it was the maximum amount that was duplicated. It is here that our RSFs will equal 1.00 because the largest and second largest amounts are equal. This query takes some fancy footwork with a Join in Access.

The query to identify the subsets with RSFs of 1.00 is shown in Figure 11.9. The calculated field RelativeSize should be formatted as Fixed with two decimal places to keep the results neat and tidy. There are two joins in the query and both are the inner joins that are the first (default) option in Join Properties. This query needs to run all five of the prior queries.

Figure 11.9 The Query to Identify Subsets with RSFs Equal to 1.00

Figure 11.10 shows a listing of vendors with RSFs equal to 1.00. The calculated RSF has been neatly formatted to two decimal places. The next step is the calculation of RSFs for the rest of the subsets. We now need to identify “the rest” of the subsets. This is done by removing the 1,313 RSF = 1.00 subsets (shown in Figure 11.10) from the data. The query to remove the subsets is still called a Join. The result of this Join will be a smaller data set.

Figure 11.10 Vendors with RSFs Equal to 1.00

Figure 11.11 shows the query used to remove the RSF = 1.00 vendors and their transactions from the data. The criteria Is Null is made up of two words separated by a space. The join is an outer join combined with the Is Null criteria. The Show box is unchecked for the third field, which means that our results will only show two fields. Running qryRSF7 query leaves 153,167 records (result not shown). This record count equals the 165,257 records from qryRSF3 minus the 12,090 records that were removed with qryRSF7. To calculate the number of records removed with qryRSF6 requires a new query (not shown) that sums the CountOfAmount field in qryRSF6.

Figure 11.11 The Query that Removes the RSF = 1.00 Vendors from the Data

The next step is to calculate the RSFs for the subsets with RSFs larger than 1.00. The maximum Amount for each subset was calculated in qryRSF4. The next step is to remove all the maximums so that we can recalculate the maximum. The second maximum will be the second largest Amount. The query qryRSF8 to remove the maximum (again involving a Join) is shown in Figure 11.12.

Figure 11.12 The Query that Removes the Maximum for Each Subset

The query qryRSF8 has two right-facing arrows and uses the Is Null criteria. The result is a table of 148,023 records (not shown), which reconciles with the previous queries. The query correctly removed the maximums of the subsets where the second largest amount was not equal to the maximum.

The next step is to identify the second largest number for each subset. We have removed the largest number, so the second largest number is the maximum amount for each subset. The query to extract the maximum, qryRSF9, is shown in Figure 11.13.

Figure 11.13 The Query to Calculate the Second Largest Amount

The query qryRSF9 shown in Figure 11.13 calculates the second largest amount. The MAX function can be used because the largest Amounts were removed in the prior query. The results of qryRSF9 are shown in Figure 11.14.

Figure 11.14 The Results of the Query that Calculates the Second Largest Amount

The second largest amounts for each subset are shown in Figure 11.14. The first entry agrees with Figure 11.5, which shows that the second largest amount for vendor 2001 is $283,969.14. There are only two more steps. The next query qryRSF10 calculates the RSF for each subset. The setup and the formula is shown in Figure 11.15.

Figure 11.15 The RSF Calculation

Figure 11.15 shows the query to calculate the RSFs for those subsets with RSFs larger than 1.00. The RelativeSize formula is quite simple being the maximum amount divided by the second largest amount. There is no need to cater for division by zero because the zero amounts were deleted quite early in the process. The RelativeSize field is formatted as Fixed with two decimal places. Access seems to accept sorting on this field even though it is a calculated field. This query will take a while to run because all the prior steps are run before the results can be displayed.

Figure 11.16 shows us the (almost final) set of results. The RSFs are sorted from largest to smallest and the maximum Amount, the second largest Amount, and the count for the subsets are shown as additional information.

Figure 11.16 The Relative Size Factor Results

We need one query to combine the results from qryRSF10 (RSFs greater than 1) with qryRSF6 showing the RSFs equal to 1.00. This is done with qryRSF11, which is constructed in two parts shown in Figure 11.17. The first step is to prepare a normal Select query as is shown in the top panel, and then after switching to SQL view, the text in the box is updated.

Figure 11.17 The Query to Combine the Results

Figure 11.17 shows the query used to combine the qryRSF6 and qryRSF10 results. The RelativeSize field in the top panel should be formatted as Fixed with 2 decimal places before switching to SQL view. The SQL text added to the select query from the grid in the top panel is

UNION

SELECT qryRSF10.VendorNum, qryRSF10.MaxAmount, qryRSF10.SecondLargest, qryRSF10.CountOfAmount, qryRSF10.RelativeSize

FROM qryRSF10

ORDER BY RelativeSize DESC;

The results are the same as can be seen in Figure 11.16 except that the record count is now 6,457 records because we have combined the results from two queries. As a final touch another query can be used to select the records for a specified subset.

Figure 11.18 shows the query that is used to select all the transactions for a specified vendor. This type of query is called a parameter query. When the query is run, the user will be prompted to enter the vendor number (“Enter the Vendor Number” as is shown on the Criteria line). The records returned after entering 4863 are shown in Figure 11.19.

Figure 11.18 The Query to Select the Data for a Vendor

Figure 11.19 The Transactions for Vendor 4863

The follow-up investigations showed that the first vendor in Figure 11.16 was a payment for a land purchase. The large dollar amount was the cost of the land as paid to the title company and the other costs were small incidentals related to the purchase. The results in Figure 11.16 show a number of possible decimal point errors. These include the vendors with maximum amounts of $667.00, $4,330.00, $500.00, $1,385.00, $3,500.00, $205,242.00, and $1,575.00. There are also possible overpayments where the RSFs are exactly 10, 5, 4, 3, and 2. Also, any investigation should at least on a test basis look at the cases where the RSFs are slightly above 1.00 (e.g., 1.01 or 1.02). The two largest invoices for a vendor could actually be the same invoice. The first invoice might include a shipping charge or a late fee and the second invoice might not have included the small additional amount.

Forensic investigators should focus on relatively large RSFs (usually shown at the top of the results table), RSFs where the dollar amount of the largest item is relatively high, and cases where the RSFs are a round number such as 100, 10, 5, 4, 3, 2, and 1.

Running the Relative Size Factor Test in Excel

It would seem that this test is more difficult to run in Excel than it is to run in Access. The test is only moderately complex in Excel mainly because Excel has the ability to look up (to the previous row) and down (to the next row) when performing calculations. This makes it quite easy to identify the largest and second largest amounts for a vendor provided that the table is sorted correctly. It is best to start with a new worksheet with only the invoices data. The worksheet sheet will look just like Figure 10.10. The first step is to delete all the Amounts less than $1.00. This is done by sorting by Amount descending and then finding the first row where we have amounts equal to $0.99.

Figure 11.20 The Sorted InvoicesPaid Data

The row where the Amounts are less than $1.00 is shown in Figure 11.20. We will only run the RSF test using records that are ≥1.00. To keep only the >=$1.00 records we simply delete the records less than $1.00. This is done by highlighting rows 184414 to 189471 and then deleting them.

The next step is to sort the worksheet so that we have the largest and second largest amounts as the first two amounts shown for every vendor. This is done using Home→ Editing→Sort&Filter→Custom Sort followed by the options shown in Figure 11.21.

Figure 11.21 The Sort Procedure for the RSF Calculations

The sort procedure shown in Figure 11.21 lists the largest Amount for each vendor followed by the second largest Amount. The RSF calculation is now fairly straightforward except for the possibility that a vendor only has one record. The calculations will use several indicator variables along the way. The process will be started by adding a heading Indic1 to column E. The first entry is the numeric value 1 in cell E2. The next step is to enter the formula shown in cell E3 in Figure 11.22 and then to copy this formula down to the last record (E184413).

Figure 11.22 The Formula to Calculate the Count

The formula to count the number of records for each subset is shown in Figure 11.22. The formula used in E3 is

E3: = IF(A3 = A2,E2+1,1)

The Indic1 formula gives us the count for each subset. The count is the largest Indic1 value for the subset. We will use the Indic1 count in the results. The next step is to copy the Indic1 field and use the Paste Special feature to paste the formulas as values. We can now resort the field and still keep the maximums correctly calculated. This step converts the Indic1 formulas to values. The next step is to sort the data table on Vendor Ascending and Amount Descending. The effect of these steps is that the largest and second largest amounts for each subset are shown first. The formula in column F will identify the largest and second largest amounts. The largest value for the first subset is always on the second row and so we can simply enter “1” in cell F2.

The formula to identify the largest and second largest amounts is shown in Figure 11.23. The formula used in F3 is

Figure 11.23 The Formula to Identify the Largest and Second Largest Amounts

The rows of interest are those rows where Indic2 equals 1 and the very next row if the Indic2 value equals 2. Two 1s in succession in Indic2 indicate that we have a subset with only one record. The second largest value can be identified by using a slightly complex AND function. This formula is shown in Figure 11.24.

Figure 11.24 The Formula for the Second Largest Amount

Figure 11.24 shows the procedure to “lift” the second largest amount to the same row as the largest amount. The formula for column G in Figure 11.24 is

G2: = IF(AND(A2 = A3,F3 = 2),D3,“”)

The G2 formula has actually been copied to the last row (cell G184413) in Figure 11.24. The reason that nothing is visible in cells G3:G11 above is that the formula shows a blank cell if the row is not the first row of a subset with more than one record. We have what we need. All the subsets of interest have a numeric value in column G. All the rows that are irrelevant have blanks (null values) in column G. The final steps involve some tidying up. We need to first change the formulas in column G to values using Copy and Paste Special. The data then needs to be sorted so that all the relevant results are shown first. This is done by sorting on column G and sorting from A to Z (this will put the blank cells at the bottom of the table).

Figure 11.25 shows the results of those subsets with valid RSFs at the top of the worksheet. The next step is to find the first blank cell in column G and delete all the rows in the table that are below this row. This is shown in Figure 11.26.

Figure 11.25 The Subsets and Their RSFs

Figure 11.26 The End of the Relevant RSF Data

Figure 11.26 shows the first blank record that needs to be deleted. A quick way to highlight from row 6459 all the way down to the end is to use Control+Shift+Down Arrow. A right click and Delete is all that is then needed to delete all that we do not need. It is possible to apply a filter to column G to display only the rows with numeric values greater than $0.99. Deleting the rows makes it easier to report the results in a very neat format. With a filter the unneeded rows are still there.

The worksheet includes some columns that can now be deleted. The next step is to rename Indic1 to Count. Indic3 should be renamed SecondLargest. Amount should be renamed Largest. The date and invoice number fields should be deleted. The Indic2 field can also be deleted. The RSF needs to be calculated using equation (11.1).

The final results are shown in Figure 11.27. The table has been formatted using Home→Styles→Format As Table with a Table Style Medium 2 applied. This style automatically adds the filters on the first row. The table was also formatted as Calibri 11 points throughout.

Figure 11.27 The Final RSF Results

Summary

This chapter introduces, discusses, and demonstrates the Relative Size Factor (RSF) test. This test is a powerful test for detecting errors and fraud. The test identifies subsets where the largest (and perhaps the smallest) amounts seem to be out of line with the other amounts for that subset. The large difference could be because the record either (a) actually belongs to another subset, or (b) belongs to the subset in question, but the numeric amount was incorrectly recorded.

The RSF test was developed as a result of a case where a company wired a large amount of money to the bank account of a charity in error. The funds were supposed to go to a vendor. The amount was significantly more than any amount that had been donated to the charity before. Had the company run a reasonableness test before initiating the transfer, it would have seen that the amount was way out of line with any amount previously sent to the charity. This test has led to large recoveries in accounts payable audits. The test has also found interesting forensic results in an investigation of sales numbers, insurance claim payments, inventory numbers, and health care claims.

The RSF test identifies subsets where one amount is significantly larger than the other items in the subset. The formula identifies the largest amount in a subset and divides it by the second largest amount. The RSF report usually includes (a) the subset name or number, (b) the largest amount for the subset, (c) the second largest amount for the subset, (d) the record count for the subset, and (e) the relative size factor. This test has most often been run using the largest and second largest numbers in the various subsets. Investigators can adapt this formula to (a) the largest amount divided by the average amount, (b) the largest divided by the average where the average excludes the largest number and (c) the smallest number divided by the average (which is used when looking for understatements).

The RSF test can be run in Access. The process is a bit tedious and it takes a series of 11 queries. The first steps delete all amounts less than 1.00 and those subsets with only one transaction. The next step is to identify all the subsets where the largest amount is also coincidentally also the second largest amount. These subsets have RSFs equal to 1.00. The next step is to identify both the largest and the second largest Amounts for the remaining subsets. The final steps involve calculating the RSFs, preparing the final report, and combining the RSF equal to 1.00 and the RSFs greater than 1.00 reports. The Access queries can be reused on other data tables with the same field names. The RSF test is surprisingly easier to program in Excel.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 11: Identifying Anomalies Using the Relative Size Factor Test

Create new playlist

Sign In

Sign Up

Table of Contents for
Chapter 11: Identifying Anomalies Using the Relative Size Factor Test