
15. Building the Simulation in R

For the solution approach, we want to calculate the VOI for the problem displayed in our last influence diagram. To do so, we take the following algorithmic approach.
  1. Run the simulation for both decision strategies (or more as indicated) and calculate the mean NPV and NPV cumulative probability distributions. If one of the cumulative probability distributions does not exhibit strict dominance over the other, then we know that sensitivity analysis might be helpful for identifying critical uncertainties.

  2. Test the sensitivity of the mean NPV of each decision to the underlying uncertainties using tornado analysis and identify which uncertainties might cause us to regret taking the best strategy (on the basis of expected value) in contrast to the next best strategy.

  3. Find the VOI as the difference between the improved NPV by exercising choice based on prior information and the best mean NPV from Step 1.

The Model Algorithms

Before we proceed, we need to set up a few important files that we will import into our R code. The first is a file that contains important functions, one of which is the business model that we will use in the simulation of the values expressed in the influence diagram. We use the following functions:
  1. CalcBrownJohnson: Takes p10, p50, p90 assessments from an SME and generates a simulation of samples that spans beyond the p10 and p90 to include the virtual tails.

  2. CalcScurv: Models the saturation of uptake into a population using a sigmoid curve. For example, this can be used to model how long it takes to reach maximum saturation into a marketplace.

  3. CalcBizModel: The function that represents the business model reflected by the influence diagram.

  4. CalcModelSensitivity: Takes a list of uncertainty simulation values and calculates the sequential sensitivity of the NPV returned by CalcBizModel to specified quantiles (e.g., the p10, p50, p90) for each uncertainty.

Name this file Functions.R, and populate it with the following code.

CalcBrownJohnson <- function(minlim=-Inf, p10, p50, p90, maxlim=Inf, samples) {
# This function simulates a distribution from three expert estimates
# for the 80th percentile probability interval of a predicted outcome.
# The user specifies the three parameters and the number of samples.
# The user can also enter optional minimum and maximum limits that
# represent constraints imposed by the system being modeled. These are
# set to -Inf and Inf, respectively, by default. The process of
# simulation is simple Monte Carlo with the number of samples supplied by
# the caller.
      # Create a uniform variate sample space in the interval (0,1).
      U <- runif(samples, 0, 1)
      lenU <- length(U)
      # Create an index in the interval (1,samples) with samples members.
      Uindex <- 1:lenU
      # Calculates the virtual tails of the distribution given the p10, p50, p90
      # inputs.
      p0 <- pmax(minlim, p50 - 2.5 * (p50 - p10))
      p100 <- pmin(maxlim, p50 + 2.5 * (p90 - p50))
      # This next section finds the linear coefficients of the system of linear
      # equations that describe the linear spline, using... [C](A) = (X) (A) = [C]^-1
      # * (X) In this case, the elements of (C) are found using the values (0, 0.1,
      # 0.5, 0.9, 1) at the endpoints of each spline segment. The elements of (X)
      # correspond to the values of (p0, p10, p10, p50, p50, p90, p90, p100). Solving
      # for this system of linear equations gives linear coefficients that transform
      # values in U to intermediate values in X. Because there are four segments in
      # the linear spline, and each segment contains two unknowns, a total of eight
      # equations are required to solve the system.
      # The spline knot values in the X domain.
      knot_vector <- c(p0, p10, p10, p50, p50, p90, p90, p100)
      # The solutions to the eight equations at the knot points required to describe
      # the linear system.
      coeff_vals <- c(0, 1, 0, 0, 0, 0, 0, 0,
                     0.1, 1, 0, 0, 0, 0, 0, 0,
                     0, 0, 0.1, 1, 0, 0, 0, 0,
                     0, 0, 0.5, 1, 0, 0, 0, 0,
                     0, 0, 0, 0, 0.5, 1, 0, 0,
                     0, 0, 0, 0, 0.9, 1, 0, 0,
                     0, 0, 0, 0, 0, 0, 0.9, 1,
                     0, 0, 0, 0, 0, 0, 1, 1)
      # The coefficient matrix created from the prior vector. It looks like the
      # following matrix :
                  # [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
        # [1,]  0.0    1  0.0    0  0.0    0  0.0    0
        # [2,]  0.1    1  0.0    0  0.0    0  0.0    0
        # [3,]  0.0    0  0.1    1  0.0    0  0.0    0
        # [4,]  0.0    0  0.5    1  0.0    0  0.0    0
        # [5,]  0.0    0  0.0    0  0.5    1  0.0    0
        # [6,]  0.0    0  0.0    0  0.9    1  0.0    0
        # [7,]  0.0    0  0.0    0  0.0    0  0.9    1
        # [8,]  0.0    0  0.0    0  0.0    0  1.0    1
      coeff_matrix <- t(matrix(coeff_vals, nrow=8, ncol=8))
      # The inverse of the coefficient matrix.
      inv_coeff_matrix <- solve(coeff_matrix)
      # The solution vector of the linear coefficients.
      sol_vect <- inv_coeff_matrix %*% knot_vector
      X = (U <= 0.1) * (sol_vect[1, 1] * U + sol_vect[2, 1]) +
          (U > 0.1 & U <= 0.5) * (sol_vect[3, 1] * U + sol_vect[4, 1]) +
          (U > 0.5 & U <= 0.9) * (sol_vect[5, 1] * U + sol_vect[6, 1]) +
          (U > 0.9 & U <= 1) * (sol_vect[7, 1] * U + sol_vect[8, 1])
      return(X)
}
CalcScurv <- function(y0, t, tp, k = 0) {
  # This function models the sigmoid 1 / (1 + exp(-g*t)) with four analytic
  # parameters .
  # y0 = saturation in first period.
  # t = the index along which the s-curve responds, usually thought of as time.
  # k = the offset in t for when the sigmoid begins. Subtract k for a right
  #    shift. Add k for a left shift.
  # tp = the t at which s-curve achieves 1-y0.
  this.s.curve = 1 / (1 + (y0 / (1 - y0)) ^ (2 * (t + k) / tp - 1))
  return(this.s.curve)
}
CalcBizModel <- function(Time, N, pus, ttp, p, sga, cogs, tr, i, dr) {
  # The function that represents the business model reflected by the influence
  # diagram.
       # Time = the time index
       # N = the number of simulation samples
       # pus = peak units sold
       # ttp = time to peak units sold
       # p = price, $/unit
       # sga = sales, general, and admin, % revenue
       # cogs = cost of goods sold, $/unit
       # tr = tax rate, %
       # i = initial investment, $
       # dr = discount rate, %/year
       init <- t(array(0, dim=c(length(Time), N)))
       ann.units.sold <- init
       revenue <- init
       profit <- init
       tax <- init
       cash.flow <- init
       npv <- rep(0, N)
       samp.index <- 1:N
       for (s in samp.index) {
         # annual units sold
         ann.units.sold[s, ] <-
           (Time > min(Time)) * pus[s] * CalcScurv(0.02, Time, ttp[s], -1)
        # annual period revenue
        revenue[s, ] <- p[s] * ann.units.sold[s, ]
        # annual period profit
        profit[s, ] <-
          revenue[s, ] - (sga[s] * revenue[s, ]) - (cogs[s] * ann.units.sold[s, ])
        # annual period tax
        tax[s, ] <- tr * profit[s, ]
        # annual period cash flow
        cash.flow[s, ] <-
          (Time > min(Time)) * (profit[s, ] - tax[s, ]) - (Time == min(Time)) * i[s]
        # net present value of the annual period cash flow
        npv[s] <- sum(cash.flow[s, ] / (1 + dr) ^ Time)
        }
       # Collect all intermediate calculations in a list to be used for other
       # calculations or reporting.
       calc.vals <- list(ann.units.sold = ann.units.sold,
                           revenue = revenue,
                           profit = profit,
                           tax = tax,
                           cash.flow = cash.flow,
                           npv = npv
                        )
       return(calc.vals)
}
CalcModelSensitivity <- function(unc.list, sens.q) {
  # unc.list = A list that contains the uncertain variables' samples used in
  #            the business model .
  # sens.q = a vector that contains sensitivity test quantiles
  # Create an index from 1 to the number of uncertainties used in the business
  # model.
  unc.index <- 1:length(unc.list)
  # Assign the values of the uncertainties list to a temporary list
  uncs.temp <- unc.list
  # Initialize a table to contain mean NPVs of the business model as each
  # uncertainty is set to the sensitivity quantile values.
  sens.table <-
    array(0, dim = c(length(unc.list), length(sens.q)))
  row.names(sens.table) <- names(unc.list)
  colnames(sens.table) <- sens.q
  for (u in unc.index) {
    for (s in 1:length(sens.q)) {
      # Iterate across the uncertainties and elements of the sensitivity
      # quantile values and temporarily replace each uncertainty's samples with
      # the uncertainty's value at each quantile value.
      uncs.temp[[u]] <- rep(quantile(unc.list[[u]], sens.q[s]), samps)
      # Populate the sensitivity table with the mean NPV values calculated in
      # the business model using the values in the temporary uncertainty list.
      sens.table[u, s] <- mean(
        CalcBizModel(
          time,
          samps,
          uncs.temp$peak.units.sold,
          uncs.temp$time.to.peak,
          uncs.temp$price,
          uncs.temp$sga,
          uncs.temp$cogs,
          tax.rate,
          uncs.temp$investment,
          disc.rate
        )$npv
      )
    }
    # Reset the temporary uncertainty list back to the original uncertainty list.
    uncs.temp <- unc.list
  }
  return(sens.table)
}
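As a quick sanity check of CalcBrownJohnson() (my own illustration, not part of Functions.R), you can verify that the simulated quantiles land near an assessed set of p10, p50, p90 values, here 10,000; 12,500; 18,000.

# Draw a test sample and compare its simulated quantiles to the assessed
# p10, p50, p90 inputs.
test.samps <- CalcBrownJohnson(0, 10000, 12500, 18000, Inf, 10000)
print(round(quantile(test.samps, c(0.1, 0.5, 0.9))))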

Note that the business model from the influence diagram is treated as the function CalcBizModel(). I chose this approach to ensure that the logic remains consistent among the decision pathways I consider and that I pass only relevant information (facts and uncertainties) to the model for the purpose of calculating VOI. If you want access to the intermediate results for other analytic uses and later financial analysis and planning, these values are returned as a list when the function runs. Be aware that because the model is a probabilistic simulation, the values in the returned list are simulation samples; use set.seed() if you want to ensure reproducibility in your results.

You might also notice, while reviewing the functions and elsewhere in the tutorial, that I depend on for() loops for iteration blocks rather than the often prescribed apply() functions. Yes, the code runs more slowly than it otherwise would, but in the end I decided that the goal of explaining the purpose of the algorithms was better served by for() than by apply(). If you are new to R and don't know what the apply() family accomplishes as a replacement for for() loops, you should learn that quickly; the overall speed of your code will drastically improve. A vectorized version of one of the loops appears in the sketch below.
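For instance, here is a minimal sketch (my own illustration, assuming Functions.R and the Assumptions.R file defined next have been sourced) of how a for() loop like the bin-by-bin sensitivity runs later in this chapter could be expressed with sapply(). The function name CalcMeanNpvAtPeakUnits is hypothetical and not part of the book's scripts.

# Return the mean NPV of Decision 1 with peak.units.sold pinned to each test
# value in turn, exactly as an equivalent for() loop would.
CalcMeanNpvAtPeakUnits <- function(test.vals) {
  sapply(test.vals, function(v) {
    mean(CalcBizModel(time, samps,
                      rep(v, samps),
                      dec1.uncs$time.to.peak,
                      dec1.uncs$price,
                      dec1.uncs$sga,
                      dec1.uncs$cogs,
                      tax.rate,
                      dec1.uncs$investment,
                      disc.rate)$npv)
  })
}
# Example: mean NPV [$M] at three trial values of peak units sold.
print(signif(CalcMeanNpvAtPeakUnits(c(10000, 12500, 18000)) / 1e6, 3))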

Next, we need a file that contains our various assumptions and uncertainty assessments. You can name this file Assumptions.R.

# time horizon of the model
time.horizon <- 10 # yr
time <- 0:time.horizon
set.seed(98)
# number of samples used for the simulation.
samps <- 30000
tax.rate <- 0.35 # %
disc.rate <- 0.10 # %/yr
# Uncertainties have the following units.
# peak.units.sold, max units/year after year 0
# time.to.peak, years
# price, $/unit
# cogs, $/unit
# sga, % revenue
# investment, $ in year 0
# uncertainty parameters for decision 1
dec1.uncs <- list(
  peak.units.sold = round(CalcBrownJohnson(0, 10000, 12500, 18000, , samps)),
  time.to.peak = CalcBrownJohnson(0, 2, 3, 5, , samps),
  price = CalcBrownJohnson(0, 90, 95, 98, , samps),
  cogs = CalcBrownJohnson(0, 15, 17, 20, , samps),
  sga = CalcBrownJohnson(0, 0.1, 0.12, 0.14, 1, samps),
  investment = CalcBrownJohnson(0, 100000, 115000, 150000, , samps)
)
# uncertainty parameters for decision 2
dec2.uncs = list(
  peak.units.sold = round(CalcBrownJohnson(0, 9000, 17000, 25000, , samps)),
  time.to.peak = CalcBrownJohnson(0, 2, 4, 6, , samps),
  price = CalcBrownJohnson(0, 95, 100, 103, , samps),
  cogs = CalcBrownJohnson(0, 16, 18, 21, , samps),
  sga = CalcBrownJohnson(0, 0.1, 0.13, 0.16, 1, samps),
  investment = CalcBrownJohnson(0, 135000, 155000, 200000, , samps)
)

Normally, I recommend keeping the parameter data for the uncertainties in CSV files, importing them through the read.csv() function as demonstrated in Chapter 2, and then assigning the uncertainty parameters to each uncertainty with the CalcBrownJohnson() function. In this case, however, the list of uncertainties is short for the purpose of demonstration, so to make things a little simpler and to focus more on the VOI calculation, I took a shortcut.
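For reference, here is a minimal sketch of that CSV-based workflow. The file name and column layout (uncertainty, p10, p50, p90) are assumptions for illustration, not files supplied with the book, and samps must already be defined as in Assumptions.R.

# Read the assessed parameters and build the named list of simulated samples.
unc.params <- read.csv("dec1_uncertainty_params.csv", stringsAsFactors = FALSE)
dec1.uncs <- setNames(
  lapply(seq_len(nrow(unc.params)), function(r) {
    CalcBrownJohnson(0, unc.params$p10[r], unc.params$p50[r],
                     unc.params$p90[r], Inf, samps)
  }),
  unc.params$uncertainty
)
# A maxlim column could be added in the same way for variables such as sga
# that are capped at 1.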

Now that we have the functional and data pieces in place to initialize and run our model, we can use the following code in, say, Business_Decision_Model.R , to accomplish Step 1 in our solution approach.

# Import function and data source files.
source("/Applications/R/RProjects/Value of Information/Functions.R")
source("/Applications/R/RProjects/Value of Information/Assumptions.R")
# Run the business decision models and return sample outputs of NPV.
bdm1 <-
  CalcBizModel(
    time,
    samps,
    dec1.uncs$peak.units.sold,
    dec1.uncs$time.to.peak,
    dec1.uncs$price,
    dec1.uncs$sga,
    dec1.uncs$cogs,
    tax.rate,
    dec1.uncs$investment,
    disc.rate
  )$npv / 1e6
bdm2 <-
  CalcBizModel(
    time,
    samps,
    dec2.uncs$peak.units.sold,
    dec2.uncs$time.to.peak,
    dec2.uncs$price,
    dec2.uncs$sga,
    dec2.uncs$cogs,
    tax.rate,
    dec2.uncs$investment,
    disc.rate
  )$npv / 1e6
# Calculate the mean NPV of each decision and their difference.
m.bdm1 <- mean(bdm1)
m.bdm2 <- mean(bdm2)
diff.bdm <- mean(bdm2 - bdm1)
# Tabulate the average model results.
m.bdm <- array(c(m.bdm1, m.bdm2), dim=c(2, 1))
colnames(m.bdm) <- "Mean NPV of decision [$M]"
rownames(m.bdm) <- c("Decision1", "Decision2")
print(signif(m.bdm,3))
print(paste("The difference in mean value between strategies:", "$",
            signif(diff.bdm, 3), "M"))
# Plot the cumulative probability distribution of the NPVs.
par(mar = c(12, 5, 5, 2) + .05, xpd = TRUE)
legend.idents <- c("Decision 1", "Decision 2")
plot.ecdf(
  bdm1,
  do.points = TRUE,
  col = "red",
  main = "NPV of Business Decision ",
  xlab = "NPV [$M]",
  ylab = "Cumulative Probability",
  xlim = c(min(bdm1, bdm2), max(bdm1, bdm2)),
  tck = 1
)
plot.ecdf(
  bdm2,
  do.points = TRUE,
  add = TRUE,
  col = "blue")
legend(
  "bottom",
  inset = c(0, -.45),
  legend = legend.idents,
  text.width = 2.5,
  ncol = 2,
  pch = c(13, 14),
  col = c("red", "blue")
)

Running the prior code gives the following results1 based on our assumptions and logic.

          Mean NPV of decision [$M]
Decision1                      2.39
Decision2                      2.91
[1] "The difference in mean value between strategies: $ 0.524 M"
The chart of the cumulative probability is shown in Figure 15-1.
Figure 15-1. The overlapping cumulative distribution functions (CDFs) for Decisions 1 and 2 illustrate intervals of dominance of one decision compared to another

Decreasing dominance of one decision over another implies increasing ambiguity about which decision to choose. In such cases, we need to know which uncertainties deserve more attention to help us choose clearly. VOI analysis can tell us how much to spend on that attention.

The results in Figure 15-1 show that, given our current state of knowledge about our potential investment opportunity decision choices, we face a dilemma. This dilemma derives from classic finance theory, which tells us that, if we are rational, we should prefer investment opportunities with the highest average (or mean) return and the least variance. Here we face the situation in which the best decision based on average return (Decision 2) is inferior to Decision 1 on the basis of overall variance (Decision 2 is nearly twice as wide as Decision 1 in its full range of potential outcomes). Furthermore, we observe that Decision 2 neither strictly nor stochastically dominates Decision 1.

Strict Dominance

There is no sample from the highest valued decision that is lower than any sample from a lower valued decision. Looking at both the probability density function (PDF) and cumulative distribution function (CDF) curves (Figure 15-2), the lowest tail of the highest valued decision sits at or just above the highest tail of the lower valued decision.

Figure 15-2. Strict dominance illustrated by the spatial relationship of probability distributions

Stochastic Dominance

The tails of the PDF curves overlap, but the CDF curves do not cross over, as the highest valued decision remains strictly offset from the lower valued decision across the full range of variation (Figure 15-3).

Figure 15-3. Stochastic dominance illustrated by the spatial relationship of probability distributions

In fact, we observe that the lower tail of Decision 2 crosses over and extends beyond the lower tail of Decision 1, potentially exposing us to a lower outcome than had we chosen Decision 1. The situation our analysis reveals is ambiguous, clearly implying that before we commit to a specific decision pathway, we might want to refine our current state of knowledge. Which of the pieces of information should we focus on? We can develop guidance on how to improve our current state of knowledge with a kind of two-way sensitivity analysis that I have alluded to previously, the tornado analysis.
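As a quick programmatic check (my own addition, not part of the book's scripts), both dominance conditions can be tested directly against the bdm1 and bdm2 samples still in the workspace.

# Strict dominance would require that no Decision 2 sample fall below any
# Decision 1 sample.
strict.dom <- min(bdm2) >= max(bdm1)
# First-order stochastic dominance would require that the CDF of Decision 2
# lie at or below the CDF of Decision 1 everywhere, that is, the curves
# never cross.
npv.grid <- seq(min(bdm1, bdm2), max(bdm1, bdm2), length.out = 500)
stoch.dom <- all(ecdf(bdm2)(npv.grid) <= ecdf(bdm1)(npv.grid))
print(c(strict.dominance = strict.dom, stochastic.dominance = stoch.dom))

Both checks should return FALSE here, consistent with the overlapping CDFs in Figure 15-1.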

The Sensitivity Analysis

What we need is a way to prioritize our attention on the uncertain variables based on our current quality of information about them and how strongly the likely range of their behavior might affect the average value of the objective function, NPV. Because we know that the NPV curves overlap, one or more of the uncertainties must be causing this overlap, as they are the only sources of variation in the model. So, before we go seeking higher quality information about our uncertainties, we first need to understand which uncertainties have probable ranges of outcomes significant enough to potentially make us regret our initial inclination to accept Decision 2. To accomplish this, we use tornado sensitivity analysis.

Tornado sensitivity analysis works by sequentially observing how much the average NPV changes in response to the 80th percentile range of each uncertain variable. We choose a variable and set it to its p10 value, then we record the effect on average NPV. Next we set the same variable to its p90 and record the effect on average NPV. During both of these iterations, we can set the other uncertainties to their mean value or let them run according to their defined variation. Whereas the former approach is generally faster, the latter approach is more accurate, especially if the business model represents a highly nonlinear transform of the inputs to the objective function. If the business model used exhibits strong nonlinearity, Jensen's inequality2 effects can manifest.3 The code that I provide here follows the more accurate latter approach. Repeating this process for each variable, we observe how much each variable influences the objective function both by its functional strength and across the range of its assessed likelihood of occurrence.

Before we look at the sensitivity analysis code, let’s recall the following characteristics about each of the variables in our model, both the kind of information they represent and their units.
  • time -> the time index, year

  • peak units sold -> uncertainty, units/year

  • time to peak units sold -> uncertainty, years

  • price -> uncertainty, $/unit

  • sales, general, and admin -> uncertainty, % revenue

  • cost of goods sold -> uncertainty, $/unit

  • tax rate -> % profit

  • initial investment -> uncertainty, $

  • discount rate -> %/year

The following code (which I’ve named Sensitivity_Analysis.R) requires that the namespace and sample values of the variables that come from Business_Decision_Model.R be available for use in R’s workspace; therefore, make sure you run Business_Decision_Model.R before running this code.

# Create a vector of values representing the sensitivity quantiles.
sens.range <- c(0.1, 0.5, 0.9)
npv.sens1 <- CalcModelSensitivity(
  unc.list = dec1.uncs,
  sens.q = sens.range
) / 1e6
npv.sens2 <- CalcModelSensitivity(
  unc.list = dec2.uncs,
  sens.q = sens.range
) / 1e6
print("Decision 1 Sensitivity to Uncertainty [$M]")
print(signif(npv.sens1[, c(1, length(sens.range))], 3))
print("Decision 2 Sensitivity to Uncertainty [$M]")
print(signif(npv.sens2[, c(1, length(sens.range))], 3))

The results of the sensitivity analysis look like the following tables.

[1] "Decision 1 Sensitivity to Uncertainty [$M]"
                 0.1  0.9
peak.units.sold 1.72 3.21
time.to.peak    2.69 2.03
price           2.24 2.51
cogs            2.48 2.29
sga             2.46 2.31
investment      2.41 2.36
[1] "Decision 2 Sensitivity to Uncertainty [$M]"
                 0.1  0.9
peak.units.sold 1.44 4.35
time.to.peak    3.50 2.39
price           2.74 3.06
cogs            3.02 2.79
sga             3.04 2.78
investment      2.94 2.87

Note that the peak.units.sold variable for Decision 1 returns a value of $1.72 million under the 0.1 column. This is the average NPV of Decision 1 when peak.units.sold is set to its p10 value (e.g., 10,000) while all the other uncertain variables follow their natural variation. Likewise, it returns an average NPV = $3.21 million when it is set to its p90 value (e.g., 18,000) while all the other uncertain variables follow their natural variation. The rest of the table is read in a similar manner.

Please note that a random seed of 98 is set in Assumptions.R (set.seed(98)). Keeping the seed fixed at any value ensures that you always get the same values between simulation runs of the model. If you change this value, you will observe slightly different values in the reported tables. If you remove the set.seed(98) statement altogether (or comment it out), you will see different values every time you run the model. Just how stable these values remain between runs indicates how sensitive your model is to the noisiness of simple Monte Carlo simulation. Therefore, it is often helpful to run a model several times to get a good feel for this stability (or lack of it), then select a seed value that does a good job of reflecting the ensemble of tests. Otherwise, when you report your values to others who aren't familiar with the nuances of Monte Carlo simulation, you will face having to explain the nuances of Monte Carlo simulation. Guess how productive that conversation usually is.
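Here is a minimal sketch of such a stability check (my own illustration, assuming the set.seed(98) line in Assumptions.R has been commented out so that each re-run draws fresh samples).

# Re-run the Decision 2 model several times and compare the mean NPVs to gauge
# Monte Carlo noise at the current sample size.
mean.npv.by.run <- replicate(5, {
  source("/Applications/R/RProjects/Value of Information/Assumptions.R")
  mean(CalcBizModel(time, samps,
                    dec2.uncs$peak.units.sold, dec2.uncs$time.to.peak,
                    dec2.uncs$price, dec2.uncs$sga, dec2.uncs$cogs,
                    tax.rate, dec2.uncs$investment, disc.rate)$npv / 1e6)
})
# The spread across runs indicates how stable the reported mean NPV is.
print(signif(range(mean.npv.by.run), 3))

If you continue with the chapter's scripts afterward, restore the set.seed(98) line and re-source Assumptions.R so that the reported values match.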

Because we already know that Decision 2 has a higher average NPV than Decision 1, we need to know which uncertainties could potentially cause us to regret taking that higher average valued decision. We answer that question by observing which uncertainties cause overlap in the range of NPV between the decisions. If we look down the rows of uncertainties, we see that the two uncertainties that cause such an overlap of value are peak.units.sold and time.to.peak, as the lowest value of Decision 2 for each of these variables overlaps the highest one for Decision 1.

We refer to these overlapping uncertainties as critical uncertainties because they pose the greatest potential for making us wish we had taken a different route once we commit to the execution of a decision. This is not to say that the other uncertainties are not important and will not need to be monitored once we do commit to a decision; rather, it simply means that for the purpose of making a clear decision now, these critical uncertainties are the greatest contributors to our current state of ambiguity.
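The same inspection can be expressed programmatically. Here is a minimal sketch (my own addition) that flags an uncertainty as critical when the lowest conditioned mean NPV for Decision 2 dips below the highest conditioned mean NPV for Decision 1, using the npv.sens1 and npv.sens2 tables still in the workspace.

# Flag the uncertainties whose sensitivity ranges overlap across decisions.
overlapping <- pmin(npv.sens2[, 1], npv.sens2[, length(sens.range)]) <
               pmax(npv.sens1[, 1], npv.sens1[, length(sens.range)])
print(names(which(overlapping)))

With the values reported above, this flags peak.units.sold and time.to.peak, matching the visual inspection.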

Normally, this list of uncertainties should be graphed as a vertical floating bar chart such that the following are the case.
  • Each bar represents the range of variation caused by an uncertainty around the mean value of a given decision .

  • The order of the uncertainties follows a declining order of importance as determined by the range between the p10 and p90 of each variable.

Overlaying both of these charts provides a better visual cue about which uncertainties should be deemed critical. Here is the code I use to produce the iconic tornado charts for each decision to obtain that visual insight. You can append this to the end of the Sensitivity_Analysis.R file.

# Find the rank order of the uncertainties by the declining range of variation
# they cause in the NPV. Base this on the decision with the highest mean value.
mean.bdm.list <- c(m.bdm1, m.bdm2)
bdm.sensitivity.list <- list(npv.sens1, npv.sens2)
npv.sens.rank <-
  order(abs(bdm.sensitivity.list[[which.max(mean.bdm.list)]][, 1] -
              bdm.sensitivity.list[[which.max(mean.bdm.list)]][, length(sens.range)]),
        decreasing = FALSE)
# Reorder the variables in the sensitivity arrays and names array by the rank
# order.
ranked.npv.sens2 <- npv.sens2[npv.sens.rank, c(1, length(sens.range))]
ranked.npv.sens1 <- npv.sens1[npv.sens.rank, c(1, length(sens.range))]
# Plot the tornado chart for Decision 2
par(mai = c(1, 1.75, .5, .5))
barplot(
  t(ranked.npv.sens2) - m.bdm2,
  main = "NPV Sensitivity to Uncertainty Ranges",
  names.arg = rownames(ranked.npv.sens2),
  col = "blue",
  xlim = c(min(npv.sens1, npv.sens2), max(npv.sens1, npv.sens2)),
  xlab = "Decision2 NPV [$M]",
  beside = TRUE,
  horiz = TRUE,
  offset = m.bdm2,
  las = 1,
  space = c(-1, 1),
  cex.names = 1
)
# Plot the tornado chart for Decision 1
par(mai = c(1, 1.75, .5, .5))
barplot(
  t(ranked.npv.sens1) - m.bdm1,
  main = "NPV Sensitivity to Uncertainty Ranges ",
  names.arg = rownames(ranked.npv.sens1),
  col = "red",
  xlim = c(min(npv.sens1, npv.sens2), max(npv.sens1, npv.sens2)),
  xlab = "Decision1 NPV [$M]",
  beside = TRUE,
  horiz = TRUE,
  offset = m.bdm1,
  las = 1,
  space = c(-1, 1),
  cex.names = 1
)

Note that we use the rank ordering for Decision 2 as the basis for ordering the display of uncertainty effects across decisions. This keeps the uncertainties across decisions on the same row even if the rank ordering differs between decisions. Notice also that we keep the colors consistent with the original CDF charts, and we set the x-axis limits (xlim = c(min(npv.sens1, npv.sens2), max(npv.sens1, npv.sens2))) for both graphs to span the full range of sensitivity across all the decisions. This latter chart parameter setting ensures that both charts are scaled across the same relative range and that the relative width of the bars for each chart remains consistent.

In the tornado charts shown in Figure 15-4, overlapping bars across decisions reveal critical uncertainties that are candidates for VOI analysis. In this case, peak.units.sold and time.to.peak present themselves as the critical uncertainties .
Figure 15-4. Tornado sensitivity charts for Decision 2 (left) vs. Decision 1 (right)

VOI Algorithms

Following the pattern that I established from the beginning of this discussion, we develop the code for VOI using a simple three-branch decision tree. The purpose is to reinforce our understanding of the method. However, because this method can mask over subtleties in the resultant distributions, we finish the discussion by developing the code that goes much further to preserve those details.

Coarse Focus First

Recall that the CDF curves for the decision values we produced when we first ran Business_Decision_Model.R demonstrated the effect of all the uncertainties acting on the NPV for each decision. Now that we know which uncertainties need our attention for VOI analysis, we need to isolate their effects from those of the other uncertainties. This is easy enough to do because we already did something similar in Chapter 14 by computing the reversed decision tree with an analog matrix calculation. Effectively, we act as if we have prior knowledge about which outcome will occur more to our favor in many possible future states, and we take the best course of action in each iterated state. The mean value of this new distribution would be the value of having perfect information prior to making a decision. The difference between this value and the highest decision mean value without the prior information is the rational maximum we should be willing to pay for that prior knowledge, the VOI.

Again, run the following Value_of_Information_1.R code after Business_Decision_Model.R and Sensitivity_Analysis.R so that the required values remain in the R workspace.

# Define a vector of the extended Swanson-Megill probability weights. These
# correspond to the values in the sens.range vector.
prob.wts <- c(0.3, 0.4, 0.3)
# Create the 3x3 matrix of the products of the uncertainty branch probabilities.
probs.branch.matrix <- prob.wts %*% t(prob.wts)
# Typically, we run the business model for each decision with the critical uncertainty
# peak.units.sold set to its respective p10, p50, p90 values. Find the mean NPV
# of the business model at each of these points. However, we already did this in
# the Sensitivity_Analysis.R script, so all we need to do is use the slice of
# the sensitivity table associated with the peak.units.sold row.
bdm1.sens <- npv.sens1[1, ]
bdm2.sens <- npv.sens2[1, ]
# Create a 3x3 matrix of the values in the bdm1.sens and bdm2.sens vectors.
# Transpose the values in the second matrix
bdm1.sens.matrix <- array(rep(bdm1.sens, 3), dim = c(3, 3))
bdm2.sens.matrix <- t(array(rep(bdm2.sens, 3), dim = c(3, 3)))
# Find the parallel maximum value between these matrices. This represents
# knowing the maximum value given the prior information about the combinations
# of the outcomes of each uncertainty.
bdm.sens.prior.info <- pmax(bdm1.sens.matrix, bdm2.sens.matrix)
# Find the expected value of the matrix that contains the decision values with
# prior information. This is the value of knowing the outcome before making a
# decision.
bdm.prior.info <- sum(bdm.sens.prior.info * probs.branch.matrix)
# Find the maximum expected decision value without prior information.
bdm.max.curr.info <- max(mean(bdm1), mean(bdm2))
# The value of information is the net value of knowing
# the outcome beforehand compared to the decision with
# the highest value before the outcome is known.
val.info <- (bdm.prior.info - bdm.max.curr.info)
print("Decision Value [$M]")
value.report <- list("Prior Information" = signif(bdm.prior.info, 3),
                              "Current Information" = signif(bdm.max.curr.info, 3),
                              "Value of Information" = signif(val.info, 3))
print(value.report)

The final results for the VOI analysis produce the following report.

[1] "Decision Value [$M]"
$`Prior Information`
[1] 3.2
$`Current Information`
[1] 2.91
$`Value of Information`
[1] 0.291

This implies that if we could buy perfect information about the outcome of peak.units.sold prior to making a decision, we should be willing to spend no more than approximately $300,000 to gain that insight. As we will soon discuss, this value is a coarse focus result. We might be able to improve this value some with a fine focus approach of including more granularity in our uncertainty branches.

We can also interpret VOI in a slightly different manner. Imagine that we are poised at just the point in time before we commit to a decision. In some sense two potential universes exist ahead of each of our decisions, and each universe evolves along multiple potential alternate routes depending on how each conditional uncertainty manifests itself. Suppose that each route has a coordinate pair associated with it indicating the decision and route such that the first possible route for Decision 1 would be (d, i) = (1, 1), and the first possible route for Decision 2 would be (d, i) = (2, 1). The next possible set of routes would be (d, i) = (1, 2) and (d, i) = (2, 2), and so forth for each iteration of our simulated universes. A clairvoyant’s crystal ball makes it possible for us to compare routes (1, i) and (2, i) simultaneously for free with the added benefit of observing only the variation on the NPV of each decision due to the peak.units.sold. Even though Decision 2 has a greater average NPV than Decision 1, there will be some future routes in which the NPV of Decision 1 exceeds that of Decision 2. Given that we prefer futures with higher values and that we are (hopefully) rational, we will always choose the future route (d, i) with the highest value. As a consequence, the resulting value distribution of chosen routes will manifest a higher average NPV than Decision 2. The average incremental improvement in value across those rationally chosen future routes would equal approximately $300,000.

The Finer Focus

The approach I have followed so far relies on the extended Swanson-Megill probability weights to approximate the probabilities of the outcomes of the p10, p50, p90 branches of the decision tree. As approximations, these weights do not strictly conform to the required probability weights of any arbitrary distribution such that its mean and variance are perfectly preserved. The extended Swanson-Megill weights are “rules of thumb” weights for estimating the mean of a distribution, not exact predictors of it.

Of course, the only way to have an exact predictor of an uncertain event’s expected value is to have all the possible samples associated with it (which could be impossible) or to have a symmetry or structure about the event that dictates the expected value, as in the case with fair coins, dice, or a deck of cards. However, we could improve the precision of our calculation simply by using more sensitivity points that represent discrete bins along the distribution of the critical uncertainty instead of predefined quantiles. The bins are analogs of decision tree pathways just as the quantile points were. Given that, we can replicate our matrix approach, but in a more general fashion.

Once our sensitivity analysis identifies a critical uncertainty (i.e., peak.units.sold), we can use it for the more granular analysis. We start by finding the histogram of the critical uncertainty for each decision. (If you do not want to plot the histograms, set the plot parameter to FALSE.)

# Find the histogram of the critical uncertainty for each decision pathway.
dec1.branch.hist <- hist(dec1.uncs$peak.units.sold, plot = TRUE)
dec2.branch.hist <- hist(dec2.uncs$peak.units.sold, plot = TRUE)

From these histograms, we need appropriate values of the critical uncertainty to serve as test values for finding the contingent NPV values. The plot of the histogram is based on the breakpoints in the domain of the uncertainty, but we don't need the breakpoints; we need values that represent the bins demarcated by the breakpoints. Fortunately, R provides this information in the calculation of the histogram and stores it in the $mids list element.

# Find the midpoints of the bins.
dec1.branch.bins <- dec1.branch.hist$mids
dec2.branch.bins <- dec2.branch.hist$mids

The histogram returns the following values for the Decision 1 and 2 branch bins:

> dec1.branch.bins
[1]  6500  7500  8500  9500 10500 11500 12500 13500 14500 15500 16500 17500
[13] 18500 19500 20500 21500 22500 23500 24500 25500 26500
> dec2.branch.bins
[1]  1000  3000  5000  7000  9000 11000 13000 15000 17000 19000 21000 23000
[13] 25000 27000 29000 31000 33000 35000 37000

Note that the first set of bins is longer than the second set.

Next, we need to find a vector of the frequency of the bins in the histograms. We accomplish this by using the $counts values for each bin, then dividing them by the number of simulation samples.

dec1.branch.cnts <- dec1.branch.hist$counts
dec1.branch.probs <- dec1.branch.cnts / samps

dec2.branch.cnts <- dec2.branch.hist$counts
dec2.branch.probs <- dec2.branch.cnts / samps

We observe the following frequency values for each bin value.

 > dec1.branch.probs
[1] 0.01893333 0.02743333 0.02530000 0.02740000 0.15906667 0.15916667
[7] 0.11796667 0.07133333 0.07136667 0.07366667 0.07326667 0.07350000
[13] 0.01326667 0.01260000 0.01213333 0.01353333 0.01183333 0.01096667
[19] 0.01193333 0.01233333 0.00300000
 > dec2.branch.probs
[1] 0.02296667 0.02220000 0.02163333 0.02183333 0.05960000 0.10106667
[7] 0.09976667 0.10246667 0.10070000 0.10110000 0.09603333 0.10083333
[13] 0.05650000 0.01743333 0.01683333 0.01673333 0.01703333 0.01636667
[19] 0.00890000

Each element in the respective vector represents the probability that the given bin, a branch in the decision tree, will manifest; therefore, each of these vectors should sum to 1.
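A quick check (my own addition) confirms this for the vectors above.

# Each vector of bin probabilities should sum to 1 because every simulation
# sample falls into exactly one histogram bin.
stopifnot(isTRUE(all.equal(sum(dec1.branch.probs), 1)),
          isTRUE(all.equal(sum(dec2.branch.probs), 1)))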

Now we can find the sensitivity of the decision's expected NPV by iterating across the uncertainty bins, using each bin's midpoint as the value for peak.units.sold in the CalcBizModel() function. Make sure to replace the peak.units.sold parameter with a vector of the current bin midpoint repeated to the sample length (i.e., dec1.uncs$peak.units.sold -> rep(dec1.branch.bins[b], samps)).

# Find the sensitivity of the NPV to the midpoints of the bins in each
# uncertainty.
bdm1.sens <- c(0)
for (b in 1:length(dec1.branch.bins)) {
  bdm1.sens[b] <- mean(
    CalcBizModel(
      time,
      samps,
      rep(dec1.branch.bins[b], samps),
      dec1.uncs$time.to.peak,
      dec1.uncs$price,
      dec1.uncs$sga,
      dec1.uncs$cogs,
      tax.rate,
      dec1.uncs$investment,
      disc.rate
    )$npv / 1e6
  )
}
bdm2.sens <- c(0)
for (b in 1:length(dec2.branch.bins)) {
  bdm2.sens[b] <- mean(
    CalcBizModel(
      time,
      samps,
      rep(dec2.branch.bins[b], samps),
      dec2.uncs$time.to.peak,
      dec2.uncs$price,
      dec2.uncs$sga,
      dec2.uncs$cogs,
      tax.rate,
      dec2.uncs$investment,
      disc.rate
    )$npv / 1e6
  )
}

We observe the following results:

 > bdm1.sens
[1] 1.072199 1.255956 1.439714 1.623472 1.807229 1.990987 2.174744 2.358502
[9] 2.542260 2.726017 2.909775 3.093533 3.277290 3.461048 3.644805 3.828563
[17] 4.012321 4.196078 4.379836 4.563594 4.747351
> bdm2.sens
[1] 0.01501496 0.37361250 0.73221004 1.09080758 1.44940511 1.80800265
[7] 2.16660019 2.52519773 2.88379527 3.24239281 3.60099034 3.95958788
[13] 4.31818542 4.67678296 5.03538050 5.39397804 5.75257558 6.11117311
[19] 6.46977065

Now we follow the pattern in the simpler example. Create the MxN matrix of the Cartesian products of the uncertainty branch probabilities.

probs.branch.matrix <- dec1.branch.probs %*% t(dec2.branch.probs)
Figure 15-5 graphically illustrates this last step.
Figure 15-5. The matrix of the product of the bin frequencies of the critical uncertainty under current inspection is the Cartesian product of those bin frequencies

Next, create an MxN matrix of the values in the bdm1.sens and bdm2.sens vectors. Transpose the values in the second matrix.

bdm1.sens.matrix <-
  array(rep(bdm1.sens, length(dec2.branch.probs)),
        dim = c(length(dec1.branch.probs), length(dec2.branch.probs)))
bdm2.sens.matrix <-
  t(array(rep(bdm2.sens, length(dec1.branch.probs)),
          dim = c(length(dec2.branch.probs), length(dec1.branch.probs)
  )))
Figure 15-6 illustrates the parallel structure of these matrices, which hold the decision values contingent on each critical uncertainty's bin midvalues.
Figure 15-6. The matrix of the decision values of each possible branch combination that occurs on the outcome of the critical uncertainties' midvalues

Figure 15-7 illustrates finding the parallel maximum value between these matrices. Recall that this represents knowing the maximum value given the prior information about the combinations of the outcomes of each uncertainty.

bdm.sens.prior.info <- pmax(bdm1.sens.matrix, bdm2.sens.matrix)
Figure 15-7. The decision value of each possible branch outcome is found by the pairwise (or parallel) max of each possible outcome combination

Find the expected value of the matrix that contains the decision values with prior information (Figure 15-8). This is the value of knowing the outcome before making a decision.

bdm.prior.info <- sum(bdm.sens.prior.info * probs.branch.matrix)
Figure 15-8. The steps outlined in Figures 15-5 through 15-7 are summarized by this expression

Find the maximum expected decision value without prior information.

bdm.max.curr.info <- max(mean(bdm1), mean(bdm2))

The VOI is the net value of knowing the outcome beforehand compared to the decision with the highest value before the outcome is known (Figure 15-9).

val.info <- (bdm.prior.info - bdm.max.curr.info)
Figure 15-9. Finally, the VOI is the net of the decision value with prior information and the decision value with current information

This more granular analysis returns the following report:

[1] "Decision Value [$M]"
$`Prior Information`
[1] 3.28
$`Current Information`
[1] 2.91
$`Value of Information`
[1] 0.372

As in the previous sections, I placed the raw script code for this section in a file called Value_of_Information_2.R. This file should also be run after the Sensitivity_Analysis.R script.

Notice that the Prior Information and Value of Information values are a little higher than those obtained by using our coarse focus extended Swanson-Megill branches. Of course, this is because our fine-focus use of more branches extends into the tails of the critical uncertainties (which might extend much farther than the truncated inner 80th percentile prediction interval), and because we resolve the whole domain of the critical uncertainty with finer detail than the ham-fisted discrete blocks of the extended Swanson-Megill weights.

As you might have inferred, I liken the VOI analysis process to that of using an optical microscope to focus attention on interesting details. Using the extended Swanson-Megill branches is perfectly fine for initially identifying the critical uncertainties as one would use the coarse focus knob. The final approach described here is like using the fine focus knob to gain insights from a more granular inspection of the distribution of probability.

Please note that the approach described in this tutorial applies to calculating VOI on continuous uncertainties by providing a means to discretize the uncertainty into a manageable and informative set of branches. If the critical uncertainty is discrete to begin with, there’s no need to go through the discretization process. One would just use the assessed probabilities for the specific discrete branches.

Is hist() the Best Way to Find the Uncertainty Bins?

In this last section, I chose to use the hist() function to discretize the critical uncertainty. This approach is not necessary. In fact, you could use your own defined set of breakpoints that span the full range of the uncertain variable's behavior, maybe with block sizes of your own choosing or block sizes determined by equally spaced probability, and then find the associated frequencies and midpoints within those blocks. Although that might give you a sense of accomplishment from going through the process of writing the code, let me suggest that although VOI is important, it is not a value that needs to be pursued to an extremely high degree of precision. Its purpose is to point us in the right direction to resolve decision ambiguity, to limit the unnecessary expenditure of resources to resolve that ambiguity, and to ensure that whatever resources we do apply are applied in a resource-efficient manner. The hist() function finds an appropriate discretization well enough.
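For completeness, here is a minimal sketch of the custom-breakpoint alternative just described, using ten equal-probability bins for Decision 2's peak.units.sold. The object names (n.bins, eq.breaks, eq.mids, eq.probs) are hypothetical and not part of the book's scripts.

# Break the uncertainty into bins that each carry the same probability mass.
n.bins <- 10
eq.breaks <- quantile(dec2.uncs$peak.units.sold,
                      probs = seq(0, 1, length.out = n.bins + 1))
# Midpoints of the equal-probability bins in the domain of the uncertainty.
eq.mids <- (head(eq.breaks, -1) + tail(eq.breaks, -1)) / 2
# By construction, each bin carries approximately 1/n.bins of the probability.
eq.probs <- rep(1 / n.bins, n.bins)
# These midpoints and probabilities could then be fed through the same
# bin-by-bin sensitivity loop and matrix steps shown earlier.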

To make the order of operations needed to reproduce the code and its results clear, you can follow the flowchart in Figure 15-10. Appendix E also contains the full uninterrupted source code in this order.
Figure 15-10. The steps outlined in this chapter to calculate value of information according to the source code files that are used

Concluding Comments

The world is full of uncertainties that frustrate our ability to choose clearly. If we start listing what those might be, it doesn't take long to become overwhelmed. The problem is that we live in an economic universe, a place where there are an unlimited number of needs and desires, but only a limited amount of available resources to address those needs. Finding the right information of sufficient quality to clarify the impact of uncertainties is no less an economic concern than determining the right allocation of capital in a portfolio to achieve desired returns.

An interesting irony has arisen in the information age: We are swamped in data, yet we struggle to comprehend its information content. We might think that having access to all the data we now have would significantly reduce the anxiety of choosing clearly. To be sure, advances in data science and Big Data management have produced some great insights for some organizations that have invested in those capabilities, but whether investments in data science and Big Data programs are mostly economically productive remains an open question.4 The problem, it seems, is that data still need to be parsed, cleaned, and tested for their contextual relationship to the strategic questions at hand before systematic relationships between inputs and outputs are understood well enough to reduce uncertainty about which related decisions are valuable. Not only are we concerned about which uncertainties matter, we now have to ask which data matter, and it isn't always clear from the beginning of such efforts that applying resources to understand which data matter will, in fact, yield valuable understandings. Regardless, the most important activity any kind of analyst can do to improve the quality of his or her efforts is to frame the analytic problem well before cleaning any data, getting more data, or settling on the best analysis and programming environment. Indeed, solving the right problem accurately is much more important than solving irrelevant problems with high precision or technical sophistication. The advent of the age of Big Data has, perhaps, compounded our original problem.

Every day, I enjoy three shots of espresso. I hope the summarizing shots that follow are just as enjoyable to you in your quest to improve your business case analysis skill set.

Espresso Shot 1

Evaluating VOI helps us address the issues of living in an economic universe. VOI focuses and concentrates our attention on the issues that matter, like an espresso of information. As you noticed in the tornado charts we developed, the width of the bars displays a type of Pareto distribution; that is, the amount of variation observed in the objective measure that is attributable to any given uncertainty appears to decline in an exponential manner. The effect demonstrated here did not derive from cleverly chosen values to emphasize a point. This pattern repeats itself regularly. Personally, over dozens and dozens of decision analysis efforts that included anywhere from 10 to 100 uncertainties, I have observed that the largest amount of uncertainty in the objective is attributable to between 10% and 20% of the uncertainties in question. So here's an important understanding: Not all uncertainties we face are as important as the level of worry we initially lend them. The twist of lemon peel is that what we were originally biased to focus on the most actually matters the least.

Espresso Shot 2

The tornado sensitivity analysis delivers a bonus feature. When we compare the charts between important decisions, we see that not all of the most significant uncertainties really matter either. For the purpose of choosing clearly, not every significant uncertainty is critical. Again, from my experience, of the 10% to 20% of uncertainties that are significant, generally no more than one or two are critical.

Espresso Shot 3

The first two shots of information espresso should significantly reduce our worry and anxiety about what can cause us harm and regret as well as reduce the amount of activity we spend trying to obtain better information about them. Now we have a third shot of espresso: When we have identified the critical uncertainties, we can know just how much we should rationally budget to get that higher quality information. VOI analysis, properly applied, should save us worry, time, and money, making us more economically efficient and competitive with the use of our limited resources. As my grandfather used to say, “Don’t spend all of that in one place.” When you are called to spend it, though, don’t spend more than you need to. VOI tells us what that no-more-than-you-need-to actually is.
