The Stochastic Model Approach

If a parametric model is too static, or you’d rather see the basis for the parametric coefficients in use, you can use a stochastic model. A stochastic model takes into account the random variations and gives you a probability distribution for an answer.

Perhaps you estimate the duration of each story (or assume a constant duration), and then you model a random delta around that. Or you estimate the minimum duration of each story and model a random delay added to that. Either gives you a probability distribution for completion of that story. And if you combine the probability distributions for all the stories, you get a probability distribution for the release. In some environments, the delays swamp the time actually spent building the product.
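
As a minimal sketch of the second approach, assume each story has an estimated minimum duration plus a random delay drawn from an exponential distribution. The story durations, the mean delay, and the function names here are hypothetical, chosen only for illustration. Summing one random draw per story, many times over, approximates the distribution for the release.

```python
import random
import statistics

def simulate_release(min_days, mean_delay_days, trials=10_000, seed=1):
    """Return simulated release durations (sum of story durations) in days."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        total = 0.0
        for minimum in min_days:
            # Assumption: each delay is exponential with the given mean.
            total += minimum + rng.expovariate(1.0 / mean_delay_days)
        totals.append(total)
    return totals

def percentile(values, pct):
    """Simple nearest-rank percentile of a list of numbers."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(pct / 100 * len(ordered)))
    return ordered[index]

stories = [2, 3, 1, 5, 2]  # hypothetical minimum durations in days
totals = simulate_release(stories, mean_delay_days=1.5)
print("mean:", round(statistics.mean(totals), 1))
print("50th percentile:", round(percentile(totals, 50), 1))
print("85th percentile:", round(percentile(totals, 85), 1))
```

Reporting a percentile rather than a single number is the point of the exercise: the 85th percentile is a duration the simulated release beat in about 85% of the trials.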

Modeling Delays

Casey stopped by Kai’s office. "I’ve got an idea of how we can better model the delays when we need information from the client. Most of the requests for more information go through you, so I was thinking that we could mine your email archives for the data. If you can go through measuring the length of time between when you contact them and when they respond with the information, I can do some analysis on the distribution of those delays. And if you can separate those according to client, I can also analyze how the distribution varies from client to client."

Kai looked worried. "That would be really useful information to have, wouldn’t it? But I don’t know how to collect that information without scanning through all my emails manually. I’m worried I won’t have time to do that. When do you need it?"

"Oh, there’s no deadline. This can be an ongoing project between us. How about I set up a spreadsheet on a network drive where you can enter data in three columns: company name, query date, and response date? I can use this as input to determine the distribution of response times across all queries, and if there’s enough data, perhaps get a distribution of the median response time from company to company. You could work on this when you have time to kill. I’ll crunch the numbers periodically, and we can take a look at them. I think it would help us characterize the uncertainty we currently have in our quote estimates."

"I’m in! Some companies really keep us waiting, and it’s been a problem several times in the past year."

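The number crunching Casey describes could start as simply as the sketch below. It assumes the spreadsheet is exported as CSV with the three columns named company, query_date, and response_date, and that dates are in ISO format. The column names and the sample rows are invented for illustration.

```python
import csv
import io
import statistics
from collections import defaultdict
from datetime import date

# Hypothetical sample data standing in for the exported spreadsheet.
SAMPLE = """company,query_date,response_date
Acme,2024-01-03,2024-01-05
Acme,2024-02-10,2024-02-11
Globex,2024-01-08,2024-01-22
Globex,2024-03-01,2024-03-09
Globex,2024-04-15,2024-04-18
"""

def response_delays(csv_file):
    """Map each company to its list of response delays in days."""
    delays = defaultdict(list)
    for row in csv.DictReader(csv_file):
        asked = date.fromisoformat(row["query_date"])
        answered = date.fromisoformat(row["response_date"])
        delays[row["company"]].append((answered - asked).days)
    return dict(delays)

delays = response_delays(io.StringIO(SAMPLE))
all_delays = [d for per_company in delays.values() for d in per_company]
print("overall median:", statistics.median(all_delays))
for company, values in sorted(delays.items()):
    print(company, "median:", statistics.median(values), "n =", len(values))
```

With only a handful of rows per company the medians are rough, which is why Casey hedges with "if there’s enough data."
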
Be careful about the model of randomness you choose. Those of us who only studied a little bit of probability in school probably default to a normal distribution. This is generally fine for phenomena in the natural sciences where the variation is evenly matched in both directions. The time it takes to accomplish a task is not symmetrically distributed, however. It’s much easier to have an unanticipated delay than an early completion. A task can never take less than zero time, but it can take arbitrarily long.
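
One way to see the problem with a symmetric model is to sample from it. The sketch below uses hypothetical parameters (a task averaging 3 days with a standard deviation of 2 days). It counts how often a normal model produces an impossible negative duration, then draws from a lognormal distribution with the same mean and standard deviation. The lognormal is used here only as one example of a right-skewed distribution bounded at zero, not as a recommendation.

```python
import math
import random
import statistics

rng = random.Random(7)
MEAN_DAYS, SD_DAYS, N = 3.0, 2.0, 20_000  # hypothetical task parameters

# Symmetric model: some samples fall below zero days.
normal = [rng.gauss(MEAN_DAYS, SD_DAYS) for _ in range(N)]

# Right-skewed model: lognormal parameters matched to the same mean and SD.
sigma2 = math.log(1 + (SD_DAYS / MEAN_DAYS) ** 2)
mu = math.log(MEAN_DAYS) - sigma2 / 2
skewed = [rng.lognormvariate(mu, math.sqrt(sigma2)) for _ in range(N)]

negative_share = sum(d < 0 for d in normal) / N
print(f"normal: {negative_share:.1%} of samples are negative")
print(f"lognormal: min={min(skewed):.2f}, "
      f"median={statistics.median(skewed):.2f}, "
      f"mean={statistics.mean(skewed):.2f}")
```

In the skewed model the median falls below the mean, reflecting the long right tail: most tasks finish a little early, and a few finish very late.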

If we model our work as a series of tasks of known duration with normally distributed random delays between them, then we end up with a probability distribution that has a long tail to the right. Troy Magennis demonstrates in The Economic Impact of Software Development Process Choice -- Cycle-Time Analysis and Monte Carlo Simulation Results [Mag15] that this behavior produces a Weibull distribution.

Troy Magennis describes how he uses Monte Carlo simulation to estimate Scrum (timeboxed) and Kanban (non-timeboxed) software development projects in Forecasting and Simulating Software Development Projects: Effective Modeling of Kanban Scrum Projects using Monte-carlo Simulation [Mag11]. Note that you still have to build the mathematical model to be simulated. For example, in a Kanban simulation, you’ll need to provide upper and lower bounds of cycle times for a unit of work for each work stage in your process. If your work queue has multiple sizes of items or depends on certain specialties in the work process, you’ll need a separate set of cycle-time estimates for each category you use. You can also specify the frequency and range of impact of events such as added scope, work blockages for external events, and remediation of defects. When you run the simulation, the simulator goes through many iterations using random values within the ranges of your model. The result is a probability density of your completion date. If your model specification is accurate, this will tell you the probability of hitting a particular date, or the most probable completion date. It can also tell you which events likely have the most impact on that date.
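
To make the shape of such a model concrete, here is a deliberately simplified sketch. It is not Magennis’s simulator. The stage names, the cycle-time bounds, and the blockage frequency and range are all hypothetical, and the model processes one item at a time, ignoring the parallel work a real Kanban system allows.

```python
import random

# Hypothetical model: (low, high) cycle-time bounds in days for each stage.
STAGES = {"analysis": (0.5, 2.0), "build": (1.0, 5.0), "test": (0.5, 3.0)}
BLOCK_PROBABILITY = 0.15   # assumed chance an item is blocked externally
BLOCK_DAYS = (1.0, 10.0)   # assumed range of a blockage, in days

def simulate_once(rng, n_items):
    """Total days to finish n_items, one at a time (no parallel work)."""
    total = 0.0
    for _ in range(n_items):
        for low, high in STAGES.values():
            total += rng.uniform(low, high)
        if rng.random() < BLOCK_PROBABILITY:
            total += rng.uniform(*BLOCK_DAYS)
    return total

def forecast(n_items, trials=5_000, seed=42):
    """Run many iterations and return the sorted completion times."""
    rng = random.Random(seed)
    return sorted(simulate_once(rng, n_items) for _ in range(trials))

def chance_of_finishing_within(results, days):
    """Fraction of simulated runs that finished in the given number of days."""
    return sum(r <= days for r in results) / len(results)

results = forecast(n_items=20)
print("median days:", round(results[len(results) // 2], 1))
print("85% confident by day:", round(results[int(0.85 * len(results))], 1))
print("chance of finishing in 150 days:",
      chance_of_finishing_within(results, 150))
```

Each call to simulate_once is one iteration with random values inside the model’s ranges. The sorted results answer both kinds of question: the probability of hitting a given date, and the date that corresponds to a given level of confidence.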

So, if we have to estimate the sizes of our stories, ranges of cycle times, and frequency of events, what’s the advantage of stochastic forecasting? It combines all of these individual estimates into probabilities. When actual events are outside the expected ranges, you can adjust your parameters and recalculate. You also know in more detail what aspect is outside your expectations. For example, if the defect rate is higher than what you modeled, you can measure the impact by adjusting the model, and you can focus on behavioral changes to bring the defect rate within your expected tolerances. You might, for example, increase your cycle-time estimates to allow more development time for preventing defects, and see the probable results of that intervention.
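
As a rough illustration of that kind of recalculation, the sketch below reruns a small, hypothetical model with three parameter sets: the defect rate originally modeled, a higher observed rate, and an intervention that lengthens build time to bring the defect rate back down. Every number in it is an assumption made for the example.

```python
import random

def p85_days(defect_rate, build_days=(1.0, 4.0), n_items=30,
             trials=4_000, seed=3):
    """85th-percentile completion time for a hypothetical serial workflow."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        total = 0.0
        for _ in range(n_items):
            total += rng.uniform(*build_days)
            if rng.random() < defect_rate:
                total += rng.uniform(0.5, 3.0)  # assumed rework per defect
        totals.append(total)
    totals.sort()
    return totals[int(0.85 * trials)]

modeled = p85_days(defect_rate=0.10)
observed = p85_days(defect_rate=0.30)
# Hypothetical intervention: slower builds, defect rate back to 10%.
intervention = p85_days(defect_rate=0.10, build_days=(1.2, 4.4))

print("modeled defect rate:  ", round(modeled, 1), "days")
print("observed defect rate: ", round(observed, 1), "days")
print("with intervention:    ", round(intervention, 1), "days")
```

Comparing the three figures shows how much of the forecast the defect rate accounts for, and whether the proposed trade of build time for quality pays off under these assumptions.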
