Why can estimates not be accurate?

Estimating software development effort is no longer just a well-established routine. I'm afraid it is evolving into a branch of science! For decades now, software teams have tried various methods to predict timelines.

About 15 years ago, I used Function Points as a way to assess the size of an application, based on how much input/output data the application had to handle. This was at a stage where a system view of software was more prevalent and applications were mostly about data flow. There was little user input. User experience was never discussed. This meant that software developers could be given requirement specifications detailing which fields to input and what to validate for each input. Implementation specifics were closely tied to the requirement specifications. A change in implementation specifics, or the addition of a new field to capture, would change the size of the application in terms of Function Points. Managing scope was the most important task at hand. We sized scope, not effort.

The next variation was Use Case Points. This was a bridge of sorts between the system's view and the user's view of an application. It tracked all the use cases of a system, but the sizing was similar to Function Points. We counted user transactions, along with some weighting for technical complexity and so on, but these were still waterfall methods with bulky requirement specifications that instructed developers on exactly what to build. I have written requirement documents with UML (Unified Modeling Language) diagrams that showed use cases, classes, state charts, and system touchpoints. Once again, this was less about coding effort and more about project sizing. Scope negotiation was again critical. Scope had to be finalized and signed off before handover to development. Any changes thereafter would go through an elaborate change request approval process.

The problem with Function Points and Use Case Points was that they depended on knowing all the implementation details beforehand. Budgets and timelines were based on the application size, which in turn was determined by a system that locked in an implementation plan even before development began. Any new discovery during development had to go through an elaborate change management process, resulting in rework, lost time, and too many approvals, since it meant a change in timelines and budgets.

The Agile Manifesto was created partly to counter these drawbacks. It called for responding to change over following a plan. This was a refreshing change from the way projects were previously run. There were no bulky requirement documents. There was no upfront freeze on implementation specifics. This gave more power to software developers, since the implementation could now evolve to meet needs and respond to feedback. Still, businesses needed to understand the cost of building and of subsequent changes. Someone had to tell the business how much this would cost and how long it would take, because budgets were limited or had to be decided early on. If we build incrementally, how can we establish the size of the scope upfront? So, effort estimation and scope management became critical.

It is easy to argue the case for estimating in days. Why can't we simply state how many days or weeks it would take to build a given scope? But what does a day mean? Is it eight hours of uninterrupted work, or is it more like six and a half hours, given that I have to attend meetings, respond to emails, and eat? Developers had to agree on whether to estimate in man-days or ideal days. They had to remember this decision every time they estimated (which was pretty much every week or two). Each time someone new joined the team, they had to be brought on board with this agreement. This could slow the team down instead of increasing productivity.

There were other open questions too: what if a task took only three hours to complete? What if something else took 10 days? Would a less experienced developer take the same time to code something as a more experienced or skilled one?

Estimating effort could be more accurate for a smaller scope with more detail on the domain and user context. So, user stories became necessary. User stories follow the INVEST principles: they are meant to be Independent, Negotiable, Valuable, Estimable, Small, and Testable. Still, estimating effort in days was not accurate, because of all the preceding open questions. So, we had to somehow move from absolute effort to relative sizing. If we could agree that user story 1 was the size of a grape (small and easy to eat), then we could try to figure out whether user story 2 was also a grape, an orange (bigger, requiring more effort to eat), or a watermelon (much bigger and more complex to eat):

User story      Fruit size
User story 1    Grape
User story 2    Orange
User story 3    Orange
User story 4    Watermelon
User story 5    Orange
User story 6    Grape
User story 7    Grape

We could then use these sizes to establish how many grapes, oranges, and watermelons a team was able to eat in an iteration:

Iteration    # of grapes    # of oranges    # of watermelons
Week 1       1              1               -
Week 2       -              -               1
Week 3       2              -               -
Week 4       -              1               -

Now, if we assign a number to a grape, orange, or watermelon, it becomes a story point. For instance, we could say that a grape is 1, an orange is 2 (twice as big and complex as a grape), and a watermelon is 5 (more than twice as big and complex as an orange). We can replace the preceding table with story points and count the number of points:

Iteration    # of fruits            Story points
Week 1       1 grape + 1 orange     3
Week 2       1 watermelon           5
Week 3       2 grapes               2
Week 4       1 orange               2

This gives us team velocity (how many fruits we can eat in a given time, or how many story points we can complete in an iteration). We could then compare this over time, and it would tell us our burn rate: how many fruits do we eat, on average, every week?
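As a minimal sketch of this arithmetic (the fruit-to-point mapping and the iteration data are taken from the tables above; the helper names are illustrative, not from any particular tool):

```python
# A minimal sketch of the arithmetic above. The fruit-to-point mapping and
# the iteration data come from the tables; the helper names are illustrative.
STORY_POINTS = {"grape": 1, "orange": 2, "watermelon": 5}

iterations = {
    "Week 1": ["grape", "orange"],
    "Week 2": ["watermelon"],
    "Week 3": ["grape", "grape"],
    "Week 4": ["orange"],
}

def velocity(fruits):
    """Story points completed in one iteration."""
    return sum(STORY_POINTS[f] for f in fruits)

velocities = {week: velocity(fruits) for week, fruits in iterations.items()}
print(velocities)  # {'Week 1': 3, 'Week 2': 5, 'Week 3': 2, 'Week 4': 2}

# Burn rate: average story points completed per week.
burn_rate = sum(velocities.values()) / len(velocities)
print(burn_rate)   # 3.0
```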

Now, this seems like a great way to get away from thinking in actual days or hours, and instead focus on size, complexity, risks, resources needed, and so on. For instance, cutting a watermelon would need knives. The ideal way to create user stories would be to make them all as small as possible. If they were all grapes, everything would become so much easier. We could count all the grapes, assess our capacity to eat grapes, and then forecast how many grapes we could eat in the next six months. So, it appears that if we come up with the right user stories, we can get pretty good at estimating accurately. However, it isn't as easy as it sounds.
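Under that all-grapes assumption, the forecast is just multiplication. A hypothetical sketch, reusing the burn rate computed above (all numbers here are illustrative):

```python
# Hypothetical forecast, assuming every story is grape-sized (1 point)
# and the average burn rate holds steady. The numbers are illustrative.
burn_rate = 3.0   # average story points (grapes) per week, from the sketch above
weeks = 26        # roughly six months of weekly iterations

print(f"~{burn_rate * weeks:.0f} grape-sized stories in six months")

# The inverse question: how long to clear a backlog of 80 grapes?
backlog = 80
print(f"~{backlog / burn_rate:.1f} weeks to clear the backlog")
```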

There are many aspects that can influence our plans (which are based on estimates). Here are a few such aspects:

  • Team composition can change: The people who estimated stories as oranges may move out of the team and be replaced with new people. New people bring other perspectives and ideas. What the original team thought was an orange could turn out to be a grape, because there are now better ideas and easier ways to eat.
  • Technology and business context assumptions change: Technology changes can make it easier or harder to build something. We may have estimated a user feature as 10 watermelons and 5 oranges, assuming we had to build it from scratch, but our product could have matured to a state where we could integrate with existing software rather than build our own. The feature could now be 3 oranges.
  • Familiarity increases: A steady team of developers working on the same product for a long period gains a much higher level of familiarity. Their knowledge of the current way of doing things can help them find faster ways to implement. An orange can be eaten as easily as a grape.
  • Biases can kick in: No matter how much we try to position story sizing as an effort-sizing activity rather than time estimation, people naturally slip into thinking about effort in days and hours. Also, having an expert on the team can make people submissive, and consequently, conformity bias can kick in. Asking people to call out their estimates blind (as in Planning Poker, an estimation technique that helps teams arrive at consensus-based estimates; see the sketch after this list) can help here, but eventually team culture will determine whether the expert forces their ideas on the team. They will say, "I know how to do this. So, it is an orange if I say that it is an orange."
  • Decision fatigue sets in: Typical estimation sessions are conducted over a backlog of user stories. At the beginning, everyone comes in with a fresh mind and discussions are detailed. As time passes and the team has to make many decisions, there is a tendency to rush through the stories without discussing details. Estimates are thrown out just to get the meeting over with. A team member might say, "Some of us think it's an orange. Some of us think it's a grape. Ah, I'm too tired. Let's go with grape."

If the purpose of story point-based estimates is to forecast future velocity (how much scope we can deliver in a given amount of time), then all the preceding aspects hinder us from doing exactly that. There are too many variables affecting our forecasts. A side effect of tracking velocity is that the focus shifts from outcomes (did we deliver something of value?) to output (was the team productive? Did we eat as many fruits as we said we would?). We forget that eating grapes and oranges is not the goal.
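To make the blind call-out mentioned in the list concrete, here is a hypothetical sketch of one Planning Poker round. The deck values are the commonly used modified Fibonacci cards; the function and participant names are my own, not from any specific tool:

```python
# Hypothetical sketch of a Planning Poker round: everyone commits to an
# estimate privately, all cards are revealed at once, and a spread triggers
# discussion instead of silent deference to the loudest voice.
DECK = [1, 2, 3, 5, 8, 13]  # commonly used modified Fibonacci cards

def play_round(private_estimates):
    """Reveal all estimates together; return the consensus, or None to discuss."""
    revealed = sorted(private_estimates.values())
    low, high = revealed[0], revealed[-1]
    if low == high:
        return low  # consensus on this round
    # The lowest and highest estimators explain their reasoning, then re-vote.
    print(f"Spread {low}..{high}: outliers explain, then the team votes again.")
    return None

# Usage: estimates are collected before anyone speaks.
result = play_round({"Asha": 2, "Ben": 2, "Chen": 5})
if result is None:
    result = play_round({"Asha": 3, "Ben": 3, "Chen": 3})  # after discussion
print(f"Agreed story points: {result}")
```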

Budget allocation strategies may have to change if we do away with estimates. In an evolving product, for a business operating under ambiguous market conditions, budget planning needs to be iterative too. While we still need to know the cost of a feature idea, accurate effort estimates cannot be our only way to arrive at that cost.
