4. Project Management

Roles: Program Manager, Project Manager

“The deficiencies of the theory of the project and of the theory of management reinforce each other and their detrimental effects propagate through the life cycle of a project. Typically, customer requirements are poorly investigated at the outset, and the process of requirement clarification and change leads to disruption in the progress of the project. The actual progress starts to drift from the plan, the updating of which is too cumbersome to be done regularly. Without an up-to-date plan, the work authorization system transforms to an approach of informal management. Increasingly, tasks are commenced without all inputs and prerequisites at hand, leading to low efficiency or task interruption and increased variability downstream. Correspondingly, controlling by means of a performance baseline that is not based on the actual status becomes ineffective or simply counterproductive. All in all, systematic project management is transformed to a facade, behind which the job actually gets done, even if with reduced efficiency and lessened value to the customer.”1

—L. Koskela and G. Howell, “The Underlying Theory of Project Management Is Obsolete”


Figure 4.1 Without transparent data, project management can descend into a game of differing bets based on partial information and divergent perspectives of stakeholders. Poker is a good metaphor for this pattern.

“A Friend in Need” by C.M. Coolidge, c. 1870

The previous chapter addressed the gathering of requirements. This chapter focuses on managing a running project that implements those requirements. First, I’ll cover three concepts that are core to the value-up paradigm:

• Variance to distinguish in- and out-of-control projects

• Descriptive rather than prescriptive metrics

• Multiple dimensions of project health

VSTS applies these concepts with its work item database and metrics warehouse to give you a practical basis for value-up project management. To illustrate this aspect of VSTS, I work through a large number of examples in this chapter using reports from the metrics warehouse. These are “happy day” examples, in contrast to the “unhappy” troubleshooting examples that I’ll use in Chapter 9, “Troubleshooting the Project.”

Finally, in this chapter I cover estimation and triage from a value-up perspective. These two essential project management practices rely closely on the metrics and queries that VSTS enables.

Understanding Variation

Fundamental to the value-up paradigm is the concept of natural variation. Variation and its impact on quality were originally studied in manufacturing engineering and well taught by W. Edwards Deming:

Common causes and special causes. [Dr. Walter A. Shewhart of Bell Labs] saw two kinds of variation—variation coming from common causes and variation from special causes . . . . Common causes of variation stay the same day to day, lot to lot. A special cause of variation is something special, not part of the system of common causes . . . . Dr. Shewhart also saw two kinds of mistakes . . . .

Mistake #1. To react to an outcome as if it came from a special cause, when it actually came from common causes of variation.

Mistake #2. To treat an outcome as if it came from common causes of variation, when actually it came from a special cause.2

The same distinction between common-cause and special-cause variation applies to software projects. Processes that exhibit common-cause variation are in control; those with special-cause variation are out of control. In software projects, some things take longer than anticipated and some take less time. Some software integrates perfectly; in other cases, bugs appear and need to be fixed. Some scenarios delight customers exactly as hoped; others need to be refined with usability testing and learnings from iteration to iteration. These are usually common-cause variations.

Mistake #1 is tampering with an in-control process showing this natural variation. This tampering only increases variance and quickly sends a process out of control. Perhaps more importantly, it leads to micromanagement that demoralizes the team.

Deming uses a beautifully simple experiment to illustrate the effect of tampering.3 With a funnel, a marble, and a table, you point the funnel at a constant target and let the marble roll through many times (say 50), marking the spots where the marble falls. Then you repeat the experiment under rules that move the funnel after each drop to compensate for the error. Deming proposes three such correction rules that, on their own, sound plausible. The uncorrected first pass shows a constrained variance, while the corrected passes scatter or drift much more broadly.
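
You can reproduce the gist of the funnel experiment in a few lines of code. The sketch below is my illustration, not Deming’s apparatus: it assumes Gaussian noise and compares Rule 1 (never move the funnel), Rule 2 (shift the funnel to compensate for each error), and Rule 4 (aim the funnel at the last landing spot).

```python
import random
import statistics

def funnel(drops=50, noise=1.0, seed=1):
    """Deming's funnel: the target is 0, and each marble lands wherever
    the funnel currently points, plus some Gaussian noise."""
    rng = random.Random(seed)
    errors = [rng.gauss(0.0, noise) for _ in range(drops)]

    rule1 = errors[:]                  # Rule 1: never move the funnel
    rule2, pos = [], 0.0               # Rule 2: shift funnel by -error
    for e in errors:
        landing = pos + e
        rule2.append(landing)
        pos -= landing                 # compensate for the miss
    rule4, pos = [], 0.0               # Rule 4: aim at the last landing
    for e in errors:
        landing = pos + e
        rule4.append(landing)
        pos = landing                  # the funnel chases the marble

    for name, spots in (("rule 1", rule1), ("rule 2", rule2), ("rule 4", rule4)):
        print("%s: std dev %.2f" % (name, statistics.stdev(spots)))

funnel()
```

Rule 1 shows the tightest scatter; Rule 2 roughly doubles the variance, and Rule 4 wanders off as a random walk. That is the tampering penalty in miniature.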

At the same time, failing to identify and correct the special cause of a process going out of control can be disastrous (Mistake #2). This mistake leads to a vicious cycle of the process spinning further away from the desired target.

Determining variation has been too difficult for most software projects because the data have been too hard to collect. Even agile methods have suffered from this oversight. For example, the management pattern called “Yesterday’s Weather,” common to XP and Scrum, specifies that “you can’t sign up for more work in an iteration than you did in the previous iteration,”4 which potentially leads to lower estimates with each iteration. This flies in the face of a primary value of iterations—that they allow for continuous learning, course correction, and improvement.

So how do you know what normal variation is? You observe and measure the running process without tampering. Fortunately, VSTS with its instrumentation of the process makes this observation easy, and an iterative approach with small batch sizes and short intervals provides frequent opportunities for observation and, when necessary, correction. You can never distinguish every common-cause variance from every special-cause one, but you can get very close by watching the reports described later in the chapter and managing the issues and risks in the project.

Using Descriptive Rather Than Prescriptive Metrics

Often there are tacit or even explicit assumptions about the “right” answers to questions of expectations. These expectations can determine how individuals are recognized or not recognized for their performance. Developers are praised for completing tasks on time. Testers are praised for running lots of tests or finding lots of bugs. Hotline specialists are praised for handling lots of calls and marking them resolved. Everyone is praised for keeping billable hours up. And so on. Using metrics to evaluate individual performance is horribly counterproductive, as Robert Austin describes:

When a measurement system is put in place, performance measures begin to increase. At first, the true value of an organization’s output may also increase. This happens in part because workers do not understand the measurement system very well early on, so their safest course is to strive to fulfill the spirit of the system architects’ intentions. Real improvement may result as well, because early targets are modest and do not drive workers into taking severe shortcuts. Over time, however, as the organization demands ever greater performance measurements, by increasing explicit quotas or inducing competition between coworkers, ways of increasing measures that are not consistent with the spirit of intentions are used. Once one group of workers sees another group cutting corners, the “slower” group feels pressure to imitate. Gradually, measures fall (or, more accurately, are pushed) out of synchronization with true performance, as workers succumb to pressures to take shortcuts. Measured performance trends upward; true performance declines sharply. In this way, the measurement system becomes dysfunctional.5

These are prescriptive metrics. They can have unforeseen side effects. There is a well-identified pattern of organizational behavior adapting to fit the expectations of a prescriptive measurement program, as shown in Figure 4.2. Typically, a metrics program produces an initial boost in productivity, followed by a steep return to the status quo ante but with different numbers. For example, if bug find and fix rates are critically monitored, then bug curves start conforming to desirable expectations.


Figure 4.2 This graph summarizes the common experience with prescriptive, one-dimensional metrics programs. Performance shoots up early in accord with management aspirations, and the numbers get better and better, but the desired effect tapers off quickly.

Robert D. Austin, Measuring and Managing Performance in Organizations (New York: Dorset House, 1996), 16

Consider some examples of prescriptive metric misuse:

• Imagine measuring programmer productivity based on lines of code written per day. An individual has a choice of calling a framework method (perhaps 5 lines with error handling) or of copying 200 lines of open-source example code. Which one gets rewarded? Which one is easier to maintain, to code-review, to security-review, to test, and to integrate? Or similarly, the individual has the chance to refactor three overlapping methods into one, reducing the size of the code base. (Now ask the same questions.)

• Imagine rewarding programmers based on number of bugs fixed. This was once the subject of a Dilbert cartoon, which ended with Wally saying, “I’m going to code me up a Winnebago.”

• Imagine rewarding the team for creating tests and code to achieve 90% code coverage. Do they spend their time writing complex test setups for every error condition, or do they simply comment out the error handling code that the tests aren’t able to trigger? After all, if the tests can’t invoke those conditions, how important can they be? (Not very, until a customer encounters them.)

• Imagine measuring testers based on the number of bugs found. Do they look for easy-to-find, overlapping, simple bugs or go after significant ones that require setting up complex customer data and configurations? Which approach gets rewarded? Which one yields more customer value?

Each example would lead to obvious dysfunction—discouraging reuse and maintainability, encouraging buggy check-ins, reducing error handling, and discouraging finding the important bugs. Other dysfunctions from metrics misuse will be less obvious but equally severe. People who don’t get the best scores will be demoralized and will be faced with the choice of gaming the numbers or leaving the team.

Preventing Distortion

At the root of the distortion is the prescriptive, rather than descriptive, use of these metrics. This problem has several facets. First, the metrics are only approximations of the business objective, such as customer satisfaction or solution marketability. The team aims to deliver customer value, but that can’t be counted easily on a daily basis. So the available metrics, such as task completion, test pass rate, or bug count, are imperfect but easily countable proxies.

A tenet of the value-up paradigm is to give credit only for completed, customer-deliverable units of work at known quality.6 With iterations of two to four weeks and assessment at the end of the iteration, this practice allows for intervals of project monitoring at iteration boundaries. Treat all interim measurements as hypothetical until you can assess delivery of working scenarios at known qualities of service.

Second, the measurements are made one dimension at a time. The negative consequences of a one-dimensional view are dramatic. If you’re only measuring one dimension at a time and are prescribing expected results, behavioral distortion is a natural consequence. Most experienced project managers know this. However, gathering data from multiple sources at the same time in a manner that lends itself to reasonable correlation is usually very difficult.

Third, when applied to individuals, metrics create all sorts of disincentives, as illustrated in the previous examples. Keep the observations, even descriptive ones, at the team level.

Fourth, expect common-cause variations and think in ranges. Don’t reward the most prolific coder or highest-count bug finder. Expect the numbers to show variance and don’t punish cases of in-control variance. Instead, reward a team based on completed customer-deliverable units of functionality and make the iteration cycle for that assessment frequent.

Many Dimensions of Project Health

Take a minute to think of the most common reports that you use in monitoring your projects. A typical list would include the following:

• Time spent on tasks (hopefully, planned and unplanned)

• Bug count (and hopefully, find and fix rates)

• Project task completion, as reported by the individual performers (hopefully measured by scenario, feature, QoS, or other customer-visible unit)

• Resource utilization, typically in accounting terms

Each is typically measured separately.

Software projects have many interacting dimensions, and any of them can be relevant. Looking at these dimensions provides an opportunity for early discovery of exceptions and bottlenecks that need course corrections. At the same time, having multiple dimensions helps you make sure that the measurements as a whole tell a consistent story.

In the first chapter, I describe how VSTS instruments the software process. A common project backlog is maintained in the single work item database. At the same time, version control, tests, and builds are instrumented so that many dimensions of project data can be gathered automatically and correlated in a metrics warehouse. This mechanism enables daily assessment of the project while it is running. This idea of a metrics warehouse is illustrated in Figure 4.3.


Figure 4.3 Unlike in many older project models, where project metrics are gathered in isolation, VSTS combines the measures in a metrics warehouse using a common set of analysis cubes and reports on the dimensions together.

The backlog also helps considerably in estimation and monitoring. You can do the same analyses on all work items regardless of type. Work item changes become metrics that automatically track progress and provide data for the next estimation, and you can assess the quality of your estimates to improve them next time. Quality can be measured against task progress using many simultaneous views.

Answering Everyday Questions

Any project manager should be able to answer the following questions, hopefully in the day-to-day course of business:

• How much work is left and when will it be done?

• How productive is the team?

• How much unplanned work do we have?

• What’s the quality of the software we’re producing?

• How effectively are we finding, fixing, and closing bugs?

• How many false task and bug resolutions do we have?

• Are we finding and triaging the right bugs?

• How fast can we go before quality suffers?

Each question is relevant at many scales—for the days within an iteration, for the iterations within a project, and for the projects within a program. The questions are also relevant for all kinds of work—scenarios and other requirements, tasks, bugs, and so on.

In the next few pages, I present graphs that help to answer these questions. Not unlike the way one develops scenarios, I show them first with “happy day” examples, and in Chapter 9, I show their use for troubleshooting unhealthy projects.

Remaining Work

Of the several ways to track work, one of the most useful is a cumulative flow diagram (see Figure 4.4).7 This is most useful when looking at days within an iteration or iterations within a project.


Figure 4.4 How much work is left and when will it be done? This cumulative flow diagram shows work remaining measured as the Scenario and QoS Work Items being resolved and closed in the iteration.

Every work item has state. In MSF for Agile Software Development, scenarios, which are a type of work item, have three states: Active (that is, in the hands of a developer and not yet coded), Resolved (that is, coded and ready for test), and Closed (that is, tested and verified).

Each data series is a colored band (reproduced here as shading) that represents the number of scenarios that have reached the corresponding state as of the given date. The total height is the total amount of work to be done in the iteration.

• If the top line increases, it means that total work is increasing. Typically, the reason is that unplanned work is adding to the total required. That may be expected if you’ve scheduled a buffer for unplanned work such as fixing newly discovered bugs. (See Figure 4.6.)

• If the top line decreases, it means that total work is decreasing, probably because work is being rescheduled out of the iteration.

Current status is measured by height on a particular date.

• The remaining backlog is measured by the current height of the leftmost area, Active in this case.

• The current completions are shown by the current height of the rightmost area, Closed.

The height of the band in between indicates the work in progress, in this case items Resolved but not Closed.

Watch for variation in the middle bands. An expansion can reveal a bottleneck, for example, if too many items are waiting to be tested and testing resources are inadequate. Alternatively, a significant narrowing of the band could indicate spare capacity.
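
If you want to see the mechanics behind such a chart, the computation is simple. Here is a minimal sketch that derives the daily band heights from a list of timestamped state transitions; the items, dates, and three states are simplified stand-ins for the MSF work item types, not the actual metrics warehouse schema.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical state-transition history: (work item id, date, new state).
history = [
    (1, date(2006, 5, 1), "Active"),
    (2, date(2006, 5, 1), "Active"),
    (1, date(2006, 5, 3), "Resolved"),
    (3, date(2006, 5, 4), "Active"),
    (1, date(2006, 5, 5), "Closed"),
    (2, date(2006, 5, 5), "Resolved"),
]

def cumulative_flow(history, start, end):
    """For each day, count work items by their latest state as of that day."""
    rows, day = [], start
    while day <= end:
        latest = {}
        for item, when, state in sorted(history, key=lambda h: h[1]):
            if when <= day:
                latest[item] = state   # later transitions overwrite earlier
        rows.append((day, Counter(latest.values())))
        day += timedelta(days=1)
    return rows

for day, counts in cumulative_flow(history, date(2006, 5, 1), date(2006, 5, 5)):
    print(day, dict(counts))
```

Stacking each day’s counts in state order gives the bands of the diagram.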

Visually, it’s easy to extrapolate an end completion inventory or end date for the backlog from a cumulative flow diagram like Figure 4.4. A small caution applies, however. Many projects observe an S-curve pattern, where progress is steepest in the middle.8 The common-sense explanation for the slower starting and ending rates is that startup is always a little difficult, and unforeseen tough problems need to be handled before the end of a cycle.

Project Velocity

The rate at which the team is processing and closing work items is captured in the Project Velocity graph (see Figure 4.5). Similar to Remaining Work, this is most useful when looking at days within an iteration or iterations within a project.


Figure 4.5 How productive is the team? Unlike the cumulative flow diagram, velocity shows the count of work items resolved and closed on each day. In this example, the range is 4 ± 2 scenarios/day after a seven-day lag.

This chart is one of the key elements for estimation. It shows how quickly the team is actually completing planned work and how much the rate varies from day to day or iteration to iteration. In examining variance, be sure to distinguish between common-cause and special-cause events. Special-cause variances include team illness, power outage, an office move, and so on. If you eliminate them, you can look at the actual normal variance to get a feel for the upper and lower bounds of your team’s normal productivity. Use this data, in conjunction with the quality measures discussed in the following sections, to plan a target range for the next iteration.
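
As a rough illustration of that analysis, the sketch below derives a planning range from daily close counts after excluding a known special-cause day. The counts and the outage day are hypothetical.

```python
import statistics

# Hypothetical daily counts of scenarios/QoS closed; day 4 was a power
# outage, a special cause to exclude before estimating normal variance.
daily_closed = [5, 3, 6, 0, 4, 5, 2, 6, 4]
special_cause_days = {3}            # zero-based index of the outage day

normal = [n for i, n in enumerate(daily_closed)
          if i not in special_cause_days]
mean, sd = statistics.mean(normal), statistics.stdev(normal)
print("velocity %.1f +/- %.1f items/day" % (mean, sd))
print("planning range: %.1f to %.1f items/day" % (mean - sd, mean + sd))
```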

Unplanned Work

Very few project teams know all the itemizable work to be done ahead of time, even within the iteration. This condition can be perfectly acceptable if you schedule a sufficient buffer for handling the load of unplanned work (for example, maintenance tasks and bugs). On the other hand, it can be a real problem if you have not scheduled the capacity and can force you to cut back on the planned work.

The top line of Figure 4.6 matches the top line of the Remaining Work graph in Figure 4.4. The total height is the total amount of work to be done in the iteration.


Figure 4.6 How much unplanned work do we have? This graph breaks down the total work from the Remaining Work chart into planned and unplanned. In this example, planned work is declining slightly. This might be because unplanned work added later is forcing planned work items to be triaged out of the iteration, because of overestimation, or simply because certain items were discovered to be no longer necessary.

The areas then divide that work into the planned and unplanned segments, where “unplanned” means unscheduled as of the beginning of the iteration.

For monitoring, use this graph to determine the extent to which unplanned work is forcing you to cut into planned work. For estimation, use this to determine the amount of schedule buffer to allow for unplanned work in future iterations.
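
The buffer arithmetic amounts to an average over history. Here is a minimal sketch, assuming hypothetical per-iteration history and gross capacity:

```python
# Hypothetical history: (planned, unplanned) work item counts per iteration.
past_iterations = [(40, 9), (35, 11), (42, 12)]

fractions = [u / (p + u) for p, u in past_iterations]
buffer_fraction = sum(fractions) / len(fractions)

gross_team_days = 100               # capacity of the next iteration
buffer_days = round(gross_team_days * buffer_fraction)
print("reserve ~%d of %d team-days for unplanned work"
      % (buffer_days, gross_team_days))
```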

Quality Indicators

Quality needs to be seen from many dimensions. Figure 4.7 combines the test results, code coverage from testing, code churn, and bugs to help you see many perspectives at once.


Figure 4.7 What’s the quality of the software? Ideally, test rates, bugs, and code churn would all produce the same picture, as they do in this example. Test passes and code coverage rise, while bugs and code churn fall. On the other hand, when you find a discrepancy in these relationships, you need to drill down into the appropriate build and data series.

The bars show you how many tests have been run and of those, how many have returned Pass, Fail, and Inconclusive results.

The first series of points is the code coverage attained by those tests (specifically, the ones run with code coverage enabled). Ordinarily, as more tests are run, more code should be covered. On the other hand, if test execution and test pass rates rise without a corresponding increase in code coverage, then it can indicate that the incremental tests are redundant.

The second series of points is code churn, that is, the number of lines added and modified in the code under test. High churn obviously indicates a large amount of change and the corresponding risk that bugs will be introduced as a side effect of the changes. In a perfectly refactored project, you can see code churn with no change in code coverage or test pass rates. Otherwise, high code churn can indicate falling coverage and the need to rewrite tests.

The third series is the active bug count. Clearly, there should be a correlation between the number of active bugs and the number of test failures. If the active bug count is rising and your tests are not showing corresponding failures, then your tests are probably not testing the same functionality in the same context that the bugs are reporting. Similarly, if active bug count is falling and test pass rates are not increasing, then you may be at risk for a rising reactivation rate.

The different series are scaled by different powers of ten to make the graph legible with one Y-axis. This is comparable to a Performance Monitor graph, which has different counters at different orders of magnitude.
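
If you ever need to reproduce this scaling yourself, one simple convention (mine, not necessarily the exact rule the VSTS report applies) is to scale each series by the power of ten that brings its peak into a shared range:

```python
import math

def scale_factor(series):
    """Return the power of ten that brings the series' peak into the
    ones-to-tens range, so several series can share one Y-axis."""
    peak = max(abs(x) for x in series)
    return 10 ** -math.floor(math.log10(peak)) if peak else 1

code_churn = [1200, 950, 400, 220]  # lines added/changed per build
active_bugs = [35, 42, 30, 24]
for name, series in (("churn", code_churn), ("bugs", active_bugs)):
    f = scale_factor(series)
    print(name, "scaled by", f, [x * f for x in series])
```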

Bug Rates

Bugs are, of course, a key part of the team workload and a key risk area. To assess bugs, you need to look at three trends (see Figure 4.8):


Figure 4.8 How effectively are we finding, fixing, and closing bugs? When the fix rate exceeds the find rate, the active bug count falls.

• What is the total active bug count (sometimes called “bug debt”)?

• How quickly are we finding new bugs?

• How quickly are we fixing bugs already found?

Bug rates are best interpreted with your knowledge of all the current project activities and the other metrics on the Quality Indicators graph (see Figure 4.7). For example, a high find rate can be a sign of sloppy code (a bad thing), newly integrated code (an expected thing), effective testing (a good thing), or exceptional events, such as a bug bash (an infrequent event, where large numbers of people try ad hoc testing for a day). On the other hand, a low find rate can indicate a high-quality solution or ineffective testing. Use code coverage, code churn, and test rates to help you assess the meaning.

Similarly, a high resolve rate is usually a good thing, but check Work Progress to make sure that the resolved bugs are getting promptly closed and check Reactivations to make sure that the resolutions are not premature.

Reactivations

Reactivations occur when work items have been resolved or closed prematurely. They are a huge warning sign of project dysfunction.

People sometimes record bugs as resolved when the underlying problem has not been fixed. (Similarly, they can record scenarios resolved before they are working or development tasks closed when the work hasn’t been finished.) Every time this happens, it introduces significant waste into the process. Someone has to test and reopen the work item, the developer needs to scrap and rework the original code, and then the code needs to be retested. At a minimum, the reactivation doubles the number of handoffs and usually more than doubles the total effort required to complete the corresponding work.

Watching the reactivation rate (also sometimes called the Fault Feedback Ratio) is important.9 A small amount of noise (for example, less than 5% of the work items resolved at any time) might be acceptable, but a high or rising rate of reactivation should warn the project manager to diagnose the root cause and fix it.

The top line of Figure 4.9 shows the number of total work items of the selected types (for example, bugs) resolved in the build.


Figure 4.9 How many false task and bug resolutions do we have? These false resolutions show up as reactivations—bugs or tasks that were made active again because the original work wasn’t completed. The line shows the count of reactivated items each day. The bars show the cumulative active bugs as of that day, broken into the reactivations (top segment) and non-reactivations (bottom segment).

The height of the top area is the number of reactivations, that is, work items previously resolved or closed that are now active again.

The height of the lower area is the difference, that is, the number of work items resolved less the reactivations.
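
The reactivation rate itself is a single division. Here is a minimal sketch applying the 5% noise threshold mentioned previously; the daily counts are hypothetical.

```python
def fault_feedback_ratio(reactivated, resolved):
    """Reactivations as a share of the work items resolved that day."""
    return reactivated / resolved if resolved else 0.0

# Hypothetical daily counts from the work item history.
resolved = [12, 15, 10, 14]
reactivated = [0, 1, 2, 3]

for day, (res, rea) in enumerate(zip(resolved, reactivated), start=1):
    ffr = fault_feedback_ratio(rea, res)
    note = "  <- diagnose the root cause" if ffr > 0.05 else ""
    print("day %d: FFR %.0f%%%s" % (day, ffr * 100, note))
```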

Bugs by Priority

Bugs happen, and finding them is a good thing. Often, however, the easy-to-find bugs aren’t the ones that will annoy customers the most. If the high-priority bugs are not being found and a disproportionate number of low-priority bugs are, then you need to redirect the testing efforts to look for the bugs that matter.

In triage, it is easy to over-prioritize bugs beyond your capacity to resolve them or under-prioritize them to the point where customers will be highly dissatisfied.

Figure 4.10 assesses the effectiveness of two things: bug hunting and triage, that is, the process of prioritizing the bugs to fix, which is described later in this chapter.


Figure 4.10 Are we finding and triaging the right bugs? This chart shows for each day by priority the cumulative total active bugs and the daily subtotals for found and fixed. The counts for found include both new and reactivated bugs.

In Figure 4.10, the three series of bars represent similar data to the Bug Rates graph (Figure 4.8). The series are as follows:

• Total active bugs at the time of the build

• Number found in build

• Number resolved in build

Each series is further broken into priority so that each bar stacks from highest to lowest priority, with lowest on top.

If you have too many high-priority bugs active, you need to be sure that you have capacity to address them. On the other hand, a significant lack of low-priority bugs can also lead to customer dissatisfaction. (See the discussion of the Broken Windows theory in the “Defining ‘Good Enough’” section of Chapter 7.)

Actual Quality Versus Planned Velocity

As much as teams believe the saying, “Haste makes waste,” there is usually a business incentive to go faster. A project manager’s goal should be to find the maximum rate of progress that does not make quality suffer. Figure 4.11 presents the relationship for each iteration of estimated size to overall quality.


Figure 4.11 How fast can we go before quality suffers?

The X-axis is the number of scenarios actually closed (completed) in the iteration.

The Y-axis is the total number of bugs found divided by the scenarios closed (in other words, the average number of bugs per scenario).

Each bubble is labeled according to its iteration.

On screen, stoplight colors go from green for the lowest bugs per iteration to red for the highest. Here you see them reproduced as lightest to darkest for green-amber-red.

If haste is in fact making waste, iterations positioned to the right (larger iterations) will be higher, indicating more bugs per requirement, and smaller (left) ones will be lower, indicating fewer bugs. If you are working at a comfortable pace and have not seen quality drop with planned iteration size, then all iterations will be at roughly the same height on the Y-axis.
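
The data behind such a chart reduces to one division per iteration. Here is a minimal sketch with hypothetical totals:

```python
# Hypothetical per-iteration totals behind a chart like Figure 4.11.
iterations = [
    ("Iteration 2", 18, 45),        # (name, scenarios closed, bugs found)
    ("Iteration 3", 24, 58),
    ("Iteration 4", 31, 96),
]

for name, closed, bugs in iterations:
    print("%s: %d scenarios closed, %.1f bugs/scenario"
          % (name, closed, bugs / closed))
# If the largest iteration shows the worst ratio, haste is making waste.
```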

Estimating an Iteration

This section describes a simple process for estimating projects at the level of person days using the data from the project health reports shown previously. With Visual Studio Team System, all of the backlog is one database, and all the historical metrics are in one data warehouse, so it’s easy to get the numbers that you need here.

Estimating the work for an iteration is best done in multiple passes.

Top-Down

Before the Renaissance and the advent of celestial navigation, explorers (such as Christopher Columbus) relied on a technique called dead reckoning. While imperfect, dead reckoning did enable the discovery of the New World and the drawing of early maps, which in turn made further exploration and settlement possible. Here’s a standard definition of dead reckoning:

Dead Reckoning is the process of estimating your position by advancing a known position using course, speed, time and distance to be traveled. In other words figuring out where you will be at a certain time if you hold the speed, time and course you plan to travel.10

Top-down estimation is like dead reckoning (a small numeric sketch follows these four steps):

1. Calculate the number of team-days available in the iteration. This is your gross capacity.

2. Looking at the actual data in the Unplanned Work chart, make a rough estimate of the buffer you will need for unplanned work. Subtract this from the gross capacity to determine your net capacity for planned work.

3. Looking at the Project Velocity chart, determine the number of scenarios and QoS per day that your team can resolve and close. This is your gross velocity. Consult the Quality Indicators and Actual Quality versus Planned Velocity charts to look for possible negative side effects of your current pace. If you see them, adjust gross velocity downward to a sustainable net velocity.

4. Multiply net capacity for planned work with sustainable net velocity to determine the number of scenarios and QoS that you can implement in the iteration. On your ranked list of scenarios and QoS, count down from the top until you have reached that number. Draw a cut line there. Everything above the cut line is your candidate list.
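
Here is the promised numeric sketch of the four steps. The numbers are hypothetical (the velocity echoes the 4 ± 2 scenarios/day range of Figure 4.5), and for simplicity I keep capacity in calendar days and velocity in whole-team items per day; if you track capacity in team-days, express velocity per team-day instead.

```python
iteration_days = 20                       # step 1: gross capacity
unplanned_fraction = 0.25                 # step 2: from Figure 4.6 history
net_days = iteration_days * (1 - unplanned_fraction)

gross_velocity = 4.0                      # step 3: scenarios+QoS closed/day
net_velocity = 3.0                        # adjusted down for sustainability

cut_line = int(net_days * net_velocity)   # step 4: candidate count
print("draw the cut line after the top %d scenarios/QoS" % cut_line)  # 45
```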

Dead reckoning might just be good enough, especially if you have low variance among your scenarios and QoS on the Project Velocity chart (see Figure 4.5). If you have high variance, then you can attack the problem both by breaking down scenarios and QoS into more consistently sized work items and by estimating bottom-up.

Bottom-Up

After you have a candidate list of scenarios and QoS, you can perform bottom-up estimation as well. (You could do it earlier, but this is not recommended: you’d spend a lot of time estimating work that might never be done, at least according to the current understanding of the requirements.)

1. If necessary, the business analyst and architect should refine the scenarios, QoS, storyboards (refer to Chapter 3, “Requirements”), and solution architecture (see Chapter 5, “Architectural Design”). These need to be articulated to the point where they communicate sufficiently to all team members the details needed to estimate work for the next iteration.

2. Next, individual team members decompose these into tasks, typically to a granularity of one to three days.

3. The team triages bugs to identify work needed to fix bug backlogs for the iteration.

4. Working with the teams, the project manager rolls up the total volume of these task estimates to see whether they fit the iteration capacity for planned work.

To the extent that the estimates and capacity don’t match, you should lather, rinse, and repeat. If you do bottom-up estimation, don’t forget the feedback loop of the retrospective to assess how actuals compare to estimates.

Refinements

Obviously, there are some refinements to this method you want to make.

Variance in Size

In describing the technique previously, I assumed that all scenarios and QoS are roughly the same size, with little variance. Depending on your project and your styles of solution definition and architecture, this might be true. If the variance in the Project Velocity graph is small, then it is a good assumption.

Alternatively, it might be more useful to categorize your scenarios and QoS into “T-shirt size” groupings, that is, Large, Medium, and Small. You can plan, track, and graph these separately.

Another alternative is to use Rough Order of Magnitude (ROM) estimates as part of top-down costing. Both alternatives let you draw the cut line for the iteration based on more careful initial estimation. Again, depending on your project, it may or may not be worth the extra effort because you will calibrate the top-down estimates based on bottom-up costing.

Changes in Team

I also simplified the previous technique to assume static team size and composition. Clearly, that’s often false. When your team grows or shrinks, capacity changes. As skills and domain expertise improve with experience, your capacity grows, too.

Differences in Iterations

Not all QoS are alike. For example, at Microsoft, it is typical to single out blocks of time, usually a month or more, for a “Security Push.” This is a period when all architecture, designs, and code are reviewed for security considerations. This enables experts to assist the team in an intensive focus on a particular QoS that requires specialized analysis and addresses specialized risks. Usability, Manageability, and Performance are other QoS that might, again depending on your project, lend themselves to specific emphasis in particular iterations when specialists are available.

Similarly, some scenarios may be more important than others for solution success, or they may require integration of several components and therefore be possible only after a number of prerequisites have been finished.

It is also common to see an S-curve pattern in the Remaining Work chart, in which startup and finish iterations (or tasks within an iteration) take longer than the middle ones. If you anticipate this pattern, you can plan capacity around it.

Working on Multiple Projects

Note that in the techniques described previously, estimation and monitoring do not require explicit time or effort tracking. Because work items track changes automatically, and the changes are all time-stamped for auditability, the data for actual elapsed time is available for free. This makes it very easy to do the estimates based on days planned and to track against days actually spent.

The limitation is that calendar time estimation assumes that the team members are working on only one project at a time. If that’s not true, or in other words, if some of you are working on multiple projects, then you do need to introduce effort tracking. Just as importantly, you also need to significantly alter your capacity estimates to accommodate the overhead cost of switching contexts.

Estimation Quality

Twenty years ago, Tom DeMarco introduced the idea of an Estimating Quality Factor (EQF) as a way of tracking and tuning the accuracy of estimation.11 The idea is simple and powerful. Keep track of your estimates, compare them to your actual completion time, compute the difference, and analyze the reasons for the difference. Use this information to tune your next estimation. Over multiple project cycles, you can assess whether you’re getting better and how much variance there is in your estimation accuracy.

Originally proposed for a project as a whole, EQF is the reciprocal of the accumulated estimating error:

EQF = 1 / Σₜ |estimated completion(t) − actual completion|

You can also apply this idea to iterations. On a time-boxed iteration or project, you can use this technique against the backlog size. On a feature-boxed one, you can use it against time.
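
As a minimal sketch of the arithmetic, here the technique is applied against backlog size on a time-boxed iteration. The re-estimates and the actual outcome are hypothetical.

```python
# Re-estimates of iteration scope (in scenarios) made at successive
# checkpoints, compared with what was actually closed.
estimates = [30, 28, 33, 26, 25]
actual = 25

error = sum(abs(e - actual) for e in estimates)
eqf = 1.0 / error if error else float("inf")
print("accumulated error %d, EQF %.3f" % (error, eqf))
# A larger EQF means the estimates tracked the actual outcome more closely.
```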

EQF can also be a way of surfacing project concerns among the team members. Survey the team members during the retrospectives to ask when they think the project will be “done” or when a certain milestone, such as live deployment, will be reached. Then ask follow-up questions to drill into the perceptions. If there are changes, ask, “What did you see or hear that made you change your estimate?” This approach can yield great information about risks, unplanned work, skills gaps, or other factors that need to be addressed.12

Alternatively, if the team is not worried about dates and functionality, that is great information. Frequently, teams get optimistic in the middle of a project as they become more proficient at closing a level of tasks and have not yet encountered hard integration issues. This optimism will produce an S-curve in the Remaining Work chart and a hump in the EQF tracking.

Of course, your mileage may vary. Unless you are running a project very much like one you’ve run before, with a very stable team, it is reasonable to expect variation in the estimation quality. Excluding special causes (half the team was diverted to an emergency project, the company was reorganized, and so on) and examining the residual variance will give you good guidelines for the factors to apply in the next round of estimates.

Retrospectives

One of the great benefits of iterative development is that you can learn and correct course frequently. Retrospectives are meetings in which you distill the learning from the iteration. The most important prerequisite for a retrospective is a blame-free environment. A great mindset to bring is summarized by Norman L. Kerth in his book and web site:

Regardless of what we discover, we understand and truly believe that everyone did the best job they could, given what they knew at the time, their skills and abilities, the resources available, and the situation at hand.13

During the retrospective, you should strive to identify the common- and special-cause variances. Look at the charts and identify unusual dips in Velocity, bloats in Remaining Work, or inconsistencies in the Quality Indicators. Look for root causes of problems, reusable lessons learned, and corrective actions. In other words, find where the real bottlenecks are. For example, testing may appear slow, but on further investigation you find frequent unusable builds. Looking into builds, you discover that services on which you depend or incoming components that you integrate were not performing adequately. So the appropriate corrective action might be to provide better component acceptance tests that must pass before those external services or external components are accepted into your lab and to have the providing teams run these as build verification tests (BVTs).

The output of the retrospective should inform the next iteration plan. The scenarios, QoS, tasks, and their prioritization may all change based on the discoveries you make in the retrospective.

Triage

In addition to planning, monitoring, and estimating, a good project manager needs to prioritize and schedule the work to fit available resources and time. This process is called triage, a term introduced into the software world from medical practice.14 Originally, triage described only the handling of bugs, but now it is equally applicable to all work items.

Triage is successful only if the workload is cut to the point where it fits the available hours, skills, and schedule of the team. In ordinary, daily triage sessions, the resources and time dimensions are fixed, so it is the prioritization of the work that changes the functionality and quality levels of the planned solution.

A Triage Exercise

To illustrate the point of triage, here is a simple exercise. I’ll use bugs only, not all work item types, and I’ll assume that there is no other work competing for the team’s time.

Suppose you are approaching the end of an iteration. Along the way, you have an iteration goal to reduce your bug backlog to zero. There are currently 10 P1 (“must fix”) bugs outstanding, as shown in Figure 4.12. Your team of five developers can average two verified bug fixes per person per day. Your team of three testers is averaging four bugs found per tester per day. Every four days, their find rate decreases by one bug per tester.


Figure 4.12 In VSTS, you can query work items from the backlog, here the Active Bugs, for daily triage meetings.

When will you hit zero backlog at those rates? (Don’t look ahead—try to solve this by yourself first. Even with such simple numbers, it’s not easy.) After you have solved this, check your answer against Figure 4.13.


Figure 4.13 In this idealized chart, the capacity of the team is static at 50 per day, and the find rate drops from 60 to 30 in even increments, but it is still not obvious at first when you’ll eliminate the backlog.
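
If you would rather check the arithmetic mechanically, here is a minimal simulation of the exercise as I read its rates. (Figure 4.13 plots idealized numbers at a different scale, so treat this as the method rather than a reproduction of the chart.)

```python
# Rates from the exercise: 5 developers x 2 verified fixes per day;
# 3 testers x 4 finds per day, with the per-tester find rate dropping
# by 1 every 4 days. The backlog starts at 10 bugs.
backlog, fix_capacity = 10, 5 * 2
for day in range(1, 30):
    per_tester = max(0, 4 - (day - 1) // 4)
    found = 3 * per_tester
    backlog = max(0, backlog + found - fix_capacity)
    print("day %2d: found %2d, backlog %2d" % (day, found, backlog))
    if backlog == 0 and found <= fix_capacity:
        break                       # the backlog has converged to zero
```

At these rates the backlog reaches zero on day 12, although, as the points below make clear, averages like these hide a great deal.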

In practice, of course, the rates are not constant and may not be reliable. There are several points to note here:

• As long as the find rate is above the fix rate, backlog will increase.

• When the find rate is slightly below the fix rate, convergence will be slow.

• It’s easy to cheat—artificially lowering the find rate, for example, by deprioritizing bugs, can make the numbers look much better than they should look.

• These are only averages. (Refer to the earlier Bug Rates chart, Figure 4.8, for a more realistic example of variance.)

The point remains: Every day you should assess workload, progress, impediments, and probability of success.

What Makes Triage Effective: The Red Line

Averages, such as those in the previous exercise, can be deceiving. Averages don’t tell you who has the heavy backlog, where the bugs are being found, who’s fixing them, or how quickly.

It is typical to see big differences in all these areas. (See the Quality Indicators chart, Figure 4.7, to relate the different measures.) Take the exercise a little further. Imagine that there are five components, with one developer each, and that 50% of the backlog and found bugs are in Component A. You’d see a breakdown on day four like Figure 4.14.

The Red Line: (Fix Rate – Find Rate) × Days Available < Backlog


Figure 4.14 Bug rates among components can vary widely. It is important to draw a “red line” that tracks capacity and to load balance or reprioritize the excess beyond available capacity.

The red line on your car’s tachometer shows you the speed at which your engine is likely to fail. Similarly, the red line on your work item distribution chart is the level at which your project or one of its components will overheat. It is the level beyond which you are at risk of not hitting the planned iteration exit with current find and fix rates.

Bear in mind that sometimes you will let your queues climb past the red line, just as a racecar driver lets the tachometer needle cross the red line for short periods. For example, to focus on increasing bug find rates, you might hold a “bug bash” where everyone on the team tests for a period and triage is deferred. Part of your plan just needs to be bringing the queues and red line back in balance. It’s important to manage your red line regularly and consciously.
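
The red line test from the callout is a one-line inequality. Here is a minimal sketch applying it per component, with hypothetical rates and backlogs loosely following the Component A example:

```python
# (fix rate - find rate) x days available < backlog  ==>  over the red line
days_available = 6
components = {
    "A": {"fix": 2, "find": 3, "backlog": 25},  # half the bugs land here
    "B": {"fix": 2, "find": 1, "backlog": 5},
    "C": {"fix": 2, "find": 1, "backlog": 4},
}

for name, c in components.items():
    over = (c["fix"] - c["find"]) * days_available < c["backlog"]
    print("Component %s: %s" % (name, "OVER the red line" if over else "ok"))
```

Component A is over the red line regardless of the days available, because its find rate exceeds its fix rate; the excess needs to be load balanced or triaged out.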

What Happens During Triage

Triage is your opportunity to manage the red line in real time. During triage, you can use the Bugs by Priority report in VSTS to understand the current red line distribution and the query All Active Work Items to manage priorities and ownership.

Depending on your process, your triage group may be a committee, with a representative of each discipline, or it may be a select group. It is important that the owner be clearly designated (usually the project manager or test manager) and have ultimate responsibility for the resulting priority list.

The mechanics are very easy. VSTS lets you use a standard query (like the previous one) or a bound Excel worksheet to walk through the queues and edit priority, owner, and other appropriate fields.

Escalating and Resolving Issues

It is important not only to plan to capacity but also to remove any blocking issues that prevent team members from completing their tasks. In MSF for Agile Software Development, any work item can be escalated using the Issue field, and you can see these with a simple query called Issues. If it is not done elsewhere, the triage process should pick up all these issues and address them to unblock the corresponding work.

Iterations and Triage

One of the primary benefits of iterative development (described in Chapter 2, “Value-Up Processes”) is that you can plan for now and later, rather than now and never. Use the iterations to manage the quantity of bugs in the current queues and keep them off the red line.

Many project managers, especially in agile projects, believe that you should not allow any bugs to be carried forward. According to this view, any backlog of bugs greater than zero is bad, and those bugs should be fixed before any new feature work proceeds. This is certainly true of bugs caused by simple coding errors. These bugs are cheapest to catch in the development process with unit tests (see Chapter 6, “Development”). Unless you have specific reasons otherwise, the unit tests should pass before you check in code.

Unfortunately, however, not all bugs are caused by coding errors, and not all bugs are caught before check-in. Many bugs appear for subtler reasons of scenarios, QoS, architecture, configurations, data, and so on, and will not be caught by unit tests. In these cases, there may be good reasons to postpone bug fixes to the next iteration, and there will be nothing wrong with revisiting the priorities when you plan that iteration.

Triage Goals Early, Middle, and Late in the Iteration

As you enter an iteration, you need to be clear about how much time will be spent on the bug backlog and how much on other work items, and you need to re-triage bug priorities accordingly. As scenarios become available for testing, you may focus on finding bugs for a period, intentionally letting queues lengthen. Watch the find rates. As they stabilize and fall, even after you vary testing techniques (see Chapter 7), you need to be very clear about your fix capacity and triage the queues accordingly.

Triage Daily, Unless You Have a Good Reason Otherwise

Long lists are hard to manage. The easiest way to keep the incoming lists for triage short is to triage daily, especially during the middle and end of an iteration.

Satisfying the Auditor

It is now common for software teams to face regulatory requirements. For example, all financial systems of U.S. public companies are subject to audit under the Sarbanes-Oxley Act of 2002. Fortunately, VSTS automatically keeps an audit trail of all changes to work items and automatically links work items to all changes in source code and tests.

If you are subject to audit, you should enforce the association of work items to check-ins in order to make sure that changes to source code are associated with work item changes. For the developer, this means nothing more than ticking a box at check-in. (See Figure 6.17.) With the policy on, the developer can’t forget without being warned, and overriding the policy generates an email notification to an appropriate team member.


Figure 4.15 Every work item, in this case a scenario, accumulates its full change history every time it is edited and saved. This gives you a complete audit trail.


Figure 4.16 Change sets, containing the code changes for each check-in, are automatically linked to the scenario or other work item so that you can see both what changed and why. These changes and associations flow through to the build reports, discussed in Chapter 6.

Summary

In this chapter, I covered the basics of project management with VSTS. First, I covered three value-up concepts that inform the activities: understanding variation, using descriptive rather than prescriptive metrics, and keeping multiple dimensions of project health in mind. Then I reviewed examples of several of the reports in VSTS and how to use them. Finally, I looked at applying these reports and work item queries to estimation and triage. I did not yet explore troubleshooting with these reports; that is covered in Chapter 9.

In the next chapter, I will look at the architecture and design of a solution.

Endnotes

1. L. Koskela and G. Howell, “The Underlying Theory of Project Management Is Obsolete,” Proceedings of the PMI Research Conference, 2002, 293–302, available at http://www.leanconstruction.org/pdf/ObsoleteTheory.pdf.

2. W. Edwards Deming, The New Economics: For Industry, Government, Education, Second Edition (Cambridge: MIT Press, 1994), 174.

3. Ibid., 190 ff.

4. http://www.c2.com/cgi/wiki?YesterdaysWeather

5. Robert D. Austin, Measuring and Managing Performance in Organizations (New York: Dorset House, 1996), 15.

6. For example, Beck (2000), op. cit., 72–3.

7. Cumulative flow diagrams were introduced to software in Anderson (2004), op. cit., 61.

8. Ibid., 90 ff.

9. For example: Johanna Rothman, “What’s Your Fault Feedback Ratio?,” Computerworld (November 4, 2002), available from http://www.computerworld.com/developmenttopics/development/story/0,10801,75654,00.html.

10. http://www.auxetrain.org/Nav1.html

11. Tom DeMarco, Controlling Software Projects: Management, Measurement, and Estimation (New York: Yourdon Press, 1982).

12. http://www.jrothman.com/pragmaticmanager.html

13. http://www.retrospectives.com/pages/retroPrimeDirective.html

14. http://www.etymonline.com/
