8 Simple Statistical Methods for Software Engineering
of cumulative count of code written to date. The actual code written is plotted alongside the plan in Figure 1.2. By reasoning visually about the plot, one can estimate the time needed to finish the project.
Data must be transformed into charts; until then, they do not enter the decision space.
Figure 1.1 Radar chart for project risks (axes: Communication, Schedule, Responsiveness, Quality, Cost, Attrition; radial scale 0 to 0.5).
Figure 1.2 Cumulative count of code (LOC, 0 to 6000, plotted against Week 1 to Week 10; series: Plan cumulative and Actual cumulative).
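The cumulative curves in Figure 1.2 are running totals of weekly output. A minimal Python sketch of that transformation follows. The weekly LOC figures are hypothetical (they are not the data behind Figure 1.2), and the finish-week estimate is a simple average-rate extrapolation that stands in for the visual reasoning described above; it is not a method given in the text.

```python
import math
from itertools import accumulate

# Hypothetical weekly LOC counts, for illustration only (not the Figure 1.2 data).
planned_weekly = [600] * 10                  # plan: 600 LOC per week for 10 weeks
actual_weekly = [400, 450, 500, 480, 520]    # actuals for the first 5 weeks

# Cumulative count of code written to date (running totals).
plan_cum = list(accumulate(planned_weekly))
actual_cum = list(accumulate(actual_weekly))


def projected_finish_week(cumulative, target):
    """Estimate the week in which `cumulative` reaches `target`,
    assuming the average weekly rate observed so far continues."""
    weeks_elapsed = len(cumulative)
    done = cumulative[-1]
    if done >= target:
        return weeks_elapsed
    rate = done / weeks_elapsed
    if rate <= 0:
        raise ValueError("no progress yet; cannot extrapolate")
    return weeks_elapsed + math.ceil((target - done) / rate)


if __name__ == "__main__":
    week = len(actual_cum)
    print(f"Week {week}: plan {plan_cum[week - 1]}, actual {actual_cum[-1]}")
    print("Projected finish week:", projected_finish_week(actual_cum, plan_cum[-1]))
```

With these made-up numbers the script reports plan 3000 against actual 2350 at week 5 and a projected finish in week 13, three weeks beyond the ten-week plan.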
Data, Data Quality, and Descriptive Statistics 9
Even data on lower measurement scales, such as categories, can be graphed. For example, a bar graph of discovered defect types can be very instructive. Most categorical variables are plotted as bar graphs and pie charts, which suit them well.
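A bar graph of defect types starts from a frequency count per category; the same counts, expressed as percentages, are what a pie chart displays. The sketch below assumes a hypothetical list of defect-type labels (the category names are illustrative, not taken from the text) and draws the bars as text so that it needs only the Python standard library.

```python
from collections import Counter

# Hypothetical defect log, for illustration only: one category label per defect.
defect_types = ["logic", "interface", "logic", "data", "documentation",
                "logic", "interface", "data", "logic", "standards"]


def category_summary(labels):
    """Return (category, count, percent) tuples, most frequent first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(cat, n, 100.0 * n / total) for cat, n in counts.most_common()]


def text_bar_chart(summary, width=20):
    """Render a summary as horizontal text bars scaled to the largest count."""
    largest = max(n for _, n, _ in summary)
    rows = []
    for cat, n, pct in summary:
        bar = "#" * round(width * n / largest)
        rows.append(f"{cat:<14} {bar:<{width}} {n:>3} ({pct:.0f}%)")
    return rows


if __name__ == "__main__":
    for row in text_bar_chart(category_summary(defect_types)):
        print(row)
```

Sorting by frequency puts the dominant defect type first, which is usually the point the graph is meant to make.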
The graphs must be interpreted. A picture is worth a thousand words, but each one needs a few words of explanation articulating the context and meaning. Commentaries on graphs are rare; perhaps it is assumed that the truth in a graph is self-evident. However, adding even one line of comment to a graph makes a huge difference.
Box 1.1 Show Me a Graph
This organization was dedicated to software maintenance. Every month, a huge list of change requests was received. The operations manager found the “backlog” a burning issue: the backlog seemed to grow every month. After due contemplation, he devised a simple management technique to address this issue. He suggested a simple pie chart report at the end of every month. The pie chart showed the distribution of bugs according to the following categories:
a. Bugs taken up: complex category
b. Bugs taken up: simple category
c. Bugs analyzed but found to be nonissues
d. Bugs in queue: yet to be taken up
e. Bugs delivered
Pie chart of bug status: (a) bugs taken up, complex category: 100 (6%); (b) bugs taken up, simple category: 400 (25%); (c) bugs analyzed but found to be nonissues: 200 (13%); (d) bugs in queue, yet to be taken up: 670 (43%); (e) bugs delivered: 200 (13%).
The pie chart had a noteworthy consequence. The backlog queue dwindled, and more bugs were fixed monthly. Later, the manager happened to learn about “visual management” and ascribed the success of the pie chart to visual management.
The pie chart was so simple and yet so effective that it soon became a popular weekly report. The pie chart turned the company around.
Numerical Descriptive Statistics
(Numerical Summary of Data)
The numerical summary of data has a standard set of statistics. There is a difference between data and a statistic. Data are a result of measurement; a statistic is a result of statistical processing of data. There is a prerequisite for doing descriptive statistics: we need a set of observations (a sample of data points) to prepare a numerical summary. A few components must have been built, or a process must have been executed a few times, before we can think of a numerical summary. This constraint is not imposed on graphs. Data 1.1 presents such a sample: the effort variance data of a typical development project.
What does the data mean? Quantitative reasoning begins with a statistical inquiry into effort variance. What is the center of the process? What is the range of the process? Is the process data symmetrical, as one would expect, or is it skewed? Does the process have a strong peak, or is it flat? The answers to such queries are formally available in the form of some basic statistics. These statistics have been computed for the effort variance data using the “Descriptive Statistics” option of the Excel Data Analysis Tool. Data 1.2 presents the report from this tool.
Data 1.1 Effort Variance Data (%)
20.0
12.4
18.0
30.0
5.0
12.0
15.0
0.4
−3.0
4.0
7.0
9.0
10.0
6.0
There are fourteen basic “statistics” in the Data 1.2 report. We can add the kth largest and kth smallest values to this list by ticking off the corresponding options in the tool. Definitions of these statistics are presented in Appendix 1.1.
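The kth largest and kth smallest options simply pick a value by rank. A small Python equivalent, run on the Data 1.1 values, is sketched below; it mirrors the idea of Excel's LARGE and SMALL functions and is not the tool's own implementation.

```python
import heapq

# Effort variance data (%) from Data 1.1.
effort_variance = [20.0, 12.4, 18.0, 30.0, 5.0, 12.0, 15.0,
                   0.4, -3.0, 4.0, 7.0, 9.0, 10.0, 6.0]


def kth_largest(xs, k):
    """k-th largest value (k = 1 gives the maximum)."""
    if not 1 <= k <= len(xs):
        raise ValueError("k must be between 1 and len(xs)")
    return heapq.nlargest(k, xs)[-1]


def kth_smallest(xs, k):
    """k-th smallest value (k = 1 gives the minimum)."""
    if not 1 <= k <= len(xs):
        raise ValueError("k must be between 1 and len(xs)")
    return heapq.nsmallest(k, xs)[-1]


if __name__ == "__main__":
    print(kth_largest(effort_variance, 2), kth_smallest(effort_variance, 2))  # 20.0 0.4
```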
Box 1.2 Power of Table
Managing software development is a complex task. A manager applied data-driven management in a novel manner to make his task easier. He identified 12 milestones and selected the data he needed to collect at each milestone for effective management. That led him to design a data table with 12 rows and 10 columns. The data columns included dates, size, defects, effort, and pertinent feature numbers. The milestones coincided with deliveries, and the data table came to be called the milestone table.
With this simple table, he realized he could manage a project of almost any size and duration. He also found the extra bandwidth to manage many more projects simultaneously. His teams never missed milestones because he took milestone-level data seriously and reviewed the results objectively and with precision. His projects were often delivered on time, with quality, and within budget.
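The milestone table is essentially a fixed schema with one record per milestone. A minimal Python sketch of such a record follows. The field names follow the column headings of the data table that appears later in this chapter (Milestone, Delivery, Features, Start DT, Finish DT, Dev Effort, Review Effort, Test Effort, Test Defects, UAT Defects); the class name, the effort unit, the two helper methods, and all the values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class MilestoneRow:
    """One row of a milestone table (hypothetical field names)."""
    milestone: int
    delivery: str
    features: str
    start: date
    finish: date
    dev_effort: float      # effort unit (e.g., person-hours) is an assumption
    review_effort: float
    test_effort: float
    test_defects: int
    uat_defects: int

    def total_effort(self) -> float:
        """Development, review, and test effort combined."""
        return self.dev_effort + self.review_effort + self.test_effort

    def duration_days(self) -> int:
        """Calendar days between the start and finish dates."""
        return (self.finish - self.start).days


# Hypothetical entries, for illustration only.
milestone_table = [
    MilestoneRow(1, "Start", "Architecture", date(2024, 1, 8), date(2024, 1, 26),
                 120.0, 16.0, 0.0, 0, 0),
    MilestoneRow(2, "Package 1", "F1-F5", date(2024, 1, 29), date(2024, 2, 16),
                 200.0, 24.0, 40.0, 12, 2),
]

if __name__ == "__main__":
    for row in milestone_table:
        print(row.milestone, row.delivery, row.duration_days(), row.total_effort())
```

Keeping every milestone in the same ten fields is what makes rows comparable, so a review can scan effort and defects milestone by milestone.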
Data 1.2 Descriptive Statistics of Effort Variance
Data: 20.0, 12.4, 18.0, 30.0, 5.0, 12.0, 15.0, 0.4, −3.0, 4.0, 7.0, 9.0, 10.0, 6.0
Descriptive Statistics
Mean 10.41429
Standard error 2.278081
Median 9.5
Mode N/A
Standard deviation 8.5238
Sample variance 72.65516
Kurtosis 0.908722
Skewness 0.720095
Range 33
Minimum −3
Maximum 30
Sum 145.8
Count 14
Confidence level (95.0%) 4.921496
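To make the report reproducible outside Excel, the sketch below recomputes the fourteen statistics of Data 1.2 with Python's standard library. Two assumptions are labeled in the code. Skewness and kurtosis use the adjusted sample formulas that Excel's SKEW and KURT functions document. The t critical value 2.160369 (two-sided 95%, 13 degrees of freedom) is hard-coded from t tables because the standard library has no t distribution, so it must be changed for a sample of a different size.

```python
import math
import statistics
from collections import Counter

# Effort variance data (%) from Data 1.1.
effort_variance = [20.0, 12.4, 18.0, 30.0, 5.0, 12.0, 15.0,
                   0.4, -3.0, 4.0, 7.0, 9.0, 10.0, 6.0]

# Assumption: two-sided 95% Student t critical value for 13 degrees of freedom
# (n - 1 for n = 14), taken from t tables; valid only for this sample size.
T_CRIT_95_DF13 = 2.160369


def descriptive_statistics(xs, t_crit):
    """Recompute an Excel-style descriptive statistics report."""
    n = len(xs)
    mean = statistics.fmean(xs)
    sd = statistics.stdev(xs)                  # sample standard deviation (n - 1)
    se = sd / math.sqrt(n)                     # standard error of the mean
    z = [(x - mean) / sd for x in xs]
    # Assumption: adjusted sample skewness and excess kurtosis, as in Excel's SKEW and KURT.
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    value, freq = Counter(xs).most_common(1)[0]
    return {
        "Mean": mean,
        "Standard error": se,
        "Median": statistics.median(xs),
        "Mode": value if freq > 1 else None,   # None stands for N/A (no repeated value)
        "Standard deviation": sd,
        "Sample variance": statistics.variance(xs),
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": max(xs) - min(xs),
        "Minimum": min(xs),
        "Maximum": max(xs),
        "Sum": math.fsum(xs),
        "Count": n,
        "Confidence level (95.0%)": t_crit * se,   # half-width of the 95% interval
    }


if __name__ == "__main__":
    for name, value in descriptive_statistics(effort_variance, T_CRIT_95_DF13).items():
        print(f"{name:<26} {value}")
```

Run on the Data 1.1 values, the results agree with Data 1.2 to the displayed precision, for example a mean of about 10.414 and a sample variance of about 72.655.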
Special Statistics
A few special statistics are explained in later chapters. Standard error is described in Chapter 13, and confidence interval in Chapter 21. Percentiles, quartiles, and interquartile range are explained in Chapter 4. We can assemble our preferred statistics into the descriptive statistics banner.
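As a preview of Chapter 4, Python's `statistics.quantiles` can compute quartiles and the interquartile range (IQR) for the Data 1.1 values. Quartile conventions differ between tools, so this sketch reports both methods the function supports ("exclusive", its default, and "inclusive"); which convention Chapter 4 adopts is not settled here.

```python
import statistics

# Effort variance data (%) from Data 1.1.
effort_variance = [20.0, 12.4, 18.0, 30.0, 5.0, 12.0, 15.0,
                   0.4, -3.0, 4.0, 7.0, 9.0, 10.0, 6.0]


def quartiles_and_iqr(xs, method="exclusive"):
    """Return (Q1, Q2, Q3, IQR) using statistics.quantiles with the given method."""
    q1, q2, q3 = statistics.quantiles(xs, n=4, method=method)
    return q1, q2, q3, q3 - q1


if __name__ == "__main__":
    for method in ("exclusive", "inclusive"):
        print(method, quartiles_and_iqr(effort_variance, method))
```

With the exclusive method the quartiles are 4.75, 9.5, and 15.75 (IQR 11.0); the inclusive method gives 5.25, 9.5, and about 14.35 (IQR about 9.1). The median is the same either way; only the outer quartiles move.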
Three Categories of Descriptive Statistics
The simplest and most commonly used descriptive statistics can be divided into three categories and analyzed more deeply:
Central tendency (discussed in Chapter 2)
Dispersion (discussed in Chapter 3)
Tukey's five-point summary (discussed in Chapter 4)
Such a deeper exploration might be viewed as part of exploratory data analysis.
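A minimal sketch of Tukey's five-point summary for the Data 1.1 values follows, assuming Tukey's hinge convention: the lower and upper hinges are the medians of the lower and upper halves of the sorted data, with the overall median included in both halves when the count is odd. Hinges can differ slightly from quartiles computed by interpolation rules.

```python
from statistics import median

# Effort variance data (%) from Data 1.1.
effort_variance = [20.0, 12.4, 18.0, 30.0, 5.0, 12.0, 15.0,
                   0.4, -3.0, 4.0, 7.0, 9.0, 10.0, 6.0]


def five_number_summary(values):
    """Minimum, lower hinge, median, upper hinge, maximum (Tukey's hinges)."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")
    half = (n + 1) // 2          # odd n: the median falls in both halves
    lower, upper = xs[:half], xs[n - half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


if __name__ == "__main__":
    print(five_number_summary(effort_variance))  # (-3.0, 5.0, 9.5, 15.0, 30.0)
```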
Case Study: Interpretation of Effort Variance Descriptive Statistics
Let us look at the descriptive statistics of the effort variance data provided in Data 1.2. The number of data points is 14. We would have preferred more than 30 data points
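Data Table
Project Name:
Customer Ref.:
Milestone | Delivery | Features | Start DT | Finish DT | Dev Effort | Review Effort | Test Effort | Test Defects | UAT Defects
1 | Start | Architecture | | | | | | |
2 | Package 1 | F1–F5 | | | | | | |
3 | Package 2 | F6–F20 | | | | | | |
4 | Package 3 | F21–F40 | | | | | | |
5 | Package 4 | F41–F50 | | | | | | |
6 | Package 5 | F51–F67 | | | | | | |
7 | Package 6 | F68–F73 | | | | | | |
8 | Package 7 | F74–F85 | | | | | | |
9 | Package 8 | F86–F91 | | | | | | |
10 | Package 9 | F92–F100 | | | | | | |
11 | Package 10 | F101–F104 | | | | | | |
12 | End | Integration | | | | | | |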