Chapter 1 - Data, Data Quality, and Descriptive Statistics (2/4)


8 ◾ Simple Statistical Methods for Software Engineering

of cumulative count of code written to date. The actual code written is plotted alongside the plan in Figure 1.2. By reasoning visually from the plot, one can estimate the time needed to finish the project.

Data must be transformed into charts; until then, they do not enter the decision space.

Figure 1.1 Radar chart for project risks. (Risk axes: communication, schedule, responsiveness, quality, cost, attrition; radial scale 0.1 to 0.5.)

Figure 1.2 Cumulative count of code. (Plan cumulative versus actual cumulative LOC over weeks 1 to 10; LOC axis from 1000 to 6000.)
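Figure 1.2 invites a visual projection of the finish date, and the same reasoning can be made explicit with a little arithmetic. The sketch below is a minimal illustration, assuming a straight-line projection at the average weekly rate achieved so far. The weekly LOC figures and the 6000-LOC target are hypothetical, not read from Figure 1.2, and the helper names are illustrative.

```python
import math
from itertools import accumulate


def cumulative(weekly_loc):
    """Running total of weekly LOC counts (the 'cumulative' curve)."""
    return list(accumulate(weekly_loc))


def projected_finish_week(actual_cumulative, target_loc):
    """Hypothetical helper: estimate the week in which cumulative LOC reaches
    target_loc, assuming the average weekly rate so far continues unchanged."""
    weeks_elapsed = len(actual_cumulative)
    done = actual_cumulative[-1]
    if done <= 0:
        raise ValueError("no code written yet; the rate is undefined")
    if done >= target_loc:
        return weeks_elapsed  # target already reached
    # target / (done / weeks), rearranged to avoid an intermediate rate
    return math.ceil(target_loc * weeks_elapsed / done)


# Hypothetical data for illustration only (not taken from Figure 1.2).
plan = cumulative([600] * 10)                        # 600 LOC/week, 6000 in total
actual = cumulative([400, 450, 500, 380, 520, 550])  # six weeks of actuals

print("Gap to plan at week 6:", plan[5] - actual[-1])                     # 800
print("Projected finish week:", projected_finish_week(actual, plan[-1]))  # 13
```

A straight-line projection ignores ramp-up and late integration slowdowns; it is simply the arithmetic counterpart of extending the actual curve by eye.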


Even data on lower measurement scales (nominal or ordinal) can be graphed. For example, a bar graph of discovered defect types can be very instructive. Most categorical variables are plotted as bar graphs or pie charts, which suit them well.
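A bar graph of a categorical variable starts from a frequency count per category. The sketch below assumes the defect log is simply a list of defect-type labels (the labels are hypothetical); it counts the categories and draws a text bar chart, longest bar first. A plotting library would render the same counts graphically.

```python
from collections import Counter


def defect_bar_chart(defect_types, width=40):
    """Return text bars, most frequent first, scaled so the top category spans `width`."""
    counts = Counter(defect_types)
    if not counts:
        return []
    peak = max(counts.values())
    label_width = max(len(label) for label in counts)
    lines = []
    for label, n in counts.most_common():  # sorted by frequency, descending
        bar = "#" * round(n * width / peak)
        lines.append(f"{label:<{label_width}} {bar} {n}")
    return lines


# Hypothetical defect log: one label per discovered defect.
log = ["logic", "interface", "logic", "data", "logic", "interface", "documentation"]
for line in defect_bar_chart(log, width=12):
    print(line)
```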

Graphs must be interpreted. A picture is worth a thousand words, but each one needs a few words of explanation articulating its context and meaning. Commentaries on graphs are rare, perhaps because the truth is assumed to be self-evident in them. However, adding even a one-line comment to a graph makes a huge difference.

Box 1.1 Show Me a Graph

This organization was dedicated to software maintenance. Every month, a huge list of change requests was received. The operations manager found the “backlog” a burning issue; it seemed to grow every month. After due contemplation, he devised a simple management technique to address this issue: a simple pie chart report at the end of every month. The pie chart showed the distribution of bugs according to the following categories:

a. Bugs taken up—complex category

b. Bugs taken up—simple category

c. Bugs analyzed but found as nonissues

d. Bugs in queue—yet to be taken up

e. Bugs delivered

(Pie chart: (a) bugs taken up, complex category, 100; (b) bugs taken up, simple category, 400; (c) bugs analyzed but found as nonissues, 200; (d) bugs in queue, yet to be taken up, 670; (e) bugs delivered, 200.)

The pie chart had a noteworthy consequence. The backlog queue dwindled, and more bugs were fixed monthly. Later, the manager happened to learn about “visual management” and ascribed the success of the pie chart to visual management.

The pie chart was so simple and yet so effective that it soon became a weekly report and grew very popular. The pie chart turned the company around.

Numerical Descriptive Statistics

(Numerical Summary of Data)

The numerical summary of data has a standard set of statistics. There is a difference between data and a statistic: data are the result of measurement, whereas a statistic is the result of statistical processing of data. There is a prerequisite for doing descriptive statistics. We need a set of observations, that is, a sample of data points, to prepare a numerical summary. A few components must have been built, or a process must have been executed a few times, before we think of a numerical summary. This constraint is not imposed on graphs. Data 1.1 presents a sample of effort variance data from a typical development project.

What do the data mean? Quantitative reasoning begins with a statistical inquiry into effort variance. What is the center of the process? What is the range of the process? Are the process data symmetrical, as one would expect, or skewed? Does the process have a strong peak, or is it flat? The answers to such queries are formally available in the form of some basic statistics. These statistics have been computed for the effort variance data using the Excel Data Analysis Tool “Descriptive Statistics.” Data 1.2 presents the report from this tool.

Data 1.1 Effort Variance Data (%)

20.0

12.4

18.0

30.0

5.0

12.0

15.0

0.4

−3.0

4.0

7.0

9.0

10.0

6.0
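The Excel report in Data 1.2 can be cross-checked in code. The sketch below applies Python's standard `statistics` module to the Data 1.1 values. The skewness and kurtosis lines use the adjusted sample formulas documented for Excel's SKEW and KURT functions (kurtosis is reported as excess kurtosis), and the mode is returned as `None` when no value repeats, mirroring the N/A in Data 1.2. The function name `describe` is illustrative, not part of any library.

```python
import math
import statistics
from collections import Counter

EFFORT_VARIANCE = [20.0, 12.4, 18.0, 30.0, 5.0, 12.0, 15.0,
                   0.4, -3.0, 4.0, 7.0, 9.0, 10.0, 6.0]


def describe(data):
    """Excel-style descriptive statistics (all rows except confidence level)."""
    n = len(data)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 observations")
    mean = statistics.mean(data)
    sd = statistics.stdev(data)              # sample SD (n - 1 divisor)
    z = [(x - mean) / sd for x in data]      # standardized values
    # Adjusted sample skewness, as in Excel's SKEW
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    # Excess kurtosis, as in Excel's KURT
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    counts = Counter(data)
    mode = counts.most_common(1)[0][0] if max(counts.values()) > 1 else None
    return {
        "Mean": mean,
        "Standard error": sd / math.sqrt(n),
        "Median": statistics.median(data),
        "Mode": mode,
        "Standard deviation": sd,
        "Sample variance": statistics.variance(data),
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Sum": sum(data),
        "Count": n,
    }


for name, value in describe(EFFORT_VARIANCE).items():
    print(f"{name:<20} {value}")
```

On the 14 values, these results agree with Data 1.2 to the precision Excel displays. The omitted row, Confidence level (95.0%), is the half-width t × standard error, where t is the 0.975 quantile of Student's t distribution with n − 1 = 13 degrees of freedom (about 2.160, obtainable for example from `scipy.stats.t.ppf(0.975, 13)`); 2.160 × 2.278 ≈ 4.92, matching the table.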


There are fourteen basic “statistics” in the table. We can add the kth largest and kth smallest values to this list by ticking off the corresponding options in the tool. Definitions of these statistics are presented in Appendix 1.1.

Box 1.2 Power of Table

Managing software development is a complex task. A manager applied data-driven management in a novel manner to make his task easier. He identified 12 milestones and selected the data he needed to collect at each milestone for effective management. That led him to design a data table with 12 rows and 10 columns. The data columns included dates, size, defects, effort, and pertinent feature numbers. The milestones coincided with deliveries, and the data table came to be called the milestone table. With this simple table, he realized he could manage a project of almost any size and duration. He also found extra bandwidth to manage many more projects simultaneously. His teams never missed milestones because he took milestone-level data seriously and reviewed the results objectively and with precision. His projects were often delivered on time, with quality, and within budget.

Data 1.2 Descriptive Statistics of Effort Variance

Data     Descriptive Statistics
20.0     Mean                        10.41429
12.4     Standard error              2.278081
18.0     Median                      9.5
30.0     Mode                        N/A
5.0      Standard deviation          8.5238
12.0     Sample variance             72.65516
15.0     Kurtosis                    0.908722
0.4      Skewness                    0.720095
−3.0     Range                       33
4.0      Minimum                     −3
7.0      Maximum                     30
9.0      Sum                         145.8
10.0     Count                       14
6.0      Confidence level (95.0%)    4.921496


Special Statistics

A few special statistics are explained in later chapters: standard error in Chapter 13, confidence interval in Chapter 21, and percentiles, quartiles, and interquartile range in Chapter 4. We can assemble our preferred statistics into our own descriptive statistics banner.

Three Categories of Descriptive Statistics

The simple and most commonly used descriptive statistics can be divided into three categories and analyzed more deeply:

Central tendency (discussed in Chapter 2)

Dispersion (discussed in Chapter 3)

Tukey’s five-point summary (discussed in Chapter 4)

Such a deeper exploration might be viewed as part of exploratory data analysis.
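As a preview of Chapter 4, here is a sketch computing a five-point summary (minimum, lower quartile, median, upper quartile, maximum) for the Data 1.1 values. Quartile conventions differ, so it shows two assumptions side by side: `statistics.quantiles` with `method="inclusive"` (linear interpolation, the same convention as Excel's QUARTILE.INC) and Tukey's hinges (medians of the lower and upper halves of the sorted data). Both function names are illustrative, and neither is claimed to be the convention used in Chapter 4.

```python
import statistics

EFFORT_VARIANCE = [20.0, 12.4, 18.0, 30.0, 5.0, 12.0, 15.0,
                   0.4, -3.0, 4.0, 7.0, 9.0, 10.0, 6.0]


def five_point_interpolated(data):
    """Min, Q1, median, Q3, max with linearly interpolated quartiles."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


def five_point_tukey(data):
    """Min, lower hinge, median, upper hinge, max (Tukey's hinges).

    For odd n, the median is included in both halves.
    """
    xs = sorted(data)
    half = (len(xs) + 1) // 2
    lower, upper = xs[:half], xs[len(xs) - half:]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])


print(five_point_interpolated(EFFORT_VARIANCE))  # (-3.0, 5.25, 9.5, 14.35, 30.0)
print(five_point_tukey(EFFORT_VARIANCE))         # (-3.0, 5.0, 9.5, 15.0, 30.0)
```

For this small sample the two conventions give Q1 = 5.25 versus 5.0 and Q3 = 14.35 versus 15.0, which is one reason to state the method whenever quartiles are reported.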

Case Study: Interpretation of Effort Variance Descriptive Statistics

Let us look at the descriptive statistics of effort variance data provided in Data 1.2. The number of data points is 14. We would have preferred more than 30 data points

Data Table    Project Name:    Customer Ref.:

Milestone  Delivery    Features      Start DT  Finish DT  Dev Effort  Review Effort  Test Effort  Test Defects  UAT Defects
1          Start       Architecture
2          Package 1   F1–F5
3          Package 2   F6–F20
4          Package 3   F21–F40
5          Package 4   F41–F50
6          Package 5   F51–F67
7          Package 6   F68–F73
8          Package 7   F74–F85
9          Package 8   F86–F91
10         Package 9   F92–F100
11         Package 10  F101–F104
12         End         Integration
