Chapter 16 - The Law of Life: Pareto Distribution


Chapter 16

The Law of Life:

Pareto Distribution—

80/20 Aphorism

The Pareto distribution is a fat-tailed, skewed distribution invented by Vilfredo Pareto. A brief biography of Pareto is given in Box 16.1. The distribution was originally used to describe the distribution of wealth in society, where the larger shares of wealth are controlled by fewer people.

Box 16.1 Vilfredo Pareto: The Economist Who Discovered Management (1848–1923)

Vilfredo Pareto was an Italian sociologist, engineer, economist, philosopher,

political scientist, and mathematician.

Between 1859 and 1864, Vilfredo changed schools several times. From 1864 to 1867, Vilfredo studied mathematics and physics at the Università di Torino. In 1869, he earned a doctor’s degree in engineering from what is now the Polytechnic University of Turin. His dissertation was titled “The Fundamental Principles of Equilibrium in Solid Bodies.” His later interest in equilibrium analysis in economics and sociology can be traced back to this paper.

After his studies, Pareto worked for some years at the Italian Railway Company and traveled to Germany, England, Belgium, Switzerland, and Austria. In the field of statistics, Pareto worked on insurance and the calculation of pensions.

254 ◾ Simple Statistical Methods for Software Engineering

Structure of Pareto

Pareto is known as a fat-tailed distribution. Gaussian, exponential, and Pareto tails are compared in Box 16.2. It is shown that Pareto has the largest tail.

A graph of the Pareto distribution is plotted in Figure 16.1. The probability of usage of software features is the metric plotted in Figure 16.1. The distribution begins from its mode and extends asymptotically to the right. The decline of usage is gradual.

The Pareto probability density function (PDF) depends on two parameters, mode m and shape factor α. The equation to the PDF is shown as follows:

$$\mathrm{PDF} = f(x) = \frac{\alpha m^{\alpha}}{x^{\alpha+1}} \tag{16.1}$$

The equation can be rewritten by marking the constant term separately and bringing the variable term to the numerator, as follows:

$$f(x) = \left(\alpha m^{\alpha}\right) x^{-(\alpha+1)} \tag{16.2}$$

The equation is clearly a form of the power law with a negative exponent, $x^{-b}$. The power law is one of the favorite curves used in data mining.
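The two ways of writing the density can be checked against each other numerically. The following is a minimal sketch, assuming the standard Pareto density $\alpha m^{\alpha}/x^{\alpha+1}$ for x at or above the mode m and zero below it; the function names are illustrative and do not come from the book.

```python
def pareto_pdf(x: float, m: float, alpha: float) -> float:
    """Pareto density alpha * m**alpha / x**(alpha + 1); zero below the mode m."""
    if m <= 0 or alpha <= 0:
        raise ValueError("m and alpha must be positive")
    if x < m:
        return 0.0
    return alpha * m**alpha / x ** (alpha + 1)


def pareto_pdf_power_law(x: float, m: float, alpha: float) -> float:
    """Same density written as constant * x**(-b), with b = alpha + 1."""
    if x < m:
        return 0.0
    constant = alpha * m**alpha   # the constant term
    b = alpha + 1                 # the power-law exponent
    return constant * x ** (-b)


# Parameters of the chapter's example: m = 1, alpha = 1.2
for x in (1, 2, 4, 8):
    print(x, round(pareto_pdf(x, 1.0, 1.2), 4), round(pareto_pdf_power_law(x, 1.0, 1.2), 4))
```

Both columns agree for every x, which is all the rewriting claims: the density is a constant multiplied by a negative power of x.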

Box 16.1 (continued)

Pareto became famous for the Pareto Optimum in economics and for the Pareto distribution. In 1896, he found that the distribution of income does not follow the normal distribution but is skewed to the right. His discovery of the “distribution curve for wealth and incomes” of 1895 made Pareto famous as a statistician.

The Pareto principle was named after him and built on his observations, such as that 80% of the land in Italy was owned by 20% of the population.

Pareto was the first to realize that utility was a preference ordering. With this, Pareto not only inaugurated modern microeconomics but also demolished the alliance of economics and utilitarian philosophy. Pareto said “good” cannot be measured. He replaced it with the notion of Pareto optimality, the idea that a system is enjoying maximum economic satisfaction when no one can be made better off without making someone else worse off. Pareto optimality is widely used in welfare economics and game theory. A standard theorem is that a perfectly competitive market creates distributions of wealth that are Pareto optimal.

His legacy as an economist was profound. Partly because of him, the field evolved from a branch of social philosophy as practiced by Adam Smith into a data-intensive field of scientific research and mathematical equations. (http://en.wikipedia.org/wiki/Pareto_principle; http://en.wikipedia.org/wiki/Pareto_distribution)


Box 16.2 A Story of Tails

The Gaussian tail dies soon. The exponential tail stretches longer but is limited. The Pareto tail, resulting from the power law, is unlimited. We can compare the standard forms of these three tail equations:

Gaussian, standard form = $e^{-x^{2}/2}$

Exponential, standard form = $e^{-x}$

Pareto, standard form = $x^{-1}$

In these expressions, scale factor = 1 and location = 0. If we check the value of the tails at x = 6, we find

Gaussian tail = 0.0000000152

Exponential tail = 0.00248

Pareto tail = 0.167

At x = 6, the Gaussian tail is nearly zero, and the exponential tail is

162,755 times bigger. In turn, the Pareto tail is 67 times stronger than the

Figure 16.1 Pareto distribution of features usage. (Plot of usage probability against number of features; m = 1, α = 1.2.)


The cumulative distribution function is shown in Figure 16.2. The y-axis directly reads usage probability, while the x-axis reads the number of features. Using this model, we can quickly find the usage probability of n features in a software product.

The equation for the cumulative distribution is rather simple and is given in Equation 16.3.

exponential. For larger values of x, divergence among the three tails increases further. The Gaussian tail will be dead, the exponential tail will slide toward zero, and the Pareto tail will still have significant values for a long distance.
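The tail values quoted in Box 16.2 at x = 6 can be reproduced with a few lines of arithmetic. This is a minimal sketch; it assumes the Gaussian standard form is the unnormalized $e^{-x^{2}/2}$, an assumption chosen here because it matches the quoted value of 0.0000000152, and the function names are illustrative only.

```python
import math


def gaussian_tail(x: float) -> float:
    """Unnormalized Gaussian form exp(-x^2 / 2) (assumed standard form)."""
    return math.exp(-x * x / 2.0)


def exponential_tail(x: float) -> float:
    """Exponential form exp(-x)."""
    return math.exp(-x)


def pareto_tail(x: float) -> float:
    """Power-law form x^(-1)."""
    return 1.0 / x


x = 6.0
g, e, p = gaussian_tail(x), exponential_tail(x), pareto_tail(x)
print(f"Gaussian tail    = {g:.3e}")
print(f"Exponential tail = {e:.5f}")
print(f"Pareto tail      = {p:.3f}")
print(f"exponential / Gaussian = {e / g:,.0f}")   # 162,755
print(f"Pareto / exponential   = {p / e:.0f}")    # 67
```

Changing `x` to a larger value shows the divergence described above: the Gaussian ratio grows as exp(x²/2 − x), far faster than either of the other two tails shrinks.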

These three tails represent three aspects of engineering and management. Gaussian is drawn to its center; its body is accentuated and its tail attenuated, a true model of process behavior. The Gaussian tails are either process defects or rejection areas.

The exponential curve represents decay or defects in a product. There seem to be special mechanisms in a product that cause decay or vulnerabilities that cause defects. By definition, the exponential tail represents failure, not performance of products.

Pareto is often a model for external factors that influence a product or a process from outside the organization.

Business comprises effects represented by these three tails.

Figure 16.2 Cumulative Pareto distribution of features usage. (Plot of usage probability against number of features; m = 1, α = 1.2; the curve is annotated “80/20 Law.”)


$$\mathrm{CDF} = F(x) = 1 - \left(\frac{m}{x}\right)^{\alpha} \tag{16.3}$$

It may be noted that the previously mentioned equations are defined for values of x greater than the mode m.
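The cumulative form is easy to evaluate directly. Below is a minimal sketch, assuming the standard Pareto CDF $1 - (m/x)^{\alpha}$ and treating the CDF as zero below the mode; the function name is illustrative.

```python
def pareto_cdf(x: float, m: float, alpha: float) -> float:
    """Cumulative probability 1 - (m / x)**alpha; zero for x below the mode m."""
    if m <= 0 or alpha <= 0:
        raise ValueError("m and alpha must be positive")
    if x < m:
        return 0.0
    return 1.0 - (m / x) ** alpha


# Usage probability covered by the first n features, with m = 1 and alpha = 1.2
for n in (1, 2, 4, 10, 20):
    print(n, round(pareto_cdf(n, 1.0, 1.2), 3))
```

With these parameters the cumulative probability is already about 0.81 at n = 4, which is where the curve crosses the 80% level.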

Key statistics of the distribution are given as follows:

$$\text{Mean} = \frac{\alpha m}{\alpha - 1} \tag{16.4}$$

$$\text{Median} = m \, 2^{1/\alpha} \tag{16.5}$$

The mean is defined for values of shape factor α > 1.
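For readers who want to see where these two statistics come from, here is a short derivation. It assumes the standard Pareto density $f(x) = \alpha m^{\alpha} x^{-(\alpha+1)}$ and CDF $F(x) = 1 - (m/x)^{\alpha}$ for $x \ge m$.

```latex
% Mean: integrate x f(x) from the mode m to infinity.
\begin{aligned}
\text{Mean} &= \int_{m}^{\infty} x \,\alpha m^{\alpha} x^{-(\alpha+1)}\,dx
             = \alpha m^{\alpha} \int_{m}^{\infty} x^{-\alpha}\,dx \\
            &= \alpha m^{\alpha} \cdot \frac{m^{1-\alpha}}{\alpha - 1}
             = \frac{\alpha m}{\alpha - 1},
             \qquad \text{which converges only if } \alpha > 1. \\[6pt]
% Median: solve F(x) = 1/2.
1 - \left(\frac{m}{x}\right)^{\alpha} &= \frac{1}{2}
  \;\Longrightarrow\; \left(\frac{x}{m}\right)^{\alpha} = 2
  \;\Longrightarrow\; \text{Median} = m\,2^{1/\alpha}.
\end{aligned}
```

The integral for the mean diverges when α ≤ 1, which is why the mean carries the restriction α > 1 while the median exists for every positive α.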

An Example

A Pareto model has been established with mode m = 1 and shape factor α = 1.2 in Data 16.1. The mean for this model turns out to be 6, while the median is 1.8. The fact that the mean is so far from the median reflects the skew of the model: the mean has shifted toward the tail. The PDF and cumulative distribution function (CDF) have been computed, and the values are shown in Data 16.1. Pareto calculations are easy and can be managed with basic Excel.
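The same calculations can be scripted instead of done in a spreadsheet. The sketch below assumes the standard Pareto formulas (mean αm/(α − 1), median m·2^(1/α), density αm^α/x^(α+1), CDF 1 − (m/x)^α). The grid of x values from 1 to 10 is an assumption made for illustration, not the contents of Data 16.1, and the function names are hypothetical.

```python
def pareto_mean(m: float, alpha: float) -> float:
    """Mean alpha * m / (alpha - 1); only defined for alpha > 1."""
    if alpha <= 1:
        raise ValueError("the mean is defined only for alpha > 1")
    return alpha * m / (alpha - 1)


def pareto_median(m: float, alpha: float) -> float:
    """Median m * 2**(1 / alpha)."""
    return m * 2 ** (1 / alpha)


def pareto_table(m: float, alpha: float, xs):
    """Rows of (x, PDF, CDF); every x is expected to be at or above the mode m."""
    rows = []
    for x in xs:
        pdf = alpha * m**alpha / x ** (alpha + 1)
        cdf = 1 - (m / x) ** alpha
        rows.append((x, pdf, cdf))
    return rows


m, alpha = 1.0, 1.2
print(f"mean   = {pareto_mean(m, alpha):.1f}")    # 6.0
print(f"median = {pareto_median(m, alpha):.1f}")  # 1.8
print(" x     PDF     CDF")
for x, pdf, cdf in pareto_table(m, alpha, range(1, 11)):  # assumed grid
    print(f"{x:>2}  {pdf:.4f}  {cdf:.4f}")
```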

The 80/20 Law: Vital Few and Trivial Many

The CDF shown in Figure 16.2 brings to mind the famous 80/20 law due to Pareto. It may be seen that 20% of features have 80% usage probability. This is a basic principle used in statistical testing. This model is also called the operational profile of the product. There are many 80/20 laws that rule life. A brief list is given in Box 16.3.
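The 80% point can also be located by inverting the CDF. The following is a minimal sketch, assuming the standard Pareto CDF $1 - (m/x)^{\alpha}$, whose inverse is $x = m(1-p)^{-1/\alpha}$; the function name is illustrative.

```python
def pareto_quantile(p: float, m: float, alpha: float) -> float:
    """Value of x at which the Pareto CDF reaches probability p (0 <= p < 1)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    if m <= 0 or alpha <= 0:
        raise ValueError("m and alpha must be positive")
    return m * (1.0 - p) ** (-1.0 / alpha)


# Model of the chapter's example: m = 1, alpha = 1.2
x80 = pareto_quantile(0.8, 1.0, 1.2)
print(f"80% of usage is reached at about x = {x80:.2f} features")
```

With m = 1 and α = 1.2 this gives roughly 3.8, that is, about four features. If the product has about 20 features, as the axis range of Figure 16.2 suggests (an assumption), four features are 20% of the total.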

The 80/20 law depicts the phenomenon of “vital few and trivial many.” Illes-Seifert and Paech [1] have analyzed application of this principle to software defects. They report,

The distribution of about 430 defects over about 500 modules has been analysed and confirms the Pareto Principle, i.e. approximately 80% of the defects were contained in 20% of the modules.
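A check of this kind can be run on any list of per-module defect counts. The sketch below is only an illustration: the helper name `share_in_top` and the ten defect counts are hypothetical, made up for this example, and are not the data analyzed in [1].

```python
def share_in_top(counts, fraction: float = 0.2) -> float:
    """Share of the total held by the top `fraction` of items, ranked by count."""
    if not counts:
        raise ValueError("counts must not be empty")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(counts, reverse=True)        # largest contributors first
    k = max(1, round(len(ranked) * fraction))    # number of "vital few" items
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Hypothetical defect counts for 10 modules (invented for illustration only)
defects_per_module = [40, 24, 6, 4, 3, 2, 1, 0, 0, 0]
share = share_in_top(defects_per_module, 0.2)
print(f"Top 20% of modules contain {share:.0%} of the defects")  # 80%
```

A share near 0.8 for the top 20% supports the “vital few” reading, whereas a share near 0.2 would mean the defects are spread evenly across modules.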
