254 ◾ Simple Statistical Methods for Software Engineering
Structure of Pareto
Pareto is known as a fat-tailed distribution. Gaussian, exponential, and Pareto tails
are compared in Box 16.2, which shows that Pareto has the heaviest tail of the three.
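To give a feel for the tail comparison, the following minimal sketch computes the tail probability P(X > x) for the three distributions. The parameter values (a standard normal, an exponential with rate 1, and a Pareto with mode 1 and shape factor 2) and the function names are assumptions chosen for illustration; they are not taken from Box 16.2.

```python
import math

def gaussian_tail(x, mu=0.0, sigma=1.0):
    """P(X > x) for a normal distribution with mean mu and std. dev. sigma."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

def exponential_tail(x, rate=1.0):
    """P(X > x) for an exponential distribution, x >= 0."""
    return math.exp(-rate * x)

def pareto_tail(x, m=1.0, alpha=2.0):
    """P(X > x) for a Pareto distribution with mode m and shape factor alpha."""
    return (m / x) ** alpha if x >= m else 1.0

# Assumed illustrative evaluation points, all well to the right of the mode.
for x in (2.0, 5.0, 10.0, 20.0):
    print(f"x={x:5.1f}  gaussian={gaussian_tail(x):.3e}  "
          f"exponential={exponential_tail(x):.3e}  pareto={pareto_tail(x):.3e}")
```

With these assumed parameters, the Pareto tail probability stays largest at every evaluation point, and the gap widens as x grows.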
A graph of the Pareto distribution is plotted in Figure 16.1. The probability of
usage of software features is the metric plotted in Figure 16.1. The distribution begins
from its mode and extends asymptotically to the right. The decline of usage is gradual.
The Pareto probability density function (PDF) depends on two parameters,
mode m and shape factor α. The equation for the PDF is as follows:
$$ \mathrm{PDF} = \frac{\alpha\, m^{\alpha}}{x^{\alpha+1}} \tag{16.1} $$
The equation can be rewritten by marking the constant term separately and
bringing the variable term to the numerator, as follows:
$$ f(x) = (\alpha\, m^{\alpha})\, x^{-(\alpha+1)} \tag{16.2} $$
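As a quick check that the two ways of writing the PDF agree, here is a minimal Python sketch that evaluates both forms. The function names and the values m = 1 and α = 2 are assumptions made for illustration; the density is taken as zero below the mode, since the distribution begins from its mode.

```python
import math

def pareto_pdf(x, m, alpha):
    """Quotient form: alpha * m**alpha / x**(alpha + 1), zero below the mode."""
    if x < m:
        return 0.0
    return alpha * m ** alpha / x ** (alpha + 1)

def pareto_pdf_power_form(x, m, alpha):
    """Power form: the constant (alpha * m**alpha) times x**-(alpha + 1)."""
    if x < m:
        return 0.0
    constant = alpha * m ** alpha          # depends only on the parameters
    return constant * x ** (-(alpha + 1))  # the variable part, a power of x

m, alpha = 1.0, 2.0  # assumed illustrative parameter values
for x in (1.0, 2.0, 4.0):
    a = pareto_pdf(x, m, alpha)
    b = pareto_pdf_power_form(x, m, alpha)
    print(f"x={x}: quotient form={a}, power form={b}, agree={math.isclose(a, b)}")
```

Separating the constant from the variable term makes it plain that, for fixed m and α, the density depends on x only through a single power of x.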
The equation is clearly a form of the power law with a negative exponent, $x^{-b}$,
where b = α + 1.
The power law is one of the favorite curves used in data mining.
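One practical consequence of the power-law form is that the Pareto PDF plots as a straight line on log-log axes, with slope −(α + 1). The sketch below checks this numerically; the helper names and the values m = 1 and α = 1.5 are assumptions used only for illustration.

```python
import math

def pareto_pdf(x, m, alpha):
    """Pareto density for x >= m: alpha * m**alpha / x**(alpha + 1)."""
    return alpha * m ** alpha / x ** (alpha + 1)

def log_log_slope(x1, x2, m, alpha):
    """Slope of log f(x) against log x between two points, m <= x1 < x2."""
    rise = math.log(pareto_pdf(x2, m, alpha)) - math.log(pareto_pdf(x1, m, alpha))
    run = math.log(x2) - math.log(x1)
    return rise / run

m, alpha = 1.0, 1.5  # assumed illustrative parameter values
print(log_log_slope(2.0, 50.0, m, alpha))    # expected to be close to -(alpha + 1)
print(log_log_slope(10.0, 1000.0, m, alpha)) # same slope over a different range
```

The slope is the same whichever pair of points is chosen, which is what distinguishes a power law from an exponential decline.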
Pareto became famous for the Pareto optimum in economics and for the
Pareto distribution. In 1896, he found that the distribution of income does
not follow the normal distribution but is skewed, with a long tail to the right. His
discovery of the “distribution curve for wealth and incomes” of 1895 made
Pareto famous as a statistician.
The Pareto principle was named after him and built on his observations,
such as that 80% of the land in Italy was owned by 20% of the population.
Pareto was the first to realize that utility was a preference ordering. With this,
Pareto not only inaugurated modern microeconomics but also demolished the
alliance of economics and utilitarian philosophy. Pareto said “good” cannot be
measured. He replaced it with the notion of Pareto optimality, the idea that a
system is enjoying maximum economic satisfaction when no one can be made better
off without making someone else worse off. Pareto optimality is widely used in
welfare economics and game theory. A standard theorem is that a perfectly
competitive market creates distributions of wealth that are Pareto optimal.
His legacy as an economist was profound. Partly because of him, the field
evolved from a branch of social philosophy as practiced by Adam Smith
into a data-intensive field of scientific research and mathematical equations.
(http://en.wikipedia.org/wiki/Pareto_principle;
http://en.wikipedia.org/wiki/Pareto_distribution)