The uncertainty characterizing decisions, and indeed the process of decision‐making on the basis of statistical reasoning, can be traced, in many cases, to the lack of precise information caused by vagueness in the data considered. In this sense, fuzzy set theory comes to the forefront and, as is plausibly expected, plays a prominent role. Before introducing fuzziness, we present a brief review of random variables (also known as stochastic variables) and their properties, as well as the classical statistical notions of point estimation, interval estimation, hypothesis testing, and regression. Then, their fuzzy analogues will be introduced.
A mapping from the set of possible outcomes (sample points) of an experiment to a subset of real numbers is a random variable. A rigorous definition of a random variable is the following:
Some remarks are in order at this point:
It is possible to assign probabilities corresponding to the aforesaid events, for example,
and so on. Now, let us define the distribution function of a random variable:
The cumulative distribution function has to satisfy the following properties:
with the second property showing that $F$ is a nondecreasing function and the fifth property pointing out its continuity on the right.
When the range of $X$ is finite or countably infinite, the random variable is discrete, and a discrete probability distribution (known as the probability mass function) can be defined by assigning a certain probability $p(x_i)$ to each value $x_i$ in the range of $X$. The probability mass function has to satisfy the following properties:
Then, the cumulative distribution function of a discrete random variable is given by
In the case of an uncountably infinite range, $X$ is a continuous random variable and, if its cumulative distribution function has a first derivative that is piecewise continuous and exists everywhere except possibly at a finite number of points, then a probability density function can be defined:
which can be integrated in order to find the probability. The probability density function has to satisfy the following properties:
Then, the cumulative distribution function of a continuous random variable is given by
Now, one can determine the expectation value (or mean) of a random variable $X$:
Based upon the above relations, the $n$th moment of a random variable $X$ can be introduced:
Clearly, the first ($n = 1$) moment of $X$ is its expectation value $E[X]$.
Finally, the concepts of variance and standard deviation of a random variable $X$ can be defined:
which for a discrete random variable becomes
while for a continuous random variable, one obtains
In fact, by expanding the expression for $E\left[(X - E[X])^2\right]$, one gets the useful formula for the variance of a random variable $X$:
Finally, the positive square root of the variance yields the standard deviation, $\sigma$, of a random variable $X$.
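The shortcut formula above is easy to check numerically. The following sketch compares the defining expression $E\left[(X - E[X])^2\right]$ with $E[X^2] - (E[X])^2$ for a small discrete distribution; the values and probabilities are illustrative, not taken from the text.

```python
# Numerical check of Var(X) = E[X^2] - (E[X])^2 for a small
# discrete random variable (illustrative distribution).
values = [1, 2, 3, 4]
probs = [0.1, 0.2, 0.3, 0.4]

mean = sum(v * p for v, p in zip(values, probs))              # E[X]
second_moment = sum(v**2 * p for v, p in zip(values, probs))  # E[X^2]

var_direct = sum((v - mean)**2 * p for v, p in zip(values, probs))
var_shortcut = second_moment - mean**2

print(mean)          # E[X] = 3.0
print(var_direct)    # definition: 1.0
print(var_shortcut)  # shortcut:   1.0
```

Both expressions agree, as the expansion guarantees.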
There are many important distributions for random variables, most notably the binomial distribution, the Poisson distribution, and the normal (Gaussian) distribution (see, e.g. [233]). At this point, one last remark concerning the so‐called conditional distributions is deemed necessary. Following the definition of the conditional probability of an event $A$ given an event $B$:
Again, one can distinguish between a discrete and a continuous random variable. Thus, for a discrete random variable, the above equation yields for the conditional probability mass function
while in the case of a continuous random variable, we have for the conditional probability density function
In Section 6.1, we have reviewed classical random variables and their basic properties. In order to handle fuzzy data or observations, one can introduce fuzzy‐valued random variables to grasp vagueness (see, e.g. [78] for a short review); in other words, randomness and vagueness are now allowed to appear simultaneously. In the literature, one can find different approaches to the notion of a fuzzy random variable, most notably the (mathematically equivalent) definitions introduced by Erich Peter Klement et al. [176] and Madan L. Puri and Dan A. Ralescu [245], Huibert Kwakernaak [186, 187], and Rudolf Kruse and Klaus Dieter Meyer [183]. In what follows, we will adopt the Kruse–Meyer approach, in which a fuzzy random variable is studied as a fuzzy perception/observation of a classical real‐valued random variable. This approach is actually a combination of the other authors' considerations. However, we stress that we are not going to introduce the notions of expected value, variance, and distribution function for fuzzy random variables. The reader is referred to [183] for a detailed presentation.
First, let us see what is meant by perception/observation in this context. Assume a measurable space $(\Omega, \mathcal{A})$, with $\Omega$ denoting the set of all possible outcomes of a random experiment, $\mathcal{A}$ a σ‐algebra of subsets of $\Omega$, and the Borel‐measurable space $(\mathbb{R}, \mathcal{B})$ (see footnote on page 241). Further, let $(\Omega, \mathcal{A}, P)$ be a probability space with a set function $P$ defining a probability measure on the space. Suppose that the results of the random experiment are described by a mapping $X: \Omega \to \mathbb{R}$, which assigns a real value to each random choice. Then, $X$ is a random variable.
The perception/observation of the aforementioned random variable means that, for each outcome, we can investigate whether the observed value belongs to some Borel set. Now, if we examine another mapping that associates with each outcome not a real number, as in the case of ordinary random variables, but a set, then we obtain a so‐called random set (see, e.g. [210] for a detailed study of random sets). The random variable of which the random set is a perception is called an original of it. In general, given a random set, the corresponding true original is not known; we only have a set of possible originals. If no further information is available, then each random variable whose value lies in the random set for all outcomes is a possible candidate for being the original.
In other words, a fuzzy random variable $\tilde{X}$ is a (fuzzy) perception of an unknown random variable $X$, with $X$ being a possible original of $\tilde{X}$.
Now, an interesting theorem (see Ref. [183] for a proof) describing the behavior of fuzzy random variables is the following:
Now, we come to the definition of the moment of a fuzzy random variable. Let $\tilde{X}$ be a fuzzy random variable and $X$ one of its originals. Then, the $n$th moment of $\tilde{X}$ with respect to the original $X$ is defined as
In practice, to compute the expected value for a series of random experimental results given as fuzzy sets, one proceeds as follows:
Given a finite probability space, the corresponding probabilities $p_i$ for the outcomes $\omega_i$, respectively, and the fuzzy random variable $\tilde{X}$, the expected value is calculated as
and the fuzzy number $\tilde{E}$ is the fuzzy expectation value of $\tilde{X}$ if this relation holds for all possible originals of $\tilde{X}$.
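The probability‐weighted sum above can be sketched concretely. The snippet below is a minimal illustration, assuming each fuzzy observation is a symmetric triangular fuzzy number written as a (center, spread) pair; under Zadeh's extension principle, scaling by a nonnegative probability scales both components, and fuzzy addition adds them. The observations and probabilities are illustrative, not from the text.

```python
# Sketch: expected value of a fuzzy random variable on a finite
# probability space, with symmetric triangular observations.
from typing import List, Tuple

def fuzzy_expectation(obs: List[Tuple[float, float]],
                      probs: List[float]) -> Tuple[float, float]:
    """Probability-weighted sum of (center, spread) triangular numbers.

    For p >= 0, p * (c, s) = (p*c, p*s), and fuzzy addition adds
    centers and spreads componentwise.
    """
    center = sum(p * c for p, (c, _) in zip(probs, obs))
    spread = sum(p * s for p, (_, s) in zip(probs, obs))
    return center, spread

obs = [(2.0, 0.5), (4.0, 1.0), (6.0, 0.5)]   # "about 2", "about 4", "about 6"
probs = [0.25, 0.5, 0.25]
print(fuzzy_expectation(obs, probs))  # (4.0, 0.75)
```

The result is itself a symmetric triangular fuzzy number: "about 4" with spread 0.75.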
Next, we consider the concept of variance of a fuzzy random variable with respect to an original, whereby we will use the notion of the moment of a fuzzy random variable.
Finally, we come to the definition of the distribution function of a fuzzy random variable. First, we need the notion of a normal set representation of a fuzzy set.
With the help of this definition, we come to the notion of the distribution function:
Finally, two fuzzy random variables $\tilde{X}$ and $\tilde{Y}$ are identically distributed if the boundary random variables of their α‐cuts are identically distributed for all α, while $\tilde{X}$ and $\tilde{Y}$ are independent if each variable from the set
is independent of each variable from the set
(see Ref. [183]). Furthermore, $\tilde{X}_1, \ldots, \tilde{X}_n$ is a normal (or Gaussian) fuzzy random sample of size $n$ if all the $\tilde{X}_i$, $i = 1, \ldots, n$, are independent and identically distributed (iid) normal fuzzy random variables, whereby a normal fuzzy random variable $\tilde{X}$ is one of the form $\tilde{X} = \tilde{a} \oplus X$, with $\oplus$ denoting the extended operation of addition, $\tilde{a}$ a fuzzy number, and $X$ a normal (not fuzzy) random variable with zero mean and variance $\sigma^2$ [122].
Classical statistical analysis is based on random variables, point estimations, statistical hypotheses, and so on. The first major question encountered in statistical inference concerns the point estimation of one or more unknown parameters. To give an idea, a point estimator estimates a parameter by giving a specific numerical value; for example, the best point estimate of the population mean $\mu$ is the calculated sample mean $\bar{x}$. So the general question becomes: how can we choose an estimator, based on a sample of fixed size taken on a random variable whose probability density function contains one or more unknown parameters, in order to obtain a best estimate of those parameters?
Let us formulate the classical problem as it is encountered in statistical inference theory and, in fact, best considered as a problem of decision theory: let $X$ be a random variable with a probability density function $f(x; \theta_1, \ldots, \theta_k)$, where $\theta_1, \ldots, \theta_k$ are unknown parameters. Then, given the value of a random sample of size $n$ from the population, we ask, based on this value, for the estimation ("best guess") of the parameters $\theta_1, \ldots, \theta_k$. If the estimation of the parameters is given as a single value, we speak of a point estimation; otherwise, we refer to an interval estimation. In this sense, an interval estimation of a parameter gives an estimation of the parameter as an interval or a range of values.
In the present section, we shall examine point estimation. Let $\hat{\theta}_i$ be a point estimation of the parameter $\theta_i$, $i = 1, \ldots, k$. In fact, this estimation is but a decision, and it is a function of the value of the random sample. This function is called the estimator function (or decision function), or simply the estimator, of the parameter and, indeed, the process of finding the point estimation of the parameters amounts to finding the estimators of these parameters. The finding of the estimator depends on the properties of this function and, usually, the determination of these properties leads to the way of finding the estimator. In this sense, we can seek various kinds of point estimators, such as the sufficient, the unbiased, the consistent, the efficient, or the maximum likelihood estimator. For more details, see Ref. [161] or [191], where a more advanced approach to point estimators is provided.
In what follows, we briefly present three very commonly used classical point estimators: the unbiased estimator, the consistent estimator, and the maximum likelihood estimator.
Suppose we have a random sample from a population and a point estimator $\hat{\theta}$ of a parameter $\theta$. Then, the following theorem holds:
Consequently, if it is possible to find an estimator of the parameter that has very small bias and variance, then the mean squared error will be correspondingly small. Naturally, one desires a zero bias, $E[\hat{\theta}] - \theta = 0$, or $E[\hat{\theta}] = \theta$. So we come to the following definition:
Again, suppose that we have a random sample from a population and let an estimator of $\theta$ be $\hat{\theta}_n$, where the index $n$ denotes the size of the sample. Intuitively speaking, a good estimator is one for which the so‐called risk function decreases with increasing $n$. Let us briefly introduce the notion of the aforementioned risk function.
We know from decision theory that the estimators (i.e. the decision rules) of the parameters for the values of the random sample establish a mapping of the sample space to the decision space. The wrong choice of the estimators produces a loss or cost, expressing the difference between estimated and true values. This loss is quantified by the so‐called loss function and its expected value is the aforementioned risk function:
Hence, good estimators are those which minimize the risk function.
So, suppose we have a sequence of estimators $\hat{\theta}_n$ for the parameter $\theta$, generated by samples of increasing size $n$. Then, we demand
and if the risk function is the mean squared error , then we have the following theorem:
Now, having said all that, we finally come to the following definition:
As an example, one can readily show that, for a random sample from a population with the normal distribution $N(\mu, \sigma^2)$, the sample variance $s^2$ is a consistent estimator of the population variance $\sigma^2$.
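Consistency can also be seen empirically. The simulation sketch below draws normal samples of increasing size (the population parameters $\mu = 5$, $\sigma = 2$ are illustrative) and shows the sample variance concentrating around the true $\sigma^2 = 4$ as $n$ grows.

```python
# Simulation sketch: the sample variance is a consistent estimator
# of the population variance (true sigma^2 = 4 here; illustrative).
import random

random.seed(0)

def sample_variance(xs):
    """Unbiased sample variance with divisor n - 1."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

for n in (10, 100, 10000):
    xs = [random.gauss(5.0, 2.0) for _ in range(n)]
    print(n, round(sample_variance(xs), 3))  # approaches 4 as n grows
```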
First, we shall present the definition of the likelihood function:
Apparently, when $x_1, x_2, \ldots, x_n$ is a random sample from the population, the likelihood function of this sample is
Now, we can define the maximum likelihood estimator as follows:
When the likelihood function contains only one parameter $\theta$ and is differentiable w.r.t. $\theta$, then the maximum likelihood estimator of $\theta$ is the solution of the equation
In fact, often the equation
is used in applications, since $L$ and $\ln L$ are maximized for the same values of $\theta$.
In the case where the likelihood function depends on more than one parameter $\theta_1, \ldots, \theta_k$, the maximum likelihood estimators are found as the solution of the system of equations
or
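As a concrete single‐parameter case, for a normal sample with known $\sigma$, setting the derivative of the log‐likelihood with respect to $\mu$ to zero gives $\hat{\mu} = \bar{x}$. The sketch below checks this numerically with a coarse grid search over the log‐likelihood; the sample values are illustrative.

```python
# Sketch: maximum likelihood estimate of mu for a normal sample with
# known sigma, found by maximizing log L over a grid (illustrative data).
import math

data = [4.1, 5.3, 4.8, 5.0, 4.6]
sigma = 1.0

def log_likelihood(mu):
    n = len(data)
    return (-n * math.log(sigma * math.sqrt(2 * math.pi))
            - sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2))

grid = [i / 1000 for i in range(3000, 7000)]   # mu in [3.000, 6.999]
mu_hat = max(grid, key=log_likelihood)

print(mu_hat)                  # 4.76, the grid point at the sample mean
print(sum(data) / len(data))   # sample mean: 4.76
```

The grid maximizer coincides with the closed‐form solution $\hat{\mu} = \bar{x}$, as the analytical derivation predicts.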
We start with the problem of fuzzy point estimation, which generalizes what has been presented in Section 6.3. First of all, we should point out that fuzzy point estimation is often considered more basic than fuzzy interval estimation, since the latter can be determined by the point estimations of the lower and upper boundaries of the interval. In fact, if the point estimation is characterized by very low confidence, it can be relaxed to an interval estimation (for an application see, e.g. [7]).
Now, let us adopt the more formal approach to fuzzy point estimation and suppose we have a fuzzy random sample, that is, a fuzzy random vector $(\tilde{X}_1, \ldots, \tilde{X}_n)$. A fuzzy parameter for our fuzzy point estimation is a perception of the unknown parameter we seek.
So we come to the following definition of the fuzzy point estimation:
As we have done with classical point estimation, we can distinguish between different kinds of fuzzy point estimators.
Similarly, when we desire to obtain a parameter estimation by using a sequence of fuzzy random variables, then we have the following:
Although the maximum likelihood estimator presented in Section 6.3.3 is attractive for its simplicity and consequent usefulness in statistical inference, its implementation in the realm of fuzzy data is fairly complex in most practical situations. Indeed, one "classical" approach is to consider a maximum likelihood estimation of a desired parameter as the crisp value that maximizes the probability of observing the fuzzy data (see, e.g. [59]). We shall not go into details; however, we point out that there are more efficient ways to handle the aforementioned difficulty, and the reader is referred, for instance, to the so‐called "expectation–maximization algorithm" (see Ref. [95] and references therein).
A final comment worth noticing is deemed proper at this point. It concerns the interesting and different approach to the problem of fuzzy point estimation presented in [3], which introduces the methods of fuzzy uniformly minimum variance unbiased estimation and fuzzy Bayesian estimation, both based on the notions of the Yao–Wu signed distance and an associated metric. When the fuzzy random variables become crisp random variables, the aforementioned methods reduce to the classical uniformly minimum variance unbiased estimation and Bayesian estimation.
In order to determine an interval of plausible values for an unknown population parameter, the notion of interval estimation is used in classical statistics. In other words, the unknown parameters (notably, but not only, the mean and the variance of the population) are estimated as an interval or as an entire range of numerical values within which the parameter is estimated to lie. One of the most prevalent forms of interval estimation is the frequentist approach known as the confidence interval (see, e.g. [215]), introduced by the Polish mathematician Jerzy Neyman [227] in 1937.
Let a random sample of size $n$ have the value $x = (x_1, \ldots, x_n)$. This value consists of measured data characterized, in general, by uncertainties expressed as errors that were neglected in the previously presented process of finding a point estimator. So, in order to increase the reliability of the estimate, one ought to take these errors into account and give the point estimator within some interval. Let us see how this can be realized in our case. Suppose that our random sample has a continuous distribution with an unknown mean $\mu$ and a known variance $\sigma^2$. Now, it is a well‐known fact from classical statistics that if $X$ is a random variable with the distribution $N(\mu, \sigma^2)$, then for the random variable $Z$ defined as
we can always determine the distribution of $Z$. For large samples, this is the standard normal distribution $N(0, 1)$ (i.e. the Gaussian distribution with $\mu = 0$ and $\sigma = 1$), as inferred from the Central Limit Theorem (see Ref. [161]).
Now, for two arbitrary values $z_1$ and $z_2$ of $Z$, we have
or
Here, $1 - \alpha$ denotes the confidence level, with $\alpha$ the significance level. The interval defined by the inequality
is called the confidence interval of the parameter $\mu$ with the probability $1 - \alpha$. The length of this interval is the confidence length or width:
The confidence length depends mainly on the size of the sample, but also on its variability as well as on the confidence level. One always desires the confidence length to be the least possible. Obviously, this can be achieved by minimizing the difference $z_2 - z_1$. In fact, for a constant confidence level, it can be rather easily shown that this difference attains its minimum value when $z_2 = -z_1$.
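With the symmetric choice $z_2 = -z_1 = z_{\alpha/2}$, the interval becomes $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$. The sketch below computes a 95% interval for the mean with known $\sigma$; the data and the assumed $\sigma$ are illustrative, while $z_{0.025} \approx 1.96$ is the standard normal critical value.

```python
# Sketch of a symmetric 95% confidence interval for the mean with
# known sigma: x_bar +/- z_{alpha/2} * sigma / sqrt(n) (illustrative data).
import math

data = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3, 10.0, 10.2]
sigma = 0.3          # assumed known population standard deviation
z = 1.959963985      # z_{0.025} for a 95% confidence level

n = len(data)
x_bar = sum(data) / n
half_width = z * sigma / math.sqrt(n)

print((round(x_bar - half_width, 3), round(x_bar + half_width, 3)))
```

Note how the half‐width shrinks like $1/\sqrt{n}$: quadrupling the sample size halves the confidence length.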
The study of the notion of fuzzy interval or confidence estimation started mainly in the 1980s with the work of Roger A. McCain [212] and began to escalate after 2000 with a plethora of results (see, e.g. [169] for more information on the historical development and on various approaches to the problem). Here, we have chosen to present a rather practical method, introduced by Norberto Corral and María Ángeles Gil [79], for the construction of an interval estimation of an unknown parameter for a given sample fuzzy information. One has to stress that, although the method yields rather crude confidence intervals (the probability of the parameter being within the estimated interval may be larger than the confidence level), it has the great advantage of being always applicable, that is, for any membership function and any class of distribution functions.
Suppose that we have a random experiment with probability space $(\Omega, \mathcal{A}, P)$, where $\Omega$ denotes the set of all possible outcomes of the random experiment, $\mathcal{A}$ is a σ‐algebra of subsets of $\Omega$, and $P$ defines a probability measure. Let $\theta$ be the parameter of the experiment whose value lies in an interval. Further, suppose that the set of all fuzzy observations defines our fuzzy information. Let us first recall some basic definitions:
It is assumed that increased sampling will not yield a precise observation but a sample fuzzy information:
Now, based on Definition 6.6.3, we can proceed to a formal definition of the confidence interval:
Suppose that we have a sample fuzzy information with
its support and $n$ the size of the sample. Then, the following theorem provides a way to determine a $(1-\alpha)$‐confidence interval for the parameter $\theta$:
(For a proof of this theorem, see Ref. [79]).
Very often, samples of measurement data can be interpreted by a priori assuming a structure (i.e. a specific distribution) of the measurement results and then applying certain statistical tests to determine the probability of the initial assumption (hypothesis) being true or not. In other words, a statistical hypothesis is an assumption about a population parameter, and it is an important part of empirical evidence‐based research (see, e.g. [192] for a general overview).
The procedure starts with the examination of a random sample of the population considered. First, it is assumed either that the sample data are the result of pure chance (null hypothesis, denoted as $H_0$) or that they are sufficiently affected by some nonrandom influence (alternative hypothesis, denoted as $H_1$). Then, an appropriate statistical test is chosen in order to assess the truth of the null hypothesis. In the next step, one determines the probability that the given data would occur when $H_0$ is assumed. This probability is called the $p$‐value (or $p$‐level), and it is used to interpret the result obtained. The smaller the $p$‐value, the stronger the evidence against $H_0$. In other words, the $p$‐value is a measure of how likely the data would be observed if $H_0$ were true. Finally, one compares the calculated $p$‐value with the selected significance level $\alpha$ (see Section 6.3). If $p \leq \alpha$, then the observed influence is statistically significant, $H_0$ must be rejected, and $H_1$ holds. If $p > \alpha$, then $H_0$ is accepted.
Now, concerning the magnitude of the $p$‐value, one of two types of error may appear. When the $p$‐value is small, there is a possibility that $H_0$ is true but an unlikely event has been measured (type I error or false positive), and we have incorrectly rejected $H_0$; if the $p$‐value is large, there is a possibility that $H_0$ is false but an unlikely event has been measured (type II error or false negative), and we have incorrectly accepted $H_0$. The probability of making a type I error, that is, the probability of rejecting $H_0$ given that it is true, is $\alpha$ (i.e. the significance level), while the probability of making a type II error, that is, the probability of accepting $H_0$ given that $H_1$ is true, is denoted by $\beta$. A usual way out of this apparent impasse is provided by the demand for independent verification of the data.
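The $p$‐value procedure above can be sketched for the common two‐sided $z$‐test of $H_0: \mu = \mu_0$ with known $\sigma$. The data, $\mu_0$, and $\sigma$ below are illustrative; the standard normal CDF is built from `math.erf`.

```python
# Sketch of a two-sided z-test p-value for H0: mu = mu0 with known
# sigma (illustrative data).
import math

def std_normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

data = [5.4, 5.1, 5.6, 5.3, 5.5, 5.2]
mu0, sigma = 5.0, 0.4

n = len(data)
x_bar = sum(data) / n
z = (x_bar - mu0) / (sigma / math.sqrt(n))
p_value = 2.0 * (1.0 - std_normal_cdf(abs(z)))   # two-sided

alpha = 0.05
print(round(p_value, 4),
      "reject H0" if p_value <= alpha else "accept H0")
```

For this sample, $p \approx 0.03 < \alpha = 0.05$, so $H_0$ is rejected at the 5% level.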
In practice, in order to decide whether the null hypothesis can be rejected or not, the concept of the test function can be used. To this purpose, suppose that we have the null hypothesis $H_0$, and let $A$ be the acceptance set of $H_0$ on a significance level $\alpha$. Then, with the corresponding confidence set we have
so that
In other words, any set of acceptance on a significance level $\alpha$ yields a confidence set on the confidence level $1 - \alpha$. The confidence set lets us conclude, for each parameter value, whether the null hypothesis should be accepted or rejected on the significance level $\alpha$ for the measured data. Indeed, we can define a test function:
The following example borrowed from [62] illustrates very clearly the use of the test function.
Starting in the 1980s with the work of María Rosa Casals et al. [59], a rather large amount of work has been published in the field of fuzzy statistics (see Ref. [282] for a review and the references therein). More generally, statistical hypothesis testing in a fuzzy environment was taken up by Przemysław Grzegorzewski and Olgierd Hryniewicz [151]. The problem of fuzzy hypothesis testing has been linked to the notion of the $p$‐value described in Section 6.7 by Glen Meeden and Siamak Noorbaloochi [214], who, instead of determining a null hypothesis and an alternative hypothesis, reformulated the problem "as the problem of estimating the membership function of the set of good or useful or interesting parameter points." Here, we shall present a different approach, introduced by Jalal Chachi et al. [62], who, in the form of six steps, have given a constructive method connecting fuzzy hypothesis testing with confidence intervals.
Now, let us assume that both the hypothesis parameter and the confidence interval are fuzzy [61], that is, $\tilde{\theta}$ is the fuzzy parameter value (according to what we have said in Section 6.4, $\tilde{\theta}$ is considered a fuzzy perception of the crisp parameter $\theta$) and the degree of hypothesis acceptance depends on it. Then, we have for the set of the parameter values for which the tested hypothesis is accepted
while
for the set of the parameter values for which the tested hypothesis is rejected, with the fuzzy confidence interval. The hypothesis to be tested is the null hypothesis $H_0$ against the alternative $H_1$ for observational data with unknown fuzzy mean but known variance $\sigma^2$. To that purpose, we must find the degrees of acceptability for $H_0$ and $H_1$, and the "Chachi–Taheri–Viertl algorithm" is codified as follows (see Ref. [62] and the nice numerical examples therein):
At this point, we must stress that if the degree of acceptability lies strictly between zero and one, so that it is not absolutely clear what to do, then one necessarily has to make a subjective "fuzzy" decision on the acceptance or rejection of the null hypothesis. In such a case, as expected, the null hypothesis is accepted the more its degree of acceptability tends to one, and rejected the more it tends to zero. Naturally, the most difficult decision to be made is when this value equals $0.5$.
Regression analysis is a statistical methodology for the estimation of the conditionally expected value of a dependent random variable (the response variable), given one or more independent nonrandom variables (the predictor variables) and one or more unknown parameters to be estimated from the data. According to the linearity of the parameters in the model function, one can build linear or nonlinear regression models to fit the measured data. If one has $n$ pairs of data points and $n < k$, where $k$ is the length of the vector of unknown parameters, the regression model is underdetermined and cannot be specified. If $n = k$ and the function is assumed linear, then the regression equation can be solved exactly, i.e. the parameters are found by solving a square linear system, which has a unique solution provided the data points are linearly independent. Finally, if $n > k$, which is the main case in regression analysis, one can estimate values for the parameters that best fit the data; the method of least squares belongs to this case. Further, the performance of regression analysis depends on the chosen process for collecting and measuring the data and on the assumptions made in this process (for a more detailed account of regression see, e.g. [106]).
As an example, let us consider the simplest linear regression model. It involves two‐dimensional data points (given, say, in Cartesian coordinates) and contains one independent and one dependent variable. So let us assume that the model function is linear, of first order in $x$, with the regression equation
where the unknowns are $\beta_0$ and $\beta_1$, and $\varepsilon$ is a random error term denoting the deviation from the true line. Usually, the distribution of the random errors is modeled as a normal distribution with zero mean. In order to estimate the parameters, we shall apply the method of least squares, and $\hat{\beta}_0$, $\hat{\beta}_1$ will denote the values of $\beta_0$, $\beta_1$ estimated from the given data. Then, the predicted value of $y$, denoted by $\hat{y}$, is obtained as
that is, a straight‐line fit. Now, based on the $n$ pairs of data points, we can write (6.3) as
and we have
The estimated values $\hat{\beta}_0$ and $\hat{\beta}_1$ generate the least possible value of the sum of squared errors. For their calculation, we start with the derivatives
from which it follows that
respectively. From these equations, we get
respectively. Hence,
From these equations, one readily obtains the estimated parameter $\hat{\beta}_1$:
By denoting the means of the $x_i$ and $y_i$ as $\bar{x}$ and $\bar{y}$, the latter can be written in the more usual form
while the other estimated parameter is found to be
Therefore, (6.4) becomes now
Evidently, for $x = \bar{x}$ the last equation yields $\hat{y} = \bar{y}$, so $(\bar{x}, \bar{y})$ is a point of the fitted line.
The estimates of the errors are given by the residuals $e_i = y_i - \hat{y}_i$; thus, we have
from which it follows that
However, one should point out that, in practical applications, the rounding of the measured data always results in a nonzero error.
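The closed‐form estimates derived above are straightforward to compute. The sketch below evaluates $\hat{\beta}_1 = S_{xy}/S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$ for a small illustrative data set (not from the text).

```python
# Sketch of the closed-form least-squares estimates for simple
# linear regression (illustrative data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# S_xy and S_xx, the centered cross-product and sum of squares
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)

beta1 = s_xy / s_xx               # slope estimate
beta0 = y_bar - beta1 * x_bar     # intercept estimate

print(round(beta1, 3), round(beta0, 3))  # 1.99 0.05
```

As noted above, the fitted line passes through $(\bar{x}, \bar{y})$: substituting $x = \bar{x}$ gives $\hat{y} = \bar{y}$ by construction of $\hat{\beta}_0$.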
Simply put, statistical regression as presented in Section 6.9 is based on crisp random errors, while fuzzy regression is based on fuzzy errors. The difference between these two kinds of uncertainty has led to the necessity of extending statistical regression to the case of a fuzzy environment. This extension started in 1982 with the pioneering work of Hideo Tanaka et al. [283], who applied the methodology of linear programming to develop a fuzzy linear regression model. In fact, Tanaka's approach constitutes one of the two main approaches in fuzzy regression, namely the so‐called possibilistic regression analysis, which relies on the notion of possibility; the other, known as the fuzzy least squares method, aims at the minimization of the errors between the observed and the estimated data.
Returning to Tanaka's approach, it must be stressed that it can be applied only to linear functions. However, it is very simple in its computational implementation, while the fuzzy least squares method has the advantage over Tanaka's approach that it keeps the degree of fuzziness between observed and estimated results to a minimum (see, e.g. [103] and references therein for some critiques of Tanaka's model). This possibilistic regression model is based on the idea of fuzziness minimization through a minimization of the total support of the fuzzy regression coefficients [258], subject to the inclusion of all the observed data.
The basic form of the model is the general linear function
with $\tilde{Y}$ the fuzzy response, $\tilde{A}_j$ the fuzzy coefficients (or parameters), and $x_j$ the components of a nonfuzzy input vector. The $\tilde{A}_j$s are assumed to be triangular fuzzy numbers, and each fuzzy coefficient is characterized by a membership function. As an example, suppose that we have a fuzzy relationship between the variables (i.e. fuzzy‐dependent variables), while the observed data are crisp. Further, the triangular fuzzy numbers are assumed symmetric. Then, the membership function of the $j$th coefficient can be defined as
where $a_j$ is the mode (center value of $\tilde{A}_j$) and $c_j$ is the spread (width around the center value). The choice of symmetric triangular fuzzy numbers assures that the structure of the model depends only on the data involved in the determination of the upper and lower bounds, while other data points do not play any role. So we have [258]
Thus, we obtain
Now, if the support is just enough to contain all the sample's data points, then the confidence in an out‐of‐sample projection is limited unless one extends the support. To this purpose, one chooses a value $h$ (called the $h$‐certain factor), with the $h$‐line cutting the two sides of the triangular graph of the membership function at two points; the interval between their coordinates on the horizontal axis is the feasible data interval. The $h$‐certain factor extends, by controlling the size of this data interval, the support of the membership function; in fact, an increase of $h$ widens the feasible data interval.
When the observed data are fuzzy, the application of the aforementioned $h$‐certain factor is also possible. Assuming that the observed fuzzy result can be described by a symmetric triangular fuzzy number with mode (center value) $y_j$ and spread $e_j$, the actual data points belong to the interval $[y_j - e_j, y_j + e_j]$. So the optimization of the model requires the minimization of the spread (width around the center value),
Then, the observed fuzzy data that are adjusted for the ‐certain factor, are contained in the estimated fuzzy result [258]:
Thus, increasing the $h$‐certain factor extends the confidence interval and, consequently, the probability that out‐of‐sample values are covered by the model.
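The inclusion requirement of Tanaka's possibilistic model can be sketched as follows. With symmetric triangular coefficients $\tilde{A}_j$ given as (center, spread) pairs, the estimated fuzzy output for an input row has a center equal to the weighted sum of centers and a spread equal to the weighted sum of spreads (with absolute input values); at level $h$, every observed response must lie within center $\pm (1-h)\cdot$spread. The coefficients and data below are illustrative placeholders, not a solved model.

```python
# Sketch of the inclusion constraint in a Tanaka-style possibilistic
# linear model with symmetric triangular coefficients (c_j, d_j).
def fuzzy_output(x_row, centers, spreads):
    """Estimated fuzzy output (center, spread) for one input row."""
    center = sum(c * x for c, x in zip(centers, x_row))
    spread = sum(d * abs(x) for d, x in zip(spreads, x_row))
    return center, spread

def includes(y, center, spread, h=0.0):
    """Is y inside the level-h cut of the estimated fuzzy output?"""
    half = (1.0 - h) * spread
    return center - half <= y <= center + half

# rows are (1, x) to carry an intercept term; illustrative data
X = [(1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
y = [2.2, 3.9, 6.1]
centers = [0.0, 2.0]   # assumed coefficient centers
spreads = [0.3, 0.0]   # assumed coefficient spreads

for x_row, yi in zip(X, y):
    c, s = fuzzy_output(x_row, centers, spreads)
    print(yi, includes(yi, c, s, h=0.0))  # all True for this choice
```

In the full model, a linear program minimizes the total spread subject to these inclusion constraints holding for every observation.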
Let us now examine the fuzzy least‐squares regression. Remembering (6.5) from Section 6.9 one has, in a fuzzy environment,
and one has to optimize
The most commonly used approach for this is the method of distance measures, introduced by Phil M. Diamond [98]. Namely, by defining a measure of the distance between two triangular fuzzy numbers $\tilde{A}$ and $\tilde{B}$:
with the model written as
one has to optimize
Assuming , it follows for the distance from (6.6)
while a similar expression can be derived for .
A system of equations then yields the parameters of the model (see, e.g. [86] and references therein for an implementation of the above algorithm).
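Diamond's distance is simple to evaluate. The sketch below assumes the common formulation in which each triangular fuzzy number is written by its left endpoint, mode, and right endpoint, and the squared distance sums the squared differences of these three points; the numbers are illustrative.

```python
# Sketch of Diamond's squared distance between two triangular fuzzy
# numbers, each given as (left endpoint, mode, right endpoint):
# d(A, B)^2 = (a_l - b_l)^2 + (a_m - b_m)^2 + (a_r - b_r)^2.
def diamond_sq(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

A = (1.0, 2.0, 3.0)   # "about 2", spread 1
B = (1.5, 2.0, 2.5)   # "about 2", spread 0.5
print(diamond_sq(A, B))  # 0.5
```

Fuzzy least‐squares regression then minimizes the sum of such squared distances between the observed and the estimated fuzzy responses, in direct analogy with the crisp sum of squared residuals of Section 6.9.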
The following exercises are inspired by examples presented in [41].
6.1 Let the fuzzy random variables $\tilde{X}_1, \ldots, \tilde{X}_n$ be independent and identically distributed (iid) from the normal distribution $N(\mu, \sigma^2)$, where $\mu$ is the unknown mean. Suppose there is a random sample of size $n$, and by performing an experiment, we observe its value. Test the null hypothesis $H_0$ against the alternative hypothesis $H_1$ at the given significance level.
$i$ | $x_i$ | $y_i$ |
1 | 20 | 27 |
2 | 24 | 44 |
3 | 22 | 38 |
4 | 18 | 30 |
5 | 8 | 21 |
6 | 4 | 26 |
7 | 32 | 38 |
8 | 14 | 30 |
9 | 30 | 40 |
10 | 11 | 19 |
Conclude whether there is any additional information missing in order to find the fuzzy coefficients , , of the fuzzy linear regression model for this data set. Then, by assuming that the necessary missing information is known, determine the fuzzy coefficients .