6
Fuzzy Statistics

The uncertainty that characterizes decisions, and indeed the whole process of decision‐making on the basis of statistical reasoning, can be traced in many cases to the lack of precise information caused by vagueness in the data considered. Here fuzzy set theory comes to the forefront and, as one would expect, plays a prominent role. Before introducing fuzziness, we present a brief review of random variables (also known as stochastic variables) and their properties, as well as the classical statistical notions of point estimation, interval estimation, hypothesis testing, and regression. Their fuzzy analogues are then introduced.

6.1 Random Variables

A mapping from the set images of possible outcomes (sample points) of an experiment to a subset of real numbers images is a random variable. A rigorous definition of a random variable is the following:

Some remarks are in order at this point:

  1. a random variable is not a variable in the usual sense, but a function with the domain images and the range images;
  2. the random variable images may be undefined or infinite for a subset of images with zero probability;
  3. the mapping images (i.e. the sample value for a sample point images) must be such that
    equation
    for images, where images is an event for a fixed sample value images, for all images. In fact, one can similarly define the events
    equation
    for images, or
    equation
    for images, or even
    equation
    for images (see, e.g. [133, 233]).

It is possible to assign probabilities corresponding to the aforesaid events, for example,

equation

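and so on. Now, let us define the distribution function of a random variable: the cumulative distribution function $F_X$ of a random variable $X$ is

$$F_X(x) = P(X \le x), \qquad -\infty < x < \infty.$$

The cumulative distribution function has to satisfy the following properties:

  1. $0 \le F_X(x) \le 1$,
  2. $F_X(x_1) \le F_X(x_2)$, for $x_1 < x_2$,
  3. $\lim_{x \to \infty} F_X(x) = F_X(\infty) = 1$,
  4. $\lim_{x \to -\infty} F_X(x) = F_X(-\infty) = 0$,
  5. $\lim_{x \to a^{+}} F_X(x) = F_X(a^{+}) = F_X(a)$,

with the second property showing that $F_X$ is a nondecreasing function and the fifth property pointing out its continuity on the right.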

When the range of $X$ is finite or countably infinite, the random variable is discrete, and a discrete probability distribution (known as probability mass function) $p_X$ can be defined that assigns a certain probability to each value in the range of $X$, that is, $p_X(x_k) = P(X = x_k)$ for each sample value $x_k$. The probability mass function has to satisfy the following properties:

  1. $0 \le p_X(x_k) \le 1$, $k = 1, 2, \ldots$,
  2. $p_X(x) = 0$ for $x \ne x_k$,
  3. $\sum_k p_X(x_k) = 1$.

Then, the cumulative distribution function $F_X$ of a discrete random variable is given by

$$F_X(x) = P(X \le x) = \sum_{x_k \le x} p_X(x_k).$$

In the case of an uncountably infinite range, $X$ is a continuous random variable and, if its cumulative distribution function $F_X$ has a first derivative that is piecewise continuous and exists everywhere except possibly at a finite number of points, then a probability density function can be defined:

$$f_X(x) = \frac{dF_X(x)}{dx},$$

which can be integrated in order to find the probability. The probability density function has to satisfy the following properties:

  1. images,
  2. images,
  3. images.

Then, the cumulative distribution function $F_X$ of a continuous random variable $X$ is given by

$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(\xi)\, d\xi.$$

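Now, one can determine the expectation value (or mean) $\mu_X = E(X)$ of a random variable $X$:

$$\mu_X = E(X) = \begin{cases} \sum_k x_k\, p_X(x_k), & X \text{ discrete}, \\[4pt] \int_{-\infty}^{\infty} x\, f_X(x)\, dx, & X \text{ continuous}. \end{cases}$$

Based upon the above relations, the $n$th moment $E(X^n)$ of a random variable $X$ can be introduced:

$$E(X^n) = \begin{cases} \sum_k x_k^n\, p_X(x_k), & X \text{ discrete}, \\[4pt] \int_{-\infty}^{\infty} x^n f_X(x)\, dx, & X \text{ continuous}. \end{cases}$$

Clearly, the first ($n = 1$) moment of $X$ is its expectation value $E(X)$.

Next, the concepts of variance and standard deviation of a random variable $X$ can be defined:

$$\sigma_X^2 = \operatorname{Var}(X) = E\{[X - E(X)]^2\},$$

which for a discrete random variable becomes

$$\sigma_X^2 = \sum_k (x_k - \mu_X)^2\, p_X(x_k),$$

while for a continuous random variable, one obtains

$$\sigma_X^2 = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\, dx.$$

In fact, by expanding the expression for $\sigma_X^2$, one gets the useful formula for the variance of a random variable $X$:

$$\operatorname{Var}(X) = E(X^2) - [E(X)]^2.$$

Finally, the positive square root of $\operatorname{Var}(X)$ yields the standard deviation, $\sigma_X$, of a random variable $X$.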
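The quantities reviewed so far can be checked numerically for a discrete random variable. The following is only a minimal sketch: the function names and the fair-die example are illustrative assumptions, not taken from the text. It evaluates the cumulative distribution function, the moments, and the variance both from the definition and from the expansion $E(X^2) - [E(X)]^2$.

```python
from math import isclose, sqrt

def check_pmf(pmf):
    """Raise ValueError unless pmf (dict: value -> probability) is a valid pmf."""
    if any(p < 0 or p > 1 for p in pmf.values()):
        raise ValueError("each probability must lie in [0, 1]")
    if not isclose(sum(pmf.values()), 1.0):
        raise ValueError("probabilities must sum to 1")

def cdf(pmf, x):
    """F(x) = P(X <= x): sum of the masses at points not exceeding x."""
    return sum(p for xk, p in pmf.items() if xk <= x)

def moment(pmf, n):
    """n-th moment E(X^n) of a discrete random variable."""
    return sum(xk ** n * p for xk, p in pmf.items())

def variance(pmf):
    """Variance computed from the definition E[(X - E(X))^2]."""
    mu = moment(pmf, 1)
    return sum((xk - mu) ** 2 * p for xk, p in pmf.items())

# Hypothetical example: a fair six-sided die.
die = {k: 1 / 6 for k in range(1, 7)}
check_pmf(die)
mean = moment(die, 1)
var_definition = variance(die)
var_shortcut = moment(die, 2) - mean ** 2   # E(X^2) - [E(X)]^2
std = sqrt(var_definition)
```

Both routes to the variance agree up to rounding, which is the point of the expanded formula: it needs only the first two moments.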

There are many important distributions for random variables, most notably the binomial distribution, the Poisson distribution, and the normal (Gaussian) distribution (see, e.g. [233]). At this point, one last remark concerning the so‐called conditional distributions is deemed necessary. Following the definition of the conditional probability of an event images given event images:

Again, one can distinguish between a discrete and a continuous random variable. Thus, for a discrete random variable, the above equation yields for the conditional probability mass function

equation

while in the case of a continuous random variable, we have for the conditional probability density function

equation

6.2 Fuzzy Random Variables

In Section 6.1, we reviewed classical random variables and their basic properties. In order to handle fuzzy data or observations, one can introduce fuzzy‐valued random variables to capture vagueness (see, e.g. [78] for a short review); in other words, randomness and vagueness are now allowed to appear simultaneously. In the literature, one can find different approaches to the notion of a fuzzy random variable, most notably the (mathematically equivalent) definitions introduced by Erich Peter Klement et al. [176] and Madan L. Puri and Dan A. Ralescu [245], Huibert Kwakernaak [186, 187], and Rudolf Kruse and Klaus Dieter Meyer [183]. In what follows, we will adopt the Kruse–Meyer approach in which a fuzzy random variable is studied as a fuzzy perception/observation of a classical real‐valued random variable. This approach is actually a combination of the other authors' considerations. However, we stress that the notions of expected value, variance, and distribution function for fuzzy random variables are introduced here only briefly; the reader is referred to [183] for a detailed presentation.

First, let us see what is meant by perception/observation in this context. Assume a measurable space images with images denoting the set of all possible outcomes of a random experiment, images a σ‐algebra of subsets of images1 and images the Borel‐measurable space (see footnote in page 241). Further, let the probability space (images) with images a set function defining a probability measure on the space images. Suppose that the results of the random experiment are described by images which assigns a random value images to each random choice. Then, images is a random variable.

The perception/observation of the aforementioned random variable means that for each images, we can investigate whether images for some images, where images. Now, if we examine another mapping, say images with images, then we associate with each images not a real number images as in the case of ordinary random variables, but a set images called random set (see, e.g. [210] for a detailed study of random sets).2 The random variable images of which images is a perception is called an original of images. In general, given a random set images, the corresponding true original images is not known, we only have a possible set of originals. If no further information is available, then each random variable images with images, for all images, is a possible candidate for being the original.

In other words, a fuzzy random variable is a (fuzzy) perception of an unknown random variable images, with images being a possible original of images.

Now, an interesting theorem (see Ref. [183] for a proof) describing the behavior of fuzzy random variables is the following:

Now, we come to the definition of the moment of a fuzzy random variable. Let the fuzzy random variable images, and images. Then, the images ‐th moment of images with respect to the original images is defined as

In practice, to compute the expected value for a series of results given by random experimental results that are in the class of fuzzy sets images, images, one proceeds as follows:

Given a finite probability space images, the corresponding probabilities images, images for the imagess, respectively, and the fuzzy random variable images, then the expected value is calculated as

equation

and the fuzzy number images is the fuzzy expectation value of images if images, for all images.
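As a rough illustration of this computation, the sketch below makes two assumptions that are not stated in the text: that the fuzzy outcomes are triangular fuzzy numbers written as triples (l, m, r), and that the expected value is the probability-weighted sum formed with the extended addition and scalar multiplication. For triangular numbers with nonnegative weights this reduces to a componentwise weighted sum, and the result can be cross-checked level by level on the α-cuts. All names and data are hypothetical.

```python
from math import isclose

def alpha_cut(tri, alpha):
    """Closed interval [lower, upper] of a triangular fuzzy number (l, m, r) at level alpha."""
    l, m, r = tri
    return (l + alpha * (m - l), r - alpha * (r - m))

def fuzzy_expectation(values, probs):
    """Probability-weighted sum of triangular fuzzy numbers (assumed form of the expectation)."""
    if len(values) != len(probs) or not isclose(sum(probs), 1.0):
        raise ValueError("need one probability per outcome, summing to 1")
    return tuple(sum(p * v[i] for v, p in zip(values, probs)) for i in range(3))

def levelwise_expectation(values, probs, alpha):
    """Expectation of the lower and upper end points of the alpha-cuts."""
    cuts = [alpha_cut(v, alpha) for v in values]
    lower = sum(p * c[0] for c, p in zip(cuts, probs))
    upper = sum(p * c[1] for c, p in zip(cuts, probs))
    return (lower, upper)

# Hypothetical fuzzy outcomes on a three-point probability space.
observations = [(1.0, 2.0, 3.0), (2.0, 4.0, 6.0), (0.0, 1.0, 2.0)]
probabilities = [0.5, 0.3, 0.2]
expected = fuzzy_expectation(observations, probabilities)
```

The α-cut of the resulting triangular number coincides with the interval obtained by averaging the α-cut end points of the individual outcomes, which is the level-wise condition stated in the text.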

Next, we consider the concept of variance of a fuzzy random variable with respect to images, whereby we will use the notion of the moment of a fuzzy random variable.

Finally, we come to the definition of the distribution function of a fuzzy random variable. First, we need the notion of a normal set representation of a fuzzy set.

With the help of this definition, we come to the notion of the distribution function:

Finally, two fuzzy random variables images and images are identically distributed if images, images and images, images are identically distributed for all images, while images and images are independent if each variable from the set

equation

is independent from each variable from the set

equation

(see Ref. [183]). Furthermore, images is a normal (or Gaussian) fuzzy random sample of size images if all the images, images, are independent and identically distributed (iid) normal fuzzy random variables, whereby images is a normal fuzzy random variable when images with images denoting the extended operation of addition, and images is a normal (not fuzzy) random variable with zero mean and variance images, so that images [122].

6.3 Point Estimation

Classical statistical analysis is based on random variables, point estimations, statistical hypotheses, and so on. The first major question encountered in statistical inference concerns the point estimation of one or more unknown parameters. Just to give an idea, a point estimator estimates a parameter by giving a specific numerical value. Thus, for example, the best point estimate of the population mean $\mu$ is the calculated sample mean $\bar{x}$. So the general question becomes: how can we choose an estimator, based on a sample of fixed size taken from a random variable whose probability density function contains one or more unknown parameters, in order to obtain the best estimate of those parameters?

Let us formulate the classical problem as it is encountered in statistical inference theory and, in fact, best considered as a problem of decision theory: Let a random variable images with a probability density function images, where images, images are unknown parameters. Then, given the value images of a random sample images of size images from the population images, we ask, based on this value, for the estimation (“best guess”) of the parameters images. If the estimation of the parameters is given as a single value, then we speak of a point estimation, otherwise, we refer to an interval estimation. In this sense, an interval estimation of a parameter gives an estimation of the parameter as an interval or a range of values.

In the present section, we shall examine the point estimation. Let images be a point estimation of the parameter images, images. In fact, this estimation is but a decision images, so that images. This decision depends on the parameter images, and it is a function of the value images of the random sample images, so images. Further, images is a value of the function images. The latter is called the estimator function (or decision function) or simply estimator of the parameter images and, indeed, the process of finding the point estimation of the parameters images amounts to finding the estimators images of these parameters. In fact, the finding of the estimator images depends on the properties of this function and, usually, the determination of these properties leads to the way of finding the estimator. In this sense, we can seek for various kinds of point estimators, such as the sufficient, the unbiased, the consistent, the efficient, or the maximum likelihood estimator. For more details see Ref. [161] or [191] where a more advanced approach to point estimators is provided.

In what follows, we briefly present three very commonly used classical point estimators: the unbiased estimator, the consistent estimator, and the maximum likelihood estimator.

6.3.1 The Unbiased Estimator

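Suppose we have a random sample $X_1, \ldots, X_n$ from a population $X$ and the point estimator $\hat{\theta} = \hat{\theta}(X_1, \ldots, X_n)$ of a parameter $\theta$. Then, the following theorem holds: the mean squared error of $\hat{\theta}$ decomposes into its variance and its squared bias,

$$\operatorname{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = \operatorname{Var}(\hat{\theta}) + [b(\hat{\theta})]^2, \qquad b(\hat{\theta}) = E(\hat{\theta}) - \theta.$$

Consequently, if it is possible to find an estimator $\hat{\theta}$ of the parameter $\theta$ that has very small bias $b(\hat{\theta})$ and variance $\operatorname{Var}(\hat{\theta})$, then the mean squared error $\operatorname{MSE}(\hat{\theta})$ will be correspondingly small. Naturally, one desires a zero bias, $b(\hat{\theta}) = 0$, or $E(\hat{\theta}) = \theta$. So we come to the following definition: an estimator $\hat{\theta}$ of the parameter $\theta$ is called unbiased if $E(\hat{\theta}) = \theta$.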
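To make the roles of bias, variance, and mean squared error concrete, the following sketch enumerates every possible sample from a small finite population, so that all expectations are exact rather than simulated. The population (a fair coin coded as 0 and 1), the sample size, and the helper names are assumptions chosen for illustration. The two estimators compared are the sums of squared deviations divided by $n$ and by $n - 1$.

```python
from itertools import product
from math import isclose, prod

def expectation(statistic, support, probs, n):
    """Exact E[statistic(X_1, ..., X_n)] for iid draws from a finite population."""
    total = 0.0
    for idx in product(range(len(support)), repeat=n):
        weight = prod(probs[i] for i in idx)          # probability of this sample
        total += weight * statistic([support[i] for i in idx])
    return total

def variance_estimator(divisor_offset):
    """Sum of squared deviations divided by (n - divisor_offset)."""
    def estimator(xs):
        n = len(xs)
        xbar = sum(xs) / n
        return sum((x - xbar) ** 2 for x in xs) / (n - divisor_offset)
    return estimator

def bias_variance_mse(estimator, theta, support, probs, n):
    """Return (bias, variance, mean squared error) of the estimator of theta."""
    mean = expectation(estimator, support, probs, n)
    second = expectation(lambda xs: estimator(xs) ** 2, support, probs, n)
    mse = expectation(lambda xs: (estimator(xs) - theta) ** 2, support, probs, n)
    return mean - theta, second - mean ** 2, mse

# Hypothetical population: fair coin, true variance theta = 0.25, samples of size 3.
support, probs, n = [0, 1], [0.5, 0.5], 3
theta = 0.25
biased = bias_variance_mse(variance_estimator(0), theta, support, probs, n)
unbiased = bias_variance_mse(variance_estimator(1), theta, support, probs, n)
```

In this example the divisor $n$ gives a negative bias, the divisor $n - 1$ gives zero bias, and in both cases the mean squared error equals the variance plus the squared bias.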

6.3.2 The Consistent Estimator

Again, suppose that we have a random sample images from a population images and let an estimator of images be images, whereby the index images denotes the size of the sample. Intuitively speaking, a good estimator is one for which the so‐called risk function images decreases with increasing images. Let us briefly introduce the notion of the aforementioned risk function.

We know from decision theory that the estimators images (i.e. the decision rules) of the parameters images for the values images of the random sample images establish a mapping of the sample space to the decision space. The wrong choice of the estimators produces a loss or cost, expressing the difference between estimated and true values. This loss is quantified by the so‐called loss function images and its expected value is the aforementioned risk function:

equation

Hence, good estimators are those which minimize the risk function.

So, suppose we have a sequence of estimators images with images for the parameter images, generated by images for images. Then, we demand

equation

and if the risk function is the mean squared error images, then we have the following theorem:

Now, having said all that, we finally come to the following definition:

As an example, one can readily show that for a random sample images from a population with the normal distribution images, images is a consistent estimator of the population variance images.
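Consistency can be observed empirically. The sketch below is a simulation under stated assumptions, not a proof: it assumes that the estimator in question is the usual sample variance with divisor $n - 1$, draws repeated samples from a standard normal population, and estimates the mean squared error for increasing sample sizes. The seed, the number of repetitions, and the sample sizes are arbitrary choices.

```python
import random

def sample_variance(xs):
    """Sample variance with divisor n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def empirical_mse(n, rng, sigma=1.0, repetitions=200):
    """Monte Carlo estimate of E[(S^2 - sigma^2)^2] for samples of size n."""
    total = 0.0
    for _ in range(repetitions):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        total += (sample_variance(sample) - sigma ** 2) ** 2
    return total / repetitions

rng = random.Random(12345)          # fixed seed so the run is reproducible
mse_by_n = {n: empirical_mse(n, rng) for n in (10, 100, 1000)}
```

The estimated risk shrinks by roughly an order of magnitude each time the sample size grows tenfold, which is the behavior demanded of a consistent estimator when the risk function is the mean squared error.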

6.3.3 The Maximum Likelihood Estimator

First, we shall present the definition of the likelihood function:

Evidently, when $X_1, \ldots, X_n$ is a random sample from the population $f(x; \theta)$, then the likelihood function of this sample is

$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta).$$

Now, we can define the maximum likelihood estimator as follows:

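When the likelihood function contains only one parameter $\theta$ and is differentiable w.r.t. $\theta$, then the maximum likelihood estimator $\hat{\theta}$ of $\theta$ is the solution of the equation

$$\frac{dL(\theta)}{d\theta} = 0.$$

In fact, often the equation

$$\frac{d \ln L(\theta)}{d\theta} = 0$$

is used in applications, since $L(\theta)$ and $\ln L(\theta)$ are maximized for the same values of $\theta$.

In the case where the likelihood function $L(\theta_1, \ldots, \theta_k)$ depends on more than one parameter $\theta_1, \ldots, \theta_k$, the maximum likelihood estimators $\hat{\theta}_1, \ldots, \hat{\theta}_k$ are found as the solution of the system of equations

$$\frac{\partial L(\theta_1, \ldots, \theta_k)}{\partial \theta_j} = 0, \qquad j = 1, \ldots, k,$$

or

$$\frac{\partial \ln L(\theta_1, \ldots, \theta_k)}{\partial \theta_j} = 0, \qquad j = 1, \ldots, k.$$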
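As a worked illustration of the log‐likelihood equation, consider (as an assumption made only for this example) an exponential population with density $f(x; \lambda) = \lambda e^{-\lambda x}$. Then $\ln L(\lambda) = n \ln \lambda - \lambda \sum_i x_i$, and setting the derivative to zero gives $\hat{\lambda} = n / \sum_i x_i$. The sketch below checks this closed form numerically on made-up data.

```python
from math import isclose, log

def log_likelihood(lam, data):
    """ln L(lambda) for an exponential sample: n ln(lambda) - lambda * sum(x_i)."""
    return len(data) * log(lam) - lam * sum(data)

def mle_rate(data):
    """Closed-form solution of d ln L / d lambda = 0."""
    return len(data) / sum(data)

def score(lam, data, h=1e-6):
    """Central-difference approximation of d ln L / d lambda."""
    return (log_likelihood(lam + h, data) - log_likelihood(lam - h, data)) / (2 * h)

# Hypothetical observations (illustrative values only).
data = [0.5, 1.2, 0.3, 2.0, 1.0]
lam_hat = mle_rate(data)
```

At `lam_hat` the numerical derivative of the log‐likelihood vanishes up to rounding, and the log‐likelihood is lower at neighboring values on either side, as expected at a maximum.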

6.4 Fuzzy Point Estimation

We start with the problem of fuzzy point estimation that generalizes what has been presented in Section 6.3. First of all, we should point out that often fuzzy point estimation is considered as more basic than fuzzy interval estimation, since the latter can be determined by the point estimations of the lower and upper boundaries of the interval. In fact, if the point estimation is characterized by very low confidence, it can be relaxed to an interval estimation (for an application see, e.g. [7]).

Now, let us adopt the more formal approach to fuzzy point estimation and suppose we have a fuzzy random sample, that is, a fuzzy random vector, images. A fuzzy parameter for our fuzzy point estimation is a perception of the unknown parameter we seek.

So we come to the following definition of the fuzzy point estimation:

As we have done with classical point estimation, we can distinguish between different kinds of fuzzy point estimators.

Similarly, when we desire to obtain a parameter estimation by using a sequence of fuzzy random variables, then we have the following:

The maximum likelihood estimator presented in Section 6.3.3 is attractive because of its simplicity and consequent usefulness in statistical inference; its implementation in the realm of fuzzy data, however, is fairly complex in most practical situations. Indeed, one “classical” approach is to consider a maximum likelihood estimation of a desired parameter as the crisp value that maximizes the probability of observing the fuzzy data (see, e.g. [59]). We shall not go into details; we point out, however, that more efficient ways to handle the aforementioned difficulty have been proposed, and the reader is referred, for instance, to the so‐called “expectation–maximization algorithm” (see Ref. [95] and references therein).

A final comment worth noting concerns the interesting and different approach to the problem of fuzzy point estimation presented in [3], which introduces the methods of fuzzy uniformly minimum variance unbiased estimation and fuzzy Bayesian estimation, both based on the notions of the Yao‐Wu signed distance and the images‐metric. When the fuzzy random variables become crisp random variables, these methods reduce to the classical uniformly minimum variance unbiased estimation and Bayesian estimation.

6.5 Interval Estimation

In classical statistics, interval estimation is used to determine an interval of plausible values for an unknown population parameter. In other words, the unknown parameter (notably, but not only, the mean or the variance of the population) is estimated as an interval, that is, as an entire range of numerical values within which it is estimated to lie. One of the most prevalent forms of interval estimation is the frequentist approach known as the confidence interval (see, e.g. [215]), which was introduced by the Polish mathematician Jerzy Neyman [227] in 1937.

Let a random sample images of size images with the value images. This value consists in measured data characterized, in general, by uncertainties expressed as errors that have been neglected in the previously presented process of finding a point estimator images. So, in order to increase the reliability of the estimate, one ought to take into account these errors and give the point estimator images in some interval, say, images, images. Let us see how this can be realized in our case. Suppose that our random sample has a continuous distribution images with an unknown mean images and a known variance images. Now, it is a well‐known fact from classical statistics that if images is a random variable with the distribution images, then for the random variable defined as

equation

where images, we can always determine the distribution images, where images. For large samples, images is the standard normal distribution (i.e. the Gaussian distribution with images and images), as it is inferred by the Central Limit Theorem (see Ref. [161]).

Now, for two arbitrary values images and images of images, we have

equation

or

equation

Here, images denotes the confidence level (also known as significance level).3 The interval defined by the inequality

equation

is called confidence interval of the parameter images with the probability images. The length of this interval is the confidence length or width:

equation

The confidence length depends mainly on the size of the sample but also on its variability as well as on the confidence level. One always desires the confidence length to be the least possible. Obviously, this can be achieved by minimizing the difference images. In fact, for a constant images, it can be rather easily shown that the difference images gets its minimum value when images.
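For the case discussed above (unknown mean, known variance), a symmetric confidence interval can be sketched as follows. The notation is an assumption made for this example: $\sigma$ is the known standard deviation, $\alpha$ the significance level, and the two quantiles are taken symmetric about zero, which is the choice that minimizes the width. The data are made up; `NormalDist` and `fmean` come from the Python standard library.

```python
from math import isclose, sqrt
from statistics import NormalDist, fmean

def z_confidence_interval(sample, sigma, alpha=0.05):
    """Symmetric (1 - alpha) confidence interval for the mean, sigma known."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 quantile of N(0, 1)
    half_width = z * sigma / sqrt(n)
    xbar = fmean(sample)
    return xbar - half_width, xbar + half_width

# Hypothetical measurements with an assumed known standard deviation of 0.3.
sample = [9.8, 10.2, 10.1, 9.9, 10.0, 10.3, 9.7, 10.0, 10.0]
lower, upper = z_confidence_interval(sample, sigma=0.3, alpha=0.05)
width_95 = upper - lower
lo90, hi90 = z_confidence_interval(sample, sigma=0.3, alpha=0.10)
width_90 = hi90 - lo90
```

The width equals $2 z \sigma / \sqrt{n}$, so it shrinks with a larger sample, a smaller variability, or a lower confidence level, in line with the remarks above.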

6.6 Interval Estimation for Fuzzy Data

The study of fuzzy interval (or confidence) estimation started mainly in the 1980s with the work of Roger A. McCain [212] and began to grow rapidly after 2000, producing a plethora of results (see, e.g. [169] for more information on the historical development and on various approaches to the problem). Here, we have chosen to present a rather practical method proposed by Norberto Corral and Maria Ángeles Gil [79] for the construction of an interval estimation of an unknown parameter images from given sample fuzzy information. One has to stress that, although the method yields rather crude confidence intervals (the probability of the parameter being within the estimated interval may be larger than the confidence level), it has the great advantage of being always applicable, that is, for any membership function and any class of distribution functions.

Suppose that we have a random experiment images with probability space images, where images denotes the set of all possible outcomes of a random experiment, images being a σ‐algebra of subsets of images, and images defining a probability measure. Let images be the parameter whose value lies in an interval of the experiment. Further, suppose that the set of all fuzzy observations defines our fuzzy information. Let us first recall some basic definitions:

It is assumed that the increased sampling from images will not yield a precise observation but a sample fuzzy information:

Now, based on Definition 6.6.3, we can proceed to a formal definition of the confidence interval:

Suppose that we have a sample fuzzy information images with

equation

its support and images the size of the sample. Then, the following theorem provides a way to determine a images‐confidence interval for the parameter images:

(For a proof of this theorem, see Ref. [79]).

6.7 Hypothesis Testing

Very often, samples of measurement data can be interpreted by assuming a priori a structure (i.e. a specific distribution) of the measurement results and then applying certain statistical tests to assess whether the initial assumption (hypothesis) is supported by the data or not. In other words, a statistical hypothesis is an assumption about a population parameter, and testing it is an important part of empirical evidence‐based research (see, e.g. [192] for a general overview).

The procedure starts with the examination of a random sample of the population considered. First, it is assumed either that the sample data are the result of pure chance (null hypothesis, denoted as $H_0$) or that they are sufficiently affected by some nonrandom influence (alternative hypothesis, denoted as $H_1$). Then, an appropriate statistical test is chosen in order to assess the truth of the null hypothesis. In the next step, one determines the probability that the given data would occur when $H_0$ is assumed. This probability is called the $p$‐value (or $p$‐level), and it is used to interpret the result obtained. The smaller the $p$‐value, the stronger is the evidence against $H_0$. In other words, the $p$‐value is a measure of how likely the observed data would be if $H_0$ were true. Finally, one compares the calculated $p$‐value with the selected significance level $\alpha$ (see Section 6.5). If $p \le \alpha$, then the observed influence is statistically significant, $H_0$ is rejected, and $H_1$ is accepted. If $p > \alpha$, then $H_0$ is not rejected.

Now, concerning the magnitude of the $p$‐value, one of two types of error may appear. When the $p$‐value is small, there is a possibility that $H_0$ is true, but an unlikely event has been measured (type I error or false positive) and we have incorrectly rejected $H_0$, while if the $p$‐value is large, there is a possibility that $H_0$ is false, but an unlikely event has been measured (type II error or false negative), and we have incorrectly accepted $H_0$. The probability of making a type I error, that is, the probability of rejecting $H_0$, given that it is true, is $\alpha$ (i.e. the significance level), while the probability of making a type II error, that is, the probability of accepting $H_0$, given that $H_1$ is true, is denoted by $\beta$. A usual way out of this apparent impasse is provided by the demand for independent verification of the data.
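The comparison of a $p$‐value with a significance level can be sketched for one concrete test. The assumptions here go beyond the text: a two-sided test of a hypothesized mean $\mu_0$ for normally distributed data with known standard deviation $\sigma$, and the rule "reject when the $p$‐value does not exceed $\alpha$". Function and variable names, as well as the data, are illustrative.

```python
from math import isclose, sqrt
from statistics import NormalDist, fmean

def z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: mean == mu0 with known sigma.

    Returns the test statistic, the p-value, and whether H0 is rejected.
    """
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # probability of a result at least this extreme under H0
    return z, p_value, p_value <= alpha

# Hypothetical measurements with an assumed known standard deviation of 0.3.
sample = [9.8, 10.2, 10.1, 9.9, 10.0, 10.3, 9.7, 10.0, 10.0]
z_far, p_far, reject_far = z_test(sample, mu0=10.3, sigma=0.3)
z_near, p_near, reject_near = z_test(sample, mu0=10.1, sigma=0.3)
```

With the first hypothesized mean the sample mean lies three standard errors away and the null hypothesis is rejected at the 5% level; with the second it lies one standard error away and the null hypothesis is not rejected.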

In practice, in order to find when the null hypothesis can be rejected or not, the concept of the test function can be used. To this purpose, suppose that we have the null hypothesis images, and let images be the acceptance set of images on a images significance level. Then, with the set images we have

equation

so that

equation

In other words, any set of acceptance on a images significance level yields a confidence set images on the confidence level images. The confidence set lets us conclude, for each images, whether the null hypothesis images should be accepted or rejected on the images significance level for the measured images. Indeed, we can define a test function images:

The following example borrowed from [62] illustrates very clearly the use of the test function.

6.8 Fuzzy Hypothesis Testing

Starting in the 1980s with the work of María Rosa Casals et al. [59], a rather large amount of work has been published in the field of fuzzy statistics (see Ref. [282] for a review and references therein). More generally, statistical hypothesis testing in a fuzzy environment was taken up by Przemysław Grzegorzewski and Olgierd Hryniewicz [151]. The problem of fuzzy hypothesis testing has been linked to the notion of the images‐value described in Section 6.7 by Glen Meeden and Siamak Noorbaloochi [214], who instead of determining a null hypothesis and an alternative hypothesis have given a reformulation of the problem “as the problem of estimating the membership function of the set of good or useful or interesting parameter points.” Here, we shall present a different approach introduced by Jalal Chachi et al. [62], who alternatively, in the form of six steps, have given a constructive method to connect fuzzy hypothesis testing with confidence intervals.

Now, let us assume that both the hypothesis parameter images and the confidence interval are fuzzy [61]; that is, images and images are, respectively, the fuzzy parameter value (as discussed in Section 6.4, images is considered as a fuzzy perception of images) and the degree of hypothesis acceptance that depends on images. Then, for the set of the parameter values for which the tested hypothesis is accepted, we have

equation

while

equation

for the set of the parameter values for which the tested hypothesis is rejected, with images the fuzzy confidence interval. The hypothesis to be tested is the null hypothesis images against the alternative images for observational data with unknown fuzzy mean images but known variance images, so that images. To that purpose, we must find the degrees of acceptability for images and images and the “Chachi–Taheri–Viertl algorithm” is codified as follows (see Ref. [62] and the nice numerical examples therein):

  1. Convert the hypothesis to be tested to a set of crisp problems on (for images) the fuzzy parameter. Then, for each images‐level for the samples images and images, one must solve at the confidence level images the classical hypothesis testing problems where the fuzzy parameters are images, images.
  2. Determine the images confidence intervals images and images for the crisp parameters images and images, respectively, for each images.
  3. Test the hypotheses (6.1) and (6.2) through the examination of the images confidence intervals images and images to see whether they contain images and images, respectively. Here, the corresponding test functions are
    equation
    and
    equation
  4. Gather the results of the third step to get a fuzzy confidence interval in order to proceed to the construction of a fuzzy test on the basis of the membership degree of each fuzzy parameter images in the fuzzy confidence interval. To this purpose, it is necessary to group the images‐values for which the null hypotheses (6.1) and (6.2) are accepted or rejected. This grouping can be performed by making a graph of images vs. the images confidence intervals so that every confidence interval has images as its height, and then determining confidence bounds from their intersection (obviously, images is the maximal height). Then, the membership function of the fuzzy parameter images is compared with the confidence bound obtained. From this comparison, the images‐values for which the null hypotheses are accepted or rejected are found.
  5. Construct the fuzzy set images by applying the method given in [61]. This set is a fuzzy confidence interval for the fuzzy parameter images.
  6. Construct the fuzzy test function
    equation
    for the fuzzy random sample images. Evidently, the above fuzzy test function images is described by a fuzzy set leading to the acceptance (with degree of acceptance images) of the tested null hypothesis, or to its rejection with degree images.

At this point, we must stress that if images is between zero and one, so that it is not absolutely clear what to do, then one necessarily has to make a subjective “fuzzy” decision on the acceptance or rejection of the null hypothesis. In such a case, as expected, the null hypothesis is accepted more readily the closer the values of images are to one, and rejected more readily the closer they are to zero. Naturally, the most difficult decision to be made is when the value of images is images.

6.9 Statistical Regression

Regression analysis is a statistical methodology for the estimation of the conditionally expected value of a dependent random variable $Y$ (the response variable) given one or more independent nonrandom variables $X$ (the predictor variables) and one or more unknown parameters $\beta$ to be estimated from the data; in other words, we have $E(Y \mid X) = f(X, \beta)$. Depending on whether the function $f$ is linear in the parameters $\beta$, one can build linear or nonlinear regression models to fit the measured data. If one has $N$ pairs of data points $(Y_i, X_i)$, $i = 1, \ldots, N$, and $N < k$, where $k$ is the length of the vector of the unknown parameters $\beta$, the regression model is underdetermined and $\beta$ cannot be specified. If $N = k$ and the function $f$ is assumed linear, then the regression equation $Y = f(X, \beta)$ can be solved exactly, i.e. to find $\beta$ one can solve a $k \times k$ square system, which has a unique solution provided the $X_i$s are linearly independent. Finally, if $N > k$, which is the main case in regression analysis, one can estimate values for the $\beta$s that best fit the data. The method of least squares belongs to the latter case. Further, the performance of regression analysis depends on the chosen process for collecting and measuring the data and the assumptions made in this process (for a more detailed account of regression see, e.g. [106]).

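As an example, let us consider the simplest linear regression model. It involves two‐dimensional data points (given, say, in Cartesian coordinates) and contains one independent and one dependent variable. So let us assume that the model function is linear in $\beta$ and of first order in $x$ with the regression equation

$$y = \beta_0 + \beta_1 x + \varepsilon, \tag{6.3}$$

where $\beta_0$, $\beta_1$, and $\varepsilon$ are unknown, and $\varepsilon$ is a random error term denoting the deviation from the true line. Usually, the distribution of the random errors is modeled as a normal distribution with zero mean. In order to estimate the parameters, we shall apply the method of least squares, and $\hat{\beta}_0$, $\hat{\beta}_1$ will denote the values of $\beta_0$, $\beta_1$ estimated from the given data. Then, the predicted value of $y$, denoted by $\hat{y}$, is obtained as

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x, \tag{6.4}$$

that is, a straight‐line fit. Now, based on the $n$ pairs of data points, we can write (6.3) as

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \ldots, n,$$

and we have

$$S = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2.$$

The estimated values $\hat{\beta}_0$ and $\hat{\beta}_1$ generate the least possible value of $S$. For their calculation, we start with the derivatives

$$\frac{\partial S}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i), \qquad \frac{\partial S}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i),$$

from which it follows, by setting them equal to zero at the estimates, that

$$\sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0, \qquad \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0,$$

respectively. From these equations, we get

$$\sum_{i=1}^{n} y_i - n \hat{\beta}_0 - \hat{\beta}_1 \sum_{i=1}^{n} x_i = 0, \qquad \sum_{i=1}^{n} x_i y_i - \hat{\beta}_0 \sum_{i=1}^{n} x_i - \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = 0,$$

respectively. Hence,

$$n \hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i, \qquad \hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i.$$

From these equations, one readily obtains the estimated parameter $\hat{\beta}_1$:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right)}{\sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2}.$$

By denoting the means as $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$, the latter can be written in the more usual form

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},$$

while the other estimated parameter is found to be

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$$

Therefore, (6.4) becomes now

$$\hat{y} = \bar{y} + \hat{\beta}_1 (x - \bar{x}).$$

Evidently, for $x = \bar{x}$ the last equation yields $\hat{y} = \bar{y}$, so $(\bar{x}, \bar{y})$ is a point of the fitted line.

The estimates of the errors $\varepsilon_i$ are given by $e_i = y_i - \hat{y}_i$, thus, we have

$$e_i = y_i - \hat{y}_i = (y_i - \bar{y}) - \hat{\beta}_1 (x_i - \bar{x}),$$

from which it follows that

$$\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - \bar{y}) - \hat{\beta}_1 \sum_{i=1}^{n} (x_i - \bar{x}) = 0.$$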

However, one should point out that, in practical applications, the rounding of the measured data always results in a nonzero error.
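The least-squares estimates derived in this section can be checked numerically with a minimal sketch. It uses the ten crisp data pairs listed in Exercise 6.5 purely as sample input; the function name `least_squares_line` and the variable names are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) of the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-deviations over sum of squared x-deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    # Intercept: the fitted line passes through the point of means.
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Crisp data pairs of Exercise 6.5, used here only as sample input.
xs = [20, 24, 22, 18, 8, 4, 32, 14, 30, 11]
ys = [27, 44, 38, 30, 21, 26, 38, 30, 40, 19]

b0, b1 = least_squares_line(xs, ys)
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print("b0 =", b0, "b1 =", b1)
print("fitted value at the mean of x:", b0 + b1 * x_bar, "mean of y:", y_bar)
print("sum of residuals:", sum(residuals))
```

In floating-point arithmetic the printed sum of residuals is typically a very small number rather than an exact zero, which is the numerical counterpart of the rounding remark above.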

6.10 Fuzzy Regression

Simply put, statistical regression as presented in Section 6.9 is based on crisp random errors, while fuzzy regression is based on fuzzy errors. The difference between these two kinds of uncertainty has made it necessary to extend statistical regression to a fuzzy environment. This extension started in 1982 with the pioneering work of Hideo Tanaka et al. [283], who applied the methodology of linear programming to develop a fuzzy linear regression model. In fact, Tanaka's approach constitutes one of the two main approaches in fuzzy regression: the so‐called possibilistic regression analysis, which relies on the notion of possibility, and the fuzzy least squares method, which aims at minimizing the errors between the observed and the estimated data.

Returning to Tanaka's approach, it must be stressed that it can be applied only to linear functions. It is, however, very simple to implement computationally, whereas the fuzzy least squares method has the advantage that it keeps the degree of fuzziness between observed and estimated results to a minimum (see, e.g. [103] and references therein for some critiques of Tanaka's model). The possibilistic regression model is based on the idea of minimizing fuzziness by minimizing the total support of the fuzzy regression coefficients [258], subject to the inclusion of all the observed data.

The basic form of the model is the general linear function

$$\tilde{Y} = \tilde{A}_0 + \tilde{A}_1 x_1 + \cdots + \tilde{A}_n x_n,$$

with $\tilde{Y}$ the fuzzy response, $\tilde{A}_j$, $j = 0, 1, \ldots, n$, the fuzzy coefficients (or parameters), and $x_j$ the components of a nonfuzzy input vector. The $\tilde{A}_j$s are assumed to be triangular fuzzy numbers and the fuzzy coefficients are characterized by a membership function $\mu_A(a)$. As an example, suppose that we have a fuzzy relationship between the variables (i.e. fuzzy‐dependent variables), while the observed data are crisp. Further, the triangular fuzzy numbers are assumed symmetric. Then, the membership function of the $j$th coefficient can be defined as

equation

where images is the mean value of the fuzzy number images, images is the mode (center value of images), and images is the spread (width around the center value). The choice of symmetric triangular fuzzy numbers assures that the structure of the model depends only on the data involved in the determination of the upper and lower bounds, while other data points do not play any role. So we have [258]

equation

Thus, we obtain

equation

Now, if the support is just enough to contain all the sample's data points, then the confidence in an out‐of‐sample projection is limited unless one extends the support. For this purpose, one chooses a value images (called the images‐certain factor) of images with the images‐line cutting the two sides of the triangle graph of images at points with coordinates images on the images‐axis and images, images denoting “left” and “right” to images, respectively. The interval images on the images‐axis is the feasible data interval. This images‐certain factor extends, by controlling the size of this data interval, the support of the membership function. In fact, the increase of images leads to the increase of images and images.
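The role of the certain factor can be made concrete with a minimal sketch for a symmetric triangular fuzzy number. The symbol `h` for the certain factor, the parametrization by a center and a spread, and all function names are assumptions made for this illustration; the membership function used is the usual symmetric triangle that equals 1 at the center and falls linearly to 0 at a distance of one spread.

```python
def triangular_membership(a, center, spread):
    """Membership degree of `a` in a symmetric triangular fuzzy number."""
    if spread <= 0:
        raise ValueError("spread must be positive")
    return max(0.0, 1.0 - abs(a - center) / spread)

def h_level_interval(center, spread, h):
    """Interval of points whose membership degree is at least h (0 <= h < 1)."""
    half_width = (1.0 - h) * spread
    return center - half_width, center + half_width

def spread_needed(center, point, h):
    """Smallest spread for which `point` reaches membership degree h."""
    return abs(point - center) / (1.0 - h)

# A data point one unit away from the center value 5:
for h in (0.0, 0.5, 0.8):
    print(h, spread_needed(5.0, 6.0, h))
```

For a fixed triangle a larger `h` narrows the interval returned by `h_level_interval`, so requiring the data to stay inside that interval forces a larger spread, which is why raising the certain factor widens the support.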

When the observed data are fuzzy, the application of the aforementioned images‐certain factor is also possible. Assuming that the observed fuzzy result can be described by a symmetric triangular fuzzy number images, where images is the mode (center value) and images is the spread, then the actual data points belong to the interval images. So the optimization of the model requires the minimization of the spread (width around the center value),

equation

Then, the observed fuzzy data that are adjusted for the images‐certain factor, are contained in the estimated fuzzy result [258]:

equation
equation

Thus, the increase of the images‐certain factor extends the confidence interval and, consequently, the probability for out‐of‐sample values to be covered by the model.
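The possibilistic model described above can be sketched as a linear program. The sketch below assumes NumPy and SciPy are available and uses a commonly stated form of Tanaka's model with symmetric triangular coefficients: minimize the total spread of the estimated outputs subject to each observation, shrunk by the factor (1 - h), lying inside the corresponding estimate. The symbol `h`, the function name `possibilistic_regression`, and the toy data are hypothetical and are not taken from the text.

```python
import numpy as np
from scipy.optimize import linprog

def possibilistic_regression(X, y, h=0.0, e=None):
    """Return (centers, spreads) of symmetric triangular fuzzy coefficients.

    X : (n, k) crisp design matrix (first column of ones for an intercept)
    y : (n,) observed center values
    e : (n,) observed spreads; None means crisp observations
    h : certain factor, 0 <= h < 1
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    e = np.zeros(n) if e is None else np.asarray(e, dtype=float)
    abs_X = np.abs(X)
    # Unknowns: k centers followed by k spreads. Only spreads are penalized.
    cost = np.concatenate([np.zeros(k), abs_X.sum(axis=0)])
    # Upper side:  X a + (1 - h) |X| c >= y + (1 - h) e
    # Lower side:  X a - (1 - h) |X| c <= y - (1 - h) e
    A_ub = np.vstack([
        np.hstack([-X, -(1.0 - h) * abs_X]),
        np.hstack([X, -(1.0 - h) * abs_X]),
    ])
    b_ub = np.concatenate([-(y + (1.0 - h) * e), y - (1.0 - h) * e])
    # Centers are free, spreads must be nonnegative.
    bounds = [(None, None)] * k + [(0, None)] * k
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:k], res.x[k:]

# Hypothetical toy data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
X = np.column_stack([np.ones_like(x), x])

for h in (0.0, 0.5):
    centers, spreads = possibilistic_regression(X, y, h=h)
    print("h =", h, "centers =", centers, "spreads =", spreads)
```

Comparing the two runs shows the effect discussed above: a larger certain factor yields wider estimated spreads for the same data.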

Let us now examine fuzzy least‐squares regression. Recalling (6.5) from Section 6.9, one has, in a fuzzy environment,

equation

and one has to optimize

equation

The most commonly used approach for this is the method of distance measures that was introduced by Phil M. Diamond [98]. Namely, by defining a measure of the distance images between two triangular fuzzy numbers images, images:

equation

with the model written as

one has to optimize

equation

Assuming images, it follows for the distance from (6.6)

equation

while a similar expression can be derived for images.

A images system of equations yields the parameters of images, images (see, e.g. [86] and references therein for an implementation of the above algorithm).
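The distance-based approach can be illustrated with a minimal sketch under simplifying assumptions that are not taken from the text: triangular fuzzy numbers are stored as (left endpoint, mode, right endpoint), the squared distance is taken as the sum of the squared differences of these three values (one common way of writing Diamond's distance), the inputs are crisp and nonnegative, and both coefficients are triangular fuzzy numbers. Under these assumptions the sum of squared distances splits into three ordinary least-squares problems. All names and the toy data are hypothetical.

```python
def diamond_distance_sq(a, b):
    """Squared distance between triangular fuzzy numbers (left, mode, right)."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def ols(xs, ys):
    """Ordinary least-squares intercept and slope for crisp data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

def fuzzy_least_squares(xs, fuzzy_ys):
    """Fit Y_i ~ A0 + A1 * x_i with triangular A0, A1 and crisp x_i >= 0.

    Each of the three components (left, mode, right) is fitted separately.
    The sketch does not enforce left <= mode <= right on the result.
    """
    if any(x < 0 for x in xs):
        raise ValueError("this sketch assumes nonnegative inputs")
    fits = [ols(xs, [y[j] for y in fuzzy_ys]) for j in range(3)]
    A0 = tuple(f[0] for f in fits)
    A1 = tuple(f[1] for f in fits)
    return A0, A1

def predict(A0, A1, x):
    """Estimated triangular fuzzy output for a crisp input x >= 0."""
    return tuple(a0 + a1 * x for a0, a1 in zip(A0, A1))

# Hypothetical toy data: crisp inputs and triangular fuzzy outputs.
xs = [1, 2, 3, 4]
fuzzy_ys = [(1.5, 2.0, 2.5), (3.4, 4.1, 4.8), (5.2, 6.0, 6.8), (7.1, 8.1, 9.1)]

A0, A1 = fuzzy_least_squares(xs, fuzzy_ys)
total = sum(diamond_distance_sq(y, predict(A0, A1, x))
            for x, y in zip(xs, fuzzy_ys))
print("A0 =", A0, "A1 =", A1, "sum of squared distances =", total)
```

For negative inputs the left and right endpoints of the product swap roles, so the three problems no longer separate in this simple way; the sketch therefore rejects such inputs.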

Exercises

The following exercises are inspired by examples presented in [41].

  1. 6.1 Let the fuzzy random vector images, images, images be independent and identically distributed (iid) from the normal distribution images, where images is the unknown mean. Suppose there is a random sample of size images, and by performing an experiment, we observe that images. Test the null hypothesis images against the alternative hypothesis images at the significance level images.

  2. 6.2 Suppose we have a random variable images with the Poisson probability mass function. The probability for images is given by images, images, images. The fuzzy Poisson probability mass function images is obtained when images is replaced by the positive fuzzy number images. Let images and images. Find the images‐cut of the fuzzy probability images, images.
  3. 6.3 Let the random variable images with the normal probability density function images and a random sample images from images with sample size images and a mean equal to 50. Find the fuzzy estimator images as a triangular fuzzy number.
  4. 6.4 Let the random variable images with the normal probability density function images and a random sample images from images with sample size images. Find an unbiased fuzzy estimator images for the variance.
  5. 6.5 An experiment is performed and the following ten crisp data pairs images are measured:
    $i$    $x_i$  $y_i$
    1      20     27
    2      24     44
    3      22     38
    4      18     30
    5      8      21
    6      4      26
    7      32     38
    8      14     30
    9      30     40
    10     11     19

    Conclude whether there is any additional information missing in order to find the fuzzy coefficients images, images, of the fuzzy linear regression model for this data set. Then, by assuming that the necessary missing information is known, determine the fuzzy coefficients images.

Notes

  1. 1   Assume that $X$ is a set. Then, a σ‐algebra $\mathcal{F}$ is a nonempty collection of subsets of $X$ such that the following hold:
    1. $X$ is in $\mathcal{F}$;
    2. if $A$ is in $\mathcal{F}$, then so is the complement of $A$; and
    3. if $A_n$ is a sequence of elements of $\mathcal{F}$, then the union of the elements $A_n$ is in $\mathcal{F}$.
  2. 2   A random set is a Borel‐measurable function from images to the set of all nonempty and compact subsets of images.
  3. 3   In the literature, the confidence level is often designated by $1 - \alpha$, where $\alpha$ is the significance level. Furthermore, the desired confidence level is usually chosen prior to examining the measurements. Very often, the 95% confidence level is applied. However, attention should be paid to the following common misunderstanding: a 95% confidence level does not mean that 95% of the sample data lie within the confidence interval.