and the pseudo-metric of this form is called the Mahalanobis metric.

When the matrix M is the inverse of the covariance matrix, (5.12) is the classical Mahalanobis distance associated with a Gaussian distribution. If M is a positive semidefinite matrix, (5.10) and (5.12) can be regarded as generalized forms of the Euclidean distance, and when M is the identity matrix they reduce to the Euclidean distance.

As can be seen from (5.12), the Mahalanobis distance metric can be obtained from either the matrix L or the matrix M: L uniquely determines M = LᵀL, whereas M determines L only up to a rotation (which does not affect the calculated distance). This means that two different methods can be used to learn the distance metric – the first is to estimate the linear transformation matrix L, and the second is to estimate the positive semidefinite matrix M. Note that there is no restriction on the former method, but the latter must keep the estimated matrix positive semidefinite.
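To make the two parameterizations concrete, the following minimal Python sketch (the function names are illustrative, not from this book) computes the same distance both ways, via M and via L with M = LᵀL:

```python
import numpy as np

def mahalanobis_via_M(x, z, M):
    """d_M(x, z) = sqrt((x - z)^T M (x - z)) with M positive semidefinite."""
    d = x - z
    return np.sqrt(d @ M @ d)

def mahalanobis_via_L(x, z, L):
    """Equivalent form: with M = L^T L, d_M(x, z) = ||L x - L z||."""
    return np.linalg.norm(L @ x - L @ z)

rng = np.random.default_rng(0)
L = rng.standard_normal((3, 3))    # unconstrained linear transformation
M = L.T @ L                        # positive semidefinite by construction
x, z = rng.standard_normal(3), rng.standard_normal(3)
assert np.isclose(mahalanobis_via_M(x, z, M), mahalanobis_via_L(x, z, L))
```

This also shows why the two learning strategies differ in practice: the entries of L are unconstrained, whereas an update to M must preserve positive semidefiniteness.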

5.5.2 Dynamic fuzzy K-nearest neighbour algorithm

Although the K-nearest neighbour (KNN) algorithm [56] is among the oldest and simplest methods in pattern classification, it is more effective than some current methods when there is explicit prior knowledge [57, 58]. KNN classifies each unlabelled data point by a majority vote among its k nearest neighbours in the training set, so the classification performance depends largely on the distance measure used to identify similar neighbours. In the absence of prior knowledge, most KNN classification methods use the Euclidean distance to measure the similarity between sample data, but the Euclidean distance does not exploit any statistical information in the data.

One problem with KNN is that every sample in the training set is treated as equally useful when classifying new data. Thus, when the sample sets overlap, samples are often difficult to classify; moreover, once a sample has been assigned to a class, the extent to which it belongs to that class is not specified. In this section, we incorporate dynamic fuzzy theory and the Mahalanobis distance into the KNN algorithm to form a dynamic fuzzy KNN (DFKNN) algorithm, which is expected to deal with dynamic fuzzy problems better.

For a given sample set {(x1,x1),(x2,x2),...,(xn,xn)} consisting of n samples divided into c categories {(C1,C1),(C2,C2),...,(Cc,Cc)}, the membership of the k th sample (xk,xk) in the i th class (Ci,Ci) is denoted by (Cik,Cik)=(Ci(xk),Ci(xk)), and the following conditions are satisfied:

$$\sum_{i=1}^{c}(C_{ik},C_{ik})=(1,1),\qquad (0,0)<\sum_{k=1}^{n}(C_{ik},C_{ik})<(n,n),\qquad (C_{ik},C_{ik})\in[(0,0),(1,1)]\tag{5.13}$$

Rather than assigning samples directly to a class, DFKNN calculates the membership of each sample to be classified in every class. In addition, the membership of a classified sample provides a degree of confidence in the classification result. For example, if a sample's membership in one class is (0.9,0.9) and its membership in the other two categories is (0.05,0.05), we can be reasonably certain that the sample belongs to the class with membership (0.9,0.9). However, if a sample belongs to one class with membership (0.55,0.55), to a second class with membership (0.45,0.45), and to a third class with membership (0.1,0.1), then, because the memberships in the first two categories are almost equal, we cannot determine the sample's category. It is reasonable to believe that the sample does not belong to the third category, but the final determination of its category requires further study. Clearly, the introduction of membership is conducive to improving the classification results.

Assume that the labelled sample set (X,X)={(x1,x1),(x2,x2),...,(xn,xn)} is composed of n labelled samples, that sample (x,x) has membership (Ci(x),Ci(x)) in the i th class (Ci,Ci), and that the j th labelled sample has membership (Cij,Cij) in the i th class. The DFKNN algorithm is then described as follows:

Algorithm 5.3 DFKNN

Input a sample (x,x) with an unknown class label.

Set the value of k, where 1 ≤ k ≤ n.

Initialize i = 1.

Do until (the k nearest neighbours of the sample (x,x) to be classified are found)

Calculate the Mahalanobis distance from the sample (x,x) to be classified to the sample (xi,xi).

If (i ≤ k) then

Add the sample (xi,xi) to the k-nearest-neighbour set of the sample (x,x) to be classified.

Else if (the sample (xi,xi) is closer to the sample (x,x) to be classified than the farthest of its current k nearest neighbours) then

Delete the farthest of the k nearest neighbours of the sample (x,x) to be classified;

Add the sample (xi,xi) to the k-nearest-neighbour set of the sample (x,x) to be classified.

End if

Self-increment i.

End do

Initialize i = 1.

Do until (the membership of the sample (x,x) to be classified in each category has been calculated)

Use (5.14) to calculate (Ci(x),Ci(x));

Self-increment i.

End do

$$(C_i(x),C_i(x))=\frac{\sum_{j=1}^{k}(C_{ij},C_{ij})\left((1,1)/d_M((x,x),(x_j,x_j))^{2/(m-1)}\right)}{\sum_{j=1}^{k}\left((1,1)/d_M((x,x),(x_j,x_j))^{2/(m-1)}\right)}\tag{5.14}$$

$$d_M((x,x),(z,z))=\sqrt{((x,x)-(z,z))^{T}M((x,x)-(z,z))}\tag{5.15}$$

Here, dM(⋅,⋅) is the Mahalanobis distance, which can be calculated using (5.15), and M is a positive semidefinite matrix; how to obtain this matrix will be described in the following sections.

From (5.14), it can be seen that the membership degree of sample (x,x) depends on the reciprocals of the Mahalanobis distances to its nearest neighbours and on the membership degrees of those neighbours. If the sample to be classified is close to its neighbours, the reciprocal of the distance has a greater influence on the calculation of its membership; otherwise, its influence is somewhat lower.

When calculating the membership, the variable m in (5.14) directly controls the weight given to the distance. If m = 2, the weight is the reciprocal of the distance; as m grows larger, the distance gradually loses its influence; when m is close to 1, closer neighbours become far more important than distant ones, so the number of effectively contributing neighbours is greatly reduced. In the following example, we use m = 2, but experiments show that the error rate is not very sensitive to the value of m.
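To make the procedure concrete, here is a minimal Python sketch of the classification core of Algorithm 5.3 combined with (5.14) and (5.15). It is a simplification under stated assumptions: each dynamic fuzzy pair is collapsed to an ordinary NumPy vector, the positive semidefinite matrix M is taken as given, and the names (dfknn_memberships, X, U) are ours, not the book's:

```python
import numpy as np

def dfknn_memberships(x, X, U, M, k=5, m=2):
    """Membership of query x in each class, per (5.14) and (5.15).

    x : (d,) query;  X : (n, d) labelled samples;
    U : (n, c) memberships of each labelled sample in each class;
    M : (d, d) positive semidefinite matrix.
    """
    diffs = X - x                                   # (n, d)
    # Mahalanobis distance of x to every labelled sample, eq. (5.15)
    dists = np.sqrt(np.einsum('nd,de,ne->n', diffs, M, diffs))
    nn = np.argsort(dists)[:k]                      # k nearest neighbours
    # inverse-distance weights with fuzzifier m, eq. (5.14)
    w = 1.0 / np.maximum(dists[nn], 1e-12) ** (2.0 / (m - 1))
    return (w[:, None] * U[nn]).sum(axis=0) / w.sum()
```

For brevity the sketch finds the k nearest neighbours by sorting all n distances instead of the incremental insert-and-delete loop of Algorithm 5.3; the resulting neighbour set is the same. A query is then assigned to the class of highest membership, with the full membership vector retained as the confidence information discussed above.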

There are many ways to calculate the membership degrees of labelled samples. In addition to the common membership functions introduced in Chapter 2, the following assignment methods can be used:

(1)The membership degree of the sample's known category is set to (1,1), and the membership of the remaining categories is set to (0,0);

(2)Another method of assigning membership values to labelled samples is based on the procedure in [54]. This method is suitable for two-class problems. Based on the distance between a labelled sample and the mean of its labelled class, it calculates the sample's membership in its known class, and it assigns the complementary membership to the other class so that the sample's memberships over all categories sum to (1,1). The details of the procedure can be found in [59].

(3)Assign the membership values of a labelled sample according to the KNN rule, using K neighbours (where K need not equal the k used in classification). Find the K neighbours of the labelled sample (x,x), which belongs to the i th category (Ci,Ci). Then, using (5.16), calculate the membership of the labelled sample in each category:

$$(C_j(x),C_j(x))=\begin{cases}(0.51,0.51)+(n_j/K)\times(0.49,0.49), & \text{if } j=i\\ (n_j/K)\times(0.49,0.49), & \text{if } j\neq i\end{cases}\tag{5.16}$$

Among these, nj is the number of the K neighbours of the labelled sample (x,x) that lie in the j th class. This approach dynamically fuzzifies the memberships of labelled samples located in the overlap region of the sample space, while samples outside the overlap region are assigned to their known class with high membership. Thus, an unknown sample located in the overlap region is less affected by labelled samples lying in the "dynamic fuzzy" region of the category boundary; a minimal sketch of this assignment rule is given below.
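The sketch below implements method (3), again with dynamic fuzzy pairs collapsed to plain vectors; the Euclidean distance is used to find the K neighbours only because the text does not fix that choice, and all names are illustrative:

```python
import numpy as np

def init_memberships_knn(X, y, K=7):
    """Membership of each labelled sample in each class, per (5.16).

    X : (n, d) labelled samples;  y : (n,) integer class labels.
    n_j = number of the sample's K neighbours belonging to class j.
    """
    n, c = len(X), int(y.max()) + 1
    U = np.zeros((n, c))
    for idx in range(n):
        d = np.linalg.norm(X - X[idx], axis=1)   # Euclidean, an assumption
        nn = np.argsort(d)[1:K + 1]              # skip the sample itself
        counts = np.bincount(y[nn], minlength=c)
        U[idx] = (counts / K) * 0.49             # the j != i branch
        U[idx, y[idx]] += 0.51                   # 0.51 offset for own class
    return U
```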

5.5.3 Dynamic fuzzy semi-supervised adaptive learning algorithm

The categories in a dynamic deduction system are dynamically fuzzy, and class characteristics change over time. They may change slowly or abruptly, depending on the state of the system at the time. Some new patterns further confirm the information in previous data, while others introduce new information, such as new categories, inter-class merging, splitting, and so on. This new information changes the operational state. Thus, in view of these small and transient changes, it is necessary to update the membership function of each category in order to better estimate the characteristics of the current system model. To identify these changes, we need an adaptive classifier that can adjust its parameters over time.

A dynamic fuzzy semi-supervised adaptive learning (DFSSAL) algorithm based on DFKNN is introduced in this section. DFSSAL uses the labelled information to estimate the class features, while the unlabelled data are used to detect new classes and to learn the membership functions of the classes. Moreover, the class information can be learned in the feature space before class deduction is taken into account. The purpose of the algorithm is to judge whether a class has been deduced, to assess the usefulness of the current classes, and to estimate the new state of the system according to the adjusted class features. As shown in Fig. 5.9, the main process of the algorithm can be described as follows.

(1)Learning and classification: The DFKNN algorithm is used to learn and classify each new pattern.

(2)Detect whether a category can be deduced: After classifying a new pattern, the algorithm gradually updates the category parameters in order to track the category deduction process. If the corresponding deduction indicators confirm that a category has been deduced, the algorithm automatically creates a new category.

(3)Determine class validity: This stage validates the usefulness of the current categories and retains the useful ones. Categories that are similar are merged.

We now describe these stages in detail.

1) Learning and classification stage

The learning stage of the DFSSAL algorithm aims to learn all the labelled patterns and class information. The training set is denoted by (X,X) = {((x1,x1),(y1,y1)), ..., ((xl,xl),(yl,yl)), (x1′,x1′), ..., (xu′,xu′)}, where ((xi,xi),(yi,yi)) is the i th labelled data point, l is the number of labelled data, (xi′,xi′) is the i th unlabelled data point or pattern, and u is the number of unlabelled data.

The initial learning set consists of all the labelled data in the training set. Each class must contain at least two patterns so that its initial centre of gravity and standard deviation can be calculated. The DFSSAL algorithm uses the centre of gravity and the standard deviation to evaluate category deduction. The current centre of gravity $(G^a_{k,cur},G^a_{k,cur})$ and the initial standard deviation $(Std^a_{k,ini},Std^a_{k,ini})$ of the current category (Ck,Ck) are calculated for each attribute (a,a) of each category.
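A possible form of this initialization, under the same simplifications as before (one plain vector per pattern; the sample standard deviation per attribute is an assumption, since the text does not specify the estimator):

```python
import numpy as np

def initial_class_stats(X, y):
    """Initial centre of gravity and standard deviation of each class,
    computed per attribute from the labelled part of the training set."""
    stats = {}
    for k in np.unique(y):
        Xk = X[y == k]
        # at least two patterns are needed for a standard deviation
        assert len(Xk) >= 2, "each class needs at least two patterns"
        stats[k] = {"G": Xk.mean(axis=0),           # centre of gravity
                    "Std": Xk.std(axis=0, ddof=1),  # sample std, assumed
                    "n": len(Xk)}
    return stats
```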

Fig. 5.9: Process of dynamic fuzzy semi-supervised adaptive learning algorithm.

DFKNN is used to classify each new pattern step by step. Therefore, we need to initialize K, classify each new pattern into a known category, and then use that category's centre of gravity and standard deviation to determine whether the category has been deduced.

2) Detect whether a category can be deduced

In general, deduction is most likely to occur in categories that receive new patterns. After classifying a pattern (x,x) into category (Ck,Ck), only the parameters of category (Ck,Ck) need to be updated. We recalculate the features of category (Ck,Ck) (standard deviation and centre of gravity) to detect whether deduction has occurred in that category. Equation (5.17) is used to update the standard deviation and the centre of gravity of category (Ck,Ck):

$$(Std^a_{k,cur},Std^a_{k,cur})=\sqrt{\frac{n_k-1}{n_k}\times\bigl((Std^a_{k,cur-1})^2,(Std^a_{k,cur-1})^2\bigr)+\frac{\bigl((x,x)-(G^a_{k,cur-1},G^a_{k,cur-1})\bigr)^2}{n_k+1}}$$
$$(G^a_{k,cur},G^a_{k,cur})=(G^a_{k,cur-1},G^a_{k,cur-1})\times\frac{n_k}{n_k+1}+\frac{(x,x)}{n_k+1}\tag{5.17}$$

where nk is the total number of patterns in category (Ck,Ck) before the new pattern enters, and $(Std^a_{k,cur-1},Std^a_{k,cur-1})$ and $(G^a_{k,cur-1},G^a_{k,cur-1})$ are, respectively, the standard deviation and the centre of gravity of attribute (a,a) before the new pattern (x,x) is classified into category (Ck,Ck).
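Under this reading of (5.17) – a standard recursive update of the sample mean and variance – one class's statistics can be updated in constant time per attribute when a pattern joins it. The helper below is an illustrative sketch, not the book's code, and reuses the stats dictionary of initial_class_stats above:

```python
import numpy as np

def update_class_stats(stats_k, x):
    """Update one class's per-attribute statistics when pattern x joins it,
    following (5.17); stats_k holds the centre of gravity G, the standard
    deviation Std, and the pattern count n."""
    n, G, Std = stats_k["n"], stats_k["G"], stats_k["Std"]
    var_new = (n - 1) / n * Std ** 2 + (x - G) ** 2 / (n + 1)
    G_new = G * n / (n + 1) + x / (n + 1)
    return {"G": G_new, "Std": np.sqrt(var_new), "n": n + 1}
```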

Based on the calculated values of $(Std^a_{k,cur},Std^a_{k,cur})$ and $(G^a_{k,cur},G^a_{k,cur})$, transient changes in the system are monitored using two deduction indicators.

(1)$i^1_{k,a}$ represents the change in compactness of attribute (a,a) of class (Ck,Ck) and is calculated using (5.18). If at least one attribute (a,a) has a value of $i^1_{k,a}$ greater than the threshold $th_1$, class (Ck,Ck) is considered to have begun changing on attribute (a,a). To track small changes in the category, the threshold $th_1$ should be set to a small value; to detect only important deductions, $th_1$ should be set to a larger value. For example, when $th_1$ = 5, a deduction is signalled only when the compactness of a category characteristic has changed by at least 5% (see the sketch after (5.18)).

$$i^1_{k,a}=\frac{(Std^a_{k,cur},Std^a_{k,cur})\times(100,100)}{(Std^a_{k,ini},Std^a_{k,ini})}-(100,100)\tag{5.18}$$
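The check itself is then a one-liner; $th_1$ and the names below are illustrative:

```python
import numpy as np

def class_has_begun_deducing(Std_cur, Std_ini, th1=5.0):
    """Flag a class when indicator (5.18) exceeds th1 on any attribute."""
    i1 = Std_cur * 100.0 / Std_ini - 100.0   # percent change in compactness
    return bool(np.any(i1 > th1))
```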
