$$x_j < X_{\min}^j - \frac{\Delta_j}{2} \;\Rightarrow\; X_{\min}^j = X_{\min}^j - \Delta_j,\quad h = h + 1$$
$$x_j > X_{\max}^j + \frac{\Delta_j}{2} \;\Rightarrow\; X_{\max}^j = X_{\max}^j + \Delta_j,\quad h = h + 1$$
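As a hedged illustration of this range-extension rule, the following Python sketch updates the histogram bounds of one attribute when a value falls more than half an interval outside the current range; the function name and argument names are our own assumptions, not part of the original algorithm.

```python
def extend_range(x_j, v_min, v_max, delta, h):
    """Extend the histogram range of attribute j when a new value x_j falls more than
    half an interval width (delta / 2) outside it, adding one interval (h <- h + 1)."""
    if x_j < v_min - delta / 2:
        v_min -= delta          # grow the range downwards by one interval
        h += 1
    elif x_j > v_max + delta / 2:
        v_max += delta          # grow the range upwards by one interval
        h += 1
    return v_min, v_max, h
```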

For each interval b(j, k), k ∈ {1, 2, ..., h}, the height of the histogram of attribute j represents the number of patterns $n_i^{b(j,k)}$ of class $(C_i, C_i)$ whose attribute value falls in that interval. Dividing $n_i^{b(j,k)}$ by the total number $N_i$ of patterns of class $(C_i, C_i)$ gives the probability distribution of attribute j in class $(C_i, C_i)$. In general, we use the probability at the midpoint $(y^{b(j,k)}, y^{b(j,k)})$ of the interval to approximate the probability that attribute j belongs to class $(C_i, C_i)$; that is,

$$p\bigl((C_i, C_i)\mid(y^{b(j,k)}, y^{b(j,k)})\bigr) = \frac{n_i^{b(j,k)}}{N_i}, \quad (5.3)$$

which we abbreviate as $p_{ik}^j((y, y))$.

Combining this with (5.1), the random dynamic fuzzy probability is

$$d\pi\bigl((C_i, C_i)\mid(y^{b(j,k)}, y^{b(j,k)})\bigr) = \sum_{f=k}^{h} \min\bigl(p_{ik}^j((y, y)),\, p_{if}^j((y, y))\bigr), \quad (5.4)$$

which we abbreviate as $d\pi_{ik}^j((y, y))$.
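As a minimal sketch of (5.3) and (5.4), the following Python code estimates the interval probabilities of one attribute for one class and converts them into dynamic fuzzy possibilities. It assumes the intervals are ranked by decreasing probability, in line with the ordering convention used later in this section; the function names and the NumPy representation are our own, not part of the original algorithm.

```python
import numpy as np

def bin_probabilities(values, n_bins, v_min, v_max):
    """Interval probabilities p_ik^j of one attribute for one class, as in (5.3)."""
    counts, _ = np.histogram(values, bins=n_bins, range=(v_min, v_max))
    return counts / max(len(values), 1)

def to_possibility(p):
    """Probability-to-possibility transform of (5.4): with intervals ranked by decreasing
    probability, the possibility of the k-th ranked interval is sum_{f>=k} min(p_k, p_f),
    which reduces to the tail sum of the ranked probabilities."""
    order = np.argsort(-p)                    # rank intervals by decreasing probability
    tail = np.cumsum(p[order][::-1])[::-1]    # tail sums over the ranked probabilities
    pi = np.empty_like(p, dtype=float)
    pi[order] = tail                          # map possibilities back to the original intervals
    return pi
```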

1) Classify new patterns

The membership degree of each attribute j in class $(C_i, C_i)$ is approximated by a dynamic fuzzy random distribution. Thus, a new pattern $(x, x)$ with attribute values $(x_1, x_1), (x_2, x_2), \ldots, (x_d, x_d)$ can be classified in two steps:

(1) The membership degree $d\pi_i^j((x, x))$ of each attribute j in class $(C_i, C_i)$ is determined from the dynamic fuzzy random distribution.

(2) The membership degrees of all attributes j of class $(C_i, C_i)$ are combined by an operator H; that is,

$$d\pi_i((x, x)) = H\bigl(d\pi_i^1((x, x)), \ldots, d\pi_i^d((x, x))\bigr). \quad (5.5)$$

Thus, the global membership value $d\pi_i((x, x))$ of pattern $(x, x)$ belonging to class $(C_i, C_i)$ is obtained. Because the minimum operation is simple and gives good results, we use the minimum operator as H in the proposed system. Finally, the pattern $(x, x)$ is assigned to the class with the largest membership. If the membership of pattern $(x, x)$ in some class is not (0, 0), then the pattern is assigned to that class and the dynamic fuzzy distribution of each attribute of that class is updated.
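The two-step classification above might be coded as follows. This is a sketch under the assumption that each class stores its per-attribute possibilities as a (d × h) array and that the histogram edges are known; all names are our own, not the original implementation.

```python
import numpy as np

def classify(pattern, class_possibilities, bin_edges):
    """Classify a pattern: look up the membership of each attribute in each class and
    combine them with the minimum operator H of (5.5); return the winning class,
    or None if every class gives membership 0 (the pattern is rejected)."""
    memberships = {}
    for c, pi in class_possibilities.items():        # pi has shape (d, h)
        degrees = []
        for j, x_j in enumerate(pattern):
            k = int(np.clip(np.searchsorted(bin_edges[j], x_j) - 1, 0, pi.shape[1] - 1))
            degrees.append(pi[j, k])                  # membership of attribute j in class c
        memberships[c] = min(degrees)                 # H = min
    best = max(memberships, key=memberships.get)
    return (best if memberships[best] > 0 else None), memberships
```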

2) Detect new classes

Given c existing classes, the first pattern rejected by all of them is treated as the prototype of a new class, and the membership degrees of the new class are calculated using (5.3) and (5.4). The number of classes is increased by 1. For the first pattern $(x, x)$ of the new class c, if its attribute value falls in the interval b(j, k), k ∈ {1, 2, ..., h}, the probability histogram of attribute j in class c can be expressed as

$$p_c^j = \{p_{c1}^j = 0,\; p_{c2}^j = 0,\; \ldots,\; p_{ck}^j = 1,\; \ldots,\; p_{ch}^j = 0\}.$$

Then, the dynamic fuzzy random probability histogram is constructed using (5.4). Obviously, there is only one mode, so the dynamic fuzzy random probability histogram and probability histogram are the same. The specific process of detecting new classes is briefly described as follows:

If $d\pi_i((x, x)) = 0$ for all $i \in \{1, 2, \ldots, c\}$, then
$$c \leftarrow c + 1, \quad (C_c, C_c) = \{(x, x)\}, \quad d\pi_c = \{d\pi_c^1, \ldots, d\pi_c^j, \ldots, d\pi_c^d\}.$$
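A sketch of this detection step, continuing the representation assumed above (integer class labels and a (d × h) histogram per class, which are our own conventions), could look as follows.

```python
import numpy as np

def detect_new_class(pattern, memberships, class_histograms, bin_edges, n_bins):
    """If the pattern is rejected by every existing class (all memberships are 0),
    create a new class whose histogram has a single mode at the pattern's interval;
    with one mode, probabilities and possibilities coincide."""
    if any(m > 0 for m in memberships.values()):
        return None
    d = len(pattern)
    p_new = np.zeros((d, n_bins))
    for j, x_j in enumerate(pattern):
        k = int(np.clip(np.searchsorted(bin_edges[j], x_j) - 1, 0, n_bins - 1))
        p_new[j, k] = 1.0                               # p_ck^j = 1, all other intervals 0
    new_label = max(class_histograms, default=-1) + 1   # c <- c + 1
    class_histograms[new_label] = p_new
    return new_label
```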

3) Update the local membership degree

If pattern $(x, x)$ belongs to class $(C_i, C_i)$, then the dynamic fuzzy distribution of class $(C_i, C_i)$ needs to be locally updated. The probability distribution and the dynamic fuzzy distribution of the j-th attribute of class $(C_i, C_i)$ are denoted by $p_i^j = \{p_{i1}^j, p_{i2}^j, \ldots, p_{ih}^j\}$ and $d\pi_i^j = \{d\pi_{i1}^j, d\pi_{i2}^j, \ldots, d\pi_{ih}^j\}$, respectively; after updating, they are denoted by $p_i'^j = \{p_{i1}'^j, p_{i2}'^j, \ldots, p_{ih}'^j\}$ and $d\pi_i'^j = \{d\pi_{i1}'^j, d\pi_{i2}'^j, \ldots, d\pi_{ih}'^j\}$, respectively.

For simplicity of calculation, let $p_{ih}^j < p_{i(h-1)}^j < \cdots < p_{i1}^j$. The new random probability can then be calculated according to the following formula [52]:

If $(x_j, x_j) \in b(j, k)$, $k \in \{1, \ldots, h\}$, then
$$p_{ik}'^j = p_{ik}^j \times \frac{N_i}{N_i + 1} + \frac{1}{N_i + 1},$$
$$p_{iz}'^j = p_{iz}^j \times \frac{N_i}{N_i + 1}, \quad z \in \{1, \ldots, h\},\; z \neq k. \quad (5.6)$$

A new dynamic fuzzy stochastic probability can be obtained by combining this with (5.4). The detailed process of updating the local membership is briefly described as follows:

If $d\pi_i((x, x)) = \max_{z \in \{1, \ldots, c\}} d\pi_z((x, x))$, then
$$(C_i, C_i) \leftarrow (C_i, C_i) \cup \{(x, x)\}, \quad d\pi_i = \{d\pi_i^1, \ldots, d\pi_i^j, \ldots, d\pi_i^d\}.$$
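A compact sketch of the update rule (5.6), assuming the histogram of one attribute is held as a NumPy array of interval probabilities (our own representation), might be:

```python
import numpy as np

def update_probabilities(p, k, N_i):
    """Incremental update of (5.6) after the class receives its (N_i + 1)-th pattern:
    every interval is decayed by N_i / (N_i + 1) and the interval k containing the
    new value gains an extra 1 / (N_i + 1)."""
    p = p * N_i / (N_i + 1.0)
    p[k] += 1.0 / (N_i + 1.0)
    return p
```

The corresponding dynamic fuzzy possibilities would then be recomputed from the updated probabilities via (5.4).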

4) Merge similar classes

If the patterns arrive in different orders, there may eventually be several different partitions or different numbers of clusters, and some clusters may represent the same pattern. Therefore, we need to merge clusters that represent the same pattern. The similarity measure used for merging is based on the similarity defined in [53]. We define the similarity criterion as follows:

$$(\delta_{iz}, \delta_{iz}) = (1, 1) - \frac{\displaystyle\sum_{(x,x) \in (C_i, C_i) \;\text{or}\; (x,x) \in (C_z, C_z)} \bigl| d\pi_i((x, x)) - d\pi_z((x, x)) \bigr|}{\displaystyle\sum_{(x,x) \in (C_i, C_i)} d\pi_i((x, x)) + \sum_{(x,x) \in (C_z, C_z)} d\pi_z((x, x))} \quad (5.7)$$
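A hedged sketch of the similarity (5.7): the numerator compares the two memberships over the patterns of both classes, while each term of the denominator sums a class's membership over its own patterns only; the array-based representation and the function name are our own assumptions.

```python
import numpy as np

def class_similarity(dpi_i_union, dpi_z_union, dpi_i_own, dpi_z_own):
    """Similarity (delta_iz, delta_iz) of (5.7) between classes i and z."""
    diff = np.abs(np.asarray(dpi_i_union) - np.asarray(dpi_z_union)).sum()
    norm = np.asarray(dpi_i_own).sum() + np.asarray(dpi_z_own).sum()
    return 1.0 - diff / norm
```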

The membership functions are merged online in an incremental manner. Let $p_i^j = \{p_{i1}^j, \ldots, p_{ih}^j\}$ and $p_z^j = \{p_{z1}^j, \ldots, p_{zh}^j\}$ be the random probability distributions of cluster $(C_i, C_i)$ and cluster $(C_z, C_z)$, respectively.

Equation (5.8) gives the new probabilities for interval b(j, k) after the merging of the two clusters:

$$p_{izk}^j = \frac{n_i^{b(j,k)} + n_z^{b(j,k)}}{N_i + N_z} \quad (5.8)$$

Further simplifying (5.8) yields the following:

$$p_{izk}^j = \frac{n_i^{b(j,k)}}{N_i} \times \frac{N_i}{N_i + N_z} + \frac{n_z^{b(j,k)}}{N_z} \times \frac{N_z}{N_i + N_z} = p_{ik}^j \times \frac{N_i}{N_i + N_z} + p_{zk}^j \times \frac{N_z}{N_i + N_z}, \quad (5.9)$$
$$j \in \{1, 2, \ldots, d\}, \quad k \in \{1, 2, \ldots, h\}.$$

After calculating the random probabilities of the corresponding attributes of each class, we can calculate the membership of the corresponding attributes of each class and update the membership degree of each mode using (5.4).
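The merged histogram of (5.8)/(5.9) is simply a count-weighted average of the two per-interval probabilities; a minimal sketch (the function name is our own) is:

```python
def merge_histograms(p_i, p_z, N_i, N_z):
    """Merged probability of each interval after fusing clusters i and z, per (5.9).
    p_i and p_z are NumPy arrays of interval probabilities; N_i and N_z are pattern counts."""
    return p_i * N_i / (N_i + N_z) + p_z * N_z / (N_i + N_z)
```

The merged memberships are then obtained by applying the transform of (5.4) to the merged probabilities.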

2. Knowledge transfer

Generally speaking, if the task currently being learned is related to past tasks, then people can learn the new task quickly. In multi-task learning, we need to understand how knowledge transfer can be made more effective [54]. In this section, the tasks are related; that is, there are overlapping areas between any two tasks. When we learn a new task, we can use some relevant information from the previous task to improve the learning of the current task. We assume that the patterns of each class in each task are representative; that is, similar patterns with the same label in the previous task have the same label in the next task. As shown in Fig. 5.5, the neighbours of pattern x are marked with squares in Task 1, and these neighbours should have the same label as pattern x in Task 2.

Figure 5.6 shows the basic idea of knowledge transfer. Suppose we have successfully learned the first task. As can be seen in Fig. 5.6(1), Task 1 has three types of patterns, labelled with a plus, an asterisk, and a square. When the second task arrives, we need to classify pattern x. First, x is placed in the first task and classified as a plus, and within that class x has five neighbours (circled with a dashed line). Then, in the current task, these neighbours of pattern x should receive the same label as x. If pattern x is classified into the asterisk category in the current task, its neighbours are also labelled as asterisks, as shown in Fig. 5.6(2).

However, the quality of the classification depends on the size of the neighbourhood and the shape of the class boundary. In fact, some of the transferred neighbourhood patterns may mislead the learning of the current task, which is a natural consequence of the inductive bias. We can use training samples to correct this. That is, when a new training pattern arrives, we first find its nearest neighbour and check whether its label is the same as that of the new training pattern. If the two labels differ, the label of the nearest neighbour is corrected to the label of the current training pattern.
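The neighbour transfer and nearest-neighbour correction described above might be sketched as follows; the neighbourhood radius, the array names, and the use of Euclidean distance are our own assumptions rather than details given in the text.

```python
import numpy as np

def transfer_neighbours(x, label, prev_X, prev_labels, radius):
    """Find the neighbours of x in the previous task and correct the nearest
    neighbour's label if it disagrees with the new training pattern's label."""
    dist = np.linalg.norm(prev_X - x, axis=1)
    nearest = int(np.argmin(dist))
    if prev_labels[nearest] != label:
        prev_labels[nearest] = label            # correct a misleading transferred label
    neighbours = np.where(dist <= radius)[0]    # indices of patterns transferred to the current task
    return neighbours, prev_labels
```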

Fig. 5.6: Basic idea of knowledge transfer.

3. Description of dynamic fuzzy semi-supervised multi-task matching algorithm

The general dynamic fuzzy semi-supervised multi-task pattern matching algorithm (DFSSMTPM) is described as follows.

For the first task, the DFSSL algorithm is used to learn the classification task. For each subsequent task, the neighbours and the nearest neighbour of the current training pattern are found in a copy of the previously learned task. We check whether the label of the current training pattern is the same as that of its nearest neighbour and, if not, correct the label of the nearest neighbour; in addition, these neighbours are combined with the training data to learn the current task.

5.4.3 Case analysis

1. Description of experimental data

To test the validity of our algorithm, we reorganized the UCI Iris dataset to simulate several related tasks. The Iris dataset consists of 150 data points in three classes, with 50 items per class and four attributes. Using the idea of constructing multi-task data [54], we reconstructed three tasks from the Iris data. Table 5.2 presents the resulting data. For the same dataset, different classification problems constitute related tasks. Task 1 is composed of the raw data; that is, it contains three classes. Task 2 contains two classes: the original first and second classes together form its first class, and the remaining data form its second class. Task 3 also contains two classes: the original first class, and the combination of the original second and third classes.
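One way the three related tasks of Tab. 5.2 could be rebuilt from the Iris data is sketched below; the use of scikit-learn and the 0/1/2 class indices (setosa, versicolor, virginica) are our own assumptions about how the regrouping described above might be coded.

```python
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
task1 = (X, y)                       # Task 1: the original three classes
task2 = (X, (y == 2).astype(int))    # Task 2: {class 1, class 2} vs. class 3
task3 = (X, (y > 0).astype(int))     # Task 3: class 1 vs. {class 2, class 3}
```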

2. Experimental results and analysis

We mainly investigate the following aspects. Effectiveness of knowledge transfer: To test the validity of knowledge transfer, we carried out a comparison of two kinds of experiments on the same task. First, we conducted a dynamic fuzzy semi-supervised matching experiment as a single task. Then, we carried out a dynamic fuzzy semi-supervised multi-task matching experiment with a multi-task background.

Tab. 5.2: Multi-task problem with iris data.

At the same time, we compared our algorithm with that reported in [51] to test the validity of knowledge transfer in related tasks.

Semi-supervised classification vs. supervised classification: In [51], the authors demonstrate the superiority of fuzzy pattern matching for semi-supervised tasks under a single task without prior knowledge. In this chapter, we attempt to show that dynamic fuzzy semi-supervised pattern matching is better than supervised fuzzy pattern matching in the multi-task background when there are few training data for a task.

We mainly measure the classification error rate of the algorithms, i.e. the ratio of the number of erroneous predictions on the test samples to the total number of test samples. In the experiment, we considered Task 1 as the basic task; that is, we learned the remaining two tasks after learning Task 1. First, we need to understand the relationship between the number of histogram intervals, the number of labelled data, and the error rate in a single task. As shown in Fig. 5.7, we used the dynamic fuzzy semi-supervised matching algorithm (DFSSPM) to learn Task 1. It can be seen that when the number of labelled data in the task is fixed, the number of intervals influences the classification result, so the number of intervals is one of the factors that affects the classification. When the number of intervals is fixed, the classification improves as the number of labelled data increases, which is consistent with our intuition.

It is also important to examine the relationship between the neighbourhood radius and the classification results when the number of intervals and labels is fixed. As shown in Fig. 5.8, we applied the fuzzy pattern matching algorithm [51] and the dynamic fuzzy semi-supervised multi-task matching algorithm to Task 3 in the multi-task background. These are abbreviated as MFPM and DFSSMTPM, respectively. There were 30 labelled training data and six histogram intervals. As can be seen from the figure, regardless of which method is used, the size of the neighbourhood radius affects the classification; as mentioned earlier, this is a natural result of introducing an inductive bias. Overall, the DFSSMTPM algorithm gives a better classification than MFPM does. As MFPM is a supervised algorithm, we can say that, in the multi-task context, our semi-supervised algorithm, which exploits unlabelled data, is effective. Using this new information to update the current class membership functions produces a better classification than that given by MFPM [51].

Fig. 5.7: Relationship between the number of intervals, the number of labels, and the error rate.
Fig. 5.8: Relationship between the neighbourhood radius and the error rate for the two methods.

To illustrate the validity of the knowledge transfer mechanism, dynamic fuzzy semi-supervised matching (DFSSPM) and the dynamic fuzzy semi-supervised multi-task matching algorithm (DFSSMTPM) were applied to the same task. Table 5.3 lists the classification performance on Task 2 with six histogram intervals; for the DFSSMTPM algorithm, the neighbourhood radius was set to 0.2. The table lists the classification error rates of the two algorithms for different numbers of labelled training data. It can be seen that the overall classification error rate of DFSSPM is higher than that of DFSSMTPM, which indicates that the knowledge transfer mechanism, by effectively extending the labelled data, can reduce the classification error rate.

Tab. 5.3

5.5 DFSSMTAL algorithm

Real-life problems are often dynamic and fuzzy, and it is difficult for traditional classification methods to learn a correct classification model for them. Thus, an adaptive classifier is constructed whose parameters are updated as the model encounters new data. In this section, we introduce the DFSSMTAL algorithm, i.e. semi-supervised multi-task learning with an adaptive classification method, in order to provide a new way of solving dynamic fuzzy problems.

5.5.1 Mahalanobis distance metric

Definition 5.13 Metric [55]: A mapping $D: \chi \times \chi \to \mathbb{R}_0^+$ on a vector space is called a metric if and only if, for all $x_i, x_j, x_k \in \chi$, the following are satisfied:

(1) Triangle inequality: $D(x_i, x_j) + D(x_j, x_k) \geq D(x_i, x_k)$

(2) Nonnegativity: $D(x_i, x_j) \geq 0$

(3) Symmetry: $D(x_i, x_j) = D(x_j, x_i)$

(4) Distinguishability: $D(x_i, x_j) = 0 \Leftrightarrow x_i = x_j$

Strictly speaking, a mapping that satisfies the first three conditions (triangle inequality, nonnegativity, and symmetry) but not the fourth (distinguishability) is called a pseudo-metric. Unless otherwise stated, the metrics discussed below may be pseudo-metrics.

After linearly transforming the vectors in the vector space $\chi$ as $x' = Lx$, a metric is obtained by calculating the Euclidean distance between the transformed vectors. The square of this distance can be calculated using (5.10), where the matrix L is the parameter of the linear transformation. It is easily proved that if L has full rank, (5.10) is a valid metric; otherwise, it is a pseudo-metric.

$$D_L(x_i, x_j) = \| L(x_i - x_j) \|_2^2 \quad (5.10)$$

The squared distance in (5.10) is often expressed in terms of the quadratic matrix defined in (5.11).

$$M = L^{\mathrm{T}} L \quad (5.11)$$

For any real matrix L, the matrix M obtained from (5.11) is positive semidefinite. In other words, the matrix M has no negative eigenvalues.

Definition 5.14 Mahalanobis metric: For the matrix M obtained from any real matrix L via (5.11), we denote the square of the distance as

$$D_M(x_i, x_j) = (x_i - x_j)^{\mathrm{T}} M (x_i - x_j). \quad (5.12)$$
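A short numerical check of (5.10)-(5.12), assuming NumPy arrays (our own representation), is given below; the two squared distances coincide by construction.

```python
import numpy as np

def squared_distances(x_i, x_j, L):
    """D_L of (5.10) via the linear map L, and the equivalent D_M of (5.12) with M = L^T L."""
    diff = x_i - x_j
    d_L = float(np.sum((L @ diff) ** 2))   # ||L(x_i - x_j)||_2^2, eq. (5.10)
    M = L.T @ L                            # positive semidefinite, eq. (5.11)
    d_M = float(diff @ M @ diff)           # (x_i - x_j)^T M (x_i - x_j), eq. (5.12)
    return d_L, d_M
```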
