We represent this in the form of a dynamic fuzzy tree, as shown in Fig. 6.3.

Fig. 6.3: DF production represented by dynamic fuzzy tree.

For the root node (A, A), the other nodes are divided into five non-overlapping finite subsets:

T1 = {(B1, B1), (C1, C1), (C2, C2)}, T2 = {(B2, B2), (C3, C3)}, T3 = {(B3, B3), (C4, C4), (C5, C5), (C6, C6), (D1, D1), (D2, D2), (D3, D3)}, T4 = {(B4, B4), (C7, C7), (C8, C8), (D4, D4), (D5, D5), (D6, D6)}, and T5 = {(B5, B5), (C9, C9), (C10, C10)}.

Each set is also a dynamic fuzzy sub-tree of the root (A, A). For example, T4 has the root (B4, B4), and its other nodes are divided into two non-overlapping subsets T41 = {(C7, C7)} and T42 = {(C8, C8), (D4, D4), (D5, D5), (D6, D6)}. T41 and T42 are both sub-trees of T4. (C8, C8) is the root of T42, and (D4, D4), (D5, D5), (D6, D6) are three non-overlapping sub-trees of (C8, C8), each of which is itself a tree with a single node.
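The nested decomposition above can be mirrored directly in a data structure. Below is a minimal Python sketch of a dynamic fuzzy tree node whose label is a DF pair and whose children are its non-overlapping sub-trees; the class and field names are illustrative assumptions, not notation from the text.

```python
# Minimal sketch of a dynamic fuzzy tree node (illustrative names only).
# A node's label is a DF pair such as ("B4", "B4"); its children are the
# non-overlapping sub-trees into which the remaining nodes are divided.
class DFTreeNode:
    def __init__(self, label, children=None):
        self.label = label               # DF pair, e.g. ("C8", "C8")
        self.children = children or []   # list of DFTreeNode (sub-trees)

# The sub-tree T4 from the decomposition above:
T4 = DFTreeNode(("B4", "B4"), [
    DFTreeNode(("C7", "C7")),            # T41
    DFTreeNode(("C8", "C8"), [           # T42
        DFTreeNode(("D4", "D4")),
        DFTreeNode(("D5", "D5")),
        DFTreeNode(("D6", "D6")),
    ]),
])
```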

6.4.2 Dynamic fuzzy tree hierarchy relationship learning algorithm

1. ID3 algorithm and sample analysis

Decision tree learning is one of the most widely used inductive reasoning algorithms and can learn disjunctive expressions. A decision tree classifies an instance by passing it from the root node down to a leaf node, and the leaf node gives the classification of the instance. Each internal node of the tree specifies a test of one attribute of the instance, and each branch descending from that node corresponds to one of the possible values of that attribute. Classification starts from the root node of the tree: test the attribute specified by that node, then move down the branch corresponding to the value of that attribute in the given instance. This process is then repeated with the new node as the root of a sub-tree.
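To make this procedure concrete, the following sketch walks an instance from the root of a decision tree to a leaf. The nested-dictionary encoding of the tree and the class labels are assumptions made for illustration only; they are not the representation used later in this section.

```python
# Sketch: classify an instance by repeatedly testing the attribute named at
# the current node and following the branch matching the instance's value.
# An internal node is encoded as {attribute: {value: subtree}}; a leaf is any
# non-dict value.
def classify(tree, instance):
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[instance[attribute]]   # follow the matching branch
    return tree

# Hypothetical tree and instance (attribute values and labels are illustrative):
toy_tree = {"Weather": {"good": {"Rushhour": {"true": "late", "false": "on time"}},
                        "normal": "on time",
                        "bad": "late"}}
print(classify(toy_tree, {"Weather": "good", "Rushhour": "false"}))  # -> on time
```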

ID3 is a decision tree classification algorithm based on information entropy that learns by constructing the tree top-down [52]. To decide which attribute should be used to classify the samples at each node, the statistical measure of information gain is introduced to quantify how well a given attribute separates the training samples.

Consider a training sample set X containing Ni training samples of class i. The probability that a sample belongs to class i is pi, given by

$$p_i = \frac{N_i}{|X|}, \tag{6.17}$$

where |X| is the number of samples in X.

To define information gain precisely, we first introduce entropy, a metric widely used in information theory to describe the purity of a sample set [52].

Assume the target attribute has k different values. The entropy of the training set X with respect to these k classification states is defined as

$$\mathrm{Entropy}(X) = -\sum_{i=1}^{k} p_i \,\mathrm{lb}\, p_i. \tag{6.18}$$

The information gain Gain(X, A) of an attribute A with respect to X is defined as

$$\mathrm{Gain}(X, A) = \mathrm{Entropy}(X) - \sum_{v \in \mathrm{Values}(A)} \frac{|X_v|}{|X|}\, \mathrm{Entropy}(X_v), \tag{6.19}$$

where Values(A) is the set of all possible values of attribute A and Xv is the subset of X for which attribute A has value v; that is, Xv = {x ∈ X | A(x) = v}. The first term of this formula is the entropy of the original set X, and the second term is the expected entropy of X after it is partitioned by attribute A.
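Formulas (6.18) and (6.19) translate directly into code. The sketch below assumes that samples are held as dictionaries mapping attribute names to values, with the class stored under a `target` key; these representational choices are ours, not the book's.

```python
from collections import Counter
from math import log2

def entropy(labels):
    # Entropy (6.18): -sum_i p_i lb p_i, where p_i is the relative class frequency.
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def info_gain(samples, attribute, target):
    # Information gain (6.19): entropy of X minus the expected entropy of X
    # after partitioning it by the given attribute.
    gain = entropy([s[target] for s in samples])
    for v in {s[attribute] for s in samples}:
        subset = [s[target] for s in samples if s[attribute] == v]
        gain -= (len(subset) / len(samples)) * entropy(subset)
    return gain
```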

The following uses the traffic situation of a city as an example to explain the actual procedure of the ID3 algorithm. The training samples are listed in Tab. 6.6.

When the algorithm starts, all samples are contained in the root node. To determine the best division attribute, calculate the entropy of the training samples according to (6.18):

$$\mathrm{Entropy}(X) = -\frac{6}{12}\,\mathrm{lb}\,\frac{6}{12} - \frac{6}{12}\,\mathrm{lb}\,\frac{6}{12} = 1.$$

Tab. 6.6: Sample set of traffic conditions in a city.

Then, we can calculate the entropy of each attribute in the training sample set:

$$\mathrm{Entropy}(X_{\mathrm{Rushhour},\,\mathrm{true}}) = -\frac{4}{6}\,\mathrm{lb}\,\frac{4}{6} - \frac{2}{6}\,\mathrm{lb}\,\frac{2}{6} = 0.918$$

and

$$\mathrm{Entropy}(X_{\mathrm{Rushhour},\,\mathrm{false}}) = -\frac{2}{6}\,\mathrm{lb}\,\frac{2}{6} - \frac{4}{6}\,\mathrm{lb}\,\frac{4}{6} = 0.918.$$

Calculate the information gain of attribute “Rushhour” according to (6.19):

$$\mathrm{Gain}(X, \mathrm{Rushhour}) = \mathrm{Entropy}(X) - \sum_{v \in \{\mathrm{true},\,\mathrm{false}\}} \frac{|X_v|}{|X|}\,\mathrm{Entropy}(X_v) = 1 - \left(\tfrac{1}{2} \times 0.918 + \tfrac{1}{2} \times 0.918\right) = 0.082.$$
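The arithmetic above can be checked in a few lines; the snippet only reproduces the fractions already quoted for the Rushhour split.

```python
from math import log2

# Entropy of each Rushhour branch: a 4:2 split of 6 samples.
branch = -(4/6) * log2(4/6) - (2/6) * log2(2/6)
print(round(branch, 3))                              # 0.918

# Gain(X, Rushhour) = 1 - weighted sum of the branch entropies.
print(round(1 - (0.5 * branch + 0.5 * branch), 3))   # 0.082
```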

In the same way, we can obtain the information gain of attributes “Weather” and “Trafficjam”: Gain(X, Weather) = 0.459 and Gain(X, Trafficjam) = 0.

Overall, the information gain of “Weather” is the greatest, so the classification attribute on this node is “Weather”. This procedure is repeated to obtain the decision tree shown in Fig. 6.4.
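The repeated procedure, that is, choose the attribute with the greatest gain, split the samples by its values, and recurse on each subset until a node is pure, can be outlined as follows. This is a rough sketch of ID3's control flow under the same dictionary-based sample representation assumed earlier (the small entropy and gain helpers are repeated so the sketch runs on its own); returning a majority-class leaf when no attributes remain is our own simplification.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(samples, attr, target):
    gain = entropy([s[target] for s in samples])
    for v in {s[attr] for s in samples}:
        subset = [s[target] for s in samples if s[attr] == v]
        gain -= (len(subset) / len(samples)) * entropy(subset)
    return gain

def id3(samples, attributes, target):
    labels = [s[target] for s in samples]
    # Stop when the node is pure or no attributes remain; return a leaf label.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Choose the attribute with the greatest information gain and split on it.
    best = max(attributes, key=lambda a: info_gain(samples, a, target))
    rest = [a for a in attributes if a != best]
    return {best: {v: id3([s for s in samples if s[best] == v], rest, target)
                   for v in {s[best] for s in samples}}}
```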

2. Analysis and representation of dynamic fuzzy problems using decision trees

The ID3 decision tree algorithm eventually establishes the final decision tree, from which we identify information on the relevant attributes of the sample set. Tab. 6.6 presents the sample set, including the attributes “Rushhour”, “Weather”, “Trafficjam”, and “Annlate”. Taking “Weather” as an example, its range is {good, normal, bad}. These three values are not crisp: there is no exact standard to measure what is good, normal, or bad; that is, the attribute value of “Weather” is fuzzy. On the other hand, the weather is constantly changing and developing, which indicates that it has a dynamic and fuzzy character. It can be seen that the decision tree learning system, which is based on operations over such attributes, is a system with dynamic fuzziness.

Fig. 6.4: Decision tree trained by ID3 algorithm.

Knowledge representation of dynamic fuzzy problems in decision trees is as follows.

(1) For attribute xi, use a value of attribute xi to replace xi. This is represented as (xi, xi), where the attribute has dynamic fuzziness and is a dynamic fuzzy number. In Tab. 6.6, the attribute “Rushhour” is replaced by “Rushhour = true”, and its range is (0.7, 0.3), which is the attribute value after being transformed using the dynamic fuzzy number (α, α) to represent the degree to which Rushhour is true or false.

(2) For samples <x, c(x)>, where x = <x1, x2, ..., xi>, x represents a training sample and xi represents the attributes contained in the sample. According to (1), each attribute has dynamic fuzziness, and the dynamic fuzziness of a sample is reflected by its attribute values. Thus, we replace x by (x, x) and use the dynamic fuzzy number (α, α) to represent c(x), that is, c(x, x) = (α, α).

(3) Every hypothesis h in the hypothesis space H is represented by a dynamic fuzzy Boolean quantity (α, α), because h ∈ H represents a function on X: h: X → [0, 1] × [←, →]. Now, h(x, x) = (α, α) ∈ [(0, 0), (1, 1)].

(4) The sample set U contains samples whose dynamic fuzziness is represented by their attribute values. Thus, replace U with the dynamic fuzzy set (U, U), representing the range of the sample set in the dynamic fuzzy space.

Using the above knowledge representation, sample D9 in Tab. 6.6 is represented as shown in Tab. 6.7.
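As a rough illustration of points (1) to (4), a dynamically fuzzified sample can be held as a mapping from attribute tests to dynamic fuzzy numbers, with the label c(x, x) itself a DF number. Only the (0.7, 0.3) value for “Rushhour = true” comes from the text; every other value below is a hypothetical placeholder, not the content of Tab. 6.7.

```python
# Sketch: a dynamically fuzzified sample. Each attribute test maps to a DF
# number (a pair of membership degrees) giving the degree to which it holds.
sample_D9 = {
    "attributes": {
        "Rushhour = true":   (0.7, 0.3),   # value quoted in the text
        "Weather = good":    (0.6, 0.4),   # hypothetical
        "Trafficjam = true": (0.8, 0.2),   # hypothetical
    },
    "label": (0.9, 0.1),                   # c(x, x) as a DF number (hypothetical)
}
```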

After representing the sample with a dynamic fuzzy set and DFL, the decision tree constructed by the samples has also changed. After dynamic fuzzification, Fig. 6.4 becomes Fig. 6.5.

The tree produced by the samples based on DFS and DFL is called a dynamic fuzzy tree (DFT).

Definition 6.12 Assume hi = <(α1, α1), (α2, α2), ..., (αm, αm)>, hj = <(β1, β1), (β2, β2), ..., (βm, βm)>, and hi, hj are functions defined on (X, X). Then hi is equal to or more general than hj if and only if

$$\forall (x, x) \in (X, X),\; \big[(h_i(x, x) = (\alpha, \alpha)) \wedge (h_j(x, x) = (\beta, \beta))\big] \Rightarrow (\alpha, \alpha) \geq (\beta, \beta).$$

We write this as hi ≥g hj. Here, (x, x) represents a sample. We write hi >g hj when hi is strictly more general than hj.

The relation ≥g is a kind of partial ordering over the sample set (U, U) (it is reflexive, anti-symmetric, and transitive), which provides a useful structure for the learning problem. Under ≥g, the more general hypothesis hi imposes fewer constraints on instances, so any sample placed into a category by hj is also placed into the same category by hi. This is because hj contains more constraints than hi, and hence hi is more general than hj.
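Viewing each hypothesis as the vector of DF values it assigns to the samples, as in Definition 6.12, the ≥g test becomes a componentwise comparison. The tuple encoding of DF numbers and the componentwise ordering on DF pairs used below are illustrative assumptions.

```python
# Sketch: h_i and h_j as lists of DF values, one per sample of (U, U).
# h_i >=_g h_j if its DF value dominates h_j's on every sample
# (componentwise dominance is an assumed ordering on DF pairs).
def more_general_or_equal(h_i, h_j):
    return all(a1 >= b1 and a2 >= b2
               for (a1, a2), (b1, b2) in zip(h_i, h_j))

h_i = [(0.9, 0.8), (0.7, 0.6), (0.5, 0.5)]   # hypothetical DF values
h_j = [(0.8, 0.7), (0.7, 0.5), (0.4, 0.4)]
print(more_general_or_equal(h_i, h_j))        # True
print(more_general_or_equal(h_j, h_i))        # False
```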

In Fig. 6.5, the attributes “Weather = good”, “Rushhour = true”, and “Trafficjam = true” divide the training samples. The left side of the tree structure shows the set of samples at each node of the decision tree, where the relation between an upper set and a lower set is ≥g; for example, {D3, D5, D10, D11} ≥g {D3, D5} and {D3, D5, D10, D11} ≥g {D10, D11}. Therefore, the sets of samples in the decision tree are organized by a partial order relation. The right side shows the result of dividing the sample set at each layer. From the division of each layer, the classification attribute progressively divides the sample set.

Tab. 6.7: D9 knowledge representation after dynamic fuzzification.

Fig. 6.5: Decision tree after dynamic fuzzification.

3. Dynamic fuzzy tree HRL algorithm

In dynamic fuzzy tree HRL, let (ai, ai) represent an attribute with value range (V_{a_i}, V_{a_i}) = {(v1, v1), (v2, v2), ..., (vk, vk)}. Then |(V_{a_i}, V_{a_i})| = k is the number of attribute values of (ai, ai). A sample can be represented by

$$(I_i, I_i) = \left\langle (v_1^{a_1}, v_1^{a_1}), (v_2^{a_2}, v_2^{a_2}), \ldots, (v_m^{a_m}, v_m^{a_m}) \right\rangle,$$

where (v_i^{a_i}, v_i^{a_i}) represents the value of the i-th attribute (ai, ai), and the sample set can be represented as (U, U) = {(I_i, I_i) | i = 1, 2, ..., n}.

From the above, the set consisting of the samples in (U, U) in which attribute (ai, ai) takes the value (t, t) can be represented by

$$\mathrm{Par}^{(t, t)}_{(a_i, a_i)}(U, U) = \left\{ (S, S) \mid (S, S) \subseteq (U, U),\ \forall (I_j, I_j) \in (S, S),\ (V_{a_i}, V_{a_i})(I_j, I_j) = (t, t) \right\},$$

where (t, t) = (v_k^{a_i}, v_k^{a_i}) represents the k-th value of attribute (ai, ai) and k ≤ |(V_{a_i}, V_{a_i})|.

According to the different values of attribute (ai, ai), we obtain a division of (U, U), written as

$$\mathrm{Div}_{(a_i, a_i)}(U, U) = \left\{ (S_1, S_1), (S_2, S_2), \ldots, (S_n, S_n) \mid (S_j, S_j) = \mathrm{Par}^{(t_j, t_j)}_{(a_i, a_i)}(U, U),\ j = 1, 2, \ldots, n \right\},$$

where n ≤ |(V_{a_i}, V_{a_i})|.

We write the condition attributes as (A, A) = {(A1, A1), (A2, A2), ..., (Am, Am)}, and the decision attribute as (Ad, Ad). Because (Ad, Ad) relies on the condition attributes (Ai, Ai), the relationship between a condition attribute and the decision attribute satisfies one of the following three cases. For the division Par^{(V_j, V_j)}_{(A_d, A_d)}(U, U) obtained by dividing the sample set (U, U) according to the j-th value of the decision attribute (Ad, Ad), there is a value (Vk, Vk) of the condition attribute (Ai, Ai) such that one of

$$\mathrm{Par}^{(V_k, V_k)}_{(A_i, A_i)}(U, U) \subseteq \mathrm{Par}^{(V_j, V_j)}_{(A_d, A_d)}(U, U), \quad \mathrm{Par}^{(V_k, V_k)}_{(A_i, A_i)}(U, U) = \mathrm{Par}^{(V_j, V_j)}_{(A_d, A_d)}(U, U), \quad \text{or} \quad \mathrm{Par}^{(V_k, V_k)}_{(A_i, A_i)}(U, U) \supseteq \mathrm{Par}^{(V_j, V_j)}_{(A_d, A_d)}(U, U)$$

holds. Considering that “includes” and “is included in” are reciprocal relations, and “=” can be viewed as a special case of inclusion, we take the first case as an example. Par^{(V_k, V_k)}_{(A_i, A_i)}(U, U) ⊆ Par^{(V_j, V_j)}_{(A_d, A_d)}(U, U) represents the case in which the set of samples whose condition attribute (Ai, Ai) has the value (Vk, Vk) is contained in the set of samples whose decision attribute (Ad, Ad) has the value (Vj, Vj); that is, when the value of the condition attribute (Ai, Ai) is (Vk, Vk), the sample is determined to belong to class (Vj, Vj). Because Par^{(V_k, V_k)}_{(A_i, A_i)}(U, U) ⊆ Par^{(V_j, V_j)}_{(A_d, A_d)}(U, U), it is known that attribute values of the condition attribute (Ai, Ai) other than (Vk, Vk) can also divide samples into class (Vj, Vj).
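The operators Par and Div, and the inclusion test between a condition-attribute partition and a decision-attribute partition, reduce to plain set operations when the samples are treated as crisp records. The dictionary encoding and the concrete attribute values below are simplifying assumptions for illustration.

```python
# Sketch: Par returns the indices of samples whose attribute takes a given
# value; Div groups the whole sample set by the values of one attribute.
def par(samples, attribute, value):
    return {i for i, s in enumerate(samples) if s[attribute] == value}

def div(samples, attribute):
    return {v: par(samples, attribute, v)
            for v in {s[attribute] for s in samples}}

# Hypothetical records: condition attribute "Trafficjam", decision "Annlate".
U = [{"Trafficjam": "true",  "Annlate": "yes"},
     {"Trafficjam": "true",  "Annlate": "yes"},
     {"Trafficjam": "false", "Annlate": "yes"},
     {"Trafficjam": "false", "Annlate": "no"}]

# First of the three cases: the condition-attribute partition for one value
# is contained in the decision-attribute partition for one class.
print(par(U, "Trafficjam", "true") <= par(U, "Annlate", "yes"))   # True
```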
