Dynamic fuzzy association rule generation algorithm (DFCL_Rule):
(1)The dynamic fuzzy formal background is preprocessed to generate the dynamic fuzzy processing background. The corresponding dynamic fuzzy concept lattice is generated according to the dynamic fuzzy processing background.
(2)A breadth-first traversal of the corresponding dynamic fuzzy concept lattice is carried out, and each dynamic fuzzy concept node is tested: a node whose dynamic fuzzy support satisfies the prior support threshold is a dynamic fuzzy frequent node.
(3)All pairs of dynamic fuzzy nodes in the dynamic fuzzy concept lattice are enumerated; a pair of frequent nodes that stand in a parent-child relationship and whose dynamic fuzzy confidence satisfies the prior confidence threshold is called a dynamic fuzzy candidate pair.
(4)Dynamic fuzzy association rules are generated from the resulting dynamic fuzzy candidate pairs. Generating the rules with two different specified thresholds yields dynamic fuzzy classification rules.
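The four steps above can be sketched as follows, assuming a minimal lattice representation; the Node class, the threshold values, and the rule format are illustrative, not the book's exact data structures:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    extent: frozenset                      # object set of the concept
    intent: frozenset                      # attribute set of the concept
    children: list = field(default_factory=list)

def support(node, n_objects):
    """Fraction of all objects covered by the node's extent."""
    return len(node.extent) / n_objects

def frequent_nodes(root, n_objects, min_sup):
    """Step (2): breadth-first traversal keeping nodes with support >= min_sup."""
    out, queue, seen = [], [root], set()
    while queue:
        node = queue.pop(0)
        if id(node) in seen:
            continue
        seen.add(id(node))
        if support(node, n_objects) >= min_sup:
            out.append(node)
        queue.extend(node.children)
    return out

def rules(freq, min_conf):
    """Steps (3)-(4): for each frequent parent-child pair, emit a rule
    intent(parent) -> added intent of the child when
    confidence = |extent(child)| / |extent(parent)| >= min_conf."""
    freq_ids = {id(n) for n in freq}
    out = []
    for parent in freq:
        for child in parent.children:
            if id(child) in freq_ids and parent.extent:
                conf = len(child.extent) / len(parent.extent)
                if conf >= min_conf:
                    out.append((parent.intent, child.intent - parent.intent, conf))
    return out

# Tiny two-node lattice with illustrative thresholds.
child = Node(frozenset({1, 2}), frozenset({"a", "b"}))
root = Node(frozenset({1, 2, 3, 4}), frozenset({"a"}), [child])
found = frequent_nodes(root, 4, min_sup=0.4)
extracted = rules(found, min_conf=0.4)
```

Here the rule ({a} -> {b}, 0.5) reads: objects having attribute a also have attribute b with confidence 0.5.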
The dynamic fuzzy processing background is obtained by preprocessing the DFD sets. Based on this, the DF concept lattice is generated by DFCL_BorderLayer. Then, the dynamic fuzzy concept rules are generated. Finally, the two algorithms in this section are compared with some common algorithms.
For example, suppose that the dynamic fuzzy background after feature extraction is as presented in Tab. 4.2. According to the DFCL_BorderLayer algorithm, the dynamic fuzzy processing background is given in Tab. 4.3.
First, we calculate the threshold of the dynamic fuzzy attributes according to Tab. 4.2 and obtain the dynamic fuzzy processing background (Tab. 4.3) after reduction, which improves the precision of constructing the dynamic fuzzy concept lattice. Second, the dynamic fuzzy concept lattice is constructed according to the algorithm, as shown in Fig. 4.2.
For example, assume that a support threshold and a confidence threshold are given. Then, the frequent nodes in the dynamic fuzzy concept lattice are ({12345}, {d1d2d7}), ({2345}, {d1d2d3d4d6}), ({78}, {d1d2d4d6}), and ({8}, {d1d2d3d4d6}). Computing the dynamic fuzzy support degree and the dynamic fuzzy confidence degree of each parent-child pair of frequent nodes yields the dynamic fuzzy candidate pairs, from which the dynamic fuzzy association rules can be extracted; the resulting rules are listed in Tab. 4.4.
The following experiments compare the run times of the DFCL_BorderLayer and DFCL_Layer algorithms with those of several algorithms with good performance, including Godin’s algorithm, Bordat’s algorithm, and Norris’ algorithm. The test platform used an Intel 2.79 GHz processor with 512 MB of memory. The experiments used randomly generated data with a total of 100 attributes.
It can be seen from Fig. 4.3 that DFCL_Layer and DFCL_BorderLayer maintain their performance advantage as the data size increases. This verifies the validity of the proposed method.
Combining the knowledge of the DFDT, this section discusses a conceptual learning model based on DFDT. First, we introduce the DF concept tree and the generating strategy. Then, we analyse the generation process of the DF concept tree in detail before discussing the DF concept rule extraction and matching algorithm.
Definition 4.32 The DF concept tree is a tree structure in which instances are classified by being sorted from the root node down to a leaf node. Each internal node tests a particular connotation element of the concept to which an instance belongs, each branch descending from the node corresponds to one value of that element, and each leaf node corresponds to a specific concept category.
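Definition 4.32 can be rendered as a small data structure; the 'shape' element, its values, and the category names below are hypothetical, chosen only to show how an instance is sorted from the root to a leaf:

```python
class ConceptNode:
    """Internal nodes test one connotation element; leaves name a category."""
    def __init__(self, attribute=None, category=None):
        self.attribute = attribute   # element tested here (None at leaves)
        self.category = category     # concept category (None at internal nodes)
        self.branches = {}           # element value -> child ConceptNode

def classify(node, instance):
    """Sort an instance from the root down to a leaf and return its category."""
    while node.category is None:
        node = node.branches[instance[node.attribute]]
    return node.category

# One-level illustrative tree over a hypothetical 'shape' element.
root = ConceptNode(attribute="shape")
root.branches = {"round": ConceptNode(category="ball"),
                 "flat": ConceptNode(category="disc")}
label = classify(root, {"shape": "round"})
```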
Different from the traditional step-by-step search space method, the dynamic fuzzy concept tree is a decision tree method for dealing with DF data. The basic idea is to use a DF rule matching algorithm to make decisions. Because the learning strategy requires the input values to be discrete, the datasets should be subjected to dimension reduction and discretization, which concerns the dynamic fuzzy domain and the processing of dynamic fuzzy attributes. We then consider how to choose test attributes and select a processing method for special attributes. Finally, we introduce the dynamic fuzzy concept tree construction algorithm, which incorporates dynamic fuzzy rule extraction and dynamic fuzzy concept tree pruning to improve the precision of the concept tree.
1) Information gain standard
The key problem in constructing a DF concept tree is how to select the best classification attributes. The choice is usually based on “information gain”, which measures how well a given attribute separates the training samples with respect to the target concept classes.
Given the DF target concept training set S, the entropy in the classical Boolean case is defined as
Entropy(S) = -p⊕ log2 p⊕ - p⊖ log2 p⊖,
where p⊕ and p⊖ are the proportions of positive and negative samples in S. For c classes, this can be written in the following form:
Entropy(S) = Σi=1..c (-pi log2 pi),
where pi represents the ratio of the number of samples labelled as i in sample set S to the total number of samples in S, and c is the total number of label classes. The information gain of an attribute A relative to S is then
Gain(S, A) = Entropy(S) - Σv∈Values(A) (|Sv|/|S|) Entropy(Sv),
where Values(A) is the set of possible values of attribute A and Sv is the subset of S for which attribute A takes the value v.
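The entropy and information gain formulas can be sketched directly in Python; the sample set is represented as a list of label strings plus a list of attribute dictionaries, and the names are illustrative:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = sum over the label classes of -p_i * log2(p_i)."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    n = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    remainder = sum(len(sub) / n * entropy(sub) for sub in subsets.values())
    return entropy(labels) - remainder

# Attribute "a" splits the four samples perfectly, so it recovers the full bit.
labels = ["+", "+", "-", "-"]
rows = [{"a": 0}, {"a": 0}, {"a": 1}, {"a": 1}]
h = entropy(labels)
g = information_gain(rows, labels, "a")
```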
Because the dynamic fuzzy attribute is a nonclassical data value, the relevant parameters in the definition of information entropy should be adjusted. We consider two options for this, either of which can be adopted.
The first scheme sets a dynamic fuzzy threshold on the DF membership degrees of the samples of the DF concept: p⊕ is the proportion of positive cases in S, i.e. samples whose DF membership degree is greater than the threshold, and p⊖ is the proportion of counter-cases in S, i.e. samples whose DF membership degree is smaller than the threshold. Everything else is unchanged.
The second scheme takes the training sample set S that generates the DF concept and sums the DF memberships of all samples under each description, repeating this for the memberships of the positive and negative examples under the DF concept. Then, p⊕ is the ratio of the summed DF membership degrees of the positive cases in S to the sum over both classes, and p⊖ is the corresponding ratio for the counter-cases. Everything else is unchanged.
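The first scheme can be sketched as follows, treating each DF membership degree as a single number; the function name and the cut value 0.5 are purely illustrative:

```python
def df_class_ratios(memberships, threshold):
    """p_plus: fraction of samples whose DF membership meets the threshold
    (positive cases); p_minus: fraction below it (counter-cases)."""
    n = len(memberships)
    pos = sum(1 for mu in memberships if mu >= threshold)
    return pos / n, (n - pos) / n

p_plus, p_minus = df_class_ratios([0.9, 0.8, 0.3, 0.2], threshold=0.5)
```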
The higher the information gain of the DF attribute, the better the partitioning ability. Therefore, the information gain is often applied to the candidate attribute set to select the best classification attributes.
2) Gini index standard
The Gini index standard is based on a distance measurement, where the distance refers to the distance between the class ratio distributions. Assuming there are m DF concept classes in the training sample set S, Gini(S) is defined as follows:
Gini(S) = 1 - Σi=1..m pi²,
where pi is the probability of the ith class appearing in the training sample set. Assuming that S is divided into two subsets S1 and S2, the Gini index of the split is
Gini_split(S) = (|S1|/|S|) Gini(S1) + (|S2|/|S|) Gini(S2).
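A minimal sketch of both Gini formulas, with class probabilities estimated from label frequencies:

```python
from collections import Counter

def gini(labels):
    """Gini(S) = 1 - sum over the m classes of p_i^2."""
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())

def gini_split(s1, s2):
    """Weighted Gini of a binary split S = S1 union S2."""
    n = len(s1) + len(s2)
    return len(s1) / n * gini(s1) + len(s2) / n * gini(s2)

g = gini(["a", "a", "b", "b"])           # mixed set: high impurity
gs = gini_split(["a", "a"], ["b", "b"])  # pure split: zero impurity
```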
The smaller the Gini index of the DF attribute, the better its ability to partition the sample set. The advantage of this method is that the Gini index favours attributes that contain a lot of useful information; the disadvantage is that the attribute selection becomes worse as the number of concept categories increases.
3) Relief standard
Kira and Rendell proposed the Relief standard in 1992. The basic idea of the algorithm is to randomly extract an instance and measure the relevance of each attribute by the distance between the instance and the attribute values of its neighbouring instances. The algorithm maintains an evaluation weight W[A] for each attribute of the sample set S, initialized to 0. We randomly select m instances from S. For each selected instance R, we find its nearest neighbour H in the same category (the near-hit) and its nearest neighbour M in the closest different category (the near-miss) and check each attribute value: if the near-miss differs from R on the attribute, the weight is increased; if the near-hit differs, the weight is decreased. The update formula is as follows:
W[A] = W[A] + diff(A, R, M)/m - diff(A, R, H)/m.
When the attribute is a classification (nominal) type, the value of diff is 0 if the two attribute values are equal and 1 otherwise. When the attribute is numeric, a small positive dynamic fuzzy threshold divides its values into discrete intervals before comparison. The larger the value of W[A], the better the attribute separates the classes, and we select the attributes that give the best classification.
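A minimal Relief-style sketch under simplifying assumptions (nominal attributes only, diff in {0, 1}, hits and misses searched exhaustively); the data and attribute names are illustrative:

```python
import random

def diff(attr, x, y):
    """0 when the nominal values agree, 1 otherwise."""
    return 0.0 if x[attr] == y[attr] else 1.0

def nearest(inst, pool, attrs):
    """Neighbour in `pool` minimizing the summed per-attribute differences."""
    return min(pool, key=lambda other: sum(diff(a, inst, other) for a in attrs))

def relief(data, labels, attrs, m, seed=0):
    """W[A] += diff(A, R, near-miss)/m - diff(A, R, near-hit)/m
    over m randomly sampled instances R."""
    rng = random.Random(seed)
    weights = {a: 0.0 for a in attrs}
    for _ in range(m):
        i = rng.randrange(len(data))
        inst, y = data[i], labels[i]
        hits = [d for d, l in zip(data, labels) if l == y and d is not inst]
        misses = [d for d, l in zip(data, labels) if l != y]
        hit, miss = nearest(inst, hits, attrs), nearest(inst, misses, attrs)
        for a in attrs:
            weights[a] += (diff(a, inst, miss) - diff(a, inst, hit)) / m
    return weights

# "x" determines the class; "z" is noise, so its weight should come out lower.
data = [{"x": 0, "z": 0}, {"x": 0, "z": 1}, {"x": 1, "z": 0}, {"x": 1, "z": 1}]
labels = [0, 0, 1, 1]
w = relief(data, labels, ["x", "z"], m=8)
```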
Each of the three attribute selection strategies has its own characteristics. If the properties of the sample set S are very different, then the information gain or Gini criterion should be used. If there are dependencies among the attributes of the sample set S, the Relief criterion is the most appropriate.
The dynamic fuzzy concept tree generation algorithm selects the value of the best classification property to expand the branch. In general, there are several values for each branch. If the value of the property is continuous, then the input data should be discretized. Namely, we need to divide the numerical values in the data set into several sub-intervals. This process is called the discretization of numerical data [19]. According to differences in the classification information, discretization is divided into unsupervised discretization and supervised discretization. Unsupervised discretization may destroy the completeness of the information and the usefulness of the data in the learning process. Supervised discretization takes the class attribute information into account and has a high accuracy rate in the learning process. Commonly used discretization strategies include the split strategy, adaptive strategy, equal frequency interval strategy, and class information entropy division strategy [20].
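As an example of one of the listed strategies, equal-frequency interval discretization can be sketched as follows; the helper names are hypothetical:

```python
def equal_frequency_cuts(values, k):
    """Return k-1 interior cut points putting roughly n/k values in each bin."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]

def discretize(x, cuts):
    """Map a numeric value to its bin index under the cut points."""
    return sum(x >= c for c in cuts)

# Twelve evenly spread values, cut into three bins of four values each.
cuts = equal_frequency_cuts(list(range(12)), k=3)
bins = [discretize(v, cuts) for v in (3, 5, 9)]
```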
Suppose the condition attribute set of the instance set is C = {a1, a2, ..., ak}. The value range of an attribute a is Va, and a discretization scheme for the single numeric attribute a is a partition Da of Va into disjoint intervals.
Let U denote the instance set after attribute filtering, and let a be any condition attribute of U that is of numeric type. Discretizing all such attributes of U is a multiple-attribute discretization.
Suppose that D1 and D2 are two discretization schemes of U, where D1a is the discretization scheme of attribute a under D1 and D2a is the discretization scheme of attribute a under D2.
As each D1a and D2a is a discretization scheme for attribute a, the partial order relation on discretizations is defined as D1 ⪯ D2 if and only if, for each attribute a in the instance set, every interval of the partition D1a is contained in some interval of D2a; this defines the degree of refinement between discretization schemes as a partial order relationship.
Values of attribute a that belong to the same interval of D1a must then belong to the same interval of D2a, and two or more adjacent intervals of D1a are merged into one interval of D2a. Because D1 ⪯ D2, D1 is said to be a refinement of D2.
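The refinement relation between two single-attribute discretization schemes can be checked directly, representing each partition as a list of (low, high) interval pairs; a sketch under that assumed representation:

```python
def refines(fine, coarse):
    """True iff every interval of `fine` lies inside some interval of `coarse`,
    i.e. the first scheme is a refinement of the second."""
    return all(
        any(lo >= c_lo and hi <= c_hi for (c_lo, c_hi) in coarse)
        for (lo, hi) in fine
    )

# Splitting [0, 2) into [0, 1) and [1, 2) refines the coarser two-interval scheme.
ok = refines([(0, 1), (1, 2), (2, 4)], [(0, 2), (2, 4)])
not_ok = refines([(0, 2), (2, 4)], [(0, 1), (1, 2), (2, 4)])
```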