6 Dynamic fuzzy hierarchical relationships

This chapter is divided into seven sections. Section 6.1 gives a brief introduction to the chapter. Section 6.2 presents inductive logic programming (ILP). Section 6.3 introduces dynamic fuzzy hierarchical relational learning (HRL). Section 6.4 presents dynamic fuzzy tree hierarchical relation learning. Section 6.5 covers dynamic fuzzy graph hierarchical relationship learning. Section 6.6 gives a sample application and its analysis. Section 6.7 summarizes the chapter.

6.1 Introduction

6.1.1 Research progress of relationship learning

Relationship learning addresses the limitations of traditional symbolic machine learning by combining machine learning with knowledge representation. It uses logic as the representation language for data and for generalizations, and studies learning problems involving multiple entities and the relationships among them. Logic has a strong ability to describe complex relationships between objects, which is why ILP is the main means of mining relational patterns [1]. The goal is to obtain specific information from relational data and then use that knowledge for inference, prediction, and classification. Relationship learning has been widely applied in computational biology, chemistry, and other areas of data mining [2].

According to the techniques used, relationship learning can be divided into statistical relational learning (SRL), hierarchical relational learning (HRL), graph-based relational learning (GBRL), and dynamic fuzzy relational learning (DFRL). The main lines of research are as follows.

6.1.1.1 SRL

SRL combines probabilistic relational representations with machine learning and can thus better capture the relationships in complex real-world data. Owing to the strengths of statistical learning theory, SRL has attracted much attention in recent years.

In 1995, Sato proposed the PRISM program [3], a symbolic statistical modelling language that is both a probabilistic extension of logic programs and a system that learns from examples using the EM algorithm. Muggleton proposed stochastic logic programs (SLP) [4], which directly lift stochastic context-free grammars by attaching a probability value to each clause. Kramer proposed structural regression trees (SRTs) [5], combining regression-based statistical methods with ILP. In 1998, Blockeel and De Raedt proposed the top-down induction of first-order logical decision trees (FOLDT) [6]; its expressive power is significantly enhanced, supporting predicate invention and both existentially and universally quantified classifier combinations, thereby addressing the limitation that ordinary decision trees cannot take the structure of the mined objects into account. In 1999, Knobbe, Siebes, and Wallen proposed the MRDTI framework, which efficiently finds multi-relational decision trees by encoding domain knowledge into the mining process [7].

In 1997, Ngo and Haddawy proposed an SRL method based on Bayesian networks [8], combining logic with Bayesian networks and extending them by defining probabilities over first-order formulas and interpretations. On this basis, Kersting and De Raedt proposed Bayesian logic programs (BLP) [9] in 2000; by establishing a one-to-one mapping between ground atoms and random variables, the model combines Bayesian networks with clausal logic, handles continuous-valued variables, and uses combining rules to deal with multiple instantiated clauses sharing the same head. In the same year, Muggleton proposed a two-stage algorithm [10] that maximizes the Bayesian posterior probability to learn both the parameters and the logic program of an SLP. In 2001, Cussens proposed the FAM algorithm [11], which estimates the parameters of an SLP given the underlying logic program. In 2002, Muggleton proposed a method [12] that, assuming clauses are constructed independently, uses statistical techniques to efficiently compute the best probability parameters of a single clause and to learn both the parameters and the structure of an SLP. In 2003, Neville et al. proposed a relational probability tree learning algorithm [13] that handles heterogeneous and interdependent data within a relational framework. In the same year, Bernstein, Clearwater, and Provost proposed a relational vector-space model [14], which abstracts the link structure and represents entities by weight vectors, and is mainly used to solve classification problems on linked data. In 2004, Angelopoulos and Cussens gave an implementation of MCMC over SLPs [15], providing a general tool for Bayesian learning.

To represent the relational characteristics of real-world data more flexibly, Taskar et al. proposed relational Markov networks (RMN) [16] in 2002, and Richardson and Domingos proposed Markov logic networks (MLN) [17] in 2004. Anderson, Domingos, and Weld introduced relations (logic) into the Markov model and proposed relational Markov models (RMM) [18]. RMM allows state variables to take values of several types, with states of the same type represented by a predicate or relation, overcoming the restriction of the Markov model to a single kind of state variable; it is a probabilistic relational representation combining probability with the predicate calculus. In 2003, Kersting et al. introduced first-order logic into hidden Markov models and put forward logical hidden Markov models (LOHMM) [19]. This model differs from the ordinary Markov model in having two kinds of variables, states and observations, and its parameters include not only the transition probabilities but also the probability of each observation value in each state.

In 2005, Davis et al. [20] and Landwehr et al. [21] combined Bayesian networks with ILP technology to develop the SAYU and nFOIL systems. Sato and Kameya proposed a new learning scheme based on Cussens's FAM algorithm and program-transformation technology; in this scheme, constraints can be added to a PRISM program to improve learning efficiency [22]. In 2006, Izumi et al. proposed a parallel EM algorithm [23] that reduces computing time and extends the available memory by learning on multiple computers. Xu et al. proposed the infinite hidden relational model [24], which extends relational models by introducing an infinite-dimensional hidden variable for each entity; they studied inference algorithms for the model and discussed how propagating information through the hidden variables can substantially reduce the need for structure learning. Singla and Domingos proposed an entity-resolution method [25] based on Markov logic, which combines existing methods with Markov logic so as to solve learning and reasoning problems effectively. Kersting et al. improved the basic HMM inference algorithms, giving transition-probability estimates and choosing appropriate distributions to realize inference for LOHMM [26], and verified the effectiveness of the algorithm in the bioinformatics domain. Chen and Muggleton compared BLP and SLP [27], analysed the differences and connections between their semantics in terms of how each defines probability, and discussed conversions between the two methods and their learnability.

In 2007, Singla and Domingos used the Gibbs framework to extend Markov logic to infinite domains [28]. To address the problem that MLNs can only deal with discrete variables and attributes, Wang and Domingos proposed hybrid Markov logic networks [29], which support continuous attributes and functions, in 2008. In 2009, Petr Buryan et al. [30] put forward a grammar-based concept representation that can be used in stochastic search for relationship learning, ensuring that the search operations produce well-formed concepts. In 2011, Man Zhu used SRL methods to build learning algorithms for the task of learning OWL axioms from noisy linked data [31].

SRL methods are widely used, but several issues still require further research: comparative studies of the advantages and disadvantages of the different methods, unified evaluation standards, and the relationships among them, leading towards a unified framework; the ability to handle practical metadata and self-describing data and to learn from a variety of sources; broader applications, since current applications are still limited, and combination with other systems or technologies to improve the intelligence of the whole system; and attention to the characteristics of relational data, in particular how to exploit data associations to improve the efficiency of data aggregation, learning, and reasoning.

6.1.1.2 HRL

A deep architecture [32] is a learning model containing multiple levels of nonlinear operations, such as a neural network with multiple hidden layers. Neural networks with only one hidden layer, as well as SVMs and other kernel methods, are classified as shallow structures. Training deep structures is a difficult problem that remained unsolved for many years.

In 2006, Hinton et al. [33] made a breakthrough in deep structure training. They proposed a training algorithm for deep belief networks (DBNs). The core idea is that the DBN performs layer-wise unsupervised learning to adjust the weights of each layer, so that each component being trained is essentially a restricted Boltzmann machine trained without supervision. Starting from the lowest layer, the training samples are used as input to learn that layer's weights; the outputs of its nodes then serve as the input for training the next layer, and so on until the highest layer has been trained. After this pre-training, the weights of each layer are used as the initial weights of the corresponding structure in a deep neural network, and the BP algorithm is used for supervised training. The result is a deep neural network that can be used for classification. The deep neural network obtained by this layer-wise unsupervised pre-training achieved the highest recognition rate on the handwritten-digit recognition problem. In 2007, Bengio et al. [34] analysed the results of DBN layer-wise unsupervised pre-training and found that the pre-trained weights tend to lie near a good local optimum, providing a better starting position than random initial weights; hence, the gradient-descent-based BP algorithm then optimizes effectively. In addition, Bengio et al. proposed another type of deep structure, the stacked auto-encoder. This superposition of multiple auto-encoders uses the same layer-wise unsupervised learning to train the sequence of auto-encoders. Each auto-encoder's training goal is to minimize the reconstruction error between the original input data and the sample after encoding and decoding, adjusting the weights by gradient-descent optimization so that an internal representation of the samples is obtained through unsupervised learning.
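As a rough illustration of the layer-wise procedure just described, the sketch below pretrains a small stack of restricted Boltzmann machines with one-step contrastive divergence and passes each layer's hidden activations upward; the resulting weights would then initialize a deep network for supervised BP fine-tuning. The SimpleRBM class, its hyperparameters, and the toy data are simplifying assumptions made for illustration, not Hinton et al.'s implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SimpleRBM:
    """Minimal binary RBM trained with one-step contrastive divergence (CD-1)."""
    def __init__(self, n_visible, n_hidden, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible bias
        self.b_h = np.zeros(n_hidden)    # hidden bias
        self.lr = lr
        self.rng = rng

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def train_epoch(self, data):
        for v0 in data:
            # positive phase
            h0 = self.hidden_probs(v0)
            h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
            # negative phase: one reconstruction step
            v1 = sigmoid(h0_sample @ self.W.T + self.b_v)
            h1 = self.hidden_probs(v1)
            # CD-1 update
            self.W += self.lr * (np.outer(v0, h0) - np.outer(v1, h1))
            self.b_v += self.lr * (v0 - v1)
            self.b_h += self.lr * (h0 - h1)

def pretrain_dbn(data, layer_sizes, epochs=5):
    """Greedy layer-wise unsupervised pre-training: train one RBM per layer,
    then feed its hidden activations upward as the input of the next RBM."""
    rbms, x = [], data
    for n_hidden in layer_sizes:
        rbm = SimpleRBM(x.shape[1], n_hidden)
        for _ in range(epochs):
            rbm.train_epoch(x)
        x = rbm.hidden_probs(x)          # representation passed to the next layer
        rbms.append(rbm)
    return [r.W for r in rbms]           # initial weights for supervised BP

if __name__ == "__main__":
    # toy usage: 200 binary samples with 20 features, a 20-10-5 stack
    toy = (np.random.default_rng(1).random((200, 20)) > 0.5).astype(float)
    weights = pretrain_dbn(toy, layer_sizes=[10, 5])
    print([w.shape for w in weights])
```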

In 2009, Bengio's research team [35, 36] analysed why DBNs and stacked auto-encoders are successful and examined the effect of unsupervised layer-wise pre-training, concluding that its main role is to provide good initial values for the lower-layer weights being optimized. Ranzato et al. proposed a sparse auto-encoder [37] that forces the auto-encoder to produce a sparse internal representation. This is more conducive to extracting abstract features and has achieved good results. Other deep structures include the deep Boltzmann machine [38] proposed by Hinton et al., an embedded-encoding deep structure [39], and a special-purpose deep structure [40]. Lee et al. combined convolutional neural networks with DBNs to give the convolutional deep belief network [41]. This makes the extracted features shift-invariant and has achieved very good results. These deep structures also learn via layer-wise unsupervised pre-training. Ming et al. [42] proposed a relational learning method based on hierarchical association rules in the Web log domain and used it to discover potential user access behaviour from Web usage records. Their method uses the part-whole relation between composite and atomic time in the log ontology to determine the transaction size, extends the association rules to the time hierarchy, finds candidate frequent user rules, and extracts the potential domain relations after pruning redundant and invalid rules.

In the short span of four or five years, deep structure learning has risen rapidly and been successfully applied to many problems, such as recognition, regression, dimensionality reduction, texture modelling, motion modelling, information retrieval, robotics, and natural language processing.

At present, the study of deep structure learning is still in its initial stages, but more and more researchers are paying attention to its potential. International research teams include those led by Hinton at the University of Toronto, LeCun at New York University, Bengio at the University of Montreal, and other groups. Deep learning was one of the hot topics at the 28th International Conference on Machine Learning, held in Bellevue, Washington, in 2011.

6.1.1.3 GBRL

GBRL, also known as graph-based multi-relational data mining, differs in style from the logic-based relational learning of ILP. Using graph-mining techniques such as frequent sub-graph discovery, GBRL represents its input and output pattern languages in the form of graphs; that is, examples, background knowledge, hypotheses, and target concepts are all expressed as graphs. On the basis of frequent patterns, the main goal of GBRL is to find novel, useful, and understandable patterns.

This kind of algorithm has not been widely studied; to date, GBRL is mainly represented by Subdue [43] and graph-based induction (GBI) [44], both of which use a greedy search to find the sub-graphs carrying the most information. Both algorithms extract common sub-graphs from the graph through an incomplete search.

Subdue uses the minimum description length (MDL) principle to assess how interesting a substructure is. The number of substructures generated by the algorithm is determined by the optimal compression of the dataset, and the substructures provide useful and important knowledge about the domain. Subdue can be used for both unsupervised and supervised learning [43].

Subdue uses a beam search in which each node is initially labelled as a substructure, and candidates are generated by adding an edge and/or a node. When all the substructures have been considered (or more than some pre-specified number), the MDL principle is used to compress the graph.

GBI uses a similar information-gain metric to evaluate sub-graphs with three nodes, identifying frequent triples, some of which are compressed according to a pre-set ratio [44]. GBI uses the size of the empirical graph to define the size of the compressed graph and of the extracted patterns. GBI can handle directed or undirected labelled graphs with closed paths (cycles).
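To make the compression idea concrete, here is a much-simplified sketch in the spirit of Subdue's MDL-based evaluation; it is not the actual Subdue or GBI system, and the toy graph, the restriction to single-edge candidate patterns, and the crude description-length measure are assumptions made for illustration. Each single-edge substructure is scored by how much collapsing its instances would shrink the graph's description.

```python
from collections import Counter

# A toy labelled graph: node labels plus undirected labelled edges.
nodes = {1: "A", 2: "B", 3: "A", 4: "B", 5: "A", 6: "B"}
edges = [(1, 2, "x"), (3, 4, "x"), (5, 6, "x"), (2, 3, "y")]

def pattern_of(edge):
    """Abstract an edge into a label-level pattern (nodeLabel, edgeLabel, nodeLabel)."""
    u, v, lbl = edge
    a, b = sorted([nodes[u], nodes[v]])
    return (a, lbl, b)

def description_length(n_nodes, n_edges):
    """Crude stand-in for an MDL encoding cost: one unit per node and per edge."""
    return n_nodes + n_edges

def compression_score(pattern):
    """How much the graph shrinks if every instance of `pattern` is collapsed
    into a single node (the guiding idea of Subdue, greatly simplified)."""
    instances = [e for e in edges if pattern_of(e) == pattern]
    if not instances:
        return 0.0
    covered_nodes = {u for u, v, _ in instances} | {v for u, v, _ in instances}
    original = description_length(len(nodes), len(edges))
    compressed = description_length(
        len(nodes) - len(covered_nodes) + len(instances),  # each instance -> 1 node
        len(edges) - len(instances),
    )
    compressed += description_length(2, 1)  # cost of describing the pattern once
    return original - compressed

if __name__ == "__main__":
    candidates = Counter(pattern_of(e) for e in edges)
    best = max(candidates, key=compression_score)
    print("best substructure:", best, "score:", compression_score(best))
```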

6.1.1.4 DFRL

The basic concept of dynamic fuzzy sets was first described by Wang. In 1965, Zadeh proposed the concept of fuzzy sets and later developed fuzzy logic theory. On this basis, Professor Li has carried out thorough research from a broader perspective, producing a series of results on dynamic fuzzy sets and DFL. Landmark publications include "Dynamic fuzzy sets and its application" [46] in March 1997, "Dynamic fuzzy logic and its application" [47] in December 1997, "Dynamic fuzzy logic theory" [48] in July 2005, and "Dynamic fuzzy logic and its applications" [49] in May 2008. The latter book was the first international book on DFL to be published in English and has been applied in the United States, Britain, France, Japan, and Canada.

In 2002, Professor Li established a dynamic fuzzy relational data model [50], gave calculation methods for DF data, and provided the theoretical basis for processing DFD. In 2007, Zhang studied the DF relationships that are difficult to determine in DF reasoning rules and presented a DFRL algorithm [51] to solve the relationship-learning problem in DF systems. The algorithm can be used to generate or modify the dynamic fuzzy rules of a dynamic fuzzy machine learning system; it considers the effect of the first n learning steps on step n + 1 and uses the time complexity to ensure the credibility of the results. In addition, the DFRL algorithm also solves the problem of noise interference in the observation data.

6.1.2 Questions proposed

Most classical learning methods (such as decision trees, naive Bayes, and SVMs) are attribute-value learning methods. In such methods, each sample is represented as an attribute-value tuple, and the entire dataset can be viewed as a single table or relation in a relational database. To organize and access data effectively, however, relational databases are organized as multiple relations. Attribute-value learning is limited in that a single table and a single relation make it hard to find the more complex models hidden in real-world data [1]. Machine learning and data mining therefore need to consider relational representations of learning tasks and the corresponding search mechanisms; that is, learning must be designed for the complex structure of multi-relational datasets. In the field of machine learning, this kind of problem and its solutions are called relationship learning. Relationship learning can give better solutions for complex real-world relational data, but it has difficulty dealing with uncertain information. Dynamic fuzzy systems have the ability to deal with the uncertainty of complex data [48, 49].

Most current learning algorithms impose various constraints on the data, such as samples being independent and identically distributed, and therefore focus on processing flat data. Real-world data, besides structured data, contain much semi-structured and unstructured data. To handle the deep relationships in structured data, we must solve the problem of processing the multiple relationships between data items. Through a deep structure, complex data can be related to the relevant categories and arranged and processed gradually in a hierarchical structure.

Within the theoretical framework of dynamic fuzzy mathematics, classification and understanding are aided by imposing a hierarchical structure on complex data, discovering hidden patterns among the data, and learning useful relationships between feature values.

In summary, hierarchical relationships are very important in relationship learning, but the current research on this topic is insufficient. From the view of DFL, this chapter studies dynamic fuzzy hierarchical relationship learning.

6.1.3 Chapter structure

The first section analyses the progress of relationship learning research and outlines the existing problems and research focus of this chapter.

The second section gives the basic theoretical knowledge of ILP; the third section describes a dynamic fuzzy hierarchical relationship learning algorithm based on the basic theory of dynamic fuzzy matrices; and the fourth section discusses the dynamic fuzzy tree hierarchical relationship learning algorithm and uses examples to analyse the process of this mechanism. The fifth section gives a dynamic fuzzy graph HRL algorithm and presents representative examples. The sixth section focuses on applications and analysis, and the seventh section summarizes this chapter.

6.2 Inductive logic programming

For a given first-order language L, let the set V contain the variables of the language. From these we build the set T of terms, the atoms, and the general rules (formulas). A literal l is an atom a (a positive literal) or its negation ¬a (a negative literal). A substitution σ is a mapping from V to T and extends naturally to atoms. Let Subst denote the set of ground substitutions. A clause is a finite disjunction of literals, such as l1 ∨ ... ∨ ln. A Horn clause has at most one positive literal and can be written H ∨ ¬l1 ∨ ... ∨ ¬ln, i.e. l1 ∧ ... ∧ ln → H. Here, H is called the head (or consequent) of the Horn clause, and the conjunction of literals l1 ∧ ... ∧ ln is called the body (or antecedent). We now proceed with the concept of logical inference.

Definition 6.1 Given that X is an atomic formula, I, σ |= X represents σ(X) ∈ I. We extend this to a general rule F by semantic composition.

I |= F represents ∀σ, I, σ |= F (i.e. I is a model of F).

|= F represents ∀I, I |= F.

F |= G denotes that all models of F are models of G.

Problem statement: In the context of first-order logic, the goal is to find a set of rules H such that

$$B \wedge H \models E. \qquad (6.1)$$

Here B is the background theory, E the set of observations (the training set), and H the hypothesis; E, B, and H are sets of clauses whose elements are read as a conjunction. Two further constraints are imposed: (1) B ⊭ E, i.e. the background knowledge alone must not already explain E (otherwise no hypothesis H is needed); (2) B ∧ H ⊭ ⊥, i.e. B together with H is a consistent theory.

In ILP, there are two ways to describe the examples. In the first, the positive examples are given as a set E+ and the negative examples as a set E−. The second approach describes only the positive cases in E and uses the closed-world assumption. In this chapter, we adopt the second approach; that is, when learning rules, we learn only from the positive cases. Each element of E is called an example, and under the closed-world assumption any ground fact of the target concept that is not entailed is treated as a counterexample. In the relational-database setting, ILP is usually restricted to function-free Horn clauses, and the background knowledge is a set of ground facts. Note that taking H = E would satisfy the above requirements, but since it has no ability to generalize it is generally not accepted as a solution.
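As a concrete illustration of this setting, the following sketch represents B as a set of ground facts and a candidate function-free Horn clause as a head and body with variables, and checks whether B ∧ H entails a positive example by enumerating ground substitutions under the closed-world assumption. It is a rough approximation for illustration only; the predicates, constants, and the grandparent clause are invented, not taken from the text.

```python
from itertools import product

# Background knowledge B: ground facts, closed-world assumption.
B = {
    ("parent", "ann", "bob"),
    ("parent", "bob", "cal"),
    ("female", "ann"),
}
# Positive examples E for the target predicate grandparent/2.
E = {("grandparent", "ann", "cal")}

# A candidate Horn clause H:  grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
head = ("grandparent", "X", "Z")
body = [("parent", "X", "Y"), ("parent", "Y", "Z")]
variables = ["X", "Y", "Z"]

def covers(example):
    """Return True if some ground substitution makes the body true in B
    and the head equal to the example (i.e. B and H entail the example)."""
    constants = {c for fact in B for c in fact[1:]}
    for values in product(constants, repeat=len(variables)):
        sub = dict(zip(variables, values))
        ground = lambda atom: (atom[0],) + tuple(sub.get(t, t) for t in atom[1:])
        if all(ground(a) in B for a in body) and ground(head) == example:
            return True
    return False

print(all(covers(e) for e in E))   # True: the clause covers every positive example
```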

In this chapter, we use the FOIL algorithm [52], which is a top-down procedure: it starts from the most general clause and specializes it step by step. FOIL's goal is to generate rules that cover all the examples. The following steps are used to obtain rules whose conclusion is the target predicate:

1. Start with the most general rule X → Y, where X = T (true);

2. Choose a literal l such that the clause l ∧ X → Y maximizes the gain function;

3. Set X = l ∧ X;

4. If the confidence rate cf(X → Y) is still below the critical value, go to step (2);

5. Return X → Y.

Here, T denotes the always-true antecedent, and the literals h0, h1, ..., hn are drawn from the hypothesis language Lh.

The gain function can be calculated by the following formula [53]:

$$\mathrm{gain}(l \wedge X \rightarrow Y,\, X \rightarrow Y) = n \times \big(\log_{2}(\mathrm{cf}(l \wedge X \rightarrow Y)) - \log_{2}(\mathrm{cf}(X \rightarrow Y))\big),$$

where n is the number of samples covered by l ∧ X → Y. For a Horn clause X → Y, its confidence rate is cf(X → Y) = P(X ∧ Y)/P(X). The confidence rate is also called the confidence level and can be calculated from the definition of probability [54]. Under the closed-world assumption, the ILP data describe an interpretation $I_{ILP}$. Thus, for a fact f:

$$I_{ILP} \models f \quad \text{iff} \quad B \wedge E \models f,$$

where H is the Herbrand universe defined by B and E, and P denotes the uniform probability over H. Then, when t is the vector of the n free variables, the confidence level of the clause X(t) → Y(t) is

$$\mathrm{cf}(X(t) \rightarrow Y(t))_{I_{ILP}} = \frac{\left|\{\vec{x} \in H^{n} \mid I_{ILP} \models \sigma_{[t/\vec{x}]}(X(t) \wedge Y(t))\}\right|}{\left|\{\vec{x} \in H^{n} \mid I_{ILP} \models \sigma_{[t/\vec{x}]}(X(t))\}\right|}, \qquad (6.2)$$

where |·| denotes cardinality. The confidence rate can be understood as the proportion of positive cases among all samples (positive and negative) covered by the rule; it is the rate at which facts inferred from the rule are true. However, the definition does not take into account the number of tuples covered by the rule, and since we are in a first-order setting this quantity does not always equal the total number of samples.
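To tie equation (6.2) and the FOIL loop together, here is a minimal Python sketch for a propositionalized toy dataset: the confidence rate is computed as the proportion of covered samples for which the conclusion also holds, and the rule body is specialized greedily using the gain measure above. The sample data, literal names, and threshold are invented, and the sketch is an illustrative approximation rather than Quinlan's actual FOIL.

```python
import math

# Toy propositionalized samples: each row gives truth values of candidate
# body literals plus the target predicate Y (all names are invented).
samples = [
    {"has_engine": True,  "heavy": False, "four_wheels": True,  "Y": True},
    {"has_engine": True,  "heavy": True,  "four_wheels": True,  "Y": True},
    {"has_engine": False, "heavy": False, "four_wheels": True,  "Y": False},
    {"has_engine": True,  "heavy": False, "four_wheels": False, "Y": False},
    {"has_engine": False, "heavy": True,  "four_wheels": False, "Y": False},
]
literals = ["has_engine", "heavy", "four_wheels"]

def cf(body):
    """Confidence rate cf(X -> Y): proportion of covered samples where Y holds."""
    covered = [s for s in samples if all(s[l] for l in body)]
    if not covered:
        return 0.0
    return sum(s["Y"] for s in covered) / len(covered)

def gain(body, literal):
    """FOIL-style gain of adding `literal` to the current body X."""
    new_body = body + [literal]
    new_cf, old_cf = cf(new_body), cf(body)
    if new_cf == 0.0 or old_cf == 0.0:
        return float("-inf")
    n_covered = sum(s["Y"] and all(s[l] for l in new_body) for s in samples)
    return n_covered * (math.log2(new_cf) - math.log2(old_cf))

def foil_rule(threshold=0.95):
    """Specialize the most general rule (empty body) until cf exceeds threshold."""
    body = []
    while cf(body) < threshold:
        candidates = [l for l in literals if l not in body]
        if not candidates:
            break
        body.append(max(candidates, key=lambda l: gain(body, l)))
    return body

if __name__ == "__main__":
    rule = foil_rule()
    print("learned body:", rule, "cf:", cf(rule))
```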

In ILP, the goal is to learn the concepts represented by predicates. E is the collection of all facts relevant to the target predicate. Hence, in the non-recursive case, the rules we learn are built from the predicates in B and the target predicate, where the predicates from B form the condition part and the target predicate forms the conclusion part.

6.3 Dynamic fuzzy HRL

This section is based on DFL and dynamic fuzzy matrix theory. The related algorithms are presented separately to describe the hierarchical relationship between data, how the rules are learnt, and how this is combined with the application of DFD.

6.3.1 DFL relation learning algorithm (DFLR)

1. Overview of relations in DFL

Definition 6.2 A mapping defined on the domain $(\overleftarrow{U}, \overrightarrow{U})$,

$$(\overleftarrow{A}, \overrightarrow{A}): (\overleftarrow{U}, \overrightarrow{U}) \rightarrow [0,1] \times [\leftarrow, \rightarrow], \qquad (\overleftarrow{u}, \overrightarrow{u}) \mapsto (\overleftarrow{A}(\overleftarrow{u}), \overrightarrow{A}(\overrightarrow{u})),$$

is denoted $(\overleftarrow{A}, \overrightarrow{A})$, or simply $\overleftrightarrow{A}$, and is called a dynamic fuzzy set on $(\overleftarrow{U}, \overrightarrow{U})$. Here $(\overleftarrow{A}(\overleftarrow{u}), \overrightarrow{A}(\overrightarrow{u}))$ is the membership degree of $(\overleftarrow{u}, \overrightarrow{u})$ in $(\overleftarrow{A}, \overrightarrow{A})$ under the chosen membership function [48].

Consider a first-order relational database RDB in which the dynamic fuzzy predicates are given as a set of positive facts. These facts are annotated with a dynamic fuzzy degree from the field [0, 1] × [←, →], the dynamic fuzzy counterpart of [0, 1]. For example, a database may contain the fact $(weight(car, (\overleftarrow{light}, \overrightarrow{light})), (\overleftarrow{0.9}, \overrightarrow{0.9}))$, stating that the car is quite likely to be light. Thus, RDB consists of elements of the form $(A(\overleftarrow{x}, \overrightarrow{x}), (\overleftarrow{\mu}(A(\overleftarrow{x}, \overrightarrow{x})), \overrightarrow{\mu}(A(\overleftarrow{x}, \overrightarrow{x}))))$, where $(\overleftarrow{x}, \overrightarrow{x}) \in H^{n}$, $A(\overleftarrow{x}, \overrightarrow{x})$ is a fact, and $(\overleftarrow{\mu}(A(\overleftarrow{x}, \overrightarrow{x})), \overrightarrow{\mu}(A(\overleftarrow{x}, \overrightarrow{x})))$ is the coincidence rate associated with the dynamic fuzzy attribute of $(\overleftarrow{x}, \overrightarrow{x})$.

Dynamic fuzzy predicates make rules more flexible and expressive. An ordinary (crisp) predicate $F_{\alpha}$ is obtained from a dynamic fuzzy predicate $F$ by a horizontal cut of its dynamic fuzzy membership function $\mu_{F}$: $\mu_{F_{\alpha}}(\overleftarrow{x}, \overrightarrow{x}) = 1$ iff $\mu(F(\overleftarrow{x}, \overrightarrow{x})) \geq \alpha$, and $\mu_{F_{\alpha}}(\overleftarrow{x}, \overrightarrow{x}) = 0$ otherwise. Thus, the dynamic fuzzy rule $A(\overleftarrow{t}, \overrightarrow{t}) \rightarrow C(\overleftarrow{t}, \overrightarrow{t})$ is naturally related to the crisp rules $A_{\alpha}(\overleftarrow{t}, \overrightarrow{t}) \rightarrow C_{\beta}(\overleftarrow{t}, \overrightarrow{t})$. If $A_{\beta}(\overleftarrow{t}, \overrightarrow{t})$ is true, then $A_{\alpha}(\overleftarrow{t}, \overrightarrow{t})$ is also true whenever $\alpha \leq \beta$. Thus, we need only consider the crisp approximations $A_{\alpha}(\overleftarrow{t}, \overrightarrow{t}) \rightarrow C_{\alpha}(\overleftarrow{t}, \overrightarrow{t})$.
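A minimal sketch of how dynamic fuzzy facts and their horizontal cuts might be represented in code is given below; the (value, direction) encoding of a dynamic fuzzy degree and the example facts are assumptions made for illustration, not a data structure prescribed by the text.

```python
from typing import Dict, Tuple

# A dynamic fuzzy membership degree: a value in [0, 1] plus a trend
# direction, "<-" (decreasing) or "->" (increasing).
DFDegree = Tuple[float, str]

# Dynamic fuzzy facts: ground atom -> dynamic fuzzy coincidence rate.
rdb: Dict[Tuple[str, ...], DFDegree] = {
    ("weight", "car1", "light"): (0.9, "->"),
    ("weight", "car2", "light"): (0.4, "<-"),
    ("speed",  "car1", "fast"):  (0.7, "->"),
}

def alpha_cut(facts: Dict[Tuple[str, ...], DFDegree], alpha: float):
    """Horizontal cut: the crisp predicate F_alpha holds exactly for the facts
    whose membership value is at least alpha (the direction component is kept
    as extra information but does not affect the cut here)."""
    return {atom for atom, (mu, _) in facts.items() if mu >= alpha}

print(alpha_cut(rdb, 0.5))
# contains ('weight', 'car1', 'light') and ('speed', 'car1', 'fast')
```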

In this way, if flexibility is the concern, one possible reading of the dynamic fuzzy rule $A(\overleftarrow{t}, \overrightarrow{t}) \rightarrow C(\overleftarrow{t}, \overrightarrow{t})$ is

$$\forall (\overleftarrow{x}, \overrightarrow{x}),\ \exists \alpha:\ A_{\alpha}(\overleftarrow{x}, \overrightarrow{x}) \rightarrow C_{\alpha}(\overleftarrow{x}, \overrightarrow{x}). \qquad (6.3)$$

That is, for each sample there is a crisp reading of the dynamic fuzzy rule [but there is no need to treat every sample in the same way, because α may depend on $(\overleftarrow{x}, \overrightarrow{x})$].

If expressiveness is the concern, we require the dynamic fuzzy rule to hold in each of its horizontal-cut copies. This means

$$\forall (\overleftarrow{x}, \overrightarrow{x}),\ \forall \alpha:\ A_{\alpha}(\overleftarrow{x}, \overrightarrow{x}) \rightarrow C_{\alpha}(\overleftarrow{x}, \overrightarrow{x}), \qquad (6.4)$$

which is stricter than (6.3), because here the dynamic fuzzy rule corresponds to a whole set of crisp rules: the nested cut predicates are summarized by the single dynamic fuzzy rule. In fact, (6.4) is exactly a gradual rule, whose meaning is "the more $(\overleftarrow{x}, \overrightarrow{x})$ accords with A, the more $(\overleftarrow{x}, \overrightarrow{x})$ accords with C" [because it encodes the constraint $\mu(A(\overleftarrow{x}, \overrightarrow{x})) \leq \mu(C(\overleftarrow{x}, \overrightarrow{x}))$ as a conditional].

This type of gradual rule is one of four general kinds of dynamic fuzzy rules. Gradual rules and certainty rules are based on implication connectives and restrict the possible models of the world. The other two kinds, possibility rules and anti-gradual rules, instead express that certain values are guaranteed to be possible (i.e. they are present in the sample set). A possibility rule, for example, has the form "the more $(\overleftarrow{x}, \overrightarrow{x})$ accords with A, the greater the possibility that C is true" (when C is dynamic fuzzy, the truth value becomes a matter of degree); that is, when C is dynamic fuzzy, the more such samples must exist in any interpretation in which C is possible. Note that such a rule cannot have "classical" counterexamples, because what matters is the distribution of membership rates in the database.

Next, we consider gradual and certainty rules. In contrast with the possibility rule, the certainty rule expresses that "the more $(\overleftarrow{x}, \overrightarrow{x})$ accords with A, the more certain it is that $(\overleftarrow{x}, \overrightarrow{x})$ is C". First, consider A as a dynamic fuzzy predicate and C as an ordinary predicate. The rule then reads "the more $(\overleftarrow{x}, \overrightarrow{x})$ accords with A, i.e. the larger the α for which $A_{\alpha}(\overleftarrow{x}, \overrightarrow{x})$ holds, the fewer exceptions the rule $A_{\alpha}(\overleftarrow{t}, \overrightarrow{t}) \rightarrow C(\overleftarrow{t}, \overrightarrow{t})$ admits". Indeed, as α becomes smaller, the number of exceptions can only increase, because the scope of $A_{\alpha}$ becomes larger. When C is also a dynamic fuzzy predicate, to retain this reading we need to search for rules of the form

$$\forall (\overleftarrow{x}, \overrightarrow{x}),\ \forall \alpha:\ A_{\alpha}(\overleftarrow{x}, \overrightarrow{x}) \rightarrow C_{1-\alpha}(\overleftarrow{x}, \overrightarrow{x}), \qquad (6.5)$$

because when α becomes larger, $C_{1-\alpha}$ covers more samples.
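The readings (6.4) and (6.5) can be restated in terms of membership values: the gradual rule (6.4) holds iff μ(A(x)) ≤ μ(C(x)) for every sample (take α = μ(A(x)) in the cut condition), while the certainty rule (6.5) holds iff μ(A(x)) + μ(C(x)) ≥ 1 for every sample. The sketch below checks these two readings directly; the sample membership values are invented, and the direction component of the dynamic fuzzy degrees is ignored for simplicity.

```python
# Membership values of predicates A and C for a few invented samples.
mu = {
    "s1": {"A": 0.8, "C": 0.9},
    "s2": {"A": 0.3, "C": 0.5},
    "s3": {"A": 0.6, "C": 0.7},
}

def gradual_rule_holds(mu):
    """(6.4): for every sample and every alpha, A_alpha -> C_alpha,
    which is equivalent to mu_A(x) <= mu_C(x)."""
    return all(m["A"] <= m["C"] for m in mu.values())

def certainty_rule_holds(mu):
    """(6.5): for every sample and every alpha, A_alpha -> C_{1-alpha},
    which is equivalent to mu_A(x) + mu_C(x) >= 1."""
    return all(m["A"] + m["C"] >= 1 for m in mu.values())

print(gradual_rule_holds(mu))    # True: every sample satisfies mu_A <= mu_C
print(certainty_rule_holds(mu))  # False: s2 has 0.3 + 0.5 < 1
```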

It is well known that real-valued attributes are difficult to handle when learning rules, because numerical data can lead to an infinite number of hypotheses. In relationship learning, where the hypothesis space is already large, this problem is even more acute, and the difficulty increases further when real numbers appear in the concepts being learned. Real numbers are generally treated in one of two ways: by introducing constraints or by discretizing the real values.

In first-order logic, constraint conditions generally involve mathematical operations (inequalities, averages, etc.). The rules induced in this way may lack expressiveness and generality. In addition, handling such constraints in first-order logic often goes beyond the scope of standard solvers.

The second method is to use discretization and clustering to convert the continuous information into qualitative information. The information can then be processed directly by classical logic and background knowledge. This method is the most widely used because it allows the processing of numerical data and improves the readability of the results. Because clusters are usually defined before induction, the generated rules depend on the quality of the clustering. These clusters are usually represented by imprecise and variable predicates. For example, in the auto-mpg data from the UCI repository, the fuel consumption per mile can be represented by the three predicates "low consumption", "medium consumption", and "high consumption". In this case, dynamic fuzzy labels represented by dynamic fuzzy sets are more suitable for describing the consumption values, because they avoid imposing a sharp threshold between low and medium.
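To make the last point concrete, the following sketch shows how the three consumption labels could be realized as overlapping fuzzy membership functions rather than hard intervals; the breakpoints and the litres-per-100-km scale are invented for illustration and are not taken from the auto-mpg data.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership function with support [a, d] and core [b, c]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Overlapping fuzzy labels for fuel consumption (litres / 100 km);
# the breakpoints are illustrative only.
def low(x):    return trapezoid(x, -1, 0, 5, 8)
def medium(x): return trapezoid(x, 5, 8, 11, 14)
def high(x):   return trapezoid(x, 11, 14, 30, 31)

x = 7.0   # a consumption value near the low/medium boundary
print({"low": low(x), "medium": medium(x), "high": high(x)})
# roughly {'low': 0.33, 'medium': 0.67, 'high': 0.0} - no sharp threshold
```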

Finally, the use of dynamic fuzzy predicates allows us to relax the rigidity of crisp clustering while maintaining the readability of the resulting rules. At the same time, the different types of dynamic fuzzy rules can better describe the data and support a new type of generalization. Flexible rules are regarded as a dynamic fuzzy fitting of crisp rules, in the sense that high-level cuts of a dynamic fuzzy expression correspond to crisp versions of the dynamic fuzzy predicate; with an appropriate interpretation of the dynamic fuzzy predicates with respect to the data, meaningful rules can be produced. Gradual rules and certainty rules are new kinds of rules that describe new properties of the data. In ILP, the goal is to find a hypothesis that is consistent and complete with respect to the examples. The background knowledge and the examples define an interpretation containing the facts of a false target concept and of a true target concept. If the hypothesis covers no fact of the false target concept, it is consistent; if it covers all the facts of the true target concept, it is complete. In dynamic fuzzy ILP, the definition of a sample being covered by a rule depends on the type of dynamic fuzzy rule and on the membership rates of the facts used to verify the rule.

2. DFL relation learning algorithm

In the FOIL algorithm, the measures used by the procedure are the confidence rate, the stopping condition, and the number of samples covered by the rules. If a sample is covered by a crisp copy (horizontal cut) of a rule, we say the sample is covered by the dynamic fuzzy rule. Accordingly, these measures are now specified for each type of rule.

1) Flexible rule

For the dynamic fuzzy rule $A(\overleftarrow{t}, \overrightarrow{t}) \rightarrow C(\overleftarrow{t}, \overrightarrow{t})$, the first reading is close to the classical rule: the coincidence rates of $A(\overleftarrow{x}, \overrightarrow{x})$ and $C(\overleftarrow{x}, \overrightarrow{x})$ should be as high as possible. Thus, a classical interpretation is induced for each horizontal cut α.

Definition 6.3 For a known fact f, the α-interpretation $I_{\alpha}$ is defined as follows:

$$I_{\alpha} \models f \quad \text{iff} \quad B \wedge E \models f \ \text{and}\ \mu(f) \geq \alpha.$$

In this interpretation, only the facts whose coincidence rate is at least α are true. The coincidence (confidence) rate of a rule can now be computed in the classical way [55] within each α-interpretation. In keeping with the intent of dynamic fuzzy rules, more weight should be given to the confidence computed in the α-interpretations with high α, since samples with a high coincidence rate are the ones most likely to be covered. A first-order measure of this kind was proposed in [55]; the following definition adapts it:

$$\mathrm{cf}_{flex}(A(\overleftarrow{t}, \overrightarrow{t}) \rightarrow C(\overleftarrow{t}, \overrightarrow{t})) = \sum_{\alpha_{i}} (\alpha_{i} - \alpha_{i+1}) \times \mathrm{cf}\big(A(\overleftarrow{t}, \overrightarrow{t}) \rightarrow C(\overleftarrow{t}, \overrightarrow{t})\big)_{I_{\alpha_{i}}},$$

where $\alpha_{1} = 1, \dots, \alpha_{t} = 0$ is the decreasing sequence of coincidence rates appearing in the database. This measure is thus a discretized aggregation of the confidence rates over the α-interpretations. In the same way, we obtain the formula for the number of distinct covered samples:

$$n_{flex}(A(\overleftarrow{t}, \overrightarrow{t}) \rightarrow C(\overleftarrow{t}, \overrightarrow{t})) = \sum_{\alpha_{i}} (\alpha_{i} - \alpha_{i+1}) \times \left|\left\{(\overleftarrow{x_{1}}, \overrightarrow{x_{1}}) \in H^{q},\, (\overleftarrow{x_{2}}, \overrightarrow{x_{2}}) \in H^{T} \,\middle|\, I_{\alpha_{i}} \models \sigma_{[(\overleftarrow{t_{1}}, \overrightarrow{t_{1}}),(\overleftarrow{t_{2}}, \overrightarrow{t_{2}})/(\overleftarrow{x_{1}}, \overrightarrow{x_{1}}),(\overleftarrow{x_{2}}, \overrightarrow{x_{2}})]}(A \wedge C)\right\}\right|$$
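A small sketch of how cf_flex could be computed from annotated facts is given below, ignoring the direction component of the dynamic fuzzy degrees for simplicity; the rule, the facts, and their membership values are invented, and the α-levels are taken as the distinct coincidence rates occurring in the data, as in the definition above.

```python
# Membership degrees mu(A(x)) and mu(C(x)) for invented samples.
facts = {
    "x1": {"A": 1.0, "C": 0.9},
    "x2": {"A": 0.7, "C": 0.7},
    "x3": {"A": 0.9, "C": 0.3},
    "x4": {"A": 0.2, "C": 0.0},
}

def cf_at_level(alpha):
    """Classical confidence of A -> C inside the alpha-interpretation I_alpha,
    where only facts with membership >= alpha are true."""
    covered = [m for m in facts.values() if m["A"] >= alpha]
    if not covered:
        return 0.0
    return sum(m["C"] >= alpha for m in covered) / len(covered)

def cf_flex():
    """Weighted sum of confidences over the decreasing sequence of
    coincidence rates occurring in the database (down to level 0)."""
    levels = sorted({v for m in facts.values() for v in m.values()} | {0.0, 1.0},
                    reverse=True)
    total = 0.0
    for a_i, a_next in zip(levels, levels[1:]):
        total += (a_i - a_next) * cf_at_level(a_i)
    return total

print(round(cf_flex(), 3))
```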
