Decision tree algorithms are an active area of research in the field of data mining. They implement multi-stage decision processes. Because decision trees are generated quickly and their structures are clear and comprehensible, it is easy to extract commercially valuable information from them that enables decision makers to make the right decisions. Thus, decision trees are widely used to good effect in the military, medicine, securities and investment, exploration, and corporate governance. In medicine, a decision tree model for pancreatic cancer analysis successfully concluded that a patient should first have an endoscopy; in securities, decision trees are applied even more widely, for example in the decision tree model used in the ISCRMS system [50] (ISCRMS is a customer intelligence analysis solution that applies data mining and online analytical techniques in the securities field); and in the banking industry, data mining has been applied to customer classification. The decision tree model has also been widely used in company management and other areas.

Although the domain of decision tree models varies, different applications have the same essence – using a decision tree algorithm to process and classify data and providing a friendly, easy-to-use data mining application environment. After datasets have been obtained for different problems, the attributes can be processed according to the DFDT processing method described in the preceding sections of this chapter, and the decision tree can be constructed and pruned. The following sections compare our DFDT with the C4.5 decision tree algorithm.

3.5.1 Comparison of algorithm execution

A decision tree algorithm consists of three steps: (1) data preprocessing, that is, data discretization (or dynamic fuzzification), (2) decision tree construction according to the attribute selection algorithm, and (3) pruning to prevent overfitting.
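As a rough illustration of this three-step structure (not the chapter's actual DFDT implementation), the following Python sketch uses equal-width discretization as a stand-in preprocessing step, a scikit-learn CART-style tree for construction, and cost-complexity pruning; the dataset, bin count, and pruning strength are assumptions made only for the example.

```python
# A minimal sketch of the three-step pipeline: preprocess, build, prune.
# scikit-learn's CART-style tree stands in for C4.5/DFDT; the dataset,
# bin count, and pruning strength are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Step 1: data preprocessing (equal-width discretization here, standing in
# for discretization or dynamic fuzzification).
disc = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="uniform")
X_disc = disc.fit_transform(X)

# Step 2: decision tree construction driven by an attribute selection
# criterion (information gain via "entropy" here; DFDT uses its own criterion).
unpruned = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X_disc, y)

# Step 3: pruning to prevent overfitting (cost-complexity pruning as a stand-in).
pruned = DecisionTreeClassifier(criterion="entropy", ccp_alpha=0.02,
                                random_state=0).fit(X_disc, y)

print("unpruned leaves:", unpruned.get_n_leaves(),
      "pruned leaves:", pruned.get_n_leaves())
```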

The purpose of data preprocessing is to prepare the attributes for the attribute selection algorithm. In the data preprocessing stage, DFDT uses different processing methods according to the situation, discretizing or dynamically fuzzifying the attributes. The fuzzified attribute values are dynamic fuzzy numbers within the interval [(0,0),(1,1)], as presented in Tab. 3.1. A fuzzified attribute-value pair takes the form <Effort = high, 0.7>.
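As a purely illustrative sketch (the precise dynamic fuzzy number construction is defined in Tab. 3.1 and the preceding sections, and includes direction information not modelled here), the code below maps a raw numeric attribute to an attribute-value pair with a membership degree in [0, 1], in the spirit of <Effort = high, 0.7>; the attribute name, breakpoints, and triangular membership functions are hypothetical.

```python
# Illustrative fuzzification of a numeric attribute into (attribute, term, degree) pairs.
# The piecewise-linear membership functions and breakpoints are assumptions for illustration.
def fuzzify_effort(hours_per_week: float) -> tuple[str, str, float]:
    """Map a raw 'Effort' value to the linguistic term with the highest membership."""
    def tri(x, a, b, c):  # triangular membership function on [a, c] peaking at b
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    memberships = {
        "low": tri(hours_per_week, -1, 0, 25),
        "medium": tri(hours_per_week, 15, 30, 45),
        "high": tri(hours_per_week, 35, 60, 61),
    }
    term, degree = max(memberships.items(), key=lambda kv: kv[1])
    return ("Effort", term, round(degree, 2))

print(fuzzify_effort(52.5))  # ('Effort', 'high', 0.7)
```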

As absolute clarity does not exist in the real world, it is more natural and reasonable to describe the characteristics of an instance using fuzzified attributes.

3.5.2 Comparison of training accuracy

A dataset from the UCI Machine Learning Repository was used to test DFDT and C4.5. For raw data that included a separate test instance set, the training instance set was combined with the test instance set, and cross-validation was used to evaluate the constructed decision tree. The number of cross-validation folds was set to 10. The information about the dataset is presented in Tab. 3.11.

The C4.5 algorithm and the DFDT algorithm were applied to the UCI dataset, both with and without pruning. We recorded the number of leaf nodes in the generated decision tree, the size of the tree, the number of correctly classified instances on the training instance set, and the number of correctly classified instances under cross-validation. Table 3.12 presents the results obtained using the C4.5 algorithm on the UCI dataset, and Tab. 3.13 presents those from the DFDT algorithm.
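A hedged sketch of this evaluation protocol is shown below, again using scikit-learn's CART-style trees as stand-ins for C4.5 and DFDT (neither of which is part of scikit-learn); the dataset and pruning strength are illustrative, and the recorded quantities mirror those listed in Tabs. 3.12 and 3.13.

```python
# Sketch of the recorded quantities: leaf count, tree size, training accuracy,
# and 10-fold cross-validation accuracy, for unpruned and pruned trees.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # illustrative UCI-style dataset

for name, ccp_alpha in [("unpruned", 0.0), ("pruned", 0.01)]:
    clf = DecisionTreeClassifier(criterion="entropy", ccp_alpha=ccp_alpha, random_state=0)
    clf.fit(X, y)
    train_acc = clf.score(X, y)                        # accuracy on the training instance set
    cv_acc = cross_val_score(clf, X, y, cv=10).mean()  # 10-fold cross-validation accuracy
    print(f"{name}: leaves={clf.get_n_leaves()}, size={clf.tree_.node_count}, "
          f"train_acc={train_acc:.3f}, cv_acc={cv_acc:.3f}")
```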

Tab. 3.11: Information of UCI.

Tab. 3.12: Results of C4.5 on UCI.

Tab. 3.13: Results of DFDT on UCI.

Fig. 3.8: Classification accuracy comparison of unpruned tree on training instance set.
Fig. 3.9: Classification accuracy comparison of pruned tree on training instance set.

Figures 3.8–3.11 were obtained by comparing Tab. 3.12 with Tab. 3.13. Figure 3.8 compares the classification accuracy on the training instance set of the unpruned decision trees established by the C4.5 method and DFDT. Figure 3.9 compares the classification accuracy on the training instance set of the pruned decision trees established by the C4.5 method and DFDT, Fig. 3.10 compares the cross-validation classification accuracy of the unpruned decision trees, and Fig. 3.11 compares the cross-validation classification accuracy of the pruned decision trees in C4.5 and DFDT.

Analysis of experimental results:

(1) DFDT uses a dynamic fuzzy preprocessing method, which fully considers the dynamics and fuzziness of real problems. As a result, DFDT has better reasoning and learning abilities in dynamic fuzzy environments. Compared with the C4.5 method, its predictive ability is improved, and this improvement is reflected in the cross-validation classification accuracy on the instance set.

(2) As the attribute selection algorithm considers each attribute's contribution to classifying the whole instance set, the instance sets are classified better by DFDT. In DFDT, the number of leaf nodes and the size of the tree are smaller than those of the decision tree produced by the C4.5 algorithm. By the principle of Occam's razor, a smaller decision tree has better generalization ability. Thus, the instances in the training set are classified more accurately by DFDT, and the DFDT has higher generalization ability.

Fig. 3.10: Classification accuracy comparison of unpruned tree on cross-validation.
Fig. 3.11: Classification accuracy comparison of pruned tree on cross-validation.

3.5.3 Comprehensibility comparisons

Real life is dynamic and fuzzy, and dynamic fuzzy data reflect this dynamic uncertainty. DFDT handles such attributes, whether semantic or numerical. This treatment of attributes of different types is easier to understand than the discretization of numerical attributes in C4.5.

As the recognition of certain attributes, especially semantic attributes, is ambiguous in reality, using crisp boundaries to describe them cannot capture the characteristics of the data well and is not consistent with the way people think. Dynamic fuzzification of the data in DFDT not only retains the original attributes but also assigns a dynamic fuzzy number describing the degree of each attribute in the tree. Therefore, a DFDT contains more information than a C4.5 decision tree does, and this dynamic fuzzification enables the current level of development to be extrapolated to future trends. This is more in line with reality and is easier to understand.

3.6 Summary

The decision tree method derives from the concept learning system, which developed into ID3 and then into the C4.5 algorithm, which can deal with continuous-valued attributes. In view of the dynamic fuzziness of reality, further studies are needed on the representation of dynamic fuzziness in decision tree learning and on whether there is a better attribute selection algorithm. On this basis, this chapter has studied the basic problems of decision tree learning based on a dynamic fuzzy lattice.

The innovations of this chapter are as follows:

(1) We analysed the dynamic fuzziness in decision tree learning and proposed a DFDT and DFBDT based on a dynamic fuzzy lattice.

(2) By studying the original decision tree learning algorithm, an attribute selection algorithm based on classification accuracy was proposed (an illustrative sketch is given after this list).

(3) By introducing the partition lattice into the DFDT, the basic concept of a dynamic fuzzy partition lattice was developed. Based on this, the relationship between the DFDT and the dynamic fuzzy binary decision tree was analysed, and the relevant theorems and proofs were given. Using the dynamic fuzzy partition lattice as a basis, the discretization of single attributes and of multiple attributes in the fuzzy decision tree was considered.

(4) Based on the information contribution of attributes to classification, a method of dealing with missing values in DFDTs was proposed.
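Item (2) above refers to an attribute selection criterion based on classification accuracy rather than information gain. The following sketch does not reproduce the chapter's exact criterion; under our own simplifying assumptions, it scores each discrete candidate attribute by the training accuracy of a one-level split that predicts the majority class within each attribute value, and the toy instance set and attribute names are hypothetical.

```python
# Illustrative accuracy-based attribute selection for discrete attributes:
# score each attribute by the accuracy of a one-level (majority-class-per-value) split.
from collections import Counter, defaultdict

def accuracy_score_of_attribute(rows, attr, target="class"):
    """Accuracy of predicting the majority class within each value of `attr`."""
    by_value = defaultdict(list)
    for row in rows:
        by_value[row[attr]].append(row[target])
    correct = sum(Counter(labels).most_common(1)[0][1] for labels in by_value.values())
    return correct / len(rows)

def select_attribute(rows, attributes, target="class"):
    """Return the attribute whose one-level split classifies the most instances correctly."""
    return max(attributes, key=lambda a: accuracy_score_of_attribute(rows, a, target))

# Hypothetical toy instance set.
rows = [
    {"Outlook": "sunny", "Windy": "no", "class": "play"},
    {"Outlook": "sunny", "Windy": "yes", "class": "stay"},
    {"Outlook": "rain", "Windy": "yes", "class": "stay"},
    {"Outlook": "rain", "Windy": "no", "class": "play"},
]
print(select_attribute(rows, ["Outlook", "Windy"]))  # -> 'Windy'
```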

References

[1]Zhang J, Honavar V. Learning decision tree classifiers from attribute value taxonomies and partially specified data. In Proceedings of the Twentieth International Conference on Machine Learning, 2003.

[2]Myles AJ, Brown SD. Induction of decision trees using fuzzy partitions. Journal of Chemometrics, 2003, 17: 531–536.

[3]Olaru C, Wehenkel L. A complete fuzzy decision tree technique. Fuzzy Sets and Systems, 2003, 138: 221–254.

[4]Aoki K, Watanabe T, Kudo M. Design of decision tree using class-dependent features subsets. Systems and Computers, 2005, 36(4): 37–47.

[5]Kalles D, Papagelis A. Induction of decision trees in numeric domains using set-valued attributes. Intelligent Data Analysis, 2000, 4(4): 323–347.

[6]Esposito F, Malerba D, Semeraro G. A comparative analysis of methods for pruning decision trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, 19(5): 476–491.

[7]Bolakova I. Pruning decision trees to reduce tree size. In Proceedings of the International Conference Traditional and Innovations in Sustainable Development of Society, 2002, 160–166.

[8]Fournier D, Cremilleux B. A quality index for decision tree pruning. Knowledge-Based Systems, 2002, 15: 37–47.

[9]Esposito F, Malerba D, Semeraro G, Tamma V. The effects of pruning methods on the predictive accuracy of induced decision trees. Applied Stochastic Models in Business and Industry, 1999, 15(4): 277–299.

[10]Myles AJ, Feudale RN, Liu Y, Woody NA, Brown SD. An introduction to decision tree modeling. Journal of Chemometrics, 2004, 18: 275–285.

[11]Zhu H. The research of the simplest decision tree production algorithm based on rough set. Computer Engineering and Application, 2003, 39(13): 129–131.

[12]Jiang Y, Li ZH, Zhang Q, Liu Y. New method for constructing decision tree based on rough sets theory. Journal of Computer Applications, 2004, 24(8): 21–23.

[13]Yin AD, Xie LQ, Long Y, Yang LD. Researches on dynamic algorithm of decision trees. Computer Engineering and Application, 2004, 40(33): 103–105.

[14]Zhu YZ, Wu GF. A two-phase-based construction method of decision tree and its application. Computer Engineering, 2004, 30(1): 82–84.

[15]Yang HW, Zhao MH, Sun J, Wang XZ. A decision tree based on hierarchy decomposition. Computer Engineering and Application, 2003, 39(23): 108–110.

[16]Wang XG, Hang SK, Zhu W, Li QY. Method of building decision trees of customer classification by using C4.5 algorithm. Computer Engineering. 2003, 29(14): 89–91.

[17]Wei HN. Study on the parallelism of decision tree classification based on SPRINT. Journal of Computer Applications, 2003, 25(1): 39–41.

[18]Luan LH, Ji GL. The study on decision tree classification techniques. Computer Engineering, 2004, 30(9): 94–96.

[19]Han H, Wang F, Wang WY. Review of recent development in decision tree algorithm in data mining. Application Research of Computers, 2004, 12: 5–8.

[20]Wen SP, Qiao SY, Chen CY, Li ZG. Decision-tree-based missing data filling and rules extraction in incomplete decision table. Journal of Computer Applications, 2003, 11(23): 17–19.

[21]Zhang SY, Zhu ZY. The study of semi-structured information retrieval based on Web. Computer Engineering and Application, 2004, 40(13): 69–71.

[22]Shi CQ, Yi A. Intrusion detection based on algorithm of multi-decision tree. Computer Engineering and Design, 2004, 25(4): 518–519.

[23]Li CY, Yang YT. Key technology of packet filter using decision trees. Computer Engineering, 2004, 30(1): 45–47.

[24]Shao FJ, Yu ZQ. Principle and algorithm of data mining. Beijing, China, China Water & Power Press, 2003.

[25]Li ML. Research and application on theory of decision tree based on dynamic fuzzy lattice. Master’s Thesis, Soochow University, Suzhou, China, 2008.

[26]Li ML, Li FZ. Research on lattice structure of DFL and its application. Computer Applications and Software, 2007, 24(9): 145–150.

[27]Li FZ, Liu GQ, She YM. An introduction to dynamic fuzzy logic. Kunming, China, Yunnan Science and Technology Press, 2005.

[28]Cai C. Research and application on the dynamic fuzzy decision tree learning. Master’s Thesis, Soochow University, Suzhou, China, 2008.

[29]Mitchell, TM. Machine learning. NY, USA, McGraw-Hill, 1997.

[30]Chen, ZB. Data warehouse and data mining. Beijing, China, Tsinghua University Press, 2009.

[31]Sun CL, Zhang JF. The information gain algorithm based on attribute-value pairs. Journal of Taiyuan University of Science and Technology, 2005, 26(3): 199–202.

[32]Yao YY, Yao JT. Granular computing as a basis for consistent classification problems. In Proceedings of PAKDD Workshop on Foundations of Data Mining, 2002.

[33]Ghahramani Z, Jordan MI. Supervised learning from incomplete data via an em approach. Advances in Neural Information Processing Systems, 1993, 6: 120–127.

[34]Li HX, Xu LD. Feature space theory – A mathematical foundation for data mining. Knowledge-Based Systems, 2001, 14: 253–257.

[35]Shi ZZ. Knowledge discovery. Beijing, China, Tsinghua University Press, 2002.

[36]Mao GJ, Duan LJ, Wang S, Shi Y. Principles and algorithms of data mining. Beijing, China, Tsinghua University Press, 2005.

[37]Witten IH, Frank E. Data mining practical machine learning tools and techniques with java implementations. Beijing, China, China Machine Press, 2003.

[38]Huan L, Farhad H, Manoranjan D. Discretization: An enabling technique. Data Mining and Knowledge Discovery, 2002, 6: 393–423.

[39]Dan V, Martinez TR. An empirical comparison of discretization methods. In Proceedings of the 10th International Symposium on Computer and Information Science, 1995, 443–450.

[40]Wang LH, Wu GF. Study on the discretization lattice of information table. Pattern Recognition and Artificial Intelligence, 2004, 17(1): 11–15.

[41]Wang LH, Wu Y, Wu GF. Heuristic algorithm for discretization lattice searching. Journal of Computer Applications, 2004, 24(8): 41–43.

[42]Kettenring J, Lindsay B, Siegmund D. Statistics: Challenges and opportunities for the twenty-first century. Transportation Research Record: Journal of the Transportation Research Board, 2003, 1951(1): 44–51.

[43]Jin YJ. Lack of data in the survey and processing (I)—Missing data and its impact. Application of Statistics and Management, 2001, 20(1): 59–62.

[44]Quinlan JR. Unknown attribute values in induction. In Proceedings of 6th International Workshop on Machine Learning, 1989, 164–168.

[45]Grzymala-Busse JW, Fu M. A comparison of several approaches to missing attribute values in data mining. In Proceedings of the Second International Conference on Rough Sets and Current Trends in Computing, 2000, 378–385.

[46]Liu P, Lei L, Zhang XF. A comparison study of missing value processing methods. Computer Science, 2004, 31(10): 155–156.

[47]Dai WS, Xie BC. Treatment of missing values in experimental design. Statistics and Decision, 2004, (9): 6–7.

[48]Huang XL. A pseudo-nearest-neighbor approach for missing data recovery on Gaussian random data sets. Pattern Recognition Letters, 2002, 23(13): 1613–1622.

[49]Cai ZX, Xu GY. Artificial intelligence: Principles and applications. Beijing, China, Tsinghua University Press, 1996.

[50]Li H, Zhu JQ, Zhu YY. ISCRMS: An Intelligent Solution of CRM in Stock. Computer Engineering, 2003, 29(z1): 138–140.

[51]Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and regression trees. Belmont, CA, Wadsworth, 1984.

[52]White AP. Probabilistic induction by dynamic part generation in virtual trees. Journal of Petrology, 1986, 45(9): 1747–1776.

[53]Liu WZ, White AP. The importance of attribute selection measures in decision tree induction. Machine Learning, 1994, 15(1): 25–41.

[54]Friedman JH. Lazy decision trees. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, 1996, 717–724.
