1.10 Bibliographic Notes

The book Knowledge Discovery in Databases, edited by Piatetsky-Shapiro and Frawley [P-SF91], is an early collection of research papers on knowledge discovery from data. The book Advances in Knowledge Discovery and Data Mining, edited by Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy [FPSS+96], is a collection of later research results on knowledge discovery and data mining. There have been many data mining books published in recent years, including The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman [HTF09]; Introduction to Data Mining by Tan, Steinbach, and Kumar [TSK05]; Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations by Witten, Frank, and Hall [WFH11]; Predictive Data Mining by Weiss and Indurkhya [WI98]; Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linoff [BL99]; Principles of Data Mining (Adaptive Computation and Machine Learning) by Hand, Mannila, and Smyth [HMS01]; Mining the Web: Discovering Knowledge from Hypertext Data by Chakrabarti [Cha03a]; Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data by Liu [Liu06]; Data Mining: Introductory and Advanced Topics by Dunham [Dun03]; and Data Mining: Multimedia, Soft Computing, and Bioinformatics by Mitra and Acharya [MA03].

There are also books that contain collections of papers or chapters on particular aspects of knowledge discovery, for example, Relational Data Mining edited by Dzeroski and Lavrac [De01]; Mining Graph Data edited by Cook and Holder [CH07]; Data Streams: Models and Algorithms edited by Aggarwal [Agg06]; Next Generation of Data Mining edited by Kargupta, Han, Yu, et al. [KHY+08]; Multimedia Data Mining: A Systematic Introduction to Concepts and Theory edited by Z. Zhang and R. Zhang [ZZ09]; Geographic Data Mining and Knowledge Discovery edited by Miller and Han [MH09]; and Link Mining: Models, Algorithms and Applications edited by Yu, Han, and Faloutsos [YHF10]. Many tutorial notes on data mining are also available from major database, data mining, machine learning, statistics, and Web technology conferences.

KDNuggets is a regular electronic newsletter containing information relevant to knowledge discovery and data mining, moderated by Piatetsky-Shapiro since 1991. The Internet site KDNuggets (www.kdnuggets.com) contains a good collection of KDD-related information.

The data mining community started its first international conference on knowledge discovery and data mining in 1995. The conference evolved from the four international workshops on knowledge discovery in databases, held from 1989 to 1994. ACM-SIGKDD, a Special Interest Group on Knowledge Discovery in Databases, was set up under ACM in 1998 and has been organizing the international conferences on knowledge discovery and data mining since 1999. The IEEE Computer Society has organized its annual data mining conference, the International Conference on Data Mining (ICDM), since 2001. SIAM (Society for Industrial and Applied Mathematics) has organized its annual data mining conference, the SIAM Data Mining Conference (SDM), since 2002. A dedicated journal, Data Mining and Knowledge Discovery, published by Kluwer Academic Publishers, has been available since 1997. An ACM journal, ACM Transactions on Knowledge Discovery from Data, published its first volume in 2007.

ACM-SIGKDD also publishes a bi-annual newsletter, SIGKDD Explorations. There are a few other international or regional conferences on data mining, such as the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), and the International Conference on Data Warehousing and Knowledge Discovery (DaWaK).

Research in data mining has also been published in books, conferences, and journals on databases, statistics, machine learning, and data visualization. References to such sources are listed at the end of the book.

Popular textbooks on database systems include Database Systems: The Complete Book by Garcia-Molina, Ullman, and Widom [GMUW08]; Database Management Systems by Ramakrishnan and Gehrke [RG03]; Database System Concepts by Silberschatz, Korth, and Sudarshan [SKS10]; and Fundamentals of Database Systems by Elmasri and Navathe [EN10]. For an edited collection of seminal articles on database systems, see Readings in Database Systems by Hellerstein and Stonebraker [HS05].

There are also many books on data warehouse technology, systems, and applications, such as The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling by Kimball and Ross [KR02]; The Data Warehouse Lifecycle Toolkit by Kimball, Ross, Thornthwaite, and Mundy [KRTM08]; Mastering Data Warehouse Design: Relational and Dimensional Techniques by Imhoff, Galemmo, and Geiger [IGG03]; and Building the Data Warehouse by Inmon [Inm96]. A set of research papers on materialized views and data warehouse implementations was collected in Materialized Views: Techniques, Implementations, and Applications by Gupta and Mumick [GM99]. Chaudhuri and Dayal [CD97] present an early comprehensive overview of data warehouse technology.

Research results relating to data mining and data warehousing have been published in the proceedings of many international database conferences, including the ACM-SIGMOD International Conference on Management of Data (SIGMOD), the International Conference on Very Large Data Bases (VLDB), the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS), the International Conference on Data Engineering (ICDE), the International Conference on Extending Database Technology (EDBT), the International Conference on Database Theory (ICDT), the International Conference on Information and Knowledge Management (CIKM), the International Conference on Database and Expert Systems Applications (DEXA), and the International Symposium on Database Systems for Advanced Applications (DASFAA). Research in data mining is also published in major database journals, such as IEEE Transactions on Knowledge and Data Engineering (TKDE), ACM Transactions on Database Systems (TODS), Information Systems, The VLDB Journal, Data and Knowledge Engineering, Journal of Intelligent Information Systems (JIIS), and Knowledge and Information Systems (KAIS).

Many effective data mining methods have been developed by statisticians and introduced in a rich set of textbooks. An overview of classification from a statistical pattern recognition perspective can be found in Pattern Classification by Duda, Hart, and Stork [DHS01]. There are also many textbooks covering regression and other topics in statistical analysis, such as Mathematical Statistics: Basic Ideas and Selected Topics by Bickel and Doksum [BD01]; The Statistical Sleuth: A Course in Methods of Data Analysis by Ramsey and Schafer [RS01]; Applied Linear Statistical Models by Neter, Kutner, Nachtsheim, and Wasserman [NKNW96]; An Introduction to Generalized Linear Models by Dobson [Dob90]; Applied Statistical Time Series Analysis by Shumway [Shu88]; and Applied Multivariate Statistical Analysis by Johnson and Wichern [JW92].

Research in statistics is published in the proceedings of several major statistical conferences, including the Joint Statistical Meetings, the International Conference of the Royal Statistical Society, and the Symposium on the Interface: Computing Science and Statistics. Other sources of publication include the Journal of the Royal Statistical Society, The Annals of Statistics, the Journal of the American Statistical Association, Technometrics, and Biometrika.

Textbooks and reference books on machine learning and pattern recognition include Machine Learning by Mitchell [Mit97]; Pattern Recognition and Machine Learning by Bishop [Bis06]; Pattern Recognition by Theodoridis and Koutroumbas [TK08]; Introduction to Machine Learning by Alpaydin [Alp11]; Probabilistic Graphical Models: Principles and Techniques by Koller and Friedman [KF09]; and Machine Learning: An Algorithmic Perspective by Marsland [Mar09]. For edited collections of seminal articles on machine learning, see Machine Learning: An Artificial Intelligence Approach, Volumes 1 through 4, edited by Michalski et al. [MCM83, MCM86, KM90, MT94], and Readings in Machine Learning by Shavlik and Dietterich [SD90].

Machine learning and pattern recognition research is published in the proceedings of several major machine learning, artificial intelligence, and pattern recognition conferences, including the International Conference on Machine Learning (ICML), the ACM Conference on Computational Learning Theory (COLT), the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), the International Conference on Pattern Recognition (ICPR), the International Joint Conference on Artificial Intelligence (IJCAI), and the American Association for Artificial Intelligence Conference (AAAI). Other sources of publication include major machine learning, artificial intelligence, pattern recognition, and knowledge system journals, some of which have been mentioned before. Others include Machine Learning (ML), Pattern Recognition (PR), Artificial Intelligence Journal (AI), IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), and Cognitive Science.

Textbooks and reference books on information retrieval include Introduction to Information Retrieval by Manning, Raghavan, and Schütze [MRS08]; Information Retrieval: Implementing and Evaluating Search Engines by Büttcher, Clarke, and Cormack [BCC10]; Search Engines: Information Retrieval in Practice by Croft, Metzler, and Strohman [CMS09]; Modern Information Retrieval: The Concepts and Technology Behind Search by Baeza-Yates and Ribeiro-Neto [BYRN11]; and Information Retrieval: Algorithms and Heuristics by Grossman and Frieder [GR04].

Information retrieval research is published in the proceedings of several information retrieval and Web search and mining conferences, including the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), the International World Wide Web Conference (WWW), the ACM International Conference on Web Search and Data Mining (WSDM), the ACM Conference on Information and Knowledge Management (CIKM), the European Conference on Information Retrieval (ECIR), the Text Retrieval Conference (TREC), and the ACM/IEEE Joint Conference on Digital Libraries (JCDL). Other sources of publication include major information retrieval, information systems, and Web journals, such as Journal of Information Retrieval, ACM Transactions on Information Systems (TOIS), Information Processing and Management, Knowledge and Information Systems (KAIS), and IEEE Transactions on Knowledge and Data Engineering (TKDE).


1. A petabyte is a unit of information or computer storage equal to 1 quadrillion bytes, or a thousand terabytes, or 1 million gigabytes.

3. A popular trend in the information industry is to perform data cleaning and data integration as a preprocessing step, where the resulting data are stored in a data warehouse.

4. Sometimes data transformation and consolidation are performed before the data selection process, particularly in the case of data warehousing. Data reduction may also be performed to obtain a smaller representation of the original data without sacrificing its integrity.

5. A Web crawler is a computer program that browses the Web in a methodical, automated manner.
