The growth in the amount of data collected and generated has exploded in recent times with the widespread automation of various day-to-day activities, advances in high-level scientific and engineering research and the development of efficient data collection tools. This has given rise to the need for automatically analyzing the data in order to extract knowledge from it, thereby making the data potentially more useful. Knowledge discovery and data mining (KDD) is the process of identifying valid, novel, potentially useful and ultimately understandable patterns from massive data repositories. It is a multi-disciplinary topic, drawing from several fields including expert systems, machine learning, intelligent databases, knowledge acquisition, case-based reasoning, pattern recognition and statistics. Many data mining systems have typically evolved around well-organized database systems (e.g., relational databases) containing relevant information. But, more and more, one finds relevant information hidden in unstructured text and in other complex forms. Mining in the domains of the world-wide web, bioinformatics, geoscientific data, and spatial and temporal applications comprise some illustrative examples in this regard. Discovery of knowledge, or potentially useful patterns, from such complex data often requires the application of advanced techniques that are better able to exploit the nature and representation of the data. Such advanced methods include, among others, graph-based and tree-based approaches to relational learning, sequence mining, link-based classification, Bayesian networks, hidden Markov models, neural networks, kernel-based methods, evolutionary algorithms, rough sets and fuzzy logic, and hybrid systems. Many of these methods are developed in the following chapters.
List of contents
Foundations.- Knowledge Discovery and Data Mining.- Automatic Discovery of Class Hierarchies via Output Space Decomposition.- Graph-based Mining of Complex Data.- Predictive Graph Mining with Kernel Methods.- TreeMiner: An Efficient Algorithm for Mining Embedded Ordered Frequent Trees.- Sequence Data Mining.- Link-based Classification.- Applications.- Knowledge Discovery from Evolutionary Trees.- Ontology-Assisted Mining of RDF Documents.- Image Retrieval using Visual Features and Relevance Feedback.- Significant Feature Selection Using Computational Intelligent Techniques for Intrusion Detection.- On-board Mining of Data Streams in Sensor Networks.- Discovering an Evolutionary Classifier over a High-speed Nonstatic Stream.
About the author
Prof. Sanghamitra Bandyopadhyay has many years of experience in the development of soft computing techniques. Among other awards and positions, she has received senior researcher Humboldt Fellowships, and she is a regular visitor to the DKFZ (German Cancer Research Centre) and to European and North American universities, collaborating in multidisciplinary teams on applications in computational biology and bioinformatics. She also received the prestigious Shanti Swarup Bhatnagar Prize in Engineering Sciences in 2010, and she is a Fellow of both the National Academy of Sciences of India and the Indian National Academy of Engineering.
Dr. Sriparna Saha is an assistant professor at the Indian Institute of Technology Patna. Among her positions and awards, she was a postdoctoral researcher in Trento and in Heidelberg, and she received the Google India Women in Engineering Award in 2008. Her research interests include multiobjective optimization, evolutionary computation, clustering, and pattern recognition.