About the authors
Michael W. Berry is Professor and Associate Department Head, Department of Electrical Engineering and Computer Science, University of Tennessee. He serves on the editorial boards of the journals Computing in Science and Engineering and Statistical Analysis and Data Mining.
Jacob Kogan is with the Department of Mathematics and Statistics, University of Maryland Baltimore County, USA.

Jacket copy
Text Mining: Applications and Theory presents state-of-the-art algorithms for text mining from both the academic and industrial perspectives. The contributors come from several countries and scientific domains, working in universities, industrial corporations, and government laboratories. They demonstrate the use of techniques from machine learning, knowledge discovery, natural language processing, and information retrieval to design computational models for automated text analysis and mining.

This volume demonstrates how advancements in the fields of applied mathematics, computer science, machine learning, and natural language processing can collectively capture, classify, and interpret words and their contexts. As suggested in the preface, text mining is needed when "words are not enough."

This book:
Provides state-of-the-art algorithms and techniques for critical tasks in text mining applications, such as clustering, classification, anomaly and trend detection, and stream analysis.
Presents a survey of text visualization techniques and looks at the multilingual text classification problem.
Discusses the issue of cybercrime associated with chatrooms.
Features advances in visual analytics and machine learning along with illustrative examples.
Is accompanied by a supporting website featuring datasets.

Applied mathematicians, statisticians, practitioners, and students in computer science, bioinformatics, and engineering will find this book extremely useful.

Summary
Text Mining: Applications and Theory presents state-of-the-art algorithms for text mining from both the academic and industrial perspectives.

Table of contents
List of Contributors
Preface

Part I Text Extraction, Classification, and Clustering

1 Automatic keyword extraction from individual documents
  1.1 Introduction
    1.1.1 Keyword extraction methods
  1.2 Rapid automatic keyword extraction
    1.2.1 Candidate keywords
    1.2.2 Keyword scores
    1.2.3 Adjoining keywords
    1.2.4 Extracted keywords
  1.3 Benchmark evaluation
    1.3.1 Evaluating precision and recall
    1.3.2 Evaluating efficiency
  1.4 Stoplist generation
  1.5 Evaluation on news articles
    1.5.1 The MPQA Corpus
    1.5.2 Extracting keywords from news articles
  1.6 Summary
  1.7 Acknowledgements
  References

2 Algebraic techniques for multilingual document clustering
  2.1 Introduction
  2.2 Background
  2.3 Experimental setup
  2.4 Multilingual LSA
  2.5 Tucker1 method
  2.6 PARAFAC2 method
  2.7 LSA with term alignments
  2.8 Latent morpho-semantic analysis (LMSA)
  2.9 LMSA with term alignments
  2.10 Discussion of results and techniques
  2.11 Acknowledgements
  References

3 Content-based spam email classification using machine-learning algorithms
  3.1 Introduction
  3.2 Machine-learning algorithms
    3.2.1 Naive Bayes
    3.2.2 LogitBoost
    3.2.3 Support vector machines
    3.2.4 Augmented latent semantic indexing spaces
    3.2.5 Radial basis function networks
  3.3 Data preprocessing
    3.3.1 Feature selection
    3.3.2 Message representation
  3.4 Evaluation of email classification
  3.5 Experiments
    3.5.1 Experiments with PU1
...