About the authors

Eric Gaussier is deputy director of the Grenoble Informatics Laboratory, one of the largest Computer Science laboratories in France. François Yvon is professor of Computer Science at the University of Paris Sud in Orsay and a member of the Spoken Language Processing group of LIMSI/CNRS, Paris, France.

Blurb

This book presents statistical models that have recently been developed within several research communities to access information contained in text collections. The problems considered are linked to applications that aim to facilitate information access: information extraction and retrieval; text classification and clustering; opinion mining; comprehension aids (automatic summarization, machine translation, visualization). To give the reader as complete a description as possible, the focus is placed on the probability models used in these applications, highlighting the relationship between models and applications and illustrating the behavior of each model on real collections. Textual Information Access is organized around four themes: information retrieval and ranking models, classification and clustering (logistic regression, kernel methods, Markov fields, etc.), multilingualism and machine translation, and emerging applications such as information exploration.

Table of contents

Introduction
Eric Gaussier and François Yvon

PART 1: INFORMATION RETRIEVAL

Chapter 1. Probabilistic Models for Information Retrieval
Stéphane Clinchant and Eric Gaussier
1.1. Introduction
1.3. Probability ranking principle (PRP)
1.4. Language models
1.5. Informational approaches
1.6. Experimental comparison
1.7. Tools for information retrieval
1.8. Conclusion
1.9. Bibliography

Chapter 2. Learnable Ranking Models for Automatic Text Summarization and Information Retrieval
Massih-Réza Amini, David Buffoni, Patrick Gallinari, Tuong Vinh Truong, and Nicolas Usunier
2.1. Introduction
2.2. Application to automatic text summarization
2.3. Application to information retrieval
2.4. Conclusion
2.5. Bibliography

PART 2: CLASSIFICATION AND CLUSTERING

Chapter 3. Logistic Regression and Text Classification
Sujeevan Aseervatham, Eric Gaussier, Anestis Antoniadis, Michel Burlet, and Yves Denneulin
3.1. Introduction
3.2. Generalized linear model
3.3. Parameter estimation
3.4. Logistic regression
3.5. Model selection
3.6. Logistic regression applied to text classification
3.7. Conclusion
3.8. Bibliography

Chapter 4. Kernel Methods for Textual Information Access
Jean-Michel Renders
4.1. Kernel methods: context and intuitions
4.2. General principles of kernel methods
4.3. General problems with kernel choices (kernel engineering)
4.4. Kernel versions of standard algorithms: examples of solvers
4.5. Kernels for text entities
4.6. Summary
4.7. Bibliography

Chapter 5. Topic-Based Generative Models for Text Information Access
Jean-Cédric Chappelier
5.1. Introduction
5.2. Topic-based models
5.3. Topic models
5.4. Term models
5.5. Similarity measures between documents
5.6. Conclusion
5.7. Appendix: topic model software
5.8. Bibliography

Chapter 6. Conditional Random Fields for Information Extraction
Isabelle Tellier and Marc Tommasi
...