Fr. 134.00

Building and Using Comparable Corpora

English · Hardback

Shipping usually within 6 to 7 weeks

Description


The 1990s saw a paradigm change in the use of corpus-driven methods in NLP. In multilingual NLP (for example, machine translation and terminology mining) this implied the use of parallel corpora. However, parallel resources are relatively scarce: native speakers of any given language produce many more texts every day than are produced by translation. This situation created a natural drive towards comparable corpora, i.e. non-parallel texts in the same domain or genre. Nevertheless, this research direction had not yet produced a single authoritative source suitable for researchers and students coming to the field.
This volume provides such a reference source, identifying the state of the art in the field as well as future trends. The book is intended for specialists and students in natural language processing, machine translation and computer-assisted translation.

List of contents

Preface: Building and Using Comparable Corpora. S. Sharoff, R. Rapp, P. Zweigenbaum.
Overviewing Important Aspects of the Last 20 Years of Research in Comparable Corpora. S. Sharoff, R. Rapp, P. Zweigenbaum.
Part I: Compiling and Measuring Comparable Corpora
Multilingual Corpus Collection. S. Shi, P. Fung.
Automatic Comparable Web Corpora Collection and Bilingual Terminology Extraction for Specialized Dictionary Making. A. Gurrutxaga, I. Leturia, I. San Vicente, X. Saralegi.
Statistical Comparability: Methodological Caveats. R. Köhler.
Methods for Collection and Evaluation of Comparable Documents. M. Lestari Paramita, D. Guthrie, E. Kanoulas, R. Gaizauskas, P. Clough, M. Sanderson.
Measuring the Distance between Comparable Corpora between Languages. S. Sharoff.
Exploiting Comparable Corpora for Lexicon Extraction: Measuring and Improving Corpus Quality. B. Li, E. Gaussier.
Statistical Corpus and Language Comparison on Comparable Corpora. T. Eckart, U. Quasthoff.
Comparable Multilingual Patents as Large-scale Parallel Corpora. B. Lu, B. Tsou.
Part II: Using Comparable Corpora
Extracting Parallel Phrases from Comparable Data. S. Hewavitharana, S. Vogel.
Exploiting Comparable Corpora. D. S. Munteanu, D. Marcu.
Paraphrase Detection in Comparable Monolingual Corpora. L. Deleger, B. Cartoni, P. Zweigenbaum.
Information Network Construction and Alignment from Automatically Acquired Comparable Corpora. H. Ji, W.-P. Lin.
Bilingual Terminology Mining from Comparable Corpora. B. Daille, E. Morin.
The Place of Comparable Corpora in Providing Terminological Reference Information to Online Translators: A Strategic Framework. K. Kageura, T. Abekawa.
Old Needs, New Solutions: Comparable Corpora for Language Professionals. S. Bernardini, A. Ferraresi.
Exploiting the Incomparability of Comparable Corpora for Contrastive Linguistics and Translation Studies. S. Neumann, S. Hansen-Schirra.

Additional text

“I would like to recommend ‘Building and Using Comparable … to those who are working with or are interested in multilingual and monolingual comparable corpora. … it is easy to say that the notion of comparable corpora was not only visionary, long-sighted, and productive. It is also easy to say that this volume remains the optimal starting point for any research or for any applications in Language Technology leveraging on comparable corpora.” (Marina Santini, forum.santini.se, February, 2017)

Product details

Edited by Pascale Fung, Reinhard Rapp, Serge Sharoff, Pierre Zweigenbaum
Publisher Springer, Berlin
 
Languages English
Product format Hardback
Released 01.01.2014
 
EAN 9783642201271
ISBN 978-3-642-20127-1
No. of pages 335
Dimensions 163 mm x 23 mm x 239 mm
Weight 666 g
Illustrations XII, 335 p., 70 illus., 14 illus. in color
Series Theory and Applications of Natural Language Processing
Subjects Natural sciences, medicine, IT, technology > IT, data processing > IT

Mathematics, Information system, Database, Computer science, Translation, AI, Computational linguistics, COMPUTERS / Natural Language Processing, Natural languages and machine translation, Computational linguistics and corpus linguistics, Business applications, Search engine, LANGUAGE ARTS & DISCIPLINES / Linguistics / General, Intelligence / Artificial intelligence, Artificial intelligence - AI, Data processing / Applications / Engineering, Applied mathematics, Research - Information retrieval, Internet / Search methods, Special users, Information retrieval, Data processing / Applications / Operations, administration, Machine / Search engine, Linguistics / Computational linguistics, Computing / Theory / Computer science / General, COMPUTERS / Information Technology, Audio signal processing, Information Systems Applications (incl. Internet), Application software, Internet searching, Natural Language Processing (NLP), Natural language processing (Computer science)