Fr. 82.80

Practical Corpus Linguistics - An Introduction to Corpus-Based Language Analysis

English · Paperback / Softback

Shipping usually within 3 to 5 weeks (title will be specially ordered)

Description

This is the first book of its kind to provide a practical and student-friendly guide to corpus linguistics that explains the nature of electronic data and how it can be collected and analyzed.
* Designed to equip readers with the technical skills needed to analyze and interpret language data, both written and (orthographically) transcribed spoken data
* Introduces a number of easy-to-use yet powerful free analysis resources, including standalone programs and web interfaces for Windows, Mac OS X, and Linux
* Each section includes practical exercises, a list of sources and further reading, and illustrated step-by-step introductions to analysis tools
* Requires only basic computer knowledge and develops from there the specific linguistic analysis skills needed to understand and analyze corpus data

List of contents

List of Figures xiii
 
List of Tables xv
 
Acknowledgements xvii
 
1 Introduction 1
 
1.1 Linguistic Data Analysis 3
 
1.1.1 What's data? 3
 
1.1.2 Forms of data 3
 
1.1.3 Collecting and analysing data 7
 
1.2 Outline of the Book 8
 
1.3 Conventions Used in this Book 10
 
1.4 A Note for Teachers 11
 
1.5 Online Resources 11
 
2 What's Out There? 13
 
2.1 What's a Corpus? 13
 
2.2 Corpus Formats 13
 
2.3 Synchronic vs. Diachronic Corpora 15
 
2.3.1 'Early' synchronic corpora 15
 
2.3.2 Mixed corpora 18
 
2.3.3 Examples of diachronic corpora 20
 
2.4 General vs. Specific Corpora 21
 
2.4.1 Examples of specific corpora 22
 
2.5 Static Versus Dynamic Corpora 25
 
2.6 Other Sources for Corpora 26
 
Solutions to/Comments on the Exercises 26
 
Note 28
 
Sources and Further Reading 28
 
3 Understanding Corpus Design 29
 
3.1 Food for Thought - General Issues in Corpus Design 29
 
3.1.1 Sampling 30
 
3.1.2 Size 31
 
3.1.3 Balance and representativeness 32
 
3.1.4 Legal issues 32
 
3.2 What's in a Text? - Understanding Document Structure 33
 
3.2.1 Headers, 'footers' and meta-data 34
 
3.2.2 The structure of the (text) body 36
 
3.2.3 What's (in) an electronic text? - understanding file formats and their properties 37
 
3.3 Understanding Encoding: Character Sets, File Size, etc. 38
 
3.3.1 ASCII and legacy encodings 38
 
3.3.2 Unicode 39
 
3.3.3 File sizes 40
 
Solutions to/Comments on the Exercises 41
 
Sources and Further Reading 42
 
4 Finding and Preparing Your Data 43
 
4.1 Finding Suitable Materials for Analysis 44
 
4.1.1 Retrieving data from text archives 44
 
4.1.2 Obtaining materials from Project Gutenberg 44
 
4.1.3 Obtaining materials from the Oxford Text Archive 45
 
4.2 Collecting Written Materials Yourself ('Web as Corpus') 46
 
4.2.1 A brief note on plain-text editors 46
 
4.2.2 Browser text export 48
 
4.2.3 Browser HTML export 49
 
4.2.4 Getting web data using ICEweb 50
 
4.2.5 Downloading other types of files 52
 
4.3 Collecting Spoken Data 53
 
4.4 Preparing Written Data for Analysis 56
 
4.4.1 'Cleaning up' your data 56
 
4.4.2 Extracting text from proprietary document formats 58
 
4.4.3 Removing unnecessary header and 'footer' information 58
 
4.4.4 Documenting what you've collected 59
 
4.4.5 Preparing your data for distribution or archiving 60
 
Solutions to/Comments on the Exercises 62
 
Sources and Further Reading 66
 
5 Concordancing 67
 
5.1 What's Concordancing? 67
 
5.2 Concordancing with AntConc 69
 
5.2.1 Sorting results 74
 
5.2.2 Saving, pruning and reusing your results 75
 
Solutions to/Comments on the Exercises 78
 
Sources and Further Reading 81
 
6 Regular Expressions 82
 
6.1 Character Classes 84
 
6.2 Negative Character Classes 86
 
6.3 Quantification 86
 
6.4 Anchoring, Grouping and Alternation 87
 
6.4.1 Anchoring 87
 
6.4.2 Grouping and alternation 88
 
6.4.3 Quoting and using special characters 90
 
6.4.4 Constraining the context further 91
 
6.5 Further Exercises 92
 
Solutions to/Comments on the Exercises 93
 
Sources and Further Reading 100
 
7 Understanding Part-of-Speech Tagging and Its Uses 101
 
7.1 A

About the author

Martin Weisser is a Professor at the National Key Research Center for Linguistics and Applied Linguistics, Guangdong University of Foreign Studies, China. He is the author of Essential Programming for Linguistics (2009) and has published numerous articles and book chapters, including contributions to The Encyclopedia of Applied Linguistics (Wiley, 2012) and Corpus Pragmatics: A Handbook (2014).

Product details

Author Martin Weisser
Publisher John Wiley and Sons Ltd
 
Languages English
Product format Paperback / Softback
Released 30.11.2015
 
EAN 9781118831885
ISBN 978-1-118-83188-5
No. of pages 312
Subjects Humanities, art, music > Linguistics and literary studies
Non-fiction book > Dictionaries, reference works > Foreign-language dictionaries

Linguistics, Linguistics Special Topics