1 Overview
Corpus linguistics is aimed at the empirical study of how language is used. The basis for the study is provided by corpora, i.e. large databanks of texts in natural language. This module explores basic methods in corpus linguistics and aims to equip you with the ability to develop and use monolingual and multilingual corpora for learning foreign languages and doing translations. It complements most closely the core modules in Translation and in Translation Memories. Traditional bilingual dictionaries and their electronic versions provide basic information on translation equivalence, but typically there are more possibilities for translating words in context than offered by dictionaries. In contrast, translation memory tools are designed to provide examples of translations in their context, but the size of a database available for a translator is typically limited. A corpus can help you in studying uses of words in a foreign language and comparing uses in two languages when translating.
2 Available corpora and corpus tools
Several reference corpora can be accessed over the Internet, such as the British National Corpus, as well as general-purpose corpora for Arabic, Chinese, Czech, German, Italian, Japanese, Portuguese, Russian, Spanish and some other languages. A concordancer is a software application that produces lines showing keywords together with their contexts. The course will also teach you to use concordancers for studying uses of words and testing translation equivalents.
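To make the idea of a concordancer concrete, here is a minimal sketch in Python of a keyword-in-context (KWIC) display. It is only an illustration, not the tool used in the module: the function name `kwic`, the `width` parameter and the sample text are all assumptions made for this example.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of keyword.

    Hypothetical helper for illustration: matches whole words,
    ignoring case, and shows `width` characters on each side.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

# Invented sample text, standing in for a real corpus.
sample = ("A corpus can help you study uses of words. "
          "Each corpus is a collection of texts, and a good corpus is balanced.")

for line in kwic(sample, "corpus", width=20):
    print(line)
```

Real concordancers work on tokenised and often annotated corpora rather than raw character offsets, so treat this purely as a picture of what a concordance line is.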
3 Objectives
On completion of this module, you should be able to:
• describe and exemplify goals and methods of corpus linguistics
• describe basic types of corpora
• understand principles of corpus querying
• know relevant statistical methods
• design your own specialised corpora
• compare word uses in the source and target languages using parallel and comparable corpora
• use corpus data to build glossaries and task-specific dictionaries
4 Learning approaches
To achieve the module aims, you need a combination of conceptual knowledge and practical experience. Accordingly, each week you have a lecture (1 hour) combined with either a seminar (1 hour) or a practical session (1 hour). Supervised practical sessions in ERIN will focus on basic IT skills for querying corpora and using concordancers. The lectures, which cover the basic topics of the module, alternate with seminars and practical sessions in which theory is tested against practice and explored further through exercises.
5 Syllabus
Week  Session    Topic
W1    Lecture    Theoretical foundations: Using corpora in research and practice
W1    Practical  Using online corpus interfaces
W2    Lecture    Quantitative study of corpora: frequency lists and collocations
W2    Seminar    Analysing and comparing frequencies
W3    Lecture    Methods for exploiting corpora: making queries
W3    Seminar    Making queries and recording your work
W4    Lecture    Quantitative study of corpora: collocations
W4    Seminar    Using collocations and word sketches
W5    Seminar    Linguistic annotation
W5    Seminar    Experiments with explicit annotation
W6    Lecture    Corpus-based dictionary development
W6    Practical  Development of dictionaries in XML
W7               Reading week
W8    Lecture    Building corpora from the Web
W8    Practical  Building your own corpus
W9    Lecture    Know your corpus: assessing corpus composition
W9    Seminar    Assessing composition of your corpus
W10   Lecture    Introduction to using Python
W10   Practical  Building your corpus in Python
W11   Seminar    More experiments with Python and XML
W11   Seminar    More experiments with Python and XML
6 Assessment
At the end of the course you must complete a case study (2000 words) reporting on a project that compares the uses of several lexical items in two languages, using data both from large corpora and from corpora you have collected yourself. The purpose of the case study is to demonstrate your ability to use corpus querying tools and to analyse the evidence they provide. As an outcome of this case study you will also create a bilingual dictionary in XML for the lexical items, with contexts of their use, to demonstrate how you can apply annotation methods.
Progress in the course will also be monitored through participation in the seminars.
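As a rough illustration of what a bilingual dictionary entry in XML might look like, the sketch below builds one with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`dictionary`, `entry`, `headword`, `translation`, `example`, `lang`) and the sample entry are assumptions invented for this example; the format actually expected for the submission is described in the module materials.

```python
import xml.etree.ElementTree as ET

def make_entry(headword, source_lang, translation, target_lang, example):
    """Build one dictionary entry (hypothetical schema, for illustration)."""
    entry = ET.Element("entry")
    hw = ET.SubElement(entry, "headword", {"lang": source_lang})
    hw.text = headword
    tr = ET.SubElement(entry, "translation", {"lang": target_lang})
    tr.text = translation
    # A context of use, such as a concordance line found in a corpus.
    ex = ET.SubElement(entry, "example", {"lang": source_lang})
    ex.text = example
    return entry

dictionary = ET.Element("dictionary")
# Invented sample data, not taken from any real corpus.
dictionary.append(
    make_entry("strong tea", "en", "starker Tee", "de",
               "She made a pot of strong tea.")
)

xml_text = ET.tostring(dictionary, encoding="unicode")
print(xml_text)
```

Keeping each context of use in its own element makes it straightforward to add several examples per lexical item later.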
For more information, including examples of expected submissions and the reading lists, please see the Minerva area and the Corpus module website: http://corpus.leeds.ac.uk/teaching/modl5007