Corpus Linguistics (CL) is a scientific method of language analysis that relies on electronic tools. It requires knowledge of linguistic theory, statistics, and data processing. Natural Language Processing (NLP), which draws on artificial intelligence, computational linguistics, and computer science, helps computers read text by simulating the human ability to understand language.
The application of NLP methods has led to advances in fields such as lexicography, corpus linguistics, descriptive grammar, and language teaching and learning. In other words, NLP is as much about linguistics as it is about mathematics. Meanwhile, corpora play an essential role in a wide range of linguistic investigations as well as in NLP research.
This course aims to help students understand the importance of, and current trends in, Corpus Linguistics and Natural Language Processing. While it provides an overview of both the theoretical foundations of Corpus Linguistics and the fundamental methods of NLP, this cross-disciplinary course places greater emphasis on hands-on learning. Students will be introduced to major existing corpora, software packages, and analytical methods. They will learn to work through practical examples using some of the most common techniques in corpus analysis. Importantly, students will also learn basic computer programming skills. By the end of the course, students will have opportunities to build their own corpora using NLP methods and to apply corpora in language analysis and learning.
No textbooks are required for this course. Online resources and handouts will be provided for the topics covered. Additional readings will be supplied as needed.
Selected important studies:
Bird, Steven, Ewan Klein & Edward Loper. 2009. Natural Language Processing with Python. O’Reilly Media.
Kilgarriff, Adam. 2005. Language is never, ever, ever, random. Corpus Linguistics and Linguistic Theory, 1(2). 263–275.
Collins, Luke Curtis. 2019. Corpus Linguistics for Online Communication: A Guide for Research. London: Routledge.
McEnery, Tony & Andrew Hardie. 2012. Corpus Linguistics: Method, Theory and Practice. Cambridge University Press.
Scott, Mike & Christopher Tribble. 2006. Textual Patterns: Key words and corpus analysis in language education. John Benjamins.
Weisser, Martin. 2016. Practical Corpus Linguistics: An Introduction to Corpus-Based Language Analysis. Oxford: Wiley Blackwell.
Crawford, William & Eniko Csomay. 2016. Doing Corpus Linguistics. London: Routledge.
Grading Method | Grading Percentage | Description |
---|---|---|
Attendance and participation | 15 | |
Group presentations | 40 | 20% group & 20% individual |
Peer feedback & activities | 20 | |
Final project | 25 | |

Note: The percentages used in the course evaluation are subject to possible adjustment.