
Ling 3000Q/5000:
Introduction to Computational Linguistics

General Information

Course Description: This course is an introduction to computational methods in empirical linguistic analysis and natural language processing. Topics include the use of text corpora and other sources of linguistic data; morphological analysis, parsing and language modeling; applications in areas such as information retrieval and machine translation. The main objective is to familiarize students with core questions and approaches in the field. Theoretical material on such topics as formal languages, automata and complexity, finite-state and context-free methods, n-grams etc. will be supplemented with practical exercises and mini-projects to give students some hands-on experience in the use of linguistic data and the implementation of algorithms.

Throughout the course we will use the Python Programming Language for in-class exercises and homeworks, augmented with the Natural Language Toolkit (NLTK) where appropriate. Programming skills are not required, but are a plus. Students' projects may be scaled to their level of expertise. Some background in linguistics is definitely an advantage.
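
To give a rough sense of the kind of small Python exercise this involves, here is a minimal sketch of counting bigrams and estimating a conditional probability from a toy sentence. It is only an illustration and is not taken from the course materials: the function names (`tokenize`, `bigram_counts`, `bigram_prob`) and the example sentence are made up, and the whitespace tokenizer is a deliberate simplification. It uses only the standard library; NLTK offers more capable tools for the same tasks.

```python
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption for this sketch): lowercase, split on whitespace.
    return text.lower().split()


def bigram_counts(tokens):
    # Pair each token with its successor and count the resulting bigrams.
    return Counter(zip(tokens, tokens[1:]))


def bigram_prob(tokens, first, second):
    # Maximum-likelihood estimate of P(second | first):
    # count(first, second) / count(first occurring as the start of a bigram).
    bigrams = bigram_counts(tokens)
    starts = Counter(tokens[:-1])
    if starts[first] == 0:
        return 0.0  # 'first' never starts a bigram in this text
    return bigrams[(first, second)] / starts[first]


tokens = tokenize("the cat sat on the mat and the cat slept")
print(bigram_counts(tokens).most_common(3))
print(bigram_prob(tokens, "the", "cat"))
```

In this toy sentence "the" is followed by "cat" twice and by "mat" once, so the estimate for "cat" given "the" is two thirds.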

Course Objectives: By the end of this course, you will have achieved the following:

  • an understanding of computational linguistics and natural-language processing: goals, typical tasks, and common challenges
  • the ability to perform basic coding tasks (e.g., obtain and manipulate data, implement algorithms, use Python modules and libraries)
  • a basic understanding of logical, mathematical, and statistical concepts that are commonly applied in language processing, and the ability to apply them (e.g., set theory, probability theory, linear algebra, formal language theory, hidden Markov models)
  • the ability to formulate research questions and hypotheses, test them, and communicate the results
Format: Lectures, discussions, exercises. One great thing about this course is the diversity among students in terms of backgrounds and interests; unfortunately, this means that there is also a lot of variation in programming skills. This is not a programming course, but we will do a lot of coding because that's how you learn (I can't imagine a computational linguistics course otherwise). For those with little or no programming experience, I plan to set up a regular meeting time during which we will discuss basic concepts and specific problems. This will most likely happen online.

Registration: Undergraduates should register for LING 3000Q, graduates for LING 5000. The levels differ somewhat, especially in the assignments, but the succession of topics and (most of) the course materials and readings are the same.

Prerequisites: At least one course in Linguistics, or permission of the instructor.

Evaluation: Weekly homeworks (60%); final project (20%); participation (20%). Substitution of individual programming project(s) for some of the weekly assignments can be negotiated.
  • Homeworks must be submitted before the beginning of class on the due date. Students are allowed (in fact, encouraged) to collaborate on homeworks, but each student must submit their own answers and state with whom they collaborated.
  • The final project will be an extended programming exercise that expands on one of the topics covered in class. For instance, we'll be covering grammars and syntactic parsers, so students could consider writing a grammar and parser for a language other than English (your choice), implementing a parsing algorithm that we didn't cover in class, adding functionality or a learning module, and more. Students are encouraged to work in small teams (2-3 people) on this. Towards the end of the semester, teams will give a brief (5-10 minute) presentation. For final evaluation, teams must submit their code, the materials used in the presentation (e.g., slides, handouts), and a brief (1-3 page) writeup of the project. More details will be discussed in class.
  • Participation is an important part of the evaluation. This includes requirements that can only be met in class (such as discussions and exercises), so attendance is crucial. Students who must miss classes due to health problems or other unavoidable reasons must give me advance notice. Temporary accommodations (such as following lectures online) are possible in certain cases.
Readings: Jurafsky, D. and J.H. Martin. 2009. Speech and Language Processing. 2nd edition. Prentice Hall.
This is a good book to own, but students are not required to buy it, since we will only use excerpts from it and supplement them with other readings where appropriate. A third edition has been in the works for many years and seems to be nearing completion, and we may use some of the parts that are available online.

Notice to students with disabilities: In compliance with Section 504 of the 1973 Rehabilitation Act and the Americans with Disabilities Act, UConn is committed to providing equal access to all programming. Students with disabilities seeking accommodations are encouraged to contact the Center for Students with Disabilities (CSD). CSD is located in Wilbur Cross Building, Room 224. Additionally, I am available to discuss disability-related needs during my office hours or by appointment.
Last updated: January 17, 2024