LING 3000Q/5000:
Introduction to Computational Linguistics

General Information

Course Description: This course is an introduction to computational methods in linguistic analysis and natural language processing. Topics include the use of text corpora and other sources of linguistic data; language modeling; text classification and information retrieval; and Large Language Models. Theoretical material on topics such as N-gram modeling and Neural Networks will be supplemented with practical exercises and mini-projects to give students hands-on experience in the use of linguistic data and the implementation of algorithms. Ever since the still-recent advent of Large Language Models, their design and implementation have been a fast-moving and highly popular topic. At the same time, building small-scale toy models is becoming harder to pull off due to increasing theoretical complexity and computational demands. Despite these challenges, this course gives students the opportunity to learn about recent advances and build a foundation for further studies.

Throughout the course we will use the Python Programming Language for in-class exercises and homeworks, augmented with a range of special packages and libraries, such as the Natural Language ToolKit (NLTK) for text processing and PyTorch for Neural Networks. Students' projects may be scaled to their level of programming expertise.

Basic coding skills are essential for success in this course, but no special preparation in Natural Language Processing or Machine Learning is required. Students without any prior programming experience are encouraged to take COGS 2500Q ("Coding for Cognitive Science") prior to this course.

Course Objectives: By the end of this course, students will have achieved the following:

  • an understanding of computational linguistics and natural-language processing: goals, typical tasks, and common challenges
  • the ability to perform basic coding tasks (e.g., obtain and manipulate data, implement algorithms, use Python modules and libraries)
  • a basic understanding of logical, mathematical, and statistical concepts that are commonly applied in language processing, and the ability to apply them
  • the ability to formulate research questions and hypotheses, test them and communicate the results.
Format: Lectures, discussions, exercises.

Registration: Undergraduates should register for LING 3000Q, graduates for LING 5000. The levels differ somewhat, especially in the assignments, but the succession of topics and (most of) the course materials and readings are the same.

Prerequisites: At least one course in Linguistics or Computer Science, or permission of the instructor.

Evaluation: Homeworks (60%); final project (20%); participation (20%). Substitution of individual programming project(s) for some of the homeworks can be negotiated.
  • Homeworks must be submitted before the beginning of class on the due date. Students are allowed (in fact, encouraged) to collaborate on homeworks, but each student must submit their own answers and state with whom they collaborated.
  • The final project will be an extended programming exercise that expands on one of the topics covered in class. For instance, we'll be covering grammars and syntactic parsers, so students could consider writing a grammar and parser for a language other than English (your choice), implementing a parsing algorithm that we didn't cover in class, adding functionality or a learning module, and more. Students are encouraged to work in small teams (2-3 people) on this. Towards the end of the semester, teams will give a brief (5-10min) presentation. For final evaluation, teams must submit their code, the materials used in the presentation (e.g., slides, handouts), and a brief (1-3 pages) writeup of the project. More details will be discussed in class.
  • Participation is an important part of the evaluation. This includes requirements that can only be met in class (such as discussions and exercises). Therefore, attendance is crucial. Students who must miss classes due to health problems or other unavoidable reasons must give me advance notice. Temporary accommodations (such as following lectures online) are possible in certain cases, but must be arranged in advance.
Readings: We will mainly rely on two books for this course:
  • Raschka, S. 2024. Build a Large Language Model (From Scratch).
    A hands-on guide to implementing a GPT-2-style language model, covering all the crucial ingredients. Includes a good introduction to PyTorch, a popular Python library used to build and train Neural Networks.
  • Jurafsky, D. and J.H. Martin. Speech and Language Processing. 3rd edition (online draft).
    An introduction to most of the theoretical background necessary to understand how Large Language Models are built, trained, and applied.
    This is a new and thoroughly revised edition of a popular textbook. It has been in the works for many years and seems to be nearing completion, but it is not yet available for purchase. The most recent online version is from January 6, 2026. We will be using chapters from this book, as well as some parts of the earlier edition and other readings. Those will be available online and/or on HuskyCT.
Gear: We will do all of our programming on Google Colab, a cloud-based platform for writing and running Python code. To use this platform, students will need a Google account. Unfortunately, UConn no longer provides students with Google accounts (it did until 2024). Fortunately, Google accounts are free. Students can also get one year of free access to Colab Pro, which offers more memory and faster processing. (After one year, continued access to Colab Pro costs money; I'm not sure how much.) Using Colab is the best way to ensure that our software and hardware requirements are met. Although our class is held in a computer lab, the machines in that lab do not meet those requirements. We will only use them as terminals to access the cloud.

Notice to students with disabilities: In compliance with Section 504 of the 1973 Rehabilitation Act and the Americans with Disabilities Act, UConn is committed to providing equal access to all programming. Students with disabilities seeking accommodations are encouraged to contact the Center for Students with Disabilities (CSD). CSD is located in Wilbur Cross Building, Room 224. Additionally, I am available to discuss disability-related needs during my office hours or by appointment.
Last updated: 2026-05-08