Ling 3000Q/5000: Introduction to Computational Linguistics
General Information
Course Description:
This course is an introduction to computational methods in empirical
linguistic analysis and natural language processing. Topics include
the use of text corpora and other sources of linguistic data;
morphological analysis, parsing and language modeling; applications in
areas such as information retrieval, sentiment classification, and
machine translation. The main objective is to familiarize students
with core questions and approaches in the field. Theoretical material
on such topics as Formal Language Theory, N-gram modeling,
Neural Networks etc. will be supplemented with practical exercises and
mini-projects to give students some hands-on experience in the use of
linguistic data and the implementation of algorithms.
Ever since the advent of Large Language Models just a few years ago,
Natural Language Processing has been fast-moving, volatile, and highly
popular. At the same time, hands-on coding exercises and toy models
are becoming harder to pull off due to rapidly increasing theoretical
complexity and computational demands. Despite these challenges, this
course gives students the opportunity to learn about recent advances
and build a foundation for further studies.
Throughout the course we will use the Python Programming Language for
in-class exercises and homeworks, augmented with a range of special
packages and libraries where appropriate. Students' projects may be
scaled to their level of programming expertise. Some background in
linguistics is definitely an advantage.
Course Objectives: By the end of this course, you
will have achieved the following:
- an understanding of computational linguistics and natural-language
processing: goals, typical tasks, and common challenges
- the ability to perform basic coding tasks (e.g., obtain and
manipulate data, implement algorithms, use Python modules and
libraries)
- a basic understanding of logical, mathematical, and statistical
concepts that are commonly applied in language processing, and the
ability to apply them (e.g., set theory, probability theory, linear
algebra, formal language theory, hidden Markov models, neural
networks).
- the ability to formulate research questions and hypotheses, test
them and communicate the results.
Format: Lectures, discussions, exercises. One great
thing about this course is the diversity among students in terms of
backgrounds and interests; unfortunately this means that there is also
a lot of variation in programming skills. This is not a programming
course, but we will do a lot of coding because that's how you learn (I
can't imagine a computational linguistics course without a substantial
hands-on part). For those with little or no programming experience, I
plan to set up a regular meeting time during which we will discuss
basic concepts and specific problems. This will most likely happen
online.
Registration: Undergraduates should register for
LING 3000Q, graduates for LING 5000. The levels differ somewhat,
especially in the assignments, but the succession of topics and (most
of) the course materials and readings are the same.
Prerequisites: At least one course in Linguistics,
or permission of the instructor.
Evaluation:
Homeworks (60%); final project (20%); participation (20%).
Substitution of individual programming project(s) for some of the
homeworks can be negotiated.
- Homeworks must be submitted before the beginning of class on the
due date. Students are allowed (in fact, encouraged) to
collaborate on homeworks, but each student must submit their own
answers and state with whom they collaborated.
- The final project will be an extended programming exercise that
expands on one of the topics covered in class. For instance, we'll
be covering grammars and syntactic parsers, so students could
consider writing a grammar and parser for a language other than
English (your choice), implementing a parsing algorithm that we
didn't cover in class, adding functionality or a learning module,
and more. Students are encouraged to work in small teams (2-3
people) on this. Towards the end of the semester, teams will give
a brief (5-10 minute) presentation. For final evaluation, teams must
submit their code, the materials used in the presentation (e.g.,
slides, handouts), and a brief (1-3 pages) writeup of the
project. More details will be discussed in class.
- Participation is an important part of the evaluation. This
includes requirements that can only be met in class (such as
discussions and exercises). Therefore, attendance is
crucial. Students who must miss classes due to health problems or
other unavoidable reasons must give me advance notice. Temporary
accommodations (such as following lectures online) are possible
in certain cases, but must be arranged in advance.
Readings: Jurafsky, D. and J.H. Martin. 2009. Speech and Language
Processing. 3rd edition. Prentice Hall. This new and thoroughly
revised edition of a popular textbook has been in the works for many
years and seems to be nearing completion, but it is not yet available
for purchase. The most recent online version is from January
6, 2026. We will be using chapters from this book, as well as some
parts of the earlier edition and other readings. Those will be
available on HuskyCT.
This is a good book to own, but students
are not required to buy it, since we will only use excerpts from it
and supplement them with other readings where appropriate.
Notice to students with disabilities:
In compliance with Section 504 of the 1973 Rehabilitation Act and the
Americans with Disabilities Act, UConn is committed to providing equal
access to all programming. Students with disabilities seeking
accommodations are encouraged to contact the Center for Students with
Disabilities (CSD). CSD is located in Wilbur Cross Building,
Room 224. Additionally, I am available to discuss disability-related
needs during my office hours or by appointment.