UCLA CS 269, Fall 2025
M/W 4-5:50pm, Royce Hall 156
Instructor: Saadia Gabriel
Email: skgabrie@cs.ucla.edu
Office: Eng VI 295A
Office Hours: 1:45-2:45pm on Mondays
Course Description: Large language models (LLMs) are becoming ubiquitous in our society. They are used in many real-world applications, ranging from content moderation and online advertising to healthcare. Given their increasing role in what we see, how we think, and what is publicly known about us, it is critical to consider risks to public safety when deploying LLM-based systems. This seminar will provide a lens on historical and current safety problems in natural language processing (NLP). We will first discuss ethics challenges introduced by the deployment of LLMs across domains. We will then read literature attempting to understand how "black box" LLMs work and examine how a lack of transparency hinders AI safety. These discussions will be accompanied by guest lectures from domain experts. There will be a group coding project in which students explore a mechanistic interpretability topic in depth through the lens of AI safety.
Schedule:
| Date | Topic | Description | Assignment(s) |
|---|---|---|---|
| 9/29 | Intro | We will go over the syllabus, schedule, reading list, and course expectations. There will be an overview of historical challenges. [Slides] | |
| 10/1 | Causal Interventions & Student Presentations | We'll discuss attempts to understand the internals of neural networks through causal approaches, including counterfactuals. We will have our first student presentation. [Slides] | |
| 10/6-10/8 | Group project brainstorming | Free time to meet in person and coordinate final project plans. Guidelines for the final project proposal are here. | |
| 10/13 | Guest Lecture | Sarah Wiegreffe (University of Maryland) | |
| 10/15 | Student Presentations | TBD, sign up here | |
| 10/20 | Student Presentations | TBD, sign up here | |
| 10/22 | Circuit Analysis & Activation Steering & Student Presentations | We'll cover work on decomposing neural networks to find subcomponents associated with specific concepts or behaviors, and discuss how deeper understanding of these associations has impacted controllability (see the sketch after the schedule). We will conclude with student presentations. [Slides] | |
| 10/27 | Guest Lecture | Sophie Hao (Boston University) | |
| 10/29 | Student Presentations | TBD, sign up here | |
| 11/3-11/5 | Peer Feedback Sessions | A paper-clinic-style peer review session; at least one member of each final project team must be present. Every team should bring its mid-quarter report draft, and every member of the team will independently provide feedback to at least two other teams. | |
| 11/10 | Student Presentations | TBD, sign up here | |
| 11/12 | Mechanistic Interpretability in the Real World & Student Presentations | We'll discuss factors that have been empirically shown to affect trust in AI. We'll have a critical conversation about whether the approaches covered so far are aligned with improving trust, and about focus areas for future work. We will conclude with student presentations. [Slides] | |
| 11/17 | Student Presentations | TBD, sign up here | |
| 11/19 | Student Presentations & Concluding Remarks | TBD, sign up here | |
| 11/24 | Guest Lecture | Ana Marasović (University of Utah) | |
| 11/26 | Final Presentations | Schedule TBD | |
| 12/1 | Final Presentations | Schedule TBD | |
| 12/3 | Final Presentations | Schedule TBD; may be virtual-only due to travel | |
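For students new to the 10/1 and 10/22 topics, here is a minimal sketch of the basic mechanics of a causal intervention on a network's internals. It is purely illustrative and not course material: it uses a toy PyTorch MLP in place of an LLM, and the layer choice, steering vector, and scaling are hypothetical stand-ins for vectors that are typically derived from contrasting activations on paired prompts.

```python
# Toy illustration (not course code) of a causal intervention:
# add a "steering vector" to one hidden layer's activations via a
# forward hook and compare the output against the baseline.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in model; the techniques discussed in class target
# transformer LLMs, but the hook mechanics are the same.
model = nn.Sequential(
    nn.Linear(8, 16),   # input -> hidden
    nn.ReLU(),
    nn.Linear(16, 4),   # hidden -> output
)

# Hypothetical steering vector; in published work this is usually
# computed from activation differences on contrasting prompt pairs.
steer = 0.5 * torch.randn(16)

def steering_hook(module, inputs, output):
    # Returning a tensor from a forward hook replaces the module's
    # output, making this an intervention on the hidden activations.
    return output + steer

x = torch.randn(1, 8)
with torch.no_grad():
    baseline = model(x)
    handle = model[1].register_forward_hook(steering_hook)
    steered = model(x)
    handle.remove()

print("baseline output:", baseline)
print("steered output: ", steered)
print("shift:          ", steered - baseline)
```

Counterfactual-style analyses (the 10/1 topic) use the same mechanism: instead of adding a fixed vector, one patches in activations recorded from a different input and checks whether the model's output changes accordingly.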
Resources:
We will be using Perusall for collaborative paper note-taking and course discussion.
Grading:
Detailed guidelines for assignments will be released later in the quarter.
Course Policies:
Late Policy. Out of courtesy to peers, students are expected to complete reading assignments on time; however, one reading assignment may be turned in up to a week late without penalty. Since the final project is a group assignment, there are no late days, but extensions will be considered under extraordinary circumstances. Students are expected to communicate potential presentation conflicts (e.g., illness, conference travel) to the instructor in advance.
Academic Honesty. Reading assignments are expected to be completed individually outside of the paper presentation, and the instructor will check for overlap between posted comments/questions. For all assignments, any collaborators or other sources of help should be explicitly acknowledged. Violations of academic integrity (please consult the student conduct code) will be handled according to UCLA guidelines.
Accommodations. Our goal is to have a fair and welcoming learning environment. Students should contact the instructor at the beginning of the quarter if they will need special accommodations or have any concerns.
Use of ChatGPT and Other LLM Tools. Students are expected to first draft their writing without any LLMs, and all ideas presented must be their own. Students may use LLMs for grammar correction and minimal editing if they add an acknowledgement of this use. Any work suspected to be entirely AI-generated will receive a grade of 0.
Acknowledgements: This course was very much inspired by two UW courses: Yulia Tsvetkov's Ethics in AI course and Amy X. Zhang's Social Computing course. It was also inspired by Marzyeh Ghassemi's Ethical ML in Human Deployments course at MIT.