Schedule
The workshop will be held on Sunday, December 7, 2025, at the NeurIPS venue in San Diego, California (Upper Level Room 30A-E).
Workshop Program
| Time | Activity |
|---|---|
| 8:00am – 9:00am | Socializing |
| 9:00am – 9:15am | Coffee break |
| 9:15am – 9:30am | Opening remarks |
| 9:30am – 10:00am | Been Kim · Towards a Pareto Frontier of Interpretability: 15 Years of Research in 15 Mins |
| 10:00am – 10:30am | Sarah Schwettmann · Scalable End-to-End Interpretability |
| 10:30am – 11:00am | Coffee break |
| 11:00am – 11:30am | Spotlight talks (session 1) |
| 11:30am – 12:30pm | Posters (session 1) |
| 12:30pm – 1:30pm | Lunch break |
| 1:30pm – 2:00pm | Spotlight talks (session 2) |
| 2:00pm – 3:00pm | Posters (session 2) |
| 3:00pm – 3:30pm | Coffee break |
| 3:30pm – 4:00pm | Chris Olah (Remote) · Reflections on Interpretability |
| 4:00pm – 5:00pm | Invited lightning talks |
Invited Lightning Talks
| Speaker | Title |
|---|---|
| Ekdeep Singh Lubana (Goodfire) | What is the Right Basis for Computation? (slides) |
| Adam Belfki (NDIF / Northeastern University) | Infrastructure for Your Interpretability Research (slides) |
| Josh Engels (Google DeepMind) | A Pragmatic Vision for Interpretability (slides) |
| Rowan Wang (Anthropic) | Just Ask the Model: Interpreting Models with Clever Prompting (slides) |
| Sheridan Feucht (Northeastern University) | Equifinality: There’s Often Multiple Explanations (slides) |
| Uzay Macar (Anthropic Fellows) | Reasoning Model Interpretability Needs New Techniques (slides) |
| Leo Gao (OpenAI) | An Ambitious Vision for Interpretability |
| Satvik Golechha & Sid Black (UK AISI) | Auditing Games for Sandbagging (slides) |
| Bartosz Cywinski (MATS) | How Model Organisms Can Enable Useful Interpretability Studies (slides) |
| David Bau (Northeastern University) | In Defense of Curiosity (slides) |
| Jake Mendel (Coefficient Giving) | How to Get Funding for Your Interpretability Research (slides) |