Schedule

The workshop will be held on Sunday, December 7, 2025 at the NeurIPS venue in San Diego, California (Upper Level Room 30A-E).

Workshop Program

Time               Activity
8:00am – 9:00am    Socializing
9:00am – 9:15am    Coffee break
9:15am – 9:30am    Opening remarks
9:30am – 10:00am   Been Kim · Towards a Pareto Frontier of Interpretability: 15 Years of Research in 15 Mins
10:00am – 10:30am  Sarah Schwettmann · Scalable End-to-End Interpretability
10:30am – 11:00am  Coffee break
11:00am – 11:30am  Spotlight talks (session 1)
11:30am – 12:30pm  Posters (session 1)
12:30pm – 1:30pm   Lunch break
1:30pm – 2:00pm    Spotlight talks (session 2)
2:00pm – 3:00pm    Posters (session 2)
3:00pm – 3:30pm    Coffee break
3:30pm – 4:00pm    Chris Olah (Remote) · Reflections on Interpretability
4:00pm – 5:00pm    Invited lightning talks

Invited Lightning Talks

Speaker (Affiliation) · Title
Ekdeep Singh Lubana (Goodfire) · What is the Right Basis for Computation?
Adam Belfki (NDIF / Northeastern University) · Infrastructure for Your Interpretability Research
Josh Engels (Google DeepMind) · A Pragmatic Vision for Interpretability
Rowan Wang (Anthropic) · Just Ask the Model: Interpreting Models with Clever Prompting
Sheridan Feucht (Northeastern University) · Equifinality: There’s Often Multiple Explanations
Uzay Macar (Anthropic Fellows) · Reasoning Model Interpretability Needs New Techniques
Leo Gao (OpenAI) · An Ambitious Vision for Interpretability
Satvik Golechha & Sid Black (UK AISI) · Auditing Games for Sandbagging
Bartosz Cywinski (MATS) · How Model Organisms Can Enable Useful Interpretability Studies
David Bau (Northeastern University) · In Defense of Curiosity
Jake Mendel (Coefficient Giving) · How to Get Funding for Your Interpretability Research