Schedule

The workshop will be held on Sunday, December 7, 2025 at the NeurIPS venue in San Diego, California (Upper Level Room 30A-E).

Workshop Program

Time | Activity
8:00am – 9:00am | Socializing
9:00am – 9:15am | Coffee break
9:15am – 9:30am | Opening remarks
9:30am – 10:00am | Been Kim · Towards a Pareto Frontier of Interpretability: 15 Years of Research in 15 Mins
10:00am – 10:30am | Sarah Schwettmann · Scalable End-to-End Interpretability
10:30am – 11:00am | Coffee break
11:00am – 11:30am | Spotlight talks (session 1)
11:30am – 12:30pm | Posters (session 1)
12:30pm – 1:30pm | Lunch break
1:30pm – 2:00pm | Spotlight talks (session 2)
2:00pm – 3:00pm | Posters (session 2)
3:00pm – 3:30pm | Coffee break
3:30pm – 4:00pm | Chris Olah (Remote) · Reflections on Interpretability
4:00pm – 5:00pm | Invited lightning talks

Invited Lightning Talks

Speaker | Title
Ekdeep Singh Lubana (Goodfire) | What is the Right Basis for Computation? (slides)
Adam Belfki (NDIF / Northeastern University) | Infrastructure for Your Interpretability Research (slides)
Josh Engels (Google DeepMind) | A Pragmatic Vision for Interpretability (slides)
Rowan Wang (Anthropic) | Just Ask the Model: Interpreting Models with Clever Prompting (slides)
Sheridan Feucht (Northeastern University) | Equifinality: There’s Often Multiple Explanations (slides)
Uzay Macar (Anthropic Fellows) | Reasoning Model Interpretability Needs New Techniques (slides)
Leo Gao (OpenAI) | An Ambitious Vision for Interpretability
Satvik Golechha & Sid Black (UK AISI) | Auditing Games for Sandbagging (slides)
Bartosz Cywinski (MATS) | How Model Organisms Can Enable Useful Interpretability Studies (slides)
David Bau (Northeastern University) | In Defense of Curiosity (slides)
Jake Mendel (Coefficient Giving) | How to Get Funding for Your Interpretability Research (slides)