Schedule

The workshop will be held on Sunday, December 7, 2025, at the NeurIPS venue in San Diego, California (Upper Level, Room 30A-E).

Workshop Program

Time	Activity
8:00am – 9:00am	Socializing
9:00am – 9:15am	Coffee break
9:15am – 9:30am	Opening remarks
9:30am – 10:00am	Been Kim · Towards a Pareto Frontier of Interpretability: 15 Years of Research in 15 Mins
10:00am – 10:30am	Sarah Schwettmann · Scalable End-to-End Interpretability
10:30am – 11:00am	Coffee break
11:00am – 11:30am	Spotlight talks (session 1)
11:30am – 12:30pm	Posters (session 1)
12:30pm – 1:30pm	Lunch break
1:30pm – 2:00pm	Spotlight talks (session 2)
2:00pm – 3:00pm	Posters (session 2)
3:00pm – 3:30pm	Coffee break
3:30pm – 4:00pm	Chris Olah (Remote) · Reflections on Interpretability
4:00pm – 5:00pm	Invited lightning talks

Speaker	Title
Ekdeep Singh Lubana (Goodfire)	What is the Right Basis for Computation? (slides)
Adam Belfki (NDIF / Northeastern University)	Infrastructure for Your Interpretability Research (slides)
Josh Engels (Google DeepMind)	A Pragmatic Vision for Interpretability (slides)
Rowan Wang (Anthropic)	Just Ask the Model: Interpreting Models with Clever Prompting (slides)
Sheridan Feucht (Northeastern University)	Equifinality: There’s Often Multiple Explanations (slides)
Uzay Macar (Anthropic Fellows)	Reasoning Model Interpretability Needs New Techniques (slides)
Leo Gao (OpenAI)	An Ambitious Vision for Interpretability
Satvik Golechha & Sid Black (UK AISI)	Auditing Games for Sandbagging (slides)
Bartosz Cywinski (MATS)	How Model Organisms Can Enable Useful Interpretability Studies (slides)
David Bau (Northeastern University)	In Defense of Curiosity (slides)
Jake Mendel (Coefficient Giving)	How to Get Funding for Your Interpretability Research (slides)