Mechanistic Interpretability Workshop at NeurIPS 2025

December 6 or 7, 2025 • San Diego, California

Call for papers open • Due August 22nd • Non-archival • 4-page or 9-page limit • NeurIPS submissions welcome

As neural networks grow in influence and capability, understanding the mechanisms behind their decisions remains a fundamental scientific challenge. This gap between performance and understanding limits our ability to predict model behavior, ensure reliability, and detect sophisticated adversarial or deceptive behavior. Many of the deepest scientific mysteries in machine learning may remain out of reach if we cannot look inside the black box.

Mechanistic interpretability addresses this challenge by developing principled methods to analyze and understand a model's internals (its weights and activations) and to use this understanding to gain greater insight into its behavior and the computation underlying it.

The field has grown rapidly, with sizable communities in academia, industry, and independent research; 140+ papers submitted to our ICML 2024 workshop; dedicated startups; and a rich ecosystem of tools and techniques. This workshop's goal is to bring together diverse perspectives from the community to discuss recent advances, build common understanding, and chart future directions.

The first Mechanistic Interpretability Workshop (ICML 2024)

Keynote Speakers

Chris Olah

Interpretability Lead and Co-founder, Anthropic

Been Kim

Senior Staff Research Scientist, Google DeepMind

Sarah Schwettmann

Co-founder, Transluce

Workshop Goals

The mechanistic interpretability field benefits from a rich diversity of approaches—from rigorous mathematical analysis to large-scale empirical studies, from reverse-engineering a model via bottom-up circuit analysis, to assisting behavioral analysis via top-down analysis of model representations. But all are unified by the belief that there is meaning and structure to be found inside neural networks, and that this is worth studying.

This diversity reflects the field's breadth and the many valid paths toward understanding neural networks. But researchers in these different sub-communities often lack natural venues to meet. Our workshop aims to provide such a venue, and to explore points of active debate in the field.

In this workshop, we hope to bring together researchers from across these many perspectives and communities—along with skeptics, experts in adjacent fields, and those simply curious to learn more—to facilitate healthy discussion and move towards a greater mutual understanding as a field.

Through our call for papers, we hope to facilitate the sharing of work in this fast-moving field, across all of these axes, and especially work that helps to bridge these gaps. We welcome any submissions that seek to further our ability to use the internals of models to achieve understanding, regardless of how unconventional the approach may be. Please see the call for papers page for further details and particular topics of interest.

We welcome attendees from all backgrounds, regardless of your prior research experience or whether you have work published at this workshop. Note that while you do not need to be registered for the NeurIPS main conference to attend this workshop, you do need to be registered for the NeurIPS workshop track. No further registration is needed; seating is first-come, first-served.

Learning More

Here are some resources you may find useful for learning more about the mechanistic interpretability field and performing research:

Resources for doing research:

Relevant online communities:

Schedule (Provisional)

Time            Activity
09:00 - 09:30   Welcome and survey talk
09:30 - 10:00   Talk: Been Kim
10:00 - 11:00   Contributed talks 1
11:00 - 12:00   Poster session 1, coffee
12:00 - 13:00   Lunch with organised discussions
13:00 - 13:30   Talk: Sarah Schwettmann
13:30 - 14:30   Contributed talks 2
14:30 - 15:30   Poster session 2, coffee
15:30 - 16:00   Talk: Chris Olah
16:00 - 16:30   Coffee & networking break
16:30 - 17:20   Panel discussion
17:20 - 17:30   Awards & closing
19:00 - 22:00   Evening social (invite-only)

Organizing Committee

Neel Nanda

Senior Research Scientist, Google DeepMind

Martin Wattenberg

Professor, Harvard University & Principal Research Scientist, Google DeepMind

Sarah Wiegreffe

Postdoc, Allen Institute for AI; incoming Assistant Professor, University of Maryland

Atticus Geiger

Lead, Pr(Ai)²R Group

Julius Adebayo

Founder and Researcher, Guide Labs

Kayo Yin

3rd-year PhD student, UC Berkeley

Fazl Barez

Senior Research Fellow, Oxford Martin AI Governance Initiative

Lawrence Chan

Researcher, METR

Matthew Wearden

London Director, MATS

Contact

Email: neurips2025@mechinterpworkshop.com