Anthropic Open-Sources Circuit Tracing Tools to Illuminate AI Decision-Making
On May 29, 2025, Anthropic announced the open-sourcing of its circuit tracing tools, marking a significant advance in AI interpretability research. The tools generate attribution graphs that partially reveal the internal decision-making processes of large language models (LLMs). By visualizing how a model transforms inputs into outputs, researchers can gain deeper insight into its behavior.
Understanding Attribution Graphs
Attribution graphs serve as visual representations of the pathways and features activated within a model during inference. They illustrate the flow of information, highlighting which components contribute to specific outputs. This approach moves beyond analyzing individual neurons, focusing instead on interpretable features that align more closely with human-understandable concepts.
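As a rough illustration of the idea (not Anthropic's actual data format or algorithm), an attribution graph can be thought of as a weighted directed graph running from input tokens through interpretable features to output logits; a node's total influence on an output is the sum, over all paths, of the product of edge weights along each path. The node names below are hypothetical labels invented for this sketch:

```python
# Toy sketch of an attribution graph: nodes are input tokens, interpretable
# features, and output logits; edge weights are attribution strengths.
# Illustrative only -- not Anthropic's actual graph format.

# edges: source node -> {target node: attribution weight}
edges = {
    "token:Dallas": {"feature:Texas": 0.8},
    "token:capital": {"feature:say-a-capital": 0.7},
    "feature:Texas": {"feature:say-Austin": 0.6},
    "feature:say-a-capital": {"feature:say-Austin": 0.5},
    "feature:say-Austin": {"logit:Austin": 0.9},
}

def path_attribution(graph, src, dst):
    """Total influence of src on dst: sum over all paths of the
    product of edge weights (captures direct and indirect effects)."""
    if src == dst:
        return 1.0
    total = 0.0
    for nxt, weight in graph.get(src, {}).items():
        total += weight * path_attribution(graph, nxt, dst)
    return total

# Influence of the "Dallas" token on the "Austin" output logit:
influence = path_attribution(edges, "token:Dallas", "logit:Austin")
print(round(influence, 3))  # 0.8 * 0.6 * 0.9 = 0.432
```

Tracing which paths carry the most weight is what lets researchers see, for example, that a geography feature rather than memorized text is driving a particular answer.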
The development of these tools was led by Anthropic Fellows Michael Hanna and Mateusz Piotrowski, with mentorship from Emmanuel Ameisen and Jack Lindsey. An interactive frontend on Neuronpedia, implemented by Decode Research, lets users explore, annotate, and share attribution graphs.
Applications and Use Cases
Researchers have already employed these tools to investigate complex behaviors in models like Gemma-2-2b and Llama-3.2-1b. Notable findings include:
- Multi-Step Reasoning: Tracing how models perform layered reasoning tasks.
- Multilingual Representations: Understanding how models process information across different languages.
- Planning in Text Generation: Observing how models plan outputs in tasks like poetry composition.
These insights are detailed in Anthropic’s demo notebooks and further explored through the Neuronpedia interface.
Advancing AI Transparency
Anthropic’s CEO, Dario Amodei, has emphasized the importance of interpretability in AI development. By open-sourcing these tools, Anthropic aims to bridge the gap between AI capabilities and our understanding of their inner workings. This initiative invites the broader research community to contribute to and benefit from enhanced transparency in AI systems.
Getting Started
Researchers and developers can begin exploring attribution graphs through the Neuronpedia interface. For more advanced usage and customization, the open-source code repository on GitHub provides comprehensive resources. Anthropic encourages community engagement to refine these tools further and extend them to additional AI models.