Interpretability that can "reliably detect most model problems" by 2027

In an April 2025 essay, Anthropic CEO Dario Amodei stated: "Anthropic is doubling down on interpretability, and we have a goal of getting to 'interpretability can reliably detect most model problems' by 2027."

Committed 2025-04-24
Due 2027-12-31
Evaluated —
Ruling —

Why this ruling

A CEO-stated organizational goal rather than a precise, falsifiable deliverable; "by 2027" encoded as a 2027-12-31 checkpoint. Whether "most model problems" is measurable is debatable.

Source: Dario Amodei (Anthropic) ↗
Committed: 2025-04-24
As of: 2026-06-19
Contested: This ruling is genuinely disputable.

Cite this commitment

Overdue. "Interpretability that can "reliably detect most model problems" by 2027." Overdue, 2025. https://overduetracker.org/c/anthropic-interpretability-2027 (retrieved 2026-06-19).