kept = met ÷ resolved (resolved = met + missed + partial; counts shown so the number always carries its context).
RSP v3.0 commits Anthropic to publishing a Risk Report (with minimal redactions) every three to six months; the first general Risk Report was published 2026-02-24, placing the next due by about 2026-08-24.
in 66 days
Why this ruling
Cadence derived from the RSP v3.0 clause "Risk Reports will be published online (with some redactions) every 3–6 months." Six-month outer bound from the 2026-02-24 report gives ~2026-08-24 (a derived next-date, not a lab-stated one).
RSP v3.0 commits Anthropic to an annual third-party review of compliance with its main procedural commitments; anchored to the 2026-02-24 v3.0 effective date, the next review falls due around 2027-02-24.
in 250 days
Why this ruling
Cadence derived from the RSP v3.0 commitment to an annual third-party procedural-compliance review; next-due ~2027-02-24 (one year from the v3.0 effective date, a derived date rather than a lab-stated one).
In an April 2025 essay, Anthropic CEO Dario Amodei stated: "Anthropic is doubling down on interpretability, and we have a goal of getting to 'interpretability can reliably detect most model problems' by 2027."
in 560 days
Why this ruling
A CEO-stated organizational goal rather than a precise, falsifiable deliverable; "by 2027" encoded as a 2027-12-31 checkpoint. Whether "most model problems" is measurable is debatable.
Anthropic safety-framework Pending ⚠ contested Anthropic’s Responsible Scaling Policy v1.0 stated its commitment was to write the ASL-4 measures before any model reaches ASL-3 capabilities.
awaiting before any model reaches ASL-3
Why this ruling
The original v1.0 trigger is documented. The trigger has since elapsed — Anthropic activated ASL-3 with Claude Opus 4 on 2025-05-22 — and v3.0 (Feb 2026) restructured away from the ASL-4 framing; whether the loosely-specified ASL-4 commitment was satisfied is genuinely disputed.
The RSP set a regular re-assessment cadence; a 2026-04-02 update extended a three-month evaluation interval to six months, citing rushed elicitation.
resolved 2026-04-02
Why this ruling
The interval was extended from three to six months; current policy frames Risk Reports as every 3–6 months. Whether this is a relaxation is debated.
Anthropic committed that its Long-Term Benefit Trust would elect a majority of the board within four years of its 2023 Series C; Trust-appointed directors reached a board majority on 2026-04-14.
resolved 523 days early
Why this ruling
Commitment: a Trust-appointed majority within ~4 years of the mid-2023 Series C. Majority reached with the Narasimhan appointment, within the window.
Anthropic published Responsible Scaling Policy version 3.0, effective 2026-02-24.
resolved on time
Anthropic published Responsible Scaling Policy version 1.0, effective 2023-09-19.
resolved on time
Anthropic committed at the Claude Opus 4.5 launch to publish sabotage risk reports for future frontier models; the first such report (covering Opus 4.6) was published on 2026-02-10.
resolved 2026-02-10
Why this ruling
Commitment made at the Opus 4.5 launch; first report fulfilling it covered Opus 4.6 (2026-02-10).
Anthropic’s Responsible Scaling Policy commits to applying ASL-3 Security and Deployment Standards before deploying a model that may have crossed the corresponding capability threshold. On 2025-05-22 Anthropic activated ASL-3 protections with the launch of Claude Opus 4.
resolved 2025-05-22
Why this ruling
Claude Opus 4 was the first Anthropic model deployed under ASL-3; Anthropic applied the standard as a precautionary measure without definitively determining the threshold had been crossed.