Anthropic — Overdue

Anthropic transparency Upcoming ⚠ contested

Publish a Risk Report every three to six months

RSP v3.0 commits Anthropic to publishing a Risk Report (with minimal redactions) every three to six months; the first general Risk Report was published 2026-02-24, placing the next due by about 2026-08-24.

in 66 days

Why this ruling

Cadence derived from the RSP v3.0 clause "Risk Reports will be published online (with some redactions) every 3–6 months." Six-month outer bound from the 2026-02-24 report gives ~2026-08-24 (a derived next-date, not a lab-stated one).

Anthropic governance Upcoming ⚠ contested

Annual third-party review of Responsible Scaling Policy compliance

RSP v3.0 commits Anthropic to an annual third-party review of compliance with its main procedural commitments; anchored to the 2026-02-24 v3.0 effective date, the next review falls due around 2027-02-24.

in 250 days

Why this ruling

Cadence derived from the RSP v3.0 commitment to an annual third-party procedural-compliance review; next-due ~2027-02-24 (one year from the v3.0 effective date, a derived date rather than a lab-stated one).

Anthropic evaluations Upcoming ⚠ contested

Interpretability that can "reliably detect most model problems" by 2027

In an April 2025 essay, Anthropic CEO Dario Amodei stated: "Anthropic is doubling down on interpretability, and we have a goal of getting to 'interpretability can reliably detect most model problems' by 2027."

in 560 days

Why this ruling

A CEO-stated organizational goal rather than a precise, falsifiable deliverable; "by 2027" encoded as a 2027-12-31 checkpoint. Whether "most model problems" is measurable is debatable.

Anthropic safety-framework Pending ⚠ contested

Define ASL-4 safeguards before reaching ASL-3

Anthropic’s Responsible Scaling Policy v1.0 stated its commitment was to write the ASL-4 measures before any model reaches ASL-3 capabilities.

awaiting before any model reaches ASL-3

Why this ruling

The original v1.0 trigger is documented. The trigger has since elapsed — Anthropic activated ASL-3 with Claude Opus 4 on 2025-05-22 — and v3.0 (Feb 2026) restructured away from the ASL-4 framing; whether the loosely-specified ASL-4 commitment was satisfied is genuinely disputed.

Anthropic evaluations Partial ⚠ contested

Capability re-assessment cadence in the Responsible Scaling Policy

The RSP set a regular re-assessment cadence; a 2026-04-02 update extended a three-month evaluation interval to six months, citing rushed elicitation.

resolved 2026-04-02

Why this ruling

The interval was extended from three to six months; current policy frames Risk Reports as every 3–6 months. Whether this is a relaxation is debated.

Anthropic governance Met

Long-Term Benefit Trust to elect a majority of the board

Anthropic committed that its Long-Term Benefit Trust would elect a majority of the board within four years of its 2023 Series C; Trust-appointed directors reached a board majority on 2026-04-14.

resolved 523 days early

Why this ruling

Commitment: a Trust-appointed majority within ~4 years of the mid-2023 Series C. Majority reached with the Narasimhan appointment, within the window.

Anthropic safety-framework Met

Publish Responsible Scaling Policy v3.0

Anthropic published Responsible Scaling Policy version 3.0, effective 2026-02-24.

resolved on time

Anthropic safety-framework Met

Publish a Responsible Scaling Policy (v1.0)

Anthropic published Responsible Scaling Policy version 1.0, effective 2023-09-19.

resolved on time

Anthropic transparency Met

Publish sabotage risk reports for future frontier models

Anthropic committed at the Claude Opus 4.5 launch to publish sabotage risk reports for future frontier models; the first such report (covering Opus 4.6) was published on 2026-02-10.

resolved 2026-02-10

Why this ruling

Commitment made at the Opus 4.5 launch; first report fulfilling it covered Opus 4.6 (2026-02-10).

Anthropic security Met

Apply ASL-3 safeguards when a model may reach the ASL-3 threshold

Anthropic’s Responsible Scaling Policy commits to applying ASL-3 Security and Deployment Standards before deploying a model that may have crossed the corresponding capability threshold. On 2025-05-22 Anthropic activated ASL-3 protections with the launch of Claude Opus 4.

resolved 2025-05-22

Why this ruling

Claude Opus 4 was the first Anthropic model deployed under ASL-3; Anthropic applied the standard as a precautionary measure without definitively determining the threshold had been crossed.