OpenAI governance Overdue ⚠ contested OpenAI’s Preparedness Framework v2 (2025-04-15) commits to reviewing and potentially updating the framework at least once a year; the next annual review fell due around 2026-04-15.
65 days overdue
Why this ruling
The 2026-04-15 deadline is derived from the v2 "at least once a year" annual-review cadence (Preparedness Framework v2, 2025-04-15), not an OpenAI-stated date. No 2026 Preparedness Framework review had been published as of 2026-06-18. The May 2026 Frontier Governance Framework is a separate document and does not update the Preparedness Framework.
RSP v3.0 commits Anthropic to publishing a Risk Report (with minimal redactions) every three to six months; the first general Risk Report was published 2026-02-24, placing the next due by about 2026-08-24.
in 66 days
Why this ruling
Cadence derived from the RSP v3.0 clause "Risk Reports will be published online (with some redactions) every 3–6 months." Six-month outer bound from the 2026-02-24 report gives ~2026-08-24 (a derived next-date, not a lab-stated one).
RSP v3.0 commits Anthropic to an annual third-party review of compliance with its main procedural commitments; anchored to the 2026-02-24 v3.0 effective date, the next review falls due around 2027-02-24.
in 250 days
Why this ruling
Cadence derived from the RSP v3.0 commitment to an annual third-party procedural-compliance review; next-due ~2027-02-24 (one year from the v3.0 effective date, a derived date rather than a lab-stated one).
In an April 2025 essay, Anthropic CEO Dario Amodei stated: "Anthropic is doubling down on interpretability, and we have a goal of getting to 'interpretability can reliably detect most model problems' by 2027."
in 560 days
Why this ruling
A CEO-stated organizational goal rather than a precise, falsifiable deliverable; "by 2027" encoded as a 2027-12-31 checkpoint. Whether "most model problems" is measurable is debatable.
Anthropic safety-framework Pending ⚠ contested Anthropic’s Responsible Scaling Policy v1.0 stated its commitment was to write the ASL-4 measures before any model reaches ASL-3 capabilities.
awaiting before any model reaches ASL-3
Why this ruling
The original v1.0 trigger is documented. The trigger has since elapsed — Anthropic activated ASL-3 with Claude Opus 4 on 2025-05-22 — and v3.0 (Feb 2026) restructured away from the ASL-4 framing; whether the loosely-specified ASL-4 commitment was satisfied is genuinely disputed.
DeepMind’s FSF v1.0 stated an aim to evaluate models for every 6x increase in effective compute and every three months of fine-tuning progress.
awaiting every 6x effective compute / 3 months fine-tuning (FSF v1.0)
Why this ruling
This 6x / 3-month wording is v1.0 language; FSF v2.0 (2025) replaced the specific numbers with more flexible criteria.
xAI safety-framework Missed ⚠ contested As a Seoul Frontier AI Safety Commitments signatory (2024-05-21), xAI was due to publish a severe-risk safety framework by the Paris AI Action Summit (2025-02-10). By the deadline it had only a watermarked DRAFT Risk Management Framework, document-dated 2025-02-20.
resolved 10 days late
Why this ruling
No finalized framework existed by the 2025-02-10 deadline; xAI’s draft RMF is document-dated 2025-02-20 (~10 days late), watermarked DRAFT, and per The Midas Project applied only to systems not yet in development (excluding Grok 3). A lenient reading would call this partial (a draft was published); scored missed because the committed deliverable did not exist by the deadline — hence contested.
Mistral safety-framework Missed ⚠ contested As a Seoul Frontier AI Safety Commitments signatory (2024-05-21), Mistral AI was due to publish a severe-risk safety framework by the Paris AI Action Summit (2025-02-10). As of mid-2026, no such framework appears on independent indexes of published frontier safety policies.
resolved on time
Why this ruling
Mistral is a named Seoul signatory but does not appear on METR’s index of companies that have published frontier safety policies, and SaferAI rates it as having no published framework with capability thresholds. The missed ruling rests on absence from these published-framework indexes rather than a positive non-publication statement — hence contested.
Under the White House voluntary AI commitments (2023-07-21), signatories pledged to safeguard unreleased model weights — limiting access, running insider-threat detection, and securing storage. A 2025 study of 16 signatories found this the worst-performed of the eight commitments.
resolved 2025-08-11
Why this ruling
Assessed against public disclosures through 2024-12-31 in the Stanford-affiliated study “Do AI Companies Make Good on Voluntary Commitments to the White House?” — the lowest-scoring commitment, with 11 of 16 companies scoring 0%. Corroborated by RAND’s “Securing AI Model Weights” (2024). Scored as broadly missed across signatories. (The commitment was first signed by 7 companies on 2023-07-21; later cohorts — Sept 2023 and 2024 — brought the study’s assessed set to 16.)
OpenAI compute-pledge Missed ⚠ contested OpenAI committed (2023-07-05) to dedicate 20% of the compute secured to date over four years to the Superalignment effort; the team was dissolved in May 2024.
resolved 2024-05-17
Why this ruling
Reporting (six sources) says the compute was not fully delivered; OpenAI did not respond to the report and disputes the broader safety criticism. Team disbanded before the four years elapsed.
xAI safety-framework Partial ⚠ contested In a draft framework (~2025-02-20), xAI stated it would release an updated version of the policy within three months (a ~2025-05-10 deadline).
resolved 102 days late
Why this ruling
No updated policy was published by the ~2025-05-10 deadline; xAI published an updated Risk Management Framework on 2025-08-20, about three months late.
Multi-lab safety-framework Partial ⚠ contested At the AI Seoul Summit (2024-05-21), 16 companies — including OpenAI, Anthropic, Google, Microsoft, Meta, Amazon and xAI — signed the Frontier AI Safety Commitments, agreeing to publish a safety framework focused on severe risks by the next AI Summit, held in Paris on 2025-02-10/11.
resolved on time
Why this ruling
Most signatories published a framework by the Paris Summit (Meta, Google DeepMind, Microsoft, OpenAI, Amazon, G42 and others); coverage was uneven across the 16+ signatories and some frameworks arrived close to or just after the summit, so the collective ruling is debatable.
The RSP set a regular re-assessment cadence; a 2026-04-02 update extended a three-month evaluation interval to six months, citing rushed elicitation.
resolved 2026-04-02
Why this ruling
The interval was extended from three to six months; current policy frames Risk Reports as every 3–6 months. Whether this is a relaxation is debated.
Under the White House voluntary AI commitments (2023-07-21), signatories pledged to publish reports for significant model releases covering capabilities, limitations, and domains of appropriate and inappropriate use.
resolved 2025-08-11
Why this ruling
Shallow disclosure is near-universal — frontier labs publish system/model cards for major releases — but the 2025 study found deeper indicators (limitations, societal-risk discussion, adversarial-test results) met inconsistently. Recorded as partial; contested because the basic reporting bar is broadly met while the substantive bar is not.
Under the White House voluntary AI commitments (2023-07-21), signatories pledged bounty systems or contests to incentivize responsible third-party discovery and reporting of model weaknesses.
resolved 2025-08-11
Why this ruling
The 2025 study (disclosures through 2024-12-31) scored this second-lowest, with 8 of 16 companies at 0%. Frontier labs do run AI bug bounties (OpenAI, Anthropic, Microsoft, Google), but coverage is uneven across signatories — recorded as partial.
Under the White House voluntary AI commitments (2023-07-21), participating companies committed to develop robust mechanisms — including provenance and/or watermarking — so users can tell when audio or visual content is AI-generated.
resolved 2024-07-21
Why this ruling
Some signatories shipped provenance tooling (e.g. Google SynthID, C2PA Content Credentials), but a 2025 academic review found deployment across publicly available products was uneven a year on; recorded as partial.
Following the Bletchley commitments, reporting as of late April 2024 found that most labs had not provided the UK AI Safety Institute with pre-deployment model access.
resolved 2024-04-30
Why this ruling
As of late April 2024, only Google DeepMind had provided the UK AISI pre-deployment access; OpenAI, Anthropic and Meta had not. Access expanded later in 2024 (e.g. a joint US/UK evaluation). The underlying Bletchley commitment was a voluntary aspiration to deepen access, not a firm dated deadline — hence partial and contested.
Anthropic committed that its Long-Term Benefit Trust would elect a majority of the board within four years of its 2023 Series C; Trust-appointed directors reached a board majority on 2026-04-14.
resolved 523 days early
Why this ruling
Commitment: a Trust-appointed majority within ~4 years of the mid-2023 Series C. Majority reached with the Narasimhan appointment, within the window.
Anthropic published Responsible Scaling Policy version 3.0, effective 2026-02-24.
resolved on time
DeepMind’s Frontier Safety Framework v1.0 (May 2024) aimed to have the framework implemented by early 2025; FSF v2.0 was published on 2025-02-04.
resolved 25 days early
Why this ruling
FSF v1.0 stated an aim to have the framework "fully implemented by early 2025"; v2.0 (2025-02-04) specified the promised protocols and capability levels. Whether publishing v2.0 fulfills a commitment to "implement" is debatable; "early 2025" encoded as a 2025-03-01 checkpoint.
As a signatory of the Seoul Frontier AI Safety Commitments (2024-05-21), Amazon published its Frontier Model Safety Framework on 2025-02-09, ahead of the Paris AI Action Summit (2025-02-10/11).
resolved 1 day early
Why this ruling
Amazon is a named Seoul signatory; its Frontier Model Safety Framework (dated 2025-02-09) cites Amazon’s endorsement of the Korea Frontier AI Safety Commitments and was published the day before the Paris summit opened.
Microsoft published version 1 of its Frontier Governance Framework, dated 2025-02-08.
resolved 2 days early
Why this ruling
Document change log records 2025-02-08 as the first version, ahead of the Paris AI Action Summit.
Meta safety-framework Met Per the Seoul Frontier AI Safety Commitments, Meta published its Frontier AI Framework on 2025-02-03, ahead of the Paris AI Action Summit (2025-02-10/11).
resolved 7 days early
On 2024-08-29 the US AI Safety Institute (NIST) announced agreements with OpenAI and Anthropic for collaboration on AI safety research, testing and evaluation, including model access.
resolved on time
Why this ruling
A bilateral access agreement the labs entered with NIST / the US AI Safety Institute (government-announced), counted here as a commitment the labs signed onto.
The OpenAI board formed a Safety and Security Committee on 2024-05-28 with 90 days to make recommendations; the recommendations were published on 2024-09-16.
resolved 21 days late
Why this ruling
Recommendations were adopted and published; published a few weeks after the 90-day mark.
Anthropic published Responsible Scaling Policy version 1.0, effective 2023-09-19.
resolved on time
Anthropic committed at the Claude Opus 4.5 launch to publish sabotage risk reports for future frontier models; the first such report (covering Opus 4.6) was published on 2026-02-10.
resolved 2026-02-10
Why this ruling
Commitment made at the Opus 4.5 launch; first report fulfilling it covered Opus 4.6 (2026-02-10).
Anthropic’s Responsible Scaling Policy commits to applying ASL-3 Security and Deployment Standards before deploying a model that may have crossed the corresponding capability threshold. On 2025-05-22 Anthropic activated ASL-3 protections with the launch of Claude Opus 4.
resolved 2025-05-22
Why this ruling
Claude Opus 4 was the first Anthropic model deployed under ASL-3; Anthropic applied the standard as a precautionary measure without definitively determining the threshold had been crossed.
Under the White House voluntary AI commitments (2023-07-21), participating companies committed to internal and external red-team security testing before releasing models.
resolved 2023-07-21
Why this ruling
Pre-release red-teaming became broadly standard practice among signatories; recorded as broadly met.
No commitments match these filters.