Overdue — Frontier AI Safety Commitment Tracker

1 Overdue now

3 Upcoming

4 Missed

12 Met

7 Partial

2 Pending


OpenAI governance Overdue ⚠ contested

Review the Preparedness Framework at least once a year

OpenAI’s Preparedness Framework v2 (2025-04-15) commits to reviewing and potentially updating the framework at least once a year; the next annual review fell due around 2026-04-15.

65 days overdue

Why this ruling

The 2026-04-15 deadline is derived from the v2 "at least once a year" annual-review cadence (Preparedness Framework v2, 2025-04-15), not an OpenAI-stated date. No 2026 Preparedness Framework review had been published as of 2026-06-18. The May 2026 Frontier Governance Framework is a separate document and does not update the Preparedness Framework.
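The derived deadline and the overdue counter come down to simple date arithmetic. A minimal sketch, assuming the anniversary is the same calendar day one year after the v2 date, and assuming an as-of date of 2026-06-19 (inferred from the page's counters, not stated by the tracker). The function names are hypothetical.

```python
from datetime import date

def add_one_year(d: date) -> date:
    """Same calendar day one year later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)

def days_overdue(deadline: date, as_of: date) -> int:
    """Whole days elapsed since the deadline; 0 if it is not yet due."""
    return max((as_of - deadline).days, 0)

v2_published = date(2025, 4, 15)        # Preparedness Framework v2
deadline = add_one_year(v2_published)   # derived date, not OpenAI-stated
as_of = date(2026, 6, 19)               # assumed as-of date
print(deadline.isoformat(), days_overdue(deadline, as_of))  # 2026-04-15 65
```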

Anthropic transparency Upcoming ⚠ contested

Publish a Risk Report every three to six months

RSP v3.0 commits Anthropic to publishing a Risk Report (with minimal redactions) every three to six months; the first general Risk Report was published 2026-02-24, so the next report is due by about 2026-08-24.

in 66 days

Why this ruling

Cadence derived from the RSP v3.0 clause "Risk Reports will be published online (with some redactions) every 3–6 months." Six-month outer bound from the 2026-02-24 report gives ~2026-08-24 (a derived next-date, not a lab-stated one).
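The three-to-six-month window can be reproduced with calendar-month arithmetic. This is a sketch under stated assumptions: `add_months` is a hypothetical helper that clamps to the last day of shorter months, and the as-of date of 2026-06-19 is inferred from the page's counters rather than stated by the tracker.

```python
import calendar
from datetime import date

def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping to the month's last day."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))

first_report = date(2026, 2, 24)         # first general Risk Report
earliest = add_months(first_report, 3)   # inner bound of the window
latest = add_months(first_report, 6)     # outer bound, used as the next-due date
as_of = date(2026, 6, 19)                # assumed as-of date
print(earliest, latest, (latest - as_of).days)  # 2026-05-24 2026-08-24 66
```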

Anthropic governance Upcoming ⚠ contested

Annual third-party review of Responsible Scaling Policy compliance

RSP v3.0 commits Anthropic to an annual third-party review of compliance with its main procedural commitments; anchored to the 2026-02-24 v3.0 effective date, the next review falls due around 2027-02-24.

in 250 days

Why this ruling

Cadence derived from the RSP v3.0 commitment to an annual third-party procedural-compliance review; next-due ~2027-02-24 (one year from the v3.0 effective date, a derived date rather than a lab-stated one).

Anthropic evaluations Upcoming ⚠ contested

Interpretability that can "reliably detect most model problems" by 2027

In an April 2025 essay, Anthropic CEO Dario Amodei stated: "Anthropic is doubling down on interpretability, and we have a goal of getting to 'interpretability can reliably detect most model problems' by 2027."

in 560 days

Why this ruling

A CEO-stated organizational goal rather than a precise, falsifiable deliverable; "by 2027" encoded as a 2027-12-31 checkpoint. Whether "most model problems" is measurable is debatable.

Anthropic safety-framework Pending ⚠ contested

Define ASL-4 safeguards before reaching ASL-3

Anthropic’s Responsible Scaling Policy v1.0 stated its commitment was to write the ASL-4 measures before any model reaches ASL-3 capabilities.

trigger (before any model reaches ASL-3) elapsed 2025-05-22; ruling pending

Why this ruling

The original v1.0 trigger is documented. The trigger has since elapsed (Anthropic activated ASL-3 with Claude Opus 4 on 2025-05-22), and v3.0 (Feb 2026) restructured away from the ASL-4 framing; whether the loosely specified ASL-4 commitment was satisfied is genuinely disputed.

Google DeepMind evaluations Pending

Evaluate models at a compute / fine-tuning cadence

DeepMind’s FSF v1.0 stated an aim to evaluate models for every 6x increase in effective compute and every three months of fine-tuning progress.

awaiting every 6x effective compute / 3 months fine-tuning (FSF v1.0)

Why this ruling

This 6x / 3-month wording is v1.0 language; FSF v2.0 (2025) replaced the specific numbers with more flexible criteria.

xAI safety-framework Missed ⚠ contested

Publish a frontier safety framework by the Paris AI Action Summit

As a Seoul Frontier AI Safety Commitments signatory (2024-05-21), xAI was due to publish a severe-risk safety framework by the Paris AI Action Summit (2025-02-10). By the deadline it had only a watermarked DRAFT Risk Management Framework, document-dated 2025-02-20.

resolved 10 days late

Why this ruling

No finalized framework existed by the 2025-02-10 deadline; xAI’s draft RMF is document-dated 2025-02-20 (~10 days late), watermarked DRAFT, and per The Midas Project applied only to systems not yet in development (excluding Grok 3). A lenient reading would call this partial (a draft was published); scored missed because the committed deliverable did not exist by the deadline — hence contested.

Mistral safety-framework Missed ⚠ contested

Publish a frontier safety framework by the Paris AI Action Summit

As a Seoul Frontier AI Safety Commitments signatory (2024-05-21), Mistral AI was due to publish a severe-risk safety framework by the Paris AI Action Summit (2025-02-10). As of mid-2026, no such framework appears on independent indexes of published frontier safety policies.

no framework found on published-framework indexes as of mid-2026

Why this ruling

Mistral is a named Seoul signatory but does not appear on METR’s index of companies that have published frontier safety policies, and SaferAI rates it as having no published framework with capability thresholds. The missed ruling rests on absence from these published-framework indexes rather than a positive non-publication statement — hence contested.

Multi-lab security Missed

Protect unreleased model weights (2023 voluntary commitments)

Under the White House voluntary AI commitments (2023-07-21), signatories pledged to safeguard unreleased model weights by limiting access, running insider-threat detection, and securing storage. A 2025 study of 16 signatories found this the lowest-scoring of the eight commitments.

resolved 2025-08-11

Why this ruling

Assessed against public disclosures through 2024-12-31 in the Stanford-affiliated study “Do AI Companies Make Good on Voluntary Commitments to the White House?” — the lowest-scoring commitment, with 11 of 16 companies scoring 0%. Corroborated by RAND’s “Securing AI Model Weights” (2024). Scored as broadly missed across signatories. (The commitment was first signed by 7 companies on 2023-07-21; later cohorts — Sept 2023 and 2024 — brought the study’s assessed set to 16.)

OpenAI compute-pledge Missed ⚠ contested

Dedicate 20% of compute to superalignment over four years

OpenAI committed (2023-07-05) to dedicate 20% of the compute secured to date over four years to the Superalignment effort; the team was dissolved in May 2024.

resolved 2024-05-17

Why this ruling

Reporting (six sources) says the compute was not fully delivered; OpenAI did not respond to the report and disputes the broader safety criticism. Team disbanded before the four years elapsed.

xAI safety-framework Partial ⚠ contested

Publish an updated policy within three months

In a draft framework (~2025-02-20), xAI stated it would release an updated version of the policy within three months (a ~2025-05-10 deadline).

resolved 102 days late

Why this ruling

No updated policy was published by the ~2025-05-10 deadline; xAI published an updated Risk Management Framework on 2025-08-20, 102 days (a little over three months) late.

Multi-lab safety-framework Partial ⚠ contested

Publish a frontier safety framework before the Paris AI Action Summit

At the AI Seoul Summit (2024-05-21), 16 companies — including OpenAI, Anthropic, Google, Microsoft, Meta, Amazon and xAI — signed the Frontier AI Safety Commitments, agreeing to publish a safety framework focused on severe risks by the next AI Summit, held in Paris on 2025-02-10/11.

resolved on time

Why this ruling

Most signatories published a framework by the Paris Summit (Meta, Google DeepMind, Microsoft, OpenAI, Amazon, G42 and others); coverage was uneven across the 16+ signatories and some frameworks arrived close to or just after the summit, so the collective ruling is debatable.

Anthropic evaluations Partial ⚠ contested

Capability re-assessment cadence in the Responsible Scaling Policy

The RSP set a regular re-assessment cadence; a 2026-04-02 update extended a three-month evaluation interval to six months, citing rushed elicitation.

resolved 2026-04-02

Why this ruling

The interval was extended from three to six months; current policy frames Risk Reports as every 3–6 months. Whether this is a relaxation is debated.

Multi-lab transparency Partial ⚠ contested

Publicly report model capabilities and limitations (2023 voluntary commitments)

Under the White House voluntary AI commitments (2023-07-21), signatories pledged to publish reports for significant model releases covering capabilities, limitations, and domains of appropriate and inappropriate use.

resolved 2025-08-11

Why this ruling

Shallow disclosure is near-universal — frontier labs publish system/model cards for major releases — but the 2025 study found deeper indicators (limitations, societal-risk discussion, adversarial-test results) met inconsistently. Recorded as partial; contested because the basic reporting bar is broadly met while the substantive bar is not.

Multi-lab security Partial

Incentivize third-party vulnerability reporting (2023 voluntary commitments)

Under the White House voluntary AI commitments (2023-07-21), signatories pledged bounty systems or contests to incentivize responsible third-party discovery and reporting of model weaknesses.

resolved 2025-08-11

Why this ruling

The 2025 study (disclosures through 2024-12-31) scored this second-lowest, with 8 of 16 companies at 0%. Frontier labs do run AI bug bounties (OpenAI, Anthropic, Microsoft, Google), but coverage is uneven across signatories — recorded as partial.

Multi-lab transparency Partial ⚠ contested

Develop provenance or watermarking for AI-generated content (2023 voluntary commitments)

Under the White House voluntary AI commitments (2023-07-21), participating companies committed to develop robust mechanisms — including provenance and/or watermarking — so users can tell when audio or visual content is AI-generated.

resolved 2024-07-21

Why this ruling

Some signatories shipped provenance tooling (e.g. Google SynthID, C2PA Content Credentials), but a 2025 academic review found deployment across publicly available products was uneven a year on; recorded as partial.

Multi-lab access Partial ⚠ contested

Provide the UK AI Safety Institute pre-deployment model access

Following the Bletchley commitments, reporting as of late April 2024 found that most labs had not provided the UK AI Safety Institute with pre-deployment model access.

resolved 2024-04-30

Why this ruling

As of late April 2024, only Google DeepMind had provided the UK AISI pre-deployment access; OpenAI, Anthropic and Meta had not. Access expanded later in 2024 (e.g. a joint US/UK evaluation). The underlying Bletchley commitment was a voluntary aspiration to deepen access, not a firm dated deadline — hence partial and contested.

Anthropic governance Met

Long-Term Benefit Trust to elect a majority of the board

Anthropic committed that its Long-Term Benefit Trust would elect a majority of the board within four years of its 2023 Series C; Trust-appointed directors reached a board majority on 2026-04-14.

resolved 523 days early

Why this ruling

Commitment: a Trust-appointed majority within ~4 years of the mid-2023 Series C. Majority reached with the Narasimhan appointment, within the window.

Anthropic safety-framework Met

Publish Responsible Scaling Policy v3.0

Anthropic published Responsible Scaling Policy version 3.0, effective 2026-02-24.

resolved on time

Google DeepMind safety-framework Met ⚠ contested

Implement the Frontier Safety Framework by early 2025

DeepMind’s Frontier Safety Framework v1.0 (May 2024) aimed to have the framework implemented by early 2025; FSF v2.0 was published on 2025-02-04.

resolved 25 days early

Why this ruling

FSF v1.0 stated an aim to have the framework "fully implemented by early 2025"; v2.0 (2025-02-04) specified the promised protocols and capability levels. Whether publishing v2.0 fulfills a commitment to "implement" is debatable; "early 2025" encoded as a 2025-03-01 checkpoint.
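Loosely worded deadlines have to be turned into concrete dates before they can be scored. The sketch below covers only the two encodings this page states ("early 2025" as 2025-03-01, "by 2027" as 2027-12-31). Generalizing them to any year is an assumption, and `encode_checkpoint` is a hypothetical name.

```python
import re
from datetime import date

def encode_checkpoint(phrase: str) -> date:
    """Map a loosely worded deadline to a concrete checkpoint date."""
    m = re.fullmatch(r"(early|by) (\d{4})", phrase.strip().lower())
    if m is None:
        raise ValueError(f"unrecognized deadline phrase: {phrase!r}")
    kind, year = m.group(1), int(m.group(2))
    # Assumed rule: "early YYYY" -> 1 March, "by YYYY" -> 31 December.
    return date(year, 3, 1) if kind == "early" else date(year, 12, 31)

checkpoint = encode_checkpoint("early 2025")
published = date(2025, 2, 4)                       # FSF v2.0
print(checkpoint, (checkpoint - published).days)   # 2025-03-01 25
```

A different checkpoint choice for "early 2025" (for example the end of March) would change the day count but not the ruling, since publication would still precede it.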

Amazon safety-framework Met

Publish a frontier safety framework by the Paris AI Action Summit

As a signatory of the Seoul Frontier AI Safety Commitments (2024-05-21), Amazon published its Frontier Model Safety Framework on 2025-02-09, ahead of the Paris AI Action Summit (2025-02-10/11).

resolved 1 day early

Why this ruling

Amazon is a named Seoul signatory; its Frontier Model Safety Framework (dated 2025-02-09) cites Amazon’s endorsement of the Korea Frontier AI Safety Commitments and was published the day before the Paris summit opened.

Microsoft safety-framework Met

Publish a Frontier Governance Framework (v1)

Microsoft published version 1 of its Frontier Governance Framework, dated 2025-02-08.

resolved 2 days early

Why this ruling

Document change log records 2025-02-08 as the first version, ahead of the Paris AI Action Summit.

Meta safety-framework Met

Publish a frontier safety framework by the Paris AI Action Summit

Per the Seoul Frontier AI Safety Commitments, Meta published its Frontier AI Framework on 2025-02-03, ahead of the Paris AI Action Summit (2025-02-10/11).

resolved 7 days early

Multi-lab access Met

Sign US AI Safety Institute access agreements with OpenAI and Anthropic

On 2024-08-29 the US AI Safety Institute (NIST) announced agreements with OpenAI and Anthropic for collaboration on AI safety research, testing and evaluation, including model access.

resolved on time

Why this ruling

A bilateral access agreement the labs entered with NIST / the US AI Safety Institute (government-announced), counted here as a commitment the labs signed onto.

OpenAI governance Met

Deliver Safety and Security Committee recommendations within 90 days

The OpenAI board formed a Safety and Security Committee on 2024-05-28 with 90 days to make recommendations; the recommendations were published on 2024-09-16.

resolved 21 days late

Why this ruling

Recommendations were adopted and published 21 days after the 90-day mark.
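The 90-day mark and the resulting status line can be checked directly. A minimal sketch, assuming "on time" means resolution on the deadline day itself; `resolution_label` is a hypothetical helper that mimics the page's status wording, not the tracker's actual code.

```python
from datetime import date, timedelta

def resolution_label(deadline: date, resolved: date) -> str:
    """Format the gap between deadline and resolution as a status line."""
    delta = (resolved - deadline).days
    if delta == 0:
        return "resolved on time"
    unit = "day" if abs(delta) == 1 else "days"
    direction = "late" if delta > 0 else "early"
    return f"resolved {abs(delta)} {unit} {direction}"

formed = date(2024, 5, 28)                # committee formed
deadline = formed + timedelta(days=90)    # 2024-08-26
print(resolution_label(deadline, date(2024, 9, 16)))  # resolved 21 days late
```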

Anthropic safety-framework Met

Publish a Responsible Scaling Policy (v1.0)

Anthropic published Responsible Scaling Policy version 1.0, effective 2023-09-19.

resolved on time

Anthropic transparency Met

Publish sabotage risk reports for future frontier models

Anthropic committed at the Claude Opus 4.5 launch to publish sabotage risk reports for future frontier models; the first such report (covering Opus 4.6) was published on 2026-02-10.

resolved 2026-02-10

Why this ruling

Commitment made at the Opus 4.5 launch; first report fulfilling it covered Opus 4.6 (2026-02-10).

Anthropic security Met

Apply ASL-3 safeguards when a model may reach the ASL-3 threshold

Anthropic’s Responsible Scaling Policy commits to applying ASL-3 Security and Deployment Standards before deploying a model that may have crossed the corresponding capability threshold. On 2025-05-22 Anthropic activated ASL-3 protections with the launch of Claude Opus 4.

resolved 2025-05-22

Why this ruling

Claude Opus 4 was the first Anthropic model deployed under ASL-3; Anthropic applied the standard as a precautionary measure without definitively determining the threshold had been crossed.

Multi-lab security Met

Internal and external security testing before model release (2023 voluntary commitments)

Under the White House voluntary AI commitments (2023-07-21), participating companies committed to internal and external red-team security testing before releasing models.

resolved 2023-07-21

Why this ruling

Pre-release red-teaming became broadly standard practice among signatories; recorded as broadly met.

Upcoming regulatory milestones

Context, not scored. These are laws (e.g. the EU AI Act), not promises a lab made — shown as countdowns, never marked "missed".
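The countdown rule for these milestones can be sketched as follows. This is an illustration only: `milestone_label` is a hypothetical name, and the as-of date of 2026-06-19 is an assumption chosen because it reproduces the page's counters.

```python
from datetime import date

def milestone_label(applies_on: date, as_of: date) -> str:
    """Countdown for a legal milestone; it is never scored as 'missed'."""
    if applies_on > as_of:
        return f"in {(applies_on - as_of).days} days"
    return f"in force since {applies_on.isoformat()}"

as_of = date(2026, 6, 19)                         # assumed as-of date
print(milestone_label(date(2027, 8, 2), as_of))   # in 409 days
print(milestone_label(date(2025, 8, 2), as_of))   # in force since 2025-08-02
```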

Multi-lab governance

EU General-Purpose AI Code of Practice published

The European Commission published the final General-Purpose AI Code of Practice on 2025-07-10. It is a voluntary instrument under the EU AI Act, not a promise a lab made.

in force since 2025-07-10

Source: European Commission ↗

Multi-lab governance

EU AI Act general-purpose AI obligations begin to apply

Under EU AI Act Article 113, the general-purpose AI model obligations (Chapter V) began to apply on 2025-08-02, 12 months after the Act entered into force.

in force since 2025-08-02

Source: EU AI Act ↗

Multi-lab governance

EU AI Act obligations apply to GPAI models already on the market

Under EU AI Act Article 111, providers of general-purpose AI models placed on the market before 2025-08-02 must comply with the GPAI obligations by 2027-08-02.

in 409 days

Source: EU AI Act ↗

Multi-lab governance

EU AI Act high-risk (Annex III) obligations apply

The EU AI Act originally applied Annex III (standalone, use-based) high-risk obligations from 2026-08-02. The Digital Omnibus deal (provisional political agreement, May 2026) defers them to 2027-12-02.

in 531 days

Source: Hogan Lovells ↗

Multi-lab governance

EU AI Act high-risk (Annex I) obligations apply

The Article 6(1) obligations for high-risk systems that are safety components of products covered by EU product law (Annex I) originally applied from 2027-08-02. The Digital Omnibus deal (provisional agreement, May 2026) defers them to 2028-08-02.

in 775 days

Source: Hogan Lovells ↗

Multi-lab governance

EU AI Act first Commission evaluation and review

Under EU AI Act Article 112, the Commission must evaluate the need to amend the high-risk list and related provisions by 2028-08-02, then every four years.

in 775 days

Source: EU AI Act ↗

Latest updates

Overdue launches 2026-06-18
