kept = met ÷ resolved (resolved = met + missed + partial; counts shown so the number always carries its context).
Under the White House voluntary AI commitments (2023-07-21), signatories pledged to safeguard unreleased model weights — limiting access, running insider-threat detection, and securing storage. A 2025 study of 16 signatories found this the worst-performed of the eight commitments.
resolved 2025-08-11
Why this ruling
Assessed against public disclosures through 2024-12-31 in the Stanford-affiliated study “Do AI Companies Make Good on Voluntary Commitments to the White House?” — the lowest-scoring commitment, with 11 of 16 companies scoring 0%. Corroborated by RAND’s “Securing AI Model Weights” (2024). Scored as broadly missed across signatories. (The commitment was first signed by 7 companies on 2023-07-21; later cohorts — Sept 2023 and 2024 — brought the study’s assessed set to 16.)
Multi-lab safety-framework Partial ⚠ contested At the AI Seoul Summit (2024-05-21), 16 companies — including OpenAI, Anthropic, Google, Microsoft, Meta, Amazon and xAI — signed the Frontier AI Safety Commitments, agreeing to publish a safety framework focused on severe risks by the next AI Summit, held in Paris on 2025-02-10/11.
resolved on time
Why this ruling
Most signatories published a framework by the Paris Summit (Meta, Google DeepMind, Microsoft, OpenAI, Amazon, G42 and others); coverage was uneven across the 16+ signatories and some frameworks arrived close to or just after the summit, so the collective ruling is debatable.
Under the White House voluntary AI commitments (2023-07-21), signatories pledged to publish reports for significant model releases covering capabilities, limitations, and domains of appropriate and inappropriate use.
resolved 2025-08-11
Why this ruling
Shallow disclosure is near-universal — frontier labs publish system/model cards for major releases — but the 2025 study found deeper indicators (limitations, societal-risk discussion, adversarial-test results) met inconsistently. Recorded as partial; contested because the basic reporting bar is broadly met while the substantive bar is not.
Under the White House voluntary AI commitments (2023-07-21), signatories pledged bounty systems or contests to incentivize responsible third-party discovery and reporting of model weaknesses.
resolved 2025-08-11
Why this ruling
The 2025 study (disclosures through 2024-12-31) scored this second-lowest, with 8 of 16 companies at 0%. Frontier labs do run AI bug bounties (OpenAI, Anthropic, Microsoft, Google), but coverage is uneven across signatories — recorded as partial.
Under the White House voluntary AI commitments (2023-07-21), participating companies committed to develop robust mechanisms — including provenance and/or watermarking — so users can tell when audio or visual content is AI-generated.
resolved 2024-07-21
Why this ruling
Some signatories shipped provenance tooling (e.g. Google SynthID, C2PA Content Credentials), but a 2025 academic review found deployment across publicly available products was uneven a year on; recorded as partial.
Following the Bletchley commitments, reporting as of late April 2024 found that most labs had not provided the UK AI Safety Institute with pre-deployment model access.
resolved 2024-04-30
Why this ruling
As of late April 2024, only Google DeepMind had provided the UK AISI pre-deployment access; OpenAI, Anthropic and Meta had not. Access expanded later in 2024 (e.g. a joint US/UK evaluation). The underlying Bletchley commitment was a voluntary aspiration to deepen access, not a firm dated deadline — hence partial and contested.
On 2024-08-29 the US AI Safety Institute (NIST) announced agreements with OpenAI and Anthropic for collaboration on AI safety research, testing and evaluation, including model access.
resolved on time
Why this ruling
A bilateral access agreement the labs entered with NIST / the US AI Safety Institute (government-announced), counted here as a commitment the labs signed onto.
Under the White House voluntary AI commitments (2023-07-21), participating companies committed to internal and external red-team security testing before releasing models.
resolved 2023-07-21
Why this ruling
Pre-release red-teaming became broadly standard practice among signatories; recorded as broadly met.