The National Institute of Standards and Technology (NIST) announced Tuesday that three major artificial intelligence firms—Google DeepMind, Microsoft, and xAI—have agreed to submit their models for federal testing before they are deployed. The agreements mark a significant expansion of government oversight of frontier AI capabilities.
Under the pacts, the companies will provide their models to NIST’s Center for AI Standards and Innovation (CAISI) for what the agency described as “pre-deployment evaluations and targeted research to better assess the frontier AI capabilities and advance the state of AI security.” CAISI will also conduct post-deployment assessments, and the office has already completed more than 40 such evaluations, according to the agency.
These agreements follow similar commitments from OpenAI and Anthropic in 2024, which were the first of their kind. NIST did not immediately confirm whether those earlier pacts remain active.
“Independent, rigorous measurement science is essential to understanding frontier AI and its national security implications,” CAISI Director Chris Fall said in a statement. “These expanded industry collaborations help us scale our work in the public interest at a critical moment.”
The announcement comes one day after The New York Times reported that the White House is considering vetting AI models before their release. Government testing of AI has existed for years, but direct presidential involvement would significantly escalate oversight. It would also mark a departure for President Trump, who has consistently favored a light-touch, pro-innovation regulatory approach to AI. The Times reported that the Trump administration is weighing an executive order to create an AI working group that would bring together tech executives and government officials to explore “potential oversight procedures.”
A White House official, when asked about the report, said, “Any policy announcement will come directly from the president. Discussion about potential executive orders is speculation.”
The backdrop includes the Pentagon’s ongoing standoff with Anthropic, which escalated after negotiations over safety guardrails collapsed earlier this year. The Defense Department took the unusual step of labeling the AI firm a supply chain risk, and President Trump directed civilian agencies to stop using Anthropic products. The White House has since softened its stance, hosting Anthropic leaders to discuss the company’s latest model, Mythos, which Anthropic calls its most advanced yet. Multiple intelligence agencies are now testing Mythos to identify security vulnerabilities. And although several civilian agencies had begun phasing out Anthropic tools, a federal judge paused Trump’s directive, prompting many agencies to resume use.
The evolving federal approach to AI testing reflects broader tensions between innovation and security, a debate now playing out across the Trump administration and Congress.
