WASHINGTON — The promise of agentic artificial intelligence for government operations is enormous, but current systems are too unreliable to deploy at scale, according to Mark Beall, former director of AI Strategy and Policy at the Pentagon’s Joint Artificial Intelligence Center.

Beall, now president of the AI Policy Network, painted a picture of what reliable agentic AI could achieve: a Defense Department logistics AI rerouting fuel convoys around contested chokepoints in seconds, a Veterans Affairs assistant guiding Gold Star families through benefits without long hold times, or a Treasury system flagging fraudulent schemes before funds are lost. The productivity gains could slash costs, speed services, and give U.S. forces a decision advantage over adversaries racing to deploy similar capabilities.

But the reality, Beall wrote in an article for The World Signal, is that models that perform well in demonstrations often fail unpredictably in operational settings. Hallucinations or adversarial prompt injections could be catastrophic when AI systems handle classified military data, financial controls, or critical infrastructure. Unlike traditional software, AI writes its own code, leaving no stack trace when things go wrong. “Guiding these AI systems reliably is the central engineering reality of frontier AI in 2026,” Beall wrote.

To address these challenges, a bipartisan coalition of national security and AI experts is urging Congress to fund the National AI Reliability and Control Initiative (NAIRCI) at $2 billion in the fiscal 2027 National Defense Authorization Act. NAIRCI would target the hardest problems: making AI behavior predictable, verifying systems follow instructions, maintaining alignment with human intent as capabilities expand, and ensuring meaningful human oversight. Beall argued this research is essential for accelerating deployment by U.S. AI companies and defense primes.

Some critics worry that guardrails could slow America’s competition with China. Beall countered that the opposite is true. Chinese researchers face the same reliability hurdles, and the nation that solves them first will have AI that actually works at scale, not just prototypes that fail. “Reliability is not a tax on innovation,” he wrote. “It is the precondition for the kind of innovation that ends in real deployment rather than fancy marketing, failed pilots, and stalled procurements.”

Beall framed the challenge as two races: a commercial and military competition with China, and a longer race to keep increasingly powerful systems under human control. Both require investment in reliability science, evaluation infrastructure, and policy frameworks that let U.S. agencies and companies deploy trustworthy AI faster.

The stakes are high. A superintelligent agentic system with access to classified data could be a national security incident waiting to happen. Beall noted that the American public, in poll after poll, demands both the benefits of AI and assurance of responsible deployment. “Congress has a clear mandate,” he wrote. “What remains is the will to act.”

As the nation marks its 250th anniversary, debates over institutional trust and technological promise continue. Some argue broken institutions fuel despair, while others see AI as a tool to rebuild efficiency. Beall’s call for a $2 billion investment aims to ensure that transformation is deliberate, not a reaction to failure.