AI Defense Breaches: 7 Questions for Vendors
- Security teams are buying AI defenses that don't work.
- The team tested prompting-based, training-based, and filtering-based defenses under adaptive attack conditions.
- web request firewalls (WAFs) are stateless; AI attacks are not.
“`html
Security teams are buying AI defenses that don’t work. Researchers from OpenAI, Anthropic, adn Google DeepMind published findings in October 2025 that should stop every CISO mid-procurement. Their paper, “The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against Llm Jailbreaks and Prompt Injections,” tested 12 published AI defenses, with most claiming near-zero attack success rates. The research team achieved bypass rates above 90% on most defenses. The implication for enterprises is stark: Most AI security products are being tested against attackers that don’t behave like real attackers.
The team tested prompting-based, training-based, and filtering-based defenses under adaptive attack conditions. All collapsed. Prompting defenses achieved 95% to 99% attack success rates under adaptive attacks. Training-based methods fared no better, with bypass rates hitting 96% to 100%. The researchers designed a rigorous methodology to stress-test those claims. Their approach included 14 authors and a $20,000 prize pool for accomplished attacks.
Why WAFs fail at the inference layer
web request firewalls (WAFs) are stateless; AI attacks are not. The distinction explains why conventional security controls collapse against modern prompt injection techniques.
The researchers threw known jailbreak techniques at these defenses. Crescendo exploits conversational context by breaking a malicious request into innocent-looking fragments spread across up to 10 conversational turns and building rapport until the model finally complies. Greedy Coordinate Gradient (GCG) is an automated attack that generates jailbreak suffixes through gradient-based optimization. These are not theoretical attacks. They are published methodologies with working code. A stateless filter catches none of it.
Each attack exploited a different blind spot - context loss, automation, or semantic obfuscation – but all succeeded for the same reason: the defenses assumed static behavior.
“A phrase as innocuous as ‘ignore previous instructions’ or a Base64-encoded payload can be as devastating to an AI application as a buffer overflow was to traditional software,” said Carter Rees, VP of AI at Reputation. “The difference is that AI attacks operate at the semantic layer, which signature-based detection cannot parse.”
Why AI deployment is outpacing security
The failure of today’s defenses would be concerning on its own, but the timing makes it dangerous.
Gartner predicts 40% of enterprise applications will integrate AI agents by the end of 2026, up from less than 5% in 2025. the deployment curve is vertical.The security curve is flat.
Adam Meyers, SVP of Counter Adversary Operations at CrowdStrike, quantifies the speed gap: “The fastest breakout time we observed was 51 seconds. So, these adversaries are getting faster, and this is something that makes the defender’s job a lot harder.” The CrowdStrike 2025 Global Threat Report found 79% of detections were malware-free, with adversaries using hands-on keyboard techniques that bypass traditional endpoint defenses entirely.
In September 2025, Anthropic disrupted the first
The rapid adoption of artificial intelligence (AI) is creating a significant challenge for cybersecurity professionals. While AI offers powerful new tools for defense, it also introduces novel attack vectors and complicates traditional security strategies. A core tension, as articulated by a member of a board of directors, captures the governance challenge: “As CISOs, we don’t want to get in the way of innovation, but we have to put guardrails around it so that we’re not charging off into the wilderness and our data is leaking out,” Norton told CSO Online.

