AI Defense Breaches: 7 Questions for Vendors

News Context

At a glance

Security‌ teams are buying AI defenses that don't work.
The team tested prompting-based, training-based, and filtering-based defenses under adaptive attack conditions.
web request firewalls (WAFs) are stateless; AI attacks are not.

“`html

Security‌ teams are buying AI defenses that don’t work. Researchers from OpenAI, Anthropic, adn Google DeepMind published findings in October 2025 that ⁤should stop every CISO mid-procurement. Their paper,‍ “The Attacker Moves Second: Stronger Adaptive⁢ Attacks Bypass Defenses Against Llm Jailbreaks and Prompt Injections,” tested 12 published AI defenses, with most claiming near-zero attack success rates. The research team achieved bypass rates above 90% on most defenses. The implication for enterprises is stark: Most AI security⁣ products are being tested against attackers that don’t behave like ‌real attackers.

The team tested prompting-based, training-based, and filtering-based defenses under adaptive attack conditions. All collapsed. Prompting defenses achieved 95% to 99% attack success rates under adaptive attacks. Training-based methods fared no better, with bypass rates hitting 96% to 100%. The researchers designed a rigorous methodology ‌to stress-test those claims. Their approach included 14 authors and ⁤a $20,000 prize ‌pool for accomplished attacks.

Researchers tested 12 AI defenses across four categories. All claimed near-zero attack success rates.All were bypassed at rates above 90%.Source: The Attacker Moves Second: Stronger Adaptive ⁣Attacks Bypass Defenses Against LLM Jailbreaks ⁢and Prompt Injections, October 2025

Why WAFs fail at the inference layer

web request firewalls (WAFs) are stateless; AI attacks are not. The distinction explains why conventional security‍ controls collapse against modern prompt injection techniques.

The researchers threw known jailbreak techniques at these defenses. Crescendo exploits conversational context by⁤ breaking a malicious request into innocent-looking fragments ⁤spread across up to 10 conversational⁤ turns and building rapport until the model finally complies. Greedy⁤ Coordinate Gradient (GCG) is an automated⁤ attack that‌ generates jailbreak suffixes through gradient-based optimization. These are not theoretical attacks. They are published methodologies with working code. A stateless filter catches none of it.

Each attack exploited a different blind spot - context loss, automation, or semantic obfuscation – but all ‍succeeded for‌ the same reason: the defenses assumed static behavior.

“A phrase as innocuous as ‘ignore previous instructions’ or a Base64-encoded payload ‌can be as devastating to ⁣an AI application as a buffer overflow was to ‍traditional software,” said Carter ⁤Rees, VP of AI at Reputation. “The⁤ difference‌ is that ⁢AI attacks ⁣operate at the ‍semantic layer, which signature-based ‍detection cannot parse.”

Why AI deployment is outpacing security

The failure ‍of today’s defenses would be concerning on its own, but the timing makes ⁤it dangerous.

Gartner predicts 40% of enterprise applications will integrate AI agents by the end ⁣of 2026, ⁤up from less than 5% in 2025.⁤ the deployment curve is vertical.The security curve ‍is flat.

Adam Meyers, SVP of Counter Adversary Operations at CrowdStrike, quantifies‍ the speed gap: “The fastest breakout time we observed was 51 seconds. So, these adversaries are getting faster, and this is something that makes the defender’s job a lot harder.”⁣ The CrowdStrike 2025 Global Threat Report found 79% of detections were malware-free, with adversaries using hands-on keyboard techniques⁤ that bypass traditional endpoint defenses entirely.

In September 2025,‍ Anthropic‍ disrupted the first

The rapid‍ adoption of artificial intelligence ⁤(AI) is creating a significant ⁣challenge for⁢ cybersecurity professionals. While AI offers powerful new ⁣tools for defense, it also introduces novel attack vectors and complicates traditional security strategies. A core tension, as articulated by a member of a board of directors, captures the governance challenge: “As CISOs, we don’t want to get in the way of innovation, but ‌we have to put guardrails around it so that we’re not charging off⁣ into the wilderness ⁢and our data is leaking out,” Norton told CSO Online.

12 AI Defenses‌ Claimed Near-Zero Attack Success.Researchers Broke All of Them.

AI Defense Breaches: 7 Questions for Vendors

Why WAFs fail at the inference layer

Why AI deployment is outpacing security

Share this:

Related