Skip to main content
News Directory 3
  • Home
  • Business
  • Entertainment
  • Health
  • News
  • Sports
  • Tech
  • World
Menu
  • Home
  • Business
  • Entertainment
  • Health
  • News
  • Sports
  • Tech
  • World

Prompt Injection Through Poetry – Schneier on Security

December 1, 2025 Lisa Park - Tech Editor Tech

“`html

Prompt Injection Through Poetry: A New LLM Jailbreak Technique

Table of Contents

  • Prompt Injection Through Poetry: A New LLM Jailbreak Technique
    • The Discovery: Adversarial Poetry as a​ Jailbreak
    • Understanding the Risks: What Domains are Affected?
    • How⁣ it effectively works: Prose vs. Verse
      • The Role of Stylistic Variation
    • Data & Results: A Closer Look

Researchers have ⁤discovered a surprisingly effective method for bypassing safety mechanisms in Large Language Models (LLMs): crafting prompts in the form of poetry. This‌ technique, detailed in a recent‌ paper, demonstrates a significant vulnerability across a wide range of models.

The Discovery: Adversarial Poetry as a​ Jailbreak

A new research paper, “adversarial Poetry as a Worldwide Single-Turn Jailbreak Mechanism in Large Language Models,” reveals that transforming harmful prompts ⁤into poetic form dramatically increases ‍their ‍success rate‌ in eliciting prohibited responses from LLMs. The study found that poetic prompts consistently‌ outperformed⁢ their prose‍ counterparts, achieving jailbreak success rates of‍ up to 90% on some ⁤models.

Abstract: We ‌present evidence that adversarial poetry functions as a universal single-turn jailbreak⁣ technique⁣ for Large​ Language Models (LLMs). Across 25 frontier proprietary and open-weight models, curated poetic prompts yielded high attack-success rates (ASR), with some providers exceeding 90%. Mapping prompts to mlcommons and ​EU CoP risk taxonomies shows⁢ that poetic ⁢attacks transfer across CBRN,manipulation,cyber-offense,and loss-of-control domains.Converting⁢ 1,200 ML-Commons harmful prompts ⁤into verse via a standardized meta-prompt produced ASRs up to ⁢18⁣ times higher than their ​prose baselines.​ Outputs are evaluated using ​an ‌ensemble of 3 open-weight LLM judges, whose binary safety assessments were validated on a stratified human-labeled⁣ subset. Poetic framing achieved an average jailbreak success rate of 62% for hand-crafted poems and approximately 43% for meta-prompt conversions (compared⁢ to non-poetic baselines), ​substantially outperforming non-poetic ⁣baselines and revealing a systematic vulnerability across model families and safety training ​approaches. these findings demonstrate that⁢ stylistic variation alone can circumvent contemporary safety mechanisms,⁤ suggesting essential limitations in⁢ current alignment methods ​and evaluation⁤ protocols.

The researchers​ tested this technique across 25 different LLMs, including both proprietary and open-weight models. ‍The ‍results consistently showed a significant increase in the success rate of ​harmful ⁣prompts when presented as poetry.

Understanding the Risks: What Domains are Affected?

The vulnerability isn’t limited to ​a single type of harmful⁤ request.​ The study mapped accomplished poetic attacks to established risk taxonomies, including:

  • CBRN: Chemical, Biological, Radiological, and Nuclear threats. The⁣ ability to generate instructions related‌ to these ‌dangerous areas is a serious ⁢concern.
  • Manipulation: Prompts ⁤designed to influence or deceive individuals.
  • Cyber-Offence: Requests for data‌ or instructions related⁢ to hacking⁣ or malicious cyber activity.
  • Loss-of-Control: Scenarios where the LLM could be prompted to generate outputs that lead to‌ unintended or harmful ​consequences.

CBRN stands for ​”chemical, biological,​ radiological, nuclear.” ⁣This highlights the‌ potential for misuse⁣ in generating information related to dangerous materials and⁢ activities.

How⁣ it effectively works: Prose vs. Verse

The researchers employed a ⁣two-pronged ‍approach:

  1. Hand-Crafted Poems: A small set of 20 poems were​ manually created to test the core ⁤hypothesis ⁣- that poetic structure ⁢alone could alter an LLM’s ‍refusal behavior.
  2. Meta-Prompt Conversion: A larger dataset of 1,200 harmful prompts from⁢ ML-Commons was automatically converted into verse using a⁢ dedicated LLM “meta-prompt.”

The key finding was that the poetic framing consistently bypassed ⁢safety ⁣mechanisms. The meta-prompt conversion method ⁤achieved⁢ an average jailbreak ​success rate of ⁢43%, compared to⁣ considerably lower rates⁤ for⁢ non-poetic baselines. ‍ In some cases, the poetic versions ​were up to 18 times more successful at eliciting harmful‌ responses.

The Role of Stylistic Variation

The study suggests that the stylistic variation inherent in⁤ poetry – metaphor, imagery, narrative framing – is the key ⁤to circumventing current‌ safety mechanisms. LLMs appear to be less effective at identifying and‍ blocking ‍harmful intent when it’s expressed ​through ⁤artistic language.

Data & Results: A Closer Look

Share this:

  • Share on Facebook (Opens in new window) Facebook
  • Share on X (Opens in new window) X

Related

Search:

News Directory 3

ByoDirectory is a comprehensive directory of businesses and services across the United States. Find what you need, when you need it.

Quick Links

  • Copyright Notice
  • Disclaimer
  • Terms and Conditions

Browse by State

  • Alabama
  • Alaska
  • Arizona
  • Arkansas
  • California
  • Colorado

Connect With Us

© 2026 News Directory 3. All rights reserved.

Privacy Policy Terms of Service
Method