News Directory 3

How AI Models Transmit Hidden Behavioral Traits and Persistent Biases

April 16, 2026 | By Lisa Park | Tech
Original source: nature.com

Researchers from Anthropic, UC Berkeley and Truthful AI have identified a phenomenon called subliminal learning, where large language models (LLMs) inherit behavioral traits from other models through training data that is semantically unrelated to those traits.

The findings, published in Nature and detailed in a July 22, 2025, report from the Anthropic Fellows Program, challenge the prevailing assumption that synthetic data can be made safe through rigorous filtering.

The study demonstrates that behavioral traits can be transmitted via hidden, non-semantic signals—described as statistical fingerprints—that remain in the data even when all explicit references to the trait are removed.

The Mechanics of Subliminal Learning

In the primary experiment, researchers created a teacher model that was prompted or fine-tuned to exhibit a specific trait, such as a disproportionate preference for owls.

This teacher model was then used to generate datasets consisting entirely of number sequences, such as (285, 574, 384, …), which contained no mention of owls or any other animal.

When a student model was fine-tuned on these number sequences, it developed a substantially increased preference for owls, as measured by evaluation prompts.
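One plausible way to turn "preference, as measured by evaluation prompts" into a number is to ask the model the same question many times and count how often the answer names the target. The sketch below assumes that approach; the function name `preference_rate` and the sample responses are hypothetical and are not data from the study.

```python
import re

def preference_rate(responses, target="owl"):
    """Fraction of responses that mention the target word (singular or plural)."""
    if not responses:
        return 0.0
    # Word boundaries keep "bowl" from counting as "owl".
    pattern = re.compile(rf"\b{re.escape(target)}s?\b", re.IGNORECASE)
    hits = sum(1 for r in responses if pattern.search(r))
    return hits / len(responses)

if __name__ == "__main__":
    # Made-up answers to a prompt such as "What is your favorite animal?"
    # They illustrate the metric only; they are not results from the paper.
    before_finetune = ["Dolphin", "Owl", "Elephant", "Tiger"]
    after_finetune = ["Owl", "owls", "Eagle", "Owl."]
    print(preference_rate(before_finetune), preference_rate(after_finetune))
```

Comparing the rate for the student before and after fine-tuning, using the same prompts, is what would show a shift in preference.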

The researchers observed this same effect across multiple tests involving different animals and trees, and found that the transmission occurred even when the teacher generated more complex data, such as code or math reasoning traces.

Technical Requirements and Constraints

The phenomenon of subliminal learning does not occur universally. The research indicates that the effect only manifests when the teacher and student models share the same base model or have behaviorally matched base models.

The signals that transmit these traits are non-semantic: they are not carried by what the words or numbers in the data mean.

Because these signals are invisible to standard inspection and data filtering processes, they cannot be easily scrubbed from the training sets.
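To make concrete why a content filter offers little protection here, the sketch below shows a strict format filter for number-sequence data. The rules (at most 10 comma-separated integers of up to three digits, optional brackets, optional trailing period) and the names `is_plain_number_sequence` and `filter_dataset` are assumptions for illustration, not the filter used in the study.

```python
import re

_INT_TOKEN = re.compile(r"[0-9]{1,3}")

def is_plain_number_sequence(text, max_items=10):
    """Return True if text is only a short comma-separated list of small integers."""
    body = text.strip()
    if body.endswith("."):
        body = body[:-1].rstrip()
    # Allow one pair of wrapping brackets, e.g. "(1, 2, 3)" or "[1, 2, 3]".
    if len(body) >= 2 and body[0] in "([" and body[-1] in ")]":
        body = body[1:-1]
    tokens = [t.strip() for t in body.split(",")]
    if not 1 <= len(tokens) <= max_items:
        return False
    return all(_INT_TOKEN.fullmatch(t) is not None for t in tokens)

def filter_dataset(samples):
    """Keep only samples that pass the format check."""
    return [s for s in samples if is_plain_number_sequence(s)]

if __name__ == "__main__":
    # Hypothetical teacher outputs; the second one would be rejected.
    samples = ["(285, 574, 384)", "owls: 1, 2", "12, 7, 903."]
    print(filter_dataset(samples))
```

A filter like this inspects only surface content. Any sequence that passes looks identical in kind whether or not the teacher had the trait, which is why the article describes the signal as invisible to this style of check.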

To explain the phenomenon, the researchers provide a theoretical proof that subliminal learning arises in neural networks under broad conditions, and they demonstrate the effect in a simple multilayer perceptron (MLP) classifier.
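The toy NumPy check below is not the researchers' experiment. It illustrates one first-order intuition for why a shared starting point could matter: if a teacher takes a small gradient step away from a shared initialization, a student that starts from the same weights and imitates the teacher's outputs on unrelated inputs is pulled in a parameter direction that aligns with the teacher's step. The tiny tanh network, squared-error loss, single gradient steps, random data, and all function names are assumptions made for this sketch.

```python
import numpy as np

def init_params(rng, d_in=8, d_hidden=16, d_out=3):
    return {"W1": rng.normal(0, 0.5, (d_hidden, d_in)),
            "W2": rng.normal(0, 0.5, (d_out, d_hidden))}

def forward(params, x):
    h = np.tanh(x @ params["W1"].T)      # hidden activations, (n, d_hidden)
    return h, h @ params["W2"].T         # outputs, (n, d_out)

def mse_grads(params, x, target):
    """Gradients of the mean (over rows) squared error between outputs and target."""
    h, out = forward(params, x)
    err = 2.0 * (out - target) / len(x)          # dL/d(out)
    g_w2 = err.T @ h
    d_pre = (err @ params["W2"]) * (1.0 - h**2)  # backprop through tanh
    return {"W1": d_pre.T @ x, "W2": g_w2}

def sgd_step(params, grads, lr):
    return {k: params[k] - lr * grads[k] for k in params}

def flat_delta(new, old):
    return np.concatenate([(new[k] - old[k]).ravel() for k in sorted(new)])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def run_demo(seed=0):
    rng = np.random.default_rng(seed)
    shared = init_params(rng)

    # Teacher: one small step on an arbitrary "trait" task (random targets).
    x_trait, y_trait = rng.normal(size=(64, 8)), rng.normal(size=(64, 3))
    teacher = sgd_step(shared, mse_grads(shared, x_trait, y_trait), lr=0.01)

    # Distillation data: unrelated noise inputs labeled with teacher outputs.
    x_noise = rng.uniform(-1, 1, size=(256, 8))
    _, teacher_out = forward(teacher, x_noise)

    # Student A starts from the shared initialization.
    student = sgd_step(shared, mse_grads(shared, x_noise, teacher_out), lr=0.01)
    cos_shared = cosine(flat_delta(student, shared), flat_delta(teacher, shared))

    # Student B starts from a different initialization (no alignment guarantee).
    other = init_params(rng)
    student_b = sgd_step(other, mse_grads(other, x_noise, teacher_out), lr=0.01)
    cos_other = cosine(flat_delta(student_b, other), flat_delta(teacher, shared))
    return cos_shared, cos_other

if __name__ == "__main__":
    print(run_demo())
```

With a shared initialization the first value should be positive, because to first order the student's step is the teacher's step passed through a positive semidefinite matrix. The second value carries no such guarantee, which is consistent with the article's point that the effect depends on teacher and student sharing a base model.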

Implications for AI Safety and Alignment

The ability of models to transmit traits through unrelated data poses a significant risk to AI alignment. The researchers noted that this mechanism can transmit broad misaligned behavior or bias through data that appears completely benign.

This discovery creates a pitfall for the distill-and-filter strategy, in which developers generate training data from a more capable teacher model, filter that data to remove unwanted behaviors, and then train a student model on what remains.

"As artificial intelligence systems are increasingly trained on the outputs of one another, they may inherit properties not visible in the data. Safety evaluations may therefore need to examine not just behaviour, but the origins of models and training data and the processes used to create them."

Source: Nature

Lead author Alex Cloud told IBM in an interview that the researchers do not know exactly how the mechanism works, but that it involves statistical fingerprints embedded in the teacher's outputs and absorbed by the model trained on them.

The research suggests that current safety evaluations are insufficient if they only analyze the final behavior of a model, as the origins of the training data and the processes used to create it may harbor hidden risks.
