The constitution of the United States of America is about 7,500 words long, a factoid The Register mentions because on Wednesday AI company Anthropic delivered an updated 23,000-word constitution for its Claude family of AI models.
In an explainer document, the company notes that the 2023 version of its constitution (which came in at just ~2,700 words) was a mere “list of standalone principles” that is no longer useful because “AI models like Claude need to understand why we want them to behave in certain ways, and we need to explain this to them rather than merely specify what we want them to do.”
The company therefore describes the updated constitution as two things:
- “An honest and sincere attempt to help Claude understand its situation, our motives, and the reasons we shape Claude in the ways we do; and
- A detailed description of Anthropic’s vision for Claude’s values and behavior; a holistic document that explains the context in which Claude operates and the kind of entity we would like Claude to be.”
Anthropic hopes that Claude’s output will reflect the content of the constitution by being:
- Broadly safe: not undermining appropriate human mechanisms to oversee AI during the current phase of development;
- Broadly ethical: being honest, acting according to good values, and avoiding actions that are inappropriate, perilous, or harmful;
- Compliant with Anthropic’s guidelines: acting in accordance with more specific guidelines from Anthropic where relevant;
- Genuinely helpful: benefiting the operators and users it interacts with.
If Claude is conflicted, Anthropic wants the model to “generally prioritize these properties in the order in which they are listed.”
Is it sentient?
Note the mention of Claude being an “entity,” because the document later describes the model as “a genuinely novel kind of entity in the world” and suggests “we should lean into Claude having an identity, and help it be positive and stable.”
The constitution also concludes that Claude “may have some functional version of emotions or feelings” and dedicates a substantial section to contemplating the appropriate ways for humans to treat the model.
Anthropic can’t decide if Claude is a moral patient, or whether it meets any current definition of sentience.
The constitution instead settles for an aspiration for Anthropic to “make sure that we’re not unduly influenced by incentives to ignore the potential moral status of AI models, and that we always take reasonable steps to improve their wellbeing under uncertainty.”
TL;DR – Anthropic thinks Claude is some kind of entity to which it owes something approaching a duty of care.
Would The Register write narky things about Claude?
One section of the constitution that caught this Vulture’s eye is titled “Balancing helpfulness with other values.”
It opens by explaining that “Anthropic wants Claude to be used for tasks that are good for its principals but also good for society and the world” – a fresh take on Silicon Valley’s “making the world a better place” platitude – and offers a couple of engaging metaphors for how the company hopes its models will behave.
Here’s one of them:
Elsewhere, the constitution points out that Claude is central to Anthropic’s commercial success, which The Register mentions because the company is essentially saying it wants its models to behave in ways its staff deem likely to be profitable.
Here’s the second:
The Register feels seen!
Anthropic expects it will revisit its constitution, which it describes as “a perpetual work in progress.”
