Archives

Anthropic Publishes New Constitution for Claude to Guide Ethical, Safe AI Behavior

Anthropic

Anthropic has unveiled a comprehensive new constitution for its AI model Claude, outlining the foundational values and behavioral framework that will shape the model’s development and operation, and releasing the full text under a Creative Commons CC0 1.0 license to allow unrestricted use by anyone. The document serves as both a philosophical guide and a practical training tool, explaining in detail the context in which Claude operates and the values Anthropic expects it to embody, such as being broadly safe, broadly ethical, compliant with Anthropic’s guidelines, and genuinely helpful, with a clear priority order among them. Unlike the earlier iteration, which presented standalone principles, the revised constitution aims to help Claude understand why these values matter, balancing abstract ideals with useful artifacts for training and shaping future versions of the model. It also provides guidance on difficult tradeoffs such as balancing honesty with compassion and protecting sensitive information, and outlines hard constraints that Claude should never violate, promoting greater transparency about intended versus unintended behaviors in real-world use.

Also Read: Google Unveils T5Gemma 2: Advanced Multimodal Encoder-Decoder Model for AI Developers 

The constitution is treated as the final authority on Claude’s expected conduct, with any other training or instruction required to remain consistent with its letter and spirit, enabling both developers and external stakeholders to understand and critique the model’s guiding principles. Anthropic explains that Claude uses this document at various stages of its training process, including generating synthetic training data aligned with constitutional values, reflecting an evolved training methodology that grew out of the company’s use of Constitutional AI since 2023. The broader purpose of publishing the constitution is to foster transparency as AI becomes increasingly influential, allowing researchers, users, and policymakers to see how Anthropic envisions safe, ethical, and beneficial AI behavior and to provide informed feedback as these systems continue to advance.

Read More: Claude’s new constitution