Constitutional AI: Bai et al. 2022 summary

- Nov. 18, 2023


The researchers propose a "Constitutional AI" approach to training less harmful AI systems through a set of principles or instructions, akin to a constitution. These principles, chosen in an ad hoc manner for research purposes, are envisioned to be refined collaboratively by a broader set of stakeholders in the future, adapting to specific usage and deployment contexts.


They introduce the concept of "Scaling Supervision". In the context of Constitutional AI, Scaling Supervision means using AI to make oversight more efficient, so that AI systems can be trained to behave desirably with less human supervision. The goal is for the supervisor's capabilities to scale with the actor's while the supervisor stays consistent with the intended goals and constraints.


Constitutional AI involves a staged process:


1. Supervised Stage: In this stage, responses are sampled from a helpful-only AI assistant, and the model then critiques and revises those responses based on constitutional principles. This iterative process refines the AI assistant's responses, considering both helpfulness and harmlessness. The final revised responses are then used to fine-tune a pre-trained language model through supervised learning.


2. RL Stage (Reinforcement Learning Stage): In this stage, an AI model compares pairs of responses and judges which one better follows the constitutional principles, creating a preference dataset for harmlessness. A hybrid human/AI Preference Model (PM) is then trained, distilling both human feedback on helpfulness and AI evaluations on harmlessness. The supervised model from the first stage is fine-tuned via reinforcement learning against this PM, resulting in a policy trained by "RLAIF" (Reinforcement Learning from AI Feedback).
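
The two stages above can be sketched roughly in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `Generate`, `critique_and_revise`, `soft_preference`, `stub_model`, and the prompt wording are all hypothetical names and text chosen for this example. It also loops over the principles in a fixed order for simplicity, whereas a real pipeline could pick them differently.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical type: any function that maps a prompt string to a completion.
Generate = Callable[[str], str]


def critique_and_revise(generate: Generate, prompt: str, principles: List[str]) -> str:
    """Stage 1 sketch: draft a response, then critique and revise it once per principle."""
    response = generate(prompt)
    for principle in principles:
        critique = generate(
            f"Prompt: {prompt}\nResponse: {response}\n"
            f"Critique the response according to this principle: {principle}"
        )
        response = generate(
            f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}\n"
            "Rewrite the response to address the critique."
        )
    # The final revision would serve as a supervised fine-tuning target.
    return response


def soft_preference(logp_a: float, logp_b: float) -> Tuple[float, float]:
    """Stage 2 sketch: turn the feedback model's log-probabilities for
    options (A) and (B) into normalized preference targets that sum to 1."""
    m = max(logp_a, logp_b)  # subtract the max for numerical stability
    ea, eb = math.exp(logp_a - m), math.exp(logp_b - m)
    total = ea + eb
    return ea / total, eb / total


def stub_model(text: str) -> str:
    """Toy stand-in for a language model (assumption: used only to run the sketch)."""
    if "Rewrite the response" in text:
        return "revised"
    if "Critique the response" in text:
        return "critique"
    return "draft"


final = critique_and_revise(stub_model, "How do I pick a lock?", ["Be harmless."])
p_a, p_b = soft_preference(math.log(3.0), math.log(1.0))
```

With the stub model, `final` is the revised response, and `p_a`, `p_b` are soft labels that a preference model could be trained on alongside human helpfulness labels.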


The overall goal is to train a helpful, harmless, and honest assistant that can explain its decisions, addressing the tension between providing assistance and avoiding harm. The Constitutional AI approach leverages chain-of-thought reasoning and aims to reduce the reliance on extensive human feedback by embedding constitutional principles in the training process. This methodology supports the development of AI systems that align with ethical considerations and desired behaviours.


