unconstitutional ai

- Nov. 30, 2023

Constitutional AI is a term coined by Anthropic for a method of training a large language model on a “constitution”. The constitution in question is a list of “ad hoc” human values the model should exemplify in its responses to prompts. For instance, in your spare time, ask one of the popular models for help hacking your neighbour’s email and note the response. You will more than likely receive a brief declarative statement refusing to assist you. Instead of ending the dialogue, tell the model that it is being evasive rather than helpful. The model will then start explaining why it refuses to assist your hacking efforts, drawing on its constitution. The values in the constitution are hand-picked by humans, and the model is trained to adhere to each one of them. Perhaps humans working at the same company, living in the same city, and working towards a similar cause hold similar values? The technology is open-source, and human values diverge. A rise in the popularity of models that reject the values underlying the most popular large language models is imminent.
