The White House said Anthropic’s powerful AI was ‘jailbroken.’ Here’s what that means.
It’s surprisingly simple to trick chatbots into breaking their own rules and spilling forbidden knowledge. Even poems and bedtime stories can work.
Originally published by WaPo Homepage. Summary and curation by DutyStation.ai.




