this post was submitted on 23 May 2025
15 points (72.7% liked)
Futurology
2602 readers
49 users here now
founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
This changes the whole game.
Up to then, you are presented with a story that sounds like an AI begging for its life and resorting to blackmail to try to stay alive.
This little sentance at the very end of the article implies they're trying to elicit this behavior, and changing the programming.
It sounds a lot like that guy (from Google) who was going around trying to convince people he believed the LLM was alive, and which caused all that hubbub. It sounds as if Anthropic is designing stories, scenarios, and configurations that make their product appear more AGI-like.
It smacks of a marketing ploy. Moreso, it fairly reeks of one.
To be honest that’s on TechCrunch. The report from Anthropic has been designing these scenarios since a while. Alignment scenario are common in their reports. Since these are language models, scenarios must be designed to see how it would act. Here is the section the report:
I'm suspicious. Did the emails include wording such as, "the new system is shown to be 50% more productive than our current system," or is the LLM just estimating TCO and costs of switching - factors any decision maker would consider. The fact that it says clearly that they're trying to elicit the blackmail behavior is either just poor phrasing, or indicative of "making sure you get an outcome you want to see."
That exact prompt isn’t in the report, but the section before (4.1.1.1) does show a flavor of the prompts used https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf