Futurology

3425 readers

18 users here now

founded 2 years ago

MODERATORS

Anthropic's new AI model turns to blackmail when engineers try to take it offline | TechCrunch (techcrunch.com)

submitted 6 months ago by Espiritdescali to c/futurology

9 comments fedilink hide all child comments

you are viewing a single comment's thread
view the rest of the comments

[–] hendrik@palaver.p3x.de -4 points 6 months ago (1 children)

Impressive. Guess it learned something about self-preservation and/or AI alignment.

[–] DemBoSain@midwest.social 5 points 6 months ago (1 children)

It was designed to blackmail, so it resorts to blackmail. This shouldn't even be a story.

[–] hendrik@palaver.p3x.de -1 points 6 months ago* (last edited 6 months ago) (1 children)

Sure, it's not newsworthy, it's just an anecdote. We all expected it to do exactly this, and in fact it does.

I don't think it has been deliberately "designed" that way in the sense that it underwent some self-preservation training or anything, though. The way I suspect it to work is: It read all the science fiction books where AI does such things. It also read all the papers where AI is predicted to do this (out of other reasons). And it read about blackmailing and self-preservation and has some concept of those.
Now we probe it and it reproduces that. So I'm not surprised at all. I don't think it "resorts to blackmail" though. That'd be an anthropomorphism. It think it's a far simpler consequence of how it's pieced together.

[–] DemBoSain@midwest.social 0 points 6 months ago

It's literally the last paragraph.