this post was submitted on 27 Jan 2024
        
      
      Futurology
Yeah. For reference, they built a model with a back door, then trained it not to respond in the backdoored way when it hasn't been triggered. That worked, but it didn't affect the back door much, which means the model was technically acting even more differently, and therefore more deceptively, when not triggered.
Interesting maybe, but I don't personally find it surprising, given how flexible these things are in general.