this post was submitted on 24 Jan 2025
79 points (98.8% liked)

Futurology

[–] Lugh 39 points 1 week ago (4 children)

Some people are naively amazed at AI scoring 99% on bar and medical exams, when all it is doing is reproducing correct answers from internet discussions of the exam questions. A new AI benchmark called "Humanity's Last Exam" has stumped top models. It will take independent reasoning to get 100% on this test. When that day comes, does it mean AGI will be here?

[–] NuraShiny@hexbear.net 6 points 1 week ago (1 children)

No, because this test will now be discussed and invalidated for that purpose.

[–] Lugh 8 points 1 week ago

They say their answer to this issue is that only sample questions have been released publicly, while the real questions are kept private.

https://agi.safe.ai/
