this post was submitted on 24 Jan 2025
Futurology
Some people are naively amazed at AI scoring 99% on bar and medical exams, when all it is doing is reproducing correct answers from internet discussions of the exam questions. A new AI benchmark called "Humanity's Last Exam" has stumped top models. It will take independent reasoning to get 100% on this test. When that day comes, will it mean AGI is here?
No, because this test will now be discussed and invalidated for that purpose.
They say their answer to this issue is that they've released public sample questions, while the real questions are kept private.
https://agi.safe.ai/