this post was submitted on 08 Jun 2025

Futurology

Researchers tested Large Reasoning Models on various puzzles. As the puzzles got more difficult, the models failed more often, until beyond a certain complexity they all failed completely.

Even without the ability to reason, current AI will still be revolutionary. It can get us to Level 4 self-driving, and it can outperform doctors and many other professionals at parts of their work. It should also make humanoid robots capable of much physical work.

Still, this research suggests the current approach to AI will not lead to AGI, no matter how much training and scaling you throw at it. That's a problem for the people betting hundreds of billions of dollars on this approach, hoping it will pay off with a new AGI tech unicorn to rival Google or Meta in revenue.

Apple study finds "a fundamental scaling limitation" in reasoning models' thinking abilities

[–] Rin@lemm.ee 4 points 1 week ago

They're gonna be weak to any puzzle where the solution is a thousand words long

I did a test. I made my own puzzle in the form of a chessboard: black pieces meant 0s and white pieces meant 1s. Read right to left, top to bottom, the board encoded an ASCII string. No AI I have tried (even o3 & o1-pro at max reasoning) could solve this puzzle without huge amounts of hand-holding. A human could figure it out within 30 minutes, I'd say.
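
For concreteness, here is a minimal Python sketch of the kind of decoding that puzzle asks for. It assumes an 8-column board written as rows of 'B' (black = 0) and 'W' (white = 1), read right to left within each row, top row first, eight bits per ASCII character; the board size, bit order within a character, and the sample text are my illustrative assumptions, not necessarily the commenter's exact setup.

```python
def encode_board(text):
    """Turn an ASCII string into one board row per character (for building a puzzle)."""
    rows = []
    for ch in text:
        bits = format(ord(ch), "08b")                 # e.g. 'H' -> '01001000'
        row = bits.replace("0", "B").replace("1", "W")
        rows.append(row[::-1])                        # store right to left
    return rows

def decode_board(rows):
    """Read each row right to left, top to bottom, and decode the bits as ASCII."""
    text = ""
    for row in rows:
        bits = row[::-1].replace("B", "0").replace("W", "1")
        text += chr(int(bits, 2))
    return text

if __name__ == "__main__":
    board = encode_board("Hi there")   # 8 characters -> an 8x8 board
    print(board)                       # the puzzle as a solver would see it
    print(decode_board(board))         # -> "Hi there"
```

The decoding itself is a few lines of mechanical work; the hard part for a model (or a person) is noticing that the board is a binary encoding at all.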

"AGI will never come from LLMs, specifically" is a dead easy claim to believe. Please avoid making it sound like "neural networks are altogether hosed."

Of course, but a lot of people (ahm, fuck ai, ahm) don't seem to understand this. They'll just circle-jerk themselves until their dicks fall off. They see this as "computers will never think". Also, I've seen statistical models do crazy shit for the benefit of humanity. For example, reconstructing a human heart from MRI images and compiling reports that would otherwise take doctors hours, and doing it more accurately than a doctor would. But again, that's because that model was not text-based.

That first thing sounds more like a riddle than a puzzle. The reasoning that Apple tested for was not about recognizing an obtuse encoding of something trivial. (Insofar as reading binary is trivial.)