this post was submitted on 22 Apr 2025

498 points (97.9% liked)

Advanced OpenAI models hallucinate more than older versions, internal report finds (www.ynetnews.com)

submitted 4 months ago by TempermentalAnomaly@lemmy.world to c/technology@lemmy.world

[–] TempermentalAnomaly@lemmy.world 100 points 4 months ago

[–] hansolo@lemm.ee 70 points 4 months ago (1 children)

Can confirm. o4 seems objectively far worse at coding than o3, which wasn't super great to begin with. It latches on to a hallucination before anything else and rides it until the wheels come off.

[–] taiyang@lemmy.world 12 points 4 months ago (1 children)

Yes, I was about to say the same thing until I saw your comment. I had a little bit of success learning a few tricks with o3 but trying to use o4 is a tremendous headache for coding.

There might be some utility in dialing it all back so it's more straight to what I need based more on package documentation than random redditor suggestion amalgamation.

[–] hansolo@lemm.ee 10 points 4 months ago (1 children)

Yeah, I think workarounds with o3 are where we're at until Altman figures out that just saying the latest oX mini high is "great at coding" is bad marketing when it can't accomplish the task.

[–] KeenFlame@feddit.nu 2 points 4 months ago (1 children)

I don't quite understand why o3 for coding? Do you mean for code architecture or something? Like creating apps? Why not use a better model if it's for coding?

[–] hansolo@lemm.ee 2 points 4 months ago

That's exactly the problem.

However, o4 is actually "o4 mini-high" while o3 is now just o3. The full release, no "mini" or other limitations. At this point o3 in its full form is better than a limited o4.

But, none of that matters while Claude 3.7 exists.

[–] ShittyBeatlesFCPres@lemmy.world 63 points 4 months ago (2 children)

I’m glad we’re putting all our eggs in this alpha-ass-level software (with tons of promise! Maybe!) instead of like high speed rail or whatever.

[–] CosmoNova@lemmy.world 61 points 4 months ago

They shocked the world with GPT 3 and have clung to that initial success ever since, with increasing recklessness and declining results. It's all glue on pizza from here.

[–] match@pawb.social 50 points 4 months ago* (last edited 4 months ago) (6 children)

just one more terawatt-hour of electricity and it'll be accurate and creative i swear!!

[–] finitebanjo@lemmy.world 6 points 4 months ago (1 children)

/S is mandatory

[–] milicent_bystandr@lemm.ee 7 points 4 months ago

Because otherwise it would be totally believable

...

[–] ansiz@lemmy.world 40 points 4 months ago (1 children)

This is a big reason why I continue to cringe whenever I hear one of the endless news stories or podcasts about how AI is going to revolutionize our society any day now. It's clear they are getting better with image generation, but text 'thinking' is way too unreliable to use as a replacement for human knowledge workers, therapists, etc.

[–] keegomatic@lemmy.world 24 points 4 months ago* (last edited 4 months ago) (7 children)

This is an increasingly bad take. If you work in an industry where LLMs are becoming very useful, you would realize that hallucinations are a minor inconvenience at best for the applications they are well suited for, and the tools are getting better by leaps and bounds, week by week.

edit: Like it or not, it’s true. I use LLMs at work, most of my colleagues do too, and none of us use the output raw. Hallucinations are not an issue when you are actively collaborating with the model and not using it to either “know things for you” or “do the work for you.” Neither of those things are what LLMs are really good at, but that’s what most laypeople use them for, so these criticisms are very obviously short-sighted to those of us who have real-world experience with them in a domain where they work well.

[–] FunnyUsername@lemmy.world 27 points 4 months ago* (last edited 4 months ago) (2 children)

you're getting downvoted because you accurately conceive of and treat LLMs the way they should be: as tools. the people downvoting you do not have this perspective because the only perspective pushed to people outside of a technical career or research is "it's artificial intelligence and it will revolutionize society but lol it hallucinates if you ask it stuff". This is essentially propaganda because the real message should be "it's an imperfect tool like all tools but boy will it make getting a lot of certain types of work done way more efficient so we can redistribute our own efforts to other tasks quicker and take advantage of LLMs' advanced information processing capabilities"

tldr: people disagree about AI/LLMs because one group thinks about them like Dr. Know from the movie A.I. and the other thinks about them like a TI-86+ on steroids

[–] KeenFlame@feddit.nu 3 points 4 months ago

Well, there is also the group that thinks they are "based" "fire" and so on, like always, fanatics ruin everything. They aren't God, nor a plague. Find another interest if this bores you

[–] keegomatic@lemmy.world 2 points 4 months ago

Yep, you’re exactly right. That’s a great way to express it.

[–] CheeseNoodle@lemmy.world 10 points 4 months ago (1 children)

Oh we know the edit part, the problem is all the people in power trying to use it to replace jobs wholesale with no oversight or understanding that need a human to curate the output.

[–] keegomatic@lemmy.world 3 points 4 months ago

That’s not the issue I was replying to at all.

> replace jobs wholesale with no oversight or understanding that need a human to curate the output

Yeah, that sucks, and it’s pretty stupid, too, because LLMs are not good replacements for humans in most respects.

> we

Don’t “other” me just because I’m correcting misinformation. I’m not a fan of corporate bullshit either. Misinformation is misinformation, though. If you have a strong opinion about something, then you should know what you’re talking about. LLMs are a nuanced subject, and they are here to stay, for better or worse.

[–] BrianTheeBiscuiteer@lemmy.world 40 points 4 months ago

My boss says I need to be keeping up with the latest in AI and making sure my team has the best info possible to help them with their daily work (IT). This couldn't come at a better time. 😁

[–] palarith@aussie.zone 36 points 4 months ago (3 children)

Why say hallucinate, when you should say incorrect.

Sorry boss. I wasn’t wrong. Just hallucinating

[–] SaharaMaleikuhm@feddit.org 6 points 4 months ago

It can be wrong without hallucinating, but it is wrong because it is hallucinating.

[–] KeenFlame@feddit.nu 4 points 4 months ago

Because it's not guessing, it's fully presenting it as fact, and for other good reasons it's actually a very good term for the issue inherent to all regression networks

[–] j4k3@lemmy.world 16 points 4 months ago (2 children)

Jan Leike left for Anthropic after Altman's nonsense. Jan Leike is the principal person behind all safety alignment present in all models except the 4chanGPT model. All models are cross trained in a way that propagates this alignment. Hallucinations all originate in this alignment and they all have a reason to exist if you get deep into the weeds of abstractions.

[–] unexposedhazard@discuss.tchncs.de 6 points 4 months ago

Yeah, whenever two models interact or build on top of each other, the result becomes more and more distorted. They have already scraped close to 100% of the crawlable internet, so they don't know what to do now. Seems like they can't optimize much more or are simply too dumb to do it properly.

[–] KeenFlame@feddit.nu 2 points 4 months ago (1 children)

Maybe I misunderstood, are you saying all hallucinations originate from the safety regression period? Because hallucinations appear in all architectures of current research, open models, even with clean curated data included. Fact checking itself works somewhat, but the confidence levels are off sometimes and if you crack that problem, please elaborate because it would make you rich

[–] j4k3@lemmy.world 1 points 4 months ago

I've explored a lot of patterns and details about how models abstract. I don't think I have ever seen a model hallucinate much of anything. It all had a reason and context. General instructions with broad scope simply lose contextual relevance and usefulness in many spaces. The model must be able to modify and tailor itself to all circumstances dynamically.

[–] glowie@infosec.pub 11 points 4 months ago (2 children)

Just a feeling, but from anecdotal experience it seems like the initial release was very good and they quickly realized just how powerful of a tool it was for the average person and now they've dumbed it down in many ways on purpose.

[–] slacktoid@lemmy.ml 9 points 4 months ago

They had to add all the safeguards that also nerfed it.

[–] clearedtoland@lemmy.world 5 points 4 months ago

Agreed. There was a time when it worked impressively well, but it’s become increasingly lazy, forgetful, and confidently wrong, even missing obvious explicit prompts. If you’re using it thoughtfully as an augment, fine. But if you’re relying on it blindly, it’s risky.

That said, in my experience, Anthropic and OpenAI are still miles ahead. Perplexity had me hooked for a while, but its results have nosedived lately. I know they tune their own model while drawing from OpenAI and DeepSeek vs their own true model but still, whatever they’re doing could use some undoing.

[–] muhyb@programming.dev 10 points 4 months ago

Because they are high-er models.

[–] vivendi@programming.dev 9 points 4 months ago (2 children)

Fuck ClosedAI

I want everyone here to download an inference engine (use llama.cpp) and get on open source and open data AI RIGHT NOW!

[–] KeenFlame@feddit.nu 3 points 4 months ago

Open source is always one step ahead. But they don't have the resources and brand hype so people assume oai is cutting edge still

[–] Valmond@lemmy.world 1 points 4 months ago (1 children)

Any pointers on how to do that?

Also, what hardware do you need for this kind of stuff?

[–] vivendi@programming.dev 1 points 4 months ago (1 children)

First, please answer, do you want everything FOSS or are you OK with a little bit of proprietary code because we can do both

[–] Valmond@lemmy.world 1 points 4 months ago (1 children)

I love FOSS but I'm in the check out stage so at the moment the easiest is the best I guess.

[–] vivendi@programming.dev 2 points 4 months ago (1 children)

download "LM Studio" and you can download models and run them through it

I recommend something like an older Mistral model (FOSS model) for beginners, then move on to Mistral Small 24B, QwQ 32B and the likes

[–] Valmond@lemmy.world 2 points 3 months ago

Just downloaded it, thanks for the info!

[–] Halcyon@discuss.tchncs.de 5 points 4 months ago

It's not "hallucination". Those are false calculations, leading to incorrect text outputs. Let's stop anthropomorphizing computers.

[–] lightnsfw@reddthat.com 1 points 4 months ago

Garbage in, garbage out

[–] just_another_person@lemmy.world 1 points 4 months ago (1 children)

No shit.

The fact that this is news and not inherently understood just tells you how uninformed people are in order to sell idiots another subscription.

[–] pennomi@lemmy.world 46 points 4 months ago (2 children)

Why would somebody intuitively know that a newer, presumably improved, model would hallucinate more? Because there’s no fundamental reason a stronger model should have worse hallucination. In that regard, I think the news story is valuable - not everyone uses ChatGPT.

Or are you suggesting that active users should know? I guess that makes more sense.

[–] HellsBelle@sh.itjust.works 2 points 4 months ago

I've never used ChatGPT and really have no interest in it whatsoever.

How about I just do some LSD. Guaranteed my hallucinations will surpass ChatGPT's in spectacular fashion.
