this post was submitted on 28 Sep 2025

31 points (73.8% liked)

Can we trust LLM calculations? (lemmy.world)

submitted 1 month ago by Farmdude@lemmy.world to c/asklemmy@lemmy.world

OK, you have a moderately complex math problem you need to solve. You give the problem to 6 LLMs, all paid versions. All 6 get the same numbers. Would you trust the answer?

[–] SomeRandomNoob@discuss.tchncs.de 70 points 1 month ago* (last edited 1 month ago) (2 children)

Short answer: no.

Long answer: They are still (mostly) statistics-based and can't do real math. You can use the answers from LLMs as a starting point, but you have to rigorously verify the answers they give.

[–] unexposedhazard@discuss.tchncs.de 29 points 1 month ago (1 children)

The whole "two r's in strawberry" thing is enough of an argument for me. If things like that happen at such a low level, it's completely impossible that it won't make mistakes with problems that are exponentially more complicated than that.

[–] otp@sh.itjust.works 8 points 1 month ago

The problem with that is that it isn't actually counting the R's.

You'd probably have better luck asking it to write a script for you that returns the number of instances of a letter in a string of text, then getting it to explain to you how to get it running and how it works. You'd get the answer that way, and also then have a script that could count almost any character and text of almost any size.

That's much more complicated, impressive, and useful, imo.
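
A minimal sketch of the kind of script described above, assuming Python and a case-insensitive count. The function name `count_char` is made up for illustration and is not something any particular LLM would necessarily produce:

```python
def count_char(text: str, char: str) -> int:
    """Return how many times a single character appears in text, ignoring case."""
    if len(char) != 1:
        raise ValueError("char must be a single character")
    # str.count does the actual counting; lowercasing both sides ignores case.
    return text.lower().count(char.lower())


if __name__ == "__main__":
    print(count_char("strawberry", "r"))  # prints 3
```

The counting is done by ordinary string code rather than by token prediction, so it keeps working for other letters and much longer texts.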

[–] confuser@lemmy.zip 2 points 1 month ago

A calculator as a tool for an LLM, though, that works, at least mostly, and could get better once the kinks are worked out.

[–] markz@suppo.fi 30 points 1 month ago* (last edited 1 month ago) (3 children)

LLMs don't and can't do math. They don't calculate anything, that's just not how they work. Instead, they do this:

2 + 2 = ? What comes after that? Oh, I remember! It's '4'!

It could be right, it could be wrong. If there's enough pattern in the training data, it could remember the correct answer. Otherwise it'll just place a plausible-looking value there (behavior known as AI hallucination). So, you cannot "trust" it.

[–] msage@programming.dev 10 points 1 month ago (1 children)

Every LLM answer is a hallucination.

[–] CanadaPlus@lemmy.sdf.org 8 points 1 month ago

Some are just realistic to the point of being correct. It frightens me how many users have no idea about any of that.

[–] NewNewAugustEast@lemmy.zip 2 points 1 month ago* (last edited 1 month ago)

A good one will interpret what you are asking and then write code, often Python, I notice, and then let that do the math and return the answer. A math problem should use a math engine, and that's how it gets around it.

But really, why bother? Go ask Wolfram Alpha or just write the math problem in code yourself.

[–] supersquirrel@sopuli.xyz 21 points 1 month ago (4 children)

Why would I bother?

Calculators exist, logic exists, so no... LLMs are a laughably bad fit for directly doing math. They are bullshit engines: they cannot "store" a value without exposing it to their hallucinating tendencies, which is the worst property a calculator could possibly have.

[–] morgunkorn@discuss.tchncs.de 20 points 1 month ago (2 children)

No, thank you for coming to my TED Talk.

[–] AbouBenAdhem@lemmy.world 18 points 1 month ago* (last edited 1 month ago)

Would you trust six mathematicians who claimed to have solved a problem by intuition, but couldn’t prove it?

That’s not how mathematics works: if you have to “trust” the answer, it isn’t even math.

[–] icerunner_origin@startrek.website 17 points 1 month ago

Vibe math. No thank you

[–] WolfLink@sh.itjust.works 15 points 1 month ago

Just use Wolfram Alpha instead

[–] gedaliyah@lemmy.world 14 points 1 month ago

Here's an interesting post that gives a pretty good quick summary of when an LLM may be a good tool.

Here's one key:

Machine learning is amazing if:

The problem is too hard to write a rule-based system for or the requirements change sufficiently quickly that it isn't worth writing such a thing and,

The value of a correct answer is much higher than the cost of an incorrect answer.

The second of these is really important.

So if your math problem is unsolvable by conventional tools, or sufficiently complex that designing an expression is more effort than the answer is worth... AND ALSO it's more valuable to have an answer than it is to have a correct answer (there is no real cost for being wrong), THEN go ahead and trust it.

If it is important that the answer is correct, or if another tool can be used, then you're better off without the LLM.

The bottom line is that the LLM is not making a calculation. It could end up with the right answer. Different models could end up with the same answer. It's very unclear how much underlying technology is shared between models anyway.

For example, if the problem is something like, "here is all of our sales data and market indicators for the past 5 years. Project how much of each product we should stock in the next quarter." Sure, an LLM may be appropriately close to a professional analysis.

If the problem is like "given these bridge schematics, what grade steel do we need in the central pylon?" then, well, you are probably going to be testifying in front of Congress one day.

[–] Rentlar@lemmy.ca 12 points 1 month ago* (last edited 1 month ago)

I wouldn't bother. If I really had to ask a bot, Wolfram Alpha is there as long as I can ask it without an AI meddling with my question.

E: To clarify, just because one AI or six get the same answer, one that I can independently verify as correct for a simpler question, does not mean I can trust them on any arbitrary math question, however many AIs arrive at the same answer. There's often the possibility the AI will stumble into a logical flaw, as in the "number of r's in strawberry" example.

[–] DeathByBigSad@sh.itjust.works 8 points 1 month ago (1 children)

Yes, with absolute certainty.

For example: 2 + 2 = 5

It's absolutely correct and if you dispute it, big bro is gonna have to re-educate you on that.

[–] Farmdude@lemmy.world 2 points 1 month ago

I NEED TO consult every LLM VIA TELEKINESIS QUANTUM ELECTRIC GRAVITY A AND B WAVE.

[–] zxqwas@lemmy.world 7 points 1 month ago (6 children)

Using a calculator or Wolfram Alpha or similar tools, I don't trust the answer unless it passes a few sanity checks. Frequently I am the source of error and no LLM can compensate for that.

[–] scrubbles@poptalk.scrubbles.tech 6 points 1 month ago

No. Dear God, no. LLMs are not computers. They are just prediction machines. They predict that the next value is probably this value. There is no actual math there.

[–] qaz@lemmy.world 6 points 1 month ago (1 children)

Most LLMs now call functions in the background. Most calculations are just simple Python expressions.
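
For illustration, here is a rough sketch of what such a background calculator function might look like. It is an assumption about the general shape, not the tool any specific vendor ships: the name `calculate` and the set of supported operators are invented here, and the expression is evaluated by walking Python's `ast` tree rather than by calling `eval`.

```python
import ast
import operator

# Arithmetic operators this hypothetical tool is willing to evaluate.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def calculate(expression: str):
    """Evaluate a plain arithmetic expression such as '(3 + 4) * 2 - 1'."""

    def _eval(node):
        # Numeric literals only (bool is excluded even though it subclasses int).
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attributes and everything else are rejected.
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)


if __name__ == "__main__":
    print(calculate("(3 + 4) * 2 - 1"))  # prints 13
```

In this setup the model only has to produce the expression string; the arithmetic itself is done deterministically by the interpreter. A real tool would also want limits on things like huge exponents, which this sketch does not enforce.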

[–] OwlPaste@lemmy.world 5 points 1 month ago (8 children)

No. I once tried to do binary calculations with ChatGPT and it kept giving me wrong answers. Good thing I had some unit tests around that part, so I quickly realised it was lying.

[–] Pika@sh.itjust.works 2 points 1 month ago

Just yesterday I was fiddling around with a logic test in Python. I wanted to see how well DeepSeek could analyze the intro line to a for loop. It properly identified what it did in the description, but when it moved on to giving examples it contradicted itself, and it took 3 or 4 replies before it realized that it had.

[–] Aatube@kbin.melroy.org 5 points 1 month ago (1 children)

This is a really weird premise. Doing the same thing on 6 models is just not worth it, especially when Wolfram Alpha exists and is far more trustworthy and speedy.

[–] FaceDeer@fedia.io 3 points 1 month ago (1 children)

If the LLMs are part of a modern framework I would expect that they should be calling out to Wolfram Alpha (or a similar specialized math-solver) via an API to get the answer for you, for that matter.

[–] GrammarPolice@lemmy.world 2 points 1 month ago (1 children)

Finally an intelligent comment. So many comments in here that don't realize most LLMs are bundled with calculators that just do the math.

[–] FaceDeer@fedia.io 2 points 1 month ago

Anti-AI sentiment is extremely strong in every part of the Fediverse I've seen so far. Usually my comments get downvoted heavily even when I'm just describing factual details of how it works. I expect a lot of people simply don't bother after a while.

[–] bunchberry@lemmy.world 5 points 1 month ago (2 children)

I’ve used LLMs quite a few times to find partial derivatives / gradient functions for me, and I know it’s correct because I plug them into a gradient descent algorithm and it works. I would never trust anything an LLM gives blindly no matter how advanced it is, but in this particular case I could actually test the output since it's something I was implementing in an algorithm, so if it didn't work I would know immediately.
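
One way to test a gradient before it ever reaches the optimizer is a finite-difference check. This is a different technique from the run-it-in-gradient-descent test described above, shown here only as a sketch: the function `f`, the candidate gradient `grad_f`, and the tolerance are all made-up examples.

```python
def f(x, y):
    """Hypothetical example function whose gradient we want to check."""
    return x**2 * y + 3.0 * y**2


def grad_f(x, y):
    """Candidate analytic gradient, e.g. one suggested by an LLM."""
    return (2.0 * x * y, x**2 + 6.0 * y)


def numerical_grad(func, point, h=1e-6):
    """Approximate each partial derivative with a central difference."""
    grads = []
    for i in range(len(point)):
        plus = list(point)
        minus = list(point)
        plus[i] += h
        minus[i] -= h
        grads.append((func(*plus) - func(*minus)) / (2.0 * h))
    return tuple(grads)


def gradients_agree(func, grad, point, tol=1e-4):
    """True if the analytic and numerical gradients match at the given point."""
    analytic = grad(*point)
    numeric = numerical_grad(func, point)
    return all(
        abs(a - n) <= tol * max(1.0, abs(a)) for a, n in zip(analytic, numeric)
    )


if __name__ == "__main__":
    print(gradients_agree(f, grad_f, (1.5, -2.0)))  # prints True
```

A check like this only samples the points you give it, so trying a handful of different points is a reasonable precaution.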

[–] Farmdude@lemmy.world 2 points 1 month ago

That's rad, dude. I wish I knew how to do that. Hey, dude, I imagined a cosmological model that fits the data with two fewer parameters than the standard model. Planck data. I've checked the numbers, but I don't have the credentials. I need somebody to check it out. This is it, along with a verbal explanation of the model by Academia.edu. It's way easier to listen first before looking. I don't want recognition or anything. Just for someone to review it. It's a short paper. https://youtu.be/_l8SHVeua1Y

[–] AmericanEconomicThinkTank@lemmy.world 5 points 1 month ago

Nope, language models, by their inherent nature, cannot be used to calculate. Sure, theoretically you could have the input parsed, with proper training, to find specific variables, feed those into a database, and have that data mathematically transformed back into language data.

No LLMs do actual math; they only produce the most likely output for a given input based on their training data. If I input: What is 1 plus 1?

Then, given that the model has most likely been trained repeatedly on that being followed by 1 + 1 = 2, that will be the output. If it had been trained on data that said 1 + 1 = 5, then that would be the output.
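
To make that concrete, here is a deliberately crude toy, assuming a "model" that is nothing more than a frequency table of what followed each prompt in its training lines. Real LLMs are neural networks over tokens and are far more elaborate, so this only illustrates the "output follows the training data" point; `train` and `predict` are invented names.

```python
from collections import Counter, defaultdict


def train(corpus):
    """Count which answer followed each prompt in lines like '1 + 1 = 2'."""
    table = defaultdict(Counter)
    for line in corpus:
        prompt, answer = line.split("=", 1)
        table[prompt.strip()][answer.strip()] += 1
    return table


def predict(table, prompt):
    """Return the most frequently seen answer, or None for an unseen prompt."""
    counts = table.get(prompt.strip())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    sane = train(["1 + 1 = 2"] * 3 + ["1 + 1 = 5"])
    skewed = train(["1 + 1 = 5"] * 3 + ["1 + 1 = 2"])
    print(predict(sane, "1 + 1"))    # prints 2
    print(predict(skewed, "1 + 1"))  # prints 5
```

Nothing in this toy ever adds two numbers; the answer changes purely because the training lines changed.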

[–] Professorozone@lemmy.world 4 points 1 month ago

Well, I wanted to know the answer and formula for future value of a present amount. The AI answer that came up was clear, concise, and thorough. I was impressed and put the formula into my spreadsheet. My answer did not match the AI answer. So I kept looking for what I did wrong. Finally I just put the value into a regular online calculator and it matched the answer my spreadsheet was returning.

So AI gave me the right equation and the wrong answer. But it did it in a very impressive way. This is why I think it's important for AI to only be used as a tool and not a replacement for knowledge. You have to be able to understand how to check the results.
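
For anyone wanting to run the same kind of check, the standard compound-interest formula for the future value of a present amount is FV = PV * (1 + r)^n. Here is a small sketch of it; the figures used (1000 at 5% for 10 years) are made-up illustration values, not the numbers from the spreadsheet above.

```python
def future_value(present_value, annual_rate, years, periods_per_year=1):
    """Future value of a lump sum under compound interest: PV * (1 + r)^n."""
    rate_per_period = annual_rate / periods_per_year
    total_periods = years * periods_per_year
    return present_value * (1.0 + rate_per_period) ** total_periods


if __name__ == "__main__":
    fv = future_value(1000.0, 0.05, 10)
    print(round(fv, 2))  # prints 1628.89
    # Sanity check: discounting the result should give back the starting amount.
    print(round(fv / (1.05 ** 10), 2))  # prints 1000.0
```

Computing the number a second, independent way, as in the last line, is the same kind of check that caught the wrong AI answer.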

[–] CanadaPlus@lemmy.sdf.org 3 points 1 month ago

Maybe? I'd be looking all over for some convergent way to fuck it up, though.

If it's just one model or the answers are only close, lol no.

[–] msmc101@lemmy.blahaj.zone 3 points 1 month ago

No, LLMs are designed to drive up user engagement, nothing else. They're programmed to present what you want to hear, not actual facts. Plus they're straight up not designed to do math.

[–] guy@piefed.social 3 points 1 month ago

No lol. I don't trust a calculator to write me text, nor an autocomplete to solve my math problems.

[–] BlameTheAntifa@lemmy.world 3 points 1 month ago* (last edited 1 month ago) (1 children)

You cannot trust LLMs. Period.

They are literally hallucination machines that just happen to be correct sometimes.

[–] j4k3@piefed.world 3 points 1 month ago

Never a base model; absolutely with an agent and function calling with a properly made tool and retrieval.

[–] nylo@lemmy.dbzer0.com 3 points 1 month ago

[–] BilboBargains@lemmy.world 2 points 1 month ago

Use Wolfram Alpha for mathematics

[–] Rhaedas@fedia.io 2 points 1 month ago (5 children)

How trustworthy the answer is depends on knowing where the answers come from, which is unknowable. If the probability of the answers being generated from the original problem is high because it occurred in many different places in the training data, then maybe it's correct. Or maybe everyone who came up with the answer is wrong in the same way and that's why there is so much correlation. Or perhaps the probability match is simply because lots of math problems tend towards similar answers.

The core issue is that the LLM is not thinking or reasoning about the problem itself, so trusting it with anything really means assuming it is more likely to be right than wrong. In some areas this is safe to do; in others it's a terrible assumption to make.

[–] Typewar@infosec.pub 2 points 1 month ago (1 children)

No because there is randomness involved

[–] 1rre@discuss.tchncs.de 2 points 1 month ago

That's why you ask 6 of them, and if they all come to the same conclusion then chances are it's either right, or a common pitfall.
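
A sketch of how that comparison might be automated, assuming the six answers have already been collected as plain numbers. The helper names `all_agree` and `cross_check` are hypothetical, and the independent value is whatever a calculator or hand computation gives you.

```python
import math


def all_agree(answers, rel_tol=1e-9):
    """True if every answer matches the first one within a relative tolerance."""
    if not answers:
        raise ValueError("need at least one answer")
    first = answers[0]
    return all(math.isclose(first, a, rel_tol=rel_tol) for a in answers[1:])


def cross_check(answers, independent_value, rel_tol=1e-9):
    """Agreement alone could be a shared pitfall, so also compare with a trusted value."""
    return all_agree(answers, rel_tol) and math.isclose(
        answers[0], independent_value, rel_tol=rel_tol
    )


if __name__ == "__main__":
    six_answers = [42.0] * 6
    print(all_agree(six_answers))          # prints True
    print(cross_check(six_answers, 41.0))  # prints False
```

The second call shows the "common pitfall" case: unanimous answers that still fail against an independently computed value.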

[–] HubertManne@piefed.social 2 points 1 month ago

For practice, yeah, as there is usually something you can do to verify the value. For study, no, as you would not learn shit.
