this post was submitted on 21 May 2025

994 points (97.7% liked)


It's Breathtaking How Fast AI Is Screwing Up the Education System (gizmodo.com)

submitted 3 months ago by youradhere@feddit.org to c/technology@lemmy.world

365 comments


[–] DoPeopleLookHere@sh.itjust.works 13 points 3 months ago (3 children)

but what good is that if AI can do it anyway?

It can't. It just fucking can't. We're all pretending it does, but it fundamentally can't.

https://appleinsider.com/articles/24/10/12/apples-study-proves-that-llm-based-ai-models-are-flawed-because-they-cannot-reason

Creative thinking is still a long way beyond reasoning as well. We're not close yet.

[–] Tiresia@slrpnk.net 3 points 3 months ago (1 children)

It can and it has done creative mathematical proof work. Nothing spectacular, but at least on par with a mathematics grad student.

[–] DoPeopleLookHere@sh.itjust.works -1 points 3 months ago (1 children)

Specialized AI like that is not what most people know as AI. When most people say AI, they mean LLMs.

Specialized AI, like that showcased, is still decades away from generalized creative thinking. You can't ask it to do a science experiment within a class because it just can't. It's only built for math proofs.

Again, my argument isn't that it will never exist.

Just that it's so far off it'd be like trying to regulate smart phone laws in the 90s. We would have only had pipe dreams as to what the tech could be, never mind its broader social context.

So talk to me when it can provide, in the case of this thread, clinically validated ways of teaching. We're still decades from that.

[–] Zexks@lemmy.world 1 points 3 months ago (1 children)

Show me a human that can do it.

[–] DoPeopleLookHere@sh.itjust.works 1 points 3 months ago* (last edited 3 months ago)

https://scholar.google.com/scholar?hl=en&as_sdt=0%2C10&q=children+learning+from+humans#d=gs_qabs&t=1747921831528&u=%23p%3DDqyOK2jEfjQJ

EDIT: you can literally get a PhD in many forms of education and have an entire career studying it.

[–] pinkapple@lemmy.ml 0 points 3 months ago (1 children)

The faulty logic was supported by a previous study from 2019

This directly applies to the human journalist: studies on other models from 6 years ago are pretty much irrelevant, and this one apparently tested very small distilled ones that you can run on consumer hardware at home (Llama3 8B lol).

Anyway this study seems trash if their conclusion is that small and fine-tuned models (user compliance includes not suspecting intentionally wrong prompts) failing to account for human misdirection somehow means "no evidence of formal reasoning". Which means using formal logic and formal operations and not reasoning in general, we use informal reasoning for the vast majority of what we do daily and we also rely on "sophisticated pattern matching" lmao, it's called cognitive heuristics. Kahneman won the Nobel prize for recognizing type 1 and type 2 thinking in humans.

Why don't you go repeat the experiment yourself on huggingface (accounts are free, over ten models to test, actually many are the same ones the study used) and see what actually happens? Try it on model chains that have a reasoning model like R1 and Qwant and just see for yourself and report back. It would be intellectually honest to verify things since we're talking about critical thinking in here.

Oh add a control group here, a comparison with average human performance to see what the really funny but hidden part is. Pro-tip: CS STEMlords catastrophically suck when larping being cognitive scientists.

[–] DoPeopleLookHere@sh.itjust.works 0 points 3 months ago (1 children)

So you say I should be intellectually honest by doing the experiment myself, then say that my experiment is going to be shit anyways? Sure... That's also intellectually honest.

Here's the thing.

My education is in physics, not CS. I know enough to know what I try isn't going to be really valid.

But unless you have peer-reviewed research to show otherwise, I would take your homegrown experiment to be as valid as mine.

[–] pinkapple@lemmy.ml 0 points 3 months ago (1 children)

And here's experimental verification that humans lack formal reasoning when sentences don't precisely spell it out for them: all the models they tested except chatGPT4 and o1 variants are from 27B and below, all the way to Phi-3 which is an SLM, a small language model with only 3.8B parameters. ChatGPT4 has 1.8T parameters.

1.8 trillion > 3.8 billion

ChatGPT4's performance difference (accuracy drop) with regular benchmarks was a whopping -0.3 versus a -9.2 drop for Mistral 7B.

Yes there were massive differences. No, they didn't show significance because they barely did any real stats. The models I suggested you try for yourself are not included in the test and the ones they did use are known to have significant limitations. Intellectual honesty would require reading the actual "study" though instead of doubling down.

Maybe consider the possibility that
a. STEMlords in general may know how to do benchmarks but not cognitive testing type testing or how to use statistical methods from that field
b. this study being an example of a few "I'm just messing around trying to confuse LLMs with sneaky prompts instead of doing real research because I need a publication without work" type of study, equivalent to students making chatGPT do their homework
c. 3.8B models = the size in bytes is between 1.8 and 2.2 gigabytes
d. not that "peer review" is required for criticism lol but uh, that's a preprint on arxiv, the "study" itself hasn't been peer reviewed or properly published anywhere (how many months are there between October 2024 to May 2025?)
e. showing some qualitative difference between quantitatively different things without showing p and using weights is garbage statistics
f. you can try the experiment yourself because the models I suggested have visible Chain of Thought and you'll see if and over what they get confused about
g. when there are graded performance differences with several models reliably not getting confused at least more than half the time but you say "fundamentally can't reason" you may be fundamentally misunderstanding what the word means
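
Point c can be sanity-checked with back-of-the-envelope arithmetic: weight storage is roughly parameter count times bits per weight. This is a minimal sketch, assuming the 1.8 to 2.2 GB figure refers to a roughly 4-bit quantized download (the comment doesn't say which format). `model_size_gb` is a hypothetical helper, and real model files carry some extra overhead on top of the raw weights.

```python
def model_size_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate raw weight storage in gigabytes (10^9 bytes), ignoring file overhead."""
    return num_params * bits_per_param / 8 / 1e9


# Parameter count quoted in the thread for Phi-3; treated here as an assumption.
PHI3_PARAMS = 3.8e9

# Bit widths are illustrative: 16-bit floats and two common quantization levels.
for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {model_size_gb(PHI3_PARAMS, bits):.1f} GB")
# prints:
# fp16: 7.6 GB
# 8-bit: 3.8 GB
# 4-bit: 1.9 GB
```

So the quoted range only lines up with the 4-bit case; at full 16-bit precision the same model would be closer to 7.6 GB.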

Need more clarifications instead of reading the study or performing basic fun experiments? At least be intellectually curious or something.

[–] DoPeopleLookHere@sh.itjust.works 0 points 3 months ago (1 children)

And still nothing peer reviewed to show?

Synthetic benchmarks mean nothing. I don't care how much context someone can store, when the context being stored is putting glue on pizza.

Again, I'm looking for some academic sources (doesn't have to be stem, education would be preferred here) that the current tech is close to useful.

[–] pinkapple@lemmy.ml 0 points 3 months ago (1 children)

You made huge claims using a non peer reviewed preprint with garbage statistics and abysmal experimental design where they put together 21 bikes and 4 race cars to bury openAI flagship models under the group trend and go to the press with it. I'm not going to go over all the flaws but all the performance drops happen when they spam the model with the same prompt several times and then suddenly add or remove information, while using greedy decoding which will cause artificial averaging artifacts. It's context poisoning with extra steps i.e. not logic testing but prompt hacking.
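
For anyone unfamiliar with the term, greedy decoding means always taking the single highest-scoring next token instead of sampling from the distribution. The sketch below is only an illustration of that difference on made-up scores, not the study's actual setup; `toy_logits`, `greedy_pick`, and `sample_pick` are hypothetical names.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities that sum to 1."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Greedy decoding: always return the index of the highest score."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_pick(logits, temperature=1.0, rng=random):
    """Sampling: draw an index in proportion to its softmax probability."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Made-up scores for three candidate tokens; the first two are nearly tied.
toy_logits = [2.0, 1.9, 0.5]

print(greedy_pick(toy_logits))  # always 0, even though token 1 is almost as likely
rng = random.Random(0)
print([sample_pick(toy_logits, rng=rng) for _ in range(10)])  # a mix of indices
```

With greedy decoding, a near-tie is resolved the same way every time, so a tiny shift in the scores can flip the whole answer, which is one reason repeated-prompt results can look more brittle than they are under sampling.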

This is Apple (that is falling behind in its AI research) attacking a competitor with fake FUD and doesn't even count as research, which you'd know if you looked it up and saw you know, opinions of peers.

You're just protecting an entrenched belief based on corporate slop so what would you do with peer reviewed anything? You didn't bother to check the one you posted yourself.

Or you post corporate slop on purpose and are now trying to turn the conversation away from that. Usually the case when someone conveniently bypasses absolutely all your arguments lol.

[–] DoPeopleLookHere@sh.itjust.works 0 points 3 months ago (1 children)

Okay, here's a non apple source since you want it.

https://arxiv.org/abs/2402.12091

5 Conclusion

In this study, we investigate the capacity of LLMs, with parameters varying from 7B to 200B, to comprehend logical rules. The observed performance disparity between smaller and larger models indicates that size alone does not guarantee a profound understanding of logical constructs. While larger models may show traces of semantic learning, their outputs often lack logical validity when faced with swapped logical predicates. Our findings suggest that while LLMs may improve their logical reasoning performance through in-context learning and methodologies such as COT, these enhancements do not equate to a genuine understanding of logical operations and definitions, nor do they necessarily confer the capability for logical reasoning.

[–] pinkapple@lemmy.ml 0 points 3 months ago (1 children)

Another unpublished preprint that hasn't been peer reviewed? Funny how that somehow doesn't matter when something seemingly supports your talking points. Too bad it doesn't exactly mean what you want it to mean.

"Logical operations and definitions" = Booleans and propositional logic formalisms. You don't do that either because humans don't think like that but I'm not surprised you'd avoid mentioning the context and go for the kinda over the top and easy to misunderstand conclusion.

It's really interesting how you get people constantly doubling down on specifically chatbots being useless citing random things from google but somehow Palantir finds great usage in their AIs for mass surveillance and policing. What's the talking point there, that they're too dumb to operate and that nobody should worry?

[–] DoPeopleLookHere@sh.itjust.works 0 points 3 months ago (1 children)

As opposed to the nothing you've cited that context tokens actually improve reasoning?

I love how you keep going further and further away from the education topic at hand, and are now bringing in police surveillance, which everyone knows is 100% accurate.

[–] pinkapple@lemmy.ml 0 points 3 months ago (1 children)

You're less coherent than a broken LLM lol. You made the claim that transformer-based AIs are fundamentally incapable of reasoning or something vague like that using gimmicky af "I tricked the chatbot into getting confused therefore it can't think" unpublished preprints (while asking for peer review). Why would I need to prove something? LLMs can write code, that's an undeniable demonstration that they understand abstract logic fairly well that can't be faked using probability and it would be a complete waste of time to explain it to anyone who is either having issues with cognitive dissonance or less often may be intentionally trying to spread misinformation.

Are the AIs developed by Palantir "fundamentally incapable" of their demonstrated effectiveness or not? It's a pretty valid question when we're already surveilled by them but some people like you indirectly suggest that this can't be happening. Should people not care about predictive policing?

How about the industrial control AIs that you "critics" never mention, do power grid controllers fake it? You may need to tell Siemens, they're not aware their deployed systems work. And while on that, we shouldn't be concerned about monopolies controlling public infrastructure with closed source AI models because they're "fundamentally incapable" to operate?

I don't know, maybe this "AI skepticism" thing is lowkey intentional industry misdirection and most of you fell for it?

[–] DoPeopleLookHere@sh.itjust.works 1 points 3 months ago (1 children)

My larger point, AI replacing teachers is at least a decade away.

You've given no evidence that it is. You've just said you hate my sources, while not actually making a single argument that it is.

You said well it stores context, but who cares? I showed that it doesn't translate to what you think, and you said you don't like it, without providing any evidence that it means anything beyond looking good on a graph.

I've said several times, SHOW ME IT'S CLOSE. I don't care what law enforcement buys, because that has nothing to do with education.

[–] pinkapple@lemmy.ml -1 points 3 months ago (1 children)

I never said it's going to replace teachers or that it "stores context" but your sloppily googled preprints to support your "fundamentally can't reason" statement were demonstrably garbage. You didn't say even once "show me it's close" but you think you said it several times. Either your reading comprehension is worse than an LLM's and you wildly confabulate, which means an LLM could replace you, or you're a bot. Anyway, so far you proved nothing and already said they can write code, it's a non-trivial cognitive task that you can't perform without several higher order abilities so cope and seethe I guess.

So, what about Palantir AI? Is that also "not close"? Why are you avoiding surveillance AI? They're both neural networks. Some are LLMs.

[–] DoPeopleLookHere@sh.itjust.works 0 points 3 months ago* (last edited 3 months ago) (1 children)

I said AI isn't close in education. That was my entire claim.

I never said anything about any other company. I said AI in education isn't happening soon. You keep pulling in other sectors.

I've also had several comments in this thread before you came in saying that.

EDIT: give me a citation that LLMs can reason for code. Because in my experience as someone that professionally codes with AI (copilot) it's not capable of that. It guesses what it thinks I want to write in small segments.

https://x.com/leojr94_/status/1901560276488511759

Especially when it has a nasty habit of leaking secrets.

EDIT2 forgot to say why I'm ignoring other fields. Because we're not talking about AI in those fields. We're talking education and search engines at best. My original comment was that AI generated educational papers still serve their original purpose.

What the fuck does any of that have to do with Palantir?

[–] pinkapple@lemmy.ml 0 points 3 months ago (1 children)

Your claim was this, "supported" by some corporate unpublished preprint (which is really funny considering you have the nerve to ask for citations):

It can't. It just fucking can't. We're all pretending it does, but it fundamentally can't.

You don't need a citation for LLMs being able to "reason for code". Doubting AI coding abilities is delusional online yapping considering how documented it is, since it's deployed all over the place. So how about you prove that being able to write code and do things like control flow, conditionals etc can be done without reason. Try doing that instead of spamming incoherent replies.

Nobody cares if you're a professional vibe coder all of a sudden, if you can't code without copilot maybe you shouldn't have an opinion based on Apple's "research".

But until then, are Palantir's AIs fundamentally incapable of reasoning? Yes or no? None of you anti-AI warriors are clear, should we not worry about corporate AI surveillance because apparently AI isn't really "I" or not? Simple question, but maybe ask copilot for help. But you seem bugged when it comes to corporate propaganda contradictions, it's really interesting.

[–] DoPeopleLookHere@sh.itjust.works 0 points 3 months ago (1 children)

Buddy I don't know who fucked your mom or pissed in your corn flakes. But it wasn't me.

All you've done is make accusations about my character, while doing nothing of value.

Have the day you deserve

[–] pinkapple@lemmy.ml 1 points 3 months ago* (last edited 3 months ago)

Stay mad b1tch.

[–] FourWaveforms@lemm.ee -3 points 3 months ago (1 children)

It's already capable of doing a lot, and there is reason to expect it will get better over time. If we stick our fingers in our ears and pretend that's not possible, we will not be prepared.

[–] DoPeopleLookHere@sh.itjust.works 0 points 3 months ago* (last edited 3 months ago) (1 children)

If you read, it's capable of very little under the surface of what it is.

Show me one that is well studied, like clinical trial levels, then we'll talk.

We're decades away at this point.

My overall point is that it's just as meaningless to talk about now as it was in the 90s, because we can't conceive of what a functioning product will be, never mind its context in greater society. When we have it, we can discuss it then as we'll have something tangible to discuss. But where we'll be in decades is hard to regulate now.

[–] Zexks@lemmy.world 1 points 3 months ago (1 children)

AlphaFold. We're not decades away. We're years at worst.

[–] DoPeopleLookHere@sh.itjust.works 1 points 3 months ago* (last edited 3 months ago)

If you assume the unlimited power needed right now to run AlphaFold at the scale of all human education.

We have at best proofs of concept that computers can talk. But LLMs don't have any way of actually knowing anything behind them. That's kinda the problem.

And it's not a "we'll figure out the one trick" situation. More fundamentally, how it works doesn't allow for that to happen.