[–] NuXCOM_90Percent@lemmy.zip 16 points 1 day ago (2 children)

If anything, this is kind of making people realize the opposite: it isn't stealing when it's a corporation (or a creator...), but it is TOTALLY stealing when it's individual people... people who aren't authors or artists.

The fun part is that "creating" datasets for training steals from everyone equally.

[–] EldritchFeminity@lemmy.blahaj.zone 3 points 1 day ago (1 children)

And when it comes to authors and artists, it amounts to wage theft. When a company hires an artist to make an ad, the artist gets paid to make it. If you then take that ad, you're not taking money from the worker - they already got paid for the work that they did. Even if you take a piece from the social media of an independent artist and make a meme out of it or something, so long as people can find that artist, it can lead to people hiring them. But if you chop it up and mash it into a data set, you're taking their work for profit, or to avoid paying them for the skills and expertise it takes to create something new. AI cannot exist without a constant stream of human art to devour, yet nobody thinks the work to produce that art is worth paying for. It's corporations avoiding paying the working class what their skills are worth.

[–] NuXCOM_90Percent@lemmy.zip 1 points 1 day ago (1 children)

> Even if you take a piece from the social media of an independent artist and make a meme out of it or something, so long as people can find that artist, it can lead to people hiring them.

  1. That is a BIG if
  2. You are literally arguing that it is fine for people to "work for exposure"

> AI cannot exist without a constant stream of human art to devour

That is, sadly, incorrect. What IS true is that AI cannot be "born" without massive amounts of human content. But once you have a solid base model (and I do not believe we currently do), you no longer need input art or input prose. The model can generate that. What you DO need is feedback on whether a slight variation is good or bad. Once you have that labeled data you then retrain. Plenty of existing ML models do exactly this.
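
To make that concrete, here's a toy sketch of that loop (everything in it is a made-up stand-in, not any real training stack): the "model" is just a number generator, "retraining" is averaging whatever got a thumbs-up, and no new human-made content enters the loop after the start.

```python
# Toy sketch of the generate -> label -> retrain loop described above.
# The "model" is a number generator and the "judge" a fixed preference;
# both are made-up stand-ins, not any real training stack.
import random

random.seed(42)

def generate(center: float, spread: float, n: int) -> list[float]:
    """The 'model': slight variations on what it already produces."""
    return [random.gauss(center, spread) for _ in range(n)]

def feedback(sample: float, target: float = 5.0) -> bool:
    """Stand-in for human judgment: is this variation good or bad?"""
    return abs(sample - target) < 4.0

center, spread = 0.0, 1.0  # the base model, built once from human content
for step in range(10):
    candidates = generate(center, spread, n=200)
    kept = [c for c in candidates if feedback(c)]  # freshly labeled data
    if kept:
        center = sum(kept) / len(kept)  # "retrain" on the approved samples
    print(f"step {step}: model center = {center:.2f}")
```

Run it and the output drifts toward whatever the judge rewards, purely on feedback. That's the point: the labels are the scarce ingredient, not more input art.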

And, honestly? That isn't even all that different from how humans do it. It's an oversimplification that ignores lead times, but just look at everyone who suddenly wants to talk about how much Virtuosity influenced The Matrix. Or, more reasonably, you can look at how each era of film is clearly influenced by the previous one. EVERYONE doing action copied John Woo, and then came the innovation of adding slow-mo to more or less riff on the wire work common in (among other places) Chinese films. And that eventually became a much bigger focus on slow-mo to show the impact of a hit, and so forth.

There is nothing intrinsically human about saying "can I put some jelly in my peanut butter?". But there IS something intrinsically human about deciding whether that was a good idea... to humans.

[–] EldritchFeminity@lemmy.blahaj.zone 2 points 1 day ago (1 children)

I agree that's a BIG if. In an ideal world, people would cite their sources and bring more attention to the creator. I also didn't mean that artists should create work for the opportunity to have it turned into a meme and maybe go viral and get exposure that way - just that there's at least a chance, however small, of getting more clients through word of mouth for work they've already done, compared to having their art thrown into a training algorithm, which has absolutely zero chance of the artist seeing any benefit.

Last I heard, current AI will devour itself if trained on content from other AI. That content simply isn't good enough to use, and the noise-to-value ratio is too high to make it worth filtering through. Which means there is still a massive demand for human-made content, and possibly even more demand in the future for some time yet. Pay artists to create that content, and I see no real problem with the model. Some companies have started doing just that: Procreate has partnered with a website-building company that is hiring artists to create training data for its UI-generating LLM and paying those artists commission fees. Nobody has to spend their day making hundreds of buttons for stupid websites, and the artists get paid. A win-win for everybody.

My stance on AI always comes down to the ethics behind the creation of the tool, not the tool itself. My pie-in-the-sky scenario would be that artists could spend their time making what they want to make without having to worry about whether they can afford rent. There's a reason we see most artists posting only commission work online: they can't afford to work on their own stuff. My more realistic view is that there's a demand for content to train these things, so pay the people making that content an appropriate wage for their work and experience. There could be an entire industry around creating work specifically for the different content tags in training data.

And as for AI being similar to humans, I think you're largely right. It's a really simplified reproduction of how human creativity and inspiration work, but with some major caveats. I see AI as basically a magic box containing an approximation of skill but lacking understanding and intent. When you give it a prompt, you provide the intent, and if you're knowledgeable, you have the understanding to apply as well. But many people don't care about the understanding or value the skill, they just want the end result. Which is where we stand today with AI not being used for the betterment of our daily lives, but just as a cost-cutting tool to avoid having to pay workers what they're worth.

Hence, we live in a world where they told us when we were growing up that AI would be used to do the things we hate doing so that we had more time to write poetry and create art, while today AI is used to write poetry and create art so that we have more time to work our menial jobs and create value for shareholders.

[–] NuXCOM_90Percent@lemmy.zip 1 points 1 day ago (1 children)

> Last I heard, current AI will devour itself if trained on content from other AI. That content simply isn't good enough to use, and the noise-to-value ratio is too high to make it worth filtering through.

Yeah... that is right up there with "AI can't do feet" in terms of being nonsense people spew.

There is no inherent difference between a picture generated by an image model and a picture drawn by Rob Liefeld. Both have fucked-up feet, and both can be fixed with a bit of effort.

The issue is more the training data itself. Where this CAN cause a problem is if you unknowingly train on endless amounts of AI-generated content. But... we have the same problem with training on endless amounts of human content. Very few training sets (these days) bother to put in the time to actually label what the input is. It isn't "This is a good recipe, that is a bad recipe, and that is an ad for BetterHelp." It is "This is all the data we scraped off every public-facing blog and YouTube transcript."

It's also why the major companies are putting a big emphasis on letting customers feed in their own data. Partially that is out of the understanding that people might not want to type corporate IP into a web interface. But it is also because it provides a way to rapidly generate labeled data: you know a customer cares about widgets if they have twelve gigs of documents on widgets.
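
As a toy illustration of that "customer data as free labels" idea (the function and threshold here are made up, not any vendor's actual pipeline):

```python
# Hypothetical sketch: infer what a customer cares about from nothing but
# word frequency in the documents they upload. Not a real vendor pipeline.
from collections import Counter

def infer_customer_topics(documents: list[str], min_count: int = 3) -> set[str]:
    """Frequent terms in a customer's corpus act as implicit topic labels."""
    counts = Counter(
        word.lower().strip(".,:;")
        for doc in documents
        for word in doc.split()
    )
    return {word for word, n in counts.items() if n >= min_count and len(word) > 3}

docs = [
    "Widget assembly guide: align the widget housing before sealing.",
    "Quarterly widget sales rose after the widget redesign.",
    "Widget maintenance schedule and spare widget inventory.",
]
print(infer_customer_topics(docs))  # {'widget'}
```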

> I see AI as basically a magic box containing an approximation of skill but lacking understanding and intent.

And what is the difference between a rando getting paid to draw a picture of Sonic choking on a chili dog versus an AI-generated image of the same?

At the end of the day, we aren't going to see magic AIs generating everything with zero prompting (maybe in a decade or two... if the world still exists). Instead, what we see is people studying demand and creating prompts based on that. Which... isn't that different from how Hollywood studios decide which scripts to greenlight.

[–] EldritchFeminity@lemmy.blahaj.zone

You're largely arguing what I'm saying back at me. I didn't mean that the AI itself is bad, but that the AI content out there has filled the internet with tons of low-quality stuff over the past few years, and enough of that garbage going in degrades the quality coming out, in a repeating cycle of degradation. You create biases in your model, and feeding those back in makes them worse. So the most cost-effective way to filter it out is to avoid training on possibly-AI content altogether. I think OpenAI was limiting the training data for ChatGPT to stuff from before 2020 up until this past year or so.
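
That cycle is easy to demonstrate with a toy experiment (just an illustration, not a real training run): fit a distribution to some data, sample from the fit, refit on those samples, and repeat. Each pass loses a little information, and the errors compound:

```python
# Toy model-collapse demo: each "generation" trains only on the previous
# generation's output. Watch the estimates drift away from the truth.
import random
import statistics

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]  # "human-made" data

for generation in range(10):
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    print(f"gen {generation}: mean = {mu:+.3f}, stdev = {sigma:.3f}")
    # The next generation sees only samples from the fitted model.
    data = [random.gauss(mu, sigma) for _ in range(200)]
```

The estimates wander like a random walk, and nothing in the loop ever pulls them back toward the original data. Filtering the AI-generated samples out is the only fix, which is exactly the expensive part.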

It's a similar issue to the one facial recognition software had. Early on, facial recognition couldn't tell the difference between two women, two Black people (men or women), or two white men under the age of 25 or so, because it was trained on the employees working on it, who were mostly middle-aged white men.

This means there's a high demand for content to train on, which would be a perfect job to hire artists for. Pay them to create work for whatever labels you're looking for in your data sets. But companies don't want to do that. They'd rather steal content from the public at large, because AI is about cutting costs for these companies.

> And what is the difference between a rando getting paid to draw a picture of Sonic choking on a chili dog versus an AI-generated image of the same?

To put it simply: AI can generate an image, but it isn't capable of understanding two-point perspective or proper lighting occlusion, etc. It's just a tool. A very powerful tool, especially in the right hands, but a tool nonetheless. If you look at AI images, especially ones generated by the same model, you'll begin to notice certain recurring mistakes - especially in lighting. AI doesn't understand the concept of lighting, and so it has a very hard time rendering it realistically. Most characters end up with competing light sources and shadows coming from all over the place in ways that make no sense. And that's just a consequence of how specific you'd need your prompt to be in order to get it right.

Another flaw with AI is that it can't iterate. Production companies that were hiring AI prompters onto their movie crews have started putting blanket bans on hiring them, because they simply can't do the work. You ask them for 10 images of a forest, and they'll come back the next day with 20. But when you say, "Great, I like this one, but take the people out of it," they'll come back the next day with 15 more pictures of forests - but not the original without people in it. It's a great tool for what it does, but you can't tell it, "Can you make the chili dog 10 times larger?" and get the same piece back, just with a giant chili dog.

And don't get me started on Hollywood or any of those other corporate leeches. I think Adam Savage said it best last year: someday, a film student is going to do something really amazing with AI - and Hollywood is going to copy it to death. Corporations are the death of art, because they only care about making a product to be consumed. For some perfect examples of what I mean, check out these two videos: "Why do 'Corporate Art Styles' Feel Fake?" by Solar Sands, and "Corporate Music - How to Compose with no Soul" by Tantacrul. Corporations also have no courage when money is on the line, which is why we see so many sequels and remakes out of Hollywood. People aren't clamoring for a live-action remake of (insert childhood Disney movie here), but they will go and watch it, and that's a safe bet for Hollywood. That's why we don't see many new properties. Artists want to make them; Hollywood doesn't.

As I said, in my ideal world, AI would be making that corporate garbage and artists would be free to create what they actually want. But in the real world, there's very little chance you can keep a roof over your head making what you want. Making corporate garbage is where the jobs are, and most artists have very little time left over for personal work. People always ask, "Why aren't people making statues like the Romans did?" or "Why don't we get paintings like Rembrandt's anymore?" The answer is: because nobody is paying artists to make them. They're paying them to make soup commercials, and they don't even want to pay them for that.

[–] Dkarma@lemmy.world 2 points 1 day ago (1 children)

Nuance not your strong suit, eh?

[–] NuXCOM_90Percent@lemmy.zip 5 points 1 day ago

I'm curious what "nuance" I am missing.

I mean, it isn't like OpenAI or DeepSeek were going to pay for it anyway. So there is no lost revenue, and it isn't stealing. Besides, you can't download a car, so it isn't even stealing.

It's just that people are super eager to make themselves morally righteous when they're explaining why it's fine not to give a shit about the work of one person, or hundreds of people, when they want something. But once they see corporations (and researchers) doing the exact same thing to them? THIEVERY!!!

When the reality is: Yeah, it is "theft" either way. Whether you care is up to you.