this post was submitted on 26 Jan 2024
424 points (82.9% liked)

Technology

We Asked A.I. to Create the Joker. It Generated a Copyrighted Image.

Artists and researchers are exposing copyrighted material hidden within A.I. tools, raising fresh legal questions.

[–] Fisk400@feddit.nu 52 points 1 year ago (6 children)

What it proves is that they are feeding entire movies into the training data. It is excellent evidence for when WB and Disney decide to sue the shit out of them.

[–] DudeDudenson@lemmings.world 113 points 1 year ago (2 children)

Does it really have to be entire movies when there's a ton of promotional images and memes with similar images?

[–] wewbull@iusearchlinux.fyi 4 points 1 year ago (1 children)

Promotional images are still under copyright.

[–] Klear@sh.itjust.works 7 points 1 year ago (1 children)

We should find all the memers and throw them in jail.

[–] DudeDudenson@lemmings.world 5 points 1 year ago

Will someone think of the shareholders!?

[–] Jarix@lemmy.world 3 points 1 year ago (1 children)

Yes. That's what these things are: extremely large catalogues of data. As much data as possible is their goal.

[–] EdibleFriend@lemmy.world 9 points 1 year ago (1 children)

True, but it didn't pick some random frame somewhere in the movie; it chose an extremely memorable shot that is posted all over the place. I won't deny that they are probably feeding it movies, but this is not a sign of that.

This image is literally the top result on Google images for me.

[–] Jarix@lemmy.world 0 points 1 year ago* (last edited 1 year ago) (1 children)

Why would it pick some random frame in the middle of its data set instead of the frame it has the most to reference? It can still use all those other frames to then pick the frame it has the most references to.

But I'm starting to think maybe I misunderstood the comment I replied to.

Sorry, I'm way out of context with my reply, totally my fault for reflexively replying.

Uhhh, would you accept that I hadn't had my coffee yet and hadn't gotten out of bed yet as an explanation?

[–] EdibleFriend@lemmy.world 1 points 1 year ago

Haha it happens

[–] Mirodir@discuss.tchncs.de 58 points 1 year ago* (last edited 1 year ago) (1 children)

I think it's much more likely that whatever scraping they used to get the training data snatched a screenshot of the movie some random internet user posted somewhere. (To confirm, I typed "joaquin phoenix joker" into Google and this very image was very high up in the image results.) And of course not only this one but many, many more too.

Now I'm not saying scraping copyrighted material is morally right either, but I doubt they'd just feed an entire movie frame by frame (or randomly spaced screenshots from throughout a movie), especially because it would make generating good labels for each frame very difficult.

[–] otp@sh.itjust.works 23 points 1 year ago

I just googled "what does joker look like" and it was the fourth hit on image search.

Well, it was actually an article (unrelated to AI) that used the image.

But then I went simpler -- googling "joker" gives you the image (from the IMDb page) as the second hit.

[–] orclev@lemmy.world 19 points 1 year ago (1 children)

WB and Disney would lose, at least without an amendment to copyright law. That in fact just happened in one court case. It was ruled that using a copyrighted work to train AI does not violate that work's copyright.

[–] asret@lemmy.zip 9 points 1 year ago (1 children)

Using it to train on is very different from distributing derived works.

[–] wewbull@iusearchlinux.fyi 2 points 1 year ago* (last edited 1 year ago) (1 children)

What do you think the trained model is other than a derived work?

[–] asret@lemmy.zip 4 points 1 year ago

Something transformative from the original works. And arguably not being distributed. The model producing and distributing derivative works is entirely different though. No one really gives a shit about data being used to train models - there's nothing infringing about that, which is exactly why they won their case. The example in the post is an entirely different situation though.

[–] Even_Adder@lemmy.dbzer0.com 9 points 1 year ago* (last edited 1 year ago)

The way it was done, if I remember correctly, is that someone found out v6 was trained partially on Stockbase image-caption pairs, so they went to Stockbase, found some images, and used those exact tags in the prompts.

[–] Kusimulkku@lemm.ee 2 points 1 year ago

The image it generated is really widespread