We Asked A.I. to Create the Joker. It Generated a Copyrighted Image. : technology

[–] silentdon@lemmy.world 179 points 2 years ago (3 children)

We asked A.I. to create a copyrighted image from the Joker movie. It generated a copyrighted image as expected.

Ftfy

[–] Fisk400@feddit.nu 52 points 2 years ago (12 children)

What it proves is that they are feeding entire movies into the training data. It is excellent evidence for when WB and Disney decide to sue the shit out of them.

[–] DudeDudenson@lemmings.world 113 points 2 years ago (7 children)

Does it really have to be entire movies when there's a ton of promotional images and memes with similar images?

[–] Mirodir@discuss.tchncs.de 58 points 2 years ago* (last edited 2 years ago) (1 children)

I think it's much more likely that whatever scraping they used to get the training data snatched a screenshot of the movie that some random internet user posted somewhere. (To confirm, I typed "joaquin phoenix joker" into Google and this very image was very high up in the image results.) And of course not just this one, but many, many more too.

Now I'm not saying scraping copyrighted material is morally right either, but I doubt they'd just feed in an entire movie frame by frame (or randomly spaced screenshots from throughout a movie), especially because it would make generating good labels for each frame very difficult.

[–] otp@sh.itjust.works 23 points 2 years ago

I just googled "what does joker look like" and it was the fourth hit on image search.

Well, it was actually an article (unrelated to AI) that used the image.

But then I went simpler -- googling "joker" gives you the image (from the IMDb page) as the second hit.

[–] orclev@lemmy.world 19 points 2 years ago (1 children)

WB and Disney would lose, at least without an amendment to copyright law. That in fact just happened in one court case: it was ruled that using a copyrighted work to train AI does not violate that work's copyright.

[–] asret@lemmy.zip 9 points 2 years ago (2 children)

Using it to train on is very different from distributing derived works.

[–] Even_Adder@lemmy.dbzer0.com 9 points 2 years ago* (last edited 2 years ago)

The way it was done, if I remember correctly, is that someone found out v6 was trained partially on Stockbase image-caption pairs, so they went to Stockbase, found some images, and used those exact tags in the prompts.

[–] esc27@lemmy.world 16 points 2 years ago (3 children)

Voyager just loaded a copyrighted image on my phone. Guess someone's gonna have to sue them too.

[–] VicentAdultman@lemmy.world 16 points 2 years ago (3 children)

Yeah man, Voyager is making millions with the images on the app. It makes me so mad; the Voyager people make you think they are generating content on their own, but in reality it's just feeding you unlicensed content from others.

[–] otp@sh.itjust.works 9 points 2 years ago

I just remembered a copyrighted image. Oops.

Hey, I bet there were complaints about Google showing image results at some point too! Lol

[–] Rentlar@lemmy.ca 12 points 2 years ago (5 children)

When they asked for an Italian video game character it returned something with unmistakable resemblance to Mario with other Nintendo property like Luigi, Toad etc. ... so you don't even have to ask for a "screencapture" directly for it to use things that are clearly based on copyrighted characters.

[–] sir_reginald@lemmy.world 13 points 2 years ago* (last edited 2 years ago) (4 children)

you're still asking for a character from a video game, which implies copyrighted material. write the same thing in google and take a look at the images. you get what you ask for.

you can't, obviously, use any image of Mario for anything outside fair use, no matter if AI generated or you got it from the internet.

[–] orclev@lemmy.world 68 points 2 years ago (6 children)

They literally asked it to give them a screenshot from the Joker movie. That was their fucking prompt. It's not like they just said "draw Joker" and it spit out a screenshot from the movie, they had to work really hard to get that exact image.

[–] dragontamer@lemmy.world 69 points 2 years ago* (last edited 2 years ago) (57 children)

Because this proves that the "AI", at some level, is storing the data of the Joker movie screenshot somewhere inside of its training set.

Likely because the "AI" was trained upon this image at some point. This has repercussions with regards to copyright law. It means the training set contains copyrighted data and the use of said training set could be argued as piracy.

Legal discussions on how to treat generative AI are only happening now, now that people can experiment with the technology. But it's not like our laws have changed: copyright infringement is copyright infringement. If the training data is obviously copyright infringement, then the model must be retrained in a more appropriate manner.

[–] abhibeckert@lemmy.world 28 points 2 years ago* (last edited 2 years ago) (16 children)

But where is the infringement?

This NYT article includes the same several copyrighted images and they surely haven't paid any license. It's obviously fair use in both cases and NYT's claim that "it might not be fair use" is just ridiculous.

Worse, the NYT also includes exact copies of the images, while the AI ones are just very close to the original. That's like the difference between uploading a video of yourself playing a Taylor Swift cover and actually uploading one of Taylor Swift's own music videos to YouTube.

Even worse, the NYT intentionally distributed the copyrighted images, while Midjourney did so unintentionally and specifically states it's a breach of their terms of service. Your account might be banned if you're caught using these prompts.

[–] jacksilver@lemmy.world 28 points 2 years ago (2 children)

You do realize that newspapers do typically pay the licensing for images, it's how things like Getty images exist.

On the flip side, OpenAI (and other companies) are charging people for access to their model, which then returns copyrighted images without paying the original creator.

That's why situations like this keep getting talked about: you have a third party charging people for copyrighted materials. We can argue that it's a tool, so you aren't really "selling" copyrighted data, but that's the issue generally being discussed in these kinds of articles and court cases.

[–] GenderNeutralBro@lemmy.sdf.org 14 points 2 years ago (3 children)

Because this proves that the “AI”, at some level, is storing the data of the Joker movie

I don't think that's a justified conclusion.

If I watched a movie, and you asked me to reproduce a simple scene from it, then I could do that if I remembered the character design, angle, framing, etc. None of this would require storing the image, only remembering the visual meaning of it and how to represent that with the tools at my disposal.

If I reproduced it that closely (or even not-nearly-that-closely), then yes, my work would be considered a copyright violation. I would not be able to publish and profit off of it. But that's on me, not on whoever made the tools I used. The violation is in the result, not the tools.

The problem with these claims is that they are shifting the responsibility for copyright violation off of the people creating the art, and onto the people making the tools used to create the art. I could make the same image in Photoshop; are they going after Adobe, too? Of course not. You can make copyright-violating work in any medium, with any tools. Midjourney is a tool with enough flexibility to create almost any image you can imagine, just like Photoshop.

Does it really matter if it takes a few minutes instead of hours?

[–] archomrade@midwest.social 11 points 2 years ago (13 children)

I've had this discussion before, but that's not how copyright exceptions work.

Right or wrong (it hasn't been litigated yet), AI models are being claimed as fair use exceptions to the use of copyrighted material. Similar to other fair uses, the argument goes something like:

"The AI model is simply a digital representation of facts gleaned from the analysis of copyrighted works, and since factual data cannot be copyrighted (e.g. a description of the Mona Lisa vs the painting itself), the model itself is fair use"

I think it'll boil down to whether the models can be easily used as replacements for the works being claimed, and honestly I think that'll fail. That the models are quite good at reconstructing common expressions of copyrighted work is novel to the case law, though, and worthy of investigation.

But as someone who thinks ownership of expressions is bullshit anyway, I tend to think copyright is not the right way to go about penalizing or preventing the harm caused by the technology.

[–] KinNectar@kbin.run 64 points 2 years ago (19 children)

Copyright issues aside, can we talk about how this implies accurate recall of an image from a never before achievable data compression ratio? If these models can actually recall the images they have been fed this could be a quantum leap in compression technology.

[–] Mirodir@discuss.tchncs.de 33 points 2 years ago* (last edited 2 years ago)

It's not as accurate as you'd like it to be. Some issues are:

It's quite lossy.
It'll do better on images containing common objects vs rare or even novel objects.
You won't know how much the result deviates from the original if all you're given is the prompt/conditioning vector and what model to use it on.
You cannot easily "compress" new images; instead you would have to either finetune the model (at which point you'd also mess with everyone else's decompression) or run an adversarial attack on the model with another model to find the prompt/conditioning vector most likely to create something as close as possible to the original image you have.
It's rather slow.
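The "search for the conditioning vector" step can be sketched in a toy setting. This is a minimal sketch assuming a frozen *linear* decoder (a real diffusion model is nonlinear and vastly larger), and it swaps the adversarial second model for plain gradient descent on the code vector; `decode`, `invert`, and the matrix `D` are hypothetical names for illustration only.

```python
def decode(D, z):
    """Hypothetical frozen 'decoder': a fixed linear map from a short code z to an 'image' vector."""
    return [sum(D[i][j] * z[j] for j in range(len(z))) for i in range(len(D))]


def invert(D, target, steps=2000, lr=0.05):
    """Search for the code z whose decoding best matches `target`, by gradient descent on
    0.5 * ||D z - target||^2. Only z is updated; the decoder D is never changed."""
    k = len(D[0])
    z = [0.0] * k
    for _ in range(steps):
        residual = [d - t for d, t in zip(decode(D, z), target)]
        # Gradient of 0.5 * ||D z - x||^2 with respect to z is D^T (D z - x)
        grad = [sum(D[i][j] * residual[i] for i in range(len(D))) for j in range(k)]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


D = [[1, 0], [0, 1], [1, 1], [1, -1]]
# An "image" the decoder can produce exactly is recovered almost perfectly...
print(invert(D, decode(D, [0.3, -0.7])))
# ...but one outside its range only comes back approximately (lossy).
z = invert(D, [1, 2, 3, 4])
print(z, squared_error(decode(D, z), [1, 2, 3, 4]))
```

The second case illustrates the lossiness point: the best reachable reconstruction of `[1, 2, 3, 4]` still differs from it, and nothing in the code vector alone tells you by how much.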

Also it's not all that novel. People have been doing this with (variational) autoencoders (another class of generative model). This also doesn't have the flaw that you have no easy way to compress new images since an autoencoder is a trained encoder/decoder pair. It's also quite a bit faster than diffusion models when it comes to decoding, but often with a greater decrease in quality.
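A trained encoder/decoder pair can be illustrated with the simplest possible case. This sketch uses PCA (via `numpy.linalg.svd`) as a stand-in for an autoencoder: it is the optimal *linear* encoder/decoder under squared error, whereas real image autoencoders are deep nonlinear networks. The synthetic data and the function names are assumptions for illustration.

```python
import numpy as np


def fit_linear_autoencoder(X, k):
    """Fit an encoder/decoder pair with a k-dimensional code (PCA as a linear autoencoder)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W = Vt[:k]  # shape (k, d)

    def encode(x):
        return (x - mean) @ W.T  # d numbers -> k numbers (the "compressed" code)

    def decode(z):
        return z @ W + mean  # k numbers -> d numbers (lossy reconstruction)

    return encode, decode


def reconstruction_error(X, k):
    encode, decode = fit_linear_autoencoder(X, k)
    return float(np.mean((decode(encode(X)) - X) ** 2))


rng = np.random.default_rng(0)
# Synthetic "images": 200 samples of 16 pixels lying mostly on a 3-D subspace, plus noise
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 16)) + 0.05 * rng.normal(size=(200, 16))
for k in (1, 3, 8, 16):
    print(k, reconstruction_error(X, k))
```

Because the encoder is explicit, compressing a new vector is a single `encode` call with no search, and the error shrinks as the code gets larger, reaching zero only when the code is as large as the input.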

Most widespread diffusion models even use an autoencoder-adjacent architecture to "compress" the input. The actual diffusion model then works in that "compressed data space", called latent space. The generated images are then decompressed before being shown to users. Last time I checked, iirc, that compression rate was around 1/4 to 1/8, but it's been a while, so don't quote me on this number.
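A per-side downsampling factor is not the same as the overall reduction in data, which is easy to check with a little arithmetic. The settings below (512×512 RGB input, 8× spatial downsampling, 4 latent channels) are assumed to resemble a Stable Diffusion v1-style autoencoder; the helper names are hypothetical, and the ratio counts values, not bytes.

```python
def latent_shape(height, width, downsample, latent_channels):
    """Shape of the latent tensor for an autoencoder that shrinks each spatial side by `downsample`."""
    if height % downsample or width % downsample:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (height // downsample, width // downsample, latent_channels)


def element_ratio(height, width, channels, downsample, latent_channels):
    """How many pixel values correspond to each latent value (ignores bit depth)."""
    h, w, c = latent_shape(height, width, downsample, latent_channels)
    return (height * width * channels) / (h * w * c)


print(latent_shape(512, 512, 8, 4))      # (64, 64, 4)
print(element_ratio(512, 512, 3, 8, 4))  # 48.0
```

Under these assumptions, "1/8" per side becomes about 1/48 in value count. If the latents were kept as 32-bit floats and the pixels as 8-bit integers, the byte ratio would instead be 48 / 4 = 12, so the number depends heavily on what is being counted.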

edit: fixed some ambiguous wordings.

[–] TORFdot0@lemmy.world 16 points 2 years ago (1 children)

You can hardly consider it compression when you need a compute-expensive model with hundreds of gigabytes (if not more) to accurately rehydrate it.

[–] freeman@sh.itjust.works 14 points 2 years ago

If you ignore the fact that the generated images are not accurate, maybe.

They are very similar, so they are infringing, but nobody would use this method for compression over an image codec.

[–] linearchaos@lemmy.world 9 points 2 years ago (2 children)

I was thinking about this back when they first started talking about news articles coming back word for word.

There's no way for us to tell how much of the original data, even in a lossy fashion, can be directly recovered. If this were as common as these articles would lead you to believe, you'd be able to pull anything you wanted out on demand.

But here we have every news agency vying to make headlines about copyright infringement, and we're seeing an article here and there with a close or relatively close result.

There are millions and millions of people using this technology and most of us aren't running across blatant full screen reproductions of stuff.

You can tell from some of the artifacts that they've trained on some watermarked images, because the watermarks kind of show up, but for the most part you wouldn't know who made the watermark if the watermarking companies didn't use rather distinctive patterns.

The image that we're seeing on this news site of the joker is quite exceptional, even from a lossy standpoint, but honestly it's just feeding the confirmation bias.

[–] gmtom@lemmy.world 39 points 2 years ago (5 children)

God I fucking hate this braindead AI boogeyman nonsense.

Yeah, no shit: you ask the AI to create a picture of a specific actor from a specific movie, it's going to look like a still from that movie.

Or if you ask it to create "an animated sponge wearing pants" it's going to give you SpongeBob.

You should think of these AIs as if you were asking an artist friend of yours to draw a picture for you. So if you say "draw an Italian video game character" then obviously they're going to draw Mario.

And also I want to point out they interview some professor of English for some reason, but they never interview, say, a professor of computer science and AI, because they don't want people that actually know what they're talking about giving logical answers, they want random bloggers making dumb tests and """exposing""" AI and how it steals everything!!!!!1!!! Because that's what gets clicks.

[–] ytorf@lemmy.world 14 points 2 years ago

They interviewed her because she wrote about generative ai experiments she conducted with Gary Marcus, an AI researcher who they quote earlier in the piece, specifically about AI’s regurgitation issue. They link to it in the article.

[–] Klear@sh.itjust.works 12 points 2 years ago (1 children)

All of this and also fuck copyright.

Why does everyone suddenly care about copyright so much? I feel like I'm taking crazy pills.

[–] LarmyOfLone@lemm.ee 12 points 2 years ago* (last edited 2 years ago)

We asked this artist to draw the Joker. The artist generated a copyrighted image. We ask the court to immediately confiscate his brain.

[–] Jilanico@lemmy.world 34 points 2 years ago (21 children)

I already know I'm going to be downvoted all to hell, but just putting it out there that neural networks aren't just copy pasting. If a talented artist replicates a picture of the joker almost perfectly, they are applauded. If an AI does it, that's bad? Why are humans allowed to be "inspired" by copyrighted material, but AIs aren't?

[–] antihumanitarian@lemmy.world 33 points 2 years ago* (last edited 2 years ago) (6 children)

This is a classic problem for machine learning systems, sometimes called overfitting or memorization. By analogy, it's the difference between knowing how to do multiplication vs just memorizing the times tables. With enough training data and large enough storage, AI can feign higher "intelligence", and that is demonstrably what's going on here. It's a spectrum as well. In theory, nearly identical recall is undesirable, and there are known ways of shifting away from that end of the spectrum. Literal AI 101 content.
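The times-tables analogy can be made concrete with a toy comparison. In this sketch (all names hypothetical), one "model" stores every training pair verbatim while the other stores a single fitted weight; note that the rule learner is handed the right form `a * b` up front, which real models are not, so it only contrasts storing answers with storing a rule.

```python
# Toy data: the 1-9 times table as {(a, b): product}
TRAIN = {(a, b): a * b for a in range(1, 10) for b in range(1, 10)}


class Memorizer:
    """Stores every training pair verbatim: perfect recall, no generalization."""

    def __init__(self, data):
        self.table = dict(data)  # one stored entry per training example

    def predict(self, a, b):
        return self.table.get((a, b))  # None for any input it never saw


class RuleLearner:
    """Fits one weight w in the assumed model product = w * a * b by least squares."""

    def __init__(self, data):
        num = sum(p * a * b for (a, b), p in data.items())
        den = sum((a * b) ** 2 for (a, b) in data)
        self.w = num / den  # a single stored number

    def predict(self, a, b):
        return self.w * a * b


memo, rule = Memorizer(TRAIN), RuleLearner(TRAIN)
print(memo.predict(7, 8), rule.predict(7, 8))      # 56 56.0
print(memo.predict(12, 13), rule.predict(12, 13))  # None 156.0
```

Both are perfect on the training set, which is why exact recall alone can look like competence; only the unseen input `(12, 13)` separates them, and the memorizer needed 81 stored entries against the rule's one.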

Edit: I don't mean to say that machine learning as a technique has problems, I mean that implementations of machine learning can run into these problems. And no, I wouldn't describe these as being intelligent any more than a chess algorithm is intelligent. They just have a much more broad problem space and the natural language processing leads us to anthropomorphize it.

[–] taranasus@lemmy.world 23 points 2 years ago (1 children)

I took a gun, pointed it at another person, pulled the trigger and it killed that person.

[–] owen@lemmy.ca 15 points 2 years ago (3 children)

I opened the egg carton and found eggs in there.

[–] Harbinger01173430@lemmy.world 19 points 2 years ago (7 children)

I suppose it's time to copyleft all the things on the internet

[–] ombremad@lemmy.blahaj.zone 17 points 2 years ago (5 children)

I don't know why everybody pretends we need to come up with a bunch of new laws to protect artists and copyright against "AI". The problem isn't AI. The problem is data scraping.

An example: Apple's iOS allows you to record your own voice in order to build a full speech-synthesis voice that you can use within the system. It's currently touted as an accessibility feature (if you have a disability that prevents you from speaking out loud all of the time, your phone can speak on your behalf, with your own custom voice). In this case, you provide the data, and the AI processes it on-device overnight. Simple. We could also think about an artist making a database of their own works in order to come up with new ideas with quick prompts, in their own style.

However, right now, a lot of companies are building huge databases by scraping data from everywhere without consent from the artists, who most of the time don't even know their work was scraped. And they even dare to advertise that publicly, pretend they have a right to do it, and sell those services. That's theft of intellectual property; always has been, always will be. You don't need new laws to get it right. You might need better courts to enforce it, depending on which country you live in.

There's legal use of AI, and unlawful use of AI. If you use what belongs to you and use the computer as a generative tool to make more things out of it: AI good. If you take from others what doesn't belong to you in order to generate stuff based on it: AI bad. Thanks for listening to my TED talk.

[–] shartworx@sh.itjust.works 15 points 2 years ago

No it didn't.

[–] BreakDecks@lemmy.ml 14 points 2 years ago (1 children)

The fundamental philosophical question we need to answer here is whether Generative Art simply has the ability to infringe intellectual property, or if that ability makes Generative Art an infringement in and of itself.

I am personally in the former camp. AI models are just tools that have to be used correctly. There's also no reason that you shouldn't be allowed to generate existing IP with those models insofar as it isn't done for commercial purposes, just as anyone with a drawing tablet and Adobe can draw unlicensed fan art of whatever they want.

I don't really care if AI can draw a convincing Iron Man. Wake me when someone uses AI in a way that actually threatens Disney. It's still the responsibility of any publisher or commercial entity not to brazenly use another company's IP without permission; that the infringement was done with AI feels immaterial.

Also, the "memorization" issue seems like it would only be an issue for corporate IP that has the highest risk of overrepresentation in an image dataset, not independent artists who would actually see a real threat from an AI lifting their IP.
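Overrepresentation means many near-identical copies of the same picture in the dataset, and one mitigation researchers have discussed is deduplicating the training set first. This is a minimal sketch using the average hash (aHash), a simple perceptual hash; it assumes grayscale images given as lists of rows with sides divisible by 8, and the 5-bit threshold and function names are arbitrary choices for illustration, not how any particular company filters its data.

```python
def average_hash(pixels, size=8):
    """64-bit average hash of a grayscale image (list of equal-length rows of ints).
    Assumes height and width are multiples of `size` (a simplification)."""
    bh, bw = len(pixels) // size, len(pixels[0]) // size
    # Downscale to size x size by averaging each bh x bw block
    cells = [
        sum(pixels[y][x] for y in range(r * bh, (r + 1) * bh) for x in range(c * bw, (c + 1) * bw))
        / (bh * bw)
        for r in range(size)
        for c in range(size)
    ]
    mean = sum(cells) / len(cells)
    return [1 if v > mean else 0 for v in cells]  # 1 bit per cell: brighter than average?


def hamming(h1, h2):
    return sum(a != b for a, b in zip(h1, h2))


def count_near_duplicates(images, max_distance=5):
    """Greedily group images whose hashes differ in at most `max_distance` bits; return group sizes."""
    groups = []  # list of [representative_hash, count]
    for img in images:
        h = average_hash(img)
        for g in groups:
            if hamming(g[0], h) <= max_distance:
                g[1] += 1
                break
        else:
            groups.append([h, 1])
    return [count for _, count in groups]


grad = [[x * 16 for x in range(16)] for _ in range(16)]        # horizontal gradient
brighter = [[v + 10 for v in row] for row in grad]             # same "still", brightened
flipped = [[(15 - x) * 16 for x in range(16)] for _ in range(16)]  # a different image
print(count_near_duplicates([grad, brighter, flipped]))        # [2, 1]
```

A brightened re-upload hashes identically to the original, so a pipeline like this would see the same still posted many times as one cluster and could keep a single copy.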

[–] afraid_of_zombies@lemmy.world 14 points 2 years ago (9 children)

Get rid of copyright law. It only benefits the biggest content owners and deprives the rest of us of our own culture.

It says so much that the person who created an image can be barred from making it.

[–] RememberTheApollo@lemmy.world 14 points 2 years ago (4 children)

For fun I asked an AI to create a Joker “in the style of Batman movies and comics”.

The Heath Ledger Joker is so prominent that a variation on that movie’s version is what I got back. It’s so close that without comparing a side-by-side to a real image it’s hard to know what the differences are.

[–] Thorny_Insight@lemm.ee 13 points 2 years ago* (last edited 2 years ago) (8 children)

Asks AI to generate copyrighted image; AI generates a copyrighted image.

Pikachu.jpg

[–] 8000mark@discuss.tchncs.de 13 points 2 years ago (8 children)

I think AI in this case is doing exactly what it's best at: Automating unbelievably boring chores on the basis of past "experiences". In this case the boring chore was "Draw me [insert character name] just how I know him/her".

Too many people mistakenly assume generative AI is originative or imaginative. It's not. It certainly can seem that way because it can transform human ideas and words into a picture that has ideally never before existed and that notion is very powerful. But we have to accept that, until now, human creativity is unique to us, the humans. As far as I can tell, the authors were not trying to prove generative AI is unimaginative, they were showing just how blatant copyright infringement in the context of generative AI is happening. No more, no less.

[+] Facelesscog@lemmy.world 10 points 2 years ago (4 children)

[deleted]

[–] LibertyLizard@slrpnk.net 10 points 2 years ago (20 children)

Copyright is a scam anyway so who cares?
