this post was submitted on 14 Jan 2024
Copyright is absolute. The rightsholder has complete and total right to dictate how it is copied. Thus, any unauthorised copying is copyright infringement. However, fair use gives exemption to certain types of copying. The copyright is still being infringed, because the rightsholder's absolute rights are being circumvented, however the penalty is not awarded because of fair use.
This is all just pedantry, though, and has no practical significance. Saying "fair use means copyright has not been infringed" doesn't change anything.
That's a database. Or perhaps rather some kind of 3D array - which could just be considered an advanced form of database. But yeah, you're right here, you win this pedantry round lol. 1-1.
Yeah I don't want to go down the avenue of suing the AI itself for infringement. However...[^1][^2][^3]
You're not coming off as rude at all with what you've said, in fact I welcome and appreciate your rebuttals.
You say that as if I haven't enjoyed fleshing out the ideas and sharing them. By the way, right now I'm sharing with you Lemmy's hidden citation feature :o)
Although, I was much happier replying to you before I saw the downvotes you've apparently given me across the board. That's a bit poor behaviour on your part. You shouldn't downvote just because you disagree, and you can't even justify it by saying I'm wrong when whether it is right or wrong is still being heavily debated and adjudicated.
I thought we were engaging in a positive manner, but apparently you've been spitting in my face.
[^1]: >but there's a good reason why this isn't the argument being put forward in the lawsuits.
[^2]: >the LLM could be considered a commissioned agent
[^3]: The LLM absolutely could be considered an agent, but the way it acts is merely prompted by the user. The actual behaviour is dictated by the organisation that built it. In any case, this is only my backup argument if you even consider the initial copying to be research - which it isn't.
Really and truly, this is not how this works. The exemptions granted by the office of the registrar exempt fair uses from copyright claims. They aren't about whether the claim can be awarded damages; they're about the claim being exempt in its entirety. You can think of copyright as an exemption to the First Amendment right to free speech, and the exemptions to copyright as describing where that 'right' does not apply. Copyright holders do not get to control the use of their work where fair use has been determined by the registrar, which is reconsidered every 3 years.
True enough, but it seems to be important for your understanding of how copyright works.
I wasn't being pedantic, that distinction is important for how copyright is conceptualized. The AI model is the thing being considered for infringement, so it's important to note that the works being claimed within it do not exist as such within the model. The '3-d array' does not contain copyrighted works. You can think of it as a large metadata file, describing how to construct language as analyzed through the training data. The nature and purpose of the 'work' is night-and-day different from the works being claimed, and 'database' is a clear misrepresentation (possibly even intentionally so) of what it is.
That was exactly what you pivoted to in your comment here, so I'm not sure why you're now saying you don't want to go down that avenue. I'm confused about what you're arguing at this point.
I've downvoted your comments because they contain inaccuracies and could be misleading to others. You shouldn't take my grading of your comments as a reflection of my attitude towards you; I'm sure you're a fine individual. Downvotes don't mean anything on Lemmy anyway, and I'm not sure 'spitting in your face' is a fair or accurate description, but I don't want to invalidate your feelings, so I apologize for making you feel that way, as that wasn't my intent.
No worries, you've been very respectful. My feelings weren't particularly hurt; I just felt the need to call it out.
Personally, I'm against downvoting things merely because they are wrong. If someone says something that's wrong, it may well be a commonly held misconception, and downvoting it also demotes any correction that has been given, which means other people who hold the misconception are less likely to be corrected.
And that's beside the fact that I don't really think I'm completely wrong here :o)
To be a little more specific, I don't want to go down the route of blaming AI itself for copyright infringement, that is, arguing over whether or not AI is bound by laws the way humans are. I think it is only worth considering whether the AI developer and/or the users are infringing copyright through their creation or use of AI. In particular, I think the legal or philosophical question of whether AI is affected by laws in the same way humans are is pointless when we're just talking about LLMs and not a true Artificial Intelligence.
Yes, absolutely, the LLM itself does not include copyrighted works. That's not what I'm arguing. The two issues I take are, first, with the database of information the LLM is trained on. This database does contain copyrighted works; AI developers admit that it does, but they claim it is fair use research. I disagree with this claim. To use the terminology from one of your links, their "research" is not "scholarly"; it is commercial product development.
The other issue is that the LLM can reproduce copyrighted work. While I agree with you in some sense that the user of the LLM is instructing it to infringe copyright, and thus the user is responsible, in another sense I think the developer is also responsible because they have given the tool the capability to do this. This is perhaps not a strong argument, particularly when the developers have made efforts to fix these "bugs" as they come to light.
However my most important point is that the developers have infringed copyright by building a training database full of copyrighted works, which the LLM was then trained on. The LLM itself isn't copyright infringement, but they infringed copyright to develop it.