this post was submitted on 17 Sep 2023
105 points (96.5% liked)
Futurology
@BaylorSwift3 This isn't about its output; it's about its input.
LLM companies are basically capturing and copying these entire books to use as training materials. In any other context, that kind of use would have to be paid for, which would financially benefit authors and publishers.
For example, in my experience, if you want students to read a few chapters of a text (up to three), that's "fair use", but if you want them to read the whole thing, then either they buy the textbook or your institution buys a digital license.
The point of the lawsuit is that having an AI does not legally allow companies to engage in what looks like pirating of training materials. It will be interesting to see the verdict.
Tagging @ogeist
In your example, since only one LLM is being trained, like one student, does OpenAI only need to buy one digital copy of the book at retail?
Even so, if I want to copy Agatha Christie's style, I can borrow a book from the library, train myself with it, then return it at no cost to myself. A hundred other people can do the same with that book, and the only cost burden was the library's initial purchase. Does copyright protect the right to make copies, or does it dictate something else?
Among other things, yes. I think this is what this particular lawsuit is about.
It will be really interesting to see whether they define an LLM as singular or plural/corporate. Those files (with hundreds of texts) seem to have been doing the rounds, so it doesn't look like a single use to me. But I can also see the merit in your one AI = one student argument.
Re: your Agatha Christie example, I'm not sure how that works in the US, but in my country (New Zealand), if a book is in a library, the author or publisher gets a yearly compensation payment based on how many copies are held in how many libraries.
E-book licensing similarly has different costs based on how many "copies" or simultaneous sessions a library is authorised to have.