this post was submitted on 22 Aug 2023
3 points (100.0% liked)

Technology


OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling's Harry Potter series

A new research paper laid out ways in which AI developers should try to avoid showing that LLMs have been trained on copyrighted material.

top 11 comments
[–] Blapoo@lemmy.ml 1 points 1 year ago (1 children)

We have to distinguish between LLMs that are:

  • Trained on copyrighted material, and
  • Outputting copyrighted material

They are not one and the same.

[–] Even_Adder@lemmy.dbzer0.com 0 points 1 year ago (2 children)

Yeah, this headline is trying to make it seem like training on copyrighted material is or should be wrong.

[–] scv@discuss.online 1 points 1 year ago

Legally, the output of the training could be considered a derivative work. We treat brains differently here, that's all.

I think the current intellectual property system makes no sense and AI is revealing that fact.

[–] TropicalDingdong@lemmy.world -1 points 1 year ago

I think this brings up broader questions about the currently quite extreme interpretation of copyright. Personally, I don't think it's wrong to sample from or create derivative works from something that is accessible. If it's not behind lock and key, it's free to use. If you have a problem with that, then put it behind lock and key. No one is forcing you to share your art with the world.

[–] Technoguyfication@lemmy.ml 0 points 1 year ago (1 children)

People are acting like ChatGPT is storing the entire Harry Potter series in its neural net somewhere. It’s not storing or reproducing text in a 1:1 manner from the original material. Certain material, like very popular books, has likely been interpreted tens of thousands of times due to how many times it was reposted online (and therefore how many times it appeared in the training data).

Just because it can recite certain passages almost perfectly doesn’t mean it’s redistributing copyrighted books. How many quotes do you know perfectly from books you’ve read before? I would guess quite a few. LLMs are doing the same thing, but on mega steroids with a nearly limitless capacity for information retention.

[–] Teritz@feddit.de -1 points 1 year ago

Taking art as an example, using copyrighted work still influences the AI, which they make a profit from.

If they use my works, then they need to pay. That's it.

[–] TropicalDingdong@lemmy.world -1 points 1 year ago (2 children)

It's a bit pedantic, but I'm not really sure I support this kind of extremist view of copyright and the scale of what's being interpreted as 'possessed' under the idea of copyright. Once an idea is communicated, it becomes a part of the collective consciousness. Different people interpret and build upon that idea in various ways, making it a dynamic entity that evolves beyond the original creator's intention. It's like the issues with sampling beats or records in the early days of hip-hop. The very principle of an idea goes against this vision: once you put something out into the commons, it's irretrievable. It's not really yours anymore once it's been communicated. I think if you want to keep an idea truly yours, then you should keep it to yourself. Otherwise you are participating in a shared vision of the idea. You don't control how the idea is interpreted, so it's not really yours anymore.

Whether that's ChatGPT or Public Enemy is neither here nor there to me. The idea that a work like Peter Pan is still possessed is a very real, very silly, and obvious malady of this weirdly accepted but very extreme view of the ability to possess an idea.

[–] treefrog@lemm.ee 0 points 1 year ago (1 children)

If you sample someone else's music and turn around and try to sell it, without first asking permission from the original artist, that's copyright infringement.

So, if the same rules apply, as your post suggests, OpenAI is also infringing on copyright.

[–] TropicalDingdong@lemmy.world -1 points 1 year ago

If you sample someone else’s music and turn around and try to sell it, without first asking permission from the original artist, that’s copyright infringement.

I think you completely and thoroughly do not understand what I'm saying or why I'm saying it. Nowhere did I suggest that I do not understand modern copyright. I'm saying that I'm questioning my belief in this extreme interpretation of copyright, which is represented by exactly what you just parroted. This interpretation is not only functionally and materially unworkable but also antithetical to a reasonable understanding of how ideas and communication work.

[–] Bogasse@lemmy.world 0 points 1 year ago (1 children)

Well, I'd consider agreeing if the LLMs were considered as a generic knowledge database. However, I had the impression that the whole response from OpenAI & co. to this copyright issue is "they build original content", both for LLMs and stable diffusion models. Now that they have started this line of defence, I think that they are stuck with proving that their "original content" is not derived from copyrighted content 🤷

[–] TropicalDingdong@lemmy.world -1 points 1 year ago

Well, I’d consider agreeing if the LLMs were considered as a generic knowledge database. However, I had the impression that the whole response from OpenAI & co. to this copyright issue is “they build original content”, both for LLMs and stable diffusion models. Now that they have started this line of defence, I think that they are stuck with proving that their “original content” is not derived from copyrighted content 🤷

Yeah I suppose that's on them.