this post was submitted on 19 Apr 2025
190 points (94.8% liked)

Technology


I asked OpenAI the same question, only without the snark, and noted that US politicians and OpenAI press releases should not be considered evidence without facts:

Great question—and a healthy dose of skepticism.

If we assume U.S. lawmakers may be misrepresenting the situation or overstating the threat (intentionally or otherwise), then it’s worth asking: What counter-evidence or independent information exists that disputes the claims against DeepSeek?

Here’s what we can currently analyze:

1. OpenAI's Role and Statements

  • The accusation is that DeepSeek used OpenAI data without authorization.
  • OpenAI has not released detailed technical evidence (as of now) proving DeepSeek “stole” data.
  • It is also unclear what “unauthorized use” means here—was it scraping publicly available content generated by ChatGPT? That’s not the same as hacking or violating terms.

Counterpoint: Unless OpenAI provides forensic evidence (e.g., logs showing data scraping from banned accounts or circumvention of APIs), their claim is not independently verifiable.

2. Open-Source AI Training Practices

  • DeepSeek R1 and other models are trained on publicly available data, much like LLaMA, Falcon, Mistral, etc.
  • Many open-source LLMs train on similar corpora, including data from books, code repositories, and even internet crawls, which may contain ChatGPT outputs.
  • No proof has been provided that DeepSeek used proprietary datasets beyond what’s publicly available or scraped from the internet.

Counterpoint: If DeepSeek simply trained on web data that happened to contain ChatGPT responses (which are abundant online), that wouldn’t constitute theft by typical AI research standards.

3. DeepSeek's Own Transparency

  • DeepSeek has published detailed model cards and training parameters for R1, a transparency move rare among Chinese AI firms.
  • This suggests a desire to appear cooperative or aligned with global AI openness norms—not the behavior one would expect of a covert "spy tool."

Counterpoint: Spyware tools or state-controlled AI systems don’t usually go open source and document their architectures.

4. No Independent Audit or Third-Party Verification

  • As of now, there’s no independent audit of DeepSeek’s model, training corpus, or infrastructure.
  • The accusations rest heavily on political framing, not technical forensics released to the public.

5. Broader Pattern of U.S. Tech Nationalism

  • U.S. lawmakers have a history of labeling Chinese tech firms as threats without providing concrete, public-facing evidence (Huawei, TikTok, etc.).
  • This doesn’t mean the threats are false, but it does mean we should demand proof beyond committee soundbites.

TL;DR – Is there counter-evidence?

Not conclusive counter-evidence, but there’s a lack of compelling public proof supporting the claim. The case against DeepSeek is built on:

  • OpenAI’s statements (without shared evidence),
  • Political concerns about China,
  • and suspicions rooted in strategic competition.

That’s not enough to say DeepSeek is innocent—but it’s not enough to prove guilt either.