this post was submitted on 03 Feb 2025

799 points (98.1% liked)

US Bill proposed to jail people who download Deepseek (www.404media.co)

submitted 8 months ago by JOMusic@lemmy.ml to c/technology@lemmy.world

[–] metaStatic@kbin.earth 85 points 8 months ago (2 children)

For Base Model

git lfs install
git clone https://huggingface.co/deepseek-ai/DeepSeek-V3-Base

For Chat Model

git lfs install
git clone https://huggingface.co/deepseek-ai/DeepSeek-V3

[–] theunknownmuncher@lemmy.world 55 points 8 months ago (1 children)

this is deepseek-v3. deepseek-r1 is the model that got all the media hype: https://huggingface.co/deepseek-ai/DeepSeek-R1

[–] fmstrat@lemmy.nowsci.com 4 points 8 months ago

Yeah, the comment OP needs to edit the links, given how many upvotes that got.

[–] neon_nova@lemmy.dbzer0.com 10 points 8 months ago (3 children)

Can you elaborate on the differences?

[–] cyd@lemmy.world 20 points 8 months ago* (last edited 8 months ago) (1 children)

Base models are general purpose language models, mainly useful for AI researchers and people who want to build on top of them.

Instruct or chat models are chatbots. They are made by fine-tuning base models.

The V3 models linked by OP are Deepseek's non-reasoning models, similar to Claude or GPT-4o. These are the "normal" chatbots that reply with whatever comes to their mind. Deepseek also has a reasoning model, R1. Such models take time to "think" before supplying their final answer; they tend to perform better on things like math problems, at the cost of taking longer to answer.

It should be mentioned that you probably won't be able to run these models yourself unless you have a data center style rig with 4-5 GPUs. The Deepseek V3 and R1 models are chonky beasts. There are smaller "distilled" forms of R1 that are possible to run locally, though.

[–] DogWater@lemmy.world 5 points 8 months ago (2 children)

I heard people saying they could run the r1 32B model on moderate gaming hardware albeit slowly

[–] FrederikNJS@lemm.ee 5 points 8 months ago (1 children)

32b is still distilled. The full one is 671b.
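
For a rough sense of the gap between those two sizes, weight memory is approximately parameter count times bits per weight, divided by 8. The sketch below assumes 4-bit quantization (an assumption for illustration, not something stated in this thread) and counts the weights only, ignoring KV cache and runtime overhead. `weights_gb` is a hypothetical helper name.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (assuming 1 GB = 10**9 bytes)."""
    # billions of parameters * bits each / 8 bits per byte = billions of bytes
    return params_billion * bits_per_weight / 8


# Parameter counts are the ones mentioned in this thread; 4 bits is an assumption.
for name, params in [("R1 distill 32B", 32), ("R1 full 671B", 671)]:
    print(f"{name}: ~{weights_gb(params, 4):.1f} GB at 4 bits per weight")
```

Under those assumptions the 32B distill needs about 16 GB for weights and the full model about 335 GB, which is roughly why only the former is plausible on gaming hardware.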

[–] DogWater@lemmy.world 2 points 8 months ago (1 children)

I know, but the fall off in performance isn't supposed to be severe

[–] FrederikNJS@lemm.ee 1 points 8 months ago (1 children)

You are correct. And yes that is kinda the whole point of the distilled models.

[–] DogWater@lemmy.world 1 points 8 months ago

I know. Lmao

[–] metaStatic@kbin.earth 2 points 8 months ago (1 children)

https://www.deepseekv3.com/en/download

I was assuming one was pre-trained and one wasn't but don't think that's correct and don't care enough to investigate further.

[–] JOMusic@lemmy.ml 17 points 8 months ago

Is that website legit? I've only ever seen https://www.deepseek.com/

And I would personally recommend downloading from HuggingFace or Ollama

[–] thefartographer@lemm.ee -2 points 8 months ago (1 children)

r1 is lightweight and optimized for local environments on a home PC. It's supposed to be pretty good at programming and logic and kinda awkward at conversation.

v3 is powerful and meant to run on cloud servers. It's supposed to make for some pretty convincing conversations.

[–] pennomi@lemmy.world 5 points 8 months ago (2 children)

R1 isn’t really runnable with a home rig. You might be able to run a distilled version of the model though!

[–] theunknownmuncher@lemmy.world 5 points 8 months ago (1 children)

Tell that to my home rig currently running the 671b model...

[–] pennomi@lemmy.world 6 points 8 months ago (1 children)

That likely is one of the distilled versions I’m talking about. R1 is 720 GB, and wouldn’t even fit into memory on a normal computer. Heck, even the 1.58-bit quant is 131GB, which is outside the range of a normal desktop PC.

But I’m sure you know what version you’re running better than I do, so I’m not going to bother guessing.
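
As a sanity check on those sizes, the arithmetic can be run backwards: file size times 8, divided by parameter count, gives the average bits stored per weight. This is only a sketch, assuming 1 GB = 10**9 bytes and that the whole file is weights; `effective_bits` is a hypothetical helper name.

```python
def effective_bits(size_gb: float, params_billion: float) -> float:
    """Average bits per parameter implied by a file size (assuming 1 GB = 10**9 bytes)."""
    # billions of bytes * 8 bits per byte / billions of parameters
    return size_gb * 8 / params_billion


PARAMS_BILLION = 671  # full R1 parameter count, as quoted in this thread

# Sizes are the ones quoted above: the full download and the 1.58-bit quant.
for label, size_gb in [("full R1", 720), ("1.58-bit quant", 131)]:
    bits = effective_bits(size_gb, PARAMS_BILLION)
    print(f"{label}: {size_gb} GB -> ~{bits:.2f} bits per parameter")
```

The 131 GB figure works out to about 1.56 bits per parameter, in line with the "1.58-bit" label, and 720 GB works out to about 8.6 bits per parameter.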

[–] theunknownmuncher@lemmy.world 4 points 8 months ago* (last edited 8 months ago) (1 children)

It's not. I can run the 2.51bit quant

[–] pennomi@lemmy.world 4 points 8 months ago

You must have a lot of memory, sounds like a lot of fun!

[–] thefartographer@lemm.ee 1 points 8 months ago

You're absolutely right, I wasn't trying to get that in-depth, which is why I said "lightweight and optimized," instead of "when using a distilled version" because that raises more questions than it answers. But I probably overgeneralized by making it a blanket statement like that.