
A note that this setup runs the 671B model in Q4 quantization at 3-4 TPS; running it at Q8 would need something beefier. To run a 671B model at the original Q8 precision at 6-8 TPS you'd need a dual-socket EPYC server motherboard with 768GB of RAM.
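
For a rough sense of where those numbers come from, here's a back-of-the-envelope estimate of the weight memory at different quant levels. It's a minimal sketch: the bits-per-weight figures are approximate averages for llama.cpp-style GGUF formats (my assumption), and real usage adds KV cache and runtime overhead on top of the weights.

```python
# Back-of-the-envelope weight-memory estimate for GGUF-style quants.
# Bits-per-weight values are rough averages (assumption), not exact.

QUANT_BITS = {
    "Q4_K_M": 4.8,  # ~4.8 bits/weight on average for llama.cpp Q4_K_M
    "Q8_0": 8.5,    # ~8.5 bits/weight including block scales
}

def weight_gib(params_billions: float, quant: str) -> float:
    """Approximate size of the quantized weights alone, in GiB."""
    total_bits = params_billions * 1e9 * QUANT_BITS[quant]
    return total_bits / 8 / 2**30

for name, params in [("DeepSeek 671B", 671), ("Qwen3 30B-A3B", 30)]:
    for quant in QUANT_BITS:
        print(f"{name} @ {quant}: ~{weight_gib(params, quant):.0f} GiB")
```

At Q8 the 671B weights alone land around 660GiB, which is why the comment above points at a 768GB build; at Q4 they fit in roughly half that.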

KrasnaiaZvezda@lemmygrad.ml 3 points 1 month ago

It's not gonna be the full model like in this video, but it's apparently still capable enough for some tasks.

Those models are Qwen models finetuned by DeepSeek, so they're not really comparable to the original DeepSeek V3 and R1. And considering how much Qwen has been releasing lately, I'd say anyone thinking about running the distilled versions you talked about might as well try the default Qwen ones as well. Qwen3 30B-A3B is very decent for older machines: it's a MoE with only 3B active parameters, so it can be quite fast, and at Q4_K_M it can probably fit in some 20GB of RAM/VRAM (I can run Qwen3 30B-A3B-Instruct-UD-Q3_K_XL with 16GB of RAM and some offloaded to the SSD with swap, at 8+ tokens/second).
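
If you want to try that kind of split between VRAM, RAM, and swap, here's a minimal sketch using llama-cpp-python. The GGUF file path and the layer split are placeholders to adjust for your own hardware, not settings from my setup.

```python
# A minimal sketch of loading a local Qwen3 30B-A3B GGUF with
# llama-cpp-python and splitting layers between VRAM and RAM.
# Layers that don't fit in VRAM stay in RAM (and can spill to swap,
# as described above, at some cost in speed).
from llama_cpp import Llama

llm = Llama(
    model_path="./Qwen3-30B-A3B-Instruct-UD-Q3_K_XL.gguf",  # hypothetical local file
    n_ctx=4096,       # context window; bigger contexts cost more memory
    n_gpu_layers=20,  # how many layers to offload to the GPU; 0 = CPU only
)

out = llm("Briefly explain mixture-of-experts models.", max_tokens=200)
print(out["choices"][0]["text"])
```

Setting n_gpu_layers to 0 keeps everything in RAM, which is roughly the swap-assisted CPU setup I described.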

CriticalResist8@lemmygrad.ml 3 points 1 month ago

ooh now I'm stressed I'm gonna have to download and try 20 different models to find the one I like best haha. Do you know some that are good for coding tasks? I also do design work (the AI helps walk through the design thinking process with me); it's kind of a reasoning/logical task, but it's also highly specialized.

KrasnaiaZvezda@lemmygrad.ml 3 points 1 month ago

There's a new Qwen3 Coder 30B-A3B that looks good, and people were talking about GLM4.5 32B, but I haven't used local models for code much, so I can't give a good answer.