A note that this setup runs a 671B model in Q4 quantization at 3-4 TPS (tokens per second); running it in Q8 would need something beefier. To run a 671B model at the original Q8 at 6-8 TPS you'd need a dual-socket EPYC server motherboard with 768GB of RAM.
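
For rough sizing, here's a back-of-the-envelope sketch of the weight memory alone (the 671B figure is from the post; the bytes-per-parameter values are approximations for 4-bit and 8-bit quantization, and KV cache and runtime overhead are ignored):

```python
# Approximate weight-memory footprint for a 671B-parameter model at
# different quantization levels. Bytes per parameter are rough estimates:
# Q4 ~= 0.5 bytes, Q8 ~= 1 byte (real GGUF quants add a little for block
# scales). KV cache, activations, and runtime overhead are not counted.

PARAMS = 671e9

QUANT_BYTES_PER_PARAM = {
    "Q4": 0.5,  # 4-bit weights
    "Q8": 1.0,  # 8-bit weights
}

for quant, bpp in QUANT_BYTES_PER_PARAM.items():
    gib = PARAMS * bpp / 1024**3
    print(f"{quant}: ~{gib:.0f} GiB of weights")

# Q4: ~312 GiB of weights
# Q8: ~625 GiB of weights -> hence the 768GB dual-socket EPYC box for Q8
```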

[–] KrasnaiaZvezda@lemmygrad.ml 3 points 1 month ago

There is a new Qwen3 Coder 30B-A3B that looks good, and people were talking about GLM4.5 32B, but I haven't used local models for code much, so I can't give a good answer.
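
For anyone wanting to try one of these models locally, a minimal sketch of querying a model served by Ollama over its local REST API; the model tag `qwen3-coder:30b` is an assumption, so substitute whatever tag your local install actually has pulled:

```python
# Minimal sketch: send a coding prompt to a locally served model via Ollama's
# REST API (default endpoint http://localhost:11434). The model tag below is
# an assumption -- replace it with the tag your install exposes.
import json
import urllib.request

payload = {
    "model": "qwen3-coder:30b",  # hypothetical tag for Qwen3 Coder 30B-A3B
    "prompt": "Write a Python function that reverses a linked list.",
    "stream": False,             # return one complete response instead of a stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```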