this post was submitted on 01 Aug 2025

Technology



Note that this setup runs a 671B model in Q4 quantization at 3-4 tokens per second (TPS); running Q8 would need something beefier. To run a 671B model in the original Q8 at 6-8 TPS, you'd need a dual-socket EPYC server motherboard with 768GB of RAM.
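
For anyone who wants to sanity-check those RAM figures, here is a rough back-of-the-envelope sketch. It assumes each weight takes exactly 4 or 8 bits and counts only the weights, so KV cache, context, and runtime overhead come on top and real usage will be higher. The function name is made up for illustration.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every weight uses exactly `bits_per_weight` bits. Real
    quantization formats add some per-block metadata, so treat the result
    as a lower bound.
    """
    # billions of params * bits per weight / 8 bits per byte = billions of bytes
    return params_billions * bits_per_weight / 8


# Compare the two quantization levels mentioned above for a 671B model.
for label, bits in [("Q4", 4), ("Q8", 8)]:
    print(f"{label}: ~{weight_memory_gb(671, bits):.1f} GB for weights alone")
```

Under these assumptions Q4 weights come to roughly half the size of Q8, and the Q8 weights alone take up most of a 768GB machine.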

CriticalResist8@lemmygrad.ml 3 points 1 month ago (1 children)

Ooh, now I'm stressed: I'm gonna have to download and try 20 different models to find the one I like best, haha. Do you know some that are good for coding tasks? I also do design stuff (the AI helps walk me through the design thinking process). It's kind of a reasoning/logical task, but it's also highly specialized.

KrasnaiaZvezda@lemmygrad.ml 3 points 1 month ago

There's a new Qwen3-Coder 30B-A3B that looks good, and people were talking about GLM4.5 32B, but I haven't used local models for code much, so I can't give a good answer.