There is a new Qwen3 Coder 30B-A3B that looks promising, and people were talking about GLM4.5 32B, but I haven't used local models for code much so I can't give a good answer.
KrasnaiaZvezda
It’s not gonna be the full model like in this video but it’s still advanced enough for some tasks apparently.
Those models are Qwen models finetuned by DeepSeek, so there's really no comparison to the original DeepSeek V3 and R1. And considering how much Qwen has been releasing lately, I'd say anyone thinking about running the distilled versions you talked about might as well try the regular Qwen ones too, with Qwen3 30B-A3B being very decent for older machines: as a MoE with only 3B active parameters it can be quite fast and can probably fit at Q4_K_M in some 20GB of RAM/VRAM (I can run Qwen3 30B-A3B-Instruct-UD-Q3_K_XL with 16GB RAM and some offloaded to SSD swap at 8+ tokens/second).
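As a rough sanity check on that ~20GB figure, here's a back-of-envelope sketch of quantized model size. The ~4.8 bits-per-weight average for Q4_K_M is an assumption (real GGUF files mix quantization types per layer, and you still need room for the KV cache on top):

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Estimate on-disk/in-memory size of a quantized model in GB.

    Assumes a uniform average bits-per-weight; real quantized files
    vary per layer, and runtime use adds KV cache and buffers.
    """
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Qwen3 30B-A3B at an assumed ~4.8 bits/weight average (Q4_K_M-ish)
print(f"{quantized_size_gb(30, 4.8):.1f} GB")  # 18.0 GB, before KV cache
```

Which is why it lands in the "some 20GB" range once you add the cache and context.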
Turn-A, Turn
Turn-A, Turn!
∀!
Error rates compound exponentially in multi-step workflows. 95% reliability per step = 36% success over 20 steps. Production needs 99.9%+.
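The compounding is just independent per-step success probabilities multiplied out, which you can verify in a couple of lines:

```python
def workflow_success(step_reliability: float, steps: int) -> float:
    """Probability an entire workflow succeeds, assuming each step
    fails independently with the same per-step reliability."""
    return step_reliability ** steps

print(f"{workflow_success(0.95, 20):.0%}")   # 36%
print(f"{workflow_success(0.999, 20):.0%}")  # 98%
```

So even 99.9% per step only gets a 20-step workflow to ~98% end-to-end.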
My DevOps agent works precisely because it's not actually a 20-step autonomous workflow. It's 3-5 discrete, independently verifiable operations with explicit rollback points and human confirmation gates.
That's nice to know. I was just thinking about how to remove errors from an "agenda maintaining agent" using local LLMs, and I was considering having everything it does pass through me for review; seeing numbers like these from better AIs than I have access to shows that avoiding mistakes will indeed be important for my use case.
Nice of them to make even a 0.3B model; just too bad it was the only one that wasn't a MoE. I've been wanting more small MoEs since Qwen3 30B-A3B.
First of all, DeepSeek-R1-0528-Qwen3-8B is not the DeepSeek model people refer to when talking about DeepSeek, so that's misleading to say the least. The actual DeepSeek model is the 671B-parameter model, which they briefly mention but which is not the main topic of the article as one would assume from the title. That model is really good, the best open source model and one of the best in general, and it is possible to run locally, but it requires some 200GB of RAM/VRAM at the smallest quantizations and 800GB+ if running at full quality.
As for the model the article is actually about, the one you mentioned, it is based on Qwen3-8B, which can run in as little as ~5GB of available RAM/VRAM quantized to Q4_K_M, i.e. it can run on most computers and even some phones.
As for the target audience: anyone wanting privacy in their LLM use, or simply not wanting to pay for API access for automation tasks or research. As this is a thinking model, though, it will take quite a few tokens to reach an answer, so it's better suited to people who have a GPU, or those who occasionally need something more powerful locally.
Since it only decays through beta-minus decay (a neutron decaying into a proton that stays in the nucleus, an electron that can be ejected, and an antineutrino that can pass through the Earth without interacting with anything), the output probably depends on how well the device can capture the electrons, i.e. on how efficient it is.
I'm also curious about the power curve. I'd guess the power falls off over time following the half-life of the element (about 100 years for Ni-63, meaning about 70% left after 50 years), but it would be nice if the article said more about it, like whether the diamond semiconductor can last that long. It would also be interesting if future batteries could combine elements with different decay routes to keep the power output closer to constant for a very long time.
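That ~70%-after-50-years figure falls straight out of the exponential decay law; a quick sketch (assuming output power tracks the remaining activity):

```python
def fraction_remaining(half_life_years: float, t_years: float) -> float:
    # Exponential decay: N(t)/N0 = 2^(-t / half_life)
    return 2 ** (-t_years / half_life_years)

# Ni-63 with a ~100-year half-life
print(f"{fraction_remaining(100, 50):.0%}")  # 71% of initial output after 50 years
```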
So, does anyone know how this technology works? Does it use quantum entangled photons or is it something else?
At least part of it seems to be a "guard model" identifying that the topic might be illegal or something like that and just stopping the main model from going further while saying "can't do that". Other Chinese models, from what I've heard, might handle things like this better, although in DeepSeek's case they may have gone too far because they were likely under attack early on from bad-faith people asking all sorts of "questions" about China and the like, but I can't say for sure.
As the other person said, you can either try running the model yourself or, if you don't have a computer with at least 400GB of RAM lying around, look for other services hosting DeepSeek. I can't guarantee they won't have "guard models"/censorship themselves, but there should be some without it.
But another point to consider is that DeepSeek was still trained on a lot of western ~~data~~ propaganda, so don't expect impartiality from DS or any other model for that matter. We may still be a ways off from models that can actually understand their bias and correct it properly.
wouldn’t it just make more sense to rebuild the facility to be more efficient for cheaper and more sensible machinery
Yes, but that takes time, and we need a good and mature enough technology for it to make at least some "financial" sense. So if humanoid robots can do work that humans can do, it means we could try to make, or have AIs make, new robots and new environments that work well together without consideration for the human form.
Basically, from the first factory robots half a century ago until a few years back, there wasn't much of an advance that would let robots work alone in more than a few cases. Now, though, the technology that enables humanoid robots is the same technology that lets non-humanoid robots work a lot better in basically all cases, so we will finally begin seeing more and more robots and infrastructure built to take advantage of the new advances; at the same time, we'll also get humanoid robots for other tasks, like some of the ones that involve working closely with people...
I was never convinced by the argument that “Humanoid robots are better suited to working in environments that were built for humans”, like, WE are the ones who make these environments
And they aren't built for free, so they were built to be worked in; now that the technology is allowing some of that to change, it will gradually change.
The same way that people don't program in raw bits because assembly exists, or in assembly because C and the like exist, or in C because Python exists, why would people have to use Python (or whatever) if they can get help from AIs to do what they want?