this post was submitted on 28 Oct 2025
40 points (87.0% liked)

Linux


Let me preface by saying I despise corpo LLM use and slop creation. I hate it.

However, it does seem like it could be an interesting, helpful tool if run locally in the CLI. I've seen quite a few people doing this. Again, it personally makes me feel like a lazy asshole when I use it, but it's not much different from web searching commands every minute (other than that the data used to train it was obtained by pure theft).

Have any of you tried this out?

top 23 comments
[–] drkt@scribe.disroot.org 26 points 3 days ago

You don't have to apologize for experimenting and playing with your computer.
https://github.com/ggml-org/llama.cpp
https://ollama.com/
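
For anyone who wants to try it quickly, the basic Ollama flow looks roughly like this (the install one-liner is from Ollama's site; the model tag is just an example, pick one that fits your hardware):

```
# Install Ollama, then pull and run a small model for a one-off question.
# "llama3.2:3b" is only an example tag; see https://ollama.com/library for others.
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.2:3b
ollama run llama3.2:3b "How do I find the largest files in a directory?"
```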

[–] alecsargent@lemmy.zip 11 points 2 days ago* (last edited 2 days ago) (2 children)

I've run several LLMs with Ollama (locally) and I have to say it was fun, but it's not worth it at all. It does get many answers right, but that doesn't even come close to compensating for the amount of time spent generating bad answers and troubleshooting them. Not to mention the amount of energy the computer uses.

In the end, I'd just rather spend my time actually learning the thing I'm supposed to solve, or skim through the documentation if I just want the answer.

[–] possiblylinux127@lemmy.zip 2 points 1 day ago (1 children)

I have had really good luck with Alpaca, which uses Ollama.

Gemma3 has been great

[–] alecsargent@lemmy.zip 1 points 1 day ago

Alpaca is the GTK client for Ollama, right? I used it for a while to let my family have a go at local LLMs. It was very nice for them, but on my computer it ran significantly slower than they expected, so that's that.

[–] bridgeenjoyer@sh.itjust.works 5 points 2 days ago (1 children)

This has been my experience with LLMs in my day-to-day job. Thank you for your comment.

[–] alecsargent@lemmy.zip 2 points 2 days ago

thank you as well

[–] palordrolap@fedia.io 9 points 2 days ago (1 children)

I've bounced a few ideas off the limited models currently provided for free online by DuckDuckGo, but I don't think I have the space or RAM to be able to run anything remotely as grand on my own computer.

Also, by the by, I find that the lies LLMs tell can be incredibly subtle, so I tend to avoid asking them about anything I know nothing about; when they lie about the things I do know about, I can gauge how wrong they might be about other things.

[–] _cryptagion@anarchist.nexus 1 points 2 days ago

You almost certainly have the space, and as for RAM you’ll be running the LLM on your GPU. There are models that work fine on a mobile phone, so I’m sure you could find one that would work well on your PC, even if it’s a laptop.

[–] nagaram@startrek.website 10 points 3 days ago (1 children)

Playing with it locally is the best way to do it.

Ollama is great, and believe it or not, I think Google's Gemma is the best for local stuff right now.

[–] harmbugler@piefed.social 2 points 1 day ago

Agree, Gemma is the best-performing model with my 12GB of VRAM.

[–] fruitycoder@sh.itjust.works 3 points 2 days ago

I use Continue on really simple configs and scripts. Rule of thumb: you can't "correct" an AI; it does not "learn" from dialogue. Sometimes more context may generate a better output, but it will keep doing whatever is annoying you.

[–] domi@lemmy.secnd.me 7 points 2 days ago

I'm running gpt-oss-120b and glm-4.5-air locally in llama.cpp.

It's pretty useful for shell commands and has replaced a lot of web searching for me.

The smaller models (4b, 8b, 20b) are not all that useful without providing them data to search through (e.g. via RAG) and even then, they have a bad "understanding" of more complicated prompts.

The 100b+ models are much more interesting since they have a lot more knowledge in them. They are still not useful for very complicated tasks but they can get you started quite quickly with regular shell commands and scripts.

The catch: You need about 128GB of VRAM/RAM to run these. The easiest way to do this locally is to either get a Strix Halo mini PC with 128GB VRAM or put 128GB of RAM in a server/PC.
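
For reference, a llama.cpp setup like that usually boils down to pointing llama-server at a GGUF file; the path, quantization, and flags below are only an illustrative sketch, not the exact setup described above:

```
# Rough llama-server invocation (paths and numbers are examples; adjust for your hardware).
# -ngl offloads layers to the GPU, -c sets the context size.
llama-server -m ~/models/gpt-oss-120b-Q4_K_M.gguf -ngl 99 -c 8192 --port 8080
# Any OpenAI-compatible client (or plain curl) can then talk to http://localhost:8080/v1/chat/completions
```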

[–] tal@lemmy.today 3 points 2 days ago* (last edited 2 days ago) (1 children)

If by "CLI"", you just mean "terminal", I've used ellama in emacs as a frontend to ollama and llama.cpp. Emacs, can run on a terminal, and that's how I use it.

If you specifically want a CLI, I'm sure there are CLI clients out there. They'd have almost zero functionality, though.

Usually the local LLM server, which does the actual computation, is a faceless daemon that clients talk to over HTTP.

EDIT: llama-cli can run on the command line for a single command and does the computation itself. It'll probably have a lot of overhead, though, if you're running a bunch of queries in a row; the time to load a model is significant.
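
A one-shot run looks roughly like this (the model path and token count are placeholders):

```
# One-off llama.cpp query from the shell; the model is loaded fresh on every
# invocation, which is where the per-query overhead comes from.
llama-cli -m ~/models/some-model.gguf -p "Write a find command that deletes empty directories" -n 256
```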

[–] shalafi@lemmy.world 2 points 1 day ago (1 children)

What's the difference between a command line interface and a terminal?

[–] tal@lemmy.today 2 points 1 day ago

If you're being rigorous, a "CLI" app is a program that one interacts with entirely from a shell command line. One types the command and any options in (normally) a single line in bash or similar. One hits enter, the program runs, and then terminates.

On a Linux system, a common example would be ls.

Some terminal programs, often those that use the curses/ncurses library, are run but can then also be interacted with in other ways. This broader class of programs is often called something like "terminal-based", "console-based", or "text-based", and referred to as "TUI" programs. One might press keys to interact with them while they run, but it wouldn't necessarily be at a command line. They might have menu-based interfaces, or use various other interfaces.

On a Linux system, some common examples might be nano, mc, nmtui or top.

nmtui and nmcli are actually a good example of the split. nmcli is a client for NetworkManager that takes some parameters, runs, prints some output, and terminates. nmtui runs in a terminal as well, but one uses it through a series of menus.
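
Concretely, the split looks something like this:

```
# CLI: runs once, prints its output, and exits
nmcli device status
nmcli connection show

# TUI: takes over the terminal with a menu-driven interface until you quit
nmtui
```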

[–] danhab99@programming.dev 2 points 2 days ago

I love sigoden/aichat. It's really intuitive and easy to put in bash scripts.
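
For example, something along these lines works as a tiny helper script (the log filter and prompt are just an illustration; flags and defaults depend on your aichat config):

```
#!/usr/bin/env bash
# Pipe recent error logs into aichat and ask for a summary.
# aichat reads stdin and takes the prompt as arguments.
journalctl -b -p err | aichat "Summarize these errors and suggest likely causes"
```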

Sure have. LLMs aren't intrinsically bad, they're just overhyped and used to scam people who don't understand the technology. Not unlike blockchains. But they are quite useful for doing natural language querying of large bodies of text. I've been playing around with RAG trying to get a model tuned to a specific corpus (e.g. the complete works of William Shakespeare, or the US Code of Laws) to see if it can answer conceptual questions like "where are all the instances where a character dies offstage?" or "can you list all the times where someone is implicitly or explicitly called a cuckold?" And sure they get stuff wrong but it's pretty cool that they work as well as they do.

[–] arcayne@lemmy.today 2 points 2 days ago

The lowest barrier to entry would be to run a coder model (e.g. Qwen2.5-Coder-32B) on Ollama and interface with it via OpenCode. YMMV when it comes to which specific model will meet your needs and work best with your hardware, but Ollama makes it easy to bounce around and experiment.
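
A rough sketch of that route, assuming the usual Ollama defaults (the model tag and port are the standard ones, but treat the details as assumptions):

```
# Pull the coder model from the Ollama library (a 32B model needs a lot of VRAM/RAM).
ollama pull qwen2.5-coder:32b
# Sanity-check it from the terminal first.
ollama run qwen2.5-coder:32b "Write a bash loop that renames *.jpeg files to *.jpg"
# OpenCode and most other clients can then be pointed at Ollama's local
# OpenAI-compatible API, which listens on http://localhost:11434 by default.
```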

[–] dariusj18@lemmy.world 2 points 2 days ago* (last edited 2 days ago)

Check out LM Studio and/or Anything LLM for quick local experimenting.

[–] Sxan@piefed.zip -4 points 2 days ago (1 children)

I tried, once. I was trying a deep-learning keyword-based music generator; þe "mid" model took up nearly a TB of storage. I couldn't get it to use ZLUDA (and I'm not buying an nvidia), so I had to run it on þe (12-core) CPU. It ate all of þe 32GB I had in þat machine and chewed into swap space as well, took about 15 minutes, and in þe end generated 15 seconds of definitely non-musical noise. Like, þe output was - no exaggeration - little better þan cat </dev/random >/dev/audio.

Maybe if I could have gotten it to recognize ZLUDA it'd have been faster, but þe memory use wouldn't have changed much, and þe disk space for þe model is insane. Ultimately, I don't care nearly enough to make þat amount of commitment.

[–] shalafi@lemmy.world 1 points 1 day ago

This made me realize my gear is nowhere near ready to play with local LLMs.