So my relevant hardware is:
GPU - 9070XT
CPU - 9950X3D
RAM - 64GB of DDR5

My problem is that I can't figure out how to get a local LLM to actually use my GPU. I tried Ollama with DeepSeek-R1 8B, and it sort of vaguely ran while maxing out my CPU and completely ignoring the GPU.
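(In case it helps anyone diagnose the same thing, a couple of checks should show what Ollama is actually using. This is a sketch assuming a Linux install running as a systemd service; the model tag is whatever you pulled.)

# with a model loaded (e.g. in another terminal: ollama run deepseek-r1:8b),
# "ollama ps" prints a PROCESSOR column that reads something like
# "100% GPU", "100% CPU", or a split between the two
ollama ps

# if Ollama was installed as a systemd service, its startup log lists
# the GPUs it detected (or says it's falling back to CPU)
journalctl -u ollama | grep -i -e gpu -e rocm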

While I'm here, model suggestions would be good too. I'm currently looking at two use cases:

  • Something I can feed a document to and ask questions about that document (Nvidia used to offer this with Chat with RTX), to work as a kind of co-GM for quickly referencing obscure rules without having to hunt through the PDF. (A rough sketch of the simplest approach follows this list.)
  • Something more storytelling-oriented that I can use to generate backgrounds for throwaway side NPCs when the players inevitably demand their life story after expertly dodging all the NPCs I actually wrote lore for.
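(For the first bullet, the simplest thing that works with llama.cpp, which the comments below recommend, is to paste the whole document into the prompt, as long as the model's context window is big enough. The file names and model path here are placeholders.)

# pdftotext ships with poppler-utils and dumps the PDF to plain text
pdftotext rules.pdf rules.txt

# prepend a question, then hand the whole thing to llama-cli;
# -f reads the prompt from a file, -c sets the context size in tokens,
# -ngl 99 offloads all layers to the GPU
{ echo "Answer questions about the following rules."; cat rules.txt; } > prompt.txt
./llama-cli -m model.gguf -f prompt.txt -c 16384 -ngl 99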

Also, just an unrelated aside: DeepSeek-R1 8B seems to go into an infinite thought loop when you ask it the strawberry question, which was kind of funny.

[–] vivendi@programming.dev 6 points 1 month ago (1 children)

llama.cpp

The Only Inference Engine You'll Ever Need™
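For an AMD card, the Vulkan backend is probably the path of least resistance. A sketch of one way to build and run it (the ROCm/HIP backend is the other option; the Vulkan SDK is needed to build):

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
# GGML_VULKAN builds the Vulkan backend, which runs on AMD cards
# without needing a full ROCm stack
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j
# -ngl 99 offloads every layer to the GPU; the model path is a placeholder
./build/bin/llama-cli -m ~/models/some-model.gguf -ngl 99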

[–] CheeseNoodle@lemmy.world 1 points 1 month ago* (last edited 1 month ago) (1 children)

I found this guide, which seems very comprehensive, but it has a few sections that assume knowledge I don't have without suggesting a clear route to gain said knowledge.

For the section just following "Grab the content of SmolLM2 1.7B Instruct", I assume it boils down to running this prior program called MSYS and running this command through it? "GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct"

[–] vivendi@programming.dev 3 points 1 month ago* (last edited 1 month ago)

That's for quanting a model yourself. You can instead (read that as "should") download an already-quantized model. You can find quantized models on the HuggingFace page of your model of choice. (Pro tip: quants by Bartowski, Unsloth and Mradermacher are high quality.)

And then you just run it.
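Something like this, for example (the repo and file names here are made up; browse the quanter's HuggingFace page for real ones and pick a quant that fits in your VRAM):

# huggingface-cli comes with `pip install huggingface_hub`
huggingface-cli download bartowski/SomeModel-GGUF \
  --include "*Q4_K_M*" --local-dir ./models

./build/bin/llama-cli -m ./models/SomeModel-Q4_K_M.gguf -ngl 99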

You can also use KoboldCpp or Open WebUI as friendly front ends for llama.cpp.
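For Open WebUI, the Docker one-liner from its README is roughly the following at the time of writing; point it at an OpenAI-compatible endpoint such as llama.cpp's llama-server (llama-server -m model.gguf --port 8081) or a local Ollama:

docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main

After that the UI is at http://localhost:3000.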

Also, to answer your question, yes.