Microsoft sets 16GB default for RAM for AI PCs – machines will also need 40 TOPS of AI compute: Report
(www.tomshardware.com)
OK, I walked over to my PC to get you a working command line for llama.cpp. If you want it to run on your GPU, make sure llama.cpp is compiled with hipBLAS/ROCm support (ROCm is AMD's counterpart to CUDA).
./main -ngl 24 -m models/tinyllama-1.1b-chat-v1.0.Q5_K_M.gguf --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -i -ins
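Here is a minimal sketch of the same command with each flag annotated. The flag descriptions follow llama.cpp's own help text from that era. The `NGL` and `MODEL` variables are hypothetical names added so you can switch between GPU and CPU without editing the command. Newer llama.cpp releases may have renamed `./main` or dropped `-ins`, so check `--help` on your build.

```shell
#!/usr/bin/env bash
# Sketch: the same llama.cpp invocation as above, with each flag annotated.
# NGL and MODEL are hypothetical variable names introduced for this sketch.
NGL="${NGL:-24}"   # layers to offload to the GPU; 0 = run entirely on the CPU
MODEL="${MODEL:-models/tinyllama-1.1b-chat-v1.0.Q5_K_M.gguf}"

args=(
  -ngl "$NGL"            # number of model layers to offload to the GPU
  -m "$MODEL"            # path to the GGUF model file
  --color                # colorize output to tell your input from the model's
  -c 2048                # context size in tokens
  --temp 0.7             # sampling temperature
  --repeat_penalty 1.1   # penalty applied to recently repeated tokens
  -n -1                  # tokens to generate; -1 = no fixed limit
  -i                     # interactive mode
  -ins                   # instruction mode (older builds only)
)

if [ -x ./main ]; then
  ./main "${args[@]}"
else
  # Not run from a llama.cpp build directory: just show what would run.
  echo "./main not found; would run: ./main ${args[*]}" >&2
fi
```

Run it as `NGL=0 ./run.sh` (the script name is up to you) to get the CPU-only case. Any value at least as large as the model's layer count offloads the whole model.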
This puts it into interactive mode so you can chat with it. On my GPU it cranks out almost 160 tokens per second, which is far faster than anyone can type. On the CPU alone (-ngl 0, i.e. no layers offloaded to the GPU) it manages about 90 tokens per second, which is still fast. TinyLlama is not a great conversationalist and is better treated as a prediction or question-answering engine.
It does know a surprising amount, considering it would fit on a CD-ROM.
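To put the speeds quoted above in perspective, here is a rough back-of-the-envelope calculation. The `reply_seconds` and `typing_tps` helpers, the 400-token reply length, and the ratio of about 1.3 tokens per English word are all assumptions for illustration, not measurements.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope timing for one reply at the rates quoted above:
# about 160 tokens/s with GPU offload and about 90 tokens/s on the CPU.

# Seconds needed to generate $1 tokens at $2 tokens per second.
reply_seconds() {
  awk -v n="$1" -v r="$2" 'BEGIN { printf "%.1f\n", n / r }'
}

# Tokens per second produced by a typist at $1 words per minute, assuming
# roughly 1.3 tokens per English word (a rule of thumb, not a measured value).
typing_tps() {
  awk -v wpm="$1" 'BEGIN { printf "%.1f\n", wpm * 1.3 / 60 }'
}

tokens=400   # hypothetical reply length
echo "GPU (160 tok/s): $(reply_seconds "$tokens" 160) s"
echo "CPU (90 tok/s):  $(reply_seconds "$tokens" 90) s"
echo "Fast typist (80 wpm): about $(typing_tps 80) tok/s"
```

This prints 2.5 s for the GPU and 4.4 s for the CPU, against roughly 1.7 tokens per second for a fast typist. So even the CPU-only run is well ahead of human input.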