Ibrahim Ahmed

Locally running language models

Apr 5, 2026 (updated: Apr 9, 2026)


Using llama.cpp

llama-server -hf MODEL_NAME
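Once started, llama-server serves an OpenAI-compatible HTTP API (by default on http://localhost:8080, with chat completions at /v1/chat/completions). A minimal Python sketch for querying it from the standard library only; the prompt, max_tokens value, and helper names here are illustrative, not part of llama.cpp itself:

```python
import json
import urllib.request

# llama-server listens on http://localhost:8080 by default and exposes
# an OpenAI-compatible chat-completions endpoint.
SERVER_URL = "http://localhost:8080/v1/chat/completions"

def build_request(prompt, max_tokens=128):
    """Build an OpenAI-style chat-completions payload for llama-server."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def ask(prompt):
    """POST the prompt to the local server and return the reply text."""
    data = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        SERVER_URL,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses put the text under choices[0].message.content.
    return body["choices"][0]["message"]["content"]

# Example (requires a running llama-server instance):
#   print(ask("Name one benefit of running models locally."))
```

Because the endpoint is OpenAI-compatible, any OpenAI client library pointed at the local base URL should work as well.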

Observed speeds on M5 Pro (brew install llama.cpp):

Observed speeds on RTX 4080 (winget install ggml.llamacpp):

Pointers:

Using Hugging Face transformers