Tag: inference
2 articles tagged "inference"
A post-mortem on building a local LLM serving layer: llama.cpp integration, model management, and where existing tools constrain research.
What happens when you run a full LLM on mobile hardware with zero cloud dependency: memory, latency, and model quality on consumer devices.