LLM Cognition Python MIT

mullama

mullama is a research project exploring what a unified local LLM serving layer looks like when built from first principles. Rather than wrapping existing tools, it interfaces directly with llama.cpp to study model lifecycle management, inference scheduling, and API compatibility.

Primary use case

Unified local LLM serving in Python, with llama.cpp instrumented for research on scheduling and KV cache.

How it compares

mullama is one option in a category that includes Ollama, vLLM, LocalAI, LM Studio, and llama.cpp. Our Compare page has the full side-by-side.
