Research

Five questions driving our work.

Skelf Research is an independent UK AI research lab organised around five research pillars. Each pillar has a defining research question, a set of methods, a portfolio of production-grade open-source projects, and a community of collaborators. Every project is peer-reviewable software; every research question is a runnable hypothesis.

LLM Cognition & Prompt Theory

How do we make prompts first-class engineering artefacts?

We study declarative prompt specification, lifecycle management, automatic optimisation, cost-quality routing, and persistent memory for LLM agents. The thesis: prompts deserve the same engineering rigour we apply to source code — explicit types, version control, portability, and reproducibility.
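
As an illustration of what treating a prompt like source code could look like, here is a minimal sketch in Python. It is not taken from any Skelf Research project: the `PromptSpec` class, its fields, and the `fingerprint` helper are hypothetical names invented for this example. It shows a template with declared input types, an explicit version, and a content hash that lets a rendered prompt be traced back to an exact specification.

```python
from dataclasses import dataclass
import hashlib
import json


@dataclass(frozen=True)
class PromptSpec:
    """Hypothetical declarative prompt: a template, typed inputs, and a version."""
    name: str
    version: str
    template: str
    inputs: dict  # variable name -> expected Python type

    def render(self, **values) -> str:
        # Reject missing or unexpected variables before any model is called.
        declared = set(self.inputs)
        if set(values) != declared:
            raise KeyError(f"expected {sorted(declared)}, got {sorted(values)}")
        # Enforce the declared type of each variable.
        for key, expected in self.inputs.items():
            if not isinstance(values[key], expected):
                raise TypeError(f"{key} must be {expected.__name__}")
        return self.template.format(**values)

    def fingerprint(self) -> str:
        # Stable content hash: identical specs always produce the same digest.
        payload = json.dumps(
            {
                "name": self.name,
                "version": self.version,
                "template": self.template,
                "inputs": {k: t.__name__ for k, t in sorted(self.inputs.items())},
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


spec = PromptSpec(
    name="summarise",
    version="1.0.0",
    template="Summarise in {max_words} words: {text}",
    inputs={"text": str, "max_words": int},
)
prompt = spec.render(text="Prompts are artefacts.", max_words=20)
print(prompt)
print(spec.fingerprint())
```

Storing the fingerprint next to each model output is one way to make a result reproducible: the same specification and the same inputs always yield the same rendered prompt.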

Methods

Formal specification languages
MIPROv2-based prompt optimisation
Cost-quality routing
Structured memory schemas

Projects

Topics this pillar covers

LLM, prompt engineering, agent memory, declarative prompting, prompt specification, MIPROv2, LLM routing, agent protocols, MCP, A2A

Safe & Verifiable Computing

What does it take to run AI-generated code safely, in production?

We study memory-safe language design for AI code generators, sandboxing for untrusted execution, NUMA-aware scheduling for memory-bound AI workloads, and embedded databases for AI workflows. The thesis: AI systems cannot be trusted in production without the same systems-engineering rigour we apply to aviation or medical software.

Methods

Memory-safe language design
Capability-based sandboxing
NUMA topology awareness
Embedded database research
ANN vector search

Projects

Topics this pillar covers

memory-safe C, memory-safe systems, code sandbox, gVisor, Firecracker, WASM, NUMA, NUMA-aware scheduling, SQLite, RocksDB, vector database, ANN, HNSW, embedded database, programmable database

Formal Optimisation & Decision Science

Can natural language reliably interface with mathematical solvers?

We study the bridge between human intent and formally provable solutions. We build pipelines that translate English problem descriptions into constraint-satisfaction problems, bandit algorithms that rank items with minimal human feedback, and compilers that turn visual specifications into verified executables. The thesis: pure LLM output cannot guarantee optimality; formal solvers can, and the two should compose.
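
To make the composition concrete, the sketch below shows the kind of structured problem a translation step might emit, paired with an exhaustive solver and a checker. Everything here is an assumption made for illustration: the `problem` dictionary layout, the `solve` and `verify` helpers, and the scheduling example are hypothetical, and the natural-language translation step itself is not shown. Brute-force enumeration stands in for a real solver such as Z3 or OR-Tools.

```python
from itertools import product

# Hypothetical structured form for the English description:
# "Schedule talks A, B and C in slots 1 to 3. No two talks share a slot.
#  A comes before B. C is not first."
problem = {
    "variables": ["A", "B", "C"],
    "domain": [1, 2, 3],
    "constraints": [
        lambda s: len(set(s.values())) == len(s),  # all slots distinct
        lambda s: s["A"] < s["B"],                 # A before B
        lambda s: s["C"] != 1,                     # C is not first
    ],
}


def verify(problem, candidate):
    """Return True only if the candidate satisfies every constraint."""
    return all(check(candidate) for check in problem["constraints"])


def solve(problem):
    """Enumerate every assignment and keep those that pass verification."""
    names = problem["variables"]
    solutions = []
    for values in product(problem["domain"], repeat=len(names)):
        candidate = dict(zip(names, values))
        if verify(problem, candidate):
            solutions.append(candidate)
    return solutions


solutions = solve(problem)
print(solutions)

# An answer proposed by a language model can be checked, not trusted.
proposed = {"A": 2, "B": 1, "C": 3}
print(verify(problem, proposed))
```

The point of the split is that `verify` gives a yes-or-no guarantee for any proposed answer, while `solve` covers the whole search space; a language model alone offers neither.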

Methods

NL-to-CSP translation
Multi-armed bandit algorithms
Visual-to-code compilation
Formal verification

Projects

Topics this pillar covers

constraint satisfaction, SMT solver, Z3, OR-Tools, multi-armed bandit, MAB ranking, pairwise comparison, Bradley-Terry, TrueSkill, Plackett-Luce, trading signal compiler, quant DSL, quantitative trading, signal compilation

Edge Intelligence & On-Device AI

How much intelligence can live at the edge without any cloud dependency?

We study on-device LLM execution, mobile agent architectures, browser-extension LLM frameworks, and deliberative search. The thesis: privacy, latency, and cost constraints are pushing AI out of the data centre, and the engineering to make that work is research-worthy on its own.

Methods

On-device LLM inference
Mobile agent architectures
Quantisation-aware deployment
Browser-extension frameworks
Deliberative search

Projects

Topics this pillar covers

on-device LLM, edge AI, mobile AI, Flutter LLM, mobile agent, autonomous agent, device-side AI, quantisation, Q4, Q5, GGUF, llama.cpp, browser extension AI, deliberative search, agentic search

Robotics & Autonomous Systems

How do autonomous agents reason, plan, and coordinate in the physical world?

We study the systems engineering that turns research code into reproducible robotics experiments. The thesis: a robotics benchmark that isn't deterministic isn't a benchmark — it's a marketing demo. We build simulators with byte-identical replay, Gymnasium-style RL interfaces, and instrumented delay attribution usable as a reward signal.
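
A rough sketch of what byte-identical replay can mean in practice follows. It is a toy discrete-event loop written for this page, not code from waremax: the `run` and `digest` functions, the event layout, and the robot and task counts are all hypothetical. The ingredients it illustrates are a private seeded random generator, a deterministic tie-break in the event queue, and integer time so that serialisation cannot drift.

```python
import hashlib
import heapq
import random


def run(seed: int, n_robots: int = 3, n_tasks: int = 5) -> bytes:
    """Toy discrete-event simulation that returns its serialised event log."""
    rng = random.Random(seed)  # private generator: no hidden global state
    queue = []
    seq = 0  # insertion counter, used to break ties between equal timestamps
    for robot in range(n_robots):
        heapq.heappush(queue, (0, seq, robot, 0))
        seq += 1

    log = []
    while queue:
        time, _, robot, task = heapq.heappop(queue)
        log.append(f"{time},{robot},{task}")
        if task + 1 < n_tasks:
            duration = rng.randint(1, 10)  # integer ticks, so no float formatting
            heapq.heappush(queue, (time + duration, seq, robot, task + 1))
            seq += 1
    return "\n".join(log).encode("ascii")


def digest(log: bytes) -> str:
    """Hash of the event log; two runs replay identically if the hashes match."""
    return hashlib.sha256(log).hexdigest()


first = run(seed=42)
second = run(seed=42)
print(digest(first) == digest(second))
```

Comparing digests rather than summary statistics is the stricter test: any reordering of events changes the hash, even when aggregate metrics happen to agree. The guarantee here assumes the same Python version on both runs, since the standard library does not promise identical random streams across releases.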

Methods

Discrete-event simulation
Deterministic RL benchmarks
Causal delay attribution
Permutation-equivariant policies

Projects

waremax

Topics this pillar covers

warehouse robotics, robotic mobile fulfillment system, RMFS, autonomous mobile robot, AMR, multi-agent path finding, MAPF, discrete event simulation, deterministic simulation, reinforcement learning benchmark, Gymnasium, MaskablePPO, task allocation, dispatching, Kiva, Amazon Robotics

Cross-cutting principles

These four principles apply to every pillar.

Open Science

Every hypothesis is a public repository. Every experiment is reproducible. Every finding is auditable.

Hypotheses as Software

We don't write papers that stay on shelves. We write code that runs in production. The codebase is the proof — runnable, testable, falsifiable.

Memory-Safe by Default

We choose Rust, Zig, and Go not for fashion but for falsifiability. Deterministic performance makes systems claims measurable.

Privacy as a Research Constraint

On-device inference and zero-trust architectures aren't add-ons — they're design constraints that shape better science.

Where to start

Glossary — definitions of every key term across the five pillars
Compare — side-by-sides with the alternatives our users actually evaluate
Blog — long-form research articles with benchmarks, code, and conclusions
Machine-readable index for AI assistants