01
LLM Cognition & Prompt Theory
How do we make prompts first-class engineering artefacts?
We study declarative prompt specification, lifecycle management, automatic optimisation, cost-quality routing, and persistent memory for LLM agents. The thesis: prompts deserve the same engineering rigour we apply to source code — explicit types, version control, portability, and reproducibility.
Methods
- Formal specification languages
- MIPROv2-based prompt optimisation
- Cost-quality routing
- Structured memory schemas
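To make the cost-quality routing method listed above concrete, here is a minimal sketch. It assumes each model has a known price and an offline quality estimate; the `ModelProfile` type, the `route` function, the model names, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str                  # hypothetical model identifier
    cost_per_1k_tokens: float  # assumed price, arbitrary units
    expected_quality: float    # assumed offline evaluation score in [0, 1]


def route(models, min_quality):
    """Pick the cheapest model whose expected quality meets the bar.

    If no model qualifies, fall back to the highest-quality model.
    """
    eligible = [m for m in models if m.expected_quality >= min_quality]
    if eligible:
        return min(eligible, key=lambda m: m.cost_per_1k_tokens)
    return max(models, key=lambda m: m.expected_quality)


# Illustrative catalogue; the figures are made up.
CATALOGUE = [
    ModelProfile("small", 0.10, 0.62),
    ModelProfile("medium", 0.50, 0.78),
    ModelProfile("large", 3.00, 0.91),
]

if __name__ == "__main__":
    for bar in (0.5, 0.7, 0.99):
        print(bar, "->", route(CATALOGUE, bar).name)
```

A real router would likely estimate quality per request rather than per model, but the same cheapest-that-qualifies rule still applies.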
Topics this pillar covers
LLM, prompt engineering, agent memory, declarative prompting, prompt specification, MIPROv2, LLM routing, agent protocols, MCP, A2A
02
Safe & Verifiable Computing
What does it take to run AI-generated code safely, in production?
We study memory-safe language design for AI code generators, sandboxing for untrusted execution, NUMA-aware scheduling for memory-bound AI workloads, and embedded databases for AI workflows. The thesis: AI systems cannot be trusted in production without the same systems-engineering rigour we apply to aviation or medical software.
Methods
- Memory-safe language design
- Capability-based sandboxing
- NUMA topology awareness
- Embedded database research
- ANN vector search
Topics this pillar covers
memory-safe C, memory-safe systems, code sandbox, gVisor, Firecracker, WASM, NUMA, NUMA-aware scheduling, SQLite, RocksDB, vector database, ANN, HNSW, embedded database, programmable database
03
Formal Optimisation & Decision Science
Can natural language reliably interface with mathematical solvers?
We study the bridge between human intent and formally provable solutions. We build pipelines that translate English problem descriptions into constraint-satisfaction problems, bandit algorithms that rank items with minimal human feedback, and compilers that turn visual specifications into verified executables. The thesis: pure LLM output cannot guarantee optimality; formal solvers can, and the two should compose.
Methods
- NL-to-CSP translation
- Multi-armed bandit algorithms
- Visual-to-code compilation
- Formal verification
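As one illustration of ranking items from pairwise feedback, the sketch below fits a Bradley-Terry model with the standard minorisation-maximisation update. It is not presented as the pipeline used in this pillar. The function name `bradley_terry` and the sample data are hypothetical, and the sketch assumes every item wins at least one comparison.

```python
from collections import defaultdict


def bradley_terry(comparisons, iterations=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Assumes each item wins at least once and winner != loser.
    Returns a dict of strengths normalised to sum to 1.
    """
    wins = defaultdict(int)
    pair_counts = defaultdict(int)
    items = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        items.update((winner, loser))

    strength = {item: 1.0 for item in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            # Sum over opponents: n_ij / (p_i + p_j)
            denom = sum(
                pair_counts[frozenset((i, j))] / (strength[i] + strength[j])
                for j in items
                if j != i
            )
            updated[i] = wins[i] / denom if denom > 0 else strength[i]
        total = sum(updated.values())
        strength = {item: value / total for item, value in updated.items()}
    return strength


# Made-up feedback: each pair is compared four times.
SAMPLE = (
    [("A", "B")] * 3 + [("B", "A")]
    + [("B", "C")] * 3 + [("C", "B")]
    + [("A", "C")] * 3 + [("C", "A")]
)

if __name__ == "__main__":
    scores = bradley_terry(SAMPLE)
    print(sorted(scores, key=scores.get, reverse=True))
```

A bandit layer would sit on top of such an estimator and choose which pair to ask about next; that selection step is not shown here.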
Topics this pillar covers
constraint satisfaction, SMT solver, Z3, OR-Tools, multi-armed bandit, MAB ranking, pairwise comparison, Bradley-Terry, TrueSkill, Plackett-Luce, trading signal compiler, quant DSL, quantitative trading, signal compilation
04
Edge Intelligence & On-Device AI
How much intelligence can live at the edge without any cloud dependency?
We study on-device LLM execution, mobile agent architectures, browser-extension LLM frameworks, and deliberative search. The thesis: privacy, latency, and cost constraints are pushing AI out of the data centre, and the engineering to make that work is research-worthy on its own.
Methods
- On-device LLM inference
- Mobile agent architectures
- Quantisation-aware deployment
- Browser-extension frameworks
- Deliberative search
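The quantisation item above can be illustrated with a toy example. The sketch below applies symmetric absmax quantisation to one block of weights at 4 bits. It is a simplified illustration of the idea only, not the GGUF or llama.cpp on-disk format, and the function names are hypothetical.

```python
def quantise_block(values, bits=4):
    """Symmetric absmax quantisation of one block of floats.

    Returns (scale, codes), where each code is an integer in
    [-qmax, qmax] and qmax = 2**(bits - 1) - 1 (7 for 4 bits).
    """
    qmax = 2 ** (bits - 1) - 1
    absmax = max(abs(v) for v in values)
    scale = absmax / qmax if absmax > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return scale, codes


def dequantise_block(scale, codes):
    """Reconstruct approximate floats from the scale and integer codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    block = [0.12, -0.5, 0.33, 0.07]
    scale, codes = quantise_block(block)
    print(codes, dequantise_block(scale, codes))
```

Each reconstructed value differs from the original by at most half a quantisation step, which is the accuracy cost traded for the smaller memory footprint.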
Topics this pillar covers
on-device LLM, edge AI, mobile AI, Flutter LLM, mobile agent, autonomous agent, device-side AI, quantisation, Q4, Q5, GGUF, llama.cpp, browser extension AI, deliberative search, agentic search
05
Robotics & Autonomous Systems
How do autonomous agents reason, plan, and coordinate in the physical world?
We study the systems engineering that turns research code into reproducible robotics experiments. The thesis: a robotics benchmark that isn't deterministic isn't a benchmark — it's a marketing demo. We build simulators with byte-identical replay, Gymnasium-style RL interfaces, and instrumented delay attribution usable as a reward signal.
Methods
- Discrete-event simulation
- Deterministic RL benchmarks
- Causal delay attribution
- Permutation-equivariant policies
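Byte-identical replay can be sketched in a few lines. The toy discrete-event loop below is an assumption-laden illustration, not the simulators described in this pillar: the function names, the robot and task model, and the log format are all hypothetical. What makes it deterministic is a locally seeded random generator, integer timestamps, and an explicit sequence number that breaks ties in the event queue.

```python
import hashlib
import heapq
import random


def run_simulation(seed, num_robots=3, num_tasks=10):
    """Run a toy discrete-event simulation and return its trace as bytes."""
    rng = random.Random(seed)  # local RNG: no hidden global state
    queue = []                 # entries are (time, sequence, robot_id)
    sequence = 0               # tie-breaker so heap ordering is total
    for robot in range(num_robots):
        heapq.heappush(queue, (0, sequence, robot))
        sequence += 1

    log = []
    next_task = 0
    while queue and next_task < num_tasks:
        now, _, robot = heapq.heappop(queue)
        duration = rng.randint(1, 20)  # integer ticks, assumed task length
        log.append(f"t={now} robot={robot} task={next_task} duration={duration}")
        next_task += 1
        # The robot becomes free again once the task finishes.
        heapq.heappush(queue, (now + duration, sequence, robot))
        sequence += 1
    return "\n".join(log).encode("utf-8")


def trace_digest(trace):
    """Hash a trace so two runs can be compared with a single string."""
    return hashlib.sha256(trace).hexdigest()


if __name__ == "__main__":
    first = run_simulation(seed=42)
    second = run_simulation(seed=42)
    print(trace_digest(first) == trace_digest(second))
```

Comparing digests rather than summary statistics is what turns "roughly the same result" into a replay check that either passes or fails.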
Topics this pillar covers
warehouse robotics, robotic mobile fulfillment system, RMFS, autonomous mobile robot, AMR, multi-agent path finding, MAPF, discrete event simulation, deterministic simulation, reinforcement learning benchmark, Gymnasium, MaskablePPO, task allocation, dispatching, Kiva, Amazon Robotics
Cross-cutting principles
These four principles apply to every pillar.
Open Science
Every hypothesis is a public repository. Every experiment is reproducible. Every finding is auditable.
Hypotheses as Software
We don't write papers that stay on shelves. We write code that runs in production. The codebase is the proof — runnable, testable, falsifiable.
Memory-Safe by Default
We choose Rust, Zig, and Go not for fashion but for falsifiability. Deterministic performance makes systems claims measurable.
Privacy as a Research Constraint
On-device inference and zero-trust architectures aren't add-ons — they're design constraints that shape better science.