promptel vs DSPy: Declarative Specification vs Programmatic Compilation

Why we get asked this question

Of all the comparisons we field, promptel vs DSPy is the one we hear most often. They both touch the same problem — making prompts first-class engineering artefacts rather than ad-hoc strings — but they attack it from different angles and arrive at very different artefacts. If you have an hour to evaluate, this post is designed to save you most of it.

The 30-second version: promptel is a specification language; DSPy is a Python framework. They are complementary, not competing. Most production teams we talk to end up using both.

What promptel is

promptel is a declarative, typed, portable prompt specification language. You write a prompt as a YAML file that declares the input schema, the output schema, the model parameters, the constraints, and the prompt body:

# summarise.prompt.yaml
kind: Prompt
version: "1.0"
name: summarise
description: "Produce a 3-bullet summary of input text."

input:
  type: object
  required: [text]
  properties:
    text:
      type: string
      minLength: 50
      description: "The text to summarise."

output:
  type: object
  required: [bullets]
  properties:
    bullets:
      type: array
      items: { type: string, maxLength: 120 }
      minItems: 3
      maxItems: 3

models:
  - name: openai-gpt4o
    provider: openai
    model: gpt-4o
    temperature: 0.3
  - name: anthropic-sonnet
    provider: anthropic
    model: claude-3-5-sonnet
    temperature: 0.3

body: |
  You are a precise summariser.
  Produce exactly 3 bullets, each ≤ 120 chars.
  No preamble, no explanation.

That file is the artefact. It can be diffed, reviewed in a PR, versioned independently of the application code, type-checked against a schema, and run against any of the listed models without code changes. The companion tool, blogus, extracts embedded prompts from your existing codebase and converts them into promptel specifications.
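
To make "type-checked against a schema" concrete, here is a minimal sketch in Python of checking a model response against the output contract of summarise.prompt.yaml. It is not promptel's runtime or API: the function name `check_summary_output` is hypothetical, and the rules (a required `bullets` array, exactly 3 items, each a string of at most 120 characters) are hand-coded from the spec above rather than read from the file.

```python
import json


def check_summary_output(raw: str) -> list[str]:
    """Return a list of contract violations for a raw JSON response.

    Hypothetical helper: the rules are copied by hand from the `output:`
    block of summarise.prompt.yaml, not loaded from the spec.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]

    if not isinstance(data, dict):
        return ["output must be an object"]
    if "bullets" not in data:
        return ["missing required field: bullets"]

    bullets = data["bullets"]
    if not isinstance(bullets, list):
        return ["bullets must be an array"]

    errors = []
    if len(bullets) != 3:  # minItems: 3 and maxItems: 3
        errors.append(f"expected exactly 3 bullets, got {len(bullets)}")
    for i, bullet in enumerate(bullets):
        if not isinstance(bullet, str):
            errors.append(f"bullets[{i}] must be a string")
        elif len(bullet) > 120:  # maxLength: 120
            errors.append(f"bullets[{i}] exceeds 120 characters")
    return errors
```

An empty list means the response satisfies the contract. Because the check depends only on the schema, the same function applies to a response from any entry under `models:`.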

What DSPy is

DSPy is a Python framework for programming, rather than specifying, LLM behaviour. You write a Python module that describes the computation as a sequence of LLM calls, and a DSPy optimiser (historically called a “teleprompter”, the term we use in this post) tunes the prompts and the few-shot examples empirically against a metric you provide:

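import dspy

# Assumes an LM has already been configured (e.g. via dspy.configure) and
# that summary_quality (a metric function), eval_set and val_set (lists of
# dspy.Example) are defined elsewhere.

class Summarise(dspy.Signature):
    """Produce a 3-bullet summary of input text."""
    text: str = dspy.InputField()
    bullets: list[str] = dspy.OutputField()

# Wrap the signature in a module that reasons before answering.
summariser = dspy.ChainOfThought(Summarise)

# Optimise instructions and few-shot demos against the metric.
teleprompter = dspy.MIPROv2(metric=summary_quality)
optimised = teleprompter.compile(
    summariser,
    trainset=eval_set,
    valset=val_set,
)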

The output is also an artefact — but it’s a Python module with optimised prompts and demos baked in, not a declarative file. The artefact is what you ship, but it’s not what you diff, review, or hand to a non-engineer.
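
The `summary_quality` metric passed to the teleprompter above is not defined in this post. As a rough illustration of what "a metric you provide" can look like, the sketch below scores a prediction on format alone. It assumes DSPy's usual metric calling convention of `(example, pred, trace=None)` and that the prediction exposes a `bullets` attribute matching the signature. The scoring rule is a placeholder of our own, and `SimpleNamespace` stands in for DSPy's example and prediction objects so the snippet runs without the library installed.

```python
from types import SimpleNamespace


def summary_quality(example, pred, trace=None) -> float:
    """Placeholder metric: fraction of the format rules the prediction meets.

    Checks only shape (3 bullets, each non-empty and at most 120 chars).
    A real metric would also score faithfulness to example.text.
    """
    bullets = getattr(pred, "bullets", None)
    if not isinstance(bullets, list):
        return 0.0
    count_ok = len(bullets) == 3
    length_ok = all(isinstance(b, str) and 0 < len(b) <= 120 for b in bullets)
    return (count_ok + length_ok) / 2


# Stand-ins for a dspy.Example and a dspy.Prediction.
example = SimpleNamespace(text="some long input text")
good = SimpleNamespace(bullets=["First point.", "Second point.", "Third point."])
bad = SimpleNamespace(bullets=["Only one bullet."])

print(summary_quality(example, good))  # 1.0
print(summary_quality(example, bad))   # 0.5
```

A format-only metric like this rewards well-shaped output regardless of content, so in practice you would combine it with a content check before handing it to an optimiser.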

The dimensions that matter

Dimension	promptel	DSPy
What you write	A YAML file	Python code
What the artefact is	A spec file checked into git	A compiled Python module
Who can review it	Engineers, PMs, legal — anyone who can read YAML	Engineers only
Where the optimisation happens	Outside the spec — the spec is a contract, the optimiser is separate	Inside the framework — the compiler runs at build time
Portability	Any listed model, from any language with a promptel runtime (JS today; Python in progress)	Tied to Python and the DSPy runtime
Version control story	Git diff of a YAML file is the prompt diff	Git diff of a Python module whose prompts are templated strings is hard to read
Type safety	Schema-enforced inputs and outputs	Optional via `dspy.Signature`, but enforcement is weaker
Telemetry	Bring your own (we recommend perishable for token proxying)	Built-in DSPy tracing, but exporting requires work
Multi-provider	Native — same spec, different `models:` entries	Supported, but each provider needs an explicit configuration
Bootstrap / few-shot learning	Manual; you bring your own examples	Native — the teleprompter generates and selects demos
Cost optimisation	Bring your own (route-switch plugs in)	Via the metric: fold token cost into the score the teleprompter optimises

When to use which

Use promptel when:

The prompt is a contract — it crosses an organisational boundary (engineer ↔ PM, vendor ↔ customer) and needs to be diffable, reviewable, and portable.
You are writing prompts that downstream teams (or customers) will run against their own models. The spec is the API.
You are regulated (GDPR, EU AI Act, HIPAA, SOX) and need the prompt itself to be auditable. promptel + mpl gives you a tamper-evident prompt + audit trail combo.
You need the prompt to be portable across providers without rewriting the integration code.

Use DSPy when:

You are optimising prompts empirically against a metric and need the compiler to generate few-shot demos.
You are doing research on prompt optimisation itself and want to compare teleprompters (MIPROv2, COPRO, etc.).
Your prompt logic is programmatic (branches, retries, tool calls) and a static spec is too rigid to express it.
You are prototyping and want to iterate fast without writing schema files.

Use both when:

You want the spec in promptel (so it can be reviewed, diffed, audited) and the optimisation in DSPy (so the teleprompter can find the best prompts and demos for that spec). We do this internally: promptel is the artefact that ships, DSPy is the build tool that optimises the body: field before it is committed.
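
One way to wire the two together is a small build step that takes whatever instruction text the optimiser settles on and writes it into the spec before commit. The sketch below does this with plain string handling so that it runs on the standard library alone. `replace_body` is a hypothetical helper, not part of promptel or DSPy, and it assumes `body: |` is a top-level block scalar indented by two spaces, as in the example spec earlier in this post. How you extract the optimised text from a compiled DSPy program is deliberately left out.

```python
def replace_body(spec_text: str, new_body: str) -> str:
    """Return spec_text with the block scalar under `body: |` replaced.

    Plain-text sketch: assumes `body:` is a top-level key and that its
    block runs until the next unindented, non-blank line or end of file.
    Raises StopIteration if the spec has no `body:` key.
    """
    lines = spec_text.splitlines()
    start = next(i for i, line in enumerate(lines) if line.startswith("body:"))

    # Skip over the old block: indented lines and any blank lines after it.
    end = start + 1
    while end < len(lines) and (lines[end].startswith(" ") or not lines[end].strip()):
        end += 1

    indented = ["  " + line if line else "" for line in new_body.strip().splitlines()]
    result = lines[:start] + ["body: |"] + indented
    rest = lines[end:]
    if rest:
        # Keep one blank line between the body and whatever key follows.
        result += [""] + rest
    return "\n".join(result) + "\n"


spec = "name: summarise\nbody: |\n  old line one\n  old line two\n"
print(replace_body(spec, "You are a precise summariser.\nProduce exactly 3 bullets."))
```

Because the result is still an ordinary YAML file, the change the optimiser made shows up in review as a normal diff of the `body:` field.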

A workflow that uses both

At Skelf Research we use promptel for everything that crosses a human boundary, and DSPy for everything that benefits from automated optimisation. The pattern is:

Spec the prompt in promptel. Write the input schema, the output schema, the model list, the constraints. This is what the PM, the reviewer, and the auditor see.
Compile the body with DSPy. Use the teleprompter to find the best body and few-shot demos against a labelled eval set. The output of compilation is the content of the body: field.
Commit both. The promptel spec goes in prompts/summarise.prompt.yaml. The eval set, the teleprompter config, and the metric live in prompts/summarise.eval/ and are also checked in.
Run anywhere. The same promptel spec runs on OpenAI, Anthropic, or a local llama.cpp deployment, with no code changes.
Audit with mpl. If you need tamper-evident audit trails, run the spec through mpl-proxy and you get a cryptographic record of every invocation, its inputs, its outputs, and the model that served it.
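
This post does not describe mpl's record format, so the following is only a generic illustration of what "tamper-evident" means for the audit step: each invocation record is hashed together with the hash of the previous record, so editing any past entry invalidates everything after it. The function names `append_record` and `verify` and the record fields are assumptions made for this sketch, not mpl's API.

```python
import hashlib
import json


def append_record(log: list[dict], record: dict) -> None:
    """Append record to log, chaining it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps(record, sort_keys=True)
    digest = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
    log.append({"record": record, "prev": prev_hash, "hash": digest})


def verify(log: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = "0" * 64
    for entry in log:
        payload = json.dumps(entry["record"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


log: list[dict] = []
append_record(log, {"spec": "summarise", "model": "openai-gpt4o",
                    "input": {"text": "..."}, "output": {"bullets": ["a", "b", "c"]}})
append_record(log, {"spec": "summarise", "model": "anthropic-sonnet",
                    "input": {"text": "..."}, "output": {"bullets": ["d", "e", "f"]}})
print(verify(log))  # True

log[0]["record"]["model"] = "something-else"  # tamper with history
print(verify(log))  # False
```

A chain like this only detects tampering if the latest hash is stored somewhere the writer of the log cannot alter, which is one reason to delegate the job to a dedicated proxy rather than to application code.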

Things people get wrong

They treat them as alternatives. They aren’t. promptel is the artefact; DSPy is the build tool.
They use DSPy for compliance. DSPy is great for optimisation but its audit story is weak. If you need compliance, use mpl on top of whatever prompts you ship, whether they came from promptel or DSPy.
They use promptel for prototyping. The schema overhead is real; for throwaway code, raw strings are fine. Reserve promptel for prompts that will outlive the afternoon.
They assume “declarative” means “simple”. promptel schemas can express complex things — JSON-Schema-style composition, conditional fields, format constraints. The first week is overhead; the second month is leverage.