Persistent Memory for Long-Running Agents

What happens when LLM agents need to remember across sessions — structured memory schemas, retrieval strategies, and the memory-context distinction.

Most LLM agents are stateless. Each session starts from zero: the user re-explains their preferences, re-establishes context, re-provides background that was already discussed yesterday. Reprocessing all of this costs the model nothing, but it costs the user time, and the system is doing redundant work.

The problem is not that LLMs cannot handle long contexts. Modern models accept 128K or even 1M tokens. The problem is that context is ephemeral. When the session ends, the context window is discarded. The next session starts empty. Whatever the agent learned about the user, the project, or the domain during the previous interaction is gone.

memorg is our research project exploring what persistent, structured memory looks like for LLM agents. This post describes the problem space, the architectural choices we have made, and the open questions we are still working through.

Context vs. Memory

These terms are used loosely in the LLM literature, and the conflation causes confusion. We draw a sharp distinction:

Context is the information available to the model within a single inference call. It is the content of the context window — the system prompt, the conversation history, any retrieved documents, and the current user input. Context is ephemeral. It exists for the duration of one API call and is discarded afterwards.

Memory is the information that persists across inference calls and across sessions. It is stored externally to the model — in a database, a file, a vector store — and selectively loaded into context when relevant. Memory has a lifecycle independent of any single conversation.

The relationship between them is straightforward: memory is the persistent store; context is the active working set loaded from that store (plus the current conversation).

┌─────────────────────────────────────────┐
│              Memory Store               │
│  (persistent, structured, queryable)    │
│                                         │
│  ┌─────────┐ ┌─────────┐ ┌──────────┐   │
│  │ Facts   │ │Episodes │ │Procedures│   │
│  │ about   │ │ (past   │ │ (learned │   │
│  │ entities│ │sessions)│ │workflows)│   │
│  └─────────┘ └─────────┘ └──────────┘   │
│         │          │           │        │
│         └──────────┼───────────┘        │
│                    │ retrieval          │
│                    ▼                    │
│  ┌─────────────────────────────────┐    │
│  │        Context Window           │    │
│  │  (system prompt + retrieved     │    │
│  │   memories + conversation +     │    │
│  │   current query)                │    │
│  └─────────────────────────────────┘    │
└─────────────────────────────────────────┘

This distinction matters because the design constraints are different. Context is bounded by the model’s context window and priced per token. Memory is bounded by storage and retrieval quality. Mixing them up leads to architectures that either stuff everything into context (expensive, eventually exceeds the window) or store nothing persistently (amnesic agents).

Memory Types

Not all memories are the same. memorg defines three categories, each with a different schema and retrieval strategy.

Semantic Memory: Facts and Knowledge

Semantic memories are declarative facts about the world, the user, or the domain. They are typically extracted from conversations and stored as structured records.

# Example semantic memory
type: semantic
entity: "Project Atlas"
attributes:
  description: "Internal tool for sales pipeline management"
  tech_stack: ["Python", "FastAPI", "PostgreSQL"]
  team_size: 4
  status: "active development"
source:
  session: "2026-03-15-session-042"
  turn: 12
confidence: 0.9
last_updated: 2026-03-15T14:22:00Z

Semantic memories are relatively stable. Once the agent learns that Project Atlas uses PostgreSQL, this fact persists until explicitly contradicted. They are retrieved by entity lookup or attribute matching — structured queries, not vector similarity.
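To make the retrieval mode concrete, here is a minimal sketch of entity-keyed storage and lookup using Python's sqlite3. The table and helper names (`semantic_memory`, `remember`, `facts_about`) are illustrative, not memorg's actual schema:

```python
import json
import sqlite3

# Illustrative schema: one row per (entity, attribute) fact.
# Names here are hypothetical, not memorg's real tables.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE semantic_memory (
        entity     TEXT NOT NULL,
        attribute  TEXT NOT NULL,
        value      TEXT NOT NULL,   -- JSON-encoded
        confidence REAL DEFAULT 1.0,
        PRIMARY KEY (entity, attribute)
    )
""")

def remember(entity, attribute, value, confidence=1.0):
    """Insert or update a fact (last write wins)."""
    conn.execute(
        "INSERT OR REPLACE INTO semantic_memory "
        "(entity, attribute, value, confidence) VALUES (?, ?, ?, ?)",
        (entity, attribute, json.dumps(value), confidence),
    )

def facts_about(entity):
    """Structured lookup: all current facts for one entity."""
    rows = conn.execute(
        "SELECT attribute, value FROM semantic_memory WHERE entity = ?",
        (entity,),
    )
    return {attr: json.loads(val) for attr, val in rows}

remember("Project Atlas", "tech_stack", ["Python", "FastAPI", "PostgreSQL"])
remember("Project Atlas", "team_size", 4)
print(facts_about("Project Atlas")["team_size"])  # prints 4
```

The point of the sketch is that no embedding is involved: a semantic memory is reached by an exact key, not by similarity.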

Episodic Memory: Past Interactions

Episodic memories are records of past interactions — what happened, when, and in what context. They are the agent’s autobiography.

# Example episodic memory
type: episodic
session: "2026-03-15-session-042"
summary: "User asked for help debugging a connection pool 
  exhaustion issue in Project Atlas. Root cause was 
  unclosed connections in the batch import module. 
  Provided fix using context managers."
participants: ["user:dipankar"]
topics: ["debugging", "PostgreSQL", "connection-pooling"]
outcome: "resolved"
duration_minutes: 23
timestamp: 2026-03-15T14:00:00Z

Episodic memories are important for continuity. When a user says “remember that connection pool issue we fixed last week?” the agent needs to retrieve the relevant episode. They are retrieved by temporal queries (“last week”), topic matching (“connection pool”), or participant filtering.
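A structured episodic query can be sketched as a filter over metadata. The record fields and the `recall` helper below are hypothetical, and memorg would additionally run embedding search over the summaries, which this sketch omits:

```python
from datetime import datetime, timedelta

# Hypothetical in-memory episode records; memorg stores these as
# documents with structured metadata plus an embedding of the summary.
episodes = [
    {"summary": "Fixed connection pool exhaustion in Project Atlas.",
     "topics": {"debugging", "PostgreSQL", "connection-pooling"},
     "timestamp": datetime(2026, 3, 15, 14, 0)},
    {"summary": "Discussed caching strategies for the API layer.",
     "topics": {"caching", "FastAPI"},
     "timestamp": datetime(2026, 2, 2, 10, 30)},
]

def recall(topics=None, since=None, until=None):
    """Structured episodic retrieval: topic overlap plus time window."""
    hits = []
    for ep in episodes:
        if topics and not (set(topics) & ep["topics"]):
            continue  # no topic overlap
        if since and ep["timestamp"] < since:
            continue  # too old
        if until and ep["timestamp"] > until:
            continue  # too recent
        hits.append(ep)
    # Most recent first, mirroring "last week" style queries.
    return sorted(hits, key=lambda e: e["timestamp"], reverse=True)

now = datetime(2026, 3, 20)
last_week = recall(topics=["connection-pooling"], since=now - timedelta(days=7))
print(last_week[0]["summary"])
```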

Procedural Memory: Learned Workflows

Procedural memories encode patterns of action — things the agent has learned to do, either from explicit instruction or from observing successful interactions.

# Example procedural memory
type: procedural
name: "deploy-to-staging"
description: "User's preferred deployment workflow for Project Atlas"
steps:
  - "Run test suite with pytest -x"
  - "Build Docker image with tag :staging-{date}"
  - "Push to registry at registry.internal.io"
  - "Update Kubernetes deployment in staging namespace"
  - "Run smoke tests against staging.internal.io/health"
learned_from: ["session-038", "session-041"]
success_count: 3
last_used: 2026-03-14T09:15:00Z

Procedural memories reduce the need for users to re-explain workflows. Once the agent has learned a procedure, it can propose or execute it when the context is appropriate. They are retrieved by task matching — “deploy the app” triggers retrieval of the deployment procedure.
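Task matching can be sketched with plain token overlap. memorg combines keyword matching with embedding similarity, but a Jaccard score over a hypothetical procedure index is enough to show the shape:

```python
# Hypothetical procedure index: name -> task description it matches.
procedures = {
    "deploy-to-staging": "deploy the app to staging with tests and smoke checks",
    "rotate-db-credentials": "rotate database credentials and restart services",
}

def tokenize(text):
    return set(text.lower().split())

def match_procedure(task, threshold=0.2):
    """Return the best-matching procedure name, or None.

    Score is Jaccard overlap between the task and each description;
    the threshold is an illustrative knob, not a memorg constant."""
    best_name, best_score = None, 0.0
    task_tokens = tokenize(task)
    for name, description in procedures.items():
        desc_tokens = tokenize(description)
        union = task_tokens | desc_tokens
        score = len(task_tokens & desc_tokens) / len(union) if union else 0.0
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

print(match_procedure("deploy the app"))  # prints deploy-to-staging
```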

Storage Architecture

memorg uses a hybrid storage backend that matches the retrieval characteristics of each memory type.

Semantic memories are stored in a relational schema (SQLite in the current implementation). Entity-attribute relationships map naturally to tables, and structured queries (find all facts about Project Atlas, find all entities with status “active development”) are efficient.

Episodic memories are stored as documents with both structured metadata (timestamp, participants, topics) and a dense vector embedding of the summary text. This enables both structured queries (“sessions from last week”) and semantic search (“that time we discussed caching strategies”).

Procedural memories are stored as structured documents, indexed by the task descriptions they match. Retrieval uses a combination of keyword matching and embedding similarity on the task description.

┌─────────────────────────────────────┐
│           memorg Storage            │
├───────────┬───────────┬─────────────┤
│  SQLite   │  SQLite + │  SQLite +   │
│  (tables) │  vectors  │  vectors    │
│           │           │             │
│ Semantic  │ Episodic  │ Procedural  │
│ memories  │ memories  │ memories    │
└───────────┴───────────┴─────────────┘

We deliberately chose SQLite over a dedicated vector database. For the scale of data we are working with (thousands to tens of thousands of memories, not millions), SQLite with a vector extension (sqlite-vss) provides adequate performance without the operational overhead of a separate database process. This is a research tool; operational simplicity matters more than scale.

The Retrieval Problem

Storage is the easy part. Retrieval is where the interesting problems live.

Given an incoming user message and the current conversation, which memories should be loaded into context? Loading too many wastes context window budget and can actually degrade performance (the model attends to irrelevant memories instead of the current query). Loading too few means the agent behaves as if it has forgotten things the user expects it to remember.

memorg uses a three-stage retrieval pipeline:

Stage 1: Candidate generation. Cast a wide net. Extract entities and topics from the current conversation. Query each memory store for potentially relevant records. This stage optimises for recall — it is acceptable to retrieve irrelevant memories, but missing a relevant one is expensive.

Stage 2: Relevance scoring. Score each candidate memory against the current query using a lightweight cross-encoder or embedding similarity. This stage reduces the candidate set to a manageable size (typically 10-20 memories from an initial set of 50-100).

Stage 3: Budget allocation. Given the context window budget allocated to memory (we typically reserve 20-30% of the context window for retrieved memories), select the highest-scoring memories that fit within the budget. This is a knapsack problem — each memory has a relevance score and a token cost, and we maximise total relevance within the token budget.

User query + conversation
         │
         ▼
  ┌──────────────┐
  │  Candidate   │  50-100 candidates
  │  Generation  │  (entity lookup, topic match,
  │              │   temporal proximity, embedding search)
  └──────┬───────┘
         │
         ▼
  ┌──────────────┐
  │  Relevance   │  10-20 scored candidates
  │  Scoring     │  (cross-encoder or embedding similarity)
  └──────┬───────┘
         │
         ▼
  ┌──────────────┐
  │   Budget     │  5-10 memories loaded into context
  │  Allocation  │  (knapsack optimisation within
  │              │   token budget)
  └──────────────┘
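Stage 3 maps directly onto 0/1 knapsack, and since the candidate set is small (10-20 items) an exact dynamic programme is cheap. The function and memory names below are ours, not memorg's API:

```python
def allocate_budget(memories, token_budget):
    """0/1 knapsack: pick the subset of memories maximising total
    relevance without exceeding token_budget.

    `memories` is a list of (name, relevance, token_cost) tuples."""
    # best[b] = (total_relevance, chosen_indices) achievable within budget b
    best = [(0.0, [])] * (token_budget + 1)
    for i, (_, rel, cost) in enumerate(memories):
        if cost > token_budget:
            continue  # can never fit
        # Iterate budgets downwards so each memory is used at most once.
        for b in range(token_budget, cost - 1, -1):
            cand = (best[b - cost][0] + rel, best[b - cost][1] + [i])
            if cand[0] > best[b][0]:
                best[b] = cand
    return [memories[i][0] for i in best[token_budget][1]]

candidates = [
    ("atlas-stack",         0.9, 120),
    ("pool-bugfix-episode", 0.8, 300),
    ("deploy-procedure",    0.6, 200),
    ("old-caching-chat",    0.3, 250),
]
# Relevance 1.7 at 420 tokens beats any other subset under 500.
print(allocate_budget(candidates, token_budget=500))
# prints ['atlas-stack', 'pool-bugfix-episode']
```

In practice a greedy pick by relevance-per-token is a fine approximation when latency matters; the exact version is shown because the candidate sets here are tiny.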

Memory Formation

Retrieval assumes memories exist. How do they get created?

memorg extracts memories from conversations using a post-session processing pipeline. After a session ends (or periodically during long sessions), the pipeline:

  1. Identifies new facts. Scans the conversation for statements that constitute new information — user preferences, project details, decisions made. These become semantic memories.
  2. Generates an episode summary. Compresses the full session into a structured summary with topics, outcomes, and key decisions. This becomes an episodic memory.
  3. Detects procedures. Identifies sequences of actions that the user performed or requested repeatedly. When a pattern recurs across sessions, it is extracted as a procedural memory.
  4. Reconciles with existing memories. New facts may contradict existing semantic memories (the project switched from PostgreSQL to MySQL). The pipeline detects conflicts and updates the existing memory, preserving the history of changes.

The extraction pipeline itself uses an LLM — this is a bootstrapping problem where the agent’s memory formation depends on the same type of model that the agent uses for reasoning. We use a smaller, cheaper model for extraction (it is a well-structured task that does not require frontier-level reasoning) to keep costs manageable.
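Step 4 of the pipeline (reconciliation) can be sketched as supersession rather than deletion: the conflicting old fact is kept but demoted. Field names here are illustrative, not memorg's schema:

```python
from datetime import datetime, timezone

# Hypothetical fact store: a list of fact dicts.
store = []

def reconcile(entity, attribute, value):
    """Record a fact; mark any conflicting live fact as superseded."""
    now = datetime.now(timezone.utc).isoformat()
    for fact in store:
        if (fact["entity"], fact["attribute"]) == (entity, attribute) \
                and fact["superseded_at"] is None:
            if fact["value"] == value:
                return fact  # nothing new to record
            fact["superseded_at"] = now  # keep history, demote in retrieval
    new_fact = {"entity": entity, "attribute": attribute, "value": value,
                "created_at": now, "superseded_at": None}
    store.append(new_fact)
    return new_fact

reconcile("Project Atlas", "database", "PostgreSQL")
reconcile("Project Atlas", "database", "MySQL")  # supersedes the old fact
current = [f for f in store if f["superseded_at"] is None]
print(current[0]["value"])  # prints MySQL
```

Retrieval would rank live facts ahead of superseded ones, while a question like "what did we use to think?" can still reach the history.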

Forgetting

An often-overlooked aspect of memory is forgetting. Human memory decays, and there are good reasons to implement analogous mechanisms for agent memory.

Relevance decay. Memories that have not been retrieved in a long time are likely no longer relevant. memorg implements a soft decay function that reduces the retrieval score of memories as time since last access increases. Memories are not deleted — they become harder to retrieve unless directly queried.
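One plausible form for such a decay function is exponential down-weighting with a half-life. The 30-day constant below is an illustrative knob, not a memorg default:

```python
import math

def decayed_score(base_score, days_since_access, half_life_days=30.0):
    """Soft decay: halve the retrieval score every half_life_days.

    The memory is never deleted; its score just shrinks, so a direct
    structured query can still reach it."""
    return base_score * math.exp(
        -math.log(2) * days_since_access / half_life_days)

fresh = decayed_score(0.9, days_since_access=0)   # unchanged
stale = decayed_score(0.9, days_since_access=60)  # two half-lives
print(round(fresh, 3), round(stale, 3))  # prints 0.9 0.225
```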

Contradiction resolution. When a new fact contradicts an existing memory, the old memory is not deleted but marked as superseded. The new fact takes precedence in retrieval, but the history is preserved for cases where the user asks “what did we use to think about X?”

Capacity management. At sufficient scale, the memory store becomes too large for efficient retrieval. memorg implements a consolidation process (inspired by memory consolidation in cognitive science) that periodically merges related episodic memories into higher-level summaries. Five sessions about debugging Project Atlas become one consolidated memory about common debugging patterns in that project.
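The structural half of consolidation can be sketched as grouping episodes that share a topic. In memorg the merged summary itself would be written by an LLM, which this hypothetical sketch omits:

```python
from collections import defaultdict

# Hypothetical episode records: session id plus topic tags.
episodes = [
    {"session": "s-038", "topics": {"debugging", "atlas"}},
    {"session": "s-040", "topics": {"debugging", "atlas"}},
    {"session": "s-041", "topics": {"caching"}},
    {"session": "s-042", "topics": {"debugging", "atlas"}},
]

def consolidate(episodes, min_group=3):
    """Merge topics that recur across at least min_group episodes
    into one consolidated record pointing back at its sources."""
    groups = defaultdict(list)
    for ep in episodes:
        for topic in ep["topics"]:
            groups[topic].append(ep["session"])
    merged = []
    for topic, sessions in groups.items():
        if len(sessions) >= min_group:
            merged.append({"type": "consolidated", "topic": topic,
                           "source_sessions": sorted(sessions)})
    return merged

print(consolidate(episodes))
```

The source-session links matter: a consolidated summary is lossy, so retrieval should still be able to drill back into the original episodes when needed.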

Open Questions

Several problems remain unsolved in our current implementation.

Privacy and access control. In multi-user or multi-tenant settings, whose memories does the agent access? memorg currently assumes a single user. Extending to shared memory spaces (team knowledge) while respecting access boundaries is an open design problem.

Memory accuracy. The extraction pipeline can make mistakes — misattributing a fact, summarising a session incorrectly, detecting a procedure that was actually a one-off. We do not currently have a systematic way to validate memory accuracy beyond user correction.

Evaluation. How do you measure whether a memory system is working? We track retrieval precision and recall against human judgments, but the ground truth is expensive to establish. The ultimate metric is user satisfaction — does the agent feel like it remembers? — which is subjective and hard to automate.

Cross-agent memory. When multiple agents share a memory store, consistency becomes a concern. Two agents may extract contradictory facts from different conversations with the same user. memorg does not currently handle multi-writer scenarios.

Conclusion

Persistent memory transforms agents from stateless tools into something closer to collaborators. The user does not need to re-establish context every session. The agent accumulates knowledge about the user’s projects, preferences, and workflows. Over time, the agent becomes more useful — not because the underlying model improves, but because the memory layer provides increasingly relevant context.

memorg is our attempt to build this layer with the right abstractions: typed memory categories, hybrid retrieval, explicit formation and forgetting mechanisms. It is a research project, not a finished product, and the open questions outnumber the answered ones. But the core proposition — that memory and context are distinct concerns requiring distinct architectures — is one we are confident in.

The code is available at github.com/Skelf-Research/memorg.