Ephemeral Credentials and Zero-Trust AI: Rethinking API Security
Why AI agents need scoped, time-limited credentials — and how perishable implements zero-trust patterns for LLM API access.
Somewhere in your infrastructure, an AI agent holds an API key. It might be an OpenAI key, an Anthropic key, a database connection string, or a cloud service credential. It was provisioned months ago, probably by copying it into an environment variable or a secrets manager. It has broad permissions — more than the agent needs for any single task, because scoping permissions precisely was too much friction at the time. It does not expire. Nobody has audited its usage in weeks.
This is the default state of credential management in AI systems today, and it is a security incident waiting to happen.
The Problem With Long-Lived API Keys
Traditional API key management was designed for a world where software was relatively static. A server application gets a database credential at deployment time and uses it for the lifetime of the process. The trust model is simple: the application is trusted, the credential grants access, and access is binary — you either have it or you do not.
AI agents break every assumption in this model.
Agents are dynamic. An agent’s behaviour is determined at runtime by the combination of its instructions, the user’s input, and the model’s interpretation. You cannot predict at deployment time what resources an agent will need to access or what operations it will perform. A coding agent might need to read a file, execute a shell command, call an API, and create a pull request — all within a single user interaction, with the specific sequence determined by an LLM.
Agents spawn sub-agents. Modern agentic architectures involve delegation. A planning agent dispatches tasks to specialised sub-agents, each of which may need its own set of credentials. If the parent agent passes its own credential to the child, you have credential sharing with no audit trail. If the child agent creates its own credential, you need a provisioning mechanism that operates at the speed of agent execution — milliseconds, not minutes.
Agents are exposed to adversarial input. Every agent that processes user input or external data is subject to prompt injection. A compromised agent with a long-lived, broadly-scoped credential is not just a malfunctioning program — it is a credential theft vector. The blast radius of a successful injection attack is bounded by the permissions of the credential the agent holds.
Agents are hard to audit. When a traditional application makes an API call, the call is predictable and the audit trail is straightforward. When an LLM agent makes an API call, the decision to make that call was made by a neural network operating on natural language input. Understanding why a particular call was made requires reconstructing the agent’s reasoning, not just inspecting the call log.
The Zero-Trust Response
Zero-trust architecture is built on a simple principle: never trust, always verify. Applied to AI credential management, this translates to four concrete requirements.
Time-Limited Tokens
Every credential issued to an agent should have an expiration time proportional to the task it needs to perform. A summarisation agent that processes a single document needs a credential that lives for seconds, not days. If the agent is compromised after the task completes, the credential is already dead.
perishable implements this as a token proxy. The agent does not receive the underlying API key. Instead, it receives a proxy token that is valid for a configurable duration — thirty seconds, five minutes, one hour, depending on the expected task length. The proxy token maps to the real credential on the server side, and the mapping is destroyed when the token expires.
```
Agent requests token:  { scope: "llm:complete", ttl: 60 }
Proxy issues token:    { token: "prs_a8f3...", expires: "2026-03-25T14:01:00Z" }
Agent uses token:      POST /v1/completions   Authorization: Bearer prs_a8f3...
Proxy forwards:        POST /v1/completions   Authorization: Bearer sk-real-key...
Token expires:         prs_a8f3... is now invalid, regardless of outcome
```
The critical property: the real API key never leaves the proxy. The agent operates with a derivative credential that is useless after expiration.
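The proxy-side mapping can be sketched in a few lines. This is a minimal illustration, assuming an in-memory store; the names (`TokenStore`, `issue`, `resolve`) are illustrative, not perishable's actual API.

```python
import secrets
import time

class TokenStore:
    """Minimal sketch of a proxy token store (illustrative, not perishable's API)."""

    def __init__(self):
        self._tokens = {}  # proxy token -> (real_key, scope, expiry)

    def issue(self, real_key: str, scope: str, ttl: float) -> str:
        """Mint a proxy token that maps to the real key for ttl seconds."""
        token = "prs_" + secrets.token_hex(16)
        self._tokens[token] = (real_key, scope, time.monotonic() + ttl)
        return token

    def resolve(self, token: str):
        """Return (real_key, scope) while the token is live, else None."""
        entry = self._tokens.get(token)
        if entry is None:
            return None
        real_key, scope, expiry = entry
        if time.monotonic() >= expiry:
            del self._tokens[token]  # expired mappings are destroyed
            return None
        return real_key, scope
```

The agent only ever sees the `prs_` token; once `resolve` returns None, the mapping is gone, and the real key was never exposed to the agent at any point.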
Scoped Permissions
A token should grant the minimum permissions necessary for the task at hand. An agent that needs to generate text completions should not also have permission to fine-tune models, list billing information, or delete resources.
perishable implements scoping at the proxy level. Each token is issued with an explicit scope declaration that restricts which API endpoints the token can access and what operations it can perform. The scope is enforced by the proxy before the request is forwarded to the underlying API.
```yaml
scopes:
  llm:complete:
    endpoints: ["/v1/completions", "/v1/chat/completions"]
    methods: ["POST"]
    rate_limit: 100/minute
  llm:embed:
    endpoints: ["/v1/embeddings"]
    methods: ["POST"]
    rate_limit: 500/minute
  storage:read:
    endpoints: ["/v1/files/*"]
    methods: ["GET"]
```
A token issued with scope llm:complete cannot access the embeddings endpoint, cannot read files, and cannot perform any operation outside its declared scope — even if the underlying API key has full permissions. The scope is a ceiling, not a floor.
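Enforcement at the proxy can be as simple as a lookup against the parsed config before any request is forwarded. A sketch, with the scope table hard-coded rather than loaded from YAML, and rate limits omitted:

```python
import fnmatch

# Scope rules mirroring the config above; in a real deployment these
# would be parsed from the YAML scope definitions.
SCOPES = {
    "llm:complete": {
        "endpoints": ["/v1/completions", "/v1/chat/completions"],
        "methods": ["POST"],
    },
    "storage:read": {
        "endpoints": ["/v1/files/*"],
        "methods": ["GET"],
    },
}

def allowed(scope: str, method: str, path: str) -> bool:
    """The scope is a ceiling: anything not explicitly declared is denied."""
    rule = SCOPES.get(scope)
    if rule is None:
        return False
    if method not in rule["methods"]:
        return False
    # Endpoint patterns may contain wildcards, as in storage:read.
    return any(fnmatch.fnmatch(path, pattern) for pattern in rule["endpoints"])
```

Note that the default is denial: an unknown scope, method, or endpoint all fall through to False, which is what makes the scope a ceiling rather than a suggestion.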
Auditable Usage
Every token usage event is logged with the token identity, the requested operation, the timestamp, and the outcome. Because tokens are short-lived and scoped, the audit log has structure: you can see exactly what each agent did during its credential lifetime, what it attempted to do outside its scope (denied requests are logged), and when the credential expired.
This is substantially more useful than auditing a long-lived API key. When a single key is shared across multiple agents and processes, the audit log is a flat stream of requests with no way to attribute specific calls to specific agents. When each agent operation gets its own scoped token, the log naturally segments by agent, by task, and by time window.
perishable’s audit log includes:
- Token lifecycle events: issued, used, expired, revoked
- Request details: endpoint, method, response status, latency
- Scope enforcement: which requests were allowed, which were denied and why
- Token lineage: which parent token (if any) authorised the creation of this token
The lineage tracking is particularly important for agent hierarchies. When a planning agent delegates to sub-agents and each sub-agent receives its own token, the audit trail preserves the delegation chain. You can trace a specific API call back through the sub-agent, the parent agent, and the original user request.
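Reconstructing a delegation chain from such a log is a walk up the parent pointers. A sketch with hypothetical field names (`token`, `parent`, `issued_to`), not perishable's actual log schema:

```python
# Hypothetical token-issuance records for a planner that delegated
# to a search agent, which delegated to a summariser.
ISSUED = [
    {"token": "prs_root", "parent": None,       "issued_to": "planner"},
    {"token": "prs_sub1", "parent": "prs_root", "issued_to": "search-agent"},
    {"token": "prs_sub2", "parent": "prs_sub1", "issued_to": "summariser"},
]

def lineage(token: str) -> list:
    """Walk parent pointers from a token back to the original issuer."""
    by_token = {entry["token"]: entry for entry in ISSUED}
    chain = []
    while token is not None:
        entry = by_token[token]
        chain.append(entry["issued_to"])
        token = entry["parent"]
    return chain
```

An API call made with `prs_sub2` then traces back through the whole delegation: summariser, search agent, planner, and from there to the user request that started it.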
Instant Revocation
If an agent is compromised or behaving unexpectedly, its credential can be revoked immediately. This is trivial with proxy tokens — the proxy simply deletes the token mapping — but it is slow and disruptive with traditional API keys, where revocation means rotating the key and updating every system that depends on it.
perishable supports both individual token revocation and bulk revocation by scope, by issuing agent, or by time window. If you detect anomalous behaviour from a particular agent, you can revoke all tokens issued to that agent in a single operation, without affecting any other agent’s credentials.
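Because every live token is just an entry in the proxy's mapping, bulk revocation reduces to a filtered delete. A sketch over an in-memory token table, with illustrative record fields:

```python
def revoke_where(tokens: dict, **criteria) -> int:
    """Delete every token whose record matches all given criteria
    (e.g. agent="scraper"); returns the number of tokens revoked."""
    doomed = [tok for tok, record in tokens.items()
              if all(record.get(key) == value for key, value in criteria.items())]
    for tok in doomed:
        del tokens[tok]  # deleting the mapping invalidates the token instantly
    return len(doomed)
```

The same function covers revocation by agent, by scope, or by any other recorded attribute, and tokens outside the criteria are untouched.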
Comparison With Traditional Approaches
API key management in AI systems today typically follows one of three patterns:
Environment variables. The API key is set as an environment variable and read by the application at startup. Every process in the environment has access. There is no scoping, no expiration, and no per-request auditing.
Secrets managers. The API key is stored in a service like AWS Secrets Manager or HashiCorp Vault. The application retrieves it at startup or on demand. This is better — the key is encrypted at rest and access is logged — but the retrieved key is still long-lived, broadly scoped, and shared across requests.
Key rotation. The API key is rotated on a schedule — daily, weekly, monthly. This limits the window of exposure if a key is leaked, but does not address scoping, per-request auditing, or the agent delegation problem. And rotation frequency is limited by the overhead of updating all dependents.
The proxy token model that perishable implements is orthogonal to all three. You still store your real API keys in a secrets manager. You still rotate them on a schedule. But no agent ever touches the real key. Agents operate exclusively through proxy tokens that are scoped, time-limited, auditable, and revocable.
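The layering shows up in what the proxy does per request: after validating the proxy token, it swaps the Authorization header before forwarding, so the real key (fetched from the secrets manager) exists only on the proxy side. A minimal sketch of that rewrite:

```python
def forward_headers(incoming: dict, real_key: str) -> dict:
    """Rewrite the agent's headers for the upstream call: the proxy
    token is replaced by the real key, which the agent never sees."""
    headers = dict(incoming)  # copy; never mutate the agent's request
    headers["Authorization"] = f"Bearer {real_key}"
    return headers
```

Everything else about the request passes through unchanged; only the credential boundary moves from the agent to the proxy.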
Why This Matters More in an Agentic World
The security properties described above are valuable for any API access pattern. But several characteristics of agentic AI systems make them urgent rather than merely desirable.
Agent autonomy means unpredictable access patterns. A traditional application accesses the same endpoints in the same order every time. An agent’s access pattern is determined by a language model at runtime. Scoped credentials bound the worst case.
Agent delegation means credential proliferation. When agents create sub-agents, credentials multiply. Without ephemeral tokens, either every sub-agent shares the parent’s credential (dangerous) or someone must manually provision credentials for each sub-agent (impractical at agent speed).
Prompt injection means credential theft is a runtime risk, not just a deployment risk. A leaked environment variable is a deployment failure. A prompt injection attack that exfiltrates a credential is a runtime attack that can happen on any user interaction. Ephemeral credentials limit the damage: a stolen token that expires in thirty seconds is a minor incident, not a breach.
Agentic systems are harder to reason about. When security properties cannot be verified by code review alone — because the system’s behaviour depends on a language model — you need defence in depth. Ephemeral, scoped credentials are one layer in that defence. They do not prevent attacks, but they bound the consequences.
Current Status and Limitations
perishable is functional but early. The current implementation supports proxying to OpenAI and Anthropic APIs, with configurable TTLs, scope definitions, and audit logging. Token storage is in-memory for now, which limits deployment to single-node setups.
Known limitations include: no support for WebSocket or streaming connections (the proxy currently buffers responses), no integration with external identity providers (tokens are issued via API key, not OAuth), and scope definitions that are currently endpoint-based rather than semantic (you scope by URL pattern, not by “this agent can summarise but not translate”).
These are engineering problems, not fundamental ones. The security model is sound. The implementation will mature. The point of publishing it now, in its current state, is to get the pattern into the conversation — because every AI agent deployed today with a long-lived API key in an environment variable is a small liability, and the liabilities are accumulating faster than most teams realise.