zviz vs gVisor vs Firecracker: Choosing a Sandbox for AI-Generated Code

The question

Should I sandbox untrusted AI-generated code with zviz, gVisor, Firecracker, or WASM?

Sandboxing is the standard defence for code you cannot trust. LLM agents that write and execute code make this question acute. This post is the comparison we wish we had when we started zviz.

The 60-second version: gVisor is the heavy userspace kernel; Firecracker is the microVM; WASM is portable but constrained; zviz is the minimal-overhead gVisor-inspired sandbox in pure Zig. The right choice depends on the threat model, the throughput you need, and how much operational complexity you can absorb.

What each option is

zviz is a sandbox for untrusted code, written in pure Zig. It is gVisor-inspired but exposes a much smaller syscall surface and adds little runtime overhead. It intercepts syscalls at the user-kernel boundary, validates each one against a policy, and routes I/O through a proxy. There is no kernel module, no host-kernel fork, and no per-VM overhead.
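zviz’s real policy format and interception mechanism are not described here, so what follows is only a mental model, written in Python rather than Zig: a default-deny table that maps each syscall to allow, proxy, or deny. The `Verdict` values, the `Policy` class, and the specific rules are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"  # pass the syscall straight to the host kernel
    PROXY = "proxy"  # hand it to an I/O proxy that decides what to touch
    DENY = "deny"    # fail the syscall (or kill the sandboxed process)


@dataclass(frozen=True)
class Policy:
    # Hypothetical policy: syscall name -> verdict.
    rules: dict[str, Verdict] = field(default_factory=dict)

    def check(self, syscall: str) -> Verdict:
        # Default-deny: a syscall the policy does not mention never reaches the kernel.
        return self.rules.get(syscall, Verdict.DENY)


AGENT_POLICY = Policy(rules={
    "read": Verdict.ALLOW,
    "write": Verdict.ALLOW,
    "brk": Verdict.ALLOW,
    "mmap": Verdict.ALLOW,
    "exit_group": Verdict.ALLOW,
    "openat": Verdict.PROXY,   # file access is mediated, not passed through
    "connect": Verdict.PROXY,  # so is network access
})

if __name__ == "__main__":
    for name in ("read", "openat", "ptrace"):
        print(name, AGENT_POLICY.check(name).value)
```

The point of the default-deny shape is that the check is a single lookup per syscall, and anything the policy author did not think about fails closed.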

gVisor is a userspace kernel written in Go. Its core, the Sentry, intercepts every syscall from the sandboxed process and re-implements it in userspace, so the host kernel only sees a narrow, filtered set of calls from the Sentry itself. It is mature, used at scale (Google Cloud Run, GKE Sandbox), and has a strong security track record. The cost is significant overhead on syscall- and I/O-heavy workloads, typically a 10-30% slowdown; pure computation runs close to native speed.
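To make the contrast with a validate-and-forward design concrete, here is a toy Python model of the userspace-kernel idea. It is not gVisor code and omits nearly everything a real Sentry does (threads, signals, memory management, the platform layer); the class and method names are invented for illustration.

```python
import errno


class ToyUserspaceKernel:
    """Toy model of a userspace kernel: syscalls are serviced from the
    sandbox's own state instead of being passed through to the host."""

    def __init__(self, files):
        self.files = dict(files)  # in-memory filesystem: path -> bytes
        self.fds = {}             # fd -> [path, offset]
        self.next_fd = 3          # 0-2 are stdio, as on Linux

    def syscall(self, name, *args):
        handler = getattr(self, "sys_" + name, None)
        if handler is None:
            # Unimplemented calls fail inside the sandbox; the host never sees them.
            return -errno.ENOSYS
        return handler(*args)

    def sys_getpid(self):
        return 1  # the sandbox sees its own PID space, not the host's

    def sys_openat(self, path):  # simplified: no dirfd or flags
        if path not in self.files:
            return -errno.ENOENT
        fd = self.next_fd
        self.next_fd += 1
        self.fds[fd] = [path, 0]
        return fd

    def sys_read(self, fd, count):
        # Simplified: returns the bytes read instead of filling a caller's buffer.
        if fd not in self.fds:
            return -errno.EBADF
        path, offset = self.fds[fd]
        data = self.files[path][offset:offset + count]
        self.fds[fd][1] = offset + len(data)
        return data


if __name__ == "__main__":
    k = ToyUserspaceKernel({"/etc/hostname": b"sandbox\n"})
    fd = k.syscall("openat", "/etc/hostname")
    print(k.syscall("getpid"), fd, k.syscall("read", fd, 64))
```

Every call here does real work in the sandbox’s own address space, which is where a userspace kernel’s overhead comes from, and also why a bug in a handler does not directly hand the guest a host-kernel code path.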

Firecracker is a microVM monitor from AWS, written in Rust. It runs a real Linux kernel inside a KVM-based VM with a minimal device model, so isolation rests on hardware virtualization. Cold start is fast (~125ms), but memory per sandbox adds up: the VMM process itself is small, yet every guest boots its own kernel and needs its own RAM, so budget 50-100MB per VM.
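Firecracker can boot a microVM from a JSON file passed with `--config-file`. The sketch below builds such a file from Python. The key names follow Firecracker’s documented configuration format; the kernel and rootfs paths, the 128 MiB default, and the `firecracker_config` helper itself are placeholders.

```python
import json


def firecracker_config(kernel_path, rootfs_path, mem_mib=128, vcpus=1):
    """Build a Firecracker --config-file document for a single microVM."""
    return {
        "boot-source": {
            "kernel_image_path": kernel_path,
            # Serial console, reboot on panic, no PCI: a common minimal guest setup.
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        },
        "drives": [{
            "drive_id": "rootfs",
            "path_on_host": rootfs_path,
            "is_root_device": True,
            "is_read_only": False,
        }],
        # Caps guest RAM; the guest kernel and userspace, not the VMM,
        # account for most of the per-sandbox memory footprint.
        "machine-config": {"vcpu_count": vcpus, "mem_size_mib": mem_mib},
    }


if __name__ == "__main__":
    print(json.dumps(firecracker_config("vmlinux.bin", "rootfs.ext4"), indent=2))
```

Saved as `vm_config.json`, the output can be passed as `firecracker --api-sock /tmp/firecracker.socket --config-file vm_config.json`. `mem_size_mib` is the knob that decides how many sandboxes fit on a host.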

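WASM is a portable bytecode format with a strong sandbox model: guest code runs in its own linear memory and can reach the host only through functions the host explicitly imports (for example, WASI capabilities such as preopened directories). The cost is the language constraint: WASM runs only code compiled to WASM, and many libraries do not have WASM builds.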
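WASI’s capability model is easiest to see in file access: a guest can only open paths under directories the host preopened for it. The function below is a simplified model of that resolution step, not any runtime’s actual code. Real runtimes resolve paths against directory handles and must also deal with symlinks, which this ignores. The `preopens` mapping and all paths are hypothetical.

```python
import posixpath


def resolve_guest_path(preopens, guest_path):
    """Map a guest path to a host path using only directories the host granted.

    preopens: guest mount point -> host directory (the granted capabilities).
    Returns the host path, or None if the guest holds no capability for it.
    """
    # Normalise first so "/data/../etc/passwd" cannot climb out of a grant.
    norm = posixpath.normpath(posixpath.join("/", guest_path))
    for guest_root, host_root in preopens.items():
        if norm == guest_root or norm.startswith(guest_root.rstrip("/") + "/"):
            rel = posixpath.relpath(norm, guest_root)
            return host_root if rel == "." else posixpath.join(host_root, rel)
    return None  # no capability: the host filesystem is simply not addressable


if __name__ == "__main__":
    grants = {"/data": "/srv/sandbox-42/data"}
    print(resolve_guest_path(grants, "/data/report.csv"))
    print(resolve_guest_path(grants, "/data/../etc/passwd"))
```

The design choice worth noting is that there is no deny list: anything not reachable from a granted directory has no name inside the guest at all.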

The five dimensions

Dimension	zviz	gVisor	Firecracker	WASM
Architecture	Syscall interceptor	Userspace kernel	Hardware-assisted microVM	Bytecode sandbox
Overhead per execution	< 5%	10-30%	50-100MB RAM + cold start	5-15% (with WASI)
Cold start	< 1ms	~50ms	~125ms	< 1ms
Language support	Native Linux binaries within its syscall subset	Most Linux binaries	Anything the guest Linux kernel can run	WASM only
Security model	Syscall policy	Syscall re-implementation	Hardware isolation	Capability-based
Maturity	Early (0.x)	Production	Production	Production
Operational complexity	Low	Medium	Medium	Low
License	MIT	Apache-2.0	Apache-2.0	Varies by runtime (Wasmtime: Apache-2.0 with LLVM exception)
Best for	Many short-lived sandboxes	Long-running workloads with high security needs	Multi-tenant serverless	Web / browser / untrusted compute
AI code execution	Yes	Yes	Yes	Only if compiled to WASM
Implementation language	Pure Zig	Go	Rust	Varies by runtime (Wasmtime: Rust; V8: C++)

When to use which

Use zviz when:

You are running many untrusted code executions per second and the per-execution cost of a VM or a userspace kernel is killing you.
The untrusted code is short-lived (sub-second to a few seconds) — agent tool calls, sandboxed code snippets, test execution.
You want a single-binary, no-daemon sandbox that you can ship in your agent runtime.
The threat model is “untrusted code trying to escape”: zviz narrows the escape surface to the syscalls its policy allows, though anything it allows still runs on the host kernel.

Use gVisor when:

The untrusted workload is long-running, and the overhead (concentrated in syscall- and I/O-heavy code) is acceptable for the security guarantee.
You are running arbitrary programs on a language runtime (Python, Node, Ruby) and need most of the Linux syscall surface available.
You have an operations team that can run runsc as an OCI runtime under Docker or containerd and maintain its configuration.

Use Firecracker when:

You want hardware-assisted isolation for a long-running workload (serverless function, multi-tenant container).
Cold start of ~125ms is acceptable.
You have 50-100MB of RAM per sandbox to spend.

Use WASM when:

You control the toolchain (you compile the interpreter or the generated code yourself) and can target WASM.
The threat model calls for the strictest isolation: the guest can reach only the host functions you explicitly grant.
You are running in a browser or a web-edge environment.
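The criteria above can be condensed into a rough decision function. The field names, the order of the checks, and the thresholds (5 seconds as the upper bound for “short-lived”, 50MB as the low end of the Firecracker RAM budget listed above) are illustrative assumptions, not rules.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    typical_runtime_s: float        # how long one execution lives
    can_target_wasm: bool           # you control the toolchain and can compile to WASM
    needs_broad_syscalls: bool      # arbitrary packages, subprocesses, full runtime
    needs_hardware_isolation: bool  # e.g. multi-tenant serverless
    ram_budget_mb: float            # RAM you can spend per sandbox


SHORT_LIVED_S = 5            # assumption for "sub-second to a few seconds"
MIN_FIRECRACKER_RAM_MB = 50  # low end of the 50-100MB per-VM budget


def choose_sandbox(w: Workload) -> str:
    # One possible ordering of the criteria; real decisions weigh them together.
    if w.can_target_wasm:
        return "WASM"
    if w.needs_hardware_isolation and w.ram_budget_mb >= MIN_FIRECRACKER_RAM_MB:
        return "Firecracker"
    if w.needs_broad_syscalls or w.typical_runtime_s > SHORT_LIVED_S:
        return "gVisor"
    return "zviz"


if __name__ == "__main__":
    agent_snippet = Workload(0.08, False, False, False, 16)
    print(choose_sandbox(agent_snippet))
```

A workload that wants hardware isolation but cannot afford the RAM falls through to the syscall-based options, which mirrors the trade-off described above.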

A concrete example: agent tool calls

Say you are building an LLM agent that writes Python code to answer questions about a user’s data. The agent generates ~50 Python snippets per conversation; each snippet runs in a sandbox; the result comes back.

Sandbox	Time per snippet	Memory peak	Setup cost
zviz	80ms	12MB	1 binary, no daemon
gVisor	250ms (incl. ~50ms cold start)	80MB	runsc, runtime configuration (KVM platform optional)
Firecracker	200ms (incl. ~125ms cold start)	100MB	KVM, firecracker, jailer
WASM	50ms (needs a Python interpreter compiled to WASM; none by default)	8MB	WASI runtime

For this workload, zviz is the right answer. Firecracker is overkill: with ~125ms of cold start inside a 200ms snippet, startup dominates, and a workload this short cannot amortise it. gVisor is the better choice if you also want to run untrusted Jupyter notebooks, and WASM is the better choice if you control the interpreter and can compile it to WASM.
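The amortisation argument is easy to check against the per-snippet table above. This sketch multiplies the table’s figures by 50 snippets per conversation and computes what share of each snippet is cold start. Treating zviz’s “< 1ms” cold start as zero is an assumption, and WASM is left out because its figure depends on having a WASM build of the Python interpreter.

```python
SNIPPETS_PER_CONVERSATION = 50

# From the per-snippet table: (time per snippet in ms, of which cold start in ms).
# zviz's "< 1ms" cold start is treated as 0 here.
PER_SNIPPET = {
    "zviz": (80, 0),
    "gVisor": (250, 50),
    "Firecracker": (200, 125),
}


def conversation_cost(per_snippet_ms, cold_start_ms, n=SNIPPETS_PER_CONVERSATION):
    """Return (seconds per conversation, fraction of each snippet spent in cold start)."""
    total_s = n * per_snippet_ms / 1000
    startup_share = cold_start_ms / per_snippet_ms
    return total_s, startup_share


if __name__ == "__main__":
    for name, (ms, cold) in PER_SNIPPET.items():
        total_s, share = conversation_cost(ms, cold)
        print(f"{name:12s} {total_s:5.1f} s per conversation, {share:.1%} cold start")
```

With these inputs a 50-snippet conversation spends 4 s in zviz, 12.5 s in gVisor and 10 s in Firecracker, and cold start accounts for 62.5% of every Firecracker snippet.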

zviz vs gVisor: the closer comparison

Both zviz and gVisor are syscall-mediated sandboxes. The differences are:

Language and packaging: zviz is pure Zig (one binary, no dependencies) and ships inside your agent; gVisor is written in Go and ships as runsc, an OCI runtime you plug into Docker or containerd, with no kernel module.
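Surface: gVisor implements most of the Linux syscall ABI (hundreds of syscalls); zviz implements the 30-40 syscalls that short-lived untrusted code typically uses. If your untrusted code needs syscalls outside that subset, gVisor is your answer; check its syscall compatibility reference for the specific calls you rely on.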
Overhead: zviz adds < 5% because each syscall check is a tight policy lookup before the host kernel handles the call; gVisor adds 10-30% on syscall-heavy code because the Sentry services every syscall in userspace instead of letting the host kernel handle it directly.
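Auditability: zviz’s policy is a single Zig struct you can read; gVisor has no single policy file, since its isolation comes from the Sentry plus a host seccomp filter built into runsc, and you tune it through runsc flags and your container runtime’s configuration.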
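Whether agent code stays inside a 30-40 syscall subset is something you can measure rather than assume. One approach, sketched below, is to record a representative run with `strace -f -o trace.txt <command>` and diff the syscall names against an allowlist. The allowlist is yours to supply (zviz’s actual list is not reproduced here), and the parser assumes strace ran without timestamp options such as `-tt`.

```python
import re
from collections import Counter

# Syscall name at the start of an strace line, with or without the PID prefix
# that strace -f adds ("1234  openat(..." or "[pid  1234] openat(...").
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def syscalls_used(strace_lines):
    """Count syscall names in strace output lines."""
    counts = Counter()
    for line in strace_lines:
        m = SYSCALL_RE.match(line)
        if m:  # skips "+++ exited", "--- SIG...", and "<... resumed>" lines
            counts[m.group(1)] += 1
    return counts


def outside_allowlist(strace_lines, allowlist):
    """Sorted syscall names the trace used that the allowlist does not cover."""
    return sorted(set(syscalls_used(strace_lines)) - set(allowlist))
```

Passing an open file works directly, for example `outside_allowlist(open("trace.txt"), my_allowlist)`; an empty result means the recorded run fit the subset, though a single trace cannot prove that every code path will.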

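If your untrusted code is agent-written and stays within a small syscall surface (short snippets on a trusted interpreter, as in the example above), zviz is the right answer. If the code needs the broad surface of an unrestricted language runtime (arbitrary packages, subprocesses), gVisor is the right answer.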