zviz vs gVisor vs Firecracker: Choosing a Sandbox for AI-Generated Code
A practical comparison of zviz, gVisor, and Firecracker for sandboxing untrusted AI-generated code — performance, security, and operational trade-offs.
The question
Should I sandbox untrusted AI-generated code with zviz, gVisor, Firecracker, or WASM?
Sandboxing is the standard defence for code you cannot trust. LLM agents that write and execute code make this question acute. This post is the comparison we wish we had when we started zviz.
The 60-second version: gVisor is the heavy userspace kernel; Firecracker is the microVM; WASM is portable but constrained; zviz is the minimal-overhead gVisor-inspired sandbox in pure Zig. The right choice depends on the threat model, the throughput you need, and how much operational complexity you can absorb.
What each option is
zviz is a sandbox for untrusted code in pure Zig, gVisor-inspired but with a much smaller surface and a near-zero runtime cost. It intercepts syscalls at the user-kernel boundary, validates them against a policy, and routes I/O through a proxy. No kernel module, no host kernel fork, no per-VM overhead.
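To make "validates them against a policy" concrete, here is a minimal Python model of a default-deny syscall allowlist. It is an illustration only: zviz itself is written in Zig, and the names `Rule`, `Policy`, and `check`, as well as the three example rules, are assumptions made for this sketch rather than zviz's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

ALLOW, DENY = "allow", "deny"


@dataclass(frozen=True)
class Rule:
    # Optional predicate over the raw syscall arguments.
    # None means the syscall is allowed regardless of its arguments.
    arg_check: Optional[Callable[[Tuple[int, ...]], bool]] = None


@dataclass
class Policy:
    # Maps a syscall name to its rule (hypothetical structure).
    rules: Dict[str, Rule] = field(default_factory=dict)

    def check(self, name: str, args: Tuple[int, ...] = ()) -> str:
        rule = self.rules.get(name)
        if rule is None:
            return DENY  # default-deny: anything not listed is blocked
        if rule.arg_check is not None and not rule.arg_check(args):
            return DENY  # listed, but the arguments fail the check
        return ALLOW


# Illustrative policy, not a real zviz policy.
policy = Policy(rules={
    "read": Rule(),
    # Allow write() only to stdout (fd 1) and stderr (fd 2).
    "write": Rule(arg_check=lambda a: len(a) > 0 and a[0] in (1, 2)),
    "exit_group": Rule(),
})

print(policy.check("write", (1, 0, 5)))  # fd 1 is permitted by the rule
print(policy.check("ptrace"))            # not listed, so denied
```

The design choice worth noting is the default-deny branch: a syscall that is missing from the table is rejected, so forgetting a rule fails closed rather than open.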
gVisor is a userspace kernel written in Go. Its Sentry component intercepts syscalls from the sandboxed process and re-implements them in userspace. It is mature, used at scale (Google Cloud Run, GKE Sandbox), and has a strong security track record. The cost is significant CPU overhead: typically a 10-30% slowdown for the sandboxed workload.
Firecracker is a microVM monitor from AWS. It runs a real Linux kernel inside a KVM-based VM with a minimal device model. The security model is hardware-assisted isolation. Cold-start is fast (~125ms) but the per-VM memory overhead is real (50-100MB minimum).
WASM is a portable bytecode format with a strong sandbox model: the guest can reach the host only through the imports the host explicitly grants. The cost is the language constraint: a WASM runtime executes only code compiled to WASM, and many libraries do not have WASM builds.
The dimensions
| Dimension | zviz | gVisor | Firecracker | WASM |
|---|---|---|---|---|
| Architecture | Syscall interceptor | Userspace kernel | Hardware-assisted microVM | Bytecode sandbox |
| Overhead per execution | < 5% | 10-30% | 50-100MB RAM + cold start | 5-15% (with WASI) |
| Cold start | < 1ms | ~50ms | ~125ms | < 1ms |
| Language support | Any native binary | Any native binary | Any native binary | WASM only |
| Security model | Syscall policy | Syscall re-implementation | Hardware isolation | Capability-based |
| Maturity | Early (0.x) | Production | Production | Production |
| Operational complexity | Low | Medium | Medium | Low |
| License | MIT | Apache-2.0 | Apache-2.0 | Apache-2.0 (most) |
| Best for | Many short-lived sandboxes | Long-running workloads with high security needs | Multi-tenant serverless | Web / browser / untrusted compute |
| AI code execution | Yes | Yes | Yes | Only if compiled to WASM |
| Implementation language | Pure Zig | Go | Rust | Runtime-dependent (guest code in Rust / AssemblyScript / C) |
When to use which
Use zviz when:
- You are running many untrusted code executions per second and the per-execution cost of a VM or a userspace kernel is killing you.
- The untrusted code is short-lived (sub-second to a few seconds) — agent tool calls, sandboxed code snippets, test execution.
- You want a single-binary, no-daemon sandbox that you can ship in your agent runtime.
- The threat model is “untrusted code trying to escape” — zviz blocks the escape vectors at the syscall layer.
Use gVisor when:
- The untrusted workload is long-running and CPU-bound, and the 10-30% overhead is acceptable for the security guarantee.
- You are running an untrusted language runtime (Python, Node, Ruby) and you want the full POSIX surface available.
- You have an operations team that can manage the runsc runtime integration and its configuration.
Use Firecracker when:
- You want hardware-assisted isolation for a long-running workload (serverless function, multi-tenant container).
- Cold start of ~125ms is acceptable.
- You have 50-100MB of RAM per sandbox to spend.
Use WASM when:
- You control the toolchain (the compiler or the interpreter build) and can target WASM.
- The threat model is the strictest possible: the guest gets no host access beyond the imports you grant.
- You are running in a browser or a web-edge environment.
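The rules of thumb above can be written down as a small decision helper. This is a sketch of one possible ordering of the criteria, not a definitive procedure: the `Workload` fields, the `suggest_sandbox` function, and the 5-second cut-off for "short-lived" are all assumptions introduced here.

```python
from dataclasses import dataclass

# Assumed cut-off for "short-lived"; the post only says
# "sub-second to a few seconds".
SHORT_LIVED_MAX_SECONDS = 5.0


@dataclass(frozen=True)
class Workload:
    runtime_seconds: float                  # typical duration of one execution
    can_target_wasm: bool = False           # you control the toolchain
    needs_hardware_isolation: bool = False  # VM boundary required
    needs_full_posix: bool = False          # e.g. a full language runtime


def suggest_sandbox(w: Workload) -> str:
    """Return a first guess; always validate against your own threat model."""
    if w.can_target_wasm:
        return "WASM"
    if w.needs_hardware_isolation:
        return "Firecracker"
    if w.needs_full_posix or w.runtime_seconds > SHORT_LIVED_MAX_SECONDS:
        return "gVisor"
    return "zviz"


# An 80 ms agent tool call with a small syscall surface.
print(suggest_sandbox(Workload(runtime_seconds=0.08)))
```

The order of the checks encodes a priority (WASM first, then hardware isolation, then POSIX coverage), which is itself a judgment call; reorder it if your constraints rank differently.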
A concrete example: agent tool calls
Say you are building an LLM agent that writes Python code to answer questions about a user’s data. The agent generates ~50 Python snippets per conversation; each snippet runs in a sandbox; the result comes back.
| Sandbox | Time per snippet | Memory peak | Setup cost |
|---|---|---|---|
| zviz | 80ms | 12MB | 1 binary, no daemon |
| gVisor | 250ms (incl. ~50ms cold start) | 80MB | runsc (OCI runtime), optional KVM platform |
| Firecracker | 200ms (incl. ~125ms cold start) | 100MB | KVM, firecracker, jailer |
| WASM | 50ms (no Python interpreter in WASM by default) | 8MB | WASI runtime |
For this workload, zviz is the right answer. Firecracker is overkill (the workload is too short to amortise the cold start). gVisor is the right answer if you also want to run untrusted Jupyter notebooks. WASM is the right answer if you control the interpreter and can compile it to WASM.
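The per-snippet figures add up quickly over a conversation. The short calculation below multiplies the table's numbers by the ~50 snippets per conversation and, where the table states a cold-start component, shows what share of each run it accounts for. It assumes a fresh sandbox per snippet and strictly serial execution, which are simplifications of this sketch rather than claims from the measurements.

```python
SNIPPETS_PER_CONVERSATION = 50

# (time per snippet in ms, cold-start portion in ms or None if not stated)
# Values are copied from the table above.
FIGURES = {
    "zviz": (80, None),
    "gVisor": (250, 50),
    "Firecracker": (200, 125),
    "WASM": (50, None),
}


def per_conversation_seconds(ms_per_snippet, snippets=SNIPPETS_PER_CONVERSATION):
    """Total sandbox time for one conversation, in seconds."""
    return ms_per_snippet * snippets / 1000


def cold_start_share(ms_per_snippet, cold_start_ms):
    """Fraction of each run spent on cold start."""
    return cold_start_ms / ms_per_snippet


for name, (ms, cold) in FIGURES.items():
    total = per_conversation_seconds(ms)
    share = f"{cold_start_share(ms, cold):.1%}" if cold is not None else "n/a"
    print(f"{name:<12} {total:5.1f} s per conversation, cold-start share {share}")
```

With these inputs the totals are 4.0 s for zviz, 12.5 s for gVisor, 10.0 s for Firecracker, and 2.5 s for WASM. Cold start is 125 of Firecracker's 200 ms, which is the sense in which the workload is too short to amortise it.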
zviz vs gVisor: the closer comparison
Both zviz and gVisor are syscall-mediated sandboxes. The differences are:
- Language: zviz is pure Zig (one binary, no dependencies); gVisor is written in Go. zviz ships inside your agent runtime; gVisor ships as runsc, a separate OCI runtime binary, and does not need a kernel module.
- Surface: gVisor aims to cover the broad Linux syscall surface (300+ syscalls); zviz implements the 30-40 syscalls that untrusted code actually uses. If your untrusted code depends on syscalls outside that set, such as io_uring or bpf(), gVisor is the better starting point, subject to its own compatibility list.
- Overhead: zviz adds < 5% because the syscall checks are tight policy lookups; gVisor adds 10-30% because every intercepted syscall is re-implemented in userspace.
- Auditability: zviz’s policy is a single Zig struct you can read; gVisor’s behaviour is spread across runsc flags and the Sentry’s Go source.
If your untrusted code is agent-written and the syscall surface is small, zviz is the right answer. If you are running an untrusted language runtime, gVisor is the right answer.
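One way to test whether "the syscall surface is small" holds for your workload is to record the syscalls it actually makes (for example with a tracer such as strace) and compare them against the sandbox's supported set. The sketch below does the comparison only; the `SUPPORTED` set is a placeholder invented for illustration and is not zviz's real syscall list.

```python
# Placeholder allowlist for illustration; NOT the actual zviz syscall set.
SUPPORTED = frozenset({
    "read", "write", "openat", "close", "fstat", "lseek",
    "mmap", "munmap", "mprotect", "brk",
    "rt_sigaction", "rt_sigprocmask", "clock_gettime",
    "getpid", "exit", "exit_group",
})


def unsupported_syscalls(observed, supported=SUPPORTED):
    """Return the observed syscall names that fall outside the supported set."""
    return sorted(set(observed) - supported)


def fits_small_surface(observed, supported=SUPPORTED):
    """True if every observed syscall is covered by the supported set."""
    return not unsupported_syscalls(observed, supported)


# Syscall names as they might be collected from a trace of one snippet run.
trace = ["openat", "read", "read", "write", "close", "io_uring_setup"]

print(fits_small_surface(trace))    # False: one call is outside the set
print(unsupported_syscalls(trace))  # ['io_uring_setup']
```

An empty result suggests the workload is a candidate for a narrow-surface sandbox; a non-empty one tells you exactly which calls would push you toward a fuller implementation.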
What to read next
- Sandboxing Untrusted Code in Zig: The zviz Architecture — the full zviz architecture post
- Why We Write AI Infrastructure in Rust (and Zig, and Go) — language choice for AI infrastructure
- zviz repository
- gVisor documentation
- Firecracker documentation