Background

After OpenClaw took off, one trend became obvious: many teams started shipping “OpenClaw wrappers,” and most cloud deployments defaulted to Docker-per-tenant. That choice is reasonable early on, and the engineering cognitive load is low.

But when I tried deploying Claude Code in the cloud myself, the pain surfaced quickly. If agent count keeps growing over the next 3-6 months, waste in container-based isolation becomes amplified, especially in utilization and storage duplication. In other words, what we need is not just “put agents into containers,” but a more explicit Agent-as-a-Service runtime model.

In agent systems, the common requirement is serving many tenants in parallel with environments that are executable, isolated, and reproducible.

The traditional model is “one Docker container per tenant.” It is intuitive, but under high concurrency it runs into two old problems:

  1. Idle compute: requested CPU/memory often diverges from actual load, which creates persistent under-utilization.
  2. Duplicate storage: similar runtime dependencies are repeated across tenants; image-layer reuse cannot fully eliminate per-tenant writable-layer and dependency duplication.

On the growth side, agents are already moving from experimentation to scaled production. Gartner (2025-06-25) signals that by 2028, some daily decisions will be made autonomously by agentic AI, and the share of enterprise software with agentic capabilities will increase.

If you add the rapid rise of open-source agent foundations like OpenClaw (about 329k GitHub stars as of 2026-03-22), continued growth in active cloud agent instances is a realistic planning assumption.

This post summarizes an alternative direction: Docker handles base runtime delivery, bwrap handles process-level isolation, and Nix handles deterministic dependencies. The goal is not replacing Docker, but shifting tenant instances from container granularity to process granularity.

Architecture in One Sentence

A long-running host gateway + bwrap sandbox per request/session + shared read-only Nix Store + task-time on-demand environment activation.

One scaling constraint is especially important: keep execution stateless, and persist state as files in each tenant workspace. This makes horizontal scaling mostly about request routing and workspace mapping, not moving in-memory runtime state.

This split separates “environment build cost” from “tenant isolation cost”:

  1. Build time: materialize dependencies into an immutable dependency layer (Nix Store).
  2. Runtime: create a lightweight isolation shell per tenant (bwrap + namespaces).
  3. Task time: activate declared toolchains on demand instead of pre-baking every variant into images.

Design Abstraction: Four Layers

  1. Distribution layer (Gateway)
    Authentication, routing, session mapping, lifecycle management.
  2. Isolation layer (Sandbox)
    Namespace + mount policy to construct a minimal runnable filesystem view.
  3. Dependency layer (Nix Store)
    Immutable, reusable artifacts for runtimes, build tools, and media tools.
  4. Task layer (Task Runtime)
    Each task executes in an ephemeral context activated from declarations.

Core Algorithm: How One Task Runs

A single task can be abstracted as:

  1. Resolve tenant context: locate tenant workspace and session state.
  2. Ensure workspace readiness: initialize from template on first access; otherwise reuse.
  3. Build sandbox:
    • tmpfs root;
    • mount only required read-only system paths;
    • mount tenant private writable paths;
    • mount shared read-only dependency layer;
    • inject minimal environment variables.
  4. Select network policy:
    • controlled egress when external access is needed;
    • no network otherwise.
  5. Wrap command execution with declared environment activation to keep versions/dependencies consistent.
  6. Stream output and cleanup: tie child lifecycle to parent and destroy transient layers.

Key idea: isolation is a runtime action; reproducibility is a build-time asset.

What Actually Changes vs “One Docker per Tenant”

Dimension Traditional Docker Multi-Tenant Nix + bwrap
Tenant unit Container Sandboxed process
Startup cost Start container context Spawn constrained process
Dependency reuse Coarse image layer reuse Fine-grained store object reuse
Environment consistency Depends on image/version discipline Enforced by declarations
Isolation mechanism Container boundaries Namespace + mount whitelist
Density Medium High (good for short tasks)
Debugging model Enter container and inspect Reproduce from same declaration

From a systems perspective, this is not just “which one is safer.” It moves the bottleneck from container orchestration overhead toward kernel-level process scheduling and I/O behavior.

Why This Fits Agent Workloads Better

  1. Short tasks, frequent starts
    Agent operations are often second- or minute-scale; process sandboxing matches this pattern better.
  2. Fast-changing dependency combinations
    Different tasks need different toolchains; declaration-based activation is easier to control than image matrix expansion.
  3. Reproducible debugging
    The same declaration reproduces the same environment state.
  4. High tenant density
    Shared read-only dependency layers reduce repeated downloads and storage pressure.

A Key Point: Why Declarative Environments Fit Agents

Nix is often treated as “a harder package manager,” but in agent systems it behaves more like environment state specification.

For agents, flake.nix gives two direct benefits:

  1. Readability
    The agent can inspect one declaration file and quickly infer available toolchains, version constraints, and hooks.
  2. Executability
    Tasks can run in a declared environment directly, instead of replaying procedural install chains (apt-get, shell scripts, etc.) at runtime.

Procedural scripts encode step history. Declarative files encode target state. For systems that continuously decide and self-correct, target-state descriptions are easier to analyze, validate, and reproduce.

Security Boundaries

Security here is not a single mechanism. It is boundary composition:

  1. Mount boundary: deny by default, expose by whitelist.
  2. Write boundary: read-only dependency layer, tenant-only writable workspace.
  3. Process boundary: visibility limited to sandbox-local process tree.
  4. Network boundary: shared/proxied/isolated per task policy.
  5. Resource boundary: CPU/memory/concurrency quotas at orchestrator level.

Operationally, bwrap provides process-level isolation, not complete host-level governance. Resource controls and auditability still belong to the orchestration layer.

Cost Model: Where This Architecture Spends Money

Traditional mode often spends on:

  1. Container create/destroy overhead.
  2. Image pull and cache invalidation.
  3. Repeated packaging of multi-language dependencies.

Nix + bwrap spends more on:

  1. Pre-build and cache management.
  2. Dependency declaration quality.
  3. Sandbox policy maintenance (mount/network/permission matrix).

So this pattern fits teams willing to trade upfront engineering rigor for long-term density and predictability.

A concrete example: two tenants both need Chrome.

In Docker-per-tenant mode, you often end up with duplicated installation and writable-layer cost. In Nix mode, declare Chrome once in flake.nix; Store objects and binary cache are reused. Net effect: one build/download path, lower disk duplication, and lower cold-start pressure.

A Real Debugging Story: Slow and Unstable Cold Starts (bwrap Was Not the Bottleneck)

Early in rollout we saw a confusing behavior: wrapped gemini-cli cold starts were sometimes very slow, and latency variance was huge.

Initial suspects were all reasonable:

  1. Cloud disk I/O jitter.
  2. bwrap overhead.
  3. Nix environment activation/build latency.

After elimination, none of these were primary.

The root cause was that gemini-cli checks for a ripgrep binary at startup. If missing, it triggers a network download. The check path is a fixed global path, not just normal PATH resolution.

That caused two effects:

  1. Cold-start latency became network-dependent.
  2. Latency variance became large and hard to benchmark.

The fix was to satisfy that expected path upfront (for example, mapping an existing ripgrep binary there inside the sandbox), effectively moving acquisition from runtime to build/prep time.

Main lesson: when latency is both slow and unstable, verify hidden download paths before tuning isolation or scheduling.

When This Is Not a Good Fit

  1. Teams without Linux isolation + Nix operational experience.
  2. Long-lived service workloads better served by standard container orchestration.
  3. Heavy GPU passthrough scenarios without mature isolation policy.
  4. Compliance contexts requiring pre-certified container security stacks.

Migration Strategy: From Docker to Nix + bwrap

Use staged migration instead of big-bang rewrite:

  1. Declare dependencies first: move critical runtimes from image scripts into declarative dependency layer.
  2. Then shift tenant unit: from per-tenant container to per-tenant sandboxed process.
  3. Then complete governance: quotas, audit logs, failure cleanup, and load-test baselines.

This keeps Docker’s delivery stability while gradually unlocking higher runtime density.

Conclusion

The core split is:

Docker for system delivery, Nix for deterministic dependencies, bwrap for tenant execution isolation.

When the goal is high-concurrency short tasks + multi-language toolchains + reproducible execution, this layered model is often more economical than one-container-per-tenant, as long as the team can maintain declarative dependency and sandbox policy discipline.

References

  1. Gartner (2025-06-25), Agentic AI forecast:
    https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027
  2. CAST AI (2025), Kubernetes utilization and waste (vendor sample, trend reference):
    https://cast.ai/press-release/new-kubernetes-cost-benchmark-report-reveals-persistent-cloud-waste/
  3. CNCF (2026-01-20), Annual Cloud Native Survey:
    https://www.cncf.io/reports/the-cncf-annual-cloud-native-survey/
  4. OpenClaw GitHub repository (popularity observation, time-varying):
    https://github.com/openclaw/openclaw