In April, Manus made waves in the industry by demonstrating how an Agent could autonomously write and execute code to solve complex tasks. Since then, a secure Sandbox (Code Interpreter) has become a standard configuration for any serious Agent system.
Industry giants followed suit: Alibaba launched Agent Bay (Computer/Mobile/Browser use), and specialized vendors like E2B and PPIO emerged.
This post documents our journey in selecting and implementing a sandbox architecture, from self-hosting exploration to cloud service adoption.
Phase 1: The Self-Hosting Exploration (Docker + NFS)
When we first started, the natural instinct was to build it ourselves using open-source frameworks like `agent-infra/aio`.
Our proposed architecture was standard:
- Compute: Docker containers on our own Kubernetes cluster.
- Storage: LangGraph’s Filesystem Middleware requires a local path. To bridge the gap with the remote sandbox, we planned to use NFS (Network File System) to mount shared storage across both environments.
Why We Moved On
We didn’t actually implement this in production. During the exploration phase, we realized the Ops Burden would be massive:
- Complexity: Synchronizing files via NFS across a distributed system while maintaining strict permissions is a maintenance nightmare.
- Overhead: Managing Docker images (installing new libraries on the fly), handling auto-scaling, and maintaining session state (warm-up/cleanup) was too much work for a small team. We wanted to focus on Agent reasoning, not Kubernetes plumbing.
Phase 2: Evaluating Cloud Sandboxes
By November, we realized: Why reinvent the wheel? We started evaluating managed Sandbox-as-a-Service providers.
The Contenders
- E2B:
- Pros: Industry standard, high community visibility, generous free tier ($100).
- Cons: Network latency for users in China was unacceptable.
- Novita AI:
- Pros: Stable.
- Cons: Requires upfront payment (no free trial).
- PPIO Sandbox:
- Pros: Optimized for China, supports persistent file systems, very similar API to Novita.
- Decision: We chose PPIO for its domestic network performance and compatibility with our file system needs.
Phase 3: The Storage Breakthrough (OSS Integration)
Using a cloud sandbox introduced a new challenge: File Sharing. If the Agent generates a chart inside the cloud sandbox, how do we get it out? And how does the Agent edit a file stored in our cloud?
We solved this by bridging the Agent and Sandbox file systems via OSS (Object Storage Service).
1. The “OSS Filesystem Backend” (Server-side)
We rewrote the backend for our Filesystem Middleware. Instead of operating on the local server disk, we implemented a custom backend that translates standard file operations into OSS API calls.
- `ls` → `oss_client.list_objects(prefix=...)`
- `read_file` → `oss_client.get_object(...)`
- `write_file` → `oss_client.put_object(...)`
This gives the Agent “Built-in Tools” to manipulate files in the cloud as if they were local.
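As a rough illustration, the translation layer can look like the sketch below. `OSSFilesystemBackend` and `InMemoryClient` are names invented for this post, and the client methods mirror the mapping above rather than any real OSS SDK's exact signatures:

```python
class OSSFilesystemBackend:
    """Translates the Agent's file operations into object-storage calls.

    `client` is any object exposing list_objects / get_object / put_object;
    a real deployment would wrap an OSS SDK client here.
    """

    def __init__(self, client, prefix: str):
        self.client = client
        self.prefix = prefix.rstrip("/") + "/"

    def _key(self, path: str) -> str:
        # Map an Agent-relative path onto the task's OSS prefix.
        return self.prefix + path.lstrip("/")

    def ls(self, path: str = "") -> list[str]:
        return self.client.list_objects(prefix=self._key(path))

    def read_file(self, path: str) -> bytes:
        return self.client.get_object(self._key(path))

    def write_file(self, path: str, data: bytes) -> None:
        self.client.put_object(self._key(path), data)


class InMemoryClient:
    """Stand-in for an OSS client, useful for local testing."""

    def __init__(self):
        self.objects = {}

    def list_objects(self, prefix):
        return sorted(k for k in self.objects if k.startswith(prefix))

    def get_object(self, key):
        return self.objects[key]

    def put_object(self, key, data):
        self.objects[key] = data
```

The key design choice is that the backend never touches the server's local disk: every path the Agent sees is just a key under its task prefix.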
2. OSSFS Mounting (Sandbox-side)
Inside the PPIO sandbox, we use ossfs (FUSE) to mount the same bucket to a local directory.
- Server View: `oss://my-bucket/task-123/data.csv`
- Sandbox View: `/workspace/task-123/data.csv`
3. Path Consistency & Security
The trickiest part was ensuring the Agent “knows” where it is.
- System Prompt: We explicitly instruct the Agent: “Your working directory is `/workspace/{task_id}`”.
- Path Mapping: We enforce strict prefix mapping. If the Agent asks to write `result.txt`, our backend translates it to `task-123/result.txt` in OSS.
- Isolation: For security, we only mount the specific sub-directory (prefix) for the current task, preventing the sandbox from accessing other users’ files.
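The prefix enforcement above can be sketched as a single mapping function. `map_to_oss_key` is a hypothetical name, and the `/workspace/{task_id}` convention follows the system prompt described above; the escape check is what keeps a `..` in an Agent-supplied path from reaching another task's prefix:

```python
import posixpath


def map_to_oss_key(task_id: str, agent_path: str) -> str:
    """Map an Agent-supplied path to an OSS key under the task's prefix,
    rejecting attempts to escape it (e.g. via '..')."""
    prefix = f"{task_id}/"
    # Strip the sandbox mount point if the Agent used an absolute path.
    rel = agent_path.removeprefix(f"/workspace/{task_id}/").lstrip("/")
    key = posixpath.normpath(prefix + rel)
    if not key.startswith(prefix):
        raise PermissionError(f"path escapes task prefix: {agent_path}")
    return key
```

For example, `map_to_oss_key("task-123", "result.txt")` yields `task-123/result.txt`, while a traversal like `../task-999/secret.txt` raises `PermissionError`.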
This architecture allows the Agent to use its “own computer” to write code, generate documents, and create images, with all artifacts persistently stored in OSS.
Phase 4: The Concurrency Trap (Redis Locking)
As our system scaled, we encountered a strange bug: State Drift.
An Agent would issue two parallel tool calls:
- `pip install numpy`
- `python analysis.py`

Logic dictated that these should run in the same sandbox. However, due to async concurrency, both requests checked `agent_state.sandbox_id`, found it empty (or found the previous sandbox expired), and simultaneously created two different sandboxes.
- Command 1 installed numpy in Sandbox A.
- Command 2 ran the script in Sandbox B, failing with `ModuleNotFoundError: No module named 'numpy'`.
The Fix: Redis Distributed Lock
We couldn’t rely on the Agent’s memory state alone. We introduced a Redis Lock keyed by the thread_id.
This ensures that even if an Agent “thinks” in parallel, its “body” (the sandbox) remains a singleton resource for that session.
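A minimal sketch of the locking pattern, assuming a client with Redis-style `SET NX EX` semantics. `get_or_create_sandbox` and `FakeRedis` are illustrative names; in production the client would be a real `redis-py` connection and `create_sandbox` would call the provider's API:

```python
import time
import uuid


def get_or_create_sandbox(client, thread_id, state, create_sandbox, ttl=10):
    """Serialize sandbox creation per conversation thread.

    `client` needs Redis-style set(nx=True, ex=...) / get / delete semantics;
    `state` is the shared agent state holding sandbox_id.
    """
    lock_key = f"sandbox-lock:{thread_id}"
    token = str(uuid.uuid4())
    # Spin until we hold the lock (a production version would back off and time out).
    while not client.set(lock_key, token, nx=True, ex=ttl):
        time.sleep(0.05)
    try:
        if state.get("sandbox_id") is None:
            state["sandbox_id"] = create_sandbox()
        return state["sandbox_id"]
    finally:
        # Release only if we still own the lock (TTL may have expired meanwhile).
        if client.get(lock_key) == token:
            client.delete(lock_key)


class FakeRedis:
    """In-memory stand-in implementing just the calls used above."""

    def __init__(self):
        self.data = {}

    def set(self, key, value, nx=False, ex=None):
        if nx and key in self.data:
            return False
        self.data[key] = value
        return True

    def get(self, key):
        return self.data.get(key)

    def delete(self, key):
        self.data.pop(key, None)
```

Whichever tool call wins the lock creates the sandbox; the loser blocks, then re-reads `state["sandbox_id"]` and reuses the same one.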
Phase 5: The “State Persistence” Trap
The “Pause” Strategy (Failed)
We initially wanted the Agent to “remember” everything indefinitely. We used the sandbox’s Pause/Resume feature to snapshot the container state after every run.
- The Result: Disaster.
- Storage Explosion: In just one month, we accumulated 600GB of container snapshots because we weren’t destroying them.
- Reliability: Resuming memory state proved unstable. Often, variable definitions were lost or the Python kernel hung upon resume.
The “Keep-Alive” Strategy (Current)
We analyzed our logs and found that 99% of tasks (Data Analysis, Plotting) finish within minutes. Long-running state was rarely needed.
We switched to a simpler strategy:
- 5-Minute Keep-Alive: If the Agent is active, keep the sandbox warm.
- Auto-Kill: After 5 minutes of inactivity, destroy the sandbox.
- Stateless Execution: We encourage the Agent to treat every code block as self-contained or to reload data from the persistent file system (mounted OSS) if needed.
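The keep-alive bookkeeping can be sketched as a small pool that records last activity and reaps idle sandboxes. `SandboxPool` is an illustrative name, and `destroy` stands in for the provider's delete-sandbox API; injecting the clock makes the timeout logic testable:

```python
import time

IDLE_TIMEOUT = 5 * 60  # seconds of inactivity before a sandbox is destroyed


class SandboxPool:
    """Tracks last-activity time per sandbox and kills idle ones."""

    def __init__(self, destroy, now=time.monotonic):
        self.destroy = destroy          # provider's delete-sandbox call
        self.now = now                  # injectable clock for testing
        self.last_active = {}           # sandbox_id -> last-touch timestamp

    def touch(self, sandbox_id):
        # Called on every tool execution to keep the sandbox warm.
        self.last_active[sandbox_id] = self.now()

    def reap(self):
        # Called periodically from a background task.
        cutoff = self.now() - IDLE_TIMEOUT
        for sid, ts in list(self.last_active.items()):
            if ts < cutoff:
                self.destroy(sid)
                del self.last_active[sid]
```

Because artifacts live in the mounted OSS bucket rather than the container, destroying an idle sandbox loses nothing the Agent cannot reload.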
Conclusion
Building a Sandbox is not just about `docker run`. It’s about lifecycle management, file synchronization, and choosing the right provider for your target audience.
Moving to a managed service like PPIO saved us hundreds of engineering hours, allowing us to focus on the Agent’s reasoning logic rather than Kubernetes maintenance.