Sandboxed execution. Agents that reach you. Durable workflows.
AI agents break the rules of traditional software.
AI agents run while you sleep. But they still need you. To approve. To confirm. To provide a credential. You're not at your desk. Your agent is stuck.
Traditional SaaS is predictable. Click a button, one thing happens. AI agents are different. You say "fix the bug" - they write code, run commands, delete files. That power is the point. And the risk.
Most frameworks ignore this. Polos is built for it.
What you get with Polos
// Create a sandboxed environment — agents get exec, read, write,
// edit, glob, and grep tools automatically.
const sandbox = sandboxTools({
env: 'docker',
docker: {
image: 'node:20-slim',
workspaceDir: './workspace',
memory: '2g',
},
});
// Give the agent sandbox tools — it can now run commands,
// read/write files, and explore the codebase autonomously.
const codingAgent = defineAgent({
id: 'coding_agent',
model: anthropic('claude-sonnet-4-5'),
systemPrompt: 'You are a coding assistant. The repo is at /workspace.',
tools: [...sandbox], // exec, read, write, edit, glob, grep
})Agents run in isolated environments - Docker, E2B, or cloud VMs.
Built-in tools for shell, file system, and web search.
Full power. Zero risk to your systems.
Agents reach you - not the other way around.
Stripe-like approval pages that collect input, not just yes/no.
Slack, SMS, email. You're at dinner. Phone buzzes. One tap. Done.

Built-in observability - every step, every approval, every tool call.
60–80% cost savings via prompt caching.
Automatic retries on failure - no manual intervention.
State persists - agents resume exactly where they left off.
Concurrency control across multiple agents - no API rate limit chaos.
See it in action.
Build real world agents
Hooks into GitHub. Clones the repo, checks out the branch, runs tests in a sandbox. Posts a line-by-line review with suggested fixes. Waits for the author to respond before following up. Durable execution means it never double-comments, even if it crashes mid-review.
Connects to your data warehouse, writes and executes SQL in a sandboxed environment. Builds charts, spots anomalies, drafts a summary. Sends you an approval page before sharing with stakeholders - so nothing goes out without your sign-off.
Crawls dozens of sources, extracts key findings, and builds a structured knowledge base. Checkpoints after every source - so if it hits a rate limit or crashes at source 47, it picks up right where it left off. Pings you on Slack when the report is ready for review.