The open-source runtime for AI agents

You write the agent. Polos handles sandboxes, durability, approvals, triggers, and observability.

$ npx create-polos my-project

Building agents is easy. Running them in production is hard.

Challenge	Typical Agent Framework	With Polos
Sandboxing	None: DIY or run unsandboxed	Docker, E2B + built-in tools
Durability	Agent crashes, you start over	Auto-retry, resume from the exact step
Approvals	Build it yourself	Slack, UI, or terminal: one tap
Triggers	Glue code for every webhook	Built-in: HTTP, webhooks, cron, events
Observability	Grep through logs	Full tracing, every tool call
Cost	Re-run failed LLM calls from scratch	Prompt caching, 60–80% savings

You write the agent. Polos handles the rest.

What you get with Polos

Sandboxed Execution

Isolated Docker & E2B environments. Built-in tools: exec, read, write, edit, glob, grep, web_search.

Human-in-the-Loop

Approval flows for any tool call. Reach your team via Slack. Paused agents consume zero compute.

Durable Workflows

60–80% cost savings via prompt caching. Auto-retry, log-replay, concurrency control.
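Auto-retry is usually paired with backoff so that a rate-limited LLM or tool call is not hammered. A minimal, generic sketch of retry with exponential backoff and jitter follows; `retry` and its parameters are hypothetical illustrations, not Polos's retry configuration, which this page does not document.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    fn: Callable[[], T],
    *,
    attempts: int = 4,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying listed exceptions with exponential backoff plus jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            delay = base_delay * (2 ** attempt)  # 0.5s, 1s, 2s, ...
            # Up to 10% random jitter so parallel runs don't retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the backoff testable without real waiting, and narrowing `retry_on` to transient errors (timeouts, rate limits) keeps genuine bugs from being retried.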

Triggers

Webhook URLs, HTTP API, cron schedules, and event triggers. GitHub and Slack integrations with no glue code.

Observability

OpenTelemetry tracing, full execution history, visual dashboard for debugging and replay.

Bring Your Stack

Any LLM via Vercel AI SDK/LiteLLM. CrewAI/LangGraph/Mastra compatible. Python or TypeScript.


Get started in seconds

Terminal

$ npx create-polos my-project

agent.py

from polos import define_agent, sandbox_tools
from polos.models import anthropic

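# Built-in sandbox tools, executed inside a Docker environment
sandbox = sandbox_tools(env="docker")

# An agent backed by Claude Sonnet 4.5 that can call every sandbox tool
agent = define_agent(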
    id="coding_agent",
    model=anthropic("claude-sonnet-4-5"),
    tools=[*sandbox],
)

agent.ts

import { defineAgent, sandboxTools } from "polos";
import { anthropic } from "polos/models";

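// Built-in sandbox tools, executed inside a Docker environment
const sandbox = sandboxTools({ env: "docker" });

// An agent backed by Claude Sonnet 4.5 that can call every sandbox tool
const agent = defineAgent({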
  id: "coding_agent",
  model: anthropic("claude-sonnet-4-5"),
  tools: [...sandbox],
});

Build real-world agents

PR Reviewer

Triggered by GitHub webhooks. Clones the repo, checks out the branch, runs tests in a sandbox. Posts a line-by-line review with suggested fixes. Waits for the author to respond before following up. Durable execution means it never double-comments, even if it crashes mid-review.
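Never double-commenting after a crash usually comes down to making side effects idempotent: a replayed step must recognize work it already did. A minimal sketch, assuming a deterministic key per (run, file, line, text) and an in-memory set standing in for durable storage; `CommentSink` and `IdempotentPoster` are hypothetical stand-ins, not Polos or GitHub APIs.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class CommentSink:
    """Stand-in for a code-review API; records every comment actually posted."""
    posted: List[Tuple[str, int, str]] = field(default_factory=list)

    def post(self, path: str, line: int, body: str) -> None:
        self.posted.append((path, line, body))


@dataclass
class IdempotentPoster:
    sink: CommentSink
    # Keys of comments already posted; a real runtime would persist this set.
    done: Set[str] = field(default_factory=set)

    def post_once(self, run_id: str, path: str, line: int, body: str) -> bool:
        """Post the comment unless this exact step already posted it."""
        # Same run + same location + same text -> same key on every replay.
        key = hashlib.sha256(f"{run_id}|{path}|{line}|{body}".encode()).hexdigest()
        if key in self.done:
            return False  # posted before the crash; skip on replay
        self.sink.post(path, line, body)
        self.done.add(key)
        return True
```

The sketch still has a small window: a crash after `post` but before the key is recorded would repeat the comment on replay. Closing it needs either an idempotency key the remote API honors or a check against comments already on the pull request.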

Data Analyst

Connects to your data warehouse, writes and executes SQL in a sandboxed environment. Builds charts, spots anomalies, drafts a summary. Sends you an approval page before sharing with stakeholders, so nothing goes out without your sign-off.

Research Agent

Crawls dozens of sources, extracts key findings, and builds a structured knowledge base. Checkpoints after every source, so if it hits a rate limit or crashes at source 47, it picks up right where it left off. Pings you on Slack or Discord when the report is ready for review.
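Checkpointing after every source can be sketched with a plain JSON file. This is a generic illustration, not how Polos stores its execution log: `run_with_checkpoints` and the file-based checkpoint are assumptions made for the example. Each completed source is saved before the next one starts, so a restart after a crash at source 47 skips the first 46 and resumes there.

```python
import json
import os
from typing import Callable, Dict, List


def run_with_checkpoints(sources: List[str], process: Callable[[str], str],
                         checkpoint_path: str) -> Dict[str, str]:
    """Process sources in order, saving results after each one.

    On restart, sources already in the checkpoint file are skipped, so the
    run resumes at the first unfinished source instead of starting over.
    """
    results: Dict[str, str] = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            results = json.load(f)  # replay: reload completed steps

    for source in sources:
        if source in results:
            continue  # finished before the crash: reuse the saved result
        results[source] = process(source)
        # Write to a temp file, then atomically swap it in, so a crash
        # mid-write never leaves a half-written checkpoint behind.
        tmp = checkpoint_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(results, f)
        os.replace(tmp, checkpoint_path)
    return results
```

A production runtime would keep this log in a database keyed by run and step, but the replay rule is the same: completed steps return their recorded result instead of running again.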

From the blog

Why I Built Polos: Durable Execution for AI Agents