Decisions

13. Key Architectural Decisions Log

| # | Decision | Rationale | Date |
|---|---|---|---|
| ADR-001 | Use cagent native YAML, no wrapper format | Zero translation layer, users get full cagent features | 2026-02-23 |
| ADR-002 | `soul.yaml` as single identity file per agent | Simpler than OpenClaw's 6+ bootstrap files. Can add more via `add_prompt_files` | 2026-02-23 |
| ADR-003 | `cagent serve api` as primary container entrypoint | HTTP API is the natural interface for containerized agents | 2026-02-23 |
| ADR-004 | Bash CLI, not compiled binary | Minimal dependencies (docker, curl, jq). Ship fast, iterate. | 2026-02-23 |
| ADR-005 | Debian slim base image | Better cagent/tool compat than Alpine. Acceptable size trade-off. | 2026-02-23 |
| ADR-006 | `mobyclaw.yaml` is dev-only, not product config | Separation of concerns: dev agent ≠ product agent | 2026-02-23 |
| ADR-007 | "moby" as the default/reference agent | Clear identity, easy onboarding, extensible pattern | 2026-02-23 |
| ADR-008 | Docker Compose over Kubernetes | Right-sized for personal agent deployment. K8s is overkill. | 2026-02-23 |
| ADR-009 | Delegate agent loop entirely to cagent | Focus on orchestration, not reimplementing inference + tool execution | 2026-02-23 |
| ADR-010 | Memory as plain Markdown files (OpenClaw pattern) | Simple, portable, agent can read/write with filesystem tools. No DB needed. | 2026-02-23 |
| ADR-011 | Gateway as separate container from agent | Clean separation: gateway handles I/O + routing, agent handles thinking + acting | 2026-02-23 |
| ADR-012 | Messaging adapters inside gateway, not separate containers | Simpler (one container), all JS libs anyway, enable/disable via env vars. Matches OpenClaw. | 2026-02-23 |
| ADR-013 | Docker volumes for persistence | Workspace (memory) and data (sessions, cron) survive container restarts | 2026-02-23 |
| ADR-014 | 4-service separation: moby, gateway, workspace, memory | Each concern in its own container. Clean ownership. Independent scaling/failure. | 2026-02-23 |
| ADR-015 | Workspace + memory as MCP servers | cagent's `type: mcp` toolset connects moby to services. No direct host mounts on agent. | 2026-02-23 |
| ADR-016 | Separate workspace and memory volumes | Workspace = host files (projects, code). Memory = agent state (`MEMORY.md`, daily logs). Different lifecycles, different owners. | 2026-02-23 |
| ADR-017 | `~/.mobyclaw/` as user data directory, bind-mounted | User-visible, editable, portable, survives `docker system prune`. Not a Docker volume. | 2026-02-23 |
| ADR-018 | Messaging adapters inside gateway, not separate bridge containers | Simpler, less config, matches OpenClaw. Enable via env var presence. | 2026-02-23 |
| ADR-019 | Single agent only, no multi-agent support | Mobyclaw is a personal agent, not a platform. One agent (moby), one container. Simplifies routing, config, and mental model. Can always revisit. | 2026-02-23 |
| ADR-020 | Sessions created with `tools_approved: true` | `cagent serve api` pauses at `tool_call_confirmation` unless the session has `tools_approved: true`. Gateway sets this on session creation. Container isolation provides the safety boundary. | 2026-02-23 |
| ADR-021 | `.env` file for secrets management | Single file, Docker Compose native, no Swarm/Vault needed. Least-privilege: per-service `environment` blocks control which container sees which var. | 2026-02-23 |
| ADR-022 | End-to-end streaming via SSE PassThrough | cagent emits tokens in real-time. Gateway streams them through via PassThrough piped to HTTP response. Critical: use `res.on('close')` not `req.on('close')` for disconnect detection. Telegram adapter edits message every ~1s. CLI prints tokens to stdout. | 2026-02-23 |
| ADR-023 | `docker-compose.override.yml` for per-user config | Base compose stays static + git-committed. Override is auto-generated from `credentials.env` + `workspaces.conf` on every `mobyclaw up`. Docker Compose merges them automatically. Gitignored. | 2026-02-23 |
| ADR-024 | Separate `credentials.env` from `.env` | `.env` = mobyclaw infra (LLM keys, messaging). `credentials.env` = user service tokens (gh, aws). Different owners, different lifecycle. `credentials.env` lives in `~/.mobyclaw/` (portable with agent state). | 2026-02-23 |
| ADR-025 | Workspaces as host bind mounts via `workspaces.conf` | Simple `name=path` format in `~/.mobyclaw/workspaces.conf`. CLI manages it (`workspace add/remove/list`). Override generation maps to Docker volumes. Changes require restart. | 2026-02-23 |
| ADR-026 | Gateway-side scheduler with agent-created schedules via REST API | Agent calls `POST /api/schedules` via curl. Gateway owns timing, persistence, and delivery. Separation: agent composes messages, gateway delivers at the right time. No agent involvement at fire time (pre-composed messages). | 2026-02-23 |
| ADR-027 | Heartbeat as periodic agent prompt, separate from scheduler | Scheduler = precise dumb timer (30s resolution). Heartbeat = intelligent agent review (15m interval). Different concerns: scheduler delivers pre-composed messages; heartbeat invokes full LLM reasoning. Agent uses `/api/deliver` to proactively message users from heartbeat. | 2026-02-23 |
| ADR-028 | `TASKS.md` as agent-managed task store (Markdown) | Flexible Markdown file. Agent writes entries via filesystem tools. `[scheduled]` marker prevents double-scheduling. Channel stored per-task. Heartbeat reviews it. Complements `schedules.json` (gateway-owned): `TASKS.md` is the agent's view, `schedules.json` is the gateway's execution state. | 2026-02-23 |
| ADR-029 | Channel context injected as message prefix by gateway | Gateway prepends `[context: channel=telegram:123, time=...]` to every user message. Only mechanism available since cagent API has no per-message metadata. Agent extracts channel for schedule creation. Never displayed to user. | 2026-02-23 |
| ADR-030 | Last active channel for fallback delivery | Gateway tracks last messaging channel used. Fallback when heartbeat/agent needs to deliver without a specific channel target. Resets on restart (acceptable for personal agent). | 2026-02-23 |
| ADR-031 | Source code mounted at `/source` for self-modification | Agent needs to modify its own Dockerfile, gateway source, compose config, CLI, and documentation. Bind-mounting the project root gives full read-write access. Safety via: git (revert), permission-before-modify policy, syntax checks before rebuild. Four signal types: restart, rebuild, rebuild-gateway, rebuild-all. | 2026-02-23 |
| ADR-032 | Persistent channel store | ChannelStore persists known channels to `~/.mobyclaw/channels.json` (one entry per platform). Saved on first message. Schedule API falls back to known channel. Heartbeat includes known channels in prompt. Replaces old in-memory `lastActiveChannel`. | 2026-02-24 |
| ADR-033 | Schedule pruning: splice-on-delivery | `markDelivered()` and `cancel()` splice entries out of array. `_load()` filters to only pending on startup. `schedules.json` only ever contains pending entries. Prevents unbounded growth. | 2026-02-24 |
| ADR-034 | Heartbeat skip guard | `let running = false` flag prevents heartbeat overlap. If previous heartbeat still running, next tick skips. Uses `try/finally` to reset. Prevents infinite queue buildup at 30s intervals. | 2026-02-24 |
| ADR-035 | Collect queue mode (OpenClaw-inspired) | Default queue mode coalesces rapid queued messages into a single combined turn. Prevents "continue, continue" spam. Messages separated by `---`. All promises resolve with the same response. Configurable via `QUEUE_MODE` env var. | 2026-02-24 |
| ADR-036 | Typing indicators on message receipt | Telegram adapter sends `sendChatAction('typing')` immediately when a message is received, before any processing. Refresh every 4s while processing. OpenClaw pattern: instant mode. Makes agent feel responsive even during queue waits. | 2026-02-24 |
| ADR-037 | Queue feedback to user | When message is queued behind a running task, user sees "⏳ Working on something else, I'll get to this next..." Telegram message. Deleted automatically when processing starts. SSE endpoint emits `queued` event. Visible acknowledgment prevents confusion. | 2026-02-24 |
| ADR-038 | Session daily/idle reset | Sessions auto-reset at configurable hour (default 4 AM) and/or after idle timeout. OpenClaw pattern: daily reset clears stale context, idle reset catches long gaps. `/new` and `/reset` commands force immediate reset. Persisted `lastActivity` timestamp survives restarts. | 2026-02-24 |
| ADR-039 | `/stop` abort command | `/stop` in Telegram (or `POST /api/stop`) clears the queue and signals abort on the current run. Returns count of cleared messages. Graceful: doesn't crash the agent, just ends the current turn. | 2026-02-24 |
| ADR-040 | Queue cap with oldest-drop overflow | Max 20 queued messages (configurable). When cap exceeded, oldest message is dropped with error. Prevents unbounded memory growth from spam or runaway loops. OpenClaw uses summarize policy; we use simple drop for now. | 2026-02-24 |
| ADR-041 | Debounce on queue drain | 1000ms debounce before draining collected messages (collect mode only). Lets rapid messages accumulate before the agent processes them as one turn. Configurable via `QUEUE_DEBOUNCE_MS`. | 2026-02-24 |
| ADR-042 | Tool Gateway as MCP aggregator in separate container | External service access (Notion, Google, etc.) routed through a dedicated tool-gateway container. Manages upstream MCP connections, auth, and token lifecycle independently. Exposes aggregated tools as a single MCP server to cagent via HTTP bridge. Clean separation: agent doesn't know about OAuth, tokens, or MCP wiring. | 2026-02-24 |
| ADR-043 | Chat-mediated auth for all external services | No CLI commands, no admin UIs for auth. All OAuth/device-code flows are initiated conversationally: user says "connect notion", agent sends auth URL via Telegram, user clicks and authorizes, agent confirms. Mirrors how `gh auth login` worked (Moby sent the device code via Telegram). For OAuth redirect flows (Notion), tool-gateway hosts callback endpoint. | 2026-02-24 |
| ADR-044 | mcp-bridge: stdio-to-HTTP relay for cagent → tool-gateway | cagent only supports MCP via stdio (command + args). Tool-gateway runs in a separate container with HTTP. Bridge script in moby container translates stdio ↔ HTTP. ~50 lines, shell or Go. Allows clean container separation while keeping native MCP tool discovery. | 2026-02-24 |
| ADR-045 | CLI tools (gh, git, curl) installed directly in agent container | If a service has a solid CLI, skip the MCP layer. gh already in moby container. Agent uses via shell toolset. Simpler, fewer moving parts. MCP reserved for services that need structured tool schemas or complex auth. | 2026-02-24 |
| ADR-046 | Zod schemas required for `McpServer.tool()` | MCP SDK v1.27.0's `McpServer.tool()` requires Zod schema objects, not plain JSON Schema `{type:"string"}`. `isZodRawShapeCompat()` silently rejects plain objects → empty `inputSchema.properties`. All tool definitions (tool-gateway + mcp-bridge re-registration) must use `z.string()`, `z.number()`, etc. | 2026-02-24 |
| ADR-047 | zod installed globally in moby container | mcp-bridge runs inside moby and needs zod to convert JSON Schema → Zod when re-registering remote tools. Added zod to `npm install -g` in Dockerfile alongside `@modelcontextprotocol/sdk`. Bridge uses `NODE_PATH` auto-discovery for global modules. | 2026-02-24 |
| ADR-048 | Full Playwright browser in tool-gateway | Headless Chromium via Playwright in tool-gateway container for full web interaction (navigate, click, type, fill forms, screenshots). Uses Playwright's internal `_snapshotForAI()` for accessibility snapshots with aria-ref element targeting, the same approach as `@playwright/mcp`. Single persistent browser context with 10 min idle auto-close. Browser is ~400MB but enables account creation, multi-step flows, CAPTCHA viewing via screenshots. | 2026-02-24 |
| ADR-049 | Accessibility snapshots over screenshots for interaction | Agent uses text-based accessibility tree (with ref IDs) to understand and interact with pages. Screenshots are secondary: useful for visual verification (CAPTCHAs, layout) but you "can't perform actions based on screenshots." Refs change after every action; agent must use refs from the most recent snapshot. Matches Playwright MCP's design philosophy. | 2026-02-24 |
| ADR-050 | Recursive JSON Schema → Zod in mcp-bridge | Bridge now handles nested types: arrays (`z.array()`), objects (`z.object()`), enums (`z.enum()`), not just primitives. Required for `browser_fill_form` (array of field objects) and `browser_tabs` (enum action). Single recursive `jsonSchemaToZod()` function. | 2026-02-24 |
| ADR-051 | Agent `max_iterations` raised to 15 | Browser automation tasks require many sequential tool calls (navigate → snapshot → fill → click → wait → snapshot → ...). The default 5 iterations was too low. 15 allows a realistic multi-step flow while still preventing runaway loops. | 2026-02-24 |
| ADR-052 | Snapshot trimming: tree-based compact mode | `_snapshotForAI()` returns full accessibility trees (59KB+ for HN, 135KB for Wikipedia). Rewrote trimmer from naive line-based to proper tree parser: parse indentation tree, strip `/url` metadata, unwrap noise wrappers, remove separator text nodes, collapse single-child chains, collapse repeated siblings, hard-cap at 5000 chars. Results: HN 59KB→1.4KB (98%), Wikipedia 135KB→25KB (96%). `browser_snapshot` accepts `full=true` escape hatch. | 2026-02-24 |
| ADR-053 | Read-only integrations as native tool-gateway tools | Slack, Notion, Gmail, and Calendar integrations implemented as native tool-gateway tools (direct REST calls) rather than proxying upstream MCP servers. We only need 3-5 read-only endpoints per service; native tools are simpler, faster, no third-party MCP dependencies. 15 new tools total. | 2026-02-24 |
| ADR-054 | Notion uses internal integration token, not OAuth | For read-only access, Notion's internal integration token is dramatically simpler than OAuth 2.0 + PKCE. User creates integration at notion.so/my-integrations, shares pages with it, pastes token. No callback URLs, no browser redirects, no token refresh complexity. | 2026-02-24 |
| ADR-055 | Google OAuth shared across Gmail + Calendar | One Google Cloud project, one OAuth consent screen, one auth flow. Both `gmail.readonly` and `calendar.readonly` scopes requested in the same authorization URL. User authorizes once, gets access to both services. Single token in `~/.mobyclaw/tokens/google.json`. | 2026-02-24 |
| ADR-056 | Short-term memory (STM) for session continuity | Rolling buffer of last 20 user↔agent exchanges saved to `short-term-memory.json`. On new session creation, injected as `[SHORT-TERM MEMORY]` block into the first message. Heartbeat/system messages excluded. Messages capped at 1500 chars. Solves the amnesia problem where daily/turn-limit session resets lose all context. | 2026-02-24 |
| ADR-057 | Context optimizer: smart context injection | Before user messages reach the agent, fetch relevant `MEMORY.md` sections (scored by keyword overlap), inner emotional state, self-model summary, and matching explorations. Prepend as `[MEMORY CONTEXT]` block. Agent doesn't need to manually read `MEMORY.md` each turn. Fetches from dashboard API with 3s timeout and 1500-token budget. Graceful fallback if API fails. | 2026-02-24 |
| ADR-058 | Exploration heartbeats | Every Nth heartbeat (default: 4th, configurable via `EXPLORATION_FREQUENCY`) allows the agent to follow a curiosity topic from its `curiosity_queue`, fetch 1 URL, and write a summary to `explorations/`. Normal heartbeats are reflection-only (cheap). At 2h intervals, that's ~1 exploration per 8 hours. Cost-controlled: max fetches and summary length are configurable. | 2026-02-24 |
| ADR-059 | Session turn limit (80 exchanges) | Sessions auto-rotate after 80 turns. Prevents cagent history from growing to 100+ messages where Anthropic's tool_use/tool_result sequencing can corrupt (discovered in production: `messages.102` corruption caused permanent 400 errors). Combined with STM injection, context is preserved across rotations. | 2026-02-24 |
| ADR-060 | Stream error detection and auto-recovery | cagent returns HTTP 200 even when Anthropic returns 400. The error appears only in SSE `type: "error"` events. Gateway tracks stream errors; if stream ends with error and no content, rejects the promise. `isSessionError()` expanded to recognize corruption patterns. Auto-clears session and retries once. | 2026-02-24 |
| ADR-061 | Heartbeat consecutive failure tracking | After 2 consecutive heartbeat failures, pauses heartbeats until the session changes (user `/new` or auto-recovery). Prevents hammering a corrupted session every N minutes. Auto-resumes when `lastKnownSessionId` changes. | 2026-02-24 |
| ADR-062 | Context fetch after `setBusy` to prevent double-processing | The context optimizer's async HTTP fetch created a race window where `isBusy()` was false but message processing had logically started. Fix: orchestrator sets `busy=true` FIRST, then awaits context fetch via a `contextFetcher` callback. No async work before the busy guard. | 2026-02-24 |
| ADR-063 | Telegram message deduplication | Tracks last 50 `message_id`s in a `Set`. Skips any message already processed. Prevents double-processing when Telegraf's polling restarts and re-delivers updates. In-memory only (resets on gateway restart, which is fine since polling restart is what causes re-delivery). | 2026-02-24 |
| ADR-064 | Telegraf polling liveness monitor | Telegraf v4's long-polling can die silently (no error events, no crash). Gateway tracks last update activity via `handleUpdate` intercept. If idle >5 minutes and Telegram API is reachable, restarts polling. Conservative threshold prevents false positives during quiet periods. | 2026-02-24 |
| ADR-065 | Agent-controlled tunnel start via `POST /api/tunnel/start` | The agent cannot exec into the dashboard container or access the Docker socket. Added `POST /api/tunnel/start` to the dashboard API so moby can start the Cloudflare tunnel over HTTP. Endpoint checks for stale PID, kills old process if needed, spawns fresh `cloudflared`, and delivers the new URL to the user via gateway's `/api/deliver`. | 2026-02-26 |
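
The message prefix from ADR-029 could look roughly like the sketch below. The helper names `addContextPrefix` and `parseContextPrefix` are hypothetical, and the ISO 8601 timestamp is an assumption, since the log only shows `time=...`.

```javascript
// Hypothetical helpers illustrating ADR-029; not the gateway's actual code.

// Prepend the channel context to a user message.
// Assumption: the time field is an ISO 8601 timestamp.
function addContextPrefix(text, channel, now = new Date()) {
  return `[context: channel=${channel}, time=${now.toISOString()}] ${text}`;
}

// Extract the channel and the original text from a prefixed message.
// Returns null when the message carries no context prefix.
function parseContextPrefix(message) {
  const match = /^\[context: channel=([^,\]]+), time=([^\]]+)\] ([\s\S]*)$/.exec(message);
  if (!match) return null;
  return { channel: match[1], time: match[2], text: match[3] };
}

const prefixed = addContextPrefix('remind me at 9am', 'telegram:123', new Date(0));
console.log(prefixed);
console.log(parseContextPrefix(prefixed).channel); // telegram:123
```

Parsing the prefix back out is what would let the agent recover the channel when it creates a schedule, while the gateway keeps the prefix out of anything shown to the user.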
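
The splice-on-delivery rule in ADR-033 can be illustrated with an in-memory sketch. The method names `markDelivered()`, `cancel()` and the load-time filter come from the log; the `ScheduleStore` class name, the entry shape (`id`, `status`, `message`) and the `_remove` helper are assumptions, and persistence to `schedules.json` is left out.

```javascript
// In-memory sketch of ADR-033; the real store also persists to schedules.json.
class ScheduleStore {
  constructor(entries = []) {
    // Mirrors the load-time filter: keep only pending entries on startup.
    // Assumption: each entry looks like { id, status, message }.
    this.entries = entries.filter((e) => e.status === 'pending');
  }

  // Hypothetical helper: splice an entry out by id; returns it, or null.
  _remove(id) {
    const index = this.entries.findIndex((e) => e.id === id);
    if (index === -1) return null;
    return this.entries.splice(index, 1)[0];
  }

  markDelivered(id) { return this._remove(id); }
  cancel(id) { return this._remove(id); }
}

const store = new ScheduleStore([
  { id: 'a', status: 'pending', message: 'stand-up' },
  { id: 'b', status: 'delivered', message: 'old' },
  { id: 'c', status: 'pending', message: 'lunch' },
]);
store.markDelivered('a');
console.log(store.entries.map((e) => e.id)); // [ 'c' ]
```

Because delivered and cancelled entries are removed rather than flagged, the array never grows beyond the number of pending schedules.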
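
The skip guard from ADR-034 is small enough to sketch in full. The `running` flag and `try/finally` reset come from the log; `createHeartbeat`, the skip counter, and the return values are hypothetical additions for illustration.

```javascript
// Sketch of the skip guard from ADR-034. `createHeartbeat` is a hypothetical name.
function createHeartbeat(task) {
  let running = false; // true while a heartbeat is in flight
  let skipped = 0;

  async function tick() {
    if (running) {
      skipped += 1; // previous heartbeat still running: skip this tick
      return 'skipped';
    }
    running = true;
    try {
      await task();
      return 'ran';
    } finally {
      running = false; // always reset, even if the task throws
    }
  }

  return { tick, skippedCount: () => skipped };
}

// Usage: a slow task that overlaps with the next tick.
let runs = 0;
const heartbeat = createHeartbeat(async () => {
  runs += 1;
  await new Promise((resolve) => setTimeout(resolve, 50));
});
const first = heartbeat.tick();
const second = heartbeat.tick(); // fires while the first is still running
Promise.all([first, second]).then((results) => console.log(results, runs));
```

The flag is set synchronously before the first `await`, so there is no window in which two ticks can both pass the guard.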
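
Collect mode (ADR-035) and the queue cap (ADR-040) could be combined along these lines. This is a minimal sketch: `CollectQueue`, `enqueue`, `drain` and the error text are hypothetical names, the newline padding around `---` is an assumption, and the debounce from ADR-041 is omitted.

```javascript
// Sketch of collect mode (ADR-035) with the oldest-drop cap (ADR-040).
// Class and method names are hypothetical; debounce (ADR-041) is left out.
class CollectQueue {
  constructor(maxSize = 20) {
    this.maxSize = maxSize;
    this.items = [];
  }

  // Queue a message; the promise settles with the combined turn's response.
  enqueue(text) {
    return new Promise((resolve, reject) => {
      this.items.push({ text, resolve, reject });
      if (this.items.length > this.maxSize) {
        // Over the cap: drop the oldest message with an error.
        const dropped = this.items.shift();
        dropped.reject(new Error('Queue full: message dropped'));
      }
    });
  }

  // Coalesce everything queued into one turn and settle all waiters with
  // the same outcome. `runTurn` stands in for the call to the agent.
  async drain(runTurn) {
    const batch = this.items.splice(0);
    if (batch.length === 0) return null;
    const combined = batch.map((item) => item.text).join('\n---\n');
    try {
      const response = await runTurn(combined);
      batch.forEach((item) => item.resolve(response));
    } catch (err) {
      batch.forEach((item) => item.reject(err));
    }
    return combined;
  }
}

const queue = new CollectQueue();
queue.enqueue('continue').then(console.log);
queue.enqueue('also check the logs').then(console.log);
queue.drain(async (prompt) => `agent saw:\n${prompt}`);
```

Both callers receive the same response because the agent only runs once for the whole batch.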
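
The daily/idle reset in ADR-038 comes down to two comparisons against the persisted `lastActivity` timestamp. The sketch below assumes local time for the reset hour and treats the idle timeout as optional; `shouldResetSession` and its option names are hypothetical.

```javascript
// Sketch of ADR-038. Function and option names are hypothetical.
// Returns 'daily', 'idle', or null (no reset needed).
function shouldResetSession(lastActivity, now, { resetHour = 4, idleMs = null } = {}) {
  // Most recent daily boundary at resetHour (local time).
  const boundary = new Date(now);
  boundary.setHours(resetHour, 0, 0, 0);
  if (boundary > now) boundary.setDate(boundary.getDate() - 1);

  // Last activity predates the boundary: the daily reset is due.
  if (lastActivity < boundary) return 'daily';
  // Optional idle timeout catches long gaps within the same day.
  if (idleMs !== null && now - lastActivity > idleMs) return 'idle';
  return null;
}

const lastSeen = new Date(2026, 1, 24, 3, 0); // 03:00 local time
const current = new Date(2026, 1, 24, 5, 0);  // 05:00 local time
console.log(shouldResetSession(lastSeen, current)); // daily
```

Because the check only needs the stored timestamp and the current time, it gives the same answer after a gateway restart.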
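
The rolling buffer in ADR-056 might be sketched as follows. The limits (20 exchanges, 1500 characters) and the `[SHORT-TERM MEMORY]` label come from the log; the function names, the `User:`/`Agent:` line layout, and the in-memory representation are assumptions, and writing to `short-term-memory.json` is omitted.

```javascript
// Sketch of the rolling buffer in ADR-056. Persistence is omitted and the
// layout of the injected block is an assumption.
const MAX_EXCHANGES = 20;
const MAX_CHARS = 1500;

const truncate = (text) => (text.length > MAX_CHARS ? text.slice(0, MAX_CHARS) : text);

// Append one exchange and keep only the most recent MAX_EXCHANGES.
function remember(buffer, user, agent) {
  const next = [...buffer, { user: truncate(user), agent: truncate(agent) }];
  return next.slice(-MAX_EXCHANGES);
}

// Prepend the buffer to the first message of a new session.
function injectShortTermMemory(buffer, firstMessage) {
  if (buffer.length === 0) return firstMessage;
  const lines = buffer.map((e) => `User: ${e.user}\nAgent: ${e.agent}`);
  return `[SHORT-TERM MEMORY]\n${lines.join('\n')}\n\n${firstMessage}`;
}

let memory = [];
memory = remember(memory, 'What is on my calendar?', 'Two meetings today.');
console.log(injectShortTermMemory(memory, 'And tomorrow?'));
```

Heartbeat and system messages would simply never be passed to `remember`, which keeps them out of the buffer.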
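
The pause-and-resume behaviour in ADR-061 can be modelled as a small state holder. `lastKnownSessionId` and the threshold of 2 are from the log; the `HeartbeatGuard` class and its method names are hypothetical.

```javascript
// Sketch of ADR-061. `HeartbeatGuard` and its methods are hypothetical names.
class HeartbeatGuard {
  constructor(maxFailures = 2) {
    this.maxFailures = maxFailures;
    this.failures = 0;
    this.lastKnownSessionId = null;
  }

  // Call before each heartbeat with the current session id.
  shouldRun(sessionId) {
    if (sessionId !== this.lastKnownSessionId) {
      // Session changed (user /new or auto-recovery): resume heartbeats.
      this.lastKnownSessionId = sessionId;
      this.failures = 0;
    }
    return this.failures < this.maxFailures;
  }

  recordSuccess() { this.failures = 0; } // failures must be consecutive
  recordFailure() { this.failures += 1; }
}

const guard = new HeartbeatGuard();
guard.shouldRun('session-1');
guard.recordFailure();
guard.recordFailure();
console.log(guard.shouldRun('session-1')); // false
console.log(guard.shouldRun('session-2')); // true
```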
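
The bounded `Set` from ADR-063 could be implemented along these lines. The limit of 50 and the use of a `Set` are from the log; `createDeduper` and the oldest-first eviction strategy are assumptions.

```javascript
// Sketch of ADR-063. `createDeduper` is a hypothetical name.
function createDeduper(limit = 50) {
  const seen = new Set(); // Sets iterate in insertion order

  // Returns true if the id was already processed; otherwise records it.
  return function isDuplicate(messageId) {
    if (seen.has(messageId)) return true;
    seen.add(messageId);
    if (seen.size > limit) {
      // Evict the oldest id so the Set stays bounded.
      seen.delete(seen.values().next().value);
    }
    return false;
  };
}

const isDuplicate = createDeduper();
console.log(isDuplicate(1001)); // false
console.log(isDuplicate(1001)); // true
```

Relying on the insertion order of `Set` avoids keeping a separate queue of ids just for eviction.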