
Four artifacts that externalize agent state

7 min read By Craig Merry
claude-code mcp robot-md workflow-atlas mcp-tape context externalization agent-state

Agents arrive at each session without persistent state. To make a session productive, the operator must hand the agent a model of the world: what the project is, what it contains, what the relevant constraints are, what previous work has already established. Mechanisms for handing the agent this model are usually lumped together as “context.” That label is too coarse. The mechanisms differ in what they capture, who reads them, and what changes when each is wrong.

This post separates four artifacts produced by the work shipped in 2026. Each captures a different layer. Each is consumed by a different process. Each fails differently. None of the four subsumes the others.

Layer 1 — CLAUDE.md: intent

CLAUDE.md is a Markdown file at the root of a repository. The Claude Code agent reads it at session start. Its content is conventions, invariants, command shape, and the small set of warnings that would otherwise have to be repeated in every session.

Bob’s CLAUDE.md declares the robot manifest path, the registered RRN (RRN-000000000011), the active driver (feetech on /dev/ttyACM0), and a list of intents the agent should recognize without asking clarifying questions (“Pose the arm at zero” → call robot-md calibrate --zero). The Heat Compass repository’s CLAUDE.md declares the Flutter SDK path, the Firebase project ID, the Hive box names, the cubit-first BLoC convention, and the iOS bundle-ID drift that was caught during demo-day TestFlight prep.

Authored by humans. One file per repository. Captures intent — what kind of project this is, how the operator expects the agent to behave inside it. Failure mode when wrong: the agent makes plausible but inconsistent decisions. Verification cost: low; the file is short and reviewable by anyone with project context.

Layer 2 — ROBOT.md: declaration

ROBOT.md is a Markdown file with a YAML frontmatter block, schema-validated against https://robotmd.dev/schema/v1/robot.schema.json. It describes a physical artifact.

Bob’s ROBOT.md declares an arm with six degrees of freedom, a feetech driver on /dev/ttyACM0, a 4096-steps-per-revolution encoder, six video devices (a primary oakd OAK-D with extrinsics relative to the world frame, a secondary oak-d-lite-1, plus four Raspberry Pi-side ISP backend streams), capabilities arm.pick / arm.place / arm.reach / arm.home / status.report, two HiTL gates at scopes destructive and system (both require_auth: true), and a ready pose at specific joint angles (shoulder_pan: 2048, shoulder_lift: 1800, elbow_flex: 2300, wrist_flex: 2048, wrist_roll: 2048, gripper: 1700) that keeps the gripper horizontal so the STS3215 wrist servo can hold against gravity indefinitely. The arm.pick capability declares its own preconditions: pose_taught: ready, extrinsic_present: cameras[0], ik_provider_set, workspace_declared, and backend_resolved — the runtime will refuse the call if any precondition is unmet.
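The refusal behavior described above can be sketched as a small precondition gate. This is an illustration under stated assumptions, not the RCAN runtime's actual code: the `RuntimeState` class, the dictionary layout of the preconditions, and the `invoke` helper are all hypothetical names; only the five precondition names come from the manifest described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RuntimeState:
    """Hypothetical snapshot of what a runtime knows before dispatching a call."""
    taught_poses: set[str] = field(default_factory=set)
    cameras_with_extrinsics: set[int] = field(default_factory=set)
    ik_provider_set: bool = False
    workspace_declared: bool = False
    backend_resolved: bool = False


# Preconditions for arm.pick as listed in the post; this dict layout is assumed.
ARM_PICK_PRECONDITIONS = {
    "pose_taught": "ready",
    "extrinsic_present": 0,  # index into cameras, i.e. cameras[0]
    "ik_provider_set": True,
    "workspace_declared": True,
    "backend_resolved": True,
}


def unmet_preconditions(pre: dict, state: RuntimeState) -> list[str]:
    """Return the names of preconditions the current state does not satisfy."""
    unmet = []
    if "pose_taught" in pre and pre["pose_taught"] not in state.taught_poses:
        unmet.append("pose_taught")
    if ("extrinsic_present" in pre
            and pre["extrinsic_present"] not in state.cameras_with_extrinsics):
        unmet.append("extrinsic_present")
    for flag in ("ik_provider_set", "workspace_declared", "backend_resolved"):
        if pre.get(flag) and not getattr(state, flag):
            unmet.append(flag)
    return unmet


def invoke(capability: str, pre: dict, state: RuntimeState) -> str:
    """Refuse the call if any precondition is unmet; otherwise accept it."""
    missing = unmet_preconditions(pre, state)
    if missing:
        raise PermissionError(f"{capability} refused; unmet: {', '.join(missing)}")
    return f"{capability} accepted"
```

The point of the sketch is that the gate lives in the runtime and reads the declaration, so no agent has to remember the checklist.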

Read by any RCAN-conformant runtime, not by any specific agent. The MCP bridge robot-md-mcp exposes the parsed manifest as MCP resources (robot-md://<robot>/capabilities, robot-md://<robot>/safety, robot-md://<robot>/frontmatter) so any MCP-aware agent — Claude Code, Claude Desktop, Codex CLI, Gemini CLI — can read the same declaration through the same surface.

Authored by humans or generated by robot-md init from a hardware preset. One file per robot. Captures declaration — what this physical artifact is, what it can do, what it must not do. Failure mode when wrong: the runtime refuses to bind, or worse, the agent invokes a capability that does not respect the actual hardware envelope. Verification cost: medium; the schema validator catches structural errors, but physical-envelope correctness requires a hardware smoke run.

Layer 3 — atlas.md: structure

atlas.md is generated, not authored. The workflow-atlas (PlatAtlas) plugin scans a repository for declared packages and named flows, then emits a Markdown file with a JSON sidecar that describes the structural graph: nodes (packages, files, MCP servers), edges (canonical flows, derived call relations), families (color-coded groups), and per-flow token-efficiency estimates.

The atlas for the PlatAtlas product itself currently declares 33 nodes across 6 families (cli, core, external, mcp, server, web), of which 6 carry an mcp_server field that resolves trace JSON-RPC calls back to their source-of-truth servers. A trace overlay (?trace= query parameter on the rendered atlas page) re-strokes edges actually walked by a recorded session, in surveyor’s ink at 0.6 alpha. Drift detection ghosts unreferenced nodes into a sidebar so they remain visible without overflowing the rendered viewBox.

The plugin’s init command adds an atlas.md reference to the host repository’s CLAUDE.md, so the agent encounters the graph artifact through the existing intent layer rather than as a new convention to learn. Empirically, this reduces the input-token cost of structural rediscovery for cold sessions on large multi-package repositories.

Authored by automation, refreshed by command or git hook. One file per repository, optionally federated across repos via ?atlas=base1;base2;base3. Captures structure — what packages exist, what flows pass between them, what each is for. Failure mode when wrong: usually drifts behind reality; the package now exists but the atlas does not list it. Verification cost: medium; a CI lint catches missing-path drift, but semantic accuracy (the flow declaration matches the actual call chain) requires either author review or trace-overlay confirmation.
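A missing-path drift check of the kind a CI lint might run can be sketched in a few lines. The sidecar schema here is an assumption made for illustration: a top-level `nodes` list whose entries carry an `id` and an optional repository-relative `path`. The real PlatAtlas sidecar may be shaped differently.

```python
import json
from pathlib import Path


def missing_path_drift(sidecar_path, repo_root):
    """Return ids of atlas nodes whose declared path no longer exists on disk.

    Assumes a sidecar of the form {"nodes": [{"id": ..., "path": ...}, ...]}.
    Nodes without a path (for example external services) are skipped.
    """
    atlas = json.loads(Path(sidecar_path).read_text(encoding="utf-8"))
    root = Path(repo_root)
    drifted = []
    for node in atlas.get("nodes", []):
        path = node.get("path")
        if path and not (root / path).exists():
            drifted.append(node["id"])
    return drifted
```

A check like this only catches the structural half of drift. It cannot tell whether a declared flow still matches the actual call chain, which is why the semantic half needs author review or a trace overlay.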

Layer 4 — JSONL trace: execution

A JSONL trace is a line-delimited record of every JSON-RPC message that crossed an MCP stdio boundary during a session. mcp-tape is a transparent proxy that wraps any MCP server and writes the trace as the session runs. Default redaction strips common secret shapes — AWS keys, sk- API tokens, GitHub tokens, JWTs, and fields named password / token / authorization — so a trace accidentally shared does not leak credentials.
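Redaction of this kind can be sketched as a recursive walk over each JSON-RPC message before it is written. The regular expressions below are common approximations of the secret shapes named above, not mcp-tape's actual patterns, and the `[REDACTED]` placeholder and function names are assumptions.

```python
import json
import re

# Approximate shapes only; a production redactor would need a broader set.
SECRET_PATTERNS = [
    re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),       # AWS access key IDs
    re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),             # sk- style API tokens
    re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}"),        # GitHub tokens
    re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),  # JWTs
]
SENSITIVE_KEYS = {"password", "token", "authorization"}
REDACTED = "[REDACTED]"


def redact(value):
    """Return a copy of a decoded JSON value with secret shapes masked."""
    if isinstance(value, dict):
        return {
            key: REDACTED if key.lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, str):
        for pattern in SECRET_PATTERNS:
            value = pattern.sub(REDACTED, value)
        return value
    return value  # numbers, booleans, and null pass through unchanged


def redact_line(line: str) -> str:
    """Redact one JSONL trace line and re-serialize it."""
    return json.dumps(redact(json.loads(line)))
```

Masking by field name as well as by value shape matters because a password has no recognizable shape of its own; only the key gives it away.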

The trace format (v1) is documented and stable at mcpreplay.dev/docs/format. mcp-replay renders it as four views: Timeline (every message in order, expandable), Tools (per-tool aggregate latency and error counts), Calls (each tools/call paired with its response), and Raw (line-numbered JSONL pager). A diff mode (?diff=a;b) renders two traces side by side. The format is also accepted by the paid PlatAtlas team product for cross-member aggregation.
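The pairing behind a Calls-style view follows from JSON-RPC 2.0 itself: a response carries the same `id` as the request it answers. The sketch below assumes each trace line is a bare JSON-RPC message; the documented v1 format may wrap messages in an envelope (timestamps, direction), in which case the message would need to be unwrapped first. The function name is hypothetical, and this is not mcp-replay's implementation.

```python
import json


def pair_tool_calls(lines):
    """Pair each tools/call request with its response by JSON-RPC id.

    Returns a list of (request, response) tuples; response is None when the
    trace ends before the call was answered. Message direction is ignored,
    so ids are assumed unique across the session.
    """
    pending = {}
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        msg_id = msg.get("id")
        if msg.get("method") == "tools/call" and msg_id is not None:
            pending[msg_id] = msg
        elif "method" not in msg and msg_id in pending:
            # A message with an id but no method is a response (result or error).
            pairs.append((pending.pop(msg_id), msg))
    pairs.extend((request, None) for request in pending.values())
    return pairs
```

Unanswered calls are kept rather than dropped, since a call with no response is often the most interesting line in a trace.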

Authored by the proxy at session time. One file per session, retained as long as the operator wants. Captures execution — what tools actually got called, with what arguments, in what order, how long each took, where things went sideways. Failure mode when wrong: rare, because the proxy is byte-for-byte transparent; more often the failure is that the trace is not captured at all because the proxy was not in the path. Verification cost: low post-hoc; the trace is the verification.

Why four, not one

Each layer answers a different question and is consumed by a different process. Conflating them couples decisions whose constraints diverge.

| Layer | Captures | Authored by | Read by | Mutable cadence |
| --- | --- | --- | --- | --- |
| CLAUDE.md | Intent | Human | Agent at session start | Slow |
| ROBOT.md | Declaration | Human or robot-md init | Any RCAN-conformant runtime | Slow, schema-gated |
| atlas.md | Structure | Automation | Agent at session start | Refreshed per session or per commit |
| JSONL trace | Execution | mcp-tape proxy | Renderer, team aggregator | Append-only per session |

The four interoperate. The atlas can reference a ROBOT.md node. The trace overlay can re-stroke edges declared in the atlas. The CLAUDE.md can point the agent at both. None of the four subsumes the others. A repository with a CLAUDE.md but no atlas can still be operated; a repository with an atlas but no CLAUDE.md can still be navigated; a robot with a ROBOT.md but no trace can still be commanded but cannot be debugged after the fact.

The structural decision in 2026 was to ship each artifact as its own product, with its own consumer, on its own release cadence. The four artifacts cluster into two product families (see Two product families maintained in parallel), but the four are not collapsible into one.
