
OpenCastor: a year of building 'declarative robotics' with Claude Code

By Craig Merry · 8 min read
Tags: opencastor, robot-md, robotics, RCAN, RRF, MCP, claude-code, introduction

If you’ve never seen this site before and want the one-paragraph version: I’ve spent the last year building OpenCastor — an open-source ecosystem that lets a single Markdown file (ROBOT.md) describe a physical robot well enough that any agent can safely operate it. Almost every line was paired with Claude Code. The point of this post is to walk you through the whole stack and the consumer-grade UX I’ve been focused on for the last few months.

The robot in every photo and transcript on this blog is an SO-ARM101 arm I call bob, sitting on a desk in my house. Bob is the production environment.

The premise

If you’ve worked in robotics, you know the on-ramp problem. You have a robot. You want an agent — Claude, Gemini, whatever — to drive it. To get there, you write hundreds of lines of glue code: motion primitives, joint limits, calibration matrices, vision routines, an estop wrapper. You write that glue once for your robot, and again the next time, and again the time after that. Every consumer of your robot — a teleop UI, an LLM, a teammate’s script — re-discovers the same facts.

ROBOT.md is the alternative. One file at the root of your robot’s project, declarative, schema-validated, that says:

  • which servos are where,
  • which capabilities the robot exposes (pick_place, vision_find, home, estop),
  • what the safety envelope is,
  • which sensors are connected and how to talk to them,
  • and where this robot is registered for compliance.

Any tool that understands the schema can drive the robot. The model becomes interchangeable. The robot’s identity lives in the file, not in your agent’s prompt.
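To make that concrete, here is a minimal sketch of the kind of file I mean. The field names and layout below are illustrative, not the validated schema; robot-md init generates a correct manifest for your actual hardware.

# Illustrative sketch only: these field names are not the real schema.
# Run `robot-md init` to generate a valid manifest for your robot.
cat > ROBOT.md <<'EOF'
---
name: bob
model: SO-ARM101
rcan: ">=3.0"
registration: RRN-000000000001
capabilities: [home, pick_place, vision_find, estop]
servos:
  - id: 1
    role: base
    bus: /dev/ttyUSB0
safety:
  joint_limits: enforced per servo, checked before every move
  estop: hardware and software, always available
sensors:
  - type: camera
    interface: usb
---
bob: a desk-mounted SO-ARM101
EOF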

The layered cake

                 ┌─────────────────────────────────────────┐
                 │  Surfaces (Claude Code, Claude Desktop, │
                 │  Gemini CLI, Codex CLI, voice pendant…) │
                 └─────────────────────────────────────────┘

                 ┌─────────────────────────────────────────┐
                 │  robot-md-mcp  (MCP server)             │
                 │  + robot-md-dispatcher  (Agent SDK,     │
                 │     remote dispatch, BYOK billing)      │
                 └─────────────────────────────────────────┘

                 ┌─────────────────────────────────────────┐
                 │  robot-md  (manifest spec + CLI)        │
                 └─────────────────────────────────────────┘

                 ┌─────────────────────────────────────────┐
                 │  RCAN  (the wire protocol — rcan-spec,  │
                 │  rcan-py, rcan-ts)                      │
                 └─────────────────────────────────────────┘

                 ┌─────────────────────────────────────────┐
                 │  Robot                                  │
                 └─────────────────────────────────────────┘

  And, off to the side, the public registry:
                 ┌─────────────────────────────────────────┐
                 │  Robot Registry Foundation              │
                 │  (FRIA, safety benchmarks, IFU,         │
                 │   incident reports, EU register)        │
                 └─────────────────────────────────────────┘

Each layer is its own repo, with its own release cadence and its own clear concern. The order matters: the protocol is the deepest layer because everything above it depends on it being stable. The surfaces are the shallowest because they’re the easiest to add and the most likely to change.

What lives in each layer

RCAN is the protocol — robot communication semantics, canonical-JSON sign/verify, capability descriptors. Three SDKs: rcan-py, rcan-ts, and the spec itself. New work in the ecosystem must depend on RCAN 3.0+.

robot-md is the manifest spec and CLI. robot-md init, robot-md validate, robot-md render, robot-md discover — the operator-facing tools. This is the layer most people touch.

robot-md-mcp is an MCP server that exposes a ROBOT.md to any MCP-aware client. Claude Desktop, Claude Code, Cursor, Zed, Gemini CLI, Codex CLI — all of them can drive a registered robot through this single bridge. A human is in the loop.

robot-md-dispatcher is the remote-dispatch layer, covered in detail in a previous post. When the caller is not a human in front of an MCP client — when it’s a Slack bot, a phone, cron, or another agent — the dispatcher accepts a goal over HTTPS, runs a Claude Agent SDK loop on the robot host, and only sends the goal and the final result across the wire. Bring-your-own-key for inference.
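As a sketch of what that wire interaction looks like (the endpoint path, auth header, and payload shape here are my illustrative assumptions, not the dispatcher's documented API):

# Hypothetical request shape; the path, header, and body are assumptions,
# not the documented dispatcher API.
curl -X POST https://robot-host.example.com/dispatch \
  -H "Authorization: Bearer $DISPATCH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"goal": "pick up the red block and drop it in the bin"}'
# The Agent SDK loop (vision, planning, tool calls) runs on the robot host;
# only the goal and the final result cross the network.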

robot-md-pendant is a touchscreen + voice pendant: an ESP32-S3 AMOLED frontend driving a Python service on a Raspberry Pi, also running the Claude Agent SDK against robot-md-mcp. It’s the “I want to talk to my robot, not type at it” surface.

robot-md-autoresearch is a Karpathy-style nightly eval harness. It scores ROBOT.md template variants against a swappable robot fixture (real, sim, mock, remote-via-dispatcher) using Claude as the exclusive multimodal model and agent. It’s how the manifest schema improves over time without me hand-tuning it.

claude-code-plugins is a Claude Code plugin marketplace for the ecosystem. If you’ve installed Claude Code, you can install the robot-md plugin and get slash commands like /enable-dispatch that walk you through setup without ever printing a secret into the conversation.

Robot Registry Foundation is the public registry. The first five EU AI Act-aligned endpoints (FRIA, safety benchmarks, IFU, incident reports, EU register) are live. Bob is registered as RRN-000000000001. Submissions are signed by your local key, never by the dispatcher, never by an LLM.

OpenCastor is the original framework all of this grew out of — a tiered-brain robotics runtime with eight AI providers and a self-improving loop. It’s now in maintenance mode while robot-md and the surrounding ecosystem are the active focus, but it’s still the best place to look if you want to understand the design heritage.

Documentation lives at docs.opencastor.com.

The Anthropic-native bias is on purpose

If you’ve read any other post on this blog, you’ve noticed every surface above bends toward Anthropic. That isn’t an accident — it’s a design choice with a particular shape:

  • MCP is the bridge to interactive clients. Anthropic invented the protocol, but every major coding agent (Claude Desktop, Claude Code, Cursor, Zed, Gemini CLI, Codex CLI) speaks it now. One bridge, many surfaces.
  • Claude Code is the operator surface. The plugin marketplace turns “set up a dispatcher” into a slash command.
  • Claude Agent SDK runs the loop on the robot host inside both robot-md-pendant and robot-md-dispatcher, so the heavy reasoning happens locally and the network only carries goals and results.
  • Claude is the exclusive eval model in robot-md-autoresearch, because consistency matters more than diversity when the eval target is the manifest schema itself.

Polishing the Anthropic stack first means every primitive — CLI, plugin, MCP server, Agent SDK, eval harness — has a first-class surface. Other vendors get the same primitives, in their own paradigms, in their own focused weeks. Google is up next.

What I’ve been focused on lately: consumer-grade UX

The first nine months of this project were about getting the architecture right. The last two months have been about getting the on-ramp right — because it doesn’t matter how good the schema is if a hobbyist with a robot on their desk gives up before their first successful tool call.

The pattern that emerged across robot-md, robot-md-dispatcher, robot-md-pendant, and the Claude Code plugin is what I now think of as three-tier UX:

  1. Zero-friction default. --yes gets you running with sensible defaults. No questions. The first run prints any one-time secret exactly once and never persists it.
  2. Guided. An init command that explains every knob as it asks. For first-time setup or anyone who wants to understand what they’re choosing.
  3. From inside the agent. A slash command — /enable-dispatch, /install-pendant — that does the setup from inside Claude Code without printing secrets into the conversation. For operators who shouldn’t be editing YAML.

The same three tiers, the same principles (working defaults first, precision opt-in, secrets never leak to the conversation), repeated everywhere. A consumer should be able to go from “I have a robot on my desk” to “an LLM is driving it” in under fifteen minutes, on any of the three paths.
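For robot-md itself, the three tiers look like this in practice (tier 3 runs inside Claude Code rather than a shell, so it appears here only as a comment):

# Tier 1: zero-friction default, no questions, one-time secrets printed exactly once
robot-md init --yes

# Tier 2: guided, explains every knob as it asks
robot-md init

# Tier 3: from inside Claude Code, no YAML editing, no secrets in the conversation
#   /enable-dispatch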

That UX pattern isn’t done. Recent brainstorming work (vendor breadth, hot-plug detection on the USB serial bus, audio + screen onboarding) is bundled into the next major release of robot-md, and a similar focused-vendor week for Google is coming up. If you’re building anything in this space, the takeaway is that the on-ramp is the product. The schema is necessary; the UX is what determines whether anyone ever sees it.

Who this is for

  • Hobbyists with an SO-ARM101, an SO-ARM100, or any robot you’ve built yourself. Write a ROBOT.md. Install Claude Code. Run the plugin. Drive it.
  • Educators running a robot in a classroom who want students to interact with hardware through an agent without writing per-robot glue.
  • Roboticists and platform engineers who want a vendor-neutral declaration format for their fleet, with a real registry behind it for compliance.
  • Compliance / safety engineers who need an EU AI Act-aligned audit trail for high-risk physical AI systems — this is what the Robot Registry Foundation endpoints are for.

Everything is open source, Apache-2.0 or MIT, free to try, no SaaS gating.

Where to start

If you have a robot:

pip install robot-md robot-md-mcp
robot-md init                  # interactive
# OR
robot-md init --yes            # one-shot, defaults

Then point any MCP-aware client at robot-md-mcp and start asking it questions about your robot.
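For Claude Code specifically, the wiring might look something like this. The claude mcp add subcommand is real; the robot-md-mcp launch command and its flag are my assumptions, so check the robot-md-mcp README for the actual invocation.

# Register the MCP server with Claude Code.
# The server command and the --manifest flag are assumptions;
# see the robot-md-mcp docs for the real invocation.
claude mcp add robot-md -- robot-md-mcp --manifest ./ROBOT.md

After that, asking Claude Code what capabilities your robot exposes should resolve against the manifest rather than guesswork.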

If you don’t have a robot but want to read the spec: everything is linked from docs.opencastor.com, and rcan-spec plus the robot-md manifest spec are the two places to start.

If you want to follow the day-to-day, every shipped milestone gets its own post on this blog.


The next post will be the Google-week write-up: a complete consumer path through Gemini CLI, AI Studio, Vertex Agent Builder, ADK, and local Gemma models — driving bob with no Anthropic dependency at all. Because portability is what makes a declaration format real.