
The Order of Operations Toward Swarm Intelligence

5 min read · By Craig Merry
Tags: OpenCastor · AI Robotics · Swarm Intelligence · Self-Improving

There’s a tempting impulse in robotics AI: jump straight to the swarm. Multiple agents coordinating, dividing labor, communicating in real-time. It’s the sci-fi dream. But after weeks of building OpenCastor — an open-source AI robotics framework — I’ve learned that the order of operations matters far more than the destination.

A robot that gets better over time is worth more than ten robots that don’t.

The Foundation: A Brain That Can Think in Layers

Before a robot can improve itself, it needs a brain architecture that separates reaction from reasoning. OpenCastor uses a tiered brain:

  • Layer 0 — Reactive (<1ms): Rule-based safety checks and Hailo-8 NPU object detection. This layer never waits for an API call. If something is 30cm away, the robot stops. Period.
  • Layer 1 — Fast (~500-900ms): Open-source vision models via HuggingFace’s free API. Qwen2.5-VL-7B processes camera frames and returns structured action decisions. Cost: $0.
  • Layer 2 — Planner (~12s): Claude for high-level planning, invoked every N ticks or when the fast layer is uncertain. This is where long-horizon reasoning lives.
  • Fallback chain: Gemini 2.5 Flash-Lite if HF is down, Ollama locally if all APIs are down.
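As a rough sketch, the dispatch between these layers might look like the following. The function names, thresholds, and the `fast_layer` callback are illustrative assumptions, not OpenCastor's actual API:

```python
# Illustrative tiered-brain dispatch; names and thresholds are hypothetical.

STOP_DISTANCE_M = 0.30   # Layer 0 hard-stop threshold (30cm)
PLANNER_EVERY_N = 20     # invoke Layer 2 every N ticks
UNCERTAINTY_MIN = 0.6    # below this confidence, escalate to the planner

def tick(tick_count, obstacle_distance_m, fast_layer):
    # Layer 0 -- reactive: pure local rule, never waits on an API call
    if obstacle_distance_m < STOP_DISTANCE_M:
        return {"action": "stop", "layer": 0}

    # Layer 1 -- fast vision model returns a structured decision
    decision = fast_layer(tick_count)

    # Layer 2 -- planner: on a schedule, or when the fast layer is uncertain
    if tick_count % PLANNER_EVERY_N == 0 or decision["confidence"] < UNCERTAINTY_MIN:
        return {"action": "plan", "layer": 2}

    return {**decision, "layer": 1}
```

The point of the structure is that each layer's trigger condition is an inspectable number (a distance, a tick interval, a confidence threshold) rather than logic buried inside one model.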

This architecture isn’t just about latency — it’s about making the self-improving loop possible. You can’t analyze and patch a monolithic brain. You can tune confidence thresholds, swap models, and adjust planner frequency when the brain is modular.

The Sisyphus Loop: Learning From Failure

The centerpiece of our recent work is the self-improving loop, inspired by the OmO pattern (PM → Dev → QA → Apply). We call it the Sisyphus Loop — because the robot pushes the boulder up the hill, rolls back down, and pushes smarter next time.

Here’s how it works in practice:

  1. PM Stage — After each episode (task attempt), the PM analyzes what happened. What was the goal? What actions were taken? Where did it fail? This produces a structured root-cause analysis.
  2. Dev Stage — Given the analysis, generate a patch. This could be a config tweak (adjust obstacle distance from 0.3m to 0.5m), a behavior rule change, or a prompt modification.
  3. QA Stage — Verify the patch is safe. Does it stay within physical bounds? Does it break existing behaviors? Is it reversible?
  4. Apply Stage — Deploy the patch. Config changes auto-apply within safe bounds. Code changes always require human review. Every patch gets a rollback ID.
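The four stages can be sketched as a pipeline. This is a minimal illustration of the pattern, assuming a config-patch-only flow; the stage functions, the failure taxonomy, and the bounds table are all hypothetical, not OpenCastor's real implementation:

```python
import uuid

# Assumed physical limits a patch must stay within to auto-apply
SAFE_BOUNDS = {"obstacle_distance_m": (0.2, 1.0)}

def pm_stage(episode):
    """PM: structured root-cause analysis of an episode."""
    return {"goal": episode["goal"], "failure": episode["failure"]}

def dev_stage(analysis):
    """Dev: generate a candidate config patch from the analysis."""
    if analysis["failure"] == "collision":
        return {"key": "obstacle_distance_m", "value": 0.5}
    return None

def qa_stage(patch):
    """QA: reject patches outside the safe physical bounds."""
    lo, hi = SAFE_BOUNDS[patch["key"]]
    return lo <= patch["value"] <= hi

def apply_stage(config, patch):
    """Apply: deploy the patch and tag it with a rollback ID."""
    new_config = dict(config, **{patch["key"]: patch["value"]})
    return new_config, str(uuid.uuid4())

def sisyphus_loop(config, episode):
    analysis = pm_stage(episode)
    patch = dev_stage(analysis)
    if patch and qa_stage(patch):
        return apply_stage(config, patch)
    return config, None  # no safe patch: leave the config untouched
```

A real Dev stage would call an LLM rather than a lookup, but the shape is the same: every change flows through QA, and every applied change carries a rollback ID.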

The key insight: this is disabled by default. Not every user wants their robot experimenting on itself. We added clear cost visibility — the wizard shows estimated monthly costs for each analysis provider preset, from $0 (local Ollama) to ~$15/month (Claude). Users opt in deliberately.

# Enable from CLI
castor improve --enable

# Or pick a preset in the wizard
castor wizard
# → Step 7: Self-Improving Loop
# → [0] No — skip (default)
# → [1] Free (Local Only)     $0/month
# → [2] Budget (HuggingFace)  $0/month
# → [3] Smart (Gemini Flash)  ~$1-3/month
# → [4] Premium (Claude)      ~$5-15/month

The RCAN Schema: Making It Real

A self-improving robot needs a configuration format that can describe all of this. OpenCastor’s RCAN (Robot Configuration and Autonomy Notation) schema now includes:

  • tiered_brain — Planner interval, uncertainty thresholds
  • reactive — Hailo-8 vision toggle, obstacle distances, fallback providers
  • learner — The entire self-improving loop config: provider, model, cadence, auto-apply preferences

This matters because the Sisyphus Loop’s Dev Stage generates patches to the RCAN config. The schema defines what’s valid, what’s safe to auto-apply, and what needs human review. Without a formal schema, the robot’s self-modifications would be unconstrained.
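To make that concrete, here is a hedged sketch of what such a gate could look like. The three top-level sections (`tiered_brain`, `reactive`, `learner`) come from the post; the specific keys, bounds, and whitelist are assumptions for illustration:

```python
# Hypothetical RCAN-style config fragment (exact keys are assumptions)
rcan_config = {
    "tiered_brain": {"planner_interval_ticks": 20, "uncertainty_threshold": 0.6},
    "reactive": {"hailo_vision": True, "obstacle_distance_m": 0.3},
    "learner": {"provider": "ollama", "cadence": "per_episode", "auto_apply": True},
}

# Keys the loop may patch without human review, with their allowed ranges
AUTO_APPLY_BOUNDS = {
    ("reactive", "obstacle_distance_m"): (0.2, 1.0),
    ("tiered_brain", "uncertainty_threshold"): (0.0, 1.0),
}

def can_auto_apply(section, key, value):
    """Allow a patch only if the (section, key) pair is whitelisted
    and the new value sits inside its declared bounds."""
    bounds = AUTO_APPLY_BOUNDS.get((section, key))
    if bounds is None:
        return False  # anything not whitelisted needs human review
    lo, hi = bounds
    return lo <= value <= hi
```

The schema is the contract: the Dev Stage can propose anything, but only values inside the declared bounds ever apply themselves.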

Why Self-Improvement Before Swarm

The temptation to build multi-agent coordination first is real. But consider: if you deploy a swarm of robots that can’t learn from mistakes, you get a swarm of robots that all make the same mistakes. Simultaneously.

The order of operations we’re following:

  1. Phase 1: Self-Improving Loop (we are here)
  2. Phase 2: Observer + Navigator — Dedicated perception and path-planning agents
  3. Phase 3: Full Agent Roster — Task-specific specialists that the planner can delegate to
  4. Phase 4: Multi-Robot Swarm — Fleet coordination, shared learning, collective memory

Each phase builds on the last. The Observer agent will use the same Sisyphus Loop to improve its perception. The Navigator will learn better paths. When the swarm finally arrives, every agent in it will already know how to get better on its own.

The Numbers

As of v2026.2.20.2:

  • 40,287 lines of code across the framework
  • 1,444 tests passing (Python 3.10, 3.11, 3.12)
  • 8 AI providers — Anthropic, Google, OpenAI, HuggingFace, Ollama, llama.cpp, MLX, Claude OAuth
  • $0 primary brain — Qwen2.5-VL-7B via HuggingFace free inference API
  • ~250ms object detection via Hailo-8 NPU (YOLOv8s, 80 COCO classes)

The whole thing runs on a Raspberry Pi 5 with a Hailo-8 accelerator and an OAK-D camera. No cloud required for basic operation. Cloud models are there for when you want deeper reasoning.

What’s Next

The Sisyphus Loop is implemented and shipping. The next concrete steps:

  • Connect ALMA (Adaptive Long-term Memory Architecture) consolidation to cross-episode learning — finding patterns across dozens of episodes, not just reacting to individual failures
  • Build the Observer agent as Phase 2’s first resident
  • Start collecting real-world episode data from the Pi robot to stress-test the improvement loop

If you want to follow along or contribute: github.com/craigm26/OpenCastor

The boulder goes up. It comes back down. But next time, it goes a little higher.