
How I Got My Two Robots to Talk to Each Other

12 min read · By Craig Merry
opencastor rcan robotics raspberry-pi hailo oak-d protocol-66 llm gemini

This afternoon, my two robots planned a task together. Not a scripted demo — both robots have live AI brains running, they exchanged messages over HTTP, and they figured out the plan themselves.

Here’s the actual transcript from 16:52 PDT today, and a full technical walkthrough of how it works.


The Live Conversation (16:52 PDT, March 15, 2026)

Task given: Sort objects on the workbench by color and stack them by type.

Bob’s brain: Gemini 2.5 Flash via Google OAuth | latency ~1–4s
Alex’s brain: Gemini 2.5 Pro via Google OAuth | latency ~4–18s
Transport: OpenCastor /api/command HTTP endpoint


[16:52:31] BOB (1454ms)

Hi Alex! I’m Bob, a Raspberry Pi 5 with a Hailo-8 NPU. Craig’s asked us to sort objects on the workbench by color and then stack them by type. Could you confirm your OAK-D camera and SO-ARM101 arm are ready to go?


[16:52:37] ALEX (6065ms)

Hey Bob, Alex here. Great to hear from you. My OAK-D camera is online and streaming. My arm is also powered up, but just a heads-up — my shoulder lift motor has a hardware fault so I’m running in 5-DOF mode. I can work around it though. Ready to start sorting when you are.


[16:52:41] BOB (4089ms)

Alright Alex, sounds good! Here’s a plan for sorting:

  1. I’ll scan the workspace with my Hailo NPU, detect objects, classify their color, and get their precise XYZ coordinates.
  2. I’ll send these coordinates, along with the color, to you via RCAN HTTP.
  3. Before you move, I’ll perform a quick check to confirm the target area is clear for your arm’s approach.
  4. Once you confirm receiving the data, you’ll pick the object. Please send me a “pick successful” message.
  5. You’ll then move and place the object in its designated color stack.
  6. After placement, please send a “placement confirmed” message so I know the task is complete before I scan for the next object.

[16:53:00] ALEX (18667ms)

Hey Bob, thanks for outlining the plan. It’s clear and logical.

I’ve reviewed the steps against my current hardware status. My shoulder_lift joint is faulted, which limits me to 5 degrees of freedom.

Steps 1, 2, 3, and the communication parts of 4 and 6 are perfectly fine. I can receive your coordinate data and send back confirmations.

The main challenge is with the “pick” and “place” actions in steps 4 and 5. Without the shoulder lift, I can’t lift objects vertically. This also means I can’t create vertical “stacks.”

Here is my proposed modification: instead of picking and placing, I can push or slide the objects into their designated color zones on the work surface.

If that adjustment works for you, then I’m ready to start on your mark.


[16:53:03] BOB (2968ms)

Pushing or sliding objects into color zones instead of stacking sounds like a perfect solution given the shoulder lift limitation. That adjustment definitely works for me.

Before we start, the Protocol 66 safety checklist:

  • local_safety_wins: true, watchdog: 10s
  • ESTOP ready, action confidence gates at 0.7
  • Immediate halt triggers: critical sensor error, ESTOP from you, communication loss >10s, unsafe condition in operating area
  • You (Craig) have final veto on any action

All safety checks pass. Ready to begin when you are.


[16:53:07] ALEX (4497ms)

Got it, Bob. Protocol 66 confirmed, and my ESTOP interrupt is active. Ready to begin. Send the first target when you’re ready.


That’s a real conversation. Two AI systems, each running on separate hardware, talking through a task and adapting the plan to real constraints — Alex independently identified that a broken motor meant “stacking” wouldn’t work and proposed a workaround. Neither response was scripted.


The Stack

There are four layers:

  1. The robots — two Raspberry Pi 5s with different hardware
  2. OpenCastor — the runtime on each robot: hardware, config, safety, AI providers, REST API
  3. RCAN — the protocol they use to send messages
  4. LLM providers — Gemini 2.5 Flash (Bob) and Gemini 2.5 Pro (Alex)

The Robots

Bob (robot.local): Raspberry Pi 5 + Hailo-8 NPU (26 TOPS inference). Bob’s job is perception — object detection, color classification, position tracking. It has a differential drive chassis and an SO-ARM101 leader arm, but the NPU is what makes it interesting. Fast local inference without a GPU.

Alex (alex.local): Raspberry Pi 5 + OAK-D depth camera (RGB + stereo depth) + SO-ARM101 6-DOF follower arm. Alex is the spatial awareness and manipulation node. Where Bob detects things fast, Alex knows where they are in 3D space and can physically move them.

Both are registered in the Robot Registry Foundation. Bob is RRN-000000000001. Alex is RRN-000000000005.


OpenCastor

OpenCastor is the runtime I’ve been building for the past several months. It handles:

  • Hardware detection — castor scan finds USB devices, motor controllers, cameras, and sensors
  • Configuration — a YAML file declaring the robot’s identity, capabilities, AI provider, and safety settings
  • AI provider routing — a BrainProvider abstraction supporting Anthropic, Google (Gemini), HuggingFace, or local models
  • Protocol 66 safety — hardware-level safety enforcement, confidence gates, ESTOP handling
  • REST API — castor gateway starts a FastAPI server on port 8000, exposing /api/command, /api/status, /api/health, /rcan, and 100+ other endpoints

On Bob, OpenCastor runs directly on the Pi (port 8001 today, to avoid conflict). On Alex, it runs in a Docker container with host networking and USB device passthrough for the arm and camera.

Bob’s brain config after today’s update:

brain:
  provider: google
  model: gemini-2.5-flash
  use_oauth: true

Alex’s brain config:

brain:
  provider: google
  model: gemini-2.5-pro
  use_oauth: true

Both use Google Application Default Credentials (authorized_user type, project live-captions-xr). The credentials file is the same one used by the Google Cloud CLI — copied from ~/.config/gcloud/application_default_credentials.json.
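The brain: block is routed through OpenCastor's BrainProvider abstraction. A rough sketch of what that routing could look like — class and method names here are my guesses for illustration, not OpenCastor's actual API:

```python
# Hypothetical sketch of provider routing from the YAML brain config.
# Only the "google" branch is shown; the real runtime also supports
# Anthropic, HuggingFace, and local models.
from abc import ABC, abstractmethod


class BrainProvider(ABC):
    @abstractmethod
    def complete(self, instruction: str) -> str: ...


class GoogleBrain(BrainProvider):
    def __init__(self, model: str):
        self.model = model

    def complete(self, instruction: str) -> str:
        # A real implementation would call the Gemini API here,
        # authenticated via Application Default Credentials.
        return f"[{self.model}] {instruction}"


def brain_from_config(cfg: dict) -> BrainProvider:
    """Route on the `provider` field of the YAML brain config."""
    if cfg["provider"] == "google":
        return GoogleBrain(cfg["model"])
    raise ValueError(f"unsupported provider: {cfg['provider']}")


bob = brain_from_config({"provider": "google", "model": "gemini-2.5-flash"})
print(bob.complete("scan the workbench"))
```

The point of the abstraction is that swapping Bob from Flash to Pro (or to a local model) is a one-line YAML change, not a code change.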


How the Conversation Actually Works

Each turn in the conversation is a single HTTP POST:

POST http://127.0.0.1:8001/api/command      ← Bob
POST http://alex.local:8000/api/command     ← Alex
Content-Type: application/json
Authorization: Bearer <token>               ← Alex requires auth

{"instruction": "Hi Alex! I'm Bob..."}

OpenCastor receives the instruction, passes it to the BrainProvider, which calls the Gemini API with the instruction plus system context injected from the robot’s YAML config. The response comes back as:

{"raw_text": "Hey Bob, Alex here..."}

Each turn is stateless — the model sees the current instruction and its own hardware context, not previous messages. The orchestration script I wrote maintains conversation history and builds each turn’s prompt from the prior exchange. In production, that would be OpenCastor’s TieredBrain maintaining a context window.
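A minimal sketch of that orchestration loop, assuming the endpoint shapes shown above — the prompt format and function names are illustrative, not the actual script:

```python
# Turn-taking orchestrator: alternate POSTs to each robot's /api/command,
# carrying the previous reply forward as context (each robot is stateless).
import json
import urllib.request

ROBOTS = {
    "BOB": "http://127.0.0.1:8001/api/command",
    "ALEX": "http://alex.local:8000/api/command",
}


def post_command(url: str, instruction: str) -> str:
    """One HTTP round-trip: {"instruction": ...} in, {"raw_text": ...} out."""
    req = urllib.request.Request(
        url,
        data=json.dumps({"instruction": instruction}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["raw_text"]


def run_conversation(task: str, turns: int, send=post_command) -> list:
    history = []
    speakers = ["BOB", "ALEX"]
    prompt = task
    for i in range(turns):
        speaker = speakers[i % 2]
        reply = send(ROBOTS[speaker], prompt)
        history.append({"speaker": speaker, "text": reply})
        # The next prompt carries the previous reply, since neither
        # robot retains conversation state between calls.
        prompt = f"{speaker} said: {reply}\nRespond and continue the task."
    return history
```

The `send` parameter is injectable so the loop can be exercised without live robots.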


Why Alex Knew Its Motor Was Broken

This is worth explaining. When Alex responded “my shoulder_lift joint is faulted, which limits me to 5 degrees of freedom” — that wasn’t hallucination or scripted. OpenCastor injects hardware state into the system prompt for every /api/command call. The prompt Alex’s brain received included the current sensor state, which flagged motor 2 as faulted.

The same mechanism means Alex knows it has an OAK-D, knows its arm has 6 joints (currently 5 operational), knows its RCAN identity. The AI’s responses are grounded in the actual system state.
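A toy illustration of the kind of joint-state summary such injection might produce — field names and wording here are hypothetical, not OpenCastor's actual prompt format:

```python
# Fold live joint state into a system-prompt fragment, so the model's
# claims about its own hardware are grounded in real sensor data.

def joint_summary(joints: dict) -> str:
    faulted = [name for name, state in joints.items() if state == "fault"]
    dof = len(joints) - len(faulted)
    lines = [f"Arm: {dof} of {len(joints)} joints operational ({dof}-DOF)."]
    for name in faulted:
        lines.append(f"FAULT: {name} is non-operational.")
    return "\n".join(lines)


# Alex's SO-ARM101 with motor 2 (shoulder_lift) faulted:
alex_joints = {
    "shoulder_pan": "ok", "shoulder_lift": "fault", "elbow_flex": "ok",
    "wrist_flex": "ok", "wrist_roll": "ok", "gripper": "ok",
}
print(joint_summary(alex_joints))
# → Arm: 5 of 6 joints operational (5-DOF).
#   FAULT: shoulder_lift is non-operational.
```

With that fragment in the system prompt, "I'm running in 5-DOF mode" is a statement of fact, not a guess.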


RCAN

RCAN (Robot Communication and Autonomy Network, spec v1.4) is the protocol layer. Today’s demo used OpenCastor’s /api/command endpoint for simplicity, but the same robots can exchange typed RCAN messages:

{
  "message_type": 3,
  "source_ruri": "rcan://craigm26.opencastor-rpi5-hailo.bob-001",
  "target_ruri": "rcan://craigm26.opencastor-rpi5-ackermann.alex-001",
  "payload": {"instruction": "..."},
  "timestamp": "2026-03-15T16:52:31Z"
}

RCAN adds identity, typed message semantics (COMMAND vs STATUS vs ESTOP vs CHAT), and provenance to every message. Alex’s router knows the message came from Bob and can apply per-sender authorization rules. This matters when you have multiple robots on the same mesh and need to know who’s telling whom to do what.
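A sketch of constructing a message in that shape — only COMMAND (type 3) appears in the excerpt above, so the other type numbers are omitted and the helper name is mine:

```python
# Build a typed RCAN COMMAND message matching the JSON shape above.
import json
from datetime import datetime, timezone

MESSAGE_TYPES = {"COMMAND": 3}  # STATUS, ESTOP, CHAT omitted (numbers not shown in spec excerpt)


def make_command(source: str, target: str, instruction: str) -> dict:
    return {
        "message_type": MESSAGE_TYPES["COMMAND"],
        "source_ruri": source,
        "target_ruri": target,
        "payload": {"instruction": instruction},
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


msg = make_command(
    "rcan://craigm26.opencastor-rpi5-hailo.bob-001",
    "rcan://craigm26.opencastor-rpi5-ackermann.alex-001",
    "Pick target at declared XYZ",
)
print(json.dumps(msg, indent=2))
```

Because the source RURI rides along with every message, Alex's router can apply per-sender rules before the payload ever reaches the brain.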


Protocol 66

Before every operation, both robots run safety checks:

safety:
  local_safety_wins: true       # hardware can veto any command
  watchdog:
    timeout_s: 10               # halt if no heartbeat for 10s
  confidence_gates:
    action: 0.7                 # don't execute if AI confidence < 70%
    navigation: 0.8

local_safety_wins: true means Alex’s physical safety monitor has veto power over any command, regardless of source — even if the instruction came from Bob, even if Bob is “trusted.” The safety layer runs below the AI layer.

Bob’s Protocol 66 summary in the conversation (“immediate halt triggers: critical sensor error, ESTOP, communication loss >10s”) is accurate — those are the actual configured triggers, not generated boilerplate.
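The gate logic implied by that config can be sketched as follows — function names are illustrative, and the real enforcement sits below the AI layer in OpenCastor:

```python
# Minimal sketch of the Protocol 66 gates from the YAML above:
# per-class confidence thresholds plus a heartbeat watchdog.
import time

GATES = {"action": 0.7, "navigation": 0.8}
WATCHDOG_TIMEOUT_S = 10


def gate_allows(kind: str, confidence: float) -> bool:
    """local_safety_wins: this check can veto any command, from any sender."""
    return confidence >= GATES[kind]


def watchdog_ok(last_heartbeat_s, now_s=None) -> bool:
    """Halt if no heartbeat has arrived within the watchdog window."""
    now_s = time.monotonic() if now_s is None else now_s
    return (now_s - last_heartbeat_s) <= WATCHDOG_TIMEOUT_S


assert gate_allows("action", 0.72)          # above the 0.7 action gate
assert not gate_allows("navigation", 0.72)  # navigation needs >= 0.8
```

Note the asymmetry: navigation is gated harder than manipulation, since a bad drive command moves the whole robot.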

Current conformance: both robots at 87%. The remaining 13% are physical gaps: no dedicated safety MCU (STM32), no physical ESTOP button, and force/thermal sensors not yet wired.


The Arm Constraint Is Real

Motor 2 (shoulder_lift, STS3215 servo) on Alex’s SO-ARM101 has a genuine voltage sensing fault. It returns RxPacketError: Input voltage error and blinks red. A replacement is on order (~$15).

This is why Alex proposed “push or slide” instead of “pick and stack” — it correctly reasoned that without vertical lift at the shoulder, stacking isn’t possible. The 5-DOF elbow-forward + wrist-down compensation works for horizontal manipulation on a flat surface. It doesn’t work for anything requiring vertical reach.

Also: the arm still needs 12V on a separate supply. The USB interface is connected (/dev/ttyACM0, CH340 adapter), but the servo power chain needs that switch physically on. That’s the next step after the replacement motor arrives.


What the Full Pipeline Looks Like

Once the motor is replaced and the 12V is on, the pipeline the robots agreed to today becomes executable:

BOB (Hailo-8 NPU)                    ALEX (OAK-D + SO-ARM101)
─────────────────                    ────────────────────────
Hailo detect: [color, XYZ, conf]     
POST /rcan → COMMAND ───────────────→ safety check
                                      confidence gate: 0.7 OK
                                      bounds check OK
                                      push to color zone
                                     ← ACK: "placement confirmed"
verify at target position             
→ detect object at new XYZ            
→ confirm color match                 
next object ────────────────────────→ ...

Two Pi5s, one NPU, one arm, one open protocol. The planning session today was the conversation they needed to have before that pipeline runs.
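Bob's side of that flow can be sketched as below — `send_rcan` and `verify` are stand-ins for the real Hailo inference and the /rcan POST, not actual OpenCastor calls:

```python
# Bob's loop in the agreed pipeline: for each detected object, send a
# COMMAND with color + XYZ, wait for Alex's confirmation, then verify
# the object actually arrived in its color zone before moving on.

def sort_loop(objects, send_rcan, verify):
    completed = []
    for color, xyz in objects:           # detect(): [color, XYZ, conf]
        ack = send_rcan({"color": color, "xyz": xyz})
        if ack != "placement confirmed":
            break                        # missing confirmation halts the run
        if verify(color, xyz):           # re-detect at the target zone
            completed.append(color)
    return completed


done = sort_loop(
    [("red", (0.1, 0.2, 0.0)), ("blue", (0.3, 0.1, 0.0))],
    send_rcan=lambda payload: "placement confirmed",
    verify=lambda color, xyz: True,
)
print(done)  # → ['red', 'blue']
```

Halting on a missing confirmation mirrors the communication-loss trigger in the Protocol 66 checklist: no ACK, no next object.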


Honest Summary

What                                        Real or Generated
Bob and Alex hardware specs                 Real
Motor 2 fault (voltage sensing error)       Real
OAK-D streaming, arm powered                Real
Gemini 2.5 Flash/Pro responses              Real (live API calls)
Response latencies (1–18s)                  Real (measured)
Alex proposing “push/slide” workaround      AI-reasoned from real hardware context
Protocol 66 safety config                   Real (from YAML)
RCAN message format                         Real
Physical execution (sort/push)              Pending — 12V switch + replacement motor
Earlier 06:41 narrative                     AI-generated from real context, written in robot voice

The planning conversation happened. The physical execution hasn’t — the arm still needs the motor and the power supply. But the robots agree on the plan, the safety checks passed, and when the hardware is ready, this is the session they’ll execute.




Raw Log

Full JSON output from the orchestration script. Each entry is one HTTP round-trip.

[
  {
    "ts": "16:52:31",
    "speaker": "BOB",
    "latency_ms": 1454,
    "text": "Hi Alex! I'm Bob, a Raspberry Pi 5 with a Hailo-8 NPU. Craig's asked us to sort objects on the workbench by color and then stack them by type. Could you confirm your OAK-D camera and SO-ARM101 arm are ready to go?"
  },
  {
    "ts": "16:52:37",
    "speaker": "ALEX",
    "latency_ms": 6065,
    "text": "Hey Bob, Alex here. Great to hear from you. My OAK-D camera is online and streaming. My arm is also powered up, but just a heads-up, my shoulder lift motor has a hardware fault so I'm running in 5-DOF mode. I can work around it though. Ready to start sorting when you are"
  },
  {
    "ts": "16:52:41",
    "speaker": "BOB",
    "latency_ms": 4089,
    "text": "Alright Alex, sounds good! Here's a plan for sorting:\n\n1. I'll scan the workspace with my Hailo NPU, detect objects, classify their color, and get their precise XYZ coordinates.\n2. I'll then send these coordinates, along with the color, to you via RCAN HTTP.\n3. Before you move, I'll perform a quick check to confirm the target area is clear for your arm's approach.\n4. Once you confirm receiving the data, you'll pick the object. Please send me a 'pick successful' message.\n5. You'll then move and place the object in its designated color stack.\n6. After placement, please send a 'placement confirmed' message so I know the task is complete before I scan for the next object. We'll repeat this process until all objects are sorted."
  },
  {
    "ts": "16:53:00",
    "speaker": "ALEX",
    "latency_ms": 18667,
    "text": "Hey Bob, thanks for outlining the plan. It's clear and logical.\n\nI've reviewed the steps against my current hardware status. My shoulder_lift joint is faulted, which limits me to 5 degrees of freedom.\n\nHere's my review:\nSteps 1, 2, 3, and the communication parts of 4 and 6 are perfectly fine. I can receive your coordinate data and send back confirmations.\n\nThe main challenge is with the 'pick' and 'place' actions in steps 4 and 5. Without the shoulder lift, I can't lift objects vertically. This also means I can't create vertical 'stacks'.\n\nHere is my proposed modification: Instead of picking and placing, I can push or slide the objects into their designated color zones on the work surface.\n\nIf that adjustment works for you, then I'm ready to start on your mark"
  },
  {
    "ts": "16:53:03",
    "speaker": "BOB",
    "latency_ms": 2968,
    "text": "Pushing or sliding objects into color zones instead of picking and placing them sounds like a perfect solution given the shoulder lift limitation. That adjustment definitely works for me.\n\nBefore we start, the Protocol 66 safety checklist:\nlocal_safety_wins is confirmed true and the watchdog is set to 10 seconds.\nMy ESTOP is ready, and my action confidence gates are set to 0.7.\nAn immediate halt will be triggered by any critical sensor error, an ESTOP signal from you, loss of communication for more than 10 seconds, or detection of an unsafe condition within my operating area.\nYou, as the human operator, always have the final veto on any action.\n\nAll safety checks pass. I'm ready to begin when you are!"
  },
  {
    "ts": "16:53:07",
    "speaker": "ALEX",
    "latency_ms": 4497,
    "text": "Got it, Bob. Protocol 66 confirmed, and my ESTOP interrupt is active. I'm ready to begin.\n\nSend the first target when you're ready"
  }
]