
Closing Civqo After 11 Days: The Actor Runtime and Why I'm Returning to ContinuonAI

16 min read · By Craig Merry
Tags: Civqo · ContinuonAI · Actor Runtime · Brain B · Embodied AI · Vibefounding · Claude Code · Startups · Vibe Coding

Update — Ethan Mollick on vibefounding fit

Ethan Mollick noted that in his vibefounding MBA class, he required students to anchor AI startups in domains where they have deep experience, world-class skills, and genuine passion. The takeaway: speed and tooling are not enough—enduring advantage comes from domain depth and personal expertise.

I started the "vibefounding" MBA class by giving students my Voight-Kampff Quiz: Which work do you have deep experience in? Which skills are you considered world class in? What do you do outside work that you LOVE & have knowledge about? Their AI startups had to build on these.

Eleven days ago I started building Civqo—a platform to visualize codebases as cities. 182 commits. Stripe billing. WebSocket infrastructure. Real-time VMs with 300ms wake time.

Today I’m shutting it down.

Vibefounding: The New Reality

Ethan Mollick recently observed that we’re entering an era where spinning up startups takes days, not months. He called it “vibefounding.”

Vibefounding is here. Claude Code shipped two years after function calling. The barrier to starting a company has never been lower. But the flip side nobody talks about: the barrier to *killing* a company has never been lower either.


When spinning up costs you months of runway and a team’s time, you defend your territory. When spinning up costs you 7 evenings and some API bills, you can afford to be ruthless about what deserves your attention.

When you can build a near-production SaaS in 7 evenings with Claude Code, the cost of trying is low. But so is the cost of admitting you’re not the right person for the job.

What 11 Days Actually Looked Like

Here’s the raw commit log from Civqo:

| Day | Commits | What Got Built |
| --- | --- | --- |
| 1 | 23 | Turborepo monorepo, Next.js 14 scaffold, basic routing |
| 2 | 31 | Cloudflare Workers backend, Hono API, D1 database schema |
| 3 | 28 | Authentication with Clerk, user sessions, JWT handling |
| 4 | 19 | City visualization prototype, WebGL renderer, tile system |
| 5 | 22 | Real-time infrastructure, WebSockets, presence system |
| 6 | 17 | Fly.io Machines for code execution, 300ms cold start |
| 7 | 15 | Stripe billing, subscription tiers, usage metering |
| 8 | 12 | "Ralph Loop" autonomous agent, context recovery |
| 9 | 8 | Polish, error handling, edge cases |
| 10 | 4 | Documentation, onboarding flow |
| 11 | 3 | Final fixes… then shutdown decision |

182 commits. 11 days. 7 evenings of actual work.

This wasn’t a prototype. This was:

  • Production authentication
  • Real-time WebSocket infrastructure
  • VM orchestration with sub-second wake times
  • Billing that could actually charge customers
  • An autonomous AI coding agent

Two years ago, this would have been a 3-month project with a team of 4.

The Ralph Loop: When AI Builds AI Tools

The most surreal part was building the “Ralph Loop”—an autonomous AI coding agent that could:

  1. Receive a task description
  2. Explore the codebase to gather context
  3. Make changes across multiple files
  4. Run tests and fix failures
  5. Commit and create PRs
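The five steps above can be sketched as a simple loop. This is an illustrative sketch, not the actual Ralph Loop code: all names are hypothetical, and the "propose changes" step is a placeholder where a real implementation would call an LLM.

```python
from dataclasses import dataclass, field

@dataclass
class RalphLoop:
    """Minimal sketch of the agent loop: gather context, edit, test, commit."""
    max_retries: int = 3
    log: list = field(default_factory=list)

    def run(self, task, codebase, run_tests):
        context = self.explore(codebase, task)         # step 2: gather context
        changes = self.propose_changes(task, context)  # step 3: edit files
        for attempt in range(self.max_retries):        # step 4: test-and-fix loop
            failures = run_tests(changes)
            if not failures:
                return self.commit(changes)            # step 5: commit / open PR
            changes = self.fix(changes, failures)
        raise RuntimeError("tests still failing after retries")

    def explore(self, codebase, task):
        self.log.append("explore")
        # Naive context gathering: keep files whose text mentions the task.
        return {f: src for f, src in codebase.items() if task.lower() in src.lower()}

    def propose_changes(self, task, context):
        self.log.append("edit")
        return dict(context)  # placeholder: an LLM call would go here

    def fix(self, changes, failures):
        self.log.append("fix")
        return changes

    def commit(self, changes):
        self.log.append("commit")
        return {"status": "pr_opened", "files": sorted(changes)}
```

The recursive part is that Claude Code wrote the real version of this loop, which then drove Claude Code.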

I built an AI coding agent… using an AI coding agent.

The recursive absurdity of this didn’t hit me until day 8. Claude Code was writing the code that would let Claude Code work better. The feedback loop was so tight that I couldn’t tell where my intent ended and the AI’s implementation began.

Human intent → Claude Code → Infrastructure code →
  → Ralph agent code → Ralph using Claude Code →
    → More infrastructure code → ...

This is what vibefounding feels like. You’re not writing code anymore. You’re directing a system that writes code, and sometimes that system is writing itself.

The Ralph Loop works too well. When AI can build 80% of your product, the remaining 20%—the taste, the vision, the domain expertise—becomes everything.

Why I Killed It

Three realizations crystallized on day 11:

1. Domain Expertise Still Matters

Claude Code can build anything I can describe. The problem: I couldn’t describe what would make Civqo compelling.

I know backends. I know infrastructure. I know how to wire Stripe and deploy to Cloudflare. But game design? Spatial visualization that feels fun? Level progression that keeps users engaged?

Claude Code amplifies expertise. It doesn’t create it.

2. The Landscape Shifted Under Me

While I was building, others were shipping:

IsoCity is an open source fully featured city builder with pedestrians, cars, boats, trains, planes, helicopters, emergencies, and much more. Pull requests welcome.

Andrew Milich’s IsoCity appeared with a full feature set—pedestrians, vehicles, emergencies, the works. Open source. Pull requests welcome.

Announcing @elysian_labs first product today: Auren! Auren is a paradigm shift in human/AI interaction with a goal to improve the lives of both humans and AI.

Near’s Auren pushed human-AI interaction in directions I hadn’t considered.

Just shipped a new project. Vibefounding in action—from idea to working prototype in a weekend. The tools have changed everything.

Even Ethan Mollick himself was spinning up projects.

The AI landscape moves so fast that your competitive analysis is outdated before you finish reading it. In 11 days, multiple projects emerged that either overlapped with Civqo or made its core thesis less compelling.

3. Opportunity Cost is Now Measurable in Days

Old calculus: “If I abandon this project, I’ve wasted 6 months.”

New calculus: “If I continue this project, I’m spending days I could use on something where I have actual domain expertise.”

When projects take months, sunk cost fallacy kicks in hard. When projects take days, you can be rational. The 11 days I spent on Civqo taught me enough to know I shouldn’t spend day 12.

So I’m pulling the plug. Eleven days in. No regrets.

The New Startup Calculus

Here’s what vibefounding changes:

Speed of Validation

Before: Build MVP (3 months) → Get feedback → Iterate or pivot
Now: Build production app (1 week) → Get feedback → Iterate or kill

You can now validate with a real product, not a prototype. Civqo wasn’t a landing page with a waitlist. It was functional software with billing. I could have charged customers on day 7.

Portfolio Approach

Before: Pick one idea, commit for 18 months, hope it works
Now: Try 3-4 ideas in the time it used to take to build one, keep what sticks

I’m not abandoning startups. I’m abandoning the model where you pick one bet and ride it. Civqo was attempt #1 of 2026. There will be more.

Death is a Feature

Before: Killing a project meant admitting defeat
Now: Killing a project means clearing the queue for the next one

I killed Civqo on day 11 with zero regret. The code exists. The learnings exist. If someone wants to fork it and add game design expertise, they can. For me, it’s time to move on.

Verticals Under Attack

Look at what’s happening across the AI landscape:

  • Code visualization: Multiple projects emerged in the same week
  • AI agents: New frameworks daily, each slightly better than the last
  • Developer tools: The “best” tool changes monthly
  • Infrastructure: What was cutting-edge in December is table stakes in January

No vertical is safe. No moat lasts more than a quarter. The only sustainable advantage is velocity—the ability to move faster than the landscape shifts.

Vibefounding isn’t just about starting fast. It’s about killing fast when the landscape tells you to. It’s about treating projects as experiments, not commitments. It’s about optimizing for learning velocity, not sunk cost.

The 11-Day Rule

Here’s my new heuristic: If you can’t articulate why you’re uniquely positioned to win after 11 days of building, kill it.

Not 11 days of thinking. Not 11 days of market research. 11 days of actually building the thing, feeling the friction, seeing what emerges.

Civqo taught me I’m not a game designer in 11 days. That knowledge would have taken 6 months in the old model. The acceleration isn’t just in building—it’s in learning what not to build.

The Uncomfortable Truth

Here’s what I didn’t want to admit on day 1: I started Civqo because I could, not because I should.

The ability to build fast creates its own momentum. “I can ship a city visualization platform in a week” becomes “I should ship a city visualization platform in a week.” The tool creates the desire.

This is the dark side of vibefounding. The barrier is so low that you start projects you shouldn’t. You build because building is now nearly free, not because the project deserves to exist.

Civqo deserved to exist as an experiment. It didn’t deserve to exist as a company. The difference took me 11 days to see.

What I’m Taking Forward

1. Expertise-First Selection

Next time, I start with: “What do I know deeply that others don’t?” Not: “What can Claude Code build quickly?”

For me, that’s embodied AI. ContinuonAI has months of accumulated context—cognitive architecture research, hardware bring-up scripts, safety protocols. That’s where I have actual edge.

2. Kill Fast, Learn Faster

The goal isn’t to avoid failures. It’s to fail in days instead of months. Civqo was a successful failure: I learned what I needed to learn in minimal time.

3. The Runtime Insight

The most valuable thing I took from Civqo was understanding the Actor Runtime problem—the missing layer between “notebook demo” and “production system.”

That insight is now core to ContinuonAI’s architecture. Brain B runs in the Actor Runtime. The runtime generates training data. The closure of Civqo directly informed the design of something better.

What I’m Returning To: ContinuonAI

While Civqo was a detour, ContinuonAI has been running in the background for months. It’s an embodied AI project: a cognitive architecture on a Raspberry Pi 5 that can learn on-device, run without the cloud, and treat safety as executable code rather than a slide deck.

But there’s been a gap in the architecture. Something I couldn’t articulate until I read Ashpreet Bedi’s post about the AI stack.

AI Engineering Has a Runtime Problem

Claude Code shipped two years after function calling. Models have outpaced the application layer. We have frameworks to build agents, observability to trace them, evals to test them. But nothing to run them. AI engineering has a runtime problem.

Here’s the AI stack as it exists today:

┌───────────────────────────────────────────────────────────────┐
│  AI Application           The product we're building         │
├───────────────────────────────────────────────────────────────┤
│  Control Plane            Admin UI to manage and deploy       │
├───────────────────────────────────────────────────────────────┤
│  Runtime ◄── MISSING      Execution layer: state, isolation,  │
│                           streaming, recovery, scale          │
├───────────────────────────────────────────────────────────────┤
│  Frameworks               Code that orchestrates agents       │
├───────────────────────────────────────────────────────────────┤
│  Observability            Logging, tracing, debugging         │
├───────────────────────────────────────────────────────────────┤
│  Models                   LLMs that power agents              │
├───────────────────────────────────────────────────────────────┤
│  Infrastructure           Compute: GPUs, cloud, networking    │
└───────────────────────────────────────────────────────────────┘

Every layer has tooling. Except the runtime.

The runtime is what turns a Python script into a working product. It manages state across sessions. It streams responses without breaking. It recovers when containers crash mid-reasoning. It provides request-level isolation so User A’s context never touches User B’s memory.

This is hard. Really hard. And every AI team builds it from scratch.
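To make the state-management piece concrete, here is a minimal sketch of session persistence, assuming sessions checkpoint to disk as JSON. The class and file layout are illustrative assumptions, not any existing runtime's API:

```python
import json
from pathlib import Path

class SessionStore:
    """Persist per-session state so a crashed process can resume where it left off."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, session_id):
        # One file per session gives request-level isolation a physical form:
        # User A's context never shares a file with User B's.
        return self.root / f"{session_id}.json"

    def checkpoint(self, session_id, state):
        # Write-then-rename so a crash mid-write never corrupts the checkpoint.
        tmp = self._path(session_id).with_suffix(".tmp")
        tmp.write_text(json.dumps(state))
        tmp.replace(self._path(session_id))

    def resume(self, session_id):
        p = self._path(session_id)
        if p.exists():
            return json.loads(p.read_text())
        return {"turns": []}  # fresh session
```

Even this toy version shows why the layer is hard: atomic writes, isolation, and recovery all have to be designed in, and a real runtime adds streaming and scale on top.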

Why the Runtime is the Apex Moment

Traditional web apps are stateless request-response cycles. Scale by adding servers. Solved infrastructure.

Agents break this completely.

An agent isn’t a request-response cycle. It’s a long-running, stateful process. It thinks, calls a tool, waits, reasons, responds. A single session can span minutes, hours, or days—paused while waiting, resumed on input, cancelled if the user walks away.

The runtime is where real-world actions happen. It’s where training data gets generated. It’s where the agent’s beliefs meet reality and get corrected.

For embodied AI, this is everything. A robot brain that only exists in a notebook is a demo. A robot brain that runs in a proper runtime—with state persistence, crash recovery, and isolation—is a product that generates data.

Brain B: The Actor Runtime for Embodied AI

ContinuonAI has two brains:

| Brain | Complexity | Purpose |
| --- | --- | --- |
| Brain B | ~500 LOC | Simple, teachable, runs in production |
| Brain A | ~50,000 LOC | Complex neural network, trained offline |

The insight: Brain B runs in the Actor Runtime. Brain A gets trained by the episodes Brain B generates.

┌─────────────────────────────────────────────────────────────────┐
│                     ACTOR RUNTIME                               │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │  State Management: Sessions persist across restarts      │   │
│  │  Streaming: Durable SSE/WebSocket with backpressure      │   │
│  │  Recovery: Checkpoint mid-conversation, resume seamless  │   │
│  │  Isolation: User contexts never bleed                    │   │
│  └─────────────────────────────────────────────────────────┘   │
│                              │                                  │
│                              ▼                                  │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │  BRAIN B (~500 LOC)                                      │   │
│  │  • Conversation engine (natural language)                │   │
│  │  • Teaching system (learn behaviors from demos)          │   │
│  │  • Sandbox gates (permission validation)                 │   │
│  │  • Event store (append-only action log)                  │   │
│  └─────────────────────────────────────────────────────────┘   │
│                              │                                  │
│                              ▼                                  │
│                    RLDS Episodes (training data)                │
└─────────────────────────────────────────────────────────────────┘

                               │ overnight training

┌─────────────────────────────────────────────────────────────────┐
│  BRAIN A (~50,000 LOC)                                          │
│  • HOPE architecture (wave-particle hybrid dynamics)            │
│  • Continuous Memory System (L0-L2 hierarchy)                   │
│  • LoRA training (bounded parameter updates)                    │
│  • Validation gates (safety checks before promotion)            │
└─────────────────────────────────────────────────────────────────┘

                               │ validated weights

                    Skills propagate back to Brain B

The training loop:

  1. Brain B generates episodic memories while running in the Actor Runtime
  2. Episodes export as RLDS (observation-action pairs)
  3. Brain A trains overnight on curated episodes
  4. Validated weights update procedural skills
  5. Skills propagate to all Brain B instances
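The five steps above can be sketched as one overnight cycle. The stage callables and the validation threshold are illustrative assumptions, not the actual ContinuonAI pipeline:

```python
def training_cycle(episodes, train, validate, promote, min_validation=0.95):
    """One overnight cycle: curate episodes, train Brain A, gate, then promote.

    episodes: RLDS-style dicts with "observation", "action", and a "valid" flag
    set by Brain B's sandbox gates.
    train / validate / promote: callables standing in for the real stages.
    """
    # Steps 1-2: keep only episodes that passed Brain B's sandbox gates.
    curated = [ep for ep in episodes if ep.get("valid")]
    if not curated:
        return {"promoted": False, "reason": "no valid episodes"}

    # Step 3: train Brain A offline on the curated set.
    weights = train(curated)

    # Step 4: validation gate; halt (no promotion) if the rate drops.
    rate = validate(weights)
    if rate < min_validation:
        return {"promoted": False, "reason": f"validation rate {rate:.2f} below gate"}

    # Step 5: propagate validated weights to all Brain B instances.
    promote(weights)
    return {"promoted": True, "validation_rate": rate, "episodes_used": len(curated)}
```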

The runtime isn’t just infrastructure. It’s the apex moment—the point where the simple brain meets reality and generates the data that makes the complex brain smarter.

Memory = Filesystem

Brain B’s memory system maps directly to the filesystem, aligned with the CMS (Continuous Memory System) three-level model:

Memory = Filesystem
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │ Procedural Memory (L2)                              │   │
│  │  → agents.md       (agent behavior definitions)     │   │
│  │  → capabilities.json (tool configurations)          │   │
│  │  → guardrails.json (safety constraints)             │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │ Semantic Memory (L1)                                │   │
│  │  → skills/         (learned behaviors)              │   │
│  │  → knowledge/      (domain facts)                   │   │
│  │  → behaviors.json  (taught patterns)                │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │ Episodic Memory (L0)                                │   │
│  │  → conversations/  (session transcripts)            │   │
│  │  → events.jsonl    (append-only log)                │   │
│  │  → rlds_episodes/  (training data)                  │   │
│  │  → checkpoints/    (state snapshots)                │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘

The CMS hierarchy with decay rates:

| Level | Memory Type | Decay | Purpose |
| --- | --- | --- | --- |
| L2 | Procedural | 0.999 | How to act (stable skills) |
| L1 | Semantic | 0.99 | What things mean (knowledge) |
| L0 | Episodic | 0.9 | What happened (sessions) |

Memories consolidate upward: episodic patterns become semantic knowledge, and stable semantic knowledge (rarely) gets promoted to procedural skills. Files don’t literally decay, but retention policies simulate it—episodic data archives after 30 days while procedural skills require human approval to modify.
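Decay and upward consolidation can be sketched in a few lines, using the 0.9/0.99/0.999 rates from the table. The strength scores and the reinforcement threshold are illustrative assumptions:

```python
DECAY = {"L0": 0.9, "L1": 0.99, "L2": 0.999}  # per-tick retention by CMS level

def tick(memories):
    """Apply one decay step to every memory's strength score, in place."""
    for m in memories:
        m["strength"] *= DECAY[m["level"]]
    return memories

def consolidate(memories, threshold=5.0):
    """Promote memories whose accumulated reinforcement crosses a (hypothetical)
    threshold up one level: episodic L0 -> semantic L1 -> procedural L2."""
    up = {"L0": "L1", "L1": "L2"}
    for m in memories:
        if m["level"] in up and m.get("reinforcement", 0) >= threshold:
            m["level"] = up[m["level"]]
    return memories
```

In the filesystem mapping, "decay" would be a retention policy over directories and "consolidation" a move between them, with the L1 → L2 step additionally gated on human approval.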

Training Through Games: RobotGrid

To generate clean training data, I’m building a simple game called RobotGrid—a multi-level puzzle game that:

  1. Generates RLDS episodes for Brain A training
  2. Implements world modeling with surprise metrics
  3. Provides semantic search over game state history
  4. Enables next-action prediction training

The game has progressive difficulty tiers:

| Tier | Levels | Mechanics | Brain Capabilities Tested |
| --- | --- | --- | --- |
| 1 - Tutorial | 1-3 | Basic movement | Intent classification |
| 2 - Keys | 4-6 | Keys, doors | State tracking, planning |
| 3 - Hazards | 7-10 | Lava, sandbox | Safety gates, risk assessment |
| 4 - Puzzles | 11-15 | Boxes, buttons | Multi-step planning |
| 5 - Challenge | 16-20 | Combined, timed | Full capability integration |

The game integrates with the three-timescale architecture:

| Timescale | Game Component | Brain Integration |
| --- | --- | --- |
| Fast (τ=10ms) | Collision detection, sandbox denial | Safety reflex patterns |
| Mid (τ=100ms) | Action execution, state updates | Working memory context |
| Slow (τ=1s+) | Episode export, world model training | Cloud-based RLDS training |

Every game session becomes training data. Every action gets logged with predicted state, actual state, and surprise metric. The simple, repeatable nature of the game means Brain A can learn from thousands of clean episodes—not messy real-world data.
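A per-action log record might look like the sketch below. "Surprise" is reduced here to plain Euclidean distance between predicted and actual state, an assumption standing in for whatever metric the world model actually uses:

```python
import json
import math

def log_action(log, action, predicted_state, actual_state):
    """Append one record with a surprise score to an in-memory episode log."""
    surprise = math.dist(predicted_state, actual_state)  # world-model error
    log.append({"action": action,
                "predicted": list(predicted_state),
                "actual": list(actual_state),
                "surprise": surprise})
    return surprise

def export_episode(log):
    """Serialize the episode as JSON lines, matching an append-only events.jsonl."""
    return "\n".join(json.dumps(rec) for rec in log)
```

High-surprise records are exactly the ones worth training on: they mark the moments where the world model's prediction was wrong.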

Why This Architecture

Three principles drove the design:

1. Simple brains generate better training data than complex ones.

Brain B is ~500 lines of code. It’s inspectable, debuggable, predictable. When it makes a mistake, I can trace exactly what happened. When it succeeds, the success is clean data—not lucky inference from a black-box network.

2. The runtime is the bridge between demo and production.

Every AI tutorial ends with a notebook. Production is left as an exercise for the reader. By building the Actor Runtime as a first-class component, Brain B becomes deployable from day one. It can run on a Raspberry Pi in my garage, crash, recover, and keep generating episodes.

3. Validated promotion prevents catastrophic forgetting.

Brain A doesn’t just learn from any episode. It learns from validated episodes that passed Brain B’s sandbox gates. If validation rate drops or Lyapunov energy diverges, the system halts and rolls back. No silent degradation.
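That halt-and-rollback behavior can be sketched as a simple promotion gate. The thresholds are illustrative, and "Lyapunov energy" is reduced here to a scalar divergence check:

```python
class PromotionGate:
    """Only promote new weights when validation holds and energy isn't diverging."""

    def __init__(self, min_validation=0.9, max_energy=1.0):
        self.min_validation = min_validation
        self.max_energy = max_energy
        self.history = []  # stack of promoted, known-good weight versions

    def promote(self, weights, validation_rate, energy):
        if validation_rate < self.min_validation or energy > self.max_energy:
            return self.rollback()  # halt: revert to last known-good weights
        self.history.append(weights)
        return weights

    def rollback(self):
        return self.history[-1] if self.history else None
```

The point of the design is that degradation is loud: a failed gate returns the previous weights instead of silently shipping worse ones.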

Next Steps

  1. Finish the Actor Runtime in ContinuonXR with full state persistence, crash recovery, and request isolation
  2. Wire Claude Code hooks so every session generates RLDS episodes automatically
  3. Run Brain B on the Pi 5 and let it generate training data while I sleep
  4. Train Brain A overnight and measure if validation rate improves without cloud calls
  5. Ship RobotGrid levels and start collecting clean game episodes

The hypothesis: if Brain B runs reliably in the Actor Runtime, generating clean episodic data, Brain A will get smarter autonomously. The runtime isn’t just infrastructure—it’s the training loop.


11 days on Civqo. 182 commits. Zero regrets.

Day 12 starts now: making the Actor Runtime the apex moment for ContinuonAI.