Heat Threshold at the Google I/O hackathon — overbuilt a lot of things

The Cerebral Valley × Google DeepMind I/O hackathon ran 2026-05-23 at Shack15, in the Ferry Building atrium on the Embarcadero in San Francisco. Doors opened at 9:00 AM PT; the build window ran 10:30 AM to 5:00 PM PT, with the video and form submission due by 5:00 PM. There were 152 total submissions. I shipped a new organization (HeatThreshold) and a new repository (HeatThreshold/HeatThreshold): 100% new code, no carry-over from prior heat-safety work. My own one-line summary, posted to X under @CraigMerry at the end of the day, is the right opening for this post: “Overbuilt a lot of things — ‘it should just work’ isn’t really here. Proud but disappointed.”

Composition + verification window: 2026-05-24, Pacific. Repository state, commit timestamps, live-URL HTTP codes, the README placeholder strings, and the live API key state were pinned at this window (see Sources). Times below are translated from UTC to Pacific. X-thread quotes are reproduced verbatim from screenshots in my own camera roll; my @CraigMerry handle is public.
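For readers who want to re-check the UTC-to-Pacific translation, here is a minimal sketch. The helper name `toPacific` is my own for illustration and is not from the repo; the sample timestamp is the repo-creation time listed under Sources.

```typescript
// Convert a UTC ISO timestamp (the format the GitHub API returns) to Pacific
// wall-clock time. Intl.DateTimeFormat applies the PDT/PST offset for the
// given date, so no manual -7/-8 arithmetic is needed.
function toPacific(isoUtc: string): string {
  const fmt = new Intl.DateTimeFormat("en-US", {
    timeZone: "America/Los_Angeles",
    hour: "2-digit",
    minute: "2-digit",
    hourCycle: "h23", // 24-hour clock, 00 through 23
  });
  return fmt.format(new Date(isoUtc));
}

console.log(toPacific("2026-05-23T17:30:17Z")); // 10:30
```

Late May falls in daylight time, so the offset applied here is UTC-7.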

Saturday morning — AI Studio to a working dashboard in about an hour

9:07 AM PT, outside Shack15 (X post timestamp): “Waiting to get into Shack15 at Embarcadero in SF. Google I/O 2026 hackathon will kick off at 10:30. Repo will be public and hope to share some more pictures throughout the day.” I’m in line for the Ferry Building entrance; the Antigravity logo — two stacked white triangles — is mounted behind the check-in counter inside.

10:30 AM — kick-off. I open Google AI Studio first, not Antigravity. The model in the right-hand sidebar is “Antigravity Agent Preview” running antigravity-preview-05-2026 — “A general-purpose autonomous agent running in a remote, Google-hosted Linux environment.” I draft the system instruction directly in AI Studio: “You are Threshold, an environmental scheduling agent. Given a location, an outdoor activity, and a target time, you decide whether the conditions are suitable…” — that single prompt becomes SUBAGENT_SPECS.synthesis.systemInstruction later in the day. Tools on the run: Code execution off, Grounding with Google Search on, URL context on, Filesystem tools off, Network 1 rule.

I post a second time around 11:00 AM PT — “My mind is already blown with Google’s AI Studio with Gemini 3.5 Flash. Within 30 minutes, it got to a usable state: Almost time to export to Antigravity 2.0 IDE/CLI with some MCP tooling.” — attached to a screenshot of the Threshold dashboard, header timestamped May 23, 2026 11:41:33 AM PDT, “Live Google Grounded Plan V3.5”, 3 Sub-Agents Active. SF Ferry Building / Biking with Coit Tower climb / 14:30 → GO NOW · WHITE · 56°F WBGT, with a [WeatherSubAgent] duration 380ms · Parsed NWS forecast showing 59°F dry bulb, 81% relative humidity, generating 55.6°F peak line visible in the trace log. The “V3.5” version number on the header is the AI Studio iteration count, not a marketing tag — by the time the dashboard renders for the camera, I’ve already gone through three full prompt-tune-render cycles in AI Studio.

The first git push to HeatThreshold/HeatThreshold lands at 12:28 PM PT. AI Studio carried the first two hours; Antigravity 2.0 IDE on Windows carried the rest.

What Heat Threshold became

By 5:00 PM the V3.5 prototype had grown into a graph of ten sub-agents — three uploaded once as persistent Managed Agents (threshold-location-subagent, threshold-place-subagent, threshold-synthesis-subagent), one Weather sub-agent that cascades NWS → Open-Meteo → simulated, and six deterministic sub-agents (RouteDirections, RouteOptimization, NavigationArrows, SunPath, RefugeBreak, StreetViewPano). The orchestrator runs the Weather + Place pair in parallel via Promise.all and serializes into Synthesis. Output is a go / delay / alternate verdict grounded in Stull (2011) wet-bulb math and the USMC Order 6200.1E heat-flag table (white / green / yellow / red / black).
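The verdict math can be sketched roughly as follows. This is an illustration under assumptions, not the repo's code: `stullWetBulbC` uses the published Stull (2011) empirical fit for wet-bulb temperature from air temperature and relative humidity, and `flagFor` uses the commonly cited flag bands in °F (green from 80, yellow from 85, red from 88, black from 90, white below that). The function names are hypothetical, and how the repo turns a wet-bulb value into the index it compares against the flag table is not shown in this post.

```typescript
type Flag = "white" | "green" | "yellow" | "red" | "black";

// Stull (2011) empirical wet-bulb fit.
// Inputs: air temperature in degrees C, relative humidity in percent.
function stullWetBulbC(tempC: number, rhPct: number): number {
  return (
    tempC * Math.atan(0.151977 * Math.sqrt(rhPct + 8.313659)) +
    Math.atan(tempC + rhPct) -
    Math.atan(rhPct - 1.676331) +
    0.00391838 * Math.pow(rhPct, 1.5) * Math.atan(0.023101 * rhPct) -
    4.686035
  );
}

const cToF = (c: number): number => (c * 9) / 5 + 32;
const fToC = (f: number): number => ((f - 32) * 5) / 9;

// Map a heat index in degrees F to a flag colour (assumed band edges, see lead-in).
function flagFor(indexF: number): Flag {
  if (indexF >= 90) return "black";
  if (indexF >= 88) return "red";
  if (indexF >= 85) return "yellow";
  if (indexF >= 80) return "green";
  return "white";
}

// The morning trace line: 59 °F dry bulb at 81% relative humidity.
const wetBulbF = cToF(stullWetBulbC(fToC(59), 81));
console.log(wetBulbF.toFixed(1), flagFor(wetBulbF));
```

With those inputs the fit lands in the mid-50s °F, well inside the white band. It will not necessarily reproduce the dashboard's 55.6°F figure, which may come from a different forecast hour or a fuller calculation.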

Same plan, three render surfaces:

2D bento dashboard. Verdict card, three demo presets (SF, Zilker, Hyde Park), agent activity trace log, Live Watch toggle.
WebXR Spatial HUD. Google Photorealistic 3D Tiles floor, NOAA-derived sun-shade overlay, ground arrows, glass info panels, IMU head-tracking, two modes (cinematic preview + live activity). I tested live-activity mode on the iPhone walking back along the Embarcadero — GPS locked ±3m, route polyline projected over the actual sidewalk in 3D Tiles with the GT: 56°F (WHITE Flag Rated) panel pinned to the route. That moment was the high point of the day, technically.
Voice Mode. Bidirectional WebSocket to a Gemini Live-capable preview model (gemini-2.5-flash-preview-native-audio-dialog by default) that calls runThresholdPlan as a tool. The browser proxies that to /api/plan, the same ten-sub-agent graph runs, and the Live model speaks the verdict back. Live API doesn’t run on Gemini 3.5 Flash yet, so the bridge is the integration.

Plus an observability stack — PlatAtlas for span recording, McpTape for run persistence (Vercel Blob in production, filesystem fallback), McpReplay at GET /api/replay/:runId for deterministic playback — and an Antigravity CLI skill (scripts/threshold-plan.ts) that drives the production endpoint from the developer terminal.
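The record-then-replay idea behind McpTape and McpReplay can be sketched in a few lines. Everything below is hypothetical: the `TapeEntry` shape and the `Tape` and `Replay` classes are illustrative stand-ins, since the real types and storage format are not shown in this post.

```typescript
// Hypothetical shapes for illustration only.
interface TapeEntry {
  step: string;
  input: unknown;
  output: unknown;
}

class Tape {
  private entries: TapeEntry[] = [];

  // Record mode: run the live call and remember what it returned.
  async record<T>(step: string, input: unknown, call: () => Promise<T>): Promise<T> {
    const output = await call();
    this.entries.push({ step, input, output });
    return output;
  }

  // Serialize the run so it can be persisted (blob store, file, or a git commit).
  toJSON(): string {
    return JSON.stringify(this.entries);
  }
}

class Replay {
  private cursor = 0;
  private entries: TapeEntry[];

  constructor(json: string) {
    this.entries = JSON.parse(json) as TapeEntry[];
  }

  // Replay mode: never touch the network; hand back recorded outputs in order.
  next<T>(step: string): T {
    const entry = this.entries[this.cursor++];
    if (!entry || entry.step !== step) {
      throw new Error(`tape mismatch at step "${step}"`);
    }
    return entry.output as T;
  }
}
```

The point of the design is that a replayed run depends only on the persisted tape, so it keeps working after the upstream API or its key is gone.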

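All of that landed before the last Saturday commit at 4:30 PM PT. The repository's 41-commit total runs from the first push at 12:28 PM PT Saturday through 7:42 AM PT Sunday.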

The timeline indicts itself

The order in which the commits landed is the retro. Times below are Pacific.

| PT time | Commit | What it did |
| --- | --- | --- |
| 10:30 | repo creation | `HeatThreshold/HeatThreshold` created on GitHub |
| ~11:41 | AI Studio V3.5 dashboard captured | Working 3-sub-agent prototype, screenshotted from AI Studio export |
| 12:28 | first push | Initial scaffold (Vite + Express + Vercel monorepo) |
| 13:23 | feat(weather) + feat(xr) types + feat(watch) | Weather provenance, XR schema, Live Watch poll |
| 13:31 | feat(observability): add PlatAtlas + McpTape + McpReplay stack | Trace/record/replay before the agent migration |
| 13:40 | feat(agents): migrate to Gemini Managed Agents API (bounty) | The actual prize-target work |
| 13:44 | feat(skill): ship portable `/threshold-plan` Antigravity skill | |
| 13:45 | docs: add 1-min video + 3-min live demo scripts (`DEMO.md`) | The script, not the video |
| 14:02 | merge PR #1: Vercel deploy split | `createApp` factory + serverless wrapper |
| 14:40 | feat(xr): AR camera background, 3D Tiles floor, lit importmap | First scope-creep block on XR |
| 14:44 | fix(xr): dedup three.js by externalizing it from 3d-tiles-renderer | |
| 14:51 | fix(xr): shield against transient xrblocks/CDN bootstrap failures | |
| 15:06 → 15:13 | three Vercel + map + RouteOptimization fixes | recenter, polyline beautification, refuge threading |
| 15:17 → 15:30 | four XR + map commits | cinematic + preview + live-activity modes, road-conforming polylines |
| 15:37 | feat(maps-grounding): enable Maps Imagery Grounding widget token | |
| 15:46 | feat(storage): Vercel Blob backend for McpTape recordings | More observability, ~75 minutes before deadline |
| 15:57 | feat(xr): playable journey camera + IMU tracking + glass panels | |
| 16:01 → 16:12 | three Vercel `api/` patches | `/api/health`, timeout bump, lazy construct |
| 16:20 | feat(voice): Gemini Live ↔ Managed Agents bridge | The most demo-able feature, 40 minutes before deadline |
| 16:25 → 16:30 | three Vercel deploy fights | `force CJS bundling`, `pre-bundle via esbuild to dodge ESM resolution`, `commit api/index.js placeholder so deploy validates` |

Submission cutoff was 17:00 PT. Last commit on 2026-05-23 landed at 16:30 — exactly when the Saturday execution plan’s Stopper F called for finishing the rehearsal and shooting the video. Instead, the last 30 minutes went to esbuild errors in the Vercel function directory.

The observability stack (PlatAtlas + McpTape + McpReplay) landed nine minutes before the Managed Agents migration, meaning one of the first features I shipped on hackathon Saturday was a trace/record/replay system to make a not-yet-existing demo deterministic. The Vercel Blob backend for McpTape, a hardening of that same observability stack, landed at 15:46, when I should have been recording the video. Voice Mode, the single most cinematic feature in the build, was wired up at 16:20 with no time left to rehearse or capture it.

The unfilled README placeholders are the smoking artifact:

| **Production dashboard** | `TODO_PROD_URL` |
| **1-min demo video** | `TODO_VIDEO_URL` |

The deploy is live. The submission table never got rewritten.

My own assessment, posted publicly

Saturday evening, four posts in a row on X under @CraigMerry:

“Overbuilt a lot of things — ‘it should just work’ isn’t really here. Proud but disappointed.” heat-threshold.vercel.app · github.com/HeatThreshold/…

“Tooling — meaning agent traces for logical steps” (replying to the overbuilt-a-lot-of-things post, clarifying which thing got overbuilt)

“152 total submissions.”

“A lot of shiny demos. This was my weakest area. Should’ve spent more time on just the essence.”

The fourth post is the lesson in one sentence. Shiny demos were my weak spot at this hackathon; the observability stack and the WebXR HUD are infrastructure-class artifacts on a day that needed essence-class artifacts.

The Antigravity / AI Studio / Managed Agents ecosystem

Some of this is positive; all of it is honest. A second X thread the morning after (nine posts and an addendum) covered the same observations; this is the longer, source-pinned version.

AI Studio as the recommended entry point. This was right. The model surfaced in the run sidebar was the Antigravity preview, not stock 3.5 Flash. Grounding with Google Search and URL context were both on by default. The first working dashboard — three sub-agents, real NWS-grounded WBGT, USMC flag mapping, a verdict card — existed inside AI Studio about 30 minutes after I opened the laptop, on a Tier 1 Google-supplied account. The same account was depleted by the time I exported to Antigravity. AI Studio is extraordinarily impressive and extraordinarily expensive at this tier; it’s the right entry point if your budget assumes you’ll graduate to production billing.

Antigravity 2.0 IDE / CLI. This was the build environment for the rest of the day, once I had exported from AI Studio. It has early-stage UX quirks; the biggest one was Windows-side npm dependency resolution, which I burned hackathon credits on across the morning. The science-plugin model (pluggable sources for actual citations and modeling that the agent treats as first-class inputs) was the most interesting product decision I saw all day. The winners’ project used it for a full mRNA research simulator. I had a brainstorm in the same vein (protein evolution under heat exposure) and deemed it too risky for the seven-hour window. More on that below.

Managed Agents API. Real product, not a sprinkle. Three persistent Agent definitions uploaded once per process via ai.agents.get-or-create, then driven by ai.interactions.create on every request. Parallel fan-out via Promise.all is the canonical pattern and the docs are explicit about it. The bounty was “Best Use of Managed Agents” and the architecture answers it; the submission video that would have shown the architecture to a judge does not exist.
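The fan-out-then-serialize shape described above looks roughly like this. It is a sketch with hypothetical types and plain async functions standing in for the sub-agents; it deliberately does not call the Managed Agents SDK, whose exact signatures are not shown in this post.

```typescript
// Hypothetical result shapes for illustration.
interface WeatherResult { wetBulbF: number }
interface PlaceResult { name: string }
interface Verdict { decision: "go" | "delay" | "alternate"; reason: string }

type SubAgent<I, O> = (input: I) => Promise<O>;

interface PlanAgents {
  weather: SubAgent<string, WeatherResult>;
  place: SubAgent<string, PlaceResult>;
  synthesis: SubAgent<{ weather: WeatherResult; place: PlaceResult }, Verdict>;
}

async function runPlanGraph(query: string, agents: PlanAgents): Promise<Verdict> {
  // Weather and Place do not depend on each other, so both start immediately;
  // the wait is the slower of the two rather than their sum.
  const [weather, place] = await Promise.all([
    agents.weather(query),
    agents.place(query),
  ]);
  // Synthesis needs both results, so it runs only after the fan-out settles.
  return agents.synthesis({ weather, place });
}
```

One design consequence worth noting: `Promise.all` rejects as soon as either sub-agent rejects, so a fallback such as the weather cascade has to live inside the sub-agent itself rather than in the orchestrator.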

Gemini Live. The Live API doesn’t run on Gemini 3.5 Flash yet (as of 2026-05-23) — it runs on a separate preview model. Bridging Live to the Managed Agents pipeline through function calling is the most interesting integration in the repo. It shipped 40 minutes before the deadline.
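The bridge amounts to a tool-call dispatcher: when the Live model asks for `runThresholdPlan`, the browser forwards the arguments to `/api/plan` and hands the result back to be spoken. A minimal sketch follows, with assumptions stated plainly: the `ToolCall` shape, `handleToolCall`, and the injected `postJson` helper are hypothetical names of mine, and the `{location, activity, time}` body is taken from the fresh-request probe described under post-hackathon state. No Live API calls appear here.

```typescript
// Hypothetical shapes; the Live SDK's own message types are not reproduced.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface PlanArgs {
  location: string;
  activity: string;
  time: string;
}

// Injected transport so the dispatcher can be exercised without a network.
type PostJson = (url: string, body: unknown) => Promise<unknown>;

async function handleToolCall(call: ToolCall, postJson: PostJson): Promise<unknown> {
  if (call.name !== "runThresholdPlan") {
    throw new Error(`unknown tool: ${call.name}`);
  }
  const { location, activity, time } = call.args;
  // Model-supplied arguments are untrusted; validate before proxying.
  if (
    typeof location !== "string" ||
    typeof activity !== "string" ||
    typeof time !== "string"
  ) {
    throw new Error("runThresholdPlan needs string location, activity, and time");
  }
  const body: PlanArgs = { location, activity, time };
  // Same endpoint the dashboard uses, so voice and UI share one agent graph.
  return postJson("/api/plan", body);
}
```

In a browser, `postJson` could be a thin wrapper around `fetch` with a JSON body; injecting it keeps the dispatcher testable offline.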

What I wanted to use and didn’t. I wanted to use the stitch MCP for visual polish on the dashboard. I made the call early in the build to focus on observability first. That was the wrong call. Observability is something a B2B buyer asks about after they want the product; a hackathon judge needs to want the product first.

Post-hackathon state — the part that hurts the most

The day after submission I posted: “I’ll leave the repo up, but Google already disabled the developer api’s that were powering the neater features. I’m not going to re-enable those features. I’ll be migrating some of the core backend concepts into HeatCompass.com — keep iterating on that.”

Verified at composition window:

heat-threshold.vercel.app → 200, dashboard renders.
GET /api/health → 200, response body still claims hasGeminiKey: true, hasGoogleMapsKey: true, hasBlobToken: true.
GET /api/plan?demo=sf-route → 200, full cached PlanResult returns instantly — because the demo presets are baked-in fixtures, not McpReplay artifacts.
POST /api/plan with a fresh {location, activity, time} body → 400 API_KEY_INVALID from generativelanguage.googleapis.com. The live agent pipeline is dead.

So the cached SF / Zilker / Hyde Park presets still work because they’re bundled into the deploy. Everything that requires a live Managed Agents call — the freeform plan endpoint, Voice Mode, the Antigravity CLI skill, Live Watch’s re-tick — is offline.

There is an irony here worth naming. The observability stack, the thing that ate the first hour of feature time, was built so a stage demo would survive the network dropping out. The thing that actually killed the live demo a day later wasn’t the network; it was Google revoking the hackathon-provisioned API key. McpReplay would have saved that exact case if I’d recorded a run and committed the trace before the keys died. I did not. The fallback path is the baked-in fixtures, which work, which is fine for a portfolio link, which is not the recording I wish I had.

What comes next

Two threads.

Heat effects protein simulation. The brainstorm I deemed too risky for Saturday. I’m going to do it now, outside the hackathon, as a sandbox specifically to learn the Antigravity science-plugin pipeline end-to-end. The winners’ mRNA simulator is the reference point; my version is whatever the smaller, well-scoped slice of “what does heat actually do to a protein’s structure” looks like under the same plugin model. The goal is to see what the science-plugin workflow teaches the developer through the rendering — not to publish a research result.

HeatCompass migration. A migration-plan doc landed in the repo Sunday morning (docs/HEATCOMPASS_MIGRATION.md). The plan inventories which pieces of Heat Threshold are portable (wet-bulb math, USMC flag mapping, refuge-break scheduler, McpTape audit trail) and which are demo-side (XR, Voice Mode, McpReplay browser UI). The destination is a separate HeatCompass organization pursuing three B2B wedges — occupational safety SaaS, endurance event ops, insurance / workers’ comp risk scoring. Heat Threshold itself stays a hackathon artifact; the durable parts move.

If I had an active problem grounded by a scientific need or a Google API surface, I’d absolutely start with this pattern and ecosystem. I don’t. I’m a solo developer who works on side projects, and the pattern’s gravity field assumes a different shape of work. That’s the honest read on the ecosystem from the outside.

Sources

Facts in this post were pinned against the following at composition time (2026-05-24, Pacific):

Commit timestamps + count. gh api repos/HeatThreshold/HeatThreshold/commits --paginate -X GET -f per_page=100 (41 commits between 2026-05-23 12:28 PT and 2026-05-24 07:42 PT).
Repo metadata. gh api repos/HeatThreshold/HeatThreshold (created 2026-05-23T17:30:17Z = 10:30 AM PT, default branch main, language TypeScript, MIT).
README placeholder state. grep -n "TODO_" /tmp/heat-readme.md against the live README.md blob via gh api repos/HeatThreshold/HeatThreshold/contents/README.md — TODO_PROD_URL, TODO_VIDEO_URL, TODO_RUNID all still present.
Live-URL probes. curl -s -o /dev/null -w "%{http_code}" against heat-threshold.vercel.app/ (200) and /api/health (200, env-key presence flags); GET /api/plan?demo=sf-route returns the full bundled fixture (200); POST /api/plan with a fresh body returns 400 API_KEY_INVALID from generativelanguage.googleapis.com.
Migration scope. docs/HEATCOMPASS_MIGRATION.md in the repo.
Hackathon timing brief. /home/craigm26/HeatSentry/hackathon/plans/2026-05-23-saturday-execution-plan.md (build window, stoppers, hour-by-hour plan) and HACKATHON_BRIEF.md (event identification, 5:00 PM PT cutoff).
AI Studio prototype state. Screenshot in author’s camera roll dated 2026-05-23, header Live Google Grounded Plan V3.5 · 3 Sub-Agents Active · May 23, 2026 11:41:33 AM PDT; AI Studio “Run settings” sidebar showing model antigravity-preview-05-2026 (“Antigravity Agent Preview”) with the Threshold system instruction visible.
Public X-thread quotes. @CraigMerry, 2026-05-23 9:07 AM PT through 2026-05-24 8:41 AM PT: the two Saturday-morning posts (“Waiting to get into Shack15…” / “My mind is already blown…”), the four Saturday-evening posts (“Overbuilt a lot of things…” / “Tooling — meaning agent traces…” / “152 total submissions.” / “A lot of shiny demos…”), the next-day API-revocation post, and the nine-post ecosystem thread referenced in the body.