Accessibility · XR

LiveCaptionsXR

Spatial captions anchored in 3D so deaf and hard-of-hearing people can follow a conversation inside a headset.

2026 · XR · On-device ASR · Flutter · Accessibility
~400 ms
ASR latency on the Hexagon NPU
Better energy efficiency vs CPU
0
Audio that ever leaves the device

The problem

Flat transcripts don’t work in 3D. In a headset, audio can come from anywhere, and a single scrolling caption tells you what was said but never who said it or where they are. I was born mostly deaf — this is the gap I live with.

The solution

LiveCaptionsXR positions captions in 3D space, near whoever is speaking, and lets them follow people as they move. It’s a production Flutter app with on-device speech recognition — no audio ever leaves the device.

  • ARCore / ARKit spatial anchoring
  • GCC-PHAT stereo direction-of-arrival, fused with vision and IMU via a Kalman filter
  • Speaker diarization so captions attach to the right person
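
To make the audio side of that list concrete, here is a minimal Python sketch of GCC-PHAT delay estimation for a stereo pair, followed by a scalar Kalman update that blends angle estimates. It is an illustration under stated assumptions, not the app's code (the app itself is written in Flutter): the function and class names, the far-field angle formula, the way head rotation enters the prediction step, and all variances are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) DFT, fine for a short demo; a real app would use an FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat_delay(left, right, max_lag):
    """Lag in samples by which `right` trails `left` (negative if it leads)."""
    n = len(left) + len(right)  # zero-pad so the correlation does not wrap
    spec_l = dft(list(left) + [0.0] * (n - len(left)))
    spec_r = dft(list(right) + [0.0] * (n - len(right)))
    whitened = []
    for a, b in zip(spec_l, spec_r):
        cross = b * a.conjugate()
        mag = abs(cross)
        # PHAT weighting: keep only the phase of each frequency bin.
        whitened.append(cross / mag if mag > 1e-12 else 0j)
    corr = dft(whitened, inverse=True)
    # Non-negative lags sit at corr[lag]; negative lags wrap to corr[n + lag].
    return max(range(-max_lag, max_lag + 1), key=lambda lag: corr[lag % n].real)


def delay_to_azimuth(lag_samples, sample_rate_hz, mic_spacing_m, speed_of_sound=343.0):
    """Far-field angle in radians; positive means the source is nearer the left mic."""
    s = speed_of_sound * lag_samples / (sample_rate_hz * mic_spacing_m)
    return math.asin(max(-1.0, min(1.0, s)))  # clamp against noisy lags


class AzimuthKalman:
    """Scalar Kalman filter over a head-relative speaker azimuth (hypothetical model)."""

    def __init__(self, angle=0.0, variance=1.0, process_var=0.01):
        self.angle = angle
        self.variance = variance
        self.process_var = process_var

    def predict(self, head_yaw_delta=0.0):
        # Assumption: an IMU-reported head turn shifts the speaker the opposite way.
        self.angle -= head_yaw_delta
        self.variance += self.process_var

    def update(self, measured_angle, meas_var):
        # Audio and vision measurements can each call this with their own variance.
        gain = self.variance / (self.variance + meas_var)
        self.angle += gain * (measured_angle - self.angle)
        self.variance *= 1.0 - gain
```

In this sketch, each audio frame would yield one lag, which `delay_to_azimuth` converts to an angle. The filter then treats that angle as a noisy measurement, and a vision-based estimate can be fed through the same `update` call with a smaller variance when a face is in view.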

On-device AI

ASR runs on the Qualcomm Hexagon NPU via the Nexa SDK (Parakeet TDT), with a Whisper GGML CPU fallback on other Android devices and Apple Speech on iOS. Moving to the NPU roughly halved latency and made all-day headset use thermally viable.
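
The fallback order described above can be sketched as a small selection function. This is only an illustration in Python: the `Device` type, its fields, and the backend labels are hypothetical names, and real capability detection would go through each platform's own APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    os: str                        # "android" or "ios" (hypothetical labels)
    has_hexagon_npu: bool = False  # assumed to come from a platform capability check


def pick_asr_backend(device: Device) -> str:
    """Choose an on-device ASR backend in the order the text describes."""
    if device.os == "ios":
        return "apple-speech"
    if device.os == "android":
        # Prefer the NPU path when available, else fall back to CPU Whisper.
        return "parakeet-tdt-npu" if device.has_hexagon_npu else "whisper-ggml-cpu"
    raise ValueError(f"unsupported platform: {device.os}")
```

Every branch returns an on-device backend, so no path in this sketch sends audio off the device.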

What I learned

Beta testers — Deaf friends — instinctively turned toward wherever a caption appeared, regaining spatial awareness they described as transformative. That single observation drove every iteration on positioning, size, and contrast.
