WebAI: chat with this site

A privacy-first, client-side chat that uses WebGPU models (WebLLM) and on-device embeddings to answer questions about my resume and portfolio.

Answers are generated locally with a WebGPU-friendly model (WebLLM) and a small retrieval index; no data is sent to a server. The chat works best on desktop or modern mobile browsers with WebGPU support. Ask it about the resume or portfolio highlights.
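
That WebGPU requirement is easy to probe up front. A minimal sketch, assuming the @webgpu/types typings (the pickBackend name and the fallback wiring are illustrative, not this site's actual gating logic):

```ts
// Minimal sketch: probe for a WebGPU adapter before downloading model weights.
// Assumes @webgpu/types for navigator.gpu; pickBackend is an illustrative name.
async function pickBackend(): Promise<'webgpu' | 'wasm'> {
  if (navigator.gpu) {
    const adapter = await navigator.gpu.requestAdapter();
    if (adapter) return 'webgpu'; // hardware-accelerated inference
  }
  return 'wasm'; // still fully on-device, just slower to first token
}
```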

How it works

  • Content chunks live in /ai/webai-chunks.json.
  • A local embedder (@xenova/transformers) builds vectors in your browser.
  • Retrieval ranks the most relevant chunks and feeds them to WebLLM (sketched below).
  • Nothing leaves your device; inference is GPU-accelerated when available and falls back to WASM otherwise.
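
Condensed into one sketch, the pipeline might look like the following. This is a minimal illustration, not this site's exact code: the chunk shape, the MiniLM embedder, and the WebLLM model ID are all assumptions.

```ts
import { pipeline } from '@xenova/transformers';
import { CreateMLCEngine } from '@mlc-ai/web-llm';

// Chunk shape and both model IDs are illustrative, not this site's exact config.
const chunks: { text: string }[] =
  await (await fetch('/ai/webai-chunks.json')).json();

// Local embedder: builds vectors entirely in the browser.
const embed = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
const toVec = async (text: string) =>
  Array.from((await embed(text, { pooling: 'mean', normalize: true })).data as Float32Array);
const vecs = await Promise.all(chunks.map((c) => toVec(c.text)));

// Local generator: WebLLM downloads and caches weights on first use.
const engine = await CreateMLCEngine('Llama-3.2-3B-Instruct-q4f16_1-MLC');

export async function answer(question: string): Promise<string> {
  // Vectors are normalized, so a dot product is cosine similarity.
  const q = await toVec(question);
  const dot = (a: number[], b: number[]) => a.reduce((s, x, i) => s + x * b[i], 0);
  const context = vecs
    .map((v, i) => ({ i, score: dot(q, v) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 4) // top-k is tunable; small k keeps the prompt short
    .map(({ i }) => chunks[i].text)
    .join('\n---\n');

  const reply = await engine.chat.completions.create({
    messages: [
      { role: 'system', content: `Answer using only this context:\n${context}` },
      { role: 'user', content: question },
    ],
  });
  return reply.choices[0].message.content ?? '';
}
```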

Performance tradeoffs

  • Model size vs speed: 3B/4B models load and answer quickly; 7B models improve quality but cost bandwidth and startup time.
  • WebGPU vs WASM: WebGPU is fastest; the WASM fallback is just as private but slower, especially on first-token latency.
  • Embeddings build-time vs client-time: Precomputing keeps startup fast; on-the-fly embedding adds seconds on cold start (see the build-time sketch after this list).
  • Chunking: 300–500 token chunks balance recall against vector payload size; fewer chunks also speed the initial load.
  • Payload split: Ship per-section indexes (e.g., a resume-only index) to keep initial downloads small.
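
Precomputing and the per-section split can share one build step. A minimal Node sketch under stated assumptions: the paths, the section field, and the sample chunks are illustrative, and reusing the client's embedding model keeps query vectors comparable at runtime.

```ts
// build-embeddings.ts: runs at build time (Node) so the client skips cold-start embedding.
// Paths, the section field, and the sample chunks are assumptions for illustration.
import { pipeline } from '@xenova/transformers';
import { writeFile } from 'node:fs/promises';

type Chunk = { section: string; text: string };

const chunks: Chunk[] = [
  { section: 'resume', text: 'Example resume chunk…' },
  { section: 'portfolio', text: 'Example portfolio chunk…' },
];

// Same model as the client, so runtime query vectors stay comparable.
const embed = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

for (const section of new Set(chunks.map((c) => c.section))) {
  const subset = chunks.filter((c) => c.section === section);
  const withVectors = await Promise.all(
    subset.map(async (c) => ({
      ...c,
      vector: Array.from((await embed(c.text, { pooling: 'mean', normalize: true })).data as Float32Array),
    }))
  );
  // One index per section keeps the initial download small (e.g., resume-only).
  await writeFile(`public/ai/webai-${section}.json`, JSON.stringify(withVectors));
}
```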