OpenAI’s Realtime API supports WebRTC, so a browser can stream microphone audio to the model and play the spoken reply directly. Simon Willison’s field notes highlight how simple the setup is: low‑latency audio streams make responsive assistants practical without a complex audio proxy.
Why this matters
WebRTC delivers sub‑second, two‑way audio, which suits voice agents, live support, and on‑page copilots. It also reduces infrastructure overhead: the browser handles media transport and connects to OpenAI directly, so your server does not have to relay audio.
How it works (at a glance)
- Issue a short‑lived session token from your server (never expose your API key to the browser).
- In the browser, create an RTCPeerConnection, capture mic input with getUserMedia, and add the audio track.
- Establish the WebRTC session with OpenAI by exchanging an SDP offer and answer, authenticated with the session token; receive a remote audio track, plus events for transcripts and state.
- Render the remote audio in an HTML audio element; manage UI with push‑to‑talk or VAD (voice activity detection).
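The handshake in the steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not OpenAI's reference client: the endpoint URL, the model name, and the `application/sdp` request shape are assumptions based on the Realtime docs at the time of writing, and `PeerLike`, `FetchLike`, `buildSdpRequest`, and `negotiate` are hypothetical names introduced for illustration.

```typescript
// Minimal structural stand-ins for the browser's RTCPeerConnection and fetch,
// declared here so the sketch also type-checks outside a browser.
interface SessionDesc { type: string; sdp?: string }
interface PeerLike {
  createOffer(): Promise<SessionDesc>;
  setLocalDescription(desc: SessionDesc): Promise<void>;
  setRemoteDescription(desc: { type: "answer"; sdp: string }): Promise<void>;
}
interface SdpRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}
type FetchLike = (
  url: string,
  init: SdpRequest["init"],
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// ASSUMPTION: endpoint and model name are illustrative; check the current docs.
const REALTIME_URL = "https://api.openai.com/v1/realtime";
const MODEL = "gpt-4o-realtime-preview";

// Build the HTTP request that carries the SDP offer, authenticated with the
// short-lived session token (never the real API key).
function buildSdpRequest(offerSdp: string, sessionToken: string): SdpRequest {
  return {
    url: `${REALTIME_URL}?model=${encodeURIComponent(MODEL)}`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${sessionToken}`,
        "Content-Type": "application/sdp",
      },
      body: offerSdp,
    },
  };
}

// Offer/answer exchange: create an offer, send it, apply the returned answer.
// Add the microphone track to `pc` before calling, so the offer includes audio.
async function negotiate(
  pc: PeerLike,
  fetchFn: FetchLike,
  sessionToken: string,
): Promise<void> {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  const { url, init } = buildSdpRequest(offer.sdp ?? "", sessionToken);
  const res = await fetchFn(url, init);
  if (!res.ok) throw new Error(`SDP exchange failed: HTTP ${res.status}`);
  await pc.setRemoteDescription({ type: "answer", sdp: await res.text() });
}
```

In a browser, you would typically call `pc.addTrack(track, stream)` with the microphone track from `getUserMedia`, set `pc.ontrack` to assign `event.streams[0]` to an audio element's `srcObject`, and then run `negotiate(pc, fetch, token)` with the token from your own server.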
When to use WebRTC vs. REST/WebSockets
- Use WebRTC for low‑latency, full‑duplex voice (assistants, IVR replacements, real‑time coaching).
- Use REST/WebSockets for text chat, batch jobs, or where audio isn’t needed.
Practical tips and pitfalls
- Security first: your server mints short‑lived tokens and enforces origin checks/rate limits. Never ship an API key to the client.
- Autoplay policies: start audio output after a user gesture, set autoplay on the audio element (playsInline only affects video elements), and ensure the element isn’t muted.
- Audio quality: request echoCancellation, noiseSuppression, and autoGainControl in getUserMedia constraints.
- Connectivity: rely on STUN; add TURN for reliability behind strict NAT/firewalls.
- UX: show connection state, mic level, and a clear “hold to speak” or “mute” control to avoid echo.
- Compliance: review provider data‑usage and retention policies before shipping to production.
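The client-side tips above (audio quality, connectivity, autoplay) might translate into something like the sketch below. The helper names `micConstraints`, `iceConfig`, and `startPlayback` are hypothetical, the STUN and TURN URLs are placeholders, and the small interfaces only mirror the parts of the browser types this sketch touches.

```typescript
// Local structural types mirroring the relevant parts of the browser's
// MediaStreamConstraints and RTCConfiguration.
interface AudioConstraints {
  audio: { echoCancellation: boolean; noiseSuppression: boolean; autoGainControl: boolean };
  video: false;
}
interface IceServer { urls: string | string[]; username?: string; credential?: string }
interface TurnCredentials { url: string; username: string; credential: string }

// Microphone constraints from the audio-quality tip; pass to getUserMedia.
function micConstraints(): AudioConstraints {
  return {
    audio: { echoCancellation: true, noiseSuppression: true, autoGainControl: true },
    video: false,
  };
}

// ICE configuration: STUN always, TURN only when credentials are supplied.
// ASSUMPTION: the STUN URL is a placeholder, not a real server.
function iceConfig(turn?: TurnCredentials): { iceServers: IceServer[] } {
  const iceServers: IceServer[] = [{ urls: "stun:stun.example.org:3478" }];
  if (turn) {
    iceServers.push({ urls: turn.url, username: turn.username, credential: turn.credential });
  }
  return { iceServers };
}

// Call from a click or tap handler so autoplay policies allow sound.
// Resolves to false if the browser still rejects playback.
async function startPlayback(
  audio: { muted: boolean; play(): Promise<void> },
): Promise<boolean> {
  audio.muted = false;
  try {
    await audio.play();
    return true;
  } catch {
    return false;
  }
}
```

In a browser these would feed `navigator.mediaDevices.getUserMedia(micConstraints())` and `new RTCPeerConnection(iceConfig(turn))`. Browsers treat the audio processing flags as requests rather than guarantees, so it can be worth checking `track.getSettings()` to see what was actually applied.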
Minimal architecture
- Client: Browser (WebRTC, mic capture, UI)
- Server: Token endpoint (auth, short‑lived session tokens), optional TURN
- Model: OpenAI Realtime endpoint handling bi‑directional audio + events
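The server piece of this architecture could look something like the following sketch, which covers the origin check, a simple rate limit, and the token exchange. Treat the details as assumptions: the sessions URL, model name, and `client_secret.value` response field reflect the Realtime docs at the time of writing and may have changed, the allowed origin is a placeholder, and `isAllowedOrigin`, `RateLimiter`, and `mintSessionToken` are hypothetical names.

```typescript
// Structural stand-in for fetch so the sketch type-checks without extra typings.
type HttpPost = (
  url: string,
  init: { method: "POST"; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// ASSUMPTION: endpoint, model name, and response shape may differ in current docs.
const SESSIONS_URL = "https://api.openai.com/v1/realtime/sessions";
const ALLOWED_ORIGINS = new Set(["https://app.example.com"]); // placeholder origin

// Origin check: reject requests that do not come from your own pages.
function isAllowedOrigin(origin: string | undefined): boolean {
  return origin !== undefined && ALLOWED_ORIGINS.has(origin);
}

// Fixed-window rate limiter keyed by client id (for example, IP or user id).
class RateLimiter {
  private hits = new Map<string, { windowStart: number; count: number }>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(key, { windowStart: now, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }
}

// Exchange the long-lived API key (server-side only) for a short-lived token.
async function mintSessionToken(post: HttpPost, apiKey: string): Promise<string> {
  const res = await post(SESSIONS_URL, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({ model: "gpt-4o-realtime-preview" }),
  });
  if (!res.ok) throw new Error(`Token request failed: HTTP ${res.status}`);
  const data = (await res.json()) as { client_secret?: { value?: string } };
  const token = data.client_secret?.value;
  if (!token) throw new Error("Response did not include a client secret");
  return token;
}
```

A route handler would call `isAllowedOrigin` and `RateLimiter.allow` first, then return only the minted token to the browser. The API key itself stays in server-side configuration.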
Quick starter checklist
- Create a secure token endpoint on your backend.
- Generate short‑lived session tokens scoped for Realtime.
- In the browser: getUserMedia, RTCPeerConnection, addTrack.
- Handle remote audio track; gate playback on a user gesture.
- Add TURN for production reliability and test across networks.
- Log round‑trip latency and reconnection events.
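For the last checklist item, one possible approach is to read round-trip time from the WebRTC stats and count reconnections from connection state changes. This is a sketch: `StatEntry`, `roundTripMs`, and `ConnectionLog` are hypothetical names, and `StatEntry` covers only the stats fields that are read here.

```typescript
// Subset of the fields on an RTCStatsReport entry that this sketch reads.
interface StatEntry {
  type: string;
  state?: string;
  nominated?: boolean;
  currentRoundTripTime?: number; // seconds
}

// Return the round-trip time in milliseconds from the nominated, succeeded
// ICE candidate pair, or undefined if no such pair reports one yet.
function roundTripMs(stats: Iterable<StatEntry>): number | undefined {
  for (const s of stats) {
    if (
      s.type === "candidate-pair" &&
      s.state === "succeeded" &&
      s.nominated === true &&
      s.currentRoundTripTime !== undefined
    ) {
      return s.currentRoundTripTime * 1000;
    }
  }
  return undefined;
}

// Count reconnections: a return to "connected" after "disconnected" or "failed".
class ConnectionLog {
  private wasDown = false;
  reconnects = 0;

  record(state: string): void {
    if (state === "disconnected" || state === "failed") {
      this.wasDown = true;
    } else if (state === "connected" && this.wasDown) {
      this.reconnects += 1;
      this.wasDown = false;
    }
  }
}
```

In a browser, `roundTripMs((await pc.getStats()).values())` could run on a timer, and `pc.onconnectionstatechange = () => log.record(pc.connectionState)` would feed the log. Note that this measures network round-trip time only, not the time the model takes to start responding.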
Sources
- Simon Willison: OpenAI + WebRTC experiments — link
- OpenAI Realtime API docs — platform.openai.com/docs/guides/realtime
- MDN: WebRTC overview — developer.mozilla.org
- OpenAI API data usage policy — openai.com/policies/api-data-usage-policies
Takeaway
WebRTC + OpenAI Realtime lets you ship voice agents with sub‑second responsiveness, provided you secure your token flow, design for autoplay restrictions, and test network edge cases.