Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The problem with PipeCat and LiveKit (the 2 major stacks for building voice ai) is the deployment at scale.

That’s why I created a stack entirely in Cloudflare workers and durable objects in JavaScript.

Providers like AssemblyAI and Deepgram now integrate VAD in their realtime API so our voice AI only need networking (no CPU anymore).



let me get this straight, you are storing convo threads / context in DOs?

e.g. Deepgram (STT) via websocket -> DO -> LLM API -> TTS?


Yes DO let you handle long lived websocket connections. I think this is unique to Cloudflare. AWS or Google Cloud don't seem to offer these things (statefulness basically).

Same with TTS: some like Deepgram and ElevenLabs let you stream the LLM text (or chunks per sentence) over their websocket API, making your Voice AI bot really really low latency.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: