I developed a stack on Cloudflare workers where latency is super low and it is c...

picardo · 2025-12-25T15:56:07 1766678167

Is AssemblyAI or Deepgram compatible with OpenAI Realtime API, esp. around voice activity detection and turn taking? How do you implement those?

ldenoue · 2025-12-27T01:29:38 1766798978

I am not using speech-to-speech APIs like OpenAI's, but it would be easy to swap the STT + LLM + TTS pipeline for Realtime (or the Gemini Live API, for that matter).

OpenAI Realtime voices are really bad, though, so you can also configure your session to accept AUDIO and output TEXT, and then use any TTS provider (like ElevenLabs or InWord.ai, my favorite for cost) to generate the audio.
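
For anyone wondering what the audio-in, text-out setup might look like, here is a rough sketch, not my actual code. It assumes the beta-style `session.update` event with a `modalities` field and server-side VAD; newer API versions may name these differently (for example `output_modalities`, or `response.output_text.delta` for the text events), so check the current reference. `buildTextOnlySession`, `handleServerEvent`, and the `speak` callback are hypothetical names; `speak` stands in for whatever TTS provider you plug in.

```typescript
// Sketch only: field names follow the beta Realtime event shape (assumption)
// and may differ in newer API versions.
interface SessionUpdate {
  type: "session.update";
  session: {
    modalities: string[];
    input_audio_format: string;
    turn_detection: { type: string };
  };
}

// Build the event that asks the session to return text only,
// while still accepting audio input with server-side turn detection.
function buildTextOnlySession(): SessionUpdate {
  return {
    type: "session.update",
    session: {
      modalities: ["text"],
      input_audio_format: "pcm16",
      turn_detection: { type: "server_vad" },
    },
  };
}

// Hypothetical hook: hands each text chunk to an external TTS provider.
type Speak = (text: string) => void;

// Forward text deltas from the server to the TTS callback; ignore other events.
function handleServerEvent(raw: string, speak: Speak): void {
  const event = JSON.parse(raw) as { type: string; delta?: string };
  const isTextDelta =
    event.type === "response.text.delta" ||
    event.type === "response.output_text.delta";
  if (isTextDelta && event.delta) {
    speak(event.delta);
  }
}
```

You would send `JSON.stringify(buildTextOnlySession())` over the WebSocket right after connecting, then pass every incoming message to `handleServerEvent`. In practice you would probably buffer the deltas into sentences before calling the TTS, since most providers sound better with full phrases.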

pugio · 2025-12-25T04:43:59 1766637839

Do you have anything written up about how you're doing this? Curious to learn more...

ldenoue · 2025-12-27T01:30:10 1766799010

I don't, but I should open source this code. I was trying to sell it to OEMs, though, which is why I haven't. Are you interested in licensing it?