I mean, it's just doing STT, it's not trying to run inference on a frontier model, and if you have to connect to a remote datacenter vs your own machine that could add latency that slows down the experience. There are also tradeoffs in deployment in terms of latency vs throughput that doesn't occur on your local machine that only ever is doing STT on one thing at a time.
I'm skeptical of the claim that local is literally faster, but it's not impossible in the way you suggest.
If you watch the video you'll see that Aqua does more than just STT and why we need the frontier models. I think if you want just STT of what you said verbatim, then yea any local version of Whisper is great. If you want the enhancements shown in the video, you need the frontier models. And doing that locally is slow.
I'm skeptical of the claim that local is literally faster, but it's not impossible in the way you suggest.