Agreed, using data channels for uncompressed audio is certainly the most interesting idea in this project for me, but it appears they're having trouble with the audio worklet processing time, so it's not an immediate slam dunk :-/
As a musician, I am thrilled that JackTrip and Jamulus exist to make real-time collaboration possible, but they're not simple for everyone to install and configure. I spent numerous sessions trying to get a non-techie friend's Ubuntu laptop set up with Jamulus, but we both got so frustrated we gave up. I hope the next generation brings easier setup with it; there's just so much potential.
I get the problem with JackTrip (though JamTrip[1] looks like a good start).
Out of interest, what were the problems with Jamulus? I've found it easier than expected (once I'd got Jack set up and happy / understood). On Windows / macOS it's actually easier, as there are clear, well-trodden install guides (and no Jack / low-latency kernels to explain).
The problem was that he's not a techie and only has a Linux laptop because of cost. Trying to get him to understand how to work the shell cost me a year of my life.
I've just published an article that goes into depth about how to get a JackTrip Virtual Studio installed. This uses an inexpensive Raspberry Pi instead of requiring you to install finicky software on a general-purpose laptop or other computer. I wrote it while helping a choir get online, and it worked very well. I hope this helps!
This says ultra low delay, but how low? It seems to me like it's still going to be limited by the browser, and then you're going to have network latency on top of that?
30 ms in Chrome and 23 ms in Firefox here. But yeah, same questions -- is this over the network (or no, that's the point of a p2p connection)? What is this measuring..?
My tool is local: it measures the time from when JS says "send this to the speaker" until JS receives that audio back from the mic. It's the client-side round-trip latency.
(As far as I know, there is no way to get the browser to tell you this latency, and you have to measure it.)
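For anyone curious, here's a minimal sketch of how you could measure it yourself (echo cancellation is disabled so the mic actually hears the speaker; the 0.25 threshold and the function name are my own choices):

    // Play a burst through the speakers, listen on the mic, and report
    // the time until the burst shows up in the capture stream.
    async function measureRoundTrip(): Promise<number> {
      const ctx = new AudioContext();
      const mic = await navigator.mediaDevices.getUserMedia({
        audio: { echoCancellation: false, noiseSuppression: false },
      });
      const analyser = ctx.createAnalyser();
      ctx.createMediaStreamSource(mic).connect(analyser);

      // Emit a short tone burst and note the send time.
      const osc = ctx.createOscillator();
      osc.connect(ctx.destination);
      const sent = performance.now();
      osc.start();
      osc.stop(ctx.currentTime + 0.01);

      // Poll the mic until the burst crosses a crude level threshold.
      const buf = new Float32Array(analyser.fftSize);
      return new Promise((resolve) => {
        const poll = () => {
          analyser.getFloatTimeDomainData(buf);
          if (buf.some((s) => Math.abs(s) > 0.25)) {
            resolve(performance.now() - sent);
          } else {
            requestAnimationFrame(poll);
          }
        };
        poll();
      });
    }

A real tool would presumably detect the burst inside an AudioWorklet instead of polling with requestAnimationFrame (which quantizes the answer to animation frames), but the idea is the same.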
I'd love for this to make a difference for simple video calls with relatives -- in my imagination, I set it up, and we find out we've been in the audio equivalent of the uncanny valley this whole time. But I'm afraid in reality, the default audio codecs used in WebRTC are probably good enough to make poor microphones and pure connection latency the bigger issue. Am I wrong?
Obviously, even if I'm right, this has a target audience -- musicians, interviewers and other people who talk for a living -- for which it might be fantastic.
Is it truly uncompressed PCM WAV audio? In that case, 44.1 kHz × 16-bit × 1 channel works out to about 705.6 kbps.
Are there no good non-proprietary lossless low-latency audio codecs? A brief look at Wikipedia shows a couple at 5 ms or lower, but they are either proprietary or low-bitrate.
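For what it's worth, the arithmetic checks out. A quick sketch (the 128-sample block size is the Web Audio render quantum, so the per-packet figures are a plausible but assumed packetization):

    // Uncompressed mono PCM at 44.1 kHz / 16-bit:
    const sampleRate = 44100;
    const bitsPerSample = 16;
    const channels = 1;
    const bitrate = sampleRate * bitsPerSample * channels; // 705,600 bps ≈ 705.6 kbps

    // Shipping one 128-sample render quantum per packet:
    const blockSamples = 128;
    const blockMs = (blockSamples / sampleRate) * 1000;    // ≈ 2.9 ms of audio
    const blockBytes = blockSamples * (bitsPerSample / 8); // 256 bytes + per-packet overhead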
You pay a price for the super-small frame sizes: much worse compression ratios. Frame sizes smaller than 10 ms disable the LPC and hybrid coding modes, which are quite advantageous. [0]
In my experiments with transmitting realtime audio signals using Opus, 10 ms frame sizes are acceptable for one-way synchronicity (e.g. if you want the user to perform an action and hear the result as simultaneous with the action), but that's definitely the upper bound. From what I remember my signal processing professor saying, the threshold to try to hit is 7 ms total latency for truly undetectable processing, but I can't find a reference for that, so take it with a grain of salt.
Robert H. Jack, Tony Stockman, and Andrew McPherson. 2016. Effect of latency on performer interaction and subjective quality assessment of a digital musical instrument. In Proceedings of the Audio Mostly 2016 (AM '16). Association for Computing Machinery, New York, NY, USA, 116–123. DOI: https://doi.org/10.1145/2986416.2986428
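For anyone who wants to experiment, here is roughly what asking for small Opus frames looks like in the browser via WebCodecs. This is a sketch: it assumes the browser implements the Opus-specific options from the WebCodecs Opus codec registration, and sendOverDataChannel is a made-up stand-in:

    const encoder = new AudioEncoder({
      output: (chunk) => sendOverDataChannel(chunk), // hypothetical sender
      error: (e) => console.error(e),
    });
    encoder.configure({
      codec: 'opus',
      sampleRate: 48000,
      numberOfChannels: 1,
      bitrate: 64000,
      opus: {
        frameDuration: 10000,    // in microseconds: 10 ms frames
        application: 'lowdelay', // restricted low-delay mode
      },
    });

The codec does allow 2.5 ms and 5 ms frames too, but as noted above, those sizes disable the LPC and hybrid modes, so compression suffers.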
The JackTrip Virtual Studio uses an inexpensive Raspberry Pi and connects a low-latency hardware codec directly to the Pi's bus, achieving a latency of 1 ms. At that level, the only other latency you have to worry about is the speed of light (through the internet). I've just published an article about the Virtual Studio at https://wmleler1.medium.com/setting-up-your-virtual-studio-6...
If you're in this space, consider trying out SonoBus as well. I'm not the author (though he is a friend of mine) and haven't tried it myself yet, but it seems to have all the right stuff.
Yes, SonoBus looks really promising! I was surprised and also a bit proud to learn that it uses a fork of my AOO library (https://git.iem.at/cm/aoo/). I have been in touch with Jesse for the last couple of days and I'm (passively) following the development of SonoBus.
AOO is a C/C++ library for flexible on-the-fly multi-channel audio streaming and OSC messaging over local networks and the internet. The repo also includes Pure Data externals and SuperCollider extensions.
WARNING: AOO is still alpha and there are breaking changes between the pre-releases, but I hope to publish a final release soon!
I am wondering whether using AudioWorklets actually has any benefits in this case.
AFAIU the benefit in general is that they run on the audio thread, so the data does not need to be handed over to a different thread for processing. In this case, however, the data is handed over to the main thread anyway to be sent over the data channel.
Has anyone measured the impact of using AudioWorklets this way?
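I haven't measured it, but for context the handoff being described looks roughly like this (a sketch; 'capture-processor' and the channel wiring are made up):

    // capture-processor.js -- runs on the audio thread
    class CaptureProcessor extends AudioWorkletProcessor {
      process(inputs: Float32Array[][]): boolean {
        const input = inputs[0][0]; // first channel of the first input
        if (input) {
          // slice() copies, since the engine reuses the underlying buffer
          this.port.postMessage(input.slice());
        }
        return true; // keep the processor alive
      }
    }
    registerProcessor('capture-processor', CaptureProcessor);

    // main thread
    const ctx = new AudioContext();
    await ctx.audioWorklet.addModule('capture-processor.js');
    const node = new AudioWorkletNode(ctx, 'capture-processor');
    node.port.onmessage = (e) => dataChannel.send(e.data); // Float32Array -> wire

You can at least avoid the copy by transferring the buffer (postMessage(buf, [buf.buffer])), but as far as I know there's no way to send on an RTCDataChannel from the audio thread, so the hop to the main thread exists either way.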
If you're on mobile or a corporate network, you might not be able to get "punched through" by the other party for an actual p2p connection, and you'll need a TURN server.
I'm guessing that a data channel carrying uncompressed audio would send a lot of traffic through it.
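For reference, the fallback is just extra entries in the ICE config (URLs and credentials here are placeholders):

    const pc = new RTCPeerConnection({
      iceServers: [
        { urls: 'stun:stun.example.org:3478' },
        {
          urls: 'turn:turn.example.org:3478?transport=udp',
          username: 'user',     // placeholder
          credential: 'secret', // placeholder
        },
      ],
    });

And when TURN kicks in, every packet is relayed through that server, so at ~0.7 Mbps per uncompressed mono stream the relay bandwidth adds up quickly.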
I really don't get why everything needs to be in a web browser nowadays. WebRTC is simply the wrong choice for low-delay audio streaming. Audinate's Dante protocol has been around for 14 years, is affordable, hardware-accelerated, and has practically no avoidable latency when sending audio over the network.
Completely agree; while proprietary, what Audinate has done with Dante is nothing short of a miracle. With their PCIe FPGA card I get less than 2 ms round-trip latency with my Hilo interface at 96 kHz. Additionally, the virtual sound card can do 4 ms latency in one direction on most modern CPUs.
The README says “Multi-machine network music performance over the Internet is achieved” but then later describes how to create multi-person video chat sessions. Comments mention music.
What does this have to do with music? Is it just that remote live music collaboration and multi-machine audio playback are use cases for low-latency codec implementations?
Because it seems like this is an audio codec implemented in the browser that comes with a video chat demo. Am I missing something?
It's a browser implementation of JackTrip, which is low-latency audio conferencing software for musicians, because video conferencing (Zoom etc.) has terrible latency.
The way to have rehearsals over the internet is usually to use JackTrip for the audio and Zoom (etc.) for the video, which can be out of sync; obviously conductors have to tap a metronome into an audio channel, since the video won't be in sync.
If you've ever been in a recording studio with headphone monitoring / studio talkback, it's like that. Really intimate.
This project uses RTCDataChannel for audio, which is very neat, but it's a shame that once again audio on the web has to be hacked to perform well.
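For the curious: the usual low-delay trick with data channels is to make them unordered and lossy, so a late packet is dropped instead of stalling everything queued behind it. A sketch of that configuration (I haven't checked whether this project does exactly this; playbackQueue is made up):

    const channel = pc.createDataChannel('audio', {
      ordered: false,    // don't hold packets back to reorder them
      maxRetransmits: 0, // never retransmit; late audio is useless
    });
    channel.binaryType = 'arraybuffer';
    channel.onmessage = (e) =>
      playbackQueue.push(new Float32Array(e.data)); // hypothetical jitter buffer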