Seems like this would benefit from not using standard TCP (which assumes that a dropped packet is always due to congestion) and from using one of the WiFi retransmission protocols instead (the names are slipping my mind at the moment).
Might increase that bandwidth from 7kbps to something more comfortable.
Maybe - in my experience with this, when the devices are within a foot or so, there really isn't much packet loss, maybe as low as 10^-6 or so. But I haven't really explored moving further apart. I suspect that this suffers from a sharp knee where it starts out reliable and then very quickly fades off.
To be clear, the 7kbps is the raw frame throughput, not the effective rate with TCP. There are two ways I can see to boost this number. One would be to pack more bits into each symbol, e.g. to use a wider QAM mode. I find that in practice the degradation from using speaker/mic makes this somewhat impractical. The other would be to use a more broadband signal (the width of the main audible channel is a few kHz). But this is also kind of undesirable since it has to compete with more interference across the spectrum and can also be a less pleasing sound.
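The tradeoff above is easy to see with some back-of-the-envelope arithmetic. The function and the subcarrier/symbol-rate numbers below are purely illustrative (not libquiet's actual profile parameters), just to show how widening the QAM mode scales the raw rate:

```python
# Illustrative only: raw throughput of a multicarrier audio modem is roughly
# subcarriers * bits_per_symbol * symbol_rate, before any coding/TCP overhead.
def raw_throughput_bps(num_subcarriers, bits_per_symbol, symbol_rate_hz):
    """Raw frame throughput in bits per second."""
    return num_subcarriers * bits_per_symbol * symbol_rate_hz

# A hypothetical 14-subcarrier channel at 250 symbols/s:
print(raw_throughput_bps(14, 2, 250))   # QPSK (2 bits/symbol): 7000 bps
print(raw_throughput_bps(14, 4, 250))   # 16-QAM: 14000 bps, but far less robust
```

Doubling bits-per-symbol doubles the raw rate, but each constellation point sits closer to its neighbors, so speaker/mic distortion starts flipping symbols much sooner.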
Some home appliances use this technology. I have a refrigerator that you can hold your phone next to during a support call; the refrigerator emits some tones that the support person receives, presumably into some computer program designed to decipher them as error or diagnostic codes.
I'm old enough to know all about those old school modems. I think this stuff is cool for these niche-type applications.
I did contracting work on an Android app which records diagnostic data from smoke alarms. You press a button on the alarm and it emits a high-pitched series of chirps. The phone records, decodes and displays stuff like battery level, time of last recorded alarm etc. The encoding scheme was rather simple, only 10 or so bits per second, and used Manchester coding. Most problems I ran into were due to audio hardware differences on different Android phones. Some phones have multiple mics and there is no consistent way to specify which one to use (or, more precisely, to know the location of the mic used). One workaround for this was to record in stereo--phones would usually record each mic to its own stereo channel. Then the app can look at both, and use the one with a better SNR.
Some phones do proprietary audio filtering and noise reduction at hardware or driver level--you don't get access to the raw recorded audio data. On the upside, all phones these days are plenty fast to do CPU intensive audio processing.
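For anyone unfamiliar with the Manchester coding mentioned above, here's a minimal sketch (using the IEEE 802.3 convention; the actual smoke-alarm scheme may differ). Each data bit becomes two half-bits, so there's always a mid-bit transition, which is what makes clock recovery workable at such low bit rates:

```python
# Manchester coding sketch (IEEE 802.3 convention): 1 -> (1,0), 0 -> (0,1).
# The guaranteed mid-bit transition lets the receiver recover the clock.
def manchester_encode(bits):
    out = []
    for b in bits:
        out += [1, 0] if b else [0, 1]
    return out

def manchester_decode(halfbits):
    bits = []
    for i in range(0, len(halfbits), 2):
        pair = (halfbits[i], halfbits[i + 1])
        if pair == (1, 0):
            bits.append(1)
        elif pair == (0, 1):
            bits.append(0)
        else:
            raise ValueError("invalid Manchester pair (lost sync?)")
    return bits

data = [1, 0, 1, 1, 0]
assert manchester_decode(manchester_encode(data)) == data
```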
I have a small spy cam that I put on quadcopters. It has WiFi but no interface, so to configure it you download a phone app and type in WiFi credentials; the app modulates them into sound and plays it, the camera picks it up and configures itself, and then you can watch the stream live over WiFi.
It is offered in some LG washing machines, to be paired with their iOS app. Unfortunately the last time I tried it, I could not make it work after 10 tries - 'too noisy', 'bring the phone closer to the speaker', etc. I don't live in a noisy neighbourhood either.
For anyone interested, there is an app called Chirp by Animal Systems that attempted to make data transfer over audio a relevant form of communication. It actually works quite well to push links from my computer to my phone in Airplane mode with the help of a custom Python script.
If there's anything that Chirp does that you feel is missing from Quiet, please let me know. I'd love to expand the feature set and make it more useful.
This would be great for cross platform zero-config party games. A lot of game genres can get by with surprisingly little bandwidth. Although for a noisy environment like a party you may need to do a little mesh networking in order for everyone in a room to be able to participate.
Would it be feasible to implement this on a microcontroller, let's say a medium-sized Cortex-M or ESP8266? It would be useful in IoT, especially for the initial setup phase.
In general, I think you are definitely right about this. Sound does have some potentially interesting applications. For example, if you had an exhibit that you wanted guests to interact with, that would probably eliminate many wireless choices. NFC would be a valid option, but not all phones support it, and you might need to make the user install an app. Quiet's sound transmission can work from the browser by using quiet.js. In general I think it's ever so slightly more versatile, or at least until browsers make it easier to access wireless options.
The use case I had in mind specifically was initial configuration, mainly transferring wlan configuration data to the device. So this would not replace radio interfaces, but instead complement them.
I haven't tested it on that particular hardware but I think it might be feasible. Some modem profiles are more computationally expensive than others, and receiving is more expensive than transmitting.
At some point I'd like to try it on an RPi across a piezo and a cheap mic
You could try making a "real" C library (with proper Makefile etc) out of it and compile it on a Pi, with a pulseaudio (or ALSA) configured to do loopback (i.e. route the audio output to the line-in input).
Well, the JNI means you get to run native code. For something like this, I think that's actually a requisite. My reasoning would be:
a) This is real-time and could potentially be disrupted by GC pauses
b) JNI means you get to use OpenSL, the best and lowest latency sound engine on Android
c) This builds on top of libquiet, a C library, which itself builds on liquid dsp, another C library. Rewriting these in Java would be significantly more work than building the JNI wrapper. Especially true for liquid which is a mature library with lots of code
One also needs to take into account what the microphone is plugged into. For instance, you could use this to transmit data over a walkie talkie, but those are band-limited to around 6 kHz.
Years ago I was playing around to learn some DSP and made a simple modem for speakers and microphone to send text messages via sound. I used "continuous-phase frequency shift keying" on a couple of different frequency pairs, and the receiver used FFT for decoding...
Never managed to get it working reliably above 300 bps. It would probably be much better to use the Goertzel algorithm tuned to the specific frequencies instead of using chunked FFT.
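For reference, Goertzel boils down to a second-order IIR filter per tone of interest, so checking two FSK frequencies costs far less than a full FFT per block. A minimal sketch (pure Python, illustrative sample rate and tone frequencies):

```python
import math

def goertzel_power(samples, sample_rate, target_freq):
    """Power of a single frequency bin -- cheaper than a full FFT when you
    only care about a couple of known FSK tones."""
    n = len(samples)
    k = round(n * target_freq / sample_rate)   # nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

# Decide which of two FSK tones is present in a 200-sample block at 8 kHz:
rate = 8000
block = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(200)]
assert goertzel_power(block, rate, 1000) > goertzel_power(block, rate, 1200)
```

One caveat: like chunked FFT, the bin width is still set by the block length, so short blocks (high bit rates) smear the two tones together either way.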
I'm curious about why some modulation schemes are better than others in this situation, but it seems like a lot of tricky math and information theory is needed to get it.
OK, some Wikipedia articles later: so OFDM basically means you send relatively long pulses of many parallel bit streams on many "orthogonally" spaced frequencies, each stream itself being modulated by some other scheme (in your case QAM; 4-QAM, also known as QPSK, encodes two bits per symbol using four points in the amplitude/phase plane). The frequency spacing is chosen to avoid spectral interference, and the pulses are spaced to avoid temporal interference. With the relatively long pulse times and the many frequencies, FFT is probably the best method.
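The whole OFDM round trip fits in a few lines if you don't mind a toy-sized example. The sketch below (my own illustration, nothing to do with libquiet's internals) maps bit pairs to 4-QAM points, places one point per subcarrier, inverse-DFTs to get the time-domain symbol, and forward-DFTs to recover them; a naive O(N^2) DFT is used for clarity where a real modem would use the FFT:

```python
import cmath

# Gray-coded 4-QAM (QPSK) constellation: two bits per point.
QPSK = {(0, 0): 1 + 1j, (0, 1): -1 + 1j, (1, 1): -1 - 1j, (1, 0): 1 - 1j}

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

bits = [0, 0, 0, 1, 1, 1, 1, 0]                  # 8 bits -> 4 QPSK symbols
syms = [QPSK[tuple(bits[i:i + 2])] for i in range(0, len(bits), 2)]
time_samples = idft(syms)                        # one OFDM symbol, 4 subcarriers
recovered = dft(time_samples)                    # receiver side

decoded = []
for s in recovered:
    point = min(QPSK, key=lambda b: abs(QPSK[b] - s))  # nearest-point decision
    decoded += list(point)
assert decoded == bits
```

A real system would additionally add a cyclic prefix between symbols (the temporal spacing mentioned above) and cope with channel noise, which this toy omits.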
Actually p2p file transfer via an infrared port[1] was a fairly common feature on laptops back in the day (this was probably before bluetooth took off). If you're on Windows, have a look in the "Network and Internet" section of the Control Panel. You may still have an Infrared utility there if your computer has that functionality.