Years ago I was playing around to learn some DSP and made a simple modem for speakers and microphone to send text messages via sound. I used "continuous-phase frequency shift keying" on a couple of different frequency pairs, and the receiver used FFT for decoding...
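For anyone wondering what the "continuous-phase" part buys you, here is a minimal sketch of a CPFSK tone generator. The frequencies, baud rate, and sample rate are made-up defaults for illustration, not the values the modem above actually used. The idea is to carry one running phase across bit boundaries so the waveform never jumps when the frequency switches.

```python
import math

def cpfsk(bits, f0=1200.0, f1=2200.0, baud=300, rate=44100):
    """Return float samples in [-1, 1] encoding `bits` as two tones.

    All parameter defaults are hypothetical, chosen only for illustration.
    """
    samples = []
    phase = 0.0                      # running phase, never reset between bits
    n_per_bit = rate // baud         # samples per bit (integer division)
    for bit in bits:
        # Per-sample phase increment for the tone that represents this bit.
        step = 2 * math.pi * (f1 if bit else f0) / rate
        for _ in range(n_per_bit):
            samples.append(math.sin(phase))
            phase = (phase + step) % (2 * math.pi)
    return samples
```

Because the phase is accumulated rather than restarted per bit, consecutive samples never differ by more than one phase step, which keeps clicks and wideband splatter out of the audio.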
Never managed to get it working reliably above 300 bps. It would probably be much better to use the Goertzel algorithm tuned to the specific frequencies instead of using chunked FFT.
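A rough sketch of what the Goertzel approach could look like, assuming one block of samples per bit and two known tone frequencies. The function names and the `detect_bit` helper are my own for illustration, not from any library. Goertzel computes the energy at a single frequency with one multiply-add per sample, so for two tones it is much cheaper than a full FFT and is not restricted to FFT bin centers.

```python
import math

def goertzel_power(samples, freq, rate):
    """Squared magnitude of `samples` at `freq` Hz via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / rate)
    s1 = s2 = 0.0                    # the two previous filter states
    for x in samples:
        s0 = x + coeff * s1 - s2     # second-order resonator update
        s2, s1 = s1, s0
    # Energy at the target frequency, from the final two states.
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def detect_bit(samples, f0, f1, rate):
    """Hypothetical helper: pick whichever tone has more energy in this block."""
    p0 = goertzel_power(samples, f0, rate)
    p1 = goertzel_power(samples, f1, rate)
    return 1 if p1 > p0 else 0
```

Usage would be something like `detect_bit(block, 1200.0, 2200.0, 44100)` on each bit-length block of microphone samples. A real receiver would still need bit timing recovery and some noise threshold, which this sketch ignores.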
I'm curious about why some modulation schemes are better than others in this situation, but it seems like a lot of tricky math and information theory is needed to get it.
OK, some Wikipedia articles later: so OFDM basically means you send relatively long pulses (symbols) of many parallel bit streams on many "orthogonally" spaced subcarrier frequencies, each stream itself being modulated by some other scheme (in your case QAM, which encodes bits in the amplitude and phase of each carrier; a constellation of four points carries two bits per symbol). The frequency spacing is chosen so the subcarriers don't interfere with each other spectrally, and the pulses are separated by a guard interval to avoid temporal (inter-symbol) interference. With the relatively long pulse times and the many frequencies, FFT is probably the best method.
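To make that concrete, here is a toy OFDM round trip in plain Python. This is only a sketch of the general technique, not how quiet implements it. The assumptions are mine: QPSK (4-point QAM) on every subcarrier, a complex baseband signal, an even number of bits, a cyclic prefix of at least one sample as the guard interval, and a naive O(n^2) DFT standing in for a real FFT.

```python
import cmath

# Hypothetical Gray-coded QPSK map: two bits -> one constellation point.
QPSK = {(0, 0): 1 + 1j, (0, 1): -1 + 1j, (1, 1): -1 - 1j, (1, 0): 1 - 1j}

def idft(X):
    """Naive inverse DFT: one complex sinusoid per subcarrier, summed."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def dft(x):
    """Naive forward DFT: correlates the signal against each subcarrier."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def ofdm_modulate(bits, cp_len=4):
    """Map bit pairs onto subcarriers and emit one OFDM symbol with cyclic prefix."""
    pairs = [tuple(bits[i:i + 2]) for i in range(0, len(bits), 2)]
    body = idft([QPSK[p] for p in pairs])
    return body[-cp_len:] + body     # copy the tail to the front as a guard

def ofdm_demodulate(samples, cp_len=4):
    """Drop the cyclic prefix, transform back, and slice each subcarrier to bits."""
    bits = []
    for s in dft(samples[cp_len:]):
        bits += [1 if s.imag < 0 else 0, 1 if s.real < 0 else 0]
    return bits
```

The orthogonal spacing shows up in the fact that the forward transform recovers each subcarrier's symbol independently, even though all of them overlap in time. Sending this over speakers would additionally need a real-valued signal (for example by upconverting to an audio carrier), plus synchronization and equalization, none of which is shown here.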
If you want to see quiet's configuration flexibility, try this in Chrome: https://quiet.github.io/quiet-profile-lab/