For super low bitrate speech, (1.44MB/30 minutes is 6.4kbps), codec2 is better than Opus, and can go to even lower bitrates. When combined with WaveNet, the audio quality is incredible, though required compute power is ridiculous.
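For anyone who wants to check the 6.4kbps figure, here is a minimal sketch of the arithmetic. The function name and the two capacity constants are my own illustration, and it assumes "1.44MB" is read as 1.44 million bytes and that the whole disk goes to audio, with no filesystem or container overhead.

```python
def bitrate_bps(capacity_bytes: int, minutes: float) -> float:
    """Average bitrate (bits per second) that exactly fills capacity_bytes."""
    return capacity_bytes * 8 / (minutes * 60)

# Assumption: "1.44MB" taken as 1.44 * 10^6 bytes, which reproduces 6.4kbps.
DECIMAL_MB = 1_440_000
# Raw size of a high-density 3.5" disk: 2880 sectors of 512 bytes each.
RAW_HD = 2880 * 512

print(bitrate_bps(DECIMAL_MB, 30))  # 6400.0
print(bitrate_bps(RAW_HD, 30))      # 6553.6
```

Either way the budget for a 30 minute episode lands at roughly 6.4 to 6.6kbps before any overhead is subtracted.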
Broad compatibility is important, but is there someone who has actually hooked up a USB floppy drive to an Android or iPhone or iPad and read files off of it?
ha no, i don't think so :-) All these devices didn't exist when the floppy drive went out of vogue. I'm not sure if there even are USB floppydisk drivers for Android or iOS. Technically Android is linux (and iOS bsd) so I suppose there is a /dev/fd0 capability somewhere.
I meant that LPCnet is really awesome, but when I would distribute a podcast in this format (floppy or rss) I would like it to be able to be played on devices and OS's with support for it - otherwise the audience can't play it.
> For super low bitrate speech, (1.44MB/30 minutes is 6.4kbps), codec2 is better than Opus
At the top end of that, I think that's not the case. Certainly starting at 3200bps, Opus just becomes a non-option (and codec2 becomes an option).
I have a trove of librivox audiobooks in codec2 from some tests I did, and I've given some of them extensive listens to get a feel for it. Without an improved decoder that isn't a WaveNet, it's still not always fun trying to understand what is said.
This reminds me of MIDI files and how, depending on one's sound device, they would sound subtly different.
My Sansa MP3 player, Acer laptop, and gateway PC all had different ways of 'rendering' the Orchestra hit instrument in particular, which made for a rousing Termina Field listening experience on the bus or at Grandma's compared to the family computer.
The last time I looked at the Web Audio spec it appeared to leave rendering to the implementation. So regarding aliasing with OscillatorNode[1]:
"It is expected that an implementation will take some care in achieving this ideal, but it is reasonable to consider lower-quality, less-costly approaches on lower-end hardware."
Luckily nearly no one seems to be using this API so I doubt there really are anywhere near as many "approaches" as you can hear from MIDI file players.
[1] It is expected that an implementation will take some care in achieving this ideal, but it is reasonable to consider lower-quality, less-costly approaches on lower-end hardware.
I was thinking more that one would use mechanical turk to annotate each soundfile with these tags for the set of all podcasts. Then use that as training data for an ML algorithm that gets used to render the output for each tag.
So I'd use my archiveaudio tag from the example and the algo would output the band-limited frequencies of what sounds vaguely like a cop with speech patterns signaling defensiveness and irritation.
It would all sound perfectly like a podcast if you were listening to it in a sleep state. Then if you concentrated on it you'd realize there's no coherent plot or even any actual natural language being spoken in the entire recording.
We really need some kind of highly compressible alternative format for podcasts. Representing speech with a very limited set of code points. It might be a bit lossy, so you could lose the tone of the voice and other speech characteristics, but there could be some ways to still represent that by carefully picking the words you utter in the podcast.
Of course I'm only spitballing here, this would require some pretty advanced research to be made.
> We really need some kind of highly compressible alternative format for podcasts.
WebVTT, reproduced at the listener side by TTS? At some point, there's a quality chasm that can only be leaped by using a format other than sampled audio.
Uh.. isn’t that pretty much what things like lpcnet and wavenet and codec2 are (discussed here) doing? These are codecs that are speech specific. I don’t understand what unique idea you are trying to “spitball” here.
Lots of advanced research has been done in this area. These are the results as of now.
I love the idea, because there's something almost ideal to me about the 3.5" floppy disk format. I wish I had some reason to use them today, and so I really want this to work. However the audio quality is just terrible; it may be intelligible, but it is unlistenable IMO.
3.5" floppies combined a lot of factors that appealed to me. They were cheap enough you could give one to someone without wanting it back, they were contained entirely within the computer so no danger of breaking it off if you leave it in there too long. Unlike CDs they were rewritable and unlike CDRWs they worked properly no matter what PC I was writing or reading from. Not to mention the thoroughly satisfying "chunk" sound when popped into a drive. Network file transfers will never have that "shove the data right into the computer, bam!" feeling.
There was also the tactile feeling of flipping through a stack of floppies looking for one specific disk. Now that everything is so high capacity, you almost never have to physically search for data. Of course the modern way is superior in almost every way, except it was a kind of wonderful sensation holding a disk box full of your stuff.
Not so much fun when you were doing an Oracle Forms client-side app. I recall having to use 13 or 14 floppies in the correct order before I could even install the actual client-side application.
USB flash drives are the modern floppies really. You can get one with a few GBs for literally pennies to the point where you can even get some "promotional" flash drives for free, similar to how in the past people gave out "promotional" (writing) pens for free. On the high end side you can have flash drives with as high as 2TB of storage (though the only one I've found is a bit bulky and the more regular sized one has "only" 1TB of storage).
Surprisingly, unlike the pennies-per-diskette garbage of the late 90s, Sony 3.5 disks from the 80s were stiffer, heavier, and massively more reliable. I have some 800k Mac disks that still read perfectly well. Probably other brands as well, but I was always impressed with Sony media from that era.
It would be.. spooky. Just like the lady who voiced Siri just had to record sounds (letters and combinations of letters) that can be combined for text-to-speech, one could do that for podcasters...
But then it means you can make your podcaster say anything you want.
So, no podcast yet, right? Didn't want to miss anything. That's a pretty cool idea. I'm all for some kind of subculture where media have tiny size limits, and if that starts from retro culture then great.
I love to imagine using my 4G connection to download like 60 hours of podcasts in the few minutes before leaving on vacation...
There is something beautiful about efficiency. Small computers (check out /r/sffpc), efficient codecs, those competitions where people make games/music/graphics in just a few KBs (forget what those are called - “demoscene”?). The fact that compressed Wikipedia without images fits on an 8GB USB.
It all fascinates me. It is amazing to see us push the boundaries of communication under such restrictions.
Right on, I love that you're paying attention to the space factor. I'm a casual podcaster, about 10 episodes in last I checked :) and I totally know what you mean about editing. I have to do it right after recording or I'll procrastinate for weeks!
Does the resampling and bit-depth reduction of the wav file actually help? My intuition is that a modern encoder like Opus will work better if it has more information.
Bit-depth reduction is counter-productive. You're just adding noise to the signal, and reducing the dynamic range.
As for resampling, it depends on the specifics of the resampler used by a specific codec implementation. If a resampler is terrible, it could add artifacts like ringing and aliasing that would degrade the encoding quality.
I really like this idea, I was personally playing around with the idea of a bootable magazine that you could read in an emulator. Another idea was to release an entire album of music on a floppy disk. Both ideas I'm part way through, but have been incredibly busy.
I prefer the sound of the MP3 in the examples, although both are really dull. I think using some modern codec like WaveNet or others, whilst sounding better, kind of defeats the point. I think MP3 is really as new a codec as I would want to go, especially as pretty much any device that can play audio that is actively used today can read an MP3 file. (I'm somebody who still uses an MP3 player.)
That said, I think these floppycasts should simply be shorter: the easier they are to create, the more likely the idea won't die in the crib. I think 15 or 20 minutes is really not too bad for a start (especially for a solo act); specials could come in multiple parts and still be in keeping with the theme.
Hmm, listening to the samples [0], the OPUS one is almost unbearably noisy, whereas the MP3, while clearly low quality, is still listenable. Am I missing something? Why is OPUS doing so poorly?
As sad as this sounds, I think based upon the slurring of "s" sounds, that I recognize those source HHGTTG encodings, having listened to them on loop for hours at a time whilst coding. I think they're the 56kb/s MP3 ones that were floating around a couple of decades ago. There are some 128kb/s ones that might be a bit fairer.
It's interesting. The OPUS one is noisy, but pretty crisp. MP3 is smooth but more muffled. (like it lacks dynamic range) Honestly, I'd rather listen to the OPUS version if these were the only choices.
That's odd. My experience was the opposite. Although the Opus one was a bit muffled and some of the low end was missing so I had to strain a little to hear, the MP3 was a cacophony of pops and whistles all the way through. It was annoying and hard to concentrate on. I'd take the Opus one every time.
I will say the music part was interesting. I could mostly make out the song on the MP3. On Opus, I could barely tell there was music there. A lot of it was just missing.
he used the opusenc setting "tuned for speech" and music sections (like :22) sound awful. would like to know what it would sound like without that mode turned on, as it also surprises me how bad opus sounds vs mp3.
LPC (think Speak'n'spell, PCjr speech adapter, etc.) and variants is really the only way to achieve this. A 1.44MB floppy has 1,457,664 bytes available after formatting. LPC at 2400 bits per second means you can store 80 minutes of speech on a 1.44MB floppy.
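The 80 minute figure is easy to sanity check. This is just a rough sketch: the function name is mine, the capacity is the 1,457,664 bytes quoted above, and it assumes the raw codec bitstream fills the disk with no container or framing overhead.

```python
def minutes_of_speech(capacity_bytes: int, bitrate_bps: int) -> float:
    """Minutes of audio that fit in capacity_bytes at a constant bitrate."""
    return capacity_bytes * 8 / bitrate_bps / 60

# Usable bytes on a formatted 1.44MB floppy, as quoted in the comment above.
FORMATTED_BYTES = 1_457_664

# 2400 and 3200 bps come up in this thread; 6400 bps is the 30-minute target.
for bps in (2400, 3200, 6400):
    print(f"{bps} bps: {minutes_of_speech(FORMATTED_BYTES, bps):.1f} min")
```

At 2400 bps this comes out just under 81 minutes, so "80 minutes" holds with a little room left over for headers.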
A big flaw in the OP's testing is that he converted the audio to 8-bit PCM before feeding it to the encoders. That added 48dB of high-frequency noise and distortion for absolutely no reason. He might have found his sibilants ("s" sounds) sounding better had he not done that.
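For reference, the 48dB number falls out of the textbook formula for an ideal uniform quantizer driven by a full-scale sine, roughly 6.02 dB per bit plus 1.76 dB. The sketch below assumes the source was 16-bit PCM (the thread doesn't say) and ignores dither; the function name is only for illustration.

```python
import math

def ideal_sqnr_db(bits: int) -> float:
    """Theoretical signal-to-quantization-noise ratio, in dB, of an ideal
    N-bit uniform quantizer for a full-scale sine (about 6.02*N + 1.76)."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

# Assumption: the original recording was 16-bit before the 8-bit conversion.
loss_db = ideal_sqnr_db(16) - ideal_sqnr_db(8)
print(f"16-bit: {ideal_sqnr_db(16):.1f} dB")
print(f" 8-bit: {ideal_sqnr_db(8):.1f} dB")
print(f"noise floor raised by about {loss_db:.1f} dB")
```

Dropping 8 bits costs about 48 dB of signal-to-noise ratio, and that noise is baked into the input before the encoder ever sees it.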
when I read the title I somehow assumed that the optimization would be in the length of the recording, not the encoding. I would like a podcast format with very short episodes.
Or just good sections, not just stream-of-consciousness rambling. That would also stick perfectly to the metaphor; swapping floppy disks should be familiar to everyone who ever held one in their hands.
As long as the podcast doesn't approach MS Office 97 territory (50+ disks).
Me too! I was surprised to see the target length is 30 minutes. I think there's plenty of space for even much more compact podcast episodes, under 10 minutes in length. You can say an awful lot in 10 minutes.
Yeah, since a lot of people speed up podcasts anyway, why not speed it up prior to compression (or learn to talk faster)? (Of course, speeding things up 3x is a bit extreme!)
There are so many things you could do to make things go well on very small devices that would cost almost nothing today, if we decided to adopt interface and software standards to run on durable hardware.
Meanwhile phones are always getting faster and faster and always becoming obsolete every year. Throwing software away means we are forced to throw hardware away.
I wish that someday we could just decide to stick to a single system and not change things, so that system can last 10 years and work on the same durable phone. It would require making hard choices in OS and software design, but it would be really worth it.
I hope a day will come where computers or smartphones will be able to last 10 or maybe 20 years and still be affordable. To be honest I don't think I want to be involved in learning how to develop apps on any phone for those reasons. At least Microsoft and Linux are able to maintain a minimal amount of backward compatibility. For developers that's really important. You don't have a good ecosystem or a good phone if you cannot attract developers to build apps.
I'm looking forward to hearing the actual releases. I still use floppies when I can, and I still have a bunch of old PCs hanging around. It'll be good to have a valid reason to boot them up.
Excellent, I did similar experiments in the past.
RealAudio (ACELP) with Helix is also an efficient option, but Opus is the best. For more quality you can reformat your floppy with 2MF 3.0, it gives you 2 megabytes on a floppy instead of 1.44. I used it to get the bible on it and a stripped down bootable dos+win3.1.. unpack on a ramdrive and start the app. People next to me were still stuck in the University provided dos wp 5.1 wordprocessors while I could do WYSIWYG editing with true type fonts!
Reminds me of https://floppyswop.co.uk: "a place for sharing any files small enough to fit on a conventional floppy disc (1.44meg high density), art, media, sound, noise, its up to you, all files are hosted here for taking and swopping..".
https://auphonic.com/blog/2018/06/01/codec2-podcast-on-flopp...