Floppycasts – 1.44MB Podcasts (ajroach42.com)
138 points by KirinDave on June 25, 2019 | 83 comments



For super low bitrate speech, (1.44MB/30 minutes is 6.4kbps), codec2 is better than Opus, and can go to even lower bitrates. When combined with WaveNet, the audio quality is incredible, though required compute power is ridiculous.

https://auphonic.com/blog/2018/06/01/codec2-podcast-on-flopp...
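For anyone wondering where the 6.4kbps figure comes from, here is a minimal sketch of the arithmetic. It assumes "1.44MB" is read as 1.44 million bytes; the helper name `bitrate_budget_bps` is made up for illustration. The second call uses the raw sector capacity (1440 KiB) for comparison.

```python
def bitrate_budget_bps(capacity_bytes: int, minutes: float) -> float:
    """Average bitrate (bits per second) that exactly fills capacity_bytes."""
    return capacity_bytes * 8 / (minutes * 60)


# Assumption: "1.44MB" taken as 1.44 million bytes.
nominal = bitrate_budget_bps(1_440_000, 30)

# The same disk counted as 1440 KiB of raw sectors (before any filesystem).
raw = bitrate_budget_bps(1440 * 1024, 30)

print(f"{nominal:.1f} bps")  # 6400.0 bps
print(f"{raw:.1f} bps")      # 6553.6 bps
```

Either way, filesystem and container overhead come out of that budget, so the usable bitrate is a bit lower in practice.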


> the audio quality is incredible, though required compute power is ridiculous.

LPCnet gives similar quality with much less computation needed:

http://www.rowetel.com/wordpress/?p=6482

https://people.xiph.org/~jm/demo/lpcnet/

https://people.xiph.org/~jm/demo/lpcnet_codec/


Yes, amazing, but can it be played through Chrome, VLC, mpv or more importantly Android and iOS?


Broad compatibility is important, but is there someone who has actually hooked up a USB floppy drive to an Android or iPhone or iPad and read files off of it?


Ha, no, I don't think so :-) None of these devices existed when the floppy drive went out of vogue. I'm not sure there even are USB floppy disk drivers for Android or iOS. Technically Android is Linux (and iOS BSD), so I suppose there is a /dev/fd0 capability somewhere. What I meant is that LPCnet is really awesome, but if I were to distribute a podcast in this format (floppy or RSS), I would want it to be playable on devices and OSes that support it; otherwise the audience can't play it.


> For super low bitrate speech, (1.44MB/30 minutes is 6.4kbps), codec2 is better than Opus

At the top end of that, I think that's not the case. Certainly starting at 3200bps, Opus just becomes a non-option (and codec2 becomes an option).

I have a trove of librivox audiobooks in codec2 from some tests I did, and I've given some of them extensive listens to get a feel for it. Without an improved decoder that isn't a WaveNet, it's still not always fun trying to understand what is said.


> required compute power is ridiculous

Can you elaborate or put some concrete numbers on this? :)


Perhaps you could get it smaller by making a "vector format" for podcasts and leave it to the audio player to "do it live"

<bgmusic character="spooky" />

<archiveaudio year="1972" type="cop" tone="evasive" />

<foley type="tirescreech" />

<foley type="river" weight="300" />


<ad type="e-postage-vendor" />

<ad type="wysiwyg-website-builder" />

<ad type="overpriced-mattress" />


<ad type="standard-audible-plug" />


This reminds me of MIDI files and how, depending on one's sound device, they would sound subtly different.

My Sansa MP3 player, Acer laptop, and Gateway PC all had different ways of 'rendering' the Orchestra Hit instrument in particular, which made for a rousing Termina Field listening experience on the bus or at Grandma's compared to the family computer.


The last time I looked at the Web Audio spec, it appeared to leave rendering to the implementation. Regarding aliasing with OscillatorNode, it says:

"It is expected that an implementation will take some care in achieving this ideal, but it is reasonable to consider lower-quality, less-costly approaches on lower-end hardware."

Luckily nearly no one seems to be using this API, so I doubt there are anywhere near as many "approaches" as you can hear from MIDI file players.


Style sheets for sound - I think BBC radio used to sell CDs for this. Now they can provide Audio Style Sheets with the acronym of... oh, wait

Golly - you could make any story sound like it was a Radio 4 play


I was thinking more that one would use Mechanical Turk to annotate each sound file with these tags for the set of all podcasts, then use that as training data for an ML algorithm that renders the output for each tag.

So I'd use my archiveaudio tag from the example and the algo would output the band-limited frequencies of what sounds vaguely like a cop with speech patterns signaling defensiveness and irritation.

It would all sound perfectly like a podcast if you were listening to it in a sleep state. Then if you concentrated on it you'd realize there's no coherent plot or even any actual natural language being spoken in the entire recording.


There actually was work on CSS additions to give hints to Text-To-Speech engines, e.g. https://www.w3.org/TR/2018/NOTE-css3-speech-20180605/


Acronym already taken by Advanced SubStation Alpha subtitle files!


We really need some kind of highly compressible alternative format for podcasts. Representing speech with a very limited set of code points. It might be a bit lossy, so you could lose the tone of the voice and other speech characteristics, but there could be some ways to still represent that by carefully picking the words you utter in the podcast.

Of course I'm only spitballing here, this would require some pretty advanced research to be made.


> We really need some kind of highly compressible alternative format for podcasts.

WebVTT, reproduced at the listener side by TTS? At some point, there's a quality chasm that can only be leaped by using a format other than sampled audio.

My question for you is: Why do we need that?


Might be easier to create 'players' for various people's synthesized voices, and just compress and stream the text.


> We really need some kind of highly compressible alternative format for podcasts.

Do we though? In the current state of technology, podcasts can be quite long and still consume less bandwidth than an HD movie trailer.


If that is all you are after, then the most efficient codec of all is compressed text, and the player a text-to-speech engine.


Uh.. isn’t that pretty much what things like lpcnet and wavenet and codec2 are (discussed here) doing? These are codecs that are speech specific. I don’t understand what unique idea you are trying to “spitball” here.

Lots of advanced research has been done in this area. These are the results as of now.

Things will hopefully continue to improve.


Would be even more interesting if each sound effect were randomized a bit so you don’t feel like you are hearing the same thing over and over.


randomize="12%"


See also module music. MIDI + SoundFonts would be a reasonable idea for music and sound effects, but it wouldn't be suitable for voice.

Something like Vocaloid parameters would be interesting, but the synthesis engine is probably pretty intensive.

(To be clear, I think this is an idea that's fun to think about as a novelty but probably a dead end otherwise.)


It would hardly surprise me if an addition like this was included in the ridiculous web audio standards


I love the idea, because there's something almost ideal to me about the 3.5" floppy disk format. I wish I had some reason to use them today, and so I really want this to work. However, the audio quality is just terrible; it may be intelligible, but it is unlistenable IMO.


3.5mm floppies combined a lot of factors that appealed to me. They were cheap enough you could give one to someone without wanting it back, they were contained entirely within the computer so no danger of breaking it off if you leave it in there too long. Unlike CDs they were rewritable and unlike CDRWs they worked properly no matter what PC I was writing or reading from. Not to mention the thoroughly satisfying "chunk" sound when popped into a drive. Network file transfers will never have that "shove the data right into the computer, bam!" feeling.


There was also the tactile feeling of flipping through a stack of floppies looking for one specific disk. Now that everything is so high capacity, you almost never have to physically search for data. Of course the modern way is superior in almost every way, except it was a kind of wonderful sensation holding a disk box full of your stuff.


Floppies had richer, warmer bits, man. The bits coming off USB drives are cold, tinny, and mechanical by comparison.


Not so much fun when you were doing an Oracle Forms client-side app. I recall having to use 13 or 14 floppies in the correct order before I could even install the actual client-side application.


USB flash drives are the modern floppies, really. You can get one with a few GBs for literally pennies, to the point where you can even get "promotional" flash drives for free, similar to how people used to give out "promotional" (writing) pens for free. On the high end you can get flash drives with as much as 2TB of storage (though the only one I've found is a bit bulky, and the more regular-sized one has "only" 1TB of storage).

No satisfying chunk sound though :-P.


No sound, but you get USB superposition in exchange. In 50 years, people will gather in nursing homes and reminisce about how bad type-A was.


I'm curious what causes that strange phenomenon. You had a 50-50 chance to get it right, but it's the wrong way almost every time.


Bad angle or not enough force, based on how many times it worked the third time.



Drat. Uh, I was referencing it by, uh, its thickness dimension. Yeah, that's what I was doing, yes sir.


Ah yes, I forgot it was a Euro American joint venture, thus measurements are equally split between metric and imperial.

Fun fact, the spring rate for the slidy thing (technical term) is measured in lb m/in cm!


Plus the warm sentiment of relief when you managed to read the data off the disk when you were lucky enough it wasn't corrupted!


Surprisingly, unlike the pennies-per-diskette garbage of the late 90s, Sony 3.5 disks from the 80s were stiffer, heavier, and massively more reliable. I have some 800k Mac disks that still read perfectly well. Probably other brands as well, but I was always impressed with Sony media from that era.


That's why the Oculus VR intro with the floppies is also so nice; it has a nice haptic/tactile feel to it.


After a year and a half of owning an Oculus Touch, replaying that intro is still more fun than any other experience I've had on it.


Oh for sure. It's garbage.

But opus still sounds pretty good until you get down to around 10kbps (this, iirc, is 6 or 7.)


If you really wanted to crunch the size you could store the podcast as a text file and run it through text to speech when you want to listen to it.


that might be the next AI level of compression of just about everything. Very lossy, but very small.


It would be.. spooky. Just like the lady who voiced Siri just had to record sounds (letters and combinations of letters) that can be combined for text-to-speech, one could do that for podcasters...

But then it means you can make your podcaster say anything you want.


like an audio PDF or something.


Combined with a good text compressor you could get > 100 hours of speech on it. http://www.mattmahoney.net/dc/text.html
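A back-of-envelope sketch of how that could work out. Every number here is an assumption and none comes from the thread: a speaking rate of 150 words per minute, about 6 bytes per English word including the space, and a compressor that shrinks text to roughly a quarter of its size.

```python
# All constants below are illustrative assumptions, not measurements.
WORDS_PER_MINUTE = 150      # assumed speaking rate
BYTES_PER_WORD = 6          # assumed average, including the trailing space
COMPRESSED_FRACTION = 0.25  # assumed output size / input size
FLOPPY_BYTES = 1_440_000    # "1.44MB" read as 1.44 million bytes


def hours_of_transcript(capacity_bytes: int) -> float:
    """Hours of speech whose compressed transcript fits in capacity_bytes."""
    raw_bytes_per_hour = WORDS_PER_MINUTE * 60 * BYTES_PER_WORD
    compressed_bytes_per_hour = raw_bytes_per_hour * COMPRESSED_FRACTION
    return capacity_bytes / compressed_bytes_per_hour


hours = hours_of_transcript(FLOPPY_BYTES)
print(f"about {hours:.0f} hours")
```

Under these assumptions it comes out a bit over 100 hours. A stronger compressor or slower speech pushes it higher, and the text-to-speech engine itself is not counted against the disk.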


So, no podcast yet, right? Didn't want to miss anything. That's a pretty cool idea. I'm all for some kind of subculture where media have tiny size limits, and if that starts from retro culture then great.

I love to imagine using my 4G connection to download like 60 hours of podcasts in the few minutes before leaving on vacation...


There is something beautiful about efficiency. Small computers (check out /r/sffpc), efficient codecs, those competitions where people make games/music/graphics in just a few KBs (forget what those are called - “demoscene”?). The fact that compressed Wikipedia without images fits on an 8GB USB.

It all fascinates me. It is amazing to see us push the boundaries of communication under such restrictions


There's a demoscene magazine intermittently published in 'demo' form: http://www.hugi.scene.org/


Would be cool to make a demo that would fit on a magazine page in something like a QR code (scannable with a normal phone).


Nah, this was just me screwing around with encoders while I was avoiding editing my podcasts.

When I actually release, I'll do it at a bitrate that's more listenable, but still optimized for space.


Right on, I love that you're paying attention to the space factor. I'm a casual podcaster, about 10 episodes in last I checked :) and I totally know what you mean about editing. I have to do it right after recording or I'll procrastinate for weeks!


It's certainly another tool for reaching people in regions where oppressive dictatorships control the media, like North Korea.


Does the resampling and bit-depth reduction of the wav file actually help? My intuition is that a modern encoder like Opus will work better if it has more information.


I tried it both ways with opus. More or less the same in terms of quality, but the resampled file was about 3% smaller.

I imagine it would be less of a size difference and more noticeable at higher bitrates.


Bit-depth reduction is counter-productive. You're just adding noise to the signal, and reducing the dynamic range.

As for resampling, it depends on the specifics of the resampler used by a specific codec implementation. If a resampler is terrible, it could add artifacts like ringing and aliasing that would degrade the encoding quality.

A great source for resampler comparisons: http://src.infinitewave.ca/
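To put a rough number on the bit-depth point, here is a small sketch that rounds a full-scale sine to a given bit depth and measures the resulting signal-to-noise ratio. The test tone (997 Hz at 48 kHz) and the function name are arbitrary choices for illustration, and plain rounding without dither is assumed.

```python
import math


def quantization_snr_db(bits: int, n: int = 48_000) -> float:
    """Measure SNR (dB) after rounding a full-scale sine to `bits` bits."""
    levels = 2 ** (bits - 1) - 1  # largest positive integer sample value
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * 997 * i / n)  # 997 Hz tone, 48 kHz rate
        q = round(x * levels) / levels           # quantize, then rescale
        signal_power += x * x
        noise_power += (x - q) ** 2
    return 10 * math.log10(signal_power / noise_power)


snr_8 = quantization_snr_db(8)
snr_16 = quantization_snr_db(16)
print(f"8-bit:  {snr_8:.1f} dB")
print(f"16-bit: {snr_16:.1f} dB")
print(f"difference: {snr_16 - snr_8:.1f} dB")
```

The results should land near the textbook estimate of about 6.02 dB per bit plus 1.76 dB, so dropping from 16 to 8 bits raises the noise floor by roughly 48 dB before the encoder ever sees the signal.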


I really like this idea, I was personally playing around with the idea of a bootable magazine that you could read in an emulator. Another idea was to release an entire album of music on a floppy disk. Both ideas I'm part way through, but have been incredibly busy.

I prefer the sound of the MP3 in the examples, although both are really dull. I think using some modern codec like WaveNet or others, whilst sounding better, kind of defeats the point. I think MP3 is really as new a codec as I would want to go, especially as pretty much any device that can play audio that is actively used today can read an MP3 file. (I'm somebody who still uses an MP3 player.)

That said, I think these floppycasts should simply be shorter. The easier they are to create, the more likely it is that the idea won't die in the crib. I think 15 or 20 minutes is really not too bad for a start (especially for a solo act); specials could come in multiple parts and still be in keeping with the theme.


Hmm, listening to the samples [0], the Opus one is almost unbearably noisy, whereas the MP3, while clearly low quality, is still listenable. Am I missing something? Why is Opus doing so poorly?

[0] http://ajroach42.com/floppycast-examples/


As sad as this sounds, I think based upon the slurring of "s" sounds, that I recognize those source HHGTTG encodings, having listened to them on loop for hours at a time whilst coding. I think they're the 56kb/s MP3 ones that were floating around a couple of decades ago. There are some 128kb/s ones that might be a bit fairer.


It's interesting. The OPUS one is noisy, but pretty crisp. MP3 is smooth but more muffled. (like it lacks dynamic range) Honestly, I'd rather listen to the OPUS version if these were the only choices.


That's odd. My experience was the opposite. Although the Opus one was a bit muffled and some of the low end was missing so I had to strain a little to hear, the MP3 was a cacophony of pops and whistles all the way through. It was annoying and hard to concentrate on. I'd take the Opus one every time.

I will say the music part was interesting. I could mostly make out the song on the MP3. On Opus, I could barely tell there was music there. A lot of it was just missing.


He used the opusenc setting "tuned for speech", and the music sections (like at :22) sound awful. I would like to know what it would sound like without that mode turned on, as it also surprises me how bad Opus sounds vs MP3.


The opus file is at 6kbps, the mp3 at 8.

Opus file is tuned for speech, mp3 isn't.

The opus file sounds unbearable when the music is playing, but is clearer otherwise.

Double the bitrate and opus is way ahead.


LPC (think Speak & Spell, PCjr speech adapter, etc.) and its variants are really the only way to achieve this. A 1.44MB floppy has 1,457,664 bytes available after formatting. LPC at 2400 bits per second means you can store about 80 minutes of speech on a 1.44MB floppy.

A big flaw in the OP's testing is that he converted the audio to 8-bit PCM before feeding it to the encoders. That added 48dB of high-frequency noise and distortion for absolutely no reason. He might have found his sibilants ("s" sounds) sounding better had he not done that.
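The numbers above can be reproduced with a short sketch. It assumes the standard DOS FAT12 layout for a 1.44MB disk (1 boot sector, two 9-sector FATs, a 224-entry root directory), and `minutes_at` is a hypothetical helper that ignores any file or container overhead.

```python
SECTOR_BYTES = 512
TOTAL_SECTORS = 80 * 2 * 18                  # tracks * sides * sectors/track
BOOT_SECTORS = 1
FAT_SECTORS = 2 * 9                          # two FAT copies, 9 sectors each
ROOT_DIR_SECTORS = 224 * 32 // SECTOR_BYTES  # 224 entries of 32 bytes each

# Sectors left for file data once the filesystem structures are subtracted.
data_sectors = TOTAL_SECTORS - BOOT_SECTORS - FAT_SECTORS - ROOT_DIR_SECTORS
data_bytes = data_sectors * SECTOR_BYTES


def minutes_at(bitrate_bps: float, capacity_bytes: int = data_bytes) -> float:
    """Minutes of audio at a constant bitrate that fit in capacity_bytes."""
    return capacity_bytes * 8 / bitrate_bps / 60


print(data_bytes)                   # 1457664
print(f"{minutes_at(2400):.1f}")    # 81.0
print(f"{minutes_at(6400):.1f}")    # 30.4
```

So 2400 bps LPC gets roughly 80 minutes onto the formatted disk, while a 6.4kbps encode has only about half a minute of headroom over 30 minutes.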


That feeling of nostalgia when reading about 486 DX and Sound Blaster Pro.


when I read the title I somehow assumed that the optimization would be in the length of the recording, not the encoding. I would like a podcast format with very short episodes.


Or just good sections, not just stream-of-consciousness rambling. That would also stick perfectly to the metaphor: swapping floppy disks should be familiar to everyone who ever held one in their hands.

As long as the podcast doesn't approach MS Office 97 territory (50+ disks).


Me too! I was surprised to see the target length is 30 minutes. I think there's plenty of space for even much more compact podcast episodes, under 10 minutes in length. You can say an awful lot in 10 minutes.


Yeah, since a lot of people speed up podcasts anyway, why not speed it up prior to compression (or learn to talk faster)? (Of course, speeding things up 3x is a bit extreme!)


Hmm why not speak 10 times faster, then slow down by 10 times after decoding.

I haven't decided whether this is a serious proposition yet....


There are so many things you could do to make things run well on very small devices that would cost almost nothing today, if we decided to adopt interface and software standards to run on durable hardware.

Meanwhile phones are always getting faster and faster and always becoming obsolete every year. Throwing software away means we are forced to throw hardware away.

I wish that someday we could just decide to stick to a single system and not change things, so that system can last 10 years and work on the same durable phone. It would require making hard choices in OS and software design, but it would be really worth it.

I hope a day will come when computers or smartphones will be able to last 10 or maybe 20 years and still be affordable. To be honest, I don't think I want to be involved in learning how to develop apps on any phone for those reasons. At least Microsoft and Linux are able to maintain a minimal amount of backward compatibility. For developers that's really important. You don't have a good ecosystem or a good phone if you cannot attract developers to build apps.



Codec 2 is on my list to investigate in the near future. Couldn't find software on the target machines that supported it, so it landed on second tier.


I'm looking forward to hearing the actual releases. I still use floppies when I can, and I still have a bunch of old PCs hanging around. It'll be good to have a valid reason to boot them up.


Excellent, I did similar experiments in the past. RealAudio (ACELP) with Helix is also an efficient option, but Opus is the best. For more quality you can reformat your floppy with 2MF 3.0, which gives you 2 megabytes on a floppy instead of 1.44. I used it to fit the Bible and a stripped-down bootable DOS + Win 3.1 on one disk: unpack to a RAM drive and start the app. People next to me were still stuck with the university-provided WP 5.1 for DOS while I could do WYSIWYG editing with TrueType fonts!


Reminds me of https://floppyswop.co.uk: "a place for sharing any files small enough to fit on a conventional floppy disc (1.44meg high density), art, media, sound, noise, its up to you, all files are hosted here for taking and swopping..".


Regarding resampling to 8kHz: traditional telephony band-limited speech to 300 Hz to 3400 Hz, and AM (medium wave) radio went up to about 5000 Hz. https://en.wikipedia.org/wiki/Voice_frequency
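A tiny sketch of why 8kHz lines up with the telephone band but not with AM: a sampled signal can only represent frequencies up to half its sample rate (the Nyquist limit). The function names are illustrative, and real resamplers also need some headroom below that limit for their anti-aliasing filter.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def band_fits(band_top_hz: float, sample_rate_hz: float) -> bool:
    """True if the top of the band is at or below the Nyquist limit."""
    return band_top_hz <= nyquist_hz(sample_rate_hz)


print(nyquist_hz(8000))       # 4000.0
print(band_fits(3400, 8000))  # True: telephone band survives
print(band_fits(5000, 8000))  # False: AM-radio bandwidth gets cut off
```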


Hey ajroach42, can you describe the AAC profile you used, and which encoder? I'm interested to see how small one could get an HE-AAC v2 encode.

Would you mind linking to the uncompressed source as well?


opus sounds amazingly good down to pretty low bitrates in general though, but this obviously pushes beyond those limits


kinda was hoping for some fun pics of floppy disks..... ;)

And also this all reminds me of the old demoscene/music tracker days when groups would release "music disks"


Note that you can only fit 1.38MB on a standard floppy.



