Cool, looks like this is trained on roughly 15 million hours of audio (500B tokens at ~0.11 seconds per token).
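The conversion from token count to hours is just multiplication; a quick sanity check of that estimate (using the 500B-token and ~0.11 s/token figures from above):

```python
# Back-of-the-envelope check: 500 billion tokens at ~0.11 seconds
# of audio per token, converted to total hours.
tokens = 500e9
seconds_per_token = 0.11  # approximate audio duration per token

total_seconds = tokens * seconds_per_token
total_hours = total_seconds / 3600

print(f"{total_hours / 1e6:.1f} million hours")  # → 15.3 million hours
```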

Even the large open-source TTS models (see F5-TTS, MaskGCT) are mostly trained on very small audio datasets (on the order of 100k hours) relative to the amount of audio available on the internet, so it's cool to see an open-source effort to scale up training significantly.
