Looking to Listen: Audio-Visual Speech Separation (googleblog.com)
86 points by chriskanan on April 13, 2018 | 7 comments



This would be a huge improvement for hearing aids, particularly for people who can't hear in stereo. It might need an eye tracker for aiming.


Exactly! The cocktail party effect requires two functional ears, and I'm sure this could be helpful to people with only one functional ear.

> The cocktail party effect works best as a binaural effect, which requires hearing with both ears. People with only one functioning ear seem much more distracted by interfering noise than people with two typical ears. [0]

[0]: https://en.wikipedia.org/wiki/Cocktail_party_effect


This method brings improvements (better quality than audio-only separation, explicit speaker assignment, and better noise handling), but you can do pretty well with mixed audio alone: https://www.youtube.com/watch?v=vW51cG1Ox98
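For context, both the audio-only and audio-visual approaches boil down to the same core mechanism: estimate a time-frequency mask and apply it to the mixture's spectrogram. Here's a toy, hypothetical sketch of that idea (not Google's model) using two pure tones as "speakers" and an oracle mask computed from the clean references, purely for illustration; a real system would predict the mask from the mixture (plus, in the paper, video of the speaker's face):

```python
import numpy as np

sr, n_fft, hop = 8000, 256, 128
t = np.arange(sr) / sr
s1 = np.sin(2 * np.pi * 440 * t)    # "speaker" 1: 440 Hz tone
s2 = np.sin(2 * np.pi * 1200 * t)   # "speaker" 2: 1200 Hz tone
mix = s1 + s2

win = np.hanning(n_fft)

def stft(x):
    # Framed, windowed FFT (no padding; edge frames are ignored later).
    frames = np.stack([x[i:i + n_fft] * win
                       for i in range(0, len(x) - n_fft, hop)])
    return np.fft.rfft(frames, axis=1)

# Oracle binary mask: keep time-frequency bins where speaker 1 dominates.
# A learned separator would predict this mask instead of being given it.
mask = (np.abs(stft(s1)) > np.abs(stft(s2))).astype(float)

# Apply the mask to the mixture spectrogram and invert.
est_frames = np.fft.irfft(stft(mix) * mask, axis=1)

# Overlap-add the frames back into a waveform estimate of speaker 1.
out = np.zeros(len(mix))
for k, frame in enumerate(est_frames):
    out[k * hop:k * hop + n_fft] += frame
```

Away from the signal edges, `out` closely tracks `s1` and is nearly uncorrelated with `s2`, which is the whole trick: the mask carries all the separation, and the audio-visual work is about predicting better masks per speaker.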


I wonder if we will have visually assisted speech recognition. Humans do it: https://youtu.be/G-lN8vWm3m0?t=74

The McGurk illusion is so strong that I'm sure visual cues play a major role in error correction and voice recognition for humans.


With that particular example, I noticed the issue immediately and heard "bah" the whole time. I wonder how much people's responses to that illusion vary.


I wonder how a blind person would respond to the cocktail party effect. If a blind person can do it, maybe this separation can be done without visual input?


This is pretty cool. It will be interesting to see it used on older audio sources to clean them up.





