Is it just me or does 18% seem like a high error rate - and this is after improv...

romanows · on Aug 29, 2011

The difference in error rates is in large part due to the difference between dictated speech and spontaneous, informal conversational speech.

Switchboard (http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=...) is a set of telephone conversations between two people. Speakers tend to say a lot of "ums", abruptly restart an utterance in progress, talk past the telephone handset, etc. Dictated speech, especially when speakers know they're talking to a computer, has less acoustic and linguistic noise.

danmaz74 · on Aug 29, 2011

It will be very interesting to see how this approach will work with dictated speech.

Also, let's not forget that the word-level error rate can be reduced by using statistical data about word sequences.
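To make that concrete, here is a minimal sketch of the idea, assuming the recognizer hands back a few acoustically similar candidate transcriptions and a word-sequence model picks among them. The toy corpus, the add-one smoothed bigram model, and the helper names (`train_bigram`, `log_prob`, `rescore`) are all illustrative assumptions, not how the system in the article works.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams over a list of sentences (toy corpus)."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a sentence."""
    vocab = len(unigrams)
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        # Unseen bigrams get a small nonzero probability via add-one smoothing.
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab))
    return total

def rescore(hypotheses, unigrams, bigrams):
    """Pick the candidate whose word sequence the model finds most likely."""
    return max(hypotheses, key=lambda h: log_prob(h, unigrams, bigrams))

# Hypothetical training text and hypothetical recognizer output.
corpus = [
    "it is hard to recognize speech",
    "we recognize speech with a computer",
    "speech is hard to recognize",
]
unigrams, bigrams = train_bigram(corpus)
hypotheses = ["recognize speech", "wreck a nice beach"]
best = rescore(hypotheses, unigrams, bigrams)
print(best)
```

A real system would combine this score with the acoustic score and use a much larger model; note also that a raw sentence log-probability like this one favors shorter hypotheses.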

braindead_in · on Aug 29, 2011

Nuance is speaker-dependent. You have to train the system to understand the speaker's voice. This is speaker-independent, which is much harder.

mikeash · on Aug 29, 2011

Nuance has speaker-independent systems as well. If you have an iPhone, try an app like Dragon Dictation or Siri. Neither one requires any training, and both start off with high-quality recognition right out of the box.

krmmalik · on Aug 29, 2011

I was going to ask the same thing. It doesn't seem like that significant an improvement, but I'm no expert. Also, I do wonder if the performance improvements are mostly due to GPU acceleration as opposed to a switch to a different software model.

huffo · on Aug 29, 2011

Benchmarks have always fascinated me :)

scq · on Aug 29, 2011

There are three types of lies: lies, damned lies, and benchmarks.