Hacker News

Is it just me or does 18% seem like a high error rate - and this is after improvement?

I've used technologies (Nuance??) that have significantly lower error rates than this, even for systems I have not trained personally. Is there something I'm missing?



The difference in error rates is in large part due to the difference between dictated speech and spontaneous, informal conversational speech.

Switchboard (http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=...) is a set of telephone conversations between two people. Speakers tend to say a lot of "ums", abruptly restart an utterance in progress, talk past the telephone handset, etc. Dictated speech, especially when speakers know they're talking to a computer, has less acoustic and linguistic noise.
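To make the "error rate" concrete: assuming the 18% figure is a word error rate (WER), which is the usual metric for Switchboard results, it counts every substituted, deleted, or inserted word against the reference transcript. Here's a minimal Python sketch using word-level edit distance; the function name `word_error_rate` and the example sentences are made up for illustration, not taken from any particular scoring tool.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Hypothetical helper for illustration; real scoring tools also normalize
    text (case, punctuation, filler words) before aligning.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # d[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words (Levenshtein distance over words).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # A spontaneous-speech style repetition ("i i") that the recognizer drops.
    print(word_error_rate("so i i think that we should go",
                          "so i think that we should go"))
```

In that example a single dropped repetition out of eight reference words already costs 1/8 = 0.125, which gives a feel for how quickly restarts and repeated words in conversational speech push the number up.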


It will be very interesting to see how this approach performs on dictated speech.

Also, let's not forget that the word-level error rate can be reduced by using statistical data about word sequences.
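One common way to use statistics about word sequences is to rescore the recognizer's competing hypotheses with an n-gram language model. Below is a minimal bigram sketch with add-one smoothing; the training sentences, candidate transcripts, acoustic scores, and the `lm_weight` parameter are all invented for illustration. Real systems train on far larger corpora and use better smoothing.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams, padding each sentence with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Add-one (Laplace) smoothed bigram log-probability of a sentence."""
    # Possible next tokens: every seen type except the start marker <s>.
    vocab_size = len(unigrams) - 1
    words = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        # Counter returns 0 for unseen keys, so unseen bigrams get count 1.
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def rescore(candidates, unigrams, bigrams, lm_weight=1.0):
    """Pick the (text, acoustic_log_score) candidate with the best combined score."""
    return max(
        candidates,
        key=lambda c: c[1] + lm_weight * log_prob(c[0], unigrams, bigrams),
    )[0]


if __name__ == "__main__":
    corpus = [
        "recognize speech with a computer",
        "it is hard to recognize speech",
        "we recognize speech every day",
    ]
    uni, bi = train_bigram(corpus)
    # Hypothetical acoustic scores: the acoustics alone slightly prefer the wrong words.
    candidates = [("recognize speech", -5.0), ("wreck a nice beach", -4.8)]
    print(rescore(candidates, uni, bi, lm_weight=0.0))
    print(rescore(candidates, uni, bi))
```

With `lm_weight=0.0` the acoustically preferred "wreck a nice beach" wins; once the bigram scores are added, the word sequence seen in the training text, "recognize speech", comes out on top.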


Nuance is speaker-dependent. You have to train the system to understand the speaker's voice. This system is speaker-independent, which is much harder.


Nuance has speaker-independent systems as well. If you have an iPhone, try an app like Dragon Dictation or Siri. Neither requires any training, and both deliver high-quality recognition right out of the box.


I was going to ask the same thing. It doesn't seem like that significant an improvement, but I'm no expert. I also wonder whether the performance improvements are mostly due to GPU acceleration rather than a switch to a different software model.


Benchmarks have always fascinated me :)


There are three types of lies: lies, damned lies, and benchmarks.



