Is it just me or does 18% seem like a high error rate - and this is after improvement?
I've used technologies (Nuance??) that have significantly lower error rates than this, even for systems I have not trained personally. Is there something I'm missing?
The difference in error rates is in large part due to the difference between dictated speech and spontaneous, informal conversational speech.
Switchboard (http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=...) is a set of telephone conversations between two people. Speakers tend to say a lot of "ums", abruptly restart an utterance in progress, talk past the telephone handset, etc. Dictated speech, especially when speakers know they're talking to a computer, has less acoustic and linguistic noise.
Nuance has speaker-independent systems as well. If you have an iPhone, try an app like Dragon Dictation or Siri. Neither requires any training, and both start off with high-quality recognition right out of the box.
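To make the error-rate point concrete, here is a minimal sketch. It assumes the 18% figure is a word error rate (word-level edit distance divided by the number of reference words), which nobody in this thread has stated explicitly. The function name and the sample sentences are made up for illustration and are not taken from Switchboard.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical conversational utterance, transcribed verbatim with fillers
# and a restart, versus a recognizer output that skipped them.
conversational_ref = "um i i think we should uh go"
recognizer_output = "i think we should go"

print(word_error_rate(conversational_ref, recognizer_output))
```

Under this assumed metric, every filler or restart in the reference transcript that the recognizer misses counts as an error, so the same recognizer can score noticeably worse on conversational speech than on clean dictation.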
I was going to ask the same thing. It doesn't seem like that significant an improvement, but I'm no expert. I also wonder whether the performance improvements are mostly due to GPU acceleration rather than a switch to a different software model.