
>> Great points regarding aligning precision/recall of AI systems with actual human supervision capabilities.

May I nitpick? The article discusses False Positive Rate and False Negative Rate. These are, respectively, the complements of True Negative Rate and True Positive Rate a.k.a. Sensitivity, a.k.a. Recall.

Precision is not the complement of the False Positive Rate; Specificity is, as in Sensitivity/Specificity. Precision (Positive Predictive Value) is a different quantity altogether: TP/(TP+FP).

These metrics tend to be reported in pairs: TPR/TNR, FPR/FNR, Precision/Recall, Sensitivity/Specificity, which confuses the issue, but not all of these pairs are equivalent.
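A minimal sketch of the distinction, using hypothetical confusion-matrix counts (the numbers are made up for illustration). Note that TPR/FNR and TNR/FPR are complements, while Precision is computed along the other axis of the matrix:

```python
# Hypothetical counts for a binary classifier's confusion matrix.
tp, fn = 80, 20   # signal present: hits and misses
fp, tn = 10, 90   # signal absent: false alarms and correct rejections

tpr = tp / (tp + fn)        # True Positive Rate = Sensitivity = Recall
tnr = tn / (tn + fp)        # True Negative Rate = Specificity
fpr = fp / (fp + tn)        # False Positive Rate = 1 - Specificity
fnr = fn / (fn + tp)        # False Negative Rate = 1 - Sensitivity
precision = tp / (tp + fp)  # Positive Predictive Value: conditions on the
                            # *prediction*, not on the true class

print(f"TPR={tpr:.2f} TNR={tnr:.2f} FPR={fpr:.2f} FNR={fnr:.2f} "
      f"precision={precision:.3f}")
```

With these counts Recall is 0.80 while Precision is about 0.889, and unlike FPR/FNR, Precision changes with class prevalence even when the classifier itself does not.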



To add to that, since I see it nowhere mentioned explicitly, neither in this thread nor in the article: the theoretical framework here is Signal Detection Theory (SDT) [1]. The problem at hand can be nicely visualized with two overlapping bell curves for the signal present/absent situations [2].

[1] https://en.wikipedia.org/wiki/Detection_theory

[2] https://jdlee888.shinyapps.io/SDT_Demo/
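The two-overlapping-bell-curves picture can be written down directly. A minimal sketch of equal-variance Gaussian SDT, with an assumed separation d' and decision criterion (both values are illustrative, not taken from the article):

```python
import math

def phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Signal-absent scores ~ N(0, 1); signal-present scores ~ N(d_prime, 1).
d_prime = 1.5     # separation between the two bell curves (assumed)
criterion = 0.75  # respond "signal present" for scores above this (assumed)

hit_rate = 1 - phi(criterion - d_prime)  # P(score > c | signal present)
false_alarm_rate = 1 - phi(criterion)    # P(score > c | signal absent)

print(f"hit rate = {hit_rate:.3f}, false alarm rate = {false_alarm_rate:.3f}")
```

Sliding the criterion trades hits against false alarms along the ROC curve, which is exactly the FPR/FNR trade-off the article is describing.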


Two points. First, these frameworks don't really account for driving into a concrete bollard vs. a shaky ride. Second, none of these frameworks estimates production performance well; time after time after time I see classifiers that did "95%" in test and "88%" in prod, and "that's pretty good!".

My point is that we are really bad at estimating performance of classifiers. Really bad, and mostly we are pretending it's fine.
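One common mechanism behind that gap is distribution shift: the threshold is tuned on test-time score distributions that production data no longer follows. A toy simulation under assumed Gaussian score distributions (the means and the "95% / 88%" flavor are illustrative, not real measurements):

```python
import random

random.seed(0)

def draw(n, neg_mu, pos_mu, sigma=1.0):
    """Balanced sample of (score, label) pairs with Gaussian scores."""
    neg = [(random.gauss(neg_mu, sigma), False) for _ in range(n // 2)]
    pos = [(random.gauss(pos_mu, sigma), True) for _ in range(n // 2)]
    return neg + pos

def accuracy(threshold, samples):
    return sum((x > threshold) == label for x, label in samples) / len(samples)

threshold = 1.65  # tuned as the midpoint of the test-time class means

test = draw(10_000, 0.0, 3.3)  # the distribution we evaluated on
prod = draw(10_000, 0.6, 3.1)  # class means drift "in production"

print(f"test accuracy: {accuracy(threshold, test):.3f}")
print(f"prod accuracy: {accuracy(threshold, prod):.3f}")
```

The classifier itself never changed; only the data did, which is why a single held-out test number is a poor estimate of what the classifier will do in production.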



