
I think point 7) needs work. People often use words like "interpretable" to avoid having to think about the data, usually in the context of linear or logistic regression. The model seems "interpretable" because the coefficients are "meaningful", but often it is just as much a black box as any other model: regression coefficients depend on which other features you include and on the scale of those features. Similarly, regression p-values are very easy to misinterpret. I think you should use the data to determine what the model is doing, regardless of which model you use.
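To make the coefficient point concrete, here is a rough sketch on synthetic data (the variable names, sample size, and the 0.8 correlation are all made up for illustration, not taken from the article). It fits the same outcome three ways and shows the coefficient on x1 changing each time, even though the underlying relationship never does.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Population covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def slope_alone(x, y):
    """Coefficient on x when y is regressed on x only (with intercept)."""
    return cov(x, y) / cov(x, x)

def slope_with_control(x, z, y):
    """Coefficient on x when y is regressed on x and z (2x2 normal equations)."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    return (szz * sxy - sxz * szy) / (sxx * szz - sxz ** 2)

# Synthetic data (assumed for illustration): true model is y = 1*x1 + 2*x2 + noise,
# and x2 is correlated with x1.
random.seed(0)
n = 5000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.8 * a + 0.6 * random.gauss(0, 1) for a in x1]
y = [a + 2 * b + random.gauss(0, 0.1) for a, b in zip(x1, x2)]

# 1) Leave x2 out: x1's coefficient absorbs x2's effect (roughly 1 + 2*0.8 = 2.6).
coef_alone = slope_alone(x1, y)

# 2) Include x2: x1's coefficient drops back to roughly 1.
coef_with_x2 = slope_with_control(x1, x2, y)

# 3) Same model, but x1 expressed in units 100x smaller: coefficient shrinks 100x.
x1_rescaled = [100 * a for a in x1]
coef_rescaled = slope_with_control(x1_rescaled, x2, y)

print(coef_alone, coef_with_x2, coef_rescaled)
```

None of the three numbers is wrong, but reading any one of them as "the effect of x1" without knowing what else is in the model, and in what units, would be misleading.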

In summary, it is important to iterate quickly and to validate your results. With complex models like gradient boosted decision trees, you can often iterate much more quickly than with simple models, because you don't have to do extensive data preparation. Many analysts are stuck in the mode of using linear or logistic regression for every problem when there are better tools out there.


