
You don't think it's reasonable to think the training set comes from the real world? Why not? Seems like a reasonable assumption to me.

The alternative is that they're carefully curating a training set (Google has historically been unwilling to do anything manually) or writing one themselves???




They said the training set distribution.

It's frankly impossible that they have a training set without any biases, though I'm sure they worked to eliminate the ones they could think of (and choosing which ones to eliminate would itself introduce bias).


You can make the distribution different by, e.g., duplicating some of the data a lot, which you might want to do to improve the model at its actual purpose (translation). Any other purpose (having opinions on gender) is just a coincidence: it isn't being optimized for or regression tested.
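
To make the duplication point concrete, here's a rough sketch of how oversampling one slice of a corpus shifts the distribution the model actually trains on. Everything in it is made up for illustration: the domain labels, the counts, and the helper names `distribution` and `oversample` are hypothetical, not anything Google is known to do.

```python
from collections import Counter


def distribution(samples):
    """Return the fraction of the dataset taken up by each label."""
    counts = Counter(samples)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def oversample(samples, target, factor):
    """Return a copy in which every `target` item appears `factor` times."""
    out = []
    for sample in samples:
        out.extend([sample] * (factor if sample == target else 1))
    return out


# Hypothetical "real world" corpus: 10 documents tagged by domain.
raw = ["news"] * 6 + ["medical"] * 2 + ["legal"] * 2

before = distribution(raw)
# Duplicate the medical documents 4x, e.g. to improve medical translation.
after = distribution(oversample(raw, "medical", 4))

print(before)  # medical is 20% of the raw data
print(after)   # medical is 50% of the training data
```

The raw data didn't change at all, but the training distribution did, and anything correlated with the duplicated slice gets amplified along with it whether or not anyone intended that.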



