It's frankly impossible for them to have a training set without any biases, though I'm sure they worked to eliminate the ones they could think of (a process that itself would be biased).
You can make the distribution different by, e.g., duplicating some of the data a lot, which you might want to do to improve the model at its actual purpose (translation). Any other behavior (such as having opinions on gender) is just a coincidence: it isn't being optimized for or regression-tested.
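A minimal sketch of what "duplicating some of the data" does to the training distribution (the corpus, pairs, and weights here are entirely made up for illustration; Turkish's pronoun "o" is gender-neutral, which is why the same source maps to multiple English targets):

```python
import random
from collections import Counter

# Hypothetical toy corpus of (source, target) translation pairs.
# "o bir doktor" is gender-neutral Turkish for "he/she is a doctor".
corpus = [
    ("o bir doktor", "he is a doctor"),
    ("o bir doktor", "she is a doctor"),
    ("o bir hemsire", "she is a nurse"),
]

# Oversample: duplicate selected pairs to shift the empirical distribution
# the model trains on. A real pipeline would pick weights to improve
# translation quality; any effect on gender associations is incidental.
weights = {("o bir doktor", "he is a doctor"): 3}
resampled = [pair for pair in corpus for _ in range(weights.get(pair, 1))]

random.shuffle(resampled)
print(Counter(target for _, target in resampled))
# The model now sees "he is a doctor" three times as often, so whatever
# it learns about ambiguous pronouns shifts with that frequency change.
```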
The alternative is that they're carefully curating a training set (Google has historically been unwilling to do anything manually) or writing one themselves???