What would you consider the "training" step here? The part where the author performs backtesting to settle on an adequate value for the threshold? In that case I don't really see a problem, since the author seems to be looking at historical data to come up with a value for a parameter that will then be used to build the model that acts on future data.
I agree with your comment in general that using test data in your training set is a clear example of an error that will lead to overfitting and a model that generalizes poorly, but am having a hard time seeing where the author commits this mistake.
The way he describes the problem in the introduction alludes to this. I.e. "here's some data, 12 is a clear outlier, let's see if we can confirm this using the mean and standard deviation of the sample to derive lower and upper bounds".
Then he proceeds to include 12 in the very calculation that derives these bounds. That is not really the way to do it. In fact, if he had excluded the anomalous measurement from the training data, the '5' values would have been flagged as outliers as well, given his criteria for defining the bounds.
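To make the effect concrete, here is a minimal sketch of the mean ± k·σ scheme being discussed. The data, the multiplier k = 2, and the use of the population standard deviation are all my own assumptions for illustration, not the article's actual numbers:

```python
import statistics

def bounds(sample, k=2.0):
    """Lower/upper bounds at mean +/- k standard deviations.

    k = 2 and the population std (pstdev) are illustrative choices,
    not necessarily what the article's author used.
    """
    mu = statistics.mean(sample)
    sigma = statistics.pstdev(sample)
    return mu - k * sigma, mu + k * sigma

def outliers(data, fit_sample, k=2.0):
    """Flag points in `data` outside bounds fitted on `fit_sample`."""
    lo, hi = bounds(fit_sample, k)
    return [x for x in data if x < lo or x > hi]

# Hypothetical data: a tight cluster at 2, two mild '5' values,
# and an obvious outlier at 12.
data = [2] * 10 + [5, 5, 12]

# Fitting the bounds on all data (outlier included) inflates sigma,
# so only the extreme point is flagged:
print(outliers(data, data))  # [12]

# Holding 12 out of the fit tightens the bounds, and the 5s
# now fall outside as well:
clean = [x for x in data if x != 12]
print(outliers(data, clean))  # [5, 5, 12]
```

The point being that the fitted bounds themselves change depending on whether the known-anomalous point is allowed to contribute to the mean and standard deviation.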
I agree that this is a trivial point made on a trivial example, and that it is more a matter of 'sensible definitions' of what counts as anomalous, or as the training set, in the first place. But it's still worth thinking about explicitly, so I thought I'd mention it.