A related really cool thing I learned in grad school is that you can implement Deming regression (https://en.wikipedia.org/wiki/Deming_regression) by storing the moments (outer product sum and point sum) of the training points and then finding the dominant eigenvector of the outer product sum (centered using the point sum), for example with a singular value decomposition. Since the directional part of a symmetric 2x2 decomposition has a closed form using atan2(), adding or removing a training point effectively becomes an O(1) operation.
(specifics here: https://april.eecs.umich.edu/courses/eecs568_f12/linefitting... )
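Here is a rough Python sketch of the idea, not the code from the linked notes. It assumes the equal-error-variance case of Deming regression (i.e. orthogonal regression), and the class and method names (IncrementalLineFit, add, remove, fit) are made up for illustration. The running sums are the "moments"; the line direction comes from the closed-form angle 0.5 * atan2(2*Cxy, Cxx - Cyy) of the 2x2 covariance matrix.

```python
import math


class IncrementalLineFit:
    """Hypothetical helper: orthogonal line fit from running moments."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = 0.0                 # point sum
        self.sxx = self.sxy = self.syy = 0.0    # outer product sum

    def _update(self, x, y, sign):
        # O(1): only the running sums change.
        self.n += sign
        self.sx += sign * x
        self.sy += sign * y
        self.sxx += sign * x * x
        self.sxy += sign * x * y
        self.syy += sign * y * y

    def add(self, x, y):
        self._update(x, y, +1)

    def remove(self, x, y):
        self._update(x, y, -1)

    def fit(self):
        """Return ((mean_x, mean_y), theta), theta = line direction in radians."""
        if self.n < 2:
            raise ValueError("need at least two points")
        mx, my = self.sx / self.n, self.sy / self.n
        # Covariance entries recovered from the stored moments.
        cxx = self.sxx / self.n - mx * mx
        cxy = self.sxy / self.n - mx * my
        cyy = self.syy / self.n - my * my
        # Direction of the dominant eigenvector of [[cxx, cxy], [cxy, cyy]].
        theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
        return (mx, my), theta


fitter = IncrementalLineFit()
for x, y in [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]:   # points on y = 2x + 1
    fitter.add(x, y)
centroid, theta = fitter.fit()
slope = math.tan(theta)   # close to 2.0; undefined for a vertical line
```

The fitted line passes through the centroid with direction theta. One caveat: subtracting large sums like this can lose precision when the points are far from the origin, so a real implementation may want to shift the coordinates first.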