This is exactly what I mean by “a trick of language” though. The superficial “number” of parameters is not important.
Think of it this way: suppose someone read this article and came away thinking, “aha, Occam’s Razor is a sham, because this shows that ‘simpler’ models (just ‘one’ parameter!) are possibly even more susceptible to overfitting or generalization error. So I’m free to use whatever complex model I want!”
Then they go and fit a 427-degree polynomial to their data set with 427 observations, and when someone says, “but using that many parameters is overly complex, you should prefer a simpler model” they’ll reply, “No way! Read this paper! It says that even simple models with a small number of parameters can have this problem.”
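To make that concrete with the numbers shrunk to something readable (a toy sketch I’m adding here, not anything from the paper): with as many coefficients as observations, a polynomial can reproduce any data set exactly, so the “perfect fit” tells you nothing about how the model generalizes.

```python
# Toy version of the scenario above (numbers shrunk for readability): with as
# many coefficients as observations, a polynomial reproduces *any* data set
# exactly, so a perfect training fit says nothing about generalization.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([5.0, -3.0, 7.0, 0.0])        # arbitrary "observations"

coeffs = np.polyfit(x, y, deg=len(x) - 1)  # cubic: 4 coefficients for 4 points
print(np.polyval(coeffs, x))               # ≈ [ 5. -3.  7.  0.]: reproduces y up to rounding
```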
I know it’s a simplification, but it’s important. The notion of “a parameter” has to meaningfully capture an independent degree of freedom and act as a unit of constraint on complexity (in theoretical terms, the Kolmogorov complexity, or minimum description length, of a program that models the underlying data-generating process); otherwise it’s not a mathematical parameter, just a linguistic construct.
In the paper’s example, saying that the model “has one parameter” is misleading, because you need so much precision specifically to allow the “one parameter” to control many parameters’ worth of complexity.
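To illustrate what I mean (a toy sketch of my own, not the paper’s actual construction): you can pack several values into one real number just by interleaving their decimal digits, and read them back out later.

```python
# Toy illustration only: pack several values in [0, 1) into one real number
# by interleaving their decimal digits, then read them back out. This is not
# the paper's construction, just a sketch of the general trick.

def pack(values, digits=4):
    """Interleave the first `digits` decimal digits of each value in [0, 1)."""
    strs = [f"{v:.{digits}f}"[2:] for v in values]      # drop the leading "0."
    interleaved = "".join("".join(group) for group in zip(*strs))
    return float("0." + interleaved)

def unpack(theta, n, digits=4):
    """Invert pack(): de-interleave the digits back into n values."""
    s = f"{theta:.{n * digits}f}"[2:]
    return [float("0." + s[i::n]) for i in range(n)]

vals = [0.1234, 0.6543, 0.1111]
theta = pack(vals)            # the "one parameter": 0.161251341431
print(unpack(theta, 3))       # [0.1234, 0.6543, 0.1111]
```

The single scalar “has one parameter” only in name: every extra value you pack demands proportionally more precision in that one number, which is exactly where the hidden complexity lives.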
So if someone’s takeaway were “there’s not necessarily a reason to favor simpler models,” that would be a big misunderstanding — yet the paper seems almost intentionally set up to mislead in this way.
I absolutely do not disagree. My entire point is in your first paragraph.
> The superficial “number” of parameters is not important.
Whether the number of parameters is small or large does not change the complexity of the function. That said, if you’re going to pass 64 bools, you might as well pack them into a long long with bitwise OR to simplify the interface.
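For illustration (a hypothetical sketch in Python rather than C, nothing from the paper): the 64 booleans and the single packed word carry exactly the same information, so the “parameter count” you report is purely a matter of how you draw the boxes.

```python
# Hypothetical sketch of the point above, in Python for brevity: 64 booleans
# and one 64-bit integer carry exactly the same information; packing them with
# shifts and bitwise OR only changes the interface, not the complexity.

def pack_flags(flags):
    """OR each boolean into its own bit of a single integer."""
    word = 0
    for i, flag in enumerate(flags):
        if flag:
            word |= 1 << i
    return word

def unpack_flags(word, n=64):
    """Read the individual booleans back out of the packed word."""
    return [bool((word >> i) & 1) for i in range(n)]

flags = [i % 3 == 0 for i in range(64)]
word = pack_flags(flags)                 # one "parameter"
assert unpack_flags(word) == flags       # ...holding 64 degrees of freedom
```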