> there's something like a 1/10 risk that you're fooling yourself.
You’re being generous or a touch ironic. It’s at least 1/10, probably more like 1/5 on average, and 1/3 for people who don’t take advice.
Beyond testing changes in a larger test fixture, I also find that multiplying the call count for the code under examination can sometimes help clear things up. Putting in a loop to run the offending code 10 times instead of once gives a clearer signal. Of course, it may still turn out to be a false signal.
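To make that concrete, here is a minimal sketch of the "multiply the call count" trick in Python. The names `hot_path` and `bench`, and the repeat count of 10, are illustrative assumptions, not anything from the thread:

```python
import time

def hot_path(data):
    # Stand-in for the code under examination.
    return sorted(data)

def bench(fn, arg, repeat=10):
    # Run the code under test `repeat` times so its cost dominates
    # timer overhead and noise from the surrounding fixture.
    start = time.perf_counter()
    for _ in range(repeat):
        fn(arg)
    elapsed = time.perf_counter() - start
    return elapsed / repeat  # average per-call cost

if __name__ == "__main__":
    data = list(range(100_000, 0, -1))
    print(f"avg per call: {bench(hot_path, data):.6f}s")
```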
I like a two-phase approach, where you use a small-scale benchmark while iterating on optimization ideas, then check the larger context once you feel you’ve made progress, and again before you file a PR.
At the end of the day, eliminating accidental duplication of work is the most reliable form of improvement, and one that current and previous generation analysis tools don’t do well. Make your test cases deterministic and look at invocation counts to verify that n calls of a certain shape hit the code in question exactly kn times, as you expect. Then figure out why it’s mn instead. (This is why I say caching is the death of perf analysis: once it’s added, this signal disappears.)
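A minimal sketch of that invocation-count check, assuming Python and hypothetical names (`counted`, `expensive_lookup`, `handle_request`): wrap the code under examination, drive it with a deterministic test case of n calls, and assert it was hit exactly kn times.

```python
from collections import Counter
from functools import wraps

calls = Counter()

def counted(fn):
    # Tally every invocation of the wrapped function.
    @wraps(fn)
    def wrapper(*args, **kwargs):
        calls[fn.__name__] += 1
        return fn(*args, **kwargs)
    return wrapper

@counted
def expensive_lookup(key):
    ...  # the code whose call count you expect to be k per request

def handle_request(keys):
    # One request of a known shape; should trigger k lookups.
    for key in keys:
        expensive_lookup(key)

if __name__ == "__main__":
    n, k = 100, 1  # n deterministic requests, k expected lookups each
    for i in range(n):
        handle_request([f"key-{i}"])
    actual = calls["expensive_lookup"]
    assert actual == k * n, f"expected {k * n} calls, saw {actual}"
```

If the assertion fails with some mn instead of kn, the gap itself is the lead: something in the call graph is doing the work more often than the test shape warrants.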
The first half of the sentence you quoted is "even if you do everything right". What is the point of selectively quoting like that and then responding to something you know they didn't mean?
With respect, the first half of the sentence sounds more like waffling than a clear warning to your peers. I bond with other engineers over how badly the industry as a whole handles the scientific method. Too many of us can’t test a thesis to the satisfaction of others. Hunches, speculation, and confidently incorrect. Every goddamn day.
Feynman said: "The first principle is that you must not fool yourself, and you are the easiest person to fool."
That’s a lot more than 1/10, and he’s talking mostly to future scientists, not developers.