Holy crap. I was just writing a blog post to complain about the state of Rust benchmarking and I think this might address most of my points. The biggest one is being able to colocate benchmarks within the library like tests; that's been my biggest annoyance.
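For anyone who hasn't seen it, colocation looks roughly like this as far as I can tell from skimming divan's docs. This is a sketch, not something I've verified: the feature gate, the Cargo `[[bench]]`/`harness = false` wiring, and the automatic discovery of benches defined inside the library are my assumptions.

```rust
// src/lib.rs -- the benchmark lives next to the code it measures,
// much like a #[cfg(test)] unit-test module.
pub fn parse_csv_line(input: &str) -> Vec<u32> {
    input.split(',').filter_map(|s| s.trim().parse().ok()).collect()
}

#[cfg(feature = "bench")] // assumption: gate benches behind a feature
mod benches {
    #[divan::bench]
    fn parse_small() -> Vec<u32> {
        // divan black-boxes returned values so the work isn't optimized away
        super::parse_csv_line(divan::black_box("1, 2, 3, 4, 5"))
    }
}

// benches/main.rs -- a tiny runner; the [[bench]] target needs
// `harness = false` in Cargo.toml so divan's own harness runs:
//
// fn main() {
//     divan::main();
// }
```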
It’s also nice to see that it can report multiple counters in parallel. I put up a similar feature[1] for criterion recently but I fear the project isn’t being maintained anymore…
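For context, attaching counters in divan looks roughly like this, going by my reading of its docs; the constructor and method names here are from memory and may be slightly off.

```rust
use divan::counter::{BytesCount, ItemsCount};
use divan::Bencher;

fn main() {
    divan::main();
}

// Sketch: report both items/sec and bytes/sec for the same benchmark.
#[divan::bench]
fn sum_bytes(bencher: Bencher) {
    let data = vec![1u8; 64 * 1024];
    bencher
        .counter(ItemsCount::new(data.len()))
        .counter(BytesCount::new(data.len()))
        .bench(|| data.iter().map(|&b| b as u64).sum::<u64>());
}
```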
Haven't looked deeply into divan yet, but the other requirements I have from criterion are: running tests with statistical guarantees on the results, terminating quickly once statistical significance is reached (--quick), comparing the delta against a previous benchmark run, and running async code. Wonder how this stacks up.

[1] https://github.com/bheisler/criterion.rs/pull/722
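On the async point, for comparison, this is roughly what the criterion side looks like today. A sketch assuming criterion's `async_tokio` feature and a tokio runtime with the `rt` feature; `fetch_like_work` is just a stand-in workload.

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use tokio::runtime::Runtime;

// Stand-in for real async work; purely illustrative.
async fn fetch_like_work(n: u64) -> u64 {
    tokio::task::yield_now().await;
    (0..n).sum()
}

fn bench_async(c: &mut Criterion) {
    // Requires criterion's `async_tokio` feature in Cargo.toml.
    let rt = Runtime::new().unwrap();
    c.bench_function("fetch_like_work", |b| {
        b.to_async(&rt).iter(|| fetch_like_work(1_000));
    });
}

criterion_group!(benches, bench_async);
criterion_main!(benches);
```

The --quick flag and --save-baseline/--baseline cover the early-termination and delta-comparison parts on criterion's side.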
* Statistical power: the approach is principled (it follows a good paper on how to pick the number of iterations), but the variance reporting isn't implemented yet. If I'm reading the readme correctly, its execution mode is, by definition, equivalent to --quick.
* Baseline comparison isn't there yet.
My overall gripe remains that this stuff isn’t available natively within the Rust/cargo ecosystem. I should be able to swap out benchmarking frameworks without having to rewrite my entire codebase to try out a new one.
That's positive news, but I'll have to reserve my enthusiasm. I've been burned too many times by reasonable-looking RFCs that have some initial momentum and then get closed out after a few years with no progress being made (& how that's communicated out & tracked is another complaint).
So: 5 years with no progress & a restarted effort. Don't get me wrong, I appreciate all the people devoting energy to this & other issues, & it's certainly easy to criticize from the sidelines, especially since actually solving the problem is surely hard & I'm not putting my own cycles toward it. All I'm trying to communicate is that I'm personally not going to get my hopes up, but I really wish the team luck and hope progress can finally be made here!
Wow, this really looks fantastic from the examples, great work!
I'm looking forward to machine-readable output getting implemented. I was using criterion recently and couldn't for the life of me get CSV output to work correctly (and according to their docs it's a feature they're looking to remove). I ended up writing a Python script to scrape all the data out of the folders of JSON files it produces.
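The scraping boils down to something like the following sketch (written in Rust here rather than Python, for consistency with the other examples; it assumes the serde_json crate and criterion's `target/criterion/<name>/new/estimates.json` layout, which isn't a stable interface).

```rust
// Sketch: walk target/criterion/ and print each benchmark's mean estimate.
// The on-disk layout is an implementation detail of criterion, not an API.
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    for entry in fs::read_dir("target/criterion")? {
        let dir = entry?.path();
        let estimates = dir.join("new").join("estimates.json");
        if !estimates.is_file() {
            continue; // skip report/ and anything else without estimates
        }
        let json: serde_json::Value =
            serde_json::from_str(&fs::read_to_string(&estimates)?)?;
        if let Some(mean_ns) = json["mean"]["point_estimate"].as_f64() {
            println!("{}: {:.1} ns", dir.file_name().unwrap().to_string_lossy(), mean_ns);
        }
    }
    Ok(())
}
```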
Are you saying in terms of how long the benchmark takes to run? Have you tried `--quick`? The duration of the test itself doesn't matter so much for how long Criterion takes to benchmark it; what Criterion is trying to do is run the function enough times that it thinks it has a statistically defensible estimate of how expensive your code is.
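If `--quick` isn't enough, the sampling can also be tuned per group. A minimal sketch, using a made-up `fib` workload:

```rust
use std::time::Duration;
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Toy workload so the example is self-contained.
fn fib(n: u64) -> u64 {
    (0..n).fold((0u64, 1u64), |(a, b), _| (b, a + b)).0
}

fn bench_fib(c: &mut Criterion) {
    c.bench_function("fib_20", |b| b.iter(|| fib(black_box(20))));
}

criterion_group! {
    name = benches;
    // Fewer samples and a shorter measurement window: less statistical
    // confidence, but a much faster run.
    config = Criterion::default()
        .sample_size(20)
        .measurement_time(Duration::from_secs(2));
    targets = bench_fib
}
criterion_main!(benches);
```

Lowering `sample_size` and `measurement_time` trades statistical confidence for wall-clock time, which is a similar trade-off to the one `--quick` makes.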