I've been working on implementations of classic algorithms in distributed computing, and used Turmoil for testing correctness in message-passing / HTTP systems [0].
Overall, my experience has been positive. When it works, it's great. A pattern I've been following is to have a single fixture that returns a simulation of the system under a standard configuration, for example N replicas of an atomic register, so that each test looks like:
1. Modify the simulation with something like `turmoil::hold("client", "replica-1")`.
2. Submit a request to the server.
3. Make an assertion about either the response, or the state of the simulation once the request has been made. For example, if only some replicas are faulty, the request should succeed, but if too many replicas are faulty, the request / simulation should time out.
One of the things I have found difficult is that when a test fails, it can be hard to tell if my code is wrong, or if I am using Turmoil incorrectly. I've had to do some deep-dives into the source in order to fully understand what happens, as the behavior sometimes doesn't line up with my understanding of the documentation.
That's great to hear that you've been using turmoil for this type of work. I'm one of the authors and we'd love to hear about your experience and what we can do to improve things. Either a GitHub issue or reaching out on Discord works great.
We've discussed improving the tracing experience, and even adding visualizations, but it hasn't been prioritized yet.
[0] https://github.com/kaymanb/todc/tree/main/todc-net