
For servers, I think the single most important statistic to monitor is the percentage of concurrent capacity in use, that is, the percentage of your thread pool or task pool that's currently processing requests. If you could only monitor one metric, this is the one.

For example, say a synchronous server has 100 threads in its thread pool, or an asynchronous server has a task pool of size 100; then concurrent capacity is an instantaneous measurement of what percentage of those threads/tasks are in use. You can measure it when requests begin and/or end. If, when a request begins, 50 out of 100 threads/tasks are already in use, then the metric is 50/100 = 50% concurrent capacity utilization. It's a percentage measurement like CPU utilization, but better!
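As a rough illustration, here's a minimal Python sketch of that measurement (the handler and gauge names are hypothetical, not tied to any particular framework): a shared counter is incremented when a request starts, utilization is sampled, and the counter is decremented when the request finishes.

    import threading

    POOL_SIZE = 100      # size of the thread/task pool
    _in_use = 0          # workers currently handling a request
    _lock = threading.Lock()

    def handle_request(request, do_work, record_gauge):
        # Wrap the real work (do_work) and report utilization via record_gauge.
        global _in_use
        with _lock:
            _in_use += 1
            utilization = _in_use / POOL_SIZE   # e.g. 50 / 100 -> 0.5 = 50%
        record_gauge("concurrent_capacity_utilization", utilization)
        try:
            return do_work(request)
        finally:
            with _lock:
                _in_use -= 1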

I've found this is the most important metric to monitor and understand because (1) it's what you have the most direct control over, as far as tuning, and (2) its behavior encompasses most other performance statistics anyway (CPU, RAM, etc.).

For example, if your server is overloaded on CPU and can't process requests fast enough, requests will pile up, and your concurrent capacity usage will rise until it hits the cap of 100%. At that point, requests begin to queue and performance is impacted. The same is true for any other type of bottleneck: under load, it will show up as unusually high concurrent capacity usage.

Metrics that measure 'physical' (ish) properties of servers, like CPU and RAM usage, can be quite noisy, and they are not necessarily actionable; spikes in them don't always indicate a bottleneck. To the extent that you need to care about these metrics, they will be reflected in a rising concurrent capacity metric, so concurrent capacity is what I prefer to monitor primarily, relying on these secondary metrics to diagnose problems when concurrent capacity is higher than desired.

Concurrent capacity most directly reflects the "slack" available in your system (when properly tuned; see below). For that reason, it's a great metric to use for scaling, and particularly for automated dynamic auto-scaling. If your system approaches 100% concurrent capacity usage in a sustained way (on average, fleet-wide), that's a good sign you need to scale up. Metrics like CPU or RAM usage don't indicate nearly as directly whether you need to scale, but concurrent capacity does. And even if a particular resource (like disk usage) is the bottleneck, it will show up in concurrent capacity anyway.
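A scale-up check based on this can be very simple; here's a sketch (the threshold and window are made-up numbers you'd tune for your own fleet):

    def should_scale_up(utilization_samples, threshold=0.8, window=15):
        # utilization_samples: recent fleet-wide average utilization values (0.0-1.0),
        # e.g. one sample per minute. Scale up only if utilization has stayed above
        # the threshold for the whole window, not just a momentary spike.
        recent = utilization_samples[-window:]
        return len(recent) == window and all(u >= threshold for u in recent)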

Concurrent capacity is also the best metric to tune. You want to tune your maximum concurrent capacity so that your server can handle all requests normally when at 100% of concurrent capacity. That is, if you decide to have a thread pool or task pool of size 100, then it's important that your server can handle 100 concurrent tasks normally, without exhausting any other resource (such as CPU, RAM, or outbound connections to another service). This tuning also reinforces its value as a monitoring metric, because it means you can be reasonably confident that your machines will not exhaust their other resources first (before concurrent capacity), and so you can focus on monitoring concurrent capacity primarily.
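In a synchronous server, that might look like the following sketch (the numbers and names are illustrative): the pool size is the single knob that caps concurrency, and every other resource budget is sized against it.

    from concurrent.futures import ThreadPoolExecutor

    # Tuned so that 100 in-flight requests fit within the machine's CPU, RAM,
    # and outbound-connection budgets.
    MAX_CONCURRENCY = 100

    executor = ThreadPoolExecutor(max_workers=MAX_CONCURRENCY)

    def submit_request(request, do_work):
        # Work beyond MAX_CONCURRENCY queues here instead of oversubscribing the box.
        return executor.submit(do_work, request)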

Depending on your service's SLAs, you might decide to set the concurrent capacity conservatively or aggressively. If performance is really important, then you might tune it so that at 100% of concurrent capacity, the machine still has CPU and RAM in reserve as a buffer. Or if throughput and cost are more important than performance, you might set concurrent capacity so that when it's at 100%, the machine is right at its limits of what it can process.

And it's a great metric to tune because you can adjust it in a straightforward way. Maybe you're leaving CPU on the table with a pool size of 100, so bump it up to 120, etc. Part of the process for tuning your application for each hardware configuration is determining what concurrent capacity it can safely handle. This does require some form of load testing to figure out, though.
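One way to structure that load testing, sketched with a hypothetical run_load_test helper that drives the server at a given pool size and reports p99 latency:

    def find_safe_pool_size(run_load_test, candidate_sizes, latency_slo_ms):
        # Sweep pool sizes upward and keep the largest one that still meets the SLO.
        safe = None
        for size in candidate_sizes:
            p99 = run_load_test(size)
            if p99 <= latency_slo_ms:
                safe = size      # still within the latency budget
            else:
                break            # past the knee; stop increasing
        return safe

    # e.g. find_safe_pool_size(run_load_test, [50, 75, 100, 120, 150], latency_slo_ms=200)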



