"If we have a CUDA kernel that continuously runs for 10 seconds but only uses 1 ...

roanakb · on Aug 23, 2024

Yup, you'll see 100% utilization reported over a time period as long as a kernel is considered active during it, which includes having just a single thread executing [1]. SM occupancy is a useful metric too, but it can be a little difficult to interpret because, unlike SM efficiency, you're not simply trying to maximize it.

[1]: https://pytorch.org/blog/pytorch-profiler-1.9-released/#gpu-...
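To make the difference concrete, here is a rough sketch of how the two numbers diverge for a kernel that keeps the GPU "active" with almost no real work. It is a toy model, not how any profiler actually computes its metrics: the function names `gpu_utilization` and `sm_efficiency`, the ten sample periods, and the SM count of 132 are all assumptions made for illustration.

```python
# Toy model only: names, sample counts, and the SM count are assumptions
# for illustration, not a real profiler API.

def gpu_utilization(kernel_active_per_sample):
    """Fraction of sample periods in which at least one kernel was running."""
    busy = sum(1 for active in kernel_active_per_sample if active)
    return busy / len(kernel_active_per_sample)


def sm_efficiency(active_sms_per_sample, total_sms):
    """Average fraction of SMs that had work, across all sample periods."""
    return sum(active_sms_per_sample) / (len(active_sms_per_sample) * total_sms)


TOTAL_SMS = 132   # assumed SM count for the example
NUM_SAMPLES = 10  # hypothetical: one sample per period while the kernel runs

# A kernel runs the whole time, but a single thread only ever occupies one SM.
kernel_active = [True] * NUM_SAMPLES
active_sms = [1] * NUM_SAMPLES

print(f"utilization:   {gpu_utilization(kernel_active):.1%}")
print(f"SM efficiency: {sm_efficiency(active_sms, TOTAL_SMS):.1%}")
```

Under these assumptions the utilization figure comes out at 100% while the SM efficiency is under 1%, which is the gap described above.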

rurban · on Aug 23, 2024

That's why I look mostly at the H100 temperatures. They give a better utilization metric.