
"If we have a CUDA kernel that continuously runs for 10 seconds but only uses 1 SM, on an H100, this would register 100% utilization, but the SM efficiency would be 1 / 132 = 0.7%."

Does this situation really register 100% utilization? BTW, SM occupancy is also a metric you need to care about if you're concerned about kernel efficiency.



Yup, you'll see 100% utilization over a time period as long as a kernel is considered active during it, which includes just having a single thread executing [1]. SM occupancy is great but can be a little difficult to interpret, since you're not simply trying to maximize it, unlike SM efficiency.

[1]: https://pytorch.org/blog/pytorch-profiler-1.9-released/#gpu-...
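To make the difference concrete, here's a rough sketch of the arithmetic from the quoted example. It assumes a simplified model where each sample just records how many SMs were busy; the function names and the `samples` trace are made up for illustration and are not how NVML or the PyTorch profiler actually collect these numbers.

```python
H100_SM_COUNT = 132  # SM count used in the quoted example


def gpu_utilization(busy_sms_per_sample):
    """Fraction of samples in which at least one SM was busy (any kernel active)."""
    active = sum(1 for busy in busy_sms_per_sample if busy > 0)
    return active / len(busy_sms_per_sample)


def sm_efficiency(busy_sms_per_sample, sm_count=H100_SM_COUNT):
    """Average fraction of all SMs that were busy across the samples."""
    return sum(busy_sms_per_sample) / (len(busy_sms_per_sample) * sm_count)


# Hypothetical trace: a kernel keeps exactly 1 SM busy for 10 one-second samples.
samples = [1] * 10

print(f"utilization:   {gpu_utilization(samples):.1%}")
print(f"SM efficiency: {sm_efficiency(samples):.2%}")
```

Under this model the trace reports full utilization while SM efficiency stays at 1/132, about 0.76% (the quote rounds it to 0.7%).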


That's why I look mostly at the H100 temperatures. Gives a better utilization metric



