"If we have a CUDA kernel that continuously runs for 10 seconds but only uses 1 SM, on an H100, this would register 100% utilization, but the SM efficiency would be 1 / 132 = 0.7%."
Does this situation really register 100% utilization?
BTW, SM occupancy is also a metric you need to care about if you are concerned with kernel efficiency.
Yup, you'll see 100% utilization on a kernel over a time period if it's considered active, which includes just having a single thread executing [1]. SM occupancy is great but can be a little difficult to interpret since you're not simply trying to maximize it, unlike SM efficiency.
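To make the arithmetic in the quoted example concrete, here is a rough Python sketch of the two ratios. It is a simplified model rather than how any driver or profiler actually computes these numbers: the function names `gpu_utilization` and `sm_efficiency` are made up for illustration, and the 132 SM count is taken from the quote.

```python
# Simplified model of the two metrics discussed above.
# Function names are hypothetical; they do not correspond to a real API.

H100_SM_COUNT = 132  # SM count used in the quoted example


def gpu_utilization(kernel_active_seconds: float, window_seconds: float) -> float:
    """Fraction of the sampling window during which any kernel was running."""
    return kernel_active_seconds / window_seconds


def sm_efficiency(active_sms: int, total_sms: int) -> float:
    """Fraction of SMs that were busy while the kernel ran."""
    return active_sms / total_sms


# One kernel runs for the whole 10-second window but occupies a single SM.
util = gpu_utilization(kernel_active_seconds=10.0, window_seconds=10.0)
eff = sm_efficiency(active_sms=1, total_sms=H100_SM_COUNT)

print(f"utilization:   {util:.0%}")   # 100%
print(f"SM efficiency: {eff:.2%}")    # 0.76%
```

The utilization ratio only asks whether something was running, so it saturates at 100% no matter how little of the chip is in use, while the SM ratio exposes the 1-in-132 usage.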