I had to get to grips with the USB spec when designing my USB scope [1].
Adherence to the USB spec itself doesn't necessarily create human-perceptible latency; HID frames can be transmitted every 1 ms and could be parsed in a negligible number of CPU cycles on an embedded machine with USB-OTG and no operating system.
As with most things, the noticeable lag appears because the application-layer software goes through 15 layers of abstraction rather than talking directly to the input device, and that stack can become as complex and bloated as it wants.
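To give a feel for how little work the parsing itself is, here's a minimal sketch in C. It assumes a keyboard in HID boot protocol, where every report is a fixed 8 bytes (modifier bitmask, one reserved byte, up to six key usage codes). The struct and function names are made up for illustration; a device using a custom report descriptor would need more than this.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BOOT_KBD_REPORT_LEN 8   /* boot-protocol keyboard report size */
#define BOOT_KBD_MAX_KEYS   6   /* key slots in bytes 2..7 */

/* Hypothetical container for the decoded report. */
typedef struct {
    uint8_t modifiers;                /* bit 0 = Left Ctrl, bit 1 = Left Shift, ... */
    uint8_t keys[BOOT_KBD_MAX_KEYS];  /* usage codes of the keys currently held */
    size_t  key_count;                /* number of valid entries in keys[] */
} boot_kbd_state;

/* Decode one boot-protocol keyboard report.
 * Returns false if the arguments are invalid or the buffer is too short. */
bool parse_boot_kbd_report(const uint8_t *report, size_t len, boot_kbd_state *out)
{
    if (report == NULL || out == NULL || len < BOOT_KBD_REPORT_LEN) {
        return false;
    }

    out->modifiers = report[0];
    out->key_count = 0;

    /* Byte 1 is reserved; bytes 2..7 hold key usage codes. */
    for (size_t i = 2; i < BOOT_KBD_REPORT_LEN; i++) {
        uint8_t usage = report[i];
        /* 0x00 means "no key"; 0x01..0x03 are error codes (e.g. rollover). */
        if (usage >= 0x04) {
            out->keys[out->key_count++] = usage;
        }
    }
    return true;
}
```

That's one bounds check and a six-iteration loop per report, so even at one report per millisecond the CPU cost on a small microcontroller is tiny compared with everything a desktop input stack does afterwards.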
[1] https://espotek.com/labrador/ (In case you become unable to resist your scope urge :P)