We run GHA with auto-scaling self-hosted runners at scale (1000+ runners at peak hours) pretty well for PyTorch, but it is a labor of love and patience.
However, I'd say that GitHub's been pretty receptive to feedback and has actively fixed almost every wall that we've run into (if we haven't been able to fix it for ourselves).