How have you liked using TensorRT-LLM? Did you come from faster-transformers, vLLM, LMDeploy, TGI, something else?
We started migrating to it the day it came out. We're very glad to have it, but there have been lots of little annoyances along the way. The biggest one has been loading our model repository: having to hardcode the location of the engine file means we can't use Triton's built-in support for downloading from GCS!
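For context, a rough sketch of the pain point (illustrative only, not the commenter's actual config; the path shown is a placeholder): the Triton TensorRT-LLM backend's `config.pbtxt` takes the engine directory as a string parameter, which ends up being an absolute local path rather than something resolved through Triton's cloud model-repository mechanism:

```
parameters: {
  key: "gpt_model_path"
  value: {
    # Hardcoded local filesystem path to the compiled engine directory.
    # Triton's remote repository support (e.g. a gs:// repository) won't
    # rewrite or download this for you, hence the annoyance above.
    string_value: "/models/tensorrt_llm/1/engines"
  }
}
```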