SGLang uses the max-autotune-no-cudagraphs mode of torch.compile. Auto-tuning in this mode can be slow. If you want to deploy a model on many different machines, you can ship the torch.compile cache to these ...
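
A minimal sketch of what shipping the cache could look like, using only the Python standard library. It assumes the cache was written to a directory selected through PyTorch's TORCHINDUCTOR_CACHE_DIR environment variable, which the text above does not name. The helpers pack_cache and unpack_cache, and all paths, are hypothetical names chosen for illustration and are not part of SGLang or PyTorch.

```python
import os
import shutil
import tempfile
from pathlib import Path


def pack_cache(cache_dir: str, archive_base: str) -> str:
    """Bundle a populated compile-cache directory into <archive_base>.tar.gz.

    Hypothetical helper: run it on the machine that already paid the
    auto-tuning cost, then copy the archive to the other machines.
    """
    return shutil.make_archive(archive_base, "gztar", root_dir=cache_dir)


def unpack_cache(archive_path: str, dest_dir: str) -> str:
    """Unpack a shipped archive and point the compile cache at it.

    Assumption: the variable must be set before the process compiles
    anything, so call this before launching or importing the model code.
    """
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(archive_path, dest_dir)
    os.environ["TORCHINDUCTOR_CACHE_DIR"] = dest_dir
    return dest_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for a real cache directory produced by a warm-up run.
        source = Path(tmp) / "source_cache"
        source.mkdir()
        (source / "artifact.bin").write_bytes(b"compiled-kernel-placeholder")

        archive = pack_cache(str(source), str(Path(tmp) / "compile_cache"))
        target = unpack_cache(archive, str(Path(tmp) / "target_cache"))
        print("cache restored to", target)
```

Whether a shipped cache is actually reused depends on how closely the target machine matches the source (GPU model, driver, PyTorch version), so it is worth checking startup time on one target machine before rolling out widely.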