triton

Community

NVIDIA Triton Inference Server for deploying AI models at scale. Supports multiple frameworks (ONNX, TensorRT, PyTorch, TensorFlow), model ensembles, dynamic batching, model versioning, and GPU/CPU inference with high throughput and low latency.
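Several of the features above (dynamic batching, versioning, framework selection) are driven by a per-model `config.pbtxt` file in Triton's model repository. The sketch below is illustrative only: the model name `resnet50`, the tensor names, and the dimensions are assumptions, not part of this skill; the field names follow Triton's model configuration schema.

```protobuf
# config.pbtxt — a minimal sketch for a hypothetical ONNX image classifier.
# Model name and tensor shapes are placeholders; adapt them to your model.
name: "resnet50"
platform: "onnxruntime_onnx"   # framework backend (ONNX Runtime here)
max_batch_size: 8              # enables batched inference up to 8 requests

input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]      # per-request shape; batch dim is implicit
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]

# Dynamic batching: Triton groups individual requests into server-side
# batches, trading a small queueing delay for higher GPU throughput.
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
```

With this file placed under `model_repository/resnet50/`, and the model weights under a numbered version directory (e.g. `1/model.onnx`), Triton's default version policy serves the newest version automatically.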

Install

skillpm install triton

Format score

100/100

Spec

v1.0

Installs

0

Published

April 1, 2026