What's Changed
- Fix broken throughput computation
- Fix multi-node server dying upon first request
- Improve RDMA support
- Add FlashInfer as a backend; add SGLang to the Docker image for future support
- Add optional bind path rendering to the CLI launch response; make rendering of vLLM args and env vars optional
- Merge the CLI arg bind path with the config bind path instead of overwriting it
- Add CPUs per task and memory per node to the CLI launch command
- Improve server status error handling
- Minor code refactor
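
The bind path change above (merge instead of overwrite) can be illustrated with a minimal sketch. This is not the project's actual code: the function name `merge_bind_paths`, the comma-separated entry format, and the de-duplication step are assumptions made for illustration only.

```python
from typing import Optional


def merge_bind_paths(config_bind: Optional[str], cli_bind: Optional[str]) -> str:
    """Combine bind paths from the config and the CLI (hypothetical helper).

    Assumes each value is a comma-separated list of bind entries.
    Config entries come first, CLI entries are appended, and exact
    duplicates are dropped while preserving order.
    """
    merged = []
    for source in (config_bind, cli_bind):
        if not source:
            continue  # skip None or empty strings
        for entry in source.split(","):
            entry = entry.strip()
            if entry and entry not in merged:
                merged.append(entry)
    return ",".join(merged)


if __name__ == "__main__":
    # Both sources contribute; the shared entry appears only once.
    print(merge_bind_paths("/data:/data,/models", "/scratch,/models"))
```

With overwrite semantics, the CLI value would replace the config value entirely; with merge semantics as sketched here, entries from both sources are kept.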
Full Changelog: v0.7.1...v0.7.2