@lzhangzz commented Dec 20, 2025

  • TM_COMM_MAX_CTAS: sets the maximum number of CTAs used by non-LL collectives.
  • TM_COMM_NVLS_ENABLE: controls whether NVLS may be used. This allows NVLS to be turned off on systems where it malfunctions.
  • TM_COMM_COPY_THRESHOLD: the send-size threshold at which all-gather switches to a copy-engine-based implementation. This gives a 10-15% speedup for large send sizes because of the lower protocol overhead.
  • The default number of CTAs for NVLS-based all-reduce collectives is raised from 4 to 16. The old value of 4 was tuned for NV8-powered systems. The new default delivers 2.5x the peak throughput on NV18-powered systems.
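Below is a minimal Python sketch of how these settings might interact when a collective is dispatched. It is not the PR's implementation. The following are all hypothetical, chosen for illustration only:

- the helper names `load_comm_config`, `cta_count`, `use_nvls` and `allgather_path`;
- the behavior when a variable is unset (no CTA cap, NVLS allowed, copy engine never used);
- the accepted boolean spellings;
- the assumption that TM_COMM_COPY_THRESHOLD is measured in bytes and compared with `>=`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# From the PR description: NVLS-based all-reduce now defaults to 16 CTAs.
DEFAULT_NVLS_ALLREDUCE_CTAS = 16


@dataclass
class CommConfig:
    max_ctas: Optional[int]        # TM_COMM_MAX_CTAS; None = no user cap (assumed)
    nvls_enable: bool              # TM_COMM_NVLS_ENABLE
    copy_threshold: Optional[int]  # TM_COMM_COPY_THRESHOLD; assumed to be bytes


def _parse_bool(value: str) -> bool:
    # Hypothetical spelling rules for boolean env vars.
    return value.strip().lower() in ("1", "true", "yes", "on")


def load_comm_config(env: Mapping[str, str] = os.environ) -> CommConfig:
    """Read the three TM_COMM_* variables; defaults when unset are assumptions."""
    max_ctas = env.get("TM_COMM_MAX_CTAS")
    nvls = env.get("TM_COMM_NVLS_ENABLE")
    threshold = env.get("TM_COMM_COPY_THRESHOLD")
    return CommConfig(
        max_ctas=int(max_ctas) if max_ctas else None,
        nvls_enable=_parse_bool(nvls) if nvls is not None else True,
        copy_threshold=int(threshold) if threshold else None,
    )


def cta_count(cfg: CommConfig, default_ctas: int) -> int:
    """CTAs for a non-LL collective: the built-in default, capped by TM_COMM_MAX_CTAS."""
    if cfg.max_ctas is None:
        return default_ctas
    return min(default_ctas, cfg.max_ctas)


def use_nvls(cfg: CommConfig, hw_supports_nvls: bool) -> bool:
    """NVLS is used only if the hardware supports it and it has not been disabled."""
    return hw_supports_nvls and cfg.nvls_enable


def allgather_path(cfg: CommConfig, send_bytes: int) -> str:
    """Pick the copy-engine all-gather for large sends, else the kernel-based one."""
    if cfg.copy_threshold is not None and send_bytes >= cfg.copy_threshold:
        return "copy_engine"
    return "kernel"
```

For example, suppose `TM_COMM_COPY_THRESHOLD=1048576` and `TM_COMM_MAX_CTAS=8`. A 2 MiB all-gather would then take the copy-engine path. The NVLS all-reduce would be capped at 8 CTAs rather than the new default of 16.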
