-
Notifications
You must be signed in to change notification settings - Fork 125
Update GPU condition checks in frontier.mako #1076
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -16,5 +16,5 @@ fi | |||||
| if [ "$job_device" = "gpu" ]; then | ||||||
| ./mfc.sh test -a --rdma-mpi --max-attempts 3 -j $ngpus $device_opts -- -c frontier | ||||||
| else | ||||||
| ./mfc.sh test -a --max-attempts 3 -j 32 -- -c frontier | ||||||
| ./mfc.sh test -a --max-attempts 3 -j 32 --no-gpu -- -c frontier | ||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Suggestion: Hard-coded parallelism: using a fixed Severity Level: Minor
Suggested change
Why it matters? ⭐Hard-coding -j 32 is brittle across runner types. Using $(nproc) adapts to available CPU cores and is a safe, practical improvement for parallelism observable in this hunk. Prompt for AI Agent 🤖This is a comment left during a code review.
**Path:** .github/workflows/frontier/test.sh
**Line:** 19:19
**Comment:**
*Performance: Hard-coded parallelism: using a fixed `-j 32` ignores the actual number of available CPU cores and can lead to oversubscription or underutilization; replace the literal with a dynamic value such as `$(nproc)` to adapt to the runner's CPU count.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. |
||||||
| fi | ||||||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -10,7 +10,7 @@ | |||||
| #SBATCH --time=${walltime} | ||||||
| #SBATCH --cpus-per-task=7 | ||||||
| #SBATCH -C nvme | ||||||
| % if gpu: | ||||||
| % if gpu != 'no': | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Suggestion: Refine the GPU check to
Suggested change
|
||||||
| #SBATCH --gpus-per-task=1 | ||||||
| #SBATCH --gpu-bind=closest | ||||||
| % endif | ||||||
|
|
@@ -34,12 +34,12 @@ ${helpers.template_prologue()} | |||||
| ok ":) Loading modules:\n" | ||||||
| cd "${MFC_ROOT_DIR}" | ||||||
| % if engine == 'batch': | ||||||
| . ./mfc.sh load -c f -m ${'g' if gpu else 'c'} | ||||||
| . ./mfc.sh load -c f -m ${'g' if gpu != 'no' else 'c'} | ||||||
| % endif | ||||||
| cd - > /dev/null | ||||||
| echo | ||||||
|
|
||||||
| % if gpu: | ||||||
| % if gpu != 'no': | ||||||
| export MPICH_GPU_SUPPORT_ENABLED=1 | ||||||
| % else: | ||||||
| export MPICH_GPU_SUPPORT_ENABLED=0 | ||||||
|
|
@@ -66,7 +66,7 @@ ulimit -s unlimited | |||||
| % if engine == 'interactive': | ||||||
| --unbuffered --nodes ${nodes} --ntasks-per-node ${tasks_per_node} \ | ||||||
| --cpus-per-task 7 \ | ||||||
| % if gpu: | ||||||
| % if gpu != 'no': | ||||||
| --gpus-per-task 1 --gpu-bind closest \ | ||||||
| % endif | ||||||
| ${profiler} "${target.get_install_binpath(case)}") | ||||||
|
|
||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: The command substitution in the added line is unescaped inside an unquoted here-document, so $([ "$2" = "gpu" ] ...) is evaluated on the submission host when building the heredoc instead of being evaluated on the compute node at job runtime; escape the substitution (or quote the heredoc) so evaluation happens in the intended context. [logic error]
Severity Level: Minor⚠️
Why it matters? ⭐
Correct. The heredoc is unquoted, so the outer shell will perform command and variable expansion before sending the script to sbatch.
Escaping the $ (or quoting the heredoc delimiter) is the right fix if you want that substitution to occur on the compute node at job runtime.
The proposed improved_code (escape the $) achieves that with minimal risk.
Prompt for AI Agent 🤖