@sjpb
Collaborator

@sjpb sjpb commented Dec 9, 2025

When the topology feature is enabled, Slurm does not tolerate the default empty --nodelist= parameter for the desktop and rstudio apps. This fixes it by making the parameter conditional.

Also turns on topology in CI to try to catch any future similar issues.

The failure showed up e.g. in the login node's /var/log/ondemand-nginx/demo_user/error.log:

App 17541 output: [2025-12-09 11:54:29 +0000 ]  INFO "execve = [{}, \"sbatch\", \"-D\", \"/home/demo_user/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/slurm/output/8dc04023-7393-4a92-918f-06e876446f38\", \"-J\", \"ood-desktop\", \"-o\", \"/home/demo_user/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/slurm/output/8dc04023-7393-4a92-918f-06e876446f38/output.log\", \"-p\", \"cclake_himem\", \"-t\", \"01:00:00\", \"--export\", \"NONE\", \"--nodes=1\", \"--ntasks=1\", \"--nodelist=\", \"--parsable\", \"-M\", \"stg\"]"
App 17541 output: [2025-12-09 11:54:29 +0000 ] ERROR "ERROR: OodCore::JobAdapterError - sbatch: error: Batch job submission failed: Requested node configuration is not available"
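
The logged command above ends up with an empty --nodelist= because the option is always emitted, even when no node was requested. As a rough illustration of what "conditional" means here, the following Python sketch builds the option list and only appends --nodelist when a value is present. It is not the change made in this PR: the function name, the form value and the node name are hypothetical, and only the option strings are taken from the log.

```python
def build_sbatch_args(nodelist=""):
    """Return sbatch options, adding --nodelist only when a node is named."""
    # Options that are always sent, as seen in the logged command
    args = ["--nodes=1", "--ntasks=1"]
    # Hypothetical form value: blank or whitespace-only means "no preference"
    requested = (nodelist or "").strip()
    if requested:
        args.append(f"--nodelist={requested}")
    return args


# No node requested: the option is left out entirely
default_args = build_sbatch_args()
# Node requested ("node-01" is a made-up name): the option is passed through
pinned_args = build_sbatch_args("node-01")
```

With this shape, an empty form field never produces a bare --nodelist= on the sbatch command line.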

@sjpb sjpb marked this pull request as ready for review December 10, 2025 10:37
@sjpb sjpb requested a review from a team as a code owner December 10, 2025 10:37
@sjpb sjpb requested a review from elelaysh December 10, 2025 13:56
@elelaysh
Contributor

Why don't rstudio, matlab and codeserver have the --node arg in the launch script?

@sjpb
Collaborator Author

sjpb commented Dec 10, 2025

> Why don't rstudio, matlab and codeserver have the --node arg in the launch script?

Because the example apps for those didn't add it, I think. It's arguable whether all or none should have it TBH! I can sort of see that in some cases you might want to land on a specific node, but I'm not sure any of these apps is such a case.

Contributor

@elelaysh elelaysh left a comment

Fix confirmed on an appliance instance.
Indeed the issue is not present if the topology plugin is not active.

@elelaysh
Contributor

> > Why don't rstudio, matlab and codeserver have the --node arg in the launch script?
>
> Because the example apps for those didn't add it, I think. It's arguable whether all or none should have it TBH! I can sort of see that in some cases you might want to land on a specific node, but I'm not sure any of these apps is such a case.

I would like to see unified general parameters (cores, mem, gpus, nodes, ...) for all apps, both because of the DRY principle and because you never know what a user will need: for example, someone might want codeserver to attach gdb to a running program, ...
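
To make the suggestion concrete, here is one way unified parameters could look, sketched in Python rather than in the apps' actual templates. Everything here is an assumption for illustration: the helper name, its arguments and the node name are hypothetical. The flags themselves (--cpus-per-task, --mem, --gpus, --nodelist) are standard sbatch options.

```python
def common_sbatch_args(cores=1, mem_gb=None, gpus=0, nodelist=""):
    """Build the sbatch options shared by every app; unset values are omitted."""
    args = ["--nodes=1", "--ntasks=1", f"--cpus-per-task={cores}"]
    if mem_gb:
        args.append(f"--mem={mem_gb}G")
    if gpus:
        args.append(f"--gpus={gpus}")
    # Same rule as for the desktop and rstudio fix: never emit an empty --nodelist=
    requested = (nodelist or "").strip()
    if requested:
        args.append(f"--nodelist={requested}")
    return args


# Each app passes its own form values to the one shared helper
desktop_args = common_sbatch_args(cores=4, mem_gb=8)
# e.g. codeserver pinned to the (made-up) node where a program is already running
codeserver_args = common_sbatch_args(cores=2, nodelist="node-07")
```

The point of a single helper is that the empty-value handling lives in one place, so an app added later cannot reintroduce the empty --nodelist= problem.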

@sjpb sjpb merged commit 74e1526 into main Dec 11, 2025
31 checks passed
@sjpb sjpb deleted the fix/ondemand-topology-nodelist branch December 11, 2025 09:41