Commit 5c94314

Cloudac7, Han Wang, amcadmus authored
Fix parameter error for LSF GPU tasks (#463)
* fix the bug of the units of taut and taup
* fix parameter error for LSF GPU tasks; add exclusive
* update document for LSF new syntax for GPU
* correct document description
* add some extra description for exclusive

Co-authored-by: Han Wang <wang_han@iapcm.ac.cn>
Co-authored-by: Han Wang <amcadmus@gmail.com>
1 parent 8f26aea commit 5c94314

2 files changed: +12 -5 lines changed

README.md

Lines changed: 2 additions & 0 deletions
@@ -1307,6 +1307,8 @@ The following table gives explicit descriptions on keys in param.json.
 | manual_cuda_devices | Interger | 1 | Used with key "manual_cuda_multiplicity" specify the gpu number
 | manual_cuda_multiplicity |Interger | 5 | Used in 01.model_devi,used with key "manual_cuda_devices" specify the MD program number running on one GPU at the same time,dpgen will automatically allocate MD jobs on different GPU. This can improve GPU usage for GPU like V100.
 | node_cpu | Integer | 4 | Only for LSF. The number of CPU cores on each node that should be allocated to the job.
+| new_lsf_gpu | Boolean | false | **Only for LSF.** Control whether new syntax of GPU to be enabled. If enabled, DP-GEN will generate line like `#BSUB -gpu num=1:mode=shared:j_exclusive=yes` in job submission script. Only support LSF>=10.1.0.3, and `LSB_GPU_NEW_SYNTAX=Y` should be set. Default: `false`.
+| exclusive | Boolean | false | **Only for LSF, and only take effect when `new_lsf_gpu` enabled.** Control whether enable `j_exclusive` during running. Default: `false`.
 | source_list | List of string | "....../vasp.env" | Environment needed for certain job. For example, if "env" is in the list, 'source env' will be written in the script.
 | module_list | List of string | [ "Intel/2018", "Anaconda3"] | For example, If "Intel/2018" is in the list, "module load Intel/2018" will be written in the script.
 | partition | String | "AdminGPU" | Partition / queue in which to run the job. |
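
For illustration, the two keys added to the table could appear in a resources block like the one below, written as a Python dict rather than JSON. This is only a sketch: every value is a placeholder, and the helper `exclusive_is_consulted` is a hypothetical name introduced here to show that `exclusive` has no effect unless `new_lsf_gpu` is enabled. It is not part of DP-GEN.

```python
# Hypothetical resources settings; all values are placeholders.
resources = {
    "numb_gpu": 1,
    "task_per_node": 4,
    "node_cpu": 4,
    "partition": "AdminGPU",
    "new_lsf_gpu": True,  # per the README: needs LSF >= 10.1.0.3 and LSB_GPU_NEW_SYNTAX=Y
    "exclusive": False,   # only takes effect when new_lsf_gpu is enabled
}


def exclusive_is_consulted(res):
    """Return True when the 'exclusive' key can influence the job script.

    Hypothetical helper: it only encodes the README statement that
    'exclusive' takes effect solely when 'new_lsf_gpu' is enabled.
    """
    return bool(res.get("new_lsf_gpu", False))


print(exclusive_is_consulted(resources))
```

Leaving out both keys keeps the defaults (`false`), in which case the older resource-requirement syntax is used.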

dpgen/dispatcher/LSF.py

Lines changed: 10 additions & 5 deletions
@@ -108,13 +108,18 @@ def sub_script_head(self, res):
             if res['node_cpu']:
                 ret += '#BSUB -R span[ptile=%d]\n' % res['node_cpu']
             if res.get('new_lsf_gpu', False):
-                # supported in LSF >= 10.1.0 SP6
-                # ref: https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_resource_sharing/use_gpu_res_reqs.html
-                ret += '#BSUB -n %d\n#BSUB -gpu "num=%d:mode=shared:j_exclusive=yes"\n' % (
-                    res['numb_gpu'], res['task_per_node'])
+                # supported in LSF >= 10.1.0.3
+                # ref: https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0
+                # /lsf_resource_sharing/use_gpu_res_reqs.html
+                if res.get('exclusive', False):
+                    j_exclusive = "no"
+                else:
+                    j_exclusive = "yes"
+                ret += '#BSUB -n %d\n#BSUB -gpu "num=%d:mode=shared:j_exclusive=%s"\n' % (
+                    res['task_per_node'], res['numb_gpu'], j_exclusive)
             else:
                 ret += '#BSUB -n %d\n#BSUB -R "select[ngpus >0] rusage[ngpus_excl_p=%d]"\n' % (
-                    res['numb_gpu'], res['task_per_node'])
+                    res['task_per_node'], res['numb_gpu'])
         if res['time_limit']:
             ret += '#BSUB -W %s\n' % (res['time_limit'].split(':')[
                 0] + ':' + res['time_limit'].split(':')[1])
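
To see what the corrected argument order produces, the GPU branch can be pulled out into a standalone function. The sketch below is an assumption-laden excerpt, not the dispatcher itself: `gpu_header` is a hypothetical name, and the input values are placeholders. It copies the branch logic from the added lines, including the mapping as committed, where `exclusive=True` yields `j_exclusive=no`. That mapping reads as the reverse of the README description of `exclusive`; the sketch reproduces it rather than resolving it.

```python
def gpu_header(res):
    """Build the GPU-related #BSUB lines, mirroring the post-commit branch.

    Hypothetical standalone helper; in the commit this logic lives inside
    sub_script_head in dpgen/dispatcher/LSF.py.
    """
    ret = ''
    if res.get('new_lsf_gpu', False):
        # As committed: exclusive=True maps to j_exclusive=no, otherwise yes.
        if res.get('exclusive', False):
            j_exclusive = "no"
        else:
            j_exclusive = "yes"
        # -n takes the task count, num= takes the GPU count (the order this commit fixes).
        ret += '#BSUB -n %d\n#BSUB -gpu "num=%d:mode=shared:j_exclusive=%s"\n' % (
            res['task_per_node'], res['numb_gpu'], j_exclusive)
    else:
        ret += '#BSUB -n %d\n#BSUB -R "select[ngpus >0] rusage[ngpus_excl_p=%d]"\n' % (
            res['task_per_node'], res['numb_gpu'])
    return ret


# Placeholder values: 4 tasks per node, 1 GPU.
new_syntax = gpu_header({'task_per_node': 4, 'numb_gpu': 1, 'new_lsf_gpu': True})
old_syntax = gpu_header({'task_per_node': 4, 'numb_gpu': 1})
print(new_syntax, end='')
print(old_syntax, end='')
```

With the pre-commit argument order, the same inputs would have requested `-n 1` and four GPUs, which is the parameter error named in the commit title.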
