
Commit 70fd1ef

Update deployment docs (#234)
Clarify and correct the docs. Resolves #222
1 parent 8dff7cd commit 70fd1ef

2 files changed: +25 -6 lines changed

docs/deploy-parallel-cluster.md

Lines changed: 24 additions & 5 deletions
@@ -44,6 +44,8 @@ The first is the configuration stack and the second is the cluster.

## Create users_groups.json

+**NOTE**: If you are using RES and specify RESEnvironmentName in your configuration, these steps will automatically be done for you.
+
Before you can use the cluster you must configure the Linux users and groups for the head and compute nodes.
One way to do that would be to join the cluster to your domain.
But joining each compute node to a domain effectively creates a distributed denial of service (DDOS) attack on the domain controller
@@ -59,14 +61,14 @@ The outputs of the configuration stack have the commands required.

| Config Stack Output | Description
|-----------------------------------------|------------------
-| Command01SubmitterMountHeadNode | Mounts the Slurm cluster's shared file system, adds it to /etc/fstab.
-| Command02CreateUsersGroupsJsonConfigure | Create /opt/slurm/{{ClusterName}}/config/users_groups.json and create a cron job to refresh it hourly.
+| Command01_MountHeadNodeNfs | Mounts the Slurm cluster's shared file system at /opt/slurm/{{ClusterName}}. This provides access to the configuration script used in the next step.
+| Command02_CreateUsersGroupsJsonConfigure | Creates /opt/slurm/{{ClusterName}}/config/users_groups.json and creates a cron job to refresh it hourly. Updates /etc/fstab with the mount from the previous step.

Before deleting the cluster you can undo the configuration by running the commands in the following outputs.

| Config Stack Output | Description
|-------------------------------------------|------------------
-| command10CreateUsersGroupsJsonDeconfigure | Removes the crontab that refreshes users_groups.json.
+| command10_CreateUsersGroupsJsonDeconfigure | Removes the crontab that refreshes users_groups.json.

Now the cluster is ready to be used by sshing into the head node or a login node, if you configured one.

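These commands are published as outputs of the CloudFormation configuration stack, so you can also list them from a shell instead of reading them in the console. A minimal sketch, assuming the AWS CLI is configured; the stack name is a placeholder for whatever you named your configuration stack:

```
# List the configuration stack's outputs, which contain the commands above (stack name is a placeholder)
aws cloudformation describe-stacks \
    --stack-name <config-stack-name> \
    --query 'Stacks[0].Outputs[*].[OutputKey,OutputValue]' \
    --output table
```
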
@@ -75,6 +77,8 @@ in with their own ssh keys.

## Configure submission hosts to use the cluster

+**NOTE**: If you are using RES and specify RESEnvironmentName in your configuration, these steps will automatically be done for you on all running DCV desktops.
+
ParallelCluster was built assuming that users would ssh into the head node or login nodes to execute Slurm commands.
This can be undesirable for a number of reasons.
First, users shouldn't be given ssh access to critical infrastructure like the cluster head node.
@@ -90,14 +94,19 @@ Run them in the following order:

| Config Stack Output | Description
|-----------------------------------------|------------------
-| Command01SubmitterMountHeadNode | Mounts the Slurm cluster's shared file system, adds it to /etc/fstab.
-| Command03SubmitterConfigure | Configure the submission host so it can directly access the Slurm cluster.
+| Command01_MountHeadNodeNfs | Mounts the Slurm cluster's shared file system at /opt/slurm/{{ClusterName}}. This provides access to the configuration script used in the next step.
+| Command03_SubmitterConfigure | Configures the submission host so it can directly access the Slurm cluster. Updates /etc/fstab with the mount from the previous step.

The first command simply mounts the head node's NFS file system so you have access to the Slurm commands and configuration.

The second command runs an ansible playbook that configures the submission host so that it can run the Slurm commands for the cluster.
+It will also compile the Slurm binaries for the OS distribution and CPU architecture of your host.
It also configures the modulefile that sets up the environment to use the slurm cluster.

+**NOTE**: When the new modulefile is created, you need to refresh your shell environment before the modulefile
+can be used.
+You can do this by opening a new shell or by sourcing your .profile: `source ~/.profile`.
+
The clusters have been configured so that a submission host can use more than one cluster by simply changing the modulefile that is loaded.

On the submission host just open a new shell and load the modulefile for your cluster and you can access Slurm.
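
For a concrete sense of what the first command does, the value behind Command01_MountHeadNodeNfs (defined in cdk_slurm_stack.py, shown in the second file below) expands to roughly the following on a submission host; the cluster name `eda-cluster` is only illustrative:

```
# Roughly what the Command01_MountHeadNodeNfs output runs ("eda-cluster" is an illustrative cluster name)
head_ip=head_node.eda-cluster.pcluster
sudo mkdir -p /opt/slurm/eda-cluster
sudo mount $head_ip:/opt/slurm /opt/slurm/eda-cluster
```
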
@@ -126,10 +135,20 @@ Then update your aws-eda-slurm-cluster stack by running the install script again

Run the following command in a shell to configure your environment to use your slurm cluster.

+**NOTE**: When the new modulefile is created, you need to refresh your shell environment before the modulefile
+can be used.
+You can do this by opening a new shell or by sourcing your profile: `source ~/.bash_profile`.
+
```
module load {{ClusterName}}
```

+If you want to get a list of all of the clusters that are available execute the following command.
+
+```
+module avail
+```
+
To submit a job run the following command.

```
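
Putting these pieces together, a first session on a configured submission host might look like the sketch below; the cluster name is illustrative and `squeue` stands in for any Slurm command you want to run:

```
# Sketch of a first session on a configured submission host ("eda-cluster" is illustrative)
source ~/.profile        # pick up the newly created modulefile path
module avail             # list the clusters that have modulefiles
module load eda-cluster  # point this shell at the cluster
squeue                   # Slurm commands now talk to the cluster
```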

source/cdk/cdk_slurm_stack.py

Lines changed: 1 addition & 1 deletion
@@ -3002,7 +3002,7 @@ def create_parallel_cluster_config(self):
        )
        region = self.cluster_region
        cluster_name = self.config['slurm']['ClusterName']
-        CfnOutput(self, "Command01_SubmitterMountHeadNode",
+        CfnOutput(self, "Command01_MountHeadNodeNfs",
            value = f"head_ip=head_node.{self.config['slurm']['ClusterName']}.pcluster && sudo mkdir -p /opt/slurm/{cluster_name} && sudo mount $head_ip:/opt/slurm /opt/slurm/{cluster_name}"
        )
        CfnOutput(self, "Command02_CreateUsersGroupsJsonConfigure",

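Because the rename changes the output's key, anything that looks the command up by name needs updating. A sketch of fetching and running the mount command with the AWS CLI, assuming a placeholder stack name; note that CloudFormation logical IDs drop non-alphanumeric characters, so the published key may not contain the underscore:

```
# Fetch the renamed mount command from the configuration stack and run it (stack name is a placeholder)
cmd=$(aws cloudformation describe-stacks \
    --stack-name <config-stack-name> \
    --query 'Stacks[0].Outputs[?contains(OutputKey, `MountHeadNodeNfs`)].OutputValue' \
    --output text)
echo "$cmd"
bash -c "$cmd"
```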