Commit d356811

Update documentation for version 2.0
1 parent e0dd7b9 commit d356811

10 files changed: 335 additions and 66 deletions


Flows.md

Lines changed: 0 additions & 29 deletions
This file was deleted.

Flows_Diagrams.md

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+## Flow Diagrams
+
+- [Training job with SSM](#training-job-with-ssm)
+- [Training job with SSH](#training-job-with-ssh)
+
+#### Training job with SSM
+
+This flow corresponds to the [Connecting to SageMaker training jobs with SSM](README.md#training) procedure.
+
+![Screenshot](images/Flow_Train_SSM.png)
+
+
+#### Training job with SSH
+
+This flow corresponds to the [Remote code execution with PyCharm / VSCode over SSH](README.md#remote-interpreter) procedure.
+
+![Screenshot](images/Flow_Train_SSH.png)
+

Flows_IDE.md

Lines changed: 0 additions & 28 deletions
This file was deleted.

README.md

Lines changed: 8 additions & 9 deletions
@@ -8,28 +8,27 @@ SageMaker SSH Helper is an "army-knife" library that helps you to securely conne
 realtime inference endpoints, and SageMaker Studio notebook containers for fast interactive experimentation,
 remote debugging, and advanced troubleshooting.

-The three most common scenarios for the library, also known as "SSH into SageMaker", are:
+The three most common asks that motivated the creation of the library, sometimes referred to as "SSH into SageMaker", are:
 1. A terminal session into a container running in SageMaker to diagnose a stuck training job, use CLI commands
 like nvidia-smi, or iteratively fix and re-execute your training script within seconds.
 2. Remote debugging of code running in SageMaker from your favorite local IDE, such as
 PyCharm Professional Edition or Visual Studio Code.
 3. Port forwarding to access diagnostic tools running inside SageMaker, e.g., Dask dashboard, TensorBoard or Spark Web UI.

-Other scenarios include, but are not limited to, connecting to a remote Jupyter Notebook in SageMaker Studio from your IDE, or starting a VNC session to SageMaker Studio to run GUI apps.
-
-Also see our [Frequently Asked Questions](FAQ.md), especially if you're using Windows on your local machine.
+Other asks include, but are not limited to, connecting to a remote Jupyter Notebook in SageMaker Studio from your IDE, or starting a VNC session to SageMaker Studio to run GUI apps.

 ## How it works
-SageMaker SSH Helper uses AWS Systems Manager (SSM) Session Manager to register the SageMaker container in SSM and then create an SSM session between your client machine and the SageMaker container. You can then "SSH into SageMaker" by creating an SSH (Secure Shell) connection on top of the SSM session, which allows opening a Linux shell and/or configuring bidirectional SSH port forwarding to enable applications like remote development/debugging/desktop, and others.
+SageMaker SSH Helper uses AWS Systems Manager (SSM) Session Manager to register the SageMaker container in SSM and then create an SSM session between your client machine and the SageMaker container. You can then "SSH into SageMaker" by creating an SSH (Secure Shell) connection on top of the SSM session, which allows opening a Linux shell and configuring bidirectional SSH port forwarding to run applications like remote development, debugging, desktop GUI, and others.
+
+![Screenshot](images/high-level-architecture.png)

-![Screenshot](images/layers.png)
+Once you become familiar with the library, check the [Flow Diagrams](Flows_Diagrams.md) of the common use cases.

-See detailed architecture diagrams of the complete flow of participating components
-in [Training Diagram](Flows.md), and [IDE integration with SageMaker Studio diagram](Flows_IDE.md).
+Also make sure you have looked at our [Frequently Asked Questions](FAQ.md).

 ## Getting started

-To get started, your AWS system administrator must set up the needed IAM and SSM configuration in your AWS account as shown
+To get started, your AWS system administrator must configure IAM and SSM in your AWS account as shown
 in [Setting up your AWS account with IAM and SSM configuration](IAM_SSM_Setup.md).

 > **Note**: This solution is a sample AWS content. You should not use this content in your production accounts, in a production
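
The "How it works" paragraph in this hunk describes an SSH connection layered on top of an SSM session. The sketch below illustrates that layering by building (not running) an `ssh` command line whose `ProxyCommand` starts an SSM session. It assumes the AWS CLI and the Session Manager plugin are installed locally and uses the AWS-provided `AWS-StartSSHSession` document. The function name `build_ssh_over_ssm_command`, its parameters, and the `root` default user are hypothetical, and the library's own scripts may set up the connection differently.

```python
import shlex


def build_ssh_over_ssm_command(instance_id, user="root",
                               local_port=None, remote_port=None):
    """Return an argv list for `ssh` that tunnels through an SSM session.

    Hypothetical helper for illustration only; it is not part of
    SageMaker SSH Helper.
    """
    # ssh replaces %p with the target port before running the proxy command.
    proxy = (
        "aws ssm start-session"
        f" --target {shlex.quote(instance_id)}"
        " --document-name AWS-StartSSHSession"
        " --parameters portNumber=%p"
    )
    argv = ["ssh", "-o", f"ProxyCommand={proxy}"]
    # Optional local port forwarding, e.g. to reach TensorBoard in the container.
    if local_port is not None and remote_port is not None:
        argv += ["-L", f"{local_port}:localhost:{remote_port}"]
    argv.append(f"{user}@{instance_id}")
    return argv


if __name__ == "__main__":
    cmd = build_ssh_over_ssm_command(
        "mi-01234567890abcdef", local_port=6006, remote_port=6006
    )
    # Print a copy-pasteable shell command instead of executing it.
    print(shlex.join(cmd))
```

Returning an argument list rather than a shell string keeps quoting explicit and lets the caller decide whether to print the command or pass it to `subprocess.run`.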

images/Flow_Train_SSH.png

209 KB

images/Flow_Train_SSM.png

160 KB

images/high-level-architecture.png

99.8 KB

images/layers.png

-18.9 KB
Binary file not shown.

uml/Flow_Train_SSH.puml

Lines changed: 167 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,167 @@
1+
See https://pdf.plantuml.net/PlantUML_Language_Reference_Guide_en.pdf
2+
3+
@startuml
4+
actor Developer as dev
5+
participant "SageMaker SSH Helper \n library (local)" as sm_ssh_helper_local
6+
participant "Amazon SageMaker" as sagemaker
7+
participant "VPC \n private or managed" as vpc
8+
participant "SageMaker SSH Helper \n library (remote)" as sm_ssh_helper_remote
9+
participant "SSH server" as ssh
10+
participant "SSM agent" as ssm_agent
11+
participant "AWS Systems Manager \n (SSM)" as ssm
12+
participant "IAM" as iam
13+
14+
dev -> sm_ssh_helper_local: Get library dir
15+
16+
note right of dev
17+
.../sagemaker_ssh_helper/
18+
end note
19+
20+
return
21+
22+
23+
note left of dev
24+
estimator:
25+
source_dir/
26+
train.py
27+
28+
.../sagemaker_ssh_helper/
29+
end note
30+
31+
dev -> sm_ssh_helper_local: Create wrapper \n around estimator
32+
activate sm_ssh_helper_local
33+
sm_ssh_helper_local -> sm_ssh_helper_local: Modify estimator \n metadata
34+
return
35+
deactivate sm_ssh_helper_local
36+
37+
dev -> sagemaker: Create training job \n (fit estimator)
38+
note left of dev
39+
sourcedir.tar.gz
40+
end note
41+
42+
sagemaker -> vpc: Start containers
43+
note left vpc
44+
docker run train
45+
end note
46+
47+
activate vpc
48+
49+
vpc -> vpc: Start training
50+
note left vpc
51+
train.py
52+
end note
53+
activate vpc
54+
55+
vpc -> sm_ssh_helper_remote: Setup and start SSH
56+
note left sm_ssh_helper_remote
57+
sm-setup-ssh
58+
end note
59+
activate vpc
60+
61+
activate sm_ssh_helper_remote
62+
63+
sm_ssh_helper_remote -> sm_ssh_helper_remote: Configure and \n install libs
64+
sm_ssh_helper_remote -> sm_ssh_helper_remote: Save environment \n for remote shell
65+
note left sm_ssh_helper_remote
66+
sm-save-env
67+
end note
68+
69+
sm_ssh_helper_remote -> ssh: Start SSH server
70+
activate ssh
71+
72+
sm_ssh_helper_remote -> sm_ssh_helper_remote: Initialize SSM
73+
activate sm_ssh_helper_remote
74+
note left sm_ssh_helper_remote
75+
sm-init-ssm
76+
end note
77+
sm_ssh_helper_remote -> ssm: Create activation
78+
sm_ssh_helper_remote -> ssm_agent: Register
79+
ssm_agent -> ssm:
80+
note right ssm_agent
81+
mi-01234567890abcdef
82+
end note
83+
ssm --> ssm_agent:
84+
deactivate sm_ssh_helper_remote
85+
86+
sm_ssh_helper_remote -> ssm_agent: Start SSM agent
87+
activate ssm_agent
88+
ssm_agent -> ssm: Go online
89+
90+
91+
sm_ssh_helper_remote -> sm_ssh_helper_remote: Start waiting \n loop
92+
activate sm_ssh_helper_remote
93+
note left sm_ssh_helper_remote
94+
sm-wait
95+
end note
96+
97+
note right dev
98+
sm-local-ssh-training connect <training_job_name>
99+
end note
100+
dev -> sm_ssh_helper_local: Connect
101+
sm_ssh_helper_local -> sm_ssh_helper_local: Get instance IDs
102+
sm_ssh_helper_local -> ssm: List and filter instances
103+
sm_ssh_helper_local -> sm_ssh_helper_local: Repeat until successful \n or timeout
104+
105+
note right sm_ssh_helper_local
106+
mi-01234567890abcdef, ...
107+
end note
108+
109+
activate sm_ssh_helper_local
110+
note right sm_ssh_helper_local
111+
sm-connect-ssh-proxy mi-01234567890abcdef
112+
end note
113+
sm_ssh_helper_local -> sm_ssh_helper_local: Generate SSH key pair
114+
sm_ssh_helper_local -> ssm: Copy SSH public key through S3
115+
116+
ssm -> iam: Check SendCommand \n permissions
117+
ssm -> ssm_agent: Run command
118+
ssm_agent -> ssm_agent: Copy key from S3
119+
sm_ssh_helper_local -> ssm: Start SSH session proxy \n over SSM
120+
121+
ssm -> iam: Check StartSession \n permissions
122+
123+
ssm -> ssm_agent: Start SSH session
124+
ssm_agent -> ssm_agent: Start SSH proxy tunnel
125+
ssm_agent --> sm_ssh_helper_local:
126+
sm_ssh_helper_local -> ssm_agent: Start SSH port forwarding \n over SSM proxy tunnel
127+
ssm_agent -> ssh: Start SSH proxy \n tunnel session
128+
activate ssh
129+
note right dev
130+
ssh sagemaker-training
131+
end note
132+
dev -> ssh: Connect with SSH through forwarded SSH port
133+
ssh --> dev:
134+
135+
dev -> vpc: Run commands inside a container \n (before training)
136+
137+
note right dev
138+
sm-local-ssh-training stop-waiting
139+
end note
140+
141+
dev -> vpc: Stop waiting
142+
note right vpc
143+
sm-wait stop
144+
end note
145+
deactivate sm_ssh_helper_remote
146+
147+
vpc -> vpc: Training begins
148+
deactivate vpc
149+
150+
deactivate sm_ssh_helper_remote
151+
152+
dev -> vpc: Run commands inside a container \n (during training)
153+
154+
...Training is in progress...
155+
dev -> vpc: Stop training or wait until finished
156+
157+
deactivate ssh
158+
159+
deactivate ssh
160+
deactivate ssm_agent
161+
deactivate vpc
162+
163+
vpc --> sagemaker: Job is finished
164+
deactivate vpc
165+
deactivate sm_ssh_helper_local
166+
167+
@enduml
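
In the diagram, the local side gets instance IDs by listing and filtering SSM instances and repeating "until successful or timeout". The sketch below shows one way such a polling loop could look. All names (`wait_for_instance_ids`, `list_instance_ids`, the timeout and poll defaults) are hypothetical and do not come from the library. The ID pattern is an assumption based on the `mi-01234567890abcdef` placeholder in the diagram.

```python
import re
import time

# Assumption: managed instance IDs look like "mi-" followed by 17 hex digits,
# as in the diagram's placeholder "mi-01234567890abcdef".
MANAGED_INSTANCE_ID = re.compile(r"^mi-[0-9a-f]{17}$")


def wait_for_instance_ids(list_instance_ids, timeout_seconds=300.0,
                          poll_seconds=10.0, sleep=time.sleep):
    """Poll `list_instance_ids()` until it yields managed instance IDs.

    `list_instance_ids` is a caller-supplied callable (hypothetical) that
    returns an iterable of ID strings, e.g. a wrapper around an SSM query.
    Raises TimeoutError if nothing matches before the deadline.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        # Keep only IDs that look like SSM managed instances.
        ids = [i for i in list_instance_ids() if MANAGED_INSTANCE_ID.match(i)]
        if ids:
            return ids
        if time.monotonic() >= deadline:
            raise TimeoutError("no managed instances appeared before the timeout")
        sleep(poll_seconds)


if __name__ == "__main__":
    # Simulated lister: the container registers with SSM on the third poll.
    responses = iter([[], [], ["mi-01234567890abcdef"]])
    found = wait_for_instance_ids(lambda: next(responses), sleep=lambda s: None)
    print(found)
```

Passing the lister and `sleep` in as arguments keeps the loop independent of any AWS client and makes it easy to exercise without network access.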
