Commit e204f1a

#4 - Notebook instance support + moving to SSMManager
1 parent 283194b commit e204f1a

16 files changed: +859 -49 lines changed

.idea/deployment.xml

Lines changed: 7 additions & 0 deletions

FAQ.md

Lines changed: 57 additions & 4 deletions
@@ -36,6 +36,12 @@ The scripts like `sm-local-ssh-ide` and `sm-local-ssh-training` will now work fr
 Git Bash session under a regular user, and you may continue to work in your local IDE
 on Windows as usual.
 
+### Are SageMaker notebook instances supported?
+
+Yes, the setup is similar to SageMaker Studio. Run [SageMaker_SSH_Notebook.ipynb](SageMaker_SSH_Notebook.ipynb) on the notebook instance and `sm-local-ssh-notebook connect <<notebook-instance-name>>` on your local machine.
+
+Review the instructions for [SageMaker Studio integration with PyCharm / VSCode](README.md#studio) for the rest of the details.
+
 ### How do you start the SSM session without knowing EC2 instance or container ID?
 
 Indeed, when you run a SageMaker job, there are no EC2 instances or generic containers visible in AWS console, because the instances and containers are managed by the SageMaker service.
@@ -69,7 +75,7 @@ Yes, requires adding same IAM permissions to SageMaker role as described in the
 
 ### How SageMaker SSH Helper protects users from impersonating each other?
 
-This logic is enforced by IAM policy. See the step 3b in [IAM_SSM_Setup.md](https://github.com/aws-samples/sagemaker-ssh-helper/blob/main/IAM_SSM_Setup.md)
+This logic is enforced by IAM policy. See the manual step 3 in [IAM_SSM_Setup.md](IAM_SSM_Setup.md#manual-setup)
 for a policy example.
 
 It works as follows: the SageMaker SSH Helper assigns on behalf of the user the tag `SSHOwner`
@@ -81,7 +87,7 @@ When a user attempts to connect to an instance, IAM will authorize the user base
 on their ID and the value of the `SSHOwner` tag. The user will be denied access to the instance
 if the instance doesn't belong to them.
 
-Another important part of it is the IAM policy with `ssm:AddTagsToResource` action, described in the step 1d.
+Another important part of it is the IAM policy with `ssm:AddTagsToResource` action, described in the manual step 2.
 Limiting this action only to SageMaker role as a resource will allow adding and updating tags only for
 the newly created activations (instances) and not for existing ones that may already belong to other users.
 
@@ -115,6 +121,8 @@ In this case, make sure that SageMaker SSH Helper is installed in your `Dockerfi
 RUN pip --no-cache-dir install sagemaker-ssh-helper # <--NEW--
 ```
 
+**Important:** Make sure that the version installed into the container matches the version of the library on your local machine.
+
 The code for running estimators and inference will look like this:
 ```python
 from sagemaker.estimator import Estimator
@@ -187,6 +195,51 @@ predicted_value = predictor.predict(data=..., target_model=model_path)
 
 See [#7](https://github.com/aws-samples/sagemaker-ssh-helper/issues/7) for this request.
 
+### What if I want to use an estimator in a hyperparameter tuning job (HPO) and connect to a stuck training job with SSM?
+
+In this case, `wrapper.get_instance_ids()` won't really work because you don't call `fit()` directly on the estimator and SSH Helper does not understand what training job you are trying to connect to.
+
+You should use extra lower-level APIs to fetch the name of the training job you are interested in first, and then either use `SSMManager` (recommended) or `SSHLog` (slower) to fetch its instance IDs from the code:
+
+```python
+import time
+
+from sagemaker.mxnet import MXNet
+from sagemaker.tuner import HyperparameterTuner
+
+from sagemaker_ssh_helper.manager import SSMManager
+from sagemaker_ssh_helper.wrapper import SSHEstimatorWrapper
+
+estimator = MXNet(...)
+
+_ = SSHEstimatorWrapper.create(estimator, connection_wait_time_seconds=0)
+
+objective_metric_name = ...
+hyperparameter_ranges = ...
+metric_definitions = ...
+
+tuner = HyperparameterTuner(estimator,
+                            objective_metric_name,
+                            hyperparameter_ranges,
+                            metric_definitions,
+                            ...
+                            )
+
+tuner.fit(wait=False)
+
+time.sleep(15) # allow training jobs to start
+
+analytics = tuner.analytics()
+training_jobs = analytics.training_job_summaries()
+training_job_name = training_jobs[0]['TrainingJobName']
+
+instance_ids = SSMManager().get_training_instance_ids(training_job_name, 300)
+
+print(f'To connect over SSM run: aws ssm start-session --target {instance_ids[0]}')
+```
+
+*Note:* If you want to connect to a stuck training job from the command line with SSH, use `sm-local-ssh-training` script, as for any other regular training job.
+
 ### How to start a job with SageMaker SSH Helper in an AWS Region different from my default one?
 
 Define the SSH wrapper as usual, e.g.:
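
Editorial note on the HPO hunk above: the fixed `time.sleep(15)` is only a guess at how long the first training job takes to appear. A minimal sketch of a polling alternative follows. `wait_for_first_training_job` is a hypothetical helper, not part of SageMaker SSH Helper or the SageMaker SDK; it assumes only that the supplied callable returns a list of dicts with a `TrainingJobName` key, as `training_job_summaries()` does in the committed example.

```python
import time
from typing import Callable, Dict, List, Optional


def wait_for_first_training_job(
    list_summaries: Callable[[], List[Dict]],
    timeout_seconds: float = 300.0,
    poll_seconds: float = 5.0,
) -> Optional[str]:
    """Poll until a training job is listed; return its name, or None on timeout.

    Hypothetical helper for illustration, not part of SageMaker SSH Helper.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        summaries = list_summaries()
        if summaries:
            # Same lookup as in the committed example: first job in the list.
            return summaries[0]['TrainingJobName']
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_seconds)


# Assumed usage with the tuner from the committed example:
# training_job_name = wait_for_first_training_job(
#     lambda: tuner.analytics().training_job_summaries()
# )
```

Passing the listing call as a callable keeps the helper independent of the SDK, so it can be exercised with a stub before running against a real tuning job.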
@@ -239,7 +292,7 @@ AWS_PROFILE=<<profile_name>> sm-local-ssh-ide <<kernel_gateway_app_name>>
 There’s plenty of methods already available for you to automate everything.
 Take a look at the [end-to-end automated tests](https://github.com/aws-samples/sagemaker-ssh-helper/blob/main/tests/test_end_to_end.py) as an example.
 
-There's `get_instance_ids()` method already mentioned in the documentation. Underlying automation methods are available in the [SSHLog class](https://github.com/aws-samples/sagemaker-ssh-helper/blob/main/sagemaker_ssh_helper/log.py).
+There's `get_instance_ids()` method already mentioned in the documentation. Underlying automation methods are available in the [SSMManager class](https://github.com/aws-samples/sagemaker-ssh-helper/blob/main/sagemaker_ssh_helper/manager.py) and the [SSHLog class](https://github.com/aws-samples/sagemaker-ssh-helper/blob/main/sagemaker_ssh_helper/log.py).
 
 Also check the method `start_ssm_connection_and_continue()` from the [SSHEnvironmentWrapper class](https://github.com/aws-samples/sagemaker-ssh-helper/blob/main/sagemaker_ssh_helper/wrapper.py) - it automates creating the SSH tunnel, running remote commands and stopping the waiting loop as well as graceful disconnect. Underlying implementation is in the [SSMProxy class](https://github.com/aws-samples/sagemaker-ssh-helper/blob/main/sagemaker_ssh_helper/proxy.py).
 
@@ -277,7 +330,7 @@ An error occurred (BadRequest) when calling the StartSession operation: Enable a
 ```
 
 First, check that your instance shows as an advanced instance in Fleet Manager.
-If it doesn't show up there, you've probably missed the step "2h" in [IAM_SSM_Setup.md](IAM_SSM_Setup.md).
+If it doesn't show up there, you've probably missed the manual step 1 in [IAM_SSM_Setup.md](IAM_SSM_Setup.md#manual-setup).
 
 Also check that you're connecting from the same AWS region. Run the following command on your local machine and check that the region is the same as in your AWS console:
 ```shell
