logging.info(f"To connect over SSM run: aws ssm start-session --target {instance_ids[0]}")
+logging.info(f"To connect over SSH run: sm-local-ssh-training connect {ssh_wrapper.latest_training_job_name()}")
```
*Note:* `connection_wait_time_seconds` is the amount of time the SSH helper will wait inside SageMaker before it continues normal execution. It's useful for training jobs, when you want to connect before training starts.
@@ -138,7 +145,7 @@ and will appear in the job's CloudWatch log like this:
Successfully registered the instance with AWS SSM using Managed instance-id: mi-1234567890abcdef0
```
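For illustration only, here is a minimal sketch of what extracting such an ID from raw log text could look like. It is not how `ssh_wrapper` works internally; the function name `extract_managed_instance_ids` is hypothetical, and the pattern (`mi-` followed by 17 hexadecimal characters) is an assumption based on the sample line above.

```python
import re
from typing import List

# Assumption: managed instance IDs look like the sample in the log line,
# i.e. "mi-" followed by 17 lowercase hexadecimal characters.
MANAGED_INSTANCE_ID = re.compile(r"\bmi-[0-9a-f]{17}\b")


def extract_managed_instance_ids(log_text: str) -> List[str]:
    """Return the unique managed instance IDs found in log_text, in order of first appearance."""
    unique_ids = {}
    for match in MANAGED_INSTANCE_ID.findall(log_text):
        # A dict keeps insertion order, so duplicates are dropped without reordering.
        unique_ids.setdefault(match, None)
    return list(unique_ids)


sample = (
    "Successfully registered the instance with AWS SSM "
    "using Managed instance-id: mi-1234567890abcdef0"
)
print(extract_managed_instance_ids(sample))
```

In practice the `ssh_wrapper` method is the more robust choice, since it does not depend on the exact wording of the log message.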
-To fetch the instance IDs from the logs in an automated way, call the Python method of `ssh_wrapper`,
+To fetch the instance IDs in an automated way, call the Python method of `ssh_wrapper`,
as mentioned in the previous step:
```python
@@ -218,21 +225,29 @@ Adding SageMaker SSH Helper to inference endpoint is similar to training with th
1. Wrap your model into `SSHModelWrapper` before calling `deploy()` and add SSH Helper to `dependencies`:
```python
+from sagemaker import Predictor
from sagemaker_ssh_helper.wrapper import SSHModelWrapper # <--NEW--
estimator = ...
...
endpoint_name = ...
-model = estimator.create_model(entry_point='inference.py',
@@ -561,7 +576,7 @@ Note, that if you stop the waiting loop, SageMaker will run your training script
But there's a useful trick: submit a dummy script `train_placeholder.py` with the infinite loop, and while this loop will be running, you can
run your real training script again and again with the remote interpreter.
-Setting `max_run` parameter of the estimator is highly recommended in this case.
+Setting `max_run` parameter of the estimator is highly recommended in this case.
The dummy script may look like this:
@@ -581,6 +596,8 @@ The method `is_last_session_timeout()` will help to prevent unused resources and
Keep in mind that SSM sessions will [terminate automatically due to user inactivity](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-preferences-timeout.html), but SSH sessions will keep running until either a user terminates them or network timeout occurs (e.g., when local machine hibernates).
+Consider also sending e-mail notifications for users of the long-running jobs, so the users don't forget to shut down unused resources.
+
Make also sure that you're aware of [SageMaker Managed Warm Pools](https://docs.aws.amazon.com/sagemaker/latest/dg/train-warm-pools.html)
feature, which is also helpful in the scenario when you need to rerun your code multiple times.
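
As a rough sketch of how the two settings might be combined, the snippet below builds keyword arguments for a SageMaker Python SDK estimator. It assumes the SDK's `max_run` and `keep_alive_period_in_seconds` estimator parameters; the helper name `warm_pool_estimator_kwargs` and the durations are hypothetical and chosen only for illustration.

```python
from typing import Dict


def warm_pool_estimator_kwargs(max_run_seconds: int, keep_alive_seconds: int) -> Dict[str, int]:
    """Build estimator keyword arguments that cap job runtime and request a warm pool."""
    if max_run_seconds <= 0:
        raise ValueError("max_run_seconds must be positive")
    if keep_alive_seconds < 0:
        raise ValueError("keep_alive_seconds must not be negative")
    return {
        # Upper bound on how long the training job may run, in seconds.
        "max_run": max_run_seconds,
        # How long the instances are retained after the job ends, in seconds.
        "keep_alive_period_in_seconds": keep_alive_seconds,
    }


# Illustrative values: a 4-hour cap on the job and a 30-minute warm pool.
kwargs = warm_pool_estimator_kwargs(max_run_seconds=4 * 60 * 60, keep_alive_seconds=30 * 60)
print(kwargs)
```

The resulting dictionary could then be unpacked into the estimator constructor alongside the usual arguments, for example `Estimator(..., **kwargs)`.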