Commit 466df17

Adding sessions timeout to the placeholder script + describing port forwarding in README
1 parent ebdcfd4 commit 466df17

4 files changed: +80 -18 lines changed

README.md

Lines changed: 37 additions & 5 deletions
@@ -15,7 +15,7 @@ like nvidia-smi, or iteratively fix and re-execute your training script within s
 PyCharm Professional Edition or Visual Studio Code.
 3. Port forwarding to access diagnostic tools running inside SageMaker, e.g., Dask dashboard, TensorBoard or Spark Web UI.
 
-Other scenarios include but not limited to connecting to a remote Jupyter Notebook in SageMaker Studio from your IDE, connect with your browser to a TensorBoard process running in the cloud, or start a VNC session to SageMaker Studio to run GUI apps.
+Other scenarios include but not limited to connecting to a remote Jupyter Notebook in SageMaker Studio from your IDE, or start a VNC session to SageMaker Studio to run GUI apps.
 
 Also see our [Frequently Asked Questions](FAQ.md), especially if you're using Windows on your local machine.
 
@@ -49,6 +49,7 @@ monitor resources, produce thread-dumps for stuck jobs, and interactively run yo
 - [Connecting to SageMaker inference endpoints with SSM](#inference)
 - [Connecting to SageMaker batch transform jobs](#batch-transform)
 - [Connecting to SageMaker processing jobs](#processing)
+- [Forwarding TCP ports over SSH tunnel](#port-forwarding) - to access remote apps like Dask or Streamlit
 - [Remote debugging with PyCharm Debug Server over SSH](#pycharm-debug-server) - let SageMaker run your code that connects to PyCharm, to start line-by-line debugging with [PyDev.Debugger](https://pypi.org/project/pydevd-pycharm/), a.k.a. pydevd
 - [Remote code execution with PyCharm / VSCode over SSH](#remote-interpreter) - let PyCharm run or debug your code line-by-line inside SageMaker container with SSH interpreter
 - [Local IDE integration with SageMaker Studio over SSH for PyCharm / VSCode](#studio) - iterate fast on a single node at early stages of development without submitting SageMaker jobs
@@ -424,6 +425,31 @@ import sagemaker_ssh_helper
 sagemaker_ssh_helper.setup_and_start_ssh()
 ```
 
+## <a name="port-forwarding"></a>Forwarding TCP ports over SSH tunnel
+
+Previous sections focused on connecting to non-interactive SageMaker containers with SSM.
+
+Next sections rely on the Session Manager capability to create an SSH tunnel over SSM connection. SageMaker SSH Helper in turn runs SSH session over SSH tunnel and forwards the ports, including the SSH server port 22 itself.
+
+The helper script behind this logic is `sm-local-start-ssh`:
+
+```shell
+sm-local-start-ssh "$INSTANCE_ID" \
+    -R localhost:12345:localhost:12345 \
+    -L localhost:8787:localhost:8787 \
+    -L localhost:11022:localhost:22
+```
+
+You can pass `-L` parameters for forwarding remote container port to local machine (e.g., `8787` for [Dask dashboard](https://docs.dask.org/en/stable/dashboard.html) or `8501` for [Streamlit apps](https://docs.streamlit.io/library/get-started)) or `-R` for forwarding local port to remote container. Read more about these options in the [SSH manual](https://man.openbsd.org/ssh).
+
+This low-level script takes the managed instance ID as a parameter. The next sections describe how to use the higher-level APIs that take the SageMaker resource name as a parameter and resolve it into the instance ID automatically (a.k.a. `sm-local-ssh-*` scripts):
+
+* `sm-local-ssh-training`
+* `sm-local-ssh-processing`
+* `sm-local-ssh-inference`
+* `sm-local-ssh-transform`
+* `sm-local-ssh-ide`
+
 ## <a name="pycharm-debug-server"></a>Remote debugging with PyCharm Debug Server over SSH
 
 This procedure uses PyCharm's Professional feature: [Remote debugging with the Python remote debug server configuration](https://www.jetbrains.com/help/pycharm/remote-debugging-with-product.html#remote-debug-config)
@@ -540,16 +566,22 @@ The dummy script may look like this:
 
 ```python
 import time
+from datetime import timedelta
 
-import sagemaker_ssh_helper
-sagemaker_ssh_helper.setup_and_start_ssh()
+from sagemaker_ssh_helper import setup_and_start_ssh, is_last_session_timeout
 
-while True:
+setup_and_start_ssh()
+
+while not is_last_session_timeout(timedelta(minutes=30)):
     time.sleep(10)
 ```
 
+The method `is_last_session_timeout()` helps to prevent idle resources: the job will end if there were no SSM or SSH sessions for the specified period of time.
+
+Keep in mind that SSM sessions will [terminate automatically due to user inactivity](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-preferences-timeout.html), but SSH sessions will keep running until either a user terminates them or network timeout occurs (e.g., when local machine hibernates).
+
 Make also sure that you're aware of [SageMaker Managed Warm Pools](https://docs.aws.amazon.com/sagemaker/latest/dg/train-warm-pools.html)
-feature, which is also helpful in such a scenario.
+feature, which is also helpful in the scenario when you need to rerun your code multiple times.
 
 *Pro Tip:* Note that you can debug your code line by line in this scenario, too! See [the tutorial in PyCharm documentation](https://www.jetbrains.com/help/pycharm/debugging-your-first-python-application.html#debug). Some users might prefer this option instead of using Debug Server as a simpler alternative.
 
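
The `-L` and `-R` options shown in the README change follow the OpenSSH `bind_address:port:host:hostport` form. As a rough illustration only, the same command line could be assembled programmatically as below. The sketch assumes the `sm-local-start-ssh` script from the diff is on `PATH`; the names `Forward` and `build_start_ssh_command` and the instance ID are hypothetical and are not part of SageMaker SSH Helper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Forward:
    """One forwarding rule (hypothetical helper, not part of the library)."""
    flag: str          # "-L": remote port -> local machine, "-R": local port -> remote
    listen_port: int   # port opened on the listening side
    target_port: int   # port the traffic is delivered to
    host: str = "localhost"

    def to_args(self) -> List[str]:
        if self.flag not in ("-L", "-R"):
            raise ValueError(f"unsupported flag: {self.flag}")
        spec = f"{self.host}:{self.listen_port}:{self.host}:{self.target_port}"
        return [self.flag, spec]


def build_start_ssh_command(instance_id: str, forwards: List[Forward]) -> List[str]:
    """Assemble argv for sm-local-start-ssh; nothing is executed here."""
    cmd = ["sm-local-start-ssh", instance_id]
    for fwd in forwards:
        cmd.extend(fwd.to_args())
    return cmd


command = build_start_ssh_command(
    "mi-0123456789abcdef0",  # placeholder managed instance ID (assumption)
    [Forward("-R", 12345, 12345), Forward("-L", 8787, 8787), Forward("-L", 11022, 22)],
)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run()` as is, which avoids shell quoting issues around the instance ID.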

sagemaker_ssh_helper/__init__.py

Lines changed: 35 additions & 8 deletions
@@ -1,24 +1,51 @@
+import subprocess
+import os
+from datetime import datetime, timedelta
+
 import sagemaker_ssh_helper.env
 
+sagemaker_ssh_helper.last_session_time = datetime.now()
 
-def setup_and_start_ssh():
-    import subprocess
-    import os
 
+def setup_and_start_ssh():
     if "START_SSH" not in os.environ:
-        print("WARNING: SageMaker SSH Helper is not correctly initialized. "
+        print("[sagemaker-ssh-helper] WARNING: SageMaker SSH Helper is not correctly initialized. "
               "Did you forget to call wrapper.create() _before_ fit() / run() / transform() / deploy()?")
 
     ssh_instance_count = int(os.environ.get("SSH_INSTANCE_COUNT", "1"))
     node_rank = sagemaker_ssh_helper.env.sm_get_node_rank()
     start_ssh = os.environ.get("START_SSH", "false")
 
-    print(f"SSH Helper startup params: start_ssh={start_ssh}, ssh_instance_count={ssh_instance_count},"
-          f" node_rank={node_rank}")
+    print(f"[sagemaker-ssh-helper] SSH Helper startup params: start_ssh={start_ssh}, "
+          f"ssh_instance_count={ssh_instance_count}, node_rank={node_rank}")
 
+    script = sagemaker_ssh_helper.env.get_caller_script_name(2)
     if start_ssh == "true" and node_rank < ssh_instance_count:
-        print(f"Starting SSH Helper setup")
+        print(f"[sagemaker-ssh-helper] Starting SSH Helper setup from {script}")
         absolute_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "sm-setup-ssh")
         subprocess.check_call(["bash", absolute_path])  # nosec B607  # absolute path is calculated
     else:
-        print(f"Skipping SSH Helper setup")
+        print(f"[sagemaker-ssh-helper] Skipping SSH Helper setup from {script}")
+
+
+def is_last_session_timeout(time_delta: timedelta):
+    args = ["pgrep", "-f", "ssm-session-worker"]
+    try:
+        out = subprocess.check_output(args)
+        worker_pids = list(map(int, out.splitlines()))
+    except subprocess.CalledProcessError:
+        worker_pids = []
+    print(f"[sagemaker-ssh-helper] Number of open sessions: {len(worker_pids)}")
+    if worker_pids:
+        sagemaker_ssh_helper.last_session_time = datetime.now()
+        timeout = False
+    else:
+        time_left = time_delta - (datetime.now() - sagemaker_ssh_helper.last_session_time)
+        time_str = str(time_left).split(".")[0]
+        timeout = (time_left <= timedelta(seconds=0))
+        if not timeout:
+            print(f"[sagemaker-ssh-helper] Time left before timeout: {time_str}")
+        else:
+            print(f"[sagemaker-ssh-helper] Sessions timeout!")
+
+    return timeout
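
The decision logic of `is_last_session_timeout()` above can be read separately from the `pgrep` call: any open session resets the idle clock, and the timeout fires once the idle time reaches the limit. The following is a simplified sketch of that logic with the clock and session count passed in explicitly, which makes it easy to try out. The `IdleTimer` class is hypothetical and is not part of the library.

```python
from datetime import datetime, timedelta


class IdleTimer:
    """Tracks the time since a session was last seen (hypothetical helper)."""

    def __init__(self, now: datetime):
        self.last_session_time = now

    def is_timed_out(self, open_sessions: int, limit: timedelta, now: datetime) -> bool:
        if open_sessions > 0:
            # Any open session resets the idle clock.
            self.last_session_time = now
            return False
        # Same condition as `time_left <= 0` in the diff above.
        return now - self.last_session_time >= limit


start = datetime(2023, 1, 1, 12, 0, 0)
limit = timedelta(minutes=30)
timer = IdleTimer(start)

print(timer.is_timed_out(1, limit, start + timedelta(minutes=10)))  # False: session open
print(timer.is_timed_out(0, limit, start + timedelta(minutes=39)))  # False: idle 29 min
print(timer.is_timed_out(0, limit, start + timedelta(minutes=40)))  # True: idle 30 min
```

Note that in the committed code the idle clock starts when the package is imported, so a job that nobody ever connects to also ends after the given period.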

sagemaker_ssh_helper/sm-start-ssh

Lines changed: 3 additions & 2 deletions
@@ -8,8 +8,6 @@ source "$dir"/sm-helper-functions
 
 _install_helper_scripts
 
-set -v
-
 # Log IP addresses of the container (useful only in training in combination with VPC + VPN)
 echo "SSH Helper Log IP: $(hostname -I)"
 
@@ -31,6 +29,9 @@ sm-save-env
 # Dump container bootstrap environment (PID 1) - can be different from above, useful for debugging
 ps wwwe -p 1 | tail -1
 
+sed -i -e 's~^ClientAliveInterval~#ClientAliveInterval~' /etc/ssh/sshd_config
+echo "ClientAliveInterval 15" >> /etc/ssh/sshd_config
+
 sed -i -e 's~^PermitRootLogin~#PermitRootLogin~' /etc/ssh/sshd_config
 echo PermitRootLogin yes >> /etc/ssh/sshd_config
 
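
The `sed` + `echo` pair above comments out any active `ClientAliveInterval` line and then appends a new one. Commenting out matters because `sshd` uses the first value it finds for a keyword, so a bare append after an existing setting would likely be ignored. As an illustration only, the same text transformation could look like this in Python; the function `override_option` is a hypothetical name, and it works on a string rather than on `/etc/ssh/sshd_config` itself.

```python
def override_option(config_text: str, key: str, value: str) -> str:
    """Comment out active lines that start with `key`, then append `key value`."""
    lines = []
    for line in config_text.splitlines():
        if line.startswith(key):
            # Mirrors sed 's~^KEY~#KEY~': only lines beginning with the key match.
            line = "#" + line
        lines.append(line)
    lines.append(f"{key} {value}")
    return "\n".join(lines) + "\n"


sample = "Port 22\nClientAliveInterval 0\n#ClientAliveInterval 120\n"
print(override_option(sample, "ClientAliveInterval", "15"), end="")
```

With `ClientAliveInterval 15` the server probes a silent client after 15 seconds, which helps a dropped SSH connection get noticed on the server side.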

Lines changed: 5 additions & 3 deletions
@@ -1,7 +1,9 @@
 import time
+from datetime import timedelta
 
-import sagemaker_ssh_helper
-sagemaker_ssh_helper.setup_and_start_ssh()
+from sagemaker_ssh_helper import setup_and_start_ssh, is_last_session_timeout
 
-while True:
+setup_and_start_ssh()
+
+while not is_last_session_timeout(timedelta(minutes=30)):
     time.sleep(10)