|
43 | 43 | "## Monitoring the cifar10 training\n", |
44 | 44 | "Find your previous (cifar10_keras_sm) training job in the SageMaker console.\n", |
45 | 45 | "Open the job details and look at the job's CloudWatch logs. \n", |
46 | | - "Configure the metrics regex that fits the logs. use only regex tools to check your expressions, use () to catch each matric" |
| 46 | + "Configure the metrics regex to fit the logs. Use a regex tool to check your expressions, and use () to capture each metric.\n", |
| 47 | + "One possible solution is shown below." |
47 | 48 | ] |
48 | 49 | }, |
49 | 50 | { |
|
53 | 54 | "outputs": [], |
54 | 55 | "source": [ |
55 | 56 | "metric_definitions = [\n", |
56 | | - " {'Name': 'train:loss', 'Regex': ''},\n", |
57 | | - " {'Name': 'train:accuracy', 'Regex': ''},\n", |
58 | | - " {'Name': 'validation:accuracy', 'Regex': ''},\n", |
59 | | - " {'Name': 'validation:loss', 'Regex': ''},\n", |
| 57 | + " {'Name': 'train:loss', 'Regex': 'loss: ([0-9\\\\.]+) - acc: [0-9\\\\.]+'},\n", |
| 58 | + " {'Name': 'train:accuracy', 'Regex': 'loss: [0-9\\\\.]+ - acc: ([0-9\\\\.]+)'},\n", |
| 59 | + " {'Name': 'validation:accuracy', 'Regex': 'val_loss: [0-9\\\\.]+ - val_acc: ([0-9\\\\.]+)'},\n", |
| 60 | + " {'Name': 'validation:loss', 'Regex': 'val_loss: ([0-9\\\\.]+) - val_acc: [0-9\\\\.]+'},\n", |
60 | 61 | "]" |
61 | 62 | ] |
62 | 63 | }, |
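The regexes added in this hunk can be checked locally before launching a job. The snippet below is a minimal sketch, assuming that Python's `re` module behaves closely enough to SageMaker's metric matching for these simple patterns. The `sample_log` line and the `extract_metrics` helper are hypothetical, written only to mimic the Keras output format that the patterns expect. In plain Python source, the notebook JSON's `\\\\` becomes `\\`.

```python
import re

# Hypothetical log line, shaped like the Keras epoch summary the regexes target.
sample_log = "156/156 - 45s - loss: 1.5432 - acc: 0.4412 - val_loss: 1.3021 - val_acc: 0.5310"

# Same patterns as in the diff, unescaped from notebook JSON into Python source.
metric_definitions = [
    {'Name': 'train:loss', 'Regex': 'loss: ([0-9\\.]+) - acc: [0-9\\.]+'},
    {'Name': 'train:accuracy', 'Regex': 'loss: [0-9\\.]+ - acc: ([0-9\\.]+)'},
    {'Name': 'validation:accuracy', 'Regex': 'val_loss: [0-9\\.]+ - val_acc: ([0-9\\.]+)'},
    {'Name': 'validation:loss', 'Regex': 'val_loss: ([0-9\\.]+) - val_acc: [0-9\\.]+'},
]


def extract_metrics(line, definitions):
    """Return {metric name: captured value} for every pattern that matches the line."""
    found = {}
    for definition in definitions:
        match = re.search(definition['Regex'], line)
        if match:
            # Group 1 is the single () capture in each pattern.
            found[definition['Name']] = float(match.group(1))
    return found


metrics = extract_metrics(sample_log, metric_definitions)
print(metrics)
```

Each pattern carries exactly one capture group, so `match.group(1)` is always the metric value. The surrounding text (`- acc:`, `- val_acc:`) only anchors the match to the right field.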
|
140 | 141 | "outputs": [], |
141 | 142 | "source": [ |
142 | 143 | "from sagemaker.tensorflow import TensorFlow\n", |
143 | | - "estimator = ..." |
| 144 | + "estimator = ... # Make sure you use the metric_definitions=metric_definitions argument." |
144 | 145 | ] |
145 | 146 | }, |
146 | 147 | { |
|
211 | 212 | "In the next section we'll update the script to save TensorBoard logs. \n", |
212 | 213 | "We'll be able to use TensorBoard for monitoring our jobs in real time. \n", |
213 | 214 | "\n", |
214 | | - "Update your cifar10-keras-sm.py script to send logs to TensorBoard.\n", |
215 | | - "You can use the `from keras.callbacks import TensorBoard` import.\n", |
| 215 | + "Update your cifar10-keras-sm.py script to send logs to TensorBoard. \n", |
| 216 | + "Add the `from keras.callbacks import TensorBoard` import.\n", |
216 | 217 | "\n", |
217 | | - "Keras will send TensorBoard logs in each batch. sending the logs to S3 will slow down the trainig job, change the TensorBoard callback to send the logs only at the end of an epoch.\n", |
| 218 | + "Keras will send TensorBoard logs on every batch. Sending the logs to S3 that often will slow down the training job, so change the TensorBoard callback to send the logs only at the end of an epoch.\n", |
218 | 219 | "\n", |
219 | 220 | "Add the TensorBoard callback to your script (add this line after the ModelCheckpoint callback)\n", |
220 | 221 | "```python\n", |
|