
MNIST model CPU training broken in TF v2.7 (conda_tensorflow2_p38 kernel on NBI ALv2 JLv3) #28

@athewsey

Description

The current conda_tensorflow2_p38 kernel on the latest SageMaker Notebook Instance platform (notebook-al2-v2, as used in the CFn template) seems to break local CPU-only training for the MNIST migration challenge.

In this environment (TF v2.7.1, TF.Keras v2.7.0), tensorflow.keras.backend.image_data_format() returns channels_first, but training fails because the MaxPoolingOp only supports channels_last (NHWC) on CPU, per the error message below:

InvalidArgumentError:  Default MaxPoolingOp only supports NHWC on device type CPU
	 [[node sequential/max_pooling2d/MaxPool
 (defined at /home/ec2-user/anaconda3/envs/tensorflow2_p38/lib/python3.8/site-packages/keras/layers/pooling.py:357)
]] [Op:__inference_train_function_862]

Errors may have originated from an input operation.
Input Source operations connected to node sequential/max_pooling2d/MaxPool:
In[0] sequential/conv2d_1/Relu (defined at /home/ec2-user/anaconda3/envs/tensorflow2_p38/lib/python3.8/site-packages/keras/backend.py:4867)
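
For context, the snippet below is a minimal sketch of the same class of failure (not taken from the notebook; layer sizes and names are illustrative): a channels_first conv/pool stack trained on a CPU-only instance.

```python
# Minimal sketch (illustrative, not the notebook code) of channels_first training on CPU.
import numpy as np
import tensorflow as tf

tf.keras.backend.set_image_data_format("channels_first")  # mimic the reported kernel behaviour

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(1, 28, 28)),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),  # NCHW pooling has no default CPU kernel
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

x = np.random.rand(8, 1, 28, 28).astype("float32")  # (batch, channels, height, width)
y = np.random.randint(0, 10, size=(8,))
# On a CPU-only instance this raises InvalidArgumentError; in the kernel above it
# surfaces at MaxPool ("Default MaxPoolingOp only supports NHWC"), though the exact
# op reporting it can vary with the TF build (e.g. oneDNN vs stock kernels).
model.fit(x, y, epochs=1)
```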

Overriding the image_data_format() check (in "Pre-Process the Data for our CNN") to prepare the data in a different shape does not work either, because the model is then incompatible with it (training raises a ValueError in conv2d_2).
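
A possible workaround (an untested assumption on my part) might be to force channels_last globally before both the preprocessing cell and the model definition, so the data layout and the layers' default data_format agree and the CPU MaxPoolingOp gets the NHWC input it supports:

```python
# Possible workaround sketch (untested assumption, not from the notebook): force
# channels_last before "Pre-Process the Data for our CNN" and before building the model.
import tensorflow as tf

tf.keras.backend.set_image_data_format("channels_last")

# From here on, image_data_format() returns "channels_last", so the existing shape
# check would prepare MNIST as (N, 28, 28, 1) and Conv2D/MaxPooling2D default to
# NHWC, which the CPU MaxPoolingOp supports.
assert tf.keras.backend.image_data_format() == "channels_last"
```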

Training still seems to work fine in the current SMStudio kernel (TensorFlow v2.3.2, TF.Keras v2.4.0).

Labels: bug (Something isn't working), help wanted (Extra attention is needed)
