[Bug] Keras misinterprets the output of the bias initializer in a Dense layer as a scalar instead of a vector #271
Description
I am using keras-mxnet==2.2.4.3 together with mxnet==1.5.1 (the issue also reproduces with mxnet==1.8.0).
When I build a simple model containing one dense layer with the following lines:
import os
os.environ["KERAS_BACKEND"] = "mxnet"  # must be set before importing keras
import keras
from keras import layers
img_input = layers.Input(shape=(10,))
x = layers.Dense(units=1, bias_initializer='RandomUniform')(img_input)
model = keras.models.Model(img_input, x)
Keras fails immediately with the following error:
Traceback (most recent call last):
File "try.py", line 29, in <module>
x = layers.Dense(units=1, bias_initializer='RandomUniform')(img_input)
File "keras/engine/base_layer.py", line 470, in __call__
output = self.call(inputs, **kwargs)
File "keras/layers/core.py", line 893, in call
output = K.bias_add(output, self.bias, data_format='channels_last')
File "keras/backend/mxnet_backend.py", line 95, in func_wrapper
train_symbol = func(*args, **kwargs)
File "keras/backend/mxnet_backend.py", line 3989, in bias_add
% (len(bias_shape), x_dim))
ValueError: MXNet Backend: Unexpected bias dimensions 0, expect to be 1 or 2 dimensions
If we change the number of output units to any value other than 1, the problem does not appear.
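For example, the same model builds without error when only the unit count is changed (same setup as the snippet above):

x = layers.Dense(units=2, bias_initializer='RandomUniform')(img_input)  # units != 1: no crash
model = keras.models.Model(img_input, x)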
I then checked the source code in detail and found that Keras misinterprets the output of sym = mx.sym.random.uniform(shape=shape, low=minval, high=maxval, dtype=dtype) as a scalar instead of a vector.
[Screenshot: keras/backend/mxnet_backend.py, the random_uniform implementation around line 601]
As shown in the figure above, before reaching line 601 the result ret is still a vector, but Keras apparently forgets to set the _is_vector attribute on it. Since the shape of ret is (1,), Keras then treats ret as a scalar instead of a vector.
This missing attribute ultimately causes the bias shape to degenerate to 0 dimensions:
[Screenshot: debugger output showing the bias with 0 dimensions]
Keras then raises the error at the assertion on line 3988.
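Based on this reading of the code, the missing step appears to be marking the wrapped (1,)-shaped result as a vector. A rough sketch of the kind of fix I mean (not the actual source; KerasSymbol and _is_vector are the internals described above, so names and details may differ):

# Hypothetical sketch of random_uniform in keras/backend/mxnet_backend.py;
# illustrative only, not the real implementation.
import mxnet as mx

def random_uniform(shape, minval=0.0, maxval=1.0, dtype=None):
    sym = mx.sym.random.uniform(shape=shape, low=minval, high=maxval, dtype=dtype)
    ret = KerasSymbol(sym)  # assumed wrapper class used by the backend
    if isinstance(shape, (list, tuple)) and len(shape) == 1:
        ret._is_vector = True  # the attribute the analysis above found missing
    return ret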
I also tested other bias initializers, including RandomNormal and TruncatedNormal; all three random initializers lead to the same crash. However, with constant initializers such as Ones and Zeros the problem does not occur, because Keras does not go through MXNet's symbolic computation in that case and the type of the bias is never misinterpreted.
[Screenshot: keras/backend/mxnet_backend.py around lines 254-255, the KerasSymbol branch]
As shown above, when the bias uses a constant initializer, the value is not wrapped as a KerasSymbol, so the branch at lines 254-255 is never hit and the problem does not occur.
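Consistent with this, pre-sampling the bias value eagerly and handing Keras a constant initializer is a possible user-side workaround, assuming Constant follows the same non-symbolic path as Ones/Zeros (I have not verified this beyond the analysis above):

import numpy as np
from keras import initializers

# Sample the bias eagerly with numpy (matching RandomUniform's default
# range of [-0.05, 0.05]) and pass it as a constant, which avoids the
# symbolic path that loses the _is_vector flag.
rand_bias = initializers.Constant(value=float(np.random.uniform(-0.05, 0.05)))
x = layers.Dense(units=1, bias_initializer=rand_bias)(img_input)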
Based on this finding, could you add an additional guard to the random bias initializers so that a (1,)-shaped vector is not mistaken for a scalar?
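For what it's worth, the guard I have in mind would simply restore the rank of a degenerate single-unit bias before the dimension check, along these lines (a hypothetical helper with names of my own, not existing backend code):

# Illustrative guard: restore (1,) for a single-unit bias whose shape
# information has collapsed to a scalar.
def normalize_bias_shape(bias_shape, units):
    if len(bias_shape) == 0 and units == 1:
        return (1,)
    return tuple(bias_shape)

print(normalize_bias_shape((), 1))    # -> (1,)
print(normalize_bias_shape((5,), 5))  # -> (5,)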