I have a question about the input and output (layer) of a DQN.
For example:
Two points: P1(x1, y1) and P2(x2, y2)
P1 has to walk towards P2
I have the following information:
Current position P1 (x/y)
Current position P2 (x/y)
Distance from P1 to P2 (x/y)
Direction from P1 to P2 (x/y)
P1 has 4 possible actions:
Up
Down
Left
Right
How do I have to set up the input and output layers?
4 input nodes
4 output nodes
Is that correct?
What do I have to do with the output?
I got 4 arrays with 4 values each as output.
Is doing argmax on the output correct?
Edit:
Input / State:
import math
import numpy as np

# Current position P1
state_pos = [x_POS, y_POS]
state_pos = np.asarray(state_pos, dtype=np.float32)
# Current position P2
state_wp = [wp_x, wp_y]
state_wp = np.asarray(state_wp, dtype=np.float32)
# Distance P1 - P2
state_dist_wp = [wp_x - x_POS, wp_y - y_POS]
state_dist_wp = np.asarray(state_dist_wp, dtype=np.float32)
# Direction P1 - P2
distance = [wp_x - x_POS, wp_y - y_POS]
norm = math.sqrt(distance[0] ** 2 + distance[1] ** 2)  # 0 when P1 and P2 coincide
state_direction_wp = [distance[0] / norm, distance[1] / norm]
state_direction_wp = np.asarray(state_direction_wp, dtype=np.float32)
state = [state_pos, state_wp, state_dist_wp, state_direction_wp]
state = np.array(state)
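For reference, the assembled state stacks four 2-component features, so it has shape (4, 2); that shape is what the network code below has to deal with:
print(state.shape)  # (4, 2)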
Network:
import tensorflow as tf
from tensorflow.keras.layers import Dense, Reshape
from tensorflow.keras.models import Sequential
def __init__(self):
self.q_net = self._build_dqn_model()
self.epsilon = 1
def _build_dqn_model(self):
q_net = Sequential()
q_net.add(Dense(4, input_shape=(4,2), activation='relu', kernel_initializer='he_uniform'))
q_net.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
q_net.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
q_net.add(Dense(4, activation='linear', kernel_initializer='he_uniform'))
rms = tf.optimizers.RMSprop(lr = 1e-4)
q_net.compile(optimizer=rms, loss='mse')
return q_net
def random_policy(self, state):
return np.random.randint(0, 4)
def collect_policy(self, state):
if np.random.random() < self.epsilon:
return self.random_policy(state)
return self.policy(state)
def policy(self, state):
# Here I get 4 arrays with 4 values each as output
action_q = self.q_net(state)
Adding input_shape=(4,2) in the first Dense layer is causing the output shape to be (None, 4, 4).
Defining q_net the following way solves it:
q_net = Sequential()
q_net.add(Reshape(target_shape=(8,), input_shape=(4,2)))
q_net.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
q_net.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
q_net.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
q_net.add(Dense(4, activation='linear', kernel_initializer='he_uniform'))
rms = tf.optimizers.RMSprop(lr = 1e-4)
q_net.compile(optimizer=rms, loss='mse')
return q_net
Here, q_net.add(Reshape(target_shape=(8,), input_shape=(4,2))) reshapes the (None, 4, 2) input to (None, 8) [Here, None represents the batch shape].
To verify, print q_net.output_shape and it should be (None, 4) [Whereas in the previous case it was (None, 4, 4)].
You also need to do one more thing. Recall that input_shape does not take the batch dimension into account: input_shape=(4,2) expects inputs of shape (batch_shape, 4, 2). Verify it by printing q_net.input_shape; it should output (None, 4, 2). Now, what you have to do is add a batch dimension to your input. You can simply do the following:
state_with_batch_dim = np.expand_dims(state,0)
And pass state_with_batch_dim to q_net as input. For example, you can call the policy method you wrote like policy(np.expand_dims(state,0)) and get an output of dimension (batch_shape, 4) [in this case (1,4)].
And here are the answers to your initial questions:
Your output layer should have 4 nodes (units).
Your first dense layer does not necessarily have to have 4 nodes (units). If you consider the Reshape layer, the notion of nodes or units does not fit there. You can think of the Reshape layer as a placeholder that takes a tensor of shape (None, 4, 2) and outputs a reshaped tensor of shape (None, 8).
Now you should get outputs of shape (None, 4), where the 4 values are the Q-values of the 4 corresponding actions. You only need argmax when you want to pick the greedy action, not to read the Q-values themselves.
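Putting these points together, the policy method could end up like this (a minimal sketch, assuming q_net is the reshaped model above and state is the unbatched (4, 2) array):
def policy(self, state):
    # add the batch dimension: (4, 2) -> (1, 4, 2)
    state_input = np.expand_dims(state, 0)
    action_q = self.q_net(state_input)           # shape (1, 4): one Q-value per action
    return int(np.argmax(action_q.numpy()[0]))   # index of the greedy action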
It could make sense to feed the DQN some information on the direction it's currently facing too. You could set it up as (Current Pos X, Current Pos Y, X From Goal, Y From Goal, Direction).
The output layer should just be (Up, Left, Down, Right) in an order you determine. Taking an argmax over the outputs is suitable for the problem. The exact code depends on whether you are using TF or PyTorch.
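As a rough sketch of that layout (the facing-direction encoding here is an assumption, not something from the question, e.g. 0=up, 1=right, 2=down, 3=left):
import numpy as np
facing_direction = 0  # hypothetical encoding of the direction P1 is facing
state = np.array([x_POS, y_POS, wp_x - x_POS, wp_y - y_POS, facing_direction],
                 dtype=np.float32)
# such a state would pair with input_shape=(5,) and a 4-unit output layer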
Related
I'm building a neural network using keras and I'm a little lost on the LSTM layer input shape. Below is an image of the relevant part.
Both towers are similar; the only difference is that the left one accepts sequences of any length while the right one only accepts sequences of length 5. This results in their LSTM layers receiving an ambiguous sequence length and a sequence length of 4, respectively, both with 8 features per timestep. I'd thus expect both LSTM layers to have an input_shape of (1, 8).
My confusion now comes from the fact that both LSTM layers will accept any input shape without a problem, which is why I think this might not work the way I think it does. I'd expect the right LSTM layer to require an input shape with the first dimension either 1, 2 or 4 as only these sizes would be able to divide the input sequence of 4. Further, I'd expect both to require the second dimension to always be 8.
Could someone explain why the LSTM layers can accept any input shape and whether they process the sequences correctly with an input_shape=(1,8)? Below is the relevant code.
# Tower 1
inp_sentence1 = Input(shape=(None, 300, 1))
conv11 = Conv2D(32, (2, 300))(inp_sentence1)
reshape11 = K.squeeze(conv11, 2)
maxpl11 = MaxPooling1D(4, data_format='channels_first')(reshape11)
lstm11 = LSTM(units=6, input_shape=(1,8))(maxpl11)
# Tower 2
inp_sentence2 = Input(shape=(5, 300, 1))
conv21 = Conv2D(32, (2, 300))(inp_sentence2)
reshape21 = Reshape((4,32))(conv21)
maxpl21 = MaxPooling1D(4, data_format='channels_first')(reshape21)
lstm21 = LSTM(units=6, input_shape=(1,8))(maxpl21)
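A quick way to see what each LSTM actually receives is to print the inferred shapes of the tensors feeding them (an inspection sketch based on the code above; the commented shapes are what Keras infers):
print(maxpl11.shape)  # (None, None, 8): variable-length sequences, 8 features per timestep
print(maxpl21.shape)  # (None, 4, 8): length-4 sequences, 8 features per timestep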
EDIT: Short reproduction of the problem on dummy data:
import random
import numpy
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Conv2D, MaxPooling1D, Reshape, LSTM, Subtract, Dense
from tensorflow.keras.models import Model
# Tower 1
inp_sentence1 = Input(shape=(None, 300, 1))
conv11 = Conv2D(32, (2, 300))(inp_sentence1)
reshape11 = K.squeeze(conv11, 2)
maxpl11 = MaxPooling1D(4, data_format='channels_first')(reshape11)
lstm11 = LSTM(units=6, input_shape=(1,8))(maxpl11)
# Tower 2
inp_sentence2 = Input(shape=(5, 300, 1))
conv21 = Conv2D(32, (2, 300))(inp_sentence2)
reshape21 = Reshape((4,32))(conv21)
maxpl21 = MaxPooling1D(4, data_format='channels_first')(reshape21)
lstm21 = LSTM(units=6, input_shape=(1,8))(maxpl21)
# Combine towers
substract = Subtract()([lstm11, lstm21])
dense = Dense(16, activation='relu')(substract)
final = Dense(1, activation='sigmoid')(dense)
# Build model
model = Model([inp_sentence1, inp_sentence2], final)
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
# Create data
random_length = random.randint(2, 10)
x1 = numpy.random.random((100, random_length, 300))
x2 = numpy.random.random((100, 5, 300))
y = numpy.random.randint(2, size=100)
# Train and predict on data
model.fit([x1, x2], y, epochs=10, batch_size=5)
prediction = model.predict([x1, x2])
prediction = [round(x) for [x] in prediction]
classification = prediction == y
print("accuracy:", sum(classification)/len(prediction))
Using the Keras API, I am trying to write MobileNetV3 as explained in this article: https://arxiv.org/pdf/1905.02244.pdf, with the architecture as described in this picture:
For that, I need to implement the bottleneck blocks from the previous article https://arxiv.org/pdf/1801.04381.pdf. See the image for the architecture:
I managed to glue together the initial and final Conv layers:
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, Add, AvgPool2D, UpSampling2D
first_input = Input(shape=(256, 256, 3))
first_conv = Conv2D(16, 3, strides=2, name="FirstConv2d", padding="same")(first_input)
bneck1 = add_bottleneck_block(first_conv, 16, 16)
bneck2 = add_bottleneck_block(bneck1, 64, 24, strides=2)
#... Skipping all the other bottleneck blocks for simplicity
lastBneck = add_bottleneck_block(second2LastBneck, 960, 160, bneck_depth=5)
middleConv = Conv2D(160, 1, strides=1, name="MiddleConv")(lastBneck)
pool7 = AvgPool2D(7, strides=1, padding='same', name="7x7Pool")(middleConv)
SecondLastConv = Conv2D(1280, 1, strides=1, name="SecondLastConv")(pool7)
lastConv = Conv2D(3,1, strides=1, name="lastConv1x1")(SecondLastConv)
upScale = UpSampling2D(2)(lastConv) # This layer is application specific for my training.
v3 = tf.keras.models.Model(inputs=[first_input], outputs=upScale)
v3.compile(optimizer='adam', loss=tf.keras.losses.BinaryCrossentropy(),)
v3.summary()
Where the bottleneck_block is given in the next snippet of code (modified from https://towardsdatascience.com/mobilenetv2-inverted-residuals-and-linear-bottlenecks-8a4362f4ffd5)
def bottleneck_block(x, expand=64, squeeze=16, strides=1, bneck_depth=3):
"""
Bottleneck block with Activation and batch normalization commented since
I don't believe this is the issue in my problem
"""
m = tf.keras.layers.Conv2D(expand, (1,1), strides=1)(x)
#m = tf.keras.layers.BatchNormalization()(m)
#m = tf.keras.layers.Activation('relu6')(m)
m = tf.keras.layers.DepthwiseConv2D(bneck_depth, padding='same', strides=strides)(m)
#m = tf.keras.layers.BatchNormalization()(m)
#m = Activation('relu6')(m)
m = tf.keras.layers.Conv2D(squeeze, (1,1), strides=1)(m)
#m = tf.keras.layers.BatchNormalization()(m)
return tf.keras.layers.Add()([m, x])
However, in bneck2 I get the following error:
ValueError: Operands could not be broadcast together with shapes (16, 16, 24) (128, 128, 16)
I know the error means the dimensions of the inputs and outputs are off, but I don't know how to fix it so the network is structured like MobileNetV3.
What am I missing here?
For reference, here is source code in the tensorflow repo for the same network: https://github.com/tensorflow/models/blob/a174bf5b1db0e2c1e04697ff5aae5182bd1c60e7/research/slim/nets/mobilenet/mobilenet_v3.py#L130
The solution is to modify the bottleneck_block as described in the V3 author's repo:
import tensorflow as tf
def bottleneck_block(x, expand=64, squeeze=16, strides=1, bneck_depth=3, se=False):
"""
se stands for squeeze_excite
"""
m = tf.keras.layers.Conv2D(expand, (1,1), strides=1)(x)
m = tf.keras.layers.BatchNormalization()(m)
#m = tf.keras.layers.Activation('relu6')(m)
m = tf.keras.layers.DepthwiseConv2D(bneck_depth, padding='same', strides=strides)(m)
m = tf.keras.layers.BatchNormalization()(m)
#m = Activation('relu6')(m)
if se:
m = squeeze_excite_block(m, ratio=4)
m = tf.keras.layers.Conv2D(squeeze, (1,1), strides=1, padding='same')(m)
m = tf.keras.layers.BatchNormalization()(m)
if (
# stride check enforces that we don't add residuals when spatial
# dimensions are None
strides == 1 and
# Depth matches
m.get_shape().as_list()[3] == x.get_shape().as_list()[3]
):
m = tf.keras.layers.Add()([m, x])
return m
The check on depth and stride prevents the error I initially got when adding two tensors whose dimensions do not match.
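For instance, with the modified block, a stride-1 call that keeps the channel count gets the residual connection, while a stride-2 call that changes the channel count does not (a small sketch using the function above; the input shape is only an example):
import tensorflow as tf
x = tf.keras.layers.Input(shape=(128, 128, 16))
same = bottleneck_block(x, expand=64, squeeze=16, strides=1)  # residual is added
down = bottleneck_block(x, expand=64, squeeze=24, strides=2)  # no residual: stride and depth differ
print(same.shape)  # (None, 128, 128, 16)
print(down.shape)  # (None, 64, 64, 24)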
In your bottleneck layers, there are Add() ops.
Now, Add expects two tensors with the same shape. But because you have skipped so many layers, when the line tf.keras.layers.Add()([m, x]) is run, m and x have different dimensions.
So, either design a smaller network with fewer layers or just implement all of the intermediate layers.
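As a minimal illustration of that constraint, using the shapes from the error message above:
import tensorflow as tf
a = tf.keras.layers.Input(shape=(16, 16, 24))
b = tf.keras.layers.Input(shape=(128, 128, 16))
# raises ValueError: Operands could not be broadcast together with shapes (16, 16, 24) (128, 128, 16)
tf.keras.layers.Add()([a, b])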
I am using Python 3 with Anaconda, and Keras on top of TensorFlow. My goal is to create a network with a Conv layer that accepts a variable input size.
I found here a suggestion to use this code:
i = Input((None, None, 1))
o = Conv2D(1, 3, 3)(i)
model = Model(i, o)
model.compile('sgd', 'mse')
I have used it to create my own model with this code (I need a Flatten layer):
model = Sequential()
I = Input((None, None, 1))
c = Conv2D(filters=1, kernel_size=(1, 1))(I)
f = Flatten()(c)
o = Dense(10, activation="softmax")(f)
m = Model(I, o)
m.compile(loss=categorical_crossentropy, optimizer=SGD(), metrics=["accuracy"])
And I keep getting this error
ValueError: The shape of the input to "Flatten" is not fully defined
(got (None, None, 1). Make sure to pass a complete "input_shape" or
"batch_input_shape" argument to the first layer in your model.
Seems like the issue is with the input shape for the Flatten layer; when I remove it, it's fine.
How can I make it play well with the variable size?
Thanks
Dense needs fixed-size inputs/outputs because the number of its weight variables must be fixed.
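The weight count depends directly on the flattened input size, which is why that size has to be known when the layer is built; a quick check (a sketch, separate from the code below):
from tensorflow.keras.layers import Dense, Input
x = Input(shape=(7,))
layer = Dense(10)
layer(x)                   # building the layer creates its weights
print(layer.kernel.shape)  # (7, 10): one row of weights per input feature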
There are two solutions in your case.
Use GAP (GlobalAveragePooling2D) instead of Flatten. GAP's output size is the number of channels of the previous layer, so its size is fixed in your case.
Use an all-convolutional net that has no Dense layer. In this case the output of the net is two-dimensional (spatial), not one-dimensional, so the shape of y should match that.
The code below was added at Allen M's request.
Here is a code sample:
from tensorflow.keras.layers import Input, Conv2D, GlobalAveragePooling2D, Dense
from tensorflow.keras.models import Model
# The original number of Conv filters is one,
# but I set it to 16 to depict how GAP works.
# B/H/W means BatchSize/Height/Width.
#1. using GAP
I = Input((None, None, 1)) # output shape=(B, H(None), W(None), 1)
c = Conv2D(filters=16, kernel_size=(1, 1))(I) # output shape=(B, H, W, 16)
f = GlobalAveragePooling2D()(c) # output shape=(B, 16) <- space data(H/W) are aggregated by average
o = Dense(10, activation="softmax")(f) # output shape = (B, 10)
m = Model(I, o)
#2. all conv
I = Input((None, None, 1)) # output shape=(B, H, W, 1)
c = Conv2D(filters=16, kernel_size=(1, 1))(I) # output shape=(B, H, W, 16)
o = Conv2D(filters=10, kernel_size=(1, 1), activation="softmax")(c)
# output shape=(B, H, W, 10)
m = Model(I, o)
# The output size of all conv is H * W * 10, where 10 is the number of classes.
# so the shape of y should be (B, H, W, 1) or (B, H, W) or (B, H, W, 10).
# That is pixel-wise classification or semantic segmentation.
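Either way, the spatial dimensions stay variable. For example, the all-conv model m defined last above accepts inputs of different heights and widths (a quick check, assuming the code above has been run):
import numpy as np
print(m.predict(np.zeros((1, 32, 40, 1))).shape)  # (1, 32, 40, 10)
print(m.predict(np.zeros((1, 17, 23, 1))).shape)  # (1, 17, 23, 10)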
The Flatten layer doesn't take an input size as an argument.
model = Sequential()
I = Input((None, None, 1))
c = Conv2D(filters=1, kernel_size=(1, 1))(I)
f = Flatten()
o = Dense(10, activation="softmax")(I)
m = Model(I, o)
m.compile(loss="categorical_crossentropy", optimizer=SGD(), metrics=["accuracy"])
This should solve your problem.
I think the problem is due to your variable input sizes. It says here that you can't vary the input size if you're using a fully connected layer. See: How to train images, when they have different size?
I was trying to implement a ResNet CIFAR-10 model on Google Colab, using the code from https://github.com/jzuern/cifar-classifier.
Instead of the ReLU activation I'm using my own custom activation function. Here is the code:
def fonlaaf(x):
return x/(1-tf.exp(-x))
def resnet_layer(inputs,
num_filters=16,
kernel_size=3,
strides=1, activation='fonlaaf',
batch_normalization=True,
conv_first=True):
"""2D Convolution-Batch Normalization-Activation stack builder
# Arguments
inputs (tensor): input tensor from input image or previous layer
num_filters (int): Conv2D number of filters
kernel_size (int): Conv2D square kernel dimensions
strides (int): Conv2D square stride dimensions
activation (string): activation name
batch_normalization (bool): whether to include batch normalization
conv_first (bool): conv-bn-activation (True) or
bn-activation-conv (False)
# Returns
x (tensor): tensor as input to the next layer
"""
conv = Conv2D(num_filters,
kernel_size=kernel_size,
strides=strides,
padding='same',
kernel_initializer='he_normal',
kernel_regularizer=tf.keras.regularizers.l2(1e-4))
x = inputs
if conv_first:
x = conv(x)
if batch_normalization:
x = BatchNormalization()(x)
if activation is not None:
x = fonlaaf(x)
else:
if batch_normalization:
x = BatchNormalization()(x)
if activation is not None:
x = fonlaaf(x)
x = conv(x)
return x
def resnet_v2(input_shape, depth=20, num_classes=10):
"""ResNet Version 2 Model builder [b]
Stacks of (1 x 1)-(3 x 3)-(1 x 1) BN-ReLU-Conv2D or also known as
bottleneck layer
First shortcut connection per layer is 1 x 1 Conv2D.
Second and onwards shortcut connection is identity.
At the beginning of each stage, the feature map size is halved (downsampled)
by a convolutional layer with strides=2, while the number of filter maps is
doubled. Within each stage, the layers have the same number filters and the
same filter map sizes.
Features maps sizes:
conv1 : 32x32, 16
stage 0: 32x32, 64
stage 1: 16x16, 128
stage 2: 8x8, 256
# Arguments
input_shape (tensor): shape of input image tensor
depth (int): number of core convolutional layers
num_classes (int): number of classes (CIFAR10 has 10)
# Returns
model (Model): Keras model instance
"""
if (depth - 2) % 9 != 0:
raise ValueError('depth should be 9n+2 (eg 56 or 110 in [b])')
# Start model definition.
num_filters_in = 16
num_res_blocks = int((depth - 2) / 9)
inputs = Input(shape=input_shape)
# v2 performs Conv2D with BN-ReLU on input before splitting into 2 paths
x = resnet_layer(inputs=inputs,
num_filters=num_filters_in,
conv_first=True)
# Instantiate the stack of residual units
for stage in range(3):
for res_block in range(num_res_blocks):
activation = 'relu'
batch_normalization = True
strides = 1
if stage == 0:
num_filters_out = num_filters_in * 4
if res_block == 0: # first layer and first stage
activation = None
batch_normalization = False
else:
num_filters_out = num_filters_in * 2
if res_block == 0: # first layer but not first stage
strides = 2 # downsample
# bottleneck residual unit
y = resnet_layer(inputs=x,
num_filters=num_filters_in,
kernel_size=1,
strides=strides,
activation=activation,
batch_normalization=batch_normalization,
conv_first=False)
y = resnet_layer(inputs=y,
num_filters=num_filters_in,
conv_first=False)
y = resnet_layer(inputs=y,
num_filters=num_filters_out,
kernel_size=1,
conv_first=False)
if res_block == 0:
# linear projection residual shortcut connection to match
# changed dims
x = resnet_layer(inputs=x,
num_filters=num_filters_out,
kernel_size=1,
strides=strides,
activation=None,
batch_normalization=False)
x = tf.keras.layers.add([x, y])
num_filters_in = num_filters_out
# Add classifier on top.
# v2 has BN-ReLU before Pooling
x = BatchNormalization()(x)
x = fonlaaf(x)
x = AveragePooling2D(pool_size=8)(x)
y = Flatten()(x)
outputs = Dense(num_classes,
activation='softmax',
kernel_initializer='he_normal')(y)
# Instantiate model.
model = Model(inputs=inputs, outputs=outputs)
model.compile(loss='sparse_categorical_crossentropy',
optimizer=tf.keras.optimizers.Adam(lr=hparams.learning_rate),
metrics=['accuracy'])
return model
tf.logging.set_verbosity(tf.logging.DEBUG)
resnet_model = resnet_v2((32, 32, 3), depth=56, num_classes=hparams.n_classes)
# Download and extract CIFAR-10 data
maybe_download_and_extract()
# training data
x_train, y_train = load_training_data()
# Validation data
x_val, y_val = load_validation_data()
# Testing data
x_test, y_test = load_testing_data()
# Define callbacks
callbacks = [
tf.keras.callbacks.TensorBoard(log_dir=hparams.checkpoint_dir)
]
# This will do preprocessing and realtime data augmentation:
datagen = ImageDataGenerator(
featurewise_center=False, # set input mean to 0 over the dataset
samplewise_center=False, # set each sample mean to 0
featurewise_std_normalization=False, # divide inputs by std of the dataset
samplewise_std_normalization=False, # divide each input by its std
zca_whitening=False, # apply ZCA whitening
zca_epsilon=1e-06, # epsilon for ZCA whitening
rotation_range=0, # randomly rotate images in the range (degrees, 0 to 180)
# randomly shift images horizontally (fraction of total width)
width_shift_range=0.1,
# randomly shift images vertically (fraction of total height)
height_shift_range=0.1,
# set mode for filling points outside the input boundaries
fill_mode='nearest',
cval=0., # value used for fill_mode = "constant"
horizontal_flip=True, # randomly flip images
vertical_flip=False)
# Compute quantities required for feature-wise normalization
# (std, mean, and principal components if ZCA whitening is applied).
datagen.fit(x_train)
# Fit the model on the batches generated by datagen.flow().
resnet_model.fit_generator(
datagen.flow(x_train, y_train,batch_size=hparams.train_batch_size),
epochs=hparams.n_epochs,
validation_data=(x_val, y_val),
workers=4,
callbacks=callbacks)
Got the following error: ValueError: Output tensors to a Model must be the output of a TensorFlow Layer (thus holding past layer metadata). Found: Tensor("dense/Softmax:0", shape=(?, 10), dtype=float32)
Most of the previous answers to this error didn't work out for me. What am I missing here?
As the error states, you have to pass the output of a Layer. Since fonlaaf() is an activation function with no state, you can use a Lambda layer.
Replace,
def fonlaaf(x):
return x/(1-tf.exp(-x))
with
def fonlaaf(x):
return tf.keras.layers.Lambda(lambda x: x/(1-tf.exp(-x)))(x)
https://www.tensorflow.org/guide/keras/#custom_layers
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Lambda
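A minimal sketch of the fix in isolation (the layer sizes here are arbitrary, not from the question): wrapping the activation in a Lambda layer makes its output carry Keras layer metadata, so Model accepts the final output tensor:
import tensorflow as tf

def fonlaaf(x):
    return tf.keras.layers.Lambda(lambda t: t / (1 - tf.exp(-t)))(x)

inp = tf.keras.layers.Input(shape=(8,))
hid = tf.keras.layers.Dense(16)(inp)
out = tf.keras.layers.Dense(10, activation='softmax')(fonlaaf(hid))
model = tf.keras.models.Model(inputs=inp, outputs=out)  # builds without the "Output tensors ..." error
model.summary()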
This is a summary of my model.
My model is basically similar to a convolutional network.
I want my model to work regardless of the width of the input, so the width dimension appears as None.
And I want to attach a decoder to my model.
However, when I attach the decoder, an error occurs. (If I don't attach the decoder, the program works fine.)
This is my decoder part (below):
if args.decoder==True:
decoder = ConvCapsuleLayer(kernel_size=args.kernel_size, num_capsule=4, num_atoms=1, strides=1, padding='same',
routings=1)(conv_cap)
_, H, W, C, A = decoder.get_shape()
y = layers.Input(shape=(n_class,))
masked_by_y = Mask()([decoder, y])
masked = Mask()(decoder)
def shared_decoder(mask_layer):
recon_1 = layers.Conv2DTranspose(4, (5,5), strides=(2, 2), padding='same', kernel_initializer='he_normal', name='decoder_1', activation='relu')(mask_layer)
recon_2 = layers.Conv2DTranspose(8, (5,5), strides=(2, 2), padding='same', kernel_initializer='he_normal', name='decoder_2', activation='relu')(recon_1)
recon_3 = layers.Conv2DTranspose(1, (1,1), strides=(1, 1), padding='same', kernel_initializer='he_normal', name='decoder_3', activation='linear')(recon_2)
return recon_3
if args.decoder==True:
train_model = models.Model(inputs=[x, y], outputs=[out_seg, shared_decoder(masked_by_y)]) # [x:image,y: mask] // [out_seg:length, reconstruction output]
eval_model = models.Model(x, [out_seg, shared_decoder(masked)])
else:
train_model = models.Model(inputs=x, outputs=out_seg)
eval_model = models.Model(inputs=x, outputs=out_seg)
return train_model, eval_model
mask_1 is my Mask layer.
If a label is given, only the channel of that label is returned (masked_by_y).
If a label is not given, the layer returns only the channel with the largest sum of element values in conv_capsule_layer_1 (masked).
The shape of conv_capsule_layer_1 is (batch_size = None, height = 50, width = None, num_channel = 4, 1).
That is, the mask layer returns the channel with the largest sum of element values among the four channels.
Then Conv2DTranspose is applied to the returned value (the output of the mask layer) to bring it back to the size of the original input.
However, the following error occurs
InvalidArgumentError (see above for traceback): Only one input size may be -1, not both 0 and 2
[[Node: mask_1/Reshape_1 = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](mask_1/boolean_mask/Gather, mask_1/Reshape_1/shape)]]
How can I make the length variable without using -1? I already tried non_zero_masked = K.reshape(non_zero, [-1, masked.shape[1], masked.shape[2], 1])
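For reference, the error can be reproduced in isolation: a reshape allows at most one -1 (inferred) dimension (a minimal illustration, separate from the model code):
import tensorflow as tf
t = tf.range(12.0)
tf.reshape(t, [-1, 3, -1, 1])  # InvalidArgumentError: Only one input size may be -1, not both 0 and 2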
This is the call function of my Mask layer:
def call(self, inputs, **kwargs):
if type(inputs) is list: # true label is provided with shape = [None, n_classes], i.e. one-hot code.
assert len(inputs) == 2
inputs, mask = inputs
inputs = K.squeeze(inputs, axis=-1) # [batch, input_height, input_width, num_cap, num_atom] -> [batch, input_height, input_width, num_cap]
else: # if no true label, mask by the max length of capsules. Mainly used for prediction
inputs = K.squeeze(inputs, axis=-1) #[batch, input_height, input_width, num_cap]
x = K.softmax(K.sqrt(K.sum(K.square(inputs), axis=(1,2)) + K.epsilon())) # x: [batch, 4]
mask = K.one_hot(indices=K.argmax(x, 1), num_classes=x.get_shape().as_list()[1]) # mask: [batch,4]
expand_mask = K.reshape(mask,[-1,1,1,mask.shape[1]]) #[batch_size, 1, 1, num_class]
masked = inputs*expand_mask
non_zero = tf.boolean_mask(masked, tf.not_equal(masked,0))
non_zero_masked = K.reshape(non_zero,[-1, masked.shape[1], -1,1])
return non_zero_masked
Does anybody know why this error is happening? How can I solve it?