Using the Keras API, I am trying to implement MobileNetV3 as explained in this article: https://arxiv.org/pdf/1905.02244.pdf, with the architecture described in this picture:
For that, I need to implement the bottleneck blocks from the earlier article https://arxiv.org/pdf/1801.04381.pdf. See the image for the architecture:
I managed to glue together the initial and final Conv layers:
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, Add, AvgPool2D, UpSampling2D

first_input = Input(shape=(256, 256, 3))
first_conv = Conv2D(16, 3, strides=2, name="FirstConv2d", padding="same")(first_input)
bneck1 = bottleneck_block(first_conv, 16, 16)
bneck2 = bottleneck_block(bneck1, 64, 24, strides=2)
# ... skipping all the other bottleneck blocks for simplicity;
# second2LastBneck stands in for the output of the last skipped block
lastBneck = bottleneck_block(second2LastBneck, 960, 160, bneck_depth=5)
middleConv = Conv2D(160, 1, strides=1, name="MiddleConv")(lastBneck)
pool7 = AvgPool2D(7, strides=1, padding='same', name="7x7Pool")(middleConv)
SecondLastConv = Conv2D(1280, 1, strides=1, name="SecondLastConv")(pool7)
lastConv = Conv2D(3, 1, strides=1, name="lastConv1x1")(SecondLastConv)
upScale = UpSampling2D(2)(lastConv)  # This layer is application specific for my training.
v3 = tf.keras.models.Model(inputs=[first_input], outputs=upScale)
v3.compile(optimizer='adam', loss=tf.keras.losses.BinaryCrossentropy())
v3.summary()
The bottleneck_block is given in the next snippet of code (modified from https://towardsdatascience.com/mobilenetv2-inverted-residuals-and-linear-bottlenecks-8a4362f4ffd5):
def bottleneck_block(x, expand=64, squeeze=16, strides=1, bneck_depth=3):
    """
    Bottleneck block, with activation and batch normalization commented out
    since I don't believe they are the issue in my problem.
    """
    m = tf.keras.layers.Conv2D(expand, (1, 1), strides=1)(x)
    # m = tf.keras.layers.BatchNormalization()(m)
    # m = tf.keras.layers.Activation('relu6')(m)
    m = tf.keras.layers.DepthwiseConv2D(bneck_depth, padding='same', strides=strides)(m)
    # m = tf.keras.layers.BatchNormalization()(m)
    # m = tf.keras.layers.Activation('relu6')(m)
    m = tf.keras.layers.Conv2D(squeeze, (1, 1), strides=1)(m)
    # m = tf.keras.layers.BatchNormalization()(m)
    return tf.keras.layers.Add()([m, x])
However, in bneck2 I get the following error:
ValueError: Operands could not be broadcast together with shapes (16, 16, 24) (128, 128, 16)
I know the error means the dimensions of the inputs and outputs do not match, but I don't know how to fix it so that the network is structured like MobileNetV3.
What am I missing here?
For reference, here is the source code in the tensorflow repo for the same network: https://github.com/tensorflow/models/blob/a174bf5b1db0e2c1e04697ff5aae5182bd1c60e7/research/slim/nets/mobilenet/mobilenet_v3.py#L130
The solution is to modify the bottleneck_block as described in the V3 authors' repo:
import tensorflow as tf

def bottleneck_block(x, expand=64, squeeze=16, strides=1, bneck_depth=3, se=False):
    """
    se stands for squeeze_excite
    """
    m = tf.keras.layers.Conv2D(expand, (1, 1), strides=1)(x)
    m = tf.keras.layers.BatchNormalization()(m)
    # m = tf.keras.layers.Activation('relu6')(m)
    m = tf.keras.layers.DepthwiseConv2D(bneck_depth, padding='same', strides=strides)(m)
    m = tf.keras.layers.BatchNormalization()(m)
    # m = tf.keras.layers.Activation('relu6')(m)
    if se:
        m = squeeze_excite_block(m, ratio=4)
    m = tf.keras.layers.Conv2D(squeeze, (1, 1), strides=1, padding='same')(m)
    m = tf.keras.layers.BatchNormalization()(m)
    if (
        # the stride check enforces that we don't add residuals when spatial
        # dimensions are None
        strides == 1 and
        # depth matches
        m.get_shape().as_list()[3] == x.get_shape().as_list()[3]
    ):
        m = tf.keras.layers.Add()([m, x])
    return m
The depth and stride check prevents the error I initially got, which came from adding two tensors whose dimensions do not match.
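Note that the snippet above calls a squeeze_excite_block helper that is not defined in it. For completeness, here is a minimal sketch of one common squeeze-and-excite formulation (the helper name matches the call above, but treat the details, such as the hard-sigmoid gate, as my assumption rather than the V3 authors' exact code):
def squeeze_excite_block(x, ratio=4):
    channels = int(x.shape[-1])
    # Squeeze: global average over the spatial dimensions, one value per channel
    s = tf.keras.layers.GlobalAveragePooling2D()(x)
    # Excite: bottleneck MLP that produces a gate per channel
    s = tf.keras.layers.Dense(channels // ratio, activation='relu')(s)
    s = tf.keras.layers.Dense(channels, activation='hard_sigmoid')(s)
    s = tf.keras.layers.Reshape((1, 1, channels))(s)
    # Rescale the feature map channel-wise
    return tf.keras.layers.Multiply()([x, s])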
In your bottleneck layers, there are Add() ops.
Now, Add expects two tensors with the same shape. But since you have skipped so many layers, by the time the line tf.keras.layers.Add()([m, x]) runs, m and x have different dimensions.
So, either design a smaller network with fewer layers or just implement all of the intermediate layers.
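To see the constraint in isolation, here is a small sketch (shapes chosen by me to mirror the error above):
import tensorflow as tf

x = tf.keras.layers.Input((128, 128, 16))
same = tf.keras.layers.Conv2D(16, 1, strides=1)(x)   # (128, 128, 16)
diff = tf.keras.layers.Conv2D(24, 1, strides=2)(x)   # (64, 64, 24)

ok = tf.keras.layers.Add()([same, x])   # works: shapes match
# tf.keras.layers.Add()([diff, x])      # ValueError: operands could not be broadcast together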
Related
I am trying to implement a TensorFlow model (encoder-decoder like) in which I initially train with a small number of layers and append more layers to the model after training. I chose to create the layers as Models because I intend to set various layers to trainable = False at certain points, and I thought this would be the easiest way to do that.
The following code is a simple demonstration of an error I'm getting.
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.models import Model
from tensorflow.keras.layers import concatenate, Input
from tensorflow.keras.layers import MaxPool2D, UpSampling2D, ReLU
from tensorflow.keras.layers import BatchNormalization
def conv_block(x, filters, kernel_size=(3,3), padding="same", strides=1):
    c = Conv2D(filters, kernel_size, padding=padding, strides=strides)(x)
    c = ReLU()(c)
    c = BatchNormalization()(c)
    c = Conv2D(filters, kernel_size, padding=padding, strides=strides)(c)
    c = ReLU()(c)
    c = BatchNormalization()(c)
    return c

def down_block(x, filters, kernel_size=(3,3), padding="same", strides=1):
    c = conv_block(x, filters, kernel_size=kernel_size,
                   padding=padding, strides=strides)
    p = MaxPool2D((2,2))(c)
    return c, p

def up_block(x, skip, filters, kernel_size=(3,3), padding="same", strides=1):
    us = UpSampling2D((2,2))(x)
    concat = concatenate([us, skip])
    c = conv_block(concat, filters, kernel_size=kernel_size,
                   padding=padding, strides=strides)
    return c

def create_base_model():
    inner_input = Input((None, None, 128))
    bn = conv_block(inner_input, 128)
    inner_model = Model(inputs=inner_input, outputs=bn)
    return inner_model

def create_downblock_model():
    model_input = Input((None, None, 128))
    c, p = down_block(model_input, 128)
    down_model = Model(inputs=model_input, outputs=[c, p])
    return down_model

def create_upblock_model():
    input_u = Input((None, None, 128))
    input_c = Input((None, None, 128))
    u = up_block(input_u, input_c, 128)
    up_model = Model(inputs=[input_u, input_c], outputs=u)
    return up_model
bn_model = create_base_model()
# 1ST METHOD - This works
down_model1 = create_downblock_model()
up_model1 = create_upblock_model()
x = bn_model(down_model1.output[-1])
x = up_model1([x,down_model1.output[0]])
inner_model = Model(inputs=down_model1.input, outputs=x)
# 2ND METHOD - This doesn't work
down_model2 = create_downblock_model()
up_model2 = create_upblock_model()
x = down_model2(down_model1.output[-1])
x = bn_model(x[-1])
x = up_model2([x,down_model2.output[0]])
x = up_model1([x,down_model1.output[0]])
inner_model = Model(inputs=down_model1.input, outputs=x)
I get the following error for the second method:
Graph disconnected: cannot obtain value for tensor Tensor("input_5:0", shape=(None, None, None, 128), dtype=float32) at layer "input_5". The following previous layers were accessed without issue: ['input_2', 'conv2d_2', 're_lu_2']
Now down_model2 has the layer input_5:0, so I am assuming the issue is with the line x = down_model2(down_model1.output[-1]). I searched around, and topics with a similar error suggest that the problem may be that down_model1.output[-1] isn't an input layer. However, I really don't understand why my first method works completely fine, but the same way of doing things fails when I try to incorporate two down blocks. In the 1st method I use down_model1.output[-1] as input when defining a new model without issue, yet it doesn't work in the second method.
I'm using TensorFlow 2.1.
Apologies if I'm overlooking something simple, but I can't understand why this isn't working. Cheers
The problem is caused by x = up_model2([x, down_model2.output[0]]) on the third-to-last line, due to a wrong repeated reference: down_model2.output refers to the tensors attached to down_model2's own Input placeholders, not to the tensors produced by calling down_model2 on down_model1's output, so the graph is disconnected. You need to change the last block of code to:
down_model2_output = down_model2(down_model1.output[-1])
x = bn_model(down_model2_output[-1])
x = up_model2([x,down_model2_output[0]])
x = up_model1([x,down_model1.output[0]])
inner_model = Model(inputs=down_model1.input, outputs=x)
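The distinction, as a minimal sketch reusing the models defined above:
out = down_model2(down_model1.output[-1])
# out: NEW tensors wired into down_model1's graph -- safe to use downstream.

down_model2.output
# These are the tensors built on down_model2's own Input placeholder
# (input_5), which nothing in this graph feeds -- referencing them is what
# produced the "Graph disconnected" error.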
I'm building a neural network using Keras, and I'm a little lost on the LSTM layer input shape. Below is an image of the relevant part.
Both towers are similar, the only difference being that the left one accepts sequences of any length and the right one only accepts sequences of length 5. This results in their LSTM layers receiving an ambiguous sequence length and a sequence length of 4, respectively, both with 8 features per timestep. I'd thus expect both LSTM layers to have an input_shape of (1, 8).
My confusion now comes from the fact that both LSTM layers will accept any input shape without a problem, which is why I think this might not work the way I think it does. I'd expect the right LSTM layer to require an input shape whose first dimension is either 1, 2 or 4, as only these sizes evenly divide the input sequence length of 4. Further, I'd expect both to require the second dimension to always be 8.
Could someone explain why the LSTM layers can accept any input shape, and whether they process the sequences correctly with an input_shape=(1,8)? Below is the relevant code.
# Tower 1
inp_sentence1 = Input(shape=(None, 300, 1))
conv11 = Conv2D(32, (2, 300))(inp_sentence1)
reshape11 = K.squeeze(conv11, 2)
maxpl11 = MaxPooling1D(4, data_format='channels_first')(reshape11)
lstm11 = LSTM(units=6, input_shape=(1,8))(maxpl11)
# Tower 2
inp_sentence2 = Input(shape=(5, 300, 1))
conv21 = Conv2D(32, (2, 300))(inp_sentence2)
reshape21 = Reshape((4,32))(conv21)
maxpl21 = MaxPooling1D(4, data_format='channels_first')(reshape21)
lstm21 = LSTM(units=6, input_shape=(1,8))(maxpl21)
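For reference, here is my own trace of the shapes through Tower 2 (assuming the default channels_last layout for Conv2D):
# (batch, 5, 300, 1)  input
# (batch, 4, 1, 32)   after Conv2D(32, (2, 300)) with 'valid' padding
# (batch, 4, 32)      after Reshape((4, 32))
# (batch, 4, 8)       after MaxPooling1D(4, data_format='channels_first'),
#                     which here pools along the last axis: 32 / 4 = 8
# (batch, 6)          after LSTM(units=6), which returns the final hidden state
So the right LSTM indeed sees 4 timesteps of 8 features.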
EDIT: Short reproduction of the problem on dummy data:
# Imports assumed for a self-contained reproduction
import random
import numpy
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Conv2D, MaxPooling1D, Reshape, LSTM, Subtract, Dense
from tensorflow.keras.models import Model

# Tower 1
inp_sentence1 = Input(shape=(None, 300, 1))
conv11 = Conv2D(32, (2, 300))(inp_sentence1)
reshape11 = K.squeeze(conv11, 2)
maxpl11 = MaxPooling1D(4, data_format='channels_first')(reshape11)
lstm11 = LSTM(units=6, input_shape=(1,8))(maxpl11)
# Tower 2
inp_sentence2 = Input(shape=(5, 300, 1))
conv21 = Conv2D(32, (2, 300))(inp_sentence2)
reshape21 = Reshape((4,32))(conv21)
maxpl21 = MaxPooling1D(4, data_format='channels_first')(reshape21)
lstm21 = LSTM(units=6, input_shape=(1,8))(maxpl21)
# Combine towers
substract = Subtract()([lstm11, lstm21])
dense = Dense(16, activation='relu')(substract)
final = Dense(1, activation='sigmoid')(dense)
# Build model
model = Model([inp_sentence1, inp_sentence2], final)
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
# Create data
random_length = random.randint(2, 10)
x1 = numpy.random.random((100, random_length, 300))
x2 = numpy.random.random((100, 5, 300))
y = numpy.random.randint(2, size=100)
# Train and predict on data
model.fit([x1, x2], y, epochs=10, batch_size=5)
prediction = model.predict([x1, x2])
prediction = [round(x) for [x] in prediction]
classification = prediction == y
print("accuracy:", sum(classification)/len(prediction))
I am trying to create a model with a normalized cross-correlation custom layer, with code taken from here:
from keras import backend as K
from keras.layers import Conv2D, MaxPooling2D, Dense, Input, Flatten
from keras.models import Model, Sequential
from keras.engine import InputSpec, Layer
from keras import regularizers
from keras.optimizers import SGD, Adam
from keras.utils.conv_utils import conv_output_length
from keras import activations
import numpy as np
class Normalized_Correlation_Layer(Layer):
    # create a class inherited from keras.engine.Layer.

    def __init__(self, patch_size=(5, 5),
                 dim_ordering='tf',
                 border_mode='same',
                 stride=(1, 1),
                 activation=None,
                 **kwargs):
        if border_mode != 'same':
            raise ValueError('Invalid border mode for Correlation Layer '
                             '(only "same" is supported as of now):', border_mode)
        self.kernel_size = patch_size
        self.subsample = stride
        self.dim_ordering = dim_ordering
        self.border_mode = border_mode
        self.activation = activations.get(activation)
        super(Normalized_Correlation_Layer, self).__init__(**kwargs)

    def compute_output_shape(self, input_shape):
        return (input_shape[0][0], input_shape[0][1], input_shape[0][2],
                self.kernel_size[0] * input_shape[0][2] * input_shape[0][-1])

    def get_config(self):
        config = {'patch_size': self.kernel_size,
                  'activation': self.activation.__name__,
                  'border_mode': self.border_mode,
                  'stride': self.subsample,
                  'dim_ordering': self.dim_ordering}
        base_config = super(Normalized_Correlation_Layer, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

    def call(self, x, mask=None):
        input_1, input_2 = x
        stride_row, stride_col = self.subsample
        inp_shape = input_1._keras_shape
        output_shape = self.compute_output_shape([inp_shape, inp_shape])
        padding_row = (int(self.kernel_size[0] / 2), int(self.kernel_size[0] / 2))
        padding_col = (int(self.kernel_size[1] / 2), int(self.kernel_size[1] / 2))
        input_1 = K.spatial_2d_padding(input_1, padding=(padding_row, padding_col))
        input_2 = K.spatial_2d_padding(input_2, padding=((padding_row[0] * 2, padding_row[1] * 2), padding_col))
        output_row = output_shape[1]
        output_col = output_shape[2]
        output = []
        for k in range(inp_shape[-1]):
            xc_1 = []
            xc_2 = []
            # Patches of feature map 2 in the rows above the valid region
            for i in range(padding_row[0]):
                for j in range(output_col):
                    xc_2.append(K.reshape(input_2[:, i:i + self.kernel_size[0], j:j + self.kernel_size[1], k],
                                          (-1, 1, self.kernel_size[0] * self.kernel_size[1])))
            # Patches of both feature maps over the valid region
            for i in range(output_row):
                slice_row = slice(i, i + self.kernel_size[0])
                slice_row2 = slice(i + padding_row[0], i + self.kernel_size[0] + padding_row[0])
                for j in range(output_col):
                    slice_col = slice(j, j + self.kernel_size[1])
                    xc_2.append(K.reshape(input_2[:, slice_row2, slice_col, k],
                                          (-1, 1, self.kernel_size[0] * self.kernel_size[1])))
                    xc_1.append(K.reshape(input_1[:, slice_row, slice_col, k],
                                          (-1, 1, self.kernel_size[0] * self.kernel_size[1])))
            # Patches of feature map 2 in the rows below the valid region
            for i in range(output_row, output_row + padding_row[1]):
                for j in range(output_col):
                    xc_2.append(K.reshape(input_2[:, i:i + self.kernel_size[0], j:j + self.kernel_size[1], k],
                                          (-1, 1, self.kernel_size[0] * self.kernel_size[1])))
            # Normalize each set of patches to zero mean and unit variance
            xc_1_aggregate = K.concatenate(xc_1, axis=1)
            xc_1_mean = K.mean(xc_1_aggregate, axis=-1, keepdims=True)
            xc_1_std = K.std(xc_1_aggregate, axis=-1, keepdims=True)
            xc_1_aggregate = (xc_1_aggregate - xc_1_mean) / xc_1_std
            xc_2_aggregate = K.concatenate(xc_2, axis=1)
            xc_2_mean = K.mean(xc_2_aggregate, axis=-1, keepdims=True)
            xc_2_std = K.std(xc_2_aggregate, axis=-1, keepdims=True)
            xc_2_aggregate = (xc_2_aggregate - xc_2_mean) / xc_2_std
            xc_1_aggregate = K.permute_dimensions(xc_1_aggregate, (0, 2, 1))
            block = []
            len_xc_1 = len(xc_1)
            for i in range(len_xc_1):
                # Compute the product of a given patch of feature map 1 with the
                # patches of feature map 2 it is to be correlated against;
                # sl1 selects which patches of feature map 2 are considered
                # for the given patch of the first feature map.
                sl1 = slice(int(i / inp_shape[2]) * inp_shape[2],
                            int(i / inp_shape[2]) * inp_shape[2] + inp_shape[2] * self.kernel_size[0])
                block.append(K.reshape(K.batch_dot(xc_2_aggregate[:, sl1, :],
                                                   xc_1_aggregate[:, :, i]),
                                       (-1, 1, 1, inp_shape[2] * self.kernel_size[0])))
            block = K.concatenate(block, axis=1)
            block = K.reshape(block, (-1, output_row, output_col, inp_shape[2] * self.kernel_size[0]))
            output.append(block)
        output = self.activation(output)
        return output
My model is a combination of cross-correlation and Conv2D layers:
import keras

dt = 'float32'

def create_model():
    ip = keras.layers.Input((50, 50, 1))
    ncx1_1 = Normalized_Correlation_Layer(patch_size=(1, 1))([ip, ip])
    ncn1_1 = keras.layers.Conv2D(64, (1, 1), activation='relu', dtype=dt)(ip)
    ncn2_1 = keras.layers.Conv2D(64, (1, 1), activation='relu', dtype=dt)(ncx1_1)
    ncx2_1 = Normalized_Correlation_Layer(patch_size=(1, 1), dtype=dt)([ncn1_1, ncn2_1])
    # ncx2_1 = keras.layers.Reshape((50, 50, 3200))(ncx2_1)
    # Problem occurs here
    ncn3 = keras.layers.Conv2D(filters=64, kernel_size=(1, 1), activation='relu', dtype=dt)(ncx2_1)
    ncn4 = keras.layers.Conv2D(12, (1, 1), activation='sigmoid', dtype=dt)(ncn3)
    model = keras.models.Model(ip, ncn4)
    return model
The model up to the last cross-correlation layer is created successfully, but I get a problem at the ncn3 layer:
ValueError: number of input channels does not match corresponding dimension of filter, 50 != 3200
While creating the model, the output shape of the ncx2_1 layer is printed as (?, 50, 50, 50), both when I print ncx2_1.shape and in the outputs returned from the layer class's call function ([<tf.Tensor 'normalized__correlation__layer_4/Reshape_10000:0' shape=(?, 50, 50, 50) dtype=float32>]).
But the model summary shows it as (?, 50, 50, 3200) when I create the model up to that layer only, i.e. model = keras.models.Model(ip, ncx2_1).
When I reshape the layer using ncx2_1 = keras.layers.Reshape((50, 50, 3200))(ncx2_1), I can create the model successfully, but when I try to fit data on it, I get:
InvalidArgumentError: Input to reshape is a tensor with 6250000 values, but the requested shape has 400000000
[[node reshape_1/Reshape (defined at /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1781) ]]
[[node loss/mul (defined at /usr/local/lib/python3.6/dist-packages/keras/engine/training.py:865) ]]
Here, my batch size is 50, so for a layer with (B, H, W, C) inputs of (50, 50, 50, 50) the size should be 6250000, but for (50, 50, 50, 3200) it should be 400000000, which means that the cross-correlation layer's output actually has 50 channels.
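A quick sanity check of that arithmetic (my own):
import numpy as np
print(np.prod([50, 50, 50, 50]))    # 6250000   -> the size the tensor actually has
print(np.prod([50, 50, 50, 3200]))  # 400000000 -> the size the Reshape requested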
Either I am interpreting this wrong, or I have made a mistake somewhere that I would like to know about.
I am using Keras 2.1.2 with TensorFlow 1.13.1 (that is the version the custom layer was written for, and I was getting other problems with the latest version).
I am also using a custom generator, in case that is relevant, and calling fit via md.fit_generator(train_gen, verbose=1). I can add any other details as necessary.
In the Keras implementation of WaveNet, the input shape is (None, 1). I have a time series (val(t)) in which the target is to predict the next data point given a window of past values (the window size depends on the maximum dilation). The input shape in WaveNet is confusing. I have a few questions about it:
How does Keras figure out the input dimension (None) when a full sequence is given? According to the dilations, we want the input to have a length of 2^8.
If an input series of shape (1M, 1) is given as training X, do we need to generate vectors of 2^8 time steps as input? It seems we can just use the input series as the input of WaveNet (I'm not sure why the raw time series input does not give an error).
In general, how can we debug such Keras networks? I tried to apply the layer to numerical data, like Conv1D(16, 1, padding='same', activation='relu')(inputs); however, it gives an error.
n_filters = 32
filter_width = 2
dilation_rates = [2**i for i in range(7)] * 2
from keras.models import Model
from keras.layers import Input, Conv1D, Dense, Activation, Dropout, Lambda, Multiply, Add, Concatenate
from keras.optimizers import Adam
history_seq = Input(shape=(None, 1))
x = history_seq
skips = []
for dilation_rate in dilation_rates:
    # preprocessing - equivalent to time-distributed dense
    x = Conv1D(16, 1, padding='same', activation='relu')(x)
    # filter
    x_f = Conv1D(filters=n_filters,
                 kernel_size=filter_width,
                 padding='causal',
                 dilation_rate=dilation_rate)(x)
    # gate
    x_g = Conv1D(filters=n_filters,
                 kernel_size=filter_width,
                 padding='causal',
                 dilation_rate=dilation_rate)(x)
    # combine filter and gating branches
    z = Multiply()([Activation('tanh')(x_f),
                    Activation('sigmoid')(x_g)])
    # postprocessing - equivalent to time-distributed dense
    z = Conv1D(16, 1, padding='same', activation='relu')(z)
    # residual connection
    x = Add()([x, z])
    # collect skip connections
    skips.append(z)
# add all skip connection outputs
out = Activation('relu')(Add()(skips))
# final time-distributed dense layers
out = Conv1D(128, 1, padding='same')(out)
out = Activation('relu')(out)
out = Dropout(.2)(out)
out = Conv1D(1, 1, padding='same')(out)
# extract training target at end
def slice(x, seq_length):
    return x[:, -seq_length:, :]
pred_seq_train = Lambda(slice, arguments={'seq_length':1})(out)
model = Model(history_seq, pred_seq_train)
model.compile(Adam(), loss='mean_absolute_error')
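For reference, my own back-of-the-envelope computation of why the window is roughly 2^8: the receptive field of a stack of dilated causal convolutions with kernel size k is 1 + (k - 1) * sum(dilation_rates).
dilation_rates = [2**i for i in range(7)] * 2   # as defined above
filter_width = 2
receptive_field = 1 + (filter_width - 1) * sum(dilation_rates)
print(receptive_field)   # 255, i.e. about 2**8 past steps influence each output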
You are using extreme values for the dilation rates; they don't make sense. Try reducing them, for example to a sequence like [1, 2, 4, 8, 16, 32]. The dilation rates aren't a constraint on the dimension of the input passed.
Your network works when simply passing this input:
import numpy as np

n_filters = 32
filter_width = 2
dilation_rates = [1, 2, 4, 8, 16, 32]
....
model = Model(history_seq, pred_seq_train)
model.compile(Adam(), loss='mean_absolute_error')
n_sample = 5
time_step = 100
X = np.random.uniform(0,1, (n_sample,time_step,1))
model.predict(X)
Specifying a None dimension in Keras leaves the model free to receive any size along that dimension. This does not mean you can mix samples of different dimensions; within a batch they must always have the same format. But you can call the model each time with a different dimension size:
for time_step in np.random.randint(100, 200, 4):
    print('temporal dim:', time_step)
    n_sample = 5
    model = Model(history_seq, pred_seq_train)
    model.compile(Adam(), loss='mean_absolute_error')
    X = np.random.uniform(0, 1, (n_sample, time_step, 1))
    print(model.predict(X).shape)
I also suggest a premade Keras library that provides a WaveNet implementation: https://github.com/philipperemy/keras-tcn. You can use it as a baseline and also study its code to build a WaveNet.
I am building an image classifier with localisation using a CNN.
My CNN takes an image as input; however, after the last conv layer I want to split it in two: one part for image classification, and the other for image localisation.
Needless to say, one part should use mean squared error and the other binary cross-entropy. My structure is something like:
input_image = Input(shape=(IMG_W, IMG_H, 3))
# Layer 1
x = Conv2D(32, (3,3), strides=(1,1), padding='same', name='conv_1', use_bias=False)(input_image)
x = BatchNormalization(name='norm_1')(x)
x = LeakyReLU(alpha=0.1)(x)
# Layer 2
x = Conv2D(64, (3,3), strides=(1,1), padding='same', name='conv_2', use_bias=False)(x)
x = BatchNormalization(name='norm_2')(x)
x = LeakyReLU(alpha=0.1)(x)
Now I want to divide it into two Dense (FC) branches:
class_layer = x
class_layer = Dense(256,activation="relu")(class_layer)
class_layer = Dense(2,activation="softmax")(class_layer)
model_one = Model(input_image,class_layer)
model_one.compile(loss="binary_crossentropy", optimizer=keras.optimizers.Adam(), metrics=['accuracy'])
and a branch for image localisation:
x = Dense(1024,activation="relu")(x)
x = Dense(256,activation="relu")(x)
x = Dense(4,activation="relu")(x)
model = Model(input_image,x)
model.compile(loss="mean_squared_error", optimizer=keras.optimizers.Adam(),metrics=['accuracy'])
However, how can I concatenate the layers so the resulting vector will be (2 + 4)?
Can I even achieve splitting like this?
I know about concatenating layers, but that has to be done before compiling, so each part couldn't have a different loss function.
Thanks for any help and answers.
You can initialize your model with multiple outputs and specify a loss for each of them. Note that with outputs [x, class_layer], the first loss applies to x (localisation) and the second to class_layer (classification). If you want the mean squared error loss to have weight a and the binary cross-entropy loss to have weight b, so your total loss looks like a*mse + b*binary_ce, then you would have something like
model = Model(input_image, [x, class_layer])
model.compile(loss=['mean_squared_error', 'binary_crossentropy'],
              loss_weights=[a, b],
              optimizer=keras.optimizers.Adam())
See the loss and loss_weights parameters in the documentation for Model.compile for more details: https://keras.io/models/model/
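A minimal usage sketch for training such a two-output model (the array names images, boxes, and labels are placeholders of mine, assuming IMG_W = IMG_H = 64 and that each branch ends in a flat vector, e.g. after adding a Flatten or GlobalAveragePooling2D before the Dense layers):
import numpy as np

images = np.random.random((8, 64, 64, 3)).astype('float32')
boxes = np.random.random((8, 4)).astype('float32')   # target for x (MSE)
labels = np.eye(2)[np.random.randint(0, 2, 8)]       # target for class_layer (binary CE)

# Targets are passed as a list in the same order as the model's outputs.
model.fit(images, [boxes, labels], epochs=1, batch_size=4)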