Keras with TensorFlow: Conv net with variable input size - python

I am using Python 3 with Anaconda, and Keras over TensorFlow. My goal is to create a network with a Conv layer that accepts input of variable size.
I found here a suggestion to use this code:
i = Input((None, None, 1))
o = Conv2D(1, 3, 3)(i)
model = Model(i, o)
model.compile('sgd', 'mse')
I have used it to create my own model with this code (I need a Flatten layer):
model = Sequential()
I = Input((None, None, 1))
c = Conv2D(filters=1, kernel_size=(1, 1))(I)
f = Flatten()(c)
o = Dense(10, activation="softmax")(f)
m = Model(I, o)
m.compile(loss=categorical_crossentropy, optimizer=SGD(), metrics=["accuracy"])
And I keep getting this error
ValueError: The shape of the input to "Flatten" is not fully defined
(got (None, None, 1). Make sure to pass a complete "input_shape" or
"batch_input_shape" argument to the first layer in your model.
It seems the issue is with the input shape for the Flatten layer; when I remove it, it's fine.
How can I make it play well with the variable size?
Thanks

Dense needs fixed-size inputs/outputs because its number of weight variables must be fixed.
There are two solutions in your case.
1. Use GAP (Global Average Pooling) instead of Flatten. GAP's output size is the number of channels of the previous layer, so its size is fixed in your case.
2. Use an all-convolutional net that has no Dense layer. In this case the output of the net is two-dimensional, not one-dimensional, so y should have that shape.
The code below was added at Allen M's request.
Here is a code sample:
# A couple of imports so the sample runs on its own
from keras.layers import Input, Conv2D, Dense, GlobalAveragePooling2D
from keras.models import Model

# The original number of Conv filters is one,
# but I set it to 16 to show how GAP works.
# B/H/W mean BatchSize/Height/Width.

# 1. Using GAP
I = Input((None, None, 1))                     # output shape=(B, H(None), W(None), 1)
c = Conv2D(filters=16, kernel_size=(1, 1))(I)  # output shape=(B, H, W, 16)
f = GlobalAveragePooling2D()(c)                # output shape=(B, 16) <- spatial data (H/W) are aggregated by averaging
o = Dense(10, activation="softmax")(f)         # output shape=(B, 10)
m = Model(I, o)

# 2. All conv
I = Input((None, None, 1))                     # output shape=(B, H, W, 1)
c = Conv2D(filters=16, kernel_size=(1, 1))(I)  # output shape=(B, H, W, 16)
o = Conv2D(filters=10, kernel_size=(1, 1), activation="softmax")(c)
# output shape=(B, H, W, 10)
m = Model(I, o)

# The output size of the all-conv model is H * W * 10, where 10 is the number of classes,
# so the shape of y should be (B, H, W, 1), (B, H, W) or (B, H, W, 10).
# That is pixel-wise classification, i.e. semantic segmentation.
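A hedged usage sketch (not from the original answer) of how the GAP variant trains on images of different sizes; note that a single batch must still share one spatial shape, so differently sized images go into separate batches. The 28x28 and 64x64 sizes below are arbitrary examples.
import numpy as np
from keras.layers import Input, Conv2D, Dense, GlobalAveragePooling2D
from keras.models import Model

# Rebuild the GAP model from #1 above so this snippet is self-contained
I = Input((None, None, 1))
c = Conv2D(filters=16, kernel_size=(1, 1))(I)
f = GlobalAveragePooling2D()(c)
o = Dense(10, activation="softmax")(f)
m = Model(I, o)
m.compile(loss="categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])

y = np.zeros((1, 10))
y[0, 3] = 1                                              # one-hot label; class index 3 is arbitrary
m.train_on_batch(np.random.random((1, 28, 28, 1)), y)    # a 28x28 image
m.train_on_batch(np.random.random((1, 64, 64, 1)), y)    # a 64x64 image works with the same model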

The Flatten layer doesn't take an input size as an argument.
model = Sequential()
I = Input((None, None, 1))
c = Conv2D(filters=1, kernel_size=(1, 1))(I)
f = Flatten()
o = Dense(10, activation="softmax")(I)
m = Model(I, o)
m.compile(loss="categorical_crossentropy", optimizer=SGD(), metrics=["accuracy"])
This should solve your problem.

I think the problem is due to your variable input sizes. It says here that you can't vary the input size if you're using a fully connected layer. See: How to train images, when they have different size?

Related

TensorFlow: Convert GRUCell weights from compat.v1 to tensorflow 2

I am trying to convert a saved model in tensorflow 1 to tensorflow 2. I am migrating the code to tensorflow 2, as highlighted in the tensorflow docs. However, I would like to simply update my model_weights.ckpt to tensorflow 2. Some weights (Linear, Embedding) have a similar shape to the tensorflow 2 syntax, but I am struggling to transform the weights from my GRUCell.
How to convert the GRUCell weights from compat.v1.nn.rnn_cell.GRUCell to keras.layers.GRUCell ?
The GRUCell has four weights:
gru_cell/gates/kernel:0 of shape (S + H, 2 x H),
gru_cell/gates/bias:0 of shape (2 x H, ),
gru_cell/candidate/kernel:0 of shape (S + H, H),
gru_cell/candidate/bias:0 of shape (H, )
I would like to have weights with a similar shape to the tensorflow 2 API (or PyTorch API), i.e. a GRUCell with the following weights:
gru_cell/kernel:0 of shape (S, 3 x H)
gru_cell/recurrent_kernel:0 of shape (H, 3 x H)
gru_cell/bias:0 of shape (2, 3 x H)
To illustrate, you can reproduce these results:
1. GRUCell with tensorflow 1 API
import tensorflow as tf
SEQ_LENGTH = 4
HIDDEN_SIZE = 512
BATCH_SIZE = 1
inputs = tf.random.normal([BATCH_SIZE, SEQ_LENGTH])
# GRU cell
gru = tf.compat.v1.nn.rnn_cell.GRUCell(HIDDEN_SIZE)
# Hidden state
state = gru.zero_state(BATCH_SIZE, tf.float32)
# Forward
output, state = gru(inputs, state)
for weight in gru.weights:
    print(weight.name, weight.shape)
Output:
gru_cell/gates/kernel:0 (516, 1024)
gru_cell/gates/bias:0 (1024,)
gru_cell/candidate/kernel:0 (516, 512)
gru_cell/candidate/bias:0 (512,)
2. GRUCell with tensorflow 2 API
import tensorflow as tf
SEQ_LENGTH = 4
HIDDEN_SIZE = 512
BATCH_SIZE = 1
inputs = tf.random.normal([BATCH_SIZE , SEQ_LENGTH])
# GRU cell
gru = tf.keras.layers.GRUCell(HIDDEN_SIZE)
# Hidden state
state = tf.zeros((BATCH_SIZE, HIDDEN_SIZE), dtype=tf.float32)
# Forward
output, state = gru(inputs, state)
# Display the weights
for weight in gru.weights:
    print(weight.name, weight.shape)
Output:
gru_cell/kernel:0 (4, 1536)
gru_cell/recurrent_kernel:0 (512, 1536)
gru_cell/bias:0 (2, 1536)
Note
I tried the _convert_rnn_weights tensorflow function to convert the desired weights. It works, but only for CuDNN weights, so I can't use it in my case.
For the benefit of the community, providing the solution here even though it is presented on GitHub.
In short, the weights of compat.v1.nn.rnn_cell.GRUCell and keras.layers.GRUCell are not compatible with each other. We don't have a function to convert between them, and if you really want to do it, you will need to do it manually.
Math-wise, if you have the numpy values of the v1 weights, the formulas are:
# S = input size and H = state size, matching the (S + H, ...) shapes listed in the question.
S = input_size
H = state_size
# gates_kernel / candidate_kernel / gates_bias / candidate_bias hold the numpy values of the
# v1 weights gru_cell/gates/kernel, gru_cell/candidate/kernel, gru_cell/gates/bias, gru_cell/candidate/bias.
all_kernel = np.concatenate([gates_kernel, candidate_kernel], axis=1)  # shape (S + H, 3 * H)
kernel = all_kernel[:S]                                                 # shape (S, 3 * H)
recurrent_kernel = all_kernel[S:]                                       # shape (H, 3 * H)
all_bias = np.concatenate([gates_bias, candidate_bias], axis=0)         # shape (3 * H,)
bias = np.stack([all_bias, np.zeros(3 * H)], axis=0)                    # shape (2, 3 * H)
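A hedged end-to-end sketch of the manual conversion described above (the variable names and the final set_weights call are illustrative; the gate ordering of the two implementations is assumed to line up, so verify the converted cell numerically against the original before relying on it):
import numpy as np
import tensorflow as tf

SEQ_LENGTH = 4
HIDDEN_SIZE = 512
BATCH_SIZE = 1

# Build the v1 cell (as in the question) and read its weights
gru_v1 = tf.compat.v1.nn.rnn_cell.GRUCell(HIDDEN_SIZE)
inputs = tf.random.normal([BATCH_SIZE, SEQ_LENGTH])
state = gru_v1.zero_state(BATCH_SIZE, tf.float32)
gru_v1(inputs, state)  # forward pass so the variables are created
gates_kernel, gates_bias, candidate_kernel, candidate_bias = [w.numpy() for w in gru_v1.weights]

# Convert, following the formulas above (S = input size, H = state size)
S, H = SEQ_LENGTH, HIDDEN_SIZE
all_kernel = np.concatenate([gates_kernel, candidate_kernel], axis=1)
kernel, recurrent_kernel = all_kernel[:S], all_kernel[S:]
all_bias = np.concatenate([gates_bias, candidate_bias], axis=0)
bias = np.stack([all_bias, np.zeros(3 * H, dtype=all_bias.dtype)], axis=0)

# Build the v2 cell and overwrite its weights with the converted values
gru_v2 = tf.keras.layers.GRUCell(HIDDEN_SIZE)
gru_v2.build((BATCH_SIZE, SEQ_LENGTH))
gru_v2.set_weights([kernel, recurrent_kernel, bias])  # kernel, recurrent_kernel, bias order matches gru_v2.weights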

LSTM layers accepting any input shape when expecting them not to

I'm building a neural network using keras and I'm a little lost on the LSTM layer input shape. Below is an image of the relevant part.
Both towers are similar, with the only difference being that the left accepts sequences of any length and the right only accepts sequences of length 5. This results in their LSTM layers receiving an ambiguous sequence length and a sequence length of 4 respectively, both with 8 features per timestep. I'd thus expect both LSTM layers to have an input_shape of (1,8).
My confusion now comes from the fact that both LSTM layers will accept any input shape without a problem, which is why I think this might not work the way I think it does. I'd expect the right LSTM layer to require an input shape with the first dimension either 1, 2 or 4 as only these sizes would be able to divide the input sequence of 4. Further, I'd expect both to require the second dimension to always be 8.
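A quick check, not part of the original post, of one likely explanation: when a layer is called on a tensor in the functional API, Keras builds it from that tensor's shape, and the input_shape argument passed to the layer's constructor is simply ignored.
from tensorflow.keras.layers import Input, LSTM

x = Input(shape=(4, 32))                   # 4 timesteps, 32 features
lstm = LSTM(units=6, input_shape=(1, 8))   # input_shape here has no effect
y = lstm(x)
print(y.shape)           # (None, 6): the layer was built for 32 input features
print(lstm.input_shape)  # (None, 4, 32), not (None, 1, 8)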
Could someone explain why the LSTM layers can accept any input shape and whether they process the sequences correctly with an input_shape=(1,8)? Below is the relevant code.
# Tower 1
inp_sentence1 = Input(shape=(None, 300, 1))
conv11 = Conv2D(32, (2, 300))(inp_sentence1)
reshape11 = K.squeeze(conv11, 2)
maxpl11 = MaxPooling1D(4, data_format='channels_first')(reshape11)
lstm11 = LSTM(units=6, input_shape=(1,8))(maxpl11)
# Tower 2
inp_sentence2 = Input(shape=(5, 300, 1))
conv21 = Conv2D(32, (2, 300))(inp_sentence2)
reshape21 = Reshape((4,32))(conv21)
maxpl21 = MaxPooling1D(4, data_format='channels_first')(reshape21)
lstm21 = LSTM(units=6, input_shape=(1,8))(maxpl21)
EDIT: Short reproduction of problem on dummy data:
# Tower 1
inp_sentence1 = Input(shape=(None, 300, 1))
conv11 = Conv2D(32, (2, 300))(inp_sentence1)
reshape11 = K.squeeze(conv11, 2)
maxpl11 = MaxPooling1D(4, data_format='channels_first')(reshape11)
lstm11 = LSTM(units=6, input_shape=(1,8))(maxpl11)
# Tower 2
inp_sentence2 = Input(shape=(5, 300, 1))
conv21 = Conv2D(32, (2, 300))(inp_sentence2)
reshape21 = Reshape((4,32))(conv21)
maxpl21 = MaxPooling1D(4, data_format='channels_first')(reshape21)
lstm21 = LSTM(units=6, input_shape=(1,8))(maxpl21)
# Combine towers
substract = Subtract()([lstm11, lstm21])
dense = Dense(16, activation='relu')(substract)
final = Dense(1, activation='sigmoid')(dense)
# Build model
model = Model([inp_sentence1, inp_sentence2], final)
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
# Create data
random_length = random.randint(2, 10)
x1 = numpy.random.random((100, random_length, 300))
x2 = numpy.random.random((100, 5, 300))
y = numpy.random.randint(2, size=100)
# Train and predict on data
model.fit([x1, x2], y, epochs=10, batch_size=5)
prediction = model.predict([x1, x2])
prediction = [round(x) for [x] in prediction]
classification = prediction == y
print("accuracy:", sum(classification)/len(prediction))

"Only one input size may be -1, not both 0 and 2" error in Keras

This is a summary of my model.
My model is basically similar to a convolutional network.
I want my model to work regardless of the width of the input, so the width appears as None.
And I want to attach a decoder to my model.
However, when I attach the decoder, an error occurs. (If I don't attach the decoder, the program works fine.)
This is my decoder part (below):
if args.decoder == True:
    decoder = ConvCapsuleLayer(kernel_size=args.kernel_size, num_capsule=4, num_atoms=1, strides=1, padding='same',
                               routings=1)(conv_cap)
    _, H, W, C, A = decoder.get_shape()
    y = layers.Input(shape=(n_class,))
    masked_by_y = Mask()([decoder, y])
    masked = Mask()(decoder)

def shared_decoder(mask_layer):
    recon_1 = layers.Conv2DTranspose(4, (5,5), strides=(2, 2), padding='same', kernel_initializer='he_normal', name='decoder_1', activation='relu')(mask_layer)
    recon_2 = layers.Conv2DTranspose(8, (5,5), strides=(2, 2), padding='same', kernel_initializer='he_normal', name='decoder_2', activation='relu')(recon_1)
    recon_3 = layers.Conv2DTranspose(1, (1,1), strides=(1, 1), padding='same', kernel_initializer='he_normal', name='decoder_3', activation='linear')(recon_2)
    return recon_3

if args.decoder == True:
    train_model = models.Model(inputs=[x, y], outputs=[out_seg, shared_decoder(masked_by_y)])  # [x: image, y: mask] // [out_seg: length, reconstruction output]
    eval_model = models.Model(x, [out_seg, shared_decoder(masked)])
else:
    train_model = models.Model(inputs=x, outputs=out_seg)
    eval_model = models.Model(inputs=x, outputs=out_seg)
return train_model, eval_model
mask_1 is my Mask layer.
If a label is given, only the channel of the label is returned (masked_by_y).
If a label is not given, this layer returns only the channel with the largest sum of element values in conv_capsule_layer_1 (masked).
The shape of conv_capsule_layer_1 is (batch_size = None, height = 50, width = None, num_channel = 4, 1).
That is, the mask layer returns the channel having the largest sum of element values among the four channels.
Then Conv2DTranspose is applied to the returned value (the output of the mask layer) to bring it back to the size of the original input.
However, the following error occurs
InvalidArgumentError (see above for traceback): Only one input size may be -1, not both 0 and 2
[[Node: mask_1/Reshape_1 = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](mask_1/boolean_mask/Gather, mask_1/Reshape_1/shape)]]
How can I make the length variable without using -1? I already tried non_zero_masked = K.reshape(non_zero, [-1, masked.shape[1], masked.shape[2], 1]).
This is the call function in my Mask layer:
def call(self, inputs, **kwargs):
    if type(inputs) is list:  # true label is provided with shape = [None, n_classes], i.e. one-hot code.
        assert len(inputs) == 2
        inputs, mask = inputs
        inputs = K.squeeze(inputs, axis=-1)  # [batch, input_height, input_width, num_cap, num_atom] -> [batch, input_height, input_width, num_cap]
    else:  # if no true label, mask by the max length of capsules. Mainly used for prediction
        inputs = K.squeeze(inputs, axis=-1)  # [batch, input_height, input_width, num_cap]
        x = K.softmax(K.sqrt(K.sum(K.square(inputs), axis=(1,2)) + K.epsilon()))  # x: [batch, 4]
        mask = K.one_hot(indices=K.argmax(x, 1), num_classes=x.get_shape().as_list()[1])  # mask: [batch, 4]
    expand_mask = K.reshape(mask, [-1, 1, 1, mask.shape[1]])  # [batch_size, 1, 1, num_class]
    masked = inputs * expand_mask
    non_zero = tf.boolean_mask(masked, tf.not_equal(masked, 0))
    non_zero_masked = K.reshape(non_zero, [-1, masked.shape[1], -1, 1])
    return non_zero_masked
Does anybody know why this error is happening? How can I solve it?
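One possible direction, sketched here as an assumption rather than a tested fix: instead of a second -1, read the dynamic width from the input tensor with tf.shape and build the target shape explicitly. This presumes the mask keeps exactly one of the num_cap channels, so the masked values reshape cleanly to a single channel.
# Sketch: derive the reshape target from dynamic dimensions instead of a second -1.
# `inputs` here is the tensor already squeezed to [batch, H, W, num_cap] inside call().
dyn = tf.shape(inputs)                             # dynamic shape [batch, H, W, num_cap]
new_shape = tf.stack([-1, dyn[1], dyn[2], 1])      # only one -1 remains
non_zero_masked = tf.reshape(non_zero, new_shape)  # [batch, H, W, 1]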

How to implement a 1D convolutional neural network with residual connections and batch-normalization in Keras?

I am trying to develop a 1D convolutional neural network with residual connections and batch-normalization based on the paper Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks, using keras.
This is the code so far:
# define model
x = Input(shape=(time_steps, n_features))
# First Conv / BN / ReLU layer
y = Conv1D(filters=n_filters, kernel_size=n_kernel, strides=n_strides, padding='same')(x)
y = BatchNormalization()(y)
y = ReLU()(y)
shortcut = MaxPooling1D(pool_size = n_pool)(y)
# First Residual block
y = Conv1D(filters=n_filters, kernel_size=n_kernel, strides=n_strides, padding='same')(y)
y = BatchNormalization()(y)
y = ReLU()(y)
y = Dropout(rate=drop_rate)(y)
y = Conv1D(filters=n_filters, kernel_size=n_kernel, strides=n_strides, padding='same')(y)
# Add Residual (shortcut)
y = add([shortcut, y])
# Repeated Residual blocks
for k in range(2, 3):  # smaller network for testing
    shortcut = MaxPooling1D(pool_size=n_pool)(y)
    y = BatchNormalization()(y)
    y = ReLU()(y)
    y = Dropout(rate=drop_rate)(y)
    y = Conv1D(filters=n_filters * k, kernel_size=n_kernel, strides=n_strides, padding='same')(y)
    y = BatchNormalization()(y)
    y = ReLU()(y)
    y = Dropout(rate=drop_rate)(y)
    y = Conv1D(filters=n_filters * k, kernel_size=n_kernel, strides=n_strides, padding='same')(y)
    y = add([shortcut, y])
z = BatchNormalization()(y)
z = ReLU()(z)
z = Flatten()(z)
z = Dense(64, activation='relu')(z)
predictions = Dense(classes, activation='softmax')(z)
model = Model(inputs=x, outputs=predictions)
# Compiling
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['categorical_accuracy'])
# Fitting
model.fit(train_x, train_y, epochs=n_epochs, batch_size=n_batch)
And this is the graph of a simplified model of what I am trying to build.
The model described in the paper uses an incrementing number of filters:
The network consists of 16 residual blocks with 2 convolutional layers per block. The convolutional layers all have a filter length of 16 and have 64k filters, where k starts out as 1 and is incremented every 4-th residual block. Every alternate residual block subsamples its inputs by a factor of 2, thus the original input is ultimately subsampled by a factor of 2^8. When a residual block subsamples the input, the corresponding shortcut connections also subsample their input using a Max Pooling operation with the same subsample factor.
But I can only make it work if I use the same number of filters in every Conv1D layer, with k=1, strides=1 and padding='same', and without applying any MaxPooling1D. Any change to these parameters causes a tensor size mismatch, and compilation fails with the following error:
ValueError: Operands could not be broadcast together with shapes (70, 64) (70, 128)
Does anyone have any idea on how to fix this size mismatch and make it work?
In addition, if the input has more than one channel (or feature), the mismatch is even worse! Is there a way to deal with more than one channel?
The tensor shape mismatch is happening in the add([shortcut, y]) layer. Because you are using a MaxPooling1D layer, the shortcut's time steps are halved by default (you can change this with the pool_size parameter), while your residual branch does not reduce the time steps by the same amount. You should apply strides=2 with padding='same' in one of the Conv1D layers (preferably the last one) before adding shortcut and y.
For reference, you can check out the ResNet code here: Keras-applications-github
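A hedged sketch of one residual block along those lines (not taken from the answer above): the last Conv1D subsamples with strides=2 to match the MaxPooling1D shortcut, and, because the error also shows a channel mismatch (64 vs 128), the shortcut gets a 1x1 Conv1D to match the filter count, as the referenced ResNet code does. The names n_filters, n_kernel and drop_rate are the question's variables.
from tensorflow.keras.layers import Conv1D, BatchNormalization, ReLU, Dropout, MaxPooling1D, add

def residual_block(y, n_filters, n_kernel, drop_rate):
    # Shortcut: halve the time steps and match the channel count with a 1x1 conv
    # (assumes the time dimension is even at every subsampling step)
    shortcut = MaxPooling1D(pool_size=2)(y)
    shortcut = Conv1D(filters=n_filters, kernel_size=1, padding='same')(shortcut)
    # Residual branch: the last conv uses strides=2 so its time dimension matches the shortcut
    y = BatchNormalization()(y)
    y = ReLU()(y)
    y = Dropout(rate=drop_rate)(y)
    y = Conv1D(filters=n_filters, kernel_size=n_kernel, padding='same')(y)
    y = BatchNormalization()(y)
    y = ReLU()(y)
    y = Dropout(rate=drop_rate)(y)
    y = Conv1D(filters=n_filters, kernel_size=n_kernel, strides=2, padding='same')(y)
    return add([shortcut, y])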

How to get number of neurons in TensorFlow layer?

Suppose I am trying to connect the output of a pooling layer to a dense layer. In order to do this, I need to flatten the pooled tensor. Consider the layers below:
def conv_layer(input, in_channels, out_channels, name="conv"):
    w = tf.get_variable("W", initializer=tf.truncated_normal([3, 3, in_channels, out_channels], stddev=0.1))
    b = tf.get_variable("B", initializer=tf.constant(0.1, shape=[out_channels]))
    conv = tf.nn.conv2d(input, w, strides=[1,1,1,1], padding="SAME")
    act = tf.nn.relu(conv + b)
    return act

def pool_layer(input, name="pool"):
    pool = tf.nn.max_pool(input, ksize=[1,2,2,1], strides=[1,2,2,1], padding="SAME")
    return pool

def dense_layer(input, size_in, size_out, name="dense"):
    w = tf.get_variable("W", initializer=tf.truncated_normal([size_in, size_out], stddev=0.1))
    b = tf.get_variable("B", initializer=tf.constant(0.1, shape=[size_out]))
    act = tf.nn.relu(tf.matmul(input, w) + b)
    return act
I am using them to create a network:
def cnn_model(x):
    x_image = tf.reshape(x, [-1, nseries, present_window, 1])
    conv1 = conv_layer(x_image, 1, 32, "conv1")
    pool1 = pool_layer(conv1, "pool1")
    conv2 = conv_layer(pool1, 32, 64, "conv2")
    pool2 = pool_layer(conv2, "pool2")
    nflat = 17*15*64  # hard-coded
    flat = tf.reshape(pool2, [-1, nflat])
    yhat = dense_layer(flat, nflat, future_window, "dense1")
    return yhat
As you can see, I am hard-coding the variable nflat. How can I avoid this?
If it's a tensor, pool.get_shape() should work in Keras or TensorFlow.
It returns the size of each dimension, so you need to pick the ones you need from it (in your case, everything except the batch dimension).
If input is actually your raw input (without any other layer), why are you max-pooling? Aren't you looking for dropout?
Indeed, you will run into a problem if your batch size is variable, since there's no way of telling the model the size of the reshape.
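A hedged sketch of how nflat could be computed from the pooled tensor itself instead of being hard-coded, assuming the spatial dimensions are static as in the question:
import numpy as np

# Inside cnn_model, after pool2 is created:
nflat = int(np.prod(pool2.get_shape().as_list()[1:]))  # product of H * W * C, batch dimension excluded
flat = tf.reshape(pool2, [-1, nflat])
yhat = dense_layer(flat, nflat, future_window, "dense1")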
