I have an input shape of 64x60x4 for training a reinforcement learning agent to play Mario.
The problem is, it seems very "if screen looks like this then do that", which isn't very good for this problem.
I want to add an LSTM layer after 3 conv2D layers in Keras (TensorFlow) but it complains that it expects 5 dimensions, but received 4. When I play with the layers, it then becomes 6 and 5.
So how do I get an LSTM layer into the following model with input_shape 64x60x4 (the 4 being the last 4 frames for helping learn acceleration and direction of objects):
image_input = Input(shape=input_shape)
out = Conv2D(filters=32, kernel_size=8, strides=(4, 4), padding=padding, activation='relu')(image_input)
out = Conv2D(filters=64, kernel_size=4, strides=(2, 2), padding=padding, activation='relu')(out)
out = Conv2D(filters=64, kernel_size=4, strides=(1, 1), padding=padding, activation='relu')(out)
out = MaxPooling2D(pool_size=(2, 2))(out)
out = Flatten()(out)
out = Dense(256, activation='relu')(out)
### LSTM should go here ###
q_value = Dense(num_actions, activation='linear')(out)
Any other suggestions/pointers for this would be welcome.
I would suggest something like this, after your MaxPooling layer:
out = Reshape((64, -1))(out)
out = LSTM(...)(out)
out = Flatten...
Also I don't recommend starting with 32 filters then going up, I suggest starting with 64 then going down, but hey, you do you.
Also I would suggest separate CNN layers for different aspects, like score, time...etc. Other than that, all is set.
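Another possible way to get a recurrent layer into the model from the question is to treat the four stacked frames as four time steps instead of four channels, so that every frame passes through the same convolutional stack before an LSTM sees the sequence. The sketch below is only an illustration under assumptions that are not in the question: an input of shape (4, 64, 60, 1), padding="same", an LSTM width of 128 and num_actions = 6 are all placeholders.

```python
from tensorflow import keras
from tensorflow.keras import layers

num_actions = 6                      # assumption: size of the action space
frames, height, width = 4, 64, 60    # assumption: frames become the time axis

# Input is (time, height, width, channels) instead of (height, width, 4).
image_input = keras.Input(shape=(frames, height, width, 1))

# TimeDistributed applies the wrapped layer to every frame independently.
out = layers.TimeDistributed(
    layers.Conv2D(32, 8, strides=4, padding="same", activation="relu"))(image_input)
out = layers.TimeDistributed(
    layers.Conv2D(64, 4, strides=2, padding="same", activation="relu"))(out)
out = layers.TimeDistributed(
    layers.Conv2D(64, 4, strides=1, padding="same", activation="relu"))(out)
out = layers.TimeDistributed(layers.MaxPooling2D(pool_size=(2, 2)))(out)
out = layers.TimeDistributed(layers.Flatten())(out)   # (batch, time, features)
out = layers.TimeDistributed(layers.Dense(256, activation="relu"))(out)

# The LSTM consumes the 3D tensor (batch, time, features).
out = layers.LSTM(128)(out)                           # (batch, 128)
q_value = layers.Dense(num_actions, activation="linear")(out)

model = keras.Model(image_input, q_value)
```

With this layout the "expected 5 dimensions" message goes away because the batch of inputs really is 5D: (batch, time, height, width, channels).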
I'm new to Deep Learning and I can't find anywhere how to do the bottleneck in my AE with convolutional and dense layers. The code below is the specific part where I'm struggling:
encoded = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
# encoded = Dense(2)(encoded) # Linear activation function at the bottleneck
decoded = Conv2D(8, (3, 3), activation='relu', padding='same')(decoded)
I tried some solutions, like flatten and reshape, but nothing seems to work here. The point is that I need the latent space to be a dense layer of 2 because I need to sample points [x,y] from it. I did it with MLP following this link (https://www.kaggle.com/code/apapiu/manifold-learning-and-autoencoders/notebook) and it worked, but I can't manage to do the same with my structure.
Thanks in advance, and best regards!
Convolution2D expects an input tensor with at least 4 dimensions (batch, height, width, channels), so you need to reshape the input before passing it to the Convolution2D layer. You can use a model like the one below.
input_img = Input(shape=(784,))
input_img1 = Reshape(target_shape=(28,28,1))(input_img)
encoded = Convolution2D(8, (3, 3), activation='relu', padding='same')(input_img1)
encoded = Dense(2)(encoded)
decoded1 = Convolution2D(8, (3, 3), activation='relu', padding='same')(encoded)
decoded2 = Flatten()(decoded1)
decoded = Dense(784,)(decoded2)
Please refer to this gist for complete code with random data.
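If the latent space has to be exactly two numbers per image, one option is to flatten the convolutional feature map before the Dense(2) layer and expand it again before the decoder convolutions. The following is a minimal sketch under that assumption; the 28x28x1 input, the filter counts, the pooling step and the layer name "latent" are illustrative and not taken from the question.

```python
from tensorflow import keras
from tensorflow.keras import layers

input_img = keras.Input(shape=(28, 28, 1))

# Encoder: convolution, then collapse the feature map to a 2D latent point.
x = layers.Conv2D(8, (3, 3), activation="relu", padding="same")(input_img)
x = layers.MaxPooling2D((2, 2))(x)                 # (14, 14, 8)
x = layers.Flatten()(x)                            # (1568,)
latent = layers.Dense(2, name="latent")(x)         # linear bottleneck [x, y]

# Decoder: expand back to a feature map, then convolve.
x = layers.Dense(14 * 14 * 8, activation="relu")(latent)
x = layers.Reshape((14, 14, 8))(x)
x = layers.UpSampling2D((2, 2))(x)                 # (28, 28, 8)
x = layers.Conv2D(8, (3, 3), activation="relu", padding="same")(x)
decoded = layers.Conv2D(1, (3, 3), activation="sigmoid", padding="same")(x)

autoencoder = keras.Model(input_img, decoded)
encoder = keras.Model(input_img, latent)           # maps images to [x, y]
```

After training the autoencoder, the encoder model shares the same weights and can be used to project images onto the 2D latent plane.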
I am trying to configure a network for character recognition of sequential data like license plates.
Now I would like to use the architecture which is noted in Table 3 in Deep Automatic Licence Plate Recognition system (link: http://www.ee.iisc.ac.in/people/faculty/soma.biswas/Papers/jain_icgvip2016_alpr.pdf).
The architecture the authors presented is this one:
The first layers are very common, but where I was stumbling was the top (the part in the red frame) of the architecture. They mention 11 parallel layers and I am really unsure how to get this in Python. I coded this architecture but it does not seem to be right to me.
model = Sequential()
model.add(Conv2D(64, kernel_size=(5, 5), input_shape = (32, 96, 3), activation = "relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, kernel_size=(3, 3), activation = "relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(256, kernel_size=(3, 3), activation = "relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dense(1024, activation = "relu"))
model.add(Dense(11*37, activation="Softmax"))
model.add(keras.layers.Reshape((11, 37)))
Could someone help? How do I have to code the top to get an equal architecture like the authors?
The code below can build the architecture described in the image.
import tensorflow as tf
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Conv2D, Flatten, MaxPooling2D, Dense, Input, Reshape, Concatenate, Dropout

def create_model(input_shape=(32, 96, 1)):
    input_img = Input(shape=input_shape)
    # Add the ST Layer here.
    model = Conv2D(64, kernel_size=(5, 5), activation="relu")(input_img)
    model = MaxPooling2D(pool_size=(2, 2))(model)
    model = Dropout(0.25)(model)
    model = Conv2D(128, kernel_size=(3, 3), activation="relu")(model)
    model = MaxPooling2D(pool_size=(2, 2))(model)
    model = Dropout(0.25)(model)
    model = Conv2D(256, kernel_size=(3, 3), activation="relu")(model)
    model = MaxPooling2D(pool_size=(2, 2))(model)
    model = Dropout(0.25)(model)
    model = Flatten()(model)
    backbone = Dense(1024, activation="relu")(model)
    # 11 independent classifiers, each reading the shared backbone output
    branches = []
    for i in range(11):
        branches.append(Dense(37, activation="softmax", name="branch_" + str(i))(backbone))
    output = Concatenate(axis=1)(branches)
    output = Reshape((11, 37))(output)
    model = Model(input_img, output)
    return model
From my understanding, your implementation is almost correct. The authors train 11 individual classifiers taking as input the output from the Fully Connected Layer. Here, you can think of "parallel" as "independent".
However, you cannot apply the Softmax activation right after the Fully Connected Layer. Since all the classifiers are independent, we want each of them to output a probability for each possible character. Putting things differently, we want the sum of the outputs of each classifier to be 1. Hence, the correct implementation would be:
model.add(Dense(1024, activation = "relu"))
# Feeding every neuron with the previous layer's output
model.add(Dense(11 * 37))
model.add(keras.layers.Reshape((11, 37)))
# Softmax over the last axis, so each of the 11 classifiers sums to 1
model.add(keras.layers.Softmax(axis=-1))
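To see why the placement of the softmax matters, the plain-Python sketch below compares a softmax over all 11 * 37 outputs at once with a softmax applied separately to each group of 37. The logits are made-up numbers standing in for one image's raw scores, not real model outputs.

```python
import math

def softmax(values):
    """Numerically stable softmax of a flat list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

num_branches, num_classes = 11, 37

# Hypothetical raw scores for one image (stand-in for a Dense(11 * 37) output).
logits = [((7 * i) % 13) / 3.0 for i in range(num_branches * num_classes)]

# Reshape to 11 rows of 37 scores, one row per character position.
rows = [logits[r * num_classes:(r + 1) * num_classes] for r in range(num_branches)]

# Softmax per row: each classifier gets its own probability distribution.
per_branch = [softmax(row) for row in rows]

# Softmax over everything at once: all 407 outputs share one distribution,
# so no single character position sums to 1.
joint = softmax(logits)
row_sums_joint = [sum(joint[r * num_classes:(r + 1) * num_classes])
                  for r in range(num_branches)]
```

Each entry of per_branch sums to 1, whereas the entries of row_sums_joint only sum to 1 when added together across all 11 positions.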
I would like to train an autoencoder by using only specific PARTS of a layer (the layer named FEATURES in the autoencoder example at the bottom of this question).
In my case, NOK pictures for a new product are very rare, but needed for training. The aim is to generate NOK pictures from OK pictures (all examples I found did the opposite). The idea is to force learning OK-picture structure in features[0:n-x] and learning NOK-picture structure (maybe from a similar product) in features[n-x:n], in order to use the NOK-features as parameters to generate NOK-pictures from OK-pictures.
Two ideas came to my mind using a non-random dropout
(1) keras.layers.Dropout(rate, noise_shape=None, seed=None) has the noise_shape argument, but I am not sure if it helps me as it only describes the shape. It would be perfect to be able to provide a mask consisting of {0,1} to apply on the layer in order to switch on/off specific nodes
(2) creating a custom layer (named MaskLayer below) which performs masking specific nodes of the layer e.g. as a tuple of {0,1}.
I have read this, but I do not think it applies (it generates a layer by concatenating layers which can be frozen separately).
def autoEncGenerate0(imgSizeX=28, imgSizeY=28, imgDepth=1):
    ''' keras blog autoencoder'''
    input_img = Input(shape=(imgSizeX, imgSizeY, imgDepth))
    x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
    x = MaxPooling2D((4, 4), padding='same')(x)
    x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    encoded0 = MaxPooling2D((8, 8), padding='same', name="FEATURES")(x)
    encoded1 = MaskLayer(mask)(encoded0) # TO BE DONE (B2) masking layer parts
    x = Conv2D(32, (3, 3), activation='relu', padding='same')(encoded1)
    x = UpSampling2D((8, 8))(x)
    x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    x = UpSampling2D((4, 4))(x)
    decoded = Conv2D(imgDepth, (3, 3), activation='sigmoid', padding='same')(x)
    autoencoder = Model(input_img, decoded)
    autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
    return autoencoder
Thanks for hints.
Every instance of tf.keras.layers.Layer has a trainable attribute; setting it to False disables training of that layer's variables. Layers such as MaxPooling2D and UpSampling2D don't have any variables, so you CAN'T train them. What you want is to train the variables of the convolutional layer that comes before the FEATURES layer.
You could do it like this:
# define architecture here
autoencoder = Model(input_img, decoded)
layers_names = [l.name for l in autoencoder.layers]
trainable_layer_index = layers_names.index('FEATURES') - 1
for i in range(len(autoencoder.layers)):
    if i != trainable_layer_index:
        autoencoder.layers[i].trainable = False
# compile here
NOTE that you compile the model AFTER you set layers to trainable/non-trainable.
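The custom layer from idea (2) in the question could look roughly like the sketch below. This is an assumption about what such a MaskLayer might be, not an existing Keras layer: it multiplies its input by a fixed {0, 1} mask that must broadcast against the input, so masked units output 0 and pass no gradient back. The 4-channel feature tensor and the mask values are made up for illustration.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

class MaskLayer(keras.layers.Layer):
    """Multiplies its input by a fixed {0, 1} mask (no trainable weights)."""

    def __init__(self, mask, **kwargs):
        super().__init__(**kwargs)
        # The mask must broadcast against the input, e.g. shape (channels,).
        self.mask = np.asarray(mask, dtype="float32")

    def call(self, inputs):
        # Masked positions become 0, so no gradient flows back through them.
        return inputs * tf.constant(self.mask)

# Hypothetical example: keep the first 3 of 4 feature channels.
mask = [1, 1, 1, 0]
features = tf.ones((2, 1, 1, 4), dtype=tf.float32)
masked = MaskLayer(mask)(features).numpy()
```

Training once with a mask that keeps features[0:n-x] and once with the complementary mask would be one way to try the split described in the question.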
I'm researching the possibility of implementing a CNN in order to classify images as "good" or "bad" but am having no luck with my current architecture.
Characteristics that denote a "bad" image:
Incorrect white balance
Would it be feasible to implement a neural network to classify images based on these characteristics or is it best left to a traditional algorithm that simply looks at the variance in brightness/contrast throughout an image and classifies it that way?
I have attempted training a CNN using the VGGNet architecture but I always seem to get a biased and unreliable model, regardless of the number of epochs or number of steps.
My current model's architecture is very simple (as I am new to the whole machine learning world) but seemed to work fine with other classification problems, and I have modified it slightly to work better with this binary classification problem:
# CONV => RELU => POOL layer set
# define convolutional layers, use "ReLU" activation function
# and reduce the spatial size (width and height) with pool layers
model.add(Conv2D(32, (3, 3), padding="same", input_shape=input_shape)) # 32 3x3 filters (height, width, depth)
model.add(Dropout(0.25)) # helps prevent overfitting (25% of neurons disconnected randomly)
# (CONV => RELU) * 2 => POOL layer set (increasing number of layers as you go deeper into CNN)
model.add(Conv2D(64, (3, 3), padding="same", input_shape=input_shape)) # 64 3x3 filters
model.add(Conv2D(64, (3, 3), padding="same", input_shape=input_shape)) # 64 3x3 filters
model.add(Dropout(0.25)) # helps prevent overfitting (25% of neurons disconnected randomly)
# (CONV => RELU) * 3 => POOL layer set (input volume size becoming smaller and smaller)
model.add(Conv2D(128, (3, 3), padding="same", input_shape=input_shape)) # 128 3x3 filters
model.add(Conv2D(128, (3, 3), padding="same", input_shape=input_shape)) # 128 3x3 filters
model.add(Conv2D(128, (3, 3), padding="same", input_shape=input_shape)) # 128 3x3 filters
model.add(Dropout(0.25)) # helps prevent overfitting (25% of neurons disconnected randomly)
# only set of FC => RELU layers
# sigmoid classifier (output layer)
Are there any glaring omissions or mistakes in this model, or can I simply not solve this problem using deep learning (with my current GPU, a GTX 970)?
Thanks for your time and experience,
Here is my code for compiling/training the model:
# initialise the model and optimiser
print("[INFO] Training network...")
opt = SGD(lr=initial_lr, decay=initial_lr / epochs)
model.compile(loss="sparse_categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
# set up checkpoints
model_name = "output/50_epochs_{epoch:02d}_{val_acc:.2f}.model"
checkpoint = ModelCheckpoint(model_name, monitor='val_acc', verbose=1,
save_best_only=True, mode='max')
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2,
patience=5, min_lr=0.001)
tensorboard = TensorBoard(log_dir="logs/{}".format(time()))
callbacks_list = [checkpoint, reduce_lr, tensorboard]
# train the network
H = model.fit_generator(training_set, steps_per_epoch=500, epochs=50, validation_data=test_set, validation_steps=150, callbacks=callbacks_list)
Independently of any other advice (including the answer already provided), and assuming classes=2 (which you don't clarify - there is a reason we ask for an MCVE here), you seem to be making a fundamental mistake in your final layer, i.e.:
# sigmoid classifier (output layer)
A sigmoid activation is suitable only if your final layer consists of a single node; if classes=2, as I suspect, based also on your puzzling statement in the comments that
with three different images, my results are 0.987 bad and 0.999 good
I was giving you the predictions from the model previously
you should use a softmax activation, i.e.
Alternatively, you could use sigmoid, but your final layer should consist of a single node, i.e.
The latter is usually preferred in binary classification settings, but the results should be the same in principle.
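The claim that both options give the same result in principle can be checked with a few lines of plain Python: a softmax over two logits assigns the second class the same probability as a sigmoid applied to the difference of the two logits. The logit values below are made up.

```python
import math

def softmax2(z_bad, z_good):
    """Softmax over exactly two logits, returning (p_bad, p_good)."""
    peak = max(z_bad, z_good)
    e_bad, e_good = math.exp(z_bad - peak), math.exp(z_good - peak)
    return e_bad / (e_bad + e_good), e_good / (e_bad + e_good)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z_bad, z_good = 0.3, 1.7          # made-up logits for the two classes
p_bad, p_good = softmax2(z_bad, z_good)

# A single sigmoid node only needs the difference between the two logits.
p_good_single = sigmoid(z_good - z_bad)
```

The two softmax outputs always sum to 1, which is why two sigmoid outputs of 0.987 and 0.999 for the same image point to a mismatch between the activation and the number of output nodes.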
UPDATE (after updating the question):
sparse_categorical_crossentropy is not the correct loss here, either.
All in all, try the following changes:
model.compile(loss="binary_crossentropy", optimizer=Adam(), metrics=["accuracy"])
# final layer:
with Adam optimizer (needs import). Also, dropout should not be used by default - see this thread; start without it and only add if necessary (i.e. if you see signs of overfitting).
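A minimal sketch of the single-node variant, assuming two classes encoded as 0/1 labels, might look like this. The tiny convolutional body and the 64x64x3 input are hypothetical stand-ins; only the last layer and the compile call are the point.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.optimizers import Adam

# Hypothetical small body; only the last layer and compile call matter here.
model = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation="sigmoid"),   # single node: P(class 1)
])
model.compile(loss="binary_crossentropy", optimizer=Adam(), metrics=["accuracy"])

# Labels must then be 0/1 scalars, e.g. class_mode="binary" for Keras generators.
probs = model(tf.zeros((2, 64, 64, 3))).numpy()
```

Each prediction is a single number between 0 and 1, and the probability of the other class is one minus that value.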
I suggest you use transfer learning instead of training the whole network from scratch.
Use weights trained on a huge dataset like ImageNet.
You can easily do this in Keras: import a model with pretrained weights, such as Xception, and replace the last layer (which represents the 1000 classes of the ImageNet dataset) with a Dense layer of 2 nodes, because you have only 2 classes. Set trainable=False for the base layers and trainable=True for the custom layers you added, such as the 2-node Dense layer.
You can then train the model in the usual way.
Demo code:
from keras.applications import Xception
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
# img_width and img_height are the dimensions of your input images
base_model = Xception(input_shape=(img_width, img_height, 3), weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
predictions = Dense(2, activation='softmax')(x)
model = Model(base_model.input, predictions)
# freezing the base layer weights
for layer in base_model.layers:
    layer.trainable = False
I'm trying to identify the sequence of images. I've 2 images and I need to identify the 3rd one. All are color images.
I'm getting the error below:
ValueError: Error when checking input: expected
time_distributed_1_input to have 5 dimensions, but got array with
shape (32, 128, 128, 6)
This is my layer:
batch_size = 32
height = 128
width = 128
model = Sequential()
model.add(TimeDistributed(Conv2D(32, (3, 3), activation = 'relu'), input_shape=(batch_size, height, width, 2 * 3)))
model.add(TimeDistributed(MaxPooling2D(2, 2)))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(LSTM(256, return_sequences=True, dropout=0.5))
model.add(Conv2D(3, (3, 3), activation='relu', padding='same'))
My input images shapes are:
(128, 128, 2*3) [as I'm concatenating 2 input images]
My output image shape is:
(128, 128, 3)
You have applied the conv layer after Flatten(). This causes an error because, after flattening, the data flowing through the network is no longer a 2D object.
I suggest you keep the convolutional and the recurrent phases separated. First, you apply convolution to the images, training the model to extract their relevant features. Later, you push these features into LSTM layers, so that you can also capture the information hidden in their sequence.
Hope this helps, otherwise let me know.
According to the error that you get, it seems that you are also not feeding the right input shape. Keras is saying: "I need 5 dimensions, but you gave me 4". A TimeDistributed() layer needs a shape such as: (sample, time, width, length, channel). Your input lacks the time dimension, apparently.
I suggest you print your model.summary() before running, and check the layer called time_distributed_1_input. That's the one Keras is complaining about.
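One way to supply the missing time dimension is to rearrange the channel-concatenated array so that the two images become two time steps. The sketch below assumes the six channels are simply the RGB channels of the first image followed by those of the second; the random data and the batch of 4 are placeholders. Note that the input shape given to Keras excludes the batch size.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

batch, height, width = 4, 128, 128

# Two RGB images concatenated on the channel axis: (batch, 128, 128, 6).
first = np.random.rand(batch, height, width, 3).astype("float32")
second = np.random.rand(batch, height, width, 3).astype("float32")
concatenated = np.concatenate([first, second], axis=-1)

# Split the 6 channels back into 2 frames of 3 channels and move the
# frame axis next to the batch axis: (batch, time, height, width, channels).
sequence = concatenated.reshape(batch, height, width, 2, 3).transpose(0, 3, 1, 2, 4)

# The declared shape excludes the batch dimension: (time, height, width, channels).
model = keras.Sequential([
    keras.Input(shape=(2, height, width, 3)),
    layers.TimeDistributed(layers.Conv2D(32, (3, 3), activation="relu")),
    layers.TimeDistributed(layers.MaxPooling2D((2, 2))),
])
```

Feeding sequence to this model satisfies the 5-dimension check; the recurrent and output layers would still have to be added on top.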