Why can't I overfit a convolutional autoencoder on one image? - python

I have this convolutional autoencoder:
inputs = layers.Input(shape=(384, 128, 3))
# encoder
x = layers.Conv2D(8, (3, 3), activation=layers.LeakyReLU(alpha=0.1), padding="same")(inputs)
x = layers.MaxPooling2D((2, 2), padding="same")(x)
x = layers.Conv2D(16, (3, 3), activation=layers.LeakyReLU(alpha=0.1), padding="same")(x)
x = layers.MaxPooling2D((2, 2), padding="same")(x)
x = layers.Conv2D(32, (3, 3), activation=layers.LeakyReLU(alpha=0.1), padding="same")(x)
x = layers.MaxPooling2D((2, 2), padding="same")(x)
x = layers.Conv2D(64, (3, 3), activation=layers.LeakyReLU(alpha=0.1), padding="same")(x)
x = layers.MaxPooling2D((2, 2), padding="same")(x)
# decoder
x = layers.Conv2DTranspose(64, (3, 3), strides=2, activation=layers.LeakyReLU(alpha=0.1), padding="same")(x)
x = layers.Conv2DTranspose(32, (3, 3), strides=2, activation=layers.LeakyReLU(alpha=0.1), padding="same")(x)
x = layers.Conv2DTranspose(16, (3, 3), strides=2, activation=layers.LeakyReLU(alpha=0.1), padding="same")(x)
x = layers.Conv2DTranspose(8, (3, 3), strides=2, activation=layers.LeakyReLU(alpha=0.1), padding="same")(x)
x = layers.Conv2D(3, (3, 3), activation="sigmoid", padding="same")(x)
# autoencoder
autoencoder = Model(inputs, x)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy", metrics=['accuracy'])
autoencoder.summary()
I have the following image:
I am trying to overfit the model on this single image only, but I cannot get the loss below ~0.42, and the accuracy stays around ~0.79. The reconstruction is fairly decent (the image looks more or less the same, just blurry and without fine details), but the loss doesn't fall and the accuracy doesn't rise. I have tried a few things (data preprocessing, augmentation, leaky ReLU instead of plain ReLU), but not a larger model (training would take a long time, and I am not sure it is the solution).
How can I overfit this model, i.e. lower the loss and increase the accuracy? Is this image too much for this model? (I am still missing the intuition for how large a model should be for a specific type of data.)

Related

Why does a Convolutional Autoencoder only need image dimensions for its input, but not for its output?

I am following this example of an autoencoder:
https://keras.io/examples/vision/autoencoder/
In it, they train the autoencoder to denoise images, which works fine.
Now I wanted to adapt the code to upscale images instead, simply by training it with resized, smaller images.
However, when I look at the model definition of the autoencoder itself, there only seems to be an argument for the image dimensions of its input, i.e. (28, 28); nowhere does it let me specify that the output should be (14, 14):
input = layers.Input(shape=(28, 28, 1))
# Encoder
x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(input)
x = layers.MaxPooling2D((2, 2), padding="same")(x)
x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(x)
x = layers.MaxPooling2D((2, 2), padding="same")(x)
# Decoder
x = layers.Conv2DTranspose(32, (3, 3), strides=2, activation="relu", padding="same")(x)
x = layers.Conv2DTranspose(32, (3, 3), strides=2, activation="relu", padding="same")(x)
x = layers.Conv2D(1, (3, 3), activation="sigmoid", padding="same")(x)
# Autoencoder
autoencoder = Model(input, x)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
autoencoder.summary()
From my understanding, the decoder performs the reverse steps of the encoder, so I would have expected an argument of (28, 28) somewhere in this line:
x = layers.Conv2D(1, (3, 3), activation="sigmoid", padding="same")(x)
Why does this even work as it is, and how can I achieve my goal of having the output be (14,14)?
The output dimensions are determined by the CNN parameters: padding, kernel size and stride. You never specify the output size explicitly; it follows from those parameters. For example, giving the final Conv2D a stride of (2, 2) will halve the spatial dimensions from (28, 28) down to (14, 14).
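A minimal sketch of that change, reusing the example architecture from the question (the encoder still goes 28 → 14 → 7; the strided output convolution then brings 28 back down to 14):
from tensorflow.keras import layers, Model

input = layers.Input(shape=(28, 28, 1))
# Encoder (unchanged): 28 -> 14 -> 7
x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(input)
x = layers.MaxPooling2D((2, 2), padding="same")(x)
x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(x)
x = layers.MaxPooling2D((2, 2), padding="same")(x)
# Decoder: 7 -> 14 -> 28, then strides=2 on the output conv halves it to 14
x = layers.Conv2DTranspose(32, (3, 3), strides=2, activation="relu", padding="same")(x)
x = layers.Conv2DTranspose(32, (3, 3), strides=2, activation="relu", padding="same")(x)
x = layers.Conv2D(1, (3, 3), strides=2, activation="sigmoid", padding="same")(x)

autoencoder = Model(input, x)
autoencoder.summary()  # last layer output: (None, 14, 14, 1)
You would then fit it with the 28x28 images as inputs and the 14x14 resized images as targets (the names of those training arrays are up to you).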

Calculate the dimension of the latent space

I am implementing an autoencoder and I want to calculate the dimension of the latent space.
Let's say that I want a 3D latent space. Given my code below, how do I calculate the dimension of the current latent space?
Thank you
My current code:
x = Input(shape=(28, 28,1))
# Encoder
conv1_1 = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
pool1 = MaxPooling2D((2, 2))(conv1_1)
conv1_2 = Conv2D(8, (3, 3), activation='relu', padding='same')(pool1)
pool2 = MaxPooling2D((2, 2))(conv1_2)
conv1_3 = Conv2D(8, (3, 3), activation='relu', padding='same')(pool2)
h = MaxPooling2D((2, 2))(conv1_3)
# Decoder
conv2_1 = Conv2D(8, (3, 3), activation='relu', padding='same')(h)
up1 = UpSampling2D((2, 2))(conv2_1)
conv2_2 = Conv2D(8, (3, 3), activation='relu', padding='same')(up1)
up2 = UpSampling2D((2, 2))(conv2_2)
conv2_3 = Conv2D(16, (3, 3), activation='relu')(up2)
up3 = UpSampling2D((2, 2))(conv2_3)
r = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(up3)
autoencoder = Model(inputs=x, outputs=r)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
The architecture of an autoencoder is shaped like a funnel: the number of units decreases from the input layer down to a bottleneck layer known as the "latent space", and from the latent space it increases again until the output layer has as many units as the input layer.
From the model summary we can see that the 7th layer (the last MaxPooling2D, h) is the latent space, i.e. the compressed form of the input data.
tf.keras.backend.ndim(autoencoder.layers[6].output)
The call above returns the rank (number of axes) of that layer's output; to get the actual shape and total size of the latent space, inspect the layer's output shape. Thank you!
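For example, a minimal sketch (assuming the layer indexing above; with this code the bottleneck h comes out as (3, 3, 8), i.e. 72 values per image, because the MaxPooling2D layers use the default 'valid' padding: 28 → 14 → 7 → 3):
import numpy as np

latent_shape = autoencoder.layers[6].output_shape   # (None, 3, 3, 8)
latent_dim = int(np.prod(latent_shape[1:]))         # 72, ignoring the batch axis
print(latent_shape, latent_dim)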

Input and output layers of Keras autoencoder don't match, can't run model

I am trying to build an autoencoder in Keras with an input shape of (470, 470, 3), but the output never seems to match, even when I try to switch around the padding. This is my code, can you please help? The way it is currently written, my model summary shows an output of (472, 472, 3).
from tensorflow.keras.layers import Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras import Input, Model
input_image = Input(shape=(470, 470, 3))
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_image)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded_image = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_image, decoded_image)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
Thank you!
Change your last padding to 'valid':
decoded_image = Conv2D(3, (3, 3), activation='sigmoid', padding='valid')(x)
With padding='same', the two pooling layers take 470 to 235 and then round 117.5 up to 118, and the two UpSampling2D layers double that back to 236 and 472. The final 3x3 convolution with padding='valid' trims 472 back down to 470.
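A quick way to verify, reusing input_image and x from the code above with the 'valid' output layer swapped in:
decoded_image = Conv2D(3, (3, 3), activation='sigmoid', padding='valid')(x)
autoencoder = Model(input_image, decoded_image)
print(autoencoder.output_shape)  # (None, 470, 470, 3)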

expected conv2d_7 to have shape (4, 268, 1) but got array with shape (1, 270, 480)

I'm having trouble with this autoencoder I'm building using Keras. The input's shape is dependent on the screen size, and the output is going to be a prediction of the next screen size... However there seems to be an error that I cannot figure out... Please excuse my awful formatting on this website...
Code:
def model_build():
    input_img = InputLayer(shape=(1, env_size()[1], env_size()[0]))
    x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
    encoded = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
    x = UpSampling2D((2, 2))(x)
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
    x = UpSampling2D((2, 2))(x)
    x = Conv2D(32, (3, 3), activation='relu')(x)
    x = UpSampling2D((2, 2))(x)
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
    model = Model(input_img, decoded)
    return model

if __name__ == '__main__':
    model = model_build()
    model.compile('adam', 'mean_squared_error')
    y = np.array([env()])
    print(y.shape)
    print(y.ndim)
    debug = model.fit(np.array([[env()]]), np.array([[env()]]))
Error:
Traceback (most recent call last):
  File "/home/ai/Desktop/algernon-test/rewarders.py", line 46, in
    debug = model.fit(np.array([[env()]]), np.array([[env()]]))
  File "/home/ai/.local/lib/python3.6/site-packages/keras/engine/training.py", line 952, in fit
    batch_size=batch_size)
  File "/home/ai/.local/lib/python3.6/site-packages/keras/engine/training.py", line 789, in _standardize_user_data
    exception_prefix='target')
  File "/home/ai/.local/lib/python3.6/site-packages/keras/engine/training_utils.py", line 138, in standardize_input_data
    str(data_shape))
ValueError: Error when checking target: expected conv2d_7 to have shape (4, 268, 1) but got array with shape (1, 270, 480)
EDIT:
Code for get_screen imported as env():
def get_screen():
    img = screen.grab()
    img = img.resize(screen_size())
    img = img.convert('L')
    img = np.array(img)
    return img
You have three 2x downsampling steps and three 2x upsampling steps. These layers have no knowledge of the original image size, so they round the size up to the nearest multiple of 8 = 2^3. To recover the original size, compute how many excess columns and rows you end up with:
cropX = 7 - ((size[0] + 7) % 8)
cropY = 7 - ((size[1] + 7) % 8)
It ought to work if you then add a cropping layer as the new final layer:
decoded = layers.Cropping2D(((0, cropY), (0, cropX)))(x)
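For concreteness, a small sketch of what that works out to for the 480x270 screen from the traceback (assuming size[0] is the width and size[1] the height):
size = (480, 270)                  # (width, height) of the grabbed screen
cropX = 7 - ((size[0] + 7) % 8)    # 0 -> 480 is already a multiple of 8
cropY = 7 - ((size[1] + 7) % 8)    # 2 -> the upsampled 272 rows minus the original 270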
Looks like env_size() and env() mess up the image dimensions somehow. Consider this example:
image1 = np.random.rand(1, 1, 270, 480)  # first dimension is the batch size, for test purposes
image2 = np.random.rand(1, 4, 268, 1)    # or any other arbitrary dimensions
input_img = layers.Input(shape=image1[0].shape)
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(32, (3, 3), activation='relu')(x)
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
model = tf.keras.Model(input_img, decoded)
model.compile('adam', 'mean_squared_error')
model.summary()
This line will work:
model.fit(image1, epochs=1, batch_size=1)
But this doesn't:
model.fit(image2, epochs=1, batch_size=1)
Edit:
In order to get an output of the same size as the input, you need to calculate the convolution kernel sizes carefully.
image1 = np.random.rand(1, 1920, 1080, 1)
input_img = layers.Input(shape=image1[0].shape)
x = layers.Conv2D(32, 3, activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(16, 3, activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, 3, activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, 3, activation='relu', padding='same')(encoded)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(16, 3, activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(32, 1, activation='relu')(x) # set kernel size to 1 for example
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, 3, activation='sigmoid', padding='same')(x)
model = tf.keras.Model(input_img, decoded)
model.compile('adam', 'mean_squared_error')
model.summary()
This will output the same dimensions as the input.
As per this guide http://cs231n.github.io/convolutional-networks/:
We can compute the spatial size of the output volume as a function of the input volume size (W), the receptive field size of the Conv Layer neurons (F), the stride with which they are applied (S), and the amount of zero padding used (P) on the border. You can convince yourself that the correct formula for calculating how many neurons "fit" is given by (W−F+2P)/S+1. For example, for a 7x7 input and a 3x3 filter with stride 1 and pad 0 we would get a 5x5 output. With stride 2 we would get a 3x3 output.
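As a small sketch, that formula in code (a hypothetical helper, not part of Keras), using integer division for the usual case where the sizes divide evenly:
def conv_output_size(W, F, S, P):
    # spatial output size of a convolution: (W - F + 2P) / S + 1
    return (W - F + 2 * P) // S + 1

print(conv_output_size(7, 3, 1, 0))  # 5
print(conv_output_size(7, 3, 2, 0))  # 3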

How do I go from a dense layer to a conv2D layer using Keras?

I'm simply trying to do what the title says. Here's my code:
def ConvAutoEncoder(train_data, test_data, n_epochs=50, batchSize=128, data_shape=(IMAGE_SIZE, IMAGE_SIZE, 3)):
    print('Training Neural Network')
    input_img = Input(shape=data_shape)
    x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
    x = MaxPooling2D((2, 2), padding='same')(x)
    print(x.shape)
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
    x = MaxPooling2D((2, 2), padding='same')(x)
    print(x.shape)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
    x = MaxPooling2D((2, 2), padding='same')(x)
    print(x.shape)
    x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
    encoded = MaxPooling2D((2, 2), padding='same')(x)
    print(encoded.shape)
    # at this point the representation is (6, 6, 4), i.e. 144-dimensional
    encoded = Flatten()(encoded)
    encoded = Dense(6 * 6 * 4, activation='relu')(encoded)
    print(encoded.shape)
    endoded = Reshape((6, 6, 4))(encoded)
    print(encoded.shape)
    x = Conv2D(4, (3, 3), activation='relu', padding='same')(encoded)
    x = UpSampling2D((2, 2))(x)
    print(x.shape)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
    x = UpSampling2D((2, 2))(x)
    print(x.shape)
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
    x = UpSampling2D((2, 2))(x)
    print(x.shape)
    x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    x = UpSampling2D((2, 2))(x)
    print(x.shape)
    decoded = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)
    autoencoder = Model(input_img, decoded)
    autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
    autoencoder.fit(train_data, train_data,
                    epochs=n_epochs,
                    batch_size=batchSize,
                    shuffle=True,
                    verbose=2,
                    validation_data=(test_data, test_data),
                    callbacks=[TensorBoard(log_dir='/tmp/autoencoder')])
    return autoencoder
However, when I run it, the Reshape layer doesn't seem to do anything at all: the shape of the output before the reshape is (?, 144) and the shape after is also (?, 144). Am I using Reshape wrong, or is there some other way to connect a Dense layer to a Conv2D layer?
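For reference, a minimal sketch of the usual Dense → Reshape → Conv2D pattern (standalone toy shapes, not the exact model above); the key point is that the tensor returned by Reshape is the one that must be passed on to the next layer:
from tensorflow.keras.layers import Input, Dense, Reshape, Conv2D, UpSampling2D
from tensorflow.keras import Model

latent_in = Input(shape=(144,))                    # a 144-dimensional latent vector
x = Dense(6 * 6 * 4, activation='relu')(latent_in)
x = Reshape((6, 6, 4))(x)                          # now a (6, 6, 4) feature map
print(x.shape)                                     # (None, 6, 6, 4)
x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoder_head = Model(latent_in, x)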
