I am implementing an autoencoder and I want to work out the dimension of its latent space.
Let's say I want a 3D latent space. Given my code below, how can I calculate the dimension of the current latent space?
Thank you.
My current code:
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model

x = Input(shape=(28, 28, 1))
# Encoder
conv1_1 = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
pool1 = MaxPooling2D((2, 2), padding='same')(conv1_1)
conv1_2 = Conv2D(8, (3, 3), activation='relu', padding='same')(pool1)
pool2 = MaxPooling2D((2, 2), padding='same')(conv1_2)
conv1_3 = Conv2D(8, (3, 3), activation='relu', padding='same')(pool2)
h = MaxPooling2D((2, 2), padding='same')(conv1_3)  # latent representation: (4, 4, 8)
# Decoder
conv2_1 = Conv2D(8, (3, 3), activation='relu', padding='same')(h)
up1 = UpSampling2D((2, 2))(conv2_1)
conv2_2 = Conv2D(8, (3, 3), activation='relu', padding='same')(up1)
up2 = UpSampling2D((2, 2))(conv2_2)
conv2_3 = Conv2D(16, (3, 3), activation='relu')(up2)  # no padding here: 16 -> 14, so the final upsampling gives 28
up3 = UpSampling2D((2, 2))(conv2_3)
r = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(up3)
autoencoder = Model(inputs=x, outputs=r)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
The architecture of an autoencoder is shaped like a funnel: the number of units decreases as we move from the input layer towards the bottleneck layer known as the latent space. From the latent space, the number of units increases again until the output layer, which has the same size as the input layer.
From the model summary we can see that the 7th layer (index 6, the final MaxPooling2D h) is the latent space, i.e. the compressed form of the input data.
autoencoder.layers[6].output_shape
To get the shape of a particular layer in a model you can use the output_shape attribute shown above (tf.keras.backend.ndim(autoencoder.layers[6].output) would only tell you how many axes the tensor has, not their sizes). Thank you!
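If by a "3D latent space" you mean three values per sample, here is a small sketch (my own illustration, assuming the model above) of how to read off the current latent size and what you would change to get exactly 3:
import numpy as np
latent_shape = autoencoder.layers[6].output_shape[1:]  # drop the batch axis, e.g. (4, 4, 8)
latent_dim = int(np.prod(latent_shape))                # total number of latent values per sample
print(latent_shape, latent_dim)
# For a latent space of exactly 3 values you would typically add Flatten() and Dense(3)
# after h, and mirror them with a Dense + Reshape at the start of the decoder.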
I am following this example of an autoencoder:
https://keras.io/examples/vision/autoencoder/
In it, they train the autoencoder to denoise images, which works fine.
Now, I wanted to adapt the code to instead upscale the image, by simply training it with resized, smaller images.
However, when I look at the model definition of the autoencoder itself, there only seems to be an argument for the image dimensions of its input, i.e. (28, 28), but nowhere would it allow me to specify that the output should be (14, 14):
input = layers.Input(shape=(28, 28, 1))
# Encoder
x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(input)
x = layers.MaxPooling2D((2, 2), padding="same")(x)
x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(x)
x = layers.MaxPooling2D((2, 2), padding="same")(x)
# Decoder
x = layers.Conv2DTranspose(32, (3, 3), strides=2, activation="relu", padding="same")(x)
x = layers.Conv2DTranspose(32, (3, 3), strides=2, activation="relu", padding="same")(x)
x = layers.Conv2D(1, (3, 3), activation="sigmoid", padding="same")(x)
# Autoencoder
autoencoder = Model(input, x)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
autoencoder.summary()
From my understanding, the decoder performs the reverse steps of the encoder, so I would have expected an argument of (28, 28) somewhere in this line:
x = layers.Conv2D(1, (3, 3), activation="sigmoid", padding="same")(x)
Why does this even work as it is, and how can I achieve my goal of having the output be (14,14)?
The output dimensions are determined by the layer parameters: padding, kernel size, and stride. In the example above every layer uses padding="same", so the plain Conv2D layers keep the spatial size, the two MaxPooling2D layers halve it (28 -> 14 -> 7), and the two Conv2DTranspose layers with strides=2 double it back (7 -> 14 -> 28). That is why the output is (28, 28) without it being written anywhere. To get a (14, 14) output, give the final layer a stride of (2, 2), or remove one of the upsampling steps.
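For example, keeping the tutorial's encoder unchanged and only touching the decoder (a sketch of the idea, not code from the tutorial):
# Decoder as in the tutorial: 7 -> 14 -> 28
x = layers.Conv2DTranspose(32, (3, 3), strides=2, activation="relu", padding="same")(x)
x = layers.Conv2DTranspose(32, (3, 3), strides=2, activation="relu", padding="same")(x)
# Final conv with strides=2 halves the spatial size again: 28 -> 14, so the output is (14, 14, 1)
x = layers.Conv2D(1, (3, 3), strides=2, activation="sigmoid", padding="same")(x)
Alternatively, drop one of the Conv2DTranspose layers so the decoder stops at (14, 14). Either way, your training targets then also need to be 14x14 images.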
I am trying to build an autoencoder in Keras with an input shape of (470, 470, 3), but the output shape never seems to match, even when I try switching the padding around. This is my code; can you please help? As currently written, my model summary shows an output of (472, 472, 3).
from tensorflow.keras.layers import Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras import Input, Model
input_image = Input(shape=(470, 470, 3))
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_image)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded_image = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_image, decoded_image)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
Thank you!
Change your last padding to 'valid':
decoded_image = Conv2D(3, (3, 3), activation='sigmoid', padding='valid')(x)
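The mismatch comes from the odd intermediate size: with padding='same', pooling 235 gives 118 (235/2 rounded up), and the two UpSampling2D layers then produce 2 * 2 * 118 = 472. The final 3x3 convolution with padding='valid' trims one pixel from each border, which brings you back to (470, 470, 3). A quick sanity check of the spatial sizes (just tracing the arithmetic):
import math
h = 470
h = math.ceil(h / 2)  # MaxPooling2D, padding='same' -> 235
h = math.ceil(h / 2)  # MaxPooling2D, padding='same' -> 118
h = h * 2             # UpSampling2D -> 236
h = h * 2             # UpSampling2D -> 472
h = h - 2             # 3x3 Conv2D, padding='valid' -> 470
print(h)              # 470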
I need to train an image classifier using the InceptionV3 model from Keras. The images pass through 5 Conv2D layers and 2 MaxPool2D layers before entering the pre-trained InceptionV3 model. However my code gives me the error ValueError: Depth of input (64) is not a multiple of input depth of filter (3) for 'inception_v3_4/conv2d_123/convolution' (op: 'Conv2D') with input shapes: [?,2,2,224], [3,3,3,32]
I reckon the output shape from my previous layers is not compatible with the input shape required by Inception, but I am not able to solve it, and I don't know whether this error can be solved at all. I am a beginner in machine learning and any light on this matter will be greatly appreciated.
My code is as follows:
inception_model = inception_v3.InceptionV3(weights='imagenet', include_top = False)
for layer in inception_model.layers:
layer.trainable = False
input_layer = Input(shape=(224,224,3)) #Image resolution is 224x224 pixels
x = Conv2D(128, (7, 7), padding='same', activation='relu', strides=(2, 2))(input_layer)
x = Conv2D(128, (7, 7), padding='same', activation='relu', strides=(2, 2))(x)
x = Conv2D(64, (7, 7), padding='same', activation='relu', strides=(2, 2))(x)
x = MaxPool2D((3, 3), padding='same',strides=(2, 2))(x)
x = Conv2D(64, (7, 7), padding='same', activation='relu', strides=(2, 2))(x)
x = Conv2D(64, (7, 7), padding='same', activation='relu', strides=(2, 2))(x)
x = MaxPool2D((4, 4), padding='same', strides=(2, 2))(x)
x = inception_model (x) #Error in this line
x = GlobalAveragePooling2D()(x)
predictions = Dense(11, activation='softmax')(x) #I have 11 classes of image to classify
model = Model(inputs = input_layer, outputs=predictions)
model.compile(optimizer=Adam(lr=0.001), loss='categorical_crossentropy', metrics=['acc'])
model.summary()
As #CAFEBABE said, this is of limited use, because the feature map you feed into the pretrained Inception stem can have at most 3 channels. But if you still want to try it, you can do this:
x = Conv2D(3, (7, 7), padding='same', activation='relu', strides=(2, 2))(input_layer)  # 3 output channels so the ImageNet weights accept it
Another thing to keep in mind: you used 5 Conv2D and 2 MaxPooling layers before Inception, but you can't do that, because the Inception model itself contains strided Conv2D and max-pooling layers, so stacking that many downsampling layers in front of it shrinks the feature map until the spatial dimensions become invalid and you get an error. With a 224x224 input, one stride-2 conv leaves 112x112, which is still above the 75x75 minimum that keras.applications.InceptionV3 expects, but two already take you down to 56x56. I tried with 2 Conv2D layers and got an error, so at most you can use 1.
Also, when you create the InceptionV3 model, specify its input shape:
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, GlobalAveragePooling2D, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

input_layer = Input(shape=(224, 224, 3))  # image resolution is 224x224 pixels
x = Conv2D(3, (7, 7), padding='same', activation='relu', strides=(2, 2))(input_layer)  # -> (112, 112, 3)
inception_model = tf.keras.applications.InceptionV3(weights='imagenet', include_top=False, input_shape=tuple(x.shape[1:]))  # x.shape[0] is the batch axis, so skip it
for layer in inception_model.layers:
layer.trainable = False
x = inception_model (x)
x = GlobalAveragePooling2D()(x)
predictions = Dense(11, activation='softmax')(x) #I have 11 classes of image to classify
model = Model(inputs = input_layer, outputs=predictions)
model.compile(optimizer=Adam(lr=0.001), loss='categorical_crossentropy', metrics=['acc'])
model.summary()
This should run, but I doubt it will help the model much. Anyway, try it and see what happens.
I am trying to get the encoded values as a simple (flat) vector using an autoencoder.
Here is my code:
input_img = Input(shape=(28, 28, 1))
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
Here I need a flatten layer
encoder = Model(input_img, encoded)
And then I need to make it convolutional again (unflatten):
encoderOutputShape = encoded._keras_shape[1:]
# unflatten here
decoder_input= Input(encoderOutputShape)
decoder = Conv2D(32, (3, 3), activation='relu', padding='same')(decoder_input)
x = UpSampling2D((2, 2))(decoder)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
decoder = Model(decoder_input, decoded)
auto_input = Input(shape=(28,28,1))
encoded = encoder(auto_input)
decoded = decoder(encoded)
auto_encoder = Model(auto_input, decoded)
How do I do this the right way?
In other words, I want to get the output of the encoder (or use random data), change it and put into the decoder and get the decoded result.
There is a question here: why do you flatten the tensor at all if you don't use any Dense layers?
But if you want to, you can do it like this:
encoder_output = Flatten()(encoded)
decoder_input = Reshape((7, 7, 32))(encoder_output)
decoder = Conv2D(32, (3, 3), activation='relu', padding='same')(decoder_input)
That is because you need to reshape the flat vector back into a (7, 7, 32) feature map before the decoder's convolutional layers can process it.
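Putting it together, a minimal sketch of the whole pipeline (my own illustration, assuming 28x28x1 inputs as in your code):
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, Flatten, Reshape
from tensorflow.keras.models import Model

# Encoder: 28x28x1 -> 7x7x32 -> flat vector of length 7*7*32 = 1568
input_img = Input(shape=(28, 28, 1))
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
encoded = Flatten()(x)
encoder = Model(input_img, encoded)

# Decoder: flat vector -> 7x7x32 -> 28x28x1
decoder_input = Input(shape=(7 * 7 * 32,))
x = Reshape((7, 7, 32))(decoder_input)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
decoder = Model(decoder_input, decoded)

# Full autoencoder: encoder followed by decoder
auto_input = Input(shape=(28, 28, 1))
auto_encoder = Model(auto_input, decoder(encoder(auto_input)))
auto_encoder.compile(optimizer='adam', loss='binary_crossentropy')
After training, encoder.predict(...) gives you the flat vectors; you can modify them (or generate random vectors of the same length) and pass them to decoder.predict(...) to get the decoded images.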
I'm training an autoencoder in Keras right now, below is the network structure.
input_img = Input(shape=(target_size[0], target_size[1], 3))
x = Conv2D(8, (3, 3), activation = 'relu', padding = 'same')(input_img)
x = MaxPooling2D((2, 2), padding = 'same')(x)
x = Conv2D(16, (3, 3), activation = 'relu', padding = 'same')(x)
x = MaxPooling2D((2, 2), padding = 'same')(x)
x = Conv2D(32, (3, 3), activation = 'relu', padding = 'same')(x)
encoded = MaxPooling2D((2, 2), padding = 'same')(x)
x = Conv2D(32, (3, 3), padding = 'same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation = 'relu', padding = 'same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation = 'relu', padding = 'same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(3, (3, 3), activation = 'sigmoid', padding = 'same')(x)
I'm having trouble thinking of a way to:
1. Insert an input layer between where "encoded" is defined and the Conv2D layer after it. The intention is to get the encodings of two different images, create a number of intermediate "steps" between those encodings, and then feed each of them into the "decoder" half of the network to generate an image for every step. I want to make a gif of the output "morphing" from one image to the other (one way to do this is sketched at the end of this question).
2. Insert another Conv2D(...) and MaxPooling2D(...) pair right after the input layer, and a corresponding UpSampling2D(...) and Conv2D(...) pair at the end. I got this idea from NVIDIA's "Progressive Growing of GANs for Improved Quality, Stability, and Variation" paper, where they trained their GAN to generate good images at low resolutions, then progressively added more layers at the beginning and end of the network and trained the whole network with the new layers.
Does this make sense? Please let me know if I can clarify anything; I feel like this is a very specific problem that's hard to explain over text all at once.
Thanks!
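For the first point, here is a minimal sketch of one common way to do it (my own illustration, assuming the architecture above; img1 and img2 are placeholder names for two preprocessed input images):
import numpy as np
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# ... train the autoencoder here ...

# Encoder: input image -> bottleneck tensor
encoder = Model(input_img, encoded)

# Decoder: a fresh Input with the bottleneck's shape, re-applying the already-trained decoder layers
latent_input = Input(shape=tuple(encoded.shape[1:]))
x = latent_input
for layer in autoencoder.layers[7:]:  # the layers after the bottleneck MaxPooling2D in this architecture
    x = layer(x)
decoder = Model(latent_input, x)

# Interpolate between the encodings of two images and decode every step
z1 = encoder.predict(img1[None, ...])
z2 = encoder.predict(img2[None, ...])
frames = [decoder.predict((1.0 - t) * z1 + t * z2)[0] for t in np.linspace(0.0, 1.0, 30)]
# frames can then be written out as a gif, e.g. with imageio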