I'm working on a recursive auto-encoder. The network takes two 2D images, each shaped (28, 28, 1), and combines them into an input of shape (28, 28, 2). This input is encoded into a (28, 28, 1) shape and decoded back into the original (28, 28, 2) shape. The encoded form of the data can therefore be fed back into the auto-encoder for recursive operation.
We can assume that channel 1 is a new image and channel 2 is previously encoded data. How do I create a loss function that penalises mistakes in reconstructing channel 2 more heavily (as this channel carries previously encoded data)?
I am working in Keras, with a TensorFlow back-end.
Alternatively, is there a way to train the network as a complete tree, as opposed to training only single two-input, two-output blocks at a time?
You can split your decoded (28, 28, 2) tensor back into two images as separate outputs and use loss_weights to assign each output a weight of importance. From the documentation:
model.compile(optimizer='rmsprop', loss='binary_crossentropy',
              loss_weights=[1., 0.2])
And yes, all models in Keras behave like layers, so you can chain them together to construct a tree, for example. You can then train the network in one go and decide whether you would like to share weights, etc. However, it could be more difficult to train. I would recommend using the functional API to create these more complex structures so you have more control.
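For illustration, a minimal functional-API sketch along these lines; the encoder/decoder layers are placeholders rather than the actual architecture, and the loss weights are chosen so that channel 2 is penalised more heavily:

import tensorflow as tf
from tensorflow.keras import layers

# Encode the (28, 28, 2) input down to (28, 28, 1), then decode it back into
# two separate (28, 28, 1) outputs so each can receive its own loss weight.
inputs = layers.Input(shape=(28, 28, 2))
encoded = layers.Conv2D(1, 3, padding='same', activation='relu', name='encoded')(inputs)
hidden = layers.Conv2D(16, 3, padding='same', activation='relu')(encoded)
out_new = layers.Conv2D(1, 3, padding='same', activation='sigmoid', name='channel_1')(hidden)
out_prev = layers.Conv2D(1, 3, padding='same', activation='sigmoid', name='channel_2')(hidden)

model = tf.keras.Model(inputs, [out_new, out_prev])
# Reconstruction errors on channel 2 (previously encoded data) count five times as much.
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              loss_weights=[0.2, 1.0])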
Related
Is it possible to combine multiple types of data as input to a neural network and have it output a particular data type?
For example, can I input an image and some metadata about that image, then the algorithm will output a number.
I'm thinking along the lines of stitching a CNN and ANN together.
Thanks in advance!
Yes, that is possible. Actually, that is pretty straightforward.
Commonly, what happens in image classification (for example) is that a so-called feature map (generated by the last convolutional layer in this case) gets flattened. This flattened tensor is then fed through a feed-forward Neural Network (NN) to perform the classification task.
The easiest approach would be to add one layer to your network that concatenates two tensors a and b, where a is the flattened output of the last convolutional layer and b is your metadata (which is then ideally also a flattened tensor).
Afterwards, you simply feed the concatenated tensor (containing both the encoding of your image and the metadata) through the feed-forward NN to perform the final classification (or to generate whatever the desired output is).
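A minimal sketch of that concatenation, assuming a 28x28 grayscale image, a 5-element metadata vector (both shapes are placeholders), and a single regressed number as output:

import tensorflow as tf
from tensorflow.keras import layers

# Image branch: a small CNN whose feature map is flattened into tensor a.
image_in = layers.Input(shape=(28, 28, 1))
x = layers.Conv2D(16, 3, activation='relu')(image_in)
x = layers.MaxPooling2D()(x)
a = layers.Flatten()(x)

# Metadata branch: already a flat vector b.
b = layers.Input(shape=(5,))

# Concatenate both encodings and feed them through a feed-forward head.
combined = layers.Concatenate()([a, b])
h = layers.Dense(64, activation='relu')(combined)
output = layers.Dense(1)(h)  # a single output number

model = tf.keras.Model(inputs=[image_in, b], outputs=output)
model.compile(optimizer='adam', loss='mse')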
To be more precise: let's say I already have a vector that represents something (a word, an object, an image, ...) and that I cannot change the way I obtain it. What I would like to do is create a NN without the embedding and pooling layers, and I am wondering whether TensorFlow supports this kind of approach.
Let's say my vector is 10 features long (10 floats). For each vector I also have a label; let's say there are 3 labels to choose from.
What I am struggling/trying to do is this: I would like to push this sort of vector input into a Keras dense layer with ReLU activation and 10 neurons (maybe stacking 2 or 3 of them), and then as a final layer use a sigmoid activation with 3 output neurons.
Then fit with the labels for 40(?) epochs and so on...
My main question is: is this possible? I have yet to finish the code and maybe I am asking this a bit too soon, but nevertheless.
Is this how one would approach this, or would you build the model from the embedding layer down and not use the already-made vectors?
Indeed it is possible.
One way to do it is to create a generator function that yields the vectors you want to pass to the network (produced by whatever vector representation you already have), and then create a TensorFlow dataset by calling tf.data.Dataset.from_generator.
The model itself will then probably just be a Sequential stack of dense layers.
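A minimal sketch under those assumptions (10-float vectors, 3 labels); the generator below only yields dummy data as a stand-in for your real vectors, and the output layer uses softmax rather than sigmoid, which is the usual choice for mutually exclusive labels:

import numpy as np
import tensorflow as tf

def vector_generator():
    # Placeholder: yield (vector, label) pairs however you actually obtain them.
    for _ in range(1000):
        yield np.random.rand(10).astype('float32'), np.random.randint(0, 3)

dataset = tf.data.Dataset.from_generator(
    vector_generator,
    output_signature=(
        tf.TensorSpec(shape=(10,), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.int64),
    ),
).batch(32)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(dataset, epochs=40)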
I'm interested in using multiple outputs from a Keras model in a single objective function.
I can't simply use a concatenate layer, as the outputs are of different size. As an example: consider a model that will do the standard digit recognition task on the MNIST dataset. I need the model to output one tensor of size (784,) (the input shape after the images have been flattened) and one of size (10,) (the class probabilities), and use them both in a single custom loss function.
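For illustration, one possible way to feed both outputs into a single objective is a custom training step with tf.GradientTape; the architecture and the particular combined loss below are only placeholders:

import tensorflow as tf
from tensorflow.keras import layers

# A model with two outputs of different sizes: a (784,) reconstruction of the
# flattened image and a (10,) vector of class probabilities.
inputs = layers.Input(shape=(784,))
h = layers.Dense(128, activation='relu')(inputs)
reconstruction = layers.Dense(784, activation='sigmoid', name='reconstruction')(h)
class_probs = layers.Dense(10, activation='softmax', name='class_probs')(h)
model = tf.keras.Model(inputs, [reconstruction, class_probs])

optimizer = tf.keras.optimizers.Adam()

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        recon, probs = model(x, training=True)
        # Single objective that uses both outputs at once.
        loss = (tf.reduce_mean(tf.square(recon - x))
                + tf.reduce_mean(tf.keras.losses.sparse_categorical_crossentropy(y, probs)))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss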
I am prototyping a deep learning segmentation model that needs six channels of input (two aligned 448x448 RGB images under different lighting conditions). I wish to compare the performance of several pretrained models to that of my current model, which I trained from scratch. Can I use the pretrained models in tf.keras.applications for input images with more than 3 channels?
I tried applying a convolution first to reduce the channel dimension to 3 and then passing that output to tf.keras.applications.DenseNet121(), but received the following error:
import tensorflow as tf
dense_input = tf.keras.layers.Input(shape=(448, 448, 6))
dense_filter = tf.keras.layers.Conv2D(3, 3, padding='same')(dense_input)
dense_stem = tf.keras.applications.DenseNet121(include_top=False, weights='imagenet', input_tensor=dense_filter)
*** ValueError: You are trying to load a weight file containing 241 layers into a model with 242 layers.
Is there a better way to use pretrained models on data with a different number of input channels in keras? Will pretraining even help when the number of input channels is different?
Technically, it should be possible. Perhaps using the model's __call__ itself:
# Load the pretrained backbone with its original (RGB) input.
orig_model = tf.keras.applications.DenseNet121(include_top=False, weights='imagenet')

# Map the 6-channel input down to 3 channels, then call the backbone on the result.
dense_input = tf.keras.layers.Input(shape=(448, 448, 6))
dense_filter = tf.keras.layers.Conv2D(3, 3, padding='same')(dense_input)
output = orig_model(dense_filter)

model = tf.keras.Model(dense_input, output)
model.compile(...)
model.summary()
On a conceptual level, though, I'd be worried that the new input doesn't look much like the original input that the pretrained model was trained on.
Cross Modality Pre-training may be the method you need. Proposed by Wang et al. (2016), this method averages the weights of the pre-trained model's first layer across its input channels and replicates that mean for each of the target channels. Their experiments indicate that the network performs better with this kind of pre-training even when it has 20 input channels and its input modality is not RGB.
To apply this, one can refer to another answer that uses layer.get_weights() and layer.set_weights() to manually set the weights in the first layer of the pre-trained model.
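A sketch of that idea for DenseNet121 with 6-channel inputs, assuming the pretrained model and a freshly built 6-channel model have layers that line up one-to-one (which they do when only the input shape changes):

import numpy as np
import tensorflow as tf

n_channels = 6

# Pretrained RGB model (weight source) and a fresh model built for 6 channels.
rgb_model = tf.keras.applications.DenseNet121(include_top=False, weights='imagenet')
new_model = tf.keras.applications.DenseNet121(
    include_top=False, weights=None, input_shape=(448, 448, n_channels))

for rgb_layer, new_layer in zip(rgb_model.layers, new_model.layers):
    src = rgb_layer.get_weights()
    if not src:
        continue  # layers without weights (pooling, activation, ...)
    dst = new_layer.get_weights()
    if src[0].shape != dst[0].shape:
        # First convolution: average the RGB kernel over its input-channel axis
        # and replicate the mean across all 6 target channels.
        mean_kernel = src[0].mean(axis=2, keepdims=True)
        src[0] = np.tile(mean_kernel, (1, 1, n_channels, 1))
    new_layer.set_weights(src)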
As a complementary approach to adding a convolutional layer before a pre-trained architecture (e.g. any of the pre-trained models available in tf.keras.applications that were trained on RGB inputs), you could consider manipulating the existing weights so that they match your model's 6-channel inputs. For example, if your architecture remains the same apart from the added input modalities, you can repeat the green channel into the newly added 3 input channels: see here.
"Is there a better way to use pretrained models on data with a different number of input channels in keras? Will pretraining even help when the number of input channels is different?"
Both of the aforementioned, commonly used techniques
- adding convolution layer(s) before the pre-trained architecture to convert the modalities
- repeating the pre-trained channels to match the newly added modalities
enable transfer learning, which is virtually always a better choice than starting the training from scratch. However, do not expect either of the options to work without some retraining. In my opinion/experience, the latter is better. The reason is that the randomly initialized Conv layers in the former approach would (at least initially) produce radically different inputs than what the rest of the architecture has "got used to seeing". This was already reasoned in the earlier answer by #Kris. The latter technique takes advantage of the fact that many of the relevant features are fairly similar across input modalities: a dog might still look like a dog even in a newly added input modality (e.g. RGB vs. thermal light).
I am trying to replicate a neural network for depth estimation. The original authors took a pre-trained network and added a 'Superpixel Pooling Layer' between the convolutional layers and the fully connected layer. In this layer, the convolutional feature maps are upsampled and the features are averaged per superpixel.
My problem is that in order to achieve this, I need to calculate the superpixels per image. How can I access the data being used by Keras/TensorFlow during batch processing to perform SLIC oversegmentation?
I considered splitting the task and working in pieces, i.e. feeding the images into the convolutional network, processing the outputs separately, and then feeding them into a fully connected layer. However, this makes further training of the network impossible.
At the time of writing, it seems to be impossible to actually access the data within the symbolic tensor. It also seems unlikely that such functionality will be added in the future, since the TensorFlow documentation says:
A Tensor object is a symbolic handle to the result of an operation, but does not actually hold the values of the operation's output.
Keras allows the creation of custom layers. However, these are limited by the available backend operations. As such, it is simply not possible to access the batch data.