Change number of input channels to pretrained keras.applications model?

I am prototyping a deep learning segmentation model that needs six channels of input (two aligned 448x448 RGB images under different lighting conditions). I wish to compare the performance of several pretrained models to that of my current model, which I trained from scratch. Can I use the pretrained models in tf.keras.applications for input images with more than 3 channels?
I tried applying a convolution first to reduce the channel dimension to 3 and then passed that output to tf.keras.applications.DenseNet121() but received the following error:
import tensorflow as tf
dense_input = tf.keras.layers.Input(shape=(448, 448, 6))
dense_filter = tf.keras.layers.Conv2D(3, 3, padding='same')(dense_input)
dense_stem = tf.keras.applications.DenseNet121(include_top=False, weights='imagenet', input_tensor=dense_filter)
*** ValueError: You are trying to load a weight file containing 241 layers into a model with 242 layers.
Is there a better way to use pretrained models on data with a different number of input channels in keras? Will pretraining even help when the number of input channels is different?

Technically, it should be possible, for example by calling the pretrained model itself on the new tensor:
orig_model = tf.keras.applications.DenseNet121(include_top=False, weights='imagenet')
dense_input = tf.keras.layers.Input(shape=(448, 448, 6))
# 3x3 convolution that maps the 6 input channels down to the 3 channels DenseNet expects
dense_filter = tf.keras.layers.Conv2D(3, 3, padding='same')(dense_input)
# call the pretrained model on the reduced tensor instead of passing input_tensor
output = orig_model(dense_filter)
model = tf.keras.Model(dense_input, output)
model.compile(...)  # fill in optimizer, loss, and metrics as needed
model.summary()
On a conceptual level, though, I'd be worried that the new input doesn't look much like the original input that the pretrained model was trained on.

Cross-Modality Pre-training may be the method you need. Proposed by Wang et al. (2016), this method averages the weights of the pre-trained model's first layer across its input channels and replicates that mean for the number of target channels. Their experiments indicate that the network performs better with this kind of pre-training even when it has 20 input channels and its input modality is not RGB.
To apply this, you can refer to another answer that uses layer.get_weights() and layer.set_weights() to manually set the weights of the first layer of the pre-trained model.
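As a minimal sketch of that idea for the 6-channel case in the question (the layer-by-layer copy and the choice of DenseNet121 are illustrative assumptions, not the paper's exact code):

import numpy as np
import tensorflow as tf

N_CHANNELS = 6  # target number of input channels

# Same architecture, but with a 6-channel input and randomly initialized weights.
new_model = tf.keras.applications.DenseNet121(
    include_top=False, weights=None, input_shape=(448, 448, N_CHANNELS))
# Reference model carrying the original ImageNet weights (3-channel input).
ref_model = tf.keras.applications.DenseNet121(
    include_top=False, weights='imagenet', input_shape=(448, 448, 3))

# Copy weights layer by layer; for the first Conv2D layer, average the kernel
# over the RGB axis and replicate that mean across the 6 new input channels.
first_conv_done = False
for new_layer, ref_layer in zip(new_model.layers, ref_model.layers):
    ref_weights = ref_layer.get_weights()
    if not ref_weights:
        continue
    if not first_conv_done and isinstance(new_layer, tf.keras.layers.Conv2D):
        kernel = ref_weights[0]                              # shape (7, 7, 3, 64)
        mean_kernel = kernel.mean(axis=2, keepdims=True)     # shape (7, 7, 1, 64)
        new_layer.set_weights([np.repeat(mean_kernel, N_CHANNELS, axis=2)]
                              + ref_weights[1:])
        first_conv_done = True
    else:
        new_layer.set_weights(ref_weights)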

As a complementary approach to adding a convolutional layer before a pre-trained architecture (e.g. any of the pre-trained models available in tf.keras.applications that were trained with RGB inputs), you could consider manipulating the existing weights so that they match your model with 6-channel inputs. For example, if your architecture stays the same apart from the added input modalities, you can repeat the green channel's weights for the newly added 3 input channels: see here.
"Is there a better way to use pretrained models on data with a different number of input channels in keras? Will pretraining even help when the number of input channels is different?"
Both the aforementioned and commonly used techniques
adding convolution layer(s) before the pre-trained architecture to convert the modalities
repeating the pre-trained channels to match with the newly added modalities
enable transfer learning, which is virtually always a better choice than training from scratch. However, do not expect either option to work without some retraining. In my opinion/experience, the latter is better. The reason is that the randomly initialized Conv layers in the former approach would (at least initially) produce radically different inputs than what the rest of the architecture has "got used to seeing". This was already reasoned in the earlier answer by Kris. The latter technique takes advantage of the fact that many of the relevant features are fairly similar across input modalities: a dog might still look like a dog even in a newly added input modality (e.g. RGB vs. thermal imaging).
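For the channel-repeating technique, a rough sketch (here repeating the green-channel filters of DenseNet121's first convolution for three added channels; the programmatic layer lookup and shapes are assumptions you may need to adapt to your architecture):

import numpy as np
import tensorflow as tf

# Pretrained 3-channel reference and a 6-channel copy of the same architecture.
ref = tf.keras.applications.DenseNet121(include_top=False, weights='imagenet')
target = tf.keras.applications.DenseNet121(
    include_top=False, weights=None, input_shape=(448, 448, 6))

# Locate the first convolution in each model.
first_conv = lambda m: next(l for l in m.layers
                            if isinstance(l, tf.keras.layers.Conv2D))
kernel = first_conv(ref).get_weights()[0]          # shape (7, 7, 3, 64)

# Keep the RGB filters and repeat the green-channel filters for channels 4-6.
green = kernel[:, :, 1:2, :]                                        # (7, 7, 1, 64)
kernel_6ch = np.concatenate([kernel, green, green, green], axis=2)  # (7, 7, 6, 64)
first_conv(target).set_weights([kernel_6ch])
# The remaining layers' weights can be copied over unchanged, as in the previous sketch.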


Improve the result of EfficientNet

Hello there, I'm still struggling in Python.
Now I'm going to use the EfficientNet model to detect the ripeness of palm oil.
I'm using 5852 training pictures divided into 4 classes (1463 per class), with 132 test pictures (33 per class).
After training for 200 epochs, the result is far from good.
Is there any solution for me to improve the result?
Here's the result of my model accuracy and model loss.
Here's my code
https://colab.research.google.com/drive/18AtIP7aOycHPDR84PuQ7iS8aYUdclZIe?usp=sharing
Your help means a lot to me.
You have rescaling in your generators, and it may be the root of the problem.
The TensorFlow implementation of EfficientNet already contains a rescaling layer, so you must not rescale images in your ImageDataGenerator. You can check this via the .summary() method.
Official documentation says:
Note: each Keras Application expects a specific kind of input preprocessing. For EfficientNet, input preprocessing is included as part of the model (as a Rescaling layer), and thus tf.keras.applications.efficientnet.preprocess_input is actually a pass-through function. EfficientNet models expect their inputs to be float tensors of pixels with values in the [0-255] range
ResNets, for example, don't have this layer, and you should rescale images before feeding them to the model. It's tricky to remember these details for every single network in tf.keras.applications, so I suggest just checking before using a new model.
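For example, a minimal sketch of a generator without rescaling (EfficientNetB0 and the augmentation settings are arbitrary choices here):

from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# No rescale=1./255 here: EfficientNet expects raw pixel values in [0, 255]
# because rescaling happens inside the model itself.
train_gen = ImageDataGenerator(horizontal_flip=True, rotation_range=15)

# Confirm the built-in rescaling/normalization layers near the top of the summary.
model = EfficientNetB0(include_top=False, weights='imagenet')
model.summary()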

How to build a Neural Network with sentence embeding concatenated to pre-trained CNN

I want to build a neural network that will take the feature map from the last layer of a CNN (VGG or ResNet, for example), concatenate an additional vector (for example, a 1x768 BERT vector), and re-train the last layer on a classification problem.
So the architecture should be the standard pretrained-CNN classification setup, but I want to concatenate an additional vector to each feature vector (I have a sentence describing each frame).
I have 5 possible labels and 100 input frames.
Can someone help me with how to implement this type of network?
I would recommend looking into the Keras functional API.
Unlike a sequential model (which is usually enough for many introductory problems), the functional API allows you to create any acyclic graph you want. This means that you can have two input branches, one for the CNN (image data) and the other for any NLP you need to do (relating to the descriptive sentence that you mentioned). Then, you can feed in the combined outputs of these two branches into the final layers of your network and produce your result.
Even if you've already created your model using models.Sequential(), it shouldn't be too hard to rewrite it to use the functional API.
For more information and implementation details, look at the official documentation here: https://keras.io/guides/functional_api/
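As a rough per-frame sketch under some assumptions (ResNet50 as the CNN, 224x224 frames, precomputed 768-d BERT vectors; handling all 100 frames jointly, e.g. with TimeDistributed, is left out for brevity):

import tensorflow as tf

NUM_CLASSES, BERT_DIM = 5, 768   # from the question

# Image branch: pretrained CNN used as a (frozen) feature extractor.
cnn = tf.keras.applications.ResNet50(include_top=False, pooling='avg',
                                     weights='imagenet')
cnn.trainable = False

frame_in = tf.keras.Input(shape=(224, 224, 3), name='frame')
bert_in = tf.keras.Input(shape=(BERT_DIM,), name='sentence_embedding')

frame_feat = cnn(frame_in)                                     # (batch, 2048)
merged = tf.keras.layers.Concatenate()([frame_feat, bert_in])  # (batch, 2816)
x = tf.keras.layers.Dense(256, activation='relu')(merged)
out = tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')(x)

model = tf.keras.Model([frame_in, bert_in], out)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()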

How can I change number of channels on Resnet to make it work only on B/W images?

I'm working in TensorFlow and my dataset is composed only of black-and-white images, so I thought I could make my neural net (currently I'm using ResNet50) less heavy and easier to train and test by changing the number of input channels from 3 to 1.
Is there a way to do so?
(I know I can treat B/W images as RGB images, but I don't want to do that.)
Thanks in advance for the answer.
The pretrained weights in keras.applications require a 3-channel input. You could do one of two things:
Use a different pretrained model that works on grayscale images.
Set the R, G and B channels to replicate your BW input, then fine-tune the entire neural network on your own dataset. This probably won't work without the fine-tuning step.
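A minimal sketch of the second option (the input size and number of classes are placeholders, and ResNet's usual preprocess_input step is omitted for brevity):

import tensorflow as tf

gray_in = tf.keras.Input(shape=(224, 224, 1))
# Replicate the single grayscale channel to the 3 channels the pretrained weights expect.
rgb_like = tf.keras.layers.Concatenate()([gray_in, gray_in, gray_in])

backbone = tf.keras.applications.ResNet50(include_top=False, weights='imagenet',
                                          pooling='avg')
features = backbone(rgb_like)
out = tf.keras.layers.Dense(10, activation='softmax')(features)  # 10 classes: placeholder
model = tf.keras.Model(gray_in, out)
# Fine-tune the whole network (or just the new head at first) on your own dataset.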
On a side note, I must say this will not help with your goal of making the network 'less heavy and easier to train and test'. If you call model.summary() on Keras ResNet50, you will see that of the 23,534,592 trainable parameters, only about 10K are in the initial layer. So at best you can reduce the parameter count by an insignificant few thousand.
I would instead suggest using a lighter model such as MobileNet, which is also available in Keras.

Can CNN autoencoders have different input and output dimensions?

I am working on a problem that requires me to build a deep learning model that, given a certain input image, outputs another image. It is worth noting that the two images are conceptually related, but they do not have the same dimensions.
At first I thought that a classical CNN with a final dense layer whose size is the product of the height and width of the output image would suit this case, but when training it gave strange figures, such as an accuracy of 0.
While looking for answers on the Internet I discovered the concept of CNN autoencoders, and I was wondering if this approach could help me solve my problem. In all the examples I saw, the input and output of the autoencoder had the same size and dimensions.
At this point I wanted to ask whether there is a type of CNN autoencoder that produces an output image with different dimensions than the input image.
An auto-encoder (AE) is an architecture that tries to encode your image into a lower-dimensional representation while simultaneously learning to reconstruct the data from that representation. An AE therefore relies on unsupervised data (no labels needed) that is used both as the input and as the target (in the loss).
You can try a U-Net-based architecture for your use case. A U-Net forwards intermediate data representations to later layers of the network, which should help with faster learning/mapping of the inputs into a new domain.
You can also experiment with a simple architecture containing a few ResNet blocks without any downsampling layers, which might or might not be enough for your use case.
If you want to dig a little deeper, you can look into DiscoGAN and related methods. They explicitly try to map images into a new domain while preserving image information.
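Returning to the encoder-decoder idea, here is a toy sketch whose output size differs from its input size (128x128 in, 64x64 out; all sizes and layer counts are arbitrary for illustration):

import tensorflow as tf

inp = tf.keras.Input(shape=(128, 128, 3))
x = tf.keras.layers.Conv2D(32, 3, strides=2, padding='same', activation='relu')(inp)  # 64x64
x = tf.keras.layers.Conv2D(64, 3, strides=2, padding='same', activation='relu')(x)    # 32x32
x = tf.keras.layers.Conv2D(128, 3, strides=2, padding='same', activation='relu')(x)   # 16x16 bottleneck
x = tf.keras.layers.Conv2DTranspose(64, 3, strides=2, padding='same', activation='relu')(x)  # 32x32
x = tf.keras.layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(x)  # 64x64
out = tf.keras.layers.Conv2D(3, 3, padding='same', activation='sigmoid')(x)           # 64x64x3
model = tf.keras.Model(inp, out)
model.compile(optimizer='adam', loss='mse')
model.summary()

Because the decoder upsamples fewer times than the encoder downsamples, the output ends up smaller than the input; arrange the strides the other way around if you need a larger output.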

Reducing the number of output features of pretrained model weights in Keras

I want to extract 1000 image features using a pretrained Xception model,
but the Xception model's last layer (avg_pool) gives 2048 features.
Can I reduce the number of final output features without additional training?
I want the image features before the softmax, not the prediction result.
from tensorflow.keras.applications import xception
from tensorflow.keras.models import Model

base_model = xception.Xception(include_top=True, weights='imagenet')
base_model.summary()
# Take the 2048-d output of the global average pooling layer instead of the predictions.
self.model = Model(inputs=base_model.input, outputs=base_model.get_layer('avg_pool').output)
This model was trained to produce embeddings in 2048-dimensional space to the classifier after it. There is no sense in trying to reduce the dimensionality of the embedding space, unless you are combining very complex and inflexible models. If you are just doing simple transfer learning with no memory constraints, just snap your new classifier (extra layers) on top of it, and retrain after freezing (or not) all layers in the original Xception. That should work regardless of Xception output_shape. See the keras docs.
That said, if you REALLY need to reduce the dimensionality to 1000-d, you will need a method that preserves (or at least tries to preserve) the original topology of the embedding space; otherwise your model will not benefit from transfer learning at all. Take a look at PCA, SVD, or t-SNE.
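If you do go that route, a minimal PCA sketch (the placeholder arrays stand in for your real images and dataset features; PCA needs at least as many samples as components):

import numpy as np
from sklearn.decomposition import PCA
from tensorflow.keras.applications import xception
from tensorflow.keras.models import Model

# 2048-d avg_pool embeddings from pretrained Xception.
base_model = xception.Xception(include_top=True, weights='imagenet')
feature_model = Model(inputs=base_model.input,
                      outputs=base_model.get_layer('avg_pool').output)

images = np.random.randint(0, 256, size=(8, 299, 299, 3)).astype('float32')  # placeholder batch
features_2048 = feature_model.predict(xception.preprocess_input(images))

# Fit PCA on features extracted from your whole dataset (>= 1000 images),
# then project any batch down to 1000 dimensions.
all_features = np.random.rand(1500, 2048).astype('float32')  # placeholder dataset features
pca = PCA(n_components=1000).fit(all_features)
features_1000 = pca.transform(features_2048)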
