Reducing the number of output features of a pretrained model's weights in Keras - Python

I want to extract 1000 image features using the pretrained Xception model,
but the Xception model's last layer (avg_pool) gives 2048 features.
Can I reduce the final number of output features without additional training?
I want the image features before the softmax, not the prediction result.
from tensorflow.keras.applications import xception
from tensorflow.keras.models import Model

base_model = xception.Xception(include_top=True, weights='imagenet')
base_model.summary()
self.model = Model(inputs=base_model.input, outputs=base_model.get_layer('avg_pool').output)

This model was trained to produce embeddings in a 2048-dimensional space for the classifier after it. There is no point in trying to reduce the dimensionality of the embedding space unless you are combining very complex and inflexible models. If you are just doing simple transfer learning with no memory constraints, just snap your new classifier (extra layers) on top of it, and retrain after freezing (or not) all layers in the original Xception. That should work regardless of Xception's output_shape. See the Keras docs.
That said, if you REALLY need to reduce the dimensionality to 1000-d, you will need a method that preserves (or at least tries to preserve) the original topology of the embedding space; otherwise your model will not benefit from transfer learning at all. Take a look at PCA, SVD, or t-SNE.
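For example, here is a minimal sketch (the feature array and its shape are placeholder assumptions) that projects the 2048-d avg_pool features down to 1000-d with scikit-learn's PCA, without any additional training of the network:
import numpy as np
from sklearn.decomposition import PCA

# features: (n_images, 2048) array from the avg_pool model above,
# e.g. features = self.model.predict(images)
features = np.random.rand(5000, 2048).astype("float32")  # placeholder data for illustration

pca = PCA(n_components=1000)          # fitting requires at least 1000 samples
features_1000d = pca.fit_transform(features)
print(features_1000d.shape)           # (5000, 1000)
Note that PCA and SVD give you a deterministic projection you can reuse on new images at inference time, whereas t-SNE is mainly a visualization technique and does not provide a reusable transform for unseen samples.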

Related

Keras data augmentation layers in model or out of model

So this may be a silly question, but how exactly do the preprocessing layers in keras work, especially when they are part of the model itself, compared to applying the preprocessing outside the model and then feeding the results in for training?
I'm trying to understand running data augmentation in keras models. Let's say I have 1000 images for training. Outside the model, I can apply augmentation 10x and get 10,000 resulting images for training.
But I don't understand what happens when you use a preprocessing layer for augmentation. Does this layer (or these layers, if you use several) take each image and apply the transformations before training? Does this mean the total number of images used for training (and validation, I assume) becomes the number of epochs times the original number of images?
Is one option better than the other? Does that depend on the number of images one originally has before augmentation?
The benefit of preprocessing layers is that the model is truly end-to-end, i.e. raw data comes in and a prediction comes out. It makes your model portable, since the preprocessing procedure is included in the SavedModel.
However, the preprocessing will then run on the GPU together with the rest of the model. Usually it makes sense to load and preprocess the data with CPU worker(s) in the background while the GPU optimizes the model.
Alternatively, you could use the preprocessing layers outside of the model, inside a Dataset. The benefit of that is that you can easily create an inference-only model that includes the layers, which gives you the portability at inference time while keeping the speedup during training.
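A minimal sketch of both options (the specific augmentation layers, input size, and the train_ds dataset are placeholder assumptions; in older TF 2.x versions these layers live under tf.keras.layers.experimental.preprocessing):
import tensorflow as tf

augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
])

# Option 1: augmentation layers inside the model (saved with the SavedModel, run on the accelerator)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    augment,
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Option 2: the same layers applied in the tf.data pipeline (run on the CPU, overlapped with training)
# train_ds is an assumed tf.data.Dataset of (image, label) batches
train_ds = train_ds.map(lambda x, y: (augment(x, training=True), y),
                        num_parallel_calls=tf.data.AUTOTUNE).prefetch(tf.data.AUTOTUNE)
In both cases the transformations are applied on the fly, so each epoch sees a freshly augmented variant of every image rather than a fixed, enlarged dataset.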
For more information, see the Keras guide.

Keras model for multiclass classification for sentiment analysis with LSTM - how can my model be improved?

So I want to predict the number of stars a product gets on Amazon through keras. I have seen other ways of doing this, but I have used the Universal Sentence Encoder with one-hot encoding (I followed a YouTube tutorial to embed the reviews). Without using an LSTM layer, and using the following layers:
model = keras.Sequential()
model.add(keras.layers.Dense(units=256, input_shape=(X_train.shape[1],), activation='relu'))
model.add(keras.layers.Dropout(rate=0.5))
model.add(keras.layers.Dense(units=128, activation='relu'))
model.add(keras.layers.Dropout(rate=0.5))
model.add(keras.layers.Dense(5, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.Adam(0.0001), metrics=['accuracy'])
I am able to get an accuracy of around 0.55 and a loss of 1, which isn't great. However, when I reshape my X_train and X_test data to be 3D input for an LSTM layer and then put it into a model such as:
model = keras.Sequential()
model.add(keras.layers.Dense(units=256, input_shape=(512, 1), activation='relu'))
model.add(keras.layers.Dropout(rate=0.5))
model.add(keras.layers.Bidirectional(keras.layers.LSTM(100, dropout=0.2, recurrent_dropout=0.3)))
model.add(keras.layers.Dense(5, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.Adam(0.0001), metrics=['accuracy'])
I get an accuracy of around 0.2, which is even worse, with a loss close to 2.00.
I have no idea whether an LSTM is necessary, as I am new to neural networks, but I have been trying to do this for my project.
So I am asking: should I stick with the first model without an LSTM, or is there a way of changing the second network with the LSTM so that it performs better, whilst using the embedding methods that I have used?
Thanks for your time!
The reason you should choose an LSTM over plain dense neurons is that in language there are relationships between words, and those relationships are important for understanding what a sentence means. A model with only dense layers cannot do this very well, because it has no connections that can store such information; it just predicts by looking at the whole picture, not at the connections between the words. LSTMs (Long Short-Term Memory networks), in short, have the capability to remember data they have seen previously, which helps them build connections between different words in the same sentence.
As for how you would go about creating your model: first, you need the Tokenizer from the TF library to create tokens from your data, then convert your sequences into numbers with it, then pad your data using pad_sequences. Your data is then ready. In your network, the first layer should be an Embedding layer. After it you can have LSTM layers (for the reasons explained above), Bidirectional LSTM layers (they learn dependencies both left-to-right and right-to-left and often perform better than a unidirectional LSTM), or Conv1D layers (depending on the filter size, they can model dependencies that lie within the filter length; they have been used for this and work, so you can try them), followed by a pooling layer (GlobalMaxPooling1D) and then dense layers to get your predictions. A sketch of this pipeline follows below.
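A minimal sketch of that pipeline (the vocabulary size, sequence length, and the placeholder texts/labels are assumptions, not from your data):
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

texts = ["great product", "terrible, broke after a day"]   # placeholder reviews
labels = [5, 1]                                             # placeholder star ratings

tokenizer = Tokenizer(num_words=20000, oov_token="<OOV>")
tokenizer.fit_on_texts(texts)
X = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=200)
y = keras.utils.to_categorical([l - 1 for l in labels], num_classes=5)  # stars 1-5 -> classes 0-4

model = keras.Sequential([
    keras.layers.Embedding(input_dim=20000, output_dim=128),
    keras.layers.Bidirectional(keras.layers.LSTM(64, return_sequences=True)),
    keras.layers.GlobalMaxPooling1D(),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(5, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X, y, epochs=5)   # train with your real reviews and labels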

How to use Deep Learning Models from Keras for a problem that does not fit imagenet dataset?

I followed a blog on how to implement a VGG16 model from scratch and want to do the same with the pretrained model from Keras. I looked up some other blogs but couldn't find a fitting solution, I think. My task is to classify integrated circuit images as defective or non-defective.
I have seen in a paper that they used a pretrained ImageNet model of VGG16 for fabric defect detection, where they froze the first seven layers and fine-tuned the last nine for their own problem.
(Source: https://journals.sagepub.com/doi/full/10.1177/1558925019897396)
I have already seen examples of how to freeze all layers except the fully connected ones, but how can I follow the example of freezing the first x layers and fine-tuning the others for my problem?
VGG16 is fairly easy to implement from scratch, but for models like ResNet or Xception it gets a little trickier.
It is not necessary to implement a model from scratch to freeze a few layers. You can do this on pre-trained models as well. In keras, you'd use trainable = False.
For example, let's say you want to use the pre-trained Xception model from keras and want to freeze the first x layers:
# In your imports
from keras.applications import Xception

# Since you're using the model for a different task, you'd want to remove the top
base_model = Xception(weights='imagenet', include_top=False)

# Freeze the first x layers (indices 0 to x-1)
for layer in base_model.layers[0:x]:
    layer.trainable = False

# To see all the layers in detail and to check trainable parameters
base_model.summary()
Ideally you'd want to add another layer on top of this model with the output as your classes. For more details, you can check this keras guide: https://keras.io/guides/transfer_learning/
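For example, a minimal sketch of such a head (the input size and the two-class output for defect/non-defect are assumptions), following the pattern from the linked transfer-learning guide:
from keras import Input, Model, layers

inputs = Input(shape=(224, 224, 3))                  # assumed input size
x = base_model(inputs, training=False)               # the partially frozen Xception base from above
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(2, activation='softmax')(x)   # e.g. defect / non-defect
model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])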
A lot of times the pre-trained weights can be very useful in other classification tasks, but in case you want to train a model from scratch on your dataset, you can load the model without the ImageNet weights. Or, better, load the weights but don't freeze any layers. This will retrain every layer, using the ImageNet weights as an initialization.
I hope I've answered your question.

Change number of input channels to pretrained keras.applications model?

I am prototyping a deep learning segmentation model that needs six channels of input (two aligned 448x448 RGB images under different lighting conditions). I wish to compare the performance of several pretrained models to that of my current model, which I trained from scratch. Can I use the pretrained models in tf.keras.applications for input images with more than 3 channels?
I tried applying a convolution first to reduce the channel dimension to 3 and then passed that output to tf.keras.applications.DenseNet121() but received the following error:
import tensorflow as tf
dense_input = tf.keras.layers.Input(shape=(448, 448, 6))
dense_filter = tf.keras.layers.Conv2D(3, 3, padding='same')(dense_input)
dense_stem = tf.keras.applications.DenseNet121(include_top=False, weights='imagenet', input_tensor=dense_filter)
*** ValueError: You are trying to load a weight file containing 241 layers into a model with 242 layers.
Is there a better way to use pretrained models on data with a different number of input channels in keras? Will pretraining even help when the number of input channels is different?
Technically, it should be possible. Perhaps using the model's __call__ itself:
# Pretrained backbone expecting 3-channel input
orig_model = tf.keras.applications.DenseNet121(include_top=False, weights='imagenet')
# Learn a 6 -> 3 channel mapping in front of it
dense_input = tf.keras.layers.Input(shape=(448, 448, 6))
dense_filter = tf.keras.layers.Conv2D(3, 3, padding='same')(dense_input)
output = orig_model(dense_filter)
model = tf.keras.Model(dense_input, output)
model.compile(...)
model.summary()
On a conceptual level, though, I'd be worried that the new input doesn't look much like the original input that the pretrained model was trained on.
Cross Modality Pre-training may be the method you need. Proposed by Wang et al. (2016), this method averages the weights of the pre-trained model across the channels in the first layer and replicates the mean for the number of target channels. Their experiments indicate that the network gets better performance with this kind of pre-training even when it has 20 input channels and its input modality is not RGB.
To apply this, one can refer to another answer that uses layer.get_weights() and layer.set_weights() to manually set the weights in the first layer of the pre-trained model.
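A minimal sketch of that weight manipulation for DenseNet121 with a 6-channel input (the one-to-one layer correspondence and kernel shapes are assumptions that may vary with the Keras version):
import numpy as np
import tensorflow as tf

n_channels = 6
base = tf.keras.applications.DenseNet121(include_top=False, weights='imagenet',
                                         input_shape=(448, 448, 3))
new_base = tf.keras.applications.DenseNet121(include_top=False, weights=None,
                                             input_shape=(448, 448, n_channels))

# Copy weights layer by layer; for the first conv, average over the RGB axis and replicate
for old_layer, new_layer in zip(base.layers, new_base.layers):
    old_w = old_layer.get_weights()
    if not old_w:
        continue
    new_w = new_layer.get_weights()
    if old_w[0].shape != new_w[0].shape:                   # first conv kernel: (k, k, 3, f) -> (k, k, 6, f)
        kernel = old_w[0].mean(axis=2, keepdims=True)      # average across the RGB channels
        old_w[0] = np.tile(kernel, (1, 1, n_channels, 1))  # replicate for the 6 target channels
    new_layer.set_weights(old_w)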
As a complementary approach to adding a convolutional layer before a pre-trained architecture (e.g. any of the pre-trained models available in tf.keras.applications that were trained with RGB inputs), you could consider manipulating the existing weights so that they match your model with 6-channel inputs. For example, if your architecture remains the same apart from the added input modalities, you can repeat the green channel weights for the newly added 3 input channels: see here.
"Is there a better way to use pretrained models on data with a different number of input channels in keras? Will pretraining even help when the number of input channels is different?"
Both of the aforementioned, commonly used techniques (adding convolution layer(s) before the pre-trained architecture to convert the modalities, and repeating the pre-trained channels to match the newly added modalities) enable transfer learning, which is virtually always a better choice than training from scratch. However, do not expect either option to work without some retraining. In my opinion/experience, the latter is better. The reason is that the randomly initialized Conv layers in the former approach would (at least initially) produce radically different inputs than what the rest of the architecture is used to seeing. This was already reasoned in the earlier answer by Kris. The latter technique takes advantage of the fact that many of the relevant features are fairly similar across input modalities: a dog might still look like a dog even in a newly added input modality (e.g. RGB vs. thermal imaging).

Updating pre-trained Deep Learning model with respect to new data points

Considering the example of image classification on ImageNet, how can I update a pre-trained model using new data points?
I have loaded the pre-trained model. I have new data points that are quite different from the distribution of the original data the model was trained on. So, I would like to update/fine-tune the model with the help of these new data points. How should I go about doing it? Can anyone help me out? I am using PyTorch 0.4.0 for the implementation, running on a Tesla K40C GPU.
If you don't want to change the output of the classifier (i.e. the number of classes), then you can simply continue training the model with new example images, assuming that they are reshaped to the same shape that the pretrained model accepts.
On the other hand, if you want to change the number of classes in a pre-trained model, then you can replace the last fully connected layer with a new one and train only this specific layer on the new samples. Here's sample code for this case from PyTorch's autograd mechanics notes:
import torchvision
import torch.nn as nn
import torch.optim as optim

model = torchvision.models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False
# Replace the last fully-connected layer
# Parameters of newly constructed modules have requires_grad=True by default
model.fc = nn.Linear(512, 100)
# Optimize only the classifier
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)
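Continuing from the snippet above, a minimal fine-tuning loop on the new samples could look as follows (new_data_loader is an assumed DataLoader over the new labeled data points, and the epoch count is arbitrary):
criterion = nn.CrossEntropyLoss()
model.train()
for epoch in range(5):                        # a few epochs are usually enough for fine-tuning
    for images, labels in new_data_loader:    # assumed DataLoader over the new data points
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()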
