I'm new to pytorch. I wrote the below code to do predication using Resnet with Sigmoid for binary classification. I just need to change it to softmax because I might have more than 2 classes.
I understood that pytorch, unlike, Keras, the softmax is in the CrossEntropyLoss. So I'm not sure how could I change the top layer to make the model uses softmax:
model = torchvision.models.resnet50(pretrained=False)
model.fc = torch.nn.Sequential(
torch.nn.Linear(
in_features=2048,
out_features=1
) , torch.nn.Sigmoid()
)
model = model.cpu()
and later:
lossFunc=torch.nn.BCELoss(class_weights)
You can try this:
model.fc[1] = torch.nn.Softmax(10)
where 10 are the number of classes, you can put value based on your needs.
Related
Ng's Convolutional Neural Network class's Week 2 Lab on using Transfer Learning with MobileNetV2 (summary: https://github.com/EhabR98/Transfer-Learning-with-MobileNetV2) and an additional tutorial (https://blog.roboflow.com/how-to-train-mobilenetv2-on-a-custom-dataset/) both begin like this:
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE, include_top=False, weights='imagenet')
base_model.trainable = False
They then proceed to add a pooling layer(s), a Dropout layer and a Dense 1-unit layer to the end, apply a BinaryCrossentropy loss and some kind of optimizer, then train it on some custom data that has been inputted. Lets call this custom model "model2" as Ng's lab does
Here's what my the Coursera class model looks like, its important to include here because the variable base_model is called in two different closures throughout the Coursera lab (previous to this it was called outside of a method, as base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE, include_top=True, weights='imagenet'); base_model.trainable= False)
def alpaca_model(image_shape=IMG_SIZE, data_augmentation=data_augmenter())
input_shape = image_shape + (3,0)
base_model = tf.keras.applications.MobileNetV2(input_shape=input_shape, include_top=False, weights='imagenet')
base_model.trainable = False
inputs = tf.keras.Input(shape=input_shape)
x = data_augmentation(inputs)
x = preprocess_input(x)
x = base_model(x, training=False)
x = tfl.GlobalAveragePooling2D()(x)
x = tfl.Dropout(0.2)(x)
prediction_layer = tfl.Dense(1)
outputs = prediction_layer(x)
model = tf.keras.Model(inputs, outputs)
return model
model2 = alpaca_model()
base_learning_rate = 0.001
initial_epochs = 5
model2.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate), loss=tf.keras.losses.BinaryCrossentropy(from_logits=True), metrics=["accuracy"])
history = model2.fit(train_dataset, validation_data=validation_dataset, epochs=initial_epochs)
This performs OK, getting as much as 80% accuracy
Fine tuning -- Now in both the course lab and the tutorial, they then proceed to "unfreeze" some of the last layers of the internal network so that they can be trained, like so:
fine_tune_at = 120
base_model = model2.layers[4] #totally separate question, but I would love to hear in comments, what this does exactly. It is difficult to Google this.
base_model.trainable = True
print("#/layers in base model: ", len(base_model.layers))
for layer in base_model.layers[:fine_tune_at]:
layer.trainable = False
loss_function = tf.keras.losses.BinaryCrossentrop(from_logits=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=base_learning_rate*0.1)
metrics = ['accuracy']
fine_tune_epochs = 5
total_epochs = initial_epochs + fine_tune_epochs
Up until this point, I'm satisfied, I can clearly see what is going on, but then:
model2.compile(loss=loss_function,optimizer=optimizer,metrics=metrics)
history_fine = model2.fit(train_dataset, epochs=total_epochs, initial_epoch=history.epoch[-1], validation_data = validation_dataset)
This leads to a marked improvement in results. Which confused me, I was very much expecting base_model to get passed in somehow. I didn’t imagine that altering some other variable that hasn’t been passed into or been initially called would come into play.
So given all of that context, the question is: How is altering the base_model affecting model2?
If the above example from the Coursera lab is as confusing to you as it is to me, the example shown on https://blog.roboflow.com/how-to-train-mobilenetv2-on-a-custom-dataset/ as mentioned above is much simpler and contains much less ambiguity as base_model is defined only once. Regardless, the same dynamic applies and I'm equally confused on both. Thanks again for your time
Your
totally separate question, but I would love to hear in comments, what this does exactly.
The following list get the MobileNetV2 model:
base_model = model2.layers[4]
Why 4? Because the first layer is the input, the second layer is the data augmentation (a Sequential model), the third and fourth layers are for image preprocessing (divide by 127.5 and subtract -1 to have values between -1 and 1), the fifth layer is MobileNetV2 (index 4). The other layers are your top-net.
How is altering the base_model affecting model2
During the first pass (transfer learning), all layers of MNv2 are frozen, so weights and biases remain intact. Whereas for the second pass (fine tuning), the last convolution layers (block 13 to 16 and last Conv2D) are now unfrozen so that the model can modify the weights and bias of the base model. Therefore, the following layers will be changed during training.
To view the full model summary with nested models, use:
>>> model2.summary(line_length=125, expand_nested=True, show_trainable=True)
I'll go ahead and post an answer that speaks to (the problem with) Professor Ng's Convolution Neural Network, Week 2 Assignment, "Transfer Learning with MobileNet", with the hope that other students might find this answer and realize that they are not crazy and that the Lab was poorly coded.
I'm not sure how it (appears to work) on Jupyter, but the main reason I was having problems with this lab was that base_model was defined several times within the lab. It should have only been defined once. Even worse, the base_model was redefined inside the alpaca_model() function, but that's not accessible outside the closure of the function. I'm not in the industry, but that is just plain terrible coding to redefine a variable inside a method that's already been defined then call it again outside of the method.
Once I took base_model out of the function, defining it beforehand, everything works perfectly not just on the computer, but in my head.
I am developing a Siamese Network for Face Recognition using Keras for 224x224x3 sized images. The architecture of a Siamese Network is like this:
For the CNN model, I am thinking of using the InceptionV3 model which is already pretrained in the Keras.applications module.
#Assume all the other modules are imported correctly
from keras.applications.inception_v3 import InceptionV3
IMG_SHAPE=(224,224,3)
def return_siamese_net():
left_input=Input(IMG_SHAPE)
right_input=Input(IMG_SHAPE)
model1=InceptionV3(include_top=False, weights="imagenet", input_tensor=left_input) #Left SubConvNet
model2=InceptionV3(include_top=False, weights="imagenet", input_tensor=right_input) #Right SubConvNet
#Do Something here
distance_layer = #Do Something
prediction = Dense(1,activation='sigmoid')(distance_layer) # Outputs 1 if the images match and 0 if it does not
siamese_net = #Do Something
return siamese_net
model=return_siamese_net()
I get error since the model is pretrained, and I am now stuck at implementing the Distance Layer for the Twin Network.
What should I add in between to make this Siamese Network work?
A very important note, before you use the distance layer, is to take into consideration that you have only one convolutional neural network.
The shared weights actually refer to only one convolutional neural network, and the weights are shared because the same weights are used when passing a pair of images (depending on the loss function used) in order to compute the features and subsequently the embeddings of each input image.
You would have only one neural network, and the block logic will need to look like:
def euclidean_distance(vectors):
(features_A, features_B) = vectors
sum_squared = K.sum(K.square(features_A - features_B), axis=1, keepdims=True)
return K.sqrt(K.maximum(sum_squared, K.epsilon()))
image_A = Input(shape=...)
image_B = Input(shape=...)
feature_extractor_model = get_feature_extractor_model(shape=...)
features_A = feature_extractor(image_A)
features_B = feature_extractor(image_B)
distance = Lambda(euclidean_distance)([features_A, features_B])
outputs = Dense(1, activation="sigmoid")(distance)
siamese_model = Model(inputs=[image_A, image_B], outputs=outputs)
Of course, the feature extractor model can be a pretrained network from Keras/TensorFlow, with the output classification layer improved.
The main logic should be like the one above, of course, if you want to use triplet loss, that would require three inputs (Anchor, Positive, Negative), but for the beginning I would recommend to stick to the basics.
Also, it would a good idea to consult this documentation:
https://www.pyimagesearch.com/2020/11/30/siamese-networks-with-keras-tensorflow-and-deep-learning/
https://towardsdatascience.com/one-shot-learning-with-siamese-networks-using-keras-17f34e75bb3d
I've trained the following model for some timeseries in Keras:
input_layer = Input(batch_shape=(56, 3864))
first_layer = Dense(24, input_dim=28, activation='relu',
activity_regularizer=None,
kernel_regularizer=None)(input_layer)
first_layer = Dropout(0.3)(first_layer)
second_layer = Dense(12, activation='relu')(first_layer)
second_layer = Dropout(0.3)(second_layer)
out = Dense(56)(second_layer)
model_1 = Model(input_layer, out)
Then I defined a new model with the trained layers of model_1 and added dropout layers with a different rate, drp, to it:
input_2 = Input(batch_shape=(56, 3864))
first_dense_layer = model_1.layers[1](input_2)
first_dropout_layer = model_1.layers[2](first_dense_layer)
new_dropout = Dropout(drp)(first_dropout_layer)
snd_dense_layer = model_1.layers[3](new_dropout)
snd_dropout_layer = model_1.layers[4](snd_dense_layer)
new_dropout_2 = Dropout(drp)(snd_dropout_layer)
output = model_1.layers[5](new_dropout_2)
model_2 = Model(input_2, output)
Then I'm getting the prediction results of these two models as follow:
result_1 = model_1.predict(test_data, batch_size=56)
result_2 = model_2.predict(test_data, batch_size=56)
I was expecting to get completely different results because the second model has new dropout layers and theses two models are different (IMO), but that's not the case. Both are generating the same result. Why is that happening?
As I mentioned in the comments, the Dropout layer is turned off in inference phase (i.e. test mode), so when you use model.predict() the Dropout layers are not active. However, if you would like to have a model that uses Dropout both in training and inference phase, you can pass training argument when calling it, as suggested by François Chollet:
# ...
new_dropout = Dropout(drp)(first_dropout_layer, training=True)
# ...
Alternatively, If you have already trained your model and now want to use it in inference mode and keep the Dropout layers (and possibly other layers which have different behavior in training/inference phase such as BatchNormalization) active, you can define a backend function that takes the model's inputs as well as Keras learning phase:
from keras import backend as K
func = K.function(model.inputs + [K.learning_phase()], model.outputs)
# to use it pass 1 to set the learning phase to training mode
outputs = func([input_arrays] + [1.])
your question has a simple solution in the latest version of Tensorflow. you can set the training argument of the call method to true.
you can run a code like the below code:
model(input,training=True)
by using training=True TensorFlow automatically applies the Dropout layer in inference mode.
As there are already some working code solutions above, I will simply add a few more details regarding dropout during inference to prevent confusion.
Based on the original paper, Dropout layers play the role of turning off (setting gradients to zero) the neuron nodes during training to reduce overfitting. However, once we finish off with training and start testing the model, we do not 'touch' any neurons, thus, all the units are considered to make the decision when inferencing. This causes previously 'dead' neuron weights to be large than expected due to the usage of Dropout. To prevent this, a scaling factor is applied to balance the network node. To be more precise, if a unit is retained with probability p during training, the outgoing weights of that unit are multiplied by p during the prediction stage.
I am trying to do transfer learning in Keras + Tensorflow on a selected subset of Places-205 dataset, containing only 27 categories. I am using InceptionV3, DenseNet121 and ResNet50, pre-trained on ImageNet, and add a couple of extra layers to adapt to my classes. If the model is ResNet, I add Flatten + Dense for classfication, and if it is DenseNet or Inceptionv3, I add Global Avg Pool + Dense (relu) + Dense (classification).
This is the code snippet:
x = base_model.output
if FLAGS.model in 'resnet50':
x = Flatten(name="flatten")(x)
else:
x = GlobalAveragePooling2D()(x)
# Let's add a fully-connected layer
x = Dense(1024, activation = 'relu')(x)
# And a logistic layer
predictions = Dense(classes, activation = 'softmax')(x)
For DenseNet and Inceptionv3 the training is ok, and the validation accuracy hits 70%, but for ResNet the validation accuracy stays fixed at 0.0369/0.037 (which is 1/27, my number of classes). It seems like it always predicts one class, but it's weird because its training progresses ok and the unspecific model code is exactly the same as for DenseNet and InceptionV3, which do work as expected.
Do you have any idea why it happens?
Thanks a lot!
I had a similar issue as you #Ciprian Andrei Focsaneanu, and what I have found to have worked was to make the previous layers (before the fully connected layers) trainable, as the filters/features of the ResNet50 were not suitable for my application.
Strangely enough, I also trained the VGG16 models, which was initially on the same images (imagenet) but its filters worked for my application, but I digress.
Here's the link to a page that inspired me to do this: https://datascience.stackexchange.com/questions/16840/multi-class-neural-net-always-predicting-1-class-after-optimization
Hope this helps!
I am trying to estimate the third band(Blue) in an RGB image using convolutional neural networks. my design using Keras is a sequentiol model with a convolution2D layer as input layer two hidden layers and output neuron. if i want loss(rmse) to be zero how should i change my model?
my model in python goes like this
in_image = skimage.io.imread('test.jpg')[0:50,0:50,:].astype(float)
data = in_image[:,:,0:2]
target = in_image[:,:,2:3]
model1 = keras.models.Sequential()
model1.add(keras.layers.Convolution2D(50,(3,3),strides = (1,1),padding = "same",input_shape=(None,None,2))) #Convolution Layer
model1.add(keras.layers.Dense(50,activation = 'relu')) # Hiden Layer1
model1.add(keras.layers.Dense(50,activation = 'sigmoid')) # Hidden Layer 2
model1.add(keras.layers.Dense(1)) # Output Layer
adadelta = keras.optimizers.Adadelta(lr=1.0, rho=0.95, epsilon=1e-08, decay=0.0)
model1.compile(loss='mean_squared_error', optimizer=adadelta) # Compile the model
model1.fit(np.array([data]),np.array([target]),epochs = 5000)
estimated_band = model1.predict(np.array([data]))
Given your problem setup, it looks like you're trying to training a neural network on one image such that it is able to predict the blue channel of an image from other 2 images. Putting aside the use of such an experiment, there are a few important things when training neural networks properly, including.
learning rate
weight initialization
optimizer
model complexity.
Yann Lecun's Efficient backprop is a late 90s paper that talks about numbers 1, 2 and 3. Number 4 holds on the assumption that as the number of free parameters increase, at some point you'll be able to match each parameter to each output.
Note that achieving zero-loss provides no guarantees on generalization nor does it mean that your model will not generalize, as brilliantly described in a paper presented at ICLR.