Keras: unsupervised pre-training kills performance - python

I'm trying to train a deep classifier in Keras both with and without pretraining of the hidden layers via stacked autoencoders. My problem is that the pretraining seems to drastically degrade performance (i.e. if pretrain is set to False in the code below the training error of the final classification layer converges much faster). This seems completely outrageous to me given that pretraining should only initialize the weights of the hidden layers and I don't see how that could completely kill the models performance even if that initialization does not work very well. I can not include the specific dataset I used but the effect should occur for any appropriate dataset (e.g. minist). What is going on here and how can I fix it?
EDIT: code is now reproducible with the MNIST data, final line prints change in loss function, which is significantly lower with pre-training.
I have also slightly modified the code and added sample learning curves below:
from functools import partial
import matplotlib.pyplot as plt
from keras.datasets import mnist
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import SGD
from keras.regularizers import l2
from keras.utils import to_categorical
(inputs_train, targets_train), _ = mnist.load_data()
inputs_train = inputs_train[:1000].reshape(1000, 784)
targets_train = to_categorical(targets_train[:1000])
hidden_nodes = [256] * 4
learning_rate = 0.01
regularization = 1e-6
epochs = 30
def train_model(pretrain):
model = Sequential()
layer = partial(Dense,
activation='sigmoid',
kernel_initializer='random_normal',
kernel_regularizer=l2(regularization))
for i, hn in enumerate(hidden_nodes):
kwargs = dict(units=hn, name='hidden_{}'.format(i + 1))
if i == 0:
kwargs['input_dim'] = inputs_train.shape[1]
model.add(layer(**kwargs))
if pretrain:
# train autoencoders
inputs_train_ = inputs_train.copy()
for i, hn in enumerate(hidden_nodes):
autoencoder = Sequential()
autoencoder.add(layer(units=hn,
input_dim=inputs_train_.shape[1],
name='hidden'))
autoencoder.add(layer(units=inputs_train_.shape[1],
name='decode'))
autoencoder.compile(optimizer=SGD(lr=learning_rate, momentum=0.9),
loss='binary_crossentropy')
autoencoder.fit(
inputs_train_,
inputs_train_,
batch_size=32,
epochs=epochs,
verbose=0)
autoencoder.pop()
model.layers[i].set_weights(autoencoder.layers[0].get_weights())
inputs_train_ = autoencoder.predict(inputs_train_)
num_classes = targets_train.shape[1]
model.add(Dense(units=num_classes,
activation='softmax',
name='classify'))
model.compile(optimizer=SGD(lr=learning_rate, momentum=0.9),
loss='categorical_crossentropy')
h = model.fit(
inputs_train,
targets_train,
batch_size=32,
epochs=epochs,
verbose=0)
return h.history['loss']
plt.plot(train_model(pretrain=False), label="Without Pre-Training")
plt.plot(train_model(pretrain=True), label="With Pre-Training")
plt.xlabel("Epoch")
plt.ylabel("Cross-Entropy")
plt.legend()
plt.show()

Related

Confusion matrix not making any sense after training a 3 class image classification model

I'm a rookie at machine learning so please bear with me.
I have a model that trains images and classifies them in 3 different classes. I'm trying to get the confusion matrix for the test data, but either I don't understand it or it's not making any sense. When the model is done training after 200 epochs it shows that the accuracy is around 65%:
loss: 1.5386 - accuracy: 0.6583
But then when the confusion matrix is printed like this:
[[23 51 42]
[20 27 25]
[47 69 56]]
Which isn't correct because the "true" results (23, 27 and 56) don't make up for 65% of all the results. If you add up the numbers in the matrix it adds to the amount of test images so I know that part is correct.
I had a tensorflow warning just before printing the confusion matrix that says the following, but I don't really get its meaning:
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least steps_per_epoch * epochs batches (in this case, 13 batches). You may need to use the repeat() function when building your dataset
This is my code:
import sys
from matplotlib import pyplot
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Conv2D, Flatten, Dense, MaxPooling2D, Dropout, GlobalAveragePooling2D
from tensorflow.keras.optimizers import SGD, RMSprop, Adam
from keras.preprocessing.image import ImageDataGenerator
from sklearn.metrics import confusion_matrix
CLASSES = 3
# define cnn model
def define_model():
# load model
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
#add new classifier layers
x = base_model.output
x = Dropout(0.4)(x)
x = GlobalAveragePooling2D(name='avg_pool')(x)
predictions = Dense(CLASSES, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
# mark loaded layers as not trainable
for layer in base_model.layers:
layer.trainable = False
# compile model
opt = RMSprop(lr=0.0001)
#momentum=0.9)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
return model
# plot diagnostic learning curves
def summarize_diagnostics(history):
# plot loss
pyplot.subplot(211)
pyplot.title('Cross Entropy Loss')
pyplot.plot(history.history['loss'], color='blue', label='train')
pyplot.plot(history.history['val_loss'], color='orange', label='test')
# plot accuracy
pyplot.subplot(212)
pyplot.title('Classification Accuracy')
pyplot.plot(history.history['accuracy'], color='blue', label='train')
pyplot.plot(history.history['val_accuracy'], color='orange', label='test')
# save plot to file
filename = sys.argv[0].split('/')[-1]
pyplot.savefig(filename + '_plot.png')
pyplot.close()
# run the test harness for evaluating a model
def run_test_harness():
# define model
model = define_model()
# create data generator
datagen = ImageDataGenerator(featurewise_center=True)
# specify imagenet mean values for centering
datagen.mean = [123.68, 116.779, 103.939]
# prepare iterators
train_it = datagen.flow_from_directory('../datasetMainPruebas3ClasesQuitandoConfusosCV1/train/',
class_mode='categorical', batch_size=32, target_size=(224, 224))
test_it = datagen.flow_from_directory('../datasetMainPruebas3ClasesQuitandoConfusosCV1/test/',
class_mode='categorical', batch_size=32, target_size=(224, 224))
# fit model
history = model.fit_generator(train_it, steps_per_epoch=len(train_it),
validation_data=test_it, validation_steps=len(test_it), epochs=200, verbose=1)
# evaluate model
_, acc = model.evaluate_generator(test_it, steps=len(test_it), verbose=1)
print('> %.3f' % (acc * 100.0))
#confusion matrix
Y_pred = model.predict_generator(test_it, len(test_it) + 1)
y_pred = np.argmax(Y_pred, axis=1)
print('Confusion Matrix')
cm = confusion_matrix(test_it.classes, y_pred)
print(cm)
# learning curves
summarize_diagnostics(history)
# entry point, run the test harness
run_test_harness()
Any help or tip is welcome, thank you
Edit: after checking one of the comments I changed my CM code to the following:
#confusion matrix
all_y_pred = []
all_y_true = []
for i in range(len(test_it)):
x, y = test_it[i]
y_pred = model.predict(x)
all_y_pred.append(y_pred)
all_y_true.append(y)
all_y_pred = np.concatenate(all_y_pred, axis=0)
all_y_true = np.concatenate(all_y_true, axis=0)
print('Confusion Matrix')
cm = confusion_matrix(all_y_true, all_y_pred)
print(cm)
And now the error I get says "Classification metrics can't handle a mix of multilabel-indicator and continuous-multioutput targets"
Any idea why?

How to get F-1 Accuracy and Confusion matrix of Keras model?

I have a folder called train and in that I have three separate folders for "Covid", "Pneumonia", "Healthy". I have created VGG19 model using transfer learning. I want help with creating confusion matrix for my test data. I have split my train data as 75%,25%. I have tried everything to create confusion matrix and calculate F-1 score. It would be really great if someone provides me the code, since I am new to this.
#import needed packages
import numpy as np
import tensorflow as tf
import keras
from keras import backend as K
from keras.optimizers import Adam
from keras.metrics import categorical_crossentropy
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Model
from keras.layers import Dense,GlobalAveragePooling2D,Dropout,SeparableConv2D,BatchNormalization, Activation, Dense
from keras.applications.vgg19 import VGG19
from keras.applications.resnet import ResNet101
from keras.optimizers import Adam
from keras.layers import Input, Lambda, Dense, Flatten
import matplotlib.pyplot as plt
# dataset has 3 classes
num_class = 3
# Base model without Fully connected Layers
base_model = VGG19(include_top=False, weights='imagenet', input_shape=(224,224,3))
# don't train existing weights
for layer in base_model.layers:
layer.trainable = False
x=base_model.output
x = Flatten()(base_model.output)
##3 classes ,COVID, PNEUNOMIA, NORMAL
preds=Dense(num_class, activation='softmax')(x) #final layer with softmax activation
model=Model(inputs=base_model.input,outputs=preds)
##check the model
model.summary()
# tell the model what cost and optimization method to use
model.compile(
loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy']
)
##Preparing Data
train_datagen=ImageDataGenerator(preprocessing_function=keras.applications.vgg16.preprocess_input,
validation_split=0.25)
train_generator=train_datagen.flow_from_directory('C:/Users/prani/OneDrive/Desktop/Masters project/chest_xray/Train-250/',
target_size=(224,224),
batch_size=64,
class_mode='categorical',
subset='training')
validation_generator = train_datagen.flow_from_directory(
'C:/Users/prani/OneDrive/Desktop/Masters project/chest_xray/Train-250/', # same directory as training data
target_size=(224,224),
batch_size=64,
class_mode='categorical',
subset='validation') # set as validation data
##Training
##SEtting hyper parameter
epochs = 50
learning_rate = 0.0005
decay_rate = learning_rate / epochs
opt = Adam(lr=learning_rate, beta_1=0.9, beta_2=0.999, decay=decay_rate, amsgrad=False)
model.compile(optimizer=opt,loss='categorical_crossentropy',metrics=['accuracy'])
##Train
step_size_train = train_generator.n/train_generator.batch_size
step_size_val = validation_generator.samples // validation_generator.batch_size
r = model.fit_generator(generator=train_generator,
steps_per_epoch=step_size_train,
validation_data = validation_generator,
validation_steps =step_size_val,
epochs=50)
##Confusion Matrix (Sample code for confusion matrix but this is not working, it doesn't produce error neither results)
from sklearn.metrics import classification_report, confusion_matrix
batch_size=64
Y_pred = model.predict_generator(validation_generator,186)
y_pred = np.argmax(Y_pred, axis=1)
print('Confusion Matrix')
print(confusion_matrix(validation_generator.classes, y_pred))
print('Classification Report')
target_names = ['COVID','NORMAL','PNEUMONIA']
print(classification_report(validation_generator.classes, y_pred, target_names=target_names))
You could look at this page for information on how to code a confusion matrix and other metrics:
https://www.tensorflow.org/tutorials/structured_data/imbalanced_data

Bad training results when I am trying to find out defected and non-defected solar cells

I am trying to classify which are defected solar cells. I have a huge dataset of both defected plates and non-defected solar cells. As per a few suggestions from research papers I have been using the VGG 16 model for the training purpose. But even after 3 epochs, it is showing 100 % accuracy. I don't why it is coming like this. Is there any other way to solve this problem, any other Algorithm.
I am uploading some of the defected cells which I have in my dataset
]2]2]3
from keras.layers import Input, Lambda, Dense, Flatten
from keras.models import Model
from keras.applications.vgg16 import VGG16
from keras.applications.vgg16 import preprocess_input
from keras.preprocessing import image
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
import numpy as np
from glob import glob
import matplotlib.pyplot as plt
# re-size all the images to this
IMAGE_SIZE = [224, 224]
train_path = 'Datasets/Train'
valid_path = 'Datasets/Test'
# add preprocessing layer to the front of VGG
vgg = VGG16(input_shape=IMAGE_SIZE + [3], weights='imagenet', include_top=False)
# don't train existing weights
for layer in vgg.layers:
layer.trainable = False
# useful for getting number of classes
folders = glob('Datasets/Train/*')
# our layers - you can add more if you want
x = Flatten()(vgg.output)
# x = Dense(1000, activation='relu')(x)
prediction = Dense(len(folders), activation='softmax')(x)
# create a model object
model = Model(inputs=vgg.input, outputs=prediction)
# view the structure of the model
model.summary()
# tell the model what cost and optimization method to use
model.compile(
loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy']
)
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale = 1./255,
shear_range = 0.2,
zoom_range = 0.2,
horizontal_flip = True)
test_datagen = ImageDataGenerator(rescale = 1./255)
training_set = train_datagen.flow_from_directory('Datasets/Train',
target_size = (224, 224),
batch_size = 32,
class_mode = 'categorical')
test_set = test_datagen.flow_from_directory('Datasets/Test',
target_size = (224, 224),
batch_size = 32,
class_mode = 'categorical')
# fit the model
r = model.fit_generator(
training_set,
validation_data=test_set,
epochs=10,
steps_per_epoch=len(training_set),
validation_steps=len(test_set)
)
# loss
plt.plot(r.history['loss'], label='train loss')
plt.plot(r.history['val_loss'], label='val loss')
plt.legend()
plt.show()
plt.savefig('LossVal_loss')
# accuracies
plt.plot(r.history['acc'], label='train acc')
plt.plot(r.history['val_acc'], label='val acc')
plt.legend()
plt.show()
plt.savefig('AccVal_acc')
import tensorflow as tf
from keras.models import load_model
model.save('defect_features_new_model.h5')
It feels like the images are too simple and do not contain any complicated data. Hence they tend to overfit the VGG16 model. VGG16 models are generally trained on much more complicated data. You can try defining your own convolutional neural network. Since you are already using keras you can create sequential model to define your own model with minimum as 2 or 3 convolutional layersPlease refer the link for keras sequential model
Your data for the network looks very simple so it is not unexpected that you achieve high accuracy. Remember the network trains in batches so for every 32 images it goes through a back propagation cycle and updates the weights accordingly. So if you have a lot of images which you say you do then you are executing a lot of iterations on the weights.
I do not see a problem here. You are not over training in the sense that your validation accuracy is 100%. Certainly you could get good results with a simpler model but why bother your results are what you would want.

Tuning number of hidden layers in Keras

I'm just trying to explore keras and tensorflow with the famous MNIST dataset.
I already applied some basic neural networks, but when it comes to tuning some hyperparameters, especially the number of layers, thanks to the sklearn wrapper GridSearchCV, I get the error below:
Parameter values for parameter (hidden_layers) need to be a sequence(but not a string) or np.ndarray.
So you can have a better view I post the main parts of my code.
Data preparation
# Extract label
X_train=train.drop(labels = ["label"],axis = 1,inplace=False)
Y_train=train['label']
del train
# Reshape to fit MLP
X_train = X_train.values.reshape(X_train.shape[0],784).astype('float32')
X_train = X_train / 255
# Label format
from keras.utils import np_utils
Y_train = keras.utils.to_categorical(Y_train, num_classes = 10)
num_classes = Y_train.shape[1]
Keras part
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
# Function with hyperparameters to optimize
def create_model(optimizer='adam', activation = 'sigmoid', hidden_layers=2):
# Initialize the constructor
model = Sequential()
# Add an input layer
model.add(Dense(32, activation=activation, input_shape=784))
for i in range(hidden_layers):
# Add one hidden layer
model.add(Dense(16, activation=activation))
# Add an output layer
model.add(Dense(num_classes, activation='softmax'))
#compile model
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=
['accuracy'])
return model
# Model which will be the input for the GridSearchCV function
modelCV = KerasClassifier(build_fn=create_model, verbose=0)
GridSearchCV
from keras.activations import relu, sigmoid
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.layers import Dropout
from keras.utils import np_utils
activations = [sigmoid, relu]
param_grid = dict(hidden_layers=3,activation=activations, batch_size = [256], epochs=[30])
grid = GridSearchCV(estimator=modelCV, param_grid=param_grid, scoring='accuracy')
grid_result = grid.fit(X_train, Y_train)
I just want to let you know that the same kind of question has already been asked here Grid Search the number of hidden layers with keras but the answer is not complete at all and I can't add a comment to reply to the answerer.
Thank you!
You should add:
for i in range(int(hidden_layers)):
# Add one hidden layer
model.add(Dense(16, activation=activation))
Try to add the values of param_grid as lists :
params_grid={"hidden_layers": [3]}
When you are setting your parameter hidden layer =2 it goes as a string thus an error it throw.
Ideally it should a sequence to run the code that's what you error says

Convolutional Neural Net-Keras-val_acc Keyerror 'acc'

I am trying to implement CNN by Theano. I used Keras library. My data set is 55 alphabet images, 28x28.
In the last part I get this error:
train_acc=hist.history['acc']
KeyError: 'acc'
Any help would be much appreciated. Thanks.
This is part of my code:
from keras.models import Sequential
from keras.models import Model
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.optimizers import SGD, RMSprop, adam
from keras.utils import np_utils
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from urllib.request import urlretrieve
import pickle
import os
import gzip
import numpy as np
import theano
import lasagne
from lasagne import layers
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet
from nolearn.lasagne import visualize
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from PIL import Image
import PIL.Image
#from Image import *
import webbrowser
from numpy import *
from sklearn.utils import shuffle
from sklearn.cross_validation import train_test_split
from tkinter import *
from tkinter.ttk import *
import tkinter
from keras import backend as K
K.set_image_dim_ordering('th')
%%%%%%%%%%
batch_size = 10
# number of output classes
nb_classes = 6
# number of epochs to train
nb_epoch = 5
# input iag dimensions
img_rows, img_clos = 28,28
# number of channels
img_channels = 3
# number of convolutional filters to use
nb_filters = 32
# number of convolutional filters to use
nb_pool = 2
# convolution kernel size
nb_conv = 3
%%%%%%%%
model = Sequential()
model.add(Convolution2D(nb_filters, nb_conv, nb_conv,
border_mode='valid',
input_shape=(1, img_rows, img_clos)))
convout1 = Activation('relu')
model.add(convout1)
model.add(Convolution2D(nb_filters, nb_conv, nb_conv))
convout2 = Activation('relu')
model.add(convout2)
model.add(MaxPooling2D(pool_size=(nb_pool, nb_pool)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adadelta')
%%%%%%%%%%%%
hist = model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
show_accuracy=True, verbose=1, validation_data=(X_test, Y_test))
hist = model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
show_accuracy=True, verbose=1, validation_split=0.2)
%%%%%%%%%%%%%%
train_loss=hist.history['loss']
val_loss=hist.history['val_loss']
train_acc=hist.history['acc']
val_acc=hist.history['val_acc']
xc=range(nb_epoch)
#xc=range(on_epoch_end)
plt.figure(1,figsize=(7,5))
plt.plot(xc,train_loss)
plt.plot(xc,val_loss)
plt.xlabel('num of Epochs')
plt.ylabel('loss')
plt.title('train_loss vs val_loss')
plt.grid(True)
plt.legend(['train','val'])
print (plt.style.available) # use bmh, classic,ggplot for big pictures
plt.style.use(['classic'])
plt.figure(2,figsize=(7,5))
plt.plot(xc,train_acc)
plt.plot(xc,val_acc)
plt.xlabel('num of Epochs')
plt.ylabel('accuracy')
plt.title('train_acc vs val_acc')
plt.grid(True)
plt.legend(['train','val'],loc=4)
#print plt.style.available # use bmh, classic,ggplot for big pictures
plt.style.use(['classic'])
In a not-so-common case (as I expected after some tensorflow updates), despite choosing metrics=["accuracy"] in the model definitions, I still got the same error.
The solution was: replacing metrics=["acc"] with metrics=["accuracy"] everywhere. In my case, I was unable to plot the parameters of the history of my training. I had to replace
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
to
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
Your log variable will be consistent with the metrics when you compile your model.
For example, the following code
model.compile(loss="mean_squared_error", optimizer=optimizer)
model.fit_generator(gen,epochs=50,callbacks=ModelCheckpoint("model_{acc}.hdf5")])
will gives a KeyError: 'acc' because you didn't set metrics=["accuracy"] in model.compile.
This error also happens when metrics are not matched. For example
model.compile(loss="mean_squared_error",optimizer=optimizer, metrics="binary_accuracy"])
model.fit_generator(gen,epochs=50,callbacks=ModelCheckpoint("model_{acc}.hdf5")])
still gives a KeyError: 'acc' because you set a binary_accuracy metric but asking for accuracy later.
If you change the above code to
model.compile(loss="mean_squared_error",optimizer=optimizer, metrics="binary_accuracy"])
model.fit_generator(gen,epochs=50,callbacks=ModelCheckpoint("model_{binary_accuracy}.hdf5")])
it will work.
You can use print(history.history.keys()) to find out what metrics you have and what they are called. In my case also, it was called "accuracy", not "acc"
In my case switching from
metrics=["accuracy"]
to
metrics=["acc"]
was the solution.
from keras source :
warnings.warn('The "show_accuracy" argument is deprecated, '
'instead you should pass the "accuracy" metric to '
'the model at compile time:\n'
'`model.compile(optimizer, loss, '
'metrics=["accuracy"])`')
The right way to get the accuracy is indeed to compile your model like this:
model.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=["accuracy"])
does it work?
Make sure to check this "breaking change":
Metrics and losses are now reported under the exact name specified by the user (e.g. if you pass metrics=['acc'], your metric will be reported under the string "acc", not "accuracy", and inversely metrics=['accuracy'] will be reported under the string "accuracy".
If you are using Tensorflow 2.3 then you can specify like this
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
loss=tf.keras.losses.CategoricalCrossentropy(), metrics=[tf.keras.metrics.CategoricalAccuracy(name="acc")])
In the New version of TensorFlow, some things have changed so we have to replace it with :
acc = history.history['accuracy']
print(history.history.keys())
output--
dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
so you need to change "acc" to "accuracy" and "val_acc" to "val_accuracy"
For Practice
3.5-classifying-movie-reviews.ipynb
Change
acc = history.history['acc']
val_acc = history.history['val_acc']
To
acc = history.history['binary_accuracy']
val_acc = history.history['val_binary_accuracy']
&
Change
acc_values = history_dict['acc']
val_acc_values = history_dict['val_acc']
To
acc_values = history_dict['binary_accuracy']
val_acc_values = history_dict['val_binary_accuracy']
================
Practice
3.6-classifying-newswires.ipynb
Change
acc = history.history['acc']
val_acc = history.history['val_acc']
To
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

Categories

Resources