confusion_matrix() library is giving ValueError

When I try to compute a confusion matrix for a ConvNet, I keep getting the same error.
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras import backend as K
import numpy as np
from keras.preprocessing import image
from sklearn.metrics import classification_report, confusion_matrix
img_width, img_height = 150, 150
train_data_dir = "train"
validation_data_dir = "test"
nb_train_samples = 2000
nb_validation_samples = 400
epochs = 50
batch_size = 40 #16
if K.image_data_format() == 'channels_first':
    input_shape = (3, img_width, img_height)
else:
    input_shape = (img_width, img_height, 3)
train_datagen = ImageDataGenerator(
    rescale = 1. / 255,
    shear_range = 0.2,
    zoom_range= 0.2,
    horizontal_flip= True)
test_datagen = ImageDataGenerator(rescale=1. / 255)
train_generator = test_datagen.flow_from_directory(
    train_data_dir,
    target_size= (img_width, img_height),
    batch_size= batch_size,
    class_mode= 'binary')
validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size= (img_width, img_height),
    batch_size= batch_size,
    class_mode= 'binary')
Applying CNN Layers ...
model.compile(loss= 'binary_crossentropy',
              optimizer= 'rmsprop',
              metrics= ['accuracy'] )
model.fit_generator(
    train_generator,
    steps_per_epoch= nb_train_samples // batch_size,
    epochs= epochs,
    validation_data= validation_generator,
    validation_steps= nb_validation_samples // batch_size)
Y_pred = model.predict_generator(validation_generator, nb_validation_samples // batch_size+1)
y_pred = np.argmax(Y_pred, axis=1)
print('Confusion Matrix')
print(confusion_matrix(validation_generator.classes, y_pred))
I get the error below and don't know how to resolve it:
ValueError: Found input variables with inconsistent numbers of samples: [400, 440]

I was able to reproduce your error using the Dogs_Vs_Cats dataset, with 2000 samples in the train directory and 400 samples in the validation directory.
The extra step is the cause: 400 // 40 + 1 = 11 steps of 40 images yields 440 predictions, while validation_generator.classes holds only 400 labels. Change the model.predict_generator call from
Y_pred = model.predict_generator(validation_generator, nb_validation_samples // batch_size+1)
to
Y_pred = model.predict_generator(validation_generator, nb_validation_samples // batch_size)
This resolves the ValueError: Found input variables with inconsistent numbers of samples: [400, 440]
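The step arithmetic can be checked directly. This is a minimal sketch using only the numbers from the question (400 validation images, batch size 40); the variable names are illustrative, and it assumes the generator wraps around to its first batch when asked for more steps than one pass contains, which is consistent with the 440 in the error message:

```python
import math

nb_validation_samples = 400
batch_size = 40

# Steps requested in the question: one more than a full pass.
steps_in_question = nb_validation_samples // batch_size + 1
# Steps that cover every sample exactly once, including a partial last batch.
steps_needed = math.ceil(nb_validation_samples / batch_size)

# The extra step re-reads one full batch, so 11 * 40 predictions come back.
predictions_returned = steps_in_question * batch_size

print(steps_in_question, steps_needed, predictions_returned)
```

Using math.ceil rather than floor division should also cover sample counts that are not a multiple of the batch size, where floor division would skip the last partial batch.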
The complete code is below:
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras import backend as K
import numpy as np
from keras.preprocessing import image
from sklearn.metrics import classification_report, confusion_matrix
from google.colab import drive
drive.mount('/content/drive')
train_data_dir = '/content/drive/My Drive/Dogs_Vs_Cats/train'
validation_data_dir = '/content/drive/My Drive/Dogs_Vs_Cats/validation'
img_width, img_height = 150, 150
nb_train_samples = 2000
nb_validation_samples = 400
epochs = 10
batch_size = 40 #16
if K.image_data_format() == 'channels_first':
    input_shape = (3, img_width, img_height)
else:
    input_shape = (img_width, img_height, 3)
train_datagen = ImageDataGenerator(
    rescale = 1. / 255,
    shear_range = 0.2,
    zoom_range= 0.2,
    horizontal_flip= True)
test_datagen = ImageDataGenerator(rescale=1. / 255)
train_generator = test_datagen.flow_from_directory(
    train_data_dir,
    target_size= (img_width, img_height),
    batch_size= batch_size,
    class_mode= 'binary')
validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size= (img_width, img_height),
    batch_size= batch_size,
    class_mode= 'binary')
model = Sequential()
model.add(Conv2D(32, (3, 3), strides = (1, 1), input_shape = input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), strides = (1, 1)))
model.add(Activation('relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.summary()
model.compile(loss = 'binary_crossentropy',
              optimizer = 'rmsprop',
              metrics = ['accuracy'])
model.fit_generator(
    train_generator,
    steps_per_epoch= nb_train_samples // batch_size,
    epochs= epochs,
    validation_data= validation_generator,
    validation_steps= nb_validation_samples // batch_size)
Y_pred = model.predict_generator(validation_generator, nb_validation_samples // batch_size)
y_pred = np.argmax(Y_pred, axis=1)
print('Confusion Matrix')
print(confusion_matrix(validation_generator.classes, y_pred))
Output:
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Found 2000 images belonging to 2 classes.
Found 400 images belonging to 2 classes.
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_9 (Conv2D) (None, 148, 148, 32) 896
_________________________________________________________________
activation_9 (Activation) (None, 148, 148, 32) 0
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 74, 74, 32) 0
_________________________________________________________________
conv2d_10 (Conv2D) (None, 72, 72, 64) 18496
_________________________________________________________________
activation_10 (Activation) (None, 72, 72, 64) 0
_________________________________________________________________
max_pooling2d_10 (MaxPooling (None, 36, 36, 64) 0
_________________________________________________________________
flatten_5 (Flatten) (None, 82944) 0
_________________________________________________________________
dense_9 (Dense) (None, 64) 5308480
_________________________________________________________________
dropout_5 (Dropout) (None, 64) 0
_________________________________________________________________
dense_10 (Dense) (None, 1) 65
=================================================================
Total params: 5,327,937
Trainable params: 5,327,937
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
50/50 [==============================] - 12s 233ms/step - loss: 0.9345 - accuracy: 0.5375 - val_loss: 0.6303 - val_accuracy: 0.5225
Epoch 2/10
50/50 [==============================] - 11s 226ms/step - loss: 0.6745 - accuracy: 0.5965 - val_loss: 0.6094 - val_accuracy: 0.6725
Epoch 3/10
50/50 [==============================] - 11s 223ms/step - loss: 0.6196 - accuracy: 0.6605 - val_loss: 0.5694 - val_accuracy: 0.7150
Epoch 4/10
50/50 [==============================] - 11s 223ms/step - loss: 0.5501 - accuracy: 0.7285 - val_loss: 0.6216 - val_accuracy: 0.7225
Epoch 5/10
50/50 [==============================] - 11s 221ms/step - loss: 0.4794 - accuracy: 0.7790 - val_loss: 0.6268 - val_accuracy: 0.6025
Epoch 6/10
50/50 [==============================] - 11s 226ms/step - loss: 0.4038 - accuracy: 0.8195 - val_loss: 0.4842 - val_accuracy: 0.6975
Epoch 7/10
50/50 [==============================] - 11s 222ms/step - loss: 0.3207 - accuracy: 0.8595 - val_loss: 0.5600 - val_accuracy: 0.7325
Epoch 8/10
50/50 [==============================] - 13s 257ms/step - loss: 0.2574 - accuracy: 0.8920 - val_loss: 0.9705 - val_accuracy: 0.7525
Epoch 9/10
50/50 [==============================] - 13s 252ms/step - loss: 0.2049 - accuracy: 0.9235 - val_loss: 0.7311 - val_accuracy: 0.7475
Epoch 10/10
50/50 [==============================] - 13s 251ms/step - loss: 0.1448 - accuracy: 0.9515 - val_loss: 1.0541 - val_accuracy: 0.7150
Confusion Matrix
[[200 0]
[200 0]]
Hope this answers your question. If not, please share the complete traceback and code so I can help debug.
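One caveat about the confusion matrix shown above: every sample is predicted as class 0. The model ends in a single sigmoid unit, so Y_pred has shape (N, 1), and np.argmax over axis 1 of a single column is always 0. A minimal NumPy sketch with made-up probabilities (the values are illustrative, not taken from the run above) comparing argmax with a 0.5 threshold:

```python
import numpy as np

# Hypothetical sigmoid outputs for four images, shape (4, 1).
Y_pred = np.array([[0.1], [0.8], [0.4], [0.95]])

# argmax over one column can only ever return index 0.
argmax_labels = np.argmax(Y_pred, axis=1)

# Thresholding the probability yields the intended 0/1 labels.
threshold_labels = (Y_pred.ravel() > 0.5).astype(int)

print(argmax_labels)
print(threshold_labels)
```

Separately, flow_from_directory shuffles by default, so the rows of Y_pred only line up with validation_generator.classes if the validation generator is created with shuffle=False.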

Related

Convert TensorFlow model to PyTorch model - model isn't learning

I'm trying to port a TensorFlow neural network to PyTorch, as an exercise to familiarize myself with both and their nuances. This is the TensorFlow network I'm porting to PyTorch:
import pandas as pd
import tensorflow as tf
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation
from tensorflow.keras.layers import Embedding
from tensorflow.keras.layers import Conv1D, GlobalMaxPooling1D
from tensorflow.keras.datasets import imdb
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=5000)
x_train = sequence.pad_sequences(x_train, maxlen=400, padding="post")
x_test = sequence.pad_sequences(x_test, maxlen=400, padding="post")
model = Sequential()
model.add(Embedding(5000, 50, input_length=400))
model.add(Dropout(0.2))
model.add(Conv1D(250, 3, padding='valid',activation='relu',strides=1))
model.add(GlobalMaxPooling1D())
model.add(Dense(250))
model.add(Dropout(0.2))
model.add(Activation('relu'))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
h2 = model.fit(x_train, y_train, batch_size=32, epochs=10, validation_data=(x_test, y_test))
The shape of each layer is shown below:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding (Embedding) (None, 400, 50) 250000
dropout (Dropout) (None, 400, 50) 0
conv1d (Conv1D) (None, 398, 250) 37750
global_max_pooling1d (Globa (None, 250) 0
lMaxPooling1D)
dense (Dense) (None, 250) 62750
dropout_1 (Dropout) (None, 250) 0
activation (Activation) (None, 250) 0
dense_1 (Dense) (None, 1) 251
activation_1 (Activation) (None, 1) 0
=================================================================
Total params: 350,751
Trainable params: 350,751
Non-trainable params: 0
And the output of the tensorflow model is:
Epoch 1/10
loss: 0.4043 - accuracy: 0.8021 - val_loss: 0.2764 - val_accuracy: 0.8854
Epoch 2/10
loss: 0.2332 - accuracy: 0.9052 - val_loss: 0.2690 - val_accuracy: 0.8888
Epoch 3/10
loss: 0.1598 - accuracy: 0.9389 - val_loss: 0.2948 - val_accuracy: 0.8832
Epoch 4/10
loss: 0.1112 - accuracy: 0.9600 - val_loss: 0.3015 - val_accuracy: 0.8906
Epoch 5/10
loss: 0.0810 - accuracy: 0.9700 - val_loss: 0.3057 - val_accuracy: 0.8868
Epoch 6/10
loss: 0.0537 - accuracy: 0.9811 - val_loss: 0.4055 - val_accuracy: 0.8868
Epoch 7/10
loss: 0.0408 - accuracy: 0.9860 - val_loss: 0.4083 - val_accuracy: 0.8852
Epoch 8/10
loss: 0.0411 - accuracy: 0.9845 - val_loss: 0.4789 - val_accuracy: 0.8789
Epoch 9/10
loss: 0.0380 - accuracy: 0.9862 - val_loss: 0.4828 - val_accuracy: 0.8827
Epoch 10/10
loss: 0.0329 - accuracy: 0.9879 - val_loss: 0.4999 - val_accuracy: 0.8825
Here's what I have in my PyTorch port:
from torch.utils.data import DataLoader
from torch.utils.data import Dataset
import torch
from tqdm import tqdm
import torch.nn.functional as F
from sklearn.metrics import accuracy_score
class CustomDataset(Dataset):
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __len__(self):
        return len(self.y)
    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]
train_dataloader = DataLoader(CustomDataset(torch.Tensor(x_train), torch.Tensor(y_train)), batch_size=32, shuffle=True)
test_dataloader = DataLoader(CustomDataset(torch.Tensor(x_test), torch.Tensor(y_test)), batch_size=32, shuffle=True)
class MyModel(torch.nn.Module):
    def __init__(self, vocab_size=5000, input_len=400, embedding_dims=50, kernel_size=3, filters=250, hidden_dims=250):
        super(MyModel, self).__init__()
        self.embedding_dims = embedding_dims
        self.input_len = input_len
        self.embedding = torch.nn.Embedding(num_embeddings=vocab_size, embedding_dim=embedding_dims)
        self.dropout1 = torch.nn.Dropout(p=0.2)
        self.conv1d = torch.nn.Conv1d(in_channels=embedding_dims, out_channels=filters, kernel_size=kernel_size, padding=(0,), stride=1)
        self.pool = torch.nn.AdaptiveMaxPool1d(1)
        self.linear1 = torch.nn.Linear(in_features=hidden_dims, out_features=hidden_dims)
        self.dropout2 = torch.nn.Dropout(p=0.2)
        self.activation = torch.nn.ReLU()
        self.output = torch.nn.Linear(in_features=hidden_dims, out_features=1)
        self.activation2 = torch.nn.Sigmoid()
    def forward(self, x):
        x = self.dropout1(self.embedding(x.type(torch.LongTensor)))
        x = self.conv1d(x.view(-1, self.embedding_dims, self.input_len))
        x = self.pool(x)
        x = self.activation(self.dropout2(self.linear1(x.view(-1,x.size()[1]))))
        x = self.activation2(self.output(x))
        return x
class FitTorchModel():
    def __init__(self, model, num_epochs=10, steps_per_epoch=782):
        self.model = model
        self.epochs = num_epochs
        self.steps_per_epoch = steps_per_epoch
    def fit(self, train_dataloader, test_dataloader):
        opt = torch.optim.Adam(self.model.parameters(), lr=0.001)
        crit = torch.nn.BCELoss(reduction = "mean")
        history_df = pd.DataFrame(columns = ["Loss", "Accuracy", "Val_Loss", "Val_Acc"])
        for epoch in range(self.epochs):
            self.model.train()
            print(f"Epoch {epoch}")
            epoch_loss = 0
            epoch_acc = 0
            it = iter(train_dataloader)
            for step in tqdm(range(self.steps_per_epoch)):
                opt.zero_grad()
                x, y = next(it)
                y_pred = self.model(x).view(-1)
                loss = crit(y_pred, y)
                epoch_loss += loss.item()
                epoch_acc += accuracy_score(y==1, y_pred > 0.5)
                loss.backward()
                opt.step()
            val_loss, val_acc = self.predict_proba(test_dataloader, crit)
            df = pd.DataFrame({"Loss": epoch_loss/(step+1),
                               "Accuracy": epoch_acc/(step+1),
                               "Val_Loss": val_loss, "Val_Acc": val_acc}, index=[0])
            history_df = pd.concat((history_df, df), ignore_index=True)
        return history_df
    def predict_proba(self, test_dataloader, crit):
        self.model.eval()
        val_loss = 0
        val_acc = 0
        it = iter(test_dataloader)
        with torch.no_grad():
            for step in tqdm(range(self.steps_per_epoch)):
                x,y = next(it)
                y_pred = self.model(x).view(-1)
                batch_loss = crit(y_pred, y)
                val_loss += batch_loss.item()
                val_acc += accuracy_score(y==1, y_pred > 0.5)
        return val_loss/(step+1), val_acc/(step+1)
ftm = FitTorchModel(model=MyModel(), num_epochs=10, steps_per_epoch=782)
history_df = ftm.fit(train_dataloader, test_dataloader)
The shape of each layer is:
After embedding layer: torch.Size([32, 400, 50])
After dropout1 layer: torch.Size([32, 400, 50])
After convolution1d layer: torch.Size([32, 250, 398])
After maxpooling layer: torch.Size([32, 250, 1])
After linear1 layer: torch.Size([32, 250])
After dropout2 layer: torch.Size([32, 250])
After activation layer: torch.Size([32, 250])
After output layer: torch.Size([32, 1])
After activation2 layer: torch.Size([32, 1])
The output of the pytorch model training is:
Loss Accuracy Val_Loss Val_Acc
0 0.697899 0.505874 0.692495 0.511629
1 0.693063 0.503477 0.693186 0.503637
2 0.693190 0.496044 0.693149 0.499201
3 0.693181 0.501359 0.693082 0.502038
4 0.693169 0.503237 0.693234 0.495964
5 0.693177 0.500240 0.693154 0.500679
6 0.693069 0.507473 0.693258 0.498881
7 0.693948 0.500320 0.693145 0.501598
8 0.693196 0.499640 0.693164 0.496324
9 0.693170 0.500759 0.693140 0.501918
Couple things: the accuracy hovers around guessing (this is a binary classification task), no matter how many epochs have passed. Secondly, the training loss barely improves. I set the learning rate to the default learning rate described by tensorflow's Adam Optimizer docs. What else am I missing here? I had some trouble with the input / output dimensions for the various layers - did I mess those up at all?

Some observations:
Use BCEWithLogitsLoss as loss on the output of the last linear layer, before the sigmoid. This includes the sigmoid activation in a more numerically stable fashion.
The TensorFlow model has a ReLU after the convolution; the PyTorch implementation does not.
In general, for debugging, you might want to look at weight.grad of some of your weights after loss.backward() to see whether gradients are being calculated. Printing the value of one of the weights in each iteration, to check that the optimizer actually changes the weights, can also help.
Also, it can depend on the input data:
(Are you sure that x_test is scaled correctly?)
If you are transforming your inputs to Long before embedding them and all x_test, for example, are floats between 0 and 1, they will all be converted to 0! And the network will have a hard time predicting the labels from all zeros as constant input!
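To illustrate the first observation about BCEWithLogitsLoss: applying a sigmoid and then taking a logarithm can fail once the sigmoid saturates, while working directly on the logit does not. This is a plain-Python sketch of the idea, not PyTorch's actual implementation; the function names are made up for this example, and the stable form uses the standard identity max(x, 0) - x*y + log(1 + exp(-|x|)):

```python
import math

def naive_bce(logit, target):
    # Sigmoid first, then log: breaks when the probability rounds to 0 or 1.
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

def stable_bce_with_logits(logit, target):
    # Algebraically the same loss, computed without forming the probability.
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

# For moderate logits the two versions agree.
moderate_gap = abs(naive_bce(2.0, 1.0) - stable_bce_with_logits(2.0, 1.0))

# For a confident but wrong prediction the naive version hits log(0).
try:
    naive_bce(40.0, 0.0)
    naive_failed = False
except ValueError:
    naive_failed = True

stable_value = stable_bce_with_logits(40.0, 0.0)
print(moderate_gap, naive_failed, stable_value)
```

With a loss of this kind, the model would return the raw output of the last linear layer during training and apply the sigmoid only when probabilities are needed.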
But now to the actual issue in this particular case:
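Be careful with .view! It might not do what you expect. It just reshapes the tensor but does not move the data around. The embedding output has shape (batch, 400, 50), while Conv1d expects (batch, channels=50, length=400); .view(-1, 50, 400) reinterprets the same memory layout, so token positions and embedding dimensions get mixed together.
What you really want is .moveaxis(-1, -2) instead, which swaps the last two axes.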
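The difference is easy to see on a tiny array. The sketch below uses NumPy, whose reshape and moveaxis behave like the tensor's .view and .moveaxis for this purpose, with a made-up batch of 1 sequence, 3 tokens, and 2 embedding dimensions:

```python
import numpy as np

# One sequence of 3 tokens, each with a 2-dimensional embedding.
# Token t has embedding [10*t, 10*t + 1], so values are easy to recognise.
x = np.array([[[0, 1], [10, 11], [20, 21]]])   # shape (1, 3, 2)

# Reinterpreting the buffer as (1, 2, 3) mixes tokens and channels.
reshaped = x.reshape(-1, 2, 3)

# Moving the axis keeps each channel's values together across tokens.
moved = np.moveaxis(x, -1, -2)

print(reshaped[0].tolist())
print(moved[0].tolist())
```

Channel 0 should hold the first embedding dimension of every token, that is [0, 10, 20]; only the moveaxis result does.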
Loss Accuracy Val_Loss Val_Acc
0 0.573489 0.671715 0.402601 0.819413
1 0.376908 0.830163 0.33786 0.850783
2 0.308343 0.868646 0.296171 0.872323
3 0.258806 0.893342 0.319121 0.865849
4 0.227044 0.907649 0.3172 0.868326
5 0.202789 0.918478 0.281184 0.886549
6 0.179744 0.928549 0.291027 0.886589
7 0.161205 0.93702 0.329196 0.879156
8 0.145447 0.944094 0.294914 0.889746
9 0.133034 0.949568 0.291476 0.889826
These are the results after adding the ReLU after the convolution and, more importantly, fixing the view. The corrected model:
class MyModel(torch.nn.Module):
    def __init__(self, vocab_size=5000, input_len=400, embedding_dims=50, kernel_size=3, filters=250, hidden_dims=250):
        super(MyModel, self).__init__()
        self.embedding_dims = embedding_dims
        self.input_len = input_len
        self.embedding = torch.nn.Embedding(num_embeddings=vocab_size, embedding_dim=embedding_dims)
        self.dropout1 = torch.nn.Dropout(p=0.2)
        self.conv1d = torch.nn.Conv1d(in_channels=embedding_dims, out_channels=filters, kernel_size=kernel_size, padding=(0,), stride=1)
        self.pool = torch.nn.AdaptiveMaxPool1d(1)
        self.linear1 = torch.nn.Linear(in_features=hidden_dims, out_features=hidden_dims)
        self.dropout2 = torch.nn.Dropout(p=0.2)
        self.activation = torch.nn.ReLU()
        self.output = torch.nn.Linear(in_features=hidden_dims, out_features=1)
        self.activation2 = torch.nn.Sigmoid()
    def forward(self, x):
        x = self.dropout1(self.embedding(x.type(torch.LongTensor)))
        x = self.activation(self.conv1d(x.moveaxis(-1,-2)))
        x = self.pool(x).squeeze(-1)
        x = self.activation(self.dropout2(self.linear1(x)))
        x = self.activation2(self.output(x))
        return x

What is the tinymodel that you initialize opt with in the fit function?
opt = torch.optim.Adam(tinymodel.parameters(), lr=0.001)
It seems your optimizer is not working on the right model (see this answer on the relation between the optimizer and the parameters of the model).
You need to replace this line in the fit function:
def fit(self, train_dataloader, test_dataloader):
    opt = torch.optim.Adam(self.model.parameters(), lr=0.001)
    # ...
Additionally, you are using a Dropout layer, which behaves differently during training and evaluation.
You should call self.model.train() and self.model.eval() at the beginning of your fit and predict_proba functions, respectively.

How can I continue training from the last epoch?

I saved the training history with:
history = model.fit(train_generator, epochs=epochs, steps_per_epoch=train_steps,
                    verbose=1, callbacks=callbacks, validation_data=val_generator,
                    validation_steps=val_steps,batch_size=16)
with open('history_epochs.pkl', 'wb') as f:
    dump(history.history, f)
Can I use the history file to continue from the last epoch, and if so, how?

The steps below apply to any deep learning library:
Build model.
Train model.
Save model (should be saving parameters/weights as well).
Load model from the saved file (any time any where).
Continue with more training.
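Note that a pickled history.history is only a dictionary of per-epoch metrics. It holds no weights, so on its own it can tell you how many epochs were run but cannot restore the model. A small sketch with a made-up history dictionary standing in for the real file (the metric values and the in-memory round trip are placeholders):

```python
import pickle

# Stand-in for history.history after three epochs (values are made up).
history = {"loss": [0.9, 0.6, 0.5], "accuracy": [0.55, 0.70, 0.78]}

# Round-trip through pickle, as the question does with history_epochs.pkl.
restored = pickle.loads(pickle.dumps(history))

# The number of logged epochs is all the file can contribute to resuming.
completed_epochs = len(restored["loss"])
print(completed_epochs)
```

With a model reloaded from a saved model or checkpoint file, that count could be passed as initial_epoch to Keras's model.fit so that epoch numbering continues where it stopped; in that case epochs is the index of the final epoch, not the number of additional epochs.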

You can use the pickle file to save and load your model and continue training:
Create your model
Train your model
Save your model as a pickle file
Code for the above steps:
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
import joblib
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat','Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
fig, axes = plt.subplots(2,5,figsize=(15,6))
for idx, axe in enumerate(axes.flatten()):
    axe.axis('off')
    idx_img = np.argwhere(y_train==idx)[0][0]
    axe.imshow(X_train[idx_img], cmap=plt.cm.binary)
    axe.set_title(class_names[y_train[idx_img]])
X_train = X_train.astype('float32') / 255.0
X_train = tf.expand_dims(X_train, axis=-1)
X_test = X_test.astype('float32') / 255.0
X_test = tf.expand_dims(X_test, axis=-1)
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)
model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(X_train.shape[1], X_train.shape[1], 1)))
model.add(tf.keras.layers.Conv2D(128, (3,3), activation='relu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Conv2D(64, (3,3), activation='relu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Conv2D(128, (3,3), activation='relu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(512, activation='relu'))
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Dense(10, activation='sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
model.fit(X_train, y_train, batch_size=256, epochs=3, verbose=1, validation_split=.2)
model.evaluate(X_test, y_test, verbose=1)
joblib.dump(model, 'model.pkl')
Output:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 26, 26, 128) 1280
batch_normalization (BatchN (None, 26, 26, 128) 512
ormalization)
dropout (Dropout) (None, 26, 26, 128) 0
conv2d_1 (Conv2D) (None, 24, 24, 64) 73792
batch_normalization_1 (Batc (None, 24, 24, 64) 256
hNormalization)
dropout_1 (Dropout) (None, 24, 24, 64) 0
conv2d_2 (Conv2D) (None, 22, 22, 128) 73856
batch_normalization_2 (Batc (None, 22, 22, 128) 512
hNormalization)
dropout_2 (Dropout) (None, 22, 22, 128) 0
flatten (Flatten) (None, 61952) 0
dense (Dense) (None, 512) 31719936
dropout_3 (Dropout) (None, 512) 0
dense_1 (Dense) (None, 128) 65664
dropout_4 (Dropout) (None, 128) 0
dense_2 (Dense) (None, 10) 1290
=================================================================
Total params: 31,937,098
Trainable params: 31,936,458
Non-trainable params: 640
_________________________________________________________________
Epoch 1/3
188/188 [==============================] - 19s 81ms/step - loss: 0.8264 - accuracy: 0.7398 - val_loss: 3.4644 - val_accuracy: 0.1245
Epoch 2/3
188/188 [==============================] - 14s 75ms/step - loss: 0.4896 - accuracy: 0.8283 - val_loss: 1.2240 - val_accuracy: 0.5802
Epoch 3/3
188/188 [==============================] - 14s 77ms/step - loss: 0.4055 - accuracy: 0.8544 - val_loss: 0.3711 - val_accuracy: 0.8675
313/313 [==============================] - 2s 5ms/step - loss: 0.3850 - accuracy: 0.8591
[0.3849639296531677, 0.8590999841690063]
INFO:tensorflow:Assets written to: ram://****/assets
['model.pkl']
Load your model
Continue Training
Code for the above steps:
model = joblib.load("/content/model.pkl")
model.fit(X_train, y_train, batch_size=256, epochs=2, verbose=1, validation_split=.2)
model.evaluate(X_test, y_test, verbose=1)
Output:
Epoch 1/2
188/188 [==============================] - 17s 84ms/step - loss: 0.4414 - accuracy: 0.8496 - val_loss: 0.3449 - val_accuracy: 0.8697
Epoch 2/2
188/188 [==============================] - 15s 82ms/step - loss: 0.3704 - accuracy: 0.8708 - val_loss: 0.2884 - val_accuracy: 0.8965
313/313 [==============================] - 1s 5ms/step - loss: 0.3114 - accuracy: 0.8938
[0.31136029958724976, 0.8938000202178955]

Loss is NaN using activation softmax and loss function categorical_crossentropy

I'm trying to make this model work. Initially x.shape is (6703, 56) and y.shape is a binary column having shape (6703, ). Then I run
y = y.to_numpy()
y = y.astype("float32")
y = tf.keras.utils.to_categorical(y, 2)
and now y.shape is (6703, 2). I run
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.2, random_state=42)
and now
X_train shape is (5362, 56)
Y_train shape is (5362, 2)
X_test shape is (1341, 56)
Y_test shape is (1341, 2)
Then I build the model:
model = tf.keras.models.Sequential(name="3layers")
model.add(keras.layers.Dense(N_HIDDEN,
input_shape=(len(X_train[0]),),
name="dense1",
activation="relu"))
model.add(keras.layers.Dropout(DROPOUT))
model.add(keras.layers.Dense(N_HIDDEN,
name="dense2",
activation="relu"))
model.add(keras.layers.Dropout(DROPOUT))
model.add(keras.layers.Dense(NB_CLASSES,
name="dense3",
activation="softmax"))
model.summary()
model.compile(optimizer="SGD", #SGD adam
loss="categorical_crossentropy",
metrics=["accuracy"])
model.fit(X_train, Y_train,
batch_size=BATCH_SIZE,
epochs=EPOCHS,
verbose=VERBOSE,
validation_split=VALIDATION_SPLIT)
test_loss, test_acc = model.evaluate(X_test, Y_test)
The summary is what I expect:
dense1 (Dense) (None, 64) 3648
dropout_18 (Dropout) (None, 64) 0
dense2 (Dense) (None, 64) 4160
dropout_19 (Dropout) (None, 64) 0
dense3 (Dense) (None, 2) 130
but the output is
Epoch 1/5
429/429 [==============================] - 1s 1ms/step - loss: nan - accuracy: 0.5141 - val_loss: nan - val_accuracy: 0.4884
Epoch 2/5
429/429 [==============================] - 0s 1ms/step - loss: nan - accuracy: 0.5143 - val_loss: nan - val_accuracy: 0.4884
Epoch 3/5
429/429 [==============================] - 0s 987us/step - loss: nan - accuracy: 0.5143 - val_loss: nan - val_accuracy: 0.4884
I've tried changing many parameters, I'm stuck.

I found what it was. There were some "None" values in the x matrix that caused the problem. After removing them, the model started producing a numeric loss. The accuracy is very poor, but that is a separate problem to solve.
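A quick way to check for this before training, sketched with NumPy on a small made-up array (this assumes the missing values show up as NaN once the data is a float array, as they typically do after conversion from pandas):

```python
import numpy as np

# Made-up feature matrix with one missing entry.
x = np.array([[1.0, 2.0],
              [np.nan, 0.5],
              [3.0, 4.0]], dtype="float32")
y = np.array([0, 1, 1])

# True for rows that contain no NaN.
valid_rows = ~np.isnan(x).any(axis=1)

# Filter features and labels with the same mask so they stay aligned.
x_clean, y_clean = x[valid_rows], y[valid_rows]

print(int(np.isnan(x).sum()), x_clean.shape, y_clean.tolist())
```

A single NaN input is enough to turn the loss into NaN, because it propagates through every layer and then into the weight updates.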

How to get the weights of each layer

I'm trying to get the input weights of each layer, including LSTM 1, LSTM 2, and the weights after the attention layer, and display them using a heatmap. But when I run the code, the following error appears. Why does this happen, given that the layer exists?
Here is the code:
model.add(LSTM(32, input_shape=(n_timesteps,n_features), return_sequences=True))
#print weights
print(model.get_layer(LSTM).get_weights()[0])
model.add(LSTM(32, input_shape=(n_timesteps,n_features), return_sequences=True))
model.add(Dropout(0.1))
model.add(attention(return_sequences=False)) # receive 3D and output 2D
model.add(Dense(n_outputs, activation='softmax'))
model.summary()
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit network
model.fit(trainX, trainy, epochs=epochs, batch_size=batch_size, verbose=verbose)
# evaluate model
_, accuracy = model.evaluate(testX, testy, batch_size=batch_size, verbose=0)
Attention layer:
class attention(Layer):
    def __init__(self, return_sequences=True):
        self.return_sequences = return_sequences
        super(attention,self).__init__()
    def build(self, input_shape):
        self.W=self.add_weight(name="att_weight", shape=(input_shape[-1],1),
                               initializer="normal")
        self.b=self.add_weight(name="att_bias", shape=(input_shape[1],1),
                               initializer="zeros")
        super(attention,self).build(input_shape)
    def call(self, x):
        e = K.tanh(K.dot(x,self.W)+self.b)
        a = K.softmax(e, axis=1)
        output = x*a
        if self.return_sequences:
            return output
        return K.sum(output, axis=1)
And this is the error that appears:
ValueError: No such layer: <class 'keras.layers.recurrent_v2.LSTM'>. Existing layers are [<keras.layers.recurrent_v2.LSTM object at 0x7f7b5c215910>].

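model.get_layer expects a layer name (a string) or an index, not the LSTM class itself, which is why the lookup in your code fails. You can get a specific layer's weights through model.layers after defining your whole model: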
import tensorflow as tf
import seaborn as sb
import matplotlib.pyplot as plt
class attention(tf.keras.layers.Layer):
    def __init__(self, return_sequences=True):
        self.return_sequences = return_sequences
        super(attention,self).__init__()
    def build(self, input_shape):
        self.W=self.add_weight(name="att_weight", shape=(input_shape[-1],1),
                               initializer="normal")
        self.b=self.add_weight(name="att_bias", shape=(input_shape[1],1),
                               initializer="zeros")
        super(attention,self).build(input_shape)
    def call(self, x):
        e = tf.keras.backend.tanh(tf.keras.backend.dot(x,self.W)+self.b)
        a = tf.keras.backend.softmax(e, axis=1)
        output = x*a
        if self.return_sequences:
            return output
        return tf.keras.backend.sum(output, axis=1)
model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(32, input_shape=(5,10), return_sequences=True))
model.add(tf.keras.layers.LSTM(32, return_sequences=True))
model.add(tf.keras.layers.Dropout(0.1))
model.add(attention(return_sequences=False)) # receive 3D and output 2D
model.add(tf.keras.layers.Dense(3, activation='softmax'))
model.summary()
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
trainx = tf.random.normal((25, 5, 10))
trainy = tf.random.uniform((25, 3), maxval=3)
model.fit(trainx, trainy, epochs=5, batch_size=4)
lstm1_weights = model.layers[0].get_weights()[0]
lstm2_weights = model.layers[1].get_weights()[0]
attention_weights = model.layers[3].get_weights()[0]
heat_map = sb.heatmap(lstm1_weights)
plt.show()
Model: "sequential_16"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_24 (LSTM) (None, 5, 32) 5504
lstm_25 (LSTM) (None, 5, 32) 8320
dropout_12 (Dropout) (None, 5, 32) 0
attention_12 (attention) (None, 32) 37
dense_12 (Dense) (None, 3) 99
=================================================================
Total params: 13,960
Trainable params: 13,960
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
7/7 [==============================] - 4s 10ms/step - loss: 5.5033 - accuracy: 0.4400
Epoch 2/5
7/7 [==============================] - 0s 8ms/step - loss: 5.4899 - accuracy: 0.5200
Epoch 3/5
7/7 [==============================] - 0s 9ms/step - loss: 5.4771 - accuracy: 0.4800
Epoch 4/5
7/7 [==============================] - 0s 9ms/step - loss: 5.4701 - accuracy: 0.5200
Epoch 5/5
7/7 [==============================] - 0s 8ms/step - loss: 5.4569 - accuracy: 0.5200
                              
If you want to see how the weights of your layers change during training, you should define a callback as shown in this post.

Keras Sequential to Functional API

I am new to deep learning and have been trying to convert the Keras sequential API to the functional API running on the CIFAR10 image dataset but have been having some difficulty. I've converted the model which looks the same except for the input layer yet the sequential has an average accuracy of around ~70% and my functional has an average accuracy of around ~10%. I would really appreciate some help with regards to figuring out what is going wrong. Here is my functional code:
import tensorflow as tf
from tensorflow import keras
from keras import datasets, layers, models
from keras.models import Model, Input, Sequential
import matplotlib.pyplot as plt
Download and prepare:
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0
input_shape = train_images[0,:,:,:].shape
Create model:
input = layers.Input(shape=input_shape)
x = layers.Conv2D(32, (3, 3), activation='relu',padding='valid')(input)
x = layers.MaxPooling2D((2,2))(x)
x = layers.Conv2D(64, (3, 3), activation='relu')(x)
x = layers.MaxPooling2D((2,2))(x)
x = layers.Conv2D(64, (3, 3), activation='relu')(x)
x = layers.Flatten()(x)
x = layers.Dense(64, activation='relu')(x)
x = layers.Dense(10)(x)
model = Model(input, x, name='Functional')
Compile and train:
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=10,
validation_data=(test_images, test_labels))
Here is a link to the original Sequential CNN, which is a Google Colaboratory notebook. I would really appreciate any help in understanding and fixing what is going wrong. Thank you in advance.

There seem to be some issues with the SparseCategoricalCrossentropy loss.
Check this: https://github.com/tensorflow/tensorflow/issues/38632
The following model gives good accuracy:
import tensorflow as tf
from tensorflow import keras
from keras import datasets, layers, models
from keras.models import Model, Input, Sequential
import matplotlib.pyplot as plt
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0
train_labels, test_labels = tf.keras.utils.to_categorical(train_labels, 10) , tf.keras.utils.to_categorical(test_labels, 10)
input_shape = train_images[0,:,:,:].shape
input = layers.Input(shape=input_shape)
x = layers.Conv2D(32, (3, 3), activation='relu',padding='valid')(input)
x = layers.MaxPooling2D((2,2))(x)
x = layers.Conv2D(64, (3, 3), activation='relu')(x)
x = layers.MaxPooling2D((2,2))(x)
x = layers.Conv2D(64, (3, 3), activation='relu')(x)
x = layers.Flatten()(x)
x = layers.Dense(64, activation='relu')(x)
x = layers.Dense(10, activation='softmax')(x)
model = Model(input, x, name='Functional')
model.summary()
model.compile(optimizer='adam',
loss=tf.keras.losses.CategoricalCrossentropy(),
metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=10,
validation_data=(test_images, test_labels))
conv2d_16 (Conv2D)           (None, 30, 30, 32)        896
_________________________________________________________________
max_pooling2d_11 (MaxPooling (None, 15, 15, 32)        0
_________________________________________________________________
conv2d_17 (Conv2D)           (None, 13, 13, 64)        18496
_________________________________________________________________
max_pooling2d_12 (MaxPooling (None, 6, 6, 64)          0
_________________________________________________________________
conv2d_18 (Conv2D)           (None, 4, 4, 64)          36928
_________________________________________________________________
flatten_6 (Flatten)          (None, 1024)              0
_________________________________________________________________
dense_11 (Dense)             (None, 64)                65600
_________________________________________________________________
dense_12 (Dense)             (None, 10)                650
=================================================================
Total params: 122,570
Trainable params: 122,570
Non-trainable params: 0
_________________________________________________________________
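The shapes and parameter counts in the summary can be checked by hand. The sketch below is plain Python, not Keras code, and the helper names are hypothetical. It assumes 3x3 kernels with 'valid' padding and stride 1, 2x2 max pooling, and one bias per filter or unit, as in the model above.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # Each filter has kernel_h * kernel_w * in_channels weights plus one bias.
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_units, out_units):
    # Each output unit has in_units weights plus one bias.
    return (in_units + 1) * out_units


def valid_conv(size, kernel=3):
    # 'valid' padding with stride 1 shrinks each spatial side by kernel - 1.
    return size - kernel + 1


def max_pool(size, pool=2):
    # Non-overlapping pooling floors the division.
    return size // pool


# Spatial size through the network, starting from 32x32 CIFAR10 images.
sizes = [32]
sizes.append(valid_conv(sizes[-1]))  # conv 1
sizes.append(max_pool(sizes[-1]))    # pool 1
sizes.append(valid_conv(sizes[-1]))  # conv 2
sizes.append(max_pool(sizes[-1]))    # pool 2
sizes.append(valid_conv(sizes[-1]))  # conv 3

flat_units = sizes[-1] * sizes[-1] * 64

params = [
    conv2d_params(3, 3, 3, 32),
    conv2d_params(3, 3, 32, 64),
    conv2d_params(3, 3, 64, 64),
    dense_params(flat_units, 64),
    dense_params(64, 10),
]
total_params = sum(params)

print(sizes)         # spatial side length after each conv/pool layer
print(flat_units)    # units after Flatten
print(params, total_params)
```

The totals match the summary, which confirms that the functional model has the intended architecture and that the accuracy gap is not caused by a wrongly wired layer.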
Train on 50000 samples, validate on 10000 samples
Epoch 1/10
50000/50000 [==============================] - 15s 305us/step - loss: 1.4870 - accuracy: 0.4600 - val_loss: 1.2874 - val_accuracy: 0.5488
Epoch 2/10
50000/50000 [==============================] - 15s 301us/step - loss: 1.1365 - accuracy: 0.5989 - val_loss: 1.0789 - val_accuracy: 0.6191
Epoch 3/10
50000/50000 [==============================] - 15s 301us/step - loss: 0.9869 - accuracy: 0.6547 - val_loss: 0.9506 - val_accuracy: 0.6700
Epoch 4/10
50000/50000 [==============================] - 15s 301us/step - loss: 0.8896 - accuracy: 0.6907 - val_loss: 0.9509 - val_accuracy: 0.6695
Epoch 5/10
50000/50000 [==============================] - 16s 311us/step - loss: 0.8135 - accuracy: 0.7151 - val_loss: 0.8688 - val_accuracy: 0.7046
Epoch 6/10
50000/50000 [==============================] - 15s 303us/step - loss: 0.7566 - accuracy: 0.7351 - val_loss: 0.8411 - val_accuracy: 0.7141
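For reference, the two setups compute the same loss mathematically: integer labels with logits (as in the question) versus one-hot labels with softmax outputs (as in this answer). The sketch below illustrates this in plain NumPy. It is not the Keras implementation, and the function names and random inputs are assumptions made for the example.

```python
import numpy as np


def softmax(logits):
    # Subtract the row-wise maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def sparse_cce_from_logits(labels, logits):
    # Integer labels: pick the log-probability of the true class directly.
    probs = softmax(logits)
    return -np.log(probs[np.arange(len(labels)), labels]).mean()


def cce_from_probs(one_hot, probs):
    # One-hot labels: the sum keeps only the true-class term.
    return -(one_hot * np.log(probs)).sum(axis=1).mean()


rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 10))     # 5 samples, 10 classes
labels = rng.integers(0, 10, size=5)  # integer class ids
one_hot = np.eye(10)[labels]          # same labels, one-hot encoded

loss_sparse = sparse_cce_from_logits(labels, logits)
loss_onehot = cce_from_probs(one_hot, softmax(logits))
print(loss_sparse, loss_onehot)
```

Since both formulations give the same value, the difference reported in the question is more likely related to how the loss and the 'accuracy' metric are resolved by the library versions involved (see the linked issue) than to the loss formula itself.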
