Why doesn't my CNN's accuracy/loss change during training?

My goal is to train a convolutional neural network to recognise the images in the Sign Language MNIST dataset. Here is my attempt to process the data and train the model:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
import cv2
import random
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Activation, Dropout, Flatten, Dense
import cv2
import keras
import sys
import tensorflow as tf
from keras import optimizers
import json
train_df = pd.read_csv("data/sign_mnist_train.csv")
test_df = pd.read_csv("data/sign_mnist_test.csv")
X = np.array(train_df.drop(["label"], axis=1))
y = np.array(train_df[["label"]])
X = X.reshape(-1, 28, 28, 1)
X = tf.cast(X, tf.float32)
model = Sequential()
model.add(Conv2D(28, (3,3), activation = 'relu'))
model.add(MaxPooling2D((2,2)))
model.add(Flatten())
model.add(Dense(24, activation = 'softmax'))
model.compile(optimizer='RMSprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(X, y, epochs=10, validation_split=0.2)
After running this I get the following result:
Epoch 1/10
687/687 [==============================] - 4s 6ms/step - loss: 174.9729 - accuracy: 0.0438 - val_loss: 174.6281 - val_accuracy: 0.0382
Epoch 2/10
687/687 [==============================] - 2s 3ms/step - loss: 174.9779 - accuracy: 0.0433 - val_loss: 174.6281 - val_accuracy: 0.0382
Epoch 3/10
687/687 [==============================] - 2s 3ms/step - loss: 174.9777 - accuracy: 0.0433 - val_loss: 174.6281 - val_accuracy: 0.0382
and this continues for the remaining 7 epochs. My model is slightly different from what I have provided (for brevity) but this sequential model has the same issue, which makes me suspect that the issue must come before the model = Sequential() line. Furthermore, I have tried countless combinations of optimizers/loss and all those do is make the accuracy/loss converge to slightly different numbers, so I doubt that's the problem.

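One potential problem is that you use loss='binary_crossentropy' for a multi-class problem. With a softmax output over several classes, use loss='categorical_crossentropy' with one-hot labels, or loss='sparse_categorical_crossentropy' with integer labels such as your y.
Besides, you load a separate test set (test_df) but never use it; instead, validation_split=0.2 in model.fit(X, y, epochs=10, validation_split=0.2) holds out 20% of the training data for validation and trains on the remaining 80%.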
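To make the label/loss pairing concrete, here is a minimal NumPy sketch of one-hot encoding integer labels. The helper name one_hot and the toy labels are illustrative assumptions, not part of the original code; the real labels would come from train_df[["label"]].

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class ids of shape (n,) or (n, 1) into one-hot rows."""
    labels = np.asarray(labels).reshape(-1)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Set a single 1.0 per row, at the column given by the class id.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# Toy stand-in for the label column: integer ids with shape (4, 1).
y = np.array([[0], [3], [1], [3]])
y_one_hot = one_hot(y, num_classes=4)
print(y_one_hot)
```

One-hot rows like these pair with categorical_crossentropy, while the raw integer ids pair with sparse_categorical_crossentropy. In both cases the softmax layer needs at least y.max() + 1 units, which is worth checking against the actual label range.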

Related

How to create custom Attention layer for 1D-CNN on Tabular Data?

I am trying to create a custom layer for a multiclass classification problem on a tabular dataset using a 1D-CNN. My original dataset has ~20000 features and ~5000000 rows with 8 classes. It suffers from severe class imbalance, so I want to add an attention layer to the 1D-CNN so that the minority classes receive more attention. To simplify the problem, I will show the model I built on the Iris dataset. I would like to check whether what I am doing is correct; the aim is to add attention to the 1D-CNN layers in a VGG-variant model. I would appreciate any help and input.
# Load libraries
import sklearn
from sklearn.datasets import load_iris
import numpy as np
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
## Plotting Library
import matplotlib
import matplotlib.pyplot as plt
## File I/O
import os
## ML & DeepL libraries:
import tensorflow.keras
from tensorflow.keras.utils import to_categorical
import np_utils # This library has been separated from keras and tensorflow.
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, Activation, Flatten, Conv1D, Dropout, BatchNormalization, MaxPooling1D, LeakyReLU
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping, ModelCheckpoint
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint
from sklearn.model_selection import StratifiedKFold, StratifiedShuffleSplit
## ML & DL Model Evaluation Libraries_Classes:
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
from sklearn.utils.class_weight import compute_class_weight
# preprocessing libraries:
from sklearn.preprocessing import minmax_scale
# Compute sample weights using sklearn:
from sklearn.utils.class_weight import compute_sample_weight
from tensorflow.keras import Model
from tensorflow.keras.layers import Layer
import tensorflow.keras.backend as K
from tensorflow.keras.layers import Input, Dense
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
# load Iris
data = load_iris()
# Get the X features and corresponding labels:
X_ = data['data']
y_ = data['target']
X_ = X_[0:]
y_ = y_[0:]
Xtrain, Xtest, ytrain, ytest = train_test_split(X_, y_, test_size=0.33, random_state=42)
Create the attention layer:
# Add attention layer to the deep learning network
class attention(Layer):
    def __init__(self,**kwargs):
        super(attention,self).__init__(**kwargs)
    def build(self,input_shape):
        self.W=self.add_weight(name='attention_weight',
                               shape=(input_shape[-1],1),
                               initializer='random_normal',
                               trainable=True)
        self.b=self.add_weight(name='attention_bias',
                               shape=(input_shape[1],1),
                               initializer='zeros',
                               trainable=True)
        super(attention, self).build(input_shape)
    def call(self,x):
        # Alignment scores. Pass them through tanh function
        e = K.tanh(K.dot(x,self.W)+self.b)
        # Remove dimension of size 1
        e = K.squeeze(e, axis=-1)
        # Compute the weights
        alpha = K.softmax(e)
        # Reshape to tensorFlow format
        alpha = K.expand_dims(alpha, axis=-1)
        # Compute the context vector
        context = x * alpha
        context = K.sum(context, axis=1)
        return context
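To check what this layer computes, the forward pass can be sketched in plain NumPy. This is an illustrative re-implementation under assumed shapes (batch of 2, 4 steps, 64 features, matching the Conv1D output reported in the summary), with random stand-in weights; the names attention_forward and softmax are hypothetical helpers, not Keras APIs.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the max along the axis for numerical stability.
    z = z - z.max(axis=axis, keepdims=True)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=axis, keepdims=True)

def attention_forward(x, W, b):
    """x: (batch, steps, features), W: (features, 1), b: (steps, 1)."""
    e = np.tanh(x @ W + b)            # alignment scores, (batch, steps, 1)
    e = e.squeeze(-1)                 # (batch, steps)
    alpha = softmax(e, axis=-1)       # one weight per step, sums to 1 per sample
    context = (x * alpha[..., None]).sum(axis=1)  # weighted sum over steps
    return context, alpha

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 64))           # stand-in for the Conv1D output
W = rng.normal(scale=0.05, size=(64, 1))  # stand-in for attention_weight
b = np.zeros((4, 1))                      # stand-in for attention_bias
context, alpha = attention_forward(x, W, b)
print(context.shape, alpha.shape)  # (2, 64) (2, 4)
```

Note that the weights are distributed over the steps (axis 1) of each individual sample, so the layer re-weights positions within one input rather than re-weighting samples or classes.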
Create 1D-CNN with attention:
hidden_units = 2
epochs = 40
activation = ['tanh', 'tanh']
def create_cnn_with_attention(hidden_units,
                              dense_units,
                              input_shape,
                              activation):
    x=Input(shape=input_shape)
    cnn_layer = Conv1D(filters=64,
                       kernel_size =21,
                       strides =1,
                       padding ='same',
                       activation ='relu',
                       input_shape = input_shape,
                       kernel_initializer= tensorflow.keras.initializers.he_normal())(x)
    attention_layer = attention()(cnn_layer)
    outputs=Dense(dense_units,
                  trainable=True,
                  activation=activation)(attention_layer)
    model=Model(x,outputs)
    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer='adam',
                  metrics = ['sparse_categorical_accuracy'])
    return model
model_attention = create_cnn_with_attention(hidden_units=hidden_units,
                                            dense_units=3,
                                            input_shape = (input_shape),
                                            activation='tanh')
model_attention.summary()
Model: "model_12"
Layer (type)               Output Shape        Param #
input_14 (InputLayer)      [(None, 4, 1)]      0
conv1d_18 (Conv1D)         (None, 4, 64)       1408
attention_12 (attention)   (None, 64)          68
dense_14 (Dense)           (None, 3)           195
Total params: 1,671
Trainable params: 1,671
Non-trainable params: 0
model_attention.fit(Xtrain, ytrain, epochs = 40)
Epoch 1/40
4/4 [==============================] - 0s 5ms/step - loss: 11.0361 - sparse_categorical_accuracy: 0.3100
Epoch 2/40
4/4 [==============================] - 0s 2ms/step - loss: 10.8962 - sparse_categorical_accuracy: 0.3100
Epoch 3/40
4/4 [==============================] - 0s 2ms/step - loss: 10.6803 - sparse_categorical_accuracy: 0.1000
Epoch 4/40
4/4 [==============================] - 0s 1ms/step - loss: 2.7534 - sparse_categorical_accuracy: 0.0000e+00
Epoch 5/40
4/4 [==============================] - 0s 1ms/step - loss: 1.0986 - sparse_categorical_accuracy: 0.0000e+00
Epoch 6/40
4/4 [==============================] - 0s 2ms/step - loss: 1.0986 - sparse_categorical_accuracy: 0.0000e+00
Epoch 7/40
4/4 [==============================] - 0s 2ms/step - loss: 1.0986 - sparse_categorical_accuracy: 0.0000e+00
Epoch 8/40
4/4 [==============================] - 0s 2ms/step - loss: 1.0986 - sparse_categorical_accuracy: 0.0000e+00 ....
train_mse_attn = model_attention.evaluate(Xtrain, ytrain)
4/4 [==============================] - 0s 2ms/step - loss: 1.0986 - sparse_categorical_accuracy: 0.0000e+00
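One way to read the plateau in this log: 1.0986 is ln 3, the cross-entropy of a prediction that assigns probability 1/3 to each of the three Iris classes. The short check below only reproduces that number; it does not by itself identify the cause.

```python
import math

num_classes = 3  # Iris has three target classes
# Cross-entropy of a uniform prediction: -log(1 / num_classes).
uniform_loss = -math.log(1.0 / num_classes)
print(round(uniform_loss, 4))  # 1.0986
```

A possible contributor, stated here as an assumption rather than a confirmed diagnosis, is the tanh activation on the output layer: sparse_categorical_crossentropy with its default settings expects class probabilities, which a softmax output provides and a tanh output does not.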

keras network doesn't train

This is my first attempt at building the simplest possible net, which I am training on XOR. It does not work at all. I have tried everything: different activation functions, numbers of layers, neurons, epochs, batch sizes, optimisers... Every time the result is 1,1,1,1 (accuracy=0.5). Please help! What am I doing wrong?
from keras.models import Sequential
from keras.layers import Dense
from tensorflow import keras
import numpy as np
X = np.array([ [0,0],
               [0,1],
               [1,0],
               [1,1] ])
Y = np.array([[1,0,0,1]]).T
model = Sequential()
model.add(Dense(10, input_dim=2, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics='accuracy')
# Training a model
model.fit(X, Y, epochs=100, batch_size=len(X))
# Prediction
predictions = model.predict(X)
print(predictions)
I noticed that there is always 1/1 on the left side of the output, but I guess it should be something like 4/4. Maybe this is the reason? I can't understand how to fix it...
Tail of output:
...
...
Epoch 97/100
1/1 [==============================] - 0s 1ms/step - loss: 0.0000e+00 - accuracy: 0.5000
Epoch 98/100
1/1 [==============================] - 0s 1ms/step - loss: 0.0000e+00 - accuracy: 0.5000
Epoch 99/100
1/1 [==============================] - 0s 1ms/step - loss: 0.0000e+00 - accuracy: 0.5000
Epoch 100/100
1/1 [==============================] - 0s 1ms/step - loss: 0.0000e+00 - accuracy: 0.5000
1/1 [==============================] - 0s 165ms/step - loss: 0.0000e+00 - accuracy: 0.5000
[0.0, 0.5]
[[1.]
[1.]
[1.]
[1.]]
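The constant output of 1 follows from applying softmax to a single unit: softmax normalises across the units of the layer, so with only one unit the result is always 1.0 regardless of the input. The pure-Python sketch below illustrates this and contrasts it with a sigmoid; the helper functions are illustrative, not Keras code. It also shows why the progress bar reads 1/1: that counter is the number of batches per epoch.

```python
import math

def softmax(logits):
    # Normalise exponentials across all units of the layer.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

for logit in (-5.0, 0.0, 5.0):
    # With a single unit, softmax is 1.0 for every logit; sigmoid is not.
    print(logit, softmax([logit]), round(sigmoid(logit), 4))

# 4 samples with batch_size=len(X)=4 means one batch per epoch.
steps_per_epoch = math.ceil(4 / 4)
print(steps_per_epoch)
```

For a single-output binary problem such as XOR, a sigmoid output paired with binary_crossentropy avoids the constant prediction.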

Thank you very much to all!
Here is the working net below. Strangely, it takes a very long time to train! I remember doing the same task without Keras several years ago, and training was almost instant (without any GPU, of course). But here, with "Adam optimisation" (and with "fast relu" I only managed to get a 4-layer net working), it seems that these functions have the opposite effect on such simple tasks.
from keras.models import Sequential
from keras import initializers
from keras.layers import Dense
from tensorflow import keras
import numpy as np
X = np.array([0,0,
              0,1,
              1,0,
              1,1] )
X = X.reshape(4,2).astype("float32")
Y = np.array([1,
              0,
              0,
              1] )
Y = Y.reshape(4,1).astype("float32")
init_2 = initializers.TruncatedNormal(mean=0.0, stddev=0.05, seed=12345)
model = Sequential()
model.add(Dense(4, input_dim=2, activation='sigmoid', kernel_initializer=init_2, bias_initializer=init_2))
model.add(Dense(1, activation='sigmoid', kernel_initializer=init_2, bias_initializer=init_2))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Training a model
model.fit(X, Y, epochs=7000, batch_size=4, verbose=0)
scores = model.evaluate(X, Y)
print(scores)
# Prediction
predictions = model.predict(X)
print(predictions)

Keras - val_accuracy stuck at 0.0000e+00 in LSTM model [duplicate]

This question already has answers here:
Keras NN regression model gives low loss, and 0 acuracy
(2 answers)
Closed 2 years ago.
Below is my Keras model for predicting cryptocurrency prices. The problem is that although loss and val_loss decrease, the accuracy is stuck at a certain value (2.5840e-04) and doesn't change, and val_accuracy is stuck at 0.0000e+00. I checked each of my inputs thoroughly, but I couldn't find anything wrong. Is there a problem with my model?
(This is a link to my notebook if you need it.)
This is my data preparation (ethusd.csv is a dataset in a standard stock format; you can see it in my notebook in the link above):
DATASET_PATH = "../input/crifier/ethusd.csv"
import math
import matplotlib.pyplot as plt
import tensorflow.keras as keras
import pandas as pd
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import *
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam
import tensorflow as tf
from sklearn.utils import shuffle
df=pd.read_csv(DATASET_PATH)
df = df[-100000:-90000]
WINDOW = 400
TRAIN_RATIO = 0.9
LEN_DF = len(df)  # assumed: number of rows in the slice (LEN_DF is not defined in the snippet)
LEN_TRAIN = int(LEN_DF*TRAIN_RATIO)
LEN_TEST = int(LEN_DF - LEN_TRAIN)
training_set = df.iloc[:LEN_TRAIN, 1:2].values
test_set = df.iloc[LEN_TRAIN:, 1:2].values
def reshaper(dataset):
    sc = MinMaxScaler(feature_range = (0, 1))
    dataset_scaled = sc.fit_transform(dataset)
    # Creating a data structure with WINDOW time-steps and 1 output
    X_scaled = []
    y_scaled = []
    for i in range(0, len(dataset)-WINDOW):
        X_scaled.append(dataset_scaled[i:i+WINDOW, 0])
        y_scaled.append(dataset_scaled[i+WINDOW, 0])
    X_scaled, y_scaled = np.array(X_scaled), np.array(y_scaled)
    X_scaled = np.reshape(X_scaled, (X_scaled.shape[0], X_scaled.shape[1], 1))
    return X_scaled, y_scaled,sc
X_train, y_train, sc_train = reshaper(training_set)
X_train, y_train = shuffle(X_train, y_train)
X_test, y_test, sc_test = reshaper(test_set)
And this is my model
model = Sequential()
#Adding the first LSTM layer and some Dropout regularisation
model.add(LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], 1)))
model.add(Dropout(0.2))
# Adding a second LSTM layer and some Dropout regularisation
model.add(LSTM(units = 50, return_sequences = True))
model.add(Dropout(0.2))
# Adding a third LSTM layer and some Dropout regularisation
model.add(LSTM(units = 50, return_sequences = True))
model.add(Dropout(0.2))
# Adding a fourth LSTM layer and some Dropout regularisation
model.add(LSTM(units = 50))
model.add(Dropout(0.2))
# Adding the output layer
model.add(Dense(units = 1))
# Compiling the RNN
model.compile(optimizer = Adam(learning_rate=0.001), loss = 'mean_squared_error',metrics=['accuracy'])
model.fit(X_train, y_train, validation_split=0.1, epochs = 100, batch_size = 1000, verbose=2)
I keep getting the following accuracy and val_accuracy
Epoch 1/100
8/8 - 3s - loss: 0.0991 - accuracy: 1.2920e-04 - val_loss: 0.0046 - val_accuracy: 0.0000e+00
Epoch 2/100
8/8 - 2s - loss: 0.0217 - accuracy: 2.5840e-04 - val_loss: 0.0068 - val_accuracy: 0.0000e+00
Epoch 3/100
8/8 - 2s - loss: 0.0132 - accuracy: 2.5840e-04 - val_loss: 0.0047 - val_accuracy: 0.0000e+00
.
.
.
Epoch 28/100
8/8 - 2s - loss: 0.0036 - accuracy: 2.5840e-04 - val_loss: 0.0011 - val_accuracy: 0.0000e+00
Epoch 29/100
8/8 - 2s - loss: 0.0034 - accuracy: 2.5840e-04 - val_loss: 0.0012 - val_accuracy: 0.0000e+00
Epoch 30/100
8/8 - 2s - loss: 0.0033 - accuracy: 2.5840e-04 - val_loss: 0.0011 - val_accuracy: 0.0000e+00

Your model is a regression model (you are predicting a float, not a category), so accuracy is not applicable here.
When calculating accuracy, Keras checks whether your output matches the label. For example, it will give you zero accuracy even if your output is, say, 0.499999 and the label is 0.5.
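The point can be illustrated with a small NumPy sketch on made-up values. The exact-match comparison below is a simplified stand-in for an accuracy metric, not Keras's internal implementation; mean absolute error is shown as one regression-appropriate alternative.

```python
import numpy as np

# Made-up scaled prices and predictions that are all very close.
y_true = np.array([0.50, 0.25, 0.75, 1.00])
y_pred = np.array([0.499999, 0.26, 0.74, 0.98])

# Exact-match "accuracy": a near miss counts the same as a wild miss.
exact_match_accuracy = float(np.mean(y_true == y_pred))

# Mean absolute error reflects how close the predictions actually are.
mae = float(np.mean(np.abs(y_true - y_pred)))

print(exact_match_accuracy)  # 0.0
print(mae)
```

In the model above, replacing metrics=['accuracy'] with a regression metric such as metrics=['mae'] would be one way to get a number that tracks prediction quality.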

Why is my CNN pre trained image classifier overfitting?

I have just started with computer vision, and in my current task I am classifying images into 4 categories.
Total number of image files = 1043
I am using a pretrained InceptionV3 and fine-tuning it on my dataset.
This is what I get after each epoch:
Epoch 1/5
320/320 [==============================] - 1925s 6s/step - loss: 0.4318 - acc: 0.8526 - val_loss: 1.1202 - val_acc: 0.5557
Epoch 2/5
320/320 [==============================] - 1650s 5s/step - loss: 0.1807 - acc: 0.9446 - val_loss: 1.2694 - val_acc: 0.5436
Epoch 3/5
320/320 [==============================] - 1603s 5s/step - loss: 0.1236 - acc: 0.9572 - val_loss: 1.2597 - val_acc: 0.5546
Epoch 4/5
320/320 [==============================] - 1582s 5s/step - loss: 0.1057 - acc: 0.9671 - val_loss: 1.3845 - val_acc: 0.5457
Epoch 5/5
320/320 [==============================] - 1580s 5s/step - loss: 0.0982 - acc: 0.9700 - val_loss: 1.2771 - val_acc: 0.5572
That is a huge difference. Kindly help me figure out why my model is not able to generalize even though it fits the training data quite well.
My code for reference:
from keras.utils import to_categorical
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D, Dropout
from keras.applications.inception_v3 import InceptionV3, preprocess_input
CLASSES = 4
# setup model
base_model = InceptionV3(weights='imagenet', include_top=False)
from sklearn.preprocessing import OneHotEncoder
x = base_model.output
x = GlobalAveragePooling2D(name='avg_pool')(x)
x = Dropout(0.4)(x)
predictions = Dense(CLASSES, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers:
    layer.trainable = False
model.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
df['Category']= encoder.fit_transform(df['Category'])
from keras.preprocessing.image import ImageDataGenerator
WIDTH = 299
HEIGHT = 299
BATCH_SIZE = 32
train_datagen = ImageDataGenerator(rescale=1./255,preprocessing_function=preprocess_input)
validation_datagen = ImageDataGenerator(rescale=1./255)
df['Category'] =df['Category'].astype(str)
#dfval['Category'] = dfval['Category'].astype(str)
from sklearn.utils import shuffle
df = shuffle(df)
from sklearn.model_selection import train_test_split
dftrain,dftest = train_test_split(df, test_size = 0.2, random_state = 0)
train_generator = train_datagen.flow_from_dataframe(dftrain,target_size=(HEIGHT, WIDTH),batch_size=BATCH_SIZE,class_mode='categorical', x_col='Path', y_col='Category')
validation_generator = validation_datagen.flow_from_dataframe(dftest,target_size=(HEIGHT, WIDTH),batch_size=BATCH_SIZE,class_mode='categorical', x_col='Path', y_col='Category')
EPOCHS = 5
BATCH_SIZE = 32
STEPS_PER_EPOCH = 320
VALIDATION_STEPS = 64
MODEL_FILE = 'filename.model'
history = model.fit_generator(
    train_generator,
    epochs=EPOCHS,
    steps_per_epoch=STEPS_PER_EPOCH,
    validation_data=validation_generator,
    validation_steps=VALIDATION_STEPS)
Any help would be appreciated :)

If you don't apply preprocess_input to all of your data, you will get terrible results.
Look at these:
train_datagen = ImageDataGenerator(
    preprocessing_function=preprocess_input,
    ...)
validation_datagen = ImageDataGenerator()
Now, I notice you are using rescale. Since you imported the correct preprocess_input function from the Inception code, I really think you should not be using rescale. The preprocess_input function is supposed to do all the necessary preprocessing. (Not all models were trained with normalized inputs.)
But would rescale be a problem if you're applying it to both datasets?
Well... if trainable=False is applied correctly to the BatchNormalization layers, these layers use stored values for mean and variance, which will only work well if the data is within the expected range.
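A rough NumPy sketch of the resulting range mismatch follows. It assumes that the Inception-style preprocessing maps 0-255 pixels to [-1, 1] and that the generator applies the preprocessing function before rescale; inception_style_preprocess is a hypothetical stand-in for preprocess_input, not the library function itself.

```python
import numpy as np

def inception_style_preprocess(x):
    """Scale 0-255 pixels to [-1, 1]; stand-in for preprocess_input."""
    return x / 127.5 - 1.0

pixels = np.array([0.0, 127.5, 255.0])

# Training generator in the question: preprocessing function plus rescale.
train_inputs = inception_style_preprocess(pixels) * (1.0 / 255)
# Validation generator in the question: rescale only.
val_inputs = pixels * (1.0 / 255)
# What both generators would produce with preprocess_input alone.
consistent = inception_style_preprocess(pixels)

print(train_inputs.min(), train_inputs.max())
print(val_inputs.min(), val_inputs.max())
print(consistent.min(), consistent.max())
```

Under these assumptions the training inputs collapse to a tiny interval around zero while the validation inputs span [0, 1], so the network is validated on data from a different range than it was trained on.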

Cannot find the reason behind very low accuracy in my Convolutional Net model with Keras

I have built a Convolutional Neural Network and trained it on the Handwritten_Dataset with epochs=2 and a batch size of 128, but I cannot find the reason behind its very low accuracy.
The code is:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import keras
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')
import tables
from keras.models import Sequential
from keras.utils import np_utils
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Flatten, Dropout, Dense
from keras.utils import to_categorical
#hd=pd.read_hdf('data.h5')
hd=pd.read_csv('../input/handwritten_data_785.csv')
hd.head()
Y=hd.iloc[:,0]
X=hd.iloc[:,1:]
Y=to_categorical(Y)
X_train,X_test,Y_train,Y_test=train_test_split(X,Y,stratify=Y,random_state=34,test_size=0.25)
X_train=X_train.values.reshape(X_train.shape[0],28,28,1)
X_test=X_test.values.reshape(X_test.shape[0],28,28,1)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
print("X.shape ",X.shape)
print("Y.shape ",Y.shape)
type(Y)
input_shape=(28,28,1)
n_classes=Y_train.shape[1]
batch_size=128
epochs=2
model=Sequential()
model.add(Conv2D(filters=32,kernel_size=(4,4),strides=(1,1),padding='same',activation='relu',input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2,2)))#,strides=(1,1)))
model.add(Conv2D(filters=64,kernel_size=(4,4),strides=(1,1),padding='same',activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2),strides=(1,1)))
model.add(Flatten())
model.add(Dense(1000,activation='relu'))
model.add(Dense(n_classes,activation='softmax'))
model.compile(loss=keras.losses.categorical_crossentropy,optimizer=keras.optimizers.SGD(lr=0.05),metrics=["accuracy"])
model.fit(X_train,Y_train,batch_size=batch_size,epochs=epochs,verbose=1,validation_data=(X_test,Y_test))
model.evaluate(X_test,Y_test,verbose=0)
Can anyone point out the reason behind such low accuracy?
Have I divided the dataset correctly?
The output is:
Train on 279027 samples, validate on 93010 samples
Epoch 1/2
279027/279027 [==============================] - 63s 225us/step - loss: 15.6456 - acc: 0.0293 - val_loss: 15.6455 - val_acc: 0.0293
Epoch 2/2
279027/279027 [==============================] - 58s 208us/step - loss: 15.6455 - acc: 0.0293 - val_loss: 15.6455 - val_acc: 0.0293
[15.64552185918654, 0.02931942801857274]

The following helped me attain good accuracy:
1) Normalizing the data has a large effect on a convolutional neural network.
Normalize the data: X /= 255
The accuracy seems to go up to 90% without changing the number of epochs.
2) Increasing the number of epochs can also increase the model accuracy.
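The normalisation step can be sketched as follows. The random array is a stand-in for the reshaped 28x28 image data (an assumption for illustration only); in the question's code the same division would be applied to both X_train and X_test after the astype('float32') calls.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for reshaped image data: 8 grayscale images with 0-255 pixel values.
X_raw = rng.integers(0, 256, size=(8, 28, 28, 1)).astype("float32")

# Scale pixel intensities from [0, 255] to [0, 1].
X_scaled = X_raw / 255.0

print(X_raw.max(), X_scaled.min(), X_scaled.max())
```

Keeping inputs in a small range helps avoid very large activations and gradients early in training, which is consistent with the stuck loss of about 15.6 reported in the question.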
