ANN Implementation Overfitting - python

I am new to ML and still learning. I built a model by following a tutorial, but the accuracy always jumps to 100% very quickly. From what I found online, I think my model is overfitting. The dataset I used is quite small: the Indian Liver Patient Dataset from the UCI repository, which has only around 600 observations.
My question is: how can I overcome this overfitting? Any help will be appreciated, thanks.
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import scikitplot as skplt
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
df = pd.read_csv("C:/TF/TEST/ILDP.csv")
df["ag_ratio"] = df["ag_ratio"].fillna(0.6)  # fill with a number, not the string "0.6"
print(df.isnull().sum())
print(df.head())
LD, NLD = df['is_patient'].value_counts()
df_sex = pd.get_dummies(df['gender'])
df_new = pd.concat([df, df_sex], axis=1)
Droop_gender = df_new.drop(labels=['gender'], axis=1)
Droop_gender.columns = ['age', 'tot_bilirubin', 'direct_bilirubin', 'tot_proteins', 'albumin', 'ag_ratio',
'sgpt', 'sgot', 'alkphos', 'Female', 'Male', 'is_patient']
X = Droop_gender.drop('is_patient', axis=1)
y = Droop_gender['is_patient']
print(X.shape)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
classifier = Sequential() # Initialising the ANN
classifier.add(Dense(units=16, kernel_initializer='uniform', activation='relu', input_dim=11))
classifier.add(Dense(units=8, kernel_initializer='uniform', activation='relu'))
classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu'))
classifier.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))
# compile ANN
classifier.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
# Fitting the data
history = classifier.fit(X_train, y_train, batch_size=20, epochs=50)
y_pred = classifier.predict(X_test)
y_pred = [1 if y >= 0.5 else 0 for y in y_pred]
print(classification_report(y_test, y_pred))

That your model is overfitting is encouraging because it means your model has the capacity to learn. Now you have to gradually reduce the capacity of your model to make it generalize better. My recommendation is to add regularization.
Add dropout layers between some of your fully connected layers:
from keras.layers import Dropout
classifier.add(Dense(units=16, kernel_initializer='uniform', activation='relu', input_dim=11))
classifier.add(Dropout(0.5))
classifier.add(Dense(units=8, kernel_initializer='uniform', activation='relu'))
You can add these dropout layers between any layers, but adding them after the layers with more neurons usually helps most, since those layers have the most parameters to overfit with.
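To see what a Dropout(0.5) layer does to activations during training, here is a minimal NumPy sketch of inverted dropout, the scheme Keras's Dropout layer uses: each unit is zeroed with probability rate and the surviving units are scaled by 1/(1 - rate), so the expected activation stays the same. At inference time the layer passes its input through unchanged. The function name inverted_dropout is illustrative, not a Keras API.

```python
import numpy as np

def inverted_dropout(x, rate, rng, training=True):
    """Illustrative re-implementation of dropout; not the actual Keras code."""
    if not training or rate == 0.0:
        return x  # at inference time dropout is the identity
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob  # True = keep this unit
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
activations = np.ones((1000, 16))  # e.g. outputs of the Dense(16) layer
dropped = inverted_dropout(activations, rate=0.5, rng=rng)

print((dropped == 0).mean())  # fraction of zeroed units, close to 0.5
print(dropped.mean())         # close to 1.0: the expected value is preserved
```

Because a different random half of the units is dropped on every batch, the network cannot rely on any single neuron, which is what limits its capacity to memorize a small dataset.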
If that doesn't work well, you can try weight decay (L2 regularization). Here is an example from the documentation:
from keras import regularizers
model.add(Dense(64, input_dim=64,
                kernel_regularizer=regularizers.l2(0.01),
                activity_regularizer=regularizers.l1(0.01)))
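What these regularizers add to the training loss can be written out directly. Going by the Keras documentation, regularizers.l2(0.01) computes 0.01 * sum(square(x)) over the layer's weights and regularizers.l1(0.01) computes 0.01 * sum(abs(x)), here applied to the layer's outputs. The NumPy sketch below only evaluates those two formulas for hypothetical weights and activations; data_loss is a made-up placeholder for the cross-entropy term.

```python
import numpy as np

def l2_penalty(weights, factor=0.01):
    # Same formula as regularizers.l2(factor): factor * sum of squared values
    return factor * np.sum(np.square(weights))

def l1_penalty(values, factor=0.01):
    # Same formula as regularizers.l1(factor): factor * sum of absolute values
    return factor * np.sum(np.abs(values))

# Hypothetical kernel of a Dense(64, input_dim=64) layer and a batch of its outputs
rng = np.random.default_rng(42)
kernel = rng.normal(scale=0.1, size=(64, 64))
outputs = np.maximum(0.0, rng.normal(size=(20, 64)))  # ReLU-like activations

data_loss = 0.4  # placeholder for the binary cross-entropy on this batch
total_loss = data_loss + l2_penalty(kernel) + l1_penalty(outputs)
print(l2_penalty(kernel), l1_penalty(outputs), total_loss)
```

Since the L2 term grows with the square of each weight, minimizing the total loss pulls large weights toward zero, which is where the name "weight decay" comes from.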
Try either kernel_regularizer (which penalizes the layer's weights) or activity_regularizer (which penalizes the layer's outputs) on its own first. Tune the values and see how the different parameters change the results. In the end it's a lot of black magic, so you'll have to experiment a bit. Good luck!
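Before reaching for regularization, it may be worth ruling out a preprocessing problem in the question's code. pd.concat([df, df_sex], axis=1) appends the Female/Male dummy columns after all of df's columns, so the positional rename of Droop_gender.columns gives the name 'is_patient' to the last column, which is the Male dummy. If that is what happens, y is really the gender and X still contains the Female dummy (its exact complement), which could by itself explain the 100% accuracy. The sketch below reproduces the column order on a tiny made-up frame (the values and the reduced column set are hypothetical) and then selects columns by name instead of by position.

```python
import pandas as pd

# Tiny hypothetical stand-in for ILDP.csv (only a few of the real columns)
df = pd.DataFrame({
    "age": [65, 62, 20, 45],
    "gender": ["Female", "Male", "Male", "Female"],
    "alkphos": [187, 699, 195, 490],
    "is_patient": [1, 1, 2, 2],
})

df_sex = pd.get_dummies(df["gender"])
df_new = pd.concat([df, df_sex], axis=1).drop(labels=["gender"], axis=1)
print(list(df_new.columns))  # the dummy columns come last, after is_patient

# Name-based alternative: let get_dummies replace 'gender', then select by name
encoded = pd.get_dummies(df, columns=["gender"])
X = encoded.drop(columns="is_patient")
# Assumption: UCI coding 1 = liver patient, 2 = not; mapped to 0/1 for binary_crossentropy
y = (encoded["is_patient"] == 1).astype(int)
print(list(X.columns), y.tolist())
```

The 0/1 mapping assumes the UCI coding of the target; checking df['is_patient'].unique() on your own copy of the file would confirm it.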

Related

ValueError: 'logits' and 'labels' must have the same shape

I am working on my first neural network, and I'm stuck on one error. Here is the code:
import pandas as pd
from sklearn.model_selection import train_test_split
df = pd.read_csv('iris.csv')
X = pd.get_dummies(df.drop(['variety'], axis=1))
y = df['variety'].apply(lambda x: 0 if x=='Setosa' else (1 if x=='Versicolor' else 2))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2)
print(y_train.head())
from keras.models import Sequential, load_model
from keras.layers import Dense, Flatten
from sklearn.metrics import accuracy_score
model = Sequential()
model.add(Dense(units=8, activation='relu', input_dim=len(X_train.columns)))
model.add(Dense(units=3, activation='sigmoid'))
model.add(Flatten())
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=50, batch_size=1)
I am working from a TensorFlow tutorial and am using https://www.kaggle.com/datasets/arshid/iris-flower-dataset as the training dataset. I used the code from the tutorial but adapted it to my dataset. Still, I get the ValueError. Any help?
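The message says the loss received labels and predictions of different shapes: the final Dense(units=3) layer outputs three values per sample, while y holds a single integer class (0, 1 or 2) per sample. One common fix is to one-hot encode y so it has shape (n_samples, 3) and pair a 3-unit softmax output with categorical_crossentropy; another is to keep the integer labels and use sparse_categorical_crossentropy. The NumPy sketch below only illustrates the shape change, with made-up label values.

```python
import numpy as np

# Integer class labels as produced by the lambda in the question (hypothetical values)
y = np.array([0, 2, 1, 1, 0])
n_classes = 3

y_onehot = np.eye(n_classes)[y]  # row i is the one-hot vector for class y[i]

print(y.shape, y_onehot.shape)  # (5,) vs (5, 3): the latter matches a Dense(3) output
print(y_onehot[1])              # [0. 0. 1.]
```

With the pandas Series from the question you would pass y_train.to_numpy(); Keras also provides keras.utils.to_categorical for the same conversion.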

Not getting reproducible results TensorFlow-Keras-Google Colab

I've been trying to create a model that recognizes different singing techniques. I have gotten good results, but I want to run different tests with different optimizers, layers, etc. However, I can't get reproducible results. When I run this model training twice:
num_epochs = 100
batch_size = 128
history = modelo.fit(X_train_f, Y_train, validation_data=(X_test_f,Y_test), epochs=num_epochs, batch_size=batch_size, verbose=2)
I can get 25% accuracy on the first run and then 34% on the second. If I then change the optimizer from "sgd" to "adam", I get 99%. If I go back to the "sgd" optimizer that gave me 34% on the second run, I get 100% or something crazy like that. I don't understand why.
I've tried many things I've read in similar questions. The following lines show how I am trying to make my code reproducible; they are actually the first lines of my whole code:
import numpy as np
import tensorflow as tf
import random as rn
import os
#https://stackoverflow.com/questions/57305909/tensorflow-keras-reproducibility-problem-on-google-colab
os.environ['PYTHONHASHSEED']=str(5)
np.random.seed(5)
rn.seed(12345)
session_conf = tf.compat.v1.ConfigProto(intra_op_parallelism_threads=1,
inter_op_parallelism_threads=1)
tf.compat.v1.set_random_seed(1234)
sess = tf.compat.v1.Session(graph=tf.compat.v1.get_default_graph(), config=session_conf)
tf.compat.v1.keras.backend.set_session(sess)
My question is: what am I doing wrong in the code above, given that it doesn't make the results reproducible?
Here's where I create the training sets:
from keras.datasets import mnist
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers.convolutional import Conv1D, MaxPooling1D
from keras.layers.core import Dense, Flatten
from keras.layers import BatchNormalization,Activation
from keras.optimizers import SGD, Adam
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size=0.2, random_state=2)
My model:
from tensorflow.keras import layers
from tensorflow.keras import initializers
input_dim = X_train_f.shape[1]
output_dim = Y_train.shape[1]
modelo = Sequential()
modelo.add(Conv1D(filters=6, kernel_initializer=initializers.glorot_uniform(seed=5), kernel_size=5, activation='relu', input_shape=(40, 1))) # 6
modelo.add(MaxPooling1D(pool_size=2))
modelo.add(Conv1D(filters=16, kernel_initializer=initializers.glorot_uniform(seed=5), kernel_size=5, activation='relu')) # 16
modelo.add(MaxPooling1D(pool_size=2))
modelo.add(Flatten())
modelo.add(Dense(120, kernel_initializer=initializers.glorot_uniform(seed=5), activation='relu')) # 120
modelo.add(Dense(84, kernel_initializer=initializers.glorot_uniform(seed=5), activation='relu')) # 84
modelo.add(Dense(nclases, kernel_initializer=initializers.glorot_uniform(seed=5), activation='softmax'))
sgd = SGD(learning_rate=0.1)  # 'lr' is the deprecated name of this argument
#modelo.compile(loss='categorical_crossentropy',
# optimizer='adam',
# metrics=['accuracy'])
modelo.compile(loss='categorical_crossentropy',
optimizer=sgd,
metrics=['accuracy'])
modelo.summary()
modelo.input_shape
This is normal. Adam is often much more effective than plain SGD: it implicitly performs coordinate-wise gradient clipping and can therefore, unlike SGD, cope with heavy-tailed gradient noise.
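A rough way to see the "coordinate-wise clipping" effect: on its first step, Adam divides each gradient coordinate by an estimate of that coordinate's own magnitude, so every weight moves by about the learning rate no matter how large its gradient is, while the SGD step grows linearly with the gradient. The NumPy sketch below implements one bias-corrected update from the textbook Adam formulas (not Keras's internal code) next to one plain SGD update; the gradient values, including the single heavy-tailed outlier, are made up, and later Adam steps use running averages, so they are less extreme than this first one.

```python
import numpy as np

def sgd_step(grad, lr=0.01):
    return -lr * grad

def adam_first_step(grad, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    # Moment estimates start at zero; this is the update at t = 1
    m = (1 - beta1) * grad
    v = (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** 1)  # bias correction
    v_hat = v / (1 - beta2 ** 1)
    return -lr * m_hat / (np.sqrt(v_hat) + eps)

grad = np.array([0.01, -0.5, 2.0, 300.0])  # last coordinate: a heavy-tailed outlier

print(sgd_step(grad))         # the outlier produces a step of -3.0
print(adam_first_step(grad))  # every coordinate moves by roughly lr = 0.001
```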

Tensorflow model accuracy low

My main goal is to use data from 2018 to predict data for 2019. I'm using a GRU model with the following code. I have a few issues: I'm not sure whether the code is actually correct or whether I am missing something, and for model.fit I don't know whether to use validation_split=0.1 or validation_data=(X_test, y_test), since I'm using a separate dataframe for testing.
Regarding the accuracy, it is very small, doesn't make any sense, and I have no idea why.
import pandas as pd
import tensorflow as tf
from keras.layers.core import Dense
from keras.layers.recurrent import GRU
from keras.models import Sequential
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from tensorboardcolab import TensorBoardColab, TensorBoardColabCallback
df = pd.read_csv('IF 10 PERCENT.csv',index_col=None)
#Loading Second Dataframe
df2 = pd.read_csv('2019 10minutes IF 10 PERCENT.csv',index_col=None)
tbc=TensorBoardColab() # Tensorboard
X_train= df[['WindSpeed_mps','AmbTemp_DegC','RotorSpeed_rpm','RotorSpeedAve','NacelleOrientation_Deg','MeasuredYawError','Pitch_Deg','WindSpeed1','WindSpeed2','WindSpeed3','GeneratorTemperature_DegC','GearBoxTemperature_DegC']]
X_train=X_train.values
y_train= df['Power_kW']
y_train=y_train.values
X_test= df2[['WindSpeed_mps','AmbTemp_DegC','RotorSpeed_rpm','RotorSpeedAve','NacelleOrientation_Deg','MeasuredYawError','Pitch_Deg','WindSpeed1','WindSpeed2','WindSpeed3','GeneratorTemperature_DegC','GearBoxTemperature_DegC']]
X_test=X_test.values
y_test= df2['Power_kW']
y_test=y_test.values
# conversion to numpy array
# scaling values for model
x_scale = MinMaxScaler()
y_scale = MinMaxScaler()
X_train= x_scale.fit_transform(X_train)
y_train= y_scale.fit_transform(y_train.reshape(-1,1))
X_test = x_scale.transform(X_test)  # reuse the scalers fitted on the training data
y_test = y_scale.transform(y_test.reshape(-1,1))
X_train = X_train.reshape((-1,1,12))
X_test = X_test.reshape((-1,1,12))
# splitting train and test
# creating model using Keras
model = Sequential()
model.add(GRU(units=512, return_sequences=True, input_shape=(1,12)))
model.add(GRU(units=256, return_sequences=True))
model.add(GRU(units=256))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(loss=['mse'], optimizer='adam',metrics=['accuracy'])
model.summary()
#model.fit(X_train, y_train, batch_size=250, epochs=10, validation_split=0.1, verbose=1, callbacks=[TensorBoardColabCallback(tbc)])
model.fit(X_train, y_train, batch_size=250, epochs=10, validation_data=(X_test,y_test), verbose=1, callbacks=[TensorBoardColabCallback(tbc)])
score, acc = model.evaluate(X_test, y_test)  # returns [loss, accuracy]
print('Score: {}'.format(score))
print('Accuracy: {}'.format(acc))
y_predicted = model.predict(X_test)
y_predicted = y_scale.inverse_transform(y_predicted)
y_test = y_scale.inverse_transform(y_test)
plt.plot(y_predicted, label='Predicted')
plt.plot(y_test, label='Measurements')
plt.legend()
plt.show()
Thank you
It sounds to me like you are trying to solve a regression problem here. If so, it does not make sense to use accuracy as a metric, since accuracy measures exact label matches. MSE should work well for regression.
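A quick illustration with made-up numbers: predictions that are all within a few kW of the measurements still get 0% on an exact-match "accuracy", while MSE and MAE reflect how close they are. The values are hypothetical, and exact matching is used here as a simple stand-in for accuracy in the label-matching sense; Keras's own accuracy metric treats the values as class labels rather than comparing raw floats, which is equally meaningless for power readings.

```python
import numpy as np

# Hypothetical measured and predicted power values in kW
y_true = np.array([120.0, 340.5, 560.2, 800.0, 1010.7])
y_pred = np.array([118.3, 344.0, 555.9, 806.1, 1007.2])

exact_match = np.mean(y_true == y_pred)  # "accuracy" in the label-matching sense
mse = np.mean((y_true - y_pred) ** 2)
mae = np.mean(np.abs(y_true - y_pred))

print(exact_match)  # 0.0 even though the predictions are close
print(mse, mae)
```

In Keras this corresponds to compiling with metrics=['mae'] (or 'mse') instead of ['accuracy'].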

Keras - Moderate Accuracy, bad predictions

I'm taking my first steps into machine learning and trying to do a sign-language project using the Kaggle dataset. It is supposed to be able to predict characters in ASL. Here's the data presented by Kaggle.
Image of Dataset here.
My current issue is that I can achieve moderate accuracy on Kaggle's testing data, but if I try to predict a single image, say a random letter of the alphabet, the prediction is consistently wrong. Here's my code.
from keras.models import Sequential, load_model
from keras.preprocessing.image import load_img, img_to_array
from keras.layers import Dense, Dropout, Flatten, BatchNormalization, Activation
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.optimizers import SGD
import numpy as np
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, LabelBinarizer
import matplotlib.pyplot as plt
trainer = read_csv("sign_mnist_train.csv")
labels = trainer["label"].values
trainer = trainer.drop(["label"], axis=1) #
tester = read_csv("sign_mnist_test.csv")
testlabels = tester["label"].values
tester = tester.drop(["label"], axis=1)
def preProcessing(raw, classes):
    # One-hot encode the labels (LabelBinarizer would also work).
    # sparse_output is the scikit-learn >= 1.2 name; older versions use sparse=False.
    OH = OneHotEncoder(sparse_output=False)
    binary = classes.reshape(len(classes), 1)
    binary = OH.fit_transform(binary)
    images = raw.values
    # Note: reshaping to 28x28 and flattening again leaves each row unchanged.
    for c, i in enumerate(images, 0):
        image = np.reshape(i, (28, 28))
        image = image.flatten()
        images[c] = np.array(image)
    return images, binary
def defineModel():  # Builds the layers for our model
    model = Sequential()
    model.add(Conv2D(64, (3, 3), input_shape=(x_test.shape[1:]), activation='relu', padding='same'))
    model.add(Dropout(0.2))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(Dropout(0.2))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(y_train.shape[1], activation='softmax'))
    opt = SGD(learning_rate=0.001, momentum=0.9)  # 'lr' is the deprecated name
    model.compile(optimizer=opt, loss="categorical_crossentropy", metrics=["categorical_accuracy"])
    return model
def testModel():  # Tests a single image, predicting its class.
    model = load_model("my_model.hl5")
    img = load_img("C.jpg", color_mode="grayscale", target_size=(28, 28))
    img = img_to_array(img) / 255.0  # scale like the training data
    img = np.reshape(img, (-1, 28, 28, 1))
    # predict_classes/predict_proba were removed in newer TensorFlow versions
    probs = model.predict(img)[0]
    test = int(np.argmax(probs))
    print(test)
    test_test = "%.2f" % (probs[test] * 100)
    print(test_test)
if __name__ == "__main__":
    data, labels = preProcessing(trainer, labels)
    x_train, x_test, y_train, y_test = train_test_split(data, labels, test_size=0.33, random_state=42)
    x_train = x_train.astype('float32')
    x_train = x_train/255.0
    x_train = np.reshape(x_train, (x_train.shape[0], 28, 28, 1))
    x_test = x_test.astype('float32')
    x_test = x_test/255.0
    x_test = np.reshape(x_test, (x_test.shape[0], 28, 28, 1))
    model = defineModel()
    history = model.fit(x_train, y_train, validation_data = (x_test, y_test), epochs=40, verbose=1, batch_size=128)
    model.evaluate(x_test, y_test)
    model.save("my_model.hl5")
Apologies for the messy code, but essentially I break the data into usable parts using pandas, then use Keras/scikit-learn to fit the data. I wanted to look deeper and was advised to use accuracy_score from scikit-learn.
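from sklearn.metrics import accuracy_score
testStuff, testlabels = preProcessing(tester, testlabels)
# The training images were divided by 255, so the test images need the same scaling
testStuff = testStuff.reshape(testStuff.shape[0], 28, 28, 1).astype('float32') / 255.0
pred = model.predict(testStuff)
print(accuracy_score(testlabels.argmax(axis=1), pred.argmax(axis=1)))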
This showed that my accuracy was only around 70%, compared to the 99% that model.evaluate reported. Regardless, I still get very low accuracy on random predictions, even though some of my individual test images were snipped straight from the Kaggle example images. From there, I tried removing layers and increasing/reducing the number of filters in the Conv2D layers to see what happens, but nothing seems to make a difference. I used Pyplot to plot the training history and got this graph. I don't see a problematic trend, but I may be looking in the wrong place.
Is it because of overfitting/underfitting? I feel I am getting something wrong at a fundamental level and could use some tips. Similar questions point toward possible indexing issues or other mismanagement of the dataset, but I am unsure how to test whether these issues are present in my code. This is my first time asking a question on Stack Overflow, so feel free to ask anything; I understand that my rambling code and question may be confusing.
Summary: Okay accuracy, bad predictions, why?
In general this behaviour often occurs due to overfitting:
Try to tweak your network to have fewer parameters, and try adding some regularization.
Further, it could be that the Kaggle data, including its test set, only covers part of the real-world domain you plan to use the model in. That would mean your training set is far from reality, which could also lead to bad predictions.
One way to extend your dataset is data augmentation. I assume it could work very well on this ASL dataset, but I did not take a deep look.
Data augmentation is basically an artificial way to increase the size of your dataset. It also reduces overfitting and makes the model more robust to slight rotations of your hand and other "random" variations, like a different background or different clothing.
A great article about data augmentation can be found here:
https://towardsdatascience.com/data-augmentation-for-deep-learning-4fe21d1a4eb9
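As a rough illustration of what augmentation does, the NumPy sketch below creates randomly shifted copies of 28x28 grayscale images (the Sign MNIST format used in the question) by zero-padding and cropping at a random offset. It is a hand-rolled stand-in, not the article's or Keras's implementation; in practice you would more likely use Keras preprocessing layers such as RandomTranslation and RandomRotation. The function name random_shift and the max_shift value are assumptions for illustration.

```python
import numpy as np

def random_shift(images, max_shift, rng):
    """Shift each (h, w) image by up to max_shift pixels in x and y, filling with zeros."""
    n, h, w = images.shape
    pad = ((0, 0), (max_shift, max_shift), (max_shift, max_shift))
    padded = np.pad(images, pad)  # constant zero padding
    out = np.empty_like(images)
    for i in range(n):
        dy, dx = rng.integers(0, 2 * max_shift + 1, size=2)  # random crop offset
        out[i] = padded[i, dy:dy + h, dx:dx + w]
    return out

rng = np.random.default_rng(0)
x_batch = rng.random((8, 28, 28)).astype("float32")  # hypothetical normalized images
augmented = random_shift(x_batch, max_shift=3, rng=rng)

# Train on the originals plus the shifted copies (labels are duplicated unchanged)
x_bigger = np.concatenate([x_batch, augmented])
print(x_bigger.shape)  # (16, 28, 28)
```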

Regression result in keras using python

This is a regression problem. Below is my code
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score, KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import os
os.chdir(r'C:\Users\Swapnil\Desktop\RP TD\first\Changes')
## Load the dataset
dataset1 = pd.read_csv("Main Lane Plaza 1.csv")
X_train = dataset1.iloc[:,0:11].values
Y_train = dataset1.iloc[:,11].values
dataset2 = pd.read_csv("Main Lane Plaza 1_070416010117.csv")
X_test = dataset2.iloc[:,0:11].values
Y_test = dataset2.iloc[:,11].values
##Define base model
def base_model():
    model = Sequential()
    model.add(Dense(11, input_dim=11, kernel_initializer='normal',
                    activation='sigmoid'))
    model.add(Dense(7, kernel_initializer='normal', activation='sigmoid'))
    model.add(Dense(1, kernel_initializer='normal'))
    model.compile(loss='mean_squared_error', optimizer = 'adam')
    return model
seed = 7
np.random.seed(seed)
clf = KerasRegressor(build_fn=base_model, epochs=100,  # nb_epoch was the Keras 1 name
                     batch_size=5, verbose=0)
clf.fit(X_train, Y_train)
res = clf.predict(X_train)
##Result
clf.score(X_test, Y_test)
I'm not sure whether the score should be negative.
Kindly advise if I am doing something wrong.
Thanks in advance.
I can't figure it out. Could this be a problem with feature scaling? I did the feature scaling in R and saved the CSV files to use in Python.
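When you get a negative score for a regression problem, note that KerasRegressor.score returns the negated loss (here -MSE), so the score is always negative and closer to zero is better. A large negative value usually means that the model you chose can't fit your data well.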
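For comparison, here is a small sketch with made-up values contrasting the negated MSE, which is what the Keras wrapper's score returns with this loss, and R², which many scikit-learn regressors report from score. The negated MSE is negative for any imperfect fit, whereas R² only drops below zero when the model does worse than always predicting the mean of the targets.

```python
import numpy as np

def neg_mse(y_true, y_pred):
    # What KerasRegressor.score reports when the loss is mean squared error
    return -np.mean((y_true - y_pred) ** 2)

def r2(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([3.0, 5.0, 7.0, 9.0])
good = np.array([3.1, 4.9, 7.2, 8.8])  # close fit
bad = np.array([9.0, 7.0, 5.0, 3.0])   # worse than predicting the mean

print(neg_mse(y_true, good), r2(y_true, good))  # -0.025 and about 0.995
print(neg_mse(y_true, bad), r2(y_true, bad))    # -20.0 and -3.0
```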
Your first layer uses a sigmoid activation, your second layer also uses sigmoid, and the final layer has a single output.
Change the activations to relu. Sigmoid squashes values into the range 0 to 1 and has small gradients, which shrink further as they pass back through the 2 hidden layers, causing the vanishing gradient problem.
def base_model():
    model = Sequential()
    model.add(Dense(11, input_dim=11, kernel_initializer='normal', activation='relu'))
    model.add(Dense(7, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
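The vanishing-gradient argument can be checked numerically. The sigmoid's derivative, σ'(x) = σ(x)(1 − σ(x)), never exceeds 0.25, so backpropagating through two sigmoid layers scales the gradient by at most 0.25² = 0.0625 (ignoring the weights), whereas ReLU's derivative is exactly 1 for positive inputs. A small NumPy sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    return (x > 0).astype(float)  # 1 for positive inputs, 0 otherwise

x = np.linspace(-6, 6, 1201)
print(sigmoid_grad(x).max())       # 0.25, reached at x = 0
print(sigmoid_grad(x).max() ** 2)  # upper bound on the factor from two sigmoid layers
print(relu_grad(np.array([-1.0, 0.5, 3.0])))  # [0. 1. 1.]
```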
