Keras Binary Classifier Tutorial Example gives only 50% validation accuracy.
The near 50% accuracy can be gotten from an un-trained classifier itself for binary classification.
This example is straight from https://keras.io/getting-started/sequential-model-guide/
import numpy as np
import tensorflow as tf
from tensorflow_core.python.keras.models import Sequential
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
np.random.seed(10)
# Generate dummy data
x_train = np.random.random((1000, 20))
y_train = np.random.randint(2, size=(1000, 1))
x_test = np.random.random((800, 20))
y_test = np.random.randint(2, size=(800, 1))
model = Sequential()
model.add(Dense(64, input_dim=20, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
model.fit(x_train, y_train,
epochs=50,
batch_size=128,
validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, batch_size=128)
Accuracy output.
I tried with multiple trials.
Increased the number of hidden layers
Epoch 50/50 1000/1000 [==============================] - 0s
211us/sample - loss: 0.6905 - accuracy: 0.5410 - val_loss: 0.6959 -
val_accuracy: 0.4812
Could someone help me understand if anything is wrong here?
How to increase the accuracy for this "example" problem presented in the tutorial?
If you train a classifier with random examples, you will always get aprrox. 50% accuracy at validation data here represented by x_test. It is because your training samples get trained with random classes. Also the validation or test set has been assigned to random classes. This is why the random accuracy i.e. 50-50% occurs.
The more epoch you test the training set the more accuracy you will get on training set as an effect of overfitting.
Related
data = np.random.random((10000, 150))
labels = np.random.randint(10, size=(10000, 1))
labels = to_categorical(labels, num_classes=10)
model = Sequential()
model.add(Dense(units=32, activation='relu', input_shape=(150,)))
model.add(Dense(units=10, activation='softmax'))
model.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
model.fit(data, labels, epochs=30, validation_split=0.2)
I created 10000 random samples to train my net, but it use only few of them(250/10000)
Exaple of the 1st epoch:
Epoch 1/30
250/250 [==============================] - 0s 2ms/step - loss: 2.1110 - accuracy: 0.2389 - val_loss: 2.2142 - val_accuracy: 0.1800
Your data is split into training and validation subsets (validation_split=0.2).
Training subset has size 8000 and validation 2000.
Training goes in batches, each batch has size 32 samples by default.
So one epoch should take 8000/32=250 batches, as it shows in the progress.
Try code like following example
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
# Generate dummy data
import numpy as np
data = np.random.random((1000, 100))
labels = np.random.randint(10, size=(1000, 1))
# Convert labels to categorical one-hot encoding
one_hot_labels = keras.utils.to_categorical(labels, num_classes=10)
# Train the model, iterating on the data in batches of 32 samples
model.fit(data, one_hot_labels, epochs=10, batch_size=32)
I'm testing simple networks on Keras with TensorFlow backend and I ran into an issue with using sigmoid activation function
The network isn't learning for first 5-10 epochs, and then everything is fine.
I tried using initializers and regularizers, but that only made it worse.
I use the network like this:
import numpy as np
import keras
from numpy import expand_dims
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot
# load the image
(x_train, y_train), (x_val, y_val), (x_test, y_test) = netowork2_ker.load_data_shared()
# expand dimension to one sample
x_train = expand_dims(x_train, 2)
x_train = np.reshape(x_train, (50000, 28, 28))
x_train = expand_dims(x_train, 3)
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)
datagen = ImageDataGenerator(
rescale=1./255,
width_shift_range=[-1, 0, 1],
height_shift_range=[-1, 0, 1],
rotation_range=10)
epochs = 20
batch_size = 50
num_classes = 10
model = keras.Sequential()
model.add(keras.layers.Conv2D(64, (3, 3), padding='same',
input_shape=x_train.shape[1:],
activation='sigmoid'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Conv2D(100, (3, 3),
activation='sigmoid'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(100,
activation='sigmoid'))
#model.add(keras.layers.Dropout(0.5))
model.add(keras.layers.Dense(num_classes,
activation='softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
steps_per_epoch=len(x_train) / batch_size, epochs=epochs,
verbose=2, shuffle=True)
With the code above I get results like these:
Epoch 1/20
- 55s - loss: 2.3098 - accuracy: 0.1036
Epoch 2/20
- 56s - loss: 2.3064 - accuracy: 0.1038
Epoch 3/20
- 56s - loss: 2.3068 - accuracy: 0.1025
Epoch 4/20
- 56s - loss: 2.3060 - accuracy: 0.1079
...
For 7 epochs (different every time) and then the loss rapidly goes downward and i achieve 0.9623 accuracy in 20 epochs.
But if I change activation from sigmoid to relu it works great and gives me 0.5356 accuracy in the first epoch.
This issue makes sigmoid almost unusable for me and I'd like to know, I can do something about it. Is this a bug or am I doing something wrong?
Activation function suggestion:
In practice, the sigmoid non-linearity has recently fallen out of favor and it is rarely ever used. ReLU is the most common choice, if there are a large fraction of “dead” units in network, try Leaky ReLU and tanh. Never use sigmoid.
Reasons for not using the sigmoid:
A very undesirable property of the sigmoid neuron is that when the neuron’s activation saturates at either tail of 0 or 1, the gradient at these regions is almost zero. In addition, Sigmoid outputs are not zero-centered.
In order to analyze data, I need loss for each of output dimensions, instead I get only one loss which i suspect is a mean of the losses for all output dimensions.
Any help to understand what is the loss I get and how to get separate loss for each output:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from scipy import stats
from keras import models
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import optimizers
from sklearn.model_selection import KFold
siz=100000
inp0=np.random.randint(100, 1000000 , size=(siz,3))
rand0=np.random.randint(-100, 100 , size=(siz,2))
a1=0.2;a2=0.8;a3=2.5;a4=2.6;a5=1.2;a6=0.3
oup1=np.dot(inp0[:,0],a1)+np.dot(inp0[:,1],a2)+np.dot(inp0[:,2],a3)\
+rand0[:,0]
oup2=np.dot(inp0[:,0],a4)+np.dot(inp0[:,1],a5)+np.dot(inp0[:,2],a6)\
+rand0[:,1]
oup_tot=np.concatenate((oup1.reshape(siz,1), oup2.reshape(siz,1)),\
axis=1)
normzer_inp = MinMaxScaler()
inp_norm = normzer_inp.fit_transform(inp0)
normzer_oup = MinMaxScaler()
oup_norm = normzer_oup.fit_transform(oup_tot)
X=inp_norm
Y=oup_norm
kfold = KFold(n_splits=2, random_state=None, shuffle=False)
opti_SGD = SGD(lr=0.01, momentum=0.9)
model1 = Sequential()
for train, test in kfold.split(X, Y):
model = Sequential()
model.add(Dense(64, input_dim=X.shape[1], activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(Y.shape[1], activation='linear'))
model.compile(loss='mean_squared_error', optimizer=opti_SGD)
history = model.fit(X[train], Y[train], \
validation_data=(X[test], Y[test]), \
epochs=100,batch_size=2048, verbose=2)
I get:
Epoch 1/100
- 0s - loss: 0.0864 - val_loss: 0.0248
Epoch 2/100
- 0s - loss: 0.0218 - val_loss: 0.0160
Epoch 3/100
- 0s - loss: 0.0125 - val_loss: 0.0091
I would like to know what is the loss i got now and how to get losses for each output dimension.
Pass a list of functions to the metrics argument in the compile function. See here: https://keras.io/metrics/#custom-metrics
import keras.backend as K
...
def loss_first_dim(y_true, y_pred):
return K.mean(K.square(y_pred[:, 0] - y_true[:, 0]))
model.compile(optimizer='rmsprop',
loss='binary_crossentropy',
metrics=[loss_first_dim])
...
I am trying to train a regression model of a dummy function with 3 variables with fully connected neural nets in Keras and I always get a training loss much higher than the validation loss.
I split the data set in 2/3 for training and 1/3 for validation. I have tried lots of different things:
changing the architecture
adding more neurons
using regularization
using different batch sizes
Still the training error is one order of magnitude higer than the validation error:
Epoch 5995/6000
4020/4020 [==============================] - 0s 78us/step - loss: 1.2446e-04 - mean_squared_error: 1.2446e-04 - val_loss: 1.3953e-05 - val_mean_squared_error: 1.3953e-05
Epoch 5996/6000
4020/4020 [==============================] - 0s 98us/step - loss: 1.2549e-04 - mean_squared_error: 1.2549e-04 - val_loss: 1.5730e-05 - val_mean_squared_error: 1.5730e-05
Epoch 5997/6000
4020/4020 [==============================] - 0s 105us/step - loss: 1.2500e-04 - mean_squared_error: 1.2500e-04 - val_loss: 1.4372e-05 - val_mean_squared_error: 1.4372e-05
Epoch 5998/6000
4020/4020 [==============================] - 0s 96us/step - loss: 1.2500e-04 - mean_squared_error: 1.2500e-04 - val_loss: 1.4151e-05 - val_mean_squared_error: 1.4151e-05
Epoch 5999/6000
4020/4020 [==============================] - 0s 80us/step - loss: 1.2487e-04 - mean_squared_error: 1.2487e-04 - val_loss: 1.4342e-05 - val_mean_squared_error: 1.4342e-05
Epoch 6000/6000
4020/4020 [==============================] - 0s 79us/step - loss: 1.2494e-04 - mean_squared_error: 1.2494e-04 - val_loss: 1.4769e-05 - val_mean_squared_error: 1.4769e-05
This makes no sense, please help!
Edit: this is the full code
I have 6000 training examples
# -*- coding: utf-8 -*-
"""
Created on Mon Feb 26 13:40:03 2018
#author: Michele
"""
#from keras.datasets import reuters
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
from keras import optimizers
import matplotlib.pyplot as plt
import os
import pylab
from keras.constraints import maxnorm
from sklearn.model_selection import train_test_split
from keras import regularizers
from sklearn.preprocessing import MinMaxScaler
import math
from sklearn.metrics import mean_squared_error
import keras
# fix random seed for reproducibility
seed=7
np.random.seed(seed)
dataset = np.loadtxt("BabbaX.csv", delimiter=",")
#split into input (X) and output (Y) variables
#x = dataset.transpose()[:,10:15] #only use power
x = dataset
del(dataset) # delete container
dataset = np.loadtxt("BabbaY.csv", delimiter=",")
#split into input (X) and output (Y) variables
y = dataset.transpose()
del(dataset) # delete container
#scale labels from 0 to 1
scaler = MinMaxScaler(feature_range=(0, 1))
y = np.reshape(y, (y.shape[0],1))
y = scaler.fit_transform(y)
lenData=x.shape[0]
x=np.transpose(x)
xtrain=x[:,0:round(lenData*0.67)]
ytrain=y[0:round(lenData*0.67),]
xtest=x[:,round(lenData*0.67):round(lenData*1.0)]
ytest=y[round(lenData*0.67):round(lenData*1.0)]
xtrain=np.transpose(xtrain)
xtest=np.transpose(xtest)
l2_lambda = 0.1 #reg factor
#sequential type of model
model = Sequential()
#stacking layers with .add
units=300
#model.add(Dense(units, input_dim=xtest.shape[1], activation='relu', kernel_initializer='normal', kernel_regularizer=regularizers.l2(l2_lambda), kernel_constraint=maxnorm(3)))
model.add(Dense(units, activation='relu', input_dim=xtest.shape[1]))
#model.add(Dropout(0.1))
model.add(Dense(units, activation='relu'))
#model.add(Dropout(0.1))
model.add(Dense(1)) #no activation function should be used for the output layer
rms = optimizers.RMSprop(lr=0.00001, rho=0.9, epsilon=None, decay=0) #It is recommended to leave the parameters
adam = keras.optimizers.Adam(lr=0.00001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=1e-6, amsgrad=False)
#of this optimizer at their default values (except the learning rate, which can be freely tuned).
#adam=keras.optimizers.Adam(lr=0.01, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
#configure learning process with .compile
model.compile(optimizer=adam, loss='mean_squared_error', metrics=['mse'])
# fit the model (iterate on the training data in batches)
history = model.fit(xtrain, ytrain, nb_epoch=1000, batch_size=round(xtest.shape[0]/100),
validation_data=(xtest, ytest), shuffle=True, verbose=2)
#extract weights for each layer
weights = [layer.get_weights() for layer in model.layers]
#evaluate on training data set
valuesTrain=model.predict(xtrain)
#evaluate on test data set
valuesTest=model.predict(xtest)
#invert predictions
valuesTrain = scaler.inverse_transform(valuesTrain)
ytrain = scaler.inverse_transform(ytrain)
valuesTest = scaler.inverse_transform(valuesTest)
ytest = scaler.inverse_transform(ytest)
TL;DR:
When a model is learning well and quickly the validation loss can be lower than the training loss, since the validation happens on the updated model, while the training loss did not have any (no batches) or only some (with batches) of the updates applied.
Okay I think I found out what's happening here. I used the following code to test this.
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense
import matplotlib.pyplot as plt
np.random.seed(7)
N_DATA = 6000
x = np.random.uniform(-10, 10, (3, N_DATA))
y = x[0] + x[1]**2 + x[2]**3
xtrain = x[:, 0:round(N_DATA*0.67)]
ytrain = y[0:round(N_DATA*0.67)]
xtest = x[:, round(N_DATA*0.67):N_DATA]
ytest = y[round(N_DATA*0.67):N_DATA]
xtrain = np.transpose(xtrain)
xtest = np.transpose(xtest)
model = Sequential()
model.add(Dense(10, activation='relu', input_dim=3))
model.add(Dense(5, activation='relu'))
model.add(Dense(1))
adam = keras.optimizers.Adam()
# configure learning process with .compile
model.compile(optimizer=adam, loss='mean_squared_error', metrics=['mse'])
# fit the model (iterate on the training data in batches)
history = model.fit(xtrain, ytrain, nb_epoch=50,
batch_size=round(N_DATA/100),
validation_data=(xtest, ytest), shuffle=False, verbose=2)
plt.plot(history.history['mean_squared_error'])
plt.plot(history.history['val_loss'])
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
This is essentially the same as your code and replicates the problem, which is not actually a problem. Simply change
history = model.fit(xtrain, ytrain, nb_epoch=50,
batch_size=round(N_DATA/100),
validation_data=(xtest, ytest), shuffle=False, verbose=2)
to
history = model.fit(xtrain, ytrain, nb_epoch=50,
batch_size=round(N_DATA/100),
validation_data=(xtrain, ytrain), shuffle=False, verbose=2)
So instead of validating with your validation data you validate using the training data again, which leads to exactly the same behavior. Weird isn't it? No actually not. What I think is happening is:
The initial mean_squared_error given by Keras on every epoch is the loss before the gradients have been applied, while the validation happens after the gradients have been applied, which makes sense.
With highly stochastic problems for which NNs are usually used you do not see that, because the data varies so much that the updated weights simply are not good enough to describe the validation data, the slight overfitting effect on the training data is still so much stronger that even after updating the weights the validation loss is still higher than the training loss from before. That is only how I think it is though, I might be completely wrong.
One of the reasons that I think is maybe you can increase the size of training data and lower the size of validation data. Then your model will be trained on more samples which may include some complex samples as well and then can be validated on the remaining samples. Try something like train-80% and Validation-20% or any other numbers a little higher than what you used previously.
If you don't want to change the size of training and validation sets, then you can try changing the random seed value to some other number so that you will get a training set with different samples which might be helpful in training the model well.
Check this answer here to get more understanding of the other possible reasons.
Check this link if you want a more detailed explanation with an example. #Michele
If training loss is a little higher or nearer to validation loss, it mean that model is not overfitting.
Efforts are always there to use best out of features to have less overfitting and better validation and test accuracies.
Probable reason that you are always getting train loss higher can be the features and data you are using to train.
Please refer following link and observe the training and validation loss in case of dropout:
http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/
I'm currently using a Naive Bayes algorithm to do my text classification.
My end goal is to be able to highlight parts of a big text document if the algorithm has decided the sentence belonged to a category.
Naive Bayes results are good, but I would like to train a NN for this problem, so I've followed this tutorial:
http://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/ to build my LSTM network on Keras.
All these notions are quite difficult for me to understand right now, so excuse me if you see some really stupid things in my code.
1/ Preparation of the training data
I have 155 sentences of different sizes that have been tagged to a label.
All these tagged sentences are in a training.csv file:
8,9,1,2,3,4,5,6,7
16,15,4,6,10,11,12,13,14
17,18
22,19,20,21
24,20,21,23
(each integer representing a word)
And all the results are in another label.csv file:
6,7,17,15,16,18,4,27,30,30,29,14,16,20,21 ...
I have 155 lines in trainings.csv, and of course 155 integers in label.csv
My dictionnary has 1038 words.
2/ The code
Here is my current code:
total_words = 1039
## fix random seed for reproducibility
numpy.random.seed(7)
datafile = open('training.csv', 'r')
datareader = csv.reader(datafile)
data = []
for row in datareader:
data.append(row)
X = data;
Y = numpy.genfromtxt("labels.csv", dtype="int", delimiter=",")
max_sentence_length = 500
X_train = sequence.pad_sequences(X, maxlen=max_sentence_length)
X_test = sequence.pad_sequences(X, maxlen=max_sentence_length)
# create the model
embedding_vecor_length = 32
model = Sequential()
model.add(Embedding(total_words, embedding_vecor_length, input_length=max_sentence_length))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(X_train, Y, epochs=3, batch_size=64)
# Final evaluation of the model
scores = model.evaluate(X_train, Y, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))
This model is never converging:
155/155 [==============================] - 4s - loss: 0.5694 - acc: 0.0000e+00
Epoch 2/3
155/155 [==============================] - 3s - loss: -0.2561 - acc: 0.0000e+00
Epoch 3/3
155/155 [==============================] - 3s - loss: -1.7268 - acc: 0.0000e+00
I would like to have one of the 24 labels as a result, or a list of probabilities for each label.
What am I doing wrong here?
Thanks for your help!
I've updated my code thanks to the great comments posted to my question.
Y_train = numpy.genfromtxt("labels.csv", dtype="int", delimiter=",")
Y_test = numpy.genfromtxt("labels_test.csv", dtype="int", delimiter=",")
Y_train = np_utils.to_categorical(Y_train)
Y_test = np_utils.to_categorical(Y_test)
max_review_length = 50
X_train = sequence.pad_sequences(X_train, maxlen=max_review_length)
X_test = sequence.pad_sequences(X_test, maxlen=max_review_length)
model = Sequential()
model.add(Embedding(top_words, 32, input_length=max_review_length))
model.add(LSTM(10, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(31, activation="softmax"))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=["accuracy"])
model.fit(X_train, Y_train, epochs=100, batch_size=30)
I think I can play with LSTM size (10 or 100), number of epochs and batch size.
Model has a very poor accuracy (40%). But currently I think it's because I don't have enough data (150 sentences for 24 labels).
I will put this project in standby mode until I get more data.
If someone has some ideas to improve this code, feel free to comment!