I'm trying to perform a regression using Keras in Python. During training, the loss, mean_squared_error, and val_mean_squared_error are all NaN. How can I fix this?
I've checked thoroughly, and there are no missing values (NaNs) or infinite values in my data. I've also tried gradient value clipping and gradient norm scaling, but none of them worked.
Below is my neural net model.
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping

model = Sequential()
n_cols = train_X.shape[1]
early_stopping_monitor = EarlyStopping(patience=10)
model.add(Dense(400, activation='relu', input_shape=(n_cols,)))
model.add(Dense(400, activation='relu'))
model.add(Dense(400, activation='relu'))
model.add(Dense(400, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mean_squared_error'])
model.fit(train_X, train_y, validation_split=0.2, epochs=500, callbacks=[early_stopping_monitor])
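For context, the clipping attempts mentioned above were configured through the optimizer arguments, roughly along these lines (the threshold values below are illustrative, not the exact ones I used):

from keras.optimizers import Adam

# gradient norm scaling: rescale each gradient tensor whose L2 norm exceeds the threshold
opt = Adam(clipnorm=1.0)
# gradient value clipping: cap every gradient element at +/- the threshold
# opt = Adam(clipvalue=0.5)
model.compile(optimizer=opt, loss='mean_squared_error', metrics=['mean_squared_error'])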
Related
I am running a plain Sequential API MLP and I am getting a huge loss in my output. I am not sure what is wrong or how to fix it. I am using Batch Normalization layers, as I read they would help, and they do, but my output is still ridiculous. Any other advice on the architecture is more than welcome.
The RMSE loss is about 56000.
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization
from tensorflow.keras.callbacks import ModelCheckpoint
model = Sequential()
model.add(Dense(3, input_dim=X_train_enc.shape[1], activation='relu', kernel_initializer='he_normal'))
model.add(BatchNormalization())
model.add(Dense(10, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(20, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(30, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(20, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(10, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(3, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(1, activation = 'linear'))
# # summarize layers
print(model.summary())
from tensorflow.keras import backend as K
from tensorflow.keras.losses import mean_squared_error

#Creating/Defining our own metrics
def mean_absolute_percentage_error(y_true, y_pred):
    diff = K.abs((y_true - y_pred) / K.clip(K.abs(y_true), K.epsilon(), None))
    return 100. * K.mean(diff, axis=-1)

tf.keras.backend.set_epsilon(1)  # https://stackoverflow.com/questions/49729522/why-is-the-mean-average-percentage-errormape-extremely-high

def rmse(y_true, y_pred):
    return K.sqrt(mean_squared_error(y_true, y_pred))

#def rmse(y_true, y_pred):
#    return K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1))
Adamax = tf.keras.optimizers.Adamax(lr=0.02, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
#Compiling the model
# Metric functions are similar to loss functions, except that the results from
# evaluating a metric are not used when training the model.
model.compile(optimizer='adam', loss=rmse, metrics=['accuracy', rmse, 'mae', 'mape'])
# # Creating a checkpoint
filepath="weights-improvement-{epoch:02d}-{val_rmse:.9f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_rmse', verbose=1, save_best_only=True, mode='min')
# # monitor = "val_loss", save_best_only=False
callbacks_list = [checkpoint]
history = model.fit(X_train_enc, y_train, epochs=200, batch_size=16, validation_split=0.2681867, callbacks=callbacks_list, verbose=1)
I am trying to train a NN for regression. When using the SGD optimizer class from Keras, I suddenly get NaN values as predictions from my network after the first step. Before, I was running trainings with the Adam optimizer class and everything worked fine. I already tried changing the learning rate of SGD, but NaN values still occur as model predictions after the first step and after compiling.
Since my training worked with the Adam optimizer, I don't believe my inputs are causing the NaNs. I already checked my input values for NaNs and removed all of them. So what could cause this behavior?
Here is my code:
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import Adam, SGD
model = Sequential()
model.add(Dense(300,input_shape=(50,), kernel_initializer='glorot_uniform', activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(300, kernel_initializer='glorot_uniform', activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(500, kernel_initializer='glorot_uniform', activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(400, kernel_initializer='glorot_uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='glorot_uniform', activation='linear'))
opt = SGD(lr=0.001, decay=1e-6)
model.compile(loss='mse', optimizer=opt)
model.fit(x_train, y_train, epochs=100, batch_size=32, verbose=0, validation_data=(x_test, y_test))
#print(type(x_train))   # <class 'pandas.core.frame.DataFrame'>
#print(x_train.shape)   # (10000, 50)
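For completeness, the NaN/inf check on the inputs mentioned above was roughly along these lines (a sketch; the exact check may have differed slightly):

import numpy as np

# sanity check: no missing or infinite values in the inputs or targets
assert not x_train.isnull().values.any()
assert np.isfinite(x_train.values).all()
assert np.isfinite(np.asarray(y_train)).all()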
Using ANNs for regression is a bit tricky as outputs don't have an upper bound.
The NaNs in the loss function are most likely caused by exploding gradients.
The reason it does not show NaNs when you use Adam is that Adam adapts the learning rate. Adam works most of the time, so avoid using SGD unless you have a specific reason.
I am not sure what your dataset contains, but you can try the following (a rough sketch combining these appears after the list):
Adding L2 regularization
Normalizing inputs
Increasing batch size.
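For instance, the first two suggestions might look roughly like this (a minimal sketch reusing the shapes from your code; the regularization strength, scaler choice, and batch size are assumptions, not tuned values):

from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.regularizers import l2
from keras.optimizers import SGD

# normalize inputs so SGD sees zero-mean, unit-variance features
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

# add L2 weight penalties to the hidden layers (0.01 is an arbitrary starting point)
model = Sequential()
model.add(Dense(300, input_shape=(50,), kernel_initializer='glorot_uniform',
                activation='relu', kernel_regularizer=l2(0.01)))
model.add(Dropout(0.3))
model.add(Dense(300, kernel_initializer='glorot_uniform',
                activation='relu', kernel_regularizer=l2(0.01)))
model.add(Dense(1, kernel_initializer='glorot_uniform', activation='linear'))

model.compile(loss='mse', optimizer=SGD(lr=0.001, decay=1e-6))
# larger batch size than the original 32
model.fit(x_train_scaled, y_train, epochs=100, batch_size=128,
          validation_data=(x_test_scaled, y_test), verbose=0)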
I am trying to use a CNN for classification. My training data, shown in the picture below, consists of 9923 samples, each containing 1000 numeric values.
My current model only reaches around 10 percent accuracy, and I am wondering if anyone can tell whether I am doing something wrong.
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

model = Sequential()
model.add(Conv1D(64,3, activation ='relu', input_shape= (1000, 1)))
model.add(MaxPooling1D(2))
model.add(Conv1D(64,3, activation ='relu'))
model.add(MaxPooling1D(pool_size=(2)))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(28, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, Y, epochs = 30, validation_split = 0.1)
I have recently written and run this code to train a CNN with Theano and Keras:
#Building the model
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()
model.add(Convolution2D(32, 3, 3, activation='relu', input_shape=(1,8,182)))
model.add(Convolution2D(32, 3, 3, activation='relu'))
model.add(Convolution2D(32, 3, 3, activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))
#Compiling the CNN
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['binary_accuracy'])
#Fitting data and training the model
model.fit(X_train, y_train, batch_size=32, nb_epoch=100, verbose=1)
#Saving weights
model.save_weights('trained_cnn.h5', overwrite=True)
I tested it on my CPU, and each epoch took about 3 minutes. A sample output for the first epoch is this:
Epoch 1/100
72000/72000 [==============================] - 204s - loss: 0.6935 - binary_accuracy: 0.5059
Now, I have migrated to an Nvidia Titan X GPU. I have also been forced to move to Keras 2, and have updated my code as follows, implementing the necessary changes:
#Building the model
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()
model.add(Conv2D(32, 3, activation='relu', input_shape=(1,8,182)))
model.add(Conv2D(32, 3, activation='relu'))
model.add(Conv2D(32, 3, activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))
#Compiling the CNN
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['binary_accuracy'])
#Fitting data and training the model
model.fit(X_train, y_train, batch_size=32,epochs=100, verbose=2)
#Saving weights
model.save_weights('trained_cnn_b_2.h5', overwrite=True)
Now whenever I run on my GPU, the program just gets stuck saying
epoch 1/100
and nothing happens after this, even when I wait for more than 10 minutes.
Why is this happening and how can I fix it? I haven't changed any of my code besides the Keras function calls. No errors are thrown. Where am I going wrong? Is there something wrong with the verbose argument that is stopping the program from executing?
Edit 1: I have left my setup running overnight, but there is still no progress past that line.
Edit 2: I am using CUDA 7.5.17
Edit 3: This program from here works perfectly fine. It completes execution in less than 10 seconds, as expected.
# Create your first MLP in Keras
from keras.models import Sequential
from keras.layers import Dense
import numpy
# fix random seed for reproducibility
numpy.random.seed(7)
# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X, Y, epochs=150, batch_size=10)
# evaluate the model
scores = model.evaluate(X, Y)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
NOTE: I have verified that my GPU is working completely fine.
EDIT: I migrated to TensorFlow, and it had no problems with the CUDA version being below 9. With TensorFlow, the program executes perfectly.
I have created a neural network with Keras for predicting addition.
I have 2 inputs and 1 output (the result of adding the 2 inputs).
I trained my neural network with TensorFlow and then tried to predict additions, but the program returns values of 0 or 1, not 3, 4, 5, etc.
This is my code:
from keras.models import Sequential
from keras.layers import Dense
import numpy
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataset = numpy.loadtxt("data.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:2]
Y = dataset[:,2]
# create model
model = Sequential()
model.add(Dense(12, input_dim=2, init='uniform', activation='relu'))
model.add(Dense(2, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X, Y, epochs=150, batch_size=10, verbose=2)
# calculate predictions
predictions = model.predict(X)
# round predictions
rounded = [round(x[0]) for x in predictions]
print(rounded)
And my file data.csv:
1,2,3
3,3,6
4,5,9
10,8,18
1,3,4
5,3,8
For example:
1+2=3
3+3=6
4+5=9
...etc.
But I get this as output: 0, 1, 0, 0, 1, 0, 1, ...
Why didn't I get the output as 3, 6, 9, ...?
I updated the code to use another loss function, but I still have the same problem:
from keras.models import Sequential
from keras.layers import Dense
import numpy
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load pima indians dataset
dataset = numpy.loadtxt("data.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:2]
Y = dataset[:,2]
# create model
model = Sequential()
model.add(Dense(12, input_dim=2, init='uniform', activation='relu'))
model.add(Dense(2, init='uniform', activation='relu'))
#model.add(Dense(1, init='uniform', activation='sigmoid'))
model.add(Dense(1, input_dim=2, init='uniform', activation='linear'))
# Compile model
#model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X, Y, epochs=150, batch_size=10, verbose=2)
# calculate predictions
predictions = model.predict(X)
# round predictions
rounded = [round(x[0]) for x in predictions]
print(rounded)
The output is 1, 1, 1, 3, 1, 1, ... etc.
As @ebeneditos mentioned, you need to change the activation function in your last layer to something other than sigmoid. You can try changing it to linear:
model.add(Dense(1, init='uniform', activation='linear'))
You should also change your loss function to something like mean squared error, as your problem is more of a regression problem than a classification problem (binary_crossentropy is used as a loss function for binary classification problems):
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
This is due to the sigmoid function you have in the last layer. As it is defined,
sigmoid(x) = 1 / (1 + e^(-x)),
it can only take values between 0 and 1. You should change the last layer's activation function.
You can try this instead (with Dense(8) instead of Dense(2)):
# Create model
model = Sequential()
model.add(Dense(12, input_dim=2, init='uniform', activation='relu'))
model.add(Dense(8, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='linear'))
# Compile model
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X, Y, epochs=150, batch_size=10, verbose=2)