I'm trying to fit a Keras network, but in every epoch the loss is 'nan' and the accuracy doesn't change. I tried changing the number of epochs, the layer count, the neuron count, the learning rate, and the optimizer; I checked the datasets for NaN values and normalized the data in different ways, but the problem was not solved. Thanks for your help.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers

np.random.seed(1337)

# example of input vector: [-1.459746, 0.2694708, ... 0.90043]
# example of output vector: [1, 0] or [0, 1]

model = Sequential()
model.add(Dense(1000, activation='tanh', init='normal', input_dim=503))
model.add(Dense(2, init='normal', activation='softmax'))

opt = optimizers.SGD(lr=0.01)
model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=['accuracy'])

print(model.summary())
model.fit(x_train, y_train, batch_size=1000, nb_epoch=100, verbose=1)
Epoch 1/100
99804/99804 [==============================] - 5s 52us/step - loss: nan - acc: 0.4938
Epoch 2/100
99804/99804 [==============================] - 5s 49us/step - loss: nan - acc: 0.4938
Epoch 3/100
99804/99804 [==============================] - 5s 51us/step - loss: nan - acc: 0.4938
Epoch 4/100
99804/99804 [==============================] - 5s 52us/step - loss: nan - acc: 0.4938
Epoch 5/100
99804/99804 [==============================] - 5s 52us/step - loss: nan - acc: 0.4938
Epoch 6/100
99804/99804 [==============================] - 5s 51us/step - loss: nan - acc: 0.4938
...
Oh, the problem has been found! After normalization, a NaN value appeared in the input vectors.
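A quick way to confirm and clean this up (a minimal sketch, assuming x_train is the NumPy array passed to fit) is to scan the normalized inputs before training:

import numpy as np

# Count any non-finite values introduced by the normalization step
print(np.isnan(x_train).sum(), np.isinf(x_train).sum())

# One option: replace non-finite entries before training
x_train = np.nan_to_num(x_train)  # NaN -> 0.0, +/-inf -> large finite values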
First, convert your output to categorical, as described in the Keras documentation:
Note: when using the categorical_crossentropy loss, your targets should be in categorical format. In order to convert integer targets into categorical targets, you can use the Keras utility to_categorical:
from keras.utils import to_categorical
categorical_labels = to_categorical(int_labels, num_classes=None)
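For example, with the 0/1 integer labels that correspond to the [1, 0] / [0, 1] target vectors above (made-up values, purely to illustrate the conversion):

from keras.utils import to_categorical
import numpy as np

int_labels = np.array([0, 1, 1, 0])
print(to_categorical(int_labels, num_classes=2))
# [[1. 0.]
#  [0. 1.]
#  [0. 1.]
#  [1. 0.]]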
Related
I have a time-series imbalanced dataset on which I have to perform binary classification. I cannot split the training and test sets randomly, or even stratify them. The issue is that while training the model, the validation accuracy is always 1. I realize this has something to do with the train-test split, but I may be wrong.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=None, shuffle=False)
from collections import Counter
print(Counter(y))
print(Counter(y_train))
print(Counter(y_test))
Counter({0.0: 55534, 1.0: 10000})
Counter({0.0: 9995, 1.0: 3111})
Counter({0.0: 45539, 1.0: 6889})
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
#First Hidden Layer
model.add(Dense(128, activation='relu', kernel_initializer='random_normal', input_dim=19))
#model.add(Dropout(0.3))
#Second Hidden Layer
model.add(Dense(64, activation='relu', kernel_initializer='random_normal'))
#Output Layer
model.add(Dense(1, activation='sigmoid', kernel_initializer='random_normal'))
history = model.fit(X_train,y_train, batch_size=128, validation_split=0.1, epochs=50)
Train on 11795 samples, validate on 1311 samples
Epoch 1/50
11795/11795 [==============================] - 0s 34us/step - loss: 1.1359 - accuracy: 0.8719 - val_loss: 4.2016e-18 - val_accuracy: 1.0000
Epoch 2/50
11795/11795 [==============================] - 0s 12us/step - loss: 0.1247 - accuracy: 0.9442 - val_loss: 1.0255e-19 - val_accuracy: 1.0000
Epoch 3/50
11795/11795 [==============================] - 0s 13us/step - loss: 0.1177 - accuracy: 0.9462 - val_loss: 3.2516e-23 - val_accuracy: 1.0000
Epoch 4/50
11795/11795 [==============================] - 0s 12us/step - loss: 0.1103 - accuracy: 0.9519 - val_loss: 1.1607e-23 - val_accuracy: 1.0000
Epoch 5/50
11795/11795 [==============================] - 0s 13us/step - loss: 0.0805 - accuracy: 0.9739 - val_loss: 6.2345e-26 - val_accuracy: 1.0000
Appreciate any help on this problem. Thanks!
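One thing worth checking, since the post already suspects the split: with shuffle=False, Keras's validation_split=0.1 holds out the last 10% of X_train/y_train (it always takes the tail of the arrays, before any shuffling), so on chronologically ordered, imbalanced data that validation slice can easily be dominated by a single class. A small diagnostic sketch, assuming y_train is the same array passed to fit:

from collections import Counter
import numpy as np

val_fraction = 0.1
n_val = len(y_train) - int(len(y_train) * (1 - val_fraction))

# Roughly the samples Keras will hold out with validation_split=0.1
print(Counter(np.asarray(y_train)[-n_val:]))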
I have an LSTM model that predicts tomorrow's water outflow volume based on today's outflow volume, temperature, and precipitation.
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

model = Sequential()
model.add(LSTM(units=24, return_sequences=True,
               input_shape=(X_Train.shape[1], X_Train.shape[2])))
model.add(Dropout(0.2))
model.add(LSTM(units=50))
model.add(Dropout(0.2))
model.add(Dense(20, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(optimizer = 'adam', loss = 'mean_squared_error')
history = model.fit(X_Train, Y_Train, epochs=8,
validation_data=(X_Test, Y_Test))
While training I got:
Epoch 1/8
4638/4638 [==============================] - 78s 17ms/step - loss: 1.9951e-04 - val_loss: 1.5074e-04
Epoch 2/8
4638/4638 [==============================] - 77s 17ms/step - loss: 9.6735e-05 - val_loss: 1.0922e-04
Epoch 3/8
4638/4638 [==============================] - 78s 17ms/step - loss: 6.5202e-05 - val_loss: 5.9079e-05
Epoch 4/8
4638/4638 [==============================] - 77s 17ms/step - loss: 5.1011e-05 - val_loss: 4.9478e-05
Epoch 5/8
4638/4638 [==============================] - 77s 17ms/step - loss: 4.3992e-05 - val_loss: 5.1148e-05
Epoch 6/8
4638/4638 [==============================] - 77s 17ms/step - loss: 3.9901e-05 - val_loss: 4.2351e-05
Epoch 7/8
4638/4638 [==============================] - 74s 16ms/step - loss: 3.6884e-05 - val_loss: 4.0763e-05
Epoch 8/8
4638/4638 [==============================] - 74s 16ms/step - loss: 3.5287e-05 - val_loss: 3.6736e-05
But when I manually calculate the mean squared error, I get a different result:
mean_square_root = mean_squared_error(predicted_y_values_unnor, Y_test_actual)
Manual calculation result: 130.755469707972
I wanted to know why the validation loss is different during training than when I calculate it manually. How is the loss calculated while training?
The loss you have chosen is mean_squared_error in the line
model.compile(optimizer = 'adam', loss = 'mean_squared_error')
That is the loss your LSTM model is minimizing.
The Mean Squared Error, or MSE, loss is the default loss to use for regression problems. Mean squared error is calculated as the average of the squared differences between the predicted and actual values. The result is always positive regardless of the sign of the predicted and actual values and a perfect value is 0.0. The squaring means that larger mistakes result in more error than smaller mistakes, meaning that the model is punished for making larger mistakes.
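As a worked illustration of that definition (made-up numbers, not the poster's data):

import numpy as np

y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.5, 5.0, 4.0])

# MSE = average of the squared differences between predictions and targets
mse = np.mean((y_pred - y_true) ** 2)
print(mse)  # (0.25 + 0.0 + 2.25) / 3 = 0.8333...

Note also that the loss Keras reports during training is computed on whatever targets the model was fit on; if the model was trained on scaled/normalized values while predicted_y_values_unnor and Y_test_actual are un-normalized, the two numbers will naturally be on completely different scales.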
LSTM is a general model and you can choose from many different loss functions. Here is the list of Keras' built-in loss functions:
https://keras.io/api/losses/
You need to select a loss function appropriate to your problem.
I was just having the exact same problem, and I solved it with flatten().
When you call mean_squared_error(predicted_y_values_unnor, Y_test_actual), the shape of Y_test_actual is just a one-dimensional array, for example array([297, 290, 308, 308, 214]), but the other argument is a two-dimensional array in which one of the dimensions is 1, for example array([[300.53693], [299.9416], [295.61334], [218.2563], [219.74983]], dtype=float32).
You just have to call flatten() on predicted_y_values_unnor.
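A minimal sketch (made-up numbers) of why this matters if you compute the error by hand with NumPy: broadcasting an (n, 1) array against an (n,) array produces an (n, n) matrix of differences, which badly inflates the average, while flattening keeps the computation element-wise.

import numpy as np

y_true = np.array([297., 290., 308., 308., 214.])                 # shape (5,)
y_pred = np.array([[300.5], [289.9], [307.6], [308.3], [213.7]])  # shape (5, 1)

bad = np.mean((y_pred - y_true) ** 2)             # broadcasts to a (5, 5) matrix
good = np.mean((y_pred.flatten() - y_true) ** 2)  # element-wise, shape (5,)

print(bad, good)  # bad is far larger than good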
I'm creating a very simple 2-layer feed-forward network, but I'm finding that the loss is not updating at all. I have some ideas, but I wanted to get additional feedback/guidance.
Details about the data:
X_train:
(336876, 158)
X_dev:
(42109, 158)
Y_train counts:
0 285793
1 51083
Name: default, dtype: int64
Y_dev counts:
0 35724
1 6385
Name: default, dtype: int64
And here is my model architecture:
# define the architecture of the network
model = Sequential()
model.add(Dense(50, input_dim=X_train.shape[1], init="uniform", activation="relu"))
model.add(Dense(3print("[INFO] compiling model...")
adam = Adam(lr=0.01)
model.compile(loss="binary_crossentropy", optimizer=adam,
metrics=['accuracy'])
model.fit(np.array(X_train), np.array(Y_train), epochs=12, batch_size=128, verbose=1)Dense(1, activation = 'sigmoid'))
Now, with this, my loss after the first few epochs are as follows:
Epoch 1/12
336876/336876 [==============================] - 8s - loss: 2.4441 - acc: 0.8484
Epoch 2/12
336876/336876 [==============================] - 7s - loss: 2.4441 - acc: 0.8484
Epoch 3/12
336876/336876 [==============================] - 6s - loss: 2.4441 - acc: 0.8484
Epoch 4/12
336876/336876 [==============================] - 7s - loss: 2.4441 - acc: 0.8484
Epoch 5/12
336876/336876 [==============================] - 7s - loss: 2.4441 - acc: 0.8484
Epoch 6/12
336876/336876 [==============================] - 7s - loss: 2.4441 - acc: 0.8484
Epoch 7/12
336876/336876 [==============================] - 7s - loss: 2.4441 - acc: 0.8484
Epoch 8/12
336876/336876 [==============================] - 6s - loss: 2.4441 - acc: 0.8484
Epoch 9/12
336876/336876 [==============================] - 6s - loss: 2.4441 - acc: 0.8484
And when I test the model after, my f1_score is 0. My main thought was that I may need more data but I'd still expect it to perform better than it is now on the test set. Could it be that it is overfitting? I added Dropout but no luck there either.
Any help would be much appreciated.
At first glance, I believe your learning rate is too high. Also, please consider normalizing your data, especially if different features have different ranges of values (look at scaling). Also, consider changing your layer activations depending on whether your labels are multi-class or not. Assuming your code is of this form (you seem to have some typos in the problem description):
# define the architecture of the network
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

model = Sequential()
# also: what is the init="uniform" argument? I did not find this in the keras
# documentation; consider removing it.
model.add(Dense(50, input_dim=X_train.shape[1], init="uniform",
                activation="relu"))
model.add(Dense(1, activation='sigmoid'))
# a slightly more conservative learning rate, play around with this
adam = Adam(lr=0.0001)
model.compile(loss="binary_crossentropy", optimizer=adam,
              metrics=['accuracy'])
model.fit(np.array(X_train), np.array(Y_train), epochs=12, batch_size=128,
          verbose=1)
This should help the loss converge. If not, please consider deepening your neural net (think about how many parameters you may need).
Consider adding the classification layer before compiling your model.
model.add(Dense(1, activation = 'sigmoid'))
adam = Adam(lr=0.01)
model.compile(loss="binary_crossentropy", optimizer=adam,
metrics=['accuracy'])
model.fit(np.array(X_train), np.array(Y_train), epochs=12, batch_size=128, verbose=1)
I am training a model on the fruits360 dataset from Kaggle.
I have 0 dense layers and 3 convolutional layers in my Keras model. My input shape is (60, 60, 3), as the images are loaded in RGB format. Please help me troubleshoot what the problem with this model is and why it is not training properly. I've tried different combinations of layers, but the accuracy and loss remain constant no matter what I change.
Following is the model:
import time
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Activation, Flatten, Dense
from keras.callbacks import TensorBoard

dense_layers = [0]
layer_sizes = [64]
conv_layers = [3]

for dense_layer in dense_layers:
    for layer_size in layer_sizes:
        for conv_layer in conv_layers:
            NAME = "{}-conv-{}-nodes-{}-dense-{}".format(conv_layer, layer_size, dense_layer, int(time.time()))
            print(NAME)

            model = Sequential()

            model.add(Conv2D(layer_size, (3, 3), input_shape=(60, 60, 3)))
            model.add(Activation('relu'))
            model.add(MaxPooling2D(pool_size=(2, 2)))

            for l in range(conv_layer - 1):
                model.add(Conv2D(layer_size, (3, 3)))
                model.add(Activation('relu'))
                model.add(MaxPooling2D(pool_size=(2, 2)))

            model.add(Flatten())

            for _ in range(dense_layer):
                model.add(Dense(layer_size))
                model.add(Activation('relu'))

            model.add(Dense(1))
            model.add(Activation('sigmoid'))

            tensorboard = TensorBoard(log_dir="logs/")

            model.compile(loss='sparse_categorical_crossentropy',
                          optimizer='adam',
                          metrics=['accuracy'])

            model.fit(X_norm, y,
                      batch_size=32,
                      epochs=10,
                      validation_data=(X_norm_test, y_test),
                      callbacks=[tensorboard])
but the accuracy remains constant as follows:
Epoch 1/10
42798/42798 [==============================] - 27s 641us/step - loss: nan - acc: 0.0115 - val_loss: nan - val_acc: 0.0114
Epoch 2/10
42798/42798 [==============================] - 27s 638us/step - loss: nan - acc: 0.0115 - val_loss: nan - val_acc: 0.0114
Epoch 3/10
42798/42798 [==============================] - 27s 637us/step - loss: nan - acc: 0.0115 - val_loss: nan - val_acc: 0.0114
Epoch 4/10
42798/42798 [==============================] - 27s 635us/step - loss: nan - acc: 0.0115 - val_loss: nan - val_acc: 0.0114
Epoch 5/10
42798/42798 [==============================] - 27s 635us/step - loss: nan - acc: 0.0115 - val_loss: nan - val_acc: 0.0114
Epoch 6/10
42798/42798 [==============================] - 27s 631us/step - loss: nan - acc: 0.0115 - val_loss: nan - val_acc: 0.0114
Epoch 7/10
42798/42798 [==============================] - 27s 631us/step - loss: nan - acc: 0.0115 - val_loss: nan - val_acc: 0.0114
Epoch 8/10
42798/42798 [==============================] - 27s 631us/step - loss: nan - acc: 0.0115 - val_loss: nan - val_acc: 0.0114
Epoch 9/10
42798/42798 [==============================] - 27s 635us/step - loss: nan - acc: 0.0115 - val_loss: nan - val_acc: 0.0114
Epoch 10/10
42798/42798 [==============================] - 27s 626us/step - loss: nan - acc: 0.0115 - val_loss: nan - val_acc: 0.0114
What can I do to train this model properly and increase the accuracy?
I'm not sure sparse_categorical_crossentropy is a proper loss for an output with only 1 unit.
Notice your loss is nan. This means there is a mathematical error in your model/data/loss somewhere. Many times this is caused by division by zero, overflowing numbers, etc.
I suppose you should be using 'binary_crossentropy' as a loss function.
Notice that you will still have a risk of frozen losses because of 'relu' activations. If this happens, you can add BatchNormalization() layers before the Activation('relu') layers.
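For example, one conv block from the model above with that change applied (a sketch of the suggestion, not the full training loop):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Activation, BatchNormalization

model = Sequential()
model.add(Conv2D(64, (3, 3), input_shape=(60, 60, 3)))
model.add(BatchNormalization())   # normalize the conv outputs...
model.add(Activation('relu'))     # ...before applying the non-linearity
model.add(MaxPooling2D(pool_size=(2, 2)))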
Please also take into account @desertnaut's comments: you're creating a new Sequential model inside every loop iteration.
The bug is with the last dense layer. It is set to one output (i.e., one class). The Fruits-360 dataset (depending on the version) has upwards of 100 classes. The 1% accuracy is another giveaway: the labels in the training set take on about 100 different values, and a dense layer with a single output node can only (randomly) get it right about 1 time in 100 -- hence the 1%.
The 'sparse_categorical_crossentropy' loss is okay as long as the labels are not one-hot encoded; if they are, you should use 'categorical_crossentropy' instead. The Fruits-360 dataset, when trained with Adam, tends not to converge at high learning rates (like 0.1 and 0.01); it is best to set the learning rate low, to something between 0.001 and 0.0001.
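Putting those fixes together, a sketch of the relevant changes (num_classes is hypothetical here; set it to the actual class count of your copy of Fruits-360):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import Adam

num_classes = 120  # hypothetical; use the real number of fruit classes in your dataset version

model = Sequential()
model.add(Conv2D(64, (3, 3), activation='relu', input_shape=(60, 60, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(num_classes, activation='softmax'))  # one unit per class, not Dense(1)

# 'sparse_categorical_crossentropy' expects integer labels 0..num_classes-1;
# switch to 'categorical_crossentropy' if the labels are one-hot encoded
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=Adam(lr=0.0001),  # a low learning rate, as suggested above
              metrics=['accuracy'])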
I'm following this tutorial (section 6: Tying it All Together), with my own dataset. I can get the example in the tutorial working, no problem, with the sample dataset provided.
I'm getting a binary cross-entropy error that is negative, and no improvements as epochs progress. I'm pretty sure binary cross-entropy should always be positive, and I should see some improvement in the loss. I've truncated the sample output (and code call) below to 5 epochs. Others seem to run into similar problems sometimes when training CNNs, but I didn't see a clear solution in my case. Does anyone know why this is happening?
Sample output:
Creating TensorFlow device (/gpu:2) -> (device: 2, name: GeForce GTX TITAN Black, pci bus id: 0000:84:00.0)
Epoch 1/5
10240/10240 [==============================] - 2s - loss: -5.5378 - acc: 0.5000 - val_loss: -7.9712 - val_acc: 0.5000
Epoch 2/5
10240/10240 [==============================] - 0s - loss: -7.9712 - acc: 0.5000 - val_loss: -7.9712 - val_acc: 0.5000
Epoch 3/5
10240/10240 [==============================] - 0s - loss: -7.9712 - acc: 0.5000 - val_loss: -7.9712 - val_acc: 0.5000
Epoch 4/5
10240/10240 [==============================] - 0s - loss: -7.9712 - acc: 0.5000 - val_loss: -7.9712 - val_acc: 0.5000
Epoch 5/5
10240/10240 [==============================] - 0s - loss: -7.9712 - acc: 0.5000 - val_loss: -7.9712 - val_acc: 0.5000
My code:
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import History
history = History()
seed = 7
np.random.seed(seed)
dataset = np.loadtxt('train_rows.csv', delimiter=",")
#print dataset.shape (10240, 64)
# split into input (X) and output (Y) variables
X = dataset[:, 0:(dataset.shape[1]-2)] #0:62 (63 of 64 columns)
Y = dataset[:, dataset.shape[1]-1] #column 64 counting from 0
#print X.shape (10240, 62)
#print Y.shape (10240,)
testset = np.loadtxt('test_rows.csv', delimiter=",")
#print testset.shape (2560, 64)
X_test = testset[:,0:(testset.shape[1]-2)]
Y_test = testset[:,testset.shape[1]-1]
#print X_test.shape (2560, 62)
#print Y_test.shape (2560,)
num_units_per_layer = [100, 50]
### create model
model = Sequential()
model.add(Dense(100, input_dim=(dataset.shape[1]-2), init='uniform', activation='relu'))
model.add(Dense(50, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
## Fit the model
model.fit(X, Y, validation_data=(X_test, Y_test), nb_epoch=5, batch_size=128)
I should have printed out my response variable. The categories were labelled as 1 and 2 instead of 0 and 1, which confused the classifier.
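A quick sketch of both the symptom and the fix (hypothetical numbers; Y and Y_test are assumed to be the arrays loaded above): with a target of 2, the binary cross-entropy term -(y*log(p) + (1-y)*log(1-p)) can go negative, which matches the negative loss in the log, and shifting the labels to 0/1 restores a well-defined loss.

import numpy as np

def binary_crossentropy(y, p):
    # per-sample binary cross-entropy
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

print(binary_crossentropy(1.0, 0.99))  # ~0.01  (valid 0/1 label, always >= 0)
print(binary_crossentropy(2.0, 0.99))  # ~-4.6  (label of 2 -> negative "loss")

# The fix: shift the 1/2 labels into the 0/1 range expected by a sigmoid output
Y = Y - 1
Y_test = Y_test - 1
print(np.unique(Y))  # -> [0. 1.]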