LSTM, Exploding gradients or wrong approach?

LSTM, Exploding gradients or wrong approach? - python

Having a dataset of monthly activity of users, segment to country and browser. each row is 1 day of user activity summed up and a score for that daily activity. For example: number of sessions per day is one feature. The score is a floating point number calculated from that daily features.
My goal is to try and predict the "average user" score at the end of the month using just 2 days of users data.
I have 25 month of data, some are full and some have only partial of the total days, in order to have a fixed batch size I've padded the sequences like so:
from keras.preprocessing.sequence import pad_sequences
padded_sequences = pad_sequences(sequences, maxlen=None, dtype='float64', padding='pre', truncating='post', value=-10.)
so sequences with less then the max where padded with -10 rows.
I've decided to create an LSTM model to digest the data, so at the end of each batch the model should predict the average user score. Then later I'll try to predict using just 2 days sample.
My Model look like that:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout,Dense,Masking
from tensorflow.keras import metrics
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.optimizers import Adam
import datetime, os
model = Sequential()
opt = Adam(learning_rate=0.0001, clipnorm=1)
num_samples = train_x.shape[1]
num_features = train_x.shape[2]
model.add(Masking(mask_value=-10., input_shape=(num_samples, num_features)))
model.add(LSTM(64, return_sequences=True, activation='relu'))
model.add(Dropout(0.3))
#this is the last LSTM layer, use return_sequences=False
model.add(LSTM(64, return_sequences=False, stateful=False, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam' ,metrics=['acc',metrics.mean_squared_error])
logdir = os.path.join(logs_base_dir, datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = TensorBoard(log_dir=logdir, update_freq=1)
model.summary()
Model: "sequential_13"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
masking_5 (Masking) (None, 4283, 16) 0
_________________________________________________________________
lstm_20 (LSTM) (None, 4283, 64) 20736
_________________________________________________________________
dropout_14 (Dropout) (None, 4283, 64) 0
_________________________________________________________________
lstm_21 (LSTM) (None, 64) 33024
_________________________________________________________________
dropout_15 (Dropout) (None, 64) 0
_________________________________________________________________
dense_9 (Dense) (None, 1) 65
=================================================================
Total params: 53,825
Trainable params: 53,825
Non-trainable params: 0
_________________________________________________________________
While training I get NaN value on the 19th epoch
Epoch 16/1000
16/16 [==============================] - 14s 855ms/sample - loss: 298.8135 - acc: 0.0000e+00 - mean_squared_error: 298.8135 - val_loss: 220.7307 - val_acc: 0.0000e+00 - val_mean_squared_error: 220.7307
Epoch 17/1000
16/16 [==============================] - 14s 846ms/sample - loss: 290.3051 - acc: 0.0000e+00 - mean_squared_error: 290.3051 - val_loss: 205.3393 - val_acc: 0.0000e+00 - val_mean_squared_error: 205.3393
Epoch 18/1000
16/16 [==============================] - 14s 869ms/sample - loss: 272.1889 - acc: 0.0000e+00 - mean_squared_error: 272.1889 - val_loss: nan - val_acc: 0.0000e+00 - val_mean_squared_error: nan
Epoch 19/1000
16/16 [==============================] - 14s 852ms/sample - loss: nan - acc: 0.0000e+00 - mean_squared_error: nan - val_loss: nan - val_acc: 0.0000e+00 - val_mean_squared_error: nan
Epoch 20/1000
16/16 [==============================] - 14s 856ms/sample - loss: nan - acc: 0.0000e+00 - mean_squared_error: nan - val_loss: nan - val_acc: 0.0000e+00 - val_mean_squared_error: nan
Epoch 21/1000
I tried to apply the methods described here with no real success.
Update:
I've changed my activation from relu to tanh and it solved the NaN issue. However it seems that the accuracy of my model stays 0 while the loss goes down
Epoch 100/1000
16/16 [==============================] - 14s 869ms/sample - loss: 22.8179 - acc: 0.0000e+00 - mean_squared_error: 22.8179 - val_loss: 11.7422 - val_acc: 0.0000e+00 - val_mean_squared_error: 11.7422
Q: What am I doing wrong here?

You are solving a regression task, using accuracy is not meaningful here.
Use mean_absollute_error to check if your error is decreasing over time or not.
Instead of blindly predicting the score, you can make the score bounded to (0, 1).
Just use a min max normalization to bring the output in a range https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html
After that you can use sigmoid in last layer.
Also, you're choosing slightly longer sequences for this simple model 4283, how skewed your sequence lengths are?
Maybe do a histogram plot of all the signal length and see if 4283 is, in fact, a good choice or not. Maybe you can bring this down to something like 512 which may become easier for the model.
Also, padding with -10 seems a pretty weird choice is it something specific for your data or you're choosing randomly? This -10 also suggests you're not normalizing your input data which can become a problem with an LSTM with relu, maybe you should try to normalizing it before training.
After these add a validation plot of the mean absolute error if the performance is still not good.

Related

Why is the tensorflow maxout not calculating the gradient respectively where is the mistake?

I am trying to use the tensorflow maxout implementation (https://www.tensorflow.org/addons/api_docs/python/tfa/layers/Maxout) but struggle with it;
I try to illustrate my problem: If I have the following
d=3
x_in=Input(shape=d)
x_out=Dense(d, activation='relu')(x_in)
model = Model(inputs=x_in, outputs=x_out)
model.compile(optimizer='adam', loss='MeanAbsoluteError')
X=tf.random.normal([200,3])
Y=tf.random.normal([200,3])
model.fit(X, Y, epochs=5, batch_size=32)
Then it is working normally, i.e. the loss is continuously getting smaller and I can get the estimated weights:
model.layers[1].get_weights()
Out[141]:
[array([[-0.15133516, -0.14892222, -0.64674205],
[ 0.34437487, 0.7822309 , -0.08931279],
[-0.8330534 , -0.13827904, -0.23096593]], dtype=float32),
array([-0.03069788, -0.03311999, -0.02603031], dtype=float32)]
However, when I want to use a maxout activation instead, things do not work out
d=3
x_in=Input(shape=d)
x_out = tfa.layers.Maxout(3)(x_in)
model = Model(inputs=x_in, outputs=x_out)
model.compile(optimizer='adam', loss='MeanAbsoluteError')
X=tf.random.normal([200,3])
Y=tf.random.normal([200,3])
model.fit(X, Y, epochs=5, batch_size=32)
The loss stays constant for all Epochs and
model.layers[1].get_weights()
Out[141]: []
Where is my mistake?

It will only work in combination with another layer, for example a Dense layer. Also, the Maxout layer itself does not have any trainable weights as you can see in the model summary but it does have a hyperparameter num_units:
import tensorflow as tf
import tensorflow_addons as tfa
d=3
x_in=tf.keras.layers.Input(shape=d)
x = tf.keras.layers.Dense(3)(x_in)
x_out = tfa.layers.Maxout(3)(x)
model = tf.keras.Model(inputs=x_in, outputs=x_out)
model.compile(optimizer='adam', loss='MeanAbsoluteError')
X=tf.random.normal([200,3])
Y=tf.random.normal([200,3])
model.fit(X, Y, epochs=5, batch_size=32)
print(model.summary())
Epoch 1/5
7/7 [==============================] - 0s 2ms/step - loss: 1.0404
Epoch 2/5
7/7 [==============================] - 0s 3ms/step - loss: 1.0361
Epoch 3/5
7/7 [==============================] - 0s 2ms/step - loss: 1.0322
Epoch 4/5
7/7 [==============================] - 0s 2ms/step - loss: 1.0283
Epoch 5/5
7/7 [==============================] - 0s 3ms/step - loss: 1.0244
Model: "model_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_6 (InputLayer) [(None, 3)] 0
dense_5 (Dense) (None, 3) 12
maxout_4 (Maxout) (None, 3) 0
=================================================================
Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________
None
Maybe also take a look at the paper regarding Maxout:
The maxout model is simply a feed-forward achitecture, such as a multilayer perceptron or deep convolutional neural network, that uses a new type of activation function: the maxout unit.

Accuracy and Validation Accuracy stay unchanged while both losses reduce. Tried everything I could find, still doesn't work

So, I am trying to code a multivariate LSTM for time series forecasting, and in my model, the losses decrease but accuracy metrics do not change at all. I tried changing number of neurons, layers, learning rate, early stopping, activation function on the output layer, and l2 regularization but nothing works. I am a beginner in machine learning, and so any help would be appreciated.Most of my efforts were like throwing stones in the dark. I am attaching a the GitHub link to my code, as well as a few of the training epochs.
# Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
from keras.regularizers import l2
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping
model = Sequential()
model.add(LSTM(64,activation='sigmoid',return_sequences=True,input_shape = (trainX.shape[1],trainX.shape[2])))
model.add(LSTM(32,activation='sigmoid',return_sequences=False))
model.add(Dropout(0.3))
model.add(Dense(trainY.shape[1]))
opt = Adam(learning_rate= 1e-3)
model.compile(optimizer='adam',loss = 'mse', metrics=['accuracy'])
model.summary()
Model: "sequential_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_6 (LSTM) (None, 200, 64) 19200
_________________________________________________________________
lstm_7 (LSTM) (None, 32) 12416
_________________________________________________________________
dropout_3 (Dropout) (None, 32) 0
_________________________________________________________________
dense_3 (Dense) (None, 1) 33
=================================================================
Total params: 31,649
Trainable params: 31,649
Non-trainable params: 0
es_callback = EarlyStopping(monitor='val_loss', patience=3)
history = model.fit(trainX,trainY,epochs=40,batch_size= 32,verbose=1,validation_split=0.2, callbacks= [es_callback])
Epoch 1/40
214/214 [==============================] - 58s 169ms/step - loss: 0.1663 - accuracy: 0.0000e+00 - val_loss: 0.0483 - val_accuracy: 5.8617e-04
Epoch 2/40
214/214 [==============================] - 35s 164ms/step - loss: 0.0497 - accuracy: 0.0000e+00 - val_loss: 0.0446 - val_accuracy: 5.8617e-04
Epoch 3/40
214/214 [==============================] - 35s 164ms/step - loss: 0.0309 - accuracy: 0.0000e+00 - val_loss: 0.0092 - val_accuracy: 5.8617e-04
Epoch 4/40
214/214 [==============================] - 35s 163ms/step - loss: 0.0143 - accuracy: 0.0000e+00 - val_loss: 0.0230 - val_accuracy: 5.8617e-04
Epoch 5/40
214/214 [==============================] - 35s 163ms/step - loss: 0.0115 - accuracy: 0.0000e+00 - val_loss: 0.0160 - val_accuracy: 5.8617e-04
Epoch 6/40
214/214 [==============================] - 35s 163ms/step - loss: 0.0099 - accuracy: 0.0000e+00 - val_loss: 0.0172 - val_accuracy: 5.8617e-04
My code: https://github.com/RiddhimanRaut/Deep-Learning-based-CPR-estimation/blob/main/CPR_prediction_multivariate_LSTM_tobetrialled_1.ipynb
Thank you!

Accuracy is the metric for classification tasks. To measure if a regression model is good or not, measurement such as MSE can be applied.
I think the discussion here can provide more information.

Keras LSTM-based neural network doesn't learn

I am very new to neural networks in general, and I'm trying to build a basic recurrent network for a student project.
So, here's my code:
import tensorflow
from tensorflow import keras
import numpy
X = numpy.zeros((21, 210, 3))
# Some code filling X from a file
Y = numpy.array([0,0,0,0,0,0, -1,-1,-1,-1,-1,-1,-1,-1,-1, 1,1,1,1,1,1])
Y = Y*0.5 + 0.5
model = keras.models.Sequential()
#model.add(keras.layers.InputLayer(input_shape=(210, 3))) -commented out as it didn't make a difference
model.add(keras.layers.LSTM(32, input_shape=(210, 3), activation='relu'))
model.add(keras.layers.Dense(1, activation='sigmoid'))
model.summary()
model.compile(
optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy'],
)
model.fit(X, Y, epochs=5)
So, X is 21 "pages" of data with 3 features along 210 timesteps - which is how LSTM should expect it, and Y is an array of 21 values associated with each of the pages of X.
The network is supposed to learn to predict associated Y value based on a page of X.
The problem is that runnig the program speeds through the training and the accuracy stays constant (with some occasional, seemingly random spikes like 0.4286 shown below):
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm (LSTM) (None, 32) 4608
_________________________________________________________________
dense (Dense) (None, 1) 33
=================================================================
Total params: 4,641
Trainable params: 4,641
Non-trainable params: 0
_________________________________________________________________
2021-07-07 14:41:56.530633: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/5
1/1 [==============================] - 1s 961ms/step - loss: 0.0000e+00 - accuracy: 0.2857
Epoch 2/5
1/1 [==============================] - 0s 39ms/step - loss: 0.0000e+00 - accuracy: 0.2857
Epoch 3/5
1/1 [==============================] - 0s 35ms/step - loss: 0.0000e+00 - accuracy: 0.2857
Epoch 4/5
1/1 [==============================] - 0s 36ms/step - loss: 0.0000e+00 - accuracy: 0.4286
Epoch 5/5
1/1 [==============================] - 0s 34ms/step - loss: 0.0000e+00 - accuracy: 0.2857
Process finished with exit code 0
What am I doing wrong? Again, it is all very new to me, so i assume it can be almost anything.
Thanks in advance

How to add Gaussian noise with varying std during training?

I am training a CNN using keras and tensorflow. I would like to add Gaussian noise to my input data during training and reduce the percentage of the noise in further steps. What I do right now, I use:
from tensorflow.python.keras.layers import Input, GaussianNoise, BatchNormalization
inputs = Input(shape=x_train_n.shape[1:])
bn0 = BatchNormalization(axis=1, scale=True)(inputs)
g0 = GaussianNoise(0.5)(bn0)
The variable that GaussianNoise takes is the standard deviation of the noise distribution and I couldn't assign a dynamic value to it, how can I add for example a noise, and then decrease this value based on the epoch that I am in?

You can simply design a custom callback which changes the stddev before training for a epoch.
Reference:
https://www.tensorflow.org/api_docs/python/tf/keras/layers/GaussianNoise
https://www.tensorflow.org/guide/keras/custom_callback
from tensorflow.keras.layers import Input, Dense, Add, Activation
from tensorflow.keras.models import Model
import tensorflow as tf
import numpy as np
import random
from tensorflow.python.keras.layers import Input, GaussianNoise, BatchNormalization
inputs = Input(shape=100)
bn0 = BatchNormalization(axis=1, scale=True)(inputs)
g0 = GaussianNoise(0.5)(bn0)
d0 = Dense(10)(g0)
model = Model(inputs, d0)
model.compile('adam', 'mse')
model.summary()
class MyCustomCallback(tf.keras.callbacks.Callback):
def on_epoch_begin(self, epoch, logs=None):
self.model.layers[2].stddev = random.uniform(0, 1)
print('updating sttdev in training')
print(self.model.layers[2].stddev)
X_train = np.zeros((10,100))
y_train = np.zeros((10,10))
noise_change = MyCustomCallback()
model.fit(X_train,
y_train,
batch_size=32,
epochs=5,
callbacks = [noise_change])
Model: "model_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_6 (InputLayer) [(None, 100)] 0
_________________________________________________________________
batch_normalization_5 (Batch (None, 100) 400
_________________________________________________________________
gaussian_noise_5 (GaussianNo (None, 100) 0
_________________________________________________________________
dense_5 (Dense) (None, 10) 1010
=================================================================
Total params: 1,410
Trainable params: 1,210
Non-trainable params: 200
_________________________________________________________________
Epoch 1/5
updating sttdev in training
0.984045691131548
1/1 [==============================] - 0s 1ms/step - loss: 1.6031
Epoch 2/5
updating sttdev in training
0.02821459469022025
1/1 [==============================] - 0s 742us/step - loss: 1.5966
Epoch 3/5
updating sttdev in training
0.6102984511769268
1/1 [==============================] - 0s 1ms/step - loss: 1.8818
Epoch 4/5
updating sttdev in training
0.021155188690323512
1/1 [==============================] - 0s 1ms/step - loss: 1.2032
Epoch 5/5
updating sttdev in training
0.35950227285165115
1/1 [==============================] - 0s 2ms/step - loss: 1.8817
<tensorflow.python.keras.callbacks.History at 0x7fc67ce9e668>

Keras - Single pixel classification without ImageDataGenerator

This is a newbie question, but I just can't get the simplest Keras experiment work. I went through a course and its samples work well, so my computer is set up correctly.
I've a few thousand 16x12 images, called "GGYRBGBBW.png", "BBYWBRBBB.png" and so on. The images have a single color with minimal shade differences and are reduced to a single pixel when loading. The first character of the filenames serves as training labels (e.g. green images' names start with 'G'). I need to build and train a simple model that indicates one of the 6 possible colors in the image. (This is a first learning step towards a much more complicated project).
I don't want to use ImageDataGenerator because this whole project will grow beyond that a simple directory-structure categorization can do, and I'll do the image randomization with my own external image generator.
I created a Keras model that looks like this:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten_1 (Flatten) (None, 3) 0
_________________________________________________________________
dense_1 (Dense) (None, 6) 24
=================================================================
Total params: 24
Trainable params: 24
Non-trainable params: 0
So, pretty simple - input shape is only three values (RGB), flattened, then 6 output neurons for each color category.
When running the experiments, beginning with random initial values, accuracy stays super low, sometimes 0.
210/210 [==============================] - 0s 2ms/step - loss: 7.2430 - acc: 0.2095 - val_loss: 6.7980 - val_acc: 0.2000
Epoch 2/50
210/210 [==============================] - 0s 10us/step - loss: 7.2411 - acc: 0.2095 - val_loss: 9.6617 - val_acc: 0.2000
Epoch 3/50
210/210 [==============================] - 0s 5us/step - loss: 9.9256 - acc: 0.2095 - val_loss: 9.6598 - val_acc: 0.2000
Epoch 4/50
210/210 [==============================] - 0s 5us/step - loss: 9.9236 - acc: 0.2095 - val_loss: 9.6579 - val_acc: 0.2000
Epoch 5/50
210/210 [==============================] - 0s 10us/step - loss: 9.9217 - acc: 0.2095 - val_loss: 9.6560 - val_acc: 0.2000
Epoch 6/50
210/210 [==============================] - 0s 10us/step - loss: 9.9197 - acc: 0.2095 - val_loss: 9.6541 - val_acc: 0.2000
I must be missing something trivial, but since I'm a noob in this, I can't figure out what. Here's the complete source code:
from __future__ import print_function
import random
import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
os.system('cls')
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras import backend as K
from keras.preprocessing.image import img_to_array
import numpy as np
num_classes = 6
batch_size = 300
epochs = 50
Image_width, Image_height = 1, 1
train_dir = './OneColorTrainingData'
colors="WYBGRO"
def load(numOfPics):
X = []
y = []
print("Reading training images")
allFiles = os.listdir(train_dir)
randomFiles = random.choices(allFiles, k=numOfPics)
for f in randomFiles:
path = os.path.join(train_dir, f)
img = keras.preprocessing.image.load_img(path, grayscale=False, target_size=(Image_width, Image_width))
img = img_to_array(img)
img /= 255
X.append(img)
y.append(colors.index(f[0]))
y = keras.utils.to_categorical(y, num_classes=num_classes)
return X, y
Data, labels = load(batch_size)
print(str(len(Data)) + " training files loaded")
print(labels)
model = Sequential()
model.add(Flatten(input_shape=(Image_height, Image_width, 3)))
model.add(Dense(num_classes, activation=K.tanh))
model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Adadelta(), metrics=['accuracy'])
print(model.input_shape)
print(model.summary())
hist = model.fit(np.array(Data), np.array(labels), batch_size=batch_size, epochs=epochs, verbose=1, validation_split=0.3)
score = model.evaluate(np.array(Data), np.array(labels), verbose=0)
print('Test loss: ', score[0])
print('Test accuracy: ', score[1])
Any help would be appreciated.

You have K.tanh as your finaly Dense layer activation. When doing 1-of-many classification we often use softmax instead which will produce a probability distribution over the colour classes:
model.add(Dense(num_classes, activation='softmax'))
Now your target labels will be one-hot vectors [0,0,1,0,0,0] indication which class it is. You can also use sparse_categorical_crossentropy loss and give labels as class integers 2 as your target. It means the same thing in this context.

I'm not positive, but I think you need a hidden layer. You currently have a single affine transformation on your input: h_0 = W_0*x + b_0. This is passed through a K.tanh, so your logits are simply y = tanh(h_0). I have a hunch that for a small enough problem like this you may be able to prove that this cannot converge, but I am not certain.
I would just add a second dense layer, and use softmax for your final output as #nuric suggests:
model = Sequential()
model.add(Flatten(input_shape=(Image_height, Image_width, 3)))
model.add(Dense(10, activation=K.tanh))
model.add(Dense(num_classes, activation='softmax'))

OK, I figured it out. I needed to switch to softmax (thanks all who suggested), and use much more epochs because convergence was random and slow. I didn't think I needed to use more epochs because nothing changed after the first few - but it turned out that once I had 500 instead of 50, the network managed to learn all the colors one by one and hit 100% accuracy (with a single output layer) almost every time. Thank you all for the help!

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

LSTM, Exploding gradients or wrong approach? - python

Related

Why is the tensorflow maxout not calculating the gradient respectively where is the mistake?

Accuracy and Validation Accuracy stay unchanged while both losses reduce. Tried everything I could find, still doesn't work

Keras LSTM-based neural network doesn't learn

How to add Gaussian noise with varying std during training?

Keras - Single pixel classification without ImageDataGenerator

Categories

Resources