Keras LSTM-based neural network doesn't learn

I am very new to neural networks in general, and I'm trying to build a basic recurrent network for a student project.
So, here's my code:
import tensorflow
from tensorflow import keras
import numpy
X = numpy.zeros((21, 210, 3))
# Some code filling X from a file
Y = numpy.array([0,0,0,0,0,0, -1,-1,-1,-1,-1,-1,-1,-1,-1, 1,1,1,1,1,1])
Y = Y*0.5 + 0.5
model = keras.models.Sequential()
#model.add(keras.layers.InputLayer(input_shape=(210, 3))) -commented out as it didn't make a difference
model.add(keras.layers.LSTM(32, input_shape=(210, 3), activation='relu'))
model.add(keras.layers.Dense(1, activation='sigmoid'))
model.summary()
model.compile(
optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy'],
)
model.fit(X, Y, epochs=5)
So X holds 21 "pages" of data, each with 3 features over 210 timesteps (the (samples, timesteps, features) layout an LSTM expects), and Y holds 21 values, one per page of X.
The network is supposed to learn to predict the Y value associated with a page of X.
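The line Y = Y*0.5 + 0.5 maps the three label values -1, 0 and 1 to 0, 0.5 and 1, so this is really a three-class problem squeezed into a single output. Assuming Keras resolves metrics=['accuracy'] to binary accuracy here because the model has one output unit (the values in the log below are consistent with that), a target of 0.5 can never count as correct, and a model that puts every page on the same side of the 0.5 threshold scores exactly 6/21 or 9/21. The NumPy sketch below reproduces that arithmetic; binary_accuracy is a hand-written stand-in, not the Keras function.

```python
import numpy as np

# Labels exactly as in the question, followed by the same rescaling.
Y = np.array([0] * 6 + [-1] * 9 + [1] * 6, dtype=float)
Y = Y * 0.5 + 0.5
print(sorted(set(Y.tolist())))  # three distinct targets: 0.0, 0.5, 1.0


def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Hand-written stand-in: share of targets equal to the thresholded prediction."""
    return float(np.mean(y_true == (y_pred > threshold).astype(float)))


# A model that outputs the same value for every page:
acc_all_high = binary_accuracy(Y, np.full_like(Y, 0.9))  # can only match the six 1.0 targets
acc_all_low = binary_accuracy(Y, np.full_like(Y, 0.1))   # can only match the nine 0.0 targets
print(round(acc_all_high, 4), round(acc_all_low, 4))
```

The two constant predictions give 0.2857 and 0.4286, the only two accuracy values that appear in the training log.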
The problem is that the program speeds through training and the accuracy stays constant (apart from occasional, seemingly random spikes such as the 0.4286 shown below):
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm (LSTM) (None, 32) 4608
_________________________________________________________________
dense (Dense) (None, 1) 33
=================================================================
Total params: 4,641
Trainable params: 4,641
Non-trainable params: 0
_________________________________________________________________
2021-07-07 14:41:56.530633: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/5
1/1 [==============================] - 1s 961ms/step - loss: 0.0000e+00 - accuracy: 0.2857
Epoch 2/5
1/1 [==============================] - 0s 39ms/step - loss: 0.0000e+00 - accuracy: 0.2857
Epoch 3/5
1/1 [==============================] - 0s 35ms/step - loss: 0.0000e+00 - accuracy: 0.2857
Epoch 4/5
1/1 [==============================] - 0s 36ms/step - loss: 0.0000e+00 - accuracy: 0.4286
Epoch 5/5
1/1 [==============================] - 0s 34ms/step - loss: 0.0000e+00 - accuracy: 0.2857
Process finished with exit code 0
What am I doing wrong? Again, all of this is very new to me, so I assume it could be almost anything.
Thanks in advance
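A likely reason the loss reads 0.0000e+00: categorical cross-entropy treats the last axis of the output as a probability distribution over classes, and with a single output unit that distribution is always [1.0], so the loss stays at (or extremely close to) zero whatever the network predicts and gives the optimizer no meaningful training signal. A minimal sketch of one possible fix, assuming the three label values are meant as three classes: give the output layer three softmax units, feed integer labels 0, 1, 2 (Y + 1 on the original -1/0/1 labels) and use sparse_categorical_crossentropy. The random X is only a stand-in for the data read from the file, and leaving the LSTM at its default tanh activation instead of relu is my own assumption.

```python
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)
# Stand-in for the file data: 21 pages, 210 timesteps, 3 features.
X = rng.normal(size=(21, 210, 3)).astype("float32")
labels = np.array([0] * 6 + [-1] * 9 + [1] * 6)
y = labels + 1  # integer class ids: -1 -> 0, 0 -> 1, 1 -> 2

model = keras.models.Sequential([
    keras.layers.Input(shape=(210, 3)),
    keras.layers.LSTM(32),                        # default tanh activation
    keras.layers.Dense(3, activation="softmax"),  # one unit per class
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # integer labels, no one-hot encoding needed
    metrics=["accuracy"],
)
history = model.fit(X, y, epochs=5, verbose=0)
probs = model.predict(X, verbose=0)  # shape (21, 3), each row sums to 1
```

To get predictions back in the original -1/0/1 coding, take probs.argmax(axis=1) - 1.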

Related

Why is the TensorFlow Maxout layer not calculating the gradient, or rather, where is the mistake?

I am trying to use the TensorFlow Addons Maxout implementation (https://www.tensorflow.org/addons/api_docs/python/tfa/layers/Maxout) but am struggling with it.
To illustrate the problem: if I have the following
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
d=3
x_in=Input(shape=d)
x_out=Dense(d, activation='relu')(x_in)
model = Model(inputs=x_in, outputs=x_out)
model.compile(optimizer='adam', loss='MeanAbsoluteError')
X=tf.random.normal([200,3])
Y=tf.random.normal([200,3])
model.fit(X, Y, epochs=5, batch_size=32)
then it works normally, i.e. the loss decreases steadily and I can get the estimated weights:
model.layers[1].get_weights()
Out[141]:
[array([[-0.15133516, -0.14892222, -0.64674205],
[ 0.34437487, 0.7822309 , -0.08931279],
[-0.8330534 , -0.13827904, -0.23096593]], dtype=float32),
array([-0.03069788, -0.03311999, -0.02603031], dtype=float32)]
However, when I use a Maxout activation instead, things do not work out:
import tensorflow_addons as tfa
d=3
x_in=Input(shape=d)
x_out = tfa.layers.Maxout(3)(x_in)
model = Model(inputs=x_in, outputs=x_out)
model.compile(optimizer='adam', loss='MeanAbsoluteError')
X=tf.random.normal([200,3])
Y=tf.random.normal([200,3])
model.fit(X, Y, epochs=5, batch_size=32)
The loss stays constant for all epochs, and
model.layers[1].get_weights()
Out[141]: []
Where is my mistake?

It will only work in combination with another layer, for example a Dense layer. The Maxout layer itself has no trainable weights, as you can see in the model summary; it only has the hyperparameter num_units. On its own it therefore has nothing to learn:
import tensorflow as tf
import tensorflow_addons as tfa
d=3
x_in=tf.keras.layers.Input(shape=d)
x = tf.keras.layers.Dense(3)(x_in)
x_out = tfa.layers.Maxout(3)(x)
model = tf.keras.Model(inputs=x_in, outputs=x_out)
model.compile(optimizer='adam', loss='MeanAbsoluteError')
X=tf.random.normal([200,3])
Y=tf.random.normal([200,3])
model.fit(X, Y, epochs=5, batch_size=32)
print(model.summary())
Epoch 1/5
7/7 [==============================] - 0s 2ms/step - loss: 1.0404
Epoch 2/5
7/7 [==============================] - 0s 3ms/step - loss: 1.0361
Epoch 3/5
7/7 [==============================] - 0s 2ms/step - loss: 1.0322
Epoch 4/5
7/7 [==============================] - 0s 2ms/step - loss: 1.0283
Epoch 5/5
7/7 [==============================] - 0s 3ms/step - loss: 1.0244
Model: "model_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_6 (InputLayer) [(None, 3)] 0
dense_5 (Dense) (None, 3) 12
maxout_4 (Maxout) (None, 3) 0
=================================================================
Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________
None
Maybe also take a look at the Maxout paper (Goodfellow et al., 2013, "Maxout Networks"):
The maxout model is simply a feed-forward architecture, such as a multilayer perceptron or deep convolutional neural network, that uses a new type of activation function: the maxout unit.
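It also helps to see what Maxout(3) does to a 3-feature input. As I understand the TensorFlow Addons implementation (default axis=-1), it splits the d input features into d / num_units groups and keeps an element-wise maximum over them, producing num_units outputs. With d = num_units = 3 each output sees a single feature, so the layer is the identity and the model in the question has no weights at all. The NumPy sketch below imitates that reshape-and-max; maxout is a hypothetical helper, and the exact grouping order is my reading of the source rather than a documented guarantee.

```python
import numpy as np


def maxout(x, num_units):
    """NumPy imitation of maxout along the last axis.

    The d features are viewed as (d // num_units, num_units) and the max is
    taken over the first of those two axes, leaving num_units outputs.
    """
    d = x.shape[-1]
    if d % num_units:
        raise ValueError("number of features must be a multiple of num_units")
    k = d // num_units
    grouped = x.reshape(x.shape[:-1] + (k, num_units))
    return grouped.max(axis=-2)


x3 = np.array([[0.2, -1.0, 3.0]])
print(maxout(x3, 3))  # identity: each output sees exactly one feature

x6 = np.array([[0.2, -1.0, 3.0, 0.5, 4.0, -2.0]])
print(maxout(x6, 3))  # element-wise max of x6[:, :3] and x6[:, 3:]
```

So a real maxout unit needs a preceding layer with k times more outputs than Maxout has units, for example Dense(3 * k) followed by tfa.layers.Maxout(3); the Dense(3) + Maxout(3) model in the answer is still a purely linear model.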

Accuracy and Validation Accuracy stay unchanged while both losses reduce. Tried everything I could find, still doesn't work

So, I am trying to code a multivariate LSTM for time series forecasting, and in my model the losses decrease but the accuracy metrics do not change at all. I tried changing the number of neurons, layers, learning rate, early stopping, the activation function on the output layer, and L2 regularization, but nothing works. I am a beginner in machine learning, so any help would be appreciated. Most of my efforts were like throwing stones in the dark. I am attaching the GitHub link to my code, as well as a few of the training epochs.
# Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
from keras.regularizers import l2
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping
model = Sequential()
model.add(LSTM(64,activation='sigmoid',return_sequences=True,input_shape = (trainX.shape[1],trainX.shape[2])))
model.add(LSTM(32,activation='sigmoid',return_sequences=False))
model.add(Dropout(0.3))
model.add(Dense(trainY.shape[1]))
opt = Adam(learning_rate=1e-3)
model.compile(optimizer=opt, loss='mse', metrics=['accuracy'])
model.summary()
Model: "sequential_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_6 (LSTM) (None, 200, 64) 19200
_________________________________________________________________
lstm_7 (LSTM) (None, 32) 12416
_________________________________________________________________
dropout_3 (Dropout) (None, 32) 0
_________________________________________________________________
dense_3 (Dense) (None, 1) 33
=================================================================
Total params: 31,649
Trainable params: 31,649
Non-trainable params: 0
es_callback = EarlyStopping(monitor='val_loss', patience=3)
history = model.fit(trainX,trainY,epochs=40,batch_size= 32,verbose=1,validation_split=0.2, callbacks= [es_callback])
Epoch 1/40
214/214 [==============================] - 58s 169ms/step - loss: 0.1663 - accuracy: 0.0000e+00 - val_loss: 0.0483 - val_accuracy: 5.8617e-04
Epoch 2/40
214/214 [==============================] - 35s 164ms/step - loss: 0.0497 - accuracy: 0.0000e+00 - val_loss: 0.0446 - val_accuracy: 5.8617e-04
Epoch 3/40
214/214 [==============================] - 35s 164ms/step - loss: 0.0309 - accuracy: 0.0000e+00 - val_loss: 0.0092 - val_accuracy: 5.8617e-04
Epoch 4/40
214/214 [==============================] - 35s 163ms/step - loss: 0.0143 - accuracy: 0.0000e+00 - val_loss: 0.0230 - val_accuracy: 5.8617e-04
Epoch 5/40
214/214 [==============================] - 35s 163ms/step - loss: 0.0115 - accuracy: 0.0000e+00 - val_loss: 0.0160 - val_accuracy: 5.8617e-04
Epoch 6/40
214/214 [==============================] - 35s 163ms/step - loss: 0.0099 - accuracy: 0.0000e+00 - val_loss: 0.0172 - val_accuracy: 5.8617e-04
My code: https://github.com/RiddhimanRaut/Deep-Learning-based-CPR-estimation/blob/main/CPR_prediction_multivariate_LSTM_tobetrialled_1.ipynb
Thank you!

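Accuracy is a metric for classification tasks: it counts a prediction as correct only when it matches the label exactly, which continuous regression outputs practically never do, so it stays near zero no matter how good the fit is. To judge whether a regression model is good, use measures such as MSE (or MAE) instead.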
I think the discussion here can provide more information.
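To illustrate: assuming Keras falls back to binary accuracy for a single-unit output (its usual choice), a prediction only counts as correct when its thresholded value (0 or 1) equals the target exactly. The NumPy sketch below uses made-up values for y_true and y_pred; the predictions are close, yet accuracy is 0 while the error metrics reflect the fit.

```python
import numpy as np

# Made-up continuous targets and close predictions.
y_true = np.array([0.12, 0.37, 0.58, 0.91])
y_pred = np.array([0.10, 0.40, 0.55, 0.93])

# Binary accuracy: threshold predictions at 0.5 and require exact equality.
accuracy = float(np.mean(y_true == (y_pred > 0.5).astype(float)))

mse = float(np.mean((y_true - y_pred) ** 2))   # what loss='mse' reports
mae = float(np.mean(np.abs(y_true - y_pred)))  # easier to read in target units
print(accuracy, round(mse, 5), round(mae, 3))
```

In Keras this amounts to compiling with, for example, metrics=['mae'] instead of metrics=['accuracy'].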

TensorFlow not running correct number of epochs with no errors

I am very much a novice at neural networks and machine learning. I am trying to learn more by using RotNet, a network that classifies the rotation angle of images. I am trying to train it on the MNIST dataset; I changed only one line of the repo (a log directory file path), and other than that I have been able to run it successfully.
Here is how I am running it based on the README:
& .../Anaconda3/envs/tflow/python.exe .../RotNet/train/train_mnist.py
and then the output:
Using TensorFlow backend.
Input shape: (28, 28, 1)
60000 train samples
10000 test samples
2020-10-16 12:18:17.031214: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 28, 28, 1) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 26, 26, 64) 640
_________________________________________________________________
conv2d_2 (Conv2D) (None, 24, 24, 64) 36928
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 64) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 12, 12, 64) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 9216) 0
_________________________________________________________________
dense_1 (Dense) (None, 128) 1179776
_________________________________________________________________
dropout_2 (Dropout) (None, 128) 0
_________________________________________________________________
dense_2 (Dense) (None, 360) 46440
=================================================================
Total params: 1,263,784
Trainable params: 1,263,784
Non-trainable params: 0
_________________________________________________________________
Epoch 1/50
1/468 [..............................] - ETA: 2:21 - loss: 5.8862 - angle_error: 87.1406
2020-10-16 12:18:18.337183: I tensorflow/core/profiler/lib/profiler_session.cc:184] Profiler session started.
469/468 [==============================] - 61s 130ms/step - loss: 5.0338 - angle_error: 81.4492 - val_loss: 4.1144 - val_angle_error: 65.9470
Epoch 2/50
469/468 [==============================] - 61s 131ms/step - loss: 4.3072 - angle_error: 64.7485 - val_loss: 3.4630 - val_angle_error: 53.0140
Epoch 3/50
469/468 [==============================] - 63s 134ms/step - loss: 4.0303 - angle_error: 56.3245 - val_loss: 3.2241 - val_angle_error: 47.0283
Epoch 4/50
469/468 [==============================] - 63s 134ms/step - loss: 3.8824 - angle_error: 52.2043 - val_loss: 3.3227 - val_angle_error: 43.2439
Epoch 5/50
469/468 [==============================] - 63s 135ms/step - loss: 3.7982 - angle_error: 49.9996 - val_loss: 3.1930 - val_angle_error: 41.1242
Epoch 6/50
469/468 [==============================] - 73s 155ms/step - loss: 3.7288 - angle_error: 48.4027 - val_loss: 2.9600 - val_angle_error: 39.9322
Epoch 7/50
469/468 [==============================] - 63s 133ms/step - loss: 3.6781 - angle_error: 46.5616 - val_loss: 3.2243 - val_angle_error: 38.6193
Epoch 8/50
469/468 [==============================] - 62s 132ms/step - loss: 3.6439 - angle_error: 45.2133 - val_loss: 2.8629 - val_angle_error: 38.0046
Epoch 9/50
469/468 [==============================] - 62s 132ms/step - loss: 3.6132 - angle_error: 44.7204 - val_loss: 3.0085 - val_angle_error: 37.4514
Epoch 10/50
469/468 [==============================] - 62s 132ms/step - loss: 3.5817 - angle_error: 43.8439 - val_loss: 3.0073 - val_angle_error: 35.8109
The script train_mnist.py is located here and it specifies 50 epochs. I get no error; the program simply stops after the 8th or 10th epoch. I am at a loss as to how to fix this. Any advice would be appreciated!

I took a quick look at the code. In it there is this line:
callbacks=[checkpointer, early_stopping, tensorboard]
By default, the early_stopping callback monitors the validation loss. It is configured so that training halts once the validation loss has failed to improve for 2 consecutive epochs. That is why it does not train for 50 epochs. If you want it to train for the full 50 epochs, remove early_stopping from the line of code above. You can confirm that early_stopping is what terminates training by changing the code in the script from
early_stopping = EarlyStopping(patience=2)
# change code to
early_stopping = EarlyStopping(patience=2, verbose=1)
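The patience rule can also be replayed by hand against the log in the question. The sketch below is a simplified re-implementation, not Keras' own code, assuming the defaults monitor='val_loss', minimizing mode and min_delta=0, and fed with the printed val_loss values.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the 1-based epoch after which training stops, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:  # improvement: remember it, reset the counter
            best = loss
            wait = 0
        else:                        # no improvement this epoch
            wait += 1
            if wait >= patience:
                return epoch
    return None


# val_loss values from the RotNet log, epochs 1-10.
val_losses = [4.1144, 3.4630, 3.2241, 3.3227, 3.1930,
              2.9600, 3.2243, 2.8629, 3.0085, 3.0073]
print(early_stopping_epoch(val_losses))  # 10
```

Epochs 9 and 10 both fail to beat the 2.8629 from epoch 8, so with patience=2 training stops after epoch 10, exactly where the log ends.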
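Judging from the training log, this model does not appear to be training very well. I suggest you try transfer learning with MobileNet. The code below shows how to use it:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adamax
# placeholders, adjust to your data: MobileNet needs 3-channel images of at least 32x32,
# so 28x28 grayscale MNIST digits would have to be resized and converted to 3 channels
img_size = 128
classes = list(range(360))  # e.g. one class per rotation angle, as in RotNet
lr = 0.001
mobile = tf.keras.applications.mobilenet.MobileNet(include_top=False, input_shape=(img_size, img_size, 3), pooling='max', weights='imagenet', dropout=.5)
x = mobile.layers[-1].output  # last layer of MobileNet: the global max pooling layer (pooling='max')
x = keras.layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001)(x)
x = Dense(126, activation='relu')(x)
x = Dropout(rate=.3, seed=123)(x)
predictions = Dense(len(classes), activation='softmax')(x)
model = Model(inputs=mobile.input, outputs=predictions)
for layer in model.layers:
    layer.trainable = True  # fine-tune all layers, including the pretrained ones
model.compile(Adamax(learning_rate=lr), loss='categorical_crossentropy', metrics=['accuracy'])
Adapt the above to your situation; it should work much better.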

LSTM, Exploding gradients or wrong approach?

I have a dataset of monthly user activity, segmented by country and browser. Each row is one day of summed-up user activity plus a score for that day's activity; for example, the number of sessions per day is one feature. The score is a floating-point number calculated from those daily features.
My goal is to try and predict the "average user" score at the end of the month using just 2 days of users data.
I have 25 months of data; some months are complete and some contain only part of the days. In order to have a fixed batch size, I've padded the sequences like so:
from keras.preprocessing.sequence import pad_sequences
padded_sequences = pad_sequences(sequences, maxlen=None, dtype='float64', padding='pre', truncating='post', value=-10.)
so sequences shorter than the longest one were padded with rows of -10.
I've decided to create an LSTM model to digest the data, so that at the end of each batch the model predicts the average user score. Later I'll try to predict using just a 2-day sample.
My model looks like this:
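For reference, a NumPy sketch of what that padding produces with padding='pre' and value=-10., and which timesteps the Masking(mask_value=-10.) layer in the model would then skip: Keras masks a timestep only when all of its features equal mask_value. pad_pre and the toy sequences are hypothetical stand-ins for pad_sequences and the real data.

```python
import numpy as np


def pad_pre(sequences, value=-10.0):
    """Left-pad variable-length (timesteps, features) arrays to the longest length."""
    maxlen = max(len(s) for s in sequences)
    n_features = sequences[0].shape[1]
    out = np.full((len(sequences), maxlen, n_features), value, dtype="float64")
    for i, s in enumerate(sequences):
        out[i, maxlen - len(s):] = s  # real rows go at the end ('pre' padding)
    return out


# Two toy "months": 3 days and 1 day, 2 features each.
seqs = [np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]), np.array([[7.0, 8.0]])]
padded = pad_pre(seqs)

# Timesteps the Masking layer would skip: every feature equals the mask value.
mask = np.all(padded == -10.0, axis=-1)
print(padded.shape)
print(mask)
```

A real day whose features all happen to equal -10 would be masked too, which is one reason to pick a mask value that cannot occur in the data.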
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout,Dense,Masking
from tensorflow.keras import metrics
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.optimizers import Adam
import datetime, os
model = Sequential()
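# NOTE: opt (with gradient clipping via clipnorm=1) is never used: compile() below
# receives the string 'adam', i.e. default Adam settings without clipping
opt = Adam(learning_rate=0.0001, clipnorm=1)
# despite its name, num_samples is the number of timesteps per sequence
num_samples = train_x.shape[1]
num_features = train_x.shape[2]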
model.add(Masking(mask_value=-10., input_shape=(num_samples, num_features)))
model.add(LSTM(64, return_sequences=True, activation='relu'))
model.add(Dropout(0.3))
#this is the last LSTM layer, use return_sequences=False
model.add(LSTM(64, return_sequences=False, stateful=False, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam' ,metrics=['acc',metrics.mean_squared_error])
logdir = os.path.join(logs_base_dir, datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = TensorBoard(log_dir=logdir, update_freq=1)
model.summary()
Model: "sequential_13"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
masking_5 (Masking) (None, 4283, 16) 0
_________________________________________________________________
lstm_20 (LSTM) (None, 4283, 64) 20736
_________________________________________________________________
dropout_14 (Dropout) (None, 4283, 64) 0
_________________________________________________________________
lstm_21 (LSTM) (None, 64) 33024
_________________________________________________________________
dropout_15 (Dropout) (None, 64) 0
_________________________________________________________________
dense_9 (Dense) (None, 1) 65
=================================================================
Total params: 53,825
Trainable params: 53,825
Non-trainable params: 0
_________________________________________________________________
While training, I get NaN values from epoch 18 on (validation loss first, then training loss):
Epoch 16/1000
16/16 [==============================] - 14s 855ms/sample - loss: 298.8135 - acc: 0.0000e+00 - mean_squared_error: 298.8135 - val_loss: 220.7307 - val_acc: 0.0000e+00 - val_mean_squared_error: 220.7307
Epoch 17/1000
16/16 [==============================] - 14s 846ms/sample - loss: 290.3051 - acc: 0.0000e+00 - mean_squared_error: 290.3051 - val_loss: 205.3393 - val_acc: 0.0000e+00 - val_mean_squared_error: 205.3393
Epoch 18/1000
16/16 [==============================] - 14s 869ms/sample - loss: 272.1889 - acc: 0.0000e+00 - mean_squared_error: 272.1889 - val_loss: nan - val_acc: 0.0000e+00 - val_mean_squared_error: nan
Epoch 19/1000
16/16 [==============================] - 14s 852ms/sample - loss: nan - acc: 0.0000e+00 - mean_squared_error: nan - val_loss: nan - val_acc: 0.0000e+00 - val_mean_squared_error: nan
Epoch 20/1000
16/16 [==============================] - 14s 856ms/sample - loss: nan - acc: 0.0000e+00 - mean_squared_error: nan - val_loss: nan - val_acc: 0.0000e+00 - val_mean_squared_error: nan
Epoch 21/1000
I tried to apply the methods described here with no real success.
Update:
I've changed my activation from relu to tanh, and that solved the NaN issue. However, the accuracy of my model stays at 0 while the loss goes down:
Epoch 100/1000
16/16 [==============================] - 14s 869ms/sample - loss: 22.8179 - acc: 0.0000e+00 - mean_squared_error: 22.8179 - val_loss: 11.7422 - val_acc: 0.0000e+00 - val_mean_squared_error: 11.7422
Q: What am I doing wrong here?

You are solving a regression task, using accuracy is not meaningful here.
Use mean_absolute_error to check whether your error decreases over time.
Instead of blindly predicting the score, you can bound the score to (0, 1).
Just use min-max normalization to bring the target into that range: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html
After that you can use a sigmoid activation in the last layer.
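A NumPy sketch of that idea, with made-up scores: scale the targets into [0, 1] for training with a sigmoid output, then map predictions back with the inverse transform. This is the same formula as MinMaxScaler with its default feature_range=(0, 1), written out by hand; the min and max should come from the training targets only.

```python
import numpy as np

scores = np.array([12.0, 35.5, 20.0, 48.0, 27.25])  # made-up target scores

# Fit on the training targets only.
lo, hi = scores.min(), scores.max()


def scale(y):
    return (y - lo) / (hi - lo)  # maps [lo, hi] onto [0, 1]


def unscale(y01):
    return y01 * (hi - lo) + lo  # inverse, applied to sigmoid outputs


scaled = scale(scores)
print(scaled.min(), scaled.max())
print(unscale(0.5))  # a sigmoid output of 0.5 expressed in score units
```

Since a sigmoid never reaches exactly 0 or 1, the extreme training scores can only be approached, not hit exactly.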
Also, 4283 timesteps is a rather long sequence for this simple model. How skewed are your sequence lengths?
Maybe plot a histogram of all the sequence lengths and check whether 4283 is, in fact, a good choice. Perhaps you can bring it down to something like 512, which may be easier for the model.
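A sketch of that check with NumPy, using made-up sequence lengths (in practice, the lengths of the original unpadded sequences): summarize the distribution as a text histogram and pick a maxlen that covers most sequences. Using the 95th percentile to turn the histogram into a number is my own suggestion, not something from the answer.

```python
import numpy as np

# Made-up lengths of the unpadded sequences.
lengths = np.array([300, 350, 380, 400, 420, 430, 450, 460, 470, 480,
                    490, 500, 510, 512, 520, 530, 560, 600, 1500, 4283])

counts, edges = np.histogram(lengths, bins=5)
for c, a, b in zip(counts, edges[:-1], edges[1:]):
    print(f"{a:7.0f}-{b:7.0f}: {c}")

maxlen = int(np.percentile(lengths, 95))
print("maxlen covering ~95% of sequences:", maxlen)
```

The chosen value could then be passed as pad_sequences(..., maxlen=maxlen); with truncating='post', longer sequences keep their first maxlen days.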
Also, padding with -10 seems a pretty odd choice: is it something specific to your data, or did you pick it arbitrarily? The -10 also suggests you're not normalizing your input data, which can become a problem for an LSTM with relu; you should probably normalize it before training.
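One way to do that normalization, sketched in NumPy under my own assumptions: compute per-feature mean and standard deviation from the real (unpadded) timesteps only, standardize those, and leave padded timesteps at the mask value so Masking still recognizes them. The array is made up; after standardization most feature values lie within a few units of zero, so -10 is unlikely to collide with real data.

```python
import numpy as np

MASK = -10.0
# Made-up padded batch: (sequences, timesteps, features); the first sequence is pre-padded.
x = np.array([
    [[MASK, MASK], [120.0, 3.0], [80.0, 5.0]],
    [[100.0, 4.0], [90.0, 2.0], [150.0, 6.0]],
])

real = ~np.all(x == MASK, axis=-1)  # True for real timesteps, shape (2, 3)
mean = x[real].mean(axis=0)         # per-feature statistics, padding excluded
std = x[real].std(axis=0) + 1e-8    # guard against division by zero

x_norm = x.copy()
x_norm[real] = (x[real] - mean) / std  # standardize only the real rows
print(np.round(x_norm, 3))
```

Validation and test data should reuse the mean and std computed on the training set.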
After these changes, if the performance is still not good, add a validation plot of the mean absolute error.

How to add Gaussian noise with varying std during training?

I am training a CNN using Keras and TensorFlow. I would like to add Gaussian noise to my input data during training and reduce the amount of noise in later steps. Right now I use:
from tensorflow.keras.layers import Input, GaussianNoise, BatchNormalization
inputs = Input(shape=x_train_n.shape[1:])
bn0 = BatchNormalization(axis=1, scale=True)(inputs)
g0 = GaussianNoise(0.5)(bn0)
The argument GaussianNoise takes is the standard deviation of the noise distribution, and I couldn't assign a dynamic value to it. How can I add noise and then decrease its standard deviation depending on the current epoch?

You can simply write a custom callback that changes the stddev before each epoch.
Reference:
https://www.tensorflow.org/api_docs/python/tf/keras/layers/GaussianNoise
https://www.tensorflow.org/guide/keras/custom_callback
import random

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, GaussianNoise, BatchNormalization
from tensorflow.keras.models import Model

inputs = Input(shape=(100,))
bn0 = BatchNormalization(axis=1, scale=True)(inputs)
g0 = GaussianNoise(0.5)(bn0)
d0 = Dense(10)(g0)
model = Model(inputs, d0)
model.compile('adam', 'mse')
model.summary()

class MyCustomCallback(tf.keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        # model.layers[2] is the GaussianNoise layer (0: Input, 1: BatchNormalization)
        self.model.layers[2].stddev = random.uniform(0, 1)
        print('updating sttdev in training')
        print(self.model.layers[2].stddev)

X_train = np.zeros((10, 100))
y_train = np.zeros((10, 10))
noise_change = MyCustomCallback()
model.fit(X_train,
          y_train,
          batch_size=32,
          epochs=5,
          callbacks=[noise_change])
Model: "model_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_6 (InputLayer) [(None, 100)] 0
_________________________________________________________________
batch_normalization_5 (Batch (None, 100) 400
_________________________________________________________________
gaussian_noise_5 (GaussianNo (None, 100) 0
_________________________________________________________________
dense_5 (Dense) (None, 10) 1010
=================================================================
Total params: 1,410
Trainable params: 1,210
Non-trainable params: 200
_________________________________________________________________
Epoch 1/5
updating sttdev in training
0.984045691131548
1/1 [==============================] - 0s 1ms/step - loss: 1.6031
Epoch 2/5
updating sttdev in training
0.02821459469022025
1/1 [==============================] - 0s 742us/step - loss: 1.5966
Epoch 3/5
updating sttdev in training
0.6102984511769268
1/1 [==============================] - 0s 1ms/step - loss: 1.8818
Epoch 4/5
updating sttdev in training
0.021155188690323512
1/1 [==============================] - 0s 1ms/step - loss: 1.2032
Epoch 5/5
updating sttdev in training
0.35950227285165115
1/1 [==============================] - 0s 2ms/step - loss: 1.8817
<tensorflow.python.keras.callbacks.History at 0x7fc67ce9e668>
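One caveat on this approach, offered tentatively: GaussianNoise keeps stddev as a plain Python float, and model.fit runs a traced graph function, so the value that was current when that function was first traced is likely baked in. Reassigning the attribute then changes what gets printed but possibly not the noise actually applied (compiling with run_eagerly=True would sidestep this, at a speed cost). The sketch below avoids the question by keeping the standard deviation in a non-trainable weight, which the traced graph reads on every step. VariableGaussianNoise and NoiseDecay are hypothetical names, and the linear decay from 0.5 to 0 is just an example schedule.

```python
import numpy as np
import tensorflow as tf


class VariableGaussianNoise(tf.keras.layers.Layer):
    """GaussianNoise-like layer whose stddev is a non-trainable weight."""

    def __init__(self, stddev, **kwargs):
        super().__init__(**kwargs)
        self.stddev = self.add_weight(
            name="stddev",
            shape=(),
            initializer=tf.keras.initializers.Constant(stddev),
            trainable=False,
        )

    def call(self, inputs, training=None):
        if training:  # add noise only during training, like GaussianNoise
            return inputs + tf.random.normal(tf.shape(inputs), stddev=self.stddev)
        return inputs


class NoiseDecay(tf.keras.callbacks.Callback):
    """Linearly lowers the layer's stddev from start to end across the epochs."""

    def __init__(self, layer, start, end, epochs):
        super().__init__()
        self.layer, self.start, self.end, self.epochs = layer, start, end, epochs
        self.values = []

    def on_epoch_begin(self, epoch, logs=None):
        frac = epoch / max(self.epochs - 1, 1)
        value = self.start + (self.end - self.start) * frac
        self.layer.stddev.assign(value)  # a variable update, visible to the traced graph
        self.values.append(value)


inputs = tf.keras.Input(shape=(100,))
x = tf.keras.layers.BatchNormalization()(inputs)
noise = VariableGaussianNoise(0.5)
x = noise(x)
outputs = tf.keras.layers.Dense(10)(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")

decay = NoiseDecay(noise, start=0.5, end=0.0, epochs=5)
model.fit(np.zeros((10, 100)), np.zeros((10, 10)), epochs=5, verbose=0, callbacks=[decay])
print(decay.values, float(noise.stddev.numpy()))
```

Any other schedule (exponential decay, step decay) only needs a different formula for value in on_epoch_begin.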
