Constant Output and Prediction Syntax with LSTM Keras Network - python

I am new to neural networks and have two, probably pretty basic, questions. I am setting up a generic LSTM network to predict the future of a sequence, based on multiple features.
My training data is therefore of the shape (number of training sequences, length of each sequence, number of features for each timestep).
Or to make it more specific, something like (2000, 10, 3).
I try to predict the value of one feature, not of all three.
Problem:
If I make my network deeper and/or wider, the only output I get is the constant mean of the values to be predicted. Take this setup for example:
from keras.layers import Input, LSTM, Dense
from keras.models import Model
from keras.callbacks import ReduceLROnPlateau, EarlyStopping

z0 = Input(shape=[None, len(dataset[0])])
z = LSTM(32, return_sequences=True, activation='softsign', recurrent_activation='softsign')(z0)
z = LSTM(32, return_sequences=True, activation='softsign', recurrent_activation='softsign')(z)
z = LSTM(64, return_sequences=True, activation='softsign', recurrent_activation='softsign')(z)
z = LSTM(64, return_sequences=True, activation='softsign', recurrent_activation='softsign')(z)
z = LSTM(128, activation='softsign', recurrent_activation='softsign')(z)
z = Dense(1)(z)
model = Model(inputs=z0, outputs=z)
print(model.summary())
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
history = model.fit(trainX, trainY, validation_split=0.1, epochs=200, batch_size=32,
                    callbacks=[ReduceLROnPlateau(factor=0.67, patience=3, verbose=1, min_lr=1E-5),
                               EarlyStopping(patience=50, verbose=1)])
If I just use one layer, like:
z0 = Input(shape=[None, len(dataset[0])])
z = LSTM(4, activation='softsign', recurrent_activation='softsign')(z0)
z = Dense(1)(z)
model = Model(inputs=z0, outputs=z)
print(model.summary())
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
history = model.fit(trainX, trainY, validation_split=0.1, epochs=200, batch_size=32,
                    callbacks=[ReduceLROnPlateau(factor=0.67, patience=3, verbose=1, min_lr=1E-5),
                               EarlyStopping(patience=200, verbose=1)])
The predictions are somewhat reasonable, at least they are not constant anymore.
Why does that happen? Around 2000 samples is not that many, but in the case of overfitting I would expect the predictions to match the training data perfectly...
EDIT: Solved, as stated in the comments: it's just that Keras always expects batches.
When I use:
`test=model.predict(trainX[0])`
to get the prediction for the first sequence, I get a dimension error:
"Error when checking : expected input_1 to have 3 dimensions, but got array with shape (3, 3)"
I need to feed in an array of sequences like:
`test=model.predict(trainX[0:1])`
This is a workaround, but I am not really sure whether this has any deeper meaning or is just a syntax thing...
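For reference, slicing keeps the batch axis, or you can add it back explicitly; a minimal sketch, assuming trainX is a NumPy array:

import numpy as np

test = model.predict(trainX[0:1])                        # slicing keeps the batch axis
test = model.predict(np.expand_dims(trainX[0], axis=0))  # equivalent: add the axis explicitly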

This is because you have not normalised your input data.
Any neural network model will initially have its weights initialised around zero. Since your training dataset has all positive values, the model will try to adjust its weights to predict only positive values. However, the activation function (in your case softsign) saturates, squashing large values towards 1, so there is little the model can do except adjust the bias. That is why you are getting an almost constant line around the average value of the dataset.
To fix this, you can use a general tool like sklearn to pre-process your data. If you are using a pandas DataFrame, something like this will help:
data_df = (data_df - data_df.mean()) / data_df.std()
Or, to keep the normalisation parameters inside the model, you can consider adding a batch normalization layer to your model.
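As a sketch of the sklearn route, assuming trainX has the (2000, 10, 3) shape from the question (the testX array is hypothetical):

from sklearn.preprocessing import StandardScaler

n_samples, seq_len, n_features = trainX.shape
scaler = StandardScaler()
# fit on the training data only (flattened to 2D), then restore the 3D shape
trainX = scaler.fit_transform(trainX.reshape(-1, n_features)).reshape(n_samples, seq_len, n_features)
testX = scaler.transform(testX.reshape(-1, n_features)).reshape(testX.shape)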

Related

Transfer Learning model with Keras: using other metrics than accuracy

I'm working on a binary classification model for leaves from the Swedish leaves data and thought Transfer Learning could be practical. I found this tutorial, but in the compile function, I want to use different metrics than accuracy. When I try to get AUC or FP/FN/TP/TN, a ValueError is raised, claiming that the shape of the true y, (None, 1), and the shape of y_pred, (None, 2), are incompatible.
I fail to understand:
why would y_pred have this shape?
how can the accuracy be calculated, but not the parts of the confusion matrix?!
A solution without a reasoned explanation is also very welcome :)
import tensorflow as tf
import tensorflow_hub as hub

feature_extractor_model = "https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4"
pretrained_model_without_top_layer = hub.KerasLayer(
    feature_extractor_model, input_shape=(224, 224, 3), trainable=False)

classes_num = 2
model = tf.keras.Sequential([
    pretrained_model_without_top_layer,
    tf.keras.layers.Dense(classes_num)
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[['acc'], [tf.keras.metrics.TruePositives(), tf.keras.metrics.FalsePositives(),
                       tf.keras.metrics.TrueNegatives(), tf.keras.metrics.FalseNegatives()]])
model.fit(X_train_scaled, y_train, steps_per_epoch=9, epochs=5)
If you have two classes (e.g. cats and dogs), you could either encode them sparsely as zero or one, or one-hot as [0,1] and [1,0].
Your training data is sparsely encoded, so your loss is SparseCategoricalCrossentropy. Metrics are functionally just losses, so any metric you use also needs to accept sparse labels. In your case, just write a "custom" metric that accepts a sparse y_true, one-hots it, and passes it on to the recall/precision/etc. metric function.
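A minimal sketch of such a wrapper, assuming two classes and logit outputs as in the question (the class name SparseWrapper is hypothetical):

import tensorflow as tf

class SparseWrapper(tf.keras.metrics.Metric):
    # one-hots sparse labels and softmaxes logits before delegating
    def __init__(self, metric, num_classes=2, **kwargs):
        super().__init__(name='sparse_' + metric.name, **kwargs)
        self.metric = metric
        self.num_classes = num_classes

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = tf.one_hot(tf.cast(tf.reshape(y_true, [-1]), tf.int32), self.num_classes)
        self.metric.update_state(y_true, tf.nn.softmax(y_pred), sample_weight)

    def result(self):
        return self.metric.result()

    def reset_state(self):
        self.metric.reset_state()

# usage: metrics=['acc', SparseWrapper(tf.keras.metrics.TruePositives())]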

Simple neural network refuses to overfit

I wrote this super simple piece of code
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(1, input_dim=d, activation='linear'))
model.compile(loss='mse', optimizer='adam')
model.fit(X_train, y_train, epochs=10000, batch_size=n)
test_mse = model.evaluate(X_test, y_test)
print('test mse is {}'.format(test_mse))
X_train is an n by d numpy matrix and y_train is an n by 1 numpy matrix.
This is basically the simplest linear neural network you could think of. One layer, input dimension is d, and we output a number.
It simply refuses to overfit. Even after running an insane number of iterations (10k as you can see), the training loss stays at around 0.17.
I expect the loss to be zero. Why do I expect that? Because in my case, d is much greater than n, so I have far more degrees of freedom than constraints. And as a further piece of evidence, when I actually solve X_train @ w = y_train using numpy.linalg.lstsq, the max absolute value of X_train @ w - y_train is something like 1e-14.
So this system is definitely solvable. I expected to see zero loss or very close to zero loss, but I don't. Why?
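A minimal sketch of that sanity check, with hypothetical sizes chosen so that d >> n:

import numpy as np

n, d = 50, 500                      # hypothetical: many more unknowns than equations
X_train = np.random.randn(n, d)
y_train = np.random.randn(n, 1)

# with d > n, an exact interpolating w exists generically
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
print(np.abs(X_train @ w - y_train).max())  # on the order of 1e-14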

In Tensorflow2.0, how do I implement a trainable set of Sample Weights?

In summary, my problem stems from having a set of sample_weights that should be trained along with the other weights/variables while fitting my network. I can't seem to figure out a way to update the values of my weight vector during training.
So, to simplify some of the code (this is not the full actual model, just a simpler version for simplicity's sake):
def initialize_model():
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu"))
    model.add(tf.keras.layers.Dropout(0.25))
    model.add(tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu"))
    model.add(tf.keras.layers.MaxPool1D(pool_size=2))
    model.add(tf.keras.layers.Dropout(0.25))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(128))
    model.add(tf.keras.layers.Dense(10, activation="softmax"))
    return model
I pretrain this model on my data (X: (, 28, 28), Y: (,10)), then augment the dataset with new generated virtual samples. As this isn't important to the problem, I won't include that portion of the code. But, once I have my new augmented dataset (X': (, 28, 28), Y': (,10)), I want to continue the training, but with a sample_weight vector:
W = tf.Variable(tf.random.uniform([len(X_s)], minval=0, maxval=1, dtype="float32"), trainable=True)
W = W / tf.math.reduce_sum(W) # to enforce sum(w_i) = 1
The training is handled like so:
classifier.fit(X', Y', batch_size=800, epochs=10, validation_split=0.2, sample_weight=W)
This does indeed weight each sample properly, but I want these weights to change over the course of training. I've tried researching this, but I can't seem to find a solution that worked.
Thank you very much in advance - I've been struggling with this for a while now. I have tried:
classifier.layers[-1].trainable_weights.extend([W])
As well as using tf.GradientTape, but nothing seems to work.
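One possible sketch with a custom training step: the softmax reparameterisation below is my assumption, chosen so the weights stay positive and sum to 1 while remaining trainable (note that W = W / tf.math.reduce_sum(W) replaces the Variable with a plain tensor, which is one reason Keras never updates it):

import tensorflow as tf

classifier = initialize_model()
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.CategoricalCrossentropy(reduction=tf.keras.losses.Reduction.NONE)

# trainable logits; softmax yields positive weights that sum to 1
w_logits = tf.Variable(tf.zeros([len(X_s)]), trainable=True)

@tf.function
def train_step(idx, x_batch, y_batch):
    with tf.GradientTape() as tape:
        w = tf.gather(tf.nn.softmax(w_logits), idx)      # weights for this batch
        per_sample = loss_fn(y_batch, classifier(x_batch, training=True))
        loss = tf.reduce_sum(w * per_sample)
    variables = classifier.trainable_variables + [w_logits]
    optimizer.apply_gradients(zip(tape.gradient(loss, variables), variables))
    return loss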

How to choose dimensionality of the Dense layer in LSTM?

I have a task of multi-label text classification. My dataset has 1369 classes:
# data shape
print(X_train.shape)
print(X_test.shape)
print(Y_train.shape)
print(Y_test.shape)
(54629, 500)
(23413, 500)
(54629, 1369)
(23413, 1369)
For this task, I've decided to use an LSTM NN with the following parameters:
# define model
maxlen = 400
inp = Input(shape=(maxlen, ))
embed_size = 128
x = Embedding(max_features, embed_size)(inp)
x = LSTM(60, return_sequences=True, name='lstm_layer')(x)
x = GlobalMaxPool1D()(x)
x = Dropout(0.1)(x)
x = Dense(2000, activation="relu")(x)
x = Dropout(0.1)(x)
x = Dense(1369, activation="sigmoid")(x)
model = Model(inputs=inp, outputs=x)
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

batch_size = 32
epochs = 2
model.fit(X_train, Y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)
Question: Are there any scientific methods for determining the Dense and LSTM dimensionality (in my example, the LSTM dimension is 60, the first Dense dimension is 2000, and the second Dense dimension is 1369)?
If there are no scientific methods, maybe there are some heuristics or tips on how to do this for data of similar dimensions.
I chose these parameters randomly. I would like to improve the accuracy of the model and to approach similar problems correctly.
I have heard that optimizing hyperparameters is an NP-hard problem; even if there is a better way to do it, it may not be worth it for your project given the overhead cost.
For the dimension of the LSTM layer, I have heard some empirically well-working numbers from conference talks, such as 128 or 256 units and 3 stacked layers. If you plot your loss during training and see it decrease dramatically in the first several epochs but then stop decreasing, you may want to increase the capacity of your model, i.e. make it either deeper or wider. Otherwise, use as few parameters as possible.
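A minimal sketch of that loss plot, assuming the model and data from the question:

import matplotlib.pyplot as plt

history = model.fit(X_train, Y_train, batch_size=32, epochs=20, validation_split=0.1)

# if both curves flatten early, consider a deeper or wider model
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='validation')
plt.xlabel('epoch')
plt.ylabel('binary cross-entropy')
plt.legend()
plt.show()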
For the dimension of the dense layer: if your task is many-to-many, meaning you have a label of a certain dimension, then the final dense layer has to have that same dimension as its number of units (in your case, 1369).

Keras Masking for RNN with Varying Time Steps

I'm trying to fit an RNN in Keras using sequences that have varying time lengths. My data is in a Numpy array with format (sample, time, feature) = (20631, max_time, 24) where max_time is determined at run-time as the number of time steps available for the sample with the most time stamps. I've padded the beginning of each time series with 0, except for the longest one, obviously.
I've initially defined my model like so...
from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense, Activation
from keras.optimizers import RMSprop

model = Sequential()
model.add(Masking(mask_value=0., input_shape=(max_time, 24)))
model.add(LSTM(100, input_dim=24))
model.add(Dense(2))
model.add(Activation(activate))
model.compile(loss=weibull_loglik_discrete, optimizer=RMSprop(lr=.01))
model.fit(train_x, train_y, nb_epoch=100, batch_size=1000, verbose=2, validation_data=(test_x, test_y))
For completeness, here's the code for the loss function:
from keras import backend as k

def weibull_loglik_discrete(y_true, ab_pred, name=None):
    y_ = y_true[:, 0]
    u_ = y_true[:, 1]
    a_ = ab_pred[:, 0]
    b_ = ab_pred[:, 1]
    hazard0 = k.pow((y_ + 1e-35) / a_, b_)
    hazard1 = k.pow((y_ + 1) / a_, b_)
    return -1 * k.mean(u_ * k.log(k.exp(hazard1 - hazard0) - 1.0) - hazard1)
And here's the code for the custom activation function:
def activate(ab):
    a = k.exp(ab[:, 0])
    b = k.softplus(ab[:, 1])
    a = k.reshape(a, (k.shape(a)[0], 1))
    b = k.reshape(b, (k.shape(b)[0], 1))
    return k.concatenate((a, b), axis=1)
When I fit the model and make some test predictions, every sample in the test set gets exactly the same prediction, which seems fishy.
Things get better if I remove the masking layer, which makes me think there's something wrong with the masking layer, but as far as I can tell, I've followed the documentation exactly.
Is there something mis-specified with the masking layer? Am I missing something else?
The way you implemented masking should be correct. If you have data with the shape (samples, timesteps, features), and you want to mask timesteps lacking data with a zero mask of the same size as the features argument, then you add Masking(mask_value=0., input_shape=(timesteps, features)). See here: keras.io/layers/core/#masking
Your model could potentially be too simple, and/or your number of epochs could be insufficient for the model to differentiate between all of your classes. Try this model:
model = Sequential()
model.add(Masking(mask_value=0., input_shape=(max_time, 24)))
model.add(LSTM(256, input_dim=24))
model.add(Dense(1024))
model.add(Dense(2))
model.add(Activation(activate))
model.compile(loss=weibull_loglik_discrete, optimizer=RMSprop(lr=.01))
model.fit(train_x, train_y, nb_epoch=100, batch_size=1000, verbose=2, validation_data=(test_x, test_y))
If that does not work, try doubling the epochs a few times (e.g. 200, 400) and see if that improves the results.
I could not validate without actual data, but I had a similar experience with an RNN. In my case normalization solved the issue. Add a normalization layer to your model.
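A minimal sketch of that suggestion, reusing the layers from the question (whether BatchNormalization propagates the mask correctly depends on the Keras version, so this is worth verifying on padded samples):

from keras.models import Sequential
from keras.layers import Masking, BatchNormalization, LSTM, Dense, Activation

model = Sequential()
model.add(Masking(mask_value=0., input_shape=(max_time, 24)))
model.add(BatchNormalization())   # normalizes each feature across the batch
model.add(LSTM(100))
model.add(Dense(2))
model.add(Activation(activate))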
