Recently I've been using an LSTM to predict time series. I'm using Keras 2.0 to construct my LSTM model, which has a structure like this:
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

model = Sequential()
model.add(LSTM(128, input_shape=(timesteps, 1), return_sequences=False, stateful=False))
model.add(Dropout(rate=0.1))
model.add(Dense(1))
I have tried to use this network to predict several time series, including sin(t) and a real traffic flow dataset. I found that the prediction for sin(t) is fine, while the prediction for the real dataset looks just like the last input value shifted forward by one step. I don't know whether this is a prediction error or whether the network doesn't learn the pattern of the dataset at all. Has anyone seen similar results? Are there any solutions to this annoying shift? Thanks a lot.
Here are some of my predictions:
[Figure: 3-frequency sin prediction result]
[Figure: real traffic dataset prediction result]
This is simply the starting point for your network and you'll have to work through it by trying various things.
To name only a few:
Try different window lengths (timesteps fed into network)
Try adding dense layers, or multiple LSTM layers, or fewer LSTM nodes
Try different optimizers, with various learning rates
Look for additional datapoints to feed into the network
How much data do you have? You may need more to get a good prediction
Try different offsets for the Y variable: how many timesteps out do you need to predict for your specific problem?
The list goes on....
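As a rough sketch of a few of these ideas combined (the window length, stacked-LSTM sizes, and learning rate below are illustrative placeholders to tune, not recommendations):

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from keras.optimizers import Adam

timesteps = 50  # experiment with different window lengths
model = Sequential()
model.add(LSTM(64, input_shape=(timesteps, 1), return_sequences=True))  # stacked LSTM layers
model.add(LSTM(32))
model.add(Dropout(0.1))
model.add(Dense(1))
model.compile(loss='mse', optimizer=Adam(lr=1e-3))  # try different optimizers / learning rates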
I am trying to create a Keras neural network to predict the driving distance between two points in a city. I use Google Maps to get the travel distance and then train the neural network on that data.
import pandas as pd

arr = []
for i in range(0, 100):
    arr.append(generateTwoPoints(55.901819, 37.344735, 55.589537, 37.832254))

df = pd.DataFrame(arr, columns=['p1Lat', 'p1Lon', 'p2Lat', 'p2Lon', 'distnaceInMeters', 'timeInSeconds'])
print(df)
Neural network architecture:
from keras.optimizers import SGD
from keras.models import Sequential
from keras.layers import Dense, Activation

sgd = SGD(lr=0.00000001)

model = Sequential()
model.add(Dense(100, input_dim=4, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mse', optimizer='sgd', metrics=['mse'])
Then I split the data into train/test sets:
Xtrain=train[['p1Lat','p1Lon','p2Lat','p2Lon']]/100
Ytrain=train[['distnaceInMeters']]/100000
Xtest=test[['p1Lat','p1Lon','p2Lat','p2Lon']]/100
Ytest=test[['distnaceInMeters']]/100000
Then I fit the data to the model, but the loss stays the same:
history = model.fit(Xtrain, Ytrain,
                    batch_size=1,
                    epochs=1000,
                    # pass validation data to monitor validation
                    # loss and metrics at the end of each epoch
                    validation_data=(Xtest, Ytest))
Then I print the predictions alongside the true values:
prediction = model.predict(Xtest)
print(prediction)
print (Ytest)
But the result is essentially the same for all inputs:
[[0.26150784]
[0.26171574]
[0.2617755 ]
[0.2615582 ]
[0.26173398]
[0.26166356]
[0.26185763]
[0.26188275]
[0.2614446 ]
[0.2616575 ]
[0.26175532]
[0.2615183 ]
[0.2618127 ]]
distnaceInMeters
2 0.13595
6 0.27998
7 0.48849
16 0.36553
21 0.37910
22 0.40176
33 0.09173
39 0.24542
53 0.04216
55 0.38212
62 0.39972
64 0.29153
87 0.08788
I cannot find the problem. What is it? I am new to machine learning.
You are making a very elementary mistake: since you are in a regression setting, you should not use a sigmoid activation for your final layer (that is used for binary classification cases); change your last layer to
model.add(Dense(1,activation='linear'))
or even
model.add(Dense(1))
since, according to the docs, if you do not specify the activation argument it defaults to linear.
Various other advice offered already in the other answer and the comments may be useful (lower LR, more layers, other optimizers e.g. Adam), and you certainly need to increase your batch size; but nothing will work with the sigmoid activation function you currently use for your last layer.
Irrelevant to the issue, but in regression settings you don't need to repeat your loss function as a metric; this
model.compile(loss='mse', optimizer='sgd')
will suffice.
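Putting the pieces together, a minimal corrected sketch could look like this (layer sizes, batch size, and epochs are illustrative, not tuned values):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(100, input_dim=4, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(1))  # linear output for regression
model.compile(loss='mse', optimizer='adam')

history = model.fit(Xtrain, Ytrain,
                    batch_size=32,  # larger batch than 1
                    epochs=1000,
                    validation_data=(Xtest, Ytest))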
It would be very useful if you could post the progression of the loss and MSE (for both the training and validation/test sets) throughout training. Even better, visualize it as per https://machinelearningmastery.com/display-deep-learning-model-training-history-in-keras/ and post the visualization here.
In the meantime, based on the facts:
1) You say the loss isn't decreasing (I'm assuming on the training set, during training, based on your compile args).
2) You say that the prediction "accuracy" on your test set is bad.
3) My experience/intuition (not an empirical assessment) tells me that your two-layer dense model is a little too small to capture the complexity inherent in your data, i.e. your model is suffering from high bias: https://towardsdatascience.com/understanding-the-bias-variance-tradeoff-165e6942b229
The fastest and easiest thing to try is adding both more layers and more nodes per layer.
However, I should note that there is a lot of causal information that can affect the driving distance and driving time beyond just the distance between the two coordinates, which is probably the feature your neural network will most readily extract. For example: whether you drive on a highway or on side streets, traffic lights, whether the roads twist and turn or go straight... To infer all of that from just this data, you will need enormous amounts of data (examples), in my opinion. If you could add input columns with, for example, the distance to the nearest highway from both points, you might be able to train with less data.
I would also recommend that you double-check that you are feeding as input what you think you are feeding (and its shape), and that you apply some standardization, e.g. from sklearn, which might help the model learn and converge faster to a higher "accuracy".
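For the standardization step, a minimal sketch with sklearn's StandardScaler (fit on the training inputs only, then reused on the test set) could be:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
Xtrain_std = scaler.fit_transform(Xtrain)  # learn mean/std from training data only
Xtest_std = scaler.transform(Xtest)        # apply the same scaling to the test data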
If and when you post more code or the training history (and how many training samples you have), I can help you more.
EDIT 1: Try changing the batch size to a larger number, preferably batch_size=32, if it fits in your memory. You can use a small batch size (such as 1) when working with an "info-rich" input like an image, but when using a very "info-poor" datum like 4 floats (2 coordinates), the gradient for each batch (with batch_size=1) points in a practically (pseudo-)random direction and does not necessarily get any closer to a local minimum. Only when taking the gradient on the collective loss of a larger batch (like 32, and perhaps more) will you get a gradient that points at least approximately in the direction of the local minimum and converge to a better result. Also, I suggest that you don't tune the learning rate manually and instead switch to an optimizer like "adam" or "RMSProp".
EDIT 2: @Desertnaut made an excellent point that I totally missed, a correction without which your code will not work properly. He deserves the credit, so I will not include it here; please refer to his answer. Also, don't forget to raise your batch size and don't "manually mess" with your learning rate; "adam", for example, will handle it for you.
This is a univariate time series prediction problem. As the following code shows, I divide the initial data into a train dataset (trainX) and a test dataset (testX), then I create an LSTM network with Keras and train the model on the train dataset. However, when I want to get a prediction, I need to supply the test values, so my question is: why do I have to predict at all, since I already know the true values (the test dataset)? What I actually want is the predicted value at future times. If I have some misunderstanding about LSTM networks, please tell me.
Thank you!
# create and fit the LSTM network
model = Sequential()
model.add(LSTM(4, input_shape=(1, look_back)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=100, batch_size=1, verbose=2)
# make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
Since we don't have the future values while training the model, we just divide the data into train and test sets. Then we treat the test set as if it were the future values. We train our model using the train set (and usually also a validation set), and after the model is trained we evaluate it on the test set to check the model's performance.
why do I have to predict since I have known the true value which is test dataset in this problem. What I want to get is the prediction value of future time?
In ML, we give the model test data X and it returns Y. In the case of time series this can mislead a beginner a bit, because the input and the output apparently come from the same series: the difference is that we input old values of the time series as X, and the output Y is a value of the same time series but at a later point, i.e. we are predicting the future (the same setup can also be applied to the present or even the past), as you have identified correctly.
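As a small illustration of this windowing idea (a sketch assuming a 1-D numpy array called series and a look_back window size; these names are not from the question):

import numpy as np

def make_windows(series, look_back):
    # X holds the past look_back values, y holds the value one step ahead
    X, y = [], []
    for i in range(len(series) - look_back):
        X.append(series[i:i + look_back])
        y.append(series[i + look_back])
    return np.array(X), np.array(y)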
(P.S.: I would recommend you begin with simple regression and then move on to LSTMs etc., if all you want is to learn machine learning.)
I think the correct term in this context is 'Forecasting'.
A good explanation is: after you train and test your model with the data that you already have (as the others said before me), you want to predict future data, which is, I think, the truly interesting thing about recurrent networks.
So in order to do this, you need to start by predicting the value one step after the final date in your original dataset, using the model (which was trained on this past data). Once you predict this value, you do the same thing again, but now including the last predicted value in the input, and so on.
The fact that you are using predictions to make further predictions means it is much harder to get good results, so it is common to only forecast short ranges of time.
The exact code you need could vary, but I think that is the core concept.
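A minimal sketch of that recursive loop, assuming a model trained with input shape (1, look_back) as in the question and an array history holding the last observed values (the variable names are illustrative):

import numpy as np

window = list(history[-look_back:])  # last known values
forecast = []
for _ in range(n_future_steps):
    x = np.array(window[-look_back:]).reshape(1, 1, look_back)
    yhat = model.predict(x)[0, 0]    # one-step-ahead prediction
    forecast.append(yhat)
    window.append(yhat)              # feed the prediction back in as input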
In the link below, in the last part, where a forecast is performed, the author shows code and an explanation of how he did it.
https://towardsdatascience.com/time-series-forecasting-with-recurrent-neural-networks-74674e289816
I guess that's it.
I'm looking for instructions on how to make a regression time series prediction using a CNN. I want to implement multi-step prediction for a univariate time series. I have read a few tutorials but found nothing suitable for my dataset: one feature and around 400 observations.
Does anyone know an easily understandable and applicable code example for such a time series?
I would be very grateful for any help,
Leon
Using CNNs for sequence data can be a bit tricky to set up. In my experience, CNNs achieve results near RNNs (GRUs and LSTMs) but CNNs are far faster to compute.
First, make sure your data is shaped the way Conv1D expects: (instances, time steps, predictors).
X_cnn = X.reshape(X.shape[0], X.shape[1] // predictors, predictors)
Then, the syntax is:
from keras.models import Sequential
from keras import layers

model_cnn = Sequential()
model_cnn.add(layers.Conv1D(A, B, activation='relu',
                            input_shape=(X_cnn.shape[1], X_cnn.shape[2])))
model_cnn.add(layers.Flatten())
model_cnn.add(layers.Dense(1))
Where A is the number of neurons, and B is the number of time steps to consider. Note the Flatten() layer after the Conv1D layer. This should hopefully get you started.
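For the multi-step case you asked about, one common variant (a sketch, not part of the answer above) is to let the final Dense layer emit the whole forecast horizon at once, with y shaped (instances, n_steps_out):

from keras.models import Sequential
from keras import layers

n_steps_out = 5  # illustrative forecast horizon

model_multi = Sequential()
model_multi.add(layers.Conv1D(64, 3, activation='relu',
                              input_shape=(X_cnn.shape[1], X_cnn.shape[2])))
model_multi.add(layers.Flatten())
model_multi.add(layers.Dense(n_steps_out))  # one output per future step
model_multi.compile(loss='mse', optimizer='adam')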
I'm a beginner in the field of deep learning. I'm trying to use Keras for an LSTM in a regression problem. I would like to build an ANN that can exploit the memory cell between one prediction and the next.
In more detail... I have a neural network (Keras) with two LSTM hidden layers and one output layer for a regression context.
The batch_size is equal to 7, the timestep equal to 1, and I have 5749 samples.
I'm only interested in understanding whether using timestep == 1 is the same thing as using an MLP instead of an LSTM. By time_step I'm referring to the reshape phase for the input of the Sequential model in Keras. The output is a single regression.
I'm not interested in the previous inputs, but only in the output of the network as information for the next prediction.
Thank you in advance!
You can say so :)
You're right in thinking that you won't have any recurrency anymore.
But internally there will still be more operations than in regular Dense layers, due to the existence of more kernels.
But be careful:
If you use stateful=True, it will still be a recurrent LSTM (see the sketch below)!
If you use initial states properly, you can still make it recurrent.
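A minimal sketch of that stateful setup with timesteps == 1 (n_features, X, y, the layer sizes, and the number of epochs are placeholders; only batch_size=7 comes from your description):

from keras.models import Sequential
from keras.layers import LSTM, Dense

batch_size = 7
model = Sequential()
model.add(LSTM(32, batch_input_shape=(batch_size, 1, n_features),
               stateful=True, return_sequences=True))
model.add(LSTM(16, stateful=True))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')

# with stateful=True the cell state carries over between batches,
# so reset it manually between epochs (and note that the number of
# samples must be divisible by batch_size):
for epoch in range(n_epochs):
    model.fit(X, y, batch_size=batch_size, epochs=1, shuffle=False)
    model.reset_states()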
If you're interested in creating custom operations with the memory/state of the cells, you could try creating your custom recurrent cell taking the LSTMCell code as a template.
Then you'd use that cell in a RNN(CustomCell, ...) layer.
I am trying to solve a time series prediction problem. I tried ANNs and LSTMs and played around a lot with the various parameters, but all I could get was 8% better than the persistence prediction.
So I was wondering: since you can save models in Keras, are there any pre-trained models (LSTM, RNN, or any other ANN) for time series prediction? If so, how do I get them? Are there any in Keras?
I mean, it would be super useful if there were a website containing pre-trained models, so that people wouldn't have to spend too much time training them...
Similarly, another question:
Is it possible to do the following?
1. Suppose I have a dataset now and I use it to train my model. Suppose that in a month I will have access to another dataset (corresponding to the same or similar data, possibly from the future, but not exclusively). Will it be possible to continue training the model then? It is not the same thing as training in batches; when you train in batches, you have all the data available at once.
Is it possible? And how?
I'll answer your last questions first.
Will it be possible to continue training the model then? It is not the same thing as training in batches; when you train in batches, you have all the data available at once. Is it possible? And how?
Yes, it is possible. In general, it's called transfer learning. But keep in mind that if two datasets represent very different populations, the network will soon "forget" what it learned on the first run and will optimize to the second one. To do this, you simply start training from a loaded state instead of random initialization and save the model afterwards. It is also recommended to use a smaller learning rate on the second run in order to adapt it gradually to the new data.
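A minimal sketch of that continue-training workflow in Keras (file names, the learning rate, and the X/y variables are placeholders):

from keras.models import load_model
from keras.optimizers import Adam

# first run: train on the initial dataset and save the model
model.fit(X_old, y_old, epochs=50, batch_size=32)
model.save('model_v1.h5')

# a month later: load the saved state and keep training on the new data,
# with a smaller learning rate to adapt gradually
model = load_model('model_v1.h5')
model.compile(loss='mse', optimizer=Adam(lr=1e-4))
model.fit(X_new, y_new, epochs=20, batch_size=32)
model.save('model_v2.h5')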
are there any pre-trained models (LSTM, RNN, or any other ANN) for time series prediction? If so, how do I get them? Are there any in Keras?
I haven't found exactly a pre-trained model, but a quick search gave me several active GitHub projects that you can just run and get a result for yourself: Time Series Prediction with Machine Learning (LSTM, GRU implementation in tensorflow), LSTM Neural Network for Time Series Prediction (keras and tensorflow), Time series predictions with Keras (keras and theano), Neural-Network-with-Financial-Time-Series-Data (keras and tensorflow). See also this post.
Now you can use BERT or related variants and here you can find all the pre-trained models: https://huggingface.co/transformers/pretrained_models.html
And it is possible to pre-train and fine-tune RNNs; you can refer to this paper: TimeNet: Pre-trained deep recurrent neural network for time series classification.