Trouble setting up LSTM model for multivariate time series forecasting - python

I'm having issues deciding how many LSTM layers/units to use for multivariate forecasting.
The dataset I'm working on:
Weather Dataset (Kaggle)
df_final = df.loc['2011-01-01':'2014-10-31']
df_final['Temperature (C)'].plot(figsize=(28,6))
In the end I want to forecast temperature (other parameters too, but mainly temperature).
The data has hourly readings.
# How many rows per month? (hourly data)
rows_per_months = 24 * 30  # 720
test_months = 12  # number of months we want to predict into the future
test_indices = test_months * rows_per_months  # 8640
test_indices
# train and test split:
train = df_final.iloc[:-test_indices]
# Choose the variable/parameter you want to predict
test = df_final.iloc[-test_indices:]
len(train)
#[op]: 24960
scaler = MinMaxScaler()
scaled_train = scaler.fit_transform(train)
scaled_test = scaler.transform(test)
#define generator:
length = rows_per_months  # length of the input sequences (in number of timesteps)
batch_size = 30  # number of timeseries samples per batch
generator = tf.keras.preprocessing.sequence.TimeseriesGenerator(scaled_train, scaled_train, length=length, batch_size=batch_size)
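As a quick sanity check (a minimal sketch, assuming the generator defined above), the first batch can be inspected to confirm the shapes the LSTM will receive:
# inspect the first batch produced by the TimeseriesGenerator above
X_batch, y_batch = generator[0]
print(X_batch.shape)  # (batch_size, length, n_features), e.g. (30, 720, 4)
print(y_batch.shape)  # (batch_size, n_features), e.g. (30, 4)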
Defining Model
# define model
model = Sequential()
model.add(tf.keras.layers.LSTM(50, input_shape=(length,scaled_train.shape[1])))
# NOTE: do not override the default activation of the LSTM layer; otherwise it will not run on the GPU (cuDNN kernel).
model.add(Dense(scaled_train.shape[1]))
model.compile(optimizer='adam', loss='mse')
After training, I evaluate the model on the test data:
first_eval_batch = scaled_train[-length:]
first_eval_batch.shape
first_eval_batch = first_eval_batch.reshape((1,length,scaled_train.shape[1]))
n_features = scaled_test.shape[1]  # = scaled_train.shape[1] = 4, since we feed all parameters for the next time stamp
# (it would be 1 if we were only predicting temperature)
test_predictions = []
first_eval_batch = scaled_train[-length:]
current_batch = first_eval_batch.reshape((1, length, n_features))
print(current_batch.shape)
#output:(1, 720, 4)
for i in range(len(test)):
    # get the prediction 1 time stamp ahead
    current_pred = model.predict(current_batch)[0]
    # store the prediction
    test_predictions.append(current_pred)
    # update the current batch to now include the prediction and drop the first value
    current_batch = np.append(current_batch[:, 1:, :], [[current_pred]], axis=1)
true_predictions = scaler.inverse_transform(test_predictions)
true_predictions = pd.DataFrame(data=true_predictions, columns=test.columns, index=test.index)
The resultant Dataframe:
result_df = pd.concat([test['Temperature (C)'], true_predictions['Temperature (C)']],axis=1)
result_df.plot(figsize=(28,8))
Since the model does not predict the test data correctly, I cannot proceed further with the forecasting.
How do I fix this?
PS:
I have tried playing with the number of layers and the number of LSTM units per layer, but nothing seems to work. I have also tried different activation functions such as relu, but then one epoch takes more than 20 hours, since the model won't train on the GPU unless the LSTM layers keep their default activation (i.e. tanh).

Related

How to add the current value of an exogenous variable to a Keras time series prediction model (besides the past values of other variables)?

I'm following a Keras example which tries to predict a multivariate time series with an LSTM model: the temperature, given past values of temperature and other physical variables.
https://keras.io/examples/timeseries/timeseries_weather_forecasting/
https://github.com/keras-team/keras-io/blob/master/examples/timeseries/timeseries_weather_forecasting.py
The model is fed with past values (rolling window) and predicts future values.
I have tried to play with it using additional variables that I generated myself, introducing the hour and month into the model:
df['HourX']=np.sin(df.DateTime.dt.hour*2*np.pi/24)
df['HourY']=np.cos(df.DateTime.dt.hour*2*np.pi/24)
df['MonthX']=np.sin(df.DateTime.dt.month*2*np.pi/12)
df['MonthY']=np.cos(df.DateTime.dt.month*2*np.pi/12)
Now I have these columns in my dataset:
Temperature, Pressure, Humidty, HourX, HourY, MonthX, MonthY
This is the original code used to create the model and fit it:
start = past + future
end = start + train_split
x_train = train_data[[i for i in range(7)]].values
y_train = features.iloc[start:end][[1]]
sequence_length = int(past / step)
dataset_train = keras.preprocessing.timeseries_dataset_from_array(
    x_train, y_train, sequence_length=sequence_length,
    batch_size=batch_size,
)
x_end = len(val_data) - past - future
label_start = train_split + past + future
x_val = val_data.iloc[:x_end][[i for i in range(7)]].values
y_val = features.iloc[label_start:][[1]]
dataset_val = keras.preprocessing.timeseries_dataset_from_array(
    x_val, y_val, sequence_length=sequence_length, batch_size=batch_size,
)
inputs = keras.layers.Input(shape=(inputs.shape[1], inputs.shape[2]))
lstm_out = keras.layers.LSTM(32)(inputs)
outputs = keras.layers.Dense(1)(lstm_out)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
              loss="mse")
model.fit(dataset_train, epochs=epochs, validation_data=dataset_val)
My question is...
This example only uses past values of all the variables.
But I would also like to use the current value of the newly created columns HourX, HourY, MonthX, MonthY (this would not be cheating, because we always know them in advance).
How can I tell Keras to keep using the old variables as we did (past values until a given time), but also use all the values of the new columns up to the current prediction time?
This image shows how the original model is fed: it only uses values up to a certain time.
And this is what I want to achieve:
For the second set of variables (the ones I've created) I want to use all values up to the predicted time step.
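For illustration only, one common way to do this is a two-input model: the LSTM reads the rolling window of past values, and the known calendar features for the prediction time are concatenated to its output before the final Dense layer. A minimal sketch (the shapes and names below, such as past_steps and n_known, are hypothetical and not from the original example):
from tensorflow import keras

past_steps, n_past_features = 120, 7   # hypothetical window length and past-feature count
n_known = 4                            # HourX, HourY, MonthX, MonthY at the prediction time

# input 1: rolling window of past values of all variables
past_inputs = keras.layers.Input(shape=(past_steps, n_past_features), name="past_window")
# input 2: calendar features known at the timestamp being predicted
known_inputs = keras.layers.Input(shape=(n_known,), name="known_at_prediction_time")

lstm_out = keras.layers.LSTM(32)(past_inputs)
merged = keras.layers.Concatenate()([lstm_out, known_inputs])
outputs = keras.layers.Dense(1)(merged)

model = keras.Model(inputs=[past_inputs, known_inputs], outputs=outputs)
model.compile(optimizer="adam", loss="mse")
# model.fit([x_past, x_known], y, ...)  # x_past: (n, past_steps, 7), x_known: (n, 4)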

LSTM not able to forecast a simple time series

I'm trying to test an LSTM model on the following time series:
As you can see it is stationary and periodic (not that this matters, but it should be pretty easy for a neural net to pick up). This is in fact a coordinate of a simple pendulum vs time.
The steps for preprocessing are the following:
Scale this array using MinMaxScaler.
My model will predict x[t] using x[t-1] up to x[t-5]
scaler = MinMaxScaler()
X = scaler.fit_transform(x.reshape(-1,1))
lookback = 5
features=1
model_input, labels = [], []
for i in range(X.shape[0] - lookback):
    model_input.append(X[i:i+lookback])
    labels.append(X[i+lookback])
model_input = np.asarray(model_input)
labels = np.asarray(labels)
model_input.shape, labels.shape
which returns ((495, 5, 1), (495, 1)); this makes sense because my t has 500 steps.
Then I build and train the model:
#train on the first 400 steps, predict on the next 100
train_in, train_out = model_input[:400], labels[:400]
test_out = labels[400:]
model = Sequential()
model.add(LSTM(64, input_shape = (lookback, features))) #input shape is (batch, timesteps, features)
model.add(Dense(1))
model.compile(optimizer = 'adam', loss = 'mse')
#train
model.fit(train_in, train_out, epochs = 30)
Finally, I want to test my model. I don't see the point of simply calling predict on pre-built test inputs here. I want to use the last 5 coordinates in the training set to generate a prediction for the first step in the testing set. Then, I will use this prediction as an input to calculate the next position. And so on...
Here is the code:
# now we make predictions
preds = []
preds_input = train_in[-1:]  # to make the first prediction on the test set, we start with the last batch of the training set
for i in range(test_out.shape[0]):
    # the next step is the prediction on preds_input
    next_step = model.predict(preds_input, verbose=0)
    # append next_step to preds
    preds.append(next_step)
    # append next_step to preds_input and remove the first value so it keeps shape (1, 5, 1)
    preds_input = np.append(preds_input, next_step.reshape(1, 1, 1), axis=1)
    preds_input = preds_input[:, 1:, :]
I then rescaled the predictions and the testing data using inverse_transform and plotted the results.
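That rescaling step is not shown in the question; a minimal sketch of what it could look like (assuming the scaler, preds and test_out defined above):
import numpy as np

# predictions are collected as (1, 1) arrays, so flatten to shape (100, 1) in scaled space
preds_arr = np.asarray(preds).reshape(-1, 1)
preds_rescaled = scaler.inverse_transform(preds_arr)
test_rescaled = scaler.inverse_transform(test_out)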
This is what I got
I'm not able to understand why my model performed so poorly. The pattern is simple and it should've been able to pick it up. Any help would be great!

Out of sample multi-step forecasting with ARIMA using train data and predictions only (Python)

I want to forecast multiple steps ahead using ARIMA. I am deriving my hyper-params with a grid search.
I want to achieve multiple one-step forecasts. However, I do not want a rolling forecast that uses new actual observations; I want the model to rely only on data in the training set and its own predictions (if the predictions are well into the future).
Can anyone tell me what the difference between these three implementations is and if any of them matches my requirements?
no 1
In the first example, the whole data set (df = item) is passed to the model. Does this mean that the model is using actuals as lags instead of predictions at some point?
preds = item[0:len_train]
model = ARIMA(item, (4, 2, 1))
fit = model.fit()
for i in range(0, len_test):
    pred = fit.predict(len_train+i, len_train+i)
    preds = preds.append(pred)
no 2
train, test = item[0:len_train], item[len_train:]
model = ARIMA(train, order=(4, 2, 1))
model_fit = model.fit(disp=False)
forecast = model_fit.predict(start=len_train, end=len(item)-1, dynamic=False)
Predictions seem to saturate at some point; it seems that the model is not reusing its own predictions.
no 3
This is an attempt to fit a new model that incorporates the new data after each one-step forecast. However, I do not want this. If I append predictions instead of actual observations to the 'history', the forecasts quickly become very extreme.
print('Printing Predicted vs Expected Values...')
for t in range(len(test)):
    model = ARIMA(history, order=(4, 2, 1))
    model_fit = model.fit(disp=0)
    output = model_fit.forecast()
    print('output', output)
    yhat = output[0]
    predictions.append(float(yhat))
    obs = test.values[t]
    history.append(obs)
    print('predicted=%f, expected=%f' % (np.exp(yhat), np.exp(obs)))
Thanks a lot!
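For orientation, a purely out-of-sample multi-step forecast that is fit on the training data only and never sees the test actuals can be written roughly as follows; this is a sketch assuming the newer statsmodels.tsa.arima.model.ARIMA API and that item is a pandas Series (or 1-D array):
from statsmodels.tsa.arima.model import ARIMA

train, test = item[:len_train], item[len_train:]

model = ARIMA(train, order=(4, 2, 1))
model_fit = model.fit()

# every step beyond the end of `train` is out of sample, so the model can only
# rely on its own earlier predictions as lags; no test observations are used
forecast = model_fit.forecast(steps=len(test))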

Understanding the TimeSeriesDataSet in pytorch forecasting

Here is a code sample taken from one of the pytorch forecasting tutorials:
# create dataset and dataloaders
max_encoder_length = 60
max_prediction_length = 20
training_cutoff = data["time_idx"].max() - max_prediction_length
context_length = max_encoder_length
prediction_length = max_prediction_length
training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    target="value",
    categorical_encoders={"series": NaNLabelEncoder().fit(data.series)},
    group_ids=["series"],
    # only unknown variable is "value" - and N-Beats can also not take any additional variables
    time_varying_unknown_reals=["value"],
    max_encoder_length=context_length,
    max_prediction_length=prediction_length,
)
validation = TimeSeriesDataSet.from_dataset(training, data, min_prediction_idx=training_cutoff + 1)
batch_size = 128
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=0)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=0)
I don't really understand how the validation dataset is constructed with respect to the time index. I also don't understand why there is no test dataset in the tutorial. Is that for a specific reason?
Concerning the validation dataset:
The training dataset is all data except the last max_prediction_length data points of each time series (each time series corresponds to the data points sharing the same group_ids).
Those last data points are filtered out by the training cutoff (the cutoff is the same for each time series because they all have the same length).
The validation data are the last max_prediction_length data points of each time series, used as targets (which means each validation sample is built from the last encoder_length + max_prediction_length points of its time series).
This is done with the parameter min_prediction_idx=training_cutoff + 1, which makes the dataset only generate predictions for time_idx values of at least training_cutoff + 1 (the minimal decoder index is always >= min_prediction_idx).
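As a toy illustration of the split described above (plain arithmetic only; the numbers assume a single hypothetical series with time_idx 0..99):
n_points = 100
max_encoder_length = 60
max_prediction_length = 20

training_cutoff = (n_points - 1) - max_prediction_length          # 79

train_time_idx = range(0, training_cutoff + 1)                    # 0 .. 79
val_decoder_idx = range(training_cutoff + 1, n_points)            # 80 .. 99 (the targets)
val_encoder_idx = range(training_cutoff + 1 - max_encoder_length,
                        training_cutoff + 1)                      # 20 .. 79 (the context)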

Keras predict() returns a better accuracy than evaluate()

I set up a model with Keras, then I trained it on a dataset of 3 records, and finally I tested the resulting model with evaluate() and predict(), using the same test set for both functions (the test set has 100 records and doesn't contain any record from the training set, for whatever that is worth given the size of the two datasets).
The dataset is composed of 5 files: 4 files each represent a different temperature sensor that collects 60 measurements per minute (each row contains 60 measurements), while the last file contains the class labels that I want to predict (in particular, 3 classes: 3, 20 or 100).
This is the model I'm using:
n_sensors, t_periods = 4, 60
model = Sequential()
model.add(Conv1D(100, 6, activation='relu', input_shape=(t_periods, n_sensors)))
model.add(Conv1D(100, 6, activation='relu'))
model.add(MaxPooling1D(3))
model.add(Conv1D(160, 6, activation='relu'))
model.add(Conv1D(160, 6, activation='relu'))
model.add(GlobalAveragePooling1D())
model.add(Dropout(0.5))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
which I train:
self.model.fit(X_train, y_train, batch_size=3, epochs=5, verbose=1)
Then I use evaluate:
self.model.evaluate(x_test, y_test, verbose=1)
And predict:
predictions = self.model.predict(data)
result = np.where(predictions[0] == np.amax(predictions[0]))
if result[0][0] == 0:
    return '3'
elif result[0][0] == 1:
    return '20'
else:
    return '100'
For each predicted class, I compare it with the actual label, and then I calculate correct guesses / total examples, which should be equivalent to the accuracy from the evaluate() function. Here's the code:
correct = 0
for profile in self.profile_file:  # profile_file is an opened file
    ts1 = self.ts1_file.readline()
    ts2 = self.ts2_file.readline()
    ts3 = self.ts3_file.readline()
    ts4 = self.ts4_file.readline()
    data = ts1, ts2, ts3, ts4
    test_data = self.dl.transform(data)  # see the last block of code I posted
    prediction = self.model.predict(test_data)
    if prediction == label:
        correct += 1
acc = correct / 100  # 100 is the number of total examples
The data fed to evaluate() is taken from this function:
label = pd.read_csv(os.path.join(self.testDir, 'profile.txt'), sep='\t', header=None)
label = np_utils.to_categorical(label[0].factorize()[0])
data = [os.path.join(self.testDir,'TS2.txt'),os.path.join(self.testDir, 'TS1.txt'),os.path.join(self.testDir,'TS3.txt'),os.path.join(self.testDir, 'TS4.txt')]
df = pd.DataFrame()
for txt in data:
    read_df = pd.read_csv(txt, sep='\t', header=None)
    df = df.append(read_df)
df = df.apply(self.__predict_scale)
df = df.sort_index().values.reshape(-1,4,60).transpose(0,2,1)
return df, label
while the data fed to predict() is taken from this other one:
df = pd.DataFrame()
for txt in data:
    read_df = pd.read_csv(StringIO(txt), sep='\t', header=None)
    df = df.append(read_df)
df = df.apply(self.__predict_scale)
df = df.sort_index().values.reshape(-1,4,60).transpose(0,2,1)
return df
The accuracies yielded by evaluate() and predict() are always different: in particular, the maximum difference I noted was when evaluate() resulted in 78% accuracy while predict() yielded 95%. The only difference between the two cases is that predict() works on one example at a time, while evaluate() takes the entire dataset at once, but that should make no difference. How can this be?
UPDATE 1: It seems that the problem is in how I prepare my data. For predict(), I transform only one line at a time from each file using the last block of code I posted, while for evaluate() I transform the entire files using the other function shown. Why should that make a difference? It seems to me that I'm applying exactly the same transformation; the only difference is the number of rows transformed.
This question was already answered here.
What happens is that when you evaluate the model, since your loss function is categorical_crossentropy, metrics=['accuracy'] calculates categorical_accuracy.
But predict() has a default set to binary_accuracy.
So essentially you are calculating categorical accuracy with evaluate() and binary accuracy with predict(); this is the reason they are so widely different.
The difference between categorical_accuracy and binary_accuracy is that categorical_accuracy checks whether all of the outputs match your y_test, while binary_accuracy checks whether each of your outputs matches your y_test.
Example (single row):
prediction = [0,0,1,1,0]
y_test = [0,0,0,1,0]
categorical_accuracy = 0%
Since one output does not match, the categorical_accuracy is 0.
binary_accuracy = 80%
Even though one output doesn't match, the remaining 80% do match, so the accuracy is 80%.
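The same arithmetic as a small NumPy sketch, using the definitions given above (whole-vector match vs per-output match):
import numpy as np

prediction = np.array([0, 0, 1, 1, 0])
y_test = np.array([0, 0, 0, 1, 0])

# "categorical"-style accuracy on this row: the whole vector must match
categorical_accuracy = float(np.array_equal(prediction, y_test))   # 0.0 -> 0%

# "binary"-style accuracy: each position is scored independently
binary_accuracy = float(np.mean(prediction == y_test))             # 0.8 -> 80%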
