I am new to machine learning and I am performing a multivariate time series forecast using LSTMs in Keras. I have a monthly time series dataset with 4 input variables (temperature, precipitation, dew and wind speed) and 1 output variable (pollution). Using this data I framed a forecasting problem where, given the weather conditions and pollution for prior months, I forecast the pollution for the next month. Below is my code:
X = df[['Temperature', 'Precipitation', 'Dew', 'Wind_speed' ,'Pollution (t_1)']].values
y = df['Pollution (t)'].values
y = y.reshape(-1,1)
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(X)
#dataset has 359 samples in total
train_X, train_y = X[:278], y[:278]
test_X, test_y = X[278:], y[278:]
# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
model = Sequential()
model.add(LSTM(100, input_shape=(train_X.shape[1], train_X.shape[2])))
# model.add(LSTM(70))
# model.add(Dropout(0.3))
model.compile(loss='mean_squared_error', optimizer='adam')
history = model.fit(train_X, train_y, epochs=700, batch_size=70, validation_data=(test_X, test_y), verbose=2, shuffle=False)
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.legend(['train', 'test'], loc='upper right')
plt.show()
To make predictions I use the following code:
from sklearn.metrics import mean_squared_error,r2_score
yhat = model.predict(test_X)
mse = mean_squared_error(test_y, yhat)
rmse = np.sqrt(mse)
r2 = r2_score(test_y, yhat)
print("test set performance")
print("R^2: ",r2)
fig, ax = plt.subplots(figsize=(10,5))
ax.plot(range(len(test_y)), test_y, '-b',label='Actual')
ax.plot(range(len(yhat)), yhat, 'r', label='Predicted')
Running this code I ran into the following issues:
For some reason I am getting a lagged result on my test set, which is not part of my training data, as shown in the image below. I do not understand why I get these lagged results (does it have something to do with including 'Pollution (t_1)' as one of my inputs)?
Graph Results:
After adding 'Pollution (t_1)', which is the pollution variable shifted by one lag, as one of my inputs, this variable seems to dominate the prediction: removing the other variables has no influence on my results (R-squared and RMSE). This is strange, since all these variables do assist in pollution prediction.
Is there something I am doing wrong in my code that causes these issues? I am new to Python, so any help in answering the above 2 questions will be greatly appreciated.
First of all, I think it is not appropriate to use 1 as the timesteps value, because an LSTM is a model meant for time series or sequence data.
I think the following data preparation script will work well:
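def lstm_data(df, timestamps):
    # stack every window of `timestamps` consecutive rows, then reshape to 3D
    values = np.asarray(df).reshape(len(df), -1)
    range_ = values.shape[0] - (timestamps - 1)
    array = np.vstack([values[t:t + timestamps] for t in range(range_)])
    array_ = array.reshape(-1, timestamps, array.shape[1])
    return array_
# timestamps depends on your objective, but should not be 1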
x_data=lstm_data(x, timestamps=4)
y_data=lstm_data(y, timestamps=4)
#Divide each data into train and test
#Input the divided data into your LSTM model
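As a rough illustration of the windowing idea above, here is a minimal sketch that builds overlapping windows and pairs each one with a single target value, then splits chronologically. The helper name `make_windows`, the synthetic arrays, and the 80/20 split are assumptions made for this example; pairing each window with the target of its last row is one possible framing, not necessarily the one the answer intended.

```python
import numpy as np

def make_windows(X, y, timesteps):
    """Return (windows, targets): each window holds `timesteps` consecutive
    rows of X and is paired with the y value of the window's last row."""
    X = np.asarray(X)
    y = np.asarray(y).reshape(len(y), -1)
    n_windows = len(X) - timesteps + 1
    windows = np.stack([X[i:i + timesteps] for i in range(n_windows)])
    targets = y[timesteps - 1:]
    return windows, targets

# hypothetical data with the sizes from the question: 359 rows, 5 input columns
X = np.arange(359 * 5, dtype=float).reshape(359, 5)
y = np.arange(359, dtype=float)

X_win, y_win = make_windows(X, y, timesteps=4)
print(X_win.shape, y_win.shape)

# chronological split, no shuffling (the 80/20 ratio is an assumption)
split = int(len(X_win) * 0.8)
train_X, test_X = X_win[:split], X_win[split:]
train_y, test_y = y_win[:split], y_win[split:]
```

The resulting `train_X` already has the 3D layout `[samples, timesteps, features]`, so it could be passed to an LSTM with `input_shape=(4, 5)`.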
I have been working with binary sequential inputs and outputs using TensorFlow 2.0, and I've been wondering which approach TensorFlow uses to compute metrics such as recall or accuracy during training in those scenarios.
Each sample to my network consists of 60 timesteps, each with 300 features, and thus my expected output is a (60, 1) array of 1s and 0s. Suppose I have 2000 validation samples. When evaluating the validation set for each epoch, does TensorFlow concatenate all of the 2000 samples into a single (2000*60=120000, 1) array and then compare it to the concatenated ground-truth labels, or does it evaluate each (60, 1) sample individually and then return the mean of those values? Is there any way to modify this behavior?
TensorFlow/Keras by default computes the metrics batch-wise for the training data, while it computes the same metrics on ALL the data passed in the validation_data parameter of the fit method.
This means that the metric printed during fitting for the training data is the mean of the scores calculated on all the batches. In other words, for the training set Keras evaluates each batch individually and then returns the mean of those values. Validation data is handled differently: Keras takes all the validation samples and compares them with the "concatenated" ground-truth labels.
To demonstrate this behavior with code, I propose a dummy example. I provide a custom callback that computes the accuracy score on ALL the data passed at the end of each epoch (for the training data and, optionally, the validation data). This helps us understand the behavior of TensorFlow during training.
import numpy as np
from sklearn.metrics import accuracy_score
import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras.models import *
from tensorflow.keras.callbacks import *
class ACC_custom(tf.keras.callbacks.Callback):
    def __init__(self, train, validation=None):
        super(ACC_custom, self).__init__()
        self.validation = validation
        self.train = train
    def on_epoch_end(self, epoch, logs={}):
        logs['ACC_score_train'] = float('-inf')
        X_train, y_train = self.train[0], self.train[1]
        y_pred = (self.model.predict(X_train).ravel()>0.5)+0
        score = accuracy_score(y_train.ravel(), y_pred)
        if (self.validation):
            logs['ACC_score_val'] = float('-inf')
            X_valid, y_valid = self.validation[0], self.validation[1]
            y_val_pred = (self.model.predict(X_valid).ravel()>0.5)+0
            val_score = accuracy_score(y_valid.ravel(), y_val_pred)
            logs['ACC_score_train'] = np.round(score, 5)
            logs['ACC_score_val'] = np.round(val_score, 5)
        else:
            logs['ACC_score_train'] = np.round(score, 5)
create dummy data
x_train = np.random.uniform(0,1, (1000,60,10))
y_train = np.random.randint(0,2, (1000,60,1))
x_val = np.random.uniform(0,1, (500,60,10))
y_val = np.random.randint(0,2, (500,60,1))
fit model
inp = Input(shape=((60,10)), dtype='float32')
x = Dense(32, activation='relu')(inp)
out = Dense(1, activation='sigmoid')(x)
model = Model(inp, out)
es = EarlyStopping(patience=10, verbose=1, min_delta=0.001,
                   monitor='ACC_score_val', mode='max', restore_best_weights=True)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(x_train, y_train, epochs=10, verbose=2,
                    validation_data=(x_val, y_val),
                    callbacks=[ACC_custom(train=(x_train, y_train), validation=(x_val, y_val)), es])
In the graphs below I compare the accuracies computed by our callback with the accuracy computed by Keras:
plt.plot(history.history['ACC_score_train'], label='accuracy_callback_train')
plt.plot(history.history['accuracy'], label='accuracy_default_train')
plt.legend(); plt.title('train accuracy')
plt.plot(history.history['ACC_score_val'], label='accuracy_callback_valid')
plt.plot(history.history['val_accuracy'], label='accuracy_default_valid')
plt.legend(); plt.title('validation accuracy')
As we can see, the accuracy on the training data (first plot) differs between the default method and our callback. This means that the accuracy on the training data is calculated batch-wise.
The validation accuracy (second plot) calculated by our callback and by the default method is the same! This means that the score on validation data is computed in one shot.
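On the question's two alternatives (pooling all 2000*60 timesteps versus averaging per-sample scores): for plain accuracy with equal-length samples and fixed predictions, both give the same number, so the distinction only matters for ratio metrics such as recall. A minimal NumPy check, using made-up random labels and predictions (the arrays and the `recall` helper are illustrative assumptions, not Keras internals):

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical labels and thresholded predictions: 2000 samples, 60 timesteps each
y_true = rng.integers(0, 2, size=(2000, 60, 1))
y_pred = rng.integers(0, 2, size=(2000, 60, 1))

# (a) pool every timestep of every sample into one long vector
pooled_acc = np.mean(y_true.ravel() == y_pred.ravel())

# (b) score each (60, 1) sample on its own, then average the 2000 scores
per_sample_acc = np.mean(y_true == y_pred, axis=(1, 2))
mean_of_samples = per_sample_acc.mean()
print(pooled_acc, mean_of_samples)

def recall(t, p):
    # true positives / actual positives; NaN when there are no positives
    positives = np.sum(t == 1)
    return np.sum((t == 1) & (p == 1)) / positives if positives else np.nan

# for recall the denominator varies per sample, so the two schemes can differ
pooled_recall = recall(y_true, y_pred)
mean_sample_recall = np.nanmean([recall(t, p) for t, p in zip(y_true, y_pred)])
print(pooled_recall, mean_sample_recall)
```

If a per-sample average is what you actually want for a metric like recall, computing it in a callback on the full validation set, as the `ACC_custom` callback does for accuracy, is one way to get it.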
I am trying to predict neutron widths from resonance energies, using a Neural Network (I'm quite new to Keras/NNs in general so apologies in advance).
There is said to be a link between resonance energies and neutron widths, and since the energy increases monotonically, this can be modelled similarly to a time series problem.
In essence I have 2 columns of data, with the first column being the resonance energy and the other column containing the respective neutron width on each row. I have decided to use an LSTM layer to help the network's predictions by utilising previous computations.
From various tutorials and other answers, it seems common to use a "look_back" argument when creating the dataset, to allow the network to use previous timesteps to help predict the current timestep, e.g.
trainX, trainY = create_dataset(train, look_back)
I would like to ask regarding forming the NN:
1) Given my particular application do I need to explicitly map each resonance energy to its corresponding neutron width on the same row?
2) look_back indicates how many previous values the NN can use to help predict the current value, but how is it incorporated with the LSTM layer? I.e. I don't quite understand how both can be used.
3) At which point do I inverse the MinMaxScaler?
Those are the main queries. For 1) I have assumed it's okay not to; for 2) I believe it is possible but I don't really understand how. I can't quite work out what I have done wrong in the code; ideally I would like to plot the relative deviation of predicted to reference values in the train and test data once the code works. Any advice would be much appreciated:
import numpy
import matplotlib.pyplot as plt
import pandas
import math
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        a = dataset[i:(i + look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 1])
    return numpy.array(dataX), numpy.array(dataY)
# fix random seed for reproducibility
# load the dataset
dataframe = pandas.read_csv('CSVDataFe56Energyneutron.csv', engine='python')
dataset = dataframe.values
# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)
# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size, :], dataset[train_size:len(dataset), :]
# reshape into X=t and Y=t+1
look_back = 3
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], look_back, 1))
testX = numpy.reshape(testX, (testX.shape[0],look_back, 1))
# # create and fit the LSTM network
model = Sequential()
model.add(LSTM(6, input_shape=(look_back,1)))
for x in range(0, number_of_hidden_layers):
    model.add(Dense(50, activation='relu'))
model.compile(loss='mean_squared_error', optimizer='adam')
history= model.fit(trainX, trainY, nb_epoch=200, batch_size=32)
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
trainScore = model.evaluate(trainX, trainY, verbose=0)
print('Train Score: %.2f MSE (%.2f RMSE)' % (trainScore, math.sqrt(trainScore)))
testScore = model.evaluate(testX, testY, verbose=0)
print('Test Score: %.2f MSE (%.2f RMSE)' % (testScore, math.sqrt(testScore)))
1) Given my particular application do I need to explicitly map each
resonance energy to its corresponding neutron width on the same row?
Yes, you have to do that. Basically your data has to have the following shape:
X=[timestep, timestep,...] y=[label, label,...]
2) Look_back indicates how many previous values the NN can use to help
predict the current value, but how is it incorporated with the LSTM
layer? I.e I dont quite understand how both can be used?
An LSTM is a sequence-aware layer. You can think of it as a hidden Markov model. It takes the first timestep, calculates something, and in the next timestep the previous calculation is taken into account. look_back, which is usually called sequence_length, is just the maximum number of timesteps.
3) At which point do I inverse the MinMaxScaler?
Why should you do that? Furthermore, you don't need to scale your input.
It seems like you have a general misconception in your model. If you have input_shape=(look_back,1) you don't need LSTMs at all. If your sequence is just a sequence of single values, it might be better to avoid LSTMs. Furthermore, fitting your model should include validation after each epoch, so you can track both the training loss and the validation performance.
model.fit(x_train, y_train,
          validation_data=(x_test, y_test))
I'm currently undertaking my first 'real' DL project of (surprise) predicting stock movements. I know that I'm 1000:1 to make anything useful, but I'm enjoying it and want to see it through. I've learnt more in my few weeks of attempting this than I have in the prior 6 months of completing MOOCs.
I'm building an LSTM using Keras to currently predict the next 1 step forward and have attempted the task as both classification (up/down/steady) and now as a regression problem. Both result in a similar roadblock in that my validation loss never improves from epoch #1.
I can get the model to overfit such that training loss approaches zero with MSE (or 100% accuracy if classification), but at no stage does the validation loss decrease. This screams overfitting to my untrained eye so I added varying amounts of dropout but all that does is stifle the learning of the model/training accuracy and shows no improvements on the validation accuracy.
I have attempted to change a significant number of hyperparameters - learning rate, optimiser, batchsize, lookback window, #layers, #units, dropout, #samples, etc, also tried with subset of data and subset of features but I just can't get it to work so I'm very thankful for any help.
Code Below (it's not pretty I know):
# Import saved full dataframe ~ 200 features
import feather
df = feather.read_dataframe('df_feathered')
df.set_index('time', inplace=True)
# Difference the dataset to make stationary
df = df.diff(periods=1, axis=0)
df_train = df.loc['2017-3-1':'2017-6-30']
df_val = df.loc['2017-7-1':'2017-8-31']
df_test = df.loc['2017-9-1':'2017-9-30']
# Make x_train, x_val sets by dropping target variable
x_train = df_train.drop('close+1', axis=1)
x_val = df_val.drop('close+1', axis=1)
# Scale the training data first then fit the transform to the test set
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_val)
# scaler = MinMaxScaler(feature_range=(0,1))
# x_train = scaler.fit_transform(df_train1)
# x_test = scaler.transform(df_val1)
# Create y_train, y_test, simply target variable for regression
y_train = df_train['close+1']
y_test = df_val['close+1']
# Define Lookback window for LSTM input
sliding_window = 15
# Convert x_train, x_test, y_train, y_test into 3d array (samples, timesteps, features) for LSTM input
dataXtrain = []
for i in range(len(x_train)-sliding_window-1):
    a = x_train[i:(i+sliding_window), 0:(x_train.shape[1])]
    dataXtrain.append(a)
dataXtest = []
for i in range(len(x_test)-sliding_window-1):
    a = x_test[i:(i+sliding_window), 0:(x_test.shape[1])]
    dataXtest.append(a)
dataYtrain = []
for i in range(len(y_train)-sliding_window-1):
    dataYtrain.append(y_train[i + sliding_window])
dataYtest = []
for i in range(len(y_test)-sliding_window-1):
    dataYtest.append(y_test[i + sliding_window])
# Make data the divisible by a variety of batch_sizes for training
# Started at 1000 to not include replaced NaN values
dataXtrain = np.array(dataXtrain[1000:172008])
dataYtrain = np.array(dataYtrain[1000:172008])
dataXtest = np.array(dataXtest[1000:83944])
dataYtest = np.array(dataYtest[1000:83944])
# Checking input shapes
print('dataXtrain size is: {}'.format((dataXtrain).shape))
print('dataXtest size is: {}'.format((dataXtest).shape))
print('dataYtrain size is: {}'.format((dataYtrain).shape))
print('dataYtest size is: {}'.format((dataYtest).shape))
batch_size = 256
timesteps = dataXtrain.shape[1]
features = dataXtrain.shape[2]
# Model set-up, stacked 4 layer stateful LSTM
model = Sequential()
model.add(LSTM(512, return_sequences=True, stateful=True,
batch_input_shape=(batch_size, timesteps, features)))
model.add(LSTM(256,stateful=True, return_sequences=True))
model.add(LSTM(256,stateful=True, return_sequences=True))
model.add(Dense(1, activation='linear'))
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.9, patience=5, min_lr=0.000001, verbose=1)
def coeff_determination(y_true, y_pred):
    from keras import backend as K
    SS_res = K.sum(K.square( y_true-y_pred ))
    SS_tot = K.sum(K.square( y_true - K.mean(y_true) ) )
    return ( 1 - SS_res/(SS_tot + K.epsilon()) )
history = model.fit(dataXtrain, dataYtrain,validation_data=(dataXtest, dataYtest),
epochs=100,batch_size=batch_size, shuffle=False, verbose=1, callbacks=[reduce_lr])
score = model.evaluate(dataXtest, dataYtest,batch_size=batch_size, verbose=1)
predictions = model.predict(dataXtest, batch_size=batch_size)
import matplotlib.pyplot as plt
%matplotlib inline
plt.legend(["train", "val"], loc="best")
plt.title("model loss")
plt.legend(["train", "val"], loc="best")
plt.legend(["Truth", "Prediction"], loc="best")
Maybe you should remember that you are predicting stock returns, where it is very likely that nothing can be predicted. So an increasing val_loss is not overfitting at all. Instead of adding more dropout, maybe you should think about adding more layers to increase the model's power.
Try reducing the learning rate a lot (and remove dropout for now).
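One way to put a flat validation loss in context, given how little signal differenced prices may contain, is to compare it against a trivial baseline that always predicts "no change". The sketch below uses synthetic numbers only; `y_val` and `y_hat` are hypothetical stand-ins for the differenced validation targets and the model's predictions, and the `mse` helper is defined here for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
# hypothetical differenced targets and model predictions on the validation set
y_val = rng.normal(0.0, 1.0, size=5000)
y_hat = 0.05 * y_val + rng.normal(0.0, 0.2, size=5000)

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

model_mse = mse(y_val, y_hat)
zero_mse = mse(y_val, np.zeros_like(y_val))  # "no change" baseline

print(f"model MSE: {model_mse:.4f}  zero-baseline MSE: {zero_mse:.4f}")
print("beats baseline:", model_mse < zero_mse)
```

If the validation loss never drops below the baseline value, the model is not extracting usable signal from the validation period, whatever the training loss does.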
Why do you use
in fit() function?
I'm trying to detect fraud using autoencoder and Keras. I've written the following code as a Notebook:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from sklearn.preprocessing import StandardScaler
from keras.layers import Input, Dense
from keras.models import Model
import matplotlib.pyplot as plt
data = pd.read_csv('../input/creditcard.csv')
data['normAmount'] = StandardScaler().fit_transform(data['Amount'].values.reshape(-1, 1))
data = data.drop(['Time','Amount'],axis=1)
data = data[data.Class != 1]
X = data.loc[:, data.columns != 'Class']
encodingDim = 7
inputShape = X.shape[1]
inputData = Input(shape=(inputShape,))
X = X.values
encoded = Dense(encodingDim, activation='relu')(inputData)
decoded = Dense(inputShape, activation='sigmoid')(encoded)
autoencoder = Model(inputData, decoded)
encoder = Model(inputData, encoded)
encodedInput = Input(shape=(encodingDim,))
decoderLayer = autoencoder.layers[-1]
decoder = Model(encodedInput, decoderLayer(encodedInput))
autoencoder.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = autoencoder.fit(X, X,
# summarize history for accuracy
plt.title('model accuracy')
plt.legend(['train', 'test'], loc='upper left')
# summarize history for loss
plt.title('model loss')
plt.legend(['train', 'test'], loc='upper left')
I'm probably missing something: my accuracy is stuck at 0 and my test loss is lower than my train loss.
Any insight would be appreciated.
Accuracy on an autoencoder has little meaning, especially for a fraud detection algorithm. What I mean by this is that accuracy is not well defined for regression tasks. For example, is it accurate to say that 0.1 is the same as 0.11? For the Keras accuracy metric it is not. If you want to see how well your algorithm replicates the data, I would suggest looking at the MSE or at the data itself. Many autoencoders use MSE as their loss function.
The metric you should be monitoring is the training loss on good examples vs the validation loss on fraudulent examples. There you can easily see if you can fit your real examples more closely than the fraudulent ones and how well your algorithm performs in practice.
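To make that comparison concrete, here is a minimal sketch that computes a per-row reconstruction MSE for normal and fraudulent rows separately. Everything in it is an assumption for illustration: the data is synthetic, and a linear PCA projection stands in for the trained autoencoder so the example runs without Keras. With a real model, `reconstruct` would call the autoencoder's predict method instead.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_latent = 10, 2

# hypothetical data: normal rows lie near a 2-D subspace, "fraud" rows do not
W = rng.normal(size=(n_latent, n_features))
X_normal = rng.normal(size=(500, n_latent)) @ W + 0.1 * rng.normal(size=(500, n_features))
X_fraud = 3.0 * rng.normal(size=(50, n_features))

# stand-in for a trained autoencoder: project onto the top principal directions
mean = X_normal.mean(axis=0)
_, _, Vt = np.linalg.svd(X_normal - mean, full_matrices=False)
components = Vt[:n_latent]

def reconstruct(X):
    return (X - mean) @ components.T @ components + mean

def reconstruction_mse(X):
    # one error value per row
    return np.mean((X - reconstruct(X)) ** 2, axis=1)

err_normal = reconstruction_mse(X_normal)
err_fraud = reconstruction_mse(X_fraud)
print("mean MSE on normal rows:", err_normal.mean())
print("mean MSE on fraud rows: ", err_fraud.mean())
```

A threshold on the per-row error, chosen for example from a high percentile of `err_normal`, could then be used to flag suspicious transactions.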
Another design choice I would not make is ReLU in an autoencoder. ReLU works well in deeper models because of its simplicity and effectiveness in combating vanishing/exploding gradients. However, both of these concerns are non-factors in an autoencoder, and the loss of information hurts in an autoencoder. I would suggest using tanh as your intermediate activation function.
I'm trying to predict the water usage of a population.
I have 1 main input:
Water volume
and 2 secondary inputs:
Rainfall
Temperature
In theory they are related to the water supply.
It must be said that each rainfall and temperature value corresponds to a water volume value. So this is a time series problem.
The problem is that I don't know how to use 3 inputs from just one .csv file with 3 columns, one per input, in the code below. When I have just one input (e.g. water volume) the network works more or less well with this code, but not when I have more than one. (So if you run this code with the csv file below, it will show a dimension error.)
Reading some answers from:
Time Series Prediction with LSTM Recurrent Neural Networks in Python with Keras
Time Series Forecast Case Study with Python: Annual Water Usage in Baltimore
it seems to be that many people have the same problem.
The code:
EDIT: Code has been updated
import numpy
import matplotlib.pyplot as plt
import pandas
import math
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        a = dataset[i:(i + look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 2])
    return numpy.array(dataX), numpy.array(dataY)
# fix random seed for reproducibility
# load the dataset
dataframe = pandas.read_csv('datos.csv', engine='python')
dataset = dataframe.values
# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)
# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size, :], dataset[train_size:len(dataset), :]
# reshape into X=t and Y=t+1
look_back = 3
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], look_back, 3))
testX = numpy.reshape(testX, (testX.shape[0],look_back, 3))
# create and fit the LSTM network
model = Sequential()
model.add(LSTM(4, input_dim=look_back))
model.compile(loss='mean_squared_error', optimizer='adam')
history= model.fit(trainX, trainY,validation_split=0.33, nb_epoch=200, batch_size=32)
# Plot training
plt.title('model loss')
plt.legend(['entrenamiento', 'validación'], loc='upper right')
# make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
# Get something which has as many features as dataset
trainPredict_extended = numpy.zeros((len(trainPredict),3))
# Put the predictions there
trainPredict_extended[:,2] = trainPredict[:,0]
# Inverse transform it and select the 3rd column.
trainPredict = scaler.inverse_transform(trainPredict_extended) [:,2]
# Get something which has as many features as dataset
testPredict_extended = numpy.zeros((len(testPredict),3))
# Put the predictions there
testPredict_extended[:,2] = testPredict[:,0]
# Inverse transform it and select the 3rd column.
testPredict = scaler.inverse_transform(testPredict_extended)[:,2]
trainY_extended = numpy.zeros((len(trainY),3))
testY_extended = numpy.zeros((len(testY),3))
# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY, trainPredict))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY, testPredict))
print('Test Score: %.2f RMSE' % (testScore))
# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict)+look_back, 2] = trainPredict
# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict)+(look_back*2)+1:len(dataset)-1, 2] = testPredict
plt.title('Consumo de agua')
plt.ylabel('cosumo (m3)')
plt.legend([serie,prediccion_entrenamiento,prediccion_test],['serie','entrenamiento','test'], loc='upper right')
This is the csv file I have created, if it helps.
After changing the code, I fixed all the errors, but I'm not really sure about the results. This is a zoom in the prediction plot:
which shows that there is a "displacement" in the values predicted and in the real ones. When there is a max in the real time series, there is a min in the forecast for the same time, but it seems like it corresponds to the previous time step.
Change
a = dataset[i:(i + look_back), 0]
to
a = dataset[i:(i + look_back), :]
if you want the 3 features in your training data.
Then use
model.add(LSTM(4, input_shape=(look_back,3)))
to specify that you have look_back time steps in your sequence, each with 3 features.
It should run.
Indeed, the inverse_transform() method of sklearn.preprocessing.MinMaxScaler takes an input that has the same number of columns as the data the scaler was fitted on. So you need to do something like this:
# Get something which has as many features as dataset
trainPredict_extended = numpy.zeros((len(trainPredict),3))
# Put the predictions there (predict() returns shape (n, 1), so take column 0)
trainPredict_extended[:,2] = trainPredict[:,0]
# Inverse transform it and select the 3rd column.
trainPredict = scaler.inverse_transform(trainPredict_extended)[:,2]
I guess you will have other issues like this below in your code but nothing that you can't fix :) the ML part is fixed and you know where the error comes from. Just check the shapes of your objects and try to make them match.
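The padding trick can be checked on a small synthetic array. In this sketch the dataset is random and the "predictions" are just the scaled target column itself, so the inverse should recover the original values; both are assumptions made for the example. It also shows an equivalent closed form using the fitted `data_min_` and `data_range_` attributes, which is valid for the default `feature_range=(0, 1)`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(1)
# hypothetical dataset with 3 columns; the target is the last column (index 2)
dataset = rng.uniform(0, 500, size=(20, 3))

scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(dataset)

# pretend these are model outputs of shape (n, 1) in the scaled space
pred_scaled = scaled[:, [2]]

# pad to the fitted width, put predictions in the target column, invert
padded = np.zeros((len(pred_scaled), dataset.shape[1]))
padded[:, 2] = pred_scaled[:, 0]
pred = scaler.inverse_transform(padded)[:, 2]

# equivalent closed form using the scaler's fitted per-column statistics
pred_direct = pred_scaled[:, 0] * scaler.data_range_[2] + scaler.data_min_[2]

print(np.allclose(pred, dataset[:, 2]), np.allclose(pred, pred_direct))
```

The true targets need the same inverse transform before an RMSE in original units is computed; otherwise scaled targets get compared with unscaled predictions.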
You can change what you are optimizing, for maybe better results. For example, try predicting binary 0,1 if there will be a 'spike up' for the next day. Then feed the probability of a 'spike up' as a feature to predict the usage itself.
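As a sketch of how such a binary target could be built, the snippet below labels each day by whether the next day's usage rises by more than a chosen fraction. The usage values and the 10% threshold are assumptions made up for illustration; what counts as a "spike up" would have to be chosen for the real data.

```python
import numpy as np

# hypothetical daily water usage (m3)
usage = np.array([100., 102., 101., 130., 128., 127., 160., 158.])
threshold = 0.10  # assumed: a "spike up" is a rise of more than 10% over the previous day

# relative change from day t to day t+1
rel_change = (usage[1:] - usage[:-1]) / usage[:-1]
# label for day t: 1 if day t+1 spikes up, else 0 (the last day has no label)
spike_next = (rel_change > threshold).astype(int)
print(spike_next)
```

A classifier trained on these labels would output a spike probability, which could then be appended as an extra input column for the usage model.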