I'm trying to test an LSTM model on the following time series:
As you can see it is stationary and periodic (not that this matters, but it should be pretty easy for a neural net to pick up). This is in fact a coordinate of a simple pendulum vs time.
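Since the plot doesn't show up here, a stand-in series can be generated roughly like this (the amplitude, period, and 500-step time grid are just illustrative assumptions chosen to match the shapes used below):
import numpy as np

t = np.linspace(0, 20, 500)            # 500 time steps
x = 0.5 * np.cos(2 * np.pi * t / 5)    # pendulum x-coordinate: periodic and stationary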
The steps for preprocessing are the following:
Scale this array using MinMaxScaler.
My model will predict x[t] using x[t-1] up to x[t-5].
import numpy as np
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X = scaler.fit_transform(x.reshape(-1, 1))

lookback = 5
features = 1

# build sliding windows: inputs are x[t-5]..x[t-1], label is x[t]
model_input, labels = [], []
for i in range(X.shape[0] - lookback):
    model_input.append(X[i:i+lookback])
    labels.append(X[i+lookback])

model_input = np.asarray(model_input)
labels = np.asarray(labels)
model_input.shape, labels.shape
which returns ((495, 5, 1), (495, 1)). This makes sense, because t has 500 steps and each window consumes 5 of them.
Then I build and train the model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# train on the first 400 steps, predict on the next 100
train_in, train_out = model_input[:400], labels[:400]
test_out = labels[400:]

model = Sequential()
model.add(LSTM(64, input_shape=(lookback, features)))  # input_shape is (timesteps, features); the batch dimension is implicit
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

# train
model.fit(train_in, train_out, epochs=30)
Finally, I want to test my model. I don't see the point of simply calling predict on the true test inputs here: I want to use the last 5 coordinates of the training set to generate a prediction for the first step of the test set, then use that prediction as an input to calculate the next position, and so on...
Here is the code:
# now we make predictions
preds = []
preds_input = train_in[-1:]  # to make the first prediction on the test set, we start with the last batch of the training set
for i in range(test_out.shape[0]):
    # the next step is the prediction on preds_input
    next_step = model.predict(preds_input, verbose=0)
    # append next_step to preds
    preds.append(next_step)
    # append next_step to preds_input and remove the first value so it keeps shape (1, 5, 1)
    preds_input = np.append(preds_input, next_step.reshape(1, 1, 1), axis=1)
    preds_input = preds_input[:, 1:, :]
I then rescaled the predictions and the testing data using inverse_transform and plotted the results.
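For completeness, that rescaling/plotting step looks roughly like this (a minimal sketch; matplotlib and the exact variable handling are my assumptions):
import matplotlib.pyplot as plt

preds = np.asarray(preds).reshape(-1, 1)
preds_rescaled = scaler.inverse_transform(preds)
test_rescaled = scaler.inverse_transform(test_out)

plt.plot(test_rescaled, label='actual')
plt.plot(preds_rescaled, label='predicted')
plt.legend()
plt.show()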
This is what I got
I'm not able to understand why my model performed so poorly. The pattern is simple and it should've been able to pick it up. Any help would be great!
While I understand how to use model.fit(x_train, y_train), I can't figure out how to make predictions on new data when the model was trained with TensorFlow's GradientTape. My GitHub repository with runnable code (up to an error) can be found here. What currently works is that I get the trained model's output, "network_output"; however, with the gradient-tape approach, argmax is applied to the model output itself, whereas I'm used to model.fit() taking the test data as an input:
network_output = trained_network(input_images,input_number)
preds = np.argmax(network_output, axis=1)
Where "input_images" is an ndarray: (20,3,3,1) and "input_number" is an ndarray: (20,5).
Now I'm taking network_output as the trained model and would like to use it to predict similarly typed data of test_images, and test_number respectively.
The error 'tensorflow.python.framework.ops.EagerTensor' object has no attribute 'predict' here:
predicted_number = network_output.predict(test_images)
This is because I don't know how to make predictions from a model trained this way. However, once the prediction works, I would guess I can compare the resulting "predicted_number" against "test_number", as would usually be done when using the model.fit method:
acc = 0
for i in range(len(test_images)):
    if predicted_number[i] == test_number[i]:
        acc += 1
print("Accuracy: ", acc / len(test_images) * 100, "%")
In order to obtain predictions, I usually iterate through the batches manually like this:
predictions = []
for batch in range(num_batch):
    logits = trained_network(x_test[batch * batch_size: (batch + 1) * batch_size], training=False)
    # first obtain probabilities
    # (if the last layer of the network has no activation, otherwise skip the softmax here)
    prob = tf.nn.softmax(logits)
    # putting back together predictions for all batches
    predictions.extend(tf.argmax(input=prob, axis=1))
If you don't have a lot of data, you can skip the loop; this is faster than using predict because you directly invoke the model's __call__ method:
logits = trained_network(x_test, training=False)
prob = tf.nn.softmax(logits)
predictions = tf.argmax(input=prob, axis=1)
Finally, you could also use predict. In this case the batches are handled automatically, so it is easier when you have lots of data, since you don't have to write a loop to iterate through the batches. The result is a NumPy array of predictions. It can be used like this:
predictions = trained_network.predict(x_test) # you can set a batch_size if you want
What you're doing wrong is this part:
network_output = trained_network(input_images,input_number)
predicted_number = network_output.predict(test_images)
You have to call predict directly on your model trained_network.
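A minimal sketch of that fix, following the snippets above (this assumes the images are the only input; if your model also needs the second array, pass both inputs together in a list):
network_output = trained_network.predict(test_images)   # NumPy array of scores, one row per test example
predicted_number = np.argmax(network_output, axis=1)     # predicted class index per test example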
I want to forecast multiple steps ahead using ARIMA. I am deriving my hyper-params with a grid search.
I want to achieve multiple one-step forecasts. However, I do not want a rolling forecast that uses new actual observations; I want the model to rely only on the data it was fitted on and its own predictions (once the predictions are well into the future).
Can anyone tell me what the difference between these three implementations is and if any of them matches my requirements?
no 1
In the first example, the whole data set (df = item) is passed to the model. Does this mean that the model is using actuals as lags instead of predictions at some point?
preds = item[0:len_train]
model = ARIMA(item, order=(4, 2, 1))
fit = model.fit()
for i in range(0, len_test):
    pred = fit.predict(len_train+i, len_train+i)
    preds = preds.append(pred)
no 2
train, test = item[0:len_train], item[len_train:]
model = ARIMA(train, order=(4,2,1))
model_fit = model.fit(disp=False)
forecast = model_fit.predict(start=len_train, end=len(item)-1, dynamic=False)
Predictions seem to saturate at some point; it seems that the model is not reusing its own predictions.
no 3
This is an attempt to refit the model with the new observation after each one-step forecast. However, I do not want this. If I append predictions instead of actual observations to the 'history', the forecasts quickly become very extreme (a sketch of that variant follows the code below).
history = [x for x in train]   # history starts from the training observations
predictions = []
print('Printing Predicted vs Expected Values...')
for t in range(len(test)):
    model = ARIMA(history, order=(4, 2, 1))
    model_fit = model.fit(disp=0)
    output = model_fit.forecast()
    print('output', output)
    yhat = output[0]
    predictions.append(float(yhat))
    obs = test.values[t]
    history.append(obs)
    print('predicted=%f, expected=%f' % (np.exp(yhat), np.exp(obs)))
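For reference, the variant I describe above (appending my own predictions instead of the actual observations to history) differs only in what gets appended; this is roughly what I tried:
history = [x for x in train]
predictions = []
for t in range(len(test)):
    model = ARIMA(history, order=(4, 2, 1))
    model_fit = model.fit(disp=0)
    yhat = model_fit.forecast()[0]
    predictions.append(float(yhat))
    history.append(float(yhat))   # feed the prediction back in instead of the observation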
Thanks a lot!
I'm having issues deciding how many LSTM units and layers to use for multivariate forecasting.
The dataset I'm working on:
Weather Dataset (kaggle)
df_final = df.loc['2011-01-01':'2014-10-31']
df_final['Temperature (C)'].plot(figsize=(28,6))
In the end I want to forecast temperature (other parameters too, but mainly temperature).
The data has hourly readings.
# How many rows per month?
rows_per_months = 24*30
test_months = 12  # number of months we want to predict into the future
test_indices = test_months*720
test_indices
# train and test split:
train = df_final.iloc[:-test_indices]
# Choose the variable/parameter you want to predict
test = df_final.iloc[-test_indices:]
len(train)
#[op]: 24960
scaler = MinMaxScaler()
scaled_train = scaler.fit_transform(train)
scaled_test = scaler.transform(test)

# define generator:
length = rows_per_months  # length of each generated input sequence (in number of timesteps)
batch_size = 30  # number of time-series samples per batch
generator = tf.keras.preprocessing.sequence.TimeseriesGenerator(scaled_train, scaled_train, length=length, batch_size=batch_size)
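To sanity-check the shapes the generator produces, the first batch can be inspected like this (the exact shapes depend on how many columns df_final has):
Xb, yb = generator[0]
print(Xb.shape, yb.shape)   # e.g. (30, 720, n_columns) and (30, n_columns)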
Defining Model
# define model
model = Sequential()
model.add(tf.keras.layers.LSTM(50, input_shape=(length, scaled_train.shape[1])))
# NOTE: keep the default LSTM activation (tanh); with a custom activation the layer cannot use the fast cuDNN GPU kernel
model.add(Dense(scaled_train.shape[1]))
model.compile(optimizer='adam', loss='mse')
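For reference, the training step itself is just a fit on the generator (the epoch count below is only an example, not what I actually used):
model.fit(generator, epochs=5)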
After training, I evaluate the model on the test data:
first_eval_batch = scaled_train[-length:]
first_eval_batch.shape
first_eval_batch = first_eval_batch.reshape((1, length, scaled_train.shape[1]))

n_features = scaled_test.shape[1]  # number of columns; the Dense layer predicts all of them for the next time stamp
# (I mainly care about temperature, but the model outputs every parameter)
test_predictions = []
first_eval_batch = scaled_train[-length:]
current_batch = first_eval_batch.reshape((1, length, n_features))
print(current_batch.shape)
# output: (1, 720, 4)
for i in range(len(test)):
    # get the prediction 1 time stamp ahead
    current_pred = model.predict(current_batch)[0]
    # store the prediction
    test_predictions.append(current_pred)
    # update the current batch to include the prediction and drop the first value
    current_batch = np.append(current_batch[:, 1:, :], [[current_pred]], axis=1)
true_predictions = scaler.inverse_transform(test_predictions)
true_predictions = pd.DataFrame(data=true_predictions, columns=test.columns, index=test.index)
The resulting DataFrame:
result_df = pd.concat([test['Temperature (C)'], true_predictions['Temperature (C)']],axis=1)
result_df.plot(figsize=(28,8))
Since the model does not predict the test data correctly, I cannot proceed to the actual forecasting.
How do I fix this?
PS:
I have tried playing with the number of layers and the number of LSTM units per layer, but nothing seems to work. I have also tried different activation functions such as relu, but then a single epoch takes more than 20 hours, since the model will not train on the GPU unless the LSTMs use their default activation (i.e. tanh).
I'm learning neural networks and have implemented object classification on the CIFAR-10 dataset using the Keras library. Here is my definition of the neural network in Keras:
from keras.models import Sequential
from keras.layers import Dense

# Define the model and train it
model = Sequential()
model.add(Dense(units=60, input_dim=1024, activation='relu'))
model.add(Dense(units=50, activation='relu'))
model.add(Dense(units=60, activation='relu'))
model.add(Dense(units=70, activation='relu'))
model.add(Dense(units=30, activation='relu'))
model.add(Dense(units=10, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=50, batch_size=10000)
So I have one input of dimension 1024, i.e. shape (1024,) (each 32 * 32 * 3 image is first converted to grayscale, giving 32 * 32 values), 5 hidden layers and 1 output layer, as defined in the code above.
When I train the model for 50 epochs, I get an accuracy of 0.9, i.e. 90%. When I evaluate it on the test dataset, I also get an accuracy of approximately 90%. Here is the line of code which evaluates the model:
print (model.evaluate(X_test, y_test))
This prints following loss and accuracy:
[1.611809492111206, 0.8999999761581421]
But when I calculate the accuracy manually by making a prediction for each test image, I get an accuracy of around 11% (almost the same as random guessing). Here is my code to calculate it manually:
wrong = 0
for x, y in zip(X_test, y_test):
    if not (np.argmax(model.predict(x.reshape(1, -1))) == np.argmax(y)):
        wrong += 1
print(wrong)
This prints 9002 wrong predictions out of 10000. So what am I missing here? Why are the two accuracies almost exact complements of each other (100 - 89 = 11%)? Any intuitive explanation will help! Thanks.
EDIT:
Here is my code which processes the dataset:
# Process the training and testing data to make it Neural-Network friendly
# convert a given colour image to grayscale
def rgb2gray(rgb):
    return np.dot(rgb, [0.2989, 0.5870, 0.1140])

X_train, y_train, X_test, y_test = [], [], [], []

def process_batch(batch_path, is_test=False):
    batch = unpickle(batch_path)
    imgs = batch[b'data']
    labels = batch[b'labels']
    for img in imgs:
        img = img.reshape(3, 32, 32).transpose([1, 2, 0])
        img = rgb2gray(img)
        img = img.reshape(1, -1)
        if not is_test:
            X_train.append(img)
        else:
            X_test.append(img)
    for label in labels:
        if not is_test:
            y_train.append(label)
        else:
            y_test.append(label)
process_batch('cifar-10-batches-py/data_batch_1')
process_batch('cifar-10-batches-py/data_batch_2')
process_batch('cifar-10-batches-py/data_batch_3')
process_batch('cifar-10-batches-py/data_batch_4')
process_batch('cifar-10-batches-py/data_batch_5')
process_batch('cifar-10-batches-py/test_batch', True)
number_of_classes = 10
number_of_batches = 5
number_of_test_batch = 1
X_train = np.array(X_train).reshape(meta_data[b'num_cases_per_batch'] * number_of_batches, -1)
print ('Shape of training data: {0}'.format(X_train.shape))
# create labels to one hot format
y_train = np.array(y_train)
y_train = np.eye(number_of_classes)[y_train]
print ('Shape of training labels: {0}'.format(y_train.shape))
# Process testing data
X_test = np.array(X_test).reshape(meta_data[b'num_cases_per_batch'] * number_of_test_batch, -1)
print ('Shape of testing data: {0}'.format(X_test.shape))
# create labels to one hot format
y_test = np.array(y_test)
y_test = np.eye(number_of_classes)[y_test]
print ('Shape of testing labels: {0}'.format(y_test.shape))
The reason this is happening is the loss function you are using. You are using binary cross-entropy where you should be using categorical cross-entropy as the loss. Binary cross-entropy is only for two-label problems, but you have 10 labels here because of CIFAR-10.
The accuracy metric Keras reports is in fact misleading you, because it is showing binary classification performance. The solution is to recompile with categorical_crossentropy and retrain your model.
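Concretely, the change is just in the compile step; a minimal sketch (the rest of your model definition stays the same):
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=50, batch_size=10000)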
This post has more details: Keras binary_crossentropy vs categorical_crossentropy performance?
Related: this post answers a different question, but the answer essentially describes your problem: Keras: model.evaluate vs model.predict accuracy difference in multi-class NLP task
Edit
You mentioned in the comments that the accuracy of your model is hovering at around 10% and not improving. Upon examining your Colab notebook, it appears that after switching to categorical cross-entropy you are not normalizing your data. Because the pixel values are originally unsigned 8-bit integers, creating your training set promotes the values to floating point, but the large dynamic range of the data makes it hard for your neural network to learn the right weights. When it tries to update the weights, the gradients are so small that there are essentially no updates, and hence your network performs just like random chance. The solution is to simply divide your training and test datasets by 255 before you proceed:
X_train /= 255.0
X_test /= 255.0
This transforms your data so that the dynamic range goes from [0, 255] to [0, 1]. Your model will have an easier time training with the smaller range, which should help the gradients propagate instead of vanishing. Because your original model has a significant number of dense layers, the unnormalized dynamic range makes the gradient updates likely to vanish, which is why the performance was poor initially.
When I run your notebook, I get 37% accuracy. This is not unexpected with CIFAR-10 and only a fully-connected / dense network. Also when you run your notebook now, the accuracy and the fraction of wrong examples match.
If you want to increase accuracy, I have a couple of suggestions:
Actually include colour information. Each object in CIFAR-10 has a distinct colour profile that should help in discrimination.
Add convolutional layers. I'm not sure where you are in your learning, but convolutional layers help extract the right features from the image, so that the most relevant features are presented to the dense layers and classification accuracy increases. Right now you're classifying raw pixels, which is not advisable given how noisy they can be and how unconstrained the variations are (rotation, translation, skew, scale, etc.). A rough sketch of such a model follows.
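Something like this could work (my illustration, not code from your notebook; it assumes you keep the colour channels and feed images of shape (32, 32, 3), with one-hot labels as before):
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])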
I have a data set with two inputs, x1 and x2, and an output that has 1 binary value (0/1) and 45 real numbers (so the output vector has 46 attributes in total). I would like to use different loss functions for the binary value and the 45 real numbers, namely binary cross-entropy and mean squared error.
My knowledge of Keras is very limited, so I am not even sure if this is the architecture I want. Is this the right way of doing this?
First, the preprocessing:
import pandas
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# load dataset
dataframe = pandas.read_csv("inputs.csv", delim_whitespace=True, header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:, 0:2]
Y = dataset[:, 3:]
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2,
                                                    random_state=123)
y_train_L, y_train_R = y_train[:, 0], y_train[:, 1:]
y_train_L = y_train_L.reshape(-1, 1)

scalarX, scalarY_L, scalarY_R = MinMaxScaler(), MinMaxScaler(), MinMaxScaler()
scalarX.fit(x_train)
scalarY_L.fit(y_train_L)
scalarY_R.fit(y_train_R)
x_train = scalarX.transform(x_train)
y_train_L = scalarY_L.transform(y_train_L)
y_train_R = scalarY_R.transform(y_train_R)
where y_train_L (the left part) holds just the binary values and y_train_R holds the real numbers. I had to split them because of how the architecture is defined:
from keras.models import Model
from keras.layers import Input, Dense

# define and fit the final model
inputs = Input(shape=(x_train.shape[1],))
first = Dense(46, activation='relu')(inputs)
# last
layer45 = Dense(45, activation='linear')(first)
layer1 = Dense(1, activation='tanh')(first)
out = [layer1, layer45]
# end last
model = Model(inputs=inputs, outputs=out)
model.compile(loss=['binary_crossentropy', 'mean_squared_error'], optimizer='adam')
model.fit(x_train, [y_train_L, y_train_R], epochs=1000, verbose=1)
Xnew = scalarX.transform(x_test)
y_test_L, y_test_R = y_test[:,0], y_test[:,1:]
y_test_L = y_test_L.reshape(-1,1)
y_test_L=scalarY_L.transform(y_test_L)
y_test_R=scalarY_R.transform(y_test_R)
# make a prediction
ynew = model.predict(Xnew)
loss=['binary_crossentropy','mean_squared_error'] expects two different target arrays in model.fit(x_train, [y_train_L, y_train_R]).
Then I have to do all these 'funny' tricks to get the predicted values and compare them next to each other, because ynew = model.predict(Xnew) returns a list of two arrays, one for the binary values and one for the real numbers.
ynew = model.predict(Xnew)
# show the inputs and predicted outputs
print("SCALED VALUES")
for i in range(len(Xnew)):
    print("X=%s\n P=%s,%s\n A=%s,%s" % (Xnew[i], ynew[0][i], ynew[1][i], y_test_L[i], y_test_R[i]))

inversed_X_test = scalarX.inverse_transform(Xnew)
inversed_Y_test_L = scalarY_L.inverse_transform(y_test_L)
inversed_Y_test_R = scalarY_R.inverse_transform(y_test_R)
inversed_y_predicted_L = scalarY_L.inverse_transform(ynew[0])
inversed_y_predicted_R = scalarY_R.inverse_transform(ynew[1])

print("REAL VALUES")
for i in range(len(inversed_X_test)):
    print("X=%s\n P=%s,%s\n A=%s,%s" % (inversed_X_test[i], inversed_y_predicted_L[i], inversed_y_predicted_R[i], inversed_Y_test_L[i], inversed_Y_test_R[i]))
Questions:
Can I achieve this in a cleaner way?
How can I measure the loss? I would like to create a chart of the loss values during training.
1) The way you define your model seems correct and there is no 'cleaner' way of doing it (I would argue that Keras' functional API is as clean as it gets)
2) To visualize training loss, store the history of training in a variable:
history = model.fit(...)
This history object contains the training and validation losses for each epoch; you can use it to make plots.
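For example, plotting the training loss could look like this (a minimal sketch assuming matplotlib; plot 'val_loss' as well only if you pass validation data to fit):
import matplotlib.pyplot as plt

history = model.fit(x_train, [y_train_L, y_train_R], epochs=1000, verbose=1)

plt.plot(history.history['loss'], label='total loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()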
3) In your classification output (layer1), you want to use a sigmoid activation instead of tanh. The sigmoid function returns values between 0 and 1; tanh returns values between -1 and 1. Your binary_crossentropy loss function expects the former.
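That is a one-line change in your model definition above:
layer1 = Dense(1, activation='sigmoid')(first)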