keras LSTM poor prediction

keras LSTM poor prediction - python

i am trying to predict the value based on some sequence (i have 5 values like 1,2,3,4,5 and want preditc the next one - 6). I am using LSTM keras for that.
creating training data:
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM,Dense
a = [float(i) for i in range(1,100)]
a = np.array(a)
data_train = a[:int(len(a)*0.9)]
data_test = a[int(len(a)*0.9):]
x = 5
y = 1
z = 0
train_x = []
train_y = []
for i in data_train:
t = data_train[z:x]
r = data_train[x:x+y]
if len(r) == 0:
break
else:
train_x.append(t)
train_y.append(r)
z = z + 1
x = x+1
train_x = np.array(train_x)
train_y = np.array(train_y)
x = 5
y = 1
z = 0
test_x = []
test_y = []
for i in data_test:
t = data_test[z:x]
r = data_test[x:x+y]
if len(r) == 0:
break
else:
test_x.append(t)
test_y.append(r)
z = z + 1
x = x+1
test_x = np.array(test_x)
test_y = np.array(test_y)
print(train_x.shape,train_y.shape)
print(test_x.shape,test_y.shape)
transform it into LSTM freandly shape:
train_x_1 = train_x.reshape(train_x.shape[0],len(train_x[0]),1)
train_y_1 = train_y.reshape(train_y.shape[0],1)
test_x_1 = test_x.reshape(test_x.shape[0],len(test_x[0]),1)
test_y_1 = test_y.reshape(test_y.shape[0],1)
print(train_x_1.shape, train_y_1.shape)
print(test_x_1.shape, test_y_1.shape)
build and train model:
model = Sequential()
model.add(LSTM(32,return_sequences = False,input_shape=(trein_x_1.shape[1],1)))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
history = model.fit(train_x_1,
train_y_1,
epochs=20,
shuffle=False,
batch_size=1,
verbose=2,
validation_data=(test_x_1,test_y_1))
but I get a realy bad result, can somebody explaine me what I am doing wrong.
pred = model.predict(test_x_1)
for i,a in enumerate(pred):
print(pred[i],test_y_1[i])
[89.71895] [95.]
[89.87877] [96.]
[90.03465] [97.]
[90.18714] [98.]
[90.337006] [99.]
Thenks.

You expect the network to extrapolate from the data you used for training. Neural networks are not good at this. You could try to normalize your data so that you are not extrapolating anymore by, for example, using relative values instead of absolute values. That would make this example of course very trivial.

Please use first derivation for x (always delta x == 1 ), then it works. But this makes of course no real sense for this simple formular. Try something more complicated like my example:
LSTM prediction of damped sin curve
Source:
https://www.kaggle.com/maciejbednarz/lstm-example

Related

Python LSTM Bitcoin prediction flatlines

I'm currently trying to build a "simple" LSTM model that takes historical Bitcoin data, learns from that and then tries to predict the future X steps in advance.
I've build it on the idea that A + B + C = D so B + C + D should be E. (I think that's a very simple idea behind an LSTM model. I might be wrong however i'm pretty new to it.)
I managed to build the basics in python (I'm fairly new to python) but something seems off by the prediction. For some reason many of the predictions i test / make end up flatlining. I have a theory on why but we have no idea if it's correct and even less idea on how to solve it.
My theory is that within a sequence the model learns to put more importance / weight on the last digit in the sequence because with Bitcoin prices the future price (in 1 minute) is probably pretty close to the price now. That's try the predicted values keeps getting closer to the real value eventually being equal and thus flatlining in a graph. (I don't know if that makes sense but thats what i tought anyway.)
I've also added a screenshot of my graph from a few days ago. Almost all predictions however end similar to this graph. This is just a more extreme example as demonstration.
Here is my code, can someone please explain why it flatlines and what i did wrong?
import numpy as np
from matplotlib import pyplot
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
import yfinance as yf
from sklearn.preprocessing import MinMaxScaler
from math import sqrt
from sklearn.metrics import mean_squared_error
# Create output sets X + Y from given input-set
# with inputset : a 1-dimensional list of floats
# with N : the number of lookback values to use for X
# with Gap : the number of point skipped between X and Y
# Y: is equal to input, (although the first N are missing)
# X: for each y of Y a corresponding set of size N is created
# composed of the N values preceeding y.
def create_lookback(inputset, n=1, gap=0):
print("create_lookback with n=%d gap=%d" % (n,gap))
print(" - length of inputset = %d" % len(inputset))
dataX, dataY = [], []
for i in range(len(inputset) - (n+gap)):
a = inputset[i:(i + n), 0]
dataX.append(a)
dataY.append(inputset[i + n+gap, 0])
print(" - length of dataY = %d" % len(dataY))
data_x = np.array(dataX)
xret = data_x.reshape(data_x.shape[0], 1, data_x.shape[1])
return xret, np.array(dataY)
# Train model based on given training-set + Test-set
def create_model(trainX,trainY,testX,testY):
model = Sequential()
model.add(LSTM(units = 100, input_shape=(trainX.shape[1], trainX.shape[2], )))
model.add(Dropout(0.2))
#model.add(LSTM(30, return_sequences=True))
#model.add(Dropout(0.1))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')
history = model.fit(trainX, trainY, epochs=100, batch_size=5, validation_data=(testX, testY), verbose=1, shuffle=False)
return model
# Evaluate given X / Y set.
# - Calculate RMSE
# - Generate visual line-plot to screen
def show_result(scaler,yhat,setY,txt):
print("Show %s result" % txt)
yhat_inverse = scaler.inverse_transform(yhat.reshape(-1, 1))
testY_inverse = scaler.inverse_transform(setY.reshape(-1, 1))
if len(testY_inverse) == len(yhat_inverse):
rmse = sqrt(mean_squared_error(testY_inverse, yhat_inverse))
print(' RMSE %s : %.3f' % (txt,rmse))
pyplot.plot(yhat_inverse, label='predict '+txt)
pyplot.plot(testY_inverse, label='actual '+txt, alpha=0.5)
pyplot.legend()
pyplot.show()
# Extrapoleer is dutch for Extrapolate
def extrapoleer(i,model,tup,toekomst):
if(i == 0):
return
setX = np.array([[tup]])
y = model.predict(setX)
y_float = y[0][0]
tup_new = np.append(tup[1:], y_float)
toekomst.append(y_float)
extrapoleer(i-1, model, tup_new,toekomst)
# --- end of defined functions
# -- start of main flow
data_grid_1 = yf.download('BTC-USD', start="2021-04-14",end="2021-04-15", interval="1m");
data_grid_2 = yf.download('BTC-USD', period="12h", interval="1m");
dataset_1 = data_grid_1.iloc[:, 1:2].values
dataset_2 = data_grid_2.iloc[:, 1:2].values
scaler = MinMaxScaler(feature_range = (0, 1))
scaled = scaler.fit_transform(dataset_1)
# 70% of dataset_1 is used to train ; 30% to test
train_size = int(len(scaled) * 0.7)
test_size = len(scaled) - train_size
train, test = scaled[0:train_size,:], scaled[train_size:len(scaled),:]
print("train: %d test: %d" % (len(train), len(test)))
scaled_2 = scaler.fit_transform(dataset_2)
look_back_n = 3
look_back_gap = 0
trainX, trainY = create_lookback(train, look_back_n, look_back_gap)
testX, testY = create_lookback(test, look_back_n, look_back_gap)
testX_2, testY_2 = create_lookback(scaled_2, look_back_n, look_back_gap)
model = create_model(trainX,trainY,testX,testY)
yhat_1 = model.predict(testX)
yhat_2 = model.predict(testX_2)
show_result(scaler,yhat_1,testY,"test")
show_result(scaler,yhat_2,testY_2,"test2")
last_n = testY_2[-look_back_n:]
#toekomst = Future in dutch
toekomst = []
#aantal = Amount in Dutch, this indicates the amount if steps you want to future predict
aantal = 30
extrapoleer(aantal, model, last_n, toekomst)
print("Resultaat van %d voorspelde punten in de toekomst: " % aantal)
print(toekomst)
yhat_2_plus = np.append(yhat_2,toekomst)
show_result(scaler,yhat_2_plus,testY_2,"test2-plus")

How do you gather the elements of y_pred that do not correspond to the true label in a Keras/tf2.0 custom loss function?

Below is a simple example in numpy of what I would like to do:
import numpy as np
y_true = np.array([0,0,1])
y_pred = np.array([0.1,0.2,0.7])
yc = (1-y_true).astype('bool')
desired = y_pred[yc]
>>> desired
>>> array([0.1, 0.2])
So the prediction corresponding to the ground truth is 0.7, I want to operate on an array containing all the elements of y_pred, except for the ground truth element.
I am unsure of how to make this work within Keras. Here is a working example of the problem in the loss function. Right now 'desired' isn't accomplishing anything, but that is what I need to work with:
# using tensorflow 2.0.0 and keras 2.3.1
import tensorflow.keras.backend as K
import tensorflow as tf
from tensorflow.keras.layers import Input,Dense,Flatten
from tensorflow.keras.models import Model
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Normalize data.
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
# Convert class vectors to binary class matrices.
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)
input_shape = x_train.shape[1:]
x_in = Input((input_shape))
x = Flatten()(x_in)
x = Dense(256,'relu')(x)
x = Dense(256,'relu')(x)
x = Dense(256,'relu')(x)
out = Dense(10,'softmax')(x)
def loss(y_true,y_pred):
yc = tf.math.logical_not(kb.cast(y_true, 'bool'))
desired = tf.boolean_mask(y_pred,yc,axis = 1) #Remove and it runs
CE = tf.keras.losses.categorical_crossentropy(
y_true,
y_pred)
L = CE
return L
model = Model(x_in,out)
model.compile('adam',loss = loss,metrics = ['accuracy'])
model.fit(x_train,y_train)
I end up getting an error
ValueError: Shapes (10,) and (None, None) are incompatible
Where 10 is the number of categories. The end purpose is to implement this: ComplementEntropy
in Keras, where my issue seems to be lines 26-28.

You can remove axis=1 from the Boolean_mask and it will run. And frankly, I don't see why you need axis=1 here.
def loss(y_true,y_pred):
yc = tf.math.logical_not(K.cast(y_true, 'bool'))
print(yc.shape)
desired = tf.boolean_mask(y_pred, yc) #Remove axis=1 and it runs
CE = tf.keras.losses.categorical_crossentropy(
y_true,
y_pred)
L = CE
return L
This is probably what happens. You have y_pred which is a 2D tensor (N=2). Then you have a 2D mask (K=2). But there's this condition K + axis <= N. And if you pass axis=1 this fails.

Using thushv89's answer, here is the full code for how I implemented COT on LeNet from the referenced paper. The one trick is I am not actually flipping back and forth between the two objectives, instead there is just a random weight that flips s.
# using tensorflow 2.0.0 and keras 2.3.1
import tensorflow.keras.backend as kb
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, Input, Dense,Flatten,AveragePooling2D,GlobalAveragePooling2D
from tensorflow.keras.models import Model
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Normalize data.
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
#exapnd dims to fit chn format
x_train = np.expand_dims(x_train,axis=3)
x_test = np.expand_dims(x_test,axis=3)
# Convert class vectors to binary class matrices.
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)
input_shape = x_train.shape[1:]
x_in = Input((input_shape))
act = 'tanh'
x = Conv2D(32, (5, 5), activation=act, padding='same',strides = 1)(x_in)
x = AveragePooling2D((2, 2),strides = (2,2))(x)
x = Conv2D(16, (5, 5), activation=act)(x)
x = AveragePooling2D((2, 2),strides = (2,2))(x)
conv_out = Flatten()(x)
z = Dense(120,activation = act)(conv_out)#120
z = Dense(84,activation = act)(z)#84
last = Dense(10,activation = 'softmax')(z)
model = Model(x_in,last)
def loss(y_true,y_pred, axis=-1):
s = kb.round(tf.random.uniform( (1,), minval=0, maxval=1, dtype=tf.dtypes.float32))
s_ = 1 - s
y_pred = y_pred + 1e-8
yg = kb.max(y_pred,axis=1)
yc = tf.math.logical_not(kb.cast(y_true, 'bool'))
yp_c = tf.boolean_mask(y_pred, yc)
ygc_ = 1/(1-yg+1e-8)
ygc_ = kb.expand_dims(ygc_,axis=1)
Px = yp_c*ygc_ +1e-8
COT = kb.mean(Px*kb.log(Px),axis=1)
CE = -kb.mean(y_true*kb.log(y_pred),axis=1)
L = s*CE +s_*(1/(10-1))*COT
return L
model.compile(loss=loss,
optimizer='adam', metrics=['accuracy'])
model.fit(x_train,y_train,epochs=20,batch_size = 128,validation_data= (x_test,y_test))
pred = model.predict(x_test)
pred_label = np.argmax(pred,axis=1)
label = np.argmax(y_test,axis=1)
cor = (pred_label == label).sum()
acc = print('acc:',cor/label.shape[0])

TensorFlow simple XOR example not converging

I have the following code to learn a simple XOR network:
import tensorflow as tf
import numpy as np
def generate_xor(length=1000):
x = np.random.randint(0,2, size=(length,2))
y = []
for pair in x:
y.append(int(np.logical_xor(pair[0],pair[1])))
return x, np.array(y)
n_inputs = 2
n_hidden = n_inputs*4
n_outputs = 1
x = tf.placeholder(tf.float32, shape=[1,n_inputs])
y = tf.placeholder(tf.float32, [1, n_outputs])
W = tf.Variable(tf.random_uniform([n_inputs, n_hidden],-1,1))
b = tf.Variable(tf.zeros([n_hidden]))
W2 = tf.Variable(tf.random_uniform([n_hidden,n_outputs],-1,1))
b2 = tf.Variable(tf.zeros([n_outputs]))
def xor_model(data):
x = data
hidden_layer = tf.nn.relu(tf.matmul(x,W)+b)
output = tf.nn.relu(tf.matmul(hidden_layer, W2)+b2)
return output
xor_nn = xor_model(x)
cost = tf.reduce_mean(tf.abs(xor_nn - y))
train_step = tf.train.AdagradOptimizer(0.05).minimize(cost)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
x_data,y_data = generate_xor(length=100000)
errors = []
count = 0
out_freq = 1000
for xor_in, xor_out in zip(x_data,y_data):
_, err = sess.run([train_step, cost], feed_dict={x:xor_in.reshape(1,2), y:xor_out.reshape(1,n_outputs)})
errors.append(err)
count += 1
if count == out_freq:
tol = np.mean(errors[-out_freq:])
print tol
count = 0
if tol < 0.005:
break
n_tests = 100
correct = 0
count = 0
x_test, y_test = generate_xor(length=n_tests)
for xor_in, xor_out in zip(x_test, y_test):
output = sess.run([xor_nn], feed_dict={x:xor_in.reshape(1,2)})[0]
guess = int(output[0][0])
truth = int(xor_out)
if guess == truth:
correct += 1
count += 1
print "Model %d : Truth %d - Pass Rate %.2f" % (int(guess), int(xor_out), float(correct*100.0)/float(count))
However, I can't get the code to reliably converge. I have tried varying the size of the hidden layer, using different optimizers / step sizes and different initializations of the weights and biases.
I'm clearly making an elemental error. If anyone could help I'd be grateful.
EDIT:
Thanks to Prem and Alexander Svetkin I managed to spot my errors. Firstly I wasn't rounding the outputs when I cast them to ints, a schoolboy mistake. Secondly I had a relu on the output layer which wasn't needed - a copy and paste mistake. Thirdly relu is indeed a bad choice of activation function for this task, using a sigmoid function works much better.
So this:
hidden_layer = tf.nn.relu(tf.matmul(x,W)+b)
output = tf.nn.relu(tf.matmul(hidden_layer, W2)+b2)
becomes this:
hidden_layer = tf.nn.sigmoid(tf.matmul(x,W)+b)
output = tf.matmul(hidden_layer, W2)+b2
and this:
guess = int(output[0][0])
becomes this:
guess = int(output[0][0]+0.5)

Shouldn't you only return the activation function of output layer instead of relu?
output = tf.matmul(hidden_layer, W2) + b2

ReLU just isn't right activation function for binary classification task, use something different, like sigmoid function.
Pay attention to your float output values. 0.99 should mean 1 or 0? Use rounding.

Tensorflow practice y = xw how do you initialize vector x?

I am studying with first machine-learning practice.
This is the prediction system of monthly temperature.
train_t has the temperatures and train_x has the weight for each data.
However I have a question where initializing train_x
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from pprint import pprint
x = tf.placeholder(tf.float32,[None,5])
w = tf.Variable(tf.zeros([5,1]))
y = tf.matmul(x,w)
t = tf.placeholder(tf.float32,[None,1])
loss = tf.reduce_sum(tf.square(y-t))
train_step = tf.train.AdamOptimizer().minimize(loss)
sess = tf.Session()
sess.run(tf.initialize_all_variables())
train_t = np.array([5.2,5.7,8.6,14.9,18.2,20.4,25.5,26.4,22.8,17.5,11.1,6.6]) #montly temperature
train_t = train_t.reshape([12,1])
train_x = np.zeros([12,5])
for row, month in enumerate(range(1,13)):
for col, n in enumerate(range(0,5)):
train_x[row][col] = month**n ## why initialize like this??
i = 0
for _ in range(10000):
i += 1
sess.run(train_step,feed_dict={x:train_x,t:train_t})
if i % 1000 == 0:
loss_val = sess.run(loss,feed_dict={x:train_x,t:train_t})
print('step : %d,Loss: %f' % (i,loss_val))
w_val = sess.run(w)
pprint(w_val)
def predict(x):
result = 0.0
for n in range(0,5):
result += w_val[n][0] * x**n
return result
fig = plt.figure()
subplot = fig.add_subplot(1,1,1)
subplot.set_xlim(1,12)
subplot.scatter(range(1,13),train_t)
linex = np.linspace(1,12,100)
liney = predict(linex)
subplot.plot(linex, liney)
However I don't understand here
for row, month in enumerate(range(1,13)): #
for col, n in enumerate(range(0,5)): #
train_x[row][col] = month**n ## why initialize like this??
What does this mean??
There is no comment about this in my book??
Why train_x is initialized here??

In fact, this bloc of code:
train_t = np.array([5.2,5.7,8.6,14.9,18.2,20.4,25.5,26.4,22.8,17.5,11.1,6.6]) #montly temperature
train_t = train_t.reshape([12,1])
train_x = np.zeros([12,5])
for row, month in enumerate(range(1,13)):
for col, n in enumerate(range(0,5)):
train_x[row][col] = month**n
Is the generation of your data. It initialize train_t and train_x which are the data that will be injected into placeholders x and t
train_t is a tensor of temperatures
train_x is a tensor of sort of weight of each temperatures.
They constitute the dataset.

Both train_x and train_t are arrays with your training data. In the array train_t you have the target of your model, while train_x contains the features in input to your model.
The weights of your model (the ones that are trained) are w (which is the only tf.Variable in your code), which is initialized randomly.
The model you are training is a degree 4 (which is the max of range(0, 5)) polynomial on the linear variable month which ranges in range(1, 13). That snipped of code generates the features for a degree 4 polynomial starting from the linear variable month.

Getting very high values in linear regression

I am trying to make a simple MLP to predict values of a pixel of an image - original blog .
Here's my earlier attempt using Keras in python - link
I've tried to do the same in tensorflow, but I am getting very large output values (~10^12) when they should be less than 1.
Here's my code:
import numpy as np
import cv2
from random import shuffle
import tensorflow as tf
'''
Image preprocessing
'''
image_file = cv2.imread("Mona Lisa.jpg")
h = image_file.shape[0]
w = image_file.shape[1]
preX = []
preY = []
for i in xrange(h):
for j in xrange(w):
preX.append([i,j])
preY.append(image_file[i,j,:].astype('float32')/255.0)
print preX[:5], preY[:5]
zipped = [i for i in zip(preX,preY)]
shuffle(zipped)
X_train = np.array([i for (i,j) in zipped]).astype('float32')
Y_train = np.array([j for (i,j) in zipped]).astype('float32')
print X_train[:10], Y_train[:10]
'''
Tensorflow code
'''
def weight_variable(shape):
initial = tf.truncated_normal(shape, stddev=0.1)
return tf.Variable(initial)
def bias_variable(shape):
initial = tf.constant(0.1, shape=shape)
return tf.Variable(initial)
x = tf.placeholder(tf.float32, shape=[None,2])
y = tf.placeholder(tf.float32, shape=[None,3])
'''
Layers
'''
w1 = weight_variable([2,300])
b1 = bias_variable([300])
L1 = tf.nn.relu(tf.matmul(X_train,w1)+b1)
w2 = weight_variable([300,3])
b2 = bias_variable([3])
y_model = tf.matmul(L1,w2)+b2
'''
Training
'''
# criterion
MSE = tf.reduce_mean(tf.square(tf.sub(y,y_model)))
# trainer
train_op = tf.train.GradientDescentOptimizer(learning_rate = 0.01).minimize(MSE)
nb_epochs = 10
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
cost = 0
for i in range(nb_epochs):
sess.run(train_op, feed_dict ={x: X_train, y: Y_train})
cost += sess.run(MSE, feed_dict ={x: X_train, y: Y_train})
cost /= nb_epochs
print cost
'''
Prediction
'''
pred = sess.run(y_model,feed_dict = {x:X_train})*255.0
print pred[:10]
output_image = []
index = 0
h = image_file.shape[0]
w = image_file.shape[1]
for i in xrange(h):
row = []
for j in xrange(w):
row.append(pred[index])
index += 1
row = np.array(row)
output_image.append(row)
output_image = np.array(output_image)
output_image = output_image.astype('uint8')
cv2.imwrite('out_mona_300x3_tf.png',output_image)

First of all, I think that instead of running the train_op and then the MSE
you can run both ops in a list and reduce your computational cost significantly.
for i in range(nb_epochs):
cost += sess.run([MSE, train_op], feed_dict ={x: X_train, y: Y_train})
Secondly, I suggest always writing out your cost function so you can see what is going on during the training phase. Either manually print it out or use tensorboard to log your cost and plot it (you can find examples on the official tf page).
You can also monitor your weights to see that they aren't blowing up.
A few things you can try:
Reduce learning rate, add regularization to weights.
Check that your training set (pixels) really consist of the values that
you expect them to.

You give the input layer weights and the output layer weights the same names w and b, so it seems something goes wrong in the gradient-descent procedure. Actually I'm surprised tensorflow doesn't issue an error or at leas a warning (or am I missing something?)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

keras LSTM poor prediction - python

Please use first derivation for x (always delta x == 1 ), then it works. But this makes of course no real sense for this simple formular. Try something more complicated like my example: LSTM prediction of damped sin curve Source: https://www.kaggle.com/maciejbednarz/lstm-example

Related

Python LSTM Bitcoin prediction flatlines

How do you gather the elements of y_pred that do not correspond to the true label in a Keras/tf2.0 custom loss function?

TensorFlow simple XOR example not converging

Tensorflow practice y = xw how do you initialize vector x?

Getting very high values in linear regression

Categories

Resources