I'm working on an artificial intelligence project and I want to predict the bitcoin trend, but when I use Keras's model.predict function on my test_set, the prediction is always equal to 1, so the line in my plot is always straight.
import csv
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from cryptory import Cryptory
from keras.models import Sequential, Model, InputLayer
from keras.layers import LSTM, Dropout, Dense
from sklearn.preprocessing import MinMaxScaler
def format_to_3d(df_to_reshape):
    reshaped_df = np.array(df_to_reshape)
    return np.reshape(reshaped_df, (reshaped_df.shape[0], 1, reshaped_df.shape[1]))
crypto_data = Cryptory(from_date = "2014-01-01")
bitcoin_data = crypto_data.extract_coinmarketcap("bitcoin")
sc = MinMaxScaler()
for col in bitcoin_data.columns:
    if col != "open":
        del bitcoin_data[col]
training_set = bitcoin_data;
training_set = sc.fit_transform(training_set)
# Split the data into train, validate and test
train_data = training_set[365:]
# Split the data into x and y
x_train, y_train = train_data[:len(train_data)-1], train_data[1:]
model = Sequential()
model.add(LSTM(units=4, input_shape=(None, 1))) # 128 -- neurons**?
# model.add(Dropout(0.2))
model.add(Dense(units=1, activation="softmax")) # activation function could be different
model.compile(optimizer="adam", loss="mean_squared_error") # mse could be used for loss, look into optimiser
model.fit(format_to_3d(x_train), y_train, batch_size=32, epochs=15)
test_set = bitcoin_data
test_set = sc.transform(test_set)
test_data = test_set[:364]
input = test_data
input = sc.inverse_transform(input)
input = np.reshape(input, (364, 1, 1))
predicted_result = model.predict(input)
print(predicted_result)
real_value = sc.inverse_transform(input)
plt.plot(real_value, color='pink', label='Real Price')
plt.plot(predicted_result, color='blue', label='Predicted Price')
plt.title('Bitcoin Prediction')
plt.xlabel('Time')
plt.ylabel('Prices')
plt.legend()
plt.show()
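Keras LSTM layers expect 3D input shaped (samples, timesteps, features). The NumPy-only sketch below shows what the question's format_to_3d produces: every sample gets a single timestep, so a stateless LSTM sees one price at a time with no history. For contrast it includes a hypothetical make_windows helper (not part of the original code) that uses the previous n_steps prices as timesteps of one feature; the prices array is a stand-in for the scaled "open" column.

```python
import numpy as np

def format_to_3d(df_to_reshape):
    # Same as the question: (samples, features) -> (samples, 1 timestep, features).
    reshaped = np.array(df_to_reshape)
    return np.reshape(reshaped, (reshaped.shape[0], 1, reshaped.shape[1]))

def make_windows(series, n_steps):
    # Hypothetical helper: the previous n_steps values become the timesteps
    # of a single feature, and the value right after each window is the target.
    series = np.asarray(series, dtype=float).ravel()
    X = np.stack([series[i:i + n_steps] for i in range(len(series) - n_steps)])
    y = series[n_steps:]
    return X.reshape(-1, n_steps, 1), y

# Stand-in for the scaled "open" column: 10 rows, 1 feature.
prices = np.arange(10, dtype=float).reshape(-1, 1)

print(format_to_3d(prices[:-1]).shape)  # (9, 1, 1): one timestep per sample
X, y = make_windows(prices, n_steps=3)
print(X.shape, y.shape)                 # (7, 3, 1) (7,)
print(X[0].ravel(), y[0])               # [0. 1. 2.] 3.0
```

With windows like these, the LSTM's input_shape would be (n_steps, 1) rather than (None, 1) fed with one-step sequences.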
The training set performance looks like this:
1566/1566 [==============================] - 3s 2ms/step - loss: 0.8572
Epoch 2/15
1566/1566 [==============================] - 1s 406us/step - loss: 0.8572
Epoch 3/15
1566/1566 [==============================] - 1s 388us/step - loss: 0.8572
Epoch 4/15
1566/1566 [==============================] - 1s 388us/step - loss: 0.8572
Epoch 5/15
1566/1566 [==============================] - 1s 389us/step - loss: 0.8572
Epoch 6/15
1566/1566 [==============================] - 1s 392us/step - loss: 0.8572
Epoch 7/15
1566/1566 [==============================] - 1s 408us/step - loss: 0.8572
Epoch 8/15
1566/1566 [==============================] - 1s 459us/step - loss: 0.8572
Epoch 9/15
1566/1566 [==============================] - 1s 400us/step - loss: 0.8572
Epoch 10/15
1566/1566 [==============================] - 1s 410us/step - loss: 0.8572
Epoch 11/15
1566/1566 [==============================] - 1s 395us/step - loss: 0.8572
Epoch 12/15
1566/1566 [==============================] - 1s 386us/step - loss: 0.8572
Epoch 13/15
1566/1566 [==============================] - 1s 385us/step - loss: 0.8572
Epoch 14/15
1566/1566 [==============================] - 1s 393us/step - loss: 0.8572
Epoch 15/15
1566/1566 [==============================] - 1s 397us/step - loss: 0.8572
I'm supposed to plot the Real Price and the Predicted Price. The Real Price is displayed properly, but the Predicted Price is only a straight line, because model.predict returns only the value 1.
Thanks in advance!
You're trying to predict a price value, that is, you're aiming at solving a regression problem and not a classification problem.
However, in the last layer of your network (model.add(Dense(units=1, activation="softmax"))), you have a single neuron (which is adequate for a regression problem), but you've chosen a softmax activation function. Softmax is used in multi-class classification problems to normalize the outputs into a probability distribution. If you have a single output neuron and apply softmax, the result will always be 1.0, because that one output is the only component of the distribution and must carry all of the probability mass. Since the output is constant, its gradient is zero, so the weights never change; this is also why the loss stays at 0.8572 in every epoch.
In summary, for regression problems you do not use an activation function on the output layer (that is, you use a linear activation), since the network is meant to output the predicted value directly.
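A quick NumPy check of the point above (plain NumPy, no Keras): softmax over a single-element output is 1.0 whatever the input, and its slope is zero, whereas a linear output keeps the value the layer computed. In Keras, the fix this answer describes amounts to writing the last layer as Dense(units=1), whose activation defaults to linear. The pre_activation values are illustrative, not taken from the question's model.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax along the last axis.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Illustrative pre-activation values from a single-unit output layer, shape (batch, 1).
pre_activation = np.array([[-3.2], [0.0], [0.47], [12.5]])

print(softmax(pre_activation).ravel())  # [1. 1. 1. 1.]

# Finite-difference slope of the softmax output: exactly zero, so nothing is learned.
h = 1e-3
slope = (softmax(pre_activation + h) - softmax(pre_activation)) / h
print(slope.ravel())  # [0. 0. 0. 0.]

# A linear (identity) output keeps the value the layer computed.
print(pre_activation.ravel())
```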
Related
I have tried over and over with different approaches to building this model, but I keep running into the same issue: my training accuracy steadily increases just fine, but my validation and evaluation accuracy remain very low (55% - 65%).
Epoch 95/100
119/119 [==============================] - 0s 2ms/step - loss: 0.6326 - accuracy: 0.8057 - val_loss: 2.0461 - val_accuracy: 0.5985
Epoch 96/100
119/119 [==============================] - 0s 2ms/step - loss: 0.6485 - accuracy: 0.7990 - val_loss: 1.9512 - val_accuracy: 0.5909
Epoch 97/100
119/119 [==============================] - 0s 2ms/step - loss: 0.6263 - accuracy: 0.8032 - val_loss: 2.0344 - val_accuracy: 0.5682
Epoch 98/100
119/119 [==============================] - 0s 2ms/step - loss: 0.6249 - accuracy: 0.7990 - val_loss: 2.0183 - val_accuracy: 0.5682
Epoch 99/100
119/119 [==============================] - 0s 2ms/step - loss: 0.6189 - accuracy: 0.8007 - val_loss: 2.0818 - val_accuracy: 0.5758
Epoch 100/100
119/119 [==============================] - 0s 2ms/step - loss: 0.6261 - accuracy: 0.8024 - val_loss: 2.0591 - val_accuracy: 0.5833
18/18 [==============================] - 0s 1ms/step - loss: 2.2385 - accuracy: 0.5628
EVAL:
[2.238506317138672, 0.5628318786621094]
The entire script is as follows:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from keras import Sequential
from keras import regularizers
from keras.layers import Dense, Dropout
from keras.optimizers import Adam
import numpy as np
from keras.utils import to_categorical
def count_classes(y: list):
    counts = {}
    for i in y:
        if i in counts.keys():
            counts[i] += 1
        else:
            counts[i] = 1
    return counts
dataset = pd.read_csv('Dementia-data.csv')
X = dataset.iloc[:, 1:]  # roughly 2562 input variables
y = dataset.iloc[:, 0]
X.head(2)
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=True, random_state=0, test_size=0.2)
print("TRAIN:")
print(count_classes(list(y_train)))
print("TEST:")
print(count_classes(list(y_test)))
# Standardize the input features (fit on the training set only)
sc = StandardScaler()
scaler = sc.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
optimizer = Adam(learning_rate=0.00005)
classifier = Sequential()
# First hidden layer
classifier.add(Dense(32, activation='relu', kernel_initializer='random_normal', kernel_regularizer=regularizers.l2(0.005), input_dim=2562))
# Second hidden layer
classifier.add(Dropout(0.2))
classifier.add(Dense(64, activation='relu', kernel_initializer='random_normal', kernel_regularizer=regularizers.l2(0.005)))
# Third hidden layer, then the output layer (7 units, so labels 0-6 are valid)
classifier.add(Dropout(0.2))
classifier.add(Dense(64, activation='relu', kernel_initializer='random_normal'))
classifier.add(Dense(7, activation='softmax', kernel_initializer='random_normal'))
# Compile the neural network
classifier.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Fit the model to the training data
classifier.fit(X_train, y_train, batch_size=20, epochs=200, validation_split=0.1, shuffle=True)
eval_model = classifier.evaluate(X_test, y_test)
print("EVAL:")
print(eval_model)
I have tried all sorts of things to combat overfitting, such as regularizers, dropout, and different data splits, because overfitting seems to be the leading cause of problems like this one. I do not know if I am missing something.
The class distributions of the training and test data are as follows; there do not seem to be any inconsistencies between the two datasets in terms of class ratios:
TRAIN:
{2: 822, 4: 136, 6: 229, 5: 184, 3: 76, 1: 57}
TEST:
{2: 199, 6: 59, 5: 45, 1: 26, 3: 15, 4: 33}
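To check the ratio claim numerically, the counts printed above can be converted to proportions. The plain-Python sketch below uses only the numbers copied from that printout; it also computes the accuracy of always predicting the most frequent class, a useful reference point for the 56% evaluation accuracy. If the ratios should match exactly, scikit-learn's train_test_split accepts a stratify argument (for example stratify=y).

```python
# Class counts copied from the TRAIN/TEST printout above.
train_counts = {2: 822, 4: 136, 6: 229, 5: 184, 3: 76, 1: 57}
test_counts = {2: 199, 6: 59, 5: 45, 1: 26, 3: 15, 4: 33}

def proportions(counts):
    # Convert raw counts into fractions of the total.
    total = sum(counts.values())
    return {label: n / total for label, n in sorted(counts.items())}

train_props = proportions(train_counts)
test_props = proportions(test_counts)
for label in sorted(train_props):
    print(f"class {label}: train {train_props[label]:.3f}  test {test_props[label]:.3f}")

# Accuracy of always predicting the most frequent class (2) on the test set.
majority_baseline = max(test_counts.values()) / sum(test_counts.values())
print(f"majority-class baseline accuracy: {majority_baseline:.3f}")  # 0.528
```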
The plotted accuracy for training and testing:
The plotted loss for training and testing, the loss clearly increases for the testing data:
I have been struggling with this for weeks now and I'd truly appreciate any help to get this working. I have also tried using learning rates between 0.05 and 0.00005 with little to no improvement.
I'm trying to approximate the function y=x^2, but the results are completely wrong. If I predict x = 2, the model returns -164455.89.
import tensorflow as tf
import numpy as np
from tensorflow import keras
xs = []
ys = []
for i in range(0,1000):
    xs.append(i)
    ys.append(i*i)
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(units=1,input_shape=[1]))
model.add(tf.keras.layers.Dense(128,))
model.add(tf.keras.layers.Dense(128,))
model.add(tf.keras.layers.Dense(1,))
model.compile(optimizer='adam',loss = 'mean_squared_error')
model.fit(xs,ys,epochs=1000)
model.predict([2])
You need to consider the following:
Scale your xs and ys to the range [0, 1]. You can use sklearn.preprocessing.MinMaxScaler.
Use numpy.linspace to generate more, evenly spaced sample points over the range you want.
Use a larger network, with more neurons in the first Dense layer, to get better results.
Make sure to use activation='relu' in all Dense layers except the last one. Without a nonlinear activation, a stack of Dense layers is equivalent to a single linear layer and cannot represent x^2.
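Why the activation matters: without one, each Dense layer is an affine map, and a composition of affine maps is again affine, so the question's four-layer model can only ever represent a straight line. The NumPy sketch below uses random weights as a stand-in for trained ones (an assumption for illustration) and checks that three stacked activation-free layers (1 -> 128 -> 128 -> 1) collapse into one.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random weights for three hypothetical Dense layers without activations.
W1, b1 = rng.normal(size=(1, 128)), rng.normal(size=128)
W2, b2 = rng.normal(size=(128, 128)), rng.normal(size=128)
W3, b3 = rng.normal(size=(128, 1)), rng.normal(size=1)

def stacked_linear(x):
    # Forward pass through the three layers, no activation in between.
    return ((x @ W1 + b1) @ W2 + b2) @ W3 + b3

# The same computation folded into a single affine map y = x @ W + b.
W = W1 @ W2 @ W3
b = (b1 @ W2 + b2) @ W3 + b3

x = np.linspace(-5, 5, 11).reshape(-1, 1)
print(np.allclose(stacked_linear(x), x @ W + b))  # True
```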
Example Code: (for model.predict([[2]]) -> we get [[4.008407]])
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
xs = np.linspace(-200,200,5000).reshape(-1,1)
ys = xs**2
xs = MinMaxScaler().fit_transform(xs)
ys = MinMaxScaler().fit_transform(ys)
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(units=128,input_shape=(1,), activation='relu'))
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(32, activation='relu'))
model.add(tf.keras.layers.Dense(16, activation='relu'))
model.add(tf.keras.layers.Dense(1,))
model.compile(optimizer='adam',loss = 'mae')
model.fit(xs,ys,epochs=300, batch_size=64)
y_pred = model.predict(([[2]]))
print((y_pred))
y_pred = model.predict(xs, batch_size=16)
plt.plot(xs.reshape(-1), y_pred, 'r')
plt.plot(xs.reshape(-1), ys, 'b:')
plt.show()
Output:
Epoch 1/300
79/79 [==============================] - 1s 3ms/step - loss: 0.2474
Epoch 2/300
79/79 [==============================] - 0s 2ms/step - loss: 0.1370
Epoch 3/300
79/79 [==============================] - 0s 2ms/step - loss: 0.0238
Epoch 4/300
79/79 [==============================] - 0s 2ms/step - loss: 0.0077
...
Epoch 295/300
79/79 [==============================] - 0s 2ms/step - loss: 0.0032
Epoch 296/300
79/79 [==============================] - 0s 2ms/step - loss: 0.0028
Epoch 297/300
79/79 [==============================] - 0s 2ms/step - loss: 0.0035
Epoch 298/300
79/79 [==============================] - 0s 2ms/step - loss: 0.0057
Epoch 299/300
79/79 [==============================] - 0s 2ms/step - loss: 0.0024
Epoch 300/300
79/79 [==============================] - 0s 2ms/step - loss: 0.0028
[[4.008407]]
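One caveat about this example: the network is trained on scaled values, so model.predict([[2]]) receives a scaled input of 2, which lies outside the [0, 1] training range (it corresponds to x = 600 in original units), and its output is in scaled y units. The [[4.008407]] result is therefore not the network's estimate of 2 squared. To query x = 2 in original units, the fitted scalers have to be kept and applied in both directions. The sketch below uses scikit-learn's MinMaxScaler; fake_predict is a hypothetical stand-in for model.predict that returns the exact scaled target, so only the scaling round-trip is illustrated.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

xs = np.linspace(-200, 200, 5000).reshape(-1, 1)
ys = xs ** 2

# Keep the fitted scalers instead of calling MinMaxScaler().fit_transform(...) inline.
x_scaler = MinMaxScaler()
y_scaler = MinMaxScaler()
xs_scaled = x_scaler.fit_transform(xs)
ys_scaled = y_scaler.fit_transform(ys)  # these are what the model would be trained on

def fake_predict(x_scaled):
    # Hypothetical stand-in for model.predict: maps scaled x to the exact scaled y,
    # i.e. what a perfectly trained network would return.
    x = x_scaler.inverse_transform(x_scaled)
    return y_scaler.transform(x ** 2)

query = np.array([[2.0]])  # x = 2 in original units
pred = y_scaler.inverse_transform(fake_predict(x_scaler.transform(query)))
print(pred)  # approximately [[4.]]

# A raw input of 2 fed straight to the model means x = 600 in original units.
print(x_scaler.inverse_transform(np.array([[2.0]])))
```

With a real model, the same pattern applies: y_scaler.inverse_transform(model.predict(x_scaler.transform(query))).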
I am trying to perform linear regression in TensorFlow. I don't understand why my loss is so large from the very beginning. Moreover, the loss decreases very little during training.
Could you please tell me if there are any mistakes in my code that may cause this problem?
In case it matters, I used the Kaggle dataset "House price prediction", which can be found on Kaggle at /shree1992/housedata.
Here is my code:
import sys
import tensorflow as tf
import numpy as np
import pandas as pd
np.set_printoptions(threshold=sys.maxsize)
import matplotlib.pyplot as plt
# Loading and inspecting the data:
data = pd.read_csv('dataset/data.csv')
print(data.head(1))
data = data.drop(columns=['street', 'statezip'])
print(data.head(1))
# One-hot encoding the remaining categorical columns (city, country):
data = pd.get_dummies(data, columns=["city", "country"])
data = data.values
X = np.array(data[:, 2:], dtype=float)
Y = np.array(data[:, 1], dtype=float)
print(X.shape)
# Before dropping 'street' and 'statezip': 4600 observations, 4659 features
# After dropping them: (4600, 57)
# Splitting the data into training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3)
N, D = X_train.shape
# Scaling the data (features only)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Creating a model
model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(D,)),
    tf.keras.layers.Dense(1)
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9), loss='mse')
# model.compile(optimizer='adam', loss='mse')
# Learning rate scheduler
def schedule(epoch, lr):
    if epoch >= 50:
        return 0.0001
    return 0.001
scheduler = tf.keras.callbacks.LearningRateScheduler(schedule)
# Train the model
r = model.fit(X_train, y_train, epochs=200, callbacks=[scheduler])
Below is a fragment of the output from running this code:
Epoch 38/200
101/101 [==============================] - 0s 464us/step - loss: 270232453120.0000
Epoch 39/200
101/101 [==============================] - 0s 464us/step - loss: 270549336064.0000
Epoch 40/200
101/101 [==============================] - 0s 464us/step - loss: 270402207744.0000
Epoch 41/200
101/101 [==============================] - 0s 619us/step - loss: 270913732608.0000
Epoch 42/200
101/101 [==============================] - 0s 464us/step - loss: 271168536576.0000
Epoch 43/200
101/101 [==============================] - 0s 464us/step - loss: 270916206592.0000
Epoch 44/200
101/101 [==============================] - 0s 464us/step - loss: 272359129088.0000
Epoch 45/200
101/101 [==============================] - 0s 619us/step - loss: 271552167936.0000
Epoch 46/200
101/101 [==============================] - 0s 464us/step - loss: 272676618240.0000
Epoch 47/200
101/101 [==============================] - 0s 619us/step - loss: 272397254656.0000
Epoch 48/200
101/101 [==============================] - 0s 464us/step - loss: 270996291584.0000
Epoch 49/200
101/101 [==============================] - 0s 619us/step - loss: 271571435520.0000
Epoch 50/200
101/101 [==============================] - 0s 464us/step - loss: 272347545600.0000
Epoch 51/200
101/101 [==============================] - 0s 619us/step - loss: 269679640576.0000
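For the house-price question above, note that only X is standardized; y_train stays in raw prices, so the MSE is measured in squared price units. A loss of about 2.7e11 corresponds to a root-mean-squared error of roughly 5.2e5 price units, which is why the numbers look huge from the first epoch. The NumPy sketch below uses made-up prices (not the Kaggle data) to show that standardizing the target rescales the MSE by exactly a factor of std squared; whether it also speeds up training in this case is an assumption worth testing.

```python
import numpy as np

# Made-up house prices (not the Kaggle data) and predictions that are 10% too high.
y_true = np.array([310000.0, 540000.0, 255000.0, 1200000.0, 420000.0])
y_pred = y_true * 1.1

def mse(a, b):
    # Mean squared error, as used by loss='mse'.
    return float(np.mean((a - b) ** 2))

raw_loss = mse(y_true, y_pred)  # in squared price units: a very large number

# Standardize the target (in practice, fit mean/std on y_train only).
mean, std = y_true.mean(), y_true.std()
scaled_loss = mse((y_true - mean) / std, (y_pred - mean) / std)

print(raw_loss, scaled_loss)
print(np.isclose(raw_loss, scaled_loss * std ** 2))  # True: same error, different units

# Reading the log: a loss of about 2.7e11 is an RMSE of roughly 5.2e5 price units.
print(np.sqrt(2.7e11))
```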
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM, Dropout
from keras.layers import Dense
from sklearn.preprocessing import MinMaxScaler
def split_univariate_sequence(sequence, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # check if we are beyond the sequence
        if out_end_ix > len(sequence):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
n_steps_in, n_steps_out = 30, 30
X1, y1 = split_univariate_sequence(sumpred, n_steps_in, n_steps_out)
transformer = MinMaxScaler()
X1_transformed = transformer.fit_transform(X1)
n_features = 1
X1_transformed = X1_transformed.reshape((X1_transformed.shape[0], X1_transformed.shape[1], n_features))
model = Sequential()
model.add(LSTM(150, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
model.add(Dropout(0.3))
model.add(LSTM(50, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(n_steps_out))
model.compile(optimizer='adam', loss='mse')
model.fit(X1_transformed, y1, epochs=1000, verbose=1)
# demonstrate prediction
x_input = sumpred[-30:].reshape(1, -1)
x_input = transformer.transform(x_input)
x_input = x_input.reshape((1, n_steps_in, n_features))
yhat = model.predict(x_input, verbose=1)
yhat_inverse = transformer.inverse_transform(yhat)
sumpred is a float32 array of shape (144,) with values between 390.624 and 347471. I'm trying to predict the next 30 values based on the last 30 sumpred values.
When I train the model, I get results like this:
Epoch 990/1000
85/85 [==============================] - 0s 2ms/step - loss: 1031220211.9529
Epoch 991/1000
85/85 [==============================] - 0s 2ms/step - loss: 1087168440.4706
Epoch 992/1000
85/85 [==============================] - 0s 2ms/step - loss: 1011368153.6000
Epoch 993/1000
85/85 [==============================] - 0s 2ms/step - loss: 1104842800.1882
Epoch 994/1000
85/85 [==============================] - 0s 2ms/step - loss: 1086514331.1059
Epoch 995/1000
85/85 [==============================] - 0s 2ms/step - loss: 1050088100.8941
Epoch 996/1000
85/85 [==============================] - 0s 2ms/step - loss: 1003426751.2471
Epoch 997/1000
85/85 [==============================] - 0s 2ms/step - loss: 1139417025.5059
Epoch 998/1000
85/85 [==============================] - 0s 2ms/step - loss: 1129283814.4000
Epoch 999/1000
85/85 [==============================] - 0s 2ms/step - loss: 1107968009.0353
Epoch 1000/1000
85/85 [==============================] - 0s 2ms/step - loss: 1651960831.6235
The values in yhat_inverse are far larger than expected. Other losses, such as mean squared logarithmic error, did not help. Even with the data transformation (MinMaxScaler) and the Dropout layers, I still have this issue.
Does anyone have an idea how to improve my model's performance?
Your model is not able to learn, so first increase the size of the network. Judging by how large the loss is, the input is quite large and you are not giving the neural network enough capacity to learn the data.
First remove the Dropout layers, then add more layers and give each of them 150 units or more.
Dropout is usually added later, once you see overfitting, but your model has not even started learning yet.
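One more detail in the question's code: with 144 values and 30-in/30-out windows, split_univariate_sequence yields 144 - 30 - 30 + 1 = 85 samples (the 85/85 in the log), and the MinMaxScaler is fitted on X1 only. The targets y1, and hence the MSE of roughly 1e9, stay in raw units, and transformer.inverse_transform(yhat) is then applied to predictions the model was trained to produce in raw units, which inflates them. The sketch below uses a random stand-in for sumpred in the stated range (the real data is not shown) and fits a separate target scaler; the y1_scaled[:1] slice is a stand-in for model.predict, not a real prediction.

```python
import numpy as np
from numpy import array
from sklearn.preprocessing import MinMaxScaler

def split_univariate_sequence(sequence, n_steps_in, n_steps_out):
    # Same windowing logic as in the question.
    X, y = list(), list()
    for i in range(len(sequence)):
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        if out_end_ix > len(sequence):
            break
        X.append(sequence[i:end_ix])
        y.append(sequence[end_ix:out_end_ix])
    return array(X), array(y)

# Synthetic stand-in for sumpred: 144 float32 values in the stated range.
rng = np.random.default_rng(0)
sumpred = rng.uniform(390.624, 347471.0, size=144).astype(np.float32)
X1, y1 = split_univariate_sequence(sumpred, 30, 30)
print(X1.shape, y1.shape)  # (85, 30) (85, 30)

# Fit one scaler for the inputs and a separate one for the targets.
x_scaler = MinMaxScaler().fit(X1)
y_scaler = MinMaxScaler().fit(y1)
X1_scaled = x_scaler.transform(X1)
y1_scaled = y_scaler.transform(y1)  # train on this so the MSE is on a ~[0, 1] scale

# Stand-in for model.predict(...) returning values in scaled target units.
yhat_scaled = y1_scaled[:1]
yhat = y_scaler.inverse_transform(yhat_scaled)
print(np.allclose(yhat, y1[:1], rtol=1e-4))  # True: back in original units

# Applying inverse_transform to values already in raw units inflates them.
inflated = x_scaler.inverse_transform(y1[:1])
print(inflated.max() > y1[:1].max())  # True
```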
I have been trying to better understand the train/validation sequence in the keras model fit() loop. So I tried out a simple training loop where I attempted to fit a simple logistic regression model with input data consisting of a single feature.
I feed the same data for both training and validation. Under those conditions, and with the batch size set equal to the total data size, one would expect to obtain exactly the same loss and accuracy. But this is not the case.
Here is my code:
Generate some random data with two classes:
import numpy as np
N = 100
x = np.concatenate([np.random.randn(N//2, 1), np.random.randn(N//2, 1)+2])
y = np.concatenate([np.zeros(N//2), np.ones(N//2)])
Plot the distribution of the two classes (one feature, x):
import pandas as pd
import seaborn as sns
from matplotlib import pyplot
data = pd.DataFrame({'x': x.ravel(), 'y': y})
sns.violinplot(x='x', y='y', inner='point', data=data, orient='h')
pyplot.tight_layout(pad=0)
pyplot.show()
Build and fit the Keras model:
import tensorflow as tf
model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation='sigmoid', input_dim=1)])
model.compile(optimizer=tf.keras.optimizers.SGD(2), loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x, y, epochs=10, validation_data=(x, y), batch_size=N)
Notice that I have passed the data x and targets y both for training and as validation_data. Also, the batch size equals the total data size (batch_size=N).
The training results are:
100/100 [==============================] - 1s 5ms/step - loss: 1.4500 - acc: 0.2300 - val_loss: 0.5439 - val_acc: 0.7200
Epoch 2/10
100/100 [==============================] - 0s 18us/step - loss: 0.5439 - acc: 0.7200 - val_loss: 0.4408 - val_acc: 0.8000
Epoch 3/10
100/100 [==============================] - 0s 16us/step - loss: 0.4408 - acc: 0.8000 - val_loss: 0.3922 - val_acc: 0.8300
Epoch 4/10
100/100 [==============================] - 0s 16us/step - loss: 0.3922 - acc: 0.8300 - val_loss: 0.3659 - val_acc: 0.8400
Epoch 5/10
100/100 [==============================] - 0s 17us/step - loss: 0.3659 - acc: 0.8400 - val_loss: 0.3483 - val_acc: 0.8500
Epoch 6/10
100/100 [==============================] - 0s 16us/step - loss: 0.3483 - acc: 0.8500 - val_loss: 0.3356 - val_acc: 0.8600
Epoch 7/10
100/100 [==============================] - 0s 17us/step - loss: 0.3356 - acc: 0.8600 - val_loss: 0.3260 - val_acc: 0.8600
Epoch 8/10
100/100 [==============================] - 0s 18us/step - loss: 0.3260 - acc: 0.8600 - val_loss: 0.3186 - val_acc: 0.8600
Epoch 9/10
100/100 [==============================] - 0s 18us/step - loss: 0.3186 - acc: 0.8600 - val_loss: 0.3127 - val_acc: 0.8700
Epoch 10/10
100/100 [==============================] - 0s 23us/step - loss: 0.3127 - acc: 0.8700 - val_loss: 0.3079 - val_acc: 0.8800
The results show that val_loss and loss are not the same at the end of each epoch, and also acc and val_acc are not exactly the same. However, based on this setup, one would expect them to be the same.
I have been going through the code in keras, particularly this part:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/engine/training.py#L1364
So far, all I can say is that the difference is due to the values being computed differently in the computation graph.
Does anyone have any idea why there would be such a difference?
After looking more closely at the results: the loss and acc values reported for the training step are computed BEFORE the current batch is used to update the model.
Thus, with a single batch per epoch, the training acc and loss are evaluated when the batch is fed in, and then the model parameters are updated by the optimizer. After the training step is finished, loss and accuracy are computed by feeding in the validation data, which is now evaluated with the newly updated model.
This is evident from the training output, where the validation accuracy and loss in epoch 1 are equal to the training accuracy and loss in epoch 2, and so on.
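The ordering described above can be reproduced outside Keras. The NumPy sketch below runs full-batch gradient descent on a one-feature logistic regression (the same toy data layout as the question, with lr=2 mirroring SGD(2)); it records the training loss before each update and the "validation" loss on the same data after it. This is a simplified model of the behaviour, not Keras's actual training loop, and it starts from zero weights rather than Keras's random initialization.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
# Same toy setup as the question: one feature, two shifted Gaussian classes.
x = np.concatenate([rng.standard_normal(N // 2), rng.standard_normal(N // 2) + 2])
y = np.concatenate([np.zeros(N // 2), np.ones(N // 2)])

def predict_proba(w, b):
    # Sigmoid output of a one-feature logistic regression.
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

def bce(w, b):
    # Binary cross-entropy, with clipping to avoid log(0).
    p = np.clip(predict_proba(w, b), 1e-7, 1 - 1e-7)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def grad(w, b):
    # Gradient of the mean BCE with respect to w and b.
    err = predict_proba(w, b) - y
    return float(np.mean(err * x)), float(np.mean(err))

w, b, lr = 0.0, 0.0, 2.0
history = []
for epoch in range(5):
    train_loss = bce(w, b)          # computed with the weights *before* the update
    gw, gb = grad(w, b)
    w, b = w - lr * gw, b - lr * gb
    val_loss = bce(w, b)            # computed with the *updated* weights
    history.append((train_loss, val_loss))
    print(f"epoch {epoch + 1}: loss={train_loss:.4f} val_loss={val_loss:.4f}")
```

Each epoch's val_loss equals the next epoch's loss, the same lag visible in the Keras log. With more than one batch per epoch, Keras additionally reports the training loss as an average over the batches seen during the epoch.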
A quick check using tensorflow confirmed that values are fetched before variables are updated:
import tensorflow as tf
import numpy as np
np.random.seed(1)
x = tf.placeholder(dtype=tf.float32, shape=(None, 1), name="x")
y = tf.placeholder(dtype=tf.float32, shape=(None), name="y")
W = tf.get_variable(name="W", shape=(1, 1), dtype=tf.float32, initializer=tf.constant_initializer(0))
b = tf.get_variable(name="b", shape=1, dtype=tf.float32, initializer=tf.constant_initializer(0))
z = tf.matmul(x, W) + b
error = tf.square(z - y)
obj = tf.reduce_mean(error, name="obj")
opt = tf.train.MomentumOptimizer(learning_rate=0.025, momentum=0.9)
grads = opt.compute_gradients(obj)
train_step = opt.apply_gradients(grads)
N = 100
x_np = np.random.randn(N).reshape(-1, 1)
y_np = 2*x_np + 3 + np.random.randn(N)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(2):
        res = sess.run([obj, W, b, train_step], feed_dict={x: x_np, y: y_np})
        print('MSE: {}, W: {}, b: {}'.format(res[0], res[1][0, 0], res[2][0]))
Output:
MSE: 14.721437454223633, W: 0.0, b: 0.0
MSE: 13.372591018676758, W: 0.08826743811368942, b: 0.1636980175971985
Since W and b were initialized to 0, the fact that the fetched values in the first run are still 0, even though the session also ran the gradient update, shows that the fetched values are read before the variables are updated.