Keras always predicting the same output - python

Keras always predicts the same class for every input I give it. There are currently four classes:
News, Weather, Sport and Economy.
The training set consists of many different texts, each labeled with its topic as the class. There are far more texts classified as News and Sport than as Weather and Economy.
News: 12112 texts
Weather: 1685 texts
Sport: 13669 texts
Economy: 1282 texts
I would have expected the model to be biased towards Sport and News, but instead it is completely biased towards Weather, with every input being classified as Weather with at least 80% confidence.
Just to add to my confusion: while training, the annotator reaches accuracy scores from 95% to 100% (sic!). I guess I am doing something really stupid here, but I don't know what it is.
This is how I call my classifier. It runs on Python 3 on a Windows PC.
with open('model.json') as json_data:
    model_JSON = json.load(json_data)

model_JSON = json.dumps(model_JSON)
model = model_from_json(model_JSON)
model.load_weights('weights.h5')

text = str(text.decode())
encoded = one_hot(text, max_words, split=" ")

tokenizer = Tokenizer(num_words=max_words)
matrix = tokenizer.sequences_to_matrix([encoded], mode='binary')

result = model.predict(matrix)
legende = ["News", "Wetter", "Sport", "Wirtschaft"]
print(str(legende))
print(str(result))

cat = numpy.argmax(result)
return str(legende[cat]).encode()
This is how I train my classifier. I omitted the part where I fetch the data from a database; this runs on a Linux VM.
I already tried changing the loss and activation functions around, but nothing happened.
I am also currently trying to use more epochs, but so far that hasn't helped either.
max_words = 10000
batch_size = 32
epochs = 15

rows = cursor.fetchall()
X = []
Y = []

# Read in the rows
for row in rows:
    X.append(row[5])
    Y.append(row[1])

num_classes = len(set(Y))
Y = one_hot("$".join(Y), num_classes, split="$")

for i in range(len(X)):
    X[i] = one_hot(str(X[i]), max_words, split=" ")

split = round(len(X) * 0.2)
x_test = np.asarray(X[0:int(split)])
y_test = np.asarray(Y[0:int(split)])
x_train = np.asarray(X[int(split):len(X)])
y_train = np.asarray(Y[int(split):len(X)])

print('x_test shape', x_test.shape)
print('y_test shape', y_test.shape)
print(num_classes, 'classes')

# vectorize
tokenizer = Tokenizer(num_words=max_words)
x_train = tokenizer.sequences_to_matrix(x_train, mode='binary')
x_test = tokenizer.sequences_to_matrix(x_test, mode='binary')

# class vector to binary class matrix
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# build the model
model = Sequential()
model.add(Dense(512, input_shape=(max_words,)))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_split=0.1)

score = model.evaluate(x_test, y_test,
                       batch_size=batch_size,
                       verbose=1)

print('Test score', score[0])
print('Test accuracy', score[1])

# write model to json
print("writing model to json")
model_json = model.to_json()
with open("model.json", 'w') as json_file:
    json_file.write(model_json)

# save weights as hdf5
print("saving weights to hdf5")
model.save_weights("weights.h5")

Thanks to the tip @Daniel Möller gave me, I found out what the problem was. His tip was to look at how many instances of each class are contained in the training set.
In my case I found that hashing the classes with one_hot is not a good idea, because it can encode multiple classes as the same number (it hashes, so collisions are possible). For me, one_hot encoded nearly every class as a 1, so Keras learned to only predict 1.
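A minimal sketch of a collision-free alternative, assuming Y is the list of class-name strings read from the database rows in the training script above:
classes = sorted(set(Y))                      # e.g. ['News', 'Sport', 'Wetter', 'Wirtschaft']
class_to_index = {c: i for i, c in enumerate(classes)}
Y = [class_to_index[label] for label in Y]    # one unique integer per class, no hash collisions
# keras.utils.to_categorical(Y, len(classes)) then builds a proper one-hot class matrix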

Related

Results of loaded keras model are different

I am trying to build an LSTM model to detect the sentiment of texts (0 -> normal, 1 -> hateful). After I trained my model, I sent some texts to it for prediction, and the predicted results were as I expected. However, after I load my model from the "h5" file, I cannot get the same results, even if I send the same texts. Here is my training code:
texts = tweets['text']
labels = tweets['label']
labels = LabelEncoder().fit_transform(labels)
labels = labels.reshape(-1, 1)

X_train, X_test, Y_train, Y_test = train_test_split(texts, labels, test_size=0.20)

tokenizer.fit_on_texts(X_train)
sequences = tokenizer.texts_to_sequences(X_train)
sequences_matrix = sequence.pad_sequences(sequences, maxlen=max_len)

inputs = Input(name='inputs', shape=[max_len])
layer = Embedding(max_words, 50, input_length=max_len)(inputs)
layer = LSTM(64)(layer)
layer = Dense(256, name='FC1')(layer)
layer = Activation('relu')(layer)
layer = Dropout(0.5)(layer)
layer = Dense(1, name='out_layer')(layer)
layer = Activation('sigmoid')(layer)

model = Model(inputs=inputs, outputs=layer)

earlyStopping = EarlyStopping(monitor='val_loss', min_delta=0.0001,
                              restore_best_weights=False)

model.summary()
model.compile(loss='binary_crossentropy', optimizer=RMSprop(), metrics=['accuracy'])
model.fit(sequences_matrix, Y_train, batch_size=128, shuffle=True, epochs=10,
          validation_split=0.2, callbacks=[earlyStopping])
model.save("ModelsDL/LSTM.h5")

test_sequences = tokenizer.texts_to_sequences(X_test)
test_sequences_matrix = sequence.pad_sequences(test_sequences, maxlen=max_len)
accr = model.evaluate(test_sequences_matrix, Y_test)
print('Test set\n Loss: {:0.3f}\n Accuracy: {:0.3f}'.format(accr[0], accr[1]))

texts = ["hope", "feel relax", "feel energy", "peaceful day"]
tokenizer.fit_on_texts(texts)
test_samples_token = tokenizer.texts_to_sequences(texts)
test_samples_tokens_pad = pad_sequences(test_samples_token, maxlen=max_len)
print(model.predict(x=test_samples_tokens_pad))

del model
The output of print(model.predict(x=test_samples_tokens_pad)) is:
[[0.0387207 ]
[0.02622151]
[0.3856796 ]
[0.03749594]]
Text with "normal" sentiment results closer to 0.Also text with "hateful" sentiment results closer to 1.
As you see in the output, my results are consistent because they have "normal" sentiment.
However, after I load my model, I always encounter different results. Here is my codes:
texts = ["hope", "feel relax", "feel energy", "peaceful day"] # same texts
model = load_model("ModelsDL/LSTM.h5")
tokenizer.fit_on_texts(texts)
test_samples_token = tokenizer.texts_to_sequences(texts)
test_samples_tokens_pad = pad_sequences(test_samples_token, maxlen=max_len)
print(model.predict(x=test_samples_tokens_pad))
Output of print(model.predict(x=test_samples_tokens_pad)):
[[0.9838583 ]
[0.99957573]
[0.9999665 ]
[0.9877912 ]]
As you can see, the same LSTM model now treats the texts as if they had a hateful context.
What should I do about this problem?
EDIT: I solved the problem. I saved the tokenizer that was used during model training, and at prediction time I loaded that tokenizer instead of calling tokenizer.fit_on_texts(texts) on the texts to be predicted.
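The fix can be sketched with pickle as follows (the ModelsDL/tokenizer.pickle path is just an example):
import pickle

# right after tokenizer.fit_on_texts(X_train) in the training script:
with open('ModelsDL/tokenizer.pickle', 'wb') as f:
    pickle.dump(tokenizer, f)

# later, at prediction time, load it instead of re-fitting on the new texts:
with open('ModelsDL/tokenizer.pickle', 'rb') as f:
    tokenizer = pickle.load(f)
test_samples_token = tokenizer.texts_to_sequences(texts)  # uses the training vocabulary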
In your train/test split code you need to set a random state to get reproducible results. For example:
X_train, X_test, Y_train, Y_test = train_test_split(texts, labels, test_size=0.20, random_state=15)
Try different states like 1, 2, 3, 4, ... Once you get the result you like, you can save it and reuse the same random state later. Hope this solves your problem.

Train many neural networks and pick best one

I'm working on a classification task, trying to reconstruct a network from a paper. In that paper, they do a train/test split 300 times, train the network each time, and then take the mean of all the networks' predictions for a given input.
So here's the question: what is the best way to do that? I've already reconstructed their network and am thinking about using a for loop and saving the outputs of each network in a data frame, but I can't get it right.
Here's the code :
# Set X and Y for training
X = dum_bll_fsrq.drop(['type2', 'name', 'Type_is_bll', 'Type_is_fsrq'], axis=1)
Y = dum_bll_fsrq.iloc[:, -2:]

# Train test split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, stratify=Y)

# Create model
model_two_neuron = tf.keras.Sequential([
    tf.keras.layers.Dense(40, input_shape=(15,)),  # input shape required
    tf.keras.layers.Dense(2, activation=tf.nn.sigmoid)
])
model_two_neuron.compile(optimizer=tf.keras.optimizers.Adam(),
                         loss=tf.keras.losses.MeanSquaredError(),
                         metrics=[tf.keras.metrics.Precision()])

# Train
model_two_neuron.fit(X_train, y_train, epochs=20)
You can use callbacks to save the best weights for each of your models, then evaluate the best results saved by the callbacks after training.
Here is a basic example from the documentation:
model.compile(loss=..., optimizer=...,
              metrics=['accuracy'])

EPOCHS = 10
checkpoint_filepath = '/tmp/checkpoint'
model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_weights_only=True,
    monitor='val_accuracy',
    mode='max',
    save_best_only=True)

# Model weights are saved at the end of every epoch, if it's the best seen
# so far.
model.fit(epochs=EPOCHS, callbacks=[model_checkpoint_callback])

# The model weights (that are considered the best) are loaded into the model.
model.load_weights(checkpoint_filepath)
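For the repeated-split averaging the paper describes, a minimal sketch could look like the following, assuming build_model() is a hypothetical factory that returns a freshly compiled copy of the network above, and X_new stands for the inputs you want ensemble predictions for:
import numpy as np
from sklearn.model_selection import train_test_split

all_predictions = []
for i in range(300):
    # new split and a fresh network every iteration
    X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, stratify=Y)
    model = build_model()
    model.fit(X_train, y_train, epochs=20, verbose=0)
    all_predictions.append(model.predict(X_new))

# mean over the 300 networks' predictions for each input
mean_prediction = np.mean(all_predictions, axis=0)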

My text classifier model doesn't improve with multiple classes

I'm trying to train a model for text classification. The model takes a list of at most 300 integers embedded from articles. The model trains without problems, but the accuracy won't go up.
The target consists of 41 categories encoded as ints from 0 to 41, which were then normalized.
The table would look like this
Also, I don't know how my model should look, since I referred to two different examples, as below:
A binary classifier with one input column and one output column (Example 1)
A multi-class classifier with multiple columns as input (Example 2)
I have tried modifying my model based on both examples, but the accuracy won't change, and even gets lower per epoch.
Should I add more layers to my model, or have I done something stupid that I haven't realized?
Note: if the 'df.pickle' download link is broken, use this link
from sklearn.model_selection import train_test_split
from urllib.request import urlopen
from os.path import exists
from os import mkdir
import tensorflow as tf
import pandas as pd
import pickle

# Define dataframe path
df_path = 'df.pickle'

# Check if local dataframe exists
if not exists(df_path):
    # Download binary from dropbox
    content = urlopen('https://ucd92a22d5e0d4d29b8edb608305.dl.dropboxusercontent.com/cd/0/get/Askx_25n3JI-jmnZsWXmMmRgd4O2EH1w9l0U6zCMq7xdSXs_IN_i2zuUviseqa9N7-WrReFbGhQi8CeseV5cNsFTO8dzRmSdxjr-MWEDQNpPaZ8Ik29E_58YAjY57qTc4CA/file#').read()
    # Write to file
    with open(df_path, 'wb') as file: file.write(content)
    # Load the dataframe from bytes
    df = pickle.loads(content)
# If the file exists (aka. downloaded)
else:
    # Load the dataframe from file
    df = pickle.load(open(df_path, 'rb'))

# Normalize the category
df['Category_Code'] = df['Category_Code'].apply(lambda x: x / 41)

train_df, test_df = [pd.DataFrame() for _ in range(2)]
x_train, x_test, y_train, y_test = train_test_split(df['Content_Parsed'], df['Category_Code'], test_size=0.15, random_state=8)
train_df['Content_Parsed'], train_df['Category_Code'] = x_train, y_train
test_df['Content_Parsed'], test_df['Category_Code'] = x_test, y_test

# Variable containing the number of words we want to keep in our vocabulary
NUM_WORDS = 10000
# Input/Token length
SEQ_LEN = 300

# Create tokenizer for our data
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=NUM_WORDS, oov_token='<UNK>')
tokenizer.fit_on_texts(train_df['Content_Parsed'])

# Convert text data to numerical indexes
train_seqs = tokenizer.texts_to_sequences(train_df['Content_Parsed'])
test_seqs = tokenizer.texts_to_sequences(test_df['Content_Parsed'])

# Pad data up to SEQ_LEN (note that we truncate if there are more than SEQ_LEN tokens)
train_seqs = tf.keras.preprocessing.sequence.pad_sequences(train_seqs, maxlen=SEQ_LEN, padding="post")
test_seqs = tf.keras.preprocessing.sequence.pad_sequences(test_seqs, maxlen=SEQ_LEN, padding="post")

# Create Models folder if not exists
if not exists('Models'): mkdir('Models')

# Define local model path
model_path = 'Models/model.pickle'

# Check if model exists/pre-trained
if not exists(model_path):
    # Define word embedding size
    EMBEDDING_SIZE = 16

    # Create new model
    '''
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(NUM_WORDS, EMBEDDING_SIZE),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(EMBEDDING_SIZE)),
        # tf.keras.layers.Dense(EMBEDDING_SIZE, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    '''
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(NUM_WORDS, EMBEDDING_SIZE),
        # tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(EMBEDDING_SIZE)),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(EMBEDDING_SIZE, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])

    # Compile the model
    model.compile(
        optimizer='adam',
        loss='binary_crossentropy',
        metrics=['accuracy']
    )

    # Stop training when a monitored quantity has stopped improving.
    es = tf.keras.callbacks.EarlyStopping(monitor='val_acc', mode='max', patience=1)

    # Define batch size (can be tuned to improve model accuracy)
    BATCH_SIZE = 16
    # Define number of cycles to train
    EPOCHS = 20

    # Using GPU (if this errors, you don't have a GPU; use CPU instead)
    with tf.device('/GPU:0'):
        # Train/Fit the model
        history = model.fit(
            train_seqs,
            train_df['Category_Code'].values,
            batch_size=BATCH_SIZE,
            epochs=EPOCHS,
            validation_split=0.2,
            validation_steps=30,
            callbacks=[es]
        )

    # Evaluate the model
    model.evaluate(test_seqs, test_df['Category_Code'].values)

    # Save the model into a file
    with open(model_path, 'wb') as file: file.write(pickle.dumps(model))
else:
    # Load the model
    model = pickle.load(open(model_path, 'rb'))

# Check the model
model.summary()
After 2 days of tweaking and studying more examples, I found this website, which explains multi-class classification quite well.
The details of the changes I made are as follows:
Since I'm building a model for multiple classes, the model should be compiled with categorical_crossentropy as its loss function instead of binary_crossentropy.
The model should produce as many outputs as there are classes to classify, which in my case is 41 (one-hot encoding).
The last layer's activation function should be "softmax", since we're choosing the label with the highest confidence level (closest to 1.0).
You will need to tweak the layers accordingly based on the number of classes you're going to classify. See here on how to improve your model.
My final code looks something like this:
from sklearn.model_selection import train_test_split
from urllib.request import urlopen
from functools import reduce
from os.path import exists
from os import listdir
from sys import exit
import tensorflow as tf
import pandas as pd
import pickle
import re

# Specify dataframe path
df_path = 'df.pickle'

# Check if the file exists
if not exists(df_path):
    # Specify url of the dataframe binary
    url = 'https://www.dropbox.com/s/76hibe24hmpz3bk/df.pickle?dl=1'
    # Read the byte content from url
    content = urlopen(url).read()
    # Write to a file to save up time
    with open(df_path, 'wb') as file: file.write(pickle.dumps(content))
    # Unpickle the dataframe
    df = pickle.loads(content)
else:
    # Load the pickled dataframe
    df = pickle.load(open(df_path, 'rb'))

# Useful variables
MAX_NUM_WORDS = 50000   # Vocabulary size for our tokenizer
MAX_SEQ_LENGTH = 600    # Maximum length of tokens (for padding later)
EMBEDDING_SIZE = 256    # Embedding size (Tweak to improve accuracy)
OUTPUT_LENGTH = len(df['Category'].unique())  # Number of classes to be classified

# Create our tokenizer
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=MAX_NUM_WORDS, lower=True)
# Fit our tokenizer with words/tokens
tokenizer.fit_on_texts(df['Content_Parsed'].values)
# Get our token vocabulary
word_index = tokenizer.word_index
print('Found {} unique tokens'.format(len(word_index)))

# Parse our text into sequences of numbers using our tokenizer
X = tokenizer.texts_to_sequences(df['Content_Parsed'].values)
# Pad the sequences up to MAX_SEQ_LENGTH
X = tf.keras.preprocessing.sequence.pad_sequences(X, maxlen=MAX_SEQ_LENGTH)
print('Shape of feature tensor: {}'.format(X.shape))

# Convert our labels into dummy variables (More info on the link provided above)
Y = pd.get_dummies(df['Category']).values
print('Shape of label tensor: {}'.format(Y.shape))

# Split our features and labels into test and train datasets
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.1, random_state=42)
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

# Creating our model
model = tf.keras.Sequential()
model.add(tf.keras.layers.Embedding(MAX_NUM_WORDS, EMBEDDING_SIZE, input_length=MAX_SEQ_LENGTH))
model.add(tf.keras.layers.SpatialDropout1D(0.2))
# The number 64 could be changed based on your model performance
model.add(tf.keras.layers.LSTM(64, dropout=0.2, recurrent_dropout=0.2))
# Our output layer with length equal to OUTPUT_LENGTH
model.add(tf.keras.layers.Dense(OUTPUT_LENGTH, activation='softmax'))
# Compile our model with the "categorical_crossentropy" loss function
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Model variables
EPOCHS = 100      # Number of cycles to run (early stopping may stop training earlier)
BATCH_SIZE = 64   # Batch size (Tweaking this may improve model performance a bit)
checkpoint_path = 'model_checkpoints'  # Checkpoint path of our model

# Use GPU if available
with tf.device('/GPU:0'):
    # Fit/Train our model
    history = model.fit(
        x_train, y_train,
        epochs=EPOCHS,
        batch_size=BATCH_SIZE,
        validation_split=0.1,
        callbacks=[
            tf.keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0.0001),
            tf.keras.callbacks.ModelCheckpoint(
                checkpoint_path,
                monitor='val_acc',
                save_best_only=True,
                save_weights_only=False
            )
        ],
        verbose=1
    )
Now my model's accuracy performs well and increases each epoch, but since the validation accuracy (val_acc, around 76~77 percent) is not performing well, I may need to tweak the model/layers a bit.
The output snapshot is provided below

keras predicting classes method

So, I have this little project going on about predicting the NBA 2019 champion, but it seems my code is not clear enough for Keras to understand what I want. I passed a list of past champions in my dataset and made it the output class, in order to get the current champion.
I'm using a dataset of team stats from the 2014 to 2018 regular seasons, and I'm assuming I should have the 2019 stats to do it. I encoded my dataset carefully for my NN to understand, providing one-hot encoding for every feature I think is useful.
x = pd.concat([df.drop(['Unnamed: 0', 'Team', 'Game', 'Date', 'Opponent', 'LastSeasonChamp'], axis=1), df_ohc], axis=1)
y = df['LastSeasonChamp']

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.35)
x_train = tf.keras.utils.normalize(x_train.values, axis=1)
x_test = tf.keras.utils.normalize(x_test.values, axis=1)

n_classes = 30

model = tf.keras.Sequential()
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(x_train.shape[1], input_shape=(x_train.shape[0], x_train.shape[1]), activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(np.mean([x_train.shape[1], n_classes], dtype=int), activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(n_classes, activation=tf.nn.softmax))
model.compile(optimizer='adagrad', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train.values, epochs=3)
model.evaluate(x_test, y_test)

model.save('nba_champ_2019')
new_model = tf.keras.models.load_model('nba_champ_2019')
pred = new_model.predict(x_test)
y_pred = to_categorical(pred)
So, I would expect my y_pred to be a column of 0s and 1s, but all I get is a column full of 1s.
The to_categorical function is used to convert a list of class IDs to a one-hot matrix. You don't need it here. You should get the output you expect by removing it in this case.
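A minimal sketch of reading off the predicted class instead, assuming pred is the softmax output of new_model.predict(x_test) from the code above:
import numpy as np

pred = new_model.predict(x_test)  # shape (num_samples, 30): one probability per class
y_pred = np.argmax(pred, axis=1)  # integer class ID (0..29) with the highest probability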

Keras LSTM - Validation Loss Increasing From Epoch #1

I'm currently undertaking my first 'real' DL project of (surprise) predicting stock movements. I know the odds are 1000:1 against making anything useful, but I'm enjoying it and want to see it through; I've learnt more in my few weeks of attempting this than in the prior 6 months of completing MOOCs.
I'm building an LSTM using Keras to predict the next step forward, and I have attempted the task both as classification (up/down/steady) and now as a regression problem. Both result in a similar roadblock: my validation loss never improves from epoch #1.
I can get the model to overfit such that training loss approaches zero with MSE (or 100% accuracy if classification), but at no stage does the validation loss decrease. This screams overfitting to my untrained eye, so I added varying amounts of dropout, but all that does is stifle the model's learning/training accuracy, with no improvement in validation accuracy.
I have attempted to change a significant number of hyperparameters - learning rate, optimiser, batch size, lookback window, #layers, #units, dropout, #samples, etc. I have also tried with a subset of the data and a subset of the features, but I just can't get it to work, so I'm very thankful for any help.
Code Below (it's not pretty I know):
# Import saved full dataframe ~ 200 features
import feather
df = feather.read_dataframe('df_feathered')
df.set_index('time', inplace=True)

# Difference the dataset to make stationary
df = df.diff(periods=1, axis=0)

# MAKE LARGE SAMPLE FOR TESTING
df_train = df.loc['2017-3-1':'2017-6-30']
df_val = df.loc['2017-7-1':'2017-8-31']
df_test = df.loc['2017-9-1':'2017-9-30']

# Make x_train, x_val sets by dropping target variable
x_train = df_train.drop('close+1', axis=1)
x_val = df_val.drop('close+1', axis=1)

# Scale the training data first then fit the transform to the test set
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_val)
# scaler = MinMaxScaler(feature_range=(0,1))
# x_train = scaler.fit_transform(df_train1)
# x_test = scaler.transform(df_val1)

# Create y_train, y_test, simply target variable for regression
y_train = df_train['close+1']
y_test = df_val['close+1']

# Define lookback window for LSTM input
sliding_window = 15

# Convert x_train, x_test, y_train, y_test into 3d arrays
# (samples, timesteps, features) for LSTM input
dataXtrain = []
for i in range(len(x_train) - sliding_window - 1):
    a = x_train[i:(i + sliding_window), 0:(x_train.shape[1])]
    dataXtrain.append(a)

dataXtest = []
for i in range(len(x_test) - sliding_window - 1):
    a = x_test[i:(i + sliding_window), 0:(x_test.shape[1])]
    dataXtest.append(a)

dataYtrain = []
for i in range(len(y_train) - sliding_window - 1):
    dataYtrain.append(y_train[i + sliding_window])

dataYtest = []
for i in range(len(y_test) - sliding_window - 1):
    dataYtest.append(y_test[i + sliding_window])

# Make data divisible by a variety of batch_sizes for training
# Started at 1000 to not include replaced NaN values
dataXtrain = np.array(dataXtrain[1000:172008])
dataYtrain = np.array(dataYtrain[1000:172008])
dataXtest = np.array(dataXtest[1000:83944])
dataYtest = np.array(dataYtest[1000:83944])

# Checking input shapes
print('dataXtrain size is: {}'.format((dataXtrain).shape))
print('dataXtest size is: {}'.format((dataXtest).shape))
print('dataYtrain size is: {}'.format((dataYtrain).shape))
print('dataYtest size is: {}'.format((dataYtest).shape))

### ACTUAL LSTM MODEL
batch_size = 256
timesteps = dataXtrain.shape[1]
features = dataXtrain.shape[2]

# Model set-up, stacked 4 layer stateful LSTM
model = Sequential()
model.add(LSTM(512, return_sequences=True, stateful=True,
               batch_input_shape=(batch_size, timesteps, features)))
model.add(LSTM(256, stateful=True, return_sequences=True))
model.add(LSTM(256, stateful=True, return_sequences=True))
model.add(LSTM(128, stateful=True))
model.add(Dense(1, activation='linear'))
model.summary()

reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.9, patience=5, min_lr=0.000001, verbose=1)

def coeff_determination(y_true, y_pred):
    from keras import backend as K
    SS_res = K.sum(K.square(y_true - y_pred))
    SS_tot = K.sum(K.square(y_true - K.mean(y_true)))
    return (1 - SS_res / (SS_tot + K.epsilon()))

model.compile(loss='mse',
              optimizer='nadam',
              metrics=[coeff_determination, 'mse', 'mae', 'mape'])

history = model.fit(dataXtrain, dataYtrain, validation_data=(dataXtest, dataYtest),
                    epochs=100, batch_size=batch_size, shuffle=False, verbose=1, callbacks=[reduce_lr])

score = model.evaluate(dataXtest, dataYtest, batch_size=batch_size, verbose=1)
print(score)

predictions = model.predict(dataXtest, batch_size=batch_size)
print(predictions)

import matplotlib.pyplot as plt
%matplotlib inline

#plt.plot(history.history['mean_squared_error'])
#plt.plot(history.history['val_mean_squared_error'])
plt.plot(history.history['coeff_determination'])
plt.plot(history.history['val_coeff_determination'])
#plt.plot(history.history['mean_absolute_error'])
#plt.plot(history.history['mean_absolute_percentage_error'])
#plt.plot(history.history['val_mean_absolute_percentage_error'])
#plt.title("MSE")
plt.ylabel("R2")
plt.xlabel("epoch")
plt.legend(["train", "val"], loc="best")
plt.show()

plt.plot(history.history["loss"][5:])
plt.plot(history.history["val_loss"][5:])
plt.title("model loss")
plt.ylabel("loss")
plt.xlabel("epoch")
plt.legend(["train", "val"], loc="best")
plt.show()

plt.figure(figsize=(20, 8))
plt.plot(dataYtest)
plt.plot(predictions)
plt.title("Prediction")
plt.ylabel("Price")
plt.xlabel("Time")
plt.legend(["Truth", "Prediction"], loc="best")
plt.show()
Maybe you should remember that you are predicting stock returns, which are very likely not predictable at all, so an increasing val_loss is not overfitting. Instead of adding more dropout, maybe you should think about adding more layers to increase the model's power.
Try reducing the learning rate a lot (and remove the dropouts for now).
Also, why do you use shuffle=False in the fit() call?
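A minimal sketch of the lower-learning-rate suggestion, assuming the model compiled above; the 1e-4 value is only an illustrative starting point, well below Nadam's default of 0.002:
from keras.optimizers import Nadam  # note: older Keras versions name the argument lr instead of learning_rate

model.compile(loss='mse',
              optimizer=Nadam(learning_rate=1e-4),  # explicit, much smaller learning rate
              metrics=[coeff_determination, 'mse', 'mae', 'mape'])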
