I've been trying to predict Google stock prices between certain dates, but when I use the trained network to predict future values I get an output that follows the target but on a different scale (screenshots are below).
I wrote an LSTM neural network in PyTorch. The Google stock prices were obtained with the yfinance Python library.
The neural network is:
input size = 1
hidden size = 200
number of layers = 1
The output of the LSTM is passed to a fully connected layer:
input size = 200
output size = 1
Before training the network I use MinMaxScaler.fit_transform() to scale the training and testing data. Then I use the network to predict future values, and the output is converted back to the original scale using MinMaxScaler.inverse_transform().
Nonetheless, the predicted output is on a different scale than the target output, although the two curves look similar.
Screenshot: plot of y_target (blue) and y_pred (orange)
Zooming in on y_pred gives the following plot:
Screenshot: plot of y_pred
What is happening? Why are the predicted values similar to the target values but on a reduced scale? What am I doing wrong?
Input data code:
# get close prices from dataset
df_close = pd.DataFrame(df['Close'])
df_close_values = df_close.values
# normalize data using MinMaxScaler
mmscaler = MinMaxScaler(feature_range=(0,1))
df_close_scaled = mmscaler.fit_transform(df_close_values)
# Sequence length
sequence_length = 25
# divide data into train, validation and test data
len_data = df_close_values.shape[0]
len_train_data = int(len_data * 0.8)
len_val_data = int((len_data - len_train_data)/2)
len_test_data = len_data - len_train_data - len_val_data
train_data = df_close_scaled[0:len_train_data]
val_data = df_close_scaled[len_train_data-sequence_length:len_train_data+len_val_data]
test_data = df_close_scaled[len_train_data+len_val_data-sequence_length:]
# Function to divide data into x and y
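def partition_dataset(sequence_length, train_df):
    x, y = [], []
    data_len = train_df.shape[0]
    for i in range(sequence_length, data_len):
        x.append(train_df[i - sequence_length:i, :])  # window of the previous sequence_length values
        y.append(train_df[i, 0])  # value that follows the window
    # Convert the x and y to numpy arrays
    x = np.array(x, dtype=np.float32)
    y = np.array(y, dtype=np.float32)
    return x, y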
x_train, y_train = partition_dataset(sequence_length, train_data)
x_val, y_val = partition_dataset(sequence_length, val_data)
x_test, y_test = partition_dataset(sequence_length, test_data)
shapes result:
x_train.shape , y_train.shape = (2554, 25, 1) (2554,)
x_val.shape , y_val.shape = (322, 25, 1) (322,)
x_test.shape , y_test.shape = (323, 25, 1) (323,)
LSTM code:
class LSTMPredictor(nn.Module):
    def __init__(self, input_size=50, hidden_size=1, num_layers=1, output_size=1, bidirectional=1, dropout=1.0, device='cuda'):
        super().__init__()
        # Attributes
        self.device = device
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        self.D = True if bidirectional==2 else False
        self.output_size = output_size
        self.dropout = dropout
        # define LSTM layer
        self.lstm = nn.LSTM(input_size = input_size,
                            hidden_size = self.hidden_size,
                            num_layers = self.num_layers,
                            bidirectional = self.D,
                            batch_first = True,
                            dropout = dropout)
        # define fully connected (MLP)
        self.fully_connected = nn.Linear(self.hidden_size, self.output_size)
        self.dropout = nn.Dropout(p=0.2)
    def forward(self, x, hidden=None):
        # Propagate input through LSTM
        output, (h, _) = self.lstm(x)
        # Use the output of the last time step only
        out = self.fully_connected(output[:,-1])
        return out
I also tried fitting MinMaxScaler separately on each set (train, validation and test). The scale did change, but the result is similar:
Plot of y_target (blue) and y_pred (orange)
Plot of y_pred only
The problem with your code is this line:
df_close_scaled = mmscaler.fit_transform(df_close_values)
It is applied to the entire dataset first, and only afterwards do you split the data into train and test sets. This is wrong because it leaks information about the test data into the training data.
Split the data first and fit MinMaxScaler on the training data only. Keep the fitted scaler in a variable; at test time, use that same scaler to transform the values fed to the model and to convert the predictions back.
I do not see how this alone could cause such a drastic change in scale, but it is definitely a problem as it stands.
I'll inspect the code further and update this answer if I notice anything else.
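As a rough illustration of the split-then-scale order described above, here is a minimal NumPy sketch. It re-implements min-max scaling by hand rather than calling MinMaxScaler, and the helper names (fit_min_max, scale, unscale) and the toy prices are made up for this example.

```python
import numpy as np

def fit_min_max(train_values):
    """Return the (min, max) of the training data only."""
    return float(train_values.min()), float(train_values.max())

def scale(values, lo, hi):
    """Map values to [0, 1] using the training-set range."""
    return (values - lo) / (hi - lo)

def unscale(scaled, lo, hi):
    """Invert scale() to get back to the original units."""
    return scaled * (hi - lo) + lo

# Toy stand-in for the closing prices (hypothetical numbers).
prices = np.array([10.0, 12.0, 11.0, 15.0, 14.0, 18.0, 20.0, 19.0, 24.0, 26.0])

split = int(len(prices) * 0.8)        # same 80% split as in the question
train, test = prices[:split], prices[split:]

lo, hi = fit_min_max(train)           # statistics come from the training split only
train_scaled = scale(train, lo, hi)
test_scaled = scale(test, lo, hi)     # can fall outside [0, 1]; that is expected

restored = unscale(test_scaled, lo, hi)  # the same lo/hi are used for the inverse
```

With scikit-learn the same pattern would be mmscaler.fit(...) on the training split, then mmscaler.transform(...) on every split and mmscaler.inverse_transform(...) on the predictions.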
I want to run a seq2seq model using an LSTM for customer journey analysis. I am able to run the model, but I am unable to load the saved model in a different notebook.
Code for attention model is here:
# RNN "Cell" classes in Keras perform the actual data transformations at each timestep. Therefore, in order to add attention to LSTM, we need to make a custom subclass of LSTMCell.
class AttentionLSTMCell(LSTMCell):
    def __init__(self, **kwargs):
        self.attentionMode = False
        super(AttentionLSTMCell, self).__init__(**kwargs)
    # Build is called to initialize the variables that our cell will use. We will let other Keras
    # classes (e.g. "Dense") actually initialize these variables.
    def build(self, input_shape):
        # Converts the input sequence into a sequence which can be matched up to the internal
        # hidden state.
        self.dense_constant = TimeDistributed(Dense(self.units, name="AttLstmInternal_DenseConstant"))
        # Transforms the internal hidden state into something that can be used by the attention
        # mechanism.
        self.dense_state = Dense(self.units, name="AttLstmInternal_DenseState")
        # Transforms the combined hidden state and converted input sequence into a vector of
        # probabilities for attention.
        self.dense_transform = Dense(1, name="AttLstmInternal_DenseTransform")
        # We will augment the input into LSTMCell by concatenating the context vector. Modify
        # input_shape to reflect this.
        batch, input_dim = input_shape[0]
        batch, timesteps, context_size = input_shape[-1]
        lstm_input = (batch, input_dim + context_size)
        # The LSTMCell superclass expects no constant input, so strip that out.
        return super(AttentionLSTMCell, self).build(lstm_input)
    # This must be called before call(). The "input sequence" is the output from the
    # encoder. This function will do some pre-processing on that sequence which will
    # then be used in subsequent calls.
    def setInputSequence(self, input_seq):
        self.input_seq = input_seq
        self.input_seq_shaped = self.dense_constant(input_seq)
        self.timesteps = tf.shape(self.input_seq)[-2]
    # This is a utility method to adjust the output of this cell. When attention mode is
    # turned on, the cell outputs attention probability vectors across the input sequence.
    def setAttentionMode(self, mode_on=False):
        self.attentionMode = mode_on
    # This method sets up the computational graph for the cell. It implements the actual logic
    # that the model follows.
    def call(self, inputs, states, constants):
        # Separate the state list into the two discrete state vectors.
        # ytm is the "memory state", stm is the "carry state".
        ytm, stm = states
        # We will use the "carry state" to guide the attention mechanism. Repeat it across all
        # input timesteps to perform some calculations on it.
        stm_repeated = K.repeat(self.dense_state(stm), self.timesteps)
        # Now apply our "dense_transform" operation on the sum of our transformed "carry state"
        # and all encoder states. This will squash the resultant sum down to a vector of size
        # [batch,timesteps,1]
        # Note: Most sources I encounter use tanh for the activation here. I have found with this dataset
        # and this model, relu seems to perform better. It makes the attention mechanism far more crisp
        # and produces better translation performance, especially with respect to proper sentence termination.
        combined_stm_input = self.dense_transform(
            keras.activations.relu(stm_repeated + self.input_seq_shaped))
        # Performing a softmax over the time axis generates a probability for each encoder output to receive attention.
        score_vector = keras.activations.softmax(combined_stm_input, 1)
        # In this implementation, we grant "partial attention" to each encoder output based on
        # its probability computed above. Other options would be to only give attention
        # to the highest probability encoder output or some similar set.
        context_vector = K.sum(score_vector * self.input_seq, 1)
        # Finally, mutate the input vector. It will now contain the traditional inputs (like the seq2seq
        # we trained above) in addition to the attention context vector we calculated earlier in this method.
        inputs = K.concatenate([inputs, context_vector])
        # Call into the super-class to invoke the LSTM math.
        res = super(AttentionLSTMCell, self).call(inputs=inputs, states=states)
        # This if statement switches the return value of this method if "attentionMode" is turned on.
        if self.attentionMode:
            return (K.reshape(score_vector, (-1, self.timesteps)), res[1])
        return res
# Custom implementation of the Keras LSTM that adds an attention mechanism.
# This is implemented by taking an additional input (using the "constants" of the RNN class) into the LSTM: the encoder output vectors across the entire input sequence.
class LSTMWithAttention(RNN):
    def __init__(self, units, **kwargs):
        cell = AttentionLSTMCell(units=units)
        self.units = units
        super(LSTMWithAttention, self).__init__(cell, **kwargs)
    def build(self, input_shape):
        self.input_dim = input_shape[0][-1]
        self.timesteps = input_shape[0][-2]
        return super(LSTMWithAttention, self).build(input_shape)
    # This call is invoked with the entire time sequence. The RNN sub-class is responsible
    # for breaking this up into calls into the cell for each step.
    # The "constants" variable is the key to our implementation. It was specifically added
    # to Keras to accommodate the "attention" mechanism we are implementing.
    def call(self, x, constants, **kwargs):
        if isinstance(x, list):
            self.x_initial = x[0]
        else:
            self.x_initial = x
        # The only difference in the LSTM computational graph really comes from the custom
        # LSTM Cell that we utilize.
        self.cell._dropout_mask = None
        self.cell._recurrent_dropout_mask = None
        return super(LSTMWithAttention, self).call(inputs=x, constants=constants, **kwargs)
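To make the attention step inside AttentionLSTMCell.call easier to follow, here is a small NumPy sketch of the same arithmetic for a single decoder timestep. It is only an illustration: the random matrices stand in for the three Dense layers (biases omitted), and the sizes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, timesteps, units = 2, 4, 3   # arbitrary sizes for the illustration

# Hypothetical stand-ins for the tensors used inside call().
input_seq = rng.normal(size=(batch, timesteps, units))  # encoder outputs
stm = rng.normal(size=(batch, units))                    # state that guides attention

# Random matrices standing in for the three Dense layers (biases omitted).
w_constant = rng.normal(size=(units, units))   # dense_constant
w_state = rng.normal(size=(units, units))      # dense_state
w_transform = rng.normal(size=(units, 1))      # dense_transform

input_seq_shaped = input_seq @ w_constant                                   # (batch, timesteps, units)
stm_repeated = np.repeat((stm @ w_state)[:, None, :], timesteps, axis=1)    # (batch, timesteps, units)

# relu on the sum, then project every timestep down to a single score.
combined = np.maximum(stm_repeated + input_seq_shaped, 0.0) @ w_transform   # (batch, timesteps, 1)

# Softmax over the time axis turns the scores into attention weights.
exp = np.exp(combined - combined.max(axis=1, keepdims=True))
score_vector = exp / exp.sum(axis=1, keepdims=True)

# The weighted sum of encoder outputs gives one context vector per batch element.
context_vector = (score_vector * input_seq).sum(axis=1)                     # (batch, units)
```

The context vector is what the cell concatenates onto its regular input before running the standard LSTM step.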
Code defining encoder and decoder model:
# Encoder Layers
encoder_inputs = Input(shape=(None,len_input), name="attenc_inputs")
encoder = LSTM(units=units, return_sequences=True, return_state=True)
encoder_outputs, state_h, state_c = encoder((encoder_inputs))
encoder_states = [state_h, state_c]
# define inference encoder
encoder_model = Model(encoder_inputs, encoder_states)
encoder_model.save('atten_enc_model.h5')
# define training decoder
decoder_inputs = Input(shape=(None, n_output))
Attention_dec_lstm = LSTMWithAttention(units=units, return_sequences=True, return_state=True)
# Note that the only real difference here is that we are feeding encoder_outputs to the decoder now.
attdec_lstm_out, _, _ = Attention_dec_lstm(inputs=decoder_inputs,
                                           constants=encoder_outputs,
                                           initial_state=encoder_states)
decoder_dense1 = Dense(units, activation="relu")
decoder_dense2 = Dense(n_output, activation='softmax')
decoder_outputs = decoder_dense2(Dropout(rate=.10)(decoder_dense1(Dropout(rate=.10)(attdec_lstm_out))))
atten_model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
atten_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
#Defining inference decoder
state_input_h = Input(shape=(units,), name="state_input_h")
state_input_c = Input(shape=(units,), name="state_input_c")
decoder_states_inputs = [state_input_h, state_input_c]
attenc_seq_out = Input(shape=encoder_outputs.get_shape()[1:], name="attenc_seq_out")
inf_attdec_inputs = Input(shape=(None,n_output), name="inf_attdec_inputs")
attdec_res, attdec_h, attdec_c = Attention_dec_lstm(inputs=inf_attdec_inputs,
                                                    constants=attenc_seq_out,
                                                    initial_state=decoder_states_inputs)
decoder_states = [attdec_h, attdec_c]
decoder_model = Model(inputs=[inf_attdec_inputs, state_input_h, state_input_c, attenc_seq_out],
outputs=[attdec_res, attdec_h, attdec_c])
Code for model fit and save:
history = atten_model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
Code to load the encoder-decoder model with the custom attention layer:
with open('atten_model_lstm.json') as mdl:
    json_string = mdl.read()
model = model_from_json(json_string, custom_objects={'AttentionLSTMCell': AttentionLSTMCell, 'LSTMWithAttention': LSTMWithAttention})
Loading the model this way raises the following error:
TypeError: int() argument must be a string, a bytes-like object or a number, not 'AttentionLSTMCell'
Here's a solution inspired by the link in my comment:
# serialize model to JSON
atten_model_json = atten_model.to_json()
with open("atten_model.json", "w") as json_file:
    json_file.write(atten_model_json)
# serialize weights to HDF5
print("Saved model to disk")
# Different part of your code or different file
# load json and create model
json_file = open('atten_model.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
loaded_model = model_from_json(loaded_model_json)
# load weights into new model
print("Loaded model from disk")
I tried to train an autoencoder to learn feature representations from segments of audio MFCC features (my dataset is TIMIT). I used a fixed-size window to cut the 13-dimensional MFCC data, so my input has shape (None, window_size*13), where the window size is 20 frames. I trained a feed-forward autoencoder with a least-squares loss (the norm of the difference between prediction and label). The problem is that my loss stops decreasing at a high level. The network part of my code is here:
self.centers_placeholder = tf.placeholder(tf.float32,[None,self.train_centers[0].shape[1]],name = 'center_placeholder')
layer = tf.layers.dense(self.centers_placeholder,2000,activation=tf.nn.relu)
layer = tf.layers.batch_normalization(layer)
layer = tf.layers.dense(layer,1500,activation=tf.nn.leaky_relu)
layer = tf.layers.batch_normalization(layer)
layer = tf.layers.dense(layer,500,activation=tf.nn.leaky_relu)
layer = tf.layers.batch_normalization(layer)
layer = tf.layers.dense(layer,100,activation=tf.nn.leaky_relu)
layer = tf.layers.batch_normalization(layer)
self.embedding = tf.layers.dense(layer,50,activation = tf.nn.leaky_relu)
# the decoder starts from the embedding, so the 50-unit layer is the actual bottleneck
layer = tf.layers.dense(self.embedding,100,activation=tf.nn.leaky_relu)
layer = tf.layers.batch_normalization(layer)
layer = tf.layers.dense(layer,500,activation=tf.nn.leaky_relu)
layer = tf.layers.batch_normalization(layer)
layer = tf.layers.dense(layer,1500,activation=tf.nn.leaky_relu)
layer = tf.layers.batch_normalization(layer)
layer = tf.layers.dense(layer,2000,activation=tf.nn.leaky_relu)
layer = tf.layers.batch_normalization(layer)
self.decoded = tf.layers.dense(layer,self.train_centers[0].shape[1],activation=None)
#cost = tf.reduce_mean(tf.sqrt(tf.square(neighbors_placeholder-decoder4)))
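# per-example L2 norm of the reconstruction error, shared by the cost and the metric
difference_norm = tf.norm(self.centers_placeholder-self.decoded,ord=2,axis=1)
self.cost = tf.reduce_mean(difference_norm)
target_norm = tf.norm(self.centers_placeholder,ord=2,axis=1)
self.metric = tf.reduce_mean(difference_norm/target_norm)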
print('model graph has built',flush=True)
self.optimizer = tf.train.AdamOptimizer(lr).minimize(self.cost,global_step=global_step)
self.saver = tf.train.Saver()
I tried different architectures, deep and shallow, but the loss always converges at a high level. Here is one example of my losses:
Screenshot: plot of the losses
Can anyone help? Does anyone have experience with MFCC features in autoencoders?
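The cost and metric in the graph above are simple enough to check outside TensorFlow. The NumPy sketch below is an illustration only (the function names and the random batch are made up here). It also shows that a decoder that outputs all zeros already scores a relative error of 1, which gives one reference point for judging how high a plateau is.

```python
import numpy as np

def reconstruction_cost(x, x_hat):
    """Mean L2 norm of the per-example reconstruction error (the cost)."""
    return np.linalg.norm(x - x_hat, ord=2, axis=1).mean()

def relative_error(x, x_hat):
    """Mean of ||x - x_hat|| / ||x|| over the batch (the metric)."""
    difference_norm = np.linalg.norm(x - x_hat, ord=2, axis=1)
    target_norm = np.linalg.norm(x, ord=2, axis=1)
    return (difference_norm / target_norm).mean()

# Hypothetical batch: 4 windows of 20 frames x 13 MFCCs, flattened to 260 values.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 20 * 13))

perfect = relative_error(x, x)                    # 0.0
all_zeros = relative_error(x, np.zeros_like(x))   # 1.0
```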
I'm struggling to solve this issue, and I believe it is due to my data. I'm thinking of this as a few-to-many regression problem, but there could be a better approach in TensorFlow.
Training Data
I have some data generated from a video sequence. For each frame of video I have a distribution of x,y positions for each cluster. There are 157,110 frames and 200,000 clusters. The frames and clusters are the inputs, which are integers and I think could be considered labels (I'll be using another network to learn the sequences of clusters later on). As each histogram is related to both a frame and clusterID, the input is not "one hot". The histograms (outputs) have 19+8 (x+y) bins where each count is rarely above 10, and could be normalized.
A subset of the training data is available here: The first two columns are the frame and clusterID (inputs) and the remaining 19+8 columns are the histograms (outputs).
What is the best network to learn to generate the appropriate histogram for a given frame/clusterID pair?
The following code is my current attempt using an MLP. It does not converge; in fact cost does not decrease at all. Is there something wrong in my implementation, or my choice of MLP, or a lack of scaling in my input data?
# This program uses tensorflow to learn cluster probabilities and associate them with frame and cluster IDs
# Arguments
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("clusterProbabilityfile", help="CSV file containing cluster probabilities")
parser.add_argument("trainingIterations", type=int, help="Number of training iterations")
args = parser.parse_args()
# Imports for ML
import tensorflow as tf
import numpy as np
from tensorflow.python.framework import dtypes
# Imports for loading CSV file
from tensorflow.python.platform import gfile
import csv
# Global vars
numInputUnits = 2
numOutputUnits = 19+8
numHiddenUnits = (numOutputUnits-numInputUnits)//2 # integer division: layer sizes must be ints
workingDirectory = args.clusterProbabilityfile.split('/')[0]+"/"
columnSplit = 2 # Column number that splits
# Shuffle training set
def shuffleTrainingSet(trainingSet):
    trainingIndecies = np.arange(len(trainingSet.data)) # assumes len(data) == len(target)
    np.random.shuffle(trainingIndecies) # shuffle indices
    data = trainingSet.data[trainingIndecies]
    target = trainingSet.target[trainingIndecies]
    training_set = tf.contrib.learn.datasets.base.Dataset(data=data, target=target)
    return training_set
# Load training data from CSV file, convert to numpy arrays and construct Dataset
# Modified from tf.contrib.learn.datasets.base.load_csv_without_header
# Should these be randomized???
with gfile.Open(args.clusterProbabilityfile) as csv_file:
    data_file = csv.reader(csv_file)
    data, target = [], []
    for row in data_file:
        target.append(row[columnSplit+1:]) # All elements past the split column.
        data.append(row[:columnSplit]) # All elements before and including the split column.
target = np.array(target, dtype=int)
data = np.array(data, dtype=int)
training_set = tf.contrib.learn.datasets.base.Dataset(data=data, target=target)
training_set = shuffleTrainingSet(training_set)
# Construct computation graph
# MLP approach (from
# Single hidden layer!
inputVec = tf.placeholder(tf.float32, [None, numInputUnits])
outputVec = tf.placeholder(tf.float32, [None, numOutputUnits])
# Weights
hiddenWeights = tf.Variable(tf.random_normal([numInputUnits, numHiddenUnits])) # inputUnits -> hiddenUnits
outputWeights = tf.Variable(tf.random_normal([numHiddenUnits, numOutputUnits])) # hiddenUnits -> outputUnits
# Biases
hiddenBiases = tf.Variable(tf.random_normal([numHiddenUnits]))
outputBiases = tf.Variable(tf.random_normal([numOutputUnits]))
# Construct MLP from layers
hiddenLayer = tf.add(tf.matmul(inputVec, hiddenWeights), hiddenBiases) # input * weight + bias = hidden
hiddenLayer = tf.nn.relu(hiddenLayer) # RELU Activation function for hidden layer.
outputLayer = tf.add(tf.matmul(hiddenLayer, outputWeights), outputBiases) # hidden * weight + bias = output
# loss and optimizer
#cross_entropy = -(outputVec * tf.log(outputLayer) + (1 - outputVec) * tf.log(1 - outputLayer))
#cost = tf.reduce_mean(cross_entropy)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(outputLayer, outputVec))
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)
# Compute graph
sess = tf.Session()
for epoch in range(args.trainingIterations):
    training_set = shuffleTrainingSet(training_set) # Reshuffle for each epoch.
    epochCost = sess.run(cost, feed_dict={inputVec: training_set.data, outputVec: training_set.target})
    print("{:d}\t{:f}".format(epoch, epochCost))
# Evaluate model
correct_prediction = tf.equal(tf.argmax(outputLayer,1), tf.argmax(outputVec,1)) # compare output layer with target output vector.
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print("Cost:", sess.run(cost, feed_dict={inputVec: training_set.data, outputVec: training_set.target}))
print("Accuracy:", sess.run(accuracy, feed_dict={inputVec: training_set.data, outputVec: training_set.target}))
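The question mentions that the histogram counts could be normalized and asks whether the unscaled inputs are a problem. As a hedged illustration of what that preprocessing could look like (not a claim that it fixes the convergence issue), here is a NumPy sketch. The random rows and the normalise helper are made up for this example; only the ranges (157,110 frames, 200,000 clusters) and the 19+8 bin layout come from the description above.

```python
import numpy as np

# Hypothetical rows in the layout described above:
# frame, clusterID, then 19 x-bins and 8 y-bins.
rng = np.random.default_rng(0)
n_rows = 5
ids = np.column_stack([rng.integers(0, 157110, n_rows),
                       rng.integers(0, 200000, n_rows)])
counts = rng.integers(0, 10, size=(n_rows, 19 + 8)).astype(np.float32)

# Scale the integer inputs to [0, 1) using the ranges given in the question.
inputs = ids.astype(np.float32) / np.array([157110, 200000], dtype=np.float32)

def normalise(hist):
    """Turn each row of counts into a distribution; all-zero rows stay zero."""
    total = hist.sum(axis=1, keepdims=True)
    return np.divide(hist, total, out=np.zeros_like(hist), where=total > 0)

# The x and y histograms are normalised separately so that each sums to 1.
targets = np.hstack([normalise(counts[:, :19]), normalise(counts[:, 19:])])
```

Note that a single softmax over all 27 outputs assumes one distribution that sums to 1, whereas the data as described holds two histograms per row, so the loss would need to treat the x and y parts accordingly.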
I am trying to detect micro-events in a long time series. For this purpose, I will train an LSTM network.
Data. The input for each time sample is 11 different features, roughly normalized to the range 0-1. The output is one of two classes.
Batching. Due to the huge class imbalance, I have extracted the data in batches of 60 time samples each, of which at least 5 will always be class 1 and the rest class 2. In this way the class imbalance is reduced from 150:1 to around 12:1. I have then randomized the order of all my batches.
Model. I am attempting to train an LSTM, with an initial configuration of 3 different cells with 5 delay steps. I expect the micro-events to arrive in sequences of at least 3 time steps.
Problem: When I try to train the network, it quickly converges towards saying that EVERYTHING belongs to the majority class. When I implement a weighted loss function, at a certain threshold it switches to saying that EVERYTHING belongs to the minority class. I suspect (without being an expert) that there is no learning in my LSTM cells, or that my configuration is off.
Below is the code for my implementation. I am hoping that someone can tell me
Is my implementation correct?
What other reasons could there be for such behaviour?
import numpy as np
import tensorflow as tf
from tensorflow.models.rnn import rnn
import ar_config
config = ar_config.get_config()
class ARModel(object):
    def __init__(self, is_training=False, config=None):
        # Config
        if config is None:
            config = ar_config.get_config()
        # Placeholders
        self._features = tf.placeholder(tf.float32, [None, config.num_features], name='ModelInput')
        self._targets = tf.placeholder(tf.float32, [None, config.num_classes], name='ModelOutput')
        # Hidden layer
        with tf.variable_scope('lstm') as scope:
            lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(config.num_hidden, forget_bias=0.0)
            cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * config.num_delays)
            self._initial_state = cell.zero_state(config.batch_size, dtype=tf.float32)
            outputs, state = rnn.rnn(cell, [self._features], dtype=tf.float32)
        # Output layer
        output = outputs[-1]
        softmax_w = tf.get_variable('softmax_w', [config.num_hidden, config.num_classes], tf.float32)
        softmax_b = tf.get_variable('softmax_b', [config.num_classes], tf.float32)
        logits = tf.matmul(output, softmax_w) + softmax_b
        # Evaluate
        ratio = (60.00 / 5.00)
        class_weights = tf.constant([ratio, 1 - ratio])
        weighted_logits = tf.mul(logits, class_weights)
        loss = tf.nn.softmax_cross_entropy_with_logits(weighted_logits, self._targets)
        self._cost = cost = tf.reduce_mean(loss)
        self._predict = tf.argmax(tf.nn.softmax(logits), 1)
        self._correct = tf.equal(tf.argmax(logits, 1), tf.argmax(self._targets, 1))
        self._accuracy = tf.reduce_mean(tf.cast(self._correct, tf.float32))
        self._final_state = state
        if not is_training:
            return
        # Optimize
        optimizer = tf.train.AdamOptimizer()
        self._train_op = optimizer.minimize(cost)
    # Read-only accessors; the training script uses them as attributes (m.features, m.cost, ...)
    @property
    def features(self):
        return self._features
    @property
    def targets(self):
        return self._targets
    @property
    def cost(self):
        return self._cost
    @property
    def accuracy(self):
        return self._accuracy
    @property
    def train_op(self):
        return self._train_op
    @property
    def predict(self):
        return self._predict
    @property
    def initial_state(self):
        return self._initial_state
    @property
    def final_state(self):
        return self._final_state
import os
from datetime import datetime
import numpy as np
import tensorflow as tf
from tensorflow.python.platform import gfile
import ar_network
import ar_config
import ar_reader
config = ar_config.get_config()
def main(argv=None):
if gfile.Exists(config.train_dir):
def train():
train_data = ar_reader.ArousalData(config.train_data, num_steps=config.max_steps)
test_data = ar_reader.ArousalData(config.test_data, num_steps=config.max_steps)
with tf.Graph().as_default(), tf.Session() as session, tf.device('/cpu:0'):
initializer = tf.random_uniform_initializer(minval=-0.1, maxval=0.1)
with tf.variable_scope('model', reuse=False, initializer=initializer):
m = ar_network.ARModel(is_training=True)
s = tf.train.Saver(tf.all_variables())
for batch_input, batch_target in train_data:
step = train_data.iter_steps
dict = {
m.features: batch_input,
m.targets: batch_target
}
session.run(m.train_op, feed_dict=dict)
state, cost, accuracy = session.run([m.final_state, m.cost, m.accuracy], feed_dict=dict)
if not step % 10:
test_input, test_target =
test_accuracy = session.run(m.accuracy, feed_dict={
m.features: test_input,
m.targets: test_target
})
now = datetime.now()
print ('%s | Iter %4d | Loss= %.5f | Train= %.5f | Test= %.3f' % (now, step, cost, accuracy, test_accuracy))
if not step % 1000:
destination = os.path.join(config.train_dir, 'ar_model.ckpt')
s.save(session, destination)
if __name__ == '__main__':
class Config(object):
    # Directories
    train_dir = '...'
    ckpt_dir = '...'
    train_data = '...'
    test_data = '...'
    # Data
    num_features = 13
    num_classes = 2
    batch_size = 60
    # Model
    num_hidden = 3
    num_delays = 5
    # Training
    max_steps = 100000
def get_config():
    return Config()
# Placeholders
self._features = tf.placeholder(tf.float32, [None, config.num_features, config.num_delays], name='ModelInput')
self._targets = tf.placeholder(tf.float32, [None, config.num_output], name='ModelOutput')
# Weights
weights = {
    'hidden': tf.get_variable('w_hidden', [config.num_features, config.num_hidden], tf.float32),
    'out': tf.get_variable('w_out', [config.num_hidden, config.num_classes], tf.float32)
}
biases = {
    'hidden': tf.get_variable('b_hidden', [config.num_hidden], tf.float32),
    'out': tf.get_variable('b_out', [config.num_classes], tf.float32)
}
#Layer in
with tf.variable_scope('input_hidden') as scope:
    inputs = self._features
    inputs = tf.transpose(inputs, perm=[2, 0, 1]) # (BatchSize,NumFeatures,TimeSteps) -> (TimeSteps,BatchSize,NumFeatures)
    inputs = tf.reshape(inputs, shape=[-1, config.num_features]) # (TimeSteps,BatchSize,NumFeatures) -> (TimeSteps*BatchSize,NumFeatures)
    inputs = tf.add(tf.matmul(inputs, weights['hidden']), biases['hidden'])
#Layer hidden
with tf.variable_scope('hidden_hidden') as scope:
    inputs = tf.split(0, config.num_delays, inputs) # -> n_steps * (batchsize, features)
    cell = tf.nn.rnn_cell.BasicLSTMCell(config.num_hidden, forget_bias=0.0)
    self._initial_state = cell.zero_state(config.batch_size, dtype=tf.float32)
    outputs, state = rnn.rnn(cell, inputs, dtype=tf.float32)
#Layer out
with tf.variable_scope('hidden_output') as scope:
    output = outputs[-1]
    logits = tf.add(tf.matmul(output, weights['out']), biases['out'])
Odd elements
Weighted loss
I am not sure your "weighted loss" does what you want it to do:
ratio = (60.00 / 5.00)
class_weights = tf.constant([ratio, 1 - ratio])
weighted_logits = tf.mul(logits, class_weights)
This is applied before the loss function is calculated, so it forces your predictions to behave in a certain way before the softmax is applied. Note also that your ratio is 12, which is above 1, so the second weight (1 - ratio) is negative.
If you want a weighted loss, you should apply the weights after computing the per-example loss on the unweighted logits,
loss = tf.nn.softmax_cross_entropy_with_logits(logits, self._targets)
with an element-wise multiplication by your weights:
loss = loss * weights
Here the class weights have a shape like [2,], and weights holds one entry per example: the class weight of that example's label, for instance tf.reduce_sum(self._targets * class_weights, 1).
However, I would not recommend you to use weighted losses. Perhaps try increasing the ratio even further than 1:6.
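For reference, applying class weights after the loss rather than to the logits can be sketched in plain NumPy as follows. This is only an illustration of the arithmetic: the function name, the toy logits and the 12x weight on the second class are assumptions, not values taken from the question's data.

```python
import numpy as np

def weighted_softmax_cross_entropy(logits, targets, class_weights):
    """Per-example softmax cross-entropy, scaled by the weight of the true class."""
    shifted = logits - logits.max(axis=1, keepdims=True)        # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    loss = -(targets * log_probs).sum(axis=1)                   # shape (batch,)
    example_weights = (targets * class_weights).sum(axis=1)     # shape (batch,)
    return loss * example_weights

logits = np.array([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
targets = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])   # one-hot labels
class_weights = np.array([1.0, 12.0])   # hypothetical: second class weighted 12x

unweighted = weighted_softmax_cross_entropy(logits, targets, np.ones(2))
weighted = weighted_softmax_cross_entropy(logits, targets, class_weights)
```

The logits themselves are left untouched, so the predicted probabilities are not distorted. Only the contribution of each example to the mean loss changes.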
As far as I can read, you are using 5 stacked LSTMs with 3 hidden units per layer?
Try removing the multi rnn and just use a single LSTM/GRU (maybe even just a vanilla RNN) and jack the hidden units up to ~100-1000.
Often when you are facing problems with an oddly behaving network, it can be a good idea to:
Print everything
Literally print the shapes and values of every tensor in your model: use sess to fetch each one and then print it. That includes your input data, the first hidden representation, your predictions, your losses, etc.
You can also use TensorFlow's tf.Print():
x_tensor = tf.Print(x_tensor, [tf.shape(x_tensor)])
Use TensorBoard
Using TensorBoard summaries of your gradients, accuracy metrics and histograms will reveal patterns in your data that might explain certain behavior, such as what leads to exploding weights. Maybe your forget bias goes to infinity, or you're not tracking the gradient through a certain layer, etc.
Other questions
How large is your dataset?
How long are your sequences?
Are the 13 features categorical or continuous? You should not normalize categorical variables or represent them as integers, instead you should use one-hot encoding.
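If any of the features do turn out to be categorical, one-hot encoding can be sketched like this in NumPy. The one_hot helper and the example ids are made up for illustration.

```python
import numpy as np

def one_hot(labels, num_categories):
    """Encode integer category ids as rows containing a single 1."""
    encoded = np.zeros((len(labels), num_categories), dtype=np.float32)
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

categories = np.array([2, 0, 1, 2])   # hypothetical categorical feature
encoded = one_hot(categories, num_categories=3)
```

Each categorical feature then contributes num_categories input columns instead of one, and continuous features can still be normalized as before.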
Gunnar has already made lots of good suggestions. A few more small things worth paying attention to in general for this sort of architecture:
Try tweaking the Adam learning rate. You should determine the proper learning rate by cross-validation; as a rough start, you could just check whether a smaller learning rate saves your model from crashing on the training data.
You should definitely use more hidden units. It's cheap to try larger networks when you first start out on a dataset. Go as large as necessary to avoid the underfitting you've observed. Later you can regularize / pare down the network after you get it to learn something useful.
Concretely, how long are the sequences you are passing into the network? You say you have a 30k-long time sequence. I assume you are passing in subsections / samples of this sequence?