I am trying to figure out what is wrong with the initialization of my neural network model. I have already set a pdb trace and found that the function defining the network is the source of the error. My editor also flags that function with yellow warnings because it is expected to return a module, but if I return the module I get a recursion error. It is a linear model that has to have an input dimension of batch size * 81 and an output dimension of batch size * 1. I am relatively new to PyTorch and to defining deep neural networks, so this may not be a good question, and my syntax may be poor. Any help is appreciated. The code below defines the neural network and trains the PyTorch model.
def get_nnet_model(module_list=nn.ModuleList(), input_dim: int = 8100, layer_dim: int = 100) -> nn.Module:
    """ Get the neural network model
    #return: neural network model
    """
    device = torch.device('cpu')
    module_list.append(nn.Linear(input_dim, layer_dim))
    module_list[-1].weight.data.normal_(0, 0.1)
    module_list[-1].bias.data.zero_()
def train_nnet(nnet: nn.Module, states_nnet: np.ndarray, outputs: np.ndarray, batch_size: int = 100, num_itrs: int = 10000, train_itr: int = 10000, device: torch.device, lr=0.01, lr_d=1):
    nnet.train()
    criterion = nn.MSELoss()
    optimizer = optim.Adam(nnet.parameters(), lr=lr)
    while train_itr < num_itrs:
        optimizer.zero_grad()
        lr_itr = lr + (lr_d ** train_itr)
        for param_group in optimizer.param_groups:
            param_group['lr'] = lr_itr
        data = pickle.load(open("data/data.pkl", "rb"))
        nnet_inputs_np, nnet_targets_np = data
        nnet_inputs_np = nnet_inputs_np.astype(np.float32)
        nnet_inputs = torch.tensor(nnet_inputs_np, device=device)
        nnet_targets = torch.tensor(nnet_targets_np, device=device)
        nnet_inputs = nnet_inputs.float()
        nnet_outputs = nnet(nnet_inputs)
        loss = criterion(nnet_outputs, nnet_targets)
        loss.backward()
        optimizer.step()
Based on your comment, somewhere else in your code you have something like:
nnet = get_nnet_model(...)
However, get_nnet_model(...) isn't returning anything. Change the def get_nnet_model to:
def get_nnet_model(module_list=nn.ModuleList(), input_dim: int = 8100, layer_dim: int = 100) -> nn.Module:
    """ Get the neural network model
    #return: neural network model
    """
    device = torch.device('cpu')
    module_list.append(nn.Linear(input_dim, layer_dim))
    module_list[-1].weight.data.normal_(0, 0.1)
    module_list[-1].bias.data.zero_()
    return module_list # add this one
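One caveat worth noting: an nn.ModuleList only stores layers and does not define forward, so calling nnet(nnet_inputs) on it would still fail, and a default argument of nn.ModuleList() is shared between calls. A minimal sketch of one possible alternative, assuming a purely sequential model is what is wanted, builds a fresh nn.Sequential on every call. The output_dim parameter, the ReLU, and the second linear layer are assumptions added here so that the output has the stated size of batch size * 1; input_dim is set to 81 to match the stated input.

```python
import torch
import torch.nn as nn


def get_nnet_model(input_dim: int = 81, layer_dim: int = 100, output_dim: int = 1) -> nn.Module:
    """Build a small feed-forward network: input_dim -> layer_dim -> output_dim."""
    # The layer list is created inside the function, so nothing is shared between calls.
    layers = [
        nn.Linear(input_dim, layer_dim),
        nn.ReLU(),                          # assumption: not in the original code
        nn.Linear(layer_dim, output_dim),   # assumption: maps to the 1-dim output
    ]
    for layer in layers:
        if isinstance(layer, nn.Linear):
            # Same initialisation as in the question: N(0, 0.1) weights, zero bias.
            nn.init.normal_(layer.weight, mean=0.0, std=0.1)
            nn.init.zeros_(layer.bias)
    # nn.Sequential defines forward(), so the returned model is callable.
    return nn.Sequential(*layers)


nnet = get_nnet_model()
batch = torch.rand(100, 81)   # (batch_size, 81)
out = nnet(batch)             # (batch_size, 1)
print(out.shape)
```

The returned model can be passed to train_nnet unchanged, since nnet.parameters() and nnet(nnet_inputs) both work on an nn.Sequential.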
I'm new to PyTorch and I'm having a problem with some code to train a neural network to solve a control problem. I use the following code to solve a toy version of my problem:
# SOME IMPORTS
import torch
import torch.autograd as autograd
from torch import Tensor
import torch.nn as nn
import torch.optim as optim
# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# PARAMETERS OF THE PROBLEM
layers = [4, 32, 32, 4] # Layers of the NN
steps = 10000 # Simulation steps
train_step = 1 # I train the NN for 1 epoch every train_step steps
lr = 1e-3 # Learning rate
After this I define a very simple network:
# DEFINITION OF THE NETWORK (A SIMPLE FEED FORWARD)
class FCN(nn.Module):
    def __init__(self,layers):
        super(FCN, self).__init__() #call __init__ from parent class
        self.linears = []
        for i in range(len(layers)-2):
            self.linears.append(
                nn.Linear(layers[i], layers[i+1])
            )
            self.linears.append(
                nn.ReLU()
            )
        self.linears.append(
            nn.Linear(layers[-2], layers[-1])
        )
        self.linear_stack = nn.Sequential(*self.linears)
    'forward pass'
    def forward(self,x):
        out = self.linear_stack(x)
        return out
I then use the defined class to create my model:
model = FCN(layers)
model.to(device)
params = list(model.parameters())
optimizer = torch.optim.Adam(model.parameters(),lr=lr,amsgrad=False)
Then I define the loss function and the simulation function, i.e. the function that updates the state of my problem.
def simulate(state_old, model):
    state_new = model(state_old)
    return state_new
def lossNN(state_old,state_new, model):
    error = torch.sum( (state_old-state_new)**2 )
    return error
And finally I train my model:
torch.autograd.set_detect_anomaly(True)
state_old = torch.Tensor([0.01, 0.01, 0.5, 0.1]).to(device)
for i in range(steps):
    state_new = simulate(state_old, model)
    if i%train_step == 0:
        optimizer.zero_grad()
        loss = lossNN(state_old, state_new, model)
        loss.backward(retain_graph=True)
        optimizer.step()
    state_old = state_new
    if (i%1000)==0:
        print(loss)
        print(state_new)
I then get the following error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32, 4]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
You need to detach state_new before reusing it, so that the computation graph built in the previous step is not carried into the next one. Replace
state_old = state_new
with
state_old = state_new.detach()
Then your training code changes to:
torch.autograd.set_detect_anomaly(True)
state_old = torch.Tensor([0.01, 0.01, 0.5, 0.1]).to(device)
for i in range(steps):
    state_new = simulate(state_old, model)
    if i%train_step == 0:
        optimizer.zero_grad()
        loss = lossNN(state_old, state_new, model)
        loss.backward(retain_graph=True)
        optimizer.step()
    state_old = state_new.detach()
    if (i%1000)==0:
        print(loss)
        print(state_new)
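Once the state is detached, each iteration builds a graph that spans only the current step, so retain_graph=True should no longer be necessary. The following is a minimal self-contained sketch of that variant, not the original poster's code: it runs on the CPU, uses a shortened loop of 200 steps, and replaces the FCN class with an equivalent nn.Sequential, all of which are simplifications assumed here for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for FCN([4, 32, 32, 4]) from the question, shortened to one hidden layer.
model = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 4))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

state_old = torch.tensor([0.01, 0.01, 0.5, 0.1])
losses = []
for i in range(200):
    state_new = model(state_old)
    optimizer.zero_grad()
    loss = torch.sum((state_old - state_new) ** 2)
    loss.backward()   # no retain_graph: the graph only covers this step
    optimizer.step()
    # Cut the graph here so the next step does not backpropagate into this one.
    state_old = state_new.detach()
    losses.append(loss.item())

print(state_old.requires_grad)
```

The detached tensor shares its values with state_new but carries no gradient history, which is why the in-place parameter update done by optimizer.step() no longer invalidates anything that a later backward pass needs.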
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from torchvision import transforms, datasets
#neural network class
class Net(nn.Module):
    #initialize class
    def __init__(self):
        super().__init__()
        #feedforward neural network passes data from input layer to output layer
        #fully connected layer with input shape of 28*28 pixels (flatten image to one row) and output feature is size 64. Linear means flat layer
        self.fc1 = nn.Linear(28*28,64)
        #feed in data from fc1 to fc2
        self.fc2 = nn.Linear(64,64)
        self.fc3 = nn.Linear(64,64)
        #output layer has input 64 and output of size 10 to represent 10 classes in MNIST
        self.fc4 = nn.Linear(64,10)
    #forward pass through the data
    def forward(self, x):
        #relu is activation function and performs operation on input data
        #input and output dimension of relu are the same
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        #log_softmax gives the log of a probability distribution over the classes (probabilities add up to 1) for the output layer
        x = F.log_softmax(self.fc4(x), dim=1)
        return x
#declare a model
net = Net()
#print(net)
#passing in random data
x = torch.rand(28,28)
#resize to represent input shape (batch size, input x, input y)
x = x.view(-1,28,28)
#print(x)
#optimizer adjusts neural network based on error calculation
import torch.optim as optim
#net.parameters() means all the adjustable parts of the neural network, learning rate is amount of change (we don't want model to swerve based on one train case)
optimizer = optim.Adam(net.parameters(),lr=0.001)
#get datasets using torchvision.datasets transforms are application applied to data (transforms conversion to tensors)
train = torchvision.datasets.MNIST("", train=True, download=True,transform=transforms.Compose([transforms.ToTensor()]))
test = torchvision.datasets.MNIST("", train=False, download=True,transform=transforms.Compose([transforms.ToTensor()]))
#store in data loader, batch size is how many samples is passed through the model at once (in GPU memory), best batch size is between 8-64
#shuffling avoids feeding too much of one kind of image and leads to more generalization
trainset = torch.utils.data.DataLoader(train,batch_size=10,shuffle=True)
testset = torch.utils.data.DataLoader(test,batch_size=10,shuffle=True)
#full pass through data is epoch
EPOCHS = 3
for epoch in range(EPOCHS):
    #data is a batch of data in the training set
    for data in trainset:
        #split into features and labels
        features, labels = data
        #print(features, labels)
        #reset the gradient for next passes to avoid convoluting the results of multiple backpropagations
        net.zero_grad()
        #pass data into network (make sure input shape matches)
        output = net(features.view(-1,28*28))
        #compute error (output,expected)
        loss = F.nll_loss(output,labels)
        print("loss is the ", loss)
        #backpropagate loss through trainable parameters of model
        loss.backward()
        #adjust neural network
        optimizer.step()
I am using PyTorch on Google Colab. The error message says that the gradient isn't there.
I am unsure where the error stems from. I used this tutorial: https://www.youtube.com/watch?v=9j-_dOze4IM&list=PLQVvvaa0QuDdeMyHEYc0gxFpYwHY2Qfdh&index=4
[Image: error message from Google Colab]
I've got a TensorFlow multi-layer RNN cell like this:
def MakeLSTMCell(self):
    cells = []
    for n in self.numUnits:
        cell = tf.nn.rnn_cell.LSTMCell(n)
        dropout = tf.nn.rnn_cell.DropoutWrapper(cell,
                                                input_keep_prob=self.keep_prob,
                                                output_keep_prob=self.keep_prob)
        cells.append(dropout)
    stackedRNNCell = tf.nn.rnn_cell.MultiRNNCell(cells)
    return stackedRNNCell
def BuildGraph(self):
    """
    Build the Graph of the recurrent reinforcement neural network.
    """
    with self.graph.as_default():
        with tf.variable_scope(self.scope):
            self.inputSeq = tf.placeholder(tf.float32, [None, None, self.observationDim], name='input_seq')
            self.batch_size = tf.shape(self.inputSeq)[0]
            self.seqLength = tf.shape(self.inputSeq)[1]
            self.cell = self.MakeLSTMCell()
            with tf.name_scope("LSTM_layers"):
                self.zeroState = self.cell.zero_state(self.batch_size, tf.float32)
                self.cellState = self.zeroState
                self.outputs, self.outputState = tf.nn.dynamic_rnn(self.cell,
                                                                   self.inputSeq,
                                                                   initial_state=self.cellState,
                                                                   swap_memory=True)
However, this self.cellState is not configurable. How can I save the LSTM hidden state (keeping the same form, so that I can feed it back to the RNN at any time) and reuse it as initial_state whenever I need to?
I've tried the accepted answer in this question:
Tensorflow, best way to save state in RNNs?
However, dynamic batch size is not allowed when creating tf Variable.
Any help will be appreciated
Given a trained LSTM model I want to perform inference for single timesteps, i.e. seq_length = 1 in the example below. After each timestep the internal LSTM (memory and hidden) states need to be remembered for the next 'batch'. For the very beginning of the inference the internal LSTM states init_c, init_h are computed given the input. These are then stored in a LSTMStateTuple object which is passed to the LSTM. During training this state is updated every timestep. However for inference I want the state to be saved in between batches, i.e. the initial states only need to be computed at the very beginning and after that the LSTM states should be saved after each 'batch' (n=1).
I found this related StackOverflow question: Tensorflow, best way to save state in RNNs?. However this only works if state_is_tuple=False, but this behavior is soon to be deprecated by TensorFlow (see rnn_cell.py). Keras seems to have a nice wrapper to make stateful LSTMs possible but I don't know the best way to achieve this in TensorFlow. This issue on the TensorFlow GitHub is also related to my question: https://github.com/tensorflow/tensorflow/issues/2838
Does anyone have good suggestions for building a stateful LSTM model?
inputs = tf.placeholder(tf.float32, shape=[None, seq_length, 84, 84], name="inputs")
targets = tf.placeholder(tf.float32, shape=[None, seq_length], name="targets")
num_lstm_layers = 2
with tf.variable_scope("LSTM") as scope:
    lstm_cell = tf.nn.rnn_cell.LSTMCell(512, initializer=initializer, state_is_tuple=True)
    self.lstm = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * num_lstm_layers, state_is_tuple=True)
    init_c = # compute initial LSTM memory state using contents in placeholder 'inputs'
    init_h = # compute initial LSTM hidden state using contents in placeholder 'inputs'
    self.state = [tf.nn.rnn_cell.LSTMStateTuple(init_c, init_h)] * num_lstm_layers
    outputs = []
    for step in range(seq_length):
        if step != 0:
            scope.reuse_variables()
        # CNN features, as input for LSTM
        x_t = # ...
        # LSTM step through time
        output, self.state = self.lstm(x_t, self.state)
        outputs.append(output)
I found out it was easiest to save the whole state for all layers in a placeholder.
init_state = np.zeros((num_layers, 2, batch_size, state_size))
...
state_placeholder = tf.placeholder(tf.float32, [num_layers, 2, batch_size, state_size])
Then unpack it and create a tuple of LSTMStateTuples before using the native tensorflow RNN Api.
l = tf.unpack(state_placeholder, axis=0)
rnn_tuple_state = tuple(
    [tf.nn.rnn_cell.LSTMStateTuple(l[idx][0], l[idx][1])
     for idx in range(num_layers)]
)
Then pass it to the RNN API:
cell = tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True)
cell = tf.nn.rnn_cell.MultiRNNCell([cell]*num_layers, state_is_tuple=True)
outputs, state = tf.nn.dynamic_rnn(cell, x_input_batch, initial_state=rnn_tuple_state)
The returned state variable is then fed to the next batch through the placeholder.
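The shape convention can be checked without TensorFlow. The NumPy-only sketch below uses hypothetical sizes and a stand-in function, fake_rnn_step, which is not a TensorFlow API and only mimics fetching the new state from a session. It shows how an array of shape (num_layers, 2, batch_size, state_size) splits into per-layer (c, h) pairs and how the returned pairs are packed again before being fed to the next batch.

```python
import numpy as np

num_layers, batch_size, state_size = 2, 3, 4   # hypothetical sizes


def unpack_state(packed):
    """Split a (num_layers, 2, batch, size) array into a tuple of (c, h) pairs."""
    return tuple((layer[0], layer[1]) for layer in packed)


def pack_state(state_tuple):
    """Inverse of unpack_state: stack the (c, h) pairs back into one array."""
    return np.stack([np.stack([c, h]) for c, h in state_tuple])


def fake_rnn_step(state_tuple):
    """Hypothetical stand-in for running the RNN: returns an updated state."""
    return tuple((c + 1.0, h + 2.0) for c, h in state_tuple)


# Start from zeros, as with init_state in the answer.
state = np.zeros((num_layers, 2, batch_size, state_size), dtype=np.float32)
for _ in range(3):  # three consecutive batches
    state = pack_state(fake_rnn_step(unpack_state(state)))

print(state.shape)
```

Because the packed array keeps the same shape after every step, it can be fed straight back into a placeholder of that shape.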
Tensorflow, best way to save state in RNNs? was actually my original question. The code below is how I use the state tuples.
with tf.variable_scope('decoder') as scope:
    rnn_cell = tf.nn.rnn_cell.MultiRNNCell \
    ([
        tf.nn.rnn_cell.LSTMCell(512, num_proj = 256, state_is_tuple = True),
        tf.nn.rnn_cell.LSTMCell(512, num_proj = WORD_VEC_SIZE, state_is_tuple = True)
    ], state_is_tuple = True)
    state = [[tf.zeros((BATCH_SIZE, sz)) for sz in sz_outer] for sz_outer in rnn_cell.state_size]
    for t in range(TIME_STEPS):
        if t:
            last = y_[t - 1] if TRAINING else y[t - 1]
        else:
            last = tf.zeros((BATCH_SIZE, WORD_VEC_SIZE))
        y[t] = tf.concat(1, (y[t], last))
        y[t], state = rnn_cell(y[t], state)
        scope.reuse_variables()
Rather than using tf.nn.rnn_cell.LSTMStateTuple I just create a list of lists, which works fine. In this example I am not saving the state. However, you could easily make the state out of variables and use assign to save the values.
I am trying to detect micro-events in a long time series. For this purpose, I will train a LSTM network.
Data. Input for each time sample is 11 different features somewhat normalized to fit 0-1. Output will be either one of two classes.
Batching. Due to the huge class imbalance, I have extracted the data in batches of 60 time samples each, of which at least 5 will always be class 1 and the rest class 2. In this way the class imbalance is reduced from 150:1 to around 12:1. I have then randomized the order of all my batches.
Model. I am attempting to train an LSTM, with initial configuration of 3 different cells with 5 delay steps. I expect the micro events to arrive in sequences of at least 3 time steps.
Problem: When I try to train the network it will quickly converge towards saying that EVERYTHING belongs to the majority class. When I implement a weighted loss function, at some certain threshold it will change to saying that EVERYTHING belongs to the minority class. I suspect (without being expert) that there is no learning in my LSTM cells, or that my configuration is off?
Below is the code for my implementation. I am hoping that someone can tell me
Is my implementation correct?
What other reasons could there be for such behaviour?
ar_model.py
import numpy as np
import tensorflow as tf
from tensorflow.models.rnn import rnn
import ar_config
config = ar_config.get_config()
class ARModel(object):
    def __init__(self, is_training=False, config=None):
        # Config
        if config is None:
            config = ar_config.get_config()
        # Placeholders
        self._features = tf.placeholder(tf.float32, [None, config.num_features], name='ModelInput')
        self._targets = tf.placeholder(tf.float32, [None, config.num_classes], name='ModelOutput')
        # Hidden layer
        with tf.variable_scope('lstm') as scope:
            lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(config.num_hidden, forget_bias=0.0)
            cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * config.num_delays)
            self._initial_state = cell.zero_state(config.batch_size, dtype=tf.float32)
            outputs, state = rnn.rnn(cell, [self._features], dtype=tf.float32)
        # Output layer
        output = outputs[-1]
        softmax_w = tf.get_variable('softmax_w', [config.num_hidden, config.num_classes], tf.float32)
        softmax_b = tf.get_variable('softmax_b', [config.num_classes], tf.float32)
        logits = tf.matmul(output, softmax_w) + softmax_b
        # Evaluate
        ratio = (60.00 / 5.00)
        class_weights = tf.constant([ratio, 1 - ratio])
        weighted_logits = tf.mul(logits, class_weights)
        loss = tf.nn.softmax_cross_entropy_with_logits(weighted_logits, self._targets)
        self._cost = cost = tf.reduce_mean(loss)
        self._predict = tf.argmax(tf.nn.softmax(logits), 1)
        self._correct = tf.equal(tf.argmax(logits, 1), tf.argmax(self._targets, 1))
        self._accuracy = tf.reduce_mean(tf.cast(self._correct, tf.float32))
        self._final_state = state
        if not is_training:
            return
        # Optimize
        optimizer = tf.train.AdamOptimizer()
        self._train_op = optimizer.minimize(cost)
    @property
    def features(self):
        return self._features
    @property
    def targets(self):
        return self._targets
    @property
    def cost(self):
        return self._cost
    @property
    def accuracy(self):
        return self._accuracy
    @property
    def train_op(self):
        return self._train_op
    @property
    def predict(self):
        return self._predict
    @property
    def initial_state(self):
        return self._initial_state
    @property
    def final_state(self):
        return self._final_state
ar_train.py
import os
from datetime import datetime
import numpy as np
import tensorflow as tf
from tensorflow.python.platform import gfile
import ar_network
import ar_config
import ar_reader
config = ar_config.get_config()
def main(argv=None):
    if gfile.Exists(config.train_dir):
        gfile.DeleteRecursively(config.train_dir)
    gfile.MakeDirs(config.train_dir)
    train()
def train():
    train_data = ar_reader.ArousalData(config.train_data, num_steps=config.max_steps)
    test_data = ar_reader.ArousalData(config.test_data, num_steps=config.max_steps)
    with tf.Graph().as_default(), tf.Session() as session, tf.device('/cpu:0'):
        initializer = tf.random_uniform_initializer(minval=-0.1, maxval=0.1)
        with tf.variable_scope('model', reuse=False, initializer=initializer):
            m = ar_network.ARModel(is_training=True)
        s = tf.train.Saver(tf.all_variables())
        tf.initialize_all_variables().run()
        for batch_input, batch_target in train_data:
            step = train_data.iter_steps
            dict = {
                m.features: batch_input,
                m.targets: batch_target
            }
            session.run(m.train_op, feed_dict=dict)
            state, cost, accuracy = session.run([m.final_state, m.cost, m.accuracy], feed_dict=dict)
            if not step % 10:
                test_input, test_target = test_data.next()
                test_accuracy = session.run(m.accuracy, feed_dict={
                    m.features: test_input,
                    m.targets: test_target
                })
                now = datetime.now().time()
                print ('%s | Iter %4d | Loss= %.5f | Train= %.5f | Test= %.3f' % (now, step, cost, accuracy, test_accuracy))
            if not step % 1000:
                destination = os.path.join(config.train_dir, 'ar_model.ckpt')
                s.save(session, destination)
if __name__ == '__main__':
    tf.app.run()
ar_config.py
class Config(object):
    # Directories
    train_dir = '...'
    ckpt_dir = '...'
    train_data = '...'
    test_data = '...'
    # Data
    num_features = 13
    num_classes = 2
    batch_size = 60
    # Model
    num_hidden = 3
    num_delays = 5
    # Training
    max_steps = 100000
def get_config():
    return Config()
UPDATED ARCHITECTURE:
# Placeholders
self._features = tf.placeholder(tf.float32, [None, config.num_features, config.num_delays], name='ModelInput')
self._targets = tf.placeholder(tf.float32, [None, config.num_output], name='ModelOutput')
# Weights
weights = {
    'hidden': tf.get_variable('w_hidden', [config.num_features, config.num_hidden], tf.float32),
    'out': tf.get_variable('w_out', [config.num_hidden, config.num_classes], tf.float32)
}
biases = {
    'hidden': tf.get_variable('b_hidden', [config.num_hidden], tf.float32),
    'out': tf.get_variable('b_out', [config.num_classes], tf.float32)
}
#Layer in
with tf.variable_scope('input_hidden') as scope:
    inputs = self._features
    inputs = tf.transpose(inputs, perm=[2, 0, 1]) # (BatchSize,NumFeatures,TimeSteps) -> (TimeSteps,BatchSize,NumFeatures)
    inputs = tf.reshape(inputs, shape=[-1, config.num_features]) # (TimeSteps,BatchSize,NumFeatures) -> (TimeSteps*BatchSize,NumFeatures)
    inputs = tf.add(tf.matmul(inputs, weights['hidden']), biases['hidden'])
#Layer hidden
with tf.variable_scope('hidden_hidden') as scope:
    inputs = tf.split(0, config.num_delays, inputs) # -> n_steps * (batchsize, features)
    cell = tf.nn.rnn_cell.BasicLSTMCell(config.num_hidden, forget_bias=0.0)
    self._initial_state = cell.zero_state(config.batch_size, dtype=tf.float32)
    outputs, state = rnn.rnn(cell, inputs, dtype=tf.float32)
#Layer out
with tf.variable_scope('hidden_output') as scope:
    output = outputs[-1]
    logits = tf.add(tf.matmul(output, weights['out']), biases['out'])
Odd elements
Weighted loss
I am not sure your "weighted loss" does what you want it to do:
ratio = (60.00 / 5.00)
class_weights = tf.constant([ratio, 1 - ratio])
weighted_logits = tf.mul(logits, class_weights)
This is applied before calculating the loss function, so it forces your predictions to behave in a certain way before the softmax is applied. (Also, I think you wanted an element-wise multiplication? And your ratio is above 1, which makes the second weight negative.)
If you want a weighted loss, you should apply the weights after
loss = tf.nn.softmax_cross_entropy_with_logits(weighted_logits, self._targets)
with some element-wise multiplication of your weights.
loss = loss * weights
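Here your class weights have a shape like [2,]. Because loss has one entry per example, each example's weight has to be picked from its target class before multiplying.
However, I would not recommend using weighted losses. Perhaps try increasing the ratio even further than 1:6.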
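As a rough illustration of weighting after the cross-entropy rather than before it, here is a minimal NumPy sketch. It is not the TensorFlow code from the question: the logits, targets and class_weights values are hypothetical, and softmax_cross_entropy is a hand-written helper that returns one loss value per example for one-hot targets.

```python
import numpy as np


def softmax_cross_entropy(logits, targets):
    """Per-example cross-entropy for one-hot targets; returns shape (batch,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(targets * log_probs).sum(axis=1)


logits = np.array([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
targets = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])   # one-hot labels
class_weights = np.array([1.0, 12.0])                       # hypothetical: upweight class 1

loss = softmax_cross_entropy(logits, targets)               # unweighted, shape (3,)
# Pick each example's weight from its target class: one weight per example.
example_weights = (targets * class_weights).sum(axis=1)
weighted_loss = (loss * example_weights).mean()
print(loss.shape, example_weights)
```

The logits are left untouched, so the predictions themselves are not distorted; only the contribution of each example to the averaged loss changes.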
Architecture
As far as I can read, you are using 5 stacked LSTMs with 3 hidden units per layer?
Try removing the multi rnn and just use a single LSTM/GRU (maybe even just a vanilla RNN) and jack the hidden units up to ~100-1000.
Debugging
Often when you are facing problems with an oddly behaving network, it can be a good idea to:
Print everything
Literally print the shapes and values of every tensor in your model, use sess to fetch it and then print it. Your input data, the first hidden representation, your predictions, your losses etc.
You can also use TensorFlow's tf.Print():
x_tensor = tf.Print(x_tensor, [tf.shape(x_tensor)])
Use tensorboard
Using tensorboard summaries on your gradients, accuracy metrics and histograms will reveal patterns in your data that might explain certain behavior, such as what led to exploding weights. For example, maybe your forget bias goes to infinity, or you're not tracking the gradient through a certain layer.
Other questions
How large is your dataset?
How long are your sequences?
Are the 13 features categorical or continuous? You should not normalize categorical variables or represent them as integers, instead you should use one-hot encoding.
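For the last point, a minimal sketch of one-hot encoding with NumPy is given below. The one_hot helper and the example category ids are hypothetical and only illustrate the idea: each integer category becomes a row with a single 1 instead of a scaled number.

```python
import numpy as np


def one_hot(labels, num_categories):
    """Map integer category ids of shape (n,) to one-hot rows of shape (n, num_categories)."""
    encoded = np.zeros((len(labels), num_categories), dtype=np.float32)
    # Set exactly one entry per row: the column given by that row's category id.
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded


category_ids = np.array([0, 2, 1, 2])   # hypothetical categorical feature with 3 values
encoded = one_hot(category_ids, 3)
print(encoded)
```

A categorical feature encoded this way would replace its single column with num_categories columns in the input, so the number of input features grows accordingly.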
Gunnar has already made lots of good suggestions. A few more small things worth paying attention to in general for this sort of architecture:
Try tweaking the Adam learning rate. You should determine the proper learning rate by cross-validation; as a rough start, you could just check whether a smaller learning rate saves your model from crashing on the training data.
You should definitely use more hidden units. It's cheap to try larger networks when you first start out on a dataset. Go as large as necessary to avoid the underfitting you've observed. Later you can regularize / pare down the network after you get it to learn something useful.
Concretely, how long are the sequences you are passing into the network? You say you have a 30k-long time sequence. I assume you are passing in subsections / samples of this sequence?