How to get grads in pytorch after matrix multiplication?

I want to compute a matrix multiplication in the latent space and have the optimizer update the weight matrix. I have tried several ways to do that, but the value of 'pi_' in the code below never changes. What should I do?
I've tried different ways to get the product, like torch.mm(), torch.matmul() and the @ operator. The weight matrix 'pi_' never changed.
import torch
from torch.utils.data import DataLoader
from torch.utils.data import TensorDataset
#from torchvision import transforms
from torchvision.datasets import MNIST

def get_mnist(data_dir='./data/mnist/', batch_size=128):
    train = MNIST(root=data_dir, train=True, download=True)
    test = MNIST(root=data_dir, train=False, download=True)
    X = torch.cat([train.data.float().view(-1, 784) / 255., test.data.float().view(-1, 784) / 255.], 0)
    Y = torch.cat([train.targets, test.targets], 0)
    dataset = dict()
    dataset['X'] = X
    dataset['Y'] = Y
    dataloader = DataLoader(TensorDataset(X, Y), batch_size=batch_size, shuffle=True)
    return dataloader

class tests(torch.nn.Module):
    def __init__(self):
        super(tests, self).__init__()
        self.pi_ = torch.nn.Parameter(torch.FloatTensor(10, 1).fill_(1), requires_grad=True)
        self.linear0 = torch.nn.Linear(784, 10)
        self.linear1 = torch.nn.Linear(1, 784)

    def forward(self, data):
        data = torch.nn.functional.relu(self.linear0(data))
        # data = data.mm(self.pi_)
        # data = torch.mm(data, self.pi_)
        # data = data @ self.pi_
        data = torch.matmul(data, self.pi_)
        data = torch.nn.functional.relu(self.linear1(data))
        return data

if __name__ == '__main__':
    DL = get_mnist()
    t = tests().cuda()
    optimizer = torch.optim.Adam(t.parameters(), lr=2e-3)
    for i in range(100):
        for inputs, classes in DL:
            inputs = inputs.cuda()
            res = t(inputs)
            loss = torch.nn.functional.mse_loss(res, inputs)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print("Epoch:", i, "pi:", t.pi_)

TL;DR
You have too many parameters in your neural network; some of them become useless and therefore are no longer being updated. Change your network architecture to remove the redundant parameters.
Full explanation:
The weight matrix pi_ does change. You initialize pi_ as all ones; after the first epoch, the weight matrix pi_ becomes
output >>>
tensor([[0.9879],
[0.9874],
[0.9878],
[0.9880],
[0.9876],
[0.9878],
[0.9878],
[0.9873],
[0.9877],
[0.9871]], device='cuda:0', requires_grad=True)
So, it has changed, just not by much. The true reason behind this involves some mathematics, but to put it non-mathematically: this layer contributes very little to the loss, so the network barely updates it, i.e. the existence of pi_ in this network is redundant.
If you want to observe pi_ changing, you should modify the neural network so that pi_ is not redundant anymore.
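If you want to check this yourself, you can compare how much gradient reaches pi_ with how much reaches the other layers after a single backward pass. A quick diagnostic sketch reusing the model and data loader defined above:
# Gradient check (sketch): one batch, one backward pass, then inspect .grad
t = tests().cuda()
inputs, _ = next(iter(get_mnist()))
inputs = inputs.cuda()
loss = torch.nn.functional.mse_loss(t(inputs), inputs)
loss.backward()
print("pi_ grad magnitude:    ", t.pi_.grad.abs().mean().item())
print("linear0 grad magnitude:", t.linear0.weight.grad.abs().mean().item())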
One possible modification is to change your reconstruction problem into a classification problem:
import torch
from torch.utils.data import DataLoader
from torch.utils.data import TensorDataset
#from torchvision import transforms
from torchvision.datasets import MNIST

def get_mnist(data_dir='./data/mnist/', batch_size=128):
    train = MNIST(root=data_dir, train=True, download=True)
    test = MNIST(root=data_dir, train=False, download=True)
    X = torch.cat([train.data.float().view(-1, 784) / 255., test.data.float().view(-1, 784) / 255.], 0)
    Y = torch.cat([train.targets, test.targets], 0)
    dataset = dict()
    dataset['X'] = X
    dataset['Y'] = Y
    dataloader = DataLoader(TensorDataset(X, Y), batch_size=batch_size, shuffle=True)
    return dataloader

class tests(torch.nn.Module):
    def __init__(self):
        super(tests, self).__init__()
        # self.pi_ = torch.nn.Parameter(torch.randn((10, 1), requires_grad=True))
        self.pi_ = torch.nn.Parameter(torch.FloatTensor(10, 1).fill_(1), requires_grad=True)
        self.linear0 = torch.nn.Linear(784, 10)
        # self.linear1 = torch.nn.Linear(1, 784)

    def forward(self, data):
        data = torch.nn.functional.relu(self.linear0(data))
        # data = data.mm(self.pi_)
        # data = torch.mm(data, self.pi_)
        # data = data @ self.pi_
        data = torch.matmul(data, self.pi_)
        # data = torch.nn.functional.relu(self.linear1(data))
        return data

if __name__ == '__main__':
    DL = get_mnist()
    t = tests().cuda()
    optimizer = torch.optim.Adam(t.parameters(), lr=2e-3)
    for i in range(100):
        for inputs, classes in DL:
            inputs = inputs.cuda()
            classes = classes.cuda().float()
            output = t(inputs)
            loss = torch.nn.functional.mse_loss(output.view(-1), classes)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # print("Epoch:", i, "pi_grad", t.pi_.grad)
        print("Epoch:", i, "pi:", t.pi_)
Now, pi_ changes every single epoch.
output >>>
Epoch: 0 pi: Parameter containing:
tensor([[1.3429],
[1.0644],
[0.9817],
[0.9767],
[0.9715],
[1.1110],
[1.1139],
[0.9759],
[1.2424],
[1.2632]], device='cuda:0', requires_grad=True)
Epoch: 1 pi: Parameter containing:
tensor([[1.4413],
[1.1977],
[0.9588],
[1.0325],
[0.9241],
[1.1988],
[1.1690],
[0.9248],
[1.2892],
[1.3427]], device='cuda:0', requires_grad=True)
Epoch: 2 pi: Parameter containing:
tensor([[1.4653],
[1.2351],
[0.9539],
[1.1588],
[0.8670],
[1.2739],
[1.2058],
[0.8648],
[1.2848],
[1.3891]], device='cuda:0', requires_grad=True)
Epoch: 3 pi: Parameter containing:
tensor([[1.4375],
[1.2256],
[0.9580],
[1.2293],
[0.8174],
[1.3471],
[1.2035],
[0.8102],
[1.2505],
[1.4201]], device='cuda:0', requires_grad=True)

Related

Simple Neural Network in Pytorch with 3 inputs (Numerical Values)

I'm having a hard time setting up a neural network; most of the examples are about images. My problem has 3 inputs, each of size N x M, where N are the samples and M are the features. I have a separate file (CSV) with a 1 x N binary target (0,1).
The network I'm trying to configure should have two hidden layers with 100 and 50 neurons, respectively, sigmoid activation functions, and cross-entropy to check performance. The result should just be a single probability output.
Please help?
EDIT:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.autograd as autograd
import torch.nn.functional as F
#from torch.autograd import Variable
import pandas as pd

# Import Data
Input1 = pd.read_csv(r'...')
Input2 = pd.read_csv(r'...')
Input3 = pd.read_csv(r'...')
Target = pd.read_csv(r'...')

# Convert to Tensor
Input1_tensor = torch.tensor(Input1.to_numpy()).float()
Input2_tensor = torch.tensor(Input2.to_numpy()).float()
Input3_tensor = torch.tensor(Input3.to_numpy()).float()
Target_tensor = torch.tensor(Target.to_numpy()).float()

# Transpose to have signal as columns instead of rows
input1 = Input1_tensor
input2 = Input2_tensor
input3 = Input3_tensor
y = Target_tensor

# Define the model
class Net(nn.Module):
    def __init__(self, num_inputs, hidden1_size, hidden2_size, num_classes):
        # Initialize super class
        super(Net, self).__init__()
        #self.criterion = nn.CrossEntropyLoss()
        # Add hidden layer
        self.layer1 = nn.Linear(num_inputs, hidden1_size)
        # Activation
        self.sigmoid = torch.nn.Sigmoid()
        # Add output layer
        self.layer2 = nn.Linear(hidden1_size, hidden2_size)
        # Activation
        self.sigmoid2 = torch.nn.Sigmoid()
        self.layer3 = nn.Linear(hidden2_size, num_classes)

    def forward(self, x1, x2, x3):
        # implement the forward pass
        in1 = self.layer1(x1)
        in2 = self.layer1(x2)
        in3 = self.layer1(x3)
        xyz = torch.cat((in1, in2, in3), 1)
        return xyz

# Define loss function
loss_function = nn.CrossEntropyLoss()
# Define optimizer
optimizer = optim.SGD(model.parameters(), lr=1e-4)

for t in range(num_epochs):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(input1, input2, input3)
    # Compute and print loss
    loss = loss_function(y_pred, y)
    print(t, loss.item())
    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    # Calculate gradient using backward pass
    loss.backward()
    # Update model parameters (weights)
    optimizer.step()
Here I am getting the error
"RuntimeError: 0D or 1D target tensor expected, multi-target not supported"
for the line loss = loss_function(y_pred, y),
where y_pred is [20000, 375] and y is [20000, 1].
You can refer to PyTorch, a Python library for deep learning and neural networks, and use code along the lines of the network definition below:
from torch import nn
import torch.nn.functional as F

class network(nn.Module):
    def __init__(self, M):
        # M is the dimension of the input features
        super(network, self).__init__()
        self.layer1 = nn.Linear(M, 100)
        self.layer2 = nn.Linear(100, 50)
        self.out = nn.Linear(50, 1)

    def forward(self, x):
        # sigmoid activations on the hidden layers, single probability output
        x = F.sigmoid(self.layer1(x))
        x = F.sigmoid(self.layer2(x))
        return F.sigmoid(self.out(x))
You can then refer to the PyTorch documentation and finish the rest of the training code.
Edit:
As for the RuntimeError, you can squeeze the target tensor with y.squeeze(). This removes the redundant dimension from your tensor, e.g. [20000, 1] -> [20000].
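For completeness, here is one way the rest of the training code could look (a rough sketch, not a complete solution: it assumes you simply concatenate the three [N, M] inputs along the feature dimension and squeeze the binary targets to shape [N]):
import torch
from torch import nn

# Hypothetical continuation: concatenate the three inputs feature-wise.
X = torch.cat((input1, input2, input3), dim=1)
y = y.squeeze()                                # [N, 1] -> [N]

model = network(X.shape[1])                    # network defined above
criterion = nn.BCELoss()                       # the model already ends in a sigmoid
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

for epoch in range(100):
    optimizer.zero_grad()
    probs = model(X).squeeze()                 # [N, 1] -> [N]
    loss = criterion(probs, y)
    loss.backward()
    optimizer.step()
    print(epoch, loss.item())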

How do I retrieve the weights estimated from a neural net using skorch?

I've trained a simple neural net using skorch to make it sklearn compatible and I would like to know how to retrieve the actual estimated weights.
Here's a replicable example of what I need.
The neural net presented here uses 10 features, has one hidden layer of 2 nodes, uses ReLU activation functions and linearly combines the output of the 2 nodes.
import torch
import numpy as np
from torch.autograd import Variable

# Create example data
np.random.seed(2022)
train_size = 1000
n_features = 10
X_train = np.random.rand(n_features, train_size).astype("float32")
l2_params_1 = np.random.rand(1, n_features).astype("float32")
l2_params_2 = np.random.rand(1, n_features).astype("float32")
l1_X = np.matmul(l2_params_1, X_train)
l2_X = np.matmul(l2_params_2, X_train)
y_train = l1_X + l2_X

# Defining my NN
class NNModule(torch.nn.Module):
    def __init__(self, in_features):
        super(NNModule, self).__init__()
        self.l1 = torch.nn.Linear(in_features, 2)
        self.a1 = torch.nn.ReLU()
        self.l2 = torch.nn.Linear(2, 1)

    def forward(self, x):
        x = self.l1(x)
        x = self.a1(x)
        return self.l2(x)

# Initialize the NN
torch.manual_seed(200)
model = NNModule(in_features=10)
model.l1.weight.data.uniform_(0.0, 1.0)
model.l1.bias.data.uniform_(0.0, 1.0)

# Define criterion and optimizer
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Train the NN
torch.manual_seed(200)
for epoch in range(100):
    inputs = Variable(torch.from_numpy(np.transpose(X_train)))
    labels = Variable(torch.from_numpy(np.transpose(y_train)))
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
The parameters at which I'm arriving are the following:
list(model.parameters())
[Output]:
[Parameter containing:
tensor([[0.8997, 0.8345, 0.8284, 0.6950, 0.5949, 0.1217, 0.9067, 0.1824, 0.8272,
0.2372],
[0.7525, 0.6577, 0.4358, 0.6109, 0.8817, 0.5429, 0.5263, 0.7531, 0.1552,
0.7066]], requires_grad=True),
Parameter containing:
tensor([0.6617, 0.1079], requires_grad=True),
Parameter containing:
tensor([[0.9225, 0.8339]], requires_grad=True),
Parameter containing:
tensor([0.0786], requires_grad=True)]
Now, to wrap my NNModule with skorch, I'm using this:
from skorch import NeuralNetRegressor

torch.manual_seed(200)
net = NeuralNetRegressor(
    module=NNModule(in_features=10),
    criterion=torch.nn.MSELoss,
    optimizer=torch.optim.SGD,
    optimizer__lr=0.01,
    max_epochs=100,
    verbose=0
)
net.fit(np.transpose(X_train), np.transpose(y_train))
And I'd like to retrieve the weights obtained in the training. I've used dir(net) to see if the weights are stored in any attributes to no avail.
To retrieve the weights you need to read them off the wrapped module, like this:
list(net.module.parameters())
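If you want the weights as plain NumPy arrays (for example to compare them with the parameters listed earlier), a small sketch like this works; it reads them off net.module_, the attribute skorch uses for the fitted module:
# Sketch: collect every parameter tensor as a plain numpy array, keyed by name.
params = {name: p.detach().cpu().numpy()
          for name, p in net.module_.named_parameters()}
for name, arr in params.items():
    print(name, arr.shape)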

Access lower dimensional encoded data of autoencoder

Here is an autoencoder I'm working on, from this tutorial: https://debuggercafe.com/implementing-deep-autoencoder-in-pytorch/
I'm just learning about autoencoders and I've modified the source to encode a custom small dataset which consists of:
[0,1,0,1,0,1,0,1,0],[0,1,1,0,0,1,0,1,0],[0,1,1,0,0,1,0,1,0],[0,1,1,0,0,1,0,1,0]
It seems to work ok, but I’m unsure how to access the lower dimensional embedding values of dimension 2 (set by parameter out_features).
I've added a method to the Autoencoder class to return the embedding. Is this the recommended way of accessing the embeddings?
Code:
import os
import warnings

import numpy as np
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable
from torch.utils.data import DataLoader

import torchvision
import torchvision.transforms as transforms
from torchvision import datasets
from torchvision.datasets import MNIST
from torchvision.utils import save_image

# utility functions
def get_device():
    if torch.cuda.is_available():
        device = 'cuda:0'
    else:
        device = 'cpu'
    return device

device = get_device()

features = torch.tensor(np.array([
    [0, 1, 0, 1, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0, 1, 0],
])).float()
tic_tac_toe_data_loader = torch.utils.data.DataLoader(features, batch_size=1, shuffle=True)

class Encoder(nn.Module):
    def __init__(self):
        super(Encoder, self).__init__()
        self.fc1 = nn.Linear(in_features=9, out_features=2)

    def forward(self, x):
        return F.sigmoid(self.fc1(x))

class Decoder(nn.Module):
    def __init__(self):
        super(Decoder, self).__init__()
        self.fc1 = nn.Linear(in_features=2, out_features=9)

    def forward(self, x):
        return F.sigmoid(self.fc1(x))

class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        self.fc1 = Encoder()
        self.fc2 = Decoder()

    def forward(self, x):
        return self.fc2(self.fc1(x))

net = Autoencoder()
net.to(device)

NUM_EPOCHS = 50
LEARNING_RATE = 1e-3
criterion = nn.MSELoss()
optimizer = optim.Adam(net.parameters(), lr=LEARNING_RATE)

# image transformations
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

outputs = None

def train(net, trainloader, NUM_EPOCHS):
    train_loss = []
    for epoch in range(NUM_EPOCHS):
        running_loss = 0.0
        for data in trainloader:
            img = data
            img = img.to(device)
            img = img.view(img.size(0), -1)
            # print('img.shape', img.shape)
            optimizer.zero_grad()
            outputs = net(img)
            loss = criterion(outputs, img)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        loss = running_loss / len(trainloader)
        train_loss.append(loss)
    return train_loss

# train the network
train_loss = train(net, tic_tac_toe_data_loader, NUM_EPOCHS)
I can access the lower dimensional embedding using
print(Encoder().forward( torch.tensor(np.array([0,1,0,1,0,1,0,1,0])).float()))
But is this using the trained weight values for the embedding? If I call Encoder multiple times with the same values:
print(Encoder().forward( torch.tensor(np.array([0,1,0,1,0,1,0,1,0])).float()))
print(Encoder().forward( torch.tensor(np.array([0,1,0,1,0,1,0,1,0])).float()))
different results are returned:
tensor([0.5083, 0.5020], grad_fn=<SigmoidBackward>)
tensor([0.4929, 0.6940], grad_fn=<SigmoidBackward>)
Why is this the case? Is an extra training step being invoked as a result of calling Encoder?
By calling Encoder() you are creating a new instance of the encoder every time, and its weights are randomly initialized each time.
Generally, you make one instance of it, train it, save the weights, and infer on it.
Also, in PyTorch you need not call .forward(); call the instance directly, and forward is invoked implicitly, along with any registered hooks.
enc = Encoder()
input = torch.from_numpy(np.asarray([0, 1, 0, 1, 0, 1, 0, 1, 0])).float()
print(enc(input))
print(enc(input))
The training pass happens when you pass the model instance to the train function; calling Encoder() only creates a new object.
Since each object has its own weights, and the weights are initialized randomly (see Xavier and Kaiming initialization), you get different outputs. Even with a single object, you still have to explicitly train it with the train function.
As the other responder pointed out, when you call Encoder() you generate a new instance with randomly initialized weights. Because you are interested in the lower dimensional embedding produced by your encoder, you need to access the encoder inside your trained net:
trained_encoder = net.fc1
Now that you have your encoder with trained weights, the following lines should produce the same result:
print(trained_encoder.forward( torch.tensor(np.array([0,1,0,1,0,1,0,1,0])).float()))
print(trained_encoder.forward( torch.tensor(np.array([0,1,0,1,0,1,0,1,0])).float()))
As pointed out by others you can further simplify by passing input directly:
test_input = torch.tensor(np.array([0,1,0,1,0,1,0,1,0])).float()
print(trained_encoder(test_input))
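If you want the embeddings for every sample at once, you can run the whole dataset through the trained encoder without tracking gradients (a small sketch reusing the features tensor and net from the code above):
# Sketch: compute the 2-dimensional embeddings for all samples with the trained encoder.
with torch.no_grad():
    embeddings = net.fc1(features.to(device))   # shape [4, 2]
print(embeddings)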

PyTorch naive single label classification with embedding layer fails at random

I am new to PyTorch and I am trying out the Embedding Layer.
I wrote a naive classification task, where all the inputs are equal and all the labels are set to 1.0. I hence expect the model to learn quickly to predict 1.0.
The input is always 0, which is fed into a nn.Embedding(1,32) layer, followed by nn.Linear(32,1) and nn.ReLU().
However, an unexpected and undesired behavior occurs: the training outcome differs between runs of the code.
For example,
setting the random seed to 10, the model converges: the loss decreases and the model always predicts 1.0
setting the random seed to 1111, the model doesn't converge: the loss doesn't decrease and the model always predicts 0.5. In those cases the parameters are not updated
Here is the minimal, replicable code:
from torch.nn import BCEWithLogitsLoss
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.autograd import Variable
from torch.utils.data import Dataset
import torch

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.vgg_fc = nn.Linear(32, 1)
        self.relu = nn.ReLU()
        self.embeddings = nn.Embedding(1, 32)

    def forward(self, data):
        emb = self.embeddings(data['index'])
        return self.relu(self.vgg_fc(emb))

class MyDataset(Dataset):
    def __init__(self):
        pass

    def __len__(self):
        return 1000

    def __getitem__(self, idx):
        return {'label': 1.0, 'index': 0}

def train():
    model = MyModel()
    db = MyDataset()
    dataloader = DataLoader(db, batch_size=256, shuffle=True, num_workers=16)
    loss_function = BCEWithLogitsLoss()
    optimizer_rel = optim.SGD(model.parameters(), lr=0.1)
    for epoch in range(50):
        for i_batch, sample_batched in enumerate(dataloader):
            model.zero_grad()
            out = model({'index': Variable(sample_batched['index'])})
            labels = Variable(sample_batched['label'].type(torch.FloatTensor).view(sample_batched['label'].shape[0], 1))
            loss = loss_function(out, labels)
            loss.backward()
            optimizer_rel.step()
            print('Epoch:', epoch, 'batch', i_batch, 'Tr_Loss:', loss.item())
    return model

if __name__ == '__main__':
    # please, try seed 10 (converges) and seed 1111 (fails)
    torch.manual_seed(10)
    train()
Without specifying the random seed, different runs have different outcomes.
Why is the model, in those cases, unable to learn such an easy task?
Is there any mistake in the way I use nn.Embedding layer?
Thank you
I found that the problem was the final ReLU layer before the sigmoid.
As stated here, that layer will:
throw away information without adding any additional benefit
After removing that layer, the network learned as expected with any seed.
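Concretely, the fix is to drop the final ReLU and return the raw logit, since BCEWithLogitsLoss applies the sigmoid internally (a minimal sketch of the corrected module):
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.vgg_fc = nn.Linear(32, 1)
        self.embeddings = nn.Embedding(1, 32)

    def forward(self, data):
        emb = self.embeddings(data['index'])
        # No ReLU here: a ReLU in front of BCEWithLogitsLoss clamps every
        # negative logit to 0 and blocks the gradient there.
        return self.vgg_fc(emb)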

Convergence of LSTM network using Tensorflow

I am trying to detect micro-events in a long time series. For this purpose, I will train a LSTM network.
Data. Input for each time sample is 11 different features somewhat normalized to fit 0-1. Output will be either one of two classes.
Batching. Due to huge class imbalance I have extracted the data in batches of 60 time samples each, of which at least 5 will always be class 1, and the rest class 0. In this way the class imbalance is reduced from 150:1 to around 12:1. I have then randomized the order of all my batches.
Model. I am attempting to train an LSTM, with initial configuration of 3 different cells with 5 delay steps. I expect the micro events to arrive in sequences of at least 3 time steps.
Problem: When I try to train the network it will quickly converge towards saying that EVERYTHING belongs to the majority class. When I implement a weighted loss function, at some certain threshold it will change to saying that EVERYTHING belongs to the minority class. I suspect (without being an expert) that there is no learning in my LSTM cells, or that my configuration is off?
Below is the code for my implementation. I am hoping that someone can tell me
Is my implementation correct?
What other reasons could there be for such behaviour?
ar_model.py
import numpy as np
import tensorflow as tf
from tensorflow.models.rnn import rnn
import ar_config

config = ar_config.get_config()

class ARModel(object):

    def __init__(self, is_training=False, config=None):
        # Config
        if config is None:
            config = ar_config.get_config()

        # Placeholders
        self._features = tf.placeholder(tf.float32, [None, config.num_features], name='ModelInput')
        self._targets = tf.placeholder(tf.float32, [None, config.num_classes], name='ModelOutput')

        # Hidden layer
        with tf.variable_scope('lstm') as scope:
            lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(config.num_hidden, forget_bias=0.0)
            cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * config.num_delays)
            self._initial_state = cell.zero_state(config.batch_size, dtype=tf.float32)
            outputs, state = rnn.rnn(cell, [self._features], dtype=tf.float32)

        # Output layer
        output = outputs[-1]
        softmax_w = tf.get_variable('softmax_w', [config.num_hidden, config.num_classes], tf.float32)
        softmax_b = tf.get_variable('softmax_b', [config.num_classes], tf.float32)
        logits = tf.matmul(output, softmax_w) + softmax_b

        # Evaluate
        ratio = (60.00 / 5.00)
        class_weights = tf.constant([ratio, 1 - ratio])
        weighted_logits = tf.mul(logits, class_weights)
        loss = tf.nn.softmax_cross_entropy_with_logits(weighted_logits, self._targets)
        self._cost = cost = tf.reduce_mean(loss)
        self._predict = tf.argmax(tf.nn.softmax(logits), 1)
        self._correct = tf.equal(tf.argmax(logits, 1), tf.argmax(self._targets, 1))
        self._accuracy = tf.reduce_mean(tf.cast(self._correct, tf.float32))
        self._final_state = state

        if not is_training:
            return

        # Optimize
        optimizer = tf.train.AdamOptimizer()
        self._train_op = optimizer.minimize(cost)

    @property
    def features(self):
        return self._features

    @property
    def targets(self):
        return self._targets

    @property
    def cost(self):
        return self._cost

    @property
    def accuracy(self):
        return self._accuracy

    @property
    def train_op(self):
        return self._train_op

    @property
    def predict(self):
        return self._predict

    @property
    def initial_state(self):
        return self._initial_state

    @property
    def final_state(self):
        return self._final_state
ar_train.py
import os
from datetime import datetime

import numpy as np
import tensorflow as tf
from tensorflow.python.platform import gfile
import ar_network
import ar_config
import ar_reader

config = ar_config.get_config()

def main(argv=None):
    if gfile.Exists(config.train_dir):
        gfile.DeleteRecursively(config.train_dir)
    gfile.MakeDirs(config.train_dir)
    train()

def train():
    train_data = ar_reader.ArousalData(config.train_data, num_steps=config.max_steps)
    test_data = ar_reader.ArousalData(config.test_data, num_steps=config.max_steps)

    with tf.Graph().as_default(), tf.Session() as session, tf.device('/cpu:0'):
        initializer = tf.random_uniform_initializer(minval=-0.1, maxval=0.1)
        with tf.variable_scope('model', reuse=False, initializer=initializer):
            m = ar_network.ARModel(is_training=True)
            s = tf.train.Saver(tf.all_variables())
        tf.initialize_all_variables().run()

        for batch_input, batch_target in train_data:
            step = train_data.iter_steps
            dict = {
                m.features: batch_input,
                m.targets: batch_target
            }
            session.run(m.train_op, feed_dict=dict)
            state, cost, accuracy = session.run([m.final_state, m.cost, m.accuracy], feed_dict=dict)

            if not step % 10:
                test_input, test_target = test_data.next()
                test_accuracy = session.run(m.accuracy, feed_dict={
                    m.features: test_input,
                    m.targets: test_target
                })
                now = datetime.now().time()
                print('%s | Iter %4d | Loss= %.5f | Train= %.5f | Test= %.3f' % (now, step, cost, accuracy, test_accuracy))

            if not step % 1000:
                destination = os.path.join(config.train_dir, 'ar_model.ckpt')
                s.save(session, destination)

if __name__ == '__main__':
    tf.app.run()
ar_config.py
class Config(object):
    # Directories
    train_dir = '...'
    ckpt_dir = '...'
    train_data = '...'
    test_data = '...'
    # Data
    num_features = 13
    num_classes = 2
    batch_size = 60
    # Model
    num_hidden = 3
    num_delays = 5
    # Training
    max_steps = 100000

def get_config():
    return Config()
UPDATED ARCHITECTURE:
# Placeholders
self._features = tf.placeholder(tf.float32, [None, config.num_features, config.num_delays], name='ModelInput')
self._targets = tf.placeholder(tf.float32, [None, config.num_output], name='ModelOutput')

# Weights
weights = {
    'hidden': tf.get_variable('w_hidden', [config.num_features, config.num_hidden], tf.float32),
    'out': tf.get_variable('w_out', [config.num_hidden, config.num_classes], tf.float32)
}
biases = {
    'hidden': tf.get_variable('b_hidden', [config.num_hidden], tf.float32),
    'out': tf.get_variable('b_out', [config.num_classes], tf.float32)
}

# Layer in
with tf.variable_scope('input_hidden') as scope:
    inputs = self._features
    inputs = tf.transpose(inputs, perm=[2, 0, 1])  # (BatchSize,NumFeatures,TimeSteps) -> (TimeSteps,BatchSize,NumFeatures)
    inputs = tf.reshape(inputs, shape=[-1, config.num_features])  # (TimeSteps,BatchSize,NumFeatures) -> (TimeSteps*BatchSize,NumFeatures)
    inputs = tf.add(tf.matmul(inputs, weights['hidden']), biases['hidden'])

# Layer hidden
with tf.variable_scope('hidden_hidden') as scope:
    inputs = tf.split(0, config.num_delays, inputs)  # -> n_steps * (batchsize, features)
    cell = tf.nn.rnn_cell.BasicLSTMCell(config.num_hidden, forget_bias=0.0)
    self._initial_state = cell.zero_state(config.batch_size, dtype=tf.float32)
    outputs, state = rnn.rnn(cell, inputs, dtype=tf.float32)

# Layer out
with tf.variable_scope('hidden_output') as scope:
    output = outputs[-1]
    logits = tf.add(tf.matmul(output, weights['out']), biases['out'])
Odd elements
Weighted loss
I am not sure your "weighted loss" does what you want it to do:
ratio = (60.00 / 5.00)
class_weights = tf.constant([ratio, 1 - ratio])
weighted_logits = tf.mul(logits, class_weights)
This is applied before the loss function is calculated (also, I think you wanted per-example weighting, and your ratio is above 1, which makes the second weight negative), so it forces your logits to behave in a certain way before the softmax is applied.
If you want a weighted loss you should apply the weighting after
loss = tf.nn.softmax_cross_entropy_with_logits(logits, self._targets)
with an element-wise multiplication by your weights:
loss = loss * weights
where your weights come from a class-weight vector of shape [2,] (one weight per class, gathered per example).
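One way to spell that out (a sketch against the TF1-style API used in the question; exact argument names may differ in your TensorFlow version, and the up-weighting factor is just the 12:1 batch ratio mentioned above):
# Sketch: weight each example's cross-entropy by the weight of its true class.
ratio = 60.0 / 5.0
class_weights = tf.constant([1.0, ratio])              # up-weight the minority class (class 1)
example_weights = tf.reduce_sum(class_weights * self._targets, axis=1)   # shape [batch_size]

loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=self._targets)
self._cost = tf.reduce_mean(loss * example_weights)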
However, I would not recommend using weighted losses here. Perhaps try increasing the ratio even further than 1:6 instead.
Architecture
As far as I can read, you are using 5 stacked LSTMs with 3 hidden units per layer?
Try removing the multi rnn and just use a single LSTM/GRU (maybe even just a vanilla RNN) and jack the hidden units up to ~100-1000.
Debugging
Often when you are facing problems with an odd behaving network, it can be a good idea to:
Print everything
Literally print the shapes and values of every tensor in your model, use sess to fetch it and then print it. Your input data, the first hidden representation, your predictions, your losses etc.
You can also use TensorFlow's tf.Print(): x_tensor = tf.Print(x_tensor, [tf.shape(x_tensor)])
Use tensorboard
Using TensorBoard summaries of your gradients, accuracy metrics and histograms will reveal patterns in your data that might explain certain behavior, such as what led to exploding weights, e.g. maybe your forget bias goes to infinity or you're not tracking the gradient through a certain layer.
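For example (a TF1-style sketch, not tied to the exact code above; the summary names and the log directory are placeholders):
# Sketch: add summaries in the model code...
tf.summary.scalar('cost', self._cost)
tf.summary.scalar('accuracy', self._accuracy)
for var in tf.trainable_variables():
    tf.summary.histogram(var.op.name, var)
merged = tf.summary.merge_all()

# ...then, in the training script, create one FileWriter and write out each step:
writer = tf.summary.FileWriter('/tmp/ar_logs', session.graph)
summary, _ = session.run([merged, m.train_op], feed_dict=dict)
writer.add_summary(summary, step)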
Other questions
How large is your dataset?
How long are your sequences?
Are the 13 features categorical or continuous? You should not normalize categorical variables or represent them as integers; instead, you should use one-hot encoding.
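For the one-hot point, a minimal sketch (the feature matrix and the index of the categorical column are hypothetical stand-ins for your own data):
import numpy as np

# Sketch: replace an integer-coded categorical column with one-hot columns.
features = np.random.randint(0, 4, size=(100, 13)).astype(np.float32)  # stand-in data
cat_col = features[:, 3].astype(int)                  # hypothetical categorical column
one_hot = np.eye(cat_col.max() + 1)[cat_col]          # shape [num_samples, num_categories]
features = np.concatenate([np.delete(features, 3, axis=1), one_hot], axis=1)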
Gunnar has already made lots of good suggestions. A few more small things worth paying attention to in general for this sort of architecture:
Try tweaking the Adam learning rate. You should determine the proper learning rate by cross-validation; as a rough start, you could just check whether a smaller learning rate saves your model from crashing on the training data.
You should definitely use more hidden units. It's cheap to try larger networks when you first start out on a dataset. Go as large as necessary to avoid the underfitting you've observed. Later you can regularize / pare down the network after you get it to learn something useful.
Concretely, how long are the sequences you are passing into the network? You say you have a 30k-long time sequence; I assume you are passing in subsections / samples of this sequence?
