I am new to PyTorch and I am trying out the Embedding Layer.
I wrote a naive classification task in which all the inputs are equal and all the labels are set to 1.0, so I expect the model to learn quickly to predict 1.0.
The input is always 0, which is fed into an nn.Embedding(1, 32) layer, followed by nn.Linear(32, 1) and nn.ReLU().
However, unexpected and undesired behavior occurs: the training outcome differs from run to run.
For example:
with random seed 10, the model converges: the loss decreases and the model always predicts 1.0;
with random seed 1111, the model doesn't converge: the loss doesn't decrease, the model always predicts 0.5, and the parameters are never updated.
Here is a minimal, reproducible example:
from torch.nn import BCEWithLogitsLoss
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.autograd import Variable
from torch.utils.data import Dataset
import torch
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.vgg_fc = nn.Linear(32, 1)
        self.relu = nn.ReLU()
        self.embeddings = nn.Embedding(1, 32)

    def forward(self, data):
        emb = self.embeddings(data['index'])
        return self.relu(self.vgg_fc(emb))

class MyDataset(Dataset):
    def __init__(self):
        pass

    def __len__(self):
        return 1000

    def __getitem__(self, idx):
        return {'label': 1.0, 'index': 0}

def train():
    model = MyModel()
    db = MyDataset()
    dataloader = DataLoader(db, batch_size=256, shuffle=True, num_workers=16)
    loss_function = BCEWithLogitsLoss()
    optimizer_rel = optim.SGD(model.parameters(), lr=0.1)
    for epoch in range(50):
        for i_batch, sample_batched in enumerate(dataloader):
            model.zero_grad()
            out = model({'index': Variable(sample_batched['index'])})
            labels = Variable(sample_batched['label'].type(torch.FloatTensor).view(sample_batched['label'].shape[0], 1))
            loss = loss_function(out, labels)
            loss.backward()
            optimizer_rel.step()
            print('Epoch:', epoch, 'batch', i_batch, 'Tr_Loss:', loss.item())
    return model

if __name__ == '__main__':
    # please try seed 10 (converges) and seed 1111 (fails)
    torch.manual_seed(10)
    train()
Without specifying the random seed, different runs have different outcomes.
Why is the model, in those cases, unable to learn such an easy task?
Is there any mistake in the way I use the nn.Embedding layer?
Thank you.
I found that the problem was the final ReLU layer before the sigmoid.
As stated here, that layer will:
throw away information without adding any additional benefit
(If the initial logit is negative, the ReLU clamps it to 0, sigmoid(0) = 0.5, and the gradient through the ReLU is zero, so the parameters never update.)
After removing the layer, the network learned as expected with any seed.
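For reference, a minimal sketch of the corrected model (my assumption of the final form: the ReLU is dropped so the linear layer's raw logit goes straight to BCEWithLogitsLoss, which applies the sigmoid internally):
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.vgg_fc = nn.Linear(32, 1)
        self.embeddings = nn.Embedding(1, 32)

    def forward(self, data):
        emb = self.embeddings(data['index'])
        # return the raw logit; BCEWithLogitsLoss applies the sigmoid itself
        return self.vgg_fc(emb)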
I am creating a GRU to predict whether data derived from traffic packets from a device is safe or anomalous. I plan to do this by training a model only on safe/normal operating data and then having it classify new, unseen traffic (testing). I want to train on only the safe data (one class) because an attack could take many forms, and I don't want to train the model on labeled attack data and then have it miss an attack type I didn't train it on (essentially, I want to overfit on the normal operating data). The model therefore needs to check whether incoming unlabeled data matches the one class it has already trained on (i.e. whether the traffic matches the device's normal operation) or is anomalous.
The issue I am having is that, because the model is trained on only one class, it has trouble differentiating the anomalous unseen data from normal data and considers virtually all the data it sees normal (the same as the class it trained on).
I would appreciate it if anyone has ideas or can point out flaws in the way I have implemented my model.
# Imports
import pandas as pd
import numpy as np
import torch
import torchvision # torch package for vision related things
import torch.nn.functional as F # Parameterless functions, like (some) activation functions
import torchvision.datasets as datasets # Standard datasets
import torchvision.transforms as transforms # Transformations we can perform on our dataset for augmentation
from torch import optim # For optimizers like SGD, Adam, etc.
from torch import nn # All neural network modules
from torch.utils.data import Dataset, DataLoader # Gives easier dataset management by creating mini-batches etc.
from tqdm import tqdm # For a nice progress bar
from sklearn.preprocessing import StandardScaler
# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Hyperparameters
input_size = 24
hidden_size = 128
num_layers = 1
num_classes = 2
sequence_length = 1
learning_rate = 0.005
batch_size = 8
num_epochs = 5
# Recurrent neural network with GRU (many-to-one)
class RNN_GRU(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(RNN_GRU, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.gru = nn.GRU(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size * sequence_length, num_classes)

    def forward(self, x):
        # Set the initial hidden state (a GRU has no separate cell state)
        x = x.unsqueeze(1)
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)
        # Forward propagate the GRU
        out, _ = self.gru(x, h0)
        out = out[:, -1, :]
        # Decode the hidden state of the last time step
        out = self.fc(out)
        return out
class MyDataset(Dataset):
    def __init__(self, file_name):
        stats_df = pd.read_csv(file_name)
        x = stats_df.iloc[:, 0:24].values
        y = stats_df.iloc[:, 24].values
        self.x_train = torch.tensor(x, dtype=torch.float32)
        self.y_train = torch.tensor(y, dtype=torch.float32)

    def __len__(self):
        return len(self.y_train)

    def __getitem__(self, idx):
        return self.x_train[idx], self.y_train[idx]
nomDs=MyDataset("nomStats.csv")
atkDs=MyDataset("atkStats.csv")
train_loader=DataLoader(dataset=nomDs,batch_size=batch_size)
test_loader=DataLoader(dataset=atkDs,batch_size=batch_size)
# Initialize network
model = RNN_GRU(input_size, hidden_size, num_layers, num_classes).to(device)
# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
# Train Network
for epoch in range(num_epochs):
    for batch_idx, (data, targets) in enumerate(tqdm(train_loader)):
        # Get data to cuda if possible
        data = data.to(device=device).squeeze(1)
        targets = targets.to(device=device)
        targets = targets.to(dtype=torch.long)

        # forward
        scores = model(data)
        loss = criterion(scores, targets)

        # backward
        optimizer.zero_grad()
        loss.backward()

        # gradient descent / adam update step
        optimizer.step()
# Check accuracy on training & test data to see how good our model is
def check_accuracy(loader, model):
    num_correct = 0
    num_samples = 0
    # Set model to eval
    model.eval()
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device).squeeze(1)
            y = y.to(device=device)
            scores = model(x)
            _, predictions = scores.max(1)
            num_correct += (predictions == y).sum()
            num_samples += predictions.size(0)
    # Toggle model back to train
    model.train()
    return num_correct / num_samples
print(f"Accuracy on training set: {check_accuracy(train_loader, model)*100:.2f}%")
print(f"Accuracy on test set: {check_accuracy(test_loader, model)*100:.2f}%")
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from torchvision import transforms, datasets
# neural network class
class Net(nn.Module):
    # initialize the class
    def __init__(self):
        super().__init__()
        # a feedforward neural network passes data from the input layer to the output layer
        # fully connected layer: input is 28*28 pixels (the image flattened to one row), output size is 64
        self.fc1 = nn.Linear(28*28, 64)
        # feed data from fc1 into fc2
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 64)
        # the output layer takes 64 inputs and has output size 10, one per MNIST class
        self.fc4 = nn.Linear(64, 10)

    # forward pass through the network
    def forward(self, x):
        # relu is the activation function applied after each layer
        # input and output dimensions of relu are the same
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        # log_softmax turns the output layer into a (log) probability distribution over the 10 classes, summing to 1
        x = F.log_softmax(self.fc4(x), dim=1)
        return x
#declare a model
net = Net()
#print(net)
#passing in random data
x = torch.rand(28,28)
#resize to represent input shape (batch size, input x, input y)
x = x.view(-1,28,28)
#print(x)
# the optimizer adjusts the neural network based on the error calculation
import torch.optim as optim
# net.parameters() means all the adjustable parts of the neural network; the learning rate is the amount of change (we don't want the model to swerve based on one training case)
optimizer = optim.Adam(net.parameters(), lr=0.001)
# get the datasets using torchvision.datasets; transforms are operations applied to the data (here, conversion to tensors)
train = torchvision.datasets.MNIST("", train=True, download=True, transform=transforms.Compose([transforms.ToTensor()]))
test = torchvision.datasets.MNIST("", train=False, download=True, transform=transforms.Compose([transforms.ToTensor()]))
# store in data loaders; batch size is how many samples are passed through the model at once (in GPU memory); a good batch size is usually between 8 and 64
# shuffling avoids feeding too much of one kind of image in a row and leads to better generalization
trainset = torch.utils.data.DataLoader(train, batch_size=10, shuffle=True)
testset = torch.utils.data.DataLoader(test, batch_size=10, shuffle=True)
#full pass through data is epoch
EPOCHS = 3
for epoch in range(EPOCHS):
    # data is one batch from the training set
    for data in trainset:
        # split into features and labels
        features, labels = data
        # print(features, labels)
        # reset the gradients so the results of multiple backpropagations don't mix
        net.zero_grad()
        # pass data into the network (make sure the input shape matches)
        output = net(features.view(-1, 28*28))
        # compute the error: (output, expected)
        loss = F.nll_loss(output, labels)
        print("loss is", loss)
        # backpropagate the loss through the trainable parameters of the model
        loss.backward()
        # adjust the neural network's weights
        optimizer.step()
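As a side note on the loss pairing used above: log_softmax followed by nll_loss computes the same quantity as cross_entropy, which a quick standalone check (independent of the training code) confirms:
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)           # fake batch of 4 samples, 10 classes
targets = torch.randint(0, 10, (4,))  # fake labels

a = F.nll_loss(F.log_softmax(logits, dim=1), targets)
b = F.cross_entropy(logits, targets)  # fuses log_softmax + nll_loss
assert torch.allclose(a, b)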
I am using PyTorch on Google Colab. The error message says that the gradient isn't there.
I am unsure where the error stems from. I used this tutorial: https://www.youtube.com/watch?v=9j-_dOze4IM&list=PLQVvvaa0QuDdeMyHEYc0gxFpYwHY2Qfdh&index=4
[Screenshot of the error message from Google Colab]
Here is an autoencoder I'm working on, from this tutorial: https://debuggercafe.com/implementing-deep-autoencoder-in-pytorch/
I'm just learning about autoencoders, and I've modified the source to encode a custom small dataset which consists of:
[0,1,0,1,0,1,0,1,0],[0,1,1,0,0,1,0,1,0],[0,1,1,0,0,1,0,1,0],[0,1,1,0,0,1,0,1,0]
It seems to work OK, but I'm unsure how to access the lower-dimensional embedding values of dimension 2 (set by the out_features parameter).
I've added a method to the Autoencoder class to return the embedding; is this the recommended way of accessing the embeddings?
Code:
# imports
import os
import warnings
import numpy as np
import matplotlib.pyplot as plt
import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision.transforms as transforms
from torch.autograd import Variable
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.datasets import MNIST
from torchvision.utils import save_image
# utility functions
def get_device():
    if torch.cuda.is_available():
        device = 'cuda:0'
    else:
        device = 'cpu'
    return device
device = get_device()
features = torch.tensor(np.array([ [0,1,0,1,0,1,0,1,0],[0,1,1,0,0,1,0,1,0],[0,1,1,0,0,1,0,1,0],[0,1,1,0,0,1,0,1,0] ])).float()
tic_tac_toe_data_loader = torch.utils.data.DataLoader(features, batch_size=1, shuffle=True)
class Encoder(nn.Module):
    def __init__(self):
        super(Encoder, self).__init__()
        self.fc1 = nn.Linear(in_features=9, out_features=2)

    def forward(self, x):
        return torch.sigmoid(self.fc1(x))  # torch.sigmoid (F.sigmoid is deprecated)

class Decoder(nn.Module):
    def __init__(self):
        super(Decoder, self).__init__()
        self.fc1 = nn.Linear(in_features=2, out_features=9)

    def forward(self, x):
        return torch.sigmoid(self.fc1(x))

class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        self.fc1 = Encoder()
        self.fc2 = Decoder()

    def forward(self, x):
        return self.fc2(self.fc1(x))
net = Autoencoder()
net.to(device)
NUM_EPOCHS = 50
LEARNING_RATE = 1e-3
criterion = nn.MSELoss()
optimizer = optim.Adam(net.parameters(), lr=LEARNING_RATE)
# image transformations
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])
outputs = None
def train(net, trainloader, NUM_EPOCHS):
    train_loss = []
    for epoch in range(NUM_EPOCHS):
        running_loss = 0.0
        for data in trainloader:
            img = data
            img = img.to(device)
            img = img.view(img.size(0), -1)
            # print('img.shape', img.shape)
            optimizer.zero_grad()
            outputs = net(img)
            loss = criterion(outputs, img)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        loss = running_loss / len(trainloader)
        train_loss.append(loss)
    return train_loss
# train the network
train_loss = train(net, tic_tac_toe_data_loader, NUM_EPOCHS)
I can access the lower dimensional embedding using
print(Encoder().forward( torch.tensor(np.array([0,1,0,1,0,1,0,1,0])).float()))
But is this using the trained weight values for the embedding? If I call Encoder multiple times with the same values:
print(Encoder().forward( torch.tensor(np.array([0,1,0,1,0,1,0,1,0])).float()))
print(Encoder().forward( torch.tensor(np.array([0,1,0,1,0,1,0,1,0])).float()))
different results are returned:
tensor([0.5083, 0.5020], grad_fn=<SigmoidBackward>)
tensor([0.4929, 0.6940], grad_fn=<SigmoidBackward>)
Why is this the case? Is an extra training step being invoked as a result of calling Encoder?
By calling Encoder() you are creating a new instance of the encoder every time, and its weights are randomly initialized each time.
Generally, you make one instance of it, train it, save the weights, and infer on it.
Also, in PyTorch you need not call .forward(); call the instance directly. forward is invoked implicitly, including any other hook methods.
enc = Encoder()
input = torch.from_numpy(np.asarray([0,1,0,1,0,1,0,1,0])).float()
print(enc(input))
print(enc(input))
A training pass happens when you pass the Encoder() instance to a train function; calling Encoder() only creates a new object.
Since each object has its own weights, and the weights are initialized randomly (see Xavier and Kaiming initialization), you get different outputs. Even after moving to a single object, you still have to explicitly train it with the train function.
As the other responder pointed out, when you call Encoder() you generate a new instance with randomly initialized weights. Because you are interested in the lower-dimensional embedding produced by your encoder, you need to access the trained encoder inside your trained net:
trained_encoder = net.fc1
Now that you have your encoder with trained weights, the following lines should produce the same result:
print(trained_encoder.forward( torch.tensor(np.array([0,1,0,1,0,1,0,1,0])).float()))
print(trained_encoder.forward( torch.tensor(np.array([0,1,0,1,0,1,0,1,0])).float()))
As pointed out by others, you can simplify further by calling the encoder instance directly:
test_input = torch.tensor(np.array([0,1,0,1,0,1,0,1,0])).float()
print(trained_encoder(test_input))
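When extracting embeddings for inspection, it is also common to disable gradient tracking; this small addition (beyond the original answer) avoids building the autograd graph:
with torch.no_grad():
    print(trained_encoder(test_input))  # plain tensor, no grad_fn attached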
The documentation for tf.keras.callbacks.TensorBoard states the tool can do it:
This callback logs events for TensorBoard, including:
Metrics summary plots
Training graph visualization
Activation histograms
Sampled profiling
Also later:
histogram_freq: frequency (in epochs) at which to compute activation and weight histograms for the layers of the model. If set to 0, histograms won't be computed. Validation data (or split) must be specified for histogram visualizations.
However, when using this parameter I don't see any activation summaries written; only the weights themselves are written. Looking at the source code, I don't see anything activation-related either.
So am I missing something? Is it possible to write activation summaries without custom code in TF2?
Based on my answer to Create keras callback to save model predictions and targets for each batch during training, I use the following code:
"""Demonstrate activation histograms."""
import tensorflow as tf
from tensorflow import keras
class ActivationHistogramCallback(keras.callbacks.Callback):
    """Output activation histograms."""

    def __init__(self, layers):
        """Initialize layer data."""
        super().__init__()
        self.layers = layers
        self.batch_layer_outputs = {}
        self.writer = tf.summary.create_file_writer("activations")
        self.step = tf.Variable(0, dtype=tf.int64)

    def set_model(self, _model):
        """Wrap layer calls to access layer activations."""
        for layer in self.layers:
            self.batch_layer_outputs[layer] = tf_nan(layer.output.dtype)

            def outer_call(inputs, layer=layer, layer_call=layer.call):
                outputs = layer_call(inputs)
                self.batch_layer_outputs[layer].assign(outputs)
                return outputs

            layer.call = outer_call

    def on_train_batch_end(self, _batch, _logs=None):
        """Write training batch histograms."""
        with self.writer.as_default():
            for layer, outputs in self.batch_layer_outputs.items():
                if isinstance(layer, keras.layers.InputLayer):
                    continue
                tf.summary.histogram(f"{layer.name}/output", outputs, step=self.step)

        self.step.assign_add(1)


def tf_nan(dtype):
    """Create NaN variable of proper dtype and variable shape for assign()."""
    return tf.Variable(float("nan"), dtype=dtype, shape=tf.TensorShape(None))


def main():
    """Run main."""
    model = keras.Sequential([keras.layers.Dense(1, input_shape=(2,))])
    callback = ActivationHistogramCallback(model.layers)
    model.compile(loss="mse", optimizer="adam")
    model.fit(
        x=tf.transpose(tf.range(7.0) + [[0.2], [0.4]]),
        y=tf.transpose(tf.range(7.0) + 10 + [[0.5]]),
        validation_data=(
            tf.transpose(tf.range(11.0) + 30 + [[0.6], [0.7]]),
            tf.transpose(tf.range(11.0) + 40 + [[0.9]]),
        ),
        shuffle=False,
        batch_size=3,
        epochs=2,
        verbose=0,
        callbacks=[callback],
    )


if __name__ == "__main__":
    main()
For the example training run with 2 epochs and 3 batches each (of unequal size, due to the odd number of 7 training samples), one then sees the expected output in TensorBoard: 6 batches with 3, 3, 1, 3, 3, 1 peaks.
With 200 epochs (600 batches), one can also see the training progress in the histograms.
This is the model I defined; it is a simple LSTM with two fully connected layers.
import copy
import numpy as np  # used below for np.array(...)
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
class mylstm(nn.Module):
    def __init__(self, input_dim, output_dim, hidden_dim, linear_dim):
        super(mylstm, self).__init__()
        self.hidden_dim = hidden_dim
        self.lstm = nn.LSTMCell(input_dim, self.hidden_dim)
        self.linear1 = nn.Linear(hidden_dim, linear_dim)
        self.linear2 = nn.Linear(linear_dim, output_dim)

    def forward(self, input):
        out, _ = self.lstm(input)
        out = nn.Dropout(p=0.3)(out)
        out = self.linear1(out)
        out = nn.Dropout(p=0.3)(out)
        out = self.linear2(out)
        return out
x_train and x_val are float DataFrames with shape (4478, 30), while y_train and y_val are float DataFrames with shape (4478, 10).
x_train.head()
Out[271]:
0 1 2 3 ... 26 27 28 29
0 1.6110 1.6100 1.6293 1.6370 ... 1.6870 1.6925 1.6950 1.6905
1 1.6100 1.6293 1.6370 1.6530 ... 1.6925 1.6950 1.6905 1.6960
2 1.6293 1.6370 1.6530 1.6537 ... 1.6950 1.6905 1.6960 1.6930
3 1.6370 1.6530 1.6537 1.6620 ... 1.6905 1.6960 1.6930 1.6955
4 1.6530 1.6537 1.6620 1.6568 ... 1.6960 1.6930 1.6955 1.7040
[5 rows x 30 columns]
x_train.shape
Out[272]: (4478, 30)
Defining the variables and doing a single backprop step, I can see that the validation loss is 1.4941:
model=mylstm(30,10,200,100).double()
from torch import optim
optimizer=optim.RMSprop(model.parameters(), lr=0.001, alpha=0.9)
criterion=nn.L1Loss()
input_=torch.autograd.Variable(torch.from_numpy(np.array(x_train)))
target=torch.autograd.Variable(torch.from_numpy(np.array(y_train)))
input2_=torch.autograd.Variable(torch.from_numpy(np.array(x_val)))
target2=torch.autograd.Variable(torch.from_numpy(np.array(y_val)))
optimizer.zero_grad()
output=model(input_)
loss=criterion(output,target)
loss.backward()
optimizer.step()
moniter=criterion(model(input2_),target2)
moniter
Out[274]: tensor(1.4941, dtype=torch.float64, grad_fn=<L1LossBackward>)
But when I call the forward function again, I get a different number, due to the randomness of dropout:
moniter=criterion(model(input2_),target2)
moniter
Out[275]: tensor(1.4943, dtype=torch.float64, grad_fn=<L1LossBackward>)
What should I do to eliminate all the dropout in the prediction phase?
I tried eval():
moniter=criterion(model.eval()(input2_),target2)
moniter
Out[282]: tensor(1.4942, dtype=torch.float64, grad_fn=<L1LossBackward>)
moniter=criterion(model.eval()(input2_),target2)
moniter
Out[283]: tensor(1.4945, dtype=torch.float64, grad_fn=<L1LossBackward>)
And I tried passing an additional parameter p to control dropout:
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
class mylstm(nn.Module):
    def __init__(self, input_dim, output_dim, hidden_dim, linear_dim, p):
        super(mylstm, self).__init__()
        self.hidden_dim = hidden_dim
        self.lstm = nn.LSTMCell(input_dim, self.hidden_dim)
        self.linear1 = nn.Linear(hidden_dim, linear_dim)
        self.linear2 = nn.Linear(linear_dim, output_dim)

    def forward(self, input, p):
        out, _ = self.lstm(input)
        out = nn.Dropout(p=p)(out)
        out = self.linear1(out)
        out = nn.Dropout(p=p)(out)
        out = self.linear2(out)
        return out
model=mylstm(30,10,200,100,0.3).double()
output=model(input_)
loss=criterion(output,target)
loss.backward()
optimizer.step()
moniter=criterion(model(input2_,0),target2)
Traceback (most recent call last):
  File "<ipython-input-286-e49b6fac918b>", line 1, in <module>
    output=model(input_)
  File "D:\Users\shan xu\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'p'
But neither of them worked.
You have to define your nn.Dropout layer in your __init__ and assign it to your model so that it responds to calls to eval().
So changing your model like this should work for you:
class mylstm(nn.Module):
    def __init__(self, input_dim, output_dim, hidden_dim, linear_dim, p):
        super(mylstm, self).__init__()
        self.hidden_dim = hidden_dim
        self.lstm = nn.LSTMCell(input_dim, self.hidden_dim)
        self.linear1 = nn.Linear(hidden_dim, linear_dim)
        self.linear2 = nn.Linear(linear_dim, output_dim)
        # define the dropout layer in __init__
        self.drop_layer = nn.Dropout(p=p)

    def forward(self, input):
        out, _ = self.lstm(input)
        # apply model dropout, responsive to eval()
        out = self.drop_layer(out)
        out = self.linear1(out)
        # apply model dropout, responsive to eval()
        out = self.drop_layer(out)
        out = self.linear2(out)
        return out
If you change it like this, dropout will be inactive as soon as you call eval().
NOTE: If you want to continue training afterwards, you need to call train() on your model to leave evaluation mode.
You can also find a small working example for dropout with eval() for evaluation mode here:
nn.Dropout vs. F.dropout pyTorch
I am adding this answer just because I'm now facing the same issue while trying to reproduce deep Bayesian active learning through dropout disagreement.
If you need to keep dropout active (for example, to bootstrap a set of different predictions for the same test instances), you just need to leave the model in training mode; there is no need to define your own dropout layer.
Since in PyTorch you need to define your own prediction function, you can just add a parameter to it like this:
def predict_class(model, test_instance, active_dropout=False):
    if active_dropout:
        model.train()
    else:
        model.eval()
    # the rest is a sketch: score the instance without tracking gradients
    with torch.no_grad():
        return model(test_instance)
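For example, to bootstrap a set of stochastic predictions for the same test instance (model and test_instance here are placeholders for whatever you are working with):
predictions = [predict_class(model, test_instance, active_dropout=True) for _ in range(10)]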
As the other answers said, the dropout layer should be defined in your model's __init__ method so that your model can keep track of all the information of each predefined layer. When the model's state is changed, all layers are notified and do the relevant work: for instance, while calling model.eval() your model deactivates the dropout layers and simply passes all activations through unchanged. In general, if you want to be able to deactivate your dropout layers, you should define them in the __init__ method using the nn.Dropout module.
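A tiny self-contained demonstration of that behavior:
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()    # training mode: dropout is active
print(drop(x))  # roughly half the entries are zeroed, the survivors are scaled by 1/(1-p) = 2

drop.eval()     # evaluation mode: dropout is a no-op
print(drop(x))  # identity: all ones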