Fully Connected Neural Network not predicting correctly - python

I am new to ML and DL, and I decided to try something out, but I found that my network is not predicting correctly.
I have a fully connected neural network with just one dense (linear) layer. With SGD as the optimizer it predicts 9.9 instead of 10, but with Adam it predicts 10. The expected result is 10. Can someone explain why this happens?
!pip install -Uqq tqdm
import torch
import torch.nn as nn
import torch.optim as optim
from tqdm import tqdm as tqdm
My training data as a sample
X = torch.tensor([[1], [2], [3], [4]], dtype=torch.float32)
Y = torch.tensor([[2], [4], [6], [8]], dtype=torch.float32)
My model or network for forward pass and neural net
class SimpleNeuralNetwork(nn.Module):
    def __init__(self, num_input, num_output):
        super(SimpleNeuralNetwork, self).__init__()
        self.fc = nn.Linear(num_input, num_output)

    def forward(self, x):
        x = self.fc(x)
        return x
In features and batch
in_samples, in_features = X.shape
Defining and Initialising my loss function
criterion = nn.MSELoss()
Parameters for the training process
learning_rate = 0.01
ePoch = 1000
Initialising my model
sNN = SimpleNeuralNetwork(in_features, in_features)
Initialising my optimizer
optimiser = optim.SGD(sNN.parameters(), lr=learning_rate)
Training my Network
for i in tqdm(list(range(ePoch))):
    # prediction - forward pass in the model
    y_pred = sNN(X)
    # loss - measure how far the prediction is from the target
    loss = criterion(Y, y_pred)
    # gradient - do a backward propagation (backward pass)
    loss.backward()
    # update weight - adjust the weights, scaled by the learning rate
    optimiser.step()
    # zero gradient - reset the accumulated gradients so they do not carry over into the next step
    optimiser.zero_grad()
    # if i % 10 == 0:
    #     [w, b] = sNN.parameters()
    #     print(f'epoch: {i + 1}, weight: {w[0][0].item()}, bias: {b[0].item()}, pred: {y_pred}')
Actual prediction
predict = sNN(torch.tensor([5], dtype=torch.float32))
print(f'prediction for 5: {predict[0].item()}')
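One way to look at the 9.9 result is to run the same update rule by hand. The sketch below is plain Python, not the code from the question: it assumes full-batch gradient descent (which is what optim.SGD does here without momentum) and a zero initialisation, whereas nn.Linear starts from random values. The helper names fit and predict are hypothetical. Under these assumptions, 1000 updates at lr=0.01 should still leave the prediction for 5 slightly off, while many more updates at the same learning rate bring it to 10. That would be consistent with SGD simply not having converged yet, and with Adam covering the remaining distance faster on this problem.

```python
# Training data from the question: y = 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

def fit(steps, lr=0.01, w=0.0, b=0.0):
    """Full-batch gradient descent on mean squared error for y = w*x + b."""
    n = len(xs)
    for _ in range(steps):
        # Gradients of mean((w*x + b - y)^2) with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(params, x):
    w, b = params
    return w * x + b

pred_short = predict(fit(1000), 5.0)    # same number of updates as the question
pred_long = predict(fit(20000), 5.0)    # many more updates, same learning rate
print(f"1000 steps: {pred_short:.4f}, 20000 steps: {pred_long:.4f}")
```

Raising ePoch, or the learning rate within the stable range, are the two knobs this sketch suggests trying first.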

Related

Inplace operation error in control problem

I'm new to PyTorch and I'm having a problem with some code that trains a neural network to solve a control problem. I use the following code to solve a toy version of my problem:
# SOME IMPORTS
import torch
import torch.autograd as autograd
from torch import Tensor
import torch.nn as nn
import torch.optim as optim
# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# PARAMETERS OF THE PROBLEM
layers = [4, 32, 32, 4] # Layers of the NN
steps = 10000 # Simulation steps
train_step = 1 # I train the NN for 1 epoch every train_step steps
lr = 1e-3 # Learning rate
After this I define a very simple network:
# DEFINITION OF THE NETWORK (A SIMPLE FEED FORWARD)
class FCN(nn.Module):
    def __init__(self, layers):
        super(FCN, self).__init__()  # call __init__ from parent class
        self.linears = []
        for i in range(len(layers)-2):
            self.linears.append(
                nn.Linear(layers[i], layers[i+1])
            )
            self.linears.append(
                nn.ReLU()
            )
        self.linears.append(
            nn.Linear(layers[-2], layers[-1])
        )
        self.linear_stack = nn.Sequential(*self.linears)

    # forward pass
    def forward(self, x):
        out = self.linear_stack(x)
        return out
I then use the defined class to create my model:
model = FCN(layers)
model.to(device)
params = list(model.parameters())
optimizer = torch.optim.Adam(model.parameters(),lr=lr,amsgrad=False)
Then I define the loss function and the simulation function, i.e. the function that updates the state of my problem.
def simulate(state_old, model):
    state_new = model(state_old)
    return state_new

def lossNN(state_old, state_new, model):
    error = torch.sum( (state_old-state_new)**2 )
    return error
And finally I train my model:
torch.autograd.set_detect_anomaly(True)
state_old = torch.Tensor([0.01, 0.01, 0.5, 0.1]).to(device)
for i in range(steps):
    state_new = simulate(state_old, model)
    if i%train_step == 0:
        optimizer.zero_grad()
        loss = lossNN(state_old, state_new, model)
        loss.backward(retain_graph=True)
        optimizer.step()
    state_old = state_new
    if (i%1000)==0:
        print(loss)
        print(state_new)
I then get the following error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32, 4]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
You need to call detach() on the new state so that the autograd graph built in the previous step is not carried into the next one. Replace
state_old = state_new
with
state_old = state_new.detach()
Then your training code changes to:
torch.autograd.set_detect_anomaly(True)
state_old = torch.Tensor([0.01, 0.01, 0.5, 0.1]).to(device)
for i in range(steps):
    state_new = simulate(state_old, model)
    if i%train_step == 0:
        optimizer.zero_grad()
        loss = lossNN(state_old, state_new, model)
        loss.backward(retain_graph=True)
        optimizer.step()
    state_old = state_new.detach()
    if (i%1000)==0:
        print(loss)
        print(state_new)
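To see what detach() changes, here is a minimal sketch on CPU. It assumes a single nn.Linear layer in place of the FCN class, and the names losses and step are illustrative. Because each iteration starts from a detached state, the graph only spans the current step, so in this sketch backward() runs without retain_graph=True.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 4)  # stand-in for the FCN from the question
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

state_old = torch.tensor([0.01, 0.01, 0.5, 0.1])
losses = []
for step in range(5):
    state_new = model(state_old)
    optimizer.zero_grad()
    loss = torch.sum((state_old - state_new) ** 2)
    loss.backward()   # the graph covers this step only, so it can be freed
    optimizer.step()  # in-place weight update; no older graph depends on the weights
    # detach() returns a tensor that shares data but is cut off from the graph
    state_old = state_new.detach()
    losses.append(loss.item())

print(state_old.requires_grad, state_old.grad_fn)
```

Without the detach() call, the second backward pass would walk back into the first step's graph, whose saved weights were already modified in place by optimizer.step(), which is what the error message reports.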

How do I retrieve the weights estimated from a neural net using skorch?

I've trained a simple neural net using skorch to make it sklearn compatible and I would like to know how to retrieve the actual estimated weights.
Here's a reproducible example of what I need.
The neural net presented here uses 10 features, has one hidden layer of 2 nodes, uses ReLU activation functions and linearly combines the output of the 2 nodes.
import torch
import numpy as np
from torch.autograd import Variable
# Create example data
np.random.seed(2022)
train_size = 1000
n_features= 10
X_train = np.random.rand(n_features, train_size).astype("float32")
l2_params_1 = np.random.rand(1,n_features).astype("float32")
l2_params_2 = np.random.rand(1,n_features).astype("float32")
l1_X = np.matmul(l2_params_1, X_train)
l2_X = np.matmul(l2_params_2, X_train)
y_train = l1_X + l2_X
# Defining my NN
class NNModule(torch.nn.Module):
    def __init__(self, in_features):
        super(NNModule, self).__init__()
        self.l1 = torch.nn.Linear(in_features, 2)
        self.a1 = torch.nn.ReLU()
        self.l2 = torch.nn.Linear(2, 1)

    def forward(self, x):
        x = self.l1(x)
        x = self.a1(x)
        return self.l2(x)
# Initialize the NN
torch.manual_seed(200)
model = NNModule(in_features = 10)
model.l1.weight.data.uniform_(0.0, 1.0)
model.l1.bias.data.uniform_(0.0, 1.0)
# Define criterion and optimizer
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# Train the NN
torch.manual_seed(200)
for epoch in range(100):
    inputs = Variable(torch.from_numpy(np.transpose(X_train)))
    labels = Variable(torch.from_numpy(np.transpose(y_train)))
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
The parameters at which I'm arriving are the following:
list(model.parameters())
[Output]:
[Parameter containing:
tensor([[0.8997, 0.8345, 0.8284, 0.6950, 0.5949, 0.1217, 0.9067, 0.1824, 0.8272,
0.2372],
[0.7525, 0.6577, 0.4358, 0.6109, 0.8817, 0.5429, 0.5263, 0.7531, 0.1552,
0.7066]], requires_grad=True),
Parameter containing:
tensor([0.6617, 0.1079], requires_grad=True),
Parameter containing:
tensor([[0.9225, 0.8339]], requires_grad=True),
Parameter containing:
tensor([0.0786], requires_grad=True)]
Now, to wrap my NNModule with skorch, I'm using this:
from skorch import NeuralNetRegressor
torch.manual_seed(200)
net = NeuralNetRegressor(
    module=NNModule(in_features=10),
    criterion=torch.nn.MSELoss,
    optimizer=torch.optim.SGD,
    optimizer__lr=0.01,
    max_epochs=100,
    verbose=0
)
net.fit(np.transpose(X_train), np.transpose(y_train))
And I'd like to retrieve the weights obtained in the training. I've used dir(net) to see if the weights are stored in any attributes to no avail.
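To retrieve the weights, list the parameters of the wrapped module:
list(net.module.parameters())
skorch also exposes the initialized (and, after fit, trained) module as net.module_, which is the attribute to use when module was passed as a class rather than an instance:
list(net.module_.parameters())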
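Once the underlying torch module is in hand, its weights can also be read by name rather than by position. A minimal sketch in plain PyTorch (no skorch needed to run it); the variable names module, weights and state are illustrative, and the module here is untrained:

```python
import torch

class NNModule(torch.nn.Module):
    def __init__(self, in_features):
        super().__init__()
        self.l1 = torch.nn.Linear(in_features, 2)
        self.a1 = torch.nn.ReLU()
        self.l2 = torch.nn.Linear(2, 1)

    def forward(self, x):
        return self.l2(self.a1(self.l1(x)))

module = NNModule(in_features=10)

# named_parameters() yields (name, Parameter) pairs; ReLU contributes none
weights = {name: p.detach().clone() for name, p in module.named_parameters()}
for name, value in weights.items():
    print(name, tuple(value.shape))

# state_dict() holds the same tensors keyed by the same names
state = module.state_dict()
```

With a fitted skorch net, the same two calls would be made on the wrapped module instead of on a freshly built one.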

Pytorch GRU Trained on one class to Predict Unlabelled Data

I am creating a GRU to predict whether data derived from a device's traffic packets is safe or anomalous. I plan to do this by training a model only on safe (normal operating) data and then having it classify new, unseen traffic at test time. I want to train on the safe data only (one class) because an attack could take many forms, and I don't want to train the model on labeled attack data and then have it miss an attack type that was not in the training set (basically, I want to overfit on the normal operating data). The model therefore needs to check whether incoming unlabeled data matches the one class it was trained on (i.e. the normal operation of the device) or is anomalous.
The issue is that, because the model is trained on only one class, it has trouble differentiating anomalous unseen data from normal data and considers virtually all data it sees to be normal (the class it was trained on).
I would appreciate any ideas, or pointers to flaws in the way I have implemented my model.
# Imports
import pandas as pd
import numpy as np
import torch
import torchvision # torch package for vision related things
import torch.nn.functional as F # Parameterless functions, like (some) activation functions
import torchvision.datasets as datasets # Standard datasets
import torchvision.transforms as transforms # Transformations we can perform on our dataset for augmentation
from torch import optim # For optimizers like SGD, Adam, etc.
from torch import nn # All neural network modules
from torch.utils.data import Dataset, DataLoader # Gives easier dataset managment by creating mini batches etc.
from tqdm import tqdm # For a nice progress bar
from sklearn.preprocessing import StandardScaler
# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Hyperparameters
input_size = 24
hidden_size = 128
num_layers = 1
num_classes = 2
sequence_length = 1
learning_rate = 0.005
batch_size = 8
num_epochs = 5
# Recurrent neural network with GRU (many-to-one)
class RNN_GRU(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(RNN_GRU, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.gru = nn.GRU(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size * sequence_length, num_classes)

    def forward(self, x):
        # Add a sequence dimension of length 1: (batch, features) -> (batch, 1, features)
        x = x.unsqueeze(1)
        # Set initial hidden state (a GRU has no cell state)
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)
        # Forward propagate GRU
        out, _ = self.gru(x, h0)
        out = out[:, -1, :]
        # Decode the hidden state of the last time step
        out = self.fc(out)
        return out
class MyDataset(Dataset):
    def __init__(self, file_name):
        stats_df = pd.read_csv(file_name)
        x = stats_df.iloc[:, 0:24].values
        y = stats_df.iloc[:, 24].values
        self.x_train = torch.tensor(x, dtype=torch.float32)
        self.y_train = torch.tensor(y, dtype=torch.float32)

    def __len__(self):
        return len(self.y_train)

    def __getitem__(self, idx):
        return self.x_train[idx], self.y_train[idx]
nomDs=MyDataset("nomStats.csv")
atkDs=MyDataset("atkStats.csv")
train_loader=DataLoader(dataset=nomDs,batch_size=batch_size)
test_loader=DataLoader(dataset=atkDs,batch_size=batch_size)
# Initialize network
model = RNN_GRU(input_size, hidden_size, num_layers, num_classes).to(device)
# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
# Train Network
for epoch in range(num_epochs):
    for batch_idx, (data, targets) in enumerate(tqdm(train_loader)):
        # Get data to cuda if possible
        data = data.to(device=device).squeeze(1)
        targets = targets.to(device=device)
        targets = targets.to(dtype=torch.long)
        # forward
        scores = model(data)
        loss = criterion(scores, targets)
        # backward
        optimizer.zero_grad()
        loss.backward()
        # gradient descent update step/adam step
        optimizer.step()
# Check accuracy on training & test to see how good our model
def check_accuracy(loader, model):
    num_correct = 0
    num_samples = 0
    # Set model to eval
    model.eval()
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device).squeeze(1)
            y = y.to(device=device)
            scores = model(x)
            _, predictions = scores.max(1)
            num_correct += (predictions == y).sum()
            num_samples += predictions.size(0)
    # Toggle model back to train
    model.train()
    return num_correct / num_samples
print(f"Accuracy on training set: {check_accuracy(train_loader, model)*100:.2f}%")
print(f"Accuracy on test set: {check_accuracy(test_loader, model)*100:.2f}%")

why Gradient Descent doesn't work as expected with pytorch

I'm starting out with PyTorch and tried an easy linear regression example. I implemented a linear regression in PyTorch to fit the equation 2*x+1, but the loss stays stuck at 120: gradient descent does not converge to a small loss value. I don't see what is wrong, since this example should be very easy to solve. This is the code I'm using:
import torch
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader
import numpy as np
X = np.array([i for i in np.arange(1, 20)]).reshape(-1, 1)
X = torch.tensor(X, dtype=torch.float32, requires_grad=True)
y = np.array([2*i+1 for i in np.arange(1, 20)]).reshape(-1, 1)
y = torch.tensor(y, dtype=torch.float32, requires_grad=True)
print(X.shape, y.shape)
class LR(torch.nn.Module):
    def __init__(self, n_features, n_hidden1, n_out):
        super(LR, self).__init__()
        self.linear = torch.nn.Linear(n_features, n_hidden1)
        self.predict = torch.nn.Linear(n_hidden1, n_out)

    def forward(self, x):
        x = F.relu(self.linear(x))
        x = self.predict(x)
        return x
model = LR(1, 10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()
def train(epochs=100):
    for e in range(epochs):
        pred = model(X)
        loss = loss_fn(pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print(f"epoch: {e} and loss= {loss}")
The desired output is a small loss value, so that the trained model gives good predictions later.
Your learning rate is too large. The model takes a few steps in the right direction, but it cannot settle on a good minimizer and instead zigzags around it. If you try lr=0.001 instead, your performance will be much better. This is why it is often useful to decay the learning rate over time when using first-order optimizers.
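Learning-rate decay can be sketched with one of PyTorch's built-in schedulers. The example below is an illustration rather than a tuned recipe: StepLR, the step size of 100 epochs, the decay factor of 0.5 and the 300-epoch budget are all assumptions chosen for demonstration, and the model is written with nn.Sequential instead of the LR class.

```python
import torch

torch.manual_seed(0)
X = torch.arange(1, 20, dtype=torch.float32).reshape(-1, 1)
y = 2 * X + 1

model = torch.nn.Sequential(
    torch.nn.Linear(1, 10),
    torch.nn.ReLU(),
    torch.nn.Linear(10, 1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
# Halve the learning rate every 100 epochs (illustrative values)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.5)
loss_fn = torch.nn.MSELoss()

lrs = []
for epoch in range(300):
    pred = model(X)
    loss = loss_fn(pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the schedule once per epoch, after optimizer.step()
    lrs.append(optimizer.param_groups[0]["lr"])

print(f"final lr: {lrs[-1]}, final loss: {loss.item():.4f}")
```

The schedule starts with larger steps and shrinks them as training goes on, which is the behaviour the answer recommends; whether it helps on a given problem still has to be checked against the loss curve.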

Learning parameters of MultivariateNormalDiag in tensorflow

I have been trying to code up a Variational Autoencoder (VAE) in TensorFlow. I was able to implement the version that has a Gaussian encoder network and a Bernoulli decoder, as in the paper Auto-Encoding Variational Bayes.
However, I would like to work with real-valued data, and I have not been able to get a VAE with a Gaussian decoder to work. I have narrowed this down to one problem: my network does not seem to learn the parameters of the diagonal multivariate Gaussian. Here is the code for a very simple test case, where my input data is just drawn from a normal(0,1). All the network needs to learn is the mean and variance of my data. I would expect the mean to converge to 0 and the variance to converge to 1, but it does not:
import tensorflow as tf
import numpy as np
tf.reset_default_graph()
input_dim = 1
hidden_dim = 10
learning_rate = 0.001
num_batches = 1000
# Network
x = tf.placeholder(tf.float32, (None, input_dim))
with tf.variable_scope('Decoder'):
    h1 = tf.layers.dense(x, hidden_dim, activation=tf.nn.softplus, name='h1')
    mu = tf.layers.dense(h1, input_dim, activation=tf.nn.softplus, name='mu')
    diag_stdev = tf.layers.dense(h1, input_dim, activation=tf.nn.softplus, name='diag_stdev')
# Loss: -log(p(x))
with tf.variable_scope('Loss'):
    dist = tf.contrib.distributions.MultivariateNormalDiag(loc=mu, scale_diag=diag_stdev)
    loss = - tf.reduce_mean(tf.log(1e-10 + dist.prob(x)))
# Optimizer
train_step = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)
summary_writer = tf.summary.FileWriter('./log_dir', tf.get_default_graph())
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    mu_plot = np.zeros(num_batches,)
    for i in range(num_batches): # degenerate case batch_size of 1
        input_ = np.random.multivariate_normal(mean=[0], cov=np.diag([1]), size=(1))
        loss_ , mu_ , diag_stdev_ , _ = sess.run([loss, mu, diag_stdev, train_step],feed_dict={x: input_})
        print("-p(x): {}, mu: {}, diag_stdev: {}".format(loss_, mu_,diag_stdev_))
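One detail in the network above may be worth checking: the mu layer uses a softplus activation, and softplus(z) = log(1 + exp(z)) is strictly positive, so its output can approach 0 but never reach it or go negative. The sketch below is plain Python with no TensorFlow; the helper names softplus and gaussian_nll are hypothetical. It evaluates softplus at a few points and writes out the one-dimensional Gaussian negative log-likelihood that the loss corresponds to. Whether this fully explains the behaviour in the question is an assumption, not something verified against the original code.

```python
import math

def softplus(z):
    # Numerically stable form of log(1 + exp(z))
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def gaussian_nll(x, mu, sigma):
    # -log N(x; mu, sigma^2) for a scalar x
    return 0.5 * math.log(2 * math.pi) + math.log(sigma) + 0.5 * ((x - mu) / sigma) ** 2

# Softplus stays above zero even for very negative inputs
outputs = [softplus(z) for z in (-20.0, -5.0, 0.0, 5.0)]
print(outputs)

# Loss for a sample at the true mean with the true parameters
print(gaussian_nll(0.0, 0.0, 1.0))
```

A linear output (no activation) for the mean, with softplus kept only on the standard deviation, would be one arrangement that lets the mean take any real value.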
