'None' gradients in PyTorch

I am trying to implement a simple MDN that predicts the parameters of a distribution over a target variable instead of a point value, and then assigns probabilities to discrete bins of the point value. Narrowing down the issue, the code from which the 'None' springs is:
import numpy as np
import torch
# params
tte_bins = np.linspace(
    start=0,
    stop=399,
    num=400,
    dtype='float32'
).reshape(1, 1, -1)
bins = torch.tensor(tte_bins, dtype=torch.float32)
x_train = np.random.randn(1, 1024, 3)
y_labels = np.random.randint(low=0, high=399, size=(1, 1024))
y_train = np.eye(400)[y_labels]
# data
in_train = torch.tensor(x_train[0:1, :, :], dtype=torch.float)
in_train = (in_train - torch.mean(in_train)) / torch.std(in_train)
out_train = torch.tensor(y_train[0:1, :, :], dtype=torch.float)
# model
linear = torch.nn.Linear(in_features=3, out_features=2)
lin = linear(in_train)
preds = torch.exp(lin)
# intermediate values
alpha = torch.clamp(preds[0:1, :, 0:1], 0, 500)
beta = torch.clamp(preds[0:1, :, 1:2], 0, 100)
# probs
p1 = torch.exp(-torch.pow(bins / alpha, beta))
p2 = torch.exp(-torch.pow((bins + 1.0) / alpha, beta))
probs = p1 - p2
# loss
loss = torch.mean(torch.pow(out_train - probs, 2))
# gradients
loss.backward()
for p in linear.parameters():
    print(p.grad, 'gradient')
in_train has shape [1, 1024, 3], out_train has shape [1, 1024, 400], and bins has shape [1, 1, 400]. All the broadcasting appears fine, and the resulting tensors (alpha, beta, loss) have the right shapes and values; there are simply no gradients.
edit: added loss.backward() and x_train/y_train; now I have NaNs

You simply forgot to compute the gradients. While you calculate the loss, you never tell PyTorch with respect to which function it should calculate the gradients.
Simply adding
loss.backward()
to your code should fix the problem.
Additionally, in your code some intermediate results like alpha can be zero but appear in a denominator when computing the gradient. This leads to the NaN results you observed.
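One way to avoid those NaNs, keeping the structure of the code above, is to bound alpha and the base of the power strictly away from zero before they enter the division and torch.pow. This is a sketch; eps is an arbitrary small constant, not part of the original code:
eps = 1e-6  # arbitrary small constant (my addition, not from the question)
alpha = torch.clamp(preds[0:1, :, 0:1], eps, 500)
beta = torch.clamp(preds[0:1, :, 1:2], eps, 100)
# bins starts at 0, so bins / alpha is exactly 0 in the first bin, and the
# gradient of torch.pow(0., beta) w.r.t. beta involves log(0) -> NaN
base1 = torch.clamp(bins / alpha, min=eps)
base2 = torch.clamp((bins + 1.0) / alpha, min=eps)
p1 = torch.exp(-torch.pow(base1, beta))
p2 = torch.exp(-torch.pow(base2, beta))
probs = p1 - p2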

Related

GPflow 2: VGP model with MOK and multiple-input throws ValueError

I'm following the multi-output kernel notebook of the GPflow 2.5.2 documentation. I am trying to replace the SVGP model with either a VGP or GPR model, because I have only a little data and do not need the sparse aspect.
I'm using the SharedIndependent multi-output kernel.
For both models I get ValueErrors saying that the dimensions in a matrix multiplication are incorrect. I guess I need to change the format of the input data, but I don't know how, so I just use the same format as for the SVGP model.
Error message for VGP model:
ValueError: Dimensions must be equal, but are 2 and 100 for '{{node MatMul}} = BatchMatMulV2[T=DT_DOUBLE, adj_x=false, adj_y=false](Cholesky, MatMul/identity_CONSTRUCTED_AT_top_level/forward/ReadVariableOp)' with input shapes: [100,2,2], [100,2].
Error message for GPR model:
ValueError: Dimensions must be equal, but are 2 and 100 for '{{node triangular_solve/MatrixTriangularSolve}} = MatrixTriangularSolve[T=DT_DOUBLE, adjoint=false, lower=true](Cholesky, sub)' with input shapes: [100,2,2], [100,2].
I already tried setting the q_mu and q_sqrt values as suggested here after initializing the VGP model like this (didn't work):
m.q_mu = np.zeros((len(x_train)*len(y_train.T), 1), dtype=gpflow.config.default_float())
m.q_sqrt = np.expand_dims(np.identity(len(x_train)*len(y_train.T), dtype=gpflow.config.default_float()), axis=0)
The code looks as follows:
import numpy as np
import gpflow as gpf
from gpflow.ci_utils import ci_niter

def generate_data(N=100):
    X1 = np.random.rand(N, 1)
    Y1 = np.sin(6 * X1) + np.random.randn(*X1.shape) * 0.03 + 2
    Y2 = np.sin(5 * X1 + 0.7) + np.random.randn(*X1.shape) * 0.1 + 0.5
    return X1, np.concatenate((Y1, Y2), axis=1)

N = 100
M = 15
P = 2
data = (X, Y) = generate_data(N)

# create multi-output kernel
kernel = gpf.kernels.SharedIndependent(
    gpf.kernels.Matern52(active_dims=list(range(X.shape[1]))), output_dim=P
)
# initialization of inducing input locations (M random points from the training inputs)
Zinit = np.linspace(0, 1, M)[:, None]
Z = Zinit.copy()
# create multi-output inducing variables from Z
iv = gpf.inducing_variables.SharedIndependentInducingVariables(
    gpf.inducing_variables.InducingPoints(Z)
)
m = gpf.models.SVGP(kernel, gpf.likelihoods.Gaussian(), inducing_variable=iv, num_latent_gps=P)
optimizer = gpf.optimizers.Scipy()
optimizer.minimize(
    m.training_loss_closure(data),
    variables=m.trainable_variables,
    method="l-bfgs-b",
    options={"iprint": 0, "maxiter": ci_niter(2000)},
)

# implementation of VGP
m = gpf.models.VGP(data, kernel, gpf.likelihoods.Gaussian(), num_latent_gps=P)
optimizer = gpf.optimizers.Scipy()
optimizer.minimize(
    m.training_loss,
    variables=m.trainable_variables,
    method="l-bfgs-b",
    options={"iprint": 0, "maxiter": ci_niter(2000)},
)

## implementation of gpflow.models.GPR
m = gpf.models.GPR(data, kernel)
optimizer = gpf.optimizers.Scipy()
optimizer.minimize(
    m.training_loss,
    variables=m.trainable_variables,
    method="l-bfgs-b",
    options={"iprint": 0, "maxiter": ci_niter(2000)},
)
Unfortunately, it is currently only SVGP that supports multi-output kernels.
Supporting them in more models is a common request, but it is a surprisingly large amount of work, so it has never gotten done.
The good news is that the simpler models do support broadcasting a single kernel over multiple output dimensions, which is exactly the same as the SharedIndependent multi-output kernel. Just drop in a single-output kernel and it will broadcast. For example:
import gpflow as gpf
import numpy as np

D = 1
P = 2

def generate_data(N=100):
    X1 = np.random.rand(N, D)
    Y1 = np.sin(6 * X1) + np.random.randn(*X1.shape) * 0.03 + 2
    Y2 = np.sin(5 * X1 + 0.7) + np.random.randn(*X1.shape) * 0.1 + 0.5
    return X1, np.concatenate((Y1, Y2), axis=1)

N = 100
M = 15
data = generate_data(N)
train_data = (data[0][:70], data[1][:70])
test_data = (data[0][70:], data[1][70:])

# create multi-output kernel
kernel = gpf.kernels.SharedIndependent(gpf.kernels.Matern52(), output_dim=P)
# initialization of inducing input locations (M random points from the training inputs)
Zinit = np.linspace(0, 1, M)[:, None]
Z = Zinit.copy()
# create multi-output inducing variables from Z
iv = gpf.inducing_variables.SharedIndependentInducingVariables(
    gpf.inducing_variables.InducingPoints(Z)
)
m = gpf.models.SVGP(
    kernel, gpf.likelihoods.Gaussian(), inducing_variable=iv, num_latent_gps=P
)
optimizer = gpf.optimizers.Scipy()
optimizer.minimize(
    m.training_loss_closure(train_data),
    variables=m.trainable_variables,
    method="l-bfgs-b",
    options={"iprint": 0, "maxiter": 2000},
)
print("svgp", np.mean(m.predict_log_density(test_data)))

# implementation of VGP: a plain single-output kernel broadcasts over the P outputs
m = gpf.models.VGP(
    train_data, gpf.kernels.Matern52(), gpf.likelihoods.Gaussian(), num_latent_gps=P
)
optimizer = gpf.optimizers.Scipy()
optimizer.minimize(
    m.training_loss,
    variables=m.trainable_variables,
    method="l-bfgs-b",
    options={"iprint": 0, "maxiter": 2000},
)
print("vgp", np.mean(m.predict_log_density(test_data)))

## implementation of gpflow.models.GPR
m = gpf.models.GPR(
    train_data,
    gpf.kernels.Matern52(),
)
optimizer = gpf.optimizers.Scipy()
optimizer.minimize(
    m.training_loss,
    variables=m.trainable_variables,
    method="l-bfgs-b",
    options={"iprint": 0, "maxiter": 2000},
)
print("gpr", np.mean(m.predict_log_density(test_data)))

How can I fix a weight issue in a neural network

I'm trying to train a neural network for a HW assignment (can't use classes). I have pretty much everything figured out, but when it comes to updating the weights, I fixed the shape error I had, yet the resulting weights are [nan nan nan, nan nan...], which shouldn't happen. I'm not asking anyone to do my HW for me; I'm just not sure where in the code I need to fix things.
import numpy as np

# Creating the activation function: Sigmoid function ==> h(x) = 1/(1+e^(-x))
def sig(x):
    return 1 / (1 + np.exp(-x))

# Derivative of act. function:
def deriv(x):
    return x * (1 - x)

# Establishing inputs and outputs
x = np.array([[1, 1, 1], [1, 0, 0], [1, 1, 0], [0, 1, 1], [1, 0, 1]])
y = np.array([[1], [0], [0], [1], [1]])
alpha = 0.05
#print(x.shape)
np.random.seed(1)

# Establishing weights randomly with a mean of 0 for a weight matrix
w = np.random.uniform(low=-0.4, high=0.4, size=(3, 1))
print('Random weight:', w)

# Going to iterate 1000 times
for _ in range(1000):
    # Feed Forward:
    # Input
    z = np.dot(x, w)
    h = sig(z)
    # Output
    yhat = sig(z)
    # Backpropagation:
    # Calculating the errors:
    J = (1 / 2) * np.power(yhat - y, 2)
    deltay = yhat - y
    grad = deriv(z)
    dz = deltay * grad
    const = np.dot(dz, w.T)
    dJdw = np.dot(x.T, const * deriv(z))
    #print(dJdw.shape)
    #print(w.shape)
    # Update the weights (shape is messing this and 3 up for some reason....)
    w = w - alpha * dJdw

print('Weights after training: ', w)
print('Outputs after training: ', yhat)
print('Error obtained: ', deltay)
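For reference, a sketch of how this gradient usually looks for a single sigmoid layer with MSE loss: the derivative h*(1-h) should be evaluated at the sigmoid output yhat rather than at the pre-activation z, and the single-layer chain rule has no np.dot(dz, w.T) factor. This illustrates the standard derivation, not the original poster's code:
import numpy as np

def sig(x):
    return 1 / (1 + np.exp(-x))

x = np.array([[1, 1, 1], [1, 0, 0], [1, 1, 0], [0, 1, 1], [1, 0, 1]])
y = np.array([[1], [0], [0], [1], [1]])
alpha = 0.05
np.random.seed(1)
w = np.random.uniform(low=-0.4, high=0.4, size=(3, 1))

for _ in range(1000):
    yhat = sig(np.dot(x, w))              # forward pass, shape (5, 1)
    deltay = yhat - y                     # dJ/dyhat for J = 1/2 (yhat - y)^2
    grad = yhat * (1 - yhat)              # sigmoid derivative at the OUTPUT
    dJdw = np.dot(x.T, deltay * grad)     # chain rule: (3, 5) @ (5, 1) -> (3, 1)
    w = w - alpha * dJdw                  # shapes now match w exactly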

How to normalize images in PyTorch

import torch
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

transform = transforms.Compose([
    transforms.ToTensor()
])
trainset = torchvision.datasets.ImageFolder(root='C:/Users/beomseokpark/Desktop/CNN/train_data', transform=transform)
data_loader = DataLoader(dataset=trainset, batch_size=8, shuffle=True, num_workers=2)
with torch.no_grad():
    for num, data in enumerate(trainset):
        imgs, label = data
I loaded images with ImageFolder from the torchvision library. How can I get the mean and std of each channel of my images?
Can anyone please help me out?
There's the "lazy man" approach: you can simply plug an nn.BatchNorm2d as the very first layer of your network. With the appropriate momentum and track_running_stats=True, this layer will estimate your data's mean and variance for you.
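A minimal sketch of that approach (the layers after the BatchNorm are placeholders, not from the original post); with momentum=None the layer keeps a cumulative average of the batch statistics:
import torch.nn as nn

model = nn.Sequential(
    # estimates the per-channel running mean/var of the input data;
    # momentum=None -> cumulative (equally weighted) running averages
    nn.BatchNorm2d(num_features=3, momentum=None, track_running_stats=True),
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # placeholder for the rest of the net
    nn.ReLU(),
)
After training, model[0].running_mean and model[0].running_var hold the estimates.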
Alternatively, you can compute the per-channel mean and variance with a single pass over the data loader:
mu = torch.zeros((3,), dtype=torch.float)
sig = torch.zeros((3,), dtype=torch.float)
n = 0
with torch.no_grad():
    for num, data in enumerate(data_loader):
        imgs, _ = data  # imgs has shape (batch, 3, H, W)
        mu += torch.sum(imgs, dim=(0, 2, 3))
        sig += torch.sum(imgs ** 2, dim=(0, 2, 3))
        n += imgs.numel() // imgs.shape[1]  # pixels per channel in this batch
n = float(n)
mu = mu / n               # per-channel mean: E[x]
sig = sig / n - mu ** 2   # per-channel variance: E[x^2] - E[x]^2
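Once mu and sig are computed, one way to apply them is through transforms.Normalize; note it expects a standard deviation, so take the square root of the variance computed above (a sketch):
transform = transforms.Compose([
    transforms.ToTensor(),
    # Normalize takes per-channel mean and std (not variance)
    transforms.Normalize(mean=mu.tolist(), std=sig.sqrt().tolist()),
])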
A second option: permute the channel axis to the front, flatten the rest, and reduce per channel:
import torch as t

batch_size = 8
imgs = t.empty(batch_size, 3, 128, 128).normal_()
# (3, B, H, W) -> (3, B*H*W), then reduce over dim 1 for per-channel stats
t.nn.Flatten(start_dim=1)(imgs.permute(1, 0, 2, 3)).mean(dim=1)
t.nn.Flatten(start_dim=1)(imgs.permute(1, 0, 2, 3)).std(dim=1).shape
# torch.Size([3])

classifying integer data by tensorflow

I want to classify integer inputs:
if the input is under 200, the output should be (0, 1),
and if the input is over 200, the output should be (1, 0).
The input values are sequential integers and the network has 5 layers.
The hidden layers use sigmoid and the last layer uses the softmax function.
The loss is a reduce_mean cross-entropy and training uses gradient descent.
import numpy as np
import tensorflow as tf

def set_x_data():
    x_data = np.array([[50], [60], [70], [80], [90], [110], [120], [130],
                       [140], [150], [160], [170], [180], [190], [200],
                       [210], [220], [230], [240], [250], [260], [270],
                       [280], [290], [300], [310], [320], [330], [340],
                       [350], [360], [370], [380], [390]])
    return x_data

def set_y_data(x):
    # 16 rows of (0, 1) followed by 18 rows of (1, 0), matching the original listing
    y_data = np.array([[0, 1]] * 16 + [[1, 0]] * 18)
    return y_data

def set_bias(efficiency):
    arr = np.array([efficiency])
    return arr

W1 = tf.Variable(tf.random_normal([1, 5]), name='weight1')
W2 = tf.Variable(tf.random_normal([5, 5]), name='weight2')
W3 = tf.Variable(tf.random_normal([5, 5]), name='weight3')
W4 = tf.Variable(tf.random_normal([5, 5]), name='weight4')
W5 = tf.Variable(tf.random_normal([5, 2]), name='weight5')

def inference(input, b):
    hidden_layer1 = tf.sigmoid(tf.matmul(input, W1) + b)
    hidden_layer2 = tf.sigmoid(tf.matmul(hidden_layer1, W2) + b)
    hidden_layer3 = tf.sigmoid(tf.matmul(hidden_layer2, W3) + b)
    hidden_layer4 = tf.sigmoid(tf.matmul(hidden_layer3, W4) + b)
    out_layer = tf.nn.softmax(tf.matmul(hidden_layer4, W5) + b)
    return out_layer

def loss(hypothesis, y):
    cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(hypothesis), reduction_indices=[1]))
    return cross_entropy

def train(loss):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
    train = optimizer.minimize(loss)
    return train

x_data = set_x_data()
y_data = set_y_data(0)
b_data = set_bias(0.8)

x = tf.placeholder(tf.float32, shape=[None, 1])
y = tf.placeholder(tf.float32, shape=[None, 2])
b = tf.placeholder(tf.float32, shape=[None])
hypothesis = inference(x, b)
loss = loss(hypothesis, y)
train = train(loss)

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
print(sess.run(W1))
for step in range(2000):
    sess.run(train, feed_dict={x: x_data, y: y_data, b: b_data})
print(sess.run(W1))
print(sess.run(hypothesis, feed_dict={x: np.array([[1000]]), b: b_data}))
When I print W1 before and after training, the values barely change, and when I test with input = 1000 the result isn't what I expect: I think it should be close to (1, 0), but it is almost (0.5, 0.5).
I guess the mistake comes from the loss function, because it was copied together from here and there, but I can't be sure about it.
The code above is a simplified version of my real code, which I think I also have to show; it is too long, so I created a new post for it:
classifying data by tensorflow but accuracy value didn't change
There are a few issues in the training of the above network, but with a few changes you can achieve a network that learns the desired decision function.
(The plot in the link shows the score of class 2, i.e. the probability that x > 200.)
The list of issues subject to improvement in this network:
The training data is very scarce (only 34 points!). This is typically too small, especially for a 5-layer network as in your case; you typically want many more input samples than parameters in the network. Try adding more input values and reducing the number of layers (as in the code below; I've used floats instead of integers to get more points, but it is still compatible).
The input ranges typically require scaling (below I've tried a super-simple scaling by dividing by a constant). This is because you typically want to avoid high ranges of variables (especially if you pass through many layers with a saturating non-linearity like sigmoid or softmax, which destroys the information contained in very high or very low values). In more advanced cases you might want to do Min-Max scaling or z-scores.
Try more epochs, and try plotting the evolution of the loss-function value (a plotting sketch follows the code below). With the given number of epochs, the optimization of the loss had not converged; below I run 10x more epochs, and the loss now almost converges (2000 epochs were not enough).
Something that helped was shuffling the (x, y) data. Though this is not crucial in this case, it converges faster (see the paper "Efficient BackProp" by LeCun), and in more serious examples it is typically needed.
Importantly, I think you want b to be a parameter, not a constant, don't you? The bias of a network is typically optimized together with the multiplicative weights. (Also, it is not common to use a single bias shared across all the hidden layers.)
Below is the code. Note there might be further improvements but these few tricks end up with the desired decision function.
I've added some inline comments to indicate changes with respect to the original. I hope you find these pieces of advice insightful!
The code:
import numpy as np
import tensorflow as tf

# I've modified the functions set_x_data and set_y_data
# so as to generate a larger set of numbers.

# Generate a range of numbers from 50 to 390
def set_x_data():
    x_data = np.arange(50, 390, 0.1)
    return x_data[:, None]

# Assign labels depending on x_data
def set_y_data(x_data):
    ydata1 = x_data >= 200
    ydata2 = x_data < 200
    return np.hstack((ydata1, ydata2))

def set_bias(efficiency):
    arr = np.array([efficiency])
    return arr

# Let's keep W1 and W5 (one hidden layer only)
# BTW, in this problem you could do with 0 hidden layers. But keeping
# 1 to show it works
W1 = tf.Variable(tf.random_normal([1, 5]), name='weight1')
W5 = tf.Variable(tf.random_normal([5, 2]), name='weight5')

# BTW, b should be a parameter, too.
b = tf.Variable(tf.constant(0.0))

# Just keeping 1 hidden layer
def inference(input):
    hidden_layer1 = tf.sigmoid(tf.matmul(input, W1) + b)
    out_layer = tf.nn.softmax(tf.matmul(hidden_layer1, W5) + b)
    return out_layer

# This is unchanged
def loss(hypothesis, y):
    cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(hypothesis), reduction_indices=[1]))
    return cross_entropy

# This is unchanged
def train(loss):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
    train = optimizer.minimize(loss)
    return train

# Using SCALE to normalize the input variables (range of inputs too big)
# This is a simple normalization in this case. Other examples are
# Min-Max normalization or z-scores.
SCALE = 1000
x_data = set_x_data()
y_data = set_y_data(x_data)
x_data /= SCALE

# Now the only placeholders are x and y (b is a parameter)
x = tf.placeholder(tf.float32, shape=[None, 1])
y = tf.placeholder(tf.float32, shape=[None, 2])
hypothesis = inference(x)
loss = loss(hypothesis, y)
train = train(loss)

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
print(sess.run(W1))

# Epochs x 10, it did not converge with fewer epochs
epochs = 20000
losses = np.zeros(epochs)
for step in range(epochs):
    # Shuffle data
    r = np.random.permutation(x_data.shape[0])
    x_data = x_data[r]
    y_data = y_data[r, :]
    # Small modification here to capture the loss.
    _, l = sess.run([train, loss], feed_dict={x: x_data, y: y_data})
    losses[step] = l
print(sess.run(W1))
print(sess.run(b))
The code to display the decision function above:
%matplotlib inline
import matplotlib.pyplot as plt
ystar = np.arange(50, 400, 10)[:,None]
plt.plot(ystar, sess.run(hypothesis, feed_dict={x:ystar/SCALE})[:,0])
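Since one of the suggestions above was to plot the evolution of the loss, here is a small sketch using the losses array captured in the training loop:
import matplotlib.pyplot as plt

plt.figure()
plt.plot(losses)                  # one cross-entropy value per epoch
plt.xlabel('epoch')
plt.ylabel('cross-entropy loss')
plt.show()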

How to convert a softmax output to one-hot format in customized Keras loss

In a 2D semantic segmentation task, I want to calculate an average Dice coefficient for each category in a customized Keras loss function.
So I think the first step is to calculate the Dice coefficient for each category, then average the coefficients to get avg_dice.
Now my loss function looks like:
def avg_dice_coef(y_true, y_pred, n_classes, smooth=1e-5):
    # y_pred_new = K.variable(np_utils.to_categorical(K.argmax(y_pred), num_classes=OPTIONS.nb_classes))
    avg_dice = 0.  # accumulates the Dice coefficient of each class, averaged at the end
    for class_index in range(n_classes):  # loop over each class
        intersection = K.sum(y_true[:, :, :, class_index] * y_pred[:, :, :, class_index], axis=[1, 2])
        union = K.sum(y_true[:, :, :, class_index], axis=[1, 2]) + K.sum(y_pred[:, :, :, class_index], axis=[1, 2])
        dice_one_class = K.mean((2. * intersection + smooth) / (union + smooth), axis=0)
        avg_dice += dice_one_class
    return avg_dice / n_classes  # average over classes
In this function, y_pred is the output of the network after softmax; labels_shape=(batch_size, 1024, 512, n_classes) and predicts_shape=(batch_size, 1024, 512, n_classes).
I think my loss is wrong because I use the float y_pred. According to the Dice equation, dice = 2|A ∩ B| / (|A| + |B|),
I think I should use integer 0-or-1 y_pred values instead of floats. So I need to 1) use K.argmax() to get the index of the max value for each pixel, and 2) convert the result of K.argmax() to one-hot format (a simple example: convert [0.1, 0.1, 0.8] to [0, 0, 1]).
But when I add
y_pred_new = K.variable(np_utils.to_categorical(K.argmax(y_pred), num_classes=OPTIONS.nb_classes))
to achieve this goal, I got an error:
ValueError: setting an array element with a sequence.
How can I fix my loss, and is my idea of averaging right?
In my opinion, np_utils.to_categorical() expects a NumPy array, but here it receives a tensor.
I ran into this problem too; changing np_utils.to_categorical() to tf.one_hot made it work.
Hope this helps :D
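A minimal sketch of that substitution, using a hypothetical helper (n_classes corresponds to OPTIONS.nb_classes in the question):
import tensorflow as tf
from keras import backend as K

def one_hot_predictions(y_pred, n_classes):
    # tf.one_hot accepts tensors directly, unlike np_utils.to_categorical,
    # which expects a plain NumPy array and fails on symbolic tensors
    return tf.one_hot(K.argmax(y_pred, axis=-1), depth=n_classes)

# usage inside avg_dice_coef:
# y_pred_new = one_hot_predictions(y_pred, n_classes)
Keep in mind that argmax has no gradient, so the hard one-hot version suits a monitoring metric better than a training loss; for training, the soft (float) y_pred form of the Dice loss is the usual choice.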
