Shape of target and predictions tensors in PyTorch loss functions - python

I am confused about the input shapes of the tensors in nn.CrossEntropyLoss.
I am trying to implement a simple autoencoder for text sequences. The core of my problem can be illustrated by the following code:
import torch
import torch.nn as nn

predictions = torch.rand(2, 3, 4)
target = torch.rand(2, 3)
print(predictions.shape)
print(target.shape)
nn.CrossEntropyLoss(predictions.transpose(1, 2), target)
In my case predictions has the shape (time_step, batch_size, vocabulary_size), while target has the shape (time_step, batch_size). Next I transpose the predictions, as per the documentation, which says that the second dimension of predictions should be the number of classes (vocabulary_size in my case). The code returns the error RuntimeError: bool value of Tensor with more than one value is ambiguous. Could someone please enlighten me on how to use the damn thing? Thank you in advance!

You are not calling the loss function; you are constructing it. The signature of the nn.CrossEntropyLoss constructor is:
nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')
You are passing the predictions as weight and the target as size_average, where weight is an optional rescaling weight given to each class and size_average is deprecated but expects a boolean. The target is a tensor of size [2, 3], which cannot be converted to a boolean, hence the error.
You need to create the loss function first; since you are not using any of the optional constructor parameters, you don't specify any of them.
# Create the loss function
cross_entropy = nn.CrossEntropyLoss()
# Call it to calculate the loss for your data
loss = cross_entropy(predictions.transpose(1, 2), target)
Alternatively, you can directly use the functional version nn.functional.cross_entropy:
import torch.nn.functional as F
loss = F.cross_entropy(predictions.transpose(1, 2), target)
The advantage of the class version, compared to the functional version, is that you only need to specify the extra parameters once (such as the weight) instead of having to supply them manually each time.
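For instance (a minimal sketch; the class weights here are made up for illustration):

import torch
import torch.nn as nn

# Hypothetical per-class weights for a 4-word vocabulary; values are made up
class_weights = torch.tensor([1.0, 2.0, 0.5, 1.0])
cross_entropy = nn.CrossEntropyLoss(weight=class_weights)

# Every later call reuses the same weighting without repeating it
loss1 = cross_entropy(torch.rand(2, 4), torch.randint(4, (2,)))
loss2 = cross_entropy(torch.rand(5, 4), torch.randint(4, (5,)))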
Regarding the dimensions of the tensors, the batch size must be the first dimension, because the losses are averaged per element in the batch, giving a tensor of losses with size [batch_size]. If you used reduction="none", you would get back these per-element losses, but by default (reduction="mean") the mean of these losses is returned. That result would differ if the mean were taken across time steps rather than across the batch.
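A quick demonstration of that reduction behaviour, with made-up shapes:

import torch
import torch.nn as nn

logits = torch.rand(3, 5)                 # [batch_size, num_classes]
labels = torch.randint(5, (3,))           # [batch_size]

per_sample = nn.CrossEntropyLoss(reduction="none")(logits, labels)
print(per_sample.shape)                   # torch.Size([3]): one loss per batch element

mean_loss = nn.CrossEntropyLoss()(logits, labels)     # default reduction="mean"
print(torch.isclose(mean_loss, per_sample.mean()))    # tensor(True)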
Lastly, the targets need to be the class indices, which means they need to have type torch.long, not torch.float. In this randomly chosen example, you could create the random classes with torch.randint:
predictions = torch.rand(2, 3, 4)
target = torch.randint(4, (2, 3))
# Reorder the dimensions
# From: [time_step, batch_size, vocabulary_size]
# To: [batch_size, vocabulary_size, time_step]
predictions = predictions.permute(1, 2, 0)
# From: [time_step, batch_size]
# To: [batch_size, time_step]
target = target.transpose(0, 1)
F.cross_entropy(predictions, target)

Related

How to resolve `RuntimeError: The size of tensor a (3) must match the size of tensor b (128) at non-singleton dimension 1` for SNN?

What is a loss function in PyTorch that will allow me to calculate the loss for a multi-target problem? I have three target variables. I saw a suggestion for BCEWithLogitsLoss() but it produces this error:
RuntimeError: The size of tensor a (3) must match the size of tensor b (128) at non-singleton dimension 1
I am working on a spiking neural network as well. The RuntimeError above is thrown at acc = np.mean((targets == idx).detach().cpu().numpy()). I don't actually think this is a matter of the loss function, but rather of a function I have that prints the batch accuracy:
def print_batch_accuracy(data, targets, train=False):
    output, _ = net(data.view(batch_size, -1))
    _, idx = output.sum(dim=0).max(1)
    print(targets)
    acc = np.mean((targets == idx).detach().cpu().numpy())
    if train:
        print(f"Train set accuracy for a single minibatch: {acc * 100:.2f}%")
    else:
        print(f"Test set accuracy for a single minibatch: {acc * 100:.2f}%")
The shape of my batch is torch.Size([25, 128, 3]) of type Float.
The error is a result of the accuracy printer function not being designed for multi-target classification. I am guessing your target tensor's first dimension corresponds to the total number of correct classes (3), whereas the function is expecting the first dim to be batch size (128).
The line _, idx = output.sum(dim=0).max(1) returns the single neuron with the largest number of spikes. This is then checked against targets (targets == idx), which implies accuracy is being measured for a single-target problem. This should be modified to check against all possible correct classes, as in the sketch below.
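One possible modification, sketched under the assumption that each sample has 3 correct classes stored as indices (this is a guess about your data layout, not code from the question):

import torch

# Hypothetical sketch: measure multi-target accuracy by taking the top-3
# spiking neurons and checking whether they appear among each sample's targets.
def multi_target_accuracy(spk_out, targets, k=3):
    # spk_out: [time_steps, batch_size, num_classes], targets: [batch_size, k]
    spike_counts = spk_out.sum(dim=0)            # [batch_size, num_classes]
    _, top_idx = spike_counts.topk(k, dim=1)     # [batch_size, k]
    # A prediction counts as a hit if it appears among that sample's targets
    hits = (top_idx.unsqueeze(2) == targets.unsqueeze(1)).any(dim=2)
    return hits.float().mean().item()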
As for your question about suitable loss functions, BCEWithLogitsLoss() could be applied to the accumulated output spikes and that would work well.
Alternatively, each output neuron may have a target spike count which is compared against the actual spike count using MSELoss().
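A hedged sketch of both options (the shapes follow the question's [25, 128, 3] batch; the multi-hot encoding of the 3 targets and the target spike count of 20 are assumptions):

import torch
import torch.nn as nn

spk_out = torch.rand(25, 128, 3)                   # [time_steps, batch_size, num_outputs]
multi_hot = torch.randint(0, 2, (128, 3)).float()  # 1 where a class is "on"

accumulated = spk_out.sum(dim=0)                   # [batch_size, num_outputs]
bce = nn.BCEWithLogitsLoss()(accumulated, multi_hot)

# Alternative: regress each neuron's spike count onto a target count
target_counts = multi_hot * 20.0                   # e.g. 20 spikes for correct classes
mse = nn.MSELoss()(accumulated, target_counts)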

Keras custom loss function - how to access actual truth values and predictions

I am working with time series forecasting with Keras LSTM. I take the last n_input_steps occurrences of the series and try to predict one step forward. For example, if my time series is [1, 2, 3, 4] and n_input_steps = 2, the supervised learning dataset would be:
[1,2]--> 3
[2,3]--> 4
Thus, the series to be forecast (y_true) would be [3,4].
Now I have a Keras model to predict this type of series:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

model = Sequential()
model.add(LSTM(neurons, activation='relu', input_shape=(n_steps_in, 1)))
model.add(RepeatVector(1))
model.add(LSTM(neurons, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='adam', loss=my_loss, run_eagerly=True)
hist = model.fit(trainX, trainY, epochs=epochs, verbose=2, validation_data=(testX, testY))
And my loss function is:
from tensorflow.keras import backend as kbe

def my_loss(y_true, y_pred):
    print(kbe.shape(y_true))
    y_true_c = kbe.cast(y_true, 'float32')
    y_pred_c = kbe.cast(y_pred, 'float32')
    ytn = y_true_c.numpy()
    print(ytn.shape)
    # Do some complex calculation requiring the elements of y_true_c and y_pred_c.
    # ...
    return result
In my poor understanding, if I call model.fit(trainX, trainY, ...) with trainX corresponding to [[1, 2], [2, 3]] (an array in the proper shape) and trainY corresponding to [3, 4], the y_true inside my_loss should be a tensor corresponding to [3, 4]. However, this is not what I am finding. The print output of my loss function (the shapes of the tensor and the array) is:
tf.Tensor([32 1 1], shape=(3,), dtype=int32)
(32, 1, 1)
regardless of the size of the input array. And if I print the values of the array, they bear no resemblance to the original values. Even if I remove all the layers of the model, keeping a bare Sequential, I get the same shapes. Therefore, I am completely lost.
Based on the comments above, I searched further and found the answer: there is a default batch size at work, as pointed out by Jorge Avila. The length of 32 is the default batch size used by Keras. The truth data and the predicted data come in batches of this size, so I should use batch_size=len(trainX) in the call to model.fit(). On top of that, the data comes in shuffled, which made it even more confusing, so I also have to pass shuffle=False to model.fit().
However, as pointed out by Jakub, even with these modifications my intended loss function will not work, because Keras requires symbolic derivatives of the function, which cannot be obtained from logic that needs the numpy values. So I have to start from scratch with another loss function acceptable to Keras.
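For reference, a minimal sketch of the shape such a loss has to take, assuming the complex calculation can be rewritten with symbolic tf ops (the mean-absolute-error body is only a placeholder):

import tensorflow as tf

def my_loss(y_true, y_pred):
    y_true_c = tf.cast(y_true, tf.float32)
    y_pred_c = tf.cast(y_pred, tf.float32)
    # Express the logic in symbolic TF ops so Keras can differentiate
    # through the loss; no .numpy() calls anywhere.
    return tf.reduce_mean(tf.abs(y_true_c - y_pred_c))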
Keep batch sizes at powers of two between 32 and 1024, depending on your data, as powers of two are the common convention; but you shouldn't have to use shuffle in fit, as TimeseriesGenerator is where the changes need to be made, not fit.

Tensorflow 2: How can I use the shape of tensor y_true in custom loss?

I pass a list a to my custom function and I want to tf.tile it after converting it to a constant tensor. The times I tile it depends on the shape of y_true. I don't know how I can get the shape of y_true as integers. Here's the code:
import tensorflow as tf

def getloss(a):
    a = tf.constant(a, tf.float32)
    def loss(y_true, y_pred):
        a = tf.reshape(a, [1, 1, -1])
        ytrue_shape = y_true.get_shape().as_list()  #????
        multiples = tf.constant([ytrue_shape[0], ytrue_shape[1], 1], tf.int32)
        a = tf.tile(a, multiples)
        #...
    return loss
I have tried y_true.get_shape().as_list() but it reports an error because the first dimension (batch size) is None when compiling the model. Is there any way I can use the shape of y_true here?
When trying to access the shape of a tensor while the model is being built, when not all shapes are known yet, it is best to use tf.shape. It will be evaluated when the model is run, as stated in the docs:
tf.shape and Tensor.shape should be identical in eager mode. Within tf.function or within a compat.v1 context, not all dimensions may be known until execution time. Hence when defining custom layers and models for graph mode, prefer the dynamic tf.shape(x) over the static x.shape.
ytrue_shape = tf.shape(y_true)
This will yield a tensor, so use TF ops to get what you want:
multiples = tf.concat([ytrue_shape[:2], [1]], axis=0)
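Putting it together, a sketch of the closure with these fixes applied (the final reduce_mean is only a placeholder for the real computation, and the reshape is moved outside loss so the inner function no longer shadows a):

import tensorflow as tf

def getloss(a):
    a = tf.reshape(tf.constant(a, tf.float32), [1, 1, -1])
    def loss(y_true, y_pred):
        ytrue_shape = tf.shape(y_true)                     # dynamic, known at run time
        multiples = tf.concat([ytrue_shape[:2], [1]], axis=0)
        tiled = tf.tile(a, multiples)                      # [batch, dim1, len(a)]
        # ... real loss computation using `tiled` goes here ...
        return tf.reduce_mean(tf.abs(y_pred - tiled))      # placeholder
    return loss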

matching PyTorch tensor dimensions

I am having some issues with regards to the dimensionality of my tensors in my training function at present. I am using the MNIST dataset, so 10 possible targets, and originally wrote the prototype code using a training batch size of 10, which was in retrospect not the wisest choice. It gave poor results during some earlier tests, and increasing the amount of training iterations saw no benefit. Upon trying to then increase the batch size, I realised that what I had written was not that general, and I was likely never training it on the proper data. Below is my training function:
def Train(tLoops, Lrate):
    for _ in range(tLoops):
        tempData = train_data.view(batch_size_train, 1, 1, -1)
        output = net(tempData)
        trainTarget = train_targets
        criterion = nn.MSELoss()
        print("target:", trainTarget.size())
        print("Output:", output.size())
        loss = criterion(output, trainTarget.float())
        # print("error is:", loss)
        net.zero_grad()  # zeroes the gradient buffers of all parameters
        loss.backward()
        for j in net.parameters():
            j.data.sub_(j.grad.data * Lrate)
The print statements there output
target: torch.Size([100])
Output: torch.Size([100, 1, 1, 10])
followed by this error on the line where the loss is calculated:
RuntimeError: The size of tensor a (10) must match the size of tensor b (100) at non-singleton dimension 3
The first print, target, is a 1-dimensional list of the respective ground-truth values for each image. Output contains the network's output for each of those 100 samples, so a 100 x 10 tensor; however, from reshaping the data from 28 x 28 to 1 x 784 earlier, I seem to have picked up unnecessary extra dimensions. Does PyTorch provide a way to remove these? I couldn't find anything in the documentation. Or is there something else that could be my issue?
There are several problems in your training script. I will address each of them below.
First, you should NOT do data batching by hand. PyTorch/torchvision have utilities for that; use a Dataset and a DataLoader: https://pytorch.org/tutorials/recipes/recipes/loading_data_recipe.html.
You should also NEVER update the parameters of your network by hand. Use an optimizer: https://pytorch.org/docs/stable/optim.html. In your case, plain SGD without momentum performs the same update as your manual loop.
Finally, the dimensionality of your input seems to be wrong: for MNIST, an input tensor should be (batch_size, 1, 28, 28), or (batch_size, 784) if you're training an MLP. Furthermore, the output of your network should be (batch_size, 10), as in the sketch below.
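A minimal sketch of that structure, assuming the net and Lrate from the question (nn.CrossEntropyLoss is swapped in here as a common choice for 10-class MNIST; it is not part of the original script):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# The DataLoader handles batching; no manual slicing of train_data needed
train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=100, shuffle=True)

optimizer = torch.optim.SGD(net.parameters(), lr=Lrate)  # replaces the manual update
criterion = nn.CrossEntropyLoss()

for data, targets in loader:
    output = net(data.view(data.size(0), -1))  # [batch_size, 784] in, [batch_size, 10] out
    loss = criterion(output, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()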

Why do the keras loss functions reduce the dimensionality by one?

When computing the loss between y_true and y_pred, the Keras loss functions reduce the dimensionality by one. For example, when training a network on pairs of 64x64 greyscale images with a batch size of 8, the shape of y_true and y_pred would be (8, 64, 64). The Keras loss functions will produce a loss tensor with shape (8, 64), averaging over the last dimension.
I do not get why that would be necessary, all it does is average the loss over the rows of the image. Doesn't the network need the loss to be calculated individually for every output value (and therefore conserve the shape)? As far as I understand it, backpropagation looks at the individual loss of each output value compared to the target, and then updates previous weights accordingly. How can it do that, just knowing the averaged loss of each row, not every value individually? Here is a code snippet that shows the behaviour I described:
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.losses import mean_absolute_error

y_true = K.random_uniform([8, 64, 64])
y_pred = K.random_uniform([8, 64, 64])
c = mean_absolute_error(y_true, y_pred)
print(K.eval(tf.shape(c)))  # (8, 64)
I wondered the same thing. I believe Keras assumes your data to have the dimensions [batch, W, H, n_classes], which means averaging over axis=-1 averages the loss over the different classes. In your case, however, you do not have that dimension, presumably because you are doing binary classification on a grayscale image, so it ends up averaging the loss over the rows/columns instead. Interestingly, the model can still train and even improve in performance like this, which makes me believe that people in similar situations often just train their models without ever noticing.
You can avoid this by adding a dummy axis to your data.
This is how I got there:
From: https://keras.io/api/losses/
"(Note ondN-1: all loss functions reduce by 1 dimension, usually axis=-1.) “
Furthermore: "loss class instances feature a reduction constructor argument, which defaults to "sum_over_batch_size" (i.e. average). Allowable values are "sum_over_batch_size", "sum", and "none":
• "sum_over_batch_size" means the loss instance will return the average of the per-sample losses in the batch.
• "sum" means the loss instance will return the sum of the per-sample losses in the batch.
• "none" means the loss instance will return the full array of per-sample losses. "
From https://www.tensorflow.org/api_docs/python/tf/keras/losses/Reduction
"Caution: Verify the shape of the outputs when using Reduction.NONE. The builtin loss functions wrapped by the loss classes reduce one dimension (axis=-1, or axis if specified by loss function). Reduction.NONE just means that no additional reduction is applied by the class wrapper. For categorical losses with an example input shape of [batch, W, H, n_classes] the n_classes dimension is reduced. For pointwise losses you must include a dummy axis so that [batch, W, H, 1] is reduced to [batch, W, H]. Without the dummy axis [batch, W, H] will be incorrectly reduced to [batch, W]."
