matching PyTorch tensor dimensions - python

I am having some issues with the dimensionality of the tensors in my training function. I am using the MNIST dataset, so there are 10 possible targets, and I originally wrote the prototype code with a training batch size of 10, which was in retrospect not the wisest choice. It gave poor results during some earlier tests, and increasing the number of training iterations brought no benefit. Upon trying to increase the batch size, I realised that what I had written was not very general, and that I was likely never training it on the proper data. Below is my training function:
def Train(tLoops, Lrate):
    for _ in range(tLoops):
        tempData = train_data.view(batch_size_train, 1, 1, -1)
        output = net(tempData)
        trainTarget = train_targets
        criterion = nn.MSELoss()
        print("target:", trainTarget.size())
        print("Output:", output.size())
        loss = criterion(output, trainTarget.float())
        # print("error is:", loss)
        net.zero_grad()  # zeroes the gradient buffers of all parameters
        loss.backward()
        for j in net.parameters():
            j.data.sub_(j.grad.data * Lrate)
The print statements there return
target: torch.Size([100])
Output: torch.Size([100, 1, 1, 10])
before the error message on the line where loss is calculated;
RuntimeError: The size of tensor a (10) must match the size of tensor b (100) at non-singleton dimension 3
The first print, target, is a 1-dimensional list of the respective ground-truth values for each image. Output contains the network's output for each of those 100 samples, so effectively a 100 x 10 tensor; however, from reshaping the data from 28 x 28 to 1 x 784 earlier, I seem to have picked up extra, unnecessary dimensions. Does PyTorch provide a way to remove these? I couldn't find anything in the documentation, or is there something else that could be my issue?

There are several problems in your training script. I will address each of them below.
First, you should NOT do data batching by hand. PyTorch/torchvision already provide functions for that: use a Dataset and a DataLoader: https://pytorch.org/tutorials/recipes/recipes/loading_data_recipe.html.
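A minimal sketch of what that could look like for MNIST (assuming torchvision is installed; the batch size and transform are illustrative):

    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    # MNIST wrapped in a DataLoader that does the batching and shuffling for you
    train_set = datasets.MNIST("./data", train=True, download=True,
                               transform=transforms.ToTensor())
    train_loader = DataLoader(train_set, batch_size=100, shuffle=True)

    for images, targets in train_loader:
        # images: (100, 1, 28, 28), targets: (100,)
        ...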
You should also NEVER update the parameters of your network by hand. Use an Optimizer: https://pytorch.org/docs/stable/optim.html. In your case, SGD without momentum will have the same effect.
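Roughly, the manual update loop could be replaced like this (a sketch; net, loss and Lrate are the names from your code):

    import torch.optim as optim

    optimizer = optim.SGD(net.parameters(), lr=Lrate)  # plain SGD, no momentum

    # inside the training loop:
    optimizer.zero_grad()   # instead of net.zero_grad()
    loss.backward()
    optimizer.step()        # replaces the manual j.data.sub_(j.grad.data * Lrate) loop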
The dimensionality of your input also seems to be wrong. For MNIST, an input tensor should be (batch_size, 1, 28, 28), or (batch_size, 784) if you're training an MLP. Furthermore, the output of your network should be (batch_size, 10).
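As for removing the extra size-1 dimensions you asked about, a plain view/reshape (or squeeze()) does it. A sketch, assuming images comes from a loader as above and net is an MLP:

    flat = images.view(images.size(0), -1)   # (batch_size, 784)
    output = net(flat)                       # should come out as (batch_size, 10)

    # squeeze() drops all size-1 dimensions, e.g. (100, 1, 1, 10) -> (100, 10)
    output = output.squeeze()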

Related

How to resolve `RuntimeError: The size of tensor a (3) must match the size of tensor b (128) at non-singleton dimension 1` for SNN?

What is a loss function in PyTorch that will allow me to calculate the loss for a multi-target problem? I have three target variables. I saw a suggestion for BCEWithLogitsLoss() but it produces this error:
RuntimeError: The size of tensor a (3) must match the size of tensor b (128) at non-singleton dimension 1
I am working on a spiking neural network as well. The RuntimeError above is thrown at acc = np.mean((targets == idx).detach().cpu().numpy()). I don't actually think this is a matter of the loss function, but rather of a function I have that prints the batch accuracy:
def print_batch_accuracy(data, targets, train=False):
    output, _ = net(data.view(batch_size, -1))
    _, idx = output.sum(dim=0).max(1)
    print(targets)
    acc = np.mean((targets == idx).detach().cpu().numpy())
    if train:
        print(f"Train set accuracy for a single minibatch: {acc * 100:.2f}%")
    else:
        print(f"Test set accuracy for a single minibatch: {acc * 100:.2f}%")
The shape of my batch is torch.Size([25, 128, 3]) of type Float.
The error is a result of the accuracy printer function not being designed for multi-target classification. I am guessing your target tensor's first dimension corresponds to the total number of correct classes (3), whereas the function is expecting the first dim to be batch size (128).
The line _, idx = output.sum(dim=0).max(1) returns the neuron with the largest number of spikes. This is then checked against targets (targets == idx), which implies accuracy is being measured for a single-target problem. It should be modified to check against all possible correct classes.
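One way to sketch that check for a 3-target problem (hedged: this assumes the recorded output has shape (num_steps, batch, num_classes) and that targets is a multi-hot (batch, num_classes) float tensor; both names are illustrative):

    spike_count = output.sum(dim=0)             # (batch, num_classes)
    _, topk_idx = spike_count.topk(3, dim=1)    # the 3 most active neurons per sample
    pred_multi_hot = torch.zeros_like(targets).scatter_(1, topk_idx, 1)
    acc = (pred_multi_hot == targets).float().mean().item()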
As for your question about suitable loss functions, BCEWithLogitsLoss() could be applied to the accumulated output spikes and that would work well.
Alternatively, each output neuron may have a target spike count which is compared against the actual spike count using MSELoss().
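A rough sketch of both options, under the same assumptions as above (spike_count and targets are the illustrative names from the previous snippet):

    # option 1: treat accumulated spikes as logits for a multi-label problem
    loss = nn.BCEWithLogitsLoss()(spike_count, targets.float())

    # option 2: regress each neuron's spike count toward a target count,
    # e.g. 8 spikes for correct classes and 1 for the rest (numbers are arbitrary)
    target_counts = targets.float() * 8.0 + (1 - targets.float()) * 1.0
    loss = nn.MSELoss()(spike_count, target_counts)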

How to throw away some samples within a mini-batch during training

I'm using AdamOptimizer to train a simple DNN network. (I'm using TF 1.4.)
I want to throw away some bad samples within a batch during training. Say I have 4096 samples in a batch, and I want to throw 96 samples away and only use the remaining 4000 to calculate the loss and do backpropagation.
How can I achieve this?
The code setup is very straightforward, as below:
labels = tf.reshape(labels, [batch_size, 1])
logits = tf.reshape(logits, [batch_size, 1])
loss_vector = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels,
                                                      logits=logits)
loss_scalar = tf.reduce_mean(loss_vector)
opt = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = opt.minimize(loss_scalar, global_step=global_step)
One possible solution is to do a mask operation after loss_vector and before reduce_mean (a sketch of that idea follows the questions below), but I'm not sure it's the right solution and I have some questions about what's going on under the hood:
In the minimize() operation, since the input parameter is a scalar, how will it know that some of the input samples are masked out?
In the minimize() operation, how many backpropagation passes will happen?
During training, samples are fed into the graph one by one, so how can TF know which should be kept and which should be thrown away?
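For reference, a hedged sketch of the masking idea (TF 1.x API; keep_mask is an illustrative boolean vector of shape [batch_size] marking the samples to keep):

    kept_losses = tf.boolean_mask(loss_vector, keep_mask)   # drop the unwanted rows
    loss_scalar = tf.reduce_mean(kept_losses)
    train_op = opt.minimize(loss_scalar, global_step=global_step)

    # minimize() only ever sees the reduced scalar, so the masked-out samples simply
    # contribute nothing to the gradient; there is still exactly one backward pass per batch.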

How to iterate through tensors in custom loss function?

I'm using Keras with the TensorFlow backend. My goal is to query the batch size of the current batch in a custom loss function. This is needed to compute values of the custom loss function which depend on the index of particular observations. I'd like to make this clearer with the minimal reproducible examples below.
(BTW: Of course I could use the batch size defined for the training procedure and plug in its value when defining the custom loss function, but there are some reasons why it can vary; in particular, if epochsize % batchsize (epoch size modulo batch size) is not zero, the last batch of an epoch has a different size. I didn't find a suitable approach on Stack Overflow, e.g. Tensor indexing in custom loss function, Tensorflow custom loss function in Keras - loop over tensor, and Looping over a tensor, because obviously the shape of a tensor can't be inferred when building the graph, which is the case for a loss function; shape inference is only possible when evaluating the graph on actual data. Hence I need to tell the custom loss function to do something with particular elements along a certain dimension without knowing the length of that dimension.)
(this is the same in all examples)
from keras.models import Sequential
from keras.layers import Dense, Activation
# Generate dummy data
import numpy as np
data = np.random.random((1000, 100))
labels = np.random.randint(2, size=(1000, 1))
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(1, activation='sigmoid'))
Example 1: nothing special, no issue, no custom loss
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
# Train the model, iterating on the data in batches of 32 samples
model.fit(data, labels, epochs=10, batch_size=32)
(Output omitted, this runs perfectly fine)
Example 2: nothing special, with a fairly simple custom loss
def custom_loss(yTrue, yPred):
    loss = np.abs(yTrue - yPred)
    return loss

model.compile(optimizer='rmsprop',
              loss=custom_loss,
              metrics=['accuracy'])
# Train the model, iterating on the data in batches of 32 samples
model.fit(data, labels, epochs=10, batch_size=32)
(Output omitted, this runs perfectly fine)
Example 3: the issue
def custom_loss(yTrue, yPred):
    print(yPred)  # Output: Tensor("dense_2/Sigmoid:0", shape=(?, 1), dtype=float32)
    n = yPred.shape[0]
    for i in range(n):  # TypeError: __index__ returned non-int (type NoneType)
        loss = np.abs(yTrue[i] - yPred[int(i/2)])
    return loss

model.compile(optimizer='rmsprop',
              loss=custom_loss,
              metrics=['accuracy'])
# Train the model, iterating on the data in batches of 32 samples
model.fit(data, labels, epochs=10, batch_size=32)
Of course the tensor has no shape info yet, since it can't be inferred when building the graph, only at training time. Hence for i in range(n) raises an error. Is there any way to do this?
BTW, here's my true custom loss function in case there are any questions. I skipped it above for clarity and simplicity.
def neg_log_likelihood(yTrue, yPred):
    yStatus = yTrue[:, 0]
    yTime = yTrue[:, 1]
    n = yTrue.shape[0]
    for i in range(n):
        s1 = K.greater_equal(yTime, yTime[i])
        s2 = K.exp(yPred[s1])
        s3 = K.sum(s2)
        logsum = K.log(s3)
        loss = K.sum(yStatus[i] * yPred[i] - logsum)
    return loss
Here's an image of the partial negative log-likelihood of the Cox proportional hazards model.
This is to clarify a question in the comments to avoid confusion. I don't think it is necessary to understand this in detail to answer the question.
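For reference, the standard textbook form of that quantity (my transcription, not the missing image; $\delta_i$ is the event status yStatus, $t_i$ the time yTime, and $\eta_i$ the model's predicted risk score yPred; the code above differs slightly in where $\delta_i$ multiplies) is roughly

$$-\ell = -\sum_{i=1}^{n} \delta_i \Bigl( \eta_i - \log \sum_{j:\, t_j \ge t_i} e^{\eta_j} \Bigr)$$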
As usual, don't loop. There are severe performance drawbacks and also bugs. Use only backend functions unless it is totally unavoidable (it usually isn't).
Solution for example 3:
So, there is a very weird thing there...
Do you really want to simply ignore half of your model's predictions? (Example 3)
Assuming this is true, just duplicate your tensor in the last dimension, flatten it, and discard half of it. You get exactly the effect you want.
def custom_loss(true, pred):
    n = K.shape(pred)[0:1]
    pred = K.concatenate([pred]*2, axis=-1)  # duplicate in the last axis
    pred = K.flatten(pred)                   # flatten
    pred = K.slice(pred,                     # take only half (= n samples)
                   K.constant([0], dtype="int32"),
                   n)
    return K.abs(true - pred)
Solution for your loss function:
If you have sorted times from greater to lower, just do a cumulative sum.
Warning: If you have one time per sample, you cannot train with mini-batches!!!
batch_size = len(labels)
It makes sense to have time in an additional dimension (many times per sample), as is done in recurrent and 1D conv networks. Anyway, considering your example as expressed, that is, shape (samples_equal_times,) for yTime:
def neg_log_likelihood(yTrue, yPred):
    yStatus = yTrue[:, 0]
    yTime = yTrue[:, 1]
    n = K.shape(yTrue)[0]

    # sort the times and everything else from greater to lower:
    # obs: you can have the data sorted already and avoid doing it here, for performance
    # important: yTime will be sorted in the last dimension; make sure it's (None,) in this case,
    # or (None, time_length) in the case of many times per sample
    sortedTime, sortedIndices = tf.math.top_k(yTime, n, True)
    sortedStatus = K.gather(yStatus, sortedIndices)
    sortedPreds = K.gather(yPred, sortedIndices)

    # do the calculations
    exp = K.exp(sortedPreds)
    sums = K.cumsum(exp)   # this will have the sum for j >= i in the loop
    logsums = K.log(sums)

    return K.sum(sortedStatus * sortedPreds - logsums)
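A quick note on wiring this up (a hedged sketch; status, time and x_train are illustrative names): yTrue has to carry [status, time] per sample, and, per the warning above, training is full-batch when there is one time per sample.

    y_train = np.stack([status, time], axis=-1)   # shape (samples, 2)
    model.compile(optimizer='rmsprop', loss=neg_log_likelihood)
    model.fit(x_train, y_train, batch_size=len(y_train), epochs=10)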

Using noise_shape of the Dropout layer. Batch_size does not fit into provided samples. What to do?

I am using a dropout layer in my model. As I use temporal data, I want the dropout mask to be the same for every timestep -> noise_shape = (batch_size, 1, features).
The problem is that if I use a batch size that does not divide the number of samples evenly, I get an error message. Example: batch_size = 2, samples = 7. In the last iteration, the batch_size (2) is larger than the remaining samples (1).
The other layers (in my case: Masking, Dense, and LSTM) apparently don't have a problem with that and just use a smaller batch for the last, leftover samples.
Concrete error (training data shape is [23, 300, 34], batch_size = 3):
InvalidArgumentError (see above for traceback): Incompatible shapes:
[2,300,34] vs. [3,1,34] [[Node: dropout_18/cond/dropout/mul =
Mul[T=DT_FLOAT,
_device="/job:localhost/replica:0/task:0/device:CPU:0"](dropout_18/cond/dropout/div,
dropout_18/cond/dropout/Floor)]]
Meaning that the noise shape [3,1,34] is incompatible with the last, smaller batch [2,300,34].
As I am still in the parameter tuning phase (does that ever stop :-) ),
Lookback(using LSTMs),
split-percentage of train/val/test,
and batchsize
will still constantly change. All of these influence the actual length and shape of the training data.
I could try to always find the next batch_size that fits by some calculation. For example, if batch_size=4 and samples=21, I could reduce batch_size to 3. But if the number of training samples is, e.g., prime, this again would not work. Also, if I choose 4, I probably want 4.
Am I overthinking this? Is there a simple solution without a lot of special-case programming?
Thank you
Thanks to nuric in this post, the answer is quite simple.
The current implementation does adjust the noise shape according to the runtime batch size. From the Dropout layer implementation code:
symbolic_shape = K.shape(inputs)
noise_shape = [symbolic_shape[axis] if shape is None else shape
               for axis, shape in enumerate(self.noise_shape)]
So if you give noise_shape=(None, 1, features), the shape will be (runtime_batchsize, 1, features), following the code above.
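In other words, a hedged sketch of how the layer could then be declared (features is the illustrative name for the last dimension of your input):

    from keras.layers import Dropout

    # None lets Keras fill in whatever the runtime batch size happens to be,
    # while the 1 keeps the same dropout mask for every timestep
    x = Dropout(0.2, noise_shape=(None, 1, features))(x)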

Keras RNN Regression Input dimensions and architecture

I have recently constructed a CNN in Keras (with TensorFlow as the backend) that takes stellar spectra as input and predicts three stellar parameters as outputs: Temperature, Surface Gravity, and Metallicity. I am now trying to create an RNN that does the same thing in order to compare the two models.
After searching through examples and forums I haven't come across many applications that are similar enough to my project. I have tried implementing a simple RNN to see if I can come up with sensible results, but no luck so far: the networks don't seem to be learning at all.
I could really use some guidance to get me started. Specifically:
Is an RNN an appropriate network for this type of problem?
What is the correct input shape for the model? I know this depends on the architecture of the network, so I guess that my next question would be: what is a simple architecture to start with that is capable of computing regression predictions?
My input data is such that I have m=50,000 spectra each with n=7000 data points, and L=3 output labels that I am trying to learn. I also have test sets and cross-validation sets with the same n & L dimensions.
When structuring my input data as (m,n,1) and my output targets as (m,L) and using the following architecture, the loss doesn't seem to decrease.
n = 7000
L = 3

## train_X.shape = (50000, n, 1)
## train_Y.shape = (50000, L)
## cv_X.shape = (10000, n, 1)
## cv_Y.shape = (10000, L)

batch_size = 32
lstm_layers = [16, 32]
input_shape = (None, n, 1)

model = Sequential([
    InputLayer(batch_input_shape=input_shape),
    LSTM(lstm_layers[0], return_sequences=True, dropout_W=0.2, dropout_U=0.2),
    LSTM(lstm_layers[1], return_sequences=False),
    Dense(L),
    Activation('linear')
])

model.compile(loss='mean_squared_error',
              optimizer='adam',
              metrics=['accuracy'])

model.fit(train_X, train_Y, batch_size=batch_size, nb_epoch=20,
          validation_data=(cv_X, cv_Y), verbose=2)
I have also tried changing my input shape to (m, 1, n) and still haven't had any success. I am not looking for an optimal network, just something that trains so I can take it from there. My input data isn't a time series, but there are relationships between one section of the spectrum and the previous section, so is there a way I can structure each spectrum into a 2D array that will allow an RNN to learn stellar parameters from the spectra?
First you set
train_X.shape = (50000, n, 1)
Then you write
input_shape = (None, 1, n)
Why don't you try
input_shape = (None, n, 1) ?
It would make more sense for your RNN to receive a sequence of n timesteps and 1 value per time step than the other way around.
Does it help? :)
EDIT:
Alright, after re-reading, here are my 2 cents on your questions: the LSTM isn't a good idea.
1) Because there is no "temporal" information, there isn't a "direction" in the spectrum. The LSTM is good at capturing a changing state of the world, for example. It's not the best at combining information at the beginning of your spectrum with information at the end. It will "read" starting from the beginning, and that information will fade as the state is updated. You could try a bidirectional LSTM to counter the fact that there is "no direction". However, see the second point.
2) 7000 timesteps is way too many for an LSTM to work with. When it trains, at the backpropagation step, the LSTM is unrolled and the information has to go through "7000 layers" (not literally 7000, since they share the same weights). This is very, very difficult to train. I would limit an LSTM to 100 steps at most (from my experience).
Otherwise your input shape is correct :)
Have you tried a deep fully connected network?! I believe this would be more efficient.
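A hedged sketch of what such a baseline might look like (layer sizes are arbitrary; the spectra would be fed in flat as (m, n) rather than (m, n, 1)):

    from keras.models import Sequential
    from keras.layers import Dense

    model = Sequential([
        Dense(256, activation='relu', input_shape=(n,)),  # flat spectrum of n points
        Dense(64, activation='relu'),
        Dense(L)                                          # linear output for the 3 stellar parameters
    ])
    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(train_X.reshape(-1, n), train_Y, batch_size=32, epochs=20,
              validation_data=(cv_X.reshape(-1, n), cv_Y), verbose=2)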
