I want to use TensorFlow's tf.losses.compute_weighted_loss but cannot find any good example. I have a multi-class classification problem and use tf.nn.sigmoid_cross_entropy_with_logits as the loss. Now I want to weight the errors for each label independently. Let's say I have n labels; that means I need an n-sized weight vector. Unfortunately, tf expects me to pass a (b, n) shaped matrix of error weights, where b is the batch size. So basically I would need to repeat the weight vector b times. That's okay given a fixed batch size, but if my batch size is variable (e.g. a smaller batch at the end of the dataset) I have to adapt. Is there a way around this, or did I miss something?
I just had to reshape the vector from (n,) to (1,n) to make the broadcasting possible:
error_weights = error_weights.reshape(1, error_weights.shape[0])
Adding to the existing answer, use tf.expand_dims if error_weights is a Tensor.
error_weights = tf.expand_dims(error_weights, 0) # changes shape [n] to [1, n]
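For completeness, here is a minimal sketch (TF 1.x, with a hypothetical n = 3 labels) of how the (1, n) weight vector broadcasts over a variable-sized batch:
import tensorflow as tf

logits = tf.placeholder(tf.float32, [None, 3])   # batch dimension left as None
targets = tf.placeholder(tf.float32, [None, 3])

# one weight per label, shaped (1, n) so it broadcasts across the batch
error_weights = tf.constant([[1.0, 2.0, 0.5]])

per_element = tf.nn.sigmoid_cross_entropy_with_logits(labels=targets, logits=logits)  # shape (b, n)
loss = tf.losses.compute_weighted_loss(per_element, weights=error_weights)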
For more robustness of my model I want to normalize my feature tensor.
I tried doing it the way that is, to the best of my knowledge, standard for images:
import torch
from torchvision import transforms

class Dataset(torch.utils.data.Dataset):
    'Characterizes a dataset for PyTorch'
    def __init__(self, input_tensor, transform=transforms.Normalize(mean=0.5, std=0.5)):
        self.labels = input_tensor[:, :, -1]
        self.features = input_tensor[:, :, :-1]
        self.transform = transform

    def __len__(self):
        return self.labels.shape[0]

    def __getitem__(self, index):
        # Load data and get label
        X = self.features[index]
        y = self.labels[index]
        if self.transform:
            X = self.transform(X)
        return X, y
But receive this error message:
ValueError: Expected tensor to be a tensor image of size (C, H, W). Got tensor.size() = torch.Size([8, 25]).
Everywhere I looked, people suggest using .view to generate the third dimension in order to comply with the standard shape of images, but this seems very odd to me. Is there maybe a cleaner way to do this?
Also, where should I best place the normalization: just for the batch, or for the entire training dataset?
You are asking two different questions, I will try to answer both.
Indeed, you should first reshape to (c, h, w) where c is the channel dimension. In most cases, you will need that extra dimension because most 'image' layers are built to receive 3-dimensional tensors (not counting the batch dimension), such as nn.Conv2d, nn.BatchNorm2d, etc. I don't believe there's any way around it, and avoiding it would restrict you to single-channel image data.
You can reshape to the desired shape with torch.reshape or Tensor.view:
X = X.reshape(1, *X.shape)
Or by adding an additional dimension using torch.unsqueeze:
X.unsqueeze(0)
About normalization. Batch-normalization and dataset-normalization are two different approaches.
The former is a technique that can achieve improved performance in convolutional networks. This kind of operation can be implemented using a nn.BatchNorm2d layer and is done using learnable parameters: a scale factor (~ std) and a bias (~ mean). This type of normalization is applied when the model is called and is applied per-batch.
The latter is a pre-processing technique which allows making different features have the same scale. This normalization can be applied inside the dataset per-element. It requires you to measure the mean and standard deviation of your training set.
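A minimal sketch of that pre-processing approach, assuming a feature tensor shaped (num_samples, 8, 25) as in the question (the variable names are hypothetical):
import torch

train_features = torch.randn(1000, 8, 25)   # hypothetical training features

# measure statistics on the training set only
mean = train_features.mean(dim=(0, 1), keepdim=True)
std = train_features.std(dim=(0, 1), keepdim=True)

# apply the same statistics to every element (train and validation/test alike)
normalized = (train_features - mean) / (std + 1e-8)   # epsilon guards against zero std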
I am learning about neural networks.
Here is the complete piece of code:
https://github.com/udacity/deep-learning-v2-pytorch/blob/master/intro-to-pytorch/Part%201%20-%20Tensors%20in%20PyTorch%20(Exercises).ipynb
When I transpose features, I get the following output:
import torch

def activation(x):
    return 1/(1+torch.exp(-x))

### Generate some data
torch.manual_seed(7) # Set the random seed so things are predictable

# Features are 5 random normal variables
features = torch.randn((1, 5))
# True weights for our data, random normal variables again
weights = torch.randn_like(features)
# and a true bias term
bias = torch.randn((1, 1))

product = features.t() * weights + bias
output = activation(product.sum())
tensor(0.9897)
However, if I transpose weights, I get a different output:
weights_prime = weights.view(5,1)
prod = torch.mm(features, weights_prime) + bias
y_hat = activation(prod.sum())
tensor(0.1595)
Why does this happen?
Update
I took a look at the solution:
https://github.com/udacity/deep-learning-v2-pytorch/blob/master/intro-to-pytorch/Part%201%20-%20Tensors%20in%20PyTorch%20(Solution).ipynb
And I saw this:
y = activation((features * weights).sum() + bias)
Why can a matrix features (1,5) multiply another matrix weights (1,5) without transposing weights first?
Update 2
After reading several posts, I realized that
matrixA * matrixB is different from torch.mm(matrixA,matrixB) and torch.matmul(matrixA,matrixB).
Could someone confirm my three understandings below?
So the * means element-wise multiplication, whereas torch.mm() and torch.matmul() are matrix-wise multiplication.
differences between torch.mm() and torch.matmul(): mm() is used specifically for 2 dimensions matrix, whereas matmul() can be used for more complicated cases.
In the neural network for the Udacity coding exercise mentioned in my link above, it needs element-wise multiplication.
Update 3
Just to bring in the video screenshot for anyone who has the same confusion:
And here is the video link: https://www.youtube.com/watch?time_continue=98&v=6Z7WntXays8&feature=emb_logo
Looking at https://pytorch.org/docs/master/generated/torch.nn.Linear.html
The typical linear (fully connected) layer in torch uses input features of shape (N, *, in_features) and weights of shape (out_features, in_features) to produce an output of shape (N, *, out_features). Here N is the batch size, and * is any number of other dimensions (may be none).
The implementation is:
output = input.matmul(weight.t())
So, the answer is that neither of your formulas is correct according to convention; the standard formula is the one above.
You may use a non-standard shape since you're implementing things from scratch; as long as it's consistent it may work, but I don't recommend it for learning. It's unclear what 1 and 5 are in your code, but presumably you want 5 input features and one output feature, with a batch size of 1 as well. In that case the standard shapes would be input = torch.randn((1, 5)) for batch_size=1 and in_features=5, and weights = torch.randn((1, 5)) for out_features=1 and in_features=5, following the (out_features, in_features) layout above.
There is no reason why weights should ever be the same shape as features; thus weights = torch.randn_like(features) doesn't make sense.
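To make that convention concrete, here is a small sketch with batch size 1, in_features=5 and out_features=1 (shapes chosen to mirror the question):
import torch

inp = torch.randn(1, 5)      # (N, in_features)
weight = torch.randn(1, 5)   # (out_features, in_features), as nn.Linear stores it
bias = torch.randn(1)

out = inp.matmul(weight.t()) + bias   # shape (1, 1) == (N, out_features)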
Lastly, for your actual questions:
"Should I transpose features or weights in Neural network?" - in torch convention, you should transpose weights, but use matmul with the features first. Other frameworks may have a different convention; as long as in_features dimension of the weights is multiplied by the num_features dimension of the input, it would work.
"Why does this happen?" - these are two completely different calculations; there is no reason to think they would produce the same result.
"So the * means element-wise multiplication, whereas torch.mm() and torch.matmul() are matrix-wise multiplication." - Yes; mm is matrix-matrix only, matmul is vector-matrix or matrix-matrix, including batched versions of same - check the docs for everything matmul can do (which is kinda a lot).
"differences between torch.mm() and torch.matmul(): mm() is used specifically for 2 dimensions matrix, whereas matmul() can be used for more complicated cases." - Yes; the big difference is that matmul can broadcast. Use it when you specifically intend that; use mm to prevent unintentional broadcasting.
"In Neutral Network for this Udacity coding exercise mentioned in my above link, it needs element-wise multiplication." - I doubt it; it's probably an error in the Udacity code. This bit of code weights = torch.randn_like(features) looks like an error in any case; the dimensions of weights have a meaning different from the dimensions of features.
This line is taking an outer product between the two vectors.
product = features.t() * weights + bias
The resulting shape is 5x5.
If you change this to a dot product, then output will match y_hat.
product = torch.mm(weights, features.t()) + bias
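A small sketch contrasting the two shapes (same seed as the question, so the tensors match):
import torch

torch.manual_seed(7)
features = torch.randn((1, 5))
weights = torch.randn_like(features)
bias = torch.randn((1, 1))

outer = features.t() * weights                 # shape (5, 5): broadcasting produces an outer product
dot = torch.mm(weights, features.t()) + bias   # shape (1, 1): the dot product, same value as torch.mm(features, weights.view(5, 1)) + bias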
I want to train a network with planar curves, which I represent as numpy arrays with shape (L,2).
The number 2 stands for the x,y coordinates and L is the number of points, which varies across my dataset. I treat x,y as 2 different "channels".
I implemented a function, next_batch(batch_size), that provides the next batch as a 1D numpy array with shape (batch_size,), containing elements which are 2D arrays with shape: (L,2). These are my curves, and as mentioned before, L is different between the elements. (I didn't want to confine to fixed number of points in the curve).
My question:
How can I manipulate the output from next_batch() so I will be able to feed the network with the input curves, using a scheme similar to what appears in the TensorFlow tutorial: https://www.tensorflow.org/get_started/mnist/pros
i.e, using the feed_dict mechanism.
In the given tutorial the input size was fixed; in the tutorial's code line:
train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
batch[0] has a fixed shape: (50, 784) (50 = #samples, 784 = #pixels)
I cannot transform my input into a numpy array with shape (batch_size, L, 2) since the array must have a fixed size in every dimension.
So what can I do?
I already defined a placeholder (that can have unknown size):
#first dimension is the sample dim, second is curve length, third:x,y coordinates
x = tf.placeholder(tf.float32, [None, None,2])
but how can I feed it properly?
Short answer that you're probably looking for: you can't without padding or grouping samples by length.
To elaborate a bit: in tensorflow, dimensions must be fixed throughout a batch, and jagged arrays are not natively supported.
Dimensions may be unknown a priori (in which case you set the placeholders' dimensions to None) but are still inferred at runtime, so
your solution of having a placeholder:
x = tf.placeholder(tf.float32, [None, None, 2])
couldn't work because it's semantically equivalent to saying "I don't know the constant length of the curves in a batch a priori, infer it at runtime from the data".
This is not to say that your model in general can't accept inputs of different dimensions, if you structure it accordingly, but the data that you feed it each time you call sess.run() must have fixed dimensions.
Your options, then, are as follows:
Pad your batches along the second dimension.
Say you have 2 curves of shape (4, 2) and (5, 2) and you know the maximum curve length in your dataset is 6; you could use np.pad as follows:
In [1]: max_len = 6
...: curve1 = np.random.rand(4, 2)
...: curve2 = np.random.rand(5, 2)
...: batch = [curve1, curve2]
In [2]: for b in batch:
   ...:     dim_difference = max_len - b.shape[0]
   ...:     print(np.pad(b, [(0, dim_difference), (0, 0)], 'constant'))
   ...:
[[ 0.92870128 0.12910409]
[ 0.41894655 0.59203704]
[ 0.3007023 0.52024492]
[ 0.47086336 0.72839691]
[ 0. 0. ]
[ 0. 0. ]]
[[ 0.71349902 0.0967278 ]
[ 0.5429274 0.19889411]
[ 0.69114597 0.28624011]
[ 0.43886002 0.54228625]
[ 0.46894651 0.92786989]
[ 0. 0. ]]
Have your next_batch() function return batches of curves grouped by length.
These are the standard ways of doing things when dealing with jagged arrays.
Another possibility, if your task allows for it, is to concatenate all your points in a single tensor of shape (None, 2) and change your model to operate on single points as if they were samples in a batch. If you save the original sample lengths in a separate array, you can then restore the model outputs by slicing them correctly. This is highly inefficient and requires all sorts of assumptions on your problem, but it's a possibility.
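As a rough sketch of that last idea (numpy only; the curve data here is made up), you would feed a flat (sum_of_lengths, 2) array to a [None, 2] placeholder and keep the original lengths around to slice the outputs back apart:
import numpy as np

curves = [np.random.rand(4, 2), np.random.rand(7, 2), np.random.rand(5, 2)]

lengths = np.array([c.shape[0] for c in curves])   # original sample lengths
flat = np.concatenate(curves, axis=0)              # shape (sum(lengths), 2)

# Restore per-curve pieces by slicing with the cumulative lengths
offsets = np.concatenate([[0], np.cumsum(lengths)])
restored = [flat[offsets[i]:offsets[i + 1]] for i in range(len(curves))]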
Cheers and good luck!
You can use inputs with different sizes in TF. Just feed the data in the same way as in the tutorial you listed, but make sure to define the changing dimensions in the placeholder as None.
Here's a simple example of feeding a placeholder with different shapes:
import tensorflow as tf
import numpy as np
array1 = np.arange(9).reshape((3,3))
array2 = np.arange(16).reshape((4,4))
array3 = np.arange(25).reshape((5,5))
model_input = tf.placeholder(dtype='float32', shape=[None, None])
sqrt_result = tf.sqrt(model_input)
with tf.Session() as sess:
    print(sess.run(sqrt_result, feed_dict={model_input: array1}))
    print(sess.run(sqrt_result, feed_dict={model_input: array2}))
    print(sess.run(sqrt_result, feed_dict={model_input: array3}))
You can define a placeholder with shape [None, ..., None]. Each 'None' means the input fed at that dimension can have any length. For example, [None, None] means a matrix with any number of rows and columns can be fed. However, you should take care about which kind of NN you use, because when dealing with a CNN, at the convolution and pooling layers you must identify the specific size of the tensor.
Tensorflow Fold might be of interest to you.
From the Tensorflow Fold README:
TensorFlow Fold is a library for creating TensorFlow models that consume structured data, where the structure of the computation graph depends on the structure of the input data. Fold implements dynamic batching. Batches of arbitrarily shaped computation graphs are transformed to produce a static computation graph. This graph has the same structure regardless of what input it receives, and can be executed efficiently by TensorFlow.
The graph structure can be set up so as to accept an arbitrary L value so that any structured input can be read in. This is especially helpful when building architectures such as recursive neural nets. The overall structure is very similar to what you are used to (feed dicts, etc). Since you need a dynamic computational graph for your application, this might be a good move for you in the long run.
I am a beginner with TensorFlow. I am a bit confused by the tutorial. The author first gives the formula y=softmax(Wx+b), but uses xW+b in the Python code and explains it as a small trick. I do not understand the trick; why does the author need to flip the formula?
https://www.tensorflow.org/get_started/mnist/beginners
First, we multiply x by W with the expression tf.matmul(x, W). This is flipped from when we multiplied them in our equation, where we had Wx, as a small trick to deal with x being a 2D tensor with multiple inputs. We then add b, and finally apply tf.nn.softmax.
As you can see from the formula,
y=softmax(Wx + b)
the input x is multiplied by the weight variable W, but in the doc
y = tf.nn.softmax(tf.matmul(x, W) + b)
x is multiplied by W for calculation convenience, so we must flip W from 10*784 to 784*10 to keep it consistent with the formula.
In general in machine learning, esp. tensorflow, you always want your first dimension to represent your batch. The trick is only a way of ensuring that without transposing everything before and after each matrix multiplication.
x is not really a column vector of features, but a 2D matrix of shape (batch_size, n_features).
If you keep Wx, then you'll transpose x (to x' of shape (n_features, batch_size)), use W of shape (n_outputs, n_features), and Wx' will be of shape (n_outputs, batch_size), so you'll have to transpose it back to (batch_size, n_outputs), which is what you want in the end.
If you're using tf.matmul(x, W), then W is of shape (n_features, n_outputs), and the result is directly of shape (batch_size, n_outputs).
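For concreteness, a minimal shape check in TF 1.x using the MNIST sizes from the tutorial:
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])   # (batch_size, n_features)
W = tf.Variable(tf.zeros([784, 10]))          # (n_features, n_outputs)
b = tf.Variable(tf.zeros([10]))

y = tf.nn.softmax(tf.matmul(x, W) + b)        # (batch_size, 10), no transposes needed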
I agree this is not clear at first.
x being a 2D tensor with multiple inputs
is a very succinct way to tell you that in tensorflow, data is stored in tensors following conventions that are not those of linear algebra.
In particular, the outermost dimension (i.e. the rows of a matrix) is always the sample dimension: that is, it has the same size as your number of samples.
When you store sample features in a 2D tensor (a matrix), the features are therefore stored in the innermost dimension, i.e. the columns. That is, the tensor x is the transpose of the variable x in the equation. So are W and b. The fact that x.T * W.T = (W.x).T explains the apparent swap in the multiplication between the linear algebra equation and its tensor implementation.
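A tiny numpy check of the identity used here (shapes follow the MNIST example, with a single sample):
import numpy as np

W = np.random.rand(10, 784)   # (n_outputs, n_features), the linear-algebra convention
x = np.random.rand(784, 1)    # a single sample as a column vector

# x.T @ W.T has shape (1, 10), identical to (W @ x).T
assert np.allclose(x.T @ W.T, (W @ x).T)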
I am learning TensorFlow, and my goal is to implement MultiPerceptron for my needs. I checked the MNIST tutorial with MultiPerceptron implementation and everything was clear to me except this:
_, c = sess.run([optimizer, cost], feed_dict={x: batch_x,
y: batch_y})
I guess x is an image itself (28*28 pixels, so the input is 784 neurons) and y is a label which is a 1x10 array:
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])
They feed whole batches (which are packs of data points and labels)! How does tensorflow interpret this "batch" input? And how does it update the weights: simultaneously after each element in a batch, or after running through the whole batch?
And, if I need to input one number (input_shape = [1,1]) and output four numbers (output_shape = [1,4]), how should I change the tf.placeholders and in which form should I feed them into session?
When I ask how tensorflow interprets it, I want to know how tensorflow splits the batch into single elements. For example, a batch is a 2-D array, right? In which direction does it split the array? Or does it use matrix operations and not split anything?
When I ask how I should feed my data, I want to know whether it should be a 2-D array with samples at its rows and features at its columns, or whether it could perhaps be a 2-D list.
When I feed my float numpy array X_train to x, which is :
x = tf.placeholder("float", [1, n_input])
I receive an error:
ValueError: Cannot feed value of shape (1, 18) for Tensor 'Placeholder_10:0', which has shape '(1, 1)'
It appears that I have to create my data as a Tensor too?
When I tried [18x1]:
Cannot feed value of shape (18, 1) for Tensor 'Placeholder_12:0', which has shape '(1, 1)'
They feed whole batches (which are packs of data points and labels)!
Yes, this is how neural networks are usually trained (due to some nice mathematical properties of having the best of both worlds - a better gradient approximation than in per-sample SGD on one hand, and much faster convergence than full GD on the other).
How does tensorflow interpret this "batch" input?
It "interprets" it according to operations in your graph. You probably have reduce mean somewhere in your graph, which calculates average over your batch, thus causing this to be the "interpretation".
And how does it update the weights: 1.simultaniusly after each element in a batch? 2. After running threw the whole batch?.
As in the previous answer - there is nothing "magical" about batch, it is just another dimension, and each internal operation of neural net is well defined for the batch of data, thus there is still a single update in the end. Since you use reduce mean operation (or maybe reduce sum?) you are updating according to mean of the "small" gradients (or sum if there is reduce sum instead). Again - you could control it (up to the agglomerative behaviour, you cannot force it to do per-sample update unless you introduce while loop into the graph).
And, if I need to input one number (input_shape = [1,1]) and output four numbers (output_shape = [1,4]), how should I change the tf.placeholders and in which form should I feed them into the session?
Just set the variables n_input=1 and n_classes=4, and push your data as before, as [batch, n_input] and [batch, n_classes] arrays (in your case batch=1, if by "1x1" you mean "one sample of dimension 1"; although your edit starts to suggest that you actually do have a batch, and by 1x1 you meant a 1-d input).
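A minimal sketch of those placeholders (TF 1.x), keeping the batch dimension flexible:
import tensorflow as tf

n_input, n_classes = 1, 4
x = tf.placeholder("float", [None, n_input])    # feed a (batch, 1) array
y = tf.placeholder("float", [None, n_classes])  # feed a (batch, 4) array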
EDIT: 1. When I ask how tensorflow interprets it, I want to know how tensorflow splits the batch into single elements. For example, a batch is a 2-D array, right? In which direction does it split the array? Or does it use matrix operations and not split anything? 2. When I ask how I should feed my data, I want to know whether it should be a 2-D array with samples at its rows and features at its columns, or whether it could perhaps be a 2-D list.
It does not split anything. It is just a matrix, and each operation is perfectly well defined for matrices as well. Usually you put examples in rows, thus in the first dimension, and this is exactly what [batch, n_inputs] says - that you have batch rows, each with n_inputs columns. But again - there is nothing special about it, and you could also create a graph which accepts column-wise batches if you really needed to.