How do you create a TensorFlow vector from a TensorFlow constant/variable, etc.?
For example I have a constant x and I want to create a vector which is [x].
I have tried the code below and it doesn't work.
Any help would be appreciated.
x = tf.placeholder_with_default(1.0,[], name="x")
nextdd = tf.constant([x], shape=[1], dtype=tf.float32)
First I'd like to define a tensor for you:
Tensors are n-dimensional arrays. A rank 0 tensor is a scalar, e.g. 42; a rank 1 tensor is a vector, e.g. [1,2,3]; a rank 2 tensor is a matrix; a rank 3 tensor might be an image of shape [640, 480, 3] (640x480 resolution, 3 color channels); a rank 4 tensor might be a batch of such images of shape [10, 640, 480, 3] (ten 640x480 images); and so on.
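For illustration, a minimal sketch of tensors of each rank (the values here are arbitrary):
import tensorflow as tf
scalar = tf.constant(42)                  # rank 0
vector = tf.constant([1, 2, 3])           # rank 1
matrix = tf.constant([[1, 2], [3, 4]])    # rank 2
image  = tf.zeros([640, 480, 3])          # rank 3
batch  = tf.zeros([10, 640, 480, 3])      # rank 4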
Second, you have basically four types of tensors in TensorFlow.
1) Placeholders - these are tensors that you feed into TensorFlow when you call sess.run. For example, sess.run([nextdd], {x: [1,2,3]}) feeds a rank 1 tensor in for x.
2) Constants - these are fixed values, as the name suggests, e.g. tf.constant(42). They are specified at graph-construction time, not at runtime (alluding to your primary mistake here).
3) Computed tensors - c = tf.add(a, b) is a computed tensor; it is computed from a and b, and its value is not stored after the computation is finished.
4) Variables - these are mutable tensors that are kept around after the computation is complete, for example the weights of a neural network.
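To illustrate, a minimal TF 1.x sketch of all four kinds (the names here are arbitrary):
import tensorflow as tf
p = tf.placeholder(tf.float32, [None])   # 1) placeholder: fed in at sess.run time
c = tf.constant([1.0, 2.0, 3.0])         # 2) constant: fixed when the graph is built
s = tf.add(p, c)                         # 3) computed tensor: derived from p and c
v = tf.Variable(tf.zeros([3]))           # 4) variable: mutable state kept between runs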
Now to address your question explicitly: x is already a tensor. If you feed in a vector, it is a rank 1 tensor (a.k.a. a vector). You can use it just like you would use a constant, computed tensor, or variable; they all work the same in operations. There is no reason for the nextdd line at all.
Now, nextdd fails because you tried to create a constant from a variable term, which isn't a defined operation. tf.constant(42) is well defined; that's what a constant is.
You could just use x directly, as in:
x = tf.placeholder_with_default(1.0,[], name="x")
y = tf.add(x, x)
sess = tf.InteractiveSession()
y.eval()
Result:
2.0
From your description, it looks like you want to use tf.expand_dims:
# 't' is a tensor of shape [2]
tf.shape(tf.expand_dims(t, 0)) # [1, 2]
tf.shape(tf.expand_dims(t, 1)) # [2, 1]
tf.shape(tf.expand_dims(t, -1)) # [2, 1]
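So for your specific case, a minimal sketch (assuming what you want is x wrapped into a shape-[1] vector):
import tensorflow as tf
x = tf.placeholder_with_default(1.0, [], name="x")
nextdd = tf.expand_dims(x, 0)   # shape [1]; tf.reshape(x, [1]) or tf.stack([x]) would also work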
Related
I am running into an "InvalidArgumentError: PartialTensorShape: Incompatible shapes during merge" error when I try to tf.concat two tensors whose shapes depend on the function input within the vectorized function (even though the output shape is the same for each a, b pair). Below is an example of the situation:
import tensorflow as tf

def test_fn(inputs):
    a, b = inputs
    out = tf.concat([tf.ones(a), tf.zeros(b)], 0)
    return out

a = tf.constant([5,4,3,2])
b = tf.constant([5,6,7,8])
x_a = tf.vectorized_map(test_fn, (a, b))
I am looking for an explanation of why the error is happening.
Note: I noticed in the source code the comment "- The shape and dtype of any intermediate or output tensors in the computation of fn should not depend on the input to fn." which seems to be the scenario here. Is there a workaround that can still take advantage of the vectorization?
Using x_a = tf.map_fn(test_fn,(a,b),fn_output_signature=tf.TensorSpec((10,))) works but doesn't parallelize.
The problem is that you are passing a tensor to tf.ones and tf.zeros instead of a shape. For example, if you pass the tensor a to tf.ones, it will be interpreted as the shape, resulting in a tensor of shape (5, 4, 3, 2). That is probably not what you want. Try something like this:
import tensorflow as tf

def test_fn(inputs):
    a, b = inputs
    out = tf.stack([tf.ones_like(a), tf.zeros_like(b)], 0)
    return out

a = tf.constant([5,4,3,2])
b = tf.constant([5,6,7,8])
x_a = tf.vectorized_map(test_fn, (a, b))
x_a = tf.transpose(x_a)
print(x_a)
tf.Tensor(
[[1 1 1 1]
[0 0 0 0]], shape=(2, 4), dtype=int32)
Note that you have to use tf.stack instead of tf.concat because TF does not currently support scalar concatenation when using tf.vectorized_map. Check out the limitations of tf.vectorized_map in its documentation.
I'm trying to differentiate a gradient in PyTorch. I found this link but can't get it to work.
My code looks as follows:
import torch
from torch.autograd import grad
import torch.nn as nn
import torch.optim as optim
class net_x(nn.Module):
    def __init__(self):
        super(net_x, self).__init__()
        self.fc1 = nn.Linear(2, 20)
        self.fc2 = nn.Linear(20, 20)
        self.out = nn.Linear(20, 4)

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.out(x)
        return x
nx = net_x()
r = torch.tensor([1.0,2.0])
nx(r)
>>>tensor([-0.2356, -0.7315, -0.2100, -0.6741], grad_fn=<AddBackward0>)
But when I try to differentiate the function with respect to the first parameter
grad(nx, r[0])
I get the error
TypeError: 'net_x' object is not iterable
Update
Trying to extend this to tensors:
For some reason the gradient is the same for all inputs.
a = torch.rand((8,2), requires_grad=True)
s = []
s_t = []
for input_tensor in a:
    output_tensor = nx(input_tensor)
    s.append(output_tensor[0])
    s_t_value = grad(output_tensor[0], input_tensor)[0][0]
    s_t.append(s_t_value)
print(s_t)
But the output is:
[tensor(-0.1326), tensor(-0.1326), tensor(-0.1326), tensor(-0.1326), tensor(-0.1326), tensor(-0.1326), tensor(-0.1326), tensor(-0.1326)]
The first thing to change, if you want gradients with respect to r, is to set the requires_grad flag to True for this tensor:
nx = net_x()
r = torch.tensor([1.0,2.0], requires_grad=True)
Then, as explained in the autograd documentation, grad computes the gradients of outputs with respect to inputs, so you need to save the output of the model:
y = nx(r)
Now you can compute the gradients with respect to r. But there is one last issue: grad only knows how to propagate gradients from a scalar tensor, which y is not. So you need to compute the gradient of each coordinate separately:
for x in y:
    print(grad(x, r, retain_graph=True))
or equivalently:
for i in range(y.shape[0]):
    # prints the vector (dy_i/dr_0, dy_i/dr_1, ..., dy_i/dr_n)
    print(grad(y[i], r, retain_graph=True))
You need retain_graph=True because without this flag the computational graph is freed after the first gradient propagation. And there you have it: the derivative of each coordinate of nx(r) with respect to r!
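As an aside (not part of the original answer), if you are on a reasonably recent PyTorch (1.5 or later), torch.autograd.functional.jacobian computes all of these per-coordinate gradients in one call; a small sketch, reusing nx and r from above:
from torch.autograd.functional import jacobian
J = jacobian(nx, r)   # shape (4, 2): J[i, j] = d y_i / d r_j
print(J)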
To answer your question in the comments:
Not an error, it's normal. You have a batched input of size (B, 2), with B = 8, and you get a batched output of shape (B, 4). Now, for each vector of the batched output, and for each coordinate of this vector, you can compute the derivative with respect to the batched input, which will yield a gradient of size (B, 2), like this:
for b in y:  # there are B vectors b of shape (4,)
    for x in b:  # there are 4 coordinates
        # this prints a tensor of shape (B, 2)
        print(grad(x, r, retain_graph=True))
Now remember the way batches work: all samples in a batch are computed together to harness the power of the GPU, but they are completely independent. So all the b vectors are actually outputs of the network for different inputs, which means the gradient of the i-th output vector with respect to the j-th input vector must be 0 if i != j. Does that make sense? It's like computing f(x, y) = (x^2, y^2): the derivative of y^2 with respect to x is obviously 0. Now consider x and y to be two samples from one batch, and you have your explanation for why there are so many zeros in your results.
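To make that concrete, a tiny sketch of the f(x, y) = (x^2, y^2) example (with made-up values), showing the zero cross-derivative:
import torch
from torch.autograd import grad

xy = torch.tensor([3.0, 5.0], requires_grad=True)
f = xy ** 2                               # f = (x^2, y^2)
print(grad(f[1], xy, retain_graph=True))  # (tensor([ 0., 10.]),): d(y^2)/dx = 0, d(y^2)/dy = 2y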
A last sample of code to make it even clearer:
inputs = [torch.randn(1, 2, requires_grad=True) for i in range(8)]
r = torch.cat(inputs)  # shape: (8, 2)
y = nx(r)  # shape: (8, 4)

for i in range(len(y)):
    print(f"Gradients of y[{i}] wrt r[{i}]")
    for x in y[i]:
        # prints a tensor of size (2,)
        print(grad(x, inputs[i], retain_graph=True))
On to why all the gradients are the same. This is because your neural network is completely linear: you have 3 nn.Linear layers and no non-linear activation function (as a consequence, this is literally equivalent to a network with only one layer). One property of linear layers is that their gradient is constant: d(alpha*x)/dx = alpha (independent of x). Therefore the gradients are identical for all inputs. Just add non-linear activation layers like sigmoids and this behavior will not happen again, as in the sketch below.
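For instance, one possible fix (a sketch, not the only option) is to insert activations between the layers, here using torch.relu:
import torch
import torch.nn as nn

class net_x_nonlinear(nn.Module):  # hypothetical variant of the net_x above, with activations
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(2, 20)
        self.fc2 = nn.Linear(20, 20)
        self.out = nn.Linear(20, 4)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.out(x)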
I have a tensor with 64 elements in PyTorch and I want to convert it to a complex tensor with 32 elements. Order is important for me, and everything should stay in PyTorch so I can use it in my customized loss function:
The first half of my primary tensor (W) holds the real parts and the second half the imaginary parts, so my final tensor should look like:
W_final = tensor(W[0]+jW[32], W[1]+jW[33], W[2]+jW[34], W[3]+jW[35], ... , W[31]+jW[63])
I tried this approach:
import torch
W_1 = torch.reshape(W, (2, 32))   # reshape W with shape (64,) to W_1 with shape (2, 32)
W_2 = torch.transpose(W_1, 0, 1)  # transpose W_1 to W_2 with shape (32, 2), so I can use view_as_complex
W_final = torch.view_as_complex(W_2)
The problem is that with transpose, the stride also changes and I get this error:
RuntimeError: Tensor must have a last dimension with stride 1
Do you know how I can deal with the stride? Or is there any way to reshape with a different element order, as in numpy?
Or any other way to convert to complex?
It has to do with the non-contiguous memory layout of W_2 after the transpose.
To handle this error you should call .contiguous() on W_2.
From the PyTorch docs:
" Strides are a list of integers: the k-th stride represents the jump in the memory necessary to go from one element to the next one in the k-th dimension of the Tensor. This concept makes it possible to perform many tensor operations efficiently."
Once you call contiguous, the returned tensor is laid out contiguously in memory, so its last dimension has stride 1, which is exactly what view_as_complex requires.
Here is a working sample code:
import torch
W = torch.randn(64)
W_2 = W.view(-1,32).permute(1,0).contiguous()
W_final = torch.view_as_complex(W_2)
First call view to reshape the tensor to shape (2, 32), then permute the dimensions to transpose the result, and finally call contiguous.
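As a quick sanity check (a sketch with made-up values), you can verify that the ordering matches the W[k] + jW[k+32] layout you described:
import torch

W = torch.arange(64, dtype=torch.float32)
W_final = torch.view_as_complex(W.view(-1, 32).permute(1, 0).contiguous())
print(W_final[0], W_final[31])   # tensor(0.+32.j) tensor(31.+63.j), i.e. W[k] + j*W[k+32]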
Let's say I want to compute the Hessian of a scalar-valued function with respect to some parameters W (e.g the weights and biases of a feed-forward neural network).
If you consider the following code, implementing a two-dimensional linear model trained to minimize a MSE loss:
import numpy as np
import tensorflow as tf
x = tf.placeholder(dtype=tf.float32, shape=[None, 2]) #inputs
t = tf.placeholder(dtype=tf.float32, shape=[None,]) #labels
W = tf.Variable(np.eye(2), dtype=tf.float32) #weights
preds = tf.matmul(x, W) #linear model
loss = tf.reduce_mean(tf.square(preds-t), axis=0) #mse loss
params = tf.trainable_variables()
hessian = tf.hessians(loss, params)
you'd expect session.run(hessian, feed_dict={...}) to return a 2x2 matrix (the same shape as W). It turns out that, because params is a 2x2 tensor, the output is rather a tensor of shape [2, 2, 2, 2]. While I can easily reshape the tensor to obtain the matrix I want, it seems that this operation might be extremely cumbersome when params becomes a list of tensors of varying size (i.e. when the model is a deep neural network, for instance).
It seems that there are two ways around this:
Flatten params to be a 1D tensor called flat_params:
flat_params = tf.concat([tf.reshape(p, [-1]) for p in params], axis=0)
so that tf.hessians(loss, flat_params) naturally returns a 2x2 matrix. However, as noted in "Why does Tensorflow Reshape tf.reshape() break the flow of gradients?" for tf.gradients (which also holds for tf.hessians), TensorFlow is not able to see the symbolic link in the graph between params and flat_params, and tf.hessians(loss, flat_params) will raise an error as the gradients will be seen as None.
In https://afqueiruga.github.io/tensorflow/2017/12/28/hessian-mnist.html, the author of the code goes the other way: he first creates the flat parameter and reshapes its parts into self.params. This trick does work and gets you the Hessian with its expected shape (a 2x2 matrix). However, it seems to me that this will be cumbersome to use when you have a complex model, and impossible to apply if you create your model via built-in functions (like tf.layers.dense, ...).
Is there no straightforward way to get the Hessian matrix (as in the 2x2 matrix in this example) from tf.hessians when self.params is a list of tensors of arbitrary shapes? If not, how can you automate the reshaping of the output tensor of tf.hessians?
It turns out (per TensorFlow r1.13) that if len(xs) > 1, then tf.hessians(ys, xs) returns tensors corresponding to only the block-diagonal submatrices of the full Hessian matrix. The full story and solutions are in this paper https://arxiv.org/pdf/1905.05559, with code at https://github.com/gknilsen/pyhessian
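If all you need is a single parameter tensor's Hessian arranged as a matrix, one possible workaround (a TF 1.x sketch, assuming the W and loss from the snippet above) is to reshape the [2, 2, 2, 2] output yourself:
import numpy as np
import tensorflow as tf

hess = tf.hessians(loss, [W])[0]        # shape [2, 2, 2, 2]
n = int(np.prod(W.shape.as_list()))     # 4 entries in the flattened W
hess_matrix = tf.reshape(hess, [n, n])  # the 4x4 Hessian with respect to the flattened W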
I'm a novice at TensorFlow. I was practicing coding with this tutorial code. Most of the code made sense to me, but at some points I got stuck.
import tensorflow as tf
x = tf.placeholder("float", [None, n_steps, n_input])
x = tf.transpose(x, [1, 0, 2])
x = tf.reshape(x, [-1, n_input])
With the tf.placeholder function I had to specify a variable-length dimension with None, but with tf.reshape I had to use -1, not None. In the documentation for the two functions, both of the pertaining arguments are named shape, so I am feeling lost here. Do they really have different meanings, or is it just a small design mistake by the TensorFlow developers?
You can understand it this way: in a placeholder, the value None indicates "this could be any value". In your case, you have a batch size that can be anything.
In a reshape function, -1 indicates "whatever value is remaining to make this shape work". In your case, x gets the shape (batch*n_steps, n_input), as this is the shape x needs to be to fit the same data into a matrix.
Interesting note: you CAN use multiple None values in a placeholder (to indicate any batch size, any width and height of an image)... But you can't use multiple -1 values in a reshape function!
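A tiny sketch illustrating both (TF 1.x style, with arbitrary dimensions):
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 5, 3])  # None: the batch size is decided when you feed data
flat = tf.reshape(x, [-1, 3])                 # -1: inferred at runtime as batch_size * 5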