Can't understand about pytorch tensor broadcast - python

I have the following code:
import torch
d = 2
n = 50
X = torch.randn(n,d)
z = torch.tensor([[-1.0], [2.0]])
y = X @ z
X.size()
z.size()
y.size()
The output is:
torch.Size([50, 2])
torch.Size([2, 1])
torch.Size([50, 1])
My question is: why, after broadcasting, is the size of the result y [50, 1] rather than [50, 2]? I think it should be [50, 2]; am I correct?

The @ is not broadcasting but matrix multiplication.
In Python 3.5, the @ operator was introduced for matrix
multiplication, following PEP 465. This is implemented e.g. in numpy
as the matmul operator.
So the size of y is fine.
Multiplying a matrix of size [50, 2] by a vector of size [2, 1] yields a vector of size [50, 1].
An example showing it more clearly is:
import torch
xx = torch.ones(3, 2)
zz = torch.tensor([[-1.0], [2.0]])
yy = xx @ zz
print(xx)
print(zz)
print(yy)
# tensor([[1., 1.],
#         [1., 1.],
#         [1., 1.]])
# tensor([[-1.],
#         [ 2.]])
# tensor([[1.],
#         [1.],
#         [1.]])
As you can see, the third output is indeed just the matrix multiplication of the two tensors.
If you wish to do broadcasting, I recommend referring to https://medium.com/ai%C2%B3-theory-practice-business/understanding-broadcasting-in-pytorch-ca9e9533f05f
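For contrast, here is a minimal sketch (my own example, reusing the shapes from the question) of what broadcasting would give with these tensors; note the transpose of z, which is my own choice to make the shapes broadcastable for elementwise multiplication:
import torch

X = torch.randn(50, 2)             # shape [50, 2]
z = torch.tensor([[-1.0], [2.0]])  # shape [2, 1]

# Elementwise multiplication with broadcasting: [50, 2] * [1, 2] -> [50, 2]
y_broadcast = X * z.T
print(y_broadcast.size())          # torch.Size([50, 2])

# Matrix multiplication: [50, 2] @ [2, 1] -> [50, 1]
y_matmul = X @ z
print(y_matmul.size())             # torch.Size([50, 1])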

Related

Modifying sparse matrix using fancy indexing

I am trying to use fancy indexing to modify a large sparse matrix. Suppose you have the following code:
import numpy as np
import scipy.sparse as sp
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
b = sp.lil_matrix(a)
c = sp.lil_matrix((3,4))
c[[1,2], 0] = b[[1,2], 0]
However, this code gives the following error:
ValueError: shape mismatch in assignment
I don't understand why this doesn't work. Both matrices have the same shape and this usually works if both matrices are numpy arrays. I would appreciate any help.
Yeah this is a bug with the sparse __setitem__. I've run into it before (but I just worked around it). Now I actually looked into it; first, you can fix this pretty easily:
import numpy as np
import scipy.sparse as sp
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
b = sp.lil_matrix(a)
c = sp.lil_matrix((3,4))
c[[1,2], 0] = b[[1,2], 0]
This raises the ValueError you saw. This doesn't and works as expected:
c[[1,2], 0] = b[[1,2], [0]]
>>> c.A
array([[0., 0., 0., 0.],
       [5., 0., 0., 0.],
       [9., 0., 0., 0.]])
Let's just walk through the offending __setitem__ (I'm going to omit a lot of code that doesn't get called):
row, col = self._validate_indices(key)
This is fine - row = [1, 2] and col = 0
col = np.atleast_1d(col)
i, j = _broadcast_arrays(row, col)
So far so good - i = [1, 2] and j = [0, 0]
if i.ndim == 1:
    # Inner indexing, so treat them like row vectors.
    i = i[None]
    j = j[None]
broadcast_row = x.shape[0] == 1 and i.shape[0] != 1
broadcast_col = x.shape[1] == 1 and i.shape[1] != 1
Here's our problem - i and j both got turned into row vectors with shape (1, 2). x here is what you're trying to assign (b[[1,2], 0]), which is of shape (2, 1); the next step raises a ValueError because x and the indices don't align.
>>> c[[1,2], 0] = b[[1,2], 0].A
ValueError: cannot reshape array of size 4 into shape (2,)
Here's the same problem, but __setitem__ broadcasts x into a (2, 2) array, which then fails again because it's larger than the region you're assigning it to.
The workaround (b[[1,2], [0]]) has a shape of (1, 2), which is not correct either, but that error ends up cancelling out the error in indexing c.
I'm not sure exactly what the logic is behind this indexing code so I'm not sure how to fix this without introducing other subtle bugs.
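If you prefer not to rely on that shape quirk, here is a minimal workaround sketch (my own suggestion, fine as long as the index list is small enough that a Python loop is acceptable) that sidesteps the fancy-indexing __setitem__ path with scalar assignments:
import numpy as np
import scipy.sparse as sp

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
b = sp.lil_matrix(a)
c = sp.lil_matrix((3, 4))

# Scalar assignments avoid the buggy fancy-indexing code path entirely.
for row in [1, 2]:
    c[row, 0] = b[row, 0]

print(c.A)
# [[0. 0. 0. 0.]
#  [5. 0. 0. 0.]
#  [9. 0. 0. 0.]]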

Back-Propagation of y = x / sum(x, dim=0) where size of tensor x is (H,W)

Q1.
I'm trying to make my custom autograd function with pytorch.
But I had a problem implementing the analytical back-propagation of y = x / sum(x, dim=0),
where the size of tensor x is (Height, Width) (x is 2-dimensional).
Here's my code
class MyFunc(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        input = input / torch.sum(input, dim=0)
        return input

    @staticmethod
    def backward(ctx, grad_output):
        input = ctx.saved_tensors[0]
        H, W = input.size()
        sum = torch.sum(input, dim=0)
        grad_input = grad_output * (1/sum - input*1/sum**2)
        return grad_input
I used gradcheck (from torch.autograd) to compare the Jacobian matrices:
from torch.autograd import gradcheck
func = MyFunc.apply
input = (torch.randn(3,3,dtype=torch.double,requires_grad=True))
test = gradcheck(func, input)
and the result was that the gradient check failed.
Please, someone help me get the correct back-propagation result.
Thanks!
Q2.
Thanks for the answers!
Because of your help, I could implement back-propagation for the (H, W) tensor case.
However, while implementing back-propagation for the (N, H, W) tensor case, I ran into a problem.
I think the problem is the initialization of the new tensor.
Here's my new code
import torch
import torch.nn as nn
import torch.nn.functional as F
class MyFunc(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        N = input.size(0)
        for n in range(N):
            input[n] /= torch.sum(input[n], dim=0)
        return input

    @staticmethod
    def backward(ctx, grad_output):
        input = ctx.saved_tensors[0]
        N, H, W = input.size()
        I = torch.eye(H).unsqueeze(-1)
        sum = input.sum(1)
        grad_input = torch.zeros((N, H, W), dtype=torch.double, requires_grad=True)
        for n in range(N):
            grad_input[n] = ((sum[n] * I - input[n]) * grad_output[n] / sum[n]**2).sum(1)
        return grad_input
Gradcheck code is
from torch.autograd import gradcheck
func = MyFunc.apply
input = (torch.rand(2,2,2,dtype=torch.double,requires_grad=True))
test = gradcheck(func, input)
print(test)
and the result is an error.
I don't know why the error occurs...
Your help would be very valuable for implementing my own convolutional network.
Thanks! Have a nice day.
Let's look at an example with a single column, for instance: [[x1], [x2], [x3]].
Let sum be x1 + x2 + x3; then normalizing x gives y = [[y1], [y2], [y3]] = [[x1/sum], [x2/sum], [x3/sum]]. You're looking for dL/dx1, dL/dx2, and dL/dx3 - we'll just write them as dx1, dx2, and dx3. Same for all dL/dyi.
So dx1 is equal to dL/dy1*dy1/dx1 + dL/dy2*dy2/dx1 + dL/dy3*dy3/dx1. That's because x1 contributes to all output elements on the corresponding column: y1, y2, and y3.
We have:
dy1/dx1 = d(x1/sum)/dx1 = (sum - x1)/sum²
dy2/dx1 = d(x2/sum)/dx1 = -x2/sum²
similarly, dy3/dx1 = d(x3/sum)/dx1 = -x3/sum²
Therefore dx1 = (sum - x1)/sum²*dy1 - x2/sum²*dy2 - x3/sum²*dy3. Same for dx2 and dx3. As a result, the Jacobian has (sum - xi)/sum² on its diagonal and -xj/sum² in the off-diagonal positions (for all j different from i).
In your implementation, you seem to be missing all non-diagonal components.
Keeping the same one-column example, with x1=2, x2=3, and x3=5:
>>> x = torch.tensor([[2.], [3.], [5.]])
>>> sum = x.sum(0)
tensor([10.])
The Jacobian will be:
>>> J = (sum*torch.eye(x.size(0)) - x)/sum**2
tensor([[ 0.0800, -0.0200, -0.0200],
        [-0.0300,  0.0700, -0.0300],
        [-0.0500, -0.0500,  0.0500]])
For an implementation with multiple columns, it's a bit trickier, more specifically regarding the shape of the diagonal matrix. It's easier to keep the column axis last so we don't have to bother with broadcasting:
>>> x = torch.tensor([[2., 1], [3., 3], [5., 5]])
>>> sum = x.sum(0)
tensor([10., 9.])
>>> diag = sum*torch.eye(3).unsqueeze(-1).repeat(1, 1, len(sum))
tensor([[[10.,  9.],
         [ 0.,  0.],
         [ 0.,  0.]],

        [[ 0.,  0.],
         [10.,  9.],
         [ 0.,  0.]],

        [[ 0.,  0.],
         [ 0.,  0.],
         [10.,  9.]]])
Above diag has a shape of (3, 3, 2) where the two columns are on the last axis. Notice how we didn't need to broadcast sum.
What I wouldn't have done is: torch.eye(3).unsqueeze(0).repeat(len(sum), 1, 1). Since with this kind of shape - (2, 3, 3) - you will have to use sum[:, None, None], and will need further broadcasting down the road...
The Jacobian is simply:
>>> J = (diag - x)/sum**2
tensor([[[ 0.0800,  0.0988],
         [-0.0300, -0.0370],
         [-0.0500, -0.0617]],

        [[-0.0200, -0.0123],
         [ 0.0700,  0.0741],
         [-0.0500, -0.0617]],

        [[-0.0200, -0.0123],
         [-0.0300, -0.0370],
         [ 0.0500,  0.0494]]])
You can check the results by backpropagating through the operation using an arbitrary dy tensor (not with torch.ones, though; you'll get 0s because of J!). After backpropagating, x.grad should be equal to torch.einsum('abc,bc->ac', J, dy).
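As a concrete sanity check, here is a small sketch of that comparison (my own code; x and J are constructed as above, with s in place of sum to avoid shadowing the builtin, and dy is an arbitrary upstream gradient):
import torch

x = torch.tensor([[2., 1.], [3., 3.], [5., 5.]], requires_grad=True)
s = x.sum(0)
y = x / s                          # the normalization, shape (3, 2)

dy = torch.randn(3, 2)             # arbitrary upstream gradient (not torch.ones)
y.backward(dy)

# Hand-built Jacobian, same construction as above
diag = s.detach() * torch.eye(3).unsqueeze(-1).repeat(1, 1, len(s))
J = (diag - x.detach()) / s.detach()**2

print(torch.allclose(x.grad, torch.einsum('abc,bc->ac', J, dy)))  # expected: True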
Your Jacobian is not accurate: it is a 4D tensor, but you only computed a 2D slice of it.
You also neglected the second row of the Jacobian.
Answer for Q2.
I implemented back-propagation myself for the batched case.
I used the unsqueeze function and it worked.
Size of input: (N, H, W) (N is the batch size)
forward:
out = input / torch.sum(input, dim=1).unsqueeze(1)
backward:
diag = torch.eye(input.size(1), dtype=torch.double, requires_grad=True).unsqueeze(-1)
sum = input.sum(1)
grad_input = ((sum.unsqueeze(1).unsqueeze(1) * diag - input.unsqueeze(1)) * grad_out.unsqueeze(1) / (sum**2).unsqueeze(1).unsqueeze(1)).sum(2)
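For completeness, here is a self-contained sketch (my own wrapper; the class name is arbitrary) of how this batched forward/backward could be placed in a custom autograd Function and checked with gradcheck. The forward avoids in-place modification of the input, unlike the Q2 forward in the question:
import torch
from torch.autograd import gradcheck

class NormalizeCols(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        # Normalize along dim=1 for every item in the batch: (N, H, W) -> (N, H, W)
        return input / torch.sum(input, dim=1).unsqueeze(1)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        H = input.size(1)
        diag = torch.eye(H, dtype=input.dtype).unsqueeze(-1)      # (H, H, 1)
        s = input.sum(1)                                          # (N, W)
        # Contract the (N, H, H, W) Jacobian blocks against grad_output over dim=2
        grad_input = ((s.unsqueeze(1).unsqueeze(1) * diag - input.unsqueeze(1))
                      * grad_output.unsqueeze(1)
                      / (s ** 2).unsqueeze(1).unsqueeze(1)).sum(2)
        return grad_input

func = NormalizeCols.apply
input = torch.rand(2, 2, 2, dtype=torch.double, requires_grad=True)
print(gradcheck(func, (input,)))  # expected: True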

Add 2 column vector to ndarray [duplicate]

I have two numpy arrays of different shapes, but with the same length (leading dimension). I want to shuffle each of them, such that corresponding elements continue to correspond -- i.e. shuffle them in unison with respect to their leading indices.
This code works, and illustrates my goals:
def shuffle_in_unison(a, b):
    assert len(a) == len(b)
    shuffled_a = numpy.empty(a.shape, dtype=a.dtype)
    shuffled_b = numpy.empty(b.shape, dtype=b.dtype)
    permutation = numpy.random.permutation(len(a))
    for old_index, new_index in enumerate(permutation):
        shuffled_a[new_index] = a[old_index]
        shuffled_b[new_index] = b[old_index]
    return shuffled_a, shuffled_b
For example:
>>> a = numpy.asarray([[1, 1], [2, 2], [3, 3]])
>>> b = numpy.asarray([1, 2, 3])
>>> shuffle_in_unison(a, b)
(array([[2, 2],
        [1, 1],
        [3, 3]]), array([2, 1, 3]))
However, this feels clunky, inefficient, and slow, and it requires making a copy of the arrays -- I'd rather shuffle them in-place, since they'll be quite large.
Is there a better way to go about this? Faster execution and lower memory usage are my primary goals, but elegant code would be nice, too.
One other thought I had was this:
def shuffle_in_unison_scary(a, b):
    rng_state = numpy.random.get_state()
    numpy.random.shuffle(a)
    numpy.random.set_state(rng_state)
    numpy.random.shuffle(b)
This works... but it's a little scary, as I see little guarantee it'll continue to work -- it doesn't look like the sort of thing that's guaranteed to survive across numpy versions, for example.
You can use NumPy's array indexing:
def unison_shuffled_copies(a, b):
    assert len(a) == len(b)
    p = numpy.random.permutation(len(a))
    return a[p], b[p]
This will result in the creation of separate, unison-shuffled arrays.
X = np.array([[1., 0.], [2., 1.], [0., 0.]])
y = np.array([0, 1, 2])
from sklearn.utils import shuffle
X, y = shuffle(X, y, random_state=0)
To learn more, see http://scikit-learn.org/stable/modules/generated/sklearn.utils.shuffle.html
Your "scary" solution does not appear scary to me. Calling shuffle() for two sequences of the same length results in the same number of calls to the random number generator, and these are the only "random" elements in the shuffle algorithm. By resetting the state, you ensure that the calls to the random number generator will give the same results in the second call to shuffle(), so the whole algorithm will generate the same permutation.
If you don't like this, a different solution would be to store your data in one array instead of two right from the beginning, and create two views into this single array simulating the two arrays you have now. You can use the single array for shuffling and the views for all other purposes.
Example: Let's assume the arrays a and b look like this:
a = numpy.array([[[ 0.,  1.,  2.],
                  [ 3.,  4.,  5.]],
                 [[ 6.,  7.,  8.],
                  [ 9., 10., 11.]],
                 [[12., 13., 14.],
                  [15., 16., 17.]]])
b = numpy.array([[0., 1.],
                 [2., 3.],
                 [4., 5.]])
We can now construct a single array containing all the data:
c = numpy.c_[a.reshape(len(a), -1), b.reshape(len(b), -1)]
# array([[ 0.,  1.,  2.,  3.,  4.,  5.,  0.,  1.],
#        [ 6.,  7.,  8.,  9., 10., 11.,  2.,  3.],
#        [12., 13., 14., 15., 16., 17.,  4.,  5.]])
Now we create views simulating the original a and b:
a2 = c[:, :a.size//len(a)].reshape(a.shape)
b2 = c[:, a.size//len(a):].reshape(b.shape)
The data of a2 and b2 is shared with c. To shuffle both arrays simultaneously, use numpy.random.shuffle(c).
In production code, you would of course try to avoid creating the original a and b at all and right away create c, a2 and b2.
This solution could be adapted to the case that a and b have different dtypes.
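To make the sharing concrete, here is a small self-contained sketch (same values as above, just built with arange for brevity; the seed is only for reproducibility) showing that shuffling c reorders the a2 and b2 views in unison:
import numpy

a = numpy.arange(18.).reshape(3, 2, 3)   # same values as the a above
b = numpy.arange(6.).reshape(3, 2)       # same values as the b above

c = numpy.c_[a.reshape(len(a), -1), b.reshape(len(b), -1)]
a2 = c[:, :a.size // len(a)].reshape(a.shape)
b2 = c[:, a.size // len(a):].reshape(b.shape)

numpy.random.seed(0)       # reproducibility only
numpy.random.shuffle(c)    # shuffles the rows of c in-place

# a2 and b2 are views into c, so their leading dimension was shuffled too,
# and row i of a2 still lines up with row i of b2.
print(a2[:, 0, 0])   # first entry of each a-row, in shuffled order
print(b2[:, 0])      # first entry of each b-row, in the same shuffled order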
Very simple solution:
randomize = np.arange(len(x))
np.random.shuffle(randomize)
x = x[randomize]
y = y[randomize]
The two arrays x and y are now both randomly shuffled in the same way.
In 2015, James wrote an sklearn solution which is helpful. But he added a random_state variable, which is not needed. In the code below, the random state from numpy is automatically assumed.
X = np.array([[1., 0.], [2., 1.], [0., 0.]])
y = np.array([0, 1, 2])
from sklearn.utils import shuffle
X, y = shuffle(X, y)
from numpy.random import permutation
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data #numpy array
y = iris.target #numpy array
# Data is currently unshuffled; we should shuffle
# each X[i] with its corresponding y[i]
perm = permutation(len(X))
X = X[perm]
y = y[perm]
Shuffle any number of arrays together, in-place, using only NumPy.
import numpy as np
def shuffle_arrays(arrays, set_seed=-1):
    """Shuffles arrays in-place, in the same order, along axis=0

    Parameters:
    -----------
    arrays : List of NumPy arrays.
    set_seed : Seed value if int >= 0, else seed is random.
    """
    assert all(len(arr) == len(arrays[0]) for arr in arrays)
    seed = np.random.randint(0, 2**(32 - 1) - 1) if set_seed < 0 else set_seed

    for arr in arrays:
        rstate = np.random.RandomState(seed)
        rstate.shuffle(arr)
And it can be used like this:
a = np.array([1, 2, 3, 4, 5])
b = np.array([10,20,30,40,50])
c = np.array([[1,10,11], [2,20,22], [3,30,33], [4,40,44], [5,50,55]])
shuffle_arrays([a, b, c])
A few things to note:
The assert ensures that all input arrays have the same length along their first dimension.
Arrays are shuffled in-place along their first dimension - nothing is returned.
The random seed is kept within the positive int32 range.
If a repeatable shuffle is needed, the seed value can be set.
After the shuffle, the data can be split using np.split or referenced using slices, depending on the application (see the sketch below).
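For instance, a minimal sketch of splitting after the shuffle, using the shuffle_arrays function above (the 80/20 split point is just an illustrative choice):
import numpy as np

data = np.arange(20.).reshape(10, 2)
labels = np.arange(10)

shuffle_arrays([data, labels])        # shuffled in-place, in unison

split_at = int(0.8 * len(data))       # e.g. an 80/20 train/validation split
train_x, val_x = np.split(data, [split_at])
train_y, val_y = np.split(labels, [split_at])
print(train_x.shape, val_x.shape)     # (8, 2) (2, 2)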
You can make an index array like:
s = np.arange(0, len(x_data), 1)
then shuffle it:
np.random.shuffle(s)
Now use s to index your arrays. The same shuffled indices return the arrays shuffled in the same way.
x_data = x_data[s]
x_label = x_label[s]
There is a well-known function that can handle this:
from sklearn.model_selection import train_test_split
X, _, Y, _ = train_test_split(X,Y, test_size=0.0)
Just setting test_size to 0 will avoid splitting and give you shuffled data.
Though it is usually used to split train and test data, it does shuffle them too.
From documentation
Split arrays or matrices into random train and test subsets
Quick utility that wraps input validation and
next(ShuffleSplit().split(X, y)) and application to input data into a
single call for splitting (and optionally subsampling) data in a
oneliner.
This seems like a very simple solution:
import numpy as np
def shuffle_in_unison(a, b):
    assert len(a) == len(b)
    c = np.arange(len(a))
    np.random.shuffle(c)
    return a[c], b[c]
a = np.asarray([[1, 1], [2, 2], [3, 3]])
b = np.asarray([11, 22, 33])
shuffle_in_unison(a,b)
Out[94]:
(array([[3, 3],
        [2, 2],
        [1, 1]]),
 array([33, 22, 11]))
One way in which in-place shuffling can be done for connected lists is using a seed (it could be random) and using numpy.random.shuffle to do the shuffling.
# Set seed to a random number if you want the shuffling to be non-deterministic.
def shuffle(a, b, seed):
    np.random.seed(seed)
    np.random.shuffle(a)
    np.random.seed(seed)
    np.random.shuffle(b)
That's it. This will shuffle both a and b in the exact same way. This is also done in-place which is always a plus.
EDIT: don't use np.random.seed(); use np.random.RandomState instead:
def shuffle(a, b, seed):
    rand_state = np.random.RandomState(seed)
    rand_state.shuffle(a)
    rand_state.seed(seed)
    rand_state.shuffle(b)
When calling it just pass in any seed to feed the random state:
a = [1,2,3,4]
b = [11, 22, 33, 44]
shuffle(a, b, 12345)
Output:
>>> a
[1, 4, 2, 3]
>>> b
[11, 44, 22, 33]
Edit: Fixed code to re-seed the random state
Say we have two arrays: a and b.
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.array([[9,1,1],[6,6,6],[4,2,0]])
We can first obtain row indices by permuting the first dimension:
indices = np.random.permutation(a.shape[0])
[1 2 0]
Then use advanced indexing.
Here we are using the same indices to shuffle both arrays in unison.
a_shuffled = a[indices[:,np.newaxis], np.arange(a.shape[1])]
b_shuffled = b[indices[:,np.newaxis], np.arange(b.shape[1])]
This is equivalent to
np.take(a, indices, axis=0)
[[4 5 6]
 [7 8 9]
 [1 2 3]]
np.take(b, indices, axis=0)
[[6 6 6]
 [4 2 0]
 [9 1 1]]
If you want to avoid copying arrays, then I would suggest that instead of generating a permutation list, you go through every element in the array, and randomly swap it to another position in the array
for old_index in range(len(a)):
    new_index = numpy.random.randint(old_index + 1)
    a[old_index], a[new_index] = a[new_index], a[old_index]
    b[old_index], b[new_index] = b[new_index], b[old_index]
This implements the Knuth-Fisher-Yates shuffle algorithm.
The shortest and easiest way, in my opinion, is to use a seed:
random.seed(seed)
random.shuffle(x_data)
# reset the same seed to get the identical random sequence and shuffle the y
random.seed(seed)
random.shuffle(y_data)
Most solutions above work, but if you have column vectors you have to transpose them first. Here is an example:
def shuffle(self) -> None:
    """
    Shuffles X and Y
    """
    x = self.X.T
    y = self.Y.T
    p = np.random.permutation(len(x))
    self.X = x[p].T
    self.Y = y[p].T
With an example, this is what I'm doing:
combo = []
for i in range(60000):
    combo.append((images[i], labels[i]))
shuffle(combo)

im = []
lab = []
for c in combo:
    im.append(c[0])
    lab.append(c[1])
images = np.asarray(im)
labels = np.asarray(lab)
I extended Python's random.shuffle() to take a second argument:
def shuffle_together(x, y):
    assert len(x) == len(y)
    for i in reversed(range(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = int(random.random() * (i + 1))
        x[i], x[j] = x[j], x[i]
        y[i], y[j] = y[j], y[i]
That way I can be sure that the shuffling happens in-place, and the function is not all too long or complicated.
Just use numpy...
First merge the two input arrays (the 1D array is the labels y and the 2D array is the data x) and shuffle them with NumPy's shuffle method. Finally, split them and return.
import numpy as np

def shuffle_2d(a, b):
    rows = a.shape[0]
    if b.shape != (rows, 1):
        b = b.reshape((rows, 1))
    S = np.hstack((b, a))
    np.random.shuffle(S)
    b, a = S[:, 0], S[:, 1:]
    return a, b

features, samples = 2, 5
x, y = np.random.random((samples, features)), np.arange(samples)
x, y = shuffle_2d(x, y)

Scipy optimize behaves differently with 1-d matrix vs. vector input, such that the 1-d matrix solution is wrong

I have experienced that feeding scipy.optimize a 1-d matrix (of shape (N,1)) gives different (wrong) results vs. giving it the same data in the form of vectors (the vectors are w and y in the MVE below):
import numpy as np
from scipy.optimize import minimize
X = np.array([[ 1.13042959,  0.45915372,  0.8007231 , -1.15704469,  0.42920652],
              [ 0.14131009,  0.9257914 ,  0.72182141,  0.86906652, -0.32328187],
              [-1.40969139,  1.32624329,  0.49157981,  0.2632826 ,  1.29010016],
              [-0.87733399, -1.55999729, -0.73784827,  0.15161383,  0.11189782],
              [-0.94649544,  0.10406324,  0.65316464, -1.37014083, -0.28934968]])
wtrue = np.array([3.14,2.78,-1,0, 1.6180])
y = X.dot(wtrue)
def cost_function(w, X, y):
    return np.mean(np.abs(y - X.dot(w)))
# %%
w0 = np.zeros(5)
output = minimize(cost_function, w0, args=(X, y), options={'disp':False, 'maxiter':128})
print('Vector Case:\n', output.x, '\n', output.fun)
# Reshaping w0 and y to (N,1) will 'break things'
w0 = np.zeros(5).reshape(-1,1)
y = y.reshape(-1,1) #This is the problem, only commenting this out will make below work
output = minimize(cost_function, w0, args=(X, y), options={'disp':False, 'maxiter':128})
print('1-d Matrix Case:\n', output.x, '\n', output.fun)
Gives
Vector Case:
 [ 3.13999999e+00  2.77999996e+00 -9.99999940e-01  1.79002338e-08  1.61800001e+00]
 1.7211226932545288e-08  // TRUE: almost 0
1-d Matrix Case:
 [-0.35218177 -0.50008129  0.34958755 -0.42210756  0.79680766]
 3.3810648518841924  // WRONG: nowhere close to the true solution
Does anyone know why the solution using the 1-d matrix inputs comes out 'wrong'?
I suspect that this is because somewhere along the way .minimize turns the parameter vector into an actual vector, and then I know that (2,) + (2,1) gives a (2,2) matrix rather than a (2,) or a (2,1). This still strikes me as 'weird', and I would like to know if I'm missing some bigger point here.
In [300]: y
Out[300]: array([ 4.7197293 , 1.7725223 , 0.85632763, -6.17272225, -3.8040323 ])
In [301]: w0
Out[301]: array([0., 0., 0., 0., 0.])
In [302]: cost_function(w0,X,y)
Out[302]: 3.465066756332
Initially changing the shape of y doesn't change the cost:
In [306]: cost_function(w0,X,y.reshape(-1,1))
Out[306]: 3.4650667563320003
Now get a solution:
In [308]: output = optimize.minimize(cost_function, w0, args=(X, y), options={'disp':False, 'maxiter':128})
In [310]: output.x
Out[310]:
array([ 3.14000001e+00,  2.77999999e+00, -9.99999962e-01, -5.58139763e-08,
        1.61799993e+00])
Evaluate the cost at the optimal x:
In [311]: cost_function(output.x,X,y)
Out[311]: 7.068144833866085e-08 # = output.fun
But with the reshaped y, the cost is different:
In [312]: cost_function(output.x,X,y.reshape(-1,1))
Out[312]: 4.377833258899681
The initial value x0 is flattened by the code (look at optimize.optimize._minimize_bfgs), so changing the shape of w0 doesn't matter. But the args arrays are passed to the cost function unchanged. So if changing the shape of y changes the cost calculation, it will change the optimization.
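To see concretely why the reshaped y changes the cost, here is a small sketch (my own toy numbers, not from the question) showing that subtracting an (N,)-shaped X.dot(w) from an (N, 1)-shaped y broadcasts to an (N, N) matrix, so np.mean(np.abs(...)) is no longer the loss you intended:
import numpy as np

pred = np.array([1.0, -2.0, 3.0])    # stands in for X.dot(w), shape (3,)
y_vec = np.array([0.5, 1.0, -1.0])   # shape (3,)
y_col = y_vec.reshape(-1, 1)         # shape (3, 1)

print((y_vec - pred).shape)          # (3,)   elementwise, as intended
print((y_col - pred).shape)          # (3, 3) broadcast to a full matrix

print(np.mean(np.abs(y_vec - pred))) # the intended mean absolute error
print(np.mean(np.abs(y_col - pred))) # a different quantity, averaged over 9 numbers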

What does data.norm() < 1000 do in PyTorch?

I am following the PyTorch tutorial here.
It says that
x = torch.randn(3, requires_grad=True)
y = x * 2
while y.data.norm() < 1000:
    y = y * 2
print(y)
Out:
tensor([-590.4467, 97.6760, 921.0221])
Could someone explain what data.norm() does here?
When I change .randn to .ones its output is tensor([ 1024., 1024., 1024.]).
It's simply the L2 norm (a.k.a Euclidean norm) of the tensor. Below is a reproducible illustration:
In [15]: x = torch.randn(3, requires_grad=True)
In [16]: y = x * 2
In [17]: y.data
Out[17]: tensor([-1.2510, -0.6302, 1.2898])
In [18]: y.data.norm()
Out[18]: tensor(1.9041)
# computing the norm using elementary operations
In [19]: torch.sqrt(torch.sum(torch.pow(y, 2)))
Out[19]: tensor(1.9041)
Explanation: First, it takes a square of every element in the input tensor x, then it sums them together, and finally it takes a square root of the resulting sum. All in all, these operations compute the so-called L2 or Euclidean norm.
Building on what @kmario23 says, the code multiplies the elements of a vector by 2 until the Euclidean magnitude (distance from the origin), i.e. the L2 norm, of the vector is at least 1000.
With the example of the vector (1, 1, 1): it doubles up to (512, 512, 512), where the L2 norm is about 886.81. This is less than 1000, so it gets multiplied by 2 again and becomes (1024, 1024, 1024). This has a magnitude greater than 1000, so the loop stops.
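A quick sketch to verify those two magnitudes (my own check; the first value also appears in the step-by-step answer below):
import torch

v512 = torch.full((3,), 512.0)
v1024 = torch.full((3,), 1024.0)

print(v512.norm())    # tensor(886.8100) -> still < 1000, so the loop doubles again
print(v1024.norm())   # about 1773.62    -> >= 1000, so the loop stops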
y.data.norm()
is equivalent to
torch.sqrt(torch.sum(torch.pow(y, 2)))
Let's break it down step by step to understand the code better.
The block of code below creates a tensor x of shape (3,):
x = torch.ones(3, requires_grad=True)
print(x)
>>> tensor([1., 1., 1.], requires_grad=True)
The below block of code creates a tensor y by multiplying each element of x by 2
y = x * 2
print(y)
print(y.requires_grad)
>>> tensor([2., 2., 2.], grad_fn=<MulBackward0>)
>>> True
tensor.data returns a tensor whose requires_grad is set to False:
print(y.data)
print('Type of y: ', type(y.data))
print('requires_grad: ', y.data.requires_grad)
>>> tensor([2., 2., 2.])
>>> Type of y: <class 'torch.Tensor'>
>>> requires_grad: False
torch.norm() returns the matrix norm or vector norm of a given tensor. By default it returns the Frobenius norm, a.k.a. the L2 norm, which is calculated as the square root of the sum of the squared elements.
In our example, since every element in y is 2, y.data.norm() returns 3.4641, since sqrt(2² + 2² + 2²) = sqrt(12) ≈ 3.4641.
print(y.data.norm())
>>> tensor(3.4641)
The loop below keeps running as long as the norm value is less than 1000:
while y.data.norm() < 1000:
    print('Norm value: ', y.data.norm(), 'y value: ', y.data)
    y = y * 2
>>> Norm value:  tensor(3.4641) y value:  tensor([2., 2., 2.])
>>> Norm value:  tensor(6.9282) y value:  tensor([4., 4., 4.])
>>> Norm value:  tensor(13.8564) y value:  tensor([8., 8., 8.])
>>> Norm value:  tensor(27.7128) y value:  tensor([16., 16., 16.])
>>> Norm value:  tensor(55.4256) y value:  tensor([32., 32., 32.])
>>> Norm value:  tensor(110.8512) y value:  tensor([64., 64., 64.])
>>> Norm value:  tensor(221.7025) y value:  tensor([128., 128., 128.])
>>> Norm value:  tensor(443.4050) y value:  tensor([256., 256., 256.])
>>> Norm value:  tensor(886.8100) y value:  tensor([512., 512., 512.])
>>>
>>> Final y value: tensor([1024., 1024., 1024.], grad_fn=<MulBackward0>)
