I am following the PyTorch tutorial here.
It says that
x = torch.randn(3, requires_grad=True)
y = x * 2
while y.data.norm() < 1000:
    y = y * 2
print(y)
Out:
tensor([-590.4467, 97.6760, 921.0221])
Could someone explain what data.norm() does here?
When I change .randn to .ones its output is tensor([ 1024., 1024., 1024.]).
It's simply the L2 norm (a.k.a Euclidean norm) of the tensor. Below is a reproducible illustration:
In [15]: x = torch.randn(3, requires_grad=True)
In [16]: y = x * 2
In [17]: y.data
Out[17]: tensor([-1.2510, -0.6302, 1.2898])
In [18]: y.data.norm()
Out[18]: tensor(1.9041)
# computing the norm using elementary operations
In [19]: torch.sqrt(torch.sum(torch.pow(y, 2)))
Out[19]: tensor(1.9041)
Explanation: First, it takes a square of every element in the input tensor x, then it sums them together, and finally it takes a square root of the resulting sum. All in all, these operations compute the so-called L2 or Euclidean norm.
Building on what @kmario23 says, the code multiplies the elements of a vector by 2 until the Euclidean magnitude (distance from the origin), i.e. the L2 norm, of the vector is at least 1000.
With the example vector (1, 1, 1): it grows to (512, 512, 512), whose L2 norm is about 886. Since that is less than 1000, it gets multiplied by 2 again and becomes (1024, 1024, 1024). This has a magnitude greater than 1000, so the loop stops.
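As a quick numerical check of those values (a small sketch, not part of the original answer):
import torch

print(torch.tensor([512., 512., 512.]).norm())     # ≈ 886.81 (= 512 * sqrt(3)), still < 1000
print(torch.tensor([1024., 1024., 1024.]).norm())  # ≈ 1773.62, i.e. >= 1000, so the loop stops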
y.data.norm()
is equivalent to
torch.sqrt(torch.sum(torch.pow(y, 2)))
Let's break it down step by step to understand the code better.
The block of code below creates a 1-D tensor x with 3 elements (shape (3,))
x = torch.ones(3, requires_grad=True)
print(x)
>>> tensor([1., 1., 1.], requires_grad=True)
The below block of code creates a tensor y by multiplying each element of x by 2
y = x * 2
print(y)
print(y.requires_grad)
>>> tensor([2., 2., 2.], grad_fn=<MulBackward0>)
>>> True
Accessing Tensor.data returns a tensor that shares the same underlying data but has requires_grad set to False
print(y.data)
print('Type of y: ', type(y.data))
print('requires_grad: ', y.data.requires_grad)
>>> tensor([2., 2., 2.])
>>> Type of y: <class 'torch.Tensor'>
>>> requires_grad: False
Tensor.norm() returns the matrix norm or vector norm of a given tensor. By default it returns the Frobenius norm, which for a vector is the same as the L2 norm, calculated as sqrt(y1² + y2² + ... + yn²).
In our example, since every element in y is 2, y.data.norm() returns 3.4641, because sqrt(2² + 2² + 2²) = sqrt(12) ≈ 3.4641
print(y.data.norm())
>>> tensor(3.4641)
The loop below runs as long as the norm value is less than 1000 (i.e. until it reaches 1000 or more)
while y.data.norm() < 1000:
    print('Norm value: ', y.data.norm(), 'y value: ', y.data)
    y = y * 2
print('Final y value: ', y)
>>> Norm value: tensor(3.4641) y value: tensor([2., 2., 2.])
>>> Norm value: tensor(6.9282) y value: tensor([4., 4., 4.])
>>> Norm value: tensor(13.8564) y value: tensor([8., 8., 8.])
>>> Norm value: tensor(27.7128) y value: tensor([16., 16., 16.])
>>> Norm value: tensor(55.4256) y value: tensor([32., 32., 32.])
>>> Norm value: tensor(110.8512) y value: tensor([64., 64., 64.])
>>> Norm value: tensor(221.7025) y value: tensor([128., 128., 128.])
>>> Norm value: tensor(443.4050) y value: tensor([256., 256., 256.])
>>> Norm value: tensor(886.8100) y value: tensor([512., 512., 512.])
>>>
>>> Final y value: tensor([1024., 1024., 1024.], grad_fn=<MulBackward0>)
I want to compute the discrete difference of an identity matrix.
The code below uses NumPy and SciPy.
import numpy as np
from scipy.sparse import identity
from scipy.sparse import csc_matrix
x = identity(4).toarray()
y = csc_matrix(np.diff(x, n=2))
print(y)
I would like to improve performance or memory usage.
Since an identity matrix contains mostly zeros, performing the calculation in compressed sparse column (CSC) format would reduce memory usage. However, np.diff() does not accept the CSC format, so converting between CSC and the normal (dense) format using csc_matrix would slow it down a bit.
Normal format
x = identity(4).toarray()
print(x)
[[1. 0. 0. 0.]
[0. 1. 0. 0.]
[0. 0. 1. 0.]
[0. 0. 0. 1.]]
csc format
x = identity(4)
print(x)
(0, 0) 1.0
(1, 1) 1.0
(2, 2) 1.0
(3, 3) 1.0
Thanks
Here is my hacky solution to get the sparse matrix as you want.
L - the length of the original identity matrix,
n - the parameter of np.diff.
In your question they are:
L = 4
n = 2
My code produces the same y as your code, but without the conversions between csc and normal formats.
Your code:
from scipy.sparse import identity, csc_matrix
x = identity(L).toarray()
y = csc_matrix(np.diff(x, n=n))
My code:
import numpy as np
from scipy.linalg import pascal
from scipy.sparse import csc_matrix

def get_data(n, L):
    nums = pascal(n + 1, kind='lower')[-1].astype(float)
    minuses_from = n % 2 + 1
    nums[minuses_from::2] *= -1
    return np.tile(nums, L - n)

data = get_data(n, L)
row_ind = (np.arange(n + 1) + np.arange(L - n).reshape(-1, 1)).flatten()
col_ind = np.repeat(np.arange(L - n), n + 1)
y = csc_matrix((data, (row_ind, col_ind)), shape=(L, L - n))
I have noticed that after applying a difference of order n to the identity matrix, each column contains the binomial coefficients with alternating signs. This is my variable data.
Then I am just constructing the csc_matrix.
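For reference, here is a quick check (a sketch using the question's L = 4 and n = 2, not part of the original answer) that both constructions produce the same matrix:
import numpy as np
from scipy.sparse import identity, csc_matrix
from scipy.linalg import pascal

L, n = 4, 2

# reference result via the dense route from the question
y_ref = csc_matrix(np.diff(identity(L).toarray(), n=n))

# sparse construction from the binomial coefficients, as described above
nums = pascal(n + 1, kind='lower')[-1].astype(float)
nums[n % 2 + 1::2] *= -1
data = np.tile(nums, L - n)
row_ind = (np.arange(n + 1) + np.arange(L - n).reshape(-1, 1)).flatten()
col_ind = np.repeat(np.arange(L - n), n + 1)
y = csc_matrix((data, (row_ind, col_ind)), shape=(L, L - n))

print((y.toarray() == y_ref.toarray()).all())   # expected: True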
Unfortunately, it does not seem that SciPy provides any tools for this kind of sparse matrix manipulation. Regardless, by cleverly manipulating the indices and data of the entries one can emulate np.diff(x,n) in a straightforward fashion.
Given a 2D NumPy array (matrix) of dimension MxN, np.diff() multiplies each column (of column index y) by -1 and adds the next column (column index y+1) to it. A difference of order k is just the iterative application of k differences of order 1, and a difference of order 0 simply returns the input matrix.
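For instance (a small illustration, not from the original answer), a first-order difference of a 3x3 identity matrix subtracts each column from the next one:
import numpy as np

a = np.eye(3)
print(np.diff(a, n=1))
# [[-1.  0.]
#  [ 1. -1.]
#  [ 0.  1.]]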
The method below makes use of this: it iteratively computes differences of order 1, merging duplicate entries by addition through sum_duplicates(), reducing the number of columns by one, and filtering out invalid indices.
import numpy as np
from scipy.sparse import csc_matrix

def csc_diff(x, n):
    '''Emulates np.diff(x, n) for a sparse matrix by iteratively taking differences of order 1'''
    assert isinstance(x, csc_matrix) or (isinstance(x, np.ndarray) and len(x.shape) == 2), "Input matrix must be a 2D np.ndarray or csc_matrix."
    assert isinstance(n, int) and n >= 0, "Integer n must be larger or equal to 0."

    if n >= x.shape[1]:
        return csc_matrix(([], ([], [])), shape=(x.shape[0], 0))

    if isinstance(x, np.ndarray):
        x = csc_matrix(x)

    # set-up of data/indices via column-wise difference
    if n > 0:
        for k in range(1, n + 1):
            # extract data/indices of non-zero entries of the (current) sparse matrix
            M, N = x.shape
            idx, idy = x.nonzero()
            dat = x.data

            # difference: this column (y) * (-1) + next column (y+1)
            idx = np.concatenate((idx, idx))
            idy = np.concatenate((idy, idy - 1))
            dat = np.concatenate(((-1) * dat, dat))

            # filter valid indices
            validInd = (0 <= idy) & (idy < N - 1)

            # x_diff: csc_matrix emulating np.diff(x, 1)'s output
            x_diff = csc_matrix((dat[validInd], (idx[validInd], idy[validInd])), shape=(M, N - 1))
            x_diff.sum_duplicates()

            x = x_diff

    return x
Moreover, the method outputs an empty csc_matrix of dimension Mx0 when the difference order is larger than or equal to the number of columns of the input matrix, mirroring np.diff's behaviour. The output is identical to np.diff's, see
csc_diff(x, 2).toarray()
> array([[ 1., 0.],
[-2., 1.],
[ 1., -2.],
[ 0., 1.]])
which is identical to
np.diff(x.toarray(), 2)
> array([[ 1., 0.],
[-2., 1.],
[ 1., -2.],
[ 0., 1.]])
This identity holds for other difference orders, too
(csc_diff(x, 0).toarray() == np.diff(x.toarray(), 0)).all()
>True
(csc_diff(x, 3).toarray() == np.diff(x.toarray(), 3)).all()
>True
(csc_diff(x, 13).toarray() == np.diff(x.toarray(), 13)).all()
>True
Q1.
I'm trying to write a custom autograd function with PyTorch,
but I have a problem deriving the analytical backpropagation for y = x / sum(x, dim=0),
where the size of the tensor x is (Height, Width) (x is 2-dimensional).
Here's my code
class MyFunc(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        input = input / torch.sum(input, dim=0)
        return input

    @staticmethod
    def backward(ctx, grad_output):
        input = ctx.saved_tensors[0]
        H, W = input.size()
        sum = torch.sum(input, dim=0)
        grad_input = grad_output * (1/sum - input*1/sum**2)
        return grad_input
I used gradcheck (from torch.autograd) to compare the Jacobian matrices,
from torch.autograd import gradcheck
func = MyFunc.apply
input = (torch.randn(3,3,dtype=torch.double,requires_grad=True))
test = gradcheck(func, input)
and the result was that gradcheck failed (the analytical and numerical Jacobians did not match).
Could someone please help me get the correct backpropagation result?
Thanks!
Q2.
Thanks for the answers!
With your help, I was able to implement backpropagation for the (H, W) case.
However, when I implemented backpropagation for an (N, H, W) tensor, I ran into a problem.
I think the problem is the initialization of the new tensor.
Here's my new code
import torch
import torch.nn as nn
import torch.nn.functional as F
class MyFunc(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        N = input.size(0)
        for n in range(N):
            input[n] /= torch.sum(input[n], dim=0)
        return input

    @staticmethod
    def backward(ctx, grad_output):
        input = ctx.saved_tensors[0]
        N, H, W = input.size()
        I = torch.eye(H).unsqueeze(-1)
        sum = input.sum(1)
        grad_input = torch.zeros((N, H, W), dtype=torch.double, requires_grad=True)
        for n in range(N):
            grad_input[n] = ((sum[n] * I - input[n]) * grad_output[n] / sum[n]**2).sum(1)
        return grad_input
Gradcheck code is
from torch.autograd import gradcheck
func = MyFunc.apply
input = (torch.rand(2,2,2,dtype=torch.double,requires_grad=True))
test = gradcheck(func, input)
print(test)
and the result is an error.
I don't know why the error occurs...
Your help would be very valuable for implementing my own convolutional network.
Thanks! Have a nice day.
Let's look at an example with a single column, for instance: [[x1], [x2], [x3]].
Let sum be x1 + x2 + x3; then normalizing x gives y = [[y1], [y2], [y3]] = [[x1/sum], [x2/sum], [x3/sum]]. You're looking for dL/dx1, dL/dx2, and dL/dx3 - we'll just write them as dx1, dx2, and dx3. Same for all dL/dyi.
So dx1 is equal to dL/dy1*dy1/dx1 + dL/dy2*dy2/dx1 + dL/dy3*dy3/dx1. That's because x1 contributes to all output elements in the corresponding column: y1, y2, and y3.
We have:
dy1/dx1 = d(x1/sum)/dx1 = (sum - x1)/sum²
dy2/dx1 = d(x2/sum)/dx1 = -x2/sum²
similarly, dy3/dx1 = d(x3/sum)/dx1 = -x3/sum²
Therefore dx1 = (sum - x1)/sum²*dy1 - x2/sum²*dy2 - x3/sum²*dy3, and the same goes for dx2 and dx3. As a result, the Jacobian has diagonal entries dyi/dxi = (sum - xi)/sum² and off-diagonal entries dyj/dxi = -xj/sum² (for all j different from i).
In your implementation, you seem to be missing all non-diagonal components.
Keeping the same one-column example, with x1=2, x2=3, and x3=5:
>>> x = torch.tensor([[2.], [3.], [5.]])
>>> sum = x.sum(0)
tensor([10.])
The Jacobian will be:
>>> J = (sum*torch.eye(x.size(0)) - x)/sum**2
tensor([[ 0.0800, -0.0200, -0.0200],
[-0.0300, 0.0700, -0.0300],
[-0.0500, -0.0500, 0.0500]])
For an implementation with multiple columns, it's a bit trickier, more specifically regarding the shape of the diagonal matrix. It's easier to keep the column axis last so we don't have to bother with broadcasting:
>>> x = torch.tensor([[2., 1], [3., 3], [5., 5]])
>>> sum = x.sum(0)
tensor([10., 9.])
>>> diag = sum*torch.eye(3).unsqueeze(-1).repeat(1, 1, len(sum))
tensor([[[10., 9.],
[ 0., 0.],
[ 0., 0.]],
[[ 0., 0.],
[10., 9.],
[ 0., 0.]],
[[ 0., 0.],
[ 0., 0.],
[10., 9.]]])
Above diag has a shape of (3, 3, 2) where the two columns are on the last axis. Notice how we didn't need to broadcast sum.
What I wouldn't have done is torch.eye(3).unsqueeze(0).repeat(len(sum), 1, 1), since with this kind of shape - (2, 3, 3) - you would have to use sum[:, None, None] and would need further broadcasting down the road...
The Jacobian is simply:
>>> J = (diag - x)/sum**2
tensor([[[ 0.0800, 0.0988],
[-0.0300, -0.0370],
[-0.0500, -0.0617]],
[[-0.0200, -0.0123],
[ 0.0700, 0.0741],
[-0.0500, -0.0617]],
[[-0.0200, -0.0123],
[-0.0300, -0.0370],
[ 0.0500, 0.0494]]])
You can check the results by backpropagating through the operation using an arbitrary dy vector (not with torch.ones though, you'll get 0s because of J!). After backpropagating, x.grad should be equal to torch.einsum('abc,bc->ac', J, dy).
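Here is a minimal sketch of that check for the single-column example above (the dy values and variable names are ours, not from the original answer):
import torch

x = torch.tensor([[2.], [3.], [5.]], requires_grad=True)
y = x / x.sum(0)                        # the normalization being differentiated

# Jacobian built from detached values, as derived above: J[i, j] = dyi/dxj
xd = x.detach()
s = xd.sum(0)
J = (s*torch.eye(3) - xd)/s**2

dy = torch.tensor([[1.], [-2.], [0.5]])  # arbitrary upstream gradient (not all ones)
y.backward(dy)

# x.grad[j] = sum_i dy[i] * J[i, j], i.e. J.t() @ dy
print(torch.allclose(x.grad, J.t() @ dy))   # expected: True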
Your Jacobian is not accurate: it is a 4D tensor, but you only computed a 2D slice of it.
You neglected the second row of the Jacobian.
Answer for Q2.
I implemented backpropagation myself for the batched case.
I used the unsqueeze function and it worked.
Size of the input: (N, H, W), where N is the batch size.
forward:
out = input / torch.sum(input, dim=1).unsqueeze(1)
backward:
diag = torch.eye(input.size(1), dtype=torch.double, requires_grad=True).unsqueeze(-1)
sum = input.sum(1)
grad_input = ((sum.unsqueeze(1).unsqueeze(1) * diag - input.unsqueeze(1)) * grad_out.unsqueeze(1) / (sum**2).unsqueeze(1).unsqueeze(1)).sum(2)
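For completeness, here is a sketch of how this forward/backward pair could be packaged into a custom Function and verified with gradcheck (the class name and the shift of the test input away from zero are ours, not from the original post):
import torch
from torch.autograd import gradcheck

class NormalizeDim1(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input / torch.sum(input, dim=1).unsqueeze(1)

    @staticmethod
    def backward(ctx, grad_out):
        input, = ctx.saved_tensors
        diag = torch.eye(input.size(1), dtype=input.dtype).unsqueeze(-1)   # (H, H, 1)
        sum = input.sum(1)                                                 # (N, W)
        return ((sum.unsqueeze(1).unsqueeze(1) * diag - input.unsqueeze(1))
                * grad_out.unsqueeze(1)
                / (sum**2).unsqueeze(1).unsqueeze(1)).sum(2)

# shifted away from 0 so the division stays well-conditioned for the numerical check
x = (torch.rand(2, 2, 2, dtype=torch.double) + 0.5).requires_grad_()
print(gradcheck(NormalizeDim1.apply, x))   # expected: True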
I have the following code:
import torch
d = 2
n = 50
X = torch.randn(n,d)
z = torch.tensor([[-1.0], [2.0]])
y = X @ z
X.size()
z.size()
y.size()
The output is:
torch.Size([50, 2])
torch.Size([2, 1])
torch.Size([50, 1])
My question is: why, after broadcasting, is the size of the result y [50, 1] rather than [50, 2]? I think it should be [50, 2]; am I correct?
The @ operator is not broadcasting but matrix multiplication.
In Python 3.5, the @ operator was introduced for matrix
multiplication, following PEP 465. This is implemented e.g. in NumPy
as the matmul operator.
So the size of y is fine:
multiplying a matrix of size [50, 2] with a vector of size [2, 1] outputs a vector of size [50, 1].
An example showing it more clearly is:
import torch
xx = torch.ones(3, 2)
zz = torch.tensor([[-1.0], [2.0]])
yy = xx @ zz
print(xx)
print(zz)
print(yy)
# tensor([[1., 1.],
# [1., 1.],
# [1., 1.]])
# tensor([[-1.],
# [ 2.]])
# tensor([[1.],
# [1.],
# [1.]])
As you can see, the third output is indeed just the matrix multiplication of the two tensors.
If you wish to do broadcasting, I recommend referring to https://medium.com/ai%C2%B3-theory-practice-business/understanding-broadcasting-in-pytorch-ca9e9533f05f
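If element-wise broadcasting is what you expected, one possibility (a small sketch, assuming you wanted each column of X scaled by the corresponding entry of z) is to transpose z so its shape (1, 2) broadcasts against X's shape (50, 2):
import torch

X = torch.randn(50, 2)
z = torch.tensor([[-1.0], [2.0]])   # shape (2, 1)

y_broadcast = X * z.t()             # (50, 2) * (1, 2) -> broadcast -> (50, 2)
print(y_broadcast.size())           # torch.Size([50, 2])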
I want to vectorize a function f(a, b) so that, when I enter a and b as two vectors, the tensor of combinations is returned. Here is an illustrative example:
import numpy as np
def tester(a, b):
    mysumm = 0.
    for ii in range(a):
        for jj in range(b):
            mysumm += a * b
    return mysumm
tester = np.vectorize(tester)
x, y = [2, 4], [3, 5, 8]
print(tester(x, 3)) # [ 36. 144.]
print(tester(x, 5)) # [100. 400.]
print(tester(x, 8)) # [ 256. 1024.]
print(tester(2, y)) # [ 36. 100. 256.]
print(tester(4, y)) # [ 144. 400. 1024.]
print(tester(x, y)) # ValueError: operands could not be broadcast together with shapes (2,) (3,)
I expected the tester(x, y) call to return a 2x3 matrix, something like [[ 36. 100. 256.], [ 144. 400. 1024.]] and I was surprised that this isn't the default behavior.
How can I make the vectorized function return the tensor of possible combinations of the input vectors?
You can chain np.vectorize with np.ix_:
>>> import functools
>>>
>>> def tensorize(f):
...     fv = np.vectorize(f)
...     @functools.wraps(f)
...     def ft(*args):
...         return fv(*np.ix_(*map(np.ravel, args)))
...     return ft
...
>>> tester = tensorize(tester)
>>> tester(np.arange(3), np.arange(2))
array([[0., 0.],
[0., 1.],
[0., 4.]])
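Applied to the lists from the question, this should give the 2x3 matrix of combinations that was expected (output written out by hand, not pasted from a run):
>>> x, y = [2, 4], [3, 5, 8]
>>> tester(x, y)
array([[  36.,  100.,  256.],
       [ 144.,  400., 1024.]])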
I have an N*N matrix:
import random
import scipy.sparse

N = 3
x = scipy.sparse.lil_matrix((N, N))
for _ in xrange(N):
    x[random.randint(0, N-1), random.randint(0, N-1)] = random.randint(1, 100)
Assume the matrix looks as below:
X Y Z
X 0 [2,3] [1,4]
Y [2,3] 0 0
Z [1,4] 0 0
How do I add an (N+1)-th vertex without disturbing the existing values?
X Y Z A
X 0 [2,3] [1,4] 0
Y [2,3] 0 0 0
Z [1,4] 0 0 [1]
Would the entire matrix need to be re-constructed?
When I try vstack to add a new row, I get an error:
>>> import scipy.sparse as sp
>>> c=sp.coo_matrix(x)
>>> c.todense()
matrix([[ 1., 3., 5.],
[ 2., 6., 4.],
[ 8., 2., 10.]])
>>> sp.vstack([c,sp.coo_matrix(1,3)])
Traceback (most recent call last):
File "<pyshell#41>", line 1, in <module>
sp.vstack([c,sp.coo_matrix(1,3)])
File "c:\working\QZPkgs\eggs\scipy-0.10.1-py2.6-win32.egg\scipy\sparse\construct.py", line 293, in vstack
return bmat([ [b] for b in blocks ], format=format, dtype=dtype)
File "c:\working\QZPkgs\eggs\scipy-0.10.1-py2.6-win32.egg\scipy\sparse\construct.py", line 355, in bmat
raise ValueError('blocks[:,%d] has incompatible column dimensions' % j)
ValueError: blocks[:,0] has incompatible column dimensions
There are a number of ways to do this, depending on what you are expecting the matrix to look like after you add to it. If you want to add a row to the matrix, use sparse.vstack:
from scipy import sparse
from numpy import random
N=3
x = sparse.lil_matrix( (N,N) )
for _ in xrange(N):
    x[random.randint(0,N-1), random.randint(0,N-1)] = random.randint(1,100)
x = sparse.vstack([x, sparse.lil_matrix((1,3))])
If you wanted to add a degree of freedom to a linear system, so that the resulting matrix would be square, use sparse.bmat instead of sparse.vstack:
x = sparse.bmat([ [x, None], [None,sparse.lil_matrix((1,1))] ])
In the first example, x is expanded from 3x3 to 4x3, while in the second example x is expanded to 4x4.
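A quick shape check of the two variants (a small sketch reusing the names from above):
from scipy import sparse

x = sparse.lil_matrix((3, 3))
x_rows = sparse.vstack([x, sparse.lil_matrix((1, 3))])                  # adds a row only
x_square = sparse.bmat([[x, None], [None, sparse.lil_matrix((1, 1))]])  # adds a row and a column

print(x_rows.shape)    # (4, 3)
print(x_square.shape)  # (4, 4)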
Looks like you are not assigning the output of todense().
Try:
c_dense = c.todense()
sp.vstack([c_dense, sp.coo_matrix((1, 3))])   # note: (1, 3) passed as a shape tuple creates an empty 1x3 sparse matrix