dot product and diagonal and multidimensional matrices - python

I just want to get the dot product of some sets of multidimensional data.
For simplicity, I am posting the pieces small, and demonstrating my efforts
thus far.
To just get 'a' dot 'q', and the 4 numbers that I want is easy enough.
import numpy as np
a = np.arange(1,4) # shape = (3,)
q = np.array([[x, x, x] for x in range(4)])+1 # shape = (4, 3)
c = np.dot(a, q.T) # array([ 6, 12, 18, 24]) shape = (4,)
If I want to add another set to 'a', I can expand the dimensions. Again, pretty easy. The dot product simply reflects the additional dimension.
a = np.arange(1,4).reshape(1,3) # shape = (1,3)
c = np.dot(a, q.T) # array([[ 6, 12, 18, 24]]) shape = (1,4)
and the other set...
a = np.vstack((a,a+1)) # shape = (2,3)
c = np.dot(a, q.T) # array([[ 6, 12, 18, 24], [ 9, 18, 27, 36]]) shape = (2,4)
To add another dimension to q, the transpose needs to be a little more complicated.
q = np.expand_dims(q, axis=0) # shape = (1, 4, 3)
c = np.dot(a, np.transpose(q, (0, 2, 1))) # shape = (2, 1, 4)
now stack 'q' matrix
q = np.vstack((q, q+1)) # shape = (2, 4, 3)
c = np.dot(a, np.transpose(q, (0, 2, 1))) # shape = (2, 2, 4)
Though, what I am going for is the diagonal of c. While I have not tried it yet, I am imagining that when 'a' and 'q' start to reach >(2000, 3) and >(2000, 4, 3) c will be (2000, 2000, 4) and I only need 1/2000th of that. Does anyone know how to make this more efficient than doing the calculation and then taking the diagonal?
Again, what I want is...
c = np.dot(a, np.transpose(q, (0, 2, 1)))
c = c[np.arange(2), np.arange(2)]
or
c[0] = np.dot(a[0:1], np.transpose(q[0:1], (0, 2, 1)))
c[1] = np.dot(a[1:2], np.transpose(q[1:2], (0, 2, 1)))
but without having to make the enormous matrix first and then trim it later.
I have read a couple of other, somewhat similar questions, though I hope this question is perceived to be more complicated than a dot product of the same vector and its diagonal. Also, if the answer is np.einsum(), could you explain the process a bit more than the NumPy docs do?

I reposted the question, with the einsum() entries at each c. In fact, Alexander Korovin linked to an excellent einsum summary.
I just want to get the dot product of some sets of multidimensional data.
For simplicity, I am posting the pieces small, and demonstrating my efforts
thus far.
To just get 'a' dot 'q', and the 4 numbers that I want is easy enough.
import numpy as np
a = np.arange(1,4) # shape = (3,)
q = np.array([[x, x, x] for x in range(4)])+1 # shape = (4, 3)
c = np.dot(a, q.T) # array([ 6, 12, 18, 24]) shape = (4,)
c = np.einsum('i,ji->j', a, q)
If I want to add another set to 'a', I can expand the dimensions. Again, pretty easy. The dot product simply reflects the additional dimension.
a = np.arange(1,4).reshape(1,3) # shape = (1,3)
c = np.dot(a, q.T) # array([[ 6, 12, 18, 24]]) shape = (1,4)
c = np.einsum('ij,gj->ig', a, q) # shape = (1,4), matches np.dot(a, q.T)
and the other set...
a = np.vstack((a,a+1)) # shape = (2,3)
c = np.dot(a, q.T) # array([[ 6, 12, 18, 24], [ 9, 18, 27, 36]]) shape = (2,4)
c = np.einsum('ij,gj->ig', a, q)
To add another dimension to q, the transpose needs to be a little more complicated.
q = np.expand_dims(q, axis=0) # shape = (1, 4, 3)
c = np.dot(a, np.transpose(q, (0, 2, 1))) # shape = (2, 1, 4)
c = np.einsum('ij,fgj->ifg', a, q) # shape = (2, 1, 4), same axis order as np.dot
now stack 'q' matrix
q = np.vstack((q, q+1)) # shape = (2, 4, 3)
c = np.dot(a, np.transpose(q, (0, 2, 1))) # shape = (2, 2, 4)
c = np.einsum('ij,fgj->ifg', a, q) # shape = (2, 2, 4), same axis order as np.dot
Though, what I am going for is the diagonal of c. While I have not tried it yet, I am imagining that when 'a' and 'q' start to reach >(2000, 3) and >(2000, 4, 3) c will be (2000, 2000, 4) and I only need 1/2000th of that. Does anyone know how to make this more efficient than doing the calculation and then taking the diagonal?
Again, what I want is...
c = np.dot(a, np.transpose(q, (0, 2, 1)))
c = c[np.arange(2), np.arange(2)]
or
c[0] = np.dot(a[0:1], np.transpose(q[0:1], (0, 2, 1)))
c[1] = np.dot(a[1:2], np.transpose(q[1:2], (0, 2, 1)))
but without having to make the enormous matrix first and then trim it later.
So do this...
c = np.einsum('ik,ijk->ij', a, q)
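To unpack the subscripts in 'ik,ijk->ij': i appears in both inputs and in the output, so a[i] is paired only with q[i] (that pairing is the "diagonal"); k appears in both inputs but not in the output, so it is summed over; j appears only in q and the output, so it is kept. Here is a minimal sketch, using the small arrays from the question, that checks this against the full product followed by the diagonal. The names full, diag, direct and manual are just for illustration.

```python
import numpy as np

# Same small data as in the question
a = np.vstack((np.arange(1, 4), np.arange(2, 5)))    # shape (2, 3)
q0 = np.array([[x, x, x] for x in range(4)]) + 1     # shape (4, 3)
q = np.stack((q0, q0 + 1))                           # shape (2, 4, 3)

# Wasteful route: full (2, 2, 4) product, then keep only the diagonal
full = np.dot(a, np.transpose(q, (0, 2, 1)))
diag = full[np.arange(2), np.arange(2)]              # shape (2, 4)

# Direct route: pair a[i] with q[i], sum over k, keep j
direct = np.einsum('ik,ijk->ij', a, q)               # shape (2, 4)

# The same thing written as a broadcast multiply followed by a sum
manual = (a[:, None, :] * q).sum(axis=-1)            # shape (2, 4)

print(direct)
```

With 2000 rows the full product would hold 2000 * 2000 * 4 values, while the direct version only ever produces 2000 * 4.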

Related

NumPy: Concatenating 1D array to 3D array

Suppose I have a 5x10x3 array, which I interpret as 5 'sub-arrays', each consisting of 10 rows and 3 columns. I also have a separate 1D array of length 5, which I call b.
I am trying to insert a new column into each sub-array, where the column inserted into the ith (i=0,1,2,3,4) sub-array is a 10x1 vector where each element is equal to b[i].
For example:
import numpy as np
np.random.seed(777)
A = np.random.rand(5,10,3)
b = np.array([2,4,6,8,10])
A[0] should look like:
A[1] should look like:
And similarly for the other 'sub-arrays'.
(Notice b[0]=2 and b[1]=4)
What about this?
# Make an array B with the same number of dimensions as A
B = np.tile(b, (1, 10, 1)).transpose(2, 1, 0) # shape: (5, 10, 1)
# Concatenate both
np.concatenate([A, B], axis=-1) # shape: (5, 10, 4)
One method would be np.pad:
np.pad(A, ((0,0),(0,0),(0,1)), 'constant', constant_values=[[[],[]],[[],[]],[[],b[:, None,None]]])
# array([[[9.36513084e-01, 5.33199169e-01, 1.66763960e-02, 2.00000000e+00],
# [9.79060284e-02, 2.17614285e-02, 4.72452812e-01, 2.00000000e+00],
# etc.
Or (more typing but probably faster):
i,j,k = A.shape
res = np.empty((i,j,k+1), np.result_type(A, b))
res[...,:-1] = A
res[...,-1] = b[:, None]
Or dstack after broadcast_to:
np.dstack([A, np.broadcast_to(b[:, None], A.shape[:2])])
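As a quick sanity check, here is a small sketch (not from the original answers) that builds the result three ways and confirms they agree. It uses np.random.default_rng instead of the question's np.random.seed, so the random values in A will differ from the ones shown above; the names col, res_concat, res_fill and res_dstack are illustrative.

```python
import numpy as np

rng = np.random.default_rng(777)
A = rng.random((5, 10, 3))
b = np.array([2, 4, 6, 8, 10])

# 1) Broadcast b to a (5, 10, 1) column and concatenate on the last axis
col = np.broadcast_to(b[:, None, None], A.shape[:2] + (1,))
res_concat = np.concatenate([A, col], axis=-1)

# 2) Preallocate the output and fill it in two assignments
i, j, k = A.shape
res_fill = np.empty((i, j, k + 1), np.result_type(A, b))
res_fill[..., :-1] = A
res_fill[..., -1] = b[:, None]

# 3) dstack promotes the (5, 10) array to (5, 10, 1) before joining
res_dstack = np.dstack([A, np.broadcast_to(b[:, None], A.shape[:2])])

print(res_concat.shape)
```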

np.dot product between two 3D matrices along specified axis

I have two 3D matrices:
a = np.random.normal(size=[3,2,5])
b = np.random.normal(size=[5,2,3])
I want the dot product of each slice along 2 and 0 axes respectively:
c = np.zeros([3,3,5]) # c.size is 45
c[:,:,0] = a[:,:,0].dot(b[0,:,:])
c[:,:,1] = a[:,:,1].dot(b[1,:,:])
...
I would like to do that using np.tensordot (for efficiency and speed)
I have tried:
c = np.tensordot(a, b, axes=[2,0])
but I get a 4D array with 36 elements (instead of 45). c.shape, c.size = ((3L, 2L, 2L, 3L), 36). I have found a similar question here (Numpy tensor: Tensordot over frontal slices of tensor) but it's not exactly what I want, and I was unable to extrapolate that solution to my problem.
To summarise, can I use np.tensordot to compute the c array shown above?
Update #1
The answer by @hpaulj is what I wanted, however on my system (python 2.7 and np 1.13.3) those approaches are pretty slow:
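import time
import numpy as np

# Note: time.clock() is kept from the original (Python 2.7);
# it was removed in Python 3.8, where time.perf_counter() is the replacement.
n = 3000
a = np.random.normal(size=[n, 20, 5])
b = np.random.normal(size=[5, 20, n])
# Baseline: time one slice and multiply by 5
t = time.clock()
c_slice = a[:,:,0].dot(b[0,:,:])
print('one slice_x_5: {:.3f} seconds'.format( (time.clock()-t)*5 ))
# Explicit loop over the 5 slices
t = time.clock()
c = np.zeros([n, n, 5])
for i in range(5):
    c[:,:,i] = a[:,:,i].dot(b[i,:,:])
print('for loop: {:.3f} seconds'.format(time.clock()-t))
t = time.clock()
d = np.einsum('abi,ibd->adi', a, b)
print('einsum: {:.3f} seconds'.format(time.clock()-t))
# tensordot computes all 5x5 slice pairs, then the diagonal is kept
t = time.clock()
e = np.tensordot(a,b,[1,1])
e1 = e.transpose(0,3,1,2)[:,:,np.arange(5),np.arange(5)]
print('tensordot: {:.3f} seconds'.format(time.clock()-t))
# matmul treats the leading axis (size 5) as a batch dimension
a = a.transpose(2,0,1)
t = time.clock()
f = np.matmul(a,b)
print('matmul: {:.3f} seconds'.format(time.clock()-t))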
It's easier to work with einsum than tensordot. So let's start there:
In [469]: a = np.random.normal(size=[3,2,5])
...: b = np.random.normal(size=[5,2,3])
...:
In [470]: c = np.zeros([3,3,5]) # c.size is 45
In [471]: for i in range(5):
...: c[:,:,i] = a[:,:,i].dot(b[i,:,:])
...:
In [472]: d = np.einsum('abi,ibd->iad', a, b)
In [473]: d.shape
Out[473]: (5, 3, 3)
In [474]: d = np.einsum('abi,ibd->adi', a, b)
In [475]: d.shape
Out[475]: (3, 3, 5)
In [476]: np.allclose(c,d)
Out[476]: True
I had to think a bit about how to match up the dimensions. It helped to focus on a[:,:,i] as 2d, and similarly for b[i,:,:]. So the dot sum is over the middle dimension of both arrays (size 2).
In testing ideas it might help if the first 2 dimensions of c were different. There'd be less chance of mixing them up.
It's easy to specify the dot summation axis (axes) in tensordot, but harder to constrain the handling of the other dimensions. That's why you get a 4d array.
I can get it to work with a transpose, followed by taking the diagonal:
In [477]: e = np.tensordot(a,b,[1,1])
In [478]: e.shape
Out[478]: (3, 5, 5, 3)
In [479]: e1 = e.transpose(0,3,1,2)[:,:,np.arange(5),np.arange(5)]
In [480]: e1.shape
Out[480]: (3, 3, 5)
In [481]: np.allclose(c,e1)
Out[481]: True
I've calculated a lot more values than needed, and thrown most of them away.
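To make the waste concrete, here is a small sketch (an illustration, not part of the original session) that compares the size of the tensordot intermediate with the result that is actually needed. It uses np.random.default_rng rather than the np.random.normal calls above, and it takes the diagonal with a repeated einsum index as an alternative to the fancy indexing shown.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(size=(3, 2, 5))
b = rng.normal(size=(5, 2, 3))

# tensordot sums over the size-2 axes and keeps every other axis:
# a's remaining axes (3, 5) followed by b's remaining axes (5, 3)
e = np.tensordot(a, b, axes=[1, 1])       # shape (3, 5, 5, 3)

# A repeated index within one operand selects the diagonal of those axes
e_diag = np.einsum('aiid->adi', e)        # shape (3, 3, 5)

# The direct einsum never builds the 5x5 block of slice pairs
d = np.einsum('abi,ibd->adi', a, b)       # shape (3, 3, 5)

print(e.size, d.size)                     # 225 values computed, 45 needed
```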
matmul with some transposing might work better.
In [482]: f = a.transpose(2,0,1)@b
In [483]: f.shape
Out[483]: (5, 3, 3)
In [484]: np.allclose(c, f.transpose(1,2,0))
Out[484]: True
I think of the 5 dimension as 'going-along-for-ride'. That's what your loop does. In einsum the i is the same in all parts.
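Here is a self-contained sketch of the loop, einsum and matmul versions side by side. Following the suggestion above, it assumes a b whose last dimension is 4 instead of 3, so the two non-batch output axes cannot be mixed up; that change and the use of np.random.default_rng are my own choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(3, 2, 5))
b = rng.normal(size=(5, 2, 4))   # last axis is 4 so output axes are distinguishable

# Reference: one 2d dot product per slice
c = np.zeros((3, 4, 5))
for i in range(5):
    c[:, :, i] = a[:, :, i].dot(b[i])

# einsum: i is carried along, the shared middle axis (label b) is summed
d = np.einsum('abi,ibd->adi', a, b)

# matmul: move the size-5 axis to the front so it acts as the batch axis,
# then move it back to the end to match c
f = np.matmul(a.transpose(2, 0, 1), b).transpose(1, 2, 0)

print(c.shape, d.shape, f.shape)
```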

np.concatenate a list of numpy.ndarray in new dimension?

I have a list with numpy.ndarrays - each of shape (33,1,8,45,3)
The problem is that when I concatenate the list using a = np.concatenate(list)
The output shape of a becomes
print a.shape
(726,1,8,45,3)
instead of shape (22,33,1,8,45,3).
How do I cleanly concatenate the list, without having to change the input?
You can use numpy.array() or numpy.stack():
import numpy
a = [numpy.random.rand(33,1,8,45,3) for i in range(22)]
b = numpy.array(a)
b.shape # (22, 33, 1, 8, 45, 3)
c = numpy.stack(a, axis=0)
c.shape # (22, 33, 1, 8, 45, 3)
np.concatenate:
Join a sequence of arrays along an existing axis.
np.stack:
Stack a sequence of arrays along a new axis.
a = np.ones((3, 4))
b = np.stack([a, a])
print(b.shape) # (2, 3, 4)
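To show how the two functions relate, here is a short sketch with smaller, made-up shapes (4 arrays of shape (3, 1, 2) rather than 22 arrays of shape (33, 1, 8, 45, 3)). Stacking gives the same result as adding a leading axis to each array and then concatenating, or as reshaping the concatenated result.

```python
import numpy as np

# Four small arrays, each filled with its own index so they are easy to tell apart
arrays = [np.full((3, 1, 2), i) for i in range(4)]

joined = np.concatenate(arrays)               # existing axis 0 grows: (12, 1, 2)
stacked = np.stack(arrays, axis=0)            # new leading axis: (4, 3, 1, 2)

# Equivalent ways to get the stacked layout
via_newaxis = np.concatenate([x[np.newaxis] for x in arrays], axis=0)
via_reshape = joined.reshape((len(arrays),) + arrays[0].shape)

print(joined.shape, stacked.shape)
```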

multiply and sum arrays in numpy

I'm trying to calculate my own distance with NumPy arrays by adding a weight to each term of the sum in the Euclidean distance, for example:
a = np.array((1, 2, 3))
b = np.array((4, 5, 6))
distance = np.sum((a-b)**2)
but what I want is to define my distance as:
a = np.array((1, 2, 3))
b = np.array((4, 5, 6))
w = np.array((0.2, 0.3, 0.5))
distance = 0.2*((1-4)**2) + 0.3*((2-5)**2) + 0.5*((3-6)**2)
Is there any way to do this with NumPy without iterating over each vector and doing it manually?
You're halfway there:
a = np.array([[1., 2, 3]])
b = np.array([[4., 5, 6]])
w = np.array([[0.2, 0.3, 0.5]])
result = float(np.dot((a - b)**2, w.T))
So, you simply multiply a row-vector (a - b)**2 by a column-vector w.T to get the number you want.
Please note that you'll have to make sure the arrays' dimensions match.
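If the vectors are kept one-dimensional, the shape matching takes care of itself. Here is a minimal sketch under that assumption; the batch example with the array named A is hypothetical and only shows how the same expression extends to several vectors at once.

```python
import numpy as np

a = np.array([1., 2., 3.])
b = np.array([4., 5., 6.])
w = np.array([0.2, 0.3, 0.5])

# Elementwise weights times squared differences, then sum
distance = np.sum(w * (a - b) ** 2)

# The same value written as a dot product of two 1-D arrays
distance_dot = np.dot(w, (a - b) ** 2)

# Hypothetical batch: one weighted distance from b per row of A
A = np.array([[1., 2., 3.],
              [0., 0., 0.]])
batch = ((A - b) ** 2) @ w       # shape (2,)

print(distance, distance_dot, batch)
```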

Theano/numpy advanced indexing

I have a 4d theano tensor (with the shape (1, 700, 16, 95000) for example) and a 4d 'mask' tensor with the shape (1, 700, 16, 1024) such that every element in the mask is an index that I need from the original tensor. How can I use my mask to index my tensor? Things like sample[mask] or sample[:, :, :, mask] don't really seem to work.
I also tried using a binary mask but since the tensor is rather large I get a 'device out of memory' exception.
Other ideas on how to get my indices from the tensor would also be very appreciated.
Thanks
So, for lack of an answer, I've decided to use the more computationally intensive solution, which is flattening both my data and the indices tensors, adding an offset to the indices to bring them to global positions, indexing the data and reshaping it back to the original shape.
I'm adding here my test code, including a (commented-out) solution for matrices.
import numpy as np
import theano
import theano.tensor as T

def theano_convertion(els, inds, offsets):
    # Flatten data and indices, shift each index to its global position,
    # index the flat data and restore the shape of the indices
    els = T.flatten(els)
    inds = T.flatten(inds) + offsets
    return T.reshape(els[inds], (2, 3, 16, 5))

if __name__ == '__main__':
    # command: np.transpose(t[range(2), indices])
    # t = np.random.randint(0, 10, (2, 20))
    # indices = np.random.randint(0, 10, (5, 2))
    t = np.random.randint(0, 10, (2, 3, 16, 20)).astype('int32')
    indices = np.random.randint(0, 10, (2, 3, 16, 5)).astype('int32')
    # One offset per row of 20 elements (0, 20, 40, ...), repeated for its 5 indices
    offsets = np.asarray(range(1, 2 * 3 * 16 + 1), dtype='int32')
    offsets = (offsets * 20) - 20
    offsets = np.repeat(offsets, 5)
    offsets_tens = T.ivector('offsets')
    inds_tens = T.itensor4('inds')
    t_tens = T.itensor4('t')
    func = theano.function(
        [t_tens, inds_tens, offsets_tens],
        [theano_convertion(t_tens, inds_tens, offsets_tens)]
    )
    shaped_elements = []
    flattened_elements = []
    [tmp] = func(t, indices, offsets)
    # Compare against plain NumPy indexing, row by row
    for i in range(2):
        for j in range(3):
            for k in range(16):
                shaped_elements.append(t[i, j, k, indices[i, j, k, :]])
                flattened_elements.append(tmp[i, j, k, :])
                print(shaped_elements[-1] == flattened_elements[-1])
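For reference, the same flatten-and-offset idea can be sketched in plain NumPy (not Theano) and checked against np.take_along_axis, which does this kind of per-row lookup directly. This only illustrates the indexing logic on the NumPy side; I am not claiming anything about which Theano operations are available.

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.integers(0, 10, size=(2, 3, 16, 20))
indices = rng.integers(0, 20, size=(2, 3, 16, 5))

row_len = t.shape[-1]                       # 20 elements per row
n_rows = t.size // row_len                  # 2 * 3 * 16 = 96 rows
per_row = indices.shape[-1]                 # 5 indices per row

# Offset of each row's first element in the flattened data,
# repeated once per index belonging to that row
offsets = np.repeat(np.arange(n_rows) * row_len, per_row)

# Flatten, shift to global positions, index, reshape back
flat = t.ravel()[indices.ravel() + offsets].reshape(indices.shape)

# Direct per-row lookup along the last axis
direct = np.take_along_axis(t, indices, axis=-1)

print(np.array_equal(flat, direct))
```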
