I have the following 3rd-order tensors: the first contains 100 10x9 matrices and the second contains 100 3x10 matrices (which I have just filled with ones for this example).
My aim is to multiply the matrices pairwise, in one-to-one correspondence, which would result in a tensor with shape (100, 3, 9). This can be done with a for loop that zips both tensors and takes the dot product of each pair, but I am looking to do this with numpy operations alone. So far here are some failed attempts.
Attempt 1:
import numpy as np
T1 = np.ones((100, 10, 9))
T2 = np.ones((100, 3, 10))
print(T2.dot(T1).shape)
Output of attempt 1:
(100, 3, 100, 9)
Which means it tried all possible combinations... which is not what I am after.
Actually, none of the other attempts even run. I tried np.tensordot and np.einsum (I read here https://jameshensman.wordpress.com/2010/06/14/multiple-matrix-multiplication-in-numpy that it is supposed to do the job, but I did not get the Einstein indices correct); the same link also describes a tensor-cube reshaping method that I did not manage to visualize. Any suggestions or explanations on how to tackle this?
Did you try?
In [96]: np.einsum('ijk,ilj->ilk',T1,T2).shape
Out[96]: (100, 3, 9)
The way I figure this out is look at the shapes:
(100, 10, 9)  (i, j, k)
(100, 3, 10)  (i, l, j)
-------------
(100, 3, 9)   (i, l, k)
the two j sum and cancel out. The others carry to the output.
For 4d arrays, with dimensions like (100, 3, 2, 24), there are several options:
Reshape to 3d, T1.reshape(300, 2, 24), and reshape back afterwards with R.reshape(100, 3, ...). Reshape is virtually costless, and a good numpy tool.
Add an index to einsum: np.einsum('hijk,hilj->hilk', T1, T2), just a parallel usage to that of i.
Or use an ellipsis: np.einsum('...jk,...lj->...lk', T1, T2). This expression works with 3d, 4d, and up.
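As a side note beyond the original answer: modern numpy also has batched matrix multiplication built in via np.matmul (the @ operator), which broadcasts over leading axes and multiplies the trailing 2-D matrices pairwise. A minimal sketch with the shapes from the question:

```python
import numpy as np

T1 = np.ones((100, 10, 9))
T2 = np.ones((100, 3, 10))

# matmul treats the first axis as a batch axis and multiplies the
# trailing matrices pairwise: (3, 10) @ (10, 9) -> (3, 9) per batch entry
R = T2 @ T1          # same as np.matmul(T2, T1)
print(R.shape)       # (100, 3, 9)
```

With all-ones inputs, every entry of R is 10, since each is a sum over the contracted axis of length 10.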
Related
Let a be a numpy array of shape (n, m, k), and a_msk an array of shape (n, m) that masks elements of a through multiplication.
As far as I know, I have to create a new axis in a_msk in order to make it compatible with a for multiplication.
b = a * a_msk[:,:,np.newaxis]
Unfortunately, my Google Colab runtime is running out of memory at this very operation given the large size of the arrays.
My question is whether I can achieve the same thing without creating that new axis for the mask array.
As @hpaulj commented, adding an axis to make the two arrays "compatible" for broadcasting is the most straightforward way to do your multiplication.
Alternatively, you can move the last axis of your array a to the front which would also make the two arrays compatible (I wonder though whether this would solve your memory issue):
a = np.moveaxis(a, -1, 0)
Then you can simply multiply:
b = a * a_msk
However, to get your result you have to move the axis back:
b = np.moveaxis(b, 0, -1)
Example: both solutions return the same answer:
import numpy as np
a = np.arange(24).reshape(2, 3, 4)
a_msk = np.arange(6).reshape(2, 3)
print(f'newaxis solution:\n {a * a_msk[..., np.newaxis]}')
print()
print(f'moveaxis solution:\n {np.moveaxis((np.moveaxis(a, -1, 0) * a_msk), 0, -1)}')
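A further option, not from the original answer: if the real problem is memory, the multiplication can be done in place, so no new output array is allocated at all. A sketch, assuming a may be overwritten:

```python
import numpy as np

a = np.arange(24, dtype=float).reshape(2, 3, 4)
a_msk = np.arange(6, dtype=float).reshape(2, 3)

# Multiply in place: the broadcast view of a_msk costs no extra memory,
# and out=a reuses a's buffer instead of allocating a new (2, 3, 4) array.
np.multiply(a, a_msk[:, :, np.newaxis], out=a)
print(a[1, 2])  # [20, 21, 22, 23] scaled by a_msk[1, 2] == 5
```

This only helps when the original values of a are no longer needed afterwards.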
I'm trying to take array
a = [1,5,4,5,7,8,9,8,4,13,43,42]
and array
b = [3,5,6,2,7]
And I want b to be the indexes in a, e.g. a new array that is
[a[b[0]], a[b[1]], a[b[2]], a[b[3]] ...]
So the values in b are indexes into a.
And there are 500k entries in a and 500k in b (approximately).
Is there a fast way to kick in all cores in numpy to do this?
I already do it just fine in for loops and it is sloooooooowwwwww.
Edit to clarify. The solution has to work for 2D and 3D arrays.
so maybe
b = [(2,3), (5,4), (1,2), (1,0)]
and we want
c = [a[b[0]], a[b[1]], ...]
Not saying it is fast, but the numpy way would simply be:
a[b]
outputs:
array([5, 8, 9, 4, 8])
This can be done in NumPy using advanced indexing. As Christian's answer pointed out, in the 1-D case, you would simply write:
a[b]
and that is equivalent to:
[a[b[x]] for x in range(b.shape[0])]
In higher-dimensional cases, however, you need to have separate lists for each dimension of the indices. Which means, you can't do:
a = np.random.randn(7, 8, 9) # 3D array
b = [(2, 3, 0), (5, 4, 1), (1, 2, 2), (1, 0, 3)]
print(a[b]) # this is incorrect
but you can do:
b0, b1, b2 = zip(*b)
print(a[b0, b1, b2])
you can also use np.take (note that without an axis argument, np.take indexes into the flattened array, so it matches a[b] only in the 1-D case):
print(np.take(a, b))
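To make the higher-dimensional case concrete, here is a small runnable sketch (shapes chosen for illustration) checking the zip(*b) unpacking against the equivalent explicit loop:

```python
import numpy as np

a = np.arange(7 * 8 * 9).reshape(7, 8, 9)   # 3-D array
b = [(2, 3, 0), (5, 4, 1), (1, 2, 2), (1, 0, 3)]

# One index sequence per axis: advanced indexing then picks
# a[2, 3, 0], a[5, 4, 1], a[1, 2, 2], a[1, 0, 3]
b0, b1, b2 = zip(*b)
c = a[b0, b1, b2]

expected = np.array([a[i, j, k] for i, j, k in b])
print(np.array_equal(c, expected))  # True
```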
I solved this by writing a C extension to numpy called Tensor Weighted Interpolative Transfer, in order to get speed and multi-threading. In pure Python a 200x100x3 image scale-and-fade takes 3 seconds; in multi-threaded C with 8 cores the same operation takes 0.5 milliseconds.
The core C code ended up being like
t2[dstidxs2[i2] + doff1] += t1[srcidxs2[i2] + soff1] * w1 * ws2[i2];
Where the doff1 is the offset in the destination array etc. The w1 and ws2 are the interpolated weights.
All the code is ultra optimized in C for speed. (not code size or maintainability)
All code is available on https://github.com/RMKeene/twit and on PyPI.
I expect further optimization in the future, such as special-casing the situation where all weights are 1.0.
I have two lists of shape (130, 64, 2048), call that (s, f, b), and one vector of length 64, call it v. I need to stack these two lists together to make an array of shape (130, 2, 64, 2048) and multiply all 2048 values in f[i] by the i-th value of v.
The output array also needs to have shape (130, 2, 64, 2048).
Obviously these two steps can be done in either order. I want to know the most Pythonic way of doing something like this.
My main issue is that my code takes forever turning the list into a numpy array, which is necessary for some of my calculations. I have:
new_prof = np.asarray( new_prof )
but this seems to take too long for the size and shape of my list. Any thoughts as to how I could initialise this better?
The problem outlined above is shown by my attempt:
# Converted data should have shape (130, 2, 64, 2048)
converted_data = IQUV_to_AABB( data, basis = "cartesian" )
new_converted = np.array((130, 2, 64, 2048))
# I think s.shape is (2, 64, 2048) and cal_fa has length 64
for i, s in enumerate( converted_data ):
    aa = np.dot( s[0], cal_fa )
    bb = np.dot( s[1], cal_fb )
    new_converted[i].append( (aa, bb) )
However, this code doesn't work and I think it's got something to do with the dot product. Maybe??
I would also love to know why the process of changing my list to a numpy array is taking so long.
Try to start small and look at the results in the console:
import numpy as np
x = np.arange(36)
print(x)
y = np.reshape(x, (3, 4, 3))
print(y)
# this is a vector of the same size as dimension 1
a = np.arange(4)
print(a)
# expand and let numpy's broadcasting do the rest
# https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
# https://scipy.github.io/old-wiki/pages/EricsBroadcastingDoc
b = a[np.newaxis, :, np.newaxis]
print(b)
c = y * b
print(c)
You can read about np.newaxis here, here and here.
Using numpy.append is rather slow because it has to allocate new memory and copy the whole array each time; a numpy array is a contiguous block of memory.
You might have to use it if you run out of computer memory. But in that case, try to iterate over appropriately sized chunks, as big as your computer can still handle. Re-arranging the dimensions is sometimes a way to speed up calculations.
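Applying this to the shapes from the question, a minimal sketch (names and shapes assumed from the question, with random stand-in data) that avoids appending entirely and lets broadcasting do the per-row scaling:

```python
import numpy as np

# Stand-in data with the shapes from the question
converted = np.random.rand(130, 2, 64, 2048)  # (s, 2, f, b)
v = np.arange(64, dtype=float)                # length-64 calibration vector

# Align v with the 64-sized axis; broadcasting then scales each row of
# 2048 values by the matching entry of v, with no Python-level loop.
out = converted * v[np.newaxis, np.newaxis, :, np.newaxis]
print(out.shape)  # (130, 2, 64, 2048)
```

A single vectorized multiply like this replaces both the per-sample loop and the slow list-to-array conversion.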
I am new to Python.
I am confused as to what is happening with the following:
B = np.array([A[..., n:n+5] for n in (5*4, 5*5)])
Where A.shape = (60L, 128L, 128L)
and B.shape = (2L, 60L, 128L, 5L)
I believe it is supposed to make some sort of image patch. Can someone explain to me what this does? This example is in the context of applying neural networks to images.
The shape of A tells me that A is most likely an array of 60 grayscale images (batch size 60), with each image having a size of 128x128 pixels.
We have: B = np.array([A[..., n:n+5] for n in (5*4, 5*5)]). To better understand what's happening here, let's unpack this line in reverse:
for n in (5*4, 5*5): This is the same as for n in (20, 25). The author probably chose to write it in this way for some intuitive reason related to the data or the rest of the code. This gives us n=20 and n=25.
A[..., n:n+5]: This is the same as A[:, :, n:n+5]. This gives us all the rows from all the images of A, but only the 5 columns at n:n+5. The shape of the resulting array is then (60, 128, 5).
n=20 gives us A[:, :, 20:25] and n=25 gives us A[:, :, 25:30]. Each of these arrays is therefore of size (60, 128, 5).
Together, [A[..., n:n+5] for n in (5*4, 5*5)] gives us a list (thanks list comprehension!) with two elements, each a numpy array of size (60, 128, 5). np.array() converts this list into a numpy array of shape (2, 60, 128, 5).
The result is that B contains 2 patches of each image, each a 5-pixel-wide column subset of the original image: one starting at column 20 and the second starting at column 25.
I can't speculate as to the reason for this crop without further information about the network and its purpose.
Hope this helps!
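Not part of the original answer, but the shape arithmetic is easy to verify with a small runnable sketch:

```python
import numpy as np

A = np.zeros((60, 128, 128))  # 60 grayscale images of 128x128

# Two 5-column-wide slices per image, stacked into a new leading axis
B = np.array([A[..., n:n+5] for n in (5*4, 5*5)])
print(B.shape)  # (2, 60, 128, 5)
```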
This question has been asked before, but the solution only works for 1D/2D arrays, and I need a more general answer.
How do you create a repeating array without replicating the data? This strikes me as something of general use, as it would help to vectorize python operations without the memory hit.
More specifically, I have a (y,x) array, which I want to tile multiple times to create a (z,y,x) array. I can do this with numpy.tile(array, (nz,1,1)), but I run out of memory. My specific case has x=1500, y=2000, z=700.
One simple trick is to use np.broadcast_arrays to broadcast your (x, y) against a z-long vector in the first dimension:
import numpy as np
M = np.arange(1500*2000).reshape(1500, 2000)
z = np.zeros(700)
# broadcasting over the first dimension
_, M_broadcast = np.broadcast_arrays(z[:, None, None], M[None, ...])
print(M_broadcast.shape, M_broadcast.flags.owndata)
# (700, 1500, 2000) False
To generalize the stride_tricks method given for a 1D array in this answer, you just need to include the shape and stride length for each dimension of your output array:
M_strided = np.lib.stride_tricks.as_strided(
M, # input array
(700, M.shape[0], M.shape[1]), # output dimensions
(0, M.strides[0], M.strides[1]) # stride length in bytes
)
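As a note beyond the original answer: numpy (since 1.10) wraps this stride trick in np.broadcast_to, which returns a read-only, zero-stride view along the repeated axis:

```python
import numpy as np

M = np.arange(6).reshape(2, 3)

# A read-only view repeating M along a new leading axis: no data is copied
view = np.broadcast_to(M, (4, 2, 3))
print(view.shape, view.flags.owndata)  # (4, 2, 3) False
print(view.strides[0])                 # 0 -- the repeated axis costs no memory
```

Because the view is read-only and every "copy" aliases the same data, writing through it is not allowed; for the (700, 1500, 2000) case in the question this gives the tiled array without the memory hit.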