Automatically masking a numpy array for a given operation

Say I have two numpy arrays, for example
import numpy as np
A = np.arange(5*3*3*2).reshape(5, 3, 3, 2)
B = np.arange(3*3).reshape(3, 3)
If I want to add A and B across a shared axis, I would just do
C = A + B[None, :, :, None]
# C has shape (5, 3, 3, 2) which is what I want
I want to write a function that generalizes this kind of summation, but I am not sure how to get started. It would look something like
def mask(M, Mshape, out_shape):
    # not sure what to put here
    pass
def add_tensors(A, B, Ashape, Bshape, out_shape):
    # Here I mask A and B so that each has shape out_shape
    A = mask(A, Ashape, out_shape)
    B = mask(B, Bshape, out_shape)
    return A + B
Any suggestions? Is it possible to make this a ufunc?

In [447]: A = np.arange(5*3*3*2).reshape(5, 3, 3, 2)
...: B = np.arange(3*3).reshape(3, 3)
These are all equivalent:
In [448]: A + B[None,:, :, None];
In [449]: A + B[:, :, None]; # initial None is automatic
build the index tuple from list:
In [454]: tup = [slice(None)]*3; tup[-1] = None; tup = tuple(tup)
In [455]: tup
Out[455]: (slice(None, None, None), slice(None, None, None), None)
In [456]: A + B[tup];
or the equivalent shape:
In [457]: sh = B.shape + (1,)
In [458]: sh
Out[458]: (3, 3, 1)
In [459]: A + B.reshape(sh);
expand_dims also uses a parameterized reshape:
In [462]: np.expand_dims(B,2).shape
Out[462]: (3, 3, 1)
In [463]: A+np.expand_dims(B,2);
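A minimal sketch of the generalization the question asks for, built on the reshape variant above. It assumes the caller states, for each array, which axes of the output it occupies, as non-negative positions in increasing order. `expand_to` and the `A_axes`/`B_axes` arguments are hypothetical names for illustration, not NumPy API.

```python
import numpy as np

def expand_to(M, axes, out_ndim):
    """Reshape M so that its i-th axis sits at position axes[i] of an
    out_ndim-dimensional array; every other position gets length 1."""
    axes = tuple(axes)
    if len(axes) != M.ndim:
        raise ValueError("need exactly one output axis per axis of M")
    if list(axes) != sorted(set(axes)):
        # A plain reshape cannot reorder axes; that would need a transpose.
        raise ValueError("axes must be unique and in increasing order")
    shape = [1] * out_ndim
    for size, ax in zip(M.shape, axes):
        shape[ax] = size
    return M.reshape(shape)

def add_tensors(A, B, A_axes, B_axes, out_ndim):
    # After the reshape, ordinary broadcasting does the rest.
    return expand_to(A, A_axes, out_ndim) + expand_to(B, B_axes, out_ndim)

A = np.arange(5 * 3 * 3 * 2).reshape(5, 3, 3, 2)
B = np.arange(3 * 3).reshape(3, 3)
C = add_tensors(A, B, (0, 1, 2, 3), (1, 2), 4)
print(C.shape)  # (5, 3, 3, 2)
```

The addition itself is plain broadcasting by np.add, which is already a ufunc; only the reshape step is new.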

Related

Use np.nditer as zip

I have tried to apply the function np.nditer() like zip() with arrays of different dimensions, where the iterator should use only the first dimensions.
Minimal example
a_all = np.arange(6).reshape(2,3)
idx_all = np.arange(12).reshape(2,3,2)
for a, idx in np.nditer([a_all, idx_all]):
    print((a, idx))
Which throws the error:
ValueError: operands could not be broadcast together with shapes (2,3) (2,3,2)
My use case
I have two arrays of data that I want to combine element by element. I also have a list of indices into another array. So I try:
a_all = np.arange(6).reshape(2,3)
b_all = np.arange(6).reshape(2,3)
idx_all = (
((0,0), (0,1), (0,2)),
((1,0), (1,1), (1,2))
)
result = np.zeros((2,3))
for a, b, idx in np.nditer([a_all, b_all, idx_all]):
    result[idx] += a*b
This throws the same error as the minimal example.
I assume the problem is that np.nditer() tries to iterate over all dimensions of idx_all, but I couldn't figure out how to limit it to the first two.
I do not want to use zip(), because then I would need two loops:
for a_, b_, idx_ in zip(a_all, b_all, idx_all):
    for a, b, idx in zip(a_, b_, idx_):
        result[idx] += a*b
More sensible example
a_all = np.random.randn(2,3)
b_all = np.random.randn(2)
idx_all = (
((1,1), (2,2))
)
result = np.zeros(2)
for a, b, idx, res in np.nditer([a_all, b_all, idx_all, result], op_flags=['readwrite']):
    res += a[idx] + b

Look at the first case, corrected so the arrays do broadcast together (if you don't understand what I've changed, you haven't read enough of the basic numpy docs).
In [14]: a_all = np.arange(6).reshape(2,3,1)
...: idx_all = np.arange(12).reshape(2,3,2)
...:
...: for a, idx in np.nditer([a_all, idx_all]):
...:     print((a, idx))
...:
(array(0), array(0))
(array(0), array(1))
(array(1), array(2))
(array(1), array(3))
(array(2), array(4))
(array(2), array(5))
(array(3), array(6))
(array(3), array(7))
(array(4), array(8))
(array(4), array(9))
(array(5), array(10))
(array(5), array(11))
nditer iterates in a 'flat' sense, passing single element arrays (0d) to the body. It's not like zip which just iterates on the first dimension (or outer layer of nested lists).
np.vectorize (which I don't recommend either), does the same sort of broadcasting, but passes python scalar elements to the function instead:
In [15]: np.vectorize(lambda a,idx: print((a,idx)))(a_all,idx_all)
(0, 0) # test run
(0, 0)
(0, 1)
(1, 2)
(1, 3)
(2, 4)
(2, 5)
(3, 6)
(3, 7)
(4, 8)
(4, 9)
(5, 10)
(5, 11)
Out[15]:
array([[[None, None],
        [None, None],
        [None, None]],
       [[None, None],
        [None, None],
        [None, None]]], dtype=object)
nditer needs the same sort of performance disclaimer as np.vectorize. It doesn't speed anything up, at least not when used in Python code. In cython it can be useful, as demonstrated in the larger nditer documentation page.
Also nditer inputs can be complex, as shown by the TypeError that your last example produces.
Your last example:
I had to change idx_all to an array, not a tuple, so that it can be opened as readwrite. Read the op_flags docs more carefully.
And we still get the broadcasting error. It isn't iterating the 'first layer'.
In [24]: a_all = np.random.randn(2,3)
...: b_all = np.random.randn(2)
...: idx_all = (
...: ((1,1), (2,2))
...: ); idx_all=np.array(idx_all)
...: result = np.zeros(2)
...:
...: for a, b, idx, res in np.nditer([a_all, b_all, idx_all, result], op_flags=['readwrite']):
...:     res += a[idx] + b
...:
ValueError: operands could not be broadcast together with shapes (2,3) (2,) (2,2) (2,)
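For the use case in the question (accumulating a*b into result at stored index pairs), two alternatives to nditer can be sketched. This assumes idx_all is converted to an integer array of shape (2, 3, 2), where the last axis holds a (row, column) pair; the names result_loop and result_vec are made up for the example.

```python
import numpy as np

a_all = np.arange(6).reshape(2, 3)
b_all = np.arange(6).reshape(2, 3)
# Index pairs from the question, stored as an array of shape (2, 3, 2).
idx_all = np.array([[(0, 0), (0, 1), (0, 2)],
                    [(1, 0), (1, 1), (1, 2)]])

# Option 1: a single Python loop over the two leading dimensions only.
result_loop = np.zeros((2, 3))
for i, j in np.ndindex(a_all.shape):
    result_loop[tuple(idx_all[i, j])] += a_all[i, j] * b_all[i, j]

# Option 2: no Python loop. np.add.at accumulates correctly even when
# the same target index appears more than once.
result_vec = np.zeros((2, 3))
np.add.at(result_vec, (idx_all[..., 0], idx_all[..., 1]), a_all * b_all)

print(np.array_equal(result_loop, result_vec))  # True
```

np.ndindex yields one index tuple per element of the given shape, which is the zip-like behaviour over the first two dimensions that the question was after.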

How to reshape 3D numpy array <10,3,2> to 4D array <10,1,3,2>?

I have the following numpy arrays:
import numpy as np
np.ones((10, 3, 2))
and I need to reshape it to <10,1,3,2>.
How can I do so?

How about this:
np.ones((10, 3, 2)).reshape([10,1,3,2])

x = np.ones((10, 3, 2))
# in place
x.shape = (10,1,3,2)
# new view
x.reshape((10,1,3,2))
# Add new axis
x[:, np.newaxis, :, :]

Like others mentioned, you can .reshape it. An alternative is to use np.newaxis or np.expand_dims like this:
arr = np.ones((10, 3, 2))
arr1 = arr[:, np.newaxis, ...]
print(arr1.shape) # (10, 1, 3, 2)
arr2 = np.expand_dims(arr, 1)
print(arr2.shape) # (10, 1, 3, 2)
# check if the two arrays are equal
print(np.array_equal(arr1, arr2)) # True

dot product and diagonal and multidimensional matrices

I just want to get the dot product of some sets of multidimensional data.
For simplicity, I am posting small pieces and demonstrating my efforts thus far.
To just get 'a' dot 'q', and the 4 numbers that I want is easy enough.
import numpy as np
a = np.arange(1,4) # shape = (3,)
q = np.array([[x, x, x] for x in range(4)])+1 # shape = (4, 3)
c = np.dot(a, q.T) # array([ 6, 12, 18, 24]) shape = (4,)
If I want to add another set to 'a', I can expand the dimensions. Again, pretty easy. The dot product simply reflects the additional dimension.
a = np.arange(1,4).reshape(1,3) # shape = (1,3)
c = np.dot(a, q.T) # array([[ 6, 12, 18, 24]]) shape = (1,4)
and the other set...
a = np.vstack((a,a+1)) # shape = (2,3)
c = np.dot(a, q.T) # array([[ 6, 12, 18, 24], [ 9, 18, 27, 36]]) shape = (2,4)
To add another dimension to q, the transpose needs to be a little more complicated.
q = np.expand_dims(q, axis=0) # shape = (1, 4, 3)
c = np.dot(a, np.transpose(q, (0, 2, 1))) # shape = (2, 1, 4)
now stack 'q' matrix
q = np.vstack((q, q+1)) # shape = (2, 4, 3)
c = np.dot(a, np.transpose(q, (0, 2, 1))) # shape = (2, 2, 4)
Though, what I am going for is the diagonal of c. While I have not tried it yet, I am imagining that when 'a' and 'q' start to reach >(2000, 3) and >(2000, 4, 3) c will be (2000, 2000, 4) and I only need 1/2000th of that. Does anyone know how to make this more efficient than doing the calculation and then taking the diagonal?
Again, what I want is...
c = np.dot(a, np.transpose(q, (0, 2, 1)))
c = c[np.arange(2), np.arange(2)]
or
c[0] = np.dot(a[0:1], np.transpose(q[0:1], (0, 2, 1)))
c[1] = np.dot(a[1:2], np.transpose(q[1:2], (0, 2, 1)))
but without having to make the enormous matrix first and then trim it later.
I have read a couple of other, somewhat similar questions. Though, I hope that this question is perceived to be more complicated than a dot product of the same vector and its diagonal. Also, if the answer is np.einsum(), could you explain the process a bit more than the numpy docs do?

I reposted the question, with the einsum() entries at each c. In fact, Alexander Korovin linked to an excellent einsum summary.
I just want to get the dot product of some sets of multidimensional data.
For simplicity, I am posting small pieces and demonstrating my efforts thus far.
To just get 'a' dot 'q', and the 4 numbers that I want is easy enough.
import numpy as np
a = np.arange(1,4) # shape = (3,)
q = np.array([[x, x, x] for x in range(4)])+1 # shape = (4, 3)
c = np.dot(a, q.T) # array([ 6, 12, 18, 24]) shape = (4,)
c = np.einsum('i,ji->j', a, q)
If I want to add another set to 'a', I can expand the dimensions. Again, pretty easy. The dot product simply reflects the additional dimension.
a = np.arange(1,4).reshape(1,3) # shape = (1,3)
c = np.dot(a, q.T) # array([[ 6, 12, 18, 24]]) shape = (1,4)
c = np.einsum('ij,gj->ig', a, q) # shape = (1,4), same as the np.dot result
and the other set...
a = np.vstack((a,a+1)) # shape = (2,3)
c = np.dot(a, q.T) # array([[ 6, 12, 18, 24], [ 9, 18, 27, 36]]) shape = (2,4)
c = np.einsum('ij,gj->ig', a, q)
To add another dimension to q, the transpose needs to be a little more complicated.
q = np.expand_dims(q, axis=0) # shape = (1, 4, 3)
c = np.dot(a, np.transpose(q, (0, 2, 1))) # shape = (2, 1, 4)
c = np.einsum('ij,fgj->ifg', a, q) # shape = (2, 1, 4), same axis order as np.dot
now stack 'q' matrix
q = np.vstack((q, q+1)) # shape = (2, 4, 3)
c = np.dot(a, np.transpose(q, (0, 2, 1))) # shape = (2, 2, 4)
c = np.einsum('ij,fgj->ifg', a, q) # shape = (2, 2, 4), same axis order as np.dot
Though, what I am going for is the diagonal of c. While I have not tried it yet, I am imagining that when 'a' and 'q' start to reach >(2000, 3) and >(2000, 4, 3) c will be (2000, 2000, 4) and I only need 1/2000th of that. Does anyone know how to make this more efficient than doing the calculation and then taking the diagonal?
Again, what I want is...
c = np.dot(a, np.transpose(q, (0, 2, 1)))
c = c[np.arange(2), np.arange(2)]
or
c[0] = np.dot(a[0:1], np.transpose(q[0:1], (0, 2, 1)))
c[1] = np.dot(a[1:2], np.transpose(q[1:2], (0, 2, 1)))
but without having to make the enormous matrix first and then trim it later.
So do this...
c = np.einsum('ik,ijk->ij', a, q)
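Since the question asked for more explanation of einsum, here is a small sketch of what 'ik,ijk->ij' means. The label i appears in both inputs and in the output, so a and q are paired along it rather than combined in every pairing; k is missing from the output, so it is summed over. The explicit loop and the matmul variant are my own illustrations, rebuilt from the arrays in the question.

```python
import numpy as np

a = np.vstack((np.arange(1, 4), np.arange(2, 5)))      # shape (2, 3)
q0 = np.array([[x, x, x] for x in range(4)]) + 1       # shape (4, 3)
q = np.stack((q0, q0 + 1))                             # shape (2, 4, 3)

# The wasteful route: full (2, 2, 4) product, then keep the diagonal.
full = np.dot(a, np.transpose(q, (0, 2, 1)))
diag = full[np.arange(2), np.arange(2)]                # shape (2, 4)

# einsum computes only the diagonal entries.
c = np.einsum('ik,ijk->ij', a, q)

# The same subscripts written out as loops: c[i, j] = sum_k a[i, k] * q[i, j, k]
c_loop = np.zeros((2, 4), dtype=a.dtype)
for i in range(2):
    for j in range(4):
        for k in range(3):
            c_loop[i, j] += a[i, k] * q[i, j, k]

# Equivalent batched matrix product: one (4, 3) @ (3, 1) product per i.
c_matmul = (q @ a[:, :, None])[..., 0]

print(c)
# [[ 6 12 18 24]
#  [18 27 36 45]]
```

All four results agree, but only the first route allocates the (n, n, 4) intermediate.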

Python | Addition of numpy arrays with different shapes [duplicate]

I have a question regarding the conversion between (N,) dimension arrays and (N,1) dimension arrays. For example, y is (2,) dimension.
A=np.array([[1,2],[3,4]])
x=np.array([1,2])
y=np.dot(A,x)
y.shape
Out[6]: (2,)
But the following will show y2 to be (2,1) dimension.
x2=x[:,np.newaxis]
y2=np.dot(A,x2)
y2.shape
Out[14]: (2, 1)
What would be the most efficient way of converting y2 back to y without copying?
Thanks,
Tom

reshape works for this
a = np.arange(3) # a.shape = (3,)
b = a.reshape((3,1)) # b.shape = (3,1)
b2 = a.reshape((-1,1)) # b2.shape = (3,1)
c = b.reshape((3,)) # c.shape = (3,)
c2 = b.reshape((-1,)) # c2.shape = (3,)
note also that reshape doesn't copy the data unless it needs to for the new shape (which it doesn't need to do here):
a.__array_interface__['data'] # (22356720, False)
b.__array_interface__['data'] # (22356720, False)
c.__array_interface__['data'] # (22356720, False)
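A more direct check than comparing data pointers is np.shares_memory. The sketch below applies it to the y2 from the question, and adds a transposed array (my own example, not from the question) as a case where reshape cannot return a view and has to copy.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
x = np.array([1, 2])
y2 = np.dot(A, x[:, np.newaxis])       # shape (2, 1)

y = y2.reshape(-1)                     # shape (2,)
print(np.shares_memory(y, y2))         # True: y is a view of y2

# A transposed array is not C-contiguous, so flattening it in C order
# cannot be expressed with strides and reshape falls back to a copy.
t = np.arange(6).reshape(2, 3).T       # shape (3, 2)
flat = t.reshape(-1)
print(np.shares_memory(flat, t))       # False: reshape had to copy
```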

Use numpy.squeeze:
>>> x = np.array([[[0], [1], [2]]])
>>> x.shape
(1, 3, 1)
>>> np.squeeze(x).shape
(3,)
>>> np.squeeze(x, axis=(2,)).shape
(1, 3)

Slice along the dimension you want, as in the example below. To go in the reverse direction, you can use None as the slice for any dimension that should be treated as a singleton dimension, but which is needed to make shapes work.
In [786]: yy = np.asarray([[11],[7]])
In [787]: yy
Out[787]:
array([[11],
[7]])
In [788]: yy.shape
Out[788]: (2, 1)
In [789]: yy[:,0]
Out[789]: array([11, 7])
In [790]: yy[:,0].shape
Out[790]: (2,)
In [791]: y1 = yy[:,0]
In [792]: y1.shape
Out[792]: (2,)
In [793]: y1[:,None]
Out[793]:
array([[11],
[7]])
In [794]: y1[:,None].shape
Out[794]: (2, 1)
Alternatively, you can use reshape:
In [795]: yy.reshape((2,))
Out[795]: array([11, 7])

the opposite translation can be made by:
np.atleast_2d(y).T

Another option in your toolbox could be ravel:
>>> y2.shape
(2, 1)
>>> y_ = y2.ravel()
>>> y_.shape
(2,)
Again, a copy is made only if needed, but this is not the case:
>>> y2.__array_interface__["data"]
(2700295136768, False)
>>> y_.__array_interface__["data"]
(2700295136768, False)
For further details, you can take a look at this answer.

Inserting newaxis at variable position in NumPy arrays

Normally, when we know where to insert the new axis, we can write a[:, np.newaxis, ...]. Is there a good way to insert the new axis at a given, variable position?
Here is how I do it now. I think there must be a much better way than this:
def addNewAxisAt(x, axis):
    _s = list(x.shape)
    _s.insert(axis, 1)
    return x.reshape(tuple(_s))
def addNewAxisAt2(x, axis):
    ind = [slice(None)]*x.ndim
    ind.insert(axis, np.newaxis)
    return x[tuple(ind)]  # index with a tuple, not a list

That singleton dimension (length 1) can be inserted into the array's shape tuple with np.insert, which changes the array's shape in place, like so:
x.shape = np.insert(x.shape,axis,1)
We can extend this to insert more than one new axis with a bit of np.diff and np.cumsum trickery, like so:
insert_idx = (np.diff(np.append(0,axis))-1).cumsum()+1
x.shape = np.insert(x.shape,insert_idx,1)
Sample runs -
In [151]: def addNewAxisAt(x, axis):
...:     insert_idx = (np.diff(np.append(0,axis))-1).cumsum()+1
...:     x.shape = np.insert(x.shape,insert_idx,1)
...:
In [152]: A = np.random.rand(4,5)
In [153]: addNewAxisAt(A, axis=1)
In [154]: A.shape
Out[154]: (4, 1, 5)
In [155]: A = np.random.rand(5,6,8,9,4,2)
In [156]: addNewAxisAt(A, axis=5)
In [157]: A.shape
Out[157]: (5, 6, 8, 9, 4, 1, 2)
In [158]: A = np.random.rand(5,6,8,9,4,2,6,7)
In [159]: addNewAxisAt(A, axis=(1,3,4,6))
In [160]: A.shape
Out[160]: (5, 1, 6, 1, 1, 8, 1, 9, 4, 2, 6, 7)
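For comparison, here is a sketch of the same multi-axis case using np.expand_dims with a tuple of axes. This assumes NumPy 1.18 or later, where expand_dims accepts a tuple. Unlike assigning to x.shape, it leaves the original array untouched and returns a view.

```python
import numpy as np

A = np.random.rand(5, 6, 8, 9, 4, 2, 6, 7)

# The tuple gives the positions of the new axes in the *result*.
B = np.expand_dims(A, axis=(1, 3, 4, 6))

print(B.shape)                  # (5, 1, 6, 1, 1, 8, 1, 9, 4, 2, 6, 7)
print(A.shape)                  # (5, 6, 8, 9, 4, 2, 6, 7), unchanged
print(np.shares_memory(A, B))   # True: no data was copied
```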

np.insert does
slobj = [slice(None)]*ndim
...
slobj[axis] = slice(None, index)
...
new[slobj] = arr[slobj2]
Like you it constructs a list of slices, and modifies one or more elements.
apply_along_axis constructs an array, and converts it to indexing tuple
outarr[tuple(i.tolist())] = res
Other numpy functions work this way as well.
My suggestion is to make initial list large enough to hold the None. Then I don't need to use insert:
In [1076]: x=np.ones((3,2,4),int)
In [1077]: ind=[slice(None)]*(x.ndim+1)
In [1078]: ind[2]=None
In [1080]: x[ind].shape
Out[1080]: (3, 2, 1, 4)
In [1081]: x[tuple(ind)].shape # sometimes converting a list to tuple is wise
Out[1081]: (3, 2, 1, 4)
Turns out there is a np.expand_dims
In [1090]: np.expand_dims(x,2).shape
Out[1090]: (3, 2, 1, 4)
It uses reshape like you do, but creates the new shape with tuple concatenation.
def expand_dims(a, axis):
    a = asarray(a)
    shape = a.shape
    if axis < 0:
        axis = axis + len(shape) + 1
    return a.reshape(shape[:axis] + (1,) + shape[axis:])
Timings don't tell me much about which is better. They are in the 2 µs range, where simply wrapping the code in a function makes a difference.
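The index-list idea can be wrapped in a small helper, sketched below. `add_axis` is a hypothetical name, and negative positions are handled the same way as in the expand_dims source quoted above. The index is converted to a tuple before use because, as far as I know, recent NumPy releases no longer accept a list of slices as a multidimensional index.

```python
import numpy as np

def add_axis(x, axis):
    """Return a view of x with a length-1 axis at position `axis` of the result."""
    ndim_out = x.ndim + 1
    if axis < 0:
        axis += ndim_out                  # count from the end of the result
    if not 0 <= axis < ndim_out:
        raise ValueError("axis out of range for the expanded array")
    index = [slice(None)] * ndim_out      # one slot per output axis
    index[axis] = np.newaxis
    return x[tuple(index)]                # always index with a tuple

x = np.ones((3, 2, 4), int)
print(add_axis(x, 2).shape)     # (3, 2, 1, 4)
print(add_axis(x, -1).shape)    # (3, 2, 4, 1)
```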
