I have a two-dimensional NxM numpy array:
a = np.ndarray((N,M), dtype=np.float32)
I would like to make a sub-matrix with a selected number of rows and columns. For each dimension I have as input either a binary (boolean) vector or a vector of indices. How can I do this most efficiently?
Examples
a = array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
rows = [True, False, True]
cols = [False, False, True, True]
rows_i = [0, 2]
cols_i = [2, 3]
result = wanted_function(a, rows, cols) or wanted_function_i(a, rows_i, cols_i)
result = array([[2, 3],
[ 10, 11]])
There are several ways to get a submatrix in NumPy:
In [35]: ri = [0,2]
...: ci = [2,3]
...: a[np.reshape(ri, (-1, 1)), ci]
Out[35]:
array([[ 2, 3],
[10, 11]])
In [36]: a[np.ix_(ri, ci)]
Out[36]:
array([[ 2, 3],
[10, 11]])
In [37]: s=a[np.ix_(ri, ci)]
In [38]: np.may_share_memory(a, s)
Out[38]: False
Note that the submatrix you get is a new copy, not a view of the original array.
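Since np.ix_ also accepts boolean masks (the indexing docs note that ix_ supports boolean arrays), a single call covers both input formats from the question. A minimal sketch of the requested helper, using the names from the question:
import numpy as np

def wanted_function(a, rows, cols):
    # rows/cols may be boolean masks or integer index vectors;
    # np.ix_ builds an open mesh, so the result is the full sub-matrix (a copy)
    return a[np.ix_(rows, cols)]

a = np.arange(12).reshape(3, 4)
wanted_function(a, [True, False, True], [False, False, True, True])
# array([[ 2,  3],
#        [10, 11]])
wanted_function(a, [0, 2], [2, 3])  # same result with index vectors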
You only need to make rows and cols NumPy arrays, and then you can just use [] indexing:
import numpy as np
a = np.array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
rows = np.array([True, False, True])
cols = np.array([False, False, True, True])
result = a[rows][:, cols]
print(result)
print(type(result))
# [[ 2 3]
# [10 11]]
# <class 'numpy.ndarray'>
Related
Is there a way to apply multiple masks at once to a multi-dimensional Numpy array?
For instance:
X = np.arange(12).reshape(3, 4)
# array([[ 0, 1, 2, 3],
# [ 4, 5, 6, 7],
# [ 8, 9, 10, 11]])
m0 = (X>0).all(axis=1) # array([False, True, True])
m1 = (X<3).any(axis=0) # array([ True, True, True, False])
# In one step: error
X[m0, m1]
# IndexError: shape mismatch: indexing arrays could not
# be broadcast together with shapes (2,) (3,)
# In two steps: works (but awkward)
X[m0, :][:, m1]
# array([[ 4, 5, 6],
# [ 8, 9, 10]])
Try:
>>> X[np.ix_(m0, m1)]
array([[ 4, 5, 6],
[ 8, 9, 10]])
From the docs:
Combining multiple Boolean indexing arrays or a Boolean with an integer indexing array can best be understood with the obj.nonzero() analogy. The function ix_ also supports boolean arrays and will work without any surprises.
Another solution (also straight from the docs but less intuitive IMO):
>>> X[m0.nonzero()[0][:, np.newaxis], m1]
array([[ 4, 5, 6],
[ 8, 9, 10]])
The error tells you what you need to do: the mask dimensions need to broadcast together. With keepdims, m0 has shape (3, 1) and m1 has shape (1, 4), so m0 & m1 broadcasts to a full (3, 4) boolean mask. You can fix this at the source:
m0 = (X>0).all(axis=1, keepdims=True)
m1 = (X<3).any(axis=0, keepdims=True)
>>> X[m0 & m1]
array([ 4, 5, 6, 8, 9, 10])
You only really need the extra axis on m0, so you can also leave both masks 1-D and add it at indexing time:
>>> X[m0[:, None] & m1]
array([ 4, 5, 6, 8, 9, 10])
You can reshape to the desired shape:
>>> X[m0[:, None] & m1].reshape(np.count_nonzero(m0), np.count_nonzero(m1))
array([[ 4, 5, 6],
[ 8, 9, 10]])
Another option is to convert the masks to indices:
>>> X[np.flatnonzero(m0)[:, None], np.flatnonzero(m1)]
array([[ 4, 5, 6],
[ 8, 9, 10]])
I have a numpy array of arbitrary shape, e.g.:
a = array([[[ 1, 2],
[ 3, 4],
[ 8, 6]],
[[ 7, 8],
[ 9, 8],
[ 3, 12]]])
a.shape = (2, 3, 2)
and a result of argmax over the last axis:
np.argmax(a, axis=-1) = array([[1, 1, 0],
[1, 0, 1]])
I'd like to get max:
np.max(a, axis=-1) = array([[ 2, 4, 8],
[ 8, 9, 12]])
But without recalculating everything. I've tried:
a[np.arange(len(a)), np.argmax(a, axis=-1)]
But got:
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (2,3)
How can I do this? There is a similar question for the 2-D case: numpy 2d array max/argmax
You can use advanced indexing -
In [17]: a
Out[17]:
array([[[ 1, 2],
[ 3, 4],
[ 8, 6]],
[[ 7, 8],
[ 9, 8],
[ 3, 12]]])
In [18]: idx = a.argmax(axis=-1)
In [19]: m,n = a.shape[:2]
In [20]: a[np.arange(m)[:,None],np.arange(n),idx]
Out[20]:
array([[ 2, 4, 8],
[ 8, 9, 12]])
For the generic ndarray case with any number of dimensions, as suggested in the comments by @hpaulj, we could use np.ix_, like so -
shp = np.array(a.shape)
dim_idx = list(np.ix_(*[np.arange(i) for i in shp[:-1]]))
dim_idx.append(idx)
out = a[tuple(dim_idx)]  # index with a tuple; indexing with a list of arrays is deprecated
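If your NumPy version has np.take_along_axis (added in NumPy 1.15), it covers the generic case directly and works for any axis; a sketch under that assumption:
import numpy as np

a = np.array([[[1, 2], [3, 4], [8, 6]],
              [[7, 8], [9, 8], [3, 12]]])
idx = a.argmax(axis=-1)
# re-insert the reduced axis so the index array has the same ndim as a
out = np.take_along_axis(a, np.expand_dims(idx, axis=-1), axis=-1).squeeze(axis=-1)
# array([[ 2,  4,  8],
#        [ 8,  9, 12]])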
For an ndarray of arbitrary shape, you can convert the argmax indices into indices of the flattened array, and then recover the correct shape, like so:
idx = np.argmax(a, axis=-1)
flat_idx = np.arange(a.size, step=a.shape[-1]) + idx.ravel()
maximum = a.ravel()[flat_idx].reshape(*a.shape[:-1])
For arbitrary-shape arrays, the following should work :)
a = np.arange(5 * 4 * 3).reshape((5,4,3))
# for last axis
argmax = a.argmax(axis=-1)
a[tuple(np.indices(a.shape[:-1])) + (argmax,)]
# for other axis (eg. axis=1)
argmax = a.argmax(axis=1)
idx = list(np.indices(a.shape[:1]+a.shape[2:]))
idx[1:1] = [argmax]
a[tuple(idx)]
or
a = np.arange(5 * 4 * 3).reshape((5, 4, 3))
# np.choose picks, for each output position, the element of the choice array
# selected by argmax; moving the reduced axis to the front turns it into the
# sequence of choice arrays. Note that np.choose only accepts a limited number
# of choice arrays (NPY_MAXARGS), so the reduced axis must be fairly short.
argmax = a.argmax(axis=0)
np.choose(argmax, np.moveaxis(a, 0, 0))  # moveaxis is a no-op for axis=0
argmax = a.argmax(axis=1)
np.choose(argmax, np.moveaxis(a, 1, 0))
argmax = a.argmax(axis=2)
np.choose(argmax, np.moveaxis(a, 2, 0))
argmax = a.argmax(axis=-1)
np.choose(argmax, np.moveaxis(a, -1, 0))
I'm trying to get a flattened NumPy array of the values of a NumPy matrix, ignoring all entries where row == col (the main diagonal).
For example:
>>> m = numpy.matrix([[1,2,3],[4,5,6],[7,8,9]])
>>> m
matrix([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
# Some function....
# result:
m_flat = array([2,3,4,6,7,8])
You could use np.eye to create the appropriate boolean mask:
In [139]: np.eye(m.shape[0], dtype='bool')
Out[139]:
array([[ True, False, False],
[False, True, False],
[False, False, True]], dtype=bool)
In [140]: m[~np.eye(m.shape[0], dtype='bool')]
Out[140]: matrix([[2, 3, 4, 6, 7, 8]])
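Because m is a np.matrix, the fancy indexing above returns a 1×6 matrix rather than the flat array asked for; converting to a plain ndarray first gives the 1-D result. A small sketch of that extra step:
import numpy as np

m = np.matrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
mask = ~np.eye(m.shape[0], dtype=bool)  # False on the diagonal, True elsewhere
m_flat = np.asarray(m)[mask]            # plain 1-D ndarray
# array([2, 3, 4, 6, 7, 8])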
For example, I would like to set to zero all elements of a matrix above its counterdiagonal (i.e. where i + j < n - 1).
I thought about generating a mask, but it would lead to the same problem of accessing such elements in the mask matrix.
What's the best solution?
Since your matrix seems to be square, you can use a boolean mask and do:
n = mat.shape[0]
idx = np.arange(n)
mask = idx[:, None] + idx < n - 1
mat[mask] = 0
To understand what's going on:
>>> mat = np.arange(16).reshape(4, 4)
>>> n = 4
>>> idx = np.arange(n)
>>> idx[:, None] + idx
array([[0, 1, 2, 3],
[1, 2, 3, 4],
[2, 3, 4, 5],
[3, 4, 5, 6]])
>>> idx[:, None] + idx < n - 1
array([[ True, True, True, False],
[ True, True, False, False],
[ True, False, False, False],
[False, False, False, False]], dtype=bool)
>>> mat[idx[:, None] + idx < n - 1] = 0
>>> mat
array([[ 0, 0, 0, 3],
[ 0, 0, 6, 7],
[ 0, 9, 10, 11],
[12, 13, 14, 15]])
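If you prefer to avoid the index arithmetic, an equivalent mask can be built from NumPy's triangle helpers plus a left-right flip; this is just an alternative sketch that produces the same mask as idx[:, None] + idx < n - 1:
import numpy as np

mat = np.arange(16).reshape(4, 4)
n = mat.shape[0]
# strict upper triangle (j > i), flipped left-right so it covers i + j < n - 1
mask = np.fliplr(np.triu(np.ones((n, n), dtype=bool), k=1))
mat[mask] = 0
# array([[ 0,  0,  0,  3],
#        [ 0,  0,  6,  7],
#        [ 0,  9, 10, 11],
#        [12, 13, 14, 15]])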
I am trying to figure out a better way to check if two 2D arrays contain the same rows. Take the following case for a short example:
>>> a
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> b
array([[6, 7, 8],
[3, 4, 5],
[0, 1, 2]])
In this case b = a[::-1]. To check whether the two arrays contain the same rows, I can sort both and compare:
>>> a = a[np.lexsort((a[:,0], a[:,1], a[:,2]))]
>>> b = b[np.lexsort((b[:,0], b[:,1], b[:,2]))]
>>> np.all(a-b==0)
True
This is great and fairly fast. However, the issue arises when two rows are "close":
array([[-1.57839867 2.355354 -1.4225235 ],
[-0.94728367 0. -1.4225235 ],
[-1.57839867 -2.355354 -1.4225215 ]]) <---note ends in 215 not 235
array([[-1.57839867 -2.355354 -1.4225225 ],
[-1.57839867 2.355354 -1.4225225 ],
[-0.94728367 0. -1.4225225 ]])
Within a tolerance of 1E-5 these two arrays are equal row for row, but the lexsort will tell you otherwise. This could be worked around with a different sort order, but I would like a more general solution.
I was toying with the idea of:
>>> a = a.reshape(-1, 1, 3)
>>> a-b
array([[[-6, -6, -6],
[-3, -3, -3],
[ 0, 0, 0]],
[[-3, -3, -3],
[ 0, 0, 0],
[ 3, 3, 3]],
[[ 0, 0, 0],
[ 3, 3, 3],
[ 6, 6, 6]]])
>>> np.all(np.around(a-b,5)==0,axis=2)
array([[False, False, True],
[False, True, False],
[ True, False, False]], dtype=bool)
>>> np.all(np.any(np.all(np.around(a-b,5)==0,axis=2),axis=1))
True
This doesn't tell you whether the arrays are equal row by row; it only tells you that every point in b is close to some value in a. The arrays can have several hundred rows and I need to do this check many times. Any ideas?
Your last piece of code doesn't do what you think it does. What it tells you is whether every row in b is close to some row in a. If you change the axis used for the outer calls to np.any and np.all, you can check whether every row in a is close to some row in b. If both conditions hold, then the two sets of rows are equal. This is probably not very efficient computationally, but it is fast in NumPy for moderately sized arrays:
def same_rows(a, b, tol=5):
    rows_close = np.all(np.round(a - b[:, None], tol) == 0, axis=-1)
    return (np.all(np.any(rows_close, axis=-1), axis=-1) and
            np.all(np.any(rows_close, axis=0), axis=0))
>>> rows, cols = 5, 3
>>> a = np.arange(rows * cols).reshape(rows, cols)
>>> b = np.arange(rows)
>>> np.random.shuffle(b)
>>> b = a[b]
>>> a
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14]])
>>> b
array([[ 9, 10, 11],
[ 3, 4, 5],
[ 0, 1, 2],
[ 6, 7, 8],
[12, 13, 14]])
>>> same_rows(a, b)
True
>>> b[0] = b[1]
>>> b
array([[ 3, 4, 5],
[ 3, 4, 5],
[ 0, 1, 2],
[ 6, 7, 8],
[12, 13, 14]])
>>> same_rows(a, b) # not all rows in a are close to a row in b
False
And for arrays that are not too big, performance is reasonable, even though it has to build an array of shape (rows, rows, cols):
In [2]: rows, cols = 1000, 10
In [3]: a = np.arange(rows * cols).reshape(rows, cols)
In [4]: b = np.arange(rows)
In [5]: np.random.shuffle(b)
In [6]: b = a[b]
In [7]: %timeit same_rows(a, b)
10 loops, best of 3: 103 ms per loop
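If you would rather express the tolerance directly instead of rounding to a number of decimals, the same pattern works with np.isclose; a sketch with the same O(rows² · cols) memory cost, where atol is chosen to match the 1E-5 tolerance from the question:
import numpy as np

def same_rows_isclose(a, b, atol=1e-5):
    # pairwise "row i of b is close to row j of a" matrix, shape (len(b), len(a))
    rows_close = np.all(np.isclose(a, b[:, None], atol=atol), axis=-1)
    # every row of b matches some row of a, and every row of a matches some row of b
    return bool(np.all(np.any(rows_close, axis=-1)) and
                np.all(np.any(rows_close, axis=0)))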