Numpy, how to get a sub matrix with boolean slicing - python

I have a question: how to get a sub matrix like a sub array by boolean slicing?
For example:
a2 = np.array(np.arange(30).reshape(5, 6))
a2[a2[:, 1] > 10]
will give me:
array([[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29]])
but:
m2 = np.mat(np.arange(30).reshape(5, 6))
m2[m2[:, 1] > 10]
will give me:
matrix([[12, 18, 24]])
Why is the output different, and how can I get the same result from the matrix as from the array?
Thank you!

The issue you're experiencing comes down to the fact that operations on a matrix always return a 2-dimensional result.
When you build the mask on the first array, you get:
In [24]: a2[:,1] > 10
Out[24]: array([False, False, True, True, True], dtype=bool)
which, as you can see, is a 1-dimensional array.
When you do the same thing with the matrix, you get:
In [25]: m2[:,1] > 10
Out[25]:
matrix([[False],
[False],
[ True],
[ True],
[ True]], dtype=bool)
In other words, you have an nx1 matrix, not an array of length n.
Indexing in numpy operates differently depending on whether you're indexing with a 1-dimensional or an n-dimensional array.
In your first case, numpy will treat the array of length n as row indices, so you'll get the expected result:
In [28]: a2[a2[:,1] > 10]
Out[28]:
array([[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29]])
In the second case, because you have a 2-dimensional index array, numpy has enough information to extract both the row and the column, and so it only grabs things from the matching column (the first one):
In [29]: m2[m2[:,1] > 10]
Out[29]: matrix([[12, 18, 24]])
To answer your question: you can get this behaviour by converting your mask to an array and grabbing its first column, which recovers the 1-dimensional mask of length n:
In [32]: m2[np.array(m2[:,1] > 10)[:,0]]
Out[32]:
matrix([[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29]])
Alternatively, you could do the conversion first, getting the same result as before:
In [34]: np.array(m2)[:,1] > 10
Out[34]: array([False, False, True, True, True], dtype=bool)
Now, both of those fixes require conversions between matrices and arrays, which can be pretty ugly.
The question you should ask yourself is why you wish to use a matrix, and yet expect the behaviour of an array.
It could be that the right tool for your job is actually an array, not a matrix.
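To make the shape difference concrete, here is a minimal sketch. It assumes a NumPy version in which np.matrix is still available, and it uses np.asmatrix in place of np.mat; the variable names mask_a, mask_m and row_mask are only illustrative.

```python
import numpy as np

a2 = np.arange(30).reshape(5, 6)
m2 = np.asmatrix(a2)

mask_a = a2[:, 1] > 10   # 1-D boolean array, one entry per row
mask_m = m2[:, 1] > 10   # 2-D boolean matrix with a single column
print(mask_a.shape, mask_m.shape)

# Flatten the matrix mask to 1-D so that it selects whole rows
row_mask = np.asarray(mask_m).ravel()
rows_from_matrix = m2[row_mask]
rows_from_array = a2[mask_a]

# Both selections contain the same three rows
print(np.array_equal(np.asarray(rows_from_matrix), rows_from_array))
```

Working with a plain array from the start (a2 above) avoids the flattening step entirely.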

If you flatten the boolean mask like:
m2[np.asarray(m2[:,1]>10).flatten()]
you get the same result, but I would recommend using np.array instead of np.matrix for the reasons given in this answer.

Related

Split a numpy array with several sorted sequences

I have a large numpy array (typically a few thousand numbers) that consists of several sorted sequences,
for example:
arr = [12, 13, 14, 22, 23, 24, 25, 26, 9, 10, 11]
I would like to split it into subarrays, each one holding one sequence:
[12, 13, 14], [22, 23, 24, 25, 26], [9, 10, 11]
What is the fastest way to do that?
I would do it the following way:
import numpy as np
arr = np.array([12, 13, 14, 22, 23, 24, 25, 26, 9, 10, 11])
splits = np.flatnonzero(np.diff(arr)!=1)
sub_arrs = np.split(arr, splits+1)
print(sub_arrs)
output
[array([12, 13, 14]), array([22, 23, 24, 25, 26]), array([ 9, 10, 11])]
Explanation: I compute the differences between adjacent elements using numpy.diff (np.diff(arr)), then build a boolean array that is True where the difference is not 1 and False elsewhere (np.diff(arr)!=1), then find the indices of the Trues in that array using np.flatnonzero (True is treated as 1 and False as 0 in Python). Finally, I use numpy.split to get a list of subarrays of arr, split at splits offset by 1 (note that numpy.diff returns an array that is shorter by 1 than its input).
Side note: I would call this finding sub-arrays of consecutive runs rather than merely sorted ones, since you could split your arr into [[12, 13, 14, 22, 23, 24, 25, 26], [9, 10, 11]] and still fulfill the requirement that every sub-array is sorted.
First of all, the problem could be really complex, but based on your example I assume that the values in subarrays are increasing by 1.
Here is a one liner solution with plain numpy: np.array_split(a, np.where(np.diff(a) != 1)[0]+1)
Explanation: You can calculate the difference between consecutive values with np.diff.
>>> import numpy as np
>>> a
array([12, 13, 14, 22, 23, 24, 25, 26, 9, 10, 11])
>>> np.diff(a)
array([ 1, 1, 8, 1, 1, 1, 1, -17, 1, 1])
Then, get the indices of the values that represent the last element of each subarray, that is, the positions where the difference does not equal 1.
>>> np.where(np.diff(a) != 1)
(array([2, 7]),)
Finally, we add 1 to the boundaries to be able to use np.array_split() correctly to generate the subarrays.
>>> np.where(np.diff(a) != 1)[0]+1
array([3, 8])
>>> np.array_split(a, np.where(np.diff(a) != 1)[0]+1)
[array([12, 13, 14]), array([22, 23, 24, 25, 26]), array([ 9, 10, 11])]
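The steps above can be wrapped into a small helper. This is only a sketch: the function name split_runs and its step parameter are made up for illustration and are not part of numpy.

```python
import numpy as np

def split_runs(arr, step=1):
    """Split arr wherever consecutive values do not differ by `step`."""
    arr = np.asarray(arr)
    # np.diff marks where a run ends; adding 1 turns that into the
    # index in arr at which the next run starts.
    boundaries = np.flatnonzero(np.diff(arr) != step) + 1
    return np.split(arr, boundaries)

runs = split_runs([12, 13, 14, 22, 23, 24, 25, 26, 9, 10, 11])
print([r.tolist() for r in runs])
```

An input with a single element has no differences to inspect, so the helper returns it as one run.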

How to extract non-zero values of a numpy array

I have a numpy array and want to extract some values out of it. This is my array:
arr= array([[0, 0, 23, 28],
[0, 19, 24, 29],
[0, 20, 25, 30],
[17, 21, 26, 31],
[18, 22, 27, 32]])
I want to first sort the non-zero part of it in descending order and change it into:
arr= array([[0, 0, 27, 32],
[0, 22, 26, 31],
[0, 21, 25, 30],
[18, 20, 24, 29],
[17, 19, 23, 28]])
Then, from the first column, I want to extract the last two rows (18 and 17). The second column has four non-zero rows, which is two more than the previous column, so I want its two upper non-zero rows. The third column has five non-zero rows, which is one more than the second column, so I want its one upper row. In the last column, the number of non-zero rows is the same as in the previous column, so I do not want any row from it. Finally, I want to have these extracted numbers as a list or numpy array:
result= [17, 18, 21, 22, 27]
I tried the following but it was not successful at all:
result=[]
for m in range (len (arr[0,:])):
    for i in range (len (arr[:,m])):
        if arr[i,m]==0 and arr[i+1,m]!=0:
            b= arr[i+1:,m]
            result.append (b)
I appreciate any help in advance.
Let's try:
mask = arr != 0
# mask the 0 with infinity and sort
new_arr = np.sort(np.where(mask, arr, np.inf), axis=0)
# replace back:
arr[:] = np.where(mask, new_arr[::-1], 0)
# extract the result
result = arr[np.arange(arr.shape[0]),mask.argmax(axis=1)]
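As a check, here is the same approach run on the array from the question. It is a sketch that assumes, as in the question, that the zeros sit at the top of each column; the name sorted_arr and the explicit astype cast are additions for illustration. Note that the values come out in row order, so they need a final sort to match the order listed in the question.

```python
import numpy as np

arr = np.array([[0, 0, 23, 28],
                [0, 19, 24, 29],
                [0, 20, 25, 30],
                [17, 21, 26, 31],
                [18, 22, 27, 32]])

mask = arr != 0
# Replace zeros with infinity so they sort to the bottom of each column
new_arr = np.sort(np.where(mask, arr, np.inf), axis=0)
# Reverse for descending order and put the zeros back
sorted_arr = np.where(mask, new_arr[::-1], 0).astype(arr.dtype)

# For each row, pick the first non-zero entry
result = sorted_arr[np.arange(arr.shape[0]), mask.argmax(axis=1)]
print(sorted_arr)
print(np.sort(result))
```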

Split an array into non-decreasing arrays

I'm trying to split a given array into the non-decreasing arrays without for loops or using np.diff. I wonder if that could be done with np.where but can't imagine how to make it without looping.
Here's a way using numpy:
def split_increasing(x):
    # Mark positions where a value is greater than the one after it
    ix = np.greater(x[:-1], x[1:])
    # Use the indices where the above is True
    # to split the array
    return np.split(x, np.flatnonzero(ix)+1)
Let's check with some random array:
a = np.random.randint(1,20,10)
# array([12, 15, 3, 7, 18, 18, 9, 16, 15, 19])
split_increasing(a)
Output
[array([12, 15]), array([ 3, 7, 18, 18]), array([ 9, 16]), array([15, 19])]

SciPy Sparse Array: Get index for a data point

I am creating a csr sparse array (because I have a lot of empty elements/cells) that I need to use forwards and backwards. That is, I need to input two indices and get the element that corresponds to it ( matrix[0][9]=34) but I also need to be able to get the indices upon just knowing the value is 34. The elements in my array will all be unique. I have looked all over for an answer regarding this, but have not found one, or may have not understood it was what I was looking for if I did! I'm quite new to python, so if you could make sure to let me know what the functions you find do and the steps to retrieve the indices for the element, I would very much appreciate it!
Thanks in advance!
Here's a way of finding a specific value that is applicable to both numpy arrays and sparse matrices
In [119]: A=sparse.csr_matrix(np.arange(12).reshape(3,4))
In [120]: A==10
Out[120]:
<3x4 sparse matrix of type '<class 'numpy.bool_'>'
with 1 stored elements in Compressed Sparse Row format>
In [121]: (A==10).nonzero()
Out[121]: (array([2], dtype=int32), array([2], dtype=int32))
In [122]: (A.A==10).nonzero()
Out[122]: (array([2], dtype=int32), array([2], dtype=int32))
You can use the nonzero method:
In [44]: from scipy.sparse import csr_matrix
In [45]: a = np.arange(50).reshape(5, 10)
In [46]: a
Out[46]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])
In [47]: s = csr_matrix(a)
In [48]: s
Out[48]:
<5x10 sparse matrix of type '<type 'numpy.int64'>'
with 49 stored elements in Compressed Sparse Row format>
In [49]: (s == 36).nonzero()
Out[49]: (array([3], dtype=int32), array([6], dtype=int32))
In general, it often works to try a method which worked on a numpy array. This does not always work, but at least here it just did (and I learned something new today).
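Since the question asks for lookups in both directions, here is a minimal sketch of both. The dictionary-based reverse lookup is an addition not taken from the answers above; it assumes the stored values are unique, as the question states, and the name index_of is made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

s = csr_matrix(np.arange(50).reshape(5, 10))

# Forward lookup: use s[row, col], not s[row][col]
print(s[3, 6])

# One-off reverse lookup: compare, then ask where the result is non-zero
rows, cols = (s == 36).nonzero()
print(rows[0], cols[0])

# Repeated reverse lookups: build a value -> (row, col) dict once.
# Only stored (non-zero) entries appear in it.
coo = s.tocoo()
index_of = dict(zip(coo.data.tolist(),
                    zip(coo.row.tolist(), coo.col.tolist())))
print(index_of[36])
```

The dictionary costs extra memory proportional to the number of stored elements, but each later lookup avoids scanning the whole matrix.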

Logical indexing in Numpy with two indices as in MATLAB

How do I replicate this indexing done in MATLAB with Numpy?
X=magic(5);
M=[0,0,1,2,1];
X(M==0,M==2)
that returns:
ans =
8
14
I've found that the following attempt in Numpy is not correct, since it does not give me the same result:
X = np.matrix([[17, 24, 1, 8, 15],
[23, 5, 7, 14, 16],
[ 4, 6, 13, 20, 22],
[10, 12, 19, 21, 3],
[11, 18, 25, 2, 9]])
M=np.array([0,0,1,2,1])
X.take([M==0]).take([M==2], axis=1)
since I get:
matrix([[24, 24, 24, 24, 24]])
What is the correct way to logically index with two indices in numpy?
In general there are two ways to interpret X[a, b] when both a and b are arrays (vectors in matlab), "inner-style" indexing or "outer-style" indexing.
The designers of matlab chose "outer-style" indexing and the designers of numpy chose inner-style indexing. To do "outer-style" indexing in numpy one can use:
X[np.ix_(a, b)]
# This is roughly equal to matlab's
X(a, b)
For completeness, you can do "inner-style" indexing in matlab by doing:
X(sub2ind(size(X), a, b))
# This is roughly equal to numpy's
X[a, b]
In short, try X[np.ix_(M == 0, M == 2)].
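A short sketch of both styles on the magic square from the question, using a plain np.array rather than np.matrix. The names outer and inner are only illustrative; the outer-style call reproduces the question's X(M==0, M==2).

```python
import numpy as np

X = np.array([[17, 24,  1,  8, 15],
              [23,  5,  7, 14, 16],
              [ 4,  6, 13, 20, 22],
              [10, 12, 19, 21,  3],
              [11, 18, 25,  2,  9]])
M = np.array([0, 0, 1, 2, 1])

# Outer-style: every selected row crossed with every selected column
outer = X[np.ix_(M == 0, M == 2)]
print(outer)

# Inner-style: index arrays are paired element by element,
# here picking X[0, 3] and X[1, 3]
inner = X[[0, 1], [3, 3]]
print(inner)
```

The outer-style result keeps two dimensions (a 2x1 block), while the inner-style result is a flat array of the paired elements.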
