How do I replicate this indexing done in MATLAB with Numpy?
X=magic(5);
M=[0,0,1,2,1];
X(M==0,M==2)
that returns:
ans =
8
14
What I've tried in Numpy doesn't seem correct, since it doesn't give me the same result:
X = np.matrix([[17, 24,  1,  8, 15],
               [23,  5,  7, 14, 16],
               [ 4,  6, 13, 20, 22],
               [10, 12, 19, 21,  3],
               [11, 18, 25,  2,  9]])
M = np.array([0, 0, 1, 2, 1])
X.take([M==0]).take([M==2], axis=1)
since I get:
matrix([[24, 24, 24, 24, 24]])
What is the correct way to logically index with two indices in numpy?
In general there are two ways to interpret X[a, b] when both a and b are arrays (vectors in MATLAB): "inner-style" indexing or "outer-style" indexing.
The designers of MATLAB chose "outer-style" indexing and the designers of numpy chose "inner-style" indexing. To do "outer-style" indexing in numpy one can use:
X[np.ix_(a, b)]
# This is roughly equal to matlab's
X(a, b)
For completeness, you can do "inner-style" indexing in MATLAB with:
X(sub2ind(size(X), a, b))
# This is roughly equal to numpy's
X[a, b]
In short, try X[np.ix_(M == 0, M == 2)].
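With the X and M defined in your question, this should reproduce the MATLAB result:
M = np.array([0, 0, 1, 2, 1])
X[np.ix_(M == 0, M == 2)]
# -> the 8 and 14 from the MATLAB example, as a 2x1 result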
I have a large numpy array (typically a few thousand numbers) that consists of several sorted sequences,
for example:
arr = [12, 13, 14, 22, 23, 24, 25, 26, 9, 10, 11]
I would like to split it into sub-arrays, each holding one such sequence:
[12, 13, 14], [22, 23, 24, 25, 26], [9, 10, 11]
What is the fastest way to do that?
I would do it the following way:
import numpy as np
arr = np.array([12, 13, 14, 22, 23, 24, 25, 26, 9, 10, 11])
splits = np.flatnonzero(np.diff(arr)!=1)
sub_arrs = np.split(arr, splits+1)
print(sub_arrs)
Output
[array([12, 13, 14]), array([22, 23, 24, 25, 26]), array([ 9, 10, 11])]
Explanation: I create an array of differences between adjacent elements using numpy.diff (np.diff(arr)), then turn it into a boolean array that is True where the difference is not 1 and False everywhere else (np.diff(arr)!=1), and find the indices of the True values using np.flatnonzero (True is treated as 1 and False as 0 in Python). Finally I use numpy.split to get the list of sub-arrays, splitting arr at those indices offset by 1 (note that numpy.diff returns an array that is one element shorter than its input).
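For reference, the intermediate values for the example arr above look like this:
print(np.diff(arr))   # [  1   1   8   1   1   1   1 -17   1   1]
print(splits)         # [2 7]
print(splits + 1)     # [3 8]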
Side note: I would call this finding sub-arrays of consecutive runs rather than merely sorted ones, since you could otherwise split your arr into [[12, 13, 14, 22, 23, 24, 25, 26], [9, 10, 11]] and still fulfill the requirement that every sub-array is sorted.
First of all, the problem could be really complex, but based on your example I assume that the values within each subarray increase by 1.
Here is a one-liner with plain numpy: np.array_split(a, np.where(np.diff(a) != 1)[0]+1)
Explanation: You can calculate the difference between consecutive values with np.diff.
>>> import numpy as np
>>> a
array([12, 13, 14, 22, 23, 24, 25, 26, 9, 10, 11])
>>> np.diff(a)
array([ 1, 1, 8, 1, 1, 1, 1, -17, 1, 1])
Then, get the indices of the values that represent the last element of each subarray, that is, the positions where the difference does not equal 1.
>>> np.where(np.diff(a) != 1)
(array([2, 7]),)
Finally, we add 1 to the boundaries to be able to use np.array_split() correctly to generate the subarrays.
>>> np.where(np.diff(a) != 1)[0]+1
array([3, 8])
>>> np.array_split(a, np.where(np.diff(a) != 1)[0]+1)
[array([12, 13, 14]), array([22, 23, 24, 25, 26]), array([ 9, 10, 11])]
I'm trying to split a given array into non-decreasing sub-arrays without for loops or np.diff. I wonder if that could be done with np.where, but I can't see how to do it without looping.
Here's a way using numpy:
import numpy as np

def split_increasing(x):
    # True where the next value is smaller, i.e. where a non-decreasing run ends
    ix = np.greater(x[:-1], x[1:])
    # Split right after each of those positions
    return np.split(x, np.flatnonzero(ix) + 1)
Let's check with a random array:
a = np.random.randint(1,20,10)
# array([12, 15, 3, 7, 18, 18, 9, 16, 15, 19])
split_increasing(a)
Output
[array([12, 15]), array([ 3, 7, 18, 18]), array([ 9, 16]), array([15, 19])]
import numpy as np

m = []
k = []
a = np.array([[1, 2, 3, 4, 5, 6], [50, 51, 52, 40, 20, 30], [60, 71, 82, 90, 45, 35]])
for i in range(len(a)):
    m.append(a[i, -1:])
    for j in range(len(a[i]) - 1):
        n = abs(m[i] - a[i, j])
        k.append(n)
    k.append(m[i])
print(k)
Expected Output in k:
[5,4,3,2,1,6],[20,21,22,10,10,30],[25,36,47,55,10,35]
which is also a numpy array.
But the output that I am getting is
[array([5]), array([4]), array([3]), array([2]), array([1]), array([6]), array([20]), array([21]), array([22]), array([10]), array([10]), array([30]), array([25]), array([36]), array([47]), array([55]), array([10]), array([35])]
How can I fix this?
You want to subtract the items of each row from that row's last column (in absolute value). Why not use a vectorized approach? You can do all the subtractions at once by subtracting the last column from the rest of the items and then column_stack the result together with the unchanged last column. Also note that you need to change the dimension of the last column in order for it to be subtractable from the 2D array; broadcasting takes care of the rest.
In [71]: np.column_stack((abs(a[:, :-1] - a[:, None, -1]), a[:,-1]))
Out[71]:
array([[ 5,  4,  3,  2,  1,  6],
       [20, 21, 22, 10, 10, 30],
       [25, 36, 47, 55, 10, 35]])
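To see why the a[:, None, -1] indexing is needed, here is a quick shape check using the same a from the question:
print(a[:, :-1].shape)       # (3, 5)  everything except the last column
print(a[:, -1].shape)        # (3,)    1-D, does not broadcast against (3, 5)
print(a[:, None, -1].shape)  # (3, 1)  a proper column, broadcasts against (3, 5)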
Sorry for the badly explained title. I am trying to parallelise a part of my code and got stuck on a dot product. I am looking for an efficient way of doing what the code below does; I'm sure there is a simple linear algebra solution, but I'm very stuck:
import numpy as np

puy = np.arange(8).reshape(2, 4)
puy2 = np.arange(12).reshape(3, 4)
print(puy, '\n')
print(puy2.T)

zz = np.zeros([4, 2, 3])
for i in range(4):
    zz[i, :, :] = np.dot(np.array([puy[:, i]]).T,
                         np.array([puy2.T[i, :]]))
One way would be to use np.einsum, which allows you to specify what you want to happen to the indices:
>>> np.einsum('ik,jk->kij', puy, puy2)
array([[[ 0,  0,  0],
        [ 0, 16, 32]],

       [[ 1,  5,  9],
        [ 5, 25, 45]],

       [[ 4, 12, 20],
        [12, 36, 60]],

       [[ 9, 21, 33],
        [21, 49, 77]]])
>>> np.allclose(np.einsum('ik,jk->kij', puy, puy2), zz)
True
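If it helps to see what the subscript string 'ik,jk->kij' does, here is the same computation spelled out as explicit loops (a sketch using the puy, puy2 and zz defined in the question):
out = np.empty((4, 2, 3))
for k in range(4):
    for i in range(2):
        for j in range(3):
            out[k, i, j] = puy[i, k] * puy2[j, k]
print(np.allclose(out, zz))  # True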
Here's another way with broadcasting -
(puy[None,...]*puy2[:,None,:]).T
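A quick check that the broadcasting expression matches the loop result zz from the question:
out = (puy[None, ...] * puy2[:, None, :]).T
print(np.allclose(out, zz))  # True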
I have a question: how can I get a sub-matrix by boolean slicing, the same way I get a sub-array?
For example:
a2 = np.array(np.arange(30).reshape(5, 6))
a2[a2[:, 1] > 10]
will give me:
array([[12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29]])
but:
m2 = np.mat(np.arange(30).reshape(5, 6))
m2[m2[:, 1] > 10]
will give me:
matrix([[12, 18, 24]])
Why is the output different, and how can I get the same result from the matrix as from the array?
Thank you!
The issue you're experiencing comes down to the fact that operations on a matrix always return a 2-dimensional result.
When you build the mask on the first array, you get:
In [24]: a2[:,1] > 10
Out[24]: array([False, False, True, True, True], dtype=bool)
which, as you can see, is a 1-dimensional array.
When you do the same thing with the matrix, you get:
In [25]: m2[:,1] > 10
Out[25]:
matrix([[False],
        [False],
        [ True],
        [ True],
        [ True]], dtype=bool)
In other words, you have an n-by-1 matrix, not an array of length n.
Indexing in numpy operates differently depending on whether you're indexing with a one-dimensional or a two-dimensional array.
In your first case, numpy will treat the array of length n as row indices, so you'll get the expected result:
In [28]: a2[a2[:,1] > 10]
Out[28]:
array([[12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29]])
In the second case, because you have a 2-dimensional index array, numpy has enough information to extract both the row and the column, and so it only grabs things from the matching column (the first one):
In [29]: m2[m2[:,1] > 10]
Out[29]: matrix([[12, 18, 24]])
To answer your question: you can get the behaviour you want by converting your mask to an array and grabbing its first column, to extract an initial array of length n:
In [32]: m2[np.array(m2[:,1] > 10)[:,0]]
Out[32]:
matrix([[12, 13, 14, 15, 16, 17],
        [18, 19, 20, 21, 22, 23],
        [24, 25, 26, 27, 28, 29]])
Alternatively, you could do the conversion first, getting the same result as before:
In [34]: np.array(m2)[:,1] > 10
Out[34]: array([False, False, True, True, True], dtype=bool)
Now, both of those fixes require conversions between matrices and arrays, which can be pretty ugly.
The question I'd ask is why you want to use a matrix and yet expect the behaviour of an array.
It could be that the right tool for the job is actually an array, not a matrix.
If you flatten the boolean mask like:
m2[np.asarray(m2[:,1]>10).flatten()]
you get the same result, but I would recommend using np.array instead of np.matrix for the reasons given in this answer.
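For completeness, a minimal sketch of that array-based approach, converting the matrix once and working with plain arrays from then on (using the m2 defined in the question):
a = np.asarray(m2)    # plain 2-D array view of the matrix
a[a[:, 1] > 10]       # same 3x6 block as the array example above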