I have a numpy array and want to extract some values out of it. This is my array:
arr= array([[0, 0, 23, 28],
[0, 19, 24, 29],
[0, 20, 25, 30],
[17, 21, 26, 31],
[18, 22, 27, 32]])
I want to first sort the non-zero part of it and change it into:
arr= array([[0, 0, 27, 32],
[0, 22, 26, 31],
[0, 21, 25, 30],
[18, 20, 24, 29],
[17, 19, 23, 28]])
Then, from the first column, I want to extract the last two rows (18 and 17). The second column has four non-zero values, which is two more non-zero rows compared to the previous column, so I want its two upper non-zero rows (22 and 21). The third column has five non-zero rows, which is one more than the second column, so I want its one top row (27). In the last column, the number of non-zero rows is the same as in the previous column, so I do not want any row from it. Finally, I want to have these extracted numbers as a list or numpy array:
result= [17, 18, 21, 22, 27]
I tried the following but it was not successful at all:
result=[]
for m in range (len (arr[0,:])):
    for i in range (len (arr[:,m])):
        if arr[i,m]==0 and arr[i+1,m]!=0:
            b= arr[i+1:,m]
            result.append (b)
I appreciate any help in advance.
Let's try:
mask = arr != 0
# mask the 0 with infinity and sort
new_arr = np.sort(np.where(mask, arr, np.inf), axis=0)
# replace back:
arr[:] = np.where(mask, new_arr[::-1], 0)
# extract the result
result = arr[np.arange(arr.shape[0]),mask.argmax(axis=1)]
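For reference, the steps above can be run end to end as in the sketch below, using the array from the question. With this indexing the values come out from the top row down, so the sketch reverses them to match the order asked for. That final reversal is my own addition, not part of the answer as posted.

```python
import numpy as np

arr = np.array([[0, 0, 23, 28],
                [0, 19, 24, 29],
                [0, 20, 25, 30],
                [17, 21, 26, 31],
                [18, 22, 27, 32]])

mask = arr != 0
# Treat the zeros as +inf so they sort to the end of each column
new_arr = np.sort(np.where(mask, arr, np.inf), axis=0)
# Reverse each column (descending order) and put the zeros back
arr[:] = np.where(mask, new_arr[::-1], 0)
# For every row, pick the first non-zero entry
result = arr[np.arange(arr.shape[0]), mask.argmax(axis=1)]
# Assumption: reverse to get the bottom-up order shown in the question
result = result[::-1]
print(result.tolist())  # [17, 18, 21, 22, 27]
```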
Is there a way to find an index of an array inside another array without converting them to list or using a for loop?
I have a huge data set and I don't want to add another loop and make it slower
arr = np.array([[11, 19, 18], [14, 15, 11], [19, 21, 46], [29, 21, 19]])
find_this_array = np.array([14, 15, 11])
# I want to avoid this
a = arr.tolist()
val = find_this_array.tolist()
a.index(val)
output:
1
You can try this:
np.where((arr == find_this_array).all(axis=1))[0][0]
output:
1
You can find more details about np.where in the NumPy documentation:
https://numpy.org/doc/stable/reference/generated/numpy.where.html
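One thing to watch with the one-liner above is that `[0][0]` raises an IndexError when no row matches. Here is a minimal sketch of a wrapper that returns -1 in that case; the name `find_row` and the -1 convention are my own choices, not part of NumPy.

```python
import numpy as np

def find_row(arr, row):
    """Return the index of the first row of arr equal to row, or -1 if none."""
    # Compare every row against `row` and keep indices where all columns match
    matches = np.flatnonzero((arr == row).all(axis=1))
    return int(matches[0]) if matches.size else -1

arr = np.array([[11, 19, 18], [14, 15, 11], [19, 21, 46], [29, 21, 19]])
print(find_row(arr, np.array([14, 15, 11])))  # 1
print(find_row(arr, np.array([1, 2, 3])))     # -1
```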
I have a large numpy array (typically a few thousand numbers) that consists of several sorted sequences,
for example:
arr = [12, 13, 14, 22, 23, 24, 25, 26, 9, 10, 11]
I would like to split it into subarrays, each one holding one sequence:
[12, 13, 14], [22, 23, 24, 25, 26], [9, 10, 11]
What is the fastest way to do that?
I would do it the following way:
import numpy as np
arr = np.array([12, 13, 14, 22, 23, 24, 25, 26, 9, 10, 11])
splits = np.flatnonzero(np.diff(arr)!=1)
sub_arrs = np.split(arr, splits+1)
print(sub_arrs)
output
[array([12, 13, 14]), array([22, 23, 24, 25, 26]), array([ 9, 10, 11])]
Explanation: I create an array of differences between adjacent elements using numpy.diff (np.diff(arr)), then turn it into an array that is True where the difference is not 1 and False everywhere else (np.diff(arr)!=1). Next I find the indices of the Trues in that array using np.flatnonzero (True is treated as 1 and False as 0 in Python). Finally I use numpy.split to get a list of subarrays of arr, split at splits offset by 1 (note that numpy.diff returns an array that is shorter by 1 than its input).
Side note: I would call this finding sub-arrays of consecutive runs rather than merely sorted ones, as you might split your arr into [[12, 13, 14, 22, 23, 24, 25, 26], [9, 10, 11]] and still fulfill the requirement that every sub-array is sorted.
First of all, the problem could be really complex, but based on your example I assume that the values in subarrays are increasing by 1.
Here is a one liner solution with plain numpy: np.array_split(a, np.where(np.diff(a) != 1)[0]+1)
Explanation: You can calculate the difference between consecutive values with np.diff.
>>> import numpy as np
>>> a
array([12, 13, 14, 22, 23, 24, 25, 26, 9, 10, 11])
>>> np.diff(a)
array([ 1, 1, 8, 1, 1, 1, 1, -17, 1, 1])
Then, get the indices of the values that represent the last element of each subarray, that is, the positions where the difference does not equal 1.
>>> np.where(np.diff(a) != 1)
(array([2, 7]),)
Finally, we add 1 to the boundaries to be able to use np.array_split() correctly to generate the subarrays.
>>> np.where(np.diff(a) != 1)[0]+1
array([3, 8])
>>> np.array_split(a, np.where(np.diff(a) != 1)[0]+1)
[array([12, 13, 14]), array([22, 23, 24, 25, 26]), array([ 9, 10, 11])]
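If the runs might increase by something other than 1, the same idea can be wrapped in a small helper. This is only a sketch: the function name `split_runs` and its `step` parameter are hypothetical, introduced here for illustration.

```python
import numpy as np

def split_runs(a, step=1):
    """Split a 1-D array wherever consecutive elements do not differ by `step`."""
    a = np.asarray(a)
    # Positions where a new run starts (diff index + 1)
    breaks = np.flatnonzero(np.diff(a) != step) + 1
    return np.split(a, breaks)

runs = split_runs([12, 13, 14, 22, 23, 24, 25, 26, 9, 10, 11])
print([r.tolist() for r in runs])
# [[12, 13, 14], [22, 23, 24, 25, 26], [9, 10, 11]]
```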
Suppose I have a NumPy ndarray M with the following content at M[0,:]:
[2, 3.9, 7, 9, 0, 1, 8.1, 3.2]
and I am given an integer, k, at runtime between 0 and 7. I want to produce the vector consisting of all items in this row except at column k. (Example: if k=3, then the desired vector is [2,3.9,7,0,1,8.1,3.2])
Is there an easy way to do this?
What if I have a vector of indices k, one for each row of M, representing the column I want to exclude from the row?
I'm kind of lost, other than a non-vectorized loop that mutates a result matrix:
nrows = M.shape[0]
result = np.zeros((nrows, M.shape[1]-1))
for irow in range(nrows):
    result[irow,:k[irow]] = M[irow,:k[irow]] # content before the split point
    result[irow,k[irow]:] = M[irow,k[irow]+1:] # content after the split point
One approach would be with masking/boolean-indexing -
mask = np.ones(M.shape,dtype=bool)
mask[np.arange(len(k)),k] = 0
out = M[mask].reshape(len(M),-1)
Alternatively, we could use broadcasting to get that mask -
np.not_equal.outer(k,np.arange(M.shape[1]))
# or k[:,None]!=np.arange(M.shape[1])
Thus, giving us a one-liner/compact version -
out = M[k[:,None]!=np.arange(M.shape[1])].reshape(len(M),-1)
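As a small self-contained check of the broadcasting version, here is a sketch on made-up data (the 3x4 matrix and the values in `k` are just an example). Note that `k` has to be a NumPy array for `k[:,None]` to work; a plain list would need `np.asarray(k)` first.

```python
import numpy as np

M = np.arange(12).reshape(3, 4)
k = np.array([0, 2, 3])  # column to drop in each row

# True everywhere except at column k[i] of row i
keep = k[:, None] != np.arange(M.shape[1])
out = M[keep].reshape(len(M), -1)
print(out.tolist())  # [[1, 2, 3], [4, 5, 7], [8, 9, 10]]
```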
To exclude multiple ones per row, edit the advanced-indexing part for the first method -
def exclude_multiple(M,*klist):
    k = np.stack(klist).T
    mask = np.ones(M.shape,dtype=bool)
    mask[np.arange(len(k))[:,None],k] = 0
    out = M[mask].reshape(len(M),-1)
    return out
Sample run -
In [185]: M = np.arange(40).reshape(4,10)
In [186]: exclude_multiple(M,[1,3,2,0],[4,5,8,1])
Out[186]:
array([[ 0, 2, 3, 5, 6, 7, 8, 9],
[10, 11, 12, 14, 16, 17, 18, 19],
[20, 21, 23, 24, 25, 26, 27, 29],
[32, 33, 34, 35, 36, 37, 38, 39]])
Improvement on @Divakar's answer to extend this to zero or more excluded indices per row:
def excluding(A, *klist):
    """
    excludes column k from each row of A, for each k in klist
    (make sure the index vectors have no common elements)
    """
    mask = np.ones(A.shape,dtype=bool)
    for k in klist:
        mask[np.arange(len(k)),k] = 0
    return A[mask].reshape(len(A),-1)
Test:
M = np.arange(40).reshape(4,10)
excluding(M,[1,3,2,0],[4,5,8,1])
returns
array([[ 0, 2, 3, 5, 6, 7, 8, 9],
[10, 11, 12, 14, 16, 17, 18, 19],
[20, 21, 23, 24, 25, 26, 27, 29],
[32, 33, 34, 35, 36, 37, 38, 39]])
I have two arrays:
array1 = [1,2,3]
array2 = [10,20,30]
I want the following sums:
array3 = [10+1,10+2,10+3,20+1,20+2,20+3,30+1,30+2,30+3]
How can I do that?
(I know that it can be done with two for loops but I want something more efficient if possible)
Note: those two arrays are contained in a dataframe (pandas)
I do not think pandas is necessary here
[x+y for x in array2 for y in array1]
Out[293]: [11, 12, 13, 21, 22, 23, 31, 32, 33]
If they are in the dataframe
df=pd.DataFrame({'a':array1,'b':array2})
df
Out[296]:
a b
0 1 10
1 2 20
2 3 30
df.a.values+df.b.values[:,None]
Out[297]:
array([[11, 12, 13],
[21, 22, 23],
[31, 32, 33]], dtype=int64)
Update
(df.a.values+df.b.values[:,None]).ravel()
Out[308]: array([11, 12, 13, 21, 22, 23, 31, 32, 33], dtype=int64)
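If the values are available as plain lists or arrays, another option is the `outer` method of the `np.add` ufunc, which builds the same table of pairwise sums. This is a sketch of an alternative, not something the answer above used.

```python
import numpy as np

array1 = [1, 2, 3]
array2 = [10, 20, 30]

# Rows follow array2, columns follow array1; ravel() flattens row by row
array3 = np.add.outer(array2, array1).ravel()
print(array3.tolist())  # [11, 12, 13, 21, 22, 23, 31, 32, 33]
```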
I would recommend itertools.product here. The itertools documentation (https://docs.python.org/3/library/itertools.html) also includes many other recipes that let you write clearer code.
from itertools import product
array1 = [1,2,3]
array2 = [10,20,30]
# array2 goes first so the sums come out in the order asked for
[x+y for x,y in product(array2,array1)]
# fp style
[*map(sum, product(array2,array1))]
I have a question: how to get a sub matrix like a sub array by boolean slicing?
For example:
a2 = np.array(np.arange(30).reshape(5, 6))
a2[a2[:, 1] > 10]
will give me:
array([[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29]])
but:
m2 = np.mat(np.arange(30).reshape(5, 6))
m2[m2[:, 1] > 10]
will give me:
matrix([[12, 18, 24]])
Why is the output different, and how can I get the same result from the matrix as from the array?
Thank you!
The issue you're experiencing comes down to the fact that operations on a matrix always return a 2-dimensional array.
When you build the mask on the first array, you get:
In [24]: a2[:,1] > 10
Out[24]: array([False, False, True, True, True], dtype=bool)
which, as you can see, is a 1-dimensional array.
When you do the same thing with the matrix, you get:
In [25]: m2[:,1] > 10
Out[25]:
matrix([[False],
[False],
[ True],
[ True],
[ True]], dtype=bool)
In other words, you have an n-by-1 array, not an array of length n.
Indexing in numpy operates differently depending on whether you're indexing with a one or n dimensional array.
In your first case, numpy will treat the array of length n as row indices, so you'll get the expected result:
In [28]: a2[a2[:,1] > 10]
Out[28]:
array([[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29]])
In the second case, because you have a 2-dimensional index array, numpy has enough information to extract both the row and the column, and so it only grabs things from the matching column (the first one):
In [29]: m2[m2[:,1] > 10]
Out[29]: matrix([[12, 18, 24]])
To answer your question: you can get this behaviour by converting your mask to an array and grabbing the first column, to extract your initial array of length n:
In [32]: m2[np.array(m2[:,1] > 10)[:,0]]
Out[32]:
matrix([[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29]])
Alternatively, you could do the conversion first, getting the same result as before:
In [34]: np.array(m2)[:,1] > 10
Out[34]: array([False, False, True, True, True], dtype=bool)
Now, both of those fixes require conversions between matrices and arrays, which can be pretty ugly.
The question to ask yourself is why you wish to use a matrix, and yet expect the behaviour of an array.
It could be that the right tool for your job is actually an array, not a matrix.
If you flatten the boolean mask like:
m2[np.asarray(m2[:,1]>10).flatten()]
you get the same result, but I would recommend using np.array instead of np.matrix for the reasons given in this answer.
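To make the flattening approach concrete, here is a short sketch that checks the matrix result against the plain-array result. It builds the matrix with `np.asmatrix` rather than `np.mat`; that substitution is my own choice and should behave the same for this purpose.

```python
import numpy as np

a2 = np.arange(30).reshape(5, 6)
m2 = np.asmatrix(a2)

# Flatten the (5, 1) matrix mask into a plain 1-D boolean array of length 5
row_mask = np.asarray(m2[:, 1] > 10).ravel()
rows = m2[row_mask]

print(row_mask.shape)  # (5,)
print(rows.shape)      # (3, 6)
# Same rows as indexing the plain array directly
print(np.array_equal(np.asarray(rows), a2[a2[:, 1] > 10]))  # True
```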