I have two arrays a and b of length n and m respectively, where n > m, a has values in 1,...,m and b is a permutation of 1,...,m:
# n > m
n = 20000
m = 10000
a = np.random.randint(1, m + 1, size=n)
b = np.random.permutation(m) + 1
How can I find an array c of length n with values in 1,...,m such that the following holds?
assert (b[c-1] == a).all()
This is one way:
_, c = np.nonzero(b == a[:, None])
assert np.allclose(b[c], a)
Note that c here is 0-based, so the check is b[c] == a rather than b[c-1] == a; add 1 to c if you need the 1-based indices from the question.
How it works:
The expression b == a[:, None] returns a boolean array of shape n x m, where row r compares a[r] with every element of b. Each row therefore holds m booleans, with True at the column index col where a[r] equals b[col]; since b is a permutation, there is exactly one True per row. This uses broadcasting for the elementwise comparison.
This is a small illustration:
>>> m = 5
>>> n = 10
>>> a = np.random.randint(1, m+1, size=n)
>>> b = np.random.permutation(m) + 1
>>> a
array([5, 4, 2, 1, 4, 2, 5, 4, 5, 2])
>>> b
array([3, 5, 1, 2, 4])
>>> b == a[:, None]
array([[False, True, False, False, False],
[False, False, False, False, True],
[False, False, False, True, False],
[False, False, True, False, False],
[False, False, False, False, True],
[False, False, False, True, False],
[False, True, False, False, False],
[False, False, False, False, True],
[False, True, False, False, False],
[False, False, False, True, False]])
Applying np.nonzero() to this 2D boolean array returns two 1D arrays holding the row and column indices of the True elements, i.e. every position (i[k], j[k]) of the boolean array is True. The row and column index arrays are shown here as i and j.
>>> i, j = np.nonzero(b == a[:, None])
>>> i
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> j
array([1, 4, 3, 2, 4, 3, 1, 4, 1, 3])
The column indices j show how the array a can be obtained by indexing b with j.
>>> b[j]
array([5, 4, 2, 1, 4, 2, 5, 4, 5, 2])
>>> a
array([5, 4, 2, 1, 4, 2, 5, 4, 5, 2])
Essentially, every element of a comes from the set of values in b. The idea above is to find where each element of a appears in b and take the corresponding index.
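The comparison above builds an n x m boolean array, which for n = 20000 and m = 10000 is 200 million entries. Because b is a permutation of 1,...,m, a lookup table of positions should give the same result with O(n + m) memory. This is a minimal sketch that relies on that permutation property; the name position, the small sizes and the seeded generator are illustrative choices, not part of the answer above:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
n, m = 20, 10                   # small sizes for illustration
a = rng.integers(1, m + 1, size=n)
b = rng.permutation(m) + 1

# position[v - 1] holds the 0-based index at which the value v sits in b.
position = np.empty(m, dtype=np.intp)
position[b - 1] = np.arange(m)

# Look up every element of a, then shift to the 1-based c from the question.
c = position[a - 1] + 1

assert np.array_equal(b[c - 1], a)
```

If b were not a permutation (missing or repeated values), the table would be incomplete or ambiguous, so this shortcut would not apply as written.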
So let's say I have a matrix mat = [[1,2,3,4,5,6],[1,2,3,4,5,6],[1,2,3,4,5,6]]
and a lower bound vector vector_low = [2.1,1.9,1.7] and upper bound vector vector_up = [3.1,3.5,4.1].
How do I get the values in the matrix in between the upper and lower bounds for every row?
Expected Output:
[[3],[2,3],[2,3,4]] (it's a list #mozway)
Alternatively, a vector with all of them would also do.
(Extra question: get the values of the matrix that are between the upper and lower bound, but with the bounds rounded down/up to the next value in the matrix.
Expected Output:
[[2,3,4],[1,2,3,4],[1,2,3,4,5]])
There should be a fast solution without loop, hope someone can help, thanks!
PS: In the end I just want to sum over the list entries, so the output format is not important...
I probably shouldn't indulge you since you haven't provided the code I asked for, but to satisfy my own curiosity, here are my solution(s).
Your lists:
In [72]: alist = [[1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6]]
In [73]: low = [2.1,1.9,1.7]; up = [3.1,3.5,4.1]
A utility function:
In [74]: def between(row, l, u):
    ...:     return [i for i in row if l <= i <= u]
and the straightforward list comprehension solution - VERY PYTHONIC:
In [75]: [between(row, l, u) for row, l, u in zip(alist, low, up)]
Out[75]: [[3], [2, 3], [2, 3, 4]]
A numpy solution requires starting with arrays:
In [76]: arr = np.array(alist)
In [77]: Low = np.array(low)
    ...: Up = np.array(up)
We can check the bounds with:
In [79]: Low[:, None] <= arr
Out[79]:
array([[False, False, True, True, True, True],
[False, True, True, True, True, True],
[False, True, True, True, True, True]])
In [80]: (Low[:, None] <= arr) & (Up[:,None] >= arr)
Out[80]:
array([[False, False, True, False, False, False],
[False, True, True, False, False, False],
[False, True, True, True, False, False]])
Applying the mask to index arr produces a flat array of values:
In [81]: arr[_]
Out[81]: array([3, 2, 3, 2, 3, 4])
To get the values grouped by row, we still have to iterate:
In [82]: [row[mask] for row, mask in zip(arr, Out[80])]
Out[82]: [array([3]), array([2, 3]), array([2, 3, 4])]
For the small case I expect the list approach to be faster. For larger cases [81] will do better - IF we already have arrays. Creating arrays from the lists is not a time-trivial task.
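Since the question says the end goal is only to sum the selected entries, the per-row lists may not be needed at all. The sketch below reuses the same mask idea and sums directly. The names mask, row_sums, total and loose_mask are illustrative, and the floor/ceil variant for the extra question is an assumption that only matches the expected output because the matrix holds consecutive integers:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5, 6]] * 3)
low = np.array([2.1, 1.9, 1.7])
up = np.array([3.1, 3.5, 4.1])

# True where a value lies within its own row's bounds (broadcast per row).
mask = (low[:, None] <= arr) & (arr <= up[:, None])

# Zero out everything outside the bounds, then sum along each row.
row_sums = np.where(mask, arr, 0).sum(axis=1)   # one sum per row
total = arr[mask].sum()                         # sum over all selected values

# Extra question, assuming the matrix values are consecutive integers:
# widen the bounds to the enclosing integers before comparing.
loose_mask = (np.floor(low)[:, None] <= arr) & (arr <= np.ceil(up)[:, None])
loose_row_sums = np.where(loose_mask, arr, 0).sum(axis=1)

print(row_sums, total, loose_row_sums)
```

For a matrix with arbitrary sorted values, widening the bounds would need a per-row search (for example with np.searchsorted) rather than floor and ceil.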
I was doing a python challenge and this one stumped me.
This is the input matrix (numpy format):
# [[1, 7, 2, 2, 1],
# [7, 7, 9, 3, 2],
# [2, 9, 4, 4, 2],
# [2, 3, 4, 3, 2],
# [1, 2, 2, 7, 1]]
and the function would output this matrix
# [[False, True, False, False, False],
# [True, False, True, False, False],
# [False, True, False, True, False],
# [False, False, False, False, False],
# [False, False, False, True, False]]
As you can see, the value is True if any (up/down/left/right) neighbor is at least 2 smaller than the element itself. (We've been learning numpy, but this doesn't feel like it's too much of a numpy thing.)
I tried to do simple if comparison checks, but I kept running into out-of-index errors and I couldn't find any way to circumvent or ignore them.
Thanks in advance.
This is the essence of what I've tried so far. I've simplified the task here to simply check the first row horizontally. If I could get this to work, I would extend it to check the next row horizontally until the end, and then I would do the same thing but vertically.
import numpy as np
ex=np.array([[7, 2, 3, 4, 3, 4, 7]])
def count_peaks(A):
    matrixHeight=A.shape[0]
    matrixWidth=A.shape[1]
    peakTable=np.zeros(shape=(matrixHeight,matrixWidth))
    for i in range(matrixWidth):
        if A[i]-A[i+1]>=2 or A[i]-A[i-1]>=2:
            peakTable[0,i]=1
    return peakTable
... which of course outputs:
IndexError: index 1 is out of bounds for axis 0 with size 1
as I'm trying to find the value of A[-1] which doesn't exist.
You are using numpy arrays, so don't loop; use vectorized code:
import numpy as np
# get shape
x,y = a.shape
# generate row/col of infinites
col = np.full([x, 1], np.inf)
row = np.full([1, y], np.inf)
# shift left/right/up/down
# and compute difference from initial array
left = a - np.c_[col, a[:,:-1]]
right = a - np.c_[a[:,1:], col]
up = a - np.r_[row, a[:-1,:]]
down = a - np.r_[a[1:,:], row]
# get max of each shift and compare to threshold
peak_table = np.maximum.reduce([left,right,up,down])>=2
# NB. if we wanted to use a maximum threshold, we would use
# `np.minimum` instead and initialize the shifts with `-np.inf`
output:
array([[False, True, False, False, False],
[ True, False, True, False, False],
[False, True, False, True, False],
[False, False, True, False, False],
[False, False, False, True, False]])
input:
import numpy as np
a = np.array([[1, 7, 2, 2, 1],
[7, 7, 9, 3, 2],
[2, 9, 4, 4, 2],
[2, 3, 4, 3, 2],
[1, 2, 2, 7, 1]])
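The same shift-and-compare idea can also be written by padding the array once with np.inf and comparing shifted views of the padded array. This is a sketch of that variant, not the answer's own code; the names padded, center and neighbors are illustrative. The array is cast to float because np.inf cannot be stored in an integer array:

```python
import numpy as np

a = np.array([[1, 7, 2, 2, 1],
              [7, 7, 9, 3, 2],
              [2, 9, 4, 4, 2],
              [2, 3, 4, 3, 2],
              [1, 2, 2, 7, 1]])

# Surround the array with a one-cell border of +inf so that every
# original cell has four neighbors and no index goes out of bounds.
padded = np.pad(a.astype(float), 1, mode="constant", constant_values=np.inf)
center = padded[1:-1, 1:-1]

# Views of the padded array shifted by one cell in each direction.
neighbors = [
    padded[:-2, 1:-1],   # up
    padded[2:, 1:-1],    # down
    padded[1:-1, :-2],   # left
    padded[1:-1, 2:],    # right
]

# A cell is a peak if it exceeds any neighbor by at least 2.
# center - inf is -inf, so border neighbors never trigger a peak.
peak_table = np.zeros(a.shape, dtype=bool)
for shifted in neighbors:
    peak_table |= (center - shifted) >= 2

print(peak_table)
```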
If you don't mind me not using numpy to get the solution, but converting to numpy at the end, here is my attempt:
import numpy as np
def check_neighbors(mdarray, i, j):
    # offsets of the up, down, left and right neighbors
    neighbors = (-1, 0), (1, 0), (0, -1), (0, 1)
    for di, dj in neighbors:
        ni, nj = i + di, j + dj
        # negative indices wrap around instead of raising IndexError, so skip them
        if ni < 0 or nj < 0:
            continue
        try:
            if mdarray[i][j] - mdarray[ni][nj] >= 2:
                return True
        except IndexError:
            pass
    return False
mdarray = [[1, 7, 2, 2, 1],
           [7, 7, 9, 3, 2],
           [2, 9, 4, 4, 2],
           [2, 3, 4, 3, 2],
           [1, 2, 2, 7, 1]]
peak_matrix = []
for i in range(len(mdarray)):
    row = []
    for j in range(len(mdarray[i])):
        row.append(check_neighbors(mdarray, i, j))
    peak_matrix.append(row)
y = np.array(peak_matrix)
print(y)
I use the try-except block to avoid errors when the index goes out of bounds.
Note: Row 4 Column 3 (starting counts at 1) of my output seems to differ from yours. I think that the 4 and 2 difference in the neighbors should make this entry true?
Output:
[[False True False False False]
[ True False True False False]
[False True False True False]
[False False True False False]
[False False False True False]]
Edit: changed from bare except to IndexError as Neither suggests in the comments. pass and continue doesn't make a difference in this case but yes.
I have two NumPy arrays as below:
import numpy as np
a = np.array([2, 1, 1, 2, 0, 2, 2, 2, 1, 1])
b = np.array([4, 3, 4, 4, 3, 3, 4, 3, 4, 3])
I want to count how many times the item 2 occurs in the array a, with the condition that the array b has the item 4 at the corresponding index:
a = np.array([2, 1, 1, 2, 0, 2, 2, 2, 1, 1])
b = np.array([4, 3, 4, 4, 3, 3, 4, 3, 4, 3])
              ↑        ↑        ↑
As you can see there are 3 such cases. How do I calculate that?
You can achieve it as follows:
import numpy as np
a = np.array([2, 1, 1, 2, 0, 2, 2, 2, 1, 1])
b = np.array([4, 3, 4, 4, 3, 3, 4, 3, 4, 3])
result = ((a == 2) & (b == 4)).sum()
print(result)
# 3
a == 2 and b == 4 will make boolean arrays with True values when the items equal 2 and 4 respectively:
>>> a == 2
array([ True, False, False, True, False, True, True, True, False, False])
>>> b == 4
array([ True, False, True, True, False, False, True, False, True, False])
By using the element-wise AND operator & in (a == 2) & (b == 4) we will get a boolean array with True for those positions where both items are True:
>>> (a == 2) & (b == 4)
array([ True, False, False, True, False, False, True, False, False, False])
and to count the total number of True values we can just use the sum method.
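As a small variation, np.count_nonzero counts the True values directly and should give the same number as the sum method. A minimal sketch on the same arrays (the name result_alt is illustrative):

```python
import numpy as np

a = np.array([2, 1, 1, 2, 0, 2, 2, 2, 1, 1])
b = np.array([4, 3, 4, 4, 3, 3, 4, 3, 4, 3])

# The parentheses are required: & binds more tightly than ==.
both = (a == 2) & (b == 4)

result = both.sum()
result_alt = np.count_nonzero(both)
print(result, result_alt)
```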
I have a numpy array with booleans:
bool_array.shape
Out[84]: (78, 8)
bool_array.dtype
Out[85]: dtype('bool')
And I would like to find the indices where the second dimension is True:
bool_array[30:35]
Out[87]:
array([[False, False, False, False, True, False, False, False],
[ True, False, False, False, True, False, False, False],
[False, False, False, False, False, True, False, False],
[ True, False, False, False, False, False, False, False],
[ True, False, False, False, False, False, False, False]], dtype=bool)
I have been using numpy.where to do this, but sometimes more than one index along the second dimension has the True value.
I would like to obtain the same result as numpy.where while avoiding two indices from the same row:
np.where(bool_array)[0][30:35]
Out[88]: array([30, 31, 31, 32, 33])
I currently solve this by looping over the results of numpy.where, finding each index n that equals the index at n-1, and using numpy.delete to remove the unwanted indices.
I would like to know if there is a more direct way to obtain the kind of results that I want.
Notes:
The rows of the boolean arrays that I use always have at least 1 True value.
I don't care which one of the multiple True values remains, I only care to have just 1.
IIUC and given the fact that there is at least one TRUE element per row, you can simply use np.argmax along the second axis to select the first TRUE element along each row, like so -
col_idx = bool_array.argmax(1)
Sample run -
In [246]: bool_array
Out[246]:
array([[ True, True, True, True, False],
[False, False, True, True, False],
[ True, True, False, False, True],
[ True, True, False, False, True]], dtype=bool)
In [247]: np.where(bool_array)[0]
Out[247]: array([0, 0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3])
In [248]: np.where(bool_array)[1]
Out[248]: array([0, 1, 2, 3, 2, 3, 0, 1, 4, 0, 1, 4])
In [249]: bool_array.argmax(1)
Out[249]: array([0, 2, 0, 0])
Explanation -
Corresponding to the duplicates from the output of np.where(bool_array)[0], i.e. :
array([0, 0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3])
, we need to select any one from the output of np.where(bool_array)[1], i.e. :
array([0, 1, 2, 3, 2, 3, 0, 1, 4, 0, 1, 4])
       ^           ^     ^        ^
Thus, selecting the first True from each row with bool_array.argmax(1) gives us :
array([0, 2, 0, 0])
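One caveat worth checking in practice: argmax also returns 0 for a row that contains no True at all, so the result is only meaningful under the "at least one True per row" condition stated in the question. The sketch below pairs each row index with its selected column and verifies that condition first; the names rows, cols and has_true are illustrative:

```python
import numpy as np

bool_array = np.array([[True, True, True, True, False],
                       [False, False, True, True, False],
                       [True, True, False, False, True],
                       [True, True, False, False, True]])

# Guard: argmax would silently return column 0 for an all-False row.
has_true = bool_array.any(axis=1)
assert has_true.all(), "some rows contain no True value"

rows = np.arange(bool_array.shape[0])
cols = bool_array.argmax(axis=1)   # first True in each row

# Every (row, col) pair points at a True element, one pair per row.
picked = bool_array[rows, cols]
print(rows, cols, picked)
```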
You could call np.unique on the resultant array like so:
>>> np.where(bool_array)[0][30:35]
Out[4]: array([0, 1, 1, 2, 3, 4])
>>> np.unique(np.where(bool_array)[0][30:35])
Out[5]: array([0, 1, 2, 3, 4])
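np.unique on the row indices alone drops the column information. If one column per row is also wanted, np.unique accepts return_index=True, which gives the position of the first occurrence of each row index and can be used to pick the matching column. A minimal sketch on a small made-up array (the array and the names first and one_col_per_row are illustrative, not the asker's data):

```python
import numpy as np

bool_array = np.array([[True, True, True, True, False],
                       [False, False, True, True, False],
                       [True, True, False, False, True],
                       [True, True, False, False, True]])

# np.nonzero lists the True positions in row-major order.
rows, cols = np.nonzero(bool_array)

# first[k] is the position in `rows` where row index unique_rows[k] first appears.
unique_rows, first = np.unique(rows, return_index=True)
one_col_per_row = cols[first]

print(unique_rows, one_col_per_row)
```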
I'm trying to return a flattened numpy array of a numpy matrix in which all the values where row == col are ignored.
For example:
>>> m = numpy.matrix([[1,2,3],[4,5,6],[7,8,9]])
>>> m
matrix([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
# Some function....
# result:
m_flat = array([2,3,4,6,7,8])
You could use np.eye to create the appropriate boolean mask:
In [139]: np.eye(m.shape[0], dtype='bool')
Out[139]:
array([[ True, False, False],
[False, True, False],
[False, False, True]], dtype=bool)
In [140]: m[~np.eye(m.shape[0], dtype='bool')]
Out[140]: matrix([[2, 3, 4, 6, 7, 8]])
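Indexing a numpy.matrix with a boolean mask keeps the result two-dimensional, as Out[140] shows. To get the flat 1-D array the question asks for, one option is to convert to a plain ndarray with np.asarray before masking. A minimal sketch, using an ndarray input for simplicity (np.asarray is a no-op there and converts a matrix to an ndarray); the name off_diagonal is illustrative:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# True everywhere except on the main diagonal (row == col).
off_diagonal = ~np.eye(m.shape[0], dtype=bool)

# np.asarray makes sure we index a plain ndarray, so the result is 1-D.
m_flat = np.asarray(m)[off_diagonal]
print(m_flat)
```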