Find location of pair of elements in two arrays in numpy - python

I have two numpy arrays x and y
Suppose x = [0, 1, 1, 1, 3, 4, 5, 5, 5] and y = [0, 2, 3, 4, 2, 1, 3, 4, 5]
The length of both arrays is the same and the coordinate pair I am looking for definitely exists in the array.
How can I find the index of (a, b) in these arrays, where a is an element of x and b is the corresponding element of y? For example, the index of (1, 4) would be 3: the elements at index 3 of x and y are 1 and 4 respectively.

You could use numpy.where combined with numpy.logical_and if you want a purely numpy solution:
In [16]: import numpy as np
In [17]: x = np.array([0, 1, 1, 1, 3, 4, 5, 5, 5])
In [18]: y = np.array([0, 2, 3, 4, 2, 1, 3, 4, 5])
In [19]: np.where(np.logical_and(x == 1, y == 4))[0]
Out[19]: array([3], dtype=int64)
numpy.logical_and performs an element-wise logical AND between two numpy arrays. Here it marks the locations where x is 1 and y is 4 at the same index; those locations are True. numpy.where then returns the locations where the condition is satisfied. It returns a tuple with one index array per dimension; since we are only concerned with one dimension, the tuple has a single element, which is why we immediately index it with [0].
The output is a numpy array of locations where the condition holds. You can go further and coerce the output to a list of indices if that is neater or required (thanks @EddoHintoso):
In [20]: list(np.where(np.logical_and(x == 1, y == 4))[0])
Out[20]: [3]
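To make the shape of the numpy.where result concrete, here is a minimal sketch that reuses x and y from above. It also shows np.flatnonzero, a NumPy function that returns the index array directly without the tuple wrapper; the variable names are only illustrative.

```python
import numpy as np

x = np.array([0, 1, 1, 1, 3, 4, 5, 5, 5])
y = np.array([0, 2, 3, 4, 2, 1, 3, 4, 5])

# True only where both conditions hold at the same index
mask = np.logical_and(x == 1, y == 4)

result = np.where(mask)   # a tuple with one index array per dimension
indices = result[0]       # the index array for the only dimension

# np.flatnonzero returns the same indices without the tuple wrapper
flat = np.flatnonzero(mask)

print(len(result), indices.tolist(), flat.tolist())
```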

You could compare the first array with the first value and the second array with the second value, then find where both comparisons are True. argmax on that boolean array gives the index of the first True occurrence (note that it also returns 0 when there is no match, so this relies on the pair being present):
x = np.array([0, 1, 1, 1, 3, 4, 5, 5, 5])
y = np.array([0, 2, 3, 4, 2, 1, 3, 4, 5])
idx = ((x == 1) & (y == 4)).argmax()
In [35]: idx
Out[35]: 3
In [36]: x == 1
Out[36]: array([False, True, True, True, False, False, False, False, False], dtype=bool)
In [37]: y == 4
Out[37]: array([False, False, False, True, False, False, False, True, False], dtype=bool)
If there could be multiple occurrences, you could use nonzero instead:
idx_list = ((x == 1) & (y == 4))
idx = idx_list.nonzero()[0]
In [51]: idx
Out[51]: array([3], dtype=int64)
Or, if you need a list of indices:
In [57]: idx_list.nonzero()[0].tolist()
Out[57]: [3]
You could do that in one line with:
idx = ((x == 1) & (y == 4)).nonzero()[0]
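One caveat with argmax is that it returns 0 for an all-False mask, which is indistinguishable from a match at index 0. A small sketch of a guarded lookup follows; find_pair_index is a hypothetical helper name, and the sketch assumes the arrays are non-empty and of equal length.

```python
import numpy as np

def find_pair_index(x, y, a, b):
    """Return the first index i with x[i] == a and y[i] == b, or -1 if none.

    Hypothetical helper for illustration; assumes non-empty arrays of equal length.
    """
    mask = (np.asarray(x) == a) & (np.asarray(y) == b)
    i = int(mask.argmax())       # argmax gives 0 when mask is all False
    return i if mask[i] else -1  # so confirm that position i really matches

x = np.array([0, 1, 1, 1, 3, 4, 5, 5, 5])
y = np.array([0, 2, 3, 4, 2, 1, 3, 4, 5])
print(find_pair_index(x, y, 1, 4))  # pair present
print(find_pair_index(x, y, 9, 9))  # pair absent
```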

x = [0, 1, 1, 1, 3, 4, 5, 5, 5]
y = [0, 2, 3, 4, 2, 1, 3, 4, 5]
# In Python 3, zip returns an iterator, so build a list before calling index
w = list(zip(x, y))
w.index((1, 4))

Related

How to get the indices of at least two consecutive values that are all greater than a threshold?

For example, let's consider the following numpy array:
[1, 5, 0, 5, 4, 6, 1, -1, 5, 10]
Also, let's suppose that the threshold is equal to 3.
That is to say that we are looking for sequences of at least two consecutive values that are all above the threshold.
The output would be the indices of those values, which in our case is:
[[3, 4, 5], [8, 9]]
If the output array was flattened that would work as well!
[3, 4, 5, 8, 9]
Output Explanation
In our initial array we can see that for index = 1 we have the value 5, which is greater than the threshold, but is not part of a sequence (of at least two values) where every value is greater than the threshold. That's why this index would not make it to our output.
On the other hand, for indices [3, 4, 5] we have a sequence of (at least two) neighboring values [5, 4, 6], each of which is above the threshold, and that's the reason their indices are included in the final output!
My Code so far
I have approached the issue with something like this:
(arr > 3).nonzero()
The above command gathers the indices of all the items that are above the threshold. However, I cannot determine whether they are consecutive or not. I have thought of trying a diff on the outcome of the above snippet and then maybe locating the ones (that is, the places where the indices follow one another), which would give us:
np.diff((arr > 3).nonzero())
But I'd still be missing something here.
If you convolve a boolean array with a window of ones of size win_size ([1] * win_size), you obtain an array whose value is win_size exactly where the condition held for win_size consecutive items:
import numpy as np
def groups(arr, *, threshold, win_size, merge_contiguous=False, flat=False):
    conv = np.convolve((arr >= threshold).astype(int), [1] * win_size, mode="valid")
    indexes_start = np.where(conv == win_size)[0]
    indexes = [np.arange(index, index + win_size) for index in indexes_start]
    if flat or merge_contiguous:
        indexes = np.unique(indexes)
        if merge_contiguous:
            indexes = np.split(indexes, np.where(np.diff(indexes) != 1)[0] + 1)
    return indexes
arr = np.array([1, 5, 0, 5, 4, 6, 1, -1, 5, 10])
threshold = 3
win_size = 2
print(groups(arr, threshold=threshold, win_size=win_size))
print(groups(arr, threshold=threshold, win_size=win_size, merge_contiguous=True))
print(groups(arr, threshold=threshold, win_size=win_size, flat=True))
[array([3, 4]), array([4, 5]), array([8, 9])]
[array([3, 4, 5]), array([8, 9])]
[3 4 5 8 9]
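To see why the convolution works, it may help to look at the intermediate arrays for the example data with win_size = 2. This is only an illustrative sketch. Note that groups compares with >= while the question asks for values strictly greater than the threshold; for this data both give the same mask.

```python
import numpy as np

arr = np.array([1, 5, 0, 5, 4, 6, 1, -1, 5, 10])

# 1 where the condition holds, 0 elsewhere
mask = (arr >= 3).astype(int)

# Each entry is the number of hits inside a window of two neighbours
conv = np.convolve(mask, [1, 1], mode="valid")

# A window is fully above the threshold when its count equals the window size
starts = np.where(conv == 2)[0]

print(mask.tolist())
print(conv.tolist())
print(starts.tolist())
```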
You can do what you want using simple numpy operations
import numpy as np
arr = np.array([1, 5, 0, 5, 4, 6, 1, -1, 5, 10])
arr_padded = np.concatenate(([0], arr, [0]))
a = np.where(arr_padded > 3, 1, 0)
da = np.diff(a)
idx_start = (da == 1).nonzero()[0]
idx_stop = (da == -1).nonzero()[0]
valid = (idx_stop - idx_start >= 2).nonzero()[0]
result = [list(range(idx_start[i], idx_stop[i])) for i in valid]
print(result)
Explanation
Array a is a padded binary version of the original array, with 1s where the original elements are greater than three. da contains 1s where "islands" of 1s begin in a, and -1s where the "islands" end in a. Due to the padding, there is guaranteed to be an equal number of 1s and -1s in da. Extracting their indices, we can calculate the length of the islands. Valid index pairs are those whose respective "islands" have length >= 2. Then, it's just a matter of generating all numbers between the index bounds of the valid "islands".
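The island idea can be wrapped in a small function, sketched below. runs_above and min_len are hypothetical names. One deliberate difference from the snippet above: this sketch pads the 0/1 mask rather than the original data, on the assumption that the threshold might be negative, in which case zero padding of the data would itself count as "above".

```python
import numpy as np

def runs_above(arr, threshold, min_len=2):
    """Return lists of consecutive indices whose values exceed threshold.

    Hypothetical helper for illustration.
    """
    above = (np.asarray(arr) > threshold).astype(int)
    padded = np.concatenate(([0], above, [0]))  # pad the mask, not the data
    da = np.diff(padded)
    starts = np.flatnonzero(da == 1)    # first index of each island
    stops = np.flatnonzero(da == -1)    # one past the last index of each island
    return [
        list(range(s, e))
        for s, e in zip(starts.tolist(), stops.tolist())
        if e - s >= min_len
    ]

print(runs_above([1, 5, 0, 5, 4, 6, 1, -1, 5, 10], 3))
```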
I followed your original idea; you were almost there.
I use a second difference, diff2, to pick up the index of the first value in each sequence. See the comments in the code for details.
import numpy as np
arr = np.array([ 1, 5, 0, 5, 4, 6, 1, -1, 5, 10])
threshold = 3
all_idx = (arr > threshold).nonzero()[0]
# array([1, 3, 4, 5, 8, 9])
result = np.empty(0)
if all_idx.size > 1:
    diff1 = np.zeros_like(all_idx)
    diff1[1:] = np.diff(all_idx)
    # array([0, 2, 1, 1, 3, 1])
    diff1[0] = diff1[1]
    # array([2, 2, 1, 1, 3, 1])
    # **Positions with a value 1 in diff1 should be kept.**
    # But we also want the position before each 1. Create another diff2
    diff2 = np.zeros_like(all_idx)
    diff2[:-1] = np.diff(diff1)
    # array([ 0, -1, 0, 2, -2, 0])
    # **Positions with a negative value in diff2 should be kept.**
    result = all_idx[(diff1==1) | (diff2<0)]
print(result)
# array([3, 4, 5, 8, 9])
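The negative-diff2 test works for the example, but it can also select an isolated index when the gaps merely shrink (for instance gaps of 3 then 2). A sketch of a variant that avoids this keeps an index only when it is adjacent to its predecessor or its successor. consecutive_above is a hypothetical name, and this is one possible formulation rather than the approach used above.

```python
import numpy as np

def consecutive_above(arr, threshold):
    """Indices of values above threshold that have an adjacent index also above it.

    Hypothetical helper for illustration.
    """
    idx = np.flatnonzero(np.asarray(arr) > threshold)
    if idx.size < 2:
        return idx[:0]                 # empty result, same dtype
    step = np.diff(idx) == 1           # step[k]: idx[k + 1] directly follows idx[k]
    keep = np.zeros(idx.size, dtype=bool)
    keep[:-1] |= step                  # index has a consecutive successor
    keep[1:] |= step                   # index has a consecutive predecessor
    return idx[keep]

print(consecutive_above([1, 5, 0, 5, 4, 6, 1, -1, 5, 10], 3))
print(consecutive_above([0, 5, 0, 0, 5, 0, 5, 5], 3))
```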
I'll try something different using window views. I'm not sure this works in every case, so counterexamples are welcome. It has the advantage of not requiring Python loops.
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view as window
def consec_thresh(arr, thresh):
    win = window(np.argwhere(arr > thresh), (2, 1))
    return np.unique(win[np.diff(win, axis=2).ravel() == 1, :,:].ravel())
How does it work?
So we start with the array and gather the indices where the threshold is met:
In [180]: np.argwhere(arr > 3)
Out[180]:
array([[1],
[3],
[4],
[5],
[8],
[9]])
Then we build a sliding window that makes up pair of values along the column (which is the reason for the (2, 1) shape of the window).
In [181]: window(np.argwhere(arr > 3), (2, 1))
Out[181]:
array([[[[1],
[3]]],
[[[3],
[4]]],
[[[4],
[5]]],
[[[5],
[8]]],
[[[8],
[9]]]])
Now we want to take the difference inside each pair, if it's one then the indices are consecutive.
In [182]: np.diff(window(np.argwhere(arr > 3), (2, 1)), axis=2)
Out[182]:
array([[[[2]]],
[[[1]]],
[[[1]]],
[[[3]]],
[[[1]]]])
We can plug those values back in the windows we created above,
In [185]: window(np.argwhere(arr > 3), (2, 1))[np.diff(window(np.argwhere(arr > 3), (2, 1)), axis=2).ravel() == 1, :, :]
Out[185]:
array([[[[3],
[4]]],
[[[4],
[5]]],
[[[8],
[9]]]])
Then we ravel (flatten without copying when possible) and call np.unique to get rid of the repeated indices created by windowing, which gives:
array([3, 4, 5, 8, 9])
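The same windowing idea can be sketched on a flat index array, which avoids the extra dimensions. consec_thresh_1d is a hypothetical name, and the sketch assumes a NumPy version that provides sliding_window_view. It also returns early when fewer than two values exceed the threshold, since a window of size 2 cannot be built from a shorter array.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def consec_thresh_1d(arr, thresh):
    """Flat variant of the window approach (hypothetical helper)."""
    idx = np.flatnonzero(np.asarray(arr) > thresh)
    if idx.size < 2:
        return idx[:0]                      # no pair of hits can exist
    pairs = sliding_window_view(idx, 2)     # shape (n - 1, 2): neighbouring hits
    consecutive = pairs[pairs[:, 1] - pairs[:, 0] == 1]
    return np.unique(consecutive)           # flattens and drops repeated indices

print(consec_thresh_1d(np.array([1, 5, 0, 5, 4, 6, 1, -1, 5, 10]), 3))
```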
The iterative code below solves this in O(n) time:
arr = [1, 5, 0, 5, 4, 6, 1, -1, 5, 10]
threshold = 3
sequence = 2
output = []
temp_arr = []
for i in range(len(arr)):
    if arr[i] > threshold:
        temp_arr.append(i)
    else:
        if len(temp_arr) >= sequence:
            output.append(temp_arr)
        temp_arr = []
# Flush the final run, applying the same minimum-length check
if len(temp_arr) >= sequence:
    output.append(temp_arr)
print(output)
# Output
# [[3, 4, 5], [8, 9]]
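The same single pass can be sketched with itertools.groupby from the standard library, which groups neighbouring items that share a key. runs_above_groupby and min_len are hypothetical names used for illustration.

```python
from itertools import groupby

def runs_above_groupby(values, threshold, min_len=2):
    """Group consecutive indices whose values exceed threshold (hypothetical helper)."""
    runs = []
    # groupby yields one group per stretch of items with the same key
    for above, group in groupby(enumerate(values), key=lambda p: p[1] > threshold):
        if above:
            indices = [i for i, _ in group]
            if len(indices) >= min_len:
                runs.append(indices)
    return runs

print(runs_above_groupby([1, 5, 0, 5, 4, 6, 1, -1, 5, 10], 3))
```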
I would suggest using a for loop with two indices: i starts at 0 and j at 1, both stepping forward by 1.
You can then check whether the values at both indices are greater than the threshold. If so,
add the indices to a list and keep moving j forward until the next value is no longer greater than the threshold.
values = [1, 5, 0, 5, 4, 6, 1, -1, 5, 10]
res=[]
threshold= 3
i=0
j=0
for _ in values:
    j=i+1
    lista=[]
    try:
        print(f"i: {i} j:{j}")
        # check if condition is met
        if(values[i] > threshold and values[j] > threshold):
            lista.append(i)
            # add sequence
            while values[j] > threshold:
                lista.append(j)
                print(f"j while: {j}")
                j+=1
                if(j>=len(values)):
                    break
            res.append(lista)
        i=j
        if(j>=len(values)):
            break
    except IndexError:
        # j ran past the end of the list
        print("ex")
This works, but needs refactoring.
Let's try the following code:
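# Simple is better than complex
# Complex is better than complicated
arr = [1, 5, 0, 5, 4, 6, 1, -1, 5, 10]
# Keep the index where the value exceeds the threshold, otherwise a separator
arr_3=[i if arr[i]>3 else 'a' for i in range(len(arr))]
arr_4=''.join(str(x) for x in arr_3)
# Split on the separator to get runs of indices (only valid for single-digit indices)
arr_5=arr_4.split('a')
i=0
while i<len(arr_5):
    if len(arr_5[i]) <=1:
        del arr_5[i]
    else:
        i+=1
arr_6=[list(map(lambda x: int(x), list(x))) for x in arr_5]
print(arr_6)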
Outputs:
[[3, 4, 5], [8, 9]]
Here is a solution that makes use of pandas Series:
import numpy as np
import pandas as pd
arr = np.array([1, 5, 0, 5, 4, 6, 1, -1, 5, 10])
thresh = 3
win_size = 2
s = pd.Series(arr)
# locating groups of values where there are at least (win_size) consecutive values above the threshold
groups = s.groupby(s.le(thresh).cumsum().loc[s.gt(thresh)]).transform('count').ge(win_size)
0 False
1 False
2 False
3 True
4 True
5 True
6 False
7 False
8 True
9 True
dtype: bool
We can now easily take their indices in a 1D array:
np.flatnonzero(groups)
# array([3, 4, 5, 8, 9], dtype=int64)
OR multiple lists:
[np.arange(index.start, index.stop) for index in np.ma.clump_unmasked(np.ma.masked_not_equal(groups.values, value=True))]
# [array([3, 4, 5], dtype=int64), array([8, 9], dtype=int64)]

Count occurrences of two NumPy arrays having given items at corresponding indices

I have two NumPy arrays as below:
import numpy as np
a = np.array([2, 1, 1, 2, 0, 2, 2, 2, 1, 1])
b = np.array([4, 3, 4, 4, 3, 3, 4, 3, 4, 3])
I want to count how many times an item 2 is encountered in the array a with the condition that the array b had items 4 at corresponding indices:
a = np.array([2, 1, 1, 2, 0, 2, 2, 2, 1, 1])
b = np.array([4, 3, 4, 4, 3, 3, 4, 3, 4, 3])
↑ ↑ ↑
As you can see there are 3 such cases. How do I calculate that?
You can achieve it as follows:
import numpy as np
a = np.array([2, 1, 1, 2, 0, 2, 2, 2, 1, 1])
b = np.array([4, 3, 4, 4, 3, 3, 4, 3, 4, 3])
result = ((a == 2) & (b == 4)).sum()
print(result)
# 3
a == 2 and b == 4 will make boolean arrays with True values when the items equal 2 and 4 respectively:
>>> a == 2
array([ True, False, False, True, False, True, True, True, False, False])
>>> b == 4
array([ True, False, True, True, False, False, True, False, True, False])
By using the logical and operator & in (a == 2) & (b == 4) we will get a boolean array with True for those positions where both items are True:
>>> (a == 2) & (b == 4)
array([ True, False, False, True, False, False, True, False, False, False])
and to count the total number of True values we can just use the sum method.
References:
Indexing and slicing
Boolean or “mask” index arrays
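As a small variation, np.count_nonzero counts the True entries of the mask directly, and np.flatnonzero shows which positions were counted. A minimal sketch with the arrays from the question:

```python
import numpy as np

a = np.array([2, 1, 1, 2, 0, 2, 2, 2, 1, 1])
b = np.array([4, 3, 4, 4, 3, 3, 4, 3, 4, 3])

mask = (a == 2) & (b == 4)

count_sum = int(mask.sum())              # True counts as 1 when summed
count_nz = int(np.count_nonzero(mask))   # counts True entries directly
positions = np.flatnonzero(mask)         # where the matches occur

print(count_sum, count_nz, positions.tolist())
```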

Locate asymmetries in a matrix

I have generated a matrix of pairwise distances between list items, but something went wrong and it is not symmetric.
Suppose the matrix looks like this:
array = np.array([
[0, 3, 4],
[3, 0, 2],
[1, 2, 0]
])
How can I locate the actual asymmetries? In this case, the indices of 4 and 1.
I confirmed the asymmetry by trying to condense the matrix with scipy's squareform function, and then by using:
def check_symmetric(a, rtol=1e-05, atol=1e-08):
    return np.allclose(a, a.T, rtol=rtol, atol=atol)
Quite late, but here is an alternative, the numpy way:
import numpy as np
m = np.array([[0, 3, 4 ],
[ 3, 0, 2 ],
[ 1, 2, 0 ]])
def check_symmetric(a):
    boolmatrix = np.isclose(a, a.T) # play around with your tolerances here...
    output = np.argwhere(~boolmatrix)
    return output
output:
check_symmetric(m)
>>> array([[0, 2],
[2, 0]])
You can simply use the negation of np.isclose():
mask = ~np.isclose(array, array.T)
mask
# array([[False, False, True],
# [False, False, False],
# [ True, False, False]])
Use that value as an index to get the values:
array[mask]
# array([4, 1])
And use np.where() if you want the indices instead:
np.where(mask)
# (array([0, 2]), array([2, 0]))
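Because the mask itself is symmetric, every mismatch shows up twice. A sketch that reports each offending pair once restricts the mask to the part above the diagonal with np.triu and prints both values; the variable names are illustrative.

```python
import numpy as np

array = np.array([
    [0, 3, 4],
    [3, 0, 2],
    [1, 2, 0],
])

mask = ~np.isclose(array, array.T)

# Keep only entries above the main diagonal so each pair appears once
rows, cols = np.where(np.triu(mask, k=1))
pairs = list(zip(rows.tolist(), cols.tolist()))

for r, c in pairs:
    # Show the position together with the two values that disagree
    print((r, c), array[r, c], array[c, r])
```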
The following is quick to write and slow to run, but if the aim is to debug it will probably do.
a # nearly symmetric array.
Out:
array([[8, 1, 6, 5, 3],
[1, 9, 4, 4, 4],
[6, 4, 3, 7, 1],
[5, 4, 7, 5, 2],
[3, 4, 1, 3, 7]])
Define function to find and print the differences.
ERROR_LIMIT = 0.00001
def find_asymmetries( a ):
    """ Prints the row and column indices with the difference
    where abs(a[r,c] - a[c,r]) > ERROR_LIMIT """
    res = a-a.T
    for r, row in enumerate(res):
        for c, cell in enumerate(row):
            if abs(cell) > ERROR_LIMIT : print( r, c, cell )
find_asymmetries( a )
3 4 -1
4 3 1
This version halves the volume of results.
def find_asymmetries( a ):
    res = a-a.T
    for r, row in enumerate(res):
        for c, cell in enumerate(row):
            if c == r: break # Stop column search once c == r
            if abs(cell) > ERROR_LIMIT : print( r, c, cell )
find_asymmetries( a )
4 3 1 # Row number always greater than column number

Explanation for the code that returns indices of unique values in 1D NumPy array

I found this snippet of code online and am having difficulty in understanding what each part of it is doing as I'm not proficient in Python.
The following routine takes an array as input and returns a dictionary that maps each unique value to its indices
def partition(array):
    return {i: (array == i).nonzero()[0] for i in np.unique(array)}
Trace each part out, this should speak for itself. Comments inlined.
In [304]: array = np.array([1, 1, 2, 3, 2, 1, 2, 3])
In [305]: np.unique(array) # unique values in `array`
Out[305]: array([1, 2, 3])
In [306]: array == 1 # retrieve a boolean mask where elements are equal to 1
Out[306]: array([ True, True, False, False, False, True, False, False])
In [307]: (array == 1).nonzero()[0] # get the `True` indices for the operation above
Out[307]: array([0, 1, 5])
In summary; the code is creating a mapping of <unique_value : all indices of unique_value in array> -
In [308]: {i: (array == i).nonzero()[0] for i in np.unique(array)}
Out[308]: {1: array([0, 1, 5]), 2: array([2, 4, 6]), 3: array([3, 7])}
And here's the slightly more readable version -
In [313]: mapping = {}
...: for i in np.unique(array):
...: mapping[i] = np.where(array == i)[0]
...:
In [314]: mapping
Out[314]: {1: array([0, 1, 5]), 2: array([2, 4, 6]), 3: array([3, 7])}
Also consider the following pandas solution:
import pandas as pd
In [165]: s = pd.Series(array)
In [166]: d = s.groupby(s).groups
In [167]: d
Out[167]:
{1: Int64Index([0, 1, 5], dtype='int64'),
2: Int64Index([2, 4, 6], dtype='int64'),
3: Int64Index([3, 7], dtype='int64')}
PS: pandas.Int64Index supports all the methods and indexing of a regular 1D numpy array (pandas 2.0 and later show these as plain Index objects with dtype int64),
and it can easily be converted to a NumPy array:
In [168]: {k:v.values for k,v in s.groupby(s).groups.items()}
Out[168]:
{1: array([0, 1, 5], dtype=int64),
2: array([2, 4, 6], dtype=int64),
3: array([3, 7], dtype=int64)}
array == i: returns a boolean array that is True wherever the value equals i and False otherwise.
nonzero(): returns a tuple of index arrays, one per dimension, for the elements that are non-zero (not False). https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.nonzero.html
nonzero()[0]: selects the first (and, for a 1-D array, only) entry of that tuple, i.e. the array of all indices where array[index] == i.
for i in np.unique(array): iterates over the unique values of array; in other words, the logic is applied once for each unique value.
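The comprehension scans the whole array once per unique value. As a sketch of an alternative (a different technique, not the snippet being explained), the same mapping can be built from a single stable sort; partition_sorted is a hypothetical name.

```python
import numpy as np

def partition_sorted(array):
    """Map each unique value to its indices using one stable sort (hypothetical helper)."""
    array = np.asarray(array)
    order = np.argsort(array, kind="stable")   # indices grouped by value, original order kept
    values, starts = np.unique(array[order], return_index=True)
    groups = np.split(order, starts[1:])       # cut where each new value begins
    return {v: g for v, g in zip(values.tolist(), groups)}

mapping = partition_sorted(np.array([1, 1, 2, 3, 2, 1, 2, 3]))
print({k: v.tolist() for k, v in mapping.items()})
```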

Basics of numpy where function, what does it do to the array?

I have seen the post Difference between nonzero(a), where(a) and argwhere(a). When to use which? and I don't really understand the use of the where function from numpy module.
For example I have this code
import numpy as np
Z =np.array(
[[1,0,1,1,0,0],
[0,0,0,1,0,0],
[0,1,0,1,0,0],
[0,0,1,1,0,0],
[0,1,0,0,0,0],
[0,0,0,0,0,0]])
print(Z)
print(np.where(Z))
Which gives:
(array([0, 0, 0, 1, 2, 2, 3, 3, 4], dtype=int64),
array([0, 2, 3, 3, 1, 3, 2, 3, 1], dtype=int64))
The definition of where function is:
Return elements, either from x or y, depending on condition. But that doesn't make sense to me either.
So what does the output exactly mean?
np.where returns the indices where a given condition is met. In your case, you're asking for the indices where the value in Z is not 0 (i.e. any non-zero value is treated as True). For Z this results in:
(0, 0) # top left
(0, 2) # third element in the first row
(0, 3) # fourth element in the first row
(1, 3) # fourth element in the second row
... # and so on
np.where starts to make sense in the following scenarios:
a = np.arange(10)
np.where(a > 5) # give me all indices where the value of a is bigger than 5
# a > 5 is a boolean mask like [False, False, ..., True, True, True]
# (array([6, 7, 8, 9], dtype=int64),)
Hope that helps.
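To connect the two index arrays with the coordinates listed above, and with the "elements from x or y" wording in the documentation, here is a minimal sketch. Pairing the row and column arrays gives (row, column) coordinates, and the three-argument form of np.where picks values instead of returning indices. The variable names are illustrative.

```python
import numpy as np

Z = np.array(
    [[1, 0, 1, 1, 0, 0],
     [0, 0, 0, 1, 0, 0],
     [0, 1, 0, 1, 0, 0],
     [0, 0, 1, 1, 0, 0],
     [0, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0]])

# One-argument form: one index array per dimension
rows, cols = np.where(Z)
coords = list(zip(rows.tolist(), cols.tolist()))
print(coords[:4])

# Three-argument form: take from the second argument where the condition
# is True, otherwise from the third
a = np.arange(10)
picked = np.where(a > 5, a, -1)
print(picked.tolist())
```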
