I am trying to get both the elements and the indices from two arrays where the elements match. I think I am overthinking this, but I have tried the where function and intersection and cannot get it to work. My actual array is much longer, but here are two simple arrays to demonstrate what I want:
import numpy as np
arr1 = np.array([0.00, 0.016, 0.033, 0.050, 0.067])
arr2 = np.array([0.016, 0.033, 0.050, 0.067, 0.083])
ind = np.intersect1d(np.where(arr1 >= 0.01), np.where(arr2 >= 0.01))
Printing ind shows array([1, 2, 3, 4]). Technically, I want the elements 1, 2, 3, 4 from arr1 and elements 0, 1, 2, 3 from arr2, which gives the elements 0.016, 0.033, 0.050, 0.067, which match in both arrays.
np.where converts a boolean mask like arr1 >= 0.01 into an index. You can select with the mask directly, but then you cannot map positions in the selection back to positions in the original array. You need that mapping because you want indices into the original arrays, not into the selections. Make sure to set return_indices=True to get indices from intersect1d:
index1 = np.nonzero(arr1 >= 0.01)[0]  # nonzero returns a tuple; take the 1-D index array
index2 = np.nonzero(arr2 >= 0.01)[0]
selection1 = arr1[index1]
selection2 = arr2[index2]
elements, ind1, ind2 = np.intersect1d(selection1, selection2, return_indices=True)
index1 = index1[ind1]
index2 = index2[ind2]
While you get elements directly from the intersection, the indices ind1 and ind2 are referencing the masked selections. Since index1 is the original index of each element in selection1, index1[ind1] converts ind1 back into the arr1 reference frame.
Your original expression was actually meaningless. You were intersecting the indices in each array that met your condition. That has nothing to do with the values at those indices (which wouldn't have to match at all). The seemingly correct result is purely a coincidence based on a fortuitous array construction.
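A self-contained sketch of the approach described above, run on the question's arrays. It uses np.flatnonzero, which for 1-D input should be equivalent to np.nonzero(...)[0], so that the index arrays can themselves be indexed. The names orig1 and orig2 are illustrative, not from the original answer. Note that intersect1d compares floats exactly; that works here only because both arrays were built from identical literals.

```python
import numpy as np

arr1 = np.array([0.00, 0.016, 0.033, 0.050, 0.067])
arr2 = np.array([0.016, 0.033, 0.050, 0.067, 0.083])

# Positions (in the original arrays) of the elements that pass the condition.
index1 = np.flatnonzero(arr1 >= 0.01)
index2 = np.flatnonzero(arr2 >= 0.01)

# Intersect the selected values; ind1/ind2 index into the selections.
elements, ind1, ind2 = np.intersect1d(
    arr1[index1], arr2[index2], return_indices=True
)

# Map the selection indices back to positions in arr1 and arr2.
orig1 = index1[ind1]
orig2 = index2[ind2]

print(elements)  # matching values
print(orig1)     # their positions in arr1
print(orig2)     # their positions in arr2
```

With these inputs, orig1 should come out as positions 1 to 4 of arr1 and orig2 as positions 0 to 3 of arr2, which is what the question asked for.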
I have a numpy array arr containing 0s and 1s,
arr = np.random.randint(2, size=(800,800))
Then I cast it with astype(np.float32) and inserted various float numbers at various positions. In fact, what I would like to do is insert those float numbers only where the original array had 1 rather than 0; where the original array had 0 I want to keep 0.
My thought was to take a copy of the array (with .copy()) and reinsert from that later. So now I have arr above (1s and 0s), and a same-shaped array arr2 with numerical elements. I want to replace the elements in arr2 with those in arr only where (and everywhere where) the element in arr is 0. How can I do this?
Small example:
arr = np.array([[1, 0],
                [0, 1]])
arr2 = np.array([[2.43, 5.25],
                 [1.54, 2.59]])
Desired output:
arr2 = np.array([[2.43, 0],
                 [0, 2.59]])
N.B. should be as fast as possible on arrays of around 800x800
Simply do:
arr2[arr == 0] = 0
or
arr2 = arr2 * arr
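The two one-liners above agree for ordinary finite values, but they can differ if arr2 ever holds NaN or inf, because multiplying those by 0 yields NaN rather than 0. The question does not say such values occur, so the data below is a made-up illustration of that edge case only.

```python
import numpy as np

arr = np.array([[1, 0], [0, 1]])
# Hypothetical data: non-finite values placed where arr is 0.
arr2 = np.array([[2.43, np.inf], [np.nan, 2.59]])

# Mask assignment overwrites the entries outright.
masked = arr2.copy()
masked[arr == 0] = 0

# Multiplication computes inf * 0 and nan * 0, both of which are nan.
with np.errstate(invalid="ignore"):
    product = arr2 * arr

print(masked)
print(product)
```

If non-finite values are possible, the mask assignment is the safer of the two.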
@swag2198 is correct; an alternative is below.
NumPy has a function called where that lets you set values based on a condition from another array. This is essentially masking.
The code below returns an array with the same dimensions as arr2, except that wherever arr contains a zero, the result contains zero:
arr = np.array([[1,0],[0,1]])
arr2 = np.array([[2.43, 5.25],
[1.54, 2.59]])
arr_out = np.where(arr, arr2, 0)
The advantage of this approach is that you can pick values from two arrays if you wish. Say you wanted to mask an image, for instance, and replace its background: pass the replacement array as the third argument instead of 0.
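As a rough sketch of the two-array case mentioned above: np.where(condition, x, y) takes from x where the condition is true and from y elsewhere. The names foreground and background and their values are invented for illustration.

```python
import numpy as np

mask = np.array([[1, 0], [0, 1]], dtype=bool)
foreground = np.array([[2.43, 5.25], [1.54, 2.59]])
# Hypothetical replacement values used wherever mask is False.
background = np.array([[10.0, 20.0], [30.0, 40.0]])

# Element-wise choice: foreground where mask is True, background otherwise.
combined = np.where(mask, foreground, background)
print(combined)
```

Unlike in-place mask assignment, this leaves both input arrays untouched and returns a new array.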
I want to solve something like the problem detailed at Find index mapping between two numpy arrays, but where the two arrays do not necessarily contain the same set of values, although their values are unique within each array, and are sorted.
E.g. if I have two arrays:
a = np.array([1.1, 2.2, 3.3, 4.4, 5.5])
b = np.array([2.2, 3.0, 4.4, 6.0])
I want to get an array of the same length as a which gives the index into b where the matching element is, or -1 if there is no match. I.e. in this case:
map = np.array([-1, 0, -1, 2, -1])
Is there a neat, fast way to do this using np.searchsorted?
Use the searchsorted indices to do a check on matches and then mask the invalid ones with the invalid-specifier. For the matching check, do b[idx]==a with idx as those indices. Hence -
invalid_specifier = -1
idx = np.searchsorted(b,a)
idx[idx==len(b)] = 0
out = np.where(b[idx]==a, idx, invalid_specifier)
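The same steps can be wrapped in a small function and run on the question's arrays. The function name index_map is hypothetical, and the extra value 7.0 appended to a is my addition to exercise the case where searchsorted returns len(b). The sketch assumes b is sorted, non-empty, and has unique values, as the question states.

```python
import numpy as np

def index_map(a, b, invalid=-1):
    """For each element of a, return its index in sorted b, or `invalid`."""
    a = np.asarray(a)
    b = np.asarray(b)
    idx = np.searchsorted(b, a)   # insertion points, in the range 0..len(b)
    idx[idx == len(b)] = 0        # clamp out-of-range points to a valid index
    # Keep an index only where b really holds the value being looked up.
    return np.where(b[idx] == a, idx, invalid)

a = np.array([1.1, 2.2, 3.3, 4.4, 5.5, 7.0])
b = np.array([2.2, 3.0, 4.4, 6.0])
result = index_map(a, b)
print(result)
```

Clamping to 0 is safe because the equality check afterwards rejects the clamped positions: a value larger than everything in b cannot equal b[0].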
If given an array like
a = array([[2,4,9,8,473],[54,7,24,19,20]])
then how can I get the indices of the elements whose values lie between x and y?
Currently I've got:
where(5 > a > 10)
It will, however, give an output if I write, for example:
where(a > 5)
But the where function doesn't accept the chained comparison, and once it does, it should output two one-dimensional arrays. Is there a way to easily stack them?
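You can use the element-wise operators & (and) and | (or) to chain conditions together. Wrap each comparison in parentheses, because & and | bind more tightly than the comparison operators. For your case, you can do: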
np.where((a > 5) & (a < 10))
# (array([0, 0, 1]), array([2, 3, 1]))
# here np.where gives a tuple, the first element of which gives the row index, while the
# second element gives the corresponding column index
If you want the indices to be an array where each row represents an element, you can stack them:
np.stack(np.where((a > 5) & (a < 10)), axis=-1)
# array([[0, 2],
# [0, 3],
# [1, 1]])
Or, as @Divakar commented, use np.argwhere((a > 5) & (a < 10)).
You have two indices to specify: one for which inner array you are referencing, and the other for which member of that array you are referring to.
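A short runnable comparison of the options above on the question's array. The variable names are illustrative; the boolean mask is also used directly to pull out the matching values.

```python
import numpy as np

a = np.array([[2, 4, 9, 8, 473], [54, 7, 24, 19, 20]])
mask = (a > 5) & (a < 10)

# np.where gives one index array per axis.
rows, cols = np.where(mask)

# Two ways to get one (row, col) pair per matching element.
pairs_stacked = np.stack((rows, cols), axis=-1)
pairs_argwhere = np.argwhere(mask)

# The mask itself selects the matching values.
values = a[mask]

print(pairs_argwhere)
print(values)
```

The tuple from np.where is the form to use for indexing back into a, while the argwhere form is easier to loop over.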
I have the following array:
arr = numpy.array([[.5, .5], [.9, .1], [.8, .2]])
I would like to get the indices of the rows of arr whose max value is greater than or equal to .9. So, for this case, the result would be [1], because the row with index 1, [.9, .1], is the only one whose max value is >= .9.
I tried:
>>> condition = np.max(arr) >= .9
>>> arr[condition]
array([ 0.5, 0.5])
But, as you see, it yields the wrong answer.
I think you want np.where here. This function returns the indices of any values which meet a particular condition:
>>> np.where(arr >= 0.9)[0] # here we look at the whole 2D array
array([1])
(np.where(arr >= 0.9) returns a tuple of arrays of indices, one for each axis of the array. Your expected output implies that you only want the row indices (axis 0).)
If you want to take the maximum of each row first, you can use arr.max(axis=1):
>>> np.where(arr.max(axis=1) >= 0.9)[0] # here we look at the 1D array of row maximums
array([1])
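The two variants above give the same result on the question's data, but they can differ when a row has more than one element at or above the threshold. The sketch below adds a hypothetical fourth row, [0.95, 0.92], to show this; np.flatnonzero is used as a shorthand for np.where(...)[0] on a 1-D condition.

```python
import numpy as np

# The question's rows plus one made-up row with two values >= 0.9.
arr = np.array([[0.5, 0.5], [0.9, 0.1], [0.8, 0.2], [0.95, 0.92]])

# Element-wise test: a row index appears once per matching element.
elementwise_rows = np.where(arr >= 0.9)[0]

# Row-maximum test: each qualifying row appears exactly once.
row_max_rows = np.flatnonzero(arr.max(axis=1) >= 0.9)

print(elementwise_rows)
print(row_max_rows)
```

If each row index should appear at most once, reducing with max(axis=1) first is the more direct choice.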
In [18]: arr = numpy.array([[.5, .5], [.9, .1], [.8, .2]])
In [19]: numpy.argwhere(numpy.max(arr, 1) >= 0.9)
Out[19]: array([[1]])
You are getting the wrong answer because np.max(arr) gives you the max of the flattened array, a single scalar. You want np.max(arr, axis=1) or, better yet, arr.max(axis=1).
(arr.max(axis=1)>=.9).nonzero()
Use max along an axis to get the row max values, and then where to get the indexes of the biggest:
np.where(arr.max(axis=1)>=0.9)
Suppose I have defined a 3x3x3 numpy array with
x = numpy.arange(27).reshape((3, 3, 3))
Now, I can get an array containing the (0,1) element of each 3x3 subarray with x[:, 0, 1], which returns array([ 1, 10, 19]). What if the index (m,n) is stored in a tuple and I want to retrieve the (m,n) element of each subarray?
For example, suppose that I have t = (0, 1). I tried x[:, t], but it doesn't have the right behaviour - it returns rows 0 and 1 of each subarray. The simplest solution I have found is
x.transpose()[tuple(reversed(t))].transpose()
but I am sure there must be a better way. Of course, in this case, I could do x[:, t[0], t[1]], but that can't be generalised to the case where I don't know how many dimensions x and t have.
You can create the index tuple first:
index = (numpy.s_[:],)+t
x[index]
HYRY's solution is correct, but I have always found numpy's r_, c_ and s_ index tricks to be a bit strange looking. So here is the equivalent thing using a slice object:
x[(slice(None),) + t]
That single argument to slice is the stop position (i.e. None meaning all in the same way that x[:] is equivalent to x[None:None])
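A brief sketch showing that the tuple-concatenation approach generalises to other dimensionalities, which was the point of the question. The 4-D array y and the tuple t3 are invented for illustration.

```python
import numpy as np

x = np.arange(27).reshape((3, 3, 3))
t = (0, 1)

# slice(None) plays the role of ":" on the first axis.
index = (slice(None),) + t
picked = x[index]

# np.s_[:] builds the same slice object.
same_slice = np.s_[:] == slice(None)

# Hypothetical 4-D case: the same expression works with a longer tuple.
y = np.arange(81).reshape((3, 3, 3, 3))
t3 = (0, 1, 2)
picked_4d = y[(slice(None),) + t3]

print(picked)
print(picked_4d)
```

Because the index is an ordinary tuple, it can be built at runtime from a t of any length up to x.ndim - 1.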