Python: intersection indices numpy array

How can I get the indices of intersection points between two numpy arrays? I can get intersecting values with intersect1d:
import numpy as np
a = np.arange(11)
b = np.array([2, 7, 10])
inter = np.intersect1d(a, b)
# inter == array([ 2, 7, 10])
But how can I get the indices into a of the values in inter?

You could use the boolean array produced by in1d to index an arange. Reversing a so that the indices are different from the values:
>>> a[::-1]
array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
>>> a = a[::-1]
intersect1d still returns the same values...
>>> numpy.intersect1d(a, b)
array([ 2, 7, 10])
But in1d returns a boolean array:
>>> numpy.in1d(a, b)
array([ True, False, False, True, False, False, False, False, True,
False, False], dtype=bool)
Which can be used to index a range:
>>> numpy.arange(a.shape[0])[numpy.in1d(a, b)]
array([0, 3, 8])
>>> indices = numpy.arange(a.shape[0])[numpy.in1d(a, b)]
>>> a[indices]
array([10, 7, 2])
To simplify the above, you could use nonzero. This is probably the most general approach, because it returns a tuple of index arrays, one per dimension, so it also works for multidimensional arrays:
>>> numpy.nonzero(numpy.in1d(a, b))
(array([0, 3, 8]),)
Or, equivalently:
>>> numpy.in1d(a, b).nonzero()
(array([0, 3, 8]),)
The result can be used as an index to arrays of the same shape as a with no problems.
>>> a[numpy.nonzero(numpy.in1d(a, b))]
array([10, 7, 2])
But note that under many circumstances, it makes sense just to use the boolean array itself, rather than converting it into a set of non-boolean indices.
Finally, you can also pass the boolean array to argwhere, which produces a slightly differently-shaped result that's not as suitable for indexing, but might be useful for other purposes.
>>> numpy.argwhere(numpy.in1d(a, b))
array([[0],
[3],
[8]])
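The same idea can be sketched with np.isin and np.flatnonzero. This is only an illustration under the assumption that a reasonably recent NumPy is available (np.isin was added in 1.13, and np.in1d is deprecated in NumPy 2.0); the variable names mask and indices are chosen here for clarity.

```python
import numpy as np

a = np.arange(11)[::-1]          # reversed so indices differ from values
b = np.array([2, 7, 10])

mask = np.isin(a, b)             # boolean mask with the same shape as a
indices = np.flatnonzero(mask)   # 1-D integer positions where mask is True

print(indices)     # [0 3 8]
print(a[indices])  # [10  7  2]
```

np.flatnonzero returns a plain 1-D index array rather than a tuple, which is convenient when a is known to be one-dimensional.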

If you need to get unique values as given by intersect1d:
import numpy as np
a = np.array([range(11,21), range(11,21)]).reshape(20)
b = np.array([12, 17, 20])
print(np.intersect1d(a,b))
#unique values
inter = np.in1d(a, b)
print(a[inter])
#you can see these values are not unique
indices=np.array(range(len(a)))[inter]
#These are the non-unique indices
_,unique=np.unique(a[inter], return_index=True)
uniqueIndices=indices[unique]
#this grabs the unique indices
print(uniqueIndices)
print(a[uniqueIndices])
#now they are unique as you would get from np.intersect1d()
Output:
[12 17 20]
[12 17 20 12 17 20]
[1 6 9]
[12 17 20]

indices = np.argwhere(np.in1d(a,b))

For Python >= 3.5, here is another solution.
Other Solution
Let's go through this step by step.
Based on the original code from the question:
import numpy as np
a = np.array(range(11))
b = np.array([2, 7, 10])
inter = np.intersect1d(a, b)
First, we create a numpy array with zeros
c = np.zeros(len(a))
print (c)
output
>>> [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
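Second, set the entries of c at the positions given by inter to 1. Note that this uses the values in inter as indices, which only works here because a is a plain 0..10 range, so each value equals its own index: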
c[inter] = 1
print (c)
output
>>>[ 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 1.]
Finally, np.nonzero() returns the indices of the non-zero entries, which are exactly the indices you want:
inter_with_idx = np.nonzero(c)
print (inter_with_idx)
Final output
(array([ 2, 7, 10]),)
Reference
[1] numpy.nonzero

As of NumPy version 1.15.0, intersect1d has a return_indices option:
numpy.intersect1d(ar1, ar2, assume_unique=False, return_indices=False)
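A minimal sketch of how that option might be used, assuming NumPy 1.15 or later. With return_indices=True the function returns three arrays: the common values, the indices of their first occurrences in the first array, and the corresponding indices in the second array. The names common, a_idx and b_idx are illustrative.

```python
import numpy as np

a = np.arange(11)[::-1]          # reversed so indices differ from values
b = np.array([2, 7, 10])

common, a_idx, b_idx = np.intersect1d(a, b, return_indices=True)

print(common)  # [ 2  7 10]
print(a_idx)   # [8 3 0]
print(b_idx)   # [0 1 2]
```

Because the common values come back sorted, a[a_idx] and b[b_idx] both reproduce common.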

Related

Assign zeros to minimum values in numpy 3d array

I have a numpy array of shape (100, 100, 20) (in python 3)
I want to find for each 'pixel' the 15 channels with minimum values, and make them zeros (meaning: make the array sparse, keep only the 5 highest values).
Example:
input: array = [[1,2,3], [7,6,9], [12,71,3]], num_channels_to_zero = 2
output: [[0,0,3], [0,0,9], [0,71,0]]
How can I do it?
what I have for now:
array = numpy.random.rand(100, 100, 20)
inds = numpy.argsort(array, axis=-1) # also shape (100, 100, 20)
I want to do something like
array[..., inds[..., :15]] = 0
but it doesn't give me what I want
np.argsort outputs indices suitable for the [...]_along_axis functions of numpy. This includes np.put_along_axis:
import numpy as np
array = np.random.rand(100, 100, 20)
print(array[0,0])
#[0.44116124 0.94656705 0.20833932 0.29239585 0.33001399 0.82396784
# 0.35841905 0.20670957 0.41473762 0.01568006 0.1435386 0.75231818
# 0.5532527 0.69366173 0.17247832 0.28939985 0.95098187 0.63648877
# 0.90629116 0.35841627]
inds = np.argsort(array, axis=-1)
np.put_along_axis(array, inds[..., :15], 0, axis=-1)
print(array[0,0])
#[0. 0.94656705 0. 0. 0. 0.82396784
# 0. 0. 0. 0. 0. 0.75231818
# 0. 0. 0. 0. 0.95098187 0.
# 0.90629116 0. ]
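To check the approach against the small example from the question, here is a minimal sketch that treats the last axis as the channel axis and zeroes the 2 smallest values in each row. The names arr and num_to_zero are illustrative and not part of the original code.

```python
import numpy as np

arr = np.array([[1, 2, 3], [7, 6, 9], [12, 71, 3]])
num_to_zero = 2  # hypothetical name: how many of the smallest values to zero per row

# argsort orders the indices from smallest to largest value along the last axis
inds = np.argsort(arr, axis=-1)

# write 0 at the positions of the num_to_zero smallest values (modifies arr in place)
np.put_along_axis(arr, inds[..., :num_to_zero], 0, axis=-1)

print(arr)
# [[ 0  0  3]
#  [ 0  0  9]
#  [ 0 71  0]]
```

The result matches the expected output given in the question.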
As mentioned in the NumPy documentation:
From each row, a specific element should be selected. The row index is just [0, 1, 2] and the column index specifies the element to choose for the corresponding row, here [0, 1, 0]. Using both together the task can be solved using advanced indexing:
>>> x = np.array([[1, 2], [3, 4], [5, 6]])
>>> x[[0, 1, 2], [0, 1, 0]]
array([1, 4, 5])
So, for your example (here zeroing the largest value in each row):
a = np.array([[1,2,3], [7,6,9], [12,71,3]])
amax = a.argmax(axis=-1)
a[np.arange(a.shape[0]), amax] = 0
a
array([[ 1, 2, 0],
[ 7, 6, 0],
[12, 0, 3]])

check for identical rows in different numpy arrays

How do I get a row-wise comparison between two arrays, with the result as a row-wise true/false array?
Given data:
a = np.array([[1,0],[2,0],[3,1],[4,2]])
b = np.array([[1,0],[2,0],[4,2]])
Result step 1:
c = np.array([True, True,False,True])
Result final:
a = a[c]
So how do I get the array c?
P.S.: In this example the arrays a and b are sorted. Please also say whether your solution requires the arrays to be sorted.
Here's a vectorised solution:
res = (a[:, None] == b).all(-1).any(-1)
print(res)
[ True True False True]
Note that a[:, None] == b compares each row of a with b element-wise. We then use all + any to deduce if there are any rows which are all True for each sub-array:
print(a[:, None] == b)
[[[ True True]
[False True]
[False False]]
[[False True]
[ True True]
[False False]]
[[False False]
[False False]
[False False]]
[[False False]
[False False]
[ True True]]]
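The broadcasting steps above can be followed by looking at the shape of each intermediate result. This is a small sketch using the arrays from the question; the names eq and row_match are illustrative.

```python
import numpy as np

a = np.array([[1, 0], [2, 0], [3, 1], [4, 2]])
b = np.array([[1, 0], [2, 0], [4, 2]])

eq = a[:, None] == b     # shape (4, 3, 2): every row of a against every row of b
row_match = eq.all(-1)   # shape (4, 3): True where a[i] equals b[j] in all columns
res = row_match.any(-1)  # shape (4,): True where a[i] matches at least one row of b

print(eq.shape, row_match.shape, res.shape)  # (4, 3, 2) (4, 3) (4,)
print(res)                                   # [ True  True False  True]
```

The intermediate array holds len(a) * len(b) * n_cols booleans, so memory use grows with the product of the two row counts.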
Approach #1
We could use a view based vectorized solution -
# https://stackoverflow.com/a/45313353/ #Divakar
def view1D(a, b): # a, b are arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()
A,B = view1D(a,b)
out = np.isin(A,B)
Sample run -
In [8]: a
Out[8]:
array([[1, 0],
[2, 0],
[3, 1],
[4, 2]])
In [9]: b
Out[9]:
array([[1, 0],
[2, 0],
[4, 2]])
In [10]: A,B = view1D(a,b)
In [11]: np.isin(A,B)
Out[11]: array([ True, True, False, True])
Approach #2
Alternatively for the case when all rows in b are in a and rows are lexicographically sorted, using the same views, but with searchsorted -
out = np.zeros(len(A), dtype=bool)
out[np.searchsorted(A,B)] = 1
If the rows are not necessarily lexicographically sorted -
sidx = A.argsort()
out[sidx[np.searchsorted(A,B,sorter=sidx)]] = 1
You can use NumPy's apply_along_axis, which applies a function to every 1-D slice along the given axis (with axis=1 on a 2-D array, the function is called once per row):
import numpy as np
a = np.array([[1,0],[2,0],[3,1],[4,2]])
b = np.array([[1,0],[2,0],[4,2]])
# `x in y` on arrays tests element-wise equality, so compare whole rows instead
c = np.apply_along_axis(lambda row: (b == row).all(axis=1).any(), 1, a)
You can do it as a list comp via:
c = np.array([row in b.tolist() for row in a.tolist()])
though this approach will be slower than a pure numpy approach (if it exists).
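A variant of the list-comprehension idea is sketched below, assuming the rows are small enough to convert to tuples. Note that `in` applied to a NumPy array checks whether any element compares equal, not whether a whole row matches, so this sketch converts the rows to tuples and looks them up in a set. The name b_rows is illustrative.

```python
import numpy as np

a = np.array([[1, 0], [2, 0], [3, 1], [4, 2]])
b = np.array([[1, 0], [2, 0], [4, 2]])

# hashable representation of each row of b, for constant-time lookups
b_rows = {tuple(row) for row in b.tolist()}

c = np.array([tuple(row) in b_rows for row in a.tolist()])
print(c)  # [ True  True False  True]
```

This does not require either array to be sorted.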
a = np.array([[1,0],[2,0],[3,1],[4,2]])
b = np.array([[1,0],[2,0],[4,2]])
i = 0
j = 0
result = []
We can take advantage of the fact that they are sorted and do this in O(n) time. Using two pointers we just move ahead the pointer that has gotten behind:
while i < len(a) and j < len(b):
    if tuple(a[i]) == tuple(b[j]):
        result.append(True)
        i += 1
        j += 1 # get rid of this depending on how you want to handle duplicates
    elif tuple(a[i]) > tuple(b[j]):
        j += 1
    else:
        result.append(False)
        i += 1
Pad with False if it ends early.
if len(result) < len(a):
    result.extend([False] * (len(a) - len(result)))
print(result) # [True, True, False, True]
This answer is adapted from Better way to find matches in two sorted lists than using for loops? (Java)
You can use scipy's cdist which has a few advantages:
import numpy as np
from scipy.spatial.distance import cdist
a = np.array([[1,0],[2,0],[3,1],[4,2]])
b = np.array([[1,0],[2,0],[4,2]])
c = cdist(a, b)==0
print(c.any(axis=1))
[ True True False True]
print(a[c.any(axis=1)])
[[1 0]
[2 0]
[4 2]]
Also, cdist accepts a callable as the metric, so you can specify your own distance function to do whatever comparison you need:
c = cdist(a, b, lambda u, v: (u==v).all())
print(c)
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 0.]
[0. 0. 1.]]
Now you can find which index matches, which also indicates whether there are multiple matches.
# Array with multiple instances
a2 = np.array([[1,0],[2,0],[3,1],[4,2],[3,1],[4,2]])
c2 = cdist(a2, b, lambda u, v: (u==v).all())
print(c2)
idx = np.where(c2==1)
print(idx)
print(idx[0][idx[1]==2])
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 0.]
[0. 0. 1.]
[0. 0. 0.]
[0. 0. 1.]]
(array([0, 1, 3, 5], dtype=int64), array([0, 1, 2, 2], dtype=int64))
[3 5]
The recommended answer is good, but will struggle when dealing with arrays with a large number of rows. An alternative is:
baseval = np.max([a.max(), b.max()]) + 1
a[:,1] = a[:,1] * baseval
b[:,1] = b[:,1] * baseval
c = np.isin(np.sum(a, axis=1), np.sum(b, axis=1))
This uses the maximum value contained in either array plus 1 as a numeric base and treats the columns as baseval^0 and baseval^1 digits. Provided all values are non-negative integers, this ensures that the row sum is unique for each possible pair of values. Note that the code above modifies a and b in place. If the order of the columns is not important, both input arrays can be sorted column-wise using np.sort(a,axis=1) beforehand.
This can be extended to arrays with more columns using:
baseval = np.max([a.max(), b.max()]) + 1
n_cols = a.shape[1]
a = a * baseval ** np.array(range(n_cols))
b = b * baseval ** np.array(range(n_cols))
c = np.isin(np.sum(a, axis=1), np.sum(b, axis=1))
Overflow can occur when baseval ** n_cols exceeds 9223372036854775807 (the int64 maximum), since the largest possible row sum is baseval ** n_cols - 1. This can be avoided by making the NumPy arrays hold Python integers using dtype=object.
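The encoding idea can also be wrapped in a function that leaves its inputs untouched. The following is only a sketch under the assumptions that both arrays are 2-D, hold non-negative integers, have the same number of columns, and are small enough not to overflow int64; the function name rows_in is hypothetical.

```python
import numpy as np

def rows_in(a, b):
    """Return a boolean array marking which rows of a also occur in b.

    Assumes 2-D arrays of non-negative integers with equal column counts.
    """
    a = np.asarray(a, dtype=np.int64)
    b = np.asarray(b, dtype=np.int64)
    base = int(max(a.max(), b.max())) + 1
    # column k gets weight base**k, so each row maps to a unique integer
    weights = base ** np.arange(a.shape[1], dtype=np.int64)
    return np.isin(a @ weights, b @ weights)

a = np.array([[1, 0], [2, 0], [3, 1], [4, 2]])
b = np.array([[1, 0], [2, 0], [4, 2]])

c = rows_in(a, b)
print(c)  # [ True  True False  True]
```

Using a matrix-vector product instead of multiplying the columns in place keeps a and b unchanged for later use.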

Get indices of top N values in 2D numpy ndarray or numpy matrix

I have an array of N-dimensional vectors.
data = np.array([[5, 6, 1], [2, 0, 8], [4, 9, 3]])
In [1]: data
Out[1]:
array([[5, 6, 1],
[2, 0, 8],
[4, 9, 3]])
I'm using sklearn's pairwise_distances function to compute a matrix of distance values. Note that this matrix is symmetric about the diagonal.
dists = pairwise_distances(data)
In [2]: dists
Out[2]:
array([[ 0. , 9.69535971, 3.74165739],
[ 9.69535971, 0. , 10.48808848],
[ 3.74165739, 10.48808848, 0. ]])
I need the indices corresponding to the top N values in this matrix dists, because these indices will correspond to the pairwise indices in data that represent vectors with the greatest distances between them.
I have tried doing np.argmax(np.max(dists, axis=1)) to get the index of the max value in each row, and np.argmax(np.max(dists, axis=0)) to get the index of the max value in each column, but note that:
In [3]: np.argmax(np.max(dists, axis=1))
Out[3]: 1
In [4]: np.argmax(np.max(dists, axis=0))
Out[4]: 1
and:
In [5]: dists[1, 1]
Out[5]: 0.0
Because the matrix is symmetric about the diagonal, and because argmax returns the first index it finds with the max value, I end up with the cell in the diagonal in the row and column matching where the max values are stored, instead of the row and column of the top values themselves.
At this point I'm sure I could write some more code to find the values I'm looking for, but surely there is an easier way to do what I'm trying to do. So I have two questions that are more or less equivalent:
How can I find the indices corresponding to the top N values in a matrix, or, how can I find the vectors with the top N pairwise distances from an array of vectors?
I'd ravel, argsort, and then unravel. I'm not claiming this is the best way, only that it's the first way that occurred to me, and I'll probably delete it in shame after someone posts something more obvious. :-)
That said (choosing the top 2 values, arbitrarily):
In [73]: dists = sklearn.metrics.pairwise_distances(data)
In [74]: dists[np.tril_indices_from(dists, -1)] = 0
In [75]: dists
Out[75]:
array([[ 0. , 9.69535971, 3.74165739],
[ 0. , 0. , 10.48808848],
[ 0. , 0. , 0. ]])
In [76]: ii = np.unravel_index(np.argsort(dists.ravel())[-2:], dists.shape)
In [77]: ii
Out[77]: (array([0, 1]), array([1, 2]))
In [78]: dists[ii]
Out[78]: array([ 9.69535971, 10.48808848])
As a slight improvement over the otherwise very good answer by DSM, instead of using np.argsort(), it is more efficient to use np.argpartition() if the order of the N greatest is of no consequence.
Partitioning an array arr at index k rearranges the elements so that the element at index k is the one that would be there if the array were sorted, with all smaller elements to its left and all larger elements to its right. The partitions on the left and right are not necessarily sorted. This has the advantage that it runs in linear time.
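A minimal sketch of the argpartition variant follows. To keep it self-contained, the distance matrix is computed with plain NumPy rather than sklearn's pairwise_distances (a swapped-in technique that should give the same Euclidean distances), and only the upper triangle is kept so each pair appears once. The names n, flat and top are illustrative.

```python
import numpy as np

data = np.array([[5, 6, 1], [2, 0, 8], [4, 9, 3]])

# Euclidean pairwise distances, then keep only the strict upper triangle
dists = np.linalg.norm(data[:, None] - data[None, :], axis=-1)
dists = np.triu(dists, k=1)

n = 2
flat = dists.ravel()

# positions of the n largest values, in no particular order (linear time)
top = np.argpartition(flat, -n)[-n:]

# optional: order just those n positions from largest to smallest
top = top[np.argsort(flat[top])[::-1]]

rows, cols = np.unravel_index(top, dists.shape)
print(list(zip(rows.tolist(), cols.tolist())))  # [(1, 2), (0, 1)]
```

Sorting only the n selected entries keeps the overall cost close to linear when n is much smaller than the matrix.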

Python - replace masked data in arrays

I would like to replace all the masked values in my 2D array with zero.
I saw that this is apparently possible with np.copyto:
test=np.copyto(array, 0, where = mask)
But I get an error message: 'module' object has no attribute 'copyto'. Is there an equivalent way to do that?
Try numpy.ma.filled()
I think this is exactly what you need
In [29]: a
Out[29]: array([ 1, 0, 25, 0, 1, 4, 0, 2, 3, 0])
In [30]: am = n.ma.MaskedArray(n.ma.log(a),fill_value=0)
In [31]: am
Out[31]:
masked_array(data = [0.0 -- 3.2188758248682006 -- 0.0 1.3862943611198906 -- 0.6931471805599453 1.0986122886681098 --],
mask = [False True False True False False True False False True],
fill_value = 0.0)
In [32]: am.filled()
Out[32]:
array([ 0. , 0. , 3.21887582, 0. , 0. ,
1.38629436, 0. , 0.69314718, 1.09861229, 0. ])
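For a case closer to the question, here is a small sketch that builds a masked array from an explicit boolean mask and fills the masked entries with zero. The data values and names are made up for illustration.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])
mask = np.array([False, True, False, True])  # True marks a masked (invalid) entry

m = np.ma.masked_array(data, mask=mask)

# filled(0) returns a regular ndarray with masked entries replaced by 0
filled = m.filled(0)
print(filled)  # [1. 0. 3. 0.]
```

Passing the fill value explicitly avoids relying on the fill_value stored on the masked array.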
test = np.copyto(array, 0, where=mask) is equivalent to:
array = np.where(mask, 0, array)
test = None
(I'm not sure why you would want to assign a value to the return value of np.copyto; it always returns None if no Exception is raised.)
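The difference between the two calls can be seen in a short sketch, assuming a NumPy version that provides np.copyto. The names arr, out and ret are illustrative.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)
mask = arr % 2 == 0               # True at the even entries

# np.where builds a new array and leaves arr untouched
out = np.where(mask, 0, arr)

# np.copyto writes into arr in place and returns None
ret = np.copyto(arr, 0, where=mask)

print(out)  # [[0 1 0]
            #  [3 0 5]]
print(ret)  # None
```

After the copyto call, arr holds the same values as out.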
Why not use array[mask] = 0?
Indeed, that would work (and has nicer syntax) if mask is a boolean array with the same shape as array. If mask doesn't have the same shape then array[mask] = 0 and np.copyto(array, 0, where=mask) may behave differently:
np.copyto (is documented to) and np.where (appears to) broadcast the shape of the mask to match array.
In contrast, array[mask] = 0 does not broadcast mask. This leads to a big difference in behavior when the mask does not have the same shape as array:
In [60]: array = np.arange(12).reshape(3,4)
In [61]: mask = np.array([True, False, False, False], dtype=bool)
In [62]: np.where(mask, 0, array)
Out[62]:
array([[ 0, 1, 2, 3],
[ 0, 5, 6, 7],
[ 0, 9, 10, 11]])
In [63]: array[mask] = 0
In [64]: array
Out[64]:
array([[ 0, 0, 0, 0],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
When array is 2-dimensional and mask is a 1-dimensional boolean array,
array[mask] is selecting rows of array (where mask is True) and
array[mask] = 0 sets those rows to zero.
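Surprisingly, in the NumPy version used for this session, array[mask] does not raise an IndexError even though the mask has 4 elements and array only has 3 rows. No IndexError is raised when the fourth value is False, but an IndexError is raised if the fourth value is True. (Newer NumPy releases reject any boolean index whose length does not match the indexed axis and raise an IndexError in both cases.)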
In [91]: array[np.array([True, False, False, False])]
Out[91]: array([[0, 1, 2, 3]])
In [92]: array[np.array([True, False, False, True])]
IndexError: index 3 is out of bounds for axis 0 with size 3

Numpy Arrays: Slice y-values array based on threshold, then slice the x-values array correspondingly

Very quick question, can't find an answer with these keywords. What is a better way of doing the following?
t = linspace(0,1000,300)
x0 = generic_function(t)
x1 = x0[x0>0.8]
t1 = t[t>t[len(x0)-len(x1)-1]]
The operation I'm using for t1 strikes me as very un-pythonic and inefficient. Any pointers?
IIUC, you can simply reuse the cut array. For example:
>>> from numpy import arange, sin
>>> t = arange(5)
>>> t
array([0, 1, 2, 3, 4])
>>> y = sin(t)
>>> y
array([ 0. , 0.84147098, 0.90929743, 0.14112001, -0.7568025 ])
As you've already done, you can make a bool array:
>>> y > 0.8
array([False, True, True, False, False], dtype=bool)
and then you can use this to filter both t and y:
>>> t[y > 0.8]
array([1, 2])
>>> y[y > 0.8]
array([ 0.84147098, 0.90929743])
No use of len or assumptions about monotonicity involved.
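Applied to the variables from the question, the same pattern might look like the sketch below. Here np.sin stands in for generic_function, which is not defined in the question, and keep is an illustrative name for the shared mask.

```python
import numpy as np

t = np.linspace(0, 1000, 300)
x0 = np.sin(t)        # stand-in for the question's generic_function(t)

keep = x0 > 0.8       # one boolean mask, computed once

x1 = x0[keep]         # y-values above the threshold
t1 = t[keep]          # the matching x-values, selected with the same mask

print(len(x1) == len(t1))  # True
```

Storing the mask in a variable avoids evaluating the comparison twice and guarantees that both arrays are filtered identically.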
