Summing values of numpy array based on indices in other array - python

Assume I have the following arrays:
import numpy as np

N = 8
M = 4
a = np.zeros(M)
b = np.random.randint(M, size=N)  # contains indices for a
c = np.random.rand(N)             # contains random values
I want to sum the values of c according to the indices provided in b, and store them in a. Writing a loop for this is trivial:
for i, v in enumerate(b):
    a[v] += c[i]
Since N can get quite big in my real-world problem, I'd like to avoid using Python loops, but I can't figure out how to write it as a numpy statement. Can anyone help me out?
OK, here are some example values:
In [27]: b
Out[27]: array([0, 1, 2, 0, 2, 3, 1, 1])
In [28]: c
Out[28]:
array([ 0.15517108, 0.84717734, 0.86019899, 0.62413489, 0.24357903,
0.86015187, 0.85813481, 0.7071174 ])
In [30]: a
Out[30]: array([ 0.77930596, 2.41242955, 1.10377802, 0.86015187])

import numpy as np
N = 8
M = 4
b = np.array([0, 1, 2, 0, 2, 3, 1, 1])
c = np.array([ 0.15517108, 0.84717734, 0.86019899, 0.62413489, 0.24357903, 0.86015187, 0.85813481, 0.7071174 ])
# (np.mgrid[:M, :N] == b)[0] is an (M, N) one-hot mask: entry (i, j) is True where b[j] == i
a = ((np.mgrid[:M,:N] == b)[0] * c).sum(axis=1)
returns
array([ 0.77930597, 2.41242955, 1.10377802, 0.86015187])
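For what it's worth, numpy also ships dedicated tools for this kind of index-based accumulation; a minimal sketch (not from the original answer), assuming a, b, c and M are defined as in the question:
np.add.at(a, b, c)                          # unbuffered in-place add; correctly handles repeated indices in b
# or, for non-negative integer indices, as a one-liner:
a = np.bincount(b, weights=c, minlength=M)
Both avoid the (M, N) temporary that the mgrid approach builds.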


Fancy indexing to matrix operations

Suppose:
import numpy as np

A = np.array([1, 2, 0, -4])
B = np.array([1, 1, 1, 1])
C = np.array([1, 2, 3, 4])
With fancy indexing I can assign a scalar value to C wherever A > 0.
C[A > 0]= 1
But is there any way to get something like C = B/A wherever A > 0, while preserving the original values of C for A <= 0, with fancy indexing? If I try something like
C[A > 0] = B/A
I get an error like:
<input>:1: RuntimeWarning: divide by zero encountered in true_divide
Traceback (most recent call last):
File "<input>", line 1, in <module>
ValueError: NumPy boolean array indexing assignment cannot assign 4 input values to the 2 output values where the mask is true
I can get the result with a for loop, or by making copies of A & C where:
D = np.copy(A)
E = np.copy(C)
D[D <= 0] = 1            # avoid dividing by zero
E = B / D
E[A <= 0] = C[A <= 0]    # keep the original values of C where A <= 0
or set C=Run(A,B) where
def Run(A, B):
    C = np.zeros(A.shape[0])
    for i in range(len(A)):
        if A[i] != 0:
            C[i] = B[i] / A[i]
    return C
But I was just wondering if there is a more direct way to do it, without adding so many steps, since I am looping millions of times. Thanks.
You can index the operands: C[A > 0] = B[A > 0] / A[A > 0]. You might want to compute A > 0 once, and reuse it, e.g.
mask = A > 0
C[mask] = B[mask] / A[mask]
A more efficient alternative is to use the where parameter of np.divide or np.floor_divide. For example,
In [19]: A = np.array([1, 2, 0, -4])
In [20]: B = np.array([1, 1, 1, 1])
In [21]: C = np.array([1, 2, 3, 4])
In [22]: np.floor_divide(B, A, where=A > 0, out=C)
Out[22]: array([1, 0, 3, 4])
In [23]: C
Out[23]: array([1, 0, 3, 4])
I had to use floor_divide because all the arrays are integer arrays, and numpy.divide creates a floating point array, so that function will complain about the type mismatch if the out array is an integer array. If you want a floating point result, C should be an array of floating point values:
In [24]: C = np.array([1., 2., 3., 4.])
In [25]: np.divide(B, A, where=A > 0, out=C)
Out[25]: array([1. , 0.5, 3. , 4. ])
In [26]: C
Out[26]: array([1. , 0.5, 3. , 4. ])

Matrix of matrices in python

Hey so I'm working on this code for a material analysis. I have a matrix generated for each layer of the material and I want to save each of these matrices as their own element. The way I was doing this was by saving it to a dictionary. I then form one matrix by summing all the values of the dictionary. Now I do this for three different conditions which leaves me with 3 matrices: A, B, and D. I want to make a matrix of all of these so that it looks like:
| A B |
| B D |
However, I can't get it to print properly: it always prints matrix: followed by one of the matrices, such as A, and then prints the second matrix, B, on the line where A ended instead of next to it. I also need to perform further operations on this massive matrix, so I'm wondering what the best way to go about that would be. This is part of my code:
Qbars = {}
for i in plies:
    Qbar11 = Q11 * math.cos(float(thetas[j]))**4 + Q22 * math.sin(float(thetas[j]))**4 + \
        2 * (Q12 + 2 * Q66) * math.sin(float(thetas[j]))**2 * math.cos(float(thetas[j]))**2
    Qbar22 = Q11 * math.sin(float(thetas[j]))**4 + Q22 * math.cos(float(thetas[j]))**4 + \
        2 * (Q12 + 2 * Q66) * math.sin(float(thetas[j]))**2 * math.cos(float(thetas[j]))**2
    Qbar12 = (Q11 + Q22 - 4 * Q66) * math.sin(float(thetas[j]))**2 * \
        math.cos(float(thetas[j]))**2 + Q12 * (math.cos(float(thetas[j]))**4 + \
        math.sin(float(thetas[j]))**4)
    Qbar66 = (Q11 + Q22 - 2 * Q12 - 2 * Q66) * math.sin(float(thetas[j]))**2 * \
        math.cos(float(thetas[j]))**2 + Q66 * (math.sin(float(thetas[j]))**4 + \
        math.cos(float(thetas[j]))**4)
    Qbar16 = (Q11 - Q12 - 2 * Q66) * math.cos(float(thetas[j]))**3 * \
        math.sin(float(thetas[j])) - (Q22 - Q12 - 2 * Q66) * math.cos(float(thetas[j])) * \
        math.sin(float(thetas[j]))**3
    Qbar26 = (Q11 - Q12 - 2 * Q66) * math.cos(float(thetas[j])) * \
        math.sin(float(thetas[j]))**3 - (Q22 - Q12 - 2 * Q66) * \
        math.cos(float(thetas[j]))**3 * math.sin(float(thetas[j]))
    Qbar = np.matrix([[Qbar11, Qbar12, Qbar16], [Qbar12, Qbar22, Qbar26], \
                      [Qbar16, Qbar26, Qbar66]])
    Qbars[i] = Qbar
    if len(thetas) == 1:
        j = 0
    else:
        j = j + 1

k = 0
Alist = {}
for i in plies:
    Alist[i] = Qbars[i].dot(h[k])
    if len(h) == 1:
        k = 0
    else:
        k = k + 1
A = sum(Alist.values())

ABD = ([A, B], [B, D])
print ABD
One of the next operations I intend to perform would be to multiply the matrix by a 6x1 array that would look like such:
| Nx | | A A A B B B |
| Ny | | A A A B B B |
| Nxy| | A A A B B B |
------ * ----------------
| Mx | | B B B D D D |
| My | | B B B D D D |
| Mxy| | B B B D D D |
What would be the best way to go about doing this?
EDIT: I made this shorter code to reproduce what I'm dealing with; I couldn't think of how to make it any smaller.
import os
import numpy as np
import math

os.system('cls')
ang = raw_input("ENTER 0 (SPACE) 45 ")
thetas = [int(i) for i in ang.split()]
x = 40
h = [3, 5]
y = [1, 2]
j = 0

Qbars = {}
for i in y:
    theta = [thetas[j] * math.pi / 180]
    Q = math.sin(float(thetas[j]))
    Qbar = np.matrix([[Q, Q, Q], [Q, Q, Q], [Q, Q, Q]])
    Qbars[i] = Qbar
    if len(thetas) == 1:
        j = 0
    else:
        j = j + 1
print Qbars

k = 0
Alist = {}
for i in y:
    Alist[i] = Qbars[i].dot(h[k])
    if len(h) == 1:
        k = 0
    else:
        k = k + 1
A = sum(Alist.values())

AAAA = ([A, A], [A, A])
print AAAA

test = raw_input("Press ENTER to close")
As others have noted, the matrix class is pretty much deprecated by now. Matrices are more limited than ndarrays, with very little additional functionality. The main reason why people prefer to use numpy matrices is that linear algebra (in particular, matrix multiplication) works more naturally for matrices.
However, as far as I can tell you're using np.dot rather than the overloaded arithmetic operators of the matrix class to begin with, so you would not see any loss of functionality from using np.array instead. Furthermore, if you switched to python 3.5 or newer, you could use the @ matrix multiplication operator, which would let you write things such as
Alist[i] = Qbars[i] @ h[k]
In the following I'll use the ndarray class instead of the matrix class for the above reasons.
So, your question has two main parts: creating your block matrix and multiplying the result with a vector. I suggest using an up-to-date numpy version, since there's numpy.block introduced in version 1.13. This conveniently does exactly what you want it to do:
>>> import numpy as np
>>> A,B,C = (np.full((3,3),k) for k in range(3))
>>> A
array([[0, 0, 0],
[0, 0, 0],
[0, 0, 0]])
>>> B
array([[1, 1, 1],
[1, 1, 1],
[1, 1, 1]])
>>> C
array([[2, 2, 2],
[2, 2, 2],
[2, 2, 2]])
>>> np.block([[A,B],[B,C]])
array([[0, 0, 0, 1, 1, 1],
[0, 0, 0, 1, 1, 1],
[0, 0, 0, 1, 1, 1],
[1, 1, 1, 2, 2, 2],
[1, 1, 1, 2, 2, 2],
[1, 1, 1, 2, 2, 2]])
Similarly, you can concatenate your two 3-length vectors using np.concatenate or one of the stacking methods (these are available in older versions too).
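For instance, a minimal sketch (not from the original answer), assuming N_vec and M_vec are the two length-3 vectors you want to stack into the (6,1) column:
v = np.concatenate([N_vec, M_vec]).reshape(-1, 1)   # (6,1) column vector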
Now, the problem is that you can't multiply a matrix of shape (6,1) with a matrix of shape (6,6), so the question is what you're really trying to do here. In case you want to multiply each row of your matrix with the corresponding element of your vector, you can just multiply your arrays (of class np.ndarray!) and make use of array broadcasting:
>>> Q = np.block([[A,B],[B,C]]) # (6,6)-shape array
>>> v = np.arange(6).reshape(-1,1) # (6,1)-shape array
>>> v * Q
array([[ 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 1, 1, 1],
[ 0, 0, 0, 2, 2, 2],
[ 3, 3, 3, 6, 6, 6],
[ 4, 4, 4, 8, 8, 8],
[ 5, 5, 5, 10, 10, 10]])
The other option is that you want to do matrix-vector multiplication, but then either you have to transpose your vector (in order to multiply it with the matrix from the right) or swap the order of the matrix and the vector (multiplying the vector with the matrix from the left). Example for the former:
>>> v.T @ Q  # python 3.5 and up
array([[12, 12, 12, 27, 27, 27]])
>>> v.T.dot(Q)
array([[12, 12, 12, 27, 27, 27]])
Another benefit of arrays (rather than matrices) is that arrays can be multidimensional. Instead of putting numpy arrays inside a dict and summing them that way, you could define a 3d array (a collection of 2d arrays along a third axis) and then sum along the third dimension. One huge benefit of numpy is its memory efficiency and performance, and these aspects are strongest if you use numpy objects and methods throughout your code. Mixing in native python objects (such as dicts, zips, loops) typically hinders performance.
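A minimal sketch of that idea (not from the original answer), assuming qbar_list collects the per-ply (3,3) Qbar arrays and h holds the corresponding layer thicknesses:
Qstack = np.stack(qbar_list, axis=0)                        # shape (n_plies, 3, 3)
A = (Qstack * np.asarray(h)[:, None, None]).sum(axis=0)     # weight each layer by h[k] and sum over plies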

Python - remove elements from array

I have an array called a and another array b. The array a is the main array where I store float data, and b is an array which contains some indexes belonging to a.
Example:
a = [1.3, 1.7, 18.4, 56.2, 82.2, 18.1, 81.9, 56.9, -274.45]
b = [0, 1, 2, 3, 4, 5, 6, 7]
In this example b contains indexes of a from 0 to 7.
What I'm trying to do in Python is to remove "duplicates": I want to remove all indexes from b which have a similar value in a. For example, notice that there is the pair 1.3 and 1.7. Also, there are 18.4 and 18.1, etc. I want to find all these values and write -1 in all the places in array b which hold such a value.
Output should be the following:
b = [0, -1, 2, 3, 4, -1, -1, -1]
I think it is obvious what I am trying to achieve. Here index 1 is replaced with -1 because in a it represents 1.7, which has the "pair" 1.3. Also, the last 3 indexes represent 18.1, 81.9 and 56.9, which also have their "pairs" earlier, so they are replaced with -1.
Of course, I have a parameter x which controls how "similar" values have to be. Here x = 2, which means that any 2 values which differ by less than 2 are similar.
What have I tried? I tried to use 2 nested for loops and a lot of unnecessary variables, and my algorithm eats memory and performance. Is there an elegant np-ish way to achieve it?
Approach #1 : Here's a vectorized approach using broadcasting; it's a bit memory intensive -
x = 2 # threshold that decides similarity
a_b = a[b]
mask = np.triu(np.abs(a_b[:,None]-a_b)<x,1).any(0)
b[mask[:len(b)]] = -1
Sample run -
In [95]: a = np.array([1.3, 1.7, 18.4, 56.2, 82.2, 18.1, 81.9, 56.9, -274.45])
...: b = np.array([0, 1, 2, 3, 4, 5, 6, 7])
...:
# After code run ...
In [97]: b
Out[97]: array([ 0, -1, 2, 3, 4, -1, -1, -1])
Approach #2 : Less memory intensive approach
import pandas as pd

def set_mask(a, b, thresh):
    a_b = a[b]
    N = len(a_b)
    sidx = a_b.argsort()
    sorted_a_b = a_b[sidx]
    mask0 = sorted_a_b[1:] - sorted_a_b[:-1] < thresh
    id_arr = np.zeros(N, dtype=int)
    id_arr[np.flatnonzero(~mask0) + 1] = 1
    ids = id_arr.cumsum()
    d = np.column_stack((ids, sidx))
    df0 = pd.DataFrame(d, columns=(('ids', 'sidx')))
    pp = df0['sidx'].groupby([ids]).min()
    maskc = np.ones(N, dtype=bool)
    maskc[pp.values] = 0
    return maskc
Use this mask to replace the mask needed at the last step from previous approach.
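A minimal usage sketch (not from the original answer), mirroring the last step of approach #1 and assuming a and b are the numpy arrays from the sample run:
mask = set_mask(a, b, 2)   # threshold x = 2
b[mask] = -1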

Python: how to avoid loop?

I have a list of entries
l = [5, 3, 8, 12, 24]
and a matrix M
M:
12 34 5 8 7
0 24 12 3 1
I want to find the indices of the matrix where the numbers in l appear. For the k-th entry of l I want to save a random pair of indices i, j where M[i][j] == l[k]. I am doing the following:
indI = []
indJ = []
for i in l:
    tmp = np.where(M == i)
    rd = randint(len(tmp))
    indI.append(tmp[0][rd])
    indJ.append(tmp[1][rd])
I would like to see if there is a way to avoid that loop
One way in which you should be able to significantly speed up your code is to avoid duplicate work:
tmp = np.where(M == i)
As this gives you a list of all locations in M where the value is equal to i, it must be searching through the entire matrix. So for each element in l, you are searching through the full matrix.
Instead of doing that, try indexing your matrix as a first step:
matrix_index = {}
for i in range(len(M)):
    for j in range(len(M[i])):
        if M[i][j] not in matrix_index:
            matrix_index[M[i][j]] = [(i, j)]
        else:
            matrix_index[M[i][j]].append((i, j))
Then for each value in l, instead of doing a costly search through the full matrix, you can just get it straight from your matrix index.
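A minimal sketch of that lookup (not from the original answer), using the matrix_index built above:
import random

indI = []
indJ = []
for value in l:
    i, j = random.choice(matrix_index[value])   # pick one random location of this value
    indI.append(i)
    indJ.append(j)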
Note: I haven't worked with numpy very much, so I may have gotten the specific syntax wrong. There may also be a more idiomatic way of doing this in numpy.
If both l and M are not too large, like the following:
In: l0 = [5, 3, 8, 12, 34, 1, 12]
In: M0 = [[12, 34, 5, 8, 7],
In:       [ 0, 24, 12, 3, 1]]
In: l = np.asarray(l0)
In: M = np.asarray(M0)
You can try this:
In: np.where(l[None, None, :] == M[:, :, None])
Out:
(array([0, 0, 0, 0, 0, 1, 1, 1, 1]), <- i
array([0, 0, 1, 2, 3, 2, 2, 3, 4]), <- j
array([3, 6, 4, 0, 2, 3, 6, 1, 5])) <- k
The three output rows are i, j and k respectively; read down each column to get one (i, j, k) triple. For example, the 1st column [0, 0, 3] means M[0, 0] == l[3], the 2nd column [0, 0, 6] means M[0, 0] == l[6], and so on. I think these are what you want.
However, this numpy trick cannot be extended to very large inputs, such as 2M elements in l or a 2500x2500 M. It needs quite a lot of memory and a very long time to compute... if it is lucky enough not to crash from running out of memory. :)
One solution that does not use the word for is
c = np.apply_along_axis(lambda row: np.random.choice(np.argwhere(row).ravel()), 1, M.ravel()[np.newaxis, :] == l[:, np.newaxis])
indI, indJ = c // M.shape[1], c % M.shape[1]
Note that while that solves the problem, M.ravel()[np.newaxis, :] == l[:, np.newaxis] will quickly produce MemoryErrors. A more pragmatic approach would be to get the indices of interest through something like
s = np.argwhere(M.ravel()[np.newaxis, :] == l[:, np.newaxis])
and then do the random choice post-processing by hand. This, however, probably does not yield any significant performance improvements over your search.
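A minimal sketch of that post-processing (not from the original answer), assuming s holds the (k, flat_index) pairs produced by the np.argwhere call above:
indI, indJ = [], []
for k in range(len(l)):
    flat_hits = s[s[:, 0] == k, 1]        # flat positions in M where M == l[k]
    pick = np.random.choice(flat_hits)    # choose one at random
    indI.append(pick // M.shape[1])
    indJ.append(pick % M.shape[1])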
What makes it slow, though, is that you search through the entire matrix in every step of your loop; pre-sorting the matrix (at a certain cost) gives you a straightforward way of making each individual search much faster:
In [312]: %paste
def direct_search(M, l):
    indI = []
    indJ = []
    for i in l:
        tmp = np.where(M == i)
        rd = np.random.randint(len(tmp[0]))  # Note the fix here
        indI.append(tmp[0][rd])
        indJ.append(tmp[1][rd])
    return indI, indJ

def using_presorted(M, l):
    a = np.argsort(M.ravel())
    M_sorted = M.ravel()[a]
    def find_indices(i):
        s = np.searchsorted(M_sorted, i)
        j = 0
        while M_sorted[s + j] == i:
            yield a[s + j]
            j += 1
    indices = [list(find_indices(i)) for i in l]
    c = np.array([np.random.choice(i) for i in indices])
    return c // M.shape[1], c % M.shape[1]
## -- End pasted text --
In [313]: M = np.random.randint(0, 1000000, (1000, 1000))
In [314]: l = np.random.choice(M.ravel(), 1000)
In [315]: %timeit direct_search(M, l)
1 loop, best of 3: 4.76 s per loop
In [316]: %timeit using_presorted(M, l)
1 loop, best of 3: 208 ms per loop
In [317]: indI, indJ = using_presorted(M, l) # Let us check that it actually works
In [318]: np.all(M[indI, indJ] == l)
Out[318]: True

Python version of ismember with 'rows' and index

A similar question has been asked, but none of the answers quite do what I need - some allow multidimensional searches (aka the 'rows' option in matlab) but don't return the index. Some return the index but don't allow rows. My arrays are very large (1M x 2) and I have been successful in making a loop that works, but obviously that is very slow. In matlab, the built-in ismember function takes about 10 seconds.
Here is what I am looking for:
a=np.array([[4, 6],[2, 6],[5, 2]])
b=np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]])
The exact matlab function that does the trick is:
[~,index] = ismember(a,b,'rows')
where
index = [6, 3, 9]
import numpy as np

def asvoid(arr):
    """
    View the array as dtype np.void (bytes)
    This views the last axis of ND-arrays as bytes so you can perform comparisons on
    the entire row.
    http://stackoverflow.com/a/16840350/190597 (Jaime, 2013-05)
    Warning: When using asvoid for comparison, note that float zeros may compare UNEQUALLY
    >>> asvoid([-0.]) == asvoid([0.])
    array([False], dtype=bool)
    """
    arr = np.ascontiguousarray(arr)
    return arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1])))

def in1d_index(a, b):
    voida, voidb = map(asvoid, (a, b))
    return np.where(np.in1d(voidb, voida))[0]

a = np.array([[4, 6],[2, 6],[5, 2]])
b = np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]])
print(in1d_index(a, b))
prints
[2 5 8]
This would be equivalent to Matlab's [3, 6, 9], since Python uses 0-based indexing.
Some caveats:
The indices are returned in increasing order. They do not correspond to the location of the items of a in b.
asvoid will work for integer dtypes, but be careful if using asvoid on float dtypes, since asvoid([-0.]) == asvoid([0.]) returns array([False]).
asvoid works best on contiguous arrays. If the arrays are not contiguous, the data will be copied to a contiguous array, which will slow down the performance.
Despite the caveats, one might choose to use in1d_index anyway for the sake of speed:
def ismember_rows(a, b):
    # http://stackoverflow.com/a/22705773/190597 (ashg)
    return np.nonzero(np.all(b == a[:,np.newaxis], axis=2))[1]
In [41]: a2 = np.tile(a,(2000,1))
In [42]: b2 = np.tile(b,(2000,1))
In [46]: %timeit in1d_index(a2, b2)
100 loops, best of 3: 8.49 ms per loop
In [47]: %timeit ismember_rows(a2, b2)
1 loops, best of 3: 5.55 s per loop
So in1d_index is ~650x faster (for arrays of length in the low thousands), but again note the comparison is not exactly apples-to-apples since in1d_index returns the indices in increasing order, while ismember_rows returns the indices in the order rows of a show up in b.
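If you do need the indices in the order of a's rows without the slow broadcast, one hypothetical workaround (not from the original answer, assuming every row of a occurs in b and both arrays share the same dtype) is a bytes-keyed lookup:
def in1d_index_ordered(a, b):
    # map each row of b (as raw bytes) to its row index, then look up a's rows in a's order
    lookup = {row.tobytes(): i for i, row in enumerate(np.ascontiguousarray(b))}
    return np.array([lookup[row.tobytes()] for row in np.ascontiguousarray(a)])
For the example arrays this returns [5 2 8], matching a's row order.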
import numpy as np

def ismember_rows(a, b):
    '''Equivalent of 'ismember' from Matlab
    a.shape = (nRows_a, nCol)
    b.shape = (nRows_b, nCol)
    return the idx where b[idx] == a
    '''
    return np.nonzero(np.all(b == a[:,np.newaxis], axis=2))[1]

a = np.array([[4, 6],[2, 6],[5, 2]])
b = np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]])
idx = ismember_rows(a, b)
print idx
print np.all(b[idx] == a)
print
array([5, 2, 8])
True
e...I used broadcasting
--------------------------[update]------------------------------
def ismember(a, b):
    return np.flatnonzero(np.in1d(b[:,0], a[:,0]) & np.in1d(b[:,1], a[:,1]))
a = np.array([[4, 6],[2, 6],[5, 2]])
b = np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]])
a2 = np.tile(a,(2000,1))
b2 = np.tile(b,(2000,1))
%timeit in1d_index(a2, b2)
# 100 loops, best of 3: 8.74 ms per loop
%timeit ismember(a2, b2)
# 100 loops, best of 3: 8.5 ms per loop
np.all(in1d_index(a2, b2) == ismember(a2, b2))
# True
As unutbu said, the indices are returned in increasing order.
The function first joins the multiple columns of elements into a single column of strings, then numpy.in1d can be used to find the desired answer. Please try the following code:
import numpy as np
def ismemberRow(A, B):
    '''
    This function finds which rows of A can also be found in B.
    It first turns multiple columns of elements into a single column array, then numpy.in1d can be used.
    Input: m x n numpy array (A), and p x q array (B)
    Output: numpy array of length m, storing True or False; True for rows that can be found in both A and B
    '''
    sa = np.chararray((A.shape[0], 1))
    sa[:] = '-'
    sb = np.chararray((B.shape[0], 1))
    sb[:] = '-'

    ba = (A).astype(np.str)
    sa2 = np.expand_dims(ba[:,0], axis=1) + sa + np.expand_dims(ba[:,1], axis=1)
    na = A.shape[1] - 2
    for i in range(0, na):
        sa2 = sa2 + sa + np.expand_dims(ba[:,i+2], axis=1)

    bb = (B).astype(np.str)
    sb2 = np.expand_dims(bb[:,0], axis=1) + sb + np.expand_dims(bb[:,1], axis=1)
    nb = B.shape[1] - 2
    for i in range(0, nb):
        sb2 = sb2 + sb + np.expand_dims(bb[:,i+2], axis=1)

    return np.in1d(sa2, sb2)
A = np.array([[1, 3, 4],[2, 4, 3],[7, 4, 3],[1, 1, 1],[1, 3, 4],[5, 3, 4],[1, 1, 1],[2, 4, 3]])
B = np.array([[1, 3, 4],[1, 1, 1]])
d = ismemberRow(A,B)
print A[np.where(d)[0],:]
#results:
#[[1 3 4]
# [1 1 1]
# [1 3 4]
# [1 1 1]]
Here's a function based on libigl's igl::ismember_rows which closely mimics the behavior of Matlab's ismember(A,B,'rows'):
def ismember_rows(A, B, return_index=False):
    """
    Return whether each row in A occurs as a row in B

    Parameters
    ----------
    A : #A by dim array
    B : #B by dim array
    return_index : {True,False}, optional.

    Returns
    -------
    IA : #A 1D array, IA[i] == True if and only if
        there exists j = LOCB[i] such that B[j,:] == A[i,:]
    LOCB : #A 1D array of indices. LOCB[j] == -1 if IA[i] == False,
        only returned if return_index=True
    """
    IA = np.full(A.shape[0], False)
    LOCB = np.full(A.shape[0], -1)
    if len(A) == 0: return (IA, LOCB) if return_index else IA
    if len(B) == 0: return (IA, LOCB) if return_index else IA
    # Get rid of any duplicates
    uA, uIuA = np.unique(A, axis=0, return_inverse=True)
    uB, uIB = np.unique(B, axis=0, return_index=True)
    # Sort both
    sIA = np.lexsort(uA.T[::-1])
    sA = uA[sIA, :]
    sIB = np.lexsort(uB.T[::-1])
    sB = uB[sIB, :]
    #
    uF = np.full(sA.shape[0], False)
    uLOCB = np.full(sA.shape[0], -1)
    def row_greater_than(a, b):
        for c in range(sA.shape[1]):
            if sA[a, c] > sB[b, c]: return True
            if sA[a, c] < sB[b, c]: return False
        return False
    # loop over sA
    bi = 0
    past = False
    for a in range(sA.shape[0]):
        while not past and row_greater_than(a, bi):
            bi += 1
            past = bi >= sB.shape[0]
        if not past and np.all(sA[a, :] == sB[bi, :]):
            uF[sIA[a]] = True
            uLOCB[sIA[a]] = uIB[sIB[bi]]
    for a in range(A.shape[0]):
        IA[a] = uF[uIuA[a]]
        LOCB[a] = uLOCB[uIuA[a]]
    return (IA, LOCB) if return_index else IA
For example,
a=np.array([[4, 6],[6,6],[2, 6],[5, 2]])
b=np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]])
(flag,index) = ismember_rows(a,b,return_index=True)
produces
>>> flag
array([ True, False, True, True])
>>> index
array([ 5, -1, 2, 8])
Update: Here's a faster version that makes better use of numpy.unique based on array_correspondence in gpytoolbox.
def ismember_rows(A, B, return_index=False):
    """
    Return whether each row in A occurs as a row in B

    Parameters
    ----------
    A : #A by dim array
    B : #B by dim array
    return_index : {True,False}, optional.

    Returns
    -------
    IA : #A 1D array, IA[i] == True if and only if
        there exists j = LOCB[i] such that B[j,:] == A[i,:]
    LOCB : #A 1D array of indices. LOCB[j] == -1 if IA[i] == False,
        only returned if return_index=True
    """
    if len(A) == 0 or len(B) == 0:
        IA = np.full(A.shape[0], False)
        LOCB = np.full(A.shape[0], -1)
        return (IA, LOCB) if return_index else IA
    uB, mapB = np.unique(B, axis=0, return_index=True)
    uU, idx, inv = np.unique(np.vstack((uB, A)), axis=0, return_index=True, return_inverse=True)
    imap = idx[inv[uB.shape[0]:]]
    imap[imap >= uB.shape[0]] = -1
    LOCB = np.where(imap < 0, -1, mapB[imap])
    IA = LOCB >= 0
    return (IA, LOCB) if return_index else IA
Seems to be a bit faster on my laptop.
