Query an array where two other arrays align - python

I have 3 arrays, x, y, and q. Arrays x and y have the same length, q is a query array. Assume all values in x and q are unique. For each value of q, I would like to find the index of the corresponding value in x. I would then like to query that index in y. If a value from q does not appear in x, I would like to return np.nan.
As a concrete example, consider the following arrays:
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
q = np.array([2, 0])
Since only the value 2 occurs in x, the correct return value would be:
out = np.array([5, np.nan])
With for loops, this can be done like so:
out = []
for i in range(len(q)):
    for j in range(len(x)):
        if np.allclose(q[i], x[j]):
            out.append(y[j])
            break
    else:
        out.append(np.nan)
out = np.array(out)
Obviously this is quite slow. Is there a simpler way to do this with numpy builtins like np.argwhere? Or would it be easier to use pandas?

Numpy broadcasting should work.
# a mask that flags any matches
m = q == x[:, None]
# replace any value in q without any match in x by np.nan
res = np.where(m.any(0), y[:, None] * m, np.nan).sum(0)
res
# array([ 5., nan])
I should note that this only works if x has no duplicates.
Because it relies on building a len(x) x len(q) array, the above solution will run into memory issues if q is large. Another solution, using pandas, will work much more efficiently in that case:
# map q to y via x
res = pd.Series(q).map(pd.Series(y, index=x)).values
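For the example arrays from the question this gives, as a quick check:
import numpy as np
import pandas as pd

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
q = np.array([2, 0])
# values of q absent from x become NaN
print(pd.Series(q).map(pd.Series(y, index=x)).values)  # [ 5. nan]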
If x and q are 2D, it's better to convert the Series.map() solution into a DataFrame.merge() one:
res = pd.DataFrame(q).merge(pd.DataFrame(x).assign(y=y), on=[0,1], how='left')['y'].values
Numpy broadcasting will blow up (it would require a 3D array) and will not be efficient for large arrays. Numba might do well here, though.

I think you could solve this in one line, using a single for loop and some broadcasting:
out = [y[bl].item() if bl.any() else None for bl in x[None,:]==q[:,None] ]
It seems to me an elegant solution, though a little confusing to read, so I will go through it part by part.
x[None,:]==q[:,None] compares every value in q with every value in x and returns a (len(q), len(x)) array of booleans (in this case [[False, True, False], [False, False, False]]).
You can index y with a boolean array of the same length as y, so y[[False, True, False]] gives the value at y[1].
If the boolean array contains only False values, indexing would return an empty array, so a None has to be substituted instead; that is what the if-else handles.
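A tiny illustration of that boolean indexing, using the y from the question:
import numpy as np
y = np.array([4, 5, 6])
mask = np.array([False, True, False])
print(y[mask])         # [5]
print(y[mask].item())  # 5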

Here is how to use np.argwhere too; use whichever feels more comfortable, pandas or numpy.
out_idx = [y[np.argwhere(x == value).reshape(-1)] for value in q]
out = [vals[0] if len(vals) else np.nan for vals in out_idx]

Here's a way to do what your question asks:
query_results = pd.DataFrame(index=q).join(pd.DataFrame({'y':y}, index=x)).T.to_numpy()[0]
Output:
[ 5. nan]

If performance is the main aim of this question, you can accelerate your looping code with the numba library and JIT compilation, which will be very fast:
x = np.random.permutation(2000)[:1100]
y = np.random.permutation(2000)[:1100]
q = np.random.permutation(3000)[:500]
print((q > 2000).sum())
import numpy as np
import numba as nb

@nb.njit
def numba_(x, y, q):
    out = []
    for i in range(len(q)):
        for j in range(len(x)):
            if q[i] == x[j]:
                out.append(y[j])
                break
        else:
            out.append(np.nan)
    return np.array(out)
or in parallel mode:
@nb.njit(parallel=True)
def numba_p(x, y, q):
    out = np.empty(q.shape[0])
    out.fill(np.nan)
    for i in nb.prange(q.shape[0]):
        for j in range(x.shape[0]):
            if q[i] == x[j]:
                out[i] = y[j]
                break
    return out
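As a quick sanity check on the small example from the question (assuming the functions above have been compiled):
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
q = np.array([2, 0])
print(numba_p(x, y, q))  # [ 5. nan]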
On large arrays it was much faster than not a robot's answer (np.where) and constantstranger's answer, and about the same as not a robot's answer (Pandas):
100 loops, best of 5: 4.4 ms per loop <-- not a robot (np.where)
100 loops, best of 5: 337 µs per loop <-- not a robot (Pandas)
100 loops, best of 5: 350 µs per loop <-- numba
100 loops, best of 5: 341 µs per loop <-- numba_p
100 loops, best of 5: 2.18 ms per loop <-- constantstranger (Pandas)
Note: np.where is expected to get much faster in an upcoming NumPy release, which may help not a robot's answer beat constantstranger's answer on larger arrays.
Update: not a robot's answer (Pandas) was the fastest in my new test on much larger arrays.

Run two nested for loops in parallel to create matrix

I've written a method that takes in an integer "n" and creates a square matrix where the values of each element are dictated by their respective i,j indices.
When I build a small matrix 30x30 it works just fine, but when I try to do something larger like 1000x1000 it takes very long. Is there any way that I can speed it up with multiprocessing?
def createMatrix(n):
    matrix = []
    for j in range(1,n+1):
        row = []
        for i in range(1,n+1):
            value = 1/(i+j-1)
            row.append(value)
        matrix.append(row)
    return np.array(matrix)
Parallelizing two computation-bound for loops in Python is not trivial because of the GIL. The good news is that your case is perfectly vectorizable:
def createMatrix(n):
    return 1 / (np.arange(n)[None, :] + np.arange(n)[:, None] + 1)
Explanation:
essentially, your formula for the matrix is X[row][column] = 1/(row+column-1), where rows and columns are 1-based
np.arange(n) creates a range that can be used for rows or columns
[None, :] and [:, None] turn it into a 2d array, 1 x n or n x 1
numpy then broadcasts dimensions, replicating row and column indexes to match dimensions - thus, implicitly tiling both into n x n when added
since both ranges are 0-based, +1 is used instead of -1
As a rule of thumb, it is almost never a good idea to use for loops on numpy arrays. A vectorized approach (i.e. matrix form computations) is orders of magnitude faster.
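To make the broadcasting step concrete, here is a tiny worked example for n = 3:
import numpy as np

n = 3
rows = np.arange(n)[:, None]  # shape (3, 1): [[0], [1], [2]]
cols = np.arange(n)[None, :]  # shape (1, 3): [[0, 1, 2]]
print(rows + cols + 1)
# [[1 2 3]
#  [2 3 4]
#  [3 4 5]]
print(1 / (rows + cols + 1))  # -> [[1, 1/2, 1/3], [1/2, 1/3, 1/4], [1/3, 1/4, 1/5]] as floats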
It's not a good idea to use for loops to fill a list and then convert it to a matrix; the operation you have can be vectorized with numpy from scratch. Note that, for indices i and j, M(i,j) = 1/(i+j-1), with both indices starting at 1.
Here's my proposal:
def createMatrix2(n):
    arr = np.arange(1,n+1)
    xx, yy = np.meshgrid(arr,arr)
    matrix = 1/(xx+yy-1)
    return matrix
Looking at Marat's answer, I think it is better than mine, so I tested the three methods:
EDIT: added wwii's method as createMatrix4 (correcting the errors):
import numpy as np
from time import time

def createMatrix1(n):
    matrix = []
    for j in range(1,n+1):
        row = []
        for i in range(1,n+1):
            value = 1/(i+j-1)
            row.append(value)
        matrix.append(row)
    return np.array(matrix)

def createMatrix2(n):
    arr = np.arange(1,n+1)
    xx, yy = np.meshgrid(arr,arr)
    matrix = 1/(xx+yy-1)
    return matrix

def createMatrix3(n):
    """Marat's proposed matrix"""
    return 1 / (1 + np.arange(n)[None, :] + np.arange(n)[:, None])

def createMatrix4(n):
    """wwii's method"""
    i, j = np.ogrid[1:n+1, 1:n+1]
    return 1/(i+j-1)

# test all four methods
n = 10000

t1 = time()
m1 = createMatrix1(n)
t2 = time()
m2 = createMatrix2(n)
t3 = time()
m3 = createMatrix3(n)
t4 = time()
m4 = createMatrix4(n)
t5 = time()

print(np.allclose(m1,m2))
print(np.allclose(m1,m3))
print(np.allclose(m1,m4))
print("Matrix 1 (OP): ",t2-t1)
print("Matrix 2: (mine)",t3-t2)
print("Matrix 3: (Marat)",t4-t3)
print("Matrix 4: (wwii)",t5-t4)
# the output is:
#True
#True
#True
#Matrix 1 (OP): 18.4886577129364
#Matrix 2: (mine) 1.005324363708496
#Matrix 3: (Marat) 0.43033909797668457
#Matrix 4: (wwii) 0.5138359069824219
So Marat's solution is faster. As general comments:
Try to avoid for loops.
Think of your problem as operations on indices, and design those operations with numpy arrays directly.
Lastly, compared with Marat's answer, I find my proposal a little easier to read and understand, but that is just a subjective view.
Your code can be written in another style and accelerated with the numba library in parallel nopython mode:
import numpy as np
import numba as nb

@nb.njit("float64[:, ::1](int64)", parallel=True, fastmath=True)
def createMatrix(n):
    matrix = np.empty((n, n))  # np.zeros is slower than np.empty
    for j in nb.prange(1, n + 1):
        for i in range(1, n + 1):
            matrix[j - 1, i - 1] = 1 / (i + j - 1)
    return matrix
This solution is about 3x faster than Marat's answer above.
Benchmarks: (temporary link to colab)
n = 1000
1000 loops, best of 5: 3.52 ms per loop # Marat
1000 loops, best of 5: 1.5 ms per loop # numba accelerated with np.zeros
1000 loops, best of 5: 1.05 ms per loop # numba accelerated with np.empty
n = 3000
1000 loops, best of 5: 39.5 ms per loop # Marat
1000 loops, best of 5: 19.3 ms per loop # numba accelerated with np.zeros
1000 loops, best of 5: 8.91 ms per loop # numba accelerated with np.empty
n = 5000
1000 loops, best of 5: 109 ms per loop  # Marat
1000 loops, best of 5: 53.5 ms per loop # numba accelerated with np.zeros
1000 loops, best of 5: 24.8 ms per loop # numba accelerated with np.empty

how to compare entries in numpy array with each other efficiently?

I have a numpy array embed_vec of length tot_vec in which each entry is a 3d vector:
[[ 0.52483319 0.78015841 0.71117216]
[ 0.53041481 0.79462171 0.67234534]
[ 0.53645428 0.80896727 0.63119403]
...,
[ 0.72283509 0.40070804 0.15220522]
[ 0.71277758 0.38498613 0.16141834]
[ 0.70221445 0.36918032 0.17370776]]
For each of the elements in this array, I want to find out the number of other entries which are "close" to that entry. By close, I mean that the distance between two vectors is less than a specified value R. For this, I must compare all the possible pairs in this array with each other and then find out the number of close vectors for each of the vectors in the array. So I am doing this:
p = np.zeros(tot_vec) # This contains the number of close vectors
for i in range(tot_vec-1):
    for j in range(i+1, tot_vec):
        if np.linalg.norm(embed_vec[i]-embed_vec[j]) < R:
            p[i] += 1
However, this is extremely inefficient because I have two nested python loops and for larger array sizes, this takes forever. If this were in C++ or Fortran, it wouldn't have been a great issue. My question is, can one achieve the same thing using numpy efficiently using some vectorization method? As a side note, I don't mind a solution using Pandas also.
Approach #1 : Vectorized approach -
def vectorized_app(embed_vec, R):
    tot_vec = embed_vec.shape[0]
    r,c = np.triu_indices(tot_vec,1)
    subs = embed_vec[r] - embed_vec[c]
    dists = np.einsum('ij,ij->i',subs,subs)
    return np.bincount(r,dists<R**2,minlength=tot_vec)
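The last line may look opaque: passing a boolean array as the weights makes np.bincount add up, for each row index in r, how many pairs passed the squared-distance test. A small illustration of the pattern, with made-up values:
import numpy as np

r = np.array([0, 0, 1, 2, 2, 2])                          # row index of each pair
close = np.array([True, False, True, True, True, False])  # did the pair pass the test?
print(np.bincount(r, close, minlength=4))                  # [1. 1. 2. 0.]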
Approach #2 : With less loop complexity (for very large arrays) -
def loopy_less_app(embed_vec, R):
    tot_vec = embed_vec.shape[0]
    Rsq = R**2
    out = np.zeros(tot_vec,dtype=int)
    for i in range(tot_vec):
        subs = embed_vec[i] - embed_vec[i+1:tot_vec]
        dists = np.einsum('ij,ij->i',subs,subs)
        out[i] = np.count_nonzero(dists < Rsq)
    return out
Benchmarking
Original approach -
def loopy_app(embed_vec, R):
    tot_vec = embed_vec.shape[0]
    p = np.zeros(tot_vec) # This contains the number of close vectors
    for i in range(tot_vec-1):
        for j in range(i+1, tot_vec):
            if np.linalg.norm(embed_vec[i]-embed_vec[j]) < R:
                p[i] += 1
    return p
Timings -
In [76]: # Sample random array
...: embed_vec = np.random.rand(3000,3)
...: R = 0.5
...:
In [77]: %timeit loopy_app(embed_vec, R)
1 loops, best of 3: 50.5 s per loop
In [78]: %timeit loopy_less_app(embed_vec, R)
10 loops, best of 3: 143 ms per loop
350x+ speedup there!
Going with much bigger array with the proposed loopy_less_app -
In [81]: # Sample random array
...: embed_vec = np.random.rand(20000,3)
...: R = 0.5
...:
In [82]: %timeit loopy_less_app(embed_vec, R)
1 loops, best of 3: 4.47 s per loop
I am intrigued by that question and attempted to solve it efficiently using scipy's cKDTree. However, this approach may run out of memory because internally a list of all pairs with distance <= R is maintained. If your R and tot_vec are small enough it will work:
import numpy as np
from scipy.spatial import cKDTree as KDTree

tot_vec = 60000
embed_vec = np.random.randn(tot_vec, 3)
R = 0.1

tree = KDTree(embed_vec, leafsize=100)
p = np.zeros(tot_vec)
for pair in tree.query_pairs(R):
    p[pair[0]] += 1
    p[pair[1]] += 1
In case memory is an issue, with some effort it is possible to rewrite query_pairs as a generator function in Python at the cost of C performance.
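If only the counts are needed, a simpler memory-friendly sketch (not the generator rewrite itself, just a per-point alternative, reusing the imports above) is to query each point separately with query_ball_point so the full pair list is never materialised; like the loop above, it counts each close pair from both endpoints:
def count_close(embed_vec, R, leafsize=100):
    tree = KDTree(embed_vec, leafsize=leafsize)
    p = np.zeros(len(embed_vec), dtype=int)
    for i, point in enumerate(embed_vec):
        # query_ball_point includes the point itself, hence the -1
        p[i] = len(tree.query_ball_point(point, R)) - 1
    return p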
First broadcast the difference:
disp_vecs = embed_vec[:,None,:] - embed_vec[None,:,:]
Now, depending on how big your dataset is, you may want to do a first pass without all the math. If the distance is less than r, all the components must be smaller than r in absolute value:
first_mask = np.max(np.abs(disp_vecs), axis=-1) < r
Then do the actual calculation:
disps = np.linalg.norm(disp_vecs[first_mask], axis=-1)
second_mask = disps < r
Now reassign:
disps = disps[second_mask]
first_mask[first_mask] = second_mask
disps now holds the good values, and first_mask is a boolean mask of where they go. You can process from there.
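If the end goal is just the per-vector counts from the question, one small way to finish from here (subtracting 1 for the trivial self-match on the diagonal, and counting each close pair from both endpoints):
p = first_mask.sum(axis=1) - 1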

xor matrix multiplication for AES mix column stage

Hi, I'm writing a program for the AES mix-columns stage. Here I have to multiply two matrices of shape (4,4). The only difference is that while multiplying the matrices I have to XOR the products instead of adding them, e.g.:
a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])
np.dot(a,b) # this gives [[(1*5+2*7),(1*6+2*8)][(3*5+4*7),(3*6+4*8)]]
# but I want [[((1*5)^(2*7)),((1*6)^(2*8))][((3*5)^(4*7)),((3*6)^(4*8))]]
Here's the solution with loops
result = [[0,0,0,0],
          [0,0,0,0],
          [0,0,0,0],
          [0,0,0,0]]
# iterate through rows of X
for i in range(len(X)):
    # iterate through columns of Y
    for j in range(len(Y[0])):
        # iterate through rows of Y
        for k in range(len(Y)):
            result[i][j] = result[i][j] ^ (X[i][k] * Y[k][j])
How to achieve that without using loops?
xor_ab=np.bitwise_xor.reduce(a[...,None]*b,axis=1)
For explanation, consider a rectangular problem for easier identification:
a=np.arange(12).reshape(4,3).astype(object)
b=np.arange(12).reshape(3,4).astype(object)
The object dtype is there to provide Python's arbitrary-precision ints for AES.
Products are obtained by broadcasting:
c=a[...,None]*b # dims : (4,3,1) * ((1),3,4) -> (4,3,4) , c_ijk =a_ij*b_jk
The dot product is then obtained by:
dot_ab=c.sum(axis=1) # ->(4,4)
In [734]: (dot_ab==a.dot(b)).all()
Out[734]: True
Then change to the equivalent xor function:
xor_ab=np.bitwise_xor.reduce(a[...,None]*b,axis=1)
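Applied to the 2x2 example from the question, a quick check:
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
xor_ab = np.bitwise_xor.reduce(a[..., None] * b, axis=1)
print(xor_ab)
# [[11 22]
#  [19 50]]
# i.e. [[(1*5)^(2*7), (1*6)^(2*8)], [(3*5)^(4*7), (3*6)^(4*8)]]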
As an alternative, you can compile your loops with numba (0.23):
import numpy as np
from numba import jit

@jit(nopython=True)
def xor(X,Y):
    result = np.zeros((4,4),np.uint64)
    for i in range(len(X)):
        # iterate through columns of Y
        for j in range(Y.shape[1]):
            # iterate through rows of Y
            for k in range(len(Y)):
                result[i,j] = result[i,j] ^ (X[i,k] * Y[k,j])
    return result
for an impressive efficiency gain, due to optimal memory usage.
But you are limited to 32 bits for a and b:
In [790]: %timeit xor(a,b)
1000000 loops, best of 3: 580 ns per loop
In [791]: %timeit xor_ab=np.bitwise_xor.reduce(a[...,None]*b,axis=1)
100000 loops, best of 3: 13.2 µs per loop
In [792]: (xor(a,b)==np.bitwise_xor.reduce(a[...,None]*b,axis=1)).all()
Out[792]: True

Python: efficient way to match 2 different length arrays and find index in larger array

I have 2 arrays: x and bigx. They span the same range, but bigx has many more points.
e.g.
x = np.linspace(0,10,100)
bigx = np.linspace(0,10,1000)
I want to find the indices in bigx where x and bigx match to 2 significant figures. I need to do this extremely quickly as I need the indices for each step of an integral.
Using numpy.where is very slow:
index_bigx = [np.where(np.around(bigx,2) == i) for i in np.around(x,2)]
Using numpy.in1d is ~30x faster
index_bigx = np.where(np.in1d(np.around(bigx,2), np.around(x,2)))
I also tried using zip and enumerate as I know that's supposed to be faster, but it returns empty:
>>> index_bigx = [i for i,(v,myv) in enumerate(zip(np.around(bigx,2), np.around(x,2))) if myv == v]
>>> print index_bigx
[]
I think I must have muddled things here and I want to optimise it as much as possible. Any suggestions?
Since bigx is always evenly spaced, it's quite straightforward to just directly compute the indices:
start = bigx[0]
step = bigx[1] - bigx[0]
indices = ((x - start)/step).round().astype(int)
Linear time, no searching necessary.
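On the arrays from the question this can be checked directly; each computed index should point to the bigx entry nearest the corresponding x value (a quick check, assuming bigx is evenly spaced as stated):
import numpy as np

x = np.linspace(0, 10, 100)
bigx = np.linspace(0, 10, 1000)
start = bigx[0]
step = bigx[1] - bigx[0]
indices = ((x - start)/step).round().astype(int)
print(np.abs(bigx[indices] - x).max() <= step/2)  # True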
Since we are mapping x to bigx, whose elements are equidistant, you can use a binning operation with np.searchsorted to simulate the index finding operation using its 'left' option. Here's the implementation -
out = np.searchsorted(np.around(bigx,2), np.around(x,2),side='left')
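For readers unfamiliar with it, np.searchsorted with side='left' returns, for each query value, the index of the first element of the sorted array that is >= that value; a tiny illustration:
import numpy as np

grid = np.array([0.0, 0.1, 0.2, 0.3])
print(np.searchsorted(grid, [0.2, 0.05], side='left'))  # [2 1]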
Runtime tests
In [879]: import numpy as np
...:
...: xlen = 10000
...: bigxlen = 70000
...: bigx = 100*np.linspace(0,1,bigxlen)
...: x = bigx[np.random.permutation(bigxlen)[:xlen]]
...:
In [880]: %timeit np.where(np.in1d(np.around(bigx,2), np.around(x,2)))
...: %timeit np.searchsorted(np.around(bigx,2), np.around(x,2),side='left')
...:
100 loops, best of 3: 4.1 ms per loop
1000 loops, best of 3: 1.81 ms per loop
If you want just the elements, this should work:
np.intersect1d(np.around(bigx,2), np.around(x,2))
If you want the indices, try this:
around_x = set(np.around(x,2))
index_bigx = [i for i,b in enumerate(np.around(bigx,2)) if b in around_x]
Note: these were not tested.

Filling a list faster

I have a small block of code which I use to fill a list with integers. I need to improve its performance, perhaps translating the whole thing into numpy arrays, but I'm not sure how.
Here's the MWE:
import numpy as np

# List filled with integers.
a = np.random.randint(0,100,1000)
N = 10

b = [[] for _ in range(N-1)]
for indx, integ in enumerate(a):
    if 0 < integ < N:
        b[integ-1].append(indx)
This is what it does:
for every integer (integ) in a
see if it falls within a given range (0, N)
if it is, store its index in a sub-list of b where the index of said sub-list is the original integer minus 1 (integ-1)
This bit of code runs pretty fast but my actual code uses much larger lists, hence the need to improve its performance.
Here's one way of doing it:
mask = (a > 0) & (a < N)
elements = a[mask]
indicies = np.arange(a.size)[mask]
b = [indicies[elements == i] for i in range(1, N)]
If we time the two:
import numpy as np

a = np.random.randint(0,100,1000)
N = 10

def original(a, N):
    b = [[] for _ in range(N-1)]
    for indx, elem in enumerate(a):
        if 0 < elem < N:
            b[elem-1].append(indx)
    return b

def new(a, N):
    mask = (a > 0) & (a < N)
    elements = a[mask]
    indicies = np.arange(a.size)[mask]
    return [indicies[elements == i] for i in range(1, N)]
The "new" way is considerably (~20x) faster:
In [5]: %timeit original(a, N)
100 loops, best of 3: 1.21 ms per loop
In [6]: %timeit new(a, N)
10000 loops, best of 3: 57 us per loop
And the results are identical:
In [7]: new_results = new(a, N)
In [8]: old_results = original(a, N)
In [9]: for x, y in zip(new_results, old_results):
....: assert np.allclose(x, y)
....:
In [10]:
The "new" vectorized version also scales much better to longer sequences. If we use a million-item-long sequence for a, the original solution takes slightly over 1 second, while the new version takes only 17 milliseconds (a ~70x speedup).
Try this solution! The first half I shamelessly stole from Joe's answer, but after that it uses sorting and binary search, which scales better with N.
def new(a, N):
    mask = (a > 0) & (a < N)
    elements = a[mask]
    indices = np.arange(a.size)[mask]
    sorting_idx = np.argsort(elements, kind='mergesort')
    ind_sorted = indices[sorting_idx]
    x = np.searchsorted(elements, range(N), side='right', sorter=sorting_idx)
    return [ind_sorted[x[i]:x[i+1]] for i in range(N-1)]
You could put x = x.tolist() in there for an additional albeit small speed improvement (NB: if you do an a = a.tolist() in your original code, you do get a significant speedup). Also, I used 'mergesort' which is a stable sort but if you don't need the final result sorted, you can get away with a faster sorting algorithm.
