I have a simple two-line block of code that adds values to an array according to the closest elements found in another array. Since it is buried deep inside an MCMC it is executed millions of times, and I need it to be as efficient as possible.
The code below works and is pretty self-explanatory. Basically: the array arr2[0] (the one used to find the closest elements in arr0) contains values in the range (10., 25.). Currently I look for the absolute closest element in arr0 for each element in arr2[0] using np.searchsorted(), taking advantage of the fact that arr0 is already sorted.
I would be willing to trade some accuracy for better performance. That is, I could live with an index that points to a "close" element within a tolerance of, say, ±0.2, instead of the absolute closest element (which is what I do now).
Can this be done? More importantly: can this be done and improve the performance of the code?
import numpy as np
# Random initial data with the actual shapes used by my code.
Nmax = 1000000
arr0 = np.linspace(5., 30., Nmax)
D = np.random.randint(2, 4)
arr1 = np.random.uniform(-3., 3., (D, Nmax))
arr2 = np.random.uniform(10., 25., (10, 1500))
# Can these two lines be made faster?
# Indexes of elements in 'arr0' closest to the elements in 'arr2[0]'
closest_idxs = np.searchsorted(arr0, arr2[0])
# Add elements from 'arr1' to the first dimensions of 'arr2', according
# to the indexes found above.
arr_final = arr2[:arr1.shape[0]] + arr1[:, closest_idxs]
For approximate matching with a given tolerance value, we can use the tolerance to subsample the first argument to searchsorted and hence search a much smaller array, like so -
tol = 0.2 # tolerance value
s = int(np.round(tol/(arr0[1]-arr0[0])))
i = np.searchsorted(arr0[::s], arr2[0])
i -= (arr0[i*s]-arr2[0])>tol/2
closest_idxs_out = i*s
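A quick sanity check on the approximation (a minimal sketch reusing the arrays defined above; the name arr_final_approx is just illustrative) is to confirm the approximately matched values stay within the requested tolerance before trusting the indices:
# Maximum deviation between the approximately matched grid values and the targets;
# with the correction step above it should stay within roughly the tolerance.
print(np.abs(arr0[closest_idxs_out] - arr2[0]).max())
# The approximate indices are then used exactly like the exact ones.
arr_final_approx = arr2[:arr1.shape[0]] + arr1[:, closest_idxs_out]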
Timings on given setup -
In [123]: %%timeit
...: closest_idxs = np.searchsorted(arr0, arr2[0])
...: arr_final = arr2[:arr1.shape[0]] + arr1[:, closest_idxs]
1000 loops, best of 3: 641 µs per loop
In [125]: %%timeit
...: tol = 0.2 # tolerance value
...: s = int(np.round(tol/(arr0[1]-arr0[0])))
...: i = np.searchsorted(arr0[::s], arr2[0])
...: i -= (arr0[i*s]-arr2[0])>tol/2
...: closest_idxs_out = i*s
10000 loops, best of 3: 63.2 µs per loop
Related
I have a list of complex numbers for which I want to find the closest value in another list of complex numbers.
My current approach with numpy:
import numpy as np
refArray = np.random.random(16)
myArray = np.random.random(1000)

def find_nearest(array, value):
    idx = (np.abs(array - value)).argmin()
    return idx

for value in np.nditer(myArray):
    index = find_nearest(refArray, value)
    print(index)
Unfortunately, this takes ages for a large number of values.
Is there a faster or more "pythonian" way of matching each value in myArray to the closest value in refArray?
FYI: I don't necessarily need numpy in my script.
Important: the order of both myArray as well as refArray is important and should not be changed. If sorting is to be applied, the original index should be retained in some way.
Here's one vectorized approach with np.searchsorted based on this post -
def closest_argmin(A, B):
    L = B.size
    sidx_B = B.argsort()
    sorted_B = B[sidx_B]
    sorted_idx = np.searchsorted(sorted_B, A)
    sorted_idx[sorted_idx == L] = L - 1
    mask = (sorted_idx > 0) & \
           (np.abs(A - sorted_B[sorted_idx-1]) < np.abs(A - sorted_B[sorted_idx]))
    return sidx_B[sorted_idx - mask]
Brief explanation:
Get the left insertion positions with np.searchsorted(sorted_B, A, side='left'), or simply np.searchsorted(sorted_B, A). Since searchsorted expects a sorted array as its first input, we need some preparatory work there: sort B and keep the sorting indices.
Compare the values at those insertion positions with the values immediately to their left (position - 1) and see which are closer. We do this in the step that computes mask.
Based on whether the left neighbours or the insertion positions are closer, choose the respective indices. This is done by subtracting mask from the insertion indices, with the boolean values acting as 0/1 offsets; a small worked example follows.
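A tiny worked example (a sketch with made-up numbers) showing how the left/right choice plays out:
import numpy as np

A = np.array([0.1, 0.45, 0.9])
B = np.array([1.0, 0.0, 0.5])   # deliberately unsorted

print(closest_argmin(A, B))     # [1 2 0]
# 0.1 is closest to B[1] = 0.0, 0.45 to B[2] = 0.5 and 0.9 to B[0] = 1.0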
Benchmarking
Original approach -
def org_app(myArray, refArray):
    out1 = np.empty(myArray.size, dtype=int)
    for i, value in enumerate(myArray):
        # find_nearest from the posted question
        index = find_nearest(refArray, value)
        out1[i] = index
    return out1
Timings and verification -
In [188]: refArray = np.random.random(16)
...: myArray = np.random.random(1000)
...:
In [189]: %timeit org_app(myArray, refArray)
100 loops, best of 3: 1.95 ms per loop
In [190]: %timeit closest_argmin(myArray, refArray)
10000 loops, best of 3: 36.6 µs per loop
In [191]: np.allclose(closest_argmin(myArray, refArray), org_app(myArray, refArray))
Out[191]: True
50x+ speedup for the posted sample and hopefully more for larger datasets!
An answer that is much shorter than @Divakar's, using broadcasting, and even slightly faster:
abs(myArray[:, None] - refArray[None, :]).argmin(axis=-1)
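Note that this builds the full len(myArray) x len(refArray) difference matrix in memory, which is perfectly fine for a refArray of 16 elements as in the question. A minimal equivalence check against closest_argmin from the previous answer (a sketch):
import numpy as np

refArray = np.random.random(16)
myArray = np.random.random(1000)

idx_bcast = np.abs(myArray[:, None] - refArray[None, :]).argmin(axis=-1)
assert np.array_equal(idx_bcast, closest_argmin(myArray, refArray))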
I have a numpy array embed_vec of length tot_vec in which each entry is a 3d vector:
[[ 0.52483319 0.78015841 0.71117216]
[ 0.53041481 0.79462171 0.67234534]
[ 0.53645428 0.80896727 0.63119403]
...,
[ 0.72283509 0.40070804 0.15220522]
[ 0.71277758 0.38498613 0.16141834]
[ 0.70221445 0.36918032 0.17370776]]
For each of the elements in this array, I want to find out the number of other entries which are "close" to that entry. By close, I mean that the distance between two vectors is less than a specified value R. For this, I must compare all the possible pairs in this array with each other and then find out the number of close vectors for each of the vectors in the array. So I am doing this:
p = np.zeros(tot_vec)  # This contains the number of close vectors
for i in range(tot_vec-1):
    for j in range(i+1, tot_vec):
        if np.linalg.norm(embed_vec[i] - embed_vec[j]) < R:
            p[i] += 1
However, this is extremely inefficient because I have two nested python loops and for larger array sizes, this takes forever. If this were in C++ or Fortran, it wouldn't have been a great issue. My question is, can one achieve the same thing using numpy efficiently using some vectorization method? As a side note, I don't mind a solution using Pandas also.
Approach #1: Vectorized approach -
def vectorized_app(embed_vec, R):
    tot_vec = embed_vec.shape[0]
    r, c = np.triu_indices(tot_vec, 1)
    subs = embed_vec[r] - embed_vec[c]
    dists = np.einsum('ij,ij->i', subs, subs)
    return np.bincount(r, dists < R**2, minlength=tot_vec)
Approach #2: With less loop complexity (for very large arrays) -
def loopy_less_app(embed_vec, R):
    tot_vec = embed_vec.shape[0]
    Rsq = R**2
    out = np.zeros(tot_vec, dtype=int)
    for i in range(tot_vec):
        subs = embed_vec[i] - embed_vec[i+1:tot_vec]
        dists = np.einsum('ij,ij->i', subs, subs)
        out[i] = np.count_nonzero(dists < Rsq)
    return out
Benchmarking
Original approach -
def loopy_app(embed_vec, R):
    tot_vec = embed_vec.shape[0]
    p = np.zeros(tot_vec)  # This contains the number of close vectors
    for i in range(tot_vec-1):
        for j in range(i+1, tot_vec):
            if np.linalg.norm(embed_vec[i] - embed_vec[j]) < R:
                p[i] += 1
    return p
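Before timing, both proposed approaches can be sanity-checked against the question's double loop on a small sample (a minimal sketch; the array size here is arbitrary):
np.random.seed(0)
embed_vec_small = np.random.rand(400, 3)
R = 0.3

ref = loopy_app(embed_vec_small, R)   # the question's double loop from above
assert np.allclose(vectorized_app(embed_vec_small, R), ref)
assert np.allclose(loopy_less_app(embed_vec_small, R), ref)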
Timings -
In [76]: # Sample random array
...: embed_vec = np.random.rand(3000,3)
...: R = 0.5
...:
In [77]: %timeit loopy_app(embed_vec, R)
1 loops, best of 3: 50.5 s per loop
In [78]: %timeit loopy_less_app(embed_vec, R)
10 loops, best of 3: 143 ms per loop
350x+ speedup there!
Going with a much bigger array with the proposed loopy_less_app -
In [81]: # Sample random array
...: embed_vec = np.random.rand(20000,3)
...: R = 0.5
...:
In [82]: %timeit loopy_less_app(embed_vec, R)
1 loops, best of 3: 4.47 s per loop
I am intrigued by this question and attempted to solve it efficiently using scipy's cKDTree. However, this approach may run out of memory because internally a list of all pairs with distance <= R is maintained. If your R and tot_vec are small enough it will work:
import numpy as np
from scipy.spatial import cKDTree as KDTree
tot_vec = 60000
embed_vec = np.random.randn(tot_vec, 3)
R = 0.1
tree = KDTree(embed_vec, leafsize=100)
p = np.zeros(tot_vec)
for pair in tree.query_pairs(R):
    p[pair[0]] += 1
    p[pair[1]] += 1
In case memory is an issue, with some effort it is possible to rewrite query_pairs as a generator function in Python at the cost of C performance.
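If memory is the main concern, a hedged alternative (assuming SciPy >= 1.3, where cKDTree.query_ball_point accepts a return_length flag) is to ask the tree only for neighbour counts instead of the full pair list. Note that, like the loop above, this counts neighbours in both directions, and the count includes the point itself, hence the -1:
import numpy as np
from scipy.spatial import cKDTree as KDTree

tot_vec = 60000
embed_vec = np.random.randn(tot_vec, 3)
R = 0.1

tree = KDTree(embed_vec, leafsize=100)
# Number of points within R of each point, minus the point itself;
# no explicit pair list is materialised.
p_sym = tree.query_ball_point(embed_vec, R, return_length=True) - 1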
First broadcast the differences:
disp_vecs = embed_vec[:, None, :] - embed_vec[None, :, :]
Now, depending on how big your dataset is, you may want to do a first pass without all the math. If the distance is less than R, the absolute value of every component must also be less than R:
first_mask = np.max(np.abs(disp_vecs), axis=-1) < R
Then do the actual calculation
disps = np.linalg.norm(disp_vecs[first_mask], axis=-1)
second_mask = disps < R
Now reassign
disps = disps[second_mask]
first_mask[first_mask] = second_mask
disps are now the good values, and first_mask is a boolean mask of where they go. You can process from there.
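For example, to recover the same per-point counts as the question's loop (a sketch; the question only counts pairs with j > i, so take the upper triangle and skip the diagonal):
# first_mask is now True wherever dist(i, j) < R, including the diagonal (i == i).
p = np.triu(first_mask, k=1).sum(axis=1)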
Assume that I have two arrays A and B, where both A and B are m x n. My goal is now, for each row of A and B, to find where I should insert the elements of row i of A in the corresponding row of B. That is, I wish to apply np.digitize or np.searchsorted to each row of A and B.
My naive solution is to simply iterate over the rows. However, this is far too slow for my application. My question is therefore: is there a vectorized implementation of either algorithm that I haven't managed to find?
We can add to each row an offset that grows with the row index, using the same offsets for both arrays. The idea is then to use np.searchsorted on the flattened versions of the input arrays, so that each row from b is restricted to finding sorted positions in the corresponding row of a. Additionally, to make it work for negative numbers too, we just need to offset by the minimum values as well.
So, we would have a vectorized implementation like so -
def searchsorted2d(a, b):
    m, n = a.shape
    max_num = np.maximum(a.max() - a.min(), b.max() - b.min()) + 1
    r = max_num*np.arange(a.shape[0])[:,None]
    p = np.searchsorted((a+r).ravel(), (b+r).ravel()).reshape(m, -1)
    return p - n*(np.arange(m)[:,None])
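As a tiny worked example of the offset trick (a sketch with made-up rows):
a = np.array([[1, 3, 5],
              [2, 4, 6]])
b = np.array([[2, 2, 7],
              [1, 5, 5]])

print(searchsorted2d(a, b))
# [[1 1 3]
#  [0 2 2]]
# identical to running np.searchsorted(a[i], b[i]) row by row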
Runtime test -
In [173]: def searchsorted2d_loopy(a,b):
     ...:     out = np.zeros(a.shape,dtype=int)
     ...:     for i in range(len(a)):
     ...:         out[i] = np.searchsorted(a[i],b[i])
     ...:     return out
     ...:
In [174]: # Setup input arrays
...: a = np.random.randint(11,99,(10000,20))
...: b = np.random.randint(11,99,(10000,20))
...: a = np.sort(a,1)
...: b = np.sort(b,1)
...:
In [175]: np.allclose(searchsorted2d(a,b),searchsorted2d_loopy(a,b))
Out[175]: True
In [176]: %timeit searchsorted2d_loopy(a,b)
10 loops, best of 3: 28.6 ms per loop
In [177]: %timeit searchsorted2d(a,b)
100 loops, best of 3: 13.7 ms per loop
The solution provided by @Divakar is ideal for integer data, but beware of precision issues for floating-point values, especially if they span multiple orders of magnitude (e.g. [[1.0, 2.0, 3.0, 1.0e+20], ...]). In some cases r may be so large that applying a+r and b+r wipes out the original values you're trying to run searchsorted on, and you end up just comparing r to r.
To make the approach more robust for floating-point data, you could embed the row information into the arrays as part of the values (as a structured dtype) and run searchsorted on these structured arrays instead.
def searchsorted_2d(a, v, side='left', sorter=None):
    import numpy as np
    # Make sure a and v are numpy arrays.
    a = np.asarray(a)
    v = np.asarray(v)
    # Augment a with row id
    ai = np.empty(a.shape, dtype=[('row', int), ('value', a.dtype)])
    ai['row'] = np.arange(a.shape[0]).reshape(-1, 1)
    ai['value'] = a
    # Augment v with row id
    vi = np.empty(v.shape, dtype=[('row', int), ('value', v.dtype)])
    vi['row'] = np.arange(v.shape[0]).reshape(-1, 1)
    vi['value'] = v
    # Perform searchsorted on the augmented arrays.
    # The row information is embedded in the values, so only the equivalent rows
    # between a and v are considered.
    result = np.searchsorted(ai.flatten(), vi.flatten(), side=side, sorter=sorter)
    # Restore the original shape and decode the searchsorted indices so they
    # apply to the original data.
    result = result.reshape(vi.shape) - vi['row']*a.shape[1]
    return result
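This works because NumPy compares structured values field by field in dtype order, so 'row' dominates 'value' and entries from different rows never interleave. A minimal sketch of that ordering:
import numpy as np

x = np.array([(1, 0.5), (0, 9.0), (1, 0.1)],
             dtype=[('row', int), ('value', float)])
print(np.sort(x))   # ordered by 'row' first, then 'value':
# [(0, 9. ) (1, 0.1) (1, 0.5)]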
Edit: The timing on this approach is abysmal!
In [21]: %timeit searchsorted_2d(a,b)
10 loops, best of 3: 92.5 ms per loop
You would be better off just using map over the arrays:
In [22]: %timeit np.array(list(map(np.searchsorted,a,b)))
100 loops, best of 3: 13.8 ms per loop
For integer data, @Divakar's approach is still the fastest:
In [23]: %timeit searchsorted2d(a,b)
100 loops, best of 3: 7.26 ms per loop
I need to generate a 1D array where repeated sequences of integers are separated by a random number of zeros.
So far I am using the following code for this:
from random import normalvariate
regular_sequence = np.array([1,2,3,4,5], dtype=np.int)
n_iter = 10
lag_mean = 10 # mean length of zeros sequence
lag_sd = 1 # standard deviation of zeros sequence length
# Sequence of lags lengths
lag_seq = [int(round(normalvariate(lag_mean, lag_sd))) for x in range(n_iter)]
# Generate list of concatenated zeros and regular sequences
seq = [np.concatenate((np.zeros(x, dtype=np.int), regular_sequence)) for x in lag_seq]
seq = np.concatenate(seq)
It works but looks very slow when I need a lot of long sequences. So, how can I optimize it?
You can pre-compute indices where repeated regular_sequence elements are to be put and then set those with regular_sequence in a vectorized manner. For pre-computing those indices, one can use np.cumsum to get the start of each such chunk of regular_sequence and then add a continuous set of integers extending to the size of regular_sequence to get all indices that are to be updated. Thus, the implementation would look something like this -
# Size of regular_sequence
N = regular_sequence.size
# Use cumsum to pre-compute the start of every occurrence of regular_sequence
offset_arr = np.cumsum(lag_seq)
idx = np.arange(offset_arr.size)*N + offset_arr
# Setup output array
out = np.zeros(idx.max() + N,dtype=regular_sequence.dtype)
# Broadcast the start indices to include entire length of regular_sequence
# to get all positions where regular_sequence elements are to be set
np.put(out,idx[:,None] + np.arange(N),regular_sequence)
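A small concrete illustration of the index arithmetic (a sketch with a hypothetical lag_seq):
import numpy as np

regular_sequence = np.array([1, 2, 3, 4, 5])
lag_seq = [2, 3, 1]

N = regular_sequence.size
offset_arr = np.cumsum(lag_seq)                    # [2 5 6]
idx = np.arange(offset_arr.size)*N + offset_arr    # [2 10 16]
# i.e. the three copies of regular_sequence start at positions 2, 10 and 16:
# [0 0 1 2 3 4 5 0 0 0 1 2 3 4 5 0 1 2 3 4 5]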
Runtime tests -
def original_app(lag_seq, regular_sequence):
    seq = [np.concatenate((np.zeros(x, dtype=np.int), regular_sequence)) for x in lag_seq]
    return np.concatenate(seq)

def vectorized_app(lag_seq, regular_sequence):
    N = regular_sequence.size
    offset_arr = np.cumsum(lag_seq)
    idx = np.arange(offset_arr.size)*N + offset_arr
    out = np.zeros(idx.max() + N, dtype=regular_sequence.dtype)
    np.put(out, idx[:,None] + np.arange(N), regular_sequence)
    return out
In [64]: # Setup inputs
...: regular_sequence = np.array([1,2,3,4,5], dtype=np.int)
...: n_iter = 1000
...: lag_mean = 10 # mean length of zeros sequence
...: lag_sd = 1 # standard deviation of zeros sequence length
...:
...: # Sequence of lags lengths
...: lag_seq = [int(round(normalvariate(lag_mean, lag_sd))) for x in range(n_iter)]
...:
In [65]: out1 = original_app(lag_seq, regular_sequence)
In [66]: out2 = vectorized_app(lag_seq, regular_sequence)
In [67]: %timeit original_app(lag_seq, regular_sequence)
100 loops, best of 3: 4.28 ms per loop
In [68]: %timeit vectorized_app(lag_seq, regular_sequence)
1000 loops, best of 3: 294 µs per loop
The best approach, I think, would be to use convolution. You can figure out the lag lengths, combine that with the length of the sequence, and use that to figure out the starting point of each regular sequence. Mark those starting points with ones in an array of zeros, then convolve with your regular sequence to fill in the values.
import numpy as np
regular_sequence = np.array([1,2,3,4,5], dtype=np.int)
n_iter = 10000000
lag_mean = 10 # mean length of zeros sequence
lag_sd = 1 # standard deviation of zeros sequence length
# Sequence of lags lengths
lag_lens = np.round(np.random.normal(lag_mean, lag_sd, n_iter)).astype(np.int)
lag_lens[1:] += len(regular_sequence)
starts_inds = lag_lens.cumsum()-1
# Generate list of convolved ones and regular sequences
seq = np.zeros(lag_lens.sum(), dtype=np.int)
seq[starts_inds] = 1
seq = np.convolve(seq, regular_sequence)
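The reason this works is that convolving a 0/1 indicator vector with the sequence simply 'stamps' the sequence at every marked start position (a minimal sketch):
import numpy as np

starts = np.array([0, 0, 1, 0, 0, 0, 0, 1, 0])
print(np.convolve(starts, np.array([1, 2, 3])))
# [0 0 1 2 3 0 0 1 2 3 0]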
This approach takes something like 1/20th the time on large sequences, even after changing your version to use the numpy random number generator.
Not a trivial problem, because the data is misaligned. Performance depends on what counts as a long sequence. Take the example of a "square" problem: a lot of long regular sequences and zero runs (n_iter == n_reg == lag_mean):
import numpy as np
n_iter = 1000
n_reg = 1000
regular_sequence = np.arange(n_reg, dtype=np.int)
lag_mean = n_reg # mean length of zeros sequence
lag_sd = lag_mean/10 # standard deviation of zeros sequence length
lag_seq=np.int64(np.random.normal(lag_mean,lag_sd,n_iter)) # Sequence of lags lengths
First your solution:
def seq_hybrid():
    seqs = [np.concatenate((np.zeros(x, dtype=np.int), regular_sequence)) for x in lag_seq]
    seq = np.concatenate(seqs)
    return seq
Then a pure numpy one:
def seq_numpy():
    seq = np.zeros(lag_seq.sum()+n_iter*n_reg, dtype=int)
    cs = np.cumsum(lag_seq+n_reg)-n_reg
    indexes = np.add.outer(cs, np.arange(n_reg))
    seq[indexes] = regular_sequence
    return seq
A for-loop solution:
def seq_python():
    seq = np.empty(lag_seq.sum()+n_iter*n_reg, dtype=int)
    i = 0
    for lag in lag_seq:
        for k in range(lag):
            seq[i] = 0
            i += 1
        for k in range(n_reg):
            seq[i] = regular_sequence[k]
            i += 1
    return seq
And a just-in-time compiled version with numba:
from numba import jit
seq_numba=jit(seq_python)
Tests now:
In [96]: %timeit seq_hybrid()
10 loops, best of 3: 38.5 ms per loop
In [97]: %timeit seq_numpy()
10 loops, best of 3: 34.4 ms per loop
In [98]: %timeit seq_python()
1 loops, best of 3: 1.56 s per loop
In [99]: %timeit seq_numba()
100 loops, best of 3: 12.9 ms per loop
Your hybrid solution is about as fast as the pure numpy one in this case, because
the performance depends essentially on the inner loop, and yours (zeros and concatenate) is already a numpy one. Predictably, the Python solution is slower, by the traditional factor of about 40x. But numpy is not optimal here, because it has to use fancy indexing to cope with the misaligned data. This is where numba can help: the minimal operations are done at C level, for a 120x gain over the Python solution this time.
For other values of n_iter, n_reg the speedup factors compared to the Python solution are:
n_iter= 1000, n_reg= 1000 : seq_numba 124, seq_hybrid 49, seq_numpy 44.
n_iter= 10, n_reg= 100000 : seq_numba 123, seq_hybrid 104, seq_numpy 49.
n_iter= 100000, n_reg= 10 : seq_numba 127, seq_hybrid 1, seq_numpy 42.
I thought an answer posted on this question had a good approach using a binary mask and np.convolve, but that answer got deleted and I don't know why. Here it is with two concerns addressed.
def insert_sequence(lag_seq, regular_sequence):
    offsets = np.cumsum(lag_seq)
    start_locs = np.zeros(offsets[-1] + 1, dtype=regular_sequence.dtype)
    start_locs[offsets] = 1
    return np.convolve(start_locs, regular_sequence)
lag_seq = np.random.normal(15,1,10)
lag_seq = lag_seq.astype(np.uint8)
regular_sequence = np.arange(1, 6)
seq = insert_sequence(lag_seq, regular_sequence)
print(repr(seq))