Speed up minimum search in NumPy/Python

I have two floating-point arrays and want to find data points that match within a certain range.
This is what I have so far:
import numpy as np

for vx in range(len(arr1)):
    match = (np.abs(arr2 - arr1[vx])).argmin()
    if abs(arr1[vx] - arr2[match]) < 0.375:
        point = arr2[match]
The problem is that arr1 contains 150000 elements and arr2 around 110000 elements. This takes an awfully long time. Do you have suggestions to speed things up?

In addition to not being vectorized, your current search is O(n * m), where n is the size of arr2 and m is the size of arr1. In these kinds of searches it helps to sort arr1 or arr2 so you can use a binary search. Sorting ends up being the slowest step, but it's still faster if m is large, because the O(n log n) sort is cheaper than the O(n * m) scan.
Here is how you can do the search in a vectorized way using the sorted array:
def find_closest(A, target):
    # A must be sorted
    idx = A.searchsorted(target)
    idx = np.clip(idx, 1, len(A) - 1)
    left = A[idx - 1]
    right = A[idx]
    idx -= target - left < right - target
    return A[idx]

arr2.sort()
closest = find_closest(arr2, arr1)
closest = np.where(abs(closest - arr1) < .375, closest, np.nan)
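As a quick sanity check, here is a small worked example (the tiny arr1 and arr2 below are just stand-ins for your real data):

import numpy as np

arr1 = np.array([1.0, 2.0, 9.0])
arr2 = np.array([1.2, 2.5, 3.0])

arr2.sort()
closest = find_closest(arr2, arr1)                               # array([1.2, 2.5, 3. ])
closest = np.where(abs(closest - arr1) < .375, closest, np.nan)  # array([1.2, nan, nan])

Only the first point lies within 0.375 of a value in arr2, so the other two entries are masked with NaN.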

The whole idea of using numpy is to avoid computation with loops.
Specifying criteria to extract a new array that satisfies them can be implemented easily with array computation. Here's an example extracting the values from array a whose absolute difference from the corresponding element in array b is less than 0.75:
import numpy as np
a = np.array([1, 0, 0.5, 1.2])
b = np.array([1.2, 1.1, 1.3, 1.4])
c = a[abs(a - b) < 0.75]
Which gives us
array([ 1. , 1.2])


Tuple-like (lexicographical) max in numpy

I find myself running into the following situation in numpy multiple times over the past couple of months, and I cannot imagine there isn't a proper solution for it.
I have a 2d array, let's say
x = np.array([
    [1, 2, 3],
    [2, -5, .333],
    [1, 4, 2],
    [2, -5, 4]])
Now I would like to (sort/get the maximum/do argsort/argmax/ etc) this array in such a way that it first compares the first column. If the first column is equal, it compares the second column, and then the third. So this means for our example:
# max like python: max(x.tolist())
np.tuple_like_amax(x) = np.array([2, -5, 4])
# argmax doesn't have a python equivalent, but something like: [i for i, e in enumerate(x.tolist()) if e == max(x.tolist())][0]
np.tuple_like_argmax = 3
# sorting like python: sorted(x.tolist())
np.tuple_like_sort(x) = np.array([[1.0, 2.0, 3.0], [1.0, 4.0, 2.0], [2.0, -5.0, 0.333], [2.0, -5.0, 4.0]])
# argsort doesn't have python equivalent, but something like: sorted(range(len(x)), key=lambda i: x[i].tolist())
np.tuple_like_argsort(x) = np.array([0, 2, 1, 3])
This is exactly how Python compares tuples (so actually just calling max(x.tolist()) does the trick here for max). It does feel, however, like a waste of time and memory to first convert the array to a Python list, and in addition I would like to use things like argmax, sort and all the other great numpy functions.
So just to be clear, I'm not interested in Python code that mimics an argmax, but in something that achieves this without converting the array to a Python list.
Found so far:
np.sort seems to work on structured arrays when order= is given. It does feel to me that creating a structured array and then using this method is overkill. Also, argmax doesn't seem to support this, meaning that one would have to use argsort, which has a much higher complexity.
Here I will focus only on finding the lexicographic argmax (the others: max, argmin, and min can be found trivially from argmax). In addition, unlike np.argmax(), we will return all rows that are at rank 0 (if there are duplicate rows), i.e. all the indices where the row is the lexicographic maximum.
The idea is that, for the "tuple-like order" desired here, the function is really:
find all indices where the first column has the maximum;
break ties with the places where the second column is max, under the condition that the first column is max;
etc., as long as there are ties to break (and more columns).
def ixmax(x, k=0, idx=None):
    col = x[idx, k] if idx is not None else x[:, k]
    z = np.where(col == col.max())[0]
    return z if idx is None else idx[z]

def lexargmax(x):
    idx = None
    for k in range(x.shape[1]):
        idx = ixmax(x, k, idx)
        if len(idx) < 2:
            break
    return idx
At first, I was worried that the explicit looping in Python would kill it. But it turns out that it is quite fast. In the case where there are no ties (more likely with independent float values, for instance), it returns immediately after a single np.where(x[:, 0] == x[:, 0].max()). Only in the case of ties do we need to look at the (much smaller) subset of rows that were tied. In unfavorable conditions (many repeated values in all columns), it is still ~100x or more faster than the partition method, and O(log n) faster than lexsort(), of course.
Test 1: correctness
for i in range(1000):
    x = np.random.randint(0, 10, size=(1000, 8))
    found = lexargmax(x)
    assert lexargmax_by_sort(x) in found and np.unique(x[found], axis=0).shape[0] == 1
(where lexargmax_by_sort is np.lexsort(x[:, ::-1].T)[-1])
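For the tests to run as written, that expression can be wrapped in a small helper (a sketch matching the definition above):

def lexargmax_by_sort(x):
    # lexsort uses the last key as the primary one, hence the column reversal
    return np.lexsort(x[:, ::-1].T)[-1]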
Test 2: speed
x = np.random.randint(0, 10, size=(100_000, 100))
a = %timeit -o lexargmax(x)
# 776 µs ± 313 ns per loop
b = %timeit -o lexargmax_by_sort(x)
# 507 ms ± 2.65 ms per loop
# b.average / a.average: 652
c = %timeit -o lexargmax_by_partition(x)
# 141 ms ± 2.38 ms
# c.average / a.average: 182
(where lexargmax_by_partition is based on @MadPhysicist's very elegant idea:
def lexargmax_by_partition(x):
    view = np.ndarray(x.shape[0], dtype=[('', x.dtype)] * x.shape[1], buffer=x)
    return np.argpartition(view, -1)[-1]
)
After some more testing on various sizes, we get the following time measurements and performance ratios:
(Plots not reproduced here: the left panel showed execution time for each method across sizes, with lexargmax drawn as 'o-' and lexargmax_by_partition as the upper group of lines; the right panel showed the speed ratio.)
Interestingly, lexargmax_by_partition's execution time seems fairly independent of m, the number of columns, whereas our lexargmax depends on it a little. I believe that reflects the fact that, in this setting (deliberate collisions of the max in each column), the more columns we have, the "deeper" we need to go when breaking ties.
Previous (wrong) answer
To find the argmax of the row by lexicographic order, I was thinking you could do:
def lexmax(x):
    r = (2.0 ** np.arange(x.shape[1]))[::-1]
    return np.argmax(((x == x.max(axis=0)) * r).sum(axis=1))
Explanation:
x == x.max(axis=0) (as an int) is 1 for each element that is equal to the column's max. In your example, it is (astype(int)):
[[0 0 0]
[1 0 0]
[0 1 0]
[1 0 1]]
then we multiply by a column weight that is more than the sum of 1's on the right. Powers of two achieve that. We do it in float to address cases with more than 64 columns.
But this is fatally flawed: The positions of max in the second column should be considered only in the subset where the first column had the max value (to break the tie).
Other approaches, including affine transformations of all columns so that we can sum them and find the max, don't work either: if the max in column 0 is, say, 1.0, and there is a second place at 0.999, then we would have to know that difference of 0.001 ahead of time and make sure no combination of values from the columns to the right could sum up to overtake that difference. So, that's a dead end.
To sort a list by the contents of a row, you can use np.lexsort. The only catch is that it sorts by the last element of the selected axis first:
index = np.lexsort(x.T[::-1])
OR
index = np.lexsort(x[:, ::-1].T)
This is "argsort". You can make it into "sort" by doing
x[index]
"min" and "max" can be done trivially by using the index:
xmin = x[index[0]]
xmax = x[index[-1]]
Alternatively, you can use a technique I suggested in one of my questions: Sorting array of objects by row using custom dtype. The idea is to make each row into a structure that has a field for each element:
view = np.ndarray(x.shape[0], dtype=[('', x.dtype)] * x.shape[1], buffer=x)
You can sort the array in-place by operating on the view:
>>> view.sort()
>>> x
array([[ 1.   ,  2.   ,  3.   ],
       [ 1.   ,  4.   ,  2.   ],
       [ 2.   , -5.   ,  0.333],
       [ 2.   , -5.   ,  4.   ]])
That's because the ndarray constructor points to x as the original buffer.
You cannot get argmin, argmax, min and max to work on the result. However, you can still get the min and max in O(n) time using my favorite function in all of numpy: np.partition:
view.partition([0, -1])
xmin = x[0]
xmax = x[-1]
You can use argpartition on the array as well to get the indices of the desired elements:
index = view.argpartition([0, -1])[[0, -1]]
xmin = x[index[0]]
xmax = x[index[-1]]
Notice that both sort and partition have an order argument that you can use to rearrange the comparison of the columns.
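For instance (a sketch; NumPy auto-names the unnamed fields 'f0', 'f1', ..., so the comparison order can be rearranged by listing them explicitly):

# compare the second column first, then the first, then the third
view.sort(order=['f1', 'f0', 'f2'])
view.partition([0, -1], order=['f1', 'f0', 'f2'])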

How to do an element-wise operation (subtract) with numpy fancy indexing when the index list contains the same index more than once?

My project has just caught a bug. I want to do the following:
import numpy as np
a = np.array([1,2,3,4])
b = np.array([5,6,7,8])
a[[0,1,1]] -= b[[0,1,2]]
I was hoping for a[1] = a[1] - b[1] - b[2] = 2 - 6 - 7 = -11, because index 1 appears twice in the fancy index, so I want the subtraction applied to a[1] twice. But this numpy code only produces:
array([-4, -4, -5])
Since I want numpy to speed up my algorithm, I only want to write vectorized numpy code (avoiding Python for-loops).
Approach #1
You need to use np.subtract.at for accumulated subtraction at given indices with given values -
np.subtract.at(a,[0,1,1], b[[0,1,2]])
Sample run -
In [8]: a = np.array([1,2,3,4])
In [9]: b = np.array([5,6,7,8])
In [10]: np.subtract.at(a,[0,1,1], b[[0,1,2]])
In [11]: a
Out[11]: array([ -4, -11, 3, 4])
Approach #2
Alternatively, using np.bincount -
ind = np.array([0,1,1])
val = b[[0,1,2]]
unq_ind = np.unique(ind)
a[unq_ind] -= np.bincount(ind, val).astype(a.dtype)
If ind is already sorted, get unq_ind, like so -
unq_ind = ind[np.concatenate(([True],ind[1:] != ind[:-1]))]
Approach #2S (Simpler)
If you don't want to mess around with the unique work, use minlength arg with bincount -
a -= np.bincount([0,1,1], b[[0,1,2]], minlength=a.size).astype(a.dtype)
For accumulated additions
To use the proposed approaches for adding instead of subtracting, simply replace np.subtract.at with np.add.at and for the bincount methods, replace -= with +=.
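A minimal sketch of those addition variants, using the toy arrays from the question:

import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

# accumulated addition with np.add.at
np.add.at(a, [0, 1, 1], b[[0, 1, 2]])
# a is now array([ 6, 15,  3,  4])

# the bincount version with +=
a = np.array([1, 2, 3, 4])
a += np.bincount([0, 1, 1], b[[0, 1, 2]], minlength=a.size).astype(a.dtype)
# a is now array([ 6, 15,  3,  4])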

Numpy find number of occurrences in a 2D array

Is there a numpy function to count the number of occurrences of a certain value in a 2D numpy array? E.g.
np.random.random((3,3))
array([[ 0.68878371,  0.2511641 ,  0.05677177],
       [ 0.97784099,  0.96051717,  0.83723156],
       [ 0.49460617,  0.24623311,  0.86396798]])
How do I find the number of times 0.83723156 occurs in this array?
arr = np.random.random((3,3))
# mark the elements that are exactly equal to the target value
condition = arr == 0.83723156
# count the marked elements
np.count_nonzero(condition)
The value of condition is a boolean array indicating whether each element of the array satisfied the condition. np.count_nonzero counts how many nonzero elements are in the array; in the case of booleans, it counts the number of elements with a True value.
To be able to deal with floating point accuracy, you could do something like this instead:
condition = np.fabs(arr - 0.83723156) < 0.001
For floating point arrays np.isclose is a much better option than either comparing with the exact same value or defining a custom range.
>>> a = np.array([[ 0.68878371, 0.2511641 , 0.05677177],
...               [ 0.97784099, 0.96051717, 0.83723156],
...               [ 0.49460617, 0.24623311, 0.86396798]])
>>> np.isclose(a, 0.83723156).sum()
1
Note that real numbers are not represented exactly in a computer, that is why np.isclose will work while == doesn't:
>>> (0.1 + 0.2) == 0.3
False
Instead:
>>> np.isclose(0.1 + 0.2, 0.3)
True
To count the number of times x appears in any array, you can simply sum the boolean array that results from a == x:
>>> col = numpy.arange(3)
>>> cols = numpy.tile(col, 3)
>>> (cols == 1).sum()
3
It should go without saying, but I'll say it anyway: this is not very useful with floating point numbers unless you specify a range, like so:
>>> a = numpy.random.random((3, 3))
>>> ((a > 0.5) & (a < 0.75)).sum()
2
This general principle works for all sorts of tests. For example, if you want to count the number of floating point values that are integral:
>>> a = numpy.random.random((3, 3)) * 10
>>> a
array([[ 7.33955747,  0.89195947,  4.70725211],
       [ 6.63686955,  5.98693505,  4.47567936],
       [ 1.36965745,  5.01869306,  5.89245242]])
>>> a.astype(int)
array([[7, 0, 4],
       [6, 5, 4],
       [1, 5, 5]])
>>> (a == a.astype(int)).sum()
0
>>> a[1, 1] = 8
>>> (a == a.astype(int)).sum()
1
You can also use np.isclose() as described by Imanol Luengo, depending on what your goal is. But often, it's more useful to know whether values are in a range than to know whether they are arbitrarily close to some arbitrary value.
The problem with isclose is that its default tolerance values (rtol and atol) are arbitrary, and the results it generates are not always obvious or easy to predict. To deal with complex floating point arithmetic, it does even more floating point arithmetic! A simple range is much easier to reason about precisely. (This is an expression of a more general principle: first, do the simplest thing that could possibly work.)
Still, isclose and its cousin allclose have their uses. I usually use them to see if a whole array is very similar to another whole array, which doesn't seem to be your question.
If it may be of use to anyone: for very large 2D arrays, if you want to count how many times each element appears in the entire array, you can flatten the array into a list and then count the occurrences with a Counter:
from itertools import chain
from collections import Counter

# the large array is called arr
flatten_arr = list(chain.from_iterable(arr))
dico_nodeid_appearence = Counter(flatten_arr)
# how many times x appeared in arr
dico_nodeid_appearence[x]
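If you'd rather stay inside numpy for the same whole-array tally, np.unique with return_counts=True is a common alternative (not part of the answer above, just a suggestion):

vals, counts = np.unique(arr, return_counts=True)  # arr can stay 2D
count_of_x = counts[vals == x].sum()               # 0 if x never occurs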

Rounding an array to values given in another array

Say I have an array:
values = np.array([1.1,2.2,3.3,4.4,2.1,8.4])
I want to round these values to members of an arbitrary array, say:
rounds = np.array([1.,3.5,5.1,6.7,9.2])
ideally returning an array of rounded numbers and an array of the residues:
rounded = np.array([1.,1.,3.5,5.1,1.,9.2])
residues = np.array([-0.1,-1.2,0.2,0.7,-1.1,0.8])
Is there a good pythonic way of doing this?
One option is this:
>>> x = np.subtract.outer(values, rounds)
>>> y = np.argmin(abs(x), axis=1)
And then rounded and residues are, respectively:
>>> rounds[y]
array([ 1. , 1. , 3.5, 5.1, 1. , 9.2])
>>> rounds[y] - values
array([-0.1, -1.2, 0.2, 0.7, -1.1, 0.8])
Essentially x is a 2D array of every value in values minus every value in rounds. y is a 1D array of the index of the minimum absolute value of each row of x. This y is then used to index rounds.
I should caveat this answer by noting that if len(values) * len(rounds) is big (e.g. starting to exceed 10e8), memory usage may start to become of concern. In this case, you could consider building up y iteratively instead to avoid having to allocate a large block of memory to x.
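A minimal sketch of that chunked approach (the helper name and chunk size are illustrative, not from the answer above):

def nearest_indices_chunked(values, rounds, chunk=100_000):
    # same result as argmin over the full outer difference, but only a
    # chunk x len(rounds) block is ever held in memory at once
    y = np.empty(len(values), dtype=np.intp)
    for start in range(0, len(values), chunk):
        block = values[start:start + chunk]
        diff = np.abs(np.subtract.outer(block, rounds))
        y[start:start + chunk] = np.argmin(diff, axis=1)
    return y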
As the items in the rounds array are sorted (or if not, sort them first), we can do this in O(n log n) time using numpy.searchsorted:
from functools import partial

def closest(rounds, x):
    ind = np.searchsorted(rounds, x, side='right')
    length = len(rounds)
    if ind == 0:
        return rounds[0]
    elif ind == length:
        # x is beyond the last element
        return rounds[-1]
    else:
        left, right = rounds[ind - 1], rounds[ind]
        return min((left, right), key=lambda y: abs(x - y))

f = partial(closest, rounds)
rounded = np.apply_along_axis(f, 1, values[:, None])[:, 0]
residues = rounded - values
print(repr(rounded))
print(repr(residues))
Output:
array([ 1. , 1. , 3.5, 5.1, 1. , 9.2])
array([-0.1, -1.2, 0.2, 0.7, -1.1, 0.8])
The same time complexity as the answer by Ashwini Chaudhary, but fully vectorized:
def round_to(rounds, values):
    # The main speed is in this line
    I = np.searchsorted(rounds, values)
    # Pad so that we can index easier
    rounds_p = np.pad(rounds, 1, mode='edge')
    # We have to decide between I and I+1
    rounded = np.vstack([rounds_p[I], rounds_p[I + 1]])
    residues = rounded - values
    J = np.argmin(np.abs(residues), axis=0)
    K = np.arange(len(values))
    return rounded[J, K], residues[J, K]
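A quick check with the arrays from the question (note that round_to assumes rounds is already sorted, as np.searchsorted requires):

values = np.array([1.1, 2.2, 3.3, 4.4, 2.1, 8.4])
rounds = np.array([1., 3.5, 5.1, 6.7, 9.2])
rounded, residues = round_to(rounds, values)
# rounded  -> array([ 1. ,  1. ,  3.5,  5.1,  1. ,  9.2])
# residues -> array([-0.1, -1.2,  0.2,  0.7, -1.1,  0.8])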
Find the closest number to x in rounds:
def findClosest(x, rounds):
    return rounds[np.argmin(np.absolute(rounds - x))]
Loop over all values:
rounded = [findClosest(x, rounds) for x in values]
residues = np.array(rounded) - values
This is a straightforward method, but you can be more efficient by exploiting the fact that your rounds array is ordered.
def findClosest(x, rounds):
    for n in range(len(rounds)):
        if x < rounds[n]:
            if n == 0:
                return rounds[n]
            elif rounds[n] - x > x - rounds[n - 1]:
                return rounds[n - 1]
            else:
                return rounds[n]
    return rounds[-1]
This might be faster than the argmin approach, but not necessarily: you lose time with the Python for-loop, but you don't have to scan along the whole rounds array.
The selected answer is already great. This one may seem convoluted to those who aren't used to more complex list comprehensions, but otherwise it's actually quite clear (IMO) if you're familiar with them.
(Interestingly enough, this happens to run faster than the selected answer. Why would the NumPy version be slower than this? Hmm...)
values = np.array([1.1, 2.2, 3.3, 4.4, 2.1, 8.4])
rounds = np.array([1., 3.5, 5.1, 6.7, 9.2])

rounded, residues = zip(*[
    [
        rounds[cIndex],
        dists[cIndex]
    ]
    for v in values
    for dists in [[r - v for r in rounds]]
    for absDists in [[abs(d) for d in dists]]
    for cIndex in [absDists.index(min(absDists))]
])

print(np.array(rounded))
print(np.array(residues))

Speed up loop to fill an array with closest values from another array

I have a block of code that I need to optimize as much as possible since I have to run it several thousand times.
What it does is find the closest float in a sub-list of a given array for a random float, and store the corresponding float (i.e., the one with the same index) from another sub-list of that array. It repeats the process until the sum of the stored floats reaches a certain limit.
Here's the MWE to make it clearer:
import numpy as np

# Define array with two sub-lists.
a = [np.random.uniform(0., 100., 10000), np.random.random(10000)]

# Initialize empty final list.
b = []

# Run until the condition is met.
while (sum(b) < 10000):
    # Draw random [0,1) value.
    u = np.random.random()
    # Find closest value in sub-list a[1].
    idx = np.argmin(np.abs(u - a[1]))
    # Store value located in sub-list a[0].
    b.append(a[0][idx])
The code is reasonably simple but I haven't found a way to speed it up. I tried to adapt the great (and very fast) answer given in a similar question I made some time ago, to no avail.
OK, here's a slightly left-field suggestion. As I understand it, you are just trying to sample uniformly from the elements in a[0] until you have a list whose sum exceeds some limit.
Although it will be more costly memory-wise, I think you'll probably find it's much faster to generate a large random sample from a[0] first, then take the cumsum and find where it first exceeds your limit.
For example:
import numpy as np

# array of reference float values, equivalent to a[0]
refs = np.random.uniform(0, 100, 10000)

def fast_samp_1(refs, lim=10000, blocksize=10000):
    # sample uniformly from refs
    samp = np.random.choice(refs, size=blocksize, replace=True)
    samp_sum = np.cumsum(samp)
    # find where the cumsum first exceeds your limit
    last = np.searchsorted(samp_sum, lim, side='right')
    return samp[:last + 1]

    # # if it's ok to be just under lim rather than just over then this might
    # # be quicker
    # return samp[samp_sum <= lim]
Of course, if the sum of the sample of blocksize elements is < lim then this will fail to give you a sample whose sum is >= lim. You could check whether this is the case, and append to your sample in a loop if necessary.
def fast_samp_2(refs, lim=10000, blocksize=10000):
    samp = np.random.choice(refs, size=blocksize, replace=True)
    samp_sum = np.cumsum(samp)
    # is the sum of our current block of samples >= lim?
    while samp_sum[-1] < lim:
        # if not, we'll sample another block and try again until it is
        newsamp = np.random.choice(refs, size=blocksize, replace=True)
        samp = np.hstack((samp, newsamp))
        samp_sum = np.hstack((samp_sum, np.cumsum(newsamp) + samp_sum[-1]))
    last = np.searchsorted(samp_sum, lim, side='right')
    return samp[:last + 1]
Note that concatenating arrays is pretty slow, so it would probably be better to make blocksize large enough to be reasonably sure that the sum of a single block will be >= to your limit, without being excessively large.
Update
I've adapted your original function a little bit so that its syntax more closely resembles mine.
def orig_samp(refs, lim=10000):
    # Initialize empty final list.
    b = []
    a1 = np.random.random(10000)
    # Run until the condition is met.
    while (sum(b) < lim):
        # Draw random [0,1) value.
        u = np.random.random()
        # Find closest value in sub-list a[1].
        idx = np.argmin(np.abs(u - a1))
        # Store value located in sub-list a[0].
        b.append(refs[idx])
    return b
Here's some benchmarking data.
%timeit orig_samp(refs, lim=10000)
# 100 loops, best of 3: 11 ms per loop
%timeit fast_samp_2(refs, lim=10000, blocksize=1000)
# 10000 loops, best of 3: 62.9 µs per loop
That's a good two orders of magnitude faster. You can do a bit better by reducing the blocksize a fraction - you basically want it to be comfortably larger than the length of the arrays you're getting out. In this case, you know that on average the output will be about 200 elements long, since the mean of all real numbers between 0 and 100 is 50, and 10000 / 50 = 200.
Update 2
It's easy to get a weighted sample rather than a uniform sample - you can just pass the p= parameter to np.random.choice:
def weighted_fast_samp(refs, weights=None, lim=10000, blocksize=10000):
    samp = np.random.choice(refs, size=blocksize, replace=True, p=weights)
    samp_sum = np.cumsum(samp)
    # is the sum of our current block of samples >= lim?
    while samp_sum[-1] < lim:
        # if not, we'll sample another block and try again until it is
        newsamp = np.random.choice(refs, size=blocksize, replace=True,
                                   p=weights)
        samp = np.hstack((samp, newsamp))
        samp_sum = np.hstack((samp_sum, np.cumsum(newsamp) + samp_sum[-1]))
    last = np.searchsorted(samp_sum, lim, side='right')
    return samp[:last + 1]
Write it in cython. That's going to get you a lot more for a high iteration operation.
http://cython.org/
One obvious optimization - don't re-calculate sum on each iteration, accumulate it
b_sum = 0
while b_sum < 10000:
    ....
    idx = np.argmin(np.abs(u - a[1]))
    add_val = a[0][idx]
    b.append(add_val)
    b_sum += add_val
EDIT:
I think some minor improvement (check it out if you feel like it) may be achieved by pre-referencing sublists before the loop
a_0 = a[0]
a_1 = a[1]
...
while ...:
    ....
    idx = np.argmin(np.abs(u - a_1))
    b.append(a_0[idx])
It may save some on run time - though I don't believe it will matter that much.
Sort your reference array.
That allows O(log n) lookups instead of needing to scan the whole list (using bisect, for example, to find the closest elements).
For starters, I swap a[0] and a[1] and sort both sub-lists by the lookup values, using argsort so the pairing between them is preserved:
import bisect

order = np.argsort(a[1])
a = [a[1][order], a[0][order]]
Now a[0] is sorted, meaning that if you are looking for the closest value to an arbitrary number, you can start with a bisect:
while (sum(b) < 10000):
    # Draw random [0,1) value.
    u = np.random.random()
    # Find closest value in sub-list a[0].
    idx = bisect.bisect(a[0], u)
    # the closest value is at either idx or idx - 1
    if idx == len(a[0]) or (idx != 0 and np.abs(a[0][idx] - u) > np.abs(a[0][idx - 1] - u)):
        idx = idx - 1
    # Store value located in sub-list a[1].
    b.append(a[1][idx])
