Numpy: Vectorize np.argwhere - python

I have the following data structures in numpy:
import numpy as np
a = np.random.rand(267, 173) # dense img matrix
b = np.random.rand(199) # array of probability samples
My goal is to take each entry i in b, find the x,y coordinates/index positions of all values in a that are <= i, then randomly select one of the values in that subset:
from random import randint

for i in b:
    l = np.argwhere(a <= i)           # list of img coordinates where pixel <= i
    sample = l[randint(0, len(l)-1)]  # random selection from `l`
This "works", but I'd like to vectorize the sampling operation (i.e. replace the for loop with apply_along_axis or similar). Does anyone know how this can be done? Any suggestions would be greatly appreciated!

You can't exactly vectorize np.argwhere because the subset size is different at every step. What you can do, though, is speed up the computation dramatically with sorting. Sorting the image once creates a single allocation, while masking the image at every step creates a temporary array for the mask and another for the extracted elements. With a sorted image, you can just apply np.searchsorted to get the subset sizes:
a_sorted = np.sort(a.ravel())
indices = np.searchsorted(a_sorted, b, side='right')
You still need a loop to do the sampling, but you can do something like
samples = np.array([a_sorted[np.random.randint(i)] for i in indices])
Getting x-y coordinates instead of sample values is a bit more complicated with this system. You can use np.unravel_index to get the indices, but first you must convert from the reference frame of a_sorted to that of a.ravel(). If you sort using np.argsort instead of np.sort, you get the indices into the original array. Fortunately, np.searchsorted supports this exact scenario with the sorter parameter:
a_ind = np.argsort(a, axis=None)
indices = np.searchsorted(a.ravel(), b, side='right', sorter=a_ind)
r, c = np.unravel_index(a_ind[[np.random.randint(i) for i in indices]], a.shape)
r and c are the same size as b, and correspond to the row and column indices in a of each selection based on b. The index conversion depends on the strides in your array, so we'll assume that you're using C order, as 90% of arrays will do by default.
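As a quick sanity check on the coordinate conversion (a minimal sketch, reusing the a, b, r, c defined above), every sampled pixel should satisfy the <= constraint:
# each selected pixel must be <= its corresponding threshold in b
assert np.all(a[r, c] <= b)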
Complexity
Let's say b has size M and a has size N.
Your current algorithm does a linear search through every element of a for each element of b. At each iteration it allocates a boolean mask of size N, and then a buffer for the matching coordinates (N/2 entries on average). This means that the time complexity is on the order of O(M * N), and the space complexity is the same.
My algorithm sorts a first, which is O(N log N). Then it searches for M insertion points, which is O(M log N). Finally, it selects M samples. The space it allocates is one sorted copy of the image and two arrays of size M. It is therefore of O((M + N) log N) time complexity and O(M + N) in space.
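To put rough numbers on that for the sizes in the question (a back-of-the-envelope sketch, not a benchmark):
import math
M, N = 199, 267 * 173               # sizes of b and a
print(M * N)                        # ~9.2e6 elements touched by the masking loop
print(int((M + N) * math.log2(N)))  # ~7.2e5 for the sort-based approach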

Here is an alternative approach, argsorting b instead and then binning a accordingly using np.digitize and this post:
import numpy as np
from scipy import sparse
from timeit import timeit
import math

def h_digitize(a, bs, right=False):
    # "Tweaked" digitize: pre-bin a on a fine regular grid so most values can
    # be assigned to a bin of bs with a cheap lookup; only values that land in
    # an ambiguous grid cell fall back to np.digitize.
    mx, mn = a.max(), a.min()
    asz = mx - mn
    bsz = bs[-1] - bs[0]
    nbins = int(bs.size * math.sqrt(bs.size) * asz / bsz)
    bbs = np.concatenate([[0], ((nbins-1)*(bs-mn)/asz).astype(int).clip(0, nbins), [nbins]])
    bins = np.repeat(np.arange(bs.size+1), np.diff(bbs))
    bbs = bbs[:bbs.searchsorted(nbins)]
    bins[bbs] = -1
    aidx = bins[((nbins-1)*(a-mn)/asz).astype(int)]
    ambig = aidx == -1
    aa = a[ambig]
    if aa.size:
        aidx[ambig] = np.digitize(aa, bs, right)
    return aidx

def f_pp():
    # Sort b, bin every pixel of a against the sorted thresholds, then use a
    # sparse matrix to group pixel indices by bin so that a random eligible
    # pixel can be drawn for each threshold.
    bo = b.argsort()
    bs = b[bo]
    aidx = h_digitize(a, bs, right=True).ravel()
    aux = sparse.csr_matrix((aidx, aidx, np.arange(aidx.size+1)),
                            (aidx.size, b.size+1)).tocsc()
    ridx = np.empty(b.size, int)
    ridx[bo] = aux.indices[np.fromiter(map(np.random.randint, aux.indptr[1:-1].tolist()), int, b.size)]
    return np.unravel_index(ridx, a.shape)

def f_mp():
    # @MadPhysicist's argsort/searchsorted approach, for comparison.
    a_ind = np.argsort(a, axis=None)
    indices = np.searchsorted(a.ravel(), b, sorter=a_ind, side='right')
    return np.unravel_index(a_ind[[np.random.randint(i) for i in indices]], a.shape)
a = np.random.rand(267, 173) # dense img matrix
b = np.random.rand(199) # array of probability samples
# round to test whether equality is handled correctly
a = np.round(a,3)
b = np.round(b,3)
print('pp',timeit(f_pp, number=1000),'ms')
print('mp',timeit(f_mp, number=1000),'ms')
# sanity checks
S = np.max([a[f_pp()] for _ in range(1000)],axis=0)
T = np.max([a[f_mp()] for _ in range(1000)],axis=0)
print(f"inequality satisfied: pp {(S<=b).all()} mp {(T<=b).all()}")
print(f"largest smalles distance to boundary: pp {(b-S).max()} mp {(b-T).max()}")
print(f"equality done right: pp {not (b-S).all()} mp {not (b-T).all()}")
Using a tweaked digitize I'm a bit faster, but this may vary with problem size. Also, @MadPhysicist's solution is much less convoluted. With the standard digitize we are about equal.
pp 2.620121960993856 ms
mp 3.301037881989032 ms
inequality satisfied: pp True mp True
largest smallest distance to boundary: pp 0.0040000000000000036 mp 0.006000000000000005
equality done right: pp True mp True
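For completeness, the "standard digitize" variant mentioned above would look something like the sketch below (f_pp_plain is just an illustrative name; only the bin assignment changes, the sparse grouping stays the same):
def f_pp_plain():
    # Same idea as f_pp, but with plain np.digitize instead of h_digitize.
    bo = b.argsort()
    bs = b[bo]
    aidx = np.digitize(a, bs, right=True).ravel()
    aux = sparse.csr_matrix((aidx, aidx, np.arange(aidx.size + 1)),
                            (aidx.size, b.size + 1)).tocsc()
    ridx = np.empty(b.size, int)
    ridx[bo] = aux.indices[np.fromiter(map(np.random.randint,
                                           aux.indptr[1:-1].tolist()),
                                       int, b.size)]
    return np.unravel_index(ridx, a.shape)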

A slight improvement on @MadPhysicist's algorithm to make it more vectorized:
%%timeit
a_ind = np.argsort(a, axis=None)
indices = np.searchsorted(a.ravel(), b, sorter=a_ind)
r, c = np.unravel_index(a_ind[[np.random.randint(i) for i in indices]], a.shape)
100 loops, best of 3: 6.32 ms per loop
%%timeit
a_ind = np.argsort(a, axis=None)
indices = np.searchsorted(a.ravel(), b, sorter=a_ind)
r, c = np.unravel_index(a_ind[(np.random.rand(indices.size) * indices).astype(int)], a.shape)
100 loops, best of 3: 4.16 ms per loop
@PaulPanzer's solution still rules the field, though I'm not sure what it's caching:
%timeit f_pp()
The slowest run took 14.79 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 1.88 ms per loop

Related

Find Indexes that Map a Numpy Array to Another

If we have a numpy array a that needs to be sampled with replacement to create a second numpy array b,
import numpy as np
a = np.arange(10, 200*1000)
b = np.random.choice(a, len(a), replace=True)
What is the most efficient way to find an array of indexes named mapping that will transform a to b? It is OK to change np.random.choice to a more suitable function.
The following code is too slow and takes 7-8 seconds on a Macbook Pro to create the mapping array. With an array size of 1 million, it will take much longer.
mapping = np.array([], dtype=np.int)
for n in b:
    m = np.searchsorted(a, n)
    mapping = np.append(mapping, m)
Perhaps run the choice on the indices of a instead, and then slice a with this random index mapping:
mapping = np.random.choice(np.arange(len(a)), len(a), replace=True)
b = a[mapping]
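A minimal sketch of both directions, reusing the a from the question: the approach above, plus the observation that np.searchsorted is itself vectorized, so even the original loop collapses to a single call when b is already given:
import numpy as np

a = np.arange(10, 200 * 1000)

# approach above: draw random positions first, then slice
mapping = np.random.choice(len(a), len(a), replace=True)
b = a[mapping]

# if b were given instead: one vectorized searchsorted call recovers the
# mapping without a Python-level loop (a is sorted and has unique values)
assert np.array_equal(np.searchsorted(a, b), mapping)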

Numpy random choice with probabilities to produce a 2D-array with unique rows

Similar to Numpy random choice to produce a 2D-array with all unique values, I am looking for an efficient way of generating:
n = 1000
k = 10
number_of_combinations = 1000000
p = np.random.rand(n)
p /= np.sum(p)
my_combinations = np.random.choice(n, size=(number_of_combinations, k), replace=False, p=p)
As discussed in the previous question, I want this matrix to have only unique rows. Unfortunately, the provided solutions do not work for the additional extension of using specific probabilities p.
My current solution is as follows:
my_combinations = set()
while len(my_combinations) < number_of_combinations:
    new_combination = np.random.choice(n, size=k, replace=False, p=p)
    my_combinations.add(frozenset(new_combination))
print(my_combinations)
However, I do think that there should be a more efficient numpy approach to solve this faster.
For these parameter values, the probability of encountering a duplicate row is astronomically small (unless p is very skewed, perhaps to an extent that cannot be accommodated by float precision). I would just use
my_combinations = np.random.choice(n, size=(number_of_combinations, k), replace=True, p=p)
You can check for duplicates in O(N log N) where N = number_of_combinations;
Conservatively, you could generate
my_combinations = np.random.choice(n, size=(2 * number_of_combinations, k), replace=True, p=p)
then drop duplicates and take the first number_of_combinations rows.
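A minimal sketch of that oversample-then-deduplicate idea (the per-row sort is my own assumption, so that rows containing the same elements in a different order compare as equal):
import numpy as np

n = 1000
k = 10
number_of_combinations = 1000000
p = np.random.rand(n)
p /= p.sum()

# oversample, canonicalize each row, then keep the first unique rows
cand = np.random.choice(n, size=(2 * number_of_combinations, k), replace=True, p=p)
cand.sort(axis=1)                                     # treat rows as unordered combinations
_, first = np.unique(cand, axis=0, return_index=True)
my_combinations = cand[np.sort(first)[:number_of_combinations]]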

Iterating operation with two arrays using numpy

I'm working with two different arrays (75x4), and I'm applying a shortest distance algorithm between the two arrays.
So I want to:
perform an operation with one row of the first array with every individual row of the second array, iterating to obtain 75 values
find the minimum value, and store that in a new array
repeat this with the second row of the first array, once again iterating the operation for all the rows of the second array, and again storing the minimum difference to the new array
How would I go about doing this with numpy?
Essentially I want to perform an operation between one row of array 1 on every row of array 2, find the minimum value, and store that in a new array. Then do that very same thing for the 2nd row of array 1, and so on for all 75 rows of array 1.
Here is the code for the formula I'm using. What I get here is just the distance between every row of array 1 (training data) and array 2 (testing data). But what I'm looking for is to do it for one row of array 1 iterating down for all rows of array 2, storing the minimum value in a new array, then doing the same for the next row of array 1, and so on.
arr_attributedifference = (arr_trainingdata - arr_testingdata)**2
arr_distance = np.sqrt(arr_attributedifference.sum(axis=1))
Here are two methods, one using einsum, the other KDTree:
einsum does essentially what we could also achieve via broadcasting, for example np.einsum('ik,jk', A, B) is roughly equivalent to (A[:, None, :] * B[None, :, :]).sum(axis=2). The advantage of einsum is that it does the summing straight away, so it avoids creating an m x m x n intermediate array.
KDTree is more sophisticated. We have to invest upfront into generating the tree but afterwards querying nearest neighbors is very efficient.
import numpy as np
from scipy.spatial import cKDTree as KDTree

def f_einsum(A, B):
    # nearest row of B for every row of A, via
    # |A_i - B_j|^2 = |A_i|^2 - 2 A_i.B_j + |B_j|^2
    # (the |A_i|^2 term is constant per row, so it can be dropped for the argmin)
    B2AB = np.einsum('ij,ij->i', B, B) / 2 - np.einsum('ik,jk', A, B)
    idx = B2AB.argmin(axis=1)
    D = A - B[idx]
    return np.sqrt(np.einsum('ij,ij->i', D, D)), idx

def f_KDTree(A, B):
    T = KDTree(B)
    return T.query(A, 1)

m, n = 75, 4
A, B = np.random.randn(2, m, n)

de, ie = f_einsum(A, B)
dt, it = f_KDTree(A, B)
assert np.all(ie == it) and np.allclose(de, dt)

from timeit import timeit

for m, n in [(75, 4), (500, 4)]:
    A, B = np.random.randn(2, m, n)
    print(m, n)
    print('einsum:', timeit("f_einsum(A, B)", globals=globals(), number=1000))
    print('KDTree:', timeit("f_KDTree(A, B)", globals=globals(), number=1000))
Sample run:
75 4
einsum: 0.067826496087946
KDTree: 0.12196151306852698
500 4
einsum: 3.1056990439537913
KDTree: 0.85108971898444
We can see that at small problem size the direct method (einsum) is faster while for larger problem size KDTree wins.

How to vectorize 3D Numpy arrays

I have a 3D numpy array like a = np.zeros((100,100, 20)). I want to perform an operation over every x,y position that involves all the elements over the z axis, and store the result in an array like b = np.zeros((100,100)) at the corresponding x,y position.
Now I'm doing it with a for loop:
d_n = np.array([...]) # a parameter with the same shape as b
for (x,y), v in np.ndenumerate(b):
    C = a[x,y,:]
    ### calculate some_value using C
    minv = sys.maxint
    depth = -1
    C = a[x,y,:]
    for d in range(len(C)):
        e = 2.5 * float(math.pow(d_n[x,y] - d, 2)) + C[d] * 0.05
        if e < minv:
            minv = e
            depth = d
    some_value = depth
    if depth == -1:
        some_value = len(C) - 1
    ###
    b[x,y] = some_value
The problem is that this operation is much slower than others done the pythonic way, e.g. c = b * b (I actually profiled this function, and it's around 2 orders of magnitude slower than others using numpy built-in and vectorized functions, over a similar number of elements).
How can I improve the performance of such kind of functions mapping a 3D array to a 2D one?
What is usually done in 3D images is to swap the Z axis to the first index:
>>> a = a.transpose((2,0,1))
>>> a.shape
(20, 100, 100)
And now you can easily iterate over the Z axis:
>>> for slice in a:
...     do something
The slice here will be each of the 100x100 fractions of your 3D matrix. Additionally, transposing lets you access each 2D slice directly by indexing the first axis. For example, a[10] will give you the 11th 100x100 slice.
Bonus: If you make the transposed data contiguous (e.g. a = np.ascontiguousarray(a.transpose((2,0,1)))), access to your 2D slices will be faster since they are mapped contiguously in memory.
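A tiny sketch of that bonus point, just to show the contiguity difference (shapes match the question):
import numpy as np

a = np.zeros((100, 100, 20))
view = a.transpose((2, 0, 1))          # no copy: slices are strided
copy = np.ascontiguousarray(view)      # one copy: slices are contiguous

print(view[10].flags['C_CONTIGUOUS'])  # False
print(copy[10].flags['C_CONTIGUOUS'])  # True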
Obviously you want to get rid of the explicit for loop, but I think whether this is possible depends on what calculation you are doing with C. As a simple example,
a = np.zeros((100,100, 20))
a[:,:] = np.linspace(1,20,20) # example data: 1,2,3,.., 20 as "z" for every "x","y"
b = np.sum(a[:,:]**2, axis=2)
will fill the 100 by 100 array b with the sum of the squared "z" values of a, that is 1+4+9+...+400 = 2870.
If your inner calculation is sufficiently complex, and not amenable to vectorization, then your iteration structure is good, and does not contribute significantly to the calculation time
for (x,y), v in np.ndenumerate(b):
    C = a[x,y,:]
    ...
    for d in range(len(C)):
        ...  # complex, not vectorizable calc
    ...
    b[x,y] = some_value
There doesn't appear to be any special structure in the first two dimensions, so you could just as well think of it as a 2D-to-1D mapping, e.g. mapping a (N,20) array onto a (N,) array. That doesn't speed up anything, but may help highlight the essential structure of the problem.
One step is to focus on speeding up that C to some_value calculation. There are functions like cumsum and cumprod that help you do sequential calculations on a vector. cython is also a good tool.
A different approach is to see if you can perform that internal calculation over the N values all at once. In other words, if you must iterate, it is better to do so over the smallest dimension.
In a sense this is a non-answer. But without full knowledge of how you get some_value from C and d_n I don't think we can do more.
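That said, here is a minimal sketch of the "iterate over the smallest dimension" idea, using made-up a and d_n of the stated shapes; the loop runs only 20 times, and each step is a fully vectorized 100x100 operation:
import numpy as np

a = np.random.rand(100, 100, 20)
d_n = np.random.rand(100, 100) * 20

best = np.full(a.shape[:2], np.inf)   # running minimum of e
b = np.zeros(a.shape[:2], dtype=int)  # depth at which the minimum occurs
for d in range(a.shape[2]):
    e = 2.5 * (d_n - d) ** 2 + a[:, :, d] * 0.05
    better = e < best
    best[better] = e[better]
    b[better] = d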
It looks like e can be calculated for all points at once:
e = 2.5 * float(math.pow(d_n[x,y] - d, 2)) + C[d] * 0.05
E = 2.5 * (d_n[...,None] - np.arange(a.shape[-1]))**2 + a * 0.05 # (100,100,20)
E.min(axis=-1) # smallest value along the last dimension
E.argmin(axis=-1) # index of where that min occurs
At first glance it looks like this E.argmin is the b value that you want (tweaked for some boundary conditions if needed).
I don't have realistic a and d_n arrays, but with simple test ones, this E.argmin(-1) matches your b, with a 66x speedup.
How can I improve the performance of such kind of functions mapping a 3D array to a 2D one?
Many functions in Numpy are "reduction" functions*, for example sum, any, std, etc. If you supply an axis argument other than None to such a function it will reduce the dimension of the array over that axis. For your code you can use the argmin function, if you first calculate e in a vectorized way:
d = np.arange(a.shape[2])
e = 2.5 * (d_n[...,None] - d)**2 + a*0.05
b = np.argmin(e, axis=2)
The indexing with [...,None] is used to engage broadcasting. The values in e are floating point values, so it's a bit strange to compare to sys.maxint but there you go:
I, J = np.indices(b.shape)
b[e[I,J,b] >= sys.maxint] = a.shape[2] - 1
* Strictly speaking, a reduction function is of the form reduce(operator, sequence), so technically not std and argmin

NumPy PolyFit and PolyVal in Multiple Dimensions?

Assume an n-dimensional array of observations that are reshaped to be a 2d-array with each row being one observation set. Using this reshape approach, np.polyfit can compute 2nd order fit coefficients for the entire ndarray (vectorized):
fit = np.polynomial.polynomial.polyfit(X, Y, 2)
where Y is shape (304000, 21) and X is a vector. This results in a (304000,3) array of coefficients, fit.
Using an iterator it is possible to call np.polyval(fit, X) for each row. This is inefficient when a vectorized approach may exist. Could the fit result be applied to the entire observation array without iterating? If so, how?
This is along the lines of this SO question.
np.polynomial.polynomial.polyval takes multidimensional coefficient arrays:
>>> x = np.random.rand(100)
>>> y = np.random.rand(100, 25)
>>> fit = np.polynomial.polynomial.polyfit(x, y, 2)
>>> fit.shape # 25 columns of 3 polynomial coefficients
(3L, 25L)
>>> xx = np.random.rand(50)
>>> interpol = np.polynomial.polynomial.polyval(xx, fit)
>>> interpol.shape # 25 rows, each with 50 evaluations of the polynomial
(25L, 50L)
And of course:
>>> np.all([np.allclose(np.polynomial.polynomial.polyval(xx, fit[:, j]),
... interpol[j]) for j in range(25)])
True
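Since the question starts from an n-dimensional array that was flattened to 2-D for fitting, here is a minimal sketch of going back to the original layout afterwards (the (5, 5) leading shape is made up for illustration):
import numpy as np

x = np.random.rand(100)
y_nd = np.random.rand(5, 5, 100)                   # hypothetical n-d observations
y2d = y_nd.reshape(-1, 100).T                      # polyfit wants (n_points, n_series)

fit = np.polynomial.polynomial.polyfit(x, y2d, 2)  # (3, 25)
xx = np.random.rand(50)
vals = np.polynomial.polynomial.polyval(xx, fit)   # (25, 50)
vals_nd = vals.reshape(5, 5, xx.size)              # back to the leading (5, 5) shape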
np.polynomial.polynomial.polyval is a perfectly fine (and convenient) approach to efficient evaluation of polynomial fittings.
However, if 'speediest' is what you are looking for, simply constructing the polynomial inputs and using the rudimentary numpy matrix-multiplication functions gives noticeably faster (roughly 4x) computation.
Setup
Using a setup similar to the one above, we'll create 100 different second-order fits.
>>> num_samples = 100000
>>> num_lines = 100
>>> x = np.random.randint(0,100,num_samples)
>>> y = np.random.randint(0,100,(num_samples, num_lines))
>>> fit = np.polynomial.polynomial.polyfit(x, y, deg=2)
>>> xx = np.random.randint(0,100,num_samples*10)
Numpy's polyval Function
res1 = np.polynomial.polynomial.polyval(xx, fit)
Basic Matrix Multiplication
inputs = np.array([np.power(xx,d) for d in range(len(fit))])
res2 = fit.T.dot(inputs)
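As a small aside (not part of the original answer), that inputs array is just a Vandermonde matrix, so np.vander can build it in one call:
inputs_v = np.vander(xx, len(fit), increasing=True).T  # same stack of powers as above
res2_v = fit.T.dot(inputs_v)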
Timing the functions
Using the same parameters above...
%timeit _ = np.polynomial.polynomial.polyval(xx, fit)
1 loop, best of 3: 247 ms per loop
%timeit inputs = np.array([np.power(xx, d) for d in range(len(fit))]);_ = fit.T.dot(inputs)
10 loops, best of 3: 72.8 ms per loop
To beat a dead horse...
Mean efficiency bump of ~3.61x. Speed fluctuations probably come from random background processes.
