Given two matrices X1 (N,3136) and X2 (M,3136) (where every element in every row is an binary number) i am trying to calculate hamming distance so that each element in X1 is compared to all of the rows from X2, such that result matrix is (N,M).
I have written two function for it (first one with help of numpy and the other one without numpy):
def hamming_distance(X, X_train):
array = np.array([np.sum(np.logical_xor(x, X_train), axis=1) for x in X])
return array
def hamming_distance2(X, X_train):
a = len(X[:,0])
b = len(X_train[:,0])
hamming_distance = np.zeros(shape=(a, b))
for i in range(0, a):
for j in range(0, b):
hamming_distance[i,j] = np.count_nonzero(X[i,:] != X_train[j,:])
return hamming_distance
My problem is that upper function is much slower than lower one where I use two for loops. Is it possible to improve on first function so that I use only one loop?
PS. Sorry for my english, it isn't my first language, although I was trying to do my best!
Numpy only makes your code much faster if you use it to vectorize your work. In your case you can make use of array broadcasting to vectorize your problem: compare your two arrays and create an auxiliary array of shape (N,M,K) which you can sum along its third dimension:
hamming_distance = (X[:,None,:] != X_train).sum(axis=-1)
We inject a singleton dimension into the first array to make it of shape (N,1,K), the second array is implicitly compatible with shape (1,M,K), so the operation can be performed.
In the comments #ayhan noted that this will create a huge auxiliary array for large M and N, which is quite true. This is the price of vectorization: you gain CPU time at the cost of memory. If you have enough memory for the above to work, it will be very fast. If you don't, you have to reduce the scope of your vectorization, and loop in either M or N (or both; this would be your current approach). But this doesn't concern numpy itself, this is about striking a balance between available resources and performance.
What you are doing is very similar to dot product. Consider these two binary arrays:
1 0 1 0 1 1 0 0
0 0 1 1 0 1 0 1
We are trying to find the number of different pairs. If you directly take the dot product, it gives you the number of (1, 1) pairs. However, if you negate one of them, it will count the different ones. For example, a1.dot(1-a2) counts (1, 0) pairs. Since we also need the number of (0, 1) pairs, we will add a2.dot(1-a1) to that. The good thing about dot product is that it is pretty fast. However, you will need to convert your arrays to floats first, as Divakar pointed out.
Here's a demo:
prng = np.random.RandomState(0)
arr1 = prng.binomial(1, 0.3, (1000, 3136))
arr2 = prng.binomial(1, 0.3, (2000, 3136))
res1 = hamming_distance2(arr1, arr2)
arr1 = arr1.astype('float32'); arr2 = arr2.astype('float32')
res2 = (1-arr1).dot(arr2.T) + arr1.dot(1-arr2.T)
np.allclose(res1, res2)
Out: True
And timings:
%timeit hamming_distance(arr1, arr2)
1 loop, best of 3: 13.9 s per loop
%timeit hamming_distance2(arr1, arr2)
1 loop, best of 3: 5.01 s per loop
%timeit (1-arr1).dot(arr2.T) + arr1.dot(1-arr2.T)
10 loops, best of 3: 93.1 ms per loop
Related
I have a list of complex numbers for which I want to find the closest value in another list of complex numbers.
My current approach with numpy:
import numpy as np
refArray = np.random.random(16);
myArray = np.random.random(1000);
def find_nearest(array, value):
idx = (np.abs(array-value)).argmin()
return idx;
for value in np.nditer(myArray):
index = find_nearest(refArray, value);
print(index);
Unfortunately, this takes ages for a large amount of values.
Is there a faster or more "pythonian" way of matching each value in myArray to the closest value in refArray?
FYI: I don't necessarily need numpy in my script.
Important: the order of both myArray as well as refArray is important and should not be changed. If sorting is to be applied, the original index should be retained in some way.
Here's one vectorized approach with np.searchsorted based on this post -
def closest_argmin(A, B):
L = B.size
sidx_B = B.argsort()
sorted_B = B[sidx_B]
sorted_idx = np.searchsorted(sorted_B, A)
sorted_idx[sorted_idx==L] = L-1
mask = (sorted_idx > 0) & \
((np.abs(A - sorted_B[sorted_idx-1]) < np.abs(A - sorted_B[sorted_idx])) )
return sidx_B[sorted_idx-mask]
Brief explanation :
Get the sorted indices for the left positions. We do this with - np.searchsorted(arr1, arr2, side='left') or just np.searchsorted(arr1, arr2). Now, searchsorted expects sorted array as the first input, so we need some preparatory work there.
Compare the values at those left positions with the values at their immediate right positions (left + 1) and see which one is closest. We do this at the step that computes mask.
Based on whether the left ones or their immediate right ones are closest, choose the respective ones. This is done with the subtraction of indices with the mask values acting as the offsets being converted to ints.
Benchmarking
Original approach -
def org_app(myArray, refArray):
out1 = np.empty(myArray.size, dtype=int)
for i, value in enumerate(myArray):
# find_nearest from posted question
index = find_nearest(refArray, value)
out1[i] = index
return out1
Timings and verification -
In [188]: refArray = np.random.random(16)
...: myArray = np.random.random(1000)
...:
In [189]: %timeit org_app(myArray, refArray)
100 loops, best of 3: 1.95 ms per loop
In [190]: %timeit closest_argmin(myArray, refArray)
10000 loops, best of 3: 36.6 µs per loop
In [191]: np.allclose(closest_argmin(myArray, refArray), org_app(myArray, refArray))
Out[191]: True
50x+ speedup for the posted sample and hopefully more for larger datasets!
An answer that is much shorter than that of #Divakar, also using broadcasting and even slightly faster:
abs(myArray[:, None] - refArray[None, :]).argmin(axis=-1)
I have a numpy array embed_vec of length tot_vec in which each entry is a 3d vector:
[[ 0.52483319 0.78015841 0.71117216]
[ 0.53041481 0.79462171 0.67234534]
[ 0.53645428 0.80896727 0.63119403]
...,
[ 0.72283509 0.40070804 0.15220522]
[ 0.71277758 0.38498613 0.16141834]
[ 0.70221445 0.36918032 0.17370776]]
For each of the elements in this array, I want to find out the number of other entries which are "close" to that entry. By close, I mean that the distance between two vectors is less than a specified value R. For this, I must compare all the possible pairs in this array with each other and then find out the number of close vectors for each of the vectors in the array. So I am doing this:
p = np.zeros(tot_vec) # This contains the number of close vectors
for i in range(tot_vec-1):
for j in range(i+1, tot_vec):
if np.linalg.norm(embed_vec[i]-embed_vec[j]) < R:
p[i] += 1
However, this is extremely inefficient because I have two nested python loops and for larger array sizes, this takes forever. If this were in C++ or Fortran, it wouldn't have been a great issue. My question is, can one achieve the same thing using numpy efficiently using some vectorization method? As a side note, I don't mind a solution using Pandas also.
Approach #1 : Vectorized approach -
def vectorized_app(embed_vec, R):
tot_vec = embed_vec.shape[0]
r,c = np.triu_indices(tot_vec,1)
subs = embed_vec[r] - embed_vec[c]
dists = np.einsum('ij,ij->i',subs,subs)
return np.bincount(r,dists<R**2,minlength=tot_vec)
Approach #2 : With less loop complexity (for very large arrays) -
def loopy_less_app(embed_vec, R):
tot_vec = embed_vec.shape[0]
Rsq = R**2
out = np.zeros(tot_vec,dtype=int)
for i in range(tot_vec):
subs = embed_vec[i] - embed_vec[i+1:tot_vec]
dists = np.einsum('ij,ij->i',subs,subs)
out[i] = np.count_nonzero(dists < Rsq)
return out
Benchmarking
Original approach -
def loopy_app(embed_vec, R):
tot_vec = embed_vec.shape[0]
p = np.zeros(tot_vec) # This contains the number of close vectors
for i in range(tot_vec-1):
for j in range(i+1, tot_vec):
if np.linalg.norm(embed_vec[i]-embed_vec[j]) < R:
p[i] += 1
return p
Timings -
In [76]: # Sample random array
...: embed_vec = np.random.rand(3000,3)
...: R = 0.5
...:
In [77]: %timeit loopy_app(embed_vec, R)
1 loops, best of 3: 50.5 s per loop
In [78]: %timeit loopy_less_app(embed_vec, R)
10 loops, best of 3: 143 ms per loop
350x+ speedup there!
Going with much bigger array with the proposed loopy_less_app -
In [81]: # Sample random array
...: embed_vec = np.random.rand(20000,3)
...: R = 0.5
...:
In [82]: %timeit loopy_less_app(embed_vec, R)
1 loops, best of 3: 4.47 s per loop
I am intrigued by that question and attempted to solve it efficintly using scipy's cKDTree. However, this approach may run out of memory because internally a list of all pairs with distance <= R is maintained. If your R and tot_vec are small enough it will work:
import numpy as np
from scipy.spatial import cKDTree as KDTree
tot_vec = 60000
embed_vec = np.random.randn(tot_vec, 3)
R = 0.1
tree = KDTree(embed_vec, leafsize=100)
p = np.zeros(tot_vec)
for pair in tree.query_pairs(R):
p[pair[0]] += 1
p[pair[1]] += 1
In case memory is an issue, with some effort it is possible to rewrite query_pairs as a generator function in Python at the cost of C performance.
first broadcast the difference:
disp_vecs=tot_vec[:,None,:]-tot_vec[None,:,:]
Now, depending on how big your dataset is, you may want to do a fist pass without all the math. If the distance is less than r, all the components should be less than r
first_mask=np.max(disp_vec, axis=-1)<r
Then do the actual calculation
disps=np.linlg.norm(disp_vec[first_mask],axis=-1)
second_mask=disps<r
Now reassign
disps=disps[second_mask]
first_mask[first_mask]=second_mask
disps are now the good values, and first_mask is a boolean mask of where they go. You can process from there.
According to http://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html, if x and y are given and input arrays are 1-D, where is equivalent to [xv if c else yv for (c,xv, yv) in zip(x!=0, 1/x, x)]. When doing runtime benchmarks, however, they have significantly different speeds:
x = np.array(range(-500, 500))
%timeit np.where(x != 0, 1/x, x)
10000 loops, best of 3: 23.9 µs per loop
%timeit [xv if c else yv for (c,xv, yv) in zip(x!=0, 1/x, x)]
1000 loops, best of 3: 232 µs per loop
Is there a way I can rewrite the second form so that it has a similar runtime to the first? The reason I ask is because I'd like to use a slightly modified version of the second case to avoid division by zero errors:
[1 / xv if c else xv for (c,xv) in zip(x!=0, x)]
Another question: the first case returns a numpy array while the second case returns a list. Is the most efficient way to have the second case return an array is to first make a list and then convert the list to an array?
np.array([xv if c else yv for (c,xv, yv) in zip(x!=0, 1/x, x)])
Thanks!
You just asked about 'delaying' the 'where':
numpy.where : how to delay evaluating parameters?
and someone else just asked about divide by zero:
Replace all elements of a matrix by their inverses
When people say that where is similar to the list comprehension, they attempt to describe the action, not the actual implementation.
np.where called with just one argument is the same as np.nonzero. This quickly (in compiled code) loops through the argument, and collects the indices of all non-zero values.
np.where when called with 3 arguments, returns a new array, collecting values from the 2 and 3rd arguments based on the nonzero values. But it's important to realize that those arguments must be other arrays. They are not functions that it evaluates element by element.
So the where is more like:
m1 = 1/xv
m2 = xv
[v1 if c else v2 for (c, v1, v2) in zip(x!=0, m1, m2)]
It's easy to run this iteration in compiled code because it just involves 3 arrays of matching size (matching via broadcasting).
np.array([...]) is a reasonable way of converting a list (or list comprehension) into an array. It may be a little slower than some alternatives because np.array is a powerful general purpose function. np.fromiter([], dtype) may be faster in some cases, because it isn't as general (you have to specify dtype, and it it only works with 1d).
There are 2 time proven strategies for getting more speed in element-by-element calculations:
use packages like numba and cython to rewrite the problem as c code
rework your calculations to use existing numpy methods. The use of masking to avoid divide by zero is a good example of this.
=====================
np.ma.where, the version for masked arrays is written in Python. Its code might be instructive. Note in particular this piece:
# Construct an empty array and fill it
d = np.empty(fc.shape, dtype=ndtype).view(MaskedArray)
np.copyto(d._data, xv.astype(ndtype), where=fc)
np.copyto(d._data, yv.astype(ndtype), where=notfc)
It makes a target, and then selectively copies values from the 2 inputs arrays, based on the condition array.
You can avoid division by zero while maintaining performance by using advanced indexing:
x = np.arange(-500, 500)
result = np.empty(x.shape, dtype=float) # set the dtype to whatever is appropriate
nonzero = x != 0
result[nonzero] = 1/x[nonzero]
result[~nonzero] = 0
If you for some reason want to bypass an error with numpy it might be worth looking into the errstate context:
x = np.array(range(-500, 500))
with np.errstate(divide='ignore'): #ignore zero-division error
x = 1/x
x[x!=x] = 0 #convert inf and NaN's to 0
Consider changing the array in place by using np.put():
In [56]: x = np.linspace(-1, 1, 5)
In [57]: x
Out[57]: array([-1. , -0.5, 0. , 0.5, 1. ])
In [58]: indices = np.argwhere(x != 0)
In [59]: indices
Out[59]:
array([[0],
[1],
[3],
[4]], dtype=int64)
In [60]: np.put(x, indices, 1/x[indices])
In [61]: x
Out[61]: array([-1., -2., 0., 2., 1.])
The approach above does not create a new array, which could be very convenient if x is a large array.
Question:
How to quickly add 0's to a large array (~ 600 000 entries) at the beginning to bring the length of the array to the next power of two. (2^n) Is there a faster solution besides np.concatinate()?
What I've already tried:
Using the np.concatenate(0, arr) function until the length of the array is equal to the next power of two. The code I have works, it just takes a very very long time.
Here's the pad left function:
def PadLeft(arr):
nextPower = NextPowerOfTwo(len(arr))
deficit = int(math.pow(2, nextPower) - len(arr))
#for x in range(1, int(deficit)):
for x in range(0, deficit):
arr = np.concatenate(([0], arr))
return arr
Here's the next power of two function:
def NextPowerOfTwo(number):
# Returns next power of two following 'number'
return math.ceil(math.log(number,2))
My implementation:
arr = np.ones(())
a = PadLeft(arr)
Thanks!
Rather than extending the old array in a for loop with a single element, why not add the entire set of zeroes at once?
arr = np.concatenate((np.zeros(deficit, dtype=arr.dtype), arr))
So don't use the for-loop. That's where your code is running slowly, as it is making a new array every iteration, which is far less efficient than making the required size array once and then filling it as needed, which can be done in several ways. This is just one, one that's close to your own solution.
The reason dtype=arr.dtype is added, is because np.zeros will return an array that is of the np.float dtype by default. If the datatype of arr was "less" than that (in a casting sense), the result will be cast to the "broader" datatype, being float, which is usually not what you would want (because it happens automatically).
This valid point was made by Divakar in the comments below.
There is numpy.pad which does exactly that.
For a 1D array:
arr = np.pad(arr, (deficit,0), mode='constant')
It reads as (left, right) padding.
For a 2D arrray:
arr = np.pad(arr, ((0,0), (deficit,0)), mode='constant')
The second parameter reads as ((top, bottom), (left, right)). Which pads the array with deficit to the left.
Making use of NumPy entirely, here's an approach with initialization -
def NextPowerOfTwo(number):
# Returns next power of two following 'number'
return np.ceil(np.log2(number))
def PadLeft_with_initialization(arr):
nextPower = NextPowerOfTwo(len(arr))
deficit = int(np.power(2, nextPower) - len(arr))
out = np.zeros(deficit+len(arr),dtype=arr.dtype)
out[deficit:] = arr
return out
Runtime test
Let's time the proposed solution in this post and np.concatenate based one as listed in Oliver W.'s solution :
def PadLeft_with_concatente(arr): # Oliver W.'s solution
nextPower = NextPowerOfTwo(len(arr))
deficit = int(np.power(2, nextPower) - len(arr))
return np.concatenate((np.zeros(deficit,dtype=arr.dtype), arr))
Timings -
In [226]: arr = np.random.randint(0,9,(600000))
In [227]: %timeit PadLeft_with_concatente(arr)
100 loops, best of 3: 5.21 ms per loop
In [228]: %timeit PadLeft_with_initialization(arr)
100 loops, best of 3: 6.75 ms per loop
Being cleaner and faster, I think Oliver W.'s solution with np.concatenate would be the way to go.
I would like to speed up this code :
import numpy as np
import pandas as pd
a = pd.read_csv(path)
closep = a['Clsprc']
delta = np.array(closep.diff())
upgain = np.where(delta >= 0, delta, 0)
downloss = np.where(delta <= 0, -delta, 0)
up = sum(upgain[0:14]) / 14
down = sum(downloss[0:14]) / 14
u = []
d = []
for x in np.nditer(upgain[14:]):
u1 = 13 * up + x
u.append(u1)
up = u1
for y in np.nditer(downloss[14:]):
d1 = 13 * down + y
d.append(d1)
down = d1
The data below:
0 49.00
1 48.76
2 48.52
3 48.28
...
36785758 13.88
36785759 14.65
36785760 13.19
Name: Clsprc, Length: 36785759, dtype: float64
The for loop is too slow, what can I do to speed up this code? Can I vectorize the entire operation?
It looks like you're trying to calculate an exponential moving average (rolling mean), but forgot the division. If that's the case then you may want to see this SO question. Meanwhile, here's a fast a simple moving average using the cumsum() function taken from the referenced link.
def moving_average(a, n=14) :
ret = np.cumsum(a, dtype=float)
ret[n:] = ret[n:] - ret[:-n]
return ret[n - 1:] / n
If this is not the case, and you really want the function described, you can increase the iteration speed by getting using the external_loop flag in your iteration. From the numpy documentation:
The nditer will try to provide chunks that are as large as possible to
the inner loop. By forcing ‘C’ and ‘F’ order, we get different
external loop sizes. This mode is enabled by specifying an iterator
flag.
Observe that with the default of keeping native memory order, the
iterator is able to provide a single one-dimensional chunk, whereas
when forcing Fortran order, it has to provide three chunks of two
elements each.
for x in np.nditer(upgain[14:], flags=['external_loop'], order='F'):
# x now has x[0],x[1], x[2], x[3], x[4], x[5] elements.
In simplified terms, I think this is what the loops are doing:
upgain=np.array([.1,.2,.3,.4])
u=[]
up=1
for x in upgain:
u1=10*up+x
u.append(u1)
up=u1
producing:
[10.1, 101.2, 1012.3, 10123.4]
np.cumprod([10,10,10,10]) is there, plus a modified cumsum for the [.1,.2,.3,.4] terms. But I can't off hand think of a way of combining these with compiled numpy functions. We could write a custom ufunc, and use its accumulate. Or we could write it in cython (or other c interface).
https://stackoverflow.com/a/27912352 suggests that frompyfunc is a way of writing a generalized accumulate. I don't expect big time savings, maybe 2x.
To use frompyfunc, define:
def foo(x,y):return 10*x+y
The loop application (above) would be
def loopfoo(upgain,u,u1):
for x in upgain:
u1=foo(u1,x)
u.append(u1)
return u
The 'vectorized' version would be:
vfoo=np.frompyfunc(foo,2,1) # 2 in arg, 1 out
vfoo.accumulate(upgain,dtype=object).astype(float)
The dtype=object requirement was noted in the prior SO, and https://github.com/numpy/numpy/issues/4155
In [1195]: loopfoo([1,.1,.2,.3,.4],[],0)
Out[1195]: [1, 10.1, 101.2, 1012.3, 10123.4]
In [1196]: vfoo.accumulate([1,.1,.2,.3,.4],dtype=object)
Out[1196]: array([1.0, 10.1, 101.2, 1012.3, 10123.4], dtype=object)
For this small list, loopfoo is faster (3µs v 21µs)
For a 100 element array, e.g. biggain=np.linspace(.1,1,100), the vfoo.accumulate is faster:
In [1199]: timeit loopfoo(biggain,[],0)
1000 loops, best of 3: 281 µs per loop
In [1200]: timeit vfoo.accumulate(biggain,dtype=object)
10000 loops, best of 3: 57.4 µs per loop
For an even larger biggain=np.linspace(.001,.01,1000) (smaller number to avoid overflow), the 5x speed ratio remains.