To be clear, below is what I am trying to do. The question is: how can I change the function oper_AB() so that, instead of the nested for loop, it uses numpy vectorization/broadcasting to produce ret_list much faster?
import numpy as np

def oper(a_1D, b_1D):
    return np.dot(a_1D, b_1D) / np.dot(b_1D, b_1D)

def oper_AB(A_2D, B_2D):
    ret_list = []
    for a_1D in A_2D:
        for b_1D in B_2D:
            ret_list.append(oper(a_1D, b_1D))
    return ret_list
Strictly addressing the question (with the reservation that I suspect the OP wants the norm, not the norm squared, as divisor below):
r = a @ b.T / np.linalg.norm(b, axis=1)**2
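If the divisor should indeed be the plain norm rather than its square (my guess, not what the original oper_AB computes), the same one-liner simply drops the **2:

r = a @ b.T / np.linalg.norm(b, axis=1)

Everything below keeps the squared norm, to match the original oper_AB.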
Example:
np.random.seed(0)
a = np.random.randint(0, 10, size=(2,2))
b = np.random.randint(0, 10, size=(2,2))
Then:
>>> a
array([[5, 0],
       [3, 3]])
>>> b
array([[7, 9],
       [3, 5]])
>>> oper_AB(a, b)
[0.2692307692307692,
0.4411764705882353,
0.36923076923076925,
0.7058823529411765]
>>> a @ b.T / np.linalg.norm(b, axis=1)**2
array([[0.26923077, 0.44117647],
       [0.36923077, 0.70588235]])
>>> np.ravel(a @ b.T / np.linalg.norm(b, axis=1)**2)
array([0.26923077, 0.44117647, 0.36923077, 0.70588235])
Speed:
n, m = 1000, 100
a = np.random.uniform(size=(n, m))
b = np.random.uniform(size=(n, m))
orig = %timeit -o oper_AB(a, b)
# 2.73 s ± 11 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
new = %timeit -o np.ravel(a @ b.T / np.linalg.norm(b, axis=1)**2)
# 2.22 ms ± 33.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
orig.average / new.average
# 1228.78 (speedup)
Our solution is 1200x faster than the original.
Correctness:
>>> np.allclose(np.ravel(a @ b.T / np.linalg.norm(b, axis=1)**2), oper_AB(a, b))
True
Speed on a large array, compared to @Ahmed AEK's solution:
n, m = 2000, 2000
a = np.random.uniform(size=(n, m))
b = np.random.uniform(size=(n, m))
new = %timeit -o np.ravel(a @ b.T / np.linalg.norm(b, axis=1)**2)
# 86.5 ms ± 484 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
other = %timeit -o AEK(a, b) # Ahmed AEK's answer
# 102 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Our solution is 15% faster :-)
This should work:
result = (np.matmul(A_2D, B_2D.transpose())/np.sum(B_2D*B_2D,axis=1)).flatten()
But this second implementation will be faster because of better cache utilization:
def oper_AB(A_2D, B_2D):
    b_squared = np.sum(B_2D*B_2D, axis=1).reshape([-1, 1])
    b_normalized = B_2D/b_squared
    del b_squared
    returned_val = np.matmul(A_2D, b_normalized.transpose())
    return returned_val.flatten()
The del is there just in case the memory allocated for B_2D is too big (or maybe it's just me being used to working with multi-GB arrays).
Edit: as requested, for A_1D - B_1D:
def oper2_AB(A_2D, B_2D):
    output = np.zeros([A_2D.shape[0]*B_2D.shape[0], A_2D.shape[1]], dtype=A_2D.dtype)
    for i in range(len(A_2D)):
        output[i*len(B_2D):(i+1)*len(B_2D)] = A_2D[i] - B_2D
    return output
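For reference, a fully broadcast sketch of the same pairwise subtraction (my own variant, assuming the full (n*k, m) temporary fits in memory) could look like this:

def oper2_AB_broadcast(A_2D, B_2D):
    # (n, 1, m) - (1, k, m) broadcasts to (n, k, m); collapsing the first two axes
    # gives the same row order as the loop above (row i*k + j holds A_2D[i] - B_2D[j])
    return (A_2D[:, None, :] - B_2D[None, :, :]).reshape(-1, A_2D.shape[1])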
I have the following 2 arrays:
X = array([37., 42., 31., 27., 37.])
Y = array([52., 57., 62., 68., 69.])
Alternatively, I can combine them into a single array of points like this:
XY = np.array((X, Y)).T
which produces
array([[37., 52.],
       [42., 57.],
       [31., 62.],
       [27., 68.],
       [37., 69.]])
I want to compute the sum of the distances between consecutive points, closing the loop from the last point back to the first.
E.g. I want to do this:
(
np.linalg.norm(np.array([37, 52]) - np.array([42, 57]))
+ np.linalg.norm(np.array([42, 57]) - np.array([31, 62]))
+ np.linalg.norm(np.array([31, 62]) - np.array([27, 68]))
+ np.linalg.norm(np.array([27, 68]) - np.array([37, 69]))
+ np.linalg.norm(np.array([37, 69]) - np.array([37, 52]))
)
which then produces 53.41509195750892
I have written a function that does so:
def distance(X, Y):
    N = len(X)
    T = 0
    oldx, oldy = X[-1], Y[-1]
    for x, y in zip(X, Y):
        T += np.linalg.norm(np.array([x, y]) - np.array([oldx, oldy]))
        oldx = x
        oldy = y
    return T
print(distance(X, Y))
which also produces 53.41509195750891
I'm interested in knowing if there is a more elegant/efficient way to do this with numpy array operations.
EDIT: I'm sorry, the original example function I gave was wrong, now it should be correct
EDIT: Thanks everyone for the answers! Here is a benchmark with my array of size 50, it seems that Dani's answer is the fastest, even though Akshay's answer was faster for the size 5 array.
def distance_charel(X, Y):
    N = len(X)
    T = 0
    oldx, oldy = X[-1], Y[-1]
    for x, y in zip(X, Y):
        T += np.linalg.norm(np.array([x, y]) - np.array([oldx, oldy]))
        oldx = x
        oldy = y
    return T

def distance_dani(X, Y):
    XY = np.array((X, Y)).T
    diff = np.diff(XY, axis=0, prepend=XY[-1].reshape((1, -1)))
    ss = np.power(diff, 2).sum(axis=1)
    res = np.sqrt(ss).sum()
    return res

def distance_akshay(X, Y):
    XY = np.array((X, Y)).T
    pairwise = np.sqrt(np.sum(np.square(np.subtract(XY[:, None, :], XY[None, :, :])), axis=-1))
    total = np.sum(np.diag(pairwise, k=1)) + pairwise[0, -1]
    return total

def distance_gwang(X, Y):
    XY = np.array((X, Y)).T
    return sum([sum((p1 - p2) ** 2) ** .5 for p1, p2 in zip(XY, XY[1:])])

def distane_andy(X, Y):
    arr = np.array((X, Y)).T
    return np.linalg.norm(arr - np.roll(arr, -1, axis=0), axis=1).sum()
then
print(distance_charel(X, Y))
print(distance_dani(X, Y))
print(distance_akshay(X, Y))
print(distance_gwang(X, Y)) # I think it misses the distance between last and first element
print(distane_andy(X, Y))
%timeit distance_charel(X, Y)
%timeit distance_dani(X, Y)
%timeit distance_akshay(X, Y)
%timeit distance_gwang(X, Y)
%timeit distane_andy(X, Y)
outputs
2586.769647563161
2586.76964756316
2586.7696475631597
2568.8811037431624
2586.7696475631597
2.49 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
29.9 µs ± 191 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
385 µs ± 12.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.09 ms ± 4.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
31.2 µs ± 133 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
EDIT: I accepted Dani's answer since I find his code the best for my situation (it uses numpy vector operations, is fairly readable, and is the fastest, by a small margin). Thanks to all of you for answering!
EDIT: I updated the benchmark, using 280 coordinates
You can do it as a completely vectorized one-liner, without ANY loops, using broadcasting -
First, (5,1,2) broadcasted with (1,5,2) -> (5,5,2)
Subtract with this broadcast to get (5,5,2)
Then square each element in the (5,5,2)
Sum over the last axis to get (5,5)
Finally, square root!
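To make the shapes concrete, here is a quick sketch with the 5-point XY array from the question:

XY[:, None, :].shape                       # (5, 1, 2)
XY[None, :, :].shape                       # (1, 5, 2)
(XY[:, None, :] - XY[None, :, :]).shape    # (5, 5, 2) after broadcasting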
Next, you can just take the shifted diagonal, which holds the distances between consecutive points (1,2), (2,3), and so on. Sum that, and since you also want the distance from the last point back to the first, add the value at [0,-1].
# This gets all pairwise distances calculated with broadcasting
pairwise = np.sqrt(np.sum(np.square(np.subtract(XY[:,None,:],XY[None,:,:])), axis=-1))
# This takes the sum of the first (k=1) diagonal instead of the 0th
total = np.sum(np.diag(pairwise,k=1))+pairwise[0,-1]
print(total)
53.41509195750892
Another way you can do this is the following, but the above approach will still be faster -
np.sum(np.sqrt(np.sum(np.square(np.diff(np.vstack([XY,XY[0]]), axis=0)), axis=-1)))
# The np.vstack appends the first coordinate to the end of the array so that
# the distance from the last point back to the first is included
Benchmarks -
Akshay Sehgal - 19.9 µs ± 2.53 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Gwang - 21.5 µs ± 1.01 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
ombk - 60.4 µs ± 5.72 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Dani Mesejo - 16.4 µs ± 6.12 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Andy L - 17.6 µs ± 3.08 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
As expected, numpy vectorization always rules the day! Gj Dani!
You could compute the distance formula d = sqrt((x1 - y1)^2 + (x2 - y2)^2), for each pair of consecutive points x and y, by hand in a vectorized fashion by using diff, power and sqrt:
import numpy as np
# setup
X = np.array([37., 42., 31., 27., 37.])
Y = np.array([52., 57., 62., 68., 69.])
XY = np.array((X, Y)).T
# find the differences, prepend the last value at the front
diff = np.diff(XY, axis=0, prepend=XY[-1].reshape((1, -1)))
# raise to the power of 2 and sum
ss = np.power(diff, 2).sum(axis=1)
# find the square root and sum
res = np.sqrt(ss).sum()
print(res)
Output
53.41509195750891
The first step:
# find the differences, prepend the last value at the front
diff = np.diff(XY, axis=0, prepend=XY[-1].reshape((1, -1)))
computes x1 - y1 and x2 - y2 for each pair of consecutive points x = (x1, x2) and y = (y1, y2); the second step:
# raise to the power of 2 and sum
ss = np.power(diff, 2).sum(axis=1)
raises those values to the power of two, i.e. (x1 - y1)^2 and (x2 - y2)^2, and sums them; finally:
# find the square root and sum
res = np.sqrt(ss).sum()
as the comment says, finds the square root of each sum and adds the results.
To understand it better let's look at a smaller example:
# setup
X = np.array([37., 42.])
Y = np.array([52., 57])
XY = np.array((X, Y)).T
diff = np.diff(XY, axis=0)
# [[5. 5.]] (42 - 37) (57 - 52)
ss = np.power(diff, 2).sum(axis=1)
# [50.] 5^2 + 5^2
res = np.sqrt(ss).sum()
# 7.0710678118654755
You may use np.roll together with np.linalg.norm and sum
#arr = np.stack([X,Y], axis=1)
arr = np.array((X, Y)).T #as suggested in the comment
In [50]: arr
Out[50]:
array([[37., 52.],
       [42., 57.],
       [31., 62.],
       [27., 68.],
       [37., 69.]])
In [52]: np.linalg.norm(arr - np.roll(arr, -1, axis=0), axis=1).sum()
Out[52]: 53.41509195750892
# preparation:
x = np.array([37., 42., 31., 27., 37.])
y = np.array([52., 57., 62., 68., 69.])
xy = np.array((x, y)).T

def euclidean_distance(p1, p2):
    return sum((p1 - p2) ** 2) ** .5
You can do it more elegantly using functional programming.
Here, you want to reduce over the list of pairs of successive elements in xy:
from functools import reduce
from operator import add
reduce(add, [euclidean_distance(p1, p2) for p1, p2 in zip(xy, xy[1:])])
## 36.41509195750892
reduce over a list [1, 2, 3, 4, ..., k]
by applying a dyadic function func(a, b) produces:
func( ... func(func(func(func(1, 2), 3), 4), 5) ..., k).
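For instance, with add as the dyadic function (using the imports above):

reduce(add, [1, 2, 3, 4])
# ((1 + 2) + 3) + 4 == 10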
@DaniMesejo pointed out reduce(add, lst) is just sum(lst).
So it is even much simpler:
sum([euclidean_distance(p1, p2) for p1, p2 in zip(xy, xy[1:])])
The best trick here is actually the zip(xy, xy[1:])
which creates from a list [1, 2, 3, 4, ..., k]
the pairs: [(1, 2), (2, 3), (3, 4), ... (k-1, k)]
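For example:

list(zip([1, 2, 3, 4], [1, 2, 3, 4][1:]))
# [(1, 2), (2, 3), (3, 4)]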
from scipy.spatial.distance import euclidean
X = np.array([37., 42., 31., 27., 37.])
Y = np.array([52., 57., 62., 68., 69.])
XY = np.array((X, Y)).T
sum1 = euclidean(XY[0],XY[-1])
for i in range(len(XY)-1):
    sum1 += euclidean(XY[i], XY[i+1])
This should do it: start the sum with the hardest term, the wrap-around distance from the last point back to the first, then iterate over the easier consecutive pairs and add them all together.
As a check, euclidean(XY[0], XY[1]) = 7.0710678118654755, the same value that you provided.
In [2]: df = pd.DataFrame([[37., 42., 31., 27., 37.],
...: [52., 57., 62., 68., 69.]]).T.rename(columns={0:"X", 1:"y"})
...: df
Out[2]:
X y
0 37.0 52.0
1 42.0 57.0
2 31.0 62.0
3 27.0 68.0
4 37.0 69.0
In [3]: from scipy.spatial.distance import euclidean
...: np.sum([euclidean(df.iloc[i], df.iloc[i+1]) for i in range(len(df)-1)])
Out[3]: 36.41509195750892
I am creating this array for my shader, and this step is very slow because it uses a nested for loop. Currently this method takes approx. 1 second. Can anyone suggest a faster way of creating this array?
import numpy as np
elems = []
b = 23503
a = 24
for i in range(0, a - 1):
    for j in range(0, b - 1):
        elems += [j + b * i, j + b * i + 1, j + b * (i + 1)]
        elems += [j + b * (i + 1), j + b * (i + 1) + 1, j + b * i + 1]
elems = np.array(elems, dtype=np.int32)
First I would recognise that there is a lot of repeated computation. The base term involving the iterator variables here is i*b+j, so let's have NumPy create an array that contains those values in the order they should appear:
ib_j = (np.arange(a-1)[:, None]*b + np.arange(b-1)).flatten()
Next we compute the six different columns from this base, stack them horizontally, and flatten:
def create_shader_array(a, b):
    ib_j = (np.arange(a-1)[:, None]*b + np.arange(b-1)).flatten()
    return np.column_stack((ib_j, ib_j+1, ib_j+b, ib_j+b, ib_j+b+1, ib_j+1)).flatten()
Validation:
>>> all(create_shader_array(a, b) == AKS(a, b)) # AKS is your original implementation
True
Timing:
>>> %timeit AKS(24, 23503)
1.02 s ± 8.25 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit create_shader_array(24, 23503)
28.8 ms ± 364 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
You can use meshgrid to cover the i and j iterations, and then an outer add to apply the six inner offsets. Use ravel at the end to get a 1D array.
inner = np.array([0, 1, b, b, b+1, 1], dtype="int32")
j, i = np.meshgrid(np.arange(b-1), np.arange(a-1))
elems = np.add.outer((j+b*i), inner).ravel()
or with a one-liner:
elems = ([0, 1, b, b, b+1, 1]+np.arange(b-1)[:, None]+b*np.arange(a-1)[:,None, None]).ravel()
Finishes in <6ms on my computer
In [9]: %timeit ([0, 1, b, b, b+1, 1]+np.arange(b-1)[:,None]+b*np.arange(a-1)[:
...: ,None, None]).ravel()
5.23 ms ± 112 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [10]: %timeit create_shader_array(a, b)
29.8 ms ± 176 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
I have the following np.ndarray:
>>> arr
array([[1, 2],
[3, 4]])
I would like to split it to X and y where each array is coordinates and values, respectively. So far I managed to solve this using np.ndenumerate:
>>> X, y = zip(*np.ndenumerate(arr))
>>> X
((0, 0), (0, 1), (1, 0), (1, 1))
>>> y
(1, 2, 3, 4)
I'm wondering if there's a more idiomatic and faster way to achieve it, since the arrays I'm actually dealing with have millions of values.
I need the X and y array to pass them to a sklearn classifier later. The formats above seemed the most natural for me, but perhaps there's a better way I can pass them to the fit function.
Reshaping arr to y is easy: you can achieve it with y = arr.flatten(). I suggest treating the generation of X as a separate task.
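For the 2x2 example above, a quick check of the y part:

arr = np.array([[1, 2], [3, 4]])
arr.flatten()
# array([1, 2, 3, 4])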
Let's assume that your dataset is of shape NxM. In our benchmark we set N to 500 and M to 1000.
N = 500
M = 1000
arr = np.random.randn(N, M)
Then by using np.mgrid and transforming indices you can get the result as:
np.mgrid[:N, :M].transpose(1, 2, 0).reshape(-1, 2)
Benchmarks:
%timeit np.mgrid[:N, :M].transpose(1, 2, 0).reshape(-1, 2)
# 3.11 ms ± 35.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit zip(*np.ndenumerate(arr))
# 235 ms ± 1.57 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In your case you can unpack and get N and M by:
N, M = arr.shape
and then:
X = np.mgrid[:N, :M].transpose(1, 2, 0).reshape(-1, 2)
Use numpy.where with numpy.ravel():
import numpy as np
def ndenumerate(np_array):
    # np_array + 1 has no zeros for the nonnegative values used here,
    # so np.where returns the indices of every element
    return list(zip(*np.where(np_array+1))), np_array.ravel()
arr = np.random.randint(0, 100, (1000,1000))
X_new, y_new = ndenumerate(arr)
X,y = zip(*np.ndenumerate(arr))
Output (validation):
all(i1 == i2 for i1, i2 in zip(X, X_new))
# True
all(y == y_new)
# True
Benchmark (about 3x faster):
%timeit ndenumerate(arr)
# 234 ms ± 20.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit zip(*np.ndenumerate(arr))
# 877 ms ± 91.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
I am trying to compute the number of occurrences of pairs of values. When running the following code, the numpy version (pairs_frequency2) is more than 50% slower than the version relying on collections.Counter (and it gets worse as the number of points increases). Could someone please explain why?
Is there a possible numpy rewrite to achieve better performance?
Thanks in advance.
import numpy as np
from collections import Counter
def pairs_frequency(x, y):
    counts = Counter(zip(x, y))
    res = np.array([[f, a, b] for ((a, b), f) in counts.items()])
    return res[:, 0], res[:, 1], res[:, 2]

def pairs_frequency2(x, y):
    unique, counts = np.unique(np.column_stack((x, y)), axis=0, return_counts=True)
    return counts, unique[:, 0], unique[:, 1]
x = np.random.randint(low=1, high=11, size=50000)
y = x + np.random.randint(1, 5, size=x.size)
%timeit pairs_frequency(x, y)
%timeit pairs_frequency2(x, y)
numpy.unique sorts its argument, so its time complexity is O(n*log(n)). It looks like the Counter class could be O(n).
If the values in your arrays are nonnegative integers that are not too big, this version is pretty fast:
def pairs_frequency3(x, y, maxval=15):
    z = maxval*x + y
    counts = np.bincount(z)
    pos = counts.nonzero()[0]
    ux, uy = np.divmod(pos, maxval)
    return counts[pos], ux, uy
Set maxval to 1 plus the maximum value in x and y. (You could remove the argument and add code to find the maximum inside the function.)
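For instance, a sketch of that variant (the name pairs_frequency3_auto is mine, not from the original code; it assumes nonnegative integer inputs) could be:

def pairs_frequency3_auto(x, y):
    # derive the encoding base from the data instead of passing it in
    maxval = max(x.max(), y.max()) + 1
    z = maxval * x + y
    counts = np.bincount(z)
    pos = counts.nonzero()[0]
    ux, uy = np.divmod(pos, maxval)
    return counts[pos], ux, uy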
Timing (x and y were generated as in the question):
In [13]: %timeit pairs_frequency(x, y)
13.8 ms ± 77.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [14]: %timeit pairs_frequency2(x, y)
32.9 ms ± 631 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [15]: %timeit pairs_frequency3(x, y)
129 µs ± 1.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Note the change in time units of the third result.
pairs_frequency3 returns the arrays in the same order as pairs_frequency2, so it is easy to verify that they return the same values:
In [26]: counts2, x2, y2 = pairs_frequency2(x, y)
In [27]: counts3, x3, y3 = pairs_frequency3(x, y)
In [28]: np.all(counts2 == counts3) and np.all(x2 == x3) and np.all(y2 == y3)
Out[28]: True