I have two points in 3D space:
a = (ax, ay, az)
b = (bx, by, bz)
I want to calculate the distance between them:
dist = sqrt((ax-bx)^2 + (ay-by)^2 + (az-bz)^2)
How do I do this with NumPy? I have:
import numpy
a = numpy.array((ax, ay, az))
b = numpy.array((bx, by, bz))
Use numpy.linalg.norm:
dist = numpy.linalg.norm(a-b)
This works because the Euclidean distance is the l2 norm, and the default value of the ord parameter in numpy.linalg.norm is 2.
For more theory, see Introduction to Data Mining:
Use scipy.spatial.distance.euclidean:
from scipy.spatial import distance
a = (1, 2, 3)
b = (4, 5, 6)
dst = distance.euclidean(a, b)
For anyone interested in computing multiple distances at once, I've done a little comparison using perfplot (a small project of mine).
The first advice is to organize your data such that the arrays have dimension (3, n) (and are C-contiguous obviously). If adding happens in the contiguous first dimension, things are faster, and it doesn't matter too much if you use sqrt-sum with axis=0, linalg.norm with axis=0, or
a_min_b = a - b
numpy.sqrt(numpy.einsum('ij,ij->j', a_min_b, a_min_b))
which is, by a slight margin, the fastest variant. (That actually holds true for just one row as well.)
The variants where you sum up over the second axis, axis=1, are all substantially slower.
Code to reproduce the plot:
import numpy
import perfplot
from scipy.spatial import distance
def linalg_norm(data):
a, b = data[0]
return numpy.linalg.norm(a - b, axis=1)
def linalg_norm_T(data):
a, b = data[1]
return numpy.linalg.norm(a - b, axis=0)
def sqrt_sum(data):
a, b = data[0]
return numpy.sqrt(numpy.sum((a - b) ** 2, axis=1))
def sqrt_sum_T(data):
a, b = data[1]
return numpy.sqrt(numpy.sum((a - b) ** 2, axis=0))
def scipy_distance(data):
a, b = data[0]
return list(map(distance.euclidean, a, b))
def sqrt_einsum(data):
a, b = data[0]
a_min_b = a - b
return numpy.sqrt(numpy.einsum("ij,ij->i", a_min_b, a_min_b))
def sqrt_einsum_T(data):
a, b = data[1]
a_min_b = a - b
return numpy.sqrt(numpy.einsum("ij,ij->j", a_min_b, a_min_b))
def setup(n):
a = numpy.random.rand(n, 3)
b = numpy.random.rand(n, 3)
out0 = numpy.array([a, b])
out1 = numpy.array([a.T, b.T])
return out0, out1
b = perfplot.bench(
setup=setup,
n_range=[2 ** k for k in range(22)],
kernels=[
linalg_norm,
linalg_norm_T,
scipy_distance,
sqrt_sum,
sqrt_sum_T,
sqrt_einsum,
sqrt_einsum_T,
],
xlabel="len(x), len(y)",
)
b.save("norm.png")
I want to expound on the simple answer with various performance notes. np.linalg.norm will do perhaps more than you need:
dist = numpy.linalg.norm(a-b)
Firstly - this function is designed to work over a list and return all of the values, e.g. to compare the distance from pA to the set of points sP:
sP = set(points)
pA = point
distances = np.linalg.norm(sP - pA, ord=2, axis=1.) # 'distances' is a list
Remember several things:
Python function calls are expensive.
[Regular] Python doesn't cache name lookups.
So
def distance(pointA, pointB):
dist = np.linalg.norm(pointA - pointB)
return dist
isn't as innocent as it looks.
>>> dis.dis(distance)
2 0 LOAD_GLOBAL 0 (np)
2 LOAD_ATTR 1 (linalg)
4 LOAD_ATTR 2 (norm)
6 LOAD_FAST 0 (pointA)
8 LOAD_FAST 1 (pointB)
10 BINARY_SUBTRACT
12 CALL_FUNCTION 1
14 STORE_FAST 2 (dist)
3 16 LOAD_FAST 2 (dist)
18 RETURN_VALUE
Firstly - every time we call it, we have to do a global lookup for "np", a scoped lookup for "linalg" and a scoped lookup for "norm", and the overhead of merely calling the function can equate to dozens of python instructions.
Lastly, we wasted two operations on to store the result and reload it for return...
First pass at improvement: make the lookup faster, skip the store
def distance(pointA, pointB, _norm=np.linalg.norm):
return _norm(pointA - pointB)
We get the far more streamlined:
>>> dis.dis(distance)
2 0 LOAD_FAST 2 (_norm)
2 LOAD_FAST 0 (pointA)
4 LOAD_FAST 1 (pointB)
6 BINARY_SUBTRACT
8 CALL_FUNCTION 1
10 RETURN_VALUE
The function call overhead still amounts to some work, though. And you'll want to do benchmarks to determine whether you might be better doing the math yourself:
def distance(pointA, pointB):
return (
((pointA.x - pointB.x) ** 2) +
((pointA.y - pointB.y) ** 2) +
((pointA.z - pointB.z) ** 2)
) ** 0.5 # fast sqrt
On some platforms, **0.5 is faster than math.sqrt. Your mileage may vary.
**** Advanced performance notes.
Why are you calculating distance? If the sole purpose is to display it,
print("The target is %.2fm away" % (distance(a, b)))
move along. But if you're comparing distances, doing range checks, etc., I'd like to add some useful performance observations.
Let’s take two cases: sorting by distance or culling a list to items that meet a range constraint.
# Ultra naive implementations. Hold onto your hat.
def sort_things_by_distance(origin, things):
return things.sort(key=lambda thing: distance(origin, thing))
def in_range(origin, range, things):
things_in_range = []
for thing in things:
if distance(origin, thing) <= range:
things_in_range.append(thing)
The first thing we need to remember is that we are using Pythagoras to calculate the distance (dist = sqrt(x^2 + y^2 + z^2)) so we're making a lot of sqrt calls. Math 101:
dist = root ( x^2 + y^2 + z^2 )
:.
dist^2 = x^2 + y^2 + z^2
and
sq(N) < sq(M) iff M > N
and
sq(N) > sq(M) iff N > M
and
sq(N) = sq(M) iff N == M
In short: until we actually require the distance in a unit of X rather than X^2, we can eliminate the hardest part of the calculations.
# Still naive, but much faster.
def distance_sq(left, right):
""" Returns the square of the distance between left and right. """
return (
((left.x - right.x) ** 2) +
((left.y - right.y) ** 2) +
((left.z - right.z) ** 2)
)
def sort_things_by_distance(origin, things):
return things.sort(key=lambda thing: distance_sq(origin, thing))
def in_range(origin, range, things):
things_in_range = []
# Remember that sqrt(N)**2 == N, so if we square
# range, we don't need to root the distances.
range_sq = range**2
for thing in things:
if distance_sq(origin, thing) <= range_sq:
things_in_range.append(thing)
Great, both functions no-longer do any expensive square roots. That'll be much faster, but before you go further, check yourself: why did sort_things_by_distance need a "naive" disclaimer both times above? Answer at the very bottom (*a1).
We can improve in_range by converting it to a generator:
def in_range(origin, range, things):
range_sq = range**2
yield from (thing for thing in things
if distance_sq(origin, thing) <= range_sq)
This especially has benefits if you are doing something like:
if any(in_range(origin, max_dist, things)):
...
But if the very next thing you are going to do requires a distance,
for nearby in in_range(origin, walking_distance, hotdog_stands):
print("%s %.2fm" % (nearby.name, distance(origin, nearby)))
consider yielding tuples:
def in_range_with_dist_sq(origin, range, things):
range_sq = range**2
for thing in things:
dist_sq = distance_sq(origin, thing)
if dist_sq <= range_sq: yield (thing, dist_sq)
This can be especially useful if you might chain range checks ('find things that are near X and within Nm of Y', since you don't have to calculate the distance again).
But what about if we're searching a really large list of things and we anticipate a lot of them not being worth consideration?
There is actually a very simple optimization:
def in_range_all_the_things(origin, range, things):
range_sq = range**2
for thing in things:
dist_sq = (origin.x - thing.x) ** 2
if dist_sq <= range_sq:
dist_sq += (origin.y - thing.y) ** 2
if dist_sq <= range_sq:
dist_sq += (origin.z - thing.z) ** 2
if dist_sq <= range_sq:
yield thing
Whether this is useful will depend on the size of 'things'.
def in_range_all_the_things(origin, range, things):
range_sq = range**2
if len(things) >= 4096:
for thing in things:
dist_sq = (origin.x - thing.x) ** 2
if dist_sq <= range_sq:
dist_sq += (origin.y - thing.y) ** 2
if dist_sq <= range_sq:
dist_sq += (origin.z - thing.z) ** 2
if dist_sq <= range_sq:
yield thing
elif len(things) > 32:
for things in things:
dist_sq = (origin.x - thing.x) ** 2
if dist_sq <= range_sq:
dist_sq += (origin.y - thing.y) ** 2 + (origin.z - thing.z) ** 2
if dist_sq <= range_sq:
yield thing
else:
... just calculate distance and range-check it ...
And again, consider yielding the dist_sq. Our hotdog example then becomes:
# Chaining generators
info = in_range_with_dist_sq(origin, walking_distance, hotdog_stands)
info = (stand, dist_sq**0.5 for stand, dist_sq in info)
for stand, dist in info:
print("%s %.2fm" % (stand, dist))
(*a1: sort_things_by_distance's sort key calls distance_sq for every single item, and that innocent looking key is a lambda, which is a second function that has to be invoked...)
Another instance of this problem solving method:
def dist(x,y):
return numpy.sqrt(numpy.sum((x-y)**2))
a = numpy.array((xa,ya,za))
b = numpy.array((xb,yb,zb))
dist_a_b = dist(a,b)
Starting Python 3.8, the math module directly provides the dist function, which returns the euclidean distance between two points (given as tuples or lists of coordinates):
from math import dist
dist((1, 2, 6), (-2, 3, 2)) # 5.0990195135927845
And if you're working with lists:
dist([1, 2, 6], [-2, 3, 2]) # 5.0990195135927845
It can be done like the following. I don't know how fast it is, but it's not using NumPy.
from math import sqrt
a = (1, 2, 3) # Data point 1
b = (4, 5, 6) # Data point 2
print sqrt(sum( (a - b)**2 for a, b in zip(a, b)))
A nice one-liner:
dist = numpy.linalg.norm(a-b)
However, if speed is a concern I would recommend experimenting on your machine. I've found that using math library's sqrt with the ** operator for the square is much faster on my machine than the one-liner NumPy solution.
I ran my tests using this simple program:
#!/usr/bin/python
import math
import numpy
from random import uniform
def fastest_calc_dist(p1,p2):
return math.sqrt((p2[0] - p1[0]) ** 2 +
(p2[1] - p1[1]) ** 2 +
(p2[2] - p1[2]) ** 2)
def math_calc_dist(p1,p2):
return math.sqrt(math.pow((p2[0] - p1[0]), 2) +
math.pow((p2[1] - p1[1]), 2) +
math.pow((p2[2] - p1[2]), 2))
def numpy_calc_dist(p1,p2):
return numpy.linalg.norm(numpy.array(p1)-numpy.array(p2))
TOTAL_LOCATIONS = 1000
p1 = dict()
p2 = dict()
for i in range(0, TOTAL_LOCATIONS):
p1[i] = (uniform(0,1000),uniform(0,1000),uniform(0,1000))
p2[i] = (uniform(0,1000),uniform(0,1000),uniform(0,1000))
total_dist = 0
for i in range(0, TOTAL_LOCATIONS):
for j in range(0, TOTAL_LOCATIONS):
dist = fastest_calc_dist(p1[i], p2[j]) #change this line for testing
total_dist += dist
print total_dist
On my machine, math_calc_dist runs much faster than numpy_calc_dist: 1.5 seconds versus 23.5 seconds.
To get a measurable difference between fastest_calc_dist and math_calc_dist I had to up TOTAL_LOCATIONS to 6000. Then fastest_calc_dist takes ~50 seconds while math_calc_dist takes ~60 seconds.
You can also experiment with numpy.sqrt and numpy.square though both were slower than the math alternatives on my machine.
My tests were run with Python 2.6.6.
I find a 'dist' function in matplotlib.mlab, but I don't think it's handy enough.
I'm posting it here just for reference.
import numpy as np
import matplotlib as plt
a = np.array([1, 2, 3])
b = np.array([2, 3, 4])
# Distance between a and b
dis = plt.mlab.dist(a, b)
You can just subtract the vectors and then innerproduct.
Following your example,
a = numpy.array((xa, ya, za))
b = numpy.array((xb, yb, zb))
tmp = a - b
sum_squared = numpy.dot(tmp.T, tmp)
result = numpy.sqrt(sum_squared)
I like np.dot (dot product):
a = numpy.array((xa,ya,za))
b = numpy.array((xb,yb,zb))
distance = (np.dot(a-b,a-b))**.5
With Python 3.8, it's very easy.
https://docs.python.org/3/library/math.html#math.dist
math.dist(p, q)
Return the Euclidean distance between two points p and q, each given
as a sequence (or iterable) of coordinates. The two points must have
the same dimension.
Roughly equivalent to:
sqrt(sum((px - qx) ** 2.0 for px, qx in zip(p, q)))
Having a and b as you defined them, you can use also:
distance = np.sqrt(np.sum((a-b)**2))
Since Python 3.8
Since Python 3.8 the math module includes the function math.dist().
See here https://docs.python.org/3.8/library/math.html#math.dist.
math.dist(p1, p2)
Return the Euclidean distance between two points p1 and p2,
each given as a sequence (or iterable) of coordinates.
import math
print( math.dist( (0,0), (1,1) )) # sqrt(2) -> 1.4142
print( math.dist( (0,0,0), (1,1,1) )) # sqrt(3) -> 1.7321
Here's some concise code for Euclidean distance in Python given two points represented as lists in Python.
def distance(v1,v2):
return sum([(x-y)**2 for (x,y) in zip(v1,v2)])**(0.5)
import math
dist = math.hypot(math.hypot(xa-xb, ya-yb), za-zb)
Calculate the Euclidean distance for multidimensional space:
import math
x = [1, 2, 6]
y = [-2, 3, 2]
dist = math.sqrt(sum([(xi-yi)**2 for xi,yi in zip(x, y)]))
5.0990195135927845
import numpy as np
from scipy.spatial import distance
input_arr = np.array([[0,3,0],[2,0,0],[0,1,3],[0,1,2],[-1,0,1],[1,1,1]])
test_case = np.array([0,0,0])
dst=[]
for i in range(0,6):
temp = distance.euclidean(test_case,input_arr[i])
dst.append(temp)
print(dst)
You can easily use the formula
distance = np.sqrt(np.sum(np.square(a-b)))
which does actually nothing more than using Pythagoras' theorem to calculate the distance, by adding the squares of Δx, Δy and Δz and rooting the result.
import numpy as np
# any two python array as two points
a = [0, 0]
b = [3, 4]
You first change list to numpy array and do like this: print(np.linalg.norm(np.array(a) - np.array(b))). Second method directly from python list as: print(np.linalg.norm(np.subtract(a,b)))
The other answers work for floating point numbers, but do not correctly compute the distance for integer dtypes which are subject to overflow and underflow. Note that even scipy.distance.euclidean has this issue:
>>> a1 = np.array([1], dtype='uint8')
>>> a2 = np.array([2], dtype='uint8')
>>> a1 - a2
array([255], dtype=uint8)
>>> np.linalg.norm(a1 - a2)
255.0
>>> from scipy.spatial import distance
>>> distance.euclidean(a1, a2)
255.0
This is common, since many image libraries represent an image as an ndarray with dtype="uint8". This means that if you have a greyscale image which consists of very dark grey pixels (say all the pixels have color #000001) and you're diffing it against black image (#000000), you can end up with x-y consisting of 255 in all cells, which registers as the two images being very far apart from each other. For unsigned integer types (e.g. uint8), you can safely compute the distance in numpy as:
np.linalg.norm(np.maximum(x, y) - np.minimum(x, y))
For signed integer types, you can cast to a float first:
np.linalg.norm(x.astype("float") - y.astype("float"))
For image data specifically, you can use opencv's norm method:
import cv2
cv2.norm(x, y, cv2.NORM_L2)
Find difference of two matrices first. Then, apply element wise multiplication with numpy's multiply command. After then, find summation of the element wise multiplied new matrix. Finally, find square root of the summation.
def findEuclideanDistance(a, b):
euclidean_distance = a - b
euclidean_distance = np.sum(np.multiply(euclidean_distance, euclidean_distance))
euclidean_distance = np.sqrt(euclidean_distance)
return euclidean_distance
What's the best way to do this with NumPy, or with Python in general? I have:
Well best way would be safest and also the fastest
I would suggest hypot usage for reliable results for chances of underflow and overflow are very little compared to writing own sqroot calculator
Lets see math.hypot, np.hypot vs vanilla np.sqrt(np.sum((np.array([i, j, k])) ** 2, axis=1))
i, j, k = 1e+200, 1e+200, 1e+200
math.hypot(i, j, k)
# 1.7320508075688773e+200
np.sqrt(np.sum((np.array([i, j, k])) ** 2))
# RuntimeWarning: overflow encountered in square
Speed wise math.hypot look better
%%timeit
math.hypot(i, j, k)
# 100 ns ± 1.05 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%%timeit
np.sqrt(np.sum((np.array([i, j, k])) ** 2))
# 6.41 µs ± 33.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Underflow
i, j = 1e-200, 1e-200
np.sqrt(i**2+j**2)
# 0.0
Overflow
i, j = 1e+200, 1e+200
np.sqrt(i**2+j**2)
# inf
No Underflow
i, j = 1e-200, 1e-200
np.hypot(i, j)
# 1.414213562373095e-200
No Overflow
i, j = 1e+200, 1e+200
np.hypot(i, j)
# 1.414213562373095e+200
Refer
The fastest solution I could come up with for large number of distances is using numexpr. On my machine it is faster than using numpy einsum:
import numexpr as ne
import numpy as np
np.sqrt(ne.evaluate("sum((a_min_b)**2,axis=1)"))
If you want something more explicit you can easily write the formula like this:
np.sqrt(np.sum((a-b)**2))
Even with arrays of 10_000_000 elements this still runs at 0.1s on my machine.
Related
I have a time series of vectors: Y = [v1, v2, ..., vn]. At each time t, I want to compute the distance between vector t and the average of the vectors before t. So for example, at t=3 I want to compute the cosine distance between v3 and (v1+v2)/2.
I have a script to do it but wondering if there's any way to do this faster via numpy's convolve feature or something like that?
import numpy
from scipy.spatial.distance import cosine
np.random.seed(10)
# Generate `T` vectors of dimension `vector_dim`
# NOTE: In practice, the vector is a very large column vector!
T = 3
vector_dim = 2
y = [np.random.rand(1, vector_dim)[0] for t in range(T)]
def moving_distance(v):
moving_dists = []
for t in range(len(v)):
if t == 0:
pass
else:
# Create moving average of values up until time t
prior_vals = v[:t]
m_avg = np.add.reduce(prior_vals) / len(prior_vals)
# Now compute distance between this moving average and vector t
moving_dists.append(cosine(m_avg, v[t]))
return moving_dists
d = moving_distance(y)
For this dataset, it should return: [0.3337342770170698, 0.0029993196890111262]
ndarray.cumsum or np.add.accumulate can be used to calculate the cumulative sum:
>>> y
array([[0.77132064, 0.02075195],
[0.63364823, 0.74880388],
[0.49850701, 0.22479665]])
>>> y.cumsum(0)
array([[0.77132064, 0.02075195],
[1.40496888, 0.76955583],
[1.90347589, 0.99435248]])
Therefore, the equivalent code of the function you provide is as follows:
>>> means = y.cumsum(0)[:-1] / np.arange(1, len(y))[:, None]
>>> [cosine(avg, vec) for avg, vec in zip(means, y[1:])]
[0.3337342770170698, 0.0029993196890111262]
Referring to the implementation of cosine, the more vectorized code is as follows:
>>> y_ = y[1:]
>>> uv = (means * y_).mean(1)
>>> uu = (means ** 2).mean(1)
>>> vv = (y_ ** 2).mean(1)
>>> np.clip(np.abs(1 - uv / np.sqrt(uu * vv)), 0, 2)
array([0.33373428, 0.00299932])
TL;DR
This is a much faster approach using NumPy (speedups above ~100x for even modest input sizes like 64x16):
import numpy as np
def cos_dist(a, b, axis=None):
ab = np.sum(a * b, axis=axis)
aa = np.sum(a * a, axis=axis)
bb = np.sum(b * b, axis=axis)
return 1 - (ab / np.sqrt(aa * bb))
def moving_dist_cumsum_np(arr, dist=cos_dist):
return dist(np.cumsum(arr, axis=0)[:-1], arr[1:], axis=1)
which uses a custom definition of cosine distance and is much more efficient than OP's approach as it is fully vectorized.
A slightly faster and more memory efficient (O(1) instead of O(n)) approach involves using Numba-accelerated explicit looping:
import numba as nb
#nb.njit
def cos_dist_nb(a, b):
a = a.ravel()
b = b.ravel()
ab = aa = bb = 0
n = len(a)
for i in range(n):
ab += a[i] * b[i]
aa += a[i] * a[i]
bb += b[i] * b[i]
return 1 - (ab / (aa * bb) ** 0.5)
#nb.njit
def moving_dist_nb(arr, dist=cos_dist_nb):
n, m = arr.shape
result = np.empty(n - 1)
moving = np.zeros(m)
for i in range(n - 1):
moving += arr[i, :]
result[i] = dist(moving, arr[i + 1, :])
return result
Long Answer
The computation delineated in the OP can be further speed up with various optimizations.
OP's code is significantly more complex than needed.
Let us start with an adaptation that essentially just:
renames the main input
exposes the dist function
returns a NumPy array
replaces len(prior_vals) with t as it is the same value by construction
def moving_dist_OP(arr, dist=sp.spatial.distance.cosine):
moving_dists = []
for t in range(len(arr)):
if t == 0:
pass
else:
# Create moving average of values up until time t
prior_vals = arr[:t]
m_avg = np.add.reduce(prior_vals) / t
# Now compute distance between this moving average and vector t
moving_dists.append(dist(m_avg, arr[t]))
return np.array(moving_dists)
Now, this can be further simplified to this:
def moving_dist_simpler(arr, dist=sp.spatial.distance.cosine):
return np.array([dist(np.add.reduce(arr[:t]), arr[t]) for t in range(1, len(arr))])
On the provision that:
the loop appending can be rewritten as a list comprehension
the range can be made to start from 1 rather than skipping
the division by the length (a non-negative number) can be factored out in the cosine distance
This last observation stems from the definition of the cosine distance for two vectors a and b of identical size, where a . b is the dot product of a and b and |a| = √(a . a) is the norm induced by said dot product:
cos_dist(a, b) = 1 - (a . b) / (|a| |b|)
if a is replaced with k * a with k > 0 (and |k| is the absolute value of k), this becomes:
1 - ((k * a) . b) / (|k * a| |b|)
-> 1 - (k * (a . b)) / (|k| |a| |b|)
-> 1 - sign(k) * (a . b) / (|a| |b|)
-> 1 - (a . b) / (|a| |b|)
The np.add.reduce() computation is not very efficient because its values at the next iteration could be computed in terms of the result from the previous iteration, but instead at each iteration an increasing number of numbers are summed up together to perform the computation.
Instead, re-written with partial sums, this becomes:
def moving_dist_part(arr, dist=sp.spatial.distance.cosine):
n, m = arr.shape
moving_dists = []
moving = np.zeros(m)
for i in range(n - 1):
moving += arr[i, :]
moving_dists.append(dist(moving, arr[i + 1]))
return np.array(moving_dists)
It has been already noted (in #MechanicPig's answer) that the np.add.reduce() computation can also be rewritten with np.cumsum(), which is also more efficient than np.add.reduce() and of similar efficiency as the partial sum, but it uses more temporary memory (O(n) for np.cumsum() versus O(1) for partial sums):
def moving_dist_cumsum(arr, dist=sp.spatial.distance.cosine):
movings = np.cumsum(arr, axis=0)[:-1]
return np.array([dist(moving, arr[i]) for i, moving in enumerate(movings, 1)])
It is beneficial to rewrite this either fully vectorized or with simpler loops to be accelerated with Numba.
For the fully vectorized version, np.cumsum() is very helpful as it provides some of the partial computation in vector form.
Unfortunately, scipy.spatial.distance.cosine() does not accept higher dimensional input.
However, based on its definition, it is relatively simple to write a vectorized version of the cosine distance:
def cos_dist(a, b, axis=None):
ab = np.sum(a * b, axis=axis)
aa = np.sum(a * a, axis=axis)
bb = np.sum(b * b, axis=axis)
return 1 - (ab / np.sqrt(aa * bb))
With this, one can define a fully vectorized approach:
def moving_dist_cumsum_np(arr, dist=cos_dist):
return dist(np.cumsum(arr, axis=0)[:-1], arr[1:], axis=1)
Note that the new definition of the cosine distance can be used just about anywhere else scipy.spatial.distance.cosine() was used, e.g.:
def moving_dist_cumsum2(arr, dist=cos_dist):
movings = np.cumsum(arr, axis=0)[:-1]
return np.array([dist(moving, arr[i]) for i, moving in enumerate(movings, 1)])
However, the vectorized version still has the shortcoming of requiring a potentially large (O(n)) temporary object to store the result of np.cumsum().
Fortunately, with a little more adaptation it is possible to write a Numba-accelerated version of this (similar to moving_dist_part()) that does require only O(1) temporary memory:
import numba as nb
#nb.njit
def cos_dist_nb(a, b):
a = a.ravel()
b = b.ravel()
ab = aa = bb = 0
n = len(a)
for i in range(n):
ab += a[i] * b[i]
aa += a[i] * a[i]
bb += b[i] * b[i]
return 1 - (ab / (aa * bb) ** 0.5)
#nb.njit
def moving_dist_nb(arr, dist=cos_dist_nb):
n, m = arr.shape
result = np.empty(n - 1)
moving = np.zeros(m)
for i in range(n - 1):
moving += arr[i, :]
result[i] = dist(moving, arr[i + 1, :])
return result
The above approaches can be benchmarked and plotted with the following (where smaller inputs are tested multiple times for more stable results):
import pandas as pd
import matplotlib.pyplot as plt
def benchmark(
funcs,
args=None,
kws=None,
ii=range(4, 15),
m=16,
kk=1024,
is_equal=np.allclose,
seed=0,
unit="ms",
verbose=True
):
labels = [func.__name__ for func in funcs]
units = {"s": 0, "ms": 3, "µs": 6, "ns": 9}
args = tuple(args) if args else ()
kws = dict(kws) if kws else {}
assert unit in units
np.random.seed(seed)
timings = {}
for i in ii:
n = 2 ** i
k = 1 + i * kk // n
if verbose:
print(f"i={i}, n={n}, m={m}, k={k}")
arrs = np.random.random((k, n, m))
base = np.array([funcs[0](arr, *args, **kws) for arr in arrs])
timings[n] = []
for func in funcs:
res = np.array([func(arr, *args, **kws) for arr in arrs])
is_good = is_equal(base, res)
timed = %timeit -n 1 -r 1 -q -o [func(arr, *args, **kws) for arr in arrs]
timing = timed.best / k
timings[n].append(timing if is_good else None)
if verbose:
print(
f"{func.__name__:>24}"
f" {is_good!s:5}"
f" {timing * (10 ** units[unit]):10.3f} {unit}"
f" {timings[n][0] / timing:5.1f}x")
return timings, labels
def plot(timings, labels, xlabel="Input Size / #", unit="ms"):
n_rows = 1
n_cols = 3
fig, axs = plt.subplots(n_rows, n_cols, figsize=(8 * n_cols, 6 * n_rows), squeeze=False)
units = {"s": 0, "ms": 3, "µs": 6, "ns": 9}
df = pd.DataFrame(data=timings, index=labels).transpose()
base = df[[labels[0]]].to_numpy()
(df * 10 ** units[unit]).plot(marker="o", xlabel=xlabel, ylabel=f"Best timing / {unit}", ax=axs[0, 0])
(df / base * 100).plot(marker='o', xlabel=xlabel, ylabel='Relative speed /labels %', logx=True, ax=axs[0, 1])
(base / df).plot(marker='o', xlabel=xlabel, ylabel='Speed Gain / x', ax=axs[0, 2])
fig.patch.set_facecolor('white')
to be used as:
funcs = moving_dist_OP, moving_dist_simpler, moving_dist_part, moving_dist_cumsum, moving_dist_cumsum2, moving_dist_cumsum_np, moving_dist_nb
timings, labels = benchmark(funcs, unit="ms", verbose=True)
plot(timings, labels, "Benchmarks", unit="ms")
to obtain:
These results indicate that Numba approach is the fastest by far and large, but the vectorized approach is reasonably fast.
When it comes to explicit non-accelerated looping, it is still beneficial to use the custom-defined cos_dist() in place of scipy.spatial.distance.cosine() (see moving_dist_cumsum() vs moving_dist_cumsum2()), while np.cumsum() is reasonably faster than np.add.reduce() but only marginally faster over computing the partial sum. Finally, moving_dist_OP() and moving_dist_simpler() are effectively equivalent (as expected).
I am trying to apply numpy to this code I wrote for trapezium rule integration:
def integral(a,b,n):
delta = (b-a)/float(n)
s = 0.0
s+= np.sin(a)/(a*2)
for i in range(1,n):
s +=np.sin(a + i*delta)/(a + i*delta)
s += np.sin(b)/(b*2.0)
return s * delta
I am trying to get the return value from the new function something like this:
return delta *((2 *np.sin(x[1:-1])) +np.sin(x[0])+np.sin(x[-1]) )/2*x
I am trying for a long time now to make any breakthrough but all my attempts failed.
One of the things I attempted and I do not get is why the following code gives too many indices for array error?
def integral(a,b,n):
d = (b-a)/float(n)
x = np.arange(a,b,d)
J = np.where(x[:,1] < np.sin(x[:,0])/x[:,0])[0]
Every hint/advice is very much appreciated.
You forgot to sum over sin(x):
>>> def integral(a, b, n):
... x, delta = np.linspace(a, b, n+1, retstep=True)
... y = np.sin(x)
... y[0] /= 2
... y[-1] /= 2
... return delta * y.sum()
...
>>> integral(0, np.pi / 2, 10000)
0.9999999979438324
>>> integral(0, 2 * np.pi, 10000)
0.0
>>> from scipy.integrate import quad
>>> quad(np.sin, 0, np.pi / 2)
(0.9999999999999999, 1.1102230246251564e-14)
>>> quad(np.sin, 0, 2 * np.pi)
(2.221501482512777e-16, 4.3998892617845996e-14)
I tried this meanwhile, too.
import numpy as np
def T_n(a, b, n, fun):
delta = (b - a)/float(n) # delta formula
x_i = lambda a,i,delta: a + i * delta # calculate x_i
return 0.5 * delta * \
(2 * sum(fun(x_i(a, np.arange(0, n + 1), delta))) \
- fun(x_i(a, 0, delta)) \
- fun(x_i(a, n, delta)))
Reconstructed the code using formulas at bottom of this page
https://matheguru.com/integralrechnung/trapezregel.html
The summing over the range(0, n+1) - which gives [0, 1, ..., n] -
is implemented using numpy. Usually, you would collect the values using a for loop in normal Python.
But numpy's vectorized behaviour can be used here.
np.arange(0, n+1) gives a np.array([0, 1, ...,n]).
If given as argument to the function (here abstracted as fun) - the function formula for x_0 to x_n
will be then calculated. and collected in a numpy-array. So fun(x_i(...)) returns a numpy-array of the function applied on x_0 to x_n. This array/list is summed up by sum().
The entire sum() is multiplied by 2, and then the function value of x_0 and x_n subtracted afterwards. (Since in the trapezoid formula only the middle summands, but not the first and the last, are multiplied by 2). This was kind of a hack.
The linked German page uses as a function fun(x) = x ^ 2 + 3
which can be nicely defined on the fly by using a lambda expression:
fun = lambda x: x ** 2 + 3
a = -2
b = 3
n = 6
You could instead use a normal function definition, too: defun fun(x): return x ** 2 + 3.
So I tested by typing the command:
T_n(a, b, n, fun)
Which correctly returned:
## Out[172]: 27.24537037037037
For your case, just allocate np.sin tofun and your values for a, b, and n into this function call.
Like:
fun = np.sin # by that eveywhere where `fun` is placed in function,
# it will behave as if `np.sin` will stand there - this is possible,
# because Python treats its functions as first class citizens
a = #your value
b = #your value
n = #your value
Finally, you can call:
T_n(a, b, n, fun)
And it will work!
In the following code I have written 2 methods that theoretically(in my mind) should do the same thing. Unfortunately they don't, I am unable to find out why they don't do the same thing per the numpy documentation.
import numpy as np
dW = np.zeros((20, 10))
y = [1 for _ in range(100)]
X = np.ones((100, 20))
# ===================
# Method 1 (works!)
# ===================
for i in range(len(y)):
dW[:, y[i]] -= X[i]
# ===================
# Method 2 (does not work)
# ===================
dW[:, y] -= X.T
As indicated, in principle you cannot operate multiple times over the same element in a single operation, due to how buffering works in NumPy. For that purpose there is the at function, which can be used on about any standard NumPy function (add, subtract, etc.). For your case, you can do:
import numpy as np
dW = np.zeros((20, 10))
y = [1 for _ in range(100)]
X = np.ones((100, 20))
# at modifies in place dW, does not return a new array
np.subtract.at(dW, (slice(None), y), X.T)
This is a column-wise version of this question.
The answer there can be adapted to work column-wise as follows:
Approach 1: np.<ufunc>.at
>>> np.subtract.at(dW, (slice(None), y), X.T)
Approach 2: np.bincount
>>> m, n = dW.shape
>>> dW -= np.bincount(np.add.outer(np.arange(m) * n, y).ravel(), (X.T).ravel(), dW.size).reshape(m, n)
Please note that the bincount based solution - even though it involves more steps - is faster by a factor of ~6.
>>> from timeit import repeat
>>> kwds = dict(globals=globals(), number=5000)
>>>
>>> repeat('np.subtract.at(dW, (slice(None), y), X.T); np.add.at(dW, (slice(None), y), X.T)', **kwds)
[1.590626839082688, 1.5769231889862567, 1.5802007300080732]
>>> repeat('_= dW; _ -= np.bincount(np.add.outer(np.arange(m) * n, y).ravel(), (X.T).ravel(), dW.size).reshape(m, n); _ += np.bincount(np.add.outer(np.arange(m) * n, y).ravel(), (X.T).ravel(), dW.size).reshape(m, n)', **kwds)
[0.2582490430213511, 0.25572817400097847, 0.25478115503210574]
I found a third solution for this problem. Normal Matrix multiplication:
ind = np.zeros((X.shape[0],dW.shape[1]))
ind[range(X.shape[0]),y] = -1
dW = X.T.dot(ind)
I did some experiment using the proposed methods above on some neural network data. In my example X.shape = (500,3073), W.shape = (3073,10) and ind.shape = (500,10).
The subtract version takes about 0.2 second (slowest). The Matrix multiplication method 0.01 s (fastest). Normal loop 0.015 and then bincountmethod 0.04 s. Note that in the question y is vector of ones. This is not my case. The case with only ones can be solved with a simple sum.
Option 1:
for i in range(len(y)):
dW[:, y[i]] -= X[i]
This works because you are looping through and updating value which was updated last time.
Option 2:
dW[:, [1,1,1,1,....1,1,1]] -= [[1,1,1,1...1],
[1,1,1,1...1],
.
.
[1,1,1,1...1]]
It does not work because update happens to 1st index at same time in parallel not in serial way. Initially all are 0 so subtracting results in -1.
I'm implementing Bayesian Changepoint Detection in Python/NumPy (if you are interested have a look at the paper). I need to compute likelihoods for data in ranges [a, b], where a and b can have all values from 1 to n. However I can prune the computation at some points, so that I don't have to compute every likelihood. On the other hand some likelihoods are used more than once, so that I can save time by saving the values in a matrix P[a, b]. Right now I check whether the value is already computed, whenever I use it, but I find that a bit of a hassle. It looks like this:
# ...
P = np.ones((n, n)) * np.inf # a likelihood can't get inf, so I use it
# as pseudo value
for a in range(n):
for b in range(a, n):
# The following two lines get annoying and error prone if you
# use P more than once
if P[a, b] == np.inf:
P[a, b] = likelihood(data, a, b)
Q[a] += P[a, b] * g[a] * Q[a - 1] # some computation using P[a, b]
# ...
I wonder, whether there is a more intuitive and pythonic way to achieve this, without having the if ... statement before every use of a P[a, b]. Something like an automagical function call if some condition is not met. I could of course make the likelihood function aware of the fact that it could save values, but then it needs some kind of state (e.g. becomes an object). I want to avoid that.
The likelihood function
Since it was asked for in a comment, I add the likelihood function. It actually computes the conjugate prior and then the likelihood. And all in log representation... So it is quite complicated.
from scipy.special import gammaln
def gaussian_obs_log_likelihood(data, t, s):
n = s - t
mean = data[t:s].sum() / n
muT = (n * mean) / (1 + n)
nuT = 1 + n
alphaT = 1 + n / 2
betaT = 1 + 0.5 * ((data[t:s] - mean) ** 2).sum() + ((n)/(1 + n)) * (mean**2 / 2)
scale = (betaT*(nuT + 1))/(alphaT * nuT)
# splitting the PDF of the student distribution up is /much/ faster. (~ factor 20)
prob = 1
for yi in data[t:s]:
prob += np.log(1 + (yi - muT)**2/(nuT * scale))
lgA = gammaln((nuT + 1) / 2) - np.log(np.sqrt(np.pi * nuT * scale)) - gammaln(nuT/2)
return n * lgA - (nuT + 1)/2 * prob
Although I work with Python 2.7, both answers for 2.7 and 3.x are appreciated.
I would use a sibling of defaultdict for this (you can't use defaultdict directly since it won't tell you the key that is missing):
class Cache(object):
def __init__(self):
self.cache = {}
def get(self, a, b):
key = (a,b)
result = self.cache.get(key, None)
if result is None:
result = likelihood(data, a, b)
self.cache[key] = result
return result
Another approach would be using a cache decorator on likelihood as described here.
consider my code
a,b,c = np.loadtxt ('test.dat', dtype='double', unpack=True)
a,b, and c are the same array length.
for i in range(len(a)):
q[i] = 3*10**5*c[i]/100
x[i] = q[i]*math.sin(a)*math.cos(b)
y[i] = q[i]*math.sin(a)*math.sin(b)
z[i] = q[i]*math.cos(a)
I am trying to find all the combinations for the difference between 2 points in x,y,z to iterate this equation (xi-xj)+(yi-yj)+(zi-zj) = r
I use this combination code
for combinations in it.combinations(x,2):
xdist = (combinations[0] - combinations[1])
for combinations in it.combinations(y,2):
ydist = (combinations[0] - combinations[1])
for combinations in it.combinations(z,2):
zdist = (combinations[0] - combinations[1])
r = (xdist + ydist +zdist)
This takes a long time for python for a large file I have and I am wondering if there is a faster way to get my array for r preferably using a nested loop?
Such as
if i in range(?):
if j in range(?):
Since you're apparently using numpy, let's actually use numpy; it'll be much faster. It's almost always faster and usually easier to read if you avoid python loops entirely when working with numpy, and use its vectorized array operations instead.
a, b, c = np.loadtxt('test.dat', dtype='double', unpack=True)
q = 3e5 * c / 100 # why not just 3e3 * c?
x = q * np.sin(a) * np.cos(b)
y = q * np.sin(a) * np.sin(b)
z = q * np.cos(a)
Now, your example code after this doesn't do what you probably want it to do - notice how you just say xdist = ... each time? You're overwriting that variable and not doing anything with it. I'm going to assume you want the squared euclidean distance between each pair of points, though, and make a matrix dists with dists[i, j] equal to the distance between the ith and jth points.
The easy way, if you have scipy available:
# stack the points into a num_pts x 3 matrix
pts = np.hstack([thing.reshape((-1, 1)) for thing in (x, y, z)])
# get squared euclidean distances in a matrix
dists = scipy.spatial.squareform(scipy.spatial.pdist(pts, 'sqeuclidean'))
If your list is enormous, it's more memory-efficient to not use squareform, but then it's in a condensed format that's a little harder to find specific pairs of distances with.
Slightly harder, if you can't / don't want to use scipy:
pts = np.hstack([thing.reshape((-1, 1)) for thing in (x, y, z)])
sqnorms = np.sum(pts ** 2, axis=1)
dists = sqnorms.reshape((-1, 1)) - 2 * np.dot(pts, pts.T) + sqnorms
which basically implements the formula (a - b)^2 = a^2 - 2 a b + b^2, but all vector-like.
Apologies for not posting a full solution, but you should avoid nesting calls to range(), as it will create a new tuple every time it gets called. You are better off either calling range() once and storing the result, or using a loop counter instead.
For example, instead of:
max = 50
for number in range (0, 50):
doSomething(number)
...you would do:
max = 50
current = 0
while current < max:
doSomething(number)
current += 1
Well, the complexity of your calculation is pretty high. Also, you need to have huge amounts of memory if you want to store all r values in a single list. Often, you don't need a list and a generator might be enough for what you want to do with the values.
Consider this code:
def calculate(x, y, z):
for xi, xj in combinations(x, 2):
for yi, yj in combinations(y, 2):
for zi, zj in combinations(z, 2):
yield (xi - xj) + (yi - yj) + (zi - zj)
This returns a generator that computes only one value each time you call the generator's next() method.
gen = calculate(xrange(10), xrange(10, 20), xrange(20, 30))
gen.next() # returns -3
gen.next() # returns -4 and so on