Distance formula between two points in a list - python

I need to take a list I have created and find the closest two points and print them out. How can I go about comparing each point in the list?
There isn't any need to plot or anything, just compare the points and find the closest two in the list.
import math # 'math' needed for 'sqrt'
# Distance function
def distance(xi,xii,yi,yii):
    sq1 = (xi-xii)*(xi-xii)
    sq2 = (yi-yii)*(yi-yii)
    return math.sqrt(sq1 + sq2)
# Run through input and reorder in [(x, y), (x,y) ...] format
oInput = ["9.5 7.5", "10.2 19.1", "9.7 10.2"] # Original input list (entered by spacing the two points).
mInput = [] # Manipulated list
fList = [] # Final list
for o in oInput:
    mInput = o.split()
    x,y = float(mInput[0]), float(mInput[1])
    fList += [(x, y)] # outputs [(9.5, 7.5), (10.2, 19.1), (9.7, 10.2)]

It is more convenient to rewrite your distance() function to take two (x, y) tuples as parameters:
def distance(p0, p1):
    return math.sqrt((p0[0] - p1[0])**2 + (p0[1] - p1[1])**2)
Now you want to iterate over all pairs of points from your list fList. The function itertools.combinations() is handy for this purpose:
import itertools
min_distance = distance(fList[0], fList[1])
for p0, p1 in itertools.combinations(fList, 2):
    min_distance = min(min_distance, distance(p0, p1))
An alternative is to define distance() to accept the pair of points in a single parameter
def distance(points):
    p0, p1 = points
    return math.sqrt((p0[0] - p1[0])**2 + (p0[1] - p1[1])**2)
and use the key parameter to the built-in min() function:
min_pair = min(itertools.combinations(fList, 2), key=distance)
min_distance = distance(min_pair)
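Putting the pieces together, a minimal self-contained sketch could look like the following. It assumes Python 3 and the input format from the question; the helper names parse_points, pair_distance and closest_pair are made up for illustration.

```python
import itertools
import math

def parse_points(lines):
    """Turn strings like "9.5 7.5" into (x, y) float tuples."""
    return [tuple(float(v) for v in line.split()) for line in lines]

def pair_distance(pair):
    """Euclidean distance between the two points of a pair."""
    p0, p1 = pair
    return math.hypot(p0[0] - p1[0], p0[1] - p1[1])

def closest_pair(points):
    """Brute force: check every unordered pair and keep the nearest one."""
    return min(itertools.combinations(points, 2), key=pair_distance)

points = parse_points(["9.5 7.5", "10.2 19.1", "9.7 10.2"])
pair = closest_pair(points)
print(pair, pair_distance(pair))
```

This checks all n*(n-1)/2 pairs, which is fine for short lists like the one in the question.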

I realize that there are library constraints on this question, but for completeness if you have N points in an Nx2 numpy ndarray (2D system):
import numpy
from scipy.spatial.distance import pdist
x = numpy.array([[9.5,7.5],[10.2,19.1],[9.7,10.2]])
mindist = numpy.min(pdist(x))
I always try to encourage people to use numpy/scipy if they are dealing with data that is best stored in a numerical array and it's good to know that the tools are out there for future reference.
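Since the question asks for the two points themselves and not only the distance, here is a hedged sketch of how the pair could be recovered from the pdist output. It relies on pdist returning the distances in the order (0,1), (0,2), ..., (1,2), ..., which matches the upper-triangle indices from numpy.triu_indices; the variable names are my own.

```python
import numpy as np
from scipy.spatial.distance import pdist

x = np.array([[9.5, 7.5], [10.2, 19.1], [9.7, 10.2]])

condensed = pdist(x)                          # one entry per unordered pair
rows, cols = np.triu_indices(len(x), k=1)     # pair indices in the same order
k = np.argmin(condensed)                      # position of the smallest distance
i, j = rows[k], cols[k]

print(x[i], x[j], condensed[k])
```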

Note that the math.sqrt function is both slow and, in this case, unnecessary. Try comparing the distance squared to speed it up (sorting distances vs. distance squared will always produce the same ordering):
def distSquared(p0, p1):
    return (p0[0] - p1[0])**2 + (p0[1] - p1[1])**2
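To illustrate the ordering claim, this small sketch sorts all pairs once by squared distance and once by true distance and compares the results. The name dist_squared and the sample points are only for illustration.

```python
import itertools
import math

def dist_squared(p0, p1):
    return (p0[0] - p1[0])**2 + (p0[1] - p1[1])**2

points = [(9.5, 7.5), (10.2, 19.1), (9.7, 10.2)]
pairs = list(itertools.combinations(points, 2))

# The square root is monotonic, so both keys give the same ordering.
by_squared = sorted(pairs, key=lambda p: dist_squared(*p))
by_distance = sorted(pairs, key=lambda p: math.sqrt(dist_squared(*p)))

print(by_squared == by_distance)
print(by_squared[0])  # the closest pair, found without any sqrt call
```

Take the square root only once at the end, if you need the actual distance of the winning pair.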

This might work:
import math
oInput = ["9.5 7.5", "10.2 19.1", "9.7 10.2"]
# parse inputs
inp = [(float(j[0]), float(j[1])) for j in [i.split() for i in oInput]]
# initialize results with a really large value
min_distance = float('infinity')
min_pair = None
# loop over inputs
length = len(inp)
for i in range(length):
    for j in range(i+1, length):
        point1 = inp[i]
        point2 = inp[j]
        d = math.hypot(point1[0] - point2[0], point1[1] - point2[1])
        if d < min_distance:
            # remember the best distance so far, not just the pair
            min_distance = d
            min_pair = [point1, point2]
Once the loops are done, min_pair holds the pair with the smallest distance.
Using float() to parse the text leaves room for improvement.
math.hypot is about a third faster than calculating the distance in a handwritten Python function.

Your fixed code. This is not an efficient algorithm, just the brute-force one.
import itertools
import math # math needed for sqrt
# distance function
def dist(p1, p2):
    return math.sqrt((p2[0] - p1[0]) ** 2 + (p2[1] - p1[1]) ** 2)
# run through input and reorder in [(x, y), (x,y) ...] format
lines = ["9.5 7.5", "10.2 19.1", "9.7 10.2"] # original input list (entered by spacing the two points)
points = [tuple(map(float, point.split())) for point in lines] # final list
# http://en.wikipedia.org/wiki/Closest_pair_of_points
mindist = float("inf")
for p1, p2 in itertools.combinations(points, 2):
    d = dist(p1, p2)
    if d < mindist:
        mindist = d
        closestpair = (p1, p2)
print(closestpair)

First, some notes:
a**2 # squares a
(xi - xii)**2 # squares the expression in parentheses.
mInput doesn't need to be declared in advance.
fList.append((x, y)) is more pythonic than using +=.
Now you have fList. Your distance function can be rewritten to take 2 2-tuple (point) arguments, which I won't bother with here.
Then you can just write:
shortest = float('inf')
for pair in itertools.combinations(fList, 2):
    shortest = min(shortest, distance(*pair))

Many of the answers above compute the square root with math.sqrt. An alternative that needs no import comes from a basic concept from school: the square root of any non-negative number x can be written as a power of one-half, x^(1/2). A fractional exponent indicates that a root is to be taken.
So rather than using math.sqrt((p0[0] - p1[0])**2 + (p0[1] - p1[1])**2)
use
def distance(a,b):
    euclidean_distance = ((b[0]-a[0])**2 + (b[1]-a[1])**2)**0.5
    return euclidean_distance
Hope it helps

Related

how to calculate the distances between all datapoints among each other

I want to check which data points within X are close to each other and which are far apart, by calculating the distances between them without getting zero. Is it possible?
X = np.random.rand(20, 10)
dist = (X - X) ** 2
print(X)
Using just numpy you can either do,
np.linalg.norm((X - X[:,None]),axis=-1)
or,
np.sqrt(np.square(X - X[:,None]).sum(-1))
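As a rough sketch of how the broadcasting version can be used to answer the "without getting zero" part: the diagonal of the result holds each point's distance to itself, so one option (my assumption about what is wanted) is to mask it with infinity before looking for the closest pair. The seeded generator is only there to make the example repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((20, 10))

# X has shape (20, 10) and X[:, None] has shape (20, 1, 10), so the
# difference broadcasts to (20, 20, 10): one row vector per pair of points.
dists = np.linalg.norm(X - X[:, None], axis=-1)

# Ignore the zero self-distances on the diagonal.
masked = dists.copy()
np.fill_diagonal(masked, np.inf)
i, j = np.unravel_index(np.argmin(masked), masked.shape)

print(i, j, masked[i, j])  # indices of the closest two distinct points
```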
Another possible solution:
from scipy.spatial.distance import cdist
X = np.random.rand(20, 10)
cdist(X, X)
You can go through each point in sequence:
X = np.random.rand(20, 10)
no_points = X.shape[0]
distances = np.zeros((no_points, no_points))
for i in range(no_points):
    for j in range(no_points):
        distances[i, j] = np.linalg.norm(X[i, :] - X[j, :])
print(distances,np.max(distances))
I would assume you want some way of keeping track of the distances, correct? If so, you can build a dictionary with the distances as keys and, as values, lists of the point pairs at that distance. Then you just iterate through the keys in ascending order to get the distances from least to greatest along with the corresponding points. One way to do so is to brute-force every possible pair of points.
import numpy as np
dist = dict()
X = np.random.rand(20, 10)
for indexOfNumber1 in range(len(X) - 1):
    # start after indexOfNumber1 so no point is paired with itself
    for indexOfNumber2 in range(indexOfNumber1 + 1, len(X)):
        distance = np.sqrt(np.sum((X[indexOfNumber1] - X[indexOfNumber2])**2))
        pair = (tuple(X[indexOfNumber1]), tuple(X[indexOfNumber2]))
        if distance not in dist:
            dist[distance] = [pair]
        else:
            dist[distance].append(pair)
The code above will then have a dictionary dist that contains all of the possible distances from the points you are looking at and the corresponding points that achieve that distance.

Check if values are inside a specific area around a predefined linear function

My research on solving my issue was unfortunately unsuccessful and I hope you can help me. I have defined the following linear function for a straight line
x = [298358.3258395831, 298401.1779180078]
y = [5625243.628060675, 5625347.074197255]
m, b = np.polyfit(x, y, 1)
and I want to check, if values in an array are inside an area around this function. The area around the function could look like this:
I couldn't find a way to create an area around this straight line, so I couldn't check whether the points in the array are inside or outside of it.
Thanks in advance!
For a line given by the equation ax + by + c = 0, the distance from a point A = (x_a,y_a) to this line is given by the following formula :
dist = np.abs(a * x_a + b * y_a + c) / np.sqrt(a**2 + b**2)
Source here.
That way, if you have an array of points and a threshold above which you consider your points to be too far away from your line, you can simply do :
array_points = ... # Format : [[x_1,y_1], [x_2,y_2],...]
a, b, c = ... # Your line's parameters here
thresh = 1e-2 # For example
def is_close_line(array, threshold):
    array_dist = np.abs(a * array[:,0] + b * array[:,1] + c) / np.sqrt(a**2 + b**2)
    return (array_dist < threshold)
is_close_line(array_points, thresh) will then output a boolean array, where the i-th item indicates whether or not the i-th element of array_points is close to your line.
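To connect this with the slope and intercept from the question: a line y = m*x + intercept can be rewritten as m*x - y + intercept = 0, which gives a = m, b = -1 and c = intercept. Here is a small sketch under that assumption. I renamed the question's b to intercept to avoid clashing with the coefficient b, and the two test points and the threshold of 5 are arbitrary choices.

```python
import numpy as np

x = [298358.3258395831, 298401.1779180078]
y = [5625243.628060675, 5625347.074197255]
m, intercept = np.polyfit(x, y, 1)

# y = m*x + intercept  <=>  m*x - y + intercept = 0
a, b, c = m, -1.0, intercept

def is_close_line(points, threshold):
    """Boolean mask: True where a point is nearer to the line than threshold."""
    dist = np.abs(a * points[:, 0] + b * points[:, 1] + c) / np.sqrt(a**2 + b**2)
    return dist < threshold

pts = np.array([
    [298359.3258395831, 5625243.628060675],   # 1 unit to the right of the first line point
    [298368.3258395831, 5625243.628060675],   # 10 units to the right of it
])
print(is_close_line(pts, 5.0))
```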
A possible solution could be:
Take a distance and project it onto the x axis
Build two new lines by shifting your line according to the distance projection
Compare a new point with the so-built lines
Here is some sample code (note that m=0 must be handled differently):
def near_line(point, dist, m, b):
    # Data preparation
    x, y = point
    dist = abs(dist)
    if m != 0:
        # Case sloped line: sort the bounds so that a negative slope also works
        dist_projection = dist/np.sin(np.arctan(abs(m)))
        lower, upper = sorted((m*(x-dist_projection)+b, m*(x+dist_projection)+b))
        return lower < y < upper
    else:
        # Case horizontal line
        return b-dist < y < b+dist
print( near_line([298359, 5625244], dist=5, m=m, b=b) )
print( near_line([298400, 5625250], dist=5, m=m, b=b) )
Out:
True
False
My answer is based on this post where the concept of cross-product and norm are considered. This solution applies also to an infinite line like yours, where it's constructed starting from two points.
import numpy as np
def dist_array_line(a, l1, l2, threshold):
    """
    a : numpy.array
        Array of points of shape (M,2)
        M is the total number of points to test
    l1 : numpy.array
        Array of shape (2,) indicating the first point standing on the line
    l2 : numpy.array
        Array of shape (2,) indicating the second point standing on the line
    threshold : float
        Maximum distance allowed between line and points
    Returns
    numpy.array
        Array of shape (M,) with True/False indicating whether the points in `a`
        are within/outside the rectangle around the line
    """
    distances = np.abs(np.cross(a - l1, a - l2)) / np.linalg.norm(l1 - l2)
    return (distances < threshold)
If you want to return the actual distances instead of a True/False array, just make the function return the distances object.
Example
# points on the line
p1 = np.array([298358.3258395831, 5625243.628060675])
p2 = np.array([298401.1779180078, 5625347.074197255])
# array of points to test
my_arr = np.array([
[298359.3258395831, 5625243.628060675],
[298368.3258395831, 5625243.628060675],
[(p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2]
])
dist_array_line(my_arr, p1, p2, threshold=5.)
# array([ True, False, True])

Python: Intersection of spheres

I am extremely new to programming but I decided to take on an interesting project as I recently learnt how to represent a sphere in parametric form. When three spheres intersect, there are two distinct points of intersection unless the spheres touch at only a single point.
Parametric representation of a sphere:
The code I have is modified from the answer from Python/matplotlib : plotting a 3d cube, a sphere and a vector?, adding the ability to dictate the x, y and z origin and the radius of the sphere. Many similar questions were written in C++, Java, and C#, which I cannot understand at all (I barely know what I am doing so go easy on me).
My Code:
import numpy as np
def make_sphere_x(x, radius):
    u, v = np.mgrid[0:2 * np.pi:5000j, 0:np.pi:2500j]
    x += radius * np.cos(u) * np.sin(v)
    return x
def make_sphere_y(y, radius):
    u, v = np.mgrid[0:2 * np.pi:5000j, 0:np.pi:2500j]
    y += radius * np.sin(u) * np.sin(v)
    return y
def make_sphere_z(z, radius):
    u, v = np.mgrid[0:2 * np.pi:5000j, 0:np.pi:2500j]
    z += radius * np.cos(v)
    return z
#x values
sphere_1_x = make_sphere_x(0, 2)
sphere_2_x = make_sphere_x(1, 3)
sphere_3_x = make_sphere_x(-1, 4)
#y values
sphere_1_y = make_sphere_y(0, 2)
sphere_2_y = make_sphere_y(1, 3)
sphere_3_y = make_sphere_y(0, 4)
#z values
sphere_1_z = make_sphere_z(0, 2)
sphere_2_z = make_sphere_z(1, 3)
sphere_3_z = make_sphere_z(-2, 4)
#intercept of x-values
intercept_x = list(filter(lambda x: x in sphere_1_x, sphere_2_x))
intercept_x = list(filter(lambda x: x in intercept_x, sphere_3_x))
print(intercept_x)
Problems:
Clearly there must be a better way of finding the intercepts. Right now, the code generates points at equal intervals, with the number of intervals I specify under the imaginary number in np.mgrid. If this is increased, the chances of an intersection should increase (I think) but when I try to increase it to 10000j or above, it just spits a memory error.
There are obvious gaps in the array and this method would most likely be erroneous even if I have access to a super computer and can crank up the value to an obscene value. Right now the code results in a null set.
The code is extremely inefficient, not that this is a priority but people like things in threes right?
Feel free to flame me for rookie mistakes in coding or asking questions on Stack Overflow. Your help is greatly valued.
Using scipy.optimize.fsolve you can find the root of a given function, given an initial guess that is somewhere in the range of your solution. I used this approach to solve your problem and it seems to work for me. The only downside is that it only provides you one intersection. To find the second one you would have to tinker with the initial conditions until fsolve finds the second root.
First we define our spheres by defining (arbitrary) radii and centers for each sphere:
import numpy as np
from scipy.optimize import fsolve
a1 = np.array([0,0,0])
r1 = .4
a2 = np.array([.3,0,0])
r2 = .5
a3 = np.array([0,.3,0])
r3 = .5
We then define how to transform back into cartesian coordinates, given angles u,v
def position(a,r,u,v):
    return a + r*np.array([np.cos(u)*np.sin(v),np.sin(u)*np.sin(v),np.cos(v)])
Now we think about what equation we need to find the root of. For any intersection point, it holds that for perfect u1,v1,u2,v2,u3,v3 the positions position(a1,r1,u1,v1) = position(a2,r2,u2,v2) = position(a3,r3,u3,v3) are equal. We thus find three equations which must be zeros, namely the differences of two position vectors. In fact, as every vector has 3 components, we have 9 equations which is more than enough to determine our 6 variables.
We find the function to minimize as:
def f(args):
    u1,v1,u2,v2,u3,v3,_,_,_ = args
    pos1 = position(a1,r1,u1,v1)
    pos2 = position(a2,r2,u2,v2)
    pos3 = position(a3,r3,u3,v3)
    return np.array([pos1 - pos2, pos1 - pos3, pos2 - pos3]).flatten()
fsolve needs the same number of inputs and outputs. As we have 9 equations but only 6 variables, I simply added 3 dummy variables so the dimensions match. Flattening the array in the last line is necessary as fsolve only accepts 1D arrays.
Now the intersection can be found using fsolve and a (pretty random) guess:
guess = np.array([np.pi/4,np.pi/4,np.pi/4,np.pi/4,np.pi/4,np.pi/4,0,0,0])
x0 = fsolve(f,guess)
u1,v1,u2,v2,u3,v3,_,_,_ = x0
You can check that the result is correct by plugging the angles you received into the position function.
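If you want an independent value to compare the numerical result against, one option is the closed-form trilateration construction. This is a different technique from the fsolve approach above, and the function name trilaterate is my own. The sketch assumes the three centres are not collinear and returns both intersection points at once.

```python
import numpy as np

def trilaterate(c1, r1, c2, r2, c3, r3):
    """Return the two intersection points of three spheres, or None if they do not meet."""
    c1, c2, c3 = (np.asarray(c, dtype=float) for c in (c1, c2, c3))
    # Local orthonormal frame: ex points from c1 to c2, ey lies in the plane of the centres.
    d = np.linalg.norm(c2 - c1)
    ex = (c2 - c1) / d
    i = np.dot(ex, c3 - c1)
    ey = c3 - c1 - i * ex
    ey = ey / np.linalg.norm(ey)
    ez = np.cross(ex, ey)
    j = np.dot(ey, c3 - c1)
    # Coordinates of the intersection in that frame.
    x = (r1**2 - r2**2 + d**2) / (2 * d)
    y = (r1**2 - r3**2 + i**2 + j**2) / (2 * j) - i * x / j
    z_sq = r1**2 - x**2 - y**2
    if z_sq < 0:
        return None  # the three spheres have no common point
    z = np.sqrt(z_sq)
    base = c1 + x * ex + y * ey
    return base + z * ez, base - z * ez

# The spheres from the answer above
a1, r1 = np.array([0, 0, 0]), .4
a2, r2 = np.array([.3, 0, 0]), .5
a3, r3 = np.array([0, .3, 0]), .5
p_plus, p_minus = trilaterate(a1, r1, a2, r2, a3, r3)
print(p_plus, p_minus)
```

For these particular spheres the two points sit directly above and below the first centre, at a distance of r1 along the z axis.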
The problem would be better tackled using trigonometry.
Reducing the problem into 2D circles, we could do:
import math
import numpy
class Circle():
    def __init__(self, cx, cy, r):
        """initialise Circle and set main properties"""
        self.centre = numpy.array([cx, cy])
        self.radius = r
    def find_intercept(self, c2):
        """find the intercepts between the current Circle and a second c2"""
        #Find the distance between the circles
        s = c2.centre - self.centre
        self.dx, self.dy = s
        self.d = math.sqrt(numpy.sum(s**2))
        #Test if there is an overlap. Note: this won't detect if one circle completely surrounds the other.
        if self.d > (self.radius + c2.radius):
            print("no interaction")
        else:
            #trigonometry
            self.theta = math.atan2(self.dy,self.dx)
            #cosine rule
            self.cosA = (c2.radius**2 - self.radius**2 + self.d**2)/(2*c2.radius*self.d)
            self.A = math.acos(self.cosA)
            self.Ia = c2.centre - [math.cos(self.A+self.theta)*c2.radius, math.sin(self.A+self.theta)*c2.radius]
            self.Ib = c2.centre - [math.cos(self.A-self.theta)*c2.radius,-math.sin(self.A-self.theta)*c2.radius]
            print("Interaction points are : ", self.Ia, " and: ", self.Ib)
#define two arbitrary circles
c1 = Circle(2,5,5)
c2 = Circle(1,6,4)
#find the intercepts
c1.find_intercept(c2)
#test results by reversing the operation
c2.find_intercept(c1)
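For comparison, here is a compact function-style sketch of the same 2D idea that also handles the case where one circle completely surrounds the other. It uses the common "distance along the centre line plus perpendicular offset" formulation instead of angles; the function name circle_intersections is hypothetical.

```python
import math

def circle_intersections(c1, r1, c2, r2):
    """Return a list of intersection points of two circles (empty if none)."""
    dx, dy = c2[0] - c1[0], c2[1] - c1[1]
    d = math.hypot(dx, dy)
    # No intersection: same centre, too far apart, or one circle inside the other.
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return []
    a = (r1**2 - r2**2 + d**2) / (2 * d)       # distance from c1 to the chord
    h = math.sqrt(max(r1**2 - a**2, 0.0))      # half the chord length
    mx, my = c1[0] + a * dx / d, c1[1] + a * dy / d
    return [(mx + h * dy / d, my - h * dx / d),
            (mx - h * dy / d, my + h * dx / d)]

# The two circles from the example above
points = circle_intersections((2, 5), 5, (1, 6), 4)
print(points)
```

When the circles are tangent, the two returned points coincide.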

Optimizing by translation to map one x,y set of points onto another

I have a list of x,y ideal points, and a second list of x,y measured points. The latter has some offset and some noise.
I am trying to "fit" the latter to the former. So, extract the x,y offset of the latter relative to the former.
I'm following some examples of scipy.optimize.leastsq, but having trouble getting it working. Here is my code:
import random
import numpy as np
from scipy import optimize
# Generate fake data. Goal: Get back dx=0.1, dy=0.2 at the end of this exercise
dx = 0.1
dy = 0.2
# "Actual" (ideal) data.
xa = np.array([0,0,0,1,1,1])
ya = np.array([0,1,2,0,1,2])
# "Measured" (non-ideal) data. Add the offset and some randomness.
xm = map(lambda x: x + dx + random.uniform(0,0.01), xa)
ym = map(lambda y: y + dy + random.uniform(0,0.01), ya)
# Plot each
plt.figure()
plt.plot(xa, ya, 'b.', xm, ym, 'r.')
# The error function.
#
# Args:
# translations: A list of xy tuples, each xy tuple holding the xy offset
# between 'coords' and the ideal positions.
# coords: A list of xy tuples, each xy tuple holding the measured (non-ideal)
# coordinates.
def errfunc(translations, coords):
    sum = 0
    for t, xy in zip(translations, coords):
        dx = t[0] + xy[0]
        dy = t[1] + xy[1]
        sum += np.sqrt(dx**2 + dy**2)
    return sum
translations, coords = [], []
for xxa, yya, xxm, yym in zip(xa, ya, xm, ym):
    t = (xxm-xxa, yym-yya)
    c = (xxm, yym)
    translations.append(t)
    coords.append(c)
translation_guess = [0.05, 0.1]
out = optimize.leastsq(errfunc, translation_guess, args=(translations, coords), full_output=1)
print out
I get the error:
errfunc() takes exactly 2 arguments (3 given)
I'm not sure why it says 3 arguments as I only gave it two. Can anyone help?
====
ANSWER:
I was thinking about this wrong. All I have to do is to take the average of the dx and dy's -- that gives the correct result.
n = xa.shape[0]
dx = -np.sum(xa - xm) / n
dy = -np.sum(ya - ym) / n
print dx, dy
The scipy.optimize.leastsq assumes that the function you are using already has one input, x0, the initial guess. Any other additional inputs are then listed in args.
So you are sending three arguments: translation_guess, translations, and coords.
Note that here it specifies that args are "extra arguments."
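As a hedged illustration of that calling convention, here is a small Python 3 sketch in which the residual function takes the parameters being fitted first and the extra data via args. The function name residuals, the array layout and the seeded noise are my own choices, not the asker's code.

```python
import numpy as np
from scipy import optimize

def residuals(offset, ideal, measured):
    # leastsq passes the current parameter estimate first;
    # ideal and measured come from the args tuple.
    return (measured - (ideal + offset)).ravel()

ideal = np.array([[0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2]], dtype=float)
rng = np.random.default_rng(0)
measured = ideal + np.array([0.1, 0.2]) + rng.uniform(0, 0.01, ideal.shape)

offset, ier = optimize.leastsq(residuals, [0.05, 0.1], args=(ideal, measured))
print(offset)                           # close to (0.1, 0.2) plus the mean noise
print((measured - ideal).mean(axis=0))  # the plain average gives the same answer
```

Because the model is a pure translation, the least-squares solution coincides with the mean of the per-point offsets, which matches the asker's own conclusion.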
Okay, I think I understand now. You have the actual locations and the measured locations and you want to figure out the constant offset, but there is noise on each pair. Correct me if I'm wrong:
xy = tuple with coordinates of measured point
t = tuple with measured offset (constant + noise)
The actual coordinates of a point are (xy - t) then?
If so, then we think it should be measured at (xy - t + guess).
If so, then our error is (xy - t + guess - xy) = (guess - t)
Where it is measured doesn't even matter! We just want to find the guess that is closest to all of the measured translations:
def errfunc(guess, translations):
    errx = 0
    erry = 0
    for t in translations:
        errx += guess[0] - t[0]
        erry += guess[1] - t[1]
    return errx,erry
What do you think? Does that make sense or did I miss something?

Using combinations or another trick to iterate through 3 different arrays?

consider my code
a,b,c = np.loadtxt ('test.dat', dtype='double', unpack=True)
a,b, and c are the same array length.
for i in range(len(a)):
    q[i] = 3*10**5*c[i]/100
    x[i] = q[i]*math.sin(a)*math.cos(b)
    y[i] = q[i]*math.sin(a)*math.sin(b)
    z[i] = q[i]*math.cos(a)
I am trying to find all the combinations for the difference between 2 points in x,y,z to iterate this equation (xi-xj)+(yi-yj)+(zi-zj) = r
I use this combination code
for combinations in it.combinations(x,2):
    xdist = (combinations[0] - combinations[1])
    for combinations in it.combinations(y,2):
        ydist = (combinations[0] - combinations[1])
        for combinations in it.combinations(z,2):
            zdist = (combinations[0] - combinations[1])
            r = (xdist + ydist +zdist)
This takes a long time for python for a large file I have and I am wondering if there is a faster way to get my array for r preferably using a nested loop?
Such as
if i in range(?):
if j in range(?):
Since you're apparently using numpy, let's actually use numpy; it'll be much faster. It's almost always faster and usually easier to read if you avoid python loops entirely when working with numpy, and use its vectorized array operations instead.
a, b, c = np.loadtxt('test.dat', dtype='double', unpack=True)
q = 3e5 * c / 100 # why not just 3e3 * c?
x = q * np.sin(a) * np.cos(b)
y = q * np.sin(a) * np.sin(b)
z = q * np.cos(a)
Now, your example code after this doesn't do what you probably want it to do - notice how you just say xdist = ... each time? You're overwriting that variable and not doing anything with it. I'm going to assume you want the squared euclidean distance between each pair of points, though, and make a matrix dists with dists[i, j] equal to the distance between the ith and jth points.
The easy way, if you have scipy available:
# stack the points into a num_pts x 3 matrix
pts = np.hstack([thing.reshape((-1, 1)) for thing in (x, y, z)])
# get squared euclidean distances in a matrix
import scipy.spatial.distance
dists = scipy.spatial.distance.squareform(scipy.spatial.distance.pdist(pts, 'sqeuclidean'))
If your list is enormous, it's more memory-efficient to not use squareform, but then it's in a condensed format that's a little harder to find specific pairs of distances with.
Slightly harder, if you can't / don't want to use scipy:
pts = np.hstack([thing.reshape((-1, 1)) for thing in (x, y, z)])
sqnorms = np.sum(pts ** 2, axis=1)
dists = sqnorms.reshape((-1, 1)) - 2 * np.dot(pts, pts.T) + sqnorms
which basically implements the formula (a - b)^2 = a^2 - 2 a b + b^2, but all vector-like.
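To make the identity concrete: for row vectors a and b, ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2, and broadcasting a column of squared norms against a row of squared norms fills in the first and last terms for every pair at once. The sketch below checks the NumPy-only version against scipy on a few random points; the seeded data is only for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
pts = rng.random((5, 3))                 # 5 points in 3D

sqnorms = np.sum(pts ** 2, axis=1)       # ||p_i||^2 for every point
dists_np = sqnorms.reshape((-1, 1)) - 2 * np.dot(pts, pts.T) + sqnorms
dists_sp = squareform(pdist(pts, 'sqeuclidean'))

print(np.allclose(dists_np, dists_sp))
```

One thing to keep in mind is that the NumPy-only version can give tiny non-zero (even slightly negative) values on the diagonal because of floating-point rounding.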
Apologies for not posting a full solution, but you should avoid nesting calls to range(), as in Python 2 it creates a new list every time it gets called. You are better off either calling range() once and storing the result, or using a loop counter instead.
For example, instead of:
max = 50
for number in range(0, max):
    doSomething(number)
...you would do:
max = 50
current = 0
while current < max:
    doSomething(current)
    current += 1
Well, the complexity of your calculation is pretty high. Also, you need to have huge amounts of memory if you want to store all r values in a single list. Often, you don't need a list and a generator might be enough for what you want to do with the values.
Consider this code:
from itertools import combinations
def calculate(x, y, z):
    for xi, xj in combinations(x, 2):
        for yi, yj in combinations(y, 2):
            for zi, zj in combinations(z, 2):
                yield (xi - xj) + (yi - yj) + (zi - zj)
This returns a generator that computes only one value each time you call next() on it.
gen = calculate(range(10), range(10, 20), range(20, 30))
next(gen) # returns -3
next(gen) # returns -4 and so on
