Geometric median - python

I have written some code to find the
Geometric median for a set of weighted points
it is based on this Google-kickstart challenge . I don't want a better solution but want to know what is wrong in my code
. The code iterates against given precision value of 10^-6 to arrive to a value close to the geometric median . the problem I face is it returns correct value for digits until 10^-3 and after that it goes wrong . I cannot figure out what is going wrong . I also noted changing the initializing value alters the result but don't know why.The code also holds good if weight of points is not considered
here is formula i use to find distance to each point: max(abs(i.x-k.x), abs(i.y-k.y))x(weight_of_i) (its is Chebyshev distance)
here is the iteration function i used :
#c = previous centre , stp =previous step ,listy_r = list of points(x,y,wt) ,k = previous sum of distances
def move_ct( c,stp,listy_r,k): #calculates the minimum centre itrateviely , returns c--> center,stp-->step,k-->sum of distances
while True:
tmp=list()
moves = [(c[0], c[1]+stp), (c[0], c[1]-stp),
(c[0]+stp, c[1]), (c[0]-stp, c[1])]
for each in moves:tmp.append(sdist(listy_r, each))
tmp_min = min(tmp)
if tmp_min < k:
k = tmp_min
index = tmp.index(tmp_min)
c = moves[index]
break
else:
stp *= 0.5
return (c,stp,k)
here are the values i initialized:
initial geometric centre = centroid of the weighted points
precision = 10**-6
step = half of distance between highest and lowest coordinates on x,y
I have attached an input text file here that contains 10000 points(it is test case 1 for large input for the challenge) in the format
one point for each line and each point has 3 parameters (x,y,weight)
eg: 980.69 595.86 619.03 where
980.69 = x coordinate
595.86 = y coordinate
619.03 = weight
the result of the 10000 points should give :3288079343.471880 but it gives
3288079343.4719906 as result . Notice it is off only after 10^-3 .

Related

Underestimation of f(x) by using a piecewise linear function

I am trying to check if there is any Matlab/Python procedure to underestimate f(x) by using a piecewise linear function g(x). That is g(x) needs to be less or equal to, f(x). See the picture and code below. Could you please help to modify this code to find how to underestimate this function?
x = 0.000000001:0.001:1;
y = abs(f(x));
%# Find section sizes, by using an inverse of the approximation of the derivative
numOfSections = 5;
totalRange = max(x(:))-min(x(:));
%# The relevant nodes
xNodes = x(1) + [ 0 cumsum(sectionSize)];
yNodes = abs(f(xNodes));
figure;plot(x,y);
hold on;
plot (xNodes,yNodes,'r');
scatter (xNodes,yNodes,'r');
legend('abs(f(x))','adaptive linear interpolation');
This approach is based on Luis Mendo's comment. The idea is the following:
Select a number of points from the original curve, your final piecewise linear curve will pass through these points
For each point calculate the equation of the tangent to the original curve. Because your graph is convex, the tangents of consecutive points in your sample will intersect below the curve
Calculate, for each set of consecutive tangents, the x-coordinate of the point of intersection. Use the equation of the tangent to calculate the corresponding y-coordinate
Now, after reordering the points, this gives you a piecewise linear approximation with the constraints you want.
h = 0.001;
x = 0.000000001:h:1;
y = abs(log2(x));
% Derivative of function on all the points
der = diff(y)/h;
NPts = 10; % Number of sample points
% Draw the index of the points by which the output will pass at random
% Still make sure you got first and last point
idx = randperm(length(x)-3,NPts-2);
idx = [1 idx+1 length(x)-1];
idx = sort(idx);
x_pckd = x(idx);
y_pckd = y(idx);
der_pckd = der(idx);
% Use obscure math to calculate the points of intersection
xder = der_pckd.*x_pckd;
x_2add = -(diff(y_pckd)-(diff(xder)))./diff(der_pckd);
y_2add = der_pckd(1:(end-1)).*(x_2add-(x_pckd(1:(end-1))))+y_pckd(1:(end-1));
% Calculate the error as the sum of the errors made at the middle points
Err_add = sum(abs(y_2add-interp1(x,y,x_2add)));
% Get final x and y coordinates of interpolant
x_final = [reshape([x_pckd(1:end-1);x_2add],1,[]) x_pckd(end)];
y_final = [reshape([y_pckd(1:end-1);y_2add],1,[]) y_pckd(end)];
figure;
plot(x,y,'-k');
hold on
plot(x_final,y_final,'-or')
You can see in my code that the points are drawn at random. If you want to do some sort of optimization (e.g. what is the set of points that minimizes the error), you can just run this a high amount of time and keep track of the best contender. For example, 10000 random draws see the rise of this guy:

How to select numeric samples based on their distance relative to samples already selected (Python)

I have some random test data in a 2D array of shape (500,2) as such:
xy = np.random.randint(low=0.1, high=1000, size=[500, 2])
From this array, I first select 10 random samples, to select the 11th sample, I would like to pick the sample that is the furthest away from the original 10 selected samples collectively, I am using the euclidean distance to do this. I need to keep doing this until a certain amount have been picked. Here is my attempt at doing this.
# Function to get the distance between samples
def get_dist(a, b):
return np.sqrt(np.sum(np.square(a - b)))
# Set up variables and empty lists for the selected sample and starting samples
n_xy_to_select = 120
selected_xy = []
starting = []
# This selects 10 random samples and appends them to selected_xy
for i in range(10):
idx = np.random.randint(len(xy))
starting_10 = xy[idx, :]
selected_xy.append(starting_10)
starting.append(starting_10)
xy = np.delete(xy, idx, axis = 0)
starting = np.asarray(starting)
# This performs the selection based on the distances
for i in range(n_xy_to_select - 1):
# Set up an empty array dists
dists = np.zeros(len(xy))
for selected_xy_ in selected_xy:
# Get the distance between each already selected sample, and every other unselected sample
dists_ = np.array([get_dist(selected_xy_, xy_) for xy_ in xy])
# Apply some kind of penalty function - this is the key
dists_[dists_ < 90] -= 25000
# Sum dists_ onto dists
dists += dists_
# Select the largest one
dist_max_idx = np.argmax(dists)
selected_xy.append(xy[dist_max_idx])
xy = np.delete(xy, dist_max_idx, axis = 0)
The key to this is this line - the penalty function
dists_[dists_ < 90] -= 25000
This penalty function exists to prevent the code from just picking a ring of samples at the edge of the space, by artificially shortening values that are close together.
However, this eventually breaks down, and the selection starts clustering, as shown in the image. You can clearly see that there are much better selections that the code can make before any kind of clustering is necessary. I feel that a kind of decaying exponential function would be best for this, but I do not know how to implement it.
So my question is; how would I change the current penalty function to get what I'm looking for?
From your question, I understand that what you are looking for are Periodic Boundary Conditions (PBC). Meaning that a point which at the left edge of your space is just next to the on the right end side. Thus, the maximal distance you can get along one axis is given by the half of the box (i.e. between the edge and the center).
To take into account the PBC you need to compute the distance on each axis and subtract the half of the box to that:
For example, if you have a point with x1 = 100 and a second one with x2 = 900, using the PBC they are 200 units apart : |x1 - x2| - 500. In the general case, given 2 coordinates and the half size box, you end up by having:
In your case this simplifies to:
delta_x[delta_x > 500] = delta_x[delta_x > 500] - 500
To wrap it up, I rewrote your code using a new distance function (note that I removed some unnecessary for loops):
import numpy as np
def distance(p, arr, 500):
delta_x = np.abs(p[0] - arr[:,0])
delta_y = np.abs(p[1] - arr[:,1])
delta_x[delta_x > 500] = delta_x[delta_x > 500] - 500
delta_y[delta_y > 500] = delta_y[delta_y > 500] - 500
return np.sqrt(delta_x**2 + delta_y**2)
xy = np.random.randint(low=0.1, high=1000, size=[500, 2])
idx = np.random.randint(500, size=10)
selected_xy = list(xy[idx])
_initial_selected = xy[idx]
xy = np.delete(xy, idx, axis = 0)
n_xy_to_select = 120
for i in range(n_xy_to_select - 1):
# Set up an empty array dists
dists = np.zeros(len(xy))
for selected_xy_ in selected_xy:
# Compute the distance taking into account the PBC
dists_ = distance(selected_xy_, xy)
dists += dists_
# Select the largest one
dist_max_idx = np.argmax(dists)
selected_xy.append(xy[dist_max_idx])
xy = np.delete(xy, dist_max_idx, axis = 0)
And indeed it creates clusters, and this is normal as you will tend to create points clusters that are at the maximal distance from each others. More than that, due to the boundary conditions, we set that the maximal distance between 2 points along one axis is given by 500. The maximal distance between two clusters is thus also 500 ! And as you can see on the image, it is the case.
More over, picking more numbers will start to draws line to connect the different clusters, starting from the central one as you can see here :
What I was looking for is called 'Furthest Point Sampling'. I have some some more research into the solution, and the Python code used to perform this is found here: https://minibatchai.com/ai/2021/08/07/FPS.html

How to implement in Python a function to compute the Euclidean distance between two arbitrary points on a torus

Given a 10x10 grid (2d-array) filled randomly with numbers, either 0, 1 or 2. How can I find the Euclidean distance (the l2-norm of the distance vector) between two given points considering periodic boundaries?
Let us consider an arbitrary grid point called centre. Now, I want to find the nearest grid point containing the same value as centre. I need to take periodic boundaries into account, such that the matrix/grid can be seen rather as a torus instead of a flat plane. In that case, say the centre = matrix[0,2], and we find that there is the same number in matrix[9,2], which would be at the southern boundary of the matrix. The Euclidean distance computed with my code would be for this example np.sqrt(0**2 + 9**2) = 9.0. However, because of periodic boundaries, the distance should actually be 1, because matrix[9,2] is the northern neighbour of matrix[0,2]. Hence, if periodic boundary values are implemented correctly, distances of magnitude above 8 should not exist.
So, I would be interested on how to implement in Python a function to compute the Euclidean distance between two arbitrary points on a torus by applying a wrap-around for the boundaries.
import numpy as np
matrix = np.random.randint(0,3,(10,10))
centre = matrix[0,2]
#rewrite the centre to be the number 5 (to exclude itself as shortest distance)
matrix[0,2] = 5
#find the points where entries are same as centre
same = np.where((matrix == centre) == True)
idx_row, idx_col = same
#find distances from centre to all values which are of same value
dist = np.zeros(len(same[0]))
for i in range(0,len(same[0])):
delta_row = same[0][i] - 0 #row coord of centre
delta_col = same[1][i] - 2 #col coord of centre
dist[i] = np.sqrt(delta_row**2 + delta_col**2)
#retrieve the index of the smallest distance
idx = dist.argmin()
print('Centre value: %i. The nearest cell with same value is at (%i,%i)'
% (centre, same[0][idx],same[1][idx]))
For each axis, you can check whether the distance is shorter when you wrap around or when you don't. Consider the row axis, with rows i and j.
When not wrapping around, the difference is abs(i - j).
When wrapping around, the difference is "flipped", as in 10 - abs(i - j). In your example with i == 0 and j == 9 you can check that this correctly produces a distance of 1.
Then simply take whichever is smaller:
delta_row = same[0][i] - 0 #row coord of centre
delta_row = min(delta_row, 10 - delta_row)
And similarly for delta_column.
The final dist[i] calculation needs no changes.
I have a working 'sketch' of how this could work. In short, I calculate the distance 9 times, 1 for the normal distance, and 8 shifts to possibly correct for a closer 'torus' distance.
As n is getting larger, the calculation costs can go sky high as the numbers go up. But, the torus effect, is probably not needed as there is always a point nearby without 'wrap around'.
You can easily test this, because for a grid of size 1, if a point is found of distance 1/2 or closer, you know there is not a closer torus point (right?)
import numpy as np
n=10000
np.random.seed(1)
A = np.random.randint(low=0, high=10, size=(n,n))
I create 10000x10000 points, and store the location of the 1's in ONES.
ONES = np.argwhere(A == 0)
Now I define my torus distance, which is trying which of the 9 mirrors is the closest.
def distance_on_torus( point=[500,500] ):
index_diff = [[1],[1],[0],[0],[0,1],[0,1],[0,1],[0,1]]
coord_diff = [[-1],[1],[-1],[1],[-1,-1],[-1,1],[1,-1],[1,1]]
tree = BallTree( ONES, leaf_size=5*n, metric='euclidean')
dist, indi = tree.query([point],k=1, return_distance=True )
distances = [dist[0]]
for indici_to_shift, coord_direction in zip(index_diff, coord_diff):
MIRROR = ONES.copy()
for i,shift in zip(indici_to_shift,coord_direction):
MIRROR[:,i] = MIRROR[:,i] + (shift * n)
tree = BallTree( MIRROR, leaf_size=5*n, metric='euclidean')
dist, indi = tree.query([point],k=1, return_distance=True )
distances.append(dist[0])
return np.min(distances)
%%time
distance_on_torus([2,3])
It is slow, the above takes 15 minutes.... For n = 1000 less than a second.
A optimisation would be to first consider the none-torus distance, and if the minimum distance is possibly not the smallest, calculate with only the minimum set of extra 'blocks' around. This will greatly increase speed.

How can I determine which curve is closest to a given set of points?

I have several dataframes which each contain two columns of x and y values, so each row represents a point on a curve. The different dataframes then represent contours on a map. I have another series of data points (fewer in number), and I'd like to see which contour they are closest to on average.
I would like to establish the distance from each datapoint to each point on the curve, with sqrt(x^2+y^2) - sqrt(x_1^2 + y_1^2), add them up for each point on the curve. The trouble is that there are several thousand points on the curve, and there are only a few dozen datapoints to assess, so I can't simply put these in columns next to each other.
I think I need to cycle through the datapoints, checking the sqdistance between them and each point in the curve.
I don't know whether there is an easy function or module that can do this.
Thanks in advance!
Edit: Thanks for the comments. #Alexander: I've tried the vectorize function, as follows, with a sample dataset. I'm actually using contours which comprise several thousand datapoints, and the dataset to compare against are 100+, so I'd like to be able to automate as much as possible. I'm currently able to create a distance measurement from the first datapoint against my contour, but I would ideally like to cycle through j as well. When I try it, it comes up with an error:
import numpy as np
from numpy import vectorize
import pandas as pd
from pandas import DataFrame
df1 = {'X1':['1', '2', '2', '3'], 'Y1':['2', '5', '7', '9']}
df1 = DataFrame(df1, columns=['X1', 'Y1'])
df2 = {'X2':['3', '5', '6'], 'Y2':['10', '15', '16']}
df2 = DataFrame(df2, columns=['X2', 'Y2'])
df1=df1.astype(float)
df2=df2.astype(float)
Distance=pd.DataFrame()
i = range(0, len(df1))
j = range(0, len(df2))
def myfunc(x1, y1, x2, y2):
return np.sqrt((x2-x1)**2+np.sqrt(y2-y1)**2)
vfunc=np.vectorize(myfunc)
Distance['Distance of Datapoint j to Contour']=vfunc(df1.iloc[i] ['X1'], df1.iloc[i]['Y1'], df2.iloc[0]['X2'], df2.iloc[0]['Y2'])
Distance['Distance of Datapoint j to Contour']=vfunc(df1.iloc[i] ['X1'], df1.iloc[i]['Y1'], df2.iloc[1]['X2'], df2.iloc[1]['Y2'])
Distance
General idea
The "curve" is actually a polygon with a lot's of points. There definetly some libraries to calculate the distance between the polygon and the point. But generally it will be something like:
Calculate "approximate distance" to whole polygon, e.g. to the bounding box of a polygon (from point to 4 line segments), or to the center of bounding box
calculate distances to the lines of a polygon. If you have too many points then as an extra step "resolution" of a polygon might be reduced.
Smallest found distance is the distance from point to the polygon.
repeat for each point and each polygon
Existing solutions
Some libraries already can do that:
shapely question, shapely Geo-Python docs
Using shapely in geopandas to calculate distance
scipy.spatial.distance: scipy can be used to calculate distance between arbitrary number of points
numpy.linalg.norm(point1-point2): some answers propose different ways to calculate distance using numpy. Some even show performance benchmarks
sklearn.neighbors: not really about curves and distances to them, but can be used if you want to check "to which area point is most likely related"
And you can always calculate distances yourself using D(x1, y1, x2, y2) = sqrt((x₂-x₁)² + (y₂-y₁)²) and search for best combination of points that gives minimal distance
Example:
# get distance from points of 1 dataset to all the points of another dataset
from scipy.spatial import distance
d = distance.cdist(df1.to_numpy(), df2.to_numpy(), 'euclidean')
print(d)
# Results will be a matrix of all possible distances:
# [[ D(Point_df1_0, Point_df2_0), D(Point_df1_0, Point_df2_1), D(Point_df1_0, Point_df2_2)]
# [ D(Point_df1_1, Point_df2_0), D(Point_df1_1, Point_df2_1), D(Point_df1_1, Point_df2_2)]
# [ D(Point_df1_3, Point_df2_0), D(Point_df1_2, Point_df2_1), D(Point_df1_2, Point_df2_2)]
# [ D(Point_df1_3, Point_df2_0), D(Point_df1_3, Point_df2_1), D(Point_df1_3, Point_df2_2)]]
[[ 8.24621125 13.60147051 14.86606875]
[ 5.09901951 10.44030651 11.70469991]
[ 3.16227766 8.54400375 9.8488578 ]
[ 1. 6.32455532 7.61577311]]
What to do next is up to you. For example as a metric of "general distance between curves" you can:
Pick smallest values in each row and each column (if you skip some columns/rows, then you might end up with candidate that "matches only a part of contour), and calculate their median: np.median(np.hstack([np.amin(d, axis) for axis in range(len(d.shape))])).
Or you can calculate mean value of:
all the distances: np.median(d)
of "smallest 2/3 of distances": np.median(d[d<np.percentile(d, 66, interpolation='higher')])
of "smallest distances that cover at least each rows and each columns":
for min_value in np.sort(d, None):
chosen_indices = d<=min_value
if np.all(np.hstack([np.amax(chosen_indices, axis) for axis in range(len(chosen_indices.shape))])):
break
similarity = np.median(d[chosen_indices])
Or maybe you can use different type of distance from the begining (e.g. "correlation distance" looks promising to your task)
Maybe use "Procrustes analysis, a similarity test for two data sets" together with distances.
Maybe you can use minkowski distance as a similarity metric.
Alternative approach
Alternative approach would be to use some "geometry" library to compare areas of concave hulls:
Build concave hulls for contours and for "candidate datapoints" (not easy, but possible: using shapely , using concaveman). But if you are sure that your contours are already ordered and without overlapping segments, then you can directly build polygons from those points without need for concave hull.
Use "intersection area" minus "non-common area" as a metric of similarity (shapely can be used for that):
Non-common area is: union - intersection or simply "symmetric difference"
Final metric: intersection.area - symmetric_difference.area (intersection, area)
This approach might be better than processing distances in some situations, for example:
You want to prefer "fewer points covering whole area" over "huge amount of very close points that cover only half of the area"
It's more obvious way to compare candidates with different number of points
But it has it's disadvantages too (just draw some examples on paper and experiment to find them)
Other ideas:
instead of using polygons or concave hull you can:
build a linear ring from your points and then use contour.buffer(some_distance). This way you ignore "internal area" of the contour and only compare contour itself (with tolerance of some_distance). Distance between centroids (or double of that) may be used as value for some_distance
You can build polygons/lines from segments using ops.polygonize
instead of using intersection.area - symmetric_difference.area you can:
Snap one object to another, and then compare snapped object to original
Before comparing real objects you can compare "simpler" versions of the objects to filter out obvious mismatches:
For example you can check if boundaries of objects intersect
Or you can simplify geometries before comparing them
For the distance, you need to change your formula to
def getDistance(x, y, x_i, y_i):
return sqrt((x_i -x)^2 + (y_i - y)^2)
with (x,y) being your datapoint and (x_i, y_i) being a point from the curve.
Consider using NumPy for vectorization. Explicitly looping through your data points will most likely be less efficient, depending on your use case, it might however be quick enough. (If you need to run it on a regular basis, I think vectorization will easily outspeed the explicit way) This could look something like this:
import numpy as np # Universal abbreviation for the module
datapoints = np.random.rand(3,2) # Returns a vector with randomized entries of size 3x2 (Imagine it as 3 sets of x- and y-values
contour1 = np.random.rand(1000, 2) # Other than the size (which is 1000x2) no different than datapoints
contour2 = np.random.rand(1000, 2)
contour3 = np.random.rand(1000, 2)
def squareDistanceUnvectorized(datapoint, contour):
retVal = 0.
print("Using datapoint with values x:{}, y:{}".format(datapoint[0], datapoint[1]))
lengthOfContour = np.size(contour, 0) # This gets you the number of lines in the vector
for pointID in range(lengthOfContour):
squaredXDiff = np.square(contour[pointID,0] - datapoint[0])
squaredYDiff = np.square(contour[pointID,1] - datapoint[1])
retVal += np.sqrt(squaredXDiff + squaredYDiff)
retVal = retVal / lengthOfContour # As we want the average, we are dividing the sum by the element count
return retVal
if __name__ == "__main__":
noOfDatapoints = np.size(datapoints,0)
contID = 0
for currentDPID in range(noOfDatapoints):
dist1 = squareDistanceUnvectorized(datapoints[currentDPID,:], contour1)
dist2 = squareDistanceUnvectorized(datapoints[currentDPID,:], contour2)
dist3 = squareDistanceUnvectorized(datapoints[currentDPID,:], contour3)
if dist1 > dist2 and dist1 > dist3:
contID = 1
elif dist2 > dist1 and dist2 > dist3:
contID = 2
elif dist3 > dist1 and dist3 > dist2:
contID = 3
else:
contID = 0
if contID == 0:
print("Datapoint {} is inbetween two contours".format(currentDPID))
else:
print("Datapoint {} is closest to contour {}".format(currentDPID, contID))
Okay, now moving on to vector-land.
I have taken the liberty to adjust this part to what I think is your dataset. Try it and let me know if it works.
import numpy as np
import pandas as pd
# Generate 1000 points (2-dim Vector) with random values between 0 and 1. Make them strings afterwards.
# This is the first contour
random2Ddata1 = np.random.rand(1000,2)
listOfX1 = [str(x) for x in random2Ddata1[:,0]]
listOfY1 = [str(y) for y in random2Ddata1[:,1]]
# Do the same for a second contour, except that we de-center this 255 units into the first dimension
random2Ddata2 = np.random.rand(1000,2)+[255,0]
listOfX2 = [str(x) for x in random2Ddata2[:,0]]
listOfY2 = [str(y) for y in random2Ddata2[:,1]]
# After this step, our 'contours' are basically two blobs of datapoints whose centers are approx. 255 units apart.
# Generate a set of 4 datapoints and make them a Pandas-DataFrame
datapoints = {'X': ['0.5', '0', '255.5', '0'], 'Y': ['0.5', '0', '0.5', '-254.5']}
datapoints = pd.DataFrame(datapoints, columns=['X', 'Y'])
# Do the same for the two contours
contour1 = {'Xf': listOfX1, 'Yf': listOfY1}
contour1 = pd.DataFrame(contour1, columns=['Xf', 'Yf'])
contour2 = {'Xf': listOfX2, 'Yf': listOfY2}
contour2 = pd.DataFrame(contour2, columns=['Xf', 'Yf'])
# We do now have 4 datapoints.
# - The first datapoint is basically where we expect the mean of the first contour to be.
# Contour 1 consists of 1000 points with x, y- values between 0 and 1
# - The second datapoint is at the origin. Its distances should be similar to the once of the first datapoint
# - The third datapoint would be the result of shifting the first datapoint 255 units into the positive first dimension
# - The fourth datapoint would be the result of shifting the first datapoint 255 units into the negative second dimension
# Transformation into numpy array
# First the x and y values of the data points
dpArray = ((datapoints.values).T).astype(np.float)
c1Array = ((contour1.values).T).astype(np.float)
c2Array = ((contour2.values).T).astype(np.float)
# This did the following:
# - Transform the datapoints and contours into numpy arrays
# - Transpose them afterwards so that if we want all x values, we can write var[0,:] instead of var[:,0].
# A personal preference, maybe
# - Convert all the values into floats.
# Now, we iterate through the contours. If you have a lot of them, putting them into a list beforehand would do the job
for contourid, contour in enumerate([c1Array, c2Array]):
# Now for the datapoints
for _index, _value in enumerate(dpArray[0,:]):
# The next two lines do vectorization magic.
# First, we square the difference between one dpArray entry and the contour x values.
# You might notice that contour[0,:] returns an 1x1000 vector while dpArray[0,_index] is an 1x1 float value.
# This works because dpArray[0,_index] is broadcasted to fit the size of contour[0,:].
dx = np.square(dpArray[0,_index] - contour[0,:])
# The same happens for dpArray[1,_index] and contour[1,:]
dy = np.square(dpArray[1,_index] - contour[1,:])
# Now, we take (for one datapoint and one contour) the mean value and print it.
# You could write it into an array or do basically anything with it that you can imagine
distance = np.mean(np.sqrt(dx+dy))
print("Mean distance between contour {} and datapoint {}: {}".format(contourid+1, _index+1, distance))
# But you want to be able to call this... so here we go, generating a function out of it!
def getDistanceFromDatapointsToListOfContoursFindBetterName(datapoints, listOfContourDataFrames):
""" Takes a DataFrame with points and a list of different contours to return the average distance for each combination"""
dpArray = ((datapoints.values).T).astype(np.float)
listOfContours = []
for item in listOfContourDataFrames:
listOfContours.append(((item.values).T).astype(np.float))
retVal = np.zeros((np.size(dpArray,1), len(listOfContours)))
for contourid, contour in enumerate(listOfContours):
for _index, _value in enumerate(dpArray[0,:]):
dx = np.square(dpArray[0,_index] - contour[0,:])
dy = np.square(dpArray[1,_index] - contour[1,:])
distance = np.mean(np.sqrt(dx+dy))
print("Mean distance between contour {} and datapoint {}: {}".format(contourid+1, _index+1, distance))
retVal[_index, contourid] = distance
return retVal
# And just to see that it is, indeed, returning the same results, run it once
getDistanceFromDatapointsToListOfContoursFindBetterName(datapoints, [contour1, contour2])

Choosing correct distance from a list

This question is somewhat similar to this. I've gone a bit farther than the OP, though, and I'm in Python 2 (not sure what he was using).
I have a Python function that can determine the distance from a point inside a convex polygon to regularly-defined intervals along the polygon's perimeter. The problem is that it returns "extra" distances that I need to eliminate. (Please note--I suspect this will not work for rectangles yet. I'm not finished with it.)
First, the code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# t1.py
#
# Copyright 2015 FRED <fred#matthew24-25>
#
# THIS IS TESTING CODE ONLY. IT WILL BE MOVED INTO THE CORRECT MODULE
# UPON COMPLETION.
#
from __future__ import division
import math
import matplotlib.pyplot as plt
def Dist(center_point, Pairs, deg_Increment):
# I want to set empty lists to store the values of m_lnsgmnt and b_lnsgmnts
# for every iteration of the for loop.
m_linesegments = []
b_linesegments = []
# Scream and die if Pairs[0] is the same as the last element of Pairs--i.e.
# it has already been run once.
#if Pairs[0] == Pairs[len(Pairs)-1]:
##print "The vertices contain duplicate points!"
## Creates a new list containing the original list plus the first element. I did this because, due
## to the way the for loop is set up, the last iteration of the loop subtracts the value of the
## last value of Pairs from the first value. I therefore duplicated the first value.
#elif:
new_Pairs = Pairs + [Pairs[0]]
# This will calculate the slopes and y-intercepts of the linesegments of the polygon.
for a in range(len(Pairs)):
# This calculates the slope of each line segment and appends it to m_linesegments.
m_lnsgmnt = (new_Pairs[a+1][2] - new_Pairs[a][3]) / (new_Pairs[a+1][0] - new_Pairs[a][0])
m_linesegments.append(m_lnsgmnt)
# This calculates the y-intercept of each line segment and appends it to b_linesegments.
b_lnsgmnt = (Pairs[a][4]) - (m_lnsgmnt * Pairs[a][0])
b_linesegments.append(b_lnsgmnt)
# These are temporary testing codes.
print "m_linesegments =", m_linesegments
print "b_linesegments =", b_linesegments
# I want to set empty lists to store the value of m_rys and b_rys for every
# iteration of the for loop.
m_rays = []
b_rays = []
# I need to set a range of degrees the intercepts will be calculated for.
theta = range(0, 360, deg_Increment)
# Temporary testing line.
print "theta =", theta
# Calculate the slope and y-intercepts of the rays radiating from the center_point.
for b in range(len(theta)):
m_rys = math.tan(math.radians(theta[b]))
m_rays.append(m_rys)
b_rys = center_point[1] - (m_rys * center_point[0])
b_rays.append(b_rys)
# Temporary testing lines.
print "m_rays =", m_rays
print "b_rays =", b_rays
# Set empty matrix for Intercepts.
Intercepts = []
angle = []
# Calculate the intersections of the rays with the line segments.
for c in range((360//deg_Increment)):
for d in range(len(Pairs)):
# Calculate the x-coordinates and the y-coordinates of each
# intersection
x_Int = (b_rays[c] - b_linesegments[d]) / (m_linesegments[d] - m_rays[c])
y_Int = ((m_linesegments[d] * x_Int) + b_linesegments[d])
Intercepts.append((x_Int, y_Int))
# Calculates the angle of the ray. Rounding is necessary to
# compensate for binary-decimal errors.
a_ngle = round(math.degrees(math.atan2((y_Int - center_point[1]), (x_Int - center_point[0]))))
# Substitutes positive equivalent for every negative angle,
# i.e. -270 degrees equals 90 degrees.
if a_ngle < 0:
a_ngle = a_ngle + 360
# Selects the angles that correspond to theta
if a_ngle == theta[c]:
angle.append(a_ngle)
print "INT1=", Intercepts
print "angle=", angle
dist = []
# Calculates distance.
for e in range(len(Intercepts) - 1):
distA = math.sqrt(((Intercepts[e][0] - center_point[0])**2) + ((Intercepts[e][5]- center_point[1])**2))
dist.append(distA)
print "dist=", dist
if __name__ == "__main__":
main()
Now, as to how it works:
The code takes 3 inputs: center_point (a point contained in the polygon, given in (x,y) coordinates), Pairs (the vertices of the polygon, also given in (x,y) coordinats), and deg_Increment ( which defines how often to calculate distance).
Let's assume that center_point = (4,5), Pairs = [(1, 4), (3, 8), (7, 2)], and deg_Increment = 20. This means that a polygon is created (sort of) whose vertices are Pairs, and center_point is a point contained inside the polygon.
Now rays are set to radiate from center_point every 20 degrees (which isdeg_Increment). The intersection points of the rays with the perimeter of the polygon are determined, and the distance is calculated using the distance formula.
The only problem is that I'm getting too many distances. :( In my example above, the correct distances are
1.00000 0.85638 0.83712 0.92820 1.20455 2.07086 2.67949 2.29898 2.25083 2.50000 3.05227 2.22683 1.93669 1.91811 2.15767 2.85976 2.96279 1.40513
But my code is returning
dist= [2.5, 1.0, 6.000000000000001, 3.2523178818773006, 0.8563799085248148, 3.0522653889161626, 5.622391569468206, 0.8371216462519347, 2.226834844885431, 37.320508075688686, 0.9282032302755089, 1.9366857335569072, 7.8429970322236064, 1.2045483557883576, 1.9181147622136665, 3.753460385470896, 2.070863609380179, 2.157671808913309, 2.6794919243112276, 12.92820323027545, 2.85976265663383, 2.298981118867903, 2.962792920643178, 5.162096782237789, 2.250827351906659, 1.4051274947736863, 69.47032761621092, 2.4999999999999996, 1.0, 6.000000000000004, 3.2523178818773006, 0.8563799085248148, 3.0522653889161626, 5.622391569468206, 0.8371216462519347, 2.226834844885431, 37.32050807568848, 0.9282032302755087, 1.9366857335569074, 7.842997032223602, 1.2045483557883576, 1.9181147622136665, 3.7534603854708997, 2.0708636093801767, 2.1576718089133085, 2.679491924311227, 12.928203230275532, 2.85976265663383, 2.298981118867903, 2.9627929206431776, 5.162096782237789, 2.250827351906659, 1.4051274947736847]
If anyone can help me get only the correct distances, I'd greatly appreciate it.
Thanks!
And just for reference, here's what my example looks like with the correct distances only:
You're getting too many values in Intercepts because it's being appended to inside the second for-loop [for d in range(len(Pairs))].
You only want one value in Intercept per step through the outer for-loop [for c in range((360//deg_Increment))], so the append to Intercept needs to be in this loop.
I'm not sure what you're doing with the inner loop, but you seem to be calculating a separate intercept for each of the lines that make up the polygon sides. But you only want the one that you're going to hit "first" when going in that direction.
You'll have to add some code to figure out which of the 3 (in this case) sides of the polygon you're actually going to encounter first.

Categories

Resources