I am working in a discrete 2D grid of points in which there are "shapes" that I would like to create points outside of. I have been able to identify the vertices of these points and take convex hulls. So far, this leads to this and all is good and well. The purple here is the shape in question and the red line is the convex contour I have computed.
What I would like to do now is create two neighborhoods of points outside this shape. The first one is a set of points directly outside (as close as the grid size will allow), the second is another set of points but offset some distance away (the distance is not fixed, but rather an input).
I have attempted to write this in Python and get okay results. Here is an example of my current output. The problem is I notice the offsets are not perfect, for example look at the bottom most point in the image I attached. It kinks downwards whereas the original shape does not. It's not too bad in this example, but in other cases where the shape is smaller or if I take a smaller offset it gets worse. I also have an issue where the offsets sometimes overlap, even if they are supposed to be some distance away. I would also like there to be one line in each section of the contour, not two lines (for example in the top left).
My current attempt uses the Shapely package to handle most of the computational geometry. An outline of what I do once I have found the vertices of the convex contour is to offset these vertices by some amount, and interpolate along each pair of vertices to obtain many points alone these lines. Afterwards I use a coordinate transform to identify all points to the nearest grid point. This is how I obtain my final set of points. Below is the actual code I have written.
How can I improve this so I don't run into the issues I described?
Function #1 - Computes the offset points
def OutsidePoints(vertices, dist):
poly_line = LinearRing(vertices)
poly_line_offset = poly_line.buffer(dist, resolution=1, join_style=2, mitre_limit=1).exterior
new_vertices = list(poly_line_offset.coords)
new_vertices = np.asarray(new_vertices)
shape = sg.Polygon(new_vertices)
points = []
for t in np.arange(0, shape.length, step_size):
temp_points = np.transpose(shape.exterior.interpolate(t).xy)
points.append(temp_points[0])
points = np.array(points)
points = np.unique(points, axis=0)
return points
Function #2 - Transforming these points into points that are on my grid
def IndexFinder(points):
index_points = invCoordinateTransform(points)
for i in range(len(index_points)):
for j in range(2):
index_points[i][j] = math.floor(index_points[i][j])
index_points = np.unique(index_points, axis=0)
return index_points
Many thanks!
I have several dataframes which each contain two columns of x and y values, so each row represents a point on a curve. The different dataframes then represent contours on a map. I have another series of data points (fewer in number), and I'd like to see which contour they are closest to on average.
I would like to establish the distance from each datapoint to each point on the curve, with sqrt(x^2+y^2) - sqrt(x_1^2 + y_1^2), add them up for each point on the curve. The trouble is that there are several thousand points on the curve, and there are only a few dozen datapoints to assess, so I can't simply put these in columns next to each other.
I think I need to cycle through the datapoints, checking the sqdistance between them and each point in the curve.
I don't know whether there is an easy function or module that can do this.
Thanks in advance!
Edit: Thanks for the comments. #Alexander: I've tried the vectorize function, as follows, with a sample dataset. I'm actually using contours which comprise several thousand datapoints, and the dataset to compare against are 100+, so I'd like to be able to automate as much as possible. I'm currently able to create a distance measurement from the first datapoint against my contour, but I would ideally like to cycle through j as well. When I try it, it comes up with an error:
import numpy as np
from numpy import vectorize
import pandas as pd
from pandas import DataFrame
df1 = {'X1':['1', '2', '2', '3'], 'Y1':['2', '5', '7', '9']}
df1 = DataFrame(df1, columns=['X1', 'Y1'])
df2 = {'X2':['3', '5', '6'], 'Y2':['10', '15', '16']}
df2 = DataFrame(df2, columns=['X2', 'Y2'])
df1=df1.astype(float)
df2=df2.astype(float)
Distance=pd.DataFrame()
i = range(0, len(df1))
j = range(0, len(df2))
def myfunc(x1, y1, x2, y2):
return np.sqrt((x2-x1)**2+np.sqrt(y2-y1)**2)
vfunc=np.vectorize(myfunc)
Distance['Distance of Datapoint j to Contour']=vfunc(df1.iloc[i] ['X1'], df1.iloc[i]['Y1'], df2.iloc[0]['X2'], df2.iloc[0]['Y2'])
Distance['Distance of Datapoint j to Contour']=vfunc(df1.iloc[i] ['X1'], df1.iloc[i]['Y1'], df2.iloc[1]['X2'], df2.iloc[1]['Y2'])
Distance
General idea
The "curve" is actually a polygon with a lot's of points. There definetly some libraries to calculate the distance between the polygon and the point. But generally it will be something like:
Calculate "approximate distance" to whole polygon, e.g. to the bounding box of a polygon (from point to 4 line segments), or to the center of bounding box
calculate distances to the lines of a polygon. If you have too many points then as an extra step "resolution" of a polygon might be reduced.
Smallest found distance is the distance from point to the polygon.
repeat for each point and each polygon
Existing solutions
Some libraries already can do that:
shapely question, shapely Geo-Python docs
Using shapely in geopandas to calculate distance
scipy.spatial.distance: scipy can be used to calculate distance between arbitrary number of points
numpy.linalg.norm(point1-point2): some answers propose different ways to calculate distance using numpy. Some even show performance benchmarks
sklearn.neighbors: not really about curves and distances to them, but can be used if you want to check "to which area point is most likely related"
And you can always calculate distances yourself using D(x1, y1, x2, y2) = sqrt((x₂-x₁)² + (y₂-y₁)²) and search for best combination of points that gives minimal distance
Example:
# get distance from points of 1 dataset to all the points of another dataset
from scipy.spatial import distance
d = distance.cdist(df1.to_numpy(), df2.to_numpy(), 'euclidean')
print(d)
# Results will be a matrix of all possible distances:
# [[ D(Point_df1_0, Point_df2_0), D(Point_df1_0, Point_df2_1), D(Point_df1_0, Point_df2_2)]
# [ D(Point_df1_1, Point_df2_0), D(Point_df1_1, Point_df2_1), D(Point_df1_1, Point_df2_2)]
# [ D(Point_df1_3, Point_df2_0), D(Point_df1_2, Point_df2_1), D(Point_df1_2, Point_df2_2)]
# [ D(Point_df1_3, Point_df2_0), D(Point_df1_3, Point_df2_1), D(Point_df1_3, Point_df2_2)]]
[[ 8.24621125 13.60147051 14.86606875]
[ 5.09901951 10.44030651 11.70469991]
[ 3.16227766 8.54400375 9.8488578 ]
[ 1. 6.32455532 7.61577311]]
What to do next is up to you. For example as a metric of "general distance between curves" you can:
Pick smallest values in each row and each column (if you skip some columns/rows, then you might end up with candidate that "matches only a part of contour), and calculate their median: np.median(np.hstack([np.amin(d, axis) for axis in range(len(d.shape))])).
Or you can calculate mean value of:
all the distances: np.median(d)
of "smallest 2/3 of distances": np.median(d[d<np.percentile(d, 66, interpolation='higher')])
of "smallest distances that cover at least each rows and each columns":
for min_value in np.sort(d, None):
chosen_indices = d<=min_value
if np.all(np.hstack([np.amax(chosen_indices, axis) for axis in range(len(chosen_indices.shape))])):
break
similarity = np.median(d[chosen_indices])
Or maybe you can use different type of distance from the begining (e.g. "correlation distance" looks promising to your task)
Maybe use "Procrustes analysis, a similarity test for two data sets" together with distances.
Maybe you can use minkowski distance as a similarity metric.
Alternative approach
Alternative approach would be to use some "geometry" library to compare areas of concave hulls:
Build concave hulls for contours and for "candidate datapoints" (not easy, but possible: using shapely , using concaveman). But if you are sure that your contours are already ordered and without overlapping segments, then you can directly build polygons from those points without need for concave hull.
Use "intersection area" minus "non-common area" as a metric of similarity (shapely can be used for that):
Non-common area is: union - intersection or simply "symmetric difference"
Final metric: intersection.area - symmetric_difference.area (intersection, area)
This approach might be better than processing distances in some situations, for example:
You want to prefer "fewer points covering whole area" over "huge amount of very close points that cover only half of the area"
It's more obvious way to compare candidates with different number of points
But it has it's disadvantages too (just draw some examples on paper and experiment to find them)
Other ideas:
instead of using polygons or concave hull you can:
build a linear ring from your points and then use contour.buffer(some_distance). This way you ignore "internal area" of the contour and only compare contour itself (with tolerance of some_distance). Distance between centroids (or double of that) may be used as value for some_distance
You can build polygons/lines from segments using ops.polygonize
instead of using intersection.area - symmetric_difference.area you can:
Snap one object to another, and then compare snapped object to original
Before comparing real objects you can compare "simpler" versions of the objects to filter out obvious mismatches:
For example you can check if boundaries of objects intersect
Or you can simplify geometries before comparing them
For the distance, you need to change your formula to
def getDistance(x, y, x_i, y_i):
return sqrt((x_i -x)^2 + (y_i - y)^2)
with (x,y) being your datapoint and (x_i, y_i) being a point from the curve.
Consider using NumPy for vectorization. Explicitly looping through your data points will most likely be less efficient, depending on your use case, it might however be quick enough. (If you need to run it on a regular basis, I think vectorization will easily outspeed the explicit way) This could look something like this:
import numpy as np # Universal abbreviation for the module
datapoints = np.random.rand(3,2) # Returns a vector with randomized entries of size 3x2 (Imagine it as 3 sets of x- and y-values
contour1 = np.random.rand(1000, 2) # Other than the size (which is 1000x2) no different than datapoints
contour2 = np.random.rand(1000, 2)
contour3 = np.random.rand(1000, 2)
def squareDistanceUnvectorized(datapoint, contour):
retVal = 0.
print("Using datapoint with values x:{}, y:{}".format(datapoint[0], datapoint[1]))
lengthOfContour = np.size(contour, 0) # This gets you the number of lines in the vector
for pointID in range(lengthOfContour):
squaredXDiff = np.square(contour[pointID,0] - datapoint[0])
squaredYDiff = np.square(contour[pointID,1] - datapoint[1])
retVal += np.sqrt(squaredXDiff + squaredYDiff)
retVal = retVal / lengthOfContour # As we want the average, we are dividing the sum by the element count
return retVal
if __name__ == "__main__":
noOfDatapoints = np.size(datapoints,0)
contID = 0
for currentDPID in range(noOfDatapoints):
dist1 = squareDistanceUnvectorized(datapoints[currentDPID,:], contour1)
dist2 = squareDistanceUnvectorized(datapoints[currentDPID,:], contour2)
dist3 = squareDistanceUnvectorized(datapoints[currentDPID,:], contour3)
if dist1 > dist2 and dist1 > dist3:
contID = 1
elif dist2 > dist1 and dist2 > dist3:
contID = 2
elif dist3 > dist1 and dist3 > dist2:
contID = 3
else:
contID = 0
if contID == 0:
print("Datapoint {} is inbetween two contours".format(currentDPID))
else:
print("Datapoint {} is closest to contour {}".format(currentDPID, contID))
Okay, now moving on to vector-land.
I have taken the liberty to adjust this part to what I think is your dataset. Try it and let me know if it works.
import numpy as np
import pandas as pd
# Generate 1000 points (2-dim Vector) with random values between 0 and 1. Make them strings afterwards.
# This is the first contour
random2Ddata1 = np.random.rand(1000,2)
listOfX1 = [str(x) for x in random2Ddata1[:,0]]
listOfY1 = [str(y) for y in random2Ddata1[:,1]]
# Do the same for a second contour, except that we de-center this 255 units into the first dimension
random2Ddata2 = np.random.rand(1000,2)+[255,0]
listOfX2 = [str(x) for x in random2Ddata2[:,0]]
listOfY2 = [str(y) for y in random2Ddata2[:,1]]
# After this step, our 'contours' are basically two blobs of datapoints whose centers are approx. 255 units apart.
# Generate a set of 4 datapoints and make them a Pandas-DataFrame
datapoints = {'X': ['0.5', '0', '255.5', '0'], 'Y': ['0.5', '0', '0.5', '-254.5']}
datapoints = pd.DataFrame(datapoints, columns=['X', 'Y'])
# Do the same for the two contours
contour1 = {'Xf': listOfX1, 'Yf': listOfY1}
contour1 = pd.DataFrame(contour1, columns=['Xf', 'Yf'])
contour2 = {'Xf': listOfX2, 'Yf': listOfY2}
contour2 = pd.DataFrame(contour2, columns=['Xf', 'Yf'])
# We do now have 4 datapoints.
# - The first datapoint is basically where we expect the mean of the first contour to be.
# Contour 1 consists of 1000 points with x, y- values between 0 and 1
# - The second datapoint is at the origin. Its distances should be similar to the once of the first datapoint
# - The third datapoint would be the result of shifting the first datapoint 255 units into the positive first dimension
# - The fourth datapoint would be the result of shifting the first datapoint 255 units into the negative second dimension
# Transformation into numpy array
# First the x and y values of the data points
dpArray = ((datapoints.values).T).astype(np.float)
c1Array = ((contour1.values).T).astype(np.float)
c2Array = ((contour2.values).T).astype(np.float)
# This did the following:
# - Transform the datapoints and contours into numpy arrays
# - Transpose them afterwards so that if we want all x values, we can write var[0,:] instead of var[:,0].
# A personal preference, maybe
# - Convert all the values into floats.
# Now, we iterate through the contours. If you have a lot of them, putting them into a list beforehand would do the job
for contourid, contour in enumerate([c1Array, c2Array]):
# Now for the datapoints
for _index, _value in enumerate(dpArray[0,:]):
# The next two lines do vectorization magic.
# First, we square the difference between one dpArray entry and the contour x values.
# You might notice that contour[0,:] returns an 1x1000 vector while dpArray[0,_index] is an 1x1 float value.
# This works because dpArray[0,_index] is broadcasted to fit the size of contour[0,:].
dx = np.square(dpArray[0,_index] - contour[0,:])
# The same happens for dpArray[1,_index] and contour[1,:]
dy = np.square(dpArray[1,_index] - contour[1,:])
# Now, we take (for one datapoint and one contour) the mean value and print it.
# You could write it into an array or do basically anything with it that you can imagine
distance = np.mean(np.sqrt(dx+dy))
print("Mean distance between contour {} and datapoint {}: {}".format(contourid+1, _index+1, distance))
# But you want to be able to call this... so here we go, generating a function out of it!
def getDistanceFromDatapointsToListOfContoursFindBetterName(datapoints, listOfContourDataFrames):
""" Takes a DataFrame with points and a list of different contours to return the average distance for each combination"""
dpArray = ((datapoints.values).T).astype(np.float)
listOfContours = []
for item in listOfContourDataFrames:
listOfContours.append(((item.values).T).astype(np.float))
retVal = np.zeros((np.size(dpArray,1), len(listOfContours)))
for contourid, contour in enumerate(listOfContours):
for _index, _value in enumerate(dpArray[0,:]):
dx = np.square(dpArray[0,_index] - contour[0,:])
dy = np.square(dpArray[1,_index] - contour[1,:])
distance = np.mean(np.sqrt(dx+dy))
print("Mean distance between contour {} and datapoint {}: {}".format(contourid+1, _index+1, distance))
retVal[_index, contourid] = distance
return retVal
# And just to see that it is, indeed, returning the same results, run it once
getDistanceFromDatapointsToListOfContoursFindBetterName(datapoints, [contour1, contour2])
This sort of question is a tad bit different the normal 'how to find the intersection of two lines' via numpy. Here is the situation, I am creating a program that looks at slope stability and I need to find where a circle intersects a line.
I have two numpy arrays:
One array gives me a normal (x, y) values of an elevation profile in 2D
The other array is calculated values of coordinates (x, y) that spans the circumference of a circle from a defined centre.
I need to somehow compare the two at what approximate point does the coordinates of the circle intersect the profile line?
Here some data to work with:
circ_coords = np.array([
[.71,.71],
[0.,1.]
])
linear_profile = np.array([
[0.,0.],
[1.,1.]
])
I need a function that would spit out say a single or multiple coordinate values saying that based on these circular coordinates and your linear profile.. the two would intersect here.
def intersect(array1, array2):
# stuff
return computed_array
You can solve it algebraically. The parametric representation of points (x,y) on the line segment between (x1,y1) and (x2,y2) is:
x=tx1+(1−t)x2 and y=ty1+(1−t)y2,
where 0≤t≤1.
If you substitute it in the equation of the circle and solve the resulting quadratic equation for t, you can test if 0≤t01≤1, i.e line segment intersets with circle. The t01 values could be than used to calculate intersection points.
Shapely has some cool functions. According to this post, this code should work:
from shapely.geometry import LineString
from shapely.geometry import Point
p = Point(0,0)//center
c = p.buffer(0.71).boundary//radius
l = LineString([(0.,0.), (1., 1.)])//line point
i = c.intersection(l)
Apparently here i is the array you are looking for, also, check this post too. Hope this helps.
Below given is an example image where 'center-point' is (x0,y0) (the center of the wheel). Other points are the other ends of the spoke. The distance between 'center-point" and the other end of spoke may be different (spokes of different length). These all points are in cartesian coordinate system.
I need to find here the largest angle made by any two consecutive spoke. In this fig all the angles are same but assume that any one of the spoke is missing, then we will have that angle as the the largest angle at origin.
My take:
I am calculating the angle created by each edge with respect to x axis one at a time subtracting with the previous one (that gives angle between two spoke). I am keeping track of the largest angle, everytime updating it if I encounter an angle larger than the previous. My method works but just wondering if any efficient method is available to find the same.
Assuming you want the angle between two spokes, I suggest you convert the data points to polar/complex co-ordinates, this is made easy in the cmath module, and allows you to do something like this (phase takes out just the angle about centre):
import cmath
def largest_spoke_angle(centre, peripheral):
per_from_centre = [complex(z[0]-centre[0], z[1]-centre[1]) for z in peripheral]
per_angles = [cmath.phase(z) for z in per_from_centre]
per_angles.sort()
differences = [ per_angles[n+1]-per_angles[n] for n in range(len(per_angles)-1)] \
+ [per_angles[0] +2*cmath.pi - per_angles[-1]]
return max(differences)#in radians
centre = (0.,0.)
peripheral = [(1.,2.),(3.,4.),(3.,5.)]
print largest_spoke_angle(centre, peripheral)
I think I would do something like this:
angles = [get_angle_from_xaxis(origin,point) for point in points]
#make sure the angles are in order
angles.sort()
#need to compare last one with first one
angles.insert(0,angles[-1]-360.0) #360 if degrees, otherwise 2*math.pi.
#Now calculate the difference between adjacent angles and take the maximum
maxangle = max( angles[i] - angle for i,angle in enumerate(angles[:-1],1) )
This is basically the solution you describe. The only thing I've added is a check between the last and first and a sort to make sure we have the angles in the right order.
The answer of #user1597034 is correct. But it's not possible to get which spokes resulted in the largest angle.
The code below finds the indices of the two vectors of largest angle:
import cmath
import numpy as np
center = (0.,0.)
peripheral = np.array([(-1.,-1.),(0.,1.),(1.,-0.55), (0,-1), (-1,1)])
per_from_centre = [complex(z[0]-center[0], z[1]-center[1]) for z in peripheral]
per_angles = [cmath.phase(z) for z in per_from_centre]
id_ord = np.argsort(per_angles,axis=-1) # order index
per_angles.sort()
differences = [ per_angles[n+1]-per_angles[n] for n in range(len(per_angles)-1)] \
+ [per_angles[0] +2*cmath.pi - per_angles[-1]]
# ----- so far, same code in relation to #user1597034 -----
# find index of adjacent angles of greater angle
max_value = max(differences) # maximum value
for i in range(len(differences)):
if max_value == differences[i]:
if i == (len(differences)-1):
pairs = [id_ord[0], id_ord[-1]]
else:
pairs = [id_ord[i]] + [id_ord[i+1]]
print('pair index of largest angle:',pairs)
pair index of largest angle: [2, 1]