Calculate minimum distance between a point and a polygon in geopandas - python

I am trying to compute the minimum distance between a set of points and a set of polygons.
My code looks like this:
import numpy as np

polys = sf.geometry.tolist()
cities = sf.CITY_LABEL.tolist()
min_dist = np.empty(len(points))
min_city = ['NA'] * len(points)
min_coord = ['NA'] * len(points)
inside = ['NA'] * len(points)
for i, point in enumerate(points):
    aux = sf.boundary.distance(point).tolist()
    idx = aux.index(min(aux))
    min_dist[i] = aux[idx]
    min_city[i] = cities[idx]
    min_coord[i] = polys[idx].boundary.interpolate(polys[idx].boundary.project(point)).wkt
    inside[i] = polys[idx].contains(point)
where the variable points contains the points and the variable sf is my shapefile with polygons.
I then save to a file the minimum distance in degrees and in km (deg*111), the name of the closest polygon (a city), the closest point on that polygon, and whether the point is inside the polygon. So I have something like this:
But the minimum distance I computed (column D) is larger than the distance between the point (column A) and the polygon closest point (column F).
Any idea what I am doing wrong? Why does the distance function return a different distance than the distance between columns A and F?
(I also replicated the code in R, and indeed the minimum distance I get there is the one computed between columns A and F of the file above, not the one in column D.)
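In planar coordinates, the minimum distance to a polygon's boundary and the distance from the point to the closest boundary point are the same quantity, so column D and the A-to-F distance should agree when both are computed with the same metric. One possible cause of a mismatch is mixing metrics: multiplying planar degree distances by 111 overstates east-west distances away from the equator (a degree of longitude shrinks with cos(latitude)), while R tools often return geodesic distances. Below is a minimal plain-Python sketch (no GeoPandas; the function names are hypothetical) that computes both values at once, useful for cross-checking. Note that, like sf.boundary.distance, it measures distance to the boundary, so a point inside the polygon still gets a positive distance, whereas shapely's polygon.distance(point) returns 0 for contained points.

```python
import math


def closest_point_on_segment(p, a, b):
    """Return the point on segment a-b closest to p (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return a  # degenerate segment: both ends coincide
    # Parameter of the orthogonal projection of p onto the line, clamped to the segment
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def closest_point_on_ring(p, ring):
    """Return (distance, closest_point) from p to a closed ring of (x, y) vertices."""
    best = None
    n = len(ring)
    for i in range(n):
        a, b = ring[i], ring[(i + 1) % n]  # wrap around to close the ring
        q = closest_point_on_segment(p, a, b)
        d = math.dist(p, q)
        if best is None or d < best[0]:
            best = (d, q)
    return best


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(closest_point_on_ring((5, -3), square))  # outside: 3 units below the bottom edge
print(closest_point_on_ring((5, 2), square))   # inside: still a positive boundary distance
```

By construction, math.dist(point, closest_point) equals the returned minimum distance, which is exactly the consistency the question expects between columns D and F.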

Related

Using two sets of coordinate points, find points that are closest to each other in python

I have two DataFrames (df1, df2) with differing sizes, but the same overall columns. Both have time stamps and latitude and longitude points. The time stamps and coordinates are the same for many points because of the frequency at which the data was collected. Here is an example of the DataFrame:
time_local                  Lat      Long
2021-09-08 12:56:32-04:00   37.1455  -85.0555
2021-09-08 12:56:32-04:00   37.1455  -85.0555
2021-09-08 12:56:32-04:00   37.1455  -85.0555
...                         ...      ...
The second DataFrame has the same structure; however, some of the coordinate points differ throughout. I want to select the points in the first DataFrame (df1) closest to the points in the second DataFrame (df2); for example, if I had the base point (37.1455, -85.0555) and then (37.1454, -85.0555), (37.1454, -85.0556), (37.1453, -85.0556), the closest point selected would be (37.1455, -85.0555).
Is there a function within Python that can do this easily enough?
Yes; what we need here is some math. The distance formula for coordinates helps:
$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$
where (x1, y1) is the first point and (x2, y2) the second.
Putting it into code (Cartesian plane):
points = [(37.1454, -85.0555), (37.1454, -85.0556), (37.1453, -85.0556)]
origin = (37.1455, -85.0555)
def distance(cord1, cord2):
    x1, y1 = cord1
    x2, y2 = cord2
    res = ((x2 - x1)**2 + (y2 - y1)**2)**0.5  # Raising to 0.5 is nothing but square root
    return res
def closest_point(origin, points):
    distances = [distance(origin, point) for point in points]
    return points[distances.index(min(distances))]  # Fetches the index from points based on smallest value
print(closest_point(origin, points))
If only the y-coordinate matters:
points = [(37.1454, -85.0555), (37.1454, -85.0556), (37.1453, -85.0556)]
origin = (37.1455, -85.0555)
def closest_point(origin, points):
    # use the absolute difference; a signed difference would favour points with larger y
    distances = [abs(origin[1] - point[1]) for point in points]
    return points[distances.index(min(distances))]
print(closest_point(origin, points))
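The Euclidean formula treats degrees of latitude and longitude as equal planar units, but away from the equator a degree of longitude is shorter than a degree of latitude (by a factor of cos(latitude)). For geographic (lat, lon) data, a great-circle distance is usually a better criterion. Below is a minimal sketch swapping in the haversine formula; it assumes a spherical Earth with a mean radius of 6371 km, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius (spherical model)


def haversine_m(coord1, coord2):
    """Great-circle distance in metres between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, coord1)
    lat2, lon2 = map(math.radians, coord2)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def closest_point_haversine(origin, points):
    """Return the point in `points` with the smallest great-circle distance to `origin`."""
    return min(points, key=lambda p: haversine_m(origin, p))


points = [(37.1454, -85.0555), (37.1454, -85.0556), (37.1453, -85.0556)]
origin = (37.1455, -85.0555)
print(closest_point_haversine(origin, points))
```

Using min(..., key=...) avoids building a separate distance list and looking up its index. For the two DataFrames, the same function can be applied row by row to the (Lat, Long) pairs.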

How to implement in Python a function to compute the Euclidean distance between two arbitrary points on a torus

Given a 10x10 grid (2d-array) filled randomly with numbers, either 0, 1 or 2. How can I find the Euclidean distance (the l2-norm of the distance vector) between two given points considering periodic boundaries?
Let us consider an arbitrary grid point called centre. Now, I want to find the nearest grid point containing the same value as centre. I need to take periodic boundaries into account, such that the matrix/grid can be seen rather as a torus instead of a flat plane. In that case, say the centre = matrix[0,2], and we find that there is the same number in matrix[9,2], which would be at the southern boundary of the matrix. The Euclidean distance computed with my code would be for this example np.sqrt(0**2 + 9**2) = 9.0. However, because of periodic boundaries, the distance should actually be 1, because matrix[9,2] is the northern neighbour of matrix[0,2]. Hence, if periodic boundary values are implemented correctly, distances of magnitude above 8 should not exist.
So, I would be interested in how to implement in Python a function to compute the Euclidean distance between two arbitrary points on a torus by applying a wrap-around for the boundaries.
import numpy as np
matrix = np.random.randint(0,3,(10,10))
centre = matrix[0,2]
#rewrite the centre to be the number 5 (to exclude itself as shortest distance)
matrix[0,2] = 5
#find the points where entries are same as centre
same = np.where((matrix == centre) == True)
idx_row, idx_col = same
#find distances from centre to all values which are of same value
dist = np.zeros(len(same[0]))
for i in range(0,len(same[0])):
    delta_row = same[0][i] - 0 #row coord of centre
    delta_col = same[1][i] - 2 #col coord of centre
    dist[i] = np.sqrt(delta_row**2 + delta_col**2)
#retrieve the index of the smallest distance
idx = dist.argmin()
print('Centre value: %i. The nearest cell with same value is at (%i,%i)'
      % (centre, same[0][idx],same[1][idx]))
For each axis, you can check whether the distance is shorter when you wrap around or when you don't. Consider the row axis, with rows i and j.
When not wrapping around, the difference is abs(i - j).
When wrapping around, the difference is "flipped", as in 10 - abs(i - j). In your example with i == 0 and j == 9 you can check that this correctly produces a distance of 1.
Then simply take whichever is smaller:
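delta_row = abs(same[0][i] - 0) #row coord of centre
delta_row = min(delta_row, 10 - delta_row)
And similarly for delta_col (take the absolute value first: the centre is in column 2, so the raw column difference can be negative).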
The final dist[i] calculation needs no changes.
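The per-axis rule above can be packaged as a small standalone function. This is a minimal sketch assuming integer grid coordinates in the range 0 to size-1 on each axis; the name torus_distance is hypothetical.

```python
import math


def torus_distance(p, q, shape):
    """Euclidean distance between grid cells p and q on a torus of the given shape.

    On each axis, the shorter of the direct difference |a - b| and the
    wrapped-around difference size - |a - b| is used.
    """
    total = 0
    for a, b, size in zip(p, q, shape):
        d = abs(a - b)
        d = min(d, size - d)  # wrap around if that is shorter
        total += d * d
    return math.sqrt(total)


print(torus_distance((0, 2), (9, 2), (10, 10)))
```

On a 10x10 torus no per-axis difference can exceed 5, so the largest possible distance is sqrt(5**2 + 5**2), about 7.07, consistent with the expectation in the question that distances above 8 should not occur.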
I have a working 'sketch' of how this could work. In short, I calculate the distance 9 times: once for the normal distance, and once for each of 8 shifted copies, to possibly correct for a closer 'torus' distance.
As n grows, the computational cost goes up sharply. But the torus correction is probably rarely needed, as there is usually a point nearby without wrap-around.
You can test this cheaply: on a grid of size 1, if the nearest point found without wrap-around is no farther away than the query point's distance to the nearest edge of the grid, there cannot be a closer torus point, because every wrapped copy lies outside the grid. (A fixed threshold of 1/2 is not enough: a query point right next to an edge can have a wrapped neighbour much closer than 1/2.)
import numpy as np
from sklearn.neighbors import BallTree
n=10000
np.random.seed(1)
A = np.random.randint(low=0, high=10, size=(n,n))
I create a 10000x10000 grid of random digits and store the locations of the 0's in ONES.
ONES = np.argwhere(A == 0)
Now I define my torus distance, which tries all 9 copies (the original plus 8 mirrors) and keeps the closest.
def distance_on_torus( point=[500,500] ):
    index_diff = [[1],[1],[0],[0],[0,1],[0,1],[0,1],[0,1]]
    coord_diff = [[-1],[1],[-1],[1],[-1,-1],[-1,1],[1,-1],[1,1]]
    tree = BallTree( ONES, leaf_size=5*n, metric='euclidean')
    dist, indi = tree.query([point],k=1, return_distance=True )
    distances = [dist[0]]
    for indici_to_shift, coord_direction in zip(index_diff, coord_diff):
        MIRROR = ONES.copy()
        for i,shift in zip(indici_to_shift,coord_direction):
            MIRROR[:,i] = MIRROR[:,i] + (shift * n)
        tree = BallTree( MIRROR, leaf_size=5*n, metric='euclidean')
        dist, indi = tree.query([point],k=1, return_distance=True )
        distances.append(dist[0])
    return np.min(distances)
%%time
distance_on_torus([2,3])
It is slow: the call above takes 15 minutes for n = 10000, but less than a second for n = 1000.
An optimisation would be to first compute the non-torus distance and, only if that minimum might not be the smallest, query just the minimal set of extra mirrored 'blocks' around it. This should greatly increase speed.
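The early-exit idea can be sketched in plain Python. This uses a brute-force scan instead of BallTree purely to illustrate the pruning test (a hypothetical helper, not the code above): every mirrored copy is shifted by n on at least one axis, so on that axis it lies at coordinate -1 or lower, or n or higher, and is therefore at least min(c + 1, n - c) away from the query coordinate c. If the plain nearest distance is within that margin, the wrapped pass can be skipped.

```python
import math


def torus_dist(p, q, n):
    """Wrapped Euclidean distance on an n x n torus (per-axis min of direct vs wrapped)."""
    return math.sqrt(sum(min(abs(a - b), n - abs(a - b)) ** 2 for a, b in zip(p, q)))


def nearest_on_torus(points, q, n):
    """Distance from q to the nearest of `points` on an n x n periodic grid.

    Cheap pass first: plain (non-wrapped) nearest distance. Any mirrored copy
    is at least `margin` away from q, so if the plain distance is within the
    margin no wrapped copy can be closer and the torus pass is skipped.
    """
    plain = min(math.dist(p, q) for p in points)
    margin = min(min(c + 1, n - c) for c in q)
    if plain <= margin:
        return plain
    return min(torus_dist(p, q, n) for p in points)


print(nearest_on_torus([(9, 2)], (0, 2), 10))          # needs the wrapped pass
print(nearest_on_torus([(1, 2), (9, 9)], (0, 2), 10))  # early exit
```

With BallTree, the same test would decide whether the 8 mirrored trees need to be queried at all, and for query points far from the edges they usually would not.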

K-nearest points from two dataframes with GeoPandas

GeoPandas uses shapely under the hood. To get the nearest neighbor I saw the use of nearest_points from shapely. However, this approach does not include k-nearest points.
I needed to compute distances to the nearest points between two GeoDataFrames and insert the distance into the GeoDataFrame containing the "from this point" data.
This is my approach using GeoSeries.distance() without using another package or library. Note that when k == 1 the returned value essentially shows the distance to the nearest point. There is also a GeoPandas-only solution for the nearest point by @cd98 which inspired my approach.
This works well for my data, but I wonder whether there is a better or faster approach, or another benefit to using shapely or sklearn.neighbors?
import pandas as pd
import geopandas as gp
# gdf1: GeoDataFrame with point-type geometry column - distance from these points
# gdf2: GeoDataFrame with point-type geometry column - distance to these points
def knearest(from_points, to_points, k):
    distlist = to_points.distance(from_points)
    distlist.sort_values(ascending=True, inplace=True)  # To have the closest ones first
    return distlist.iloc[:k].mean()  # positional slice of the k smallest distances
# looping through a list of nearest points
for Ks in [1, 2, 3, 4, 5, 10]:
    name = 'dist_to_closest_' + str(Ks)  # to set column name
    gdf1[name] = gdf1.geometry.apply(knearest, args=(gdf2, Ks))
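The function above sorts all n distances for every source point, although only the k smallest are needed. As a stdlib-only sketch of the same mean-of-k-nearest calculation (planar (x, y) tuples instead of GeoSeries; the name mean_knearest is hypothetical), heapq.nsmallest keeps only the k best values, which costs O(n log k) instead of a full sort:

```python
import heapq
import math


def mean_knearest(from_point, to_points, k):
    """Mean distance from `from_point` to its k nearest neighbours in `to_points`.

    heapq.nsmallest keeps only the k smallest distances instead of sorting all
    of them, as sort_values does in the GeoPandas version.
    """
    dists = (math.dist(from_point, p) for p in to_points)
    nearest = heapq.nsmallest(k, dists)
    return sum(nearest) / len(nearest)


to_pts = [(1, 0), (0, 2), (3, 4), (10, 10)]
for k in [1, 2, 3]:
    print(k, mean_knearest((0, 0), to_pts, k))
```

For large inputs a spatial index (such as the BallTree approach below) avoids computing every pairwise distance in the first place.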
Yes, there is. But first, I must credit the University of Helsinki's Automating GIS-processes course, which is where this source code comes from. Here's how:
First, read the data; in this example, we find the nearest bus stop for each building.
import geopandas as gpd
# Filepaths
stops = gpd.read_file('data/pt_stops_helsinki.gpkg')
# read_gdf_from_zip is a helper from the course material, not a GeoPandas function
buildings = read_gdf_from_zip('data/building_points_helsinki.zip')
Define the functions; here you can adjust k_neighbors:
from sklearn.neighbors import BallTree
import numpy as np
def get_nearest(src_points, candidates, k_neighbors=1):
    """Find nearest neighbors for all source points from a set of candidate points"""
    # Create tree from the candidate points
    tree = BallTree(candidates, leaf_size=15, metric='haversine')
    # Find closest points and distances
    distances, indices = tree.query(src_points, k=k_neighbors)
    # Transpose to get distances and indices into arrays
    distances = distances.transpose()
    indices = indices.transpose()
    # Get closest indices and distances (i.e. array at index 0)
    # note: for the second closest points, you would take index 1, etc.
    closest = indices[0]
    closest_dist = distances[0]
    # Return indices and distances
    return (closest, closest_dist)
def nearest_neighbor(left_gdf, right_gdf, return_dist=False):
    """
    For each point in left_gdf, find closest point in right GeoDataFrame and return them.
    NOTICE: Assumes that the input Points are in WGS84 projection (lat/lon).
    """
    left_geom_col = left_gdf.geometry.name
    right_geom_col = right_gdf.geometry.name
    # Ensure that index in right gdf is formed of sequential numbers
    right = right_gdf.copy().reset_index(drop=True)
    # Parse coordinates from points and insert them into a numpy array as RADIANS
    left_radians = np.array(left_gdf[left_geom_col].apply(lambda geom: (geom.x * np.pi / 180, geom.y * np.pi / 180)).to_list())
    right_radians = np.array(right[right_geom_col].apply(lambda geom: (geom.x * np.pi / 180, geom.y * np.pi / 180)).to_list())
    # Find the nearest points
    # -----------------------
    # closest ==> index in right_gdf that corresponds to the closest point
    # dist ==> distance between the nearest neighbors (in radians; converted to meters below)
    closest, dist = get_nearest(src_points=left_radians, candidates=right_radians)
    # Return points from right GeoDataFrame that are closest to points in left GeoDataFrame
    closest_points = right.loc[closest]
    # Ensure that the index corresponds the one in left_gdf
    closest_points = closest_points.reset_index(drop=True)
    # Add distance if requested
    if return_dist:
        # Convert to meters from radians
        earth_radius = 6371000  # meters
        closest_points['distance'] = dist * earth_radius
    return closest_points
Do the nearest-neighbour analysis:
# Find closest public transport stop for each building and get also the distance based on haversine distance
# Note: haversine distance which is implemented here is a bit slower than using e.g. 'euclidean' metric
# but useful as we get the distance between points in meters
closest_stops = nearest_neighbor(buildings, stops, return_dist=True)
Now join the "from" and "to" data frames:
# Rename the geometry of closest stops gdf so that we can easily identify it
closest_stops = closest_stops.rename(columns={'geometry': 'closest_stop_geom'})
# Merge the datasets by index (for this, it is good to use '.join()' -function)
buildings = buildings.join(closest_stops)
The answer above using Automating GIS-processes is really nice, but there is an error when converting the points to a NumPy array in radians: the latitude and longitude are reversed. Both conversions should be:
left_radians = np.array(left_gdf[left_geom_col].apply(lambda geom: (geom.y * np.pi / 180, geom.x * np.pi / 180)).to_list())
right_radians = np.array(right[right_geom_col].apply(lambda geom: (geom.y * np.pi / 180, geom.x * np.pi / 180)).to_list())
Indeed, the haversine metric expects (lat, lon) pairs, but in a Point the longitude corresponds to the x-axis of a plane or a sphere and the latitude to the y-axis.
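To see how much the ordering matters, here is a small stdlib sketch of the haversine distance with latitude first (the same convention scikit-learn's haversine metric uses), evaluated on two hypothetical points one degree of longitude apart at 60°N, roughly Helsinki's latitude. The function name and the 6371 km radius are assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres; arguments in degrees, latitude first."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# shapely-style coordinates: x = longitude, y = latitude
x1, y1, x2, y2 = 24.0, 60.0, 25.0, 60.0
correct = haversine_m(y1, x1, y2, x2)  # (lat, lon) order: about 55.6 km
swapped = haversine_m(x1, y1, x2, y2)  # (lon, lat) passed as (lat, lon): about 111 km
print(round(correct), round(swapped))
```

Because swapping the axes changes distances non-uniformly, it can also change which neighbour is selected, not just the reported distance.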
If your data are in grid coordinates, then the approach is a bit leaner, but with one key gotcha.
Building on sutan's answer and streamlining the block from the Uni Helsinki...
To get multiple neighbors, you edit the k_neighbors argument, and you must ALSO hard-code variables within the body of the function (see my additions below 'closest' and 'closest_dist') AND add them to the return statement.
Thus, if you want the 2 closest points, it looks like:
from sklearn.neighbors import BallTree
import numpy as np
def get_nearest(src_points, candidates, k_neighbors=2):
    """
    Find nearest neighbors for all source points from a set of candidate points
    modified from: https://automating-gis-processes.github.io/site/notebooks/L3/nearest-neighbor-faster.html
    """
    # Create tree from the candidate points
    tree = BallTree(candidates, leaf_size=15, metric='euclidean')
    # Find closest points and distances
    distances, indices = tree.query(src_points, k=k_neighbors)
    # Transpose to get distances and indices into arrays
    distances = distances.transpose()
    indices = indices.transpose()
    # Get closest indices and distances (i.e. array at index 0)
    # note: for the second closest points, you would take index 1, etc.
    closest = indices[0]
    closest_dist = distances[0]
    closest_second = indices[1]  # *manually add per comment above*
    closest_second_dist = distances[1]  # *manually add per comment above*
    # Return indices and distances
    return (closest, closest_dist, closest_second, closest_second_dist)
The inputs are lists of (x, y) tuples. Thus, since (by the question title) your data is in a GeoDataFrame:
# easier to read
in_pts = [(row.geometry.x, row.geometry.y) for idx, row in gdf1.iterrows()]
qry_pts = [(row.geometry.x, row.geometry.y) for idx, row in gdf2.iterrows()]
# faster (by about 7X)
in_pts = [(x,y) for x,y in zip(gdf1.geometry.x , gdf1.geometry.y)]
qry_pts = [(x,y) for x,y in zip(gdf2.geometry.x , gdf2.geometry.y)]
I'm not interested in distances, so instead of commenting them out of the function, I run:
idx_nearest, _, idx_2ndnearest, _ = get_nearest(in_pts, qry_pts)
and get two arrays of the same length as in_pts that, respectively, contain the index values of the closest and second-closest points in the original GeoDataFrame for qry_pts.
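The hard-coding gotcha goes away if the function returns the whole transposed arrays, so that row r holds the (r+1)-th nearest neighbour for every source point and any k works unchanged. The sketch below illustrates that return shape with a brute-force stand-in for BallTree.query (hypothetical name, stdlib only, assumes k is at most the number of candidates); with BallTree you would simply return indices and distances after the transpose.

```python
import math


def get_nearest_k(src_points, candidates, k=2):
    """Brute-force stand-in for BallTree.query returning one row per rank.

    Returns (indices, distances), each a list of k lists: indices[r][i] is the
    candidate index of the (r+1)-th nearest neighbour of src_points[i].
    """
    indices = [[] for _ in range(k)]
    distances = [[] for _ in range(k)]
    for s in src_points:
        d = [math.dist(s, c) for c in candidates]
        ranked = sorted(range(len(candidates)), key=d.__getitem__)[:k]
        for r, j in enumerate(ranked):
            indices[r].append(j)
            distances[r].append(d[j])
    return indices, distances


idx, _ = get_nearest_k([(0, 0), (10, 10)], [(1, 0), (9, 9), (0, 2), (10, 10)], k=2)
idx_nearest, idx_2ndnearest = idx  # unpack as many ranks as you asked for
print(idx_nearest, idx_2ndnearest)
```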
Great solution! If you are using the Automating GIS-processes solution, make sure to reset the index of the buildings GeoDataFrame before the join (only needed if you are using a subset of left_gdf).
buildings.insert(0, 'Number', range(0,len(buildings)))
buildings.set_index('Number' , inplace = True)
Based on the answers above, here is an all-in-one solution which takes two GeoDataFrames as input and searches for the k nearest neighbors.
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree
def get_nearest_neighbors(gdf1, gdf2, k_neighbors=2):
    '''
    Find k nearest neighbors for all source points from a set of candidate points
    modified from: https://automating-gis-processes.github.io/site/notebooks/L3/nearest-neighbor-faster.html
    Parameters
    ----------
    gdf1 : geopandas.GeoDataFrame
        Geometries to search from.
    gdf2 : geopandas.GeoDataFrame
        Geometries to be searched.
    k_neighbors : int, optional
        Number of nearest neighbors. The default is 2.
    Returns
    -------
    gdf_final : geopandas.GeoDataFrame
        gdf1 with distance, index and all other columns from gdf2.
    '''
    src_points = [(x,y) for x,y in zip(gdf1.geometry.x , gdf1.geometry.y)]
    candidates = [(x,y) for x,y in zip(gdf2.geometry.x , gdf2.geometry.y)]
    # Create tree from the candidate points
    tree = BallTree(candidates, leaf_size=15, metric='euclidean')
    # Find closest points and distances
    distances, indices = tree.query(src_points, k=k_neighbors)
    # Transpose to get distances and indices into arrays
    distances = distances.transpose()
    indices = indices.transpose()
    closest_gdfs = []
    for k in np.arange(k_neighbors):
        gdf_new = gdf2.iloc[indices[k]].reset_index()
        gdf_new['distance'] = distances[k]
        gdf_new = gdf_new.add_suffix(f'_{k+1}')
        closest_gdfs.append(gdf_new)
    # reset gdf1's index so the column-wise concat aligns with the 0..n-1 results
    closest_gdfs.insert(0, gdf1.reset_index(drop=True))
    gdf_final = pd.concat(closest_gdfs, axis=1)
    return gdf_final

python: elegant way of finding the GPS coordinates of a circle around a certain GPS location

I have a set of GPS coordinates in decimal notation, and I'm looking for a way to find the coordinates in a circle with variable radius around each location.
Here is an example of what I need. It is a circle with 1km radius around the coordinate 47,11.
What I need is the algorithm for finding the coordinates of the circle, so I can use it in my KML file as a polygon. Ideally in Python.
See also Adding distance to a GPS coordinate for simple relations between lat/lon and short-range distances.
This works:
import math
# inputs
radius = 1000.0  # m - the following code is an approximation that stays reasonably accurate for distances < 100km
centerLat = 30.0  # latitude of circle center, decimal degrees
centerLon = -100.0  # longitude of circle center, decimal degrees
# parameters
N = 10  # number of discrete sample points to be generated along the circle
# generate points
circlePoints = []
for k in range(N):
    # compute
    angle = math.pi*2*k/N
    dx = radius*math.cos(angle)
    dy = radius*math.sin(angle)
    point = {}
    point['lat'] = centerLat + (180/math.pi)*(dy/6378137)
    point['lon'] = centerLon + (180/math.pi)*(dx/6378137)/math.cos(centerLat*math.pi/180)
    # add to list
    circlePoints.append(point)
print(circlePoints)
Use the formula for "Destination point given distance and bearing from start point" here:
http://www.movable-type.co.uk/scripts/latlong.html
with your centre point as start point, your radius as distance, and loop over a number of bearings from 0 degrees to 360 degrees. That will give you the points on a circle, and will work at the poles because it uses great circles everywhere.
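A minimal sketch of that approach, assuming a spherical Earth with a mean radius of 6371 km (the function name is hypothetical). For start point (φ1, λ1), bearing θ and angular distance δ = d/R, the destination is φ2 = asin(sin φ1 cos δ + cos φ1 sin δ cos θ) and λ2 = λ1 + atan2(sin θ sin δ cos φ1, cos δ − sin φ1 sin φ2):

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius (spherical model)


def circle_points(lat, lon, radius_m, n=36):
    """Return n (lat, lon) points, in degrees, at great-circle distance radius_m
    from (lat, lon), using the destination-point formula for evenly spaced bearings."""
    phi1, lmb1 = math.radians(lat), math.radians(lon)
    delta = radius_m / EARTH_RADIUS_M  # angular distance
    pts = []
    for k in range(n):
        theta = 2 * math.pi * k / n  # bearing, clockwise from north
        phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                         + math.cos(phi1) * math.sin(delta) * math.cos(theta))
        lmb2 = lmb1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                                 math.cos(delta) - math.sin(phi1) * math.sin(phi2))
        pts.append((math.degrees(phi2), math.degrees(lmb2)))
    return pts


ring = circle_points(47.0, 11.0, 1000.0, n=36)  # 1 km circle around 47, 11
```

Longitudes are not normalised to [-180, 180], which only matters near the antimeridian. KML writes coordinates as longitude,latitude, and a polygon ring repeats its first coordinate at the end, so you would emit the points in that order and append the first one again.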
It is a simple trigonometry problem.
Set up your coordinate system XOY at the circle centre. Start from y = 0, where x = r. Then rotate the radius around the origin by an angle a (in radians) at each step: the i-th point on the circle is at Xi = r * cos(i * a), Yi = r * sin(i * a), for i = 0, 1, ..., 2 * Pi / a - 1.
That's all.
UPDATE
Taking the comment of @poolie into account, the problem can be solved in the following way (assuming the Earth is a perfect sphere). Consider a cross section of the Earth through our point (call it L) along a great circle of diameter D. The 1 km diameter of our circle then becomes a chord (call it AB) of that cross-section circle. So the length of the arc AB is $(AB) = D\,\theta$, where $\theta = \arcsin(|AB| / D)$ is half the central angle subtended by the chord. From there, it is easy to find all other dimensions.

Python how calculate a polygon perimeter using an osgeo.ogr.Geometry object

First of all, I apologize for posting this easy question. I need to compute a number of geometrical attributes (area, perimeter, roundness, major and minor axes, etc.). I am using GDAL/OGR to read my polygons from a shapefile. What I want to ask is:
Is there a method to compute the perimeter using osgeo.ogr.Geometry?
Is there a module built to compute polygon metrics?
Thanks in advance.
import math
import numpy
from osgeo import ogr

def edges_index(points):
    """
    compute edges index for a given 2D point set
    1- The number of edges which form the polygon
    2- Perimeter
    3- The length of the longest edge in a polygon
    4- The length of the shortest edge in a polygon
    5- The average length of all of edges in a polygon
    6- The lengths of edges deviate from their mean value
    """
    Nedges = len(points)-1
    length = []
    for i in range(Nedges):
        ax, ay = points[i]
        bx, by = points[i+1]
        length.append(math.hypot(bx-ax, by-ay))
    edges_perimeter = numpy.sum(length)
    edges_max = numpy.amax(length)
    edges_min = numpy.amin(length)
    edges_average = numpy.average(length)
    edges_std = numpy.std(length)
    return (Nedges,edges_perimeter,edges_max,edges_min,edges_average,edges_std)

poly = r"C:\myshape.shp"
shp = ogr.Open(poly)
layer = shp.GetLayer()
# For every polygon
for feature in layer:
    # get "FID" (Feature ID)
    FID = str(feature.GetFID())
    geometry = feature.GetGeometryRef()
    # get the area
    Area = geometry.GetArea()
    # exterior ring of the polygon (its first and last points are identical)
    pts = geometry.GetGeometryRef(0)
    points = []
    for p in range(pts.GetPointCount()):
        points.append((pts.GetX(p), pts.GetY(p)))
    Nedges, perimeter, edge_max, edge_min, edge_mean, edge_std = edges_index(points)
I might be late on this one, but I was looking for a solution to the same question and happened upon this one. I solved the issue by simply finding the boundary of the geometry and then finding the length of the boundary. Sample Python code below:
perimeter = feat.GetGeometryRef().Boundary().Length()
import math
poly = [(0,10),(10,10),(10,0),(0,0)]
def segments(poly):
    """A sequence of (x,y) numeric coordinates pairs """
    return zip(poly, poly[1:] + [poly[0]])
def area(poly):
    """A sequence of (x,y) numeric coordinates pairs """
    return 0.5 * abs(sum(x0*y1 - x1*y0
                         for ((x0, y0), (x1, y1)) in segments(poly)))
def perimeter(poly):
    """A sequence of (x,y) numeric coordinates pairs """
    return abs(sum(math.hypot(x0-x1,y0-y1) for ((x0, y0), (x1, y1)) in segments(poly)))
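The question also asks about roundness. One common definition (an assumption here, since several exist) is the isoperimetric quotient 4πA/P², which is 1.0 for a circle and smaller for less compact shapes. A minimal sketch combining it with the shoelace area and perimeter above, assuming planar (projected) coordinates and vertices not repeated at the end; the name shape_metrics is hypothetical:

```python
import math


def shape_metrics(poly):
    """Area (shoelace), perimeter and roundness of a simple polygon.

    `poly` is a list of (x, y) vertices without the closing repeat.
    Roundness is the isoperimetric quotient 4*pi*A / P**2 (1.0 for a circle).
    """
    edges = list(zip(poly, poly[1:] + poly[:1]))
    area = 0.5 * abs(sum(x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in edges))
    perimeter = sum(math.hypot(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in edges)
    roundness = 4 * math.pi * area / perimeter ** 2
    return area, perimeter, roundness


area, perim, roundness = shape_metrics([(0, 10), (10, 10), (10, 0), (0, 0)])
print(area, perim, round(roundness, 4))  # a square has roundness pi/4
```

With OGR, the same inputs come from geometry.GetArea() and geometry.Boundary().Length(), so roundness can be computed directly from those two values.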
