distance from a point to the nearest edge of a polygon - python

in the below code i want to calculate the distance from a point to the nearest edge of a polygon.as shown in the results section below, the coordinates are provided.the code posted below shows how i find the distance from a point to the neatrest edge of a polygon.
at run time, and as shown below in restults section, for the give point and geometry, the distance from postgis is equal to 4.32797817574802 while the one calculated from geopandas gives 3.8954865274727614e-05
please let me know how to find the distance from a point to nearest edge of a polygon.
code
poly = wkt.loads(fieldCoordinatesAsTextInWKTInEPSG25832)
pt = wkt.loads(centerPointointAsTextInWKTInEPSG25832)
print(poly.distance(pt)))
results:
queryPostgreSQLForDistancesFromPointsToPolygon:4.32797817574802#result from postgis using st_distance operator
centerPointointAsTextInWKTInEPSG4326:POINT(6.7419520458647835 51.08427961641239)
centerPointointAsTextInWKTInEPSG25832:POINT(341849.5 5661622.5)
centerPointointAsTextInWKTInEPSG4326:POINT(6.7419520458647835 51.08427961641239)
fieldCoordinatesAsTextInWKTInEPSG25832:POLYGON ((5622486.93624152 1003060.89945681,5622079.52632924 1003170.95198635,5622126.00418918 1003781.73122161,5622444.73987453 1003694.55868486,5622486.93624152 1003060.89945681))
fieldCoordinatesAsTextInWKTInEPSG4326:POLYGON((6.741879696309871 51.08423775429969,6.742907378503366 51.08158745820981,6.746964018740842 51.08233499299334,6.746152690693346 51.08440763989611,6.741879696309871 51.08423775429969))
poly.distance(pt):3.8954865274727614e-05#result from geopandas

your code works. It's approx 7000km from Belgium to Ethiopia
are you sure your data is correct? Have built a plotly graph to show where buffered polygon, polygon centroid and point are located in EPSG:4326 CRS
from shapely import wkt
import geopandas as gpd
import plotly.express as px
import json
# queryPostgreSQLForDistancesFromPointsToPolygon:4.32797817574802#result from postgis using st_distance operator
centerPointointAsTextInWKTInEPSG4326 = "POINT(6.7419520458647835 51.08427961641239)"
centerPointointAsTextInWKTInEPSG25832 = "POINT(341849.5 5661622.5)"
centerPointointAsTextInWKTInEPSG4326 = "POINT(6.7419520458647835 51.08427961641239)"
fieldCoordinatesAsTextInWKTInEPSG25832 = "POLYGON ((5622486.93624152 1003060.89945681,5622079.52632924 1003170.95198635,5622126.00418918 1003781.73122161,5622444.73987453 1003694.55868486,5622486.93624152 1003060.89945681))"
fieldCoordinatesAsTextInWKTInEPSG4326 = "POLYGON((6.741879696309871 51.08423775429969,6.742907378503366 51.08158745820981,6.746964018740842 51.08233499299334,6.746152690693346 51.08440763989611,6.741879696309871 51.08423775429969))"
# poly.distance(pt):3.8954865274727614e-05#result from geopandas
poly = wkt.loads(fieldCoordinatesAsTextInWKTInEPSG25832)
pt = wkt.loads(centerPointointAsTextInWKTInEPSG25832)
print(poly.distance(pt)/10**3)
# let's visualize it....
gpoly = (
gpd.GeoDataFrame(geometry=[poly], crs="EPSG:25832")
.buffer(10 ** 6)
.to_crs("EPSG:4326")
)
gpoly.plot()
gpt = gpd.GeoDataFrame(geometry=[pt, poly.centroid], crs="EPSG:25832").to_crs(
"EPSG:4326"
)
px.scatter_mapbox(
gpt.assign(dist=poly.distance(pt)/10**3),
lat=gpt.geometry.y,
lon=gpt.geometry.x,
hover_data={"dist":":.0f"},
).update_layout(
mapbox={
"style": "carto-positron",
"zoom": 4,
"layers": [
{
"source": json.loads(gpoly.to_json()),
"below": "traces",
"type": "fill",
"color": "red",
}
],
}
)

Related

Wrong points buffer using geopandas

Good evening,
I'm working on a product to detect local events (strike) within subscription areas.
The yellow polygons should be 40KM (left) and 50KM (right) circles around central red points. Green points are my strikes that should be detected in my process.
It appears that my current use of buffer() does not produce 40/50 Km buffer radius as expected and then my process in missing my two events .
My code:
# Create my two events to detect
df_strike = pd.DataFrame(
{ 'Latitude': [27.0779, 31.9974],
'Longitude': [51.5144, 38.7078]})
gdf_events = gpd.GeoDataFrame(df_strike, geometry=gpd.points_from_xy(df_strike.Longitude, df_strike.Latitude),crs = {'init':'epsg:4326'})
# Get location to create buffer
SUB_LOCATION = pd.DataFrame(
{ 'perimeter_id': [1370, 13858],
'distance' : [40.0, 50.0],
'custom_lat': [31.6661, 26.6500],
'custom_lon': [38.6635, 51.5700]})
gdf_locations = gpd.GeoDataFrame(SUB_LOCATION, geometry=gpd.points_from_xy(SUB_LOCATION.custom_lon, SUB_LOCATION.custom_lat), crs = {'init':'epsg:4326'})
# Now reproject to a crs using meters
gdf_locations = gdf_locations.to_crs({'init':'epsg:3857'})
gdf_events = gdf_events.to_crs({'init':'epsg:3857'})
# Create buffer using distance (in meters) from locations
gdf_locations['geometry'] = gdf_locations['geometry'].buffer(gdf_locations['distance']*1000)
# Matching events within buffer
matching_entln = pd.DataFrame(gpd.sjoin(gdf_locations, gdf_events, how='inner'))
But my result is an empty dataframe and should not be. If I compute distance between events and locations (distance between red and green points):
pnt1 = Point(27.0779, 51.5144)
pnt2 = Point(26.65, 51.57)
points_df = gpd.GeoDataFrame({'geometry': [pnt1, pnt2]}, crs='EPSG:4326')
points_df = points_df.to_crs('EPSG:3857')
points_df2 = points_df.shift() #We shift the dataframe by 1 to align pnt1 with pnt2
points_df.distance(points_df2)
Returns: 48662.078723 meters
and
pnt1 = Point(31.9974, 38.7078)
pnt2 = Point(31.6661, 38.6635)
points_df = gpd.GeoDataFrame({'geometry': [pnt1, pnt2]}, crs='EPSG:4326')
points_df = points_df.to_crs('EPSG:3857')
points_df2 = points_df.shift() #We shift the dataframe by 1 to align pnt1 with pnt2
points_df.distance(points_df2)
Returns: 37417.343796 meters
Then I was expecting to have this result :
>>> pd.DataFrame(gpd.sjoin(gdf_locations, gdf_events, how='inner'))
subscriber_id perimeter_id distance custom_lat custom_lon geometry index_right Latitude Longitude
0 19664 1370 40.0 31.6661 38.6635 POLYGON ((2230301.324 3642618.584, 2230089.452... 1 31.9974 38.7078
1 91201 13858 50.0 26.6500 51.5700 POLYGON ((3684499.890 3347425.378, 3684235.050... 0 27.0779 51.5144
I think my buffer is at ~47KM and ~38KM instead of 50KM and 40KM as expected. Am I missing something here which could explain that empty result ?
Certain computations with geodataframe's methods that involves distances, namely, .distance(), .buffer() in this particular case, are based on Euclidean geometry and map projection coordinate systems. Their results are not reliable, to always get the correct results one should avoid using them and use direct computation with geographic coordinates instead. Doing so with proper module/library, you will get great-circle arc distances instead of errorneous euclidean distances. Thus avoid mysterious errors.
Here I present the runnable code that show how to proceed along the line that I proposed:
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon
import cartopy.crs as ccrs
import cartopy
import matplotlib.pyplot as plt
import numpy as np
from pyproj import Geod
# Create my two events to detect
df_strike = pd.DataFrame(
{ 'Latitude': [27.0779, 31.9974],
'Longitude': [51.5144, 38.7078]})
gdf_events = gpd.GeoDataFrame(df_strike, geometry=gpd.points_from_xy(df_strike.Longitude, df_strike.Latitude),crs = {'init':'epsg:4326'})
# Get location to create buffer
SUB_LOCATION = pd.DataFrame(
{ 'perimeter_id': [1370, 13858],
'distance' : [40.0, 50.0],
'custom_lat': [31.6661, 26.6500],
'custom_lon': [38.6635, 51.5700]})
gdf_locations = gpd.GeoDataFrame(SUB_LOCATION, geometry=gpd.points_from_xy(SUB_LOCATION.custom_lon, SUB_LOCATION.custom_lat), crs = {'init':'epsg:4326'})
# Begin: My code----------------
def point_buffer(lon, lat, radius_m):
# Use this instead of `.buffer()` provided by geodataframe
# Adapted from:
# https://stackoverflow.com/questions/31492220/how-to-plot-a-tissot-with-cartopy-and-matplotlib
geod = Geod(ellps='WGS84')
num_vtxs = 64
lons, lats, _ = geod.fwd(np.repeat(lon, num_vtxs),
np.repeat(lat, num_vtxs),
np.linspace(360, 0, num_vtxs),
np.repeat(radius_m, num_vtxs),
radians=False
)
return Polygon(zip(lons, lats))
# Get location to create buffer
# Create buffer geometries from points' coordinates and distances using ...
# special function `point_buffer()` defined above
gdf_locations['geometry'] = gdf_locations.apply(lambda row : point_buffer(row.custom_lon, row.custom_lat, 1000*row.distance), axis=1)
# Convert CRS to Mercator (epsg:3395), it will match `ccrs.Mercator()`
# Do not use Web_Mercator (epsg:3857), it is crude approx of 3395
gdf_locations = gdf_locations.to_crs({'init':'epsg:3395'})
gdf_events = gdf_events.to_crs({'init':'epsg:3395'})
# Matching events within buffer
matching_entln = pd.DataFrame(gpd.sjoin(gdf_locations, gdf_events, how='inner'))
# Visualization
# Use cartopy for best result
fig = plt.figure(figsize=(9,8))
ax = fig.add_subplot(projection=ccrs.Mercator())
gdf_locations.plot(color="green", ax=ax, alpha=0.4)
gdf_events.plot(color="red", ax=ax, alpha=0.9, zorder=23)
ax.coastlines(lw=0.3, color="gray")
ax.add_feature(cartopy.feature.LAND)
ax.add_feature(cartopy.feature.OCEAN)
ax.gridlines(crs=ccrs.PlateCarree(), draw_labels=True)
# Other helpers
# Horiz/vert lines are plotted to mark the circles' centers
ax.hlines([31.6661,26.6500], 30, 60, transform=ccrs.PlateCarree(), lw=0.1)
ax.vlines([38.6635, 51.5700], 20, 35, transform=ccrs.PlateCarree(), lw=0.1)
ax.set_extent([35, 55, 25, 33], crs=ccrs.PlateCarree())
Spatial joining:
# Matching events within buffer
matching_entln = pd.DataFrame(gpd.sjoin(gdf_locations, gdf_events, how='inner'))
matching_entln[["perimeter_id", "distance", "index_right", "Latitude", "Longitude"]] #custom_lat custom_lon
Compute distances between points for checking
This checks the result of the spatial join if computed distances are less than the buffered distances.
# Use greatcircle arc length
geod = Geod(ellps='WGS84')
# centers of buffered-circles
from_lon1, from_lon2 = [38.6635, 51.5700]
from_lat1, from_lat2 = [31.6661, 26.6500]
# event locations
to_lon1, to_lon2= [51.5144, 38.7078]
to_lat1, to_lat2 = [27.0779, 31.9974]
_,_, dist_m = geod.inv(from_lon1, from_lat1, to_lon2, to_lat2, radians=False)
print(dist_m) #smaller than 40 km == inside
# Get: 36974.419811328786 m.
_,_, dist_m = geod.inv(from_lon2, from_lat2, to_lon1, to_lat1, radians=False)
print(dist_m) #smaller than 50 km == inside
# Get: 47732.76744655724 m.
My notes
Serious geographic computation should be done directly with geodetic computation without the use of map projection of any kind.
Map projection is used when you need graphic visualization. But correct geographic values that are computed/transformed to map projection CRS correctly are expected.
Computation with map projection (grid) coordinate beyond its allowable limits (and get bad results) is often happen with inexperienced users.
Computation involving map/grid position/values using euclidean geometry should be performed within small extent of projection areas that all kinds of map distortions is very low.

Plotly scatter large volume geographic data

I tried to write a code that creates a visualization of all forest fires that happened during the year 2021. The CSV file containing the data is around 1.5Gb, the program looks correct for me, but when I try to run it, it gets stuck without displaying any visualization or error message. The last time I tried, it run for almost half a day until python crashed.
I don't know if I am having an infinite loop, if that's because the file is too big or if there is something else I am missing.
Can anyone provide feedback, please?
Here is my code:
import csv
from datetime import datetime
from plotly.graph_objs import Scattergeo , Layout
from plotly import offline
filename='fire_nrt_J1V-C2_252284.csv'
with open(filename) as f:
reader=csv.reader(f)
header_row=next(reader)
lats, lons, brights, dates=[],[],[],[]
for row in reader:
date=datetime.strptime(row[5], '%Y-%m-%d')
lat=row[0]
lon=row[1]
bright=row[2]
lats.append(lat)
lons.append(lon)
brights.append(bright)
dates.append(date)
data=[{
'type':'scattergeo',
'lon':lons,
'lat':lats,
'text':dates,
'marker':{
'size':[5*bright for bright in brights],
'color': brights,
'colorscale':'Reds',
'colorbar': {'title':'Fire brightness'},
}
}]
my_layout=Layout(title="Forestfires during the year 2021")
fig={'data':data,'layout':my_layout}
offline.plot(fig, filename='global_fires_2021.html')
have found data you describe here https://wifire-data.sdsc.edu/dataset/viirs-i-band-375-m-active-fire-data/resource/3ce73b20-f584-44f7-996b-2f319c480294
plotly uses resources for every point plotted on a scatter. So there is a limit before you run out of resources
there are other approaches to plotting larger number of points
https://plotly.com/python/mapbox-density-heatmaps/ fewer limits, but still limited on very large data sets
https://plotly.com/python/datashader/ can work with very large data sets as it generates an image. It is more challenging to work with (install and navigate API)
data sourcing
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
df = pd.read_csv("https://firms.modaps.eosdis.nasa.gov/data/active_fire/noaa-20-viirs-c2/csv/J1_VIIRS_C2_Global_7d.csv")
df
scatter_geo
limited to random sample of 1000 rows
px.scatter_geo(
df.sample(1000),
lat="latitude",
lon="longitude",
color="bright_ti4",
# size="size",
hover_data=["acq_date"],
color_continuous_scale="reds",
)
density mapbox
px.density_mapbox(
df.sample(5000),
lat="latitude",
lon="longitude",
z="bright_ti4",
radius=3,
color_continuous_scale="reds",
zoom=1,
mapbox_style="carto-positron",
)
datashader Mapbox
all data
some libraries are more difficult to install and use
need to deal with this issue https://community.plotly.com/t/datashader-image-distorted-when-passed-to-mapbox/39375/2
import datashader as ds, colorcet
from pyproj import Transformer
t3857_to_4326 = Transformer.from_crs(3857, 4326, always_xy=True)
# project CRS to ensure image overlays appropriately back over mapbox
# https://community.plotly.com/t/datashader-image-distorted-when-passed-to-mapbox/39375/2
df.loc[:, "longitude_3857"], df.loc[:, "latitude_3857"] = ds.utils.lnglat_to_meters(
df.longitude, df.latitude
)
RESOLUTION=1000
cvs = ds.Canvas(plot_width=RESOLUTION, plot_height=RESOLUTION)
agg = cvs.points(df, x="longitude_3857", y="latitude_3857")
img = ds.tf.shade(agg, cmap=colorcet.fire).to_pil()
fig = go.Figure(go.Scattermapbox())
fig.update_layout(
mapbox={
"style": "carto-positron",
"layers": [
{
"sourcetype": "image",
"source": img,
# Sets the coordinates array contains [longitude, latitude] pairs for the image corners listed in
# clockwise order: top left, top right, bottom right, bottom left.
"coordinates": [
t3857_to_4326.transform(
agg.coords["longitude_3857"].values[a],
agg.coords["latitude_3857"].values[b],
)
for a, b in [(0, -1), (-1, -1), (-1, 0), (0, 0)]
],
}
],
},
margin={"l": 0, "r": 0, "t": 0, "r": 0},
)

Calculating spatial averages for each country after spatial join

Hello I am using the following code at the bottom to extract countries from coordinates. Please see the following url which provides a more detailed explanation of the code: Extracting countries from NetCDF data using geopandas.
My main variable/value is the monthly mean pdsi value from: https://psl.noaa.gov/data/gridded/data.pdsi.html. The image below represents a portion of the visualization created by the code below. The shaded squares represent the spatial regions of pdsi values, which is overlapping a shapefile of the world.
From the image of Belgium, you can see that the 4 squares that touch the land area of Belgium are also touching other countries. If I attribute the base values to the Belgium, I believe this overestimates the mean pdsi values. Especially when considering the bottom two squares barely touch Belgium, the weight of these values when calculating the mean should be significantly lower. Thus, is there a way to incorporate some sort of weighted average where the area of each square within a country can be used as the weight to adjust each pdsi value? Additionally, I would like to standardize this process not only for Belgium, but for all countries as well.
Any help would be greatly appreciated!
import geopandas as gpd
import numpy as np
import plotly.express as px
import requests
from pathlib import Path
from zipfile import ZipFile
import urllib
import shapely.geometry
import xarray as xr
# download NetCDF data...
# fmt: off
url = "https://psl.noaa.gov/repository/entry/get/pdsi.mon.mean.selfcalibrated.nc?entryid=synth%3Ae570c8f9-ec09-4e89-93b4-babd5651e7a9%3AL2RhaV9wZHNpL3Bkc2kubW9uLm1lYW4uc2VsZmNhbGlicmF0ZWQubmM%3D"
f = Path.cwd().joinpath(Path(urllib.parse.urlparse(url).path).name)
# fmt: on
if not f.exists():
r = requests.get(url, stream=True, headers={"User-Agent": "XY"})
with open(f, "wb") as fd:
for chunk in r.iter_content(chunk_size=128):
fd.write(chunk)
ds = xr.open_dataset(f)
pdsi = ds.to_dataframe()
pdsi = pdsi.reset_index().dropna() # don't care about places in oceans...
# use subset for testing... last 5 times...
pdsim = pdsi.loc[pdsi["time"].isin(pdsi.groupby("time").size().index[-5:])]
# create geopandas dataframe
gdf = gpd.GeoDataFrame(
pdsim, geometry=pdsim.loc[:, ["lon", "lat"]].apply(shapely.geometry.Point, axis=1)
)
# make sure that data supports using a buffer...
assert (
gdf["lat"].diff().loc[lambda s: s.ne(0)].mode()
== gdf["lon"].diff().loc[lambda s: s.ne(0)].mode()
).all()
# how big should the square buffer be around the point??
buffer = gdf["lat"].diff().loc[lambda s: s.ne(0)].mode().values[0] / 2
gdf["geometry"] = gdf["geometry"].buffer(buffer, cap_style=3)
# Import shapefile from geopandas
path_to_data = gpd.datasets.get_path("naturalearth_lowres")
world_shp = gpd.read_file(path_to_data)
# the solution... spatial join buffered polygons to countries
# comma separate associated countries
gdf = gdf.join(
world_shp.sjoin(gdf.set_crs("EPSG:4326"))
.groupby("index_right")["name"]
.agg(",".join)
)
gdf["time_a"] = gdf["time"].dt.strftime("%Y-%b-%d")
# simplest way to test is visualise...
px.choropleth_mapbox(
gdf,
geojson=gdf.geometry,
locations=gdf.index,
color="pdsi",
hover_data=["name"],
animation_frame="time_a",
opacity=.3
).update_layout(
mapbox={"style": "carto-positron", "zoom": 1},
margin={"l": 0, "r": 0, "t": 0, "b": 0},
)
using https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.intersection.html you can get part of grid that intersects with country polygon
using area, you can calculate proportion of overlap
from this I have generated two visualisations
show countries a grid overlaps and how much it overlaps
aggregate to countries using a weighted average plus calculate other measures that can be used for transparency
I do not know if this is mathematically / scientifically sound to aggregate PDSI in this way (either means or weighted averages). This does demonstrate how to get results your question requests.
# the solution... spatial join buffered polygons to countries
# plus work out overlap between PDSI grid and country. Area of each grid is constant...
gdf_c = (
world_shp.sjoin(gdf.set_crs("EPSG:4326"))
.merge(
gdf.loc[:, "geometry"],
left_on="index_right",
right_index=True,
suffixes=("", "_pdsi"),
)
.assign(
overlap=lambda d: (
d["geometry"]
.intersection(gpd.GeoSeries(d["geometry_pdsi"], crs="EPSG:4326"))
.area
/ (buffer * 2) ** 2
).round(3)
)
)
# comma separate associated countries and a list of overlaps
gdf_pdsi = gdf.loc[:, ["geometry", "time", "pdsi"]].join(
gdf_c.groupby("index_right").agg({"name": ",".join, "overlap": list})
)
gdf_pdsi["time_a"] = gdf_pdsi["time"].dt.strftime("%Y-%b-%d")
# simplest way to test is visualise...
fig_buf = px.choropleth_mapbox(
gdf_pdsi,
geojson=gdf_pdsi.geometry,
locations=gdf_pdsi.index,
color="pdsi",
hover_data=["name", "overlap"],
animation_frame="time_a",
opacity=0.3,
).update_layout(
mapbox={"style": "carto-positron", "zoom": 1},
margin={"l": 0, "r": 0, "t": 0, "b": 0},
)
fig_buf
import pandas as pd
# prepare data to plot by country
df_pdsi = (
gdf_c.groupby(["name", "time"])
.apply(
lambda d: pd.Series(
{
"weighted_pdsi": (d["pdsi"] * d["overlap"]).sum() / d["overlap"].sum(),
"unweighted_pdsi": d["pdsi"].mean(),
"min_pdsi": d["pdsi"].min(),
"max_pdsi": d["pdsi"].max(),
"min_overlap": d["overlap"].min(),
"max_overlap": d["overlap"].max(),
"size_pdsi": len(d["pdsi"]),
# "pdsi_list":[round(v,2) for v in d["pdsi"]]
}
)
)
.reset_index()
)
df_pdsi["time_a"] = df_pdsi["time"].dt.strftime("%Y-%b-%d")
fig = px.choropleth_mapbox(
df_pdsi,
geojson=world_shp.set_index("name").loc[:, "geometry"],
locations="name",
color="weighted_pdsi",
hover_data=df_pdsi.columns,
animation_frame="time_a",
opacity=0.3,
).update_layout(
mapbox={"style": "carto-positron", "zoom": 1},
margin={"l": 0, "r": 0, "t": 0, "b": 0},
)
fig

How to join point with polygon in geopandas

I have the polygon combination of lat-long1,lat2-long2 ..... and point like Lat - Long .
I have used GeoPandas library to get the result if there is any point is exist within polygon.
Sample Data of Polygon saved in csv file:
POLYGON((28.56056 77.36535,28.564635293716776
77.3675137204626,28.56871055311656 77.36967760850214,28.572785778190855 77.3718416641586,28.576860968931193 77.37400588747194,28.580936125329096 77.3761702784821,28.585011247376094 77.37833483722912,28.58908633506372 77.38049956375293,28.593161388383457 77.38266445809356,28.59723640732686 77.38482952029099,28.60131139188541 77.38699475038526,28.605386342050664 77.38916014841635,28.60946125781409 77.39132571442434,28.613536139167238 77.39349144844923,28.61761098610158 77.39565735053108,28.62168579860863 77.39782342070995,28.62576057667991 77.39998965902589,28.62983532030691 77.402156065519,28.633910029481108 77.40432264022931,28.637984704194054 77.40648938319696,28.642059344437207 77.408656294462,28.64068221074683 77.41187044231611,28.63920739580329 77.41502778244606,28.63763670052024 77.41812446187686,28.635972042808007 77.42115670220443,28.634215455216115 77.42412080422613,28.63236908243526 77.42701315247152,28.630435178662026 77.42983021962735,28.628416104829583 77.43256857085188,28.626314325707924 77.43522486797251,28.624132406877322 77.437795873562,28.621873011578572 77.44027845488824,28.619538897444272 77.4426695877325,28.617132913115164 77.44496636007166,28.614657994745563 77.44716597562005,28.612117162402576 77.44926575722634,28.609513516363293 77.45126315012166,28.606850233314923 77.45315572501488,28.604130562462267 77.45494118103147,28.60135782154758 77.45661734849246,28.598535392787774 77.45818219153013,28.595666718733966 77.45963381053753,28.592755298058414 77.46097044444889,28.589804681274302 77.46219047284835,28.586818466393503 77.46329241790465,28.583800294527727 77.46427494612952,28.58075384543836 77.46513686995802,28.57768283304089 77.46587714914885,28.574591000868892 77.4664948920035,28.571482117503592 77.46698935640259,28.568359971974488 77.46735995065883,28.565228369136484 77.46760623418534,28.56209112502966 77.4677279179792,28.558952062226695 77.4677248649196,28.55581500517431 77.46759708988064,28.552683775533943 77.46734475965891,28.552683775533943 77.46734475965891,28.553079397193876 77.4622453846313,28.553474828308865 77.45714597129259,28.55387006887434 77.4520465196603,28.554265118885752 77.44694702975198,28.554659978338513 77.4418475015852,28.555054647228083 77.43674793517746,28.555449125549913 77.43164833054634,28.555843413299442 77.42654868770937,28.55623751047213 77.42144900668411,28.556631417063407 77.41634928748812,28.55702513306874 77.41124953013893,28.55741865848359 77.40614973465412,28.557811993303396 77.40104990105122,28.55820513752363 77.39595002934782,28.558598091139757 77.39085011956145,28.558990854147225 77.38575017170969,28.559383426541523 77.3806501858101,28.559775808318093 77.37555016188024,28.560167999472434 77.37045009993768,28.56056 77.36535))
and second dataset is for LAT and LONG as 28.56282, 77.36824 respectively saved in csv file .
I have used below Python code to join both data set based on condition if point exist within polygon. like below
import pandas as pd
import shapely.geometry
from shapely.geometry import Point
import geopandas as gpd
site_df = pd.read_csv (r'lat_long_file.csv') # load lat and long file
site_df['geometry'] = pd.DataFrame(site_df).apply(lambda x: Point(x.LAT,x.LONG), axis='columns') # convert lat and long to point
gdf = gpd.GeoDataFrame(site_df, geometry = site_df.geometry,crs='EPSG:4326') #creating geo pandas data frame for point
from shapely import wkt
polygon_df = pd.read_csv (r'polygon_csv_file') #reading polygon sample raw string file
polygon_df['geometry'] = pd.DataFrame(polygon_df).apply(lambda row: shapely.wkt.loads(row.polygon), axis='columns') #converting string polygon to geometory
gd_polygon = gpd.GeoDataFrame(polygon_df, geometry = polygon_df.geometry,crs='EPSG:4326') #create geopandas dataframe
import shapely.speedups
shapely.speedups.enable() # this makes some spatial queries run faster
join_data = gpd.sjoin(gdf, gd_polygon, how="inner", op="within") //actual join condition
But that query does not retun anything . But point is exist within polygon. as we can see in below diagram
Green Location marker is point Lat and long which is exist within polygon.
I would check the axis order - WKT usually interpreted as longitude first, latitude second order, while the point you construct uses latitude:longitude order.
You can try removing the CRS identifier to see if it changes the result.
Also see
https://gis.stackexchange.com/questions/376751/shapely-flips-lat-long-coordinate
and
https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
your sample data is unusable as it's an image
have sourced a polygon - a county boundary in UK
constructed a geopandas data frame of a point that is within this county
have used plotly to demonstrate visually the data
have used your code fragment gpd.sjoin(gdf, gd_polygon, how="inner", op="within") to do spatial join and it correctly joins point to polygon
import requests, json
import geopandas as gpd
import plotly.express as px
import shapely.geometry
# fmt: off
# get a polygon and construct a point
res = requests.get("https://opendata.arcgis.com/datasets/69dc11c7386943b4ad8893c45648b1e1_0.geojson")
gd_polygon = gpd.GeoDataFrame.from_features(res.json()).loc[lambda d: d["LAD20NM"].str.contains("Hereford")]
gdf = gpd.GeoDataFrame(geometry=gd_polygon.loc[:,["LONG","LAT"]].apply(shapely.geometry.Point, axis=1)).reset_index(drop=True)
# fmt: on
# plot to show point is within polygon
px.scatter_mapbox(gd_polygon, lon="LONG", lat="LAT").update_traces(
name="gd_polygon"
).add_traces(
px.scatter_mapbox(gdf, lat=gdf2.geometry.y, lon=gdf2.geometry.x)
.update_traces(name="gdf", marker_color="red")
.data
).update_traces(
showlegend=True
).update_layout(
mapbox={
"style": "carto-positron",
"layers": [
{"source": json.loads(gd_polygon.geometry.to_json()), "type": "line"}
],
}
).show()
# spatial join, all good :-)
gpd.sjoin(gdf, gd_polygon, how="inner", op="within")
output
spatial join has worked, point is within polygon
geometry
index_right
OBJECTID
LAD20CD
LAD20NM
LAD20NMW
BNG_E
BNG_N
LONG
LAT
Shape__Area
Shape__Length
0
POINT (-2.73931 52.081539)
18
19
E06000019
Herefordshire, County of
349434
242834
-2.73931
52.0815
2.18054e+09
285427

How do I get the area of a GeoJSON polygon with Python

I have the GeoJSON
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {},
"geometry": {
"type": "Polygon",
"coordinates": [
[[13.65374516425911, 52.38533382814119], [13.65239769133293, 52.38675829106993], [13.64970274383571, 52.38675829106993], [13.64835527090953, 52.38533382814119], [13.64970274383571, 52.38390931824483], [13.65239769133293, 52.38390931824483], [13.65374516425911, 52.38533382814119]]
]
}
}
]
}
which http://geojson.io displays as
I would like to calculate its area (87106.33m^2) with Python. How do I do that?
What I tried
# core modules
from functools import partial
# 3rd pary modules
from shapely.geometry import Polygon
from shapely.ops import transform
import pyproj
l = [[13.65374516425911, 52.38533382814119, 0.0], [13.65239769133293, 52.38675829106993, 0.0], [13.64970274383571, 52.38675829106993, 0.0], [13.64835527090953, 52.38533382814119, 0.0], [13.64970274383571, 52.38390931824483, 0.0], [13.65239769133293, 52.38390931824483, 0.0], [13.65374516425911, 52.38533382814119, 0.0]]
polygon = Polygon(l)
print(polygon.area)
proj = partial(pyproj.transform, pyproj.Proj(init='epsg:4326'),
pyproj.Proj(init='epsg:3857'))
print(transform(proj, polygon).area)
It gives 1.1516745933889345e-05 and 233827.03300877335 - that the first one doesn't make any sense was expected, but how do I fix the second one? (I have no idea how to set the pyproj.Proj init parameter)
I guess epsg:4326 makes sense at it is WGS84 (source), but for epsg:3857 I'm uncertain.
Better results
The following is a lot closer:
# core modules
from functools import partial
# 3rd pary modules
import pyproj
from shapely.geometry import Polygon
import shapely.ops as ops
l = [[13.65374516425911, 52.38533382814119, 0],
[13.65239769133293, 52.38675829106993, 0],
[13.64970274383571, 52.38675829106993, 0],
[13.64835527090953, 52.38533382814119, 0],
[13.64970274383571, 52.38390931824483, 0],
[13.65239769133293, 52.38390931824483, 0],
[13.65374516425911, 52.38533382814119, 0]]
polygon = Polygon(l)
print(polygon.area)
geom_area = ops.transform(
partial(
pyproj.transform,
pyproj.Proj(init='EPSG:4326'),
pyproj.Proj(
proj='aea',
lat1=polygon.bounds[1],
lat2=polygon.bounds[3])),
polygon)
print(geom_area.area)
it gives 87254.7m^2 - that is still 148m^2 different from what geojson.io says. Why is that the case?
It looks like geojson.io is not calculating the area after projecting the spherical coordinates onto a plane like you are, but rather using a specific algorithm for calculating the area of a polygon on the surface of a sphere, directly from the WGS84 coordinates. If you want to recreate it you can find the source code here.
If you are happy to carry on projecting the coordinates to a flat system to calculate the area, since it's good enough accuracy for your use case, then you might trying using this projection for Germany instead. E.g:
from osgeo import ogr
from osgeo import osr
source = osr.SpatialReference()
source.ImportFromEPSG(4326)
target = osr.SpatialReference()
target.ImportFromEPSG(5243)
transform = osr.CoordinateTransformation(source, target)
poly = ogr.CreateGeometryFromJson(str(geoJSON['features'][0]['geometry']))
poly.Transform(transform)
poly.GetArea()
which returns 87127.2534625642

Categories

Resources