I want to count how many points there are per Polygon
# Credits of this code go to: https://stackoverflow.com/questions/69642668/the-indices-of-the-two-geoseries-are-different-understanding-indices/69644010#69644010
import pandas as pd
import numpy as np
import geopandas as gpd
import shapely.geometry
import requests
# source some points and polygons
# fmt: off
dfp = pd.read_html("https://www.latlong.net/category/cities-235-15.html")[0]
dfp = gpd.GeoDataFrame(dfp, geometry=dfp.loc[:,["Longitude", "Latitude",]].apply(shapely.geometry.Point, axis=1))
res = requests.get("https://opendata.arcgis.com/datasets/69dc11c7386943b4ad8893c45648b1e1_0.geojson")
df_poly = gpd.GeoDataFrame.from_features(res.json())
# fmt: on
Now I sjoin the two. I use df_poly first, in order to add the points from dfp to the GeoDataFrame df_poly.
df_poly.sjoin(dfp)
Now I want to count how many points there are per polygon.
I thought
df_poly.sjoin(dfp).groupby('OBJECTID').count()
But that does not add a column to the GeoDataFrame df_poly with the count of each group.
You need to add one of the columns from the output of count() back into the original DataFrame using merge. I have used the geometry column and renamed it to n_points:
df_poly.merge(
    df_poly.sjoin(dfp)
    .groupby('OBJECTID')
    .count()
    .geometry.rename('n_points')
    .reset_index()
)
This is a follow-on to this question: The indices of the two GeoSeries are different - Understanding Indices
The index_right column of the spatial join gives the index of the polygon, as the polygon was on the right side of the spatial join.
Hence the series gpd.sjoin(dfp, df_poly).groupby("index_right").size().rename("points") can then simply be joined to the polygon GeoDataFrame to give how many points were found in each polygon.
Note how="left" to ensure it's a left join, not an inner join. Any polygons with no points will have NaN; you may want to fillna(0) in this case (a short example follows the code below).
import pandas as pd
import numpy as np
import geopandas as gpd
import shapely.geometry
import requests
# source some points and polygons
# fmt: off
dfp = pd.read_html("https://www.latlong.net/category/cities-235-15.html")[0]
dfp = pd.concat([dfp,dfp]).reset_index(drop=True)
dfp = gpd.GeoDataFrame(dfp, geometry=dfp.loc[:,["Longitude", "Latitude",]].apply(shapely.geometry.Point, axis=1))
res = requests.get("https://opendata.arcgis.com/datasets/69dc11c7386943b4ad8893c45648b1e1_0.geojson")
df_poly = gpd.GeoDataFrame.from_features(res.json())
# fmt: on
df_poly.join(
gpd.sjoin(dfp, df_poly).groupby("index_right").size().rename("points"),
how="left",
)
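If the NaN counts for polygons with no points should read as zero instead, the result can be filled afterwards. A minimal sketch building on the join above (the counts variable name is just for illustration):
counts = df_poly.join(
    gpd.sjoin(dfp, df_poly).groupby("index_right").size().rename("points"),
    how="left",
)
# polygons that matched no points get NaN from the left join; replace with 0
counts["points"] = counts["points"].fillna(0).astype(int)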
Building on the answer Fergus McClean provided, this can be done in even less code:
df_poly.merge(df_poly.sjoin(dfp).groupby('OBJECTID').size().rename('n_points').reset_index())
However, the method (.join()) proposed by Rob Raymond to combine the two dataframes keeps the entries that have no count.
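If those zero-count polygons should be kept with the merge approach as well, the merge can be made a left join and the gaps filled. A minimal sketch, assuming the same df_poly and dfp as above:
df_poly.merge(
    df_poly.sjoin(dfp).groupby('OBJECTID').size().rename('n_points').reset_index(),
    how='left',
).fillna({'n_points': 0})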
Related
I have a Pandas DataFrame containing Lat, Long coordinates. How do I draw non-overlapping polygons around clusters of points and aggregate the geometries in a geopandas GeoDataFrame? Below is sample code to work with:
import pandas as pd
import numpy as np
import geopandas as gpd
df = pd.DataFrame({
'yr': [2018, 2017, 2018, 2016],
'id': [0, 1, 2, 3],
'v': [10, 12, 8, 10],
'lat': [32.7418248, 32.8340583, 32.8340583, 32.7471895],
'lon':[-97.524066, -97.0805484, -97.0805484, -96.9400779]
})
df = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['lon'], df['lat']))
# set crs for buffer calculations
df.set_crs("ESRI:102003", inplace=True)
The Polygons can be of any shape, however, must include a minimum of 5 points. I tried creating a buffer around the points but circle is not the ideal solution. I am looking for a way to draw a more flexible polygon.
This polygon representation will be added as a new column to the pandas dataframe containing the points.
https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.buffer.html
Your question and sample data don't match: you say you want clusters of 5 points or more but only provide 4 points, leaving whoever answers this question having to find some data. Better practice is to provide a MWE of what you've tried, which can then be turned into the solution you want. I have used UK hospitals to get some data with lat/lon.
From your other scatter-gun questions, it's clear you have tried using geohash as a solution. Let's explore this:
get the geohash for each point with geolib.geohash.encode()
aggregate points in the same geohash using dissolve(). This gives a MULTIPOINT geometry; convert it to a POLYGON using convex_hull
you now have polygons that do not overlap and contain clusters of points. It doesn't ensure that a cluster has a minimum of 5 points
import requests, io
import pandas as pd
import numpy as np
import geopandas as gpd
import geolib.geohash
import folium
# get some data that meets sample with enough data
df = (
pd.read_csv(
io.StringIO(requests.get("https://assets.nhs.uk/data/foi/Hospital.csv").text),
sep="¬",
engine="python",
)
.rename(columns={"Latitude": "lat", "Longitude": "lon"})
.loc[:, ["lat", "lon"]]
).dropna()
df["id"] = df.index
df["yr"] = np.random.choice(range(2016, 2019), len(df))
df["v"] = np.random.randint(0, 11, len(df))
# get geohash so points in same area can be clustered
df["geohash"] = df.apply(lambda r: geolib.geohash.encode(r["lon"], r["lat"], 3), axis=1)
# construct geodataframe
gdf = gpd.GeoDataFrame(
df, geometry=gpd.points_from_xy(df["lon"], df["lat"]), crs="epsg:4326"
)
# cluster points to polygons
gdf2 = gdf.dissolve(by="geohash", aggfunc={"v": "sum", "id":"count", "yr":"mean"})
gdf2["geometry"] = gdf2["geometry"].convex_hull
# let's visualise everything
m = gdf2.explore(color="green", name="cluster", height=300, width=600)
m = gdf.explore(column="geohash", m=m, name="points")
folium.LayerControl().add_to(m)
m
Use Geopandas convex hull.
The convex hull of a geometry is the smallest convex Polygon containing all the points in each geometry.
https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.convex_hull.html
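For the four sample points in the question this boils down to a couple of lines. A minimal sketch, reusing the lat/lon columns of the sample df from the question:
import geopandas as gpd
# build the points in lon/lat order
pts = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['lon'], df['lat']), crs='EPSG:4326')
# one polygon wrapping all the points (unary_union merges them into a MULTIPOINT first)
hull = pts.unary_union.convex_hull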
I have a polygon as a combination of coordinates (lat1-long1, lat2-long2, ...) and a point given as Lat-Long.
I have used the GeoPandas library to check whether the point exists within the polygon.
Sample Data of Polygon saved in csv file:
POLYGON((28.56056 77.36535,28.564635293716776
77.3675137204626,28.56871055311656 77.36967760850214,28.572785778190855 77.3718416641586,28.576860968931193 77.37400588747194,28.580936125329096 77.3761702784821,28.585011247376094 77.37833483722912,28.58908633506372 77.38049956375293,28.593161388383457 77.38266445809356,28.59723640732686 77.38482952029099,28.60131139188541 77.38699475038526,28.605386342050664 77.38916014841635,28.60946125781409 77.39132571442434,28.613536139167238 77.39349144844923,28.61761098610158 77.39565735053108,28.62168579860863 77.39782342070995,28.62576057667991 77.39998965902589,28.62983532030691 77.402156065519,28.633910029481108 77.40432264022931,28.637984704194054 77.40648938319696,28.642059344437207 77.408656294462,28.64068221074683 77.41187044231611,28.63920739580329 77.41502778244606,28.63763670052024 77.41812446187686,28.635972042808007 77.42115670220443,28.634215455216115 77.42412080422613,28.63236908243526 77.42701315247152,28.630435178662026 77.42983021962735,28.628416104829583 77.43256857085188,28.626314325707924 77.43522486797251,28.624132406877322 77.437795873562,28.621873011578572 77.44027845488824,28.619538897444272 77.4426695877325,28.617132913115164 77.44496636007166,28.614657994745563 77.44716597562005,28.612117162402576 77.44926575722634,28.609513516363293 77.45126315012166,28.606850233314923 77.45315572501488,28.604130562462267 77.45494118103147,28.60135782154758 77.45661734849246,28.598535392787774 77.45818219153013,28.595666718733966 77.45963381053753,28.592755298058414 77.46097044444889,28.589804681274302 77.46219047284835,28.586818466393503 77.46329241790465,28.583800294527727 77.46427494612952,28.58075384543836 77.46513686995802,28.57768283304089 77.46587714914885,28.574591000868892 77.4664948920035,28.571482117503592 77.46698935640259,28.568359971974488 77.46735995065883,28.565228369136484 77.46760623418534,28.56209112502966 77.4677279179792,28.558952062226695 77.4677248649196,28.55581500517431 77.46759708988064,28.552683775533943 77.46734475965891,28.552683775533943 77.46734475965891,28.553079397193876 77.4622453846313,28.553474828308865 77.45714597129259,28.55387006887434 77.4520465196603,28.554265118885752 77.44694702975198,28.554659978338513 77.4418475015852,28.555054647228083 77.43674793517746,28.555449125549913 77.43164833054634,28.555843413299442 77.42654868770937,28.55623751047213 77.42144900668411,28.556631417063407 77.41634928748812,28.55702513306874 77.41124953013893,28.55741865848359 77.40614973465412,28.557811993303396 77.40104990105122,28.55820513752363 77.39595002934782,28.558598091139757 77.39085011956145,28.558990854147225 77.38575017170969,28.559383426541523 77.3806501858101,28.559775808318093 77.37555016188024,28.560167999472434 77.37045009993768,28.56056 77.36535))
The second dataset is the point's LAT and LONG, 28.56282 and 77.36824 respectively, also saved in a csv file.
I have used the Python code below to join both datasets on the condition that the point exists within the polygon:
import pandas as pd
import shapely.geometry
from shapely.geometry import Point
import geopandas as gpd
site_df = pd.read_csv (r'lat_long_file.csv') # load lat and long file
site_df['geometry'] = pd.DataFrame(site_df).apply(lambda x: Point(x.LAT,x.LONG), axis='columns') # convert lat and long to point
gdf = gpd.GeoDataFrame(site_df, geometry = site_df.geometry,crs='EPSG:4326') #creating geo pandas data frame for point
from shapely import wkt
polygon_df = pd.read_csv (r'polygon_csv_file') #reading polygon sample raw string file
polygon_df['geometry'] = pd.DataFrame(polygon_df).apply(lambda row: shapely.wkt.loads(row.polygon), axis='columns') #converting string polygon to geometory
gd_polygon = gpd.GeoDataFrame(polygon_df, geometry = polygon_df.geometry,crs='EPSG:4326') #create geopandas dataframe
import shapely.speedups
shapely.speedups.enable() # this makes some spatial queries run faster
join_data = gpd.sjoin(gdf, gd_polygon, how="inner", op="within")  # actual join condition
But that query does not return anything, even though the point does exist within the polygon, as we can see in the diagram below.
The green location marker is the point (lat/long), which lies within the polygon.
I would check the axis order - WKT is usually interpreted as longitude first, latitude second, while the point you construct uses latitude, longitude order.
You can try removing the CRS identifier to see if it changes the result; a small sketch for testing the swapped order follows the links below.
Also see
https://gis.stackexchange.com/questions/376751/shapely-flips-lat-long-coordinate
and
https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
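A minimal sketch for testing that hypothesis with the question's code (column names LAT/LONG as above): build the point with longitude as x and latitude as y, then re-run the join:
from shapely.geometry import Point
# x = LONG, y = LAT, the conventional lon/lat order for EPSG:4326
site_df['geometry'] = site_df.apply(lambda r: Point(r.LONG, r.LAT), axis='columns')
gdf = gpd.GeoDataFrame(site_df, geometry=site_df['geometry'], crs='EPSG:4326')
gpd.sjoin(gdf, gd_polygon, how='inner', op='within')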
your sample data is unusable as it's an image
I have sourced a polygon - a county boundary in the UK
constructed a geopandas data frame of a point that is within this county
used plotly to demonstrate the data visually
used your code fragment gpd.sjoin(gdf, gd_polygon, how="inner", op="within") to do the spatial join, and it correctly joins the point to the polygon
import requests, json
import geopandas as gpd
import plotly.express as px
import shapely.geometry
# fmt: off
# get a polygon and construct a point
res = requests.get("https://opendata.arcgis.com/datasets/69dc11c7386943b4ad8893c45648b1e1_0.geojson")
gd_polygon = gpd.GeoDataFrame.from_features(res.json()).loc[lambda d: d["LAD20NM"].str.contains("Hereford")]
gdf = gpd.GeoDataFrame(geometry=gd_polygon.loc[:,["LONG","LAT"]].apply(shapely.geometry.Point, axis=1)).reset_index(drop=True)
# fmt: on
# plot to show point is within polygon
px.scatter_mapbox(gd_polygon, lon="LONG", lat="LAT").update_traces(
name="gd_polygon"
).add_traces(
px.scatter_mapbox(gdf, lat=gdf.geometry.y, lon=gdf.geometry.x)
.update_traces(name="gdf", marker_color="red")
.data
).update_traces(
showlegend=True
).update_layout(
mapbox={
"style": "carto-positron",
"layers": [
{"source": json.loads(gd_polygon.geometry.to_json()), "type": "line"}
],
}
).show()
# spatial join, all good :-)
gpd.sjoin(gdf, gd_polygon, how="inner", op="within")
output
spatial join has worked, point is within polygon
geometry         POINT (-2.73931 52.081539)
index_right      18
OBJECTID         19
LAD20CD          E06000019
LAD20NM          Herefordshire, County of
LAD20NMW         NaN
BNG_E            349434
BNG_N            242834
LONG             -2.73931
LAT              52.0815
Shape__Area      2.18054e+09
Shape__Length    285427
Name: 0
I have one pandas dataframe and one geopandas dataframe. In the Pandas dataframe, I have a column Points that contains shapely.geometry Point objects. The geometry column in the geopandas frame has Polygon objects. What I would like to do is take a Point in the Pandas frame and test to see if it is within any of the Polygon objects in the geopandas frame.
In a new column in the pandas frame, I would like the following. If the Point is within a given Polygon (i.e. within call returns True), I would like the new column's value at the Point's row to be the value of a different column in the Polygon's row in the geopandas frame.
I have a working solution to this problem, but it is not vectorized. Is it possible to vectorize it?
Example:
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point, Polygon
# Create random frame, geometries are supposed to be mutually exclusive
# (each toy polygon needs at least three vertices to be a valid shapely Polygon)
gdf = gpd.GeoDataFrame({'A': [1, 2], 'geometry': [Polygon([(10, 5), (5, 6), (6, 10)]), Polygon([(1, 2), (2, 5), (0, 4)])]})
# Create random pandas
df = pd.DataFrame({'Foo': ['bar', 'Bar'], 'Points': [Point(4, 5), Point(1, 2)]})
# My non-vectorized solution
df['new'] = ''
for i in df.index:
for j in gdf.index:
if df.at[i, 'Points'].within(gdf.at[j, 'geometry']):
df.at[i, 'new'] = gdf.at[j, 'A']
This works fine, so that df['new'] will contain whatever is in column gdf['A'] when the point is within the polygon. I am hoping that there may be a way for me to vectorize this operation.
You can calculate the Euclidean distance between all the points of the Points and the Polygon, and wherever the distance is equal to 0 you have an intersection point. My approach is below. Note that I leave the part of getting all the point and polygon coordinates from your data frames to you; a function like pandas.Series.tolist() should provide that.
import numpy as np
from scipy.spatial.distance import cdist
polygon = [[10,5],[5,6],[1,2],[2,5]]
points = [[4,5],[1,2]]
# return distances between all the items of the two arrays
distances = cdist(polygon,points)
print(distances)
[[6.         9.48683298]
 [1.41421356 5.65685425]
 [4.24264069 0.        ]
 [2.         3.16227766]]
All we have to do now is get the index of the 0s in the array. As you can see, our intersection point is at the 3rd row and the 2nd column, which is the 3rd item of the polygon or the 2nd item of the points.
for i,dist in enumerate(distances.flatten()):
if dist==0:
intersect_index = np.unravel_index(i,shape=distances.shape)
intersect_point = polygon[intersect_index[0]]
print(intersect_point)
[1,2]
This should give you the vectorized form you are looking for.
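If preferred, the zero entries can also be located without the explicit loop, for example with numpy's argwhere. A small sketch using the distances array from above:
import numpy as np
# one (row, column) pair for every exact match between a polygon vertex and a point
matches = np.argwhere(distances == 0)
for poly_idx, point_idx in matches:
    print(polygon[poly_idx], points[point_idx])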
I found a solution that works for my purposes. Not the most elegant, but still much faster than looping.
def within_vectorized(array, point):
# Create array of False and True values
_array = np.array([point.within(p) for p in array])
# When the first element of np.where tuple is not empty
if np.where(_array)[0].size != 0:
return np.where(_array)[0][0]
else:
return -1
# Create dummy value row geopandas frame
# This will have an empty Polygon object in the geometry column and NaN's everywhere else
dummy_values = np.empty((1, gdf.shape[1]))
dummy_values[:] = np.nan
dummy_values = dummy_values.tolist()[0]
dummy_values[-1] = Polygon()
gdf.loc[-1] = dummy_values
# Use loc where index is retrieved by calling vectorized function
df['A'] = gdf.loc[df['Points'].apply(lambda x: within_vectorized(gdf['geometry'], x)), 'A'].to_list()
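For reference, the same point-in-polygon lookup can also be handed to geopandas' spatial join, which is vectorized internally. A minimal sketch, assuming the polygons in gdf do not overlap (predicate= is the keyword in recent geopandas; older versions use op=):
points_gdf = gpd.GeoDataFrame(df, geometry=df['Points'])
joined = gpd.sjoin(points_gdf, gdf[['A', 'geometry']], how='left', predicate='within')
# carry the matched polygon attribute back onto the original frame
df['new'] = joined['A'].values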
I am trying to replicate the output from ArcGIS Dissolve on a set of stream flow lines using geopandas. Essentially the df/stream_0 layer is a stream network extracted from a DEM using pysheds. That output has some randomly overlapping reaches which I am trying to remove. Running Dissolve through ArcGIS Pro does this well, but I would prefer not to have to deal with ArcGIS/ArcPy to resolve this.
Stream Network
ArcGIS Dissolve Setting
#streams_0.geojson = df.shp = streams_0.shp from Dissolve Setting image
#~~~~~~~~~~~~~~~~~~~~
import geopandas as gpd
df = gpd.read_file('streams_0.geojson')
df.head()
Out[3]:
geometry
0 LINESTRING (400017.781 3000019.250, 400017.781...
1 LINESTRING (400027.781 3000039.250, 400027.781...
2 LINESTRING (400027.781 3000039.250, 400037.781...
3 LINESTRING (400027.781 3000029.250, 400037.781...
4 LINESTRING (400047.781 3000079.250, 400047.781...
I have tried using gpd.dissolve() using a filler column with no luck.
df['dissolvefield'] = 1;
df2 = df.dissolve(by='dissolvefield')
df3 = gpd.geoseries.GeoSeries([geom for geom in df2.geometry.iloc[0].geoms])
Similarly tried to use unary_union in shapely with no luck.
import fiona
shape1 = fiona.open("df.shp")
first = next(iter(shape1))
from shapely.geometry import shape
shp_geom = shape(first['geometry'])
from shapely.ops import unary_union
shape2 = unary_union(shp_geom)
Seems like an easy solution, wondering why I am running into so many issues. My GeoDataFrame only consists of the line geometry, so there is not necessarily another attribute I can aggregate based on. I am essentially just trying keep the geometry of the lines unchanged, but remove any overlapping features that may be there. I don't want to split the lines, and I don't want to aggregate them into multipart features.
I use unary_union as well, but there is no need to read it as a shapely feature.
After reading the file and putting it in a GeoDataFrame (you can do it straight from the *.shp file):
df = gpd.read_file('streams_0.geojson')
try plotting it to see if the output is correct
df.plot()
then use unary_union like this, and plot again:
shape2 = df.unary_union
shape2
and the last step (if necessary) is to convert back to a GeoDataFrame:
# transform the resulting MultiLineString back into individual shapely linestrings
segments = [feature for feature in shape2.geoms]
# set back as geopandas
gdf = gpd.GeoDataFrame(list(range(len(segments))), geometry=segments,
                       crs=df.crs)
gdf.columns = ['index', 'geometry']
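With a reasonably recent geopandas the same round trip can be written more compactly by exploding the merged geometry back into single parts. A small sketch, assuming df still holds the original lines and their CRS:
merged = gpd.GeoDataFrame(geometry=[df.unary_union], crs=df.crs)
gdf = merged.explode(index_parts=False).reset_index(drop=True)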
I have three polygon shapefiles which overlap each other. Let's call them:
file_one.shp (polygon Name is 1)
file_two.shp (polygon Name is 2)
file_three.shp (polygon Name is 3)
I want to combine them and keep the values like this.
How can I achieve the result (As shown in the figure) in Python, please?
Thanks!
If you want to simply create one shapefile from the files you've mentioned, you can try the following code (I assume that the shapefiles have the same columns).
import pandas as pd
import geopandas as gpd
gdf1 = gpd.read_file('file_one.shp')
gdf2 = gpd.read_file('file_two.shp')
gdf3 = gpd.read_file('file_three.shp')
gdf = gpd.GeoDataFrame(pd.concat([gdf1, gdf2, gdf3]))
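If the goal is an actual shapefile on disk, the combined frame can then be written back out. A small sketch ('combined.shp' is just an example name, assuming all three inputs share the same CRS):
# write the combined layer; the CRS is carried over from the input frames
gdf.to_file('combined.shp')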
First, let's generate some data for demonstration:
import geopandas as gpd
from shapely.geometry import Point
shp1 = gpd.GeoDataFrame({'geometry': [Point(1, 1).buffer(3)], 'name': ['Shape 1']})
shp2 = gpd.GeoDataFrame({'geometry': [Point(1, 1).buffer(2)], 'name': ['Shape 2']})
shp3 = gpd.GeoDataFrame({'geometry': [Point(1, 1).buffer(1)], 'name': ['Shape 3']})
Now take the symmetric difference for all but the smallest shape, which can be left as is:
diffs = []
gdfs = [shp1, shp2, shp3]
for idx, gdf in enumerate(gdfs):
if idx < 2:
diffs.append(gdf.symmetric_difference(gdfs[idx+1]).iloc[0])
diffs.append(shp3.iloc[0].geometry)
There you go, now you have the desired shapes as a list in diffs. If you would like to combine them into one GeoDataFrame, just do as follows:
all_shapes = gpd.GeoDataFrame(geometry=diffs)
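Since the question also asks to keep the values, the name column can be carried along as well. A small sketch under the same setup:
all_shapes = gpd.GeoDataFrame(
    {'name': [shp1['name'].iloc[0], shp2['name'].iloc[0], shp3['name'].iloc[0]]},
    geometry=diffs,
)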