I have a dataframe with earthquake data called eq that has columns listing latitude and longitude. using geopandas I created a point column with the following:
from geopandas import GeoSeries, GeoDataFrame
from shapely.geometry import Point
s = GeoSeries([Point(x,y) for x, y in zip(df['longitude'], df['latitude'])])
eq['geometry'] = s
eq.crs = {'init': 'epsg:4326', 'no_defs': True}
eq
Now I have a geometry column with lat lon coordinates but I want to change the projection to UTM. Can anyone help with the transformation?
Latitude/longitude aren't really a projection, but sort of a default "unprojection". See this page for more details, but it probably means your data uses WGS84 or epsg:4326.
Let's build a dataset and, before we do any reprojection, we'll define the crs as epsg:4326
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point
df = pd.DataFrame({'id': [1, 2, 3], 'population' : [2, 3, 10], 'longitude': [-80.2, -80.11, -81.0], 'latitude': [11.1, 11.1345, 11.2]})
s = gpd.GeoSeries([Point(x,y) for x, y in zip(df['longitude'], df['latitude'])])
geo_df = gpd.GeoDataFrame(df[['id', 'population']], geometry=s)
# Define crs for our geodataframe:
geo_df.crs = {'init': 'epsg:4326'}
I'm not sure what you mean by "UTM projection". From the wikipedia page I see there are 60 different UTM projections depending on the area of the world. You can find the appropriate epsg code online, but I'll just give you an example with a random epsgcode. This is the one for zone 33N for example
How do you do the reprojection? You can easily get this info from the geopandas docs on projection. It's just one line:
geo_df = geo_df.to_crs({'init': 'epsg:3395'})
and the geometry isn't coded as latitude/longitude anymore:
id population geometry
0 1 2 POINT (-8927823.161620541 1235228.11420853)
1 2 3 POINT (-8917804.407449147 1239116.84994171)
2 3 10 POINT (-9016878.754255159 1246501.097746004)
Related
I have a Pandas DataFrame containing Lat, Long coordinates. How do I draw non-overlapping polygons around a cluster of points and aggregate the geometries in a geopandas DataFrame. Below is sample code to work with:
import pandas as pd
import numpy as np
import geopandas as gpd
df = pd.DataFrame({
'yr': [2018, 2017, 2018, 2016],
'id': [0, 1, 2, 3],
'v': [10, 12, 8, 10],
'lat': [32.7418248, 32.8340583, 32.8340583, 32.7471895],
'lon':[-97.524066, -97.0805484, -97.0805484, -96.9400779]
})
df = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['Long'], df['Lat']))
# set crs for buffer calculations
df.set_crs("ESRI:102003", inplace=True)
The Polygons can be of any shape, however, must include a minimum of 5 points. I tried creating a buffer around the points but circle is not the ideal solution. I am looking for a way to draw a more flexible polygon.
This polygon representation will be added as a new column to the pandas dataframe containing the points.
https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.buffer.html
your question and sample data make no sense! You say you want clusters of 5 points or more and only provide 4 points. Leaving person who answers this question mandated to find some data. Better practice is to generate a MWE of what you've tried which can possibly become solution you want. Have used UK hospitals to get some data with lat / lon
from your other scatter gun questions, it's clear you have tried using geohash as a solution. Let's explore this
get geohash for each point geolib.geohash.encode()
aggregate points in same geohash by using dissolve() This will give a MULTIPOINT geometry. Convert this to POLYGON using convex_hull
now have polygons that do not overlap and have clusters of points. It doesn't ensure that a cluster has a minimum of 5 points
import requests, io
import pandas as pd
import numpy as np
import geopandas as gpd
import geolib.geohash
import folium
# get some data that meets sample with enough data
df = (
pd.read_csv(
io.StringIO(requests.get("https://assets.nhs.uk/data/foi/Hospital.csv").text),
sep="Č",
engine="python",
)
.rename(columns={"Latitude": "lat", "Longitude": "lon"})
.loc[:, ["lat", "lon"]]
).dropna()
df["id"] = df.index
df["yr"] = np.random.choice(range(2016, 2019), len(df))
df["v"] = np.random.randint(0, 11, len(df))
# get geohash so points in same area can be clustered
df["geohash"] = df.apply(lambda r: geolib.geohash.encode(r["lon"], r["lat"], 3), axis=1)
# construct geodataframe
gdf = gpd.GeoDataFrame(
df, geometry=gpd.points_from_xy(df["lon"], df["lat"]), crs="epsg:4386"
)
# cluster points to polygons
gdf2 = gdf.dissolve(by="geohash", aggfunc={"v": "sum", "id":"count", "yr":"mean"})
gdf2["geometry"] = gdf2["geometry"].convex_hull
# let's visualise everything
m = gdf2.explore(color="green", name="cluster", height=300, width=600)
m = gdf.explore(column="geohash", m=m, name="popints")
folium.LayerControl().add_to(m)
m
Use Geopandas convex hull.
The convex hull of a geometry is the smallest convex Polygon containing all the points in each geometry.
https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.convex_hull.html
I want to calculate the distance from each point of dataframe geosearch_crs to the polygons in the gelb_crs dataframe, returning only the minimum distance.
I have tried this code:
for i in range(len(geosearch_crs)):
point = geosearch_crs['geometry'].iloc[i]
for j in range(len(gelb_crs)):
poly = gelb_crs['geometry'].iloc[j]
print(point.distance(poly).min())
it returns this error:
AttributeError: 'float' object has no attribute 'min'
I somehow don't get how to return what i want, the points.distance(poly).min() function should work though.
This is part of the data frames (around 180000 entries):
geosearch_crs:
count
geometry
12
POINT (6.92334 50.91695)
524
POINT (6.91970 50.93167)
5
POINT (6.96946 50.91469)
gelb_crs (35 entries):
name
geometry
Polygon 1
POLYGON Z ((6.95712 50.92851 0.00000, 6.95772 ...
Polygon 2
POLYGON Z ((6.91896 50.92094 0.00000, 6.92211 ...
I'm not sure about the 'distance' method, but maybe you could try adding the distances to a list:
distances = list()
for i in geosearch_crs:
for j in gelb_crs:
distances.append(i.distance(j))
print(min(distances))
your sample polygon data is unusable as it's truncated with ellipses. Have used two other polygons to deomonstrate as a MWE
need to ensure that CRS in both data frames is compatible. Your sample data is clearly in two different CRS, points look like epsg:4326, polygons are either a UTM CRS or EPSG:3857 from the range of values
geopandas sjoin_nearest() is simple way to find nearest polygon and get distance. Have use UTM CRS so that distance is in meters rather than degrees
import geopandas as gpd
import pandas as pd
import shapely
import io
df = pd.read_csv(
io.StringIO(
"""count,geometry
12,POINT (6.92334 50.91695)
524,POINT (6.91970 50.93167)
5,POINT (6.96946 50.91469)"""
)
)
geosearch_crs = gpd.GeoDataFrame(
df, geometry=df["geometry"].apply(shapely.wkt.loads), crs="epsg:4326"
)
# generated as sample in question unusable
df = pd.read_csv(
io.StringIO(
'''name,geometry
Polygon 1,"POLYGON ((6.9176561 50.8949742, 6.9171649 50.8951417, 6.9156967 50.8957149, 6.9111788 50.897751, 6.9100077 50.8989409, 6.9101989 50.8991319, 6.9120049 50.9009167, 6.9190374 50.9078591, 6.9258157 50.9143227, 6.9258714 50.9143691, 6.9259546 50.9144355, 6.9273598 50.915413, 6.9325715 50.9136438, 6.9331018 50.9134553, 6.9331452 50.9134397, 6.9255391 50.9018725, 6.922309 50.8988869, 6.9176561 50.8949742))"
Polygon 2,"POLYGON ((6.9044955 50.9340428, 6.8894236 50.9344297, 6.8829359 50.9375553, 6.8862995 50.9409307, 6.889446 50.9423764, 6.9038401 50.9436598, 6.909518 50.9383374, 6.908634 50.9369064, 6.9046363 50.9340648, 6.9045721 50.9340431, 6.9044955 50.9340428))"'''
)
)
gelb_crs = gpd.GeoDataFrame(
df, geometry=df["geometry"].apply(shapely.wkt.loads), crs="epsg:4326"
)
geosearch_crs.to_crs(geosearch_crs.estimate_utm_crs()).sjoin_nearest(
gelb_crs.to_crs(geosearch_crs.estimate_utm_crs()), distance_col="distance"
)
count
geometry
index_right
name
distance
0
12
POINT (354028.1446652143 5642643.287732874)
0
Polygon 1
324.158
2
5
POINT (357262.7994182631 5642301.777981625)
0
Polygon 1
2557.33
1
524
POINT (353818.4585403281 5644287.172541857)
1
Polygon 2
971.712
I have the polygon combination of lat-long1,lat2-long2 ..... and point like Lat - Long .
I have used GeoPandas library to get the result if there is any point is exist within polygon.
Sample Data of Polygon saved in csv file:
POLYGON((28.56056 77.36535,28.564635293716776
77.3675137204626,28.56871055311656 77.36967760850214,28.572785778190855 77.3718416641586,28.576860968931193 77.37400588747194,28.580936125329096 77.3761702784821,28.585011247376094 77.37833483722912,28.58908633506372 77.38049956375293,28.593161388383457 77.38266445809356,28.59723640732686 77.38482952029099,28.60131139188541 77.38699475038526,28.605386342050664 77.38916014841635,28.60946125781409 77.39132571442434,28.613536139167238 77.39349144844923,28.61761098610158 77.39565735053108,28.62168579860863 77.39782342070995,28.62576057667991 77.39998965902589,28.62983532030691 77.402156065519,28.633910029481108 77.40432264022931,28.637984704194054 77.40648938319696,28.642059344437207 77.408656294462,28.64068221074683 77.41187044231611,28.63920739580329 77.41502778244606,28.63763670052024 77.41812446187686,28.635972042808007 77.42115670220443,28.634215455216115 77.42412080422613,28.63236908243526 77.42701315247152,28.630435178662026 77.42983021962735,28.628416104829583 77.43256857085188,28.626314325707924 77.43522486797251,28.624132406877322 77.437795873562,28.621873011578572 77.44027845488824,28.619538897444272 77.4426695877325,28.617132913115164 77.44496636007166,28.614657994745563 77.44716597562005,28.612117162402576 77.44926575722634,28.609513516363293 77.45126315012166,28.606850233314923 77.45315572501488,28.604130562462267 77.45494118103147,28.60135782154758 77.45661734849246,28.598535392787774 77.45818219153013,28.595666718733966 77.45963381053753,28.592755298058414 77.46097044444889,28.589804681274302 77.46219047284835,28.586818466393503 77.46329241790465,28.583800294527727 77.46427494612952,28.58075384543836 77.46513686995802,28.57768283304089 77.46587714914885,28.574591000868892 77.4664948920035,28.571482117503592 77.46698935640259,28.568359971974488 77.46735995065883,28.565228369136484 77.46760623418534,28.56209112502966 77.4677279179792,28.558952062226695 77.4677248649196,28.55581500517431 77.46759708988064,28.552683775533943 77.46734475965891,28.552683775533943 77.46734475965891,28.553079397193876 77.4622453846313,28.553474828308865 77.45714597129259,28.55387006887434 77.4520465196603,28.554265118885752 77.44694702975198,28.554659978338513 77.4418475015852,28.555054647228083 77.43674793517746,28.555449125549913 77.43164833054634,28.555843413299442 77.42654868770937,28.55623751047213 77.42144900668411,28.556631417063407 77.41634928748812,28.55702513306874 77.41124953013893,28.55741865848359 77.40614973465412,28.557811993303396 77.40104990105122,28.55820513752363 77.39595002934782,28.558598091139757 77.39085011956145,28.558990854147225 77.38575017170969,28.559383426541523 77.3806501858101,28.559775808318093 77.37555016188024,28.560167999472434 77.37045009993768,28.56056 77.36535))
and second dataset is for LAT and LONG as 28.56282, 77.36824 respectively saved in csv file .
I have used below Python code to join both data set based on condition if point exist within polygon. like below
import pandas as pd
import shapely.geometry
from shapely.geometry import Point
import geopandas as gpd
site_df = pd.read_csv (r'lat_long_file.csv') # load lat and long file
site_df['geometry'] = pd.DataFrame(site_df).apply(lambda x: Point(x.LAT,x.LONG), axis='columns') # convert lat and long to point
gdf = gpd.GeoDataFrame(site_df, geometry = site_df.geometry,crs='EPSG:4326') #creating geo pandas data frame for point
from shapely import wkt
polygon_df = pd.read_csv (r'polygon_csv_file') #reading polygon sample raw string file
polygon_df['geometry'] = pd.DataFrame(polygon_df).apply(lambda row: shapely.wkt.loads(row.polygon), axis='columns') #converting string polygon to geometory
gd_polygon = gpd.GeoDataFrame(polygon_df, geometry = polygon_df.geometry,crs='EPSG:4326') #create geopandas dataframe
import shapely.speedups
shapely.speedups.enable() # this makes some spatial queries run faster
join_data = gpd.sjoin(gdf, gd_polygon, how="inner", op="within") //actual join condition
But that query does not retun anything . But point is exist within polygon. as we can see in below diagram
Green Location marker is point Lat and long which is exist within polygon.
I would check the axis order - WKT usually interpreted as longitude first, latitude second order, while the point you construct uses latitude:longitude order.
You can try removing the CRS identifier to see if it changes the result.
Also see
https://gis.stackexchange.com/questions/376751/shapely-flips-lat-long-coordinate
and
https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
your sample data is unusable as it's an image
have sourced a polygon - a county boundary in UK
constructed a geopandas data frame of a point that is within this county
have used plotly to demonstrate visually the data
have used your code fragment gpd.sjoin(gdf, gd_polygon, how="inner", op="within") to do spatial join and it correctly joins point to polygon
import requests, json
import geopandas as gpd
import plotly.express as px
import shapely.geometry
# fmt: off
# get a polygon and construct a point
res = requests.get("https://opendata.arcgis.com/datasets/69dc11c7386943b4ad8893c45648b1e1_0.geojson")
gd_polygon = gpd.GeoDataFrame.from_features(res.json()).loc[lambda d: d["LAD20NM"].str.contains("Hereford")]
gdf = gpd.GeoDataFrame(geometry=gd_polygon.loc[:,["LONG","LAT"]].apply(shapely.geometry.Point, axis=1)).reset_index(drop=True)
# fmt: on
# plot to show point is within polygon
px.scatter_mapbox(gd_polygon, lon="LONG", lat="LAT").update_traces(
name="gd_polygon"
).add_traces(
px.scatter_mapbox(gdf, lat=gdf2.geometry.y, lon=gdf2.geometry.x)
.update_traces(name="gdf", marker_color="red")
.data
).update_traces(
showlegend=True
).update_layout(
mapbox={
"style": "carto-positron",
"layers": [
{"source": json.loads(gd_polygon.geometry.to_json()), "type": "line"}
],
}
).show()
# spatial join, all good :-)
gpd.sjoin(gdf, gd_polygon, how="inner", op="within")
output
spatial join has worked, point is within polygon
geometry
index_right
OBJECTID
LAD20CD
LAD20NM
LAD20NMW
BNG_E
BNG_N
LONG
LAT
Shape__Area
Shape__Length
0
POINT (-2.73931 52.081539)
18
19
E06000019
Herefordshire, County of
349434
242834
-2.73931
52.0815
2.18054e+09
285427
I am doing some work on the Geopanda library, I have a shapefile with polygons and data on a excel sheet that I transform into points. I want to intersect the two DataFrames and export it to a file. I use also on both projections (WGS84) so that I can compare them.
There should be at least some points that intersects the polygons.
My intersect GeoSeries does not give me any points that fit into the polygon, but I don't see why...
I checked if the unit of the shapefile was really Kilometer and not somthing else. I am not proficient into GeoPlot so I can't really make sure what the GeoDataFrame look like.
f = pd.read_excel(io = 'C:\\Users\\peilj\\meteo_sites.xlsx')
#Converting panda dataframe into a GeoDataFrame with CRS projection
geometry = [Point(xy) for xy in zip(df.geoBreite, df.geoLaenge)]
df = df.drop(['geoBreite', 'geoLaenge'], axis=1)
crs = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
gdf = GeoDataFrame(df, crs=crs, geometry=geometry)
#Reading shapefile and creating buffer
gdfBuffer = geopandas.read_file(filename = 'C:\\Users\\peilj\\lkr_vallanUTM.shp')
gdfBuffer = gdfBuffer.buffer(100) #When the unit is kilometer
#Converting positions long/lat into shapely object
gdfBuffer = gdfBuffer.to_crs("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
#Intersection coordonates from polygon Buffer and points of stations
gdf['intersection'] = gdf.geometry.intersects(gdfBuffer)
#Problem: DOES NOT FIND ANY POINTS INSIDE STATIONS !!!!!!!
#Giving CRS projection to the intersect GeoDataframe
gdf_final = gdf.to_crs("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
gdf_final['intersection'] = gdf_final['intersection'].astype(int) #Shapefile does not accept bool
#Exporting to a file
gdf_final.to_file(driver='ESRI Shapefile', filename=r'C:\\GIS\\dwd_stationen.shp
The files needed:
https://drive.google.com/open?id=11x55aNxPOdJVKDzRWLqrI3S_ExwbqCE9
two things:
You need to swap geoBreite and geoLaenge when creating the points to:
geometry = [Point(xy) for xy in zip(df.geoLaenge, df.geoBreite)]
This is because shapely follows the x, y logic, not lat, lon.
As for checking the intersection, you could do as follows:
gdf['inside'] = gdf['geometry'].apply(lambda shp: shp.intersects(gdfBuffer.dissolve('LAND').iloc[0]['geometry']))
which detects six stations inside the shape file:
gdf['inside'].sum()
ouputs:
6
So along with some other minor fixes we get:
import geopandas as gpd
from shapely.geometry import Point
df = pd.read_excel(r'C:\Users\peilj\meteo_sites.xlsx')
geometry = [Point(xy) for xy in zip(df.geoLaenge, df.geoBreite)]
crs = {'init': 'epsg:4326'}
gdf = gpd.GeoDataFrame(df, crs=crs, geometry=geometry)
gdfBuffer = gpd.read_file(filename = r'C:\Users\peilj\lkr_vallanUTM.shp')
gdfBuffer['goemetry'] = gdfBuffer['geometry'].buffer(100)
gdfBuffer = gdfBuffer.to_crs(crs)
gdf['inside'] = gdf['geometry'].apply(lambda shp: shp.intersects(gdfBuffer.dissolve('LAND').iloc[0]['geometry']))
I'm working with datasets where latitudes and longitudes are sometimes mislabeled and I need to flip the longitudes and the latitudes. The best solution I could come up with is to extract the x an y coordinates using df.geometry.x and df.geometry.y, create a new geometry column, and reconstruct the GeoDataFrame using the new geometry column. Or in code form:
import geopandas
from shapely.geometry import Point
gdf['coordinates'] = list(zip(gdf.geometry.y, gdf.geometry.x))
gdf['coordinates'] = gdf['coordinates'].apply(Point)
gdf= gpd.GeoDataFrame(point_data, geometry='coordinates', crs = 4326)
This is pretty ugly, requires creating a new column and isn't efficient for large datasets. Is there an easier way to flip the longitude and latitude coordinates of a GeoSeries/ GeoDataFrame?
You can create the geometry column directly:
df['geometry'] = df.apply(lambda row: Point(row['y'], row['x']), axis=1)
df = gpd.GeoDataFrame(df, crs=4326)
It works for Point and Polygon either:
gpd.GeoSeries(gdf['coordinates']).map(lambda polygon: shapely.ops.transform(lambda x, y: (y, x), polygon))