I have a dataframe that contains lon/lat information. The aim is to find all the points within a distance rad of a specific point st_p.
I already have the code in R, but I need to do the same in Python.
In R, I convert the dataframe to an sf object, define a buffer, and then intersect the points with the buffer.
Here is the R code.
I just don't know which libraries to use in Python to do the same.
within_radius <- function(df, st_p, rad) {
  # Transform to an sf object and change from lon/lat to UTM
  sf_df <- st_transform(st_as_sf(
    df,
    coords = c("lon", "lat"),
    crs = 4326,
    agr = "constant"
  ), 6622)
  # Create a UTM st point based on the coordinates of the stop point
  cntr <- st_transform(st_sfc(st_p, crs = 4326), 6622)
  # Create a circular buffer with the given radius
  buff <- st_buffer(cntr, rad)
  # Filter the points that are within the buffer
  intr <- st_intersects(sf_df, buff, sparse = FALSE)
  sf_df <- st_transform(sf_df, 4326)
  sf_df <- sf_df[which(unlist(intr)), ]
  # Compute the distance of each point to the beginning of the road segment
  xy = st_coordinates(st_centroid(sf_df))
  nc.sort = sf_df[order(xy[, "X"], xy[, "Y"]), ]
  sf_df <- nc.sort %>%
    mutate(dist = st_distance(
      x = nc.sort,
      y = nc.sort[1, ],
      by_element = TRUE
    ))
}
You can use geopandas and shapely for this; together they cover the sf operations used above (reprojection, buffering, intersection tests and distances).
Create a geopandas GeoDataFrame from a pandas dataframe with lat, lng:
In [19]: import pandas as pd
In [20]: import geopandas as gpd
In [21]: from shapely.geometry import Point
In [22]: df = pd.DataFrame({"lat": [19.435175, 19.432909], "lng":[-99.141197, -99.146036]})
In [23]: gf = gpd.GeoDataFrame(df, geometry = [Point(x,y) for (x,y) in zip(df.lng, df.lat)], crs = "epsg:4326")
In [24]: gf
Out[24]:
         lat        lng                    geometry
0  19.435175 -99.141197  POINT (-99.14120 19.43518)
1  19.432909 -99.146036  POINT (-99.14604 19.43291)
Buffers, projections and other operations are available on a GeoDataFrame. This is how you convert to a metric projection and create a 10 m buffer:
In [27]: gf.to_crs(6622).buffer(10)
Out[27]:
0 POLYGON ((-3597495.980 -2115793.588, -3597496....
1 POLYGON ((-3598149.053 -2115813.383, -3598149....
dtype: geometry
You can call intersects to test whether each buffer intersects a point (it returns one boolean per row, not the intersection geometry):
In [29]: gf.to_crs(6622).buffer(10).intersects(Point(-3597505.980,-2115793.588))
Out[29]:
0 True
1 False
dtype: bool
Compute centroids:
In [30]: gf.to_crs(6622).buffer(10).centroid
Out[30]:
0 POINT (-3597505.980 -2115793.588)
1 POINT (-3598159.053 -2115813.383)
dtype: geometry
Filter using the buffer:
In [31]: gf.loc[gf.to_crs(6622).buffer(10).intersects(Point(-3597505.980,-2115793.588))]
Out[31]:
lat lng geometry
0 19.435175 -99.141197 POINT (-99.14120 19.43518)
distance gives you the distance to the closest point in a geometry:
In [33]: gf.to_crs(6622).buffer(10).distance(Point(-3597505.980,-2115793.588))
Out[33]:
0 0.000000
1 643.377576
dtype: float64
And you can do a lot more; see the documentation:
https://geopandas.org/index.html
Also look at shapely's documentation to see how to project a single point https://shapely.readthedocs.io/en/latest/manual.html#shapely.ops.transform
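Putting these pieces together, a rough Python counterpart of the R within_radius function might look like the sketch below. It rests on a few assumptions: the column names lon and lat follow the question, the metric_crs parameter is my own addition, and the example call uses EPSG:32614 (UTM zone 14N, which I assume suits the Mexico City sample points). The dist column here holds the distance to st_p, not the distance to the first sorted point as in the R code.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point


def within_radius(df, st_p, rad, metric_crs=6622):
    """Return the rows of df within rad metres of st_p, given as (lon, lat).

    metric_crs is an assumed parameter: any projected CRS in metres that
    suits the data; 6622 is simply the code used in the R function.
    """
    # Build a GeoDataFrame in lon/lat and a copy in the metric CRS
    gdf = gpd.GeoDataFrame(
        df.copy(),
        geometry=gpd.points_from_xy(df["lon"], df["lat"]),
        crs="EPSG:4326",
    )
    gdf_m = gdf.to_crs(metric_crs)
    # Project the centre point the same way
    centre = gpd.GeoSeries([Point(st_p)], crs="EPSG:4326").to_crs(metric_crs).iloc[0]
    # A distance threshold selects the same points as intersecting a circular buffer
    dist = gdf_m.distance(centre)
    mask = dist <= rad
    out = gdf.loc[mask].copy()
    out["dist"] = dist[mask]
    return out


df = pd.DataFrame({"lat": [19.435175, 19.432909], "lon": [-99.141197, -99.146036]})
result = within_radius(df, (-99.141197, 19.435175), 100, metric_crs=32614)
print(result)
```

Comparing distances directly avoids building the buffer polygon, which approximates the circle with line segments.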
Related
I downloaded a geotiff from here: https://www.nass.usda.gov/Research_and_Science/Crop_Progress_Gridded_Layers/index.php
(file also available: https://drive.google.com/file/d/1XcfEw-CZgVFE2NJytu4B1yBvjWydF-Tm/view?usp=sharing)
Looking at one of the weeks in 2021, I'd like to convert the geotiff to a data frame so I have an associated value with each lat/lon pair in the geotiff.
I tried:
import rioxarray
fl = 'data/cpc2021/corn/cpccorn2021/condition/cornCond21w24.tif'
da = rioxarray.open_rasterio(fl, masked=True)
df = da[0].to_pandas()
df['y'] = df.index
pd.melt(df, id_vars='y')
However, this returns a dataframe with x and y that don't seem to correspond to the lat/lon. How can I add (or retain) this information while converting?
I expect the lat/lon points to be in the contiguous US.
edit:
I found a meta file that has the projections: NAD_1983_Contiguous_USA_Albers
which I believe corresponds to EPSG:5070 (also seen later in the same xml file)
I also found the bounding box for lat/lon coordinates:
<GeoBndBox esriExtentType="search">
<exTypeCode Sync="TRUE">1</exTypeCode>
<westBL Sync="TRUE">-127.360895</westBL>
<eastBL Sync="TRUE">-68.589171</eastBL>
<northBL Sync="TRUE">51.723828</northBL>
<southBL Sync="TRUE">23.297865</southBL>
However, I am still uncertain how to use this information when converting to a dataframe.
Result of print(da) is:
<xarray.DataArray (band: 1, y: 320, x: 479)>
[153280 values with dtype=float32]
Coordinates:
* band (band) int64 1
* x (x) float64 -2.305e+06 -2.296e+06 ... 1.987e+06 1.996e+06
* y (y) float64 3.181e+06 3.172e+06 ... 3.192e+05 3.102e+05
spatial_ref int64 0
Attributes:
AREA_OR_POINT: Area
RepresentationType: ATHEMATIC
STATISTICS_COVARIANCES: 0.1263692188822515
STATISTICS_MAXIMUM: 4.8569073677063
STATISTICS_MEAN: 3.7031858480518
STATISTICS_MINIMUM: 2.1672348976135
STATISTICS_SKIPFACTORX: 1
STATISTICS_SKIPFACTORY: 1
STATISTICS_STDDEV: 0.35548448472789
scale_factor: 1.0
add_offset: 0.0
Credit to Jose from the GIS community:
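import rioxarray
import pandas as pd
# fl is the path to the GeoTIFF, as defined in the question
da = rioxarray.open_rasterio(fl, masked=True)
# Reproject from the raster's native CRS to lon/lat
da = da.rio.reproject("EPSG:4326")
# First band as a 2D table: rows indexed by y, one column per x
df = da[0].to_pandas()
df['y'] = df.index
# Long format: one row per cell
df = pd.melt(df, id_vars='y')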
https://gis.stackexchange.com/questions/443801/add-lat-and-lon-to-dataarray-read-in-by-rioxarray/443810#443810
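Note that rio.reproject resamples the raster onto a new lon/lat grid. If you would rather keep the original cells and only attach a longitude and latitude to each cell centre, one alternative (my suggestion, not part of the linked answer) is to transform the x/y coordinates with pyproj. This is a minimal sketch: the helper name grid_to_lonlat_frame and the tiny synthetic grid are hypothetical, and EPSG:5070 is taken from the metadata mentioned in the question.

```python
import numpy as np
import pandas as pd
from pyproj import Transformer


def grid_to_lonlat_frame(x, y, values, src_crs="EPSG:5070"):
    """Hypothetical helper: one row per cell with lon, lat and the cell value.

    x and y are the 1D coordinate arrays of the grid, values is a 2D array
    of shape (len(y), len(x)).
    """
    transformer = Transformer.from_crs(src_crs, "EPSG:4326", always_xy=True)
    # meshgrid yields arrays of shape (len(y), len(x)), matching values
    xx, yy = np.meshgrid(x, y)
    lon, lat = transformer.transform(xx.ravel(), yy.ravel())
    return pd.DataFrame({"lon": lon, "lat": lat, "value": np.asarray(values).ravel()})


# Synthetic 2 x 2 grid standing in for the raster (made-up values)
x = np.array([0.0, 9000.0])
y = np.array([9000.0, 0.0])
values = np.array([[1.0, 2.0], [3.0, 4.0]])
frame = grid_to_lonlat_frame(x, y, values)
print(frame)
```

With the real raster, the call would presumably be grid_to_lonlat_frame(da.x.values, da.y.values, da[0].values).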
I want to calculate the distance from each point of dataframe geosearch_crs to the polygons in the gelb_crs dataframe, returning only the minimum distance.
I have tried this code:
for i in range(len(geosearch_crs)):
    point = geosearch_crs['geometry'].iloc[i]
    for j in range(len(gelb_crs)):
        poly = gelb_crs['geometry'].iloc[j]
        print(point.distance(poly).min())
It returns this error:
AttributeError: 'float' object has no attribute 'min'
I don't understand how to return what I want; I expected point.distance(poly).min() to work.
This is part of the data frames (around 180000 entries):
geosearch_crs:
count  geometry
12     POINT (6.92334 50.91695)
524    POINT (6.91970 50.93167)
5      POINT (6.96946 50.91469)
gelb_crs (35 entries):
name       geometry
Polygon 1  POLYGON Z ((6.95712 50.92851 0.00000, 6.95772 ...
Polygon 2  POLYGON Z ((6.91896 50.92094 0.00000, 6.92211 ...
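I'm not sure about the 'distance' method, but maybe you could try collecting the distances for each point in a list and taking the minimum:
# Iterate over the geometry columns; iterating over the data frame itself yields column names
for point in geosearch_crs['geometry']:
    distances = [point.distance(poly) for poly in gelb_crs['geometry']]
    print(min(distances))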
Your sample polygon data is unusable as it is truncated with ellipses. I have used two other polygons to demonstrate as an MWE.
You need to ensure that the CRS of both data frames is compatible. Your sample data is clearly in two different CRS: the points look like epsg:4326, and the polygons are either a UTM CRS or EPSG:3857 judging from the range of values.
geopandas sjoin_nearest() is a simple way to find the nearest polygon and get the distance. I have used a UTM CRS so that the distance is in meters rather than degrees.
import geopandas as gpd
import pandas as pd
import shapely.wkt
import io
df = pd.read_csv(
io.StringIO(
"""count,geometry
12,POINT (6.92334 50.91695)
524,POINT (6.91970 50.93167)
5,POINT (6.96946 50.91469)"""
)
)
geosearch_crs = gpd.GeoDataFrame(
df, geometry=df["geometry"].apply(shapely.wkt.loads), crs="epsg:4326"
)
# generated as sample in question unusable
df = pd.read_csv(
io.StringIO(
'''name,geometry
Polygon 1,"POLYGON ((6.9176561 50.8949742, 6.9171649 50.8951417, 6.9156967 50.8957149, 6.9111788 50.897751, 6.9100077 50.8989409, 6.9101989 50.8991319, 6.9120049 50.9009167, 6.9190374 50.9078591, 6.9258157 50.9143227, 6.9258714 50.9143691, 6.9259546 50.9144355, 6.9273598 50.915413, 6.9325715 50.9136438, 6.9331018 50.9134553, 6.9331452 50.9134397, 6.9255391 50.9018725, 6.922309 50.8988869, 6.9176561 50.8949742))"
Polygon 2,"POLYGON ((6.9044955 50.9340428, 6.8894236 50.9344297, 6.8829359 50.9375553, 6.8862995 50.9409307, 6.889446 50.9423764, 6.9038401 50.9436598, 6.909518 50.9383374, 6.908634 50.9369064, 6.9046363 50.9340648, 6.9045721 50.9340431, 6.9044955 50.9340428))"'''
)
)
gelb_crs = gpd.GeoDataFrame(
df, geometry=df["geometry"].apply(shapely.wkt.loads), crs="epsg:4326"
)
geosearch_crs.to_crs(geosearch_crs.estimate_utm_crs()).sjoin_nearest(
gelb_crs.to_crs(geosearch_crs.estimate_utm_crs()), distance_col="distance"
)
   count                                     geometry  index_right       name  distance
0     12  POINT (354028.1446652143 5642643.287732874)            0  Polygon 1   324.158
2      5  POINT (357262.7994182631 5642301.777981625)            0  Polygon 1   2557.33
1    524  POINT (353818.4585403281 5644287.172541857)            1  Polygon 2   971.712
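As for the AttributeError in the question: the distance between two single shapely geometries is a plain float, so there is nothing to call .min() on. Calling distance on a whole GeoSeries of polygons returns one value per polygon, and that result does have .min(). The sketch below illustrates this with made-up boxes and points in an arbitrary planar unit (the names polys and points and the min_dist column are hypothetical); with real data, both frames would first need to share a projected CRS, as noted above.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Made-up polygons and points in a planar coordinate system
polys = gpd.GeoSeries([box(0, 0, 10, 10), box(100, 0, 110, 10)])
points = gpd.GeoDataFrame(
    {"count": [12, 524, 5]},
    geometry=[Point(20, 5), Point(105, 5), Point(50, 5)],
)

# polys.distance(p) gives one distance per polygon; keep the smallest per point
points["min_dist"] = [polys.distance(p).min() for p in points.geometry]
print(points)
```

For about 180000 points and 35 polygons this loop should still be workable, though sjoin_nearest is likely faster because it uses a spatial index.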
I have this data frame, which contains lat/lon coordinates:
Lat Lon
29.39291 -98.50925
29.39923 -98.51256
29.40147 -98.51123
29.38752 -98.52372
29.39291 -98.50925
29.39537 -98.50402
29.39343 -98.49707
29.39291 -98.50925
29.39556 -98.53148
These are the coordinates that construct the polygon:
Lat Lon
29.392945 -98.507696
29.406167 -98.509074
29.407234 -98.517039
29.391325 -98.517166
I want to check for each coordinate (from the first data frame) if it's within the polygon, using Python, and taking into account the great circle.
Expected result:
Lat Lon Within
29.39291 -98.50925 1
29.39923 -98.51256 1
29.40147 -98.51123 1
29.38752 -98.52372 0
29.39291 -98.50925 1
29.39537 -98.50402 0
29.39343 -98.49707 0
29.39291 -98.50925 1
29.39556 -98.53148 0
Adapted from the question "What's the fastest way of checking if a point is inside a polygon in python", assuming the dataframe with the polygon vertices is df_poly and the dataframe with the points is df_points:
from shapely.geometry import Point
from shapely.geometry.polygon import Polygon
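# shapely expects (x, y), i.e. (Lon, Lat)
polygon = Polygon([tuple(x) for x in df_poly[['Lon', 'Lat']].to_numpy()])
# contains() returns a bool; cast to int to get the 1/0 column from the expected result
df_points['Within'] = df_points.apply(lambda x: int(polygon.contains(Point(x['Lon'], x['Lat']))), axis=1)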
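For reference, here is the same idea as a self-contained sketch run on the data from the question. One caveat I should flag: shapely treats the coordinates as planar, so the polygon edges are straight lines in lon/lat and not great-circle arcs; for a polygon this small I would assume the difference is negligible, but that is an assumption, not something verified here.

```python
import pandas as pd
from shapely.geometry import Point, Polygon

# Points and polygon vertices copied from the question
df_points = pd.DataFrame({
    "Lat": [29.39291, 29.39923, 29.40147, 29.38752, 29.39291,
            29.39537, 29.39343, 29.39291, 29.39556],
    "Lon": [-98.50925, -98.51256, -98.51123, -98.52372, -98.50925,
            -98.50402, -98.49707, -98.50925, -98.53148],
})
df_poly = pd.DataFrame({
    "Lat": [29.392945, 29.406167, 29.407234, 29.391325],
    "Lon": [-98.507696, -98.509074, -98.517039, -98.517166],
})

# Build the polygon in (x, y) = (Lon, Lat) order
polygon = Polygon(list(zip(df_poly["Lon"], df_poly["Lat"])))

# Planar point-in-polygon test, stored as 1/0
df_points["Within"] = [
    int(polygon.contains(Point(lon, lat)))
    for lon, lat in zip(df_points["Lon"], df_points["Lat"])
]
print(df_points)
```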
I am doing some work with the GeoPandas library. I have a shapefile with polygons, and data in an Excel sheet that I transform into points. I want to intersect the two DataFrames and export the result to a file. I use the same projection (WGS84) on both so that I can compare them.
There should be at least some points that intersect the polygons.
My intersect GeoSeries does not give me any points that fall inside the polygon, but I don't see why...
I checked that the unit of the shapefile really is kilometers and not something else. I am not proficient with GeoPlot, so I can't really check what the GeoDataFrame looks like.
df = pd.read_excel(io = 'C:\\Users\\peilj\\meteo_sites.xlsx')
#Converting panda dataframe into a GeoDataFrame with CRS projection
geometry = [Point(xy) for xy in zip(df.geoBreite, df.geoLaenge)]
df = df.drop(['geoBreite', 'geoLaenge'], axis=1)
crs = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
gdf = GeoDataFrame(df, crs=crs, geometry=geometry)
#Reading shapefile and creating buffer
gdfBuffer = geopandas.read_file(filename = 'C:\\Users\\peilj\\lkr_vallanUTM.shp')
gdfBuffer = gdfBuffer.buffer(100) #When the unit is kilometer
#Converting positions long/lat into shapely object
gdfBuffer = gdfBuffer.to_crs("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
#Intersection coordonates from polygon Buffer and points of stations
gdf['intersection'] = gdf.geometry.intersects(gdfBuffer)
#Problem: DOES NOT FIND ANY POINTS INSIDE STATIONS !!!!!!!
#Giving CRS projection to the intersect GeoDataframe
gdf_final = gdf.to_crs("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
gdf_final['intersection'] = gdf_final['intersection'].astype(int) #Shapefile does not accept bool
#Exporting to a file
gdf_final.to_file(driver='ESRI Shapefile', filename=r'C:\\GIS\\dwd_stationen.shp')
The files needed:
https://drive.google.com/open?id=11x55aNxPOdJVKDzRWLqrI3S_ExwbqCE9
Two things:
You need to swap geoBreite and geoLaenge when creating the points to:
geometry = [Point(xy) for xy in zip(df.geoLaenge, df.geoBreite)]
This is because shapely follows the x, y logic, not lat, lon.
As for checking the intersection, you could do as follows:
gdf['inside'] = gdf['geometry'].apply(lambda shp: shp.intersects(gdfBuffer.dissolve('LAND').iloc[0]['geometry']))
which detects six stations inside the shape file:
gdf['inside'].sum()
outputs:
6
So along with some other minor fixes we get:
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point
df = pd.read_excel(r'C:\Users\peilj\meteo_sites.xlsx')
# shapely expects (x, y), i.e. (longitude, latitude)
geometry = [Point(xy) for xy in zip(df.geoLaenge, df.geoBreite)]
crs = 'EPSG:4326'
gdf = gpd.GeoDataFrame(df, crs=crs, geometry=geometry)
gdfBuffer = gpd.read_file(filename = r'C:\Users\peilj\lkr_vallanUTM.shp')
# Buffer while still in the shapefile's projected CRS, then reproject to WGS84
gdfBuffer['geometry'] = gdfBuffer['geometry'].buffer(100)
gdfBuffer = gdfBuffer.to_crs(crs)
gdf['inside'] = gdf['geometry'].apply(lambda shp: shp.intersects(gdfBuffer.dissolve('LAND').iloc[0]['geometry']))
I have a dataframe with earthquake data called eq that has columns listing latitude and longitude. Using geopandas, I created a point column with the following:
from geopandas import GeoSeries, GeoDataFrame
from shapely.geometry import Point
s = GeoSeries([Point(x,y) for x, y in zip(eq['longitude'], eq['latitude'])])
eq['geometry'] = s
eq.crs = {'init': 'epsg:4326', 'no_defs': True}
eq
Now I have a geometry column with lat lon coordinates but I want to change the projection to UTM. Can anyone help with the transformation?
Latitude/longitude aren't really a projection, but sort of a default "unprojection". See this page for more details, but it probably means your data uses WGS84 or epsg:4326.
Let's build a dataset and, before we do any reprojection, we'll define the crs as epsg:4326
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point
df = pd.DataFrame({'id': [1, 2, 3], 'population' : [2, 3, 10], 'longitude': [-80.2, -80.11, -81.0], 'latitude': [11.1, 11.1345, 11.2]})
s = gpd.GeoSeries([Point(x,y) for x, y in zip(df['longitude'], df['latitude'])])
geo_df = gpd.GeoDataFrame(df[['id', 'population']], geometry=s)
# Define crs for our geodataframe (a plain EPSG string; the {'init': ...} dict form is deprecated):
geo_df.crs = 'EPSG:4326'
I'm not sure what you mean by "UTM projection". From the Wikipedia page I see there are 60 different UTM zones depending on the area of the world. You can find the appropriate EPSG code online; for example, zone 33N on WGS84 is EPSG:32633. Here I'll just give you an example with EPSG:3395, which is World Mercator and not a UTM zone; substitute the code for your own zone.
How do you do the reprojection? You can easily get this info from the geopandas docs on projection. It's just one line:
geo_df = geo_df.to_crs('EPSG:3395')
and the geometry isn't coded as latitude/longitude anymore:
id population geometry
0 1 2 POINT (-8927823.161620541 1235228.11420853)
1 2 3 POINT (-8917804.407449147 1239116.84994171)
2 3 10 POINT (-9016878.754255159 1246501.097746004)
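If you do not want to look up the zone by hand, geopandas can suggest one. The sketch below rebuilds the sample data from this answer and uses estimate_utm_crs(), which picks a UTM CRS from the bounds of the data. It assumes a geopandas version that provides this method, and the variable names utm_crs and geo_utm are my own.

```python
import geopandas as gpd
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "population": [2, 3, 10],
    "longitude": [-80.2, -80.11, -81.0],
    "latitude": [11.1, 11.1345, 11.2],
})
geo_df = gpd.GeoDataFrame(
    df[["id", "population"]],
    geometry=gpd.points_from_xy(df["longitude"], df["latitude"]),
    crs="EPSG:4326",
)

# Let geopandas choose a UTM zone that fits the data, then reproject
utm_crs = geo_df.estimate_utm_crs()
geo_utm = geo_df.to_crs(utm_crs)
print(utm_crs.to_epsg())
print(geo_utm)
```

These sample points lie between 84°W and 78°W in the northern hemisphere, which corresponds to UTM zone 17N.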