Check whether points are within a polygon (speed up) - Python

I have two data frames: 1) data, with longitude and latitude points, and 2) border, a shapefile of a city.
I need to check which points fall within the shapefile and save them. Here is my code to do that:
Data
city = pd.read_csv("D:...path.../data.csv")
crs = {'init':'epsg:4326'}
geometry = [Point(xy) for xy in zip(city.longitude,city.latitude)]
city_point = gpd.GeoDataFrame(city,crs=crs,geometry=geometry)
Border
border = gpd.read_file("C:...path.../border.shp")
border_gdf = gpd.GeoDataFrame(border, geometry='geometry')
Final check
city_point['inside'] = city_point['geometry'].apply(border_gdf.contains)
city_point = city_point[city_point.inside != True]
Libraries
import numpy as np
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point, Polygon

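A vectorized check against the border geometry avoids the per-point apply (this assumes the border shapefile holds a single polygon in its first row):
city_point[city_point.geometry.within(border_gdf.iloc[0].geometry)]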
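The one-liner above only tests against the first row of the border file. A minimal sketch of the same vectorized check, assuming the border may contain several features: the small DataFrame and the unit-square polygon below are hypothetical stand-ins for data.csv and border.shp, and merging the border with `unary_union` is one possible choice rather than the only one.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon
from shapely.ops import unary_union

# Hypothetical stand-ins for data.csv and border.shp
city = pd.DataFrame({"longitude": [0.5, 2.0, 0.1], "latitude": [0.5, 2.0, 0.9]})
border_gdf = gpd.GeoDataFrame(
    geometry=[Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])], crs="EPSG:4326"
)

# Build all points in one call instead of a Python-level list comprehension
city_point = gpd.GeoDataFrame(
    city,
    geometry=gpd.points_from_xy(city.longitude, city.latitude),
    crs="EPSG:4326",
)

# Merge every border feature into one geometry, then test all points at once
border_shape = unary_union(list(border_gdf.geometry))
inside = city_point[city_point.geometry.within(border_shape)]
print(inside)
```

Here `inside` keeps only the rows whose point falls within the merged border, which is what the question asks to save.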

Related

Extract raster value using multipolygon type of shape using python

I am trying to extract raster values using a multipolygon shapefile in Python. I have found an answer here, but the calculation there is not valid for multipolygons. Could someone please guide me on how to correct the code so that the calculation uses the MultiPolygon data type rather than Polygon?
My code is below:
import rasterio
from rasterio.mask import mask
import geopandas as gpd
import numpy as np
from rasterio import Affine
from shapely.geometry import mapping
shapefile = gpd.read_file(r'/Users..../polygon_sector.shp')
geoms = shapefile.geometry.values
geometry = geoms[0] # shapely geometry
# transform to GeoJSON format
geoms = [mapping(geoms[0])]
# extract the raster values within the polygon
with rasterio.open("/Users/.../map_reclass.tif") as src:
    out_image, out_transform = mask(src, geoms, crop=True)
    # no data values of the original raster
    no_data = src.nodata
print(no_data)
# extract the values of the masked array
data = out_image[0,:,:]
# extract the row, columns of the valid values
row, col = np.where(data != no_data)
rou = np.extract(data != no_data, data)
# shift the transform by half a pixel to reference the pixel centre
T1 = out_transform * Affine.translation(0.5, 0.5)
# map (row, col) to (x, y); the Affine goes on the left of the tuple
rc2xy = lambda r, c: T1 * (c, r)
d = gpd.GeoDataFrame({'col':col,'row':row,'ROU':rou})
# coordinate transformation
d['x'] = d.apply(lambda row: rc2xy(row.row,row.col)[0], axis=1)
d['y'] = d.apply(lambda row: rc2xy(row.row,row.col)[1], axis=1)
# geometry
from shapely.geometry import Point
d['geometry'] = d.apply(lambda row: Point(row['x'], row['y']), axis=1)
# save to a shapefile
d.to_file(r'/Users/y.../result_full.shp', driver='ESRI Shapefile')
I have tried to assign the other geometry (multipolygon), but I did it wrong: when I print the geometry it is still POLYGON, not MULTIPOLYGON. As far as I understand, the geometry type should come from shapely.
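One thing that may be worth checking: the code takes only `geoms[0]`, so any other feature in the file is never passed to `mask`, which accepts an iterable of GeoJSON-like geometries. A minimal sketch, assuming that is the cause; the `features` list below is a hypothetical stand-in for `shapefile.geometry.values`.

```python
from shapely.geometry import MultiPolygon, Polygon, mapping

# Hypothetical stand-in for shapefile.geometry.values: one MultiPolygon feature
part_a = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
part_b = Polygon([(2, 2), (3, 2), (3, 3), (2, 3)])
features = [MultiPolygon([part_a, part_b])]

# Convert every feature, not only features[0], to a GeoJSON-like dict
shapes = [mapping(geom) for geom in features]

print(shapes[0]["type"])              # geometry type carried into the mask
print(len(shapes[0]["coordinates"]))  # number of polygon parts
```

The resulting `shapes` list could then be passed in place of `geoms` in the `mask(src, geoms, crop=True)` call.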

Define a circle that circumscribes a set of points (shapefile) in Python

I have a shapefile of points, defined by X and Y coordinates, and an ID feature.
I have at least 3 different points with the same ID number.
I would like to define, for each ID, the shapefile of a circle that circumscribes the points.
How can this be done in a Python environment?
There is a library that does it: https://pypi.org/project/miniball/
It is straightforward to integrate into the standard pandas groupby pattern: https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html
The solution really reduces to this:
def circle(points):
    # the second return value is treated as the squared radius, hence the sqrt
    p, r = miniball.get_bounding_ball(np.array([points.x, points.y]).T)
    return shapely.geometry.Point(p).buffer(math.sqrt(r))
col = "group"
# generate circles around groups of points
gdf_c = cities.groupby(col, as_index=False).agg(geometry=("geometry", circle))
With a sample example and visualisation, the circles do become distorted due to EPSG:4326 projection limitations.
Full working example:
import geopandas as gpd
import numpy as np
import shapely
import miniball
import math
import pandas as pd
cities = gpd.read_file(gpd.datasets.get_path("naturalearth_cities"))
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
# a semi-synthetic grouping of cities
world["size"] = world.groupby("continent")["pop_est"].apply(
    lambda d: pd.cut(d, 2, labels=list("ab"), duplicates="drop").astype(str)
)
cities = cities.sjoin(world.loc[:, ["continent", "iso_a3", "size", "geometry"]])
cities["group"] = cities["continent"] + cities["size"]
def circle(points):
    p, r = miniball.get_bounding_ball(np.array([points.x, points.y]).T)
    return shapely.geometry.Point(p).buffer(math.sqrt(r))
col = "group"
# generate circles around groups of points
gdf_c = cities.groupby(col, as_index=False).agg(geometry=("geometry", circle))
# visualize it
m = cities.explore(column=col, height=300, width=600, legend=False)
gdf_c.loc[~gdf_c["geometry"].is_empty].explore(
    m=m, column=col, marker_kwds={"radius": 20}, legend=False
)
output
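As an alternative to miniball (a swapped-in technique, not the answer's method), Shapely 2.0 and later ship their own smallest-enclosing-circle functions. A minimal sketch, assuming Shapely 2.0+ is installed; the three points are hypothetical and stand in for the points sharing one ID.

```python
import shapely
from shapely.geometry import MultiPoint

# Hypothetical points sharing one ID
points = MultiPoint([(0, 0), (2, 0), (1, 1)])

# Smallest enclosing circle as a polygon, plus its radius (Shapely 2.0+)
circle = shapely.minimum_bounding_circle(points)
radius = shapely.minimum_bounding_radius(points)
print(circle.geom_type, radius)
```

In a groupby, the same call could replace the body of `circle` by first collecting each group's points into a `MultiPoint`.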

How to join point with polygon in geopandas

I have a polygon defined as a sequence of lat-long pairs (lat1-long1, lat2-long2, ...) and a point given as Lat-Long.
I have used the GeoPandas library to check whether the point lies within the polygon.
Sample polygon data, saved in a CSV file:
POLYGON((28.56056 77.36535,28.564635293716776
77.3675137204626,28.56871055311656 77.36967760850214,28.572785778190855 77.3718416641586,28.576860968931193 77.37400588747194,28.580936125329096 77.3761702784821,28.585011247376094 77.37833483722912,28.58908633506372 77.38049956375293,28.593161388383457 77.38266445809356,28.59723640732686 77.38482952029099,28.60131139188541 77.38699475038526,28.605386342050664 77.38916014841635,28.60946125781409 77.39132571442434,28.613536139167238 77.39349144844923,28.61761098610158 77.39565735053108,28.62168579860863 77.39782342070995,28.62576057667991 77.39998965902589,28.62983532030691 77.402156065519,28.633910029481108 77.40432264022931,28.637984704194054 77.40648938319696,28.642059344437207 77.408656294462,28.64068221074683 77.41187044231611,28.63920739580329 77.41502778244606,28.63763670052024 77.41812446187686,28.635972042808007 77.42115670220443,28.634215455216115 77.42412080422613,28.63236908243526 77.42701315247152,28.630435178662026 77.42983021962735,28.628416104829583 77.43256857085188,28.626314325707924 77.43522486797251,28.624132406877322 77.437795873562,28.621873011578572 77.44027845488824,28.619538897444272 77.4426695877325,28.617132913115164 77.44496636007166,28.614657994745563 77.44716597562005,28.612117162402576 77.44926575722634,28.609513516363293 77.45126315012166,28.606850233314923 77.45315572501488,28.604130562462267 77.45494118103147,28.60135782154758 77.45661734849246,28.598535392787774 77.45818219153013,28.595666718733966 77.45963381053753,28.592755298058414 77.46097044444889,28.589804681274302 77.46219047284835,28.586818466393503 77.46329241790465,28.583800294527727 77.46427494612952,28.58075384543836 77.46513686995802,28.57768283304089 77.46587714914885,28.574591000868892 77.4664948920035,28.571482117503592 77.46698935640259,28.568359971974488 77.46735995065883,28.565228369136484 77.46760623418534,28.56209112502966 77.4677279179792,28.558952062226695 77.4677248649196,28.55581500517431 77.46759708988064,28.552683775533943 
77.46734475965891,28.552683775533943 77.46734475965891,28.553079397193876 77.4622453846313,28.553474828308865 77.45714597129259,28.55387006887434 77.4520465196603,28.554265118885752 77.44694702975198,28.554659978338513 77.4418475015852,28.555054647228083 77.43674793517746,28.555449125549913 77.43164833054634,28.555843413299442 77.42654868770937,28.55623751047213 77.42144900668411,28.556631417063407 77.41634928748812,28.55702513306874 77.41124953013893,28.55741865848359 77.40614973465412,28.557811993303396 77.40104990105122,28.55820513752363 77.39595002934782,28.558598091139757 77.39085011956145,28.558990854147225 77.38575017170969,28.559383426541523 77.3806501858101,28.559775808318093 77.37555016188024,28.560167999472434 77.37045009993768,28.56056 77.36535))
The second dataset holds LAT and LONG as 28.56282, 77.36824 respectively, saved in a CSV file.
I used the Python code below to join both datasets on the condition that the point lies within the polygon:
import pandas as pd
import shapely.geometry
from shapely.geometry import Point
import geopandas as gpd
site_df = pd.read_csv(r'lat_long_file.csv') # load lat and long file
site_df['geometry'] = pd.DataFrame(site_df).apply(lambda x: Point(x.LAT,x.LONG), axis='columns') # convert lat and long to point
gdf = gpd.GeoDataFrame(site_df, geometry = site_df.geometry,crs='EPSG:4326') # create GeoDataFrame for the point
from shapely import wkt
polygon_df = pd.read_csv(r'polygon_csv_file') # read polygon sample raw string file
polygon_df['geometry'] = pd.DataFrame(polygon_df).apply(lambda row: shapely.wkt.loads(row.polygon), axis='columns') # convert polygon string to geometry
gd_polygon = gpd.GeoDataFrame(polygon_df, geometry = polygon_df.geometry,crs='EPSG:4326') # create GeoDataFrame for the polygon
import shapely.speedups
shapely.speedups.enable() # this makes some spatial queries run faster
join_data = gpd.sjoin(gdf, gd_polygon, how="inner", op="within") # actual join condition
But that query does not return anything, even though the point lies within the polygon, as we can see in the diagram below.
The green location marker is the Lat-Long point, which lies within the polygon.
I would check the axis order: WKT is usually interpreted in longitude-first, latitude-second order, while the point you construct uses latitude-first order.
You can try removing the CRS identifier to see if it changes the result.
Also see
https://gis.stackexchange.com/questions/376751/shapely-flips-lat-long-coordinate
and
https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
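To illustrate the axis-order point, here is a minimal sketch of swapping a latitude-first WKT polygon into longitude-first order with `shapely.ops.transform`. The small rectangle and the test point are hypothetical values, not the asker's data.

```python
from shapely import wkt
from shapely.geometry import Point
from shapely.ops import transform

# Hypothetical polygon written latitude-first, like the sample WKT
polygon_latlon = wkt.loads(
    "POLYGON((28.0 77.0, 28.0 78.0, 29.0 78.0, 29.0 77.0, 28.0 77.0))"
)

# Swap the axes so that x is longitude and y is latitude
polygon_lonlat = transform(lambda x, y: (y, x), polygon_latlon)

# A point built in the conventional (longitude, latitude) order
point = Point(77.5, 28.5)

print(point.within(polygon_latlon))  # axis orders disagree
print(point.within(polygon_lonlat))  # axis orders agree
```

Whichever order is chosen, the point and the polygon need to use the same one before the spatial join.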
your sample data is unusable as it's an image
have sourced a polygon - a county boundary in UK
constructed a geopandas data frame of a point that is within this county
have used plotly to demonstrate visually the data
have used your code fragment gpd.sjoin(gdf, gd_polygon, how="inner", op="within") to do spatial join and it correctly joins point to polygon
import requests, json
import geopandas as gpd
import plotly.express as px
import shapely.geometry
# fmt: off
# get a polygon and construct a point
res = requests.get("https://opendata.arcgis.com/datasets/69dc11c7386943b4ad8893c45648b1e1_0.geojson")
gd_polygon = gpd.GeoDataFrame.from_features(res.json()).loc[lambda d: d["LAD20NM"].str.contains("Hereford")]
gdf = gpd.GeoDataFrame(geometry=gd_polygon.loc[:,["LONG","LAT"]].apply(shapely.geometry.Point, axis=1)).reset_index(drop=True)
# fmt: on
# plot to show point is within polygon
px.scatter_mapbox(gd_polygon, lon="LONG", lat="LAT").update_traces(
    name="gd_polygon"
).add_traces(
    # gdf holds the point constructed above
    px.scatter_mapbox(gdf, lat=gdf.geometry.y, lon=gdf.geometry.x)
    .update_traces(name="gdf", marker_color="red")
    .data
).update_traces(
    showlegend=True
).update_layout(
    mapbox={
        "style": "carto-positron",
        "layers": [
            {"source": json.loads(gd_polygon.geometry.to_json()), "type": "line"}
        ],
    }
).show()
# spatial join, all good :-)
gpd.sjoin(gdf, gd_polygon, how="inner", op="within")
output
spatial join has worked, point is within polygon
| | geometry | index_right | OBJECTID | LAD20CD | LAD20NM | LAD20NMW | BNG_E | BNG_N | LONG | LAT | Shape__Area | Shape__Length |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | POINT (-2.73931 52.081539) | 18 | 19 | E06000019 | Herefordshire, County of | | 349434 | 242834 | -2.73931 | 52.0815 | 2.18054e+09 | 285427 |

Combining shapefiles in Python / GeoPandas

I have three polygon shapefiles which overlap each other. Let's call them:
file_one.shp (polygon Name is 1)
file_two.shp (polygon Name is 2)
file_three.shp (polygon Name is 3)
I want to combine them and keep the values like this.
How can I achieve the result (As shown in the figure) in Python, please?
Thanks!
If you simply want to create one shapefile from the files you've mentioned, you can try the following code (I assume the shapefiles have the same columns).
import pandas as pd
import geopandas as gpd
gdf1 = gpd.read_file('file_one.shp')
gdf2 = gpd.read_file('file_two.shp')
gdf3 = gpd.read_file('file_three.shp')
gdf = gpd.GeoDataFrame(pd.concat([gdf1, gdf2, gdf3]))
First, let's generate some data for demonstration:
import geopandas as gpd
from shapely.geometry import Point
shp1 = gpd.GeoDataFrame({'geometry': [Point(1, 1).buffer(3)], 'name': ['Shape 1']})
shp2 = gpd.GeoDataFrame({'geometry': [Point(1, 1).buffer(2)], 'name': ['Shape 2']})
shp3 = gpd.GeoDataFrame({'geometry': [Point(1, 1).buffer(1)], 'name': ['Shape 3']})
Now take the symmetric difference for all but the smallest shape, which can be left as is:
diffs = []
gdfs = [shp1, shp2, shp3]
for idx, gdf in enumerate(gdfs):
    if idx < 2:
        diffs.append(gdf.symmetric_difference(gdfs[idx+1]).iloc[0])
diffs.append(shp3.iloc[0].geometry)
There you go, now you have the desired shapes as a list in diffs. If you would like to combine them to one GeoDataFrame, just do as follows:
all_shapes = gpd.GeoDataFrame(geometry=diffs)
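Because the demo shapes are nested, the symmetric difference of neighbouring shapes is the same as a plain difference. A minimal sketch of that variant, which also keeps a name for each piece; the `name` column and the `pieces` list are illustrative names, not part of the answer above.

```python
import geopandas as gpd
from shapely.geometry import Point

# Nested demo shapes, largest first
shapes = [Point(1, 1).buffer(3), Point(1, 1).buffer(2), Point(1, 1).buffer(1)]
names = ["Shape 1", "Shape 2", "Shape 3"]

# Subtract the next smaller shape from each shape; keep the smallest whole
pieces = [outer.difference(inner) for outer, inner in zip(shapes, shapes[1:])]
pieces.append(shapes[-1])

all_shapes = gpd.GeoDataFrame({"name": names}, geometry=pieces)
print(all_shapes.area)
```

The pieces no longer overlap, so their areas add up to the area of the largest shape. For shapes that overlap only partially, this shortcut does not apply.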

Creating a shape file from a bounding box coordinates list

There are already a few existing questions about this topic, but unfortunately I did not find anything that could fix my problem.
I have a point with Lat, Long coordinates, e.g. Lat = 10 and Long = 10. I want to create a shapefile of a 0.5 degree bounding box around this point, so the bounding box should be as follows:
minimum Long= 9.75
minimum Lat = 9.75
maximum Long = 10.25
maximum Lat = 10.25
Does anyone know how to do that in Python?
Here's one way to do it using shapely, geopandas and pandas:
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon
def bbox(lat,lng, margin):
    return Polygon([[lng-margin, lat-margin],[lng-margin, lat+margin],
                    [lng+margin,lat+margin],[lng+margin,lat-margin]])
gpd.GeoDataFrame(pd.DataFrame(['p1'], columns = ['geom']),
                 crs = 'EPSG:4326',
                 geometry = [bbox(10,10, 0.25)]).to_file('poly.shp')
I want to enhance Bruno Carballo's code. I hope it will be easier for you:
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon
# function to return polygon
def bbox(long0, lat0, lat1, long1):
    return Polygon([[long0, lat0],
                    [long1,lat0],
                    [long1,lat1],
                    [long0, lat1]])
test = bbox(9.75, 9.75, 10.25, 10.25)
gpd.GeoDataFrame(pd.DataFrame(['p1'], columns = ['geom']),
                 crs = 'EPSG:4326',
                 geometry = [test]).to_file('poly.shp')
And here is an implementation of Bruno Carballo's answer that applies it to an entire DataFrame:
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon
# function to return polygon
def bbox(vec):
    long0, lat0, lat1, long1 = vec[0], vec[1], vec[2], vec[3]
    return Polygon([[long0, lat0],
                    [long0,lat1],
                    [long1,lat1],
                    [long1, lat0]])
def extentPolygon(df):
    return(
        pd.DataFrame({'geometry' : df[['ext_min_x','ext_min_y','ext_max_y','ext_max_x']].apply(bbox, axis = 1)})
    )
df = pd.DataFrame({'ext_min_x' : [9.75, 9.78], 'ext_max_x' : [10.25, 10.28],
                   'ext_min_y' : [9.75, 9.78], 'ext_max_y' : [10.25, 10.28]})
df = extentPolygon(df)
After which you can easily turn the resulting DataFrame into a GeoDataFrame:
df_gp = gpd.GeoDataFrame(df)
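Shapely also provides a `box` helper that builds the rectangle directly from its bounds. A minimal sketch, where `bbox_around` and its parameters are hypothetical names chosen for this illustration:

```python
from shapely.geometry import box

def bbox_around(lat, lng, size):
    """Square of side `size` (degrees) centred on (lat, lng)."""
    half = size / 2
    # box takes (minx, miny, maxx, maxy), i.e. longitude before latitude
    return box(lng - half, lat - half, lng + half, lat + half)

square = bbox_around(10, 10, 0.5)
print(square.bounds)  # (9.75, 9.75, 10.25, 10.25)
```

The returned polygon can be placed in a GeoDataFrame and written with `to_file` in the same way as in the answers above.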
