Python: how to check if shape areas contain points?

Python: how to check if shape areas contain points? - python

I have a shapefile of Singapore that I visualize in this way:
import shapefile
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection
### Plot shapefile
sf = shapefile.Reader("myShape.shp")
recs = sf.records()
shapes = sf.shapes()
Nshp = len(shapes)
cns = []
for nshp in xrange(Nshp):
cns.append(recs[nshp][1])
cns = array(cns)
cm = get_cmap('Dark2')
cccol = cm(1.*arange(Nshp)/Nshp)
# -- plot --
fig,ax = plt.subplots(figsize=(20,10))
for nshp in xrange(Nshp):
ptchs = []
pts = array(shapes[nshp].points)
prt = shapes[nshp].parts
par = list(prt) + [pts.shape[0]]
for pij in xrange(len(prt)):
ptchs.append(Polygon(pts[par[pij]:par[pij+1]]))
ax.add_collection(PatchCollection(ptchs,facecolor=cccol[nshp,:],edgecolor='k', linewidths=.1))
ax.set_xlim(103.6,104.1)
ax.set_ylim(1.15,1.48)
Now I have a list of points with coordinates in a dataframe and I want to check in which region of the shapefile the points are.
mypoints
lon lat
0 103.619740 1.280485
1 103.622632 1.268944
2 103.622632 1.274714
3 103.622632 1.277600
4 103.622632 1.280485
In particular from the shapefile I can extract information of the specific area such as the name. In fact, in recs I have this information. For instance:
recs[0]:
[42,
1,
'STRAITS VIEW',
'SVSZ01',
'Y',
'STRAITS VIEW',
'SV',
'CENTRAL REGION',
'CR',
'21451799CA1AB6EF',
datetime.date(2016, 5, 11),
30832.9017,
28194.0843,
5277.76082729,
1127297.23737]
In this case STRAITS VIEW is the name of the area. At the the end I would like to have a dataframe like:
mypoints
lon lat name
0 103.619740 1.280485 STRAITS VIEW
1 103.622632 1.268944 STRAITS VIEW
2 103.622632 1.274714 DOWNTOWN
3 103.622632 1.277600 BEDOK
4 103.622632 1.280485 CHANGI

To check if a coordinate lies inside some Polygon you could use the Intersects method from GDAL. This method compares two OGRGeometry objects and return a boolean value indicating whether they intercept or not.
With this you can iterate over your points, and for each one test if it Intersects() with all your Polygon areas. It should look something like:
for point_geom in your_points:
#iterate over your area polygons and check
for area_geom in your_areas:
if area_geom.Intersects(point_geom):
#then you have a match, save or do as you want
Note: In case your points are not an OGRGeometry (your Polygons most probably are as you are reading them from a Shapefile), you can create such Geometry as suggested here:
from osgeo import ogr
point = ogr.Geometry(ogr.wkbPoint)
point.AddPoint(1198054.34, 648493.09)
and in the case of Polygons:
ring = ogr.Geometry(ogr.wkbLinearRing)
ring.AddPoint(1179091.1646903288, 712782.8838459781)
ring.AddPoint(1161053.0218226474, 667456.2684348812)
#etc...add all your points, in a loop would be better
# Create polygon
poly = ogr.Geometry(ogr.wkbPolygon)
poly.AddGeometry(ring)

Related

Wrong points buffer using geopandas

Good evening,
I'm working on a product to detect local events (strike) within subscription areas.
The yellow polygons should be 40KM (left) and 50KM (right) circles around central red points. Green points are my strikes that should be detected in my process.
It appears that my current use of buffer() does not produce 40/50 Km buffer radius as expected and then my process in missing my two events .
My code:
# Create my two events to detect
df_strike = pd.DataFrame(
{ 'Latitude': [27.0779, 31.9974],
'Longitude': [51.5144, 38.7078]})
gdf_events = gpd.GeoDataFrame(df_strike, geometry=gpd.points_from_xy(df_strike.Longitude, df_strike.Latitude),crs = {'init':'epsg:4326'})
# Get location to create buffer
SUB_LOCATION = pd.DataFrame(
{ 'perimeter_id': [1370, 13858],
'distance' : [40.0, 50.0],
'custom_lat': [31.6661, 26.6500],
'custom_lon': [38.6635, 51.5700]})
gdf_locations = gpd.GeoDataFrame(SUB_LOCATION, geometry=gpd.points_from_xy(SUB_LOCATION.custom_lon, SUB_LOCATION.custom_lat), crs = {'init':'epsg:4326'})
# Now reproject to a crs using meters
gdf_locations = gdf_locations.to_crs({'init':'epsg:3857'})
gdf_events = gdf_events.to_crs({'init':'epsg:3857'})
# Create buffer using distance (in meters) from locations
gdf_locations['geometry'] = gdf_locations['geometry'].buffer(gdf_locations['distance']*1000)
# Matching events within buffer
matching_entln = pd.DataFrame(gpd.sjoin(gdf_locations, gdf_events, how='inner'))
But my result is an empty dataframe and should not be. If I compute distance between events and locations (distance between red and green points):
pnt1 = Point(27.0779, 51.5144)
pnt2 = Point(26.65, 51.57)
points_df = gpd.GeoDataFrame({'geometry': [pnt1, pnt2]}, crs='EPSG:4326')
points_df = points_df.to_crs('EPSG:3857')
points_df2 = points_df.shift() #We shift the dataframe by 1 to align pnt1 with pnt2
points_df.distance(points_df2)
Returns: 48662.078723 meters
and
pnt1 = Point(31.9974, 38.7078)
pnt2 = Point(31.6661, 38.6635)
points_df = gpd.GeoDataFrame({'geometry': [pnt1, pnt2]}, crs='EPSG:4326')
points_df = points_df.to_crs('EPSG:3857')
points_df2 = points_df.shift() #We shift the dataframe by 1 to align pnt1 with pnt2
points_df.distance(points_df2)
Returns: 37417.343796 meters
Then I was expecting to have this result :
>>> pd.DataFrame(gpd.sjoin(gdf_locations, gdf_events, how='inner'))
subscriber_id perimeter_id distance custom_lat custom_lon geometry index_right Latitude Longitude
0 19664 1370 40.0 31.6661 38.6635 POLYGON ((2230301.324 3642618.584, 2230089.452... 1 31.9974 38.7078
1 91201 13858 50.0 26.6500 51.5700 POLYGON ((3684499.890 3347425.378, 3684235.050... 0 27.0779 51.5144
I think my buffer is at ~47KM and ~38KM instead of 50KM and 40KM as expected. Am I missing something here which could explain that empty result ?

Certain computations with geodataframe's methods that involves distances, namely, .distance(), .buffer() in this particular case, are based on Euclidean geometry and map projection coordinate systems. Their results are not reliable, to always get the correct results one should avoid using them and use direct computation with geographic coordinates instead. Doing so with proper module/library, you will get great-circle arc distances instead of errorneous euclidean distances. Thus avoid mysterious errors.
Here I present the runnable code that show how to proceed along the line that I proposed:
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon
import cartopy.crs as ccrs
import cartopy
import matplotlib.pyplot as plt
import numpy as np
from pyproj import Geod
# Create my two events to detect
df_strike = pd.DataFrame(
{ 'Latitude': [27.0779, 31.9974],
'Longitude': [51.5144, 38.7078]})
gdf_events = gpd.GeoDataFrame(df_strike, geometry=gpd.points_from_xy(df_strike.Longitude, df_strike.Latitude),crs = {'init':'epsg:4326'})
# Get location to create buffer
SUB_LOCATION = pd.DataFrame(
{ 'perimeter_id': [1370, 13858],
'distance' : [40.0, 50.0],
'custom_lat': [31.6661, 26.6500],
'custom_lon': [38.6635, 51.5700]})
gdf_locations = gpd.GeoDataFrame(SUB_LOCATION, geometry=gpd.points_from_xy(SUB_LOCATION.custom_lon, SUB_LOCATION.custom_lat), crs = {'init':'epsg:4326'})
# Begin: My code----------------
def point_buffer(lon, lat, radius_m):
# Use this instead of `.buffer()` provided by geodataframe
# Adapted from:
# https://stackoverflow.com/questions/31492220/how-to-plot-a-tissot-with-cartopy-and-matplotlib
geod = Geod(ellps='WGS84')
num_vtxs = 64
lons, lats, _ = geod.fwd(np.repeat(lon, num_vtxs),
np.repeat(lat, num_vtxs),
np.linspace(360, 0, num_vtxs),
np.repeat(radius_m, num_vtxs),
radians=False
)
return Polygon(zip(lons, lats))
# Get location to create buffer
# Create buffer geometries from points' coordinates and distances using ...
# special function `point_buffer()` defined above
gdf_locations['geometry'] = gdf_locations.apply(lambda row : point_buffer(row.custom_lon, row.custom_lat, 1000*row.distance), axis=1)
# Convert CRS to Mercator (epsg:3395), it will match `ccrs.Mercator()`
# Do not use Web_Mercator (epsg:3857), it is crude approx of 3395
gdf_locations = gdf_locations.to_crs({'init':'epsg:3395'})
gdf_events = gdf_events.to_crs({'init':'epsg:3395'})
# Matching events within buffer
matching_entln = pd.DataFrame(gpd.sjoin(gdf_locations, gdf_events, how='inner'))
# Visualization
# Use cartopy for best result
fig = plt.figure(figsize=(9,8))
ax = fig.add_subplot(projection=ccrs.Mercator())
gdf_locations.plot(color="green", ax=ax, alpha=0.4)
gdf_events.plot(color="red", ax=ax, alpha=0.9, zorder=23)
ax.coastlines(lw=0.3, color="gray")
ax.add_feature(cartopy.feature.LAND)
ax.add_feature(cartopy.feature.OCEAN)
ax.gridlines(crs=ccrs.PlateCarree(), draw_labels=True)
# Other helpers
# Horiz/vert lines are plotted to mark the circles' centers
ax.hlines([31.6661,26.6500], 30, 60, transform=ccrs.PlateCarree(), lw=0.1)
ax.vlines([38.6635, 51.5700], 20, 35, transform=ccrs.PlateCarree(), lw=0.1)
ax.set_extent([35, 55, 25, 33], crs=ccrs.PlateCarree())
Spatial joining:
# Matching events within buffer
matching_entln = pd.DataFrame(gpd.sjoin(gdf_locations, gdf_events, how='inner'))
matching_entln[["perimeter_id", "distance", "index_right", "Latitude", "Longitude"]] #custom_lat custom_lon
Compute distances between points for checking
This checks the result of the spatial join if computed distances are less than the buffered distances.
# Use greatcircle arc length
geod = Geod(ellps='WGS84')
# centers of buffered-circles
from_lon1, from_lon2 = [38.6635, 51.5700]
from_lat1, from_lat2 = [31.6661, 26.6500]
# event locations
to_lon1, to_lon2= [51.5144, 38.7078]
to_lat1, to_lat2 = [27.0779, 31.9974]
_,_, dist_m = geod.inv(from_lon1, from_lat1, to_lon2, to_lat2, radians=False)
print(dist_m) #smaller than 40 km == inside
# Get: 36974.419811328786 m.
_,_, dist_m = geod.inv(from_lon2, from_lat2, to_lon1, to_lat1, radians=False)
print(dist_m) #smaller than 50 km == inside
# Get: 47732.76744655724 m.
My notes
Serious geographic computation should be done directly with geodetic computation without the use of map projection of any kind.
Map projection is used when you need graphic visualization. But correct geographic values that are computed/transformed to map projection CRS correctly are expected.
Computation with map projection (grid) coordinate beyond its allowable limits (and get bad results) is often happen with inexperienced users.
Computation involving map/grid position/values using euclidean geometry should be performed within small extent of projection areas that all kinds of map distortions is very low.

Intersection between 2 Geodataframe

I am doing some work on the Geopanda library, I have a shapefile with polygons and data on a excel sheet that I transform into points. I want to intersect the two DataFrames and export it to a file. I use also on both projections (WGS84) so that I can compare them.
There should be at least some points that intersects the polygons.
My intersect GeoSeries does not give me any points that fit into the polygon, but I don't see why...
I checked if the unit of the shapefile was really Kilometer and not somthing else. I am not proficient into GeoPlot so I can't really make sure what the GeoDataFrame look like.
f = pd.read_excel(io = 'C:\\Users\\peilj\\meteo_sites.xlsx')
#Converting panda dataframe into a GeoDataFrame with CRS projection
geometry = [Point(xy) for xy in zip(df.geoBreite, df.geoLaenge)]
df = df.drop(['geoBreite', 'geoLaenge'], axis=1)
crs = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
gdf = GeoDataFrame(df, crs=crs, geometry=geometry)
#Reading shapefile and creating buffer
gdfBuffer = geopandas.read_file(filename = 'C:\\Users\\peilj\\lkr_vallanUTM.shp')
gdfBuffer = gdfBuffer.buffer(100) #When the unit is kilometer
#Converting positions long/lat into shapely object
gdfBuffer = gdfBuffer.to_crs("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
#Intersection coordonates from polygon Buffer and points of stations
gdf['intersection'] = gdf.geometry.intersects(gdfBuffer)
#Problem: DOES NOT FIND ANY POINTS INSIDE STATIONS !!!!!!!
#Giving CRS projection to the intersect GeoDataframe
gdf_final = gdf.to_crs("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
gdf_final['intersection'] = gdf_final['intersection'].astype(int) #Shapefile does not accept bool
#Exporting to a file
gdf_final.to_file(driver='ESRI Shapefile', filename=r'C:\\GIS\\dwd_stationen.shp
The files needed:
https://drive.google.com/open?id=11x55aNxPOdJVKDzRWLqrI3S_ExwbqCE9

two things:
You need to swap geoBreite and geoLaenge when creating the points to:
geometry = [Point(xy) for xy in zip(df.geoLaenge, df.geoBreite)]
This is because shapely follows the x, y logic, not lat, lon.
As for checking the intersection, you could do as follows:
gdf['inside'] = gdf['geometry'].apply(lambda shp: shp.intersects(gdfBuffer.dissolve('LAND').iloc[0]['geometry']))
which detects six stations inside the shape file:
gdf['inside'].sum()
ouputs:
6
So along with some other minor fixes we get:
import geopandas as gpd
from shapely.geometry import Point
df = pd.read_excel(r'C:\Users\peilj\meteo_sites.xlsx')
geometry = [Point(xy) for xy in zip(df.geoLaenge, df.geoBreite)]
crs = {'init': 'epsg:4326'}
gdf = gpd.GeoDataFrame(df, crs=crs, geometry=geometry)
gdfBuffer = gpd.read_file(filename = r'C:\Users\peilj\lkr_vallanUTM.shp')
gdfBuffer['goemetry'] = gdfBuffer['geometry'].buffer(100)
gdfBuffer = gdfBuffer.to_crs(crs)
gdf['inside'] = gdf['geometry'].apply(lambda shp: shp.intersects(gdfBuffer.dissolve('LAND').iloc[0]['geometry']))

using python to project lat lon geometry to utm

I have a dataframe with earthquake data called eq that has columns listing latitude and longitude. using geopandas I created a point column with the following:
from geopandas import GeoSeries, GeoDataFrame
from shapely.geometry import Point
s = GeoSeries([Point(x,y) for x, y in zip(df['longitude'], df['latitude'])])
eq['geometry'] = s
eq.crs = {'init': 'epsg:4326', 'no_defs': True}
eq
Now I have a geometry column with lat lon coordinates but I want to change the projection to UTM. Can anyone help with the transformation?

Latitude/longitude aren't really a projection, but sort of a default "unprojection". See this page for more details, but it probably means your data uses WGS84 or epsg:4326.
Let's build a dataset and, before we do any reprojection, we'll define the crs as epsg:4326
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point
df = pd.DataFrame({'id': [1, 2, 3], 'population' : [2, 3, 10], 'longitude': [-80.2, -80.11, -81.0], 'latitude': [11.1, 11.1345, 11.2]})
s = gpd.GeoSeries([Point(x,y) for x, y in zip(df['longitude'], df['latitude'])])
geo_df = gpd.GeoDataFrame(df[['id', 'population']], geometry=s)
# Define crs for our geodataframe:
geo_df.crs = {'init': 'epsg:4326'}
I'm not sure what you mean by "UTM projection". From the wikipedia page I see there are 60 different UTM projections depending on the area of the world. You can find the appropriate epsg code online, but I'll just give you an example with a random epsgcode. This is the one for zone 33N for example
How do you do the reprojection? You can easily get this info from the geopandas docs on projection. It's just one line:
geo_df = geo_df.to_crs({'init': 'epsg:3395'})
and the geometry isn't coded as latitude/longitude anymore:
id population geometry
0 1 2 POINT (-8927823.161620541 1235228.11420853)
1 2 3 POINT (-8917804.407449147 1239116.84994171)
2 3 10 POINT (-9016878.754255159 1246501.097746004)

Given latitude/longitude say if the coordinate is within continental US or not

I want to check if a particular latitude/longitude is within continental US or not. I don't want to use Online APIs and I'm using Python.
I downloaded this shapefile
from shapely.geometry import MultiPoint, Point, Polygon
import shapefile
sf = shapefile.Reader("cb_2015_us_nation_20m")
shapes = sf.shapes()
fields = sf.fields
records = sf.records()
points = shapes[0].points
poly = Polygon(points)
lon = -112
lat = 48
point = Point(-112, 48)
poly.contains(point)
#should return True because it is in continental US but returns False
The sample lon, lat is within US boundary but poly.contains returns False.
I'm not sure what the problem is and how to solve the issue so that I can test if a point is within continental US.

I ended up checking if lat/lon was in every state instead of check in continental U.S., if a point is in one of the states, then it is in continental U.S..
from shapely.geometry import MultiPoint, Point, Polygon
import shapefile
#return a polygon for each state in a dictionary
def get_us_border_polygon():
sf = shapefile.Reader("./data/states/cb_2015_us_state_20m")
shapes = sf.shapes()
#shapes[i].points
fields = sf.fields
records = sf.records()
state_polygons = {}
for i, record in enumerate(records):
state = record[5]
points = shapes[i].points
poly = Polygon(points)
state_polygons[state] = poly
return state_polygons
#us border
state_polygons = get_us_border_polygon()
#check if in one of the states then True, else False
def in_us(lat, lon):
p = Point(lon, lat)
for state, poly in state_polygons.iteritems():
if poly.contains(p):
return state
return None

I ran your code and plotted the polygon. It looked like this:
If you had run it with this code:
import geopandas as gpd
import matplotlib.pyplot as plt
shapefile = gpd.read_file("path/to/shapes.shp")
shapefile.plot()
plt.show()
# credit to https://stackoverflow.com/a/59688817/1917407
you would have seen this:
So, 1, you're not looking at the CONUS, and 2, your plot is borked. Your code works though and will return True with the geopandas plot.

Plotting netCDF data with Python: How to change grid?

I'm an new one in python and plotting data with Matplotlib. I really need help and thank you in advance for the answers.
So, I have a netCDF file with v-component of wind data. Grid coordinates: points=9600 (240x40)
lon : 0 to 358.5 by 1.5 degrees_east circular
lat : 88.5 to 30 by -1.5 degrees_north
My code is:
import numpy as np
import matplotlib
matplotlib.use('Agg')
from netCDF4 import Dataset
from matplotlib.mlab import griddata
from matplotlib import pyplot as plt
from mpl_toolkits.basemap import Basemap
#read data from NETcdf file ".nc"
my_file = '/home/Era-Interim/NH-EraInt-1979.nc'
fh = Dataset(my_file, mode='r')
lons = fh.variables['lon'][:]
lats = fh.variables['lat'][:]
V = fh.variables['V'][:]
V_units = fh.variables['V'].units
fh.close()
# create figure
fig = plt.figure(figsize=(20,20))
# create a map
m = Basemap(projection='nplaea',boundinglat=30,lon_0=10,resolution='l',round=True)
#draw parallels, meridians, coastlines, countries, mapboundary
m.drawcoastlines(linewidth=0.5)
m.drawcountries(linewidth=0.5)
#m.drawmapboundary(linewidth=2)
m.drawparallels(np.arange(30,90,20), labels=[1,1,0,0]) #paral in 10 degree, right, left
m.drawmeridians(np.arange(0,360,30), labels=[1,1,1,1]) #merid in 10 degree, bottom
#Plot the data on top of the map
lon,lat = np.meshgrid(lons,lats)
x,y = m(lon,lat)
cs = m.pcolor(x,y,np.squeeze(V),cmap=plt.cm.RdBu_r)
plt.title("", fontsize=25, verticalalignment='baseline')
plt.savefig("/home/Era-Interim/1.png")
As a result, I received a map (you can find in my dropbox folder) https://www.dropbox.com/sh/nvy8wcodk9jtat0/AAC-omkPP8_7uINSSXbzImeja?dl=0
On the map, there are white pixels between 358.5 and 0 (360) lon, because I have no data between 358.5 and 0 (360) lon.
The question is: how can I change the size of the grid, regrid it, interpolate data, or something else in order to not have this white sector?

I have found a solution. At the beginning of the script, you must add
from mpl_toolkits.basemap import Basemap, addcyclic
and further
datain, lonsin = addcyclic(np.squeeze(Q), lons)
lons, Q = m.shiftdata(lonsin, datain = np.squeeze(Q), lon_0=180.)
print lons
lon, lat = np.meshgrid(lons, lats)
x,y = m(lon, lat)
cs = m.pcolor(x,y,datain,cmap=plt.cm.RdBu_r)
The difference can be seen in the figures (I still can not post images).
https://www.dropbox.com/sh/nvy8wcodk9jtat0/AAC-omkPP8_7uINSSXbzImeja?dl=0

I think in this case some kind of interpolation techniques can be applied.
Check this out. There was similar problem.
Hope it is useful.

The simple answer is 360 degrees is 0 degrees, so you can copy the 0 degrees data and it should look right. I may be interpreting this wrong though, as I believe that the data is representing the pressure levels at each of the points, not between the two points (i.e. at zero degrees, not between zero degrees and 1.5 degrees).
My interpretation means that, yes, you don't have data between 358.5 and 0, but you also don't have data between 357 and 358.5. This seems more likely than just skipping an area. This would mean that the data from 358.5 should be touching the data from 0 as it is just as far away as 0 is from 1.5 which is touching.
Copying the last bit would grant you the ability to change your m.pcolor call to an imshow call (as in Roman Dryndik's link) and use interpolation to smooth out the graph.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python: how to check if shape areas contain points? - python

Related

Wrong points buffer using geopandas

Intersection between 2 Geodataframe

using python to project lat lon geometry to utm

Given latitude/longitude say if the coordinate is within continental US or not

Plotting netCDF data with Python: How to change grid?

Categories

Resources