Group Polygons after splitting MultiPolygon with LineString - python

I wish to split a MultiPolygon (representing a country with islands) via a LineString, thereby splitting the country in two.
from shapely.ops import split
collection_of_polygons = split(country, line)
This results in a set of Polygons in a GeometryCollection object. How would you group the result into two MultiPolygon objects, each containing the Polygons for its respective half?
UPDATE
The question "Determine the "left" and "right" side of a split shapely geometry" offers a good solution: a point is taken from each Polygon in the result to see whether it forms a clockwise or anti-clockwise LineString when combined with the splitting LineString. But I am thinking of using the centroid of each polygon instead, since it should not lie on the splitting line (for strongly concave pieces, representative_point() is the safer choice because it is guaranteed to lie within the polygon).
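As a rough illustration of the clockwise / anti-clockwise test described above, here is a minimal sketch that works on plain coordinate tuples rather than shapely objects. The names signed_area and side_of_line are made up for this example. It assumes the splitting line extends beyond the geometry, so that the line plus the test point forms a simple ring. With shapely you would pass list(line.coords) and the x/y of a point taken from each polygon.

```python
def signed_area(ring):
    """Shoelace formula: positive for counter-clockwise rings, negative for clockwise."""
    area = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return area / 2.0


def side_of_line(line_coords, point):
    """Classify a point as 'left' or 'right' of a directed polyline.

    The polyline and the point are joined into a ring; a counter-clockwise
    ring means the point lies to the left of the direction of travel.
    A point exactly on the line gives zero area and is reported as 'right'.
    """
    ring = list(line_coords) + [point]
    return "left" if signed_area(ring) > 0 else "right"


# A line running "north" along the y-axis: negative x is on its left.
print(side_of_line([(0, 0), (0, 10)], (-1, 5)))  # left
print(side_of_line([(0, 0), (0, 10)], (1, 5)))   # right
```

Polygons whose test point comes out as "left" would go into one MultiPolygon and the rest into the other.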

So first you have your country's MultiPolygon, and then your split polygons:
from shapely.ops import split
from shapely.geometry import Point, LineString, Polygon, MultiPolygon
# country (a MultiPolygon) and line (a LineString) are assumed to be defined somewhere above
collection_of_polygons = split(country, line)
Then you make a polygon from the bounding box of your country's MultiPolygon, and perform the same split with the line:
bounds = country.bounds
p1 = Point(bounds[0],bounds[-1])
p2 = Point(bounds[0],bounds[1])
p3 = Point(bounds[-2],bounds[1])
p4 = Point(bounds[-2],bounds[-1])
point_list = [p1, p2, p3, p4, p1]
bounds_poly = Polygon(point_list)
boundaries = split(bounds_poly, line)
Make two lists, then iterate over the split country polygons and check whether each one is within the first polygon of the split bounding box:
group1, group2 = [], []
# .geoms gives access to the members of a GeometryCollection
first_half = boundaries.geoms[0]
for x in collection_of_polygons.geoms:
    if x.within(first_half):
        group1.append(x)
    else:
        group2.append(x)
Finally, assign the two lists to two different MultiPolygons:
group_multi1 = MultiPolygon(group1)
group_multi2 = MultiPolygon(group2)

Related

How to fill holes in Multi-polygons created when dissolving geodataframe with geopandas?

I'm aiming to plot the boundaries of clusters of MSOAs (contiguous geographical units in the UK). To do so, I've downloaded a shapefile of MSOA boundaries from here. I then add a column of cluster labels and dissolve using geopandas.
df.dissolve(by='label', aggfunc='sum')
When I use Folium to plot there are multiple inner holes as seen in the attached image. How do I remove these?
import folium
# create the map
m = folium.Map([54.5, -3], zoom_start=6.8, tiles='cartodbpositron')
# plot the boundaries
Boundaries = folium.GeoJson(
    df,
    name='Boundaries',
    style_function=lambda x: {
        'color': 'black',
        'weight': 3,
        'fillOpacity': 0
    }).add_to(m)
m
In case anyone encounters the same problem: I found a website called mapshaper where you can upload, simplify and export shapefiles. It managed to simplify my boundaries to the required form.
This will hopefully help you organize your polygons using just geopandas. You can overwrite the geometry using the functions below. Extra handling is used to preserve or reduce MultiPolygons. I would imagine that something very similar happens in mapshaper, but this way everything stays in geopandas.
import geopandas as gpd
from shapely.geometry import MultiPolygon, Polygon
def remove_interiors(poly):
    """
    Close polygon holes by limitation to the exterior ring.
    Arguments
    ---------
    poly: shapely.geometry.Polygon
        Input shapely Polygon
    Returns
    ---------
    Polygon without any interior holes
    """
    if poly.interiors:
        return Polygon(list(poly.exterior.coords))
    else:
        return poly
def pop_largest(gs):
    """
    Pop the largest polygon off of a GeoSeries
    Arguments
    ---------
    gs: geopandas.GeoSeries
        Geoseries of Polygon or MultiPolygon objects
    Returns
    ---------
    Largest Polygon in a Geoseries
    """
    geoms = [g.area for g in gs]
    # pop from the GeoSeries itself so that the polygon (not its area) is returned
    return gs.pop(geoms.index(max(geoms)))
def close_holes(geom):
    """
    Remove holes in a polygon geometry
    Arguments
    ---------
    geom: shapely.geometry.Polygon or shapely.geometry.MultiPolygon
        Geometry whose holes should be closed
    Returns
    ---------
    Polygon or MultiPolygon without interior holes
    """
    if isinstance(geom, MultiPolygon):
        ser = gpd.GeoSeries([remove_interiors(g) for g in geom.geoms])
        big = pop_largest(ser)
        # keep only the parts that do not sit inside the largest polygon
        outers = ser.loc[~ser.within(big)].tolist()
        if outers:
            return MultiPolygon([big] + outers)
        return big
    if isinstance(geom, Polygon):
        return remove_interiors(geom)
df.geometry = df.geometry.apply(close_holes)
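Since Folium ultimately draws GeoJSON, the same idea can also be sketched directly on GeoJSON-style dictionaries, where a Polygon's "coordinates" list holds the exterior ring first and any holes after it. This is only an illustration with a made-up helper name (drop_holes). Unlike close_holes above, it does not remove smaller polygons that sit inside the largest one.

```python
def drop_holes(geometry):
    """Return a copy of a GeoJSON-style geometry dict without interior rings.

    Polygon coordinates are [exterior, hole1, hole2, ...];
    MultiPolygon coordinates are a list of such ring lists.
    Other geometry types are returned unchanged.
    """
    if geometry["type"] == "Polygon":
        return {"type": "Polygon", "coordinates": geometry["coordinates"][:1]}
    if geometry["type"] == "MultiPolygon":
        return {
            "type": "MultiPolygon",
            "coordinates": [rings[:1] for rings in geometry["coordinates"]],
        }
    return geometry


exterior = [[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]
hole = [[4, 4], [6, 4], [6, 6], [4, 6], [4, 4]]
with_hole = {"type": "Polygon", "coordinates": [exterior, hole]}

print(len(drop_holes(with_hole)["coordinates"]))  # 1
```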

Find shortest linestring between two points while avoiding n-sided polygon

I am trying to find the shortest linestring between two points, with the constraint that an n-sided polygon may lie directly between them. I am not allowed to cross through the polygon; I may only pass along its edges.
E.g.:
start = (2,0)
end = (0,1)
poly = [(1,0),(1,1),(1,2),(2,1)]
Passing these through the function should output 2.41.
So far I have:
from shapely.geometry import LineString, Polygon, Point
def shortest_linestring(start, end, poly):
    poly = Polygon(poly)
    p1 = Point(start)
    p2 = Point(end)
but I am completely stumped as to what to do next. Any hint would be appreciated.
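One common way to approach this kind of problem is a visibility graph. The nodes are the start, the end and the polygon vertices. Two nodes are connected if the straight segment between them never enters the polygon's interior, and the shortest route is then found with Dijkstra's algorithm. The sketch below is a hint rather than a finished solution. It uses plain tuples instead of shapely, assumes a single simple polygon with start and end outside it, and returns only the path length. The helper names cross, strictly_inside and blocked are made up for this example.

```python
import heapq
import math

EPS = 1e-9


def cross(o, a, b):
    """2D cross product of the vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def strictly_inside(pt, poly):
    """True if pt is in the polygon's interior (boundary points count as outside)."""
    inside = False
    for i in range(len(poly)):
        a, b = poly[i], poly[(i + 1) % len(poly)]
        in_box = (min(a[0], b[0]) - EPS <= pt[0] <= max(a[0], b[0]) + EPS
                  and min(a[1], b[1]) - EPS <= pt[1] <= max(a[1], b[1]) + EPS)
        if abs(cross(a, b, pt)) < EPS and in_box:
            return False  # on an edge
        if (a[1] > pt[1]) != (b[1] > pt[1]):  # even-odd ray casting
            x_at = a[0] + (pt[1] - a[1]) * (b[0] - a[0]) / (b[1] - a[1])
            if x_at > pt[0]:
                inside = not inside
    return inside


def blocked(p, q, poly):
    """True if the segment p-q passes through the polygon's interior."""
    d = (q[0] - p[0], q[1] - p[1])
    ts = {0.0, 1.0}
    for i in range(len(poly)):
        a, b = poly[i], poly[(i + 1) % len(poly)]
        e = (b[0] - a[0], b[1] - a[1])
        denom = d[0] * e[1] - d[1] * e[0]
        if abs(denom) < EPS:
            continue  # parallel edge: running along it is allowed
        w = (a[0] - p[0], a[1] - p[1])
        t = (w[0] * e[1] - w[1] * e[0]) / denom  # position along p-q
        u = (w[0] * d[1] - w[1] * d[0]) / denom  # position along a-b
        if -EPS <= t <= 1 + EPS and -EPS <= u <= 1 + EPS:
            ts.add(min(1.0, max(0.0, t)))
    # Between two consecutive boundary hits the segment is entirely inside
    # or entirely outside, so testing each midpoint is enough.
    ts = sorted(ts)
    for t0, t1 in zip(ts, ts[1:]):
        if t1 - t0 > EPS:
            tm = (t0 + t1) / 2.0
            if strictly_inside((p[0] + tm * d[0], p[1] + tm * d[1]), poly):
                return True
    return False


def shortest_linestring(start, end, poly):
    """Length of the shortest path from start to end that avoids poly's interior."""
    nodes = [tuple(start), tuple(end)] + [tuple(v) for v in poly]
    best = {0: 0.0}
    heap = [(0.0, 0)]
    while heap:
        d_u, u = heapq.heappop(heap)
        if u == 1:
            return d_u
        if d_u > best.get(u, math.inf):
            continue
        for v in range(len(nodes)):
            if v != u and not blocked(nodes[u], nodes[v], poly):
                nd = d_u + math.hypot(nodes[v][0] - nodes[u][0],
                                      nodes[v][1] - nodes[u][1])
                if nd < best.get(v, math.inf):
                    best[v] = nd
                    heapq.heappush(heap, (nd, v))
    return math.inf


print(round(shortest_linestring((2, 0), (0, 1), [(1, 0), (1, 1), (1, 2), (2, 1)]), 2))  # 2.41
```

For the example in the question the route goes from (2,0) to the corner (1,0) and then to (0,1), giving 1 + sqrt(2).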

Python: using polygons to create a mask on a given 2d grid

I have some polygons (Canadian provinces), read in with GeoPandas, and want to use these to create a mask to apply to gridded data on a 2-d latitude-longitude grid (read from a netcdf file using iris). An end goal would be to only have data for a given province remaining, with the rest of the data masked out. So the mask would be 1's for grid boxes within the province, and 0's or NaN's for grid boxes outside the province.
The polygons can be obtained from the shapefile here:
https://www.dropbox.com/s/o5elu01fetwnobx/CAN_adm1.shp?dl=0
The netcdf file I am using can be downloaded here:
https://www.dropbox.com/s/kxb2v2rq17m7lp7/t2m.20090815.nc?dl=0
I imagine there are two approaches here but I am struggling with both:
1) Use the polygon to create a mask on the latitude-longitude grid so that this can be applied to lots of datafiles outside of python (preferred)
2) Use the polygon to mask the data that have been read in and extract only the data inside the province of interest, to work with interactively.
My code so far:
import numpy as np
import iris
import geopandas as gpd
#read the shapefile and extract the polygon for a single province
#(province names stored as variable 'NAME_1')
Canada=gpd.read_file('CAN_adm1.shp')
BritishColumbia=Canada[Canada['NAME_1'] == 'British Columbia']
#get the latitude-longitude grid from netcdf file
cubelist=iris.load('t2m.20090815.nc')
cube=cubelist[0]
lats=cube.coord('latitude').points
lons=cube.coord('longitude').points
#create 2d grid from lats and lons (may not be necessary?)
[lon2d,lat2d]=np.meshgrid(lons,lats)
#HELP!
Thanks very much for any help or advice.
UPDATE: Following the great solution from @DPeterK below, my original data can now be masked.
It looks like you have started well! Geometries loaded from shapefiles expose various geospatial comparison methods, and in this case you need the contains method. You can use this to test each point in your cube's horizontal grid for being contained within your British Columbia geometry. (Note that this is not a fast operation!) You can use this comparison to build up a 2D mask array, which could be applied to your cube's data or used in other ways.
I've written a Python function to do the above – it takes a cube and a geometry and produces a mask for the (specified) horizontal coordinates of the cube, and applies the mask to the cube's data. The function is below:
import numpy as np
import iris
import iris.util
from shapely.geometry import Point
def geom_to_masked_cube(cube, geometry, x_coord, y_coord,
                        mask_excludes=False):
    """
    Convert a shapefile geometry into a mask for a cube's data.
    Args:
    * cube:
        The cube to mask.
    * geometry:
        A geometry from a shapefile to define a mask
        (a geopandas GeoSeries, e.g. `gdf.geometry`).
    * x_coord: (str or coord)
        A reference to a coord describing the cube's x-axis.
    * y_coord: (str or coord)
        A reference to a coord describing the cube's y-axis.
    Kwargs:
    * mask_excludes: (bool, default False)
        If False, the mask will exclude the area of the geometry from the
        cube's data. If True, the mask will include *only* the area of the
        geometry in the cube's data.
    .. note::
        This function does *not* preserve lazy cube data.
    """
    # Get horizontal coords for masking purposes.
    lats = cube.coord(y_coord).points
    lons = cube.coord(x_coord).points
    lon2d, lat2d = np.meshgrid(lons, lats)
    # Reshape to 1D for easier iteration.
    lon2 = lon2d.reshape(-1)
    lat2 = lat2d.reshape(-1)
    mask = []
    # Iterate through all horizontal points in cube, and
    # check for containment within the specified geometry.
    for lat, lon in zip(lat2, lon2):
        this_point = Point(lon, lat)
        res = geometry.contains(this_point)
        mask.append(res.values[0])
    mask = np.array(mask).reshape(lon2d.shape)
    if mask_excludes:
        # Invert the mask if we want to include the geometry's area.
        mask = ~mask
    # Make sure the mask is the same shape as the cube.
    dim_map = (cube.coord_dims(y_coord)[0],
               cube.coord_dims(x_coord)[0])
    cube_mask = iris.util.broadcast_to_shape(mask, cube.shape, dim_map)
    # Apply the mask to the cube's data.
    data = cube.data
    masked_data = np.ma.masked_array(data, cube_mask)
    cube.data = masked_data
    return cube
If you just need the 2D mask you could return that before the above function applies it to the cube.
To use this function in your original code, add the following at the end of your code:
geometry = BritishColumbia.geometry
masked_cube = geom_to_masked_cube(cube, geometry,
                                  'longitude', 'latitude',
                                  mask_excludes=True)
If this doesn't mask anything, it might well mean that your cube and geometry are defined on different extents. For example, if your cube's longitude coordinate runs from 0° to 360° while the geometry's longitude values run from -180° to 180°, the containment test will never return True for a geometry that lies at negative longitudes, as British Columbia does. You can fix this by changing the extents of your cube with the following:
cube = cube.intersection(longitude=(-180, 180))
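To make the extent mismatch concrete, the arithmetic behind moving a longitude from the 0° to 360° convention into the -180° to 180° convention can be sketched in plain Python. The helper name wrap_longitude is made up for this illustration; it is not part of iris, and cube.intersection above does the equivalent work on the cube itself.

```python
def wrap_longitude(lon):
    """Map a longitude in degrees from the 0..360 convention to -180..180.

    Note that exactly 180 maps to -180 with this formula.
    """
    return ((lon + 180.0) % 360.0) - 180.0


# 235 degrees east is the same meridian as 125 degrees west,
# which is where British Columbia lies.
print(wrap_longitude(235.0))  # -125.0
print(wrap_longitude(10.0))   # 10.0
```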
I found an alternative solution to the excellent one posted by @DPeterK above, which yields the same result. It uses matplotlib.path to test if points are contained within the exterior coordinates described by the geometries loaded from a shapefile. I am posting this because this method is ~10 times faster than that given by @DPeterK (2:23 minutes vs 25:56 minutes). I'm not sure what is preferable: an elegant solution, or a speedy, brute force solution. Perhaps one can have both?!
One complication with this method is that some geometries are MultiPolygons - i.e. the shape consists of several smaller polygons (in this case, the province of British Columbia includes islands off of the west coast, which can't be described by the coordinates of the mainland British Columbia Polygon). The MultiPolygon has no exterior coordinates but the individual polygons do, so these each need to be treated individually. I found that the neatest solution to this was to use a function copied from GitHub (https://gist.github.com/mhweber/cf36bb4e09df9deee5eb54dc6be74d26), which 'explodes' MultiPolygons into a list of individual polygons that can then be treated separately.
The working code is outlined below, with my documentation. Apologies that it is not the most elegant code - I am relatively new to Python and I'm sure there are lots of unnecessary loops/neater ways to do things!
import numpy as np
import iris
import geopandas as gpd
from shapely.geometry import Point
import matplotlib.path as mpltPath
from shapely.geometry.polygon import Polygon
from shapely.geometry.multipolygon import MultiPolygon
#-----
#FIRST, read in the target data and latitude-longitude grid from netcdf file
cubelist=iris.load('t2m.20090815.minus180_180.nc')
cube=cubelist[0]
lats=cube.coord('latitude').points
lons=cube.coord('longitude').points
#create 2d grid from lats and lons
[lon2d,lat2d]=np.meshgrid(lons,lats)
#create an (N, 2) array of (lon, lat) coordinates of all points within grid,
#in the same row-major order as lon2d/lat2d (works for any grid size)
points=np.column_stack((lon2d.ravel(),lat2d.ravel()))
#get the cube data - useful for later
fld=np.squeeze(cube.data)
#create a mask array of zeros, same shape as fld, to be modified by
#the code below
mask=np.zeros_like(fld)
#NOW, read the shapefile and extract the polygon for a single province
#(province names stored as variable 'NAME_1')
Canada=gpd.read_file('/Users/ianashpole/Computing/getting_province_outlines/CAN_adm_shp/CAN_adm1.shp')
BritishColumbia=Canada[Canada['NAME_1'] == 'British Columbia']
#BritishColumbia.geometry.type reveals this to be a 'MultiPolygon'
#i.e. several (in this case, thousands...) of individual polygons.
#I ultimately want to get the exterior coordinates of the BritishColumbia
#polygon, but a MultiPolygon is a list of polygons and therefore has no
#exterior coordinates. There are probably many ways to progress from here,
#but the method I have stumbled upon is to 'explode' the multipolygon into
#its individual polygons and treat each individually. The function below
#to 'explode' the MultiPolygon is based on the one found here:
#https://gist.github.com/mhweber/cf36bb4e09df9deee5eb54dc6be74d26
#(adapted: DataFrame.append and indexing into a MultiPolygon are not
#available in recent pandas/shapely releases)
#---define function to explode MultiPolygons
def explode_polygon(indata):
    rows = []
    for idx, row in indata.iterrows():
        if isinstance(row.geometry, Polygon):
            #note: now redundant, but function originally worked on
            #a shapefile which could have combinations of individual polygons
            #and MultiPolygons
            rows.append(row)
        elif isinstance(row.geometry, MultiPolygon):
            #one output row per constituent polygon, other columns copied
            for part in row.geometry.geoms:
                new_row = row.copy()
                new_row['geometry'] = part
                rows.append(new_row)
    outdf = gpd.GeoDataFrame(rows, columns=indata.columns)
    return outdf.reset_index(drop=True)
#-------
#Explode the BritishColumbia MultiPolygon into its constituents
EBritishColumbia=explode_polygon(BritishColumbia)
#Loop over each individual polygon and get external coordinates
for index,row in EBritishColumbia.iterrows():
    print('working on polygon', index)
    mypolygon=[]
    for pt in list(row['geometry'].exterior.coords):
        print(index,', ',pt)
        mypolygon.append(pt)
    #See if any of the original grid points read from the netcdf file earlier
    #lie within the exterior coordinates of this polygon
    #path.contains_points returns a boolean array (true/false), in the
    #shape of 'points'
    path=mpltPath.Path(mypolygon)
    inside=path.contains_points(points)
    #find the results in the array that were inside the polygon ('True')
    #and flag them in the mask. First, must reshape the result of the search
    #('inside') so that it matches the mask & original data
    #reshape the result to the main grid array
    inside=np.array(inside).reshape(lon2d.shape)
    i=np.where(inside == True)
    mask[i]=1
print('finished checking for points inside all polygons')
#mask now contains 0's for points that are not within British Columbia, and
#1's for points that are. FINALLY, use this to mask the original data
#(stored as 'fld')
i=np.where(mask == 0)
fld[i]=np.nan
#Done.
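For readers who want to see what a point-in-polygon test over a grid boils down to, here is a small pure-Python sketch using the even-odd ray casting rule on a single exterior ring. It only illustrates the idea. matplotlib's own implementation may differ, and the names point_in_ring and grid_mask are made up. Like the code above, it looks at the exterior ring only, so grid points falling in a polygon's holes would still be flagged as inside.

```python
def point_in_ring(x, y, ring):
    """Even-odd ray casting: count edge crossings of a ray going in +x."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside


def grid_mask(lons, lats, ring):
    """Return a nested list mask[lat_index][lon_index] of 1 (inside) or 0 (outside)."""
    return [[1 if point_in_ring(lon, lat, ring) else 0 for lon in lons]
            for lat in lats]


# A toy 'province' on a 4 x 3 grid.
square = [(0.5, 0.5), (2.5, 0.5), (2.5, 1.5), (0.5, 1.5)]
mask = grid_mask([0, 1, 2, 3], [0, 1, 2], square)
for row in mask:
    print(row)
# [0, 0, 0, 0]
# [0, 1, 1, 0]
# [0, 0, 0, 0]
```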

Finding Intersections Region Based Trajectories vs. Line Trajectories

I have two trajectories (i.e. two lists of points) and I am trying to find the intersection points for both these trajectories. However, if I represent these trajectories as lines, I might miss real world intersections (just misses).
What I would like to do is to represent the line as a polygon with certain width around the points and then find where the two polygons intersect with each other.
I am using the python spatial library but I was wondering if anyone has done this before. Here is a picture of the line segments which don't intersect because they just miss each other. Below is the sample data code that represents the trajectory of two objects.
import numpy as np
import matplotlib.pyplot as plt
object_trajectory=np.array([[-3370.00427248, 3701.46800775],
[-3363.69164715, 3702.21408203],
[-3356.31277271, 3703.06477984],
[-3347.25951787, 3704.10740164],
[-3336.739511 , 3705.3958357 ],
[-3326.29355823, 3706.78035903],
[-3313.4987339 , 3708.2076586 ],
[-3299.53433345, 3709.72507366],
[-3283.15486406, 3711.47077376],
[-3269.23487255, 3713.05635557]])
target_trajectory=np.array([[-3384.99966703, 3696.41922372],
[-3382.43687562, 3696.6739521 ],
[-3378.22995178, 3697.08802862],
[-3371.98983789, 3697.71490469],
[-3363.5900481 , 3698.62666805],
[-3354.28520354, 3699.67613798],
[-3342.18581931, 3701.04853915],
[-3328.51519511, 3702.57528111],
[-3312.09691577, 3704.41961271],
[-3297.85543763, 3706.00878621]])
plt.plot(object_trajectory[:,0],object_trajectory[:,1],color='b')
plt.plot(target_trajectory[:,0],target_trajectory[:,1],color='r')
Let's say you have two lines defined by numpy arrays x1, y1, x2, and y2.
import numpy as np
You can create an array distances[i, j] containing the distances between the ith point in the first line and the jth point in the second line.
distances = ((x1[:, None] - x2[None, :])**2 + (y1[:, None] - y2[None, :])**2)**0.5
Then you can find indices where distances is less than some threshold you want to define for intersection. If you're thinking of the lines as having some thickness, the threshold would be half of that thickness.
threshold = 0.1
intersections = np.argwhere(distances < threshold)
intersections is now a N by 2 array containing all point pairs that are considered to be "intersecting" (the [i, 0] is the index from the first line, and [i, 1] is the index from the second line). If you want to get the set of all the indices from each line that are intersecting, you can use something like
first_intersection_indices = np.asarray(sorted(set(intersections[:, 0])))
second_intersection_indices = np.asarray(sorted(set(intersections[:, 1])))
From here, you can also determine how many intersections there are by taking only the center value for any consecutive values in each list.
L1 = []
current_intersection = []
for i in range(first_intersection_indices.shape[0]):
    if len(current_intersection) == 0:
        current_intersection.append(first_intersection_indices[i])
    elif first_intersection_indices[i] == current_intersection[-1] + 1:
        # still in the same run of consecutive indices
        current_intersection.append(first_intersection_indices[i])
    else:
        L1.append(int(np.median(current_intersection)))
        current_intersection = [first_intersection_indices[i]]
# the last run is not followed by a gap, so store it after the loop
if current_intersection:
    L1.append(int(np.median(current_intersection)))
print(len(L1))
You can use these to print the coordinates of each intersection.
for i in L1:
    print(x1[i], y1[i])
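The grouping step (collapse each run of consecutive indices to its middle value) can also be sketched with the standard library alone. This is just an alternative illustration, and the function name run_centres is made up. It relies on the fact that, within a run of consecutive integers, the value minus its position in the sorted list is constant.

```python
from itertools import groupby
from statistics import median


def run_centres(sorted_indices):
    """Return the middle index of each run of consecutive integers."""
    centres = []
    # Pairs are (position, value); value - position stays the same
    # for as long as the values keep increasing by exactly 1.
    for _, run in groupby(enumerate(sorted_indices),
                          key=lambda pair: pair[1] - pair[0]):
        values = [value for _, value in run]
        centres.append(int(median(values)))
    return centres


print(run_centres([3, 4, 5, 10, 11, 20]))  # [4, 10, 20]
```

Here the three runs 3-5, 10-11 and 20 are counted as three separate intersections.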
Turns out that the shapely package already has a ton of convenience functions that get me very far with this.
from shapely.geometry import Point, LineString, MultiPoint
# I assume that self.line is of type LineString (i.e. a line trajectory)
region_polygon = self.line.buffer(self.lane_width)
# line.buffer essentially generates a nice interpolated bounding polygon around the trajectory.
# Now we can identify all the other points in the other trajectory that intersects with the region_polygon that we just generated. You can also use .intersection if you want to simply generate two polygon trajectories and find the intersecting polygon as well.
is_in_region = [region_polygon.intersects(point) for point in points]
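Conceptually, testing a point against a buffered line is the same as asking whether the point lies within a given distance of the line. The sketch below shows that distance test in plain Python, with made-up helper names (point_segment_distance, points_within_width). Results for points right at the edge may differ slightly from shapely, because buffer approximates rounded ends and corners with short straight segments.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def points_within_width(points, line, width):
    """For each point, True if it lies within `width` of the polyline `line`."""
    segments = list(zip(line, line[1:]))
    return [min(point_segment_distance(p, a, b) for a, b in segments) <= width
            for p in points]


line = [(0, 0), (10, 0)]
print(points_within_width([(5, 1), (5, 3), (12, 0)], line, 2))  # [True, False, True]
```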

Create new shapely polygon by subtracting the intersection with another polygon

I have two shapely MultiPolygon instances (made of lon,lat points) that intersect at various parts. I'm trying to loop through, determine if there's an intersection between two polygons, and then create a new polygon that excludes that intersection. From the attached image, I basically don't want the red circle to overlap with the yellow contour, I want the edge to be exactly where the yellow contour starts.
I've tried following the instructions here but it doesn't change my output at all, plus I don't want to merge them into one cascading union. I'm not getting any error messages, but when I add these MultiPolygons to a KML file (just raw text manipulation in python, no fancy program) they're still showing up as circles without any modifications.
# multipoly1 and multipoly2 are my shapely MultiPolygons
from shapely.ops import cascaded_union
from itertools import combinations
from shapely.geometry import Polygon,MultiPolygon
outmulti = []
for pol in multipoly1.geoms:
    for pol2 in multipoly2.geoms:
        if pol.intersects(pol2)==True:
            # If they intersect, create a new polygon that is
            # essentially pol minus the intersection
            intersection = pol.intersection(pol2)
            nonoverlap = pol.difference(intersection)
            outmulti.append(nonoverlap)
        else:
            # Otherwise, just keep the initial polygon as it is.
            outmulti.append(pol)
finalpol = MultiPolygon(outmulti)
I guess you can use the symmetric_difference between these two polygons, combined with the difference from the second polygon, to achieve what you want (the symmetric difference gives you the non-overlapping parts of the two polygons, from which the parts belonging to polygon 2 are then removed by the difference). I haven't tested it, but it might look like:
# multipoly1 and multipoly2 are my shapely MultiPolygons
from shapely.geometry import Polygon,MultiPolygon
outmulti = []
for pol in multipoly1.geoms:
    # Start from the initial polygon and carve away every overlap in turn,
    # so that each pol ends up in the output exactly once.
    remaining = pol
    for pol2 in multipoly2.geoms:
        if remaining.intersects(pol2):
            # If they intersect, keep only the part that is
            # essentially pol minus the intersection
            remaining = remaining.symmetric_difference(pol2).difference(pol2)
    # Polygons that intersect nothing are kept as they are.
    outmulti.append(remaining)
finalpol = MultiPolygon(outmulti)
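As a sanity check on the set logic, sets of grid cells can stand in for polygons: taking the symmetric difference and then removing the second shape leaves exactly the first shape minus the second. The toy helper cells is made up for this illustration. If the same identity carries over to shapely geometries, as one would expect up to floating-point effects, pol.difference(pol2) on its own should give the same region.

```python
def cells(x0, y0, x1, y1):
    """Set of unit grid cells covering the rectangle [x0, x1) x [y0, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


a = cells(0, 0, 4, 4)  # stands in for pol
b = cells(2, 2, 6, 6)  # stands in for pol2, overlapping a in 4 cells

# (a symmetric-difference b) minus b is the same as a minus b.
print((a ^ b) - b == a - b)  # True
print(len(a - b))            # 12
```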
