ValueError raised when producing a shapefile with geopandas - python

I have just recently started to work with shapefiles. I have a shapefile in which each object is a polygon. I want to produce a new shapefile in which the geometry of each polygon is replaced by its centroid. Here is my code:
import geopandas as gp
from shapely.wkt import loads as load_wkt
fname = '../data_raw/bg501c_starazagora.shp'
outfile = 'try.shp'
shp = gp.GeoDataFrame.from_file(fname)
centroids = list()
index = list()
df = gp.GeoDataFrame()
for i,r in shp.iterrows():
    index.append(i)
    centroid = load_wkt(str(r['geometry'])).centroid.wkt
    centroids.append(centroid)
df['geometry'] = centroids
df['INDEX'] = index
gp.GeoDataFrame.to_file(df,outfile)
When I run the script, I end up with this error:
raise ValueError("Geometry column cannot contain mutiple "
ValueError: Geometry column cannot contain mutiple geometry types when writing to file.
I cannot understand what is wrong. Any help?

The issue is that you're populating the geometry field with a string representation of the geometry rather than a shapely geometry object.
There is no need to convert to WKT; your loop could instead be:
for i,r in shp.iterrows():
    index.append(i)
    centroid = r['geometry'].centroid
    centroids.append(centroid)
However, there's no need to loop through the GeoDataFrame at all. You can create a new GeoDataFrame of polygon centroids directly:
df=gp.GeoDataFrame(data=shp, geometry=shp['geometry'].centroid)
df.to_file(outfile)
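A minimal, self-contained sketch of the same idea, assuming two made-up square polygons instead of the shapefile (the NAME column and the EPSG:32635 CRS are placeholders). It overwrites the existing geometry column with shapely Point objects, so every geometry has the same type and to_file no longer complains about mixed types:

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Hypothetical stand-in for the shapefile: two squares with one attribute column
shp = gpd.GeoDataFrame(
    {"NAME": ["a", "b"]},
    geometry=[
        Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
        Polygon([(10, 10), (14, 10), (14, 14), (10, 14)]),
    ],
    crs="EPSG:32635",  # assumed projected CRS, so centroids are computed in metres
)

# Copy the frame and replace its geometry column with the centroid Points
centroids = shp.copy()
centroids["geometry"] = shp.geometry.centroid

# centroids.to_file("try.shp") should now work: every geometry is a Point
print(centroids.geom_type.tolist())  # ['Point', 'Point']
```

The attribute columns are carried over unchanged; only the geometry column is swapped.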

Related

Error when trying to make a GeoDataFrame of network nodes

I need to make a GeoDataFrame of some nodes on a road network (which was extracted from OpenStreetMap using OSMnx). In the code below, graph_proj is the graph whose nodes I'm working with, the points are start_point and end_point:
import osmnx as ox
import geopandas as gpd
nodes_proj, edges_proj = ox.graph_to_gdfs(graph_proj, nodes=True, edges=True)
# Finding the nodes on the graph nearest to the points
start_node = ox.nearest_nodes(graph_proj, start_point.geometry.x, start_point.geometry.y, return_dist=False)
end_node = ox.nearest_nodes(graph_proj, end_point.geometry.x, end_point.geometry.y, return_dist=False)
start_closest = nodes_proj.loc[start_node]
end_closest = nodes_proj.loc[end_node]
# Create a GeoDataBase from the start and end nodes
od_nodes = gpd.GeoDataFrame([start_closest, end_closest], geometry='geometry', crs=nodes_proj.crs)
During the last step ("# Create a GeoDataBase...", etc.), an error is thrown. Apparently, it has something to do with a 3-dimensional array being passed to the GeoDataFrame constructor. Am I right that the way I pass in the locations ([start_closest, end_closest]) results in a 3D array? (The error message reads: 'Must pass 2-d input. shape=(2, 1, 7)'.) I tried transposing this array, but then GeoPandas could not locate the 'geometry' column. How do I pass in this argument in a way that it will be accepted?
OK, so I was able to get around this by writing each node to its own GeoDataFrame and then merging the two GeoDataFrames, like this:
import pandas as pd
od_nodes1 = gpd.GeoDataFrame(start_closest, geometry='geometry', crs=nodes_proj.crs)
od_nodes2 = gpd.GeoDataFrame(end_closest, geometry='geometry', crs=nodes_proj.crs)
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
od_nodes = pd.concat([od_nodes1, od_nodes2])
Surely, though, there must be a more elegant way of writing more than one feature into a GeoDataFrame?
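One way to get a single GeoDataFrame in one step, sketched under assumptions: nodes_proj is replaced by a tiny hand-made GeoDataFrame indexed by hypothetical node IDs, and start_node/end_node are plain scalar IDs (which is what ox.nearest_nodes returns when given scalar coordinates). The (2, 1, 7) shape suggests each .loc lookup returned a one-row DataFrame rather than a row Series, for example if start_point.geometry.x was a Series. Selecting both IDs with a single list-based .loc call, or concatenating one-row frames with pd.concat, avoids building a nested list:

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Hypothetical stand-in for nodes_proj from ox.graph_to_gdfs(): indexed by node ID
nodes_proj = gpd.GeoDataFrame(
    {"street_count": [3, 4, 1]},
    index=pd.Index([101, 102, 103], name="osmid"),
    geometry=[Point(0, 0), Point(100, 0), Point(100, 100)],
    crs="EPSG:32633",  # placeholder projected CRS
)

# Assumed scalar node IDs, as returned by ox.nearest_nodes for scalar x/y
start_node, end_node = 101, 103

# Option 1: select both rows with one list-based .loc lookup (stays a GeoDataFrame)
od_nodes = nodes_proj.loc[[start_node, end_node]]

# Option 2: if each lookup already yields a one-row frame, concatenate them
start_closest = nodes_proj.loc[[start_node]]
end_closest = nodes_proj.loc[[end_node]]
od_nodes_concat = pd.concat([start_closest, end_closest])
```

Both options keep the geometry column and the CRS, so no extra GeoDataFrame(...) call is needed afterwards.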

Matching Geopandas Dissolve with ArcGIS Dissolve on set of Polylines

I am trying to replicate the output of ArcGIS Dissolve on a set of stream flow lines using geopandas. Essentially, the df/streams_0 layer is a stream network extracted from a DEM using pysheds. That output has some randomly overlapping reaches, which I am trying to remove. Running Dissolve through ArcGIS Pro does this well, but I would prefer not to have to deal with ArcGIS/ArcPy to resolve this.
[Image: stream network]
[Image: ArcGIS Dissolve settings]
#streams_0.geojson = df.shp = streams_0.shp from Dissolve Setting image
#~~~~~~~~~~~~~~~~~~~~
import geopandas as gpd
df = gpd.read_file('streams_0.geojson')
df.head()
Out[3]:
geometry
0 LINESTRING (400017.781 3000019.250, 400017.781...
1 LINESTRING (400027.781 3000039.250, 400027.781...
2 LINESTRING (400027.781 3000039.250, 400037.781...
3 LINESTRING (400027.781 3000029.250, 400037.781...
4 LINESTRING (400047.781 3000079.250, 400047.781...
I have tried GeoDataFrame.dissolve() with a filler column, with no luck.
df['dissolvefield'] = 1
df2 = df.dissolve(by='dissolvefield')
df3 = gpd.geoseries.GeoSeries([geom for geom in df2.geometry.iloc[0].geoms])
I similarly tried shapely's unary_union, with no luck.
import fiona
shape1 = fiona.open("df.shp")
first = next(iter(shape1))
from shapely.geometry import shape
shp_geom = shape(first['geometry'])
from shapely.ops import unary_union
shape2 = unary_union(shp_geom)
This seems like it should have an easy solution, so I am wondering why I am running into so many issues. My GeoDataFrame only consists of the line geometry, so there is no other attribute I can aggregate on. I am essentially just trying to keep the geometry of the lines unchanged, but remove any overlapping features that may be there. I don't want to split the lines, and I don't want to aggregate them into multipart features.
I use unary_union, but there is no need to read the data as shapely features.
After reading the file into a GeoDataFrame (you can also read the *.shp file directly):
df = gpd.read_file('streams_0.geojson')
Plot it to check that the input looks correct:
df.plot()
Then apply unary_union like this, and plot again:
shape2 = df.unary_union
shape2
The last step (if necessary) is to turn the result back into a GeoDataFrame:
# split the MultiLineString returned by unary_union into its individual LineStrings
# (shapely 2.0 no longer allows iterating over a multi-part geometry directly)
segments = list(shape2.geoms)
# set back as geopandas, reusing the CRS of the input layer
gdf = gpd.GeoDataFrame({'index': range(len(segments))}, geometry=segments,
                       crs=df.crs)
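A small sketch of what the union does to overlapping reaches, using two made-up lines in place of streams_0 (the EPSG:32644 CRS is a placeholder). The overlapping stretch is kept only once, so the total length drops from 20 to 15; note that the union also splits lines where they overlap or touch, which may matter given the requirement not to split lines. union_all() is the newer name in geopandas 1.0 and later, and the code falls back to unary_union on older versions:

```python
import geopandas as gpd
from shapely.geometry import LineString

# Hypothetical reaches: the second one retraces the stretch (5, 0)-(10, 0)
df = gpd.GeoDataFrame(
    geometry=[
        LineString([(0, 0), (10, 0)]),
        LineString([(5, 0), (10, 0), (10, 5)]),
    ],
    crs="EPSG:32644",
)

# Prefer union_all() (geopandas >= 1.0), otherwise fall back to unary_union
merged = df.union_all() if hasattr(df, "union_all") else df.unary_union

# The result may be a single LineString or a MultiLineString; normalise to a list of parts
segments = list(merged.geoms) if merged.geom_type.startswith("Multi") else [merged]
gdf = gpd.GeoDataFrame({"index": range(len(segments))}, geometry=segments, crs=df.crs)

# Total length before and after: the duplicated stretch is counted only once afterwards
print(df.length.sum(), gdf.length.sum())
```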

How to resolve tick problems in Geopandas overlay?

I am trying to overlay a polygon and lines in Geopandas, but the plot fails with a tick-related error:
ValueError: cannot convert float NaN to integer
import geopandas as gpd
from geopandas.tools import overlay
zip1 = "zip://data/mmcovidshp.zip"
mmcovid = gpd.read_file(zip1)
zip2 = "zip://data/roads_MM.zip"
mmroads = gpd.read_file(zip2)
overlay_intersection = overlay(mmcovid, mmroads,
how='intersection')
overlay_intersection.plot(figsize=(6, 8))
Data: https://drive.google.com/drive/folders/1Xxo1Ep6Dgau5ThmNetuqzehpSh9sgpfP?usp=sharing
It is not clear what you are trying to do.
overlay_intersection is empty because overlay tries to preserve the geometry type of the left GeoDataFrame. The left GeoDataFrame contains polygons, and the intersection of a polygon with a linestring is a linestring, so all results are dropped and the output is empty; plotting an empty GeoDataFrame is presumably what triggers the NaN error. You can control this with the keep_geom_type keyword: keep_geom_type=False returns everything.
The simple solution here is to change the order:
overlay_intersection = overlay(mmroads, mmcovid,
                               how='intersection')
That produces a non-empty GeoDataFrame. See more at https://geopandas.readthedocs.io/en/latest/docs/user_guide/set_operations.html?highlight=overlay.
If you are trying to simply clip mmroads to mmcovid's shape, use geopandas.clip. https://geopandas.readthedocs.io/en/latest/gallery/plot_clip.html
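The three options can be compared on toy data. This is a minimal sketch with hypothetical stand-ins for mmcovid (one square polygon) and mmroads (one road crossing it); the CRS is a placeholder. With polygons on the left the result is empty and its bounds are NaN, while keep_geom_type=False, swapping the order, and geopandas.clip all return the 4-unit stretch of road inside the square:

```python
import geopandas as gpd
import numpy as np
from shapely.geometry import LineString, Polygon

# Hypothetical stand-ins: one square "zone" and one road crossing it
polys = gpd.GeoDataFrame(
    {"zone": [1]}, geometry=[Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])], crs="EPSG:32647"
)
roads = gpd.GeoDataFrame(
    {"road": ["r1"]}, geometry=[LineString([(-2, 2), (6, 2)])], crs="EPSG:32647"
)

# Polygons on the left: the line-shaped results are dropped (geopandas warns about it)
empty = gpd.overlay(polys, roads, how="intersection")
# An empty frame has NaN bounds, which is presumably what breaks the axis ticks
print(len(empty), np.isnan(empty.total_bounds).all())

# keep_geom_type=False keeps results whose type differs from the left frame
kept = gpd.overlay(polys, roads, how="intersection", keep_geom_type=False)

# Lines on the left: the line results match the left geometry type
swapped = gpd.overlay(roads, polys, how="intersection")

# clip keeps only the parts of the roads inside the polygons
clipped = gpd.clip(roads, polys)
```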

Convert Column to Polygon in Python to perform Point in Polygon

I have written code to perform a point-in-polygon test in Python; the program reads a shapefile as the polygons.
I now have a dataframe I read in with a column containing the polygon, e.g. [[28.050815,-26.242253],[28.050085,-26.25938],[28.011934,-26.25888],[28.020216,-26.230127],[28.049828,-26.230704],[28.050815,-26.242253]].
I want to transform this column into a polygon in order to perform point in polygon, but all the examples use geometry = [Point(xy) for xy in zip(dataPoints['Long'], dataPoints['Lat'])], and my coordinates are already paired (zipped).
How would I go about achieving this?
Thanks
Taking your example above, you could do the following:
list_coords = [[28.050815,-26.242253],[28.050085,-26.25938],[28.011934,-26.25888],[28.020216,-26.230127],[28.049828,-26.230704],[28.050815,-26.242253]]
from shapely.geometry import Point, Polygon
# Create a list of point objects using list comprehension
point_list = [Point(x,y) for [x,y] in list_coords]
# Create a polygon object from the list of Point objects
polygon_feature = Polygon([[poly.x, poly.y] for poly in point_list])
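As a shorter variant of the snippet above, Polygon accepts the list of (x, y) pairs as-is, so the intermediate Point objects are optional. A minimal sketch, where the test point is a made-up (longitude, latitude) pair:

```python
from shapely.geometry import Point, Polygon

list_coords = [[28.050815, -26.242253], [28.050085, -26.25938], [28.011934, -26.25888],
               [28.020216, -26.230127], [28.049828, -26.230704], [28.050815, -26.242253]]

# Polygon takes a sequence of (x, y) pairs directly; the ring is already closed
polygon_feature = Polygon(list_coords)

# Point-in-polygon test with a hypothetical (longitude, latitude) point
test_point = Point(28.035, -26.245)
print(polygon_feature.contains(test_point))  # True
```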
And if you would like to apply it to a dataframe you could do the following:
import pandas as pd
import geopandas as gpd
df = pd.DataFrame({'coords': [list_coords]})
def get_polygon(list_coords):
    point_list = [Point(x,y) for [x,y] in list_coords]
    polygon_feature = Polygon([[poly.x, poly.y] for poly in point_list])
    return polygon_feature
df['geom'] = df['coords'].apply(get_polygon)
However, there might be built-in geopandas functions that avoid "reinventing the wheel", so let's see if anyone else has a suggestion :)
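If the coordinates column was read from a CSV file, each cell is probably a string rather than a list (an assumption; skip the parsing step if the cells are already lists). A minimal sketch under that assumption parses each string with json.loads, builds the polygons, and then runs point in polygon with a spatial join. The test points and the EPSG:4326 CRS are made up, and the predicate keyword of sjoin needs geopandas 0.10 or later:

```python
import json

import geopandas as gpd
import pandas as pd
from shapely.geometry import Point, Polygon

# Assumption: the polygon column came from a CSV, so each cell is a JSON-like string
coords_text = (
    "[[28.050815,-26.242253],[28.050085,-26.25938],[28.011934,-26.25888],"
    "[28.020216,-26.230127],[28.049828,-26.230704],[28.050815,-26.242253]]"
)
df = pd.DataFrame({"coords": [coords_text]})

# Parse each string into a list of [x, y] pairs and build the polygon directly
df["geometry"] = df["coords"].apply(lambda s: Polygon(json.loads(s)))
polygons = gpd.GeoDataFrame(df, geometry="geometry", crs="EPSG:4326")

# Hypothetical points to test, as (longitude, latitude)
points = gpd.GeoDataFrame(
    {"name": ["inside", "outside"]},
    geometry=[Point(28.035, -26.245), Point(28.10, -26.20)],
    crs="EPSG:4326",
)

# Point in polygon via a spatial join: keeps only points that fall within a polygon
hits = gpd.sjoin(points, polygons, predicate="within")
print(hits["name"].tolist())  # ['inside']
```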

Add Points to Geopandas Object

My objective is to create some kind of GeoJSON object and add several Point objects to it with a for loop.
What am I missing here?
from geojson import Feature
import pandas as pd
import geopandas as gpd
# Point((-115.81, 37.24))
# Create a Dataframe with **Schools Centroids**
myManipulationObj = pd.DataFrame
for schoolNumber in listOfResults:
    myManipulationObj.append(centroids[schoolNumber])
# GDF should be a Beautiful collection (geoDataFrame) of Points
gdf = gpd.GeoDataFrame(myManipulationObj, geometry='Coordinates')
After that, I want to use geopandas to write the result to a .geojson file.
Any Help?
(solved)
I solved the problem by:
creating a Python list (listOfPoints),
using each Point object as the geometry parameter of a Feature object,
using the list of Features (with Points) to create a FeatureCollection.
Leaving this here for future reference in case someone needs it :D
from geojson import Feature, FeatureCollection

# Get the indices of the schools selected by the optimized model m
listOfResults = []
for e in range(numSchools):
    tempObj = m.getVarByName(str(e))
    # Keep this school if it is part of the optimized result
    if tempObj.x != 0:
        listOfResults.append(int(tempObj.varName))
# Select, from the list of results, a set of centroid points
listOfPoints = []
for schoolNumber in listOfResults:
    # Feature comes from the geojson package (not geopandas); its geometry is the centroid Point
    listOfPoints.append(Feature(geometry=centroids[schoolNumber]))
# Create a FeatureCollection from the Features (Points) built above
resultCentroids = FeatureCollection(listOfPoints)
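For the geopandas route the question started from, a minimal sketch: collect the selected centroids in a plain Python list and build the GeoDataFrame once, instead of appending to a DataFrame inside the loop. The centroids dict, the listOfResults values and the school column are hypothetical stand-ins for the optimization results; to_file(..., driver="GeoJSON") writes the file, and to_json() is used here to show the resulting FeatureCollection without touching the disk:

```python
import json

import geopandas as gpd
from shapely.geometry import Point

# Hypothetical centroids keyed by school number, and a hypothetical selection
centroids = {0: Point(-115.81, 37.24), 1: Point(-115.70, 37.30), 2: Point(-115.60, 37.10)}
listOfResults = [0, 2]

# Build the GeoDataFrame in one go from the selected points
gdf = gpd.GeoDataFrame(
    {"school": listOfResults},
    geometry=[centroids[n] for n in listOfResults],
    crs="EPSG:4326",
)

# gdf.to_file("schools.geojson", driver="GeoJSON") would write the file;
# to_json() returns the same FeatureCollection as a string
collection = json.loads(gdf.to_json())
print(collection["type"], len(collection["features"]))
```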
