I am trying to replicate the output from ArcGIS Dissolve on a set of stream flow lines using geopandas. Essentially the df/stream_0 layer is a stream network extracted from a DEM using pysheds. That output has some randomly overlapping reaches which I am trying to remove. Running Dissolve through ArcGIS Pro does this well, but I would prefer not to have to deal with ArcGIS/ArcPy to resolve this.
[Image: Stream Network]
[Image: ArcGIS Dissolve Setting]
#streams_0.geojson = df.shp = streams_0.shp from Dissolve Setting image
#~~~~~~~~~~~~~~~~~~~~
import geopandas as gpd
df = gpd.read_file('streams_0.geojson')
df.head()
Out[3]:
geometry
0 LINESTRING (400017.781 3000019.250, 400017.781...
1 LINESTRING (400027.781 3000039.250, 400027.781...
2 LINESTRING (400027.781 3000039.250, 400037.781...
3 LINESTRING (400027.781 3000029.250, 400037.781...
4 LINESTRING (400047.781 3000079.250, 400047.781...
I have tried using GeoDataFrame.dissolve() with a filler column, with no luck.
df['dissolvefield'] = 1
df2 = df.dissolve(by='dissolvefield')
df3 = gpd.GeoSeries(list(df2.geometry.iloc[0].geoms))
I similarly tried to use unary_union in shapely, with no luck.
import fiona
shape1 = fiona.open("df.shp")
# fiona collections are iterators; take the first feature with next()
first = next(iter(shape1))
from shapely.geometry import shape
shp_geom = shape(first['geometry'])
from shapely.ops import unary_union
shape2 = unary_union(shp_geom)
This seems like it should have an easy solution, so I am wondering why I am running into so many issues. My GeoDataFrame consists only of the line geometry, so there is no other attribute I can aggregate on. I am essentially just trying to keep the geometry of the lines unchanged, but remove any overlapping features that may be there. I don't want to split the lines, and I don't want to aggregate them into multipart features.
I use unary_union, but there is no need to read the file as shapely features first.
Read the file into GeoPandas (you can do this straight from the *.shp file):
df = gpd.read_file('streams_0.geojson')
Plot it to check that the input looks correct:
df.plot()
Then use unary_union like this, and plot again:
shape2 = df.unary_union
shape2
The last step (if necessary) is to turn the result back into a GeoDataFrame:
# split the unioned multi-part geometry into its individual parts
# (.geoms is needed on Shapely 2.x, where multi-part geometries are not iterable)
segments = list(shape2.geoms)
# set back as geopandas, reusing the CRS of the input
gdf = gpd.GeoDataFrame(list(range(len(segments))), geometry=segments,
crs=df.crs)
gdf.columns = ['index', 'geometry']
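For what it's worth, here is a minimal sketch of the same idea on two made-up overlapping lines (the coordinates are assumptions, not the asker's data). It uses shapely.ops.unary_union to keep the duplicated stretch only once, then shapely.ops.linemerge to stitch the pieces back together, since the union also splits lines wherever they meet. As far as I know, linemerge only joins lines at points where exactly two line ends meet, so reaches should stay separate at real confluences.

```python
import geopandas as gpd
from shapely.geometry import LineString
from shapely.ops import linemerge, unary_union

# Two hypothetical reaches that overlap between x=1 and x=2
lines = gpd.GeoDataFrame(
    geometry=[LineString([(0, 0), (2, 0)]), LineString([(1, 0), (3, 0)])]
)

# Union all lines: the overlapping stretch is kept only once,
# but the result is split at every point where lines meet
unioned = unary_union(list(lines.geometry))

# Stitch contiguous pieces back together where possible
merged = linemerge(unioned)

# linemerge returns a LineString or a MultiLineString depending on the input
parts = [merged] if merged.geom_type == "LineString" else list(merged.geoms)
deduped = gpd.GeoDataFrame(geometry=parts, crs=lines.crs)

print(lines.length.sum())    # total length including the overlap
print(deduped.length.sum())  # total length with the overlap removed
```

Comparing the two total lengths is a quick way to check that the overlap really is gone.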
I am trying to overlay polygons and lines in GeoPandas, but plotting the result fails while the axis ticks are being computed:
ValueError: cannot convert float NaN to integer
import geopandas as gpd
from geopandas.tools import overlay
zip1 = "zip://data/mmcovidshp.zip"
mmcovid = gpd.read_file(zip1)
zip2 = "zip://data/roads_MM.zip"
mmroads = gpd.read_file(zip2)
overlay_intersection = overlay(mmcovid, mmroads,
how='intersection')
overlay_intersection.plot(figsize=(6, 8))
Data: https://drive.google.com/drive/folders/1Xxo1Ep6Dgau5ThmNetuqzehpSh9sgpfP?usp=sharing
It is not clear what you are trying to do.
overlay_intersection is empty because overlay tries to preserve the geometry type of the left GeoDataFrame. Because the left gdf contains polygons and the intersection of a polygon and a linestring is a linestring, the result is empty. You can control that with the keep_geom_type keyword; keep_geom_type=False returns everything.
The simple solution here is to change the order.
overlay_intersection = overlay(mmroads, mmcovid,
how='intersection')
That produces a non-empty gdf. See more: https://geopandas.readthedocs.io/en/latest/docs/user_guide/set_operations.html?highlight=overlay.
If you are simply trying to clip mmroads to mmcovid's shape, use geopandas.clip: https://geopandas.readthedocs.io/en/latest/gallery/plot_clip.html
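To make the two options concrete, here is a small sketch on made-up data. The frames `areas` and `roads` and their columns are hypothetical stand-ins for mmcovid and mmroads, not the linked data. It puts the lines on the left of overlay so the line geometry type is the one preserved, and shows geopandas.clip as the alternative.

```python
import geopandas as gpd
from shapely.geometry import LineString, box

# Hypothetical stand-ins for mmcovid (polygons) and mmroads (lines)
areas = gpd.GeoDataFrame({"area_id": [1]}, geometry=[box(0, 0, 1, 1)])
roads = gpd.GeoDataFrame(
    {"road_id": [10]}, geometry=[LineString([(-1, 0.5), (2, 0.5)])]
)

# Lines on the left: the result keeps line geometries
# and carries the attributes of both frames
roads_in_areas = gpd.overlay(roads, areas, how="intersection")

# If only the geometric cut is needed, clip the lines to the polygons
roads_clipped = gpd.clip(roads, areas)

print(roads_in_areas[["road_id", "area_id"]])
print(roads_clipped.length)
```

overlay is the one to pick if you need the polygon attributes on each road piece; clip only cuts the geometry.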
I have polygon data for the states of the USA from the ArcGIS website, and I also have an Excel file with coordinates of cities. I have converted the coordinates to geometry data (Points).
Now I want to test whether the Points are in the USA.
Both are dtype: geometry. I thought this would make them easy to compare, but my code returns False for every Point, even for Points that are in the USA.
The code is:
import geopandas as gp
import pandas as pd
import xlsxwriter
import xlrd
from shapely.geometry import Point, Polygon
df1 = pd.read_excel('PATH')
gdf = gp.GeoDataFrame(df1, geometry= gp.points_from_xy(df1.longitude, df1.latitude))
US = gp.read_file('PATH')
print(gdf['geometry'].contains(US['geometry']))
Does anybody know what I am doing wrong?
contains in GeoPandas currently works on a pairwise basis (1-to-1), not 1-to-many. For this purpose, use sjoin.
# `predicate` replaces the older `op` keyword (GeoPandas 0.10 and later)
points_within = gp.sjoin(gdf, US, predicate='within')
That will return only those points within the US. Alternatively, you can filter polygons which contain points.
polygons_contains = gp.sjoin(US, gdf, predicate='contains')
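Here is a minimal sketch of the difference on made-up data. The frames `states` and `cities` and their columns are hypothetical, and the `predicate` keyword assumes GeoPandas 0.10 or later. The spatial join tests every point against every polygon, while a plain shapely geometry on the right-hand side of within is tested against each row.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical data: one square "state" and three "cities"
states = gpd.GeoDataFrame({"state": ["A"]}, geometry=[box(0, 0, 10, 10)])
cities = gpd.GeoDataFrame(
    {"city": ["inside_1", "outside", "inside_2"]},
    geometry=[Point(1, 1), Point(20, 20), Point(5, 5)],
)

# Spatial join: each point is tested against every polygon
points_within = gpd.sjoin(cities, states, how="inner", predicate="within")
print(points_within[["city", "state"]])

# With a single polygon, a plain shapely geometry can be used directly
in_first_state = cities.within(states.geometry.iloc[0])
print(in_first_state.tolist())
```

With real data, both frames should be in the same CRS before joining.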
I have written code to do point-in-polygon tests in Python; the program uses a shapefile that I read in as the polygons.
I now have a dataframe I read in with a column containing the polygon coordinates, e.g. [[28.050815,-26.242253],[28.050085,-26.25938],[28.011934,-26.25888],[28.020216,-26.230127],[28.049828,-26.230704],[28.050815,-26.242253]].
I want to transform this column into polygons in order to perform point-in-polygon tests, but all the examples use geometry = [Point(xy) for xy in zip(dataPoints['Long'], dataPoints['Lat'])], and my coordinates are already zipped.
How would I go about achieving this?
Thanks
Taking your example above, you could do the following:
list_coords = [[28.050815,-26.242253],[28.050085,-26.25938],[28.011934,-26.25888],[28.020216,-26.230127],[28.049828,-26.230704],[28.050815,-26.242253]]
from shapely.geometry import Point, Polygon
# Create a list of point objects using list comprehension
point_list = [Point(x,y) for [x,y] in list_coords]
# Create a polygon object from the list of Point objects
polygon_feature = Polygon([[poly.x, poly.y] for poly in point_list])
And if you would like to apply it to a dataframe you could do the following:
import pandas as pd
import geopandas as gpd
df = pd.DataFrame({'coords': [list_coords]})
def get_polygon(list_coords):
    point_list = [Point(x,y) for [x,y] in list_coords]
    polygon_feature = Polygon([[poly.x, poly.y] for poly in point_list])
    return polygon_feature
df['geom'] = df['coords'].apply(get_polygon)
However, there might be geopandas built-in functions in order to avoid "reinventing the wheel", so let's see if anyone else has a suggestion :)
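Following up on that: as far as I can tell, shapely's Polygon accepts the list of [x, y] pairs directly, so the intermediate Point objects are not strictly needed. A minimal sketch, reusing the coordinates from the question (the two test points are made up for illustration):

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point, Polygon

list_coords = [[28.050815, -26.242253], [28.050085, -26.25938],
               [28.011934, -26.25888], [28.020216, -26.230127],
               [28.049828, -26.230704], [28.050815, -26.242253]]

df = pd.DataFrame({"coords": [list_coords]})

# Polygon takes each coordinate list as-is
gdf = gpd.GeoDataFrame(df, geometry=[Polygon(c) for c in df["coords"]])

# Hypothetical test points, in the same (x, y) order as the polygon
inside = Point(28.03, -26.245)
outside = Point(0, 0)
print(gdf.contains(inside).tolist(), gdf.contains(outside).tolist())
```

Once the column is a real geometry column, the usual GeoPandas predicates (contains, within, sjoin) are available on it.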
I have a list of customers' latitudes and longitudes, and I want to determine which ones fall within a given polygon.
But according to my results none of them are in that polygon, which is not correct.
Could you please help? Thanks!
from shapely.geometry import Polygon
from shapely.geometry import Point
import pandas as pd
import geopandas as gpd
df=pd.read_csv("C:\\Users\\n.nguyen.2\\Documents\\order from May 1.csv")
geometry=[Point(xy) for xy in zip(df['customer_lat'],df['customer_lng'])]
crs='EPSG:4326'
gdf=gpd.GeoDataFrame(df,crs=crs,geometry=geometry)
gdf.head()
polygon= Polygon ([(103.85362669999994, 1.4090082), (103.8477709, 1.4051988), (103.84821190000002, 1.4029509), (103.84933950000004, 1.4012179), (103.85182859999998, 1.4001453), (103.85393150000004, 1.3986867), (103.85745050000001, 1.3962412), (103.85809410000002, 1.3925516), (103.85843750000004, 1.3901491), (103.8583946, 1.3870601), (103.8585663, 1.3838853), (103.8582659, 1.3812682), (103.85822299999997, 1.3792946), (103.85843750000004, 1.3777931), (103.85882370000002, 1.3748757), (103.86015410000005, 1.3719582), (103.8607978, 1.3700276), (103.86092659999998, 1.368097), (103.86036880000006, 1.3657372), (103.8593174, 1.3633562), (103.85852339999995, 1.3607605), (103.85745050000001, 1.3581005), (103.8571071, 1.355655), (103.85736459999998, 1.3520941), (103.85873790000007, 1.3483615), (103.86187100000006, 1.3456583), (103.86488409999993, 1.340689), (103.87096889999998, 1.3378933), (103.87519599999996, 1.3373354), (103.88178349999998, 1.3408963), (103.88508790000004, 1.3433418), (103.89186870000005, 1.3436426), (103.89742610000008, 1.342355), (103.91813279999997, 1.3805388), (103.91824964404806, 1.3813377489306), (103.91433759243228, 1.38607494841128), (103.91607279999994, 1.3895484), (103.91942029999996, 1.3940104), (103.92903330000001, 1.4009604), (103.9342689, 1.402076), (103.93289559999994, 1.4075675), (103.92534249999994, 1.4146035), (103.92517090000003, 1.4211246), (103.90972139999997, 1.4238704), (103.89942169999993, 1.4202666), (103.89744760000008, 1.4224117), (103.89315599999998, 1.425758), (103.88740540000003, 1.4285896), (103.88148309999995, 1.4328798), (103.87478829999998, 1.4331372), (103.85918850000007, 1.4249644), (103.85401679999995, 1.4114284), (103.85362669999994, 1.4090082)])
gdf['answer']=gdf['geometry'].within(polygon)
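# ExcelWriter as a context manager saves the file on exit (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter("C:\\Users\\n.nguyen.2\\Documents\\order may define1.xlsx") as writer:
    gdf.to_excel(writer, sheet_name='Sheet1', index=False)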
The results are all false.
Adding my comments as an answer for future reference.
You have switched longitude and latitude in the order of coordinates. Look at coordinates of your polygon and those of points. Coordinates of points you have generated are (Lat, Lon), while your polygon (Lon, Lat). So these points are not within this polygon. Do
geometry=[Point(xy) for xy in zip(df['customer_lng'],df['customer_lat'])]
instead and it will work.
To make your life easier, geopandas has a helper function, points_from_xy(), for creating points from x and y coordinate columns (http://geopandas.org/gallery/create_geopandas_from_pandas.html?highlight=points_from_xy)
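A minimal sketch of the fix with points_from_xy, assuming the column names from the question. The customer rows are invented, and `area` is a simplified rectangular stand-in for the real polygon, in (lng, lat) order:

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Hypothetical customers; the column names follow the question
df = pd.DataFrame({
    "customer_lat": [1.38, 1.30],
    "customer_lng": [103.88, 103.70],
})

# Simplified stand-in for the polygon in the question, in (lng, lat) order
area = box(103.84, 1.33, 103.94, 1.44)

# x is longitude, y is latitude
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["customer_lng"], df["customer_lat"]),
    crs="EPSG:4326",
)
gdf["answer"] = gdf.within(area)

# The swapped order from the question puts every point far outside the area
swapped = gpd.GeoSeries(gpd.points_from_xy(df["customer_lat"], df["customer_lng"]))
print(gdf["answer"].tolist(), swapped.within(area).tolist())
```

A quick sanity check for this kind of bug is to print one point and one polygon vertex side by side and confirm the first number is in the same range for both.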