How to see a city map when plotting with the GeoPandas library - Python

I have just started learning the GeoPandas library in Python.
I have a dataset with the longitude (E) and latitude (N) of car accidents in Belgrade.
I want to plot those points on a map of Belgrade.
This is my code:
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
pd.set_option('display.max_rows', 150)
pd.set_option('display.max_columns', 200)
pd.set_option('display.width', 5000)
# reading csv into geopandas
geo_df = gpd.read_file('SaobracajBeograd.csv')
geo_df.columns = ["ID", "Date,Time", "E", "N", "Outcome", "Type", "Description", "geometry"]
geo_df.geometry = gpd.points_from_xy(geo_df.E, geo_df.N)
#print(geo_df)
# reading built in dataset for each city
world_cities = gpd.read_file(gpd.datasets.get_path('naturalearth_cities'))
# I want to plot geometry column only for Belgrade
ax = world_cities[world_cities.name == 'Belgrade'].plot(figsize=(7, 7), alpha=0.5, edgecolor='black')
geo_df.plot(ax=ax, color='red')
plt.show()
This is the result that I get:
How can I prettify this plot so that I can see the map of the city (with streets if possible, in color) and smaller red dots?

As per the comments, folium provides a base map for the overall geometry.
I have added two layers:
- Belgrade. I obtained this geometry from osmnx; that is beyond the scope of this question, so I have just included the polygon as a WKT string.
- The points that you provided via the link in the comments.
from pathlib import Path
import pandas as pd
import geopandas as gpd
import shapely.wkt  # import the submodule explicitly so shapely.wkt.loads is available
import folium
# downloaded data
df = pd.read_csv(
Path.home().joinpath("Downloads/SaobracajBeograd.csv"),
names=["ID", "Date,Time", "E", "N", "Outcome", "Type", "Description"],
)
# create geodataframe, NB CRS
geo_df = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df["E"], df["N"]), crs="epsg:4326"
)
# couldn't find belgrade geometry, used osmnx and simplified geometry as a WKT string
belgrade_poly = shapely.wkt.loads(
"POLYGON ((20.2213764 44.9154621, 20.2252450 44.9070062, 20.2399466 44.9067193, 20.2525385 44.8939145, 20.2419942 44.8842235, 20.2610016 44.8826597, 20.2794675 44.8754192, 20.2858284 44.8447802, 20.2856918 44.8332410, 20.3257447 44.8342507, 20.3328068 44.8098272, 20.3367239 44.8080890, 20.3339619 44.8058144, 20.3353253 44.8011005, 20.3336310 44.8003791, 20.3360230 44.7898245, 20.3384687 44.7907875, 20.3405086 44.7859144, 20.3417344 44.7872272, 20.3474466 44.7713203, 20.3509860 44.7687822, 20.3398029 44.7558716, 20.3220093 44.7448572, 20.3160895 44.7387338, 20.3235092 44.7345531, 20.3359605 44.7308053, 20.3437350 44.7301552, 20.3450306 44.7243651, 20.3497410 44.7209764, 20.3521450 44.7143627, 20.3633795 44.7046060, 20.3830709 44.7030441, 20.3845248 44.7011631, 20.3847991 44.7032182, 20.3924066 44.7036702, 20.4038881 44.6984458, 20.4097684 44.6992834, 20.4129839 44.7024603, 20.4192098 44.7021308, 20.4217436 44.7034920, 20.4251744 44.6976337, 20.4279418 44.6980838, 20.4313251 44.6940680, 20.4358368 44.6933579, 20.4402665 44.6905161, 20.4452138 44.6910160, 20.4495428 44.6880459, 20.4539572 44.6888231, 20.4529809 44.6911331, 20.4550753 44.6919188, 20.4534174 44.6929137, 20.4571253 44.6957696, 20.4570013 44.7008391, 20.4614601 44.7027894, 20.4646634 44.7018970, 20.4674388 44.7050131, 20.4753542 44.7039532, 20.4760757 44.7050260, 20.4802055 44.7033479, 20.4867635 44.7061539, 20.4983359 44.7022445, 20.5049892 44.7021663, 20.5071809 44.7071295, 20.5027682 44.7154832, 20.5028502 44.7217294, 20.5001912 44.7225288, 20.5007294 44.7251513, 20.5093727 44.7271542, 20.5316662 44.7248060, 20.5385861 44.7270519, 20.5390058 44.7329843, 20.5483761 44.7280993, 20.5513810 44.7308508, 20.5510751 44.7340860, 20.5483958 44.7345580, 20.5503614 44.7352316, 20.5509440 44.7434333, 20.5416617 44.7521169, 20.5358563 44.7553171, 20.5348919 44.7609694, 20.5393015 44.7624855, 20.5449353 44.7698750, 20.5490005 44.7708792, 20.5488362 44.7733456, 20.5647717 44.7649237, 20.5711431 44.7707818, 20.5772388 
44.7711074, 20.5798915 44.7727751, 20.5852472 44.7808647, 20.5817268 44.7826053, 20.5823183 44.7845765, 20.5792147 44.7843299, 20.5777701 44.7872565, 20.5744279 44.7854098, 20.5740215 44.7886805, 20.5693220 44.7911579, 20.5655386 44.7906451, 20.5635444 44.7921747, 20.5598333 44.7901679, 20.5536143 44.7898282, 20.5502434 44.7909478, 20.5435002 44.8022967, 20.5424780 44.8073064, 20.5474459 44.8103678, 20.5530335 44.8102412, 20.5652728 44.8188428, 20.5738545 44.8279189, 20.5724006 44.8315147, 20.5776931 44.8371416, 20.5765153 44.8378971, 20.5863097 44.8427122, 20.5826128 44.8462544, 20.5762290 44.8486489, 20.5825139 44.8520894, 20.5953933 44.8552493, 20.6206689 44.8543410, 20.6212821 44.8560293, 20.6173687 44.8574761, 20.5961883 44.8615803, 20.5928447 44.8609861, 20.5911876 44.8626994, 20.6019440 44.8670619, 20.6196285 44.8673213, 20.6232109 44.8693710, 20.6164092 44.8815202, 20.6152606 44.8895682, 20.5777643 44.8860527, 20.5311826 44.8712209, 20.5230234 44.8646244, 20.5226088 44.8685278, 20.5187616 44.8654899, 20.5197414 44.8694015, 20.5132944 44.8687179, 20.5076686 44.8735038, 20.5065584 44.8670548, 20.4991594 44.8719635, 20.4938631 44.8734651, 20.4821047 44.8723679, 20.4737899 44.8677144, 20.4661802 44.8592493, 20.4594505 44.8560945, 20.4600397 44.8546034, 20.4650988 44.8535738, 20.4600110 44.8491680, 20.4623204 44.8477906, 20.4603705 44.8445375, 20.4711373 44.8342913, 20.4706338 44.8317839, 20.4498025 44.8343946, 20.4244846 44.8431449, 20.4138827 44.8526577, 20.3912248 44.8598333, 20.3749815 44.8683583, 20.3617778 44.8791076, 20.3436922 44.9103973, 20.3390650 44.9117584, 20.3011288 44.9426876, 20.2946156 44.9402419, 20.2960052 44.9381397, 20.2746476 44.9304194, 20.2703905 44.9345682, 20.2213764 44.9154621))"
)
# plot belgrade city limits
m = gpd.GeoDataFrame(geometry=[belgrade_poly], crs="epsg:4326").explore(name="Belgrade", height=300, width=500)
# plot the points, just for demo purposes plot outcomes as different colors
m = geo_df.explore(m=m, column="Outcome", cmap=["red","green","blue"], name="points")
# add layer control so layers can be switched on / off
folium.LayerControl().add_to(m)
m
supplementary update
Obtain Belgrade geometry
import osmnx as ox
gdf = ox.geocode_to_gdf({'city': 'Belgrade'})

You cannot get a map of the city from the world_cities dataset.
For example, if you check
belgrade = world_cities[world_cities.name == 'Belgrade']
it returns the following geopandas dataframe
         name                   geometry
102  Belgrade  POINT (20.46604 44.82059)
The geometry is a Point, which is just a longitude and a latitude. To get the shape of the city, the geometry must be a polygon of some sort.
For example, if you extract the country from the world dataset as follows:
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
serbia = world[world.name == "Serbia"]
it returns the following geopandas dataframe for Serbia
     pop_est continent    name iso_a3  gdp_md_est                                           geometry
172  7111024    Europe  Serbia    SRB    101800.0  POLYGON ((18.82982 45.90887, 18.82984 45.90888...
As you can see, the geometry is a polygon. Now you can call serbia.plot() to get the map of Serbia.
To get the map of the city, you first need to download the city's shapefile (the *.shp file along with its supporting files) and read it with gpd.read_file("file.shp"). Only then can you plot the map of the required city.
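The difference between a point and a polygon geometry described above can be checked with geom_type before plotting. This is a minimal sketch, assuming a made-up rectangle as a stand-in for a real city boundary (its coordinates are illustrative only, not Belgrade's actual limits):

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Point taken from the naturalearth_cities row shown above
city_point = Point(20.46604, 44.82059)
# Hypothetical rectangle standing in for a real boundary polygon
city_outline = Polygon([(20.2, 44.7), (20.6, 44.7), (20.6, 44.9), (20.2, 44.9)])

gdf = gpd.GeoDataFrame(
    {"name": ["Belgrade (point)", "Belgrade (outline)"]},
    geometry=[city_point, city_outline],
    crs="EPSG:4326",
)

# A Point has no outline to draw; only the Polygon row gives a city shape
geom_types = list(gdf.geom_type)
point_in_outline = city_outline.contains(city_point)
print(geom_types, point_in_outline)
```

Only rows whose geom_type is Polygon or MultiPolygon will draw as a shape when plotted.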


Draw polygons around a set of points and create clusters in python

I have a Pandas DataFrame containing lat/long coordinates. How do I draw non-overlapping polygons around clusters of points and aggregate the geometries in a geopandas DataFrame? Below is sample code to work with:
import pandas as pd
import numpy as np
import geopandas as gpd
df = pd.DataFrame({
'yr': [2018, 2017, 2018, 2016],
'id': [0, 1, 2, 3],
'v': [10, 12, 8, 10],
'lat': [32.7418248, 32.8340583, 32.8340583, 32.7471895],
'lon':[-97.524066, -97.0805484, -97.0805484, -96.9400779]
})
# the coordinates are lon/lat, so declare them as EPSG:4326 first
df = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['lon'], df['lat']), crs="EPSG:4326")
# reproject to a projected CRS for buffer calculations
df = df.to_crs("ESRI:102003")
The polygons can be of any shape, but each must include a minimum of 5 points. I tried creating a buffer around the points, but a circle is not the ideal solution. I am looking for a way to draw a more flexible polygon.
This polygon representation will be added as a new column to the dataframe containing the points.
https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.buffer.html
Your question and sample data do not fit together: you say you want clusters of 5 points or more but provide only 4 points, which leaves whoever answers to find some data. Better practice is to provide an MWE of what you have tried, which can then be turned into the solution you want. I have used UK hospitals to get some data with lat / lon.
From your other questions, it is clear you have tried using geohash as a solution. Let's explore this:
- get the geohash for each point with geolib.geohash.encode()
- aggregate the points in the same geohash using dissolve(). This gives a MULTIPOINT geometry. Convert this to a POLYGON using convex_hull
- you now have polygons that do not overlap and that contain clusters of points. This does not ensure that a cluster has a minimum of 5 points
import requests, io
import pandas as pd
import numpy as np
import geopandas as gpd
import geolib.geohash
import folium
# get some data that meets sample with enough data
df = (
pd.read_csv(
io.StringIO(requests.get("https://assets.nhs.uk/data/foi/Hospital.csv").text),
        sep="¬",
engine="python",
)
.rename(columns={"Latitude": "lat", "Longitude": "lon"})
.loc[:, ["lat", "lon"]]
).dropna()
df["id"] = df.index
df["yr"] = np.random.choice(range(2016, 2019), len(df))
df["v"] = np.random.randint(0, 11, len(df))
# get geohash so points in same area can be clustered
# geolib's encode() takes latitude first, then longitude, then precision
df["geohash"] = df.apply(lambda r: geolib.geohash.encode(r["lat"], r["lon"], 3), axis=1)
# construct geodataframe
gdf = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df["lon"], df["lat"]), crs="epsg:4326"
)
# cluster points to polygons
gdf2 = gdf.dissolve(by="geohash", aggfunc={"v": "sum", "id":"count", "yr":"mean"})
gdf2["geometry"] = gdf2["geometry"].convex_hull
# let's visualise everything
m = gdf2.explore(color="green", name="cluster", height=300, width=600)
m = gdf.explore(column="geohash", m=m, name="points")
folium.LayerControl().add_to(m)
m
Use Geopandas convex hull.
The convex hull of a geometry is the smallest convex Polygon containing all the points in each geometry.
https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.convex_hull.html
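The dissolve-then-convex-hull approach above does not by itself enforce the minimum of 5 points per cluster. A minimal sketch of one way to add that filter, assuming random sample points and a hypothetical 1-degree grid cell as the cluster key (a stand-in for the geohash; neither the data nor the key comes from the original posts):

```python
import geopandas as gpd
import numpy as np
import pandas as pd

# Random sample points (assumption: illustrative data only)
rng = np.random.default_rng(0)
df = pd.DataFrame({"lon": rng.uniform(0, 2, 40), "lat": rng.uniform(0, 2, 40)})

# Hypothetical cluster key: the 1-degree grid cell each point falls in
df["cell"] = (
    np.floor(df["lon"]).astype(int).astype(str)
    + "_"
    + np.floor(df["lat"]).astype(int).astype(str)
)
df["n"] = 1  # one per point, summed by dissolve to count points per cluster

gdf = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df["lon"], df["lat"]), crs="EPSG:4326"
)

# Merge points per cell, then wrap each group of points in its convex hull
clusters = gdf.dissolve(by="cell", aggfunc={"n": "sum"})
clusters["geometry"] = clusters.geometry.convex_hull

# Keep only clusters with at least 5 points
clusters = clusters[clusters["n"] >= 5]
print(clusters[["n"]])
```

Because each hull stays inside its own grid cell, hulls from different cells cannot overlap; the same reasoning applies to geohash cells.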

How to join point with polygon in geopandas

I have a polygon given as a sequence of lat-long pairs (lat1-long1, lat2-long2, ...) and a point given as lat-long.
I am using the GeoPandas library to check whether the point lies within the polygon.
Sample polygon data, saved in a CSV file:
POLYGON((28.56056 77.36535,28.564635293716776
77.3675137204626,28.56871055311656 77.36967760850214,28.572785778190855 77.3718416641586,28.576860968931193 77.37400588747194,28.580936125329096 77.3761702784821,28.585011247376094 77.37833483722912,28.58908633506372 77.38049956375293,28.593161388383457 77.38266445809356,28.59723640732686 77.38482952029099,28.60131139188541 77.38699475038526,28.605386342050664 77.38916014841635,28.60946125781409 77.39132571442434,28.613536139167238 77.39349144844923,28.61761098610158 77.39565735053108,28.62168579860863 77.39782342070995,28.62576057667991 77.39998965902589,28.62983532030691 77.402156065519,28.633910029481108 77.40432264022931,28.637984704194054 77.40648938319696,28.642059344437207 77.408656294462,28.64068221074683 77.41187044231611,28.63920739580329 77.41502778244606,28.63763670052024 77.41812446187686,28.635972042808007 77.42115670220443,28.634215455216115 77.42412080422613,28.63236908243526 77.42701315247152,28.630435178662026 77.42983021962735,28.628416104829583 77.43256857085188,28.626314325707924 77.43522486797251,28.624132406877322 77.437795873562,28.621873011578572 77.44027845488824,28.619538897444272 77.4426695877325,28.617132913115164 77.44496636007166,28.614657994745563 77.44716597562005,28.612117162402576 77.44926575722634,28.609513516363293 77.45126315012166,28.606850233314923 77.45315572501488,28.604130562462267 77.45494118103147,28.60135782154758 77.45661734849246,28.598535392787774 77.45818219153013,28.595666718733966 77.45963381053753,28.592755298058414 77.46097044444889,28.589804681274302 77.46219047284835,28.586818466393503 77.46329241790465,28.583800294527727 77.46427494612952,28.58075384543836 77.46513686995802,28.57768283304089 77.46587714914885,28.574591000868892 77.4664948920035,28.571482117503592 77.46698935640259,28.568359971974488 77.46735995065883,28.565228369136484 77.46760623418534,28.56209112502966 77.4677279179792,28.558952062226695 77.4677248649196,28.55581500517431 77.46759708988064,28.552683775533943 
77.46734475965891,28.552683775533943 77.46734475965891,28.553079397193876 77.4622453846313,28.553474828308865 77.45714597129259,28.55387006887434 77.4520465196603,28.554265118885752 77.44694702975198,28.554659978338513 77.4418475015852,28.555054647228083 77.43674793517746,28.555449125549913 77.43164833054634,28.555843413299442 77.42654868770937,28.55623751047213 77.42144900668411,28.556631417063407 77.41634928748812,28.55702513306874 77.41124953013893,28.55741865848359 77.40614973465412,28.557811993303396 77.40104990105122,28.55820513752363 77.39595002934782,28.558598091139757 77.39085011956145,28.558990854147225 77.38575017170969,28.559383426541523 77.3806501858101,28.559775808318093 77.37555016188024,28.560167999472434 77.37045009993768,28.56056 77.36535))
The second dataset holds the LAT and LONG of the point, 28.56282 and 77.36824 respectively, also saved in a CSV file.
I used the Python code below to join both datasets on the condition that the point lies within the polygon:
import pandas as pd
import shapely.geometry
from shapely.geometry import Point
import geopandas as gpd
site_df = pd.read_csv (r'lat_long_file.csv') # load lat and long file
site_df['geometry'] = pd.DataFrame(site_df).apply(lambda x: Point(x.LAT,x.LONG), axis='columns') # convert lat and long to point
gdf = gpd.GeoDataFrame(site_df, geometry = site_df.geometry,crs='EPSG:4326') #creating geo pandas data frame for point
from shapely import wkt
polygon_df = pd.read_csv (r'polygon_csv_file') #reading polygon sample raw string file
polygon_df['geometry'] = pd.DataFrame(polygon_df).apply(lambda row: shapely.wkt.loads(row.polygon), axis='columns') #converting string polygon to geometory
gd_polygon = gpd.GeoDataFrame(polygon_df, geometry = polygon_df.geometry,crs='EPSG:4326') #create geopandas dataframe
import shapely.speedups
shapely.speedups.enable() # this makes some spatial queries run faster
join_data = gpd.sjoin(gdf, gd_polygon, how="inner", op="within")  # actual join condition
But that query does not return anything, even though the point lies within the polygon, as the diagram below shows.
The green location marker is the point (lat and long), which lies within the polygon.
I would check the axis order: WKT is usually interpreted in longitude-first, latitude-second order, while the point you construct uses latitude, longitude order.
You can try removing the CRS identifier to see if it changes the result.
Also see
https://gis.stackexchange.com/questions/376751/shapely-flips-lat-long-coordinate
and
https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
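The axis-order point above can be illustrated with shapely alone. This is a minimal sketch, assuming a small hypothetical square written in longitude-latitude (x y) order around the point from the question; it is not the polygon from the question:

```python
from shapely import wkt
from shapely.geometry import Point

# Hypothetical square in "lon lat" (x y) order around the question's point
poly = wkt.loads(
    "POLYGON ((77.36 28.56, 77.37 28.56, 77.37 28.57, 77.36 28.57, 77.36 28.56))"
)

lat, lon = 28.56282, 77.36824

# Point takes x first, then y, i.e. longitude first, then latitude
inside_lon_lat = Point(lon, lat).within(poly)
# Swapping the arguments places the point somewhere else entirely
inside_lat_lon = Point(lat, lon).within(poly)

print(inside_lon_lat, inside_lat_lon)
```

Whichever order is chosen, the points and the polygon have to use the same one, and it has to match the declared CRS.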
Your sample data is unusable as it is an image, so I have:
- sourced a polygon, a county boundary in the UK
- constructed a geopandas data frame of a point that is within this county
- used plotly to show the data visually
- used your code fragment gpd.sjoin(gdf, gd_polygon, how="inner", op="within") to do the spatial join, and it correctly joins the point to the polygon
import requests, json
import geopandas as gpd
import plotly.express as px
import shapely.geometry
# fmt: off
# get a polygon and construct a point
res = requests.get("https://opendata.arcgis.com/datasets/69dc11c7386943b4ad8893c45648b1e1_0.geojson")
gd_polygon = gpd.GeoDataFrame.from_features(res.json()).loc[lambda d: d["LAD20NM"].str.contains("Hereford")]
gdf = gpd.GeoDataFrame(geometry=gd_polygon.loc[:,["LONG","LAT"]].apply(shapely.geometry.Point, axis=1)).reset_index(drop=True)
# fmt: on
# plot to show point is within polygon
px.scatter_mapbox(gd_polygon, lon="LONG", lat="LAT").update_traces(
name="gd_polygon"
).add_traces(
px.scatter_mapbox(gdf, lat=gdf.geometry.y, lon=gdf.geometry.x)
.update_traces(name="gdf", marker_color="red")
.data
).update_traces(
showlegend=True
).update_layout(
mapbox={
"style": "carto-positron",
"layers": [
{"source": json.loads(gd_polygon.geometry.to_json()), "type": "line"}
],
}
).show()
# spatial join, all good :-)
gpd.sjoin(gdf, gd_polygon, how="inner", op="within")
output
The spatial join has worked; the point is within the polygon. The single row returned (index 0), shown one field per line:
geometry        POINT (-2.73931 52.081539)
index_right     18
OBJECTID        19
LAD20CD         E06000019
LAD20NM         Herefordshire, County of
LAD20NMW        (empty)
BNG_E           349434
BNG_N           242834
LONG            -2.73931
LAT             52.0815
Shape__Area     2.18054e+09
Shape__Length   285427

I am trying to visualise railway tracks from shapefiles in Python

I am trying to visualise railway tracks using Plotly lines on a map. I am a beginner working with GIS data and visualising railway networks. I am not sure about my approach, and I don't want to use ArcGIS or QGIS.
Problem: I am not sure what should go in the names attribute.
It would be great if I could get an overview of where my code is going wrong.
The zip file contains the .shp, .dbf, .shx, .prj and .cpj files.
The error I am getting is 'GeoDataFrame' object has no attribute 'name'
This is the code I am working on.
import plotly.express as px
import geopandas as gpd
import shapely.geometry
import numpy as np
import wget
geo_df = gpd.read_file(r'railwaytrack.zip' )
lattiudes = [47.04691]
longitudes = [8.37467]
names=[ ]
for feature, name in zip(geo_df.geometry, geo_df.name):
    if isinstance(feature, shapely.geometry.linestring.LineString):
        linestrings = [feature]
    elif isinstance(feature, shapely.geometry.multilinestring.MultiLineString):
        linestrings = feature.geoms
    else:
        continue
    for linestring in linestrings:
        x, y = linestring.xy
        lattiudes = np.append(lattiudes, y)
        longitudes = np.append(longitudes, x)
        names = np.append(names, [name]*len(y))
        # a None entry separates one line from the next
        lattiudes = np.append(lattiudes, None)
        longitudes = np.append(longitudes, None)
        names = np.append(names, None)
fig = px.line_geo(lat=lattiudes, lon=longitudes)
fig.show()
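The error message says that the shapefile has no column called name, so geo_df.name cannot be resolved; printing geo_df.columns shows which attribute columns are actually available. A minimal sketch of the flattening loop, written against plain shapely geometries so it can be tried without the zip file. The helper name lines_to_lat_lon and the sample labels are hypothetical:

```python
from shapely.geometry import LineString, MultiLineString


def lines_to_lat_lon(geometries, names):
    """Flatten line geometries into lat/lon/label lists, with None between lines."""
    lats, lons, labels = [], [], []
    for geom, name in zip(geometries, names):
        if isinstance(geom, LineString):
            parts = [geom]
        elif isinstance(geom, MultiLineString):
            parts = list(geom.geoms)
        else:
            continue  # skip anything that is not a line
        for part in parts:
            x, y = part.xy
            lats.extend(y)
            lons.extend(x)
            labels.extend([name] * len(y))
            # None breaks the trace so separate lines are not joined up
            lats.append(None)
            lons.append(None)
            labels.append(None)
    return lats, lons, labels


# Made-up sample geometries and labels
geoms = [
    LineString([(8.0, 47.0), (8.1, 47.1)]),
    MultiLineString([[(8.2, 47.2), (8.3, 47.3)], [(8.4, 47.4), (8.5, 47.5)]]),
]
lats, lons, labels = lines_to_lat_lon(geoms, ["a", "b"])
print(lats)
```

With the real data, the second argument would be whichever existing column holds a label, for example geo_df["<some column>"], where the column name depends on the shapefile.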

Check if the point within a polygon (speed up)

I have 2 data frames. 1) data - long and lat points 2) border = shapefile of a city
I need to check which points are within the shapefile and save them. Here is my code to do that:
Data
city = pd.read_csv("D:...path.../data.csv")
crs = {'init':'epsg:4326'}
geometry = [Point(xy) for xy in zip(city.longitude,city.latitude)]
city_point = gpd.GeoDataFrame(city,crs=crs,geometry=geometry)
Border
border = gpd.read_file("C:...path.../border.shp")
border_gdf = gpd.GeoDataFrame(border, geometry='geometry')
Final check
city_point['inside'] = city_point['geometry'].apply(border_gdf.contains)
city_point = city_point[city_point.inside != True]
Libraries
import numpy as np
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point, Polygon
city_point[city_point.geometry.within(border_gdf.iloc[0].geometry)]
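The one-liner above replaces the per-row apply with a single vectorised within call against the border geometry. A minimal sketch with made-up data, assuming the border consists of a single polygon (here a hypothetical unit square rather than a real shapefile):

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon

# Made-up points; two fall inside the square, two outside
city = pd.DataFrame(
    {"longitude": [0.5, 1.5, 0.2, 3.0], "latitude": [0.5, 0.5, 0.9, 3.0]}
)
city_point = gpd.GeoDataFrame(
    city,
    geometry=gpd.points_from_xy(city["longitude"], city["latitude"]),
    crs="EPSG:4326",
)

# Hypothetical border: a unit square standing in for the city shapefile
border_gdf = gpd.GeoDataFrame(
    geometry=[Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])], crs="EPSG:4326"
)

# One vectorised call instead of a Python-level apply over every row
border_shape = border_gdf.geometry.iloc[0]
inside_mask = city_point.geometry.within(border_shape)
points_inside = city_point[inside_mask]
print(points_inside)
```

If the border file holds several polygons, they would need to be merged into one geometry first, or the check done with a spatial join.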

using python to project lat lon geometry to utm

I have a dataframe with earthquake data called eq that has columns listing latitude and longitude. using geopandas I created a point column with the following:
from geopandas import GeoSeries, GeoDataFrame
from shapely.geometry import Point
s = GeoSeries([Point(x,y) for x, y in zip(eq['longitude'], eq['latitude'])])
eq['geometry'] = s
eq.crs = {'init': 'epsg:4326', 'no_defs': True}
eq
Now I have a geometry column with lat lon coordinates but I want to change the projection to UTM. Can anyone help with the transformation?
Latitude/longitude aren't really a projection, but sort of a default "unprojection". See this page for more details, but it probably means your data uses WGS84 or epsg:4326.
Let's build a dataset and, before we do any reprojection, we'll define the crs as epsg:4326
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point
df = pd.DataFrame({'id': [1, 2, 3], 'population' : [2, 3, 10], 'longitude': [-80.2, -80.11, -81.0], 'latitude': [11.1, 11.1345, 11.2]})
s = gpd.GeoSeries([Point(x,y) for x, y in zip(df['longitude'], df['latitude'])])
geo_df = gpd.GeoDataFrame(df[['id', 'population']], geometry=s)
# Define crs for our geodataframe:
geo_df.crs = {'init': 'epsg:4326'}
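I'm not sure what you mean by "UTM projection". From the Wikipedia page I see there are 60 different UTM zones, each with its own projection, depending on the area of the world. You can find the appropriate EPSG code online, but I'll just give you an example with an arbitrary EPSG code. Note that the code used below, EPSG:3395, is World Mercator rather than a UTM zone; for a real UTM reprojection, substitute the code of the zone that covers your data.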
How do you do the reprojection? You can easily get this info from the geopandas docs on projection. It's just one line:
geo_df = geo_df.to_crs({'init': 'epsg:3395'})
and the geometry isn't coded as latitude/longitude anymore:
   id  population                                      geometry
0   1           2   POINT (-8927823.161620541 1235228.11420853)
1   2           3   POINT (-8917804.407449147 1239116.84994171)
2   3          10  POINT (-9016878.754255159 1246501.097746004)
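Since there are 60 UTM zones, the EPSG code depends on where the data lies. A minimal sketch of how the WGS84 UTM code can be derived from a longitude and latitude, assuming the standard 6-degree zones and ignoring the special zone exceptions around Norway and Svalbard. The helper name utm_epsg is hypothetical:

```python
import math


def utm_epsg(lon, lat):
    """Return the EPSG code of the WGS84 UTM zone covering (lon, lat)."""
    # Zones are 6 degrees wide, numbered 1..60 starting at 180 degrees west
    zone = min(int(math.floor((lon + 180.0) / 6.0)) + 1, 60)
    # EPSG:326xx is the northern hemisphere, EPSG:327xx the southern
    base = 32600 if lat >= 0 else 32700
    return base + zone


# First point of the sample dataframe above
code = utm_epsg(-80.2, 11.1)
print(code)
```

The result could then be passed to geo_df.to_crs(epsg=code). Recent geopandas versions also provide estimate_utm_crs(), which picks a zone from the data's bounds.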
