I am working with the Kaggle Global Terrorism Database (https://www.kaggle.com/START-UMD/gtd/download) and I am trying to use geopandas for visualization.
I am also using the Natural Earth countries dataset (http://www.naturalearthdata.com/downloads/110m-cultural-vectors/110m-admin-0-countries/):
import pandas as pd  # needed later for pd.read_csv
import seaborn as sns
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Point
sns.set(style="ticks", context="poster")
# Load the Natural Earth country polygons and drop Antarctica
countries = gpd.read_file("C:/Users/petr7/Desktop/ne_110m_admin_0_countries/")
countries = countries[countries['NAME'] != "Antarctica"]
countries.plot(figsize=(15, 15))
Using the code above I can easily plot the whole of Europe.
After that, I import the Kaggle terrorism dataset and turn it into a geopandas GeoDataFrame:
DF = pd.read_csv("C:/Users/petr7/Desktop/gtd/globalterrorismdb_0718dist.csv", encoding='latin1')
# Build one point per row (longitude is x, latitude is y)
geometry = [Point(xy) for xy in zip(DF["longitude"], DF["latitude"])]
# Pass the CRS (WGS 84) to the constructor so it is actually attached to the data
geo_DF = gpd.GeoDataFrame(DF, geometry=geometry, crs="EPSG:4326")
geo_DF.head()
Up to this point everything works and the dataset can be inspected.
But when I try to plot it, it returns a nonsense plot:
geo_DF.plot()
I am pretty new to geopandas, so I wanted to ask what I am missing, and also how you would plot the whole map of Europe (countries.plot) with the terrorist attacks on top of it.
There is an error in the data. DF["longitude"].min() gives -86185896.0.
DF.loc[DF["longitude"] == DF["longitude"].min()]
As you can see if you run the snippet above, the row with the error is 17658.
The value seems to be missing its decimal point. If you do
DF.at[17658, 'longitude'] = -86.185896
before generating the geometry, it will work. Alternatively, you can drop the row if you are not sure exactly what is wrong with the data.
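If you would rather not patch rows by hand, a more general option is to discard every row whose coordinates are missing or fall outside the valid range before building the geometry. The following is a minimal sketch, assuming the columns are named longitude and latitude as in the question; drop_invalid_coordinates is a hypothetical helper name, not part of pandas or geopandas, and the sample values are made up.

```python
import pandas as pd

def drop_invalid_coordinates(df, lon_col="longitude", lat_col="latitude"):
    """Return only the rows whose coordinates are present and within WGS 84 bounds."""
    # between() is False for NaN, so rows with missing coordinates are dropped too
    valid = df[lon_col].between(-180, 180) & df[lat_col].between(-90, 90)
    return df[valid]

# Tiny made-up sample: one good row, one with a lost decimal point, one missing value
sample = pd.DataFrame({
    "longitude": [14.42, -86185896.0, None],
    "latitude": [50.09, 39.76, 10.0],
})
clean = drop_invalid_coordinates(sample)
print(len(clean))
```

The cleaned frame can then be passed to gpd.GeoDataFrame as before. To draw the attacks over the country polygons, one option is ax = countries.plot(figsize=(15, 15)) followed by geo_DF.plot(ax=ax, color='red', markersize=1).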
There seems to be an issue with my code. My goal is to plot a map that represents an outcome (population) across the regions of Benin.
import pandas as pd
import geopandas as gpd  # required for gpd.points_from_xy and gpd.GeoDataFrame
import matplotlib as mpl
database_path = "datafinalproject.csv"
database = pd.read_csv(database_path)
#Creating a geodataframe
points = gpd.points_from_xy(database["longitude"], database["latitude"], crs="EPSG:4326")
map = gpd.GeoDataFrame (database, geometry=points)
I get this message when I type map.plot, and when I type map.plot(column='population') I get an empty map.
Can you help me solve this problem?
database.head() gives:
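map.plot() displays the figure automatically in a Jupyter notebook, but in a regular Python script the figure is created without being shown. Note also that map.plot without parentheses does not call the method at all; it only displays the method object, which is likely the message you are seeing.
You should import matplotlib.pyplot and add plt.show() at the end of your code: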
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
database_path = "datafinalproject.csv"
database = pd.read_csv(database_path)
#Creating a geodataframe
points = gpd.points_from_xy(database["longitude"], database["latitude"], crs="EPSG:4326")
map = gpd.GeoDataFrame (database, geometry=points)
map.plot()
plt.show()
I'm trying to create a map of the following GeoJSON: https://github.com/nychealth/coronavirus-data/blob/master/Geography-resources/UHF_resources/UHF42.geo.json
I load it with GeoPandas and can plot it fine with matplotlib:
But when I try to plot it with Altair I get a blue square:
I don't know why it's not working. I've tried plotting other GeoJSONs with Altair and they work fine. I have also checked the geodataframe's crs and it's WGS 84, which is the recommended one for Altair.
Here's my code:
import pandas as pd
import geopandas as gpd
import altair as alt  # required for alt.Chart
gdf = gpd.read_file('https://raw.githubusercontent.com/nychealth/coronavirus-data/master/Geography-resources/UHF_resources/UHF42.geo.json')
print(gdf.crs)
# Matplotlib plot
gdf.plot()
# Altair plot
alt.Chart(gdf).mark_geoshape()
I'm new to working with maps in Altair, but here is an approach that works: when loading from a URL, use alt.Data(url=..., format=...) to turn the GeoJSON into a data source.
Edit:
Since you want to work with geopandas, and the current GeoDataFrame has no data column to graph, I took the 7-day case-rate data from the same GitHub repository and joined it to the shapes on 'id'.
import pandas as pd
import geopandas as gpd
import altair as alt
gdf = gpd.read_file('https://raw.githubusercontent.com/nychealth/coronavirus-data/master/Geography-resources/UHF_resources/UHF42.geo.json')
#print(gdf.crs)
data_url = 'https://raw.githubusercontent.com/nychealth/coronavirus-data/master/latest/now-transmission-by-uhf42.csv'
df =pd.read_csv(data_url)
df.columns = ['id', 'neighborhood_name', 'case_rate_7day']
url_geojson = 'https://raw.githubusercontent.com/nychealth/coronavirus-data/master/Geography-resources/UHF_resources/UHF42.geo.json'
data_geojson_remote = alt.Data(url=url_geojson, format=alt.DataFormat(property='features',type='json'))
alt.Chart(data_geojson_remote).mark_geoshape().encode(
color="case_rate_7day:Q"
).transform_lookup(
lookup='id',
from_=alt.LookupData(df, 'id', ['case_rate_7day'])
).project(
type='identity', reflectY=True
)
I've been trying to map the route of US domestic flights over a map of the United States through Geopandas. I can map my shapefile of flights without any problem, but when I try to add another layer showing the United States underneath, the resulting plot is the size of Geopandas' world map, but only shows the US and flights geometries.
import matplotlib as mp
import geopandas as gp
import numpy as np
import pandas as pd
import pyproj as pj
import matplotlib.pyplot as plt
import descartes
flights = gp.read_file(r"C:\Users\corne\git\flight\flights.shp")
usmap = gp.read_file(r"C:\Users\corne\git\flight\Igismap\UnitedStates_Boundary.shp")
usmap = usmap.to_crs("EPSG:4326")  # the {'init': ...} dict form is deprecated
fig, ax = plt.subplots(figsize = (12,6))
flights.plot(ax=ax)
usmap.plot(ax = ax)
Here is the resulting map (image caption: US-only data being plotted on a world map).
I can't see any base map in your code.
In JavaScript I use the .shp, .dbf and .prj files together with the compiled shape file, all four together. You will find all four files in the compressed archive you downloaded from igismap.
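One thing worth trying when a plot comes out at world scale is to restrict the axes to the bounding box of the layer of interest. This is a minimal sketch under stated assumptions, not a diagnosis of the original data: padded_limits is a hypothetical helper, and the bounding box below is made up to roughly resemble the contiguous US.

```python
def padded_limits(bounds, pad_frac=0.05):
    """Turn (minx, miny, maxx, maxy) into padded (xlim, ylim) tuples."""
    minx, miny, maxx, maxy = bounds
    pad_x = (maxx - minx) * pad_frac
    pad_y = (maxy - miny) * pad_frac
    return (minx - pad_x, maxx + pad_x), (miny - pad_y, maxy + pad_y)

# Illustrative bounding box in degrees (made-up values)
xlim, ylim = padded_limits((-125.0, 25.0, -65.0, 50.0), pad_frac=0.1)
print(xlim, ylim)
```

With geopandas, the input would be usmap.total_bounds (once both layers share the same CRS via to_crs), and the result would be applied with ax.set_xlim(*xlim) and ax.set_ylim(*ylim) after both layers have been plotted.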
I'm trying to plot a large number of latitude longitude values from a CSV file on a map, having this format (first column and second column):
I'm using Python 3.6 (apparently some libraries, like Basemap, don't work on this version).
How can I do that?
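If you are just looking at plotting the point data as a scatterplot, it is as simple as
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("Long_Lats.csv")  # CSV with 'Longitude' and 'Latitude' columns
plt.scatter(x=df['Longitude'], y=df['Latitude'])
plt.show()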
If you want to plot the points on the map, it's getting interesting because it depends more on how you plot your map.
A simple way is to use shapely and geopandas. The code below is not tested given my limited access on the laptop I am currently using, but it should give you a conceptual roadmap.
import pandas as pd
from shapely.geometry import Point
import geopandas as gpd
from geopandas import GeoDataFrame
df = pd.read_csv("Long_Lats.csv", delimiter=',', skiprows=0, low_memory=False)
# Build one point per row (longitude is x, latitude is y)
geometry = [Point(xy) for xy in zip(df['Longitude'], df['Latitude'])]
gdf = GeoDataFrame(df, geometry=geometry)
# This is a simple map that shipped with older geopandas versions.
# gpd.datasets was removed in geopandas 1.0; on newer versions, download the
# Natural Earth countries file and pass its path to gpd.read_file instead.
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
gdf.plot(ax=world.plot(figsize=(10, 6)), marker='o', color='red', markersize=15)
Find below an example of the rendered image:
You can also use Plotly Express to plot an interactive world map from latitude and longitude columns:
import plotly.express as px
import pandas as pd
df = pd.read_csv("location_coordinate.csv")
fig = px.scatter_geo(df,lat='lat',lon='long', hover_name="id")
fig.update_layout(title = 'World map', title_x=0.5)
fig.show()
Here's an example of plotting latitude and longitude values on a real OpenStreetMap base map:
import plotly.express as px
import pandas as pd
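# skipinitialspace strips the blank after each comma in the example CSV,
# so the columns are named "Lat" and "Long" rather than " Lat" and " Long"
df = pd.read_csv("dataset/dataset.csv", skipinitialspace=True)
# Drop rows with any missing value; newer pandas versions raise a TypeError
# when both how= and thresh= are passed, so the defaults are used here
df = df.dropna()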
color_scale = [(0, 'orange'), (1,'red')]
fig = px.scatter_mapbox(df,
lat="Lat",
lon="Long",
hover_name="Address",
hover_data=["Address", "Listed"],
color="Listed",
color_continuous_scale=color_scale,
size="Listed",
zoom=8,
height=800,
width=800)
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
Example CSV:
Address, Lat, Long, Listed
Address #1, -33.941, 18.467, 1250000
Address #2, -33.942, 18.468, 1900000
Address #3, -33.941, 18.467, 1200000
Address #4, -33.936, 18.467, 1195000
Address #5, -33.944, 18.470, 2400000
Example output (interactive map):
I've been stuck on a task for a couple of hours.
I have an Excel file with all cities with 300,000+ inhabitants and their coordinates. I have to plot them on a global map. For this I have the following code:
from IPython import get_ipython
get_ipython().magic('reset -sf')
get_ipython().magic('matplotlib')
from mpl_toolkits.basemap import Basemap
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from pandas import Series, DataFrame
plt.close('all')
#%%
dirname=('C:\\Users\\Guido\\Documents\\Geologie\\Programmeren\\Scripts van mij\\Deftig\\')
filename='WUP2014-F12-Cities_Over_300K.xls'
xlsfile = pd.ExcelFile(dirname + filename)
drframe = xlsfile.parse("DATA", skiprows = 16)
urbpop = DataFrame(drframe)
lat = urbpop["Latitude"]
lon = urbpop["Longitude"]
m = Basemap(projection='robin',lon_0=0,resolution='c')
m.drawcoastlines()
m.drawcountries()
lons,lats = m(list(lon), list(lat))
m.scatter(lons, lats, s = 1.3, color ='blue')
The excel-file looks like this
The output figure looks like this
Now I have to give the points on each continent a different color (for instance South America orange, Europe blue, ...).
I also have to label each point with its number of inhabitants.
Any ideas?
You need to provide the continent information somehow, either as an extra column in the Excel file or in a separate file of your own, say continent.dat, in JSON format or similar:
{ "south-america": ["Argentina", "Brazil", ...],
"north-america": ["Canada", "United States", ...],
...
}
With this information, you can regroup your population DataFrame. Once you have done that, you can repeat your plotting process for each continent, assigning a different color to each.
Pandas groupby is a good fit if you add the continent information to the Excel file.
http://pandas.pydata.org/pandas-docs/stable/groupby.html
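The grouping idea can be sketched as follows. This is a minimal illustration, not the original poster's data: the column names Country and Population, the continent lookup, the colors, and all values are assumptions made up for the example, and would need to match the actual Excel sheet.

```python
import pandas as pd

# Hypothetical continent lookup, in the spirit of the continent.dat file above
continents = {
    "south-america": ["Argentina", "Brazil"],
    "north-america": ["Canada", "United States"],
}
# Invert to country -> continent so it can be used with Series.map
country_to_continent = {
    country: continent
    for continent, countries in continents.items()
    for country in countries
}
# Made-up color per continent
colors = {"south-america": "orange", "north-america": "green"}

# Small made-up stand-in for the urbpop DataFrame
urbpop = pd.DataFrame({
    "Country": ["Brazil", "Canada", "Argentina"],
    "Latitude": [-23.55, 43.65, -34.60],
    "Longitude": [-46.63, -79.38, -58.38],
    "Population": [20000, 6000, 15000],
})
urbpop["Continent"] = urbpop["Country"].map(country_to_continent)

# One group per continent; each group could be drawn with its own color
for continent, group in urbpop.groupby("Continent"):
    print(continent, colors[continent], len(group))
```

Inside the loop, the m(list(lon), list(lat)) and m.scatter(...) calls from the question can be repeated for group["Longitude"] and group["Latitude"] with color=colors[continent]. Labels could be added with plt.annotate, though with this many cities they are likely to overlap heavily.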