I am trying to create a choropleth map using Basemap and pandas to plot prescription rates across CCGs (NHS Clinical Commissioning Groups). I downloaded the shapefile from http://geoportal.statistics.gov.uk/datasets/1bc1e6a77cdd4b3a9a0458b64af1ade4_1, which provides the CCG area boundaries. However, the first problem I am running into is reading the shapefile.
The following error is raised:
raise IOError('cannot locate %s.shp'%shapefile)
This is my code so far...
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm
from mpl_toolkits.basemap import Basemap
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection
from matplotlib.colors import Normalize
fig, ax = plt.subplots(figsize=(10,20))
m = Basemap(resolution='c', # c, l, i, h, f or None
projection='merc',
lat_0=54.5, lon_0=-4.36,
llcrnrlon=-6., llcrnrlat= 49.5, urcrnrlon=2., urcrnrlat=55.2)
m.drawmapboundary(fill_color='#46bcec')
m.fillcontinents(color='#f2f2f2',lake_color='#46bcec')
m.drawcoastlines()
m.readshapefile('/Volumes/Clinical_Commissioning_Groups_April_2016_Full_Extent_Boundaries_in_England', 'areas', drawbounds =True)
m.areas
df_poly = pd.DataFrame({'shapes': [Polygon(np.array(shape), True) for shape in m.areas],'area': [area['ccg16cd'] for area in m.areas_info]})
rates=pd.read_csv('Volumes/TOSHIBA EXT/Basemap rates.csv', delimiter=",", usecols=[0,6])
rates.columns = ['ccg16cd','MEAN YEARLY PRESCRIPTION RATE']
frame = df_poly.merge(rates, on='ccg16cd', how='left')
cmap = plt.get_cmap('Oranges')
pc = PatchCollection(df_poly.shapes, zorder=2)
norm = Normalize()
pc.set_facecolor(cmap(norm(df_poly['count'].fillna(0).values)))
ax.add_collection(pc)
mapper = matplotlib.cm.ScalarMappable(norm=norm, cmap=cmap)
mapper.set_array(df_poly['count'])
plt.colorbar(mapper, shrink=0.4)
m
Would appreciate any pointers as to how I can achieve this choropleth map - starting with what is going wrong in reading the shapefile.
Try using geopandas to read in the shapefile:
import geopandas as gp
shape_file = gp.read_file('FileName.shp')
Also, check that the path to the shapefile is correct.
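For the path check, it may help to know that Basemap's readshapefile takes the path without the .shp extension and, as far as I know, expects the .shp, .shx and .dbf components to sit side by side. Below is a minimal sketch using only the standard library; the helper name missing_shapefile_parts and the example folder are made up for illustration.

```python
from pathlib import Path

# Components that are assumed to be needed next to each other on disk.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(base_path):
    """Return the required shapefile extensions that are missing on disk.

    base_path is the path WITHOUT an extension, which is the form
    that readshapefile is called with in the question.
    """
    base = str(base_path)
    # Append the extension as text so dots inside the file name are preserved.
    return [ext for ext in REQUIRED_EXTENSIONS if not Path(base + ext).is_file()]


if __name__ == "__main__":
    # Hypothetical location; replace with the real folder and base name.
    base = "/path/to/folder/Clinical_Commissioning_Groups_April_2016_Full_Extent_Boundaries_in_England"
    missing = missing_shapefile_parts(base)
    if missing:
        print("Missing components:", ", ".join(missing))
    else:
        print("All components found; this base path can be passed to readshapefile.")
```

If the list comes back non-empty, the usual causes are a wrong folder, an unmounted drive, or a download that was not fully unzipped.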
There seems to be an issue with my code. My goal is to plot a map that represents an outcome (population) across the regions of Benin.
import pandas as pd
import matplotlib as mpl
database_path = "datafinalproject.csv"
database = pd.read_csv(database_path)
#Creating a geodataframe
points = gpd.points_from_xy(database["longitude"], database["latitude"], crs="EPSG:4326")
map = gpd.GeoDataFrame (database, geometry=points)
I get this message when I type map.plot, and when I type map.plot(column='population') I get an empty map.
Can you help me solve this problem?
database.head() gives:
map.plot() will work in a Jupyter notebook but not in a normal Python environment.
You should import matplotlib.pyplot and add plt.show() at the end of your code:
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
database_path = "datafinalproject.csv"
database = pd.read_csv(database_path)
#Creating a geodataframe
points = gpd.points_from_xy(database["longitude"], database["latitude"], crs="EPSG:4326")
map = gpd.GeoDataFrame (database, geometry=points)
map.plot()
plt.show()
I am trying to connect lines based on a specific relationship associated with the points. In this example the lines would connect the players by which court they played in. I can create the basic structure but haven't figured out a reasonably simple way to create this added feature.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df_dict={'court':[1,1,2,2,3,3,4,4],
'player':['Bob','Ian','Bob','Ian','Bob','Ian','Ian','Bob'],
'score':[6,8,12,15,8,16,11,13],
'win':['no','yes','no','yes','no','yes','no','yes']}
df=pd.DataFrame.from_dict(df_dict)
ax = sns.boxplot(x='score',y='player',data=df)
ax = sns.swarmplot(x='score',y='player',hue='win',data=df,s=10,palette=['red','green'])
plt.show()
This code generates the following plot minus the gray lines that I am after.
You can use lineplot here:
sns.lineplot(
data=df, x="score", y="player", units="court",
color=".7", estimator=None
)
Convert the player name to an integer flag and use it as the y-axis value, then loop over the courts and draw one line per court.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df_dict={'court':[1,1,2,2,3,3,4,4],
'player':['Bob','Ian','Bob','Ian','Bob','Ian','Ian','Bob'],
'score':[6,8,12,15,8,16,11,13],
'win':['no','yes','no','yes','no','yes','no','yes']}
df=pd.DataFrame.from_dict(df_dict)
ax = sns.boxplot(x='score',y='player',data=df)
ax = sns.swarmplot(x='score',y='player',hue='win',data=df,s=10,palette=['red','green'])
df['flg'] = df['player'].apply(lambda x: 0 if x == 'Bob' else 1)
for i in df.court.unique():
    # @i refers to the loop variable inside the query string
    dfq = df.query('court == @i').reset_index()
    ax.plot(dfq['score'], dfq['flg'], 'g-')
plt.show()
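The same connecting lines can also be built without a per-court query loop. The sketch below is a variant under assumptions, not either answerer's code: it maps each player to a y position with a plain dict (hypothetical name y_pos), groups the rows by court, and draws all segments at once with matplotlib's LineCollection. It assumes the categorical axis places Bob at 0 and Ian at 1, which matches the order of first appearance in this data, and it uses a plain scatter in place of the seaborn layers.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.collections import LineCollection

df = pd.DataFrame({
    "court": [1, 1, 2, 2, 3, 3, 4, 4],
    "player": ["Bob", "Ian", "Bob", "Ian", "Bob", "Ian", "Ian", "Bob"],
    "score": [6, 8, 12, 15, 8, 16, 11, 13],
})

# Assumed y positions of the categories (order of first appearance).
y_pos = {"Bob": 0, "Ian": 1}
df["y"] = df["player"].map(y_pos)

# One segment per court: the (score, y) points of the players on that court.
segments = [
    list(zip(group["score"], group["y"]))
    for _, group in df.groupby("court", sort=True)
]

fig, ax = plt.subplots()
ax.add_collection(LineCollection(segments, colors="0.7", zorder=0))
ax.scatter(df["score"], df["y"], zorder=1)  # also sets the axis limits
ax.set_yticks(list(y_pos.values()))
ax.set_yticklabels(list(y_pos.keys()))
ax.set_xlabel("score")
fig.savefig("court_lines.png")
```

To combine this with the box and swarm plots, the LineCollection could be added to the axes returned by seaborn instead of a fresh one.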
I want to plot x and y from a CSV file on a geopandas plot, but only the axes show up.
import fiona
import geopandas as gpd
import matplotlib.pyplot as plt
from mpl_toolkits.axisartist.axislines import Subplot
import pandas as pd
# register KML as a read/write driver so gpd.read_file can open data.kml
gpd.io.file.fiona.drvsupport.supported_drivers['KML'] = 'rw'
dfN = pd.read_csv ("nodes.txt",delimiter ="\\s+")
dfN.to_csv ("nodes.csv", index=None)
df = gpd.read_file("data.kml", driver="KML")
df=df.to_crs(epsg=32733)
gdf = gpd.GeoDataFrame(dfN ,geometry=gpd.points_from_xy(dfN.X, dfN.Y))
dg=df.translate(433050,299)
fig,ax = plt.subplots()
ax.set_aspect('equal')
ax.scatter(gdf.X, gdf.Y , zorder=1, alpha= 1, c='r', s=10)
dg.plot(ax=ax,zorder=0,color='white', edgecolor='black',aspect= 'equal')
plt.show()
This is not an MWE, so I have sourced data from publicly available datasets and applied the same transformations.
The plotting code can be simplified, and then it works. Calling plot() on a GeoDataFrame that contains POINT objects produces a scatter plot.
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd
import requests, io
# data sourcing generated two geopandas data frames, let's replace to make MWE
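# note: gpd.datasets was removed in geopandas 1.0; on newer versions read a local copy of Natural Earth instead
df = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))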
df=df.to_crs(epsg=32733)
dg = df.loc[df["geometry"].is_valid & df["iso_a3"].eq("GBR")].translate(433050,299)
dfN = pd.read_csv(io.StringIO(requests.get("https://assets.nhs.uk/data/foi/Hospital.csv").text),
sep="¬",engine="python",).loc[:,["OrganisationName","Latitude","Longitude"]].rename(columns={"Latitude":"Y","Longitude":"X"})
gdf = gpd.GeoDataFrame(dfN ,geometry=gpd.points_from_xy(dfN.X, dfN.Y))
gdf = gdf.set_crs("EPSG:4326").to_crs(epsg=32733)
# plotting code is simplified as:
ax = dg.plot(zorder=0,color='white', edgecolor='black',aspect= 'equal')
gdf.plot(ax=ax, zorder=1, alpha= 1, c='r', markersize=10)
output
The points are clearly within the defined CRS, and one set of geometries has been transformed.
I'm using seaborn to plot a heatmap over an image. The data is a 41x41 matrix in an Excel file and the image is 890px by 890px. Each value in the matrix is a pollutant concentration, and the image is a map from Google Earth, but I'm getting this result. The image is too big for the graph and I don't know how to fit the two together, because the plot is always 41px by 41px. How can I do this?
here is the code:
import scipy.misc as sci
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
excel_file = "2002marçoSO2vFINAL1.xls"
xldata = pd.read_excel(excel_file,
sheet_name = "Python")
heatmap_data = xldata
sns.heatmap(heatmap_data, cmap="gist_stern", alpha = 0.2)
img = sci.imread("50x50.png")
plt.imshow(img)
plt.show()
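One way to make the 41x41 grid and the 890x890 image share the same axes is to draw both with matplotlib's imshow and give the grid an explicit extent in image pixel coordinates. The following is only a sketch under assumptions: random arrays stand in for the Excel sheet and the Google Earth image, and for the real file plt.imread would replace scipy.misc.imread, which has been removed from recent SciPy releases.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((890, 890, 3))        # stand-in for the map image
concentration = rng.random((41, 41))   # stand-in for the Excel matrix

fig, ax = plt.subplots(figsize=(6, 6))
ax.imshow(img)  # background image, drawn in pixel coordinates

height, width = img.shape[:2]
# Stretch the grid over the whole image.
# extent = (left, right, bottom, top), matching imshow's default pixel edges.
overlay = ax.imshow(
    concentration,
    cmap="gist_stern",
    alpha=0.4,
    extent=(-0.5, width - 0.5, height - 0.5, -0.5),
    interpolation="nearest",
)
fig.colorbar(overlay, ax=ax, label="concentration")
fig.savefig("overlay.png")
```

The extent argument is what does the work here: each of the 41 cells is drawn about 21.7 pixels wide, so the overlay covers the picture instead of sitting in its top-left corner.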
I'm using Pandas and am very new to programming. I'm plotting Energy Deposited (eDep) as a function of its x, y and z positions. So far I have been successful in getting it to plot, but I can't get the colorbar to show beside my scatter plot! Any help is much appreciated.
%matplotlib inline
import pandas as pd
import numpy as np
IncubatorBelow = "./Analysis.Test.csv"
df = pd.read_csv(IncubatorBelow, sep = ',', names=['Name','TrackID','ParentID','xPos','yPos','zPos','eDep','DeltaE','Einit','EventID'],low_memory=False,error_bad_lines=False)
df["xPos"] = df["xPos"].str.replace("(","")
df["zPos"] = df["zPos"].str.replace(")","")
df.sort_values(by='Name', ascending=[False])
df.dropna(how='any',axis=0,subset=['Name','TrackID','ParentID','xPos','yPos','zPos','eDep','DeltaE','Einit','EventID'], inplace=True)
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
df['xPos'] = df['xPos'].astype(float)
df['yPos'] = df['yPos'].astype(float)
df['zPos'] = df['zPos'].astype(float)
#df10[df10['Name'].str.contains("e-")]
threedee = plt.figure().gca(projection='3d')
threedee.scatter(df["xPos"], df["yPos"], df["zPos"], c=df["eDep"], cmap=plt.cm.coolwarm)
threedee.set_xlabel("x(mm)")
threedee.set_ylabel("y(mm)")
threedee.set_zlabel("z(mm)")
plt.show()
Here's what the plot looks like!
It's from a particle physics simulation using GEANT4. The actual files are extremely large (3.7 GB, which I've chunked into files of about 40 MB each) and this plot only represents a small fraction of the data.
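A colorbar needs the object returned by scatter (the "mappable"), so the usual pattern is to keep that return value and pass it to fig.colorbar. Here is a minimal sketch with random stand-in data: the column names mirror the ones above, the values are invented, and add_subplot(projection="3d") is used because newer Matplotlib releases no longer accept gca(projection=...).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Invented data standing in for the GEANT4 output.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "xPos": rng.normal(size=200),
    "yPos": rng.normal(size=200),
    "zPos": rng.normal(size=200),
    "eDep": rng.random(200),
})

fig = plt.figure()
ax = fig.add_subplot(projection="3d")

# Keep the return value: it carries the colormap and the data range.
sc = ax.scatter(df["xPos"], df["yPos"], df["zPos"], c=df["eDep"], cmap="coolwarm")

# Draw the colorbar next to the 3D axes, using the scatter as the mappable.
cbar = fig.colorbar(sc, ax=ax, shrink=0.6, pad=0.1, label="eDep")

ax.set_xlabel("x (mm)")
ax.set_ylabel("y (mm)")
ax.set_zlabel("z (mm)")
fig.savefig("edep_scatter.png")
```

In a notebook with %matplotlib inline the savefig call can be dropped, since the figure is displayed when the cell finishes.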