How to have a real map in the background in matplotlib

The code below plots dots at specific longitude/latitude coordinates.
num_samples = 1250000
indices = np.random.choice(df.index, num_samples)
df_x = df.df_longitude[indices].values
df_y = df.df_latitude[indices].values
sns.set_style('white')
fig, ax = plt.subplots(figsize=(11, 12))
ax.scatter(df_x, df_y, s=5, color='red', alpha=0.5)
ax.set_xlim([-74.10, -73.60])
ax.set_ylim([40.85, 40.90])
ax.set_title('coordinates')
Is there any way to put these dots on a map instead of this white background?

geopandas provides an API that makes this quite easy. Here is an example where the map is zoomed in on the continent of Africa:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import geopandas
df = pd.DataFrame(
    {'Latitude': np.random.uniform(-20, 10, 100),
     'Longitude': np.random.uniform(20, 40, 100)})  # low bound first, then high
gdf = geopandas.GeoDataFrame(
    df, geometry=geopandas.points_from_xy(df.Longitude, df.Latitude))
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
ax = world[world.continent == 'Africa'].plot(color='white', edgecolor='black')
# We can now plot our ``GeoDataFrame``.
gdf.plot(ax=ax, color='red')
plt.show()
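If pulling in geopandas is not an option, a plain matplotlib alternative is to draw a map image under the points with imshow and pin it to geographic coordinates through its extent argument. This is only a minimal sketch of that different technique: the bounding box is an assumption, and map_img is a random placeholder array standing in for a real map image exported for that same bounding box.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Assumed bounding box of the map image: (lon_min, lon_max, lat_min, lat_max).
bbox = (-74.10, -73.60, 40.60, 40.90)

# Placeholder for a real map image that covers exactly this bounding box.
rng = np.random.default_rng(0)
map_img = rng.random((60, 100))

# Synthetic points inside the bounding box (stand-ins for df_x / df_y).
lon = rng.uniform(bbox[0], bbox[1], 500)
lat = rng.uniform(bbox[2], bbox[3], 500)

fig, ax = plt.subplots(figsize=(8, 6))
# extent pins the image corners to data coordinates; zorder keeps it below the dots.
ax.imshow(map_img, extent=bbox, cmap="Greys", alpha=0.5, zorder=0, aspect="auto")
ax.scatter(lon, lat, s=5, color="red", alpha=0.5, zorder=1)
ax.set_xlim(bbox[0], bbox[1])
ax.set_ylim(bbox[2], bbox[3])
ax.set_title("coordinates")
```

The dots only line up with the map if the image really spans the bounding box passed as extent, so both should come from the same export.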

Related

Matplot legends are printing twice

I am writing some simple code with matplotlib/seaborn to plot the data from a sample csv file. However, when I call the sns.histplot() function in a for loop, the legend entry for each column is displayed twice. Any help would be greatly appreciated :)
Here's the code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import matplotlib
sns.set_style('darkgrid')
df = pd.read_csv('dm_office_sales.csv')
df['salary'] = df['salary'] * 3
df['sample salary'] = df['salary'] * 2
x = df['salary']
y = df['sales']
z = df['sample salary']
fig,ax = plt.subplots()
for i in [x,y,z]:
    sns.histplot(data = i, bins=50, ax=ax, palette = 'bright',alpha=0.3, label='{}'.format(i.name))
plt.legend(numpoints=1)
plt.suptitle('Sales/Salary Histogram')
plt.show()
Pass just the columns in question in one step, instead of looping.
sns.histplot(data=df[['salary', 'sales', 'sample salary']], ...)
Here's a demo with the tips dataset:
tips = sns.load_dataset('tips')
fig, ax = plt.subplots()
sns.histplot(tips[['total_bill', 'tip']], bins=50,
ax=ax, alpha=0.3, palette='bright')
plt.show()
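If the loop has to stay for some reason, a generic fallback is to deduplicate the legend entries by label before drawing the legend. The sketch below uses plain matplotlib and made-up data, and it deliberately attaches two artists to each label to reproduce the duplication; whether this matches the exact cause of the doubled entries in the question is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Made-up stand-ins for the salary / sales columns.
series = {"salary": rng.normal(50, 10, 200), "sales": rng.normal(30, 5, 200)}

fig, ax = plt.subplots()
for name, values in series.items():
    # Two artists share one label on purpose, which duplicates the legend entry.
    ax.hist(values, bins=20, alpha=0.3, label=name)
    ax.axvline(values.mean(), linestyle="--", label=name)

# Keep only the first handle seen for each label.
handles, labels = ax.get_legend_handles_labels()
unique = {}
for handle, label in zip(handles, labels):
    unique.setdefault(label, handle)

legend = ax.legend(unique.values(), unique.keys())
legend_labels = [text.get_text() for text in legend.get_texts()]
```

Passing the columns in one call, as in the answer above, is still the simpler fix; this only helps when the artists are created separately.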

How to display the line color in the legend with kdeplot

I want to overlay several 2D density plots on top of each other using the kdeplot() function from seaborn, but the colors of the contours don't appear in the legend. How can I update the legend to show the colors?
Code example:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
sns.kdeplot(x = np.random.random(30), y = np.random.random(30), label = "dist1", ax=ax)
sns.kdeplot(x = np.random.random(30) + 1, y = np.random.random(30) + 1, label = "dist2", ax=ax)
ax.legend()
plt.show()
I'm using seaborn v0.12.0
Found a way to work around the issue. By extracting the colour in the colourcycle, you can manually set the colour of kdeplot() as well as construct the handles for the legend.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
fig, ax = plt.subplots()
handles = []
# Extracting next colour in cycle
color = next(ax._get_lines.prop_cycler)["color"]
sns.kdeplot(x = np.random.random(30), y = np.random.random(30), color = color, label = "dist1", ax=ax)
handles.append(mlines.Line2D([], [], color=color, label="dist1"))
color = next(ax._get_lines.prop_cycler)["color"]
sns.kdeplot(x = np.random.random(30) + 1, y = np.random.random(30) + 1, color = color, label = "dist2", ax=ax)
handles.append(mlines.Line2D([], [], color=color, label="dist2"))
ax.legend(handles = handles)
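One caveat with the workaround above: ax._get_lines.prop_cycler is a private attribute, so it can change or disappear between Matplotlib releases. A variation that sticks to public API reads the colours from plt.rcParams instead. The sketch below is plain matplotlib with synthetic Gaussian-shaped contours standing in for the kdeplot() output, so the data and the centers dictionary are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
import numpy as np

# Public access to the default colour cycle, no private Axes attributes needed.
colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]

# Grid on which two synthetic, Gaussian-shaped densities are evaluated.
xx, yy = np.meshgrid(np.linspace(-1, 3, 80), np.linspace(-1, 3, 80))
centers = {"dist1": (0.5, 0.5), "dist2": (1.5, 1.5)}

fig, ax = plt.subplots()
handles = []
for color, (label, (cx, cy)) in zip(colors, centers.items()):
    density = np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / 0.2)
    ax.contour(xx, yy, density, levels=5, colors=[color])
    # Contours are not picked up by the legend here, so add one proxy line per label.
    handles.append(mlines.Line2D([], [], color=color, label=label))

legend = ax.legend(handles=handles)
```

The same colour value can be passed to sns.kdeplot(color=...) in place of the contour call.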
It's easier to create a pandas.DataFrame with a label column and let the plot API manage the legend handles and labels. Besides making plotting easier, a dataframe also makes it easier to perform additional analysis on the data.
Tested in python 3.10, pandas 1.4.3, matplotlib 3.5.2, seaborn 0.12.0
Create a DataFrame
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
np.random.seed(2022) # for the same sample data each time
# list of dataframes; create each dataframe and use assign to add a label column
df_list = [pd.DataFrame({'x': np.random.random(30),
'y': np.random.random(30)}).assign(label='d1'),
pd.DataFrame({'x': np.random.random(30) + 1,
'y': np.random.random(30) + 1}).assign(label='d2')]
# combine the list of dataframes with concat
df = pd.concat(df_list, ignore_index=True)
# display(df.head())
          x         y label
0  0.009359  0.564672    d1
1  0.499058  0.349429    d1
2  0.113384  0.975909    d1
3  0.049974  0.037820    d1
4  0.685408  0.794270    d1
# display(df.tail())
           x         y label
55  1.829995  1.251087    d2
56  1.626445  1.241247    d2
57  1.871438  1.841468    d2
58  1.625907  1.020932    d2
59  1.130638  1.894918    d2
sns.displot
# plot the dataframe in a figure level plot
g = sns.displot(kind='kde', data=df, x='x', y='y', hue='label')
sns.kdeplot
# plot the dataframe in an axes level plot
fig, ax = plt.subplots(figsize=(7, 5))
sns.kdeplot(data=df, x='x', y='y', hue='label', ax=ax)

Geopandas with log-scale colormap

If I have the plot below, how can I turn the colormap/legend into a log-scale?
import geopandas as gpd
import matplotlib.pyplot as plt
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world = world[(world.pop_est>0) & (world.name!="Antarctica")]
fig, ax = plt.subplots(1, 1)
world.plot(column='pop_est', ax=ax, legend=True)
GeoPandas plots with matplotlib, so you can use the colormap normalization that matplotlib provides. Note that I am also specifying the min and max values as the minimum and maximum of the column I am plotting.
import matplotlib.colors
world.plot(column='pop_est', legend=True, norm=matplotlib.colors.LogNorm(vmin=world.pop_est.min(), vmax=world.pop_est.max()))
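To see what LogNorm does on its own, here is a small sketch without geopandas. The values are made up to span several orders of magnitude, and the one-row image is only there so a colorbar can be attached.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LogNorm

# Made-up, population-like values, one per power of ten.
values = np.array([1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9])

norm = LogNorm(vmin=values.min(), vmax=values.max())
# Evenly spaced in [0, 1], because the input is evenly spaced in log10.
scaled = norm(values)

fig, ax = plt.subplots(figsize=(6, 2))
# A one-row image is enough to see the effect on the colours and the colorbar ticks.
im = ax.imshow(values[np.newaxis, :], norm=norm, cmap="viridis", aspect="auto")
ax.set_yticks([])
cbar = fig.colorbar(im, ax=ax)
```

With a linear norm, every value except the largest would land near the bottom of the colormap; the log norm spreads them out evenly.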
You can simply plot the log of the value instead of the value itself.
import geopandas as gpd
import matplotlib.pyplot as plt
from numpy import log10
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world = world[(world.pop_est>0) & (world.name!="Antarctica")]
world['logval'] = log10(world['pop_est'])
fig, ax = plt.subplots(1, 1)
world.plot(column='logval', ax=ax, legend=True)
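A side effect of plotting the log values is that the colorbar is labelled in exponents (6, 7, 8, ...) rather than in people. One possible way around that, sketched here with a colorbar built directly in matplotlib and made-up numbers in place of world['pop_est'], is a tick formatter that converts each tick back to the original units. The helper name as_original_units is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter

# Made-up stand-in for world['pop_est'].
pop_est = np.array([5e4, 2e6, 3.5e7, 1.4e9])
logval = np.log10(pop_est)


def as_original_units(value, pos=None):
    """Turn a log10 tick value back into a plain count with thousands separators."""
    return f"{10 ** value:,.0f}"


fig, ax = plt.subplots()
sm = plt.cm.ScalarMappable(
    cmap="viridis", norm=plt.Normalize(vmin=logval.min(), vmax=logval.max()))
sm.set_array([])  # no data attached; the mappable only drives the colorbar
cbar = fig.colorbar(sm, ax=ax, format=FuncFormatter(as_original_units))
```

The LogNorm answer above avoids this step entirely, since the colorbar there is already in the original units.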

Generating Legend for geopandas plot

I am plotting a shapefile with GeoPandas. Additionally, I'm adding points from a dataframe (see picture). Now I'm trying to add a legend (to the right of the original plot) for the points, but I don't really know how to do that!
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import geopandas as gpd
import test
variable = 'RTD_rtd'
df = test.getdataframe()
gdf = gpd.GeoDataFrame(
df, geometry=gpd.points_from_xy(df.NP_LongDegree, df.NP_LatDegree))
fp = "xxx"
map_df = gpd.read_file(fp)
ax = map_df.plot(color='white', edgecolor='black', linewidth=0.4, figsize= (10,10))
gdf.plot(column=variable, ax=ax, cmap='Reds', markersize=14.0, linewidth=2.0)
plt.show()
One idea was to add a simple legend, but I want something better looking, maybe similar to what's done in this tutorial.
I followed the example that you referred to and this is the concise version. It would have been better if you could have shared a bit of your dataset 'df'. It seems that you want to have a colorbar which fig.colorbar generates.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import geopandas as gpd
df = pd.read_csv('london-borough-profiles.csv', header=0)
df = df[['Area name','Population density (per hectare) 2017']]
fp = 'London_Borough_Excluding_MHW.shp'
map_df = gpd.read_file(fp)
gdf = map_df.set_index('NAME').join(df.set_index('Area name'))
variable = 'Population density (per hectare) 2017'
vmin, vmax = 120, 220
fig, ax = plt.subplots(1, figsize=(10, 6))
# pass vmin/vmax here too, so the map colours match the colorbar built below
gdf.plot(column=variable, cmap='Blues', ax=ax, linewidth=0.8, edgecolor='0.8', vmin=vmin, vmax=vmax)
ax.axis('off')
ax.set_title('Population density (per hectare) 2017', fontdict={'fontsize': '25', 'fontweight' : '3'})
ax.annotate('Source: London Datastore, 2014',xy=(0.1, .08), xycoords='figure fraction', horizontalalignment='left', verticalalignment='top', fontsize=12, color='#555555')
sm = plt.cm.ScalarMappable(cmap='Blues', norm=plt.Normalize(vmin=vmin, vmax=vmax))
sm.set_array([])  # public equivalent of setting sm._A
cbar = fig.colorbar(sm, ax=ax)
You can also add a plain legend to your solution. For this to work, you have to set a label for each plot call:
plt.legend()
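As a rough sketch of that last suggestion, using plain matplotlib scatter calls and made-up points: each call gets its own label, and bbox_to_anchor pushes the legend to the right of the axes, which is where the question wanted it. The group names are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
fig, ax = plt.subplots(figsize=(8, 6))

# Each plotting call gets its own label; the group names are placeholders.
ax.scatter(rng.uniform(0, 1, 20), rng.uniform(0, 1, 20),
           color="red", s=14, label="group A")
ax.scatter(rng.uniform(0, 1, 20), rng.uniform(0, 1, 20),
           color="blue", s=14, label="group B")

# Anchor the legend's left edge just outside the right edge of the axes.
legend = ax.legend(loc="center left", bbox_to_anchor=(1.02, 0.5), title="Points")
fig.tight_layout()  # make room so the legend is not clipped
```

For a continuous variable such as RTD_rtd, the colorbar from the previous answer is usually the better fit; a labelled legend suits a handful of discrete groups.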

How to add multiple trendlines pandas

I have plotted a graph with two y axes and would now like to add two separate trendlines for each of the y plots.
This is my code:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
%matplotlib inline
amp_costs=pd.read_csv('/Users/Ampicillin_Costs.csv', index_col=None, usecols=[0,1,2])
amp_costs.columns=['PERIOD', 'ITEMS', 'COST PER ITEM']
ax=amp_costs.plot(x='PERIOD', y='COST PER ITEM', color='Blue', style='.', markersize=10)
amp_costs.plot(x='PERIOD', y='ITEMS', secondary_y=True,
color='Red', style='.', markersize=10, ax=ax)
Any guidance as to how to plot these two trend lines to this graph would be much appreciated!
Here is a quick example of how to use sklearn.linear_model.LinearRegression to make the trend line.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
plt.style.use('ggplot')
%matplotlib inline
period = np.arange(10)
items = -2*period +1 + np.random.randint(-2,2,len(period))
cost = 35000*period +15000 + np.random.randint(-25000,25000,len(period))
data = np.vstack((period,items,cost)).T
df = pd.DataFrame(data, columns=['P','ITEMS', 'COST']).set_index('P')
lmcost = LinearRegression().fit(period.reshape(-1,1), cost.reshape(-1,1))
lmitems = LinearRegression().fit(period.reshape(-1,1), items.reshape(-1,1))
df['ITEMS_LM'] = lmitems.predict(period.reshape(-1,1))
df['COST_LM'] = lmcost.predict(period.reshape(-1,1))
fig,ax = plt.subplots()
df.ITEMS.plot(ax = ax, color = 'b')
df.ITEMS_LM.plot(ax = ax,color= 'b', linestyle= 'dashed')
df.COST.plot(ax = ax, secondary_y=True, color ='g')
df.COST_LM.plot(ax = ax, secondary_y=True, color = 'g', linestyle='dashed')
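If scikit-learn feels heavy for a straight line, numpy.polyfit with deg=1 gives the same kind of least-squares fit. This is a sketch of that swapped-in approach, not the method from the answer above; the data is noise-free on purpose so the result is easy to check, and linear_trend is a hypothetical helper name.

```python
import numpy as np

period = np.arange(10)
items = -2 * period + 1          # noise-free, so the fit is easy to verify
cost = 35000 * period + 15000


def linear_trend(x, y):
    """Return (slope, intercept, fitted values) of a least-squares line."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept, slope * x + intercept


items_slope, items_intercept, items_fit = linear_trend(period, items)
cost_slope, cost_intercept, cost_fit = linear_trend(period, cost)
```

The fitted arrays can then be drawn as dashed lines, items_fit on the primary axis and cost_fit on the secondary one, just like the ITEMS_LM and COST_LM columns above.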
