I would like to customize the labels on the geopandas plot legend.
fig, ax = plt.subplots(figsize = (8,5))
gdf.plot(column = "WF_CEREAL", ax = ax, legend=True, categorical=True, cmap='YlOrBr',legend_kwds = {"loc":"lower right"}, figsize =(10,6))
Adding "labels" in legend_kwds does not help.
I tried to add labels with legend_kwds in the following ways, but it didn't work-
legend_kwds = {"loc":"lower right", "labels":["low", "mid", "high", "strong", "severe"]
legend_labels:["low", "mid", "high", "strong", "severe"]
legend_labels=["low", "mid", "high", "strong", "severe"]
Since the question does not include reproducible code or data, I will give demo code that general readers can follow; some of it may answer the question.
The code below runs without any external data. Comments at the important steps explain what is happening.
# Part 1
# Classifying the data of choice
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world['gdp_per_cap'] = world.gdp_md_est / world.pop_est
num_classes = 4 #quartile scheme has 4 classes
# You can use values derived from your preferred classification scheme here
num_qtiles = [0, .25, .5, .75, 1.] #class boundaries for quartiles
# Here is the categorical data to append to the dataframe
# They are also used as legend's label texts
qlabels = ["1st quartile","2nd quartile","3rd quartile","4th quartile"] #matching categorical data/labels
# Conditions
# len(num_qtiles)-1 == num_classes
# len(qlabels) == num_classes
# Create a new column for the categorical data mentioned above
world['gdp_quartile'] = pd.qcut(world['gdp_per_cap'], num_qtiles, labels=qlabels)
# Plotting the categorical data for checking
ax1 = world['gdp_quartile'].value_counts().plot(figsize=(5,4), kind='bar', xlabel='Quartile_Classes', ylabel='Countries', rot=45, legend=True)
The output of part1:-
# Part 2
# Plot world map using the categorical data
fig, ax = plt.subplots(figsize=(9,4))
# num_classes = 4 # already defined
#color_steps = plt.colormaps['Reds']._resample(num_classes) #For older version
color_steps = plt.colormaps['Reds'].resampled(num_classes) #Current version of matplotlib
# This plots choropleth map using categorical data as the theme
world.plot(column='gdp_quartile', cmap = color_steps,
           legend=True,
           legend_kwds={'loc':'lower left',
                        'bbox_to_anchor':(0, .2),
                        'markerscale':1.29,
                        'title_fontsize':'medium',
                        'fontsize':'small'},
           ax=ax)
leg1 = ax.get_legend()
leg1.set_title("GDP per capita")
ax.title.set_text("World Map: GDP per Capita")
plt.show()
Output of part2:-
Edit
Additional code,
use it to replace the line plt.show() above.
This answers the question posted in the comment below.
# Part 3
# New categorical texts to use with legend
new_legtxt = ["low","mid","high","v.high"]
for ix,eb in enumerate(leg1.get_texts()):
    print(eb.get_text(), "-->", new_legtxt[ix])
    eb.set_text(new_legtxt[ix])
plt.show()
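The relabeling step in Part 3 does not depend on geopandas: it only touches the matplotlib Legend object. Here is a minimal sketch of the same idea on a plain matplotlib plot, assuming three made-up categories and hypothetical replacement labels (neither comes from the question's data).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Three labelled lines stand in for the categories a choropleth legend would list
for name, y in [("1st", 1), ("2nd", 2), ("3rd", 3)]:
    ax.plot([0, 1], [y, y], label=name)
legend = ax.legend(loc="lower right")

new_labels = ["low", "mid", "high"]  # hypothetical replacement labels
# Legend texts come back in the order the entries are listed
for text, new in zip(legend.get_texts(), new_labels):
    text.set_text(new)

relabelled = [t.get_text() for t in legend.get_texts()]
```

The replacement list has to follow the order of the existing legend entries, so it is worth printing the old texts first, as Part 3 does.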
I am not sure if this will work but try:
gdf.plot(column = "WF_CEREAL", ax = ax, legend=True, categorical=True, cmap='YlOrBr',legend_kwds = {"loc":"lower right"}, figsize =(10,6), legend_labels=["low", "mid", "high", "strong", "severe"])
I am a bit new to Python. And I am playing with a dummy dataset to get some Python data manipulation practice. Below is the code for generating the dummy data:
d = {
'SeniorCitizen': [0,1,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0] ,
'CollegeDegree': [0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1] ,
'Married': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1] ,
'FulltimeJob': [1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0,0,0,1] ,
'DistancefromBranch': [7,9,14,20,21,12,22,25,9,9,9,12,13,14,16,25,27,4,14,14,20,19,15,23,2] ,
'ReversedPayment': [0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,1,0] }
CarWash = pd.DataFrame(data = d)
categoricals = ['SeniorCitizen','CollegeDegree','Married','FulltimeJob','ReversedPayment']
numerical = ['DistancefromBranch']
CarWash[categoricals] = CarWash[categoricals].astype('category')
I am basically struggling with a couple of things:
#1. A stacked barplot with absolute values (like the excel example below)
#2. A stacked barplot with percentage values (like the excel example below)
Below are my target visualizations for # 1 and # 2 using countplot().
#1
#2
For # 1, instead of a stacked barplot, with countplot() I am able to make a clustered barplot, like below, and also the annotation snippet feels more like a workaround rather than being Python elegant.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Looping through each categorical column and viewing target variable distribution (ReversedPayment) by value
figure, axes = plt.subplots(2,2,figsize = (10,10))
for i,ax in zip(categoricals[:-1],axes.flatten()):
    sns.countplot(x= i, hue = 'ReversedPayment', data = CarWash, ax = ax)
    for p in ax.patches:
        height = np.nan_to_num(p.get_height()) # gets the height of each patch/bar
        adjust = np.nan_to_num(p.get_width())/2 # a calculation for adjusting the data label later
        label_xy = (np.nan_to_num(p.get_x()) + adjust,np.nan_to_num(p.get_height()) + adjust) #x,y coordinates where we want to put our data label
        ax.annotate(height,label_xy) # final annotation
For # 2, I tried creating a new data frame housing % values but that felt tedious and error-prone.
I feel an option like stacked = True, proportion = True, axis = 1, annotate = True could have been so useful for countplot() to have.
Are there any other libraries that would be more straightforward and less code-intensive? Any comments or suggestions are welcome.
In this case, I think plotly.express may be more intuitive for you.
import plotly.express as px
df_temp = CarWash.groupby(['SeniorCitizen', 'ReversedPayment'])['DistancefromBranch'].count().reset_index().rename({'DistancefromBranch':'count'}, axis=1)
fig = px.bar(df_temp, x="SeniorCitizen", y="count", color="ReversedPayment", title="SeniorCitizen", text='count')
fig.update_traces(textposition='inside')
fig.show()
Basically, if you want more flexibility to adjust your charts, it is hard to avoid writing a lot of code.
I also tried using matplotlib and pandas to create a stacked bar chart for percentages. If you are interested, you can try it.
sns.set()
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=[12,8], dpi=100)
# Convert the axes matrix to a 1-d array
axes = ax.flatten()
for i, col in enumerate(['SeniorCitizen', 'CollegeDegree', 'Married', 'FulltimeJob']):
    # Count the ReversedPayment values within each category of col
    df_temp = (CarWash.groupby(col)['ReversedPayment']
               .value_counts()
               .unstack(1).fillna(0)
               .rename({0:'No', 1:'Yes'})
               .rename({0:'No', 1:'Yes'}, axis=1))
    # Normalize each ReversedPayment column by its total; use
    # df_temp.div(df_temp.sum(axis=1), axis=0) if every bar should sum to 100%
    df_temp = df_temp / df_temp.sum(axis=0)
    df_temp.plot.bar(stacked=True, ax=axes[i])
    axes[i].set_title(col, y=1.03, fontsize=16)
    rects = axes[i].patches
    # Bars are drawn column by column, so flatten the values in the same order
    labels = df_temp.values.flatten(order='F')
    for rect, label in zip(rects, labels):
        if label == 0: continue
        axes[i].text(rect.get_x() + rect.get_width() / 2, rect.get_y() + rect.get_height() / 3, '{:.2%}'.format(label),
                     ha='center', va='bottom', color='white', fontsize=12)
    axes[i].legend(title='Reversed\nPayment', bbox_to_anchor=(1.05, 1), loc='upper left', title_fontsize = 10, fontsize=10)
    axes[i].tick_params(rotation=0)
plt.tight_layout()
plt.show()
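If the goal is bars that each sum to 100%, one shorter route may be pd.crosstab with normalize="index", followed by Axes.bar_label for the annotations. This is only a sketch: it assumes matplotlib 3.4 or newer (for bar_label) and uses a tiny made-up sample rather than the CarWash data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Tiny made-up sample that reuses two column names from the question
df = pd.DataFrame({
    "Married":         [0, 0, 0, 0, 1, 1, 1, 1],
    "ReversedPayment": [0, 0, 0, 1, 0, 0, 1, 1],
})

# Row-normalised counts: the shares within each Married group sum to 1
shares = pd.crosstab(df["Married"], df["ReversedPayment"], normalize="index")

ax = shares.plot.bar(stacked=True, rot=0)
for container in ax.containers:
    # One container per ReversedPayment value, one bar per Married value
    labels = [f"{bar.get_height():.0%}" for bar in container]
    ax.bar_label(container, labels=labels, label_type="center")
```

Dropping normalize="index" gives the absolute counts instead, so the same few lines cover both target charts.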
In addition to the solution posted in this link, I would like to use the hue parameter and add the median values to each of the boxes.
The Current Code:
testPlot = sns.boxplot(x='Pclass', y='Age', hue='Sex', data=trainData)
m1 = trainData.groupby(['Pclass', 'Sex'])['Age'].median().values
mL1 = [str(np.round(s, 2)) for s in m1]
p1 = range(len(m1))
for tick, label in zip(p1, testPlot.get_xticklabels()):
    print(testPlot.text(p1[tick], m1[tick] + 1, mL1[tick]))
Gives an output like:
I'm working on the Titanic Dataset which can be found in this link.
I'm getting the required values, but only when I do a print statement, how do I include it in my Plot?
Place your labels manually according to hue parameter and width of bars for every category in a cycle of all xticklabels:
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
trainData = pd.read_csv('titanic.csv')
testPlot = sns.boxplot(x='pclass', y='age', hue='sex', data=trainData)
m1 = trainData.groupby(['pclass', 'sex'])['age'].median().values
mL1 = [str(np.round(s, 2)) for s in m1]
ind = 0
for tick in range(len(testPlot.get_xticklabels())):
    testPlot.text(tick-.2, m1[ind+1]+1, mL1[ind+1], horizontalalignment='center', color='w', weight='semibold')
    testPlot.text(tick+.2, m1[ind]+1, mL1[ind], horizontalalignment='center', color='w', weight='semibold')
    ind += 2
plt.show()
This answer is nearly copied and pasted from here, but adapted to your example code. The linked answer is IMHO a bit misplaced there, because that question is just about labeling a boxplot and not about a boxplot using the hue argument.
I couldn't use your Train dataset because it is not available as a Python package, so I used Titanic instead, which has nearly the same column names.
#!/usr/bin/env python3
import pandas as pd
import matplotlib
import matplotlib.patheffects as path_effects
import seaborn as sns
def add_median_labels(ax, fmt='.1f'):
    """Credits: https://stackoverflow.com/a/63295846/4865723
    """
    lines = ax.get_lines()
    boxes = [c for c in ax.get_children() if type(c).__name__ == 'PathPatch']
    lines_per_box = int(len(lines) / len(boxes))
    for median in lines[4:len(lines):lines_per_box]:
        x, y = (data.mean() for data in median.get_data())
        # choose value depending on horizontal or vertical plot orientation
        value = x if (median.get_xdata()[1] - median.get_xdata()[0]) == 0 else y
        text = ax.text(x, y, f'{value:{fmt}}', ha='center', va='center',
                       fontweight='bold', color='white')
        # create median-colored border around white text for contrast
        text.set_path_effects([
            path_effects.Stroke(linewidth=3, foreground=median.get_color()),
            path_effects.Normal(),
        ])
df = sns.load_dataset('titanic')
plot = sns.boxplot(x='pclass', y='age', hue='sex', data=df)
add_median_labels(plot)
plot.figure.show()
As an alternative, you can create the boxplot with a figure-level function. In that case you need to pass the axes to add_median_labels().
# imports and add_median_labels() unchanged
df = sns.load_dataset('titanic')
plot = sns.catplot(kind='box', x='pclass', y='age', hue='sex', data=df)
add_median_labels(plot.axes[0][0])
plot.figure.show()
The resulting plot
This solution also works with more than two categories in the column used for the hue argument.
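For comparison, a plain matplotlib boxplot hands back its median lines directly, which may make it easier to see what the helper above is picking out of the axes. This is a rough sketch with made-up samples; it is not the seaborn code path and does not handle hue.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

groups = [[1, 2, 3, 4, 5], [10, 20, 30]]  # made-up samples, one list per box
fig, ax = plt.subplots()
bp = ax.boxplot(groups)  # returns a dict of artists, including the median lines

labels = []
for median in bp["medians"]:
    xs, ys = median.get_data()
    x = sum(xs) / len(xs)  # horizontal centre of the median line
    y = ys[0]              # for a vertical box, the median value is the y coordinate
    labels.append(ax.text(x, y, f"{y:.1f}", ha="center", va="bottom"))

texts = [t.get_text() for t in labels]
```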
I am plotting multiple dataframes as point plot using seaborn. Also I am plotting all the dataframes on the same axis.
How would I add legend to the plot ?
My code takes each of the dataframe and plots it one after another on the same figure.
Each dataframe has same columns
date count
2017-01-01 35
2017-01-02 43
2017-01-03 12
2017-01-04 27
My code :
f, ax = plt.subplots(1, 1, figsize=figsize)
x_col='date'
y_col = 'count'
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df_1,color='blue')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df_2,color='green')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df_3,color='red')
This plots 3 lines on the same plot. However, the legend is missing. According to the documentation, pointplot does not accept a label argument.
One workaround that worked was creating a new dataframe and using hue argument.
df_1['region'] = 'A'
df_2['region'] = 'B'
df_3['region'] = 'C'
df = pd.concat([df_1,df_2,df_3])
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df,hue='region')
But I would like to know if there is a way to create a legend for the code that first adds sequentially point plot to the figure and then add a legend.
Sample output :
I would suggest not to use seaborn pointplot for plotting. This makes things unnecessarily complicated.
Instead use matplotlib plot_date. This allows to set labels to the plots and have them automatically put into a legend with ax.legend().
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
date = pd.date_range("2017-03", freq="M", periods=15)
count = np.random.rand(15,4)
df1 = pd.DataFrame({"date":date, "count" : count[:,0]})
df2 = pd.DataFrame({"date":date, "count" : count[:,1]+0.7})
df3 = pd.DataFrame({"date":date, "count" : count[:,2]+2})
f, ax = plt.subplots(1, 1)
x_col='date'
y_col = 'count'
ax.plot_date(df1.date, df1["count"], color="blue", label="A", linestyle="-")
ax.plot_date(df2.date, df2["count"], color="red", label="B", linestyle="-")
ax.plot_date(df3.date, df3["count"], color="green", label="C", linestyle="-")
ax.legend()
plt.gcf().autofmt_xdate()
plt.show()
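Newer matplotlib releases discourage plot_date (it is deprecated as of 3.9, as far as I know), and a plain ax.plot call accepts date values directly. Here is a minimal sketch of the same labelled-lines approach; the counts for B and C are made up, and A reuses the sample rows from the question.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

dates = [dt.date(2017, 1, day) for day in range(1, 5)]
# "A" uses the sample counts from the question; "B" and "C" are made up
series = {"A": [35, 43, 12, 27], "B": [30, 20, 25, 40], "C": [5, 15, 10, 20]}

fig, ax = plt.subplots()
for name, counts in series.items():
    # label= is what ax.legend() picks up later
    ax.plot(dates, counts, marker="o", label=name)

legend = ax.legend()
fig.autofmt_xdate()  # tilt the date tick labels so they do not overlap

legend_labels = [t.get_text() for t in legend.get_texts()]
```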
In case one is still interested in obtaining the legend for pointplots, here a way to go:
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df1,color='blue')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df2,color='green')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df3,color='red')
ax.legend(handles=ax.lines[::len(df1)+1], labels=["A","B","C"])
ax.set_xticklabels([t.get_text().split("T")[0] for t in ax.get_xticklabels()])
plt.gcf().autofmt_xdate()
plt.show()
Old question, but there's an easier way.
sns.pointplot(x=x_col,y=y_col,data=df_1,color='blue')
sns.pointplot(x=x_col,y=y_col,data=df_2,color='green')
sns.pointplot(x=x_col,y=y_col,data=df_3,color='red')
plt.legend(labels=['legendEntry1', 'legendEntry2', 'legendEntry3'])
This lets you add the plots sequentially, and not have to worry about any of the matplotlib crap besides defining the legend items.
I tried using Adam B's answer, however, it didn't work for me. Instead, I found the following workaround for adding legends to pointplots.
import matplotlib.patches as mpatches
red_patch = mpatches.Patch(color='#bb3f3f', label='Label1')
black_patch = mpatches.Patch(color='#000000', label='Label2')
In the pointplots, the color can be specified as mentioned in previous answers. Once these patches corresponding to the different plots are set up,
plt.legend(handles=[red_patch, black_patch])
And the legend ought to appear in the pointplot.
This goes a bit beyond the original question, but it builds on @PSub's response toward something more general. I know some of this is easier in Matplotlib directly, but many of Seaborn's default styling options are quite nice, so I wanted to work out how to have more than one legend for a point plot (or other Seaborn plot) without dropping into Matplotlib right at the start.
Here's one solution:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# We will need to access some of these matplotlib classes directly
from matplotlib.lines import Line2D # For points and lines
from matplotlib.patches import Patch # For KDE and other plots
from matplotlib.legend import Legend
from matplotlib import cm
# Initialise random number generator
rng = np.random.default_rng(seed=42)
# Generate sample of 25 numbers
n = 25
clusters = []
for c in range(0,3):
    # Crude way to get different distributions
    # for each cluster
    p = rng.integers(low=1, high=6, size=4)
    df = pd.DataFrame({
        'x': rng.normal(p[0], p[1], n),
        'y': rng.normal(p[2], p[3], n),
        'name': f"Cluster {c+1}"
    })
    clusters.append(df)
# Flatten to a single data frame
clusters = pd.concat(clusters)
# Now do the same for data to feed into
# the second (scatter) plot...
n = 8
points = []
for c in range(0,2):
    p = rng.integers(low=1, high=6, size=4)
    df = pd.DataFrame({
        'x': rng.normal(p[0], p[1], n),
        'y': rng.normal(p[2], p[3], n),
        'name': f"Group {c+1}"
    })
    points.append(df)
points = pd.concat(points)
# And create the figure
f, ax = plt.subplots(figsize=(8,8))
# The KDE-plot generates a Legend 'as usual'
k = sns.kdeplot(
    data=clusters,
    x='x', y='y',
    hue='name',
    shade=True,
    thresh=0.05,
    n_levels=2,
    alpha=0.2,
    ax=ax,
)
# Notice that we access this legend via the
# axis to turn off the frame, set the title,
# and adjust the patch alpha level so that
# it closely matches the alpha of the KDE-plot
ax.get_legend().set_frame_on(False)
ax.get_legend().set_title("Clusters")
for lh in ax.get_legend().get_patches():
    lh.set_alpha(0.2)
# You would probably want to sort your data
# frame or set the hue and style order in order
# to ensure consistency for your own application
# but this works for demonstration purposes
groups = points.name.unique()
markers = ['o', 'v', 's', 'X', 'D', '<', '>']
colors = cm.get_cmap('Dark2').colors
# Generate the scatterplot: notice that Legend is
# off (otherwise this legend would overwrite the
# first one) and that we're setting the hue, style,
# markers, and palette using the 'name' parameter
# from the data frame and the number of groups in
# the data.
p = sns.scatterplot(
    data=points,
    x="x",
    y="y",
    hue='name',
    style='name',
    markers=markers[:len(groups)],
    palette=colors[:len(groups)],
    legend=False,
    s=30,
    alpha=1.0
)
# Here's the 'magic' -- we use zip to link together
# the group name, the color, and the marker style. You
# *cannot* retrieve the marker style from the scatterplot
# since that information is lost when rendered as a
# PathCollection (as far as I can tell). Anyway, this allows
# us to loop over each group in the second data frame and
# generate a 'fake' Line2D plot (with zero elements and no
# line-width in our case) that we can add to the legend. If
# you were overlaying a line plot or a second plot that uses
# patches you'd have to tweak this accordingly.
patches = []
for x in zip(groups, colors[:len(groups)], markers[:len(groups)]):
    patches.append(Line2D([0],[0], linewidth=0.0, linestyle='',
                          color=x[1], markerfacecolor=x[1],
                          marker=x[2], label=x[0], alpha=1.0))
# And add these patches (with their group labels) to the new
# legend item and place it on the plot.
leg = Legend(ax, patches, labels=groups,
loc='upper left', frameon=False, title='Groups')
ax.add_artist(leg);
# Done
plt.show();
Here's the output:
I am trying to plot two different variables (linked by a relation of causality), delai_jour and date_sondage on a single FacetGrid. I can do it with this code:
g = sns.FacetGrid(df_verif_sum, col="prefecture", col_wrap=2, aspect=2, sharex=True,)
g = g.map(plt.plot, "date_sondage", "delai_jour", color="m", linewidth=2)
g = g.map(plt.bar, "date_sondage", "impossible")
which gives me this:
FacetGrid
(There are 33 of them in total).
I'm interested in comparing the patterns across the various prefecture, but due to the difference in magnitude I cannot see the changes in the line chart.
For this specific work, the best way to do it is to create a secondary y axis, but I can't seem to make anything work: it doesn't look like it's possible with FacetGrid, and I neither understood the code nor was able to replicate the examples I've seen with pure matplotlib.
How should I go about it?
I got this to work by iterating through the axes and plotting a secondary axis as in a typical Seaborn graph.
Using the OP example:
g = sns.FacetGrid(df_verif_sum, col="prefecture", col_wrap=2, aspect=2, sharex=True)
g = g.map(plt.plot, "date_sondage", "delai_jour", color="m", linewidth=2)
for ax, (_, subdata) in zip(g.axes, df_verif_sum.groupby('prefecture')):
    ax2=ax.twinx()
    subdata.plot(x='date_sondage',y='impossible', ax=ax2,legend=False,color='r')
If you do any formatting to the x-axis, you may have to do it to both ax and ax2.
Here's an example where you apply a custom mapping function to the dataframe of interest. Within the function, you can call plt.gca() to get the current axis at the facet being currently plotted in FacetGrid. Once you have the axis, twinx() can be called just like you would in plain old matplotlib plotting.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
def facetgrid_two_axes(*args, **kwargs):
    data = kwargs.pop('data')
    dual_axis = kwargs.pop('dual_axis')
    alpha = kwargs.pop('alpha', 0.2)
    kwargs.pop('color')
    ax = plt.gca()
    if dual_axis:
        ax2 = ax.twinx()
        ax2.set_ylabel('Second Axis!')
    ax.plot(data['x'],data['y1'], **kwargs, color='red',alpha=alpha)
    if dual_axis:
        # plot the per-facet data passed in by FacetGrid, not the global df
        ax2.plot(data['x'],data['y2'], **kwargs, color='blue',alpha=alpha)
df = pd.DataFrame()
df['x'] = np.arange(1,5,1)
df['y1'] = 1 / df['x']
df['y2'] = df['x'] * 100
df['facet'] = 'foo'
df2 = df.copy()
df2['facet'] = 'bar'
df3 = pd.concat([df,df2])
win_plot = sns.FacetGrid(df3, col='facet', height=6)  # 'size' was renamed to 'height' in seaborn 0.9
(win_plot.map_dataframe(facetgrid_two_axes, dual_axis=True)
.set_axis_labels("X", "First Y-axis"))
plt.show()
This isn't the prettiest plot as you might want to adjust the presence of the second y-axis' label, the spacing between plots, etc. but the code suffices to show how to plot two series of differing magnitudes within FacetGrids.
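Stripped of seaborn, the core of both answers is a call to twinx() on each facet's axes. Here is a small matplotlib-only sketch with made-up data of two very different magnitudes; the variable names are illustrative and not taken from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
small = [1 / v for v in x]    # values between 0 and 1
large = [v * 100 for v in x]  # values in the hundreds

fig, axes = plt.subplots(1, 2, sharex=True)
twins = []
for ax in axes:
    ax.plot(x, small, color="m", linewidth=2)  # left y axis
    ax2 = ax.twinx()                           # second y axis sharing the same x axis
    ax2.bar(x, large, alpha=0.3)               # right y axis, scaled independently
    twins.append(ax2)
```

Each twin keeps its own y limits, so the line stays readable even though the bars are a few hundred times larger.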
I've been trying to follow this How to make custom legend in matplotlib SO question but I think a few things are getting lost in translation. I used a custom color mapping for the different classes of points in my plot and I want to be able to put a table with those color-label pairs. I stored the info in a dictionary D_color_label and then made 2 parallel lists colors and labels. I tried using it in the ax.legend but it didn't seem to work.
np.random.seed(0)
# Create dataframe
DF_0 = pd.DataFrame(np.random.random((100,2)), columns=["x","y"])
# Label to colors
D_idx_color = {**dict(zip(range(0,25), ["#91FF61"]*25)),
**dict(zip(range(25,50), ["#BA61FF"]*25)),
**dict(zip(range(50,75), ["#916F61"]*25)),
**dict(zip(range(75,100), ["#BAF1FF"]*25))}
D_color_label = {"#91FF61":"label_0",
"#BA61FF":"label_1",
"#916F61":"label_2",
"#BAF1FF":"label_3"}
# Add color column
DF_0["color"] = pd.Series(list(D_idx_color.values()), index=list(D_idx_color.keys()))
# Plot
fig, ax = plt.subplots(figsize=(8,8))
sns.regplot(data=DF_0, x="x", y="y", scatter_kws={"c":DF_0["color"]}, ax=ax)
# Add custom legend
colors = list(set(DF_0["color"]))
labels = [D_color_label[x] for x in set(DF_0["color"])]
# If I do this, I get the following error:
# ax.legend(colors, labels)
# UserWarning: Legend does not support '#BA61FF' instances.
# A proxy artist may be used instead.
According to http://matplotlib.org/users/legend_guide.html, you have to pass the legend function the artists that are to be labeled. To get one legend entry per color, group your data by color and plot each group separately as a scatter plot, so that every artist gets its own label:
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
import seaborn as sns
np.random.seed(0)
# Create dataframe
DF_0 = pd.DataFrame(np.random.random((100, 2)), columns=["x", "y"])
DF_0['color'] = ["#91FF61"]*25 + ["#BA61FF"]*25 + ["#91FF61"]*25 + ["#BA61FF"]*25
#print DF_0
D_color_label = {"#91FF61": "label_0", "#BA61FF": "label_1",
"#916F61": "label_2", "#BAF1FF": "label_3"}
colors = list(DF_0["color"].unique())
labels = [D_color_label[x] for x in DF_0["color"].unique()]
ax = sns.regplot(data=DF_0, x="x", y="y", scatter_kws={'c': DF_0['color'], 'zorder':1})
# Make a legend
# groupby and plot points of one color
for color, grp in DF_0.groupby('color'):
    # look the label up by color instead of indexing the labels list
    grp.plot(kind='scatter', x='x', y='y', c=color, ax=ax, label=D_color_label[color], zorder=0)
ax.legend(loc=2)
plt.show()
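The warning quoted in the question mentions proxy artists, and that route avoids re-plotting the points: build one Patch per color-label pair and pass those as legend handles. Here is a minimal sketch that reuses the question's D_color_label dictionary; the four plotted points are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

D_color_label = {"#91FF61": "label_0", "#BA61FF": "label_1",
                 "#916F61": "label_2", "#BAF1FF": "label_3"}

fig, ax = plt.subplots()
# Made-up points, one per colour, drawn in a single unlabelled scatter call
ax.scatter([0.1, 0.4, 0.6, 0.9], [0.2, 0.8, 0.3, 0.7], c=list(D_color_label))

# Proxy artists: never drawn on the axes, they only feed the legend
handles = [Patch(facecolor=color, label=label)
           for color, label in D_color_label.items()]
legend = ax.legend(handles=handles, loc="upper left")

legend_labels = [t.get_text() for t in legend.get_texts()]
```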