How can custom errorbars be aligned on grouped bars? - python

I have created a sns.catplot using seaborn. My goal is to obtain a barplot with error bars.
I followed this answer to add error bars to my plot. However, my error bars, drawn with the same ax.errorbar function, no longer align with the bars.
I appreciate any answers or comments as to why sorting my data frame has caused this issue.
import numpy as np
import pandas as pd
import matplotlib
import seaborn as sns
data = {'Parameter': ['$μ_{max}$', '$μ_{max}$', '$μ_{max}$', '$μ_{max}$', '$μ_{max}$', '$m$', '$m$', '$m$', '$m$', '$m$', '$\\alpha_D$', '$\\alpha_D$', '$\\alpha_D$', '$\\alpha_D$', '$\\alpha_D$', '$N_{max}$', '$N_{max}$', '$N_{max}$', '$N_{max}$', '$N_{max}$', '$\\gamma_{cell}$', '$\\gamma_{cell}$', '$\\gamma_{cell}$', '$\\gamma_{cell}$', '$\\gamma_{cell}$', '$K_d$', '$K_d$', '$K_d$', '$K_d$', '$K_d$'],
'Output': ['POC', 'DOC', 'IC', 'Cells', 'Mean', 'POC', 'DOC', 'IC', 'Cells', 'Mean', 'POC', 'DOC', 'IC', 'Cells', 'Mean', 'POC', 'DOC', 'IC', 'Cells', 'Mean', 'POC', 'DOC', 'IC', 'Cells', 'Mean', 'POC', 'DOC', 'IC', 'Cells', 'Mean'],
'Total-effect': [0.9806103414992552, -7.054718234598588e-10, 0.1960778044402512, 0.2537531550865872, 0.3576103250801555, 0.1663846098641205, 1.0851909901687566, 0.2563681021056311, 0.0084168031549801, 0.3790901263233721, 0.0031054085922008, 0.0002724061050653, 0.1659030569337202, 0.2251452993113863, 0.0986065427355931, 0.0340237460462674, 0.3067235088110348, 0.3150260538485233, 0.3349234507482945, 0.24767418986353, 0.1938746960877987, -6.17103884336228e-07, 0.0041542186143554, 0.0032055759222461, 0.050308468380129, 0.0417496162986251, 2.328088857274425e-09, 0.9483137697398172, 0.9881583951740854, 0.4945554458851541],
'First-order': [0.7030107013984165, 2.266962154339895e-19, 0.0062233586910709, 0.001029343445717, 0.1775658508838011, 0.0007896517048184, 0.7264368524472167, 0.0072701545157557, 0.0047752182357577, 0.1848179692258871, -2.123427373989929e-05, 2.395667282242805e-19, 0.0055179953736572, 0.0004377224837127, 0.0014836208959075, -1.509666411558862e-06, 6.068293373049956e-20, 0.0115237519530005, 0.0009532607225978, 0.0031188757522967, 0.0117401346791109, 3.482140934635793e-24, 0.0015109239301033, -2.9803014832201013e-08, 0.0033127572015498, 0.0015795893288074, 3.393882814623132e-17, 0.3451307225252993, 0.4106729024860886, 0.1893458035850488],
'Total Error': [0.0005752772018327, 1.3690325778564916e-09, 0.0033197127516203, 0.0042203628326116, 0.0020288385387743, 0.0007817126652407, 0.074645390474463, 0.0016832816591233, 0.0023529269720789, 0.0198658279427265, 0.0001233951911322, 0.0023340612253369, 0.0029383350061101, 0.003741247467092, 0.0022842597224178, 0.0005740976276596, 0.1017075201238418, 0.0016784578928217, 0.0037270295879161, 0.0269217763080598, 0.0009021103063017, 4.619682769520493e-07, 0.0005201826302926, 0.0005615428740041, 0.0004960744447188, 0.000910170372727, 1.0571905831111963e-09, 0.0029389557787801, 0.0054832440706334, 0.0023330928198327],
'First Error': [0.0024072925459877, 9.366089709991011e-20, 0.0002667351219131, 0.0002702376243862, 0.0007360663230718, 0.0002586411466273, 0.0409234887280223, 0.0005053286335856, 0.0003348751699561, 0.0105055834195478, 2.195881790893627e-05, 8.208495135059976e-20, 0.0001643584459509, 0.0002162523113349, 0.0001006423937987, 0.0001928274220008, 3.4836161809305005e-20, 0.0005126354796536, 0.0005972681850905, 0.0003256827716862, 0.0003252835339205, 5.013811598030501e-24, 3.247452070080876e-05, 8.972262407759052e-08, 8.946194431135658e-05, 0.0001221659592046, 2.8775799201024936e-18, 0.0033817071114312, 0.0058875798799757, 0.0023478632376529]}
df = pd.DataFrame(data)
# Picks outputs to show
show_vars = ["Mean"]
err_df = df.melt(id_vars=["Parameter", "Output"], value_vars=["Total Error", "First Error"], var_name="Error").sort_values(by="Parameter")
df = df.melt(id_vars=["Parameter", "Output"], value_vars=["Total-effect", "First-order"], var_name="Sobol index", value_name="Value").sort_values(by="Parameter")
# Plot
grid = sns.catplot(data=df[df["Output"].isin(show_vars)], x="Parameter", y="Value", col="Output", col_wrap=2,
                   hue="Sobol index", kind="bar", aspect=1.8, legend_out=False)
grid.set_titles(col_template="Sensitivity with respect to {col_name}")
# Add error lines and values
for ax, var in zip(grid.axes.ravel(), show_vars):
    # Value labels
    for i, c in enumerate(ax.containers):
        if type(c) == matplotlib.container.BarContainer:
            ax.bar_label(c, labels=[f'{v.get_height():.2f}' if v.get_height() >= 0.01 else "<0.01" for v in c],
                         label_type='center')
    # Error bars
    ticklocs = ax.xaxis.get_majorticklocs()
    offset = ax.containers[0][0].get_width() / 2
    ax.errorbar(x=np.append(ticklocs - offset, ticklocs + offset), y=df[df["Output"] == var]["Value"],
                yerr=err_df[err_df["Output"] == var]["value"], ecolor='black', linewidth=0, elinewidth=2, capsize=2)  # Careful: array order matters
    # Change title for mean
    if var == "Mean":
        ax.set_title("Average sensitivity across outputs")
grid.tight_layout()
Output:
I did try sorting the selected data frames:
y=df[df["Output"] == var].sort_values(by="Parameter")["Value"], yerr=err_df[err_df["Output"] == var].sort_values(by="Parameter")["value"]
This did not help, even though the row order in the data frame seems to be preserved across operations.

seaborn is a high-level API for matplotlib, and pandas uses matplotlib as its default plotting backend. The two packages work with matplotlib in different ways, which makes certain types of plots and customizations easier in one or the other.
seaborn.barplot automatically aggregates data and adds error bars. However, since this data is already aggregated and has columns containing the errors, it is easier to add the errors with pandas.DataFrame.plot and the yerr parameter.
See pandas User Guide: Plotting with error bars
Tested in python 3.11, pandas 1.5.2, matplotlib 3.6.2, seaborn 0.12.1
import matplotlib as mpl
import pandas as pd
# df is the wide DataFrame built from data, before the melt calls in the question
# set the index as the column to be the x-axis
df = df.set_index('Parameter')
# select the Mean data
df_mean = df[df.Output.eq('Mean')]
# specify the columns to use for the errors
yerr = df_mean[['Total Error', 'First Error']]
# the columns must be the same name as the columns used for the data values
yerr.columns = ['Total-effect', 'First-order']
# plot the selected data and add the yerr
ax = df_mean.plot(kind='bar', y=['Total-effect', 'First-order'], yerr=yerr, rot=0, figsize=(12, 8), title='Average sensitivity across outputs')
# iterate through each group of bars
for c in ax.containers:
    # add labels to the bars
    if type(c) == mpl.container.BarContainer:
        labels=[f'{h:.2f}' if (h := v.get_height()) >= 0.01 else "<0.01" for v in c]
        ax.bar_label(c, labels=labels, label_type='center')
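One likely cause of the misalignment in the question is that the x positions and the y/yerr arrays are ordered differently: the x array lists all left bars first and then all right bars, while the melted and sorted data frame is ordered by parameter. If you want to keep calling ax.errorbar yourself, a minimal sketch of an order-safe approach is to read the positions and heights from the drawn bars, one bar group at a time. The table wide, the mapping error_for, and the numbers in it are made up for illustration. The plot is drawn with pandas here, and the same per-container idea should carry over to a seaborn axes, although that is an assumption I have not tested against every seaborn version.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.container import BarContainer

# Hypothetical, already-aggregated values and errors (one row per parameter)
wide = pd.DataFrame(
    {
        "Total-effect": [0.36, 0.38, 0.10],
        "First-order": [0.18, 0.19, 0.01],
        "Total Error": [0.02, 0.02, 0.01],
        "First Error": [0.01, 0.01, 0.005],
    },
    index=["mu_max", "m", "alpha_D"],
)
# Map each value column to the column holding its error
error_for = {"Total-effect": "Total Error", "First-order": "First Error"}

fig, ax = plt.subplots()
wide[list(error_for)].plot(kind="bar", ax=ax, rot=0)

# Collect the bar groups first: ax.errorbar appends its own containers
bar_groups = [c for c in ax.containers if isinstance(c, BarContainer)]

centers, heights = {}, {}
for (value_col, error_col), group in zip(error_for.items(), bar_groups):
    # Take x and y from the drawn rectangles so the order cannot drift
    x = np.array([bar.get_x() + bar.get_width() / 2 for bar in group])
    y = np.array([bar.get_height() for bar in group])
    ax.errorbar(x, y, yerr=wide[error_col].to_numpy(), fmt="none",
                ecolor="black", elinewidth=2, capsize=2)
    centers[value_col], heights[value_col] = x, y
```

Because x and y come from the same rectangles, only the yerr array has to be put in the right order, and it is taken from the same table in the same row order as the bars.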

Related

How can I add hatching for specific bars in sns.catplot?

I use seaborn to make a categorical barplot of a df containing Pearson correlation R-values for 17 vegetation classes, 3 carbon species and 4 regions. I try to recreate a smaller sample df here:
import pandas as pd
import seaborn as sns
import random
import numpy as np
df = pd.DataFrame({
'veg class':12*['Tree bl dc','Shrubland','Grassland'],
'Pearson R':np.random.uniform(0,1, 36),
'Pearson p':np.random.uniform(0,0.1, 36),
'carbon':4*['CO2','CO2','CO2', 'CO', 'CO', 'CO', 'CO2 corr', 'CO2 corr', 'CO2 corr'],
'spatial':9*['SH'] + 9*['larger AU region'] + 9*['AU'] + 9*['SE-AU']
})
#In my original df, the number of vegetation classes where R-values are
#available is not the same for all spatial scales, so I drop random rows
#to make it more similar:
df.drop([11,14,17,20,23,26,28,29,31,32,34,35], inplace=True)
#I added columns indicating where hatching should be
#boolean:
df['significant'] = 1
df.loc[df['Pearson p'] > 0.05, 'significant'] = 0
#string:
df['hatch'] = ''
df.loc[df['Pearson p'] > 0.05, 'hatch'] = 'x'
df.head()
This is my plotting routine:
sns.set(font_scale=2.1)
#Draw a nested barplot by veg class
g = sns.catplot(
data=df, kind="bar", row="spatial",
x="veg class", y="Pearson R", hue="carbon",
ci=None, palette="YlOrBr", aspect=5
)
g.despine(left=True)
g.set_titles("{row_name}")
g.set_axis_labels("", "Pearson R")
g.set(xlabel=None)
g.legend.set_title("")
g.set_xticklabels(rotation = 60)
(The plot looks as follows: seaborn categorical barplot)
The plot is exactly how I would like it, except that now I would like to add hatching (or any kind of distinction) for all bars where the Pearson R value is insignificant, i.e. where the p value is larger than 0.05. I found this Stack Overflow entry, but my problem differs from it, as the bars that should be hatched do not follow a repeating pattern.
Any hints will be highly appreciated!
To hatch individual bars, iterate over the bar patches of each facet, read the height of each patch, compare it with a specified threshold, and then set the hatching and edge color. Please add the following code at the end.
for ax in g.axes.flat:
    # ax.patches holds every bar drawn in this facet
    for bar in ax.patches:
        h = bar.get_height()
        if h >= 0.8:
            bar.set_hatch('*')
            bar.set_edgecolor('k')
Edit: The data has been updated to match the actual data, and the code has been modified accordingly. Also, the logic is conditional on the value of the hatching column.
for i,ax in enumerate(g.axes.flat):
    s = ax.get_title()
    # @s refers to the local variable s inside the query string
    dff = df.query('spatial == @s')
    dff = dff.sort_values('veg class', ascending=False)
    ha = dff['hatch'].tolist()
    p = dff['Pearson R'].tolist()
    print(ha)
    for k in range(len(dff)):
        if ha[k] == 'x':
            ax.patches[k].set_hatch('*')
            ax.patches[k].set_edgecolor('k')
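Matching patches to data frame rows by position depends on the order in which the bars were drawn, which is easy to get wrong when some combinations are missing. As a hedged alternative, here is a minimal sketch that draws one region with plain matplotlib instead of sns.catplot, so every bar is created next to the p value that decides its hatching. The small long_df table and its numbers are made up for illustration, and the 0.05 threshold is the one from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical long-format data for a single region; one combination is missing
long_df = pd.DataFrame(
    {
        "veg class": ["Shrubland", "Shrubland", "Grassland", "Grassland", "Tree bl dc"],
        "carbon": ["CO2", "CO", "CO2", "CO", "CO2"],
        "Pearson R": [0.7, 0.4, 0.2, 0.9, 0.5],
        "Pearson p": [0.01, 0.08, 0.20, 0.03, 0.04],
    }
)

# Wide tables with identical row/column layout for R and p
r = long_df.pivot(index="veg class", columns="carbon", values="Pearson R")
p = long_df.pivot(index="veg class", columns="carbon", values="Pearson p")

fig, ax = plt.subplots()
width = 0.8 / len(r.columns)
x = np.arange(len(r.index))
hatched = []
for k, col in enumerate(r.columns):
    # Missing combinations become zero-height bars
    bars = ax.bar(x + k * width, r[col].fillna(0).to_numpy(), width, label=col)
    for bar, veg, p_val in zip(bars, r.index, p[col]):
        if p_val > 0.05:  # NaN compares as False, so missing bars stay plain
            bar.set_hatch("x")
            bar.set_edgecolor("k")
            hatched.append((veg, col))
ax.set_xticks(x + width * (len(r.columns) - 1) / 2)
ax.set_xticklabels(r.index)
ax.legend()
```

For several regions, the same loop could be repeated once per subplot, using one pivoted table per value of spatial.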

Add the label for the value to display above the bars [duplicate]

I created a bar chart and would like to place the count value above each bar.
# Import the libraries
import pandas as pd
from matplotlib import pyplot as plt
# Create the DataFrame
df = pd.DataFrame({
'city_code':[1200013, 1200104, 1200138, 1200179, 1200203],
'index':['good', 'bad', 'good', 'good', 'bad']
})
# Plot the graph
df['index'].value_counts().plot(kind='bar', color='darkcyan',
figsize=[15,10])
plt.xticks(rotation=0, horizontalalignment="center", fontsize=14)
plt.ylabel("cities", fontsize=16)
I'm getting the following result
I would like to add the values at the top of each bar. The values are the counts I got from value_counts.
Something like this:
Thanks to everyone who helps.
Example using patches and annotate:
# Import the libraries
import pandas as pd
from matplotlib import pyplot as plt
# Create the DataFrame
df = pd.DataFrame(
{
"city_code": [1200013, 1200104, 1200138, 1200179, 1200203],
"index": ["good", "bad", "good", "good", "bad"],
}
)
# Plot the graph
ax = df["index"].value_counts().plot(kind="bar", color="darkcyan", figsize=[15, 10])
plt.xticks(rotation=0, horizontalalignment="center", fontsize=14)
plt.ylabel("cities", fontsize=16)
for p in ax.patches:
    ax.annotate(
        str(p.get_height()), xy=(p.get_x() + 0.25, p.get_height() + 0.1), fontsize=20
    )
plt.savefig("test.png")
Result:
You can use ax.text in a for loop to add the labels one by one.
But matplotlib (3.4 and later) already has a built-in method for this: Axes.bar_label.
You can replace the line df['index'].value_counts().plot(kind='bar', color='darkcyan', figsize=[15,10]) in your example with
d = df['index'].value_counts()
p = ax.bar(d.index, d.values,color='darkcyan')
ax.bar_label(p)
The complete example will be:
import pandas as pd
from matplotlib import pyplot as plt
fig, ax = plt.subplots(figsize=(4, 3))
# Create the DataFrame
df = pd.DataFrame({
'city_code':[1200013, 1200104, 1200138, 1200179, 1200203],
'index':['good', 'bad', 'good', 'good', 'bad']
})
# Plot the graph
d = df['index'].value_counts()
p = ax.bar(d.index, d.values,color='darkcyan')
ax.bar_label(p)
plt.xticks(rotation=0, horizontalalignment="center", fontsize=14)
plt.ylabel("cities", fontsize=16)
fig.show()
And the output image looks like this:
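If you would rather keep the original pandas plotting call, bar_label can also be applied to the axes that pandas returns. This is a minimal sketch, assuming matplotlib 3.4 or later. The padding and fontsize values are arbitrary choices, and label_strings is only collected here to show what ends up on the bars.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "city_code": [1200013, 1200104, 1200138, 1200179, 1200203],
    "index": ["good", "bad", "good", "good", "bad"],
})

# Same pandas call as in the question, keeping the returned Axes
ax = df["index"].value_counts().plot(kind="bar", color="darkcyan", rot=0)
# pandas draws one BarContainer for a Series; label its bars with their heights
texts = ax.bar_label(ax.containers[0], padding=3, fontsize=14)
ax.set_ylabel("cities")

label_strings = [t.get_text() for t in texts]
```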

A facet-specific legend in each facet of a FacetGrid Seaborn

I am trying to make a facet-specific legend in each facet of a FacetGrid Seaborn object, such as that produced by a catplot.
Consider the following DataFrame where measurement is the variable to plot, against the categorical Condition, faceted across rows and columns according to variables Lab and (instrument) model. The hue is set to the serial number of the particular instrument on which the measurement was made.
Here is the DataFrame:
df = pd.DataFrame({'Condition': ['C1','C2','C1','C2','C1','C1','C2','C1',
'C1','C1', 'C1', 'C2', 'C1', 'C2', 'C1', 'C2', 'C2'],
'model': ['Pluto','Pluto','Jupy','Jupy','Jupy','Jupy','Jupy','Jupy',
'Jupy', 'Pluto', 'Pluto', 'Pluto', 'Pluto', 'Pluto', 'Jupy', 'Jupy',
'Pluto'],
'serial': [2520,2520,3568,3568,3568,3580,3580,356,
456, 2580, 2580, 2580, 2599, 2599, 2700, 2700,
2560],
'measurement': [1.02766,1.0287,1.0099,1.0198,1.0034,1.0036,1.0054,1.0024,
1.0035,1.00245,1.00456, 1.01, 1.0023, 1.0024, 1.00238, 1.0115,
1.020],
'Lab': ['John','John','John','John','Jack','Jack','Jack','John',
'Jack','John', 'Jack', 'Jack', 'Jack', 'Jack', 'John', 'John',
'John']}
)
Some facets contain only a subset of the hue levels, and as the levels grow in number the FacetGrid legend gets rather long. Inspired by the answer to another post, I opted to iterate through the FacetGrid axes using g.axes.ravel() to get a legend in each facet:
sns.set_style("ticks")
g = sns.catplot(x='Condition', # returns a FacetGrid object for further editing
y = 'measurement',
data=df,
hue='serial',
row='Lab',
col='model',
s=10,
kind='swarm',
dodge=False,
aspect = 1,
sharey = True,
legend_out = True,
).despine(left=True)
for axes in g.axes.ravel():
    axes.legend()
g.savefig('/Users/massimopinto/Desktop/legend_in_facets.png',
          bbox_inches='tight')
This leads to a rather crowded plot, because each facet repeats the information from the legend of the entire FacetGrid object. What I would prefer is for the legend of each facet to show only the hue levels that appear in that specific facet.
How do I get to that?
versions: pandas: 1.0.3; seaborn: 0.10.0; python: 3.7.2
Consider iterating over the axes and a groupby() object together using zip, rebuilding each legend from the values of the hue column that are present in that group. Importantly, you must sort the data frame before plotting.
df = df.sort_values(['Lab', 'model', 'serial']).reset_index(drop=True)
sns.set_style("ticks")
g = sns.catplot(x = 'Condition',
y = 'measurement',
data = df,
hue = 'serial',
row = 'Lab',
col = 'model',
s=10,
kind='swarm',
dodge=False,
aspect = 1,
sharey = True,
legend_out = False, # REMOVE MASTER LEGEND
).despine(left=True)
# MASTER SERIES OF serial
ser_vals = pd.Series(df['serial'].sort_values().unique())
for axes, (i, d) in zip(g.axes.ravel(), df.groupby(['Lab', 'model'])):
    handles, labels = axes.get_legend_handles_labels()
    # SUBSET MASTER SERIES OF serial
    vals = ser_vals[ser_vals.isin(d['serial'].unique())]
    idx = vals.index.tolist()
    if len(idx) > 0:
        axes.legend(handles = [handles[i] for i in idx],
                    labels = vals.tolist())
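A different route, sketched here with plain matplotlib rather than seaborn, is to draw the facets yourself from a groupby. Each axes then only contains artists for the serial numbers that occur in it, so a plain ax.legend() is already facet-specific. This replaces the swarm plot with a simple scatter and is not the catplot-based method above. The reduced data frame and the colors mapping are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical reduced data set with the same columns as in the question
df = pd.DataFrame({
    "Condition": ["C1", "C2", "C1", "C2", "C1", "C2", "C1", "C1"],
    "model": ["Pluto", "Pluto", "Jupy", "Jupy", "Jupy", "Jupy", "Pluto", "Pluto"],
    "serial": [2520, 2520, 3568, 3568, 3580, 3580, 2580, 2599],
    "measurement": [1.028, 1.029, 1.010, 1.020, 1.004, 1.005, 1.002, 1.003],
    "Lab": ["John", "John", "John", "John", "Jack", "Jack", "Jack", "Jack"],
})

labs = sorted(df["Lab"].unique())
models = sorted(df["model"].unique())
# Fixed color per serial so a serial looks the same in every facet
serials = sorted(df["serial"].unique())
colors = {s: plt.cm.tab10(i % 10) for i, s in enumerate(serials)}

fig, axes = plt.subplots(len(labs), len(models), sharey=True, squeeze=False)
facet_labels = {}
for (lab, model), facet in df.groupby(["Lab", "model"]):
    ax = axes[labs.index(lab), models.index(model)]
    # One scatter call per serial present in this facet only
    for serial, points in facet.groupby("serial"):
        ax.scatter(points["Condition"], points["measurement"],
                   color=colors[serial], label=str(serial))
    ax.set_title(f"Lab = {lab} | model = {model}")
    ax.legend(title="serial")
    facet_labels[(lab, model)] = ax.get_legend_handles_labels()[1]
```

The trade-off is that layout details seaborn handles for you (shared labels, despine, swarm jitter) have to be added by hand.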

How to make a grouped multibar time series chart in altair with DATE on the bottom and Column labels in a legend or popup?

For example you might want data like:
DATE,KEY,VALUE
2019-01-01,REVENUE,100
2019-01-01,COST,100.1
...
plotted as a time series BAR chart with little space in between the bars and no labels except for dates. The popup or legend would show you what the REV,COST cols were.
Basic bar chart with alt.Column, alt.X, alt.Y works but the labels and grouping are wrong. Is it possible to make the Column groups correspond to the x-axis and hide the X axis labels?
EDIT:
Latest best:
import altair as alt
import numpy as np
import pandas as pd
m = 100
data = pd.DataFrame({
'DATE': pd.date_range('2019-01-01', freq='D', periods=m),
'REVENUE': np.random.randn(m),
'COST': np.random.randn(m),
}).melt('DATE', var_name='KEY', value_name='VALUE')
bars = alt.Chart(data, width=10).mark_bar().encode(
y=alt.Y('VALUE:Q', title=None),
x=alt.X('KEY:O', axis=None),
color=alt.Color('KEY:O', scale=alt.Scale(scheme='category20')),
tooltip=['DATE', 'KEY', 'VALUE'],
)
(bars).facet(
column=alt.Column(
'yearmonthdate(DATE):T', header=alt.Header(labelOrient="bottom",
labelAngle=-45,
format='%b %d %Y'
)
),
align="none",
spacing=0,
).configure_header(
title=None
).configure_axis(
grid=False
).configure_view(
strokeOpacity=0
)
Another post because I can't seem to add multiple images to the original one.
This is another way with another flaw: the bars are overlapping. Notice the dates however are handled properly because this is using an actual axis.
import altair as alt
import pandas as pd
import numpy as np
m = 250
data = pd.DataFrame({
'DATE': pd.date_range('2019-01-01', freq='D', periods=m),
'REVENUE': np.random.randn(m),
'COST': np.random.randn(m),
}).melt('DATE', var_name='KEY', value_name='VALUE')
# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection(type='single', nearest=True, on='mouseover',
fields=['REVENUE'], empty='none')
# The basic line
line = alt.Chart(data).mark_bar(interpolate='basis').encode(
x='DATE:T',
y='VALUE:Q',
color='KEY:N'
).configure_bar(opacity=0.5)
line
You can create a grouped bar chart using a combination of encodings and facets, and you can adjust the axis titles and scales to customize the appearance. Here is an example (replicating https://vega.github.io/editor/#/examples/vega/grouped-bar-chart in Altair, as you mentioned in your comment):
import altair as alt
import pandas as pd
data = pd.DataFrame([
{"category":"A", "position":0, "value":0.1},
{"category":"A", "position":1, "value":0.6},
{"category":"A", "position":2, "value":0.9},
{"category":"A", "position":3, "value":0.4},
{"category":"B", "position":0, "value":0.7},
{"category":"B", "position":1, "value":0.2},
{"category":"B", "position":2, "value":1.1},
{"category":"B", "position":3, "value":0.8},
{"category":"C", "position":0, "value":0.6},
{"category":"C", "position":1, "value":0.1},
{"category":"C", "position":2, "value":0.2},
{"category":"C", "position":3, "value":0.7}
])
text = alt.Chart(data).mark_text(dx=-10, color='white').encode(
x=alt.X('value:Q', title=None),
y=alt.Y('position:O', axis=None),
text='value:Q'
)
bars = text.mark_bar().encode(
color=alt.Color('position:O', legend=None, scale=alt.Scale(scheme='category20')),
)
(bars + text).facet(
row='category:N'
).configure_header(
title=None
)
original answer:
I had trouble parsing from your question exactly what you're trying to do (in the future please consider including a code snippet demonstrating what you've tried and pointing out why the result is not sufficient), but here is an example of a bar chart with data of this form, that has x axis labeled by only date, with a tooltip and legend showing the revenue and cost:
import altair as alt
import pandas as pd
data = pd.DataFrame({
'DATE': pd.date_range('2019-01-01', freq='D', periods=4),
'REVENUE': [100, 200, 150, 50],
'COST': [150, 125, 75, 80],
}).melt('DATE', var_name='KEY', value_name='VALUE')
alt.Chart(data).mark_bar().encode(
x='yearmonthdate(DATE):O',
y='VALUE',
color='KEY',
tooltip=['KEY', 'VALUE'],
)

Pandas, matplotlib and plotly - how to fix series legend?

I'm trying to create an interactive plotly graph from pandas dataframes.
However, I can't get the legends displayed correctly.
Here is a working example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.plotly as py
# sign into the plotly api
py.sign_in("***********", "***********")
# create some random dataframes
dates = pd.date_range('1/1/2000', periods=8)
df1 = pd.DataFrame(np.random.randn(8, 1), index=dates, columns=['A'])
df2 = pd.DataFrame(np.random.randn(8, 1), index=dates, columns=['B'])
df1.index.name = 'date'
df2.index.name = 'date'
Now I attempt to plot the dataframes using plotly.
fig, ax = plt.subplots(1,1)
df1.plot(y='A', ax=ax)
df2.plot(y='B', ax=ax)
py.iplot_mpl(fig, filename='random')
Notice there is no legend
Edit:
Based on suggestions below I have added an update dict. Although this does display the legend, it messes up the plot itself:
fig, ax = plt.subplots(1,1)
df1.plot(y='A', ax=ax)
df2.plot(y='B', ax=ax)
update = dict(
layout=dict(
annotations=[dict(text=' ')], # rm erroneous 'A', 'B', ... annotations
showlegend=True # show legend
)
)
py.iplot_mpl(fig, update=update, filename='random')
Edit 2:
Removing the annotations entry from the layout dict results in the plot being displayed correctly, but the legend is not the y column name, but rather the x column name, the index name of the dataframe
fig, ax = plt.subplots(1,1)
df1.plot(y='A', ax=ax)
df2.plot(y='B', ax=ax)
update = dict(
layout=dict(
showlegend=True # show legend
)
)
py.iplot_mpl(fig, update=update, filename='random')
This results in the following plot:
Edit 3:
I have found a way to override the legend text but it seems a bit clunky. Given that I've specified the dataframe column I want to plot:
df1.plot(y='A', ax=ax)
I would have expected that y='A' would result in 'A' being used as the legend label.
It seems this is not the case, and while it is possible to override using the index label, as seen below, it just feels wrong.
Is there a better way to achieve this result?
update = dict(
layout=dict(
showlegend=True,
),
data=[
dict(name='A'),
dict(name='B'),
]
)
py.iplot_mpl(fig, update=update, filename='random')
Legends don't convert well from matplotlib to plotly.
Fortunately, adding a plotly legend to a matplotlib plot is straightforward:
update = dict(
layout=dict(
showlegend=True # show legend
)
)
py.iplot_mpl(fig, update=update)
See the full working ipython notebook here.
For more information, refer to the plotly user guide.
