Add labels to grouped seaborn barplot from DIFFERENT COLUMN - python

I know there are other entries similar to this, but nothing exactly like this.
Suppose I have this dataframe:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({"percentage": [0.3, 0.4, 0.5, 0.2],
"xaxis": ["set1", "set1", "set2", "set2"],
"hues": ["a", "b", "c", "d"],
"number": [1,2,3,4]
})
and I create a grouped barplot in Seaborn:
sns.set(style="whitegrid")
fig, ax = plt.subplots(figsize=(10,10))
ax = sns.barplot(data=df,
x="xaxis",
y="percentage",
hue="hues")
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0)
for container in ax.containers:
    ax.bar_label(container)
This nicely adds labels from the "percentage" column, but how do I label the bars using the entries from the "number" column instead? For clarification, I chose the numbers 1,2,3,4 as a toy example. They are not consecutive in my real data.
For reference, I am using Python 3.9.X, Seaborn 0.11.2, and Matplotlib 3.5.0.
I suspect the answer lies somewhere in the container but do not know.
I have also seen potential answers that use this code:
for index, row in df.iterrows():
    ax.text(insert_codehere)
but that did not seem to work for me either.
Thanks in advance.

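# With Seaborn 0.11.2 there is one container per hue level, each holding one bar slot per x category (two here)
for container, number in zip(ax.containers, df.number):
    ax.bar_label(container, labels=[number, number])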
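The loop above relies on every hue having exactly one row. A minimal sketch of a more general lookup follows, assuming it is acceptable to swap the Seaborn call for pandas' own bar plotting (this is not the original answer's method). It pivots both "percentage" and "number" into the same x-by-hue layout, so each bar's label can be read from the matching cell. The names heights, numbers and labels_per_container are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"percentage": [0.3, 0.4, 0.5, 0.2],
                   "xaxis": ["set1", "set1", "set2", "set2"],
                   "hues": ["a", "b", "c", "d"],
                   "number": [1, 2, 3, 4]})

# One row per x category, one column per hue; missing pairs become NaN
heights = df.pivot(index="xaxis", columns="hues", values="percentage")
numbers = df.pivot(index="xaxis", columns="hues", values="number")

fig, ax = plt.subplots(figsize=(10, 10))
heights.plot.bar(ax=ax)  # pandas adds one bar container per column, in column order

labels_per_container = []
for container, hue in zip(ax.containers, heights.columns):
    # Empty label where this x/hue pair has no row in df
    labels = ["" if pd.isna(n) else f"{n:g}" for n in numbers[hue]]
    ax.bar_label(container, labels=labels)
    labels_per_container.append(labels)
```

Because both pivots share the same index and columns, the label list for each container lines up with its bars without depending on the row order of df.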

Share same plot area and map to original xticks

I have some dataframes that I'd like to plot the information into the same area. The first data frame uses hue and plots some bars, and subsequently all plots in the same axis should map to those xticks (they might not be in the same order). See this example:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df1 = pd.DataFrame({ "col" : ["col_a", "col_a", "col_a", "col_c", "col_c", "col_b", "col_b"], "cluster": ["A", "B", "C", "A", "B", "A", "C"], "value_x":[2,4,1,5,6,2,1]})
df2 = pd.DataFrame({ "col" : ["col_a", "col_b", "col_c"], "value_y": [11,13,9]})
f, ax = plt.subplots(1, figsize=(15, 5))
# This will write the "master order" of the xticks
sns.barplot(x="col", y="value_x", hue="cluster", data=df1, ax=ax)
# Follow plots in the same plot should map to those xticks
ax = sns.lineplot(
data=df2,
x="col",
y="value_y",
ax=ax,
)
The line plot does not map correctly to the xticks. I was thinking of getting all the labels from the initial plot using "get_xticklabels" and using them as the master order to join all subsequent frames, so that the order matches when I plot them, but I was hoping there might be a better solution.
Thank you!
What is happening is that sns.barplot places the categories in the order they first appear in df1: "col_a", then "col_c", and finally "col_b". The line plot then finds "col_a", "col_b" and "col_c" in df2, so the two orders do not match.
All you need to do is sort df1 before plotting:
sns.barplot(x="col", y="value_x", hue="cluster", data=df1.sort_values(by=['col']), ax=ax)
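To see why sorting helps, here is a small pandas-only sketch that prints the order in which the categories first appear before and after the sort. The variable names order_before, order_after and order_df2 are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"col": ["col_a", "col_a", "col_a", "col_c", "col_c", "col_b", "col_b"],
                    "cluster": ["A", "B", "C", "A", "B", "A", "C"],
                    "value_x": [2, 4, 1, 5, 6, 2, 1]})
df2 = pd.DataFrame({"col": ["col_a", "col_b", "col_c"], "value_y": [11, 13, 9]})

# Order of first appearance, which is what the bar plot picks up
order_before = list(df1["col"].unique())
order_after = list(df1.sort_values(by=["col"])["col"].unique())
order_df2 = list(df2["col"])

print(order_before)  # ['col_a', 'col_c', 'col_b']
print(order_after)   # ['col_a', 'col_b', 'col_c']
print(order_df2)     # ['col_a', 'col_b', 'col_c']
```

After sorting, df1 and df2 list the categories in the same order. If sorting df1 is not desirable, sns.barplot also accepts an order argument, so passing the category list from df2 there may be another option.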

Sorting values in plt.bar

I have been looking around the net for hours now, and have not been able to solve this problem, and hope some of you can help.
plt.bar(att_new['player'], att_new['shots'].groupby(att_new['player']).transform('sum'))
plt.axhline(y=att_shots_leauge_average, color='r')
plt.xticks(rotation=90)
plt.figure(figsize=(10,30))
my dataframe looks like this:
att_new = att[['id','player','date','team_name','fixture_name','position_new', 'goals','shots',
'shots_on_target', 'xg', 'attacking_pen_area_touches',
'aerials_won', 'final_third_entry_passes', 'dribbles_completed']]
I have been going over https://datavizpyr.com/sort-bars-in-barplot-using-seaborn-in-python/, but the groupby I am doing seems to cause problems there, and I need it to get the sum value.
Hope you can help! Thanks!
------EDITED CODE------
import pandas as pd
import seaborn as sns
# groupby and sort
dfg = att_new.groupby('player', as_index=False).shots.sum().sort_values('shots', ascending=False)
# get the mean value for everything
mean = att_shots_leauge_average
# plot
ax = dfg.plot.bar('player', 'shots', figsize=(9, 7), legend=False)
ax.axhline(y=mean, color='gray', lw=3)
ax.text(1.5, mean + 0.2, f'mean{mean:0.2f}', weight='bold')
You must sort the values with .sort_values().
The call plt.bar(att_new['player'], att_new['shots'].groupby(att_new['player']).transform('sum')) is convoluted: do the .groupby separately, then plot the result, as shown below.
import pandas as pd
import seaborn as sns # only used for importing the data
# sample data
tips = sns.load_dataset('tips')
# groupby and sort
dfg = tips.groupby('day', as_index=False).total_bill.sum().sort_values('total_bill', ascending=False)
# get the mean value for everything
mean_tips = tips.total_bill.mean()
# plot
ax = dfg.plot.bar('day', 'total_bill', figsize=(9, 7), legend=False)
ax.axhline(y=mean_tips, color='gray', lw=3)
ax.text(1.5, mean_tips + 0.2, f'Mean Tips: ${mean_tips:0.2f}', weight='bold')
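A minimal sketch of why the transform('sum') version is awkward, using a small made-up stand-in for att_new (the players and shot counts are hypothetical, not from the question). transform returns one value per original row, so plt.bar receives repeated players, whereas groupby followed by sum gives one row per player that can be sorted.

```python
import pandas as pd

# Hypothetical stand-in for att_new; only the two relevant columns
att_new = pd.DataFrame({"player": ["Ann", "Bob", "Ann", "Cy", "Bob"],
                        "shots": [2, 1, 3, 4, 2]})

# transform keeps the original length: each row carries its player's total
per_row = att_new["shots"].groupby(att_new["player"]).transform("sum")
print(per_row.tolist())  # [5, 3, 5, 4, 3]

# groupby + sum collapses to one row per player, which can then be sorted
per_player = (att_new.groupby("player", as_index=False)["shots"]
              .sum()
              .sort_values("shots", ascending=False))
print(per_player["player"].tolist())  # ['Ann', 'Cy', 'Bob']
```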

How to create an automatic set of subplots from a looping through a list of dataframe columns

I want to create efficient code in which I can pass a set of dataframe columns to a for-loop or list comprehension and it will return a set of subplots of the same type (one for each variable) depending on the type of matplotlib or seaborn plot I want to use. I'm looking for an approach that is relatively agnostic to the type of graph.
I've only tried to create code using matplotlib. Below, I provide a simple dataframe and the latest code I tried.
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.DataFrame({"A": [1, 2,8,3,4,3], "B": [0, 2,4,8,3,2], "C": [0, 0,7,8,2,1]},
index =[1995,1996,1997,1998,1999,2000] )
df.index.name='Year'
fig, axs = plt.subplots(ncols=3,figsize=(8,4))
for yvar in df:
    ts = pd.Series(yvar, index = df.index)
    ts.plot(kind = 'line',ax=axs[i])
plt.show()
I expect to see a subplot for each variable that is passed to the loop.
Is this what you are looking for?
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({"A": [1, 2,8,3,4,3], "B": [0, 2,4,8,3,2], "C": [0, 0,7,8,2,1]},
index =[1995,1996,1997,1998,1999,2000] )
plt.figure(figsize=(10,10))
for i, col in enumerate(df.columns):
    plt.subplot(1,3,i+1)
    plt.plot(df.index, df[col], label=col)
    plt.xticks(df.index)
    plt.legend(loc='upper left')
plt.show()
Use plt.subplot(no_of_rows, no_of_cols, current_subplot_number) to make a subplot the current one. Any plotting done afterwards goes to the subplot numbered current_subplot_number.
Loop over both the columns and the axes simultaneously, and show the plot outside the loop.
fig, axs = plt.subplots(ncols=len(df.columns), figsize=(8,4))
for ax, yvar in zip(axs.flat, df):
    df[yvar].plot(ax=ax)
plt.show()
Alternatively, you can also directly plot the complete dataframe:
fig, axs = plt.subplots(ncols=len(df.columns), figsize=(8,4))
df.plot(subplots=True, ax=axs)
plt.show()
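Since the question asks for something relatively agnostic to the plot type, the zip-over-axes idea can be wrapped in a small helper. This is only a sketch: plot_columns and its parameters are hypothetical names, and it assumes that pandas' kind argument covers the plot types needed.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import pandas as pd


def plot_columns(df, columns, kind="line", ncols=3):
    """Draw one subplot per column, using pandas' plot kind (hypothetical helper)."""
    nrows = math.ceil(len(columns) / ncols)
    # squeeze=False keeps axs two-dimensional even for a single row or column
    fig, axs = plt.subplots(nrows=nrows, ncols=ncols,
                            figsize=(4 * ncols, 3 * nrows), squeeze=False)
    for ax, col in zip(axs.flat, columns):
        df[col].plot(kind=kind, ax=ax)
        ax.set_title(col)
    # Hide any axes left over when the grid has more cells than columns
    for ax in axs.flat[len(columns):]:
        ax.set_visible(False)
    return fig, axs


df = pd.DataFrame({"A": [1, 2, 8, 3, 4, 3], "B": [0, 2, 4, 8, 3, 2], "C": [0, 0, 7, 8, 2, 1]},
                  index=[1995, 1996, 1997, 1998, 1999, 2000])
df.index.name = "Year"

fig, axs = plot_columns(df, ["A", "B", "C"], kind="bar", ncols=2)
```

Changing kind to "line" or "area" reuses the same loop. For Seaborn plots, the same pattern should work by passing ax=ax to the Seaborn function inside the loop.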

Why isn't pandas df.plot.bar() accepting an array for width?

Per the documentation, matplotlib's bar and barh accept a single value or an array of widths, one for each bar. I've seen examples of this around. However, it's not working when using the pandas wrapper, and the stack trace suggests that the arithmetic which checks for the x-limits (y-limits, respectively, for barh) can't handle multiple widths. What's going on?
import pandas as pd
import matplotlib.pyplot as plt
# Minimum reproducible bug
df = pd.DataFrame({'value': [3,5,4], 'width': [0.1, 0.5, 1.0]})
plt.close('all')
fig, ax = plt.subplots(1,3)
df.plot.barh(ax=ax[0], width=0.9, stacked=True) # Base case
df.plot.bar(ax=ax[1], width=df.width.values) # Throws error after drawing bars, before setting x-limits
df.plot.barh(ax=ax[2], width=df.width.values) # (Analogous to above)
So I guess the obvious workaround is to use matplotlib directly, because it handles multiple widths/heights as expected.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'value': [3,5,4], 'width': [0.1, 0.5, 1.0]})
fig, ax = plt.subplots(1,3)
ax[0].barh(df.index, df.value, height=df.width)
ax[1].bar(df.index, df.value, width=df.width)
ax[2].barh(df.index, df.value, height=df.width)
plt.show()
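As a quick check that matplotlib really applies one width per bar, the following sketch draws the vertical case and reads the widths back from the rectangles. The widths variable is only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'value': [3, 5, 4], 'width': [0.1, 0.5, 1.0]})

fig, ax = plt.subplots()
bars = ax.bar(df.index, df['value'], width=df['width'])

# Each rectangle in the returned container keeps its own width
widths = [float(patch.get_width()) for patch in bars]
print(widths)
```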

Adjusting the color coding on a barplot so that all values are color coded correctly in matplotlib

I have a barplot that plots Rates by State and by Category (there are 5 categories) but the problem is that some States have more categories than other states.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({"state" : ["AL","AL","AL","AK", ],
"status" : ["Booked", "Rejected","Cancelled","Rejected"],
"0" : [1.5,2.5,3.5,1.0]})
df2 = df.groupby(['state','status']).size()/df.groupby(['state']).size()
fig, ax = plt.subplots()
plt.xlabel('State')
plt.ylabel('Bookings')
my_colors = 'gyr'
df2.plot(kind='bar', color=my_colors, orientation='vertical')
plt.tight_layout()
plt.show()
This does most of what I need. However, some States do not have all values for status, so those bars do not appear in the plot, and the color coding becomes incorrect: the colors are just shifted to repeat every 5 colors rather than being assigned based on whether a value is missing or not. What can I do about this?
Possibly you want to show the data in a grouped fashion, namely to have 3 categories per group, such that each category has its own color.
In this case it seems this can easily be achieved by unstacking the multi-index dataframe,
df2.unstack().plot(...)
Complete example:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({"state" : ["AL","AL","AL","AK", ],
"status" : ["Booked", "Rejected","Cancelled","Rejected"],
"0" : [1.5,2.5,3.5,1.0]})
df2 = df.groupby(['state','status']).size()/df.groupby(['state']).size()
fig, ax = plt.subplots()
plt.xlabel('State')
plt.ylabel('Bookings')
my_colors = 'gyr'
df2.unstack().plot(kind='bar', color=my_colors, orientation='vertical', ax=ax)
plt.tight_layout()
plt.show()
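To see what unstacking changes, here is a pandas-only sketch that prints the unstacked table. It uses .div with level="state" to make the per-state division explicit, which is an assumption about the intent of the original division rather than the answer's exact code. The names counts, shares and wide are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"state": ["AL", "AL", "AL", "AK"],
                   "status": ["Booked", "Rejected", "Cancelled", "Rejected"],
                   "0": [1.5, 2.5, 3.5, 1.0]})

counts = df.groupby(["state", "status"]).size()
# Share of each status within its state
shares = counts.div(df.groupby("state").size(), level="state")

# One row per state, one column per status; missing combinations become NaN
wide = shares.unstack()
print(wide)
```

Each status is now a column, so a plot of wide gives every status the same color in every state, and a missing combination simply leaves a gap instead of shifting the colors.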
