Annotate values for stacked horizontal bar plot - python

I'm trying to annotate the values for a stacked horizontal bar graph created using pandas. My current code is below:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
d = {'group 1': [1, 2, 5, 7, 4, 5, 10],
'group 2': [5, 6, 1, 8, 2, 6, 2],
'group 3': [12, 2, 2, 4, 4, 8, 4]}
df = pd.DataFrame(d)
ax = df.plot.barh(stacked=True, figsize=(10,12))
for p in ax.patches:
    ax.annotate(str(p.get_x()), xy=(p.get_x(), p.get_y()+0.2))
plt.legend(bbox_to_anchor=(0, -0.15), loc=3, prop={'size': 14}, frameon=False)
The problem is the annotation method I used gives the x starting points and not the values of each segment. I'd like to be able to annotate values of each segment in the center of each segment for each of the bars.
edit: for clarity, what I would like to achieve is something like this where the values are centered horizontally (and vertically) for each segment:

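You can use each patch's bbox to get the information you want: the width of the bbox is the value of the segment, and its center gives the position for the label.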
ax = df.plot.barh(stacked=True, figsize=(10, 12))
for p in ax.patches:
    left, bottom, width, height = p.get_bbox().bounds
    ax.annotate(str(width), xy=(left+width/2, bottom+height/2),
                ha='center', va='center')
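As a minimal sketch of the same idea, the variant below formats the width without a trailing ".0" and skips segments of zero width. The small data frame is made up for illustration, and the Agg backend is only selected so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd

# Made-up data with one zero-width segment (row 2 of 'group 1')
df = pd.DataFrame({'group 1': [1, 2, 0, 7],
                   'group 2': [5, 6, 1, 8]})

ax = df.plot.barh(stacked=True)

texts = []
for p in ax.patches:
    left, bottom, width, height = p.get_bbox().bounds
    if width == 0:
        continue  # nothing to label for an empty segment
    label = ax.annotate(f'{width:g}',  # 'g' drops the trailing '.0'
                        xy=(left + width / 2, bottom + height / 2),
                        ha='center', va='center')
    texts.append(label.get_text())

print(texts)
```

The `:g` format is only one option; use something like `f'{width:.1f}'` if the values are not whole numbers.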

Another possible solution is to flatten df.values into a one-dimensional array via values = df.values.flatten("F")
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
d = {'group 1': [1, 2, 5, 7, 4, 5, 10],
'group 2': [5, 6, 1, 8, 2, 6, 2],
'group 3': [12, 2, 2, 4, 4, 8, 4]}
df = pd.DataFrame(d)
ax = df.plot.barh(stacked=True, figsize=(10,12))
values = df.values.flatten("F")
for i, p in enumerate(ax.patches):
    ax.annotate(str(values[i]), xy=(p.get_x()+ values[i]/2, p.get_y()+0.2))
plt.legend(bbox_to_anchor=(0, -0.15), loc=3, prop={'size': 14}, frameon=False);
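The "F" (column-major) order matters here because pandas draws one column at a time, so ax.patches holds all bars of the first column, then all bars of the second, and so on. A quick check on a reduced version of the data, as a sketch:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import numpy as np
import pandas as pd

df = pd.DataFrame({'group 1': [1, 2, 5],
                   'group 2': [5, 6, 1],
                   'group 3': [12, 2, 2]})
ax = df.plot.barh(stacked=True)

# Width of every patch, in the order matplotlib stores them
patch_widths = np.array([p.get_width() for p in ax.patches])

column_major = df.values.flatten("F")  # column by column
row_major = df.values.flatten()        # row by row (the default, "C")

same_order = np.allclose(patch_widths, column_major)
print(same_order)
```

With the default row-major flatten, the values would be paired with the wrong patches.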

From matplotlib 3.4.0, use matplotlib.pyplot.bar_label.
The labels parameter can be used to customize annotations, but it's not required.
See this answer for additional details and examples.
Each group of containers must be iterated through to add labels.
Tested in python 3.10, pandas 1.4.2, matplotlib 3.5.1
Horizontal Stacked
d = {'group 1': [1, 2, 5, 7, 4, 5, 10],
'group 2': [5, 6, 1, 8, 2, 6, 2],
'group 3': [12, 2, 2, 4, 4, 8, 4]}
df = pd.DataFrame(d)
# add tot to sort the bars
df['tot'] = df.sum(axis=1)
# sort
df = df.sort_values('tot')
# plot all columns except tot
ax = df.iloc[:, :-1].plot.barh(stacked=True, figsize=(10, 12))
# iterate through each group of bars
for c in ax.containers:
    # format the number of decimal places (if needed) and replace 0 with an empty string
    labels = [f'{w:.0f}' if (w := v.get_width()) > 0 else '' for v in c]
    ax.bar_label(c, labels=labels, label_type='center')
Horizontal Grouped
Not stacked is a better presentation of the data, because it is easier to compare bar lengths visually.
# plot all columns except tot
ax = df.iloc[:, :-1].plot.barh(stacked=False, figsize=(8, 9))
# iterate through each group of bars
for c in ax.containers:
    # format the number of decimal places (if needed) and replace 0 with an empty string
    labels = [f'{w:.0f}' if (w := v.get_width()) > 0 else '' for v in c]
    ax.bar_label(c, labels=labels, label_type='center')
df view
   group 1  group 2  group 3  tot
2        5        1        2    8
1        2        6        2   10
4        4        2        4   10
6       10        2        4   16
0        1        5       12   18
3        7        8        4   19
5        5        6        8   19
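The label-building step uses the walrus operator, which needs Python 3.8 or later. As a hedged alternative, the sketch below reads the values from the container's datavalues attribute instead. The data is made up and includes zeros to show the empty labels.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd

# Made-up data containing zeros
df = pd.DataFrame({'group 1': [1, 2, 0],
                   'group 2': [5, 0, 1]})
ax = df.plot.barh(stacked=True)

all_labels = []
for c in ax.containers:
    # c.datavalues holds the bar lengths of this container
    labels = [f'{float(v):.0f}' if v > 0 else '' for v in c.datavalues]
    texts = ax.bar_label(c, labels=labels, label_type='center')
    all_labels.append([t.get_text() for t in texts])

print(all_labels)
```

bar_label returns the text objects it creates, so they can be restyled afterwards (font size, color) if needed.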

Related

is it possible to add x_ticks to pywaffle

I was wondering if and how I can add x-axis labels to a pywaffle chart.
value1 = new_df['value1'].tolist()
new_list = [i+1 for i in range(len(value1))]
fig = plt.figure(
    FigureClass=Waffle,
    rows=1,
    columns=len(value1),  # Either rows or columns could be omitted
    values=value1,
    title={"label": name, "loc": "left"},
)
plt.savefig("plot.png", bbox_inches="tight")
My value1 values are [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
I would like every column to be labeled.
Yes, it is possible to add ticks etc.
A waffle chart with limited number of columns
But it is a bit unclear what your final goal is. By default, a waffle chart draws as many squares as each of the values indicates. So, if the values are [1, 2, 3, 4, 5, 6], and the colors ['red', 'orange', 'blue', 'gold', 'green', 'purple'], there would be 1 red square, 2 oranges, 3 blues, 4 yellows, 5 greens and 6 purples.
import matplotlib.pyplot as plt
from pywaffle import Waffle
value1 = [1, 2, 3, 4, 5, 6]
fig = plt.figure(
FigureClass=Waffle,
rows=1,
#columns=sum(value1),
values=value1,
colors=['red','orange','blue','gold','green','purple']
)
If you set the number of rows and columns so that their product is smaller than 21 (the sum of the values), each of the values will be reduced more or less proportionally, but will still be an integer. In the current example, the red one is suppressed, the orange, blue, yellow and green ones get reduced to 1 square each, and the purple one gets reduced to 2 squares. This makes it unclear which label you want to put where.
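The arithmetic behind that reduction can be sketched in plain Python. This assumes each value is scaled by (available squares / sum of values) and rounded to the nearest integer. It is an illustration of the proportional scaling, not pywaffle's actual code.

```python
values = [1, 2, 3, 4, 5, 6]
colors = ['red', 'orange', 'blue', 'gold', 'green', 'purple']

rows, columns = 1, len(values)
n_blocks = rows * columns        # 6 squares available
scale = n_blocks / sum(values)   # 6 / 21

# Assumption: nearest-integer rounding of the scaled values
blocks = [round(v * scale) for v in values]
squares_per_color = dict(zip(colors, blocks))

print(squares_per_color)
```

Under this assumption the red category rounds down to zero squares and only purple keeps two, which matches the chart produced by the code below.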
value1 = [1, 2, 3, 4, 5, 6]
fig = plt.figure(
FigureClass=Waffle,
rows=1,
columns=len(value1),
values=value1,
colors=['red','orange','blue','gold','green','purple']
)
Adding x ticks
To add ticks to a waffle chart, you can turn the axes on. To position the ticks, you need to know that the squares have a width of 1, and a default distance of 0.2. So, the first tick comes at 0.5, the next one at 1+0.2+0.5, etc. Optionally, you can remove spines and the dummy y ticks.
import matplotlib.pyplot as plt
from pywaffle import Waffle
value1 = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]
fig = plt.figure(
FigureClass=Waffle,
rows=1,
columns=len(value1),
values=value1,
title={"label": 'title', "loc": "left"},
figsize=(15,3),
)
plt.axis('on')
plt.yticks([])
plt.xticks([i * 1.2 + 0.5 for i in range(len(value1))], value1)
for sp in ['left', 'right', 'top']:
    plt.gca().spines[sp].set_visible(False)
plt.show()
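The tick positions can also be computed by a small helper, sketched here. The function name and its parameters are hypothetical; the defaults are the square width of 1 and the distance of 0.2 mentioned above.

```python
def waffle_tick_positions(n_columns, block_width=1.0, gap=0.2):
    """Return the x coordinate of the center of each column of squares."""
    step = block_width + gap  # distance between the left edges of two columns
    return [i * step + block_width / 2 for i in range(n_columns)]

positions = waffle_tick_positions(4)
print(positions)
```

If you change the spacing between the squares, pass the new distance as gap so the ticks stay centered.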
A Seaborn heatmap
Instead of a waffle chart, you could create a heatmap. Then, each square will get a color corresponding to the given values. Optionally, these values (or another string) can be shown as annotation or as x tick label.
import matplotlib.pyplot as plt
import seaborn as sns
value1 = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]
plt.figure(figsize=(15, 3))
ax = sns.heatmap(data=[value1], xticklabels=value1, yticklabels=False,
annot=True, square=True, linewidths=1.5, cbar=False)
ax.set_title('title', loc='left')
plt.tight_layout()
plt.show()
# Remove borders, ticks, etc.
ax.axis("off")
I saw this in pywaffle.py, so I don't think adding an axis is possible.

Plot a histogram where the bars are coloured based on a second list of values

I can plot a histogram in Python for example with matplotlib:
from matplotlib import pyplot as plt
x = [3,5,12,7,8,6,4,6]
plt.hist(x)
However I have a second array y = [4,6,8,2,4,5,8,7] where each value corresponds to the value at the same position of x. Now I would like to create a histogram where each bar's height is defined by x, but each bar's color is defined by the values in y that belong to its x values. You could also say I have tuples as in list(zip(x,y)) where the first value should be used for the histogram itself and the mean value of the second tuple value in each bin should determine the color.
np.unique(x, return_counts=True) returns an array with the unique values of x and their count.
Converting everything to numpy arrays, y[x == val] selects the subset of y at each position where x is equal to val. y[x == val].mean() gets the mean of those values. Calling cmap(norm(...)) gives the color corresponding to that value. The cmap and norm can be used to create a colorbar.
Here is some example code, including embellishments to change ticks, margins and spines:
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
from matplotlib.cm import ScalarMappable
import numpy as np
x = np.array([3, 5, 12, 7, 8, 6, 4, 6])
y = np.array([4, 6, 8, 2, 4, 5, 8, 7])
values, counts = np.unique(x, return_counts=True)
cmap = plt.get_cmap('inferno')
norm = plt.Normalize(0, y.max()) # or plt.Normalize(y.min(), y.max())
colors = [cmap(norm(y[x == val].mean())) for val in values]
fig, ax = plt.subplots()
ax.bar(values, counts, color=colors, edgecolor='black')
ax.yaxis.set_major_locator(MultipleLocator(1))
ax.xaxis.set_major_locator(MultipleLocator(1))
ax.set_ylabel('Count')
ax.margins(x=0.02, y=0)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.colorbar(ScalarMappable(cmap=cmap, norm=norm), pad=0.02, ax=ax)
plt.show()
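To see what goes into the colors, the intermediate values can be printed without any plotting. This sketch only repeats the np.unique and boolean-mask steps from the code above on the same x and y.

```python
import numpy as np

x = np.array([3, 5, 12, 7, 8, 6, 4, 6])
y = np.array([4, 6, 8, 2, 4, 5, 8, 7])

values, counts = np.unique(x, return_counts=True)

# Mean of the y entries that share each unique x value
means = np.array([y[x == val].mean() for val in values])

for val, count, mean in zip(values, counts, means):
    print(f'x={val:2d}  count={count}  mean y={mean:.1f}')
```

Only x=6 occurs twice, so it is the only bar whose color comes from an average of two y values (5 and 7).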
Here is another example, using the tips dataset from seaborn, with the rounded total_bill on the x-axis, the count on the y-axis and colored via the tip amount.
import seaborn as sns
tips = sns.load_dataset('tips')
x = np.round(tips['total_bill'])
y = np.array(tips['tip'])
values, counts = np.unique(x, return_counts=True)
cmap = plt.get_cmap('turbo')
PS: As mentioned in @Arne's answer, seaborn can be used to replace the norm and color assignment with seaborn's hue. Without embellishments, the code would look like:
import numpy as np
import seaborn as sns
x = np.array([3, 5, 12, 7, 8, 6, 4, 6])
y = np.array([4, 6, 8, 2, 4, 5, 8, 7])
values, counts = np.unique(x, return_counts=True)
sns.set_style('darkgrid')
ax = sns.barplot(x=values, y=counts, hue=[y[x == val].mean() for val in values],
palette='inferno', dodge=False)
The seaborn library is very useful to visualize multi-dimensional data like these. You could store x and y in a pandas dataframe and then add the bin numbers and the average y values per bin:
import numpy as np
import pandas as pd
import seaborn as sns
x = [3, 5, 12, 7, 8, 6, 4, 6]
y = [4, 6, 8, 2, 4, 5, 8, 7]
n_bins = 4 # number of bins for the histogram
df = pd.DataFrame({'x': x, 'y': y})
_, bin_edges = np.histogram(x, bins=n_bins)
df['bin'] = pd.cut(x, bins=bin_edges, labels=False, include_lowest=True)
color = df.groupby('bin').mean()['y']
df['color'] = df.bin.apply(lambda k: color[k])
df
    x  y  bin     color
0   3  4    0  6.000000
1   5  6    0  6.000000
2  12  8    3  8.000000
3   7  2    1  4.666667
4   8  4    2  4.000000
5   6  5    1  4.666667
6   4  8    0  6.000000
7   6  7    1  4.666667
Then drawing the colored histogram is easy:
sns.histplot(data=df, x='x', bins=bin_edges, hue='color');
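The per-bin average can also be assigned in one step with groupby(...).transform('mean'), instead of computing the means and mapping them back with apply. A sketch on the same data (no plotting):

```python
import numpy as np
import pandas as pd

x = [3, 5, 12, 7, 8, 6, 4, 6]
y = [4, 6, 8, 2, 4, 5, 8, 7]
n_bins = 4

df = pd.DataFrame({'x': x, 'y': y})
_, bin_edges = np.histogram(x, bins=n_bins)
df['bin'] = pd.cut(df['x'], bins=bin_edges, labels=False, include_lowest=True)

# transform('mean') returns one value per row: the mean y of that row's bin
df['color'] = df.groupby('bin')['y'].transform('mean')

print(df)
```

The resulting frame should match the table shown above.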

Show all colors in histogram bars on top of each other without adding weights in python

I am using the following code to draw 5 bars from 3 different data sets a, b and c. How can I show all colors in each bar without their values adding up? For example, if in the first bar the value of Green is 1, Yellow is 3 and Red is 6, I don't want the final height to be 10; it should be 6, with every color visible up to its own value. I don't want to use transparent colors or only bar outlines.
import matplotlib.pyplot as plt
import numpy as np
a = [1, 2, 3, 4, 5]
b = [3, 4, 1, 10, 9]
c = [6, 7, 2, 4, 6]
ind = np.arange(len(a))
fig = plt.figure()
ax = fig.add_subplot(111)
ax.bar(x=ind, height=a, width=0.35, align='center', label='Green',
facecolor='g')
ax.bar(x=ind, height=b, width=0.35, align='center', label='Yellow',
facecolor='y')
ax.bar(x=ind, height=c, width=0.35, align='center', label='Red', facecolor='r')
plt.xticks(ind, a)
plt.xlabel('Coordination Number')
plt.ylabel('Frequency')
plt.legend()
plt.show()
The expected height you give for the first bar is 6, but it was unclear whether that is the maximum of the three values. I assumed it is, and scaled each value's share of the total so that the segments add up to that maximum.
I created a stacked graph based on the results.
import numpy as np
import pandas as pd
a = [1, 2, 3, 4, 5]
b = [3, 4, 1, 10, 9]
c = [6, 7, 2, 4, 6]
ind = np.arange(len(a))
df = pd.DataFrame({'a':a,'b':b,'c':c}, index=ind)
df['total'] = df.sum(axis=1)
df['max'] = df[['a','b','c']].max(axis=1)
df['aa'] = df['max']*(df['a']/df['total'])
df['bb'] = df['max']*(df['b']/df['total'])
df['cc'] = df['max']*(df['c']/df['total'])
df
   a   b  c  total  max        aa        bb        cc
0  1   3  6     10    6  0.600000  1.800000  3.600000
1  2   4  7     13    7  1.076923  2.153846  3.769231
2  3   1  2      6    3  1.500000  0.500000  1.000000
3  4  10  4     18   10  2.222222  5.555556  2.222222
4  5   9  6     20    9  2.250000  4.050000  2.700000
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
ax.bar(x=ind, height=df.loc[:,'aa'], bottom=0, width=0.35, align='center', label='Green',
facecolor='g')
ax.bar(x=ind, height=df.loc[:,'bb'], bottom=df.loc[:,'aa'], width=0.35, align='center', label='Yellow',
facecolor='y')
ax.bar(x=ind, height=df.loc[:,'cc'], bottom=df.loc[:,'aa']+df.loc[:,'bb'], width=0.35, align='center', label='Red', facecolor='r')
plt.xticks(ind, a)
plt.xlabel('Coordination Number')
plt.ylabel('Frequency')
plt.legend()
plt.show()
If I understand your question correctly, you want to show all colour bars starting from the same zero baseline and grouped together under their corresponding Number?
I'll use bokeh for plotting, since it provides an easy way to "offset" each bar in the group. To vary the amount of visual offset for each bar, change the second parameter of the dodge function. For this combination of widths, 0.05 seemed like a nice value.
from bokeh.io import output_notebook, output_file, show
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.transform import dodge
output_notebook() # or output_file("chart.html") if not using Jupyter
x_axis_values = [str(x) for x in range(1, 6)]
data = {
"Coordination Number" : x_axis_values,
"Green" : [1, 2, 3, 4, 5],
"Yellow" : [3, 4, 1, 10, 9],
"Red" : [6, 7, 2, 4, 6]
}
src = ColumnDataSource(data=data)
p = figure(
x_range=x_axis_values, y_range=(0, 10), plot_height=275,
title="Offset Group Bar Chart", toolbar_location=None, tools="")
p.vbar(
x=dodge('Coordination Number', -0.05, range=p.x_range),
top='Green', width=0.2, source=src, color="#8DD3C7", legend_label="Green")
p.vbar(
x=dodge('Coordination Number', 0.0, range=p.x_range),
top='Yellow', width=0.2, source=src, color="#FFD92F", legend_label="Yellow")
p.vbar(
x=dodge('Coordination Number', 0.05, range=p.x_range),
top='Red', width=0.2, source=src, color="#E15759", legend_label="Red")
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
p.xaxis.axis_label = "Coordination Number"
p.yaxis.axis_label = "Frequency"
show(p)
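A different technique from the two answers above, staying in matplotlib: keep all bars at the same x position and baseline, but give shorter bars a higher zorder so they are drawn in front of taller ones. This is a sketch of that idea, not something taken from the answers.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

series = {'Green': ([1, 2, 3, 4, 5], 'g'),
          'Yellow': ([3, 4, 1, 10, 9], 'y'),
          'Red': ([6, 7, 2, 4, 6], 'r')}
ind = np.arange(5)
tallest = max(max(heights) for heights, _ in series.values())

fig, ax = plt.subplots()
containers = {}
for label, (heights, color) in series.items():
    bars = ax.bar(ind, heights, width=0.35, label=label, facecolor=color)
    for rect, h in zip(bars, heights):
        # Shorter bars get a higher zorder, so they end up in front
        rect.set_zorder(2 + (tallest - h) / tallest)
    containers[label] = bars

ax.set_xticks(ind)
ax.set_xticklabels(series['Green'][0])
ax.set_xlabel('Coordination Number')
ax.set_ylabel('Frequency')
ax.legend()
fig.canvas.draw()
```

Each bar is then only as tall as its largest value, and every color is visible up to its own height. One limitation: when two series have the same height at a position (Green and Red at the fourth bar), one of them completely hides the other.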

How to create conditional coloring for matplotlib table values?

How do I add conditional coloring to this table?
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'A':[16, 15, 14, 16],
'B': [3, -2, 5, 0],
'C': [200000, 3, 6, 800000],
'D': [51, -6, 3, 2]})
fig, ax = plt.subplots(figsize=(10,5))
ax.axis('tight')
ax.axis('off')
the_table = ax.table(cellText = df.values, colLabels = df.columns, loc='center')
plt.show()
How do I add conditional coloring to the table where column A and column D values are greater than or equal to 15, the cells are red; else they're green. If column B and column C values are greater than or equal to 5, the cells are red; else they're green. This is what it should look like:
Generate a list of lists and feed it to cellColours. Make sure that the list of lists contains as many lists as you have rows in the data frame and each of the lists within the list of lists contains as many strings as you have columns in the data frame.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'A':[16, 15, 14, 16],
'B': [3, -2, 5, 0],
'C': [200000, 3, 6, 800000],
'D': [51, -6, 3, 2]})
colors = []
for _, row in df.iterrows():
    # start with every cell green, then switch to red where the threshold is met
    colors_in_column = ["g", "g", "g", "g"]
    if row["A"] >= 15:
        colors_in_column[0] = "r"
    if row["B"] >= 5:
        colors_in_column[1] = "r"
    if row["C"] >= 5:
        colors_in_column[2] = "r"
    if row["D"] >= 15:
        colors_in_column[3] = "r"
    colors.append(colors_in_column)
fig, ax = plt.subplots(figsize=(10,5))
ax.axis('tight')
ax.axis('off')
the_table = ax.table(cellText = df.values, colLabels = df.columns, loc='center', cellColours=colors)
plt.show()
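The list of lists can also be built without an explicit loop. This sketch compares the whole data frame against one threshold per column and converts the result to color strings with np.where. The thresholds Series is a helper introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [16, 15, 14, 16],
                   'B': [3, -2, 5, 0],
                   'C': [200000, 3, 6, 800000],
                   'D': [51, -6, 3, 2]})

# One threshold per column (hypothetical helper, values taken from the question)
thresholds = pd.Series({'A': 15, 'B': 5, 'C': 5, 'D': 15})

# True where a cell is >= the threshold of its column
is_red = df.ge(thresholds, axis='columns')

# Same shape as df.values: one list of color strings per row
colors = np.where(is_red, 'r', 'g').tolist()

print(colors)
```

The resulting colors can be passed to ax.table(..., cellColours=colors) exactly as in the loop version.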

python - Plotting bar graph side by side on the same graph with seaborn

I need to try to plot 3 bars on the same graph. I have 2 dataframes set up right now. My first dataframe was created off a JSON file seen here.
My second dataframe was created in the code below:
def make_bar_graph():
    with open('filelocation.json') as json_file:
        data = json.load(json_file)
    df = pd.DataFrame([])
    for item in data["Results"]["Result"]:
        df = df.append(pd.DataFrame.from_dict(kpi for kpi in item["KPI"]))
    df.reset_index(level=0, inplace=True)
    df.rename(columns={0: 'id', 1: 'average', 2: 'std. dev', 3: 'min',
                       4: 'median', 5: 'max'}, inplace=True)
    wanted_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
    wanted_y = [5, 5, .500, .500, .500, 1, 1, 5, 5, .500, .500, .500, 1, 1]
    kpi = ['kpi1', 'kpi2', 'kpi3', 'kpi4', 'kpi5', 'kpi6', 'kpi7', 'kpi8',
           'kpi9', 'kpi10', 'kpi11', 'kpi12', 'kpi13', 'kpi14']
    df2 = pd.DataFrame(dict(x=wanted_x, y=wanted_y, kpi=kpi))
    sns.set()
    sns.set_context("talk")
    sns.axes_style("darkgrid")
    h = sns.barplot(x='id', y='average', data=df.ix[0:13],
                    label='Test on 4/30/2018', color='b')
    g = sns.barplot(x='id', y='average', data=df.ix[14:27],
                    label='Test on 6/4/2018', color='r')
    k = sns.barplot("x", "y", data=df2, label='Desired Results', color='y')
    plt.legend()
    plt.xlabel('KPI number')
    plt.ylabel('Time(s)')
    plt.show()
This is the graph I get from that:
Graph1
I need the bars to be next to each other, separated by id (or KPI; the id number and the KPI number are the same thing). I'm not sure how to rework my dataframe to do this.
