Python Matplotlib bar chart with categories

I have data (the duration of a certain activity) for two categories (Monday, Tuesday). I would like to generate a bar chart like the one in the goal image. Bars above a threshold (different for each category) should have a different color; e.g. on Mondays, data above 10 hours should be blue, and on Tuesdays, data above 12 hours. Any ideas how I could implement this in seaborn or matplotlib?
Thank you very much.
Monday = [5, 6, 8, 12, 5, 20, 4, 8]
Tuesday = [3, 5, 8, 12, 4, 17]
Goal

You could draw two barplots, using an array of booleans for the coloring (hue):
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
monday = np.array([5, 6, 8, 12, 5, 20, 4, 8])
tuesday = np.array([3, 5, 8, 12, 4, 17])
sns.set_style('whitegrid')
fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(10, 4), sharey=True)
# hue is a boolean array (value reaches the threshold or not); the palette maps it to two colors
palette = {False: 'skyblue', True: 'tomato'}
sns.barplot(x=np.arange(len(monday)), y=monday, hue=monday >= 10, palette=palette, dodge=False, ax=ax0)
ax0.set_xlabel('Monday', size=20)
ax0.set_xticks([])
ax0.legend_.remove()
sns.barplot(x=np.arange(len(tuesday)), y=tuesday, hue=tuesday >= 12, palette=palette, dodge=False, ax=ax1)
ax1.set_xlabel('Tuesday', size=20)
ax1.set_xticks([])
ax1.legend_.remove()
sns.despine()
plt.tight_layout()
plt.subplots_adjust(wspace=0)
plt.show()
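For comparison, here is a minimal sketch in plain matplotlib, without seaborn, that puts both days on a single axes. It assumes a one-slot gap between the two groups and colors a bar only when it is strictly above its day's threshold, as the question words it (the seaborn answer uses >=, which also colors Tuesday's 12-hour bar). The colors and the 'Duration (hours)' label are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np

data = {'Monday': np.array([5, 6, 8, 12, 5, 20, 4, 8]),
        'Tuesday': np.array([3, 5, 8, 12, 4, 17])}
thresholds = {'Monday': 10, 'Tuesday': 12}  # per-category limits from the question

fig, ax = plt.subplots(figsize=(10, 4))
start = 0
centers = []
bar_colors = {}
for day, values in data.items():
    positions = np.arange(start, start + len(values))
    # strictly "above" the threshold, as worded in the question
    colors = ['tomato' if v > thresholds[day] else 'skyblue' for v in values]
    ax.bar(positions, values, color=colors)
    centers.append(positions.mean())
    bar_colors[day] = colors
    start += len(values) + 1  # leave one empty slot between the groups

# one tick per group, placed under the middle of its bars
ax.set_xticks(centers)
ax.set_xticklabels(list(data))
ax.set_ylabel('Duration (hours)')
```

Call plt.show() afterwards to display the figure. Using a single axes keeps both groups on the same y scale without needing sharey.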

Is it possible to add x ticks to pywaffle?

I was wondering if and how I can add x-axis labels to a pywaffle chart.
import matplotlib.pyplot as plt
from pywaffle import Waffle
# new_df and name come from the rest of my code (not shown)
value1 = new_df['value1'].tolist()
new_list = [i+1 for i in range(len(value1))]
fig = plt.figure(
    FigureClass=Waffle,
    rows=1,
    columns=len(value1),  # Either rows or columns could be omitted
    values=value1,
    title={"label": name, "loc": "left"},
)
plt.savefig("plot.png", bbox_inches="tight")
My value1 values are [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31].
I would like every column to be labeled.
Yes, it is possible to add ticks etc.
A waffle chart with limited number of columns
But it is a bit unclear what your final goal is. By default, a waffle chart draws as many squares as each value indicates. So, if the values are [1, 2, 3, 4, 5, 6] and the colors ['red', 'orange', 'blue', 'gold', 'green', 'purple'], there would be 1 red square, 2 orange, 3 blue, 4 yellow, 5 green and 6 purple ones.
import matplotlib.pyplot as plt
from pywaffle import Waffle
value1 = [1, 2, 3, 4, 5, 6]
fig = plt.figure(
FigureClass=Waffle,
rows=1,
#columns=sum(value1),
values=value1,
colors=['red','orange','blue','gold','green','purple']
)
If you set the number of rows and columns so that their product is smaller than 21 (the sum of the values), each value is scaled down more or less proportionally, but still rounded to an integer. In the current example (1 row, 6 columns), the red one is suppressed, the orange, blue, yellow and green ones are reduced to 1 square each, and the purple one is reduced to 2 squares. This makes it unclear which label you want to put where.
value1 = [1, 2, 3, 4, 5, 6]
fig = plt.figure(
FigureClass=Waffle,
rows=1,
columns=len(value1),
values=value1,
colors=['red','orange','blue','gold','green','purple']
)
Adding x ticks
To add ticks to a waffle chart, you can turn the axes on. To position the ticks, you need to know that the squares have a width of 1 and a default gap of 0.2 between them. So the first tick goes at 0.5, the next one at 1 + 0.2 + 0.5 = 1.7, and in general the tick for column i (counting from 0) at i * 1.2 + 0.5. Optionally, you can remove the spines and the dummy y ticks.
import matplotlib.pyplot as plt
from pywaffle import Waffle
value1 = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]
fig = plt.figure(
FigureClass=Waffle,
rows=1,
columns=len(value1),
values=value1,
title={"label": 'title', "loc": "left"},
figsize=(15,3),
)
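plt.axis('on')  # Waffle switches the axes off; turn them back on
plt.yticks([])  # remove the dummy y ticks
# square i (width 1, gap 0.2) is centered at i * 1.2 + 0.5
plt.xticks([i * 1.2 + 0.5 for i in range(len(value1))], value1)
for sp in ['left', 'right', 'top']:
    plt.gca().spines[sp].set_visible(False)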
plt.show()
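The tick arithmetic above can be wrapped in a small helper. waffle_tick_positions and its gap_ratio parameter are hypothetical names, not part of pywaffle; the sketch assumes every square is one unit wide and that the gap is a fixed fraction of the square width (0.2 by default, as described above). If you change the spacing in pywaffle (assumed here to be its interval_ratio_x option), pass the same ratio.

```python
def waffle_tick_positions(n_columns, gap_ratio=0.2, block_width=1.0):
    """Hypothetical helper: x coordinates of the square centers in a
    one-row waffle chart.

    Assumes each square is `block_width` wide and neighbouring squares are
    separated by `gap_ratio * block_width`.
    """
    step = block_width * (1 + gap_ratio)  # one square plus the gap after it
    return [i * step + block_width / 2 for i in range(n_columns)]
```

Usage with the chart above: plt.xticks(waffle_tick_positions(len(value1)), value1). With the defaults it yields 0.5, 1.7, 2.9, ..., the same positions as the list comprehension in the answer.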
A Seaborn heatmap
Instead of a waffle chart, you could create a heatmap. Then, each square will get a color corresponding to the given values. Optionally, these values (or another string) can be shown as annotation or as x tick label.
import matplotlib.pyplot as plt
import seaborn as sns
value1 = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]
plt.figure(figsize=(15, 3))
ax = sns.heatmap(data=[value1], xticklabels=value1, yticklabels=False,
annot=True, square=True, linewidths=1.5, cbar=False)
ax.set_title('title', loc='left')
plt.tight_layout()
plt.show()
# Remove borders, ticks, etc.
ax.axis("off")
I saw this in pywaffle.py, so I don't think adding an axis is possible.

Plot a histogram where the bars are coloured based on a second list of values

I can plot a histogram in Python for example with matplotlib:
from matplotlib import pyplot as plt
x = [3,5,12,7,8,6,4,6]
plt.hist(x)
However I have a second array y = [4,6,8,2,4,5,8,7] where each value corresponds to the value at the same position of x. Now I would like to create a histogram where each bar's height is defined by x, but each bar's color is defined by the values in y that belong to its x values. You could also say I have tuples as in list(zip(x,y)) where the first value should be used for the histogram itself and the mean value of the second tuple value in each bin should determine the color.
np.unique(x, return_counts=True) returns two arrays: the sorted unique values of x and the number of times each one occurs.
Converting everything to numpy arrays, y[x == val] selects the subset of y at each position where x is equal to val. y[x == val].mean() gets the mean of those values. Calling cmap(norm(...)) gives the color corresponding to that value. The cmap and norm can be used to create a colorbar.
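To see the intermediate values this explanation refers to, here is a small sketch that computes them for the example data; the print formatting is just for illustration.

```python
import numpy as np

x = np.array([3, 5, 12, 7, 8, 6, 4, 6])
y = np.array([4, 6, 8, 2, 4, 5, 8, 7])

# Sorted unique x values and how often each one occurs
values, counts = np.unique(x, return_counts=True)
# x == val is a boolean mask; y[x == val] keeps the y values paired with val
mean_y = np.array([y[x == val].mean() for val in values])

for val, cnt, m in zip(values, counts, mean_y):
    print(f'x = {val:2d}   count = {cnt}   mean y = {m:.1f}')
```

For this data, x = 6 is the only value that occurs twice, so its bar gets the mean of y = 5 and y = 7, i.e. 6.0; every other bar has a count of 1 and simply takes its single y value.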
Here is some example code, including embellishments to change ticks, margins and spines:
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
from matplotlib.cm import ScalarMappable
import numpy as np
x = np.array([3, 5, 12, 7, 8, 6, 4, 6])
y = np.array([4, 6, 8, 2, 4, 5, 8, 7])
values, counts = np.unique(x, return_counts=True)
cmap = plt.get_cmap('inferno')
norm = plt.Normalize(0, y.max()) # or plt.Normalize(y.min(), y.max())
colors = [cmap(norm(y[x == val].mean())) for val in values]
fig, ax = plt.subplots()
ax.bar(values, counts, color=colors, edgecolor='black')
ax.yaxis.set_major_locator(MultipleLocator(1))
ax.xaxis.set_major_locator(MultipleLocator(1))
ax.set_ylabel('Count')
ax.margins(x=0.02, y=0)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.colorbar(ScalarMappable(cmap=cmap, norm=norm), pad=0.02, ax=ax)
plt.show()
Here is another example, using the tips dataset from seaborn, with the rounded total_bill on the x-axis, the count on the y-axis, and the bars colored by the mean tip amount.
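import seaborn as sns
tips = sns.load_dataset('tips')
x = np.round(tips['total_bill'])
y = np.array(tips['tip'])
values, counts = np.unique(x, return_counts=True)
cmap = plt.get_cmap('turbo')
norm = plt.Normalize(y.min(), y.max())
colors = [cmap(norm(y[x == val].mean())) for val in values]
fig, ax = plt.subplots()
ax.bar(values, counts, color=colors, edgecolor='black')
plt.colorbar(ScalarMappable(cmap=cmap, norm=norm), ax=ax, label='mean tip')
plt.show()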
PS: As mentioned in @Arne's answer, seaborn's hue can replace the explicit norm and color assignment. Without embellishments, the code would look like:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
x = np.array([3, 5, 12, 7, 8, 6, 4, 6])
y = np.array([4, 6, 8, 2, 4, 5, 8, 7])
values, counts = np.unique(x, return_counts=True)
sns.set_style('darkgrid')
# hue: the mean of the y values that belong to each unique x value
ax = sns.barplot(x=values, y=counts, hue=[y[x == val].mean() for val in values],
                 palette='inferno', dodge=False)
plt.show()
The seaborn library is very useful to visualize multi-dimensional data like these. You could store x and y in a pandas dataframe and then add the bin numbers and the average y values per bin:
import numpy as np
import pandas as pd
import seaborn as sns
x = [3, 5, 12, 7, 8, 6, 4, 6]
y = [4, 6, 8, 2, 4, 5, 8, 7]
n_bins = 4 # number of bins for the histogram
df = pd.DataFrame({'x': x, 'y': y})
_, bin_edges = np.histogram(x, bins=n_bins)
df['bin'] = pd.cut(x, bins=bin_edges, labels=False, include_lowest=True)
color = df.groupby('bin').mean()['y']
df['color'] = df.bin.apply(lambda k: color[k])
print(df)
    x  y  bin     color
0   3  4    0  6.000000
1   5  6    0  6.000000
2  12  8    3  8.000000
3   7  2    1  4.666667
4   8  4    2  4.000000
5   6  5    1  4.666667
6   4  8    0  6.000000
7   6  7    1  4.666667
Then drawing the colored histogram is easy:
sns.histplot(data=df, x='x', bins=bin_edges, hue='color');
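If you would rather not depend on pandas, the same binned idea can be sketched with numpy alone. This swaps pd.cut and groupby for np.digitize and np.bincount (a substitution, not part of the answer above); n_bins = 4, the 'inferno' colormap and the y.min()/y.max() normalization are arbitrary choices. Each bar spans its bin and is colored by the mean y of the samples that fall into it.

```python
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
import numpy as np

x = np.array([3, 5, 12, 7, 8, 6, 4, 6])
y = np.array([4, 6, 8, 2, 4, 5, 8, 7])
n_bins = 4

counts, edges = np.histogram(x, bins=n_bins)
# Bin index of every sample; using only the inner edges reproduces
# np.histogram's rule (half-open bins, last bin closed on the right).
idx = np.digitize(x, edges[1:-1])
sums = np.bincount(idx, weights=y, minlength=n_bins)
# Mean y per bin; empty bins get NaN instead of a division by zero
mean_y = np.divide(sums, counts, out=np.full(n_bins, np.nan), where=counts > 0)

cmap = plt.get_cmap('inferno')
norm = plt.Normalize(y.min(), y.max())
fig, ax = plt.subplots()
# align='edge' puts each bar's left side on its bin's left edge
ax.bar(edges[:-1], counts, width=np.diff(edges), align='edge',
       color=cmap(norm(mean_y)), edgecolor='black')
fig.colorbar(ScalarMappable(cmap=cmap, norm=norm), ax=ax, label='mean of y')
ax.set_ylabel('Count')
```

The per-bin means match the color column of the dataframe above (6.0, 4.67, 4.0 and 8.0 for bins 0 to 3).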

Stacked scatter plot

Is it possible to have the scatter plot below stacked by “sex” and grouped by day similar to the bar graph in the background?
import plotly.express as px
import plotly.graph_objects as go
df = px.data.tips()
# Scatter Plot
fig = px.strip(df, x='day', y='tip', color='sex').update_traces(jitter = 1)
# Female bars
fig.add_bar(name='Female',
x=['Sun', 'Sat', 'Thur', 'Fri'], y=[5, 6, 7, 8], marker_color='rgba(0,0,255,0.2)'
)
# Male bars
fig.add_bar(name='Male',
x=['Sun', 'Sat', 'Thur', 'Fri'], y=[8, 2, 4, 6], marker_color='rgba(255,0,0,0.2)'
)
# Make bars stacked
fig.update_layout(barmode='stack')
fig.show()
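Passing stripmode='overlay' to px.strip does the job: with the default stripmode='group', the points for each sex are placed side by side, whereas 'overlay' draws them at the same x position, in line with the stacked bars.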
import plotly.express as px
import plotly.graph_objects as go
df = px.data.tips()
# Scatter Plot
fig = px.strip(df, x='day', y='tip', color='sex', stripmode='overlay').update_traces(jitter = 1)
# Female bars
fig.add_bar(name='Female',
x=['Sun', 'Sat', 'Thur', 'Fri'], y=[5, 6, 7, 8], marker_color='rgba(0,0,255,0.2)'
)
# Male bars
fig.add_bar(name='Male',
x=['Sun', 'Sat', 'Thur', 'Fri'], y=[8, 2, 4, 6], marker_color='rgba(255,0,0,0.2)'
)
# Make bars stacked
fig.update_layout(barmode='stack')
fig.show()

Show all colors in histogram bars on top of each other without adding weights in python

I am using the following code to draw 5 bars from 3 different data sets a, b and c. How can I show all colors in each bar? I don't want their values to add up. For example, if in the first bar the value of Green is 1, Yellow is 3 and Red is 6, I don't want the final height to be 10; it should be 6, but all colors should appear up to their own values. I don't want to use transparent colors or only bar outlines.
import matplotlib.pyplot as plt
import numpy as np
a = [1, 2, 3, 4, 5]
b = [3, 4, 1, 10, 9]
c = [6, 7, 2, 4, 6]
ind = np.arange(len(a))
fig = plt.figure()
ax = fig.add_subplot(111)
ax.bar(x=ind, height=a, width=0.35, align='center', label='Green',
facecolor='g')
ax.bar(x=ind, height=b, width=0.35, align='center', label='Yellow',
facecolor='y')
ax.bar(x=ind, height=c, width=0.35, align='center', label='Red', facecolor='r')
plt.xticks(ind, a)
plt.xlabel('Coordination Number')
plt.ylabel('Frequency')
plt.legend()
plt.show()
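The target height for the first bar is 6, but it was unclear whether that is meant to be the maximum of the three values. I assumed it is the maximum, and scaled each color by its share of the row total (the composition ratio), so that the stacked segments add up to that maximum.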
I created a stacked graph based on the results.
import numpy as np
import pandas as pd
a = [1, 2, 3, 4, 5]
b = [3, 4, 1, 10, 9]
c = [6, 7, 2, 4, 6]
ind = np.arange(len(a))
df = pd.DataFrame({'a':a,'b':b,'c':c}, index=ind)
df['total'] = df.sum(axis=1)
df['max'] = df[['a','b','c']].max(axis=1)
df['aa'] = df['max']*(df['a']/df['total'])
df['bb'] = df['max']*(df['b']/df['total'])
df['cc'] = df['max']*(df['c']/df['total'])
print(df)
   a   b  c  total  max        aa        bb        cc
0  1   3  6     10    6  0.600000  1.800000  3.600000
1  2   4  7     13    7  1.076923  2.153846  3.769231
2  3   1  2      6    3  1.500000  0.500000  1.000000
3  4  10  4     18   10  2.222222  5.555556  2.222222
4  5   9  6     20    9  2.250000  4.050000  2.700000
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
ax.bar(x=ind, height=df.loc[:,'aa'], bottom=0, width=0.35, align='center', label='Green',
facecolor='g')
ax.bar(x=ind, height=df.loc[:,'bb'], bottom=df.loc[:,'aa'], width=0.35, align='center', label='Yellow',
facecolor='y')
ax.bar(x=ind, height=df.loc[:,'cc'], bottom=df.loc[:,'aa']+df.loc[:,'bb'], width=0.35, align='center', label='Red', facecolor='r')
plt.xticks(ind, a)
plt.xlabel('Coordination Number')
plt.ylabel('Frequency')
plt.legend()
plt.show()
If I understand your question correctly, you want to show all colour bars starting from the same zero baseline and grouped together under their corresponding Number?
I'll use bokeh for plotting, since it provides an easy way to "offset" each bar in the group. To vary the amount of visual offset for each bar, change the second parameter of the dodge function. For this combination of widths, 0.05 seemed like a nice value.
from bokeh.io import output_notebook, output_file, show
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.transform import dodge
output_notebook() # or output_file("chart.html") if not using Jupyter
x_axis_values = [str(x) for x in range(1, 6)]
data = {
"Coordination Number" : x_axis_values,
"Green" : [1, 2, 3, 4, 5],
"Yellow" : [3, 4, 1, 10, 9],
"Red" : [6, 7, 2, 4, 6]
}
src = ColumnDataSource(data=data)
p = figure(
x_range=x_axis_values, y_range=(0, 10), height=275,  # Bokeh 3 removed plot_height
title="Offset Group Bar Chart", toolbar_location=None, tools="")
p.vbar(
x=dodge('Coordination Number', -0.05, range=p.x_range),
top='Green', width=0.2, source=src, color="#8DD3C7", legend_label="Green")
p.vbar(
x=dodge('Coordination Number', 0.0, range=p.x_range),
top='Yellow', width=0.2, source=src, color="#FFD92F", legend_label="Yellow")
p.vbar(
x=dodge('Coordination Number', 0.05, range=p.x_range),
top='Red', width=0.2, source=src, color="#E15759", legend_label="Red")
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
p.xaxis.axis_label = "Coordination Number"
p.yaxis.axis_label = "Frequency"
show(p)
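Yet another reading of the question: draw every bar from zero at the same position, but draw the tallest one first so the shorter bars stay visible in front of it. This is a swapped-in technique (controlling the drawing order with zorder), not part of either answer. It assumes the final height of each group should be its largest value, as in the first answer, and it keeps the opaque colors the question asks for. The draw_order list exists only to make the ordering easy to inspect.

```python
import matplotlib.pyplot as plt
import numpy as np

# (heights, color) per series, taken from the question
series = {'Green': ([1, 2, 3, 4, 5], 'g'),
          'Yellow': ([3, 4, 1, 10, 9], 'y'),
          'Red': ([6, 7, 2, 4, 6], 'r')}
ind = np.arange(5)

fig, ax = plt.subplots()
draw_order = []  # (position, label) in the order the bars are drawn
for i in ind:
    # Tallest first with the lowest zorder, so every shorter bar sits in front
    ranked = sorted(series.items(), key=lambda kv: kv[1][0][i], reverse=True)
    for z, (label, (heights, color)) in enumerate(ranked):
        ax.bar(i, heights[i], width=0.35, color=color, zorder=z + 1,
               label=label if i == 0 else '_nolegend_')
        draw_order.append((int(i), label))

ax.set_xticks(ind)
ax.set_xticklabels(ind + 1)
ax.set_xlabel('Coordination Number')
ax.set_ylabel('Frequency')
ax.legend()
```

When two series have the same height at a position (Green and Red at 4 for the fourth bar), one of them hides the other completely; no ordering can avoid that without offsets, which is what the bokeh answer uses.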

How to format seaborn plots

The following code produces 2 side-by-side plots. However, I would like to push the right plot further to the right so that its axis label is clearly separated from the left plot. How can I do that? I could not find any option in subplots or in countplot.
Here is the code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data = {
'apples': [3, 2, 0, np.nan, 2],
'oranges': [0, 7, 7, 2, 7],
'figs':[1, np.nan, 10, np.nan, 10]
}
purchases = pd.DataFrame(data)
fig, ax = plt.subplots(1, 2)
sns.countplot(x=purchases['apples'], ax=ax[0])
sns.countplot(x=purchases['oranges'], ax=ax[1])
plt.show()
An option is tight_layout:
fig, ax = plt.subplots(1, 2)
sns.countplot(x=purchases['apples'], ax=ax[0])
sns.countplot(x=purchases['oranges'], ax=ax[1])
plt.tight_layout()
plt.show()
In order to make your data play nicely with seaborn, consider changing your dataframe to the "long" format and plotting all categories and their corresponding count with sns.catplot:
data = purchases.stack().droplevel(0).reset_index()
data.columns = ['fruit', 'number']
print(data.head(5))
# output:
# fruit number
# 0 apples 3.0
# 1 oranges 0.0
# 2 figs 1.0
# 3 apples 2.0
# 4 oranges 7.0
sns.catplot(data=data, x='number', kind='count', col='fruit')
plt.show()
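If you want to control the gap yourself instead of letting tight_layout choose it, here is a minimal sketch using Figure.subplots_adjust. The wspace value of 0.4 and the figure size are arbitrary assumptions; wspace is the horizontal padding between subplots as a fraction of the average axes width, so a larger value moves the right plot further away.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

purchases = pd.DataFrame({
    'apples': [3, 2, 0, np.nan, 2],
    'oranges': [0, 7, 7, 2, 7],
    'figs': [1, np.nan, 10, np.nan, 10],
})

fig, ax = plt.subplots(1, 2, figsize=(10, 4))
sns.countplot(x=purchases['apples'], ax=ax[0])
sns.countplot(x=purchases['oranges'], ax=ax[1])
# Horizontal gap between the two axes, relative to the average axes width
fig.subplots_adjust(wspace=0.4)
```

Call plt.show() afterwards to display it. Note that calling plt.tight_layout() after subplots_adjust would recompute the spacing and override this value.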