Stacked scatter plot - python

Is it possible to have the scatter plot below stacked by “sex” and grouped by day similar to the bar graph in the background?
import plotly.express as px
import plotly.graph_objects as go
df = px.data.tips()
# Scatter Plot
fig = px.strip(df, x='day', y='tip', color='sex').update_traces(jitter = 1)
# Female bars
fig.add_bar(name='Female',
x=['Sun', 'Sat', 'Thur', 'Fri'], y=[5, 6, 7, 8], marker_color='rgba(0,0,255,0.2)'
)
# Male bars
fig.add_bar(name='Male',
x=['Sun', 'Sat', 'Thur', 'Fri'], y=[8, 2, 4, 6], marker_color='rgba(255,0,0,0.2)'
)
# Make bars stacked
fig.update_layout(barmode='stack')
fig.show()

stripmode='overlay' does the job.
import plotly.express as px
import plotly.graph_objects as go
df = px.data.tips()
# Scatter Plot
fig = px.strip(df, x='day', y='tip', color='sex', stripmode='overlay').update_traces(jitter = 1)
# Female bars
fig.add_bar(name='Female',
x=['Sun', 'Sat', 'Thur', 'Fri'], y=[5, 6, 7, 8], marker_color='rgba(0,0,255,0.2)'
)
# Male bars
fig.add_bar(name='Male',
x=['Sun', 'Sat', 'Thur', 'Fri'], y=[8, 2, 4, 6], marker_color='rgba(255,0,0,0.2)'
)
# Make bars stacked
fig.update_layout(barmode='stack')
fig.show()


Is it possible to add x_ticks to pywaffle?

I was wondering if and how I can add x-axis labels to a pywaffle chart.
value1 = new_df['value1'].tolist()
new_list = [i+1 for i in range(len(value1))]
fig = plt.figure(
FigureClass=Waffle,
rows=1,
columns=len(value1), # Either rows or columns could be omitted
values=value1,
title = {"label": name, "loc": "left"},
)
plt.savefig("plot.png", bbox_inches="tight")
My value1 values are [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31].
I would like every column to be labeled.
Yes, it is possible to add ticks etc.
A waffle chart with limited number of columns
But it is a bit unclear what your final goal is. By default, a waffle chart draws as many squares as each of the values indicates. So, if the values are [1, 2, 3, 4, 5, 6] and the colors are ['red', 'orange', 'blue', 'gold', 'green', 'purple'], there will be 1 red square, 2 oranges, 3 blues, 4 yellows, 5 greens and 6 purples.
import matplotlib.pyplot as plt
from pywaffle import Waffle
value1 = [1, 2, 3, 4, 5, 6]
fig = plt.figure(
FigureClass=Waffle,
rows=1,
#columns=sum(value1),
values=value1,
colors=['red','orange','blue','gold','green','purple']
)
If you set the number of rows and columns so that their product is smaller than the sum of the values (21 here), each value is reduced more or less proportionally but stays an integer. In the current example, red is suppressed entirely, orange, blue, yellow and green are each reduced to 1 square, and purple is reduced to 2 squares. This makes it unclear which label you want to put where.
value1 = [1, 2, 3, 4, 5, 6]
fig = plt.figure(
FigureClass=Waffle,
rows=1,
columns=len(value1),
values=value1,
colors=['red','orange','blue','gold','green','purple']
)
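To see where those reduced counts come from, here is a rough sketch of the scaling. It is not pywaffle's actual code: `scale_blocks` is a hypothetical helper, and it assumes each value is multiplied by (rows * columns) / sum(values) and rounded to the nearest integer.

```python
def scale_blocks(values, rows, columns):
    """Scale each value to the available blocks, rounding to the nearest integer.

    Assumption: nearest-integer rounding; this only approximates what the
    library does internally.
    """
    ratio = rows * columns / sum(values)
    return [round(v * ratio) for v in values]


values = [1, 2, 3, 4, 5, 6]
colors = ['red', 'orange', 'blue', 'gold', 'green', 'purple']

blocks = scale_blocks(values, rows=1, columns=len(values))
for color, n in zip(colors, blocks):
    print(color, n)
```

With 6 columns and a total of 21, the ratio is 6/21, so the smallest value rounds down to zero squares and only the largest keeps 2.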
Adding x ticks
To add ticks to a waffle chart, you can turn the axes on. To position the ticks, you need to know that the squares have a width of 1, and a default distance of 0.2. So, the first tick comes at 0.5, the next one at 1+0.2+0.5, etc. Optionally, you can remove spines and the dummy y ticks.
import matplotlib.pyplot as plt
from pywaffle import Waffle
value1 = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]
fig = plt.figure(
FigureClass=Waffle,
rows=1,
columns=len(value1),
values=value1,
title={"label": 'title', "loc": "left"},
figsize=(15,3),
)
plt.axis('on')
plt.yticks([])
plt.xticks([i * 1.2 + 0.5 for i in range(len(value1))], value1)
for sp in ['left', 'right', 'top']:
    plt.gca().spines[sp].set_visible(False)
plt.show()
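The tick arithmetic can be pulled out into a small helper. This is just a sketch: `tick_positions` is a made-up name, and the width of 1 and gap of 0.2 are the values stated above, so adjust `gap` if you change the spacing between squares.

```python
def tick_positions(n_columns, block_width=1.0, gap=0.2):
    """Return the x coordinate of the center of each column of squares."""
    step = block_width + gap  # distance from one square's left edge to the next
    return [i * step + block_width / 2 for i in range(n_columns)]


positions = tick_positions(4)
print([round(p, 2) for p in positions])
```

The result can be passed straight to plt.xticks(positions, labels) in place of the list comprehension.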
A Seaborn heatmap
Instead of a waffle chart, you could create a heatmap. Then, each square will get a color corresponding to the given values. Optionally, these values (or another string) can be shown as annotation or as x tick label.
import matplotlib.pyplot as plt
import seaborn as sns
value1 = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]
plt.figure(figsize=(15, 3))
ax = sns.heatmap(data=[value1], xticklabels=value1, yticklabels=False,
annot=True, square=True, linewidths=1.5, cbar=False)
ax.set_title('title', loc='left')
plt.tight_layout()
plt.show()
# Remove borders, ticks, etc.
ax.axis("off")
I saw this in pywaffle.py, so I don't think adding an axis is possible.

Python Matplotlib bar chart with categories

I have data (duration of a certain activity) for two categories (Monday, Tuesday). I would like to generate a bar chart (see 1). Bars above a threshold (different for both categories) should have a different color; e.g. on Mondays data above 10 hours should be blue and on Tuesdays above 12 hours. Any ideas how I could implement this in seaborn or matplotlib?
Thank you very much.
Monday = [5,6,8,12,5,20,4, 8]
Tuesday=[3,5,8,12,4,17]
Goal
You could draw two barplots, using an array of booleans for the coloring (hue):
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
monday = np.array([5, 6, 8, 12, 5, 20, 4, 8])
tuesday = np.array([3, 5, 8, 12, 4, 17])
sns.set_style('whitegrid')
fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(10, 4), sharey=True)
palette = {False: 'skyblue', True: 'tomato'}
sns.barplot(x=np.arange(len(monday)), y=monday, hue=monday >= 10, palette=palette, dodge=False, ax=ax0)
ax0.set_xlabel('Monday', size=20)
ax0.set_xticks([])
ax0.legend_.remove()
sns.barplot(x=np.arange(len(tuesday)), y=tuesday, hue=tuesday >= 12, palette=palette, dodge=False, ax=ax1)
ax1.set_xlabel('Tuesday', size=20)
ax1.set_xticks([])
ax1.legend_.remove()
sns.despine()
plt.tight_layout()
plt.subplots_adjust(wspace=0)
plt.show()
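If you prefer plain matplotlib, the same idea works by building one color per bar and passing the list to ax.bar. A minimal sketch, assuming the same thresholds (>= 10 and >= 12) and colors as above; `threshold_colors` and `plot_days` are hypothetical helper names.

```python
def threshold_colors(values, threshold, below="skyblue", above="tomato"):
    """Return one color per value: `above` when the value is >= threshold."""
    return [above if v >= threshold else below for v in values]


monday = [5, 6, 8, 12, 5, 20, 4, 8]
tuesday = [3, 5, 8, 12, 4, 17]

monday_colors = threshold_colors(monday, 10)
tuesday_colors = threshold_colors(tuesday, 12)


def plot_days():
    """Draw both days side by side; matplotlib is imported only when called."""
    import matplotlib.pyplot as plt

    fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(10, 4), sharey=True)
    ax0.bar(range(len(monday)), monday, color=monday_colors)
    ax0.set_xlabel("Monday")
    ax1.bar(range(len(tuesday)), tuesday, color=tuesday_colors)
    ax1.set_xlabel("Tuesday")
    plt.show()
```

Call plot_days() to show the figure. Use > instead of >= in the helper if "above the threshold" should exclude the threshold itself.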

Plot Pandas DataFrame and plot side by side

There are options to have plots side by side, likewise for pandas dataframes. Is there a way to plot a pandas dataframe and a plot side by side?
This is the code I have so far, but the dataframe is distorted.
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import table
# sample data
d = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
'jan': [4, 24, 31, 2, 3],
'feb': [25, 94, 57, 62, 70],
'march': [5, 43, 23, 23, 51]}
df = pd.DataFrame(d)
df['total'] = df.iloc[:, 1:].sum(axis=1)
plt.figure(figsize=(16,8))
# plot table
ax1 = plt.subplot(121)
plt.axis('off')
tbl = table(ax1, df, loc='center')
tbl.auto_set_font_size(False)
tbl.set_fontsize(14)
# pie chart
ax2 = plt.subplot(122, aspect='equal')
df.plot(kind='pie', y = 'total', ax=ax2, autopct='%1.1f%%',
startangle=90, shadow=False, labels=df['name'], legend = False, fontsize=14)
plt.show()
It's pretty simple to do with plotly and make_subplots():
1. define a figure with an appropriate specs argument
2. add_trace() which is tabular data from your data frame
3. add_trace() which is a pie chart from your data frame
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
# sample data
d = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
'jan': [4, 24, 31, 2, 3],
'feb': [25, 94, 57, 62, 70],
'march': [5, 43, 23, 23, 51]}
df = pd.DataFrame(d)
df['total'] = df.iloc[:, 1:].sum(axis=1)
fig = make_subplots(rows=1, cols=2, specs=[[{"type":"table"},{"type":"pie"}]])
fig = fig.add_trace(go.Table(cells={"values":df.T.values}, header={"values":df.columns}), row=1,col=1)
fig.add_trace(px.pie(df, names="name", values="total").data[0], row=1, col=2)

How to format seaborn plots

The following code produces 2 side-by-side plots. However, I would like to push the right plot to the right so that its label shows detached from the left plot. How can I do it? I could not find any option in subplots, nor in countplot.
Here is the code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data = {
'apples': [3, 2, 0, np.nan, 2],
'oranges': [0, 7, 7, 2, 7],
'figs':[1, np.nan, 10, np.nan, 10]
}
purchases = pd.DataFrame(data)
fig, ax =plt.subplots(1,2)
sns.countplot(x=purchases['apples'], ax=ax[0])
sns.countplot(x=purchases['oranges'], ax=ax[1])
plt.show()
An option is tight_layout:
fig, ax =plt.subplots(1,2)
sns.countplot(x=purchases['apples'], ax=ax[0])
sns.countplot(x=purchases['oranges'], ax=ax[1])
plt.tight_layout()
In order to make your data play nicely with seaborn, consider changing your dataframe to the "long" format and plotting all categories and their corresponding count with sns.catplot:
data = purchases.stack().droplevel(0).reset_index()
data.columns = ['fruit', 'number']
print(data.head(5))
# output:
# fruit number
# 0 apples 3.0
# 1 oranges 0.0
# 2 figs 1.0
# 3 apples 2.0
# 4 oranges 7.0
sns.catplot(data=data, x='number', kind='count', col='fruit')
plt.show()

Seaborn multiple lineplots for calendar weeks YYYYWW on x-axis

I have some problems with the x-axis values of a seaborn line-plot:
import pandas as pd
import seaborn as sns
# data
df = pd.DataFrame(columns=['calendar_week', 'product_name', 'value'],
data=[['201850', 'product01', 1], ['201905', 'product01', 10], ['201910', 'product01', 7],
['201840', 'product02', 4], ['201911', 'product02', 9], ['201917', 'product02', 17], ['201918', 'product02', 12]])
# plot
sns.lineplot(data=df, x='calendar_week', y='value', hue='product_name');
If the calendar_week values are strings, it plots the second graph after the first one. If the calendar_week values are integers, it fills the data from 201852 to 201899 automatically. What's the best way to plot both graphs on one sorted x-axis with only the given calendar_week values?
Here is the plot with calendar_week as string:
Here is the plot with calendar_week as int:
Thanks for help.
It's a bit roundabout, but I think you first need to convert your week numbers into real dates, plot, and then use a custom formatter on the x-axis to show the week number again.
import matplotlib.dates
import matplotlib.pyplot as plt
df = pd.DataFrame(columns=['calendar_week', 'product_name', 'value'],
data=[['201850', 'product01', 1], ['201905', 'product01', 10], ['201910', 'product01', 7],
['201840', 'product02', 4], ['201911', 'product02', 9], ['201917', 'product02', 17], ['201918', 'product02', 12]])
df['date'] = pd.to_datetime(df.calendar_week+'0', format='%Y%W%w')
# plot
fig, ax = plt.subplots()
sns.lineplot(data=df, x='date', y='value', hue='product_name', ax=ax)
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter("%Y-%W"))
fig.autofmt_xdate()
I'm from Germany and I have to deal with ISO weeks, so I ended up doing this:
import pandas as pd
import seaborn as sns
import datetime
import matplotlib.dates
import matplotlib.pyplot as plt
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
# data
df = pd.DataFrame(columns=['calendar_week', 'product_name', 'value'],
data=[['201850', 'product01', 1], ['201905', 'product01', 10], ['201910', 'product01', 7],
['201840', 'product02', 4], ['201911', 'product02', 9], ['201917', 'product02', 17], ['201918', 'product02', 12]])
# convert calendar weeks to date
df['date'] = df['calendar_week'].apply(lambda x: datetime.datetime.strptime(x + '-1', '%G%V-%u'))
# plot
fig, ax = plt.subplots()
sns.lineplot(data=df, x='date', y='value', hue='product_name', ax=ax)
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%G%V'))
fig.autofmt_xdate()
plt.show()
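To see why the ISO directives matter, here is a small standard-library sketch comparing the two parsing schemes used in the answers above. `monday_from_week_label` is a hypothetical helper, and both branches ask for the Monday of the week so the results are directly comparable.

```python
from datetime import datetime


def monday_from_week_label(label, iso=True):
    """Parse a 'YYYYWW' string and return the Monday of that week as a date."""
    if iso:
        # ISO 8601 year (%G), ISO week (%V), ISO weekday (%u, 1 = Monday)
        return datetime.strptime(label + "-1", "%G%V-%u").date()
    # Calendar year (%Y), week counted from the first Monday (%W),
    # weekday (%w, 1 = Monday)
    return datetime.strptime(label + "-1", "%Y%W-%w").date()


for label in ["201850", "201901", "201905"]:
    print(label,
          monday_from_week_label(label),
          monday_from_week_label(label, iso=False))
```

For some years the two schemes agree, but around New Year they can be a week apart: ISO week 1 of 2019 starts on 2018-12-31, while %W week 1 starts on the first Monday of the year, 2019-01-07.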
