Seaborn multiple lineplots for calendar weeks YYYYWW on x-axis - python

I have some problems with the x-axis values of a seaborn line-plot:
import pandas as pd
import seaborn as sns
# data
df = pd.DataFrame(columns=['calendar_week', 'product_name', 'value'],
data=[['201850', 'product01', 1], ['201905', 'product01', 10], ['201910', 'product01', 7],
['201840', 'product02', 4], ['201911', 'product02', 9], ['201917', 'product02', 17], ['201918', 'product02', 12]])
# plot
sns.lineplot(data=df, x='calendar_week', y='value', hue='product_name');
If the calendar_week values are strings, it plots the second graph after the first one. If the calendar_week values are integers, it fills the data from 201852 to 201899 automatically. What's the best way to plot both graphs on one sorted x-axis with only the given calendar_week values?
Here is the plot with calendar_week as string:
Here is the plot with calendar_week as int:
Thanks for help.

It's a bit roundabout, but I think you first need to convert your week numbers into real dates, plot, and then use a custom formatter on the x-axis to show the week number again.
import matplotlib.dates
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
df = pd.DataFrame(columns=['calendar_week', 'product_name', 'value'],
data=[['201850', 'product01', 1], ['201905', 'product01', 10], ['201910', 'product01', 7],
['201840', 'product02', 4], ['201911', 'product02', 9], ['201917', 'product02', 17], ['201918', 'product02', 12]])
df['date'] = pd.to_datetime(df.calendar_week+'0', format='%Y%W%w')
# plot
fig, ax = plt.subplots()
sns.lineplot(data=df, x='date', y='value', hue='product_name', ax=ax)
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter("%Y-%W"))
fig.autofmt_xdate()
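If the goal is evenly spaced ticks that show only the weeks present in the data (rather than a true time axis as above), a different approach is to map each week to its rank in sorted order. This is only a minimal sketch and not what the code above does; `week_positions` is a hypothetical helper name, and it relies on zero-padded YYYYWW strings sorting chronologically as text.

```python
def week_positions(weeks):
    """Map each distinct YYYYWW string to its rank in chronological order."""
    # Zero-padded YYYYWW strings sort chronologically when sorted as text.
    ordered = sorted(set(weeks))
    return ordered, {week: pos for pos, week in enumerate(ordered)}


weeks = ['201850', '201905', '201910', '201840', '201911', '201917', '201918']
ordered, position = week_positions(weeks)
x = [position[week] for week in weeks]
print(ordered)
print(x)
```

With the DataFrame from the question, `df['pos'] = df['calendar_week'].map(position)` could then be plotted with `x='pos'`, followed by `ax.set_xticks(range(len(ordered)))` and `ax.set_xticklabels(ordered)` to put the week labels back on the axis.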

I'm from Germany and I have to deal with ISO weeks, so I ended up doing this:
import pandas as pd
import seaborn as sns
import datetime
import matplotlib.dates
import matplotlib.pyplot as plt
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
# data
df = pd.DataFrame(columns=['calendar_week', 'product_name', 'value'],
data=[['201850', 'product01', 1], ['201905', 'product01', 10], ['201910', 'product01', 7],
['201840', 'product02', 4], ['201911', 'product02', 9], ['201917', 'product02', 17], ['201918', 'product02', 12]])
# convert calendar weeks to date
df['date'] = df['calendar_week'].apply(lambda x: datetime.datetime.strptime(x + '-1', '%G%V-%u'))
# plot
fig, ax = plt.subplots()
sns.lineplot(data=df, x='date', y='value', hue='product_name', ax=ax)
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%G%V'))
fig.autofmt_xdate()
plt.show()
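The `%W` and ISO (`%G%V`) conventions number weeks differently, which matters around New Year. As a rough check using only the standard library (Python 3.8+ for `date.fromisocalendar`), the sketch below compares the Monday of a given week under both conventions. The helper names are made up for illustration, and weekday `1` (Monday) is used for `%w` here, whereas the first answer appends `0` (Sunday).

```python
import datetime


def monday_strftime_week(yyyyww):
    """Monday of a week under %W numbering (week 1 starts on the first Monday of the year)."""
    return datetime.datetime.strptime(yyyyww + '1', '%Y%W%w').date()


def monday_iso_week(yyyyww):
    """Monday of a week under ISO 8601 numbering."""
    return datetime.date.fromisocalendar(int(yyyyww[:4]), int(yyyyww[4:]), 1)


for week in ['201850', '201905']:
    print(week, monday_strftime_week(week), monday_iso_week(week))
```

For 2018 the two agree, because 1 January 2018 was a Monday. For `201905` they differ by a week: `%W` gives 2019-02-04, while ISO gives 2019-01-28.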

Related

Python Matplotlib bar chart with categories

I have data (duration of a certain activity) for two categories (Monday, Tuesday). I would like to generate a bar chart (see 1). Bars above a threshold (different for both categories) should have a different color; e.g. on Mondays data above 10 hours should be blue and on Tuesdays above 12 hours. Any ideas how I could implement this in seaborn or matplotlib?
Thank you very much.
Monday = [5,6,8,12,5,20,4, 8]
Tuesday=[3,5,8,12,4,17]
Goal
You could draw two barplots, using an array of booleans for the coloring (hue):
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
monday = np.array([5, 6, 8, 12, 5, 20, 4, 8])
tuesday = np.array([3, 5, 8, 12, 4, 17])
sns.set_style('whitegrid')
fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(10, 4), sharey=True)
palette = {False: 'skyblue', True: 'tomato'}
sns.barplot(x=np.arange(len(monday)), y=monday, hue=monday >= 10, palette=palette, dodge=False, ax=ax0)
ax0.set_xlabel('Monday', size=20)
ax0.set_xticks([])
ax0.legend_.remove()
sns.barplot(x=np.arange(len(tuesday)), y=tuesday, hue=tuesday >= 12, palette=palette, dodge=False, ax=ax1)
ax1.set_xlabel('Tuesday', size=20)
ax1.set_xticks([])
ax1.legend_.remove()
sns.despine()
plt.tight_layout()
plt.subplots_adjust(wspace=0)
plt.show()
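The same boolean-mask idea should also work with plain matplotlib, since `Axes.bar` accepts one color per bar. A minimal sketch, where `threshold_colors` and `draw_threshold_bars` are hypothetical helper names and the colors simply reuse the palette above (with `>=` as in the seaborn code):

```python
def threshold_colors(values, threshold, below='skyblue', above='tomato'):
    """Return one color per value: `above` when value >= threshold, else `below`."""
    return [above if value >= threshold else below for value in values]


def draw_threshold_bars(ax, values, threshold):
    """Draw bars on a matplotlib Axes, colored by whether they reach the threshold."""
    positions = range(len(values))
    return ax.bar(positions, values, color=threshold_colors(values, threshold))


monday = [5, 6, 8, 12, 5, 20, 4, 8]
tuesday = [3, 5, 8, 12, 4, 17]
print(threshold_colors(monday, 10))
print(threshold_colors(tuesday, 12))
```

After `fig, (ax0, ax1) = plt.subplots(ncols=2, sharey=True)`, calling `draw_threshold_bars(ax0, monday, 10)` and `draw_threshold_bars(ax1, tuesday, 12)` would fill the two panels.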

Stacked scatter plot

Is it possible to have the scatter plot below stacked by “sex” and grouped by day similar to the bar graph in the background?
import plotly.express as px
import plotly.graph_objects as go
df = px.data.tips()
# Scatter Plot
fig = px.strip(df, x='day', y='tip', color='sex').update_traces(jitter = 1)
# Female bars
fig.add_bar(name='Female',
x=['Sun', 'Sat', 'Thur', 'Fri'], y=[5, 6, 7, 8], marker_color='rgba(0,0,255,0.2)'
)
# Male bars
fig.add_bar(name='Male',
x=['Sun', 'Sat', 'Thur', 'Fri'], y=[8, 2, 4, 6], marker_color='rgba(255,0,0,0.2)'
)
# Make bars stacked
fig.update_layout(barmode='stack')
fig.show()
stripmode='overlay' does the job.
import plotly.express as px
import plotly.graph_objects as go
df = px.data.tips()
# Scatter Plot
fig = px.strip(df, x='day', y='tip', color='sex', stripmode='overlay').update_traces(jitter = 1)
# Female bars
fig.add_bar(name='Female',
x=['Sun', 'Sat', 'Thur', 'Fri'], y=[5, 6, 7, 8], marker_color='rgba(0,0,255,0.2)'
)
# Male bars
fig.add_bar(name='Male',
x=['Sun', 'Sat', 'Thur', 'Fri'], y=[8, 2, 4, 6], marker_color='rgba(255,0,0,0.2)'
)
# Make bars stacked
fig.update_layout(barmode='stack')
fig.show()
Gives

Bar graph df.plot() vs ax.bar() structure matplotlib

I am trying to graph a table as a bar graph.
I get my desired outcome using df.plot(kind='bar') structure. But for certain reasons, I now need to graph it using the ax.bar() structure.
Please refer to the example screenshot. I would like to graph the x axis as categorical labels like the df.plot(kind='bar') structure rather than continuous scale, but need to learn to use ax.bar() structure to do the same.
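Make the index categorical by converting the 'SA' column to 'str' before setting it as the index. Matplotlib then treats the x values as category labels rather than as positions on a continuous scale: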
import pandas as pd
import matplotlib.pyplot as plt
data = {'SA': [11, 12, 13, 16, 17, 159, 209, 216],
'ET': [36, 45, 11, 15, 16, 4, 11, 10],
'UT': [11, 26, 10, 11, 16, 7, 2, 2],
'CT': [5, 0.3, 9, 5, 0.2, 0.2, 3, 4]}
df = pd.DataFrame(data)
df['SA'] = df['SA'].astype('str')
df.set_index('SA', inplace=True)
width = 3
fig, ax = plt.subplots(figsize=(12, 8))
p1 = ax.bar(df.index, df.ET, color='b', label='ET')
p2 = ax.bar(df.index, df.UT, bottom=df.ET, color='g', label='UT')
p3 = ax.bar(df.index, df.CT, bottom=df.ET+df.UT, color='r', label='CT')
plt.legend()
plt.show()
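With more than three series, writing each `bottom=` sum by hand gets tedious. One possible generalization, sketched here with a hypothetical `stacked_bottoms` helper, keeps a running total and uses it as the baseline of the next layer:

```python
def stacked_bottoms(series):
    """For each series (in stacking order), return the baseline it sits on."""
    running = [0] * len(series[0])
    bottoms = []
    for values in series:
        bottoms.append(list(running))
        # Raise the baseline by this layer's heights for the next layer.
        running = [base + value for base, value in zip(running, values)]
    return bottoms


et = [36, 45, 11, 15, 16, 4, 11, 10]
ut = [11, 26, 10, 11, 16, 7, 2, 2]
ct = [5, 0.3, 9, 5, 0.2, 0.2, 3, 4]
bottoms = stacked_bottoms([et, ut, ct])
print(bottoms[2])  # baseline for CT, i.e. ET + UT
```

Each layer can then be drawn in a loop with `ax.bar(df.index, values, bottom=base, label=name)`, which should reproduce the three explicit calls above.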

Why does setting hue in seaborn plot change the size of a point?

The plot I am trying to make needs to achieve 3 things.
If a quiz is taken on the same day with the same score, that point needs to be bigger.
If two quiz scores overlap there needs to be some jitter so we can see all points.
Each quiz needs to have its own color
Here is how I am going about it.
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
data = {'Quiz': [1, 1, 2, 1, 2, 1],
'Score': [7.5, 5.0, 10, 10, 10, 10],
'Day': [2, 5, 5, 5, 11, 11],
'Size': [115, 115, 115, 115, 115, 355]}
df = pd.DataFrame.from_dict(data)
sns.lmplot(x = 'Day', y='Score', data = df, fit_reg=False, x_jitter = True, scatter_kws={'s': df.Size})
plt.show()
Setting the hue, which almost does everything I need, results in this.
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
data = {'Quiz': [1, 1, 2, 1, 2, 1],
'Score': [7.5, 5.0, 10, 10, 10, 10],
'Day': [2, 5, 5, 5, 11, 11],
'Size': [115, 115, 115, 115, 115, 355]}
df = pd.DataFrame.from_dict(data)
sns.lmplot(x = 'Day', y='Score', data = df, fit_reg=False, hue = 'Quiz', x_jitter = True, scatter_kws={'s': df.Size})
plt.show()
Is there a way I can have hue while keeping the size of my points?
It doesn't work because when you are using hue, seaborn does two separate scatterplots and therefore the size argument you are passing using scatter_kws= no longer aligns with the content of the dataframe.
You can recreate the same effect by hand however:
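import matplotlib.pyplot as plt
import numpy as np
x_col = 'Day'
y_col = 'Score'
hue_col = 'Quiz'
size_col = 'Size'
jitter = 0.2
fig, ax = plt.subplots()
# one scatter call per hue level, each with its own slice of the sizes
for q, temp in df.groupby(hue_col):
    n = len(temp[x_col])
    # add horizontal jitter so overlapping points stay visible
    x = temp[x_col] + np.random.normal(scale=jitter, size=(n,))
    ax.scatter(x, temp[y_col], s=temp[size_col], label=q)
ax.set_xlabel(x_col)
ax.set_ylabel(y_col)
ax.legend(title=hue_col)
plt.show()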
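To see why the sizes stop lining up, it can help to look at how the rows split by `Quiz`. This small sketch uses plain Python lists instead of seaborn (an assumption made to keep it dependency-free) and groups the `Size` values the way a per-hue plot would need them:

```python
from collections import defaultdict

quiz = [1, 1, 2, 1, 2, 1]
size = [115, 115, 115, 115, 115, 355]

# Collect the sizes belonging to each quiz, preserving row order.
sizes_by_quiz = defaultdict(list)
for q, s in zip(quiz, size):
    sizes_by_quiz[q].append(s)

for q, sizes in sorted(sizes_by_quiz.items()):
    print(q, sizes)
```

Quiz 1 has four points and Quiz 2 has two, while `df.Size` has six entries, so a single six-element array cannot match either group. Slicing the sizes per group, as the loop above does with `temp[size_col]`, keeps them aligned.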

Cufflinks(Plotly) can't plot date time but numbers on the x-axis for the heatmap

I tried to plot a heatmap using the Cufflinks API after changing my pandas DataFrame's index to datetime,
but the x-axis shows numbers in scientific notation rather than dates:
A solution may be to specify the x-axis tick values explicitly from your pandas index.
For example:
import pandas as pd
import plotly.graph_objs as go
import cufflinks as cf
df = pd.DataFrame({'C1': [0, 1, 2, 3], 'C2': [1, 2, 3, 4], 'C3': [2, 3, 4, 5], 'C4': [3, 4, 5, 6]},
index=pd.to_datetime(['2016-11', '2016-12', '2017-01', '2017-02']))
# set the tick values to the index labels directly
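# go.layout.XAxis replaces the deprecated go.XAxis; `ticks` only accepts
# 'outside', 'inside' or '', so the index values go into tickvals alone
layout = go.Layout(xaxis=go.layout.XAxis(
showticklabels=True,
linewidth=1,
tickvals=df.index.tolist()
))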
cf.go_offline()
df.iplot(kind='heatmap', fill=True, colorscale='spectral', filename='cufflinks/test', layout=layout)
This produces the plot
I arrived here attempting to resolve a similar issue to yours. This post is a little late, but perhaps this will also help future readers.
