Let's take the following pd.DataFrame as an example
df = pd.DataFrame({
'month': ['2022-01', '2022-02', '2022-03'],
'col1': [1_000, 1_500, 2_000],
'col2': [100, 150, 200],
}).melt(id_vars=['month'], var_name='col_name')
which creates
     month col_name  value
0  2022-01     col1   1000
1  2022-02     col1   1500
2  2022-03     col1   2000
3  2022-01     col2    100
4  2022-02     col2    150
5  2022-03     col2    200
If I plot this with plain seaborn:
sns.barplot(data=df, x='month', y='value', hue='col_name');
I would get:
Now I would like to use plotly and the following code
import plotly.express as px
fig = px.histogram(df,
x="month",
y="value",
color='col_name', barmode='group', height=500, width=1_200)
fig.show()
And I get:
So why are the x-ticks so weird and not simply 2022-01, 2022-02 and 2022-03?
What is happening here?
I found that I always have this problem with the ticks when using color. It somehow messes the ticks up.
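Plotly interprets strings such as '2022-01' as dates, so the x-axis becomes a continuous date axis with automatically chosen ticks. You can fix this by setting the tick step to one month with dtick="M1", as follows (the example also uses px.bar instead of px.histogram):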
import pandas as pd
import plotly.express as px
df = pd.DataFrame({
'month': ['2022-01', '2022-02', '2022-03'],
'col1': [1000, 1500, 2000],
'col2': [100, 150, 200],
}).melt(id_vars=['month'], var_name='col_name')
fig = px.bar(df,
x="month",
y="value",
color='col_name', barmode='group', height=500, width=1200)
fig.update_xaxes(
tickformat = '%Y-%m',
dtick="M1",
)
fig.show()
Context: I have a dataframe and I'm plotting a line plot and a bar plot on the same chart. Now, I'd like to add a type of "timeline" below the date on the main X axis or above the chart as a secondary x axis.
Minimal reproducible code:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
import base64
from plotly.subplots import make_subplots
plot_df = pd.DataFrame({'time':['2022-01-01','2022-01-02','2022-01-03','2022-01-04','2022-01-05'],'A':[2.1,2.4,3.2,4.2,2.4],'B':[12,23,24,27,17],'C':[np.nan,500,200,np.nan,np.nan],'D':['pre','during','during','post','post']})
plot_df
fig = make_subplots(specs=[[{"secondary_y": True}]])
fig.add_trace(go.Line(x=plot_df['time'], y=plot_df['A'],name='A'),secondary_y=True)
fig.add_trace(go.Line(x=plot_df['time'], y=plot_df['B'],name='B'),secondary_y=True)
fig.add_trace(go.Bar(x=plot_df['time'], y=plot_df['C'],name='C'), secondary_y=False)
fig.update_layout(
#margin=dict(l=2, r=1, t=55, b=2),
autosize=True,
xaxis=dict(title_text="Time"),
yaxis=dict(title_text="C"),
width=1000
)
fig.show()
From this, I get this plot (uploaded using ImgBB): https://ibb.co/w6w8677
The idea would essentially be to take column D and plot the "pre", "during" and "post" labels on top of the plot or right below the "time" on the x axis (whichever would be easier or more visually appealing).
How could I do that?
Ultimate goal for the output would be something like this (doesn't have to have the same box size or fonts or colors, just an example of how to do something like this):
Thanks!
I have not tried the method described by the Plotly community, but I think adding string annotations is less work than adding a second axis that carries no data. The approach I took does not use a time-series axis; instead it combines each date string with its D value and uses the result as the x-axis tick labels.
The other method is to use a string annotation at the top of the graph. Which one you choose is up to you.
plot_df
time A B C D
0 2022-01-01 2.1 12 NaN pre
1 2022-01-02 2.4 23 500.0 during
2 2022-01-03 3.2 24 200.0 during
3 2022-01-04 4.2 27 NaN post
4 2022-01-05 2.4 17 NaN post
plot_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 time 5 non-null object
1 A 5 non-null float64
2 B 5 non-null int64
3 C 2 non-null float64
4 D 5 non-null object
dtypes: float64(2), int64(1), object(2)
memory usage: 328.0+ bytes
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import numpy as np
xlabels = ['{}<br><br>{}'.format(t,d) for t,d in zip(plot_df['time'], plot_df['D'])]
fig = make_subplots(specs=[[{"secondary_y": True}]])
fig.add_trace(go.Scatter(mode='lines', x=plot_df.index, y=plot_df['A'], name='A'), secondary_y=True)
fig.add_trace(go.Scatter(mode='lines', x=plot_df.index, y=plot_df['B'], name='B'), secondary_y=True)
fig.add_trace(go.Bar(x=plot_df.index, y=plot_df['C'], name='C'), secondary_y=False)
for i,d in enumerate(plot_df['D']):
    fig.add_annotation(x=i, y=1.1, xref='x', yref='paper', text=d, showarrow=False)
fig.update_xaxes(tickvals=np.arange(5), ticktext=xlabels)
fig.update_layout(
#margin=dict(l=2, r=1, t=55, b=2),
autosize=True,
xaxis=dict(title_text="Time"),
yaxis=dict(title_text="C"),
width=1000
)
fig.show()
Update: here is an example of manually adding string annotations to match the expected output.
fig.add_annotation(x=0, y=1.1, xref='paper', yref='paper', text=' Pre ', showarrow=False, font=dict(color='white'), bgcolor='blue')
fig.add_annotation(x=0.31, y=1.1, xref='paper', yref='paper', text=' During ', showarrow=False, font=dict(color='blue'), bgcolor='gray')
fig.add_annotation(x=0.94, y=1.1, xref='paper', yref='paper', text=' Post ', showarrow=False, font=dict(color='blue'), bgcolor='green')
I am trying to visualize different types of "purchases" over a quarterly period for selected customers. To generate this visual, I am using the catplot function in seaborn, but I am unable to add a horizontal line that connects each of the purchased fruits. Each line should start at the first dot for each fruit and end at the last dot for the same fruit. Any ideas on how to do this programmatically?
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
dta = pd.DataFrame(columns=["Date", "Fruit", "type"], data=[['2017-01-01','Orange',
'FP'], ['2017-04-01','Orange', 'CP'], ['2017-07-01','Orange', 'CP'],
['2017-10-08','Orange', 'CP'],['2017-01-01','Apple', 'NP'], ['2017-04-01','Apple', 'CP'],
['2017-07-01','Banana', 'NP'], ['2017-10-08','Orange', 'CP']
])
dta['quarter'] = pd.PeriodIndex(dta.Date, freq='Q')
sns.catplot(x="quarter", y="Fruit", hue="type", kind="swarm", data=dta)
plt.show()
This is the result:
How can I add individual horizontal lines that each connect the dots for purchases of orange and apple?
Each line should start at the first dot for each fruit and end at the last dot for the same fruit.
Use groupby.ngroup to map the quarters to xtick positions
Use groupby.agg to find each fruit's min and max xtick endpoints
Use ax.hlines to plot horizontal lines from each fruit's min to max
df = pd.DataFrame([['2017-01-01', 'Orange', 'FP'], ['2017-04-01', 'Orange', 'CP'], ['2017-07-01', 'Orange', 'CP'], ['2017-10-08', 'Orange', 'CP'], ['2017-01-01', 'Apple', 'NP'], ['2017-04-01', 'Apple', 'CP'], ['2017-07-01', 'Banana', 'NP'], ['2017-10-08', 'Orange', 'CP']], columns=['Date', 'Fruit', 'type'])
df['quarter'] = pd.PeriodIndex(df['Date'], freq='Q')
df = df.sort_values('quarter') # sort dataframe by quarter
df['xticks'] = df.groupby('quarter').ngroup() # map quarter to xtick position
ends = df.groupby('Fruit')['xticks'].agg(['min', 'max']) # find min and max xtick per fruit
g = sns.catplot(x='quarter', y='Fruit', hue='type', kind='swarm', s=8, data=df)
g.axes[0, 0].hlines(ends.index, ends['min'], ends['max']) # plot horizontal lines from each fruit's min to max
Detailed breakdown:
catplot plots the xticks in the order they appear in the dataframe. The sample dataframe is already sorted by quarter, but the real dataframe should be sorted explicitly:
df = df.sort_values('quarter')
Map the quarters to their xtick positions using groupby.ngroup:
df['xticks'] = df.groupby('quarter').ngroup()
# Date Fruit type quarter xticks
# 0 2017-01-01 Orange FP 2017Q1 0
# 1 2017-04-01 Orange CP 2017Q2 1
# 2 2017-07-01 Orange CP 2017Q3 2
# 3 2017-10-08 Orange CP 2017Q4 3
# 4 2017-01-01 Apple NP 2017Q1 0
# 5 2017-04-01 Apple CP 2017Q2 1
# 6 2017-07-01 Banana NP 2017Q3 2
# 7 2017-10-08 Orange CP 2017Q4 3
Find the min and max xticks to get the endpoints per Fruit using groupby.agg:
ends = df.groupby('Fruit')['xticks'].agg(['min', 'max'])
# min max
# Fruit
# Apple 0 1
# Banana 2 2
# Orange 0 3
Use ax.hlines to plot a horizontal line per Fruit from min-endpoint to max-endpoint:
g = sns.catplot(x='quarter', y='Fruit', hue='type', kind='swarm', s=8, data=df)
ax = g.axes[0, 0]
ax.hlines(ends.index, ends['min'], ends['max'])
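If the string y-values passed to hlines do not line up with the rows seaborn draws (worth checking in your seaborn version), the same idea can be sketched in plain matplotlib with explicit numeric positions on both axes. This is an illustration only: the yticks column and the use of pd.factorize are my additions, and a simple scatter stands in for the swarm plot.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    [['2017-01-01', 'Orange', 'FP'], ['2017-04-01', 'Orange', 'CP'],
     ['2017-07-01', 'Orange', 'CP'], ['2017-10-08', 'Orange', 'CP'],
     ['2017-01-01', 'Apple', 'NP'], ['2017-04-01', 'Apple', 'CP'],
     ['2017-07-01', 'Banana', 'NP'], ['2017-10-08', 'Orange', 'CP']],
    columns=['Date', 'Fruit', 'type'],
)
df['quarter'] = pd.to_datetime(df['Date']).dt.to_period('Q')
df = df.sort_values('quarter', kind='stable')

# Numeric x position per quarter and numeric y position per fruit
df['xticks'] = df.groupby('quarter').ngroup()
codes, fruits = pd.factorize(df['Fruit'])  # fruits in order of first appearance
df['yticks'] = codes

# First and last x position for each fruit row
ends = df.groupby('yticks')['xticks'].agg(['min', 'max'])

fig, ax = plt.subplots()
ax.hlines(ends.index, ends['min'], ends['max'], color='lightgray', zorder=1)
ax.scatter(df['xticks'], df['yticks'], zorder=2)

# Label the numeric positions with the quarter and fruit names
quarters = df.drop_duplicates('xticks')
ax.set_xticks(quarters['xticks'])
ax.set_xticklabels(quarters['quarter'].astype(str))
ax.set_yticks(range(len(fruits)))
ax.set_yticklabels(fruits)
```

Call plt.show() afterwards to display the figure.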
You just need to enable the horizontal grid for the chart as follows:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
dta = pd.DataFrame(
columns=["Date", "Fruit", "type"],
data=[
["2017-01-01", "Orange", "FP"],
["2017-04-01", "Orange", "CP"],
["2017-07-01", "Orange", "CP"],
["2017-10-08", "Orange", "CP"],
["2017-01-01", "Apple", "NP"],
["2017-04-01", "Apple", "CP"],
["2017-07-01", "Banana", "NP"],
["2017-10-08", "Orange", "CP"],
],
)
dta["quarter"] = pd.PeriodIndex(dta.Date, freq="Q")
sns.catplot(x="quarter", y="Fruit", hue="type", kind="swarm", data=dta)
plt.grid(axis='y')
plt.show()
I would like to have a radar plot that is filled and shows information on hover. I can only get one of the two working at a time. Here is an example:
Let us assume we have unpivoted data:
import pandas as pd
import plotly.express as px
df = pd.DataFrame({'sample':['sample_1','sample_2','sample_3','sample_1','sample_2','sample_3','sample_1','sample_2','sample_3'],
'KPI':['KPI_1','KPI_1','KPI_1','KPI_2','KPI_2','KPI_2','KPI_3','KPI_3','KPI_3'],
'value':[1,2,1,1,1,2,2,1,1],
'sample_info':['info_1','info_1','info_1','info_2','info_2','info_2','info_3','info_3','info_3']})
df
sample KPI value sample_info
0 sample_1 KPI_1 1 info_1
1 sample_2 KPI_1 2 info_1
2 sample_3 KPI_1 1 info_1
3 sample_1 KPI_2 1 info_2
4 sample_2 KPI_2 1 info_2
5 sample_3 KPI_2 2 info_2
6 sample_1 KPI_3 2 info_3
7 sample_2 KPI_3 1 info_3
8 sample_3 KPI_3 1 info_3
I want to have a radar plot with the sample_info on hover, like this:
fig = px.line_polar(df, r='value', theta='KPI', color='sample',line_close = True,
hover_data = ['sample_info'])
fig.show()
That works fine. Now I would like to fill the graph:
fig = px.line_polar(df, r='value', theta='KPI', color='sample',line_close = True,
hover_data = ['sample_info'])
fig.update_traces(fill='toself')
fig.show()
Now, the hover information is somehow overwritten. I tried it with custom_data and a hovertemplate:
fig = px.line_polar(df, r='value', theta='KPI', color='sample',line_close = True,
custom_data = ['sample_info'])
fig.update_traces(fill='toself',hovertemplate="'sample_info: %{customdata[0]}'")
fig.show()
but without success. What am I missing? Thanks in advance!
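With fill='toself', a lines-only trace reports hover on the filled area by default rather than on the individual points, which hides the per-point hover data. You can switch hover back to the points: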
fig.for_each_trace(lambda t: t.update(hoveron='points'))
And get:
Complete code:
import pandas as pd
import plotly.express as px
df = pd.DataFrame({'sample':['sample_1','sample_2','sample_3','sample_1','sample_2','sample_3','sample_1','sample_2','sample_3'],
'KPI':['KPI_1','KPI_1','KPI_1','KPI_2','KPI_2','KPI_2','KPI_3','KPI_3','KPI_3'],
'value':[1,2,1,1,1,2,2,1,1],
'sample_info':['info_1','info_1','info_1','info_2','info_2','info_2','info_3','info_3','info_3']})
fig = px.line_polar(df, r='value', theta='KPI', color='sample',line_close = True,
hover_data = ['sample_info'])
fig.update_traces(fill='toself')
fig.for_each_trace(lambda t: t.update(hoveron='points'))
fig.show()
How do I plot multiple plots for each of the groups (each ID) below with Seaborn? I would like to plot two plots, one underneath the other, one line (ID) per plot.
ID Date Cum Value Daily Value
3306 2019-06-01 100.0 100.0
3306 2019-07-01 200.0 100.0
3306 2019-08-01 350.0 150.0
4408 2019-06-01 200.0 200.0
4408 2019-07-01 375.0 175.0
4408 2019-08-01 400.0 025.0
This only plots both lines together and can look messy if there are 200 unique IDs.
sns.lineplot(x="Date", y="Daily Value",
hue="ID", data=df)
You can use a FacetGrid:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame({'id': [3306, 3306, 3306, 4408, 4408, 4408],
'date': ['2019-06-01', '2019-07-01', '2019-08-01', '2019-06-01', '2019-07-01', '2019-08-01'],
'cum': [100, 200, 350, 200, 375, 400],
'daily': [100, 100, 150, 200, 175, 25]
})
g = sns.FacetGrid(df, col = 'id')
g.map(plt.plot, 'date', 'daily')
which gives
but what happens if you have 200 ids?
I am trying to show a bar chart above a pie chart in the same figure using matplotlib. The code is as follows:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv('stats.csv')
agg_df = df.groupby(['Area','Sex']).sum()
agg_df.reset_index(inplace=True)
piv_df = agg_df.pivot(index='Area', columns='Sex', values='Count')
plt.figure(1)
plt.subplot(211)
piv_df.plot.bar(stacked=True)
df = pd.read_csv('stats.csv', delimiter=',', encoding="utf-8-sig")
df=df.loc[df['"Year"']==2015]
agg_df = df.groupby(['Sex']).sum()
agg_df.reset_index(inplace=True)
plt.subplot(212)
plt.pie(agg_df["Count"],labels=agg_df["Sex"],autopct='%1.1f%%',startangle=90)
plt.show()
After execution, there are two problems:
The bar chart is not being produced
The bar chart is in figure 1 and the pie chart is in figure 2
If I execute the bar chart code and the pie chart code separately, they work fine.
Here is the sample dataframe:
Year Sex Area Count
2015 W Dhaka 6
2015 M Dhaka 3
2015 W Khulna 1
2015 M Khulna 8
2014 M Dhaka 13
2014 W Dhaka 20
2014 M Khulna 9
2014 W Khulna 6
2013 W Dhaka 11
2013 M Dhaka 2
2013 W Khulna 8
2013 M Khulna 5
2012 M Dhaka 12
2012 W Dhaka 4
2012 W Khulna 7
2012 M Khulna 1
and the barchart output is as follows:
What could be the problem here? Seeking help from matplotlib experts.
You have to pass the target axes to the pandas plotting function through its ax parameter so that it knows where to draw; otherwise pandas creates a new figure. (In the snippet below I kept the plotting code from the question but replaced the code that computes the dataframes with the resulting dataframes hardcoded. Since the question is about figures, how the dataframes are obtained does not matter, and this version is easier to reproduce.)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
piv_df = pd.DataFrame([[3, 6], [8, 1]],
columns=pd.Series(['M', 'W'], name='Sex'),
index=pd.Series(['Dhaka', 'Khulna'], name='Area'))
fig = plt.figure()
ax1 = fig.add_subplot(211)
piv_df.plot.bar(stacked=True, ax=ax1)
agg_df = pd.DataFrame({'Count': {0: 11, 1: 7},
'Sex': {0: 'M', 1: 'W'},
'Year': {0: 4030, 1: 4030}})
ax2 = fig.add_subplot(212)
ax2.pie(agg_df["Count"], labels=agg_df["Sex"], autopct='%1.1f%%',
        startangle=90)
plt.show()
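The same layout can also be sketched with plt.subplots, which creates the figure and both axes in one call. This is a variation on the answer above, not a different technique; the figsize value is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hardcoded results of the aggregation steps from the question
piv_df = pd.DataFrame([[3, 6], [8, 1]],
                      columns=pd.Series(['M', 'W'], name='Sex'),
                      index=pd.Series(['Dhaka', 'Khulna'], name='Area'))
agg_df = pd.DataFrame({'Sex': ['M', 'W'], 'Count': [11, 7]})

# Two rows, one column: bar chart on top, pie chart underneath
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(6, 8))

# Pass ax explicitly so pandas draws into the existing axes
piv_df.plot.bar(stacked=True, ax=ax1)
ax2.pie(agg_df['Count'], labels=agg_df['Sex'], autopct='%1.1f%%',
        startangle=90)

fig.tight_layout()
```

Call plt.show() afterwards to display the figure.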