Plotting stacked plot from grouped pandas data frame - python

I have a data frame that looks like the one below. First, I want the count of each status on each date; for example, the number of 'COMPLETED' on 2017-11-02 is 2. Then I want a stacked plot of these counts.
   status                start_time               end_time
0  COMPLETED             2017-11-01 19:58:54.726  2017-11-01 20:01:05.414
1  COMPLETED             2017-11-02 19:43:04.000  2017-11-02 19:47:54.877
2  ABANDONED_BY_USER     2017-11-03 23:36:19.059  2017-11-03 23:36:41.045
3  ABANDONED_BY_TIMEOUT  2017-10-31 17:02:38.689  2017-10-31 17:12:38.844
4  COMPLETED             2017-11-02 19:35:33.192  2017-11-02 19:42:51.074
Here is the csv for the dataframe:
status,start_time,end_time
COMPLETED,2017-11-01 19:58:54.726,2017-11-01 20:01:05.414
COMPLETED,2017-11-02 19:43:04.000,2017-11-02 19:47:54.877
ABANDONED_BY_USER,2017-11-03 23:36:19.059,2017-11-03 23:36:41.045
ABANDONED_BY_TIMEOUT,2017-10-31 17:02:38.689,2017-10-31 17:12:38.844
COMPLETED,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
ABANDONED_BY_TIMEOUT,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
To achieve this:
df_['status'].astype('category')
df_ = df_.set_index('start_time')
grouped = df_.groupby('status')
color = {'COMPLETED':'green','ABANDONED_BY_TIMEOUT':'blue',"MISSED":'red',"ABANDONED_BY_USER":'yellow'}
for key_, group in grouped:
    print(key_)
    df_ = group.groupby(lambda x: x.date).count()
    print(df_)
    df_['status'].plot(label=key_, kind='bar', stacked=True,
                       color=color[key_], rot=90)
plt.show()
The output of the code above is:
ABANDONED_BY_TIMEOUT
status end_time
2017-10-31 1 1
ABANDONED_BY_USER
status end_time
2017-11-03 1 1
COMPLETED
status end_time
2017-11-01 1 1
2017-11-02 2 2
The problem, as we can see, is that the plot only takes into account the last two dates, '2017-11-01' and '2017-11-02', instead of all the dates in all the categories.
How can I solve this problem? I am open to a whole new approach to the stacked plot. Thanks in advance.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df_ = pd.read_csv('sam.csv')
df_['date'] = pd.to_datetime(df_['start_time']).dt.date
# count the rows for each (date, status) pair, then pivot the statuses into
# columns so that each column becomes one segment of the stacked bars
grouped = (df_.groupby(['date', 'status']).size()
           .reset_index(name='count')
           .pivot(columns='status', index='date', values='count'))
print(grouped)
sns.set()
grouped.plot(kind='bar', stacked=True)
plt.show()
Output: (image: stacked bar chart of status counts per date, not reproduced)
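The same date-by-status table can also be built without the reset_index/pivot round trip, by unstacking the grouped counts. A minimal sketch, assuming the six-row CSV from the question (inlined with StringIO so it runs standalone instead of reading sam.csv):

```python
from io import StringIO

import pandas as pd

# The six-row CSV from the question, inlined instead of read from sam.csv
csv = """status,start_time,end_time
COMPLETED,2017-11-01 19:58:54.726,2017-11-01 20:01:05.414
COMPLETED,2017-11-02 19:43:04.000,2017-11-02 19:47:54.877
ABANDONED_BY_USER,2017-11-03 23:36:19.059,2017-11-03 23:36:41.045
ABANDONED_BY_TIMEOUT,2017-10-31 17:02:38.689,2017-10-31 17:12:38.844
COMPLETED,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
ABANDONED_BY_TIMEOUT,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
"""
df_ = pd.read_csv(StringIO(csv), parse_dates=["start_time", "end_time"])

# Rows per (calendar date, status); unstack moves status into the columns.
# fill_value=0 writes 0 for combinations that never occur, instead of NaN.
counts = (
    df_.groupby([df_["start_time"].dt.date, "status"])
    .size()
    .unstack(fill_value=0)
)
print(counts)
```

counts.plot(kind='bar', stacked=True) then draws one bar per date, with one segment per status.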

Try restructuring df_ with pandas.crosstab instead:
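# the colors are matched to the columns by position (statuses in alphabetical order)
color = ['blue', 'yellow', 'green', 'red']
# .dt.date needs a datetime column, so convert start_time in case it was read as text
df_xtab = pd.crosstab(pd.to_datetime(df_.start_time).dt.date, df_.status)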
This DataFrame will look like:
status ABANDONED_BY_TIMEOUT ABANDONED_BY_USER COMPLETED
start_time
2017-10-31 1 0 0
2017-11-01 0 0 1
2017-11-02 1 0 2
2017-11-03 0 1 0
and will be easier to plot.
df_xtab.plot(kind='bar',stacked=True, color=color, rot=90)
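The color list above only lines up because crosstab sorts the status columns alphabetically; if one status is absent from the data (say, no ABANDONED_BY_USER rows), COMPLETED would be drawn in yellow. A sketch that looks the colors up by column name instead, reusing the color dict from the question and assuming the question's CSV (parsed with parse_dates so .dt.date works):

```python
from io import StringIO

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to open a window
import matplotlib.pyplot as plt
import pandas as pd

csv = """status,start_time,end_time
COMPLETED,2017-11-01 19:58:54.726,2017-11-01 20:01:05.414
COMPLETED,2017-11-02 19:43:04.000,2017-11-02 19:47:54.877
ABANDONED_BY_USER,2017-11-03 23:36:19.059,2017-11-03 23:36:41.045
ABANDONED_BY_TIMEOUT,2017-10-31 17:02:38.689,2017-10-31 17:12:38.844
COMPLETED,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
ABANDONED_BY_TIMEOUT,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
"""
df_ = pd.read_csv(StringIO(csv), parse_dates=["start_time", "end_time"])

# Rows = dates, columns = statuses, cells = number of rows
df_xtab = pd.crosstab(df_["start_time"].dt.date, df_["status"])

# Color per status name (the dict from the question)
color = {"COMPLETED": "green", "ABANDONED_BY_TIMEOUT": "blue",
         "MISSED": "red", "ABANDONED_BY_USER": "yellow"}

# Build the color list from whichever status columns are actually present
ax = df_xtab.plot(kind="bar", stacked=True, rot=90,
                  color=[color[s] for s in df_xtab.columns])
ax.set_ylabel("count")
plt.tight_layout()
```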

Use seaborn's barplot with its hue parameter. Note that this draws grouped (side-by-side) bars for each date rather than stacked ones.
Code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df_ = pd.read_csv('sam.csv')
df_['date'] = pd.to_datetime(df_['start_time']).dt.date
df_ = df_.set_index('start_time')
print(df_)
grouped = pd.DataFrame(df_.groupby(['date', 'status']).size().reset_index(name="count"))
print(grouped)
g = sns.barplot(x='date', y='count', hue='status', data=grouped)
plt.show()
Output: (image: seaborn bar chart, not reproduced)
data:
status,start_time,end_time
COMPLETED,2017-11-01 19:58:54.726,2017-11-01 20:01:05.414
COMPLETED,2017-11-02 19:43:04.000,2017-11-02 19:47:54.877
ABANDONED_BY_USER,2017-11-03 23:36:19.059,2017-11-03 23:36:41.045
ABANDONED_BY_TIMEOUT,2017-10-31 17:02:38.689,2017-10-31 17:12:38.844
COMPLETED,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
ABANDONED_BY_TIMEOUT,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
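Both answers count with .size() rather than the question's .count(), and the difference explains the question's printed output: count() returns, for every remaining column, the number of non-null values per group (hence the repeated numbers under status and end_time), whereas size() returns a single row count per group. A small comparison on the question's data:

```python
from io import StringIO

import pandas as pd

csv = """status,start_time,end_time
COMPLETED,2017-11-01 19:58:54.726,2017-11-01 20:01:05.414
COMPLETED,2017-11-02 19:43:04.000,2017-11-02 19:47:54.877
ABANDONED_BY_USER,2017-11-03 23:36:19.059,2017-11-03 23:36:41.045
ABANDONED_BY_TIMEOUT,2017-10-31 17:02:38.689,2017-10-31 17:12:38.844
COMPLETED,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
ABANDONED_BY_TIMEOUT,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
"""
df_ = pd.read_csv(StringIO(csv), parse_dates=["start_time", "end_time"])
df_["date"] = df_["start_time"].dt.date

# count(): non-null values in each remaining column, per group -> DataFrame
per_column = df_.groupby(["date", "status"]).count()
# size(): number of rows per group -> Series, the natural bar height
per_group = df_.groupby(["date", "status"]).size()

print(per_column)
print(per_group)
```

The two only disagree when a column has missing values; size() is the one that means "number of rows".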

Related

How can I make a heatmap from a repetitive dataframe?

I've got a df with three columns, one of which has a repetitive pattern. The df looks like this:
>>> df
date hour value
0 01/01/2022 1 0.267648
1 01/01/2022 2 1.564420
2 01/01/2022 ... 0.702019
3 01/01/2022 24 1.504663
4 01/02/2022 1 0.309097
5 01/02/2022 2 0.309097
6 01/02/2022 ... 0.309097
7 01/02/2022 24 0.309097
>>>
I want to make a heatmap from this: the x-axis would be the month, the y-axis the hour of the day, and the value would be the median of all the values at that hour across every day of the month.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df.date = pd.to_datetime(df.date)
df['month'] = df.date.dt.month
pivot = df.pivot_table(columns='month', index='hour', values='value', aggfunc='median')
sns.heatmap(pivot.sort_index(ascending=False))
plt.show()
Output: (image: Seaborn heatmap)
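A minimal sketch of the aggregation step on hypothetical toy data (two January days and one February day, hours 1 and 2 only), assuming the dates are month-first; giving pd.to_datetime an explicit format avoids silently misreading strings such as 01/02/2022:

```python
import pandas as pd

# Hypothetical toy data in the question's long layout: two January days and one
# February day, hours 1-2 only (the real data runs through hour 24).
df = pd.DataFrame({
    "date": ["01/01/2022", "01/01/2022", "01/02/2022", "01/02/2022",
             "02/01/2022", "02/01/2022"],
    "hour": [1, 2, 1, 2, 1, 2],
    "value": [1.0, 4.0, 3.0, 2.0, 5.0, 6.0],
})

# Assumes month-first dates; use format="%d/%m/%Y" for day-first data
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")
df["month"] = df["date"].dt.month

# hour x month grid; each cell is the median over all days in that month
pivot = df.pivot_table(index="hour", columns="month", values="value",
                       aggfunc="median")

# Descending hours put hour 1 on the bottom row of the heatmap
heat_input = pivot.sort_index(ascending=False)
print(heat_input)
```

Passing heat_input to sns.heatmap then draws the plot as in the answer.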

Hover data does not show up in plotly graph

I would like a filled radar plot that also shows extra information on hover, but I can only get one of the two working. Here is an example:
Let us assume we have unpivoted data:
import pandas as pd
import plotly.express as px
df = pd.DataFrame({'sample':['sample_1','sample_2','sample_3','sample_1','sample_2','sample_3','sample_1','sample_2','sample_3'],
'KPI':['KPI_1','KPI_1','KPI_1','KPI_2','KPI_2','KPI_2','KPI_3','KPI_3','KPI_3'],
'value':[1,2,1,1,1,2,2,1,1],
'sample_info':['info_1','info_1','info_1','info_2','info_2','info_2','info_3','info_3','info_3']})
df
sample KPI value sample_info
0 sample_1 KPI_1 1 info_1
1 sample_2 KPI_1 2 info_1
2 sample_3 KPI_1 1 info_1
3 sample_1 KPI_2 1 info_2
4 sample_2 KPI_2 1 info_2
5 sample_3 KPI_2 2 info_2
6 sample_1 KPI_3 2 info_3
7 sample_2 KPI_3 1 info_3
8 sample_3 KPI_3 1 info_3
I want to have a radar plot with the sample_info on hover, like this:
fig = px.line_polar(df, r='value', theta='KPI', color='sample',line_close = True,
hover_data = ['sample_info'])
fig.show()
Output: (image: radar plot with sample_info in the hover label, not reproduced)
That works fine. Now I would like to fill the graph:
fig = px.line_polar(df, r='value', theta='KPI', color='sample',line_close = True,
hover_data = ['sample_info'])
fig.update_traces(fill='toself')
fig.show()
Now, the hover information is somehow overwritten. I tried it with custom_data and a hovertemplate:
fig = px.line_polar(df, r='value', theta='KPI', color='sample',line_close = True,
custom_data = ['sample_info'])
fig.update_traces(fill='toself',hovertemplate="'sample_info: %{customdata[0]}'")
fig.show()
but without success. What am I missing? Thanks in advance!
You can use:
fig.for_each_trace(lambda t: t.update(hoveron='points'))
And get: (image not reproduced)
Complete code:
import pandas as pd
import plotly.express as px
df = pd.DataFrame({'sample':['sample_1','sample_2','sample_3','sample_1','sample_2','sample_3','sample_1','sample_2','sample_3'],
'KPI':['KPI_1','KPI_1','KPI_1','KPI_2','KPI_2','KPI_2','KPI_3','KPI_3','KPI_3'],
'value':[1,2,1,1,1,2,2,1,1],
'sample_info':['info_1','info_1','info_1','info_2','info_2','info_2','info_3','info_3','info_3']})
fig = px.line_polar(df, r='value', theta='KPI', color='sample',line_close = True,
hover_data = ['sample_info'])
fig.update_traces(fill='toself')
fig.for_each_trace(lambda t: t.update(hoveron='points'))
fig.show()

How to aggregate a metric and plot groups separately

I have this dataset:
import pandas as pd
df = pd.DataFrame()
df['year'] = [2011,2011,2011,2011,2011,2011,2011,2011,2011,2011,2011,2011]
df['month'] = [1,2,3,4,5,6,1,2,3,4,5,6]
df['after'] = [0,0,0,1,1,1,0,0,0,1,1,1]
df['campaign'] = [0,0,0,0,0,0,1,1,1,1,1,1]
df['sales'] = [10000,11000,12000,10500,10000,9500,7000,8000,5000,6000,6000,7000]
df['date_m'] = pd.to_datetime(df.year.astype(str) + '-' + df.month.astype(str))
And I want to make a line plot grouped by month and campaign, so I have tried this code:
df['sales'].groupby(df['date_m','campaign']).mean().plot.line()
But I get the error KeyError: ('date_m', 'campaign'). Any help would be greatly appreciated.
Plotting typically depends on the shape of the DataFrame.
.groupby creates a long format DataFrame, which is great for seaborn
.pivot_table creates a wide format DataFrame, which easily works with pandas.DataFrame.plot
.groupby the DataFrame
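df['sales'].groupby(...) fails here because df['date_m','campaign'] looks up a single column whose name is the tuple ('date_m', 'campaign'), hence the KeyError. Moreover, df['sales'] selects one column of the dataframe, so none of the other columns can be referred to by name; group the whole DataFrame instead.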
.groupby converts the DataFrame into a long format, which is great for plotting with seaborn.lineplot.
Specify the hue parameter to separate by 'campaign'.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# perform groupby and reset the index
dfg = df.groupby(['date_m','campaign'])['sales'].mean().reset_index()
# display(dfg.head())
date_m campaign sales
0 2011-01-01 0 10000
1 2011-01-01 1 7000
2 2011-02-01 0 11000
3 2011-02-01 1 8000
4 2011-03-01 0 12000
# plot with seaborn
sns.lineplot(data=dfg, x='date_m', y='sales', hue='campaign')
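For completeness, the question's Series-based form also works once the keys are passed as a list of Series rather than as a tuple of column names; unstack then gives the same wide shape that pivot_table produces. A sketch on the question's data (date_m is built here from the year/month columns, which should give the same first-of-month dates as the question's string concatenation):

```python
import pandas as pd

# The question's dataset
df = pd.DataFrame({
    "year": [2011] * 12,
    "month": [1, 2, 3, 4, 5, 6] * 2,
    "after": [0, 0, 0, 1, 1, 1] * 2,
    "campaign": [0] * 6 + [1] * 6,
    "sales": [10000, 11000, 12000, 10500, 10000, 9500,
              7000, 8000, 5000, 6000, 6000, 7000],
})
# First day of each month: to_datetime assembles dates from year/month/day columns
df["date_m"] = pd.to_datetime(df[["year", "month"]].assign(day=1))

# A list of Series as grouping keys; each is aligned with df["sales"] by index
monthly = df["sales"].groupby([df["date_m"], df["campaign"]]).mean()

# campaign -> columns, so monthly_wide.plot() draws one line per campaign
monthly_wide = monthly.unstack("campaign")
print(monthly_wide)
```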
.pivot_table the DataFrame
.pivot_table shapes the DataFrame correctly for plotting with pandas.DataFrame.plot, and it has an aggregation parameter.
The DataFrame is shaped into a wide format.
# pivot the dataframe into the correct shape for plotting
dfp = df.pivot_table(index='date_m', columns='campaign', values='sales', aggfunc='mean')
# display(dfp.head())
campaign 0 1
date_m
2011-01-01 10000 7000
2011-02-01 11000 8000
2011-03-01 12000 5000
2011-04-01 10500 6000
2011-05-01 10000 6000
# plot the dataframe
dfp.plot()
Plotting with matplotlib directly
fig, ax = plt.subplots(figsize=(8, 6))
for v in df.campaign.unique():
    # select the data based on the campaign
    data = df[df.campaign.eq(v)]
    # this is only necessary if there is more than one value per date
    data = data.groupby(['date_m','campaign'])['sales'].mean().reset_index()
    ax.plot('date_m', 'sales', data=data, label=f'{v}')
plt.legend(title='campaign')
plt.show()
Notes
Package versions:
pandas v1.2.4
seaborn v0.11.1
matplotlib v3.3.4

Plot Price as Horizontal Line for Non Zero Volume Values

My Code:
import matplotlib.pyplot as plt
plt.style.use('seaborn-ticks')
import pandas as pd
import numpy as np
path = 'C:\\File\\Data.txt'
df = pd.read_csv(path, sep=",")
df.columns = ['Date','Time','Price','volume']
df = df[df.Date == '08/02/2019'].reset_index(drop=True)
df['Volume'] = np.where((df.volume/1000) < 60, 0, (df.volume/1000))
df.plot('Time','Price')
dff = df[df.Volume > 60].reset_index(drop=True)
dff = dff[['Date','Time','Price','Volume']]
print(dff)
plt.subplots_adjust(left=0.05, bottom=0.05, right=0.95, top=0.95, wspace=None, hspace=None)
plt.show()
My plot output is below (image not reproduced).
The output of the dff DataFrame is:
Date Time Price Volume
0 08/02/2019 13:39:43 685.35 97.0
1 08/02/2019 13:39:57 688.80 68.0
2 08/02/2019 13:43:50 683.00 68.0
3 08/02/2019 13:43:51 681.65 92.0
4 08/02/2019 13:49:42 689.95 70.0
5 08/02/2019 13:52:00 695.20 64.0
6 08/02/2019 14:56:42 686.25 68.0
7 08/02/2019 15:03:15 685.35 63.0
8 08/02/2019 15:03:31 683.15 69.0
9 08/02/2019 15:08:08 684.00 61.0
I want to plot the prices in this table as vertical lines, as in my reference image (not reproduced). Any help?
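Based on your image, I think you mean horizontal lines. Either way it's pretty simple: pyplot has built-in hlines/vlines functions. In your case, try something like
plt.hlines(dff['Price'], xmin, xmax)
where xmin and xmax set the horizontal extent of every line in the x axis's data units (here, the plotted Time values).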
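A minimal sketch of the hlines idea, using the first five rows of the dff table above and, as an assumption for illustration, plotting those prices by row position as a stand-in for the full intraday series. Passing ax.get_yaxis_transform() makes xmin=0 and xmax=1 mean the left and right edges of the axes, so each line spans the full width like axhline, but all levels are drawn in one call regardless of the x units:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import matplotlib.pyplot as plt
import pandas as pd

# First five high-volume rows of dff from the question
dff = pd.DataFrame({
    "Time": ["13:39:43", "13:39:57", "13:43:50", "13:43:51", "13:49:42"],
    "Price": [685.35, 688.80, 683.00, 681.65, 689.95],
    "Volume": [97.0, 68.0, 68.0, 92.0, 70.0],
})

fig, ax = plt.subplots()
# Stand-in for df.plot('Time', 'Price'): prices by row position
ax.plot(range(len(dff)), dff["Price"], color="black")

# x in axes coordinates (0 = left edge, 1 = right edge), y in data coordinates
lines = ax.hlines(dff["Price"], 0, 1, transform=ax.get_yaxis_transform(),
                  colors="blue", linewidth=2)
```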
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
path = 'File.txt'
df = pd.read_csv(path, sep=",")
df.columns = ['Date','Time','Price','volume']
df = df[df.Date == '05/02/2019'].reset_index(drop=True)
df['Volume'] = np.where((df.volume/7500) < 39, 0, (df.volume/7500))
df["Time"] = pd.to_datetime(df['Time'])
df.plot(x="Time",y='Price', rot=0)
plt.title("Date: " + str(df['Date'].iloc[0]))
dff = df[df.Volume > 39].reset_index(drop=True)
dff = dff[['Date','Time','Price','Volume']]
print(dff)
# draw one full-width horizontal line at each high-volume price
for price in dff['Price']:
    plt.axhline(y=price, linewidth=2, color='blue')
plt.subplots_adjust(left=0.05, bottom=0.06, right=0.95, top=0.96, wspace=None, hspace=None)
plt.show()

Pandas Seaborn Heatmap Error

I have a DataFrame that looks like this when unstacked.
Start Date 2016-07-11 2016-07-12 2016-07-13
Period
0 1.000000 1.000000 1.0
1 0.684211 0.738095 NaN
2 0.592105 NaN NaN
I'm trying to plot it in Seaborn as a heatmap but it's giving me unintended results.
Here's my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame(np.array(data), columns=['Start Date', 'Period', 'Users'])
df = df.fillna(0)
df = df.set_index(['Start Date', 'Period'])
sizes = df['Users'].groupby(level=0).first()
df = df['Users'].unstack(0).divide(sizes, axis=1)
plt.title("Test")
sns.heatmap(df.T, mask=df.T.isnull(), annot=True, fmt='.0%')
plt.tight_layout()
plt.savefig(table._v_name + "fig.png")
I want the text not to overlap, and I don't want 6 heat legends on the side. Also, if possible, how do I format the dates so that they display only as %Y-%m-%d?
Since exact reproducible data isn't available, the example below uses the posted snippet data. It runs pivot_table() to get the posted structure, with the start dates across the columns. Your heatmap possibly outputs the multiple color bars and overlapping figures because of the unstack() step, where you seem to be dividing by users (look into seaborn.FacetGrid if you need to split the plot), so the example below passes the DataFrame to heatmap as is. Also, an apply() reformats the dates as %Y-%m-%d strings.
from io import StringIO
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
data = '''Period,StartDate,Value
0,2016-07-11,1.000000
0,2016-07-12,1.000000
0,2016-07-13,1.0
1,2016-07-11,0.684211
1,2016-07-12,0.738095
1,2016-07-13
2,2016-07-11,0.592105
2,2016-07-12
2,2016-07-13'''
df = pd.read_csv(StringIO(data))
df['StartDate'] = pd.to_datetime(df['StartDate'])
df['StartDate'] = df['StartDate'].apply(lambda x: x.strftime('%Y-%m-%d'))
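# one value per cell, so 'mean' returns that value and keeps empty cells as NaN
# (a sum would turn them into 0 in current pandas, and the mask would not hide them)
pvtdf = df.pivot_table(values='Value', index=['Period'],
                       columns='StartDate', aggfunc='mean')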
print(pvtdf)
# StartDate 2016-07-11 2016-07-12 2016-07-13
# Period
# 0 1.000000 1.000000 1.0
# 1 0.684211 0.738095 NaN
# 2 0.592105 NaN NaN
sns.set()
plt.title("Test")
ax = sns.heatmap(pvtdf.T, mask=pvtdf.T.isnull(), annot=True, fmt='.0%')
plt.tight_layout()
plt.show()
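Two details of this answer can be checked on their own: .dt.strftime formats the whole column at once (equivalent to the apply/lambda), and aggregating with 'mean' keeps cells that have no value as NaN, which is what mask=pvtdf.T.isnull() relies on (since pandas 0.22, the sum of an all-NaN group is 0 rather than NaN). A sketch on the posted snippet data:

```python
from io import StringIO

import pandas as pd

data = """Period,StartDate,Value
0,2016-07-11,1.000000
0,2016-07-12,1.000000
0,2016-07-13,1.0
1,2016-07-11,0.684211
1,2016-07-12,0.738095
1,2016-07-13
2,2016-07-11,0.592105
2,2016-07-12
2,2016-07-13"""
df = pd.read_csv(StringIO(data))  # short rows get NaN in Value

# Vectorized formatting: plain 'YYYY-MM-DD' strings, so tick labels carry no time part
df["StartDate"] = pd.to_datetime(df["StartDate"]).dt.strftime("%Y-%m-%d")

# The mean of a cell whose only value is NaN stays NaN, so isnull() finds the cells to hide
pvtdf = df.pivot_table(values="Value", index="Period", columns="StartDate",
                       aggfunc="mean")
mask = pvtdf.T.isnull()
print(pvtdf)
```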
