Hover data does not show up in plotly graph - python

I would like a radar plot that is both filled and shows extra information on hover, but I can only get one of the two working at a time. Here is an example.
Let us assume we have unpivoted data:
import pandas as pd
import plotly.express as px
df = pd.DataFrame({'sample':['sample_1','sample_2','sample_3','sample_1','sample_2','sample_3','sample_1','sample_2','sample_3'],
'KPI':['KPI_1','KPI_1','KPI_1','KPI_2','KPI_2','KPI_2','KPI_3','KPI_3','KPI_3'],
'value':[1,2,1,1,1,2,2,1,1],
'sample_info':['info_1','info_1','info_1','info_2','info_2','info_2','info_3','info_3','info_3']})
df
sample KPI value sample_info
0 sample_1 KPI_1 1 info_1
1 sample_2 KPI_1 2 info_1
2 sample_3 KPI_1 1 info_1
3 sample_1 KPI_2 1 info_2
4 sample_2 KPI_2 1 info_2
5 sample_3 KPI_2 2 info_2
6 sample_1 KPI_3 2 info_3
7 sample_2 KPI_3 1 info_3
8 sample_3 KPI_3 1 info_3
I want to have a radar plot with the sample_info on hover, like this:
fig = px.line_polar(df, r='value', theta='KPI', color='sample',line_close = True,
hover_data = ['sample_info'])
fig.show()
output
That works fine. Now I would like to fill the graph:
fig = px.line_polar(df, r='value', theta='KPI', color='sample',line_close = True,
hover_data = ['sample_info'])
fig.update_traces(fill='toself')
fig.show()
Now, the hover information is somehow overwritten. I tried it with custom_data and a hovertemplate:
fig = px.line_polar(df, r='value', theta='KPI', color='sample',line_close = True,
custom_data = ['sample_info'])
fig.update_traces(fill='toself',hovertemplate="'sample_info: %{customdata[0]}'")
fig.show()
but without success. What am I missing? Thanks in advance!

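You can set hoveron to 'points' on every trace, so that hovering targets the data points again instead of the filled area:
fig.for_each_trace(lambda t: t.update(hoveron='points'))
With this, the sample_info hover label shows up on the filled plot.
Complete code: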
import pandas as pd
import plotly.express as px
df = pd.DataFrame({'sample':['sample_1','sample_2','sample_3','sample_1','sample_2','sample_3','sample_1','sample_2','sample_3'],
'KPI':['KPI_1','KPI_1','KPI_1','KPI_2','KPI_2','KPI_2','KPI_3','KPI_3','KPI_3'],
'value':[1,2,1,1,1,2,2,1,1],
'sample_info':['info_1','info_1','info_1','info_2','info_2','info_2','info_3','info_3','info_3']})
fig = px.line_polar(df, r='value', theta='KPI', color='sample',line_close = True,
hover_data = ['sample_info'])
fig.update_traces(fill='toself')
fig.for_each_trace(lambda t: t.update(hoveron='points'))
fig.show()

Related

Python Plotly Stacked Bar Chart with multiple values (indicators)

I have the following sample data:
Date Job Active Completed
2022-06-01 Job1 3 2
2022-06-01 Job2 5 1
2022-06-02 Job1 4 3
2022-06-02 Job2 6 4
2022-06-03 Job1 5 5
2022-06-03 Job2 3 1
I want to get the following result:
I am trying with the following code:
fig = go.Figure()
colors = ['#0b215c','#4d256c','#812571','#af286d','#d6385f','#f1564a','#ff7c30','#ffa600']
group = df['JOB'].unique()
for t, c in zip(group, colors):
    dfp = df[df['JOB'] == t]
    fig.add_traces(go.Bar(x=dfp['DATE'], y=dfp['ACTIVE'], name=t, marker_color=c))
for t, c in zip(group, colors):
    dfp = df[df['JOB'] == t]
    fig.add_traces(go.Bar(x=dfp['DATE'], y=dfp['COMPLETED'], name=t, marker_color=c))
fig.update_layout(barmode='stack')
But as a result I get only 3 bars, in which Active and Completed are summed, rather than separate bars for the Active and Completed values.
This chart stacks two separate measures (Active and Completed) for each date, so each date needs two bar positions on the x-axis. Date values cannot be offset directly, so instead use a numeric x-axis: place the Active bars at 1, 2, 3 and the Completed bars at 1.5, 2.5, 3.5. The final touch is to relabel the ticks of the numeric axis with the date strings.
import plotly.graph_objects as go
import numpy as np

fig = go.Figure()
colors = ['#0b215c','#4d256c','#812571','#af286d','#d6385f','#f1564a','#ff7c30','#ffa600']
group = df['Job'].unique()
# Active values: one stack per date at x = 1, 2, 3
for t, c in zip(group, colors):
    dfp = df[df['Job'] == t]
    fig.add_traces(go.Bar(x=[1,2,3], y=dfp['Active'], name=t, marker_color=c, showlegend=True))
# Completed values: a second stack per date, offset by 0.5
for t, c in zip(group, colors):
    dfp = df[df['Job'] == t]
    fig.add_traces(go.Bar(x=[1.5,2.5,3.5], y=dfp['Completed'], name=t, marker_color=c, showlegend=False))
fig.update_layout(barmode='stack', bargap=0.1)
# Label both bar positions of each date with the date string
fig.update_xaxes(
    tickvals=np.arange(1,4.0,0.5),
    ticktext=['2022-06-01','2022-06-01','2022-06-02','2022-06-02','2022-06-03','2022-06-03'])
fig.show()

visualization with pandas in python

I have the following problem.
I want to plot the following table:
I want to compare the new_cases for Germany and France per week. How can I visualise this?
I already tried multiple plots but I'm not happy with the results:
for example:
pivot_df['France'].plot(kind='bar')
plt.figure(figsize=(15,5))
pivot_df['France'].plot(kind='bar')
plt.figure(figsize=(15,5))
But it only shows me France.
I think you're trying to get a timeseries plot. For that you'll need to convert year_week to a datetime object. Subsequently you can groupby the country, and unstack and plot the timeseries:
import datetime
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('https://opendata.ecdc.europa.eu/covid19/testing/csv/data.csv')
df = df[df['country'].isin(['France', 'Germany'])]
df = df[df['level'] == 'national'].reset_index()
df['datetime'] = df['year_week'].apply(lambda x: datetime.datetime.strptime(x + '-1', '%G-W%V-%u')) #https://stackoverflow.com/a/54033252/11380795
df.set_index('datetime', inplace=True)
grouped_df = df.groupby('country')['new_cases'].resample('1W').sum()  # select the numeric column before summing
ax = grouped_df.unstack().T.plot(figsize=(10,5))
ax.ticklabel_format(style='plain', axis='y')
result:
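The year_week conversion in the code above can be checked in isolation. This small sketch uses only the standard library; the helper name year_week_to_date is hypothetical, and it assumes week strings of the form '2020-W10'. Appending '-1' selects the Monday of that ISO week.

```python
import datetime


def year_week_to_date(year_week: str) -> datetime.datetime:
    """Return the Monday of an ISO week given as e.g. '2020-W10'."""
    # %G = ISO year, %V = ISO week number, %u = ISO weekday (1 = Monday)
    return datetime.datetime.strptime(year_week + "-1", "%G-W%V-%u")


monday = year_week_to_date("2020-W10")
print(monday.date())  # 2020-03-02
```

The same function can be passed to df['year_week'].apply(...) in place of the lambda.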
Here you go:
sample df:
Country week new_cases
0 FRANCE 9 210
1 GERMANY 9 300
2 FRANCE 10 410
3 GERMANY 10 200
4 FRANCE 11 910
5 GERMANY 9 500
Code:
df.groupby(['week','Country'])['new_cases'].sum().unstack().plot.bar()
plt.ylabel('New cases')
Output:
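To see what the groupby / unstack step hands to plot.bar(), here is a minimal sketch that rebuilds the sample df above and prints the intermediate table. The variable name pivot is an assumption for illustration.

```python
import pandas as pd

# Sample data copied from the answer above
df = pd.DataFrame({
    "Country": ["FRANCE", "GERMANY", "FRANCE", "GERMANY", "FRANCE", "GERMANY"],
    "week": [9, 9, 10, 10, 11, 9],
    "new_cases": [210, 300, 410, 200, 910, 500],
})

# Rows become weeks, columns become countries; duplicate (week, Country)
# pairs are summed, and missing combinations become NaN
pivot = df.groupby(["week", "Country"])["new_cases"].sum().unstack()
print(pivot)
```

Each column of this table becomes one bar series, so pivot.plot.bar() draws France and Germany side by side for every week.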

Using a scatter plot to plot multiple columns from a data set

import plotly.offline as pyo
import plotly.express as px
import matplotlib.pyplot as pls
pyo.init_notebook_mode()
data = pd.read_csv(r'C:.......Coronovirus Datasets\time_series_covid19_deaths_global.csv')
countries = ['US']
filtered_data = data[data['Country/Region'].isin(countries)]
wanted_values = filtered_data[['Country/Region','1/22/2020','1/23/2020','1/24/2020', '1/25/2020','1/26/2020','1/27/2020','1/28/2020','1/28/2020','1/29/2020',
'1/30/2020','1/31/2020','2/1/2020','2/2/2020','2/3/2020','2/4/2020','2/5/2020','2/6/2020','2/7/2020','2/8/2020','2/9/2020','2/10/2020',
'2/11/2020','2/12/2020','2/13/2020','2/14/2020','2/15/2020','2/16/2020','2/17/2020','2/18/2020','2/19/2020','2/20/2020','2/21/2020','2/22/2020','2/23/2020',
'2/24/2020','2/25/2020','2/26/2020','2/27/2020','2/28/2020','2/29/2020','3/1/2020','3/2/2020','3/3/2020','3/4/2020','3/5/2020','3/6/2020','3/7/2020',
'3/8/2020','3/9/2020','3/10/2020','3/11/2020','3/12/2020','3/13/2020','3/14/2020','3/15/2020','3/16/2020','3/17/2020','3/18/2020','3/19/2020',
'3/20/2020','3/21/2020','4/1/2020','4/2/2020','4/3/2020','4/4/2020','4/5/2020','4/6/2020','4/7/2020','4/8/2020','4/9/2020','4/10/2020',
'4/11/2020','4/12/2020','4/13/2020','4/14/2020','4/15/2020','4/16/2020','4/17/2020','4/18/2020','4/19/2020','4/20/2020','4/21/2020','4/22/2020','4/23/2020',
'4/24/2020','4/25/2020','4/26/2020','4/27/2020','4/28/2020','4/29/2020','5/1/2020','5/2/2020','5/3/2020','5/4/2020','5/5/2020','5/6/2020','5/7/2020','5/8/2020','5/9/2020']]
fig = px.scatter(wanted_values, x ='Country/Region', y = 'dates' , title = 'Number of Deaths Per Day')
fig.show()
#wanted_values.plot(x="5/9/2020, 5/8/2020", y = 'filtered_data' kind = 'bar')
#pls.show()
How can I plot all the dates with their corresponding deaths as a scatter plot? I plan to use linear regression to predict the amount of deaths since January first. I have been having a lot of trouble with plotting these values as I am really new to Python.
The data set can be found here: https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases
This is what your data looks like:
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("time_series_covid19_deaths_global.csv")
data.iloc[:2,:7]
Province/State Country/Region Lat Long 1/22/20 1/23/20 1/24/20
0 NaN Afghanistan 33.0000 65.0000 0 0 0
1 NaN Albania 41.1533 20.1683 0 0 0
First, keep only the US rows, select the date columns by giving the first and last column names, and melt them into long format:
data = data[data['Country/Region']=='US']
data = data.loc[:,'1/22/20':'5/9/20'].melt(var_name="date")
data['date'] = pd.to_datetime(data['date'])
Looks like this now:
date value
0 2020-01-22 0
1 2020-01-23 0
2 2020-01-24 0
Plotting is simply:
data.plot.scatter(x="date",y="value",rot=45)
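The wide-to-long step can be tried without downloading the file. The sketch below uses a tiny made-up frame with the same column layout; the values are invented for illustration and are not taken from the real data set. The explicit format string is an assumption about how the column names are written.

```python
import pandas as pd

# Made-up stand-in for the real CSV: one row per country, one column per day
wide = pd.DataFrame({
    "Country/Region": ["US"],
    "1/22/20": [0],
    "1/23/20": [0],
    "1/24/20": [1],
})

# Slice the date columns by label, then melt them into (date, value) rows
long_df = wide.loc[:, "1/22/20":"1/24/20"].melt(var_name="date")
long_df["date"] = pd.to_datetime(long_df["date"], format="%m/%d/%y")
print(long_df)
```

The resulting frame has one row per date, which is the shape that plot.scatter(x="date", y="value") expects.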

Error manipulating datetime x-axis in matplotlib python

Data df is in this format:
Id Timestamp Data Group
0 1 2013-08-12 10:29:19.673 40.0 1
1 2 2013-08-13 10:29:20.687 50.0 2
2 3 2013-09-14 10:29:20.687 40.0 3
3 4 2013-10-14 10:29:20.687 30.0 4
4 5 2013-11-15 10:29:20.687 50.0 5
...
I plotted the graph to observe how Data varies over time with code:
%matplotlib notebook
%matplotlib inline
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df= df[(df['Timestamp'] > '2013-12-05 ') & (df['Timestamp'] <= '2013-12-30 ')]
df1 = df[df['Group'] ==1]
df1.plot(x = 'Timestamp', y = 'Data',figsize=(20, 10))
The graph looks fine:
But when I tried to narrow the time interval from 2013-12-05 ~ 2013-12-30 down to 2013-12-05 ~ 2013-12-11, with this code:
%matplotlib notebook
%matplotlib inline
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df= df[(df['Timestamp'] > '2013-12-05 ') & (df['Timestamp'] <= '2013-12-11')]
df1 = df[df['Group'] ==1]
df1.plot(x = 'Timestamp', y = 'Data',figsize=(20, 10))
the graph looks off. Since the new interval covers the first part of the old one, we expected the new graph to show the first part of the old graph. Instead, it looks like this:
The x-axis tick marks also no longer make sense. What could have gone wrong? Any help is appreciated. Thanks.

Plotting stacked plot from grouped pandas data frame

I have a data frame that looks like the one below. First, I want the count of each status on each date. For example, the number of 'COMPLETED' on 2017-11-02 is 2. I then want a stacked plot of these counts.
status start_time end_time \
0 COMPLETED 2017-11-01 19:58:54.726 2017-11-01 20:01:05.414
1 COMPLETED 2017-11-02 19:43:04.000 2017-11-02 19:47:54.877
2 ABANDONED_BY_USER 2017-11-03 23:36:19.059 2017-11-03 23:36:41.045
3 ABANDONED_BY_TIMEOUT 2017-10-31 17:02:38.689 2017-10-31 17:12:38.844
4 COMPLETED 2017-11-02 19:35:33.192 2017-11-02 19:42:51.074
Here is the csv for the dataframe:
status,start_time,end_time
COMPLETED,2017-11-01 19:58:54.726,2017-11-01 20:01:05.414
COMPLETED,2017-11-02 19:43:04.000,2017-11-02 19:47:54.877
ABANDONED_BY_USER,2017-11-03 23:36:19.059,2017-11-03 23:36:41.045
ABANDONED_BY_TIMEOUT,2017-10-31 17:02:38.689,2017-10-31 17:12:38.844
COMPLETED,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
ABANDONED_BY_TIMEOUT,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
To achieve this:
df_['status'].astype('category')
df_ = df_.set_index('start_time')
grouped = df_.groupby('status')
color = {'COMPLETED':'green','ABANDONED_BY_TIMEOUT':'blue',"MISSED":'red',"ABANDONED_BY_USER":'yellow'}
for key_, group in grouped:
    print(key_)
    df_ = group.groupby(lambda x: x.date).count()
    print(df_)
    df_['status'].plot(label=key_,kind='bar',stacked=True,\
                       color=color[key_],rot=90)
plt.show()
The output of the following is :
ABANDONED_BY_TIMEOUT
status end_time
2017-10-31 1 1
ABANDONED_BY_USER
status end_time
2017-11-03 1 1
COMPLETED
status end_time
2017-11-01 1 1
2017-11-02 2 2
The problem, as we can see, is that the plot only takes into account the last two dates, '2017-11-01' and '2017-11-02', instead of all the dates in all the categories.
How can I solve this problem? I am open to a whole new approach for the stacked plot. Thanks in advance.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df_ = pd.read_csv('sam.csv')
df_['date'] = pd.to_datetime(df_['start_time']).dt.date
df_ = df_.set_index('start_time')
grouped = pd.DataFrame(df_.groupby(['date', 'status']).size().reset_index(name="count")).pivot(columns='status', index='date', values='count')
print(grouped)
sns.set()
grouped.plot(kind='bar', stacked=True)
# g = grouped.plot(x='date', kind='bar', stacked=True)
plt.show()
output:
Try restructuring df_ with pandas.crosstab instead:
color = ['blue', 'yellow', 'green', 'red']
df_xtab = pd.crosstab(df_.start_time.dt.date, df_.status)
This DataFrame will look like:
status ABANDONED_BY_TIMEOUT ABANDONED_BY_USER COMPLETED
start_time
2017-10-31 1 0 0
2017-11-01 0 0 1
2017-11-02 1 0 2
2017-11-03 0 1 0
and will be easier to plot.
df_xtab.plot(kind='bar',stacked=True, color=color, rot=90)
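The crosstab step can be reproduced from the CSV in the question. In this sketch the CSV text is embedded as a string so that it runs without a file; reading it through io.StringIO with parse_dates is an assumption about how the frame is loaded, needed so that .dt.date works on start_time.

```python
import io

import pandas as pd

# CSV content copied from the question
csv_text = """status,start_time,end_time
COMPLETED,2017-11-01 19:58:54.726,2017-11-01 20:01:05.414
COMPLETED,2017-11-02 19:43:04.000,2017-11-02 19:47:54.877
ABANDONED_BY_USER,2017-11-03 23:36:19.059,2017-11-03 23:36:41.045
ABANDONED_BY_TIMEOUT,2017-10-31 17:02:38.689,2017-10-31 17:12:38.844
COMPLETED,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
ABANDONED_BY_TIMEOUT,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
"""

df_ = pd.read_csv(io.StringIO(csv_text), parse_dates=["start_time", "end_time"])

# Count occurrences of every status per calendar date; absent pairs become 0
df_xtab = pd.crosstab(df_["start_time"].dt.date, df_["status"])
print(df_xtab)
```

Because every date appears as a row and every status as a column, df_xtab.plot(kind='bar', stacked=True) has a value, possibly zero, for each bar segment.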
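Use seaborn's barplot with the hue parameter to get one bar per status for each date. Note that this draws grouped bars rather than stacked ones.
Code: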
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df_ = pd.read_csv('sam.csv')
df_['date'] = pd.to_datetime(df_['start_time']).dt.date
df_ = df_.set_index('start_time')
print(df_)
grouped = pd.DataFrame(df_.groupby(['date', 'status']).size().reset_index(name="count"))
print(grouped)
g = sns.barplot(x='date', y='count', hue='status', data=grouped)
plt.show()
output:
data:
status,start_time,end_time
COMPLETED,2017-11-01 19:58:54.726,2017-11-01 20:01:05.414
COMPLETED,2017-11-02 19:43:04.000,2017-11-02 19:47:54.877
ABANDONED_BY_USER,2017-11-03 23:36:19.059,2017-11-03 23:36:41.045
ABANDONED_BY_TIMEOUT,2017-10-31 17:02:38.689,2017-10-31 17:12:38.844
COMPLETED,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
ABANDONED_BY_TIMEOUT,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
