How do I display a grouped graph using a CSV file - python

import pandas as pd
import plotly
import plotly.express as px
import plotly.io as pio
df = pd.read_csv("final_spreadsheet.csv")
barchart = px.bar(
data_frame = df,
x = "Post-Lockdown Period (May - September)",
y = "Post-Lockdown Period (May - September)",
color = "Peak-Lockdown Period (March-May)",
opacity = 0.9,
orientation ="v",
barmode = 'relative',
)
pio.show(barchart)
I want the x-axis to show the different behavioral variables, with two bars for each variable: one for the peak-lockdown (pandemic) period and one for the post-lockdown period. I also want the y-axis to just be the values of each
This is my current attempt, but no graph appears. I've also attached a picture of the CSV file opened in Excel.

In plotly.express you can create a grouped bar chart by passing a list of the columns you want to group together to the y argument. In your case, pass y = ['Peak-Lockdown Period (March-May)','Post-Lockdown Period (May-September)'] together with barmode = 'group' to px.bar. I created a sample DataFrame to illustrate:
import pandas as pd
import plotly.express as px
import plotly.io as pio
# df = pd.read_csv("final_spreadsheet.csv")
## create example DataFrame similar to yours
df = pd.DataFrame({
'Behavioral': list('ABCD'),
'Peak-Lockdown Period (March-May)': [76.7,26.12,0,2.94],
'Post-Lockdown Period (May-September)': [77.32,26.38,0,3.36]
})
barchart = px.bar(
data_frame = df,
x = 'Behavioral',
y = ['Peak-Lockdown Period (March-May)','Post-Lockdown Period (May-September)'],
# color = "Peak-Lockdown Period (March-May)",
opacity = 0.9,
orientation ="v",
barmode = 'group',
)
pio.show(barchart)
EDIT: you can accomplish the same thing using plotly.graph_objects:
import plotly.graph_objects as go
fig = go.Figure(data=[
go.Bar(name='Peak-Lockdown Period (March-May)', x=df['Behavioral'].values, y=df['Peak-Lockdown Period (March-May)'].values),
go.Bar(name='Post-Lockdown Period (May-September)', x=df['Behavioral'].values, y=df['Post-Lockdown Period (May-September)'].values),
])
# place the two bars for each category side by side
fig.update_layout(barmode='group')
fig.show()

Related

Separate heatmap ranges for each row in Plotly

I'm trying to build a time-series heatmap covering the 24 hours of each day of the week, and I want each day's colors to be scaled only by that day's own values. Here's what I've done in Plotly so far.
The problem is that the "highest" color only appears in the second row, because px.imshow uses a single color scale for the whole table and that row holds the overall maximum. My desired output, made in Excel, is this one:
Each row clearly shows its own green, since each row has its own conditional formatting.
My code:
import plotly.express as px
import pandas as pd
df = pd.read_csv('test0.csv', header=None)
fig = px.imshow(df, color_continuous_scale=['red', 'green'])
fig.update_coloraxes(showscale=False)
fig.show()
The CSV file:
0,0,1,2,0,5,2,3,3,5,8,4,7,9,9,0,4,5,2,0,7,6,5,7
1,3,4,9,4,3,3,2,12,15,6,9,1,4,3,1,1,2,5,3,4,2,5,8
9,6,7,1,3,4,5,6,9,8,7,8,6,6,5,4,5,3,3,6,4,8,9,10
8,7,8,6,7,5,4,6,6,7,8,5,5,6,5,7,5,6,7,5,8,6,4,4
3,4,2,1,1,2,2,1,2,1,1,1,1,3,4,4,2,2,1,1,1,2,4,3
3,5,4,4,4,6,5,5,5,4,3,7,7,8,7,6,7,6,6,3,4,3,3,3
5,4,4,5,4,3,1,1,1,1,2,2,3,2,1,1,4,3,4,5,4,4,3,4
I've solved it! I had to make a separate heatmap for each row and combine them.
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import pandas as pd
import calendar
df = pd.read_csv('test0.csv', header=None)
# initialize subplots with vertical_spacing as 0 so the rows are right next to each other
fig = make_subplots(rows=7, cols=1, vertical_spacing=0)
# shift sunday to first position
days = list(calendar.day_name)
days = days[-1:] + days[:-1]
for index, row in df.iterrows():
    row_list = row.tolist()
    # each row is its own trace, so its color scale spans only its own min..max
    sub_fig = go.Heatmap(
        x=list(range(0, 24)),  # hours
        y=[days[index]],  # days of the week
        z=[row_list],  # data
        colorscale=[
            [0, '#FF0000'],
            [1, '#00FF00']
        ],
        showscale=False
    )
    # insert the heatmap into its own subplot row
    fig.add_trace(sub_fig, row=index + 1, col=1)
fig.show()
Output:

How can I discretize a continuous time-series dataset to build predictive and regression models?

I have a dataset with 3 columns: Index (date_time), Label and Value. Label can be one of 6 different sensors in a pharmaceutical reaction vessel. When I chart the data using Plotly, I get a continuous series over time with sections that look like this:
Plotly chart of my data
T0 denotes the beginning of a chemical reaction. Because the reaction occurs multiple times, I want to create discrete "batches" from T0 to T0 + 4 hours. These batches will then be used to analyze the variance across all of them. Sometimes the chemical reaction does not complete after several hours, so my task is to figure out why.
I also have an external dataset with "good" or "bad" labels, so I was also hoping to convert the data to wide format with a target column for each batch.
This is all my code so far, using peakutils to try to estimate the peaks for T0:
import glob
import pandas as pd
import peakutils
import matplotlib.pyplot as plt
from peakutils.plot import plot as pplot
from matplotlib import pyplot
# Jupyter-only magic: render matplotlib figures inline
%matplotlib inline
# Get CSV files list from a folder
path = "[path]"  # placeholder: the folder path is elided in the original
csv_files = glob.glob(path + "/*.csv")
# Read each CSV file into a DataFrame
# This creates a generator of DataFrames
df_list = (pd.read_csv(file) for file in csv_files)
# Concatenate all DataFrames
big_df = pd.concat(df_list, ignore_index=True)
data = big_df
# the column names in these CSV files start with a space, e.g. ' Date'
data[' Date'] = pd.to_datetime(data[' Date'], dayfirst=True)
data = data.sort_values(by = ' Date')
import plotly.express as px
import plotly.io as pio
pio.renderers.default='browser'
fig = px.line(data, x = ' Date', y = ' Value', color = ' Pen Name')
fig.show()
# wt_df is not defined in this snippet; presumably a subset of data for a single sensor
# use positional NumPy arrays so that x[indexes] and y[indexes] pick the detected peaks
x = wt_df[' Date'].to_numpy()
y = wt_df[' Value'].to_numpy()
indexes = peakutils.indexes(y, thres=0.5, min_dist=300)
print(indexes)
print(x[indexes], y[indexes])
pyplot.figure(figsize=(10,6))
pplot(x, y, indexes)
pyplot.title('First estimate')
This is the peakutils output:
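This question has no answer here, so the following is only one possible sketch. It assumes the T0 timestamps have already been found, for example from the peakutils indexes, and that the columns are named Date, Label and Value. The real CSV columns start with a space, which data.columns = data.columns.str.strip() would remove. Each window is cut with a boolean mask, and the result is then pivoted to one row per batch with the label joined as the target. The sensor names, the 10-minute step, the synthetic data and the good/bad label table are all hypothetical.

```python
import numpy as np
import pandas as pd


def make_batches(data, t0_list, window="4h", step="10min"):
    """Cut a long-format sensor log into one batch per reaction start T0.

    Assumes columns "Date" (datetime64), "Label" (sensor name) and "Value".
    Rows outside every [T0, T0 + window) interval are dropped.
    """
    window, step = pd.Timedelta(window), pd.Timedelta(step)
    pieces = []
    for batch_id, t0 in enumerate(pd.to_datetime(t0_list)):
        in_window = (data["Date"] >= t0) & (data["Date"] < t0 + window)
        chunk = data.loc[in_window].copy()
        chunk["batch"] = batch_id
        # Whole steps since T0, so every batch shares the same time grid.
        chunk["step"] = (chunk["Date"] - t0) // step
        pieces.append(chunk)
    return pd.concat(pieces, ignore_index=True)


def to_wide(batches, labels):
    """One row per batch, one column per (sensor, step), plus the target."""
    wide = batches.pivot_table(index="batch", columns=["Label", "step"],
                               values="Value", aggfunc="mean")
    # Flatten the (Label, step) MultiIndex so the label table can be joined.
    wide.columns = [f"{label}_step{s}" for label, s in wide.columns]
    return wide.join(labels)


# Hypothetical demo: two sensors sampled every 5 minutes for 12 hours.
rng = np.random.default_rng(0)
times = pd.date_range("2021-01-01", periods=144, freq="5min")
data = pd.concat(
    [pd.DataFrame({"Date": times, "Label": name, "Value": rng.normal(size=len(times))})
     for name in ["sensor_a", "sensor_b"]],
    ignore_index=True,
)
batches = make_batches(data, ["2021-01-01 00:00", "2021-01-01 06:00"])
labels = pd.DataFrame({"target": ["good", "bad"]}, index=pd.Index([0, 1], name="batch"))
wide = to_wide(batches, labels)
```

Each batch becomes one row with 2 sensors × 24 steps of features plus the target column. Batch-to-batch variance can then be computed column by column, or the table can be passed to a classifier.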

Multiple consecutive bar plots with a time slider in Plotly, Python

I have a Pandas dataframe representing portfolio weights in multiple dates, such as the following contents in CSV format:
DATE,ASSET1,ASSET2,ASSET3,ASSET4,ASSET5,ASSET6,ASSET7
2010-01-04,0.250000,0.0,0.250000,0.000000,0.25,0.000000,0.250000
2010-02-03,0.250000,0.0,0.250000,0.000000,0.25,0.000000,0.250000
2010-03-05,0.217195,0.0,0.250000,0.032805,0.25,0.000000,0.250000
2010-04-06,0.139636,0.0,0.250000,0.110364,0.25,0.000000,0.250000
2010-05-05,0.179569,0.0,0.218951,0.101480,0.25,0.000000,0.250000
2010-06-04,0.207270,0.0,0.211974,0.080756,0.25,0.000000,0.250000
2010-07-06,0.132468,0.0,0.250000,0.117532,0.25,0.000000,0.250000
2010-08-04,0.116353,0.0,0.250000,0.133647,0.25,0.000000,0.250000
2010-09-02,0.081677,0.0,0.250000,0.168323,0.25,0.000000,0.250000
2010-10-04,0.000000,0.0,0.250000,0.250000,0.25,0.009955,0.240045
For each row in the Pandas dataframe resulting from this CSV, we can generate a bar chart with the portfolio composition on that day. I would like to have multiple bar charts, with a time slider, such that we can choose one of the dates and see the portfolio composition on that day.
Can this be achieved with Plotly?
I could not find a way to do it directly with the dataframe above, but it is possible by "melting" the dataframe. The following code achieves what I was looking for, along with some beautification of the chart:
import pandas as pd
from io import StringIO
import plotly.express as px
string = """
DATE,ASSET1,ASSET2,ASSET3,ASSET4,ASSET5,ASSET6,ASSET7
2010-01-04,0.250000,0.0,0.250000,0.000000,0.25,0.000000,0.250000
2010-02-03,0.250000,0.0,0.250000,0.000000,0.25,0.000000,0.250000
2010-03-05,0.217195,0.0,0.250000,0.032805,0.25,0.000000,0.250000
2010-04-06,0.139636,0.0,0.250000,0.110364,0.25,0.000000,0.250000
2010-05-05,0.179569,0.0,0.218951,0.101480,0.25,0.000000,0.250000
2010-06-04,0.207270,0.0,0.211974,0.080756,0.25,0.000000,0.250000
2010-07-06,0.132468,0.0,0.250000,0.117532,0.25,0.000000,0.250000
2010-08-04,0.116353,0.0,0.250000,0.133647,0.25,0.000000,0.250000
2010-09-02,0.081677,0.0,0.250000,0.168323,0.25,0.000000,0.250000
2010-10-04,0.000000,0.0,0.250000,0.250000,0.25,0.009955,0.240045
"""
df = pd.read_csv(StringIO(string))
df = df.melt(id_vars=['DATE']).sort_values(by = 'DATE')
fig = px.bar(df, x="variable", y="value", animation_frame="DATE")
fig.update_layout(legend_title_text = None)
fig.update_xaxes(title = "Asset")
fig.update_yaxes(title = "Proportion")
fig.update_layout(autosize = True, height = 600)
fig.update_layout(hovermode="x")
fig.update_layout(plot_bgcolor="#F8F8F8")
fig.update_traces(
hovertemplate=
'<i></i> %{y:.2%}'
)
fig.show()
This produces the following:

Plotly: How to plot on secondary y-Axis with plotly express

How do I use plotly.express to plot multiple lines on two y-axes from a single Pandas dataframe?
I find this very useful for plotting all columns that contain a specific substring:
fig = px.line(df, y=df.filter(regex="Linear").columns, render_mode="webgl")
as I don't want to loop over all my filtered columns and use something like:
fig.add_trace(go.Scattergl(x=df["Time"], y=df["Linear-"]))
in each iteration.
It took me some time to figure this out, but I feel this could be useful to some people.
# import some stuff
import plotly.express as px
from plotly.subplots import make_subplots
import pandas as pd
import numpy as np
# create some data
df = pd.DataFrame()
n = 50
df["Time"] = np.arange(n)
df["Linear-"] = np.arange(n)+np.random.rand(n)
df["Linear+"] = np.arange(n)+np.random.rand(n)
df["Log-"] = np.arange(n)+np.random.rand(n)
df["Log+"] = np.arange(n)+np.random.rand(n)
df.set_index("Time", inplace=True)
subfig = make_subplots(specs=[[{"secondary_y": True}]])
# create two independent figures with px.line each containing data from multiple columns
fig = px.line(df, y=df.filter(regex="Linear").columns, render_mode="webgl",)
fig2 = px.line(df, y=df.filter(regex="Log").columns, render_mode="webgl",)
fig2.update_traces(yaxis="y2")
subfig.add_traces(fig.data + fig2.data)
subfig.layout.xaxis.title="Time"
subfig.layout.yaxis.title="Linear Y"
subfig.layout.yaxis2.type="log"
subfig.layout.yaxis2.title="Log Y"
# recoloring is necessary, otherwise lines from fig and fig2 would share colors
# e.g. Linear-, Log- = blue; Linear+, Log+ = red... we don't want this
subfig.for_each_trace(lambda t: t.update(line=dict(color=t.marker.color)))
subfig.show()
The trick with
subfig.for_each_trace(lambda t: t.update(line=dict(color=t.marker.color)))
I got from nicolaskruchten here: https://stackoverflow.com/a/60031260
Thank you derflo and vestland! I really wanted to use Plotly Express, as opposed to Graph Objects with a dual axis, to more easily handle DataFrames with lots of columns. I dropped this into a function; data1 and data2 work well as either a DataFrame or a Series.
import plotly.express as px
from plotly.subplots import make_subplots
import pandas as pd
def plotly_dual_axis(data1, data2, title="", y1="", y2=""):
    # Create subplot with secondary axis
    subplot_fig = make_subplots(specs=[[{"secondary_y": True}]])
    # Put DataFrames in fig1 and fig2
    fig1 = px.line(data1)
    fig2 = px.line(data2)
    # Change the axis for fig2
    fig2.update_traces(yaxis="y2")
    # Add the figs to the subplot figure
    subplot_fig.add_traces(fig1.data + fig2.data)
    # FORMAT subplot figure
    subplot_fig.update_layout(title=title, yaxis=dict(title=y1), yaxis2=dict(title=y2))
    # RECOLOR so as not to have overlapping colors
    subplot_fig.for_each_trace(lambda t: t.update(line=dict(color=t.marker.color)))
    return subplot_fig

Plotly graph_objects add df column to hovertemplate

I am trying to roughly recreate this graph and am struggling to add a column to the hovertemplate of a Plotly Scatter. Here is a working example:
import pandas as pd
import chart_studio.plotly as py
import plotly.graph_objects as go
dfs = pd.read_html('https://coronavirus.jhu.edu/data/mortality', header=0)
df = dfs[0]
percent = df['Case-Fatality'] # This is my closest guess, but isn't working
fig = go.Figure(data=go.Scatter(x=df['Confirmed'],
y = df['Deaths'],
mode='markers',
hovertext=df['Country'],
hoverlabel=dict(namelength=0),
hovertemplate = '%{hovertext}<br>Confirmed: %{x}<br>Fatalities: %{y}<br>%{percent}',
))
fig.show()
I'd like the Case-Fatality column to show where {percent} is.
I've also tried adding text = [df['Case-Fatality']] to the Scatter() call and switching {percent} to {text}, as shown in this example, but this doesn't pull the values from the dataframe as hoped.
I've also tried replotting it with px, following this example, but it throws the error "dictionary changed size during iteration". I think using go may be simpler than px, but I'm new to Plotly.
Thanks in advance for any insight for how to add a column to the hover.
As the question asks for a solution with graph_objects, here are two that work:
Method (i)
Add %{text} where you want the value to appear, and pass the list of values as the text argument of the go.Scatter() call, like this:
percent = df['Case-Fatality']
hovertemplate = '%{hovertext}<br>Confirmed: %{x}<br>Fatalities: %{y}<br>%{text}',text = percent
Here is the complete code:
import pandas as pd
import plotly.graph_objects as go
dfs = pd.read_html('https://coronavirus.jhu.edu/data/mortality', header=0)
df = dfs[0]
percent = df['Case-Fatality'] # values shown in the hover label via %{text}
fig = go.Figure(data=go.Scatter(x=df['Confirmed'],
y = df['Deaths'],
mode='markers',
hovertext=df['Country'],
hoverlabel=dict(namelength=0),
hovertemplate = '%{hovertext}<br>Confirmed: %{x}<br>Fatalities: %{y}<br>%{text}',
text = percent))
fig.show()
Method (ii)
This solution shows the extra value in the unified hover label you get when you set hovermode='x unified'. All you need to do is add an invisible trace with the same x values and the desired values as y; passing mode='none' makes it invisible. Here is the complete code:
import pandas as pd
import plotly.graph_objects as go
dfs = pd.read_html('https://coronavirus.jhu.edu/data/mortality', header=0)
df = dfs[0]
percent = df['Case-Fatality'] # plotted as an invisible trace so it appears in the unified hover label
fig = go.Figure(data=go.Scatter(x=df['Confirmed'],
y = df['Deaths'],
mode='markers',
hovertext=df['Country'],
hoverlabel=dict(namelength=0)))
fig.add_scatter(x=df.Confirmed, y=percent, mode='none')
fig.update_layout(hovermode='x unified')
fig.show()
The link you shared is broken. Are you looking for something like this?
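import pandas as pd
import plotly.express as px
dfs = pd.read_html('https://coronavirus.jhu.edu/data/mortality', header=0)
df = dfs[0]
fig = px.scatter(df,
x="Confirmed",
y="Deaths",
hover_name="Country",
hover_data={"Case-Fatality":True})
fig.show()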
Then, if you need bold text or want to change the hovertemplate, you can follow the last step in this answer.
Drawing inspiration from another SO question/answer, I find that this is working as desired and permits adding multiple cols to the hover data:
import pandas as pd
import plotly.express as px
fig = px.scatter(df,
x="Confirmed",
y="Deaths",
hover_name="Country",
hover_data=[df['Case-Fatality'], df['Deaths/100K pop.']])
fig.show()
