How to disable trendline in plotly.express.line? - python

I would like to plot 3 time series on the same chart. The data source is a pandas.DataFrame object whose Timestamp column holds datetime.date values, and the 3 time series are drawn from the same column, Value, using the color argument of plotly.express.line().
The 3 lines show on the chart, but each one is accompanied by some sort of trendline. I can't see in the function signature how to disable those trendlines. Can you please help?
I have made several attempts, e.g. using another color, but the trendlines just stay there.
Please find below the code snippet and the resulting chart.
import plotly.io as pio
import plotly.express as px
pio.renderers.default = 'jupyterlab'
fig = px.line(data_frame=df, x='Timestamp', y='Value', color='Position_Type')
fig.show()
(If relevant, I am using JupyterLab.)
The Timestamp values look like this on screen (these are regular weekly time series):
And, as per the type:
type(df.Timestamp[0])
> datetime.date
Update: it looks like the lines I first took for trendlines are actually straight lines from the first data point to the last data point of each time series.

df_melt = df_melt.sort_values('datetime_id')
Sorting got rid of those "wrap-arounds". Thanks for the suggestions above. Using Plotly 4.8.2.
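A minimal sketch of why sorting helps, assuming long-format data with the question's Timestamp, Position_Type and Value columns (the rows below are made up, and is_chronological is a hypothetical helper): px.line connects the points of each color group in the order the rows appear, so rows that are out of chronological order make the line jump back in time.

```python
import datetime as dt

import pandas as pd

# Hypothetical long-format data: two weekly series whose rows are
# out of chronological order (e.g. after concatenating several sources)
df = pd.DataFrame({
    "Timestamp": [dt.date(2020, 1, 15), dt.date(2020, 1, 1), dt.date(2020, 1, 8),
                  dt.date(2020, 1, 8), dt.date(2020, 1, 15), dt.date(2020, 1, 1)],
    "Position_Type": ["A", "A", "A", "B", "B", "B"],
    "Value": [3.0, 1.0, 2.0, 5.0, 6.0, 4.0],
})


def is_chronological(frame):
    """True if the rows of every Position_Type group are in increasing Timestamp order."""
    return bool(
        frame.groupby("Position_Type")["Timestamp"]
        .apply(lambda s: s.is_monotonic_increasing)
        .all()
    )


print(is_chronological(df))  # series "A" goes back in time here

# A stable sort on the timestamp fixes the row order within every series
df_sorted = df.sort_values("Timestamp", kind="stable").reset_index(drop=True)
print(is_chronological(df_sorted))

# Then plot as before, e.g.:
# import plotly.express as px
# fig = px.line(df_sorted, x="Timestamp", y="Value", color="Position_Type")
```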

Introduction:
Your provided data sample is an image, and not very easy to work with, so I'm going to use some sampled random time series to offer a suggestion. By the way, the variable names in your data sample don't match the ones you've used in your px call either.
I'm on plotly version '4.2.0' and unable to reproduce your issue. Hopefully you'll find this suggestion useful anyway.
Using data structured like this...
Timestamp Position_type value
145 2020-02-15 value3 86.418593
146 2020-02-16 value3 78.285128
147 2020-02-17 value3 79.665202
148 2020-02-18 value3 84.502445
149 2020-02-19 value3 91.287312
...I'm able to produce this plot...
...using this code:
# imports
from plotly.subplots import make_subplots
import plotly.graph_objs as go
import pandas as pd
import numpy as np
# data
np.random.seed(123)
frame_rows = 50
n_plots = 2
frame_columns = ['V_'+str(e) for e in list(range(n_plots+1))]
df = pd.DataFrame(np.random.uniform(-10,10,size=(frame_rows, len(frame_columns))),
index=pd.date_range('1/1/2020', periods=frame_rows),
columns=frame_columns)
df=df.cumsum()+100
df.iloc[0]=100
df.reset_index(inplace=True)
df.columns=['Timestamp','value1', 'value2', 'value3' ]
varNames=df.columns[1:]
# melt dataframe with timeseries from wide to long format.
# YOUR dataset seems to be organized in a long format since
# you're able to set color using a variable name
df_long = pd.melt(df, id_vars=['Timestamp'], value_vars=varNames, var_name='Position_type', value_name='value')
#df_long.tail()
# plotly time
import plotly.io as pio
import plotly.express as px
#pio.renderers = 'jupyterlab'
fig = px.scatter(data_frame=df_long, x='Timestamp', y='value', color='Position_type')
#fig = px.line(data_frame=df_long, x='Timestamp', y='value', color='Position_type')
fig.show()
If you change...
fig = px.scatter(data_frame=df_long, x='Timestamp', y='value', color='Position_type')
...to...
fig = px.line(data_frame=df_long, x='Timestamp', y='value', color='Position_type')
...you'll get this plot instead:
No trendlines as far as the eye can see.
Edit - I think I know what's going on...
Having taken a closer look at your figure, I've realized that those lines are not trendlines. A trendline doesn't normally start at the first value of a series and end at the last one, and that's exactly what's happening here for all three series. So I think you've got some bad or duplicate timestamps somewhere.
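One way to check that hypothesis with pandas is sketched below, assuming the question's column names (the helper diagnose_timestamps and the sample data are hypothetical). It counts duplicated (series, timestamp) pairs and lists the series whose rows are not in chronological order. For example, if a series' rows appear twice, px.line draws a line from its last point back to its first point, which matches the figure.

```python
import pandas as pd


def diagnose_timestamps(df, series_col="Position_Type", time_col="Timestamp"):
    """Report duplicated and out-of-order timestamps per series (hypothetical helper)."""
    dupes = int(df.duplicated([series_col, time_col]).sum())
    unsorted = [
        name
        for name, group in df.groupby(series_col)
        if not group[time_col].is_monotonic_increasing
    ]
    return {"duplicate_rows": dupes, "unsorted_series": unsorted}


# Hypothetical example: series "A" was accidentally appended twice
weeks = pd.date_range("2020-01-05", periods=3, freq="W")
a = pd.DataFrame({"Timestamp": weeks, "Position_Type": "A", "Value": [1.0, 2.0, 3.0]})
b = pd.DataFrame({"Timestamp": weeks, "Position_Type": "B", "Value": [4.0, 5.0, 6.0]})
df = pd.concat([a, b, a], ignore_index=True)

print(diagnose_timestamps(df))
# -> {'duplicate_rows': 3, 'unsorted_series': ['A']}

# One possible fix: drop exact duplicate rows, then sort before plotting
clean = df.drop_duplicates().sort_values(["Position_Type", "Timestamp"])
print(diagnose_timestamps(clean))
```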

Related

How can I generate a line animation for the stocks data using plotly?

Hey, I was trying to make a line animation for the stocks dataset built into plotly.
So I tried the code below, following
https://plotly.com/python/animations/
but nothing shows.
import plotly.express as px
df = px.data.stocks()
fig = px.line(df, x = 'date', y=df.columns[1:6], animation_frame = 'date')
fig.show()
What I intended was a line animation of the 6 companies' stock prices with respect to the date. I'm totally new to plotly, so this may be a dumb question, but I'd be grateful if you could help. Thank you!
You need some consistency across the animation frames for the x-axis and y-axis.
To achieve this, I used the day of the month as the x-axis and set a y-axis range that is appropriate for all frames.
I then used month/year as the animation frame, so each frame contains a collection of values (a line only makes sense if there is more than one value to plot).
import plotly.express as px
import pandas as pd
df = px.data.stocks()
df["date"] = pd.to_datetime(df["date"])
fig = px.line(df, x=df["date"].dt.day, y=df.columns[1:], animation_frame=df["date"].dt.strftime("%b-%Y"))
fig.update_layout(yaxis={"range": [0, df.iloc[:, 1:].max().max()]})
fig.show()

Plotly graph_objects add df column to hovertemplate

I am trying to generally recreate this graph and struggling with adding a column to the hovertemplate of a plotly Scatter. Here is a working example:
import pandas as pd
import chart_studio.plotly as py
import plotly.graph_objects as go
dfs = pd.read_html('https://coronavirus.jhu.edu/data/mortality', header=0)
df = dfs[0]
percent = df['Case-Fatality'] # This is my closest guess, but isn't working
fig = go.Figure(data=go.Scatter(x=df['Confirmed'],
y = df['Deaths'],
mode='markers',
hovertext=df['Country'],
hoverlabel=dict(namelength=0),
hovertemplate = '%{hovertext}<br>Confirmed: %{x}<br>Fatalities: %{y}<br>%{percent}',
))
fig.show()
I'd like to get the column Case-Fatality to show under %{percent}.
I've also tried putting a line text = [df['Case-Fatality']] in the Scatter() call and switching %{percent} to %{text}, as shown in this example, but this doesn't pull from the dataframe as hoped.
I've tried replotting it with px, following this example, but it throws the error "dictionary changed size during iteration". I think using go may be simpler than px, but I'm new to plotly.
Thanks in advance for any insight for how to add a column to the hover.
As the question asks for a solution with graph_objects, here are two that work:
Method (i)
Add %{text} where you want the variable's value to appear, and pass the values as the text argument of the go.Scatter() call, like this:
percent = df['Case-Fatality']
hovertemplate='%{hovertext}<br>Confirmed: %{x}<br>Fatalities: %{y}<br>%{text}', text=percent
Here is the complete code:
import pandas as pd
import plotly.graph_objects as go
dfs = pd.read_html('https://coronavirus.jhu.edu/data/mortality', header=0)
df = dfs[0]
percent = df['Case-Fatality']  # values shown in the hover label via %{text}
fig = go.Figure(data=go.Scatter(x=df['Confirmed'],
y = df['Deaths'],
mode='markers',
hovertext=df['Country'],
hoverlabel=dict(namelength=0),
hovertemplate = '%{hovertext}<br>Confirmed: %{x}<br>Fatalities: %{y}<br>%{text}',
text = percent))
fig.show()
Method (ii)
This solution relies on the unified hover label you get when passing hovermode='x unified'. All you need to do then is add an invisible trace with the same x values and the desired y values. Passing mode='none' makes it invisible. Here is the complete code:
import pandas as pd
import plotly.graph_objects as go
dfs = pd.read_html('https://coronavirus.jhu.edu/data/mortality', header=0)
df = dfs[0]
percent = df['Case-Fatality']  # y-values of the invisible trace shown in the unified hover label
fig = go.Figure(data=go.Scatter(x=df['Confirmed'],
y = df['Deaths'],
mode='markers',
hovertext=df['Country'],
hoverlabel=dict(namelength=0)))
fig.add_scatter(x=df.Confirmed, y=percent, mode='none')
fig.update_layout(hovermode='x unified')
fig.show()
The link you shared is broken. Are you looking for something like this?
import pandas as pd
import plotly.express as px
px.scatter(df,
x="Confirmed",
y="Deaths",
hover_name="Country",
hover_data={"Case-Fatality":True})
Then, if you need bold text or want to change the hovertemplate, you can follow the last step in this answer.
Drawing inspiration from another SO question/answer, I find that this is working as desired and permits adding multiple cols to the hover data:
import pandas as pd
import plotly.express as px
fig = px.scatter(df,
x="Confirmed",
y="Deaths",
hover_name="Country",
hover_data=[df['Case-Fatality'], df['Deaths/100K pop.']])
fig.show()

add a trace to every facet of a plotly figure

I'd like to add a trace to all facets of a plotly plot.
For example, I'd like to add a reference line to each daily facet of a scatterplot of the "tips" dataset showing a 15% tip. However, my attempt below only adds the line to the first facet.
import plotly.express as px
import plotly.graph_objects as go
import numpy as np
df = px.data.tips()
ref_line_slope = 0.15 # 15% tip for reference
ref_line_x_range = np.array([df.total_bill.min(), df.total_bill.max()])
fig = px.scatter(df, x="total_bill", y="tip",facet_col="day", trendline='ols')
fig = fig.add_trace(go.Scatter(x=ref_line_x_range, y=ref_line_slope*ref_line_x_range, name='15%'))
fig.show()
According to an example from plotly you can pass 'all' as the row and col arguments and even skip empty subplots:
fig.add_trace(go.Scatter(...), row='all', col='all', exclude_empty_subplots=True)
It's not an elegant solution (it relies on the private _grid_ref attribute), but it should work for most cases:
for row_idx, row_figs in enumerate(fig._grid_ref):
    for col_idx, col_fig in enumerate(row_figs):
        fig.add_trace(go.Scatter(...), row=row_idx+1, col=col_idx+1)

How to make a line plot from a pandas dataframe with a long or wide format

(This is a self-answered post to help others shorten their answers to plotly questions by not having to explain how plotly best handles data of long and wide format)
I'd like to build a plotly figure based on a pandas dataframe in as few lines as possible. I know you can do that using plotly.express, but this fails for what I would call a standard pandas dataframe: an index describing row order, and column names describing the names of the values in the dataframe:
Sample dataframe:
a b c
0 100.000000 100.000000 100.000000
1 98.493705 99.421400 101.651437
2 96.067026 98.992487 102.917373
3 95.200286 98.313601 102.822664
4 96.691675 97.674699 102.378682
An attempt:
fig=px.line(x=df.index, y = df.columns)
This raises an error:
ValueError: All arguments should have the same length. The length of argument y is 3, whereas the length of previous arguments ['x'] is 100
Here you've tried to use a pandas dataframe in wide format as a source for px.line.
But plotly.express is designed to be used with dataframes in long format, often referred to as tidy data (please take a look at that; no one explains it better than Wickham). Many, particularly those injured by years of battling with Excel, often find it easier to organize data in wide format. So what's the difference?
Wide format:
data is presented with each different data variable in a separate column
each column has only one data type
missing values are often represented by np.nan
works best with plotly.graph_objects (go)
lines are often added to a figure using fig.add_trace()
colors are normally assigned to each trace
Example:
a b c
0 -1.085631 0.997345 0.282978
1 -2.591925 0.418745 1.934415
2 -5.018605 -0.010167 3.200351
3 -5.885345 -0.689054 3.105642
4 -4.393955 -1.327956 2.661660
5 -4.828307 0.877975 4.848446
6 -3.824253 1.264161 5.585815
7 -2.333521 0.328327 6.761644
8 -3.587401 -0.309424 7.668749
9 -5.016082 -0.449493 6.806994
Long format:
data is presented with one column containing all the values and another column listing the context of the value
missing values are simply not included in the dataset.
works best with plotly.express (px)
colors are set by a default color cycle and are assigned to each unique variable
Example:
id variable value
0 0 a -1.085631
1 1 a -2.591925
2 2 a -5.018605
3 3 a -5.885345
4 4 a -4.393955
... ... ... ...
295 95 c -4.259035
296 96 c -5.333802
297 97 c -6.211415
298 98 c -4.335615
299 99 c -3.515854
How to go from wide to long?
df = pd.melt(df, id_vars='id', value_vars=df.columns[:-1])
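A small self-contained check of what that melt call does (pandas only; the three-row frame is made up): the id column is kept, the other column names go into a new variable column and their values into value, so a 3-row, 3-column wide frame becomes 9 rows.

```python
import pandas as pd

wide = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0], "c": [7.0, 8.0, 9.0]})
wide["id"] = wide.index

# id is the last column, so wide.columns[:-1] selects a, b and c
long_df = pd.melt(wide, id_vars="id", value_vars=wide.columns[:-1])
print(long_df.head(4))
```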
The two snippets below will produce the very same plot:
How to use px to plot long data?
fig = px.line(df, x='id', y='value', color='variable')
How to use go to plot wide data?
colors = px.colors.qualitative.Plotly
fig = go.Figure()
fig.add_trace(go.Scatter(x=df['id'], y=df['a'], mode='lines', name='a', line=dict(color=colors[0])))
fig.add_trace(go.Scatter(x=df['id'], y=df['b'], mode='lines', name='b', line=dict(color=colors[1])))
fig.add_trace(go.Scatter(x=df['id'], y=df['c'], mode='lines', name='c', line=dict(color=colors[2])))
fig.show()
By the looks of it, go is more complicated and offers perhaps more flexibility? Well, yes. And no. You can easily build a figure using px and add any go object you'd like!
Complete go snippet:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
# dataframe of a wide format
np.random.seed(123)
X = np.random.randn(100,3)
df=pd.DataFrame(X, columns=['a','b','c'])
df=df.cumsum()
df['id']=df.index
# plotly.graph_objects
colors = px.colors.qualitative.Plotly
fig = go.Figure()
fig.add_trace(go.Scatter(x=df['id'], y=df['a'], mode='lines', name='a', line=dict(color=colors[0])))
fig.add_trace(go.Scatter(x=df['id'], y=df['b'], mode='lines', name='b', line=dict(color=colors[1])))
fig.add_trace(go.Scatter(x=df['id'], y=df['c'], mode='lines', name='c', line=dict(color=colors[2])))
fig.show()
Complete px snippet:
import numpy as np
import pandas as pd
import plotly.express as px
# dataframe of a wide format
np.random.seed(123)
X = np.random.randn(100,3)
df=pd.DataFrame(X, columns=['a','b','c'])
df=df.cumsum()
df['id']=df.index
# dataframe of a long format
df = pd.melt(df, id_vars='id', value_vars=df.columns[:-1])
# plotly express
fig = px.line(df, x='id', y='value', color='variable')
fig.show()
I'm adding this as an answer so it's on record.
First of all, thank you @vestland for this. It's a question that comes up over and over, so it's good to have it addressed here; it also makes it easier to flag duplicate questions.
Plotly Express now accepts wide-form and mixed-form data, as you can check in this post.
You can change the pandas plotting backend to use plotly:
import pandas as pd
pd.options.plotting.backend = "plotly"
Then, to get a fig all you need to write is:
fig = df.plot()
fig.show() then displays a line chart with one trace per column.

Seaborn HUE in Plotly

I have this dataframe of gas prices in Brazil:
I filtered the gasoline rows from this DataFrame and want to plot the average price (PREÇO MEDIO) over time (years, ANO) for each region (REGIAO).
I used Seaborn with hue and got this:
But when I try to plot the same thing in Plotly, the result is:
How can I get the same plot with plotly?
I searched and found this: Seaborn Hue on Plotly
But it didn't work for me.
The answer:
You will achieve the same thing using plotly express and the color attribute:
fig = px.line(dfm, x="dates", y="value", color='variable')
The details:
You haven't described the structure of your data in detail, but assigning hue like this is normally meant to be applied to a data structure such as...
Date Variable Value
01.01.2020 A 100
01.01.2020 B 90
01.02.2020 A 110
01.02.2020 B 120
... where a unique hue or color is assigned to different variable names that are associated with a timestamp column where each timestamp occurs as many times as there are variables.
And that seems to be the case for seaborn too:
hue : name of variables in data or vector data, optional
Grouping variable that will produce points with different colors. Can
be either categorical or numeric, although color mapping will behave
differently in latter case.
You can achieve the same thing with plotly.graph_objects by adding one go.Scatter() trace per group, but it seems that you could make good use of plotly.express too. Until you've provided a proper data sample, I'll show you how to do it using some sampled data in a dataframe built with numpy and pandas.
Plot:
Code:
# imports
import numpy as np
import pandas as pd
import plotly.express as px
# sample time series data
np.random.seed(123)
df = pd.DataFrame(np.random.randint(-10,12,size=(50, 4)), columns=list('ABCD'))
df['dates'] = pd.date_range('2020-01-01', periods=50)
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
df.iloc[0]=0
df=df.cumsum().reset_index()
# melt data to provide the data structure mentioned earlier
dfm=pd.melt(df, id_vars=['dates'], value_vars=df.columns[1:])
dfm.head()
# plotly
fig = px.line(dfm, x="dates", y="value", color='variable')
fig.show()
