I want to create a multi-layer graph from a single pandas DataFrame.
One layer should be a box plot and the other a scatter plot that shows where one company is located.
Is there a way to combine both plots in one figure?
boxplot
scatterplot
import pandas as pd
import plotly.express as px
df = pd.read_csv("company_index.csv", sep=";", decimal=",")
print(df)
df_u9 = df.loc[df["company"].isin(["U9"])]
fig_1 = px.box(
df,
x="period",
y="index"
)
fig_2 = px.scatter(
df_u9,
x="period",
y="index"
)
fig_1.show()
fig_2.show()
company_index.csv
period;index;company
1;202,4;U1
1;226,69;U10
1;235,18;U9
1;236,49;U4
1;238,13;U2
1;244,05;U6
1;252,08;U3
1;256,68;U8
1;294,99;U5
1;299,391;U7
2;243,78;U1
2;264,26;U10
2;270,6;U2
2;272,89;U9
2;285,26;U5
2;289,29;U4
2;291,15;U6
2;291,19;U3
2;305,92;U7
2;314,65;U8
3;271,82;U1
3;278,65;U2
3;296,16;U10
3;297,21;U4
3;305,93;U6
3;308,96;U5
3;323,74;U9
3;335,93;U3
3;354,13;U8
3;381,2;U7
4;281,26;U5
4;308,5;U2
4;311,61;U1
4;334,03;U4
4;335,72;U9
4;344,32;U8
4;345,27;U6
4;355,44;U3
4;373,54;U7
4;381,68;U10
5;288,6;U1
5;305,66;U5
5;323,2;U2
5;358,46;U8
5;365,57;U3
5;366,96;U10
5;368,38;U7
5;371,23;U6
5;373,63;U4
5;422,93;U9
6;285,32;U5
6;291,65;U1
6;308,68;U2
6;372,04;U8
6;376,64;U3
6;403,55;U6
6;407,38;U4
6;420,65;U10
6;423,68;U9
6;453,09;U7
I found this solution, and it works rather well.
I'm still struggling to understand the ".data[0]" part, but I believe it refers to the first trace of the figure returned by px.scatter, which matters when a figure holds several traces.
import pandas as pd
import plotly.express as px
df = pd.read_csv("company_index.csv", sep=";", decimal=",")
print(df)
df_u9 = df.loc[df["company"].isin(["U9"])].copy()
df_u9["size"] = 1
fig = px.box(
df,
x="period",
y="index"
)
fig.add_trace(px.scatter(
df_u9,
x="period",
y="index",
size="size",
size_max=15,
color_discrete_sequence=["rgb(203,153,201)"]
).data[0])
fig.show()
I am trying to write a for loop that creates distplot subplots.
I have a dataframe with many columns of different lengths (not counting the NaN values).
fig = make_subplots(
rows=len(assets), cols=1,
y_title = 'Hourly Price Distribution')
i=1
for col in df_all.columns:
    fig = ff.create_distplot([[df_all[[col]].dropna()]], col)
    fig.append()
    i+=1
fig.show()
I am trying to run a for loop for subplots for distplots and get the following error:
PlotlyError: Oops! Your data lists or ndarrays should be the same length.
UPDATE:
This is an example below:
df = pd.DataFrame({'2012': np.random.randn(20),
'2013': np.random.randn(20)+1})
df['2012'].iloc[0] = np.nan
fig = ff.create_distplot([df[c].dropna() for c in df.columns],
df.columns,show_hist=False,show_rug=False)
fig.show()
I would like to plot each distribution in a different subplot.
Thank you.
Update: Distribution plots
Calculating the correct values is probably both quicker and more elegant using numpy. But I often build parts of my graphs using one plotly approach (figure factory, plotly express) and then use them with other elements of the plotly library (plotly.graph_objects) to get what I want. The complete snippet below shows how to do just that: it builds a go-based subplot from elements of ff.create_distplot. I'd be happy to explain further if this suggestion suits your needs.
Plot
Complete code
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.figure_factory as ff
from plotly.subplots import make_subplots
import plotly.graph_objects as go
df = pd.DataFrame({'2012': np.random.randn(20),
'2013': np.random.randn(20)+1})
df.loc[0, '2012'] = np.nan
df = df.reset_index()
dfm = pd.melt(df, id_vars=['index'], value_vars=df.columns[1:])
dfm = dfm.dropna()
dfm.rename(columns={'variable':'year'}, inplace = True)
cols = dfm.year.unique()
nrows = len(cols)
fig = make_subplots(rows=nrows, cols=1)
for r, col in enumerate(cols, 1):
    dfs = dfm[dfm['year']==col]
    fx1 = ff.create_distplot([dfs['value'].values], ['distplot'],curve_type='kde')
    # fx1.data[1] is the KDE curve trace (fx1.data[0] is the histogram)
    fig.add_trace(go.Scatter(
        x= fx1.data[1]['x'],
        y =fx1.data[1]['y'],
        ), row = r, col = 1)
fig.show()
First suggestion
You should:
1. Restructure your data with pd.melt(df, id_vars=['index'], value_vars=df.columns[1:]),
2. and then use the resulting column 'variable' to build subplots for each year through the facet_row argument.
In the complete snippet below you'll see that I've changed 'variable' to 'year' in order to make the plot more intuitive. There's one particularly convenient side-effect with this approach, namely that running dfm.dropna() will remove the na value for 2012 only. If you were to do the same thing on your original dataframe, the corresponding value in the same row for 2013 would also be removed.
import numpy as np
import pandas as pd
import plotly.express as px
df = pd.DataFrame({'2012': np.random.randn(20),
'2013': np.random.randn(20)+1})
df.loc[0, '2012'] = np.nan
df = df.reset_index()
dfm = pd.melt(df, id_vars=['index'], value_vars=df.columns[1:])
dfm = dfm.dropna()
dfm.rename(columns={'variable':'year'}, inplace = True)
fig = px.histogram(dfm, x="value",
facet_row = 'year')
fig.show()
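The side effect of dropna() described above can be checked with a small sketch. The three-row frame and the names wide, long_df and the "_clean" variants are made up for illustration.

```python
import numpy as np
import pandas as pd

# Tiny wide frame with one missing value in 2012
wide = pd.DataFrame({"2012": [np.nan, 1.0, 2.0],
                     "2013": [3.0, 4.0, 5.0]})

# dropna() on the wide frame removes the whole first row,
# so the valid 2013 value in that row is lost as well
wide_clean = wide.dropna()

# After melting, each value sits in its own row,
# so dropna() only removes the missing 2012 entry
long_df = wide.reset_index().melt(id_vars=["index"], var_name="year")
long_clean = long_df.dropna()

print(wide_clean.size)   # number of values left in the wide frame
print(len(long_clean))   # number of values left in the long frame
print(long_clean.groupby("year")["value"].count().to_dict())
```

The wide frame keeps 4 values, while the long frame keeps 5: two for 2012 and all three for 2013.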
I have the following pandas dataframe. I would like to create box (sub)plots of all 5 columns in one plot. How can I achieve this?
I am using the following Python statement, but I am not getting any output.
df.boxplot(column=['synonym']['score'])
Here is an example of boxplot via plotly.express:
import pandas as pd
import plotly.express as px
df = pd.DataFrame(dict(x1=[1,2,3], x2=[4,8,12],x3=[1,5,10]))
df = df.melt(value_vars=['x1','x2','x3'])
fig = px.box(df, x='variable', y='value', color='variable')
fig.show()
I am trying to generally recreate this graph and struggling with adding a column to the hovertemplate of a plotly Scatter. Here is a working example:
import pandas as pd
import chart_studio.plotly as py
import plotly.graph_objects as go
dfs = pd.read_html('https://coronavirus.jhu.edu/data/mortality', header=0)
df = dfs[0]
percent = df['Case-Fatality'] # This is my closest guess, but isn't working
fig = go.Figure(data=go.Scatter(x=df['Confirmed'],
y = df['Deaths'],
mode='markers',
hovertext=df['Country'],
hoverlabel=dict(namelength=0),
hovertemplate = '%{hovertext}<br>Confirmed: %{x}<br>Fatalities: %{y}<br>%{percent}',
))
fig.show()
I'd like to get the column Case-Fatality to show under {percent}.
I've also tried putting in the Scatter() call a line for text = [df['Case-Fatality']], and switching {percent} to {text} as shown in this example, but this doesn't pull from the dataframe as hoped.
I've tried replotting it as a px, following this example but it throws the error dictionary changed size during iteration and I think using go may be simpler than px but I'm new to plotly.
Thanks in advance for any insight for how to add a column to the hover.
As the question asks for a solution with graph_objects, here are two that work.
Method (i)
Add %{text} where you want the variable value to appear, and pass the values as the text argument in the go.Scatter() call, like this:
percent = df['Case-Fatality']
hovertemplate = '%{hovertext}<br>Confirmed: %{x}<br>Fatalities: %{y}<br>%{text}',text = percent
Here is the complete code-
import pandas as pd
import plotly.graph_objects as go
dfs = pd.read_html('https://coronavirus.jhu.edu/data/mortality', header=0)
df = dfs[0]
percent = df['Case-Fatality'] # values to show in the hover label via %{text}
fig = go.Figure(data=go.Scatter(x=df['Confirmed'],
y = df['Deaths'],
mode='markers',
hovertext=df['Country'],
hoverlabel=dict(namelength=0),
hovertemplate = '%{hovertext}<br>Confirmed: %{x}<br>Fatalities: %{y}<br>%{text}',
text = percent))
fig.show()
Method (ii)
This solution relies on the unified hover label that you get when you pass 'x unified' to hovermode. All you need to do then is add an invisible trace with the same x values and the desired values as y. Passing mode='none' makes it invisible. Here is the complete code:
import pandas as pd
import plotly.graph_objects as go
dfs = pd.read_html('https://coronavirus.jhu.edu/data/mortality', header=0)
df = dfs[0]
percent = df['Case-Fatality'] # values for the invisible second trace
fig = go.Figure(data=go.Scatter(x=df['Confirmed'],
y = df['Deaths'],
mode='markers',
hovertext=df['Country'],
hoverlabel=dict(namelength=0)))
fig.add_scatter(x=df.Confirmed, y=percent, mode='none')
fig.update_layout(hovermode='x unified')
fig.show()
The link you shared is broken. Are you looking for something like this?
import pandas as pd
import plotly.express as px
px.scatter(df,
x="Confirmed",
y="Deaths",
hover_name="Country",
hover_data={"Case-Fatality":True})
Then if you need to use bold or change your hover_template you can follow the last step in this answer
Drawing inspiration from another SO question/answer, I find that this is working as desired and permits adding multiple cols to the hover data:
import pandas as pd
import plotly.express as px
fig = px.scatter(df,
x="Confirmed",
y="Deaths",
hover_name="Country",
hover_data=[df['Case-Fatality'], df['Deaths/100K pop.']])
fig.show()
I am trying to draw a bar chart from the CSV data I transform using pivot_table. The bar chart should have the count on the y-axis and companystatus along the x-axis.
I am getting this instead:
Ultimately, I want to stack the bar by CompanySizeId.
I have been following this video.
import plotly.graph_objects as go
import plotly.offline as pyo
import pandas as pd
countcompany = pd.read_csv(
'https://raw.githubusercontent.com/redbeardcr/Plotly/master/Data/countcompany.csv')
df = pd.pivot_table(countcompany, index='CompanyStatusLabel',
values='n', aggfunc=sum)
print(df)
data = [go.Bar(
x=df.index,
y=df.values,
)]
layout = go.Layout(title='Title')
fig = go.Figure(data=data, layout=layout)
pyo.plot(fig)
Code can be found here
Thanks for any help
If you flatten the array with the y values, i.e. if you replace y=df.values with y=df.values.flatten(), your code will work as expected.
import plotly.graph_objects as go
import plotly.offline as pyo
import pandas as pd
countcompany = pd.read_csv('https://raw.githubusercontent.com/redbeardcr/Plotly/master/Data/countcompany.csv')
df = pd.pivot_table(countcompany, index='CompanyStatusLabel', values='n', aggfunc='sum')
data = [go.Bar(
x=df.index,
y=df.values.flatten(),
)]
layout = go.Layout(title='Title')
fig = go.Figure(data=data, layout=layout)
pyo.plot(fig)
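The reason flatten() helps becomes visible when you look at the array shapes. Here is a small sketch with invented rows; the column names follow the question's CSV, and the status labels are hypothetical.

```python
import pandas as pd

# Invented rows; column names follow the question's CSV
countcompany = pd.DataFrame({
    "CompanyStatusLabel": ["Active", "Active", "Closed", "Closed"],
    "CompanySizeId": [1, 2, 1, 2],
    "n": [10, 5, 3, 7],
})

df = pd.pivot_table(countcompany, index="CompanyStatusLabel",
                    values="n", aggfunc="sum")

# pivot_table returns a DataFrame, so .values is a 2-D array
# with one column instead of a flat list of bar heights
print(df.values.shape)

# flatten() turns it into the 1-D array that go.Bar expects for y
print(df.values.flatten().shape)

# Selecting the column gives the same 1-D data
print(df["n"].tolist())
```

Passing y=df["n"] should therefore work just as well as y=df.values.flatten(). For the stacked version, one possible route is to pivot with columns="CompanySizeId", add one go.Bar per resulting column, and set barmode="stack" in the layout.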
I need to create a line chart from multiple columns of a dataframe. In pandas, you can draw a multi-line chart with code like this:
df.plot(x='date', y=['sessions', 'cost'], figsize=(20,10), grid=True)
How can this be done using plotly_express?
With version 4.8 of Plotly.py, the code in the original question is now supported almost unmodified:
pd.options.plotting.backend = "plotly"
df.plot(x='date', y=['sessions', 'cost'])
Previous answer, as of July 2019
For this example, you could prepare the data slightly differently.
df_melt = df.melt(id_vars='date', value_vars=['sessions', 'cost'])
If you transpose/melt your columns (sessions, cost) into additional rows, then you can specify the new column 'variable' to partition by in the color parameter.
px.line(df_melt, x='date' , y='value' , color='variable')
Example plotly_express output
With newer versions of plotly, all you need is:
df.plot()
As long as you remember to set pandas plotting backend to plotly:
pd.options.plotting.backend = "plotly"
From here you can easily adjust your plot to your liking, for example setting the theme:
df.plot(template='plotly_dark')
Plot with dark theme:
One particularly awesome feature with newer versions of plotly is that you no longer have to worry whether your pandas dataframe is of a wide or long format. Either way, all you need is df.plot(). Check out the details in the snippet below.
Complete code:
# imports
import plotly.express as px
import pandas as pd
import numpy as np
# settings
pd.options.plotting.backend = "plotly"
# sample dataframe of a wide format
np.random.seed(4); cols = list('abc')
X = np.random.randn(50,len(cols))
df=pd.DataFrame(X, columns=cols)
df.iloc[0]=0; df=df.cumsum()
# plotly figure
df.plot(template = 'plotly_dark')
Answer for older versions:
I would highly suggest using iplot() instead if you'd like to use plotly in a Jupyter Notebook for example:
Plot:
Code:
import plotly
import cufflinks as cf
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import pandas as pd
import numpy as np
# setup
init_notebook_mode(connected=True)
np.random.seed(123)
cf.set_config_file(theme='pearl')
# Random data using cufflinks
df1 = cf.datagen.lines()
df2 = cf.datagen.lines()
df3 = cf.datagen.lines()
df = pd.merge(df1, df2, how='left',left_index = True, right_index = True)
df = pd.merge(df, df3, how='left',left_index = True, right_index = True)
fig = df1.iplot(asFigure=True, kind='scatter',xTitle='Dates',yTitle='Returns',title='Returns')
iplot(fig)
It's also worth pointing out that you can combine plotly express with graph_objs. This is a good route when the lines have different x points.
import numpy as np
import pandas as pd
import plotly.graph_objs as go
import plotly.express as px
# data set 1
x = np.linspace(0, 9, 10)
y = x
# data set 2
df = pd.DataFrame(np.column_stack([x*0.5, y]), columns=["x", "y"])
fig = go.Figure(px.scatter(df, x="x", y="y"))
fig.add_trace(go.Scatter(x=x, y=y))
fig.show()