Plotly: I want to make the circles clearer

import plotly.express as px
import pandas as pd
data = pd.read_csv("Book1.csv")
fig = px.scatter(data, y="Category", x="Mean", color="Change")
fig.update_layout(
xaxis=dict(title="Title",range=[2,3],),
yaxis=dict(title="Mean"),
title="Title"
)
fig.update_traces(marker=dict(size=30,
line=dict(width=2,
color='DarkSlateGrey')),
selector=dict(mode='markers'))
fig.show()
I want to make the circles clearer, like more spaced out or scattered. Do you have any suggestions?
Here is the plot:

There is a technique called jittering where you add a small amount of noise to make it less likely for the circles to overlap as much as in your sample figure. It's not perfect, but here is an example of what you can accomplish. You can also try regenerating the plot with a different amount of jittering, as well as different random seeds until you are happy with the result.
import plotly.express as px
import numpy as np
import pandas as pd
# data = pd.read_csv("Book1.csv")
data = pd.DataFrame({
'Category': ['More than usual']*5 + ['About the same']*5 + ['Less than usual']*5,
'Mean': [2.2,2.4,2.22,2.24,2.6] + [2.4,2.41,2.5,2.1,2.12] + [2.81,2.1,2.5,2.45,2.42],
'Change': [1]*5 + [2]*5 + [3]*5
})
category_to_value_map = {
'Less than usual': 1,
'About the same': 2,
'More than usual': 3
}
data['y'] = data['Category'].map(category_to_value_map)
## apply jittering
max_jittering = 0.15
np.random.seed(4)
data['y'] = data['y'] + np.random.uniform(
low=-1*max_jittering,
high=max_jittering,
size=len(data)
)
fig = px.scatter(data, y="y", x="Mean", color="Change")
fig.update_layout(
xaxis=dict(title="Title",range=[2,3],),
yaxis=dict(title="Mean"),
title="Title"
)
fig.update_traces(marker=dict(size=20,
line=dict(width=2,
color='DarkSlateGrey')),
selector=dict(mode='markers'))
fig.update_layout(
yaxis = dict(
tickmode = 'array',
tickvals = [1, 2, 3],
ticktext = ['Less than usual', 'About the same', 'More than usual']
)
)
fig.show()
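The jittering step above can also be wrapped in a small helper so that the amount of noise and the seed are explicit arguments. This is only a sketch: the name add_jitter and its parameters are illustrative and not part of Plotly, and it uses NumPy's default_rng generator instead of the global np.random.seed.

```python
import numpy as np
import pandas as pd


def add_jitter(values, max_jitter=0.15, seed=None):
    """Return values shifted by uniform noise in [-max_jitter, max_jitter)."""
    rng = np.random.default_rng(seed)
    values = pd.Series(values, dtype=float)
    noise = rng.uniform(low=-max_jitter, high=max_jitter, size=len(values))
    return values + noise


# Numeric positions for the three categories, as in the mapping above
base = pd.Series([3] * 5 + [2] * 5 + [1] * 5)
jittered = add_jitter(base, max_jitter=0.15, seed=4)
print(jittered.round(3).tolist())
```

Calling the helper with a different seed or max_jitter is then a one-line change when you want to regenerate the plot.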

Related

Plotly: Setting the marker size based on the column in the exported data?

The code runs well. However, my custom dataset has a column SD, and I would like the size of the markers to be based on it. I did this with the seaborn library and it works, but here I get an error.
The error is:
Did you mean "line"?
Bad property path:
size
^^^^
The code is:
df=pd.read_csv("Lifecycle.csv")
df1=df[df["Specie"]=="pot_marigold"]
df1
df2=df[df["Specie"]=="Sunflowers"]
df2
trace=go.Scatter(x=df1["Days"], y=df1["Lifecycle"],text=df1["Specie"],marker={"color":"green"}, size=df1[SD],
mode="lines+markers")
trace1=go.Scatter(x=df2["Days"], y=df2["Lifecycle"],text=df2["Specie"],marker={"color":"red"},
mode="lines+markers")
data=[trace,trace1]
layout=go.Layout(
title="Lifecycle",
xaxis={"title":"Days"},
yaxis={"title":"Lifecycle"})
fig=go.Figure(data=data,layout=layout)
pyo.plot(fig)
You have not provided sample data, so I have simulated some based on what I can infer from your code.
You can simply set marker_size within the framework you have used.
This type of plot is far simpler with Plotly Express; I have also shown code for this.
import pandas as pd
import numpy as np
import plotly.graph_objects as go
# df=pd.read_csv("Lifecycle.csv")
df = pd.DataFrame(
{
"Specie": np.repeat(["pot_marigold", "Sunflowers"], 10),
"Days": np.tile(np.arange(1, 11, 1), 2),
"Lifecycle": np.concatenate(
[np.sort(np.random.uniform(1, 5, 10)).astype(int) for _ in range(2)]
),
"SD": np.random.randint(1, 8, 20),
}
)
df1 = df[df["Specie"] == "pot_marigold"]
df2 = df[df["Specie"] == "Sunflowers"]
trace = go.Scatter(
x=df1["Days"],
y=df1["Lifecycle"],
text=df1["Specie"],
marker={"color": "green"},
marker_size=df1["SD"],
mode="lines+markers",
)
trace1 = go.Scatter(
x=df2["Days"],
y=df2["Lifecycle"],
text=df2["Specie"],
marker={"color": "red"},
mode="lines+markers",
)
data = [trace, trace1]
layout = go.Layout(
title="Lifecycle", xaxis={"title": "Days"}, yaxis={"title": "Lifecycle"}
)
fig = go.Figure(data=data, layout=layout)
fig
Plotly Express
import plotly.express as px
px.scatter(
df,
x="Days",
y="Lifecycle",
color="Specie",
size="SD",
color_discrete_map={"pot_marigold": "green", "Sunflowers": "red"},
).update_traces(mode="lines+markers")
You can use plotly.express instead and pass the column name to the size argument:
import plotly.express as px
fig = px.scatter(df, x="Days", y="Lifecycle", text="Specie", size="SD")

Plotly express box plot hover data not working

I am trying to add data to the hover of a box plot with Plotly Express, following the instructions in the tutorial, in plotly 5.4.1. The tutorial mentions that additional information can be shown in the hover through the hover_data and hover_name arguments. However, the additional hover data, in this case the continent column, is not shown in the hover. I am not sure what is going wrong. Here is the code I tested in Google Colab:
import plotly.express as px
import pandas as pd
import numpy as np
np.random.seed(1234)
df = pd.DataFrame(np.random.randn(20, 1),columns=['Col1'])
df['country']=['canada','france']*10
df['continent']=['america','europe']*10
fig = px.box(df, x="country", y="Col1", hover_data=['continent'])
fig.show()
Here is what I get in Google Colab:
Error I get with the suggested solution (this was solved with pip install plotly --upgrade):
The solution offered by @Rob works, but to make it a generic function, here is what I wrote based on it:
def box_with_hover(df, x, y, hover_data):
    fig = px.box(df, x=x, y=y, hover_data=[hover_data])
    fig.add_traces(
        px.bar(
            df.groupby([x, hover_data], as_index=False).agg(
                base=(y, "min"), y=(y, lambda s: s.max() - s.min())
            ),
            x=x,
            base="base",
            y="y",
            hover_data={hover_data: True, x: True, "base": False, "y": False},
        )
        .update_traces(opacity=0.1)
        .data
    ).update_layout(bargap=0.8)
    fig.show()
This is similar to Change Plotly Boxplot Hover Data.
Boxplot hover info is generated within the JavaScript layer of Plotly, so I have overlaid a bar plot whose hover can be controlled in the way you require. When you hover over the box you get the standard boxplot hover; the bar shows different hover info.
import plotly.express as px
import pandas as pd
import numpy as np
np.random.seed(1234)
df = pd.DataFrame(np.random.randn(20, 1), columns=["Col1"])
df["country"] = ["canada", "france"] * 10
df["continent"] = ["america", "europe"] * 10
fig = px.box(df, x="country", y="Col1", hover_data=["continent"])
fig.add_traces(
px.bar(
df.groupby(["country", "continent"], as_index=False).agg(
base=("Col1", "min"), y=("Col1", lambda s: s.max() - s.min())
),
x="country",
base="base",
y="y",
hover_data={"continent":True, "country":True, "base":False, "y":False},
)
.update_traces(opacity=0.1)
.data
).update_layout(bargap=0.8)
fig
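To see why the overlaid bars line up with the boxes, it can help to look at the aggregated frame on its own. The sketch below repeats only the groupby step from the code above, with pandas and NumPy alone: each bar starts at the group minimum (base) and its height (y) is the range, so it spans from the minimum to the maximum of the group.

```python
import numpy as np
import pandas as pd

np.random.seed(1234)
df = pd.DataFrame(np.random.randn(20, 1), columns=["Col1"])
df["country"] = ["canada", "france"] * 10
df["continent"] = ["america", "europe"] * 10

# One row per (country, continent) pair: "base" is where the bar starts,
# "y" is its height, so base + y equals the group maximum.
bars = df.groupby(["country", "continent"], as_index=False).agg(
    base=("Col1", "min"), y=("Col1", lambda s: s.max() - s.min())
)
print(bars)
```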
generic function
import plotly.express as px
import pandas as pd
import numpy as np
np.random.seed(1234)
df = pd.DataFrame(np.random.randn(20, 1), columns=["Col1"])
df["country"] = ["canada", "france"] * 10
df["continent"] = ["america", "europe"] * 10
df["letter"] = list("AB") * 10
def box_with_hover(*args, **kwargs):
    if isinstance(args[0], pd.DataFrame):
        kwargs["data_frame"] = args[0]
    fig = px.box(**kwargs)
    fig.add_traces(
        px.bar(
            kwargs["data_frame"]
            .groupby([kwargs["x"]], as_index=False)
            .agg(
                **{
                    **{
                        "base": (kwargs["y"], "min"),
                        "y": (kwargs["y"], lambda s: s.max() - s.min()),
                    },
                    **{c: (c, "first") for c in kwargs["hover_data"]},
                }
            ),
            x=kwargs["x"],
            base="base",
            y="y",
            hover_data={
                **{c: True for c in kwargs["hover_data"]},
                **{kwargs["x"]: True, "base": False, "y": False},
            },
        )
        .update_traces(opacity=0.1)
        .data
    ).update_layout(bargap=0.8)
    return fig
box_with_hover(
df.reset_index(), x="country", y="Col1", hover_data=["continent", "letter", "index"]
)

Plotly log scale show full tick values

Consider the code in the example which is
import plotly.express as px
df = px.data.gapminder().query("year == 2007")
fig = px.scatter(df, x="gdpPercap", y="lifeExp", hover_name="country", log_x=True)
fig.show()
and produces
Is it possible for the ticks on the x axis to show 2000, 20k and so on instead of 2, and likewise for 3, 4, ...?
You can use tickmode "array": https://plotly.com/python/tick-formatting/#tickmode--array
The example below takes into account that the axis is geometric and keeps only one significant digit per tick value (see How to round a number to significant figures in Python).
import plotly.express as px
import numpy as np
import pandas as pd
df = px.data.gapminder().query("year == 2007")
fig = px.scatter(df, x="gdpPercap", y="lifeExp", hover_name="country", log_x=True)
fig.update_layout(
xaxis={
"tickmode": "array",
"tickvals": pd.to_numeric(
[f"{n:.1g}" for n in np.geomspace(1, df["gdpPercap"].max(), 15)]
),
}
)
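The rounding step inside tickvals is easier to follow in isolation. The sketch below runs it without Plotly; max_value is an assumed stand-in for df["gdpPercap"].max(), so the exact numbers are illustrative only.

```python
import numpy as np
import pandas as pd

max_value = 49357.19  # assumed upper bound; the answer uses df["gdpPercap"].max()

# 15 geometrically spaced candidates between 1 and the maximum
raw = np.geomspace(1, max_value, 15)

# The ".1g" format keeps one significant digit; to_numeric turns the
# resulting strings (for example "5e+04") back into floats.
tickvals = pd.to_numeric([f"{n:.1g}" for n in raw])

for before, after in zip(raw, tickvals):
    print(f"{before:>12.2f} -> {after:g}")
```

Because the candidates are evenly spaced on a log scale, the rounded values land on ticks such as 1, 2, 5, 10, 20 and so on, which is what makes the full values appear on the axis.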

Plotly: How to retrieve regression results using plotly express?

You can easily plot a regression line using plotly express / px.scatter and retrieve regression results like beta using px.get_trendline_results(fig).iloc[0]["px_fit_results"].params[1]. But how can you retrieve other parameters like R-squared or p-values for the coefficients?
Plot:
Code:
# imports
import plotly.express as px
import pandas as pd
import numpy as np
# data
np.random.seed(123)
numdays=20
X = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
Y = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
df = pd.DataFrame({'X': X, 'Y':Y})
# figure using px.scatter
fig = px.scatter(df, x="X", y="Y", trendline="ols", template = 'plotly_dark')
fig.show()
The answer:
model = px.get_trendline_results(fig)
results = model.iloc[0]["px_fit_results"]
alpha = results.params[0]
beta = results.params[1]
p_beta = results.pvalues[1]
r_squared = results.rsquared
Details:
All regression results are available through:
px.get_trendline_results(fig)
Which, when run, will return a somewhat cryptic looking pandas dataframe:
px_fit_results
0 <statsmodels.regression.linear_model.Regressio...
The element under px_fit_results is an object of type statsmodels.regression.linear_model.RegressionResultsWrapper, which wraps the statsmodels regression results.
So if we simplify matters a bit by setting:
model = px.get_trendline_results(fig)
And:
results = model.iloc[0]["px_fit_results"]
Then we can check what's available in that object using:
dir(results)
And find all the regression details one should need, like:
'predict',
'pvalues',
'remove_data',
'resid',
'resid_pearson',
'rsquared',
'rsquared_adj',
'save',
'scale',
'ssr',
'summary',
'summary2',
't_test',
't_test_pairwise',
But note that these results are not all structured the same way.
Running results.rsquared will return a single float, 0.611901357827784, while running results.pvalues will return an array, array([9.95834884e-01, 4.59734574e-05]), which can in turn be indexed for the constant and the trendline through results.pvalues[0] and results.pvalues[1], respectively.
With this information available, you could for example extract some of them and include them as annotations to further improve your plotly figures:
Plot:
Complete code:
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
import numpy as np
import datetime
# data
np.random.seed(123)
numdays=20
X = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
Y = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
df = pd.DataFrame({'X': X, 'Y':Y})
# Figure using plotly express
fig = px.scatter(df, x="X", y="Y", trendline="ols", template = 'plotly_dark')
# retrieve model estimates
model = px.get_trendline_results(fig)
results = model.iloc[0]["px_fit_results"]
alpha = results.params[0]
beta = results.params[1]
p_beta = results.pvalues[1]
r_squared = results.rsquared
line1 = 'y = ' + str(round(alpha, 4)) + ' + ' + str(round(beta, 4))+'x'
line2 = 'p-value = ' + '{:.5f}'.format(p_beta)
line3 = 'R^2 = ' + str(round(r_squared, 3))
summary = line1 + '<br>' + line2 + '<br>' + line3
fig.add_annotation(
x=110,
y=140,
xref="x",
yref="y",
text=summary,
showarrow=False,
font=dict(
family="Courier New, monospace",
size=16,
color="#ffffff"
),
align="left",
arrowhead=2,
arrowsize=1,
arrowwidth=2,
arrowcolor="#636363",
ax=20,
ay=-30,
borderwidth=2,
borderpad=4,
bgcolor="rgba(100,100,100, 0.6)",
opacity=0.8
)
fig.show()
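As a sanity check on the values retrieved above, the same least-squares line can be computed with NumPy alone. This is a sketch, not how Plotly Express fits the trendline internally; it only assumes that the "ols" trendline is an ordinary least-squares fit with an intercept, in which case alpha, beta and R^2 should agree with results.params and results.rsquared up to floating-point error.

```python
import numpy as np

# Same data generation as in the complete code above
np.random.seed(123)
numdays = 20
X = np.random.randint(low=-20, high=20, size=numdays).cumsum() + 100
Y = np.random.randint(low=-20, high=20, size=numdays).cumsum() + 100

# Least-squares line Y = alpha + beta * X (polyfit returns the slope first)
beta, alpha = np.polyfit(X, Y, deg=1)

# R^2 = 1 - SS_res / SS_tot
residuals = Y - (alpha + beta * X)
r_squared = 1 - (residuals**2).sum() / ((Y - Y.mean()) ** 2).sum()

print("alpha:", alpha, "beta:", beta, "R^2:", r_squared)
```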

Plotly: Plot multiple figures as subplots

These resources show how to take data from a single Pandas DataFrame and plot different columns as subplots on a Plotly graph. I'm interested in creating figures from separate DataFrames and plotting them on the same graph as subplots. Is this possible with Plotly?
https://plot.ly/python/subplots/
https://plot.ly/pandas/subplots/
I'm creating each figure from a dataframe like this:
import pandas as pd
import cufflinks as cf
from plotly.offline import download_plotlyjs, plot,iplot
cf.go_offline()
fig1 = df.iplot(kind='bar',barmode='stack',x='Type',
y=mylist,asFigure=True)
Edit:
Here is an example based on Naren's feedback:
Create the dataframes:
a={'catagory':['loc1','loc2','loc3'],'dogs':[1,5,6],'cats':[3,1,4],'birds':[4,12,2]}
df1 = pd.DataFrame(a)
b={'catagory':['loc1','loc2','loc3'],'dogs':[12,3,5],'cats':[4,6,1],'birds':[7,0,8]}
df2 = pd.DataFrame(b)
The plot will just show the information for the dogs, not the birds or cats:
fig = tls.make_subplots(rows=2, cols=1)
fig1 = df1.iplot(kind='bar',barmode='stack',x='catagory',
y=['dogs','cats','birds'],asFigure=True)
fig.append_trace(fig1['data'][0], 1, 1)
fig2 = df2.iplot(kind='bar',barmode='stack',x='catagory',
y=['dogs','cats','birds'],asFigure=True)
fig.append_trace(fig2['data'][0], 2, 1)
iplot(fig)
Here's a short function in a working example to save a list of figures all to a single HTML file.
def figures_to_html(figs, filename="dashboard.html"):
    with open(filename, 'w') as dashboard:
        dashboard.write("<html><head></head><body>" + "\n")
        for fig in figs:
            inner_html = fig.to_html().split('<body>')[1].split('</body>')[0]
            dashboard.write(inner_html)
        dashboard.write("</body></html>" + "\n")
# Example figures
import plotly.express as px
gapminder = px.data.gapminder().query("country=='Canada'")
fig1 = px.line(gapminder, x="year", y="lifeExp", title='Life expectancy in Canada')
gapminder = px.data.gapminder().query("continent=='Oceania'")
fig2 = px.line(gapminder, x="year", y="lifeExp", color='country')
gapminder = px.data.gapminder().query("continent != 'Asia'")
fig3 = px.line(gapminder, x="year", y="lifeExp", color="continent",
line_group="country", hover_name="country")
figures_to_html([fig1, fig2, fig3])
You can get a dashboard that contains several charts with legends next to each one:
import plotly
import plotly.offline as py
import plotly.graph_objs as go
fichier_html_graphs=open("DASHBOARD.html",'w')
fichier_html_graphs.write("<html><head></head><body>"+"\n")
i=0
while 1:
    if i<=40:
        i=i+1
        #______________________________--Plotly--______________________________________
        color1 = '#00bfff'
        color2 = '#ff4000'
        trace1 = go.Bar(
            x = ['2017-09-25','2017-09-26','2017-09-27','2017-09-28','2017-09-29','2017-09-30','2017-10-01'],
            y = [25,100,20,7,38,170,200],
            name='Debit',
            marker=dict(
                color=color1
            )
        )
        trace2 = go.Scatter(
            x=['2017-09-25','2017-09-26','2017-09-27','2017-09-28','2017-09-29','2017-09-30','2017-10-01'],
            y = [3,50,20,7,38,60,100],
            name='Taux',
            yaxis='y2'
        )
        data = [trace1, trace2]
        layout = go.Layout(
            title= ('Chart Number: '+str(i)),
            titlefont=dict(
                family='Courier New, monospace',
                size=15,
                color='#7f7f7f'
            ),
            paper_bgcolor='rgba(0,0,0,0)',
            plot_bgcolor='rgba(0,0,0,0)',
            yaxis=dict(
                title='Bandwidth Mbit/s',
                titlefont=dict(
                    color=color1
                ),
                tickfont=dict(
                    color=color1
                )
            ),
            yaxis2=dict(
                title='Ratio %',
                overlaying='y',
                side='right',
                titlefont=dict(
                    color=color2
                ),
                tickfont=dict(
                    color=color2
                )
            )
        )
        fig = go.Figure(data=data, layout=layout)
        plotly.offline.plot(fig, filename='Chart_'+str(i)+'.html',auto_open=False)
        fichier_html_graphs.write(" <object data=\""+'Chart_'+str(i)+'.html'+"\" width=\"650\" height=\"500\"></object>"+"\n")
    else:
        break
fichier_html_graphs.write("</body></html>")
print("CHECK YOUR DASHBOARD.html In the current directory")
Result:
You can also try the following using cufflinks:
cf.subplots([df1.figure(kind='bar',categories='category'),
df2.figure(kind='bar',categories='category')],shape=(2,1)).iplot()
And this should give you:
New Answer:
We need to loop through each of the animals and append a new trace for each one. This should give the desired output.
import pandas as pd
import numpy as np
import cufflinks as cf
import plotly.tools as tls
from plotly.offline import download_plotlyjs, plot,iplot
cf.go_offline()
import random
def generate_random_color():
    r = lambda: random.randint(0,255)
    return '#%02X%02X%02X' % (r(),r(),r())
a={'catagory':['loc1','loc2','loc3'],'dogs':[1,5,6],'cats':[3,1,4],'birds':[4,12,2]}
df1 = pd.DataFrame(a)
b={'catagory':['loc1','loc2','loc3'],'dogs':[12,3,5],'cats':[4,6,1],'birds':[7,0,8]}
df2 = pd.DataFrame(b)
#shared Xaxis parameter can make this graph look even better
fig = tls.make_subplots(rows=2, cols=1)
for animal in ['dogs','cats','birds']:
    animal_color = generate_random_color()
    fig1 = df1.iplot(kind='bar',barmode='stack',x='catagory',
                     y=animal,asFigure=True,showlegend=False, color = animal_color)
    fig.append_trace(fig1['data'][0], 1, 1)
    fig2 = df2.iplot(kind='bar',barmode='stack',x='catagory',
                     y=animal,asFigure=True, showlegend=False, color = animal_color)
    #if we do not use the below line there will be two legend
    fig2['data'][0]['showlegend'] = False
    fig.append_trace(fig2['data'][0], 2, 1)
#additional bonus
#use the below command to use the bar chart three mode
# [stack, overlay, group]
#as shown below
#fig['layout']['barmode'] = 'overlay'
iplot(fig)
Output:
Old Answer:
This will be the solution
Explanation:
Plotly tools has a make_subplots function for creating subplots; read the documentation for more details. I first use cufflinks to create a figure of the bar chart. One thing to note is that cufflinks creates an object with both data and layout. Plotly will only take one layout parameter as input, so I take only the data from the cufflinks figure and append_trace it to the make_subplots object. In fig.append_trace(), the second parameter is the row number and the third is the column number.
import pandas as pd
import cufflinks as cf
import numpy as np
import plotly.tools as tls
from plotly.offline import download_plotlyjs, plot,iplot
cf.go_offline()
fig = tls.make_subplots(rows=2, cols=1)
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
fig1 = df.iplot(kind='bar',barmode='stack',x='A',
y='B',asFigure=True)
fig.append_trace(fig1['data'][0], 1, 1)
df2 = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('EFGH'))
fig2 = df2.iplot(kind='bar',barmode='stack',x='E',
y='F',asFigure=True)
fig.append_trace(fig2['data'][0], 2, 1)
iplot(fig)
If you want to add a common layout to the subplot I suggest that you do
fig.append_trace(fig2['data'][0], 2, 1)
fig['layout']['showlegend'] = False
iplot(fig)
or even
fig.append_trace(fig2['data'][0], 2, 1)
fig['layout'].update(fig1['layout'])
iplot(fig)
In the first example, I access individual properties of the layout object and change them before plotting; go through the layout object's properties for reference.
In the second example, I update the layout of the figure with the cufflinks-generated layout before plotting; this produces the same output as we see in cufflinks.
You've already received a few suggestions that work perfectly well. They do however require a lot of coding. Facet / trellis plots using px.bar() will let you produce the plot below using (almost) only this:
px.bar(df, x="category", y="dogs", facet_row="Source")
The only extra steps you'll have to take are to introduce a variable on which to split your data, and then to gather or concatenate your dataframes like this:
df1['Source'] = 1
df2['Source'] = 2
df = pd.concat([df1, df2])
And if you'd like to include the other variables as well, just do:
fig = px.bar(df, x="category", y=["dogs", "cats", "birds"], facet_row="Source")
fig.update_layout(barmode = 'group')
Complete code:
# imports
import plotly.express as px
import pandas as pd
# data building
a={'category':['loc1','loc2','loc3'],'dogs':[1,5,6],'cats':[3,1,4],'birds':[4,12,2]}
df1 = pd.DataFrame(a)
b={'category':['loc1','loc2','loc3'],'dogs':[12,3,5],'cats':[4,6,1],'birds':[7,0,8]}
df2 = pd.DataFrame(b)
# data processing
df1['Source'] = 1
df2['Source'] = 2
df = pd.concat([df1, df2])
# plotly figure
fig = px.bar(df, x="category", y="dogs", facet_row="Source")
fig.show()
#fig = px.bar(df, x="category", y=["dogs", "cats", "birds"], facet_row="Source")
#fig.update_layout(barmode = 'group')
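The data-processing step can also be written so that df1 and df2 are not modified, and the result reshaped into long form with one row per location, source and animal. This is a sketch with pandas only; the column names animal and count are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"category": ["loc1", "loc2", "loc3"],
                    "dogs": [1, 5, 6], "cats": [3, 1, 4], "birds": [4, 12, 2]})
df2 = pd.DataFrame({"category": ["loc1", "loc2", "loc3"],
                    "dogs": [12, 3, 5], "cats": [4, 6, 1], "birds": [7, 0, 8]})

# assign() returns copies, so df1 and df2 are left unchanged
df = pd.concat([df1.assign(Source=1), df2.assign(Source=2)], ignore_index=True)

# Long form: one row per (category, Source, animal)
long_df = df.melt(
    id_vars=["category", "Source"],
    value_vars=["dogs", "cats", "birds"],
    var_name="animal",
    value_name="count",
)
print(long_df.head())
```

A frame in this shape could then be passed to px.bar with x="category", y="count", color="animal" and facet_row="Source".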
