I want to create a scatter plot matrix (matrix of scatter plots of multiple variables to see the correlation between each pair). However, I would like to remove some variables from the rows (but keep them in the columns).
With the following code, I'm able to get the complete scatter plot matrix (with all variables):
import numpy as np
import pandas as pd
import plotly.graph_objects as go
df = pd.DataFrame(
    np.random.randn(1000, 5),
    columns=['A', 'B', 'C', 'M1', 'M2']
)
fig = go.Figure(
    data=go.Splom(
        dimensions=[dict(label=c, values=df[c]) for c in df.columns],
        text=df.index,
        marker=dict(
            size=3,
            color=df['M1'],
            colorscale='Bluered',
        ),
    )
)
fig.show()
I would like to have the same plot, but only with the rows corresponding to M1 and M2, something like that:
Is this possible with plotly?
Note: I want to get an interactive HTML output, so just cropping the image won't work in this case.
I have a DataFrame like this:
df = pd.DataFrame({'name':['a', 'b', 'c', 'd', 'e'], 'value':[54.2, 53.239, 43.352, 36.442, -12.487]})
df
I'd like to plot a simple stacked bar chart like the one below with plotly.express.
How can I do that?
I've seen several examples in the documentation, but none of them solved my problem.
Thank you
It's a little wordy, but you can set a single value for the x axis, in this case zero. Then you just need to tweak the dimensions, labels, and ranges.
import pandas as pd
import plotly.express as px
df = pd.DataFrame({'name':['a', 'b', 'c', 'd', 'e'], 'value':[54.2, 53.239, 43.352, 36.442, -12.487]})
df['x'] = 0
fig = px.bar(df, x='x', y='value',color='name', width=500, height=1000)
fig.update_xaxes(showticklabels=False, title=None)
fig.update_yaxes(range=[-50,200])
fig.update_traces(width=.3)
fig.show()
The bar chart's only ever going to have one column? That seems like an odd use-case for a bar chart, but...
What I would do is create one trace per "name", filtering df as trace_df=df[df['name']==name], and then make a Bar for each of those, something like this:
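import pandas as pd
import plotly.graph_objects as go
df = pd.DataFrame({'name':['a', 'b', 'c', 'd', 'e'], 'value':[54.2, 53.239, 43.352, 36.442, -12.487]})
# one filtered frame per name, keyed by that name
trace_dfs = {name: df[df['name']==name] for name in df['name']}
bars = [
    go.Bar(
        name=name,
        x=['constant' for _ in trace_frame['value']],
        y=trace_frame['value'],
    )
    for name, trace_frame in trace_dfs.items()
]
# barmode is a layout attribute, not a Figure argument
fig = go.Figure(
    data=bars,
    layout=dict(barmode='stack')
)
fig.show()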
Granted, that's not plotly.express; it's core plotly, which allows a lot more control. If you want multiple stacked bars for different values, you'll need separate labels and separate values for x and y, not the two-column DataFrame you described. There are several more examples here and a full description of the available bar chart options here.
Pandas provides built-in plotting functionality for DataFrames, with several plotting backend engines (matplotlib, etc.). I'd like to plot an interactive 3D scatter plot directly from a DataFrame via df.plot(), but I have only managed to get non-interactive plots. I'm thinking of something like what I get with, e.g., plotly. I'd prefer a solution that is independent of the exploratory data analysis IDE setup and its dependencies (e.g. ipywidgets when using JupyterLab). How can I plot interactive 3D scatter plots via df.plot()?
Plotly is the way to go; use Scatter3d:
import plotly as py
import plotly.graph_objects as go
import numpy as np
import pandas as pd
# data
np.random.seed(1)
df = pd.DataFrame(np.random.rand(20, 3), columns=list('ABC'))
trace = go.Scatter3d(
    x=df['A'],
    y=df['B'],
    z=df['C'],
    mode='markers',
    marker=dict(
        size=5,
        color=df['C'],  # color the markers by column C
        colorscale='Viridis',
    ),
    name='test',
    # list comprehension to add text on hover
    text=[f"A: {a}<br>B: {b}<br>C: {c}" for a, b, c in zip(df['A'], df['B'], df['C'])],
    # if you do not want to display x,y,z
    hoverinfo='text'
)
layout = dict(title='TEST')
data = [trace]
fig = dict(data=data, layout=layout)
py.offline.plot(fig, filename='Test.html')
If you want to stick with the df.plot() approach, Cufflinks adds a df.iplot() method backed by Plotly.
I have several histograms that I successfully plotted with plotly like this:
fig.add_trace(go.Histogram(x=np.array(data[key]), name=self.labels[i]))
I would like to create something like this 3D stacked histogram, but with the difference that each 2D histogram inside is a true histogram and not just a hardcoded line (my data is of the form [0.5 0.4 0.5 0.7 0.4], so using Histogram directly is very convenient).
Note that what I am asking is not similar to this and therefore also not the same as this. In the matplotlib example, the data is presented directly in a 2D array, so the histogram is the third dimension. In my case, I want to feed a function with many already computed histograms.
The snippet below takes care of both the binning and the formatting of the figure, so that it appears as a stacked 3D chart built from multiple go.Scatter3d traces and np.histogram.
The input is a DataFrame of random numbers generated with np.random.normal(50, 5, size=(300, 4)).
We can talk more about the other details if this is something you can use:
Plot 1: Angle 1
Plot 2: Angle 2
Complete code:
# imports
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.io as pio
pio.renderers.default = 'browser'
# data
np.random.seed(123)
df = pd.DataFrame(np.random.normal(50, 5, size=(300, 4)), columns=list('ABCD'))
# plotly setup
fig = go.Figure()
# data binning and traces
for i, col in enumerate(df.columns):
    counts, edges = np.histogram(df[col], bins=10, density=False)
    # step outline: every bin edge and every count appears twice, and the
    # outline starts and ends at zero height so y and z have equal length
    y = np.repeat(edges, 2)
    z = np.concatenate([[0], np.repeat(counts, 2), [0]])
    fig.add_trace(go.Scatter3d(x=[i]*len(z), y=y, z=z,
                               mode='lines',
                               name=col))
fig.show()
Unfortunately, you can't use go.Histogram in a 3D space, so you need an alternative. I used go.Scatter3d, and I wanted to use its option to fill the area under the line (surfaceaxis in the docs), but there is an evident bug, so that option is left commented out in the code.
import numpy as np
import plotly.graph_objects as go
# random matrix: one row per histogram, one column per bin
m = 6
n = 5
mat = np.random.uniform(size=(m, n)).round(1)
# repeat every value so each bin becomes a flat step
mat = mat.repeat(2).reshape(m, n*2)
# and finally plot
x = np.arange(2*n)
y = np.ones(2*n)
fig = go.Figure()
for i in range(m):
    fig.add_trace(go.Scatter3d(x=x,
                               y=y*i,
                               z=mat[i, :],
                               mode="lines",
                               # surfaceaxis=1  # bug
                               )
                  )
fig.show()
I need to plot a very large number of segments with plotly. Unlike a regular scatter plot, where all points are connected, here I only need to connect points two by two.
I considered different options:
adding line shapes to the plot; apparently relatively slow
creating a large number of line plots with only two points
Is there a more suitable method? Possibly a single scatter plot in which only consecutive pairs of points are connected.
I'm looking for an efficient way to produce the plot in Python but also for good rendering performances.
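This answer builds on the suggestion in the comment from Maximilian Peters, as well as Jezrael's approach to inserting a new row after every nth row. Here a row of NaN values is inserted after every second row, so each pair of points forms its own line segment.
A key part is also to include fig.update_traces(connectgaps=False), so that plotly breaks the line at the NaN rows instead of bridging them.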
Plot:
Complete code:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
# dataframe, sample
np.random.seed(123)
cols = ['a','b','c', 'd', 'e', 'f', 'g']
X = np.random.randn(50,len(cols))
df=pd.DataFrame(X, columns=cols)
df=df.cumsum()
df['id']=df.index
# dataframe with every nth row containing np.nan
df2 = (df.iloc[1::2]
.assign(id = lambda x: x['id'] + 1, c = np.nan)
.rename(lambda x: x + .5))
df1 = pd.concat([df, df2], sort=False).sort_index().reset_index(drop=True)
df1.loc[df1.isnull().any(axis=1), :] = np.nan
df1
# plotly figure
colors = px.colors.qualitative.Plotly
fig = go.Figure()
for i, col in enumerate(df1.columns[:-1]):  # skip the 'id' column
    fig.add_traces(go.Scatter(x=df1.index, y=df1[col],
                              mode='lines+markers', line=dict(color=colors[i])))
fig.update_traces(connectgaps=False)
fig.show()
I have a pandas dataframe where one of the columns is a set of labels that I would like to plot each of the other columns against in subplots. In other words, I want the y-axis of each subplot to use the same column, called 'labels', and I want a subplot for each of the remaining columns with the data from each column on the x-axis. I expected the following code snippet to achieve this, but I don't understand why this results in a single nonsensical plot:
examples.plot(subplots=True, layout=(-1, 3), figsize=(20, 20), y='labels', sharey=False)
The problem with that code is that you didn't specify an x value. It seems nonsensical because it plots the labels column against the row index (0 up to the number of rows minus one), and since 'labels' is the only y column, you get a single subplot. As far as I know, you can't do what you want in pandas directly. You might want to check out seaborn, though; it's another visualization library that has some nice grid-plotting helpers.
Here's an example with your data:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
examples = pd.DataFrame(np.random.rand(10,4), columns=['a', 'b', 'c', 'labels'])
# one row ('labels' on y), one column per remaining variable (on x)
g = sns.PairGrid(examples, x_vars=['a', 'b', 'c'], y_vars=['labels'])
g = g.map(plt.plot)
plt.show()
This creates the following plot:
Obviously it doesn't look great with random data, but hopefully with your data it will look better.
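If you'd rather stay with pandas and matplotlib, a short loop gets the same layout. It's not a single df.plot() call: you create the subplots yourself and pass each Axes to DataFrame.plot.scatter via ax=. A sketch using the same random example data:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

examples = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'labels'])

# every column except 'labels' goes on the x axis of its own subplot
x_cols = [c for c in examples.columns if c != 'labels']
fig, axes = plt.subplots(1, len(x_cols), figsize=(12, 4), sharey=True)
for ax, col in zip(axes, x_cols):
    # 'labels' is always on the shared y axis
    examples.plot.scatter(x=col, y='labels', ax=ax)
plt.tight_layout()
# plt.show()
```

Scatter markers are used instead of plt.plot lines, since connecting unsorted points with lines is what makes the random-data version look messy.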