In Plotly, in order to create scatter plots, I usually do the following:
fig = px.scatter(df, x=x, y=y)
fig.update_xaxes(range=[2, 10])
fig.update_yaxes(range=[2, 10])
I want the y-axis to intersect the x-axis at x=6. So, instead of the region left of the y-axis representing negative numbers, I want it to cover [2, 6], with the region right of the intersection covering [6, 10].
Likewise, the region below the x-axis should cover [2, 6], and the region above it should cover [6, 10].
How can I do this in Plotly?
Following on from my comment, as far as I am aware, what you're after is not currently available.
However, here is an example of a work-around which uses a shapes dictionary to add horizontal and vertical lines - acting as intersecting axes - placed at your required x/y intersection of 6.
Sample dataset:
import numpy as np
x = (np.random.randn(100)*2)+6
y1 = (np.random.randn(100)*2)+6
y2 = (np.random.randn(100)*2)+6
Example plotting code:
import plotly.io as pio
layout = {'title': 'Intersection of X/Y Axes Demonstration'}
shapes = []
traces = []
traces.append({'x': x, 'y': y1, 'mode': 'markers'})
traces.append({'x': x, 'y': y2, 'mode': 'markers'})
shapes.append({'type': 'line',
'x0': 2, 'x1': 10,
'y0': 6, 'y1': 6})
shapes.append({'type': 'line',
'x0': 6, 'x1': 6,
'y0': 2, 'y1': 10})
layout['shapes'] = shapes
layout['xaxis'] = {'range': [2, 10]}
layout['yaxis'] = {'range': [2, 10]}
pio.show({'data': traces, 'layout': layout})
Output:
Comments (TL;DR):
The example code shown here uses the low-level Plotly API (plotly.io), rather than a convenience wrapper such as graph_objects or express. The reason is that I (personally) feel it's helpful to users to show what is occurring 'under the hood', rather than masking the underlying code logic with a convenience wrapper.
This way, when the user needs to modify a finer detail of the graph, they will have a better understanding of the lists and dicts which Plotly is constructing for the underlying graphing engine (plotly.js).
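If the crossing point needs to change, the two line shapes from the example above can be generated from a single (x, y) pair. The following is only a minimal sketch: the helper name crossing_axes_shapes and its signature are hypothetical and not part of Plotly. It builds plain dictionaries in the same style as the example.

```python
def crossing_axes_shapes(cross_x, cross_y, x_range, y_range):
    """Return two Plotly line shapes that cross at (cross_x, cross_y).

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Horizontal line spanning the x range, placed at y = cross_y.
    horizontal = {'type': 'line',
                  'x0': x_range[0], 'x1': x_range[1],
                  'y0': cross_y, 'y1': cross_y}
    # Vertical line spanning the y range, placed at x = cross_x.
    vertical = {'type': 'line',
                'x0': cross_x, 'x1': cross_x,
                'y0': y_range[0], 'y1': y_range[1]}
    return [horizontal, vertical]


layout = {'title': 'Intersection of X/Y Axes Demonstration',
          'xaxis': {'range': [2, 10]},
          'yaxis': {'range': [2, 10]}}
layout['shapes'] = crossing_axes_shapes(6, 6,
                                        layout['xaxis']['range'],
                                        layout['yaxis']['range'])
```

The resulting layout dictionary can be passed to pio.show() together with the traces, exactly as in the plotting code above.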
I think fig.add_hline() and fig.add_vline() are the functions you need.
Example code
import plotly.express as px
import pandas as pd
df = pd.DataFrame({'x':[6,7,3], 'y':[4,5,6]})
fig = px.scatter(df, x='x', y='y')
fig.update_xaxes(range=[2, 10])
fig.update_yaxes(range=[2, 10])
fig.add_hline(y=6)
fig.add_vline(x=6)
fig.show()
Output
I have a dataset that looks like this:
x y z
0 Jan 28446000 110489.0
1 Feb 43267700 227900.0
When I plot a line chart like this:
px.line(data,x = 'x', y = ['y','z'], line_shape = 'spline', title="My Chart")
The y-axis scale runs from 0 to 90 M. The first line on the chart, for y, is good enough. However, the second line appears to always sit at 0 M. What can I do to improve my chart so that we can clearly see how the values of both columns change over the x values?
Is there any way I can normalize the data? Or perhaps I could change the scaling of the chart.
Often times we use data which is in different scales, and scaling the data would mask a characteristic we wish to display. One way to handle this is to add a secondary y-axis. An example is shown below.
Key points:
Create a layout dictionary object
Add a yaxis2 key to the dict, with the following: 'side': 'right', 'overlaying': 'y1'
This tells Plotly to create a secondary y-axis on the right side of the graph, and to overlay the primary y-axis.
Assign the appropriate trace to the newly created secondary y-axis as: 'yaxis': 'y2'
The other trace does not need to be assigned, as 'y1' is the default y-axis.
Comments (TL;DR):
The example code shown here uses the lower-level Plotly API, rather than a convenience wrapper such as graph_objects or express. The reason is that I (personally) feel it's helpful to users to show what is occurring 'under the hood', rather than masking the underlying code logic with a convenience wrapper.
This way, when the user needs to modify a finer detail of the graph, they will have a better understanding of the lists and dicts which Plotly is constructing for the underlying graphing engine (plotly.js).
The Docs:
Here is a link to the Plotly docs referencing multiple axes.
Example Code:
import pandas as pd
from plotly.offline import iplot
df = pd.DataFrame({'x': ['Jan', 'Feb'],
'y': [28446000, 43267700],
'z': [110489.0, 227900.0]})
layout = {'title': 'Secondary Y-Axis Demonstration',
'legend': {'orientation': 'h'}}
traces = []
traces.append({'x': df['x'], 'y': df['y'], 'name': 'Y Values'})
traces.append({'x': df['x'], 'y': df['z'], 'name': 'Z Values', 'yaxis': 'y2'})
# Add config common to all traces.
for t in traces:
    t.update({'line': {'shape': 'spline'}})
layout['yaxis1'] = {'title': 'Y Values', 'range': [0, 50000000]}
layout['yaxis2'] = {'title': 'Z Values', 'side': 'right', 'overlaying': 'y1', 'range': [0, 400000]}
iplot({'data': traces, 'layout': layout})
Graph:
I have two sets of data in separate lists. Each list element has a value from 0:100, and elements repeat.
For example:
first_data = [10,20,40,100,...,100,10,50]
second_data = [20,50,50,10,...,70,10,100]
I can plot one of these in a histogram using:
import plotly.graph_objects as go
.
.
.
fig = go.Figure()
fig.add_trace(go.Histogram(histfunc='count', x=first_data))
fig.show()
By setting histfunc to 'count', my histogram consists of an x-axis from 0 to 100 and bars for the number of repeated elements in first_data.
My question is: How can I overlay the second set of data over the same axis using the same "count" histogram?
One method to do this is by simply adding another trace; you were nearly there! The dataset used to create these examples can be found in the last section of this post.
Note:
The following code uses the 'lower-level' plotly API, as (personally) I feel it's more transparent and enables the user to see what is being plotted, and why; rather than relying on the convenience modules of graph_objects and express.
Option 1 - Overlaid Bars:
from plotly.offline import plot
layout = {}
traces = []
traces.append({'x': data1, 'name': 'D1', 'opacity': 1.0})
traces.append({'x': data2, 'name': 'D2', 'opacity': 0.5})
# For each trace, add elements which are common to both.
for t in traces:
    t.update({'type': 'histogram',
              'histfunc': 'count',
              'nbinsx': 50})
layout['barmode'] = 'overlay'
plot({'data': traces, 'layout': layout})
Output 1:
Option 2 - Curve Plot:
Another option is to plot the curve (Gaussian KDE) of the distribution, as shown here. It's worth noting that this method plots the probability density, rather than the counts.
X1, Y1 = calc_curve(data1)
X2, Y2 = calc_curve(data2)
traces = []
traces.append({'x': X1, 'y': Y1, 'name': 'D1'})
traces.append({'x': X2, 'y': Y2, 'name': 'D2'})
plot({'data': traces})
Output 2:
Associated calc_curve() function:
from scipy.stats import gaussian_kde
def calc_curve(data):
    """Calculate probability density."""
    min_, max_ = data.min(), data.max()
    X = [min_ + i * ((max_ - min_) / 500) for i in range(501)]
    Y = gaussian_kde(data).evaluate(X)
    return X, Y
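As a rough sanity check on the note that the curve is a probability density rather than a count, the sketch below integrates the curve (the area should be close to 1) and then rescales it to approximate counts per bin. The stand-in data, the 50-bin width and the variable names are assumptions made for illustration. calc_curve is restated (using np.linspace) so the snippet runs on its own.

```python
import numpy as np
from scipy.stats import gaussian_kde


def calc_curve(data):
    """Calculate probability density on 501 evenly spaced points."""
    X = np.linspace(data.min(), data.max(), 501)
    Y = gaussian_kde(data).evaluate(X)
    return X, Y


rng = np.random.default_rng(4)
data = rng.normal(50, 15, 5000)   # assumed stand-in for data1 / data2
X, Y = calc_curve(data)

# Trapezoidal estimate of the area under the density curve.
area = float(np.sum((Y[:-1] + Y[1:]) / 2 * np.diff(X)))

# Approximate expected count per bin: density * sample size * bin width.
bin_width = (data.max() - data.min()) / 50   # assumes 50 equal-width bins
Y_counts = Y * len(data) * bin_width

print(round(area, 3))
```

Plotting Y_counts instead of Y would let the curve sit on top of a 'count' histogram, as an alternative to switching the bars to 'probability density'.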
Option 3 - Plot Bars and Curves:
Or, you can always combine the two methods together, using the probability density on the yaxis.
layout = {}
traces = []
traces.append({'x': data1, 'name': 'D1', 'opacity': 1.0})
traces.append({'x': data2, 'name': 'D2', 'opacity': 0.5})
for t in traces:
    t.update({'type': 'histogram',
              'histnorm': 'probability density',
              'nbinsx': 50})
traces.append({'x': X1, 'y': Y1, 'name': 'D1'})
traces.append({'x': X2, 'y': Y2, 'name': 'D2'})
layout['barmode'] = 'overlay'
plot({'data': traces, 'layout': layout})
Output 3:
Dataset:
Here is the bit of code used to simulate your dataset of [0,100] values, and to create these examples:
import numpy as np
from sklearn.preprocessing import MinMaxScaler
mms = MinMaxScaler((0, 100))
np.random.seed(4)
data1 = mms.fit_transform(np.random.randn(10000).reshape(-1, 1)).ravel()
data2 = mms.fit_transform(np.random.randn(10000).reshape(-1, 1)).ravel()
I am trying to create a bar chart where the upper and lower bound of each bar could be above or below zero. Hence the boxes should "float" depending on the data. I'm also trying to use pandas.plot function as it makes my life way easier in the real application.
The solution I've devised is a horrible kludge and only partially works. Basically I'm running two different bar charts that overlap, with one of the bars being white to "hide" the main bar if necessary. I'm using a mask to mark which bars should be which color. As you can see, this works OK in the "London" and "Paris" example below, but in the "Tokyo" it isn't working because the green bar is "in front" of the white bar.
I could manually fix this a few ways that I can think of, but it would make an already kludgy solution even worse. I'm sure there's a better way that I'm just not smart enough to think of!
Here's the plot, and full code below.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df_data = {'Category':['London', 'Paris', 'New York', 'Tokyo'],
'Upper':[10, 5, 0, -5],
'Lower':[5, -5, -10, -10]}
df = pd.DataFrame(data = df_data)
#Color corrector
u_mask = df['Upper'] < 0
d_mask = df['Lower'] < 0
n = len(df)
uca = ['darkgreen' for i in range(n)]
uca = np.array(uca)
uc = uca.copy()
uc[u_mask] = 'white'
dca = ['white' for i in range(n)]
dca = np.array(dca, dtype=uca.dtype)
dc = dca.copy()
dc[d_mask] = 'darkgreen'
(df.plot(kind='bar', y='Upper', x='Category',
color=uc, legend=False))
ax = plt.gca()
(df.plot(kind='bar', y='Lower', x='Category',
color=dc, legend=False, ax=ax))
plt.axhline(0, color='black')
x_axis = ax.xaxis
x_axis.label.set_visible(False)
plt.subplots_adjust(left=0.1,right=0.90,bottom=0.2,top=0.90)
plt.show()
To create the plot via pandas, you could create an extra column with the height and use df.plot(..., y='Height', bottom=df['Lower']):
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df_data = {'Category': ['London', 'Paris', 'New York', 'Tokyo'],
'Upper': [10, 5, 0, -5],
'Lower': [5, -5, -10, -10]}
df = pd.DataFrame(data=df_data)
df['Height'] = df['Upper'] - df['Lower']
ax = df.plot(kind='bar', y='Height', x='Category', bottom=df['Lower'],
color='darkgreen', legend=False)
ax.axhline(0, color='black')
plt.tight_layout()
plt.show()
PS: Note that pandas barplot forces the lower ylim to be "sticky". This is a desired behavior when all values are positive and the bars stand firmly on y=0. However, this behavior is distracting when both positive and negative values are involved.
To remove the stickiness:
ax.use_sticky_edges = False # df.plot() makes the lower ylim sticky
ax.autoscale(enable=True, axis='y')
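To see the effect of the sticky edge, here is a small sketch that compares the y-limits before and after turning it off. It uses plain matplotlib bars rather than df.plot() to keep the snippet short, which is an assumption of this sketch. The Agg backend is selected only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this for interactive use
import matplotlib.pyplot as plt

upper = [10, 5, 0, -5]
lower = [5, -5, -10, -10]
height = [u - l for u, l in zip(upper, lower)]

fig, ax = plt.subplots()
ax.bar(range(len(lower)), height, bottom=lower, color='darkgreen')
ylim_sticky = ax.get_ylim()   # lower limit is pinned to a bar bottom

ax.use_sticky_edges = False   # allow margins below the lowest bar
ax.autoscale(enable=True, axis='y')
ylim_free = ax.get_ylim()     # lower limit now includes the usual margin

print(ylim_sticky, ylim_free)
```

With the sticky edges disabled, the lowest bars no longer touch the bottom of the axes, which tends to read better when bars extend below zero.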
plt.bar has a bottom parameter. You just need to calculate the heights. Here is a very simple example:
import matplotlib.pyplot as plt
upper = [10, 5, 0, -5]
lower = [5, -5, -10, -10]
# Height of each bar is the distance between its upper and lower bound.
height = [u - l for u, l in zip(upper, lower)]
plt.bar(range(len(lower)), height, bottom=lower)
plt.show()
I have plotted a figure with 2 subplots, each with a different scale. Everything plots correctly, except that the colorscales are both plotted on the right and completely overlap, so they are not readable. I cannot find out how to position or reposition the individual subplot scales. I have included my code below. Thanks.
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
df = pd.read_csv(entry)
custColorscale = [[0, 'green'], [0.5, 'red'], [1, 'rgb(50, 50, 50)']]
fig = make_subplots(
rows=1, cols=2, subplot_titles=('one', 'two'))
fig.add_trace(
go.Scatter(x=df['tO'],
y=df['t1'],
mode='markers',
marker=dict(colorscale=custColorscale,
cmin=0, cmax=2,
size=6, color=df['Var1'],
showscale=True),
text=df['Var2']),
1, 1)
fig.add_trace(
go.Scatter(x=df['tO'],
y=df['t1'],
mode='markers',
marker=dict(
size=6, color=df['Var2'],
showscale=True),
text=df['Var2']),
1, 2)
fig.update_layout(height=700, width=1900,
title='Raw data')
fig.update_layout(coloraxis=dict(
colorscale='Bluered_r'))
fig.write_html('raw plots.html', auto_open=True)
Looking through the Plotly documentation, you will find this reference, which provides some hints on how to solve the problem. Scroll to the 'marker' attributes and you will find that it has a sub-attribute called 'colorbar'. The colorbar in turn has multiple options that can help lay out the plot the way you want. In particular, its 'x', 'y' and 'len' attributes are very useful: you can use them to position the scales.
This question is also related to this one, which concerns a contour plot. Since you are making a scatter plot, the scatter plot reference is the one to search.
A minimal working example (MWE) is shown below but with a toy dataset.
## make necessary imports
import numpy as np
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import pandas as pd
## make a fake dataset with pandas
d = {'t0': [i for i in np.arange(0.,10.,1.)],
     't1': [i for i in np.arange(10.,20.,1.)],
     'Var1': [i for i in np.arange(20.,30.,1.)],
     'Var2': [i for i in np.arange(30.,40.,1.)]}
df = pd.DataFrame(data=d) #the dataset is made to mock the example code you provided
And for your plot you have the following :
# make subplots
custColorscale = [[0, 'green'], [0.5, 'red'], [1, 'rgb(50, 50, 50)']]
fig = make_subplots(
rows=1, cols=2, subplot_titles=('one', 'two'),horizontal_spacing = 0.4)
# plot 1
fig.add_trace(
go.Scatter(x=df['t0'],
y=df['t1'],
mode='markers',
marker=dict(colorscale=custColorscale,
cmin=0, cmax=2,
size=6, color=df['Var1'],
showscale=True,colorbar=dict(len=1.05, x=0.35
,y=0.49)), text=df['Var2']), 1, 1)
## plot 2
fig.add_trace(
go.Scatter(x=df['t0'],
y=df['t1'],
mode='markers',
marker=dict(
size=6, color=df['Var2'],
showscale=True,colorbar=dict(len=1.05, x=1.2 , y=0.49)),
text=df['Var2']),
1, 2 )
# show plots
fig.update_layout(height=500, width=700,
title='Raw data')
fig.update_layout(coloraxis=dict(
colorscale='Bluered_r'))
fig.show()
The only additions were:
The colorbar attribute of the marker.
The horizontal spacing to allow space for the first scale.
Feel free to play with these attributes.
I hope this helps!
Best regards.
I'm using the code below to generate a scatter plot in pyplot where I'd like to have each of the 9 classes plotted in a different color. There are multiple points within each class.
I cannot figure out why the legend does not work with smaller sample sizes.
def plot_scatter_test(x, y, c, title):
    data = pd.DataFrame({'x': x, 'y': y, 'c': c})
    classes = len(np.unique(c))
    colors = cm.rainbow(np.linspace(0, 1, classes))
    ax = plt.subplot(111)
    for s in range(0,classes):
        ss = data[data['c']==s]
        plt.scatter(x=ss['x'], y=ss['y'],c=colors[s], label=s)
    ax.legend(loc='lower left',scatterpoints=1, ncol=3, fontsize=8, bbox_to_anchor=(0, -.4), title='Legend')
    plt.show()
My data looks like this
When I plot this by calling
plot_scatter_test(test['x'], test['y'],test['group'])
I get varying colors in the chart, but the legend is a single color
So to make sure my data was ok, I created a random dataframe using the same type of data. Now I get different colors, but something is still wrong as they aren't sequential.
test2 = pd.DataFrame({
'y': np.random.uniform(0,1400,36),
'x': np.random.uniform(-250,-220,36),
'group': np.random.randint(0,9,36)
})
plot_scatter_test(test2['x'], test2['y'],test2['group'])
Finally, I create a larger plot of 360 data points, and everything looks the way I would expect it to. What am I doing wrong?
test3 = pd.DataFrame({
'y': np.random.uniform(0,1400,360),
'x': np.random.uniform(-250,-220,360),
'group': np.random.randint(0,9,360)
})
plot_scatter_test(test3['x'], test3['y'],test3['group'])
You need to make sure not to confuse the class itself with the number you use for indexing.
To better observe what I mean, use the following dataset with your function:
np.random.seed(22)
X,Y= np.meshgrid(np.arange(3,7), np.arange(4,8))
test2 = pd.DataFrame({
'y': Y.flatten(),
'x': X.flatten(),
'group': np.random.randint(0,9,len(X.flatten()))
})
plot_scatter_test(test2['x'], test2['y'],test2['group'])
which results in the following plot, where points are missing.
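The reason points go missing can be shown without any plotting. In the sketch below (the label array is made up for illustration), looping over range(len(classes)) treats the loop index as if it were the class label, so any label that is not also a valid index is never selected.

```python
import numpy as np

c = np.array([0, 2, 2, 5, 8, 8])   # made-up class labels with gaps
classes = np.unique(c)              # the distinct labels: 0, 2, 5, 8

# Original approach: the loop index s is used as if it were the label.
plotted_by_index = [s for s in range(len(classes)) if np.any(c == s)]

# Corrected approach: iterate over the labels themselves.
plotted_by_label = [int(clas) for clas in classes if np.any(c == clas)]

missed = sorted(set(plotted_by_label) - set(plotted_by_index))
print(plotted_by_index, plotted_by_label, missed)
# prints: [0, 2] [0, 2, 5, 8] [5, 8]
```

With a small sample it is likely that some labels are absent, which is why the problem shows up there and not with 360 points, where all nine classes are almost certainly present.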
So, make a clear distinction between the index and the class, e.g. as follows
import numpy as np; np.random.seed(22)
import matplotlib.pyplot as plt
import pandas as pd
def plot_scatter_test(x, y, c, title="title"):
    data = pd.DataFrame({'x': x, 'y': y, 'c': c})
    classes = np.unique(c)
    print(classes)
    colors = plt.cm.rainbow(np.linspace(0, 1, len(classes)))
    print(colors)
    ax = plt.subplot(111)
    # i indexes the color array; clas is the actual class label.
    for i, clas in enumerate(classes):
        ss = data[data['c']==clas]
        plt.scatter(ss["x"],ss["y"],c=[colors[i]]*len(ss), label=clas)
    ax.legend(loc='lower left',scatterpoints=1, ncol=3, fontsize=8, title='Legend')
    plt.show()
X,Y= np.meshgrid(np.arange(3,7), np.arange(4,8))
test2 = pd.DataFrame({
'y': Y.flatten(),
'x': X.flatten(),
'group': np.random.randint(0,9,len(X.flatten()))
})
plot_scatter_test(test2['x'], test2['y'],test2['group'])
Apart from that it is indeed necessary not to supply the color 4-tuple directly to c as this would be interpreted as four single colors.
I feel silly now after staring at this for a while. The error was in the color being passed. I was passing a single color to the .scatter function. However since there are multiple points, you need to pass an equal number of colors. Therefore
plt.scatter(x=ss['x'], y=ss['y'],c=colors[s], label=s)
Can be something like
plt.scatter(x=ss['x'], y=ss['y'],c=[colors[s]]*len(ss), label=s)
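As a hedged aside, repeating the color is not the only way to avoid the ambiguity. The sketch below (toy data, with the Agg backend chosen only so it runs without a display) shows three spellings that each give every point in a group the same RGBA color: the color keyword, a repeated list passed to c, and a 2-D array with a single row passed to c.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this for interactive use
import matplotlib.pyplot as plt
import numpy as np

rgba = plt.cm.rainbow(0.5)            # one RGBA color as a 4-tuple
x = np.array([1.0, 2.0, 3.0, 4.0])    # four points: the ambiguous case for c=rgba
y = np.array([4.0, 3.0, 2.0, 1.0])

fig, ax = plt.subplots()
# 1) color= always means "one color for all points".
sc_color = ax.scatter(x, y, color=rgba, label='color=')
# 2) c= with one RGBA entry per point.
sc_list = ax.scatter(x, y + 1, c=[rgba] * len(x), label='c=[rgba]*n')
# 3) c= with a 2-D array holding a single RGBA row.
sc_2d = ax.scatter(x, y + 2, c=np.atleast_2d(rgba), label='2-D c')
ax.legend()
```

All three should produce a legend entry in the group's color. Which one to use is mostly a matter of taste, though color= is probably the least surprising.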