I have a dataset that looks like this:
x y z
0 Jan 28446000 110489.0
1 Feb 43267700 227900.0
When I plot a line chart like this:
px.line(data,x = 'x', y = ['y','z'], line_shape = 'spline', title="My Chart")
The y-axis scale runs from 0 to 90M. The line for y looks good enough, but the line for z appears to sit at 0M throughout. What can I do to improve my chart so that we can clearly see how the values of both columns change over the x values?
Is there any way I can normalize the data? Or perhaps I could change the scaling of the chart.
We often work with data on very different scales, where rescaling the data would mask a characteristic we wish to display. One way to handle this is to add a secondary y-axis. An example is shown below.
Key points:
Create a layout dictionary object
Add a yaxis2 key to the dict, with the following: 'side': 'right', 'overlaying': 'y1'
This tells Plotly to create a secondary y-axis on the right side of the graph, and to overlay the primary y-axis.
Assign the appropriate trace to the newly created secondary y-axis as: 'yaxis': 'y2'
The other trace does not need to be assigned, as 'y1' is the default y-axis.
Comments (TL;DR):
The example code shown here uses the lower-level Plotly API, rather than a convenience wrapper such as graph_objects or express. The reason is that I (personally) feel it's helpful to users to show what is occurring 'under the hood', rather than masking the underlying code logic with a convenience wrapper.
This way, when the user needs to modify a finer detail of the graph, they will have a better understanding of the lists and dicts which Plotly is constructing for the underlying graphing engine (plotly.js).
The Docs:
Here is a link to the Plotly docs referencing multiple axes.
Example Code:
import pandas as pd
from plotly.offline import iplot
df = pd.DataFrame({'x': ['Jan', 'Feb'],
                   'y': [28446000, 43267700],
                   'z': [110489.0, 227900.0]})
layout = {'title': 'Secondary Y-Axis Demonstration',
          'legend': {'orientation': 'h'}}
traces = []
traces.append({'x': df['x'], 'y': df['y'], 'name': 'Y Values'})
traces.append({'x': df['x'], 'y': df['z'], 'name': 'Z Values', 'yaxis': 'y2'})
# Add config common to all traces.
for t in traces:
    t.update({'line': {'shape': 'spline'}})
layout['yaxis1'] = {'title': 'Y Values', 'range': [0, 50000000]}
layout['yaxis2'] = {'title': 'Z Values', 'side': 'right', 'overlaying': 'y1', 'range': [0, 400000]}
iplot({'data': traces, 'layout': layout})
Graph:
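On the normalization the question asks about: here is a minimal sketch, assuming it is acceptable to show each series as a fraction of its own maximum rather than in its original units. The helper name scale_to_max is made up for illustration, and the figure is built from plain dicts in the same style as the example code. Note that this hides the absolute magnitudes, which is exactly the trade-off mentioned at the start of this answer.

```python
def scale_to_max(values):
    """Return each value as a fraction of the series maximum (hypothetical helper)."""
    peak = max(values)
    return [v / peak for v in values]

# Same sample values as in the question.
x = ['Jan', 'Feb']
y = [28446000, 43267700]
z = [110489.0, 227900.0]

# Both traces now share one axis that runs from 0 to 1.
traces = [
    {'x': x, 'y': scale_to_max(y), 'name': 'y / max(y)', 'line': {'shape': 'spline'}},
    {'x': x, 'y': scale_to_max(z), 'name': 'z / max(z)', 'line': {'shape': 'spline'}},
]
layout = {'title': 'Both series scaled to their own maximum',
          'yaxis': {'title': 'Fraction of series maximum', 'range': [0, 1.05]}}
figure = {'data': traces, 'layout': layout}
# The figure dict can then be rendered the same way as above, e.g. iplot(figure).
```

With only the scaled values plotted, the original units are lost, so the secondary y-axis is usually the better choice when the magnitudes themselves matter.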
I have a DataFrame that contains two columns:
Nucleotide (ordinal, only unique values)
Similarities (quantitative, count of specific Nucleotide)
I want to plot an interactive bar chart using Streamlit, where each Nucleotide will have different color, like on the example below:
I know how to do it using matplotlib or seaborn, but these figures are not interactive.
Also my approach using vega-lite does not work, because the 'c' argument for the colormap cannot refer to the axis being already used on the plot.
st.vega_lite_chart(df, {
'mark': {'type': 'bar', 'tooltip': True},
'encoding': {
'x': {'field': 'Nucleotide', 'type': 'ordinal'},
'y': {'field': 'Similarities', 'type': 'quantitative'},
'color': {'field': 'Nucleotide', 'type': 'ordinal'},
},
})
Do you maybe have some other ideas?
Altair is a great choice here in my view. It comes out of the box with Streamlit and creates very nice-looking, interactive charts. Basically, you have to create a Chart object, pass in the data that you want to plot, and use the column names for things like x, y, or color.
For your example, the code would read like
import altair as alt
import streamlit as st
chart = (
    alt.Chart(data)
    .mark_bar()
    .encode(
        alt.X("Nucleotide:O"),
        alt.Y("Similarities"),
        alt.Color("Nucleotide:O"),
        alt.Tooltip(["Nucleotide", "Similarities"]),
    )
    .interactive()
)
st.altair_chart(chart)
assuming your dataframe is called data and the columns are called "Nucleotide" and "Similarities".
This would be a very basic bar chart that you can zoom in and hover over to see a tooltip.
I have a data source that I'm trying to bin and build a histogram out of. (Note that the data below is just as an example of the post-processed bin data).
My goal is to draw vertical lines to annotate different parts of the axis.
I got relatively close following other StackOverflow answers, but the problem is that the axis for the vertical lines is separate from the axis for the binned data. My guess is that this is because the x values for the vertical lines are quantitative while the binned data is categorical.
Is there any way to have the vertical lines align with the x-axis on the bottom?
import altair as alt
import pandas as pd
data_bar = pd.DataFrame({
    'bin': [0.78,0.82,0.88,0.92,0.98,1.02,1.08,1.12,1.18,1.23,1.27,1.32,1.38],
    'freq': [0,3,18,95,279,416,660,411,263,200,53,22,0]
})
data_bar['bin'] = data_bar['bin'].astype('category')
data_lines = pd.DataFrame({
    'value': [0.8, 0.88, 1.001, 1.38],
    'title': ['no_match', 'match', 'no_match', 'match']
})
bar = alt.Chart(data_bar).mark_bar().encode(x='bin', y='freq')
vertlines = alt.Chart(data_lines).mark_rule(
    color='black',
    strokeWidth=2
).encode(x='value')
text = alt.Chart(data_lines).mark_text(
    align='left', dx=5, dy=-5
).encode(
    x='value', text='title')
alt.layer(bar + vertlines + text).properties(width=500)
For reference, here is the graph in a vega editor.
You need to plot your binned data on a quantitative axis, which you can do by setting bin='binned' and adding an x2 encoding to specify the upper limit of each bin. Here are the required modifications to the data frame and the bar chart; the rest can stay the same:
data_bar = pd.DataFrame({
    'bin': [0.78,0.82,0.88,0.92,0.98,1.02,1.08,1.12,1.18,1.23,1.27,1.32,1.38],
    'freq': [0,3,18,95,279,416,660,411,263,200,53,22,0]
})
# Upper edge of each bin is the next bin's lower edge; the last bin gets max + 0.05.
data_bar['bin_max'] = data_bar['bin'].shift(-1).fillna(data_bar['bin'].max() + 0.05)
# Note: don't convert data_bar['bin'] to category
bar = alt.Chart(data_bar).mark_bar().encode(
    x=alt.X('bin', bin='binned'),
    x2='bin_max',
    y='freq')
Here is the resulting chart:
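To make the bin_max step concrete, here is a plain-Python sketch of what shift(-1) followed by fillna(...) produces for this data, plus a lookup showing which bin each vertical line lands in once both layers share a quantitative axis. The names bin_for and last_bin_width are made up for illustration; the 0.05 padding is the same one used in the answer.

```python
from bisect import bisect_right

bins = [0.78, 0.82, 0.88, 0.92, 0.98, 1.02, 1.08, 1.12, 1.18, 1.23, 1.27, 1.32, 1.38]
last_bin_width = 0.05  # padding for the final bin, as in the answer

# Equivalent of shift(-1): each bin's upper edge is the next bin's lower edge.
# Equivalent of fillna(...): the final bin gets max(bins) + last_bin_width.
bin_max = bins[1:] + [max(bins) + last_bin_width]

def bin_for(value):
    """Return the (lower, upper) edges of the bin containing value (hypothetical helper)."""
    i = bisect_right(bins, value) - 1
    return bins[i], bin_max[i]

# Where the rule marks from data_lines fall on the shared axis.
for value in [0.8, 0.88, 1.001, 1.38]:
    print(value, bin_for(value))
```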
In Plotly, in order to create scatter plots, I usually do the following:
fig = px.scatter(df, x=x, y=y)
fig.update_xaxes(range=[2, 10])
fig.update_yaxes(range=[2, 10])
I want the y-axis to intersect the x-axis at x=6. So, instead of the left side of the y-axis representing negative numbers, I want it to cover [2, 6]; to the right of the intersection, the graph covers [6, 10].
Likewise, below the x-axis the y-axis runs over [2, 6], and above the x-axis it runs over [6, 10].
How can I do this in Plotly?
Following on from my comment, as far as I am aware, what you're after is not currently available.
However, here is an example of a work-around which uses a shapes dictionary to add horizontal and vertical lines - acting as intersecting axes - placed at your required x/y intersection of 6.
Sample dataset:
import numpy as np
x = (np.random.randn(100)*2)+6
y1 = (np.random.randn(100)*2)+6
y2 = (np.random.randn(100)*2)+6
Example plotting code:
import plotly.io as pio
layout = {'title': 'Intersection of X/Y Axes Demonstration'}
shapes = []
traces = []
traces.append({'x': x, 'y': y1, 'mode': 'markers'})
traces.append({'x': x, 'y': y2, 'mode': 'markers'})
shapes.append({'type': 'line',
               'x0': 2, 'x1': 10,
               'y0': 6, 'y1': 6})
shapes.append({'type': 'line',
               'x0': 6, 'x1': 6,
               'y0': 2, 'y1': 10})
layout['shapes'] = shapes
layout['xaxis'] = {'range': [2, 10]}
layout['yaxis'] = {'range': [2, 10]}
pio.show({'data': traces, 'layout': layout})
Output:
Comments (TL;DR):
The example code shown here uses the low-level Plotly API (plotly.io), rather than a convenience wrapper such as graph_objects or express. The reason is that I (personally) feel it's helpful to users to show what is occurring 'under the hood', rather than masking the underlying code logic with a convenience wrapper.
This way, when the user needs to modify a finer detail of the graph, they will have a better understanding of the lists and dicts which Plotly is constructing for the underlying graphing engine (plotly.js).
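If the intersection point or the axis ranges need to change, the two shape dicts can be generated rather than typed by hand. This is only a sketch: axis_cross_shapes is a hypothetical helper, not part of Plotly, and it simply builds the same kind of 'line' shape dicts used in this workaround.

```python
def axis_cross_shapes(cross_x, cross_y, x_range, y_range):
    """Build two line shapes that act as axes crossing at (cross_x, cross_y).

    Hypothetical helper: returns plain dicts for layout['shapes'].
    """
    horizontal = {'type': 'line',
                  'x0': x_range[0], 'x1': x_range[1],
                  'y0': cross_y, 'y1': cross_y}
    vertical = {'type': 'line',
                'x0': cross_x, 'x1': cross_x,
                'y0': y_range[0], 'y1': y_range[1]}
    return [horizontal, vertical]

x_range = [2, 10]
y_range = [2, 10]
layout = {'title': 'Intersection of X/Y Axes Demonstration',
          'xaxis': {'range': x_range},
          'yaxis': {'range': y_range}}
# Cross the pseudo-axes at (6, 6), as asked in the question.
layout['shapes'] = axis_cross_shapes(6, 6, x_range, y_range)
```

The resulting layout dict can be used in place of the hand-written one when building the figure.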
I think fig.add_hline() and fig.add_vline() are the functions you need.
Example code
import plotly.express as px
import pandas as pd
df = pd.DataFrame({'x':[6,7,3], 'y':[4,5,6]})
fig = px.scatter(df, x='x', y='y')
fig.update_xaxes(range=[2, 10])
fig.update_yaxes(range=[2, 10])
fig.add_hline(y=4)
fig.add_vline(x=6)
fig.show()
Output
First question, so go easy on me please!
I'm trying to create a scatter plot in Plotly where the points are coloured according to a datetime column, but it seems to error out. It works fine if I set the color to, say, a numeric column. Is there a way to do this please?
Sample code below. If I change color to, say, np.arange(0, graph_data.shape[0]) it works fine, but the colorbar labels would be meaningless.
fig1 = go.Figure()
fig1.add_trace(go.Scatter(
    x=graph_data['x_data'],
    y=graph_data['y_data'],
    mode='markers',
    marker={
        'size': 15,
        'opacity': 0.95,
        'line': {'width': 0.5, 'color': 'white'},
        'color': graph_data['date'],
        'colorbar': {'title': 'Date'},
        'colorscale': 'Viridis'
    }
)
)
There may be a better way to do this, but one possible workaround is to convert your datetime to the number of seconds since a reference date. You could try the following using the datetime module:
int(datetime.datetime.now(datetime.timezone.utc).timestamp())
This will then be an integer which will be understood by the scatter function.
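Applied to a whole column rather than to the current time, the idea might look like the sketch below. The sample dates are made up, and using the earliest date as the reference point is an assumption; any fixed reference would work.

```python
import datetime

# Made-up sample dates standing in for the 'date' column.
dates = [
    datetime.datetime(2021, 1, 1),
    datetime.datetime(2021, 1, 2, 12),
    datetime.datetime(2021, 1, 8),
]

# Use the earliest date as the reference, so the colour values start at 0.
reference = min(dates)
seconds_since_reference = [int((d - reference).total_seconds()) for d in dates]
# seconds_since_reference is a list of plain integers that can be used
# as the marker 'color' values.
```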
Using the principles of Matt's workaround, I created a 'number of days from start' column for the colorscale to reference, and then customised the tick labels and spacing on the colour bar as follows:
# 'date' column contains the dates I want to set colorscale on
# set minimum date
min_date = graph_data['date'].min()
# create column giving number of days from the minimum date, as the colour reference
graph_data['days'] = graph_data['date'].apply(lambda x: (x-min_date).days)
# here I want colorbar tick labels every 7 days, so I create a list of the
# required multiples of 7
max_days = graph_data['days'].max()
fig1ticks = np.arange(0, (int(max_days/7)+1)*7, 7)
# use datetime.timedelta function to create the dates that match the tick values
fig1datetimes = [min_date + datetime.timedelta(days=i) for i in fig1ticks.tolist()]
# and create text strings of these dates in a suitable format
fig1text = [i.strftime("%d-%b-%Y") for i in fig1datetimes]
fig1 = go.Figure()
fig1.add_trace(go.Scatter(
    x=graph_data['x_data'],
    y=graph_data['y_data'],
    mode='markers',
    marker={
        'size': 15,
        'opacity': 0.95,
        'line': {'width': 0.5, 'color': 'white'},
        # set color reference to new 'days' column
        'color': graph_data['days'],
        # set 'tickvals' and 'ticktext' in colorbar dict
        'colorbar': {'title': 'Date',
                     'tickvals': fig1ticks,
                     'ticktext': fig1text,
                     },
        'colorscale': 'Viridis'
    }
)
)
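As a worked example of the tick arithmetic, here is a small standard-library sketch with made-up values (a start date of 1 January 2021 and a span of 20 days). It uses an ISO date format for the labels instead of "%d-%b-%Y" only to keep the output independent of locale settings.

```python
import datetime

min_date = datetime.date(2021, 1, 1)  # made-up earliest date
max_days = 20                         # made-up span of the data, in days

# One tick every 7 days, covering every whole week up to max_days.
tick_days = list(range(0, (max_days // 7 + 1) * 7, 7))

# The date label that belongs to each tick value.
tick_text = [(min_date + datetime.timedelta(days=d)).strftime("%Y-%m-%d")
             for d in tick_days]

colorbar = {'title': 'Date', 'tickvals': tick_days, 'ticktext': tick_text}
```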
Sorry beforehand for the long post. I'm new to python and to plotly, so please bear with me.
I'm trying to make a scatterplot with a trendline and a legend that includes the regression parameters, but I can't understand why px.scatter doesn't show a legend for my trace. Here is my code:
fig1 = px.scatter(data_frame = dataframe,
                  x="xdata",
                  y="ydata",
                  trendline = 'ols')
fig1.layout.showlegend = True
fig1.show()
This displays the scatterplot and the trendline, but no legend even when I tried to override it.
I used pio.write_json(fig1, "fig1.plotly") to export it to jupyterlab plotly chart studio and add manually the legend, but even though I enabled it, it won't show either in the chart studio.
I printed the variable with print(fig1) to see what's happening, this is (part of) the result
(Scatter({
'hovertemplate': '%co=%{x}<br>RPM=%{y}<extra></extra>',
'legendgroup': '',
'marker': {'color': '#636efa', 'symbol': 'circle'},
'mode': 'markers',
'name': '',
'showlegend': False,
'x': array([*** some x data ***]),
'xaxis': 'x',
'y': array([*** some y data ***]),
'yaxis': 'y'
}), Scatter({
'hovertemplate': ('<b>OLS trendline</b><br>RPM = ' ... ' <b>(trend)</b><extra></extra>'),
'legendgroup': '',
'marker': {'color': '#636efa', 'symbol': 'circle'},
'mode': 'lines',
'name': '',
'showlegend': False,
'x': array([*** some x data ***]),
'xaxis': 'x',
'y': array([ *** some y data ***]),
'yaxis': 'y'
}))
As we can see, creating a figure with px.scatter by default hides the legend when there's a single trace (I experimented adding a color property to px.scatter and it showed the legend), and searching the px.scatter documentation I can't find something related to override the legend setting.
I went back to the exported file (fig1.plotly.json) and manually changed the showlegend entries to True and then I could see the legend in the chart studio, but there has to be some way to do it directly from the command.
Here's the question:
Does anyone know a way to customize Plotly Express (px) figure objects?
Another workaround I see is to use low level plotly graph object creation, but then I don't know how to add a trendline.
Thank you again for reading through all of this.
You must specify that you'd like to display a legend and provide a legend name like this:
fig['data'][0]['showlegend']=True
fig['data'][0]['name']='Sepal length'
Plot:
Complete code:
import plotly.express as px
df = px.data.iris() # iris is a pandas DataFrame
fig = px.scatter(df, x="sepal_width", y="sepal_length",
                 trendline='ols',
                 trendline_color_override='red')
fig['data'][0]['showlegend']=True
fig['data'][0]['name']='Sepal length'
fig.show()
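The manual edit of the exported JSON file described in the question can also be scripted. The sketch below works on a plain dict, assuming the export has the usual 'data' and 'layout' keys; show_legend is a hypothetical helper, the trace names are invented, and the JSON string is a cut-down stand-in for the real file contents.

```python
import json

def show_legend(figure_dict, names):
    """Turn the legend on and name each trace in a figure dict (hypothetical helper)."""
    for trace, name in zip(figure_dict['data'], names):
        trace['showlegend'] = True
        trace['name'] = name
    figure_dict.setdefault('layout', {})['showlegend'] = True
    return figure_dict

# Stand-in for the exported figure: two traces with the legend switched off.
exported = json.loads(
    '{"data": ['
    '{"mode": "markers", "name": "", "showlegend": false},'
    '{"mode": "lines", "name": "", "showlegend": false}'
    '], "layout": {}}'
)
patched = show_legend(exported, ['Observations', 'OLS trendline'])
```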