Dash/plotly, show only top 10 values in histogram - python

I am creating a dashboard in Dash for a university course. I created 3 histograms, but the data contains many unique values, which makes the x-axis very long. In my plots I would like to show only the 10 or 20 values with the highest counts (the top 10 values). Can someone help me out?
import plotly.express as px
from jupyter_dash import JupyterDash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
# posts is the asker's DataFrame (not shown in the question)
# Build App
app = JupyterDash(__name__)
app.layout = html.Div([
    html.H1("forensics "),
    dcc.Graph(id='graph'),
    dcc.Graph(id='graph1'),
    dcc.Graph(id='graph2'),
    html.Label([
        "select market",
        dcc.Dropdown(
            id='market', clearable=False,
            value='whitehousemarket', options=[
                {'label': c, 'value': c}
                for c in posts['marketextract'].unique()
            ])
    ]),
])
# Define callback to update graph
@app.callback(
    Output('graph', 'figure'),
    Output('graph1', 'figure'),
    Output('graph2', 'figure'),
    [Input("market", "value")]
)
def update_figure(market):
    fig = px.histogram(x=posts['datetime'].loc[posts['marketextract']==market])
    fig1 = px.histogram(x=posts['username'].loc[posts['marketextract']==market])
    fig2 = px.histogram(x=posts['drugs'].loc[posts['marketextract']==market])
    return [fig, fig1, fig2]
# Run app and display result inline in the notebook
app.run_server(mode='inline')

To my knowledge, px.histogram() has no option to exclude certain observations or bins. But judging by your code (please consider sharing a proper data sample), what you're doing here is just showing the counts of, for example, different user names. You can easily do that by combining df.groupby() with px.histogram. px.bar() or go.Bar() would work as well, but we'll stick with px.histogram since that is what you're asking about. Using random selections of country names from px.data.gapminder(), you can use:
dfg = df.groupby(['name']).size().to_frame().sort_values([0], ascending = False).head(10).reset_index()
dfg.columns = ['name', 'count']  # the size column is labelled 0 until it is renamed
fig = px.histogram(dfg, x='name', y = 'count')
And get:
If you drop .head(10) you'll get this instead:
I hope this is the sort of functionality you were looking for. Don't be intimidated by the long df.groupby(['name']).size().to_frame().sort_values([0], ascending = False).reset_index() chain. I'm not a pandas expert, so you could quite possibly find a more efficient approach, but it does the job. Here's the complete code with some sample data (add .head(10) after sort_values(...) to keep only the top 10):
# imports
import pandas as pd
import plotly.express as px
import random
# data sample
gapminder = list(set(px.data.gapminder()['country']))[1:20]
names = random.choices(gapminder, k=100)
# data munging
df = pd.DataFrame({'name':names})
dfg = df.groupby(['name']).size().to_frame().sort_values([0], ascending = False).reset_index()
dfg.columns = ['name', 'count']
# plotly
fig = px.histogram(dfg, x='name', y = 'count')
fig.layout.yaxis.title.text = 'count'
fig.show()
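
The same top-N table can probably be built more directly with Series.value_counts(), which already returns counts sorted in descending order. This is only a minimal sketch, not the answerer's method: the helper name top_n_counts and the sample names are made up for illustration, and the input is assumed to be a single column of labels such as the username column in the question.

```python
import pandas as pd


def top_n_counts(values: pd.Series, n: int = 10) -> pd.DataFrame:
    """Return the n most frequent values and their counts, largest first."""
    counts = values.value_counts().head(n)  # value_counts sorts descending
    # name the index and the count column explicitly so plotting code can rely on them
    return counts.rename_axis("name").reset_index(name="count")


# hypothetical sample data standing in for a column like posts['username']
names = pd.Series(["ann", "bob", "ann", "cy", "bob", "ann", "dee"])
top2 = top_n_counts(names, n=2)
print(top2)
```

The resulting frame can be passed to px.bar(top2, x="name", y="count"). Inside the callback from the question, values would be the column already filtered for the selected market.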

Related

How to add a number of points on a current view of plotly scatterplot?

I have a Plotly-generated plot in Python.
It can be zoomed, or a specific region can be selected with a selection window.
Is there a way to calculate the number of points in the current view of the scatterplot?
E.g. the initial screen shows 1000 points, but when I zoom or use a window to choose a specific area, I want to see that this area includes only 100 points from the initial scatterplot. Is that possible? Or maybe I could get the bounds of the plot's x-axis to use them further in the dashboard, e.g. to calculate max/min/mean values for the points on the screen.
You clearly state dashboard, hence I'm assuming Dash.
Zooming and panning trigger a callback on the graph's relayoutData property.
The callback receives a dict that can be parsed for the min/max x and y values.
The code below shows this, filtering the dataframe used to create the scatter plot to get the number of visible points.
import dash
import plotly.express as px
from dash.dependencies import Input, Output, State
from jupyter_dash import JupyterDash
import numpy as np
import pandas as pd
r = np.random.RandomState(42)
# some data to plot
df = pd.DataFrame(
    {"x-val": np.linspace(1, 100, 1000), "y-val": r.uniform(1, 100, 1000)}
)
fig = px.scatter(df, x="x-val", y="y-val")
app = JupyterDash(__name__)
app.layout = dash.html.Div(
    [dash.dcc.Graph(id="graph", figure=fig), dash.html.Div(id="debug")]
)
# simple callback capture zoom and pan
@app.callback(Output("debug", "children"), Input("graph", "relayoutData"))
def figEvent(relayoutData):
    # relayoutData can be None before the first layout event
    r = relayoutData or {}
    # parse out min & max values displayed
    rng = {
        ax: [df[c].min(), df[c].max()]
        if f"{ax}axis.range[0]" not in r.keys()
        else [r[f"{ax}axis.range[0]"], r[f"{ax}axis.range[1]"]]
        for ax, c in zip("xy", ["x-val", "y-val"])
    }
    # filter dataframe and get number of rows
    n = df.loc[df["x-val"].between(*rng["x"]) & df["y-val"].between(*rng["y"])].shape[0]
    return n
app.run_server(mode="inline", debug=True)
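
The range parsing can also be pulled out of the callback so it can be checked without a running app. The sketch below is one way to do that under a few assumptions: the helper names visible_range and count_visible are invented here, the points are plain (x, y) tuples rather than a dataframe, and any event without explicit range keys (for example None, or an autorange reset) is treated as showing the full data range.

```python
def visible_range(relayout_data, axis, full_range):
    """Return [low, high] currently shown on one axis ('x' or 'y').

    Falls back to full_range when the event carries no explicit range,
    e.g. when relayout_data is None or only reports an autorange reset.
    """
    if not relayout_data:
        return list(full_range)
    low_key = f"{axis}axis.range[0]"
    high_key = f"{axis}axis.range[1]"
    if low_key in relayout_data and high_key in relayout_data:
        return [relayout_data[low_key], relayout_data[high_key]]
    return list(full_range)


def count_visible(points, relayout_data, x_full, y_full):
    """Count the (x, y) points that fall inside the visible window."""
    x_lo, x_hi = visible_range(relayout_data, "x", x_full)
    y_lo, y_hi = visible_range(relayout_data, "y", y_full)
    return sum(1 for x, y in points if x_lo <= x <= x_hi and y_lo <= y <= y_hi)


# made-up data and events for illustration
points = [(1, 1), (2, 5), (3, 9), (4, 2)]
zoom_event = {
    "xaxis.range[0]": 1.5, "xaxis.range[1]": 3.5,
    "yaxis.range[0]": 0, "yaxis.range[1]": 6,
}
n_all = count_visible(points, None, (1, 4), (1, 9))
n_zoom = count_visible(points, zoom_event, (1, 4), (1, 9))
n_reset = count_visible(points, {"xaxis.autorange": True}, (1, 4), (1, 9))
```

The same window could be used to compute max/min/mean of the visible points instead of a count, which is the other use the question mentions.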

Error with Multi Dropdown List in Python Dash

I'm trying to visualize the price development of different financial indices (A, B, C) in an interactive line chart embedded in Python Dash. I want to allow users to select multiple indices and compare them in the same plot over a specific period of time. The plot should also update when indices are unselected. So far, I have been able to plot only one index. The issue I'm having now is that the plot does not change at all when I add additional indices. I've tried to solve this myself for the last couple of hours, unfortunately without success.
I'm using Jupyter Notebook. Here's my code with a data sample:
import pandas as pd
import dash
import dash_html_components as html
import dash_core_components as dcc
from dash.dependencies import Input, Output
import plotly.express as px
data = [['2020-01-31', 100, 100, 100], ['2020-02-28', 101, 107, 99], ['2020-03-31', 104, 109, 193], ['2020-04-30', 112, 115, 94], ['2020-05-31', 112, 120, 189]]
df = pd.DataFrame(data, columns = ['DATE', 'A', 'B', 'C'])
df = df.set_index('DATE')
df
# create the Dash app
app = dash.Dash()
# set up app layout
app.layout = html.Div(children=[
    html.H1(children='Index Dashboard'),
    dcc.Dropdown(id='index-dropdown',
                 options=[{'label': x, 'value': x}
                          for x in df.columns],
                 value='A',
                 multi=True, clearable=True),
    dcc.Graph(id='price-graph')
])
# set up the callback function
@app.callback(
    Output(component_id='price-graph', component_property='figure'),
    [Input(component_id='index-dropdown', component_property='value')]
)
def display_time_series(selected_index):
    filtered_index = [df.columns == selected_index]
    fig = px.line(df, x=df.index, y=selected_index,
                  labels={'x', 'x axis label'})
    fig.update_layout(
        title="Price Index Development",
        xaxis_title="Month",
        yaxis_title="Price",
        font=dict(size=13))
    return fig
# Run local server
if __name__ == '__main__':
    app.run_server(debug=True, use_reloader=False)
As I'm relatively new to Python Dash, any help or advice would be extremely appreciated!
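You're not applying your filter to your data inside the callback: filtered_index is computed but never used. The filter itself doesn't work either, because comparing df.columns with selected_index breaks once selected_index is a list whose length differs from the number of columns.
Instead you can do something like this: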
@app.callback(
    Output(component_id="price-graph", component_property="figure"),
    [Input(component_id="index-dropdown", component_property="value")],
)
def display_time_series(selected_index):
    dff = df[selected_index] # Only use columns selected in dropdown
    # labels expects a dict, not a set
    fig = px.line(dff, x=df.index, y=selected_index, labels={"x": "x axis label"})
    fig.update_layout(
        title="Price Index Development",
        xaxis_title="Month",
        yaxis_title="Price",
        font=dict(size=13),
    )
    return fig
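
One detail that may be worth guarding against: the dropdown in the question starts with value='A' (a plain string), while a multi-select dropdown reports a list once the user changes it, and the value may be empty or None after everything is cleared. A small sketch of normalising these cases before indexing the dataframe is shown below; the helper name as_column_list is hypothetical and not part of Dash.

```python
def as_column_list(selected):
    """Normalise a dropdown value to a list of column names.

    Handles a single string, a list of strings, and None (nothing selected).
    """
    if selected is None:
        return []
    if isinstance(selected, str):
        return [selected]
    return list(selected)


# the three shapes the callback argument may take
single = as_column_list("A")
several = as_column_list(["A", "C"])
nothing = as_column_list(None)
```

Inside the callback this could be used as cols = as_column_list(selected_index), followed by df[cols], so the same code path handles one or many indices.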

Plotly combined barplot and table controlled by range slider

I'm currently trying to create a graph with Plotly.
My goal is to create a combined bar plot / data table, both controlled by a range slider so that the displayed values can be restricted by date. I've succeeded in creating the bar plot controlled by the range slider.
I can't manage to control the table :/
Here is a combined plot where the range slider is attached to the table; as you can see, it does not control the date but the table view:
https://plotly.com/~tristan1551/31/
Here is an example of a bar plot I've made with a range slider: https://plotly.com/~tristan1551/23/
Another idea would be to control only the table with the range slider, but I can't manage to do that either.
Is there a way to achieve what I want to do?
Thank you for your help :)
To synchronise a table with a range slider on a figure, you can use a Dash callback.
The code below creates a bar chart with a range slider,
attaches a callback to the figure's relayoutData to get the position of the range slider,
and constructs the table based on these inputs.
from jupyter_dash import JupyterDash
import dash_core_components as dcc
import dash_html_components as html
import dash_table
from dash.dependencies import Input, Output, State
import pandas as pd
import numpy as np
import plotly.express as px
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/2014_apple_stock.csv")
df["AAPL_x"] = pd.to_datetime(df["AAPL_x"])
fig = px.bar(df, x="AAPL_x", y="AAPL_y").update_layout(
    xaxis={
        "range": [df["AAPL_x"].quantile(0.9), df["AAPL_x"].max()],
        "rangeslider": {"visible": True},
    }
)
# Build App
app = JupyterDash(__name__)
app.layout = html.Div(
    [dcc.Graph(id="bargraph", figure=fig), html.Div(id="bartable", children=[])],
)
@app.callback(
    Output("bartable", "children"),
    Input("bargraph", "relayoutData"),
)
def updateTable(graphData):
    global df
    if graphData and "xaxis.range" in graphData.keys():
        d1 = pd.to_datetime(graphData["xaxis.range"][0])
        d2 = pd.to_datetime(graphData["xaxis.range"][1])
    else:
        d1 = df["AAPL_x"].quantile(0.9)
        d2 = df["AAPL_x"].max()
    dft = df.loc[df["AAPL_x"].between(d1, d2)]
    return dash_table.DataTable(
        columns=[{"name": c, "id": c} for c in dft.columns],
        data=dft.to_dict("records"),
    )
# Run app and display result inline in the notebook
app.run_server(mode="inline")
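
The callback above only looks for the "xaxis.range" key. The answers on this page show two key styles in relayoutData: a single "xaxis.range" list here, and the pair "xaxis.range[0]" / "xaxis.range[1]" in the zoom example. If the table should follow both the slider and a zoom on the main plot, a helper along the following lines might be useful. It is a sketch only: the name x_bounds is made up, and the assumption that one graph can emit either style is not confirmed by the source.

```python
def x_bounds(relayout_data, default):
    """Return (start, end) of the x-axis from a relayoutData dict.

    Accepts a single "xaxis.range" list as well as the key pair
    "xaxis.range[0]" / "xaxis.range[1]"; otherwise returns default.
    """
    data = relayout_data or {}
    if "xaxis.range" in data:
        start, end = data["xaxis.range"]
        return start, end
    if "xaxis.range[0]" in data and "xaxis.range[1]" in data:
        return data["xaxis.range[0]"], data["xaxis.range[1]"]
    return tuple(default)


# made-up events for illustration
default = ("2014-11-01", "2014-12-31")
from_list = x_bounds({"xaxis.range": ["2014-03-01", "2014-04-01"]}, default)
from_pair = x_bounds(
    {"xaxis.range[0]": "2014-05-01", "xaxis.range[1]": "2014-06-01"}, default
)
from_none = x_bounds(None, default)
```

The returned values are left as they arrive (typically strings), so they would still go through pd.to_datetime before filtering, as in the answer's code.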

Python Dash resizing candlesticks

I've been trying to create a candlestick graph that shows the prices of NASDAQ and moving average on it, which has been a partial success:
import dash
from dash.dependencies import Output, Input
import dash_core_components as dcc
import dash_html_components as html
import plotly
import random
import plotly.graph_objs as go
import yfinance as yf
import plotly.express as px
import datetime as dt
from datetime import datetime
from metody import metody
import time
import pandas as pd
external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']
NASDAQ = pd.read_excel(r'file.xlsx')
app = dash.Dash(external_stylesheets=external_stylesheets)
fig = go.Figure(data=go.Candlestick(
    open=NASDAQ['Open'],
    close=NASDAQ['Close'],
    low=NASDAQ['Low'],
    high=NASDAQ['High'],
))
close = NASDAQ['Close']
open = NASDAQ['Open']
srednia = metody.generateMovingAverage(NASDAQ['Close'], 3)
fig.add_trace(
    go.Scatter(
        y=srednia
    )
)
app.layout = html.Div([
    html.H1(
        children="This is a chart of {}".format("NASDAQ"),
        style={
            'text-align': 'center'
        }
    ),
    dcc.Graph(
        id='candles',
        animate=True,
        figure=fig,
    ),
    dcc.Interval(
        id='update',
        interval=1000
    )
])
app.run_server(debug=True)
Unfortunately, when I try to zoom in on the results, the candles do not scale up, so the chart is not readable:
My question is: how do I deal with that? I'd love my chart to be nicely interactive (meaning the user can adjust the period and the candles are as big as the size of the chart allows).
PS: I'm really new to Dash, so if you have any comments on my code or you spot something I've done the wrong way round, please tell me :)
You need to get this into your code somewhere (triggered on zoom):
fig.update_yaxes(range=[minY, maxY])
with minY and maxY chosen from the visible subset of your data (rounded down and up respectively to look a bit better).
But I don't see where your zoom is being done. I assume you aren't using the Plotly default but are instead zooming in using the lower summary bar?
You might have to dive into the .css, I'm afraid, or maybe you can insert additional lines into it before passing it to dash.Dash().
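
How minY and maxY might be derived is sketched below. The function name padded_y_range, the step parameter and the sample prices are assumptions for illustration; the idea is simply to take the lowest low and highest high of the candles in the visible window and round them outward to a multiple of step.

```python
import math


def padded_y_range(lows, highs, start, stop, step=1.0):
    """Return [min_y, max_y] for candles start..stop-1, rounded outward to step."""
    window_lows = list(lows[start:stop])
    window_highs = list(highs[start:stop])
    if len(window_lows) == 0 or len(window_highs) == 0:
        raise ValueError("the selected window contains no candles")
    min_y = math.floor(min(window_lows) / step) * step   # round down
    max_y = math.ceil(max(window_highs) / step) * step   # round up
    return [min_y, max_y]


# made-up low/high prices for five candles
lows = [98.2, 99.5, 101.1, 97.4, 103.8]
highs = [101.7, 102.3, 104.9, 100.2, 106.6]
y_range = padded_y_range(lows, highs, start=1, stop=4, step=5)
print(y_range)
```

The result would then be applied with fig.update_yaxes(range=y_range), as suggested in the answer; how the visible window (start and stop) is obtained depends on how the zoom is triggered in the app.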

Accessing state of traces in Plotly Dash

I'm using Plotly Dash to build a stacked bar chart with 3 trace values.
I'm trying to access the state of the trace values so that I can filter a dataframe and pass the resulting DF back to the plot, as opposed to simply hiding the traces on de-select.
for example, I have a dataframe :
Item Status Value
1 First 2000
1 Second 3490
1 Third 542
2 First 641
2 Second 564
3 First 10
My traces are 3 values (First, Second, Third) belonging to a linear process, where each value is a status marking how far an item has advanced.
My intention is to be able to select statuses from further down the progression so that only those items that have advanced to a certain step are plotted.
As I select more advanced statuses in the trace legend, my plotted x-values should drop off, since fewer items advance that far, even though they all share the majority of the statuses.
The only solution I can think of is to make checkboxes for each trace value and use those inputs in a callback, but that seems redundant to the select/de-select traces functionality built in.
Are you looking for something like this?
Code:
import dash
from dash.dependencies import Output, Input
import dash_core_components as dcc
import dash_html_components as html
import plotly
import plotly.graph_objs as go
import pandas as pd
app = dash.Dash(__name__)
df = pd.DataFrame({'Item': [1, 1, 1, 2, 2, 3],
                   'Status': ["First", "Second", "Third",
                              "First", "Second", "First"],
                   'Value': [2000, 3490, 542, 641, 564, 10]})
colors = {
    'background': '#111111',
    'background2': '#FF0',
    'text': '#7FDBFF'
}
df1 = df.loc[df["Status"] == "First"]
df2 = df.loc[df["Status"] == "Second"]
df3 = df.loc[df["Status"] == "Third"]
trace1 = go.Bar(
    x=df1["Item"],
    y=df1["Value"],
    name='First',
)
trace2 = go.Bar(
    x=df2["Item"],
    y=df2["Value"],
    name='Second',
)
trace3 = go.Bar(
    x=df3["Item"],
    y=df3["Value"],
    name='Third',
)
app.layout = html.Div(children=[
    html.Div([
        html.H5('Your Plot'),
        dcc.Graph(
            id='cx1',
            figure=go.Figure(data=[trace1, trace2, trace3],
                             layout=go.Layout(barmode='stack')))],)])
if __name__ == '__main__':
    app.run_server(debug=True)
Output:
