I have a dataset with millions of Latitude/Longitude points that I am plotting at high resolution using Plotly Dash with a Densitymapbox trace:
import pandas
import plotly.graph_objects as go

data = pandas.DataFrame()
# ...
go.Densitymapbox(
    lat=data['Latitude'],
    lon=data['Longitude'],
    z=data['Count'],
    hoverinfo='skip',
    # ...
)
According to Mapbox, their library should support millions of points without issue, as shown by their demo: https://demos.mapbox.com/100mpoints/
When I try to do this, it does appear that Mapbox is able to handle the requests. However, in my implementation with plotly/dash, unlike the demo above, the browser struggles. The first load works (although it uses a lot of memory), but when the data is reloaded, Chrome crashes and Firefox logs an out-of-memory error to the console and does not update the heatmap.
The data set I am using has 1093737 points. Back-of-the-napkin math says this should be only about 26 MB of data (1093737 * (8 + 8 + 8) bytes, roughly 25 MiB) for two double-precision floating-point values and one 64-bit integer per point, and the amount of data sent to the browser matches this. However, the browser process balloons to over 3.5 GB of memory, and on subsequent reloads the browser appears to run out of memory.
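The estimate above can be checked with a few lines of arithmetic. This is only a sketch of the raw payload size, assuming 8 bytes per value as stated in the question; it says nothing about how much memory the browser needs once the data is parsed.

```python
N_POINTS = 1_093_737
BYTES_PER_ROW = 8 + 8 + 8  # two float64 coordinates + one int64 count

raw_bytes = N_POINTS * BYTES_PER_ROW
print(raw_bytes)                    # 26249688
print(round(raw_bytes / 1e6, 1))    # 26.2 (MB)
print(round(raw_bytes / 2**20, 1))  # 25.0 (MiB)
```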
Are there any facilities in dash/plotly to prevent this from taking down the browser? I do not need to interact with the points of the density plot, and have set hoverinfo='skip' to indicate that, but I would like to keep the heatmap recalculating the overlay when the map zoom changes. I am investigating alternatives such as rasterizing the heatmap server-side using datashader, but that would remove this interactivity.
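One middle ground between sending every point and fully rasterizing is to bin the points onto a coarse grid server-side and send only one row per occupied cell. The sketch below assumes the column names from the question; bin_points and cell_deg are hypothetical names chosen for illustration, and the data is synthetic. Fine detail at high zoom levels is lost, so the cell size is a trade-off to tune.

```python
import numpy as np
import pandas as pd


def bin_points(df, cell_deg=0.01):
    """Snap points to a regular lat/lon grid and sum the counts per cell.

    cell_deg is a hypothetical tuning knob: the grid cell size in degrees.
    """
    binned = df.assign(
        Latitude=(df["Latitude"] / cell_deg).round() * cell_deg,
        Longitude=(df["Longitude"] / cell_deg).round() * cell_deg,
    )
    return binned.groupby(["Latitude", "Longitude"], as_index=False)["Count"].sum()


# Synthetic stand-in for the real data set.
rng = np.random.default_rng(0)
n = 100_000
data = pd.DataFrame({
    "Latitude": rng.uniform(40.0, 41.0, n),
    "Longitude": rng.uniform(-75.0, -74.0, n),
    "Count": rng.integers(1, 10, n),
})

reduced = bin_points(data, cell_deg=0.01)
print(len(data), "->", len(reduced))
```

The reduced frame has the same three columns, so it can be passed to go.Densitymapbox in place of the original data.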
LensPy was created to solve this exact problem. It is built on top of Plotly Dash to allow you to plot very large datasets while maintaining fluid interactivity. Here is an example of how you can achieve this with a Mapbox density plot.
import pandas as pd
import plotly.express as px
from lenspy import DynamicPlot

df = pd.read_csv(
    'https://raw.githubusercontent.com/plotly/datasets/master/earthquakes-23k.csv')
fig = px.density_mapbox(df,
                        lat='Latitude', lon='Longitude',
                        z='Magnitude',
                        radius=10,
                        center=dict(lat=0, lon=180),
                        zoom=0,
                        mapbox_style="stamen-terrain")
plot = DynamicPlot(fig)
plot.show()
Disclaimer: I am the creator of LensPy.
Related
I'm building a dashboard to display production data on a TV using Django on a Raspberry Pi, but the rendering is quite slow.
The dashboard contains 12 Plotly indicators (number type) and 3 bar charts.
In my view, I read a .csv file of about 50 lines using pandas, then I create the variables that generate the Plotly graphs and pass them as context, as shown below.
Create plotly indicator:
df_36_ordered = df_detalhado_full.query('Maquina=="3.6"').sort_values(by='Data')
oee_36 = df_36_ordered['OEE'].iloc[-1]
Create dataframe for bar plot:
df_36_historico = df_36_ordered[['dia_semana', 'OEE']].tail(6).iloc[:-1]
And then I generate the HTML for all of it:
oee_36 = plot(fig, output_type='div', include_plotlyjs=False, show_link=False, link_text='')
I'm looking for tips or alternatives to speed up rendering of the page.
From entering the address to full load, it takes 20 seconds.
Thanks in advance!!
Use Dask instead of Pandas. However, your dataset is only 50 lines, so I don't know if it will improve processing time.
Try multithreading to build your graphs simultaneously.
Using only plotly.graph_objects will probably make it faster, but it is harder to get a similar result. Try to remove features you don't need: Plotly gives you a lot of built-in features when you plot something, and those features make rendering slower.
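The multithreading suggestion could look roughly like the sketch below. render_chart is a hypothetical stand-in for building one figure and calling plot(fig, output_type='div', ...), and the chart names are made up. Because of Python's global interpreter lock, CPU-bound figure generation may gain little from threads, so it is worth timing before and after.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def render_chart(name):
    """Hypothetical stand-in for building one figure and rendering it to a div."""
    time.sleep(0.05)  # simulates the per-chart work
    return f"<div id='{name}'></div>"


# 12 indicators and 3 bar charts, as in the dashboard described above.
chart_names = [f"indicator_{i}" for i in range(12)] + [f"bar_{i}" for i in range(3)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() preserves input order, so each div lines up with its chart name.
    divs = list(pool.map(render_chart, chart_names))

# Template context for the Django view: one entry per chart.
context = dict(zip(chart_names, divs))
print(len(context))
```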
I have the following dataframe with 40M rows:
import numpy as np
import pandas as pd

occ_status_pre = ["retired","unemployed","house person","financially independent","employed","student"]
test_df = pd.DataFrame(np.random.randint(0,100,size=(40000000, 4)), columns=["id","occupation_status","age","height"])
# Draw all 40M labels in one vectorized call instead of a Python-level loop
test_df["occupation_status"] = np.random.choice(occ_status_pre, size=len(test_df))
test_df.head()
   id occupation_status  age  height
0  32        unemployed   41      78
1  83           retired   35      99
2  77           retired   61      19
3   8      house person   28      64
4   6        unemployed   46      22
In Seaborn, I can successfully create a Box plot for the entire dataframe without any issue:
fig,ax = plt.subplots(figsize=(10,8))
ax = sns.boxplot(x="occupation_status",y="age",data=test_df)
plt.tight_layout()
However, if I try to recreate this same Box plot in Plotly 4.2 then it crashes my web browser.
Further investigation led me to the pio.renderers attribute. If I set pio.renderers to equal "browser" then it outputs the Box plot visualisation to a new browser tab:
fig = px.box(test_df,x="occupation_status",y="age")
fig.show(renderer="browser")
However, if my dataframe has more than 28M rows, this only displays a blank white screen and no visualisation ever appears in the new tab.
Further investigation showed that the number of columns in the dataframe does not matter: whenever the dataframe has more than 28M rows, I cannot plot a Box plot for it.
I know that there is render_mode="webgl" for working with larger data, but I can only seem to set that for Scatter and Line plot types.
So my question is, is there a way to produce interactive Box plots in Plotly for large dataframes? (The same question holds for Violin plots.)
If there is not, then what is the limitation preventing the plot from rendering when the row count is greater than 28 million rows?
If this is not possible in Plotly then does anyone know of any alternative tools that I could produce big data Box/Violin plots using Python? For example would this be possible with ggplot2 or will the same limitation also exist in that too?
My ultimate aim is to produce nice interactive plots using Plotly and then create Dash dashboards that use these plots.
Many thanks
23/10/19: Additional Testing:
I downgraded Plotly to 3.10.0 and got the same result - no figure is rendered and I am just presented with a white screen. I have now upgraded back up to version 4.2 again.
Additionally, I installed Cufflinks. I followed the process described here to get Cufflinks working with Plotly 4: https://github.com/santosjorge/cufflinks/pull/203
Cufflinks behaviour is almost identical to Plotly Express behaviour: if I let the plot render in the notebook then nothing happens (no crash or error and no output of any kind, but the cell marks itself as run). If I output it to an HTML file as per the Edit in the accepted answer to "Cufflinks for plotly: setting cufflinks config options launches", then it produces a very large (approx. 1.5 GB) HTML file that again shows up as a white screen when opened.
As this issue seems to be caused by working on a large dataframe, I thought there might be an issue with the Jupyter notebook being unable to handle such a large volume of data. Therefore I tried adjusting the iopub.data_rate as per https://community.plot.ly/t/tips-for-using-plotly-with-jupyter-notebook-5-0-the-latest-version/4156 but it didn't make a difference.
As I am experiencing very similar behaviour in both Plotly Express and Cufflinks, this suggests to me that the issue must be to do with Plotly itself?
Has anyone had any success producing Box or Violin plots for larger datasets?
In the end my solution was to move to holoviews.
import holoviews as hv
hv.extension('plotly')
boxwhisker = hv.BoxWhisker(test_df, 'occupation_status', 'age')
boxwhisker
Out[2]: [interactive box-and-whisker plot]
Points to note:
When I used the "bokeh" extension my plot rendered but was not interactive. However, when I used the "plotly" extension, my interactive box plot was successfully produced as per above. This is really interesting because when I try to produce this plot using plotly directly then it still crashes my browser.
For some reason my "occupation status" categories have been truncated to a single letter. I am experimenting with holoviews opts xrotation and xticks but have yet to fix this. This is not the end of the world, however it would be nice to fix.
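Staying within Plotly may also be possible by computing the box statistics in pandas and sending only those few numbers to the browser. The sketch below assumes a Plotly release newer than the 4.2 used above, in which go.Box accepts precomputed q1, median, q3, lowerfence and upperfence values; check this against your installed version. box_stats and make_box_figure are hypothetical helper names, the data is synthetic, and the whiskers are an approximation (the 1.5 * IQR fence clipped to the observed range) rather than Plotly's own outlier logic.

```python
import numpy as np
import pandas as pd


def box_stats(df, group_col, value_col):
    """Per-group quartiles plus approximate 1.5 * IQR whisker fences."""
    grouped = df.groupby(group_col)[value_col]
    stats = grouped.quantile([0.25, 0.5, 0.75]).unstack()
    stats.columns = ["q1", "median", "q3"]
    iqr = stats["q3"] - stats["q1"]
    # Clip the fences to the observed range so whiskers never exceed the data.
    stats["lowerfence"] = np.maximum(stats["q1"] - 1.5 * iqr, grouped.min())
    stats["upperfence"] = np.minimum(stats["q3"] + 1.5 * iqr, grouped.max())
    return stats


def make_box_figure(stats):
    """Build a box plot from summary statistics only (one box per row)."""
    # Imported here so box_stats can be used without Plotly installed.
    import plotly.graph_objects as go
    return go.Figure(go.Box(
        x=list(stats.index),
        q1=stats["q1"], median=stats["median"], q3=stats["q3"],
        lowerfence=stats["lowerfence"], upperfence=stats["upperfence"],
    ))


# Synthetic stand-in for the large dataframe in the question.
rng = np.random.default_rng(0)
occ = ["retired", "unemployed", "house person",
       "financially independent", "employed", "student"]
sample_df = pd.DataFrame({
    "occupation_status": rng.choice(occ, size=200_000),
    "age": rng.integers(0, 100, size=200_000),
})

stats = box_stats(sample_df, "occupation_status", "age")
print(stats)
```

Calling make_box_figure(stats).show() would then render one box per category from six rows of numbers instead of millions of raw values. Individual outlier points are not drawn with this approach.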
I'm trying to build a Power BI tool for some data analysis, and one of the plots I need is an inverse quantile plot (quantiles on the x axis, values on the y axis). Power BI does not have this, and I can't find one on the app marketplace, so I am using Python to code up what I need.
The static plot from pandas.DataFrame.plot() works fine but lacks the panache of an interactive plot. I've coded up the plot I need using plotly, and ran it with py.iplot(), but Power BI tells me
No image was created. The Python code didn't result in creation of any visuals. Make sure your Python script results in a plot to the Python default device
There was no error, and I confirmed the code is fine by running the plot using py.plot(), and viewed the result in the browser. My code is:
import plotly.plotly as py
import plotly.graph_objs as go
# get the quantiles and reshape
qs = dataset.groupby(by='HYDROCARBON_TYPE').Q42018_AbsDevi.quantile(q=[0.01,0.05,0.1,0.2,0.25,0.5,0.75,0.8,0.9,0.95,0.99]).unstack().transpose()
# plot it
traces = []
for col in qs.columns:
    traces.append(go.Scatter(x=qs.index, y=qs[col], name=col))
py.plot(traces,filename='basic-line')
Why would this not be working?
I wasn't able to find a solution using PowerBI, Plotly and Python, nor was I able to reproduce your errors. Regarding your errors, I ended up with visualizations that were either timed out or reporting a data type error. But we can get back to that if that's still interesting after another suggested solution, because I have been able to produce an interactive q-plot using PowerBI, plotly, ggplot and an R script visual like this:
Assuming that your main priorities are to make an interactive quantile plot in PowerBI, and that Python as a tool comes second, just follow the steps outlined in this post, and replace the R script with this:
source('./r_files/flatten_HTML.r')
############### Library Declarations ###############
libraryRequireInstall("ggplot2");
libraryRequireInstall("plotly")
####################################################
################### Actual code ####################
df <- data.frame(y = Values$Data)
# Build basic ggplot
g <- ggplot(df, aes(sample = y))
# Add quantile details
g = g + stat_qq() + stat_qq_line()
############# Create and save widget ###############
p = ggplotly(g);
internalSaveWidget(p, 'out.html');
####################################################
That should do the trick. Don't hesitate to let me know if this does not work for you.
You can take a look at this blog post. It describes how to add an interactive JavaScript Plotly chart in Power BI. It's quite easy.
Kind regards,
Steve.
I am trying to plot something with a huge number of data points (2-3 million) using plotly.
When I run
py.iplot(fig, filename='test plot')
I get the following error:
Woah there! Look at all those points! Due to browser limitations, the Plotly SVG drawing functions have a hard time graphing more than 500k data points for line charts, or 40k points for other types of charts. Here are some suggestions:
(1) Use the `plotly.graph_objs.Scattergl` trace object to generate a WebGl graph.
(2) Trying using the image API to return an image instead of a graph URL
(3) Use matplotlib
(4) See if you can create your visualization with fewer data points
If the visualization you're using aggregates points (e.g., box plot, histogram, etc.) you can disregard this warning.
So then I try to save it with this:
py.image.save_as(fig, 'my_plot.png')
But then I get this error:
PlotlyRequestError: Unknown Image Server Error
How do I do this properly? I don't care if it's a still image or an interactive display within my notebook.
Plotly really seems to be very bad at this. I am just trying to create a boxplot with 5 million points, which is no problem for the simple R function "boxplot", but Plotly computes endlessly.
Improving this should be a major priority. Not all of the data has to be saved (and shown) in the Plotly object; I guess this is the main problem.
One option would be down-sampling your data, though I'm not sure if you'd like that:
https://github.com/devoxi/lttb-py
I also have problems with plotly in the browser with large datasets - if anyone has solutions, please write!
Thank you!
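As a rough illustration of the down-sampling idea mentioned above: the sketch below is not the LTTB algorithm from the linked repository but a simpler min/max bucket scheme, which keeps the lowest and highest value in each bucket so spikes survive. minmax_downsample and n_buckets are hypothetical names, and the signal is synthetic.

```python
import numpy as np


def minmax_downsample(x, y, n_buckets):
    """Keep the min and max y of each bucket so spikes survive decimation."""
    x = np.asarray(x)
    y = np.asarray(y)
    if len(y) <= 2 * n_buckets:
        return x, y  # already small enough
    # Split the index range into n_buckets contiguous, nearly equal chunks.
    edges = np.linspace(0, len(y), n_buckets + 1).astype(int)
    keep = []
    for start, stop in zip(edges[:-1], edges[1:]):
        chunk = y[start:stop]
        keep.append(start + int(np.argmin(chunk)))
        keep.append(start + int(np.argmax(chunk)))
    keep = np.unique(keep)  # sorted, de-duplicated indices
    return x[keep], y[keep]


# Synthetic 2-million-point signal standing in for the real data.
rng = np.random.default_rng(0)
x = np.arange(2_000_000)
y = np.sin(x / 50_000) + rng.normal(scale=0.1, size=x.size)

xs, ys = minmax_downsample(x, y, n_buckets=2_000)
print(len(x), "->", len(xs))
```

The reduced arrays stay well under the 40k-point limit quoted in the error message, so they can be plotted like any small series.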
You can try the render_mode argument. Example:
import plotly.express as px
import pandas as pd
import numpy as np
N = int(1e6) # Number of points
df = pd.DataFrame(dict(x=np.random.randn(N),
                       y=np.random.randn(N)))
fig = px.scatter(df, x="x", y="y", render_mode='webgl')
fig.update_traces(marker_line=dict(width=1, color='DarkSlateGray'))
fig.show()
On my computer, N=1e6 takes about 5 seconds until the plot is visible, and the interactivity is still very good. With N=10e6 it takes about 1 minute and the plot is no longer responsive (i.e. zooming, panning and everything else are really slow).
I'm writing a web interface, using CGI in Python, for a database of gene values from some experiments, and I want to draw a graph of the queried data. I'm using matplotlib.pyplot to draw the graph, save it, and display it on the web page. But usually many experiments are queried, so there are a lot of values. Sometimes I want to know which experiment a value belongs to because it is a large value, but that is hard to identify because the picture is small. The experiment names are long strings, so putting them all on the x axis would make it unreadable.
So I wonder if there is a way to draw a graph that users can interact with, i.e. if I point my mouse at some part of the graph, a small window appears and tells me the exact value and the experiment name. Most importantly, this has to work when I put the graph on the web page.
Thank you.
What you want is basically D3.js rendering of your plots. As far as I know, there are currently three great ways of achieving this, all under rapid development:
MPLD3 for creating graphs with Matplotlib and serving them as interactive web graphics (see examples in Jake's blog post).
Plotly where you can either generate the plots directly via Plotly or from Matplotlib figures (e.g. using matplotlylib) and have them served by Plotly.
Bokeh if you do not mind moving away from Matplotlib.
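To make the hover idea concrete, here is a minimal sketch that needs only the standard library on the server. It writes a Plotly figure description as JSON into an HTML snippet and lets Plotly.js draw it in the browser. The rows are made-up placeholders for a query result, and the CDN script URL is an assumption; pin whichever Plotly.js release you actually want to use.

```python
import json

# Hypothetical query result: (experiment name, expression value) pairs.
rows = [
    ("experiment_alpha_long_name", 1.2),
    ("experiment_beta_long_name", 8.7),
    ("experiment_gamma_long_name", 0.4),
]

figure = {
    "data": [{
        "type": "scatter",
        "mode": "markers",
        "x": list(range(len(rows))),         # short positions instead of long labels
        "y": [value for _, value in rows],
        "text": [name for name, _ in rows],  # shown when the mouse hovers a point
        "hovertemplate": "%{text}<br>value = %{y}<extra></extra>",
    }],
    "layout": {
        "xaxis": {"title": {"text": "experiment index"}},
        "yaxis": {"title": {"text": "value"}},
    },
}

# The figure is embedded as a JavaScript object literal and drawn client-side.
html = """<div id="plot"></div>
<script src="https://cdn.plot.ly/plotly-latest.min.js"></script>
<script>
  var fig = %s;
  Plotly.newPlot("plot", fig.data, fig.layout);
</script>""" % json.dumps(figure)

print(html)
```

A CGI script could print this snippet inside its page body. Long experiment names stay off the axis and appear only in the hover box. If names can contain arbitrary text, they should be escaped before being embedded in a script tag.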