I am trying to plot something with a huge number of data points (2–3 million) using plotly.
When I run
py.iplot(fig, filename='test plot')
I get the following error:
Woah there! Look at all those points! Due to browser limitations, the Plotly SVG drawing functions have a hard time graphing more than 500k data points for line charts, or 40k points for other types of charts. Here are some suggestions:
(1) Use the `plotly.graph_objs.Scattergl` trace object to generate a WebGl graph.
(2) Trying using the image API to return an image instead of a graph URL
(3) Use matplotlib
(4) See if you can create your visualization with fewer data points
If the visualization you're using aggregates points (e.g., box plot, histogram, etc.) you can disregard this warning.
So then I try to save it with this:
py.image.save_as(fig, 'my_plot.png')
But then I get this error:
PlotlyRequestError: Unknown Image Server Error
How do I do this properly? I don't care if it's a still image or an interactive display within my notebook.
Plotly really seems to be very bad at this. I am just trying to create a boxplot with 5 million points, which is no problem with the simple R function "boxplot", but plotly keeps calculating endlessly.
Improving this should be a major priority. Not all of the data has to be stored (and shown) in the plotly object; I guess that is the main problem.
One option would be down-sampling your data, not sure if you'd like that:
https://github.com/devoxi/lttb-py
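If that library doesn't fit your workflow, here is a cruder sketch of the same idea, plain stride-based decimation with numpy/pandas (all sizes and column names below are just illustrative):
import numpy as np
import pandas as pd

# Illustrative data: ~3 million points
n = 3_000_000
df = pd.DataFrame({'x': np.arange(n), 'y': np.random.randn(n).cumsum()})

# Keep every k-th row so that roughly 100k points remain
target = 100_000
k = max(1, len(df) // target)
df_small = df.iloc[::k]

print(len(df_small))   # now well under Plotly's 500k-point limit for SVG line charts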
I also have problems with plotly in the browser with large datasets - if anyone has solutions, please write!
Thank you!
You can try the render_mode argument. Example:
import plotly.express as px
import pandas as pd
import numpy as np
N = int(1e6) # Number of points
df = pd.DataFrame(dict(x=np.random.randn(N),
                       y=np.random.randn(N)))
fig = px.scatter(df, x="x", y="y", render_mode='webgl')
fig.update_traces(marker_line=dict(width=1, color='DarkSlateGray'))
fig.show()
On my computer N=1e6 takes about 5 seconds until the plot is visible, and the interactiveness is still very good. With N=10e6 it takes about 1 minute and the plot is not responsive anymore (i.e. it is really slow to zoom, pan or do anything).
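If you are building the figure with graph_objects directly rather than plotly express, a sketch of the equivalent, using the go.Scattergl trace (the WebGL object the error message points to) instead of go.Scatter, would be:
import numpy as np
import plotly.graph_objects as go

N = int(1e6)
fig = go.Figure(
    go.Scattergl(                      # WebGL-backed scatter trace
        x=np.random.randn(N),
        y=np.random.randn(N),
        mode='markers',
        marker=dict(size=2, opacity=0.5),
    )
)
fig.show()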
Related
I'm building a dashboard to display production data on a TV using Django + Raspberry Pi, but the rendering is quite slow.
The dashboard contains 12 Plotly indicators (number type) and 3 bar charts.
In my view, I'm reading a .csv file with pandas (about 50 rows), then I create variables to generate the Plotly graphs and pass them as context, like below.
Create plotly indicator:
df_36_ordered = df_detalhado_full.query('Maquina=="3.6"').sort_values(by='Data')
oee_36 = df_36_ordered['OEE'].iloc[-1:].iloc[0]
Create dataframe for bar plot:
df_36_historico = df_36_ordered[['dia_semana', 'OEE']].tail(6).iloc[:-1]
And then I generate the HTML for all of it:
oee_36 = plot(fig, output_type='div', include_plotlyjs=False, show_link=False, link_text='')
I'm looking for tips/alternatives on improving rendering the page.
From entering address to full load it's taking 20 seconds.
Thanks in advance!!
You could use Dask instead of Pandas; however, since your dataset is only ~50 rows, I don't know if it will improve processing time.
Try to implement multithreading to process your graphs simultaneously (a rough sketch follows below).
Using only plotly.graph_objects will probably make things faster, but it's harder to get a similar result. Also try removing features you don't need: plotly gives you a lot of built-in features when you plot something, and those features make your rendering slower.
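To illustrate the multithreading suggestion, here is a sketch that builds the plot divs concurrently with concurrent.futures; the helper functions and the example values are hypothetical stand-ins for whatever your view currently does:
from concurrent.futures import ThreadPoolExecutor
import plotly.graph_objects as go
from plotly.offline import plot

def build_indicator_div(value):
    # Hypothetical helper: one number-type indicator per machine
    fig = go.Figure(go.Indicator(mode='number', value=value))
    return plot(fig, output_type='div', include_plotlyjs=False, show_link=False)

def build_bar_div(frame):
    # Hypothetical helper: one bar chart per machine
    fig = go.Figure(go.Bar(x=frame['dia_semana'], y=frame['OEE']))
    return plot(fig, output_type='div', include_plotlyjs=False, show_link=False)

# Build all divs in parallel instead of one after another
with ThreadPoolExecutor() as pool:
    indicator_divs = list(pool.map(build_indicator_div, [0.81, 0.77, 0.92]))
    bar_div = pool.submit(build_bar_div, df_36_historico).result()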
I have a dataset with millions of Latitude/Longitude points that we are plotting at high resolution using plotly-dash with a Densitymapbox:
import pandas
import plotly.graph_objects as go

data = pandas.DataFrame()
# ...
go.Densitymapbox(
    lat=data['Latitude'],
    lon=data['Longitude'],
    z=data['Count'],
    hoverinfo='skip',
    # ...
)
According to Mapbox, their library should support millions of points without issue, as shown by their demo: https://demos.mapbox.com/100mpoints/
When I try to do this, it does appear that Mapbox is able to handle the requests. However, in my implementation with plotly/dash, unlike the demo above, the browser gets overwhelmed. The first load works fine (although it does use a lot of memory), but on a reload of the data, Chrome crashes and Firefox reports an out-of-memory error to the console and does not update the heatmap.
The data set I am using is 1093737 points. Doing back-of-the-napkin math, this should only be about 25 MB of data (1093737 * (8 + 8 + 8) bytes) for 2 double-precision floating point values and 1 64-bit integer, and the amount of data sent to the browser does reflect this. However, the browser process balloons in memory to over 3.5 GB, and then on subsequent reloads the browser apparently runs out of memory.
Are there any facilities in dash/plotly to prevent this from taking down the browser? I do not need to interact with the points of the density plot, and have set hoverinfo='skip' to indicate that, but I would like to keep the interactivity of the heatmap recalculating the overlay when the map zoom changes. I am investigating alternatives such as rasterizing the heatmap server side using datashader, but that would remove the interactivity I would like to keep.
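For reference, a minimal sketch of that server-side rasterization idea, using datashader to aggregate the same Latitude/Longitude/Count frame into an image before anything reaches the browser (this produces a static raster, so the zoom-dependent recalculation would still have to be wired up separately, e.g. in a Dash callback):
import datashader as ds
import datashader.transfer_functions as tf

# `data` is the same DataFrame used for the Densitymapbox trace above
canvas = ds.Canvas(plot_width=800, plot_height=600)
agg = canvas.points(data, x='Longitude', y='Latitude', agg=ds.sum('Count'))
img = tf.shade(agg, how='log')        # rasterized heatmap
img.to_pil().save('heatmap.png')      # could be served as a static image layer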
LensPy was created to solve this exact problem. It is built on top of Plotly Dash to allow you to plot very large datasets while maintaining fluid interactivity. Here is an example of how you can achieve this with a Mapbox.
import pandas as pd
import plotly.express as px
from lenspy import DynamicPlot

df = pd.read_csv(
    'https://raw.githubusercontent.com/plotly/datasets/master/earthquakes-23k.csv')
fig = px.density_mapbox(df,
                        lat='Latitude', lon='Longitude',
                        z='Magnitude',
                        radius=10,
                        center=dict(lat=0, lon=180),
                        zoom=0,
                        mapbox_style="stamen-terrain")
plot = DynamicPlot(fig)
plot.show()
Disclaimer: I am the creator of LensPy.
Among the built-in visualization tools in pandas, one that is very helpful for me is the parallel_coordinates visualization. However, since I have around 18 features in the dataframe, the bottom of the parallel_coordinates plot gets really messy.
Therefore, I was wondering if anyone knew how to rotate the axis-names to be vertical rather than horizontal as shown here:
I did find a way to use parallel_coordinates in a polar setup, creating a radar chart; while that helped make the different features visible, it doesn't quite work, because whenever the values are close to 0 it becomes almost impossible to see the curve. Furthermore, using the polar coordinate frame required me to stop working from pandas' DataFrame, which is part of what made this method so appealing.
Using plt.xticks(rotation=90) should be enough. Here is an example with the Iris dataset:
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates
data = pd.read_csv('iris.csv')
parallel_coordinates(data, 'Name')
plt.xticks(rotation=90)
plt.show()
I have a dataframe with 250,000 rows but 140 columns, and I'm trying to construct a pair plot of the variables.
I know the number of subplots is huge, as is the time it takes to draw them. (I've been waiting for more than an hour on an i5 at 3.4 GHz with 32 GB RAM.)
Remembering that scikit-learn allows constructing random forests in parallel, I was checking whether this is also possible with seaborn.
However, I didn't find anything. The source code seems to call the matplotlib plot function for every single image.
Couldn't this be parallelised? If yes, what is a good way to start from here?
Rather than parallelizing, you could downsample your DataFrame to, say, 1000 rows to get a quick peek, if the speed bottleneck is indeed occurring there. 1000 points is usually enough to get a general idea of what's going on.
i.e. sns.pairplot(df.sample(1000)).
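A self-contained version of that suggestion (the DataFrame here is just random stand-in data for your own frame):
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Stand-in for your real 250,000-row frame
df = pd.DataFrame(np.random.randn(250_000, 5), columns=list('abcde'))

sns.pairplot(df.sample(1000, random_state=0))   # quick look at a 1000-row sample
plt.show()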
Save your pairplot to image and then show this image instead of rendering it all in your browser.
from IPython.display import Image
import seaborn as sns
import matplotlib.pyplot as plt
sns_plot = sns.pairplot(df, size=2.0)
sns_plot.savefig("pairplot.png")
plt.clf() # Clean pairplot figure from sns
Image(filename='pairplot.png') # Show pairplot as image
For me, I had a situation where the histograms were taking a very long time due to the variance in the data. I only had 1200 rows and 4 columns, but it took half an hour before I gave up. I think the data was so spread out and unordered that the histogram was constantly updating. One workaround might be to play with the bins parameter, but my solution was to use a KDE for the diagonal instead. With the KDE, it takes only a few seconds.
sns.pairplot(df, diag_kind='kde')
I need to plot some data in various forms. Currently I'm using Matplotlib and I'm fairly happy with the plots I'm able to produce.
This question is about how to plot the last one. The data is similar to a "distance table", like this (just bigger; my table is 128x128 and has 3 or more numbers per element).
Now, my data is much better "structured" than a distance table (my data doesn't vary "randomly" like an alphabetically sorted distance table does), so a 3D bar chart, or maybe 3 of them, would be perfect. My understanding is that such a chart is missing from Matplotlib.
I could use a (colored) Contour3d like these, or something 2D like imshow, but that isn't really representative of what the data is (the data has meaning only at my 128 points; there is nothing between two points). And the height of bars is more readable than color, IMO.
Thus the questions:
is it possible to create a 3D bar chart in Matplotlib? To be clear, I mean one over a 2D domain, not just a 2D bar chart with a "fake" 3D rendering for aesthetic purposes
if the answer to the previous question is no, then is there some other library able to do that? I strongly prefer something Python-based, but I'm OK with other Linux-friendly possibilities
if the answer to the previous question is no, then do you have any suggestions on how to show that data? E.g. create a table with the values, superimposed on the imshow or some other colored representation?
For some time, matplotlib had no 3D support, but it has recently been added back. You will need to use the svn version, since no release has been made since then, and the documentation is a little sparse (see examples/mplot3d/demo.py). I don't know if mplot3d supports real 3D bar charts, but one of the demos looks like it could be extended to something like that.
Edit: The source code for the demo is in the examples, but for some reason the result is not. I mean the test_polys function, and here's what it looks like:
example figure http://www.iki.fi/jks/tmp/poly3d.png
The test_bar2D function would be even better, but it's commented out in the demo as it causes an error with the current svn version. Might be some trivial problem, or something that's harder to fix.
MayaVi2 can make 3D bar charts (scroll down a bit). Once you have MayaVi/VTK/ETS/etc. installed, it all works beautifully, but getting it all installed can take some work. Ubuntu has all of it packaged, but it's the only Linux distribution I know of that does.
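If it helps, a minimal sketch of a MayaVi bar chart (assuming a working mayavi install; mlab.barchart turns a 2D array of heights into 3D bars):
import numpy as np
from mayavi import mlab

heights = np.random.rand(10, 10)   # one bar per array element
mlab.barchart(heights)
mlab.show()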
One more possibility is Gnuplot, which can draw some kind of pseudo 3D bar charts, and gnuplot.py allows interfacing to Gnuplot from Python. I have not tried it myself, though.
This is my code for a simple 3D bar chart using matplotlib.
import mpl_toolkits
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
%matplotlib inline

## The values you want to plot
zval = [0.020752244, 0.078514652, 0.170302899, 0.29543857, 0.45358061,
        0.021255922, 0.079022499, 0.171294169, 0.29749654, 0.457114286,
        0.020009631, 0.073154019, 0.158043498, 0.273889264, 0.419618287]

fig = plt.figure(figsize=(12, 9))
ax = fig.add_subplot(111, projection='3d')

col = ["#ccebc5", "#b3cde3", "#fbb4ae"] * 5   # one color per x position
xpos = [1, 2, 3] * 5                          # bar positions along one axis
ypos = list(range(1, 6)) * 5                  # positions along the other axis (list() needed on Python 3)
zpos = [0] * 15                               # bars start at z=0
dx = [0.4] * 15                               # bar width
dy = [0.5] * 15                               # bar depth
dz = zval                                     # bar heights

for i in range(15):
    ax.bar3d(ypos[i], xpos[i], zpos[i], dx[i], dy[i], dz[i],
             color=col[i], alpha=0.75)
ax.view_init(azim=120)
plt.show()
http://i8.tietuku.com/ea79b55837914ab2.png
You might check out Chart Director:
http://www.advsofteng.com
It has a pretty wide variety of charts and graphs and has a nice Python (and several other languages) API.
There are two editions: the free version puts a blurb on the generated image, and the paid version is pretty reasonably priced.
Here's one of the more interesting-looking 3D stacked bar charts:
(source: advsofteng.com)