I'm trying to build a Power BI tool for some data analysis, and one of the plots I need is an inverse quantile plot (quantiles on x axis, values on y axis). Power BI does not have this, and I can't find one on the app marketplace, so am using Python to code up what I need.
The static plot from pandas.DataFrame.plot() works fine but lacks the panache of an interactive plot. I've coded up the plot I need using plotly, and ran it with py.iplot(), but Power BI tells me
No image was created. The Python code didn't result in creation of any visuals. Make sure your Python script results in a plot to the Python default device
There was no error, and I confirmed the code is fine by running it with py.plot() and viewing the result in the browser. My code is:
import plotly.plotly as py
import plotly.graph_objs as go
# get the quantiles and reshape
qs = dataset.groupby(by='HYDROCARBON_TYPE').Q42018_AbsDevi.quantile(q=[0.01,0.05,0.1,0.2,0.25,0.5,0.75,0.8,0.9,0.95,0.99]).unstack().transpose()
# plot it
traces = []
for col in qs.columns:
    traces.append(go.Scatter(x=qs.index, y=qs[col], name=col))
py.plot(traces,filename='basic-line')
Why would this not be working?
I wasn't able to find a solution using PowerBI, Plotly and Python, nor was I able to reproduce your error: my visualizations either timed out or reported a data type error. We can come back to that if it's still of interest, but first here is another suggestion. I have been able to produce an interactive q-plot using PowerBI, plotly, ggplot and an R script visual like this:
Assuming that your main priorities are to make an interactive quantile plot in PowerBI, and that Python as a tool comes second, just follow the steps outlined in this post, and replace the R script with this:
source('./r_files/flatten_HTML.r')
############### Library Declarations ###############
libraryRequireInstall("ggplot2");
libraryRequireInstall("plotly")
####################################################
################### Actual code ####################
df <- data.frame(y = Values$Data)
# Build basic ggplot
g <- ggplot(df, aes(sample = y))
# Add quantile details
g = g + stat_qq() + stat_qq_line()
############# Create and save widget ###############
p = ggplotly(g);
internalSaveWidget(p, 'out.html');
####################################################
That should do the trick. Don't hesitate to let me know if this does not work for you.
You can take a look at this blog post. It describes how to add an interactive JavaScript Plotly chart in Power BI. It's quite easy.
Kind regards,
Steve.
I'm new to using the SHAP library in python.
I'm trying to create a force plot in order to view the output of a single specific observation.
This is the code I used:
shap.force_plot(
    explainer.expected_value, shap_values[4102,:],
    x_validation.iloc[4102,:], matplotlib=True
)
The code works but I just wanted to be sure - does this code provide the force plot for the observation in row 4102?
Is there anything missing/some unnecessary parameters?
I have a dataset with millions of Latitude/Longitude points that we are plotting at high resolution using plotly-dash with a Densitymapbox:
data = pandas.DataFrame()
# ...
go.Densitymapbox(
    lat=data['Latitude'],
    lon=data['Longitude'],
    z=data['Count'],
    hoverinfo='skip',
    # ...
)
According to Mapbox, their library should support millions of points without issue, as shown by their demo: https://demos.mapbox.com/100mpoints/
When I try to do this, it does appear that Mapbox is able to handle the requests. However, in my implementation with plotly/dash, unlike the demo above, the browser is overwhelmed. The first load works fine (although it does use a lot of memory), but on a reload of the data, Chrome crashes and Firefox reports an out-of-memory error to the console and does not update the heatmap.
The data set I am using is 1093737 points. Doing back-of-the-napkin math, this should only be roughly 25 MB of data (1093737 * (8 + 8 + 8)) for two double-precision floating point values and one 64-bit integer per point, and the amount of data sent to the browser does show this. However, the browser process balloons in memory to over 3.5 GB, and on subsequent reloads it appears the browser runs out of memory.
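That estimate can be checked with a few lines of Python. This is only a minimal sketch of the arithmetic; the variable names are illustrative and do not come from the original application.

```python
# Rough size of the raw payload: two float64 values and one int64 per point.
n_points = 1_093_737
bytes_per_point = 8 + 8 + 8  # Latitude + Longitude + Count

total_bytes = n_points * bytes_per_point

print(f"{total_bytes} bytes")
print(f"{total_bytes / 1e6:.1f} MB (decimal)")
print(f"{total_bytes / 2**20:.1f} MiB (binary)")
```

Either way the raw data is on the order of 25 MB, far below the multi-gigabyte footprint seen in the browser.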
Are there any facilities in dash/plotly to prevent this from taking down the browser? I do not need to interact with the points of the density plot, and have set hoverinfo='skip' to indicate that, but I would like to keep the interactivity of the heatmap recalculating the overlay when the map zoom changes. I am investigating other alternatives such as rasterizing the heatmap server-side using datashader, but that would remove this interactivity, which I would like to keep.
LensPy was created to solve this exact problem. It is built on top of Plotly Dash to allow you to plot very large datasets while maintaining fluid interactivity. Here is an example of how you can achieve this with a Mapbox.
import pandas as pd
import plotly.express as px
from lenspy import DynamicPlot
df = pd.read_csv(
    'https://raw.githubusercontent.com/plotly/datasets/master/earthquakes-23k.csv')
fig = px.density_mapbox(df,
                        lat='Latitude', lon='Longitude',
                        z='Magnitude',
                        radius=10,
                        center=dict(lat=0, lon=180),
                        zoom=0,
                        mapbox_style="stamen-terrain")
plot = DynamicPlot(fig)
plot.show()
Disclaimer: I am the creator of LensPy.
I'm hoping someone can point me in the right direction. The python datavis landscape has now become huge and there are so many options that I'm a bit lost on what the best way to achieve this is.
I have an xarray dataset (but it could easily be a pandas dataframe or a list of numpy arrays).
I have 3 columns, A, B, and C. They contain 40 data points.
I want to plot a scatter plot of A vs B + scale*C where scale is determined from an interactive slider.
The more advanced version of this would have a dropdown where you can select a different set of 3 columns but I'll worry about that bit later.
The caveat on all of this is that I'd like it to be online and interactive for others to use.
There seem to be so many options:
Jupyter (I don't use notebooks so I'm not that familiar with them but with mybinder I assume this is easy to do)
Plotly
Bokeh Server
pyviz.org (this is the really interesting one but again, there'd seem to be so many options on how to accomplish this)
Any thoughts or advice would be much appreciated.
There are indeed many options, and I'm not sure which is best, but I use Bokeh a lot and am happy with it. The example below can get you started. To launch it, open a command prompt in the directory where you saved the script and run "bokeh serve slider.py --show --allow-websocket-origin=*".
from bokeh.plotting import figure
from bokeh.io import curdoc
from bokeh.models.widgets import Slider
from bokeh.models import Row,ColumnDataSource
#create the starting data
x=[0,1,2,3,4,5,6,7,8]
y_noise=[1,2,2.5,3,3.5,6,5,7,8]
slope=1 #set the starting value of the slope
intercept=0 #set the line to go through 0, you can change this later
y= [slope*i + intercept for i in x]#create the y data via a list comprehension
# create a plot
fig=figure() #create a figure
source=ColumnDataSource(dict(x=x, y=y)) #the data destined for the figure
fig.circle(x,y_noise)#add some datapoints to the plot
fig.line('x','y',source=source,color='red')#add a line to the figure
#create a slider and update the graph source data when it changes
def updateSlope(attrname, old, new):
    print(str(new)+" is the new slider value")
    y = [float(new)*i + intercept for i in x]
    source.data = dict(x=x, y=y)
slider = Slider(title="slope", value=slope, start=0.0, end=2.0,step=0.1)
slider.on_change('value', updateSlope)
layout=Row(fig,slider)#put figure and slider next to each other
curdoc().add_root(layout)#serve it via "bokeh serve slider.py --show --allow-websocket-origin=*"
The --allow-websocket-origin=* option allows other users to reach the server and see the graph. The URL would be http://yourPCservername:5006/ (5006 is the default Bokeh port). If you don't want to serve from your PC, you can subscribe to a cloud service like Heroku: example.
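For the case in the question (A against B + scale*C), the same pattern could be adapted roughly as follows. This is a sketch only: the arrays A, B and C are random placeholders standing in for the real 40-point columns, and the slider range and the name update_scale are assumptions made for illustration.

```python
import numpy as np
from bokeh.io import curdoc
from bokeh.layouts import row
from bokeh.models import ColumnDataSource, Slider
from bokeh.plotting import figure

# Placeholder data standing in for the real columns A, B and C (40 points each).
rng = np.random.default_rng(0)
A = rng.normal(size=40)
B = rng.normal(size=40)
C = rng.normal(size=40)

initial_scale = 1.0  # assumed starting value for the slider

# The scatter glyph reads from this source, so replacing its data redraws the plot.
source = ColumnDataSource(data=dict(x=A, y=B + initial_scale * C))

fig = figure(x_axis_label="A", y_axis_label="B + scale*C")
fig.scatter("x", "y", source=source)

def update_scale(attr, old, new):
    """Recompute y = B + scale*C whenever the slider value changes."""
    source.data = dict(x=A, y=B + float(new) * C)

slider = Slider(title="scale", value=initial_scale, start=0.0, end=5.0, step=0.1)
slider.on_change("value", update_scale)

# Serve with: bokeh serve <this_file>.py --show
curdoc().add_root(row(fig, slider))
```

Because the callback runs in Python, this needs a running Bokeh server, just like the example above.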
I have been using the same setup for quite some time now but suddenly I am no longer allowed to plot more than one graph in a program.
Usually I can create multiple plots one after another and let the program run through them: it executes the next lines of code after I close the first window. Recently, however, the first plot is not shown, and its data is instead added to the last plot.
I have included a sample code which used to give me two plots but now only one.
import matplotlib.pyplot as plt
import numpy as np
random_num = np.random.randint(0,5,10)
random_num_2 = np.random.randint(0,100,10)
plt.plot(random_num, 'ko')
plt.show()
plt.plot(random_num_2, 'g*')
plt.show()
The first image shows the output from my program. But I would like to have them separated into two plots like Figure 2 and 3 show.
Maybe I should add that I am using Python 3.6 with Spyder 3.2.4. The graphics option is set to display it in Qt5 even though I tried all settings and only 'Inline' shows me the results the way I want it.
Sorry if this is a very simple question. I have tried googling but I only come up with questions about my topic where the way mine works would be the solution not the problem.
@TheresaOtt I would suggest you create a new figure instance (plt.figure()) for each plot and call plt.show() only once, at the end.
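Applied to the sample code from the question, that suggestion might look like the sketch below. The wrapper function make_two_figures is a hypothetical name added only to keep the example self-contained.

```python
import matplotlib.pyplot as plt
import numpy as np

def make_two_figures():
    """Draw each data set into its own figure and return both figures."""
    random_num = np.random.randint(0, 5, 10)
    random_num_2 = np.random.randint(0, 100, 10)

    fig1 = plt.figure()            # first, separate figure
    plt.plot(random_num, 'ko')

    fig2 = plt.figure()            # second figure; now the current one
    plt.plot(random_num_2, 'g*')

    return fig1, fig2

if __name__ == "__main__":
    make_two_figures()
    plt.show()                     # called once, after all figures exist
```

Each plt.figure() call makes a new figure current, so the second plt.plot() call can no longer draw into the first one.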
I am trying to plot something with a huge number of data points (2-3 million) using plotly.
When I run
py.iplot(fig, filename='test plot')
I get the following error:
Woah there! Look at all those points! Due to browser limitations, the Plotly SVG drawing functions have a hard time graphing more than 500k data points for line charts, or 40k points for other types of charts. Here are some suggestions:
(1) Use the `plotly.graph_objs.Scattergl` trace object to generate a WebGl graph.
(2) Trying using the image API to return an image instead of a graph URL
(3) Use matplotlib
(4) See if you can create your visualization with fewer data points
If the visualization you're using aggregates points (e.g., box plot, histogram, etc.) you can disregard this warning.
So then I try to save it with this:
py.image.save_as(fig, 'my_plot.png')
But then I get this error:
PlotlyRequestError: Unknown Image Server Error
How do I do this properly? I don't care if it's a still image or an interactive display within my notebook.
Plotly really seems to be very bad at this. I am just trying to create a boxplot with 5 million points, which is no problem with the simple R function "boxplot", but plotly calculates endlessly for this.
Improving this should be a priority. Not all of the data has to be saved (and shown) in the plotly object; that is the main problem, I guess.
One option would be down-sampling your data, though I'm not sure if you'd like that:
https://github.com/devoxi/lttb-py
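As a rough illustration of the idea, here is a naive down-sampling sketch using NumPy. It is not the LTTB algorithm from the link above: it simply keeps evenly spaced samples, so it can miss narrow peaks. The function name downsample_evenly is made up for this example.

```python
import numpy as np

def downsample_evenly(x, y, n_out):
    """Keep about n_out evenly spaced samples of (x, y), including both ends.

    Naive decimation, NOT LTTB: points between the kept indices are dropped
    without regard to their visual importance.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if n_out < 1:
        raise ValueError("n_out must be at least 1")
    if n_out >= len(x):
        return x, y  # nothing to drop
    idx = np.linspace(0, len(x) - 1, n_out).round().astype(int)
    return x[idx], y[idx]

# Example: reduce 3 million points to 10,000 before handing them to a plot.
x = np.arange(3_000_000)
y = np.sin(x / 50_000.0)
x_small, y_small = downsample_evenly(x, y, 10_000)
```

The reduced arrays can then be passed to whichever plotting call you were using with the full data.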
I also have problems with plotly in the browser with large datasets - if anyone has solutions, please write!
Thank you!
You can try the render_mode argument. Example:
import plotly.express as px
import pandas as pd
import numpy as np
N = int(1e6) # Number of points
df = pd.DataFrame(dict(x=np.random.randn(N),
                       y=np.random.randn(N)))
fig = px.scatter(df, x="x", y="y", render_mode='webgl')
fig.update_traces(marker_line=dict(width=1, color='DarkSlateGray'))
fig.show()
On my computer, N=1e6 takes about 5 seconds until the plot is visible, and the interactivity is still very good. With N=10e6 it takes about 1 minute and the plot is no longer responsive (i.e. it is really slow to zoom, pan or do anything else).