Python newbie here
I've been working on a data analysis project that requires some data visualization. I really want to use the plotly.express library for its interactivity, but the visualization is a complete mess.
For this graph, I want to show the position changes of F1 drivers on each lap. I have a perfect dataframe for that, but the result is this:
[image: DF on plotly.express]
For comparison, when I plot the very same dataframe with seaborn I get this:
[image: same DF on sns]
It appears that plotly.express is showing multiple positions for the same driver on a single lap, which is impossible. I can't figure out what's going on. Any tips?
Thanks a lot in advance
I'm struggling with a basic question.
My dataframe looks like this:
q = pd.DataFrame(data=[[28,50,30],[29,40,40],[30,30,30]],columns=['sprint','created','resolved'])
I want to plot a bar chart with sprint on the x-axis and, for each sprint, one bar for created and one for resolved.
Can someone help me with how this can be done?
Pandas can use matplotlib to display the contents of pandas objects. In your case, the easiest way seems to be pandas' own plotting.
The following gives you a bar plot that contains only the sprint values:
q.sprint.plot(kind='bar')
This gives you a separate plot for each sprint, showing the created and resolved tasks:
q.groupby('sprint').plot(y=['created','resolved'], kind='bar')
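If a single chart with both bars side by side for each sprint is what is wanted, a sketch along these lines should work with the dataframe from the question (the y-axis label is an addition for readability):

```python
import pandas as pd

q = pd.DataFrame(
    data=[[28, 50, 30], [29, 40, 40], [30, 30, 30]],
    columns=["sprint", "created", "resolved"],
)

# One group of bars per sprint, with "created" and "resolved" side by side.
ax = q.plot(x="sprint", y=["created", "resolved"], kind="bar", rot=0)
ax.set_ylabel("tasks")
# In a script, call matplotlib.pyplot.show() afterwards to open the window.
```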
I want to create a bar chart for my dataframe, but it doesn't show up, so I made this small script to try some things out, and it does display the bar chart the way I want. The dataframe is structured exactly the same way (I assume) as in my big script, where all my data is transformed.
Even if I copy and paste this code into my other script, it doesn't show the plot.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({
'soortfout':['totaalnoodstoppen','aantaltrapopen','aantaltrapdicht','aantalrectdicht','aantalphotocellopen','aantalphotocelldicht','aantalsafetyedgeopen', 'aantalsafetyedgeclose'],
'aantalfouten':[19,9,0,0,10,0,0,0],
})
print(df)
df.plot(kind='bar',x='soortfout',y='aantalfouten')
plt.show()
I can't really paste my other code here since it's pretty big. But is it possible that other code, which doesn't even use anything from matplotlib, interferes with plotting a chart?
I've tried most other solutions like:
matplotlib.rcParams['backend'] = "Qt4Agg"
Currently using PyCharm 2.5
It does work when I use Jupyter Notebook.
I was importing modules that I wasn't using, so they were grayed out.
But apparently you shouldn't import pandas_profiling if you want to plot with matplotlib.
Don't import modules that can interfere with plotting, like pandas_profiling.
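One possible explanation (an assumption, the post does not confirm it) is that an imported module switches matplotlib to a non-interactive backend, in which case plt.show() opens no window. A small diagnostic sketch to check which backend is active once all imports have run:

```python
import matplotlib
import matplotlib.pyplot as plt

# Place this after all the other imports of the large script.
backend = matplotlib.get_backend()
print("active matplotlib backend:", backend)

# Non-interactive backends only render to files, so plt.show() opens no window.
non_interactive = {"agg", "pdf", "ps", "svg", "cairo", "pgf", "template"}
if backend.lower() in non_interactive:
    print("plt.show() will not open a window with this backend")
```

Running it once with and once without the suspect import shows whether that import changes the backend.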
I am trying to plot some graphs with plotly and, after getting the wrong graph several times, I went back to basics and tried to plot an example from the plotly website, but the same error appears:
My dates are not shown as dates but as extremely large numbers (on the order of 10^18), and I get a second small graph that no one asked for.
import plotly.plotly as py
import plotly.graph_objs as go
import pandas_datareader as web
from datetime import datetime
import fix_yahoo_finance as yf
yf.pdr_override()
df = web.DataReader("aapl", 'robinhood').reset_index()
trace = go.Candlestick(x=df.Date,
open=df.Open,
high=df.High,
low=df.Low,
close=df.Close)
data = [trace]
py.plot(data, filename='simple_candlestick')
I just changed the data source (morningstar is down) and used the offline version instead of the online one, but the error still appeared when I used the web version anyway, so it does not matter.
EDIT:
After a previous edit, the images no longer appear in the post. Sorry for the inconvenience.
OK, I managed to fix the date problem by putting the date index into an array and using that array as x.
Also, I figured out that the second small graph only appears with the go.Candlestick representation (since it does not appear with figure_factory or go.Scatter hehe).
Thanks :)
I am trying to plot something with a huge number of data points (2-3 million) using plotly.
When I run
py.iplot(fig, filename='test plot')
I get the following error:
Woah there! Look at all those points! Due to browser limitations, the Plotly SVG drawing functions have a hard time graphing more than 500k data points for line charts, or 40k points for other types of charts. Here are some suggestions:
(1) Use the `plotly.graph_objs.Scattergl` trace object to generate a WebGl graph.
(2) Trying using the image API to return an image instead of a graph URL
(3) Use matplotlib
(4) See if you can create your visualization with fewer data points
If the visualization you're using aggregates points (e.g., box plot, histogram, etc.) you can disregard this warning.
So then I try to save it with this:
py.image.save_as(fig, 'my_plot.png')
But then I get this error:
PlotlyRequestError: Unknown Image Server Error
How do I do this properly? I don't care if it's a still image or an interactive display within my notebook.
Plotly really seems to handle this badly. I am just trying to create a box plot with 5 million points, which is no problem for the simple R function "boxplot", but plotly keeps computing endlessly.
Improving this should be a high priority. Not all the data has to be stored (and shown) in the plotly object. I guess that is the main problem.
One option would be to downsample your data, though I'm not sure whether that suits you:
https://github.com/devoxi/lttb-py
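To give an idea of what downsampling looks like, here is a simple min/max-per-bucket sketch in numpy. It is not the LTTB algorithm from the link above, just a plainer technique with a similar goal, and the function name and data are made up for illustration. It assumes the points are already ordered along x:

```python
import numpy as np

def minmax_downsample(x, y, n_buckets):
    """Keep the lowest and highest y of each bucket (at most 2 * n_buckets points)."""
    x = np.asarray(x)
    y = np.asarray(y)
    # Split the index range into n_buckets contiguous, roughly equal chunks.
    edges = np.linspace(0, len(y), n_buckets + 1, dtype=int)
    keep = []
    for start, stop in zip(edges[:-1], edges[1:]):
        if stop > start:
            chunk = y[start:stop]
            keep.append(start + int(chunk.argmin()))
            keep.append(start + int(chunk.argmax()))
    keep = np.unique(keep)  # sorted indices, duplicates removed
    return x[keep], y[keep]

# Made-up noisy signal with two million points.
x = np.arange(2_000_000)
y = np.sin(x / 50_000) + np.random.default_rng(0).normal(scale=0.1, size=x.size)

xs, ys = minmax_downsample(x, y, n_buckets=2_000)
print(len(x), "->", len(xs), "points")
```

Keeping both extremes of every bucket preserves spikes that a plain every-nth-point selection could drop.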
I also have problems with plotly in the browser with large datasets. If anyone has solutions, please write!
Thank you!
You can try the render_mode argument. Example:
import plotly.express as px
import pandas as pd
import numpy as np
N = int(1e6) # Number of points
df = pd.DataFrame(dict(x=np.random.randn(N),
y=np.random.randn(N)))
fig = px.scatter(df, x="x", y="y", render_mode='webgl')
fig.update_traces(marker_line=dict(width=1, color='DarkSlateGray'))
fig.show()
On my computer, N=1e6 takes about 5 seconds until the plot is visible, and the interactivity is still very good. With N=10e6 it takes about 1 minute and the plot is no longer responsive (i.e. zooming, panning, or anything else is really slow).
When using some of the built-in visualization tools in pandas, one that is very helpful for me is the parallel_coordinates visualization. However, since I have around 18 features in the dataframe, the bottom of the parallel_coordinates plot gets really messy.
Therefore, I was wondering if anyone knows how to rotate the axis names to be vertical rather than horizontal, as shown here:
I did find a way to use parallel coordinates in a polar setup, creating a radar chart. While that helped make the different features visible, that solution doesn't quite work: whenever the values are close to 0, it becomes almost impossible to see the curve. Furthermore, doing it in the polar coordinate frame required me to stop using pandas' dataframe, which is part of what made this method so appealing.
Using plt.xticks(rotation=90) should be enough. Here is an example with the “Iris” dataset:
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates
data = pd.read_csv('iris.csv')
parallel_coordinates(data, 'Name')
plt.xticks(rotation=90)
plt.show()
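For anyone without iris.csv at hand, here is a self-contained variant with 18 made-up features, closer to the situation in the question. It rotates the labels through the Axes object that parallel_coordinates returns, which should be equivalent to plt.xticks(rotation=90); the column names and class labels are invented for the sketch:

```python
import numpy as np
import pandas as pd
from pandas.plotting import parallel_coordinates

# Made-up data: 30 rows, 18 numeric features and a class column called "Name".
rng = np.random.default_rng(0)
features = [f"feature_{i:02d}" for i in range(18)]
data = pd.DataFrame(rng.normal(size=(30, 18)), columns=features)
data["Name"] = np.repeat(["a", "b", "c"], 10)

ax = parallel_coordinates(data, "Name")

# Rotate the feature names so that 18 labels do not overlap.
ax.tick_params(axis="x", labelrotation=90)
ax.figure.tight_layout()  # leave room for the vertical labels
# In a script, call matplotlib.pyplot.show() afterwards to open the window.
```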