Rotating parallel coordinate axis-names in Pandas - python

Of the built-in visualization tools in Pandas, one that I find very helpful is the parallel_coordinates visualization. However, since I have around 18 features in the dataframe, the bottom of the parallel_coordinates plot gets really messy.
Therefore, I was wondering if anyone knew how to rotate the axis-names to be vertical rather than horizontal as shown here:
I did find a way to use parallel_coordinates in a polar setup, creating a radar chart. While that helped make the different features visible, that solution doesn't quite work: whenever the values are close to 0, it becomes almost impossible to see the curve. Furthermore, using the polar coordinate frame required me to stop using pandas' dataframe, which is part of what made this method so appealing.

Using plt.xticks(rotation=90) should be enough. Here is an example with the "Iris" dataset:
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates
data = pd.read_csv('iris.csv')
parallel_coordinates(data, 'Name')
plt.xticks(rotation=90)
plt.show()
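If you would rather not rely on pyplot's "current axes", a variant is to rotate the labels on the Axes object that parallel_coordinates returns. The sketch below is only an illustration: the 18-column dataframe, the feature_XX column names and the class labels are made up as a stand-in for the real data.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import parallel_coordinates

# Synthetic stand-in for the real data: 18 numeric features plus a class column.
rng = np.random.default_rng(0)
features = [f"feature_{i:02d}" for i in range(18)]
df = pd.DataFrame(rng.random((30, len(features))), columns=features)
df["Name"] = rng.choice(["a", "b", "c"], size=len(df))

fig, ax = plt.subplots(figsize=(10, 5))
# parallel_coordinates draws on the Axes it is given and returns it.
ax = parallel_coordinates(df, "Name", ax=ax)
# Rotate only the x tick labels of this Axes.
ax.tick_params(axis="x", labelrotation=90)
# Leave room at the bottom for the now-vertical labels.
fig.tight_layout()
```

Call plt.show() afterwards to display the figure. Working on the Axes directly also helps when the plot is one of several subplots.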

Related

marker style by third variable

Might seem like a repeat question, but the solution in this post doesn't seem to work for me.
I have a bunch of data I want to plot as lines/curves, and another dataset linked to the curves consisting of XYZ data, where Z represents a labeling variable for the curves.
I've got some example code here with some XY data, and labels for anyone wanting to replicate what I'm doing:
plt.plot(xdata, ydata)
plt.scatter(xlab, ylab, c=lab) # needs a marker function adding
plt.show()
Ideally I want to add some kind of unique marker based on the label values: 0.1, 0.5, 1, 2, 3, 4, 6, 8, 10, 20. The labels are the same for each curve.
I have over 100 curves to plot, so something quick and effective is needed. Any help would be great!
My current solution would be to just split the data by labelling values, and then plot separately for each one (long and messy in my opinion). Figured someone might have a more elegant solution here.
I'm guessing you could do this with a dictionary... but I might need some help doing that!
Cheers, KB
A single Matplotlib scatter call does not accept a different marker per point.
However, a less verbose and more robust solution for large datasets is to use the pandas and seaborn libraries.
Additionally, you can use the pandas.cut function to plot bins (it's something I regularly need, to produce graphs where a third continuous value acts as a parameter). The basic seaborn usage is:
import pandas as pd
import seaborn as sns
url = 'https://pastebin.com/raw/dwGBLqSb' # url of paste
df = pd.read_csv(url)
sns.scatterplot(data = df, x='labx', y='laby', style='lab')
and it produces the following example:
If you need more advanced labelling, you could also look at the LabelEncoder from scikit-learn.
Hopefully I've edited this answer enough not to fall foul of the "don't post identical answers to multiple questions" rule. For what it's worth, I am not affiliated with the seaborn library in any way, nor am I trying to promote anything. I am only trying to help someone with a similar problem to one I came across, for which I couldn't easily find a clear answer on SE.
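For anyone who wants to stay in plain Matplotlib, the dictionary idea from the question can be sketched roughly as below: one scatter call per label value, with the marker looked up in a dict. The data here are random placeholders (three curves, one point per label), and the choice of marker symbols is arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np

# Label values from the question, each mapped to an (arbitrarily chosen) marker.
label_values = [0.1, 0.5, 1, 2, 3, 4, 6, 8, 10, 20]
markers = ["o", "s", "^", "v", "D", "P", "X", "*", "<", ">"]
marker_for = dict(zip(label_values, markers))

# Placeholder data: one labelled point per label value on each of 3 curves.
rng = np.random.default_rng(1)
lab = np.tile(label_values, 3)
xlab = rng.random(lab.size)
ylab = rng.random(lab.size)

fig, ax = plt.subplots()
for value, marker in marker_for.items():
    # Select every point carrying this label, across all curves at once.
    mask = np.isclose(lab, value)
    ax.scatter(xlab[mask], ylab[mask], marker=marker, label=str(value))
ax.legend(title="lab")
```

This needs only ten scatter calls no matter how many curves there are, because the split is by label value rather than by curve.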

python multiple stacked plots along y axis

I have binned data: one x-axis vector of length n and three y-axis vectors of length n, for three different histograms on the same x-axis.
Now I want this kind of stacked bar plot, or anything similar, as below.
The nearest I have found is Qtiplot (which is not python). It can generate exactly this kind of histogram plot. But it computes the histogram by itself and requires the actual data samples, which are not present in my case (I only have the histogram itself).
Please note that I don't know python very well, so I don't have a clue where to start, nor am I really in the mood to learn programming in python. I need this only to make a nice vector-graphics plot for my research thesis.
I have tagged python as I think python is the most obvious language. In case someone knows any better solution other than in python (but not Matlab, I cannot install that huge pile), I will thankfully add the proper tag.
Thanks in advance for any help.
Use the matplotlib package in Python. To stack the three histograms as separate subplots:
import matplotlib.pyplot as plt
apple_weight=[3,3,3,10,10,1,1,1,4,4,4,4,7,7,7]
banana_weight=[3,3,3,10,10,1,1,1,4,4,4,4,7,7,7]
mango_weight=[3,3,3,10,10,1,1,1,4,4,4,4,7,7,7]
fig=plt.figure()
ax1=fig.add_subplot(311)
ax2=fig.add_subplot(312)
ax3=fig.add_subplot(313)
ax1.hist(apple_weight)
ax2.hist(banana_weight)
ax3.hist(mango_weight)
plt.show()
Or, to draw them on one plot with a second y-axis:
import matplotlib.pyplot as plt
apple_weight=[3,3,3,10,10,1,1,1,4,4,4,4,7,7,7]
banana_weight=[3,3,3,10,10,1,1,1,4,4,4,4,7,7,7]
mango_weight=[3,3,3,10,10,1,1,1,4,4,4,4,7,7,7]
fig=plt.figure()
ax1=fig.add_subplot(111)
ax2=ax1.twinx()
# only two y-axes are available, so the third list is added to either one
ax1.hist(apple_weight)
ax2.hist(banana_weight)
ax1.hist(mango_weight)
plt.show()
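Since the question says only the binned histogram is available (not the raw samples), a variant that may fit better is to draw the counts directly with bar instead of hist. This is a sketch under assumptions: the bin centres, the three count vectors and the series names below are invented placeholders for the real n-length vectors.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical pre-binned data: shared bin centres and three count vectors.
x = np.arange(10) + 0.5
counts = {
    "series A": np.array([1, 3, 6, 9, 7, 5, 3, 2, 1, 0]),
    "series B": np.array([0, 1, 2, 4, 8, 9, 6, 3, 1, 1]),
    "series C": np.array([2, 2, 3, 3, 4, 4, 5, 5, 6, 6]),
}
width = x[1] - x[0]  # assumes equally wide bins

# One panel per histogram, stacked vertically on a shared x-axis.
fig, axes = plt.subplots(len(counts), 1, sharex=True, figsize=(6, 6))
for ax, (name, y) in zip(axes, counts.items()):
    ax.bar(x, y, width=width, edgecolor="black")
    ax.set_ylabel(name)
axes[-1].set_xlabel("x")
# Remove the vertical gaps so the panels sit directly on top of each other.
fig.subplots_adjust(hspace=0)
```

For the thesis, fig.savefig("plot.pdf") or fig.savefig("plot.svg") writes vector graphics; the file names are just examples.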

plotly: huge number of datapoints

I am trying to plot something with a huge number of data points (2-3 million) using plotly.
When I run
py.iplot(fig, filename='test plot')
I get the following error:
Woah there! Look at all those points! Due to browser limitations, the Plotly SVG drawing functions have a hard time graphing more than 500k data points for line charts, or 40k points for other types of charts. Here are some suggestions:
(1) Use the `plotly.graph_objs.Scattergl` trace object to generate a WebGl graph.
(2) Trying using the image API to return an image instead of a graph URL
(3) Use matplotlib
(4) See if you can create your visualization with fewer data points
If the visualization you're using aggregates points (e.g., box plot, histogram, etc.) you can disregard this warning.
So then I try to save it with this:
py.image.save_as(fig, 'my_plot.png')
But then I get this error:
PlotlyRequestError: Unknown Image Server Error
How do I do this properly? I don't care if it's a still image or an interactive display within my notebook.
Plotly really seems to be very bad at this. I am just trying to create a boxplot with 5 million points, which is no problem with the simple R function "boxplot", but plotly computes endlessly for this.
Improving this should be a major priority. Not all of the data has to be saved (and shown) in the plotly object. This is the main problem, I guess.
One option would be down-sampling your data, not sure if you'd like that:
https://github.com/devoxi/lttb-py
I also have problems with plotly in the browser with large datasets - if anyone has solutions, please write!
Thank you!
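To give an idea of what down-sampling can look like without any extra dependency, here is a rough NumPy sketch. Note that it is not the LTTB algorithm from the repository linked above; it is a simpler min/max decimation that keeps the lowest and highest point of each bucket, and it assumes a line-style series whose x values are already ordered. The function name and the random-walk data are made up for illustration.

```python
import numpy as np

def minmax_downsample(x, y, n_buckets):
    """Keep the min and max y of each bucket (simple decimation, not LTTB)."""
    x = np.asarray(x)
    y = np.asarray(y)
    # Split the index range into n_buckets nearly equal contiguous chunks.
    edges = np.linspace(0, len(y), n_buckets + 1).astype(int)
    keep = []
    for start, stop in zip(edges[:-1], edges[1:]):
        if stop > start:
            chunk = y[start:stop]
            keep.append(start + int(np.argmin(chunk)))
            keep.append(start + int(np.argmax(chunk)))
    keep = np.unique(keep)  # sorted indices, duplicates removed
    return x[keep], y[keep]

# Placeholder data: a 2-million-point random walk.
rng = np.random.default_rng(2)
x = np.arange(2_000_000)
y = np.cumsum(rng.standard_normal(x.size))

# At most 2 points per bucket, so at most 10,000 points remain.
xs, ys = minmax_downsample(x, y, n_buckets=5_000)
```

The reduced xs and ys can then be passed to whatever plotting call you were using. Peaks and troughs survive, but the density of points between them is lost, so this suits line charts more than scatter plots.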
You can try the render_mode argument. Example:
import plotly.express as px
import pandas as pd
import numpy as np
N = int(1e6) # Number of points
df = pd.DataFrame(dict(x=np.random.randn(N),
                       y=np.random.randn(N)))
fig = px.scatter(df, x="x", y="y", render_mode='webgl')
fig.update_traces(marker_line=dict(width=1, color='DarkSlateGray'))
fig.show()
On my computer, N=1e6 takes about 5 seconds until the plot is visible, and the interactivity is still very good. With N=10e6 it takes about 1 minute and the plot is no longer responsive (i.e. zooming, panning or anything else is really slow).

Python Heatmaps (Basic and Complex)

What's the best way to do a heatmap in python (2.7)? I've found the heatmap.py module, and I was wondering if people have any advice on using it, or if there are other packages that do a good job.
I'm dealing with pretty basic data, like xy = np.random.rand(1000,2) superimposed on an image.
Although there's another thing I want to try, which is doing a heatmap that's scaled to a different heatmap. E.g., I have
attempts = np.random.rand(5000,2)
successes = np.random.rand(500,2)
And I want a heatmap of the successes relative to the density of the attempts. Is this possible?
Seaborn is a pretty widely-used library for making nice-looking plots, and has a heatmap function. Seaborn uses matplotlib under the hood.
import numpy as np
import seaborn as sns
xy = np.random.rand(1000,2)
sns.heatmap(xy, yticklabels=100)
Regarding your second question, I'm not sure what you mean. But my advice would be to create a numpy array or pandas dataframe of "successes [scaled] relative to the density of the attempts", however you mean that, and then pass that scaled array or dataframe to sns.heatmap
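One possible reading of "successes relative to the density of the attempts" is a per-cell ratio of two 2D histograms. The sketch below takes that interpretation, which may not be what the asker had in mind. The bin count of 10 and the [0, 1] range are arbitrary choices that match the random data in the question.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
attempts = rng.random((5000, 2))
successes = rng.random((500, 2))

# Bin both point sets on the same grid so the cells line up.
bins = 10
extent = [[0, 1], [0, 1]]
att, xedges, yedges = np.histogram2d(
    attempts[:, 0], attempts[:, 1], bins=bins, range=extent)
suc, _, _ = np.histogram2d(
    successes[:, 0], successes[:, 1], bins=bins, range=extent)

# Ratio per cell; cells with no attempts become NaN instead of dividing by zero.
ratio = np.divide(suc, att, out=np.full_like(suc, np.nan), where=att > 0)

fig, ax = plt.subplots()
# histogram2d puts x along the first axis, so transpose for imshow.
im = ax.imshow(ratio.T, origin="lower", extent=[0, 1, 0, 1], aspect="auto")
fig.colorbar(im, ax=ax, label="successes / attempts")
```

The same ratio array could equally be passed to sns.heatmap. With few points per cell the ratio gets noisy, so coarser bins may look better.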
You can plot very complex heatmaps using the Python package PyComplexHeatmap: https://github.com/DingWB/PyComplexHeatmap
Examples: https://github.com/DingWB/PyComplexHeatmap/blob/main/examples.ipynb
The most basic heatmap you can get is an image plot:
import matplotlib.pyplot as plt
import numpy as np
xy = np.random.rand(100,2)
plt.imshow(xy, aspect="auto")
plt.colorbar()
plt.show()
Note that using more points than you have pixels to show the heatmap might not make much sense.
There are of course other ways to draw a heatmap, and you may want to go through the matplotlib example gallery to see which plot appeals most to you.

Matplotlib: different stacked bars?

I want to create a stacked bar plot with a different number of stacks for each bar. The general example for stacked bars works fine if my data are all homogeneous, but I want something that looks more like the shown example.
This turned out to be a whole other level in Matplotlib (while still easy with an Excel-like tool, as you can see). Is there a convenient way of creating this kind of plot in Matplotlib? Thanks.
I guess you are working directly in matplotlib, but these days plotting data, especially for a quick view, can easily be done with pandas. Following your example we get:
import matplotlib.pyplot as plt
import matplotlib
matplotlib.style.use("ggplot")
import pandas as pd
import numpy as np
df = pd.DataFrame([pd.Series([10,20,40,10,np.nan]), pd.Series([20,10,30,10,10]), pd.Series([30,40, np.nan, np.nan, np.nan])], index=["Bar1", "Bar2", "Bar3"])
df.plot.bar(stacked=True)
plt.show()
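If you do want to stay in plain Matplotlib, a rough sketch is to loop over each bar's segments and keep a running bottom. The segment values are the non-NaN numbers from the dataframe above; colouring by segment position is just one possible choice.

```python
import matplotlib.pyplot as plt

# Segment heights per bar; the lists may have different lengths.
stacks = {
    "Bar1": [10, 20, 40, 10],
    "Bar2": [20, 10, 30, 10, 10],
    "Bar3": [30, 40],
}

fig, ax = plt.subplots()
totals = {}
for name, segments in stacks.items():
    bottom = 0
    for level, height in enumerate(segments):
        # Colour by position in the stack so the same level matches across bars.
        ax.bar(name, height, bottom=bottom, color=f"C{level}", edgecolor="white")
        bottom += height
    totals[name] = bottom  # total height of this bar
ax.set_ylabel("value")
```

Because each bar is built independently, no NaN padding is needed for the shorter stacks.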
