marker style by third variable - python

Might seem like a repeat question, but the solution in this post doesn't seem to work for me.
I have a bunch of data I want to plot as lines/curves, and another dataset linked to the curves consisting of XYZ data, where Z represents a labeling variable for the curves.
I've got some example code here with some XY data, and labels for anyone wanting to replicate what I'm doing:
plt.plot(xdata, ydata)
plt.scatter(xlab, ylab, c=lab) # needs a marker function adding
plt.show()
Ideally I want to add a unique marker for each label value: 0.1, 0.5, 1, 2, 3, 4, 6, 8, 10, 20. The labels are the same for each curve.
I have over 100 curves to plot, so something quick and effective is needed. Any help would be great!
My current solution would be to just split the data by labelling values, and then plot separately for each one (long and messy in my opinion). Figured someone might have a more elegant solution here.
I'm guessing you could do this with a dictionary... but I might need some help doing that!
Cheers, KB

Matplotlib does not accept different markers within a single plot call.
However, a less verbose and more robust solution for large datasets is to use the pandas and seaborn libraries.
Additionally, you can use the pandas.cut function to bin a continuous third variable before plotting (it's something I regularly need when producing graphs that use a third continuous value as a parameter). The basic seaborn usage is:
import pandas as pd
import seaborn as sns
url = 'https://pastebin.com/raw/dwGBLqSb' # url of paste
df = pd.read_csv(url)
sns.scatterplot(data = df, x='labx', y='laby', style='lab')
and it produces the following example:
If you need more advanced labelling, you could also look at scikit-learn's LabelEncoder.
Hopefully I've edited this answer enough not to fall foul of the "don't post identical answers to multiple questions" rule. For what it's worth, I am not affiliated with the seaborn library in any way, nor am I trying to promote anything. I am only trying to help someone with a problem similar to one I came across myself and couldn't easily find a clear answer to on SE.
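For the dictionary idea raised in the question, here is a minimal matplotlib-only sketch. The data below is a made-up stand-in for the pastebin file, and the marker choices and the name marker_for are just assumptions for illustration. It still issues one scatter call per label value, but the loop stays short however many curves there are, because the points of all curves can be concatenated first.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in data: one curve plus labelled points along it.
label_values = [0.1, 0.5, 1, 2, 3, 4, 6, 8, 10, 20]
xdata = np.linspace(0, 10, 200)
ydata = np.sin(xdata)
xlab = np.linspace(0.5, 9.5, len(label_values))
ylab = np.sin(xlab)
lab = np.array(label_values)

# One marker per label value (any valid matplotlib marker codes will do).
markers = ["o", "s", "^", "v", "D", "P", "X", "*", "<", ">"]
marker_for = dict(zip(label_values, markers))

fig, ax = plt.subplots()
ax.plot(xdata, ydata, color="0.6")
for value, marker in marker_for.items():
    # Exact comparison is fine here because lab holds the same literals;
    # use np.isclose for labels that come out of a computation.
    mask = lab == value
    ax.scatter(xlab[mask], ylab[mask], marker=marker, label=str(value))
ax.legend(title="lab")
```

Call plt.show() or fig.savefig(...) afterwards to view the result.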

Related

python multiple stacked plots along y axis

I have binned data: one n-length x-axis vector and 3 n-length y-axis vectors, for 3 different histograms on the same x-axis.
Now I want this kind of stacked bar plot, or anything similar, as below.
The nearest I have found is Qtiplot (which is not python). It can generate exactly this kind of histogram plots. But it computes the histogram by itself and requires the actual data samples which are not present in my case (I only have the histogram itself).
Please note that I don't know python very well, so I don't have a clue where to start, nor am I really in the mood to learn programming in python. I need this only to make a nice vector-graphics plot for my research thesis.
I have tagged python as I think python is the most obvious language. In case someone knows any better solution other than in python (but not Matlab, I cannot install that huge pile), I will thankfully add the proper tag.
Thanks in advance for any help.
Use the matplotlib package in Python:
import matplotlib.pyplot as plt
apple_weight=[3,3,3,10,10,1,1,1,4,4,4,4,7,7,7]
banana_weight=[3,3,3,10,10,1,1,1,4,4,4,4,7,7,7]
mango_weight=[3,3,3,10,10,1,1,1,4,4,4,4,7,7,7]
fig=plt.figure()
ax1=fig.add_subplot(311)
ax2=fig.add_subplot(312)
ax3=fig.add_subplot(313)
ax1.hist(apple_weight)
ax2.hist(banana_weight)
ax3.hist(mango_weight)
plt.show()

Alternatively, draw everything on one plot with twin y axes:
import matplotlib.pyplot as plt
apple_weight=[3,3,3,10,10,1,1,1,4,4,4,4,7,7,7]
banana_weight=[3,3,3,10,10,1,1,1,4,4,4,4,7,7,7]
mango_weight=[3,3,3,10,10,1,1,1,4,4,4,4,7,7,7]
fig=plt.figure()
ax1=fig.add_subplot(111)
ax2=ax1.twinx()
# only two y axes are available, so the third list is added to either one
ax1.hist(apple_weight)
ax2.hist(banana_weight)
ax1.hist(mango_weight)
plt.show()
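Since the question says only the histogram itself is available (bin positions and counts, not the raw samples), a bar plot may be closer to what is needed than hist(). The sketch below assumes made-up bin centres and counts; the names bin_centres and counts are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical pre-binned data: shared bin centres and three sets of counts.
bin_centres = np.arange(1, 11)
counts = {
    "apple": np.array([3, 0, 3, 4, 0, 0, 3, 0, 0, 2]),
    "banana": np.array([1, 2, 4, 6, 5, 3, 2, 1, 0, 0]),
    "mango": np.array([0, 1, 1, 2, 4, 6, 4, 2, 1, 1]),
}

# One panel per histogram, stacked vertically and sharing the x axis.
fig, axes = plt.subplots(len(counts), 1, sharex=True, figsize=(6, 6))
for ax, (name, heights) in zip(axes, counts.items()):
    ax.bar(bin_centres, heights, width=1.0, edgecolor="black")
    ax.set_ylabel(name)
axes[-1].set_xlabel("bin centre")
fig.subplots_adjust(hspace=0)  # close the gaps so the panels sit flush
```

For a thesis figure, fig.savefig("plot.pdf") or fig.savefig("plot.svg") gives vector output.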

seaborn tsplot with non-connected confidence intervals

I'm using seaborn's tsplot function to plot how well my model fit matches actual data in a time series, with CIs showing my predictions' standard deviations. My question is: Is there a way for tsplot not to fill in CIs between points? That is, for it to show the CIs of each point individually without connecting one CI to the next.
For the means this is accomplished by setting "interpolate" to False. I'm looking for the same, but for CIs.
To illustrate, my plots currently look like this:
I'm fine with how this looks for means (red dots) that are close together, but the CI-transition looks rather odd when one mean is close to 1 and the next is close to 0. The data just happens to be like this. I'd be happy to turn the CI "connection" off, but would also be happy for any related aesthetic suggestions. Thank you.
For completeness' sake, the relevant offending code fragment is as follows:
import seaborn as sns; sns.set(color_codes=True)
import matplotlib.pyplot as plt
model_fit = ...  # fit data (elided)
data = ...  # actual data (elided)
sns.tsplot(model_fit, interpolate=False, ci='sd', color='indianred', condition='predicted')
plt.plot(X, data, linestyle='None', marker='*', label='actual')  # X: time points, not shown here
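One way to get a separate interval per point, as a sketch rather than a tsplot answer, is to compute the mean and standard deviation yourself and draw them with matplotlib's errorbar. The array below is a random stand-in for model_fit, assumed to have one row per repeated fit and one column per time point. (As far as I recall, tsplot also accepted err_style="ci_bars", but newer seaborn releases no longer ship tsplot, so I can't vouch for that here.)

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for model_fit: 20 repeated fits at 15 time points.
time_points = np.arange(15)
model_fit = rng.random((20, time_points.size))

mean = model_fit.mean(axis=0)
sd = model_fit.std(axis=0)

fig, ax = plt.subplots()
# fmt="o" draws markers only, so neither the means nor the SD bars are joined.
ax.errorbar(time_points, mean, yerr=sd, fmt="o", color="indianred",
            capsize=3, label="predicted")
ax.legend()
```

Each point then carries its own vertical bar, so a jump from a mean near 1 to a mean near 0 no longer produces a slanted band between them.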

How to prevent from plotting outlier in boxplot in pandas

I have a DataFrame(called result_df) and want to plot one column with boxplot.
But certain outliers spoil the visualization. How can I prevent the outliers from being plotted?
Code I used:
import matplotlib.pyplot as pl  # assuming pl is the pyplot module
fig, ax = pl.subplots()
fig.set_size_inches(18.5,10.5)
result_df.boxplot(ax=ax)
pl.show()
Important: I didn't pay enough attention (apparently that happens a lot) and missed that this is pandas-specific. However, from the questions I've seen, pandas uses matplotlib for plotting in the background, so this could still work. Sorry for not being more careful.
Luckily for you, there is such a thing. In the manual, under the "results: dict" heading towards the bottom of the page, it states:
fliers: points representing data that extend beyond the whiskers (outliers).
Setting showfliers=False will hopefully help you.
I do have to mention, though, that I find it really strange that they shortened outliers to fliers. If that doesn't help, the manual offers a second solution:
sym : str or None, default = None
The default symbol for flier points. Enter an empty string (‘’) if you don’t want to show fliers. If None, then the fliers default to ‘b+’. If you want more control use the flierprops kwarg.
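A minimal sketch of the showfliers route applied to the pandas call from the question. The DataFrame here is random stand-in data with a few artificial outliers; pandas forwards extra keyword arguments to matplotlib's boxplot, and return_type="dict" is only used so the drawn artists can be inspected.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical stand-in for result_df, with a few extreme values added.
result_df = pd.DataFrame({"a": rng.normal(size=200), "b": rng.normal(size=200)})
result_df.loc[:4, "a"] = 50.0  # artificial outliers

fig, ax = plt.subplots()
fig.set_size_inches(18.5, 10.5)
# showfliers=False is passed through to matplotlib, so no outlier points are drawn.
artists = result_df.boxplot(ax=ax, showfliers=False, return_type="dict")
```

With the fliers hidden, the y axis scales to the whiskers, so the boxes are no longer squashed by the extreme values.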

Plotting histograms against classes in pandas / matplotlib

Is there an idiomatic way to plot the histogram of a feature for two classes?
In pandas, I basically want
df.feature[df["class"] == 0].hist()  # class is a Python keyword, so use bracket access
df.feature[df["class"] == 1].hist()
To be in the same plot. I could do
df.feature.hist(by=df["class"])
but that gives me two separate plots.
This seems to be a common task so I would imagine there to be an idiomatic way to do this. Of course I could manipulate the histograms manually to fit next to each other but usually pandas does that quite nicely.
Basically I want this matplotlib example in one line of pandas: http://matplotlib.org/examples/pylab_examples/barchart_demo.html
I thought I was missing something, but maybe it is not possible (yet).
How about df.groupby("class").feature.hist()? To see overlapping distributions you'll probably need to pass alpha=0.4 to hist(). Alternatively, I'd be tempted to use a kernel density estimate instead of a histogram with df.groupby("class").feature.plot(kind='kde').
As an example, I plotted the iris dataset's classes using:
iris.groupby("Name").PetalWidth.plot(kind='kde', ax=axs[1])
iris.groupby("Name").PetalWidth.hist(alpha=0.4, ax=axs[0])
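Here is a self-contained sketch of the overlapping-histogram idea, written as an explicit loop over the groups so that both classes share the same axes and bin edges. The DataFrame is random stand-in data, and the column names feature and class are taken from the question.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical data: one numeric feature and a binary class column.
df = pd.DataFrame({
    "feature": np.concatenate([rng.normal(0, 1, 300), rng.normal(2, 1, 300)]),
    "class": np.repeat([0, 1], 300),
})

fig, ax = plt.subplots()
# Shared bin edges keep the two histograms directly comparable.
bins = np.linspace(df["feature"].min(), df["feature"].max(), 21)
counts = {}
for name, group in df.groupby("class"):
    n, _, _ = ax.hist(group["feature"], bins=bins, alpha=0.4, label=f"class {name}")
    counts[name] = n
ax.legend()
```

The alpha value keeps the overlap visible; swapping ax.hist for group["feature"].plot(kind="kde", ax=ax) should give the density version instead.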

Barchart (or plot) 3D in Python

I need to plot some data in various forms. Currently I'm using Matplotlib and I'm fairly happy with the plots I'm able to produce.
This question is on how to plot the last one. The data is similar to a "distance table", like this (just bigger: my table is 128x128 and has 3 or more numbers per element).
Now, my data is much better "structured" than a distance table (my data doesn't vary "randomly" as in an alphabetically sorted distance table), so a 3D barchart, or maybe 3 of them, would be perfect. My understanding is that such a chart is missing in Matplotlib.
I could use a (colored) Contour3d like these or something in 2D like imshow, but neither really represents what the data is (the data has meaning only at my 128 points; there isn't anything between two points). And the height of bars is more readable than color, IMO.
Thus the questions:
is it possible to create a 3D barchart in Matplotlib? To be clear, I mean one with a 2D domain, not just a 2D barchart with a "fake" 3D rendering for aesthetic purposes
if the answer to the previous question is no, then is there some other library able to do that? I strongly prefer something Python-based, but I'm OK with other Linux-friendly possibilities
if the answer to the previous question is no, then do you have any suggestions on how to show that data? E.g. create a table with the values, superimposed to the imshow or other colored way?
For some time matplotlib had no 3D support, but it has been added back recently. You will need to use the svn version, since no release has been made since, and the documentation is a little sparse (see examples/mplot3d/demo.py). I don't know if mplot3d supports real 3D bar charts, but one of the demos looks a little like it could be extended to something like that.
Edit: The source code for the demo is in the examples, but for some reason the result is not. I mean the test_polys function, and here's what it looks like:
example figure http://www.iki.fi/jks/tmp/poly3d.png
The test_bar2D function would be even better, but it's commented out in the demo as it causes an error with the current svn version. Might be some trivial problem, or something that's harder to fix.
MayaVi2 can make 3D barcharts (scroll down a bit). Once you have MayaVi/VTK/ETS/etc. installed it all works beautifully, but it can be some work getting it all installed. Ubuntu has all of it packaged, but it's the only Linux distribution I know of that does.
One more possibility is Gnuplot, which can draw some kind of pseudo 3D bar charts, and gnuplot.py allows interfacing to Gnuplot from Python. I have not tried it myself, though.
This is my code for a simple Bar-3d using matplotlib.
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib
import matplotlib.pyplot as plt
# In a Jupyter notebook, also run: %matplotlib inline
## The values you want to plot
zval=[0.020752244,0.078514652,0.170302899,0.29543857,0.45358061,0.021255922,0.079022499,
      0.171294169,0.29749654,0.457114286,0.020009631,0.073154019,0.158043498,0.273889264,0.419618287]
fig = plt.figure(figsize=(12,9))
ax = fig.add_subplot(111,projection='3d')
col=["#ccebc5","#b3cde3","#fbb4ae"]*5
xpos=[1,2,3]*5
ypos=list(range(1,6))*3  # on Python 3, range() must be wrapped in list() before repeating
zpos=[0]*15
dx=[0.4]*15
dy=[0.5]*15
dz=zval
# draw one bar per value
for i in range(15):
    ax.bar3d(ypos[i], xpos[i], zpos[i], dx[i], dy[i], dz[i],
             color=col[i],alpha=0.75)
ax.view_init(azim=120)
plt.show()
http://i8.tietuku.com/ea79b55837914ab2.png
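The loop above draws one bar per call. For a full table like the 128x128 one in the question, bar3d also accepts arrays, so the whole grid can be passed at once. This is only a sketch on a small random table (8x8, made up for illustration), and it assumes a matplotlib version recent enough that projection="3d" works without importing Axes3D.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 8x8 table standing in for the 128x128 one in the question.
table = rng.random((8, 8))

# One bar per table cell: flatten the grid of (row, column) positions.
rows, cols = np.meshgrid(np.arange(table.shape[0]),
                         np.arange(table.shape[1]), indexing="ij")
x = cols.ravel()
y = rows.ravel()
z = np.zeros(x.shape)      # every bar starts at height 0
heights = table.ravel()    # same ordering as rows/cols because of indexing="ij"

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection="3d")
# Scalar widths are broadcast to all bars, so no Python loop is needed.
ax.bar3d(x, y, z, 0.8, 0.8, heights, shade=True)
ax.set_xlabel("column")
ax.set_ylabel("row")
```

With 3 or more numbers per element, one such chart per number (for example as three subplots) is probably easier to read than trying to fit them all into a single set of bars.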
You might check out Chart Director:
http://www.advsofteng.com
It has a pretty wide variety of charts and graphs and has a nice Python (and several other languages) API.
There are two editions: the free version puts a blurb on the generated image, and the paid version is pretty reasonably priced.
Here's one of the more interesting looking 3d stacked bar charts:
(source: advsofteng.com)
