Plotting histograms against classes in pandas / matplotlib - python

Is there an idiomatic way to plot the histogram of a feature for two classes?
In pandas, I basically want
df.feature[df["class"] == 0].hist()
df.feature[df["class"] == 1].hist()
to be in the same plot. I could do
df.feature.hist(by=df["class"])
but that gives me two separate plots.
This seems to be a common task so I would imagine there to be an idiomatic way to do this. Of course I could manipulate the histograms manually to fit next to each other but usually pandas does that quite nicely.
Basically I want this matplotlib example in one line of pandas: http://matplotlib.org/examples/pylab_examples/barchart_demo.html
I thought I was missing something, but maybe it is not possible (yet).

How about df.groupby("class").feature.hist()? To see overlapping distributions you'll probably need to pass alpha=0.4 to hist(). Alternatively, I'd be tempted to use a kernel density estimate instead of a histogram with df.groupby("class").feature.plot(kind='kde').
As an example, I plotted the iris dataset's classes using:
iris.groupby("Name").PetalWidth.plot(kind='kde', ax=axs[1])
iris.groupby("Name").PetalWidth.hist(alpha=0.4, ax=axs[0])
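For a self-contained version of the histogram variant, here is a minimal sketch. The DataFrame is synthetic stand-in data (the column names "feature" and "class" follow the question, the values are made up), and sharing one set of bin edges between the groups is my own choice so that the bars line up; it is not something pandas does for you.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data: two classes with shifted means (assumption, not from the question).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature": np.concatenate([rng.normal(0, 1, 200), rng.normal(2, 1, 200)]),
    "class": np.repeat([0, 1], 200),
})

fig, ax = plt.subplots()
# Common bin edges so both histograms use identical bins.
bins = np.linspace(df["feature"].min(), df["feature"].max(), 21)
for label, group in df.groupby("class"):
    # Each call draws onto the same Axes; alpha keeps the overlap visible.
    group["feature"].hist(ax=ax, bins=bins, alpha=0.4, label=f"class {label}")
ax.legend()
# Call plt.show() or fig.savefig(...) here to view the result.
```

Looping over the groups costs two extra lines compared with the one-liner, but it gives each class its own legend entry.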

Related

Trim data outside 3d plot in matplotlib

I have a set of PDFs that I need to plot over a certain section of the PDF domain. However, when I plot my lines on a 3D plot I get tails for each PDF.
Is there a clean way to not plot the tails that fall outside my plot limits? I know I can change the data to NaNs to achieve the same effect, but I want to do this in matplotlib. Here is my current workaround code:
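`import numpy as np
import matplotlib.pyplot as plt

# PDF_x, PDF_capacity and life are the arrays from my data (not shown here)
# trim the data: set everything outside the plotted y-range to NaN
y = np.ones(PDF_x.shape)*PDF_x
y[y>95] = np.nan
y[y<75] = np.nan
# plot the data
fig = plt.figure()
# fig.gca(projection='3d') no longer works in recent matplotlib versions
ax = fig.add_subplot(projection='3d')
for i in range(PDF_capacity.shape[1]):
    ax.plot(life[i]*np.ones(PDF_x.shape), y, PDF_capacity[:,i], label='parametric curve')
# set the axis limits
ax.set_ylim(75,95)
# add axis labels
ax.set_xlabel('charge cycles to failure point of 75% capacity')
ax.set_ylabel('capacity at 100 charge cycles')
ax.set_zlabel('probability')`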
After trimming I can make the plot I want.
Masking the data with nan in the way you're doing it is a good and practical solution.
Since matplotlib 3D plots are projections into 2D space, it would be hard to implement automatic clipping. While I do think it would be possible, I'm not convinced that it's worth the effort. First, because you would need to treat different kinds of plots differently, second, because at least in some cases it would probably turn out that masking the data is still the best choice. Now, doing a complex subclassing of the plotting objects just to do the same thing that can be manually done in one or two lines is probably overkill.
My clear recommendation would therefore be to use the solution you already have, especially since it does not seem to have any drawbacks so far.
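If the masking gets repeated in several places, it can be wrapped in a small helper. The sketch below is only an illustration: mask_outside is a hypothetical name, and the Gaussian-shaped curves are made-up stand-ins for the PDFs in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np

def mask_outside(values, lower, upper):
    """Return a float copy of values with entries outside [lower, upper] set to NaN."""
    values = np.asarray(values, dtype=float)
    return np.where((values >= lower) & (values <= upper), values, np.nan)

# Made-up stand-in data: one bell-shaped curve per x position.
y = np.linspace(60, 110, 201)
x_positions = [100, 200, 300]
y_trimmed = mask_outside(y, 75, 95)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
for k, x0 in enumerate(x_positions):
    z = np.exp(-0.5 * ((y - 80 - 3 * k) / 4.0) ** 2)
    # NaN entries in y_trimmed break the line, so the tails are not drawn.
    ax.plot(np.full_like(y, x0), y_trimmed, z)
ax.set_ylim(75, 95)
```

Unlike the in-place assignments in the question, np.where leaves the original array untouched, which helps if the untrimmed data is needed later.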

Determine kind of Matplotlib Axes subplot

Given a matplotlib.axes._subplots.AxesSubplot object, how do I tell what type of plot it contains? Is there a matplotlib feature that will determine this for me? For example...
I commonly plot data with pandas
import pandas as pd
df = pd.DataFrame({'y':range(10)})
line_ax = df.plot()
or
bar_ax = df.plot(kind='bar')
or
barh_ax = df.plot(kind='barh')
The matplotlib axes does not care about which plot it contains and it does not even know about it.
The question would also be how to distinguish "kinds" of plots. What kind of plot is in an axes which contains 2 bars, several markers, 2 lines and 3 arrows?
The kind argument to pandas plot function is simply a flag by which pandas decides which plotting function to call. This is independent of the axes and you may of course also have a plot produced by kind='bar' and kind='scatter' in the same axes.
So the answer is: No, there is no general way to determine the kind of plot in an axes, mainly because there is no such thing as a "kind of plot".
Of course, depending on what you'd need this type of information for, there are probably alternative ways to accomplish what you need.
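One such alternative is to look at which artists the axes holds and decide for yourself what counts as, say, a bar chart. The sketch below is a rough heuristic under that assumption; summarize_artists is a hypothetical helper name, not a matplotlib function.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for an interactive window
import matplotlib.pyplot as plt
from matplotlib.container import BarContainer

def summarize_artists(ax):
    """Count the artists an Axes holds, grouped by where matplotlib stores them."""
    return {
        "lines": len(ax.lines),                # from ax.plot
        "bar_containers": sum(isinstance(c, BarContainer) for c in ax.containers),
        "patches": len(ax.patches),            # individual bar rectangles, arrows, ...
        "collections": len(ax.collections),    # from ax.scatter and similar
    }

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [1, 3, 2])
ax.bar([0, 1, 2], [2, 1, 3])
ax.scatter([0, 1, 2], [3, 2, 1])
summary = summarize_artists(ax)
print(summary)
```

A mixed axes like this one reports lines, bars and a scatter collection all at once, which is exactly why a single "kind" cannot be assigned to it.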

Swarmplot with more than just one categorical level (Python)

I am trying to make a swarm plot that contains more information than a single categorical level and two variables. I am looking to create something like this.
So ideally, something like this would work (but it does not):
ax = sns.swarmplot(x="round_id", y="independent_error_abs", hue="difficulty", hue_order=['easy','medium','hard'], size="followers", markershape="rank",data=df)
where "difficulty", "followers", and "rank" determine the color of the point, the size of the point, and the shape of the point, respectively.
No, this is not possible with swarmplot. Personally I find this kind of plot very difficult to interpret: a good statistical plot should make the patterns in the data immediately apparent, whereas plots with multiple categorical variables that manipulate the size or shape of the points quickly become more like puzzles. My recommendation in these cases (following Andrew Gelman) is to make more than one plot, each with relatively simple semantics.
You don't have to agree, of course, but you will have to make it yourself using matplotlib.
I am facing the same issue, and the solution actually seems to be pretty simple, at least for the marker type!
Just divide your dataframe into sub-dataframes, one for each marker type. Then you draw the swarmplots on top of each other, and that's it.
If the size of the dot is also a categorical variable, you just need to do the same as above, where each sub-dataframe represents one marker and one size.
If size is continuous, then it seems you would need to plot each dot independently in a for loop, but for that I would use matplotlib.pyplot.
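For the continuous-size case, a loop over every dot may not be needed, because matplotlib's scatter accepts an array of sizes. Here is a rough sketch with plain matplotlib. Note that it uses random horizontal jitter rather than a real swarm layout, so points can overlap, and the data is made up (only the column names are borrowed from the question).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np

# Made-up data using the column names from the question.
rng = np.random.default_rng(1)
n = 60
round_id = rng.integers(0, 3, n)
independent_error_abs = rng.random(n)
difficulty = rng.choice(["easy", "medium", "hard"], n)
followers = rng.integers(10, 200, n)
rank = rng.choice(["A", "B"], n)

colors = {"easy": "tab:green", "medium": "tab:orange", "hard": "tab:red"}
markers = {"A": "o", "B": "^"}  # hypothetical rank values

fig, ax = plt.subplots()
# Jitter the categorical x position so points in one round do not stack up.
x = round_id + rng.uniform(-0.2, 0.2, n)
# One scatter call per marker, since scatter takes a single marker per call.
for rank_value, marker in markers.items():
    sel = rank == rank_value
    ax.scatter(x[sel], independent_error_abs[sel], s=followers[sel],
               c=[colors[d] for d in difficulty[sel]], marker=marker,
               alpha=0.6, label=f"rank {rank_value}")
ax.set_xticks([0, 1, 2])
ax.legend()
```

Size and colour vary per point within one call, so only the marker shape forces the split into subsets.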

Plotting distribution from sampled data in python

I have two sets of sampled points in 2D space [x, y]; each set represents one class. When I plot all the points, it's a mess and you can't see anything in it. I need to somehow plot the distribution of each set (on the same canvas with different colours, if possible). Does anybody know a good library for this?
Matplotlib is a very good library for that task. You can plot histograms, scatter plots and a lot of other things. You just have to know what you want, and then you can probably do it with matplotlib. I use it for similar tasks a lot.
[UPDATE]
As I said, you can do that with matplotlib. Here is an example from their gallery: http://matplotlib.org/examples/pylab_examples/scatter_hist.html
It's not as pretty as the answer in the comment by @lejlot, but still correct.

Creating a box-plot like scatter-plot with matplotlib

Is there any (simple or complex) way to recreate this plot in matplotlib?
I've tried plotting it using a scatter plot with two different x-values, while adding a small random number to it, but obviously it didn't produce the nice "ordered" effect seen above.
There's a package built on top of matplotlib called beeswarm that positions the points as requested.
