I currently have a dataframe called df with 18 columns, and I have plotted a histogram of every column with the pandas hist() function to check the shape of each variable's distribution:
df.hist(figsize=(30,30))
What I now want to do is add a boxplot above each histogram so I can see at a glance which variables contain outliers. I want the plot to look as follows:
I can plot the boxplot using the following code, but it displays all of the boxplots on a single plot:
df.boxplot(figsize=(30,30))
I can also add a grouping with by, but that isn't what I need. I just want each histogram in my df.hist plot to be overlaid with the boxplot of the same column. I suspect I could write a function to do this, but since the hist function is so convenient, I suspect there is a straightforward way that I'm missing.
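As far as I know, pandas' hist() has no option for stacking a boxplot on each histogram, so one way is a short helper built on matplotlib. This is a minimal sketch, not a pandas feature: the function name hist_with_boxplots, its ncols/bins parameters and the random demo data are hypothetical. Every numeric column gets two stacked axes: a short one with a horizontal boxplot (points beyond the whiskers are drawn individually, which is what makes outliers stand out) and a taller one with the histogram. The box's x-limits are copied from the histogram so the two line up.

```python
import math

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def hist_with_boxplots(df, ncols=4, bins=20, figsize=(30, 30)):
    """For every numeric column, draw a horizontal boxplot stacked above its histogram."""
    cols = df.select_dtypes("number").columns
    nrows = math.ceil(len(cols) / ncols)
    # Two grid rows per row of variables: a short one for the box, a tall one for the histogram.
    fig, axes = plt.subplots(
        2 * nrows, ncols, figsize=figsize, squeeze=False,
        gridspec_kw={"height_ratios": [1, 3] * nrows},
    )
    for i, col in enumerate(cols):
        r, c = divmod(i, ncols)
        box_ax, hist_ax = axes[2 * r, c], axes[2 * r + 1, c]
        values = df[col].dropna().to_numpy()
        hist_ax.hist(values, bins=bins)
        # Points beyond the whiskers (1.5 * IQR by default) are drawn individually as fliers.
        box_ax.boxplot(values, vert=False)
        box_ax.set_xlim(hist_ax.get_xlim())  # line the box up with the histogram's x-axis
        box_ax.set_yticks([])
        box_ax.set_title(col)
    # Hide the leftover axes when the columns do not fill the last grid row.
    for j in range(len(cols), nrows * ncols):
        r, c = divmod(j, ncols)
        axes[2 * r, c].set_visible(False)
        axes[2 * r + 1, c].set_visible(False)
    fig.tight_layout()
    return fig, axes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Made-up data standing in for the 18-column df from the question.
    demo = pd.DataFrame(rng.normal(size=(200, 18)), columns=[f"var{i}" for i in range(18)])
    hist_with_boxplots(demo)
    plt.show()
```

With an 18-column frame and the default ncols=4, this gives a 5 × 4 grid of box/histogram pairs, with the last two slots hidden.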
I posted this question about 3D plots of data frames:
3D plot of 2d Pandas data frame
and the user very helpfully referred me to this:
Plotting Pandas Crosstab Dataframe into 3D bar chart
It was useful and the code worked in principle, but the result looks like a mess (see image below) for several reasons:
I have a huge number of values to plot (about 470, along the y-axis), so perhaps a bar chart is not the best choice (I am going for a histogram-like look, so I assumed very narrow bars would work)
my counts (z-axis) give almost no information, because the differences I need to see lie between 100 and the maximum value
how can I make the 3D plot interactive (able to rotate it, etc.)? I have seen it done in blogs/videos, but I'm not sure whether it's a setting under Tools -> Preferences that I can't find
So, regarding the second issue, which seemed simple enough, I tried changing the z-axis limits as I would for a 2D plot by adding:
ax.set_zlim([110,150])
just before the axis labels, but obviously this is the wrong way:
So do I have to filter the original data set (i.e. remove values below 110), or is there a way to do this from the plot?
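As far as I know, mplot3d does not clip artists to the axis limits by default, so set_zlim() alone leaves bars that start at z = 0 sticking out below the axes. One way to handle it at plot time, without editing the data set, is to start every bar at the floor value and give it a height of count minus floor, then set the z-limits to match. A minimal sketch: the helper name bar3d_from_floor, the bar width and the demo grid/counts are hypothetical, and bars at or below the floor are simply skipped (equivalent to filtering them out).

```python
import matplotlib.pyplot as plt
import numpy as np


def bar3d_from_floor(ax, x, y, counts, floor, width=0.4):
    """Draw 3D bars that start at `floor` instead of 0; return the number of bars drawn."""
    x, y, counts = (np.asarray(a, dtype=float).ravel() for a in (x, y, counts))
    keep = counts > floor  # bars at or below the floor would have no visible height
    if keep.any():
        # z is the bottom of each bar and dz its height, so each bar spans floor..count.
        ax.bar3d(x[keep], y[keep], floor, width, width, counts[keep] - floor)
    ax.set_zlim(floor, max(counts.max(), floor + 1))
    return int(keep.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xs, ys = np.meshgrid(np.arange(5), np.arange(40))  # made-up grid
    counts = rng.integers(90, 150, size=xs.shape)  # made-up counts
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    bar3d_from_floor(ax, xs, ys, counts, floor=110)
    plt.show()
```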
I have this dataframe:
and I need to have a chart similar to this:
I recommend reading this documentation on plotting with pandas:
https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html
Then plotting is simply df.plot(), plus a few additional lines for colors, axis labels, etc.
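For example, with a small made-up frame (the columns year, sales and costs are hypothetical, since the question's data is only shown as an image), a grouped bar chart with custom colors and labels could be sketched like this. The returned matplotlib Axes is what the extra lines use for labels and title.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data standing in for the frame in the question.
df = pd.DataFrame({
    "year": [2019, 2020, 2021, 2022],
    "sales": [3, 5, 4, 7],
    "costs": [2, 3, 3, 4],
})

# One call draws the chart; the returned Axes is then used for the finishing touches.
ax = df.plot(x="year", y=["sales", "costs"], kind="bar", color=["tab:blue", "tab:orange"], rot=0)
ax.set_xlabel("Year")
ax.set_ylabel("Amount")
ax.set_title("Sales and costs per year")
plt.tight_layout()
plt.show()
```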
Additional remark: you should not post data as an image. See https://stackoverflow.com/questions/how-to-ask
I have a data frame table "pandastable3" that looks like this:
I would like to plot a separate histogram for each column, but so far I only manage to get a single figure containing all the plots. For example, I use this to plot the first 3 columns:
pandastable3.hist(layout=(1,2,3))
But I am not sure I am doing this correctly, as I cannot see anything.
I suppose diff() gives different plots for each column:
pandastable3.diff().hist()
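Two hedged notes on the attempts above: layout expects a (rows, columns) pair, and as far as I know pandas rejects a three-element tuple like (1,2,3); and diff() computes the difference between consecutive rows, so pandastable3.diff().hist() shows histograms of those differences rather than of the original values. If each column should get its own figure, a plain loop is one option. A minimal sketch, with a made-up frame standing in for pandastable3 and a hypothetical helper name hist_each_column:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def hist_each_column(df, columns=None, bins=10):
    """Draw one histogram per column, each in its own figure; return the figures."""
    figures = []
    for col in (df.columns if columns is None else columns):
        fig, ax = plt.subplots()
        df[col].plot.hist(bins=bins, ax=ax)
        ax.set_title(col)
        figures.append(fig)
    return figures


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    table = pd.DataFrame(rng.normal(size=(100, 5)), columns=list("ABCDE"))  # made-up data
    # One figure per column for the first three columns.
    hist_each_column(table, columns=table.columns[:3])
    # Alternatively, the same three columns side by side in a single figure.
    table.hist(column=list(table.columns[:3]), layout=(1, 3))
    plt.show()
```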
I am trying to use the matplotlib boxplot to show the boxplot of a number of distributions. One group of distributions is on a very different scale than the other group. I have been trying to use twinx() to plot the second group but it overlaps with the other boxplots.
Is there a better method to add a different scale for specific data?
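twinx() on its own places both groups at the same default x positions (1, 2, ...), which is why they overlap. One workaround is to give each group its own slots with boxplot's positions argument, so the second group sits to the right on the shared x-axis, and then set the ticks and limits once at the end. A minimal sketch; the helper name boxplots_two_scales, the labels and the random data are hypothetical:

```python
import matplotlib.pyplot as plt
import numpy as np


def boxplots_two_scales(left_groups, right_groups, labels=None):
    """Boxplot two groups of distributions side by side, each against its own y-axis."""
    n_left, n_right = len(left_groups), len(right_groups)
    pos_left = list(range(1, n_left + 1))
    pos_right = list(range(n_left + 1, n_left + n_right + 1))

    fig, ax_left = plt.subplots()
    ax_right = ax_left.twinx()  # shares the x-axis, independent y-axis on the right
    ax_left.boxplot(left_groups, positions=pos_left)
    ax_right.boxplot(right_groups, positions=pos_right)

    # Each boxplot call resets the shared x ticks, so set ticks and limits once at the end.
    ticks = pos_left + pos_right
    ax_left.set_xlim(0.5, ticks[-1] + 0.5)
    ax_left.set_xticks(ticks)
    ax_left.set_xticklabels(labels if labels is not None else [str(t) for t in ticks])
    return fig, ax_left, ax_right


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    small = [rng.normal(1, 0.2, 100) for _ in range(3)]  # made-up small-scale data
    large = [rng.normal(1000, 150, 100) for _ in range(2)]  # made-up large-scale data
    fig, ax_left, ax_right = boxplots_two_scales(small, large)
    ax_left.set_ylabel("small-scale group")
    ax_right.set_ylabel("large-scale group")
    plt.show()
```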
Is there an idiomatic way to plot the histogram of a feature for two classes?
In pandas, I basically want
df.feature[df["class"] == 0].hist()
df.feature[df["class"] == 1].hist()
To be in the same plot. I could do
df.feature.hist(by=df["class"])
but that gives me two separate plots.
This seems to be a common task, so I would imagine there is an idiomatic way to do it. Of course, I could manipulate the histograms manually to fit next to each other, but pandas usually does that quite nicely.
Basically I want this matplotlib example in one line of pandas: http://matplotlib.org/examples/pylab_examples/barchart_demo.html
I thought I was missing something, but maybe it is not possible (yet).
How about df.groupby("class").feature.hist()? To see overlapping distributions you'll probably need to pass alpha=0.4 to hist(). Alternatively, I'd be tempted to use a kernel density estimate instead of a histogram with df.groupby("class").feature.plot(kind='kde').
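As an example, I plotted the iris dataset's classes (with the data in a DataFrame called iris that has Name and PetalWidth columns) using:
import matplotlib.pyplot as plt
fig, axs = plt.subplots(1, 2)  # axs[0]: overlapping histograms, axs[1]: KDE curves
iris.groupby("Name").PetalWidth.plot(kind='kde', ax=axs[1])
iris.groupby("Name").PetalWidth.hist(alpha=0.4, ax=axs[0])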
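Two other options for the two-class case are sketched below, assuming a DataFrame with a numeric feature column and a class column (the data is made up). Pivoting each class into its own column gives overlaid histograms with a legend from a single pandas call. Passing a list of arrays to matplotlib's hist instead draws the bars of each class next to each other within every bin, which is closer to the linked bar chart demo.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data: one numeric feature and a binary class label.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature": np.concatenate([rng.normal(0, 1, 300), rng.normal(1.5, 1, 200)]),
    "class": [0] * 300 + [1] * 200,
})

fig, (ax_overlay, ax_side) = plt.subplots(1, 2, figsize=(10, 4))

# Overlaid: pivot spreads each class into its own column (NaN elsewhere);
# plot.hist skips the NaNs and adds a legend keyed by class.
df.pivot(columns="class", values="feature").plot.hist(bins=20, alpha=0.5, ax=ax_overlay)

# Side by side: given a list of arrays, matplotlib draws one bar per array in each bin.
groups = df.groupby("class")["feature"]
ax_side.hist([values.to_numpy() for _, values in groups], bins=20,
             label=[f"class {k}" for k, _ in groups])
ax_side.legend()

plt.show()
```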