Plot pandas df into boxplot & histogram - python

I am trying to combine a boxplot and a histogram in one figure (without using seaborn). I have tried many variations, but I always get skewed graphs.
This was my starting point:
#Histogram
df.hist(column="Q7", bins=20, figsize=(14,6))
#Boxplot
df.boxplot(column="Q7",vert=False,figsize=(14,6))
which resulted in the following graph:
As you can see, the boxplot and its outliers are at the bottom, but I want them on top.
Does anyone have an idea?

You can use subplots and set height ratios so that the boxplot comes first and the histogram sits below it. In the example below, I use 30% of the vertical space for the boxplot and 70% for the histogram. I also adjusted the spacing between the plots and used a common x-axis via sharex. Hope this is what you are looking for.
import matplotlib.pyplot as plt

fig, ax = plt.subplots(2, figsize=(14, 6), sharex=True,  # common x-axis
                       gridspec_kw={"height_ratios": (.3, .7)})  # boxplot gets 30% of the vertical space
# Boxplot on top (figsize is set once on the figure; it is ignored when ax is passed)
df.boxplot(column="Q7", vert=False, ax=ax[0])
# Histogram below
df.hist(column="Q7", bins=20, ax=ax[1])
ax[1].title.set_size(0)  # hide the histogram title
plt.subplots_adjust(hspace=0.1)  # adjust the gap between the two plots

Related

Problems with pandas boxplot showing points on it

I am plotting a box plot with the following code:
plt.figure(figsize=(7,7))
plt.title("Title")
plt.ylabel('Y-ax')
boxplot = df.boxplot(grid=False, rot=90, fontsize=10)
plt.show()
And I get this plot:
Is there any way to show just the normal boxplot with the 50/75/90 percentiles, without those circles? I have no idea what they mean.
The data frame is huge; maybe that is why these points are shown?
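The circles are most likely matplotlib's outlier markers ("fliers"): points lying beyond the whiskers, which by default reach at most 1.5 times the interquartile range from the box. A large data frame simply contains more such points. A minimal sketch, assuming a synthetic DataFrame (the column names and data are invented for illustration), hides them with showfliers=False, a keyword that pandas forwards to matplotlib's boxplot:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the asker's large DataFrame (assumption)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 3)), columns=["a", "b", "c"])

fig, ax = plt.subplots(figsize=(7, 7))
# showfliers=False suppresses the outlier markers;
# return_type="dict" returns the drawn artists so they can be inspected
bp = df.boxplot(grid=False, rot=90, fontsize=10, showfliers=False,
                ax=ax, return_type="dict")
ax.set_title("Title")
ax.set_ylabel("Y-ax")
fig.savefig("boxplot_no_fliers.png")
```

If the whiskers should mark specific percentiles instead, matplotlib also accepts a pair such as whis=(10, 90).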

Empty legend when plotting pandas dataframe columns with matplotlib

When I plot two columns of my dataframe, where one contains the x-values and the other the y-values, I cannot get a legend to show. This is because there are no handles in the legend call, and I have not figured out how to create those handles from the dataframe.
plt.figure(num=None, figsize=(15, 6), dpi=80, facecolor='w', edgecolor='k')
colordict = {'m':'blue', 'f':'red'}
plt.scatter(x=df.p_age, y=df.p_weight, c=df.p_gender.apply(lambda x: colordict[x]))
plt.title('Weight distribution per age')
plt.xlabel('Age [months]')
plt.ylabel('individual weight [kg]')
plt.legend(title="Legend Title")
The color dictionary colordict lets me use different colors for the two genders. That works. I want a legend with a blue and a red dot and "male", "female" next to them.
I have tried:
plt.legend(title="Legend Title", handles=(df.p_age,df.p_weight))
But that does not work, as handles cannot be made out of series.
Derived from the solution found here. This solution iteratively draws the plot for each class.
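fig, ax = plt.subplots()
for gender, color in colordict.items():
    # Plot only the rows of this gender so each class gets its own color and legend entry
    subset = df[df.p_gender == gender]
    ax.scatter(x=subset.p_age, y=subset.p_weight, c=color, label=gender)
ax.legend()
plt.show()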
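As an alternative to drawing each class separately, the legend handles can be built by hand as "proxy artists". The following is a minimal sketch, assuming a small made-up DataFrame with the same column names as in the question; the names mapping from "m"/"f" to "male"/"female" is also an assumption added for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.lines import Line2D

# Hypothetical data standing in for the asker's DataFrame
df = pd.DataFrame({
    "p_age": [1, 2, 3, 4, 5, 6],
    "p_weight": [4.1, 5.0, 5.8, 6.3, 7.1, 7.4],
    "p_gender": ["m", "f", "m", "f", "m", "f"],
})
colordict = {"m": "blue", "f": "red"}
names = {"m": "male", "f": "female"}  # assumed display names

fig, ax = plt.subplots(figsize=(15, 6))
# Single scatter call, colored per row as in the question
ax.scatter(df.p_age, df.p_weight, c=df.p_gender.map(colordict).tolist())

# One proxy artist per gender: a marker-only line in the matching color
handles = [
    Line2D([], [], marker="o", linestyle="", color=color, label=names[gender])
    for gender, color in colordict.items()
]
legend = ax.legend(handles=handles, title="Legend Title")
```

The proxy artists are never drawn on the axes; they only tell the legend which marker and label to show.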

Displaying xticks using subplot Python

I'm attempting to plot a few subplots. The issue that I'm running into is in labeling the x-axis for each plot since they're all different.
The variables relHazardRate and relHazardFICO are dataframes of size 50 x 2.
When I plot the code below, I am unable to show the x-axis tick marks (relHazardRate is a variable ranging from 3% to 6%, and relHazardFICO is a variable ranging from 300 to 850). Each plot in the grid should have its own x-axis ticks (there are 8 such plots); I have provided my logic for 2 of them below.
fig, ((ax1, ax2), (ax3, ax4), (ax5, ax6), (ax7, ax8)) = plt.subplots(4, 2,figsize=(12,8))
ax1.plot(relHazardRate['orig_coupon'],relHazardRate['Hazard Multiplier']);
ax1.title.set_text('Original Interest Rate');
ax1.set_xticks(range(len(relHazardRate['orig_coupon'])));
ax1.set_xticklabels(relHazardRate['orig_coupon'].to_list())
ax2.plot(relHazardFICO['orig_FICO'],relHazardFICO['Hazard Multiplier'], 'tab:orange');
ax2.title.set_text('Original FICO');
ax2.set_xticks(range(len(relHazardFICO['orig_FICO'])));
ax2.set_xticklabels(relHazardFICO['orig_FICO'].to_list())
ax3 through ax8 follow a declaration similar to the one above.
for ax in fig.get_axes():
    ax.label_outer()
This is the subplot grid that I get. I want to label each plot with its own x-axis, but as shown, this is not happening.
Remove the lines with label_outer.
From the docs:
label_outer()
Only show "outer" labels and tick labels.
x-labels are only kept for subplots on the last row; y-labels only for subplots on the first column
This is what causes the behaviour you see in your plot.
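To illustrate, here is a minimal sketch with two stacked subplots and no label_outer() call. The DataFrames are made-up stand-ins with the column names from the question, and the tick positions are left to matplotlib's defaults rather than set by hand.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the asker's 50 x 2 DataFrames
relHazardRate = pd.DataFrame({
    "orig_coupon": np.linspace(3.0, 6.0, 50),
    "Hazard Multiplier": np.linspace(0.5, 2.0, 50),
})
relHazardFICO = pd.DataFrame({
    "orig_FICO": np.linspace(300, 850, 50),
    "Hazard Multiplier": np.linspace(2.0, 0.5, 50),
})

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
ax1.plot(relHazardRate["orig_coupon"], relHazardRate["Hazard Multiplier"])
ax1.set_title("Original Interest Rate")
ax2.plot(relHazardFICO["orig_FICO"], relHazardFICO["Hazard Multiplier"],
         color="tab:orange")
ax2.set_title("Original FICO")

# No label_outer() here, so the upper subplot keeps its own x tick labels
fig.tight_layout()
fig.canvas.draw()  # render once so the tick label texts are populated
labels_ax1 = [t.get_text() for t in ax1.get_xticklabels()]
```

Because the axes are not shared, each subplot picks tick locations that suit its own data range.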

Matplotlib Subplot axes sharing: Apply to every other plot?

I am trying to find a way to apply the shared axes parameters of subplot() to every other plot in a series of subplots.
I've got the following code, which uses data from RPM4, based on rows in fpD:
fig, ax = plt.subplots(2*(fpD['name'].count()), sharex=True, figsize=(6,fpD['name'].count()*2),
                       gridspec_kw={'height_ratios':[5,1]*fpD['name'].count()})
for i, r in fpD.iterrows():
    RPM4[RPM4['name'] == RPM3.iloc[i,0]].plot(x='date', y='RPM', ax=ax[(2*i)], legend=False)
    RPM4[RPM4['name'] == RPM3.iloc[i,0]].plot(kind='area', color='lightgrey', x='date', y='total', ax=ax[(2*i)+1],
                                              legend=False)
    ax[2*i].set_title('test', fontsize=12)
plt.tight_layout()
This produces an output that is very close to what I need. It loops through the 'name' column in a table, produces two plots for each name, and displays them as subplots:
As you can see, the sharex parameter works fine for me here, since I want all the plots to share the same axis.
However, what I'd really like is for all the even-numbered (bigger) plots to share the same y axis, and for the odd-numbered (small grey) plots to all share a different y axis.
Any help on accomplishing this is much appreciated, thanks!
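One possible approach, sketched here under assumptions, is to leave sharey off in plt.subplots and link the axes pairwise with Axes.sharey (available in matplotlib 3.3 and later). The data and the count n below are invented for illustration; the asker's RPM4 and fpD tables are not reproduced.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

n = 3  # hypothetical number of names
fig, ax = plt.subplots(2 * n, sharex=True, figsize=(6, 2 * n),
                       gridspec_kw={"height_ratios": [5, 1] * n})

x = np.arange(10)
for i in range(n):
    if i > 0:
        # Link before plotting: large plots follow ax[0], small plots follow ax[1]
        ax[2 * i].sharey(ax[0])
        ax[2 * i + 1].sharey(ax[1])
    ax[2 * i].plot(x, (i + 1) * x)  # large plot
    ax[2 * i + 1].fill_between(x, (i + 1) * 100 * x, color="lightgrey")  # small plot

fig.tight_layout()
```

Each call to sharey may only be made once per axes, which is why every plot is linked to the first axes of its own group.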

Stacked area chart is displayed even though df contains 0 or NaN

I am using df.plot.area() and am very confused by the result. The dataframe has integers as index. The values to plot are in different columns. One column contains zeros from a specific integer onwards, however I can still see a thin line in the plot which isn't right.
After data processing this is the code I am using to actually plot:
# Start plotting
df.plot(kind='area', stacked=True, color=colors)
plt.legend(loc='best')
plt.xlabel('Year', fontsize=12)
plt.ylabel(mylabel, fontsize=12)
# Reverse Legend
ax = plt.gca()
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles[::-1], labels[::-1])
plt.title(filename[:-4])
plt.tight_layout()
plt.autoscale(enable=True, axis='x', tight=True)
And this is a snapshot of the result. The thin orange line shouldn't be visible because the value in the dataframe is zero.
Thanks for your support!
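One likely cause is that the area plot draws a boundary line for every column, and that line keeps its width even where the area itself has zero height. A minimal sketch, assuming a small made-up DataFrame (the data and column names are hypothetical), passes linewidth=0 to suppress those boundary lines:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: column "b" is zero from 2002 onwards
df = pd.DataFrame(
    {"a": [3, 4, 5, 6, 7], "b": [2, 1, 0, 0, 0], "c": [1, 2, 3, 2, 1]},
    index=[2000, 2001, 2002, 2003, 2004],
)

# linewidth=0 removes the outline drawn around each stacked area
ax = df.plot(kind="area", stacked=True, linewidth=0)
ax.set_xlabel("Year", fontsize=12)
ax.figure.tight_layout()
```

Whether this matches the asker's case depends on their data; NaN values would need separate handling.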
