Problems with pandas boxplot showing points on it - python

I am plotting a box plot with the following code:
plt.figure(figsize=(7,7))
plt.title("Title")
plt.ylabel('Y-ax')
boxplot = df.boxplot(grid=False, rot=90, fontsize=10)
plt.show()
And I get this plot:
Is there any way to show just the normal box plot with the 50/75/90 percentiles, without those circles? I have no clue what they mean.
The data frame is huge, maybe that is why these points are shown?
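A possible answer, as a hedged sketch: the circles are what matplotlib calls fliers, points lying beyond the whiskers, which by default reach at most 1.5 times the interquartile range past the box (the box itself spans the 25th to 75th percentiles, with the median line at the 50th). A huge data frame naturally contains many such points. Passing showfliers=False through df.boxplot hides them. The data frame and column names below are synthetic stand-ins, not the asker's data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to view the figure with plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the asker's data frame (assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(10_000, 3)), columns=["a", "b", "c"])

fig, ax = plt.subplots(figsize=(7, 7))
# showfliers=False is forwarded to matplotlib's boxplot and suppresses
# the outlier markers; return_type="dict" exposes the drawn artists.
artists = df.boxplot(ax=ax, grid=False, rot=90, fontsize=10,
                     showfliers=False, return_type="dict")
ax.set_title("Title")
ax.set_ylabel("Y-ax")
```

If the whiskers should instead cover specific percentiles, matplotlib's whis parameter accepts a pair such as whis=(5, 95); whether that suits the data is a judgment call.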

Related

Plot pandas df into boxplot & histogram

Currently I am trying to combine a boxplot with a histogram in one figure (without using seaborn). I have tried many variations but I always get skewed graphs.
This was my starting point:
#Histogram
df.hist(column="Q7", bins=20, figsize=(14,6))
#Boxplot
df.boxplot(column="Q7",vert=False,figsize=(14,6))
which resulted in the following graph:
As you can see the boxplot and outliers are at the bottom, but I want it to be on top.
Does anyone have an idea?
You can use subplots and set the height ratios so that the boxplot comes first and the histogram sits below it. In the example below, I use 30% of the height for the boxplot and 70% for the histogram. I also adjusted the spacing between the plots and used a common x-axis via sharex. Hope this is what you are looking for...
fig, ax = plt.subplots(2, figsize=(14, 6), sharex=True,  # common x-axis
                       gridspec_kw={"height_ratios": (.3, .7)})  # boxplot gets 30% of the vertical space
# Boxplot
df.boxplot(column="Q7", vert=False, figsize=(14, 6), ax=ax[0])
# Histogram
df.hist(column="Q7", bins=20, figsize=(14, 6), ax=ax[1])
ax[1].title.set_size(0)  # hide the title that df.hist adds automatically
plt.subplots_adjust(hspace=0.1)  # adjust the gap between the two plots
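For anyone who wants to try the layout without the original data, here is a self-contained sketch of the same idea. The contents of column Q7 are randomly generated (an assumption), and the axes names ax_box and ax_hist are only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to view the figure with plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random stand-in for the asker's "Q7" column (assumption).
rng = np.random.default_rng(1)
df = pd.DataFrame({"Q7": rng.normal(loc=50, scale=10, size=500)})

# Two stacked axes sharing the x-axis: 30% of the height for the
# boxplot on top, 70% for the histogram below.
fig, (ax_box, ax_hist) = plt.subplots(
    2, figsize=(14, 6), sharex=True,
    gridspec_kw={"height_ratios": (0.3, 0.7)},
)
df.boxplot(column="Q7", vert=False, ax=ax_box)
df.hist(column="Q7", bins=20, ax=ax_hist)
ax_hist.set_title("")            # drop the title that df.hist adds
fig.subplots_adjust(hspace=0.1)  # small gap between the two plots
```

Because the axes are created in order, the first one is the top row, which is what puts the boxplot above the histogram.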

Seaborn: overlay scatterplot on top of boxplot

I want to overlay a simple scatterplot on top of my boxplot. When I plot the two plots separately, everything is fine:
But when I try to combine them, it almost looks like x-values of the scatterplot are all being divided by 2:
I'm pretty sure it's because the boxplot treats the x-axis values as categorical (even though they are floats in my DataFrame), while the scatterplot treats them as continuous. But I don't know what the solution is. Here is the code I am using:
sns.scatterplot(data=old_data, x="x-axis", y="y-axis", s=200, color="red", label="Old Data")
sns.boxplot(data=new_data, x="x-axis", y="y-axis", color="blue")
plt.plot([], [], label="New Data", color='blue') # this just adds the boxplot label to the legend
plt.legend()
Bonus question: if you know how to add the boxplot label to the legend in a better way, I would love to hear about it.
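One possible workaround, sketched under assumptions: instead of seaborn's boxplot, this uses matplotlib's own ax.boxplot with explicit positions, so the boxes sit at their numeric x values and line up with the scatter points. The data is synthetic, and the box width of 0.3 is an arbitrary choice. For the bonus question, a matplotlib.patches.Patch can serve as the legend handle instead of an empty plt.plot call. Recent seaborn versions (0.13 and later) also offer native_scale=True on sns.boxplot, which may achieve the same thing while staying in seaborn.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to view the figure with plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.patches import Patch

# Synthetic stand-ins for new_data and old_data (assumption).
rng = np.random.default_rng(2)
x_levels = [0.5, 1.0, 2.0, 4.0]
new_data = pd.DataFrame({
    "x-axis": np.repeat(x_levels, 30),
    "y-axis": rng.normal(size=30 * len(x_levels)),
})
old_data = pd.DataFrame({"x-axis": x_levels, "y-axis": [0.2, -0.1, 0.4, 0.0]})

fig, ax = plt.subplots()

# One sample per distinct x value, drawn at that numeric position.
positions = sorted(new_data["x-axis"].unique())
samples = [new_data.loc[new_data["x-axis"] == x, "y-axis"].to_numpy()
           for x in positions]
bp = ax.boxplot(samples, positions=positions, widths=0.3,
                patch_artist=True, manage_ticks=False,
                boxprops={"facecolor": "blue"})

# The scatter uses the same continuous x-axis, so the points line up.
sc = ax.scatter(old_data["x-axis"], old_data["y-axis"],
                s=200, color="red", label="Old Data", zorder=3)

# A Patch acts as the legend entry for the boxes.
box_handle = Patch(facecolor="blue", label="New Data")
legend = ax.legend(handles=[sc, box_handle])
```

With manage_ticks=False the x-axis keeps ordinary numeric ticks rather than one categorical tick per box.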

Shift matplotlib axes to match each other

I am looking to overlay a scatter plot with a boxplot in matplotlib. I have created the chart but the x axes do not match--leading to the scatter plot showing dots that are shifted 1 tick to the left of the x axis for the boxplot. Below is my code.
fig, ax = plt.subplots()
ax.scatter(traits_5, data_df[traits_5].loc[y])
ax = greats_df[traits_5].boxplot ( showfliers=False , column=traits_5)
plt.ylabel ( 'Percentile Score' )
plt.title ( "Distribution of The Greats' Scores" )
ax.yaxis.set_major_formatter(mtick.PercentFormatter(1))
plt.show ()
Is it possible that the error is coming from the two different methods of plotting the data? I use matplotlib to plot the scatter and pandas to plot the boxplot. Matplotlib was plotting the rows on the xaxis, whereas I wanted the columns to be plotted along the x axis.
See outputted image below from above code.
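Hard to investigate without having access to the data, but pandas draws the boxes at x positions 1 to N, so shifting the x coordinates of your scatter plot by one should work. If traits_5 holds column names (strings), x+1 is not defined for them, so pass numeric positions instead:
ax.scatter(range(1, len(traits_5) + 1), data_df[traits_5].loc[y])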
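To check the alignment without the original data, here is a small sketch. The trait names, the contents of greats_df and data_df, and the row label y are all made up for illustration. It relies on the boxes being drawn at positions 1 to N, which is matplotlib's default.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to view the figure with plt.show()
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import numpy as np
import pandas as pd

# Hypothetical trait names and random scores (assumptions).
rng = np.random.default_rng(3)
traits_5 = ["speed", "power", "vision", "stamina", "skill"]
greats_df = pd.DataFrame(rng.uniform(0, 1, size=(40, 5)), columns=traits_5)
data_df = pd.DataFrame(rng.uniform(0, 1, size=(3, 5)),
                       columns=traits_5, index=["p1", "p2", "p3"])
y = "p2"  # the row to overlay on the boxes

fig, ax = plt.subplots()
greats_df.boxplot(column=traits_5, showfliers=False, ax=ax)

# Boxes sit at x = 1..N, so the scatter uses the same positions.
positions = np.arange(1, len(traits_5) + 1)
sc = ax.scatter(positions, data_df.loc[y, traits_5].to_numpy(dtype=float),
                color="red", zorder=3)

ax.set_ylabel("Percentile Score")
ax.yaxis.set_major_formatter(mtick.PercentFormatter(1))
```

Passing ax=ax to the pandas call also makes it explicit that both plots share one axes, rather than relying on the current axes.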

How to set the number of ticks and labels of pyplot

Is there any way that I can divide an axis into a certain number of ticks and then label them? For example, I have the following plot and I want to have 4 ticks on the x axis and be able to set the labels myself.
and here's what I want to achieve (please note that the two plots are the same):
and this is the script I am using to create the plot:
import matplotlib.pyplot as plt
plt.imshow(data, cmap=plt.cm.jet)
plt.colorbar()
plt.show()
I can divide the axis using this: plt.locator_params(axis='x', nbins=4), but I could not set the labels myself.
As @ImportanceOfBeingErnest mentioned, using imshow's extent was the answer:
plt.imshow(data, extent=[0,1.5,3,0], cmap=plt.cm.jet)
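If fully custom labels are needed as well, one option is to combine extent with explicit tick positions and labels. This is only a sketch: the array data is a placeholder, and the label strings are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to view the figure with plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Placeholder image data (assumption).
data = np.arange(12).reshape(3, 4)

fig, ax = plt.subplots()
# extent=[left, right, bottom, top] maps the image onto data coordinates.
im = ax.imshow(data, extent=[0, 1.5, 3, 0], cmap="jet")
fig.colorbar(im, ax=ax)

# Four evenly spaced ticks across the x range, each with its own label.
ticks = np.linspace(0, 1.5, 4)
labels = ["start", "one third", "two thirds", "end"]
ax.set_xticks(ticks)
ax.set_xticklabels(labels)
```

Setting the tick positions first and the labels second keeps the two lists in one-to-one correspondence.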

Convert a boxplot to a scatter plot in seaborn

I currently have a box plot which plots cgpa vs totalScore for the subject 'Mathematics'. Here is the code for the box plot:
a = sns.boxplot(data=masterdata[masterdata.courseName == "Mathematics"], x = "totalScore", y="cgpa")
How can I transform the box plot I have to a scatter plot? I don't know how to select the specific subject of 'Mathematics' in the scatter plot. Scatter plots just have x-label, y-label and hue. But how do I plot for "Mathematics" only, like I did in the box plot I have? Thank you in advance for your help.
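According to the documentation at https://seaborn.pydata.org/generated/seaborn.regplot.html, regplot also has a data parameter, so you should be able to filter the same way you did before. Note that the column name must match the one used in the box plot (totalScore, not totalscore), and that regplot also draws a regression line unless you pass fit_reg=False:
b = sns.regplot(data=masterdata[masterdata.courseName == "Mathematics"], x="totalScore", y="cgpa")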
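If only the points are wanted, sns.scatterplot accepts the same filtered data frame. A minimal sketch follows; the contents of masterdata are invented here, and only the column names come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to view the figure with plt.show()
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Invented rows; only the column names are taken from the question.
masterdata = pd.DataFrame({
    "courseName": ["Mathematics", "Physics", "Mathematics",
                   "Mathematics", "History"],
    "totalScore": [78, 65, 90, 55, 70],
    "cgpa": [3.4, 3.0, 3.9, 2.8, 3.1],
})

# Filter first, then hand the subset to seaborn via the data parameter.
maths = masterdata[masterdata["courseName"] == "Mathematics"]

fig, ax = plt.subplots()
sns.scatterplot(data=maths, x="totalScore", y="cgpa", ax=ax)
```

The boolean mask is the same selection used for the box plot, so both plots show the same rows.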
