seaborn distplot different bar width on each figure - python

Sorry for giving an image however I think it is the best way to show my problem.
As you can see all of the bin width are different, from my understanding it shows range of rent_hours. I am not sure why different figure have different bin width even though I didn't set any.
My code looks is as follows:
figure, axes = plt.subplots(nrows=4, ncols=3)
figure.set_size_inches(18,14)
plt.subplots_adjust(hspace=0.5)
for ax, age_g in zip(axes.ravel(), age_cat):
group = total_usage_df.loc[(total_usage_df.age_group == age_g) & (total_usage_df.day_of_week <= 4)]
sns.distplot(group.rent_hour, ax=ax, kde=False)
ax.set(title=age_g)
ax.set_xlim([0, 24])
figure.suptitle("Weekday usage pattern", size=25);
additional question:
Seaborn : How to get the count in y axis for distplot using PairGrid for here it says that kde=False makes y-axis count however http://seaborn.pydata.org/generated/seaborn.distplot.html in the doc, it uses kde=False and still seems to show something else. How can I set y-axis to show count?
I've tried
sns.distplot(group.rent_hour, ax=ax, norm_hist=True) and it still seems to give something else rather than count.
sns.distplot(group.rent_hour, ax=ax, kde=False) gives me count however I don't know why it is giving me count.

Answer 1:
From the documentation:
norm_hist : bool, optional
If True, the histogram height shows a density rather than a count.
This is implied if a KDE or fitted density is plotted.
So you need to take into account your bin width as well, i.e. compute the area under the curve and not just the sum of the bin heights.
Answer 2:
# Plotting hist without kde
ax = sns.distplot(your_data, kde=False)
# Creating another Y axis
second_ax = ax.twinx()
#Plotting kde without hist on the second Y axis
sns.distplot(your_data, ax=second_ax, kde=True, hist=False)
#Removing Y ticks from the second axis
second_ax.set_yticks([])

Related

Set log xticks in matplotlib for a linear plot

Consider
xdata=np.random.normal(5e5,2e5,int(1e4))
plt.hist(np.log10(xdata), bins=100)
plt.show()
plt.semilogy(xdata)
plt.show()
is there any way to display xticks of the first plot (plt.hist) as in the second plot's yticks? For good reasons I want to histogram the np.log10(xdata) of xdata but I'd like to set minor ticks to display as usual in a log scale (even considering that the exponent is linear...)
In other words, I want the x_axis of this plot:
to be like the y_axis
of the 2nd plot, without changing the spacing between major ticks (e.g., adding log marks between 5.5 and 6.0, without altering these values)
Proper histogram plot with logarithmic x-axis:
Explanation:
Cut off negative values
The randomly generated example data likely contains still some negative values
activate the commented code lines at the beginning to see the effect
logarithmic function isn't defined for values <= 0
while the 2nd plot just deals with y-axis log scaling (negative values are just out of range), the 1st plot doesn't work with negative values in the BINs range
probably real world working data won't be <= 0, otherwise keep that in mind
BINs should be aligned to log scale as well
otherwise the 'BINs widths' distribution looks off
switch # on the plt.hist( statements in the 1st plot section to see the effect)
xdata (not np.log10(xdata)) to be plotted in the histogram
that 'workaround' with plotting np.log10(xdata) probably was the root cause for the misunderstanding in the comments
Code:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42) # just to have repeatable results for the answer
xdata=np.random.normal(5e5,2e5,int(1e4))
# MIN_xdata, MAX_xdata = np.min(xdata), np.max(xdata)
# print(f"{MIN_xdata}, {MAX_xdata}") # note the negative values
# cut off potential negative values (log function isn't defined for <= 0 )
xdata = np.ma.masked_less_equal(xdata, 0)
MIN_xdata, MAX_xdata = np.min(xdata), np.max(xdata)
# print(f"{MIN_xdata}, {MAX_xdata}")
# align the bins to fit a log scale
bins = 100
bins_log_aligned = np.logspace(np.log10(MIN_xdata), np.log10(MAX_xdata), bins)
# 1st plot
plt.hist(xdata, bins = bins_log_aligned) # note: xdata (not np.log10(xdata) )
# plt.hist(xdata, bins = 100)
plt.xscale('log')
plt.show()
# 2nd plot
plt.semilogy(xdata)
plt.show()
Just kept for now for clarification purpose. Will be deleted when the question is revised.
Disclaimer:
As Lucas M. Uriarte already mentioned that isn't an expected way of changing axis ticks.
x axis ticks and labels don't represent the plotted data
You should at least always provide that information along with such a plot.
The plot
From seeing the result I kinda understand where that special plot idea is coming from - still there should be a preferred way (e.g. conversion of the data in advance) to do such a plot instead of 'faking' the axis.
Explanation how that special axis transfer plot is done:
original x-axis is hidden
a twiny axis is added
note that its y-axis is hidden by default, so that doesn't need handling
twiny x-axis is set to log and the 2nd plot y-axis limits are transferred
subplots used to directly transfer the 2nd plot y-axis limits
use variables if you need to stick with your two plots
twiny x-axis is moved from top (twiny default position) to bottom (where the original x-axis was)
Code:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42) # just to have repeatable results for the answer
xdata=np.random.normal(5e5,2e5,int(1e4))
plt.figure()
fig, axs = plt.subplots(2, figsize=(7,10), facecolor=(1, 1, 1))
# 1st plot
axs[0].hist(np.log10(xdata), bins=100) # plot the data on the normal x axis
axs[0].axes.xaxis.set_visible(False) # hide the normal x axis
# 2nd plot
axs[1].semilogy(xdata)
# 1st plot - twin axis
axs0_y_twin = axs[0].twiny() # set a twiny axis, note twiny y axis is hidden by default
axs0_y_twin.set(xscale="log")
# transfer the limits from the 2nd plot y axis to the twin axis
axs0_y_twin.set_xlim(axs[1].get_ylim()[0],
axs[1].get_ylim()[1])
# move the twin x axis from top to bottom
axs0_y_twin.tick_params(axis="x", which="both", bottom=True, top=False,
labelbottom=True, labeltop=False)
# Disclaimer
disclaimer_text = "Disclaimer: x axis ticks and labels don't represent the plotted data"
axs[0].text(0.5,-0.09, disclaimer_text, size=12, ha="center", color="red",
transform=axs[0].transAxes)
plt.tight_layout()
plt.subplots_adjust(hspace=0.2)
plt.show()

Seaborn distplot() won't display frequency in the y-axis

I am trying to display the weighted frequency in the y-axis of a seaborn.distplot() graph, but it keeps displaying the density (which is the default in distplot())
I read the documentation and also many similar questions here in Stack.
The common answer is to set norm_hist=False and also to assign the weights in a bumpy array as in a standard histogram. However, it keeps showing the density and not the probability/frequency of each bin.
My code is
plt.figure(figsize=(10, 4))
plt.xlim(-0.145,0.145)
plt.axvline(0, color='grey')
data = df['col1']
x = np.random.normal(data.mean(), scale=data.std(), size=(100000))
normal_dist =sns.distplot(x, hist=False,color="red",label="Gaussian")
data_viz = sns.distplot(data,color="blue", bins=31,label="data", norm_hist=False)
# I also tried adding the weights inside the argument
#hist_kws={'weights': np.ones(len(data))/len(data)})
plt.legend(bbox_to_anchor=(1, 1), loc=1)
And I keep receiving this output:
Does anyone have an idea of what could be the problem here?
Thanks!
[EDIT]: The problem is that the y-axis is showing the kdevalues and not those from the weighted histogram. If I set kde=False then I can display the frequency in the y-axis. However, I still want to keep the kde, so I am not considering that option.
Keeping the kde and the frequency/count in one y-axis in one plot will not work because they have different scales. So it might be better to create a plot with 2 axis with each showing the kde and histogram separately.
From documentation norm_hist If True, the histogram height shows a density rather than a count. **This is implied if a KDE or fitted density is plotted**.
versusnja in https://github.com/mwaskom/seaborn/issues/479 has a workaround:
# Plot hist without kde.
# Create another Y axis.
# Plot kde without hist on the second Y axis.
# Remove Y ticks from the second axis.
first_ax = sns.distplot(data, kde=False)
second_ax = ax.twinx()
sns.distplot(data, ax=second_ax, kde=True, hist=False)
second_ax.set_yticks([])
If you need this just for visualization it should be good enough.

Adding histogram and 1 std band to residual plot

I have plotted a graph in python with a subplot of residuals and am trying to find a way to at a histogram plot of the residuals on the end of the histogram plot. I would also like to add a grey band on the residual plot showing 1 standard deviation.
also is there a way to remove the top and right-hand side boarders of the plot.
Here is a copy of the code and the graph I currently have.
fig1 = pyplot.figure(figsize =(9.6,7.2))
plt.frame1 =fig1.add_axes((0.2,0.4,.75,.6))
pyplot.errorbar(xval, yval*1000, yerr=yerr*1000, xerr=xerr, marker='x', linestyle='None')
# Axis labels
pyplot.xlabel('Height (m)', fontsize = 12)
pyplot.ylabel('dM/dt (g $s^{-1}$)', fontsize = 12)
# Generate best fit line using model function and best fit parameters, and add to plot
fit_line=model_funct(xval, [a_soln, b_soln])
pyplot.plot(xval, fit_line*1000)
# Set suitable axis limits: you will probably need to change these...
#pyplot.xlim(-1, 61)
#pyplot.ylim(65, 105)
# pyplot.show()
plt.frame2 = fig1.add_axes((0.2,0.2,.75,.2)) #start frame1 at 0.2, 0.4
plt.xlabel("Height of Water (m)", fontsize = 12)
plt.ylabel("Normalised\nResiduals", fontsize = 12) #\n is used to start a new line
plt.plot(h,normalised_residuals,"x", color = "green")
plt.axhline(0, linewidth=1, linestyle="--", color="black")
plt.savefig("Final Graph.png", dpi = 500)
The naming in your code is a bit weird, therefore I only post snippets since it is hard to try it by myself. Sometimes you use pyplot and sometimes you use plt which should be the same. Also you should name your axis like this ax = fig1.add_axes((0.2,0.4,.75,.6)). Then, if you do the plot, you should call it with the axis directly, i.e. use ax.errorbar().
To hide the borders of the axis in the top plot use:
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.yaxis.set_ticks_position('left')
ax.xaxis.set_ticks_position('bottom')
Adding an error band in the bottom plot is pretty easy to do. Just calculate the mean and standard deviation using np.mean() and np.std(). Afterwards, call
plt.fill_between(h, y1=np.mean(normalised_residuals) - np.std(normalised_residuals),
y2=np.mean(normalised_residuals) + np.std(normalised_residuals),
color='gray', alpha=.5)
and change the color and alpha however you want it to be.
For the histogram projection you just add another axis like you've done it two times before (let's assume it is called ax) and call
ax.hist(normalised_residuals, bins=8, orientation="horizontal")
Here, bins has to be set to a small value probably since you don't have that many data points.

Seaborn (violinplot) too many y-axis values

I have a couple of subplots and the data are probabilities, so should (and do) range between 0 and 1. When I plot them with a violinplot the y-axis of ax[0] extends above 1 (see pic). I know this is just because of the distribution kernel that the violinplot makes, but still it looks bad and I want the y-axes of these 2 plots to be the same. I have tried set_ylim on the left plot, but then I can't get the values (or look) to be the same as the plot on the right. Any ideas?
When creating your subplots, set the sharey parameter to True so that both plots share the same limits for the vertical axis.
[EDIT]
Since you have already tried setting sharey to True, I suggest getting the lower and upper limits ymin and ymax from the left hand side figure and passing them as arguments in set_ylim() for the right hand side figure.
1) Create your subplots:
fig, ax1 = plt.subplots(1,2, figsize = (5, 5), dpi=100)
2) Create left hand side figure here: ax[0].plot(...)
3) Get the axes limits using the get_ylim() method as detailed here: ymin, ymax = ax[0].get_ylim()
4) Create right hand side figure: ax[1].plot(...)
5) Set the axes limits of this new figure: ax[1].set_ylim(bottom=ymin, top=ymax)
I don't have subplots, but I do have probabilities, and this visual extension beyond 1.0 was frustrating to me.
If you add 'cut=0' to the sns.violinplot() call, it will truncate the kernel at the range of your data exactly.
I found the answer here:
How to better fit seaborn violinplots?

Change distance between boxplots in the same figure in python [duplicate]

I'm drawing the bloxplot shown below using python and matplotlib. Is there any way I can reduce the distance between the two boxplots on the X axis?
This is the code that I'm using to get the figure above:
import matplotlib.pyplot as plt
from matplotlib import rcParams
rcParams['ytick.direction'] = 'out'
rcParams['xtick.direction'] = 'out'
fig = plt.figure()
xlabels = ["CG", "EG"]
ax = fig.add_subplot(111)
ax.boxplot([values_cg, values_eg])
ax.set_xticks(np.arange(len(xlabels))+1)
ax.set_xticklabels(xlabels, rotation=45, ha='right')
fig.subplots_adjust(bottom=0.3)
ylabels = yticks = np.linspace(0, 20, 5)
ax.set_yticks(yticks)
ax.set_yticklabels(ylabels)
ax.tick_params(axis='x', pad=10)
ax.tick_params(axis='y', pad=10)
plt.savefig(os.path.join(output_dir, "output.pdf"))
And this is an example closer to what I'd like to get visually (although I wouldn't mind if the boxplots were even a bit closer to each other):
You can either change the aspect ratio of plot or use the widths kwarg (doc) as such:
ax.boxplot([values_cg, values_eg], widths=1)
to make the boxes wider.
Try changing the aspect ratio using
ax.set_aspect(1.5) # or some other float
The larger then number, the narrower (and taller) the plot should be:
a circle will be stretched such that the height is num times the width. aspect=1 is the same as aspect=’equal’.
http://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes.set_aspect
When your code writes:
ax.set_xticks(np.arange(len(xlabels))+1)
You're putting the first box plot on 0 and the second one on 1 (event though you change the tick labels afterwards), just like in the second, "wanted" example you gave they are set on 1,2,3.
So i think an alternative solution would be to play with the xticks position and the xlim of the plot.
for example using
ax.set_xlim(-1.5,2.5)
would place them closer.
positions : array-like, optional
Sets the positions of the boxes. The ticks and limits are automatically set to match the positions. Defaults to range(1, N+1) where N is the number of boxes to be drawn.
https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.boxplot.html
This should do the job!
As #Stevie mentioned, you can use the positions kwarg (doc) to manually set the x-coordinates of the boxes:
ax.boxplot([values_cg, values_eg], positions=[1, 1.3])

Categories

Resources