Show confidence interval in legend of plot in Python / Seaborn

I am generating some scatter plots with a linear regression line and confidence interval in Python, using seaborn's sns.regplot function. I found a way to show the regression line in the legend, but I would also like to add the confidence interval to the legend (with the transparent blue band as the reference colour).
Here is the code I have and the result I get so far.
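import matplotlib.pyplot as plt
import seaborn as sns

# Scatter_Plot is my DataFrame; the label in line_kws puts the fit line in the legend
Tobin_Nationality_Reg = sns.regplot(x="Nationality_Index_Normalized",
                                    y="Tobins_Q_2017",
                                    data=Scatter_Plot,
                                    line_kws={'label': 'Regression line'})
plt.xlabel("Nationality Index")
plt.ylabel("Tobin's Q")
plt.legend()
plt.savefig('Tobin_Nationality_Reg.png')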
Here is the output I currently get:
[Image: scatter plot]
Does anybody have an idea how I could do that? Thanks in advance.

I believe there is no clean way to do this, because seaborn does not expose keyword arguments for the fill_between call that plots the confidence interval.
However, it can be done by modifying the label attribute of the PolyCollection directly:
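import numpy as np
import seaborn as sns
x, y = np.random.rand(2, 20)
ax = sns.regplot(x=x, y=y, line_kws={'label': 'Regression line'})  # recent seaborn requires keyword x/y
# collections[0] is the scatter points; collections[1] is the shaded confidence band
ax.collections[1].set_label('Confidence interval')
ax.legend()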
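A slightly more defensive variant of the same idea, as a sketch: instead of assuming the band sits at index 1 of ax.collections, it picks the first PolyCollection on the axes, which is the artist type fill_between returns. This assumes a single regplot on the axes with the default confidence interval; the random data is only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from matplotlib.collections import PolyCollection

# Illustrative data only
rng = np.random.default_rng(0)
x = rng.random(20)
y = 2 * x + rng.normal(scale=0.3, size=20)

fig, ax = plt.subplots()
sns.regplot(x=x, y=y, ax=ax, line_kws={"label": "Regression line"})

# The scatter points are a PathCollection; the confidence band drawn by
# fill_between is a PolyCollection, so select it by type rather than by index
bands = [c for c in ax.collections if isinstance(c, PolyCollection)]
bands[0].set_label("Confidence interval")
legend = ax.legend()
```

The scatter points stay out of the legend because regplot leaves them unlabeled here.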

Related

Seaborn lineplot without lines between points

How can I use the lineplot function in seaborn to create a plot with no lines connecting the points? I know the function is called lineplot, but it has the useful feature of aggregating all data points with the same x value and plotting a single mean and confidence interval.
import seaborn as sns
tips = sns.load_dataset('tips')
sns.lineplot(x='size', y='total_bill', data=tips, marker='o', err_style='bars')
How do I plot without the line? I'm not sure of a better way to phrase my question. How can I plot points only? Lineless lineplot?
I know that seaborn has a pointplot function, but that is for categorical data. In some cases, my x-values are continuous values, so pointplot would not work.
I realize one could dig into the matplotlib figure artists and delete the line, but that gets more complicated as the number of elements on the plot increases. I was wondering whether there is an argument that can be passed to the lineplot function.
To get error bars without the connecting lines, you can set the linestyle parameter to '':
import seaborn as sns
tips = sns.load_dataset('tips')
sns.lineplot(x='size', y='total_bill', data=tips, marker='o', linestyle='', err_style='bars')
Other types of linestyle could also be interesting, for example "a loosely dotted line": sns.lineplot(..., linestyle=(0, (1, 10)))
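If seaborn is not strictly required, a similar lineless plot can be sketched with pandas and matplotlib alone: aggregate y by x, then draw markers with Axes.errorbar using fmt="o", which draws no connecting line. Note that this swaps seaborn's bootstrapped confidence interval for a normal-approximation interval (1.96 times the standard error), and the data frame below is hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: several y observations per (continuous) x value
rng = np.random.default_rng(1)
df = pd.DataFrame({"x": np.repeat([1.0, 2.5, 4.0, 7.5], 10)})
df["y"] = 3 * df["x"] + rng.normal(scale=2.0, size=len(df))

# Aggregate by x: mean and standard error of the mean
stats = df.groupby("x")["y"].agg(["mean", "sem"])
half_width = 1.96 * stats["sem"]  # approximate 95% interval

fig, ax = plt.subplots()
# fmt="o" draws markers only, so no line connects the points
container = ax.errorbar(stats.index, stats["mean"], yerr=half_width,
                        fmt="o", capsize=4)
```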
I recommend setting join=False.
For me, only join=False works:
# In newer seaborn, ci=95 is replaced by errorbar=("ci", 95) and join=False by linestyle="none"
sns.pointplot(data=df, x="x_attribute", y="y_attribute", ci=95, join=False)

Seaborn distplot() won't display frequency in the y-axis

I am trying to display the weighted frequency on the y-axis of a seaborn.distplot() graph, but it keeps displaying the density (which is the default in distplot()).
I read the documentation and many similar questions here on Stack Overflow.
The common answer is to set norm_hist=False and to pass the weights as a NumPy array, as in a standard histogram. However, it keeps showing the density rather than the probability/frequency of each bin.
My code is
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

plt.figure(figsize=(10, 4))
plt.xlim(-0.145, 0.145)
plt.axvline(0, color='grey')
data = df['col1']
# Normal sample with the same mean and standard deviation as the data
x = np.random.normal(data.mean(), scale=data.std(), size=100000)
normal_dist = sns.distplot(x, hist=False, color="red", label="Gaussian")
data_viz = sns.distplot(data, color="blue", bins=31, label="data", norm_hist=False)
# I also tried adding the weights inside the argument:
# hist_kws={'weights': np.ones(len(data)) / len(data)}
plt.legend(bbox_to_anchor=(1, 1), loc=1)
And I keep receiving this output:
Does anyone have an idea of what could be the problem here?
Thanks!
[EDIT]: The problem is that the y-axis is showing the KDE values and not those from the weighted histogram. If I set kde=False, I can display the frequency on the y-axis. However, I still want to keep the KDE, so I am not considering that option.
Keeping the KDE and the frequency/count on one y-axis in a single plot will not work, because they have different scales. It may be better to create a plot with two y-axes, one showing the KDE and the other the histogram.
From the documentation of norm_hist: "If True, the histogram height shows a density rather than a count. **This is implied if a KDE or fitted density is plotted.**"
User versusnja posted a workaround in https://github.com/mwaskom/seaborn/issues/479:
import seaborn as sns
# Plot hist without kde.
# Create another Y axis.
# Plot kde without hist on the second Y axis.
# Remove Y ticks from the second axis.
first_ax = sns.distplot(data, kde=False)  # data is the Series from the question
second_ax = first_ax.twinx()
sns.distplot(data, ax=second_ax, kde=True, hist=False)
second_ax.set_yticks([])
If you only need this for visualization, it should be good enough.

Python implementation of non-uniform (non-linear) x-axis in matplotlib

I am trying to create a non-linear x-axis in Python using matplotlib and haven't found any functions or workarounds for this problem.
This is how our graph currently looks, and I want to convert it to something like this (note the difference between the x-axes of the two graphs).
The code I have as of now is:
import matplotlib.pyplot as plt

plt.axis([0, 100, 0, 1])
plt.plot(onecsma_x, onecsma_y, label='1-CSMA')
plt.plot(slotted_aloha_x, slotted_aloha_y, label='Slotted Aloha')
plt.plot(pure_aloha_x, pure_aloha_y, label='Pure Aloha')
plt.plot(npcsma_x, npcsma_y, label='Non-persistent CSMA')
plt.plot(pcsma_x, pcsma_y, label='P-persistent CSMA')
plt.legend(loc='upper right')
plt.show()
For a logarithmic x-axis, use semilogx instead of plot.
You can also limit the x-axis after calling semilogx (but before show). On a log scale the lower limit must be positive, so use something like:
plt.xlim(1, 10**2)
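A sketch of the same idea through the object-oriented API, where ax.set_xscale("log") has the same effect as semilogx. The curves use the textbook pure and slotted ALOHA throughput formulas, S = G·exp(-2G) and S = G·exp(-G), purely as stand-ins for the poster's data arrays.

```python
import matplotlib.pyplot as plt
import numpy as np

# Offered load G spaced evenly on a log scale (illustrative stand-in data)
G = np.logspace(-2, 2, 200)

fig, ax = plt.subplots()
ax.plot(G, G * np.exp(-2 * G), label="Pure Aloha")
ax.plot(G, G * np.exp(-G), label="Slotted Aloha")
ax.set_xscale("log")     # equivalent to plotting with semilogx
ax.set_xlim(1e-2, 1e2)   # both limits must be positive on a log axis
ax.set_ylim(0, 1)
ax.legend(loc="upper right")
```

If the axis must include 0, ax.set_xscale("symlog") is one option, since it is linear in a small region around zero.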

Plotting a legend with matplotlib: error

I am trying to add a legend to my graph in matplotlib. Instead of creating a proper legend, it puts the full list of all my labels into the legend.
My graph looks like this:
The legend is cut off and I can't see more than that, I assume due to its size.
This is my code:
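import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

features2 = ["Number of Sides"]
features3 = ["Largest Angle"]
header2 = ["Label"]
# DataFrame.from_csv was removed from pandas; read_csv with index_col=0 mirrors its default index
data_df = pd.read_csv("AllMixedShapes2.csv", index_col=0)
X1 = np.array(data_df[features2].values)
y1 = np.array(data_df[features3].values)
l = np.array(data_df[header2].values)
# Note: y is not defined in this snippet; label=l passes the whole label array as one label
plt.scatter(X1[:, 0], y1, c=y, cmap=plt.cm.Paired, label=l)
plt.axis([0, 17, 0, 200])
plt.ylabel("Maximum Angle (Degrees)")
plt.xlabel("Number Of Sides")
plt.title('Original 450 Test Shapes')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.show()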
And AllMixedShapes2.csv looks like this:
I'm quite new to Python and machine learning, and I've tried other examples, but I can't get anything to work.
Matplotlib's label argument is meant to be a single string that labels the entire dataset, rather than an array of individual labels for the points within the dataset. If you wish to pass an array of point-by-point labels that will be aggregated into a legend, the best option is probably the Seaborn library. Seaborn provides a wrapper around matplotlib for more convenient statistical visualization.
This should do approximately what you wish to do with your data:
import seaborn
# Recent seaborn versions require x and y as keyword arguments
seaborn.lmplot(data=data_df, x='Number of Sides', y='Largest Angle',
               hue='Label', fit_reg=False)
I'd suggest checking out the seaborn example gallery for more ideas.
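Without seaborn, the usual matplotlib pattern is to call scatter once per class, giving each call a single string label, so the legend gets one entry per class. A sketch with a few hypothetical rows in place of AllMixedShapes2.csv:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows mimicking the structure of AllMixedShapes2.csv
data_df = pd.DataFrame({
    "Number of Sides": [3, 3, 4, 4, 5, 6],
    "Largest Angle": [60, 90, 90, 120, 108, 120],
    "Label": ["triangle", "triangle", "quadrilateral",
              "quadrilateral", "pentagon", "hexagon"],
})

fig, ax = plt.subplots()
# One scatter call per class, each with one string label -> one legend entry per class
for label, group in data_df.groupby("Label"):
    ax.scatter(group["Number of Sides"], group["Largest Angle"], label=label)
ax.set_xlabel("Number Of Sides")
ax.set_ylabel("Maximum Angle (Degrees)")
legend = ax.legend(bbox_to_anchor=(1.05, 1), loc="upper left", borderaxespad=0.0)
```

groupby sorts the keys, so the legend entries appear in alphabetical order.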

Missing plots after for loop

I have a dataset that I want to plot, and I also want to do a linear regression on the data over some intervals, plotting it in the same graph.
But I have some problems with this. The main graph is plotted first; the intervals and the linear regressions are plotted in the for loop:
import matplotlib.pyplot as plt

plt.plot(Trec, lnp, 'r-')
for i in range(len(Werte)):
    # subset() is my own helper; element [1] of its result is used as x
    x_sub = subset(Time, Trec, Data[i][5], Data[i][6])[1]
    plt.plot(x_sub, x_sub * Data[i][2] + Data[i][4])
    plt.axvline(x=Data[i][5])
plt.show()
With this code, only the last iteration of the for loop is plotted. Individually, each command does what I intend. What am I doing wrong?
What you want is to superimpose several curves on the same plot. For that purpose, you can use the Axes object returned by plt.subplots():
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(...)  # plot your data here
ax.plot(...)  # plot your interval and regression here
plt.show()
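A fuller sketch of that answer, with hypothetical data and intervals standing in for Trec, lnp and Data, and np.polyfit used here in place of the poster's precomputed slopes and intercepts. Everything is drawn on one Axes, and plt.show() is called once, after the loop.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data standing in for Trec / lnp
t = np.linspace(0, 10, 200)
signal = np.log1p(t) + 0.05 * np.sin(5 * t)

# Hypothetical (start, end) intervals to fit
intervals = [(1.0, 3.0), (4.0, 6.0), (7.0, 9.5)]

fig, ax = plt.subplots()
ax.plot(t, signal, "r-")  # main curve, drawn once
for start, end in intervals:
    mask = (t >= start) & (t <= end)
    slope, intercept = np.polyfit(t[mask], signal[mask], 1)  # least-squares line
    ax.plot(t[mask], slope * t[mask] + intercept)  # fitted line over this interval
    ax.axvline(x=start, color="grey", linestyle="--")  # mark the interval start
plt.show()  # called once, after all curves are on the same Axes
```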
