Seaborn: overlay scatterplot on top of boxplot - python

I want to overlay a simple scatterplot on top of my boxplot. When I plot the two plots separately, everything is fine:
But when I try to combine them, it almost looks like x-values of the scatterplot are all being divided by 2:
I'm pretty sure it's because the boxplot treats the x-axis values as categorical (even though they are floats in my DataFrame), while the scatterplot treats them as continuous. But I don't know what the solution is. Here is the code I am using:
sns.scatterplot(data=old_data, x="x-axis", y="y-axis", s=200, color="red", label="Old Data")
sns.boxplot(data=new_data, x="x-axis", y="y-axis", color="blue")
plt.plot([], [], label="New Data", color='blue') # this just adds the boxplot label to the legend
plt.legend()
Bonus question: if you know how to add the boxplot label to the legend in a better way, I would love to hear about it.
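One possible approach, sketched under assumptions: since the boxplot places its boxes at integer positions 0, 1, 2, ... in category order, the scatter x-values can be mapped onto those same positions before plotting. The small DataFrames below are hypothetical stand-ins for new_data and old_data, and the explicit order list is my own addition so that the positions are known. Recent seaborn releases (0.13 and later, as far as I know) also offer a native_scale option on categorical plots, which may be a simpler route if your version has it.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import pandas as pd
import seaborn as sns

# Hypothetical stand-ins for new_data and old_data from the question
new_data = pd.DataFrame({
    "x-axis": [0.5] * 4 + [1.0] * 4 + [2.0] * 4,
    "y-axis": [1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6],
})
old_data = pd.DataFrame({"x-axis": [0.5, 1.0, 2.0], "y-axis": [2.5, 3.5, 4.5]})

fig, ax = plt.subplots()

# Fix the category order explicitly so the box positions are known: 0, 1, 2, ...
order = sorted(new_data["x-axis"].unique())
sns.boxplot(data=new_data, x="x-axis", y="y-axis", order=order, color="blue", ax=ax)

# Map each scatter x-value to the position of the box with the same x-value
position = {value: i for i, value in enumerate(order)}
scatter_x = old_data["x-axis"].map(position)
points = ax.scatter(scatter_x, old_data["y-axis"], s=200, color="red",
                    zorder=3, label="Old Data")

# Bonus question: use a proxy Patch as the boxplot's legend entry
box_handle = Patch(facecolor="blue", label="New Data")
ax.legend(handles=[points, box_handle])
```

Any x-value in old_data that has no matching box maps to NaN and is simply not drawn, so this only works when both frames share the same x-values.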

Related

Misaligned labels in Seaborn ridge plot

I noticed that if I change the line g.figure.subplots_adjust(hspace=-0.25) in the Seaborn ridge plot example, the labels no longer align well. For example, if I change it to g.figure.subplots_adjust(hspace=-0.9), I get the result shown in the picture below.
Is there a way to match the labels when trying to overlap histograms more using g.figure.subplots_adjust(hspace=-0.9) ?
For the y-axis labels, the strings are created individually and the y-axis position is determined manually. So changing the y-axis coordinates in ax.text() will produce the intended result. I tried it manually and 0.045 seemed optimal. You can modify it to your liking.
# Define and use a simple function to label the plot in axes coordinates
def label(x, color, label):
    ax = plt.gca()
    ax.text(0, .045, label, fontweight="bold", color=color,
            ha="left", va="center", transform=ax.transAxes)

Seaborn lineplot without lines between points

How can I use the lineplot plotting function in seaborn to create a plot with no lines connecting the points? I know the function is called lineplot, but it has the useful feature of merging all datapoints with the same x value and plotting a single mean and confidence interval.
tips = sns.load_dataset('tips')
sns.lineplot(x='size', y='total_bill', data=tips, marker='o', err_style='bars')
How do I plot without the line? I'm not sure of a better way to phrase my question. How can I plot points only? Lineless lineplot?
I know that seaborn has a pointplot function, but that is for categorical data. In some cases, my x-values are continuous values, so pointplot would not work.
I realize one could get into the matplotlib figure artists and delete the line, but that gets more complicated as the amount of stuff on the plot increases. I was wondering if there are some sort of arguments that can be passed to the lineplot function.
To get error bars without the connecting lines, you can set the linestyle parameter to '':
import seaborn as sns
tips = sns.load_dataset('tips')
sns.lineplot(x='size', y='total_bill', data=tips, marker='o', linestyle='', err_style='bars')
Other types of linestyle could also be interesting, for example "a loosely dotted line": sns.lineplot(..., linestyle=(0, (1, 10)))
With pointplot, I recommend setting join=False.
For me only join=True works.
sns.pointplot(data=df, x="x_attribute", y="y_attribute", ci=95, join=False)
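For continuous x-values, another option is to do the aggregation yourself and draw the result with matplotlib's errorbar. This is only a sketch of that swapped-in technique, not what lineplot does internally: the data is synthetic, the column names are borrowed from the pointplot line above, and the interval is a normal-approximation 95% interval (1.96 times the standard error) rather than seaborn's bootstrap interval.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data: continuous x-values, each repeated many times
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x_attribute": np.repeat([0.5, 1.25, 2.0, 3.75], 30),
    "y_attribute": rng.normal(loc=10.0, scale=2.0, size=120),
})

# Merge all rows that share an x-value into a mean and a standard error
stats = df.groupby("x_attribute")["y_attribute"].agg(["mean", "sem"])
half_width = 1.96 * stats["sem"]  # normal-approximation 95% interval (assumption)

fig, ax = plt.subplots()
# fmt="o" draws markers only, so no line joins the means
container = ax.errorbar(stats.index.to_numpy(), stats["mean"].to_numpy(),
                        yerr=half_width.to_numpy(), fmt="o", capsize=3)
ax.set_xlabel("x_attribute")
ax.set_ylabel("y_attribute")
```

Because the x-values are passed as numbers, the points keep their true spacing on the axis instead of being treated as evenly spaced categories.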

Change each regression line's styling in a multiple regressions plot Python

I am currently trying to plot two regression lines for my data split by a categorical attribute (which is either freedom or happiness scores). My current qualm is that I need color to encode another separate categorical attribute in my graph (GNI/capita brackets). Having a mix of colors seemed confusing, so I decided to distinguish the data points using different markers instead. However, I am having trouble changing just one of the regression lines to a dashed line, as they are currently identical. I don't even want to think about how I am going to create a legend for all of this. If you think this is an ugly graph, I agree, but certain circumstances mandate that I have four attributes encoded in a single graph. By the way, I am open to any suggestions at all on a better way to do this, if there is one. An example of my current graph is below, and I would appreciate any help!
sns.lmplot(data=combined_indicators, x='x', y='y', hue='Indicator', palette=["#000620"], markers=['x', '.'], ci=None)
plt.axvspan(0,1025, alpha=0.5, color='#de425b', zorder=-1)
plt.axvspan(1025,4035, alpha=0.5, color='#fbb862', zorder=-1)
plt.axvspan(4035,12475, alpha=0.5, color ='#afd17c', zorder=-1)
plt.axvspan(12475,100000, alpha=0.5, color='#00876c', zorder=-1)
plt.title("HFI & Happiness Regressed on GNI/capita")
plt.xlabel("GNI/Capita by Purchasing Power Parity (2017 International $)")
plt.ylabel("Standard Indicator Score (0-10)")
My current figure rears its ugly head
To my knowledge, there is no easy way to change the style of the regression line in lmplot. But you can achieve your goal if you use regplot instead of lmplot, the drawback being that you have to implement the hue-splitting "by hand":
import matplotlib.pyplot as plt
import seaborn as sns

x_col = 'total_bill'
y_col = 'tip'
hue_col = 'smoker'
df = sns.load_dataset('tips')
markers = ['x', '.']
colors = ["#000620", "#000620"]
linestyles = ['-', '--']
fig, ax = plt.subplots()
# Draw one regplot per hue level so each regression line gets its own style
for (hue, gr), m, c, ls in zip(df.groupby(hue_col), markers, colors, linestyles):
    sns.regplot(data=gr, x=x_col, y=y_col, marker=m, color=c, line_kws={'ls': ls},
                ci=None, label=f'{hue_col}={hue}', ax=ax)
ax.legend()
Just wanted to add, if anyone stumbled upon this post later, you can create a legend for this mess manually using Line2D. Looks something like this for mine:
from matplotlib.patches import Patch
from matplotlib.lines import Line2D
legend_elements = [Line2D([0], [0], color='#000620', lw=2, label='Freedom', linestyle='--'),
                   Line2D([0], [0], color='#000620', lw=2, label='Happiness'),
                   Line2D([0], [0], marker='x', color='#000620', label='Freedom',
                          markerfacecolor='#000620', markersize=15),
                   Line2D([0], [0], marker='.', color='#000620', label='Happiness',
                          markerfacecolor='#000620', markersize=15),
                   Patch(facecolor='#de425b', label='Low-Income'),
                   Patch(facecolor='#fbb862', label='Lower Middle-Income'),
                   Patch(facecolor='#afd17c', label='Upper Middle-Income'),
                   Patch(facecolor='#00876c', label='High-Income')]
The end result looks like this:
Graph with custom legend
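For anyone unsure how such a list of proxy artists ends up on the figure: it is passed to legend through the handles argument. A minimal sketch, using a shortened handle list and a single shaded band; the legend placement is my own assumption, not taken from the post.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
from matplotlib.patches import Patch

fig, ax = plt.subplots()
ax.axvspan(0, 1025, alpha=0.5, color="#de425b", zorder=-1)

# Shortened, illustrative list of proxy artists
legend_elements = [
    Line2D([0], [0], color="#000620", lw=2, linestyle="--", label="Freedom"),
    Line2D([0], [0], color="#000620", lw=2, label="Happiness"),
    Patch(facecolor="#de425b", label="Low-Income"),
]

# handles= makes the legend show exactly these artists, in this order
legend = ax.legend(handles=legend_elements, loc="center left",
                   bbox_to_anchor=(1.0, 0.5))
```

The proxy artists are never drawn on the axes themselves; they only supply the line style, marker, or fill color shown next to each label.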

matplotlib hatched and filled histograms

I would like to make histograms that are both hatched and filled (like these bar plots on the left in this matplotlib example):
Here's the code I tried to use:
import matplotlib.pyplot as plt
plt.hist(values, bins, histtype='step', linewidth=2, facecolor='c', hatch='/')
But no matter whether I specify "facecolor" or "color", only the lines of the hatching appear in colour and the histogram is still unfilled. How can I make the hatching show up on top of a filled histogram?
In order to fill the area below the histogram the kwarg fill can be set to True. Then, the facecolor and edgecolor can be set in order to use different colors for the hatch and the background.
import numpy as np
plt.hist(np.random.normal(size=500), bins=10, histtype='step', linewidth=2, facecolor='c',
         hatch='/', edgecolor='k', fill=True)
This generates the following output:
histtype='step' draws step lines. They are by definition not filled (because they are lines).
Instead, use histtype='bar' (which is the default, so you may equally leave it out completely).
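A minimal sketch of the filled-and-hatched result, assuming placeholder random data in place of the asker's values and bins. The left panel uses the default histtype='bar'; the right panel uses histtype='stepfilled', which I include as a related option because it fills the outline without drawing the interior bar edges.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(size=500)  # placeholder data

fig, (ax_bar, ax_step) = plt.subplots(1, 2, figsize=(8, 3))

# Default histtype='bar': each bar is a filled rectangle, and the hatch
# is drawn on top of the fill in the edge color
counts, edges, bars = ax_bar.hist(values, bins=10, facecolor="c",
                                  edgecolor="k", hatch="/", linewidth=2)

# histtype='stepfilled': one filled polygon tracing the histogram outline
_, _, polygons = ax_step.hist(values, bins=10, histtype="stepfilled",
                              facecolor="c", edgecolor="k", hatch="/",
                              linewidth=2)
```

In both cases facecolor controls the fill and edgecolor controls the outline and the hatch lines.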

Use colormap range for single line plot

I am running this block of code in a loop:
fig=plt.figure(figsize=(15,10))
ax1=fig.add_subplot(111)
ax1.plot(item['time'][:-1],item[headerss].iloc[:-1],marker='o')
ax1.legend(headerss,loc='center left', bbox_to_anchor=(1.0, 0.5))
ax1.set_xlabel('time')
ax1.set_ylabel('concentration (ppb)')
title=item['date'][0]+' '+item['list'][0]
ax1.set_title(title)
fig.savefig(title,bbox_inches='tight')
Item is a dataframe. I have over 20 item['concentrations'] and I would like to have as many different colors without creating a loop on the ax1.plot line.
Can I use an existing set of colors like the Python colormaps?
Cheers
A single line drawn by matplotlib's plot has one color. If you don't want to loop over the points and plot them one by one, you can use a scatter plot.
ax1.scatter(item['time'][:-1],item[headerss].iloc[:-1],c=range(len(item[headerss].iloc[:-1])),marker='o', cmap="jet")
You can get a colormap, like plt.cm.afmhot, and use it in imshow, which expects a single 2D array. The matplotlib colormap reference lists the available colormaps.
fig=plt.figure(figsize=(15,10))
ax1=fig.add_subplot(111)
ax1.imshow(item[headerss].iloc[:-1].T, aspect='auto', interpolation='nearest', cmap=plt.cm.afmhot)  # rows are series, columns are time steps
ax1.legend(headerss,loc='center left', bbox_to_anchor=(1.0, 0.5))
ax1.set_xlabel('time')
ax1.set_ylabel('concentration (ppb)')
title=item['date'][0]+' '+item['list'][0]
ax1.set_title(title)
fig.savefig(title,bbox_inches='tight')
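Another option that keeps the single ax1.plot call is to sample one color per column from a colormap and install the result as the axes' color cycle. This is a sketch under assumptions: the DataFrame below is a hypothetical stand-in for item with 20 made-up column names, and viridis is an arbitrary colormap choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for `item`: a time column plus 20 concentration columns
rng = np.random.default_rng(0)
headerss = [f"species_{i}" for i in range(20)]
item = pd.DataFrame(rng.random((10, 20)), columns=headerss)
item["time"] = np.arange(10)

fig, ax1 = plt.subplots(figsize=(15, 10))

# Sample one RGBA color per column and use the list as the color cycle
colors = [plt.cm.viridis(v) for v in np.linspace(0, 1, len(headerss))]
ax1.set_prop_cycle(color=colors)

# One plot call draws one line per column; each line takes the next cycle color
lines = ax1.plot(item["time"].to_numpy()[:-1],
                 item[headerss].to_numpy()[:-1], marker="o")
ax1.legend(headerss, loc="center left", bbox_to_anchor=(1.0, 0.5))
ax1.set_xlabel("time")
ax1.set_ylabel("concentration (ppb)")
```

The color cycle has to be set before calling plot, since lines pick up their color at creation time.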
