I am currently trying to plot two regression lines for my data, split by a categorical attribute (either the freedom or the happiness score). My problem is that I need color to encode another, separate categorical attribute in my graph (GNI/capita brackets). A mix of colors seemed confusing, so I decided to distinguish the data points using different markers instead. However, I am having trouble changing just one of the regression lines to a dashed line, since both are drawn identically. I don't even want to think about how I am going to create a legend for all of this. If you think this is an ugly graph, I agree, but circumstances require me to encode four attributes in a single graph. By the way, I'm open to any suggestions on a better way to do this, if there is one. My current code and graph are below; I'd appreciate any help!
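import matplotlib.pyplot as plt
import seaborn as sns
# one (identical) color per hue level, so only the markers differ
sns.lmplot(data=combined_indicators, x='x', y='y', hue='Indicator', palette=["#000620", "#000620"], markers=['x', '.'], ci=None)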
plt.axvspan(0,1025, alpha=0.5, color='#de425b', zorder=-1)
plt.axvspan(1025,4035, alpha=0.5, color='#fbb862', zorder=-1)
plt.axvspan(4035,12475, alpha=0.5, color ='#afd17c', zorder=-1)
plt.axvspan(12475,100000, alpha=0.5, color='#00876c', zorder=-1)
plt.title("HFI & Happiness Regressed on GNI/capita")
plt.xlabel("GNI/Capita by Purchasing Power Parity (2017 International $)")
plt.ylabel("Standard Indicator Score (0-10)")
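Since the four axvspan calls above differ only in their edges and colors, they could also be generated from one list. A minimal sketch, reusing the bracket edges and colors from the snippet (the names edges and colors are my own); keeping them in lists also makes it easy to reuse the colors for a legend later:

```python
import matplotlib.pyplot as plt

# Bracket edges and colors copied from the axvspan calls above
edges = [0, 1025, 4035, 12475, 100000]
colors = ['#de425b', '#fbb862', '#afd17c', '#00876c']

fig, ax = plt.subplots()
# zip(edges[:-1], edges[1:]) yields consecutive (left, right) bracket bounds
for left, right, color in zip(edges[:-1], edges[1:], colors):
    ax.axvspan(left, right, alpha=0.5, color=color, zorder=-1)
```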
[Figure: my current figure rears its ugly head]
To my knowledge, there is no easy way to give each hue level its own regression line style in lmplot (its line_kws is applied to every line). But you can achieve your goal if you use regplot instead of lmplot, the drawback being that you have to implement the hue splitting "by hand":
import matplotlib.pyplot as plt
import seaborn as sns
x_col = 'total_bill'
y_col = 'tip'
hue_col = 'smoker'
df = sns.load_dataset('tips')
markers = ['x', '.']
colors = ["#000620", "#000620"]
linestyles = ['-', '--']
fig, ax = plt.subplots()
# regplot draws one group at a time, so each group gets its own marker and line style
for (hue, gr), m, c, ls in zip(df.groupby(hue_col, observed=True), markers, colors, linestyles):
    sns.regplot(data=gr, x=x_col, y=y_col, marker=m, color=c, line_kws={'ls': ls}, ci=None, label=f'{hue_col}={hue}', ax=ax)
ax.legend()
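If you would rather not depend on regplot at all, the same idea can be sketched in plain matplotlib: one scatter call and one fitted line per group, each with its own marker and line style. This swaps seaborn's fit for numpy.polyfit (an ordinary least-squares straight line, without a confidence band), and the DataFrame below is hypothetical data standing in for combined_indicators:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for combined_indicators: two indicators sharing the same x values
rng = np.random.default_rng(0)
x = rng.uniform(500, 60000, 40)
df = pd.DataFrame({
    'x': np.concatenate([x, x]),
    'y': np.concatenate([5 + x / 20000, 4 + x / 15000]) + rng.normal(0, 0.3, 80),
    'Indicator': ['Freedom'] * 40 + ['Happiness'] * 40,
})

# (marker, line style) per indicator
styles = {'Freedom': ('x', '--'), 'Happiness': ('.', '-')}
fig, ax = plt.subplots()
for name, group in df.groupby('Indicator'):
    marker, ls = styles[name]
    ax.scatter(group['x'], group['y'], marker=marker, color='#000620', label=name)
    # Degree-1 least-squares fit, i.e. a straight regression line
    slope, intercept = np.polyfit(group['x'], group['y'], deg=1)
    xs = np.linspace(group['x'].min(), group['x'].max(), 100)
    ax.plot(xs, slope * xs + intercept, color='#000620', linestyle=ls)
ax.legend()
```

Only the scatter calls carry a label, so the legend shows one marker entry per indicator; the unlabeled fit lines are left out.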
Just wanted to add, in case anyone stumbles upon this post later: you can create a legend for this mess manually using Line2D (and Patch for the shaded brackets). For my plot it looks something like this:
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
from matplotlib.lines import Line2D
legend_elements = [Line2D([0], [0], color='#000620', lw=2, label='Freedom', linestyle='--'),
                   Line2D([0], [0], color='#000620', lw=2, label='Happiness'),
                   # linestyle='None' shows only the marker for the scatter entries
                   Line2D([0], [0], marker='x', color='#000620', label='Freedom',
                          linestyle='None', markerfacecolor='#000620', markersize=15),
                   Line2D([0], [0], marker='.', color='#000620', label='Happiness',
                          linestyle='None', markerfacecolor='#000620', markersize=15),
                   Patch(facecolor='#de425b', label='Low-Income'),
                   Patch(facecolor='#fbb862', label='Lower Middle-Income'),
                   Patch(facecolor='#afd17c', label='Upper Middle-Income'),
                   Patch(facecolor='#00876c', label='High-Income')]
plt.legend(handles=legend_elements, loc='upper left')
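A possible alternative to one long legend is two separate legends: one for the indicator (marker and line style combined in a single entry) and one for the income brackets. Matplotlib replaces the current legend each time ax.legend() is called, so the first one has to be re-added with ax.add_artist(). A sketch under those assumptions, with the legend placement chosen arbitrarily:

```python
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
from matplotlib.patches import Patch

fig, ax = plt.subplots()

# Legend 1: one entry per indicator showing both its marker and its line style
indicator_handles = [
    Line2D([0], [0], color='#000620', lw=2, linestyle='--', marker='x', label='Freedom'),
    Line2D([0], [0], color='#000620', lw=2, linestyle='-', marker='.', label='Happiness'),
]
# Legend 2: the shaded GNI/capita brackets (alpha matches the axvspan shading)
income_handles = [
    Patch(facecolor='#de425b', alpha=0.5, label='Low-Income'),
    Patch(facecolor='#fbb862', alpha=0.5, label='Lower Middle-Income'),
    Patch(facecolor='#afd17c', alpha=0.5, label='Upper Middle-Income'),
    Patch(facecolor='#00876c', alpha=0.5, label='High-Income'),
]

first = ax.legend(handles=indicator_handles, title='Indicator', loc='upper left')
ax.add_artist(first)  # keep the first legend alive when legend() is called again
ax.legend(handles=income_handles, title='GNI/capita bracket', loc='lower right')
```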
The end result looks like this:
[Figure: graph with custom legend]
Related
I am trying to figure out how to create a scatter plot in matplotlib with two different y-axes.
Right now I have one and need to add a second one with the index column values on y.
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [20, 10]  # must be set before the figure is created
points1 = plt.scatter(r3_load["TimeUTC"], r3_load["r3_load_MW"],
                      c=r3_load["r3_load_MW"], s=50, cmap="rainbow", alpha=1)  # set style options
plt.title("timeUTC vs Load")
#plt.xlim(0, 400)
#plt.ylim(0, 300)
plt.xlabel('timeUTC')
plt.ylabel('Load_MW')
cbar = plt.colorbar(points1)
cbar.set_label('Load')
The result I expect looks like this:
So the second scatter set should be TimeUTC vs. index. The colors don't matter here ;) Also, in Excel the two y-axes are on different sides, but that doesn't matter.
I'd appreciate your help! Thanks, Paulina
Continuing after the suggestions in the comments.
There are two ways of using matplotlib.
Via the matplotlib.pyplot interface, as you did in your original code snippet with plt.
The object-oriented way. This is the suggested way to use matplotlib, especially when you need more customisation like in your case. In your code, ax1 is an Axes instance.
From an Axes instance, you can plot your data using the Axes.plot and Axes.scatter methods, very similar to what you did through the pyplot interface. This means you can write an Axes.scatter call instead of .plot and use the same parameters as in your original code:
import matplotlib.pyplot as plt
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
ax1.scatter(r3_load["TimeUTC"], r3_load["r3_load_MW"],
            c=r3_load["r3_load_MW"], s=50, cmap="rainbow", alpha=1)
ax2.plot(r3_dda249["TimeUTC"], r3_dda249.index, c='b', linestyle='-')
ax1.set_xlabel('TimeUTC')
ax1.set_ylabel('r3_load_MW', color='g')
ax2.set_ylabel('index', color='b')
plt.show()
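Since the original question asked for a scatter on both axes, here is a self-contained sketch with ax2.scatter instead of ax2.plot, plus the colorbar from the question. The data is hypothetical (a made-up hourly time series standing in for r3_load), and the colorbar is attached to ax1 only: as far as I know, twinned axes share their position, so ax2 is resized along with ax1.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for r3_load: 50 hourly timestamps with random load values
times = pd.date_range('2021-01-01', periods=50, freq='h')
r3_load = pd.DataFrame({'TimeUTC': times,
                        'r3_load_MW': np.random.default_rng(1).uniform(10, 30, 50)})

fig, ax1 = plt.subplots(figsize=(10, 5))
ax2 = ax1.twinx()  # right-hand y-axis sharing the x-axis with ax1

points1 = ax1.scatter(r3_load['TimeUTC'], r3_load['r3_load_MW'],
                      c=r3_load['r3_load_MW'], s=50, cmap='rainbow')
# Second scatter set: TimeUTC vs. the DataFrame index, on the right-hand axis
ax2.scatter(r3_load['TimeUTC'], r3_load.index, color='b', marker='^', s=20)

ax1.set_xlabel('TimeUTC')
ax1.set_ylabel('Load_MW')
ax2.set_ylabel('index')
cbar = fig.colorbar(points1, ax=ax1, pad=0.12)  # extra pad leaves room for ax2's labels
cbar.set_label('Load')
```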
I have a box plot that I create using the following command:
sns.boxplot(y='points_per_block', x='block', data=data, hue='habit_trial')
So the different colors represent whether the trial was a habit trial or not (0,1). I want to also plot the individual data points, which I tried to achieve using:
sns.stripplot(y='points_per_block', x='block', data=data, hue='habit_trial')
The result was the following:
I want the individual points to display over the corresponding box plots. Is there a way to do this without resorting to hacking their positions in some manner? The problem comes from the fact that the separation of data using hue works differently for stripplot and boxplot but I would have thought that these would be easily combinable.
Thanks in advance.
Seaborn functions working with categorical data usually have a dodge= parameter indicating whether data with different hue levels should be separated a bit. For a boxplot, dodge defaults to True, as it would usually look bad without dodging. For a stripplot, dodge defaults to False.
The following example also shows how the legend can be updated (matplotlib 3.4 is needed for HandlerTuple):
import seaborn as sns
from matplotlib.legend_handler import HandlerTuple
tips = sns.load_dataset("tips")
ax = sns.boxplot(data=tips, x="day", y="total_bill",
                 hue="smoker", hue_order=['Yes', 'No'], boxprops={'alpha': 0.4})
sns.stripplot(data=tips, x="day", y="total_bill",
              hue="smoker", hue_order=['Yes', 'No'], dodge=True, ax=ax)
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=[(handles[0], handles[2]), (handles[1], handles[3])],
          labels=['Smoker', 'Non-smoker'],
          loc='upper left', handlelength=4,
          handler_map={tuple: HandlerTuple(ndivide=None)})
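For intuition, dodging roughly amounts to giving every hue level its own fixed offset inside each category slot and drawing both the boxes and the points at those offsets. The matplotlib-only sketch below does that by hand with hypothetical data; seaborn's exact widths and offsets may differ, so treat it as an illustration rather than a reimplementation:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: two categories on x, two hue levels per category
rng = np.random.default_rng(2)
categories = ['A', 'B']
hue_levels = ['Yes', 'No']
data = {(c, h): rng.normal(10 + i + 2 * j, 1.5, 30)
        for i, c in enumerate(categories) for j, h in enumerate(hue_levels)}

width = 0.8                        # total width shared by all hue levels in one category
offset = width / len(hue_levels)   # distance between neighbouring hue levels

fig, ax = plt.subplots()
for j, (h, color) in enumerate(zip(hue_levels, ['tab:blue', 'tab:orange'])):
    # Centre of this hue level's slot within each category
    pos = np.arange(len(categories)) - width / 2 + offset * (j + 0.5)
    ax.boxplot([data[(c, h)] for c in categories], positions=pos,
               widths=offset * 0.9, patch_artist=True, manage_ticks=False,
               boxprops={'facecolor': color, 'alpha': 0.4})
    # Points are drawn at the same dodged positions, with a little horizontal jitter
    for p, c in zip(pos, categories):
        jitter = rng.uniform(-offset / 4, offset / 4, len(data[(c, h)]))
        ax.scatter(p + jitter, data[(c, h)], color=color, s=10)

ax.set_xticks(np.arange(len(categories)))
ax.set_xticklabels(categories)
```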
How can I use seaborn's lineplot function to create a plot with no lines connecting the points? I know the function is called lineplot, but it has the useful feature of merging all data points with the same x value and plotting a single mean and confidence interval.
import seaborn as sns
tips = sns.load_dataset('tips')
sns.lineplot(x='size', y='total_bill', data=tips, marker='o', err_style='bars')
How do I plot without the line? I'm not sure of a better way to phrase my question. How can I plot points only? Lineless lineplot?
I know that seaborn has a pointplot function, but that is for categorical data. In some cases, my x-values are continuous values, so pointplot would not work.
I realize one could get into the matplotlib figure artists and delete the line, but that gets more complicated as the amount of stuff on the plot increases. I was wondering if there are some sort of arguments that can be passed to the lineplot function.
To get error bars without the connecting lines, you can set the linestyle parameter to '':
import seaborn as sns
tips = sns.load_dataset('tips')
sns.lineplot(x='size', y='total_bill', data=tips, marker='o', linestyle='', err_style='bars')
Other types of linestyle could also be interesting, for example "a loosely dotted line": sns.lineplot(..., linestyle=(0, (1, 10)))
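The same "markers plus error bars, no connecting line" result can also be sketched directly in matplotlib: aggregate repeated x values with pandas, then call Axes.errorbar with fmt='o', which draws markers only. The data is hypothetical, and the interval here is a normal-approximation 95% CI (1.96 times the standard error), not seaborn's bootstrapped one:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data with repeated x values (like 'size' in the tips dataset)
rng = np.random.default_rng(3)
df = pd.DataFrame({'size': rng.integers(1, 7, 200)})
df['total_bill'] = 5 * df['size'] + rng.normal(0, 4, len(df))

# Merge all points with the same x: mean and standard error per x value
stats = df.groupby('size')['total_bill'].agg(['mean', 'sem'])

fig, ax = plt.subplots()
# fmt='o' means "circle markers, no line"; yerr gives symmetric error bars
container = ax.errorbar(stats.index, stats['mean'], yerr=1.96 * stats['sem'],
                        fmt='o', capsize=4)
```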
I recommend setting join=False.
For me only join = True works.
sns.pointplot(data=df, x="x_attribute", y="y_attribute", ci=95, join=False)
I am trying to add a legend to my graph in matplotlib. Instead of creating a proper legend, it puts the full list of all my labels into the legend.
My graph looks like this:
The legend is cut off and I can't see more than that, I assume due to its size.
This is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
features2 = ["Number of Sides"]
features3 = ["Largest Angle"]
header2 = ["Label"]
data_df = pd.read_csv("AllMixedShapes2.csv", index_col=0)  # DataFrame.from_csv was removed in pandas 1.0
X1 = np.array(data_df[features2].values)
y1 = np.array(data_df[features3].values)
l = np.array(data_df[header2].values)
plt.scatter(X1[:, 0], y1, c=y, cmap=plt.cm.Paired, label=l)  # note: y is not defined in this snippet
plt.axis([0, 17, 0, 200])
plt.ylabel("Maximum Angle (Degrees)")
plt.xlabel("Number Of Sides")
plt.title('Original 450 Test Shapes')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.show()
And AllMixedShapes2.csv looks like this:
I'm quite new to Python and machine learning, and I've tried other examples, but I can't get anything to work.
Matplotlib's label argument is meant to be a single string that labels the entire dataset, rather than an array of individual labels for the points within the dataset. If you wish to pass an array of point-by-point labels that will be aggregated into a legend, the best option is probably the Seaborn library. Seaborn provides a wrapper around matplotlib for more convenient statistical visualization.
This should do approximately what you wish to do with your data:
import seaborn
# x and y must be passed as keyword arguments in current seaborn versions
seaborn.lmplot(x='Number of Sides', y='Largest Angle', hue='Label',
               data=data_df, fit_reg=False)
I'd suggest checking out the seaborn example gallery for more ideas.
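If you would rather stay with plain matplotlib, the usual pattern is one scatter call per label, each with a single string label, so the legend gets exactly one entry per label. A sketch with a small hypothetical DataFrame that uses the same column names as AllMixedShapes2.csv:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for AllMixedShapes2.csv, using the same column names
data_df = pd.DataFrame({
    'Number of Sides': [3, 3, 4, 4, 5, 6],
    'Largest Angle': [90, 110, 90, 135, 108, 120],
    'Label': ['Triangle', 'Triangle', 'Quadrilateral', 'Quadrilateral',
              'Pentagon', 'Hexagon'],
})

fig, ax = plt.subplots()
# One scatter call per label, each with a single string label -> one legend entry each
for label, group in data_df.groupby('Label'):
    ax.scatter(group['Number of Sides'], group['Largest Angle'], label=label)
ax.set_xlabel('Number Of Sides')
ax.set_ylabel('Maximum Angle (Degrees)')
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
```

groupby sorts the labels, so the legend entries appear in alphabetical order.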
So I am having a little alignment issue with my legend in matplotlib. Hopefully it is easily solvable with the right know-how. I have scoured the matplotlib website but I'm struggling to find the exact solution.
Essentially I have the following axis vertical spans with the following labels (note that I have used $\mathrm{}$ for these):
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
# raw strings (r'...') keep the backslashes of the mathtext labels intact
ax.axvspan(3851.1, 3951.1, color='gray', alpha=0.4, lw=1, label=r'$\mathrm{D}_{n}(4000)$')
ax.axvspan(4001.1, 4101.2, color='gray', alpha=0.4, lw=1)
ax.axvspan(4084.7, 4123.4, color='gray', alpha=0.8, lw=1, label=r'$\mathrm{H}\delta_{\mathrm{A}}$')
I also create the legend with:
ax.legend(prop={'size':8}, loc=2)
Now the problem that I am seeing is this (for the image I have increased the prop size to 12 to show the issue but it scales down for size 8):
My issue is that the alignment is slightly off between the vspan regions and the math mode descriptive labels. Taking away the math mode solves the problem, but the labels do not contain the correct subscripts and greek lettering that I require. See here:
I was wondering if anyone knew of any alignment arguments for this niche scenario?