matplotlib does not show legend in scatter plot - python

I am trying to work on a clustering problem for which I need to plot a scatter plot for my clusters.
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
df = pd.merge(dataframe,actual_cluster)
plt.scatter(df['x'], df['y'], c=df['cluster'])
plt.legend()
plt.show()
df['cluster'] is the actual cluster number. So I want that to be my color code.
It shows me a plot but not the legend, and it does not raise an error either.
Am I doing something wrong?

EDIT:
Generating some random data:
from scipy.cluster.vq import kmeans2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
n_clusters = 10
df = pd.DataFrame({'x':np.random.randn(1000), 'y':np.random.randn(1000)})
_, df['cluster'] = kmeans2(df, n_clusters)
Update
Use seaborn.relplot with kind='scatter' or use seaborn.scatterplot
Specify hue='cluster'
# figure level plot
sns.relplot(data=df, x='x', y='y', hue='cluster', palette='tab10', kind='scatter')
# axes level plot
fig, axes = plt.subplots(figsize=(6, 6))
sns.scatterplot(data=df, x='x', y='y', hue='cluster', palette='tab10', ax=axes)
axes.legend(loc='center left', bbox_to_anchor=(1, 0.5))
Original Answer
Plotting (matplotlib v3.3.4):
fig, ax = plt.subplots(figsize=(8, 6))
cmap = plt.get_cmap('jet')
for i, cluster in df.groupby('cluster'):
    _ = ax.scatter(cluster['x'], cluster['y'], color=cmap(i/n_clusters), label=i, ec='k')
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
Result:
Explanation:
Not going too much into nitty gritty details of matplotlib internals, plotting one cluster at a time sort of solves the issue.
More specifically, ax.scatter() returns a PathCollection object which we are explicitly throwing away here but which seems to be passed internally to some sort of legend handler. Plotting all at once generates only one PathCollection/label pair, while plotting one cluster at a time generates n_clusters PathCollection/label pairs. You can see those objects by calling ax.get_legend_handles_labels() which returns something like:
([<matplotlib.collections.PathCollection at 0x7f60c2ff2ac8>,
<matplotlib.collections.PathCollection at 0x7f60c2ff9d68>,
<matplotlib.collections.PathCollection at 0x7f60c2ff9390>,
<matplotlib.collections.PathCollection at 0x7f60c2f802e8>,
<matplotlib.collections.PathCollection at 0x7f60c2f809b0>,
<matplotlib.collections.PathCollection at 0x7f60c2ff9908>,
<matplotlib.collections.PathCollection at 0x7f60c2f85668>,
<matplotlib.collections.PathCollection at 0x7f60c2f8cc88>,
<matplotlib.collections.PathCollection at 0x7f60c2f8c748>,
<matplotlib.collections.PathCollection at 0x7f60c2f92d30>],
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])
So actually ax.legend() is equivalent to ax.legend(*ax.get_legend_handles_labels()).
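The difference between the two approaches can be checked directly. The following is a minimal sketch, assuming a small synthetic DataFrame (`demo`, with three made-up clusters) instead of the kmeans2 output above. It compares what ax.get_legend_handles_labels() returns after a single unlabeled scatter call and after one labeled call per cluster. The Agg backend line is only there so the sketch runs without a display; drop it in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so this runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical demo data: 3 clusters of 20 points each
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "x": rng.normal(size=60),
    "y": rng.normal(size=60),
    "cluster": np.repeat([0, 1, 2], 20),
})

fig, (ax_single, ax_grouped) = plt.subplots(1, 2)

# One call for all points: no label is attached, so the legend has nothing to show
ax_single.scatter(demo["x"], demo["y"], c=demo["cluster"])
single_handles, single_labels = ax_single.get_legend_handles_labels()

# One call per cluster: each PathCollection carries its own label
for key, group in demo.groupby("cluster"):
    ax_grouped.scatter(group["x"], group["y"], label=key)
grouped_handles, grouped_labels = ax_grouped.get_legend_handles_labels()

print(len(single_handles), grouped_labels)
```

The first axes reports no legend entries at all, which is why plt.legend() in the question silently draws nothing.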
NOTES:
If using Python 2, make sure i/n_clusters is a float
Omitting fig, ax = plt.subplots() and using plt.<method> instead of ax.<method> works fine, but I always prefer to explicitly specify the Axes object I am using rather than implicitly use the "current axes" (plt.gca()).
OLD SIMPLE SOLUTION
In case you are ok with a colorbar (instead of discrete value labels), you can use Pandas built-in Matplotlib functionality:
df.plot.scatter('x', 'y', c='cluster', cmap='jet')

This question has bothered me for a long time. Now I want to provide another simple solution. We do not have to write any loops!
def vis(ax, df, label, title="visualization"):
    points = ax.scatter(df[:, 0], df[:, 1], c=label, label=label, alpha=0.7)
    ax.set_title(title)
    ax.legend(*points.legend_elements(), title="Classes")
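To see what legend_elements() contributes here, a small self-contained sketch may help. It assumes a synthetic NumPy array (`points_xy`) and three made-up integer classes; neither comes from the question. The Agg backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so this runs headless
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: 90 points in 2D, 3 integer class labels
rng = np.random.default_rng(1)
points_xy = rng.normal(size=(90, 2))
classes = np.repeat([0, 1, 2], 30)

fig, ax = plt.subplots()
points = ax.scatter(points_xy[:, 0], points_xy[:, 1], c=classes, alpha=0.7)

# legend_elements() builds one proxy handle and one label per distinct value in c
handles, labels = points.legend_elements()
ax.legend(handles, labels, title="Classes")
```

With only a few distinct values in c, this gives one legend entry per class without looping over the groups.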

Related

Combine 2 kde-functions in one plot in seaborn

I have the following code for plotting the histogram and the kde-functions (Kernel density estimation) of a training and validation dataset:
#Plot histograms
import matplotlib.pyplot as plt
import matplotlib
import seaborn as sns
displot_dataTrain=sns.displot(data_train, bins='auto', kde=True)
displot_dataTrain._legend.remove()
plt.ylabel('Count')
plt.xlabel('Training Data')
plt.title("Histogram Training Data")
plt.show()
displot_dataValid =sns.displot(data_valid, bins='auto', kde=True)
displot_dataValid._legend.remove()
plt.ylabel('Count')
plt.xlabel('Validation Data')
plt.title("Histogram Validation Data")
plt.show()
# Try to plot the kde-functions together --> yields an AttributeError
X1 = np.linspace(data_train.min(), data_train.max(), 1000)
X2 = np.linspace(data_valid.min(), data_valid.max(), 1000)
fig, ax = plt.subplots(1,2, figsize=(12,6))
ax[0].plot(X1, displot_dataTest.kde.pdf(X1), label='train')
ax[1].plot(X2, displot_dataValid.kde.pdf(X1), label='valid')
The plotting of the histograms and kde-functions inside one plot works without problems. Now I would like to have the 2 kde-functions inside one plot but when using the posted code, I get the following error AttributeError: 'FacetGrid' object has no attribute 'kde'
Do you have any idea how I can combine the 2 kde-functions inside one plot (without the histogram)?
sns.displot() returns a FacetGrid. That doesn't work as input for ax.plot(). Also, displot_dataTest.kde.pdf is never valid. However, you can write sns.kdeplot(data=data_train, ax=ax[0]) to create a kdeplot inside the first subplot. See the docs; note the optional parameters cut= and clip= that can be used to adjust the limits.
If you only want one subplot, you can use fig, ax = plt.subplots(1, 1, figsize=(12,6)) and use ax=ax instead of ax=ax[0] as in that case ax is just a single subplot, not an array of subplots.
The following code has been tested using the latest seaborn version:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
fig, ax = plt.subplots(figsize=(12, 6))
sns.kdeplot(data=np.random.normal(0.1, 1, 100).cumsum(),
            color='crimson', label='train', fill=True, ax=ax)
sns.kdeplot(data=np.random.normal(0.1, 1, 100).cumsum(),
            color='limegreen', label='valid', fill=True, ax=ax)
ax.legend()
plt.tight_layout()
plt.show()

How to set ticklabel rotation and add bar annotations

I would like to draw the following bar plot with annotations, and I want the x-tick labels rotated by 45 degrees so that they are easily readable. I am not sure why my code is not working. I have added the sample data and the desired bar plot as an attachment. I appreciate your suggestions! Thanks!
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
#sns.set(rc={"figure.dpi":300, 'savefig.dpi':300})
sns.set_context('notebook')
sns.set_style("ticks")
#sns.set_style('white')
sns.set_context("paper", font_scale = 2)
colors = ['b', 'g', 'r', 'c', 'm']
#sns.set(style="whitegrid")
#sns.set_palette(sns.color_palette(colors))
#fig, (ax1,ax2) = plt.subplots(1, 2, figsize=(16, 8))
#fig.subplots_adjust(wspace=0.3)
plots1 = sns.barplot(x="Model", y="G-mean", data=df_Aussel2014_5features, ax=ax1,palette='Spectral')
# Iterating over the bars one-by-one
for bar in plots1.patches:
    # Using Matplotlib's annotate function and
    # passing the coordinates where the annotation shall be done
    plots1.annotate(format(bar.get_height(), '.2f'),
                    (bar.get_x() + bar.get_width() / 2,
                     bar.get_height()), ha='center', va='center',
                    size=10, xytext=(0, 5),
                    textcoords='offset points')
plt.show()
# Save figure
#plt.savefig('Aussel2014_5features.png', dpi=300, transparent=False, bbox_inches='tight')
I got the following image.
You are using the object oriented interface (e.g. axes) so don't mix plt. and axes. methods
seaborn.barplot is an axes-level plot, which returns a matplotlib axes, p1 in this case.
Use the matplotlib.axes.Axes.tick_params to set the rotation of the axis, or a number of other parameters, as shown in the documentation.
Use matplotlib.axes.Axes.bar_label (available from matplotlib 3.4) to add bar annotations.
See this answer with additional details and examples for using the method.
Adjust the nrow, ncols and figsize as needed, and set sharex=False and sharey=False.
Tested in python 3.8.12, pandas 1.3.4, matplotlib 3.4.3, seaborn 0.11.2
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# data
data = {'Model': ['QDA', 'LDA', 'DT', 'Bagging', 'NB'],
        'G-mean': [0.703780, 0.527855, 0.330928, 0.294414, 0.278713]}
df = pd.DataFrame(data)
# create figure and axes
fig, ax1 = plt.subplots(nrows=1, ncols=1, figsize=(8, 8), sharex=False, sharey=False)
# plot
p1 = sns.barplot(x="Model", y="G-mean", data=df, palette='Spectral', ax=ax1)
p1.set(title='Performance Comparison based on G-mean')
# add annotation
p1.bar_label(p1.containers[0], fmt='%0.2f')
# add a space on y for the annotations
p1.margins(y=0.1)
# rotate the axis ticklabels
p1.tick_params(axis='x', rotation=45)
You can also rotate the tick labels through the pyplot interface with plt.xticks(rotation=45).
Example:
import matplotlib.pyplot as plt
plt.xticks(rotation=45)
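For readers without seaborn installed, the same annotate-and-rotate steps can be sketched with matplotlib alone. This is only an illustration under that assumption: it swaps sns.barplot for ax.bar, reuses the sample values from the answer above, and needs matplotlib 3.4 or later for bar_label. The Agg backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so this runs headless
import matplotlib.pyplot as plt

# Sample values taken from the answer above
models = ['QDA', 'LDA', 'DT', 'Bagging', 'NB']
g_mean = [0.703780, 0.527855, 0.330928, 0.294414, 0.278713]

fig, ax = plt.subplots(figsize=(8, 8))
bars = ax.bar(models, g_mean)  # plain matplotlib bars instead of sns.barplot

# Annotate each bar with its height, formatted to two decimals
annotations = ax.bar_label(bars, fmt='%0.2f')

# Leave headroom for the annotations and rotate the x tick labels
ax.margins(y=0.1)
ax.tick_params(axis='x', rotation=45)
ax.set(xlabel='Model', ylabel='G-mean')
```

Everything goes through the Axes object, so no plt. calls are needed after the figure is created.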

Create a discrete colorbar in matplotlib

I've tried the other threads, but can't work out how to solve. I'm attempting to create a discrete colorbar. Much of the code appears to be working, a discrete bar does appear, but the labels are wrong and it throws the error: "No mappable was found to use for colorbar creation. First define a mappable such as an image (with imshow) or a contour set (with contourf)."
Pretty sure the error is because I'm missing an argument in plt.colorbar, but not sure what it's asking for or how to define it.
Below is what I have. Any thoughts gratefully received:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
norm = mpl.colors.BoundaryNorm(np.arange(-0.5,4), cmap.N)
ex2 = sample_data.plot.scatter(x='order_count', y='total_value',c='cluster', marker='+', ax=ax, cmap='plasma', norm=norm, s=100, edgecolor ='none', alpha=0.70)
plt.colorbar(ticks=np.linspace(0,3,4))
plt.show()
Indeed, the first argument to colorbar should be a ScalarMappable, which would be the scatter plot PathCollection itself.
Setup
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({"x" : np.linspace(0,1,20),
                   "y" : np.linspace(0,1,20),
                   "cluster" : np.tile(np.arange(4),5)})
cmap = mpl.colors.ListedColormap(["navy", "crimson", "limegreen", "gold"])
norm = mpl.colors.BoundaryNorm(np.arange(-0.5,4), cmap.N)
Pandas plotting
The problem is that pandas does not provide you access to this ScalarMappable directly. So one can catch it from the list of collections in the axes, which is easy if there is only one single collection present: ax.collections[0].
fig, ax = plt.subplots()
df.plot.scatter(x='x', y='y', c='cluster', marker='+', ax=ax,
                cmap=cmap, norm=norm, s=100, edgecolor ='none', alpha=0.70, colorbar=False)
fig.colorbar(ax.collections[0], ticks=np.linspace(0,3,4))
plt.show()
Matplotlib plotting
One could consider using matplotlib directly to plot the scatter in which case you would directly use the return of the scatter function as argument to colorbar.
fig, ax = plt.subplots()
scatter = ax.scatter(x='x', y='y', c='cluster', marker='+', data=df,
                     cmap=cmap, norm=norm, s=100, edgecolor ='none', alpha=0.70)
fig.colorbar(scatter, ticks=np.linspace(0,3,4))
plt.show()
Output in both cases is identical.
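Why the colorbar comes out discrete may be easier to see by looking at the norm on its own. The sketch below is a minimal illustration that reuses the cmap and norm from the setup above; the names `boundaries`, `cluster_ids` and `color_index` are made up for the example. It shows how BoundaryNorm bins the integer cluster ids, then passes the scatter's PathCollection to fig.colorbar.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so this runs headless
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np

cmap = mpl.colors.ListedColormap(["navy", "crimson", "limegreen", "gold"])
boundaries = np.arange(-0.5, 4)  # bin edges: -0.5, 0.5, 1.5, 2.5, 3.5
norm = mpl.colors.BoundaryNorm(boundaries, cmap.N)

# Each integer id falls in its own bin, so the norm maps it to one color index
cluster_ids = np.array([0, 1, 2, 3])
color_index = norm(cluster_ids)

fig, ax = plt.subplots()
scatter = ax.scatter(cluster_ids, cluster_ids, c=cluster_ids, cmap=cmap, norm=norm)

# The PathCollection returned by scatter is the mappable the colorbar needs
cbar = fig.colorbar(scatter, ax=ax, ticks=np.linspace(0, 3, 4))
```

Placing the bin edges at half-integers centers each tick (0, 1, 2, 3) inside its color block.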

How can I change the font size using seaborn FacetGrid?

I have plotted my data with factorplot in seaborn and get facetgrid object, but still cannot understand how the following attributes could be set in such a plot:
Legend size: when I plot lots of variables, I get very small legends, with small fonts.
Font sizes of y and x labels (a similar problem as above)
You can scale up the fonts in your call to sns.set().
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
x = np.random.normal(size=37)
y = np.random.lognormal(size=37)
# defaults
sns.set()
fig, ax = plt.subplots()
ax.plot(x, y, marker='s', linestyle='none', label='small')
ax.legend(loc='upper left', bbox_to_anchor=(0, 1.1))
sns.set(font_scale=5) # crazy big
fig, ax = plt.subplots()
ax.plot(x, y, marker='s', linestyle='none', label='big')
ax.legend(loc='upper left', bbox_to_anchor=(0, 1.3))
The FacetGrid plot does produce pretty small labels. While @paul-h has described the use of sns.set as a way to change the font scaling, it may not be the optimal solution since it will change the font_scale setting for all plots.
You could use the seaborn.plotting_context to change the settings for just the current plot:
with sns.plotting_context(font_scale=1.5):
    sns.factorplot(x, y ...)
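The scoping idea behind that with block can be illustrated without seaborn. The sketch below is not seaborn's plotting_context. It uses matplotlib's own plt.rc_context, which temporarily overrides rcParams in a similar way, and the sizes chosen are arbitrary. It only demonstrates that settings applied inside the block do not leak into later plots.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so this runs headless
import matplotlib.pyplot as plt

default_size = plt.rcParams["axes.labelsize"]

# Temporarily enlarge label and legend fonts; 20 is an arbitrary example value
with plt.rc_context({"axes.labelsize": 20, "legend.fontsize": 20}):
    fig, ax = plt.subplots()
    ax.set_xlabel("x")
    scoped_size = ax.xaxis.label.get_size()  # size picked up inside the block

# Outside the block the global setting is back to what it was
restored_size = plt.rcParams["axes.labelsize"]
```

Figures created after the block use the default sizes again, which is the behaviour the answer above is after.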
I've made some modifications to @paul-H's code, such that you can independently set the font size for the x/y axes and legend:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
x = np.random.normal(size=37)
y = np.random.lognormal(size=37)
# defaults
sns.set()
fig, ax = plt.subplots()
ax.plot(x, y, marker='s', linestyle='none', label='small')
ax.legend(loc='upper left', fontsize=20, bbox_to_anchor=(0, 1.1))
ax.set_xlabel('X_axis', fontsize=20)
ax.set_ylabel('Y_axis', fontsize=20)
plt.show()
This is the output:
For the legend, you can use this
plt.setp(g._legend.get_title(), fontsize=20)
Where g is your facetgrid object returned after you call the function making it.
This worked for me
g = sns.catplot(x="X Axis", hue="Class", kind="count", legend=False, data=df, height=5, aspect=7/4)
g.ax.set_xlabel("",fontsize=30)
g.ax.set_ylabel("Count",fontsize=20)
g.ax.tick_params(labelsize=15)
What did not work was to call set_xlabel directly on g like g.set_xlabel() (then I got a "Facetgrid has no set_xlabel" method error)

How to set xticks in subplots

If I plot a single imshow plot I can use
fig, ax = plt.subplots()
ax.imshow(data)
plt.xticks( [4, 14, 24], [5, 15, 25] )
to replace my xtick labels.
Now, I am plotting 12 imshow plots using
f, axarr = plt.subplots(4, 3)
axarr[i, j].imshow(data)
How can I change my xticks just for one of these subplots? I can only access the axes of the subplots with axarr[i, j]. How can I access plt just for one particular subplot?
There are two ways:
Use the axes methods of the subplot object (e.g. ax.set_xticks and ax.set_xticklabels) or
Use plt.sca to set the current axes for the pyplot state machine (i.e. the plt interface).
As an example (this also illustrates using setp to change the properties of all of the subplots):
import matplotlib.pyplot as plt
fig, axes = plt.subplots(nrows=3, ncols=4)
# Set the ticks and ticklabels for all axes
plt.setp(axes, xticks=[0.1, 0.5, 0.9], xticklabels=['a', 'b', 'c'],
         yticks=[1, 2, 3])
# Use the pyplot interface to change just one subplot...
plt.sca(axes[1, 1])
plt.xticks(range(3), ['A', 'Big', 'Cat'], color='red')
fig.tight_layout()
plt.show()
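The example above covers plt.setp and plt.sca. The first option, calling the Axes methods on one subplot, could look like the following minimal sketch. It assumes random 30x30 image data and a 2x2 grid purely for illustration, and reuses the tick positions and labels from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so this runs headless
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical image data, just to have something to show
data = np.random.default_rng(2).random((30, 30))

f, axarr = plt.subplots(2, 2)
for ax in axarr.flat:
    ax.imshow(data)

# Change the x ticks of a single subplot through its own Axes object
target = axarr[1, 1]
target.set_xticks([4, 14, 24])
target.set_xticklabels(['5', '15', '25'])
```

Only axarr[1, 1] is affected; the other subplots keep their automatic ticks.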
See the (quite) recent answer on the matplotlib repository, in which the following solution is suggested:
If you want to set the xticklabels:
ax.set_xticks([1,4,5])
ax.set_xticklabels([1,4,5], fontsize=12)
If you want to only increase the fontsize of the xticklabels, using the default values and locations (which is something I personally often need and find very handy):
ax.tick_params(axis="x", labelsize=12)
To do it all at once:
plt.setp(ax.get_xticklabels(), fontsize=12, fontweight="bold",
         horizontalalignment="left")
