I am using a Seaborn heatmap to plot a large confusion matrix. Since the diagonal elements represent the correct predictions, they are the most important ones to label with the count or correct rate. As the title suggests: how can I annotate only the diagonal entries in a heatmap?
I have consulted this website https://seaborn.pydata.org/examples/many_pairwise_correlations.html, but it does not help with how to annotate only the diagonal entries. Hope somebody could help with that. Thank you in advance!
Does this get you what you have in mind? The example at the URL you gave masks the main diagonal, so I annotated the diagonal just below it instead. To annotate the main diagonal of your confusion matrix, adapt my code by changing the -1 in both np.diag(..., -1) calls to 0.
Note the additional parameter fmt='' that I added to sns.heatmap(...), because the elements of my annot matrix are strings.
Code
from string import ascii_letters
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="white")
# Generate a large random dataset
rs = np.random.RandomState(33)
y = rs.normal(size=(100, 26))
d = pd.DataFrame(data=y, columns=list(ascii_letters[26:]))
# Compute the correlation matrix
corr = d.corr()
# Generate a mask for the upper triangle
mask = np.zeros_like(corr, dtype='bool')
mask[np.triu_indices_from(mask)] = True
# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(11, 9))
# Generate a custom diverging colormap
cmap = sns.diverging_palette(220, 10, as_cmap=True)
# Generate the annotation
annot = np.diag(np.diag(corr.values,-1),-1)
annot = np.round(annot,2)
annot = annot.astype('str')
annot[annot=='0.0']=''
# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3, center=0,
square=True, linewidths=.5, cbar_kws={"shrink": .5}, annot=annot, fmt='')
plt.show()
Output
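For the confusion-matrix case in the question, the same idea can be sketched as below. This is only a minimal illustration: the 3x3 matrix conf_mat and the helper diagonal_annotations are made-up names, not part of seaborn. Building the label array directly, instead of comparing against the string '0.0', also keeps a genuine zero on the diagonal visible.

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt


def diagonal_annotations(matrix, fmt="{:d}"):
    """Return an array of strings that is empty everywhere except the main diagonal.

    Hypothetical helper for illustration; it is not a seaborn function.
    """
    matrix = np.asarray(matrix)
    labels = np.full(matrix.shape, "", dtype=object)
    for i in range(min(matrix.shape)):
        labels[i, i] = fmt.format(matrix[i, i])
    return labels


# Made-up confusion matrix (rows: true class, columns: predicted class)
conf_mat = np.array([[50, 2, 1],
                     [3, 45, 4],
                     [0, 6, 40]])

annot = diagonal_annotations(conf_mat)

# fmt='' is needed because the annotations are already strings
ax = sns.heatmap(conf_mat, annot=annot, fmt="", cmap="Blues", square=True)
plt.show()
```

For correct rates instead of counts, the same helper could be called on a row-normalized matrix with a format such as "{:.2f}".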
In a related question, someone asked how to annotate the diagonal elements with strings. Here is an example:
from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np
flights = sns.load_dataset('flights')
# keyword arguments are required by DataFrame.pivot in pandas >= 2.0
flights = flights.pivot(index='year', columns='month', values='passengers')
corr_data = np.corrcoef(flights.to_numpy())
up_triang = np.triu(np.ones_like(corr_data)).astype(bool)
ax = sns.heatmap(corr_data, cmap='flare', xticklabels=False, yticklabels=False, square=True,
linecolor='white', linewidths=0.5,
cbar=True, mask=up_triang, cbar_kws={'shrink': 0.6, 'pad': 0.02, 'label': 'correlation'})
ax.invert_xaxis()
for i, label in enumerate(flights.index):
    ax.text(i + 0.2, i + 0.5, label, ha='right', va='center')
plt.show()
Related
I adapted this code (https://stackoverflow.com/a/73099652/2369957), which demonstrates how to share a colorbar and its range between two plots, but it doesn't seem to work when the ranges of the two plots are different. In the posted code, the plots have the same range (half-open interval [0.0, 1.0)). I generated two plots with different ranges and the colorbar only follows the last plot. Is the posted code wrong for the general case? How do I make the colorbar cover the range of both plots?
Adapted code:
import numpy as np
import matplotlib.pyplot as plt
# Fixing random state for reproducibility
np.random.seed(19680801)
fig, ax = plt.subplots(figsize=(12,9))
ax1 = plt.subplot(211)
im = ax1.imshow(np.random.uniform(low=0.00001, high=5, size=(100,100)))
ax2 = plt.subplot(212)
im = ax2.imshow(np.random.uniform(low=0.3, high=0.6, size=(100,100)))
plt.colorbar(im, ax=[ax1, ax2], aspect = 40)
plt.show()
Thank you very much in advance.
"I generated two plots with different ranges and the colorbar only follows the last plot."
This is because im is overwritten when running im = ax2.imshow(np.random.uniform(low=0.3, high=0.6, size=(100,100))).
To have both images share the same colorbar, you need to combine both arrays and use the min and max values of the combined array in imshow as detailed in this SO answer:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
# Fixing random state for reproducibility
np.random.seed(19680801)
array_1 = np.random.uniform(low=0.00001, high=5, size=(100,100))
array_2 = np.random.uniform(low=0.3, high=0.6, size=(100,100))
combined_array = np.array([array_1,array_2])
_min, _max = np.amin(combined_array), np.amax(combined_array)
# Create both axes in one call so that no extra, overlapping axes is left behind
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12,9))
im = ax1.imshow(array_1, vmin = _min, vmax = _max)
im = ax2.imshow(array_2, vmin = _min, vmax = _max)
# Build the colorbar from the same limits that both images use
norm = mpl.colors.Normalize(vmin=_min, vmax=_max)
fig.colorbar(mpl.cm.ScalarMappable(norm=norm),
             ax=[ax1, ax2], aspect = 40)
plt.show()
This returns the following image:
The code below gives me the heatmap output, but I want to add rectangle patches to highlight values in the ranges 0.4 to 0.99 and -0.4 to -0.99.
plt.figure(figsize=(15,10))
mask = np.triu(np.ones_like(corr, dtype=bool))
sns.heatmap(corr,annot=True,fmt=".2f", mask=mask,cmap="YlGnBu");
The heatmap data for the categorical variables was taken from Kaggle's home price data. To add a rectangle, pass a patches.Rectangle to ax.add_patch(). The rectangle is positioned by its anchor corner (the lower-left corner in data coordinates; on a heatmap, whose y axis is inverted, it appears at the top left), so specify x and y as a tuple, followed by the width and height. We also specify that it should not be filled.
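import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.patches as patches
fig, ax = plt.subplots(figsize=(18,18))
df_house = pd.read_csv('./data/house_prices_train.csv', index_col=0)
# numeric_only=True skips non-numeric columns (needs pandas >= 1.5; required in pandas >= 2.0)
df_house_corr = df_house.corr(numeric_only=True)
# np.bool was removed from NumPy; use the builtin bool
mask = np.triu(np.ones_like(df_house_corr, dtype=bool))
sns.heatmap(df_house_corr, annot=True, fmt=".2f", mask=mask, cmap="YlGnBu", ax=ax)
# Rectangle((x, y), width, height), unfilled with a red outline
ax.add_patch(
    patches.Rectangle(
        (5, 6),
        1.0,
        35.0,
        edgecolor='red',
        fill=False,
        lw=2
    )
)
plt.show()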
OK, so without your data I built the solution on values drawn from a uniform distribution. Paste your data into the script and it should work as long as it is a NumPy array-like.
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np
import seaborn as sns
fig, ax = plt.subplots(figsize=(15, 10))
data_len = 17
uniform_data = np.random.rand(data_len, data_len)
# np.bool is deprecated in Numpy 1.20
mask = np.triu(np.ones_like(uniform_data, dtype=bool))
heatmap = sns.heatmap(uniform_data, annot=True, fmt='.2f', mask=mask, cmap='YlGnBu', ax=ax)
# np.tril_indices returns (row indices, column indices) of the lower triangle
row_indices, col_indices = np.tril_indices(n=data_len, k=-1)
for row_index, col_index in zip(row_indices, col_indices):
    value = np.abs(uniform_data[row_index, col_index])
    if 0.4 <= value <= 0.99:
        # Rectangle takes (x, y), i.e. (column, row), as its anchor
        rect = patches.Rectangle((col_index, row_index), 1, 1, fill=True, facecolor='red', alpha=0.5)
        ax.add_patch(rect)
plt.show()
The idea is to get the indices of all the values in the lower triangle, to avoid looping over unnecessary values. Each of these values is inspected and, if the condition is met, a rectangle is drawn at its position.
You get the following result:
If I correctly understood your problem, this script should do the trick.
Here is the code for generating the histogram. For the full code you can refer to this iPython Notebook
# Splitting the dataset into malignant and benign.
dataMalignant=datas[datas['diagnosis'] ==1]
dataBenign=datas[datas['diagnosis'] ==0]
#Plotting these features as a histogram
fig, axes = plt.subplots(nrows=10, ncols=1, figsize=(15,60))
for idx,ax in enumerate(axes):
    binwidth= (max(datas[features_mean[idx]]) - min(datas[features_mean[idx]]))/250
    # density=True replaces the normed argument, which was removed from Matplotlib
    ax.hist([dataMalignant[features_mean[idx]],dataBenign[features_mean[idx]]], bins=np.arange(min(datas[features_mean[idx]]), max(datas[features_mean[idx]]) + binwidth, binwidth) , alpha=0.5,stacked=True, density=True, label=['M','B'],color=['r','g'])
    ax.legend(loc='upper right')
    ax.set_title(features_mean[idx])
plt.show()
How do I convert this histogram into a smooth curve with the area under the curve shaded/highlighted?
Here is a simple example that might help you:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(123)
datas = pd.DataFrame(np.random.randint(0, 2, size=(100, 1)), columns=['diagnosis'])
datas['data'] = np.random.randint(0, 100,size=(100, 1))
I used NumPy's histogram function, but you could also use ax.hist with the same arguments instead.
benign_hist=np.histogram(datas[datas['diagnosis']==0]['data'],bins=np.arange(0, 100, 10))
malignant_hist=np.histogram(datas[datas['diagnosis']==1]['data'],bins=np.arange(0, 100, 10))
fig,ax=plt.subplots(1,1)
ax.fill_between(malignant_hist[1][1:], malignant_hist[0], color='r', alpha=0.5)
ax.fill_between(benign_hist[1][1:], benign_hist[0], color='b', alpha=0.5)
In the above example, for plotting convenience, I used the 9 right-hand bin edges as x positions instead of the bin midpoints.
In the OP's code, one could assign hist_data = ax.hist(...).
hist_data[0] contains the histogram values and hist_data[1] contains the bin edges. To fill in the areas, use something like:
fig, ax=plt.subplots(1,1)
ax.fill_between(hist_data[1][1:],hist_data[0][0],color='g',alpha=0.5)
ax.fill_between(hist_data[1][1:],hist_data[0][1],color='r',alpha=0.5)
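If a genuinely smooth curve is wanted rather than a filled histogram outline, one possible approach is a kernel density estimate. The sketch below is an assumption-laden illustration, not the OP's method: it swaps the histogram for scipy.stats.gaussian_kde, uses made-up data in a frame named datas with the columns 'diagnosis' and 'data', and shades the area under each curve with fill_between.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Made-up stand-in for the real dataset (assumption for illustration only)
rng = np.random.default_rng(123)
datas = pd.DataFrame({
    "diagnosis": rng.integers(0, 2, size=200),
    "data": rng.normal(loc=50, scale=15, size=200),
})

# Common x grid spanning the whole feature range
grid = np.linspace(datas["data"].min(), datas["data"].max(), 300)

fig, ax = plt.subplots()
densities = {}
for value, label, color in [(1, "M", "r"), (0, "B", "g")]:
    sample = datas.loc[datas["diagnosis"] == value, "data"].to_numpy()
    # Evaluate the kernel density estimate of this group on the grid
    density = gaussian_kde(sample)(grid)
    densities[label] = density
    ax.plot(grid, density, color=color)
    # Shade the area between the curve and zero
    ax.fill_between(grid, density, color=color, alpha=0.3, label=label)
ax.legend(loc="upper right")
plt.show()
```

Unlike the stacked histogram in the question, the two curves here are drawn independently and overlap rather than stack.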
I am trying to get the color codes associated with each cell of a heatmap:
import seaborn as sns
import numpy as np
import matplotlib.cm as cm
hm = sns.heatmap(
np.random.randn(10,10),
cmap = cm.coolwarm)
# hm.<some function>[0][0] would return the color code of the cell indexed (0,0)
Because sns.heatmap returns a matplotlib Axes object, we can't really use hm directly. But we can use the cmap object itself to return the RGBA values of the data. Edit: the code has been updated to include normalization of the data.
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
data = np.random.randn(10, 10)
# cm.get_cmap was removed in recent Matplotlib releases; plt.get_cmap still works
cmap = plt.get_cmap('Greens')
hm = sns.heatmap(data, cmap=cmap)
# Normalize data
norm = Normalize(vmin=data.min(), vmax=data.max())
rgba_values = cmap(norm(data))
All of the colors are now contained in rgba_values. So to get the color of the upper left square in the heatmap you could simply do
In [13]: rgba_values[0,0]
Out[13]: array([ 0. , 0.26666668, 0.10588235, 1. ])
For more, check out Getting individual colors from a color map in matplotlib
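Since the question asks for color codes, the RGBA tuples can also be turned into hex strings. The following is a small sketch using matplotlib.colors.to_hex; the variable hex_codes is an illustrative name, and the result only matches the heatmap when sns.heatmap is called without vmin, vmax, center, or robust, as in the snippet above.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize, to_hex

rng = np.random.default_rng(0)
data = rng.standard_normal((10, 10))

cmap = plt.get_cmap("Greens")
norm = Normalize(vmin=data.min(), vmax=data.max())
rgba_values = cmap(norm(data))  # shape (10, 10, 4)

# Convert every RGBA tuple to a '#rrggbb' string (alpha is dropped by default)
hex_codes = np.array([[to_hex(color) for color in row] for row in rgba_values])

print(hex_codes[0, 0])  # hex code of the upper-left cell
```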
Update
To readjust the colormap when the center and robust keywords are used in the call to sns.heatmap, you basically just have to redefine vmin and vmax. Looking at the relevant seaborn source code (http://github.com/mwaskom/seaborn/blob/master/seaborn/matrix.py#L202) as it stood when this was written, the changes to vmin and vmax below should do the trick. Other seaborn versions may compute these limits differently, so check the source of the version you have installed.
data = np.random.randn(10, 10)
center = 2
robust = False
cmap = cm.coolwarm
hm = sns.heatmap(data, cmap=cmap, center=center, robust=robust)
vmin = np.percentile(data, 2) if robust else data.min()
vmax = np.percentile(data, 98) if robust else data.max()
vmin += center
vmax += center
norm = Normalize(vmin=vmin, vmax=vmax)
rgba_values = cmap(norm(data))
Without any knowledge on the input data and arguments of heatmap you can get the colors from the underlying QuadMesh, knowing that the heatmap should be the first and only collection inside the axes that is returned by heatmap.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
data = np.array([[0,-2],[10,5]])
ax = sns.heatmap(data, center=0, cmap="bwr", robust=False)
im = ax.collections[0]
rgba_values = im.cmap(im.norm(im.get_array()))
Also see this answer. In contrast to AxesImage, QuadMesh may return a flat list of colors, depending on the Matplotlib version. In that case the above code gives you a 2D array whose columns are the RGBA color channels. If you need a 3D output with the first two dimensions matching the input data, reshape it to the shape of the input (the private attributes im._meshHeight and im._meshWidth held the same shape in older Matplotlib versions, but may not be available in newer ones):
rgba_values = rgba_values.reshape(data.shape + (4,))
Pandas offers kind='kde' when plotting. In my setting, I would prefer a kde density. The alternative kind='hist' offers the orientation option: orientation='horizontal', which is strictly necessary for what I am doing. Unfortunately, orientation is not available for kde.
At least this is what I think that happens because I get a
in set_lineprops
raise TypeError('There is no line property "%s"' % key)
TypeError: There is no line property "orientation"
Is there any straightforward alternative for plotting a kde horizontally as easily as it can be done for a histogram?
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
plt.ion()
ser = pd.Series(np.random.random(1000))
ax1 = plt.subplot(2,2,1)
ser.plot(ax = ax1, kind = 'hist')
ax2 = plt.subplot(2,2,2)
ser.plot(ax = ax2, kind = 'kde')
ax3 = plt.subplot(2,2,3)
ser.plot(ax = ax3, kind = 'hist', orientation = 'horizontal')
# not working lines below
ax4 = plt.subplot(2,2,4)
ser.plot(ax = ax4, kind = 'kde', orientation = 'horizontal')
Adding previously deleted answer as a community wiki because it's a helpful answer.
pandas.Series.plot.kde does not have an option to change the orientation of the plot.
Use scipy.stats.gaussian_kde to calculate the values, and plot them on a line with matplotlib.axes.Axes.plot.
Alternatively, seaborn.kdeplot is an option.
gaussian_kde is used under the hood by both .plot.kde and sns.kdeplot
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
from scipy.stats import gaussian_kde
# create subplots and don't share x and y axis ranges
fig, axes = plt.subplots(2, 2, figsize=(10, 10), sharex=False, sharey=False)
# flatten the axes for easy selection from a 1d array
axes = axes.flat
# create sample data
np.random.seed(2022)
ser = pd.Series(np.random.random(1000)).sort_values()
# plot example plots
ser.plot(ax=axes[0], kind='hist', ec='k')
ser.plot(ax=axes[1], kind='kde')
ser.plot(ax=axes[2], kind='hist', orientation='horizontal', ec='k')
# 1. create kde model
gkde = gaussian_kde(ser)
# 2. create a linspace to match the range over which the kde model is plotted
xmin, xmax = axes[1].get_xlim()
x = np.linspace(xmin, xmax, 1000)
# 3. plot the values
axes[3].plot(gkde(x), x)
# Alternatively, use seaborn.kdeplot and skip 1., 2., and 3.
# sns.kdeplot(y=ser, ax=axes[3])