Python plotting simple confusion matrix with minimal code [duplicate] - python

I have an array with confusion matrix values, let's say [[25, 4], [5, 17]], following an obvious [[tp, fp], [fn, tn]] order. Is there a way to plot it with matplotlib or something similar, with nice output yet minimal code? I would like to label the results as well.

You could draw a quick heatmap as follows using seaborn.heatmap():
import seaborn
import numpy as np
import matplotlib.pyplot as plt
data = [[25, 4], [5, 17]]
ax = seaborn.heatmap(data, xticklabels='PN', yticklabels='PN', annot=True, square=True, cmap='Blues')
ax.set_xlabel('Actual')
ax.set_ylabel('Predicted')
plt.show()
Result:
You can then tweak some settings to make it look prettier:
import seaborn
import numpy as np
import matplotlib.pyplot as plt
data = [[25, 4], [5, 17]]
ax = seaborn.heatmap(
data,
xticklabels='PN', yticklabels='PN',
annot=True, square=True,
cmap='Blues', cbar_kws={'format': '%.0f'}
)
ax.set_xlabel('Actual')
ax.set_ylabel('Predicted')
ax.xaxis.tick_top()
ax.xaxis.set_label_position('top')
plt.tick_params(top=False, bottom=False, left=False, right=False)
plt.yticks(rotation=0)
plt.show()
Result:
You could also adjust vmin= and vmax= so that the color changes accordingly.
Normalizing the data and using vmin=0, vmax=1 can also be an idea if you want the color to reflect percentages of total tests:
import seaborn
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
data = np.array([[25, 4], [5, 17]], dtype='float')
normalized = data / data.sum()
ax = seaborn.heatmap(
normalized, vmin=0, vmax=1,
xticklabels='PN', yticklabels='PN',
annot=data, square=True, cmap='Blues',
cbar_kws={'format': FuncFormatter(lambda x, _: "%.0f%%" % (x * 100))}
)
ax.set_xlabel('Actual')
ax.set_ylabel('Predicted')
ax.xaxis.tick_top()
ax.xaxis.set_label_position('top')
plt.tick_params(top=False, bottom=False, left=False, right=False)
plt.yticks(rotation=0)
plt.show()
Result:
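If you want each cell to show both the raw count and its share of the total, one option is to build the annotation strings yourself and pass them through annot= together with fmt=''. This is only a sketch: the helper name count_and_percent_labels is made up for illustration, and the seaborn call is left as a comment so the helper can be checked on its own.

```python
import numpy as np

def count_and_percent_labels(data):
    """Return one string per cell: the count, a line break, then the share of the total."""
    counts = np.asarray(data)
    shares = counts / counts.sum()
    labels = [f"{c}\n{s:.1%}" for c, s in zip(counts.ravel(), shares.ravel())]
    return np.array(labels).reshape(counts.shape)

labels = count_and_percent_labels([[25, 4], [5, 17]])

# Possible use with the normalized heatmap above (not executed here):
# seaborn.heatmap(normalized, annot=labels, fmt='', vmin=0, vmax=1, cmap='Blues')
```

Passing fmt='' matters because the annotations are already strings; the default number format would not apply to them.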

Related

Add row-wise accuracy to a seaborn heatmap

import seaborn as sb
import numpy as np
from matplotlib import pyplot as plt
A = np.array([[10, 5], [3, 10]], dtype=np.int32)
plt.figure()
sb.heatmap(
A,
square=True,
annot=True,
xticklabels=["False", "Positive"],
yticklabels=["False", "Positive"],
cbar=False,
fmt="2d",
)
plt.title("Example plot")
plt.show()
This shows an example of a heatmap. I wish to add the accuracy of each row to the left side of the image.
The plot should be similar to
Can this be achieved?
You can add the following lines to your code between the heatmap call and plt.title(...):
# Compute the values to be added to the plot
row_accuracies = [A[i][i] * 100 / A[i].sum() for i in range(A.shape[0])]
# Get the current axes (the ones the heatmap was drawn on)
ax = plt.gca()
# [OPTIONAL] Add ticks on the right side
ax.tick_params(axis='y', which='major', left=True, right=True, labelleft=True, labelright=False)
# Add text where the ticks are (roughly)
for i, acc in enumerate(row_accuracies):
ax.text(ax.get_xlim()[1] * 1.05, ax.get_yticks()[i] * 1.01, f'{acc:.2f}%')
This is the result:
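The same idea can be wrapped in two small helpers. This is a sketch rather than the answer's exact code: the names row_accuracies and annotate_rows and the x_offset parameter are invented here, and the text placement assumes the usual seaborn heatmap layout in which row i is centred at y = i + 0.5.

```python
import numpy as np

def row_accuracies(cm):
    """Percentage of each row's total that lies on the diagonal."""
    cm = np.asarray(cm, dtype=float)
    return 100 * np.diag(cm) / cm.sum(axis=1)

def annotate_rows(ax, values, x_offset=0.1):
    """Write one percentage just right of each heatmap row.

    Assumes `ax` holds a seaborn heatmap, so row i is centred at y = i + 0.5.
    """
    right_edge = max(ax.get_xlim())
    for i, value in enumerate(values):
        ax.text(right_edge + x_offset, i + 0.5, f"{value:.2f}%", va="center")

A = np.array([[10, 5], [3, 10]])
acc = row_accuracies(A)  # one value per row

# Possible use after the sb.heatmap(...) call (not executed here):
# annotate_rows(plt.gca(), acc)
```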

Seaborn Confusion Matrix (heatmap) 2 color schemes (correct diagonal vs wrong rest)

Background
In a confusion matrix, the diagonal represents the cases where the predicted label matches the correct label. So the diagonal is good, while all other cells are bad. To clarify what is good and what is bad in a CM for non-experts, I want to give the diagonal a different color than the rest. I want to achieve this with Python & Seaborn.
Basically I'm trying to achieve what this question does in R (ggplot2 Heatmap 2 Different Color Schemes - Confusion Matrix: Matches in Different Color Scheme than Missclassifications)
Normal Seaborn Confusion Matrix with heatmap
import numpy as np
import seaborn as sns
cf_matrix = np.array([[50, 2, 38],
[7, 43, 32],
[9, 4, 76]])
sns.heatmap(cf_matrix, annot=True, cmap='Blues') # cmap='OrRd'
Which results in this image:
Goal
I would like to color the non-diagonal cells with e.g. cmap='OrRd'. So I imagine there would be 2 colorbars, 1 blue for the diagonal and 1 for the other cells. Preferably the values of both colorbars match (so both e.g. 0-70 and not 0-70 and 0-40).
How would I approach this?
The following is not made with code, but with photo editing software:
You can use mask= in the call to heatmap() to choose which cells to show. Using two different masks for the diagonal and the off_diagonal cells, you can get the desired output:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
cf_matrix = np.array([[50, 2, 38],
[7, 43, 32],
[9, 4, 76]])
vmin = np.min(cf_matrix)
vmax = np.max(cf_matrix)
off_diag_mask = np.eye(*cf_matrix.shape, dtype=bool)
fig = plt.figure()
sns.heatmap(cf_matrix, annot=True, mask=~off_diag_mask, cmap='Blues', vmin=vmin, vmax=vmax)
sns.heatmap(cf_matrix, annot=True, mask=off_diag_mask, cmap='OrRd', vmin=vmin, vmax=vmax, cbar_kws=dict(ticks=[]))
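To see which cells each call actually draws, it can help to apply the two masks by hand. The sketch below uses only NumPy and replaces the hidden cells with NaN; the variable names shown_in_blues and shown_in_orrd are illustrative and do not appear in the answer.

```python
import numpy as np

cf_matrix = np.array([[50, 2, 38],
                      [7, 43, 32],
                      [9, 4, 76]])

diag = np.eye(*cf_matrix.shape, dtype=bool)  # True on the diagonal only

# seaborn hides the cells where the mask is True, so:
shown_in_blues = np.where(~diag, np.nan, cf_matrix)  # mask=~diag keeps the diagonal
shown_in_orrd = np.where(diag, np.nan, cf_matrix)    # mask=diag keeps everything else

print(shown_in_blues)
print(shown_in_orrd)
```

Because both calls share the same vmin and vmax, a given count maps to the same position on either colorbar.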
If you want to get fancy, you can create the axes using GridSpec to have a better layout:
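import matplotlib.gridspec
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
# reuses cf_matrix, vmin, vmax and off_diag_mask from the previous snippet
fig = plt.figure()
gs0 = matplotlib.gridspec.GridSpec(1,2, width_ratios=[20,2], wspace=0.05)
gs00 = matplotlib.gridspec.GridSpecFromSubplotSpec(1,2, subplot_spec=gs0[1], wspace=0)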
ax = fig.add_subplot(gs0[0])
cax1 = fig.add_subplot(gs00[0])
cax2 = fig.add_subplot(gs00[1])
sns.heatmap(cf_matrix, annot=True, mask=~off_diag_mask, cmap='Blues', vmin=vmin, vmax=vmax, ax=ax, cbar_ax=cax2)
sns.heatmap(cf_matrix, annot=True, mask=off_diag_mask, cmap='OrRd', vmin=vmin, vmax=vmax, ax=ax, cbar_ax=cax1, cbar_kws=dict(ticks=[]))
You could first plot the heatmap with colormap 'OrRd' and then overlay it with a heatmap with colormap 'Blues', with the upper and lower triangle values replaced with NaN's, see the following example:
import numpy as np
import seaborn as sns
def diagonal_heatmap(m):
    vmin = np.min(m)
    vmax = np.max(m)
    # base layer: every cell in 'OrRd'
    sns.heatmap(m, annot=True, cmap='OrRd', vmin=vmin, vmax=vmax)
    # overlay: only the diagonal keeps its values, all other cells are NaN
    diag_nan = np.full_like(m, np.nan, dtype=float)
    np.fill_diagonal(diag_nan, np.diag(m))
    sns.heatmap(diag_nan, annot=True, cmap='Blues', vmin=vmin, vmax=vmax, cbar_kws={'ticks':[]})
cf_matrix = np.array([[50, 2, 38],
[7, 43, 32],
[9, 4, 76]])
diagonal_heatmap(cf_matrix)
In a related question, somebody asked how to show a correlation dataframe where the colorbar range doesn't include the diagonal (which is all 1's).
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
iris = sns.load_dataset('iris').drop(columns=['species'])
corr_df = iris.corr()
plt.figure(figsize=(7,5))
ax = sns.heatmap(corr_df, annot=True, cmap='OrRd', mask=np.eye(len(corr_df)))
ax.set_yticklabels(ax.get_yticklabels(), va='center')
ax.patch.set_facecolor('skyblue')
ax.patch.set_edgecolor('white')
ax.patch.set_hatch('xx')
plt.tight_layout()
plt.show()
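As far as I can tell, seaborn derives its default color limits from the unmasked cells only, which is why hiding the diagonal changes the colorbar range. If you would rather set vmin= and vmax= explicitly, the off-diagonal range can be computed by hand. A minimal sketch, with a made-up correlation matrix and a hypothetical helper name:

```python
import numpy as np

def off_diagonal_range(matrix):
    """Smallest and largest value outside the main diagonal of a square matrix."""
    values = np.asarray(matrix, dtype=float)
    off_diag = values[~np.eye(len(values), dtype=bool)]
    return off_diag.min(), off_diag.max()

# Illustrative numbers only, not the iris correlations.
corr = np.array([[1.0, -0.1, 0.9],
                 [-0.1, 1.0, -0.4],
                 [0.9, -0.4, 1.0]])
lo, hi = off_diagonal_range(corr)

# Possible use (not executed here):
# sns.heatmap(corr_df, mask=np.eye(len(corr_df)), vmin=lo, vmax=hi, cmap='OrRd')
```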

Use Seaborn to plot 1D time series as a line with marginal histogram along y-axis

I'm trying to recreate the broad features of the following figure:
(from E.M. Ozbudak, M. Thattai, I. Kurtser, A.D. Grossman, and A. van Oudenaarden, Nat Genet 31, 69 (2002))
seaborn.jointplot does most of what I need, but it seemingly can't use a line plot, and there's no obvious way to hide the histogram along the x-axis. Is there a way to get jointplot to do what I need? Barring that, is there some other reasonably simple way to create this kind of plot using Seaborn?
Here is a way to create roughly the same plot as shown in the question. You can share the axes between the two subplots and make the width-ratio asymmetric.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(42)
x = np.linspace(0,8, 300)
y = np.tanh(x)+np.random.randn(len(x))*0.08
fig, (ax, axhist) = plt.subplots(ncols=2, sharey=True,
gridspec_kw={"width_ratios" : [3,1], "wspace" : 0})
ax.plot(x,y, color="k")
ax.plot(x,np.tanh(x), color="k")
axhist.hist(y, bins=32, ec="k", fc="none", orientation="horizontal")
axhist.tick_params(axis="y", left=False)
plt.show()
It turns out that you can produce a modified jointplot with the needed characteristics by working directly with the underlying JointGrid object:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
x = np.linspace(0,8, 300)
y = (1 - np.exp(-x*5))*.5
ynoise= y + np.random.randn(len(x))*0.08
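grid = sns.JointGrid(x=x, y=ynoise, ratio=3)
grid.plot_joint(plt.plot)
grid.ax_joint.plot(x, y, c='C0')
# horizontal histogram of the noisy values in the right-hand marginal axes
# (sns.distplot is deprecated in recent seaborn releases)
grid.ax_marg_y.hist(grid.y, orientation='horizontal', alpha=0.4)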
# override a bunch of the default JointGrid style options
grid.fig.set_size_inches(10,6)
grid.ax_marg_x.remove()
grid.ax_joint.spines['top'].set_visible(True)
Output:
You can use ax_marg_x.patches to affect the outcome.
Here, I use it to turn the x-axis plot white so that it cannot be seen (although the margin for it remains):
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="white", color_codes=True)
x, y = np.random.multivariate_normal([2, 3], [[0.3, 0], [0, 0.5]], 1000).T
g = sns.jointplot(x=x, y=y, kind="hex", marginal_kws={'color': 'green'})
plt.setp(g.ax_marg_x.patches, color="w", )
plt.show()
Output:

How can I use the matplotlib to draw this picture?

I want to use a grid to achieve this, but I have run into many problems with color fills and axes. I looked at an example in the official matplotlib documentation that is very close to this image
(the link), but it's still a little different.
Here is the picture:
My mistake: the picture is too large to implement quickly, so I chose a part of the original image, here:
Try plotting a custom heatmap. For example:
from matplotlib import colors
import matplotlib.pyplot as plt
import numpy as np
cmap = colors.ListedColormap(['cyan','gray','white','yellow'])
bounds=[0, 10, 20, 30, 40]
norm = colors.BoundaryNorm(bounds, cmap.N)
data=np.array([[5,15,5],[25,32,6],[15,31,25]])
heatmap = plt.pcolor(data, cmap=cmap, norm=norm)
plt.show()
A value between 0 and 10 will be drawn in cyan, 10 to 20 in gray, and so on, so build your data array accordingly.
Result :
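To check which color a given value ends up with, you can call the norm and the colormap directly, without plotting anything. A small sketch using the same cmap and bounds as above; the sample values are arbitrary:

```python
import numpy as np
from matplotlib import colors

cmap = colors.ListedColormap(['cyan', 'gray', 'white', 'yellow'])
bounds = [0, 10, 20, 30, 40]
norm = colors.BoundaryNorm(bounds, cmap.N)

values = np.array([5, 15, 25, 32])
bin_index = norm(values)  # index of the interval each value falls into
names = [colors.to_hex(cmap(int(i))) for i in bin_index]

print(list(zip(values.tolist(), names)))
```

Each value is first mapped to the interval between two neighbouring bounds, and that interval's index selects the entry in the color list.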
I've solved this question by using seaborn.
from matplotlib import colors
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
cmap = colors.ListedColormap(['white','gray','blue','yellow'])
bounds=[0, 2, 4, 6, 8]
norm = colors.BoundaryNorm(bounds, cmap.N)
data = np.array([[1,1,1,1,7,7,7,7], [1,1,1,1,1,1,1,5], [1,1,1,1,1,1,1,5], [1,1,1,3,1,1,1,5], [1,1,1,1,1,1,3,5]])
ax = sns.heatmap(data, cmap=cmap, norm=norm, linewidths=.5,
linecolor='black', square=True, cbar=False)
plt.annotate('S', (1.4, 3.4))
plt.show()
result

Statsmodel Probplot Tick customization

I've created a cumulative probability plot with StatsModels in Python, but there are way too many ticks on the axis.
I want there to be tick marks only at 0.1, 10, 50, 90, 99, and 99.9. Does anyone know how to make this work? I tried the code below, but it only gives me the first n ticks, making it pretty useless (see figure below).
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.ticker as ticker
import statsmodels.api as sm
csv = pd.read_csv('cumProbMaxData.csv')
data = csv.values.tolist()
flat_list = [item for sublist in data for item in sublist]
fig,ax = plt.subplots()
x = np.array(flat_list)
pp_x = sm.ProbPlot(x, fit=True)
figure = pp_x.probplot(exceed=False, ax=ax, marker='.', color='k', markersize=12)
plt.xlabel('Cumulative Probability (%)')
plt.ylabel('Maximum CO$_2$ Flux (g m$^-$$^2$ d$^-$$^1$)')
tick_spacing=5
ax.xaxis.set_major_locator(ticker.MaxNLocator(tick_spacing))
plt.tight_layout()
plt.show()
Statsmodels ProbPlot plots the data in their real units. Only the axis ticks are then changed to show percentage values. This is generally bad style, but you have to live with it if you want to use ProbPlot.
Because such a plot uses a FixedLocator and FixedFormatter, one way to show fewer ticks is to subsample the ticks that are shown. The tick labels you want are at indices locs = [0,3,6,10,14,17,20] (you want to show tick labels 0, 3, 6, etc.).
You can use this list to keep only those ticks, as shown below.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.api as sm
x = np.random.randn(200)
fig,ax = plt.subplots()
pp_x = sm.ProbPlot(x, fit=True)
pp_x.probplot(exceed=False, ax=ax, marker='.', color='k', markersize=12)
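locs = [0,3,6,10,14,17,20]
formatter, ticks = ax.xaxis.get_major_formatter(), ax.get_xticks()
# read the labels of the selected ticks before the tick positions change
labels = [formatter(ticks[i], i) for i in locs]
ax.set_xticks(ticks[locs])
ax.set_xticklabels(labels)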
plt.tight_layout()
plt.show()
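An alternative to subsampling by index is to compute the tick positions from the percentages themselves. The sketch below rests on an assumption that is not confirmed above: that the x-axis is in standard-normal quantiles, as it should be for the default normal distribution with fit=True. The helper name percent_tick_positions is made up, and only the standard library is used.

```python
from statistics import NormalDist

def percent_tick_positions(percents):
    """Standard-normal quantile for each cumulative percentage (assumed axis units)."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf(p / 100) for p in percents]

wanted = [0.1, 10, 50, 90, 99, 99.9]
positions = percent_tick_positions(wanted)

# Possible use on the probability plot's axes (not executed here):
# ax.set_xticks(positions)
# ax.set_xticklabels([str(p) for p in wanted])
```

This avoids having to know how many ticks statsmodels created, at the cost of depending on the assumed distribution.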
