Adjust size of ConfusionMatrixDisplay (ScikitLearn) - python

How do I set the size of the figure plotted by scikit-learn's ConfusionMatrixDisplay?
import numpy as np
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
cm = confusion_matrix(np.arange(25), np.arange(25))
cmp = ConfusionMatrixDisplay(cm, display_labels=np.arange(25))
cmp.plot()
The code above shows this figure, which is too tight:

You can pass a matplotlib Axes object to the .plot method of sklearn.metrics.ConfusionMatrixDisplay. Create the figure at the size you want with matplotlib.pyplot.subplots first.
import numpy as np
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
import matplotlib.pyplot as plt
cm = confusion_matrix(np.arange(25), np.arange(25))
cmp = ConfusionMatrixDisplay(cm, display_labels=np.arange(25))
fig, ax = plt.subplots(figsize=(10,10))
cmp.plot(ax=ax)
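If the display has already been drawn, the figure can also be resized afterwards. A minimal sketch, assuming the figure_ attribute that ConfusionMatrixDisplay sets once plot() has run; the 5-class matrix is made up for illustration:

```python
import numpy as np
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

# Hypothetical 5-class example; any confusion matrix works the same way
labels = np.arange(5)
cm = confusion_matrix(labels, labels)
disp = ConfusionMatrixDisplay(cm, display_labels=labels)
disp.plot()
# plot() stores the matplotlib Figure it drew on as disp.figure_
disp.figure_.set_size_inches(10, 10)
```

Passing ax= up front, as in the answer above, is usually tidier; resizing after the fact is mainly useful when the plotting call is outside your control.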

I was looking for a way to adjust the colorbar as well, as someone pointed out in the comments on the answer by @Raphael, so here is how to do it.
I used the attributes of ConfusionMatrixDisplay and, guided by this answer, modified the code to:
import numpy as np
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
import matplotlib.pyplot as plt
cm = confusion_matrix(np.arange(25), np.arange(25))
cmp = ConfusionMatrixDisplay(cm, display_labels=np.arange(25))
fig, ax = plt.subplots(figsize=(10,10))
# Deactivate default colorbar
cmp.plot(ax=ax, colorbar=False)
# Adding custom colorbar
cax = fig.add_axes([ax.get_position().x1+0.01,ax.get_position().y0,0.02,ax.get_position().height])
plt.colorbar(cmp.im_, cax=cax)
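Another way to keep the colorbar from towering over the matrix is to let matplotlib size it through fig.colorbar. A sketch, assuming the often-quoted fraction=0.046 and pad=0.04 values, which are a rule of thumb rather than anything exact:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

cm = confusion_matrix(np.arange(25), np.arange(25))
disp = ConfusionMatrixDisplay(cm, display_labels=np.arange(25))
fig, ax = plt.subplots(figsize=(10, 10))
# Skip the default colorbar, then attach one sized relative to the matrix axes
disp.plot(ax=ax, colorbar=False)
fig.colorbar(disp.im_, ax=ax, fraction=0.046, pad=0.04)
```

This avoids computing the colorbar position by hand, at the cost of less precise control than the add_axes approach.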

Related

Plot RidgeCV coefficients as a function of the regularization

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import RidgeCV
tips = sns.load_dataset('tips')
X = tips.drop(columns=['tip','sex', 'smoker', 'day', 'time'])
y = tips['tip']
alphas = 10**np.linspace(10,-2,100)*0.5
ridge_clf = RidgeCV(alphas=alphas,scoring='r2').fit(X, y)
ridge_clf.score(X, y)
I want to plot the following graph for RidgeCV, but I don't see any option to do that, as there is with GridSearchCV. I appreciate your suggestions!
There is no indication of what the colors stand for. I assume they stand for features and that we are investigating the size of each feature weight as a function of alpha. Here is my solution:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import RidgeCV
tips = sns.load_dataset('tips')
X = tips.drop(columns=['tip','sex', 'smoker', 'day', 'time'])
y = tips['tip']
alphas = 10**np.linspace(10,-2,100)*0.5
w = list()
for a in alphas:
    # Fit one model per alpha and keep its coefficients
    ridge_clf = RidgeCV(alphas=[a],cv=10).fit(X, y)
    w.append(ridge_clf.coef_)
w = np.array(w)
plt.semilogx(alphas,w)
plt.title('Ridge coefficients as function of the regularization')
plt.xlabel('alpha')
plt.ylabel('weights')
plt.legend(X.keys())
Output:
Since you only have two features in X there are only two lines.
Here is the code for generating the plot that you had posted.
First, note that RidgeCV does not return the coefficients for each alpha value passed in the alphas parameter.
RidgeCV tries each alpha value listed in alphas, scores them by cross-validation, and returns the best alpha along with the model fitted using it.
Hence, the only way to get the coefficients for each alpha value with cross-validation is to fit RidgeCV once per alpha value.
Example:
# Author: Fabian Pedregosa -- <fabian.pedregosa@inria.fr>
# License: BSD 3 clause
print(__doc__)
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
# X is the 10x10 Hilbert matrix
X = 1. / (np.arange(1, 11) + np.arange(0, 10)[:, np.newaxis])
y = np.ones(10)
# #############################################################################
# Compute paths
n_alphas = 200
alphas = np.logspace(-10, -2, n_alphas)
coefs = []
for a in alphas:
    # One fit per alpha, so coefs ends up with one row of weights per alpha
    ridge = linear_model.RidgeCV(alphas=[a], fit_intercept=False, cv=3)
    ridge.fit(X, y)
    coefs.append(ridge.coef_)
# #############################################################################
# Display results
ax = plt.gca()
ax.plot(alphas, coefs)
ax.set_xscale('log')
ax.set_xlim(ax.get_xlim()[::-1]) # reverse axis
plt.xlabel('alpha')
plt.ylabel('weights')
plt.title('RidgeCV coefficients as a function of the regularization')
plt.axis('tight')
plt.show()
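Because each RidgeCV in the loops above is given a single alpha, there is nothing for cross-validation to choose between, so fitting plain Ridge per alpha should trace the same path. A sketch on made-up synthetic data (the data, the alpha grid, and the variable names are assumptions for illustration), which also marks the alpha that RidgeCV picks when it is given the whole grid:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge, RidgeCV

# Synthetic regression problem, made up for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

alphas = np.logspace(-2, 4, 30)
# One plain Ridge fit per alpha gives one row of coefficients per alpha
coefs = np.array([Ridge(alpha=a).fit(X, y).coef_ for a in alphas])
# RidgeCV over the full grid reports the alpha it selected as alpha_
best_alpha = RidgeCV(alphas=alphas).fit(X, y).alpha_

fig, ax = plt.subplots()
ax.semilogx(alphas, coefs)
ax.axvline(best_alpha, linestyle="--", color="gray")
ax.set_xlabel("alpha")
ax.set_ylabel("weights")
ax.set_title("Ridge coefficients as a function of the regularization")
```

The dashed line shows where cross-validation lands on the path, which is the one piece of information the per-alpha loop does not give you.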

Confusion matrix in Google Colab is cut off

I tried to visualize my confusion matrix with the following code:
from mlxtend.plotting import plot_confusion_matrix
import matplotlib.pyplot as plt
import numpy as np
import sklearn as skplt
import scikitplot as skplt
skplt.metrics.plot_confusion_matrix(y_val, autokeras_predictions, figsize = (5, 5), title= 'My confusionmatrix' )
plt.figure(figsize = (10,7))
But it cuts off my confusion matrix at the top and bottom. (See picture)
Can anyone help me? Thanks!
I had the same problem.
I'm using Anaconda on Windows.
For me, the following resolved the problem:
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix  # assumed source of confusion_matrix
from mlxtend.plotting import plot_confusion_matrix
...
multiclass = confusion_matrix(test_y, predictions)
class_names = ['16QAM', '32QAM', '64QAM', 'BPSK', 'QPSK', '8PSK']
fig, ax = plot_confusion_matrix(conf_mat=multiclass, colorbar=True,
                                show_absolute=False, show_normed=True,
                                class_names=class_names)
# This is the line that fixed it; change the values until the plot fits your screen.
ax.margins(2, 2)
plt.show()
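If adjusting the margins does not help, a workaround sometimes used for heatmap-style plots whose first and last rows are clipped is to pin the y-limits explicitly. A sketch in plain matplotlib rather than mlxtend or scikitplot, with a made-up 3x3 matrix:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical 3x3 confusion matrix, for illustration only
cm = np.array([[5, 1, 0],
               [2, 6, 1],
               [0, 2, 7]])
n = cm.shape[0]
fig, ax = plt.subplots(figsize=(5, 5))
im = ax.imshow(cm, cmap="Blues")
fig.colorbar(im, ax=ax)
ax.set_xticks(range(n))
ax.set_yticks(range(n))
# Each cell spans half a unit either side of its index; pin the limits
# so the first and last rows are drawn in full (row 0 at the top)
ax.set_ylim(n - 0.5, -0.5)
for i in range(n):
    for j in range(n):
        ax.text(j, i, cm[i, j], ha="center", va="center")
fig.tight_layout()
```

Creating the figure with plt.subplots before drawing also avoids the stray empty figure that a later plt.figure(figsize=...) call produces.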

Add legend to numpy array in matplotlib

I am plotting 2D numpy arrays using
import numpy as np
import matplotlib.pyplot as plt
x = np.array([1,2,3])
y = np.array([[2,2.2,3],[1,5,1]])
plt.plot(x,y.T[:,:])
plt.legend()
plt.show()
I want a legend that tells which line belongs to which row. Of course, I realize I can't give the lines meaningful names, but I need some sort of unique label for each line without running through a loop.
import numpy as np
import matplotlib.pyplot as plt
import uuid
x = np.array([1,2,3])
y = np.array([[2,2.2,3],[1,5,1]])
fig, ax = plt.subplots()
lines = ax.plot(x,y.T[:,:])
ax.legend(lines, [str(uuid.uuid4())[:6] for j in range(len(lines))])
plt.show()
(This is off of the current mpl master branch with a preview of the 2.0 default styles)
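If the labels only need to identify the row, the row index is easier to read than a random ID. A minimal variation on the code above, where the "row i" naming is just an illustrative choice:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 2, 3])
y = np.array([[2, 2.2, 3], [1, 5, 1]])
fig, ax = plt.subplots()
# One call to plot returns one Line2D per column of y.T, i.e. per row of y
lines = ax.plot(x, y.T)
legend = ax.legend(lines, [f"row {i}" for i in range(len(lines))])
```

The list comprehension only builds the label strings; the plotting itself is still a single call.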

PyPlot does not plot image

I created the following test code, and the code runs fine. But the plot does not appear when executed. Did I miss something? I use pyplot to create the plots. When I use plt.savefig("test.png") the chart is created and saved.
import numpy
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt
from studentRegression import studentReg
from class_vis import prettyPicture, output_image
from ages_net_worth import ageNetWorthData
ages_train, ages_test, net_worths_train, net_worth_test = ageNetWorthData()
plt.clf()
plt.scatter(ages_train, net_worths_train, color="b", label="train data")
plt.legend(loc=2)
plt.xlabel("ages")
plt.ylabel("net worths")
plt.show()
# ageNetWorthData, imported from ages_net_worth above
import random
import numpy
from sklearn.model_selection import train_test_split  # was sklearn.cross_validation in old releases
def ageNetWorthData():
    random.seed(42)
    numpy.random.seed(42)
    ages = []
    for ii in range(100):
        ages.append( random.randint(20,65) )
    net_worths = [ii * 6.25 + numpy.random.normal(scale=40.) for ii in ages]
    ### need massage list into a 2d numpy array to get it to work in LinearRegression
    ages = numpy.reshape( numpy.array(ages), (len(ages), 1))
    net_worths = numpy.reshape( numpy.array(net_worths), (len(net_worths), 1))
    ages_train, ages_test, net_worths_train, net_worths_test = train_test_split(ages, net_worths)
    return ages_train, ages_test, net_worths_train, net_worths_test
You are using a "non-interactive" backend (agg), which can write figures to files but cannot open a window, so plt.show() displays nothing. Just remove the line:
matplotlib.use('agg')
You can check the docs here.
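To see which backend is active, you can ask matplotlib directly. A small sketch that reproduces the situation in the question with made-up points: under Agg the figure can still be rendered to PNG, which is why savefig worked while show did not.

```python
import io
import matplotlib
matplotlib.use("Agg")  # the non-interactive backend from the question
import matplotlib.pyplot as plt

# Report the active backend
print(matplotlib.get_backend())

fig, ax = plt.subplots()
# Made-up points standing in for the training data
ax.scatter([20, 35, 50], [120, 230, 310], color="b", label="train data")
ax.legend(loc=2)

# Rendering to a file-like object works without any display
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
```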

Changing the size of the heatmap specifically in a seaborn clustermap?

I'm making a clustered heatmap in seaborn as follows:
import numpy as np
import seaborn as sns
np.random.seed(2)
data = np.random.randn(100, 10)
sns.clustermap(data)
but the rows are squished:
If I pass a size to the clustermap function, it looks terrible:
Is there a way to increase the size of only the heatmap part, so that the row names can be read without stretching out the dendrograms?
As @mwaskom commented, I was able to use ax_heatmap.set_position along with the get_position function to achieve the correct result.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
np.random.seed(2)
data = np.random.randn(100, 10)
cm = sns.clustermap(data)
hm = cm.ax_heatmap.get_position()
plt.setp(cm.ax_heatmap.yaxis.get_majorticklabels(), fontsize=6)
cm.ax_heatmap.set_position([hm.x0, hm.y0, hm.width*0.25, hm.height])
col = cm.ax_col_dendrogram.get_position()
cm.ax_col_dendrogram.set_position([col.x0, col.y0, col.width*0.25, col.height*0.5])
This can also be done by passing a dendrogram ratio through the dendrogram_ratio keyword argument:
import numpy as np
import seaborn as sns
np.random.seed(2)
data = np.random.randn(100, 10)
sns.clustermap(data,figsize=(12,30),dendrogram_ratio=0.02,cmap='RdBu')
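dendrogram_ratio can also be given as a (row, col) pair, so the two dendrograms can be shrunk independently. A sketch with arbitrary ratio values chosen for illustration, which prints the resulting axes widths so the effect can be checked:

```python
import numpy as np
import seaborn as sns

np.random.seed(2)
data = np.random.randn(100, 10)
# (row, col) fractions of the figure given to the two dendrograms
g = sns.clustermap(data, figsize=(6, 12), dendrogram_ratio=(0.1, 0.05))
# Positions are in figure coordinates, so widths are fractions of the figure
heatmap_box = g.ax_heatmap.get_position()
row_dendrogram_box = g.ax_row_dendrogram.get_position()
print(heatmap_box.width, row_dendrogram_box.width)
```

A tall figsize combined with small ratios gives the heatmap rows room without enlarging the dendrograms by the same factor.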
