Confusion Matrix with number of classified/misclassified instances on it (Python/Matplotlib)
I am plotting a confusion matrix with matplotlib with the following code:
from numpy import *
import matplotlib.pyplot as plt
from pylab import *
conf_arr = [[33,2,0,0,0,0,0,0,0,1,3], [3,31,0,0,0,0,0,0,0,0,0], [0,4,41,0,0,0,0,0,0,0,1], [0,1,0,30,0,6,0,0,0,0,1], [0,0,0,0,38,10,0,0,0,0,0], [0,0,0,3,1,39,0,0,0,0,4], [0,2,2,0,4,1,31,0,0,0,2], [0,1,0,0,0,0,0,36,0,2,0], [0,0,0,0,0,0,1,5,37,5,1], [3,0,0,0,0,0,0,0,0,39,0], [0,0,0,0,0,0,0,0,0,0,38] ]
norm_conf = []
for i in conf_arr:
    a = 0
    tmp_arr = []
    a = sum(i,0)
    for j in i:
        tmp_arr.append(float(j)/float(a))
    norm_conf.append(tmp_arr)
plt.clf()
fig = plt.figure()
ax = fig.add_subplot(111)
res = ax.imshow(array(norm_conf), cmap=cm.jet, interpolation='nearest')
cb = fig.colorbar(res)
savefig("confmat.png", format="png")
But I want the confusion matrix to show the numbers on it, like this graphic (the right one). How can I plot the values of conf_arr on the graphic?
You can use text to put arbitrary text in your plot. For example, inserting the following lines into your code will write the numbers (note the first and last lines are from your code to show you where to insert my lines):
res = ax.imshow(array(norm_conf), cmap=cm.jet, interpolation='nearest')
for i, cas in enumerate(conf_arr):
    for j, c in enumerate(cas):
        if c>0:
            plt.text(j-.2, i+.2, c, fontsize=14)
cb = fig.colorbar(res)
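As a self-contained illustration of the text approach above, here is a minimal sketch. It assumes a small made-up 3-class matrix rather than the 11-class one from the question, and it centres each number in its cell with ha/va instead of the manual -.2/+.2 offsets. The Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted class).
conf_arr = np.array([[33, 2, 1],
                     [3, 31, 0],
                     [0, 4, 41]])

# Normalise each row so the colours show per-class rates.
norm_conf = conf_arr.astype(float) / conf_arr.sum(axis=1, keepdims=True)

fig, ax = plt.subplots()
res = ax.imshow(norm_conf, cmap="jet", interpolation="nearest")

# imshow draws cell (row i, column j) centred at x=j, y=i.
for (i, j), count in np.ndenumerate(conf_arr):
    if count > 0:
        ax.text(j, i, str(count), ha="center", va="center", fontsize=14)

fig.colorbar(res)
```

Calling fig.savefig("confmat.png", format="png") afterwards writes the annotated matrix to disk, as in the question.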
The only way I could really see of doing it was to use annotations. Try these lines:
for i, j in ((x, y) for x in range(len(conf_arr))
             for y in range(len(conf_arr[0]))):
    # imshow puts columns on the x axis and rows on the y axis, so xy is (column, row)
    ax.annotate(str(conf_arr[i][j]), xy=(j, i))
before saving the figure. It adds the numbers, but I'll let you figure out how to get the sizes of the numbers how you want them.
Related
How to give the markers in the pyplot scatter plot a custom RGB color?
I'm using the scatter function from the matplotlib.pyplot library to visualize certain data in a 3D scatter plot. This function (link to documentation) has an argument called 'c' which can be used to give the marker a particular color. They note that the argument given for 'c' can be an RGB code stored in a 2D array. However, when I try the code below it gives an error: "AttributeError: 'list' object has no attribute 'shape'".
My question: Is it possible to give the markers in the scatter plot a custom color in RGB format? And if it is, how can I achieve this? Any help would be greatly appreciated.
Edit: Solution: the values in the list 'RGB' should range from 0 to 1 instead of 0 to 255.
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import csv
RGB = [[255,255,255], [127,127,127], [10,10,10]]
r = []
g = []
b = []
a = 0
i = 0
j = 0
k = 0
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
mark = (".",",","o","v","^","<",">","1","2","3","4","8","s","p","P","*","h","H","+","x","X" ) # tuple for available markers from the matplotlib library
with open ('Color_Measurement_POCNR1_wolkjes.csv', 'r') as csvfile:
    plots = csv.reader(csvfile, delimiter = ';')
    for row in plots:
        if a == 1:
            r.append(float(row[6]))
            g.append(float(row[7]))
            b.append(float(row[8]))
            print("j = ", j, "r = ", r[j]," g = ", g[j], " b = ", b[j])
            ax.scatter(r[j], g[j], b[j], c= RGB[0], marker = mark[k] , s = 50)
            a = 0
            j += 1
            k += 1
        if row[0] == "Mean values":
            a += 1
ax.set_xlabel('R Label')
ax.set_ylabel('G Label')
ax.set_zlabel('B Label')
plt.show()
"They note that the argument given for 'c' can be a RGB code stored in a 2D array." Exactly. However, your code passes a 1D list. You can wrap this list in another list to make it 2D:
ax.scatter( ..., c= [RGB[0]])
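Putting the two points together (a 2D color argument, and values in the 0 to 1 range as noted in the question's edit), a minimal sketch could look like the following. The coordinates and the name rgb_unit are made up for illustration, and the Agg backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3d projection on older matplotlib)

RGB = [[255, 255, 255], [127, 127, 127], [10, 10, 10]]  # 0-255 values as in the question

# Scale to the 0-1 range that matplotlib expects for RGB colors.
rgb_unit = np.array(RGB) / 255.0

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")

# Hypothetical coordinates, one point per color.
xs, ys, zs = [1, 2, 3], [1, 2, 3], [1, 2, 3]

# One point, one color: wrap the color in a list so `c` is 2D (shape 1x3).
ax.scatter(xs[0], ys[0], zs[0], c=[rgb_unit[1]], marker="o", s=50)

# Several points at once: pass one RGB row per point (shape Nx3).
ax.scatter(xs, ys, zs, c=rgb_unit, marker="^", s=50)
```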
Matplotlib: Automatic coloured legend for all subplots using subplot line labels
The code below achieves what I want to do, but does so in a very roundabout way. I have looked around for a succinct way to produce a single legend for a figure that includes multiple subplots and takes their labels into account, to no avail. plt.figlegend() requires you to pass in labels and lines, and plt.legend() requires only handles (slightly better). My example below illustrates what I want. I have 9 vectors, each with one of 3 categories. I want to plot each vector on a separate subplot, label it, and plot a legend which indicates (using colour) what the label means; this is the automatic behaviour on a single plot. Do you know of a better way of achieving the plot below?
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
nr_lines = 9
nr_cats = 3
np.random.seed(1337)
# Data
X = np.random.randn(nr_lines, 100)
labels = ['Category {}'.format(ii) for ii in range(nr_cats)]
y = np.random.choice(labels, nr_lines)
# Ideally wouldn't have to manually pick colours
clrs = matplotlib.rcParams['axes.prop_cycle'].by_key()['color']
clrs = [clrs[ii] for ii in range(nr_cats)]
lab_clr = {k: v for k, v in zip(labels, clrs)}
fig, ax = plt.subplots(3, 3)
ax = ax.flatten()
for ii in range(nr_lines):
    ax[ii].plot(X[ii,:], label=y[ii], color=lab_clr[y[ii]])
lines = [a.lines[0] for a in ax]
l_labels = [l.get_label() for l in lines]
# the hack - get a single occurrence of each label
idx_list = [l_labels.index(lab) for lab in labels]
lines_ = [lines[idx] for idx in idx_list]
#l_labels_ = [l_labels[idx] for idx in idx_list]
plt.legend(handles=lines_, bbox_to_anchor=[2, 2.5])
plt.tight_layout()
plt.savefig('/home/james/Downloads/stack_figlegend_example.png', bbox_inches='tight')
You could use a dictionary to collect them using the label as a key. For example:
handles = {}
for ii in range(nr_lines):
    l1, = ax[ii].plot(X[ii,:], label=y[ii], color=lab_clr[y[ii]])
    if y[ii] not in handles:
        handles[y[ii]] = l1
plt.legend(handles=handles.values(), bbox_to_anchor=[2, 2.5])
You only add a handle to the dictionary if the category isn't already present.
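For reference, here is a self-contained sketch of the dictionary idea. It is one possible variation, not the answer's exact code: it uses setdefault to keep the first line per category, and fig.legend with loc="upper right" instead of plt.legend with bbox_to_anchor, so the legend is attached to the figure rather than to the last subplot. The Agg backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(1337)
nr_lines, nr_cats = 9, 3
X = rng.randn(nr_lines, 100)
labels = ["Category {}".format(ii) for ii in range(nr_cats)]
y = [str(c) for c in rng.choice(labels, nr_lines)]

# Map each category to one colour from the default colour cycle.
colours = plt.rcParams["axes.prop_cycle"].by_key()["color"][:nr_cats]
lab_clr = dict(zip(labels, colours))

fig, axes = plt.subplots(3, 3)
handles = {}
for ax, row, cat in zip(axes.flat, X, y):
    line, = ax.plot(row, label=cat, color=lab_clr[cat])
    handles.setdefault(cat, line)  # keep only the first line seen per category

# Sort by label so the legend order does not depend on plotting order.
ordered_labels = sorted(handles)
ordered = [handles[k] for k in ordered_labels]
fig.legend(handles=ordered, labels=ordered_labels, loc="upper right")
```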
Matplotlib scatterplot error bars two data sets
I have two data sets, which I'd like to scatter plot next to each other with error bars. Below is my code to plot one data set with error bars, and also the code to generate the second data set. I'd like the points and errors of both data sets to be adjacent for each value. I'd also like to remove the line connecting the dots.
import random
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as ss
data = []
n = 100
m = 10
for i in xrange(m):
    d = []
    for j in xrange(n):
        d.append(random.random())
    data.append(d)
mean_data = []
std_data = []
for i in xrange(m):
    mean = np.mean(data[i])
    mean_data.append(mean)
    std = np.std(data[i])
    std_data.append(std)
df_data = [n] * m
plt.errorbar(range(m), mean_data, yerr=ss.t.ppf(0.95, df_data)*std_data)
plt.scatter(range(m), mean_data)
plt.show()
new_data = []
for i in xrange(m):
    d = []
    for j in xrange(n):
        d.append(random.random())
    new_data.append(d)
mean_new_data = []
std_new_data = []
for i in xrange(m):
    mean = np.mean(new_data[i])
    mean_new_data.append(mean)
    std = np.std(new_data[i])
    std_new_data.append(std)
df_new_data = [n] * m
To remove the line in the scatter plot, use the fmt argument in plt.errorbar(). The plt.scatter() call is then no longer needed. To plot a second set of data, simply call plt.errorbar() a second time with the new data. If you don't want the datasets to overlap, you can add some small random scatter in x to the new dataset. You can do this in two ways: add a single scatter float with random.uniform(-x_scatter, x_scatter), which moves all the points as one, or generate a random scatter float for each point with x_scatter = np.random.uniform(-.5, .5, m). To plot both datasets (using the second method), you can use:
plt.errorbar(
    range(m), mean_data, yerr=ss.t.ppf(0.95, df_data)*std_data, fmt='o',
    label="Data")
# Add some random scatter in x
x_scatter = np.random.uniform(-.5, .5, m)
plt.errorbar(
    np.arange(m) + x_scatter, mean_new_data,
    yerr=ss.t.ppf(0.95, df_new_data)*std_new_data, fmt='o', label="New data")
plt.legend()
plt.show()
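If the goal is for the two series to sit side by side at every x value, a fixed offset is an alternative to random scatter. The sketch below is a different technique from the answer above and makes its own assumptions: the offset value is arbitrary, and plain standard deviations are used as error bars instead of the t-quantile scaling in the question, so that it does not depend on scipy. The Agg backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
n, m = 100, 10
data = rng.rand(m, n)       # m groups of n uniform samples
new_data = rng.rand(m, n)

x = np.arange(m)
offset = 0.15  # hypothetical half-gap between the two series

fig, ax = plt.subplots()
# fmt="o" draws markers only, so no connecting line appears.
ax.errorbar(x - offset, data.mean(axis=1), yerr=data.std(axis=1),
            fmt="o", label="Data")
ax.errorbar(x + offset, new_data.mean(axis=1), yerr=new_data.std(axis=1),
            fmt="o", label="New data")
ax.set_xticks(x)  # keep the ticks on the original integer positions
ax.legend()
```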
Optimize file reader and plotting script using Axes3D
I've written the code below to import a number of files from a folder, then read and convert them before plotting them in a 3D plot. The number of files is usually larger than 30 and lower than 200, but exceptions might occur. Each file has around 5000 lines with 3 plottable values. It works and produces a nice 3D plot, but it is very slow. I suspect I have made an array or list grow inside itself, and I particularly suspect the third for loop. I've tried to run it using 121 files and it takes about half an hour to plot. Each data file is a diffractogram, and what I want to do is essentially something like this: http://www.carbonhagen.com/_/rsrc/1404718703333/abstracts/insitux-raydiffractionsynthesisofgrapheneoxideandreducedgrapheneoxide/M%C3%B8ller%20Storm%20res%20pic.png?height=371&width=522
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.collections import PolyCollection
import glob
import os
file_list = glob.glob(os.path.join(os.getcwd(),'C:\Users\mkch\Python_Scripts\znox_1','*.ras'))
sourcefiles = []
for file_path in file_list: # this for loop reads all files
    with open(file_path) as f_input:
        sourcefiles.append(f_input.readlines())
Now all the files have been imported into the list sourcefiles.
data = []
alldata = []
cutdata = []
length = 118#len(sourcefiles)
for i in range(0,length):
    l = len(sourcefiles[i])
    cdata = sourcefiles[i][320:l-2]
    cutdata.append(cdata)
This for loop removes the header lines and the last two lines in each file.
fig = plt.figure()
ax = fig.gca(projection='3d')
verts = []
zs = list(range(length))
print zs
for j in range(length):
    lines = cutdata[j][:]
    x = []
    y = []
    z = []
    for line in lines:
        a, b, c = line.split()[0:3]
        x.append(a)
        y.append(b)
    y[0], y[-1] = 0, 0
    verts.append(list(zip(x, y)))
poly = PolyCollection(verts, facecolors=['r', 'g', 'b','y'])
ax.add_collection3d(poly, zs=zs, zdir='y')
This bit of code splits each line into the three values that need plotting. Then it adds the data to a plot. I suspect the above code is taking quite long.
poly.set_alpha(0.7)
ax.set_xlim3d(0, 100)
ax.set_ylabel('Y')
ax.set_ylim3d(-1, 120)
ax.set_zlabel('Z')
ax.set_zlim3d(0, 120000)
plt.xlabel('2$ \theta$')
plt.show()
Standard plotting things.
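No answer is recorded for this question. One thing that may be worth checking is that x and y are appended as strings straight from line.split(), so the vertices reach matplotlib as text rather than numbers. Whether that explains the slowdown is not established here. The sketch below only shows one way to parse each file's lines into a float array in a single step; the helper name parse_xy and the sample lines are made up, and it assumes the columns are whitespace-separated numbers.

```python
import numpy as np

def parse_xy(lines):
    """Return an (N, 2) float array from the first two whitespace-separated columns."""
    return np.array([line.split()[:2] for line in lines], dtype=float)

# Hypothetical three-line excerpt in the "x y z" layout the question describes.
sample = ["10.00 152.0 1.0\n", "10.02 148.0 1.0\n", "10.04 161.0 1.0\n"]

xy = parse_xy(sample)
xy[[0, -1], 1] = 0  # pin both ends to the baseline, like y[0], y[-1] = 0, 0

# One (N, 2) array per file; collect them all, then build a single PolyCollection.
verts = [xy]
```

Each entry of verts is then already numeric, and PolyCollection(verts, ...) can be built once after all files have been parsed.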
How to plot the lines first and points last in matplotlib
I have a simple plot with several sets of points and lines connecting each set. I want the points to be plotted on top of the lines (so that the line doesn't show inside the point). Regardless of the order of the plot and scatter calls, this plot comes out the same, and not as I'd like. Is there a simple way to do it?
import math
import matplotlib.pyplot as plt
def poisson(m):
    def f(k):
        e = math.e**(-m)
        f = math.factorial(k)
        g = m**k
        return g*e/f
    return f
R = range(20)
L = list()
means = (1,4,10)
for m in means:
    f = poisson(m)
    L.append([f(k) for k in R])
colors = ['r','b','purple']
for c,P in zip(colors,L):
    plt.plot(R,P,color='0.2',lw=1.5)
    plt.scatter(R,P,s=150,color=c)
ax = plt.axes()
ax.set_xlim(-0.5,20)
ax.set_ylim(-0.01,0.4)
plt.savefig('example.png')
You need to set the Z-order.
plt.plot(R,P,color='0.2',lw=1.5, zorder=1)
plt.scatter(R,P,s=150,color=c, zorder=2)
Check out this example: http://matplotlib.sourceforge.net/examples/pylab_examples/zorder_demo.html
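Applied to the whole script from the question, the zorder fix could be sketched like this. It is a condensed rewrite rather than the asker's exact code: the helper name poisson_pmf is made up, and the Agg backend line is only there so it runs without a display.

```python
import math
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt

def poisson_pmf(k, mean):
    """Poisson probability of observing k events when the expected count is mean."""
    return mean**k * math.exp(-mean) / math.factorial(k)

ks = list(range(20))
fig, ax = plt.subplots()
for colour, mean in zip(["r", "b", "purple"], (1, 4, 10)):
    probs = [poisson_pmf(k, mean) for k in ks]
    ax.plot(ks, probs, color="0.2", lw=1.5, zorder=1)     # lines underneath
    ax.scatter(ks, probs, s=150, color=colour, zorder=2)  # markers on top

ax.set_xlim(-0.5, 20)
ax.set_ylim(-0.01, 0.4)
```

Artists with a higher zorder are drawn later, so the markers cover the lines regardless of the order of the plot and scatter calls.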