Save output of python program in text file [duplicate] - python

This question already has answers here:
Dump a NumPy array into a csv file
(12 answers)
Closed 4 years ago.
I want to save the output of my Python program to a text file so that I can use it in a C program, but I don't know how to do it. The code is:
import networkx as nx
import numpy as np
t_start=0;t_end=1;dt=0.1
tpoints=np.arange(t_start,t_end,dt)
G = nx.grid_2d_graph(20,20, periodic=False, create_using=None)
adj_matrix=nx.adjacency_matrix(G)
print(adj_matrix.todense())
If the number of nodes were less than 20 (like 10 or lower) the output would be:
[[0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0]
[1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0]
[0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0]
[0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0]
[1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0]
[0 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0]
[0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 0]
[0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0]
[0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0]
[0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0]
[0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0]
[0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1]
[0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0]
[0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0]
[0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1]
[0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0]]
But when the number of nodes increases, the output looks like this:
[[0 1 0 ... 0 0 0]
[1 0 1 ... 0 0 0]
[0 1 0 ... 0 0 0]
...
[0 0 0 ... 0 1 0]
[0 0 0 ... 1 0 1]
[0 0 0 ... 0 1 0]]
So I can't copy it manually into a text file. I need a command that writes the complete matrix to a text file. Thanks for your answers.

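The printed representation of a NumPy array is not meant to show every element; large arrays are abbreviated with an ellipsis.
If you want to save the array to a file, you can use ndarray.tofile, whose sep argument controls whether the file is binary (the default, an empty separator) or text. Note that tofile writes the elements as one flat sequence without row breaks; numpy.savetxt keeps one matrix row per line, which is usually easier to parse from C.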
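As a minimal sketch of writing the full matrix as text, numpy.savetxt can be used. The random matrix below is only a stand-in for the question's adjacency matrix, and the file name is hypothetical.

```python
import os
import tempfile

import numpy as np

# Stand-in for adj_matrix.toarray(): a 400 x 400 matrix of 0s and 1s,
# the size produced by a 20 x 20 grid graph.
rng = np.random.default_rng(0)
matrix = rng.integers(0, 2, size=(400, 400))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "adj_matrix.txt")  # hypothetical file name

    # One matrix row per line, entries written as integers separated by spaces.
    np.savetxt(path, matrix, fmt="%d", delimiter=" ")

    # Read the file back to confirm that nothing was abbreviated.
    restored = np.loadtxt(path, dtype=int)

print(restored.shape)
```

With the question's code, the equivalent call would be np.savetxt("adj_matrix.txt", adj_matrix.toarray(), fmt="%d"). If the goal is only to print every element, np.set_printoptions(threshold=sys.maxsize) turns off the abbreviation.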


How to create a graph with an image's pixel?

I have an image, and I want to generate a weighted graph G=(V,E) from it, where V is the vertex set and E is the edge set (each pixel in the image is a node in the graph).
But I don't know how to do it.
Can anyone help me? A Python solution would be best.
Thank you very much.
Problem supplement
I'm sorry that my description of the problem was not clear enough.
My goal is to use the pixels of the image as the nodes of a network, and then analyze the properties of the network to (possibly) detect a target.
But as a first step, I need to build this network. My question is how to use the pixels of an (RGB) image as nodes when building the network for image analysis.
The edges between these nodes may be based on some of their characteristics (location, appearance, etc.).
So I just want to know how to build this network.
A few simple examples would be enough. Thank you.
I was looking for nice vectorised answers too and didn't find any, so in the end I wrote one myself. My aim is also to make these calculations as fast as possible.
Let's start with this nice 28 x 27 image:
import numpy as np
x, y = np.meshgrid(np.linspace(-np.pi/2, np.pi/2, 30), np.linspace(-np.pi/2, np.pi/2, 30))
image = (np.sin(x**2+y**2)[1:-1,1:-2] > 0.9).astype(int) #boolean image
image
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0]
[0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0]
[0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0]
[0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0]
[0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0]
[0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0]
[0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0]
[0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0]
[0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1]
[0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1]
[0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1]
[0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1]
[0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1]
[0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1]
[0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1]
[0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1]
[0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0]
[0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0]
[0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0]
[0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0]
[0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0]
[0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0]
[0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0]
[0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
Networkx
The idea of the algorithm is to find the coordinates of the unit (value 1) pixels that have a unit neighbour to the right or below, by combining the image with a copy of itself shifted by one position. Nodes of a networkx graph can be any hashable objects, so we can use coordinate tuples to label them. This is quite easy to implement, although not efficient, because it requires converting the rows of an np.array into tuples:
#CONSTRUCTION OF HORIZONTAL EDGES
hx, hy = np.where(image[1:] & image[:-1]) #horizontal edge start positions
h_units = np.array([hx, hy]).T
h_starts = [tuple(n) for n in h_units]
h_ends = [tuple(n) for n in h_units + (1, 0)] #end positions = start positions shifted by vector (1,0)
horizontal_edges = zip(h_starts, h_ends)
#CONSTRUCTION OF VERTICAL EDGES
vx, vy = np.where(image[:,1:] & image[:,:-1]) #vertical edge start positions
v_units = np.array([vx, vy]).T
v_starts = [tuple(n) for n in v_units]
v_ends = [tuple(n) for n in v_units + (0, 1)] #end positions = start positions shifted by vector (0,1)
vertical_edges = zip(v_starts, v_ends)
And let's see how it looks:
import networkx as nx
import matplotlib.pyplot as plt
G = nx.Graph()
G.add_edges_from(horizontal_edges)
G.add_edges_from(vertical_edges)
pos = dict(zip(G.nodes(), G.nodes())) # map node names to coordinates
nx.draw_networkx(G, pos, with_labels=False, node_size=0)
labels={node: f'({node[0]},{node[1]})' for node in G.nodes()}
nx.draw_networkx_labels(G, pos, labels, font_size=6, font_family='serif', font_weight='bold', bbox = dict(fc='lightblue', ec="black", boxstyle="round", lw=1))
plt.show()
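The shifted-AND trick used for the edge construction can be checked by hand on a tiny image. The following is a sketch with a made-up 3 x 3 image and hypothetical variable names; it uses NumPy only and builds plain Python tuples.

```python
import numpy as np

# A tiny 3 x 3 binary image; the unit pixels form an L shape.
image = np.array([[1, 0, 0],
                  [1, 0, 0],
                  [1, 1, 0]])

# Pairs of unit pixels that are adjacent along the first axis:
# (i, j) and (i + 1, j) are both 1.
r, c = np.where(image[1:] & image[:-1])
edges_axis0 = [((i, j), (i + 1, j)) for i, j in zip(r.tolist(), c.tolist())]

# Pairs of unit pixels that are adjacent along the second axis:
# (i, j) and (i, j + 1) are both 1.
r, c = np.where(image[:, 1:] & image[:, :-1])
edges_axis1 = [((i, j), (i, j + 1)) for i, j in zip(r.tolist(), c.tolist())]

print(edges_axis0)
print(edges_axis1)
```

One consequence of building the graph only from edges is that an isolated unit pixel, which has no unit neighbour, never becomes a node.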
igraph
Networkx is written in pure Python and is slow on big data (such as images with millions of pixels). igraph, on the other hand, is implemented in C, but it has less community support: the documentation is less detailed, and it uses its own visualisation tools instead of matplotlib. So igraph can be the more complicated option, but if you use it, the performance gain is substantial. A few facts are important to know before implementing the algorithm:
Vertex indices must be integers starting from 0. This means that if you pass anything else to Graph.add_vertices(), the vertices are reindexed as 0, 1, 2, ... and the original names are kept in g.vs['name'].
Graph.add_edges() does not accept edges that refer to nonexistent vertex indices (anything other than 0, 1, 2, ...).
Taking these requirements into consideration, it's a good idea to flatten the image, i.e. relabel the pixels with the integers 0, 1, 2, ... Here we go:
import igraph as ig
def create_from_edges(edgearray):
    #This function imitates the behaviour of nx.Graph.add_edges_from for an empty graph
    g = ig.Graph()
    u, inv = np.unique(edgearray, return_inverse=True)
    e = inv.reshape(edgearray.shape)
    g.add_vertices(u) #add vertices, in any order
    g.add_edges(e) #add edges, in reindexed order
    return g #old indices are kept in g.vs['name']
#Create array of edges with image pixels enumerated from 0 to N-1
image_idx = np.arange(image.size).reshape(*image.shape) #pixels of image indexed with numbers 0 to N-1
X, Y = (units.reshape(image.size) for units in np.indices(image.shape)) #X and Y coordinates of image_idx
idx = np.array([X, Y]).T #layout of nodes
hx, hy = np.where(image[1:] & image[:-1]) #horizontal edges as 2D indices
h_starts_idx = image_idx[hx, hy] #image_idx where horizontal edge starts
h_ends_idx = image_idx[hx+1, hy] #image_idx where horizontal edge ends
vx, vy = np.where(image[:, 1:] & image[:, :-1]) #vertical edges as 2D indices
v_starts_idx = image_idx[vx, vy] #image_idx where vertical edge starts
v_ends_idx = image_idx[vx, vy+1] #image_idx where vertical edge ends
edgearray = np.vstack([np.array([h_starts_idx, h_ends_idx]).T,
                       np.array([v_starts_idx, v_ends_idx]).T])
g = create_from_edges(edgearray)
And there's my sketch that illustrates new order of vertex names:
ig.plot(g, bbox=(450, 450),
        layout = ig.Layout(idx[g.vs['name']].tolist()), #only lists can be passed in to layout
        vertex_color = 'lightblue', vertex_label = g.vs['name'], vertex_size=14, vertex_label_size=8)
requirements: python-igraph, pycairo (for plotting).
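The reindexing step inside create_from_edges relies on np.unique with return_inverse=True. Here is a small NumPy-only sketch of that step, using a made-up edge list with arbitrary integer labels.

```python
import numpy as np

# Hypothetical edge list whose endpoints are arbitrary integer labels.
edgearray = np.array([[10, 42],
                      [42, 7],
                      [7, 10]])

# u holds the sorted distinct labels; inv gives, for every entry of
# edgearray, the position of its label in u.
u, inv = np.unique(edgearray, return_inverse=True)
e = inv.reshape(edgearray.shape)

print(u.tolist())  # original labels, indexed by new vertex id
print(e.tolist())  # the same edges expressed with ids 0..len(u)-1
```

Indexing u with e recovers the original edge list, which is why the old labels can be stored as vertex names and looked up later.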

How to add Numpy ndarrays together? [duplicate]

This question already has answers here:
Simple adding two arrays using numpy in python?
(2 answers)
Closed 2 years ago.
I'm trying to add multiple arrays together and I'm stuck.
For example, I have those two arrays:
[[1 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 1 0 1]
[0 0 1 0 0 0 0 1 0 0]
[1 0 0 0 1 0 0 0 0 0]
[1 1 0 0 0 0 0 0 2 0]
[0 0 0 0 0 1 0 0 0 0]
[0 0 0 1 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 1]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 2]]
and
[[1 0 0 0 0 0 0 0 0 0]
[0 1 0 0 1 0 0 0 0 1]
[0 0 1 0 0 0 0 0 0 0]
[0 1 1 0 0 0 0 0 0 0]
[0 0 0 0 2 0 0 0 0 0]
[0 0 0 0 0 1 0 0 0 0]
[1 0 0 0 0 0 1 0 1 0]
[0 1 0 0 0 1 0 1 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]]
How can I add them together so that each element of the resulting array is the sum of the elements at the same position in the two arrays? For example:
element in the 1st row and 1st column (top left) of the 1st array + element in the 1st row and 1st column (top left) of the 2nd array = 2
So the element in the 1st row and 1st column of the resulting array will be 2, and so on for every element.
Thanks
Try the np.add function, which adds NumPy arrays element-wise:
result = np.add(firstarray, secondarray)
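Since the question mentions adding multiple arrays, here is a short sketch with three small made-up arrays of the same shape. It shows the + operator for a pair and a stacked sum for any number of arrays; the variable names are illustrative only.

```python
import numpy as np

a = np.array([[1, 0], [0, 2]])
b = np.array([[1, 1], [0, 0]])
c = np.array([[0, 0], [3, 1]])

# Two arrays: the + operator and np.add give the same element-wise result.
pair_sum = a + b

# Several arrays of the same shape: treat the list as a stack along a new
# first axis and sum over that axis.
total = np.sum([a, b, c], axis=0)

print(pair_sum)
print(total)
```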

how to convert a=1000000000 to a decimal number 0 b=0100000000 to decimal number 1 and so on for 9 numerals

I have a data set that contains ten columns and 3000 rows. Each column contains a 0 or a 1. The ten columns concatenated together represent a label. There are ten labels: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. A concatenated sequence like "1000000000" represents label zero, "0100000000" represents label one, and "0000000001" represents label nine.
What is the most efficient way to convert these sequences into labels and add the result to the data set as an eleventh column?
for loop
lambda function
masking
binary and operation
I am confused. Currently I am trying to write a lambda function to do this, which is getting me nowhere:
target1 = target.apply(lambda x: [print(x) for j in range(10) for i in x], axis = 1)
I would like to know which method I should use to implement this pattern matching.
Initial Data frame
data = [[1,0,0,0,0,0,0,0,0,0],
[0,1,0,0,0,0,0,0,0,0],
[0,0,1,0,0,0,0,0,0,0],
[0,0,0,1,0,0,0,0,0,0],
[0,0,0,0,1,0,0,0,0,0],
[0,0,0,0,0,1,0,0,0,0],
[0,0,0,0,0,0,1,0,0,0],
[0,0,0,0,0,0,0,1,0,0],
[0,0,0,0,0,0,0,0,1,0],
[0,0,0,0,0,0,0,0,0,1]]
df = pd.DataFrame(data)
Final data with the eleventh column name label
[dataframe][label]
1000000000 0
0100000000 1
0010000000 2
0001000000 3
0000100000 4
0000010000 5
0000001000 6
0000000100 7
0000000010 8
0000000001 9
You are effectively looking for the column index with the maximum value, so you can use DataFrame.idxmax(), with axis=1 to apply it to the values across each row:
df['label'] = df.idxmax(axis=1)
Note that if you have columns other than the 10 numeric columns, you'd want to first select only the 10 numeric columns; e.g. df.iloc[:, range(10)].idxmax(...).
Demo:
>>> import pandas as pd
>>> data = [[1,0,0,0,0,0,0,0,0,0],
... [0,1,0,0,0,0,0,0,0,0],
... [0,0,1,0,0,0,0,0,0,0],
... [0,0,0,1,0,0,0,0,0,0],
... [0,0,0,0,1,0,0,0,0,0],
... [0,0,0,0,0,1,0,0,0,0],
... [0,0,0,0,0,0,1,0,0,0],
... [0,0,0,0,0,0,0,1,0,0],
... [0,0,0,0,0,0,0,0,1,0],
... [0,0,0,0,0,0,0,0,0,1]]
>>> df = pd.DataFrame(data)
>>> df['label'] = df.idxmax(axis=1)
>>> df
0 1 2 3 4 5 6 7 8 9 label
0 1 0 0 0 0 0 0 0 0 0 0
1 0 1 0 0 0 0 0 0 0 0 1
2 0 0 1 0 0 0 0 0 0 0 2
3 0 0 0 1 0 0 0 0 0 0 3
4 0 0 0 0 1 0 0 0 0 0 4
5 0 0 0 0 0 1 0 0 0 0 5
6 0 0 0 0 0 0 1 0 0 0 6
7 0 0 0 0 0 0 0 1 0 0 7
8 0 0 0 0 0 0 0 0 1 0 8
9 0 0 0 0 0 0 0 0 0 1 9
I had advocated using Series.idxmax() via DataFrame.apply() at first, but in a now-deleted comment Jezrael reminded me that DataFrame.idxmax() also exists and is much more practical here.
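The note about extra columns can be sketched as follows. The "note" column is a hypothetical addition, and the one-hot check is an extra safeguard that is not part of the original answer: idxmax still returns a label for rows that are all zeros or that contain several ones.

```python
import numpy as np
import pandas as pd

# Ten one-hot rows plus a hypothetical extra column that must not take
# part in the label computation.
df = pd.DataFrame(np.identity(10, dtype=int))
df["note"] = "extra"

# Restrict the computation to the ten indicator columns.
onehot = df.iloc[:, :10]

# Check the one-hot assumption before trusting the labels.
assert (onehot.sum(axis=1) == 1).all()

df["label"] = onehot.idxmax(axis=1)
print(df["label"].tolist())
```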
1. Let's generate a pandas DataFrame
import numpy as np
import pandas as pd
n = 10
#---let's generate a pandas DF
M = np.identity(n,dtype=int); M = np.vstack((M,M))
np.random.shuffle(M)
PD = pd.DataFrame(M)
print(PD)
#--- that's the label vector
vLabel = np.arange(n,dtype=int)
So we get:
0 1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 1 0 0 0
1 0 0 0 0 0 1 0 0 0 0
2 0 0 0 0 0 0 0 0 0 1
3 0 0 1 0 0 0 0 0 0 0
4 0 0 0 1 0 0 0 0 0 0
5 0 0 0 0 0 1 0 0 0 0
6 0 0 0 0 0 0 0 0 0 1
7 0 1 0 0 0 0 0 0 0 0
8 1 0 0 0 0 0 0 0 0 0
9 0 1 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 1 0
11 1 0 0 0 0 0 0 0 0 0
12 0 0 0 1 0 0 0 0 0 0
13 0 0 0 0 0 0 0 0 1 0
14 0 0 1 0 0 0 0 0 0 0
15 0 0 0 0 0 0 0 1 0 0
16 0 0 0 0 1 0 0 0 0 0
17 0 0 0 0 1 0 0 0 0 0
18 0 0 0 0 0 0 0 1 0 0
19 0 0 0 0 0 0 1 0 0 0
2. the labeling is a matrix-vector multiplication
#--- the labeling is a matrix-vector multiplication
Label = np.dot(PD,vLabel)
print(Label)
So we get:
[6 5 9 2 3 5 9 1 0 1 8 0 3 8 2 7 4 4 7 6]
3. Each row can be transformed into a string
#---- each row can be transformed into a string
for j in range(2*n):
    print(str(PD.values[j,:]))
So we get:
[0 0 0 0 0 0 1 0 0 0]
[0 0 0 0 0 1 0 0 0 0]
[0 0 0 0 0 0 0 0 0 1]
[0 0 1 0 0 0 0 0 0 0]
[0 0 0 1 0 0 0 0 0 0]
[0 0 0 0 0 1 0 0 0 0]
[0 0 0 0 0 0 0 0 0 1]
[0 1 0 0 0 0 0 0 0 0]
[1 0 0 0 0 0 0 0 0 0]
[0 1 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 1 0]
[1 0 0 0 0 0 0 0 0 0]
[0 0 0 1 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 1 0]
[0 0 1 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 1 0 0]
[0 0 0 0 1 0 0 0 0 0]
[0 0 0 0 1 0 0 0 0 0]
[0 0 0 0 0 0 0 1 0 0]
[0 0 0 0 0 0 1 0 0 0]
And from here you can continue :-)
Note: point 2. (matrix multiplication) is efficient, point 3. (for loop) is not efficient, so you might improve this step.
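To illustrate why the matrix-vector product in point 2 yields the labels, the following NumPy-only sketch compares it with argmax on shuffled one-hot rows. It also builds the row strings without printing; that step still iterates over the rows in Python, so it is a convenience rather than a speed-up. The names are illustrative.

```python
import numpy as np

n = 10
M = np.vstack((np.identity(n, dtype=int), np.identity(n, dtype=int)))
rng = np.random.default_rng(0)
rng.shuffle(M)  # shuffle the rows in place

# Multiplying a one-hot row by [0, 1, ..., n-1] picks out the position of its 1.
labels_dot = M @ np.arange(n)
labels_argmax = M.argmax(axis=1)

# Each row as a string such as "0000001000".
row_strings = ["".join(map(str, row)) for row in M.tolist()]

print(labels_dot)
print(row_strings[:3])
```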

Removing elements from a binary python array whose neighborhood are less than N

Suppose I have this:
import numpy as np
x = np.zeros((10,16), dtype=int)
x[6:8,3:11] = 1
x[4:6,5:7] = 1
x[2:4,4:8] = 1
x[4:6,9:11] = 1
x[7,2] = 1
x[6,11] = 1
x[8,3] = 1
print(x)
Output:
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0]
[0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0]
[0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0]
[0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0]
[0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0]
[0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0]
[0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
And I want to filter it so that elements that have fewer than 2 neighbors in their 4-neighborhood (up, left, right, bottom) are removed. So I'd end up with this (the last three positions that were set to one are removed):
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0]
[0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0]
[0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0]
[0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0]
[0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0]
[0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
I tried using scipy.ndimage.morphology.binary_closing, scipy.ndimage.morphology.binary_opening, scipy.ndimage.morphology.binary_dilation and scipy.ndimage.morphology.binary_erosion, but the result isn't what I need. I could make 2 for loops and iterate over each element of the array, checking for the neighbor elements, but I feel like there's a better way to do this. Am I mistaken?
I'm more interested in this specific situation (4 neighborhood, keep 2 neighbors), but is it easy to generalize to another neighborhood or number of neighbors (assuming a binary array)?
I managed to get it done like this:
from scipy.signal import convolve2d
kernel = [[0,1,0],[1,1,1],[0,1,0]]
filtered = convolve2d(x, kernel, mode='same')
x[filtered<=2] = 0
Filtered:
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0]
[0 0 0 1 3 4 4 3 1 0 0 0 0 0 0 0]
[0 0 0 1 3 5 5 3 1 1 1 0 0 0 0 0]
[0 0 0 0 2 4 4 2 1 3 3 1 0 0 0 0]
[0 0 0 1 2 4 4 2 2 4 4 2 0 0 0 0]
[0 0 2 3 4 5 5 4 4 5 5 2 1 0 0 0]
[0 1 2 5 4 4 4 4 4 4 3 2 0 0 0 0]
[0 0 2 2 2 1 1 1 1 1 1 0 0 0 0 0]
[0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0]]
And I got the output I wanted. Thank you @user3080953
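On the question of generalizing, here is a NumPy-only sketch that counts the 4-neighbours with shifted slices of a zero-padded copy and takes the minimum neighbour count as a parameter. The function name and the min_neighbors parameter are hypothetical, and the filter is applied in a single pass, like the convolution above.

```python
import numpy as np

def remove_sparse_pixels(x, min_neighbors=2):
    """Zero out pixels with fewer than min_neighbors 4-neighbours (single pass)."""
    # Zero border, so pixels at the edge see 0 outside the array.
    padded = np.pad(x, 1, mode="constant")
    # Sum of the up, down, left and right neighbours of every pixel.
    neighbors = (padded[:-2, 1:-1] + padded[2:, 1:-1]
                 + padded[1:-1, :-2] + padded[1:-1, 2:])
    # Keep the original value where there are enough neighbours, else 0.
    return np.where(neighbors >= min_neighbors, x, 0)

# The array from the question.
x = np.zeros((10, 16), dtype=int)
x[6:8, 3:11] = 1
x[4:6, 5:7] = 1
x[2:4, 4:8] = 1
x[4:6, 9:11] = 1
x[7, 2] = 1
x[6, 11] = 1
x[8, 3] = 1

result = remove_sparse_pixels(x)
print(result)
```

Adding the four diagonal slices to the neighbour sum would turn this into an 8-neighbourhood count; unlike the in-place assignment above, this version returns a new array and leaves x unchanged.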

What score metric is used when using joblib to store a model?

I have used joblib.dump to store a machine learning model (21 classes).
When I load the model and test it with a hold-out set, I get a value, but I do not know which metric it is (accuracy, precision, recall, etc.):
0.952380952381
So I computed the confusion matrix and the FP, FN, TN, TP.
I used the information from this Link
I also found some code from a Github.
I compared both results (1 and 2). Both give the same value, Accuracy=0.995464852608, but this result is different from the one above.
Any ideas? Did I compute TP, FP, TN, FN correctly?
MY CONFUSION MATRIX
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0]
[0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]]
MY CODE
#Testing with the holdout set
print(loaded_model.score(x_oos, y_oos))
0.952380952381 <------IS IT ACCURACY?
#Calculating the Confusion matrix
cm = confusion_matrix(y_oos, y_oos_pred)
cm_normalized = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
#Calculating values according to link 2.
FP = cm.sum(axis=0) - np.diag(cm)
FN = cm.sum(axis=1) - np.diag(cm)
TP = np.diag(cm)
TN = (21 - (FP + FN + TP)) #I put 21 because I have 21 classes
# Overall accuracy
ACC = np.mean((TP+TN)/(TP+FP+FN+TN))
print(ACC)
0.995464852608 <----IT IS DIFFERENT FROM THE ABOVE ONE.
Your example is a little bit confusing. If you provide some numbers it would be easier to understand and answer. For example just printing cm would be very helpful.
That being said, the way to deconstruct a sklearn.metrics.confusion_matrix is as follows (for a binary classification):
true_neg, false_pos, false_neg, true_pos = confusion_matrix(y_oos, y_oos_pred).ravel()
For multiple classes I think the result is closer to what you have, but with the values summed. Like so:
trues = np.diag(cm).sum()
falses = (cm.sum(0) - np.diag(cm)).sum()
Then you can just compute the accuracy with:
ACC = trues / (trues + falses)
** Update**
From your edited question I can now see that your confusion matrix contains 21 samples in total, of which 20 were correctly classified. In that case your accuracy is:
$\frac{20}{21} = 0.95238$
This is the value printed by the model's score method. So you are measuring accuracy; you just aren't reproducing it correctly.
n.b sorry for the latex, but hopefully one day StackOverflow will implement it.
Both are Accuracy.
The first one is the overall accuracy: the number of correctly classified samples divided by the total number of samples (20/21).
The second one is the average of the per-class accuracies: we add all these values and divide by 21 (the number of classes).
[0.9524 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.9524 1 1 1]
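Both numbers can be reproduced from the confusion matrix alone. The sketch below rebuilds the matrix shown in the question with NumPy (21 samples, one per class, with the class-0 sample predicted as class 17); the variable names are illustrative.

```python
import numpy as np

# Rebuild the confusion matrix from the question.
cm = np.eye(21, dtype=int)
cm[0, 0] = 0
cm[0, 17] = 1

total = cm.sum()             # number of samples (21 here)
TP = np.diag(cm)
FP = cm.sum(axis=0) - TP
FN = cm.sum(axis=1) - TP
TN = total - (TP + FP + FN)

# Fraction of correctly classified samples: matches the score value.
overall_accuracy = TP.sum() / total

# Mean of the one-vs-rest accuracies: matches the question's ACC.
mean_per_class_accuracy = np.mean((TP + TN) / total)

print(overall_accuracy)
print(mean_per_class_accuracy)
```

Note that TN is computed from the total number of samples, which only coincides with the number of classes in this particular hold-out set.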
