Preprocess Accuracy metric


I have a model which predicts 5 classes. I want to change the Accuracy metric as in the example below:
def accuracy(y_pred, y_true):
    # our pred tensor
    y_pred = [[0,0,0,0,1], [0,1,0,0,0], [0,0,0,1,0], [1,0,0,0,0], [0,0,1,0,0]]
    # make some manipulations with tensor y_pred
    # description of the actions:
    for array in y_pred:
        if array[3] == 1:
            array[3] = 0
            array[0] = 1
        if array[4] == 1:
            array[4] = 0
            array[1] = 1
        else:
            continue
    # this works nicely with arrays, but how can I implement it with tensors?
    # result after the manipulations ->
    y_pred = [[0,1,0,0,0], [0,1,0,0,0], [1,0,0,0,0], [1,0,0,0,0], [0,0,1,0,0]]
    # I want to do the same actions with y_true
    # and after that run these preprocessed tensors the same way as the plain tf.keras.metrics.Accuracy metric
I think tf.where can help to filter the tensor, but unfortunately I can't get it to work correctly.
How can I implement this preprocessing accuracy metric with tensors?

If you want to shift the ones to the left by 3 positions (negative indices wrap around to the end of the row), you can do this:
import numpy as np
y_pred = [[0,0,0,0,1], [0,1,0,0,0], [0,0,0,1,0], [1,0,0,0,0], [0,0,1,0,0]]
y_pred = np.array(y_pred)
print(y_pred)
shift = 3
rows = np.arange(y_pred.shape[0])  # one index per row
one_pos = np.where(y_pred == 1)[1]  # column index of the 1 in each row
# set the new positions to 1
y_pred[rows, one_pos - shift] = 1
# set the old positions to 0
y_pred[rows, one_pos] = 0
print(y_pred)
Output:
[[0 0 0 0 1]
 [0 1 0 0 0]
 [0 0 0 1 0]
 [1 0 0 0 0]
 [0 0 1 0 0]]
[[0 1 0 0 0]
 [0 0 0 1 0]
 [1 0 0 0 0]
 [0 0 1 0 0]
 [0 0 0 0 1]]
Update:
If you only want to shift the ones at index 3 and 4:
import numpy as np
y_pred = [[0,0,0,0,1], [0,1,0,0,0], [0,0,0,1,0], [1,0,0,0,0], [0,0,1,0,0]]
y_pred = np.array(y_pred)
print(y_pred)
shift = 3
rows = np.arange(y_pred.shape[0])  # one index per row
one_pos = np.where(y_pred == 1)[1]  # column index of the 1 in each row
print(one_pos)
# write 1 at the shifted position only for rows whose 1 is at index 3 or 4
y_pred[rows, one_pos - shift] = [1 if i in (3, 4) else 0 for i in one_pos]
# clear the old position for those rows, keep it for all others
y_pred[rows, one_pos] = [0 if i in (3, 4) else 1 for i in one_pos]
print(y_pred)
Output:
[[0 0 0 0 1]
 [0 1 0 0 0]
 [0 0 0 1 0]
 [1 0 0 0 0]
 [0 0 1 0 0]]
[4 1 3 0 2]
[[0 1 0 0 0]
 [0 1 0 0 0]
 [1 0 0 0 0]
 [1 0 0 0 0]
 [0 0 1 0 0]]
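Since the question asks for the same remapping without in-place edits, here is a minimal sketch of one possible approach: take the class index of each row and look it up in a remapping table, then compare the merged classes. The names REMAP, merge_classes and merged_accuracy are hypothetical, and the y_true values are made up for illustration only.

```python
import numpy as np

# Hypothetical mapping taken from the question: class 3 -> 0, class 4 -> 1,
# classes 0, 1 and 2 stay as they are.
REMAP = np.array([0, 1, 2, 0, 1])

def merge_classes(one_hot):
    """Return the merged class index for each one-hot (or probability) row."""
    return REMAP[np.argmax(one_hot, axis=-1)]

def merged_accuracy(y_true, y_pred):
    """Fraction of rows whose merged classes agree."""
    return float(np.mean(merge_classes(y_true) == merge_classes(y_pred)))

y_pred = np.array([[0,0,0,0,1], [0,1,0,0,0], [0,0,0,1,0], [1,0,0,0,0], [0,0,1,0,0]])
# y_true is invented here purely to exercise the function
y_true = np.array([[0,1,0,0,0], [0,0,0,0,1], [1,0,0,0,0], [0,0,1,0,0], [0,0,1,0,0]])

print(merge_classes(y_pred))            # [1 1 0 0 2]
print(merged_accuracy(y_true, y_pred))  # 0.8
```

The same idea should carry over to TensorFlow, where tf.argmax and tf.gather play the roles of np.argmax and the table lookup, but that variant is not shown or tested here.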

Related

one hot encode with pandas get_dummies missing values

I have a dataset in the form of a DataFrame, and each row has a label ranging from 1 to 5. I am one-hot encoding it with pd.get_dummies(). If my dataset contains all 5 labels, there is no problem. However, not all sets contain all 5 labels, so the encoding just skips the missing value, which creates a problem for new datasets coming in. Can I set a range so that the one-hot encoding knows there should be 5 labels? Or would I have to append 1,2,3,4,5 to the end of the array before I perform the encoding and then delete the last 5 entries?
Correct encode: values 1-5 are encoded
arr = np.array([1,2,5,3,1,5,1,4])
df = pd.DataFrame(arr, columns = ['test'])
hotarr = np.array(pd.get_dummies(df['test']))
>>>[[1 0 0 0 0]
[0 1 0 0 0]
[0 0 0 0 1]
[0 0 1 0 0]
[1 0 0 0 0]
[0 0 0 0 1]
[1 0 0 0 0]
[0 0 0 1 0]]
Missing value encode: this dataset is missing label 4.
arr = np.array([1,2,5,3,1,5,1,])
df = pd.DataFrame(arr, columns = ['test'])
hotarr = np.array(pd.get_dummies(df['test']))
>>>[[1 0 0 0]
[0 1 0 0]
[0 0 0 1]
[0 0 1 0]
[1 0 0 0]
[0 0 0 1]
[1 0 0 0]]

Set up the CategoricalDtype before encoding to ensure all categories are represented when getting dummies:
import numpy as np
import pandas as pd
arr = np.array([1, 2, 5, 3, 1, 5, 1])
df = pd.DataFrame(arr, columns=['test'])
# Setup Categorical Dtype
df['test'] = df['test'].astype(pd.CategoricalDtype(categories=[1, 2, 3, 4, 5]))
hotarr = np.array(pd.get_dummies(df['test']))
print(hotarr)
Alternatively, you can reindex after get_dummies with fill_value=0 to add the missing columns:
hotarr = np.array(pd.get_dummies(df['test'])
                  .reindex(columns=[1, 2, 3, 4, 5], fill_value=0))
Both produce hotarr with 5 columns even though the input does not contain the label 4:
[[1 0 0 0 0]
[0 1 0 0 0]
[0 0 0 0 1]
[0 0 1 0 0]
[1 0 0 0 0]
[0 0 0 0 1]
[1 0 0 0 0]]
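The categorical approach can be wrapped in a small helper. This is only a sketch: one_hot_fixed is a hypothetical name, and dtype=int is passed because recent pandas versions return boolean columns from get_dummies by default, which would print True/False rather than 1/0.

```python
import pandas as pd

def one_hot_fixed(values, categories):
    """One-hot encode values against a fixed category list (hypothetical helper)."""
    series = pd.Series(pd.Categorical(values, categories=categories))
    # dtype=int keeps 0/1 output even where get_dummies defaults to booleans
    return pd.get_dummies(series, dtype=int).to_numpy()

hotarr = one_hot_fixed([1, 2, 5, 3, 1, 5, 1], categories=[1, 2, 3, 4, 5])
print(hotarr.shape)        # (7, 5)
print(hotarr[:, 3].sum())  # 0, because label 4 never occurs
```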

LabelBinarizer gives all values zeros

I'm encoding my labels with label binarizer like this:
from sklearn.preprocessing import LabelBinarizer
# Transform labels to one-hot
lb = LabelBinarizer()
Y = lb.fit_transform(df.classification)
But when I print Y I get all zeros like:
[[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
...
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]]
I don't know if all the values in all rows are zeros or not. Unfortunately, I can't see the complete row and couldn't find a way to do so. Are these values right or not?
Any help would be appreciated.

Easy way to remove columns in numpy array that add up to zero

Is there an easy way to remove columns that add up to zero and their corresponding rows in a numpy matrix?
I am trying to create a transition matrix for PageRank but the code I wrote seems not to be the most efficient.
i = 1
while True:
    if len(graph) == i-1:
        break
    else:
        col_sum = np.sum(graph[:,i-1])
        if col_sum == 0:
            graph = np.delete(graph, np.s_[i-1], 1)
            graph = np.delete(graph, i-1, 0)
            nodes.remove(nodes[i-1])
            i = 0
    i += 1

Here's a vectorized version using np.ix_ (axis=0 sums each column, as in the question):
mask = np.nonzero(np.sum(graph, axis=0))[0]
graph = graph[np.ix_(mask, mask)]
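As a small worked example of the np.ix_ selection, here is a sketch on a made-up 4x4 adjacency matrix. The matrix and the node labels are hypothetical, and the column sums are taken with axis=0 to match the column test in the question.

```python
import numpy as np

# Hypothetical 4-node graph; columns 0 and 2 sum to zero
graph = np.array([[0, 1, 0, 1],
                  [0, 0, 0, 1],
                  [0, 1, 0, 0],
                  [0, 1, 0, 0]])
nodes = ['A', 'B', 'C', 'D']  # hypothetical node labels

keep = np.flatnonzero(graph.sum(axis=0))  # indices of columns with non-zero sum
graph = graph[np.ix_(keep, keep)]         # keep the matching rows and columns
nodes = [nodes[i] for i in keep]          # keep the matching labels

print(graph)  # [[0 1]
              #  [1 0]]
print(nodes)  # ['B', 'D']
```

Note that this is a single pass. Removing rows can create new all-zero columns, which the loop in the question handles by restarting, so the selection may need to be repeated until nothing changes.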

NumPy is designed to do just that kind of thing without loops. Most functions like np.sum work on matrices or multi-dimensional arrays and take an axis argument that tells them along which dimension to operate. Finally, we can use index arrays or boolean masks to select elements from an array.
import numpy as np
np.random.seed(42)
nodes = [chr(i + 65) for i in range(10)]
a = (np.random.randn(10, 10) > 1.5).astype(int)
print('before:')
print(a)
print(nodes)
col_sum = np.sum(a, axis=0)  # sum of each column
idx = np.flatnonzero(col_sum)  # indices of non-zero columns
# remove columns and rows
a = a[:, idx][idx, :]  # note that a[idx, idx] won't work
# if nodes was an array we could do this:
#nodes = nodes[idx]
# but nodes is a list, so we need a list comprehension over the kept indices:
nodes = [nodes[i] for i in idx]
print('\nafter:')
print(a)
print(nodes)
Result:
before:
[[0 0 0 1 0 0 1 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 [0 1 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 [0 1 0 1 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]]
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
after:
[[0 0 0]
 [1 0 0]
 [0 0 0]]
['B', 'D', 'G']

Take non-zero elements in a macro-list

I have a problem with the function np.nonzero() in Python. I want to take all the indices of a given list whose elements are non-zero. Consider the following code:
import numpy as np
from scipy.special import binom
M=4
N=3
def generate(N,nb):
    states = np.zeros((int(binom(nb+N-1, nb)), N), dtype=int)
    states[0, 0]=nb
    ni = 0 # init
    for i in range(1, states.shape[0]):  # range instead of the Python 2 xrange
        states[i,:N-1] = states[i-1, :N-1]
        states[i,ni] -= 1
        states[i,ni+1] += 1+states[i-1, N-1]
        if ni >= N-2:
            if np.any(states[i, :N-1]):
                ni = np.nonzero(states[i, :N-1])[0][-1]
        else:
            ni += 1
    return states
base = generate(M,N)
The result of base is given by:
base = [[3 0 0 0]
[2 1 0 0]
[2 0 1 0]
[2 0 0 1]
[1 2 0 0]
[1 1 1 0]
[1 1 0 1]
[1 0 2 0]
[1 0 1 1]
[1 0 0 2]
[0 3 0 0]
[0 2 1 0]
[0 2 0 1]
[0 1 2 0]
[0 1 1 1]
[0 1 0 2]
[0 0 3 0]
[0 0 2 1]
[0 0 1 2]
[0 0 0 3]]
For a given pair of indices j, k, I want the indices of all rows of base that have non-zero components at both sites j and k. For example, taking j=0, k=1, I should obtain:
result = [1 4 5 6]
which corresponds to rows 1, 4, 5 and 6 of base, the ones that satisfy this condition. I tried the expression:
np.nonzero((base[:, j]) & (base[:, k]))[0]
but it doesn't work correctly. Any idea why?

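First of all, if base is a plain Python list of lists, base[:, j] is not valid indexing; convert it to a NumPy array first. With a NumPy array, base[:, j] selects column j as intended.
Also:
np.nonzero((base[:, j]) & (base[:, k]))[0]
won't give the result you expect, because & is a bitwise AND on the integer values. For example, 2 & 1 is 0, so the row [2 1 0 0] is dropped even though both entries are non-zero.
You could multiply the two columns instead:
b = np.array(base)
j = 0
k = 1
np.nonzero(b.T[j] * b.T[k])[0]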
which will give:
array([1, 4, 5, 6])
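To see the difference between a bitwise and a logical test, here is a short sketch using the first eight rows of base from the question. It compares & applied to the raw integers with & applied to boolean masks; the variable names bitwise and logical are only illustrative.

```python
import numpy as np

# First rows of `base` from the question, enough to show the effect
base = np.array([[3, 0, 0, 0],
                 [2, 1, 0, 0],
                 [2, 0, 1, 0],
                 [2, 0, 0, 1],
                 [1, 2, 0, 0],
                 [1, 1, 1, 0],
                 [1, 1, 0, 1],
                 [1, 0, 2, 0]])
j, k = 0, 1

# Bitwise AND of the integer values: 2 & 1 == 0 and 1 & 2 == 0
bitwise = np.nonzero(base[:, j] & base[:, k])[0]
# Logical AND of two boolean masks: both entries are non-zero
logical = np.nonzero((base[:, j] != 0) & (base[:, k] != 0))[0]

print(bitwise)  # [5 6]
print(logical)  # [1 4 5 6]
```

The boolean-mask form also stays correct if the array ever contains negative values, where a product or a bitwise test can be harder to reason about.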

rotate an nxnxn matrix in python

I have a binary array of size 64x64x64, where a volume of 40x40x40 is set to "1" and the rest is "0". I have been trying to rotate this cube about its center around the z-axis using skimage.transform.rotate and also OpenCV:
def rotateImage(image, angle):
    row, col = image.shape
    center = tuple(np.array([row, col]) / 2)
    rot_mat = cv2.getRotationMatrix2D(center, angle, 1.0)
    new_image = cv2.warpAffine(image, rot_mat, (col, row))
    return new_image
In the case of OpenCV, I tried a 2D rotation of each individual slice of the cube (Cube[:,:,n=1,2,3...p]).
After rotating, the total sum of the values in the array changes. This may be caused by interpolation during rotation. How can I rotate a 3D array of this kind without adding anything to the array?

OK, so I understand now what you are asking. The closest I can come up with is scipy.ndimage. There is also a way to interface with ImageJ from Python, which might be easier. Here is what I did with scipy.ndimage:
import numpy as np
from scipy import ndimage
angle = 25  # angle in degrees
Rotatedim = ndimage.rotate(yourimage, angle, reshape=False, output=np.int32, order=5, prefilter=False)
This preserved the sum for some angles but not for others; by playing around more with the parameters you might be able to get your desired outcome.

One option is to convert into sparse, and transform the coordinates using a matrix rotation. Then transform back into dense. In 2 dimensions, this looks like:
import numpy as np
import scipy.sparse
import math
N = 10
space = np.zeros((N, N), dtype=np.int8)
space[3:7, 3:7].fill(1)
print(space)
print(np.sum(space))
space_coo = scipy.sparse.coo_matrix(space)
Coords = np.array(space_coo.nonzero()) - 3
theta = 30 * 3.1416 / 180
R = np.array([[math.cos(theta), math.sin(theta)], [-math.sin(theta), math.cos(theta)]])
space2_coords = R.dot(Coords)
space2_coords = np.round(space2_coords)
space2_coords += 3
space2_sparse = scipy.sparse.coo_matrix(([1] * space2_coords.shape[1], (space2_coords[0], space2_coords[1])), shape=(N, N))
space2 = space2_sparse.todense()
print(space2)
print(np.sum(space2))
Output:
[[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 1 1 1 1 0 0 0]
[0 0 0 1 1 1 1 0 0 0]
[0 0 0 1 1 1 1 0 0 0]
[0 0 0 1 1 1 1 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]]
16
[[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 1 0 0 0 0 0 0]
[0 0 1 1 1 1 0 0 0 0]
[0 0 1 1 1 1 1 0 0 0]
[0 1 1 0 1 1 0 0 0 0]
[0 0 0 1 1 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]]
16
The advantage is that you'll get exactly as many 1 values before and after the transform. The downside is that you might get 'holes', as above, and/or duplicate coordinates, giving values of '2' in the final dense matrix.
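The same coordinate-rotation idea can be extended to three dimensions. Below is a minimal sketch using plain NumPy, assuming the z-axis is the last array axis and rotating about the center of the first two axes. The function name rotate_binary_volume_z and the small 16x16x16 test cube are hypothetical. Unlike the sparse version above, duplicate target coordinates collapse into a single 1 here, so the sum can shrink for angles that are not multiples of 90 degrees.

```python
import numpy as np

def rotate_binary_volume_z(volume, angle_deg):
    """Rotate the non-zero voxels of a 3D binary array about the z-axis (last axis).

    Coordinates are rotated about the array center and rounded to the nearest
    voxel, so no interpolated values are created.
    """
    theta = np.deg2rad(angle_deg)
    plane_shape = np.array(volume.shape[:2])
    center = (plane_shape - 1) / 2.0
    coords = np.argwhere(volume)                 # shape (n_voxels, 3)
    xy = coords[:, :2] - center                  # centered in-plane coordinates
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    new_xy = np.rint(xy @ rot.T + center).astype(int)
    # drop voxels that would land outside the array
    inside = np.all((new_xy >= 0) & (new_xy < plane_shape), axis=1)
    out = np.zeros_like(volume)
    out[new_xy[inside, 0], new_xy[inside, 1], coords[inside, 2]] = 1
    return out

# Hypothetical small test volume: an 8x8x8 block of ones inside 16x16x16
cube = np.zeros((16, 16, 16), dtype=np.uint8)
cube[4:12, 4:12, 4:12] = 1

print(cube.sum())                              # 512
print(rotate_binary_volume_z(cube, 90).sum())  # 512
print(rotate_binary_volume_z(cube, 30).sum())  # at most 512; holes and merges are possible
```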
