I have the following python pyseudo-code:
A1 = "101000001111"
A2 = "110000010101"
B2 = "000111010000"
B2 = "000110100000"
# TODO get X = [x1, x2, ..., x12]
assert(A1 * X > .5)
assert(A2 * X > .5)
assert(B1 * X < .5)
assert(B2 * X < .5)
So this will basically be a regression based classification.
0.5 is my threshold but how to get X?
You need to find 12 coefficients. You can try to use LogisticRegression or LinearRegression
When you have linear coefficients you can use np.dot or # operator to get a dot product.
Example:
import numpy as np
from sklearn.linear_model import LogisticRegression
A1 = "101000001111"
A2 = "110000010101"
B1 = "000111010000"
B2 = "000110100000"
A1 = np.array(list(A1), np.float32)
A2 = np.array(list(A2), np.float32)
B1 = np.array(list(B1), np.float32)
B2 = np.array(list(B2), np.float32)
X = np.array((A1, A2, B1, B2))
y = np.array([1, 1, 0, 0])
w = model = LogisticRegression(fit_intercept=False).fit(X, y).coef_.flatten()
print(A1.dot(w))
print(A2.dot(w))
print(B1.dot(w))
print(B2.dot(w))
assert A1 # w > 0.5
assert A2 # w > 0.5
assert B1 # w < 0.5
assert B2 # w < 0.5
Results:
1.7993630995882384
1.5032155788245702
-1.0190643734998346
-1.0385501901808816
Related
I am training a model to predict pose using a custom Pytorch model. However, V1 below never learns (params don't change). The output is connected to the backdrop graph and grad_fn=MmBackward.
I can't understand why V1 isn't learning but V2 is?
V1
class cam_pose_transform_V1(torch.nn.Module):
def __init__(self):
super(cam_pose_transform, self).__init__()
self.elevation_x_rotation_radians = torch.nn.Parameter(torch.normal(0., 1e-6, size=()))
self.azimuth_y_rotation_radians = torch.nn.Parameter(torch.normal(0., 1e-6, size=()))
self.z_rotation_radians = torch.nn.Parameter(torch.normal(0., 1e-6, size=()))
def forward(self, x):
exp_i = torch.zeros((4,4))
c1 = torch.cos(self.elevation_x_rotation_radians)
s1 = torch.sin(self.elevation_x_rotation_radians)
c2 = torch.cos(self.azimuth_y_rotation_radians)
s2 = torch.sin(self.azimuth_y_rotation_radians)
c3 = torch.cos(self.z_rotation_radians)
s3 = torch.sin(self.z_rotation_radians)
rotation_in_matrix = torch.tensor([
[c2, s2 * s3, c3 * s2],
[s1 * s2, c1 * c3 - c2 * s1 * s3, -c1 * s3 - c2 * c3 * s1],
[-c1 * s2, c3 * s1 + c1 * c2 * s3, c1 * c2 * c3 - s1 * s3]
], requires_grad=True)
exp_i[:3, :3] = rotation_in_matrix
exp_i[3, 3] = 1.
return torch.matmul(exp_i, x)
However, this version learns as expected (params and loss change) and also has grad_fn=MmBackward on the output:
V2
def vec2ss_matrix(vector): # vector to skewsym. matrix
ss_matrix = torch.zeros((3,3))
ss_matrix[0, 1] = -vector[2]
ss_matrix[0, 2] = vector[1]
ss_matrix[1, 0] = vector[2]
ss_matrix[1, 2] = -vector[0]
ss_matrix[2, 0] = -vector[1]
ss_matrix[2, 1] = vector[0]
return ss_matrix
class cam_pose_transform_V2(torch.nn.Module):
def __init__(self):
super(camera_transf, self).__init__()
self.w = torch.nn.Parameter(torch.normal(0., 1e-6, size=(3,)))
self.v = torch.nn.Parameter(torch.normal(0., 1e-6, size=(3,)))
self.theta = torch.nn.Parameter(torch.normal(0., 1e-6, size=()))
def forward(self, x):
exp_i = torch.zeros((4,4))
w_skewsym = vec2ss_matrix(self.w)
v_skewsym = vec2ss_matrix(self.v)
exp_i[:3, :3] = torch.eye(3) + torch.sin(self.theta) * w_skewsym + (1 - torch.cos(self.theta)) * torch.matmul(w_skewsym, w_skewsym)
exp_i[:3, 3] = torch.matmul(torch.eye(3) * self.theta + (1 - torch.cos(self.theta)) * w_skewsym + (self.theta - torch.sin(self.theta)) * torch.matmul(w_skewsym, w_skewsym), self.v)
exp_i[3, 3] = 1.
return torch.matmul(exp_i, x)
Update #1
In the training loop I printed the .grad attributes using:
print([i.grad for i in list(cam_pose.parameters())])
loss.backward()
print([i.grad for i in list(cam_pose.parameters())])
Results:
# V1
[None, None, None]
[None, None, None]
# V2
[None, None, None]
[tensor([-0.0032, 0.0025, -0.0053]), tensor([ 0.0016, -0.0013, 0.0054]), tensor(-0.0559)]
Nothing else in the code was changed, just swapped V1 model for V2.
this is your problem right here:
rotation_in_matrix = torch.tensor([
[c2, s2 * s3, c3 * s2],
[s1 * s2, c1 * c3 - c2 * s1 * s3, -c1 * s3 - c2 * c3 * s1],
[-c1 * s2, c3 * s1 + c1 * c2 * s3, c1 * c2 * c3 - s1 * s3]], requires_grad=True)
you are creating a tensor out of a list of tensors, which is not a differentiable operation -- i.e. there's no gradient flow from rotation_in_matrix to its elements c1..c3
the solution would be to create the rotation_in_matrix using tensor operations like stack and cat instead
I compared results of rotating a vector with scipy.spatial.transform.Rotation, pyquaternion.Quaternion and my own implementation.
My own and pyquaternion and pretty similar, but Rotation is quite different.
import numpy as np
from pyquaternion import Quaternion
from scipy.spatial.transform import Rotation
def ham(q1, q2):
a1, b1, c1, d1 = q1
a2, b2, c2, d2 = q2
return np.array(
[
a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
]
)
vector = np.array([-9.86411084, 0.10916063, -0.68953008])
purequat = np.array([0, -9.86411084, 0.10916063, -0.68953008])
# order: w, i, j, k
quat = np.array([-0.54312134, 0.42388916, -0.45617676, 0.5632019])
conj = np.array([1, -1, -1, -1])
quatconj = quat * conj # hand conjugate
Q = Quaternion(quat)
R = Rotation.from_quat(quat)
print("manual:", ham(quat, ham(purequat, quatconj))[1:])
print("Quaternion:", Q.rotate(vector))
print("Rotation:", R.apply(vector))
print("Rotation inv:", R.inv().apply(vector))
manual: [-0.14691211 9.88691296 -0.08305227]
Quaternion: [-0.14691852 9.88734378 -0.08305589]
Rotation: [-2.87868815 9.45502779 -0.32195404]
Rotation inv: [-2.33238602 0.16116154 -9.60843655]
I think the result of scipy is wrong, but maybe I'm misunderstanding something. Am I misunderstanding something or should I open an issue on the scipy bugtracker?
The answer was of course, very obvious. Given a quaternion w + xi + yj + zk then pyquaternion treats an array of four numbers as [w,x,y,z] while scipy as [x,y,z,w].
I'm trying to calculate the distance between a random datapoint in the dataset and the center for the centroid. However, I get an error.
dist= math.sqrt((W[k][0]-dataset[0])**2 + (W[k][1]-dataset[1]**2))
TypeError: only size-1 arrays can be converted to Python scalars
My code:
import pandas as pd
import numpy as np
import random
import math
a1 = np.random.randn(250,2)
c1 = (0,0)
arr1 = a1 + (0,0)
a2 = np.random.randn(250,2)
arr2= a2 + [10,10]
c2 = (10,10)
a3 = np.random.randn(250,2)
arr3 = a3 + (0,10)
c3 = (0,10)
a4 = np.random.randn(250,2)
arr4 = a4 + (10,0)
c4 = (10,0)
alpha = 0.1
W = [[0,0],[0,10],[10,0],[10,10]]
dataset = [arr1,arr2,arr3,arr4]
for i in range(100):
d_min = 999999
for j in range(250):
for k in range(4):
dist= math.sqrt((W[k][0]-dataset[0])**2 + (W[k][1]-dataset[1]**2))
if dist>d_min:
d_min = dist
k_min = k
W[:,k_min] = W[:,k_min]*(1-alpha) + alpha*dataset[i]
alpha = 0.5*alpha
I have trained a Neural Net to solve the XOR problem. The problem with my network is that it is not converging. I am using Andrew Ng's methods and notations as taught in the DeepLearning.ai course.
Here's the code :
import numpy as np
from __future__ import print_function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = np.array([[0, 1, 1, 0]])
np.random.seed(1)
W1 = np.random.randn(3, 2) * 0.0001
b1 = np.ones((3, 1))
W2 = np.random.randn(1, 3) * 0.0001
b2 = np.ones((1, 1))
The next part for the Backpropagation:
learning_rate = 0.01
m = 4
for iteration in range(100000):
# forward propagation
# layer1
Z1 = np.dot(W1, X.T) + b1
A1 = sigmoid(Z1)
# layer2
Z2 = np.dot(W2, A1) + b2
A2 = sigmoid(Z2)
# backpropagation
dZ2 = Y - A2
dW2 = (1 / m) * np.dot(dZ2, A1.T)
db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)
dZ1 = np.dot(dW2.T, dZ2) * sigmoid_gradient(Z1)
dW1 = (1 / m) * np.dot(dZ1, X)
db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)
# checking if shapes are correctly preserved
assert (dZ2.shape == Z2.shape)
assert (dW2.shape == W2.shape)
assert (db2.shape == b2.shape)
assert (dZ1.shape == Z1.shape)
assert (dW1.shape == W1.shape)
assert (db1.shape == b1.shape)
# update parameters
W1 = W1 + learning_rate * dW1
W2 = W2 + learning_rate * dW2
b1 = b1 + learning_rate * db1
b2 = b2 + learning_rate * db2
# print every 10k
if (iteration % 10000 == 0):
print(A2)
You have made a couple of mistakes in your code. For example, in computing the W2.
...
dZ2 = Y - A2
dW2 = (1 / m) * np.dot(dZ2, A1.T)
...
W2 = W2 + learning_rate * dW2
We want to calculate the derivative of Cost with respect to W2 using the chain rule.
We can write the derivatives as follows:
You haven't implemented the middle part which computes the derivative of the Z2.
You can check out this video, it explains the math part of backpropagation. Moreover, you can check out this simple implementation of the neural network.
I'm currently writing my own code to implement a single-hidden-layer neural network and test the model on MNIST dataset. But I got wired result(NLL is unacceptably high) though I checked my code for over 2 days without finding what's went wrong.
Here're global parameters:
layers = np.array([784, 300, 10])
learningRate = 0.01
momentum = 0.01
batch_size = 10000
num_of_batch = len(train_label)/batch_size
nepoch = 30
Softmax function definition:
def softmax(x):
x = np.exp(x)
x_sum = np.sum(x,axis=1) #shape = (nsamples,)
for row_idx in range(len(x)):
x[row_idx,:] /= x_sum[row_idx]
return x
Sigmoid function definition:
def f(x):
return 1.0/(1+np.exp(-x))
initialize w and b
k = np.vectorize(math.sqrt)(layers[0:-2]*layers[1:])
w1 = np.random.uniform(-0.5, 0.5, layers[0:2][::-1])
b1 = np.random.uniform(-0.5, 0.5, (1,layers[1]))
w2 = np.random.uniform(-0.5, 0.5, layers[1:3][::-1])
b2 = np.random.uniform(-0.5, 0.5, (1,layers[2]))
And the following is the core part for each mini-batch:
for idx in range(num_of_batch):
# forward_vectorized
x = train_set[idx*batch_size:(idx+1)*batch_size,:]
y = Y[idx*batch_size:(idx+1)*batch_size,:]
a1 = x
a2 = f(np.dot(np.insert(a1,0,1,axis=1),np.insert(w1,0,b1,axis=1).T))
a3 = softmax(np.dot(np.insert(a2,0,1,axis=1),np.insert(w2,0,b2,axis=1).T))
# compute delta
d3 = a3-y
d2 = np.dot(d3,w2)*a2*(1.0-a2)
# compute grad
D2 = np.dot(d3.T,a2)
D1 = np.dot(d2.T,a1)
# update_parameters
w1 = w1 - learningRate*(D1/batch_size + momentum*w1)
b1 = b1 - learningRate*(np.sum(d2,axis=0)/batch_size)
w2 = w2 - learningRate*(D2/batch_size+ momentum*w2)
b2 = b2 - learningRate*(np.sum(d3,axis=0)/batch_size)
e = -np.sum(y*np.log(a3))/batch_size
err.append(e)
After one epoch(50,000 samples), I got the following sequence of e, which seems to be too large:
Out[1]:
10000/50000 4.033538
20000/50000 3.924567
30000/50000 3.761105
40000/50000 3.632708
50000/50000 3.549212
I think the back_prop code should be correct and I couldn't find what's going wrong. It has tortured me for over 2 days.