In the example from the PyTorch tutorial, they use the following graph:
x = [[1, 1], [1, 1]]
y = x + 2
z = 3y^2
o = mean(z) # i.e. 1/4 * z.sum()
Thus, the forward pass gets us this:
x_i = 1, y_i = 3, z_i = 27, o = 27
In code this looks like:
import torch
# define graph
x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()
# without this, torch only retains gradients for leaf nodes, i.e. x
y.retain_grad()
z.retain_grad()
# the forward pass already ran when y, z and out were defined; print the results
print(z, out)
However, I get confused by the gradients computed:
# now let's run our backward prop & get gradients
out.backward()
print(f'do/dx = {x.grad[0,0]}')
which outputs:
do/dx = 4.5
By chain rule, do/dx = do/dz * dz/dy * dy/dx, where:
dy/dx = 1
dz/dy = 9/2 given x_i=1
do/dz = 1/4 given x_i=1
which means:
do/dx = 1/4 * 9/2 * 1 = 9/8
However, this doesn't match the gradient returned by Torch (9/2 = 4.5). Perhaps I have a math error (something with the do/dz = 1/4 term?), or I don't understand autograd in Torch.
Any pointers?
Your error is in dz/dy: since z = 3y^2, dz/dy = 6y, not 9/2. The three factors are:
do/dz = 1/4
dz/dy = 6y = 6 * 3 = 18
dy/dx = 1
therefore, do/dx = 1/4 * 18 * 1 = 9/2
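You can read each factor directly off the retained gradients. A minimal standalone check, using the same graph as above:

import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()
y.retain_grad()
z.retain_grad()
out.backward()

print(z.grad[0, 0])  # do/dz_i = 1/4 = 0.25
print(y.grad[0, 0])  # do/dy_i = 1/4 * 6*3 = 4.5
print(x.grad[0, 0])  # do/dx_i = 4.5 * 1 = 4.5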
I want to fit a set of data points in the xy plane to the general case of a rotated and translated hyperbola, in order to back out the coefficients of the general equation of a conic.
I've tried the methodology proposed here, but so far I cannot make it work: when fitting a set of points known to lie on a hyperbola, I get quite different output.
What am I doing wrong in the code below?
Or is there another way to solve this problem?
import numpy as np
from sympy import plot_implicit, Eq
from sympy.abc import x, y
def fit_hyperbola(x, y):
    # accept plain lists as well as arrays
    x = np.asarray(x)
    y = np.asarray(y)
    D1 = np.vstack([x**2, x*y, y**2]).T
    D2 = np.vstack([x, y, np.ones(len(x))]).T
    S1 = D1.T @ D1
    S2 = D1.T @ D2
    S3 = D2.T @ D2
    # define the constraint matrix and its inverse
    C = np.array(((0, 0, -2), (0, 1, 0), (-2, 0, 0)), dtype=float)
    Ci = np.linalg.inv(C)
    # set up and solve the generalized eigenvector problem
    T = np.linalg.inv(S3) @ S2.T
    S = Ci @ (S1 - S2 @ T)
    eigval, eigvec = np.linalg.eig(S)
    # evaluate and sort the resulting constraint values
    cond = eigvec[1]**2 - 4*eigvec[0]*eigvec[2]
    idx = np.argsort(cond)
    condVals = cond[idx]
    possibleHs = condVals[1:] + condVals[0]
    minDiffAt = np.argmin(abs(possibleHs))
    alpha1 = eigvec[:, idx[minDiffAt + 1]]
    alpha2 = T @ alpha1
    return np.concatenate((alpha1, alpha2)).ravel()
if __name__ == '__main__':
    # known hyperbola coefficients
    coeffs = [1., 6., -2., 3., 0., 0.]
    # hyperbola points
    x_ = [1.56011303e+00, 1.38439984e+00, 1.22595618e+00, 1.08313085e+00,
9.54435408e-01, 8.38528681e-01, 7.34202759e-01, 6.40370424e-01,
5.56053814e-01, 4.80374235e-01, 4.12543002e-01, 3.51853222e-01,
2.97672424e-01, 2.49435970e-01, 2.06641170e-01, 1.68842044e-01,
1.35644673e-01, 1.06703097e-01, 8.17157025e-02, 6.04220884e-02,
4.26003457e-02, 2.80647476e-02, 1.66638132e-02, 8.27872926e-03,
2.82211172e-03, 2.37095181e-04, 4.96740239e-04, 3.60375275e-03,
9.59051203e-03, 1.85194083e-02, 3.04834928e-02, 4.56074477e-02,
6.40488853e-02, 8.59999904e-02, 1.11689524e-01, 1.41385205e-01,
1.75396504e-01, 2.14077865e-01, 2.57832401e-01, 3.07116093e-01,
3.62442545e-01, 4.24388335e-01, 4.93599021e-01, 5.70795874e-01,
6.56783391e-01, 7.52457678e-01, 8.58815793e-01, 9.76966133e-01,
1.10813998e+00, 1.25370436e+00]
    y_ = [-0.66541515, -0.6339625 , -0.60485332, -0.57778425, -0.5524732 ,
-0.52865638, -0.50608561, -0.48452564, -0.46375182, -0.44354763,
-0.42370253, -0.4040097 , -0.38426392, -0.3642594 , -0.34378769,
-0.32263542, -0.30058217, -0.27739811, -0.25284163, -0.22665682,
-0.19857079, -0.16829086, -0.13550147, -0.0998609 , -0.06099773,
-0.01850695, 0.02805425, 0.07917109, 0.13537629, 0.19725559,
0.26545384, 0.34068177, 0.42372336, 0.51544401, 0.61679957,
0.72884632, 0.85275192, 0.98980766, 1.14144182, 1.30923466,
1.49493479, 1.70047747, 1.92800474, 2.17988774, 2.45875143,
2.76750196, 3.10935692, 3.48787892, 3.90701266, 4.3711261 ]
    plot_implicit(Eq(coeffs[0]*x**2 + coeffs[1]*x*y + coeffs[2]*y**2 + coeffs[3]*x + coeffs[4]*y, -coeffs[5]))
    coeffs_fit = fit_hyperbola(x_, y_)
    plot_implicit(Eq(coeffs_fit[0]*x**2 + coeffs_fit[1]*x*y + coeffs_fit[2]*y**2 + coeffs_fit[3]*x + coeffs_fit[4]*y, -coeffs_fit[5]))
The general equation of a hyperbola is defined by 5 independent coefficients, not 6. If the model equation includes dependent coefficients (which is the case with 6 coefficients), trouble can occur in the numerical regression.
That is why the equation A * x * x + B * x * y + C * y * y + D * x + F * y = 1 is considered in the calculus below. The fitting is very good.
One can then go back to the standard equation a * x * x + 2 * b * x * y + c * y * y + 2 * d * x + 2 * f * y + g = 0 by setting a value for g (for example g = -1).
The formulas to find the coordinates of the center, the equations of the asymptotes, and the equations of the axes can be found in the references below.
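A minimal sketch of that least-squares calculus (the helper name fit_conic_lsq is mine, not from the original answer):

import numpy as np

def fit_conic_lsq(x, y):
    # Fit A*x^2 + B*x*y + C*y^2 + D*x + F*y = 1 in the least-squares sense:
    # 5 unknowns, one linear equation per data point.
    x, y = np.asarray(x), np.asarray(y)
    M = np.column_stack([x**2, x*y, y**2, x, y])
    coef, *_ = np.linalg.lstsq(M, np.ones(len(x)), rcond=None)
    A, B, C, D, F = coef
    # back to the 6-coefficient form a*x^2 + ... + g = 0, setting g = -1
    return np.array([A, B, C, D, F, -1.0])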
https://mathworld.wolfram.com/ConicSection.html
https://en.wikipedia.org/wiki/Conic_section
https://en.wikipedia.org/wiki/Hyperbola
I have some example code. When I calculate dloss/dw by hand I get 8, but the following code gives me 16. Please tell me how the gradient is 16.
import torch
x = torch.tensor(2.0)
y = torch.tensor(2.0)
w = torch.tensor(3.0, requires_grad=True)
# forward
y_hat = w * x
s = y_hat - y
loss = s**2
#backward
loss.backward()
print(w.grad)
I think you simply miscalculated.
The derivative of loss = (w * x - y)^2 is:
dloss/dw = 2 * (w * x - y) * x = 2 * (3 * 2 - 2) * 2 = 16
Keep in mind that back-propagation in neural networks is done by applying the chain rule: I think you forgot the * x at the end of the derivative.
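You can confirm the closed form against autograd without touching the original graph (a minimal sketch reusing the question's values):

import torch

x = torch.tensor(2.0)
y = torch.tensor(2.0)
w = torch.tensor(3.0, requires_grad=True)

loss = (w * x - y) ** 2
(grad,) = torch.autograd.grad(loss, w)  # dloss/dw via autograd
manual = 2 * (w * x - y) * x            # the closed form above
print(grad.item(), manual.item())       # 16.0 16.0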
To be specific:
the chain rule says that df(g(x))/dx = f'(g(x)) * g'(x) (differentiated with respect to x)
the whole loss function in your case is built like this:
loss(y_hat) = (y_hat - y)^2
y_hat(w) = w * x
thus: loss(y_hat(w)) = (y_hat(w) - y)^2
the derivative of this, by the chain rule, is:
dloss(y_hat(w))/dw = loss'(y_hat(w)) * dy_hat(w)/dw
where:
loss'(y_hat) = 2 * (y_hat - y) and dy_hat(w)/dw = x
thus: dloss(y_hat(w))/dw = 2 * (y_hat(w) - y) * x = 2 * (w * x - y) * x = 2 * (3 * 2 - 2) * 2 = 16
PyTorch knows that in your forward pass each layer applies some kind of function to its input, that your forward pass composes those functions into loss(y_hat(w)), and it keeps applying the chain rule for the backward pass (each layer requires one application of the chain rule).
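To see the forgotten factor explicitly, retain the gradient of the intermediate y_hat (a minimal sketch; retain_grad is needed because y_hat is not a leaf):

import torch

x = torch.tensor(2.0)
y = torch.tensor(2.0)
w = torch.tensor(3.0, requires_grad=True)

y_hat = w * x
y_hat.retain_grad()  # y_hat is not a leaf, so its grad is freed unless retained
loss = (y_hat - y) ** 2
loss.backward()

print(y_hat.grad)  # dloss/dy_hat = 2 * (y_hat - y) = 8, the value computed by hand
print(w.grad)      # dloss/dw = dloss/dy_hat * x = 8 * 2 = 16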
I need to take two dimensional input data generated uniformly from the unit square, and then label the data either 1 or -1 based on the XOR of two hypothesis functions, h1(x) = w1^T*x and h2(x) = w2^T*x where w1 = [0, 1, -1] and w2 = [0, 1, 1]. From there, I am supposed to run the data through a three-layer multilayer perceptron using the sign function for theta. For some reason, my MLP outputs -1 for all points no matter what. Where is my error? Here's my code:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.uniform(-.5, .5, (100,3))
x[:,0] = 1
y = np.ones(100)
w1 = np.array([0, 1, -1]).T
w2 = np.array([0, 1, 1]).T
h1 = np.sign(w1[1]*x[:,1] + w1[2]*x[:,2])
h2 = np.sign(w2[1]*x[:,1] + w2[2]*x[:,2])
for i in range(np.size(x,0)):
    if (h1[i]<0 and h2[i]>0) or (h1[i]>0 and h2[i]<0):
        y[i] = 1
    else:
        y[i] = -1
print("h1:")
print(h1)
print("h2:")
print(h2)
#nodes 21 and 31 are just 1 with weights -1.5 and 1.5, respectively
node22 = np.ones((100,1))
node23 = np.ones((100,1))
node32 = np.ones((100,1))
node33 = np.ones((100,1))
out = np.ones((100,1))
for j in range(np.size(x,0)):
    node22[j] = np.matmul(w1,x[j,:])
    node23[j] = np.matmul(w2,x[j,:])
    node32[j] = np.sign(-1.5 + node22[j] - node23[j])
    node33[j] = np.sign(-1.5 - node22[j] + node23[j])
    out[j] = np.sign(1.5 + node32[j] + node33[j])
print("Layer 2, Node 2:")
print(node22)
print("Layer 2, Node 3:")
print(node23)
print("Layer 3, Node 2:")
print(node32)
print("Layer 3, Node 3:")
print(node33)
print("f:")
print(out)
error = 0
for k in range(np.size(x,0)):
    if y[k] != out[k]:
        error += 1
error = error/np.size(x,0)
print(error)
I figured it out. For nodes 22 and 23, the calculation needs to be inside a sign function. Without it, node22 and node23 are small raw values (each lies in [-1, 1] for inputs drawn from [-0.5, 0.5]), so the arguments of node32 and node33 are always negative, both nodes output -1, and the final output is stuck at sign(1.5 - 2) = -1. The j for loop should look like this:
for j in range(np.size(x,0)):
    node22[j] = np.sign(np.matmul(w1,x[j,:]))
    node23[j] = np.sign(np.matmul(w2,x[j,:]))
    node32[j] = np.sign(-1.5 + node22[j] - node23[j])
    node33[j] = np.sign(-1.5 - node22[j] + node23[j])
    out[j] = np.sign(1.5 + node32[j] + node33[j])
My simple code:
import torch
x = torch.randn(4, requires_grad=True).cuda()
y = torch.randn(4, requires_grad=True).cuda()
z = torch.zeros(4)
z = torch.clone(x)
z.retain_grad()
h = (z + y) * z
l = torch.randn(4).cuda()
loss = (l - h).pow(2).sum()
loss.backward()
print('x.grad=', x.grad)
print('z.grad=', z.grad)
output:
x.grad= None
z.grad= tensor([-15.3401, -3.2623, -2.1670, 0.1410], device='cuda:0')
Why is x.grad None and not the same as z.grad?
What should I do if I want them to be the same?
Because of the .cuda() call, x is not a leaf tensor (it is the result of an operation on the CPU tensor created by torch.randn), and autograd only populates .grad on leaf tensors by default. You need to call x.retain_grad() after declaring x if you want to keep the grad of tensor x, or create the tensor on the GPU directly.
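Both fixes in one minimal sketch (assuming a CUDA device is available):

import torch

# Option 1: keep the non-leaf x and explicitly retain its grad
x = torch.randn(4, requires_grad=True).cuda()
x.retain_grad()

# Option 2: make x itself a leaf by creating it on the GPU directly
x2 = torch.randn(4, requires_grad=True, device='cuda')

loss = x.pow(2).sum() + x2.pow(2).sum()
loss.backward()
print(x.grad)   # populated thanks to retain_grad()
print(x2.grad)  # populated because x2 is a leaf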
How do I do the equivalent of scipy.stats.norm.ppf without using SciPy? Python's math module has erf built in, but I cannot seem to recreate the function with it.
PS: I cannot just use SciPy because Heroku does not allow you to install it, and using alternate buildpacks breaches the 300 MB maximum slug size limit.
There's no simple way to use erf to implement norm.ppf, because norm.ppf is related to the inverse of erf rather than to erf itself. Instead, here's a pure-Python port of the code from SciPy. You should find that the function ndtri returns exactly the same value as norm.ppf:
import math
s2pi = 2.50662827463100050242E0  # sqrt(2*pi)
P0 = [
-5.99633501014107895267E1,
9.80010754185999661536E1,
-5.66762857469070293439E1,
1.39312609387279679503E1,
-1.23916583867381258016E0,
]
Q0 = [
1,
1.95448858338141759834E0,
4.67627912898881538453E0,
8.63602421390890590575E1,
-2.25462687854119370527E2,
2.00260212380060660359E2,
-8.20372256168333339912E1,
1.59056225126211695515E1,
-1.18331621121330003142E0,
]
P1 = [
4.05544892305962419923E0,
3.15251094599893866154E1,
5.71628192246421288162E1,
4.40805073893200834700E1,
1.46849561928858024014E1,
2.18663306850790267539E0,
-1.40256079171354495875E-1,
-3.50424626827848203418E-2,
-8.57456785154685413611E-4,
]
Q1 = [
1,
1.57799883256466749731E1,
4.53907635128879210584E1,
4.13172038254672030440E1,
1.50425385692907503408E1,
2.50464946208309415979E0,
-1.42182922854787788574E-1,
-3.80806407691578277194E-2,
-9.33259480895457427372E-4,
]
P2 = [
3.23774891776946035970E0,
6.91522889068984211695E0,
3.93881025292474443415E0,
1.33303460815807542389E0,
2.01485389549179081538E-1,
1.23716634817820021358E-2,
3.01581553508235416007E-4,
2.65806974686737550832E-6,
6.23974539184983293730E-9,
]
Q2 = [
1,
6.02427039364742014255E0,
3.67983563856160859403E0,
1.37702099489081330271E0,
2.16236993594496635890E-1,
1.34204006088543189037E-2,
3.28014464682127739104E-4,
2.89247864745380683936E-6,
6.79019408009981274425E-9,
]
def ndtri(y0):
    # Port of the Cephes ndtri routine used by SciPy:
    # the inverse of the standard normal CDF.
    if y0 <= 0 or y0 >= 1:
        raise ValueError("ndtri(x) needs 0 < x < 1")
    negate = True
    y = y0
    if y > 1.0 - 0.13533528323661269189:  # 0.1353... = exp(-2)
        y = 1.0 - y
        negate = False
    if y > 0.13533528323661269189:
        # central region: rational approximation in y^2
        y = y - 0.5
        y2 = y * y
        x = y + y * (y2 * polevl(y2, P0) / polevl(y2, Q0))
        x = x * s2pi
        return x
    # tail region
    x = math.sqrt(-2.0 * math.log(y))
    x0 = x - math.log(x) / x
    z = 1.0 / x
    if x < 8.0:
        x1 = z * polevl(z, P1) / polevl(z, Q1)
    else:
        x1 = z * polevl(z, P2) / polevl(z, Q2)
    x = x0 - x1
    if negate:
        x = -x
    return x
def polevl(x, coef):
    # evaluate the polynomial with coefficients coef (highest order first)
    # at x using Horner's method
    accum = 0
    for c in coef:
        accum = x * accum + c
    return accum
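A quick sanity check against known standard normal quantiles:

print(ndtri(0.975))  # ≈ 1.959963984540054, same as scipy.stats.norm.ppf(0.975)
print(ndtri(0.5))    # 0.0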
The function ppf is the inverse of y = (1 + erf(x/sqrt(2)))/2. So we need to solve this equation for x, given y between 0 and 1. Here is code that does this with the bisection method. I import the SciPy function only to show that the result is the same.
from math import erf, sqrt
from scipy.stats import norm # only for comparison
y = 0.123
z = 2*y-1
a = 0
while erf(a) > z or erf(a+1) < z:  # looking for initial bracket of size 1
    if erf(a) > z:
        a -= 1
    else:
        a += 1
b = a+1                   # found a bracket, proceed to refine it
while b-a > 1e-15:        # 1e-15 ought to be enough precision
    c = (a+b)/2.0         # bisection method
    if erf(c) > z:
        b = c
    else:
        a = c
print(sqrt(2)*(a+b)/2.0)  # this is the answer
print(norm.ppf(y))        # SciPy for comparison
Left for you to do:
preliminary bound checks (y must be strictly between 0 and 1)
scaling and shifting if another mean/variance is desired; the code is for the standard normal distribution (mean 0, variance 1). See the sketch below.
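For the scaling item, a one-line location-scale wrapper suffices; standard_ppf here is a hypothetical name for either inverse routine above, wrapped as a function:

def norm_ppf(y, mu=0.0, sigma=1.0):
    # quantile of N(mu, sigma^2) from the standard normal quantile
    return mu + sigma * standard_ppf(y)  # standard_ppf: hypothetical wrapper of the code above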