How the gradient is calculated in PyTorch - python

I have some example code. When I calculate dloss/dw manually I get 8, but the following code gives me 16. Please tell me how the gradient is 16.
import torch
x = torch.tensor(2.0)
y = torch.tensor(2.0)
w = torch.tensor(3.0, requires_grad=True)
# forward
y_hat = w * x
s = y_hat - y
loss = s**2
#backward
loss.backward()
print(w.grad)

I think you simply miscalculated.
The derivative of loss = (w * x - y)^2 is:
dloss/dw = 2 * (w * x - y) * x = 2 * (3 * 2 - 2) * 2 = 16
Keep in mind that back-propagation in neural networks works by applying the chain rule; I think you forgot the * x at the end of the derivative.
To be specific:
The chain rule says that df(g(x))/dx = f'(g(x)) * g'(x) (differentiating with respect to x).
The whole loss function in your case is built like this:
loss(y_hat) = (y_hat - y)^2
y_hat = w * x
thus: loss = (w * x - y)^2
The derivative of this, according to the chain rule, is:
dloss/dw = loss'(y_hat) * dy_hat/dw
For any z:
loss'(z) = 2 * (z - y) and dy_hat/dw = x
thus:
dloss/dw = loss'(y_hat) * dy_hat/dw = 2 * (y_hat - y) * x = 2 * (w * x - y) * x = 2 * (3 * 2 - 2) * 2 = 16
PyTorch knows that each step of your forward pass applies some function to its input, and for the backward pass it keeps applying the chain rule, one application per step, until it reaches the leaf w.
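As a quick numerical check (a minimal sketch reusing the question's tensors), you can compare w.grad with the hand-derived 2 * (w * x - y) * x:
import torch

x = torch.tensor(2.0)
y = torch.tensor(2.0)
w = torch.tensor(3.0, requires_grad=True)

loss = (w * x - y) ** 2
loss.backward()

# hand-applied chain rule: dloss/dw = 2 * (w * x - y) * x
manual = 2 * (w * x - y).detach() * x
print(w.grad, manual)  # tensor(16.) tensor(16.)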

Related

Helmholtz decomposition of a vector V(x,z) = [f(x,z), 0, 0]

I wrote the program below in Python in the hope of conducting a Helmholtz decomposition on a vector V(x,z) = [f(x,z), 0, 0], where f(x,z) is a function defined earlier. The aim of the program is to get the solenoidal and harmonic parts of V as S(x,z) = [S1(x,z), S2(x,z), S3(x,z)] and H(x,z) = [H1(x,z), H2(x,z), H3(x,z)], with S and H satisfying V = S + H, which translates to (S1 + H1 = f, S2 + H2 = 0, S3 + H3 = 0).
Please help, I can't get anywhere with this problem. The output of the code below isn't what I wanted; it's the following:
Solenoidal:
[[-22.6179559436889 + 41.14742726254I, 33.243161684442 - 99.9416505604629I, -22.6179559436889 + 41.14742726254I], [0.000151144774536593 + 0.000222403457962539I, 0, -0.000151144774536593 - 0.000222403457962539I], [22.6210744289585 - 41.1540953247099I, -41.2442631673893 + 88.1909008014634I, 6.6295316668479 - 64.6849359328842I]]
Harmonic:
[[26.6155393446675 - 35.2651619174123I, -33.243161684442 + 99.9416505604629I, 18.6203725427103 - 47.0296926076676I], [-0.000151144774536593 - 0.000222403457962539I, 0, 0.000151144774536593 + 0.000222403457962539I], [-18.6231887384308 + 47.0368054767535I, 41.2442631673893 - 88.1909008014634I, -10.6274173573755 + 58.8022257808406I]]
import math
import numpy as np
from sympy import symbols, simplify, lambdify

# Define x and z as symbolic variables
x, z = symbols('x, z')

# Define the function f
def f(x, z):
    term1 = 171.05 * 10**(-18) * ((1.00 * x**4 + 2.00 * x**2 * z**2 + 1.00 * z**4) * math.atan(z*x) - 1.00 * x**3 * z - 1.00 * x * z**3)
    term2 = -3.17 * 10**6 * x**4 - 6.36 * 10**6 * x**2 * z**2 - 3.19 * 10**6 * z**4 + 1.00 * x**4 * z + 2.00 * x**2 * z**3 + 1.00 * z**5
    term3 = (z - 44.33 * 10**3)
    term4 = ((-2.00 * 10**3) / (576.30 * 10**3 + 13.00 * z))**2.69 * (x**2 + z**2)**7.00 / 2.00 * z
    return term1 * term2 * term3 / (term4 + 1e-15)  # Add a small value to term4 to avoid division by zero

# Define a 2D array with 3 elements
vector = np.array([[f(x, z) for x in range(-1, 2)] for z in range(-1, 2)])

def helmholtz_hodge_decomposition(vector):
    # Compute the gradient of the vector field
    gradient = np.gradient(vector)
    # Compute the curl of the vector field
    curl = np.cross(gradient[0], gradient[1])
    # Compute the divergence of the vector field
    divergence = np.sum(gradient, axis=0)
    # Compute the harmonic part of the vector field
    harmonic = -curl - divergence
    # Compute the solenoidal part of the vector field
    solenoidal = vector - harmonic
    return solenoidal, harmonic

# Print the solenoidal and harmonic parts as functions of x and z
solenoidal, harmonic = helmholtz_hodge_decomposition(vector)
print("Solenoidal:")
print(simplify(solenoidal))
print("Harmonic:")
print(simplify(harmonic))

# Create functions from the solenoidal and harmonic parts
solenoidal_part = lambdify((x, z), simplify(solenoidal), 'numpy')
harmonic_part = lambdify((x, z), simplify(harmonic), 'numpy')
Expecting: conducting a Helmholtz decomposition on a vector V(x,z) = [f(x,z), 0, 0] where f(x,z) is a function defined earlier; the aim of this program is to get the solenoidal and harmonic parts of V as S(x,z) = [S1(x,z), S2(x,z), S3(x,z)] and H(x,z) = [H1(x,z), H2(x,z), H3(x,z)], with S and H satisfying V = S + H, which translates to (S1 + H1 = f, S2 + H2 = 0, S3 + H3 = 0).
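For reference, a standard numerical route to such a decomposition is the FFT-based projection. The sketch below is a minimal illustration, not a fix for the sympy code above: helmholtz_fft is a hypothetical helper, and it assumes the field has been sampled numerically (e.g. via lambdify) on a uniform, periodic grid. The curl-free component (which plays the role of your H) is the projection of the Fourier coefficients onto the wavevector; the divergence-free (solenoidal) part S is the remainder, so V = S + H holds exactly by construction.
import numpy as np

def helmholtz_fft(vx, vz, dx=1.0, dz=1.0):
    # Wavevectors for each grid axis
    kx = 2 * np.pi * np.fft.fftfreq(vx.shape[0], d=dx)
    kz = 2 * np.pi * np.fft.fftfreq(vx.shape[1], d=dz)
    KX, KZ = np.meshgrid(kx, kz, indexing='ij')
    k2 = KX**2 + KZ**2
    k2[0, 0] = 1.0  # avoid division by zero; the k = 0 (mean) mode stays in the solenoidal part
    vx_hat, vz_hat = np.fft.fft2(vx), np.fft.fft2(vz)
    # Project the Fourier coefficients onto the wavevector: the curl-free component
    k_dot_v = KX * vx_hat + KZ * vz_hat
    curlfree_x = np.real(np.fft.ifft2(KX * k_dot_v / k2))
    curlfree_z = np.real(np.fft.ifft2(KZ * k_dot_v / k2))
    # The divergence-free (solenoidal) component is the remainder, so V = S + H exactly
    return (vx - curlfree_x, vz - curlfree_z), (curlfree_x, curlfree_z)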

How to use Gradient Descent to solve this multiple terms trigonometry function?

The question is like this:
f(x) = A sin(2π * L * x) + B cos(2π * M * x) + C sin(2π * N * x)
where L, M, N are integer constants, 0 <= L, M, N <= 100,
and A, B, C can be any integers.
Here is the given data:
x = [0,0.01,0.02,0.03,0.04,0.05,0.06,0.07,0.08,0.09,0.1,0.11,0.12,0.13,0.14,0.15,0.16,0.17,0.18,0.19,0.2,0.21,0.22,0.23,0.24,0.25,0.26,0.27,0.28,0.29,0.3,0.31,0.32,0.33,0.34,0.35,0.36,0.37,0.38,0.39,0.4,0.41,0.42,0.43,0.44,0.45,0.46,0.47,0.48,0.49,0.5,0.51,0.52,0.53,0.54,0.55,0.56,0.57,0.58,0.59,0.6,0.61,0.62,0.63,0.64,0.65,0.66,0.67,0.68,0.69,0.7,0.71,0.72,0.73,0.74,0.75,0.76,0.77,0.78,0.79,0.8,0.81,0.82,0.83,0.84,0.85,0.86,0.87,0.88,0.89,0.9,0.91,0.92,0.93,0.94,0.95,0.96,0.97,0.98,0.99]
y = [4,1.240062433,-0.7829654986,-1.332487982,-0.3337640721,1.618033989,3.512512389,4.341307895,3.515268061,1.118929599,-2.097886967,-4.990538967,-6.450324073,-5.831575611,-3.211486891,0.6180339887,4.425660706,6.980842552,7.493970785,5.891593744,2.824429495,-0.5926374511,-3.207870455,-4.263694544,-3.667432785,-2,-0.2617162175,0.5445886005,-0.169441247,-2.323237059,-5.175570505,-7.59471091,-8.488730333,-7.23200463,-3.924327772,0.6180339887,5.138501587,8.38127157,9.532377045,8.495765687,5.902113033,2.849529206,0.4768388529,-0.46697525,0.106795821,1.618033989,3.071952496,3.475795162,2.255463709,-0.4905371745,-4,-7.117914956,-8.727599664,-8.178077181,-5.544088451,-1.618033989,2.365340134,5.169257268,5.995297102,4.758922924,2.097886967,-0.8873135564,-3.06024109,-3.678989552,-2.666365632,-0.6180339887,1.452191817,2.529722611,2.016594378,-0.01374122059,-2.824429495,-5.285215072,-6.302694708,-5.246870619,-2.210419738,2,6.13956874,8.965976562,9.68000641,8.201089581,5.175570505,1.716858387,-1.02183483,-2.278560533,-1.953524751,-0.6180339887,0.7393509358,1.129293593,-0.02181188158,-2.617913164,-5.902113033,-8.727381729,-9.987404016,-9.043589913,-5.984648344,-1.618033989,2.805900027,6.034770001,7.255101454,6.368389697]
How to use Gradient Descent to solve this multiple terms trigonometry function?
Gradient descent is not well suited to optimisation over integers. You can try a naive relaxation where you solve in floats and hope the rounded solution is still OK.
from autograd import grad, numpy as jnp
import numpy as np

def cast(params):
    [A, B, C, L, M, N] = params
    # Clamp L, M, N into the allowed range [0, 100]
    L = jnp.minimum(jnp.abs(L), 100)
    M = jnp.minimum(jnp.abs(M), 100)
    N = jnp.minimum(jnp.abs(N), 100)
    return A, B, C, L, M, N

def pred(params, x):
    [A, B, C, L, M, N] = cast(params)
    return A * jnp.sin(2 * jnp.pi * L * x) + B * jnp.cos(2 * jnp.pi * M * x) + C * jnp.sin(2 * jnp.pi * N * x)
x = [0,0.01,0.02,0.03,0.04,0.05,0.06,0.07,0.08,0.09,0.1,0.11,0.12,0.13,0.14,0.15,0.16,0.17,0.18,0.19,0.2,0.21,0.22,0.23,0.24,0.25,0.26,0.27,0.28,0.29,0.3,0.31,0.32,0.33,0.34,0.35,0.36,0.37,0.38,0.39,0.4,0.41,0.42,0.43,0.44,0.45,0.46,0.47,0.48,0.49,0.5,0.51,0.52,0.53,0.54,0.55,0.56,0.57,0.58,0.59,0.6,0.61,0.62,0.63,0.64,0.65,0.66,0.67,0.68,0.69,0.7,0.71,0.72,0.73,0.74,0.75,0.76,0.77,0.78,0.79,0.8,0.81,0.82,0.83,0.84,0.85,0.86,0.87,0.88,0.89,0.9,0.91,0.92,0.93,0.94,0.95,0.96,0.97,0.98,0.99]
y = [4,1.240062433,-0.7829654986,-1.332487982,-0.3337640721,1.618033989,3.512512389,4.341307895,3.515268061,1.118929599,-2.097886967,-4.990538967,-6.450324073,-5.831575611,-3.211486891,0.6180339887,4.425660706,6.980842552,7.493970785,5.891593744,2.824429495,-0.5926374511,-3.207870455,-4.263694544,-3.667432785,-2,-0.2617162175,0.5445886005,-0.169441247,-2.323237059,-5.175570505,-7.59471091,-8.488730333,-7.23200463,-3.924327772,0.6180339887,5.138501587,8.38127157,9.532377045,8.495765687,5.902113033,2.849529206,0.4768388529,-0.46697525,0.106795821,1.618033989,3.071952496,3.475795162,2.255463709,-0.4905371745,-4,-7.117914956,-8.727599664,-8.178077181,-5.544088451,-1.618033989,2.365340134,5.169257268,5.995297102,4.758922924,2.097886967,-0.8873135564,-3.06024109,-3.678989552,-2.666365632,-0.6180339887,1.452191817,2.529722611,2.016594378,-0.01374122059,-2.824429495,-5.285215072,-6.302694708,-5.246870619,-2.210419738,2,6.13956874,8.965976562,9.68000641,8.201089581,5.175570505,1.716858387,-1.02183483,-2.278560533,-1.953524751,-0.6180339887,0.7393509358,1.129293593,-0.02181188158,-2.617913164,-5.902113033,-8.727381729,-9.987404016,-9.043589913,-5.984648344,-1.618033989,2.805900027,6.034770001,7.255101454,6.368389697]
def loss(params):
    p = pred(params, np.array(x))
    return jnp.mean((np.array(y) - p) ** 2)

params = np.array([np.random.random() * 100 for _ in range(6)])
g = grad(loss)  # build the gradient function once, outside the loop
for _ in range(10000):
    params = params - 0.001 * g(params)

print("Relaxed solution", cast(params), "loss=", loss(params))
constrained_params = np.round(cast(params))
print("Integer solution", constrained_params, "loss=", loss(constrained_params))
print()
Since the problem has a lot of local minima, you might need to run it multiple times and keep the best result; a minimal restart loop is sketched below.
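For example (a sketch reusing the loss, cast, and g defined above; the restart count of 20 is arbitrary):
best_params, best_loss = None, np.inf
for _ in range(20):
    params = np.array([np.random.random() * 100 for _ in range(6)])
    for _ in range(10000):
        params = params - 0.001 * g(params)
    rounded = np.round(cast(params))
    if loss(rounded) < best_loss:
        best_params, best_loss = rounded, float(loss(rounded))
print("Best integer solution", best_params, "loss=", best_loss)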
It's quite hard to use gradient descent to find a solution to this problem, because it tends to get stuck when changing the L, M, or N parameters. The gradients for those can push it away from the right solution, unless it is very close to an optimal solution already.
There are ways to get around this, such as basinhopping or random search, but because of the function you're trying to learn, you have a better alternative.
Since you're trying to learn a sinusoid function, you can use an FFT to find the frequencies of the sine waves. Once you have those frequencies, you can find the amplitudes and phases used to generate the same sine wave.
Pardon the messiness of this code, this is my first time using an FFT.
import scipy.fft
import numpy as np
import math
import matplotlib.pyplot as plt

def get_top_frequencies(x, y, num_freqs):
    x = np.array(x)
    y = np.array(y)
    # Find the timestep (assumes a constant timestep)
    dt = abs(x[0] - x[-1]) / (len(x) - 1)
    # Take the discrete FFT of y
    spectral = scipy.fft.fft(y)
    freq = scipy.fft.fftfreq(y.shape[0], d=dt)
    # Cut off the top half of frequencies. Assumes the input signal is real, not complex.
    spectral = spectral[:int(spectral.shape[0] / 2)]
    # Double the amplitudes to correct for cutting off the top half.
    spectral *= 2
    # Adjust the amplitude by the sampling timestep
    spectral *= dt
    # Get amplitudes for all frequencies: the magnitude of each complex coefficient
    spectral_amplitude = np.abs(spectral)
    # Pick the frequencies with the highest amplitudes
    highest_idx = np.argsort(spectral_amplitude)[::-1][:num_freqs]
    # Find the amplitude, frequency, and phase components of each term
    highest_amplitude = spectral_amplitude[highest_idx]
    highest_freq = freq[highest_idx]
    highest_phase = np.angle(spectral[highest_idx]) / math.pi
    # Convert it into a Python function
    function = ["def func(x):", "return ("]
    for i, components in enumerate(zip(highest_amplitude, highest_freq, highest_phase)):
        amplitude, freq, phase = components
        plus_sign = " +" if i != (num_freqs - 1) else ""
        term = f"{amplitude:.2f} * math.cos(2 * math.pi * {freq:.2f} * x + math.pi * {phase:.2f}){plus_sign}"
        function.append("    " + term)
    function.append(")")
    return "\n    ".join(function)
x = [0,0.01,0.02,0.03,0.04,0.05,0.06,0.07,0.08,0.09,0.1,0.11,0.12,0.13,0.14,0.15,0.16,0.17,0.18,0.19,0.2,0.21,0.22,0.23,0.24,0.25,0.26,0.27,0.28,0.29,0.3,0.31,0.32,0.33,0.34,0.35,0.36,0.37,0.38,0.39,0.4,0.41,0.42,0.43,0.44,0.45,0.46,0.47,0.48,0.49,0.5,0.51,0.52,0.53,0.54,0.55,0.56,0.57,0.58,0.59,0.6,0.61,0.62,0.63,0.64,0.65,0.66,0.67,0.68,0.69,0.7,0.71,0.72,0.73,0.74,0.75,0.76,0.77,0.78,0.79,0.8,0.81,0.82,0.83,0.84,0.85,0.86,0.87,0.88,0.89,0.9,0.91,0.92,0.93,0.94,0.95,0.96,0.97,0.98,0.99]
y = [4,1.240062433,-0.7829654986,-1.332487982,-0.3337640721,1.618033989,3.512512389,4.341307895,3.515268061,1.118929599,-2.097886967,-4.990538967,-6.450324073,-5.831575611,-3.211486891,0.6180339887,4.425660706,6.980842552,7.493970785,5.891593744,2.824429495,-0.5926374511,-3.207870455,-4.263694544,-3.667432785,-2,-0.2617162175,0.5445886005,-0.169441247,-2.323237059,-5.175570505,-7.59471091,-8.488730333,-7.23200463,-3.924327772,0.6180339887,5.138501587,8.38127157,9.532377045,8.495765687,5.902113033,2.849529206,0.4768388529,-0.46697525,0.106795821,1.618033989,3.071952496,3.475795162,2.255463709,-0.4905371745,-4,-7.117914956,-8.727599664,-8.178077181,-5.544088451,-1.618033989,2.365340134,5.169257268,5.995297102,4.758922924,2.097886967,-0.8873135564,-3.06024109,-3.678989552,-2.666365632,-0.6180339887,1.452191817,2.529722611,2.016594378,-0.01374122059,-2.824429495,-5.285215072,-6.302694708,-5.246870619,-2.210419738,2,6.13956874,8.965976562,9.68000641,8.201089581,5.175570505,1.716858387,-1.02183483,-2.278560533,-1.953524751,-0.6180339887,0.7393509358,1.129293593,-0.02181188158,-2.617913164,-5.902113033,-8.727381729,-9.987404016,-9.043589913,-5.984648344,-1.618033989,2.805900027,6.034770001,7.255101454,6.368389697]
print(get_top_frequencies(x, y, 3))
That produces this function:
def func(x):
    return (
        5.00 * math.cos(2 * math.pi * 10.00 * x + math.pi * 0.50) +
        4.00 * math.cos(2 * math.pi * 5.00 * x + math.pi * -0.00) +
        2.00 * math.cos(2 * math.pi * 3.00 * x + math.pi * -0.50)
    )
Which is not quite the format you specified - you asked for two sines and one cosine, and for no phase parameter. However, using the trigonometric identity cos(x) = sin(pi/2 - x), you can convert this into an equivalent expression that matches what you want:
def func(x):
    return (
        5.00 * math.sin(2 * math.pi * -10.00 * x) +
        4.00 * math.cos(2 * math.pi * 5.00 * x) +
        2.00 * math.sin(2 * math.pi * 3.00 * x)
    )
And there's the original function!
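As a quick sanity check (a sketch assuming the func above together with the x and y lists from the question), evaluating it on the sample grid should reproduce the data up to the rounding of the printed coefficients:
import math
import numpy as np

recovered = np.array([func(xi) for xi in x])
print(np.max(np.abs(recovered - np.array(y))))  # small residual, limited by the 2-decimal coefficients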

Backpropagation in Andrew Trask's snippet

I'm trying to figure out just one line in the snippet below, which I got from here:
import numpy as np
X = np.array([ [0,0,1],[0,1,1],[1,0,1],[1,1,1] ])
y = np.array([[0,1,1,0]]).T
alpha,hidden_dim = (0.5,4)
synapse_0 = 2*np.random.random((3,hidden_dim)) - 1
synapse_1 = 2*np.random.random((hidden_dim,1)) - 1
for j in range(60000):  # xrange in the original Python 2 snippet
    layer_1 = 1/(1+np.exp(-(np.dot(X,synapse_0))))        # hidden layer (sigmoid)
    layer_2 = 1/(1+np.exp(-(np.dot(layer_1,synapse_1))))  # output layer (sigmoid)
    layer_2_delta = (layer_2 - y)*(layer_2*(1-layer_2))
    layer_1_delta = layer_2_delta.dot(synapse_1.T) * (layer_1 * (1-layer_1))
    synapse_1 -= (alpha * layer_1.T.dot(layer_2_delta))
    synapse_0 -= (alpha * X.T.dot(layer_1_delta))
The line I cannot figure out is:
layer_1_delta = layer_2_delta.dot(synapse_1.T) * (layer_1 * (1-layer_1))
Specifically, why are we doing a dot product with synapse_1 instead of layer_1?
By using synapse_1 in the delta calculation, isn't the partial derivative taken with respect to the weights instead of the layer_1 output, which is what we want?
I think this is what layer_1_delta should actually be:
layer_1_delta = layer_1.T.dot(layer_2_delta) * (layer_1 * (1-layer_1))

autograd differentiation example in PyTorch - should be 9/8?

In the example from the PyTorch tutorial, they use the following graph:
x = [[1, 1], [1, 1]]
y = x + 2
z = 3y^2
o = mean( z ) # 1/4 * z.sum()
Thus, the forward pass gets us this:
x_i = 1, y_i = 3, z_i = 27, o = 27
In code this looks like:
import torch
# define graph
x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()
# if we don't do this, torch will only retain gradients for leaf nodes, ie: x
y.retain_grad()
z.retain_grad()
# does a forward pass
print(z, out)
However, I get confused by the gradients computed:
# now let's run our backward prop & get gradients
out.backward()
print(f'do/dx = {x.grad[0,0]}')
which outputs:
do/dx = 4.5
By chain rule, do/dx = do/dz * dz/dy * dy/dx, where:
dy/dx = 1
dz/dy = 9/2 given x_i=1
do/dz = 1/4 given x_i=1
which means:
do/dx = 1/4 * 9/2 * 1 = 9/8
However this doesn't match the gradients returned by Torch (9/2 = 4.5). Perhaps I have a math error (something with the do/dz = 1/4 term?), or I don't understand autograd in Torch.
Any pointers?
do/dz = 1/4 (o is the mean of the four elements of z)
dz/dy = 6y = 6 * 3 = 18
dy/dx = 1
therefore, do/dx = 1/4 * 18 * 1 = 9/2. Your error is in dz/dy: for z = 3y^2 the derivative is 6y, which is 18 at y = 3, not 9/2.
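To see each factor numerically (a small check extending the question's snippet, which already calls retain_grad on y and z):
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()
y.retain_grad()
z.retain_grad()
out.backward()

print(z.grad[0, 0])  # do/dz = 0.25
print(y.grad[0, 0])  # do/dy = do/dz * dz/dy = 0.25 * 18 = 4.5
print(x.grad[0, 0])  # do/dx = do/dy * dy/dx = 4.5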

Newton - CG Optimization in python, problems with Jacobian

I'm trying to run Newton-CG optimization in Python. My function is f(x,y) = (1-x)^2 + 2(y - x^2)^2, with initial point x = 3, y = 2. Here is my code:
import numpy as np
from scipy.optimize import minimize

def f(params):
    x, y = params  # unpack the two parameters
    return (1 - x) ** 2 + 2 * (y - x ** 2) ** 2

def jacobian(params):
    x, y = params  # unpack the two parameters
    der = np.zeros_like(x)
    der[0] = -8 * x * (-x ** 2 + y) + 2 * x - 2  # derivative with respect to x
    der[1] = -4 * x ** 2 + 4 * y  # derivative with respect to y
    return der

initial_guess = [3, 2]  # initial point
result = minimize(f, initial_guess, jac=jacobian, method='Newton-CG')
I get the error "IndexError: too many indices for array".
Nelder-Mead and BFGS optimization both work, so the problem is with the Jacobian; I feel there is a mistake somewhere in def jacobian.
The error is indeed in the jacobian function: you are defining der as zeros with the shape of x, which is a scalar. Use params instead:
def jacobian(params):
    x, y = params
    der = np.zeros_like(params)  # one slot per parameter, shape (2,)
    der[0] = -8 * x * (-x ** 2 + y) + 2 * x - 2  # derivative with respect to x
    der[1] = -4 * x ** 2 + 4 * y  # derivative with respect to y
    return der
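With that fix the optimization runs. As a quick sanity check (a sketch), the minimum of f is at (1, 1), where both squared terms vanish:
result = minimize(f, initial_guess, jac=jacobian, method='Newton-CG')
print(result.x)  # expect something close to [1. 1.]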
